UDF error with Spark 2.4 on scala 2.12

2019-01-17 Thread Andrés Ivaldi
Hello, I'm having problems with a UDF. I was reading a bit about it, and it looks like a closure issue, but I don't know how to fix it; it works fine on 2.11. My code for the UDF definition (I tried several possibilities, this is the last one): val
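A hedged sketch of a shape that usually sidesteps the Scala 2.12 closure problem: a plain function value that captures nothing from an enclosing class, wrapped with udf (names here are illustrative, not the poster's actual code):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, udf}

    val spark = SparkSession.builder().master("local[*]").appName("udf-sketch").getOrCreate()
    import spark.implicits._

    // a self-contained lambda: no references to enclosing class state,
    // so it serializes cleanly on both Scala 2.11 and 2.12
    val normalize = udf((s: String) => Option(s).map(_.trim.toLowerCase).orNull)

    Seq(" Foo ", "BAR").toDF("raw").select(normalize(col("raw")).as("clean")).show()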

Re: Spark Core - Embed in other application

2018-12-11 Thread Andrés Ivaldi
Hi, yes you can; I've also developed an engine to perform ETL. I built a REST service with Akka, with a method called "execute" that receives a JSON structure representing the ETL. You just need to configure your embedded standalone Spark. I did something like this, in Scala: val spark
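The snippet cuts off at "val spark"; presumably it continued along these lines (a minimal sketch of an embedded, standalone session, configuration values assumed):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("embedded-etl")
      .master("local[*]")                  // standalone, in-process
      .config("spark.ui.enabled", "false") // optional for an embedded service
      .getOrCreate()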

Spark version performance

2018-12-11 Thread Andrés Ivaldi
Hello list, I'm having a performance issue with different Spark versions. I have an embedded Spark application written in Scala. Initially I used Spark 2.0.2 and it worked fine, with good response speed, but when I updated to 2.3.2, with no code changes, it became slower. Mainly what

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-30 Thread Andrés Ivaldi
purna2prad...@gmail.com> wrote:
> @Andres I need latest but it should be less than 10 months based on the income_age column and don't want to use sql here
> On Wed, Aug 30, 2017 at 8:08 AM Andrés Ivaldi <iaiva...@gmail.com> wrote:

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-30 Thread Andrés Ivaldi
Hi, if you need the last value of income in a window function you can use last_value. Not tested, but maybe, with @ayan's SQL: spark.sql("select *, row_number() over (partition by id order by income_age_ts desc) r, last_value(income) over (partition by id order by income_age_ts desc) from t") On Tue, Aug 29, 2017 at 11:30 PM, purna pradeep
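A hedged cleanup of that suggestion with valid syntax (row_number also needs its OVER clause), keeping only the newest row per id; table and column names are taken from the thread:

    df.createOrReplaceTempView("t")
    val latest = spark.sql("""
      SELECT * FROM (
        SELECT *,
               row_number() OVER (PARTITION BY id ORDER BY income_age_ts DESC) AS r
        FROM t
      ) ranked
      WHERE r = 1
    """)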

Re: Spark 2.0.0 and Hive metastore

2017-08-29 Thread Andrés Ivaldi
need Hive? Can't you save your aggregation using parquet, for example?
> jg
> On Aug 29, 2017, at 08:34, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>> Hello, I'm using the Spark API with Hive support. I don't have a Hive insta

Spark 2.0.0 and Hive metastore

2017-08-29 Thread Andrés Ivaldi
Hello, I'm using the Spark API with Hive support. I don't have a Hive instance; I'm just using Hive for some aggregation functions. The problem is that Hive creates the hive and metastore_db folders in the temp folder, and I want to change that location. Regards. -- Ing. Ivaldi Andres
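A sketch of one way to move those folders, assuming the standard Spark 2.0 / Hive-on-Derby properties (paths hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .enableHiveSupport()
      // warehouse folder and the embedded Derby metastore location
      .config("spark.sql.warehouse.dir", "/data/spark/warehouse")
      .config("javax.jdo.option.ConnectionURL",
        "jdbc:derby:;databaseName=/data/spark/metastore_db;create=true")
      .getOrCreate()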

Re: UDF percentile_approx

2017-06-14 Thread Andrés Ivaldi
rdo Ferrari <ferra...@gmail.com> wrote:
> Hi Andres,
> I can't find the reference; last time I searched for that I found that 'percentile_approx' is only available via the hive context. You should register a temp table and use it from there.

UDF percentile_approx

2017-06-13 Thread Andrés Ivaldi
Hello, I'm trying to use percentile_approx in my SQL query, but it seems the spark context can't find the function. I'm using it like this: import org.apache.spark.sql.functions._ import org.apache.spark.sql.DataFrameStatFunctions val e = expr("percentile_approx(Cantidadcon0234514)")
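For reference, a hedged sketch of the form that eventually works (per the reply above): go through SQL on a registered view, and note that percentile_approx takes the target percentile as a second argument (column name hypothetical):

    df.createOrReplaceTempView("t")
    // 0.5 = median; resolved through the Hive function registry on 1.x,
    // native in later 2.x releases
    val p50 = spark.sql("SELECT percentile_approx(amount, 0.5) AS p50 FROM t")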

Re: why we can't apply udf on rdd ???

2017-04-13 Thread Andrés Ivaldi
Hi, what Spark version are you using? Did you register the UDF? How are you using the UDF? Does the UDF support that data type as a parameter? What I do with Spark 2.0 is: create the UDF for each data type I need, register the UDF with the sparkContext, and use the UDF over a DataFrame, not an RDD; you can convert it
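A sketch of those three steps in Spark 2.0 style (names hypothetical, rdd assumed to be an RDD[Int]):

    import spark.implicits._

    // 1. create the UDF for the data type needed
    val doubleIt = (x: Int) => x * 2
    // 2. register it with the session
    spark.udf.register("doubleIt", doubleIt)
    // 3. apply it over a DataFrame, converting the RDD first
    rdd.toDF("n").selectExpr("doubleIt(n) AS doubled").show()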

Exception on Join with Spark2.1

2017-04-11 Thread Andrés Ivaldi
Hello, I'm using Spark embedded. So far with Spark 2.0.2 all was OK; after updating Spark to 2.1.0, I'm having problems when joining two Datasets. The queries are generated dynamically, but I have two Datasets, one with a WindowFunction and the other the same Dataset before the application of the
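A hedged reconstruction of that query shape: joining a Dataset with a windowed copy of itself, which in 2.1 tends to need explicit aliases to avoid ambiguous-attribute analysis errors (all names hypothetical):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val w        = Window.partitionBy("key").orderBy(col("ts"))
    val windowed = base.withColumn("rn", row_number().over(w))

    // alias both sides so the analyzer can tell the two lineages apart
    val joined = base.alias("a").join(windowed.alias("b"), col("a.key") === col("b.key"))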

Re: Grouping Set

2016-11-17 Thread Andrés Ivaldi
ail.com> wrote:
> It should be A, yes. Can you please reproduce this with small data and exact SQL?
> On 15 Nov 2016 02:21, "Andrés Ivaldi" <iaiva...@gmail.com> wrote:
>> Hello, I'm trying to use Grouping Sets, but I don't know if it is a bug

Grouping Set

2016-11-14 Thread Andrés Ivaldi
Hello, I'm trying to use Grouping Sets, but I don't know if this is a bug or the correct behavior. Given the following example:

    Select a, b, sum(c) from table group by a, b grouping sets ( (a), (a,b) )

What should the expected result be?

A:
A  | B    | sum(c)
xx | null |
xx | yy   |
xx | zz   |

Re: DataSet toJson

2016-11-08 Thread Andrés Ivaldi
    if (!row.isNullAt(i)) {
      gen.writeFieldName(field.name)
      fieldWriters(i).apply(row, i)
    }
    i += 1

So null values are directly ignored; I have to rewrite the toJson method to use my own JacksonGenerator. Regards. On Tue, Nov 8, 2016 at 10:06 AM, Andrés Ivaldi <iaiva...@gmail.com> wrote: > Hello, I'm

DataSet toJson

2016-11-08 Thread Andrés Ivaldi
Hello, I'm using Spark 2.0 and the toJson method. I've seen that null values are omitted in the JSON record, which is valid, but I need the field present with null as its value. Is it possible to configure that? Thanks.
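In the 2.0 timeframe the custom JacksonGenerator described in the follow-up above was the way out; for what it's worth, later releases (Spark 3.0+, if memory serves) made this configurable:

    // keep null fields when generating JSON (assumes Spark 3.0+)
    spark.conf.set("spark.sql.jsonGenerator.ignoreNullFields", "false")
    val json = df.toJSON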

Re: Aggregation Calculation

2016-11-04 Thread Andrés Ivaldi
(..) but for grouping sets. Does anyone know how to do it? Thanks. On Thu, Nov 3, 2016 at 5:17 PM, Andrés Ivaldi <iaiva...@gmail.com> wrote: > I'm not sure about inline views; it will still perform aggregation that I don't need. I think I didn't explain it right: I've already filtered the val

Re: Aggregation Calculation

2016-11-03 Thread Andrés Ivaldi
ps only operate on the pruned rows/columns. > 2016-11-03 11:29 GMT-07:00 Andrés Ivaldi <iaiva...@gmail.com>: >> Hello, I need to perform some aggregations and a kind of Cube/RollUp calculation. Doing some tests, it looks like Cube and RollUp perform aggreg

Aggregation Calculation

2016-11-03 Thread Andrés Ivaldi
Hello, I need to perform some aggregations and a kind of Cube/RollUp calculation. Doing some tests, it looks like Cube and RollUp perform aggregation over all possible column combinations, but I just need some specific combinations. What I'm trying to do is like a dataTable where the first N
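The usual answer (see the Grouping Set thread above) is GROUPING SETS, which aggregates only the listed combinations instead of every one; a hedged sketch:

    df.createOrReplaceTempView("t")
    // cube(a, b) would compute (), (a), (b) and (a, b); this computes only two
    spark.sql("""
      SELECT a, b, sum(c)
      FROM t
      GROUP BY a, b GROUPING SETS ((a), (a, b))
    """)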

Re: Aggregations with scala pairs

2016-08-18 Thread Andrés Ivaldi
ld open a Jira and a PR related to it to discuss it, c.f. https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCodeChanges > On Wed, Aug 17, 2016 4:01 PM, Andrés Ivaldi iaiva...@gmail.com wrote:

Aggregations with scala pairs

2016-08-17 Thread Andrés Ivaldi
Hello, I'd like to report wrong behavior in the Dataset API, but I don't know how I can do that; my Jira account doesn't allow me to add an issue. I'm using Apache Spark 2.0.0, but the problem has existed since at least version 1.4 (per the docs, since 1.3). The problem is simple to reproduce, as is the workaround,
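The snippet does not show the code, but the pair-based agg variants are backed by a Map keyed on column name, so a plausible reading of the report is that two aggregations of the same column collide; a hedged sketch of symptom and workaround:

    import org.apache.spark.sql.functions.{max, min}

    // pair variant: "v" -> "max" and "v" -> "min" share one map key,
    // so only a single aggregation survives
    df.groupBy("k").agg("v" -> "max", "v" -> "min")

    // workaround: the Column-based variant keeps both
    df.groupBy("k").agg(max("v"), min("v"))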

Spark 1.6.1 and regexp_replace

2016-08-09 Thread Andrés Ivaldi
I'm having strange behaviour with regular-expression replace. I'm trying to remove the spaces with trim and also collapse runs of more than one space into one. Given a string like " A B ", with trim only I get "A B", so perfect; if I add regexp_replace I get " A B". Text1
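A sketch of the usual fix, collapsing interior runs of whitespace first and trimming last (column name taken from the truncated snippet):

    import org.apache.spark.sql.functions._

    df.select(trim(regexp_replace(col("Text1"), "\\s+", " ")).as("clean"))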

Spark 2 and Solr

2016-08-01 Thread Andrés Ivaldi
Hello, does anyone know if Spark 2.0 will have a Solr connector? Lucidworks has one, but it is not available yet for Spark 2.0. Thanks!!

Re: Data Frames Join by more than one column

2016-06-23 Thread Andrés Ivaldi
23, 2016 at 2:57 PM, Andrés Ivaldi <iaiva...@gmail.com> wrote: > Hello, I've been trying to join (left_outer) dataframes by two columns, and the result is not as expected. I'm doing this: dfo1.get.join(dfo2.get, dfo1.get.col("Col1Left").equalTo(dfo

Data Frames Join by more than one column

2016-06-23 Thread Andrés Ivaldi
Hello, I've been trying to join (left_outer) dataframes by two columns, and the result is not as expected. I'm doing this:

    dfo1.get.join(dfo2.get,
      dfo1.get.col("Col1Left").equalTo(dfo2.get.col("Col1Right"))
        .and(dfo1.get.col("Col2Left").equalTo(dfo2.get.col("Col2Right"))),
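An equivalent sketch with the shorthand operators, which is less error-prone than chaining equalTo/and (the Option .get wrappers dropped for clarity):

    val joined = df1.join(df2,
      df1("Col1Left") === df2("Col1Right") &&
      df1("Col2Left") === df2("Col2Right"),
      "left_outer")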

JDBC Create Table

2016-05-27 Thread Andrés Ivaldi
Hello, yesterday I updated Spark 1.6.0 to 1.6.1 and my tests started to fail because it is no longer possible to create new tables in SQLServer. I'm using SaveMode.Overwrite, as in version 1.6.0. Any ideas? Regards -- Ing. Ivaldi Andres
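For reference, a sketch of the call in question (url and connection properties assumed):

    import org.apache.spark.sql.SaveMode

    df.write
      .mode(SaveMode.Overwrite) // drops and recreates the target table
      .jdbc(url, "dbo.MyTable", connectionProperties)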

Re: Insert into JDBC

2016-05-26 Thread Andrés Ivaldi
Done, version 1.6.1 has the fix; updated and it works fine. Thanks. On Thu, May 26, 2016 at 4:15 PM, Anthony May <anthony...@gmail.com> wrote: > It's on the 1.6 branch > On Thu, May 26, 2016 at 4:43 PM Andrés Ivaldi <iaiva...@gmail.com> wrote: >> I see, I'm usi

Re: Insert into JDBC

2016-05-26 Thread Andrés Ivaldi
inserting by column name: > > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L102 > > On Thu, 26 May 2016 at 16:02 Andrés Ivaldi <iaiva...@gmail.com> wrote: > >> Hello, >> I'r

Insert into JDBC

2016-05-26 Thread Andrés Ivaldi
Hello, I realize that when a dataframe executes an insert, it inserts by schema column order instead of by name, i.e. dataframe.write(SaveMode).jdbc(url, table, properties). Reading the profiler, the execution is: insert into TableName values(a,b,c..). What I need is: insert into TableNames

Re: Bit(N) on create Table with MSSQLServer

2016-05-04 Thread Andrés Ivaldi
the table? On Wed, May 4, 2016 at 6:44 AM, Andrés Ivaldi <iaiva...@gmail.com> wrote: > Yes, I can do that, it's what we are doing now, but I think the best approach would be to delegate the create-table action to Spark. > On Tue, May 3, 2016 at 8:17 PM, Mich Talebzadeh <mich

Re: Bit(N) on create Table with MSSQLServer

2016-05-04 Thread Andrés Ivaldi
On 3 May 2016 at 22:19, Andrés Ivaldi <iaiva...@gmail.com> wrote: > Ok, Spark's MSSQL dataType mapping is not right for me, i.e. string is

Bit(N) on create Table with MSSQLServer

2016-04-29 Thread Andrés Ivaldi
Hello, Spark is executing a create-table statement (via JDBC) against MSSQLServer with a column type mapping like ColName Bit(1) for boolean types. This create table cannot be executed on MSSQLServer. In the JdbcDialect class the mapping for the Boolean type is Bit(1), so the question is: is this a problem
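A hedged workaround sketch via Spark's dialect registry, emitting SQL Server's plain BIT instead of BIT(1) for booleans (whether this should instead be fixed in Spark itself is the question posed to the list):

    import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
    import org.apache.spark.sql.types._

    object MsSqlBitDialect extends JdbcDialect {
      override def canHandle(url: String): Boolean = url.startsWith("jdbc:sqlserver")
      override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
        case BooleanType => Some(JdbcType("BIT", java.sql.Types.BIT))
        case _           => None
      }
    }

    JdbcDialects.registerDialect(MsSqlBitDialect)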

Re: Fill Gaps between rows

2016-04-26 Thread Andrés Ivaldi
com> wrote: > Yes, you need a HiveContext for the window functions, but you don't need Hive for it to work. > On Tue, 26 Apr 2016, 14:15 Andrés Ivaldi, <iaiva...@gmail.com> wrote: >> Hello, does an out-of-the-box way exist to fill in gaps between rows with a

Fill Gaps between rows

2016-04-26 Thread Andrés Ivaldi
Hello, does an out-of-the-box way exist to fill in gaps between rows given a condition? As an example: I have a source table with data and a column with the day number, but a record only registers an event, and not all days necessarily have events, so the table does not necessarily contain all days. But I want a
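Nothing out of the box at the time; the usual hand-rolled sketch is to generate the full day range and left-join the events onto it, leaving null rows where days had no events (names hypothetical):

    val days   = sqlContext.range(1, 366).toDF("day") // all days of the year
    val filled = days.join(events, Seq("day"), "left_outer")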

DataFrame group and agg

2016-04-25 Thread Andrés Ivaldi
Hello, anyone know if this is on purpose or a bug? In the RelationalGroupedDataset class (https://github.com/apache/spark/blob/2f1d0320c97f064556fa1cf98d4e30d2ab2fe661/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala), def agg has many implementations; the next two of them: Line 136: def

Re: Spark SQL Transaction

2016-04-23 Thread Andrés Ivaldi
l do the following:

    if (supportsTransactions) {
      conn.setAutoCommit(false) // Everything in the same db transaction.
    }

Then at line 224, it will issue the commit:

    if (supportsTransactions) {
      conn.commit()
    }

HTH -Todd

Re: Spark SQL Transaction

2016-04-23 Thread Andrés Ivaldi
egin Work’ at the start… your statements should be atomic and there will be no ‘redo’ or ‘commit’ or ‘rollback’. I don’t see anything in Spark’s documentation about transactions, so the statements should be atomic. (I’m not a guru here so I could be

Tracing Spark DataFrame Execution

2016-04-21 Thread Andrés Ivaldi
Hello, is it possible to trace DataFrame execution? I'd like to report progress on a DataFrame's execution. I looked at SparkListeners, but nested dataframes produce several jobs, and I don't know how to relate those jobs; also, I'm reusing the SparkContext. Regards. -- Ing. Ivaldi Andres
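One hedged way to relate the several jobs one DataFrame action spawns: tag them with a job group before the action, then read the group id back inside a listener:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

    sc.addSparkListener(new SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
        val group = jobStart.properties.getProperty("spark.jobGroup.id")
        println(s"job ${jobStart.jobId} belongs to group $group")
      }
    })

    sc.setJobGroup("pipeline-42", "one logical DataFrame execution") // ids hypothetical
    df.count() // every job this action triggers carries the group id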

Re: Spark SQL Transaction

2016-04-20 Thread Andrés Ivaldi

Re: Spark SQL Transaction

2016-04-20 Thread Andrés Ivaldi
can see that record has gone as it rolled back! > Dr Mich Talebzadeh

Re: Spark SQL Transaction

2016-04-20 Thread Andrés Ivaldi

Re: Spark SQL Transaction

2016-04-19 Thread Andrés Ivaldi
On 19 April 2016 at 21:18, Andrés Ivaldi <iaiva...@gmail.com> wrote: > Hello, is it possible to execute a SQL write without a transaction? We don't need transactions to save our data and this adds overhead to the SQLServer.

Spark SQL Transaction

2016-04-19 Thread Andrés Ivaldi
Hello, is it possible to execute a SQL write without a transaction? We don't need transactions to save our data, and they add overhead on the SQLServer side. Regards. -- Ing. Ivaldi Andres
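The thread (see the replies above) points at the auto-commit handling in JdbcUtils; for what it's worth, later Spark releases expose this as a JDBC writer option, sketched here under that assumption:

    // isolationLevel NONE disables transactions for the write,
    // if the driver supports it (option available in later Spark versions)
    df.write
      .mode(SaveMode.Append)
      .option("isolationLevel", "NONE")
      .jdbc(url, "TableName", connectionProperties)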

Microsoft SQL dialect issues

2016-03-15 Thread Andrés Ivaldi
Hello, I'm trying to use MSSQL, storing data on MSSQL, but I'm having dialect problems. I found this: https://mail-archives.apache.org/mod_mbox/spark-issues/201510.mbox/%3cjira.12901078.1443461051000.34556.1444123886...@atlassian.jira%3E That is what is happening to me. Is it possible to define the

Re: Can we use spark inside a web service?

2016-03-15 Thread Andrés Ivaldi
your objects in Java, POJOs, annotate which ones you want indexed, upload your jars, then you can execute queries. It's a different use case than typical OLAP. There is some Spark integration, but then you would have the same bottlenecks going through Spark.

Re: Can we use spark inside a web service?

2016-03-11 Thread Andrés Ivaldi
Nice discussion. I have a question about web services with Spark: what could be the problem with using akka-http as the web service (like Play does), with one SparkContext created, and the queries over akka-http using only that SparkContext instance? Also, about analytics, we are working on
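A hedged sketch of that architecture: one long-lived session shared by every request behind an akka-http route (Spark 2.x style, endpoint name mirroring the "execute" method from the Spark Core - Embed thread above):

    import akka.actor.ActorSystem
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.server.Directives._
    import akka.stream.ActorMaterializer
    import org.apache.spark.sql.SparkSession

    implicit val system       = ActorSystem("spark-ws")
    implicit val materializer = ActorMaterializer()

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // every request runs against the single shared session
    val route = path("execute") {
      post {
        entity(as[String]) { sql =>
          complete(spark.sql(sql).toJSON.collect().mkString("[", ",", "]"))
        }
      }
    }

    Http().bindAndHandle(route, "localhost", 8080)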

Multiple Spark tasks with Akka FSM

2016-03-09 Thread Andrés Ivaldi
Hello, I'd like to know whether this architecture is correct or not. We are studying Spark as our ETL engine; we have a UI designer for the graph, which gives us a model that we want to translate into the corresponding Spark executions. That brings us to Akka FSM, using the same sparkContext for all actors,

Re: Multiple Spark tasks with Akka FSM

2016-03-09 Thread Andrés Ivaldi
My mistake, it's not Akka FSM, it's Akka Flow Graphs. On Wed, Mar 9, 2016 at 1:46 PM, Andrés Ivaldi <iaiva...@gmail.com> wrote: > Hello, > I'd like to know whether this architecture is correct or not. We are studying Spark as our ETL engine; we have a UI designer for the g

GoogleAnalytics GAData

2016-01-29 Thread Andrés Ivaldi
Hello, I'm using the Google API to retrieve Google Analytics JSON. I'd like to use Spark to load the JSON, but toString truncates the value. I could save it to disk and then retrieve it, but I'm losing performance. Is there any other way? Regards -- Ing. Ivaldi Andres
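One hedged way to skip the disk round trip: hand the JSON string to Spark as a one-element RDD (gaData stands for the hypothetical Google Analytics response object):

    val jsonString = gaData.toPrettyString // hypothetical accessor on the GA response
    val rdd        = sc.parallelize(Seq(jsonString))
    val df         = sqlContext.read.json(rdd)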

Re: JSON to SQL

2016-01-28 Thread Andrés Ivaldi
;word"){words: String => words.split(" ")} > > > > The first argument to the explode method is the name of the input column > and the second argument is the name of the output column. > > > > Mohammed > > Author: Big Data Analytics with Spark &

Re: spark-xml data source (com.databricks.spark.xml) not working with spark 1.6

2016-01-28 Thread Andrés Ivaldi
Hi, did you get it to work? Tomorrow I'll be using the XML parser as well, on Windows 7; I'll let you know the results. Regards. On Thu, Jan 28, 2016 at 12:27 PM, Deenar Toraskar wrote: > Hi > Anyone tried using spark-xml with spark 1.6? I cannot even get the sample

Problems when applying scheme to RDD

2016-01-28 Thread Andrés Ivaldi
Hello, I'm having an exception when trying to apply a new schema to an RDD. I'm reading a JSON with Databricks spark-csv v1.3.0; after applying some transformations I have an RDD with String-typed columns. Then I try to apply a schema where one of the fields is Integer, and this exception is raised
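A sketch of the usual remedy, assuming the mismatch described: cast the string column to the target type instead of applying an Integer schema over string data (column name hypothetical):

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.IntegerType

    val fixed = df.withColumn("amount", col("amount").cast(IntegerType))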

Re: JSON to SQL

2016-01-27 Thread Andrés Ivaldi
/Hive/LanguageManual+LateralView, as well as the "a.b[0].c" format of expression. > From: Andrés Ivaldi [mailto:iaiva...@gmail.com] > Sent: Thursday, January 28, 2016 3:39 AM > To: Sahil Sareen > Cc: Al Pivonka; user > Subject: Re: JSON to S

Re: JSON to SQL

2016-01-27 Thread Andrés Ivaldi
he persist the Domain Objects? > On Wed, Jan 27, 2016 at 9:45 AM, Andrés Ivaldi <iaiva...@gmail.com> wrote: >> Sure. The job is like an ETL, but without an interface, so I decide the rules of how the JSON will be saved into a SQL table. >> I

Re: JSON to SQL

2016-01-27 Thread Andrés Ivaldi
, Sahil Sareen <sareen...@gmail.com> wrote: > Isn't this just about defining a case class and using > parse(json).extract[CaseClassName] using Jackson? > > -Sahil > > On Wed, Jan 27, 2016 at 11:08 PM, Andrés Ivaldi <iaiva...@gmail.com> > wrote: > >> We do

Re: JSON to SQL

2016-01-27 Thread Andrés Ivaldi
n of list or nested objects and create relations in other tables. On Wed, Jan 27, 2016 at 11:25 AM, Al Pivonka <alpivo...@gmail.com> wrote: > More detail is needed. > Can you provide some context to the use-case ? > > On Wed, Jan 27, 2016 at 8:33 AM, Andrés Ivaldi <ia

JSON to SQL

2016-01-27 Thread Andrés Ivaldi
Hello, I'm trying to save a JSON file into a SQL table. If I try to do this directly, an IllegalArgumentException is raised; I suppose this is because the JSON has a hierarchical structure, is that correct? If that is the problem, how can I flatten the JSON structure? The JSON structure to be
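A hedged sketch of one flattening approach before the JDBC write: pull nested fields up with dotted paths and turn arrays into rows with explode (all names hypothetical):

    import org.apache.spark.sql.functions.{col, explode}

    val df = sqlContext.read.json("source.json")
    val flat = df.select(
      col("id"),
      col("address.city").as("address_city"), // nested object -> column
      explode(col("items")).as("item"))       // array -> one row per element
    flat.write.jdbc(url, "FlatTable", connectionProperties)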

Re: Low Latency SQL query

2015-12-01 Thread Andrés Ivaldi
with a grouping on the columns takes about 1s. Regards. On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <jornfra...@gmail.com> wrote: > can you elaborate more on the use case? > On 01 Dec 2015, at 20:51, Andrés Ivaldi <iaiva...@gmail.com> wrote: > Hi,

Re: Low Latency SQL query

2015-12-01 Thread Andrés Ivaldi
is not designed for interactive queries. > Currently hive is going into the direction of interactive queries. > Alternatives are Hbase on Phoenix or Impala. > > On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaiva...@gmail.com> wrote: > > Yes, > The use case would be, > Have spa

Re: Low Latency SQL query

2015-12-01 Thread Andrés Ivaldi
g into the direction of interactive queries. >> Alternatives are Hbase on Phoenix or Impala. >> >> On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaiva...@gmail.com> wrote: >> >> Yes, >> The use case would be, >> Have spark in a service (I didnt i

Re: Low Latency SQL query

2015-12-01 Thread Andrés Ivaldi
executed in a performant fashion against a conventional (RDBMS?) database, why are you trying to use Spark? How you answer that question will be the key to deciding among the engineering design tradeoffs to effectively use Spark or some other solution. > On Tue, Dec 1, 2015 at 4:

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-10 Thread Andrés Ivaldi
Hi, we have been evaluating Apache Kylin; how flexible is it? I mean, we need to create the cube structure dynamically and populate it from different sources. Processing time is not too important; what matters is the response time on queries. Thanks. On Mon, Nov 9, 2015 at 11:01 PM,

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-10 Thread Andrés Ivaldi
almost the same usage as your engine, e.g. using mysql to store initial aggregated data. Can you share more about your kind of Cube queries? We are very interested in that arch too : ) > Best, > Sun. > fightf...@163.com

Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread Andrés Ivaldi
Hi, I'm also considering something similar. Plain Spark is too slow for my case; a possible solution is to use Spark as a multiple-source connector and basic transformation layer, then persist the information (currently into an RDBMS), and after that, with our engine, we build a kind of Cube queries, and the

Spark Analytics

2015-11-05 Thread Andrés Ivaldi
Hello, I'm a newbie in the Spark world. With my team, we are analyzing Spark as an integration framework between different sources; so far so good, but it becomes slow when aggregations and calculations are applied to the RDD. I'm using Spark standalone, under Windows. I'm running this example: -