Re: Load multiple CSV from different paths

2017-07-05 Thread Didac Gil
1.6.x I think this may work with spark-csv > <https://github.com/databricks/spark-csv> : > > spark.read.format("com.databricks.spark.csv").option("header", "false") > .schema(custom_schema) > .option('delimiter', '\t') > .op

Load multiple CSV from different paths

2017-07-05 Thread Didac Gil
elimiter', '\t') .option('mode', 'DROPMALFORMED') .load(paths.split(',')) However, even though it mentions that this approach would work in Spark 2.x, I can't find an implementation of load that accepts an Array[String] as an input parameter. Thanks in advance for your help. Did
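A minimal sketch of the multi-path load, assuming PySpark 2.x, where DataFrameReader.load accepts either a single path string or a list of path strings. The helper names split_paths and load_csvs are hypothetical, `spark` is assumed to be an existing SparkSession, and the built-in "csv" format is used in place of the external spark-csv package. In Scala the equivalent signature is load(paths: String*), so an Array[String] would have to be expanded as varargs, e.g. load(paths.split(","): _*).

```python
from typing import List


def split_paths(paths: str) -> List[str]:
    """Turn a comma-separated string into a clean list of paths."""
    return [p.strip() for p in paths.split(",") if p.strip()]


def load_csvs(spark, paths: str, custom_schema):
    """Load every path in `paths` into a single DataFrame.

    Assumptions: `spark` is a SparkSession (Spark 2.x) and
    `custom_schema` is a StructType describing the columns.
    """
    return (
        spark.read.format("csv")           # built-in CSV source in Spark 2.x
        .option("header", "false")
        .option("delimiter", "\t")
        .option("mode", "DROPMALFORMED")   # silently drop rows that do not parse
        .schema(custom_schema)
        .load(split_paths(paths))          # a list of paths is accepted here
    )
```

Called as load_csvs(spark, "/data/a.csv,/data/b.csv", custom_schema), this would read both files with the same schema; the paths shown are placeholders.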

Re: Analysis Exception after join

2017-07-03 Thread Didac Gil
Time_inc#200, Health#1014, Inf_period#1039, > infectedFamily#1355L, infectedWorker#1385L] > > +- Aggregate [S_ID#1903L], [S_ID#1903L, count(1) AS infectedStreet#1415L] > > Does someone have a clue about it? > Thanks,

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread Didac Gil
t to console? When I run my standalone test Kafka consumer > jar I can see that it is receiving messages. so I am not sure what is going > on with above code? any ideas? > > Thanks!

Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread Didac Gil
; > I want to get the result as follow > user_id1 feature1 feature2 feature3 feature4 feature5...feature100 > > Is there a more efficient way except join? > > Thanks!

Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread Didac Gil
> user_id1 feature1 feature2 feature3 feature4 feature5...feature100 > > Is there a more efficient way except join? > > Thanks!

Re: Dataframes na fill with empty list

2017-04-11 Thread Didac Gil
playing around with > coalesce in a sql expression, but I'm not having any luck here either. > > Obviously, I can do a null check on the fields downstream, however it is not > in the spirit of Scala to pass around nulls, so I wanted to see if I was > missing another approach first.

Re: kafka and spark integration

2017-03-22 Thread Didac Gil
Spark can be both a consumer and a producer from the Kafka point of view. You can create a Kafka client in Spark that subscribes to a topic and reads its messages, and you can process data in Spark and create a producer that sends that data to a topic. So, Spark lies next to Kafka and you can use

Re: Suprised!!!!!Spark-shell showing inconsistent results

2017-02-02 Thread Didac Gil
Is 1570 the value of Col1? If so, you have ordered by that column and selected only the first item. Both results seem to have the same Col1 value, so either of them would be a valid answer to return. Right? > On 2 Feb 2017, at 11:03, Alex wrote: > > Hi As shown
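The tie situation described in this reply can be illustrated with a small plain-Python sketch (not Spark code). The rows and the Col2 column are hypothetical; only the value 1570 and the name Col1 come from the thread. When two rows share the ordering key, which one comes "first" depends on the order in which the rows arrive, and adding a second ordering key removes the ambiguity.

```python
# Two hypothetical rows that tie on Col1 but differ in Col2.
rows_a = [
    {"Col1": 1570, "Col2": "x"},
    {"Col1": 1570, "Col2": "y"},
]
# Same rows, arriving in a different order (e.g. a different partition layout).
rows_b = list(reversed(rows_a))


def first_by(rows, keys):
    """Return the first row when ordering by the given column names."""
    return min(rows, key=lambda r: tuple(r[k] for k in keys))


# Ordering by Col1 alone: the winner depends on arrival order.
only_col1_a = first_by(rows_a, ["Col1"])
only_col1_b = first_by(rows_b, ["Col1"])

# Ordering by Col1 and then Col2: the winner is the same either way.
tie_broken_a = first_by(rows_a, ["Col1", "Col2"])
tie_broken_b = first_by(rows_b, ["Col1", "Col2"])

print(only_col1_a["Col2"], only_col1_b["Col2"])    # differ
print(tie_broken_a["Col2"], tie_broken_b["Col2"])  # agree
```

In a DataFrame query the analogous fix would be to order by additional columns until the ordering is unique.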

Re: Dataframe fails to save to MySQL table in spark app, but succeeds in spark shell

2017-01-26 Thread Didac Gil
Are you sure that “age” is a numeric field? Even if it is numeric, you could pass the “44” in quotes: INSERT into your_table ("user","age","state") VALUES ('user3','44','CT') Are you sure there are no other fields declared NOT NULL for which you did not provide a value (besides user,
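Both checks suggested in this reply can be tried out in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the exact error text and coercion rules are SQLite's, not MySQL's. The table layout is an assumption, and the extra NOT NULL column `email` and its value are hypothetical; only your_table, user, age, state and the inserted values come from the message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    " user TEXT NOT NULL,"
    " age INTEGER NOT NULL,"
    " state TEXT NOT NULL,"
    " email TEXT NOT NULL)"  # hypothetical extra NOT NULL column
)


def try_insert(sql):
    """Run an INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Omitting a NOT NULL column that has no default makes the insert fail.
missing = try_insert(
    "INSERT INTO your_table (user, age, state) "
    "VALUES ('user3', '44', 'CT')"
)

# Supplying every NOT NULL column succeeds, even with the age in quotes.
complete = try_insert(
    "INSERT INTO your_table (user, age, state, email) "
    "VALUES ('user3', '44', 'CT', 'user3@example.com')"
)

# The quoted '44' is stored as an integer in the INTEGER column.
stored = conn.execute("SELECT age, typeof(age) FROM your_table").fetchone()
print(missing)
print(complete, stored)
```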

[no subject]

2016-11-28 Thread Didac Gil
Any suggestions for using something like OneHotEncoder and StringIndexer on an InputDStream? I could try to combine an Indexer based on a static parquet but I want to use the OneHotEncoder approach in Streaming data coming from a socket. Thanks! Dídac Gil de la Iglesia