Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread Edward Capriolo
Here is how I did something similar to what you are doing, though not exactly the same. I had two data files in different formats, and the different columns needed to become different features. I wanted to feed them into Spark's:

Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread ayan guha
You may consider writing all your data to a NoSQL datastore such as HBase, using the user ID as the key. There is a SQL solution using max over an inner CASE expression and finally a union of the results, but that may be expensive.
On Tue, 16 May 2017 at 12:13 am, Didac Gil wrote:
> Or maybe you

Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread Didac Gil
Or maybe you could also try collect_list from the SQL functions:
    val compacter = Data1.groupBy("UserID")
      .agg(org.apache.spark.sql.functions.collect_list("feature").as("ListOfFeatures"))
> On 15 May 2017, at 15:15, Jone Zhang wrote:
>
> For example
>
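A slightly fuller sketch of the collect_list idea applied to the original question: union the three inputs, group by user, and join each user's features into one string. It assumes Spark 2.x with spark-sql on the classpath, that every input has exactly the two columns `user_id` and `feature` (names taken from the question's sample), and a hypothetical wrapper object `MergeFeatures`. Note that collect_list does not guarantee the order of the collected values after a shuffle.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

// Hypothetical wrapper object; column names user_id/feature follow the question's sample.
object MergeFeatures {
  // Stacks the per-source (user_id, feature) tables and collapses them to one row per user.
  def merge(sources: Seq[DataFrame]): DataFrame =
    sources
      .reduce(_ union _) // positional union: every input must have the same columns in the same order
      .groupBy("user_id")
      .agg(concat_ws(" ", collect_list(col("feature"))).as("features"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // local mode for trying it out; drop this when submitting to a cluster
      .appName("merge-features")
      .getOrCreate()
    import spark.implicits._

    // Tiny stand-ins for Data1, Data2 and Data3.
    val data1 = Seq(("user_id1", "feature1"), ("user_id1", "feature2")).toDF("user_id", "feature")
    val data2 = Seq(("user_id1", "feature3")).toDF("user_id", "feature")
    val data3 = Seq(("user_id1", "feature4"), ("user_id1", "feature5")).toDF("user_id", "feature")

    // One row per user; the order of values inside "features" is not guaranteed.
    merge(Seq(data1, data2, data3)).show(truncate = false)
    spark.stop()
  }
}
```

Because collect_list and concat_ws are also HiveQL built-ins, the same result can be written in SQL as a GROUP BY user_id over a UNION ALL of the three tables.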

Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread Didac Gil
I guess that if your user_id field is the key, you could use the updateStateByKey function. I did not test it, but it could be something along these lines:
    def yourCombineFunction(input: Seq[String], accumulatedInput: Option[String]): Option[String] = {
      val state = accumulatedInput.getOrElse("")
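One way the combine function could be completed, as an untested sketch. updateStateByKey belongs to the Spark Streaming DStream API and expects an update function of shape `(Seq[V], Option[S]) => Option[S]`; here V is a single feature name and S is the space-separated list of features seen so far for that user. The names `FeatureState` and `combineFeatures` are hypothetical, and the function itself needs only the Scala standard library.

```scala
// Hypothetical helper; only the function shape is dictated by DStream.updateStateByKey.
object FeatureState {
  // newFeatures: features that arrived for this key in the current batch.
  // accumulated: the space-separated features stored for this key so far, if any.
  def combineFeatures(newFeatures: Seq[String], accumulated: Option[String]): Option[String] = {
    val state = accumulated.getOrElse("")
    if (newFeatures.isEmpty) accumulated // no new data in this batch: keep the state unchanged
    else Some((state +: newFeatures).filter(_.nonEmpty).mkString(" ")) // skip the empty initial state
  }

  def main(args: Array[String]): Unit = {
    // In a streaming job (checkpointing must be enabled) this would be wired up as:
    //   pairs.updateStateByKey(combineFeatures _)
    // where pairs is a DStream[(String, String)] of (user_id, feature) records.
    println(combineFeatures(Seq("feature1", "feature2"), None))
    println(combineFeatures(Seq("feature3"), Some("feature1 feature2")))
  }
}
```

This only fits if the data arrives as a stream; for three static files the groupBy/collect_list route avoids keeping per-key state between batches.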

How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread Jone Zhang
For example:
Data1 (has 1 billion records):
    user_id1 feature1
    user_id1 feature2
Data2 (has 1 billion records):
    user_id1 feature3
Data3 (has 1 billion records):
    user_id1 feature4
    user_id1 feature5
    ...
    user_id1 feature100
I want to get the result as follows:
    user_id1 feature1 feature2 feature3