Re: How to merge multiple rows

2018-08-22 Thread Patrick McCarthy
You didn't specify which API, but in pyspark you could do:

    import pyspark.sql.functions as F

    df.groupBy('ID').agg(F.sort_array(F.collect_set('DETAILS')).alias('DETAILS')).show()

    +---+------------+
    | ID|     DETAILS|
    +---+------------+
    |  1|[A1, A2, A3]|
    |  3|        [B2]|
    |  2|        [B1]|

Re: How to merge multiple rows

2018-08-22 Thread Jean Georges Perrin
How do you do it now?

You could use a withColumn("newDetails", )

jg

> On Aug 22, 2018, at 16:04, msbreuer wrote:
>
> A dataframe with the following contents is given:
>
> ID PART DETAILS
> 1  1    A1
> 1  2    A2
> 1  3    A3
> 2  1    B1
> 3  1    C1
>
> Target format should be as follows:
>

How to merge multiple rows

2018-08-22 Thread msbreuer
A dataframe with the following contents is given:

ID PART DETAILS
1  1    A1
1  2    A2
1  3    A3
2  1    B1
3  1    C1

Target format should be as follows:

ID DETAILS
1  A1+A2+A3
2  B1
3  C1

Note, the order of A1-3 is important.

Currently I am using this alternative:

ID DETAIL_1 DETAIL_2
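To pin down what is being asked, here is a plain-Python sketch of the transformation, independent of Spark. It assumes the sample reads as ID, PART, DETAILS columns (the column spacing was lost in the archive) and that PART defines the order within each ID; the function name merge_rows is made up for illustration.

```python
from collections import defaultdict


def merge_rows(rows):
    """Group (id, part, details) tuples by id and join details with '+'.

    Within each id the details are ordered by part, which is an assumption
    about what "the order of A1-3 is important" means.
    """
    grouped = defaultdict(list)
    for row_id, part, details in rows:
        grouped[row_id].append((part, details))
    return {
        # sorted() on (part, details) tuples orders by part first.
        row_id: "+".join(details for _, details in sorted(parts))
        for row_id, parts in sorted(grouped.items())
    }


if __name__ == "__main__":
    sample = [(1, 1, "A1"), (1, 2, "A2"), (1, 3, "A3"), (2, 1, "B1"), (3, 1, "C1")]
    for row_id, details in merge_rows(sample).items():
        print(row_id, details)
```

Any Spark solution should agree with this reference on small inputs, including when the rows arrive in a different order.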