You didn't specify which API, but in PySpark you could do:
import pyspark.sql.functions as F
df.groupBy('ID').agg(F.sort_array(F.collect_set('DETAILS')).alias('DETAILS')).show()
+---+------------+
| ID|     DETAILS|
+---+------------+
|  1|[A1, A2, A3]|
|  3|        [B2]|
|  2|        [B1]|
+---+------------+
How do you do it now?
You could use a withColumn("newDetails", )
jg
> On Aug 22, 2018, at 16:04, msbreuer wrote:
>
> A dataframe with following contents is given:
>
> ID PART DETAILS
>  1    1      A1
>  1    2      A2
>  1    3      A3
>  2    1      B1
>  3    1      C1
>
> Target format should be as following:
>
ID DETAILS
1 A1+A2+A3
2 B1
3 C1
Note, the order of A1-3 is important.
Currently I am using this alternative:
ID DETAIL_1 DETAIL_2