What's wrong with just using a UDF that runs a for loop in Scala? You can change the for-loop logic to produce whichever combinations you want.
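Since the original question was written in Python, here is a minimal plain-Python sketch of the per-row pairing logic and two ways it could be varied. The function names (all_pairs, ordered_pairs, unordered_pairs) are hypothetical, chosen only for illustration, and nothing here touches Spark; wrapping one of them in a PySpark UDF is left as an assumption.

```python
from itertools import combinations, permutations, product


def all_pairs(colors):
    """Full cartesian product of the list with itself.

    Mirrors `for (a <- x; b <- x) yield (a, b)`: every element is paired
    with every element, including itself.
    """
    return list(product(colors, repeat=2))


def ordered_pairs(colors):
    """Pairs taken from two different positions, in both orders."""
    return list(permutations(colors, 2))


def unordered_pairs(colors):
    """Each pair of positions once, earlier element first."""
    return list(combinations(colors, 2))


row = ["red", "blue", "red"]
print(all_pairs(row))        # 9 pairs
print(ordered_pairs(row))    # 6 pairs
print(unordered_pairs(row))  # 3 pairs
```

Note that all three work by position, so a repeated value such as "red" still produces (red, red) pairs; filter those out afterwards if they are not wanted.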
scala> spark.version
res4: String = 2.2.1

scala> aggDS.printSchema
root
 |-- name: string (nullable = true)
 |-- colors: array (nullable = true)
 |    |-- element: string (containsNull = true)

scala> aggDS.show(false)
+----+----------------+
|name|colors          |
+----+----------------+
|john|[red, blue, red]|
|bill|[blue, red]     |
|sam |[gree]          |
+----+----------------+

scala> import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions.udf

scala> val loopUDF = udf { x: Seq[String] => for (a <- x; b <- x) yield (a, b) }
loopUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(StructType(StructField(_1,StringType,true), StructField(_2,StringType,true)),true),Some(List(ArrayType(StringType,true))))

scala> aggDS.withColumn("newCol", loopUDF($"colors")).show(false)
+----+----------------+---------------------------------------------------------------------------------------------------------+
|name|colors          |newCol                                                                                                   |
+----+----------------+---------------------------------------------------------------------------------------------------------+
|john|[red, blue, red]|[[red,red], [red,blue], [red,red], [blue,red], [blue,blue], [blue,red], [red,red], [red,blue], [red,red]]|
|bill|[blue, red]     |[[blue,blue], [blue,red], [red,blue], [red,red]]                                                         |
|sam |[gree]          |[[gree,gree]]                                                                                            |
+----+----------------+---------------------------------------------------------------------------------------------------------+

Yong

________________________________
From: Andy Davidson <a...@santacruzintegration.com>
Sent: Friday, March 30, 2018 8:58 PM
To: Andy Davidson; user
Subject: Re: how to create all possible combinations from an array? how to join and explode row array?

I was a little sloppy when I created the sample output.
It's missing a few pairs. Assume that for a given row I have [a, b, c]; I want to create something like the cartesian join.

From: Andrew Davidson <a...@santacruzintegration.com>
Date: Friday, March 30, 2018 at 5:54 PM
To: "user @spark" <user@spark.apache.org>
Subject: how to create all possible combinations from an array? how to join and explode row array?

I have a dataframe and execute df.groupBy("xyzy").agg(collect_list("abc")).

This produces a column of type array. Now, for each row, I want to create multiple pairs/tuples from the array so that I can create a contingency table. Any idea how I can transform my data so that I can call crosstab()? The join transformation operates on the entire dataframe; I need something at the row array level.

Below is some sample Python that describes what I would like my results to be.

Kind regards

Andy

import pandas as pd

c1 = ["john", "bill", "sam"]
c2 = [['red', 'blue', 'red'], ['blue', 'red'], ['green']]
p = pd.DataFrame({"a": c1, "b": c2})
df = sqlContext.createDataFrame(p)
df.printSchema()
df.show()

root
 |-- a: string (nullable = true)
 |-- b: array (nullable = true)
 |    |-- element: string (containsNull = true)

+----+----------------+
|   a|               b|
+----+----------------+
|john|[red, blue, red]|
|bill|     [blue, red]|
| sam|         [green]|
+----+----------------+

The output I am trying to create is shown below. I could live with a crossJoin (cartesian join) and add my own filtering if it makes the problem easier.

+----+----------------+
|  x1|              x2|
+----+----------------+
| red|            blue|
| red|             red|
|blue|             red|
+----+----------------+
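To make the end goal concrete: a contingency table over the pairs is just a count of how often each (x1, x2) combination occurs. The following is a rough plain-Python sketch of that tally on the sample data, assuming the full cartesian pairing within each row. The helper name pair_counts is hypothetical and no Spark is involved; it only illustrates the numbers a crosstab over the exploded pairs would be expected to contain.

```python
from collections import Counter
from itertools import product


def pair_counts(rows):
    """Count every (x1, x2) pair from the cartesian product of each row with itself."""
    counts = Counter()
    for colors in rows:
        counts.update(product(colors, repeat=2))
    return counts


# Sample data from the question (the "b" column).
rows = [["red", "blue", "red"], ["blue", "red"], ["green"]]
counts = pair_counts(rows)

# Print the counts as a small x1-by-x2 grid.
labels = sorted({value for pair in counts for value in pair})
print("x1\\x2", *labels)
for x1 in labels:
    print(x1, *(counts[(x1, x2)] for x2 in labels))
```

Swapping product for a different pairing rule (for example, skipping self-pairs) changes only the line inside the loop; the counting step stays the same.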