I have a dataframe and execute df.groupBy("xyzy").agg(collect_list("abc")).
This produces a column of type array. Now, for each row, I want to create multiple pairs/tuples from the array so that I can build a contingency table. Any idea how I can transform my data so that I can call crosstab()? The join transformation operates on the entire dataframe; I need something that works at the level of each row's array.

Below is some sample Python, followed by a description of what I would like my results to be.

Kind regards

Andy

import pandas as pd

# sqlContext is the SQLContext provided by the PySpark shell
c1 = ["john", "bill", "sam"]
c2 = [['red', 'blue', 'red'], ['blue', 'red'], ['green']]
p = pd.DataFrame({"a": c1, "b": c2})

df = sqlContext.createDataFrame(p)
df.printSchema()
df.show()

root
 |-- a: string (nullable = true)
 |-- b: array (nullable = true)
 |    |-- element: string (containsNull = true)

+----+----------------+
|   a|               b|
+----+----------------+
|john|[red, blue, red]|
|bill|     [blue, red]|
| sam|         [green]|
+----+----------------+

The output I am trying to create is shown below. I could live with a crossJoin (cartesian join) and add my own filtering if that makes the problem easier.

+----+----+
|  x1|  x2|
+----+----+
| red|blue|
| red| red|
|blue| red|
+----+----+
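
To make the target concrete, here is a minimal pure-Python sketch of the per-row pairing I have in mind. It is not Spark code. The helper name row_pairs is made up for illustration, and I am assuming the pairs wanted are the 2-element combinations (by position) within each row's array, since that is what reproduces the three rows shown for john.

```python
from itertools import combinations


def row_pairs(values):
    """Return every 2-element combination (by position) of one row's array.

    Hypothetical helper: the name and the choice of combinations() are
    assumptions, not something Spark provides.
    """
    return list(combinations(values, 2))


# Same sample data as the dataframe, as plain (name, array) tuples.
rows = [
    ("john", ["red", "blue", "red"]),
    ("bill", ["blue", "red"]),
    ("sam", ["green"]),
]

# Flatten the per-row pairs into one list of (x1, x2) tuples.
pairs = [pair for _, values in rows for pair in row_pairs(values)]

for x1, x2 in pairs:
    print(x1, x2)
```

For john's array this gives (red, blue), (red, red), (blue, red). bill contributes one more (blue, red), and sam contributes nothing because a single-element array has no pairs. My guess is that a function like this could be wrapped in a UDF returning an array of structs and then flattened with explode(), but I have not verified that.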