I was a little sloppy when I created the sample output. It's missing a few pairs.
Assume that for a given row I have [a, b, c]; I want to create something like the cartesian join.

From: Andrew Davidson <a...@santacruzintegration.com>
Date: Friday, March 30, 2018 at 5:54 PM
To: "user @spark" <user@spark.apache.org>
Subject: how to create all possible combinations from an array? how to join and explode row array?

> I have a dataframe and execute df.groupBy("xyzy").agg(collect_list("abc"))
>
> This produces a column of type array. Now for each row I want to create
> multiple pairs/tuples from the array so that I can create a contingency table.
> Any idea how I can transform my data so that I can call crosstab()? The join
> transformation operates on the entire dataframe. I need something at the row
> array level.
>
> Below is some sample Python that describes what I would like my results to be.
>
> Kind regards
>
> Andy
>
> import pandas as pd
>
> c1 = ["john", "bill", "sam"]
> c2 = [['red', 'blue', 'red'], ['blue', 'red'], ['green']]
> p = pd.DataFrame({"a":c1, "b":c2})
>
> df = sqlContext.createDataFrame(p)
> df.printSchema()
> df.show()
>
> root
>  |-- a: string (nullable = true)
>  |-- b: array (nullable = true)
>  |    |-- element: string (containsNull = true)
>
> +----+----------------+
> |   a|               b|
> +----+----------------+
> |john|[red, blue, red]|
> |bill|     [blue, red]|
> | sam|         [green]|
> +----+----------------+
>
> The output I am trying to create is shown below. I could live with a crossJoin
> (cartesian join) and add my own filtering if it makes the problem easier.
>
> +----+----------------+
> |  x1|              x2|
> +----+----------------+
> | red|            blue|
> | red|             red|
> |blue|             red|
> +----+----------------+
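
One possible way to get the per-row cartesian pairs, sketched here rather than tested on a cluster: explode the array column twice, so that every element is paired with every element of the same row. The helper names `row_pairs` and `explode_pairs` are made up for illustration, and the default column name "b" is taken from the sample dataframe in the quoted message. `row_pairs` only shows in plain Python which pairs to expect for a single row; `explode_pairs` is the part that would run on the dataframe.

```python
from itertools import product


def row_pairs(values):
    """Return every ordered pair (x1, x2) drawn from one row's array."""
    return list(product(values, repeat=2))


def explode_pairs(df, array_col="b"):
    """Turn each row's array into one output row per (x1, x2) pair.

    Assumes `df` is a PySpark DataFrame and `array_col` names an array column.
    """
    # Imported here so row_pairs() stays usable without a Spark installation.
    from pyspark.sql import functions as F

    # Spark allows only one explode() per select, so chain two steps.
    # The second explode repeats the full array for every x1 value,
    # which yields the cartesian product within each original row.
    return (
        df.withColumn("x1", F.explode(array_col))
          .withColumn("x2", F.explode(array_col))
          .select("x1", "x2")
    )
```

With the sample dataframe, `explode_pairs(df).crosstab("x1", "x2").show()` should then give the contingency table. This version keeps pairs of an element with itself; if those are unwanted, `F.posexplode` exposes each element's position so the pairs can be filtered on it.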