I was a little sloppy when I created the sample output. It's missing a few
pairs.

Assume that for a given row I have [a, b, c]. I want to create something like
the cartesian join.
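
As a rough illustration of the per-row pairing, here is a minimal plain-Python sketch. It is not Spark code; the helper name row_pairs and the keep_self_pairs flag are hypothetical, introduced only for this example. It builds every ordered pair from one row's array and optionally filters out pairs that match an element with itself (by position, so a repeated value such as "red" can still pair with its other occurrence).

```python
from itertools import product


def row_pairs(values, keep_self_pairs=True):
    """Return ordered pairs (x1, x2) drawn from a single row's array.

    Hypothetical helper for illustration only. With keep_self_pairs=False,
    an element is never paired with itself (compared by position, not value).
    """
    pairs = []
    # enumerate() tags each element with its position so that duplicate
    # values at different positions remain distinguishable.
    for (i, x1), (j, x2) in product(enumerate(values), repeat=2):
        if keep_self_pairs or i != j:
            pairs.append((x1, x2))
    return pairs


if __name__ == "__main__":
    # Full cartesian product of the row with itself.
    print(row_pairs(["a", "b", "c"]))
    # Same row, with self-pairs filtered out.
    print(row_pairs(["a", "b", "c"], keep_self_pairs=False))
```

In Spark, the same shape could presumably be obtained by applying explode() to the array column twice in two chained select() calls, since each row is then expanded against its own array; that route is an assumption here and was not tested against the dataframe in this thread.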

From:  Andrew Davidson <a...@santacruzintegration.com>
Date:  Friday, March 30, 2018 at 5:54 PM
To:  "user @spark" <user@spark.apache.org>
Subject:  how to create all possible combinations from an array? how to join
and explode row array?

> I have a dataframe and execute df.groupBy("xyzy").agg(collect_list("abc")).
> 
> This produces a column of type array. Now, for each row, I want to create
> multiple pairs/tuples from the array so that I can create a contingency table.
> Any idea how I can transform my data so that I can call crosstab()? The join
> transformation operates on the entire dataframe. I need something at the row
> array level.
> 
> 
> Below is some sample Python that describes what I would like my results to be.
> 
> Kind regards
> 
> Andy
> 
> 
> import pandas as pd
> 
> c1 = ["john", "bill", "sam"]
> c2 = [['red', 'blue', 'red'], ['blue', 'red'], ['green']]
> p = pd.DataFrame({"a":c1, "b":c2})
> 
> df = sqlContext.createDataFrame(p)
> df.printSchema()
> df.show()
> 
> root
>  |-- a: string (nullable = true)
>  |-- b: array (nullable = true)
>  |    |-- element: string (containsNull = true)
> 
> +----+----------------+
> |   a|               b|
> +----+----------------+
> |john|[red, blue, red]|
> |bill|     [blue, red]|
> | sam|         [green]|
> +----+----------------+
> 
> 
> The output I am trying to create is shown below. I could live with a crossJoin
> (cartesian join) and add my own filtering if it makes the problem easier.
> 
> 
> +----+----+
> |  x1|  x2|
> +----+----+
> | red|blue|
> | red| red|
> |blue| red|
> +----+----+
> 
> 

