Re: Column explode a map

2016-03-24 Thread Michał Zieliński
Thanks Sujit, Michael,

The list of columns is data-driven (on the order of hundreds), but your 2nd
example looks like exactly what I want.

Appreciate the help!
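
For a data-driven key list, one possible approach is to collect the distinct map keys first and then pull each key out with `getItem`. This is a minimal sketch, not necessarily the example referred to above; it assumes a spark-shell session where `sqlContext` is already in scope, and the names `keys` and `wide` are made up for illustration.

```scala
import org.apache.spark.sql.functions.{col, explode}

// Same sample data as in the original question (assumes `sqlContext` from spark-shell).
val events = sqlContext.createDataFrame(
  Seq(
    ("a", Map("a" -> 1, "b" -> 1)),
    ("b", Map("b" -> 1, "c" -> 1)),
    ("c", Map("a" -> 1, "c" -> 1))
  )
).toDF("id", "map")

// One extra pass over the data: explode the map into (key, value) rows,
// keep the distinct keys, and bring them back to the driver.
val keys: Seq[String] = events
  .select(explode(col("map")))
  .select("key")
  .distinct()
  .collect()
  .map(_.getString(0))
  .sorted
  .toSeq

// Build one output column per key; a key missing from a row's map yields null.
val wide = events.select(col("id") +: keys.map(k => col("map").getItem(k).as(k)): _*)
wide.show()
```

The trade-off is one additional job to discover the keys, plus holding the key list on the driver, which should be fine for hundreds or around a thousand keys.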

On 24 March 2016 at 20:20, Michael Armbrust wrote:

> If you know the map keys ahead of time then you can just extract them
> directly.
>
> Here are a few examples
> 
> .
>
> On Thu, Mar 24, 2016 at 12:01 PM, Michał Zieliński <
> zielinski.mich...@gmail.com> wrote:
>
>> Hi,
>>
>> Imagine you have a structure like this:
>>
>> val events = sqlContext.createDataFrame(
>>   Seq(
>>     ("a", Map("a" -> 1, "b" -> 1)),
>>     ("b", Map("b" -> 1, "c" -> 1)),
>>     ("c", Map("a" -> 1, "c" -> 1))
>>   )
>> ).toDF("id", "map")
>>
>> What I want is to have the map values as separate columns.
>> Basically, I want to achieve this:
>>
>> +---+----+----+----+
>> | id|   a|   b|   c|
>> +---+----+----+----+
>> |  a|   1|   1|null|
>> |  b|null|   1|   1|
>> |  c|   1|null|   1|
>> +---+----+----+----+
>>
>> I managed to create it with an explode-pivot combo, but for a large
>> dataset and a list of around 1000 map keys I imagine this will
>> be prohibitively expensive. I reckon there must be a much easier way to
>> achieve that than:
>>
>> val exploded = events
>>   .select(col("id"), explode(col("map")))
>>   .groupBy("id").pivot("key").sum("value")
>>
>> Any help would be appreciated. :)
>>
>
>


Re: Column explode a map

2016-03-24 Thread Michael Armbrust
If you know the map keys ahead of time then you can just extract them
directly.

Here are a few examples

.
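
A minimal sketch of what direct extraction could look like when the keys are known up front. It is an illustration rather than a copy of the linked examples; it assumes a spark-shell session where `sqlContext` is already in scope, and `knownKeys` and `extracted` are hypothetical names.

```scala
import org.apache.spark.sql.functions.col

// Same sample data as in the original question (assumes `sqlContext` from spark-shell).
val events = sqlContext.createDataFrame(
  Seq(
    ("a", Map("a" -> 1, "b" -> 1)),
    ("b", Map("b" -> 1, "c" -> 1)),
    ("c", Map("a" -> 1, "c" -> 1))
  )
).toDF("id", "map")

// Keys known ahead of time (hard-coded here for illustration).
val knownKeys = Seq("a", "b", "c")

// getItem looks up a key in the map column; rows without that key get null.
val extracted = events.select(
  col("id") +: knownKeys.map(k => col("map").getItem(k).as(k)): _*
)
extracted.show()
```

Because this is a plain projection, it avoids the shuffle that the explode, groupBy, and pivot version needs.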

On Thu, Mar 24, 2016 at 12:01 PM, Michał Zieliński <
zielinski.mich...@gmail.com> wrote:

> Hi,
>
> Imagine you have a structure like this:
>
> val events = sqlContext.createDataFrame(
>   Seq(
>     ("a", Map("a" -> 1, "b" -> 1)),
>     ("b", Map("b" -> 1, "c" -> 1)),
>     ("c", Map("a" -> 1, "c" -> 1))
>   )
> ).toDF("id", "map")
>
> What I want is to have the map values as separate columns.
> Basically, I want to achieve this:
>
> +---+----+----+----+
> | id|   a|   b|   c|
> +---+----+----+----+
> |  a|   1|   1|null|
> |  b|null|   1|   1|
> |  c|   1|null|   1|
> +---+----+----+----+
>
> I managed to create it with an explode-pivot combo, but for a large dataset
> and a list of around 1000 map keys I imagine this will
> be prohibitively expensive. I reckon there must be a much easier way to
> achieve that than:
>
> val exploded = events
>   .select(col("id"), explode(col("map")))
>   .groupBy("id").pivot("key").sum("value")
>
> Any help would be appreciated. :)
>