A hack workaround is to use flatMap:
rdd.flatMap { case (date, array) => for (x <- array) yield (date, x) }
For those of you who don't know Scala, the for comprehension iterates
through the ArrayBuffer, named array, and yields new tuples with the date
and each element. The case expression to the flatMap pattern-matches each
input tuple into date and array.
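As a self-contained sketch of the same pattern (plain Scala collections standing in for the RDD, with made-up sample data; nothing beyond the standard library is assumed):

```scala
import scala.collection.mutable.ArrayBuffer

object FlattenDemo {
  // Flatten (date, ArrayBuffer) pairs into one (date, element) tuple per
  // element, mirroring:
  //   rdd.flatMap { case (date, array) => for (x <- array) yield (date, x) }
  def flatten(pairs: Seq[(String, ArrayBuffer[String])]): Seq[(String, String)] =
    pairs.flatMap { case (date, array) => for (x <- array) yield (date, x) }
}
```

Called on Seq(("2015-04", ArrayBuffer("A", "B"))) this returns Seq(("2015-04", "A"), ("2015-04", "B")); the RDD version behaves the same way, element by element, across partitions.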
Sweet - I'll have to play with this then! :)
On Fri, Apr 3, 2015 at 19:43 Reynold Xin r...@databricks.com wrote:
There is already an explode function on DataFrame btw
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L712
I think something like this would work. You might need to play with the
type.
df.explode(arrayBufferColumn) { x => x }
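A hedged sketch of how that call might look with the 1.3-era two-column overload (the column names "arr" and "element" and the element type are assumptions; check the linked source for the exact signature):

```scala
// Sketch only: assumes a DataFrame `df` whose "arr" column holds an array,
// and the explode overload taking (inputColumn, outputColumn)(f).
// All names here are hypothetical.
val exploded = df.explode("arr", "element") { arr: Seq[String] => arr }
```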
Thanks Dean - fun hack :)
On Fri, Apr 3, 2015 at 6:11 AM Dean Wampler deanwamp...@gmail.com wrote:
A hack workaround is to use flatMap:
rdd.flatMap { case (date, array) => for (x <- array) yield (date, x) }
For those of you who don't know Scala, the for comprehension iterates
through the ArrayBuffer.
Hint:
DF.rdd.map{}
Mohammed
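Expanding the hint into a hedged sketch (column positions and runtime types are assumptions; in 1.3 an array column typically comes back as a Seq):

```scala
// Sketch: drop from the DataFrame to the underlying RDD[Row] and flatten
// there. The field layout (String date at position 0, Seq at position 1)
// is an assumption about the schema.
import org.apache.spark.sql.Row

val flattened = df.rdd.flatMap {
  case Row(date: String, elems: Seq[_]) => elems.map(x => (date, x))
}
```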
From: Denny Lee [mailto:denny.g@gmail.com]
Sent: Thursday, April 2, 2015 7:10 PM
To: user@spark.apache.org
Subject: ArrayBuffer within a DataFrame
Quick question - the output of a dataframe is in the format of:
[2015-04, ArrayBuffer(A, B, C, D)]
and I'd like to return it as:
2015-04, A
2015-04, B
2015-04, C
2015-04, D
What's the best way to do this?
Thanks in advance!
Thanks Michael - that was it! I was drawing a blank on this one for some
reason - much appreciated!
On Thu, Apr 2, 2015 at 8:27 PM Michael Armbrust mich...@databricks.com
wrote:
A lateral view explode using HiveQL. I'm hoping to add explode shorthand
directly to the df API in 1.4.
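A hedged sketch of the lateral view approach (the table and column names are invented, and it assumes a HiveContext with the table registered):

```scala
// Sketch: HiveQL LATERAL VIEW explode produces one output row per array
// element. "events", "date", and "arr" are hypothetical names.
val flattened = hiveContext.sql("""
  SELECT date, element
  FROM events
  LATERAL VIEW explode(arr) exploded AS element
""")
```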