Re: ArrayBuffer within a DataFrame

2015-04-03 Thread Dean Wampler
A hack workaround is to use flatMap:

rdd.flatMap{ case (date, array) => for (x <- array) yield (date, x) }

For those of you who don't know Scala, the for comprehension iterates
through the ArrayBuffer, named array, and yields new tuples with the date
and each element. The case expression to the left of the => pattern matches
on the input tuples.
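
As a rough, self-contained sketch of the same idea, the snippet below runs the
flatMap on a plain Scala Seq instead of an RDD so it can be tried without a
Spark cluster. The object name FlatMapSketch, the rows value, and the sample
data are assumptions made for illustration; they are not from the original
question. The flatMap call on an RDD of (String, ArrayBuffer[String]) pairs
should read the same way.

```scala
import scala.collection.mutable.ArrayBuffer

object FlatMapSketch {
  // Hypothetical sample data shaped like the row in the question:
  // [2015-04, ArrayBuffer(A, B, C, D)]
  val rows: Seq[(String, ArrayBuffer[String])] =
    Seq(("2015-04", ArrayBuffer("A", "B", "C", "D")))

  // The case pattern destructures each (date, array) tuple; the for
  // comprehension yields one (date, element) pair per array element,
  // and flatMap concatenates the per-row results into one flat Seq.
  val flattened: Seq[(String, String)] =
    rows.flatMap { case (date, array) => for (x <- array) yield (date, x) }

  def main(args: Array[String]): Unit =
    flattened.foreach { case (date, x) => println(s"$date, $x") }
}
```

Running main prints one "date, element" line per array element, which matches
the output format asked for in the original question.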

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Thu, Apr 2, 2015 at 10:45 PM, Denny Lee denny.g@gmail.com wrote:

 Thanks Michael - that was it!  I was drawing a blank on this one for some
 reason - much appreciated!


 On Thu, Apr 2, 2015 at 8:27 PM Michael Armbrust mich...@databricks.com
 wrote:

 A lateral view explode using HiveQL.  I'm hoping to add explode
 shorthand directly to the df API in 1.4.

 On Thu, Apr 2, 2015 at 7:10 PM, Denny Lee denny.g@gmail.com wrote:

 Quick question - the output of a dataframe is in the format of:

 [2015-04, ArrayBuffer(A, B, C, D)]

 and I'd like to return it as:

 2015-04, A
 2015-04, B
 2015-04, C
 2015-04, D

 What's the best way to do this?

 Thanks in advance!






Re: ArrayBuffer within a DataFrame

2015-04-03 Thread Denny Lee
 Sweet - I'll have to play with this then! :)
On Fri, Apr 3, 2015 at 19:43 Reynold Xin r...@databricks.com wrote:

 There is already an explode function on DataFrame btw


 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L712

 I think something like this would work. You might need to play with the
 type.

 df.explode(arrayBufferColumn) { x => x }










Re: ArrayBuffer within a DataFrame

2015-04-03 Thread Reynold Xin
There is already an explode function on DataFrame btw

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L712

I think something like this would work. You might need to play with the
type.

df.explode(arrayBufferColumn) { x => x }



On Fri, Apr 3, 2015 at 6:43 AM, Denny Lee denny.g@gmail.com wrote:

 Thanks Dean - fun hack :)







Re: ArrayBuffer within a DataFrame

2015-04-03 Thread Denny Lee
Thanks Dean - fun hack :)

On Fri, Apr 3, 2015 at 6:11 AM Dean Wampler deanwamp...@gmail.com wrote:

 A hack workaround is to use flatMap:

 rdd.flatMap{ case (date, array) => for (x <- array) yield (date, x) }

 For those of you who don't know Scala, the for comprehension iterates
 through the ArrayBuffer, named array, and yields new tuples with the date
 and each element. The case expression to the left of the => pattern matches
 on the input tuples.

 Dean Wampler, Ph.D.
 Author: Programming Scala, 2nd Edition
 http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
 Typesafe http://typesafe.com
 @deanwampler http://twitter.com/deanwampler
 http://polyglotprogramming.com







RE: ArrayBuffer within a DataFrame

2015-04-02 Thread Mohammed Guller
Hint:
DF.rdd.map{}

Mohammed

From: Denny Lee [mailto:denny.g@gmail.com]
Sent: Thursday, April 2, 2015 7:10 PM
To: user@spark.apache.org
Subject: ArrayBuffer within a DataFrame

Quick question - the output of a dataframe is in the format of:

[2015-04, ArrayBuffer(A, B, C, D)]

and I'd like to return it as:

2015-04, A
2015-04, B
2015-04, C
2015-04, D

What's the best way to do this?

Thanks in advance!




Re: ArrayBuffer within a DataFrame

2015-04-02 Thread Denny Lee
Thanks Michael - that was it!  I was drawing a blank on this one for some
reason - much appreciated!


On Thu, Apr 2, 2015 at 8:27 PM Michael Armbrust mich...@databricks.com
wrote:

 A lateral view explode using HiveQL.  I'm hoping to add explode shorthand
 directly to the df API in 1.4.
