Re: Does RDD.cartesian involve shuffling?

2015-08-04 Thread Richard Marscher
Yes, it does. In fact, it's probably going to be one of the more expensive
shuffles you could trigger.

On Mon, Aug 3, 2015 at 12:56 PM, Meihua Wu rotationsymmetr...@gmail.com
wrote:

 Does RDD.cartesian involve shuffling?

 Thanks!

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




-- 
*Richard Marscher*
Software Engineer
Localytics
Localytics.com http://localytics.com/ | Our Blog
http://localytics.com/blog | Twitter http://twitter.com/localytics |
Facebook http://facebook.com/localytics | LinkedIn
http://www.linkedin.com/company/1148792?trk=tyah


Re: Does RDD.cartesian involve shuffling?

2015-08-04 Thread Meihua Wu
Thanks, Richard!

I basically have two RDDs, A and B, and I need to compute a value for
every pair (a, b) with a in A and b in B. My first thought was
cartesian, but it involves an expensive shuffle.

Are there any alternatives? How about converting B to an array and
broadcasting it to every node (assuming B is small enough to fit in memory)?






Re: Does RDD.cartesian involve shuffling?

2015-08-04 Thread Richard Marscher
That is the only alternative I'm aware of. If either A or B is small
enough to broadcast, then you'd at least be doing the cartesian products
locally, without also needing to transmit and shuffle A. The exception
would be if Spark somehow optimizes the cartesian product and only
transfers the smaller RDD across the network in the shuffle, but I have
no reason to believe that's true.

I'd try the cartesian first if you haven't tried it at all, just to make
sure it actually is too slow before getting tricky with the broadcast.
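
A minimal sketch of the two options discussed above, written as PySpark-style
Python. The function names and the pairwise function f are hypothetical and
not from this thread. The sketch assumes sc is a SparkContext, that rdd_a and
rdd_b are RDDs, and, for the broadcast variant, that B fits in memory on the
driver and on every executor.

```python
def pairs_via_cartesian(rdd_a, rdd_b, f):
    """Baseline: let Spark build every (a, b) pair, then apply f to each."""
    return rdd_a.cartesian(rdd_b).map(lambda ab: f(ab[0], ab[1]))


def pairs_via_broadcast(sc, rdd_a, rdd_b, f):
    """Broadcast variant: only valid under the assumption that B is small."""
    # Pull all of B to the driver as a plain Python list.
    b_local = rdd_b.collect()
    # Make that list available as a read-only value on the executors.
    b_bcast = sc.broadcast(b_local)
    # A stays distributed; each element of A is paired with all of B locally.
    return rdd_a.flatMap(lambda a: [f(a, b) for b in b_bcast.value])
```

For example, pairs_via_broadcast(sc, sc.parallelize([1, 2, 3]),
sc.parallelize([10, 20]), lambda a, b: a * b).collect() should return the six
pairwise products. Timing both functions on real data is the simplest way to
check whether the broadcast version is worth the extra memory.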



Does RDD.cartesian involve shuffling?

2015-08-03 Thread Meihua Wu
Does RDD.cartesian involve shuffling?

Thanks!
