Re: Does RDD.cartesian involve shuffling?
Yes, it does. In fact, it's probably going to be one of the more expensive shuffles you could trigger.

On Mon, Aug 3, 2015 at 12:56 PM, Meihua Wu rotationsymmetr...@gmail.com wrote:
Does RDD.cartesian involve shuffling? Thanks!

--
Richard Marscher
Software Engineer
Localytics
Localytics.com http://localytics.com/
Re: Does RDD.cartesian involve shuffling?
Thanks, Richard! I basically have two RDDs, A and B, and I need to compute a value for every pair (a, b) with a in A and b in B. My first thought was cartesian, but that involves an expensive shuffle. Are there any alternatives? What if I convert B to an array and broadcast it to every node (assuming B is small enough to fit in memory)?

On Tue, Aug 4, 2015 at 8:23 AM, Richard Marscher rmarsc...@localytics.com wrote:
Yes it does, in fact it's probably going to be one of the more expensive shuffles you could trigger.
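For readers following the thread, the broadcast idea proposed here could look roughly like the sketch below. It is written against the PySpark RDD API and is only an illustration under stated assumptions: B is small enough to collect on the driver and hold in memory on every executor, and the names `pair_partition`, `all_pairs_broadcast`, and `f` are hypothetical helpers introduced for this example, not anything from Spark itself.

```python
def pair_partition(a_partition, b_values, f):
    """Yield (a, b, f(a, b)) for each a in one partition of A and each b in B.

    Plain Python, so it can be checked without a cluster.
    """
    for a in a_partition:
        for b in b_values:
            yield (a, b, f(a, b))


def all_pairs_broadcast(sc, rdd_a, rdd_b, f):
    """Compute f over every (a, b) pair without calling cartesian.

    Assumption: rdd_b is small enough to collect and broadcast.
    `sc` is an existing SparkContext.
    """
    # Pull B to the driver once, then ship one read-only copy to each executor.
    b_broadcast = sc.broadcast(rdd_b.collect())
    # Each partition of A stays where it is and is paired locally with all of B.
    return rdd_a.mapPartitions(
        lambda part: pair_partition(part, b_broadcast.value, f)
    )
```

With a live SparkContext this would be called as `all_pairs_broadcast(sc, A, B, lambda a, b: a * b)`; the result is an RDD with the same partitioning as A, so A's records are not moved.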
Re: Does RDD.cartesian involve shuffling?
That is the only alternative I'm aware of. If either A or B is small enough to broadcast, then you'd at least be doing the cartesian products locally, without needing to also transmit and shuffle A. That is, unless Spark somehow optimizes the cartesian product and only transfers the smaller RDD across the network in the shuffle, but I don't have reason to believe that's true. I'd try cartesian first if you haven't tried it at all, just to make sure it actually is too slow before getting tricky with the broadcast.

On Tue, Aug 4, 2015 at 12:25 PM, Meihua Wu rotationsymmetr...@gmail.com wrote:
How about I convert B to an array and broadcast it to every node (assuming B is relative small to fit)?
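The "try cartesian first" advice can be made concrete with a small timing harness. This is a rough sketch, assuming the PySpark RDD API; `timed`, `cartesian_pairs`, and `f` are hypothetical names introduced here. Because RDD transformations are lazy, the measurement has to wrap an action such as `count()`, otherwise nothing is actually executed.

```python
import time


def timed(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def cartesian_pairs(rdd_a, rdd_b, f):
    """Baseline: compute f over every (a, b) pair using RDD.cartesian."""
    return rdd_a.cartesian(rdd_b).map(
        lambda ab: (ab[0], ab[1], f(ab[0], ab[1]))
    )
```

A run on real data might look like `n, secs = timed(lambda: cartesian_pairs(A, B, f).count())`. Timing the same action on a broadcast-based version then shows whether the extra complexity is worth it for a given cluster and data size.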
Does RDD.cartesian involve shuffling?
Does RDD.cartesian involve shuffling? Thanks!