Thanks, Richard!

I basically have two RDDs, A and B, and I need to compute a value for
every pair (a, b) with a in A and b in B. My first thought was
cartesian, but that involves an expensive shuffle.

Are there any alternatives? What if I convert B to an array and
broadcast it to every node (assuming B is small enough to fit in
memory)?
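
To make the broadcast idea concrete, here is a rough PySpark sketch of what
I have in mind. It assumes B fits in memory on the driver and on every
executor. The names pair_values, all_pairs and f are hypothetical, made up
for illustration; f stands for whatever per-pair computation is needed.

```python
def pair_values(a_partition, b_values, f):
    """Yield ((a, b), f(a, b)) for every a in one partition of A and every b in B."""
    for a in a_partition:
        for b in b_values:
            yield (a, b), f(a, b)


def all_pairs(sc, rdd_a, rdd_b, f):
    """Compute f over all (a, b) pairs without calling rdd_a.cartesian(rdd_b).

    Assumption: rdd_b is small enough to collect to the driver and to hold
    in memory on each executor.
    """
    b_local = rdd_b.collect()      # pull B to the driver as a Python list
    b_bc = sc.broadcast(b_local)   # ship one read-only copy to each executor
    # Each partition of A is paired locally with the broadcast copy of B.
    return rdd_a.mapPartitions(lambda part: pair_values(part, b_bc.value, f))
```

Usage would be something like all_pairs(sc, A, B, lambda a, b: a * b),
which returns an RDD of ((a, b), value) records. If B turns out to be too
large to collect, this approach does not apply.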



On Tue, Aug 4, 2015 at 8:23 AM, Richard Marscher
<rmarsc...@localytics.com> wrote:
> Yes it does, in fact it's probably going to be one of the more expensive
> shuffles you could trigger.
>
> On Mon, Aug 3, 2015 at 12:56 PM, Meihua Wu <rotationsymmetr...@gmail.com>
> wrote:
>>
>> Does RDD.cartesian involve shuffling?
>>
>> Thanks!
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>
>
>
> --
> Richard Marscher
> Software Engineer
> Localytics
> Localytics.com | Our Blog | Twitter | Facebook | LinkedIn
