Hi,

I wanted to understand how the join on two pair rdd's work? Would it result
in shuffling data from both the RDD's with same key into same partition? If
that is the case would it be better to use partitionBy function to
partition (by the join attribute) the RDD at creation for lesser shuffling?

Thanks

Ankur

Reply via email to