Would Join on PairRDD's result in co-locating data by keys?

Ankur Srivastava Thu, 22 Jan 2015 09:45:12 -0800

Hi,

I want to understand how a join on two pair RDDs works. Does it shuffle data
from both RDDs so that records with the same key end up in the same
partition? If so, would it be better to use the partitionBy function to
partition each RDD by the join key when it is created, so that less data is
shuffled during the join?
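To make the question concrete, here is a toy, single-process Python model (not Spark code) of the co-location idea. It assumes both datasets are bucketed by the same hash function and the same partition count, roughly what partitioning both RDDs with the same hash partitioner would arrange. Under that assumption, equal keys land in the same partition index on both sides, so each pair of partitions can be joined locally. The function names `hash_partition` and `co_partitioned_join` and the sample data are hypothetical.

```python
from collections import defaultdict


def hash_partition(pairs, num_partitions):
    """Toy stand-in for hash partitioning: bucket (key, value) pairs by hash(key) mod N."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        # Python's % with a positive modulus always yields a non-negative index
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def co_partitioned_join(left_parts, right_parts):
    """Inner-join two partition lists built with the same (hypothetical) partitioner.

    Because equal keys hash to the same partition index, partition i of the left
    side only needs partition i of the right side; no other partition is consulted.
    """
    assert len(left_parts) == len(right_parts), "both sides need the same partition count"
    result = []
    for left, right in zip(left_parts, right_parts):
        # Build a local lookup table for this partition only
        index = defaultdict(list)
        for key, value in right:
            index[key].append(value)
        for key, left_value in left:
            for right_value in index.get(key, []):
                result.append((key, (left_value, right_value)))
    return result


# Hypothetical sample data: (user_id, order) and (user_id, name)
orders = [(1, "order-a"), (2, "order-b"), (3, "order-c"), (1, "order-d")]
users = [(1, "alice"), (2, "bob"), (4, "dave")]

N = 3
joined = co_partitioned_join(hash_partition(orders, N), hash_partition(users, N))
print(sorted(joined))
```

In this model the shuffle cost is paid once, when each side is partitioned; the join step itself only pairs up matching partition indexes. Whether Spark skips the shuffle for an RDD that was already partitioned with partitionBy is exactly the question being asked here.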


Thanks

Ankur
