Custom partioner

Jeetendra Gangele Thu, 16 Apr 2015 09:48:23 -0700

Hi All

I have a RDD which has 1 million keys and each key is repeated from around
7000 values so total there will be around 1M*7K records in RDD.


and each key is created from ZipWithIndex so key start from 0 to M-1
the problem with ZipWithIndex is it take long for key which is 8 bytes. can
I reduce it to 4 bytes?

Now how Can I make sure the record with same key will go the same node so
that I can avoid shuffling. Also how default partition-er will work here.

Regards
jeetendra

Custom partioner

Reply via email to