Hi all,

I have an RDD with about 1 million keys, and each key is repeated for
around 7,000 values, so in total there will be around 1M * 7K (about
7 billion) records in the RDD.

Each key is created with zipWithIndex, so the keys run from 0 to M-1.
The problem with zipWithIndex is that it uses a Long for the key, which
is 8 bytes. Can I reduce it to 4 bytes?
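
A rough sanity check on the ranges involved, written as plain Python
arithmetic rather than Spark code. The constant and function names are
made up for illustration, and the sizes are just the approximate numbers
quoted above:

```python
# Approximate sizes from the description above (assumptions, not measurements).
NUM_KEYS = 1_000_000      # distinct keys, indexed 0 .. NUM_KEYS - 1
VALUES_PER_KEY = 7_000    # rough number of values per key

INT32_MAX = 2**31 - 1     # largest signed 32-bit (4-byte) integer


def fits_in_int32(n: int) -> bool:
    """True if every index in 0 .. n - 1 fits in a signed 32-bit integer."""
    return 0 <= n - 1 <= INT32_MAX


# The key range 0 .. M-1 is small enough for a 4-byte integer ...
keys_fit = fits_in_int32(NUM_KEYS)

# ... but an index over all ~7 billion records would not be.
records_fit = fits_in_int32(NUM_KEYS * VALUES_PER_KEY)
```

So narrowing looks safe only if the index is taken over the distinct keys,
not over every record.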

Now, how can I make sure that records with the same key go to the same
node, so that I can avoid shuffling? Also, how will the default
partitioner work here?
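
For context, my understanding is that a hash partitioner places a record
by taking the key's hash code modulo the number of partitions, so equal
keys always map to the same partition. The sketch below imitates that
idea in plain Python; it is not Spark's API, and `hash_partition`,
`NUM_PARTITIONS`, and the toy records are hypothetical:

```python
from collections import defaultdict

NUM_PARTITIONS = 8  # assumed partition count, chosen only for the example


def hash_partition(key: int, num_partitions: int) -> int:
    """Non-negative modulo of the key's hash (for small ints, hash(key) == key)."""
    return hash(key) % num_partitions


# Toy data: 20 keys with 3 values each, standing in for the real RDD.
records = [(key, f"v{key}-{i}") for key in range(20) for i in range(3)]

# Route every record to the partition chosen by its key.
partitions = defaultdict(list)
for key, value in records:
    partitions[hash_partition(key, NUM_PARTITIONS)].append((key, value))

# Collect the set of partitions each key ended up in.
partitions_per_key = defaultdict(set)
for pid, items in partitions.items():
    for key, _ in items:
        partitions_per_key[key].add(pid)

# Every key lives in exactly one partition.
all_colocated = all(len(pids) == 1 for pids in partitions_per_key.values())
```

In Spark itself the corresponding step would presumably be `partitionBy`
with a `HashPartitioner`, which still costs one shuffle up front; the open
question is whether later per-key operations can then skip it.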

