I don't think you can change it to 4 bytes without custom compilation.
To make the same key go to the same node, you'll have to repartition the data,
which is a shuffle anyway. Unless your raw data is already laid out so that
each key sits on a single node, you'll have to shuffle at least once to bring
all records for a key onto the same node.
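
On how the default partitioner behaves: Spark's HashPartitioner (what you
get from partitionBy on a pair RDD) picks a partition from the key's hash
code modulo the number of partitions. Below is a rough plain-Python
simulation of that rule, not Spark code. The partition count, the helper
names (partition_for, shuffle) and the sample records are made up for
illustration, and it assumes the hash of a small non-negative index equals
the index itself.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Assumption: for a small non-negative index the hash code equals the
    # key, so hash partitioning reduces to key modulo partition count.
    return key % num_partitions


def shuffle(records, num_partitions=NUM_PARTITIONS):
    # Simulate the one shuffle: move every (key, value) record to the
    # partition chosen by its key.
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


# Hypothetical input: 6 keys with 3 values each, interleaved so that the
# records for one key are not adjacent in the raw data.
records = [(k, f"v{k}-{i}") for i in range(3) for k in range(6)]
placed = shuffle(records)

for pid in sorted(placed):
    keys = sorted({k for k, _ in placed[pid]})
    print(f"partition {pid}: keys {keys}")
```

After this single pass every record for a given key is in one partition,
so later per-key work on the same partitioning needs no further movement.
That is the one shuffle I mean above.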

On Thu, Apr 16, 2015 at 10:16 PM, Jeetendra Gangele <gangele...@gmail.com>
wrote:

> Hi All
>
> I have an RDD which has 1 million keys, and each key is repeated for around
> 7000 values, so in total there will be around 1M*7K records in the RDD.
>
> Each key is created from zipWithIndex, so keys run from 0 to M-1.
> The problem with zipWithIndex is that it uses a long for the key, which is
> 8 bytes. Can I reduce it to 4 bytes?
>
> Now, how can I make sure that records with the same key go to the same node
> so that I can avoid shuffling? Also, how will the default partitioner work here?
>
> Regards
> jeetendra
>
>
