By custom installation, I meant changing the code and building it yourself.
I have not done a complete impact analysis; I only had a look at the code.

When you say the same key goes to the same node: that would need shuffling
unless the raw data you are reading is already laid out that way.
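
To illustrate the point, here is a minimal sketch in plain Python. It is only
a model of the idea, not Spark code: the function names, the partition count
and the sample records are all made up for illustration, and the hash
partitioner is approximated as "hash of the key modulo number of partitions".
It shows that scattered keys need one redistribution step to end up together,
and that a values-only transformation afterwards does not need to move
anything.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for the example


def hash_partition(key, num_partitions=NUM_PARTITIONS):
    """Model of a hash partitioner: key hash modulo the partition count."""
    return hash(key) % num_partitions


def shuffle_by_key(partitions, num_partitions=NUM_PARTITIONS):
    """Redistribute (key, value) records so that equal keys share a partition."""
    out = defaultdict(list)
    for part in partitions:
        for key, value in part:
            out[hash_partition(key, num_partitions)].append((key, value))
    return [out[i] for i in range(num_partitions)]


def flat_map_values(partitions, fn):
    """Expand each value in place; keys and their partitions are untouched."""
    return [[(k, v2) for k, v in part for v2 in fn(v)] for part in partitions]


def partitions_per_key(partitions):
    """Map each key to the set of partition indices that hold it."""
    seen = defaultdict(set)
    for i, part in enumerate(partitions):
        for key, _ in part:
            seen[key].add(i)
    return dict(seen)


# Raw data as read: the same key is scattered over several input partitions.
raw = [[(0, "a"), (1, "b")], [(0, "c"), (2, "d")], [(1, "e"), (2, "f")]]

placed = shuffle_by_key(raw)  # the one redistribution that cannot be avoided
expanded = flat_map_values(placed, lambda v: [v, v.upper()])  # no movement
```

Calling partitions_per_key(raw) shows key 0 sitting in two partitions, while
for placed and expanded every key maps to exactly one partition. If the input
were already laid out like placed, the redistribution step could be skipped.
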
On Apr 17, 2015 6:30 PM, "Jeetendra Gangele" <gangele...@gmail.com> wrote:

> Hi Archit, thanks for the reply.
> How can I do the custom compilation to reduce it to 4 bytes? I want to
> make it 4 bytes in any case; can you please guide me?
>
> I am applying flatMapValues in each step after zipWithIndex, so it should
> stay on the same node, right? Why is it shuffling?
> Also, I am currently running with very few records and it is still shuffling.
>
> regards
> jeetendra
>
>
>
> On 17 April 2015 at 15:58, Archit Thakur <archit279tha...@gmail.com>
> wrote:
>
>> I don't think you can change it to 4 bytes without a custom compilation.
>> To make the same key go to the same node, you'll have to repartition the
>> data, which is shuffling anyway. Unless your raw data is such that the same
>> key is already on the same node, you'll have to shuffle at least once to
>> put the same key on the same node.
>>
>> On Thu, Apr 16, 2015 at 10:16 PM, Jeetendra Gangele <gangele...@gmail.com> wrote:
>>
>>> Hi All
>>>
>>> I have an RDD with 1 million keys, and each key is repeated for around
>>> 7000 values, so in total there will be around 1M*7K records in the RDD.
>>>
>>> Each key is created by zipWithIndex, so the keys run from 0 to M-1.
>>> The problem with zipWithIndex is that it uses a long for the key, which
>>> is 8 bytes. Can I reduce it to 4 bytes?
>>>
>>> Now, how can I make sure that records with the same key go to the same
>>> node so that I can avoid shuffling? Also, how will the default
>>> partitioner work here?
>>>
>>> Regards
>>> jeetendra
>>>
>>>
>>
>
>
>