Re: Custom partitioner

2015-04-18 Thread Archit Thakur
Yes, you can. Use the partitionBy method and pass a partitioner to it. On Apr 17, 2015 8:18 PM, Jeetendra Gangele gangele...@gmail.com wrote: OK, is there a way I can use hash partitioning to improve the performance? On 17 April 2015 at 19:33, Archit Thakur archit279tha...@gmail.com
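
A minimal sketch of the partitionBy suggestion above, assuming Spark 1.3 or later in local mode. The sample data, the application name, and the partition count of 8 are placeholders and do not come from the thread.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartitionBySketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for illustration only.
    val conf = new SparkConf().setAppName("partitionBy-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in data: each value keyed by its zipWithIndex position (a Long).
      val keyed = sc.parallelize(Seq("a", "b", "c", "d"))
        .zipWithIndex()
        .map { case (value, index) => (index, value) }

      // partitionBy shuffles once so that equal keys land in the same partition.
      // The partition count (8) is arbitrary here.
      val partitioned = keyed.partitionBy(new HashPartitioner(8)).cache()

      // flatMapValues keeps the partitioner, so this step does not reshuffle by key.
      val expanded = partitioned.flatMapValues(v => Seq(v, v.toUpperCase))

      // The partitioner is still defined after flatMapValues.
      println(expanded.partitioner.isDefined)
    } finally {
      sc.stop()
    }
  }
}
```

Operations that can change the key, such as map or flatMap, drop the partitioner, which is one common reason a later step shuffles again. Whether that is what happened in this thread is not confirmed by the messages shown.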

Re: Custom partitioner

2015-04-17 Thread Jeetendra Gangele
Hi Archit, thanks for the reply. How can I do the custom compilation to reduce it to 4 bytes? I want to make it 4 bytes in any case; can you please guide me? I am applying flatMapValues in each step after zipWithIndex, so it should stay on the same node, right? Why is it shuffling? Also I am running with very less

Re: Custom partitioner

2015-04-17 Thread Archit Thakur
By custom installation, I meant changing the code and building it. I have not done a complete impact analysis; I just had a look at the code. When you say the same key goes to the same node, that would need shuffling unless the raw data you are reading is already laid out that way. On Apr 17, 2015 6:30 PM, Jeetendra

Custom partitioner

2015-04-16 Thread Jeetendra Gangele
Hi All, I have an RDD which has 1 million keys, and each key is repeated for around 7000 values, so in total there will be around 1M*7K records in the RDD. Each key is created from zipWithIndex, so the keys run from 0 to M-1. The problem with zipWithIndex is that it uses a Long for the key, which is 8 bytes. Can I
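
One possible way to get a 4-byte key without rebuilding Spark, sketched here as an assumption and not as what the thread settled on: narrow the Long index to an Int after zipWithIndex. This is only safe when the index fits in an Int, which holds for roughly 1 million keys. The helper name toIntKey is hypothetical.

```scala
object IndexKeys {
  // Hypothetical helper: narrow a Long index to an Int,
  // failing loudly instead of silently overflowing.
  def toIntKey(index: Long): Int = {
    require(index >= 0 && index <= Int.MaxValue, s"index $index does not fit in an Int")
    index.toInt
  }
}

// Possible use on an RDD (requires Spark, shown only as a comment):
//   rdd.zipWithIndex().map { case (value, index) => (IndexKeys.toIntKey(index), value) }
```

How much memory this saves in practice depends on boxing and on the serializer in use, so it is worth measuring before relying on it.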