Re: repartition vs partitionby

Adrian Tanase Sat, 17 Oct 2015 13:26:39 -0700

If the dataset allows it, you can try writing a custom partitioner to help 
Spark distribute the data more uniformly.
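
As a rough illustration of the idea, here is a minimal sketch of a partition 
function that gives each known heavy key its own partition and hashes all 
other keys over the remaining ones. The names make_skew_aware_partitioner and 
heavy_keys are hypothetical, and the sketch assumes that the keys are hashable 
and that you already know which keys are heavy (for example from a sampled 
count). Note that it only separates heavy keys from each other; it cannot 
split the records of a single key across partitions.

```python
import zlib


def make_skew_aware_partitioner(heavy_keys, num_partitions):
    """Build a partition function for known heavy keys (hypothetical helper).

    Each key in heavy_keys gets a dedicated partition; every other key is
    hashed over the remaining partitions.
    """
    heavy_keys = list(heavy_keys)
    if len(heavy_keys) >= num_partitions:
        raise ValueError("need more partitions than heavy keys")

    # Heavy key -> dedicated partition index (0, 1, 2, ...).
    dedicated = {key: index for index, key in enumerate(heavy_keys)}
    remaining = num_partitions - len(dedicated)

    def partition_func(key):
        if key in dedicated:
            return dedicated[key]
        # crc32 of repr() is stable across Python processes, unlike the
        # built-in hash() of strings, which is randomized per process.
        stable_hash = zlib.crc32(repr(key).encode("utf-8"))
        return len(dedicated) + stable_hash % remaining

    return partition_func


# Assumed usage with a PySpark pair RDD named rdd (not defined here):
#   func = make_skew_aware_partitioner(["hot_key_1", "hot_key_2"], 200)
#   balanced = rdd.partitionBy(200, func)
```

Whether this helps depends on the data: if one key alone holds most of the 
records, the partition that receives it will still be oversized.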



On 17 Oct 2015, at 16:14, shahid ashraf <sha...@trialx.com> wrote:

Yes, I know about that; it is for reducing the number of partitions. The point 
here is that the data is skewed toward a few partitions.


On Sat, Oct 17, 2015 at 6:27 PM, Raghavendra Pandey 
<raghavendra.pan...@gmail.com> wrote:
You can use the coalesce function if you want to reduce the number of 
partitions. It minimizes the data shuffle.

-Raghav

On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri 
<shahidashr...@icloud.com> wrote:
Hi folks

I need to repartition a large data set (around 300 GB), as I see that some 
partitions hold a large share of the data (data skew).

I have pair RDDs: [({},{}),({},{}),({},{})]

What is the best way to solve the problem?

--
with Regards
Shahid Ashraf
