If the dataset allows it, you can try writing a custom partitioner to help Spark distribute the data more uniformly.
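Here is a minimal sketch of such a partitioner. It assumes the keys are hashable (e.g. strings) and that the heavy keys are known in advance, for example from a count over a sample. `make_skew_aware_partitioner` and `partition_sizes` are hypothetical helpers, not Spark APIs. The returned function has the key -> int shape that PySpark's `RDD.partitionBy(numPartitions, partitionFunc)` expects. The snippet runs in plain Python and only simulates where records would land.

```python
import zlib
from typing import Callable, Hashable, Iterable, List, Tuple


def make_skew_aware_partitioner(
    hot_keys: Iterable[Hashable], num_partitions: int
) -> Callable[[Hashable], int]:
    """Hypothetical helper: give each known heavy key its own partition and
    hash every other key over the remaining partitions."""
    hot = list(dict.fromkeys(hot_keys))  # de-duplicate, keep order
    if len(hot) >= num_partitions:
        raise ValueError("need more partitions than hot keys")
    dedicated = {key: i for i, key in enumerate(hot)}
    shared = num_partitions - len(hot)

    def partition_func(key: Hashable) -> int:
        if key in dedicated:
            return dedicated[key]
        # crc32 of repr() is stable across processes, unlike Python's
        # randomized str hash, so every executor computes the same result.
        h = zlib.crc32(repr(key).encode("utf-8"))
        return len(hot) + h % shared

    return partition_func


def partition_sizes(
    pairs: Iterable[Tuple[Hashable, object]],
    partition_func: Callable[[Hashable], int],
    num_partitions: int,
) -> List[int]:
    """Count how many records each partition would receive."""
    sizes = [0] * num_partitions
    for key, _ in pairs:
        # PySpark also applies `% numPartitions` to the function's result.
        sizes[partition_func(key) % num_partitions] += 1
    return sizes


if __name__ == "__main__":
    # Made-up skewed data: two heavy keys plus 100 light keys.
    data = (
        [("a", i) for i in range(1000)]
        + [("b", i) for i in range(500)]
        + [(f"k{j}", i) for j in range(100) for i in range(10)]
    )
    pf = make_skew_aware_partitioner(["a", "b"], 8)
    print(partition_sizes(data, pf, 8))
```

On a real pair RDD, this would be used as something like `pairs.partitionBy(8, make_skew_aware_partitioner(hot, 8))`. Keep in mind that any partitioner sends all records with the same key to the same partition. It can keep heavy keys from piling onto a partition shared with many other keys, but it cannot split one heavy key across partitions.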
On 17 Oct 2015, at 16:14, shahid ashraf <sha...@trialx.com> wrote:

> Yes, I know about that, but it is for reducing the number of partitions. The point here is that the data is skewed to a few partitions.
>
> On Sat, Oct 17, 2015 at 6:27 PM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:
>
>> You can use the coalesce function if you want to reduce the number of partitions. It minimizes the data shuffle.
>>
>> -Raghav
>>
>> On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri <shahidashr...@icloud.com> wrote:
>>
>>> Hi folks,
>>>
>>> I need to repartition a large set of data (around 300 GB), as I see that some partitions hold much more data than others (data skew). I have pair RDDs like [({},{}), ({},{}), ({},{})]. What is the best way to solve this problem?
>
> --
> with Regards
> Shahid Ashraf