Can you increase the number of partitions and also the number of executors?
That should improve parallelization, though at some point you may become
disk-I/O bound rather than CPU bound.
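The suggestion above can be sketched roughly as follows. This is only an illustration, not the original poster's job: it assumes an existing pair RDD (`rdd`) of `(LongWritable, Text)`-compatible records, and the output path, partition count, and key/value/format classes are hypothetical placeholders to be replaced with the real ones.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// Each partition is written by one task, so more partitions means more
// concurrent writers. A common starting point is ~2x the total core count
// (here 45 executors * 5 cores = 225 cores, so try ~450 partitions).
val repartitioned = rdd.repartition(450)

repartitioned.saveAsNewAPIHadoopFile(
  "hdfs:///path/to/output",                        // hypothetical output path
  classOf[LongWritable],                           // key class
  classOf[Text],                                   // value class
  classOf[TextOutputFormat[LongWritable, Text]]    // new-API OutputFormat
)
```

Note that more write tasks only help until the underlying disks (or HDFS DataNodes) saturate, which is the I/O-bound ceiling mentioned above.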

On Nov 8, 2016, at 4:08 PM, Elf Of Lothlorein <redarro...@gmail.com> wrote:

Hi,
I am trying to save an RDD to disk using saveAsNewAPIHadoopFile, and I am
seeing that it takes almost 20 minutes for about 900 GB of data. Is there any
parameter I can tune to make this save faster?
I am running about 45 executors with 5 cores each on 5 Spark worker nodes,
using Spark on YARN.
Thanks for your help.
C
