Hi,

If you need all the output of saveAsTextFile written as a single partition, you can use coalesce(1, true).saveAsTextFile(). This means: do the computation, then coalesce the result down to one partition before writing. You can also use repartition(1), which is just a wrapper around coalesce with the shuffle argument set to true.
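As a minimal sketch of both approaches (assuming a SparkContext named `sc` and placeholder input/output paths, which are not from your job):

```scala
// Assumes an existing SparkContext `sc`; paths are hypothetical.
val yourRDD = sc.textFile("data/input")

// Option 1: coalesce without a shuffle -- cheap, but it can only
// *reduce* the number of partitions.
yourRDD.coalesce(1).saveAsTextFile("data/output-single")

// Option 2: repartition(n), equivalent to coalesce(n, shuffle = true);
// it incurs a full shuffle but can increase or decrease the partition
// count, and so the number of output files.
yourRDD.repartition(8).saveAsTextFile("data/output-eight")
```

Note that coalescing to a single partition funnels all the data through one task, so it is only advisable when the output comfortably fits on one executor.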
val yourRDD = ....
yourRDD.coalesce(1).saveAsTextFile("data/output")

Hope that helps.

Regards,
_________________________________
*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html <http://139.59.184.114/index.html>

On 26 January 2017 at 16:21, Soheila S. <soheila...@gmail.com> wrote:
> Hi all,
>
> Please tell me how I can tune the number of output partitions.
> I run my Spark job on my local machine with 8 cores, and the input data is
> 6.5 GB. It creates 193 tasks and puts the output into 193 partitions.
> How can I change the number of tasks and, consequently, the number of
> output files?
>
> Best,
> Soheila