Hi,

If you require all the partitions to be saved as a single file with
saveAsTextFile, you can use coalesce(1, true).saveAsTextFile(). This
basically means: do the computation, then coalesce down to a single
partition. You can also use repartition(1), which is just a wrapper for
coalesce with the shuffle argument set to true.

val yourRDD = ....
yourRDD.coalesce(1).saveAsTextFile("data/output")
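More generally, repartition(n) lets you pick any number of output files, not
just one. A minimal sketch, assuming an existing SparkContext `sc` and an
illustrative input path (both names are mine, not from your job):

```scala
// Assumed setup: `sc` is an existing SparkContext; paths are illustrative.
val input = sc.textFile("data/input")

// repartition(n) does a full shuffle into exactly n partitions,
// so saveAsTextFile writes n part files.
input.repartition(4).saveAsTextFile("data/output-4-files")

// Equivalent to repartition(1): coalesce with the shuffle flag enabled.
input.coalesce(1, shuffle = true).saveAsTextFile("data/output-single-file")
```

Note that coalesce without the shuffle only narrows partitions (no shuffle),
which is cheaper but can push all the work onto a single task when you go
down to 1.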


Hope that helps.



Regards,
_________________________________
*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html

On 26 January 2017 at 16:21, Soheila S. <soheila...@gmail.com> wrote:

> Hi all,
>
> Please tell me how I can tune the number of output partitions.
> I run my Spark job on my local machine with 8 cores and the input data is
> 6.5 GB. It creates 193 tasks and puts the output into 193 partitions.
> How can I change the number of tasks and, consequently, the number of
> output files?
>
> Best,
> Soheila
>
