Re: How to decide the number of tasks in Spark?

2016-04-18 Thread Mich Talebzadeh
Have a look at this doc:

http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 18 April 2016 at 20:43, Dogtail L wrote:

> Hi,
>
> When launching a job in Spark, I have great trouble deciding the number
> of tasks. Some say it is better to create one task per HDFS block, i.e.,
> make sure each task processes 128 MB of input data; others suggest that
> the number of tasks should be twice the total number of cores available
> to the job. I have also seen the suggestion to launch small tasks in
> Spark, i.e., make sure each task lasts around 100 ms.
>
> I am quite confused by all these suggestions. Is there any general rule
> for deciding the number of tasks in Spark? Many thanks!
>
> Best
>


How to decide the number of tasks in Spark?

2016-04-18 Thread Dogtail L
Hi,

When launching a job in Spark, I have great trouble deciding the number
of tasks. Some say it is better to create one task per HDFS block, i.e.,
make sure each task processes 128 MB of input data; others suggest that
the number of tasks should be twice the total number of cores available
to the job. I have also seen the suggestion to launch small tasks in
Spark, i.e., make sure each task lasts around 100 ms.

I am quite confused by all these suggestions. Is there any general rule
for deciding the number of tasks in Spark? Many thanks!

Best
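
To make the heuristics above concrete, here is a minimal sketch of the
main knobs that control task counts in Spark's Scala RDD API. All
values, the input path, and the core count below are hypothetical
placeholders, not tuning recommendations.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("task-count-sketch")
  // Default task count for shuffles (e.g. reduceByKey) when no
  // explicit partition count is passed.
  .set("spark.default.parallelism", "16")
val sc = new SparkContext(conf)

// For HDFS input, Spark creates roughly one task per HDFS block
// (commonly 128 MB) by default; minPartitions can only raise that.
val lines = sc.textFile("hdfs:///path/to/input", 32)

// repartition() sets the task count of downstream stages explicitly,
// e.g. roughly twice the total executor cores.
val totalCores = 8 // hypothetical cluster size
val balanced = lines.repartition(2 * totalCores)

// Shuffle operations also accept a numPartitions argument directly.
val counts = balanced
  .map(line => (line.length, 1))
  .reduceByKey(_ + _, 2 * totalCores)

Each of the three suggestions in the question maps to one of these
knobs, so they are not mutually exclusive; which one dominates depends
on whether the stage reads input, shuffles, or both.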