A "task" is the work to be done on a partition for a given stage - you
should expect the number of tasks to be equal to the number of partitions
in each stage, though a task might need to be rerun (due to failure or need
to recompute some data).
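
For instance (a quick sketch in the spark-shell, where sc is the
SparkContext; the RDD and the partition counts are made-up examples), the
task count of a stage follows the partition count of the RDD it computes:

    // Hypothetical example: a stage runs one task per partition.
    val rdd = sc.parallelize(1 to 1000000, numSlices = 32)
    println(rdd.partitions.length)  // 32 -> a stage over this RDD runs 32 tasks

    // A shuffle starts a new stage, whose task count follows the
    // partition count of the shuffled output.
    val counts = rdd.map(x => (x % 10, 1)).reduceByKey(_ + _, numPartitions = 16)
    println(counts.partitions.length)  // 16 -> the post-shuffle stage runs 16 tasks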

A good starting point is 2-4 times the total number of cores in your
cluster. From there, try different values and see how they affect
performance.
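
Concretely (just a sketch using the numbers from your cluster; the path is
hypothetical): 8 workers * 2 cores = 16 total cores, so 2-4x suggests
somewhere around 32-64 partitions. You can ask for a partition count when
reading data, or repartition afterwards:

    val totalCores = 8 * 2               // 8 workers x 2 cores each
    val numPartitions = totalCores * 4   // try values in the 2x-4x range

    // Request a minimum number of partitions when reading...
    val lines = sc.textFile("hdfs:///some/path", minPartitions = numPartitions)

    // ...or reshuffle an existing RDD into that many partitions.
    val repartitioned = lines.repartition(numPartitions)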

On Mon, Sep 29, 2014 at 5:01 PM, anny9699 <anny9...@gmail.com> wrote:

> Hi,
>
> I read the past posts about partition numbers, but I'm still a little
> confused about partitioning strategy.
>
> I have a cluster with 8 workers and 2 cores per worker. Is it true that
> the optimal partition number should be 2-4 * total_coreNumber, or should
> it approximately equal total_coreNumber? Or is it the task number that
> really determines the speed, rather than the partition number?
>
> Thanks a lot!


-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io W: www.velos.io
