What is the best heuristic for setting the number of partitions/tasks for an RDD based on the size of the RDD in memory?
The Spark docs say that the number of partitions/tasks should be 2-3x the number of CPU cores, but that guidance does not make sense for all data sizes. For small datasets this number is often far too high and slows the job down because of per-task overhead.
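For context, here is a minimal sketch of the two approaches I am weighing: the cores-based rule of thumb from the docs versus a size-based heuristic targeting a fixed number of bytes per partition. The input path, the 2 GB estimated size, and the 128 MB-per-partition target are made-up values for illustration, not a recommendation:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionHeuristic {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partition-heuristic").setMaster("local[*]"))

    // Docs' rule of thumb: 2-3x the cores available to the job.
    val partitionsByCores = sc.defaultParallelism * 3

    // Size-based alternative (assumed numbers): aim for ~128 MB per partition
    // given an estimated in-memory size of ~2 GB.
    val estimatedSizeBytes   = 2L * 1024 * 1024 * 1024
    val targetPartitionBytes = 128L * 1024 * 1024
    val partitionsBySize =
      math.max(1, (estimatedSizeBytes / targetPartitionBytes).toInt)

    // Hypothetical input path, just to show where the partition count is applied.
    val rdd = sc.textFile("hdfs:///data/input", minPartitions = partitionsBySize)

    println(s"cores-based: $partitionsByCores, " +
            s"size-based: $partitionsBySize, actual: ${rdd.getNumPartitions}")

    sc.stop()
  }
}
```

The question is essentially: is there a principled way to pick `targetPartitionBytes` (or otherwise derive the partition count from the RDD's in-memory size), rather than always multiplying the core count?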