How does the number of partitions affect performance?

shahab Mon, 03 Nov 2014 01:58:17 -0800

Hi,

I wonder how the number of partitions affects performance in Spark.


Is it just the parallelism (more partitions, more parallel sub-tasks) that
improves performance, or are there other considerations?
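To make the "more partitions, more parallel sub-tasks" idea concrete, here is a minimal pure-Python sketch, not Spark code. It assumes (hypothetically) that records split evenly across partitions, that each partition becomes one task on one core, that tasks run in "waves" of at most `num_cores`, and that there is no scheduling or shuffle overhead. The function name `stage_time` and the core counts used below are illustrative assumptions.

```python
import math


def stage_time(num_records: int, num_partitions: int, num_cores: int,
               cost_per_record: float = 1.0) -> float:
    """Rough upper-bound estimate of one stage's wall-clock time.

    Hypothetical model: records are split as evenly as possible, one task
    per partition, tasks run in waves of at most num_cores, and shuffle,
    skew and scheduling overhead are ignored.
    """
    if num_partitions < 1 or num_cores < 1:
        raise ValueError("num_partitions and num_cores must be positive")
    # The largest partition bounds how long the slowest task in a wave takes.
    largest = math.ceil(num_records / num_partitions)
    # Number of rounds needed to run every task on the available cores.
    waves = math.ceil(num_partitions / num_cores)
    return waves * largest * cost_per_record


# Same dataset, two partition counts, an assumed 8 cores.
print(stage_time(9000, 7, 8))  # 1286.0 (one wave of ~1286-record tasks)
print(stage_time(9000, 9, 8))  # 2000.0 (two waves of 1000-record tasks)
```

This model captures only task-level parallelism. With 8 cores, for example, it predicts that 9 partitions would be slower than 7 because the ninth task needs a second wave. It says nothing about shuffle cost, data skew, or memory pressure.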

In my case, I ran a couple of map/reduce jobs on the same dataset twice,
with two different partition counts: 7 and 9. I used a standalone cluster
with two workers, where the master runs on the same machine as one of the
workers.

Surprisingly, the map/reduce jobs ran almost 4X-5X faster with 9 partitions
than with 7! Does this mean that choosing the right number of partitions is
the key factor in Spark performance?

best,
/Shahab
