Hi,

I just wonder how number of partitions effect the performance in Spark!

Is it just the parallelism (more partitions, more parallel sub-tasks) that
improves the performance? or there exist other considerations?

In my case,I run couple of map/reduce jobs on same dataset two times with
two different partition numbers, 7 and 9. I used a stand alone cluster,
with two workers on each, where the master resides with the same machine as
one of the workers.

Surprisingly, the performance of map/reduce jobs in case of 9 partitions is
almost  4X-5X better than that of 7 partitions !??  Does it mean that
choosing right number of partitions is the key factor in the Spark
performance ?

best,
/Shahab

Reply via email to