Hi,
I'm wondering how the number of partitions affects performance in Spark.
Is it just the parallelism (more partitions, more parallel sub-tasks) that
improves performance, or are there other considerations?
In my case, I run a couple of map/reduce jobs on the same dataset two times with
two
factor that messed up your measurements :))
There can be instances where having more partitions is slower too.
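
To make that concrete, here is a rough back-of-the-envelope sketch in plain Python. It is not Spark code and not a measurement: the function name estimated_runtime, the fixed per-task overhead, and all the numbers are assumptions chosen only for illustration. It assumes tasks run in waves of one task per core and that every task pays the same fixed scheduling cost.

```python
import math


def estimated_runtime(total_work_s, num_partitions, num_cores, per_task_overhead_s):
    """Toy cost model (an assumption, not Spark's scheduler).

    Work is split evenly across partitions, tasks run in waves of
    num_cores at a time, and each task pays a fixed overhead.
    """
    waves = math.ceil(num_partitions / num_cores)
    task_time = total_work_s / num_partitions + per_task_overhead_s
    return waves * task_time


# Hypothetical numbers: 80 s of total work, 8 cores, 50 ms overhead per task.
for partitions in (1, 8, 64, 4096):
    print(partitions, round(estimated_runtime(80.0, partitions, 8, 0.05), 2))
```

Under these assumptions, going from 1 to 8 partitions helps a lot because all cores get used, but going far beyond the core count only adds per-task overhead, so the estimate gets worse again. A real job also has shuffle, skew and GC effects that this sketch ignores.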
On Mon, Nov 3, 2014 at 9:57 AM, shahab shahab.mok...@gmail.com wrote:
Hi,
I just wonder how number of partitions effect the performance in Spark!
Is it just the parallelism (more partitions