Yes, partitions matter. Usually you can use the default, which makes
one partition per input split. That is usually a good choice, because it
lets one task process one block of data, all of which is on one machine.
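As a rough illustration of "one partition per input split", here is a minimal sketch in plain Python (no Spark required). It assumes a 128 MB split size, that splits never span files, and that an empty file still yields one split; the function name `default_partition_count` is hypothetical, and the real split computation in Hadoop input formats has more details than this.

```python
import math

MB = 1024 * 1024


def default_partition_count(file_sizes_bytes, split_size_bytes=128 * MB):
    """Approximate the default partition count: one partition per input split.

    Simplifying assumptions (not the exact Hadoop logic): each file is cut
    into fixed-size splits, a split never spans two files, and an empty
    file still counts as one split.
    """
    return sum(
        max(1, math.ceil(size / split_size_bytes))
        for size in file_sizes_bytes
    )


# One 300 MB file (3 splits) plus one 100 MB file (1 split).
print(default_partition_count([300 * MB, 100 * MB]))
```

Under these assumptions, the partition count follows from the input layout rather than from the cluster size, which is why an explicitly chosen count can behave differently from the default.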
Reasons I could imagine why 9 partitions is faster than 7:
Probably: Your cluster can execute
Thanks, Sean, for the very useful comments. I now understand better what
could be the reasons my evaluations are messed up.
best,
/Shahab
On Mon, Nov 3, 2014 at 12:08 PM, Sean Owen so...@cloudera.com wrote:
Yes partitions matter. Usually you can use the default, which will
make a partition per