Re: How number of partitions effect the performance?

2014-11-03 Thread Sean Owen
Yes, partitions matter. Usually you can use the default, which makes one partition per input split. That is usually good, because it lets one task process one block of data, all of which will be on one machine. Reasons I could imagine why 9 partitions is faster than 7: Probably: Your cluster can execute
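
As a rough illustration of the "one partition per input split" default mentioned above, here is a minimal sketch that estimates the partition count from a file size and a split size. The function name `default_partition_count` and the 128 MB split size are assumptions made for illustration, not values from this thread. Real input formats can round differently (for example, by folding a small trailing remainder into the last split), so treat the result as an approximation.

```python
def default_partition_count(file_size_bytes: int, split_size_bytes: int) -> int:
    """Estimate the partition count when each input split becomes one partition.

    Hypothetical helper: it ignores format-specific rounding and any minimum
    partition setting, so it is an approximation only.
    """
    if split_size_bytes <= 0:
        raise ValueError("split_size_bytes must be positive")
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: a partial trailing split still counts as one.
    return (file_size_bytes + split_size_bytes - 1) // split_size_bytes


MB = 1024 * 1024

# Assumed sizes: a 1024 MB file read with 128 MB splits.
print(default_partition_count(1024 * MB, 128 * MB))  # 8
```

Under these assumptions each of the 8 partitions maps to one block, so one task reads one block's worth of data.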

Re: How number of partitions effect the performance?

2014-11-03 Thread shahab
Thanks, Sean, for the very useful comments. I now understand better what could be the reasons my evaluations are messed up.

Best,
/Shahab

On Mon, Nov 3, 2014 at 12:08 PM, Sean Owen so...@cloudera.com wrote:
Yes partitions matter. Usually you can use the default, which will make a partition per