Jeff Bean created MAPREDUCE-5323:
------------------------------------

             Summary: Min Spills For Combine Ignored
                 Key: MAPREDUCE-5323
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5323
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: task
            Reporter: Jeff Bean
            Priority: Minor


We've observed for some time that combiners always run when specified. However, 
there is a config called mapreduce.map.combine.minspills, which implies that 
the developer or administrator ought to be able to control when combiners are 
invoked.

I spelunked into the code and found this gem in MapTask.java:

if (combinerRunner == null || numSpills < minSpillsForCombine) {
  Merger.writeFile(kvIter, writer, reporter, job);
} else {
  combineCollector.setWriter(writer);
  combinerRunner.combine(kvIter, combineCollector);
}

That looks way buggy to me. If ( A || B ) is made false by A then B is never 
executed. I spelunked around the code some more and it looks like 
combinerRunner is never null except on reflection failure. So it looks like the 
intention is for minSpillsForCombine to be respected, but due to this logic 
error it's totally ignored.
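
The short-circuit reasoning above can be checked in isolation with a minimal 
sketch. The class MinSpillsCheck and the method runsCombiner are hypothetical 
stand-ins, not part of MapTask.java, and the threshold of 3 is chosen only for 
illustration. In Java, || skips its right operand only when the left operand 
is true, so in this sketch a non-null runner still reaches the numSpills 
comparison. The sketch covers only the quoted condition; it says nothing about 
any other place where the combiner might be invoked.

```java
public class MinSpillsCheck {
    // Hypothetical stand-in for the condition quoted from MapTask.java.
    // Returns true when the else branch (the combiner path) would be taken.
    static boolean runsCombiner(Object combinerRunner, int numSpills,
                                int minSpillsForCombine) {
        if (combinerRunner == null || numSpills < minSpillsForCombine) {
            return false; // plain write path (Merger.writeFile in the quoted code)
        } else {
            return true;  // combiner path (combinerRunner.combine in the quoted code)
        }
    }

    public static void main(String[] args) {
        Object runner = new Object(); // any non-null value stands in for a combiner
        int minSpills = 3;            // assumed threshold, for illustration only

        // No combiner configured: left operand is true, right operand is skipped.
        System.out.println(runsCombiner(null, 5, minSpills));   // false
        // Combiner present, too few spills: right operand is evaluated and true.
        System.out.println(runsCombiner(runner, 1, minSpills)); // false
        // Combiner present, enough spills: both operands false, combiner path.
        System.out.println(runsCombiner(runner, 3, minSpills)); // true
    }
}
```

If the real branch behaves like this sketch, the always-run behavior we observe 
would have to come from somewhere other than this condition, which seems worth 
confirming against the actual source before changing it.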
