Jeff Bean created MAPREDUCE-5323:
------------------------------------

             Summary: Min Spills For Combine Ignored
                 Key: MAPREDUCE-5323
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5323
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: task
            Reporter: Jeff Bean
            Priority: Minor

We've observed for some time that combiners always run when specified. However, there is a config property called mapreduce.map.combine.minspills, which implies that the developer or administrator ought to be able to control when combiners are invoked.

I spelunked into the code and found this gem in MapTask.java:

    if (combinerRunner == null || numSpills < minSpillsForCombine) {
      Merger.writeFile(kvIter, writer, reporter, job);
    } else {
      combineCollector.setWriter(writer);
      combinerRunner.combine(kvIter, combineCollector);
    }

That looks way buggy to me. If ( A || B ) is made false by A then B is never executed. I spelunked around the code some more, and it looks like combinerRunner is never null except on reflection failure. So it looks like the intention is for minSpillsForCombine to be respected, but due to this logic error it's totally ignored.
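
The short-circuit claim above is worth checking against Java's actual semantics. The sketch below is not the MapTask code. It is a hypothetical stand-in (the class CombineCheck, the methods usesCombiner and tooFewSpills, and a plain Object in place of the combiner runner are all assumptions made for illustration) that reproduces only the shape of the condition and counts how often the spill-count comparison is evaluated. In Java, || skips its right operand only when the left operand is true, so with a non-null runner the numSpills < minSpillsForCombine comparison is still evaluated.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the branch quoted from MapTask.java; not Hadoop code.
final class CombineCheck {
    // Counts how often the spill-count comparison is actually evaluated.
    static final AtomicInteger comparisons = new AtomicInteger();

    // Right-hand operand of the ||, wrapped so each evaluation is observable.
    static boolean tooFewSpills(int numSpills, int minSpillsForCombine) {
        comparisons.incrementAndGet();
        return numSpills < minSpillsForCombine;
    }

    // Returns true when the else-branch (the combiner) would be taken.
    static boolean usesCombiner(Object combinerRunner, int numSpills, int minSpillsForCombine) {
        if (combinerRunner == null || tooFewSpills(numSpills, minSpillsForCombine)) {
            return false; // corresponds to Merger.writeFile(...)
        }
        return true;      // corresponds to combinerRunner.combine(...)
    }

    public static void main(String[] args) {
        Object runner = new Object(); // non-null, as in the normal case
        System.out.println(usesCombiner(runner, 1, 3)); // false: below the threshold
        System.out.println(usesCombiner(runner, 5, 3)); // true: threshold reached
        System.out.println(usesCombiner(null, 5, 3));   // false: left operand decides
        System.out.println(comparisons.get());          // 2: skipped only for the null runner
    }
}
```

Under these assumptions, the quoted branch does take the spill count into account whenever the runner is non-null. If combiners are nevertheless observed to run every time, the cause would have to lie on a code path this sketch does not model.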