[ https://issues.apache.org/jira/browse/MAPREDUCE-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588579#comment-13588579 ]
Hadoop QA commented on MAPREDUCE-4892:
--------------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12571219/MAPREDUCE-4892.1.alt.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 tests included appear to have a timeout.{color}

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3369//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3369//console

This message is automatically generated.

> CombineFileInputFormat node input split can be skewed on small clusters
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4892
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4892
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>             Fix For: 3.0.0
>
>         Attachments: MAPREDUCE-4892.1.alt.patch, MAPREDUCE-4892.1.alt.patch, MAPREDUCE-4892.1.patch
>
>
> The CombineFileInputFormat split generation logic tries to group blocks by
> node in order to create splits. It iterates through the nodes and creates
> splits on each one until the node no longer has enough blocks left to
> form a valid split. If the first few nodes hold a lot of blocks, they
> can end up with a disproportionately large share of the total number of
> splits created. This can result in poor locality of maps.
> This problem is likely to happen on small clusters, where it's easier to
> create a skew in the distribution of blocks on nodes.
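
The skew described in the issue can be reproduced with a small simulation. The sketch below is not the CombineFileInputFormat code: the class and method names (SplitSkewSketch, greedySplitsPerNode, fullyReplicated) are hypothetical, and a split is counted in blocks rather than measured in bytes against the configured split sizes. It assumes a 3-node cluster with replication factor 3, so every node holds a replica of every block.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of greedy node-by-node split grouping.
class SplitSkewSketch {

    /**
     * Visits nodes in iteration order. On each node, groups blocks that are
     * not yet assigned into splits of blocksPerSplit blocks, and stops on that
     * node once too few unassigned blocks remain. Returns splits per node.
     */
    static Map<String, Integer> greedySplitsPerNode(
            Map<String, List<Integer>> blocksByNode, int blocksPerSplit) {
        Set<Integer> assigned = new HashSet<>();
        Map<String, Integer> splitsPerNode = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : blocksByNode.entrySet()) {
            List<Integer> pending = new ArrayList<>();
            int splits = 0;
            for (int block : entry.getValue()) {
                if (assigned.contains(block)) {
                    continue; // already consumed by a split on an earlier node
                }
                pending.add(block);
                if (pending.size() == blocksPerSplit) {
                    assigned.addAll(pending); // commit one full split
                    pending.clear();
                    splits++;
                }
            }
            // Blocks left in 'pending' stay unassigned for later nodes.
            splitsPerNode.put(entry.getKey(), splits);
        }
        return splitsPerNode;
    }

    /** Builds a layout where every node holds a replica of every block. */
    static Map<String, List<Integer>> fullyReplicated(int nodes, int blocks) {
        Map<String, List<Integer>> byNode = new LinkedHashMap<>();
        for (int n = 1; n <= nodes; n++) {
            List<Integer> ids = new ArrayList<>();
            for (int b = 0; b < blocks; b++) {
                ids.add(b);
            }
            byNode.put("node" + n, ids);
        }
        return byNode;
    }

    public static void main(String[] args) {
        // 3 nodes, 12 blocks, 2 blocks per split.
        System.out.println(greedySplitsPerNode(fullyReplicated(3, 12), 2));
        // prints {node1=6, node2=0, node3=0}
    }
}
```

Under these assumptions the first node visited receives all six splits and the other two receive none, even though each holds the same data locally. That is the disproportionate share the description refers to.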