[ https://issues.apache.org/jira/browse/HADOOP-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553498 ]
Hadoop QA commented on HADOOP-2437:
-----------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12371962/HADOOP-2437_2_20071220.patch
against trunk revision r605672.

@author +1. The patch does not contain any @author tags.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new compiler warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1396/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1396/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1396/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1396/console

This message is automatically generated.

> final map output not evenly distributed across multiple disks
> -------------------------------------------------------------
>
> Key: HADOOP-2437
> URL: https://issues.apache.org/jira/browse/HADOOP-2437
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Christian Kunz
> Assignee: Arun C Murthy
> Priority: Blocker
> Fix For: 0.15.2
>
> Attachments: HADOOP-2437_1_20071218.patch, HADOOP-2437_1_20071218.patch,
> HADOOP-2437_2_20071220.patch
>
>
> It seems that, for a particular job, the final merge of map-task output
> does not select its output location in a random fashion.
> As a result, a job with many map tasks eventually runs out of TaskTrackers
> asking for more tasks: the disk holding most of the map outputs ends up
> with less free space than mapred.local.dir.minspacestart.
> Maybe the starting point of the round-robin selection among the multiple
> locations should be randomized.
>
> In our case: 110,000 maps, each producing about 3 GB of final output, on a
> 1,300-node cluster.
> After about 79,000 maps had been processed, the final map outputs
> ('file.out') were distributed across the 4 locations as follows:
>
> location1: 24,000
> location2: 25
> location3: 55,000
> location4: 7
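The reporter's tentative suggestion, randomizing where round-robin selection starts, can be illustrated with a small simulation. This is a hypothetical sketch, not Hadoop's LocalDirAllocator: the class RoundRobinDirPicker and its methods are invented for illustration. It also assumes, purely as a simplification, that every task starts its own round-robin at the first directory and makes the same number of allocations (e.g. spills) before writing its final 'file.out'. Under that assumption a fixed start sends every final output to the same directory, while a random start spreads them roughly evenly.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch (not Hadoop's LocalDirAllocator): picks local
 * directories round-robin, optionally starting from a random index.
 */
class RoundRobinDirPicker {
    private final int numDirs;
    private int next;

    RoundRobinDirPicker(int numDirs, boolean randomStart, Random rng) {
        if (numDirs <= 0) {
            throw new IllegalArgumentException("numDirs must be positive");
        }
        this.numDirs = numDirs;
        // Randomizing the starting index is the remedy tentatively suggested in the report.
        this.next = randomStart ? rng.nextInt(numDirs) : 0;
    }

    /** Returns the index of the directory to use for the next file. */
    int pick() {
        int dir = next;
        next = (next + 1) % numDirs;
        return dir;
    }

    /**
     * Simulates tasks that each use a fresh picker, make the same number of
     * intermediate allocations, then place one final output.
     * Returns the number of final outputs that landed on each directory.
     */
    static int[] simulate(int tasks, int numDirs, int allocationsBeforeFinal,
                          boolean randomStart, long seed) {
        Random rng = new Random(seed);
        int[] finalOutputs = new int[numDirs];
        for (int t = 0; t < tasks; t++) {
            RoundRobinDirPicker picker = new RoundRobinDirPicker(numDirs, randomStart, rng);
            for (int i = 0; i < allocationsBeforeFinal; i++) {
                picker.pick(); // intermediate files, e.g. spills
            }
            finalOutputs[picker.pick()]++; // the final 'file.out'
        }
        return finalOutputs;
    }

    public static void main(String[] args) {
        int[] fixed = simulate(10_000, 4, 3, false, 42L);
        int[] random = simulate(10_000, 4, 3, true, 42L);
        System.out.println("fixed start:  " + Arrays.toString(fixed));
        System.out.println("random start: " + Arrays.toString(random));
    }
}
```

With a fixed start and three intermediate allocations, every simulated task writes its final output to directory index 3; with a random start, each of the 4 directories receives roughly a quarter of them. Whether this simplified model matches the actual cause of the skew reported above is not established here.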