[ https://issues.apache.org/jira/browse/HADOOP-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552884 ]
Christian Kunz commented on HADOOP-2437:
----------------------------------------

+1 (similar patch is working fine)

> final map output not evenly distributed across multiple disks
> -------------------------------------------------------------
>
>                 Key: HADOOP-2437
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2437
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.15.2
>
>         Attachments: HADOOP-2437_1_20071218.patch
>
>
> It seems that the final merge output of map tasks for a particular job does
> not select the output location in random fashion.
> This results in a job with a lot of map tasks eventually running out of
> taskTrackers asking for more tasks because the disk with most of the map
> outputs eventually has less disk space than specified by
> mapred.local.dir.minspacestart.
> Maybe the start of round-robin selection of multiple locations should be
> randomized.
> In our case:
> 110,000 maps, each about 3GB final output, on a 1300 node cluster.
> Out of 4 locations and after processing about 79,000 maps, the selection for
> final map outputs 'file.out' looked like:
> location1: 24,000
> location2: 25
> location3: 55,000
> location4: 7
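
For illustration only, the suggestion in the description (randomizing where round-robin selection starts) might look roughly like the sketch below. It is not the attached patch and not Hadoop's actual allocator code: the class name RandomStartRoundRobin, the method nextDir, and the directory names are all hypothetical, and it ignores free-space checks such as mapred.local.dir.minspacestart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: round-robin over local directories, starting at a
// random index so that independent tasks do not all begin with the same disk.
public class RandomStartRoundRobin {
    private final List<String> dirs;
    private int next;

    public RandomStartRoundRobin(List<String> dirs, Random rng) {
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("need at least one directory");
        }
        this.dirs = new ArrayList<>(dirs);
        // Randomize only the starting point; the order after that is fixed.
        this.next = rng.nextInt(dirs.size());
    }

    // Returns the current directory and advances to the next one, wrapping around.
    public synchronized String nextDir() {
        String dir = dirs.get(next);
        next = (next + 1) % dirs.size();
        return dir;
    }

    public static void main(String[] args) {
        // Placeholder directory names, not taken from the issue.
        List<String> locations = List.of("location1", "location2", "location3", "location4");
        RandomStartRoundRobin allocator = new RandomStartRoundRobin(locations, new Random());
        for (int i = 0; i < 8; i++) {
            System.out.println(allocator.nextDir());
        }
    }
}
```

If every map task constructs its own allocator and writes only one final output, a fixed starting index would send all of those outputs to the same location; a random start spreads them across locations on average. Whether that matches the cause of the skew reported here is an assumption, not something this message confirms.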