[ https://issues.apache.org/jira/browse/HADOOP-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2437:
----------------------------------

    Status: Patch Available  (was: Open)

> final map output not evenly distributed across multiple disks
> -------------------------------------------------------------
>
>                 Key: HADOOP-2437
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2437
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.15.2
>
>         Attachments: HADOOP-2437_1_20071218.patch, 
> HADOOP-2437_1_20071218.patch
>
>
> It seems that the final merge output of map tasks for a particular job does 
> not select the output location in random fashion.
> This results in a job with a lot of map tasks eventually running out of 
> taskTrackers asking for more tasks because the disk with most of the map 
> outputs eventually has less disk space than specified by 
> mapred.local.dir.minspacestart.
> Maybe the start of round-robin selection of multiple locations should be 
> randomized.
> In our case:
> 110,000 maps, each about 3GB final output, on a 1300 node cluster.
> Out of 4 locations and after processing about 79,000 maps, the selection for 
> final map outputs 'file.out' looked like:
> location1: 24,000
> location2: 25
> location3: 55,000
> location4: 7

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
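
The suggestion in the description, randomizing where round-robin selection starts, could look roughly like the sketch below. This is only an illustration, not the attached patch and not Hadoop's actual allocator: the class name RandomStartRoundRobin and the method nextDir are hypothetical, and the sketch ignores free-space checks entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical helper (not Hadoop code): rotates over a fixed list of
 * local directories, starting at a random position.
 */
final class RandomStartRoundRobin {
    private final List<String> dirs;
    private int next;

    RandomStartRoundRobin(List<String> dirs, Random rng) {
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("need at least one directory");
        }
        this.dirs = new ArrayList<>(dirs);
        // Random starting offset, so that independent allocators (for
        // example one per task) do not all begin at the first directory.
        this.next = rng.nextInt(this.dirs.size());
    }

    /** Returns the next directory in rotation and advances the cursor. */
    synchronized String nextDir() {
        String dir = dirs.get(next);
        next = (next + 1) % dirs.size();
        return dir;
    }
}
```

With a fixed starting index, every allocator that makes the same number of selections before writing its final output lands on the same location, which would be consistent with the skew reported above. With a random start, those final selections spread across all locations on average.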