[ https://issues.apache.org/jira/browse/HADOOP-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-2437:
----------------------------------
Status: Open (was: Patch Available)
> final map output not evenly distributed across multiple disks
> -------------------------------------------------------------
>
> Key: HADOOP-2437
> URL: https://issues.apache.org/jira/browse/HADOOP-2437
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Christian Kunz
> Assignee: Arun C Murthy
> Priority: Blocker
> Fix For: 0.15.2
>
> Attachments: HADOOP-2437_1_20071218.patch
>
>
> It seems that the final merge output of map tasks for a particular job does
> not select its output location at random.
> As a result, in a job with many map tasks, the taskTrackers eventually stop
> asking for more tasks, because the disk holding most of the map outputs ends
> up with less free space than specified by mapred.local.dir.minspacestart.
> Maybe the starting point of the round-robin selection among the multiple
> locations should be randomized.
> In our case:
> 110,000 maps, each about 3GB final output, on a 1300 node cluster.
> Out of 4 locations and after processing about 79,000 maps, the selection for
> final map outputs 'file.out' looked like:
> location1: 24,000
> location2: 25
> location3: 55,000
> location4: 7
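
The suggestion in the description (round-robin selection with a randomized starting point) can be sketched as below. This is only an illustration of the idea, not the attached patch and not Hadoop's actual allocation code; the class name RandomStartRoundRobin and the helper distribute are hypothetical, and the sketch ignores free-space checks such as mapred.local.dir.minspacestart.

```java
import java.util.Random;

/**
 * Hypothetical sketch: round-robin over a fixed number of local
 * directories, starting at a randomly chosen index.
 * Not Hadoop's real implementation.
 */
final class RandomStartRoundRobin {
    private final int numDirs;
    private int next;

    RandomStartRoundRobin(int numDirs, Random rng) {
        if (numDirs <= 0) {
            throw new IllegalArgumentException("numDirs must be positive");
        }
        this.numDirs = numDirs;
        // Randomize only the starting point; after that, plain round-robin.
        this.next = rng.nextInt(numDirs);
    }

    /** Returns the index of the directory to use for the next output file. */
    synchronized int next() {
        int chosen = next;
        next = (next + 1) % numDirs;
        return chosen;
    }

    /** Counts how many of numFiles outputs land in each directory. */
    static int[] distribute(int numDirs, int numFiles, Random rng) {
        RandomStartRoundRobin selector = new RandomStartRoundRobin(numDirs, rng);
        int[] counts = new int[numDirs];
        for (int i = 0; i < numFiles; i++) {
            counts[selector.next()]++;
        }
        return counts;
    }
}
```

With 4 locations and 79,000 outputs handled by a single selector, distribute(4, 79000, new Random()) places 19,750 files in each location regardless of the starting index. In a real cluster each task would create its own selector, so the random start is what spreads the first choice across disks.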