[ https://issues.apache.org/jira/browse/HADOOP-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2437:
----------------------------------

    Status: Patch Available  (was: Open)

> final map output not evenly distributed across multiple disks
> -------------------------------------------------------------
>
>                 Key: HADOOP-2437
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2437
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.15.2
>
>         Attachments: HADOOP-2437_1_20071218.patch, 
> HADOOP-2437_1_20071218.patch
>
>
> The final merge output of map tasks for a particular job does not appear to 
> select its output location at random.
> As a result, a job with many map tasks eventually runs out of taskTrackers 
> asking for more tasks, because the disk holding most of the map outputs 
> ends up with less free space than specified by 
> mapred.local.dir.minspacestart.
> Maybe the starting point of the round-robin selection across multiple 
> locations should be randomized.
> In our case:
> 110,000 maps, each about 3GB final output, on a 1300 node cluster.
> Out of 4 locations and after processing about 79,000 maps, the selection for 
> final map outputs 'file.out' looked like:
> location1: 24,000
> location2: 25
> location3: 55,000
> location4: 7
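
The suggestion in the description, randomizing where round-robin selection
starts, could look roughly like the sketch below. It is an illustration only,
not the attached patch and not Hadoop's actual allocation code. The class name
RandomStartRoundRobin, the method pick, and the per-directory free-space array
are all hypothetical names introduced for this example.

```java
import java.util.Random;

/** Hypothetical sketch: round-robin directory selection with a random start. */
public class RandomStartRoundRobin {
    // Assumed free space per local directory, in arbitrary units.
    private final long[] freeSpace;
    // Index at which the next search begins.
    private int next;

    public RandomStartRoundRobin(long[] freeSpace, Random rng) {
        this.freeSpace = freeSpace.clone();
        // Start at a random directory so that independent selectors
        // do not all begin at index 0 and favor the same disk.
        this.next = rng.nextInt(freeSpace.length);
    }

    /**
     * Picks the next directory with at least `required` free space,
     * scanning round-robin from the current position.
     * Returns the directory index, or -1 if no directory has room.
     */
    public int pick(long required) {
        int n = freeSpace.length;
        for (int i = 0; i < n; i++) {
            int candidate = (next + i) % n;
            if (freeSpace[candidate] >= required) {
                freeSpace[candidate] -= required;
                // Continue after the chosen directory next time.
                next = (candidate + 1) % n;
                return candidate;
            }
        }
        return -1;
    }
}
```

With four equally sized directories and equally sized outputs, consecutive
calls to pick cycle through all four locations regardless of the random
starting index, so no single disk fills up far ahead of the others.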

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
