[jira] Updated: (HADOOP-2437) final map output not evenly distributed across multiple disks

Arun C Murthy (JIRA) Wed, 19 Dec 2007 08:37:06 -0800

     [ https://issues.apache.org/jira/browse/HADOOP-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Arun C Murthy updated HADOOP-2437:
----------------------------------

    Status: Open  (was: Patch Available)

Ok, org.apache.hadoop.fs.TestLocalDirAllocator.test3 failed with: 
{noformat}
junit.framework.AssertionFailedError
        at org.apache.hadoop.fs.TestLocalDirAllocator.validateTempDirCreation(TestLocalDirAllocator.java:71)
        at org.apache.hadoop.fs.TestLocalDirAllocator.test3(TestLocalDirAllocator.java:142)
{noformat}


The problem is that the test case assumes the round-robin selection always
starts at directory _zero_; I'll put up another patch fixing it shortly.
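A start-agnostic version of such a check could compare each pick against the first pick instead of against directory 0. This is a hypothetical sketch: `RoundRobinCheck` and `isRoundRobin` are illustrative names, not part of TestLocalDirAllocator or Hadoop, and the picks are plain directory indices rather than whatever paths the real test validates.

```java
/**
 * Hypothetical helper (not part of Hadoop): checks that a sequence of
 * directory indices advances round-robin over n directories, without
 * assuming that the first pick is index 0.
 */
class RoundRobinCheck {

    /** Returns true if picks[i] == (picks[0] + i) % n for every i. */
    static boolean isRoundRobin(int[] picks, int n) {
        if (picks.length == 0) {
            return true;
        }
        int start = picks[0];
        // The starting index may be anything, but it must be a valid directory.
        if (start < 0 || start >= n) {
            return false;
        }
        for (int i = 0; i < picks.length; i++) {
            if (picks[i] != (start + i) % n) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isRoundRobin(new int[] {0, 1, 2, 0, 1}, 3)); // true
        System.out.println(isRoundRobin(new int[] {2, 0, 1, 2, 0}, 3)); // true
        System.out.println(isRoundRobin(new int[] {2, 2, 0}, 3));       // false
    }
}
```

Anchoring on the first pick keeps the test meaningful (the order must still rotate) while tolerating any starting directory.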


> final map output not evenly distributed across multiple disks
> -------------------------------------------------------------
>
>                 Key: HADOOP-2437
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2437
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.15.2
>
>         Attachments: HADOOP-2437_1_20071218.patch, HADOOP-2437_1_20071218.patch
>
>
> It seems that the final merge output of map tasks for a particular job does
> not select its output location in a random fashion.
> This results in a job with many map tasks eventually running out of
> TaskTrackers that ask for more tasks, because the disk holding most of the map
> outputs ends up with less free space than specified by
> mapred.local.dir.minspacestart.
> Maybe the start of the round-robin selection among multiple locations should
> be randomized.
> In our case:
> 110,000 maps, each with about 3GB of final output, on a 1,300-node cluster.
> Out of 4 locations, after processing about 79,000 maps, the selection of
> locations for the final map output 'file.out' looked like:
> location1: 24,000
> location2: 25
> location3: 55,000
> location4: 7
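One plausible reading of the skew above, stated here as an assumption: if every task starts with a fresh allocator whose round-robin cursor is 0 and makes the same number of allocations before writing 'file.out', the final output always lands in the same directory. The sketch below simulates that with a hypothetical `RandomStartRoundRobin` class (not Hadoop's LocalDirAllocator; it ignores free-space checks such as mapred.local.dir.minspacestart) and compares a fixed start with a randomized one.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch, not Hadoop's LocalDirAllocator: chooses among
 * numDirs local directories round-robin, optionally starting at a random
 * index instead of always at 0.
 */
class RandomStartRoundRobin {
    private final int numDirs;
    private int next;

    RandomStartRoundRobin(int numDirs, Random rng) {
        if (numDirs <= 0) {
            throw new IllegalArgumentException("numDirs must be positive");
        }
        this.numDirs = numDirs;
        // rng == null models the fixed-start behaviour (always begin at 0).
        this.next = (rng == null) ? 0 : rng.nextInt(numDirs);
    }

    /** Returns the directory index to use and advances the cursor. */
    synchronized int pick() {
        int chosen = next;
        next = (next + 1) % numDirs;
        return chosen;
    }

    /**
     * Simulates `tasks` tasks, each with a fresh allocator that makes
     * `spillsPerTask` allocations before its final output, and counts
     * which directory receives each final output.
     */
    static int[] finalOutputCounts(int tasks, int numDirs, int spillsPerTask, Random rng) {
        int[] counts = new int[numDirs];
        for (int t = 0; t < tasks; t++) {
            RandomStartRoundRobin alloc = new RandomStartRoundRobin(numDirs, rng);
            for (int s = 0; s < spillsPerTask; s++) {
                alloc.pick(); // intermediate spill files
            }
            counts[alloc.pick()]++; // final map output
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] fixedStart = finalOutputCounts(10_000, 4, 2, null);
        int[] randomStart = finalOutputCounts(10_000, 4, 2, new Random(42));
        System.out.println("fixed start:  " + Arrays.toString(fixedStart)); // fixed start:  [0, 0, 10000, 0]
        System.out.println("random start: " + Arrays.toString(randomStart));
    }
}
```

With two spill allocations per task, the fixed-start run sends every final output to directory 2, while the random-start run spreads them roughly evenly across all four. The simulation is deliberately simplified; real tasks vary in how many allocations they make, so the observed counts are less extreme than this.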

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
