[jira] Commented: (HADOOP-2437) final map output not evenly distributed across multiple disks

Hadoop QA (JIRA) Wed, 19 Dec 2007 13:10:06 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553498 ]


Hadoop QA commented on HADOOP-2437:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12371962/HADOOP-2437_2_20071220.patch
against trunk revision r605672.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1396/testReport/
Findbugs warnings: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1396/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1396/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1396/console

This message is automatically generated.

> final map output not evenly distributed across multiple disks
> -------------------------------------------------------------
>
>                 Key: HADOOP-2437
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2437
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.15.2
>
>         Attachments: HADOOP-2437_1_20071218.patch, 
> HADOOP-2437_1_20071218.patch, HADOOP-2437_2_20071220.patch
>
>
> It seems that the final merge output of map tasks for a particular job does 
> not select the output location in a random fashion.
> As a result, a job with a lot of map tasks eventually runs out of 
> taskTrackers asking for more tasks, because the disk holding most of the map 
> outputs eventually has less free space than specified by 
> mapred.local.dir.minspacestart.
> Maybe the starting point of the round-robin selection across multiple 
> locations should be randomized.
> In our case:
> 110,000 maps, each about 3GB final output, on a 1300 node cluster.
> Out of 4 locations and after processing about 79,000 maps, the selection for 
> final map outputs 'file.out' looked like:
> location1: 24,000
> location2: 25
> location3: 55,000
> location4: 7
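
The suggestion in the description (randomizing where the round-robin selection starts) could look roughly like the sketch below. It is not the attached patch and does not use any Hadoop API: the class name RandomStartRoundRobin and the location names are hypothetical, and free-space checks such as mapred.local.dir.minspacestart are left out. It assumes the skew comes from every selector starting at the same index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only; not the code from the attached patch.
class RandomStartRoundRobin {
    private final String[] dirs;
    private int next;

    RandomStartRoundRobin(String[] dirs, Random rng) {
        if (dirs.length == 0) {
            throw new IllegalArgumentException("need at least one directory");
        }
        this.dirs = dirs.clone();
        // Start at a random index so that separate selectors do not all
        // send their first file to the same location.
        this.next = rng.nextInt(dirs.length);
    }

    // Return the current location, then advance to the next one in order.
    synchronized String pick() {
        String dir = dirs[next];
        next = (next + 1) % dirs.length;
        return dir;
    }

    public static void main(String[] args) {
        String[] dirs = {"location1", "location2", "location3", "location4"};
        Map<String, Integer> firstPicks = new HashMap<>();
        Random rng = new Random();
        // Simulate many tasks, each creating its own selector and writing
        // a single final output file.
        for (int task = 0; task < 100_000; task++) {
            RandomStartRoundRobin picker = new RandomStartRoundRobin(dirs, rng);
            firstPicks.merge(picker.pick(), 1, Integer::sum);
        }
        System.out.println(firstPicks);
    }
}
```

With a fixed starting index, every single-file selector in this simulation would choose the same location; with a random start the first picks spread across all four.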

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
