[ https://issues.apache.org/jira/browse/HADOOP-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-2437:
----------------------------------
Status: Open (was: Patch Available)
Ok, org.apache.hadoop.fs.TestLocalDirAllocator.test3 failed with:
{noformat}
junit.framework.AssertionFailedError
    at org.apache.hadoop.fs.TestLocalDirAllocator.validateTempDirCreation(TestLocalDirAllocator.java:71)
    at org.apache.hadoop.fs.TestLocalDirAllocator.test3(TestLocalDirAllocator.java:142)
{noformat}
The problem is that the test case assumes that the start of the round-robin is
_zero_; I'll put up another patch fixing it shortly...
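For illustration only, here is a minimal sketch of why a test that hard-codes a start of zero breaks once the start of the round-robin is randomized, and how a check can be written without that assumption. The class RoundRobinDirPicker and its methods are hypothetical names made up for this sketch; this is not the code of LocalDirAllocator or of the attached patch, and it assumes the start is randomized as the issue description suggests.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical round-robin picker over N local directories (not Hadoop code). */
class RoundRobinDirPicker {
    private final int numDirs;
    private int next; // index of the directory to hand out next

    RoundRobinDirPicker(int numDirs, Random random) {
        if (numDirs <= 0) {
            throw new IllegalArgumentException("numDirs must be positive");
        }
        this.numDirs = numDirs;
        // Assumption: the start is randomized so that independent pickers
        // do not all begin at index 0.
        this.next = random.nextInt(numDirs);
    }

    /** Returns the next directory index and advances the cursor cyclically. */
    int nextDir() {
        int chosen = next;
        next = (next + 1) % numDirs;
        return chosen;
    }

    /** Start-agnostic check: every pick must follow the previous one cyclically. */
    static boolean isRoundRobin(int[] picks, int numDirs) {
        for (int i = 1; i < picks.length; i++) {
            if (picks[i] != (picks[i - 1] + 1) % numDirs) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RoundRobinDirPicker picker = new RoundRobinDirPicker(4, new Random());
        int[] picks = new int[8];
        for (int i = 0; i < picks.length; i++) {
            picks[i] = picker.nextDir();
        }
        // The first index varies from run to run; the cyclic order does not.
        System.out.println(Arrays.toString(picks));
        System.out.println("round-robin: " + isRoundRobin(picks, 4));
    }
}
```

A test written this way asserts the relative order of consecutive picks rather than the absolute index of the first one, so it holds for any starting point.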
> final map output not evenly distributed across multiple disks
> -------------------------------------------------------------
>
> Key: HADOOP-2437
> URL: https://issues.apache.org/jira/browse/HADOOP-2437
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Christian Kunz
> Assignee: Arun C Murthy
> Priority: Blocker
> Fix For: 0.15.2
>
> Attachments: HADOOP-2437_1_20071218.patch,
> HADOOP-2437_1_20071218.patch
>
>
> It seems that the final merge output of map tasks for a particular job does
> not select the output location in random fashion.
> As a result, a job with a lot of map tasks eventually runs out of
> taskTrackers asking for more tasks, because the disk holding most of the map
> outputs eventually has less free space than specified by
> mapred.local.dir.minspacestart.
> Maybe the start of round-robin selection of multiple locations should be
> randomized.
> In our case:
> 110,000 maps, each about 3GB final output, on a 1300 node cluster.
> Out of 4 locations and after processing about 79,000 maps, the selection for
> final map outputs 'file.out' looked like:
> location1: 24,000
> location2: 25
> location3: 55,000
> location4: 7
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.