[
https://issues.apache.org/jira/browse/HADOOP-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arun C Murthy updated HADOOP-1984:
----------------------------------
Status: Open (was: Patch Available)
Amar, some comments:
- Please specify the 'unit' (seconds) for {{mapred.reduce.copy.backoff}} in
hadoop-default.xml.
- I suggest we use integer arithmetic where possible:
{noformat}
+ long currentBackOff =
+ BACKOFF_INIT
+ * (long)Math.pow(BACKOFF_EXPONENTIAL_BASE,
+ noFailedFetches.intValue() - 1);
{noformat}
is actually:
{noformat}
+ long currentBackOff = (1 << (noFailedFetches.intValue() + 1));
{noformat}
given that the base is hard-coded as 2. It keeps things more readable and
easier to maintain.
I'm pretty sure we can do this to calculate {{maxFetchRetriesPerMap}} too...
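To illustrate the equivalence: with the base hard-coded as 2, the stated shift form implies a BACKOFF_INIT of 4, since BACKOFF_INIT * 2^(n-1) == 2^(n+1) only when BACKOFF_INIT == 4. A small self-contained sketch (class and method names are hypothetical, not from the patch) comparing the two forms:

```java
// Hypothetical sketch comparing the Math.pow form from the patch with the
// proposed integer-shift form. Assumes BACKOFF_INIT == 4 and base == 2,
// which is what makes the two expressions equal.
public class BackoffEquivalence {
    static final long BACKOFF_INIT = 4L;            // assumed initial backoff
    static final int BACKOFF_EXPONENTIAL_BASE = 2;  // hard-coded base per the patch

    // Floating-point form, as in the patch under review.
    static long backoffPow(int noFailedFetches) {
        return BACKOFF_INIT
                * (long) Math.pow(BACKOFF_EXPONENTIAL_BASE, noFailedFetches - 1);
    }

    // Proposed integer-arithmetic form: 4 * 2^(n-1) == 2^(n+1) == 1 << (n+1).
    static long backoffShift(int noFailedFetches) {
        return 1L << (noFailedFetches + 1);
    }

    public static void main(String[] args) {
        // Both forms agree for any plausible failed-fetch count.
        for (int n = 1; n <= 20; n++) {
            if (backoffPow(n) != backoffShift(n)) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println("pow and shift forms agree for n=1..20");
    }
}
```

The shift form also avoids the double round-trip of Math.pow, though either way the exponent should probably be capped so the shift cannot overflow a long for very large failure counts.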
> some reducer stuck at copy phase and progress extremely slowly
> --------------------------------------------------------------
>
> Key: HADOOP-1984
> URL: https://issues.apache.org/jira/browse/HADOOP-1984
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1984-simple.patch, HADOOP-1984-simple.patch,
> HADOOP-1984.patch
>
>
> In many cases, some reducers got stuck at the copy phase, progressing extremely
> slowly.
> The entire cluster seems to be doing nothing. This causes very bad long tails in
> otherwise well-tuned map/red jobs.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.