[ https://issues.apache.org/jira/browse/HADOOP-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1984:
----------------------------------

    Status: Open  (was: Patch Available)

Amar, some comments:

- Please specify the unit (seconds) for {{mapred.reduce.copy.backoff}} in hadoop-default.xml.
- I suggest we use integer arithmetic where possible:
{noformat}
+        long currentBackOff =
+          BACKOFF_INIT
+          * (long)Math.pow(BACKOFF_EXPONENTIAL_BASE,
+                           noFailedFetches.intValue() - 1);
{noformat}
is actually:
{noformat}
+        long currentBackOff = (1 << (noFailedFetches.intValue() + 1));
{noformat}
given that the base is hard-coded as 2. It keeps things more readable and easier to maintain. I'm pretty sure we can do the same to calculate {{maxFetchRetriesPerMap}} too...

> some reducer stuck at copy phase and progress extremely slowly
> --------------------------------------------------------------
>
>                 Key: HADOOP-1984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1984
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1984-simple.patch, HADOOP-1984-simple.patch, HADOOP-1984.patch
>
>
> In many cases, some reducers get stuck in the copy phase, progressing extremely slowly.
> The entire cluster seems to be doing nothing. This causes very bad long tails on otherwise well-tuned map/reduce jobs.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
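The integer-arithmetic rewrite suggested in the comment above can be sketched as follows. Note the shift form {{1 << (n + 1)}} matches the {{Math.pow}} form only when {{BACKOFF_INIT}} happens to be 4 (since 4 * 2^(n-1) == 2^(n+1)); that constant value is an assumption here, not something stated in the comment, and the class and method names are illustrative only.

```java
public class BackoffDemo {
    // Hypothetical constants mirroring the patch under review.
    // BACKOFF_INIT = 4 is an assumption that makes the two forms agree.
    static final long BACKOFF_INIT = 4;
    static final double BACKOFF_EXPONENTIAL_BASE = 2.0;

    // Floating-point form from the patch: BACKOFF_INIT * base^(n - 1).
    static long backoffPow(int noFailedFetches) {
        return BACKOFF_INIT
            * (long) Math.pow(BACKOFF_EXPONENTIAL_BASE, noFailedFetches - 1);
    }

    // Integer form from the review: 4 * 2^(n-1) == 2^(n+1) == 1 << (n+1).
    static long backoffShift(int noFailedFetches) {
        return 1L << (noFailedFetches + 1);
    }

    public static void main(String[] args) {
        // Both forms produce the same backoff for successive failure counts.
        for (int n = 1; n <= 10; n++) {
            System.out.println(n + ": " + backoffPow(n) + " " + backoffShift(n));
        }
    }
}
```

The shift form also avoids the double-to-long cast, which is where subtle truncation bugs tend to hide.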