[ https://issues.apache.org/jira/browse/MAPREDUCE-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Graves updated MAPREDUCE-3851:
-------------------------------------
    Target Version/s: 1.0.1
        Release Note: Added new configuration variables to control when the TT aborts if it sees a certain number of exceptions:

    // Percent of shuffle exceptions (out of sample size) seen before it's
    // fatal - acceptable values are from 0 to 1.0, 0 disables the check.
    // i.e. 0.3 = 30% of the last X number of requests matched the exception,
    // so abort.
    conf.getFloat(
        "mapreduce.reduce.shuffle.catch.exception.percent.limit.fatal", 0);

    // The number of trailing requests we track, used for the fatal
    // limit calculation
    conf.getInt("mapreduce.reduce.shuffle.catch.exception.sample.size", 1000);

              Status: Patch Available  (was: Open)

> Allow more aggressive action on detection of the jetty issue
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-3851
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3851
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Thomas Graves
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: MAPREDUCE-3851.patch, MAPREDUCE-3851.patch
>
>
> MAPREDUCE-2529 added a useful failure detection mechanism. In this jira, I
> propose we add a periodic check inside the TT and a configurable action to
> self-destruct. Blacklisting helps but is not enough. A hung Jetty still
> accepts connections, and it takes a very long time for clients to fail out.
> Short jobs are delayed for hours because of this. This feature will be a
> nice companion to MAPREDUCE-3184.
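
The release note describes a fatal limit computed over a trailing window of requests. A minimal sketch of how such a check could work is given here; it is not the code from the attached patch. The class name ShuffleExceptionWindow and the methods record and isFatal are hypothetical, and dividing by the full sample size (rather than by the number of requests seen so far) is an assumption based on the wording "out of sample size".

```java
import java.util.BitSet;

/**
 * Hypothetical sliding-window tracker illustrating the two settings in the
 * release note. Not the class from MAPREDUCE-3851.patch.
 */
class ShuffleExceptionWindow {
    private final int sampleSize;       // e.g. ...catch.exception.sample.size
    private final float fatalFraction;  // e.g. ...percent.limit.fatal (0 to 1.0)
    private final BitSet window;        // bit set = request matched the exception
    private int next = 0;               // slot the next request overwrites
    private int failures = 0;           // number of set bits in the window

    ShuffleExceptionWindow(int sampleSize, float fatalFraction) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        if (fatalFraction < 0f || fatalFraction > 1f) {
            throw new IllegalArgumentException("fatalFraction must be in [0, 1]");
        }
        this.sampleSize = sampleSize;
        this.fatalFraction = fatalFraction;
        this.window = new BitSet(sampleSize);
    }

    /** Records the outcome of one shuffle request, evicting the oldest one. */
    synchronized void record(boolean matchedException) {
        if (window.get(next)) {
            failures--;                 // the evicted request was a failure
        }
        window.set(next, matchedException);
        if (matchedException) {
            failures++;
        }
        next = (next + 1) % sampleSize; // circular buffer
    }

    /** True once the failure fraction reaches the limit; 0 disables the check. */
    synchronized boolean isFatal() {
        if (fatalFraction == 0f) {
            return false;
        }
        // Assumption: the denominator is the configured sample size.
        return (float) failures / sampleSize >= fatalFraction;
    }
}
```

In this sketch a caller would construct the tracker from the two configuration values, call record(...) once per shuffle request, and take the abort action when isFatal() returns true. With the default limit of 0 the check never fires.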