[ https://issues.apache.org/jira/browse/MAPREDUCE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039909#comment-13039909 ]
Thomas Graves commented on MAPREDUCE-2529:
------------------------------------------

I'm proposing to add a new metric to the shuffle output metrics and increment it whenever an IOException caught in the MapOutputServlet matches a configurable regex. This metric can then be viewed by external systems or potentially by the health_check script (HADOOP-7144 should make that easier).

Making the regex configurable will make this more useful in the future, in case we see other Jetty/JVM exceptions or issues that need to be worked around.

> Recognize Jetty bug 1342 and handle it
> --------------------------------------
>
>                 Key: MAPREDUCE-2529
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2529
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.204.0, 0.23.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>
> We are seeing many instances of JETTY-1342
> (http://jira.codehaus.org/browse/JETTY-1342). The bug doesn't cause Jetty to
> stop responding altogether: some fetches go through, but many of them throw
> exceptions and eventually fail. The only way we have found to get the TT out
> of this state is to restart the TT. This jira is to catch this particular
> exception (or perhaps a configurable regex) and handle it in an automated way,
> either blacklisting or shutting down the TT after seeing the exception a
> configurable number of times.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
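
The approach proposed in the comment above (count IOExceptions that match a configurable regex, and optionally act once a configurable number have been seen) could be sketched roughly as follows. This is only an illustration, not the actual patch: the class name `ShuffleExceptionTracker`, its methods, and the example regex are all hypothetical, and a real change would read the regex and threshold from the TaskTracker configuration and publish the counter through the existing shuffle output metrics rather than a plain `AtomicLong`.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: counts IOExceptions whose text matches a
 * configurable regex. Not part of Hadoop; names are assumptions.
 */
final class ShuffleExceptionTracker {
    private final Pattern pattern;   // null means the check is disabled
    private final long threshold;    // <= 0 means "never trip"
    private final AtomicLong matched = new AtomicLong();

    ShuffleExceptionTracker(String regex, long threshold) {
        this.pattern = (regex == null || regex.isEmpty())
                ? null
                : Pattern.compile(regex);
        this.threshold = threshold;
    }

    /** Records the exception; returns true if it matched the regex. */
    boolean record(IOException e) {
        if (pattern == null) {
            return false;
        }
        // String.valueOf(e) yields "<class name>: <message>", so the regex
        // can target either the exception type or its message.
        if (pattern.matcher(String.valueOf(e)).find()) {
            matched.incrementAndGet();
            return true;
        }
        return false;
    }

    /** Value that would be exposed as the new shuffle metric. */
    long getMatchedCount() {
        return matched.get();
    }

    /** True once enough matches were seen to blacklist or shut down the TT. */
    boolean thresholdReached() {
        return threshold > 0 && matched.get() >= threshold;
    }
}
```

In this sketch the servlet's catch block would call `record(e)` for every IOException it sees, and a health check (or the TT itself) would poll `thresholdReached()` to decide whether to take the node out of service.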