[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039909#comment-13039909
 ] 

Thomas Graves commented on MAPREDUCE-2529:
------------------------------------------

I propose adding a new metric to the shuffle output metrics. The metric would 
be incremented whenever an IOException caught in the MapOutputServlet matches 
a configurable regex.  External systems, or potentially the health_check 
script (HADOOP-7144 should make that easier), can then monitor the metric.  
Making the regex configurable keeps this useful if we later see other 
Jetty/JVM exceptions or issues that need to be worked around.
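
A minimal sketch of the idea, not the actual patch: the class name 
ShuffleExceptionCounter, its methods, and the sample regex are all 
hypothetical, and a real change would hook into the existing shuffle metrics 
classes and read the regex from the job configuration instead.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Hadoop): counts IOExceptions whose
 * message matches a configured regex.
 */
class ShuffleExceptionCounter {
    // Compiled once from the configured regex (assumed to come from config).
    private final Pattern pattern;
    // Thread-safe counter, since servlet threads run concurrently.
    private final AtomicLong matches = new AtomicLong();

    ShuffleExceptionCounter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /**
     * Intended to be called from the servlet's catch block.
     * Returns true if the exception matched and was counted.
     */
    boolean record(IOException e) {
        String message = e.getMessage();
        if (message != null && pattern.matcher(message).find()) {
            matches.incrementAndGet();
            return true;
        }
        return false;
    }

    /** Current value of the metric, for a monitor or health script to read. */
    long getCount() {
        return matches.get();
    }
}
```

A monitoring system or health script would then poll getCount() (or the 
equivalent published metric) and decide what to do once it grows.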

> Recognize Jetty bug 1342 and handle it
> --------------------------------------
>
>                 Key: MAPREDUCE-2529
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2529
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.204.0, 0.23.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>
> We are seeing many instances of Jetty bug 1342 
> (http://jira.codehaus.org/browse/JETTY-1342). The bug doesn't cause Jetty to 
> stop responding altogether: some fetches go through, but many of them throw 
> exceptions and eventually fail. The only way we have found to get the TT out 
> of this state is to restart it.  This jira is to catch this particular 
> exception (or perhaps a configurable regex) and handle it in an automated way, 
> either blacklisting or shutting down the TT after seeing the exception a 
> configurable number of times.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
