[ https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538489#comment-13538489 ]

Hudson commented on MAPREDUCE-4833:
-----------------------------------

Integrated in Hadoop-trunk-Commit #3151 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3151/])
    MAPREDUCE-4833. Task can get stuck in FAIL_CONTAINER_CLEANUP. Contributed by Robert Parker (Revision 1425167)

     Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425167
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncherImpl.java

                
> Task can get stuck in FAIL_CONTAINER_CLEANUP
> --------------------------------------------
>
>                 Key: MAPREDUCE-4833
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.5
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Parker
>            Priority: Critical
>             Fix For: 2.0.3-alpha, 0.23.6
>
>         Attachments: MAPREDUCE4833-1.patch, MAPREDUCE4833-2.patch, MAPREDUCE4833.patch
>
>
> If an NM goes down and the AM still tries to launch a container on it, the 
> ContainerLauncherImpl can get stuck in an RPC timeout.  At the same time, the 
> RM may notice that the NM has gone away and inform the AM, which 
> triggers a TA_FAILMSG.  If the TA_FAILMSG arrives at the TaskAttemptImpl 
> before the TA_CONTAINER_LAUNCH_FAILED message, the task attempt will try 
> to kill the container, but the ContainerLauncherImpl will not send back a 
> TA_CONTAINER_CLEANED event, leaving the attempt stuck in 
> FAIL_CONTAINER_CLEANUP.
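
The event ordering described above can be illustrated with a toy model. This
is only a sketch under stated assumptions, not the Hadoop code and not
necessarily what the committed patch does: the class StuckCleanupSketch, the
simulate method, and the launcherAlwaysAcknowledgesCleanup flag are
hypothetical, the state machine is reduced to three states, and the real
transition out of FAIL_CONTAINER_CLEANUP is simplified to FAILED. Only the
event and state names are taken from the report.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only: hypothetical classes, not the Hadoop implementation.
class StuckCleanupSketch {
    enum AttemptState { ASSIGNED, FAIL_CONTAINER_CLEANUP, FAILED }
    enum Event { TA_FAILMSG, TA_CONTAINER_LAUNCH_FAILED, TA_CONTAINER_CLEANED }

    /**
     * Replays the ordering from the report: TA_FAILMSG reaches the attempt
     * before TA_CONTAINER_LAUNCH_FAILED. Returns the attempt's final state.
     */
    static AttemptState simulate(boolean launcherAlwaysAcknowledgesCleanup) {
        AttemptState state = AttemptState.ASSIGNED;
        Queue<Event> queue = new ArrayDeque<>();
        queue.add(Event.TA_FAILMSG);                 // RM reports the lost NM first
        queue.add(Event.TA_CONTAINER_LAUNCH_FAILED); // launch RPC times out later

        while (!queue.isEmpty()) {
            Event e = queue.remove();
            switch (state) {
                case ASSIGNED:
                    if (e == Event.TA_FAILMSG) {
                        // The attempt asks the launcher to kill the container
                        // and then waits for TA_CONTAINER_CLEANED.
                        state = AttemptState.FAIL_CONTAINER_CLEANUP;
                        if (launcherAlwaysAcknowledgesCleanup) {
                            // Assumed behaviour: the launcher acknowledges the
                            // kill even though the launch never succeeded.
                            queue.add(Event.TA_CONTAINER_CLEANED);
                        }
                    } else if (e == Event.TA_CONTAINER_LAUNCH_FAILED) {
                        state = AttemptState.FAILED;
                    }
                    break;
                case FAIL_CONTAINER_CLEANUP:
                    // Only the cleanup acknowledgement moves the attempt on;
                    // in this model every other event is ignored here.
                    if (e == Event.TA_CONTAINER_CLEANED) {
                        state = AttemptState.FAILED;
                    }
                    break;
                case FAILED:
                    break; // terminal in this model
            }
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints FAIL_CONTAINER_CLEANUP
        System.out.println(simulate(true));  // prints FAILED
    }
}
```

With the flag set to false, the model ends in FAIL_CONTAINER_CLEANUP because
nothing ever delivers TA_CONTAINER_CLEANED; with it set to true, the
acknowledgement arrives and the attempt reaches a terminal state.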

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
