[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535288#comment-13535288
 ] 

Robert Parker commented on MAPREDUCE-4833:
------------------------------------------

Previously, if the Container was already DONE when a kill arrived, it returned 
without sending an event (essentially a no-op). This patch sends a 
TA_CONTAINER_CLEANED event in all cases.
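
The change in behavior can be illustrated with a minimal sketch. This is not 
the code from the attached patch: the class ContainerKillSketch, its State 
enum, the alwaysNotify flag, and the sentEvents list are hypothetical names 
made up for illustration. Only the TA_CONTAINER_CLEANED event name and the 
DONE state come from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the launcher-side container.
// These are NOT the real Hadoop classes; only the event name is real.
public class ContainerKillSketch {
    enum State { PREP, RUNNING, DONE }

    static final String TA_CONTAINER_CLEANED = "TA_CONTAINER_CLEANED";

    // Events that would be delivered back to the task attempt.
    final List<String> sentEvents = new ArrayList<>();
    State state;
    // false models the old behavior, true models the patched behavior.
    final boolean alwaysNotify;

    ContainerKillSketch(State initial, boolean alwaysNotify) {
        this.state = initial;
        this.alwaysNotify = alwaysNotify;
    }

    void kill() {
        if (state == State.DONE) {
            // Old behavior: return silently, so the attempt waits forever.
            // Patched behavior: still tell the attempt the container is cleaned.
            if (alwaysNotify) {
                sentEvents.add(TA_CONTAINER_CLEANED);
            }
            return;
        }
        // Stopping the real container is omitted in this sketch.
        state = State.DONE;
        sentEvents.add(TA_CONTAINER_CLEANED);
    }

    public static void main(String[] args) {
        ContainerKillSketch old = new ContainerKillSketch(State.DONE, false);
        old.kill();
        ContainerKillSketch patched = new ContainerKillSketch(State.DONE, true);
        patched.kill();
        System.out.println("old:     " + old.sentEvents);
        System.out.println("patched: " + patched.sentEvents);
    }
}
```

With alwaysNotify set to false, a kill on a DONE container produces no event, 
which corresponds to the attempt waiting in FAIL_CONTAINER_CLEANUP. With it 
set to true, the attempt always receives the event it is waiting for.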
                
> Task can get stuck in FAIL_CONTAINER_CLEANUP
> --------------------------------------------
>
>                 Key: MAPREDUCE-4833
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.5
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Parker
>            Priority: Critical
>         Attachments: MAPREDUCE4833-23.patch
>
>
> If an NM goes down and the AM still tries to launch a container on it, the 
> ContainerLauncherImpl can get stuck in an RPC timeout.  At the same time, the 
> RM may notice that the NM has gone away and inform the AM, which triggers 
> a TA_FAILMSG.  If the TA_FAILMSG arrives at the TaskAttemptImpl before the 
> TA_CONTAINER_LAUNCH_FAILED message, the task attempt will try to kill the 
> container, but the ContainerLauncherImpl will not send back a 
> TA_CONTAINER_CLEANED event, leaving the attempt stuck.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
