[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned MAPREDUCE-4955:
-------------------------------------------

    Assignee: Karthik Kambatla
    
> NM container diagnostics for excess resource usage can be lost if task fails 
> while being killed 
> ------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4955
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4955
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Karthik Kambatla
>
> When a nodemanager kills a container for exceeding its resource budget, it 
> provides a diagnostics message in the container status explaining why the 
> container was killed.  However, this message can be lost if the task fails 
> during the shutdown triggered by the SIGTERM (e.g., lost DFS leases because 
> the filesystem was closed) and notifies the AM via the task umbilical 
> *before* the AM receives the NM's container status message via the RM 
> heartbeat.
> In that case the task attempt fails with the task's own failure diagnostic, 
> and the user is left wondering exactly why the task failed because the NM's 
> diagnostics arrive too late, are not written to the history file, and are 
> lost.  If the AM receives the container status via the RM heartbeat before 
> the task fails during shutdown, then the diagnostics are written properly to 
> the history file, and the user can see why the task failed.
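
The ordering problem described above can be illustrated with a small,
self-contained sketch. This is not the MR AM's actual code: the class
AttemptDiagnostics, its methods, and the deferHistory flag are hypothetical
names invented for illustration. With deferHistory=false the history record
is written by whichever event arrives first, which reproduces the loss; with
deferHistory=true the write waits for the container status, which is one
possible way (an assumption, not the committed fix) to keep the NM's message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one task attempt's diagnostics inside the AM.
// None of these names come from Hadoop; they only illustrate the race.
class AttemptDiagnostics {
    private final boolean deferHistory;
    private final List<String> diagnostics = new ArrayList<>();
    private String historyRecord; // null until the history entry is written

    AttemptDiagnostics(boolean deferHistory) {
        this.deferHistory = deferHistory;
    }

    // The task reports its own failure over the umbilical.
    synchronized void onTaskFailure(String message) {
        diagnostics.add(message);
        if (!deferHistory) {
            writeHistory(); // eager write: later diagnostics cannot be recorded
        }
    }

    // The NM's container status arrives via the RM heartbeat.
    synchronized void onContainerStatus(String nmDiagnostics) {
        diagnostics.add(nmDiagnostics);
        writeHistory(); // the container is finished, so the record is final
    }

    // The history entry is written at most once, from what is known so far.
    private void writeHistory() {
        if (historyRecord == null) {
            historyRecord = String.join("; ", diagnostics);
        }
    }

    synchronized String history() {
        return historyRecord;
    }
}
```

In this sketch the deferred variant never writes a record if the container
status does not arrive, so a real change would presumably also need a timeout
or a final flush when the attempt is cleaned up.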

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
