[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-4955:
-------------------------------------

    Assignee:     (was: Haibo Chen)

> NM container diagnostics for excess resource usage can be lost if task fails 
> while being killed 
> ------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4955
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4955
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>
> When a nodemanager kills a container for exceeding its resource budget, it 
> provides a diagnostics message in the container status explaining why the 
> container was killed.  However, this message can be lost if the task fails 
> during the shutdown triggered by the SIGTERM (e.g., lost DFS leases because 
> the filesystem was closed) and notifies the AM via the task umbilical 
> *before* the AM receives the NM's container status message via the RM 
> heartbeat.
> In that case the task attempt fails with only the task's own failure 
> diagnostic, and the user is left wondering why the task failed: the NM's 
> diagnostics arrive too late, are not written to the history file, and are 
> lost.  If the AM receives the container status via the RM heartbeat before 
> the task fails during shutdown, the diagnostics are written properly to 
> the history file and the user can see why the task failed.
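
The ordering problem described above can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not Hadoop code: the
class ToyTaskAttempt, the run() helper, the event names, and both diagnostic
strings are hypothetical, and the model assumes that whichever terminal event
reaches the AM first finalizes the history record.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical diagnostic strings, for illustration only.
NM_DIAG = "Container killed by NM: resource limit exceeded"
TASK_DIAG = "Task failed during shutdown: filesystem closed"


@dataclass
class ToyTaskAttempt:
    """Hypothetical stand-in for the AM's view of one task attempt."""
    diagnostics: List[str] = field(default_factory=list)
    history_record: Optional[List[str]] = None  # written once, at completion

    def _finish(self, diagnostic: str) -> None:
        # Assumption: the first terminal event writes the history record;
        # anything arriving afterwards is dropped.
        if self.history_record is not None:
            return
        self.diagnostics.append(diagnostic)
        self.history_record = list(self.diagnostics)

    def on_container_status(self, nm_diagnostic: str) -> None:
        # NM diagnostic delivered to the AM via the RM heartbeat.
        self._finish(nm_diagnostic)

    def on_task_failure(self, task_diagnostic: str) -> None:
        # Failure reported by the task itself over the umbilical.
        self._finish(task_diagnostic)


def run(events: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Replay (channel, message) events in order and return the history."""
    attempt = ToyTaskAttempt()
    for channel, message in events:
        if channel == "heartbeat":
            attempt.on_container_status(message)
        else:
            attempt.on_task_failure(message)
    return attempt.history_record


# Umbilical wins the race: the NM's explanation never reaches the history.
lost = run([("umbilical", TASK_DIAG), ("heartbeat", NM_DIAG)])
# Heartbeat wins the race: the NM's explanation is recorded.
kept = run([("heartbeat", NM_DIAG), ("umbilical", TASK_DIAG)])
```

In this model, "lost" contains only the task's own diagnostic while "kept"
contains the NM's, which mirrors the two outcomes in the report.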



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

