[ https://issues.apache.org/jira/browse/MAPREDUCE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthik Kambatla reassigned MAPREDUCE-4955:
-------------------------------------------

    Assignee: Karthik Kambatla

> NM container diagnostics for excess resource usage can be lost if task fails while being killed
> ------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4955
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4955
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Karthik Kambatla
>
> When a nodemanager kills a container for being over resource budgets, it
> provides a diagnostics message for the container status explaining why it was
> killed. However, this message can be lost if the task fails during the
> shutdown triggered by the SIGTERM (e.g., lost DFS leases because the
> filesystem was closed) and notifies the AM via the task umbilical *before*
> the AM receives the NM's container status message via the RM heartbeat.
> In that case the task attempt fails with the task's failure diagnostic, and
> the user is left wondering exactly why the task failed because the NM's
> diagnostics arrive too late, are not written to the history file, and are
> lost. If the AM receives the container status via the RM heartbeat before
> the task fails during shutdown, then the diagnostics are written properly to
> the history file, and the user can see why the task failed.
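
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual MR AM code: the class `DiagnosticsRaceSketch`, the `Attempt` type, the `replay` method, and both diagnostic strings are hypothetical names invented for illustration. It assumes only what the report states, namely that diagnostics recorded before the attempt fails reach the history file and diagnostics arriving afterwards do not.

```java
import java.util.ArrayList;
import java.util.List;

public class DiagnosticsRaceSketch {

    // Hypothetical message texts; the real NM and task messages differ.
    static final String NM_DIAG = "NM: container killed for exceeding resource budget";
    static final String TASK_DIAG = "Task: failed during shutdown (filesystem closed)";

    /** Hypothetical stand-in for a task attempt as tracked by the AM. */
    static final class Attempt {
        private final List<String> diagnostics = new ArrayList<>();
        private final List<String> history = new ArrayList<>();
        private boolean finished = false;

        /** Records a diagnostic unless the attempt has already finished. */
        void addDiagnostic(String message) {
            if (finished) {
                return; // arrived too late: never reaches the history
            }
            diagnostics.add(message);
        }

        /** Marks the attempt failed and writes what is known so far to history. */
        void fail() {
            if (finished) {
                return;
            }
            finished = true;
            history.addAll(diagnostics);
        }

        List<String> history() {
            return new ArrayList<>(history);
        }
    }

    /**
     * Replays the two events in one of the two possible orders and returns
     * the diagnostics that ended up in the (modelled) history file.
     */
    static List<String> replay(boolean nmStatusFirst) {
        Attempt attempt = new Attempt();
        if (nmStatusFirst) {
            attempt.addDiagnostic(NM_DIAG);   // container status via RM heartbeat
            attempt.addDiagnostic(TASK_DIAG); // failure report via task umbilical
            attempt.fail();
        } else {
            attempt.addDiagnostic(TASK_DIAG); // umbilical report wins the race
            attempt.fail();
            attempt.addDiagnostic(NM_DIAG);   // heartbeat arrives after the failure
        }
        return attempt.history();
    }

    public static void main(String[] args) {
        System.out.println("NM status first:  " + replay(true));
        System.out.println("Umbilical first:  " + replay(false));
    }
}
```

In this model, the history contains the NM message only when the container status is processed before the attempt fails, which matches the two outcomes described in the report.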