[ https://issues.apache.org/jira/browse/MAPREDUCE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haibo Chen reassigned MAPREDUCE-4955:
-------------------------------------

    Assignee:     (was: Haibo Chen)

> NM container diagnostics for excess resource usage can be lost if task fails while being killed
> ------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4955
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4955
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>
> When a nodemanager kills a container for being over resource budgets, it
> provides a diagnostics message for the container status explaining why it was
> killed. However this message can be lost if the task fails during the
> shutdown from the SIGTERM (e.g.: lost DFS leases because filesystem closed)
> and notifies the AM via the task umbilical *before* the AM receives the NM's
> container status message via the RM heartbeat.
> In that case the task attempt fails with the task's failure diagnostic, and
> the user is left wondering exactly why the task failed because the NM's
> diagnostics arrive too late, are not written to the history file, and are
> lost. If the AM receives the container status via the RM heartbeat before
> the task fails during shutdown then the diagnostics are written properly to
> the history file, and the user can see why the task failed.
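The Python sketch below makes the ordering problem concrete. It rests on one assumption taken from the report: the first terminal event the AM sees finalizes the attempt and writes its history record, and diagnostics that arrive afterwards are discarded. `TaskAttempt`, `replay`, the event kinds ("umbilical", "heartbeat") and the diagnostic strings are all hypothetical. They are not Hadoop classes, methods or log messages.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskAttempt:
    """Hypothetical model of how an AM tracks one task attempt.

    Not Hadoop's implementation: names and states are illustrative only.
    """
    attempt_id: str
    finished: bool = False
    diagnostics: List[str] = field(default_factory=list)
    history_record: Optional[str] = None  # stands in for the job history entry

    def _finish(self) -> None:
        # Once finished, the history record is written and never revisited.
        self.finished = True
        self.history_record = "; ".join(self.diagnostics)

    def on_task_failed(self, task_diag: str) -> None:
        """The task reports its own failure over the task umbilical."""
        if self.finished:
            return
        self.diagnostics.append(task_diag)
        self._finish()

    def on_container_completed(self, nm_diag: str) -> None:
        """The NM's container status reaches the AM via the RM heartbeat."""
        if self.finished:
            # Too late: the attempt is already finalized, so the NM's
            # explanation never reaches the history record.
            return
        self.diagnostics.append(nm_diag)
        self._finish()


def replay(events: List[tuple]) -> Optional[str]:
    """Apply (kind, message) events in order; return the history record."""
    attempt = TaskAttempt("attempt_0")
    for kind, msg in events:
        if kind == "umbilical":
            attempt.on_task_failed(msg)
        else:  # "heartbeat"
            attempt.on_container_completed(msg)
    return attempt.history_record


if __name__ == "__main__":
    nm_diag = "NM: container killed for exceeding its resource limit"  # illustrative
    task_diag = "task: lost DFS lease, filesystem closed"               # illustrative
    # NM status first: the user sees why the container was killed.
    print(replay([("heartbeat", nm_diag), ("umbilical", task_diag)]))
    # -> NM: container killed for exceeding its resource limit
    # Task failure first: only the secondary symptom survives.
    print(replay([("umbilical", task_diag), ("heartbeat", nm_diag)]))
    # -> task: lost DFS lease, filesystem closed
```

The model captures only the race the report describes: whichever message arrives first wins. It does not say how the AM should resolve the race.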