Haibo Chen created MAPREDUCE-6771:
-------------------------------------
Summary: Diagnostics information is lost in .jhist if task
containers are killed by Node Manager.
Key: MAPREDUCE-6771
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.7.3
Reporter: Haibo Chen
Assignee: Haibo Chen
Task containers can exceed their resource limits and be killed by the Node Manager.
The MR AM is then notified of the container status and diagnostics information
through its heartbeat with the RM. However, the diagnostics information may never
make it into the .jhist file, so when the job completes, the diagnostics
information associated with the failed task attempts is empty. This makes it
hard for users to find the root cause of job failures, which are often caused
by memory leaks.
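
The report does not say where the diagnostics are dropped. As a purely illustrative sketch, assume the history record is a snapshot taken when a task attempt processes its "container completed" event, and that diagnostics arrive as a separate event. Under that assumption the order of the two events decides whether the record is empty. The class and method names below are hypothetical and are not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class DiagnosticsOrderingSketch {

    /** Hypothetical stand-in for a task attempt; not a Hadoop class. */
    static final class Attempt {
        private final List<String> diagnostics = new ArrayList<>();
        private String historyRecord = "";

        // Diagnostics delivered to the attempt are accumulated here.
        void onDiagnostics(String message) {
            diagnostics.add(message);
        }

        // Assumption: the history record is written once, at completion time,
        // from whatever diagnostics have been received so far.
        void onContainerCompleted() {
            historyRecord = String.join("; ", diagnostics);
        }

        String historyRecord() {
            return historyRecord;
        }
    }

    /** Completion is processed first, so the snapshot misses the diagnostics. */
    static String completedBeforeDiagnostics(String message) {
        Attempt attempt = new Attempt();
        attempt.onContainerCompleted();
        attempt.onDiagnostics(message);
        return attempt.historyRecord();
    }

    /** Diagnostics are processed first, so the snapshot includes them. */
    static String diagnosticsBeforeCompleted(String message) {
        Attempt attempt = new Attempt();
        attempt.onDiagnostics(message);
        attempt.onContainerCompleted();
        return attempt.historyRecord();
    }

    public static void main(String[] args) {
        String message = "Container killed by Node Manager: over memory limit";
        System.out.println("completed first:   '" + completedBeforeDiagnostics(message) + "'");
        System.out.println("diagnostics first: '" + diagnosticsBeforeCompleted(message) + "'");
    }
}
```

In this toy model, only the second ordering preserves the message in the record. Whether the actual MR AM code path behaves this way would need to be confirmed against the source.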