[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-6771:
----------------------------------
    Attachment: TaUnsuccessfullyEventEmission.jpg

> Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
> --------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6771
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.7.3
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: TaUnsuccessfullyEventEmission.jpg, mapreduce6771.001.patch
>
>
> Task containers can go over their resource limit and be killed by the Node Manager.
> The MR AM is then notified of the container status and diagnostics information
> through its heartbeat with the RM. However, the diagnostics information may never
> make it into the .jhist file, so when the job completes, the diagnostics
> information associated with the failed task attempts is empty.
> This makes it hard for users to find the root cause of job failures that are
> often caused by memory leaks.