[ https://issues.apache.org/jira/browse/MAPREDUCE-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth updated MAPREDUCE-5079:
--------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.5-beta
                   0.23.7
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2 and branch-0.23.

> Recovery should restore task state from job history info directly
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5079
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5079
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 0.23.7
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.7, 2.0.5-beta
>
>         Attachments: MAPREDUCE-5079-branch-0.23.patch,
> MAPREDUCE-5079-branch-0.23.patch, MAPREDUCE-5079.patch, MAPREDUCE-5079.patch,
> MAPREDUCE-5079.patch, MAPREDUCE-5079.patch
>
>
> We've encountered a lot of hanging issues during MR-AM recovery because the
> state machines don't always end up in the same states after recovery. This
> is especially true when speculative execution is enabled. It should be
> straightforward to restore task and task attempt states directly from the
> TaskInfo and TaskAttemptInfo records in the job history file to avoid relying
> on the task state machines ending up in the proper states with the proper
> number of attempts.
> This should be a more robust solution that would also give us the option of
> recovering start time and log locations for tasks that were in-progress when
> the AM crashed.