[ https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated MAPREDUCE-4819:
-------------------------------------
    Priority: Blocker  (was: Critical)

> AM can rerun job after reporting final job status to the client
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4819
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>            Assignee: Bikas Saha
>            Priority: Blocker
>         Attachments: MAPREDUCE-4819.1.patch, MAPREDUCE-4819.2.patch,
> MAPREDUCE-4819.3.patch, MR-4819-bobby-trunk.txt, MR-4819-bobby-trunk.txt,
> MR-4819-bobby-trunk.txt, MR-4819-bobby-trunk.txt, MR-4819-bobby-trunk.txt
>
>
> If the AM reports final job status to the client but then crashes before
> unregistering with the RM then the RM can run another AM attempt. Currently
> AM re-attempts assume that the previous attempts did not reach a final job
> state, and that causes the job to rerun (from scratch, if the output format
> doesn't support recovery).
> Re-running the job when we've already told the client the final status of the
> job is bad for a number of reasons. If the job failed, it's confusing at
> best since the client was already told the job failed but the subsequent
> attempt could succeed. If the job succeeded there could be data loss, as a
> subsequent job launched by the client tries to consume the job's output as
> input just as the re-attempt starts removing output files in preparation for
> the output commit.
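
The failure described in the issue comes from a re-attempt having no record that an earlier attempt already reached a final state. The following is a minimal sketch of one possible guard, written in Python for brevity rather than in Hadoop's Java, and it is not the approach taken by the attached patches. It assumes a staging directory that survives an AM crash; the marker file names and the functions `record_final_status`, `recovered_final_status`, and `run_attempt` are hypothetical and do not correspond to any Hadoop API.

```python
from pathlib import Path

# Hypothetical marker file names, chosen for illustration only.
SUCCESS_MARKER = "FINAL_STATUS_SUCCEEDED"
FAILURE_MARKER = "FINAL_STATUS_FAILED"


def record_final_status(staging_dir: Path, succeeded: bool) -> None:
    """Persist the final job status so that later attempts can see it."""
    name = SUCCESS_MARKER if succeeded else FAILURE_MARKER
    (staging_dir / name).touch()


def recovered_final_status(staging_dir: Path):
    """Return 'SUCCEEDED', 'FAILED', or None if no earlier attempt finished."""
    if (staging_dir / SUCCESS_MARKER).exists():
        return "SUCCEEDED"
    if (staging_dir / FAILURE_MARKER).exists():
        return "FAILED"
    return None


def run_attempt(staging_dir: Path, run_job) -> str:
    """Run one AM attempt, reusing a final status left by an earlier attempt.

    `run_job` is a hypothetical callable that executes the job and
    returns True on success, False on failure.
    """
    previous = recovered_final_status(staging_dir)
    if previous is not None:
        # An earlier attempt already reached a final state: do not rerun.
        return previous
    succeeded = run_job()
    # Record the outcome before it would be reported to the client.
    record_final_status(staging_dir, succeeded)
    return "SUCCEEDED" if succeeded else "FAILED"
```

Calling `run_attempt` twice against the same staging directory models a crash between reporting the status and unregistering with the RM: the second call returns the recorded status without invoking `run_job` again, so the client never sees a result that contradicts the first report and the output is not removed for a new commit.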