[ https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243948#comment-15243948 ]
Vinod Kumar Vavilapalli commented on MAPREDUCE-6608: ---------------------------------------------------- [~srikanth.sampath] / [~djp], Got around to reading the design doc attached - there are a few important details that aren't covered in the doc, besides the AM discovery problem itself h4. Output Commit of previous tasks The new AM needs to make sure that output of previously running containers can be safely committed. IIRC, with today's FileOutputCommitter, new AM will only promote task-outputs that are present in $jobOutput/_temporary/$currentAttemptID/ Similar changes may be needed for other OutputCommitters out there. h4. Task Output Commit races It doesn't look like we record task-commit in JobHistory, so it is possible that the previous AM gave a commit go-ahead to a taskAttempt which is either (a) in the process of committing output or (b) committed the output but fails to report to either of the AMs. In this case, two taskAttempts can be committing at the same time! In the same line, without recording the success of a commit after a task finishes committing, we will run into issues. h4. Conflicting TaskAttemptIDs Today, we launch containers first and then record it in JobHistory. Because of this, if the previous AM started a TaskAttempt but crashed before recording it in JobHistory, and this oldTaskAttempt somehow cannot get reconnected to the new AM due to network issues, the new AM generates the same TaskAttemptID for a newer attempt and they both will collide on HDFS and/or the local NM output directories if they both happen to run on the same machine. The above problem will be worse when speculative tasks are involved. h4. Security AM should use the same job-token as the previous incarnation otherwise the old running tasks will get authentication failures. I quickly checked and it seems like the AM itself generates the token, which means the second AM will generate a different one and all running tasks will fail to sync back! h4. Others bq. In the WP case, upon a loss of connection to the AM the tasks will try and reestablish the connection with the new AM. This will not suffice. It is possible even today, but when a network partition occurs and two AMs end up running at the same time and give commit-go permission to two TaskAttempts of the same task, they will collide on the output-commit. h4. General comments This stuff is hard. Even if we forget about the AM discovery problem, I am sure others will find a bunch of other design considerations you may be missing now. I'd suggest spending more time on the design, atleast on some of the areas I pointed above and then create a branch, create sub-tasks, do some prototypes etc. > Work Preserving AM Restart for MapReduce > ---------------------------------------- > > Key: MAPREDUCE-6608 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608 > Project: Hadoop Map/Reduce > Issue Type: Bug > Reporter: Srikanth Sampath > Assignee: Srikanth Sampath > Attachments: Patch1.patch, WorkPreservingMRAppMaster-1.pdf, > WorkPreservingMRAppMaster-2.pdf, WorkPreservingMRAppMaster.pdf > > > Providing a framework for work preserving AM is achieved in > [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489]. We would like > to take advantage of this for MapReduce(MR) applications. There are some > challenges which have been described in the attached document and few options > discussed. We solicit feedback from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)