[jira] [Commented] (MAPREDUCE-6608) Work Preserving AM Restart for MapReduce

Vinod Kumar Vavilapalli (JIRA) Fri, 15 Apr 2016 19:14:16 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243948#comment-15243948
 ]


Vinod Kumar Vavilapalli commented on MAPREDUCE-6608:
----------------------------------------------------

[~srikanth.sampath] / [~djp],

Got around to reading the design doc attached - there are a few important 
details that aren't covered in the doc, besides the AM discovery problem itself

h4. Output Commit of previous tasks
The new AM needs to make sure that output of previously running containers can 
be safely committed. IIRC, with today's FileOutputCommitter, new AM will only 
promote task-outputs that are present in 
$jobOutput/_temporary/$currentAttemptID/

Similar changes may be needed for other OutputCommitters out there.

h4. Task Output Commit races

It doesn't look like we record task-commit in JobHistory, so it is possible 
that the previous AM gave a commit go-ahead to a taskAttempt which is either 
(a) in the process of committing output or (b) committed the output but fails 
to report to either of the AMs. In this case, two taskAttempts can be 
committing at the same time!

In the same line, without recording the success of a commit after a task 
finishes committing, we will run into issues.

h4. Conflicting TaskAttemptIDs

Today, we launch containers first and then record it in JobHistory. Because of 
this, if the previous AM started a TaskAttempt but crashed before recording it 
in JobHistory, and this oldTaskAttempt somehow cannot get reconnected to the 
new AM due to network issues, the new AM generates the same TaskAttemptID for a 
newer attempt and they both will collide on HDFS and/or the local NM output 
directories if they both happen to run on the same machine.

The above problem will be worse when speculative tasks are involved.

h4. Security
AM should use the same job-token as the previous incarnation otherwise the old 
running tasks will get authentication failures. I quickly checked and it seems 
like the AM itself generates the token, which means the second AM will generate 
a different one and all running tasks will fail to sync back!

h4. Others
bq. In the WP case, upon a loss of connection to the AM the tasks will try and 
reestablish the connection with the new AM.
This will not suffice. It is possible even today, but when a network partition 
occurs and two AMs end up running at the same time and give commit-go 
permission to two TaskAttempts of the same task, they will collide on the 
output-commit.

h4. General comments
This stuff is hard. Even if we forget about the AM discovery problem, I am sure 
others will find a bunch of other design considerations you may be missing now.

I'd suggest spending more time on the design, atleast on some of the areas I 
pointed above and then create a branch, create sub-tasks, do some prototypes 
etc.

> Work Preserving AM Restart for MapReduce
> ----------------------------------------
>
>                 Key: MAPREDUCE-6608
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Srikanth Sampath
>            Assignee: Srikanth Sampath
>         Attachments: Patch1.patch, WorkPreservingMRAppMaster-1.pdf, 
> WorkPreservingMRAppMaster-2.pdf, WorkPreservingMRAppMaster.pdf
>
>
> Providing a framework for work preserving AM is achieved in 
> [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489].  We would like 
> to take advantage of this for MapReduce(MR) applications.  There are some 
> challenges which have been described in the attached document and few options 
> discussed.  We solicit feedback from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MAPREDUCE-6608) Work Preserving AM Restart for MapReduce

Reply via email to