[ https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105267#comment-15105267 ]
Junping Du commented on MAPREDUCE-6608:
---------------------------------------

Thanks [~srikanth.sampath] and [~raju.bairishetti] for proposing this JIRA and uploading a design document. This work could be a significant improvement to the reliability of our MapReduce framework.

Having gone through the current design doc, I think storing the new attempt's address for the MR AM in ZooKeeper could have scalability issues when an MR job has a massive number of running tasks (tens of thousands or more). It could be better to store and retrieve the new MR AM location from HDFS, which has better scalability. Also, in my understanding, the YARN Service Registry may not be the best fit for this case. CC [~ste...@apache.org], who is the author of YSR.

I could propose another version of the design with more details in the next few days, in case we haven't started the development work yet.

> Work Preserving AM Restart for MapReduce
> ----------------------------------------
>
>                 Key: MAPREDUCE-6608
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Srikanth Sampath
>            Assignee: Raju Bairishetti
>         Attachments: WorkPreservingMRAppMaster.pdf
>
>
> Providing a framework for work preserving AM is achieved in
> [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489]. We would like
> to take advantage of this for MapReduce (MR) applications. There are some
> challenges, which are described in the attached document, and a few options
> are discussed. We solicit feedback from the community.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)