[ https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167140#comment-15167140 ]

Srikanth Sampath commented on MAPREDUCE-6608:
---------------------------------------------

Thanks [~djp]. A few points:
1. The reads will only be for the in-flight tasks of a large MR job. That 
said, the number of reads can still be large, for example when multiple AMs fail.

2. The read path in this case is required for communication between the task 
containers and the AM (not between task containers).  So indeed it is a subset 
of the cases addressed in 
[YARN-4602|https://issues.apache.org/jira/browse/YARN-4602].

3. We need more details on how 
[YARN-4602|https://issues.apache.org/jira/browse/YARN-4602] will be addressed. 
What is the latency for the payload to travel from the new AM to the registry 
(RM) and then to the NMs? How will the task containers fetch the new address? 
Should we still keep the registry-based read path as a fallback?
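To make the fallback question in point 3 concrete, here is a minimal sketch of how a task container might resolve the current AM address. It assumes (hypothetically) that both channels, the address pushed through the RM/NM path and a direct registry read, can be modeled as lookups returning an address tagged with the AM attempt number. The names `AmAddress` and `resolve_am_address` are illustrative only; they are not Hadoop/YARN APIs and not the design in the attached patch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical address record; not a Hadoop/YARN type.
@dataclass(frozen=True)
class AmAddress:
    host: str
    port: int
    attempt: int  # AM attempt number that published this address


# A lookup maps a job id to the newest address that channel knows, or None.
Lookup = Callable[[str], Optional[AmAddress]]


def resolve_am_address(job_id: str,
                       current: Optional[AmAddress],
                       pushed: Lookup,
                       registry: Lookup) -> Optional[AmAddress]:
    """Return the newest known AM address for job_id.

    Tries the pushed (RM -> NM) channel first; the registry is read only if
    the push path has not delivered an address newer than `current`.
    """
    for source in (pushed, registry):
        candidate = source(job_id)
        if candidate is not None and (
                current is None or candidate.attempt > current.attempt):
            return candidate
    # Nothing newer from either channel: keep using the known address.
    return current
```

With this ordering, registry reads happen only for tasks the push path has not yet reached, which would help bound the read volume raised in point 1 when several AMs fail at once.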

I will be very happy to work with you in parallel on this.  

> Work Preserving AM Restart for MapReduce
> ----------------------------------------
>
>                 Key: MAPREDUCE-6608
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Srikanth Sampath
>            Assignee: Srikanth Sampath
>         Attachments: Patch1.patch, WorkPreservingMRAppMaster-1.pdf, 
> WorkPreservingMRAppMaster-2.pdf, WorkPreservingMRAppMaster.pdf
>
>
> A framework for work-preserving AM restart is provided by 
> [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489]. We would like 
> to take advantage of this for MapReduce (MR) applications. There are some 
> challenges, which are described in the attached document along with a few 
> options. We solicit feedback from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
