[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978561#comment-13978561
 ] 

Jason Lowe commented on MAPREDUCE-5652:
---------------------------------------

Thanks for the feedback, Ming!

For the NM crash scenario above we need YARN-1336 so applications are persisted 
outside of an individual aux service.  Once that's present, the NM remembers 
which applications are finishing and persists that before responding to the NM 
heartbeat that tells it which apps have just finished.  Upon recovery it will 
recover the application and re-send the finish event, which will send the app 
stop event to the aux services.
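
To make that ordering concrete, here is a rough toy model of the 
persist-then-replay idea.  It is only a sketch and not the YARN-1336 
implementation: NodeManagerModel, AuxService, the JSON state file and all 
method names are made up for illustration.

```python
import json
import os
import tempfile


class AuxService:
    """Hypothetical stand-in for an aux service such as the ShuffleHandler."""

    def __init__(self):
        self.stopped_apps = []

    def stop_application(self, app_id):
        # A real service would clean up per-application state here.
        self.stopped_apps.append(app_id)


class NodeManagerModel:
    """Toy NM that keeps application state outside of any aux service."""

    def __init__(self, state_path, aux_service):
        self.state_path = state_path
        self.aux = aux_service
        self.state = {"active": [], "finishing": []}

    def _persist(self):
        # Write to a temp file and rename so a crash never leaves a torn file.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.state_path)

    def start_application(self, app_id):
        self.state["active"].append(app_id)
        self._persist()

    def on_heartbeat_finished_apps(self, finished):
        # Record that the apps are finishing and persist that first.
        for app_id in finished:
            if app_id in self.state["active"]:
                self.state["active"].remove(app_id)
                self.state["finishing"].append(app_id)
        self._persist()
        # A crash after this point cannot lose the fact that they finished.

    def deliver_finish_events(self):
        # Send the app stop event to the aux service, then forget the app.
        for app_id in list(self.state["finishing"]):
            self.aux.stop_application(app_id)
            self.state["finishing"].remove(app_id)
        self._persist()

    @classmethod
    def recover(cls, state_path, aux_service):
        # On restart, reload persisted state and re-send pending finish events.
        nm = cls(state_path, aux_service)
        with open(state_path) as f:
            nm.state = json.load(f)
        nm.deliver_finish_events()
        return nm


# Simulate a crash between persisting the finish and delivering the stop event.
state_file = os.path.join(tempfile.mkdtemp(), "nm-state.json")
before_crash = AuxService()
nm = NodeManagerModel(state_file, before_crash)
nm.start_application("app_1")
nm.start_application("app_2")
nm.on_heartbeat_finished_apps(["app_1"])
# ... the NM dies here; the stop event was never delivered ...
after_restart = AuxService()
recovered = NodeManagerModel.recover(state_file, after_restart)
```

In this model a crash after the stop event is delivered but before the final 
persist would replay the event on restart, so the aux service would have to 
tolerate seeing the same app stop more than once.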

For the graceful shutdown scenario we need YARN-1336 and/or YARN-1362.  Either 
YARN-1336 will never send app stop events upon NM shutdown if recovery is 
enabled, or we need to be able to distinguish between a graceful NM shutdown 
and an NM kill/crash to know whether to send the app stop event to aux services.

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, 
> MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, 
> MAPREDUCE-5652.patch
>
>
> ShuffleHandler should work across NM restarts and not require re-running 
> map-tasks. On NM restart, the map outputs are cleaned up requiring 
> re-execution of map tasks and should be avoided.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
