[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts

Karthik Kambatla (JIRA) Thu, 10 Apr 2014 08:55:36 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965476#comment-13965476
 ]


Karthik Kambatla commented on MAPREDUCE-5652:
---------------------------------------------

Approach looks good. Comments:
# How do we handle applications that finish while the NM is down? 
# Code related to initStateStore should ideally go into serviceInit(), 
primarily to future-proof against us supporting (re)starting stopped services.
# Use the constant {{JOB}} here?
{code}
      iter.seek(bytes("job"));
{code}
# ShuffleHandler#recordJobShuffleInfo: addJobToken() should come after the 
attempt to write to the store? That way we fail early if we can't write to the 
store for any reason. The place where we call this method catches and ignores 
all exceptions.
# ShuffleHandler#close() should probably take care of clearing the static maps. 
Alternately, we could just make those maps non-static.
# ShuffleHandler#forgetJob() - should we make those two maps
# Do we need to change hadoop-mapreduce-project/pom.xml, given we already add 
the dependencies in the shuffle module?
# Nice test.
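
To illustrate the ordering point about recordJobShuffleInfo above, here is a minimal sketch. It is not the patch's code: {{StateStore}}, {{RecordOrderSketch}} and {{hasToken}} are hypothetical names, and the real ShuffleHandler keeps different state. The idea is only that the write to the store happens first, so a failed write leaves nothing cached in memory, and that the map is an instance field rather than a static one.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RecordOrderSketch {
    /** Hypothetical stand-in for the persistent state store. */
    interface StateStore {
        void put(String key, String value) throws IOException;
    }

    // Instance (non-static) map: its contents go away with the service object.
    private final Map<String, String> jobTokens = new ConcurrentHashMap<>();
    private final StateStore store;

    RecordOrderSketch(StateStore store) {
        this.store = store;
    }

    /** Persist first, then update in-memory state. */
    void recordJobShuffleInfo(String jobId, String token) throws IOException {
        store.put(jobId, token);      // may throw; nothing has been cached yet
        jobTokens.put(jobId, token);  // reached only after a successful write
    }

    boolean hasToken(String jobId) {
        return jobTokens.containsKey(jobId);
    }

    public static void main(String[] args) {
        // A store that always fails, to show the fail-early behaviour.
        RecordOrderSketch failing = new RecordOrderSketch((k, v) -> {
            throw new IOException("store unavailable");
        });
        try {
            failing.recordJobShuffleInfo("job_1", "token");
        } catch (IOException e) {
            System.out.println("write failed, token cached: "
                + failing.hasToken("job_1"));
        }
    }
}
```

With this ordering the caller still has to decide what to do with the exception; if it is caught and ignored, the job simply ends up with no token rather than with a token that was never persisted.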

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, 
> MAPREDUCE-5652-v4.patch, MAPREDUCE-5652.patch
>
>
> ShuffleHandler should work across NM restarts and not require re-running 
> map tasks. On NM restart, the map outputs are cleaned up, which requires 
> re-execution of map tasks; this should be avoided.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
