[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965476#comment-13965476 ]
Karthik Kambatla commented on MAPREDUCE-5652:
---------------------------------------------

Approach looks good. Comments:
# How do we handle applications that finish while the NM is down?
# Code related to initStateStore should ideally go into serviceInit(), primarily to future-proof against us supporting (re)starting stopped services.
# Use the constant {{JOB}} here?
{code}
iter.seek(bytes("job"));
{code}
# ShuffleHandler#recordJobShuffleInfo: should addJobToken() come after the attempt to write to the store? That way we fail early if we can't write to the store for any reason. Where we call this method, we catch and ignore all exceptions.
# ShuffleHandler#close() should probably take care of clearing the static maps. Alternatively, we could just make those maps non-static.
# ShuffleHandler#forgetJob() - should we make those two maps
# Do we need to change hadoop-mapreduce-project/pom.xml, given we already add the dependencies in the shuffle module?
# Nice test.

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652.patch
>
>
> ShuffleHandler should work across NM restarts and not require re-running map-tasks. On NM restart, the map outputs are cleaned up, requiring re-execution of map tasks; this should be avoided.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
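
The ordering suggested in the comment on ShuffleHandler#recordJobShuffleInfo (write to the store first, then update in-memory state) can be illustrated with a minimal sketch. This is not the patch's code: {{JobStateStore}}, {{JobRegistry}}, {{recordJob}} and {{knows}} are hypothetical names used only for illustration, and the real ShuffleHandler methods and signatures may differ.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the recovery state store; not the real ShuffleHandler API.
interface JobStateStore {
    void put(String jobId, String user, String token) throws IOException;
}

// Hypothetical registry mirroring the idea of in-memory user and token maps.
class JobRegistry {
    private final Map<String, String> userByJob = new HashMap<>();
    private final Map<String, String> tokenByJob = new HashMap<>();
    private final JobStateStore store;

    JobRegistry(JobStateStore store) {
        this.store = store;
    }

    // Persist first: if the store write throws, the in-memory maps are left untouched.
    void recordJob(String jobId, String user, String token) throws IOException {
        store.put(jobId, user, token);
        userByJob.put(jobId, user);
        tokenByJob.put(jobId, token);
    }

    boolean knows(String jobId) {
        return tokenByJob.containsKey(jobId);
    }
}

public class FailEarlyDemo {
    public static void main(String[] args) {
        // A store that always fails, to exercise the failure path.
        JobStateStore failing = (jobId, user, token) -> {
            throw new IOException("store unavailable");
        };
        JobRegistry registry = new JobRegistry(failing);
        try {
            registry.recordJob("job_1", "alice", "secret");
        } catch (IOException e) {
            System.out.println("write failed: " + e.getMessage());
        }
        System.out.println("job registered in memory: " + registry.knows("job_1"));
    }
}
```

With this ordering, a failed store write never leaves a token in memory that would be lost on restart. Whether the caller should keep ignoring the exception is a separate question, which the comment also raises.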