[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated MAPREDUCE-5652:
       Resolution: Fixed
    Fix Version/s: 2.5.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Thanks for the contribution and patience with multiple reviews, Jason. Just committed this to trunk and branch-2.

> NM Recovery. ShuffleHandler should handle NM restarts
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>             Fix For: 2.5.0
>
>         Attachments: MAPREDUCE-5652-v10.patch, MAPREDUCE-5652-v2.patch,
> MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch,
> MAPREDUCE-5652-v6.patch, MAPREDUCE-5652-v7.patch, MAPREDUCE-5652-v8.patch,
> MAPREDUCE-5652-v9-and-YARN-1987.patch, MAPREDUCE-5652.patch
>
> ShuffleHandler should work across NM restarts and not require re-running
> map-tasks. On NM restart, the map outputs are cleaned up requiring
> re-execution of map tasks and should be avoided.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5652:
    Attachment: MAPREDUCE-5652-v10.patch

Updated patch to trunk.
[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5652:
    Attachment: MAPREDUCE-5652-v9-and-YARN-1987.patch

Filed YARN-1987 to cover the DBIterator wrapper and updated the patch to use that new wrapper class. Note that the patch includes YARN-1987 so Jenkins can comment.

bq. If ShuffleHandler gets DBException during recoverState as part of serviceStart, should ShuffleHandler ignore the exception and continue like the store doesn't exist?

Failure to recover should be a rare situation where the DB is corrupted or inaccessible, or where there is some schema incompatibility between versions because an upgrade occurred during the NM downtime. It should be investigated and corrected; otherwise the errors will likely be glossed over and we will continue to fail to shuffle across NM restarts from that point forward, despite the user specifying otherwise. We could add a config to request a "best effort" mode where it will continue despite the inability to recover, but is that an NM-wide config, a config just for the shuffle handler, or something else? If we want a config to control this, I propose we address it in a followup JIRA.
[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5652:
    Attachment: MAPREDUCE-5652-v8.patch

Sigh. I discovered that leveldb's DBIterator isn't consistent with the DB interface and throws raw RuntimeException rather than the derived DBException. That means whenever we interact with the database via the iterator, we risk leaking a raw RuntimeException where a DBException should have been caught.

I considered a few approaches to work around this:

# Catch RuntimeException rather than DBException for the code blocks that interact with the iterator.
# Catch RuntimeException and, if the cause is NativeDB.DBException, throw an IOException; otherwise rethrow the original exception.
# Wrap DBIterator in a private wrapper class which catches RuntimeException for each method invoked and rethrows it as DBException.

I dismissed the first approach since it's too ham-fisted. We're likely to catch NPEs and other unrelated RuntimeExceptions and handle them as if they were leveldb errors.

The second approach has the drawback that it knows a bit too much about the DBIterator implementation, in that it digs into the RuntimeException looking for a specific cause. If the cause were to switch to the iq80 DBException or some other type, then we'd leak it instead of converting it.

Therefore I went with the third approach. It still catches raw RuntimeException like the first approach, but it has the advantage that the try..catch block is localized to just the iterator method being invoked. Also, if leveldb's iterator is ever fixed to throw DBException, we can simply remove the wrapper rather than change all the try..catch code blocks that work with the iterator.
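The third approach can be illustrated with a minimal, self-contained sketch. It is not the code from the patch or from YARN-1987: `StoreException` is a stand-in for the unchecked org.iq80.leveldb.DBException, `SafeIterator` is a hypothetical name, and a plain `java.util.Iterator` stands in for DBIterator so the example compiles without the leveldb jar.

```java
import java.util.Iterator;

// Stand-in for org.iq80.leveldb.DBException, which is a RuntimeException.
class StoreException extends RuntimeException {
  StoreException(String message, Throwable cause) {
    super(message, cause);
  }
}

// Hypothetical wrapper: every delegated call converts a raw RuntimeException
// into StoreException, so callers only need to catch one exception type.
class SafeIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;

  SafeIterator(Iterator<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean hasNext() {
    try {
      return delegate.hasNext();
    } catch (StoreException e) {
      throw e; // already the expected type, pass it through unchanged
    } catch (RuntimeException e) {
      throw new StoreException(e.getMessage(), e);
    }
  }

  @Override
  public T next() {
    try {
      return delegate.next();
    } catch (StoreException e) {
      throw e;
    } catch (RuntimeException e) {
      throw new StoreException(e.getMessage(), e);
    }
  }
}
```

The try..catch lives only inside the wrapper, so code that iterates over the store keeps a single `catch (StoreException e)` block. The cost, as with the first approach, is that unrelated runtime errors raised inside the iterator are also converted; the difference is that the conversion is confined to the iterator calls themselves.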
[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5652:
    Attachment: MAPREDUCE-5652-v7.patch

bq. 1. Does leveldb's delete method throw exception? JNI has some exception handling and the caller needs to retrieve the exceptions, etc.

Nice catch! I didn't notice there were _two_ DBExceptions flying around in the leveldb code. org.fusesource.leveldbjni.internal.NativeDB.DBException comes from the JNI layer and derives from IOException, and it was the one I was familiar with. However, the wrapper code around the JNI layer catches that exception and rethrows it as org.iq80.leveldb.DBException, which is a RuntimeException. That means we need to wrap all calls that can throw the runtime form and either handle them directly or rethrow them as an IOException if it's not appropriate to let the RuntimeException leak out of the method. Updated the patch to deal with the runtime DBException when necessary. I'll also have to make similar changes in the NMLevelDBStateStore for the other NM restart patches.

bq. 2. It seems like recover/restore are common in NM/RM restart. Any abstract interface defined for that?

They both support recovery, but the forms in which they do it are very different (e.g., the types of state persisted are significantly different, the backing store types have no overlap, etc.). There could be a generic Recoverable interface that supports a recover() method, but I'm not sure what value that adds. Did you have a particular interface in mind or ideas on how it would be used?
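The "rethrow as an IOException" handling described in the reply to the first question might look roughly like the sketch below. All names are hypothetical and chosen for illustration: `RuntimeStoreException` stands in for the unchecked org.iq80.leveldb.DBException, and `KeyValueStore` is a minimal interface rather than the leveldb DB API.

```java
import java.io.IOException;

// Stand-in for org.iq80.leveldb.DBException (unchecked).
class RuntimeStoreException extends RuntimeException {
  RuntimeStoreException(String message) {
    super(message);
  }
}

// Hypothetical minimal store interface; methods may throw RuntimeStoreException.
interface KeyValueStore {
  void delete(String key);
}

// Hypothetical caller that does not want the unchecked exception to leak.
class JobStateRemover {
  private final KeyValueStore store;

  JobStateRemover(KeyValueStore store) {
    this.store = store;
  }

  // Converts the store's runtime exception into a checked IOException so that
  // callers are forced by the compiler to deal with store failures.
  void removeJob(String jobId) throws IOException {
    try {
      store.delete(jobId);
    } catch (RuntimeStoreException e) {
      throw new IOException("Unable to remove " + jobId + " from state store", e);
    }
  }
}
```

Only the store's own exception type is caught here, so a NullPointerException or other programming error still propagates unchanged instead of being reported as a storage failure.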
[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5652:
    Attachment: MAPREDUCE-5652-v6.patch

bq. Do you think renaming initStateStore to initAndOpenStore or startStore is reasonable?

Changed it to startStore.

bq. MAPREDUCE-5362. Let us try to get that in first. Mind taking a look?

Posted some comments to that JIRA but haven't seen any activity for a bit. In the interim this patch works without the changes from MAPREDUCE-5362. If MAPREDUCE-5362 happens to go in before this is committed then I'll update it accordingly.
[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5652:
    Attachment: MAPREDUCE-5652-v5.patch

Thanks for the review, Karthik!

bq. How do we handle applications that finish while the NM is down?

The RM is already telling the NM about finished applications, and since it's not creating a new RMNodeImpl when the NM rejoins, it should continue to tell the NM about applications that finished since the last time the node heartbeated. There may be some races in there where the RM thinks the NM processed the heartbeat response but the NM crashed before it completed processing it. If that's an issue, we could fix it by having the NM send the list of apps it thinks are still active when it re-registers, giving the RM an opportunity to correct it on the next heartbeat. However, that's outside of the scope (and project) of this JIRA. I'll try to address that in YARN-1354 or possibly a separate YARN JIRA.

bq. Code related to initStateStore should ideally go into serviceInit(), primarily to future-proof against us supporting (re)starting stopped services.

serviceStart() indirectly calls initStateStore to open the database, and serviceStop() closes the database. If we later want to support restarting stopped services, then we need to continue to open the database in serviceStart(). Moving the initStateStore call to init rather than start means we will try to use a closed database after the service is restarted, or am I missing something?

bq. Use the constant JOB here?

Fixed. I had to change its visibility to public in order to access it.

bq. ShuffleHandler#recordJobShuffleInfo: addJobToken() should come after attempt to include in the store?

Fixed.

bq. ShuffleHandler#close() should probably take care of clearing the static maps. Alternately, we could just make those maps non-static.

I made userRsrc and secretManager regular members rather than static. Originally I wasn't sure why they were static and didn't want to mess with that too much as part of this JIRA, but it's problematic for testing this. Apparently they always were static, but I couldn't find any reason for them to be so. Even in light of multiple ShuffleHandler instances, I don't see why it's something we need (or necessarily even want) to share. I manually ran the unit tests under hadoop-mapreduce-client-jobclient to verify there wasn't something fishy going on with the multi-instance ShuffleHandler in mini clusters or something like that.

bq. ShuffleHandler#forgetJob() - should we make those two maps non-static?

Removed forgetJob() now that the members are not static.

bq. Do we need to change hadoop-mapreduce-project/pom.xml, given we already add the dependencies in the shuffle module?

Yes, if I remove the dependency from that pom.xml then the leveldb jar doesn't show up in the resulting dist tree under mapreduce/lib/. We may need the equivalent of YARN-888 for the hadoop-mapreduce-project poms so that dependencies are declared only in the leaf modules and still get picked up properly.
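Two of the points above, opening the store in serviceStart() and closing it in serviceStop(), and persisting to the store before updating in-memory state, can be sketched as follows. This is a simplified illustration under stated assumptions, not the ShuffleHandler code: `SketchStateStore`, `SketchShuffleService`, `recordJob`, and `simulateStoreFailure` are hypothetical names, and an in-memory map stands in for the leveldb database on disk.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the on-disk state store; the patch uses leveldb.
class SketchStateStore {
  private final Map<String, String> disk; // simulates data that survives a restart
  private boolean open = true;
  boolean failWrites = false;             // test hook to simulate a failing store

  SketchStateStore(Map<String, String> disk) {
    this.disk = disk;
  }

  void put(String key, String value) throws IOException {
    if (!open) throw new IOException("state store is closed");
    if (failWrites) throw new IOException("simulated write failure");
    disk.put(key, value);
  }

  Map<String, String> loadAll() {
    return new HashMap<>(disk);
  }

  void close() {
    open = false;
  }
}

class SketchShuffleService {
  private final Map<String, String> disk = new HashMap<>();
  // Instance members rather than static, so each service has its own state.
  private final Map<String, String> jobTokens = new HashMap<>();
  private SketchStateStore store;

  // Opening (and recovering from) the store on start means a stopped service
  // gets a fresh, open handle if it is ever started again.
  void serviceStart() {
    store = new SketchStateStore(disk);
    jobTokens.clear();
    jobTokens.putAll(store.loadAll());
  }

  void serviceStop() {
    if (store != null) {
      store.close();
      store = null;
    }
    jobTokens.clear();
  }

  // Persist first, then update memory: if the write fails, the in-memory view
  // never claims a job that would be lost on restart.
  void recordJob(String jobId, String token) throws IOException {
    if (store == null) throw new IllegalStateException("service is not started");
    store.put(jobId, token);
    jobTokens.put(jobId, token);
  }

  boolean knowsJob(String jobId) {
    return jobTokens.containsKey(jobId);
  }

  // Test hook; call only after serviceStart().
  void simulateStoreFailure() {
    store.failWrites = true;
  }
}
```

If the store were opened in an init step instead, the second `serviceStart()` in a stop/start cycle would be left holding the handle that `serviceStop()` already closed, which is the concern raised in the reply about serviceInit().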
[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5652:
    Summary: NM Recovery. ShuffleHandler should handle NM restarts  (was: ShuffleHandler should handle NM restarts)
[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5652:
    Status: Patch Available  (was: Open)