[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996771#comment-13996771 ]

Hadoop QA commented on MAPREDUCE-5652:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644621/MAPREDUCE-5652-v10.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4601//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4601//console

This message is automatically generated.

NM Recovery. ShuffleHandler should handle NM restarts
Key: MAPREDUCE-5652
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Jason Lowe
Labels: shuffle
Attachments: MAPREDUCE-5652-v10.patch, MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, MAPREDUCE-5652-v7.patch, MAPREDUCE-5652-v8.patch, MAPREDUCE-5652-v9-and-YARN-1987.patch, MAPREDUCE-5652.patch

ShuffleHandler should work across NM restarts and not require re-running map-tasks. On NM restart, the map outputs are cleaned up requiring re-execution of map tasks and should be avoided.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999002#comment-13999002 ]

Hudson commented on MAPREDUCE-5652:

SUCCESS: Integrated in Hadoop-trunk-Commit #5605 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5605/])
MAPREDUCE-5652. NM Recovery. ShuffleHandler should handle NM restarts. (Jason Lowe via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594329)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/LICENSE.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobID.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/proto
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/proto/ShuffleHandlerRecovery.proto
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml

NM Recovery. ShuffleHandler should handle NM restarts
Key: MAPREDUCE-5652
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Jason Lowe
Labels: shuffle
Fix For: 2.5.0
Attachments: MAPREDUCE-5652-v10.patch, MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, MAPREDUCE-5652-v7.patch, MAPREDUCE-5652-v8.patch, MAPREDUCE-5652-v9-and-YARN-1987.patch, MAPREDUCE-5652.patch

ShuffleHandler should work across NM restarts and not require re-running map-tasks. On NM restart, the map outputs are cleaned up requiring re-execution of map tasks and should be avoided.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998170#comment-13998170 ]

Hudson commented on MAPREDUCE-5652:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
MAPREDUCE-5652. NM Recovery. ShuffleHandler should handle NM restarts. (Jason Lowe via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594329)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/LICENSE.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobID.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/proto
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/proto/ShuffleHandlerRecovery.proto
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996497#comment-13996497 ]

Karthik Kambatla commented on MAPREDUCE-5652:

+1, pending Jenkins.
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998208#comment-13998208 ]

Hudson commented on MAPREDUCE-5652:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
MAPREDUCE-5652. NM Recovery. ShuffleHandler should handle NM restarts. (Jason Lowe via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594329)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/LICENSE.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobID.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/pom.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/proto
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/proto/ShuffleHandlerRecovery.proto
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1399#comment-1399 ]

Karthik Kambatla commented on MAPREDUCE-5652:

[~jlowe] - now that YARN-1987 is committed, mind updating the patch against the latest trunk? We should be able to get this in soon.
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987668#comment-13987668 ]

Jason Lowe commented on MAPREDUCE-5652:

bq. Just to confirm, protobuf should be backward compatible, e.g., the store state serialized with version 2.4 should be readable by NM/MR compiled with version 2.5.

Yes, the protobuf incompatibility between 2.4 and 2.5 is an issue with the interfaces to the protobuf code and not an incompatibility with the data layout of protobuf messages.

bq. On an unrelated note, based on how NM's AuxServices' serviceStart handles errors for each AuxService's serviceStart, if one AuxService throws some exception, the rest of AuxServices' serviceStart will be skipped.

I might be reading the code incorrectly, but it looks like AuxServices#serviceStart doesn't try to handle exceptions coming from individual aux services at all. If one of their startups throws, then it will be converted into a RuntimeException (by AbstractService#start), which will bubble up out of AuxServices and likely all the way up such that the NM startup will fail.

As you pointed out before, a better way to handle aux services would be to run them outside of the NM (maybe even within containers). It'd also be nice to make them more dynamic, such that application submissions can provide an aux service they require. They could be started on demand, ref-counted, and stopped accordingly based on usage, which would be a smoother answer to rolling upgrades and sharing multiple versions of the aux services within the cluster. This is of course a non-trivial amount of work for another JIRA. ;-)
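The startup behaviour described in the comment above can be illustrated with a small, self-contained sketch. It does not use the real Hadoop classes: `SketchService`, `SketchAuxServices`, `startAll`, and `SketchAuxServicesDemo` are hypothetical stand-ins, and the wrapping into `RuntimeException` only mimics what the comment attributes to `AbstractService#start`. Under those assumptions, a failing child aborts the whole startup and later children are never started.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a YARN service; this is NOT Hadoop's AbstractService. */
abstract class SketchService {
  final String name;
  boolean started = false;

  SketchService(String name) {
    this.name = name;
  }

  /** Subclasses put their startup logic here and may throw. */
  protected abstract void serviceStart() throws Exception;

  /** Assumed behaviour: any startup failure is rethrown as a RuntimeException. */
  final void start() {
    try {
      serviceStart();
      started = true;
    } catch (Exception e) {
      throw new RuntimeException("Failed to start " + name, e);
    }
  }
}

/** Hypothetical composite that starts its children in order, with no per-child handling. */
class SketchAuxServices {
  private final List<SketchService> children = new ArrayList<>();

  void add(SketchService service) {
    children.add(service);
  }

  /** The first failure propagates to the caller, so later children are never started. */
  void startAll() {
    for (SketchService service : children) {
      service.start();
    }
  }
}

class SketchAuxServicesDemo {
  /** Builds a child service that either starts cleanly or fails during startup. */
  static SketchService service(String serviceName, final boolean fail) {
    return new SketchService(serviceName) {
      @Override
      protected void serviceStart() throws Exception {
        if (fail) {
          throw new IOException(name + " could not open its state store");
        }
      }
    };
  }

  public static void main(String[] args) {
    SketchAuxServices aux = new SketchAuxServices();
    SketchService first = service("first", false);
    SketchService broken = service("broken", true);
    SketchService third = service("third", false);
    aux.add(first);
    aux.add(broken);
    aux.add(third);
    try {
      aux.startAll();
    } catch (RuntimeException e) {
      System.out.println("startup aborted: " + e.getMessage());
    }
    System.out.println("first started: " + first.started);
    System.out.println("third started: " + third.started);
  }
}
```

A policy of skipping failed aux services, as raised in the thread, would amount to catching the exception inside the loop in `startAll` instead of letting it escape.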
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987115#comment-13987115 ]

Ming Ma commented on MAPREDUCE-5652:

Sounds good, we can use a new jira to cover best effort work. The patch looks good. Just to confirm, protobuf should be backward compatible, e.g., the store state serialized with version 2.4 should be readable by NM/MR compiled with version 2.5.

On an unrelated note, based on how NM's AuxServices' serviceStart handles errors for each AuxService's serviceStart, if one AuxService throws some exception, the rest of AuxServices' serviceStart will be skipped. That isn't important given we only have one AuxService. Perhaps there is some policy around that as well; should the NM skip a failed AuxService? It seems in general we might need to improve AuxService handling if there are other AuxServices.
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987248#comment-13987248 ]

Hadoop QA commented on MAPREDUCE-5652:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642848/MAPREDUCE-5652-v9-and-YARN-1987.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4573//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4573//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985932#comment-13985932 ]

Ming Ma commented on MAPREDUCE-5652:

1. Regarding a generic interface for restore/recover, I agree there is not much benefit in generalizing things for the sake of it. One scenario could be something like ShuffleHandler: some ShuffleHandlers support recovery, some don't. NM can ask a specific ShuffleHandler whether it supports recovery; NM will manage the underlying store and pass the store object to ShuffleHandler, and ShuffleHandler manages the serialization and deserialization, etc. If NM decides to change the underlying store, ShuffleHandler doesn't need to change. But at this point, it seems unnecessary.

2. If ShuffleHandler gets a DBException during recoverState as part of serviceStart, should ShuffleHandler ignore the exception and continue as if the store doesn't exist? The argument for ignoring it is that it is soft state and ShuffleHandler can still run without it. Or maybe this can be configurable.
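The configurable policy raised in point 2 might look roughly like the sketch below. Everything here is hypothetical and only illustrates the trade-off: `SketchStoreException` stands in for a store failure such as a DBException, `SketchStateStore` for the recovery store, and `ignoreRecoveryErrors` for a setting that does not exist in the patch as far as this thread shows.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a store failure such as a DBException. */
class SketchStoreException extends RuntimeException {
  SketchStoreException(String message) {
    super(message);
  }
}

/** Hypothetical view of the recovery store: job id mapped to serialized state. */
interface SketchStateStore {
  Map<String, String> loadAll();
}

class SketchShuffleRecovery {
  private final SketchStateStore store;
  private final boolean ignoreRecoveryErrors; // hypothetical configuration flag
  private final Map<String, String> jobs = new HashMap<>();

  SketchShuffleRecovery(SketchStateStore store, boolean ignoreRecoveryErrors) {
    this.store = store;
    this.ignoreRecoveryErrors = ignoreRecoveryErrors;
  }

  /**
   * Intended to run during startup. Returns true if state was loaded from the
   * store, false if a failure was ignored and the handler starts empty.
   */
  boolean recoverState() {
    try {
      jobs.putAll(store.loadAll());
      return true;
    } catch (SketchStoreException e) {
      if (!ignoreRecoveryErrors) {
        throw e; // strict policy: the failure propagates and startup fails
      }
      // Lenient policy: treat the state as soft and continue as if no store existed.
      jobs.clear();
      System.err.println("Ignoring recovery failure: " + e.getMessage());
      return false;
    }
  }

  int recoveredJobCount() {
    return jobs.size();
  }
}
```

With the lenient setting the handler keeps serving but loses knowledge of jobs whose state was in the store, so reducers of those jobs would presumably see fetch failures; the strict setting trades that for a failed startup.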
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981063#comment-13981063 ]

Hadoop QA commented on MAPREDUCE-5652:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641927/MAPREDUCE-5652-v8.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4558//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4558//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981076#comment-13981076 ]

Karthik Kambatla commented on MAPREDUCE-5652:

+1 to the wrapper approach. We should probably go ahead and make it a util class, and ping yarn-dev@ so other features using LevelDB are aware of this.
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981114#comment-13981114 ]

Jason Lowe commented on MAPREDUCE-5652:

As for the utility class, where would it go? I'm assuming yarn-server-common, otherwise we end up with clients picking it up. We'd have to add the leveldb dependency there which is unfortunate for those servers that don't really need it (i.e.: resourcemanager, webproxy).
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981357#comment-13981357 ]

Karthik Kambatla commented on MAPREDUCE-5652:

yarn-server-common seems the best place.
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979920#comment-13979920 ]

Hadoop QA commented on MAPREDUCE-5652:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641736/MAPREDUCE-5652-v7.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4554//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4554//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977938#comment-13977938 ]

Ming Ma commented on MAPREDUCE-5652:

Nice work. Jason, I would like to clarify how the following scenarios are handled. Perhaps they are covered at the YARN layer as part of https://issues.apache.org/jira/browse/YARN-1336.

1. NM crash scenario. There is a corner case: after RM notifies NM of the completion of a specific application, but right before AuxServices get the chance to process the event, NM crashes. The app entry won't be removed from the recovery store after NM is restarted, as APPLICATION_STOP won't be delivered to NM for that application after NM restart.

2. NM graceful shutdown. It seems ContainerManagerImpl's serviceStop will generate a ContainerManagerEventType.FINISH_APPS event. That means AuxServices could clean up and remove the app from the recovery store as part of NM shutdown.
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978561#comment-13978561 ]

Jason Lowe commented on MAPREDUCE-5652:

Thanks for the feedback, Ming!

For the NM crash scenario above we need YARN-1336 so applications are persisted outside of an individual aux service. Once that's present, it remembers when applications are finishing and persists that before responding to the NM heartbeat telling it of the apps that have just finished. Upon recovery it will recover the application and re-send the finish event which will send the app stop event to the aux services.

For the graceful shutdown scenario we need YARN-1336 and/or YARN-1362. Either YARN-1336 will never send app stop events upon NM shutdown if recovery is enabled or we need to be able to distinguish between a graceful NM shutdown and an NM kill/crash to know whether to send the app stop event to aux services.
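One way to read the crash-recovery flow described above is "persist the finishing state first, deliver the stop event second, and replay on restart". The sketch below illustrates that reading only; it is not how YARN-1336 is implemented. `SketchAppTracker`, its methods, and the `RUNNING`/`FINISHING` markers are hypothetical, and a plain `Map` passed in by the caller plays the role of a durable store that survives a restart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker that records application state outside any single aux service. */
class SketchAppTracker {
  private final Map<String, String> durableStore; // assumed to survive a restart
  private final List<String> stopEventsDelivered = new ArrayList<>();

  SketchAppTracker(Map<String, String> durableStore) {
    this.durableStore = durableStore;
  }

  void appStarted(String appId) {
    durableStore.put(appId, "RUNNING");
  }

  /** Step 1: durably record that the app is finishing before doing anything else. */
  void recordFinishing(String appId) {
    durableStore.put(appId, "FINISHING");
  }

  /** Step 2: deliver the app stop event to aux services, then forget the app. */
  void deliverStop(String appId) {
    stopEventsDelivered.add(appId);
    durableStore.remove(appId);
  }

  /** After a restart: replay the stop event for every app recorded as finishing. */
  void recover() {
    for (String appId : new ArrayList<>(durableStore.keySet())) {
      if ("FINISHING".equals(durableStore.get(appId))) {
        deliverStop(appId);
      }
    }
  }

  List<String> stopEventsDelivered() {
    return stopEventsDelivered;
  }
}
```

A crash between `recordFinishing` and `deliverStop` is then harmless in this model: a new `SketchAppTracker` built over the same store calls `recover()` and the aux services still see the stop event, while apps recorded as running are left alone.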
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13978830#comment-13978830 ] Ming Ma commented on MAPREDUCE-5652: Thanks, Jason. It is good to know it will be taken care of at the YARN layer. I will post some more comments at YARN-1336. 1. Does LevelDB's delete method throw an exception? JNI has some exception handling, and the caller needs to retrieve the exceptions, etc. 2. It seems like recover/restore are common to NM and RM restart. Is there any abstract interface defined for that? NM Recovery. ShuffleHandler should handle NM restarts - Key: MAPREDUCE-5652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Jason Lowe Labels: shuffle Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, MAPREDUCE-5652.patch ShuffleHandler should work across NM restarts and not require re-running map-tasks. On NM restart, the map outputs are cleaned up requiring re-execution of map tasks and should be avoided. -- This message was sent by Atlassian JIRA (v6.2#6252)
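On the delete question above, one common pattern is sketched below under an explicit assumption: that the Java binding to the database reports delete failures as an unchecked exception, which this thread does not confirm. The StoreException and KeyValueStore types are hypothetical stand-ins, not the LevelDB API; the point is only that the caller converts the unchecked failure into a checked IOException so it cannot be silently missed.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; StoreException and KeyValueStore are not LevelDB types.
class StoreDeleteSketch {

  /** Stand-in for whatever unchecked exception the DB binding might throw. */
  static class StoreException extends RuntimeException {
    StoreException(String message) { super(message); }
  }

  /** Minimal store abstraction used only for this illustration. */
  interface KeyValueStore {
    void delete(String key);
  }

  /** Converts an unchecked store failure into a checked IOException. */
  static void removeJob(KeyValueStore store, String jobId) throws IOException {
    try {
      store.delete(jobId);
    } catch (StoreException e) {
      throw new IOException("Unable to remove " + jobId + " from state store", e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> backing = new HashMap<>();
    backing.put("job_1", "token");
    try {
      // Succeeds: the key is simply removed from the backing map.
      removeJob(key -> backing.remove(key), "job_1");
      // Fails: the store throws, and the caller sees a checked IOException.
      removeJob(key -> { throw new StoreException("simulated disk error"); }, "job_2");
    } catch (IOException e) {
      System.out.println("Caught: " + e.getMessage());
    }
  }
}
```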
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977065#comment-13977065 ] Hadoop QA commented on MAPREDUCE-5652: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641270/MAPREDUCE-5652-v6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4543//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4543//console This message is automatically generated. NM Recovery. 
ShuffleHandler should handle NM restarts - Key: MAPREDUCE-5652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Jason Lowe Labels: shuffle Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, MAPREDUCE-5652.patch ShuffleHandler should work across NM restarts and not require re-running map-tasks. On NM restart, the map outputs are cleaned up requiring re-execution of map tasks and should be avoided. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13968971#comment-13968971 ] Karthik Kambatla commented on MAPREDUCE-5652: - bq. However that's outside of the scope (and project) of this JIRA. I'll try to address that in YARN-1354 or possibly a separate YARN JIRA. Thanks for the clarification. Let us track this in YARN-1354. bq. serviceStart() indirectly calls initStateStore to open the database, and serviceStop() closes the database. I see. It makes sense to leave it in serviceStart(). Do you think renaming initStateStore to initAndOpenStore or startStore is reasonable? bq. Maybe may need the equivalent of YARN-888 for hadoop-mapreduce-project poms to only declare them in the leaf modules and still have them be picked up properly. That is MAPREDUCE-5362. Let us try to get that in first. Mind taking a look? I'll also look at it shortly. NM Recovery. ShuffleHandler should handle NM restarts - Key: MAPREDUCE-5652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Jason Lowe Labels: shuffle Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652.patch ShuffleHandler should work across NM restarts and not require re-running map-tasks. On NM restart, the map outputs are cleaned up requiring re-execution of map tasks and should be avoided. -- This message was sent by Atlassian JIRA (v6.2#6252)
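The lifecycle point quoted above (open the database in serviceStart(), close it in serviceStop()) can be pictured with a simplified sketch. It borrows the serviceInit/serviceStart/serviceStop method names from the discussion, but the signatures are simplified and FakeDb and RecoverableService are hypothetical; this is not the patch's ShuffleHandler. The pairing matters because a service that is stopped and then started again gets a fresh, open store.

```java
// Hypothetical sketch; FakeDb and RecoverableService are illustrative only.
class StoreLifecycleSketch {

  /** Stand-in for a database handle that must be closed when done. */
  static class FakeDb implements AutoCloseable {
    boolean open = true;
    @Override
    public void close() { open = false; }
  }

  static class RecoverableService {
    private String storePath; // configuration captured at init time
    private FakeDb db;        // opened at start, closed at stop

    /** Init only records configuration; it does not touch the store. */
    void serviceInit(String path) { this.storePath = path; }

    /** Start opens the store, so every start gets a fresh handle. */
    void serviceStart() {
      if (storePath == null) {
        throw new IllegalStateException("serviceInit was not called");
      }
      db = new FakeDb();
    }

    /** Stop closes the handle that start opened; safe to call twice. */
    void serviceStop() {
      if (db != null) {
        db.close();
        db = null;
      }
    }

    boolean isStoreOpen() { return db != null && db.open; }
  }

  public static void main(String[] args) {
    RecoverableService service = new RecoverableService();
    service.serviceInit("/hypothetical/state/path");
    service.serviceStart();
    System.out.println("open after start: " + service.isStoreOpen());
    service.serviceStop();
    System.out.println("open after stop: " + service.isStoreOpen());
    service.serviceStart(); // a restart reopens the store
    System.out.println("open after restart: " + service.isStoreOpen());
    service.serviceStop();
  }
}
```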
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967206#comment-13967206 ] Hadoop QA commented on MAPREDUCE-5652: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639878/MAPREDUCE-5652-v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4505//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4505//console This message is automatically generated. NM Recovery. 
ShuffleHandler should handle NM restarts - Key: MAPREDUCE-5652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Jason Lowe Labels: shuffle Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652.patch ShuffleHandler should work across NM restarts and not require re-running map-tasks. On NM restart, the map outputs are cleaned up requiring re-execution of map tasks and should be avoided. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965478#comment-13965478 ] Karthik Kambatla commented on MAPREDUCE-5652: - Point 6 above - should we make those two maps non-static? NM Recovery. ShuffleHandler should handle NM restarts - Key: MAPREDUCE-5652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Jason Lowe Labels: shuffle Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652.patch ShuffleHandler should work across NM restarts and not require re-running map-tasks. On NM restart, the map outputs are cleaned up requiring re-execution of map tasks and should be avoided. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965476#comment-13965476 ] Karthik Kambatla commented on MAPREDUCE-5652: - Approach looks good. Comments: # How do we handle applications that finish while the NM is down? # Code related to initStateStore should ideally go into serviceInit(), primarily to future-proof against us supporting (re)starting stopped services. # Use the constant {{JOB}} here? {code} iter.seek(bytes(job)); {code} # ShuffleHandler#recordJobShuffleInfo: should addJobToken() come after the attempt to write to the store? That way we fail early if we can't write to the store for any reason. Where we call this method, we catch and ignore all exceptions. # ShuffleHandler#close() should probably take care of clearing the static maps. Alternatively, we could just make those maps non-static. # ShuffleHandler#forgetJob() - should we make those two maps # Do we need to change hadoop-mapreduce-project/pom.xml, given we already add the dependencies in the shuffle module? # Nice test. NM Recovery. ShuffleHandler should handle NM restarts - Key: MAPREDUCE-5652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Jason Lowe Labels: shuffle Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652.patch ShuffleHandler should work across NM restarts and not require re-running map-tasks. On NM restart, the map outputs are cleaned up requiring re-execution of map tasks and should be avoided. -- This message was sent by Atlassian JIRA (v6.2#6252)
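Two of the review points above (write to the store before updating in-memory state, and keep the maps as instance fields rather than static ones) can be sketched as follows. The Handler, StateStore, and recordJob names are hypothetical and much simpler than the real ShuffleHandler; the sketch shows only the ordering idea, assuming the store reports failure with an IOException.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the patch's ShuffleHandler.
class RecordJobSketch {

  /** Minimal store abstraction; a write may fail with IOException. */
  interface StateStore {
    void put(String key, String value) throws IOException;
  }

  static class Handler {
    // Instance (non-static) map: each handler starts with a clean view.
    private final Map<String, String> jobTokens = new ConcurrentHashMap<>();
    private final StateStore store;

    Handler(StateStore store) { this.store = store; }

    /** Persist first; touch in-memory state only after the write succeeds. */
    void recordJob(String jobId, String token) throws IOException {
      store.put(jobId, token);     // may throw; nothing in memory has changed yet
      jobTokens.put(jobId, token); // reached only if the store write succeeded
    }

    boolean knowsJob(String jobId) { return jobTokens.containsKey(jobId); }
  }

  public static void main(String[] args) {
    Handler failing = new Handler((key, value) -> {
      throw new IOException("simulated store failure");
    });
    try {
      failing.recordJob("job_1", "token");
    } catch (IOException e) {
      System.out.println("Write failed, job known in memory: " + failing.knowsJob("job_1"));
    }
  }
}
```

With this ordering, a failed store write leaves memory and disk in agreement: the job is unknown in both, so a later restart cannot recover less than the running service believed it had.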
[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963376#comment-13963376 ] Hadoop QA commented on MAPREDUCE-5652: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639250/MAPREDUCE-5652-v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4492//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4492//console This message is automatically generated. NM Recovery. ShuffleHandler should handle NM restarts - Key: MAPREDUCE-5652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Jason Lowe Labels: shuffle Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652.patch ShuffleHandler should work across NM restarts and not require re-running map-tasks. 
On NM restart, the map outputs are cleaned up requiring re-execution of map tasks and should be avoided. -- This message was sent by Atlassian JIRA (v6.2#6252)