[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239428#comment-13239428 ]

Hudson commented on MAPREDUCE-3353:
---
Integrated in Hadoop-Hdfs-trunk #997 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/997/])
MAPREDUCE-3353. Fixed commit msg to point to right jira. (Revision 1305457)

Result = FAILURE
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305457
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
---
Key: MAPREDUCE-3353
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
Fix For: 0.23.3
Attachments: MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch

When a node gets lost or turns faulty, the AM needs to know about that event so that it can take some action, e.g. re-executing map tasks whose intermediate output lives on that faulty node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
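The issue description motivates the channel with one concrete reaction: re-running map tasks whose intermediate output sat on a node that is no longer usable. A minimal sketch of that bookkeeping follows. The class name, method names, and string identifiers are all hypothetical and are not taken from the patch; the real MR AM tracks far more state than this.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: remembers which node holds each completed map's output
// and reports which maps must be re-run when a node becomes unusable.
class LostNodeRescheduler {
    // completed map task id -> id of the node holding its intermediate output
    private final Map<String, String> completedMapOutputs = new LinkedHashMap<>();

    void mapCompleted(String taskId, String nodeId) {
        completedMapOutputs.put(taskId, nodeId);
    }

    // Called when the AM learns that a node is lost or unhealthy.
    // Returns the map tasks to re-execute and forgets their old outputs.
    List<String> onNodeUnusable(String nodeId) {
        List<String> toRerun = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = completedMapOutputs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            if (entry.getValue().equals(nodeId)) {
                toRerun.add(entry.getKey());
                it.remove();
            }
        }
        return toRerun;
    }

    public static void main(String[] args) {
        LostNodeRescheduler rescheduler = new LostNodeRescheduler();
        rescheduler.mapCompleted("map_1", "nodeA");
        rescheduler.mapCompleted("map_2", "nodeB");
        rescheduler.mapCompleted("map_3", "nodeA");
        System.out.println(rescheduler.onNodeUnusable("nodeA"));
    }
}
```

Without a notification from the RM, the AM in this sketch would have no trigger for calling `onNodeUnusable`, which is the gap the issue asks to close.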
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239475#comment-13239475 ]

Hudson commented on MAPREDUCE-3353:
---
Integrated in Hadoop-Mapreduce-trunk #1032 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1032/])
MAPREDUCE-3353. Fixed commit msg to point to right jira. (Revision 1305457)

Result = SUCCESS
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305457
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238393#comment-13238393 ]

Robert Joseph Evans commented on MAPREDUCE-3353:

Arun, it looks like you put MAPREDUCE-3533 instead of MAPREDUCE-3353 in CHANGES.txt. Could you please fix it?
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238601#comment-13238601 ]

Hudson commented on MAPREDUCE-3353:
---
Integrated in Hadoop-Common-trunk-Commit #1930 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1930/])
MAPREDUCE-3353. Fixed commit msg to point to right jira. (Revision 1305457)

Result = SUCCESS
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305457
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238603#comment-13238603 ]

Hudson commented on MAPREDUCE-3353:
---
Integrated in Hadoop-Hdfs-0.23-Commit #719 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/719/])
MAPREDUCE-3353. Fixed commit msg to point to right jira. (Revision 1305458)

Result = SUCCESS
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305458
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238617#comment-13238617 ]

Hudson commented on MAPREDUCE-3353:
---
Integrated in Hadoop-Hdfs-trunk-Commit #2005 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2005/])
MAPREDUCE-3353. Fixed commit msg to point to right jira. (Revision 1305457)

Result = SUCCESS
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305457
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238624#comment-13238624 ]

Hudson commented on MAPREDUCE-3353:
---
Integrated in Hadoop-Common-0.23-Commit #729 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/729/])
MAPREDUCE-3353. Fixed commit msg to point to right jira. (Revision 1305458)

Result = SUCCESS
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305458
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238683#comment-13238683 ]

Hudson commented on MAPREDUCE-3353:
---
Integrated in Hadoop-Mapreduce-0.23-Commit #738 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/738/])
MAPREDUCE-3353. Fixed commit msg to point to right jira. (Revision 1305458)

Result = ABORTED
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305458
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238703#comment-13238703 ]

Hudson commented on MAPREDUCE-3353:
---
Integrated in Hadoop-Mapreduce-trunk-Commit #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1941/])
MAPREDUCE-3353. Fixed commit msg to point to right jira. (Revision 1305457)

Result = ABORTED
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305457
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233248#comment-13233248 ]

Hadoop QA commented on MAPREDUCE-3353:
--
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12519012/MAPREDUCE-3353-branch-0.23.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 507 javac compiler warnings (more than the trunk's current 505 warnings).
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2074//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2074//console

This message is automatically generated.

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
---
Key: MAPREDUCE-3353
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
Fix For: 0.23.2
Attachments: MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233361#comment-13233361 ]

Arun C Murthy commented on MAPREDUCE-3353:
--
Bikas, can you please look at the javac warnings? Thanks.
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233568#comment-13233568 ]

Bikas Saha commented on MAPREDUCE-3353:
---
They have already been clarified in the first patch submission. Pasting from an earlier comment.

The javac warnings are caused by event handlers being called in NodesListManager.java and are similar to pre-existing warnings.
===
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java:[142,19] [unchecked] unchecked call to handle(T) as a member of the raw type org.apache.hadoop.yarn.event.EventHandler
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java:[155,21] [unchecked] unchecked call to handle(T) as a member of the raw type org.apache.hadoop.yarn.event.EventHandler
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java:[244,35] [unchecked] unchecked call to handle(T) as a member of the raw type org.apache.hadoop.yarn.event.EventHandler
===
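For readers unfamiliar with this class of warning, the sketch below reproduces the general pattern: calling a generic method through a raw-typed reference. The `Handler` interface is a hypothetical stand-in written for this example and is not the Hadoop `EventHandler` class; the method names are likewise illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generic event handler interface.
interface Handler<T> {
    void handle(T event);
}

class RawTypeWarningSketch {
    static final List<String> LOG = new ArrayList<>();

    // The parameter is a raw type, so javac cannot check the argument of
    // handle(...). Without the annotation, compiling with -Xlint:unchecked
    // reports an "unchecked call to handle(T)" warning on the call below.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void dispatchRaw(Handler handler, Object event) {
        handler.handle(event);
    }

    // Parameterized variant: the argument type is checked at compile time,
    // so no unchecked warning is produced.
    static <T> void dispatchTyped(Handler<T> handler, T event) {
        handler.handle(event);
    }

    public static void main(String[] args) {
        Handler<String> handler = event -> LOG.add("handled " + event);
        dispatchRaw(handler, "node-lost");
        dispatchTyped(handler, "node-unhealthy");
        System.out.println(LOG);
    }
}
```

Both calls behave the same at run time; the difference is only in what the compiler can verify, which is why such warnings are often tolerated when they match an existing pattern in the code base.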
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233157#comment-13233157 ]

Hadoop QA commented on MAPREDUCE-3353:
--
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12519006/MAPREDUCE-3353-branch-0.23.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 20 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 507 javac compiler warnings (more than the trunk's current 505 warnings).
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2071//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2071//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233178#comment-13233178 ]

Hadoop QA commented on MAPREDUCE-3353:
--
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12519012/MAPREDUCE-3353-branch-0.23.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 507 javac compiler warnings (more than the trunk's current 505 warnings).
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2072//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2072//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228739#comment-13228739 ]

Arun C Murthy commented on MAPREDUCE-3353:
--
Looks good, some minor nits:
# RegisterApplicationMasterResponse.(get,set)UnusableNodes is not used right now, so let's add it later when we need it.
# AMResponse.getUpdatedNodes could use javadocs.
# BuilderUtils.createNodeReport
# RMContextImpl.applications should be changed to ConcurrentSkipListMap to be safe - we should open a separate jira to fix the signature of RMContext.get*, which return ConcurrentMap.
# RMAppImpl.pullRMNodeUpdates needs a writeLock since it's clearing.
# RMAppImpl should ignore NodeUpdate in COMPLETED state (thus we can remove the 'if' condition in RMAppNodeUpdateTransition).
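Two of the review points above (taking a write lock in a "pull" method that clears state, and ignoring node updates once the application has completed) can be illustrated with a small sketch. Everything here is hypothetical: the class `NodeUpdateBuffer`, its methods, and the string node ids are invented for the example, and the completed check uses a plain flag rather than the state machine transition the review refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model of per-application node update tracking.
class NodeUpdateBuffer {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<String> updatedNodes = new TreeSet<>();
    private boolean completed = false;

    void markCompleted() {
        lock.writeLock().lock();
        try {
            completed = true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Records a node update; returns false if it was ignored because the
    // application has already completed.
    boolean onNodeUpdate(String nodeId) {
        lock.writeLock().lock();
        try {
            if (completed) {
                return false;
            }
            updatedNodes.add(nodeId);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Copies out the pending updates and clears them. This mutates shared
    // state, so a read lock would not be sufficient here.
    List<String> pullNodeUpdates() {
        lock.writeLock().lock();
        try {
            List<String> pulled = new ArrayList<>(updatedNodes);
            updatedNodes.clear();
            return pulled;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Pure read: a read lock is enough.
    int pendingCount() {
        lock.readLock().lock();
        try {
            return updatedNodes.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The point of the sketch is the lock choice in `pullNodeUpdates`: because it both reads and clears the set, holding only the read lock would let two callers observe and clear the same updates concurrently.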
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222848#comment-13222848 ]

Hadoop QA commented on MAPREDUCE-3353:
--
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12517166/MAPREDUCE-3353-branch-0.23.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 29 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 503 javac compiler warnings (more than the trunk's current 501 warnings).
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2008//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2008//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222947#comment-13222947 ]

Bikas Saha commented on MAPREDUCE-3353:
---
The test failure is unrelated to the patch and also happens on trunk. See MAPREDUCE-3976.
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222111#comment-13222111 ]

Amol Kekre commented on MAPREDUCE-3353:

Not sure why this JIRA is marked Critical. It only has an impact if a node goes bad during the AM's life span, right? If so, given the 3 attempts by MR, how important is this JIRA (Major?).

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
Key: MAPREDUCE-3353
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
Priority: Critical
Fix For: 0.23.2
Attachments: MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch

When a node gets lost or turns faulty, the AM needs to know about that event so that it can take some action, such as re-executing map tasks whose intermediate output lives on that faulty node.
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219337#comment-13219337 ]

Amol Kekre commented on MAPREDUCE-3353:

Any updates?

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
Key: MAPREDUCE-3353
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
Priority: Critical
Fix For: 0.23.2

When a node gets lost or turns faulty, the AM needs to know about that event so that it can take some action, such as re-executing map tasks whose intermediate output lives on that faulty node.
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219400#comment-13219400 ]

Bikas Saha commented on MAPREDUCE-3353:

The changes turned out to be larger than initially expected. I have the code done and will start on the tests.

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
Key: MAPREDUCE-3353
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
Priority: Critical
Fix For: 0.23.2

When a node gets lost or turns faulty, the AM needs to know about that event so that it can take some action, such as re-executing map tasks whose intermediate output lives on that faulty node.
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215306#comment-13215306 ]

Bikas Saha commented on MAPREDUCE-3353:

A potential solution would be the following:

1) Have the scheduler interface return the set of bad nodes on which it has stopped scheduling. This keeps the decision of which node is bad in the scheduler. The scheduler is the ultimate authority on what runs on a node and should tell its clients about the nodes that it is not considering for scheduling.
2) Item 1 above could be done as another interface API or piggybacked on the scheduler.allocate() API.
3) The response could contain all the known bad nodes or deltas to the previous response. Deltas are cheaper to send but are susceptible to message loss and retransmission. Also, deltas would have to be divided into new bad nodes and new good nodes.
4) The AM might want to know the type of bad node, say lost or unhealthy. The bad node information could be enhanced by querying the RMNode object for the actual reason/health.

As an enhancement, we could add a new RMNodeManager entity that manages all the RMNodes. The above functionality could move from the scheduler into RMNodeManager (though it would need to be in sync with the scheduler). After that, getting detailed information may not need direct access to the RMNode object. Potentially, other interactions with RMNode could be forwarded through the RMNodeManager. But this would be a fairly significant refactoring that is best left to a separate future work item.

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
Key: MAPREDUCE-3353
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
Priority: Critical
Fix For: 0.23.2

When a node gets lost or turns faulty, the AM needs to know about that event so that it can take some action, such as re-executing map tasks whose intermediate output lives on that faulty node.
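Points 2 and 4 of the proposal in the comment above could be sketched roughly as follows: node condition changes are buffered per application attempt and handed over whenever the AM makes its next allocate call. This is only an illustration under stated assumptions. NodeCondition, NodeUpdate, and NodeUpdateBuffer are hypothetical names invented for this sketch and are not classes from the patch or from Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reason codes an AM might want to distinguish (point 4). */
enum NodeCondition { HEALTHY, UNHEALTHY, LOST }

/** Hypothetical value object: one node and its latest reported condition. */
final class NodeUpdate {
    final String nodeId;
    final NodeCondition condition;

    NodeUpdate(String nodeId, NodeCondition condition) {
        this.nodeId = nodeId;
        this.condition = condition;
    }
}

/** Hypothetical per-attempt buffer whose contents ride along on an allocate response (point 2). */
final class NodeUpdateBuffer {
    private final List<NodeUpdate> pending = new ArrayList<>();

    /** Records a condition change; synchronized so several threads may report. */
    synchronized void record(String nodeId, NodeCondition condition) {
        pending.add(new NodeUpdate(nodeId, condition));
    }

    /** Returns everything recorded since the previous call, in order, and clears the buffer. */
    synchronized List<NodeUpdate> drain() {
        List<NodeUpdate> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }
}
```

In this sketch the allocate handler would call drain() once per response, so an AM that polls regularly sees each change exactly once. Handling a lost response, which point 3 raises as a concern, is deliberately left out.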
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215328#comment-13215328 ]

Bikas Saha commented on MAPREDUCE-3353:

Not doing deltas on the RM-AM channel does not seem viable because of the high-frequency message traffic. Sending information about 100 bad nodes at 100 bytes per node to 1000 AMs every second is about 10 MB/s of traffic.

Sending deltas means tracking the last and current states on the RM on a per-AM-attempt basis. That would not be good to do in the scheduler because it is not the scheduler's responsibility. So this needs to be done on each RMAppAttempt object. The RMAppAttempt object gets the current list of bad nodes and compares it with its last known list of bad nodes. Additions and deletions are sent to the AM as new bad and good nodes.

Alternatively, each RMNode could send an event to each RMAppAttempt for healthy-to-unhealthy transitions and vice versa. These events could be accumulated and copied to the AM via the allocate response.

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
Key: MAPREDUCE-3353
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
Priority: Critical
Fix For: 0.23.2

When a node gets lost or turns faulty, the AM needs to know about that event so that it can take some action, such as re-executing map tasks whose intermediate output lives on that faulty node.
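The list comparison described in the comment above amounts to two set differences. A minimal sketch, assuming nodes are identified by plain strings and that one tracker instance exists per application attempt; BadNodeDeltaTracker and Delta are hypothetical names for illustration, not classes from the patch or from Hadoop:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-attempt tracker that turns full bad-node lists into deltas. */
final class BadNodeDeltaTracker {
    /** Bad nodes as of the previous comparison for this attempt. */
    private Set<String> lastKnownBad = new HashSet<>();

    /** Result of one comparison: nodes that turned bad and nodes that recovered. */
    static final class Delta {
        final Set<String> newlyBad;
        final Set<String> newlyGood;

        Delta(Set<String> newlyBad, Set<String> newlyGood) {
            this.newlyBad = Collections.unmodifiableSet(newlyBad);
            this.newlyGood = Collections.unmodifiableSet(newlyGood);
        }
    }

    /** Compares the current bad-node set with the last known one and remembers the new state. */
    Delta update(Set<String> currentBad) {
        // Additions: bad now, but not in the last known list.
        Set<String> newlyBad = new HashSet<>(currentBad);
        newlyBad.removeAll(lastKnownBad);

        // Deletions: in the last known list, but no longer bad.
        Set<String> newlyGood = new HashSet<>(lastKnownBad);
        newlyGood.removeAll(currentBad);

        lastKnownBad = new HashSet<>(currentBad);
        return new Delta(newlyBad, newlyGood);
    }
}
```

With this shape a response carries only the nodes whose state changed since the previous comparison, rather than the full 100-node list from the traffic estimate. The sketch does not address lost or retransmitted responses, and it is not thread-safe as written.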
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212963#comment-13212963 ]

Amol Kekre commented on MAPREDUCE-3353:

Vinod, should this be in a 0.23.1 RC or can we move it to 0.23.2?

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
Key: MAPREDUCE-3353
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
Fix For: 0.23.1

When a node gets lost or turns faulty, the AM needs to know about that event so that it can take some action, such as re-executing map tasks whose intermediate output lives on that faulty node.