[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464677#comment-13464677 ]

Hudson commented on HDFS-3860:
------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #387 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/387/])
svn merge -c 1378228 FIXES: HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1390632)

Result = UNSTABLE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390632
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java

> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> ---------------------------------------------------------------------
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Fix For: 0.23.4, 2.0.2-alpha
>
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the
> monitor thread will acquire the write lock of namesystem, and recheck the
> safemode. If it is in safemode, the monitor thread will return from the
> heartbeatCheck function without releasing the write lock. This may cause the
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
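The issue description above can be illustrated with a minimal sketch. This is not the committed patch: `SketchNamesystem` and `HeartbeatCheckSketch` are hypothetical stand-ins built on `java.util.concurrent.locks.ReentrantReadWriteLock`, and the method names only mirror those mentioned in this thread. The first method has the leaky shape the report describes; the second shows one common way to avoid it, a `try`/`finally` around the locked region.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the namesystem; not the real Hadoop class. */
class SketchNamesystem {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean safeMode;

    void writeLock() { lock.writeLock().lock(); }
    void writeUnlock() { lock.writeLock().unlock(); }
    boolean isInSafeMode() { return safeMode; }
    void setSafeMode(boolean on) { safeMode = on; }
    boolean isWriteLockedByCurrentThread() {
        return lock.isWriteLockedByCurrentThread();
    }
}

/** Hypothetical sketch of the two shapes of the dead-datanode path. */
class HeartbeatCheckSketch {
    /** Leaky shape: the early return skips writeUnlock(). */
    static boolean removeDeadLeaky(SketchNamesystem ns) {
        ns.writeLock();
        if (ns.isInSafeMode()) {
            return false; // BUG: the write lock is still held here
        }
        // ... remove the dead datanode (elided in this sketch) ...
        ns.writeUnlock();
        return true;
    }

    /** Safer shape: finally releases the lock on every exit path. */
    static boolean removeDeadSafe(SketchNamesystem ns) {
        ns.writeLock();
        try {
            if (ns.isInSafeMode()) {
                return false; // the finally block still runs
            }
            // ... remove the dead datanode (elided in this sketch) ...
            return true;
        } finally {
            ns.writeUnlock();
        }
    }
}
```

With safe mode on, `removeDeadSafe` returns `false` and leaves the lock free, while `removeDeadLeaky` returns `false` with the calling thread still holding the write lock.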
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444078#comment-13444078 ]

Hudson commented on HDFS-3860:
------------------------------

Integrated in Hadoop-Mapreduce-trunk #1180 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1180/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228)

Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java

> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> ---------------------------------------------------------------------
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444024#comment-13444024 ]

Hudson commented on HDFS-3860:
------------------------------

Integrated in Hadoop-Hdfs-trunk #1149 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1149/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228)

Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443367#comment-13443367 ]

Hudson commented on HDFS-3860:
------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #2715 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2715/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228)

Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443353#comment-13443353 ]

Hudson commented on HDFS-3860:
------------------------------

Integrated in Hadoop-Common-trunk-Commit #2651 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2651/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228)

Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443338#comment-13443338 ]

Hudson commented on HDFS-3860:
------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #2680 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2680/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228)

Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443292#comment-13443292 ]

Jing Zhao commented on HDFS-3860:
---------------------------------

I just checked all the invocations of namesystem#writelock / writeunlock, and did not find similar problems. I will check other similar code too.
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443289#comment-13443289 ]

Suresh Srinivas commented on HDFS-3860:
---------------------------------------

Thanks Aaron for committing the patch.

bq. BTW could you please also ensure that this pattern of code is not repeated in any other places.

Going back to my previous comment, Jing, if possible can you also see if there are other such issues.
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443271#comment-13443271 ]

Aaron T. Myers commented on HDFS-3860:
--------------------------------------

Oof, good catch, Jing. Fortunately this case seems like it would be pretty tough to hit, since if the NN is in SM then HeartbeatManager#heartbeatCheck will return early, so to hit this the NN would have to enter SM in a very short window of time. Still certainly worth fixing, though.

The patch looks good to me. The findbugs warning is unrelated and TestHftpDelegationToken is known to currently be failing.

+1, I'll commit this momentarily.
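The narrow window described in the comment above can be sketched as follows. This is only an illustration under stated assumptions: `WindowSketch`, its fields, and the `betweenCheckAndLock` hook are hypothetical names invented here to make the timing deterministic, and they do not appear in HeartbeatManager or in the attached test case. The sketch uses the `try`/`finally` shape, so the lock is released even when safe mode is entered inside the window.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of the check / lock / recheck sequence. */
class WindowSketch {
    final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
    volatile boolean safeMode = false;

    /**
     * @param betweenCheckAndLock hypothetical hook that runs inside the
     *        window between the early safe-mode check and the lock
     * @return true if a dead datanode was (notionally) removed
     */
    boolean heartbeatCheck(Runnable betweenCheckAndLock) {
        if (safeMode) {
            return false; // common case: return early, lock never taken
        }
        // ... a scan finds a dead datanode (elided in this sketch) ...
        betweenCheckAndLock.run(); // safe mode may be entered here
        nsLock.writeLock().lock();
        try {
            if (safeMode) {
                return false; // recheck under the lock; finally releases it
            }
            // ... remove the dead datanode (elided in this sketch) ...
            return true;
        } finally {
            nsLock.writeLock().unlock();
        }
    }
}
```

Calling `heartbeatCheck(() -> s.safeMode = true)` on an instance `s` simulates safe mode being entered inside the window; the call returns `false` and `s.nsLock.isWriteLocked()` is `false` afterwards.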
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443105#comment-13443105 ]

Hadoop QA commented on HDFS-3860:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12542695/HDFS-3860.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//console

This message is automatically generated.
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443056#comment-13443056 ]

Suresh Srinivas commented on HDFS-3860:
---------------------------------------

Jing, nice find. Submitting the patch.
[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443058#comment-13443058 ]

Suresh Srinivas commented on HDFS-3860:
---------------------------------------

BTW could you please also ensure that this pattern of code is not repeated in any other places.