[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845293#comment-13845293 ]

Hudson commented on HDFS-5504:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5504.patch, HDFS-5504.patch

1. HA installation, standby NN is down.
2. Delete snapshot is called; it deletes the blocks from the blocksmap and from all datanodes. The log sync also happens.
3. Before the next log roll, the NN crashes.
4. When the namenode restarts, it loads the fsimage and the finalized edits from shared storage and sets the safemode threshold, which still includes the blocks from the deleted snapshot. (These edits have not been read yet, because the namenode restarted before the last edits segment was finalized.)
5. When it becomes active, it finalizes the edits and reads the delete snapshot edits_op, but at this point it does not reduce the safemode count, so it remains in safemode.
6. On the next restart, because the edits are already finalized, the namenode reads them at startup and sets the safemode threshold correctly. So one more restart is needed to bring the NN out of safemode.

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
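The reproduction steps above come down to simple arithmetic, which the following toy model tries to make concrete. It is only a sketch under stated assumptions: `SafeModeSketch`, `reportSafeBlock`, and `adjustBlockTotals` are hypothetical names chosen for illustration, not the NameNode's actual classes or methods, and the block counts are invented. It shows that if replaying the snapshot deletion does not shrink the expected block total, the reported blocks can never reach the threshold.

```java
// Toy model of safemode accounting. All names are hypothetical and the
// numbers are made up; this is not the NameNode implementation.
final class SafeModeSketch {
    private final double thresholdPct; // fraction of blocks that must be reported
    private long blockTotal;           // blocks the NameNode expects to exist
    private long blockSafe;            // blocks reported with enough replicas

    SafeModeSketch(double thresholdPct, long blockTotal) {
        this.thresholdPct = thresholdPct;
        this.blockTotal = blockTotal;
    }

    /** A datanode reported one more sufficiently replicated block. */
    void reportSafeBlock() {
        blockSafe++;
    }

    /** Applied when replaying an edit changes the set of expected blocks. */
    void adjustBlockTotals(long deltaSafe, long deltaTotal) {
        blockSafe += deltaSafe;
        blockTotal += deltaTotal;
    }

    long blockThreshold() {
        return (long) (blockTotal * thresholdPct);
    }

    boolean canLeave() {
        return blockSafe >= blockThreshold();
    }

    public static void main(String[] args) {
        // fsimage + finalized edits still list 100 blocks, 40 of which
        // belonged to the deleted snapshot and no longer exist on datanodes.
        SafeModeSketch sm = new SafeModeSketch(0.999, 100);
        for (int i = 0; i < 60; i++) {
            sm.reportSafeBlock();
        }
        System.out.println("before adjusting totals: canLeave=" + sm.canLeave());
        // Replaying the delete-snapshot op should shrink the expected total.
        sm.adjustBlockTotals(0, -40);
        System.out.println("after adjusting totals:  canLeave=" + sm.canLeave());
    }
}
```

In this model the datanodes can only ever report 60 blocks, so the threshold computed from 100 blocks is unreachable until the total is adjusted downward.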
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845389#comment-13845389 ]

Hudson commented on HDFS-5504:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5504.patch, HDFS-5504.patch
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845444#comment-13845444 ]

Hudson commented on HDFS-5504:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5504.patch, HDFS-5504.patch
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844901#comment-13844901 ]

Hudson commented on HDFS-5504:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4859 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4859/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
Fix For: 2.4.0
Attachments: HDFS-5504.patch, HDFS-5504.patch
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822319#comment-13822319 ]

Hudson commented on HDFS-5504:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #391 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/391/])
HDFS-5504. In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode. Contributed by Vinay. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1541773)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotDeletion.java

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5504.patch, HDFS-5504.patch

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822410#comment-13822410 ]

Hudson commented on HDFS-5504:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1608 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1608/])
HDFS-5504. In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode. Contributed by Vinay. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1541773)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotDeletion.java

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5504.patch, HDFS-5504.patch
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822435#comment-13822435 ]

Hudson commented on HDFS-5504:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1582 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1582/])
HDFS-5504. In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode. Contributed by Vinay. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1541773)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotDeletion.java

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5504.patch, HDFS-5504.patch
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821354#comment-13821354 ]

Hadoop QA commented on HDFS-5504:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12613578/HDFS-5504.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5421//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5421//console

This message is automatically generated.

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Attachments: HDFS-5504.patch, HDFS-5504.patch
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822045#comment-13822045 ]

Jing Zhao commented on HDFS-5504:
---------------------------------

+1. I will commit the patch shortly.

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Attachments: HDFS-5504.patch, HDFS-5504.patch
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822058#comment-13822058 ]

Vinay commented on HDFS-5504:
-----------------------------

Thanks, Jing, for the review and commit.

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5504.patch, HDFS-5504.patch
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822089#comment-13822089 ]

Hudson commented on HDFS-5504:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4733 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4733/])
HDFS-5504. In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode. Contributed by Vinay. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1541773)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotDeletion.java

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5504.patch, HDFS-5504.patch
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820426#comment-13820426 ]

Hadoop QA commented on HDFS-5504:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12613367/HDFS-5504.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5399//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5399//console

This message is automatically generated.

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Attachments: HDFS-5504.patch
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820638#comment-13820638 ]

Jing Zhao commented on HDFS-5504:
---------------------------------

The patch looks good to me. One minor issue is that removePathAndBlocks already holds the FSNS write lock, and with the patch we will acquire the FSNS write lock again inside removePathAndBlocks when calling removeBlocks. Can we avoid the double locking here and still reuse the code? Maybe we can define new methods just to reuse the following code:
{code}
for (int i = 0; i < BLOCK_DELETION_INCREMENT && iter.hasNext(); i++) {
  Block b = iter.next();
  if (trackBlockCounts) {
    BlockInfo bi = getStoredBlock(b);
    if (bi.isComplete()) {
      numRemovedComplete++;
      if (bi.numNodes() >= blockManager.minReplication) {
        numRemovedSafe++;
      }
    }
  }
  blockManager.removeBlock(b);
}
{code}

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Attachments: HDFS-5504.patch
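One way to read the suggestion in this review comment is to pull the batch loop into a helper that expects the caller to hold the lock already, so both entry points can share it. The sketch below illustrates that shape only. Everything in it is an assumption for illustration: `BlockRemovalSketch`, `removeBatchLocked`, and the `writeLockAcquisitions` counter are hypothetical, the blocks map is a plain `HashSet`, and the method bodies are not taken from the patch.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy stand-in for the namesystem; every name here is hypothetical.
final class BlockRemovalSketch {
    static final int BLOCK_DELETION_INCREMENT = 1000;

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<Long> blocksMap = new HashSet<>();
    int writeLockAcquisitions = 0; // counted only to make the difference visible

    void addBlock(long id) {
        blocksMap.add(id);
    }

    private void writeLock() {
        lock.writeLock().lock();
        writeLockAcquisitions++;
    }

    private void writeUnlock() {
        lock.writeLock().unlock();
    }

    /** Shared loop: removes at most one batch. Caller must hold the write lock. */
    private int removeBatchLocked(Iterator<Long> iter) {
        int removed = 0;
        for (int i = 0; i < BLOCK_DELETION_INCREMENT && iter.hasNext(); i++) {
            if (blocksMap.remove(iter.next())) {
                removed++;
            }
        }
        return removed;
    }

    /** Incremental path: takes and releases the lock once per batch. */
    int removeBlocks(List<Long> toDelete) {
        int removed = 0;
        Iterator<Long> iter = toDelete.iterator();
        while (iter.hasNext()) {
            writeLock();
            try {
                removed += removeBatchLocked(iter);
            } finally {
                writeUnlock();
            }
        }
        return removed;
    }

    /** Single-lock path: takes the lock once and reuses the same loop. */
    int removePathAndBlocks(List<Long> toDelete) {
        int removed = 0;
        writeLock();
        try {
            Iterator<Long> iter = toDelete.iterator();
            while (iter.hasNext()) {
                removed += removeBatchLocked(iter);
            }
        } finally {
            writeUnlock();
        }
        return removed;
    }
}
```

With 2500 blocks, the incremental path in this sketch acquires the write lock three times and the single-lock path once, while both run the same removal loop.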
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820767#comment-13820767 ]

Vinay commented on HDFS-5504:
-----------------------------

Hi Jing,
Thanks for reviewing the patch.
I thought about keeping the old code in {{removePathAndBlocks()}}. Looking at the code, this locking happens only while loading edits, which occurs only at startup or while tailing edits in the SNN. So, in my opinion, locking and unlocking again for every 1000 blocks should not be a problem.
If you still want this changed, I will upload a patch addressing it.

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Attachments: HDFS-5504.patch
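The point about locking and unlocking again for every 1000 blocks can be illustrated with the JDK's `ReentrantReadWriteLock`, assuming the namesystem lock is re-entrant in the same way (that is an assumption of this sketch, not something stated in the thread). `NestedLockSketch` and its method are hypothetical names. The sketch shows that a nested acquisition by the thread that already holds the write lock only raises the hold count, and the inner unlock does not release the lock to other threads.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Demonstrates nested write-lock acquisition; names are illustrative only.
final class NestedLockSketch {
    static final int BATCH = 1000; // mirrors "every 1000 blocks" in the comment

    /** Returns the deepest write-lock hold count seen while deleting in batches. */
    static int deleteWhileHoldingLock(ReentrantReadWriteLock lock, int totalBlocks) {
        int maxHold = 0;
        lock.writeLock().lock(); // outer acquisition, e.g. by the edit-log loader
        try {
            for (int done = 0; done < totalBlocks; done += BATCH) {
                lock.writeLock().lock(); // inner, re-entrant acquisition per batch
                try {
                    maxHold = Math.max(maxHold, lock.getWriteHoldCount());
                    // ... one batch of block removals would go here ...
                } finally {
                    lock.writeLock().unlock(); // hold count drops back to 1
                }
            }
        } finally {
            lock.writeLock().unlock(); // lock is actually released here
        }
        return maxHold;
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        System.out.println("max hold count: " + deleteWhileHoldingLock(lock, 2500));
        System.out.println("still write-locked: " + lock.isWriteLocked());
    }
}
```

Under that assumption the nested acquisition is correct but redundant, which matches the trade-off discussed in the two comments: a small per-batch cost versus an extra helper method.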