[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845289#comment-13845289 ]
Hudson commented on HDFS-5283:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5283 to section branch-2.3.0 (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
Key: HDFS-5283
URL: https://issues.apache.org/jira/browse/HDFS-5283
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Critical
Fix For: 2.3.0
Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch

This was observed in one of our environments:
1. An MR job was running that had created some temporary files and was writing to them.
2. A snapshot was taken.
3. The job was killed and the temporary files were deleted.
4. The NameNode was restarted.
5. After the restart, the NameNode stayed in safemode waiting for blocks.
Analysis:
1. The snapshot includes the temporary files as they were while still open for writing; the original files were deleted afterwards.
2. The under-construction (UC) block count was taken from leases only; UC blocks that exist only inside snapshots were not considered.
3. As a result, the safemode threshold count was too high and the NN never came out of safemode.
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
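The analysis above comes down to simple arithmetic. A minimal sketch in Java, assuming a simplified model in which the NameNode leaves startup safemode once the number of safely reported blocks reaches a fixed fraction of the block total after excluding known under-construction blocks. The class and method names are hypothetical, not the FSNamesystem API, and the 100% threshold is chosen only to keep the numbers exact (the real fraction is configurable through dfs.namenode.safemode.threshold-pct).

```java
/**
 * Simplified model of the startup-safemode block threshold.
 * All names are hypothetical; this is not the real FSNamesystem API.
 */
final class SafemodeThresholdSketch {
    private SafemodeThresholdSketch() {}

    /** Number of blocks that must be reported safe before leaving safemode. */
    static long threshold(long totalBlocks, long excludedUcBlocks, double thresholdPct) {
        long countedTotal = totalBlocks - excludedUcBlocks;
        return (long) (countedTotal * thresholdPct);
    }

    static boolean canLeaveSafemode(long safeBlocks, long totalBlocks,
                                    long excludedUcBlocks, double thresholdPct) {
        return safeBlocks >= threshold(totalBlocks, excludedUcBlocks, thresholdPct);
    }

    public static void main(String[] args) {
        long total = 1000;            // all blocks loaded from the namespace
        long ucViaLeases = 0;         // leases disappeared when the temp files were deleted
        long ucOnlyInSnapshots = 5;   // still under construction, reachable only via the snapshot
        long safe = total - ucOnlyInSnapshots; // in this model UC blocks never become safe
        double pct = 1.0;             // 100% keeps the arithmetic exact

        // Before the fix: only lease-tracked UC blocks are excluded.
        System.out.println(canLeaveSafemode(safe, total, ucViaLeases, pct));
        // After the fix: snapshot-only UC blocks are excluded too.
        System.out.println(canLeaveSafemode(safe, total, ucViaLeases + ucOnlyInSnapshots, pct));
    }
}
```

With five snapshot-only UC blocks, the first accounting demands 1000 safe blocks while at most 995 can ever be reported, so it prints false; excluding those blocks as well lowers the target to 995 and the second call prints true.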
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845385#comment-13845385 ]
Hudson commented on HDFS-5283:
--
SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5283 to section branch-2.3.0 (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845440#comment-13845440 ]
Hudson commented on HDFS-5283:
--
FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move HDFS-5283 to section branch-2.3.0 (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845024#comment-13845024 ]
Hudson commented on HDFS-5283:
--
SUCCESS: Integrated in Hadoop-trunk-Commit #4862 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4862/])
Move HDFS-5283 to section branch-2.3.0 (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797778#comment-13797778 ]
Hudson commented on HDFS-5283:
--
SUCCESS: Integrated in Hadoop-Yarn-trunk #365 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/365/])
Add TestOpenFilesWithSnapshot.java for HDFS-5283. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532860)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java
HDFS-5283. Under construction blocks only inside snapshots should not be counted in safemode threshhold. Contributed by Vinay (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532857)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797878#comment-13797878 ]
Hudson commented on HDFS-5283:
--
FAILURE: Integrated in Hadoop-Hdfs-trunk #1555 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1555/])
Add TestOpenFilesWithSnapshot.java for HDFS-5283. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532860)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java
HDFS-5283. Under construction blocks only inside snapshots should not be counted in safemode threshhold. Contributed by Vinay (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532857)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797939#comment-13797939 ]
Hudson commented on HDFS-5283:
--
FAILURE: Integrated in Hadoop-Mapreduce-trunk #1581 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1581/])
Add TestOpenFilesWithSnapshot.java for HDFS-5283. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532860)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java
HDFS-5283. Under construction blocks only inside snapshots should not be counted in safemode threshhold. Contributed by Vinay (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532857)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796661#comment-13796661 ]
Hadoop QA commented on HDFS-5283:
-
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608680/HDFS-5283.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
org.apache.hadoop.hdfs.TestDecommission
The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5208//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5208//console
This message is automatically generated.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796863#comment-13796863 ]
Hadoop QA commented on HDFS-5283:
-
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608707/HDFS-5283.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5210//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5210//console
This message is automatically generated.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797078#comment-13797078 ]
Tsz Wo (Nicholas), SZE commented on HDFS-5283:
--
+1 patch looks good.
> Since isInSnapshot() is being called holding the writeLock, hasReadLock() returning false ...
It is a bug. Let's fix it separately. I will file a JIRA.
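The lock mismatch Nicholas defers to a separate JIRA can be reproduced with java.util.concurrent alone. A minimal sketch, assuming the namesystem lock is a ReentrantReadWriteLock; the class and method names are hypothetical, not FSNamesystem's. A check that counts only read holds reports false for a thread that holds just the write lock, even though the write lock already gives that thread exclusive access; one possible fix is to let the write lock satisfy a read-lock assertion.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical lock helper showing why a read-lock assertion can fail
 * when the caller holds only the write lock.
 */
final class LockAssertSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

    void writeLock() { fsLock.writeLock().lock(); }
    void writeUnlock() { fsLock.writeLock().unlock(); }
    void readLock() { fsLock.readLock().lock(); }
    void readUnlock() { fsLock.readLock().unlock(); }

    boolean hasWriteLock() { return fsLock.isWriteLockedByCurrentThread(); }

    /** Naive check: counts only read holds, so it is false under the write lock alone. */
    boolean hasReadLockNaive() { return fsLock.getReadHoldCount() > 0; }

    /** Exclusive (write) access also satisfies anyone who only needs to read. */
    boolean hasReadLock() { return fsLock.getReadHoldCount() > 0 || hasWriteLock(); }

    public static void main(String[] args) {
        LockAssertSketch ns = new LockAssertSketch();
        ns.writeLock();
        try {
            System.out.println(ns.hasReadLockNaive()); // false
            System.out.println(ns.hasReadLock());      // true
        } finally {
            ns.writeUnlock();
        }
    }
}
```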
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797164#comment-13797164 ]
Hudson commented on HDFS-5283:
--
SUCCESS: Integrated in Hadoop-trunk-Commit #4612 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4612/])
Add TestOpenFilesWithSnapshot.java for HDFS-5283. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532860)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java
HDFS-5283. Under construction blocks only inside snapshots should not be counted in safemode threshhold. Contributed by Vinay (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532857)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797590#comment-13797590 ]
Vinay commented on HDFS-5283:
-
Thanks Nicholas for the reviews and commit. Thanks Jing for reviews.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794971#comment-13794971 ]
Hadoop QA commented on HDFS-5283:
-
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608435/HDFS-5283.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5191//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5191//console
This message is automatically generated.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795500#comment-13795500 ]
Tsz Wo (Nicholas), SZE commented on HDFS-5283:
--
- In FSNamesystem.isInSnapshot(..), we can safely assume that blockUC is non-null and store blockUC.getBlockCollection() in a local variable at the very beginning. Also, the assert should be hasReadLock() instead of hasWriteLock(), since the method does not write anything.
{code}
//FSNamesystem
public boolean isInSnapshot(BlockInfoUnderConstruction blockUC) {
  assert hasReadLock();
  final BlockCollection bc = blockUC.getBlockCollection();
  if (bc == null || !(bc instanceof INodeFileUnderConstruction)) {
    return false;
  }

  final INodeFileUnderConstruction inodeUC = (INodeFileUnderConstruction) bc;
  ...
}
{code}
- For DFSOutputStream.abort(), it is better to add DFSTestUtil.abortStream(..) than to make it public. Sorry that I did not see this previously.
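Nicholas's second point follows a common Java testing pattern: keep a method package-private and reach it from tests through a helper in the same package, instead of widening it to public. A minimal sketch with hypothetical class names; it assumes, as the review implies, that DFSTestUtil lives in the same package as DFSOutputStream and can therefore call abort() directly.

```java
/** Hypothetical stream whose abort() is deliberately package-private. */
final class SketchOutputStream {
    private boolean aborted;

    /** Package-private: callable by same-package helpers, hidden from client code elsewhere. */
    void abort() {
        aborted = true;
    }

    boolean isAborted() {
        return aborted;
    }
}

/** Hypothetical test helper in the same package, in the spirit of DFSTestUtil.abortStream(..). */
final class SketchTestUtil {
    private SketchTestUtil() {}

    static void abortStream(SketchOutputStream out) {
        out.abort();
    }
}
```

The public API surface stays unchanged; only code in the stream's own package, such as the test utility, can trigger an abort.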
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792996#comment-13792996 ]
Tsz Wo (Nicholas), SZE commented on HDFS-5283:
--
Vinay, thanks for working on this. Some comments:
It would be better if the new method added to Namesystem
# took a BlockInfoUnderConstruction,
# were named isInSnapshot, and
# did not throw IOException,
i.e.
{code}
//Namesystem.java
public boolean isInSnapshot(BlockInfoUnderConstruction block);
{code}
In the FSNamesystem implementation, it should catch UnresolvedLinkException and log it as an error, since the full path obtained from a file should not contain an unresolved link.
Second question: why add DFSTestUtil.abortStream(..)? It does not look very useful.
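The suggested shape of isInSnapshot can be illustrated with a toy namespace. In this sketch, which is a simplified model rather than the committed patch, a UC block's file counts as "only in a snapshot" when resolving its full path in the current namespace does not return that same file object, and a checked resolution failure is caught and logged as an error instead of being declared in the signature. UcBlock, FileNode, ToyNamesystem, SnapshotAwareNamesystem, and PathResolutionException are hypothetical stand-ins for BlockInfoUnderConstruction, the inode, FSNamesystem, Namesystem, and UnresolvedLinkException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical checked exception standing in for UnresolvedLinkException. */
class PathResolutionException extends Exception {
    PathResolutionException(String message) { super(message); }
}

/** Hypothetical inode that remembers the full path it was created under. */
final class FileNode {
    final String fullPath;
    FileNode(String fullPath) { this.fullPath = fullPath; }
}

/** Hypothetical under-construction block that points back to its file. */
final class UcBlock {
    final FileNode file;
    UcBlock(FileNode file) { this.file = file; }
}

/** Shape suggested in the review: takes the UC block, throws no checked exception. */
interface SnapshotAwareNamesystem {
    boolean isInSnapshot(UcBlock block);
}

final class ToyNamesystem implements SnapshotAwareNamesystem {
    private static final Logger LOG = Logger.getLogger(ToyNamesystem.class.getName());

    /** Current (non-snapshot) namespace: full path to inode. */
    private final Map<String, FileNode> current = new HashMap<>();

    void create(FileNode file) { current.put(file.fullPath, file); }

    void delete(String path) { current.remove(path); }

    private FileNode resolve(String path) throws PathResolutionException {
        if (path.contains("->")) { // toy marker for an unresolved symlink
            throw new PathResolutionException("Unresolved link in " + path);
        }
        return current.get(path);
    }

    @Override
    public boolean isInSnapshot(UcBlock block) {
        FileNode file = block.file;
        try {
            // Still reachable at its own path: the block is not snapshot-only.
            return resolve(file.fullPath) != file;
        } catch (PathResolutionException e) {
            // A path derived from the file itself should not contain an unresolved link.
            LOG.log(Level.SEVERE, "Error resolving " + file.fullPath, e);
            return false;
        }
    }
}
```

While the file is live, its path resolves to the same object and isInSnapshot returns false; once the path is deleted, or a different file is created at the same path, it returns true.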
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789688#comment-13789688 ] Jing Zhao commented on HDFS-5283: - The patch looks good to me. My only concern is that the extra check in our current solution may affect NN startup performance. [~szetszwo], could you also take a look at the patch?
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787940#comment-13787940 ] Vinay commented on HDFS-5283: - One more question here: do we need to update the safemode blockSafe count on every block report from a DN? The current patch does it only for the first report.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788023#comment-13788023 ] Hadoop QA commented on HDFS-5283: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607132/HDFS-5283.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5121//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5121//console This message is automatically generated.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786021#comment-13786021 ] Tsz Wo (Nicholas), SZE commented on HDFS-5283: --
bq. ... My only doubt is, why there are different behaviors with file delete and directory delete. Not changing Inodes recursively was intentional or its an issue?
It is intentional. Otherwise, the running time of recordModification(..) would become O(subtree size). For a non-WithSnapshot INode, its state (whether it is in the current state or in some snapshot state) is determined by its parent.
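The complexity argument can be illustrated with a toy model. `ToyINode` is hypothetical and is not Hadoop's INode hierarchy. Marking only the deleted directory costs O(1), while recursively converting every descendant costs O(subtree size). An unmarked descendant derives its state by walking up through its ancestors. That upward walk is a simplification made for the toy; the comment above says only that the state is determined by the parent.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy inode for illustration only; not Hadoop's INode. */
class ToyINode {
  final String name;
  final ToyINode parent;
  final List<ToyINode> children = new ArrayList<>();
  boolean withSnapshot; // set only on nodes that recordModification touched

  ToyINode(String name, ToyINode parent) {
    this.name = name;
    this.parent = parent;
    if (parent != null) {
      parent.children.add(this);
    }
  }

  /** O(1): only this node is marked; descendants are left untouched. */
  void recordModificationShallow() {
    withSnapshot = true;
  }

  /** O(subtree size): the recursive alternative; returns how many nodes were visited. */
  int recordModificationRecursive() {
    withSnapshot = true;
    int visited = 1;
    for (ToyINode child : children) {
      visited += child.recordModificationRecursive();
    }
    return visited;
  }

  /** An unmarked node takes its state from the nearest marked ancestor, if any. */
  boolean isCapturedBySnapshot() {
    for (ToyINode n = this; n != null; n = n.parent) {
      if (n.withSnapshot) {
        return true;
      }
    }
    return false;
  }
}
```

The cost of the shallow version moves to lookups, which must consult ancestors. That trade-off is acceptable because a file's ancestors are already traversed when its path is resolved.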
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786051#comment-13786051 ] Vinay commented on HDFS-5283: -
bq. It is intentional. Otherwise, the running time of recordModification(..) becomes O(subtree size). For a non-WithSnapshot INode, the state (in current state or in some snapshot state) is determined by its parent.
Thanks for the explanation, Nicholas.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786427#comment-13786427 ] Jing Zhao commented on HDFS-5283: -
bq. instead used ((BlockInfoUnderConstruction) storedBlock).getNumExpectedLocations(), so test was passing.
This should also work, actually. I failed to get the actual behavior when generating the new patch. I think your fix there is better.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786429#comment-13786429 ] Jing Zhao commented on HDFS-5283: -
bq. get the actual behavior
By that I mean getting the actual result of the getNumExpectedLocations call.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785809#comment-13785809 ] Jing Zhao commented on HDFS-5283: -
bq. We need to have some reference which tells that the BlockCollection resides inside the snapshot if we are not able to find outside. I think In directory delete case with snapshot, changing the Inode types recursively is necessary. This keeps the behaviour of both cases ( file deletion and directory deletion ) in consistent.
In our current solution, for each BlockCollection (which is an INodeUC) in the blocksMap, we first check whether it is in the current fsdir tree. Our claim is that if the inode is not in the current tree (i.e., we cannot identify the node's absolute full path, or the node at that absolute full path in the current fsdir tree is not the node stored in the blocksMap), the inode must be a file that exists only in a snapshot, whether or not it is an instance of INodeUCWithSnapshot. If this claim holds, converting an INodeUC to an INodeUCWithSnapshot during deletion is unnecessary.
bq. storedBlock.addNode(node);
Without this, the new test will fail when the number of DNs is set to 1. Currently, when the NN receives the first block report from a DN, it checks the total number of available replicas for each block in the report. If that number is EQUAL to the minimum required number of replicas, it increments blockSafe by 1 in SafeModeInfo. Therefore, when we call namesystem.incrementSafeBlockCount(numOfReplicas), numOfReplicas must be the number of currently available replicas. Otherwise we will miss the EQUAL case and fail to increment blockSafe.
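The EQUAL check described above can be shown with a toy model. `ToySafeModeInfo` and `ToyBlockUC` are hypothetical and are not Hadoop's SafeModeInfo or BlockInfo; the `==` comparison follows the description in the comment. A block becomes safe exactly when its live replica count reaches the minimum. If the count is read before the reporting DataNode is added and there is only one DN, that moment is missed.

```java
/** Toy safe-mode counter; not Hadoop's SafeModeInfo. */
class ToySafeModeInfo {
  final int minReplication; // minimum live replicas for a block to count as safe
  int blockSafe;            // number of blocks counted as safe so far

  ToySafeModeInfo(int minReplication) {
    this.minReplication = minReplication;
  }

  /** Counts a block exactly once: when its live replica count EQUALS the minimum. */
  void incrementSafeBlockCount(int liveReplicas) {
    if (liveReplicas == minReplication) {
      blockSafe++;
    }
  }
}

/** Toy under-construction block that only tracks its live replica count. */
class ToyBlockUC {
  int liveReplicas;

  /** Analogous to storedBlock.addNode(node) for the reporting DataNode. */
  void addNode() {
    liveReplicas++;
  }
}
```

With a single DataNode and a minimum of 1, passing the count taken before `addNode()` gives 0. That never equals 1, so the block is never counted as safe. Passing the count taken afterwards gives 1, and later replicas (2, 3, ...) do not count the same block twice.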
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785892#comment-13785892 ] Vinay commented on HDFS-5283: -
bq. So in our current solution, for each BlockCollection (which is an INodeUC) in the blocksMap, we first check if it's in the current fsdir tree. Here our claim is, if the inode is not in the current tree (i.e., we cannot identify the node's absolute full path or the node with the absolute full path in the current fsdir tree is actually not the node stored in the blocksMap), this inode should be a file only existing in snapshot, no matter this node is instance of INodeUCWithSnapshot or not. If this claim stands, to convert an INodeUC to an INodeUCWithSnapshot during deletion will be unnecessary.
I agree that the current solution described here will work for this issue. My only doubt is: why do file delete and directory delete behave differently? Was not changing INodes recursively intentional, or is it an issue?
bq. Without this the new test will fail when setting DN number to a 1 value.
Got it. I was confused because my earlier patch did not use {{countLiveNodes(storedBlock);}} but instead used {{((BlockInfoUnderConstruction) storedBlock).getNumExpectedLocations()}}, so the test was passing. But wouldn't it be better to use {{((BlockInfoUnderConstruction) storedBlock).getNumExpectedLocations()}} instead of {{storedBlock.addNode(node)}} and {{countLiveNodes(storedBlock);}}, as in my patch? Do you see any problems with that?
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784826#comment-13784826 ] Vinay commented on HDFS-5283: - Thanks, Jing, for updating the patch.
bq. For dir-deletion scenario, instead of changing the current snapshot code (i.e., to convert all the INodeFileUC under the deleted dir to INodeFIleUCWithSnapshot),
We need some reference indicating that the BlockCollection resides inside a snapshot if we cannot find it outside. I think that in the directory-delete case with snapshots, changing the INode types recursively is necessary. This keeps the behaviour of both cases (file deletion and directory deletion) consistent. Any thoughts?
I have a few small questions on the patch:
1. Is {{storedBlock.addNode(node);}} required? This is done only for finalized blocks.
2.
{noformat}
/*
 * 1. if bc is an instance of INodeFileUnderConstructionWithSnapshot, and
 * bc is not in the current fsdirectory tree, bc must represent a snapshot
 * file.
 * 2. if fullName is not an absolute path, bc cannot be existent in the
 * current fsdirectory tree.
 * 3. if bc is not the current node associated with fullName, bc must be a
 * snapshot inode.
 */
return true;
{noformat}
With the current code, returning true for all three cases without an explicit check is correct. But if code in this area changes in the future, we will still get true as the return value. Do you think we need an explicit check here?
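The three cases could be checked explicitly instead of falling through to an unconditional `return true`. The sketch below uses hypothetical toy types: `ToyFile`, plus a `Map` that stands in for the current fsdirectory tree. It folds cases 1 and 3 into a single identity check against the inode currently stored at that path. This folding is an assumption of the sketch, not necessarily what the patch does.

```java
import java.util.Map;

/** Toy file inode for illustration only; not Hadoop's INodeFile. */
final class ToyFile {
  final String fullName;

  ToyFile(String fullName) {
    this.fullName = fullName;
  }
}

final class SnapshotCheck {
  /**
   * Returns true when bc is reachable only through a snapshot.
   * currentTree maps absolute paths to the inode currently stored there (hypothetical).
   */
  static boolean isInSnapshot(ToyFile bc, Map<String, ToyFile> currentTree) {
    String fullName = bc.fullName;
    // Case 2: a name that is not absolute cannot belong to the current tree.
    if (fullName == null || !fullName.startsWith("/")) {
      return true;
    }
    ToyFile current = currentTree.get(fullName);
    // Cases 1 and 3: nothing lives at that path any more, or a different
    // inode now occupies it, so bc must be a snapshot-only inode.
    return current != bc; // reference comparison is intentional
  }
}
```

Because every `true` result now comes from a concrete check, a future change that puts `bc` back into the current tree makes the method return `false` without any further edits.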
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782995#comment-13782995 ] Hadoop QA commented on HDFS-5283: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606091/HDFS-5283.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5073//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5073//console This message is automatically generated.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783061#comment-13783061 ] Hadoop QA commented on HDFS-5283: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606098/HDFS-5283.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots org.apache.hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5074//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5074//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5074//console This message is automatically generated. 
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783465#comment-13783465 ] Jing Zhao commented on HDFS-5283: - Looks like my patch fails some tests. I will update the patch later.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783605#comment-13783605 ] Hadoop QA commented on HDFS-5283: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606236/HDFS-5283.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5082//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5082//console This message is automatically generated.