[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939344#comment-13939344 ]

Hudson commented on HDFS-6094:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1730 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1730/])
HDFS-6094. The same block can be counted twice towards safe mode threshold. (Arpit Agarwal)
(arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1578478)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageReceivedDeletedBlocks.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestIncrementalBrVariations.java

> The same block can be counted twice towards safe mode threshold
> ---------------------------------------------------------------
>
> Key: HDFS-6094
> URL: https://issues.apache.org/jira/browse/HDFS-6094
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.4.0
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-6094.03.patch, HDFS-6904.01.patch, TestHASafeMode-output.txt
>
>
> {{BlockManager#addStoredBlock}} can cause the same block to be counted twice
> towards the safe mode threshold. We see this manifest via
> {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More
> details to follow in a comment.
> Exception details:
> {code}
> Time elapsed: 12.874 sec <<< FAILURE!
> java.lang.AssertionError: Bad safemode status: 'Safe mode is ON. The reported blocks 7 has reached the threshold 0.9990 of total blocks 6. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically in 28 seconds.'
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.assertSafeMode(TestHASafeMode.java:493)
> at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown(TestHASafeMode.java:660)
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
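The safe mode status quoted in the exception reports 7 blocks against a total of 6, which should not be possible if each block is counted once. The arithmetic can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the truncating cast used to derive the block threshold is an assumption about how the NameNode computes it, not something stated in this thread.

```java
// Hypothetical sketch; these names are illustrative and are not Hadoop APIs.
final class SafeModeThresholdSketch {

  /**
   * Number of blocks that must be reported before safe mode can end.
   * Assumption: the product is truncated to an int.
   */
  static int blockThreshold(int blockTotal, double thresholdPct) {
    return (int) (blockTotal * thresholdPct);
  }

  /** A safe block count above the total can only come from counting a block more than once. */
  static boolean isConsistent(int blockSafe, int blockTotal) {
    return blockSafe >= 0 && blockSafe <= blockTotal;
  }

  public static void main(String[] args) {
    int total = 6;      // "total blocks 6" from the status message
    int reported = 7;   // "reported blocks 7" from the status message
    System.out.println("threshold  = " + blockThreshold(total, 0.999));
    System.out.println("consistent = " + isConsistent(reported, total));
  }
}
```

With the values from the failing test, the consistency check fails, which matches the symptom described in the issue.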
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939222#comment-13939222 ]

Hudson commented on HDFS-6094:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1705 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1705/])
HDFS-6094. The same block can be counted twice towards safe mode threshold. (Arpit Agarwal)
(arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1578478)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageReceivedDeletedBlocks.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestIncrementalBrVariations.java
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939065#comment-13939065 ]

Hudson commented on HDFS-6094:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #513 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/513/])
HDFS-6094. The same block can be counted twice towards safe mode threshold. (Arpit Agarwal)
(arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1578478)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageReceivedDeletedBlocks.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestIncrementalBrVariations.java
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938095#comment-13938095 ]

Hudson commented on HDFS-6094:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5339 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5339/])
HDFS-6094. The same block can be counted twice towards safe mode threshold. (Arpit Agarwal)
(arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1578478)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageReceivedDeletedBlocks.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestIncrementalBrVariations.java
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938081#comment-13938081 ]

Jing Zhao commented on HDFS-6094:
---------------------------------

The latest patch looks good to me. +1.
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935862#comment-13935862 ]

Arpit Agarwal commented on HDFS-6094:
-------------------------------------

The warnings are expected due to new deprecations. We can fix the test warnings later.
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935859#comment-13935859 ]

Hadoop QA commented on HDFS-6094:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12634845/HDFS-6094.03.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1540 javac compiler warnings (more than the trunk's current 1531 warnings).
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6407//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6407//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6407//console

This message is automatically generated.
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935500#comment-13935500 ]

Arpit Agarwal commented on HDFS-6094:
-------------------------------------

Jing, I think it is a good idea to learn about storages from the IBR. One issue with doing so is that the storage type and state are not known while processing the IBR. We can assume some defaults but this can lead to bugs since the type and state can be used to make replication decisions.

I think we need to enhance the incremental report protocol to send the storage type and state along with the storage ID. Then we can safely create a new storage entry. For protocol compatibility we can assume defaults if the type and state are not provided.

I am going to code up the patch. Thanks for the ideas!
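The compatibility rule proposed in this comment (use the storage type and state from the incremental report when present, fall back to defaults otherwise) could look roughly like the sketch below. Everything here is hypothetical: the class, the enum values and the choice of DISK and NORMAL as defaults are assumptions made for illustration, not the contents of the actual patch.

```java
import java.util.Objects;

// Hypothetical sketch; this is not the DatanodeStorage class from the patch.
final class IbrStorageSketch {
  enum StorageType { DISK, SSD }            // assumed set of types
  enum StorageState { NORMAL, READ_ONLY }   // assumed set of states

  final String storageId;
  final StorageType type;
  final StorageState state;

  private IbrStorageSketch(String storageId, StorageType type, StorageState state) {
    this.storageId = Objects.requireNonNull(storageId, "storageId");
    this.type = type;
    this.state = state;
  }

  /**
   * Builds a storage entry from the fields carried by an incremental report.
   * A null type or state stands for "not sent", as from an older DataNode,
   * and is replaced by an assumed default.
   */
  static IbrStorageSketch fromReport(String storageId, StorageType type, StorageState state) {
    return new IbrStorageSketch(
        storageId,
        type != null ? type : StorageType.DISK,
        state != null ? state : StorageState.NORMAL);
  }
}
```

An older sender that provides only the storage ID still yields a usable entry, while a newer sender's explicit values are kept unchanged.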
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935303#comment-13935303 ]

Hadoop QA commented on HDFS-6094:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12634642/HDFS-6904.01.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6405//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6405//console

This message is automatically generated.
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935296#comment-13935296 ]

Jing Zhao commented on HDFS-6094:
---------------------------------

The patch looks good to me. One question: the NN currently adds info about a new datanode storage only when processing a complete block report. Can we also do this for an IBR?
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934567#comment-13934567 ]

Arpit Agarwal commented on HDFS-6094:
-------------------------------------

Hi Jing, thanks for taking a look at this!

I agree that the additional check for {{added}} should fix the issue and I was thinking along the same lines earlier. However, I wasn't sure about the timeline of steps leading up to the issue, and I think you have a plausible explanation.

With your findings I think we can extend the patch to have the NN reject an IBR from a storage before the first block report has been received for that storage. What do you think?
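The rejection rule suggested here can be pictured with a small gate that remembers which storages have already sent a block report. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the real NameNode tracks this state differently and per DataNode.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; these names are illustrative and are not NameNode APIs.
final class IbrGateSketch {
  // Storage IDs for which at least one block report has been processed.
  private final Set<String> storagesWithBlockReport = new HashSet<>();

  /** Records that a block report from this storage has been processed. */
  void onBlockReport(String storageId) {
    storagesWithBlockReport.add(storageId);
  }

  /** Returns true if the incremental report would be processed, false if rejected. */
  boolean acceptIncrementalReport(String storageId) {
    return storagesWithBlockReport.contains(storageId);
  }
}
```

An incremental report from a storage is rejected until that storage's first block report arrives, and is accepted afterwards.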
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934472#comment-13934472 ]

Jing Zhao commented on HDFS-6094:
---------------------------------

Maybe another issue with the current code is that when an incremental block report comes before the full block report, if the stored block state is COMMITTED, we may increase the safe mode total block count without increasing the safe block count. In that case I'm not sure whether the NN can get stuck in safe mode.
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934407#comment-13934407 ]

Jing Zhao commented on HDFS-6094:
---------------------------------

Another option is to add the new storage ID even for an incremental block report. [~arpitagarwal], what do you think?
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934384#comment-13934384 ]

Jing Zhao commented on HDFS-6094:
---------------------------------

I can also reproduce the issue on my local machine. It looks like the sequence is:
1. After the standby NN (SBN) restarts, DN1 sends first an incremental block report and then the complete block report to the SBN.
2. DN2 sends an incremental block report to the SBN. This block report does not change the replica count in the SBN because the corresponding storage ID has not been added in the SBN yet (the storage ID is only added during full block report processing). However, the SBN still checks the current live replica count (which is 1, because the SBN has already received the full block report from DN1) and uses that number to update the safe block count.

So maybe a simple fix can be:
{code}
@@ -2277,7 +2277,7 @@ private Block addStoredBlock(final BlockInfo block,
     if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
         numLiveReplicas >= minReplication) {
       storedBlock = completeBlock(bc, storedBlock, false);
-    } else if (storedBlock.isComplete()) {
+    } else if (storedBlock.isComplete() && added) {
       // check whether safe replication is reached for the block
       // only complete blocks are counted towards that
       // Is no-op if not in safe mode.
{code}
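To make the sequence described in this comment easier to follow, here is a small self-contained simulation of it. It is a sketch under stated assumptions, not the real {{BlockManager}}: the class {{SafeBlockCountSketch}} and all of its members are hypothetical, it tracks a single complete block, and it assumes the safe block count is bumped whenever the live replica count equals a minimum replication of 1 at the time a report is processed.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (hypothetical) of safe-block counting for one complete block. */
class SafeBlockCountSketch {
  // Storages the NN knows about; in this model only a full report adds one.
  private final Set<String> knownStorages = new HashSet<>();
  // Storages currently recorded as holding a replica of the block.
  private final Set<String> replicaStorages = new HashSet<>();
  // When true, apply the proposed "&& added" guard.
  private final boolean requireAdded;
  private int safeBlockCount = 0;

  SafeBlockCountSketch(boolean requireAdded) {
    this.requireAdded = requireAdded;
  }

  void fullBlockReport(String storageId) {
    knownStorages.add(storageId);
    reportReplica(storageId);
  }

  void incrementalBlockReport(String storageId) {
    reportReplica(storageId);
  }

  private void reportReplica(String storageId) {
    // A replica is only recorded if its storage is already known.
    boolean added = knownStorages.contains(storageId)
        && replicaStorages.add(storageId);
    int numLiveReplicas = replicaStorages.size();
    if (requireAdded && !added) {
      return; // guard: this report changed nothing, so do not touch the count
    }
    // Assumption: the block becomes "safe" exactly when live replicas == 1.
    if (numLiveReplicas == 1) {
      safeBlockCount++;
    }
  }

  /** Replays the DN1/DN2 sequence and returns the resulting safe count. */
  static int simulate(boolean requireAdded) {
    SafeBlockCountSketch nn = new SafeBlockCountSketch(requireAdded);
    nn.incrementalBlockReport("DN1"); // storage unknown, nothing recorded
    nn.fullBlockReport("DN1");        // replica recorded, live replicas = 1
    nn.incrementalBlockReport("DN2"); // storage unknown, live replicas still 1
    nn.fullBlockReport("DN2");        // replica recorded, live replicas = 2
    return nn.safeBlockCount;
  }

  public static void main(String[] args) {
    System.out.println("without guard: " + simulate(false));
    System.out.println("with guard:    " + simulate(true));
  }
}
```

In this model the single block ends up counted twice without the guard and once with it, which matches the "reported blocks 7 ... of total blocks 6" symptom in the test failure.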
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932341#comment-13932341 ]

Arpit Agarwal commented on HDFS-6094:
-------------------------------------

No concrete diagnosis of this issue yet; I am still investigating.