Lin Yiqun created HDFS-9865:
-------------------------------

             Summary: TestBlockReplacement fails intermittently in trunk
                 Key: HDFS-9865
                 URL: https://issues.apache.org/jira/browse/HDFS-9865
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: test
    Affects Versions: 2.7.1
            Reporter: Lin Yiqun
            Assignee: Lin Yiqun

I found that the test case {{TestBlockReplacement}} fails intermittently. Each time it fails, the unit test log shows the following:

{code}
org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement
testDeletedBlockWhenAddBlockIsInEdit(org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement)  Time elapsed: 8.764 sec  <<< FAILURE!
java.lang.AssertionError: The block should be only on 1 datanode expected:<1> but was:<2>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testDeletedBlockWhenAddBlockIsInEdit(TestBlockReplacement.java:436)
{code}

The cause is that the block has not yet been completely deleted when {{testDeletedBlockWhenAddBlockIsInEdit}} checks it, so the number of datanodes holding the block is wrong. The test waits a fixed amount of time for FsDatasetAsyncDiskService to delete the block, and that fixed wait is not reliable:

{code}
LOG.info("replaceBlock: " + replaceBlock(block, (DatanodeInfo)sourceDnDesc, (DatanodeInfo)sourceDnDesc, (DatanodeInfo)destDnDesc));
// Waiting for the FsDatasetAsyncDsikService to delete the block
Thread.sleep(3000);
{code}

When I reduce this wait to 1 second, the test always fails. The current 3 seconds is not a reliable value either. We should replace the fixed sleep with better logic, such as waiting for a condition in the way testDecommision waits for the block to be replicated.
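
One possible direction for the fix, shown as a minimal standalone sketch and not as the actual patch: poll for the expected state with a timeout instead of sleeping for a fixed 3 seconds. The class {{WaitForCondition}}, the helper {{waitFor}}, and the {{replicaCount}} counter are hypothetical names used only for illustration. In the real test the condition would check how many datanodes still hold the block. Hadoop's own test utilities offer a similar helper ({{GenericTestUtils.waitFor}}), which may be the better choice inside the test itself.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical standalone example; not part of TestBlockReplacement.
class WaitForCondition {

  /**
   * Polls the condition every checkEveryMillis until it returns true,
   * or throws TimeoutException once timeoutMillis has elapsed.
   */
  static void waitFor(BooleanSupplier condition, long checkEveryMillis,
      long timeoutMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for the number of datanodes that still hold the block.
    final AtomicInteger replicaCount = new AtomicInteger(2);

    // Simulates the asynchronous deletion finishing after some delay.
    Thread deleter = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      replicaCount.set(1);
    });
    deleter.start();

    // Returns as soon as the deletion is visible; fails after 10 seconds.
    waitFor(() -> replicaCount.get() == 1, 50, 10_000);
    System.out.println("replicas=" + replicaCount.get());
    deleter.join();
  }
}
```

With this approach the test proceeds as soon as the deletion is observed, and a slow machine only makes the test take longer (up to the timeout) instead of making the assertion fail.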