[ https://issues.apache.org/jira/browse/HDFS-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720409#comment-17720409 ]
ASF GitHub Bot commented on HDFS-16999:
---------------------------------------

zhangshuyan0 commented on code in PR #5622:
URL: https://github.com/apache/hadoop/pull/5622#discussion_r1187085999


##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java:
##########
@@ -2066,4 +2069,55 @@ public void testValidateReconstructionWorkAndRacksNotEnough() {
     // validateReconstructionWork return false, need to perform resetTargets().
     assertNull(work.getTargets());
   }
+
+  /**
+   * Test whether the first block report after DataNode restart is completely
+   * processed.
+   */
+  @Test
+  public void testBlockReportAfterDataNodeRestart() throws Exception {
+    Configuration conf = new HdfsConfiguration();
+    try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
+        .numDataNodes(3).storagesPerDatanode(1).build()) {
+      cluster.waitActive();
+      BlockManager blockManager = cluster.getNamesystem().getBlockManager();
+      DistributedFileSystem fs = cluster.getFileSystem();
+      final Path filePath = new Path("/tmp.txt");
+      final long fileLen = 1L;
+      DFSTestUtil.createFile(fs, filePath, fileLen, (short) 3, 1L);
+      DFSTestUtil.waitForReplication(fs, filePath, (short) 3, 60000);
+      ArrayList<DataNode> datanodes = cluster.getDataNodes();
+      assertEquals(datanodes.size(), 3);
+
+      // Stop RedundancyMonitor.
+      blockManager.setInitializedReplQueues(false);
+
+      // Delete the replica on the first datanode.
+      File dnDir =
+          datanodes.get(0).getFSDataset().getVolumeList().get(0)
+              .getCurrentDir();
+      String[] children = FileUtil.list(dnDir);
+      for (String s : children) {
+        if (!s.equals("VERSION")) {
+          FileUtil.fullyDeleteContents(new File(dnDir, s));
+        }
+      }
+
+      // The number of replicas is still 3 because the datanode has not sent
+      // a new block report.
+      FileStatus stat = fs.getFileStatus(filePath);
+      BlockLocation[] locs = fs.getFileBlockLocations(stat, 0, stat.getLen());
+      assertEquals(3, locs[0].getHosts().length);
+
+      // Restart the first datanode.
+      cluster.restartDataNode(0, true);
+
+      // Wait for the block report to be processed.
+      Thread.sleep(5000);

Review Comment:
   Thanks for your review, I'll check it.

> Fix wrong use of processFirstBlockReport()
> ------------------------------------------
>
>                 Key: HDFS-16999
>                 URL: https://issues.apache.org/jira/browse/HDFS-16999
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> `processFirstBlockReport()` is used to process the first block report from a
> datanode. It does not compute a `toRemove` list, because it assumes the
> namenode holds no metadata for that datanode. However, if a datanode
> re-registers after restarting, its `blockReportCount` is reset to 0. That is
> to say, the first block report after a datanode restarts will be processed by
> `processFirstBlockReport()`. This is unreasonable, because at that point the
> namenode still holds metadata for the datanode; if redundant replica metadata
> is not removed promptly, blocks with insufficient replicas cannot be
> reconstructed in time, which increases the risk of missing blocks. In
> summary, `processFirstBlockReport()` should only be used when the namenode
> restarts, not when a datanode restarts.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
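The failure mode the issue describes can be made concrete with a small self-contained model (the class and method names below are illustrative only, not Hadoop's actual `BlockManager` API): a first-report path only *adds* replicas and never computes `toRemove`, so a replica lost by a restarted datanode lingers in the namenode's view until a full report diffs it away.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the bug: the "first block report" path trusts that no prior
// state exists for the reporting datanode, so it only adds replicas. The full
// path also computes the equivalent of `toRemove` and drops stale replicas.
public final class BlockReportModel {
    // block id -> set of datanode ids believed to hold a replica
    private final Map<Long, Set<String>> replicas = new HashMap<>();

    // First-report path: add-only, never removes stale replicas.
    public void processFirstBlockReport(String dn, Set<Long> reported) {
        for (long b : reported) {
            replicas.computeIfAbsent(b, k -> new HashSet<>()).add(dn);
        }
    }

    // Full-report path: also removes replicas the datanode no longer reports.
    public void processFullBlockReport(String dn, Set<Long> reported) {
        processFirstBlockReport(dn, reported);
        for (Map.Entry<Long, Set<String>> e : replicas.entrySet()) {
            if (!reported.contains(e.getKey())) {
                e.getValue().remove(dn);   // the "toRemove" step
            }
        }
    }

    public Set<String> holders(long block) {
        return replicas.getOrDefault(block, Collections.emptySet());
    }

    public static void main(String[] args) {
        BlockReportModel nn = new BlockReportModel();
        nn.processFirstBlockReport("dn0", Set.of(100L));
        // dn0 restarts after losing its replica; routing its report through
        // the first-report path leaves the stale entry behind:
        nn.processFirstBlockReport("dn0", Set.of());
        System.out.println(nn.holders(100L)); // still lists dn0
    }
}
```

This mirrors why routing a restarted datanode's report through the first-report path (because `blockReportCount` was reset to 0) delays reconstruction: the namenode keeps counting a replica that no longer exists.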
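The review thread flags the fixed `Thread.sleep(5000)` in the test above. Hadoop tests typically poll for a condition with a deadline instead (e.g. `GenericTestUtils.waitFor`, which throws `TimeoutException` on expiry). A minimal dependency-free sketch of that polling pattern (the `waitFor` helper here is illustrative, not Hadoop's API):

```java
import java.util.function.BooleanSupplier;

// Polls `check` every intervalMs until it returns true or timeoutMs elapses.
// Returns whether the condition was met before the deadline.
public final class WaitUtil {
    public static boolean waitFor(BooleanSupplier check, long intervalMs,
            long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "block report has been processed": here, just elapsed time.
        boolean done = waitFor(() -> System.currentTimeMillis() - start > 50,
                10, 1000);
        System.out.println("condition met: " + done);
    }
}
```

In the test, the condition would be the observable effect of the block report (e.g. the reported replica count), so the test finishes as soon as the report lands instead of always paying a fixed 5-second delay, and tolerates slow machines up to the timeout.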