[jira] [Commented] (HDFS-16999) Fix wrong use of processFirstBlockReport()

ASF GitHub Bot (Jira) Sun, 07 May 2023 21:55:06 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720385#comment-17720385
 ]


ASF GitHub Bot commented on HDFS-16999:
---------------------------------------

ayushtkn commented on code in PR #5622:
URL: https://github.com/apache/hadoop/pull/5622#discussion_r1187022192


##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java:
##########
@@ -2066,4 +2069,55 @@ public void 
testValidateReconstructionWorkAndRacksNotEnough() {
     // validateReconstructionWork return false, need to perform resetTargets().
     assertNull(work.getTargets());
   }
+
+  /**
+   * Test whether the first block report after DataNode restart is completely
+   * processed.
+   */
+  @Test
+  public void testBlockReportAfterDataNodeRestart() throws Exception {
+    Configuration conf = new HdfsConfiguration();
+    try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
+           .numDataNodes(3).storagesPerDatanode(1).build()) {
+      cluster.waitActive();
+      BlockManager blockManager = cluster.getNamesystem().getBlockManager();
+      DistributedFileSystem fs = cluster.getFileSystem();
+      final Path filePath = new Path("/tmp.txt");
+      final long fileLen = 1L;
+      DFSTestUtil.createFile(fs, filePath, fileLen, (short) 3, 1L);
+      DFSTestUtil.waitForReplication(fs, filePath, (short) 3, 60000);
+      ArrayList<DataNode> datanodes = cluster.getDataNodes();
+      assertEquals(datanodes.size(), 3);
+
+      // Stop RedundancyMonitor.
+      blockManager.setInitializedReplQueues(false);
+
+      // Delete the replica on the first datanode.
+      File dnDir =
+          datanodes.get(0).getFSDataset().getVolumeList().get(0)
+              .getCurrentDir();
+      String[] children = FileUtil.list(dnDir);
+      for (String s : children) {
+        if (!s.equals("VERSION")) {
+          FileUtil.fullyDeleteContents(new File(dnDir, s));
+        }
+      }
+
+      // The number of replicas is still 3 because the datanode has not sent
+      // a new block report.
+      FileStatus stat = fs.getFileStatus(filePath);
+      BlockLocation[] locs = fs.getFileBlockLocations(stat, 0, stat.getLen());
+      assertEquals(3, locs[0].getHosts().length);
+
+      // Restart the first datanode.
+      cluster.restartDataNode(0, true);
+
+      // Wait for the block report to be processed.
+      Thread.sleep(5000);

Review Comment:
   can you use specific wait methods or at worst some GenericTestUtils.waitFor 
instead of a specific sleep. MiniDfsCluster has some 
``waitDataNodeFullyStarted`` or ``waitFirstBRCompleted``, check if you can use 
that.





> Fix wrong use of processFirstBlockReport()
> ------------------------------------------
>
>                 Key: HDFS-16999
>                 URL: https://issues.apache.org/jira/browse/HDFS-16999
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> `processFirstBlockReport()` is used to process first block report from 
> datanode. It does not calculating `toRemove` list because it believes that 
> there is no metadata about the datanode in the namenode. However, If a 
> datanode is re registered after restarting, its `blockReportCount` will be 
> updated to 0. That is to say, the first block report after a datanode 
> restarts will be processed by `processFirstBlockReport()`.  This is 
> unreasonable because the metadata of the datanode already exists in namenode 
> at this time, and if redundant replica metadata is not removed in time, the 
> blocks with insufficient replicas cannot be reconstruct in time, which 
> increases the risk of missing block. In summary, `processFirstBlockReport()` 
> should only be used when the namenode restarts, not when the datanode 
> restarts. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-16999) Fix wrong use of processFirstBlockReport()

Reply via email to