virajjasani commented on code in PR #5432:
URL: https://github.com/apache/hadoop/pull/5432#discussion_r1117883225


##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java:
##########
@@ -1101,15 +1099,12 @@ public void testReportBadBlocks() throws Exception {
 
       block = DFSTestUtil.getFirstBlock(fs, filePath);
       // Test for the overloaded method reportBadBlocks
-      dataNode.reportBadBlocks(block, dataNode.getFSDataset()
-          .getFsVolumeReferences().get(0));
-      Thread.sleep(3000);
-      BlockManagerTestUtil.updateState(cluster.getNamesystem()
-          .getBlockManager());
-      // Verify the bad block has been reported to namenode
-      Assert.assertEquals(1, cluster.getNamesystem().getCorruptReplicaBlocks());
-    } finally {
-      cluster.shutdown();
+      dataNode.reportBadBlocks(block, dataNode.getFSDataset().getFsVolumeReferences().get(0));
+      GenericTestUtils.waitFor(() -> {
+        BlockManagerTestUtil.updateState(cluster.getNamesystem().getBlockManager());
+        // Verify the bad block has been reported to namenode
+        return 1 == cluster.getNamesystem().getCorruptReplicaBlocks();
+      }, 100, 10000, "Corrupted replica blocks could not be found");

Review Comment:
   Basically, what I am asking is whether we should also consider increasing the wait time here, to say 500/1000 ms instead of 100 ms?
   
   ```
     void triggerHeartbeatForTests() {
       synchronized (ibrManager) {
         final long nextHeartbeatTime = scheduler.scheduleHeartbeat();
         ibrManager.notifyAll();
         while (nextHeartbeatTime - scheduler.nextHeartbeatTime >= 0) {
           try {
             ibrManager.wait(100);  // <=== how about 500ms at least?
           } catch (InterruptedException e) {
             return;
           }
         }
       }
     }
   
   ```
   
   Edit: Anyway, until we have concrete proof that heartbeat-based tests are flaky, this change might not be useful, at least not for this Jira.
   I updated the test to use the heartbeat trigger, as I am not able to see any failures with an inconsistent corrupt replica count.
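
   For reference, here is a minimal, self-contained sketch of what the check interval and the timeout control in a `waitFor`-style poll. `PollingWaitSketch` and its `waitFor` are hypothetical stand-ins written only for illustration, not the actual `GenericTestUtils.waitFor` implementation, and the counter merely simulates state that becomes visible after a few polls.

   ```java
   import java.util.concurrent.TimeoutException;
   import java.util.concurrent.atomic.AtomicInteger;
   import java.util.function.BooleanSupplier;

   public class PollingWaitSketch {

     /**
      * Hypothetical stand-in for a waitFor-style test helper: re-evaluates
      * the check every intervalMs until it returns true or timeoutMs elapses.
      * Returns the number of times the check was evaluated.
      */
     static int waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
         throws TimeoutException, InterruptedException {
       long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
       int attempts = 0;
       while (true) {
         attempts++;
         if (check.getAsBoolean()) {
           return attempts;
         }
         if (System.nanoTime() - deadline >= 0) {
           throw new TimeoutException(
               "Condition not met within " + timeoutMs + " ms");
         }
         // A longer interval means fewer evaluations of the check, but a
         // later reaction once the condition finally becomes true.
         Thread.sleep(intervalMs);
       }
     }

     public static void main(String[] args) throws Exception {
       // Simulated state that only becomes visible on the third poll.
       AtomicInteger polls = new AtomicInteger();
       int attempts = waitFor(() -> polls.incrementAndGet() >= 3, 10, 1000);
       System.out.println("Condition met after " + attempts + " attempts");
     }
   }
   ```

   With a short interval the check simply runs more often within the same overall timeout, so the interval mostly trades polling overhead against reaction time rather than changing whether the test eventually passes.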



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

