[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626168#comment-17626168
 ] 

ASF GitHub Bot commented on HDFS-15654:
---

xinglin commented on PR #5089:
URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1296088489

   It looks like four different unit tests failed the second time, and there is 
no overlap between the first and the second build. I guess the unit tests were 
run in different orders in different builds, and thus we see different unit 
tests fail each time.
   
   @ashutoshcipher, I don't think these non-deterministic unit test failures 
are related to this PR, since we only modified TestBPOfferService.java. What 
do you think?




> TestBPOfferService#testMissBlocksWhenReregister fails intermittently
> 
>
> Key: HDFS-15654
> URL: https://issues.apache.org/jira/browse/HDFS-15654
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {{TestBPOfferService.testMissBlocksWhenReregister}} is flaky. It fails 
> randomly when the following expression is not true:
> {code:java}
>   assertTrue(fullBlockReportCount == totalTestBlocks ||
>   incrBlockReportCount == totalTestBlocks);
> {code}
> There is a race condition here that relies once more on "time" to synchronize 
> between concurrent threads. The code below is causing the non-deterministic 
> execution.
> On a slow server, {{addNewBlockThread}} may not be done by the time the main 
> thread reaches the assertion call.
> {code:java}
>   // Verify the FBR/IBR count is equal to the generated number.
>   assertTrue(fullBlockReportCount == totalTestBlocks ||
>   incrBlockReportCount == totalTestBlocks);
> } finally {
>   addNewBlockThread.join();
>   bpos.stop();
>   bpos.join();
> {code}
> Therefore, the correct implementation should wait for the thread to finish:
> {code:java}
>  // Wait until the thread finishes execution.
>  addNewBlockThread.join();
>   // Verify the FBR/IBR count is equal to the generated number.
>   assertTrue(fullBlockReportCount == totalTestBlocks ||
>   incrBlockReportCount == totalTestBlocks);
> } finally {
>   bpos.stop();
>   bpos.join();
> {code}
> {{DataNodeFaultInjector}} needs a longer wait time too; 1 second is not 
> enough to satisfy the condition.
> {code:java}
>   DataNodeFaultInjector.set(new DataNodeFaultInjector() {
> public void blockUtilSendFullBlockReport() {
>   try {
> GenericTestUtils.waitFor(() -> {
>   if (count.get() > 2000) {
> return true;
>   }
>   return false;
> }, 100, 1000); // increase that waiting time to 10 seconds.
>   } catch (Exception e) {
> e.printStackTrace();
>   }
> }
>   });
> {code}
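> A minimal sketch of the injector with the suggested 10-second bound 
> (illustrative only; it assumes the same {{count}} counter used above):
> {code:java}
>   DataNodeFaultInjector.set(new DataNodeFaultInjector() {
>     public void blockUtilSendFullBlockReport() {
>       try {
>         // Poll every 100 ms; give up after 10 seconds instead of 1.
>         GenericTestUtils.waitFor(() -> count.get() > 2000, 100, 10_000);
>       } catch (Exception e) {
>         e.printStackTrace();
>       }
>     }
>   });
> {code}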
> {code:bash}
> Stacktrace
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testMissBlocksWhenReregister(TestBPOfferService.java:350)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> {code}

[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626156#comment-17626156
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008775990


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:
##
@@ -137,6 +137,26 @@ public int getNumNodesChecked() {
 return numNodesChecked;
   }
 
+  @Override

Review Comment:
   Add `@VisibleForTesting`.



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:
##
@@ -137,6 +137,26 @@ public int getNumNodesChecked() {
 return numNodesChecked;
   }
 
+  @Override
+  public int getPendingRepLimit() {
+return 0;
+  }
+
+  @Override
+  public void setPendingRepLimit(int pendingRepLimit) {
+// nothing.
+  }
+
+  @Override

Review Comment:
   Add `@VisibleForTesting`.
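
   For illustration, the annotated getter would read like this (a sketch; 
`VisibleForTesting` is assumed to be Hadoop's 
`org.apache.hadoop.classification.VisibleForTesting`, though Guava's 
annotation of the same name reads identically):
   ```java
   @Override
   @VisibleForTesting
   public int getPendingRepLimit() {
     return 0;
   }
   ```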





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> When the Backoff monitor is enabled, the parameter 
> dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically 
> adjusted to determine the maximum number of blocks related to decommission 
> and maintenance operations that can be loaded into the replication queue.
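> For illustration, such a runtime change would typically be applied through the 
> standard dfsadmin reconfiguration workflow (a sketch; the host and port below 
> are placeholders):
> {code:bash}
> # 1. Update the value in hdfs-site.xml on the NameNode host.
> # 2. Ask the NameNode to reload its reconfigurable properties.
> hdfs dfsadmin -reconfig namenode nn1.example.com:8020 start
> # 3. Poll until the reconfiguration task reports completion.
> hdfs dfsadmin -reconfig namenode nn1.example.com:8020 status
> {code}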






[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626155#comment-17626155
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008775968


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminBackoffMonitor.java:
##
@@ -801,6 +801,23 @@ private boolean isBlockReplicatedOk(DatanodeDescriptor 
datanode,
 return false;
   }
 
+

Review Comment:
   Add `@VisibleForTesting`.



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminBackoffMonitor.java:
##
@@ -801,6 +801,23 @@ private boolean isBlockReplicatedOk(DatanodeDescriptor 
datanode,
 return false;
   }
 
+
+  public int getPendingRepLimit() {
+return pendingRepLimit;
+  }
+
+  public void setPendingRepLimit(int pendingRepLimit) {
+this.pendingRepLimit = pendingRepLimit;
+  }
+
+  public int getBlocksPerLock() {

Review Comment:
   Add `@VisibleForTesting`.





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626154#comment-17626154
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008775655


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java:
##
@@ -419,4 +419,28 @@ void runMonitorForTest() throws ExecutionException, 
InterruptedException {
 executor.submit(monitor).get();
   }
 
+  public void refreshPendingRepLimit(int pendingRepLimit, String key) {
+ensurePositiveInt(pendingRepLimit, key);
+this.monitor.setPendingRepLimit(pendingRepLimit);
+  }
+
+  public int getPendingRepLimit() {

Review Comment:
   Please add `@VisibleForTesting`.



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java:
##
@@ -419,4 +419,28 @@ void runMonitorForTest() throws ExecutionException, 
InterruptedException {
 executor.submit(monitor).get();
   }
 
+  public void refreshPendingRepLimit(int pendingRepLimit, String key) {
+ensurePositiveInt(pendingRepLimit, key);
+this.monitor.setPendingRepLimit(pendingRepLimit);
+  }
+
+  public int getPendingRepLimit() {
+return this.monitor.getPendingRepLimit();
+  }
+
+  public void refreshBlocksPerLock(int blocksPerLock, String key) {
+ensurePositiveInt(blocksPerLock, key);
+this.monitor.setBlocksPerLock(blocksPerLock);
+  }
+
+  public int getBlocksPerLock() {

Review Comment:
   Add `@VisibleForTesting`.
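
   For context, a sketch of what the `ensurePositiveInt` guard used by these 
refresh methods might look like (hypothetical; the actual helper in the patch 
may differ):
   ```java
   // Rejects zero and negative values for a reconfigurable integer property.
   private static void ensurePositiveInt(int value, String key) {
     if (value <= 0) {
       throw new IllegalArgumentException(
           key + " must be a positive, non-zero integer, but was " + value);
     }
   }
   ```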





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626153#comment-17626153
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008775125


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminBackoffMonitor.java:
##
@@ -73,7 +73,7 @@ public class DatanodeAdminBackoffMonitor extends 
DatanodeAdminMonitorBase
* The numbe of blocks to process when moving blocks to pendingReplication

Review Comment:
   Please fix this typo `numbe` by the way. Thanks.





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626151#comment-17626151
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008774407


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java:
##
@@ -567,6 +571,92 @@ private List validatePeerReport(String 
jsonReport) {
 return containReport;
   }
 
+  @Test
+  public void testReconfigureDecommissionBackoffMonitorParameters()
+  throws ReconfigurationException, IOException {
+Configuration conf = new HdfsConfiguration();
+conf.setClass(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MONITOR_CLASS,
+DatanodeAdminBackoffMonitor.class, 
DatanodeAdminMonitorInterface.class);
+int defaultPendingRepLimit = 1000;
+conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, 
defaultPendingRepLimit);
+int defaultBlocksPerLock = 1000;
+
conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK,
+defaultBlocksPerLock);
+MiniDFSCluster newCluster = new MiniDFSCluster.Builder(conf).build();
+newCluster.waitActive();
+
+try {
+  final NameNode nameNode = newCluster.getNameNode();
+  final DatanodeManager datanodeManager = nameNode.namesystem
+  .getBlockManager().getDatanodeManager();
+
+  // verify defaultPendingRepLimit.
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(),
+  defaultPendingRepLimit);
+
+  // try invalid pendingRepLimit.
+  try {
+
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+"non-numeric");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.limit from '" +
+defaultPendingRepLimit + "' to 'non-numeric'", e.getMessage());
+  }
+
+  try {
+
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+"-1");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.limit from '" +
+defaultPendingRepLimit + "' to '-1'", e.getMessage());
+  }
+
+  // try correct pendingRepLimit.
+  
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+  "2");
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), 
2);
+
+  // verify defaultBlocksPerLock.
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(),
+  defaultBlocksPerLock);
+
+  // try invalid blocksPerLock.
+  try {
+nameNode.reconfigureProperty(
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK,
+"non-numeric");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock 
from '" +
+defaultBlocksPerLock + "' to 'non-numeric'", e.getMessage());
+  }
+
+  try {
+nameNode.reconfigureProperty(
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, 
"-1");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock 
from '" +
+defaultBlocksPerLock + "' to '-1'", e.getMessage());
+  }
+
+  // try correct blocksPerLock.
+  nameNode.reconfigureProperty(
+  DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, 
"1");
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), 
1);
+} finally {

Review Comment:
   ```suggestion
   ```
   Please also update the finally block.





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>

[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626152#comment-17626152
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008774407


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java:
##
@@ -567,6 +571,92 @@ private List validatePeerReport(String 
jsonReport) {
 return containReport;
   }
 
+  @Test
+  public void testReconfigureDecommissionBackoffMonitorParameters()
+  throws ReconfigurationException, IOException {
+Configuration conf = new HdfsConfiguration();
+conf.setClass(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MONITOR_CLASS,
+DatanodeAdminBackoffMonitor.class, 
DatanodeAdminMonitorInterface.class);
+int defaultPendingRepLimit = 1000;
+conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, 
defaultPendingRepLimit);
+int defaultBlocksPerLock = 1000;
+
conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK,
+defaultBlocksPerLock);
+MiniDFSCluster newCluster = new MiniDFSCluster.Builder(conf).build();
+newCluster.waitActive();
+
+try {
+  final NameNode nameNode = newCluster.getNameNode();
+  final DatanodeManager datanodeManager = nameNode.namesystem
+  .getBlockManager().getDatanodeManager();
+
+  // verify defaultPendingRepLimit.
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(),
+  defaultPendingRepLimit);
+
+  // try invalid pendingRepLimit.
+  try {
+
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+"non-numeric");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.limit from '" +
+defaultPendingRepLimit + "' to 'non-numeric'", e.getMessage());
+  }
+
+  try {
+
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+"-1");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.limit from '" +
+defaultPendingRepLimit + "' to '-1'", e.getMessage());
+  }
+
+  // try correct pendingRepLimit.
+  
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+  "2");
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), 
2);
+
+  // verify defaultBlocksPerLock.
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(),
+  defaultBlocksPerLock);
+
+  // try invalid blocksPerLock.
+  try {
+nameNode.reconfigureProperty(
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK,
+"non-numeric");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock 
from '" +
+defaultBlocksPerLock + "' to 'non-numeric'", e.getMessage());
+  }
+
+  try {
+nameNode.reconfigureProperty(
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, 
"-1");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock 
from '" +
+defaultBlocksPerLock + "' to '-1'", e.getMessage());
+  }
+
+  // try correct blocksPerLock.
+  nameNode.reconfigureProperty(
+  DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, 
"1");
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), 
1);
+} finally {

Review Comment:
   Please also update the finally block.





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>

[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626150#comment-17626150
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008774411


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java:
##
@@ -567,6 +571,92 @@ private List validatePeerReport(String 
jsonReport) {
 return containReport;
   }
 
+  @Test
+  public void testReconfigureDecommissionBackoffMonitorParameters()
+  throws ReconfigurationException, IOException {
+Configuration conf = new HdfsConfiguration();
+conf.setClass(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MONITOR_CLASS,
+DatanodeAdminBackoffMonitor.class, 
DatanodeAdminMonitorInterface.class);
+int defaultPendingRepLimit = 1000;
+conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, 
defaultPendingRepLimit);
+int defaultBlocksPerLock = 1000;
+
conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK,
+defaultBlocksPerLock);
+MiniDFSCluster newCluster = new MiniDFSCluster.Builder(conf).build();
+newCluster.waitActive();
+
+try {
+  final NameNode nameNode = newCluster.getNameNode();
+  final DatanodeManager datanodeManager = nameNode.namesystem
+  .getBlockManager().getDatanodeManager();
+
+  // verify defaultPendingRepLimit.
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(),
+  defaultPendingRepLimit);
+
+  // try invalid pendingRepLimit.
+  try {
+
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+"non-numeric");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.limit from '" +
+defaultPendingRepLimit + "' to 'non-numeric'", e.getMessage());
+  }
+
+  try {
+
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+"-1");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.limit from '" +
+defaultPendingRepLimit + "' to '-1'", e.getMessage());
+  }
+
+  // try correct pendingRepLimit.
+  
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+  "2");
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), 
2);
+
+  // verify defaultBlocksPerLock.
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(),
+  defaultBlocksPerLock);
+
+  // try invalid blocksPerLock.
+  try {
+nameNode.reconfigureProperty(
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK,
+"non-numeric");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock 
from '" +
+defaultBlocksPerLock + "' to 'non-numeric'", e.getMessage());
+  }
+
+  try {
+nameNode.reconfigureProperty(
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, 
"-1");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock 
from '" +
+defaultBlocksPerLock + "' to '-1'", e.getMessage());
+  }
+
+  // try correct blocksPerLock.
+  nameNode.reconfigureProperty(
+  DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, 
"1");
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), 
1);
+} finally {
+  if (newCluster != null) {

Review Comment:
   ```suggestion
   ```





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>

[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626149#comment-17626149
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008774407


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java:
##
@@ -567,6 +571,92 @@ private List validatePeerReport(String 
jsonReport) {
 return containReport;
   }
 
+  @Test
+  public void testReconfigureDecommissionBackoffMonitorParameters()
+  throws ReconfigurationException, IOException {
+Configuration conf = new HdfsConfiguration();
+conf.setClass(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MONITOR_CLASS,
+DatanodeAdminBackoffMonitor.class, 
DatanodeAdminMonitorInterface.class);
+int defaultPendingRepLimit = 1000;
+conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, 
defaultPendingRepLimit);
+int defaultBlocksPerLock = 1000;
+
conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK,
+defaultBlocksPerLock);
+MiniDFSCluster newCluster = new MiniDFSCluster.Builder(conf).build();
+newCluster.waitActive();
+
+try {
+  final NameNode nameNode = newCluster.getNameNode();
+  final DatanodeManager datanodeManager = nameNode.namesystem
+  .getBlockManager().getDatanodeManager();
+
+  // verify defaultPendingRepLimit.
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(),
+  defaultPendingRepLimit);
+
+  // try invalid pendingRepLimit.
+  try {
+
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+"non-numeric");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.limit from '" +
+defaultPendingRepLimit + "' to 'non-numeric'", e.getMessage());
+  }
+
+  try {
+
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+"-1");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.limit from '" +
+defaultPendingRepLimit + "' to '-1'", e.getMessage());
+  }
+
+  // try correct pendingRepLimit.
+  
nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT,
+  "2");
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), 
2);
+
+  // verify defaultBlocksPerLock.
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(),
+  defaultBlocksPerLock);
+
+  // try invalid blocksPerLock.
+  try {
+nameNode.reconfigureProperty(
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK,
+"non-numeric");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock 
from '" +
+defaultBlocksPerLock + "' to 'non-numeric'", e.getMessage());
+  }
+
+  try {
+nameNode.reconfigureProperty(
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, 
"-1");
+fail("Should not reach here");
+  } catch (ReconfigurationException e) {
+assertEquals("Could not change property " +
+"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock 
from '" +
+defaultBlocksPerLock + "' to '-1'", e.getMessage());
+  }
+
+  // try correct blocksPerLock.
+  nameNode.reconfigureProperty(
+  DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, 
"1");
+  
assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), 
1);
+} finally {

Review Comment:
   ```suggestion
   ```





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>

[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626148#comment-17626148
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008774315


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java:
##
@@ -2601,6 +2611,36 @@ private String reconfigureBlockInvalidateLimit(final 
DatanodeManager datanodeMan
 }
   }
 
+  private String reconfigureDecommissionBackoffMonitorParameters(
+  final DatanodeManager datanodeManager, final String property, final 
String newVal)
+  throws ReconfigurationException {
+String newSetting;
+try {
+  if 
(property.equals(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT)) {
+int pendingRepLimit = (newVal == null ?
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT_DEFAULT :
+Integer.parseInt(newVal));
+
datanodeManager.getDatanodeAdminManager().refreshPendingRepLimit(pendingRepLimit,
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT);
+newSetting = 
String.valueOf(datanodeManager.getDatanodeAdminManager().getPendingRepLimit());
+  } else if 
(property.equals(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK))
 {

Review Comment:
   Please fix the checkstyle warning here.





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626147#comment-17626147
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008773817


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java:
##
@@ -567,6 +571,92 @@ private List validatePeerReport(String 
jsonReport) {
 return containReport;
   }
 
+  @Test
+  public void testReconfigureDecommissionBackoffMonitorParameters()
+  throws ReconfigurationException, IOException {
+Configuration conf = new HdfsConfiguration();
+conf.setClass(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MONITOR_CLASS,
+DatanodeAdminBackoffMonitor.class, 
DatanodeAdminMonitorInterface.class);
+int defaultPendingRepLimit = 1000;
+conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, 
defaultPendingRepLimit);
+int defaultBlocksPerLock = 1000;
+
conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK,
+defaultBlocksPerLock);
+MiniDFSCluster newCluster = new MiniDFSCluster.Builder(conf).build();
+newCluster.waitActive();
+
+try {

Review Comment:
   ```suggestion
   try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build()) {
     cluster.waitActive();
   ```
   Please also update this by the way.
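
   For reference, the resulting shape of the test would be roughly as follows 
(a sketch; `MiniDFSCluster` is `AutoCloseable`, so `close()` shuts the cluster 
down when the block exits):
   ```java
   try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build()) {
     cluster.waitActive();
     // ... run the reconfiguration assertions against cluster.getNameNode() ...
   } // the cluster is shut down automatically; no finally block is needed
   ```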





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626146#comment-17626146
 ] 

ASF GitHub Bot commented on HDFS-15654:
---

hadoop-yetus commented on PR #5089:
URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1296036271

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
   |||| _ branch-3.3 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m 38s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 30s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  spotbugs  |   3m 44s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  shadedclient  |  28m 37s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   3m 23s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  28m 21s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 219m 14s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 55s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 333m 30s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.TestFsck |
   |   | hadoop.hdfs.server.mover.TestMover |
   |   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
   |   | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5089 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux d180c017e220 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.3 / 40e025bf8b7dfdfb6c7a1cd7819ed674135d817d |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/testReport/ |
   | Max. process+thread count | 2239 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/console |
   | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> TestBPOfferService#testMissBlocksWhenReregister fails intermittently
> 
>
> Key: HDFS-15654
> URL: https://issues.apache.org/jira/browse/HDFS-15654
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>

[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626145#comment-17626145
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

tomscut commented on code in PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008773687


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java:
##
@@ -2601,6 +2611,36 @@ private String reconfigureBlockInvalidateLimit(final 
DatanodeManager datanodeMan
 }
   }
 
+  private String reconfigureDecommissionBackoffMonitorParameters(
+  final DatanodeManager datanodeManager, final String property, final 
String newVal)
+  throws ReconfigurationException {
+String newSetting;
+try {
+  if 
(property.equals(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT)) {
+int pendingRepLimit = (newVal == null ?
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT_DEFAULT :
+Integer.parseInt(newVal));
+
datanodeManager.getDatanodeAdminManager().refreshPendingRepLimit(pendingRepLimit,
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT);
+newSetting = 
String.valueOf(datanodeManager.getDatanodeAdminManager().getPendingRepLimit());
+  } else if 
(property.equals(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK))
 {
+int blocksPerLock = (newVal == null ?
+
DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK_DEFAULT :
+Integer.parseInt(newVal));
+
datanodeManager.getDatanodeAdminManager().refreshBlocksPerLock(blocksPerLock,
+DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK);
+newSetting = 
String.valueOf(datanodeManager.getDatanodeAdminManager().getBlocksPerLock());
+  } else {
+throw new IllegalArgumentException("Unexpected property " +

Review Comment:
   Thanks @haiyang1987 for updating. This `else` branch can be removed because 
the property key is already checked in the outer layer.
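
   Under that reading, the method body reduces to something like the sketch 
below (an illustration only; it assumes the outer dispatch only ever passes one 
of the two known keys, and it reuses `ReconfigurationException`'s 
`(property, newVal, oldVal, cause)` constructor as the surrounding code does):
   ```java
   private String reconfigureDecommissionBackoffMonitorParameters(
       final DatanodeManager datanodeManager, final String property,
       final String newVal) throws ReconfigurationException {
     try {
       DatanodeAdminManager adminManager = datanodeManager.getDatanodeAdminManager();
       if (property.equals(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT)) {
         int pendingRepLimit = (newVal == null
             ? DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT_DEFAULT
             : Integer.parseInt(newVal));
         adminManager.refreshPendingRepLimit(pendingRepLimit, property);
         return String.valueOf(adminManager.getPendingRepLimit());
       }
       // The outer dispatch guarantees the only other key is blocks-per-lock.
       int blocksPerLock = (newVal == null
           ? DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK_DEFAULT
           : Integer.parseInt(newVal));
       adminManager.refreshBlocksPerLock(blocksPerLock, property);
       return String.valueOf(adminManager.getBlocksPerLock());
     } catch (IllegalArgumentException e) {
       throw new ReconfigurationException(property, newVal, getConf().get(property), e);
     }
   }
   ```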
   
   





> Support to make dfs.namenode.decommission.backoff.monitor.pending.limit 
> reconfigurable 
> ---
>
> Key: HDFS-16811
> URL: https://issues.apache.org/jira/browse/HDFS-16811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Resolved] (HDFS-9536) OOM errors during parallel upgrade to Block-ID based layout

2022-10-29 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-9536.
---
Resolution: Duplicate

I believe this is no longer an issue after HDFS-15937 and HDFS-15610.

> OOM errors during parallel upgrade to Block-ID based layout
> ---
>
> Key: HDFS-9536
> URL: https://issues.apache.org/jira/browse/HDFS-9536
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Major
>
> This is a follow-up jira for the OOM errors observed during the parallel upgrade 
> to the Block-ID based datanode layout using the HDFS-8578 fix.
> More clues 
> [here|https://issues.apache.org/jira/browse/HDFS-8578?focusedCommentId=15042012&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15042012]






[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626133#comment-17626133
 ] 

ASF GitHub Bot commented on HDFS-16811:
---

hadoop-yetus commented on PR #5068:
URL: https://github.com/apache/hadoop/pull/5068#issuecomment-1295952262

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 42s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m 40s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 33s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 18s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 46s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 17s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 35s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 11s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 25s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  0s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5068/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 75 unchanged - 
0 fixed = 76 total (was 75)  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 44s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 237m 52s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  6s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 348m  9s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5068/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5068 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux af62785a4c81 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5eb3fcea335142d2bf0e4c892d78c2c22f2c7126 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5068/2/testReport/ |
   | Max. process+thread count | 3168 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-ha

[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626111#comment-17626111
 ] 

ASF GitHub Bot commented on HDFS-15654:
---

xinglin commented on PR #5089:
URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1295936676

   Hi @ashutoshcipher,

   > @xinglin - Can you trigger it once again? Let's have a happy Yetus
   
   Made an empty commit to trigger another build. Let's see whether we get a 
happy Yetus.
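   
   For reference, such a trigger commit can be created with an arbitrary message:
   ```bash
   git commit --allow-empty -m "trigger Yetus build" && git push
   ```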
   
   




> TestBPOfferService#testMissBlocksWhenReregister fails intermittently
> 
>
> Key: HDFS-15654
> URL: https://issues.apache.org/jira/browse/HDFS-15654
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>

[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626110#comment-17626110
 ] 

ASF GitHub Bot commented on HDFS-15654:
---

ashutoshcipher commented on PR #5089:
URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1295935074

   @xinglin - Can you trigger it once again? Let's have a happy Yetus




> TestBPOfferService#testMissBlocksWhenReregister fails intermittently
> 
>
> Key: HDFS-15654
> URL: https://issues.apache.org/jira/browse/HDFS-15654
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>

[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently

2022-10-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626083#comment-17626083
 ] 

ASF GitHub Bot commented on HDFS-15654:
---

xinglin commented on PR #5089:
URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1295900208

   This PR only changed TestBPOfferService; it does not change any other files. 
The failed HDFS unit tests are probably due to a flaky MiniDFSCluster.




> TestBPOfferService#testMissBlocksWhenReregister fails intermittently
> 
>
> Key: HDFS-15654
> URL: https://issues.apache.org/jira/browse/HDFS-15654
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>