[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626168#comment-17626168 ] ASF GitHub Bot commented on HDFS-15654: --- xinglin commented on PR #5089: URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1296088489 It looks like four different unit tests failed the second time and there is no overlap between the first and second build. I guess unit tests were run in different orders between different builds and thus we see different unit tests fail each time. @ashutoshcipher, I don't think these non-deterministic unit test failures are related to this PR, since we only modified TestBPOfferService.java. What do you think? > TestBPOfferService#testMissBlocksWhenReregister fails intermittently > > > Key: HDFS-15654 > URL: https://issues.apache.org/jira/browse/HDFS-15654 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > {{TestBPOfferService.testMissBlocksWhenReregister}} is flaky. It fails > randomly when the > following expression is not true: > {code:java} > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > {code} > There is a race condition here that relies once more on "time" to synchronize > between concurrent threads. The code below is causing the > non-deterministic execution. > On a slow server, {{addNewBlockThread}} may not be done by the time the main > thread reaches the assertion call. > {code:java} > // Verify FBR/IBR count is equal to generate number. > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > } finally { > addNewBlockThread.join(); > bpos.stop(); > bpos.join(); > {code} > Therefore, the correct implementation should wait for the thread to finish > {code:java} > // the thread finished execution. > addNewBlockThread.join(); > // Verify FBR/IBR count is equal to generate number. > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > } finally { > bpos.stop(); > bpos.join(); > {code} > {{DataNodeFaultInjector}} needs to have a longer wait_time too. 1 second is > not enough to satisfy the condition. > {code:java} > DataNodeFaultInjector.set(new DataNodeFaultInjector() { > public void blockUtilSendFullBlockReport() { > try { > GenericTestUtils.waitFor(() -> { > if(count.get() > 2000) { > return true; > } > return false; > }, 100, 1); // increase that waiting time to 10 seconds. 
> } catch (Exception e) { > e.printStackTrace(); > } > } > }); > {code} > {code:bash} > Stacktrace > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testMissBlocksWhenReregister(TestBPOfferService.java:350) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) >
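For readers skimming the thread, a minimal sketch of the fix described in the issue above, assuming the surrounding test stays as quoted (addNewBlockThread, count, fullBlockReportCount, incrBlockReportCount and totalTestBlocks are the names from the quoted test; the 10-second timeout is the value the description suggests, not necessarily what was committed):

{code:java}
// Give the fault injector a 10-second window instead of 1 second.
DataNodeFaultInjector.set(new DataNodeFaultInjector() {
  @Override
  public void blockUtilSendFullBlockReport() {
    try {
      // Poll every 100 ms, time out after 10,000 ms (10 seconds).
      GenericTestUtils.waitFor(() -> count.get() > 2000, 100, 10000);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
});

// ... the test body starts addNewBlockThread and generates blocks ...

// Join the generator thread BEFORE the assertion so the counter check
// no longer races with block creation; only the cleanup stays in finally.
addNewBlockThread.join();
assertTrue(fullBlockReportCount == totalTestBlocks
    || incrBlockReportCount == totalTestBlocks);
{code}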
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626156#comment-17626156 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008775990 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java: ## @@ -137,6 +137,26 @@ public int getNumNodesChecked() { return numNodesChecked; } + @Override Review Comment: Add `@VisibleForTesting`. ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java: ## @@ -137,6 +137,26 @@ public int getNumNodesChecked() { return numNodesChecked; } + @Override + public int getPendingRepLimit() { +return 0; + } + + @Override + public void setPendingRepLimit(int pendingRepLimit) { +// nothing. + } + + @Override Review Comment: Add `@VisibleForTesting`. > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted to determine the maximum number of blocks related to decommission > and maintenance operations that can be loaded into the replication queue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
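For context, a minimal sketch of what the review comments above are asking for, i.e. marking the new accessors as test-facing. The import shown is an assumption: Hadoop code uses either org.apache.hadoop.classification.VisibleForTesting or the shaded Guava annotation, depending on the module.

{code:java}
// Assumed import; use whichever VisibleForTesting the module already imports.
import org.apache.hadoop.classification.VisibleForTesting;

  @Override
  @VisibleForTesting
  public int getPendingRepLimit() {
    // The default monitor does not use a pending-replication limit, hence 0.
    return 0;
  }

  @Override
  @VisibleForTesting
  public void setPendingRepLimit(int pendingRepLimit) {
    // nothing.
  }
{code}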
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626155#comment-17626155 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008775968 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminBackoffMonitor.java: ## @@ -801,6 +801,23 @@ private boolean isBlockReplicatedOk(DatanodeDescriptor datanode, return false; } + Review Comment: Add `@VisibleForTesting`. ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminBackoffMonitor.java: ## @@ -801,6 +801,23 @@ private boolean isBlockReplicatedOk(DatanodeDescriptor datanode, return false; } + + public int getPendingRepLimit() { +return pendingRepLimit; + } + + public void setPendingRepLimit(int pendingRepLimit) { +this.pendingRepLimit = pendingRepLimit; + } + + public int getBlocksPerLock() { Review Comment: Add `@VisibleForTesting`. > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted to determines the maximum number of blocks related to decommission > and maintenance operations that can be loaded into the replication queue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626154#comment-17626154 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008775655 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java: ## @@ -419,4 +419,28 @@ void runMonitorForTest() throws ExecutionException, InterruptedException { executor.submit(monitor).get(); } + public void refreshPendingRepLimit(int pendingRepLimit, String key) { +ensurePositiveInt(pendingRepLimit, key); +this.monitor.setPendingRepLimit(pendingRepLimit); + } + + public int getPendingRepLimit() { Review Comment: Please add `@VisibleForTesting`. ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java: ## @@ -419,4 +419,28 @@ void runMonitorForTest() throws ExecutionException, InterruptedException { executor.submit(monitor).get(); } + public void refreshPendingRepLimit(int pendingRepLimit, String key) { +ensurePositiveInt(pendingRepLimit, key); +this.monitor.setPendingRepLimit(pendingRepLimit); + } + + public int getPendingRepLimit() { +return this.monitor.getPendingRepLimit(); + } + + public void refreshBlocksPerLock(int blocksPerLock, String key) { +ensurePositiveInt(blocksPerLock, key); +this.monitor.setBlocksPerLock(blocksPerLock); + } + + public int getBlocksPerLock() { Review Comment: Add `@VisibleForTesting`. > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted to determines the maximum number of blocks related to decommission > and maintenance operations that can be loaded into the replication queue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626153#comment-17626153 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008775125 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminBackoffMonitor.java: ## @@ -73,7 +73,7 @@ public class DatanodeAdminBackoffMonitor extends DatanodeAdminMonitorBase * The numbe of blocks to process when moving blocks to pendingReplication Review Comment: Please fix this typo `numbe` by the way. Thanks. > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted to determines the maximum number of blocks related to decommission > and maintenance operations that can be loaded into the replication queue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626151#comment-17626151 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008774407 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java: ## @@ -567,6 +571,92 @@ private List validatePeerReport(String jsonReport) { return containReport; } + @Test + public void testReconfigureDecommissionBackoffMonitorParameters() + throws ReconfigurationException, IOException { +Configuration conf = new HdfsConfiguration(); +conf.setClass(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MONITOR_CLASS, +DatanodeAdminBackoffMonitor.class, DatanodeAdminMonitorInterface.class); +int defaultPendingRepLimit = 1000; +conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, defaultPendingRepLimit); +int defaultBlocksPerLock = 1000; + conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, +defaultBlocksPerLock); +MiniDFSCluster newCluster = new MiniDFSCluster.Builder(conf).build(); +newCluster.waitActive(); + +try { + final NameNode nameNode = newCluster.getNameNode(); + final DatanodeManager datanodeManager = nameNode.namesystem + .getBlockManager().getDatanodeManager(); + + // verify defaultPendingRepLimit. + assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), + defaultPendingRepLimit); + + // try invalid pendingRepLimit. + try { + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, +"non-numeric"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.limit from '" + +defaultPendingRepLimit + "' to 'non-numeric'", e.getMessage()); + } + + try { + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, +"-1"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.limit from '" + +defaultPendingRepLimit + "' to '-1'", e.getMessage()); + } + + // try correct pendingRepLimit. + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, + "2"); + assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), 2); + + // verify defaultBlocksPerLock. + assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), + defaultBlocksPerLock); + + // try invalid blocksPerLock. + try { +nameNode.reconfigureProperty( +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, +"non-numeric"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock from '" + +defaultBlocksPerLock + "' to 'non-numeric'", e.getMessage()); + } + + try { +nameNode.reconfigureProperty( +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, "-1"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock from '" + +defaultBlocksPerLock + "' to '-1'", e.getMessage()); + } + + // try correct blocksPerLock. 
+ nameNode.reconfigureProperty( + DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, "1"); + assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), 1); +} finally { Review Comment: ```suggestion ``` Please also update the finally block. > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626152#comment-17626152 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008774407 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java: ## @@ -567,6 +571,92 @@ private List validatePeerReport(String jsonReport) { return containReport; } + @Test + public void testReconfigureDecommissionBackoffMonitorParameters() + throws ReconfigurationException, IOException { +Configuration conf = new HdfsConfiguration(); +conf.setClass(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MONITOR_CLASS, +DatanodeAdminBackoffMonitor.class, DatanodeAdminMonitorInterface.class); +int defaultPendingRepLimit = 1000; +conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, defaultPendingRepLimit); +int defaultBlocksPerLock = 1000; + conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, +defaultBlocksPerLock); +MiniDFSCluster newCluster = new MiniDFSCluster.Builder(conf).build(); +newCluster.waitActive(); + +try { + final NameNode nameNode = newCluster.getNameNode(); + final DatanodeManager datanodeManager = nameNode.namesystem + .getBlockManager().getDatanodeManager(); + + // verify defaultPendingRepLimit. + assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), + defaultPendingRepLimit); + + // try invalid pendingRepLimit. + try { + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, +"non-numeric"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.limit from '" + +defaultPendingRepLimit + "' to 'non-numeric'", e.getMessage()); + } + + try { + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, +"-1"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.limit from '" + +defaultPendingRepLimit + "' to '-1'", e.getMessage()); + } + + // try correct pendingRepLimit. + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, + "2"); + assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), 2); + + // verify defaultBlocksPerLock. + assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), + defaultBlocksPerLock); + + // try invalid blocksPerLock. + try { +nameNode.reconfigureProperty( +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, +"non-numeric"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock from '" + +defaultBlocksPerLock + "' to 'non-numeric'", e.getMessage()); + } + + try { +nameNode.reconfigureProperty( +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, "-1"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock from '" + +defaultBlocksPerLock + "' to '-1'", e.getMessage()); + } + + // try correct blocksPerLock. 
+ nameNode.reconfigureProperty( + DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, "1"); + assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), 1); +} finally { Review Comment: Please also update the finally block. > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted to determines the maxim
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626150#comment-17626150 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008774411 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java: ## @@ -567,6 +571,92 @@ private List validatePeerReport(String jsonReport) { return containReport; } + @Test + public void testReconfigureDecommissionBackoffMonitorParameters() + throws ReconfigurationException, IOException { +Configuration conf = new HdfsConfiguration(); +conf.setClass(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MONITOR_CLASS, +DatanodeAdminBackoffMonitor.class, DatanodeAdminMonitorInterface.class); +int defaultPendingRepLimit = 1000; +conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, defaultPendingRepLimit); +int defaultBlocksPerLock = 1000; + conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, +defaultBlocksPerLock); +MiniDFSCluster newCluster = new MiniDFSCluster.Builder(conf).build(); +newCluster.waitActive(); + +try { + final NameNode nameNode = newCluster.getNameNode(); + final DatanodeManager datanodeManager = nameNode.namesystem + .getBlockManager().getDatanodeManager(); + + // verify defaultPendingRepLimit. + assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), + defaultPendingRepLimit); + + // try invalid pendingRepLimit. + try { + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, +"non-numeric"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.limit from '" + +defaultPendingRepLimit + "' to 'non-numeric'", e.getMessage()); + } + + try { + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, +"-1"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.limit from '" + +defaultPendingRepLimit + "' to '-1'", e.getMessage()); + } + + // try correct pendingRepLimit. + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, + "2"); + assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), 2); + + // verify defaultBlocksPerLock. + assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), + defaultBlocksPerLock); + + // try invalid blocksPerLock. + try { +nameNode.reconfigureProperty( +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, +"non-numeric"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock from '" + +defaultBlocksPerLock + "' to 'non-numeric'", e.getMessage()); + } + + try { +nameNode.reconfigureProperty( +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, "-1"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock from '" + +defaultBlocksPerLock + "' to '-1'", e.getMessage()); + } + + // try correct blocksPerLock. 
+ nameNode.reconfigureProperty( + DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, "1"); + assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), 1); +} finally { + if (newCluster != null) { Review Comment: ```suggestion ``` > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted to dete
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626149#comment-17626149 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008774407 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java: ## @@ -567,6 +571,92 @@ private List validatePeerReport(String jsonReport) { return containReport; } + @Test + public void testReconfigureDecommissionBackoffMonitorParameters() + throws ReconfigurationException, IOException { +Configuration conf = new HdfsConfiguration(); +conf.setClass(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MONITOR_CLASS, +DatanodeAdminBackoffMonitor.class, DatanodeAdminMonitorInterface.class); +int defaultPendingRepLimit = 1000; +conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, defaultPendingRepLimit); +int defaultBlocksPerLock = 1000; + conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, +defaultBlocksPerLock); +MiniDFSCluster newCluster = new MiniDFSCluster.Builder(conf).build(); +newCluster.waitActive(); + +try { + final NameNode nameNode = newCluster.getNameNode(); + final DatanodeManager datanodeManager = nameNode.namesystem + .getBlockManager().getDatanodeManager(); + + // verify defaultPendingRepLimit. + assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), + defaultPendingRepLimit); + + // try invalid pendingRepLimit. + try { + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, +"non-numeric"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.limit from '" + +defaultPendingRepLimit + "' to 'non-numeric'", e.getMessage()); + } + + try { + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, +"-1"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.limit from '" + +defaultPendingRepLimit + "' to '-1'", e.getMessage()); + } + + // try correct pendingRepLimit. + nameNode.reconfigureProperty(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, + "2"); + assertEquals(datanodeManager.getDatanodeAdminManager().getPendingRepLimit(), 2); + + // verify defaultBlocksPerLock. + assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), + defaultBlocksPerLock); + + // try invalid blocksPerLock. + try { +nameNode.reconfigureProperty( +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, +"non-numeric"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock from '" + +defaultBlocksPerLock + "' to 'non-numeric'", e.getMessage()); + } + + try { +nameNode.reconfigureProperty( +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, "-1"); +fail("Should not reach here"); + } catch (ReconfigurationException e) { +assertEquals("Could not change property " + +"dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock from '" + +defaultBlocksPerLock + "' to '-1'", e.getMessage()); + } + + // try correct blocksPerLock. 
+ nameNode.reconfigureProperty( + DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, "1"); + assertEquals(datanodeManager.getDatanodeAdminManager().getBlocksPerLock(), 1); +} finally { Review Comment: ```suggestion ``` > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted to determines the maximum number of bloc
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626148#comment-17626148 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008774315 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java: ## @@ -2601,6 +2611,36 @@ private String reconfigureBlockInvalidateLimit(final DatanodeManager datanodeMan } } + private String reconfigureDecommissionBackoffMonitorParameters( + final DatanodeManager datanodeManager, final String property, final String newVal) + throws ReconfigurationException { +String newSetting; +try { + if (property.equals(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT)) { +int pendingRepLimit = (newVal == null ? +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT_DEFAULT : +Integer.parseInt(newVal)); + datanodeManager.getDatanodeAdminManager().refreshPendingRepLimit(pendingRepLimit, +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT); +newSetting = String.valueOf(datanodeManager.getDatanodeAdminManager().getPendingRepLimit()); + } else if (property.equals(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK)) { Review Comment: Please fix the checkstyle warning here. > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted to determine the maximum number of blocks related to decommission > and maintenance operations that can be loaded into the replication queue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
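The checkstyle hit on the flagged else-if is presumably the line-length rule (the key name alone is very long); one way to address it, sketched against the identifiers in the diff above, is simply to wrap the condition:

{code:java}
      } else if (property.equals(
          DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK)) {
        // handle dfs.namenode.decommission.backoff.monitor.pending.blocks.per.lock
{code}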
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626147#comment-17626147 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008773817 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java: ## @@ -567,6 +571,92 @@ private List validatePeerReport(String jsonReport) { return containReport; } + @Test + public void testReconfigureDecommissionBackoffMonitorParameters() + throws ReconfigurationException, IOException { +Configuration conf = new HdfsConfiguration(); +conf.setClass(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MONITOR_CLASS, +DatanodeAdminBackoffMonitor.class, DatanodeAdminMonitorInterface.class); +int defaultPendingRepLimit = 1000; +conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT, defaultPendingRepLimit); +int defaultBlocksPerLock = 1000; + conf.setInt(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK, +defaultBlocksPerLock); +MiniDFSCluster newCluster = new MiniDFSCluster.Builder(conf).build(); +newCluster.waitActive(); + +try { Review Comment: ```suggestion try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build()) { cluster.waitActive(); ``` Please also update this by the way. > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted to determine the maximum number of blocks related to decommission > and maintenance operations that can be loaded into the replication queue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
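Taken together with the earlier comments about dropping the finally block, the suggestion amounts to the usual try-with-resources shape for MiniDFSCluster (which implements AutoCloseable). A sketch, not the final patch; the reconfiguration assertions from the diff above would sit inside the block:

{code:java}
    Configuration conf = new HdfsConfiguration();
    // ... set the DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_* defaults as in the diff ...
    try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build()) {
      cluster.waitActive();
      final NameNode nameNode = cluster.getNameNode();
      // ... reconfigureProperty(...) calls and assertions ...
    }
    // No finally block needed: close() shuts the cluster down when the try exits.
{code}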
[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626146#comment-17626146 ] ASF GitHub Bot commented on HDFS-15654: --- hadoop-yetus commented on PR #5089: URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1296036271 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 49s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ branch-3.3 Compile Tests _ | | +1 :green_heart: | mvninstall | 39m 38s | | branch-3.3 passed | | +1 :green_heart: | compile | 1m 22s | | branch-3.3 passed | | +1 :green_heart: | checkstyle | 1m 0s | | branch-3.3 passed | | +1 :green_heart: | mvnsite | 1m 30s | | branch-3.3 passed | | +1 :green_heart: | javadoc | 1m 45s | | branch-3.3 passed | | +1 :green_heart: | spotbugs | 3m 44s | | branch-3.3 passed | | +1 :green_heart: | shadedclient | 28m 37s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 22s | | the patch passed | | +1 :green_heart: | compile | 1m 14s | | the patch passed | | +1 :green_heart: | javac | 1m 14s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 45s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 17s | | the patch passed | | +1 :green_heart: | javadoc | 1m 25s | | the patch passed | | +1 :green_heart: | spotbugs | 3m 23s | | the patch passed | | +1 :green_heart: | shadedclient | 28m 21s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 219m 14s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 55s | | The patch does not generate ASF License warnings. | | | | 333m 30s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.mover.TestMover | | | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier | | | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5089 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux d180c017e220 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | branch-3.3 / 40e025bf8b7dfdfb6c7a1cd7819ed674135d817d | | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/testReport/ | | Max. process+thread count | 2239 (vs. 
ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/2/console | | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > TestBPOfferService#testMissBlocksWhenReregister fails intermittently > > > Key: HDFS-15654 > URL: https://issues.apache.org/jira/browse/HDFS-15654 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > >
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626145#comment-17626145 ] ASF GitHub Bot commented on HDFS-16811: --- tomscut commented on code in PR #5068: URL: https://github.com/apache/hadoop/pull/5068#discussion_r1008773687 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java: ## @@ -2601,6 +2611,36 @@ private String reconfigureBlockInvalidateLimit(final DatanodeManager datanodeMan } } + private String reconfigureDecommissionBackoffMonitorParameters( + final DatanodeManager datanodeManager, final String property, final String newVal) + throws ReconfigurationException { +String newSetting; +try { + if (property.equals(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT)) { +int pendingRepLimit = (newVal == null ? +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT_DEFAULT : +Integer.parseInt(newVal)); + datanodeManager.getDatanodeAdminManager().refreshPendingRepLimit(pendingRepLimit, +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT); +newSetting = String.valueOf(datanodeManager.getDatanodeAdminManager().getPendingRepLimit()); + } else if (property.equals(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK)) { +int blocksPerLock = (newVal == null ? + DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK_DEFAULT : +Integer.parseInt(newVal)); + datanodeManager.getDatanodeAdminManager().refreshBlocksPerLock(blocksPerLock, +DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK); +newSetting = String.valueOf(datanodeManager.getDatanodeAdminManager().getBlocksPerLock()); + } else { +throw new IllegalArgumentException("Unexpected property " + Review Comment: Thanks @haiyang1987 for updating. This can be removed because the value of the key is already checked in the outer layer. > Support to make dfs.namenode.decommission.backoff.monitor.pending.limit > reconfigurable > --- > > Key: HDFS-16811 > URL: https://issues.apache.org/jira/browse/HDFS-16811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the Backoff monitor is enabled, the parameter > dfs.namenode.decommission.backoff.monitor.pending.limit can be dynamically > adjusted to determine the maximum number of blocks related to decommission > and maintenance operations that can be loaded into the replication queue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
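A sketch of the simplification the reviewer describes: because the caller only routes these two keys to this helper, the trailing "unexpected property" branch is unreachable and can go; turning the else-if into a plain else also keeps newSetting definitely assigned. Identifiers come from the diff above; the surrounding try/catch is elided:

{code:java}
    if (property.equals(DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT)) {
      int pendingRepLimit = (newVal == null
          ? DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT_DEFAULT
          : Integer.parseInt(newVal));
      datanodeManager.getDatanodeAdminManager().refreshPendingRepLimit(
          pendingRepLimit, DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_LIMIT);
      newSetting = String.valueOf(
          datanodeManager.getDatanodeAdminManager().getPendingRepLimit());
    } else {
      // Only DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK
      // can reach this branch, so no "unexpected property" check is needed.
      int blocksPerLock = (newVal == null
          ? DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK_DEFAULT
          : Integer.parseInt(newVal));
      datanodeManager.getDatanodeAdminManager().refreshBlocksPerLock(
          blocksPerLock, DFS_NAMENODE_DECOMMISSION_BACKOFF_MONITOR_PENDING_BLOCKS_PER_LOCK);
      newSetting = String.valueOf(
          datanodeManager.getDatanodeAdminManager().getBlocksPerLock());
    }
{code}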
[jira] [Resolved] (HDFS-9536) OOM errors during parallel upgrade to Block-ID based layout
[ https://issues.apache.org/jira/browse/HDFS-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved HDFS-9536. --- Resolution: Duplicate I believe this is no longer an issue after HDFS-15937 and HDFS-15610. > OOM errors during parallel upgrade to Block-ID based layout > --- > > Key: HDFS-9536 > URL: https://issues.apache.org/jira/browse/HDFS-9536 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Major > > This is a follow-up jira for the OOM errors observed during parallel upgrade > to Block-ID based datanode layout using HDFS-8578 fix. > more clue > [here|https://issues.apache.org/jira/browse/HDFS-8578?focusedCommentId=15042012&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15042012] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16811) Support to make dfs.namenode.decommission.backoff.monitor.pending.limit reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626133#comment-17626133 ] ASF GitHub Bot commented on HDFS-16811: --- hadoop-yetus commented on PR #5068: URL: https://github.com/apache/hadoop/pull/5068#issuecomment-1295952262 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 42s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 1s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 39m 40s | | trunk passed | | +1 :green_heart: | compile | 1m 33s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 1m 30s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 18s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 46s | | trunk passed | | +1 :green_heart: | javadoc | 1m 17s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 35s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 35s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 11s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 21s | | the patch passed | | +1 :green_heart: | compile | 1m 25s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 1m 25s | | the patch passed | | +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 19s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 0s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5068/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 75 unchanged - 0 fixed = 76 total (was 75) | | +1 :green_heart: | mvnsite | 1m 25s | | the patch passed | | +1 :green_heart: | javadoc | 0m 53s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 18s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 44s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 237m 52s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 6s | | The patch does not generate ASF License warnings. 
| | | | 348m 9s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5068/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5068 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux af62785a4c81 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 5eb3fcea335142d2bf0e4c892d78c2c22f2c7126 | | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5068/2/testReport/ | | Max. process+thread count | 3168 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-ha
[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626111#comment-17626111 ] ASF GitHub Bot commented on HDFS-15654: --- xinglin commented on PR #5089: URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1295936676 Hi @ashutoshcipher, > @xinglin - Can you trigger it once again? Let's have a happy Yetus Made an empty commit to trigger another build. Let's see whether we will get a happy Yetus. > TestBPOfferService#testMissBlocksWhenReregister fails intermittently > > > Key: HDFS-15654 > URL: https://issues.apache.org/jira/browse/HDFS-15654 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > {{TestBPOfferService.testMissBlocksWhenReregister}} is flaky. It fails > randomly when the > following expression is not true: > {code:java} > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > {code} > There is a race condition here that relies once more on "time" to synchronize > between concurrent threads. The code below is causing the > non-deterministic execution. > On a slow server, {{addNewBlockThread}} may not be done by the time the main > thread reaches the assertion call. > {code:java} > // Verify FBR/IBR count is equal to generate number. > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > } finally { > addNewBlockThread.join(); > bpos.stop(); > bpos.join(); > {code} > Therefore, the correct implementation should wait for the thread to finish > {code:java} > // the thread finished execution. > addNewBlockThread.join(); > // Verify FBR/IBR count is equal to generate number. > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > } finally { > bpos.stop(); > bpos.join(); > {code} > {{DataNodeFaultInjector}} needs to have a longer wait_time too. 1 second is > not enough to satisfy the condition. > {code:java} > DataNodeFaultInjector.set(new DataNodeFaultInjector() { > public void blockUtilSendFullBlockReport() { > try { > GenericTestUtils.waitFor(() -> { > if(count.get() > 2000) { > return true; > } > return false; > }, 100, 1); // increase that waiting time to 10 seconds. 
> } catch (Exception e) { > e.printStackTrace(); > } > } > }); > {code} > {code:bash} > Stacktrace > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testMissBlocksWhenReregister(TestBPOfferService.java:350) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provide
[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626110#comment-17626110 ] ASF GitHub Bot commented on HDFS-15654: --- ashutoshcipher commented on PR #5089: URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1295935074 @xinglin - Can you trigger it once again? Let's have a happy Yetus > TestBPOfferService#testMissBlocksWhenReregister fails intermittently > > > Key: HDFS-15654 > URL: https://issues.apache.org/jira/browse/HDFS-15654 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > {{TestBPOfferService.testMissBlocksWhenReregister}} is flaky. It fails > randomly when the > following expression is not true: > {code:java} > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > {code} > There is a race condition here that relies once more on "time" to synchronize > between concurrent threads. The code below is causing the > non-deterministic execution. > On a slow server, {{addNewBlockThread}} may not be done by the time the main > thread reaches the assertion call. > {code:java} > // Verify FBR/IBR count is equal to generate number. > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > } finally { > addNewBlockThread.join(); > bpos.stop(); > bpos.join(); > {code} > Therefore, the correct implementation should wait for the thread to finish > {code:java} > // the thread finished execution. > addNewBlockThread.join(); > // Verify FBR/IBR count is equal to generate number. > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > } finally { > bpos.stop(); > bpos.join(); > {code} > {{DataNodeFaultInjector}} needs to have a longer wait_time too. 1 second is > not enough to satisfy the condition. > {code:java} > DataNodeFaultInjector.set(new DataNodeFaultInjector() { > public void blockUtilSendFullBlockReport() { > try { > GenericTestUtils.waitFor(() -> { > if(count.get() > 2000) { > return true; > } > return false; > }, 100, 1); // increase that waiting time to 10 seconds. 
> } catch (Exception e) { > e.printStackTrace(); > } > } > }); > {code} > {code:bash} > Stacktrace > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testMissBlocksWhenReregister(TestBPOfferService.java:350) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at
[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626083#comment-17626083 ] ASF GitHub Bot commented on HDFS-15654: --- xinglin commented on PR #5089: URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1295900208 This PR only changed TestBPOfferService. It does not change any other files. The failed hdfs unit tests are probably due to flaky minihdfs. > TestBPOfferService#testMissBlocksWhenReregister fails intermittently > > > Key: HDFS-15654 > URL: https://issues.apache.org/jira/browse/HDFS-15654 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > {{TestBPOfferService.testMissBlocksWhenReregister}} is flaky. It fails > randomly when the > following expression is not true: > {code:java} > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > {code} > There is a race condition here that relies once more on "time" to synchronize > between concurrent threads. The code below is is causing the > non-deterministic execution. > On a slow server, {{addNewBlockThread}} may not be done by the time the main > thread reach the assertion call. > {code:java} > // Verify FBR/IBR count is equal to generate number. > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > } finally { > addNewBlockThread.join(); > bpos.stop(); > bpos.join(); > {code} > Therefore, the correct implementation should wait for the thread to finish > {code:java} > // the thread finished execution. > addNewBlockThread.join(); > // Verify FBR/IBR count is equal to generate number. > assertTrue(fullBlockReportCount == totalTestBlocks || > incrBlockReportCount == totalTestBlocks); > } finally { > bpos.stop(); > bpos.join(); > {code} > {{DataNodeFaultInjector}} needs to have a longer wait_time too. 1 second is > not enough to satisfy the condition. > {code:java} > DataNodeFaultInjector.set(new DataNodeFaultInjector() { > public void blockUtilSendFullBlockReport() { > try { > GenericTestUtils.waitFor(() -> { > if(count.get() > 2000) { > return true; > } > return false; > }, 100, 1); // increase that waiting time to 10 seconds. 
> } catch (Exception e) { > e.printStackTrace(); > } > } > }); > {code} > {code:bash} > Stacktrace > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testMissBlocksWhenReregister(TestBPOfferService.java:350) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4P