[jira] [Commented] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.
[ https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798456#comment-17798456 ] ASF GitHub Bot commented on HDFS-17297: --- hadoop-yetus commented on PR #6369: URL: https://github.com/apache/hadoop/pull/6369#issuecomment-1862225690 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 21s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 35m 21s | | trunk passed | | +1 :green_heart: | compile | 0m 45s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 41s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 42s | | trunk passed | | +1 :green_heart: | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 2s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 55s | | trunk passed | | -1 :x: | shadedclient | 34m 52s | | branch has errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 41s | | the patch passed | | +1 :green_heart: | compile | 0m 41s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 0m 41s | | the patch passed | | +1 :green_heart: | compile | 0m 36s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 36s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 31s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 40s | | the patch passed | | +1 :green_heart: | javadoc | 0m 30s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 56s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 53s | | the patch passed | | -1 :x: | shadedclient | 23m 0s | | patch has errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 0m 42s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. | | +1 :green_heart: | asflicense | 0m 32s | | The patch does not generate ASF License warnings. 
| | | | 107m 55s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6369 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux c33f92e02368 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / dc032c9264cf9cfddd66312e378dbf1169df699b | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/2/testReport/ | | Max. process+thread count | 686 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/2/console | | versions |
[jira] [Commented] (HDFS-17292) Show the number of times the slowPeerCollectorDaemon thread has collected SlowNodes.
[ https://issues.apache.org/jira/browse/HDFS-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798449#comment-17798449 ] ASF GitHub Bot commented on HDFS-17292: --- huangzhaobo99 commented on code in PR #6364: URL: https://github.com/apache/hadoop/pull/6364#discussion_r1430976884 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyExcludeSlowNodes.java: ## @@ -176,4 +185,58 @@ public void testSlowPeerTrackerEnabledClearSlowNodes() throws Exception { } } + /** + * Dependent on the SlowNode related config, therefore placing + * 'testCollectSlowNodesIpAddrFrequencyMetrics' unit test in the + * TestReplicationPolicyExcludeSlowNodes class. + * + * Test metrics associated with CollectSlowNodesIpAddrFrequency. + */ + @Test + public void testCollectSlowNodesIpAddrFrequencyMetrics() throws Exception { +namenode.getNamesystem().writeLock(); +try { + FSNamesystem fsNamesystem = namenode.getNamesystem(); + assertEquals("{}", fsNamesystem.getCollectSlowNodesIpAddrFrequencyMap(), "{}"); + MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer(); + ObjectName mxBeanName = new ObjectName("Hadoop:service=NameNode,name=FSNamesystemState"); + String ipAddrFrequency = + (String) mBeanServer.getAttribute(mxBeanName, "CollectSlowNodesIpAddrFrequencyMap"); + assertEquals("{}", ipAddrFrequency, "{}"); + + // add nodes + for (DatanodeDescriptor dataNode : dataNodes) { +dnManager.addDatanode(dataNode); + } + + // mock slow nodes + SlowPeerTracker tracker = dnManager.getSlowPeerTracker(); + Assert.assertNotNull(tracker); + OutlierMetrics outlierMetrics = new OutlierMetrics(0.0, 0.0, 0.0, 5.0); + tracker.addReport(dataNodes[0].getInfoAddr(), dataNodes[2].getInfoAddr(), outlierMetrics); + tracker.addReport(dataNodes[1].getInfoAddr(), dataNodes[2].getInfoAddr(), outlierMetrics); + + // waiting for slow nodes collector run and collect at least 2 times + Thread.sleep(3000); Review Comment: > If 
we want to wait for a while, we use `GenericTestUtils.waitFor(..)` instead of `thread.sleep`. Thanks @slfan1989, I have fixed it. > Show the number of times the slowPeerCollectorDaemon thread has collected > SlowNodes. > > > Key: HDFS-17292 > URL: https://issues.apache.org/jira/browse/HDFS-17292 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: huangzhaobo99 >Assignee: huangzhaobo99 >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
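The review advice above (poll for a condition with `GenericTestUtils.waitFor(..)` rather than a fixed `Thread.sleep(3000)`) can be sketched without a Hadoop dependency. The helper below is a minimal standalone model of the same check/interval/timeout pattern; the class and method names here are illustrative, not Hadoop's actual implementation.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Minimal sketch of the polling pattern behind
// GenericTestUtils.waitFor(check, checkEveryMillis, waitForMillis):
// re-evaluate the condition until it holds or the deadline passes,
// instead of sleeping a fixed 3 seconds and hoping it is enough.
public final class WaitFor {
  public static void waitFor(Supplier<Boolean> check,
                             long checkEveryMillis,
                             long waitForMillis) throws Exception {
    long deadline = System.currentTimeMillis() + waitForMillis;
    while (true) {
      if (Boolean.TRUE.equals(check.get())) {
        return;                        // condition met, stop waiting early
      }
      if (System.currentTimeMillis() >= deadline) {
        throw new TimeoutException("Timed out after " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);  // poll again after a short interval
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    // Condition becomes true after ~200 ms; waitFor returns as soon as
    // it observes that, rather than after a fixed sleep.
    waitFor(() -> System.currentTimeMillis() - start >= 200, 50, 5000);
    System.out.println("condition met");
  }
}
```

In a test this makes the success case fast (returns as soon as the collector has run) while still tolerating a slow CI machine via the generous timeout.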
[jira] [Commented] (HDFS-17292) Show the number of times the slowPeerCollectorDaemon thread has collected SlowNodes.
[ https://issues.apache.org/jira/browse/HDFS-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798439#comment-17798439 ] ASF GitHub Bot commented on HDFS-17292: --- slfan1989 commented on code in PR #6364: URL: https://github.com/apache/hadoop/pull/6364#discussion_r1430952283 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyExcludeSlowNodes.java: ## @@ -176,4 +185,58 @@ public void testSlowPeerTrackerEnabledClearSlowNodes() throws Exception { } } + /** + * Dependent on the SlowNode related config, therefore placing + * 'testCollectSlowNodesIpAddrFrequencyMetrics' unit test in the + * TestReplicationPolicyExcludeSlowNodes class. + * + * Test metrics associated with CollectSlowNodesIpAddrFrequency. + */ + @Test + public void testCollectSlowNodesIpAddrFrequencyMetrics() throws Exception { +namenode.getNamesystem().writeLock(); +try { + FSNamesystem fsNamesystem = namenode.getNamesystem(); + assertEquals("{}", fsNamesystem.getCollectSlowNodesIpAddrFrequencyMap(), "{}"); + MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer(); + ObjectName mxBeanName = new ObjectName("Hadoop:service=NameNode,name=FSNamesystemState"); + String ipAddrFrequency = + (String) mBeanServer.getAttribute(mxBeanName, "CollectSlowNodesIpAddrFrequencyMap"); + assertEquals("{}", ipAddrFrequency, "{}"); + + // add nodes + for (DatanodeDescriptor dataNode : dataNodes) { +dnManager.addDatanode(dataNode); + } + + // mock slow nodes + SlowPeerTracker tracker = dnManager.getSlowPeerTracker(); + Assert.assertNotNull(tracker); + OutlierMetrics outlierMetrics = new OutlierMetrics(0.0, 0.0, 0.0, 5.0); + tracker.addReport(dataNodes[0].getInfoAddr(), dataNodes[2].getInfoAddr(), outlierMetrics); + tracker.addReport(dataNodes[1].getInfoAddr(), dataNodes[2].getInfoAddr(), outlierMetrics); + + // waiting for slow nodes collector run and collect at least 2 times + Thread.sleep(3000); Review Comment: If we 
want to wait for a while, we use `GenericTestUtils.waitFor(..)` instead of `thread.sleep`. > Show the number of times the slowPeerCollectorDaemon thread has collected > SlowNodes. > > > Key: HDFS-17292 > URL: https://issues.apache.org/jira/browse/HDFS-17292 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: huangzhaobo99 >Assignee: huangzhaobo99 >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
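The quoted test reads the new metric through JMX (`mBeanServer.getAttribute(new ObjectName("Hadoop:service=NameNode,name=FSNamesystemState"), ...)`). The same read mechanism can be shown self-contained against a standard platform MXBean, so it runs without a NameNode; the helper name below is illustrative.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Reading an attribute from the platform MBeanServer, the same way the
// test reads CollectSlowNodesIpAddrFrequencyMap from the
// FSNamesystemState bean on a running NameNode.
public final class JmxAttributeRead {
  public static String readAttribute(String objectName, String attribute)
      throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    // getAttribute returns Object; metric beans typically expose
    // Strings or numbers, so render it as a String here.
    return String.valueOf(
        server.getAttribute(new ObjectName(objectName), attribute));
  }

  public static void main(String[] args) throws Exception {
    // "java.lang:type=Runtime" is registered in every JVM;
    // "Uptime" is one of its numeric attributes.
    String uptime = readAttribute("java.lang:type=Runtime", "Uptime");
    System.out.println("JVM uptime (ms): " + uptime);
  }
}
```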
[jira] [Commented] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.
[ https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798425#comment-17798425 ] ASF GitHub Bot commented on HDFS-17297: --- hadoop-yetus commented on PR #6369: URL: https://github.com/apache/hadoop/pull/6369#issuecomment-1862112592 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 8m 39s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | -1 :x: | mvninstall | 0m 16s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. | | -1 :x: | compile | 0m 17s | [/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-hdfs in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. 
| | -1 :x: | compile | 0m 8s | [/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | hadoop-hdfs in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. | | -0 :warning: | checkstyle | 3m 4s | [/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | The patch fails to run checkstyle in hadoop-hdfs | | -1 :x: | mvnsite | 0m 22s | [/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in trunk failed. | | -1 :x: | javadoc | 0m 22s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-hdfs in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. | | -1 :x: | javadoc | 0m 21s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | hadoop-hdfs in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. | | -1 :x: | spotbugs | 0m 21s | [/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in trunk failed. 
| | +1 :green_heart: | shadedclient | 5m 13s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | -1 :x: | mvninstall | 0m 21s | [/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. | | -1 :x: | compile | 0m 21s | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. | | -1 :x: | javac | 0m 21s |
[jira] [Updated] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.
[ https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-17297: -- Labels: pull-request-available (was: ) > The NameNode should remove block from the BlocksMap if the block is marked as > deleted. > -- > > Key: HDFS-17297 > URL: https://issues.apache.org/jira/browse/HDFS-17297 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When call internalReleaseLease method: > {code:java} > boolean internalReleaseLease( > ... > int minLocationsNum = 1; > if (lastBlock.isStriped()) { > minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); > } > if (uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0) { > // There is no datanode reported to this block. > // may be client have crashed before writing data to pipeline. > // This blocks doesn't need any recovery. > // We can remove this block and close the file. > pendingFile.removeLastBlock(lastBlock); > finalizeINodeFileUnderConstruction(src, pendingFile, > iip.getLatestSnapshotId(), false); > ... > } > {code} > if the condition `uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY > logic, the block is removed from the block list in the inode file and marked > as deleted. > However it is not removed from the BlocksMap, it may cause memory leak. > Therefore it is necessary to remove the block from the BlocksMap at this > point as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
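The leak described above can be illustrated with a toy model (these are not Hadoop's real data structures): an inode keeps a per-file block list, while a global BlocksMap indexes every block. Removing the last block only from the file strands its entry in the map; removing it from both, as the issue proposes, releases it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the leak: blocksMap plays the role of the NameNode's
// global BlocksMap, fileBlocks the block list of one inode file.
public final class BlocksMapLeakModel {
  static final Map<Long, String> blocksMap = new HashMap<>(); // blockId -> block info
  static final List<Long> fileBlocks = new ArrayList<>();     // one file's block ids

  static void addBlock(long id, String info) {
    fileBlocks.add(id);
    blocksMap.put(id, info);
  }

  // Mirrors the reported path: only pendingFile.removeLastBlock(lastBlock)
  // runs, so the map still references the deleted block.
  static void removeLastBlockLeaky() {
    fileBlocks.remove(fileBlocks.size() - 1);
  }

  // Mirrors the proposed fix: also drop the block from the global map.
  static void removeLastBlockFixed() {
    long id = fileBlocks.remove(fileBlocks.size() - 1);
    blocksMap.remove(id);
  }

  public static void main(String[] args) {
    addBlock(1L, "blk_1");
    removeLastBlockLeaky();
    System.out.println("after leaky remove, map size = " + blocksMap.size()); // 1: leaked

    blocksMap.clear();
    addBlock(2L, "blk_2");
    removeLastBlockFixed();
    System.out.println("after fixed remove, map size = " + blocksMap.size()); // 0
  }
}
```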
[jira] [Commented] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.
[ https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798420#comment-17798420 ] ASF GitHub Bot commented on HDFS-17297: --- haiyang1987 opened a new pull request, #6369: URL: https://github.com/apache/hadoop/pull/6369 ### Description of PR https://issues.apache.org/jira/browse/HDFS-17297 When call internalReleaseLease method: ``` boolean internalReleaseLease( ... int minLocationsNum = 1; if (lastBlock.isStriped()) { minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); } if (uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0) { // There is no datanode reported to this block. // may be client have crashed before writing data to pipeline. // This blocks doesn't need any recovery. // We can remove this block and close the file. pendingFile.removeLastBlock(lastBlock); finalizeINodeFileUnderConstruction(src, pendingFile, iip.getLatestSnapshotId(), false); ... } ``` if the condition `uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY logic, the block is removed from the block list in the inode file and marked as deleted. However it is not removed from the BlocksMap, it may cause memory leak. Therefore it is necessary to remove the block from the BlocksMap at this point as well. > The NameNode should remove block from the BlocksMap if the block is marked as > deleted. > -- > > Key: HDFS-17297 > URL: https://issues.apache.org/jira/browse/HDFS-17297 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > > When call internalReleaseLease method: > {code:java} > boolean internalReleaseLease( > ... 
> int minLocationsNum = 1; > if (lastBlock.isStriped()) { > minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); > } > if (uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0) { > // There is no datanode reported to this block. > // may be client have crashed before writing data to pipeline. > // This blocks doesn't need any recovery. > // We can remove this block and close the file. > pendingFile.removeLastBlock(lastBlock); > finalizeINodeFileUnderConstruction(src, pendingFile, > iip.getLatestSnapshotId(), false); > ... > } > {code} > if the condition `uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY > logic, the block is removed from the block list in the inode file and marked > as deleted. > However it is not removed from the BlocksMap, it may cause memory leak. > Therefore it is necessary to remove the block from the BlocksMap at this > point as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.
Haiyang Hu created HDFS-17297: - Summary: The NameNode should remove block from the BlocksMap if the block is marked as deleted. Key: HDFS-17297 URL: https://issues.apache.org/jira/browse/HDFS-17297 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haiyang Hu Assignee: Haiyang Hu

When calling the internalReleaseLease method:

{code:java}
boolean internalReleaseLease(
  ...
  int minLocationsNum = 1;
  if (lastBlock.isStriped()) {
    minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
  }
  if (uc.getNumExpectedLocations() < minLocationsNum &&
      lastBlock.getNumBytes() == 0) {
    // There is no datanode reported to this block.
    // may be client have crashed before writing data to pipeline.
    // This blocks doesn't need any recovery.
    // We can remove this block and close the file.
    pendingFile.removeLastBlock(lastBlock);
    finalizeINodeFileUnderConstruction(src, pendingFile,
        iip.getLatestSnapshotId(), false);
    ...
  }
{code}

If the condition uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0 is met during the execution of the UNDER_RECOVERY logic, the block is removed from the block list in the inode file and marked as deleted. However, it is not removed from the BlocksMap, which may cause a memory leak. Therefore it is necessary to remove the block from the BlocksMap at this point as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.
[ https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu updated HDFS-17297: -- Description: When call internalReleaseLease method: {code:java} boolean internalReleaseLease( ... int minLocationsNum = 1; if (lastBlock.isStriped()) { minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); } if (uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0) { // There is no datanode reported to this block. // may be client have crashed before writing data to pipeline. // This blocks doesn't need any recovery. // We can remove this block and close the file. pendingFile.removeLastBlock(lastBlock); finalizeINodeFileUnderConstruction(src, pendingFile, iip.getLatestSnapshotId(), false); ... } {code} if the condition `uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY logic, the block is removed from the block list in the inode file and marked as deleted. However it is not removed from the BlocksMap, it may cause memory leak. Therefore it is necessary to remove the block from the BlocksMap at this point as well. was: When call internalReleaseLease method: {code:java} boolean internalReleaseLease( ... int minLocationsNum = 1; if (lastBlock.isStriped()) { minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); } if (uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0) { // There is no datanode reported to this block. // may be client have crashed before writing data to pipeline. // This blocks doesn't need any recovery. // We can remove this block and close the file. pendingFile.removeLastBlock(lastBlock); finalizeINodeFileUnderConstruction(src, pendingFile, iip.getLatestSnapshotId(), false); ... 
} {code} if the condition `uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY logic, the block is removed from the block list in the inode file and marked as deleted. However it is not removed from the BlocksMap, it may cause memory leak. Therefore it is necessary to remove the block from the BlocksMap at this point as well. > The NameNode should remove block from the BlocksMap if the block is marked as > deleted. > -- > > Key: HDFS-17297 > URL: https://issues.apache.org/jira/browse/HDFS-17297 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > > When call internalReleaseLease method: > {code:java} > boolean internalReleaseLease( > ... > int minLocationsNum = 1; > if (lastBlock.isStriped()) { > minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); > } > if (uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0) { > // There is no datanode reported to this block. > // may be client have crashed before writing data to pipeline. > // This blocks doesn't need any recovery. > // We can remove this block and close the file. > pendingFile.removeLastBlock(lastBlock); > finalizeINodeFileUnderConstruction(src, pendingFile, > iip.getLatestSnapshotId(), false); > ... > } > {code} > if the condition `uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY > logic, the block is removed from the block list in the inode file and marked > as deleted. > However it is not removed from the BlocksMap, it may cause memory leak. > Therefore it is necessary to remove the block from the BlocksMap at this > point as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.
[ https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu updated HDFS-17297: -- Description: When call internalReleaseLease method: {code:java} boolean internalReleaseLease( ... int minLocationsNum = 1; if (lastBlock.isStriped()) { minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); } if (uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0) { // There is no datanode reported to this block. // may be client have crashed before writing data to pipeline. // This blocks doesn't need any recovery. // We can remove this block and close the file. pendingFile.removeLastBlock(lastBlock); finalizeINodeFileUnderConstruction(src, pendingFile, iip.getLatestSnapshotId(), false); ... } {code} if the condition `uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY logic, the block is removed from the block list in the inode file and marked as deleted. However it is not removed from the BlocksMap, it may cause memory leak. Therefore it is necessary to remove the block from the BlocksMap at this point as well. was: When call internalReleaseLease method: {code:java} boolean internalReleaseLease( ... int minLocationsNum = 1; if (lastBlock.isStriped()) { minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); } if (uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0) { // There is no datanode reported to this block. // may be client have crashed before writing data to pipeline. // This blocks doesn't need any recovery. // We can remove this block and close the file. pendingFile.removeLastBlock(lastBlock); finalizeINodeFileUnderConstruction(src, pendingFile, iip.getLatestSnapshotId(), false); ... 
} {code} if the condition uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0 is met during the execution of UNDER_RECOVERY logic, the block is removed from the block list in the inode file and marked as deleted. However it is not removed from the BlocksMap, it may cause memory leak. Therefore it is necessary to remove the block from the BlocksMap at this point as well. > The NameNode should remove block from the BlocksMap if the block is marked as > deleted. > -- > > Key: HDFS-17297 > URL: https://issues.apache.org/jira/browse/HDFS-17297 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > > When call internalReleaseLease method: > {code:java} > boolean internalReleaseLease( > ... > int minLocationsNum = 1; > if (lastBlock.isStriped()) { > minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); > } > if (uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0) { > // There is no datanode reported to this block. > // may be client have crashed before writing data to pipeline. > // This blocks doesn't need any recovery. > // We can remove this block and close the file. > pendingFile.removeLastBlock(lastBlock); > finalizeINodeFileUnderConstruction(src, pendingFile, > iip.getLatestSnapshotId(), false); > ... > } > {code} > if the condition `uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY > logic, the block is removed from the block list in the inode file and marked > as deleted. > However it is not removed from the BlocksMap, it may cause memory leak. > Therefore it is necessary to remove the block from the BlocksMap at this > point as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798418#comment-17798418 ] ASF GitHub Bot commented on HDFS-17290: --- hadoop-yetus commented on PR #6359: URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1862076266 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 18m 0s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 48m 49s | | trunk passed | | +1 :green_heart: | compile | 18m 22s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 17m 31s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 19s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 37s | | trunk passed | | +1 :green_heart: | javadoc | 1m 13s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 49s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 34s | | trunk passed | | +1 :green_heart: | shadedclient | 39m 48s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 53s | | the patch passed | | +1 :green_heart: | compile | 17m 38s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 17m 38s | | the patch passed | | +1 :green_heart: | compile | 16m 21s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 16m 21s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 15s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/2/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 197 unchanged - 0 fixed = 199 total (was 197) | | +1 :green_heart: | mvnsite | 1m 34s | | the patch passed | | +1 :green_heart: | javadoc | 1m 7s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 39s | | the patch passed | | +1 :green_heart: | shadedclient | 39m 21s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 15s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 58s | | The patch does not generate ASF License warnings. 
| | | | 254m 35s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6359 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint | | uname | Linux 3d77f28fbbdb 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / be908e9236ed395aedbaac6342e6bd5b84cc2340 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/2/testReport/ | | Max.
[jira] [Commented] (HDFS-17292) Show the number of times the slowPeerCollectorDaemon thread has collected SlowNodes.
[ https://issues.apache.org/jira/browse/HDFS-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798409#comment-17798409 ] ASF GitHub Bot commented on HDFS-17292: --- hadoop-yetus commented on PR #6364: URL: https://github.com/apache/hadoop/pull/6364#issuecomment-1862016225 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 0s | | Docker mode activated. | | -1 :x: | patch | 0m 20s | | https://github.com/apache/hadoop/pull/6364 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help. | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6364 | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6364/4/console | | versions | git=2.34.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > Show the number of times the slowPeerCollectorDaemon thread has collected > SlowNodes. > > > Key: HDFS-17292 > URL: https://issues.apache.org/jira/browse/HDFS-17292 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: huangzhaobo99 >Assignee: huangzhaobo99 >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798397#comment-17798397 ] ASF GitHub Bot commented on HDFS-17290: --- simbadzina commented on code in PR #6359: URL: https://github.com/apache/hadoop/pull/6359#discussion_r1430796400 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/metrics/RpcMetrics.java: ## @@ -342,6 +344,14 @@ public void incrClientBackoff() { rpcClientBackoff.incr(); } + /** + * Client was backoff due to disconnection Review Comment: Is this the other way around, the client ended up being disconnected due to backoffs? ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java: ## @@ -3133,6 +3133,13 @@ private void internalQueueCall(Call call, boolean blocking) // For example, IPC clients using FailoverOnNetworkExceptionRetry handle // RetriableException. rpcMetrics.incrClientBackoff(); + // Clients that are directly put into lowest priority queue are backoff and disconnected. Review Comment: Nit-> tense. backed off. > HDFS: add client rpc backoff metrics due to disconnection from lowest > priority queue > > > Key: HDFS-17290 > URL: https://issues.apache.org/jira/browse/HDFS-17290 > Project: Hadoop HDFS > Issue Type: Bug > Affects Versions: 2.10.0, 3.4.0 > Reporter: Lei Yang > Assignee: Lei Yang > Priority: Major > Labels: pull-request-available > > Clients are backed off when RPCs cannot be enqueued, but that backoff can happen in different scenarios. Currently there is no way to differentiate whether a backoff happened because the lowest-priority queue was full (with disconnection) or because of overflow from higher-priority queues while the connection between client and NameNode remains open; the IPC server just emits a single metric for all backoffs. Example:
> # Clients are enqueued directly into the lowest-priority queue and back off when that queue is full. These clients are expected to disconnect from the NameNode. > # Clients are enqueued into a non-lowest-priority queue, overflow all the way down to the lowest-priority queue, and back off. In this case, the connection between client and NameNode remains open. > We would like to add a metric for #1.
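The two backoff scenarios quoted above reduce to one question at enqueue-rejection time: was the call headed straight for the lowest-priority queue (scenario #1, which also disconnects the client)? A minimal sketch of that metric split, using invented names (`BackoffTracker`, `onBackoff`) rather than the real Hadoop `RpcMetrics`/`Server` API:

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative only: counts every backoff, plus the subset that also
// disconnects the client (scenario #1 in HDFS-17290's description).
class BackoffTracker {
    private final LongAdder allBackoffs = new LongAdder();        // existing aggregate metric
    private final LongAdder disconnectBackoffs = new LongAdder(); // proposed new metric

    // Called whenever an RPC cannot be enqueued.
    void onBackoff(boolean directToLowestQueue) {
        allBackoffs.increment();
        if (directToLowestQueue) {
            // The call went straight into the lowest-priority queue and was
            // rejected there, so the server also drops the connection.
            disconnectBackoffs.increment();
        }
    }

    public static void main(String[] args) {
        BackoffTracker t = new BackoffTracker();
        t.onBackoff(true);  // scenario #1: lowest queue full, client disconnected
        t.onBackoff(false); // scenario #2: overflow from a higher queue, connection stays open
        System.out.println(t.allBackoffs.sum() + " " + t.disconnectBackoffs.sum());
    }
}
```

A split like this keeps the existing aggregate counter backward-compatible while the new counter isolates the disconnecting case.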
[jira] [Resolved] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma resolved HDFS-17294. - Resolution: Fixed > Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread. > --- > > Key: HDFS-17294 > URL: https://issues.apache.org/jira/browse/HDFS-17294 > Project: Hadoop HDFS > Issue Type: New Feature > Reporter: huangzhaobo99 > Assignee: huangzhaobo99 > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0
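The change resolved here makes the collector daemon's scheduling cycle reconfigurable at runtime. One common shape for that, sketched with invented names (this is not the actual slowPeerCollectorDaemon code): the daemon re-reads a volatile interval on every cycle, so an updated value takes effect without restarting the thread.

```java
// Hypothetical sketch of a collector daemon with a runtime-adjustable cycle.
class ReconfigurableCollector implements Runnable {
    private volatile long intervalMs;       // re-read every cycle, so setIntervalMs() takes effect live
    private volatile boolean running = true;
    private volatile long collectCount = 0; // cf. HDFS-17292: times the collector has run

    ReconfigurableCollector(long intervalMs) { this.intervalMs = intervalMs; }

    void setIntervalMs(long ms) { intervalMs = ms; } // the reconfiguration hook
    void stop() { running = false; }

    @Override public void run() {
        while (running) {
            collectCount++; // stand-in for "collect slow nodes"
            try {
                Thread.sleep(intervalMs); // next wake-up honors the latest interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ReconfigurableCollector c = new ReconfigurableCollector(10);
        Thread t = new Thread(c);
        t.start();
        Thread.sleep(60);
        c.setIntervalMs(1_000_000); // slow the cycle way down without restarting the thread
        Thread.sleep(50);
        c.stop();
        t.interrupt();
        t.join();
        System.out.println(c.collectCount > 0 && c.collectCount < 100);
    }
}
```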
[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-17294: Fix Version/s: 3.4.0
[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798395#comment-17798395 ] ASF GitHub Bot commented on HDFS-17294: --- huangzhaobo99 commented on PR #6366: URL: https://github.com/apache/hadoop/pull/6366#issuecomment-1861926613 > Thanks for your contribution, @huangzhaobo99! Thanks @tasanuma for helping review and merge.
[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798392#comment-17798392 ] ASF GitHub Bot commented on HDFS-17294: --- tasanuma commented on PR #6366: URL: https://github.com/apache/hadoop/pull/6366#issuecomment-1861912474 Thanks for your contribution, @huangzhaobo99!
[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798391#comment-17798391 ] ASF GitHub Bot commented on HDFS-17294: --- tasanuma merged PR #6366: URL: https://github.com/apache/hadoop/pull/6366
[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Yang updated HDFS-17290: Description: Clients are backed off when RPCs cannot be enqueued, but that backoff can happen in different scenarios. Currently there is no way to differentiate whether a backoff happened because the lowest-priority queue was full (with disconnection) or because of overflow from higher-priority queues while the connection between client and NameNode remains open; the IPC server just emits a single metric for all backoffs. Example: # Clients are enqueued directly into the lowest-priority queue and back off when that queue is full. These clients are expected to disconnect from the NameNode. # Clients are enqueued into a non-lowest-priority queue, overflow all the way down to the lowest-priority queue, and back off. In this case, the connection between client and NameNode remains open. We would like to add a metric for #1. was: Clients are backed off when RPCs cannot be enqueued, but that backoff can happen in different scenarios. Currently there is no way to differentiate whether a backoff happened due to the lowest-priority queue plus disconnection or overflow from higher queues; the IPC server just emits a monolithic metric for all backoffs. Example: # Clients are enqueued directly into the lowest-priority queue and back off when that queue is full. These clients are expected to disconnect from the NameNode. # Clients are enqueued into a non-lowest-priority queue, overflow all the way down to the lowest-priority queue, and back off. In this case, the connection between client and NameNode remains open. We would like to add a metric for #1.
[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798383#comment-17798383 ] ASF GitHub Bot commented on HDFS-17290: --- mccormickt12 commented on PR #6359: URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1861870778 cc @goiri to get another pair of eyes on this
[jira] [Comment Edited] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798382#comment-17798382 ] Lei Yang edited comment on HDFS-17290 at 12/18/23 11:48 PM: [~goiri] [~simbadzina] Can you please review this and let me know if you have any concerns? fyi [~mccormickt12] was (Author: JIRAUSER286942): [~goiri] [~simbadzina] Can you please review this and let me know if you have any concerns?
[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Yang updated HDFS-17290: Description: Clients are backed off when RPCs cannot be enqueued, but that backoff can happen in different scenarios. Currently there is no way to differentiate whether a backoff happened due to the lowest-priority queue plus disconnection or overflow from higher queues; the IPC server just emits a monolithic metric for all backoffs. Example: # Clients are enqueued directly into the lowest-priority queue and back off when that queue is full. These clients are expected to disconnect from the NameNode. # Clients are enqueued into a non-lowest-priority queue, overflow all the way down to the lowest-priority queue, and back off. In this case, the connection between client and NameNode remains open. We would like to add a metric for #1. was: Clients are backed off when RPCs cannot be enqueued, but that backoff can happen in different scenarios. Currently there is no way to differentiate whether a backoff happened due to the lowest-priority queue plus disconnection or overflow from higher queues. Example: # Clients are enqueued directly into the lowest-priority queue and back off when that queue is full. These clients are expected to disconnect from the NameNode. # Clients are enqueued into a non-lowest-priority queue, overflow all the way down to the lowest-priority queue, and back off. In this case, the connection between client and NameNode remains open. We would like to add a metric for #1.
[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798382#comment-17798382 ] Lei Yang commented on HDFS-17290: - [~goiri] [~simbadzina] Can you please review this and let me know if you have any concerns?
[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Yang updated HDFS-17290: Description: Clients are backed off when RPCs cannot be enqueued, but that backoff can happen in different scenarios. Currently there is no way to differentiate whether a backoff happened due to the lowest-priority queue plus disconnection or overflow from higher queues. Example: # Clients are enqueued directly into the lowest-priority queue and back off when that queue is full. These clients are expected to disconnect from the NameNode. # Clients are enqueued into a non-lowest-priority queue, overflow all the way down to the lowest-priority queue, and back off. In this case, the connection between client and NameNode remains open. We would like to add a metric for #1. was: Clients are backed off when RPCs cannot be enqueued; this can happen in several cases. Example assumes prio # Clients are enqueued directly into the lowest-priority queue and back off when that queue is full. These clients are expected to disconnect from the NameNode. # Clients are enqueued into a non-lowest-priority queue, overflow all the way down to the lowest-priority queue, and back off. In this case, the connection between client and NameNode remains open.
[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Yang updated HDFS-17290: Affects Version/s: 3.4.0 (was: 3.3.6)
[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Yang updated HDFS-17290: Summary: HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue (was: HDFS: add client rpc backoff metrics due to throttling from lowest priority queue)
[jira] [Updated] (HDFS-17286) Add UDP as a transfer protocol for HDFS
[ https://issues.apache.org/jira/browse/HDFS-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xing Lin updated HDFS-17286: Attachment: active.png observer.png > Add UDP as a transfer protocol for HDFS > --- > > Key: HDFS-17286 > URL: https://issues.apache.org/jira/browse/HDFS-17286 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs > Reporter: Xing Lin > Priority: Major > Attachments: active.png, observer.png > > > Right now, every connection in HDFS is based on RPC/IPC, which is based on TCP. Connections are re-used based on ConnectionID, which includes RpcTimeout as part of the key that identifies a connection. The consequence is that if we want to use a different rpc timeout between two hosts, this creates separate TCP connections. > A use case which motivated us to consider UDP is getHAServiceState() in ObserverReadProxyProvider. We'd like getHAServiceState() to time out with a much smaller timeout threshold and move on to probe the next NameNode. To support this, we used an executorService and set a timeout for the task in HDFS-17030. This implementation can be improved by using UDP to query the HAServiceState. getHAServiceState() does not have to be very reliable, as we can always fall back to the active. > Another motivation is that roughly 5~10% of the RPC calls hitting our active/observers are GetHAServiceState(). If we can move them off to a UDP server, that should improve RPC latency.
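The probe pattern proposed above (a cheap state query with a short timeout and a fallback when no answer arrives) can be sketched over plain UDP. Everything below is invented for illustration, including the one-line wire format; it is not Hadoop's RPC code:

```java
import java.net.*;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: query service state over UDP, time out fast, and
// report UNKNOWN so the caller can fall back to the active NameNode.
class UdpStateProbe {
    public static void main(String[] args) throws Exception {
        // Tiny in-process "namenode" that answers one state query over UDP.
        DatagramSocket server = new DatagramSocket(0);
        Thread responder = new Thread(() -> {
            try {
                byte[] buf = new byte[64];
                DatagramPacket req = new DatagramPacket(buf, buf.length);
                server.receive(req);
                byte[] reply = "OBSERVER".getBytes(StandardCharsets.UTF_8);
                server.send(new DatagramPacket(reply, reply.length,
                        req.getAddress(), req.getPort()));
            } catch (Exception ignored) { }
        });
        responder.setDaemon(true);
        responder.start();

        System.out.println(probe(server.getLocalPort(), 200));
    }

    // Returns the reported state, or "UNKNOWN" if no reply within timeoutMs.
    static String probe(int port, int timeoutMs) throws Exception {
        try (DatagramSocket client = new DatagramSocket()) {
            client.setSoTimeout(timeoutMs); // per-probe timeout with no TCP connection churn
            byte[] q = "getHAServiceState".getBytes(StandardCharsets.UTF_8);
            client.send(new DatagramPacket(q, q.length,
                    InetAddress.getLoopbackAddress(), port));
            byte[] buf = new byte[64];
            DatagramPacket resp = new DatagramPacket(buf, buf.length);
            try {
                client.receive(resp);
                return new String(resp.getData(), 0, resp.getLength(),
                        StandardCharsets.UTF_8);
            } catch (SocketTimeoutException e) {
                return "UNKNOWN"; // lost or unanswered probe: caller falls back to the active
            }
        }
    }
}
```

Because the timeout lives on the probe socket rather than in the connection key, changing it does not create a new long-lived connection, which is the ConnectionID problem the description calls out.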
[jira] [Updated] (HDFS-17286) Add UDP as a transfer protocol for HDFS
[ https://issues.apache.org/jira/browse/HDFS-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xing Lin updated HDFS-17286: Attachment: (was: active.png)
[jira] [Updated] (HDFS-17286) Add UDP as a transfer protocol for HDFS
[ https://issues.apache.org/jira/browse/HDFS-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xing Lin updated HDFS-17286: Attachment: (was: Observer.png)
[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798366#comment-17798366 ] ASF GitHub Bot commented on HDFS-17294: --- hadoop-yetus commented on PR #6366: URL: https://github.com/apache/hadoop/pull/6366#issuecomment-1861726825 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 21s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 33s | | trunk passed | | +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 0m 42s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 36s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 43s | | trunk passed | | +1 :green_heart: | javadoc | 0m 43s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 4s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 40s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 10s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 36s | | the patch passed | | +1 :green_heart: | compile | 0m 37s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 0m 37s | | the patch passed | | +1 :green_heart: | compile | 0m 33s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 33s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 29s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 36s | | the patch passed | | +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 38s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 32s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 187m 15s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 29s | | The patch does not generate ASF License warnings. 
| | | | 274m 2s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/5/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6366 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux da866634b780 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d7efbf8d6716eb77b0fd93d415494223e0a20a26 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/5/testReport/ | | Max. process+thread count | 4490 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output |
[jira] [Commented] (HDFS-17292) Show the number of times the slowPeerCollectorDaemon thread has collected SlowNodes.
[ https://issues.apache.org/jira/browse/HDFS-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798355#comment-17798355 ] ASF GitHub Bot commented on HDFS-17292: --- hadoop-yetus commented on PR #6364: URL: https://github.com/apache/hadoop/pull/6364#issuecomment-1861603070 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 49s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 18s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 36m 50s | | trunk passed | | +1 :green_heart: | compile | 6m 8s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 5m 53s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 32s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 4s | | trunk passed | | +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 2m 5s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 4m 39s | | trunk passed | | +1 :green_heart: | shadedclient | 39m 51s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 31s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 41s | | the patch passed | | +1 :green_heart: | compile | 5m 58s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 5m 58s | | the patch passed | | +1 :green_heart: | compile | 5m 51s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 51s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 18s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 47s | | the patch passed | | +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 52s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 4m 45s | | the patch passed | | +1 :green_heart: | shadedclient | 38m 27s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 241m 58s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6364/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | unit | 23m 22s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 46s | | The patch does not generate ASF License warnings. 
| | | | 450m 22s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6364/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6364 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux d7afeccbfbfd 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 21fced341c2b1af22c668d9d3270fc1f5346841e | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results |
[jira] [Commented] (HDFS-17275) We should determine whether the block has been deleted in the block report
[ https://issues.apache.org/jira/browse/HDFS-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798340#comment-17798340 ] ASF GitHub Bot commented on HDFS-17275: --- hadoop-yetus commented on PR #6335: URL: https://github.com/apache/hadoop/pull/6335#issuecomment-1861466358 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 33s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 43m 37s | | trunk passed | | +1 :green_heart: | compile | 1m 22s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 1m 15s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 8s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 24s | | trunk passed | | +1 :green_heart: | javadoc | 1m 5s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 34s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 15s | | trunk passed | | +1 :green_heart: | shadedclient | 34m 40s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 11s | | the patch passed | | +1 :green_heart: | compile | 1m 11s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 1m 11s | | the patch passed | | +1 :green_heart: | compile | 1m 5s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 1m 5s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 0s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 13s | | the patch passed | | +1 :green_heart: | javadoc | 0m 51s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 17s | | the patch passed | | +1 :green_heart: | shadedclient | 34m 24s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 213m 23s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 43s | | The patch does not generate ASF License warnings. 
| | | | 349m 49s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6335/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6335 | | JIRA Issue | HDFS-17275 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux beed9c8ee0ae 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 0c8052f3eab0336439495e4c7dbb6f4b65c93727 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6335/2/testReport/ | | Max. process+thread count | 3406 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6335/2/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
[jira] [Updated] (HDFS-17296) ACL inheritance broken for new files
[ https://issues.apache.org/jira/browse/HDFS-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emil Kleszcz updated HDFS-17296: Affects Version/s: 3.3.6 > ACL inheritance broken for new files > > > Key: HDFS-17296 > URL: https://issues.apache.org/jira/browse/HDFS-17296 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.5, 3.2.1, 3.3.6 >Reporter: Emil Kleszcz >Priority: Critical > > The inheritance of ACLs for new files does not appear to work correctly. > I have tried the following in HDFS v3.2.1: > {code:java} > >hdfs dfs -mkdir /test > >hdfs dfs -touchz /test/test1 > >hdfs dfs -mkdir /test/testdir1 > >hdfs dfs -setfacl -m user:test:rwx /test > >hdfs dfs -touchz /test/test2 > >hdfs dfs -getfacl -R /test # file: /test > # owner: hdfs > # group: hdfs > user::rwx > group::rwx > other::rwx > # file: /test/test1 > # owner: hdfs > # group: hdfs > user::rw- > group::rw- > other::rw- > # file: /test/test2 > # owner: hdfs > # group: hdfs > user::rw- > group::r-- > other::r-- > # file: /test/testdir1 > # owner: hdfs > # group: hdfs > user::rwx > group::rwx > other::rwx{code} > The same happens when I set default permissions and the mask to rwx: > {code:java} > hdfs dfs -setfacl -m default:user::rwx /test > hdfs dfs -setfacl -m mask::rwx /test{code} > I also tried overriding the default umask-mode in core-site.xml: > {code:java} > <property> > <name>fs.permissions.umask-mode</name> > <value>000</value> > </property> > {code} > This did not help. > Other relevant parameters: > {code:java} > <property> > <name>dfs.permissions</name> > <value>true</value> > </property> > <property> > <name>dfs.permissions.supergroup</name> > <value>hdfs</value> > </property> > <property> > <name>dfs.namenode.acls.enabled</name> > <value>true</value> > </property> > {code} > Inheritance was not disabled, and according to the docs it is enabled by default: > {code:java} > dfs.namenode.posix.acl.inheritance.enabled{code} > Ref. 
> [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]
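For reference, the reported file modes are what POSIX-style umask arithmetic produces: a new file's mode is the requested mode (0666 for files) masked by `fs.permissions.umask-mode`, i.e. `requested & ~umask`. A standalone sketch of that arithmetic (plain Java; the class name is illustrative, this is not Hadoop code):

```java
public class UmaskDemo {

    /** Applies a umask the way POSIX-style filesystems (HDFS included) do: requested & ~umask. */
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask;
    }

    /** Renders the low 9 permission bits as rwx notation, e.g. 0644 -> "rw-r--r--". */
    static String toSymbolic(int mode) {
        char[] bits = {'r', 'w', 'x'};
        StringBuilder sb = new StringBuilder();
        for (int shift = 8; shift >= 0; shift--) {
            sb.append(((mode >> shift) & 1) == 1 ? bits[(8 - shift) % 3] : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Files are requested at 0666; directories at 0777.
        System.out.println(toSymbolic(applyUmask(0666, 0022))); // rw-r--r--  (matches /test/test2 above)
        System.out.println(toSymbolic(applyUmask(0666, 0000))); // rw-rw-rw-  (matches /test/test1 above)
    }
}
```

This only explains the plain-mode outcomes; whether the default-ACL entries should additionally have been copied to the new files is the question the report raises.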
[jira] [Updated] (HDFS-17296) ACL inheritance broken for new files
[ https://issues.apache.org/jira/browse/HDFS-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emil Kleszcz updated HDFS-17296: Description: Looks like the inheritance of ACLs for the files is not working correctly. I have tried the following in HDFS v3.2.1: {code:java} >hdfs dfs -mkdir /test >hdfs dfs -touchz /test/test1 >hdfs dfs -mkdir /test/testdir1 >hdfs dfs -setfacl -m user:test:rwx /test >hdfs dfs -touchz /test/test2 >hdfs dfs -getfacl -R /test # file: /test # owner: hdfs # group: hdfs user::rwx group::rwx other::rwx # file: /test/test1 # owner: hdfs # group: hdfs user::rw- group::rw- other::rw- # file: /test/test2 # owner: hdfs # group: hdfs user::rw- group::r-- other::r-- # file: /test/testdir1 # owner: hdfs # group: hdfs user::rwx group::rwx other::rwx{code} The same happens when I set default permissions and umask to rwx {code:java} hdfs dfs -setfacl -m default:user::rwx /test hdfs dfs -setfacl -m mask::rwx /test{code} Also I was overwriting the default umask-mode in core-site.xml: {code:java} fs.permissions.umask-mode 000 {code} Not helping. Other relevant parameters: {code:java} dfs.permissions true dfs.permissions.supergroup hdfs dfs.namenode.acls.enabled true {code} Inheritance was not disabled and according to docs by default is set to true: {code:java} dfs.namenode.posix.acl.inheritance.enabled{code} Ref. [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml] was: Looks like the inheritance of ACLs for the files is not working correctly. 
I have tried the following in HDFS v3.2.1: {code:java} >hdfs dfs -mkdir /test >hdfs dfs -touchz /test/test1 >hdfs dfs -mkdir /test/testdir1 >hdfs dfs -setfacl -m user:test:rwx /test >hdfs dfs -getfacl -R /test # file: /test # owner: hdfs # group: hdfs user::rwx group::rwx other::rwx # file: /test/test1 # owner: hdfs # group: hdfs user::rw- group::rw- other::rw- # file: /test/test2 # owner: hdfs # group: hdfs user::rw- group::r-- other::r-- # file: /test/testdir1 # owner: hdfs # group: hdfs user::rwx group::rwx other::rwx{code} The same happens when I set default permissions and umask to rwx {code:java} hdfs dfs -setfacl -m default:user::rwx /test hdfs dfs -setfacl -m mask::rwx /test{code} Also I was overwriting the default umask-mode in core-site.xml: {code:java} fs.permissions.umask-mode 000 {code} Not helping. Other relevant parameters: {code:java} dfs.permissions true dfs.permissions.supergroup hdfs dfs.namenode.acls.enabled true {code} Inheritance was not disabled and according to docs by default is set to true: {code:java} dfs.namenode.posix.acl.inheritance.enabled{code} Ref. [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml] > ACL inheritance broken for new files > > > Key: HDFS-17296 > URL: https://issues.apache.org/jira/browse/HDFS-17296 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.5, 3.2.1 >Reporter: Emil Kleszcz >Priority: Critical > > Looks like the inheritance of ACLs for the files is not working correctly. 
> I have tried the following in HDFS v3.2.1: > {code:java} > >hdfs dfs -mkdir /test > >hdfs dfs -touchz /test/test1 > >hdfs dfs -mkdir /test/testdir1 > >hdfs dfs -setfacl -m user:test:rwx /test > >hdfs dfs -touchz /test/test2 > >hdfs dfs -getfacl -R /test # file: /test > # owner: hdfs > # group: hdfs > user::rwx > group::rwx > other::rwx > # file: /test/test1 > # owner: hdfs > # group: hdfs > user::rw- > group::rw- > other::rw- > # file: /test/test2 > # owner: hdfs > # group: hdfs > user::rw- > group::r-- > other::r-- > # file: /test/testdir1 > # owner: hdfs > # group: hdfs > user::rwx > group::rwx > other::rwx{code} > The same happens when I set default permissions and umask to rwx > {code:java} > hdfs dfs -setfacl -m default:user::rwx /test > hdfs dfs -setfacl -m mask::rwx /test{code} > Also I was overwriting the default umask-mode in core-site.xml: > {code:java} > > fs.permissions.umask-mode > 000 > {code} > Not helping. > Other relevant parameters: > {code:java} > > dfs.permissions > true > > dfs.permissions.supergroup > hdfs > > dfs.namenode.acls.enabled > true > {code} > Inheritance was not disabled and according to docs by default is set to true: > {code:java} > dfs.namenode.posix.acl.inheritance.enabled{code} > Ref. > [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17296) ACL inheritance broken for new files
[ https://issues.apache.org/jira/browse/HDFS-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emil Kleszcz updated HDFS-17296: Description: Looks like the inheritance of ACLs for the files is not working correctly. I have tried the following in HDFS v3.2.1: {code:java} >hdfs dfs -mkdir /test >hdfs dfs -touchz /test/test1 >hdfs dfs -mkdir /test/testdir1 >hdfs dfs -setfacl -m user:test:rwx /test >hdfs dfs -getfacl -R /test # file: /test # owner: hdfs # group: hdfs user::rwx group::rwx other::rwx # file: /test/test1 # owner: hdfs # group: hdfs user::rw- group::rw- other::rw- # file: /test/test2 # owner: hdfs # group: hdfs user::rw- group::r-- other::r-- # file: /test/testdir1 # owner: hdfs # group: hdfs user::rwx group::rwx other::rwx{code} The same happens when I set default permissions and umask to rwx {code:java} hdfs dfs -setfacl -m default:user::rwx /test hdfs dfs -setfacl -m mask::rwx /test{code} Also I was overwriting the default umask-mode in core-site.xml: {code:java} fs.permissions.umask-mode 000 {code} Not helping. Other relevant parameters: {code:java} dfs.permissions true dfs.permissions.supergroup hdfs dfs.namenode.acls.enabled true {code} Inheritance was not disabled and according to docs by default is set to true: {code:java} dfs.namenode.posix.acl.inheritance.enabled{code} Ref. [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml] was: Looks like the inheritance of ACLs for the files is not working correctly. 
I have tried the following in HDFS v3.2.1: {code:java} >hdfs dfs -mkdir /test >hdfs dfs -touchz /test/test1 >hdfs dfs -mkdir /test/testdir1 >hdfs dfs -getfacl -R /test # file: /test # owner: hdfs # group: hdfs user::rwx group::rwx other::rwx# file: /test/test1 # owner: hdfs # group: hdfs user::rw- group::rw- other::rw-# file: /test/testdir1 # owner: hdfs # group: hdfs user::rwx group::rwx other::rwx{code} The same happens when I set default permissions and umask to rwx {code:java} hdfs dfs -setfacl -m default:user::rwx /test hdfs dfs -setfacl -m mask::rwx /test{code} Also I was overwriting the default umask-mode in core-site.xml: {code:java} fs.permissions.umask-mode 000 {code} Not helping. Other relevant parameters: {code:java} dfs.permissions true dfs.permissions.supergroup hdfs dfs.namenode.acls.enabled true {code} Inheritance was not disabled and according to docs by default is set to true: {code:java} dfs.namenode.posix.acl.inheritance.enabled{code} Ref. [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml] > ACL inheritance broken for new files > > > Key: HDFS-17296 > URL: https://issues.apache.org/jira/browse/HDFS-17296 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.5, 3.2.1 >Reporter: Emil Kleszcz >Priority: Critical > > Looks like the inheritance of ACLs for the files is not working correctly. 
> I have tried the following in HDFS v3.2.1: > {code:java} > >hdfs dfs -mkdir /test > >hdfs dfs -touchz /test/test1 > >hdfs dfs -mkdir /test/testdir1 > >hdfs dfs -setfacl -m user:test:rwx /test > >hdfs dfs -getfacl -R /test # file: /test > # owner: hdfs > # group: hdfs > user::rwx > group::rwx > other::rwx > # file: /test/test1 > # owner: hdfs > # group: hdfs > user::rw- > group::rw- > other::rw- > # file: /test/test2 > # owner: hdfs > # group: hdfs > user::rw- > group::r-- > other::r-- > # file: /test/testdir1 > # owner: hdfs > # group: hdfs > user::rwx > group::rwx > other::rwx{code} > The same happens when I set default permissions and umask to rwx > {code:java} > hdfs dfs -setfacl -m default:user::rwx /test > hdfs dfs -setfacl -m mask::rwx /test{code} > Also I was overwriting the default umask-mode in core-site.xml: > {code:java} > > fs.permissions.umask-mode > 000 > {code} > Not helping. > Other relevant parameters: > {code:java} > > dfs.permissions > true > > dfs.permissions.supergroup > hdfs > > dfs.namenode.acls.enabled > true > {code} > Inheritance was not disabled and according to docs by default is set to true: > {code:java} > dfs.namenode.posix.acl.inheritance.enabled{code} > Ref. > [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17296) ACL inheritance broken for new files
Emil Kleszcz created HDFS-17296: --- Summary: ACL inheritance broken for new files Key: HDFS-17296 URL: https://issues.apache.org/jira/browse/HDFS-17296 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.2.1, 2.7.5 Reporter: Emil Kleszcz Looks like the inheritance of ACLs for the files is not working correctly. I have tried the following in HDFS v3.2.1: {code:java} >hdfs dfs -mkdir /test >hdfs dfs -touchz /test/test1 >hdfs dfs -mkdir /test/testdir1 >hdfs dfs -getfacl -R /test # file: /test # owner: hdfs # group: hdfs user::rwx group::rwx other::rwx# file: /test/test1 # owner: hdfs # group: hdfs user::rw- group::rw- other::rw-# file: /test/testdir1 # owner: hdfs # group: hdfs user::rwx group::rwx other::rwx{code} The same happens when I set default permissions and umask to rwx {code:java} hdfs dfs -setfacl -m default:user::rwx /test hdfs dfs -setfacl -m mask::rwx /test{code} Also I was overwriting the default umask-mode in core-site.xml: {code:java} fs.permissions.umask-mode 000 {code} Not helping. Other relevant parameters: {code:java} dfs.permissions true dfs.permissions.supergroup hdfs dfs.namenode.acls.enabled true {code} Inheritance was not disabled and according to docs by default is set to true: {code:java} dfs.namenode.posix.acl.inheritance.enabled{code} Ref. [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798239#comment-17798239 ] ASF GitHub Bot commented on HDFS-17294: --- hadoop-yetus commented on PR #6366: URL: https://github.com/apache/hadoop/pull/6366#issuecomment-1860748953 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 21s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 35m 18s | | trunk passed | | +1 :green_heart: | compile | 0m 45s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 37s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 44s | | trunk passed | | +1 :green_heart: | javadoc | 0m 45s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 4s | | trunk passed | | -1 :x: | shadedclient | 28m 39s | | branch has errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 38s | | the patch passed | | +1 :green_heart: | compile | 0m 38s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 0m 38s | | the patch passed | | +1 :green_heart: | compile | 0m 37s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 37s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 34s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 45s | | the patch passed | | +1 :green_heart: | javadoc | 0m 31s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 51s | | the patch passed | | -1 :x: | shadedclient | 23m 37s | | patch has errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 0m 24s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. | | +1 :green_heart: | asflicense | 0m 21s | | The patch does not generate ASF License warnings. 
| | | | 102m 14s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6366 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux e70d04b26b8f 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d7efbf8d6716eb77b0fd93d415494223e0a20a26 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/3/testReport/ | | Max. process+thread count | 516 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/3/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This
[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798222#comment-17798222 ] ASF GitHub Bot commented on HDFS-17294: --- tasanuma commented on PR #6366: URL: https://github.com/apache/hadoop/pull/6366#issuecomment-1860613005 Thanks for updating the PR. +1, pending CI. > Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread. > --- > > Key: HDFS-17294 > URL: https://issues.apache.org/jira/browse/HDFS-17294 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: huangzhaobo99 >Assignee: huangzhaobo99 >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798210#comment-17798210 ] ASF GitHub Bot commented on HDFS-17294: --- huangzhaobo99 commented on code in PR #6366: URL: https://github.com/apache/hadoop/pull/6366#discussion_r1430139244 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java: ## @@ -2673,6 +2677,24 @@ String reconfigureSlowNodesParameters(final DatanodeManager datanodeManager, datanodeManager.setMaxSlowPeersToReport(maxSlowPeersToReport); break; } + case DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_KEY: { +if (newVal == null) { + // set to the value of the current system or default + long defaultInterval = + getConf().getTimeDuration(DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_KEY, + DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_DEFAULT, TimeUnit.MILLISECONDS); + datanodeManager.restartSlowPeerCollector(defaultInterval); + result = DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_DEFAULT; Review Comment: > Maybe this is the correct line? Thanks, I have fixed it. > Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread. > --- > > Key: HDFS-17294 > URL: https://issues.apache.org/jira/browse/HDFS-17294 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: huangzhaobo99 >Assignee: huangzhaobo99 >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798205#comment-17798205 ] ASF GitHub Bot commented on HDFS-17294: --- tasanuma commented on code in PR #6366: URL: https://github.com/apache/hadoop/pull/6366#discussion_r1430132571 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java: ## @@ -2673,6 +2677,24 @@ String reconfigureSlowNodesParameters(final DatanodeManager datanodeManager, datanodeManager.setMaxSlowPeersToReport(maxSlowPeersToReport); break; } + case DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_KEY: { +if (newVal == null) { + // set to the value of the current system or default + long defaultInterval = + getConf().getTimeDuration(DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_KEY, + DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_DEFAULT, TimeUnit.MILLISECONDS); + datanodeManager.restartSlowPeerCollector(defaultInterval); + result = DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_DEFAULT; Review Comment: Maybe this is the correct line? ```suggestion result = Long.toString(defaultInterval); ``` > Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread. > --- > > Key: HDFS-17294 > URL: https://issues.apache.org/jira/browse/HDFS-17294 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: huangzhaobo99 >Assignee: huangzhaobo99 >Priority: Major > Labels: pull-request-available >
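The review point above reflects a general reconfiguration pattern: when the new value is null, fall back to the configured default, apply it, and report the value that was actually applied as a string, rather than returning a default constant directly. A minimal standalone sketch (plain Java; the class, field, and constant names here are illustrative, not Hadoop's API):

```java
public class ReconfigSketch {

    // Illustrative default, standing in for DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_DEFAULT.
    static final long COLLECT_INTERVAL_DEFAULT_MS = 30 * 60 * 1000L;

    private long collectIntervalMs = COLLECT_INTERVAL_DEFAULT_MS;

    /**
     * Reconfigures the collection interval. A null newVal means "reset to the
     * default". Returns the value actually applied, as a string, so callers
     * see the effective setting rather than a constant.
     */
    String reconfigureCollectInterval(String newVal) {
        long applied = (newVal == null)
            ? COLLECT_INTERVAL_DEFAULT_MS
            : Long.parseLong(newVal);
        collectIntervalMs = applied;   // in the real code: restart the collector with this period
        return Long.toString(applied); // the review's point: report what was applied
    }

    long currentIntervalMs() {
        return collectIntervalMs;
    }
}
```

Returning `Long.toString(applied)` keeps the reconfiguration result consistent with the collector's actual state, which is exactly why the suggested `result = Long.toString(defaultInterval);` line is preferable to returning the default constant.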
[jira] [Commented] (HDFS-17289) Considering the size of non-lastBlocks equals to complete block size can cause append failure.
[ https://issues.apache.org/jira/browse/HDFS-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798189#comment-17798189 ]

ASF GitHub Bot commented on HDFS-17289:
---
hadoop-yetus commented on PR #6357:
URL: https://github.com/apache/hadoop/pull/6357#issuecomment-1860394370

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 20s | | Docker mode activated. |
| _ Prechecks _ | | | | |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| _ trunk Compile Tests _ | | | | |
| +0 :ok: | mvndep | 13m 54s | | Maven dependency ordering for branch |
| -1 :x: | mvninstall | 20m 34s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6357/5/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. |
| +1 :green_heart: | compile | 2m 49s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 2m 53s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 43s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 14s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 4s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 28s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 59s | | trunk passed |
| +1 :green_heart: | shadedclient | 20m 8s | | branch has no errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ | | | | |
| +0 :ok: | mvndep | 0m 20s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 4s | | the patch passed |
| +1 :green_heart: | compile | 2m 50s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 2m 50s | | the patch passed |
| +1 :green_heart: | compile | 2m 43s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 2m 43s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 35s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6357/5/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 4 new + 49 unchanged - 0 fixed = 53 total (was 49) |
| +1 :green_heart: | mvnsite | 1m 8s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 54s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 21s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 8s | | the patch passed |
| +1 :green_heart: | shadedclient | 20m 28s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ | | | | |
| +1 :green_heart: | unit | 1m 49s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | unit | 188m 39s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 27s | | The patch does not generate ASF License warnings. |
| | | 294m 27s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6357/5/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6357 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 8f2a577263f2 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 65927eef34947fd9bee244e759e699a48291c3e5 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
[jira] [Commented] (HDFS-17295) 'hdfs dfs -put' may fail when more than half of the datanodes are unavailable
[ https://issues.apache.org/jira/browse/HDFS-17295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798040#comment-17798040 ]

Xuze Yang commented on HDFS-17295:
--
A simple workaround is to increase considerLoadFactor. For example, with considerLoadFactor set to 4, a put operation can only fail once more than three-quarters of the datanodes are unavailable; more generally, with considerLoadFactor set to N, a put operation can only fail once more than (N-1)/N of the datanodes are unavailable. However, in my opinion, simply increasing considerLoadFactor is not good enough: a larger value tolerates a more uneven load, which may degrade datanode read and write performance.

> 'hdfs dfs -put' may fail when more than half of the datanodes are unavailable
> ---
> Key: HDFS-17295
> URL: https://issues.apache.org/jira/browse/HDFS-17295
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.10.1
> Reporter: Xuze Yang
> Priority: Major
> Attachments: image-2023-12-18-13-45-56-824.png, image-2023-12-18-13-58-25-102.png, image-2023-12-18-14-07-05-802.png, image-2023-12-18-14-25-12-447.png
>
> I encountered an error in one of our production environments.
> The client error log is:
> !image-2023-12-18-13-45-56-824.png|width=978,height=211!
> The namenode error log is:
> !image-2023-12-18-13-58-25-102.png|width=974,height=373!
> Datanode capacity usage is:
> !image-2023-12-18-14-07-05-802.png!
> All 12 datanodes are excluded because 7 are full and 5 are busy. The 7 full nodes follow directly from the datanode capacity usage; the 5 busy nodes can be derived from the following code:
> !image-2023-12-18-14-25-12-447.png!
> *considerLoadFactor* is set to 2 by default (controlled by dfs.namenode.replication.considerLoad.factor).
> *stats.getInServiceXceiverAverage()* is the total number of Xceivers divided by the current number of in-service datanodes.
> In the error scenario mentioned above, the Xceiver counts of the 12 datanodes are: 0, 0, 0, 0, 0, 0, 0, 24, 24, 24, 24, 24. The maxLoad is therefore 2*(120/12)=20, and the last 5 datanodes are excluded because 24 is greater than 20.
> Under the current settings, whenever more than half of the datanodes are unavailable, the remaining available datanodes may be excluded due to high load.
> More than half of the datanodes being unavailable is not a rare scenario. Used-up capacity is one example; storage policy is another. Suppose we have a 5-datanode cluster in which 3 datanodes are all SSD and 2 datanodes are all HDD, and the storage policy for the /test/read and /test/write directories is HOT. Starting from a certain moment, we read files in the /test/read directory, which drives the Xceiver counts of the 2 HDD datanodes high; if we then try to put files into the /test/write directory, the put operation will fail with an exception similar to the one mentioned before.
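The maxLoad arithmetic described in the report is easy to verify with a small standalone sketch. The class and method names below are illustrative, not the actual Hadoop block-placement code, but the formula mirrors the check in the screenshot: a node is excluded when its Xceiver count exceeds considerLoadFactor times the in-service average.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxLoadSketch {
    // maxLoad = considerLoadFactor * (total xceiver count / in-service datanodes)
    static double maxLoad(double considerLoadFactor, int[] xceivers) {
        int total = 0;
        for (int x : xceivers) {
            total += x;
        }
        return considerLoadFactor * ((double) total / xceivers.length);
    }

    // Indices of datanodes that pass the load check (xceiver count <= maxLoad).
    static List<Integer> underMaxLoad(double considerLoadFactor, int[] xceivers) {
        double max = maxLoad(considerLoadFactor, xceivers);
        List<Integer> ok = new ArrayList<>();
        for (int i = 0; i < xceivers.length; i++) {
            if (xceivers[i] <= max) {
                ok.add(i);
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // The reported scenario: 7 full nodes idle at 0 xceivers, 5 busy nodes at 24 each.
        int[] xceivers = {0, 0, 0, 0, 0, 0, 0, 24, 24, 24, 24, 24};
        System.out.println(maxLoad(2.0, xceivers));       // 2 * (120 / 12) = 20.0
        System.out.println(underMaxLoad(2.0, xceivers));  // only the 7 full nodes pass
    }
}
```

With the default factor of 2, only the 7 capacity-full nodes pass the load check while all 5 nodes able to accept writes are rejected, which is exactly the pathology the issue describes: the idle-but-full nodes drag the average down and push the busy-but-usable nodes over the threshold.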