[jira] [Commented] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798456#comment-17798456
 ] 

ASF GitHub Bot commented on HDFS-17297:
---

hadoop-yetus commented on PR #6369:
URL: https://github.com/apache/hadoop/pull/6369#issuecomment-1862225690

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 55s |  |  trunk passed  |
   | -1 :x: |  shadedclient  |  34m 52s |  |  branch has errors when building 
and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 53s |  |  the patch passed  |
   | -1 :x: |  shadedclient  |  23m  0s |  |  patch has errors when building 
and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   0m 42s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 32s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 107m 55s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6369 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c33f92e02368 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / dc032c9264cf9cfddd66312e378dbf1169df699b |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/2/testReport/ |
   | Max. process+thread count | 686 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/2/console |
   | versions | 

[jira] [Commented] (HDFS-17292) Show the number of times the slowPeerCollectorDaemon thread has collected SlowNodes.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798449#comment-17798449
 ] 

ASF GitHub Bot commented on HDFS-17292:
---

huangzhaobo99 commented on code in PR #6364:
URL: https://github.com/apache/hadoop/pull/6364#discussion_r1430976884


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyExcludeSlowNodes.java:
##
@@ -176,4 +185,58 @@ public void testSlowPeerTrackerEnabledClearSlowNodes() throws Exception {
     }
   }
 
+  /**
+   * Dependent on the SlowNode related config, therefore placing
+   * 'testCollectSlowNodesIpAddrFrequencyMetrics' unit test in the
+   * TestReplicationPolicyExcludeSlowNodes class.
+   *
+   * Test metrics associated with CollectSlowNodesIpAddrFrequency.
+   */
+  @Test
+  public void testCollectSlowNodesIpAddrFrequencyMetrics() throws Exception {
+    namenode.getNamesystem().writeLock();
+    try {
+      FSNamesystem fsNamesystem = namenode.getNamesystem();
+      assertEquals("{}", fsNamesystem.getCollectSlowNodesIpAddrFrequencyMap(), "{}");
+      MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();
+      ObjectName mxBeanName = new ObjectName("Hadoop:service=NameNode,name=FSNamesystemState");
+      String ipAddrFrequency =
+          (String) mBeanServer.getAttribute(mxBeanName, "CollectSlowNodesIpAddrFrequencyMap");
+      assertEquals("{}", ipAddrFrequency, "{}");
+
+      // add nodes
+      for (DatanodeDescriptor dataNode : dataNodes) {
+        dnManager.addDatanode(dataNode);
+      }
+
+      // mock slow nodes
+      SlowPeerTracker tracker = dnManager.getSlowPeerTracker();
+      Assert.assertNotNull(tracker);
+      OutlierMetrics outlierMetrics = new OutlierMetrics(0.0, 0.0, 0.0, 5.0);
+      tracker.addReport(dataNodes[0].getInfoAddr(), dataNodes[2].getInfoAddr(), outlierMetrics);
+      tracker.addReport(dataNodes[1].getInfoAddr(), dataNodes[2].getInfoAddr(), outlierMetrics);
+
+      // waiting for slow nodes collector run and collect at least 2 times
+      Thread.sleep(3000);

Review Comment:
   > If we want to wait for a while, we use `GenericTestUtils.waitFor(..)` 
instead of `thread.sleep`.
   
   Thanks @slfan1989, I have fixed it.





> Show the number of times the slowPeerCollectorDaemon thread has collected 
> SlowNodes.
> 
>
> Key: HDFS-17292
> URL: https://issues.apache.org/jira/browse/HDFS-17292
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17292) Show the number of times the slowPeerCollectorDaemon thread has collected SlowNodes.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798439#comment-17798439
 ] 

ASF GitHub Bot commented on HDFS-17292:
---

slfan1989 commented on code in PR #6364:
URL: https://github.com/apache/hadoop/pull/6364#discussion_r1430952283


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyExcludeSlowNodes.java:
##
@@ -176,4 +185,58 @@ public void testSlowPeerTrackerEnabledClearSlowNodes() throws Exception {
     }
   }
 
+  /**
+   * Dependent on the SlowNode related config, therefore placing
+   * 'testCollectSlowNodesIpAddrFrequencyMetrics' unit test in the
+   * TestReplicationPolicyExcludeSlowNodes class.
+   *
+   * Test metrics associated with CollectSlowNodesIpAddrFrequency.
+   */
+  @Test
+  public void testCollectSlowNodesIpAddrFrequencyMetrics() throws Exception {
+    namenode.getNamesystem().writeLock();
+    try {
+      FSNamesystem fsNamesystem = namenode.getNamesystem();
+      assertEquals("{}", fsNamesystem.getCollectSlowNodesIpAddrFrequencyMap(), "{}");
+      MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();
+      ObjectName mxBeanName = new ObjectName("Hadoop:service=NameNode,name=FSNamesystemState");
+      String ipAddrFrequency =
+          (String) mBeanServer.getAttribute(mxBeanName, "CollectSlowNodesIpAddrFrequencyMap");
+      assertEquals("{}", ipAddrFrequency, "{}");
+
+      // add nodes
+      for (DatanodeDescriptor dataNode : dataNodes) {
+        dnManager.addDatanode(dataNode);
+      }
+
+      // mock slow nodes
+      SlowPeerTracker tracker = dnManager.getSlowPeerTracker();
+      Assert.assertNotNull(tracker);
+      OutlierMetrics outlierMetrics = new OutlierMetrics(0.0, 0.0, 0.0, 5.0);
+      tracker.addReport(dataNodes[0].getInfoAddr(), dataNodes[2].getInfoAddr(), outlierMetrics);
+      tracker.addReport(dataNodes[1].getInfoAddr(), dataNodes[2].getInfoAddr(), outlierMetrics);
+
+      // waiting for slow nodes collector run and collect at least 2 times
+      Thread.sleep(3000);

Review Comment:
   If we want to wait for a while, we use `GenericTestUtils.waitFor(..)` 
instead of `thread.sleep`.
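
   A minimal sketch of that suggestion (illustrative only; it assumes the test keeps polling the frequency map exposed by FSNamesystem, as in the diff above):
   
   ```java
   // org.apache.hadoop.test.GenericTestUtils
   // Poll until the collector has produced a non-empty frequency map,
   // instead of sleeping for a fixed 3 seconds.
   GenericTestUtils.waitFor(
       () -> !"{}".equals(fsNamesystem.getCollectSlowNodesIpAddrFrequencyMap()),
       500,     // re-check every 500 ms
       30000);  // give up with a TimeoutException after 30 s
   ```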





> Show the number of times the slowPeerCollectorDaemon thread has collected 
> SlowNodes.
> 
>
> Key: HDFS-17292
> URL: https://issues.apache.org/jira/browse/HDFS-17292
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798425#comment-17798425
 ] 

ASF GitHub Bot commented on HDFS-17297:
---

hadoop-yetus commented on PR #6369:
URL: https://github.com/apache/hadoop/pull/6369#issuecomment-1862112592

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   8m 39s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 16s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | -1 :x: |  compile  |   0m 17s | 
[/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-hdfs in trunk failed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  compile  |   0m  8s | 
[/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  hadoop-hdfs in trunk failed with JDK Private 
Build-1.8.0_392-8u392-ga-1~20.04-b08.  |
   | -0 :warning: |  checkstyle  |   3m  4s | 
[/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  The patch fails to run checkstyle in hadoop-hdfs  |
   | -1 :x: |  mvnsite  |   0m 22s | 
[/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in trunk failed.  |
   | -1 :x: |  javadoc  |   0m 22s | 
[/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-hdfs in trunk failed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javadoc  |   0m 21s | 
[/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  hadoop-hdfs in trunk failed with JDK Private 
Build-1.8.0_392-8u392-ga-1~20.04-b08.  |
   | -1 :x: |  spotbugs  |   0m 21s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in trunk failed.  |
   | +1 :green_heart: |  shadedclient  |   5m 13s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 21s | 
[/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  compile  |   0m 21s | 
[/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6369/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-hdfs in the patch failed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javac  |   0m 21s | 

[jira] [Updated] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2023-12-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17297:
--
Labels: pull-request-available  (was: )

> The NameNode should remove block from the BlocksMap if the block is marked as 
> deleted.
> --
>
> Key: HDFS-17297
> URL: https://issues.apache.org/jira/browse/HDFS-17297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> When call internalReleaseLease method:
> {code:java}
> boolean internalReleaseLease(
> ...
> int minLocationsNum = 1;
> if (lastBlock.isStriped()) {
>   minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
> }
> if (uc.getNumExpectedLocations() < minLocationsNum &&
> lastBlock.getNumBytes() == 0) {
>   // There is no datanode reported to this block.
>   // may be client have crashed before writing data to pipeline.
>   // This blocks doesn't need any recovery.
>   // We can remove this block and close the file.
>   pendingFile.removeLastBlock(lastBlock);
>   finalizeINodeFileUnderConstruction(src, pendingFile,
>   iip.getLatestSnapshotId(), false); 
> ...
> }
> {code}
>  if the condition `uc.getNumExpectedLocations() < minLocationsNum && 
> lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY 
> logic, the block is removed from the block list in the inode file and marked 
> as deleted. 
> However it is not removed from the BlocksMap, it may cause memory leak.
> Therefore it is necessary to remove the block from the BlocksMap at this 
> point as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798420#comment-17798420
 ] 

ASF GitHub Bot commented on HDFS-17297:
---

haiyang1987 opened a new pull request, #6369:
URL: https://github.com/apache/hadoop/pull/6369

   ### Description of PR
   https://issues.apache.org/jira/browse/HDFS-17297
   
   When calling the internalReleaseLease method:
   
   ```
   boolean internalReleaseLease(
   ...
   int minLocationsNum = 1;
   if (lastBlock.isStriped()) {
 minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
   }
   
   if (uc.getNumExpectedLocations() < minLocationsNum &&
   lastBlock.getNumBytes() == 0) {
 // There is no datanode reported to this block.
 // may be client have crashed before writing data to pipeline.
 // This blocks doesn't need any recovery.
 // We can remove this block and close the file.
 pendingFile.removeLastBlock(lastBlock);
 finalizeINodeFileUnderConstruction(src, pendingFile,
 iip.getLatestSnapshotId(), false); 
   ...
   }
   ```
   If the condition `uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0` is met during the execution of the UNDER_RECOVERY logic, the block is removed from the inode file's block list and marked as deleted.
   However, it is not removed from the BlocksMap, which may cause a memory leak.
   
   Therefore, it is also necessary to remove the block from the BlocksMap at this point.
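   
   A minimal sketch of the intended change (illustrative only: it assumes the block is dropped via BlockManager's existing removeBlockFromMap(..) helper, which may not be exactly what this patch uses):
   
   ```java
   if (uc.getNumExpectedLocations() < minLocationsNum &&
       lastBlock.getNumBytes() == 0) {
     // No DataNode has reported this block and it holds no data,
     // so it needs no recovery: drop it and close the file.
     pendingFile.removeLastBlock(lastBlock);
     // Proposed addition: also drop the block from the BlocksMap so the
     // NameNode does not keep an orphaned BlockInfo in memory.
     blockManager.removeBlockFromMap(lastBlock);
     finalizeINodeFileUnderConstruction(src, pendingFile,
         iip.getLatestSnapshotId(), false);
   }
   ```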
   
   




> The NameNode should remove block from the BlocksMap if the block is marked as 
> deleted.
> --
>
> Key: HDFS-17297
> URL: https://issues.apache.org/jira/browse/HDFS-17297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>
> When call internalReleaseLease method:
> {code:java}
> boolean internalReleaseLease(
> ...
> int minLocationsNum = 1;
> if (lastBlock.isStriped()) {
>   minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
> }
> if (uc.getNumExpectedLocations() < minLocationsNum &&
> lastBlock.getNumBytes() == 0) {
>   // There is no datanode reported to this block.
>   // may be client have crashed before writing data to pipeline.
>   // This blocks doesn't need any recovery.
>   // We can remove this block and close the file.
>   pendingFile.removeLastBlock(lastBlock);
>   finalizeINodeFileUnderConstruction(src, pendingFile,
>   iip.getLatestSnapshotId(), false); 
> ...
> }
> {code}
>  if the condition `uc.getNumExpectedLocations() < minLocationsNum && 
> lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY 
> logic, the block is removed from the block list in the inode file and marked 
> as deleted. 
> However it is not removed from the BlocksMap, it may cause memory leak.
> Therefore it is necessary to remove the block from the BlocksMap at this 
> point as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2023-12-18 Thread Haiyang Hu (Jira)
Haiyang Hu created HDFS-17297:
-

 Summary: The NameNode should remove block from the BlocksMap if 
the block is marked as deleted.
 Key: HDFS-17297
 URL: https://issues.apache.org/jira/browse/HDFS-17297
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haiyang Hu
Assignee: Haiyang Hu


When calling the internalReleaseLease method:

{code:java}
boolean internalReleaseLease(
...
int minLocationsNum = 1;
if (lastBlock.isStriped()) {
  minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
}

if (uc.getNumExpectedLocations() < minLocationsNum &&
lastBlock.getNumBytes() == 0) {
  // There is no datanode reported to this block.
  // may be client have crashed before writing data to pipeline.
  // This blocks doesn't need any recovery.
  // We can remove this block and close the file.
  pendingFile.removeLastBlock(lastBlock);
  finalizeINodeFileUnderConstruction(src, pendingFile,
  iip.getLatestSnapshotId(), false); 
...
}
{code}
If the condition uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0 is met during the execution of the UNDER_RECOVERY logic, the block is removed from the inode file's block list and marked as deleted.
However, it is not removed from the BlocksMap, which may cause a memory leak.
Therefore, it is also necessary to remove the block from the BlocksMap at this point.
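
One possible shape of the fix, shown only as a sketch (the helper used to drop the block from the BlocksMap is an assumption here and may differ from the eventual patch):

{code:java}
  // There is no datanode reported to this block; remove it and close the file.
  pendingFile.removeLastBlock(lastBlock);
  // Proposed addition: also remove the orphaned block from the BlocksMap
  // so the NameNode does not leak the BlockInfo object.
  blockManager.removeBlockFromMap(lastBlock);
  finalizeINodeFileUnderConstruction(src, pendingFile,
      iip.getLatestSnapshotId(), false);
{code}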





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2023-12-18 Thread Haiyang Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haiyang Hu updated HDFS-17297:
--
Description: 
When calling the internalReleaseLease method:

{code:java}
boolean internalReleaseLease(
...
int minLocationsNum = 1;
if (lastBlock.isStriped()) {
  minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
}

if (uc.getNumExpectedLocations() < minLocationsNum &&
lastBlock.getNumBytes() == 0) {
  // There is no datanode reported to this block.
  // may be client have crashed before writing data to pipeline.
  // This blocks doesn't need any recovery.
  // We can remove this block and close the file.
  pendingFile.removeLastBlock(lastBlock);
  finalizeINodeFileUnderConstruction(src, pendingFile,
  iip.getLatestSnapshotId(), false); 
...
}
{code}
If the condition `uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0` is met during the execution of the UNDER_RECOVERY logic, the block is removed from the inode file's block list and marked as deleted.
However, it is not removed from the BlocksMap, which may cause a memory leak.

Therefore, it is also necessary to remove the block from the BlocksMap at this point.



  was:
When call internalReleaseLease method:

{code:java}
boolean internalReleaseLease(
...
int minLocationsNum = 1;
if (lastBlock.isStriped()) {
  minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
}

if (uc.getNumExpectedLocations() < minLocationsNum &&
lastBlock.getNumBytes() == 0) {
  // There is no datanode reported to this block.
  // may be client have crashed before writing data to pipeline.
  // This blocks doesn't need any recovery.
  // We can remove this block and close the file.
  pendingFile.removeLastBlock(lastBlock);
  finalizeINodeFileUnderConstruction(src, pendingFile,
  iip.getLatestSnapshotId(), false); 
...
}
{code}
 if the condition `uc.getNumExpectedLocations() < minLocationsNum && 
lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY 
logic, the block is removed from the block list in the inode file and marked as 
deleted. 
However it is not removed from the BlocksMap, it may cause memory leak.
Therefore it is necessary to remove the block from the BlocksMap at this point 
as well.




> The NameNode should remove block from the BlocksMap if the block is marked as 
> deleted.
> --
>
> Key: HDFS-17297
> URL: https://issues.apache.org/jira/browse/HDFS-17297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>
> When call internalReleaseLease method:
> {code:java}
> boolean internalReleaseLease(
> ...
> int minLocationsNum = 1;
> if (lastBlock.isStriped()) {
>   minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
> }
> if (uc.getNumExpectedLocations() < minLocationsNum &&
> lastBlock.getNumBytes() == 0) {
>   // There is no datanode reported to this block.
>   // may be client have crashed before writing data to pipeline.
>   // This blocks doesn't need any recovery.
>   // We can remove this block and close the file.
>   pendingFile.removeLastBlock(lastBlock);
>   finalizeINodeFileUnderConstruction(src, pendingFile,
>   iip.getLatestSnapshotId(), false); 
> ...
> }
> {code}
>  if the condition `uc.getNumExpectedLocations() < minLocationsNum && 
> lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY 
> logic, the block is removed from the block list in the inode file and marked 
> as deleted. 
> However it is not removed from the BlocksMap, it may cause memory leak.
> Therefore it is necessary to remove the block from the BlocksMap at this 
> point as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2023-12-18 Thread Haiyang Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haiyang Hu updated HDFS-17297:
--
Description: 
When calling the internalReleaseLease method:

{code:java}
boolean internalReleaseLease(
...
int minLocationsNum = 1;
if (lastBlock.isStriped()) {
  minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
}

if (uc.getNumExpectedLocations() < minLocationsNum &&
lastBlock.getNumBytes() == 0) {
  // There is no datanode reported to this block.
  // may be client have crashed before writing data to pipeline.
  // This blocks doesn't need any recovery.
  // We can remove this block and close the file.
  pendingFile.removeLastBlock(lastBlock);
  finalizeINodeFileUnderConstruction(src, pendingFile,
  iip.getLatestSnapshotId(), false); 
...
}
{code}
If the condition `uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0` is met during the execution of the UNDER_RECOVERY logic, the block is removed from the inode file's block list and marked as deleted.
However, it is not removed from the BlocksMap, which may cause a memory leak.
Therefore, it is also necessary to remove the block from the BlocksMap at this point.



  was:
When call internalReleaseLease method:

{code:java}
boolean internalReleaseLease(
...
int minLocationsNum = 1;
if (lastBlock.isStriped()) {
  minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
}

if (uc.getNumExpectedLocations() < minLocationsNum &&
lastBlock.getNumBytes() == 0) {
  // There is no datanode reported to this block.
  // may be client have crashed before writing data to pipeline.
  // This blocks doesn't need any recovery.
  // We can remove this block and close the file.
  pendingFile.removeLastBlock(lastBlock);
  finalizeINodeFileUnderConstruction(src, pendingFile,
  iip.getLatestSnapshotId(), false); 
...
}
{code}
 if the condition uc.getNumExpectedLocations() < minLocationsNum && 
lastBlock.getNumBytes() == 0 is met during the execution of UNDER_RECOVERY 
logic, the block is removed from the block list in the inode file and marked as 
deleted. 
However it is not removed from the BlocksMap, it may cause memory leak.
Therefore it is necessary to remove the block from the BlocksMap at this point 
as well.




> The NameNode should remove block from the BlocksMap if the block is marked as 
> deleted.
> --
>
> Key: HDFS-17297
> URL: https://issues.apache.org/jira/browse/HDFS-17297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>
> When call internalReleaseLease method:
> {code:java}
> boolean internalReleaseLease(
> ...
> int minLocationsNum = 1;
> if (lastBlock.isStriped()) {
>   minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
> }
> if (uc.getNumExpectedLocations() < minLocationsNum &&
> lastBlock.getNumBytes() == 0) {
>   // There is no datanode reported to this block.
>   // may be client have crashed before writing data to pipeline.
>   // This blocks doesn't need any recovery.
>   // We can remove this block and close the file.
>   pendingFile.removeLastBlock(lastBlock);
>   finalizeINodeFileUnderConstruction(src, pendingFile,
>   iip.getLatestSnapshotId(), false); 
> ...
> }
> {code}
>  if the condition `uc.getNumExpectedLocations() < minLocationsNum && 
> lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY 
> logic, the block is removed from the block list in the inode file and marked 
> as deleted. 
> However it is not removed from the BlocksMap, it may cause memory leak.
> Therefore it is necessary to remove the block from the BlocksMap at this 
> point as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798418#comment-17798418
 ] 

ASF GitHub Bot commented on HDFS-17290:
---

hadoop-yetus commented on PR #6359:
URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1862076266

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |  18m  0s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m 49s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |  17m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 19s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 13s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 34s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 48s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 53s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |  17m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  16m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m 15s | 
[/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/2/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common-project/hadoop-common: The patch generated 2 new + 197 
unchanged - 0 fixed = 199 total (was 197)  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 39s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 21s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 15s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 254m 35s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6359 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 3d77f28fbbdb 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / be908e9236ed395aedbaac6342e6bd5b84cc2340 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/2/testReport/ |
   | Max. 

[jira] [Commented] (HDFS-17292) Show the number of times the slowPeerCollectorDaemon thread has collected SlowNodes.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798409#comment-17798409
 ] 

ASF GitHub Bot commented on HDFS-17292:
---

hadoop-yetus commented on PR #6364:
URL: https://github.com/apache/hadoop/pull/6364#issuecomment-1862016225

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m  0s |  |  Docker mode activated.  |
   | -1 :x: |  patch  |   0m 20s |  |  
https://github.com/apache/hadoop/pull/6364 does not apply to trunk. Rebase 
required? Wrong Branch? See 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.  
|
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6364 |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6364/4/console |
   | versions | git=2.34.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Show the number of times the slowPeerCollectorDaemon thread has collected 
> SlowNodes.
> 
>
> Key: HDFS-17292
> URL: https://issues.apache.org/jira/browse/HDFS-17292
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798397#comment-17798397
 ] 

ASF GitHub Bot commented on HDFS-17290:
---

simbadzina commented on code in PR #6359:
URL: https://github.com/apache/hadoop/pull/6359#discussion_r1430796400


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/metrics/RpcMetrics.java:
##
@@ -342,6 +344,14 @@ public void incrClientBackoff() {
 rpcClientBackoff.incr();
   }
 
+  /**
+   * Client was backoff due to disconnection

Review Comment:
   Is this the other way around, the client ended up being disconnected due to 
backoffs?



##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -3133,6 +3133,13 @@ private void internalQueueCall(Call call, boolean blocking)
       // For example, IPC clients using FailoverOnNetworkExceptionRetry handle
       // RetriableException.
       rpcMetrics.incrClientBackoff();
+      // Clients that are directly put into lowest priority queue are backoff and disconnected.

Review Comment:
   Nit-> tense. backed off.
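
   For context, a rough sketch of the kind of change being reviewed here (the metric name and the condition are illustrative assumptions, not the PR's actual code):
   
   ```java
   // RpcMetrics.java (sketch): a dedicated counter, next to rpcClientBackoff,
   // for backoffs that also drop the client connection.
   @Metric("Client backoffs due to disconnection from the lowest priority queue")
   MutableCounterLong rpcClientBackoffDisconnected;
   
   public void incrClientBackoffDisconnected() {
     rpcClientBackoffDisconnected.incr();
   }
   
   // Server.java, internalQueueCall() (sketch): bump the new counter, in addition
   // to the aggregate one, only when the call was placed directly into the lowest
   // priority queue and the client is being disconnected.
   rpcMetrics.incrClientBackoff();
   if (callWentDirectlyToLowestPriorityQueue) {  // illustrative condition
     rpcMetrics.incrClientBackoffDisconnected();
   }
   ```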





> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher priority queues when connection between client and 
> namenode remains open. Currently IPC server just emits a single metrics for 
> all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17294.
-
Resolution: Fixed

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17294:

Fix Version/s: 3.4.0

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798395#comment-17798395
 ] 

ASF GitHub Bot commented on HDFS-17294:
---

huangzhaobo99 commented on PR #6366:
URL: https://github.com/apache/hadoop/pull/6366#issuecomment-1861926613

   > Thanks for your contribution, @huangzhaobo99!
   
   Thanks @tasanuma for helping review and merge.




> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798392#comment-17798392
 ] 

ASF GitHub Bot commented on HDFS-17294:
---

tasanuma commented on PR #6366:
URL: https://github.com/apache/hadoop/pull/6366#issuecomment-1861912474

   Thanks for your contribution, @huangzhaobo99!




> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798391#comment-17798391
 ] 

ASF GitHub Bot commented on HDFS-17294:
---

tasanuma merged PR #6366:
URL: https://github.com/apache/hadoop/pull/6366




> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Description: 
Clients back off when RPCs cannot be enqueued. However, there are different scenarios in which backoff can happen. Currently there is no way to differentiate between a backoff caused by lowest priority + disconnection and one caused by queue overflow from higher priority queues, where the connection between client and namenode remains open. Currently the IPC server just emits a single metric for all backoffs.

Example:
 # Clients are directly enqueued into the lowest priority queue and back off when that queue is full. They are expected to disconnect from the namenode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the way down to the lowest priority queue, and back off. In this case, the connection between client and namenode remains open.

We would like to add a metric for #1.

  was:
Clients are backoff when rpcs cannot be enqueued. However there are different 
scenarios when backoff could happen. Currently there is no way to differenciate 
whether a backoff happened due to lowest prio+disconnection or queue overflow 
from higher ones. IPC server just emits a monolithic metrics for all the 
backoffs.

Example:
 # Client are directly enqueued into lowest priority queue and backoff when 
lowest queue is full. Client are expected to disconnect from namenode.
 # Client are enqueued into non-lowest priority queue and overflowed all the 
way down to lowest priority queue and back off. In this case, connection 
between client and namenode remains open.

We would like to add metrics for #1


> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher priority queues when connection between client and 
> namenode remains open. Currently IPC server just emits a single metrics for 
> all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798383#comment-17798383
 ] 

ASF GitHub Bot commented on HDFS-17290:
---

mccormickt12 commented on PR #6359:
URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1861870778

   cc @goiri to get another pair of eyes on this




> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher ones. IPC server just emits a monolithic metrics 
> for all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798382#comment-17798382
 ] 

Lei Yang edited comment on HDFS-17290 at 12/18/23 11:48 PM:


[~goiri]  [~simbadzina] Can you please review this and let me know if you have 
any concerns?

fyi [~mccormickt12] 


was (Author: JIRAUSER286942):
[~goiri]  [~simbadzina] Can you please review this and let me know if you have 
any concerns?

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher ones. IPC server just emits a monolithic metrics 
> for all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Description: 
Clients back off when RPCs cannot be enqueued. However, there are different scenarios in which backoff can happen. Currently there is no way to differentiate between a backoff caused by lowest priority + disconnection and one caused by queue overflow from higher queues. The IPC server just emits a monolithic metric for all backoffs.

Example:
 # Clients are directly enqueued into the lowest priority queue and back off when that queue is full. They are expected to disconnect from the namenode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the way down to the lowest priority queue, and back off. In this case, the connection between client and namenode remains open.

We would like to add a metric for #1.

  was:
Clients are backoff when rpcs cannot be enqueued. However there are different 
scenarios when backoff could happen. Currently there is no way to differenciate 
whether a backoff happened due to lowest prio+disconnection or queue overflow 
from higher ones.

Example:
 # Client are directly enqueued into lowest priority queue and backoff when 
lowest queue is full. Client are expected to disconnect from namenode.
 # Client are enqueued into non-lowest priority queue and overflowed all the 
way down to lowest priority queue and back off. In this case, connection 
between client and namenode remains open.

We would like to add metrics for #1


> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher ones. IPC server just emits a monolithic metrics 
> for all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798382#comment-17798382
 ] 

Lei Yang commented on HDFS-17290:
-

[~goiri]  [~simbadzina] Can you please review this and let me know if you have 
any concerns?

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher ones. IPC server just emits a monolithic metrics 
> for all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Description: 
Clients back off when RPCs cannot be enqueued. However, there are different scenarios in which backoff can happen. Currently there is no way to differentiate between a backoff caused by lowest priority + disconnection and one caused by queue overflow from higher queues.

Example:
 # Clients are directly enqueued into the lowest priority queue and back off when that queue is full. They are expected to disconnect from the namenode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the way down to the lowest priority queue, and back off. In this case, the connection between client and namenode remains open.

We would like to add a metric for #1.

  was:
Clients back off when their RPCs cannot be enqueued. However, there are cases when 
this can happen. Example assumes prio

 
 # Clients are enqueued directly into the lowest priority queue and back off when 
that queue is full. These clients are expected to disconnect from the NameNode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the 
way down to the lowest priority queue, and back off. In this case, the connection 
between the client and the NameNode remains open.


> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.3.6
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients back off when their RPCs cannot be enqueued. However, there are different 
> scenarios in which backoff can happen, and currently there is no way to 
> differentiate whether a backoff happened because the lowest priority queue was 
> full and the client was disconnected, or because of queue overflow from the 
> higher queues.
> Example:
>  # Clients are enqueued directly into the lowest priority queue and back off when 
> that queue is full. These clients are expected to disconnect from the NameNode.
>  # Clients are enqueued into a non-lowest priority queue, overflow all the 
> way down to the lowest priority queue, and back off. In this case, the connection 
> between the client and the NameNode remains open.
> We would like to add a metric for scenario #1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Affects Version/s: 3.4.0
   (was: 3.3.6)

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients back off when their RPCs cannot be enqueued. However, there are different 
> scenarios in which backoff can happen, and currently there is no way to 
> differentiate whether a backoff happened because the lowest priority queue was 
> full and the client was disconnected, or because of queue overflow from the 
> higher queues.
> Example:
>  # Clients are enqueued directly into the lowest priority queue and back off when 
> that queue is full. These clients are expected to disconnect from the NameNode.
>  # Clients are enqueued into a non-lowest priority queue, overflow all the 
> way down to the lowest priority queue, and back off. In this case, the connection 
> between the client and the NameNode remains open.
> We would like to add a metric for scenario #1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Summary: HDFS: add client rpc backoff metrics due to disconnection from 
lowest priority queue  (was: HDFS: add client rpc backoff metrics due to 
throttling from lowest priority queue)

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.3.6
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients back off when their RPCs cannot be enqueued. However, there are cases 
> when this can happen. Example assumes prio
>  
>  # Clients are enqueued directly into the lowest priority queue and back off when 
> that queue is full. These clients are expected to disconnect from the NameNode.
>  # Clients are enqueued into a non-lowest priority queue, overflow all the 
> way down to the lowest priority queue, and back off. In this case, the connection 
> between the client and the NameNode remains open.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17286) Add UDP as a transfer protocol for HDFS

2023-12-18 Thread Xing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xing Lin updated HDFS-17286:

Attachment: active.png
observer.png

> Add UDP as a transfer protocol for HDFS
> ---
>
> Key: HDFS-17286
> URL: https://issues.apache.org/jira/browse/HDFS-17286
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Xing Lin
>Priority: Major
> Attachments: active.png, observer.png
>
>
> Right now, every connection in HDFS is based on RPC/IPC, which is based on 
> TCP. A connection is reused based on its ConnectionID, which includes the 
> RpcTimeout as part of the key that identifies the connection. The consequence 
> is that if we want to use a different RPC timeout between two hosts, a 
> separate TCP connection is created. 
> A use case that motivated us to consider UDP is getHAServiceState() in 
> ObserverReadProxyProvider. We'd like getHAServiceState() to time out with a 
> much smaller timeout threshold and move on to probe the next NameNode. To 
> support this, we used an ExecutorService and set a timeout for the task in 
> HDFS-17030. This implementation can be improved by using UDP to query the 
> HAServiceState. getHAServiceState() does not have to be very reliable, as we 
> can always fall back to the active NameNode.
> Another motivation is that roughly 5~10% of the RPC calls hitting our 
> active/observer NameNodes are GetHAServiceState(). If we can move them off to 
> a UDP server, that will hopefully improve RPC latency.
>  
>  
>  
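As a rough sketch of the idea only (not code from any patch), a best-effort UDP 
probe with a very small socket timeout; the message format, buffer size, and 
class name are assumptions, and on timeout the caller simply falls back to the 
normal RPC path:
{code:java}
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public final class UdpHaStateProbe {

  /**
   * Sends a tiny request datagram and waits briefly for the HA state string.
   * Returns null on timeout or any I/O problem so the caller can fall back to
   * the regular RPC call or simply try the next NameNode.
   */
  static String probeHaState(InetSocketAddress addr, int timeoutMs) {
    try (DatagramSocket socket = new DatagramSocket()) {
      socket.setSoTimeout(timeoutMs); // much smaller than the usual RPC timeout
      byte[] request = "getHAServiceState".getBytes(StandardCharsets.UTF_8);
      socket.send(new DatagramPacket(request, request.length, addr));

      byte[] buf = new byte[64];
      DatagramPacket reply = new DatagramPacket(buf, buf.length);
      socket.receive(reply); // throws SocketTimeoutException if no answer in time
      return new String(reply.getData(), 0, reply.getLength(), StandardCharsets.UTF_8);
    } catch (Exception e) {
      return null; // unreliable by design: caller falls back to the active NameNode
    }
  }
}
{code}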



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17286) Add UDP as a transfer protocol for HDFS

2023-12-18 Thread Xing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xing Lin updated HDFS-17286:

Attachment: (was: active.png)

> Add UDP as a transfer protocol for HDFS
> ---
>
> Key: HDFS-17286
> URL: https://issues.apache.org/jira/browse/HDFS-17286
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Xing Lin
>Priority: Major
>
> Right now, every connection in HDFS is based on RPC/IPC, which is based on 
> TCP. A connection is reused based on its ConnectionID, which includes the 
> RpcTimeout as part of the key that identifies the connection. The consequence 
> is that if we want to use a different RPC timeout between two hosts, a 
> separate TCP connection is created. 
> A use case that motivated us to consider UDP is getHAServiceState() in 
> ObserverReadProxyProvider. We'd like getHAServiceState() to time out with a 
> much smaller timeout threshold and move on to probe the next NameNode. To 
> support this, we used an ExecutorService and set a timeout for the task in 
> HDFS-17030. This implementation can be improved by using UDP to query the 
> HAServiceState. getHAServiceState() does not have to be very reliable, as we 
> can always fall back to the active NameNode.
> Another motivation is that roughly 5~10% of the RPC calls hitting our 
> active/observer NameNodes are GetHAServiceState(). If we can move them off to 
> a UDP server, that will hopefully improve RPC latency.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17286) Add UDP as a transfer protocol for HDFS

2023-12-18 Thread Xing Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xing Lin updated HDFS-17286:

Attachment: (was: Observer.png)

> Add UDP as a transfer protocol for HDFS
> ---
>
> Key: HDFS-17286
> URL: https://issues.apache.org/jira/browse/HDFS-17286
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Xing Lin
>Priority: Major
>
> Right now, every connection in HDFS is based on RPC/IPC, which is based on 
> TCP. A connection is reused based on its ConnectionID, which includes the 
> RpcTimeout as part of the key that identifies the connection. The consequence 
> is that if we want to use a different RPC timeout between two hosts, a 
> separate TCP connection is created. 
> A use case that motivated us to consider UDP is getHAServiceState() in 
> ObserverReadProxyProvider. We'd like getHAServiceState() to time out with a 
> much smaller timeout threshold and move on to probe the next NameNode. To 
> support this, we used an ExecutorService and set a timeout for the task in 
> HDFS-17030. This implementation can be improved by using UDP to query the 
> HAServiceState. getHAServiceState() does not have to be very reliable, as we 
> can always fall back to the active NameNode.
> Another motivation is that roughly 5~10% of the RPC calls hitting our 
> active/observer NameNodes are GetHAServiceState(). If we can move them off to 
> a UDP server, that will hopefully improve RPC latency.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798366#comment-17798366
 ] 

ASF GitHub Bot commented on HDFS-17294:
---

hadoop-yetus commented on PR #6366:
URL: https://github.com/apache/hadoop/pull/6366#issuecomment-1861726825

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 33s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 36s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 10s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 38s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 32s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 187m 15s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 29s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 274m  2s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6366 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux da866634b780 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d7efbf8d6716eb77b0fd93d415494223e0a20a26 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/5/testReport/ |
   | Max. process+thread count | 4490 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 

[jira] [Commented] (HDFS-17292) Show the number of times the slowPeerCollectorDaemon thread has collected SlowNodes.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798355#comment-17798355
 ] 

ASF GitHub Bot commented on HDFS-17292:
---

hadoop-yetus commented on PR #6364:
URL: https://github.com/apache/hadoop/pull/6364#issuecomment-1861603070

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 18s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  36m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   5m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m  4s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   4m 39s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 51s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 31s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 58s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   5m 58s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 51s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   5m 51s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 52s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   4m 45s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m 27s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 241m 58s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6364/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  unit  |  23m 22s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 450m 22s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6364/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6364 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux d7afeccbfbfd 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 21fced341c2b1af22c668d9d3270fc1f5346841e |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 

[jira] [Commented] (HDFS-17275) We should determine whether the block has been deleted in the block report

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798340#comment-17798340
 ] 

ASF GitHub Bot commented on HDFS-17275:
---

hadoop-yetus commented on PR #6335:
URL: https://github.com/apache/hadoop/pull/6335#issuecomment-1861466358

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 33s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  43m 37s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  8s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 15s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 40s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 24s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 213m 23s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 349m 49s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6335/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6335 |
   | JIRA Issue | HDFS-17275 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux beed9c8ee0ae 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0c8052f3eab0336439495e4c7dbb6f4b65c93727 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6335/2/testReport/ |
   | Max. process+thread count | 3406 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6335/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
 

[jira] [Updated] (HDFS-17296) ACL inheritance broken for new files

2023-12-18 Thread Emil Kleszcz (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Kleszcz updated HDFS-17296:

Affects Version/s: 3.3.6

> ACL inheritance broken for new files
> 
>
> Key: HDFS-17296
> URL: https://issues.apache.org/jira/browse/HDFS-17296
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.5, 3.2.1, 3.3.6
>Reporter: Emil Kleszcz
>Priority: Critical
>
> Looks like the inheritance of ACLs for the files is not working correctly.
> I have tried the following in HDFS v3.2.1:
> {code:java}
> >hdfs dfs -mkdir /test
> >hdfs dfs -touchz /test/test1
> >hdfs dfs -mkdir /test/testdir1
> >hdfs dfs -setfacl -m user:test:rwx /test
> >hdfs dfs -touchz /test/test2
> >hdfs dfs -getfacl -R /test # file: /test
> # owner: hdfs
> # group: hdfs
> user::rwx
> group::rwx
> other::rwx
> # file: /test/test1
> # owner: hdfs
> # group: hdfs
> user::rw-
> group::rw-
> other::rw-
> # file: /test/test2
> # owner: hdfs
> # group: hdfs
> user::rw-
> group::r--
> other::r--
> # file: /test/testdir1
> # owner: hdfs
> # group: hdfs
> user::rwx
> group::rwx
> other::rwx{code}
> The same happens when I set default permissions and umask to rwx
> {code:java}
> hdfs dfs -setfacl -m default:user::rwx /test
> hdfs dfs -setfacl -m mask::rwx /test{code}
> Also, I tried overriding the default umask-mode in core-site.xml:
> {code:java}
> <property>
>         <name>fs.permissions.umask-mode</name>
>         <value>000</value>
> </property>
>  {code}
> That did not help either.
> Other relevant parameters:
> {code:java}
> <property>
>     <name>dfs.permissions</name>
>     <value>true</value>
> </property>
> <property>
>     <name>dfs.permissions.supergroup</name>
>     <value>hdfs</value>
> </property>
> <property>
>     <name>dfs.namenode.acls.enabled</name>
>     <value>true</value>
> </property>
>  {code}
> Inheritance was not disabled, and according to the docs it is enabled by default: 
> {code:java}
> dfs.namenode.posix.acl.inheritance.enabled{code}
> Ref. 
> [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]
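Purely for reference (this is not a conclusion about the report), a minimal 
sketch of exercising the ACL API with a DEFAULT-scoped entry, since in the 
POSIX-style model only default entries on the parent directory are copied into 
the ACL of newly created children. It assumes an HDFS filesystem reachable 
through the default configuration and uses the /test directory from the report:
{code:java}
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

public class DefaultAclExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/test");

    // Equivalent of "hdfs dfs -setfacl -m default:user:test:rwx /test":
    // DEFAULT-scoped entries are the ones copied to files created under /test.
    AclEntry defaultEntry = new AclEntry.Builder()
        .setScope(AclEntryScope.DEFAULT)
        .setType(AclEntryType.USER)
        .setName("test")
        .setPermission(FsAction.ALL)
        .build();
    fs.modifyAclEntries(dir, Collections.singletonList(defaultEntry));

    // Create a new file and print its ACL to see which entries were inherited.
    Path file = new Path(dir, "test3");
    fs.create(file).close();
    System.out.println(fs.getAclStatus(file));
  }
}
{code}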



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17296) ACL inheritance broken for new files

2023-12-18 Thread Emil Kleszcz (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Kleszcz updated HDFS-17296:

Description: 
Looks like the inheritance of ACLs for the files is not working correctly.
I have tried the following in HDFS v3.2.1:
{code:java}
>hdfs dfs -mkdir /test
>hdfs dfs -touchz /test/test1
>hdfs dfs -mkdir /test/testdir1
>hdfs dfs -setfacl -m user:test:rwx /test
>hdfs dfs -touchz /test/test2
>hdfs dfs -getfacl -R /test # file: /test
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::rwx

# file: /test/test1
# owner: hdfs
# group: hdfs
user::rw-
group::rw-
other::rw-

# file: /test/test2
# owner: hdfs
# group: hdfs
user::rw-
group::r--
other::r--

# file: /test/testdir1
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::rwx{code}
The same happens when I set default permissions and umask to rwx
{code:java}
hdfs dfs -setfacl -m default:user::rwx /test
hdfs dfs -setfacl -m mask::rwx /test{code}
Also, I tried overriding the default umask-mode in core-site.xml:
{code:java}
<property>
        <name>fs.permissions.umask-mode</name>
        <value>000</value>
</property>
 {code}
That did not help either.

Other relevant parameters:
{code:java}
<property>
    <name>dfs.permissions</name>
    <value>true</value>
</property>
<property>
    <name>dfs.permissions.supergroup</name>
    <value>hdfs</value>
</property>
<property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
</property>
 {code}
Inheritance was not disabled, and according to the docs it is enabled by default: 
{code:java}
dfs.namenode.posix.acl.inheritance.enabled{code}
Ref. 
[https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]

  was:
Looks like the inheritance of ACLs for the files is not working correctly.
I have tried the following in HDFS v3.2.1:
{code:java}
>hdfs dfs -mkdir /test
>hdfs dfs -touchz /test/test1
>hdfs dfs -mkdir /test/testdir1
>hdfs dfs -setfacl -m user:test:rwx /test
>hdfs dfs -getfacl -R /test # file: /test
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::rwx

# file: /test/test1
# owner: hdfs
# group: hdfs
user::rw-
group::rw-
other::rw-

# file: /test/test2
# owner: hdfs
# group: hdfs
user::rw-
group::r--
other::r--

# file: /test/testdir1
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::rwx{code}
The same happens when I set default permissions and umask to rwx
{code:java}
hdfs dfs -setfacl -m default:user::rwx /test
hdfs dfs -setfacl -m mask::rwx /test{code}
Also, I tried overriding the default umask-mode in core-site.xml:
{code:java}
<property>
        <name>fs.permissions.umask-mode</name>
        <value>000</value>
</property>
 {code}
That did not help either.

Other relevant parameters:
{code:java}
<property>
    <name>dfs.permissions</name>
    <value>true</value>
</property>
<property>
    <name>dfs.permissions.supergroup</name>
    <value>hdfs</value>
</property>
<property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
</property>
 {code}
Inheritance was not disabled, and according to the docs it is enabled by default: 
{code:java}
dfs.namenode.posix.acl.inheritance.enabled{code}
Ref. 
[https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]


> ACL inheritance broken for new files
> 
>
> Key: HDFS-17296
> URL: https://issues.apache.org/jira/browse/HDFS-17296
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.5, 3.2.1
>Reporter: Emil Kleszcz
>Priority: Critical
>
> Looks like the inheritance of ACLs for the files is not working correctly.
> I have tried the following in HDFS v3.2.1:
> {code:java}
> >hdfs dfs -mkdir /test
> >hdfs dfs -touchz /test/test1
> >hdfs dfs -mkdir /test/testdir1
> >hdfs dfs -setfacl -m user:test:rwx /test
> >hdfs dfs -touchz /test/test2
> >hdfs dfs -getfacl -R /test # file: /test
> # owner: hdfs
> # group: hdfs
> user::rwx
> group::rwx
> other::rwx
> # file: /test/test1
> # owner: hdfs
> # group: hdfs
> user::rw-
> group::rw-
> other::rw-
> # file: /test/test2
> # owner: hdfs
> # group: hdfs
> user::rw-
> group::r--
> other::r--
> # file: /test/testdir1
> # owner: hdfs
> # group: hdfs
> user::rwx
> group::rwx
> other::rwx{code}
> The same happens when I set default permissions and umask to rwx
> {code:java}
> hdfs dfs -setfacl -m default:user::rwx /test
> hdfs dfs -setfacl -m mask::rwx /test{code}
> Also, I tried overriding the default umask-mode in core-site.xml:
> {code:java}
> <property>
>         <name>fs.permissions.umask-mode</name>
>         <value>000</value>
> </property>
>  {code}
> That did not help either.
> Other relevant parameters:
> {code:java}
> <property>
>     <name>dfs.permissions</name>
>     <value>true</value>
> </property>
> <property>
>     <name>dfs.permissions.supergroup</name>
>     <value>hdfs</value>
> </property>
> <property>
>     <name>dfs.namenode.acls.enabled</name>
>     <value>true</value>
> </property>
>  {code}
> Inheritance was not disabled, and according to the docs it is enabled by default: 
> {code:java}
> dfs.namenode.posix.acl.inheritance.enabled{code}
> Ref. 
> [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17296) ACL inheritance broken for new files

2023-12-18 Thread Emil Kleszcz (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Kleszcz updated HDFS-17296:

Description: 
Looks like the inheritance of ACLs for the files is not working correctly.
I have tried the following in HDFS v3.2.1:
{code:java}
>hdfs dfs -mkdir /test
>hdfs dfs -touchz /test/test1
>hdfs dfs -mkdir /test/testdir1
>hdfs dfs -setfacl -m user:test:rwx /test
>hdfs dfs -getfacl -R /test # file: /test
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::rwx

# file: /test/test1
# owner: hdfs
# group: hdfs
user::rw-
group::rw-
other::rw-

# file: /test/test2
# owner: hdfs
# group: hdfs
user::rw-
group::r--
other::r--

# file: /test/testdir1
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::rwx{code}
The same happens when I set default permissions and umask to rwx
{code:java}
hdfs dfs -setfacl -m default:user::rwx /test
hdfs dfs -setfacl -m mask::rwx /test{code}
Also, I tried overriding the default umask-mode in core-site.xml:
{code:java}
<property>
        <name>fs.permissions.umask-mode</name>
        <value>000</value>
</property>
 {code}
That did not help either.

Other relevant parameters:
{code:java}
<property>
    <name>dfs.permissions</name>
    <value>true</value>
</property>
<property>
    <name>dfs.permissions.supergroup</name>
    <value>hdfs</value>
</property>
<property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
</property>
 {code}
Inheritance was not disabled, and according to the docs it is enabled by default: 
{code:java}
dfs.namenode.posix.acl.inheritance.enabled{code}
Ref. 
[https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]

  was:
Looks like the inheritance of ACLs for the files is not working correctly.
I have tried the following in HDFS v3.2.1:
{code:java}
>hdfs dfs -mkdir /test
>hdfs dfs -touchz /test/test1
>hdfs dfs -mkdir /test/testdir1
>hdfs dfs -getfacl -R /test # file: /test
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::rwx# file: /test/test1
# owner: hdfs
# group: hdfs
user::rw-
group::rw-
other::rw-# file: /test/testdir1
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::rwx{code}
The same happens when I set default permissions and umask to rwx
{code:java}
hdfs dfs -setfacl -m default:user::rwx /test
hdfs dfs -setfacl -m mask::rwx /test{code}
Also, I tried overriding the default umask-mode in core-site.xml:
{code:java}
<property>
        <name>fs.permissions.umask-mode</name>
        <value>000</value>
</property>
 {code}
That did not help either.

Other relevant parameters:
{code:java}
<property>
    <name>dfs.permissions</name>
    <value>true</value>
</property>
<property>
    <name>dfs.permissions.supergroup</name>
    <value>hdfs</value>
</property>
<property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
</property>
 {code}
Inheritance was not disabled, and according to the docs it is enabled by default: 
{code:java}
dfs.namenode.posix.acl.inheritance.enabled{code}
Ref. 
[https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]


> ACL inheritance broken for new files
> 
>
> Key: HDFS-17296
> URL: https://issues.apache.org/jira/browse/HDFS-17296
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.5, 3.2.1
>Reporter: Emil Kleszcz
>Priority: Critical
>
> Looks like the inheritance of ACLs for the files is not working correctly.
> I have tried the following in HDFS v3.2.1:
> {code:java}
> >hdfs dfs -mkdir /test
> >hdfs dfs -touchz /test/test1
> >hdfs dfs -mkdir /test/testdir1
> >hdfs dfs -setfacl -m user:test:rwx /test
> >hdfs dfs -getfacl -R /test # file: /test
> # owner: hdfs
> # group: hdfs
> user::rwx
> group::rwx
> other::rwx
> # file: /test/test1
> # owner: hdfs
> # group: hdfs
> user::rw-
> group::rw-
> other::rw-
> # file: /test/test2
> # owner: hdfs
> # group: hdfs
> user::rw-
> group::r--
> other::r--
> # file: /test/testdir1
> # owner: hdfs
> # group: hdfs
> user::rwx
> group::rwx
> other::rwx{code}
> The same happens when I set default permissions and umask to rwx
> {code:java}
> hdfs dfs -setfacl -m default:user::rwx /test
> hdfs dfs -setfacl -m mask::rwx /test{code}
> Also, I tried overriding the default umask-mode in core-site.xml:
> {code:java}
> <property>
>         <name>fs.permissions.umask-mode</name>
>         <value>000</value>
> </property>
>  {code}
> That did not help either.
> Other relevant parameters:
> {code:java}
> <property>
>     <name>dfs.permissions</name>
>     <value>true</value>
> </property>
> <property>
>     <name>dfs.permissions.supergroup</name>
>     <value>hdfs</value>
> </property>
> <property>
>     <name>dfs.namenode.acls.enabled</name>
>     <value>true</value>
> </property>
>  {code}
> Inheritance was not disabled, and according to the docs it is enabled by default: 
> {code:java}
> dfs.namenode.posix.acl.inheritance.enabled{code}
> Ref. 
> [https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17296) ACL inheritance broken for new files

2023-12-18 Thread Emil Kleszcz (Jira)
Emil Kleszcz created HDFS-17296:
---

 Summary: ACL inheritance broken for new files
 Key: HDFS-17296
 URL: https://issues.apache.org/jira/browse/HDFS-17296
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.2.1, 2.7.5
Reporter: Emil Kleszcz


Looks like the inheritance of ACLs for the files is not working correctly.
I have tried the following in HDFS v3.2.1:
{code:java}
>hdfs dfs -mkdir /test
>hdfs dfs -touchz /test/test1
>hdfs dfs -mkdir /test/testdir1
>hdfs dfs -getfacl -R /test # file: /test
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::rwx# file: /test/test1
# owner: hdfs
# group: hdfs
user::rw-
group::rw-
other::rw-# file: /test/testdir1
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::rwx{code}
The same happens when I set default permissions and umask to rwx
{code:java}
hdfs dfs -setfacl -m default:user::rwx /test
hdfs dfs -setfacl -m mask::rwx /test{code}
Also, I tried overriding the default umask-mode in core-site.xml:
{code:java}
<property>
        <name>fs.permissions.umask-mode</name>
        <value>000</value>
</property>
 {code}
That did not help either.

Other relevant parameters:
{code:java}
<property>
    <name>dfs.permissions</name>
    <value>true</value>
</property>
<property>
    <name>dfs.permissions.supergroup</name>
    <value>hdfs</value>
</property>
<property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
</property>
 {code}
Inheritance was not disabled, and according to the docs it is enabled by default: 
{code:java}
dfs.namenode.posix.acl.inheritance.enabled{code}
Ref. 
[https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798239#comment-17798239
 ] 

ASF GitHub Bot commented on HDFS-17294:
---

hadoop-yetus commented on PR #6366:
URL: https://github.com/apache/hadoop/pull/6366#issuecomment-1860748953

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 18s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 37s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m  4s |  |  trunk passed  |
   | -1 :x: |  shadedclient  |  28m 39s |  |  branch has errors when building 
and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 51s |  |  the patch passed  |
   | -1 :x: |  shadedclient  |  23m 37s |  |  patch has errors when building 
and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   0m 24s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 21s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 102m 14s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6366 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e70d04b26b8f 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d7efbf8d6716eb77b0fd93d415494223e0a20a26 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/3/testReport/ |
   | Max. process+thread count | 516 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6366/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This 

[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798222#comment-17798222
 ] 

ASF GitHub Bot commented on HDFS-17294:
---

tasanuma commented on PR #6366:
URL: https://github.com/apache/hadoop/pull/6366#issuecomment-1860613005

   Thanks for updating the PR. +1, pending CI.




> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798210#comment-17798210
 ] 

ASF GitHub Bot commented on HDFS-17294:
---

huangzhaobo99 commented on code in PR #6366:
URL: https://github.com/apache/hadoop/pull/6366#discussion_r1430139244


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java:
##
@@ -2673,6 +2677,24 @@ String reconfigureSlowNodesParameters(final DatanodeManager datanodeManager,
         datanodeManager.setMaxSlowPeersToReport(maxSlowPeersToReport);
         break;
       }
+      case DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_KEY: {
+        if (newVal == null) {
+          // set to the value of the current system or default
+          long defaultInterval =
+              getConf().getTimeDuration(DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_KEY,
+                  DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_DEFAULT, TimeUnit.MILLISECONDS);
+          datanodeManager.restartSlowPeerCollector(defaultInterval);
+          result = DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_DEFAULT;

Review Comment:
   > Maybe this is the correct line?
   
   Thanks, I have fixed it.





> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798205#comment-17798205
 ] 

ASF GitHub Bot commented on HDFS-17294:
---

tasanuma commented on code in PR #6366:
URL: https://github.com/apache/hadoop/pull/6366#discussion_r1430132571


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java:
##
@@ -2673,6 +2677,24 @@ String reconfigureSlowNodesParameters(final DatanodeManager datanodeManager,
         datanodeManager.setMaxSlowPeersToReport(maxSlowPeersToReport);
         break;
       }
+      case DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_KEY: {
+        if (newVal == null) {
+          // set to the value of the current system or default
+          long defaultInterval =
+              getConf().getTimeDuration(DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_KEY,
+                  DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_DEFAULT, TimeUnit.MILLISECONDS);
+          datanodeManager.restartSlowPeerCollector(defaultInterval);
+          result = DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_DEFAULT;

Review Comment:
   Maybe this is the correct line?
   ```suggestion
 result = Long.toString(defaultInterval);
   ```
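   For context, a small self-contained sketch (the key name and value are 
   placeholders, not the real DFSConfigKeys constants) of why echoing the parsed 
   value works: Configuration.getTimeDuration() normalizes the configured 
   duration to the requested unit, and Long.toString() of that number is the 
   value that was actually applied:
   ```java
   import java.util.concurrent.TimeUnit;
   import org.apache.hadoop.conf.Configuration;

   public class ReconfigEchoExample {
     public static void main(String[] args) {
       Configuration conf = new Configuration(false);
       conf.set("example.collect.interval", "30s");   // placeholder key and value
       long intervalMs = conf.getTimeDuration("example.collect.interval",
           0L, TimeUnit.MILLISECONDS);                // suffix "30s" parses to 30000
       System.out.println(Long.toString(intervalMs)); // prints the applied value: 30000
     }
   }
   ```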





> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17289) Considering the size of non-lastBlocks equals to complete block size can cause append failure.

2023-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798189#comment-17798189
 ] 

ASF GitHub Bot commented on HDFS-17289:
---

hadoop-yetus commented on PR #6357:
URL: https://github.com/apache/hadoop/pull/6357#issuecomment-1860394370

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 54s |  |  Maven dependency ordering for branch  |
   | -1 :x: |  mvninstall  |  20m 34s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6357/5/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   2m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   2m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 14s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 59s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m  8s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 20s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  4s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 50s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   2m 50s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 43s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   2m 43s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 35s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6357/5/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 4 new + 49 unchanged - 0 fixed = 
53 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m  8s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 28s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 49s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 188m 39s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 294m 27s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6357/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6357 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 8f2a577263f2 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 65927eef34947fd9bee244e759e699a48291c3e5 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
  

[jira] [Commented] (HDFS-17295) 'hdfs dfs -put' may fail when more than half of the datanodes are unavailable

2023-12-18 Thread Xuze Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798040#comment-17798040
 ] 

Xuze Yang commented on HDFS-17295:
--

A simple solution is to increase the considerLoadFactor. For example, if we set 
considerLoadFactor to 4, the put operation can only fail when more than 
three-quarters of the datanodes are unavailable. More generally, if 
considerLoadFactor is N, the put operation can only fail when more than (N-1)/N 
of the datanodes are unavailable. 

However, in my opinion, simply increasing the considerLoadFactor is not good 
enough, because a larger considerLoadFactor value means a more uneven load, 
which may degrade the read and write performance of the datanodes. 
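To make the arithmetic concrete, here is a small self-contained sketch; the 
numbers are taken from the example in the issue description, and the helper 
method itself is hypothetical (it only mirrors the maxLoad check, it is not the 
block placement code):
{code:java}
import java.util.Arrays;

public class MaxLoadExample {
  // A datanode is treated as "busy" when its xceiver count exceeds
  // considerLoadFactor * (total xceivers / number of in-service datanodes).
  static long countExcludedAsBusy(int[] xceivers, double considerLoadFactor) {
    double avg = Arrays.stream(xceivers).sum() / (double) xceivers.length;
    double maxLoad = considerLoadFactor * avg;
    return Arrays.stream(xceivers).filter(x -> x > maxLoad).count();
  }

  public static void main(String[] args) {
    // 7 full datanodes contribute 0 xceivers each, the remaining 5 carry 24 each.
    int[] xceivers = {0, 0, 0, 0, 0, 0, 0, 24, 24, 24, 24, 24};

    // factor 2: maxLoad = 2 * (120 / 12) = 20, so all 5 busy nodes are excluded.
    System.out.println(countExcludedAsBusy(xceivers, 2)); // 5

    // factor 4: maxLoad = 4 * 10 = 40, so none of the busy nodes are excluded.
    System.out.println(countExcludedAsBusy(xceivers, 4)); // 0
  }
}
{code}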

> 'hdfs dfs -put' may fail when more than half of the datanodes are unavailable
> -
>
> Key: HDFS-17295
> URL: https://issues.apache.org/jira/browse/HDFS-17295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.10.1
>Reporter: Xuze Yang
>Priority: Major
> Attachments: image-2023-12-18-13-45-56-824.png, 
> image-2023-12-18-13-58-25-102.png, image-2023-12-18-14-07-05-802.png, 
> image-2023-12-18-14-25-12-447.png
>
>
> I encountered an error in one of our production environments.
> Client error log is:
> !image-2023-12-18-13-45-56-824.png|width=978,height=211!
> namenode error log is:
> !image-2023-12-18-13-58-25-102.png|width=974,height=373!
> datanode capacity usage is:
> !image-2023-12-18-14-07-05-802.png!
> All 12 datanodes are excluded: 7 because they are full and 5 because they are 
> busy. The 7 full datanodes follow directly from the capacity usage above; the 
> 5 busy ones can be derived from the following code:
> !image-2023-12-18-14-25-12-447.png!
> *considerLoadFactor* is set to 2 by default (controlled by 
> dfs.namenode.replication.considerLoad.factor).
> *stats.getInServiceXceiverAverage()* is the total number of Xceivers divided 
> by the current number of datanodes in service.
> In the error scenario mentioned above, the Xceiver counts of the 12 datanodes 
> are: 0, 0, 0, 0, 0, 0, 0, 24, 24, 24, 24, 24, so maxLoad is 2*(120/12)=20. 
> The last 5 datanodes are excluded because 24 is greater than 20.
> Under the current settings, as long as more than half of the datanodes are 
> unavailable, the remaining available datanodes may be excluded due to high 
> load.
> Having more than half of the datanodes unavailable is not a rare scenario. 
> Capacity being used up is one example. Storage policy is another: suppose we 
> have a 5-datanode cluster where 3 datanodes are all SSD and 2 are all HDD, and 
> the storage policy for the /test/read and /test/write directories is HOT. 
> Starting from a certain moment we read files in the /test/read directory, 
> which drives the Xceiver count on the 2 HDD datanodes high; when we then try 
> to put files into the /test/write directory, the put operation fails with an 
> exception similar to the one mentioned before.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org