[ https://issues.apache.org/jira/browse/HDFS-17867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048568#comment-18048568 ]

ASF GitHub Bot commented on HDFS-17867:
---------------------------------------

hadoop-yetus commented on PR #8154:
URL: https://github.com/apache/hadoop/pull/8154#issuecomment-3702392057

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 5 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  22m 26s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/branch-mvninstall-root.txt) |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   0m 47s |  |  trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 53s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  spotbugs  |   2m  8s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 52s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 18s | [/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  compile  |   0m 17s | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt) |  hadoop-hdfs in the patch failed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javac  |   0m 17s | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt) |  hadoop-hdfs in the patch failed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  compile  |   0m 19s | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt) |  hadoop-hdfs in the patch failed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javac  |   0m 19s | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt) |  hadoop-hdfs in the patch failed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/blanks-eol.txt) |  The patch has 4 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 25s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 12 new + 252 unchanged - 0 fixed = 264 total (was 252)  |
   | -1 :x: |  mvnsite  |   0m 19s | [/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  javadoc  |   0m 19s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt) |  hadoop-hdfs in the patch failed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javadoc  |   0m 17s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt) |  hadoop-hdfs in the patch failed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  spotbugs  |   0m 18s | [/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  shadedclient  |   7m 11s |  |  patch has errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |   0m 19s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 18s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  55m 34s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/8154 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 52b0e37c1e11 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4089a903124c366b4d46181cdd4fd5a2d2430dc0 |
   | Default Java | Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
   | Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/testReport/ |
   | Max. process+thread count | 613 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/1/console |
   | versions | git=2.25.1 maven=3.9.11 spotbugs=4.9.7 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Implement a new NetworkTopology that supports weighted random choose
> --------------------------------------------------------------------
>
>                 Key: HDFS-17867
>                 URL: https://issues.apache.org/jira/browse/HDFS-17867
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: khazhen
>            Priority: Major
>              Labels: pull-request-available
>
> h2. Background
>      In BlockPlacementPolicyDefault, each DN in the cluster is selected with roughly equal probability. However, our cluster contains many types of DataNode machines with completely different hardware specifications.
>       For example, some machines have more disks, higher-bandwidth NICs, and higher-performance CPUs, while some older machines are the opposite: their service capacity is much lower than that of the newer machines. As the cluster load increases, these lower-performance machines quickly become bottlenecks, degrading the cluster's performance and even its availability (for example, slow DataNodes or PipelineRecovery failures).
>       The root cause is that we have no way to balance load across DataNodes.
> h2. Solution
>       To address this, we implemented a NetworkTopology that supports weighted random selection.
>       We can configure a weight for each DN, similar to how we configure racks. For clusters containing DNs with different hardware specifications, this feature has several benefits:
>  # Better load balancing between DNs: high-performance machines handle more traffic, improving the overall service capacity of the cluster.
>  # Higher resource utilization.
>  # Reduced Balancer overhead. Typically, higher-performance machines have more disks and larger capacity; if we configure weights according to capacity ratios, the amount of data the Balancer needs to move is significantly reduced. (The Balancer is still needed when new DNs are added.)
>       Our production cluster has DN machines with many different hardware specifications; some machines have up to 10 times the capacity of older models. Additionally, some machines are co-deployed with other computing services, so they become slow nodes as soon as traffic increases.
>       After introducing this feature, we let the independently deployed high-performance, large-capacity machines handle more traffic, and both the overall I/O performance and the availability of the cluster improved significantly.
>       Our cluster is still on Hadoop 2.x, so we modified the NetworkTopology class directly to implement this feature. However, the latest version uses DFSNetworkTopology as the default implementation, so I re-implemented the feature based on DFSNetworkTopology. The details follow.
> h2. Implementation
>       Let's look at the chooseRandomWithStorageType method of DFSNetworkTopology. Suppose the cluster has 3 DNs: dn1 (/r1), dn2 (/r1), and dn3 (/r2). The topology tree looks like this:
> {code:java}
> /
>   /r1
>     /dn1
>     /dn2
>   /r2
>     /dn3 {code}
>       Choosing a random DN from the root scope takes 3 core steps:
> 1. Compute the number of available nodes under r1 and r2: [2, 1] in this case.
> 2. Perform a weighted random selection from [r1, r2] with weights [2, 1]; assume r1 is chosen.
> 3. Since r1 is a rack (an inner node), uniformly choose a DN from its children [dn1, dn2].
>       Each of the three DNs is chosen with probability 1/3.
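The three steps above can be sketched with plain Java collections. This is a minimal illustration only; the class and method names are made up for the sketch and are not the actual DFSNetworkTopology code:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Illustrative sketch of the three-step selection described above, using
// plain maps instead of the real topology tree. All names are hypothetical.
public class TwoLevelChooser {
    // rack -> DataNodes, matching the example: /r1 -> dn1, dn2; /r2 -> dn3
    static final Map<String, List<String>> RACKS = new LinkedHashMap<>();
    static {
        RACKS.put("/r1", Arrays.asList("dn1", "dn2"));
        RACKS.put("/r2", Arrays.asList("dn3"));
    }

    static String chooseRandom(Random rng) {
        // Step 1: count available nodes under each rack -> [2, 1].
        int total = RACKS.values().stream().mapToInt(List::size).sum();
        // Step 2: weighted random choice of a rack, weight = node count.
        int pick = rng.nextInt(total);
        for (List<String> children : RACKS.values()) {
            if (pick < children.size()) {
                // Step 3: uniform random choice among the rack's children.
                return children.get(rng.nextInt(children.size()));
            }
            pick -= children.size();
        }
        throw new IllegalStateException("unreachable");
    }

    public static void main(String[] args) {
        // Each of dn1, dn2, dn3 should come up with probability ~1/3.
        Map<String, Integer> hits = new TreeMap<>();
        Random rng = new Random();
        for (int i = 0; i < 30000; i++) {
            hits.merge(chooseRandom(rng), 1, Integer::sum);
        }
        System.out.println(hits);
    }
}
```

Because the rack is picked with probability proportional to its node count, the two-level walk is equivalent to a single uniform draw over all DNs.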
>       Now suppose we want a weighted random selection from [dn1, dn2, dn3] with weights [3, 1, 2]. A simple, straightforward solution is to add virtual nodes to the topology tree, which would then look like this:
> {code:java}
> /
>   /r1
>     /dn1'
>     /dn1'
>     /dn1'
>     /dn2'
>   /r2
>     /dn3'
>     /dn3' {code}
>       Each virtual node is chosen with probability 1/6. Since dn1 has 3 virtual nodes, dn1 is chosen with probability 1/2, while dn2 and dn3 are chosen with probabilities 1/6 and 1/3 respectively.
>       However, reviewing steps 1 through 3, we can see that steps 1 and 2 only care about the number of DataNodes under each inner node. This means we don't need to actually add virtual nodes to the topology tree. Instead, we can introduce a new method getNodeCount(Node n) that takes a node as input and returns the number of DataNodes under n. In the existing DFSNetworkTopology class, it simply returns the number of physical DataNodes under n; we can then add a subclass of DFSNetworkTopology that overrides getNodeCount(Node n) to return the total weight of all DataNodes under n.
>       Step 3 must be modified as well: we should perform a weighted random selection from the child list rather than a uniform one.
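The combined idea (weight sums in place of node counts, plus a weighted step 3) can be sketched as follows. Every name here is hypothetical and only mirrors the proposal; it is not the real Hadoop API:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Illustrative sketch: the per-rack "node count" becomes a weight sum, so
// the same two-level walk yields the weighted distribution without
// materializing any virtual nodes. All names are hypothetical.
public class WeightedChooser {
    // rack -> (DataNode -> weight); weights [3, 1, 2] as in the example above.
    static final Map<String, Map<String, Integer>> RACKS = new LinkedHashMap<>();
    static {
        Map<String, Integer> r1 = new LinkedHashMap<>();
        r1.put("dn1", 3);
        r1.put("dn2", 1);
        Map<String, Integer> r2 = new LinkedHashMap<>();
        r2.put("dn3", 2);
        RACKS.put("/r1", r1);
        RACKS.put("/r2", r2);
    }

    // Plays the role of the proposed getNodeCount(Node n): the weighted
    // variant returns the total weight under n instead of the node count.
    static int nodeCount(String rack) {
        return RACKS.get(rack).values().stream().mapToInt(Integer::intValue).sum();
    }

    static String chooseRandom(Random rng) {
        int total = RACKS.keySet().stream().mapToInt(WeightedChooser::nodeCount).sum();
        int pick = rng.nextInt(total); // 0..5 -> dn1: 3/6, dn2: 1/6, dn3: 2/6
        for (String rack : RACKS.keySet()) {
            int w = nodeCount(rack);
            if (pick < w) {
                // Step 3, modified: weighted (not uniform) choice among children.
                for (Map.Entry<String, Integer> child : RACKS.get(rack).entrySet()) {
                    if (pick < child.getValue()) {
                        return child.getKey();
                    }
                    pick -= child.getValue();
                }
            }
            pick -= w;
        }
        throw new IllegalStateException("unreachable");
    }

    public static void main(String[] args) {
        Map<String, Integer> hits = new TreeMap<>();
        Random rng = new Random();
        for (int i = 0; i < 60000; i++) {
            hits.merge(chooseRandom(rng), 1, Integer::sum);
        }
        // Counts should be roughly proportional to the weights 3 : 1 : 2.
        System.out.println(hits);
    }
}
```

The tree itself is unchanged; only the counting and the leaf-level draw differ, which is why no virtual nodes are needed.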
> h2. Difference from AvailableSpaceBlockPlacementPolicy
>       AvailableSpaceBlockPlacementPolicy is useful when adding new nodes to the cluster: it makes the newly added nodes slightly more likely to be chosen than the old ones, so the cluster trends toward balance over time. The real-time load of the newly added nodes does not change much.
>       This feature instead focuses on real-time load balancing between DataNodes; it is useful in clusters with many different types of DataNodes.
> h2. Conclusion
>       I have submitted a PR. Suggestions and discussion are welcome.
>       By the way, making node weights reconfigurable without restarting the NameNode would be very useful: it would let us quickly adjust weights based on the actual load of the cluster. I will propose that in a separate JIRA after this one is completed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
