[ 
https://issues.apache.org/jira/browse/HDFS-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JiangHua Zhu reassigned HDFS-16614:
-----------------------------------

    Assignee: JiangHua Zhu

> Improve balancer operation strategy and performance
> ---------------------------------------------------
>
>                 Key: HDFS-16614
>                 URL: https://issues.apache.org/jira/browse/HDFS-16614
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, namenode
>    Affects Versions: 3.3.0
>            Reporter: JiangHua Zhu
>            Assignee: JiangHua Zhu
>            Priority: Major
>         Attachments: image-2022-06-02-13-18-33-213.png
>
>
> When the Balancer runs, it performs the following steps in order:
> 1. Obtain information about the available DataNodes from the NameNode.
> 2. Group the storage by StorageType and calculate the average utilization 
> for each type. Combined with the configured threshold, this yields four 
> sets: overUtilized, aboveAvgUtilized, belowAvgUtilized, and underUtilized.
> 3. Compute the sources and targets for the data transfer. A source sends 
> data and a target receives it.
> 4. Start the data transfers in parallel.
> These steps are repeated iteratively. Throughout the process, a single 
> threshold is applied to all StorageTypes, which seems a bit coarse: with the 
> heterogeneous storage that is currently supported, individual StorageTypes 
> cannot be treated differently.
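The grouping in step 2 can be sketched as follows. This is a minimal illustration, not the Balancer's actual code: the class name `UtilizationClassifier`, the `Group` enum, and the exact boundary handling are assumptions, and the average is passed in rather than computed from capacity figures. Utilization, average, and threshold are all percentages.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and boundary handling are assumptions.
final class UtilizationClassifier {
    enum Group { OVER_UTILIZED, ABOVE_AVG_UTILIZED, BELOW_AVG_UTILIZED, UNDER_UTILIZED }

    // Place one node into a set by comparing its distance from the average
    // with the threshold. All values are percentages.
    static Group classify(double utilization, double average, double threshold) {
        double diff = utilization - average;
        if (diff > threshold) {
            return Group.OVER_UTILIZED;       // more than `threshold` above average
        }
        if (diff > 0) {
            return Group.ABOVE_AVG_UTILIZED;  // above average, within the threshold
        }
        if (diff >= -threshold) {
            return Group.BELOW_AVG_UTILIZED;  // at or below average, within the threshold
        }
        return Group.UNDER_UTILIZED;          // more than `threshold` below average
    }

    // Group node indices into the four sets using one shared threshold.
    static Map<Group, List<Integer>> group(double[] utilizations, double average, double threshold) {
        Map<Group, List<Integer>> sets = new EnumMap<>(Group.class);
        for (Group g : Group.values()) {
            sets.put(g, new ArrayList<>());
        }
        for (int i = 0; i < utilizations.length; i++) {
            sets.get(classify(utilizations[i], average, threshold)).add(i);
        }
        return sets;
    }

    public static void main(String[] args) {
        double[] utilizations = {92.0, 85.0, 70.0, 60.0};
        System.out.println(group(utilizations, 78.0, 10.0));
    }
}
```

With an average of 78% and a threshold of 10%, a node at 85% lands in aboveAvgUtilized in this sketch, so it still counts as above average even though it is within the threshold.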
> There is a production cluster with more than 2000 nodes whose node storage 
> is imbalanced. For example:
>  !image-2022-06-02-13-18-33-213.png! 
> Here, the average utilization of the cluster is 78%, but the utilization of 
> most nodes is between 85% and 90%. When the balancer is turned on, we find 
> that 85% of the nodes are working as sources. We do not think this is 
> reasonable, because it occupies more network resources in the cluster. 
> Applying some effective restrictions would benefit the normal operation of 
> the cluster.
> So here are the proposed changes:
> 1. When the balancer is running, it should print the threshold that applies 
> to each StorageType, for example [[DISK, 10%], [SSD, 8%]...].
> 2. Support setting a separate threshold for each StorageType and have the 
> balancer apply it.
> 3. Add an option to prohibit nodes below the threshold from joining the 
> Source set. This lets nodes with high utilization transfer data as soon as 
> possible, which is good for balance.
> 4. Add support for leaving nodes unchanged when a large share of the 
> DataNodes in the cluster have similar utilization. For example, if 40% of 
> the nodes in the cluster have a utilization of 75% to 80%, these nodes 
> should not join the Source set. This behavior needs to be specified by the 
> user at runtime.
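Proposals 2 and 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: `SourceSelectionSketch`, `StorageKind` (a local stand-in for Hadoop's StorageType), the `overUtilizedOnly` switch, and the default threshold are hypothetical names, not existing Balancer options. The DISK and SSD values reuse the 10% and 8% example from item 1.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of per-storage-type thresholds and source filtering.
final class SourceSelectionSketch {
    // Local stand-in for Hadoop's StorageType; only a few names are listed.
    enum StorageKind { DISK, SSD, ARCHIVE }

    private final Map<StorageKind, Double> thresholds = new EnumMap<>(StorageKind.class);
    private final double defaultThreshold;   // used when a type has no override
    private final boolean overUtilizedOnly;  // assumed switch for proposal 3

    SourceSelectionSketch(double defaultThreshold, boolean overUtilizedOnly) {
        this.defaultThreshold = defaultThreshold;
        this.overUtilizedOnly = overUtilizedOnly;
    }

    // Register a threshold (in percent) for one storage type.
    SourceSelectionSketch withThreshold(StorageKind kind, double percent) {
        thresholds.put(kind, percent);
        return this;
    }

    double thresholdFor(StorageKind kind) {
        return thresholds.getOrDefault(kind, defaultThreshold);
    }

    // A node is a source candidate only if it is above the average. When
    // overUtilizedOnly is set, it must also exceed the average by more than
    // the threshold of its storage type.
    boolean isSourceCandidate(StorageKind kind, double utilization, double average) {
        double diff = utilization - average;
        if (diff <= 0) {
            return false;
        }
        return !overUtilizedOnly || diff > thresholdFor(kind);
    }

    public static void main(String[] args) {
        SourceSelectionSketch strict = new SourceSelectionSketch(10.0, true)
                .withThreshold(StorageKind.DISK, 10.0)
                .withThreshold(StorageKind.SSD, 8.0);
        // Cluster average of 78% and a node at 87%, as in the example cluster.
        System.out.println("DISK source? " + strict.isSourceCandidate(StorageKind.DISK, 87.0, 78.0));
        System.out.println("SSD source?  " + strict.isSourceCandidate(StorageKind.SSD, 87.0, 78.0));
    }
}
```

In this sketch a DISK node at 87% is 9 points above the 78% average, which is inside the 10% DISK threshold, so it is kept out of the Source set when the switch is on. The same utilization on SSD exceeds the 8% threshold and stays eligible. Proposal 4 (skipping a crowded utilization band) is not covered here.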



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

