[ https://issues.apache.org/jira/browse/HDFS-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
JiangHua Zhu updated HDFS-16614:
--------------------------------
    Affects Version/s: 2.9.2
                           (was: 3.3.0)

> Improve balancer operation strategy and performance
> ---------------------------------------------------
>
>                 Key: HDFS-16614
>                 URL: https://issues.apache.org/jira/browse/HDFS-16614
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, namenode
>    Affects Versions: 2.9.2
>            Reporter: JiangHua Zhu
>            Assignee: JiangHua Zhu
>            Priority: Major
>         Attachments: image-2022-06-02-13-18-33-213.png
>
>
> When the Balancer runs, it works in the following order:
> 1. Obtain information about the available DataNodes from the NameNode.
> 2. Group the nodes by StorageType and calculate the average utilization of
> each group. Combined with the configured threshold, this yields four sets:
> overUtilized, aboveAvgUtilized, belowAvgUtilized, and underUtilized.
> 3. Compute the sources and targets for data transfer. Sources send data and
> targets receive it.
> 4. Start the data transfers in parallel.
> These steps run iteratively. Throughout, a single threshold is applied to
> all StorageTypes. This is coarse: given the heterogeneous storage that HDFS
> currently supports, individual StorageTypes cannot be treated differently.
> We have an online cluster with more than 2000 nodes whose node storage is
> imbalanced. For example:
> !image-2022-06-02-13-18-33-213.png!
> Here, the average utilization of the cluster is 78%, but the utilization of
> most nodes is between 85% and 90%. When the balancer is turned on, we find
> that 85% of the nodes are working as sources. We do not think this is
> reasonable, because it occupies more network resources in the cluster.
> Restricting which nodes act as sources would benefit the normal operation
> of the cluster.
> So here are some changes to make:
> 1. When the balancer is running, it should report the threshold that applies
> to each StorageType. For example [[DISK, 10%], [SSD, 8%]...]
> 2. Support setting a threshold per StorageType and balancing according to it.
> 3. Add an option to prohibit nodes below the threshold from joining the
> Source set. This allows nodes with high utilization to transfer data as
> soon as possible, which is good for balance.
> 4. Add an option to leave a large group of DataNodes with similar
> utilization unchanged. For example, if 40% of the nodes in the cluster have
> utilization between 75% and 80%, these nodes should not join the Source
> set. This behavior must be explicitly requested by the user at runtime.
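
The classification in step 2 and proposed changes 2 and 3 can be sketched roughly as follows. This is a simplified model written for illustration, not the Balancer's actual code: the function names `classify` and `source_candidates`, the `only_over_threshold` flag, the node tuples, and the unweighted mean are all assumptions, and the real Balancer's average calculation may differ. The threshold values come from the [[DISK, 10%], [SSD, 8%]...] example above. Change 3 is read here as "only nodes above average + threshold may be sources".

```python
from collections import defaultdict

# Hypothetical per-StorageType thresholds, in percentage points, taken from
# the [[DISK, 10%], [SSD, 8%]...] example in the proposal.
THRESHOLDS = {"DISK": 10.0, "SSD": 8.0}

# Made-up sample data: (node name, StorageType, utilization in percent).
NODES = [
    ("dn1", "DISK", 95.0), ("dn2", "DISK", 85.0), ("dn3", "DISK", 60.0),
    ("dn4", "SSD", 60.0), ("dn5", "SSD", 50.0), ("dn6", "SSD", 40.0),
]


def classify(nodes, thresholds, default_threshold=10.0):
    """Split nodes into the four utilization sets, one StorageType at a time."""
    by_type = defaultdict(list)
    for name, storage_type, utilization in nodes:
        by_type[storage_type].append((name, utilization))

    sets = {"overUtilized": [], "aboveAvgUtilized": [],
            "belowAvgUtilized": [], "underUtilized": []}
    for storage_type, members in by_type.items():
        # Simplification (assumption): unweighted mean of node utilization.
        average = sum(u for _, u in members) / len(members)
        # Proposed change 2: look the threshold up per StorageType.
        threshold = thresholds.get(storage_type, default_threshold)
        for name, utilization in members:
            diff = utilization - average
            if diff > threshold:
                key = "overUtilized"
            elif diff > 0:
                key = "aboveAvgUtilized"
            elif diff >= -threshold:
                key = "belowAvgUtilized"
            else:
                key = "underUtilized"
            sets[key].append((name, storage_type))
    return sets


def source_candidates(sets, only_over_threshold=False):
    """Proposed change 3: optionally keep aboveAvgUtilized nodes out of the sources."""
    sources = list(sets["overUtilized"])
    if not only_over_threshold:
        sources += sets["aboveAvgUtilized"]
    return sources


if __name__ == "__main__":
    result = classify(NODES, THRESHOLDS)
    for set_name, members in result.items():
        print(set_name, members)
    print("sources (restricted):", source_candidates(result, only_over_threshold=True))
```

With the sample data, dn4 sits exactly 10 points above the SSD average, so it is overUtilized under the 8% SSD threshold but would only be aboveAvgUtilized under a uniform 10% threshold. That is the kind of difference a per-StorageType threshold is meant to express.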