[ https://issues.apache.org/jira/browse/HDFS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198864#comment-15198864 ]
John Zhuge commented on HDFS-9940:
----------------------------------

Thank you all for the great discussion!

[~yzhangal], I like the design because of its simplicity and ease of use for customers.

We should separate this jira from HDFS-7466. I would even suggest not implementing the DN config for this jira and leaving HDFS-7466 to decide how to proceed, because it can always add more logic to the DN reconfig method to override this behavior.

I would also suggest naming the Balancer config {{dfs.balancer.max.concurrent.moves.per.datanode}} to emphasize that the limit is per-datanode.

Let me summarize the simplified design below:
* Balancer properties
** Rename {{dfs.datanode.balance.max.concurrent.moves}} to {{dfs.balancer.max.concurrent.moves.per.datanode}}
** Add {{dfs.balancer.max.bandwidthPerSec.per.datanode}}
* Documentation
** Advise customers not to set {{dfs.datanode.balance.max.concurrent.moves}} and {{dfs.datanode.balance.bandwidthPerSec}} manually on DNs
* NN
** Add API {{setBalancerConcurrentMoves}} and related changes
* Balancer startup code
** Call NN {{setBalancerBandwidth}} based on {{dfs.balancer.max.bandwidthPerSec.per.datanode}}
** Call NN {{setBalancerConcurrentMoves}} based on {{dfs.balancer.max.concurrent.moves.per.datanode}}
** Wait some time for the change to propagate to all DNs
* Make similar changes to the Mover startup code

Further improvements
* Do not rely on the NN to configure all DNs for rebalancing. For a cluster of 1000 nodes, if only 100 nodes are involved in the rebalance, why waste any resources configuring the other 900 nodes? The Balancer can do the job itself, and it only has to configure the DNs involved in moves. Of course, it then has to handle DN restarts.
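The Balancer startup steps summarized above could be sketched roughly as follows. This is only an illustration of the proposal, not the actual patch: {{NameNodeAdmin}} is a hypothetical stand-in for the NN client, {{setBalancerConcurrentMoves}} is the API proposed here and does not exist yet, a plain {{Map}} replaces the Hadoop {{Configuration}} object, and falling back to the old property name when the new one is unset is an assumption about how the rename would be handled.

```java
import java.util.HashMap;
import java.util.Map;

final class BalancerStartupSketch {
    // Property names as proposed in this comment; not confirmed for any release.
    static final String MOVES_KEY = "dfs.balancer.max.concurrent.moves.per.datanode";
    static final String OLD_MOVES_KEY = "dfs.datanode.balance.max.concurrent.moves";
    static final String BANDWIDTH_KEY = "dfs.balancer.max.bandwidthPerSec.per.datanode";

    /** Hypothetical stand-in for the NN calls named in the design. */
    interface NameNodeAdmin {
        void setBalancerBandwidth(long bytesPerSec);
        void setBalancerConcurrentMoves(int moves); // proposed API, does not exist yet
    }

    /** Prefer the new key, then the old key (assumed fallback), then the caller's default. */
    static int resolveConcurrentMoves(Map<String, String> conf, int defaultMoves) {
        String value = conf.get(MOVES_KEY);
        if (value == null) {
            value = conf.get(OLD_MOVES_KEY);
        }
        return value == null ? defaultMoves : Integer.parseInt(value.trim());
    }

    /** Push the Balancer-side settings through the NN, then wait for propagation. */
    static void configureDatanodes(Map<String, String> conf, NameNodeAdmin nn,
                                   int defaultMoves, long propagationWaitMs) {
        String bandwidth = conf.get(BANDWIDTH_KEY);
        if (bandwidth != null) {
            // Only override the DN bandwidth when the Balancer property is set.
            nn.setBalancerBandwidth(Long.parseLong(bandwidth.trim()));
        }
        nn.setBalancerConcurrentMoves(resolveConcurrentMoves(conf, defaultMoves));
        try {
            // Crude fixed wait; a real implementation would need a better signal.
            Thread.sleep(propagationWaitMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MOVES_KEY, "20");          // illustrative values only
        conf.put(BANDWIDTH_KEY, "10485760");
        NameNodeAdmin loggingStub = new NameNodeAdmin() {
            public void setBalancerBandwidth(long bytesPerSec) {
                System.out.println("setBalancerBandwidth(" + bytesPerSec + ")");
            }
            public void setBalancerConcurrentMoves(int moves) {
                System.out.println("setBalancerConcurrentMoves(" + moves + ")");
            }
        };
        configureDatanodes(conf, loggingStub, 5, 0L); // 5 is a placeholder default
    }
}
```

The Mover startup code could reuse the same {{configureDatanodes}} helper. The fixed sleep is the weakest part of the sketch, since the Balancer has no confirmation that every DN has picked up the new values.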
> Rename dfs.balancer.max.concurrent.moves to avoid confusion
> -----------------------------------------------------------
>
>                 Key: HDFS-9940
>                 URL: https://issues.apache.org/jira/browse/HDFS-9940
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>   Affects Versions: 2.6.0
>            Reporter: John Zhuge
>            Assignee: John Zhuge
>            Priority: Minor
>              Labels: supportability
>             Fix For: 2.8.0
>
>
> It is very confusing for both Balancer and Datanode to use the same property
> {{dfs.datanode.balance.max.concurrent.moves}}. It is especially so for the
> Balancer because the property has "datanode" in the name string. Many
> customers forget to set the property for the Balancer.
> Change the Balancer to use a new property
> {{dfs.balancer.max.concurrent.moves}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)