[ 
https://issues.apache.org/jira/browse/HDFS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198864#comment-15198864
 ] 

John Zhuge commented on HDFS-9940:
----------------------------------

Thank you all for the great discussion!

[~yzhangal], I like the design because of its simplicity and ease of use for 
customers.

We should separate this jira from HDFS-7466. I would even suggest not 
implementing the DN config in this jira and leaving it to HDFS-7466 to decide 
how to proceed, because it can always add more logic to the DN reconfig method 
to override this behavior. I would also suggest naming the Balancer config 
{{dfs.balancer.max.concurrent.moves.per.datanode}} to emphasize that the limit 
is per-datanode.

Let me summarize the simplified design below:
* Balancer properties
** Rename {{dfs.datanode.balance.max.concurrent.moves}} to 
{{dfs.balancer.max.concurrent.moves.per.datanode}}
** Add {{dfs.balancer.max.bandwidthPerSec.per.datanode}}
* Documentation
** Advise customers not to set {{dfs.datanode.balance.max.concurrent.moves}} 
and {{dfs.datanode.balance.bandwidthPerSec}} manually on DNs
* NN
** Add API {{setBalancerConcurrentMoves}} and related changes
* Balancer startup code
** Call NN {{setBalancerBandwidth}} based on 
{{dfs.balancer.max.bandwidthPerSec.per.datanode}}
** Call NN {{setBalancerConcurrentMoves}} based on 
{{dfs.balancer.max.concurrent.moves.per.datanode}}
** Wait some time for change propagation to all DNs
* Make similar changes to Mover startup code
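
The Balancer startup steps above could be sketched roughly as follows. This is 
only an illustration under stated assumptions: {{NameNodeAdmin}} is a 
hypothetical stand-in for the NN client, {{setBalancerConcurrentMoves}} is the 
API proposed in this comment and does not exist yet, the configuration is 
modeled as a plain map instead of a Hadoop {{Configuration}}, and the 
propagation wait is a simple fixed sleep.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the NN calls named in the design summary. */
interface NameNodeAdmin {
    void setBalancerBandwidth(long bytesPerSec);
    void setBalancerConcurrentMoves(int moves); // proposed API, assumed signature
}

class BalancerStartupSketch {
    static final String MOVES_KEY =
        "dfs.balancer.max.concurrent.moves.per.datanode";
    static final String BANDWIDTH_KEY =
        "dfs.balancer.max.bandwidthPerSec.per.datanode";

    /** Pushes each configured limit to the NN; returns true if any was pushed. */
    static boolean pushLimits(Map<String, String> conf, NameNodeAdmin nn) {
        boolean pushed = false;
        String bandwidth = conf.get(BANDWIDTH_KEY);
        if (bandwidth != null) {
            nn.setBalancerBandwidth(Long.parseLong(bandwidth.trim()));
            pushed = true;
        }
        String moves = conf.get(MOVES_KEY);
        if (moves != null) {
            nn.setBalancerConcurrentMoves(Integer.parseInt(moves.trim()));
            pushed = true;
        }
        return pushed;
    }

    /** Startup hook: push limits, then wait for the change to reach the DNs. */
    static void startup(Map<String, String> conf, NameNodeAdmin nn,
                        long propagationWaitMs) {
        if (!pushLimits(conf, nn) || propagationWaitMs <= 0) {
            return; // nothing changed, so there is nothing to wait for
        }
        try {
            Thread.sleep(propagationWaitMs); // assumed fixed wait
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(BANDWIDTH_KEY, "10485760");
        conf.put(MOVES_KEY, "50");
        startup(conf, new NameNodeAdmin() {
            public void setBalancerBandwidth(long bytesPerSec) {
                System.out.println("setBalancerBandwidth " + bytesPerSec);
            }
            public void setBalancerConcurrentMoves(int moves) {
                System.out.println("setBalancerConcurrentMoves " + moves);
            }
        }, 0L);
    }
}
```

The Mover startup code could call the same helper with its own property names 
if it ends up using separate ones.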

Further improvements
* Do not rely on the NN to configure all DNs for rebalancing. For a cluster of 
1000 nodes, if only 100 nodes are involved in the rebalance, why waste 
resources configuring the other 900? The Balancer can do the job itself, and it 
only has to configure the DNs involved in moves. Of course, it has to handle DN 
restarts.
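
As a rough illustration of this idea, the Balancer could derive the set of 
involved DNs from its move plan and configure only those. Everything in the 
sketch below is hypothetical: the {{Move}} type, the {{DatanodeConfigurer}} 
callback, and the DN names are placeholders, and handling of DN restarts is 
left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class InvolvedDatanodesSketch {
    /** One planned block move between two DNs (hypothetical type). */
    static final class Move {
        final String source;
        final String target;
        Move(String source, String target) {
            this.source = source;
            this.target = target;
        }
    }

    /** Hypothetical callback that applies the Balancer limits to one DN. */
    interface DatanodeConfigurer {
        void configure(String datanode);
    }

    /** Collects every DN that appears as a source or target in the plan. */
    static Set<String> involvedDatanodes(List<Move> plan) {
        Set<String> datanodes = new LinkedHashSet<>();
        for (Move move : plan) {
            datanodes.add(move.source);
            datanodes.add(move.target);
        }
        return datanodes;
    }

    /** Configures each involved DN once; returns how many were configured. */
    static int configureInvolved(List<Move> plan, DatanodeConfigurer configurer) {
        Set<String> datanodes = involvedDatanodes(plan);
        for (String datanode : datanodes) {
            configurer.configure(datanode);
        }
        return datanodes.size();
    }

    public static void main(String[] args) {
        List<Move> plan = new ArrayList<>();
        plan.add(new Move("dn1", "dn2"));
        plan.add(new Move("dn1", "dn3"));
        int count = configureInvolved(plan,
            dn -> System.out.println("configure " + dn));
        System.out.println(count + " DNs configured");
    }
}
```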

> Rename dfs.balancer.max.concurrent.moves to avoid confusion
> -----------------------------------------------------------
>
>                 Key: HDFS-9940
>                 URL: https://issues.apache.org/jira/browse/HDFS-9940
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.6.0
>            Reporter: John Zhuge
>            Assignee: John Zhuge
>            Priority: Minor
>              Labels: supportability
>             Fix For: 2.8.0
>
>
> It is very confusing for both Balancer and Datanode to use the same property 
> {{dfs.datanode.balance.max.concurrent.moves}}. It is especially so for the 
> Balancer because the property has "datanode" in the name string. Many 
> customers forget to set the property for the Balancer.
> Change the Balancer to use a new property 
> {{dfs.balancer.max.concurrent.moves}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
