[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951561#comment-13951561 ]
Tsz Wo Nicholas Sze commented on HDFS-6010:
-------------------------------------------

The patch is generally good. Some comments:
- I think "-datanodes" may be a better name than "-servers". However, I actually suggest not adding it as a CLI parameter since, for a large cluster, it may not be easy to specify all the selected datanodes on the command line. How about adding a new conf property, say dfs.balancer.selectedDatanodes?
- The new class NodeStringValidator is unlikely to be used outside Balancer. How about moving it to the balancer package and renaming it to BalancerUtil?
- In initNodes(..), if target == null, it will throw an IllegalArgumentException. However, a balancer may run for a long time and some datanodes could be down. I think we should not throw exceptions. Perhaps printing a warning is good enough.
  * The new code could be moved to a static method (in BalancerUtil) so that it is easier to read.

I have not yet checked NodeStringValidator and the new tests in detail.

> Make balancer able to balance data among specified servers
> ----------------------------------------------------------
>
>                 Key: HDFS-6010
>                 URL: https://issues.apache.org/jira/browse/HDFS-6010
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer
>    Affects Versions: 2.3.0
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>              Labels: balancer
>         Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in
> some particular cases, we would need to balance data only among specified
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
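A minimal sketch of two of the review suggestions, assuming datanodes are identified by plain "host:port" strings. It parses a comma-separated value for the proposed dfs.balancer.selectedDatanodes property. It also logs a warning, rather than throwing an IllegalArgumentException, when a selected datanode is not among the live nodes. BalancerUtil is the reviewer's suggested class name, but the methods parseSelectedDatanodes and filterSelected are hypothetical and do not reflect the actual patch or any Hadoop API. Treating an empty selection as "balance all datanodes" is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

/**
 * Hypothetical helper illustrating the HDFS-6010 review suggestions.
 * Not part of Hadoop; method names are illustrative only.
 */
final class BalancerUtil {
  private static final Logger LOG = Logger.getLogger(BalancerUtil.class.getName());

  private BalancerUtil() {}

  /**
   * Parses a comma-separated list of datanodes, e.g. the value of the
   * proposed dfs.balancer.selectedDatanodes property. Blank entries are
   * skipped and surrounding whitespace is trimmed; input order is kept.
   */
  static Set<String> parseSelectedDatanodes(String value) {
    Set<String> result = new LinkedHashSet<>();
    if (value == null) {
      return result;
    }
    for (String token : value.split(",")) {
      String trimmed = token.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }

  /**
   * Returns the live datanodes that were selected. A selected datanode that
   * is not live only produces a warning, since nodes may go down during a
   * long balancer run. Assumption: an empty selection means "all nodes".
   */
  static List<String> filterSelected(List<String> liveNodes, Set<String> selected) {
    if (selected.isEmpty()) {
      return new ArrayList<>(liveNodes);
    }
    List<String> kept = new ArrayList<>();
    for (String node : liveNodes) {
      if (selected.contains(node)) {
        kept.add(node);
      }
    }
    Set<String> live = new HashSet<>(liveNodes);
    for (String node : selected) {
      if (!live.contains(node)) {
        LOG.warning("Selected datanode " + node + " is not live; skipping it");
      }
    }
    return kept;
  }
}
```

Keeping the logic in static methods, as the review suggests, lets initNodes(..) stay short and lets the selection rules be unit-tested without starting a cluster.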