[ 
https://issues.apache.org/jira/browse/HDFS-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072468#comment-13072468
 ] 

Eric Payne commented on HDFS-2202:
----------------------------------

Hi Nicholas,

Thank you for reviewing this Jira. Your comments were clear, precise, and 
easily understood. I appreciate that.

> Hi Eric, sorry that the refactoring breaks your patch. Could you update it?
Yes. It has been updated.

> In TestBalancerBandwidth, you may call MiniDFSCluster.getFileSystem() instead 
> of creating a DFSClient.
Done.

> We should update ClientProtocol.versionID and DatanodeProtocol.versionID.
> I think the BalancerBandwidthCommand.version is not needed. We have to change 
> the DatanodeProtocol.versionID in this case.

I did this in the 0.23.0 patch. However, one of the requirements for the 
0.20.205.0 patch was to not modify the DatanodeProtocol.versionID (please see 
https://issues.apache.org/jira/browse/HDFS-2171?focusedCommentId=13068990&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13068990).
The reason is that the operations team does not want to force every cluster 
in a colo to upgrade to 0.20.205, which a change to DatanodeProtocol.versionID 
would require because of some cross-cluster use cases.

So in 0.20.205, I kept BalancerBandwidthCommand.version. In 0.23, 
DatanodeProtocol.versionID has to change anyway, so removing the command 
version and bumping the versionID makes sense there.
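A minimal Java sketch of the trade-off above, assuming a much-simplified model rather than Hadoop's actual RPC code. A handshake that demands identical protocol versionIDs means any versionID bump forces every peer to upgrade together. A version number carried inside one command lets a datanode decide, command by command, whether it can act on the payload. ProtocolCheck, BandwidthCommandSketch, DatanodeSketch, and all version numbers are hypothetical. This shows one way such a per-command version field can be used, not what the patch does.

```java
/**
 * Hypothetical stand-in for an RPC handshake: peers must report the exact
 * same protocol versionID or the connection is refused.
 */
final class ProtocolCheck {
    private ProtocolCheck() {}

    /** Returns true only when both sides use the same protocol versionID. */
    static boolean compatible(long clientVersionId, long serverVersionId) {
        return clientVersionId == serverVersionId;
    }
}

/** Hypothetical datanode command that carries its own payload format version. */
final class BandwidthCommandSketch {
    /** Payload format version (assumed numbering, starting at 1). */
    final int version;
    /** Requested balancer bandwidth in bytes per second. */
    final long bandwidth;

    BandwidthCommandSketch(int version, long bandwidth) {
        this.version = version;
        this.bandwidth = bandwidth;
    }
}

/** Hypothetical datanode that applies only command versions it understands. */
final class DatanodeSketch {
    /** Highest command payload version this (assumed) build can parse. */
    static final int SUPPORTED_COMMAND_VERSION = 1;

    private long balancerBandwidth;

    DatanodeSketch(long initialBandwidth) {
        this.balancerBandwidth = initialBandwidth;
    }

    /** Applies the command if its version is supported; otherwise leaves the setting unchanged. */
    boolean handle(BandwidthCommandSketch cmd) {
        if (cmd.version > SUPPORTED_COMMAND_VERSION) {
            return false;
        }
        balancerBandwidth = cmd.bandwidth;
        return true;
    }

    long getBalancerBandwidth() {
        return balancerBandwidth;
    }
}

class VersioningDemo {
    public static void main(String[] args) {
        // Hypothetical versionIDs: any bump makes old and new builds refuse each other.
        System.out.println(ProtocolCheck.compatible(1L, 2L)); // false

        DatanodeSketch dn = new DatanodeSketch(1_048_576L);
        // Same protocol versionID throughout; only the command payload version varies.
        System.out.println(dn.handle(new BandwidthCommandSketch(1, 10_485_760L))); // true
        System.out.println(dn.handle(new BandwidthCommandSketch(2, 5L)));          // false
        System.out.println(dn.getBalancerBandwidth());                            // 10485760
    }
}
```

Here an older datanode can stay on the cluster and simply skip command formats it does not know. A protocol-level versionID check has no such middle ground.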

> You may use for-each statement for the following (... foreach example code 
> here...)
Done

> The initial capacity does not really matter. How about removing it? 
Done

> Please add getter/setter and do not use public field 
> DatanodeDescriptor.bandwidth.
Done

> Please add javadoc (or change comments to javadoc) to all new public 
> classes/methods/fields.
Done
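The getter/setter and javadoc requests above come down to encapsulating the per-datanode bandwidth value. Below is a minimal sketch assuming a simplified class. The following are all illustrative assumptions, not necessarily what the patch contains: the name DatanodeDescriptorSketch, the accessor names getBalancerBandwidth/setBalancerBandwidth, the "0 means nothing pending" convention, and the negative-value check.

```java
/**
 * Hypothetical, simplified stand-in for DatanodeDescriptor, showing only the
 * balancer-bandwidth state kept for one datanode.
 */
class DatanodeDescriptorSketch {
    /** Pending balancer bandwidth in bytes per second; 0 means none pending (assumed convention). */
    private long bandwidth;

    /**
     * Returns the balancer bandwidth queued for this datanode.
     *
     * @return bandwidth in bytes per second, or 0 if none is pending
     */
    public long getBalancerBandwidth() {
        return bandwidth;
    }

    /**
     * Queues a new balancer bandwidth for this datanode.
     *
     * @param bandwidth bytes per second; must not be negative
     * @throws IllegalArgumentException if {@code bandwidth} is negative
     */
    public void setBalancerBandwidth(long bandwidth) {
        if (bandwidth < 0) {
            throw new IllegalArgumentException("bandwidth must be >= 0: " + bandwidth);
        }
        this.bandwidth = bandwidth;
    }
}
```

With the field private, checks like the one in the setter can be added or changed later without touching any callers.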

> Changes to balancer bandwidth should not require datanode restart.
> ------------------------------------------------------------------
>
>                 Key: HDFS-2202
>                 URL: https://issues.apache.org/jira/browse/HDFS-2202
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer, data-node
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: HDFS-2171.patch, HDFS-2202.0.20.205.0.v1.patch, 
> HDFS-2202.0.23.0.v1.patch, HDFS-2202.patch
>
>
> Currently in order to change the value of the balancer bandwidth 
> (dfs.datanode.balance.bandwidthPerSec), the datanode daemon must be restarted.
> The optimal value of the bandwidthPerSec parameter is rarely known at the 
> time of cluster startup; it becomes clear only once a new node is placed in 
> the cluster and balancing has begun. If the balancing is taking too long 
> (bandwidthPerSec is too low) or the balancing is taking up too much 
> bandwidth (bandwidthPerSec is too high), the cluster must go into a 
> "maintenance window" where it is unusable while all of the datanodes are 
> bounced. In large clusters of thousands of nodes, this can be a real 
> maintenance problem because these "maintenance windows" can take a long time 
> and there may have to be several of them while the bandwidthPerSec is 
> experimented with and tuned.
> A possible solution to this problem would be to add a -bandwidth parameter to 
> the balancer tool. If bandwidth is supplied, pass the value to the datanodes 
> via the OP_REPLACE_BLOCK and OP_COPY_BLOCK DataTransferProtocol requests. 
> This would make it necessary, however, to change the DataTransferProtocol 
> version.
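The restart requirement described in the quoted issue arises because the transfer rate is fixed when the datanode starts. Below is a minimal sketch of a throttler whose rate can be changed while transfers are in progress. It assumes a simple fixed-period byte budget, and the class and method names are hypothetical, not Hadoop's own throttler. A datanode receiving a new value (by whatever channel the patch uses) would call setBandwidth, and callers blocked in throttle pick up the new rate at the next period boundary.

```java
/**
 * Hypothetical throttler whose rate can be changed while it is in use.
 * Time is split into fixed periods; each period adds
 * bandwidthPerSec * periodMs / 1000 bytes of budget, and a caller that
 * overdraws the budget waits until enough periods have passed.
 */
final class AdjustableThrottler {
    private final long periodMs;
    private long bandwidthPerSec;
    private long curPeriodStartMs;
    private long curReserve;

    AdjustableThrottler(long periodMs, long bandwidthPerSec) {
        if (periodMs <= 0) {
            throw new IllegalArgumentException("period must be > 0: " + periodMs);
        }
        this.periodMs = periodMs;
        setBandwidth(bandwidthPerSec);
        this.curPeriodStartMs = nowMs();
        this.curReserve = bytesPerPeriod();
    }

    /** Monotonic clock in milliseconds. */
    private static long nowMs() {
        return System.nanoTime() / 1_000_000;
    }

    /** Budget added per period at the current rate (at least one byte). */
    private long bytesPerPeriod() {
        return Math.max(1, bandwidthPerSec * periodMs / 1000);
    }

    /** Changes the rate in place; no restart is needed. */
    synchronized void setBandwidth(long bandwidthPerSec) {
        if (bandwidthPerSec <= 0) {
            throw new IllegalArgumentException("bandwidth must be > 0: " + bandwidthPerSec);
        }
        this.bandwidthPerSec = bandwidthPerSec;
    }

    synchronized long getBandwidth() {
        return bandwidthPerSec;
    }

    /** Charges numBytes against the budget, blocking while the budget is overdrawn. */
    synchronized void throttle(long numBytes) throws InterruptedException {
        curReserve -= numBytes;
        while (curReserve <= 0) {
            long now = nowMs();
            long periodEnd = curPeriodStartMs + periodMs;
            if (now < periodEnd) {
                wait(periodEnd - now); // releases the lock, so setBandwidth can run meanwhile
            } else {
                curPeriodStartMs = now;
                curReserve += bytesPerPeriod(); // uses the current, possibly updated, rate
            }
        }
    }
}
```

Because the budget is recomputed from the current rate at every period boundary, a new value takes effect within one period. Nothing about the running transfer has to be torn down.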

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
