[ 
https://issues.apache.org/jira/browse/HDFS-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071837#comment-13071837
 ] 

Eric Payne commented on HDFS-2171:
----------------------------------

Hey Todd,

Is it ever appropriate to merge a patch to a branch first and then merge it into 
trunk? My concern is that this patch will get stale while I try to resolve 
HDFS-2202, which is the patch for trunk.

I have added a patch to trunk for this feature in HDFS-2202, but there are a 
couple of issues with it:
1) There are some seemingly unrelated test failures in areas I didn't touch: 
hdfsCLI, Append, and HFlush. These DO NOT show up when I run test-patch in my 
build environment. I'm looking into those.
2) In HDFS-2106, Nickolas has refactored the FSNamesystem class (and others), 
so the trunk patch will also need to be redone.

I am working on these, and the patch for trunk will be updated in the next 
couple of days. Could the branch patch go into 205 first, with the trunk 
patch following a couple of days afterwards?

Thanks,
-Eric

> Changes to balancer bandwidth should not require datanode restart.
> ------------------------------------------------------------------
>
>                 Key: HDFS-2171
>                 URL: https://issues.apache.org/jira/browse/HDFS-2171
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer, data-node
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: HDFS-2171.patch
>
>
> Currently, in order to change the value of the balancer bandwidth 
> (dfs.datanode.balance.bandwidthPerSec), the datanode daemon must be restarted.
> The optimal value of the bandwidthPerSec parameter is not always (almost 
> never) known at the time of cluster startup, but only once a new node is 
> placed in the cluster and balancing is begun. If the balancing is taking too 
> long (bandwidthPerSec is too low) or the balancing is taking up too much 
> bandwidth (bandwidthPerSec is too high), the cluster must go into a 
> "maintenance window" where it is unusable while all of the datanodes are 
> bounced. In large clusters of thousands of nodes, this can be a real 
> maintenance problem because these "maintenance windows" can take a long time 
> and there may have to be several of them while the bandwidthPerSec is 
> experimented with and tuned.
> A possible solution to this problem would be to add a -bandwidth parameter to 
> the balancer tool. If bandwidth is supplied, pass the value to the datanodes 
> via the OP_REPLACE_BLOCK and OP_COPY_BLOCK DataTransferProtocol requests. 
> This would make it necessary, however, to change the DataTransferProtocol 
> version.
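
To illustrate the goal stated in the description (changing the balancer 
bandwidth without bouncing the datanode), here is a minimal sketch of a 
bandwidth limit that can be updated while transfers are in flight. The class 
and method names (AdjustableBandwidth, setBandwidth, delayMillis) are 
hypothetical and are not taken from the attached patch or from the HDFS code 
base. How the new value would actually reach the datanode is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch, not the HDFS implementation: a bandwidth limit
 * that can be replaced at runtime, so no daemon restart is needed.
 */
final class AdjustableBandwidth {
    // Current limit in bytes per second; atomic so that transfer threads
    // see an update made by another thread without extra locking.
    private final AtomicLong bytesPerSec;

    AdjustableBandwidth(long initialBytesPerSec) {
        this.bytesPerSec = new AtomicLong(check(initialBytesPerSec));
    }

    private static long check(long value) {
        if (value <= 0) {
            throw new IllegalArgumentException("bandwidth must be positive: " + value);
        }
        return value;
    }

    /** Replaces the limit; takes effect on the next delayMillis() call. */
    void setBandwidth(long newBytesPerSec) {
        bytesPerSec.set(check(newBytesPerSec));
    }

    long getBandwidth() {
        return bytesPerSec.get();
    }

    /**
     * Returns how long a sender should pause so that its average rate
     * stays at or below the current limit, given the bytes sent so far
     * and the time already spent sending them.
     */
    long delayMillis(long bytesSent, long elapsedMillis) {
        // Minimum time the transfer is allowed to take at the current limit.
        long minMillis = bytesSent * 1000 / bytesPerSec.get();
        return Math.max(0, minMillis - elapsedMillis);
    }
}
```

With a limit of 1000 bytes/sec, a sender that has pushed 2000 bytes in 1000 ms 
would be asked to pause for another 1000 ms. After setBandwidth(2000), the same 
sender would not need to pause at all.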

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
