[ https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465270#comment-13465270 ]
Eli Collins commented on HDFS-1312:
-----------------------------------

There are issues with just modifying the placement policy:

# It only solves the problem for new blocks. If you add a bunch of new disks, you want to rebalance the cluster immediately to get better read throughput. And if you have to implement block balancing anyway (e.g. on startup), then you don't need to modify the placement policy.
# The internal policy intentionally does not take disk usage into account, to optimize for performance (round-robining blocks across all spindles). As you point out, in some cases this won't be much of a hit, but on a 12-disk machine where half the disks are new, the impact will be noticeable.
# There are multiple placement policies now that they're pluggable; this approach requires every policy to solve this problem, rather than solving it once.

IMO a background process would actually be easier than modifying the placement policy. Just balancing on DN startup is simplest and would solve most people's issues, though it would require a rolling DN restart if you wanted to do it online.

> Re-balance disks within a Datanode
> ----------------------------------
>
>                 Key: HDFS-1312
>                 URL: https://issues.apache.org/jira/browse/HDFS-1312
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>            Reporter: Travis Crawford
>
> Filing this issue in response to "full disk woes" on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations
> where certain disks are full while others are significantly less used. Users
> at many different sites have experienced this issue, and HDFS administrators
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles and filling
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode.
>   In write-heavy environments this will still make use of all spindles,
>   equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are
>   added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
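Eli's second point and the issue's first proposed solution both concern how a DataNode picks a disk for each new block. The sketch below is a minimal illustration under a simplified model in which each volume is just a capacity and a used-byte count; the `Volume`, `RoundRobinChooser`, `FreeSpaceWeightedChooser`, `make_volumes` and `simulate` names are hypothetical and are not HDFS's actual (Java) placement code. It contrasts round-robin choice, which spreads writes over all spindles regardless of fullness, with a free-space-weighted choice, which favors emptier disks. The disk sizes, fill levels and 128 MiB block size are assumptions chosen to mirror the "12 disks, half of them new" example.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Volume:
    """Hypothetical stand-in for one DataNode storage directory (one spindle)."""
    name: str
    capacity_bytes: int
    used_bytes: int

    @property
    def free_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes


class RoundRobinChooser:
    """Cycle through the volumes without looking at how full they are,
    skipping only a volume that cannot fit the block."""

    def __init__(self) -> None:
        self._next = 0

    def choose(self, volumes: List[Volume], block_size: int) -> Volume:
        for _ in range(len(volumes)):
            vol = volumes[self._next % len(volumes)]
            self._next += 1
            if vol.free_bytes >= block_size:
                return vol
        raise RuntimeError("no volume has room for the block")


class FreeSpaceWeightedChooser:
    """Pick a volume at random with probability proportional to its free
    space, so emptier (e.g. newly added) disks receive more new blocks."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._rng = rng if rng is not None else random.Random()

    def choose(self, volumes: List[Volume], block_size: int) -> Volume:
        candidates = [v for v in volumes if v.free_bytes >= block_size]
        if not candidates:
            raise RuntimeError("no volume has room for the block")
        weights = [v.free_bytes for v in candidates]
        return self._rng.choices(candidates, weights=weights, k=1)[0]


def make_volumes() -> List[Volume]:
    """Twelve 1 TB disks (assumed): six old ones at 90% full, six new empty ones."""
    tb = 10**12
    old = [Volume(f"old{i}", tb, 9 * tb // 10) for i in range(6)]
    new = [Volume(f"new{i}", tb, 0) for i in range(6)]
    return old + new


def simulate(chooser, volumes: List[Volume], n_blocks: int, block_size: int) -> Dict[str, int]:
    """Write n_blocks new blocks and count how many land on each volume."""
    counts = {v.name: 0 for v in volumes}
    for _ in range(n_blocks):
        vol = chooser.choose(volumes, block_size)
        vol.used_bytes += block_size
        counts[vol.name] += 1
    return counts


if __name__ == "__main__":
    block = 128 * 2**20  # 128 MiB, an assumed block size for the simulation
    print("round-robin:", simulate(RoundRobinChooser(), make_volumes(), 1200, block))
    print("weighted:   ", simulate(FreeSpaceWeightedChooser(random.Random(42)), make_volumes(), 1200, block))
```

Under these assumptions the round-robin chooser puts exactly 100 blocks on every disk, so the old disks keep filling as fast as the new ones, while the weighted chooser sends the large majority of blocks to the six new disks. That concentration of writes on half the spindles is the performance cost raised in Eli's second point, and neither chooser moves any existing block, which is his first point.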