[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462960#comment-13462960
 ] 

Steve Hoffman commented on HDFS-1312:
-------------------------------------

{quote}
The other thing is that as you grow a grid, you care less and less about the 
balance on individual nodes. This issue is of primary importance to smaller 
installations who likely are under-provisioned hardware-wise anyway.
{quote}
Our installation is about 1PB, so I think we can say we are past "small".  We 
typically run at 70-80% full, as we are not made of money.  And at 90% the disk 
alarms start getting people out of bed.
I would say we very much care about the balance of a single node.  When that 
node fills, it'll take out the region server and the M/R jobs running on it, 
and generally anger people whose jobs have to be restarted.

I wouldn't be so quick to discount this.  And when you have enough machines, 
you are replacing disks more and more frequently.  So ANY manual process is $ 
wasted in people time: time to re-run jobs, time to take down a datanode and 
move blocks.  Time = $.  To turn Hadoop into a more mature product, shouldn't 
we be striving for "it just works"?
                
> Re-balance disks within a Datanode
> ----------------------------------
>
>                 Key: HDFS-1312
>                 URL: https://issues.apache.org/jira/browse/HDFS-1312
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>            Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes and later re-adding them
> There's a tradeoff between making use of all available spindles and filling 
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks more heavily when placing new blocks on the 
> datanode. In write-heavy environments this will still make use of all 
> spindles, equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.
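
As a rough illustration of the two possible solutions listed in the description above, here is a minimal Python sketch. It is not taken from any HDFS patch: `Volume`, `choose_volume`, and `plan_local_moves` are hypothetical names, and both the free-space weighting and the greedy move rule are assumptions made for the example, not the datanode's actual behavior.

```python
import random
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical stand-in for one datanode storage directory (one disk)."""
    name: str
    capacity: int  # bytes, assumed > 0
    used: int      # bytes

    @property
    def free(self) -> int:
        return self.capacity - self.used

    @property
    def ratio(self) -> float:
        return self.used / self.capacity


def choose_volume(volumes, block_size, rng=random):
    """Pick a volume for a new block with probability proportional to free space.

    Every volume with room for the block stays eligible, so all spindles keep
    receiving writes, but emptier disks are chosen more often.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    candidates = [v for v in volumes if v.free >= block_size]
    if not candidates:
        raise RuntimeError("no volume has room for the block")
    weights = [v.free for v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def plan_local_moves(volumes, block_size):
    """Greedily plan single-block moves from the fullest to the emptiest volume.

    Works on copies, so the input is not modified. Stops as soon as one more
    move would leave the source less full (by used/capacity) than the target.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    vols = [Volume(v.name, v.capacity, v.used) for v in volumes]
    moves = []
    while vols:
        src = max(vols, key=lambda v: v.ratio)
        dst = min(vols, key=lambda v: v.ratio)
        # Integer form of (src.used - b) / src.capacity < (dst.used + b) / dst.capacity
        if (src.used - block_size) * dst.capacity < (dst.used + block_size) * src.capacity:
            break
        src.used -= block_size
        dst.used += block_size
        moves.append((src.name, dst.name))
    return moves


if __name__ == "__main__":
    # A nearly full old disk next to a freshly replaced one.
    disks = [Volume("old", 100, 90), Volume("new", 100, 10)]
    print(choose_volume(disks, 10).name)
    print(plan_local_moves(disks, 10))
```

The weighted choice only evens things out as new writes arrive, which is why the description pairs it with local rebalancing for disks that are added or replaced in an older node. A real implementation would also need to cope with concurrent writers, reserved space, and the I/O cost of the moves, none of which this sketch models.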

