[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463168#comment-13463168
 ] 

Steve Hoffman commented on HDFS-1312:
-------------------------------------

bq. Yes, I think this should be fixed.
This was my original question really.  Since it hasn't made the cut in over 2 
years, I was wondering what it would take to either do something with this or 
close it as a "won't fix" with script/documentation support for the admins.

bq. No, I don't think this is as big of an issue as most people think.
Basically, I agree with you.  There are worse things that can go wrong.

bq. At 70-80% full, you start to run the risk that the NN is going to have 
trouble placing blocks, esp if . Also, if you are like most places and put the 
MR spill space on the same file system as HDFS, that 70-80% is more like 100%, 
especially if you don't clean up after MR. (Thus why I always put MR area on a 
separate file system...)
Agreed.  More capacity is being installed Friday.  Just don't want bad 
timing/luck to be a factor here -- and we do clean up after MR.
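
For anyone following along, keeping MR spill space off the HDFS disks is just a 
matter of pointing the scratch directories and the DataNode storage at different 
mount points.  A minimal sketch, assuming the hypothetical mount points 
/data/hdfs and /data/mapred:

{code:xml}
<!-- hdfs-site.xml: DataNode block storage on its own filesystem -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/hdfs/dn</value>
</property>

<!-- mapred-site.xml: MapReduce spill/scratch space on a separate filesystem -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/mapred/local</value>
</property>
{code}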

bq. As you scale, you care less about the health of individual nodes and more 
about total framework health.
Sorry, have to disagree here.  The total framework is made up of its parts.  
While I agree there is enough redundancy built in to handle most cases once 
your node count gets above a certain level, you are basically saying it doesn't 
have to work well in all cases because more $ can be thrown at it.

bq. 1PB isn't that big. At 12 drives per node, we're looking at ~50-60 nodes.
Our cluster is storage dense yes, so a loss of 1 node is noticeable.
                
> Re-balance disks within a Datanode
> ----------------------------------
>
>                 Key: HDFS-1312
>                 URL: https://issues.apache.org/jira/browse/HDFS-1312
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>            Reporter: Travis Crawford
>
> Filing this issue in response to "full disk woes" on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.
