[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-10-19 Thread Steve Hoffman (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480055#comment-13480055
 ] 

Steve Hoffman commented on HDFS-1312:
-

Given the general nature of HDFS and its many uses (HBase, M/R, etc.), as much as 
I'd like it to just work, it is clear the right behavior always depends on the 
use case.  Maybe one day we won't need a balancer script for disks (or for the 
cluster).

I'm totally OK with having a machine-level balancer script.  We use the HDFS 
balancer to fix inter-machine imbalances when they crop up (again, for a 
variety of reasons).  It makes sense to have a manual script for intra-machine 
imbalances for people who DO have issues, and to make it part of the standard 
install (like the HDFS balancer).
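
For what it's worth, here is roughly the shape such a manual intra-node script 
could take.  This is only a sketch with made-up names: it assumes the datanode 
is stopped, that each argument is a dfs.data.dir volume root, and that a block 
file (blk_*) plus its .meta companion can simply be moved to the same relative 
path on another volume.  The on-disk layout varies by HDFS version, so treat it 
as an illustration rather than a supported tool.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical intra-node balancer sketch.  Run ONLY while the datanode is
 * stopped.  Each argument is assumed to be a dfs.data.dir volume root whose
 * block files (blk_*) and companion .meta files can be moved to the same
 * relative path on another volume.
 */
public class IntraNodeBalancerSketch {

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: IntraNodeBalancerSketch <volume> <volume> ...");
            return;
        }
        List<Path> volumes = new ArrayList<>();
        for (String arg : args) volumes.add(Paths.get(arg));

        // Fullest volume = least usable space, emptiest = most usable space.
        Path fullest = volumes.stream()
                .min(Comparator.comparingLong(IntraNodeBalancerSketch::usableBytes)).get();
        Path emptiest = volumes.stream()
                .max(Comparator.comparingLong(IntraNodeBalancerSketch::usableBytes)).get();
        if (fullest.equals(emptiest)) return;

        List<Path> blocks;
        try (Stream<Path> files = Files.walk(fullest)) {
            blocks = files.filter(IntraNodeBalancerSketch::isBlockFile)
                          .collect(Collectors.toList());
        }
        for (Path blk : blocks) {
            // Stop once the two volumes are within about 1 GB of each other.
            if (usableBytes(emptiest) - usableBytes(fullest) < (1L << 30)) break;
            movePair(blk, fullest, emptiest);
        }
    }

    static long usableBytes(Path volume) {
        try {
            return Files.getFileStore(volume).getUsableSpace();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean isBlockFile(Path p) {
        String name = p.getFileName().toString();
        return Files.isRegularFile(p) && name.startsWith("blk_") && !name.endsWith(".meta");
    }

    // Move a block file and its .meta companion, preserving the relative path
    // under the volume root so the datanode still finds them on restart.
    static void movePair(Path blk, Path from, Path to) throws IOException {
        Path destDir = to.resolve(from.relativize(blk.getParent()));
        Files.createDirectories(destDir);
        String prefix = blk.getFileName().toString() + "_";
        try (Stream<Path> siblings = Files.list(blk.getParent())) {
            for (Path meta : siblings
                    .filter(p -> p.getFileName().toString().startsWith(prefix)
                              && p.getFileName().toString().endsWith(".meta"))
                    .collect(Collectors.toList())) {
                Files.move(meta, destDir.resolve(meta.getFileName()));
            }
        }
        Files.move(blk, destDir.resolve(blk.getFileName()));
        System.out.println("moved " + blk.getFileName() + " to " + destDir);
    }
}
{code}

Run against two or more volume roots, it moves block/meta pairs from the 
fullest volume to the emptiest until the two are within about 1 GB of each 
other.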

 Re-balance disks within a Datanode
 --

 Key: HDFS-1312
 URL: https://issues.apache.org/jira/browse/HDFS-1312
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node
Reporter: Travis Crawford

 Filing this issue in response to "full disk woes" on hdfs-user.
 Datanodes fill their storage directories unevenly, leading to situations 
 where certain disks are full while others are significantly less used. Users 
 at many different sites have experienced this issue, and HDFS administrators 
 are taking steps like:
 - Manually rebalancing blocks across storage directories
 - Decommissioning nodes and later re-adding them
 There's a tradeoff between making use of all available spindles and filling 
 disks at roughly the same rate. Possible solutions include:
 - Weighting less-used disks more heavily when placing new blocks on the 
 datanode. In write-heavy environments this will still make use of all 
 spindles, equalizing disk use over time.
 - Rebalancing blocks locally. This would help equalize disk use as disks are 
 added/replaced in older cluster nodes.
 Datanodes should actively manage their local disks so operator intervention 
 is not needed.
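
To make the first option above concrete, here is a minimal sketch of 
available-space-weighted volume selection: each volume with room for the block 
is chosen with probability proportional to its free space, so emptier disks 
receive more new blocks and catch up over time.  The interface and class names 
are made up for illustration; this is not the datanode's actual volume-choosing 
code.

{code:java}
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical illustration only: pick a datanode volume with probability
 * proportional to its remaining space, instead of strict round-robin.
 */
public class AvailableSpaceWeightedChooser {

    /** Minimal stand-in for a datanode volume; only what the sketch needs. */
    public interface Volume {
        long getAvailable();   // free bytes on this volume
    }

    /**
     * Returns a volume chosen with probability proportional to its free
     * space, so emptier disks receive more new blocks over time.
     */
    public Volume choose(List<? extends Volume> volumes, long blockSize) {
        long totalFree = 0;
        for (Volume v : volumes) {
            if (v.getAvailable() >= blockSize) totalFree += v.getAvailable();
        }
        if (totalFree == 0) {
            throw new IllegalStateException("no volume has room for " + blockSize + " bytes");
        }
        long pick = ThreadLocalRandom.current().nextLong(totalFree);  // [0, totalFree)
        for (Volume v : volumes) {
            if (v.getAvailable() < blockSize) continue;
            pick -= v.getAvailable();
            if (pick < 0) return v;
        }
        // Not reached: pick < totalFree, so it is always exhausted above.
        throw new IllegalStateException("unreachable");
    }
}
{code}

A strict round-robin chooser is the degenerate case where every volume gets 
equal weight regardless of free space, which is exactly what lets one new disk 
lag behind the rest.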

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-25 Thread Steve Hoffman (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462916#comment-13462916
 ] 

Steve Hoffman commented on HDFS-1312:
-

Wow, I can't believe this is still lingering out there as a new feature 
request.  I'd argue this is a bug -- and a big one.  Here's why:

* You have N machines in your cluster, each with 12 x 3 TB drives.
* 1 disk fails on a 12-drive machine.  Let's say each disk was 80% full.
* You install the replacement drive (0% full), but by the time you do this the 
under-replicated blocks have already been re-replicated (on this and other nodes).
* The 0% full drive will fill at the same rate as the other disks.  That 
machine's other 11 disks will fill to 100% because block placement is done at 
the node level and the datanode appears to round-robin across its disks even 
though one disk has far more free space (a toy simulation of this is below).
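
To put numbers on that last bullet, here is a toy simulation (not datanode 
code; it just assumes strict round-robin placement of 128 MB blocks onto 3 TB 
disks, 11 of them starting at 80% full and one freshly replaced):

{code:java}
/**
 * Toy simulation of strict round-robin block placement after a disk swap:
 * 11 disks start 80% full, the replacement disk starts empty.
 */
public class RoundRobinFillSim {
    public static void main(String[] args) {
        final long disk = 3_000_000_000_000L;   // 3 TB per disk
        final long block = 128L * 1024 * 1024;  // 128 MB blocks
        long[] used = new long[12];
        for (int i = 0; i < 11; i++) used[i] = (long) (disk * 0.8);
        used[11] = 0;                           // the freshly replaced drive

        outer:
        for (int next = 0; ; next = (next + 1) % 12) {
            used[next] += block;                // round-robin: one block per disk in turn
            // Stop as soon as any of the original 11 disks can take no more.
            for (int i = 0; i < 11; i++) {
                if (used[i] + block > disk) break outer;
            }
        }
        System.out.printf("old disks at %.1f%%, new disk at only %.1f%%%n",
                100.0 * used[0] / disk, 100.0 * used[11] / disk);
        // Prints roughly: old disks at 100.0%, new disk at only 20.0%
    }
}
{code}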

The only way we have found to move blocks internally (without taking the 
cluster down completely) is to decommission the node, let it empty out, and then 
re-add it to the cluster so the balancer can take over and move blocks back onto 
it.

Hard drives fail.  This isn't news to anybody.  Larger (12-disk) nodes only 
make the problem worse because of the time it takes to empty and refill them.  
Even a 1U 4-disk machine is still bad, because you lose 25% of that node's 
capacity (1 of 4 disks) on a single disk failure, whereas on a 12-disk machine 
the impact is less than 9% (1 of 12 is about 8.3%). 

The remove/add of a complete node seems like a pretty poor option.
Or am I alone in this?  Can we please revive this JIRA?



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-25 Thread Steve Hoffman (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462960#comment-13462960
 ] 

Steve Hoffman commented on HDFS-1312:
-

{quote}
The other thing is that as you grow a grid, you care less and less about the 
balance on individual nodes. This issue is of primary importance to smaller 
installations who likely are under-provisioned hardware-wise anyway.
{quote}
Our installation is about 1 PB, so I think we can say we are past small.  We 
typically run at 70-80% full as we are not made of money, and at 90% the disk 
alarms start waking people out of bed.
I would say we very much care about the balance of a single node.  When that 
node fills, it'll take out the region server and the M/R jobs running on it, 
and generally anger people whose jobs have to be restarted.

I wouldn't be so quick to discount this.  When you have enough machines, you 
are replacing disks more and more frequently, so ANY manual process is money 
wasted in people time: time to re-run jobs, time to take a datanode down and 
move blocks.  Time = $.  To make Hadoop a more mature product, shouldn't we be 
striving for 'it just works'?



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-25 Thread Steve Hoffman (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463168#comment-13463168
 ] 

Steve Hoffman commented on HDFS-1312:
-

bq. Yes, I think this should be fixed.
This was my original question, really.  Since it hasn't made the cut in over 2 
years, I was wondering what it would take to either do something with this, or 
whether it should be closed as won't-fix with script/documentation support for 
the admins.

bq. No, I don't think this is as big of an issue as most people think.
Basically, I agree with you.  There are worse things that can go wrong.

bq. At 70-80% full, you start to run the risk that the NN is going to have 
trouble placing blocks, esp if . Also, if you are like most places and put the 
MR spill space on the same file system as HDFS, that 70-80% is more like 100%, 
especially if you don't clean up after MR. (Thus why I always put MR area on a 
separate file system...)
Agreed.  More is getting installed Friday.  I just don't want bad timing/luck 
to be a factor here -- and we do clean up after the MR.

bq. As you scale, you care less about the health of individual nodes and more 
about total framework health.
Sorry, I have to disagree here.  The total framework is made up of its parts.  
While I agree there is enough redundancy built in to handle most cases once 
your node count gets above a certain level, you are basically saying it doesn't 
have to work well in all cases because more $ can be thrown at it.

bq. 1PB isn't that big. At 12 drives per node, we're looking at ~50-60 nodes.
Our cluster is storage-dense, yes, so the loss of 1 node is noticeable.



[jira] [Created] (HDFS-3647) Expose dfs.datanode.max.xcievers as metric

2012-07-12 Thread Steve Hoffman (JIRA)
Steve Hoffman created HDFS-3647:
---

 Summary: Expose dfs.datanode.max.xcievers as metric
 Key: HDFS-3647
 URL: https://issues.apache.org/jira/browse/HDFS-3647
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, performance
Affects Versions: 0.20.2
Reporter: Steve Hoffman


Not sure if this is in a newer version of Hadoop, but in CDH3u3 it isn't there.

There is a lot of mystery surrounding how large to set 
dfs.datanode.max.xcievers.  Most people say to just up it to 4096, but given 
that exceeding it will cause an HBase RegionServer shutdown (see Lars' blog 
post: http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html), it 
would be nice if we could expose the current count via the built-in metrics 
framework (most likely under dfs).  That way we could watch it to see if we 
have it set too high or too low, whether it's time to bump it up, etc.

Thoughts?
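
To make the ask concrete, here is a rough sketch of the kind of gauge I have 
in mind (the names are made up and this is not actual DataNode code; a real 
patch would register with the DataNode's existing metrics source rather than 
raw JMX): count transceiver threads as they start and finish, and publish the 
current value plus the configured max for context.

{code:java}
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch of an "active xceivers" gauge; names are made up. */
public class XceiverCountGaugeSketch {

    /** Management interface; the MXBean suffix is what makes JMX expose it. */
    public interface XceiverCountMXBean {
        int getXceiverCount();      // current active transceiver threads
        int getMaxXceiverCount();   // configured ceiling, for context
    }

    /** The gauge a (hypothetical) transceiver server would update. */
    public static class XceiverCountGauge implements XceiverCountMXBean {
        private final AtomicInteger active = new AtomicInteger();
        private final int configuredMax;

        public XceiverCountGauge(int configuredMax) throws Exception {
            this.configuredMax = configuredMax;
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(this, new ObjectName(
                    "Hadoop:service=DataNode,name=XceiverCountSketch"));
        }

        /** Call when a transceiver thread starts handling a request. */
        public void xceiverStarted()  { active.incrementAndGet(); }

        /** Call when a transceiver thread finishes or dies. */
        public void xceiverFinished() { active.decrementAndGet(); }

        @Override public int getXceiverCount()    { return active.get(); }
        @Override public int getMaxXceiverCount() { return configuredMax; }
    }
}
{code}

Anything that already scrapes the DataNode's JMX could then graph the current 
count against the configured maximum.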





[jira] [Updated] (HDFS-3647) Expose current xcievers count as metric

2012-07-12 Thread Steve Hoffman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Hoffman updated HDFS-3647:


Summary: Expose current xcievers count as metric  (was: Expose 
dfs.datanode.max.xcievers as metric)





[jira] [Commented] (HDFS-3647) Expose current xcievers count as metric

2012-07-12 Thread Steve Hoffman (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413075#comment-13413075
 ] 

Steve Hoffman commented on HDFS-3647:
-

Opened https://issues.cloudera.org/browse/DISTRO-414 with Cloudera.  Thx.

I'll leave it to you guys whether you want to use this to track a 1.x Apache 
backport.
