[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode
[ https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480055#comment-13480055 ]

Steve Hoffman commented on HDFS-1312:
-------------------------------------

Given the general nature of HDFS and its many uses (HBase, M/R, etc.), as much as I'd like it to just work, it is clear it always depends on the use. Maybe one day we won't need a balancer script for disks (or for the cluster). I'm totally OK with having a machine-level balancer script. We use the HDFS balancer to fix inter-machine imbalances when they crop up (again, for a variety of reasons). It makes sense to have a manual script for intra-machine imbalances for people who DO have issues, and to make it part of the standard install (like the HDFS balancer).

Re-balance disks within a Datanode
----------------------------------

                Key: HDFS-1312
                URL: https://issues.apache.org/jira/browse/HDFS-1312
            Project: Hadoop HDFS
         Issue Type: New Feature
         Components: data-node
           Reporter: Travis Crawford

Filing this issue in response to "full disk woes" on hdfs-user.

Datanodes fill their storage directories unevenly, leading to situations where certain disks are full while others are significantly less used. Users at many different sites have experienced this issue, and HDFS administrators are taking steps like:
- Manually rebalancing blocks in storage directories
- Decommissioning nodes and later re-adding them

There's a tradeoff between making use of all available spindles and filling disks at roughly the same rate. Possible solutions include:
- Weighting less-used disks more heavily when placing new blocks on the datanode. In write-heavy environments this will still make use of all spindles, equalizing disk use over time.
- Rebalancing blocks locally. This would help equalize disk use as disks are added/replaced in older cluster nodes.

Datanodes should actively manage their local disks so operator intervention is not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
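The "weight less-used disks more heavily" idea from the issue description can be sketched as a free-space-weighted volume picker. This is a hypothetical standalone sketch, not the actual DataNode volume-choosing code; the class and method names are invented for illustration:

```java
import java.util.Random;

/** Sketch: pick a storage volume with probability proportional to its free space. */
public class WeightedVolumePicker {
    private final Random random;

    public WeightedVolumePicker(Random random) {
        this.random = random;
    }

    /** freeBytes[i] is the remaining capacity of volume i; returns the chosen index. */
    public int pick(long[] freeBytes) {
        long totalFree = 0;
        for (long f : freeBytes) totalFree += f;
        if (totalFree <= 0) throw new IllegalStateException("all volumes full");
        // Draw a point in [0, totalFree) and find which volume's interval it lands in.
        long point = (long) (random.nextDouble() * totalFree);
        for (int i = 0; i < freeBytes.length; i++) {
            if (point < freeBytes[i]) return i;
            point -= freeBytes[i];
        }
        return freeBytes.length - 1; // guards against floating-point rounding
    }
}
```

Under this policy an empty, freshly replaced disk receives proportionally more new blocks than its 80%-full neighbors, so utilization converges over time without moving any existing blocks.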
[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode
[ https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462916#comment-13462916 ]

Steve Hoffman commented on HDFS-1312:
-------------------------------------

Wow, I can't believe this is still lingering out there as a new feature request. I'd argue this is a bug -- and a big one. Here's why:
* You have N x 12x3TB machines in your cluster.
* 1 disk fails on a 12-drive machine. Let's say the disks were each 80% full.
* You install the replacement drive (0% full), but by the time you do this the under-replicated blocks have been fixed (on this and other nodes).
* The 0% full drive will fill at the same rate as the other disks. That machine's other 11 disks will fill to 100%, because block placement is at the node level and the node seems to use a round-robin algorithm across disks even when one has more space.

The only way we have found to move blocks internally (without taking the cluster down completely) is to decommission the node, let it empty, and then re-add it to the cluster so the balancer can take over and move blocks back onto it.

Hard drives fail. This isn't news to anybody. The larger (12-disk) nodes only make the problem worse in time to empty and fill again. Even if you had a 1U 4-disk machine it is still bad 'cause you lose 25% of your capacity on 1 disk failure, whereas the impact on a 12-disk machine is less than 9%. The remove/add of a complete node seems like a pretty poor option. Or am I alone in this? Can we please revive this JIRA?
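The arithmetic in the failure scenario above can be checked with a tiny standalone sketch (this is not DataNode code; round-robin placement across disks is assumed, as the comment describes):

```java
/** Sketch: what round-robin placement does to a node with one fresh replacement disk. */
public class RoundRobinFill {
    /**
     * Fill fraction of the fresh disk at the moment the previously-used disks
     * hit capacity, given equal per-disk write rates (round-robin). With 3 TB
     * disks at 80% full, each old disk has 0.6 TB left, so by the time they
     * overflow the fresh disk has also received only 0.6 TB (20%).
     */
    public static double freshDiskFillAtOverflow(double capacityTb, double usedTb) {
        double freeOnOldDisks = capacityTb - usedTb;
        return freeOnOldDisks / capacityTb;
    }

    /** Fraction of a node's raw capacity lost when one disk fails. */
    public static double capacityLossFraction(int disksPerNode) {
        return 1.0 / disksPerNode;
    }
}
```

This reproduces both numbers in the comment: a 4-disk node loses 25% of its capacity to a single disk failure, a 12-disk node less than 9% (1/12 ≈ 8.3%), yet under round-robin the 12-disk node's remaining disks still fill to 100% while the replacement sits at 20%.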
[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode
[ https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462960#comment-13462960 ]

Steve Hoffman commented on HDFS-1312:
-------------------------------------

{quote}
The other thing is that as you grow a grid, you care less and less about the balance on individual nodes. This issue is of primary importance to smaller installations, who are likely under-provisioned hardware-wise anyway.
{quote}

Our installation is about 1PB, so I think we can say we are past small. We typically run at 70-80% full, as we are not made of money, and at 90% the disk alarms start waking people out of bed. I would say we very much care about the balance of a single node. When that node fills, it'll take out the region server and the M/R jobs running on it, and generally anger people whose jobs have to be restarted. I wouldn't be so quick to discount this.

And when you have enough machines, you are replacing disks more and more frequently. So ANY manual process is $ wasted in people time: time to re-run jobs, time to take down the datanode and move blocks. Time = $.

To turn Hadoop into a more mature product, shouldn't we be striving for "it just works"?
[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode
[ https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463168#comment-13463168 ]

Steve Hoffman commented on HDFS-1312:
-------------------------------------

bq. Yes, I think this should be fixed.

This was my original question really. Since it hasn't made the cut in over 2 years, I was wondering what it would take to either do something with this, or whether it should be closed as a won't-fix with script/documentation support for the admins.

bq. No, I don't think this is as big of an issue as most people think.

Basically, I agree with you. There are worse things that can go wrong.

bq. At 70-80% full, you start to run the risk that the NN is going to have trouble placing blocks, esp if . Also, if you are like most places and put the MR spill space on the same file system as HDFS, that 70-80% is more like 100%, especially if you don't clean up after MR. (Thus why I always put MR area on a separate file system...)

Agreed. More getting installed Friday. Just don't want bad timing/luck to be a factor here -- and we do clean up after the MR.

bq. As you scale, you care less about the health of individual nodes and more about total framework health.

Sorry, have to disagree here. The total framework is made up of the parts. While I agree there is enough redundancy built in to handle most cases once your node count gets above a certain level, you are basically saying it doesn't have to work well in all cases because more $ can be thrown at it.

bq. 1PB isn't that big.

At 12 drives per node, we're looking at ~50-60 nodes. Our cluster is storage dense, yes, so the loss of 1 node is noticeable.
[jira] [Created] (HDFS-3647) Expose dfs.datanode.max.xcievers as metric
Steve Hoffman created HDFS-3647:
-----------------------------------

            Summary: Expose dfs.datanode.max.xcievers as metric
                Key: HDFS-3647
                URL: https://issues.apache.org/jira/browse/HDFS-3647
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: data-node, performance
   Affects Versions: 0.20.2
           Reporter: Steve Hoffman

Not sure if this is in a newer version of Hadoop, but in CDH3u3 it isn't there. There is a lot of mystery surrounding how large to set dfs.datanode.max.xcievers. Most people say to just up it to 4096, but given that exceeding it will cause an HBase RegionServer shutdown (see Lars' blog post here: http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html), it would be nice if we could expose the current count via the built-in metrics framework (most likely under dfs). That way we could watch it to see if we have it set too high or too low, whether it's time to bump it up, etc. Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
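The request above amounts to publishing a live counter alongside the configured ceiling. A minimal standalone sketch of the idea follows; this is not the actual DataNode code, the class and method names are invented for illustration, and a real fix would register such a value as a gauge with Hadoop's metrics framework:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: track the live transceiver thread count against a configured ceiling. */
public class XceiverCountGauge {
    private final AtomicInteger active = new AtomicInteger(0);
    private final int maxXceivers; // the dfs.datanode.max.xcievers ceiling

    public XceiverCountGauge(int maxXceivers) {
        this.maxXceivers = maxXceivers;
    }

    /** Called when a transceiver thread starts; fails fast past the ceiling. */
    public void onXceiverStart() {
        if (active.incrementAndGet() > maxXceivers) {
            active.decrementAndGet();
            throw new IllegalStateException("xceiver limit exceeded: " + maxXceivers);
        }
    }

    /** Called when a transceiver thread exits. */
    public void onXceiverStop() {
        active.decrementAndGet();
    }

    /** The value a metrics gauge would report, e.g. for graphing against the limit. */
    public int currentCount() {
        return active.get();
    }
}
```

Graphing currentCount() against maxXceivers over time is exactly the "too high, too low, time to bump it up" signal the report asks for.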
[jira] [Updated] (HDFS-3647) Expose current xcievers count as metric
[ https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Hoffman updated HDFS-3647:
--------------------------------

    Summary: Expose current xcievers count as metric  (was: Expose dfs.datanode.max.xcievers as metric)
[jira] [Commented] (HDFS-3647) Expose current xcievers count as metric
[ https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413075#comment-13413075 ]

Steve Hoffman commented on HDFS-3647:
-------------------------------------

https://issues.cloudera.org/browse/DISTRO-414 opened with Cloudera. Thx. I'll leave it to you guys if you want to use this to track a 1.X Apache backport.