[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-07-18 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382995#comment-15382995
 ] 

Andrew Wang commented on HDFS-1312:
---

Thanks Arpit, I've added similar text over at HADOOP-13383 for the website 
notes.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
>Assignee: Anu Engineer
> Fix For: 3.0.0-alpha1
>
> Attachments: Architecture_and_test_update.pdf, 
> Architecture_and_testplan.pdf, HDFS-1312.001.patch, HDFS-1312.002.patch, 
> HDFS-1312.003.patch, HDFS-1312.004.patch, HDFS-1312.005.patch, 
> HDFS-1312.006.patch, HDFS-1312.007.patch, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to "full disk woes" on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles and filling 
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks more heavily when placing new blocks on the 
> datanode (a sketch follows below). In write-heavy environments this will 
> still make use of all spindles, equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disks so operator intervention 
> is not needed.
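
A minimal sketch of the weighted-placement idea above, assuming a hypothetical 
Volume type standing in for a DataNode storage directory (this is an 
illustration only, not the volume-choosing policy HDFS-1804 ultimately added):

{noformat}
import java.util.List;
import java.util.Random;

/**
 * Illustration only: choose a target volume with probability proportional to
 * its free space, so emptier disks receive new blocks more often and usage
 * converges over time. Not Hadoop's actual volume-choosing policy.
 */
class FreeSpaceWeightedChooser {

  /** Hypothetical stand-in for one DataNode storage directory. */
  static final class Volume {
    final String path;
    final long freeBytes;
    Volume(String path, long freeBytes) { this.path = path; this.freeBytes = freeBytes; }
  }

  private final Random random = new Random();

  /** Assumes a non-empty volume list. */
  Volume choose(List<Volume> volumes) {
    long totalFree = 0;
    for (Volume v : volumes) {
      totalFree += v.freeBytes;
    }
    // Draw a point in [0, totalFree) and walk the volumes until we pass it,
    // so each volume is picked with probability freeBytes / totalFree.
    long pick = (long) (random.nextDouble() * totalFree);
    for (Volume v : volumes) {
      if (pick < v.freeBytes) {
        return v;
      }
      pick -= v.freeBytes;
    }
    return volumes.get(volumes.size() - 1);  // guard against rounding at the upper edge
  }
}
{noformat}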






[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-07-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382793#comment-15382793
 ] 

Arpit Agarwal commented on HDFS-1312:
-

Done.




[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-07-18 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382756#comment-15382756
 ] 

Andrew Wang commented on HDFS-1312:
---

Could someone (Arpit? Anu? Eddy?) add release notes for this feature? Thanks!




[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-23 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347829#comment-15347829
 ] 

Arpit Agarwal commented on HDFS-1312:
-

The checkstyle failures were 'hides a field' warnings and one long method that 
was not introduced by this patch.

I've merged the HDFS-1312 feature branch to trunk. Thanks for the code 
contributions, [~anu], [~xiaobingo], [~eddyxu] and [~linyiqun], and thanks to 
everyone else who contributed ideas and feedback on this long-running Jira. :) 
Users frequently request this feature and it felt good to commit it.

Anu or I will resolve this Jira shortly and move out the remaining sub-tasks to 
a follow-up Jira.




[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347802#comment-15347802
 ] 

Hudson commented on HDFS-1312:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10014 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10014/])
HDFS-8821. Stop tracking CHANGES.txt in the HDFS-1312 feature branch. (arp: rev 
4b93ddae07ba4332f40f896542ee2c6d7bf899ed)
* hadoop-hdfs-project/hadoop-hdfs/pom.xml
* hadoop-hdfs-project/hadoop-hdfs/HDFS-1312_CHANGES.txt
Fix a build break in HDFS-1312 (arp: rev 
9be9703716d2787cd6ee0ebbbe44a18b1f039018)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/DiskBalancerException.java





[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347621#comment-15347621
 ] 

Hadoop QA commented on HDFS-1312:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m  
0s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 45s{color} | {color:orange} hadoop-hdfs-project: The patch generated 19 new 
+ 1000 unchanged - 2 fixed = 1019 total (was 1002) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
12s{color} | {color:green} The patch generated 0 new + 75 unchanged - 1 fixed = 
75 total (was 76) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
54s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 29s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:85209cc |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12812975/HDFS-1312.007.patch |
| JIRA Issue | HDFS-1312 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  xml  shellcheck  shelld

[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347444#comment-15347444
 ] 

Hadoop QA commented on HDFS-1312:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m  
1s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
42s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 50s{color} | {color:orange} hadoop-hdfs-project: The patch generated 20 new 
+ 1000 unchanged - 2 fixed = 1020 total (was 1002) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
14s{color} | {color:green} The patch generated 0 new + 75 unchanged - 1 fixed = 
75 total (was 76) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
4s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m  0s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}107m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog |
|   | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:85209cc |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12812949/HDFS-1312.006.patch |
| JIRA Issue | HDFS-1312 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  xml  shellcheck  shelldocs  |
| uname | Linu

[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347140#comment-15347140
 ] 

Hadoop QA commented on HDFS-1312:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m  
1s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m  
0s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 49s{color} | {color:orange} hadoop-hdfs-project: The patch generated 19 new 
+ 1000 unchanged - 2 fixed = 1019 total (was 1002) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
13s{color} | {color:green} The patch generated 0 new + 75 unchanged - 1 fixed = 
75 total (was 76) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
57s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 18s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.hdfs.server.blockmanagement.TestBlockManager |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:85209cc |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12812894/HDFS-131

[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344753#comment-15344753
 ] 

Hadoop QA commented on HDFS-1312:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m  
0s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 43s{color} | {color:orange} hadoop-hdfs-project: The patch generated 19 new 
+ 1000 unchanged - 2 fixed = 1019 total (was 1002) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
14s{color} | {color:green} The patch generated 0 new + 75 unchanged - 1 fixed = 
75 total (was 76) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
59s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 17s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 90m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.server.namenode.TestCacheDirectives |
|   | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:e2f6409 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12812506/HDFS-1312.003.patch |
| JIRA Issue 

[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340177#comment-15340177
 ] 

Hadoop QA commented on HDFS-1312:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue} 0m 1s 
{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 13s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 34s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 47s 
{color} | {color:red} hadoop-hdfs-project: The patch generated 18 new + 1001 
unchanged - 2 fixed = 1019 total (was 1003) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 
15s {color} | {color:green} The patch generated 0 new + 75 unchanged - 1 fixed 
= 75 total (was 76) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 1s 
{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 42s {color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
29s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m 40s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:e2f6409 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12811864/HDFS-1312.002.patch |
| JIRA Issue | HDFS-1312 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  xml  shellcheck  shelldocs  |
| uname | Linux 2b8e9ba71dec 3.13.

[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-17 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337024#comment-15337024
 ] 

Anu Engineer commented on HDFS-1312:


[~aw] Thanks, will do.




[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-17 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337016#comment-15337016
 ] 

Allen Wittenauer commented on HDFS-1312:


bq. White Space warning is due to doc changes.

You should still remove them.




[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-17 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336789#comment-15336789
 ] 

Anu Engineer commented on HDFS-1312:


* Checkstyle issues are mostly 'variable hides a field'.
* The white space warning is due to doc changes.
* Test failures are not related to this patch.




[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-06-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336751#comment-15336751
 ] 

Hadoop QA commented on HDFS-1312:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue} 0m 1s 
{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 44s 
{color} | {color:red} hadoop-hdfs-project: The patch generated 19 new + 1000 
unchanged - 2 fixed = 1019 total (was 1002) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 
13s {color} | {color:green} The patch generated 0 new + 75 unchanged - 1 fixed 
= 75 total (was 76) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 10 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s 
{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 24s {color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 92m 51s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.tools.TestHdfsConfigFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:e2f6409 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12810943/HDFS-1312.001.patch |
| JIRA Issue | HDFS-1312 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  xml  shellcheck  shelldocs  |
| uname | Linux cf6241f0ffa9 3.13.0-36-lowlaten

[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-01-15 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102381#comment-15102381
 ] 

Anu Engineer commented on HDFS-1312:


*Notes from the call on Jan 14th, 2016*

Attendees: Andrew Wang, Lei Xu, Colin McCabe, Chris Trezzo, Ming Ma, Arpit 
Agarwal, Jitendra Pandey, Jing Zhao, Mingliang Liu, Xiaobing Zhou, Anu 
Engineer, and others who dialed in (I could only see phone numbers, not names; 
my apologies to anyone I am missing).

We discussed the goals of HDFS-1312. Andrew Wang mentioned that HDFS-1804 is 
used by many customers, is safe, and has been used in production for a while. 
Jitendra pointed out that we still have many customers who are not using 
HDFS-1804, so he suggested that we focus the discussion on HDFS-1312. We 
explored the pros and cons of having the planner completely inside the 
datanode, and various other user scenarios. As a team we wanted to make sure 
that all major scenarios are identified and covered in this review.

Ming Ma raised an interesting question, which we decided to address: he wanted 
to find out whether running the disk balancer has any quantifiable performance 
impact. Anu mentioned that since we have bandwidth control, admins should be 
able to control it. However, any disk I/O has a cost, and we decided to do 
some performance measurement of the disk balancer.

Andrew Wang raised the question of performance counters and how external tools 
like Cloudera Manager or Ambari would use the disk balancer. He also explored 
how we will be able to integrate this tool with other management tools. We 
agreed that we will have a set of performance counters exposed via datanode 
JMX. We also discussed the design trade-offs of doing disk balancing inside 
the datanode vs. outside. We reviewed lots of administrative scenarios and 
concluded that this tool would be able to address them. We also concluded that 
the tool does not do any cluster-wide planning and all data movement is 
confined to the datanode.

Colin McCabe brought up a set of interesting questions. He made us think 
through the scenario of data changing on the datanode while the disk balancer 
is operating, the impact of future disks with shingled magnetic recording, and 
large disk sizes. He was wondering how long it would take to balance a 
datanode filled with 6 TB or even 20 TB drives. The conclusion was that if you 
had large slow disks and lots of data in a node, it would take proportionally 
more time. For the question of data changing on the datanode, the disk 
balancer would support a tolerance value, or good-enough value, for balancing. 
That is, an administrator can specify that getting within 10% of the expected 
data distribution is good enough. We also discussed a scenario called "Hot 
Remove": just like hot swap, small cluster owners might find it useful to move 
all data off a hard disk before removing it, say to upgrade to a larger size. 
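
As a rough illustration of that tolerance idea (the names and numbers below 
are made up, not the actual DiskBalancer planner API), a node could be treated 
as balanced once every volume's utilization sits within the threshold of the 
node-wide average:

{noformat}
/**
 * Illustration only: a node is "balanced enough" when each volume's used
 * ratio is within tolerancePercent of the node-wide ideal ratio.
 */
final class ToleranceCheck {
  static boolean isBalanced(long[] usedBytes, long[] capacityBytes, double tolerancePercent) {
    long totalUsed = 0;
    long totalCapacity = 0;
    for (int i = 0; i < usedBytes.length; i++) {
      totalUsed += usedBytes[i];
      totalCapacity += capacityBytes[i];
    }
    double idealRatio = (double) totalUsed / totalCapacity;
    for (int i = 0; i < usedBytes.length; i++) {
      double volumeRatio = (double) usedBytes[i] / capacityBytes[i];
      // A volume further than tolerancePercent from the ideal ratio means
      // the node still needs balancing.
      if (Math.abs(idealRatio - volumeRatio) * 100.0 > tolerancePercent) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long tb = 1L << 40;
    long[] used = {5 * tb, 1 * tb, 4 * tb};        // uneven usage across three disks
    long[] capacity = {6 * tb, 6 * tb, 6 * tb};
    System.out.println(isBalanced(used, capacity, 10.0));  // false: needs balancing
  }
}
{noformat}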

Ming Ma pointed out that for them it is easier and simpler to decommission a 
node: if you have a large number of nodes, relying on the network is more 
efficient than micro-managing a datanode. We agreed with that, but for small 
cluster owners (say fewer than 5 or 10 nodes) it might make sense to support 
the ability to move data off a disk. Anu pointed out that the disk balancer 
design does accommodate that capability even though it is not the primary goal 
of the tool.

Ming Ma also brought up how Twitter runs the balancer tool today; it is always 
running against Twitter's clusters. We discussed whether making that balancer 
part of the namenode makes sense, but concluded that it was out of scope for 
HDFS-1312. Andrew mentioned that it is the right thing to do in the long run. 
We also discussed whether the disk balancer should trigger automatically 
instead of being an administrator-driven task, but we were worried that it 
would trigger and incur I/O while higher-priority compute jobs were running in 
the cluster, so we decided we are better off letting an admin decide when it 
is a good time to run the disk balancer.

At the end of the review Andrew asked whether we can finish this work by the 
end of next month and offered help to make sure that this feature is done 
sooner.

*Action Items:*  
* Analyze performance impact of disk balancer.
* Add a set of performance counters exposed via datanode JMX.

Please feel free to comment / correct these notes if I have missed anything. 
Thank you all for calling in and for having such a great and productive 
discussion about HDFS-1312.
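
For reference, the admin-driven workflow and the JMX counters discussed above 
map onto a command-line flow roughly like the one below. The syntax is from 
the disk balancer documentation as I recall it; host names, the HTTP port and 
the plan-file path are placeholders and should be checked against the released 
docs:

{noformat}
# Disk balancer must be enabled on the DataNodes (hdfs-site.xml):
#   dfs.disk.balancer.enabled = true

# 1. Compute a move plan for one DataNode:
hdfs diskbalancer -plan datanode1.example.com

# 2. Execute the generated plan on that DataNode:
hdfs diskbalancer -execute /path/to/datanode1.example.com.plan.json

# 3. Check progress while the moves run:
hdfs diskbalancer -query datanode1.example.com

# Performance counters would be exposed through the DataNode JMX servlet, e.g.:
curl 'http://datanode1.example.com:9864/jmx?qry=Hadoop:service=DataNode,name=*'
{noformat}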







[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-01-13 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096827#comment-15096827
 ] 

Chris Trezzo commented on HDFS-1312:


I will dial into the call as well. Thanks for posting.



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-01-11 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092796#comment-15092796
 ] 

Anu Engineer commented on HDFS-1312:


Hi [~andrew.wang],

As discussed off-line, let us meet on 14th of Jan, 2016 @ 4:00 - 5:00 PM PST.  
Here is the meeting info. 
I look forward to chatting with other Apache members who might be interested in 
this topic.

Anu Engineer is inviting you to a scheduled Zoom meeting. 
{noformat}
Topic: HDFS-1312 discussion
Time: Jan 14, 2016 4:00 PM (GMT-8:00) Pacific Time (US and Canada) 

Join from PC, Mac, Linux, iOS or Android: 
https://hortonworks.zoom.us/j/267578285
 
Or join by phone:

+1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
+1 855 880 1246 (US Toll Free)
+1 888 974 9888 (US Toll Free)
Meeting ID: 267 578 285 
International numbers available: 
https://hortonworks.zoom.us/zoomconference?m=ZlHRHTGmVEKXzM_RaCyzcSnjlk_z3ovm 
{noformat}



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-01-08 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090085#comment-15090085
 ] 

Anu Engineer commented on HDFS-1312:


Hi [~andrew.wang],

Thanks for your comments. Here are my thoughts on these issues.

bq. I don't follow this line of reasoning; don't concerns about using a new 
feature apply to a hypothetical HDFS-1312 implementation too?

I think it is related to the risk. Let us look at the worst-case scenarios
possible with HDFS-1804 and HDFS-1312. HDFS-1804 is a cluster-wide change: it is
always on, and every write goes through it. HDFS-1804 can therefore have a
cluster-wide impact, including an impact on the various workloads running in the
cluster.

With HDFS-1312, however, the worst case is that we take a node off-line, since it
is an external tool that operates off-line on a single node. Another important
difference is that it is not always on; it runs and goes away. So the amount of
risk to the cluster, especially from an administrator's point of view, is
different with these two approaches.

bq. Why do we lose this? Can't the DN dump this somewhere?

We can, but then we need to add RPCs to the datanode to pull out that data and
display the change on the node, whereas in the current approach it is something
we write to the local disk and later diff against the sources. No datanode
operation is needed.

bq. This is an interesting point I was not aware of. Is the goal here to do 
inter-DN moving? 
No, the goal is *intra-DN*; I was referring to {noformat} hdfs mover {noformat},
not to {noformat} hdfs balancer {noformat}.

bq. If it's only for intra-DN moving, then it could still live in the DN.

Completely agree, all block moving code will be in DN.  

bq. This is also why I brought up HDFS-8538. If HDFS-1804 is the default volume 
choosing policy, we won't see imbalance outside of hotswap.

Agree, and it is a goal that we should work towards. From the comments in 
HDFS-8538, it looks like we might have to make some minor tweaks to that before 
we can commit it. I can look at it after HDFS-1312.

bq. The point I was trying to make is that HDFS-1804 addresses the imbalance 
issues besides hotswap, so we eliminate the alerts in the first place. Hotswap 
is an operation explicitly undertaken by the admin, so the admin will know to 
also run the intra-DN balancer.

Since we have both made this point many times, I am going to agree with what you
are saying. Even if we assume that hotswap or normal swap is the only use case
for disk balancing, in a large cluster many disks will have failed. So if a
cluster gets a number of disks replaced, the current interface makes admins'
lives easier: they can replace a bunch of disks on various machines and ask the
system to find and fix those nodes. I just think the interface we are building
makes admins' lives easier, and takes nothing away from the use cases you
described.

bq. This is an aspirational goal, but when debugging a prod cluster we almost 
certainly also want to see the DN log too

Right now we have actually met that aspirational goal: we capture a snapshot of
the node, which allows us to both debug and simulate what is happening with the
disk balancer off-line.

bq. Would it help to have a phone call about this? We have a lot of points 
flying around, might be easier to settle this via a higher-bandwidth medium.

I think that is an excellent idea; I would love to chat with you in person. I
will set up a meeting and post the meeting info in this JIRA.

I really appreciate your input and the thoughtful discussion we are having; I
hope to speak to you in person soon.


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-01-08 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089877#comment-15089877
 ] 

Andrew Wang commented on HDFS-1312:
---

Hi Anu, some replies:

bq. Generally administrators are wary of enabling a feature like HDFS-1804 in a
production cluster. For new clusters it is easier, but for existing production
clusters assuming the existence of HDFS-1804 is not realistic.

I don't follow this line of reasoning; don't concerns about using a new feature 
apply to a hypothetical HDFS-1312 implementation too?

HDFS-1804 also was fixed in 2.1.0, so almost everyone should have it available. 
It's also been in use for years, so it's pretty stable.

bq. we do lose one the critical feature of the tool, that is ability to report 
what we did to the machine

Why do we lose this? Can't the DN dump this somewhere?

bq. We wanted to merge mover into this engine later...

This is an interesting point I was not aware of. Is the goal here to do 
inter-DN moving? If so, we have a long-standing issue with inter-DN balancing, 
which is that the balancer as an external process is not aware of the NN's 
block placement policies, leading to placement violations. This is something 
[~mingma] and [~ctrezzo] brought up; if we're doing a rewrite of this 
functionality, it should probably be in the NN.

If it's only for intra-DN moving, then it could still live in the DN.

bq. Two issues with that, one there are lots of customers without HDFS-1804, 
and HDFS-1804 is just an option that user can choose.

Almost everyone is running a version of HDFS with HDFS-1804 these days. As I 
said in my previous comment, if a cluster is commonly hitting imbalance, 
enabling HDFS-1804 should be the first step since a) it's already available and 
b) it avoids the imbalance in the first place, which better conserves IO 
bandwidth.

This is also why I brought up HDFS-8538. If HDFS-1804 is the default volume 
choosing policy, we won't see imbalance outside of hotswap.

bq. Getting an alert due to low space on disk from a datanode is very reactive;
it is a common enough problem that I think it should be solved at the HDFS
level.

The point I was trying to make is that HDFS-1804 addresses the imbalance issues 
besides hotswap, so we eliminate the alerts in the first place. Hotswap is an 
operation explicitly undertaken by the admin, so the admin will know to also run 
the intra-DN balancer. There's no monitoring system in the loop.

bq. I prefer to debug by looking at my local directory instead of ssh-ing into 
a datanode...

This is an aspirational goal, but when debugging a prod cluster we almost 
certainly also want to see the DN log too, which is local to the DN. Cluster 
management systems also make log collection pretty easy, so this seems minor.

Would it help to have a phone call about this? We have a lot of points flying 
around, might be easier to settle this via a higher-bandwidth medium.



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-01-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088688#comment-15088688
 ] 

Anu Engineer commented on HDFS-1312:


Hi [~andrew.wang], thanks for the quick response. I was thinking about what you
said, and I think the disconnect we are having is because you are assuming that
HDFS-1804 is always available on HDFS clusters, but for many customers that is
not true.

bq. Have these customers tried out HDFS-1804? We've had users complain about 
imbalance before too, and after enabling HDFS-1804, no further issues.

Generally administrators are wary of enabling a feature like HDFS-1804 in a
production cluster. For new clusters it is easier, but for existing production
clusters assuming the existence of HDFS-1804 is not realistic.


bq. If you're interested, feel free to take HDFS-8538 from me. I really think 
it'll fix the majority of imbalance issues outside of hotswap.

Thank you for the offer. I can certainly work on HDFS-8538 after HDFS-1312. I
think of HDFS-1804, HDFS-8538 and HDFS-1312 as parts of the solution to the same
problem, just attacking it from different angles; without all three, some portion
of HDFS users will always be left out.

bq. When I mentioned removing the discover phase, I meant the NN communication. 
Here, the DN just probes its own volume information. Does it need to talk to 
the NN for anything else?


# I am not able to see any particular advantage in doing the planning inside the
datanode. With that approach we also lose one of the critical features of the
tool: the ability to report what we did to the machine, since capturing the
"before" state becomes much more complex. I agree that users can manually capture
this info before and after, but that is an extra administrative burden. With the
current approach we record the information about a datanode beforehand, which
makes it easy to compare the state once we are done.
# We also wanted to merge the mover into this engine later. With the current
approach, what we build inside the datanode is a simple block mover, or to be
precise an RPC interface to the existing mover's block interface. You can feed it
any move commands you like, which provides better composability. With what you
are suggesting we would lose that flexibility.
# Less complexity inside the datanode: the planner code never needs to run inside
the datanode. It is a piece of code that plans a set of moves, so why would we
want to run it inside the datanode?
# Since none of our tools currently report the disk-level data distribution, it
is not possible to find which nodes are imbalanced without talking to the
Namenode. I know you are arguing that this will never happen if all customers use
HDFS-1804. Two issues with that: first, there are lots of customers without
HDFS-1804, and second, HDFS-1804 is just an option that a user can choose. Since
it is configurable, it is arguable that we will always have customers without
HDFS-1804. The current architecture addresses the needs of both groups of users;
in that sense it is a better, more encompassing architecture.

bq. Cluster-wide disk information is already handled by monitoring tools, no? 
The admin gets the Ganglia alert saying some node is imbalanced, admin triggers 
intranode balancer, admin keeps looking at Ganglia to see if it's fixed. I 
don't think adding our own monitoring of the same information helps, when 
Ganglia etc. are already available, in-use, and understood by admins.

Please correct me if I am missing something here. Getting an alert due to low
space on a datanode disk is very reactive. With the current disk balancer design,
we are assuming that the disk balancer tool can be run to address and fix any
such issue in the cluster. You could argue that admins can write scripts to
monitor this issue using the Ganglia command line, but it is a common enough
problem that I think it should be solved at the HDFS level.

Here are two use cases that the disk balancer addresses. The first is discovering
nodes with potential issues, and the second is automatically fixing those issues.
This is very similar to the current balancer.

1. Scenario 1: The admin can run {noformat} hdfs diskbalancer -top 100 {noformat},
and voila! we print out the top 100 nodes that are having a problem. Let us say
the admin now wants to look closely at a node and find out the distribution on
individual disks; he can now do that via the disk balancer, ssh or Ganglia.

2. Scenario 2: The admin does not want to be bothered with this balancing act at
all; in reality he is thinking, why doesn't HDFS just take care of this? (I know
HDFS-1804 is addressing that, but again we are talking about a cluster which does
not have it enabled.) In that case we let the admin run {noformat}
hdfs diskbalancer -top 10 -balance {noformat}, which allows the admin to run the
disk balancer just like the current balancer, without having to worry

[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-01-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088505#comment-15088505
 ] 

Andrew Wang commented on HDFS-1312:
---

Hi Anu, thanks for the reply,

bq. Our own experience from the field is that many customers routinely run into 
this issue, with and without new drives being added, so we are tackling both 
use cases.

Have these customers tried out HDFS-1804? We've had users complain about 
imbalance before too, and after enabling HDFS-1804, no further issues. This is 
why I'm trying to separate the two use cases; heterogeneous disks are better 
addressed by HDFS-1804 since it works automatically, leaving hotswap to this 
JIRA (HDFS-1312).

If you're interested, feel free to take HDFS-8538 from me. I really think it'll 
fix the majority of imbalance issues outside of hotswap.

bq. (self-quote) I think most of this functionality should live in the DN since 
it's better equipped to do IO throttling and mutual exclusion.

I think I was too brief before, let me expand a little. I imagine basically 
everything (discover, planning, execute) happening in the DN:

# Client sends RPC to DN telling it to balance with some parameters
# DN examines its volumes, constructs some {{Info}} object to hold the data
# DN thread calls Planner passing the {{Info}} object, which outputs a {{Plan}}
# {{Plan}} is queued at DN executor pool, which does the moves

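A minimal, self-contained sketch of that all-in-DN flow (none of these class or
method names are real HDFS APIs; they only illustrate the shape of "RPC in ->
probe volumes -> plan -> execute on a DN thread"):

{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class VolumeInfo { /* capacity and dfsUsed per volume, as probed by the DN */ }
class Plan { List<String> moves; }
class Planner {
  Plan plan(VolumeInfo info) { return new Plan(); } // pure function: Info -> Plan
}

public class IntraDnBalancerSketch {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  // Entry point the "balance yourself" RPC would call.
  public void balanceVolumes() {
    VolumeInfo info = new VolumeInfo();       // DN examines its own volumes
    Plan plan = new Planner().plan(info);     // planning happens inside the DN
    executor.submit(() -> execute(plan));     // moves run on a background DN thread
  }

  private void execute(Plan plan) { /* throttled block moves would happen here */ }
}
{code}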
Attempting to address your points one by one:

# When I mentioned removing the discover phase, I meant the NN communication. 
Here, the DN just probes its own volume information. Does it need to talk to 
the NN for anything else?
# Assuming no need for NN communication, there's no new code. The new RPCs 
would be one to start balancing and one to monitor ongoing balancing, the rest 
of the communication happens between DN threads.
# Cluster-wide disk information is already handled by monitoring tools, no? The 
admin gets the Ganglia alert saying some node is imbalanced, admin triggers 
intranode balancer, admin keeps looking at Ganglia to see if it's fixed. I 
don't think adding our own monitoring of the same information helps, when 
Ganglia etc. are already available, in-use, and understood by admins.
# I don't think this conflicts with the debuggability goal. The DN can dump the
{{Info}} object (and even the {{Plan}} object) if requested, to the log or
somewhere in a data dir. Then we can pass it into a unit test to debug. This unit
test doesn't need to be a minicluster either; if we write the Planner correctly,
the {{Info}} should encapsulate all the state and we're just running an algorithm
to make a {{Plan}}. The planner being inside the DN doesn't change this.

Thanks for the pointer to the mover logic, wasn't aware we had that. I asked 
about this since the proposal doc in 4.2 says "copy block from A to B and 
verify". Adding a note that says "we use the existing moveBlockAcrossStorage 
method" is a great answer.



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-01-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088431#comment-15088431
 ] 

Anu Engineer commented on HDFS-1312:


[~andrew.wang] I really appreciate you taking time out to read the proposal and 
provide such detailed feedback. 

bq. I really like separating planning from execution, since it'll make the unit 
tests actual unit tests (no minicluster!). This is a real issue with the 
existing balancer, the tests take forever to run and don't converge reliably.

Thank you, that was the intent.

bq. Section 2: HDFS-1804 has been around for a while and used successfully by 
many of our users, so "lack of real world data or adoption" is not entirely 
correct. We've even considered making it the default, see HDFS-8538 where the 
consensus was that we could do this if we add some additional throttling.

Thanks for correcting me and for the reference to HDFS-8538. I know folks at
Cloudera write some excellent engineering blog posts and papers; I would love to
read about your experiences with HDFS-1804.

bq. IMO the usecase to focus on is the addition of fresh drives, particularly 
in the context of hotswap. I'm unconvinced that intra-node imbalance happens 
naturally when HDFS-1804 is enabling, and enabling HDFS-1804 is essential if a 
cluster is commonly suffering from intra-DN imbalance (e.g. from differently 
sized disks on a node). This means we should only see intra-node imbalance on 
admin action like adding a new drive; a singular, administrator-triggered 
operation.

Our own experience from the field is that many customers routinely run into 
this issue, with and without new drives being added, so we are tackling both 
use cases.

bq. However, I don't understand the need for cluster-wide reporting and 
orchestration. 

Just to make sure we are on the same page: there is no cluster-wide
orchestration. The cluster-wide reporting allows admins to see which nodes need
disk balancing. Not all clusters are running HDFS-1804, and hence the ability to
discover which machines need disk balancing is useful for a large set of
customers.

bq. I think most of this functionality should live in the DN since it's better 
equipped to do IO throttling and mutual exclusion.

That is how it is: HDFS-1312 is not complete yet, but you will see that all the
data movement is indeed inside the datanode.

bq. Namely, this avoids the Discover step, simplifying things. There's also no 
global planning step. What I'm envisioning is a user experience where admin 
just points it at a DN, like:
{noformat}
hdfs balancer -volumes -datanode 1.2.3.4:50070
{noformat}

[I took the liberty of re-ordering some of your comments, since both of these are
best answered together.]


Completely agree with the proposed command, in fact that is one of the commands 
that will be part of the tool. As you pointed out that is the simplest use case.

The discover phase exists for several reasons.

# The discover phase is needed even if we only have to process a single node: we
still have to read the information about that node. I understand you are pointing
out that we may not need the info for the whole cluster; please see my next point.
# The discover phase makes the coding simpler, since we are able to rely on the
current balancer code in {{NameNodeConnector}} and avoid writing any new code.
For the approach you are suggesting, we would have to add more RPCs to the
datanode, and for a rarely used administrative tool that seemed like overkill
when the balancer already provides a way for us to do it. If you look at the
current code in HDFS-1312, you will see that discover is merely a pass-through to
{{Balancer#NameNodeConnector}}.
# With the discover approach, we can take a snapshot of the nodes and the
cluster, which allows us to report to the admin what changes we made after the
move. This is pretty useful. Since we have cluster-wide data, it also allows us
to report to an administrator which nodes need his/her attention (this is just a
sort and print). As I mentioned earlier, there is unfortunately a large number of
customers that still do not use HDFS-1804, and this is a beneficial feature for
them.
# Last, but most important from my personal point of view, this makes testing and
error reporting for the disk balancer much easier. Let us say we find a bug in
the disk balancer: a customer could just provide the cluster JSON discovered by
the disk balancer, and we can debug the issue off-line. During the development
phase, that is how I have been testing the disk balancer, by taking a snapshot of
real clusters and then feeding that data back into the disk balancer via a
connector called {{JsonNodeConnector}}. 

As I said earlier, the most generic use case I have in mind is the one you 
already described,  where the user wants to just point the balancer at a 
datanode, and we will support that use case.

bq. There's al

[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2016-01-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088012#comment-15088012
 ] 

Andrew Wang commented on HDFS-1312:
---

Hi Anu, thanks for picking up this JIRA, it's a long-standing issue. It's clear 
from the docs that you've put a lot of thought into the problem, and I hope 
that some of the same ideas could someday be carried over to the existing 
balancer too. I really like separating planning from execution, since it'll 
make the unit tests actual unit tests (no minicluster!). This is a real issue 
with the existing balancer, the tests take forever to run and don't converge 
reliably.

I did have a few comments though, hoping we can get clarity on the usecases and 
some of the resulting design decisions:

Section 2: HDFS-1804 has been around for a while and used successfully by many 
of our users, so "lack of real world data or adoption" is not entirely correct. 
We've even considered making it the default, see HDFS-8538 where the consensus 
was that we could do this if we add some additional throttling.

IMO the usecase to focus on is the addition of fresh drives, particularly in 
the context of hotswap. I'm unconvinced that intra-node imbalance happens 
naturally when HDFS-1804 is enabled, and enabling HDFS-1804 is essential if a 
cluster is commonly suffering from intra-DN imbalance (e.g. from differently 
sized disks on a node). This means we should only see intra-node imbalance on 
admin action like adding a new drive; a singular, administrator-triggered 
operation.

With that in mind, I wonder if we can limit the scope of this effort. I like 
the idea of an online balancer; the previous hacky scripts required downtime, 
which is unacceptable with hotswap. However, I don't understand the need for 
cluster-wide reporting and orchestration. With HDFS-1804, intra-node imbalance 
should only happen when an admin adds a new drive, and the admin will know to 
trigger the intra-DN balancer when doing this. If they forget, Ganglia will 
light up and remind them.

What I'm envisioning is a user experience where admin just points it at a DN, 
like:

{noformat}
hdfs balancer -volumes -datanode 1.2.3.4:50070
{noformat}

Namely, this avoids the Discover step, simplifying things. There's also no 
global planning step, I think most of this functionality should live in the DN 
since it's better equipped to do IO throttling and mutual exclusion. Basically 
we'd send an RPC to tell the DN to balance itself (with parameters), and then 
poll another RPC to watch the status and wait until it's done.

On the topic of the actual balancing, how do we atomically move the block in 
the presence of failures? Right now the NN expects only one replica per DN, so 
if the same replica is on multiple volumes of the DN, we could run into issues. 
See related (quite serious) issues like HDFS-7443 and HDFS-7960. I think we can 
do some tricks with a temp filename and rename, but this procedure should be 
carefully explained.
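For what it's worth, a generic illustration of the temp-file-plus-rename idea
(not DataNode code; the paths and names below are made up): the rename is atomic
on the destination volume, so a crash mid-copy leaves at most a stray temp file
rather than two live replicas.

{code}
import java.io.IOException;
import java.nio.file.*;

public class CopyThenRenameSketch {
  public static void main(String[] args) throws IOException {
    // Illustrative paths only; real block file layouts differ.
    Path src = Paths.get("/data1/hdfs/current/blk_1073741825");
    Path tmp = Paths.get("/data2/hdfs/current/blk_1073741825.tmp");
    Path dst = Paths.get("/data2/hdfs/current/blk_1073741825");

    Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING); // slow copy to a temp name
    Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);      // atomic publish on the same volume
    Files.delete(src);                                         // drop the old copy only after publish
  }
}
{code}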

Thanks,
Andrew



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-12-22 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068956#comment-15068956
 ] 

Anu Engineer commented on HDFS-1312:


[~eddyxu] Very pertinent questions. I will probably steal these questions and 
add them to the diskbalancer documentation.

bq. Will the disk balancer be a daemon or a CLI tool that waits for the process
to finish?

It is a combination of both; let me explain in greater detail. We rely on
datanodes to do the actual copy, and a datanode will run a background thread if
there is a disk balancing job to be run. The disk balancing job is a list of
statements, each containing a source volume, a destination volume and a number of
bytes - in the code this is called a MoveStep
({{org.apache.hadoop.hdfs.server.diskbalancer.planner.MoveStep}}). The CLI reads
the current state of the cluster and computes a set of MoveSteps, which is called
a Plan. The advantage of this approach is that it allows the administrator to
review the plan, if needed, before executing it against the datanode. The plan
can also be persisted to a file or simply submitted to a DataNode via the
submitDiskbalancerPlan RPC - HDFS-9588.

The datanode takes this plan and executes the moves in the background, so there
is no new daemon; the CLI tool is really used for planning and generating the
data movement for each datanode. This is quite similar to the balancer, but
instead of sending one RPC at a time, we submit all of the moves together to the
datanode, since they are pretty much self-contained within a single datanode. To
sum up, we have a CLI tool that submits a plan but does not wait for the plan to
finish executing, and the datanode does the moves itself.
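As a rough illustration of the shape of such a plan (the field and method names
below are illustrative stand-ins, not the actual MoveStep/Plan API):

{code}
// Illustrative only: a plan is essentially an ordered list of
// (source volume, destination volume, bytes to move) statements.
import java.util.Arrays;
import java.util.List;

class Move {                       // stand-in for the real MoveStep
  final String sourceVolume, destVolume;
  final long bytesToMove;
  Move(String src, String dst, long bytes) {
    this.sourceVolume = src; this.destVolume = dst; this.bytesToMove = bytes;
  }
}

public class PlanSketch {
  public static void main(String[] args) {
    // The CLI would compute something like this from a cluster snapshot,
    // then submit it to the datanode, which executes it in a background thread.
    List<Move> plan = Arrays.asList(
        new Move("/data1", "/data4", 120L * 1024 * 1024 * 1024),  // 120 GB off the full disk
        new Move("/data2", "/data4", 40L * 1024 * 1024 * 1024));  // 40 GB off a mostly-full disk
    plan.forEach(m -> System.out.println(
        m.sourceVolume + " -> " + m.destVolume + " : " + m.bytesToMove + " bytes"));
  }
}
{code}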

bq. Where is the Planner executed? A DiskBalancer daemon / CLI tool or NN?
There is a background thread just like {{DirectoryScanner}} that executes the 
plan.

bq. When copying a replica from one volume to another, how do we prevent
concurrency issues with the DirectoryScanner running in the background?

Great question. This is one of the motivators for the plan executor becoming a
thread inside the datanode, so that we do not get into concurrent-access issues
with {{DirectoryScanner}}. We take the same locks as DirectoryScanner when we
move the blocks. Also, we do not have any new code for these moves; we rely on
the existing mover code path inside the datanode to perform the actual move.

bq. Is there a limit on how many such disk balancer jobs can run in the cluster?

Yes, one per datanode; if the submit RPC is called while a job is executing, we
reject the new submission. Please look at the HDFS-9588
{{ClientDatanodeProtocol.proto#SubmitDiskBalancerPlanResponseProto#submitResults}}
enum to see the set of errors that we return. One of them is
PLAN_ALREADY_IN_PROGRESS, which is the error you would see if you tried to submit
another job to a datanode that is already executing one.

bq. Could another job query the status of a running job?

Yes. I will post a patch soon that adds support for the QueryPlan RPC, which
returns the current status of the executing or last executed plan.










[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-12-22 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068911#comment-15068911
 ] 

Lei (Eddy) Xu commented on HDFS-1312:
-

Hey, [~anu].

Thanks a lot for working on this.  It is a feature that has been in high demand.

I have a few questions regarding the design; it would be much appreciated if you
could answer them to help me better understand it.

* Will the disk balancer be a daemon or a CLI tool that waits for the process to
finish?
* Where is the Planner executed? A DiskBalancer daemon, the CLI tool, or the NN?
* When copying a replica from one volume to another, how do we prevent
concurrency issues with the {{DirectoryScanner}} running in the background?
* Is there a limit on how many such disk balancer jobs can run in the cluster?
Could another job query the status of a running job?

Thanks again!
 



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-11-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023670#comment-15023670
 ] 

Tsz Wo Nicholas Sze commented on HDFS-1312:
---

I have just created a new branch, HDFS-1312.



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-11-18 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012631#comment-15012631
 ] 

Tsz Wo Nicholas Sze commented on HDFS-1312:
---

BTW, the design doc did not mention anything related to Federation.  We should 
think about how to handle Federation even if it is not supported in the first 
phase.



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-11-18 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012479#comment-15012479
 ] 

Anu Engineer commented on HDFS-1312:


bq. Yes. Mean and variance are well-known concepts. It is easier for other people
to understand the feature and read the metrics/reports. They don't have to look
up what is meant by "data density".

Thanks, I will update the patch with that change. The current patch in 
HDFS-9420 contains notions of "data density".





[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-11-18 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012472#comment-15012472
 ] 

Anu Engineer commented on HDFS-1312:


bq. Similar to Balancer, we need to define a threshold so that the storage is 
considered as balanced if its dfsUsedRatio is within nodeWeightedMean +/- 
threshold.
Sorry, this detail is not in the original design document; it escaped me then. It
is defined in code and also in the architecture document, from page 5 of
Architecture_and_testplan.pdf.
{code}
dfs.disk.balancer.block.tolerance.percent |   5  | Since data nodes are 
operational we stop copying data if we have reached a good enough threshold
{code}
It is not based on weightedMean, since the datanode is operational and we compute
the plan for data moves from a static snapshot of disk utilization. Currently we
support a simple knob in the config that allows us to say what a good enough
target is; that is, if we reach within about 5% of the desired value in terms of
data movement, we can consider it balanced. Please do let me know if this
addresses the concern.
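A small arithmetic illustration of that tolerance knob (hypothetical numbers, not
DiskBalancer code):

{code}
// Illustrates the "good enough" check implied by
// dfs.disk.balancer.block.tolerance.percent = 5 (values below are made up).
public class ToleranceSketch {
  public static void main(String[] args) {
    long plannedBytes = 100L * 1024 * 1024 * 1024; // plan asked for 100 GB to be moved
    long movedBytes   =  96L * 1024 * 1024 * 1024; // copying stopped after 96 GB
    double tolerancePercent = 5.0;

    long shortfall = plannedBytes - movedBytes;
    boolean goodEnough = shortfall <= plannedBytes * (tolerancePercent / 100.0);
    System.out.println("within tolerance: " + goodEnough);  // true: 4 GB <= 5 GB
  }
}
{code}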

bq. DataTransferProtocol.replaceBlock does support moving blocks across storage
types within the same node. We only need to modify it slightly for disk balancing
(i.e. moving a block within the same storage type on the same node).
This is something that should have been part of the design document; thanks for
bringing it up. The actual move is performed by calling into
{{FsDataSetImpl.java#moveBlockAcrossStorage}}, which is part of the mover's
logic. Here is the high-level logic in DiskBalancer.java: it finds blocks on the
source volume and moves them to the destination volume by making calls to
moveBlockAcrossStorage (a slightly modified version of this function).
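Sketched as plain Java, that loop might look roughly like the following (every
helper name here is an illustrative assumption, not the actual DiskBalancer or
FsDatasetImpl API):

{code}
// Illustrative sketch only: walk blocks on the over-used source volume and hand
// each one to the existing mover path until this step's byte budget is consumed.
import java.util.Iterator;
import java.util.List;

public class MoveStepSketch {
  static class Block { long numBytes; Block(long n) { numBytes = n; } }

  static void executeStep(List<Block> blocksOnSourceVolume, long bytesToMove) {
    long moved = 0;
    Iterator<Block> it = blocksOnSourceVolume.iterator();
    while (moved < bytesToMove && it.hasNext()) {
      Block b = it.next();
      moveToDestinationVolume(b);   // stand-in for the (modified) moveBlockAcrossStorage call
      moved += b.numBytes;
    }
  }

  static void moveToDestinationVolume(Block b) { /* copy + verify + delete source copy */ }
}
{code}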






[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-11-18 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012469#comment-15012469
 ] 

Tsz Wo Nicholas Sze commented on HDFS-1312:
---

> ... Do you think redefining in terms of nodeWeightedVariance is more precise 
> as compared to the earlier method ? ...

Yes. Mean and variance are well-known concepts. It is easier for other people to
understand the feature and read the metrics/reports. They don't have to look up
what is meant by "data density".



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-11-18 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012448#comment-15012448
 ] 

Tsz Wo Nicholas Sze commented on HDFS-1312:
---

Some more comments:
- Similar to Balancer, we need to define a threshold so that the storage is 
considered as balanced if its dfsUsedRatio is within nodeWeightedMean +/- 
threshold.
- DataTransferProtocol.replaceBlock does support moving blocks across storage
types within the same node. We only need to modify it slightly for disk balancing
(i.e. moving a block within the same storage type on the same node.)



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-11-18 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012440#comment-15012440
 ] 

Anu Engineer commented on HDFS-1312:


[~szetszwo] You are absolutely right; we could redefine these metrics in standard
terms. I will update the document. However, I do have some code which uses
{code}
sum(abs(nodeDataDensity(i)))
{code}

as defined in section 4.1.2. Do you think redefining this in terms of
nodeWeightedVariance is more precise compared to the earlier method? This is just
for my understanding of the issues with the earlier algorithm. 



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-11-18 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012427#comment-15012427
 ] 

Tsz Wo Nicholas Sze commented on HDFS-1312:
---

Note also that the calculation of nodeWeightedVariance can be simplified as 
{code}
nodeWeightedVariance = sum(w_i * dfsUsedRatio_i^2) - nodeWeightedMean^2.
{code}
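A quick numeric check of that identity (made-up volume numbers, not HDFS code),
using the weighted mean and variance definitions from the other comment:

{code}
// Verifies sum(w_i*(r_i - mean)^2) == sum(w_i*r_i^2) - mean^2 when sum(w_i) == 1.
public class WeightedVarianceIdentity {
  public static void main(String[] args) {
    double[] w = {0.5, 0.3, 0.2};     // normalized capacity weights (sum to 1)
    double[] r = {0.90, 0.40, 0.10};  // dfsUsedRatio per volume
    double mean = 0, meanOfSquares = 0, varByDefinition = 0;
    for (int i = 0; i < w.length; i++) {
      mean += w[i] * r[i];
      meanOfSquares += w[i] * r[i] * r[i];
    }
    for (int i = 0; i < w.length; i++) {
      varByDefinition += w[i] * (r[i] - mean) * (r[i] - mean);
    }
    double varSimplified = meanOfSquares - mean * mean;
    System.out.println(varByDefinition + " == " + varSimplified);  // both ~0.1069
  }
}
{code}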




[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-11-18 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012417#comment-15012417
 ] 

Tsz Wo Nicholas Sze commented on HDFS-1312:
---

Hi Anu, the design doc looks good in general.

I think we don't need to define volumeDataDensity and nodeDataDensity in 
Section 4.1.  We may simply formulate the calculation using weighted mean and 
weighted variance.
- dfsUsedRatio_i for storage i is defined the same as before, i.e.
{code}
dfsUsedRatio_i = dfsUsed_i/capacity_i.
{code}
- Define normalized weight using capacity as 
{code}
w_i = capacity_i / sum(capacity_i).
{code}
- Then, define
{code}
nodeWeightedMean = sum(w_i * dfsUsedRatio_i), and
nodeWeightedVariance = sum(w_i * (dfsUsedRatio_i - nodeWeightedMean)^2).
{code}
We use nodeWeightedVariance (instead of nodeDataDensity) to do comparison.  
Note that nodeWeightedMean is the same as idealStorage.
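A small worked example of these definitions with made-up volume sizes (not HDFS
code); it also shows that nodeWeightedMean is simply total dfsUsed over total
capacity, i.e. the ideal per-volume ratio:

{code}
// Three volumes on one datanode: 4 TB, 2 TB, 2 TB with different usage.
public class NodeWeightedMeanSketch {
  public static void main(String[] args) {
    double[] capacityTb = {4, 2, 2};
    double[] dfsUsedTb  = {3.6, 0.8, 0.4};              // dfsUsedRatio = 0.9, 0.4, 0.2
    double totalCapacity = 0, totalUsed = 0;
    for (int i = 0; i < capacityTb.length; i++) {
      totalCapacity += capacityTb[i];
      totalUsed += dfsUsedTb[i];
    }
    // nodeWeightedMean = sum(w_i * dfsUsedRatio_i) = totalUsed / totalCapacity
    double nodeWeightedMean = totalUsed / totalCapacity; // 4.8 / 8 = 0.6
    for (int i = 0; i < capacityTb.length; i++) {
      double ratio = dfsUsedTb[i] / capacityTb[i];
      System.out.printf("volume %d: ratio %.2f (%s ideal %.2f)%n",
          i, ratio, ratio > nodeWeightedMean ? "above" : "below", nodeWeightedMean);
    }
  }
}
{code}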





[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-10-05 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944286#comment-14944286
 ] 

Andrew Wang commented on HDFS-1312:
---

We discussed changing the default policy in HDFS-8538; Arpit wanted a failsafe to
prevent overloading a disk with too many writes before we changed the default.
Linking it here.



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-10-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944280#comment-14944280
 ] 

Aaron T. Myers commented on HDFS-1312:
--

bq. There are a large number of clusters which are using round-robin scheduling.
I have been looking around for data on HDFS-1804 (in fact the proposal discusses
that issue). It would be good if you have some data on HDFS-1804 deployment. Most
of the clusters that I am seeing (anecdotal, I know) are based on round-robin
scheduling. Please also see the thread I refer to in the proposal on LinkedIn,
and you will see customers are also looking for this data.

Presumably that's mostly because the round-robin volume choosing policy is the 
default, and many users don't even know that there's an alternative. 
Independently of the need to implement an active balancer as this JIRA 
proposes, should we consider changing the default to the available space volume 
choosing policy? We'd probably need to do this in Hadoop 3.0, since I'd think 
this should be considered an incompatible change.
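
For reference, the HDFS-1804 policy can already be opted into per DataNode 
today. A minimal hdfs-site.xml sketch follows; the property names and defaults 
are as introduced by HDFS-1804, so please verify them against your release:

{code}
<!-- Pick the target volume by available space instead of pure round-robin -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>

<!-- Volumes within this many bytes of free space are considered balanced -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>

<!-- Fraction of new block allocations steered to the less-used volumes -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
{code}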

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
> Attachments: disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-10-05 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944247#comment-14944247
 ] 

Anu Engineer commented on HDFS-1312:


[~templedf] Thanks for your comments; please see my thoughts on each point below.

bq. Given HDFS-1804, I think [~steve_l]'s original proposal of a balancer 
script that can be run manually while the DN is offline sounds like a 
simpler/safer/better approach.  Since the primary remaining source of imbalance 
is disk failure, an offline process seems sensible.  What's the main motivation 
for building an online-balancer?

# We added support for hot-swapping disks in HDFS-1362, so this feature 
complements that. Offline balancing was a good idea when [~steve_l] proposed 
it, but I am not sure it helps much any more.
# A large number of clusters are still using round-robin scheduling. I have 
been looking around for data on HDFS-1804 adoption (in fact the proposal 
discusses that issue), so it would be good if you have some data on HDFS-1804 
deployment. Most of the clusters that I see (anecdotal, I know) are based on 
round-robin scheduling. Please also see the LinkedIn thread I refer to in the 
proposal; customers are asking for this data as well.

This has been a painful issue that multiple customers have complained about, 
and addressing it would improve the usability of HDFS. Please look at section 2 
of the proposal for the large set of issues customers have reported, and how 
hard some of the workarounds are.

bq. The reporting aspect should perhaps be handled under HDFS-1121.

In order to have a tool that can do disk balancing, the user usually asks a 
simple question: "Which machines in my cluster need rebalancing?" There is an 
I/O cost involved in running any disk balancing operation, and many times you 
need to do this proactively, since disks can go out of balance (see some 
comments earlier in this JIRA itself) even without a failure. This proposal 
says that we will create a metric that describes the ideal data distribution 
and how far a given node is from that ideal, which will allow us to compute 
the set of nodes that would benefit from disk balancing and what the actual 
balancing should look like (that is, what data movements we are planning). I 
don't think HDFS-1121 is talking about this requirement. I do think being able 
to answer "which machines need disk balancing" makes for a good operational 
interface for HDFS.
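
For illustration, one way such a metric could be computed is to compare each 
volume's used-space ratio against the node-wide ideal ratio and sum the gaps. 
The sketch below only approximates that idea with hypothetical names; it is 
not the actual DiskBalancer code:

{code}
// Hypothetical sketch of a per-node "distance from ideal distribution" metric.
public final class VolumeDensity {
  /**
   * @param used     bytes used on each volume
   * @param capacity total bytes on each volume
   * @return sum of |idealRatio - volumeRatio| over all volumes; 0 means the
   *         node is perfectly balanced, larger values mean it would benefit
   *         more from intra-node balancing.
   */
  public static double nodeDataDensity(long[] used, long[] capacity) {
    long totalUsed = 0;
    long totalCapacity = 0;
    for (int i = 0; i < used.length; i++) {
      totalUsed += used[i];
      totalCapacity += capacity[i];
    }
    double ideal = (double) totalUsed / totalCapacity;  // node-wide used ratio
    double density = 0.0;
    for (int i = 0; i < used.length; i++) {
      density += Math.abs(ideal - (double) used[i] / capacity[i]);
    }
    return density;
  }
}
{code}

Ranking DataNodes by such a value would answer "which machines need disk 
balancing" and also suggest which volumes the moves should flow between.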




> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
> Attachments: disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-10-05 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944172#comment-14944172
 ] 

Daniel Templeton commented on HDFS-1312:


The reporting aspect should perhaps be handled under HDFS-1121.

Given HDFS-1804, I think [~steve_l]'s original proposal of a balancer script 
that can be run manually while the DN is offline sounds like a 
simpler/safer/better approach.  Since the primary remaining source of imbalance 
is disk failure, an offline process seems sensible.  What's the main motivation 
for building an online-balancer?

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
> Attachments: disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-09-10 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740094#comment-14740094
 ] 

Anu Engineer commented on HDFS-1312:


[~dkaiser] Thanks for your comment. Without going into too much detail, the 
answer is yes. DiskBalancer is a simple data layout program that allows the 
end user to control how much data resides on each volume.
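
The proposal sketches this as a plan/execute workflow driven from the command 
line against a single DataNode. A rough usage sketch follows; the command and 
flag names approximate what eventually shipped (and assume the feature is 
enabled via dfs.disk.balancer.enabled), so verify them against your release:

{code}
hdfs diskbalancer -plan datanode1.example.com      # compute a move plan for one DataNode
hdfs diskbalancer -execute datanode1.example.com.plan.json
hdfs diskbalancer -query datanode1.example.com     # check progress of the running plan
hdfs diskbalancer -report -top 10                  # nodes that would benefit most
{code}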


> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
> Attachments: disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-09-10 Thread David Kaiser (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740038#comment-14740038
 ] 

David Kaiser commented on HDFS-1312:


Anu, I would like to add a comment to the tool proposal. Could an 
administrator specify that a particular volume (or list of volumes) end up 
with exactly zero blocks, effectively emptying that volume of block storage, 
say to support the task of "decommissioning a drive"?

The intent is that all blocks on the specified volume would be moved, by the 
proposed DiskBalancer logic, to the other volumes, so that the then-empty 
volume could be removed from the dfs.datanode.data.dir path mapping.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
> Attachments: disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-01-07 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267925#comment-14267925
 ] 

Dave Marion commented on HDFS-1312:
---

Note that as of 2.6.0 the layout has changed on the DataNode. Please see [1] 
for more information. I don't know if the tools mentioned above will work with 
these new restrictions.

[1] 
https://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F


> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2014-06-23 Thread Benoit Perroud (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041787#comment-14041787
 ] 

Benoit Perroud commented on HDFS-1312:
--

Here is a rebalancer we're using internally on small clusters: 
https://github.com/killerwhile/volume-balancer 

But in our experience, it's easier (and sometimes faster) to wipe all the 
drives and run the balancer.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2013-09-01 Thread Kevin Lyda (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755855#comment-13755855
 ] 

Kevin Lyda commented on HDFS-1312:
--

I wrote the following in Java, which... is not really my favourite language. 
However, Hadoop is written in Java, and if the code is ever to make it into 
Hadoop proper, that seems like an important requirement.

The particular cluster I need this working on runs 1.0.3, so my code is 
written for that; hence Ant, etc. It assumes a built 1.0.3 tree lives in 
../hadoop-common. Again, I'm not a Java person, so I know this isn't the 
greatest setup.

The code is here:

https://bitbucket.org/lyda/intranode-balance

I've run it on a test cluster but haven't tried it on "real" data yet. In the 
meantime pull requests are accepted / desired / encouraged / etc.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2013-07-03 Thread Michael Schmitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699345#comment-13699345
 ] 

Michael Schmitz commented on HDFS-1312:
---

I wrote this code to balance blocks on a single data node.  I'm not sure if 
it's the right approach at all--particularly with how I deal with 
subdirectories.  I wish there were better documentation on how to manually 
balance a disk.

https://github.com/schmmd/hadoop-balancer

My biggest problem with the code is that it's way too slow and it requires the 
cluster to be down.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2013-04-03 Thread Kevin Lyda (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620835#comment-13620835
 ] 

Kevin Lyda commented on HDFS-1312:
--

Are there any decent code examples for how one might do this? I've never done 
Hadoop development, so some pointers on where to start would be appreciated.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-10-19 Thread Steve Hoffman (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480055#comment-13480055
 ] 

Steve Hoffman commented on HDFS-1312:
-

Given the general nature of HDFS and its many uses (HBase, M/R, etc.), as much 
as I'd like it to "just work", it is clear the right behavior always depends 
on the use case. Maybe one day we won't need a balancer script for disks (or 
for the cluster).

I'm totally OK with having a machine-level balancer script. We use the HDFS 
balancer to fix inter-machine imbalances when they crop up (again, for a 
variety of reasons). It makes sense to have a manual script for intra-machine 
imbalances for people who DO have issues, and to make it part of the standard 
install (like the HDFS balancer).

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-10-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479921#comment-13479921
 ] 

Steve Loughran commented on HDFS-1312:
--

@Kevin: loss of a single disk not only preserves the rest of the data on the 
server; the server itself keeps going. You get 1-3TB of network traffic as the 
under-replicated data is re-replicated, but that's all.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-10-19 Thread Kevin Lyda (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479689#comment-13479689
 ] 

Kevin Lyda commented on HDFS-1312:
--

As an alternative, this issue could be avoided if clusters were configured 
with LVM and data striping. Well, not avoided, but the problem would be pushed 
down a layer. However, tuning docs discourage using LVM.

Besides performance, another obvious downside to putting a single filesystem 
across all disks is that a single disk failure would take out the entire node 
as opposed to a portion of it. However, I'm new to Hadoop, so I'm not sure how 
a partial data loss event is handled. If a datanode with a partial data loss 
can be restored without copying in the data it has not lost, then avoiding 
striped LVM is a huge win, particularly for the XX TB case mentioned above, 
where balancing in a new node from scratch would involve a huge data transfer 
with time and network consequences.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-10-19 Thread Kevin Lyda (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479681#comment-13479681
 ] 

Kevin Lyda commented on HDFS-1312:
--

Continuing on Eli's comment, modifying the placement policy also fails to 
handle deletions.

I'm currently experiencing this on my cluster, where the first data dir is 
both smaller and getting more of the data (for reasons I'm still trying to 
figure out - it might be due to how the machines were configured 
historically). The offline rebalance script sounds like a good first step.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-27 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465270#comment-13465270
 ] 

Eli Collins commented on HDFS-1312:
---

There are issues with just modifying the placement policy:

# It only solves the problem for new blocks. If you add a bunch of new disks 
you want to rebalance the cluster immediately to get better read throughput. 
And if you have to implement block balancing (e.g. on startup) then you don't 
need to modify the placement policy.
# The internal policy intentionally ignores disk usage to optimize for 
performance (round-robining blocks across all spindles). As you point out, in 
some cases this won't be much of a hit, but on a 12-disk machine where half 
the disks are new the impact will be noticeable.
# There are multiple placement policies now that they're pluggable, so this 
requires every policy to solve the problem instead of solving it once.

IMO a background process would actually be easier than modifying the placement 
policy. Just balancing on DN startup is simplest and would solve most people's 
issues, though it would require a rolling DN restart if you wanted to do it 
on-line.


> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-27 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465219#comment-13465219
 ] 

Scott Carey commented on HDFS-1312:
---

Isn't the datanode internal block placement policy an easier/simpler solution?

IMO if you simply placed blocks on disks weighted by the free space available, 
this would not be a big issue. All drives would reach capacity at about the 
same time. The drawback would be write performance bottlenecks in the more 
extreme cases.

If you were 90% full on 11 drives and 100% empty on one, then ~50% of new 
blocks would go to the new drive (however, few reads would hit this drive). 
That is not ideal for performance, but not a big problem either, since the 
node should rapidly become more balanced.

In most situations, we would be talking about systems that have 3 to 11 drives 
that are 50% to 70% full and one empty drive. This would lead to between ~17% 
and 55% of writes going to the new drive, instead of the 8% or 25% that would 
happen under round-robin.

IMO the default datanode block placement should be weighted towards disks with 
more free space. There are other cases besides disk failure that can lead to 
imbalanced space usage, including heterogeneous partition sizes. That would 
mitigate the need for any complicated background rebalance tasks.

Perhaps on start-up a datanode could optionally do some local rebalancing 
before joining the cluster.
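
To make the arithmetic above concrete, here is a small illustrative sketch 
(not Hadoop code) of the expected share of new writes landing on the empty 
drive when placement is weighted purely by free space:

{code}
// Fraction of new blocks landing on the one empty drive under purely
// free-space-weighted placement; all drives are assumed to be the same size.
public final class WriteShare {
  static double shareToEmptyDrive(int fullDrives, double usedFraction) {
    double freeOnOld = fullDrives * (1.0 - usedFraction);
    double freeOnNew = 1.0;                      // the whole new drive is free
    return freeOnNew / (freeOnOld + freeOnNew);
  }

  public static void main(String[] args) {
    System.out.println(shareToEmptyDrive(11, 0.90)); // ~0.48, i.e. ~50%
    System.out.println(shareToEmptyDrive(11, 0.50)); // ~0.15, the low end
    System.out.println(shareToEmptyDrive(3, 0.70));  // ~0.53, the high end
    // Compare with round-robin: 1/12 ~ 8% and 1/4 = 25%.
  }
}
{code}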

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464564#comment-13464564
 ] 

Steve Loughran commented on HDFS-1312:
--

Let's start with a Python script that people can use while the DN is offline, 
as that would work against all versions. Moving blocks as a background task 
would be much better, but I can imagine surprises (need to make sure a block 
isn't in use locally or remotely, handling of very unbalanced disks where disk 
#4 is a 50TB NFS mount, etc.).

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-26 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464425#comment-13464425
 ] 

Eli Collins commented on HDFS-1312:
---

This one is on my radar, I just haven't had time to get to it. The most common 
use cases for this I've seen are an admin adding new drives, or drives with 
mixed capacities so the smaller drives fill faster. For this occasional 
imbalance, a DN startup option that rebalances the disks and then shuts down 
is probably sufficient. OTOH it would be pretty easy to have a background task 
that moves finalized blocks from heavy disks to lightly-loaded disks.


> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463642#comment-13463642
 ] 

Steve Loughran commented on HDFS-1312:
--

I don't think it's a wontfix, just that nobody has sat down to fix it. With 
the trend towards very storage-dense servers (12-16x 3TB), this problem has 
grown, and it does need fixing.

What does it take?
# tests
# solution
# patch submission and review.

It may be possible to implement the rebalance operation as some script 
(please, not bash, something better like Python); ops could run this script 
after installing a new HDD.

A way to assess HDD imbalance in a DN or across the cluster would be good too; 
that could pick up problems which manual/automated intervention could fix.

BTW, not everyone splits MR and DFS onto separate disk partitions, as that 
removes flexibility and the ability to handle large-intermediate-output MR jobs.


> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-25 Thread Steve Hoffman (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463168#comment-13463168
 ] 

Steve Hoffman commented on HDFS-1312:
-

bq. Yes, I think this should be fixed.
This was my original question, really. Since it hasn't made the cut in over 2 
years, I was wondering what it would take to either do something with this, or 
whether it should be closed as a "won't fix" with script/documentation support 
for the admins.

bq. No, I don't think this is as big of an issue as most people think.
Basically, I agree with you.  There are worse things that can go wrong.

bq. At 70-80% full, you start to run the risk that the NN is going to have 
trouble placing blocks, esp if . Also, if you are like most places and put the 
MR spill space on the same file system as HDFS, that 70-80% is more like 100%, 
especially if you don't clean up after MR. (Thus why I always put MR area on a 
separate file system...)
Agreed.  More getting installed Friday.  Just don't want bad timing/luck to be 
a factor here -- and we do clean up after the MR.

bq. As you scale, you care less about the health of individual nodes and more 
about total framework health.
Sorry, have to disagree here.  The total framework is made up of the parts.  
While I agree there is enough redundancy built in to handle most cases once 
your node count gets above a certain level, you are basically saying it doesn't 
have to work well in all cases because more $ can be thrown at it.

bq. 1PB isn't that big. At 12 drives per node, we're looking at ~50-60 nodes.
Our cluster is storage-dense, yes, so the loss of one node is noticeable.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463005#comment-13463005
 ] 

Allen Wittenauer commented on HDFS-1312:


Since someone (off-jira) asked:

* Yes, I think this should be fixed.
* No, I don't think this is as big of an issue as most people think.
* At 70-80% full, you start to run the risk that the NN is going to have 
trouble placing blocks, esp if .  Also, if you are like most places and put the 
MR spill space on the same file system as HDFS, that 70-80% is more like 100%, 
especially if you don't clean up after MR.  (Thus why I always put MR area on a 
separate file system...)
* As you scale, you care less about the health of individual nodes and more 
about total framework health. 
* 1PB isn't that big.  At 12 drives per node, we're looking at ~50-60 nodes.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462973#comment-13462973
 ] 

Allen Wittenauer commented on HDFS-1312:


bq. We typically run at 70-80% full as we are not made of money.

No, you aren't made of money, but you are under-provisioned. 

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-25 Thread Steve Hoffman (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462960#comment-13462960
 ] 

Steve Hoffman commented on HDFS-1312:
-

{quote}
The other thing is that as you grow a grid, you care less and less about the 
balance on individual nodes. This issue is of primary important to smaller 
installations who likely are under-provisioned hardware-wise anyway.
{quote}
Our installation is about 1PB, so I think we can say we are past "small". We 
typically run at 70-80% full as we are not made of money, and at 90% the disk 
alarms start waking people out of bed. I would say we very much care about the 
balance of a single node. When that node fills, it'll take out the region 
server and the M/R jobs running on it, and generally anger people whose jobs 
have to be restarted.

I wouldn't be so quick to discount this. And when you have enough machines, 
you are replacing disks more and more frequently, so ANY manual process is $ 
wasted in people's time: time to re-run jobs, time to take down a datanode and 
move blocks. Time = $. To turn Hadoop into a more mature product, shouldn't we 
be striving for "it just works"?

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-25 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462951#comment-13462951
 ] 

Andrew Purtell commented on HDFS-1312:
--

It might be worth adding a note to the manual that, after adding or replacing 
drives on a DataNode, while it's temporarily offline anyway, blocks and their 
associated metadata files can both be moved to any defined data directory for 
local rebalancing?
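
A rough sketch of what such an offline move looks like, with purely 
hypothetical paths, is below. It assumes the DataNode is stopped and uses the 
older layout where a block may live under any configured data directory; with 
the later block-id-based layout (Hadoop 2.6+) the target subdirectory is 
derived from the block id, so this simple move no longer applies as-is:

{code}
import java.io.IOException;
import java.nio.file.*;

public final class MoveBlockOffline {
  public static void main(String[] args) throws IOException {
    // Hypothetical source and destination data directories on the same DataNode.
    Path srcDir = Paths.get("/data/disk1/hdfs/current/finalized/subdir0");
    Path dstDir = Paths.get("/data/disk9/hdfs/current/finalized/subdir0");
    String block = "blk_1073741825";        // example block file name
    String meta  = block + "_1001.meta";    // matching checksum/metadata file

    Files.createDirectories(dstDir);
    // The block file and its .meta file must always be moved together.
    Files.move(srcDir.resolve(block), dstDir.resolve(block));
    Files.move(srcDir.resolve(meta), dstDir.resolve(meta));
  }
}
{code}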

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node
>Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462927#comment-13462927
 ] 

Allen Wittenauer commented on HDFS-1312:


bq. The only way we have found to move blocks internally (without taking the 
cluster down completely) is to decommission the node and have it empty and then 
re-add it to the cluster so the balancer can take over and move block back onto 
it.

That's the slow way. You can also take down the data node, move the blocks 
around on the disks, and restart the data node.

The other thing is that as you grow a grid, you care less and less about the 
balance on individual nodes. This issue is of primary importance to smaller 
installations, which are likely under-provisioned hardware-wise anyway.



[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2012-09-25 Thread Steve Hoffman (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462916#comment-13462916
 ] 

Steve Hoffman commented on HDFS-1312:
-

Wow, I can't believe this is still lingering out there as a new feature 
request.  I'd argue this is a bug -- and a big one.  Here's why:

* You have N 12x3TB machines in your cluster.
* 1 disk fails on a 12-drive machine.  Let's say the disks were each 80% full.
* You install the replacement drive (0% full), but by the time you do this the 
under-replicated blocks have already been re-replicated (on this and other nodes).
* The 0% full drive will fill at the same rate as the other disks.  That 
machine's other 11 disks will fill to 100%, because block placement is done at 
the node level and the node seems to use a round-robin algorithm across disks 
even though one disk has far more free space (see the toy simulation below).
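
(A toy simulation of that last point, under assumed numbers: 11 disks at 80% 
and one fresh disk, with blocks placed round-robin regardless of free space. 
Illustrative only, not DataNode code.)

{code:java}
// Toy illustration: round-robin block placement across 12 equally sized disks,
// 11 of them 80% full and one freshly replaced (0% full).
public class RoundRobinFillDemo {
  public static void main(String[] args) {
    final long diskCapacity = 3_000_000L;     // arbitrary units (~3TB)
    final long blockSize = 128L;              // arbitrary block size
    long[] used = new long[12];
    for (int i = 0; i < 11; i++) {
      used[i] = (long) (diskCapacity * 0.8);  // the 11 surviving disks
    }
    used[11] = 0;                             // the replacement disk

    int next = 0;
    while (true) {
      int d = next++ % used.length;
      if (used[d] + blockSize > diskCapacity) {
        break;                                // node hits a full volume
      }
      used[d] += blockSize;
    }
    for (int i = 0; i < used.length; i++) {
      System.out.printf("disk %2d: %3.0f%% full%n",
          i, 100.0 * used[i] / diskCapacity);
    }
    // Prints ~100% for disks 0-10 and only ~20% for disk 11: the old disks
    // run out of space long before the replacement disk is meaningfully used.
  }
}
{code}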

The only way we have found to move blocks internally (without taking the 
cluster down completely) is to decommission the node, let it empty, and then 
re-add it to the cluster so the balancer can take over and move blocks back onto 
it.

Hard drives fail.  This isn't news to anybody.  The larger (12-disk) nodes only 
make the problem worse in terms of the time to empty and refill.  Even if you had 
a 1U 4-disk machine it is still bad 'cause you lose 25% of that node's capacity 
on a single disk failure, whereas the impact on a 12-disk machine is less than 
9% (1 of 12 disks, about 8.3%). 

The remove/add of a complete node seems like a pretty poor option.
Or am I alone in this?  Can we please revive this JIRA?



[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-03-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007403#comment-13007403
 ] 

Steve Loughran commented on HDFS-1312:
--

Another driver for rebalancing is a cluster set up with mapred temp space 
allocated to the same partitions as HDFS. This delivers the best IO rate for temp 
storage, but once that temp space is reclaimed, the disks can end up unbalanced.



[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-03-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007204#comment-13007204
 ] 

Sanjay Radia commented on HDFS-1312:


>> From my point, there are mainly two cases of re-balance:
>There's a third, and one I suspect is more common than people realize: mass 
>deletes. 
Since creates are distributed, there is a high probability that deletes will 
also be distributed.



[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-03-10 Thread Wang Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005542#comment-13005542
 ] 

Wang Xu commented on HDFS-1312:
---

Hi Allen,

For the JSP showing the local filesystems, I will comment on HDFS-1121 with 
detailed requirements.

And to simplify the design, I agree that we could move only finalized blocks 
and not touch existing dirs.



[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-03-04 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002509#comment-13002509
 ] 

Allen Wittenauer commented on HDFS-1312:


>For the cluster monitoring issue, we could not expect an integrated monitoring 
> substitute the external monitoring system. Thus, we need to make clear
> what information gathering requirements should be considered inside hdfs.

There is nothing preventing us from building a jsp that runs on the datanode 
and shows the file systems in use and the relevant stats for those filesystems.
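
(A minimal sketch of the kind of per-filesystem stats such a page could surface, 
assuming only that the configured data directories are known; the class and 
method names are made up for illustration.)

{code:java}
import java.io.File;

// Hypothetical sketch of the per-volume stats a DataNode status page could
// show: one row per configured data directory, with capacity, free space and
// percent used taken straight from the local filesystem.
public class VolumeStatsReport {
  public static String report(String[] dataDirs) {   // e.g. from dfs.data.dir
    StringBuilder sb = new StringBuilder("volume,capacity,free,pct_used\n");
    for (String dir : dataDirs) {
      File f = new File(dir);
      long cap = f.getTotalSpace();
      long free = f.getUsableSpace();
      double pctUsed = cap == 0 ? 0.0 : 100.0 * (cap - free) / cap;
      sb.append(String.format("%s,%d,%d,%.1f%n", dir, cap, free, pctUsed));
    }
    return sb.toString();
  }
}
{code}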

>And the "lock" I mentions is to stop write new blocks into the volume,
> which is the simplest way to migrate blocks. 

This is going to be a big performance hit.  I think the focus should be on 
already written/closed blocks and just ignore new blocks.






[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-03-03 Thread Wang Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002437#comment-13002437
 ] 

Wang Xu commented on HDFS-1312:
---

Hi Allen,

For the cluster monitoring issue, we cannot expect an integrated monitoring 
facility to substitute for the external monitoring system. Thus, we need to make 
clear what information-gathering requirements should be considered inside HDFS.

And the "lock" I mention is to stop writing new blocks to the volume, which is 
the simplest way to migrate blocks. And I do think depth matters: from my 
understanding, the deeper the dir is, the more IO is wasted reaching the block.





[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-03-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001765#comment-13001765
 ] 

Allen Wittenauer commented on HDFS-1312:


> IMHO, you can monitor the file distribution among disks with external tools 
> such as ganglia, is it required to integrate it in the Web interface of HDFS?

Yes.  First off, Ganglia sucks at large scales for too many reasons to go into 
here.  Secondly, only Hadoop really knows what file systems are in play for 
HDFS.

> From my point, there are mainly two cases of re-balance:

There's a third, and one I suspect is more common than people realize:  mass 
deletes.  

> Re-balance should be only process while it is not in heavy load 
> (should this be guaranteed by the administrator?)

and

> Lock origin disks: stop written to them and wait finalization on them.

I don't think it is realistic to expect the system to be idle during a 
rebalance.  Ultimately, it shouldn't matter from the rebalancer's perspective 
anyway; the only performance hit that should be noticeable would be for blocks 
in the middle of being moved.  Even then, the DN process knows what blocks are 
where and can read from the 'old' location.
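
(A rough sketch of that copy-then-switch idea, under the assumption that the 
destination copy is made durable before anything is repointed and the old file 
removed; all names are illustrative, not the actual DN implementation.)

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: move a finalized block between volumes while keeping the
// source copy readable until the destination copy is safely on disk.
public class CopyThenSwitch {
  public static void moveReplica(Path src, Path dst) throws IOException {
    Path tmp = dst.resolveSibling(dst.getFileName() + ".tmp");
    Files.copy(src, tmp);                     // reads still go to src
    try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
      ch.force(true);                         // make sure the copy is durable
    }
    Files.move(tmp, dst);                     // publish the new copy
    // Only now would the DN repoint its replica map at dst and drop src;
    // until then every read is served from the old location.
    Files.delete(src);
  }
}
{code}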

If DNs being idle is a requirement, then one is better off just shutting down 
the DN (and TT?) processes and doing the moves offline.

> Find the deepest dirs in every selected disk and move blocks from those dirs.
> And if a dir is empty, then the dir should also be removed.

Why does depth matter?

> If two or more dirs are located in a same disk, they might confuse the 
> space calculation. And this is just the case in MiniDFSCluster deployment.

This is also the case with pooled storage systems (such as ZFS) on real 
clusters already.





[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-03-02 Thread Wang Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001477#comment-13001477
 ] 

Wang Xu commented on HDFS-1312:
---

Hi folks,

Here is the basic design of the process. Are there any other considerations?

The basic flow is:
# Re-balancing should only run while the node is not under heavy load (should 
this be guaranteed by the administrator?)
# Calculate the total and average available & used space of the dirs.
# Find the disks with the most and the least space, and decide the move 
direction. We need to define an imbalance threshold here to decide whether 
re-balancing is worthwhile (a rough sketch of this selection step follows below).
# Lock the origin disks: stop writing to them and wait for finalization on them.
# Find the deepest dirs in every selected disk and move blocks from those dirs. 
If a dir becomes empty, the dir should also be removed.
# Check the balance status while the blocks are migrated, and break out of the 
loop once it reaches a threshold.
# Release the lock.

Cases that should be taken into account:
* If a disk has much less total space than the other disks, it might have the 
least available space yet be unable to migrate blocks out.
* If two or more dirs are located on the same disk, they might confuse the space 
calculation. This is exactly the case in a MiniDFSCluster deployment.
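
(A minimal sketch of the disk-selection step (2-3) above, assuming each volume 
reports its capacity and used bytes; the threshold value and all names are 
illustrative, not a proposed implementation.)

{code:java}
// Illustrative planning step for the flow above: compute each volume's used
// ratio, pick the most- and least-used volumes, and only plan a move if the
// spread exceeds a configurable imbalance threshold.
public class RebalancePlanner {
  public static class Volume {
    final String dir;
    final long capacity;   // bytes
    final long used;       // bytes
    Volume(String dir, long capacity, long used) {
      this.dir = dir; this.capacity = capacity; this.used = used;
    }
    double usedRatio() { return capacity == 0 ? 0.0 : (double) used / capacity; }
  }

  /** Returns {source, target} or null if the node is already balanced enough. */
  public static Volume[] plan(Volume[] volumes, double threshold) {
    if (volumes == null || volumes.length == 0) {
      return null;                            // nothing to plan
    }
    Volume src = volumes[0], dst = volumes[0];
    for (Volume v : volumes) {
      if (v.usedRatio() > src.usedRatio()) src = v;
      if (v.usedRatio() < dst.usedRatio()) dst = v;
    }
    // Step 3: only worth re-balancing if the spread exceeds the threshold.
    if (src.usedRatio() - dst.usedRatio() < threshold) {
      return null;
    }
    return new Volume[] { src, dst };         // move blocks from src towards dst
  }
}
{code}

Working on used ratios rather than absolute free space also partly addresses the 
first caveat above about disks of very different sizes.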





[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-02-21 Thread Wang Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997302#comment-12997302
 ] 

Wang Xu commented on HDFS-1312:
---

Hi Steve,

From my point of view, there are mainly two cases of re-balance:
# The admin replaces a disk, and then triggers the re-balance at once or at a 
specific time.
# The monitoring system observes a disk-space distribution issue and raises a 
warning; the admin then triggers the re-balance at once or at a specific time.

Do you mean we should integrate a monitoring system inside HDFS?





[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-02-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996842#comment-12996842
 ] 

Steve Loughran commented on HDFS-1312:
--

I think having a remote web view is useful in two ways:
- it lets people see the basics of what is going on within the entire cluster 
(yes, that will need some aggregation eventually)
- it lets you write tests that hit the status pages and so verify that the 
rebalancing worked.





[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-02-18 Thread Wang Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996531#comment-12996531
 ] 

Wang Xu commented on HDFS-1312:
---

And @Steve, I did not intend to extend HDFS-1362 to cover this issue; I only 
meant they are related.





[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-02-18 Thread Wang Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996525#comment-12996525
 ] 

Wang Xu commented on HDFS-1312:
---

Hi Steve,

I have not understood the necessity of HDFS-1121. IMHO, you can monitor the 
file distribution among disks with external tools such as Ganglia; is it 
required to integrate it into the Web interface of HDFS?

I think the regular routine is to find the problem in the cluster management 
system and then trigger the rebalance action in HDFS.





[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-02-18 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996487#comment-12996487
 ] 

Steve Loughran commented on HDFS-1312:
--

HDFS-1362 and this issue are part of the HDFS-664 problem "support efficient 
hotswap". 
Before worrying about this one, consider HDFS-1121, which is to provide a way to 
monitor the distribution (i.e. a web view). That web/management view would be how 
we'd test that the rebalancing works, so it's a pre-req. Also it's best to keep 
the issues independent (where possible), so worry about getting HDFS-1362 in 
first before trying to extend it. 

That said, because the number of HDDs per server is growing to 12 or more at 
2TB/unit, with 3TB on the horizon, we will need this feature in the 0.23-0.24 
timeframe.





[jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

2011-02-17 Thread Wang Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996274#comment-12996274
 ] 

Wang Xu commented on HDFS-1312:
---

I think re-balance is just the next step after substituting a failed disk as in 
HDFS-1362. And since I've just finished the patch for HDFS-1362, I would like to 
implement this feature if no one has done it yet.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira