[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-31 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243059#comment-13243059
 ] 

Uma Maheswara Rao G commented on HDFS-3070:
---

Aaron, you are right. We saw this yesterday and realized it too. :-)
Before Federation, the balancer may not have needed to load properties from 
hdfs-site.xml; some deployments may have run with the default values set in 
the code, because a plain Configuration can still load the core-site.xml files.

I agree with the fix of creating an HdfsConfiguration instance and passing it in.

To catch this bug in the tests themselves, I would suggest having them call 
runBalancerCLI (a new package-scoped API exposed from Balancer) and making the 
run method private.
{code}
static int runBalancerCLI(String[] args) throws Exception {
  // The fix goes here: pass an HdfsConfiguration instead of null.
  return ToolRunner.run(null, new Cli(), args);
}
{code}

Let the main method and all tests call this function.


Output from the tests:

{quote}
2012-03-31 12:19:47,340 INFO  balancer.Balancer (Balancer.java:parse(1508)) - Using a threshold of 10.0
2012-03-31 12:19:47,340 INFO  balancer.Balancer (Balancer.java:run(1387)) - namenodes = []
2012-03-31 12:19:47,340 INFO  balancer.Balancer (Balancer.java:run(1388)) - p = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Balancing took 1.0 milliseconds
2012-03-31 12:19:47,341 INFO  balancer.Balancer (TestBalancerWithMultipleNameNodes.java:runBalancer(164)) - BALANCER 2
2012-03-31 12:19:47,341 INFO  balancer.Balancer (TestBalancerWithMultipleNameNodes.java:wait(132)) - WAIT expectedUsedSpace=350, expectedTotalSpace=1000
2012-03-31 12:19:47,341 INFO  balancer.Balancer (TestBalancerWithMultipleNameNodes.java:runBalancer(166)) - BALANCER 3
2012-03-31 12:19:47,342 WARN  balancer.Balancer (TestBalancerWithMultipleNameNodes.java:runBalancer(183)) - datanodes[0]: getDfsUsed()=60, getCapacity()=500
2012-03-31 12:19:47,343 WARN  balancer.Balancer (TestBalancerWithMultipleNameNodes.java:runBalancer(183)) - datanodes[1]: getDfsUsed()=290, getCapacity()=500
2012-03-31 12:19:47,344 WARN  balancer.Balancer (TestBalancerWithMultipleNameNodes.java:runBalancer(200)) - datanodes 1 is not yet balanced: used=290, cap=500, avg=35.0
{quote}
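
The numbers in this log can be checked by hand against the balancer's documented definition of "balanced": every datanode's utilization must be within the threshold of the cluster-wide utilization. The following is only a minimal sketch of that arithmetic; `BalanceCheck` and its methods are hypothetical names for illustration and are not taken from the Balancer source.

```java
/** Hypothetical helper for illustration; not part of the Hadoop Balancer. */
final class BalanceCheck {

    /** Utilization as a percentage: used space divided by capacity. */
    static double utilization(long used, long capacity) {
        return 100.0 * used / capacity;
    }

    /**
     * Returns true if every datanode's utilization differs from the
     * cluster-wide utilization by no more than thresholdPercent.
     */
    static boolean isBalanced(long[] used, long[] capacity, double thresholdPercent) {
        long totalUsed = 0;
        long totalCapacity = 0;
        for (int i = 0; i < used.length; i++) {
            totalUsed += used[i];
            totalCapacity += capacity[i];
        }
        double clusterUtilization = utilization(totalUsed, totalCapacity);

        for (int i = 0; i < used.length; i++) {
            double nodeUtilization = utilization(used[i], capacity[i]);
            if (Math.abs(nodeUtilization - clusterUtilization) > thresholdPercent) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Values from the test log: datanodes[0] 60/500, datanodes[1] 290/500.
        long[] used = {60, 290};
        long[] capacity = {500, 500};
        // Cluster utilization is 350/1000 = 35%; the nodes sit at 12% and 58%.
        System.out.println(isBalanced(used, capacity, 10.0)); // prints "false"
    }
}
```

With the default threshold of 10, both datanodes are 23 points away from the 35% average, so a balancer that actually found the namenode would have had work to do here.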

Remove HdfsConfiguration object creation from Balancer Tests. 

 hdfs balancer doesn't balance blocks between datanodes
 --

 Key: HDFS-3070
 URL: https://issues.apache.org/jira/browse/HDFS-3070
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.0.0
Reporter: Stephen Chu
Assignee: Aaron T. Myers
 Attachments: HDFS-3070.patch, unbalanced_nodes.png, 
 unbalanced_nodes_inservice.png


 I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI, 
 both have over 3% disk usage.
 Attached is a screenshot of the Live Nodes web UI.
 On styx01, I run the _hdfs balancer_ command with threshold 1% and don't see 
 the blocks being balanced across all 4 datanodes (all blocks on styx01 and 
 styx02 stay put).
 HA is currently enabled.
 [schu@styx01 ~]$ hdfs haadmin -getServiceState nn1
 active
 [schu@styx01 ~]$ hdfs balancer -threshold 1
 12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0
 12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []
 12/03/08 10:10:32 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0]
 Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
 Balancing took 95.0 milliseconds
 [schu@styx01 ~]$ 
 I believe with a threshold of 1% the balancer should trigger blocks being 
 moved across DataNodes, right? I am curious about the namenodes = [] line in 
 the above output.
 [schu@styx01 ~]$ hadoop version
 Hadoop 0.24.0-SNAPSHOT
 Subversion git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common -r f6a577d697bbcd04ffbc568167c97b79479ff319
 Compiled by schu on Thu Mar  8 15:32:50 PST 2012
 From source with checksum ec971a6e7316f7fbf471b617905856b8
 From 
 http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html:
 The threshold parameter is a fraction in the range of (0%, 100%) with a 
 default value of 10%. The threshold sets a target for whether the cluster is 
 balanced. A cluster is balanced if for each datanode, the utilization of the 
 node (ratio of used space at the node to total capacity of the node) differs 
 from the utilization of the cluster (ratio of used space in the cluster to 
 total capacity of the cluster) by no more than the threshold value. The 
 smaller the threshold, the more balanced a cluster will become. It takes more 
 time to run the balancer for small threshold values. Also for a very small 
 threshold the cluster may not be able to reach the balanced state when 
 applications write and delete files concurrently.

[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-31 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243063#comment-13243063
 ] 

Uma Maheswara Rao G commented on HDFS-3070:
---

BTW, could you please edit the issue title?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-31 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243067#comment-13243067
 ] 

Aaron T. Myers commented on HDFS-3070:
--

Hi Uma,

bq. To catch this bug in tests itself, I would suggest to call the 
runBalancerCLI...

I don't think this will actually expose the bug. The trouble isn't that the 
object isn't an instance of HdfsConfiguration, but rather that 
HdfsConfiguration never gets class-loaded, and therefore the static initializer 
that adds hdfs-default.xml and hdfs-site.xml as resources never gets called. 
Another perfectly valid solution would have been to continue to pass null for 
the configuration object, but to call HdfsConfiguration#init() somewhere 
(anywhere) in the Balancer. So the only way to write a test that would catch 
this would be to fork a new JVM from the tests to run the balancer and then 
examine the effects. Doing that doesn't seem worth it to me for such a simple 
bug.
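
The class-loading effect described above can be reproduced in isolation. This is a sketch under stated assumptions: `BaseConf` and `HdfsLikeConf` are hypothetical stand-ins that only mimic the pattern of a base class holding a global list of default resources and a subclass registering more from its static initializer. They are not the Hadoop classes themselves.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a base configuration class with a global default-resource list. */
class BaseConf {
    static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

    static {
        DEFAULT_RESOURCES.add("core-default.xml");
        DEFAULT_RESOURCES.add("core-site.xml");
    }

    /** Registers a resource name once; repeated calls are ignored. */
    static synchronized void addDefaultResource(String name) {
        if (!DEFAULT_RESOURCES.contains(name)) {
            DEFAULT_RESOURCES.add(name);
        }
    }
}

/** Stand-in for a subclass whose static initializer registers extra resources. */
class HdfsLikeConf extends BaseConf {
    static {
        addDefaultResource("hdfs-default.xml");
        addDefaultResource("hdfs-site.xml");
    }

    /** Empty on purpose: calling it only forces this class to initialize. */
    static void init() {
    }
}

final class ClassLoadDemo {
    public static void main(String[] args) {
        // Only BaseConf has been initialized so far, so the hdfs-* files are missing.
        System.out.println(BaseConf.DEFAULT_RESOURCES);
        // prints [core-default.xml, core-site.xml]

        // Invoking a static method declared by HdfsLikeConf runs its static initializer.
        HdfsLikeConf.init();
        System.out.println(BaseConf.DEFAULT_RESOURCES);
        // prints [core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml]
    }
}
```

The first print happens in a JVM where the subclass has never been touched, which is the state the balancer was in. A unit test that has already created the subclass elsewhere in the same JVM never sees that state, which is why only a forked JVM would catch the bug.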


bq. BTW, could you please edit the issue title?

Good idea. Will do.





[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-30 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242987#comment-13242987
 ] 

Hadoop QA commented on HDFS-3070:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12520710/HDFS-3070.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2136//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2136//console

This message is automatically generated.





[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-12 Thread Stephen Chu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227770#comment-13227770
 ] 

Stephen Chu commented on HDFS-3070:
---

Eli, servicerpc-address was not configured in hdfs-site.xml.

dfs.namenode.rpc-address:
{noformat}
  <property>
    <name>dfs.namenode.rpc-address.ha-nn-uri.nn1</name>
    <value>styx01.sf.cloudera.com:12020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ha-nn-uri.nn2</name>
    <value>styx02.sf.cloudera.com:12020</value>
  </property>
{noformat}


 hdfs balancer doesn't balance blocks between datanodes
 --

 Key: HDFS-3070
 URL: https://issues.apache.org/jira/browse/HDFS-3070
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 0.24.0
Reporter: Stephen Chu
 Attachments: unbalanced_nodes.png, unbalanced_nodes_inservice.png






[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-12 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227775#comment-13227775
 ] 

Aaron T. Myers commented on HDFS-3070:
--

If the lack of having servicerpc-address configured caused this, then I would 
still consider that a bug. The balancer should work even if only the normal NN 
RPC address is configured.
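
The expected behaviour can be sketched as a lookup that prefers the service RPC address and falls back to the normal RPC address. This is only an illustration of that expectation: `NameNodeAddressLookup` and `resolve` are hypothetical names, the configuration is modelled as a plain map, and none of this is the actual Hadoop lookup code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; the real lookup lives in Hadoop's own utility code. */
final class NameNodeAddressLookup {
    static final String SERVICE_RPC_KEY = "dfs.namenode.servicerpc-address";
    static final String RPC_KEY = "dfs.namenode.rpc-address";

    /**
     * Resolves the address of one namenode. The suffix identifies the
     * namenode, for example "ha-nn-uri.nn1". The service RPC address wins
     * if it is set; otherwise the normal RPC address is used.
     */
    static Optional<String> resolve(Map<String, String> conf, String suffix) {
        String service = conf.get(SERVICE_RPC_KEY + "." + suffix);
        if (service != null && !service.isEmpty()) {
            return Optional.of(service);
        }
        String rpc = conf.get(RPC_KEY + "." + suffix);
        if (rpc != null && !rpc.isEmpty()) {
            return Optional.of(rpc);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Only the normal RPC addresses are configured, as in the reported setup.
        Map<String, String> conf = new HashMap<>();
        conf.put(RPC_KEY + ".ha-nn-uri.nn1", "styx01.sf.cloudera.com:12020");
        conf.put(RPC_KEY + ".ha-nn-uri.nn2", "styx02.sf.cloudera.com:12020");

        System.out.println(resolve(conf, "ha-nn-uri.nn1").orElse("<none>"));
        // prints styx01.sf.cloudera.com:12020
    }
}
```

Under this model, a configuration with only dfs.namenode.rpc-address set still yields a non-empty namenode list.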





[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-12 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228068#comment-13228068
 ] 

Eli Collins commented on HDFS-3070:
---

Yea sounds like a bug in the method the balancer uses to determine the 
namenodes.





[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-09 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226590#comment-13226590
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3070:
--

 12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []

The namenode list is empty.  You have to set dfs.namenode.servicerpc-address.





[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-09 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226599#comment-13226599
 ] 

Eli Collins commented on HDFS-3070:
---

Stephen,
What are dfs.namenode.rpc-address and servicerpc-address set to in the configs?

I suspect at least the first is set, so it might be a bug in the method the 
balancer uses to determine the namenodes (e.g. it doesn't work for a federated 
or HA conf).
