[jira] [Updated] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-30 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3070:
-

Attachment: HDFS-3070.patch

Sigh. Looks like this is the classic "hdfs-site.xml never gets loaded 
because HdfsConfiguration is never statically initialized in the JVM" issue. 
The tests don't catch this because MiniDFSCluster sets up the configuration 
explicitly, so hdfs-site.xml never has to be loaded.
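
The failure mode described above can be modelled without Hadoop. The sketch below is a simplified stand-in, not the code from the patch: the class names Conf, HdfsConf and StaticInitDemo are hypothetical, and the only assumption carried over is that hdfs-site.xml is registered as a default resource from a static initializer, which the JVM runs only when the class is first initialized.

```java
import java.util.ArrayList;
import java.util.List;

class StaticInitDemo {
    public static void main(String[] args) {
        // Nothing has touched HdfsConf yet, so its static block has not run.
        System.out.println("before init: " + Conf.defaultResources()); // prints "before init: []"

        // Invoking a static method of HdfsConf forces the JVM to initialize it.
        HdfsConf.init();

        // The static block has now registered hdfs-site.xml.
        System.out.println("after init:  " + Conf.defaultResources()); // prints "after init:  [hdfs-site.xml]"
    }
}

// Hypothetical stand-in for a base configuration class that keeps a
// process-wide list of default resource files.
class Conf {
    private static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

    static synchronized void addDefaultResource(String name) {
        if (!DEFAULT_RESOURCES.contains(name)) {
            DEFAULT_RESOURCES.add(name);
        }
    }

    // Returns a copy so callers cannot modify the shared list.
    static synchronized List<String> defaultResources() {
        return new ArrayList<>(DEFAULT_RESOURCES);
    }
}

// Hypothetical stand-in for the HDFS-specific subclass. The static block runs
// only when the JVM initializes this class, i.e. on its first active use.
class HdfsConf extends Conf {
    static {
        Conf.addDefaultResource("hdfs-site.xml");
    }

    // Intentionally empty: calling it exists only to trigger the static block.
    static void init() {
    }
}
```

A tool whose code path only ever uses the base class behaves like the "before init" line: the site file is never registered, so settings that live only there (such as the NameNode addresses) come back empty.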

Here's a patch which addresses the issue. I tested this manually and confirmed 
that without the fix, the balancer won't run, but with the fix it runs just 
fine. Sample output:

{noformat}
12/03/30 19:06:08 INFO balancer.Balancer: namenodes = [hdfs://ha-nn-uri]
12/03/30 19:06:08 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
12/03/30 19:06:09 INFO net.NetworkTopology: Adding a new node: /default-rack/172.29.20.100:50010
12/03/30 19:06:09 INFO balancer.Balancer: 0 over-utilized: []
12/03/30 19:06:09 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Balancing took 1.255 seconds
{noformat}

 hdfs balancer doesn't balance blocks between datanodes
 --

 Key: HDFS-3070
 URL: https://issues.apache.org/jira/browse/HDFS-3070
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.0.0
Reporter: Stephen Chu
Assignee: Aaron T. Myers
 Attachments: HDFS-3070.patch, unbalanced_nodes.png, 
 unbalanced_nodes_inservice.png


 I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI, 
 both have over 3% disk usage.
 Attached is a screenshot of the Live Nodes web UI.
 On styx01, I ran the _hdfs balancer_ command with threshold 1% and didn't see 
 the blocks being balanced across all 4 datanodes (all blocks on styx01 and 
 styx02 stayed put).
 HA is currently enabled.
 [schu@styx01 ~]$ hdfs haadmin -getServiceState nn1
 active
 [schu@styx01 ~]$ hdfs balancer -threshold 1
 12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0
 12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []
 12/03/08 10:10:32 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0]
 Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
 Balancing took 95.0 milliseconds
 [schu@styx01 ~]$ 
 I believe that with a threshold of 1% the balancer should move blocks 
 across DataNodes, right? I am curious about the namenodes = [] line in 
 the above output.
 [schu@styx01 ~]$ hadoop version
 Hadoop 0.24.0-SNAPSHOT
 Subversion git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common -r f6a577d697bbcd04ffbc568167c97b79479ff319
 Compiled by schu on Thu Mar  8 15:32:50 PST 2012
 From source with checksum ec971a6e7316f7fbf471b617905856b8
 From http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html:
 The threshold parameter is a fraction in the range of (0%, 100%) with a 
 default value of 10%. The threshold sets a target for whether the cluster is 
 balanced. A cluster is balanced if for each datanode, the utilization of the 
 node (ratio of used space at the node to total capacity of the node) differs 
 from the utilization of the cluster (ratio of used space in the cluster to total 
 capacity of the cluster) by no more than the threshold value. The smaller the 
 threshold, the more balanced a cluster will become. It takes more time to run 
 the balancer for small threshold values. Also for a very small threshold the 
 cluster may not be able to reach the balanced state when applications write 
 and delete files concurrently.
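
 The definition quoted above can be written out as a small check. This is a sketch of the quoted definition only, not the Balancer's implementation; the class BalanceCheck, its method names, and the capacity and usage figures are made up for illustration (four equal nodes, two of them at 3.5% usage and two empty).

```java
class BalanceCheck {
    // Utilization as a percentage: used space divided by total capacity.
    static double utilizationPercent(long used, long capacity) {
        return 100.0 * used / capacity;
    }

    // A cluster is balanced if every node's utilization differs from the
    // cluster-wide utilization by no more than the threshold (in percent).
    static boolean isBalanced(long[] used, long[] capacity, double thresholdPercent) {
        long totalUsed = 0;
        long totalCapacity = 0;
        for (int i = 0; i < used.length; i++) {
            totalUsed += used[i];
            totalCapacity += capacity[i];
        }
        double clusterUtil = utilizationPercent(totalUsed, totalCapacity);
        for (int i = 0; i < used.length; i++) {
            double nodeUtil = utilizationPercent(used[i], capacity[i]);
            if (Math.abs(nodeUtil - clusterUtil) > thresholdPercent) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: two nodes at 3.5%, two nodes at 0%.
        long[] used = {35, 35, 0, 0};
        long[] capacity = {1000, 1000, 1000, 1000};

        // Cluster utilization is 1.75%, so every node is 1.75 points away from it.
        System.out.println(isBalanced(used, capacity, 1.0));  // prints "false"
        System.out.println(isBalanced(used, capacity, 10.0)); // prints "true"
    }
}
```

 Under these assumed figures a threshold of 1% leaves the cluster unbalanced, which matches the expectation that the balancer should have had work to do.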

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-30 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3070:
-

 Target Version/s: 2.0.0  (was: 0.23.3)
Affects Version/s: 2.0.0  (was: 0.24.0)
           Status: Patch Available  (was: Open)





[jira] [Updated] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-09 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3070:
--

Target Version/s: 0.23.3

 hdfs balancer doesn't balance blocks between datanodes
 --

 Key: HDFS-3070
 URL: https://issues.apache.org/jira/browse/HDFS-3070
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 0.24.0
Reporter: Stephen Chu
 Attachments: unbalanced_nodes.png, unbalanced_nodes_inservice.png






[jira] [Updated] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-08 Thread Stephen Chu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-3070:
--

Attachment: unbalanced_nodes.png

 hdfs balancer doesn't balance blocks between datanodes
 --

 Key: HDFS-3070
 URL: https://issues.apache.org/jira/browse/HDFS-3070
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 0.24.0
Reporter: Stephen Chu
 Attachments: unbalanced_nodes.png






[jira] [Updated] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

2012-03-08 Thread Stephen Chu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-3070:
--

Attachment: unbalanced_nodes_inservice.png

Whoops, the first screenshot shows that 2 nodes are decommissioned. After 
recommissioning them and attempting to run hdfs balancer, the nodes still don't 
become balanced and the balancer claims to complete in ~100 ms.

 hdfs balancer doesn't balance blocks between datanodes
 --

 Key: HDFS-3070
 URL: https://issues.apache.org/jira/browse/HDFS-3070
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 0.24.0
Reporter: Stephen Chu
 Attachments: unbalanced_nodes.png, unbalanced_nodes_inservice.png

