hdfs balancer doesn't balance blocks between datanodes
------------------------------------------------------

                 Key: HDFS-3070
                 URL: https://issues.apache.org/jira/browse/HDFS-3070
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: balancer
    Affects Versions: 0.24.0
            Reporter: Stephen Chu
         Attachments: unbalanced_nodes.png

I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI, 
both have over 3% disk usage.
Attached is a screenshot of the Live Nodes web UI.

On styx01, I ran the _hdfs balancer_ command with a threshold of 1%, but the 
blocks were not balanced across all 4 DataNodes (all blocks on styx01 and 
styx02 stayed put).

HA is currently enabled.

[schu@styx01 ~]$ hdfs haadmin -getServiceState nn1
active
[schu@styx01 ~]$ hdfs balancer -threshold 1
12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0
12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []
12/03/08 10:10:32 INFO balancer.Balancer: p         = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0]
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Balancing took 95.0 milliseconds
[schu@styx01 ~]$ 

I believe that with a threshold of 1% the balancer should move blocks across 
DataNodes, right? I am also curious about the "namenodes = []" line in the 
output above.

[schu@styx01 ~]$ hadoop version
Hadoop 0.24.0-SNAPSHOT
Subversion git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common -r f6a577d697bbcd04ffbc568167c97b79479ff319
Compiled by schu on Thu Mar  8 15:32:50 PST 2012
From source with checksum ec971a6e7316f7fbf471b617905856b8

From http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html:
The threshold parameter is a fraction in the range of (0%, 100%) with a default 
value of 10%. The threshold sets a target for whether the cluster is balanced. 
A cluster is balanced if for each datanode, the utilization of the node (ratio 
of used space at the node to total capacity of the node) differs from the 
utilization of the cluster (ratio of used space in the cluster to total capacity of the 
cluster) by no more than the threshold value. The smaller the threshold, the 
more balanced a cluster will become. It takes more time to run the balancer for 
small threshold values. Also for a very small threshold the cluster may not be 
able to reach the balanced state when applications write and delete files 
concurrently.
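
To make the quoted definition concrete, here is a minimal sketch of the balance check as the Javadoc describes it. It is not the Balancer's actual code; the function name and the per-node numbers are hypothetical (the report only says styx01 and styx02 are "over 3%", so 3% and 0% are illustrative values).

```python
def is_balanced(nodes, threshold_pct):
    """Apply the Javadoc definition of a balanced cluster.

    nodes: list of (used_bytes, capacity_bytes) pairs, one per datanode.
    threshold_pct: threshold in percent, e.g. 1.0 for "-threshold 1".
    """
    total_used = sum(used for used, _ in nodes)
    total_capacity = sum(capacity for _, capacity in nodes)
    # Cluster utilization: used space in the cluster / total capacity.
    cluster_util = 100.0 * total_used / total_capacity
    # Every node's utilization must be within the threshold of the
    # cluster utilization for the cluster to count as balanced.
    return all(
        abs(100.0 * used / capacity - cluster_util) <= threshold_pct
        for used, capacity in nodes
    )


# Illustrative numbers only: four equally sized nodes, two at 3% and
# two empty, so the cluster utilization is 1.5%.
nodes = [(3, 100), (3, 100), (0, 100), (0, 100)]
print(is_balanced(nodes, 1.0))   # each node deviates by 1.5 points
print(is_balanced(nodes, 10.0))  # the default threshold of 10%
```

Under these assumed numbers every node is 1.5 percentage points away from the cluster utilization, so by the definition above the cluster is not balanced at a 1% threshold, while it is at the default 10%.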
