[ https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243067#comment-13243067 ]
Aaron T. Myers commented on HDFS-3070:
--------------------------------------

Hi Uma,

bq. To catch this bug in tests itself, I would suggest to call the runBalancerCLI...

I don't think this will actually expose the bug. The trouble isn't that the object isn't an instance of HdfsConfiguration, but rather that HdfsConfiguration never gets class-loaded, and therefore the static initializer that adds hdfs-default.xml and hdfs-site.xml as resources never gets called. Another perfectly valid solution would have been to continue to pass "null" for the configuration object, but to call HdfsConfiguration#init() somewhere (anywhere) in the Balancer.

So, the only way to write a test that would catch this would be to fork a new JVM from the tests to run the balancer, and then examine the effects. Doing that doesn't seem worth it to me for such a simple bug.

bq. BTW, could you please edit the issue title?

Good idea. Will do.

> hdfs balancer doesn't balance blocks between datanodes
> ------------------------------------------------------
>
>                 Key: HDFS-3070
>                 URL: https://issues.apache.org/jira/browse/HDFS-3070
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 2.0.0
>            Reporter: Stephen Chu
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-3070.patch, unbalanced_nodes.png, unbalanced_nodes_inservice.png
>
>
> I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI,
> both have over 3% disk usage.
> Attached is a screenshot of the Live Nodes web UI.
> On styx01, I run the _hdfs balancer_ command with threshold 1% and don't see
> the blocks being balanced across all 4 datanodes (all blocks on styx01 and
> styx02 stay put).
> HA is currently enabled.
> [schu@styx01 ~]$ hdfs haadmin -getServiceState nn1
> active
> [schu@styx01 ~]$ hdfs balancer -threshold 1
> 12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0
> 12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []
> 12/03/08 10:10:32 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0]
> Time Stamp  Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
> Balancing took 95.0 milliseconds
> [schu@styx01 ~]$
>
> I believe with a threshold of 1% the balancer should trigger blocks being
> moved across DataNodes, right? I am curious about the "namenode = []" from
> the above output.
>
> [schu@styx01 ~]$ hadoop version
> Hadoop 0.24.0-SNAPSHOT
> Subversion git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common -r f6a577d697bbcd04ffbc568167c97b79479ff319
> Compiled by schu on Thu Mar 8 15:32:50 PST 2012
> From source with checksum ec971a6e7316f7fbf471b617905856b8
>
> From http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html:
> The threshold parameter is a fraction in the range of (0%, 100%) with a
> default value of 10%. The threshold sets a target for whether the cluster is
> balanced. A cluster is balanced if for each datanode, the utilization of the
> node (ratio of used space at the node to total capacity of the node) differs
> from the utilization of the (ratio of used space in the cluster to total
> capacity of the cluster) by no more than the threshold value. The smaller the
> threshold, the more balanced a cluster will become. It takes more time to run
> the balancer for small threshold values. Also for a very small threshold the
> cluster may not be able to reach the balanced state when applications write
> and delete files concurrently.
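
The class-loading point in the comment above can be illustrated with a small standalone sketch. This is not Hadoop code: BaseConf and HdfsLikeConf are hypothetical stand-ins for Configuration and HdfsConfiguration, and the resource list is a plain in-memory list. It only shows the JVM behaviour the comment relies on, namely that a class's static initializer does not run until the class is first initialized, and that calling an otherwise empty static method declared on that class is enough to trigger it.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticInitDemo {

    /** Hypothetical stand-in for Configuration: keeps a global list of default resources. */
    public static class BaseConf {
        static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

        static {
            DEFAULT_RESOURCES.add("core-default.xml");
            DEFAULT_RESOURCES.add("core-site.xml");
        }

        public static void addDefaultResource(String name) {
            if (!DEFAULT_RESOURCES.contains(name)) {
                DEFAULT_RESOURCES.add(name);
            }
        }
    }

    /** Hypothetical stand-in for HdfsConfiguration. */
    public static class HdfsLikeConf extends BaseConf {
        // Runs only when HdfsLikeConf itself is initialized, not when BaseConf is.
        static {
            addDefaultResource("hdfs-default.xml");
            addDefaultResource("hdfs-site.xml");
        }

        /** Intentionally empty: invoking it forces this class to be initialized. */
        public static void init() {
        }
    }

    /** Returns a copy of the currently registered default resources. */
    public static List<String> snapshot() {
        return new ArrayList<>(BaseConf.DEFAULT_RESOURCES);
    }

    public static void main(String[] args) {
        // Only BaseConf has been touched so far, so the hdfs-* entries are missing.
        System.out.println("before init(): " + snapshot());

        // Any first use of HdfsLikeConf would do; init() is the cheapest one.
        HdfsLikeConf.init();
        System.out.println("after init():  " + snapshot());
    }
}
```

Run in a fresh JVM, the first line lists only the two core-* entries and the second line also lists the two hdfs-* entries. A test that has already touched the HDFS-specific class elsewhere in the same JVM would never see the "before" state, which is why a forked JVM would be needed to reproduce the bug.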