[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes

Aaron T. Myers (Commented) (JIRA) Sat, 31 Mar 2012 00:29:15 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243067#comment-13243067
 ]


Aaron T. Myers commented on HDFS-3070:
--------------------------------------

Hi Uma,

bq. To catch this bug in tests itself, I would suggest to call the 
runBalancerCLI...

I don't think this will actually expose the bug. The trouble isn't that the 
object isn't an instance of HdfsConfiguration, but rather that 
HdfsConfiguration never gets class-loaded, and therefore the static initializer 
that adds hdfs-default.xml and hdfs-site.xml as resources never gets called. 
Another perfectly valid solution would have been to continue to pass "null" for 
the configuration object, but to call HdfsConfiguration#init() somewhere 
(anywhere) in the Balancer. So, the only way to write a test that would catch 
this would be to fork a new JVM from the tests to run the balancer and then 
examine the effects. Doing that doesn't seem worth it to me for such a simple 
bug.
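
To illustrate the class-loading point, here is a minimal, self-contained 
sketch. It does not use Hadoop at all: StaticInitDemo, DEFAULT_RESOURCES and 
HdfsLikeConfiguration are hypothetical stand-ins for the real classes, and the 
only thing it is meant to show is that a static initializer does not run until 
the class is first used, and that an empty static init() method is enough to 
force it.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticInitDemo {
    // Hypothetical stand-in for the list of default resources that a
    // configuration object would consult when it is created.
    public static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

    // Hypothetical stand-in for HdfsConfiguration: its static initializer
    // registers the HDFS resource files as a side effect of class
    // initialization.
    public static class HdfsLikeConfiguration {
        static {
            DEFAULT_RESOURCES.add("hdfs-default.xml");
            DEFAULT_RESOURCES.add("hdfs-site.xml");
        }

        // Intentionally empty: calling it only forces the class to be
        // initialized, which runs the static block above exactly once.
        public static void init() {
        }
    }

    public static void main(String[] args) {
        // The nested class has not been used yet, so nothing is registered.
        System.out.println("before init(): " + DEFAULT_RESOURCES);

        // First active use of the class triggers its static initializer.
        HdfsLikeConfiguration.init();
        System.out.println("after init():  " + DEFAULT_RESOURCES);
    }
}
```

In a test JVM where some other code has already touched the class, the 
"before" state can no longer be observed, which is why a forked JVM would be 
needed to catch this kind of bug.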


bq. BTW, could you please edit the issue title?

Good idea. Will do.
                
> hdfs balancer doesn't balance blocks between datanodes
> ------------------------------------------------------
>
>                 Key: HDFS-3070
>                 URL: https://issues.apache.org/jira/browse/HDFS-3070
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 2.0.0
>            Reporter: Stephen Chu
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-3070.patch, unbalanced_nodes.png, 
> unbalanced_nodes_inservice.png
>
>
> I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI, 
> both have over 3% disk usage.
> Attached is a screenshot of the Live Nodes web UI.
> On styx01, I run the _hdfs balancer_ command with threshold 1% and don't see 
> the blocks being balanced across all 4 datanodes (all blocks on styx01 and 
> styx02 stay put).
> HA is currently enabled.
> [schu@styx01 ~]$ hdfs haadmin -getServiceState nn1
> active
> [schu@styx01 ~]$ hdfs balancer -threshold 1
> 12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0
> 12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []
> 12/03/08 10:10:32 INFO balancer.Balancer: p         = 
> Balancer.Parameters[BalancingPolicy.Node, threshold=1.0]
> Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  
> Bytes Being Moved
> Balancing took 95.0 milliseconds
> [schu@styx01 ~]$ 
> I believe with a threshold of 1% the balancer should trigger blocks being 
> moved across DataNodes, right? I am curious about the "namenodes = []" from 
> the above output.
> [schu@styx01 ~]$ hadoop version
> Hadoop 0.24.0-SNAPSHOT
> Subversion 
> git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common
>  -r f6a577d697bbcd04ffbc568167c97b79479ff319
> Compiled by schu on Thu Mar  8 15:32:50 PST 2012
> From source with checksum ec971a6e7316f7fbf471b617905856b8
> From 
> http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html:
> The threshold parameter is a fraction in the range of (0%, 100%) with a 
> default value of 10%. The threshold sets a target for whether the cluster is 
> balanced. A cluster is balanced if for each datanode, the utilization of the 
> node (ratio of used space at the node to total capacity of the node) differs 
> from the utilization of the cluster (ratio of used space in the cluster to total 
> capacity of the cluster) by no more than the threshold value. The smaller the 
> threshold, the more balanced a cluster will become. It takes more time to run 
> the balancer for small threshold values. Also for a very small threshold the 
> cluster may not be able to reach the balanced state when applications write 
> and delete files concurrently.
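
The balanced-cluster definition quoted above can be sketched as a small check. 
This is only an illustration of the definition, not the Balancer's actual 
code: the class and method names are made up, and the per-node numbers are 
hypothetical (two nodes at 3% usage and two empty nodes, all of equal 
capacity).

```java
public class BalanceCheck {
    // Utilization as a percentage: used space divided by total capacity.
    static double utilization(long used, long capacity) {
        return 100.0 * used / capacity;
    }

    // A cluster counts as balanced if every node's utilization differs from
    // the cluster-wide utilization by no more than the threshold (in percent).
    public static boolean isBalanced(long[] used, long[] capacity,
                                     double thresholdPercent) {
        long totalUsed = 0;
        long totalCapacity = 0;
        for (int i = 0; i < used.length; i++) {
            totalUsed += used[i];
            totalCapacity += capacity[i];
        }
        double clusterUtilization = utilization(totalUsed, totalCapacity);

        for (int i = 0; i < used.length; i++) {
            double nodeUtilization = utilization(used[i], capacity[i]);
            if (Math.abs(nodeUtilization - clusterUtilization) > thresholdPercent) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: four nodes of capacity 100 units each,
        // two of them 3% full and two of them empty.
        long[] used = {3, 3, 0, 0};
        long[] capacity = {100, 100, 100, 100};

        System.out.println("threshold 1%:  " + isBalanced(used, capacity, 1.0));
        System.out.println("threshold 10%: " + isBalanced(used, capacity, 10.0));
    }
}
```

With these made-up numbers the cluster utilization is 1.5%, so every node is 
1.5 percentage points away from it: not balanced at a 1% threshold, balanced 
at the default 10%.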

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
