[ 
https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243059#comment-13243059
 ] 

Uma Maheswara Rao G commented on HDFS-3070:
-------------------------------------------

Aaron, you are right. We saw this yesterday and realized the same thing. :-)
Before Federation, the balancer may not have needed to load properties from 
hdfs-site.xml, and some users may have proceeded with the default values set in 
the code, because Configuration can load the core-site.xml files.

I agree with the fix of creating an HdfsConfiguration object and passing it in.

To catch this bug in the tests themselves, I would suggest exposing a new 
package-scoped API from Balancer, runBalancerCLI, having the tests call it, and 
making the run method private.
{code}
static int runBalancerCLI(String[] args) throws Exception {
  // The fix goes here: pass an HdfsConfiguration instead of null.
  return ToolRunner.run(null, new Cli(), args);
}
{code}

Let the main method and all tests call this function.
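
To illustrate why the entry point matters, here is a minimal, self-contained sketch that does not use the Hadoop classes at all. It only models the idea that a plain Configuration sees core-site.xml while HdfsConfiguration also registers hdfs-site.xml. The class name, the resource contents, and the key names are assumptions made up for illustration; the real lookup in Balancer is more involved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the Hadoop configuration classes; not the real API.
class ConfLoadingSketch {
    // Hypothetical resource contents, keyed by file name (illustrative values only).
    static final Map<String, Map<String, String>> RESOURCES = Map.of(
        "core-site.xml", Map.of("fs.defaultFS", "hdfs://ns1"),
        "hdfs-site.xml", Map.of("dfs.nameservices", "ns1,ns2"));

    // Merges the named resources in order into one property map.
    static Map<String, String> load(String... files) {
        Map<String, String> merged = new HashMap<>();
        for (String f : files) {
            merged.putAll(RESOURCES.get(f));
        }
        return merged;
    }

    // Models a plain Configuration: core-site.xml only.
    static Map<String, String> plainConf() {
        return load("core-site.xml");
    }

    // Models HdfsConfiguration: core-site.xml plus hdfs-site.xml.
    static Map<String, String> hdfsConf() {
        return load("core-site.xml", "hdfs-site.xml");
    }

    // Models the balancer resolving its namenode list from an HDFS-only property.
    static List<String> namenodes(Map<String, String> conf) {
        String value = conf.get("dfs.nameservices");
        return value == null ? List.of() : List.of(value.split(","));
    }

    public static void main(String[] args) {
        System.out.println("plain conf: namenodes = " + namenodes(plainConf()));
        System.out.println("hdfs conf:  namenodes = " + namenodes(hdfsConf()));
    }
}
```

In this model the plain configuration yields an empty namenode list, which matches the "namenodes = []" symptom. If the tests build their own HdfsConfiguration, they never go through the path that main uses, so they cannot see the problem.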


Output from tests:

{quote}
2012-03-31 12:19:47,340 INFO  balancer.Balancer (Balancer.java:parse(1508)) - 
Using a threshold of 10.0
2012-03-31 12:19:47,340 INFO  balancer.Balancer (Balancer.java:run(1387)) - 
namenodes = []
2012-03-31 12:19:47,340 INFO  balancer.Balancer (Balancer.java:run(1388)) - p   
      = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  
Bytes Being Moved
Balancing took 1.0 milliseconds
2012-03-31 12:19:47,341 INFO  balancer.Balancer 
(TestBalancerWithMultipleNameNodes.java:runBalancer(164)) - BALANCER 2
2012-03-31 12:19:47,341 INFO  balancer.Balancer 
(TestBalancerWithMultipleNameNodes.java:wait(132)) - WAIT 
expectedUsedSpace=350, expectedTotalSpace=1000
2012-03-31 12:19:47,341 INFO  balancer.Balancer 
(TestBalancerWithMultipleNameNodes.java:runBalancer(166)) - BALANCER 3
2012-03-31 12:19:47,342 WARN  balancer.Balancer 
(TestBalancerWithMultipleNameNodes.java:runBalancer(183)) - datanodes[0]: 
getDfsUsed()=60, getCapacity()=500
2012-03-31 12:19:47,343 WARN  balancer.Balancer 
(TestBalancerWithMultipleNameNodes.java:runBalancer(183)) - datanodes[1]: 
getDfsUsed()=290, getCapacity()=500
2012-03-31 12:19:47,344 WARN  balancer.Balancer 
(TestBalancerWithMultipleNameNodes.java:runBalancer(200)) - datanodes 1 is not 
yet balanced: used=290, cap=500, avg=35.0
{quote}
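
As a cross-check of the last log line, the sketch below applies the threshold rule quoted in the issue description (a node is balanced if its utilization differs from the cluster utilization by no more than the threshold) to the numbers in the log. The class and method names are hypothetical, and this is not necessarily the exact check the test performs.

```java
// Minimal sketch of the threshold rule from the Balancer documentation.
class BalanceCheckSketch {
    // Utilization as a percentage of capacity.
    static double utilization(long used, long capacity) {
        return 100.0 * used / capacity;
    }

    // A node is balanced if it is within thresholdPercent of the cluster average.
    static boolean isBalanced(long used, long capacity,
                              double avgUtilization, double thresholdPercent) {
        return Math.abs(utilization(used, capacity) - avgUtilization) <= thresholdPercent;
    }

    public static void main(String[] args) {
        // Values taken from the test log: two datanodes with capacity 500 each.
        long[] used = {60, 290};
        long[] capacity = {500, 500};
        double threshold = 10.0;

        // Cluster utilization: (60 + 290) / (500 + 500) = 35%.
        double avg = utilization(used[0] + used[1], capacity[0] + capacity[1]);

        for (int i = 0; i < used.length; i++) {
            System.out.println("datanodes[" + i + "]: utilization="
                + utilization(used[i], capacity[i])
                + ", avg=" + avg
                + ", balanced=" + isBalanced(used[i], capacity[i], avg, threshold));
        }
    }
}
```

With used=290 and cap=500 the node is at 58%, which is 23 points away from the 35% average and well outside the 10% threshold, so the balancer clearly moved nothing.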

Also remove the HdfsConfiguration object creation from the Balancer tests.
                
> hdfs balancer doesn't balance blocks between datanodes
> ------------------------------------------------------
>
>                 Key: HDFS-3070
>                 URL: https://issues.apache.org/jira/browse/HDFS-3070
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 2.0.0
>            Reporter: Stephen Chu
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-3070.patch, unbalanced_nodes.png, 
> unbalanced_nodes_inservice.png
>
>
> I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI, 
> both have over 3% disk usage.
> Attached is a screenshot of the Live Nodes web UI.
> On styx01, I run the _hdfs balancer_ command with threshold 1% and don't see 
> the blocks being balanced across all 4 datanodes (all blocks on styx01 and 
> styx02 stay put).
> HA is currently enabled.
> [schu@styx01 ~]$ hdfs haadmin -getServiceState nn1
> active
> [schu@styx01 ~]$ hdfs balancer -threshold 1
> 12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0
> 12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []
> 12/03/08 10:10:32 INFO balancer.Balancer: p         = 
> Balancer.Parameters[BalancingPolicy.Node, threshold=1.0]
> Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  
> Bytes Being Moved
> Balancing took 95.0 milliseconds
> [schu@styx01 ~]$ 
> I believe with a threshold of 1% the balancer should trigger blocks being 
> moved across DataNodes, right? I am curious about the "namenode = []" from 
> the above output.
> [schu@styx01 ~]$ hadoop version
> Hadoop 0.24.0-SNAPSHOT
> Subversion 
> git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common
>  -r f6a577d697bbcd04ffbc568167c97b79479ff319
> Compiled by schu on Thu Mar  8 15:32:50 PST 2012
> From source with checksum ec971a6e7316f7fbf471b617905856b8
> From 
> http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html:
> The threshold parameter is a fraction in the range of (0%, 100%) with a 
> default value of 10%. The threshold sets a target for whether the cluster is 
> balanced. A cluster is balanced if for each datanode, the utilization of the 
> node (ratio of used space at the node to total capacity of the node) differs 
> from the utilization of the cluster (ratio of used space in the cluster to total 
> capacity of the cluster) by no more than the threshold value. The smaller the 
> threshold, the more balanced a cluster will become. It takes more time to run 
> the balancer for small threshold values. Also for a very small threshold the 
> cluster may not be able to reach the balanced state when applications write 
> and delete files concurrently.

        

Reply via email to