[ https://issues.apache.org/jira/browse/HBASE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elliott Clark updated HBASE-8517:
---------------------------------
    Priority: Blocker  (was: Major)

> Stochastic Loadbalancer isn't finding steady state on real clusters
> -------------------------------------------------------------------
>
>                 Key: HBASE-8517
>                 URL: https://issues.apache.org/jira/browse/HBASE-8517
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.95.0
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>            Priority: Blocker
>         Attachments: HBASE-8517-0.patch, HBASE-8517-1.patch, HBASE-8517-2.patch
>
>
> I have a cluster that runs IT tests. Last night, after all tests were done, I noticed that the balancer was thrashing regions around.
> The number of regions on each machine is not correct.
> The balancer seems to value the cost of moving a region way too little.
> {code}
> 2013-05-09 16:34:58,920 DEBUG [IPC Server handler 4 on 60000] org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new load balance plan. Computation took 5367ms to try 8910 different iterations. Found a solution that moves 37 regions; Going from a computed cost of 56.50254222730425 to a new cost of 11.214035466575254
> 2013-05-09 16:37:48,715 DEBUG [IPC Server handler 7 on 60000] org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new load balance plan. Computation took 4735ms to try 8910 different iterations. Found a solution that moves 38 regions; Going from a computed cost of 56.612624531830996 to a new cost of 11.275763861636982
> 2013-05-09 16:38:11,398 DEBUG [IPC Server handler 6 on 60000] org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new load balance plan. Computation took 4502ms to try 8910 different iterations. Found a solution that moves 39 regions; Going from a computed cost of 56.50048461413552 to a new cost of 11.225352339003237
> {code}
> Each of those balancer runs was triggered when there was no load on the cluster.
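The quoted DEBUG lines are easier to compare once the numbers are pulled out. The following is a minimal sketch, not part of HBase: the class `BalancerLogSummary`, its nested `Run` type, and the helper `restartedFromSameCost` are hypothetical names introduced here, and the regular expression assumes the message wording stays exactly as quoted in this issue. It extracts the regions moved and the before/after cost from each line and checks whether a later run starts from roughly the same cost as the earlier one did, rather than from the cost the earlier run reported reaching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BalancerLogSummary {

    /** One balancer run, as reported by a single DEBUG line (hypothetical holder type). */
    static final class Run {
        final long millis;
        final int iterations;
        final int regionsMoved;
        final double costBefore;
        final double costAfter;

        Run(long millis, int iterations, int regionsMoved, double costBefore, double costAfter) {
            this.millis = millis;
            this.iterations = iterations;
            this.regionsMoved = regionsMoved;
            this.costBefore = costBefore;
            this.costAfter = costAfter;
        }
    }

    // Separator between words: whitespace, plus '>' so mail-quoted, re-wrapped lines still match.
    private static final String SEP = "[\\s>]+";
    private static final String NUM = "(\\d+(?:\\.\\d+)?)";

    // Assumes the message text is exactly the one quoted in the issue.
    private static final Pattern RUN = Pattern.compile(
        "Computation" + SEP + "took" + SEP + "(\\d+)ms" + SEP + "to" + SEP + "try" + SEP
        + "(\\d+)" + SEP + "different" + SEP + "iterations\\." + SEP
        + "Found" + SEP + "a" + SEP + "solution" + SEP + "that" + SEP + "moves" + SEP
        + "(\\d+)" + SEP + "regions;" + SEP
        + "Going" + SEP + "from" + SEP + "a" + SEP + "computed" + SEP + "cost" + SEP + "of" + SEP
        + NUM + SEP + "to" + SEP + "a" + SEP + "new" + SEP + "cost" + SEP + "of" + SEP + NUM);

    /** Returns the parsed run, or null when the line is not a balancer summary. */
    static Run parse(String line) {
        Matcher m = RUN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new Run(
            Long.parseLong(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)),
            Double.parseDouble(m.group(4)),
            Double.parseDouble(m.group(5)));
    }

    /**
     * True when the later run starts near the earlier run's starting cost
     * and not near the cost the earlier run reported reaching.
     */
    static boolean restartedFromSameCost(Run earlier, Run later, double tolerance) {
        return Math.abs(later.costBefore - earlier.costBefore) <= tolerance
            && Math.abs(later.costBefore - earlier.costAfter) > tolerance;
    }

    public static void main(String[] args) {
        String first = "Computation took 5367ms to try 8910 different iterations. "
            + "Found a solution that moves 37 regions; Going from a computed cost of "
            + "56.50254222730425 to a new cost of 11.214035466575254";
        String second = "Computation took 4735ms to try 8910 different iterations. "
            + "Found a solution that moves 38 regions; Going from a computed cost of "
            + "56.612624531830996 to a new cost of 11.275763861636982";

        Run a = parse(first);
        Run b = parse(second);
        System.out.println("moved: " + a.regionsMoved + " then " + b.regionsMoved);
        System.out.println("restarted from same cost: " + restartedFromSameCost(a, b, 0.5));
    }
}
```

The tolerance of 0.5 is an arbitrary choice for illustration. What such a check can show is limited to what the log states: each run begins near a cost of 56.5 and reports a plan ending near 11.2, and the next run begins near 56.5 again.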