[ https://issues.apache.org/jira/browse/HBASE-24139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved HBASE-24139.
----------------------------------
    Resolution: Fixed

> Balancer should avoid leaving idle region servers
> -------------------------------------------------
>
>                 Key: HBASE-24139
>                 URL: https://issues.apache.org/jira/browse/HBASE-24139
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer, Operability
>            Reporter: Sean Busbey
>            Assignee: Beata Sudi
>            Priority: Critical
>              Labels: beginner
>             Fix For: 3.0.0, 2.3.0, 1.7.0
>
>
> After HBASE-15529 the StochasticLoadBalancer makes the decision to run based
> on its internal cost functions rather than the simple region count skew of
> BaseLoadBalancer.
> Given the default weights for those cost functions, the default minimum cost
> to indicate a need to rebalance, and a regions per region server density of
> ~90, we are not very responsive to adding additional region servers for
> non-trivial cluster sizes:
> * For clusters ~10 nodes, the defaults think a single RS at 0 regions means
> we need to balance.
> * For clusters >20 nodes, the defaults will not consider a single RS at 0
> regions to mean we need to balance. 2 RS at 0 will cause it to balance.
> * For clusters ~100 nodes, having 6 RS with no regions will still not meet
> the threshold to cause a balance.
> Note that this is the decision to look at balancer plans at all. The
> calculation is severely dominated by the region count skew (it has weight 500
> and all other weights together total ~105), so barring a very significant
> change in all other cost functions this condition will persist indefinitely.
> Two possible approaches:
> * add a new cost function that's essentially "don't have RS with 0 regions"
> that an operator can tune
> * add a short circuit condition for the {{needsBalance}} method that checks
> for empty RS similar to the check we do for colocated region replicas
> For those currently hitting this, an easy workaround is to set
> {{hbase.master.balancer.stochastic.minCostNeedBalance}} to {{0.01}}. This
> will mean that a single RS having 0 regions will cause the balancer to run
> for clusters of up to ~90 region servers. It's essentially the same as the
> default slop of 0.01 used by the BaseLoadBalancer.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
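
The threshold behaviour described in the issue can be approximated with a small sketch. It is not HBase's implementation: the class and method names are hypothetical, the skew normalization is a simplified stand-in for the real cost function, every cost function other than region count skew is assumed to contribute zero, and the default threshold of 0.05 is an assumption (the issue text does not state it). The weights 500 and ~105 and the density of ~90 regions per server come from the issue.

```java
// Hypothetical sketch, not HBase code: approximates the "do we need to balance?" decision.
class IdleServerSkewSketch {
    static final double REGION_COUNT_WEIGHT = 500.0; // from the issue text
    static final double OTHER_WEIGHTS = 105.0;       // from the issue text (combined, approximate)
    static final double ASSUMED_DEFAULT_MIN_COST = 0.05; // assumption, not stated in the issue
    static final double WORKAROUND_MIN_COST = 0.01;      // workaround value from the issue

    // Builds a cluster with `empty` idle servers; every other server holds `perServer` regions.
    static int[] cluster(int servers, int empty, int perServer) {
        int[] counts = new int[servers];
        for (int i = empty; i < servers; i++) {
            counts[i] = perServer;
        }
        return counts;
    }

    // Sum of absolute deviations from the mean, scaled to [0, 1] between the
    // best case (counts differ by at most one) and the worst case (one server holds all).
    static double scaledSkew(int[] counts) {
        int n = counts.length;
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double mean = (double) total / n;
        double max = (n - 1) * mean + (total - mean);
        long numHigh = total % n;   // servers that would hold ceil(mean) in the best case
        long numLow = n - numHigh;  // servers that would hold floor(mean)
        double min = numHigh * (Math.ceil(mean) - mean) + numLow * (mean - Math.floor(mean));
        double deviation = 0.0;
        for (int c : counts) {
            deviation += Math.abs(c - mean);
        }
        if (max <= min) {
            return 0.0;
        }
        return Math.max(0.0, Math.min(1.0, (deviation - min) / (max - min)));
    }

    // Weighted cost normalized by the sum of weights, with all other costs assumed to be zero.
    static double normalizedCost(int[] counts) {
        return REGION_COUNT_WEIGHT * scaledSkew(counts) / (REGION_COUNT_WEIGHT + OTHER_WEIGHTS);
    }

    static boolean needsBalance(int[] counts, double minCostNeedBalance) {
        return normalizedCost(counts) >= minCostNeedBalance;
    }

    public static void main(String[] args) {
        int[][] cases = { {10, 1}, {25, 1}, {25, 2}, {100, 6} };
        for (int[] c : cases) {
            int[] counts = cluster(c[0], c[1], 90);
            System.out.printf("servers=%d idle=%d cost=%.4f default=%b workaround=%b%n",
                c[0], c[1], normalizedCost(counts),
                needsBalance(counts, ASSUMED_DEFAULT_MIN_COST),
                needsBalance(counts, WORKAROUND_MIN_COST));
        }
    }
}
```

Because the real balancer sums several cost functions and computes its bounds differently, the exact cluster sizes at which the decision flips in this sketch need not match the figures quoted in the issue. It is meant only to show why a weight of 500 out of roughly 605 still leaves a few idle servers below the threshold on larger clusters.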