Ray Mattingly created HBASE-29772:
-------------------------------------

             Summary: The balancer is too slow with 100k regions and conditionals enabled
                 Key: HBASE-29772
                 URL: https://issues.apache.org/jira/browse/HBASE-29772
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.6.4
            Reporter: Ray Mattingly
            Assignee: Ray Mattingly


We have some clusters with upwards of 100k regions. These clusters also use 
the system table isolation, meta table isolation, and read replica distribution 
balancer conditionals. We're starting to hit real slowdowns in balancer 
performance on clusters like this, particularly in the last mile of balancing. 
At that point there may be, for example, only 1 of 300 servers that is notably 
under-utilized, so a random move is very rarely a good one, and each move is 
slow to evaluate due to the scale.
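To put a rough number on the last-mile problem, here is a minimal sketch. It assumes, purely for illustration, that a random move picks its destination server uniformly and independently each time, which is a simplification and not a description of how the real candidate generators choose. The class and method names are hypothetical.

```java
// Hypothetical illustration, not HBase code: odds that a uniformly random
// destination server is one of the under-utilized servers.
class LastMileOdds {
    // Probability that one random proposal targets an under-utilized server.
    static double hitProbability(int servers, int underUtilized) {
        return (double) underUtilized / servers;
    }

    // Mean of the geometric distribution: expected number of independent
    // random proposals until the first one targets an under-utilized server.
    static double expectedProposals(int servers, int underUtilized) {
        return (double) servers / underUtilized;
    }

    public static void main(String[] args) {
        System.out.println("hit probability: " + hitProbability(300, 1));
        System.out.println("expected proposals: " + expectedProposals(300, 1));
    }
}
```

Under that assumption, with 1 under-utilized server out of 300, about 300 proposals are needed on average before one even targets the right server, and each of those proposals still pays the full cost evaluation at 100k-region scale.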

I'd suggest that we make the 
[SlopFixingCandidateGenerator|https://github.com/apache/hbase/blob/07de86938c58dfb627c1910f4f8db88d544b600e/hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/SlopFixingCandidateGenerator.java#L34]
 a default candidate generator that may be returned by `getRandomGenerator`. 
This should help the balancer find quicker paths to an at least "unsloppy" 
balance before things begin to slow down.
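For readers unfamiliar with the idea, here is a minimal sketch of what a targeted, slop-driven move could look like. It is not the actual SlopFixingCandidateGenerator implementation. It assumes that "sloppy" means a server's region count falls outside a floor/ceiling band of mean * (1 - slop) to mean * (1 + slop), and the class, method, and field names are hypothetical.

```java
import java.util.Optional;

// Hypothetical sketch, not the HBase implementation: propose a move from the
// most loaded server to the least loaded one, but only while some server is
// outside the assumed slop band around the mean region count.
class SlopFixSketch {
    static final class Move {
        final int fromServer;
        final int toServer;
        Move(int fromServer, int toServer) {
            this.fromServer = fromServer;
            this.toServer = toServer;
        }
    }

    static Optional<Move> proposeMove(int[] regionsPerServer, double slop) {
        int n = regionsPerServer.length;
        if (n < 2) {
            return Optional.empty();
        }
        long total = 0;
        for (int count : regionsPerServer) {
            total += count;
        }
        double mean = (double) total / n;
        // Assumed definition of the slop band.
        int floor = (int) Math.floor(mean * (1 - slop));
        int ceiling = (int) Math.ceil(mean * (1 + slop));

        // Find the most and least loaded servers in one pass.
        int most = 0;
        int least = 0;
        for (int i = 1; i < n; i++) {
            if (regionsPerServer[i] > regionsPerServer[most]) most = i;
            if (regionsPerServer[i] < regionsPerServer[least]) least = i;
        }
        boolean sloppy = regionsPerServer[most] > ceiling
            || regionsPerServer[least] < floor;
        if (!sloppy || most == least) {
            return Optional.empty(); // already "unsloppy": nothing to propose
        }
        return Optional.of(new Move(most, least));
    }
}
```

Unlike a random proposal, every move produced this way targets a server that is actually outside the band, which is the property that should shorten the last mile. A real generator would also have to respect the balancer conditionals mentioned above, which this sketch ignores.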



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
