[ https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696937#comment-15696937 ]
Ted Yu commented on HBASE-17110: -------------------------------- Adding the above method should be fine. As long as StochasticLoadBalancer, etc provide proper implementation for the method, downstream projects can still inherit the implementation. > Add an "Overall Strategy" option(balanced both on table level and server > level) to SimpleLoadBalancer > ----------------------------------------------------------------------------------------------------- > > Key: HBASE-17110 > URL: https://issues.apache.org/jira/browse/HBASE-17110 > Project: HBase > Issue Type: Improvement > Components: Balancer > Affects Versions: 2.0.0, 1.2.4 > Reporter: Charlie Qiangeng Xu > Assignee: Charlie Qiangeng Xu > Attachments: HBASE-17110-V2.patch, HBASE-17110-V3.patch, > HBASE-17110-V4.patch, HBASE-17110-V5.patch, HBASE-17110.patch > > > This jira is about an enhancement of simpleLoadBalancer. Here we introduce a > new strategy: "bytableOverall" which could be controlled by adding: > {noformat} > <property> > <name>hbase.master.loadbalance.bytableOverall</name> > <value>true</value> > </property> > {noformat} > We have been using the strategy on our largest cluster for several months. > it's proven to be very helpful and stable, especially, the result is quite > visible to the users. > Here is the reason why it's helpful: > When operating large scale clusters(our case), some companies still prefer to > use {{SimpleLoadBalancer}} due to its simplicity, quick balance plan > generation, etc. Current SimpleLoadBalancer has two modes: > 1. byTable, which only guarantees that the regions of one table could be > uniformly distributed. > 2. byCluster, which ignores the distribution within tables and balance the > regions all together. > If the pressures on different tables are different, the first byTable option > is the preferable one in most case. Yet, this choice sacrifice the cluster > level balance and would cause some servers to have significantly higher load, > e.g. 242 regions on server A but 417 regions on server B.(real world stats) > Consider this case, a cluster has 3 tables and 4 servers: > {noformat} > server A has 3 regions: table1:1, table2:1, table3:1 > server B has 3 regions: table1:2, table2:2, table3:2 > server C has 3 regions: table1:3, table2:3, table3:3 > server D has 0 regions. > {noformat} > From the byTable strategy's perspective, the cluster has already been > perfectly balanced on table level. But a perfect status should be like: > {noformat} > server A has 2 regions: table2:1, table3:1 > server B has 2 regions: table1:2, table3:2 > server C has 3 regions: table1:3, table2:3, table3:3 > server D has 2 regions: table1:1, table2:2 > {noformat} > We can see the server loads change from 3,3,3,0 to 2,2,3,2, while the table1, > table2 and table3 still keep balanced. > And this is what the new mode "byTableOverall" can achieve. > Two UTs have been added as well and the last one demonstrates the advantage > of the new strategy. > Also, a onConfigurationChange method has been implemented to hot control the > "slop" variable. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)