[https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Charlie Qiangeng Xu updated HBASE-17110:
----------------------------------------
    Description: 
This JIRA proposes an enhancement to SimpleLoadBalancer: a new strategy, 
"bytableOverall", which can be enabled by adding:
<property>
  <name>hbase.master.loadbalance.bytableOverall</name>
  <value>true</value>
</property>
We have been running this strategy on our largest cluster for several months. 
It has proven very helpful and stable, and its effect is clearly visible to 
users.

Here is why it helps:
When operating large-scale clusters (as in our case), some companies still 
prefer SimpleLoadBalancer for its simplicity, fast balance-plan generation, 
etc. The current SimpleLoadBalancer has two modes:
1. byTable, which only guarantees that the regions of each table are 
uniformly distributed.
2. byCluster, which ignores the distribution within tables and balances all 
regions together.
If different tables carry different loads, byTable is the preferable option in 
most cases. However, it sacrifices cluster-level balance and can leave some 
servers with a significantly higher load, e.g. 242 regions on server A but 417 
regions on server B (real statistics from our cluster).
Consider a cluster with 3 tables and 4 servers:
  server A has 3 regions: table1:1, table2:1, table3:1
  server B has 3 regions: table1:2, table2:2, table3:2
  server C has 3 regions: table1:3, table2:3, table3:3
  server D has 0 regions.
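To make the two notions of "balanced" concrete, here is a small Java sketch (Java 9+ for `List.of`) that measures the layout above and the target layout listed in the description. It assumes a simplified criterion for illustration: a distribution counts as balanced when its counts differ by at most one. This is not the test SimpleLoadBalancer itself applies, and `BalanceCheck` and its methods are hypothetical names.

```java
import java.util.*;

// Hypothetical helper (not HBase code): measures how evenly regions are spread.
// A layout maps a server name to its regions, each encoded as "table:regionId".
public class BalanceCheck {

    public static Map<String, List<String>> initialLayout() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("A", List.of("table1:1", "table2:1", "table3:1"));
        m.put("B", List.of("table1:2", "table2:2", "table3:2"));
        m.put("C", List.of("table1:3", "table2:3", "table3:3"));
        m.put("D", List.of());
        return m;
    }

    public static Map<String, List<String>> targetLayout() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("A", List.of("table2:1", "table3:1"));
        m.put("B", List.of("table1:2", "table3:2"));
        m.put("C", List.of("table1:3", "table2:3", "table3:3"));
        m.put("D", List.of("table1:1", "table2:2"));
        return m;
    }

    // Max minus min of the total region count per server.
    public static int serverSpread(Map<String, List<String>> layout) {
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        for (List<String> regions : layout.values()) {
            max = Math.max(max, regions.size());
            min = Math.min(min, regions.size());
        }
        return max - min;
    }

    // Largest max-minus-min of any single table's per-server region count.
    public static int maxTableSpread(Map<String, List<String>> layout) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>(); // table -> server -> n
        for (Map.Entry<String, List<String>> e : layout.entrySet()) {
            for (String region : e.getValue()) {
                String table = region.split(":")[0];
                counts.computeIfAbsent(table, t -> new HashMap<>())
                        .merge(e.getKey(), 1, Integer::sum);
            }
        }
        int worst = 0;
        for (Map<String, Integer> perServer : counts.values()) {
            int max = 0;
            int min = Integer.MAX_VALUE;
            for (String server : layout.keySet()) { // servers without the table count as 0
                int c = perServer.getOrDefault(server, 0);
                max = Math.max(max, c);
                min = Math.min(min, c);
            }
            worst = Math.max(worst, max - min);
        }
        return worst;
    }

    static void report(String name, Map<String, List<String>> layout) {
        System.out.printf("%s: max per-table spread = %d, server spread = %d%n",
                name, maxTableSpread(layout), serverSpread(layout));
    }

    public static void main(String[] args) {
        report("initial", initialLayout()); // initial: max per-table spread = 1, server spread = 3
        report("target", targetLayout());   // target: max per-table spread = 1, server spread = 1
    }
}
```

Both layouts pass the per-table check, but only the target layout also keeps server totals within one region of each other.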
From the byTable strategy's perspective, the cluster is already perfectly 
balanced at the table level. The ideal state, however, would be:
  server A has 2 regions: table2:1, table3:1
  server B has 2 regions: table1:2, table3:2
  server C has 3 regions: table1:3, table2:3, table3:3
  server D has 2 regions: table1:1, table2:2
And this is what the new mode "byTableOverall" can achieve.
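The description does not spell out how "byTableOverall" builds its plan, so the Java sketch below is a hypothetical illustration only, not the code from the HBASE-17110 patch. It assumes a greedy approach: tables are processed one at a time, each table's n regions are split so every one of the s servers gets n / s (rounded down) or one more, and the extra slots go to the servers with the fewest regions planned so far. Ties favor servers already hosting that table, to reduce moves. `OverallBalanceSketch` and `plan` are invented names.

```java
import java.util.*;

// HYPOTHETICAL sketch: not the HBASE-17110 patch, just one greedy way to balance
// regions both per table and per server. Regions are encoded as "table:regionId".
public class OverallBalanceSketch {

    public static Map<String, List<String>> plan(Map<String, List<String>> current) {
        List<String> servers = new ArrayList<>(current.keySet());
        // Group regions by table, remembering each region's current server.
        Map<String, List<String[]>> byTable = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : current.entrySet()) {
            for (String region : e.getValue()) {
                String table = region.split(":")[0];
                byTable.computeIfAbsent(table, t -> new ArrayList<>())
                        .add(new String[] {region, e.getKey()});
            }
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String s : servers) {
            result.put(s, new ArrayList<>());
        }
        for (List<String[]> regions : byTable.values()) {
            int n = regions.size();
            int base = n / servers.size();
            int extra = n % servers.size();
            Map<String, Integer> hosted = new HashMap<>();
            for (String[] r : regions) {
                hosted.merge(r[1], 1, Integer::sum);
            }
            // Servers with the fewest regions planned so far get the `extra` slots;
            // ties prefer servers already hosting this table, then server name.
            List<String> order = new ArrayList<>(servers);
            order.sort(Comparator.<String>comparingInt(s -> result.get(s).size())
                    .thenComparingInt(s -> -hosted.getOrDefault(s, 0))
                    .thenComparing(Comparator.naturalOrder()));
            Map<String, Integer> quota = new HashMap<>();
            for (int i = 0; i < order.size(); i++) {
                quota.put(order.get(i), base + (i < extra ? 1 : 0));
            }
            // Keep a region where it is if that server still has quota for this table.
            List<String> toMove = new ArrayList<>();
            for (String[] r : regions) {
                if (quota.get(r[1]) > 0) {
                    quota.merge(r[1], -1, Integer::sum);
                    result.get(r[1]).add(r[0]);
                } else {
                    toMove.add(r[0]);
                }
            }
            // Place the remaining regions on servers that still have quota.
            for (String region : toMove) {
                for (String s : order) {
                    if (quota.get(s) > 0) {
                        quota.merge(s, -1, Integer::sum);
                        result.get(s).add(region);
                        break;
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The four-server example from the description.
        Map<String, List<String>> current = new LinkedHashMap<>();
        current.put("A", List.of("table1:1", "table2:1", "table3:1"));
        current.put("B", List.of("table1:2", "table2:2", "table3:2"));
        current.put("C", List.of("table1:3", "table2:3", "table3:3"));
        current.put("D", List.of());
        System.out.println(plan(current));
    }
}
```

On the four-server example, this sketch moves two regions (table2:3 from C to D and table3:2 from B to D), giving server totals of 3, 2, 2 and 2 with no table having more than one region on any server. That differs from the target layout in the description but satisfies the same two constraints. Handing each table's extra regions to the least-loaded servers keeps server totals within one region of each other after every table, which is why table-level and server-level balance can hold at the same time.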

Two unit tests (UTs) have also been added; the second one demonstrates the 
advantage of the new strategy.

  was:
This JIRA proposes an enhancement to SimpleLoadBalancer: a new strategy, 
"bytableOverall", which can be enabled by adding:
<property>
  <name>hbase.master.loadbalance.bytableOverall</name>
  <value>true</value>
</property>
We have been running this strategy on our largest cluster for several months. 
It has proven very helpful and stable, and its effect is clearly visible to 
users.

Here is why it helps:
When operating large-scale clusters (as in our case), some companies still 
prefer SimpleLoadBalancer for its simplicity, fast balance-plan generation, 
etc. The current SimpleLoadBalancer has two modes:
1. byTable, which only guarantees that the regions of each table are 
uniformly distributed.
2. byCluster, which ignores the distribution within tables and balances all 
regions together.
If different tables carry different loads, byTable is the preferable option in 
most cases. However, it sacrifices cluster-level balance and can leave some 
servers with a significantly higher load, e.g. 240 regions on server A but 410 
regions on server B.
Consider a cluster with 3 tables and 4 servers:
  server A has 3 regions: table1:1, table2:1, table3:1
  server B has 3 regions: table1:2, table2:2, table3:2
  server C has 3 regions: table1:3, table2:3, table3:3
  server D has 0 regions.
From the byTable strategy's perspective, the cluster is already perfectly 
balanced at the table level. The ideal state, however, would be:
  server A has 2 regions: table2:1, table3:1
  server B has 2 regions: table1:2, table3:2
  server C has 3 regions: table1:3, table2:3, table3:3
  server D has 2 regions: table1:1, table2:2
And this is what the new mode "byTableOverall" can achieve.

Two unit tests (UTs) have also been added; the second one demonstrates the 
advantage of the new strategy.


> Add an "Overall Strategy" option (balanced both on table level and server level) to SimpleLoadBalancer
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17110
>                 URL: https://issues.apache.org/jira/browse/HBASE-17110
>             Project: HBase
>          Issue Type: New Feature
>          Components: Balancer
>    Affects Versions: 2.0.0, 1.2.4
>            Reporter: Charlie Qiangeng Xu
>            Assignee: Charlie Qiangeng Xu
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
