[jira] [Commented] (HBASE-25739) TableSkewCostFunction need to use aggregated deviation

Nick Dimiduk (Jira) Thu, 24 Jun 2021 09:44:27 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368964#comment-17368964
 ]


Nick Dimiduk commented on HBASE-25739:
--------------------------------------

bq. Because of the fix, the default minCostNeedBalance of 0.05 will no longer 
work well. As a stopgap before I check in the auto-tuning threshold, should I 
just reduce the default value so people won't be caught off guard? The broken 
TableSkewCostFunction artificially inflates the total cost, so if the fix goes 
in and we don't change the threshold, people will be badly surprised when the 
balancer gets stuck.

This sounds like a case where we have to implement both changes together, or 
neither of them. In that case, we have to leave them both out of any patch 
releases.

> TableSkewCostFunction need to use aggregated deviation
> ------------------------------------------------------
>
>                 Key: HBASE-25739
>                 URL: https://issues.apache.org/jira/browse/HBASE-25739
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Balancer, master
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>            Priority: Major
>         Attachments: 
> TEST-org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerBalanceCluster.xml,
>  
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerBalanceCluster.txt
>
>
> TableSkewCostFunction uses the sum, over all tables, of the maximum deviation 
> in region count per server as its measure of unevenness. This doesn't work in 
> a very common operational scenario. Say we have 100 regions on 50 nodes, two 
> on each. We add 50 new nodes, which hold 0 regions each. The mean is now 1 
> region per server, so the max deviation from the mean is 1, compared to 99 in 
> the worst-case scenario of all 100 regions on a single server. The normalized 
> cost is 1/99, approximately 0.010, which is below the default threshold of 
> 0.05, so the balancer doesn't move anything. The proposal is to use the 
> aggregated deviation of the region count per region server to detect this 
> scenario, generating a cost of 100/198, approximately 0.5, in this case.
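
The arithmetic in the description can be checked with a small sketch. This is 
a simplified single-table model written for illustration only, not the HBase 
implementation: the function names max_deviation_cost and 
aggregated_deviation_cost are hypothetical, and the normalization (dividing by 
the value obtained when every region sits on one server) is an assumption 
taken from the worst case described above.

```python
def max_deviation_cost(regions_per_server):
    """Largest per-server deviation from the mean, normalized by the
    worst case where one server holds every region (assumed model)."""
    total = sum(regions_per_server)
    mean = total / len(regions_per_server)
    current = max(abs(count - mean) for count in regions_per_server)
    worst = total - mean  # deviation of the single overloaded server
    return current / worst if worst else 0.0


def aggregated_deviation_cost(regions_per_server):
    """Sum of all per-server deviations from the mean, normalized by the
    same worst case (assumed model)."""
    n = len(regions_per_server)
    total = sum(regions_per_server)
    mean = total / n
    current = sum(abs(count - mean) for count in regions_per_server)
    # One server deviates by (total - mean); the other n - 1 deviate by mean.
    worst = (total - mean) + (n - 1) * mean
    return current / worst if worst else 0.0


# Scenario from the description: 50 servers with 2 regions, 50 new empty ones.
cluster = [2] * 50 + [0] * 50
threshold = 0.05  # default minCostNeedBalance quoted in the comment

max_cost = max_deviation_cost(cluster)         # 1 / 99
agg_cost = aggregated_deviation_cost(cluster)  # 100 / 198

print(f"max deviation cost:        {max_cost:.4f}")
print(f"aggregated deviation cost: {agg_cost:.4f}")
print(f"threshold:                 {threshold}")
```

Under these assumptions the max-deviation measure stays below the 0.05 
threshold while the aggregated measure is well above it, which is the gap the 
description points out. The real balancer combines several weighted cost 
functions, so the total cost compared against minCostNeedBalance will differ 
from these single-function values.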



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
