[ 
https://issues.apache.org/jira/browse/HBASE-30134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-30134.
------------------------------------------
    Fix Version/s: 2.6.6
     Release Note: 
This modifies CacheAwareLoadBalancer to exclude region data labeled "cold" by 
the BucketCache Time Based Priority setting when calculating a region's cache 
ratio to define imbalance. 

It also changes CacheAwareLoadBalancer.CacheAwareCostFunction to 
optimistically assume that moving low cache ratio regions to servers with 
available cache space succeeds in raising the cache ratio of those regions, 
reducing the imbalance.

Whether a region's cache ratio is low enough to make it eligible for a move to 
a server with available cache space is defined by a configurable float 
threshold: hbase.master.balancer.cacheaware.lowCacheRatioThreshold (defaults 
to 0.35).

The cost of moving such low cache ratio regions to servers with available 
cache space is calculated assuming that the cache ratio for these regions on 
the target server would be the region hot data size times a configurable float 
factor: hbase.master.balancer.cacheaware.potentialCacheRatioAfterMove (defaults 
to 0.95).

The available cache space to be considered for the "hypothetical" cache ratio 
on the target server of a moved low cache ratio region is also defined by a 
configurable float factor: 
hbase.master.balancer.cacheaware.minFreeCacheSpaceFactor (defaults to 1.0).
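
The three settings above can be read together as an eligibility check. The 
following is only a rough sketch under stated assumptions, not the HBase 
implementation: the class and method names are hypothetical, the strict "<" 
comparison against the threshold is assumed, and so is the reading that the 
target server needs free cache of at least the region hot data size times 
minFreeCacheSpaceFactor. Only the three default values come from the release 
note.

```java
// Hypothetical sketch; names are illustrative and are NOT HBase APIs.
final class LowCacheRatioMoveSketch {
  // Defaults quoted in the release note.
  static final float LOW_CACHE_RATIO_THRESHOLD = 0.35f;        // lowCacheRatioThreshold
  static final float POTENTIAL_CACHE_RATIO_AFTER_MOVE = 0.95f; // potentialCacheRatioAfterMove
  static final float MIN_FREE_CACHE_SPACE_FACTOR = 1.0f;       // minFreeCacheSpaceFactor

  /**
   * ASSUMPTION: a move is costed optimistically when the region's current
   * cache ratio is below the threshold and the target server has free cache
   * of at least hotDataBytes * MIN_FREE_CACHE_SPACE_FACTOR.
   */
  static boolean eligibleForOptimisticMove(float currentCacheRatio, long hotDataBytes,
      long targetFreeCacheBytes) {
    boolean lowRatio = currentCacheRatio < LOW_CACHE_RATIO_THRESHOLD;
    boolean fits = targetFreeCacheBytes >= (double) hotDataBytes * MIN_FREE_CACHE_SPACE_FACTOR;
    return lowRatio && fits;
  }

  public static void main(String[] args) {
    long gib = 1024L * 1024L * 1024L;
    // Region with 20% of its hot data cached, 10 GiB hot data, 12 GiB free on target.
    boolean eligible = eligibleForOptimisticMove(0.20f, 10 * gib, 12 * gib);
    System.out.println("eligible=" + eligible
        + ", assumed ratio after move=" + POTENTIAL_CACHE_RATIO_AFTER_MOVE);
  }
}
```

In a real deployment these values would be set through the three 
hbase.master.balancer.cacheaware.* properties named above rather than as 
constants.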

For example, consider a cluster with a total region hot data size of 100TB and 
a total cache capacity of 120TB, where several regions have a low cache ratio 
on their current servers, so that the total cached hot data size is only 90TB. 
All regions are evenly distributed, so there is no region skewness. Prior to 
these changes, if no region has cached data on servers other than the server 
currently hosting it, CacheAwareCostFunction.cost for this setup would be 0 
and no imbalance would be seen. 

With the HBASE-30134 default configurations, the initial cache ratio for this 
deployment would be 90/100 = 0.9 and the CacheAwareCost value ~ 1 - 0.9 = 0.1. 

The imbalance is then defined by: 

((CacheAwareCost.cost * CacheAwareCost.multiplier) + 
(CacheAwareRegionSkewness.cost * CacheAwareRegionSkewness.multiplier)) /  
(CacheAwareCost.multiplier + CacheAwareRegionSkewness.multiplier)

In the example: ((0.1 * 20) + (0 * 20)) / (20 + 20) = 0.05

Balancer would be triggered, as the current imbalance is above the default 
hbase.master.balancer.stochastic.minCostNeedBalance value of 0.025.
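
The arithmetic of this example can be reproduced with a short sketch. The 
class and method names below are hypothetical and are not taken from the HBase 
code base; the multipliers of 20, the 90TB/100TB figures and the 0.025 
threshold are the values quoted in the example.

```java
// Hypothetical helper, not part of HBase: reproduces the example's arithmetic.
final class ImbalanceSketch {
  /** Weighted average of cost values, each weighted by its multiplier. */
  static double weightedImbalance(double[] costs, double[] multipliers) {
    double weighted = 0.0;
    double totalMultiplier = 0.0;
    for (int i = 0; i < costs.length; i++) {
      weighted += costs[i] * multipliers[i];
      totalMultiplier += multipliers[i];
    }
    // Guard against division by zero when all multipliers are 0.
    return totalMultiplier == 0.0 ? 0.0 : weighted / totalMultiplier;
  }

  public static void main(String[] args) {
    double cachedHotTb = 90.0;
    double totalHotTb = 100.0;
    double cacheAwareCost = 1.0 - cachedHotTb / totalHotTb; // ~0.1
    double skewnessCost = 0.0;                              // regions evenly distributed
    double imbalance = weightedImbalance(
        new double[] { cacheAwareCost, skewnessCost },
        new double[] { 20.0, 20.0 });
    double minCostNeedBalance = 0.025; // default quoted in the release note
    System.out.println("imbalance=" + imbalance
        + ", balancer triggered=" + (imbalance > minCostNeedBalance));
  }
}
```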






       Resolution: Fixed

Merged to master, branch-3, branch-2 and branch-2.6. Thanks for the reviews, 
[~taklwu] [~vrodionov] !

> Improve CacheAwareLoadBalancer to consider low cache ratio when calculating 
> imbalance
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-30134
>                 URL: https://issues.apache.org/jira/browse/HBASE-30134
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Balancer
>    Affects Versions: 3.0.0-beta-1, 2.7.0, 2.6.5
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0, 2.7.0, 2.6.6
>
>
> When enabling time based priority on CFs that are already cached on clusters 
> with datasets larger than the cache capacity, once the now cold data gets 
> evicted, hot data that was previously uncached due to lack of cache space 
> would remain uncached unless:
> 1) A client read request reads it from the file system and caches it;
> 2) An operator manually disables and re-enables tables with hot data, so that 
> the prefetch executor can run and cache those blocks.
> Both options are non-optimal, leading to temporary performance impacts and/or 
> requiring manual intervention.
> The CacheAwareLoadBalancer currently only raises the cost of moving highly 
> cached regions when calculating assignment plans, and doesn't consider cache 
> ratio at all when calculating potential imbalance, only skewness. So in the 
> scenario where regions are evenly distributed and there's no skewness, 
> CacheAwareLoadBalancer would not trigger any moves even when there are 
> regions with low cache ratio on many servers and enough cache space to 
> accommodate those regions' data in the region servers' cache.
> The solution for this problem will be split into two jiras, for ease of 
> review. The first part, to be worked on here, will only include low cache 
> ratio regions in the imbalance calculation, so that CacheAwareLoadBalancer 
> can trigger the computation of new assignment plans. 
> HBASE-30135 should provide the changes needed to recalculate cache ratio on 
> the assignment plans, simulating that low cache ratio regions moved to 
> servers with enough free cache space would get fully cached, so that such a 
> plan scores higher than the current state.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
