[ 
https://issues.apache.org/jira/browse/HBASE-28596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-28596:
-----------------------------------------
    Affects Version/s: 3.0.0-beta-1
                       2.6.0
                       4.0.0-alpha-1
                       2.7.0

> Optimise BucketCache usage upon regions splits/merges.
> ------------------------------------------------------
>
>                 Key: HBASE-28596
>                 URL: https://issues.apache.org/jira/browse/HBASE-28596
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>              Labels: pull-request-available
>
> This proposal aims to give users more flexibility to decide whether or 
> not blocks from a parent region should be evicted, and also to optimise cache 
> usage by resolving file reference blocks to the referred block in the cache.
> Some extra context:
> 1) Originally, the default behaviour on splits was to rely on the 
> "hbase.rs.evictblocksonclose" value to decide whether the cached blocks from 
> the split parent should be evicted. The resulting split daughters are then 
> opened with refs to the parent file. If "hbase.rs.prefetchblocksonopen" is 
> set, these openings trigger a prefetch of the blocks from the split parent, 
> now with cache keys from the ref path. That means that if 
> "hbase.rs.evictblocksonclose" is false and "hbase.rs.prefetchblocksonopen" is 
> true, we duplicate blocks in the cache. In scenarios where cache 
> usage is at capacity and the added latency of reading from the file system is 
> high (for example, reading from cloud storage), this can have a severe 
> impact, as the prefetch for the refs triggers evictions. Also, the refs 
> tend to be short lived, as compaction is triggered on the split daughters 
> soon after they are opened.
> 2) HBASE-27474 changed the original behaviour described above to 
> always evict blocks from the split parent once the split is completed, and 
> to skip prefetch for refs (since refs are short lived). The side effect is 
> that the daughters' blocks are only cached once compaction is completed, 
> but compaction itself runs slower since it needs to read the blocks from 
> the file system. On regions as large as 20GB, the performance degradation 
> reported by users has been severe.
> This proposes a new "hbase.rs.evictblocksonsplit" configuration property that 
> makes the eviction on split configurable. Depending on the use case, the 
> impact of mass evictions due to cache capacity may be higher, in which case 
> users might prefer to keep evicting split parent blocks. Additionally, it 
> modifies the way we handle refs when caching. The HBASE-27474 behaviour was to 
> skip caching refs to avoid duplicate data in the cache as long as compaction 
> was enabled, relying on the fact that refs from splits are usually short 
> lived. Here, we propose modifying the lookup of block cache keys, so that 
> we always resolve the referenced file first and look for the related 
> referenced file block in the cache. That way we avoid duplicates in the cache 
> and also expedite scan performance on the split daughters, as reads now 
> resolve the referenced file and are served from the cache.
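
The key-resolution idea described above can be illustrated with a minimal, self-contained sketch. This is not the HBase implementation: the class name, the "name_offset" key layout, and the sample file names are hypothetical. It also assumes that a split reference file is named "<parentHFile>.<parentEncodedRegion>", so that the referred file can be derived from the reference file's name.

```java
import java.util.HashMap;
import java.util.Map;

public class RefCacheKeySketch {

    // Assumption: a split reference file is named "<parentHFile>.<parentEncodedRegion>",
    // so the text before the first dot names the referred (parent) file.
    public static String resolveReferredFile(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // Hypothetical key layout: resolved file name plus block offset.
    public static String cacheKey(String fileName, long offset) {
        return resolveReferredFile(fileName) + "_" + offset;
    }

    public static void main(String[] args) {
        Map<String, byte[]> cache = new HashMap<>();
        // Block cached while the parent region was still open.
        cache.put(cacheKey("3f2a9c", 0L), new byte[] {1, 2, 3});
        // A daughter reads the same block through its reference file.
        byte[] hit = cache.get(cacheKey("3f2a9c.7b1e44", 0L));
        System.out.println("hit=" + (hit != null) + " entries=" + cache.size());
    }
}
```

Because both the parent file and the daughter's reference resolve to the same key, the daughter's read is served by the block that is already cached, and no second copy is stored under a ref-specific key.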



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
