[ 
https://issues.apache.org/jira/browse/HBASE-28596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28596.
------------------------------------------
    Release Note: 
This adds a new configuration property, "hbase.rs.evictblocksonsplit", with a 
default value of true, which causes all blocks of the parent region to be 
evicted on split. 

It also changes the behaviour introduced by HBASE-27474, so that prefetch can 
run on the daughters' refs (if hbase.rs.prefetchblocksonopen is true).
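
As a rough illustration of how the two flags above combine, here is a minimal 
sketch. It uses java.util.Properties as a stand-in for the HBase configuration 
object, and the class and method names are hypothetical. The "true" default for 
hbase.rs.evictblocksonsplit comes from this release note; the "false" default 
used for hbase.rs.prefetchblocksonopen is an assumption of the sketch.

```java
import java.util.Properties;

/** Illustrative only: reads the two flags discussed above with assumed defaults. */
class SplitCacheFlagsSketch {
  static final String EVICT_ON_SPLIT = "hbase.rs.evictblocksonsplit";
  static final String PREFETCH_ON_OPEN = "hbase.rs.prefetchblocksonopen";

  /** Parent region blocks are evicted on split unless the flag is set to false. */
  static boolean evictParentBlocksOnSplit(Properties conf) {
    // Default of true is stated in the release note.
    return Boolean.parseBoolean(conf.getProperty(EVICT_ON_SPLIT, "true"));
  }

  /** Prefetch may run on the daughters' refs only when prefetch-on-open is enabled. */
  static boolean prefetchDaughterRefs(Properties conf) {
    // Default of false is an assumption made for this sketch.
    return Boolean.parseBoolean(conf.getProperty(PREFETCH_ON_OPEN, "false"));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(EVICT_ON_SPLIT, "false");
    conf.setProperty(PREFETCH_ON_OPEN, "true");
    System.out.println("evict parent blocks on split: " + evictParentBlocksOnSplit(conf));
    System.out.println("prefetch daughter refs: " + prefetchDaughterRefs(conf));
  }
}
```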

It has also modified how BucketCache deals with blocks from reference files:
1) When adding blocks for a reference file, it first resolves the reference and 
checks whether the related block from the parent file is already in the cache. 
If so, it doesn't add the block to the cache. Otherwise, it adds the block 
with the reference as the cache key.
2) When searching the cache for blocks from a reference file, it first 
resolves the reference and checks for the block from the original file, 
returning it if found. Otherwise, it searches the cache again, now using 
the reference file as the cache key.
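
The two rules above can be sketched with a toy in-memory cache. This is not 
the BucketCache implementation: the class, its methods, the string-based key 
format and the explicit reference-to-parent map are all hypothetical, and the 
sketch assumes a block in a reference file sits at the same offset as in the 
parent file.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a toy cache that resolves reference files before lookups. */
class RefAwareCacheSketch {
  // Hypothetical mapping from a reference file name to the parent file it points at.
  private final Map<String, String> refToParent = new HashMap<>();
  // Cached blocks, keyed by "<file name>@<block offset>".
  private final Map<String, byte[]> blocks = new HashMap<>();

  private static String key(String file, long offset) {
    return file + "@" + offset;
  }

  void registerReference(String refFile, String parentFile) {
    refToParent.put(refFile, parentFile);
  }

  /** Rule 1: skip the block if the parent's copy is cached, else store it under the given file. */
  boolean cacheBlock(String file, long offset, byte[] block) {
    String parent = refToParent.get(file);
    if (parent != null && blocks.containsKey(key(parent, offset))) {
      return false; // the parent file's block is already cached, so avoid a duplicate
    }
    blocks.put(key(file, offset), block);
    return true;
  }

  /** Rule 2: look for the parent's block first, then fall back to the reference key. */
  byte[] getBlock(String file, long offset) {
    String parent = refToParent.get(file);
    if (parent != null) {
      byte[] fromParent = blocks.get(key(parent, offset));
      if (fromParent != null) {
        return fromParent;
      }
    }
    return blocks.get(key(file, offset));
  }

  int size() {
    return blocks.size();
  }

  public static void main(String[] args) {
    RefAwareCacheSketch cache = new RefAwareCacheSketch();
    cache.registerReference("daughter-ref", "parent-hfile");
    cache.cacheBlock("parent-hfile", 0L, new byte[] {1});
    boolean stored = cache.cacheBlock("daughter-ref", 0L, new byte[] {1});
    System.out.println("duplicate stored: " + stored);
    System.out.println("hit via reference: " + (cache.getBlock("daughter-ref", 0L) != null));
  }
}
```

In this sketch, a block requested through the reference is served from the 
parent's entry when one exists, so the cache holds a single copy of it.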


      Resolution: Fixed

Merged into master, branch-3 and branch-2. Thanks for reviewing it [~taklwu] !

> Optimise BucketCache usage upon regions splits/merges.
> ------------------------------------------------------
>
>                 Key: HBASE-28596
>                 URL: https://issues.apache.org/jira/browse/HBASE-28596
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2
>
>
> This proposal aims to give users more flexibility to decide whether or 
> not blocks from a parent region should be evicted, and also to optimise cache 
> usage by resolving file reference blocks to the referred block in the cache.
> Some extra context:
> 1) Originally, the default behaviour on splits was to rely on the 
> "hbase.rs.evictblocksonclose" value to decide whether the cached blocks from 
> the split parent should be evicted. The resulting split daughters are then 
> opened with refs to the parent file. If hbase.rs.prefetchblocksonopen is set, 
> these openings trigger a prefetch of the blocks from the split parent, 
> now with cache keys from the ref path. That means, if 
> "hbase.rs.evictblocksonclose" is false and "hbase.rs.prefetchblocksonopen" is 
> true, we will be duplicating blocks in the cache. In scenarios where cache 
> usage is at capacity and the added latency for reading from the file system 
> is high (for example, when reading from cloud storage), this can have a severe 
> impact, as the prefetch for the refs would trigger evictions. Also, the refs 
> tend to be short lived, as compaction is triggered on the split daughters 
> soon after they are opened.
> 2) HBASE-27474 changed the original behaviour described above, to always 
> evict blocks from the split parent once the split is completed, and to 
> skip prefetch for refs (since refs are short lived). The side effect is 
> that the daughters' blocks are only cached once compaction is completed, 
> but compaction itself runs slower since it needs to read the blocks from 
> the file system. On regions as large as 20GB, the performance degradation 
> reported by users has been severe.
> This proposes a new "hbase.rs.evictblocksonsplit" configuration property that 
> makes the eviction on split configurable. Depending on the use case, the 
> impact of mass evictions due to cache capacity may be higher, in which case 
> users might prefer to keep evicting split parent blocks. Additionally, it 
> modifies the way we handle refs when caching. The HBASE-27474 behaviour was to 
> skip caching refs to avoid duplicate data in the cache as long as compaction 
> was enabled, relying on the fact that refs from splits are usually short 
> lived. Here, we propose modifying the search for block cache keys, so that 
> we always resolve the referenced file first and look for the related 
> referenced file block in the cache. That way we avoid duplicates in the cache 
> and also expedite scan performance on the split daughters, as scans now 
> resolve the referenced file and read from the cache.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)