[
https://issues.apache.org/jira/browse/HBASE-28596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wellington Chevreuil reopened HBASE-28596:
------------------------------------------
It seems the branch-2.6 PR broke
TestBucketCache.testNotifyFileCachingCompletedForEncodedDataSuccess on that
branch. I'll submit an addendum PR.
> Optimise BucketCache usage upon regions splits/merges.
> ------------------------------------------------------
>
> Key: HBASE-28596
> URL: https://issues.apache.org/jira/browse/HBASE-28596
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3
>
>
> This proposal aims to give more flexibility for users to decide whether or
> not blocks from a parent region should be evicted, and also optimise cache
> usage by resolving file reference blocks to the referred block in the cache.
> Some extra context:
> 1) Originally, the default behaviour on splits was to rely on the
> "hbase.rs.evictblocksonclose" value to decide if the cached blocks from the
> parent split should be evicted or not. The resulting split daughters are then
> opened with refs to the parent file. If hbase.rs.prefetchblocksonopen is set,
> these openings will trigger a prefetch of the blocks from the parent split,
> now with cache keys from the ref path. That means, if
> "hbase.rs.evictblocksonclose" is false and "hbase.rs.prefetchblocksonopen" is
> true, we will be duplicating blocks in the cache. In scenarios where cache
> usage is at capacity and added latency for reading from the file system is
> high (for example, when reading from cloud storage), this can have a severe
> impact, as the prefetch for the refs would trigger evictions. Also, the refs
> tend to be short-lived, as compaction is triggered on the split daughters
> soon after they are opened.
> 2) HBASE-27474 changed the original behaviour described above to always
> evict blocks from the split parent once the split is completed, and to skip
> prefetch for refs (since refs are short-lived). The side effect is that the
> daughters' blocks are only cached once compaction is completed,
> but compaction itself will run slower since it needs to read the blocks from
> the file system. On regions as large as 20GB, the performance degradation
> reported by users has been severe.
> This proposes a new "hbase.rs.evictblocksonsplit" configuration property that
> makes eviction on split configurable. Depending on the use case, the
> impact of mass evictions due to cache capacity may be higher, in which case
> users might prefer to keep evicting split parent blocks. Additionally, it
> modifies the way we handle refs when caching. The HBASE-27474 behaviour was to
> skip caching refs to avoid duplicate data in the cache as long as compaction
> was enabled, relying on the fact that refs from splits are usually
> short-lived. Here, we propose modifying the lookup of block cache keys, so that
> we always resolve the referenced file first and look for the related
> referenced file block in the cache. That way we avoid duplicates in the cache
> and also expedite scan performance on the split daughters, as reads now
> resolve the referenced file and are served from the cache.
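
The key-resolution idea in the description above can be illustrated with a minimal, self-contained sketch. It is not the HBase implementation: the class RefKeySketch, its methods, and the in-memory map are hypothetical, and it assumes a split reference file is named "<hfile>.<parentEncodedRegion>" and that a block is identified by file name plus offset. Real HBase file names (for example HFileLink-based references) have more cases than this handles.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not HBase's BucketCache or BlockCacheKey.
final class RefKeySketch {
    // Cache keyed by "<resolved file name>_<offset>".
    private final Map<String, byte[]> cache = new HashMap<>();

    // Assumption: a reference file name is "<hfile>.<parentEncodedRegion>",
    // so everything before the first '.' is the referred file name.
    static String resolveFileName(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // Always build the key from the referred file, never from the ref itself.
    static String cacheKey(String fileName, long offset) {
        return resolveFileName(fileName) + "_" + offset;
    }

    void cacheBlock(String fileName, long offset, byte[] block) {
        cache.put(cacheKey(fileName, offset), block);
    }

    byte[] getBlock(String fileName, long offset) {
        return cache.get(cacheKey(fileName, offset));
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        RefKeySketch blockCache = new RefKeySketch();
        // Block cached while the parent region was still open.
        blockCache.cacheBlock("0a1b2c", 0L, new byte[] {7});
        // A daughter reads through its reference and hits the parent's block.
        byte[] viaRef = blockCache.getBlock("0a1b2c.parentregion", 0L);
        System.out.println("hit via ref: " + (viaRef != null));
        // Caching through the ref overwrites the same entry, so no duplicate.
        blockCache.cacheBlock("0a1b2c.parentregion", 0L, new byte[] {7});
        System.out.println("entries: " + blockCache.size());
    }
}
```

Because both the parent file name and the reference name map to the same key in this sketch, a prefetch through the ref neither duplicates the block nor forces an eviction, which is the effect the proposal is after.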