[jira] [Created] (HBASE-29727) Introduce a String pool for repeating filename, region and cf string fields in BlockCacheKey

Wellington Chevreuil (Jira) Fri, 21 Nov 2025 13:22:13 -0800

Wellington Chevreuil created HBASE-29727:
--------------------------------------------


             Summary: Introduce a String pool for repeating filename, region 
and cf string fields in BlockCacheKey
                 Key: HBASE-29727
                 URL: https://issues.apache.org/jira/browse/HBASE-29727
             Project: HBase
          Issue Type: Improvement
            Reporter: Wellington Chevreuil
            Assignee: Wellington Chevreuil


For every block added to BucketCache, we create and keep a BlockCacheKey object 
with a String attribute for the name of the file the block belongs to, plus a 
Path containing the full path of that file. HFiles normally contain many 
blocks, and for all blocks from the same file these attributes have the very 
same value, yet we create separate instances for each block. With a file-based 
bucket cache, where the cache size is in the TB range, the total block count 
in the cache can grow very large, and so can the heap used by the BucketCache 
object, due to the high count of BlockCacheKey instances it has to keep.

For a few years now, my employer's reference architecture for HBase clusters 
in the cloud has been to deploy the HBase root dir on cloud storage and use 
the ephemeral SSD disks shipped with the RS node VMs for a file-based 
BucketCache. At the moment, the standard VM profile allows for as much as 
1.6TB of BucketCache capacity. For a cache of that size, with the default 
block size of 64KB, we see on average 30M blocks, with a minimum heap usage of 
around 12GB.
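
As a rough back-of-the-envelope check of the figures above, the sketch below 
computes the nominal block count for a 1.6TB cache with 64KB blocks and the 
heap cost per cached block. It assumes decimal units (1TB = 10^12 bytes, 
1GB = 10^9 bytes) and that the whole 12GB is attributable to per-block 
bookkeeping; both are simplifications, and the class and method names are 
hypothetical.

```java
// Back-of-the-envelope estimate only; names and unit choices are assumptions.
final class CacheHeapEstimateSketch {

  /** Number of blocks that fit if every block had exactly the nominal size. */
  static long nominalBlockCount(long cacheBytes, long blockBytes) {
    return cacheBytes / blockBytes;
  }

  /** Average heap bytes spent per cached block. */
  static long heapBytesPerBlock(long heapBytes, long blockCount) {
    return heapBytes / blockCount;
  }

  public static void main(String[] args) {
    long cacheBytes = 1_600_000_000_000L; // 1.6TB, decimal units assumed
    long blockBytes = 64L * 1024;         // default 64KB block size
    long heapBytes = 12_000_000_000L;     // ~12GB, decimal units assumed
    long observedBlocks = 30_000_000L;    // ~30M blocks reported on average

    System.out.println(nominalBlockCount(cacheBytes, blockBytes));   // 24414062
    System.out.println(heapBytesPerBlock(heapBytes, observedBlocks)); // 400
  }
}
```

Under these assumptions each cached block costs on the order of 400 bytes of 
heap, so any per-key saving is multiplied by tens of millions of blocks.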

With cloud providers now offering VM profiles with more ephemeral SSD 
capacity, we are looking for ways to reduce the heap usage of BucketCache. The 
approach proposed here is to define a "string pool" that maps the String 
attributes in the BlockCacheKey class to integer ids, so that we can save some 
bytes per key for blocks from the same file.
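
To illustrate the idea, here is a minimal sketch of such a pool. It is not the 
HBase implementation: the class names StringPoolSketch and PooledKeySketch, 
their methods, and the choice of a HashMap plus an ArrayList are all 
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a string pool; not taken from HBase.
final class StringPoolSketch {
  private final Map<String, Integer> idsByValue = new HashMap<>();
  private final List<String> valuesById = new ArrayList<>();

  /** Returns the id for the value, assigning the next free id on first use. */
  synchronized int encode(String value) {
    Integer id = idsByValue.get(value);
    if (id == null) {
      id = valuesById.size();
      idsByValue.put(value, id);
      valuesById.add(value);
    }
    return id;
  }

  /** Returns the string previously registered under the given id. */
  synchronized String decode(int id) {
    return valuesById.get(id);
  }

  /** Number of distinct strings currently held by the pool. */
  synchronized int size() {
    return valuesById.size();
  }
}

// Hypothetical slimmed-down key: stores an int id instead of a String.
final class PooledKeySketch {
  final int fileNameId;
  final long offset;

  PooledKeySketch(StringPoolSketch pool, String fileName, long offset) {
    this.fileNameId = pool.encode(fileName);
    this.offset = offset;
  }

  /** Resolves the file name back through the pool when it is needed. */
  String fileName(StringPoolSketch pool) {
    return pool.decode(fileNameId);
  }
}
```

With this shape, every key for the same file carries the same 4-byte id and 
the pool holds a single copy of the name. A real implementation would also 
need a way to release ids once a file has no blocks left in the cache (for 
example, by reference counting), which this sketch leaves out.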





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
