Thanks for the pointer — I wasn’t aware of HBASE-29585 when I sent the note, appreciate you sharing it.
From a quick look, it seems the current effort is focused on a Caffeine-based implementation integrated into HBase internals, which is great to see moving forward.

The motivation behind my proposal was slightly different: exploring whether a row cache could be implemented in a less intrusive way (coprocessor-only), with a minimal Java heap footprint and optional dictionary-based compression, mainly to understand the trade-offs around metadata overhead and memory density.

It sounds like the core use case is already being addressed in the ongoing work, so I won’t push this further on the list. I’ll keep following HBASE-29585 with interest, and I’m happy to share observations if they turn out to be useful.

Thanks again for the reference.

On Tue, Jan 13, 2026 at 7:43 AM 张铎(Duo Zhang) <[email protected]> wrote:

> There is already a jira issue and a contributor is working on it.
>
> https://issues.apache.org/jira/browse/HBASE-29585
>
> Charles Connell via dev <[email protected]> wrote on Tue, Jan 13, 2026 at 23:02:
> >
> > I'm interested in trying out the row cache for some of our data sets
> > at HubSpot. No timeline available yet, although I'm sure it will be
> > before the end of 2026. I'm excited to see what I can do for us.
> >
> > On Mon, Jan 12, 2026 at 10:52 PM Vladimir Rodionov
> > <[email protected]> wrote:
> > >
> > > Forgot to mention: Row Cache can easily be made cache-implementation
> > > agnostic (Caffeine, EHCache) if it matters.
> > >
> > > On Mon, Jan 12, 2026 at 6:27 PM Vladimir Rodionov <[email protected]>
> > > wrote:
> > >
> > > > Andor, below are my answers to your questions:
> > > >
> > > > > Doesn't the benefit of row-based caching strongly depend on the
> > > > > use case?
> > > >
> > > > Sure. It targets point queries, not scan operations.
> > > > The wiki page in the repo I linked lists numerous use cases where
> > > > a row cache is useful:
> > > >
> > > > https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
> > > >
> > > > > What’s the advantage if clients don’t always need the entire row,
> > > > > just a subset of cells?
> > > >
> > > > Yes, this is a known limitation of the current version. There is an
> > > > open ticket to support "sparse" rows:
> > > > https://github.com/VladRodionov/hbase-row-cache/issues/26
> > > >
> > > > > Is block cache more performant and memory efficient in this case?
> > > >
> > > > The only use case where block cache will be more performant is a scan
> > > > operation, which involves multiple rows. These caches are
> > > > complementary, not mutually exclusive. Row Cache has a serious
> > > > advantage in point queries (it can do up to 100 Kops on full-row
> > > > reads, where each row is 3 families with 3 columns and 10 versions).
> > > > Block cache is more suitable for larger operations, such as a scan of
> > > > multiple rows.
> > > > Row cache can be enabled/disabled per table and per column family.
> > > >
> > > > From a RAM usage perspective, Row Cache (Carrot Cache) uses an
> > > > advanced data compression scheme (zstd with dictionary), which
> > > > usually saves an additional 40-50% of RAM compared to
> > > > non-dictionary-based compression algorithms. It works well even if
> > > > the individual data item is less than 100 bytes. HBase Block Cache
> > > > (Bucket Cache) uses this type of compression as well (maybe I am
> > > > wrong here?), but it compresses the whole block.
> > > >
> > > > Performance-wise, I think Row Cache should be much faster than Block
> > > > Cache if the cached blocks are compressed (you need to decompress and
> > > > decode the whole block on a point read).
> > > >
> > > > Another limitation of the Block (Bucket) cache is its high metadata
> > > > overhead (roughly 100+ bytes vs 12-16 bytes in Row Cache). All
> > > > metadata in Row Cache (Carrot Cache) is off-heap as well.
> > > >
> > > > The repo has a nice write-up on when Row Cache is preferable to
> > > > Block cache.
> > > >
> > > > On Mon, Jan 12, 2026 at 5:27 PM Andor Molnár <[email protected]> wrote:
> > > >
> > > >> Thanks Vladimir.
> > > >>
> > > >> I think this would be a great addition to HBase.
> > > >>
> > > >> Doesn't the benefit of row-based caching strongly depend on the use
> > > >> case?
> > > >> What’s the advantage if clients don’t always need the entire row, just
> > > >> a subset of cells?
> > > >> Is block cache more performant and memory efficient in this case?
> > > >>
> > > >> Regards,
> > > >> Andor
> > > >>
> > > >> > On Jan 4, 2026, at 13:02, Vladimir Rodionov <[email protected]>
> > > >> > wrote:
> > > >> >
> > > >> > Hello HBase community,
> > > >> >
> > > >> > I’d like to start a discussion around a feature that exists in related
> > > >> > systems but is still missing in Apache HBase: row-level caching.
> > > >> >
> > > >> > Both *Cassandra* and *Google Bigtable* provide a row cache for hot rows.
> > > >> > Bigtable recently revisited this area and reported measurable gains for
> > > >> > single-row reads. HBase today relies almost entirely on *block cache*,
> > > >> > which is excellent for scans and predictable access patterns, but can be
> > > >> > inefficient for *small random reads*, *hot rows spanning multiple blocks*,
> > > >> > and *cloud / object-store–backed deployments*.
> > > >> >
> > > >> > To explore this gap, I’ve been working on a *Row Cache for HBase 2.x*,
> > > >> > implemented as a *RegionObserver coprocessor*, and I’d appreciate
> > > >> > feedback from HBase developers and operators.
> > > >> >
> > > >> > *Project*:
> > > >> >
> > > >> > https://github.com/VladRodionov/hbase-row-cache
> > > >> >
> > > >> > *Background / motivation (cloud focus):*
> > > >> >
> > > >> > https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
> > > >> >
> > > >> > What This Is
> > > >> >
> > > >> > - Row-level cache for HBase 2.x (coprocessor-based)
> > > >> > - Powered by *Carrot Cache* (mostly off-heap, GC-friendly)
> > > >> > - Multi-level cache (L1/L2/L3)
> > > >> > - Read-through caching of table : rowkey : column-family
> > > >> > - Cache invalidation on any mutation of the corresponding row+CF
> > > >> > - Designed for *read-mostly, random-access* workloads
> > > >> > - Can be enabled per table or per column family
> > > >> > - Typically used *instead of*, not alongside, block cache
> > > >> >
> > > >> > *Block Cache vs Row Cache (Conceptual)*
> > > >> >
> > > >> > Aspect                            | Block Cache                 | Row Cache
> > > >> > ----------------------------------+-----------------------------+-----------------------------------
> > > >> > Cached unit                       | HFile block (e.g. 64KB)     | Row / column family
> > > >> > Optimized for                     | Scans, sequential access    | Random small reads, hot rows
> > > >> > Memory efficiency for small reads | Low (unused data in blocks) | High (cache only requested data)
> > > >> > Rows spanning multiple blocks     | Multiple blocks cached      | Single cache entry
> > > >> > Read-path CPU cost                | Decode & merge every read   | Amortized across hits
> > > >> > Cloud / object store fit          | Necessary but expensive     | Reduces memory & I/O amplification
> > > >> >
> > > >> > Block cache remains essential; row cache targets a *different
> > > >> > optimization point*.
> > > >> >
> > > >> > *Non-Goals (Important)*
> > > >> >
> > > >> > - Not proposing removal or replacement of block cache
> > > >> > - Not suggesting this be merged into HBase core
> > > >> > - Not targeting scan-heavy or sequential workloads
> > > >> > - Not eliminating row reconstruction entirely
> > > >> > - Not optimized for write-heavy or highly mutable tables
> > > >> > - Not changing HBase storage or replication semantics
> > > >> >
> > > >> > This is an *optional optimization* for a specific class of workloads.
> > > >> >
> > > >> > *Why I’m Posting*
> > > >> >
> > > >> > This is *not a merge proposal*, but a request for discussion:
> > > >> >
> > > >> > 1. Do you see *row-level caching* as relevant for modern HBase
> > > >> >    deployments?
> > > >> > 2. Are there workloads where block cache alone is insufficient today?
> > > >> > 3. Is a coprocessor-based approach reasonable for experimentation?
> > > >> > 4. Are there historical or architectural reasons why row cache never
> > > >> >    landed in HBase?
> > > >> >
> > > >> > Any feedback, positive or critical, is very welcome.
> > > >> >
> > > >> > Best regards,
> > > >> >
> > > >> > Vladimir Rodionov
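
For anyone skimming the thread, the "read-through caching of table : rowkey : column-family" and "invalidation on any mutation" behavior from the original proposal can be illustrated roughly as follows. This is only a minimal sketch in plain Java: the class, method, and field names are hypothetical, the loader stands in for the real region read, and none of it is taken from the hbase-row-cache code or from the HBase coprocessor API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: names are illustrative, not from HBase or hbase-row-cache.
class RowCacheSketch {
    // Key is the triple (table, rowkey, column family); value is the cached row data.
    private final Map<List<String>, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the underlying read path (e.g. the normal region read).
    private final Function<List<String>, String> loader;
    // Counts how often the underlying store was actually read.
    final AtomicInteger loads = new AtomicInteger();

    RowCacheSketch(Function<List<String>, String> loader) {
        this.loader = loader;
    }

    // Read-through: serve from cache, otherwise load from the store and remember it.
    String get(String table, String row, String family) {
        return cache.computeIfAbsent(List.of(table, row, family), k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Called after any mutation of the row + column family: drop the cached entry.
    void invalidate(String table, String row, String family) {
        cache.remove(List.of(table, row, family));
    }

    public static void main(String[] args) {
        Map<List<String>, String> store = new ConcurrentHashMap<>();
        List<String> key = List.of("users", "row-1", "cf");
        store.put(key, "v1");

        RowCacheSketch rc = new RowCacheSketch(store::get);
        rc.get("users", "row-1", "cf");          // miss: loads from the store
        rc.get("users", "row-1", "cf");          // hit: no store access

        store.put(key, "v2");                    // simulated mutation
        rc.invalidate("users", "row-1", "cf");   // invalidate the row + CF entry
        System.out.println(rc.get("users", "row-1", "cf") + " loads=" + rc.loads.get());
    }
}
```

The sketch ignores everything that makes a real implementation hard (off-heap storage, eviction, races between a mutation and a concurrent read), so it should be read as an illustration of the access pattern only.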

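
On the dictionary compression point from my earlier reply (it helps even when an item is under 100 bytes), here is a small self-contained illustration. Please note the substitution: Carrot Cache uses zstd with a dictionary, but the JDK has no zstd binding, so this sketch uses the preset-dictionary feature of `java.util.zip.Deflater` / `Inflater` instead. The sample record and dictionary contents are made up for the example, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: zlib preset dictionaries as a stand-in for zstd dictionaries.
class DictionaryCompressionSketch {
    // Compress input; if dictionary is non-null it is used as a preset dictionary.
    static byte[] compress(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        if (dictionary != null) {
            deflater.setDictionary(dictionary);
        }
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress data, supplying the same dictionary if the stream asks for one.
    static byte[] decompress(byte[] data, byte[] dictionary) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0) {
                if (inflater.needsDictionary()) {
                    if (dictionary == null) {
                        throw new DataFormatException("stream requires a dictionary");
                    }
                    inflater.setDictionary(dictionary);
                } else if (inflater.needsInput()) {
                    throw new DataFormatException("truncated input");
                }
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        // Made-up dictionary built from fragments that typical records share.
        byte[] dict = ("{\"user_id\":,\"status\":\"active\",\"region\":\"us-east-1\"}"
                + "{\"user_id\":,\"status\":\"inactive\",\"region\":\"eu-west-1\"}")
                .getBytes(StandardCharsets.UTF_8);
        // A single small record, well under 100 bytes.
        byte[] record = "{\"user_id\":48213,\"status\":\"active\",\"region\":\"us-east-1\"}"
                .getBytes(StandardCharsets.UTF_8);

        byte[] plain = compress(record, null);
        byte[] withDict = compress(record, dict);
        System.out.println("raw=" + record.length
                + " no-dict=" + plain.length
                + " dict=" + withDict.length
                + " roundtrip=" + Arrays.equals(record, decompress(withDict, dict)));
    }
}
```

A small record has too little internal repetition for a general-purpose compressor to exploit, while a shared dictionary lets it reference patterns common across records. The actual savings depend entirely on the data and on how the dictionary is trained, so the sizes printed here say nothing about the 40-50% figure quoted earlier.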