Thanks Vladimir. I think this would be a great addition to HBase.
Don't the benefits of row-based caching strongly depend on the use case? What's the advantage if clients don't always need the entire row, just a subset of cells? Is block cache more performant and memory-efficient in that case?

Regards,
Andor

> On Jan 4, 2026, at 13:02, Vladimir Rodionov <[email protected]> wrote:
>
> Hello HBase community,
>
> I'd like to start a discussion around a feature that exists in related
> systems but is still missing in Apache HBase: row-level caching.
>
> Both *Cassandra* and *Google Bigtable* provide a row cache for hot rows.
> Bigtable recently revisited this area and reported measurable gains for
> single-row reads. HBase today relies almost entirely on *block cache*,
> which is excellent for scans and predictable access patterns, but can be
> inefficient for *small random reads*, *hot rows spanning multiple blocks*,
> and *cloud / object-store-backed deployments*.
>
> To explore this gap, I've been working on an *HBase Row Cache for HBase 2.x*,
> implemented as a *RegionObserver coprocessor*, and I'd appreciate feedback
> from HBase developers and operators.
>
> *Project*:
> https://github.com/VladRodionov/hbase-row-cache
>
> *Background / motivation (cloud focus):*
> https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
>
> *What This Is*
>
> - Row-level cache for HBase 2.x (coprocessor-based)
> - Powered by *Carrot Cache* (mostly off-heap, GC-friendly)
> - Multi-level cache (L1/L2/L3)
> - Read-through caching of table : rowkey : column-family
> - Cache invalidation on any mutation of the corresponding row+CF
> - Designed for *read-mostly, random-access* workloads
> - Can be enabled per table or per column family
> - Typically used *instead of*, not alongside, block cache
>
> *Block Cache vs Row Cache (Conceptual)*
>
> Aspect                            | Block Cache                 | Row Cache
> ----------------------------------+-----------------------------+-----------------------------------
> Cached unit                       | HFile block (e.g. 64KB)     | Row / column family
> Optimized for                     | Scans, sequential access    | Random small reads, hot rows
> Memory efficiency for small reads | Low (unused data in blocks) | High (cache only requested data)
> Rows spanning multiple blocks     | Multiple blocks cached      | Single cache entry
> Read-path CPU cost                | Decode & merge every read   | Amortized across hits
> Cloud / object store fit          | Necessary but expensive     | Reduces memory & I/O amplification
>
> Block cache remains essential; row cache targets a *different optimization
> point*.
>
> *Non-Goals (Important)*
>
> - Not proposing removal or replacement of block cache
> - Not suggesting this be merged into HBase core
> - Not targeting scan-heavy or sequential workloads
> - Not eliminating row reconstruction entirely
> - Not optimized for write-heavy or highly mutable tables
> - Not changing HBase storage or replication semantics
>
> This is an *optional optimization* for a specific class of workloads.
>
> *Why I'm Posting*
>
> This is *not a merge proposal*, but a request for discussion:
>
> 1. Do you see *row-level caching* as relevant for modern HBase deployments?
> 2. Are there workloads where block cache alone is insufficient today?
> 3. Is a coprocessor-based approach reasonable for experimentation?
> 4. Are there historical or architectural reasons why a row cache never
>    landed in HBase?
>
> Any feedback, positive or critical, is very welcome.
>
> Best regards,
>
> Vladimir Rodionov
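For readers following along, the read-through and invalidate-on-mutation semantics described in the "What This Is" list above can be modeled in a few lines, independent of the coprocessor API. This is a toy sketch only, not the project's actual code: the class `RowCacheSketch` and its method names are illustrative, and a real implementation would hook `preGetOp`/mutation observer callbacks and use an off-heap cache rather than a heap map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Toy model of a row-level read-through cache keyed by
 * table : rowkey : column-family, as described in the proposal.
 * Illustrative only; not the hbase-row-cache implementation.
 */
public class RowCacheSketch {
    // Heap map stands in for the real off-heap (Carrot Cache) store.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    private static String key(String table, String rowKey, String cf) {
        return table + ":" + rowKey + ":" + cf;
    }

    /** Read-through: serve from cache; on a miss, load from the store and cache it. */
    public byte[] get(String table, String rowKey, String cf,
                      Function<String, byte[]> storeLoader) {
        return cache.computeIfAbsent(key(table, rowKey, cf), storeLoader);
    }

    /** Invalidation: any mutation of the row + CF evicts the cached entry. */
    public void onMutation(String table, String rowKey, String cf) {
        cache.remove(key(table, rowKey, cf));
    }
}
```

The key point the sketch shows is why this suits read-mostly workloads: every hit skips block decode and merge entirely, while every mutation pays an eviction, so highly mutable tables see little benefit.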
