Forgot to mention: Row Cache can easily be made cache-implementation agnostic (e.g., Caffeine, EHCache), if that matters.
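Here is a minimal sketch of what "cache-implementation agnostic" could look like. It is in Python for brevity, although the project itself targets the JVM. The row-cache logic depends only on a small backend interface, and a bounded LRU map stands in for Carrot Cache, Caffeine or EHCache. `CacheBackend` and `LruBackend` are hypothetical names for illustration, not the hbase-row-cache API.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict
from typing import Optional


class CacheBackend(ABC):
    """Hypothetical pluggable backend; a real one might wrap Carrot Cache, Caffeine or EHCache."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def invalidate(self, key: bytes) -> None: ...


class LruBackend(CacheBackend):
    """In-process LRU stand-in, bounded by entry count (not by bytes)."""

    def __init__(self, max_entries: int) -> None:
        self._max = max_entries
        self._data: "OrderedDict[bytes, bytes]" = OrderedDict()

    def get(self, key: bytes) -> Optional[bytes]:
        value = self._data.get(key)
        if value is not None:
            self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used entry

    def invalidate(self, key: bytes) -> None:
        self._data.pop(key, None)
```

With this split, swapping the cache library means writing one adapter class behind the same interface. The read-through and invalidation logic stays untouched.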
On Mon, Jan 12, 2026 at 6:27 PM Vladimir Rodionov wrote:

> Andor, below are my answers to your questions:
>
> > Don't the benefits of row-based caching strongly depend on the use case?
>
> Sure. Row caching targets point queries, not scan operations. The repo wiki
> page I linked,
> https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
> lists numerous use cases where a row cache is useful.
>
> > What’s the advantage if clients don’t always need the entire row, just a
> > subset of cells?
>
> Yes, this is a known limitation of the current version. There is an open
> ticket to support "sparse" rows:
> https://github.com/VladRodionov/hbase-row-cache/issues/26
>
> > Is block cache more performant and memory efficient in this case?
>
> The only use case where block cache will be more performant is a scan
> operation involving multiple rows. These caches are complementary, not
> mutually exclusive. Row Cache has a serious advantage in point queries
> (it can do up to 100 Kops on full-row reads, where each row has 3 families
> with 3 columns and 10 versions). Block cache is better suited to larger
> operations, such as a scan of multiple rows.
> Row cache can be enabled or disabled per table and per column family.
>
> From a RAM usage perspective, Row Cache (Carrot Cache) uses an advanced data
> compression scheme (zstd with a dictionary), which usually saves an
> additional 40-50% of RAM compared to all non-dictionary-based compression
> algorithms. It works well even if an individual data item is smaller than
> 100 bytes. HBase Block Cache (Bucket Cache) uses this type of compression as
> well (maybe I am wrong here?), but it compresses the whole block.
>
> Performance-wise, I think Row Cache should be much faster than Block Cache
> if the cached blocks are compressed (a point read has to decompress and
> decode the whole block).
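The project's zstd-with-dictionary scheme is not reproduced here. As a stand-in, the sketch below uses zlib's preset-dictionary support (`zdict`) from the Python standard library to show the underlying idea. A small value that shares structure with a dictionary compresses much better than the same value compressed on its own. The sample records are made up. The "dictionary" is a plain concatenation of samples, whereas real zstd dictionaries are trained from samples.

```python
import zlib

# Hypothetical sample: small serialized cells that share a lot of structure.
SAMPLES = [
    b"row=user#000101|cf=profile|q=email|v=bob@example.com|ts=1736700000101",
    b"row=user#000102|cf=profile|q=email|v=carol@example.com|ts=1736700000102",
    b"row=user#000103|cf=profile|q=email|v=dave@example.com|ts=1736700000103",
]
# Stand-in for a trained dictionary: just the concatenated samples.
ZDICT = b"".join(SAMPLES)


def compress(value: bytes, zdict: bytes = b"") -> bytes:
    """Compress one small value, optionally against a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return c.compress(value) + c.flush()


def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


record = b"row=user#000123|cf=profile|q=email|v=alice@example.com|ts=1736700000123"
plain = compress(record)
with_dict = compress(record, ZDICT)
assert decompress(with_dict, ZDICT) == record
print("raw:", len(record), "no dict:", len(plain), "with dict:", len(with_dict))
```

Standalone compression of a ~70-byte value has almost no internal repetition to exploit. Matching against a shared dictionary does, which is the effect that lets per-item compression remain useful for items under 100 bytes.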
>
> Another limitation of Block (Bucket) Cache is its high metadata overhead
> (100+ bytes per cache entry vs. 12-16 bytes in Row Cache). All metadata in
> Row Cache (Carrot Cache) is off-heap as well.
>
> The repo has a nice write-up on when Row Cache is preferable to Block Cache.
>
> On Mon, Jan 12, 2026 at 5:27 PM Andor Molnár wrote:
>
>> Thanks Vladimir.
>>
>> I think this would be a great addition to HBase.
>>
>> Don't the benefits of row-based caching strongly depend on the use case?
>> What’s the advantage if clients don’t always need the entire row, just a
>> subset of cells?
>> Is block cache more performant and memory efficient in this case?
>>
>> Regards,
>> Andor
>>
>> > On Jan 4, 2026, at 13:02, Vladimir Rodionov wrote:
>> >
>> > Hello HBase community,
>> >
>> > I’d like to start a discussion around a feature that exists in related
>> > systems but is still missing in Apache HBase: row-level caching.
>> >
>> > Both *Cassandra* and *Google Bigtable* provide a row cache for hot rows.
>> > Bigtable recently revisited this area and reported measurable gains for
>> > single-row reads. HBase today relies almost entirely on *block cache*,
>> > which is excellent for scans and predictable access patterns, but can be
>> > inefficient for *small random reads*, *hot rows spanning multiple blocks*,
>> > and *cloud / object-store-backed deployments*.
>> >
>> > To explore this gap, I’ve been working on an *HBase Row Cache for HBase 2.x*,
>> > implemented as a *RegionObserver coprocessor*, and I’d appreciate feedback
>> > from HBase developers and operators.
>> >
>> > *Project*:
>> > https://github.com/VladRodionov/hbase-row-cache
>> >
>> > *Background / motivation (cloud focus):*
>> > https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
>> >
>> > *What This Is*
>> >
>> > - Row-level cache for HBase 2.x (coprocessor-based)
>> > - Powered by *Carrot Cache* (mostly off-heap, GC-friendly)
>> > - Multi-level cache (L1/L2/L3)
>> > - Read-through caching of table : rowkey : column-family
>> > - Cache invalidation on any mutation of the corresponding row+CF
>> > - Designed for *read-mostly, random-access* workloads
>> > - Can be enabled per table or per column family
>> > - Typically used *instead of*, not alongside, block cache
>> >
>> > *Block Cache vs Row Cache (Conceptual)*
>> >
>> > Aspect                            | Block Cache                 | Row Cache
>> > ----------------------------------+-----------------------------+-----------------------------------
>> > Cached unit                       | HFile block (e.g. 64 KB)    | Row / column family
>> > Optimized for                     | Scans, sequential access    | Random small reads, hot rows
>> > Memory efficiency for small reads | Low (unused data in blocks) | High (cache only requested data)
>> > Rows spanning multiple blocks     | Multiple blocks cached      | Single cache entry
>> > Read-path CPU cost                | Decode & merge every read   | Amortized across hits
>> > Cloud / object store fit          | Necessary but expensive     | Reduces memory & I/O amplification
>> >
>> > Block cache remains essential; row cache targets a *different
>> > optimization point*.
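The "read-through caching of table : rowkey : column-family" and "invalidation on any mutation of the corresponding row+CF" bullets can be sketched as a simplified, single-threaded model. It rests on several assumptions. `RowCacheSketch`, `load` and `on_mutation` are hypothetical names. A plain dict replaces Carrot Cache. A loader function stands in for the normal HBase read path. The project's actual RegionObserver hooks are not modeled.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Cache key: (table, rowkey, column family), mirroring "table : rowkey : column-family".
CacheKey = Tuple[str, bytes, str]
Loader = Callable[[str, bytes, str], Optional[bytes]]


class RowCacheSketch:
    """Hypothetical read-through row cache; not the hbase-row-cache implementation."""

    def __init__(self, load: Loader, enabled: Set[Tuple[str, str]]) -> None:
        self._load = load            # stands in for the normal HBase read path
        self._enabled = enabled      # (table, column family) pairs with caching on
        self._cache: Dict[CacheKey, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, table: str, row: bytes, cf: str) -> Optional[bytes]:
        if (table, cf) not in self._enabled:
            return self._load(table, row, cf)  # caching disabled: bypass entirely
        key = (table, row, cf)
        cached = self._cache.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        value = self._load(table, row, cf)     # read through on a miss
        if value is not None:
            self._cache[key] = value           # populate for later reads
        return value

    def on_mutation(self, table: str, row: bytes, cf: str) -> None:
        # Any mutation (Put, Delete, Increment, Append) of row+CF drops the entry.
        self._cache.pop((table, row, cf), None)


if __name__ == "__main__":
    store = {("users", b"u1", "profile"): b"v1"}
    cache = RowCacheSketch(lambda t, r, f: store.get((t, r, f)), {("users", "profile")})
    cache.get("users", b"u1", "profile")          # miss: loaded from the store
    cache.get("users", b"u1", "profile")          # hit: served from the cache
    store[("users", b"u1", "profile")] = b"v2"    # simulate a Put ...
    cache.on_mutation("users", b"u1", "profile")  # ... and its invalidation
    print(cache.get("users", b"u1", "profile"), cache.hits, cache.misses)
```

A real implementation also has to handle a race that this sketch ignores: a mutation can land between a miss's read and its cache fill. Without care, that leaves a stale entry in the cache.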
>> >
>> > *Non-Goals (Important)*
>> >
>> > - Not proposing removal or replacement of block cache
>> > - Not suggesting this be merged into HBase core
>> > - Not targeting scan-heavy or sequential workloads
>> > - Not eliminating row reconstruction entirely
>> > - Not optimized for write-heavy or highly mutable tables
>> > - Not changing HBase storage or replication semantics
>> >
>> > This is an *optional optimization* for a specific class of workloads.
>> >
>> > *Why I’m Posting*
>> >
>> > This is *not a merge proposal*, but a request for discussion:
>> >
>> > 1. Do you see *row-level caching* as relevant for modern HBase deployments?
>> > 2. Are there workloads where block cache alone is insufficient today?
>> > 3. Is a coprocessor-based approach reasonable for experimentation?
>> > 4. Are there historical or architectural reasons why row cache never landed in HBase?
>> >
>> > Any feedback, positive or critical, is very welcome.
>> >
>> > Best regards,
>> >
>> > Vladimir Rodionov
