Thanks Vladimir. I think this would be a great addition to HBase.
Don't the benefits of row-based caching strongly depend on the use case? What's the advantage if clients don't always need the entire row, just a subset of cells? Is block cache more performant and memory-efficient in that case?

Regards,
Andor

> On Jan 4, 2026, at 13:02, Vladimir Rodionov <[email protected]> wrote:
>
> Hello HBase community,
>
> I'd like to start a discussion around a feature that exists in related
> systems but is still missing in Apache HBase: row-level caching.
>
> Both *Cassandra* and *Google Bigtable* provide a row cache for hot rows.
> Bigtable recently revisited this area and reported measurable gains for
> single-row reads. HBase today relies almost entirely on *block cache*,
> which is excellent for scans and predictable access patterns, but can be
> inefficient for *small random reads*, *hot rows spanning multiple blocks*,
> and *cloud / object-store-backed deployments*.
>
> To explore this gap, I've been working on an *HBase Row Cache for HBase 2.x*,
> implemented as a *RegionObserver coprocessor*, and I'd appreciate feedback
> from HBase developers and operators.
>
> *Project*:
> https://github.com/VladRodionov/hbase-row-cache
>
> *Background / motivation (cloud focus):*
> https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
>
> *What This Is*
>
> - Row-level cache for HBase 2.x (coprocessor-based)
> - Powered by *Carrot Cache* (mostly off-heap, GC-friendly)
> - Multi-level cache (L1/L2/L3)
> - Read-through caching of table : rowkey : column-family
> - Cache invalidation on any mutation of the corresponding row+CF
> - Designed for *read-mostly, random-access* workloads
> - Can be enabled per table or per column family
> - Typically used *instead of*, not alongside, block cache
>
> *Block Cache vs Row Cache (Conceptual)*
>
> Aspect                            | Block Cache                 | Row Cache
> ----------------------------------+-----------------------------+-----------------------------------
> Cached unit                       | HFile block (e.g. 64KB)     | Row / column family
> Optimized for                     | Scans, sequential access    | Random small reads, hot rows
> Memory efficiency for small reads | Low (unused data in blocks) | High (cache only requested data)
> Rows spanning multiple blocks     | Multiple blocks cached      | Single cache entry
> Read-path CPU cost                | Decode & merge every read   | Amortized across hits
> Cloud / object store fit          | Necessary but expensive     | Reduces memory & I/O amplification
>
> Block cache remains essential; row cache targets a *different optimization
> point*.
>
> *Non-Goals (Important)*
>
> - Not proposing removal or replacement of block cache
> - Not suggesting this be merged into HBase core
> - Not targeting scan-heavy or sequential workloads
> - Not eliminating row reconstruction entirely
> - Not optimized for write-heavy or highly mutable tables
> - Not changing HBase storage or replication semantics
>
> This is an *optional optimization* for a specific class of workloads.
>
> *Why I'm Posting*
>
> This is *not a merge proposal*, but a request for discussion:
>
> 1. Do you see *row-level caching* as relevant for modern HBase deployments?
> 2. Are there workloads where block cache alone is insufficient today?
> 3. Is a coprocessor-based approach reasonable for experimentation?
> 4. Are there historical or architectural reasons why a row cache never
>    landed in HBase?
>
> Any feedback, positive or critical, is very welcome.
>
> Best regards,
>
> Vladimir Rodionov
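For readers following along, the read-through and invalidate-on-mutation semantics described in the "What This Is" list above can be modeled in a few lines, independent of the coprocessor API. This is a toy sketch only, not the project's actual code: the class `RowCacheSketch` and its method names are illustrative, and a real implementation would hook `preGetOp`/mutation observer callbacks and use an off-heap cache rather than a heap map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Toy model of a row-level read-through cache keyed by
 * table : rowkey : column-family, as described in the proposal.
 * Illustrative only; not the hbase-row-cache implementation.
 */
public class RowCacheSketch {
    // Heap map stands in for the real off-heap (Carrot Cache) store.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    private static String key(String table, String rowKey, String cf) {
        return table + ":" + rowKey + ":" + cf;
    }

    /** Read-through: serve from cache; on a miss, load from the store and cache it. */
    public byte[] get(String table, String rowKey, String cf,
                      Function<String, byte[]> storeLoader) {
        return cache.computeIfAbsent(key(table, rowKey, cf), storeLoader);
    }

    /** Invalidation: any mutation of the row + CF evicts the cached entry. */
    public void onMutation(String table, String rowKey, String cf) {
        cache.remove(key(table, rowKey, cf));
    }
}
```

The key point the sketch shows is why this suits read-mostly workloads: every hit skips block decode and merge entirely, while every mutation pays an eviction, so highly mutable tables see little benefit.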
