Re: [DISCUSS] Row Cache for HBase 2.x/3.x – feedback from the community

Vladimir Rodionov Mon, 12 Jan 2026 19:52:20 -0800

Forgot to mention: Row Cache can easily be made cache-implementation
agnostic (Caffeine, Ehcache) if that matters.
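
A minimal sketch of what that could look like, in Java. Everything named
here (RowCacheBackend, MapBackend, RowCacheSketch) is hypothetical and not
taken from the hbase-row-cache code. The loader function stands in for the
real HBase read path, and the "table:rowkey:family" string key is a
simplification (real row keys are byte arrays and may contain ':').

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical backend abstraction; NOT the hbase-row-cache API. */
interface RowCacheBackend {
    byte[] get(String key);            // returns null on a miss
    void put(String key, byte[] value);
    void invalidate(String key);
}

/** Stand-in backend; a Caffeine or Ehcache adapter would implement the same methods. */
final class MapBackend implements RowCacheBackend {
    private final Map<String, byte[]> map = new ConcurrentHashMap<>();
    public byte[] get(String key) { return map.get(key); }
    public void put(String key, byte[] value) { map.put(key, value); }
    public void invalidate(String key) { map.remove(key); }
}

/** Read-through cache keyed by table : rowkey : column-family (assumed key format). */
final class RowCacheSketch {
    private final RowCacheBackend backend;
    private final Function<String, byte[]> loader; // assumption: stands in for the HBase read
    private int loads = 0;

    RowCacheSketch(RowCacheBackend backend, Function<String, byte[]> loader) {
        this.backend = backend;
        this.loader = loader;
    }

    static String key(String table, String row, String family) {
        return table + ":" + row + ":" + family;
    }

    /** Serve from the backend on a hit; otherwise load, cache, and return. */
    byte[] read(String table, String row, String family) {
        String k = key(table, row, family);
        byte[] cached = backend.get(k);
        if (cached != null) {
            return cached;
        }
        byte[] loaded = loader.apply(k);
        loads++;
        if (loaded != null) {
            backend.put(k, loaded);
        }
        return loaded;
    }

    /** Any mutation of the row+CF drops the cached entry. */
    void onMutation(String table, String row, String family) {
        backend.invalidate(key(table, row, family));
    }

    int loadCount() { return loads; }

    public static void main(String[] args) {
        RowCacheSketch cache = new RowCacheSketch(new MapBackend(), k -> new byte[] {1});
        cache.read("t", "r1", "cf");       // miss: goes to the loader
        cache.read("t", "r1", "cf");       // hit: served from the backend
        cache.onMutation("t", "r1", "cf"); // invalidates the entry
        cache.read("t", "r1", "cf");       // miss again
        System.out.println("loads=" + cache.loadCount());
    }
}
```

Swapping the backend would then only mean passing a different
RowCacheBackend to the constructor; the read-through and invalidation
logic stays the same.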



On Mon, Jan 12, 2026 at 6:27 PM Vladimir Rodionov <[email protected]>
wrote:

> Andor, my answers to your questions are below:
>
> > Don't the benefits of row-based caching strongly depend on the use
> > case?
>
> Sure. Row cache targets point queries, not scans. The wiki page I linked
> earlier lists numerous use cases where a row cache is useful:
> https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
>
> > What’s the advantage if clients don’t always need the entire row, just a
> > subset of cells?
>
> This is a known limitation of the current version. There is an open
> ticket to support "sparse" rows:
> https://github.com/VladRodionov/hbase-row-cache/issues/26
>
> > Is block cache more performant and memory efficient in this case?
>
> The only case where block cache is more performant is a scan, which
> involves multiple rows. The two caches are complementary, not mutually
> exclusive. Row Cache has a serious advantage in point queries (it can do
> up to 100 Kops on full-row reads, where each row has 3 families with
> 3 columns and 10 versions). Block cache is better suited to larger
> operations, such as multi-row scans.
>
> Row cache can be enabled or disabled per table and per column family.
>
> From a RAM usage perspective, Row Cache (Carrot Cache) uses an advanced
> data compression scheme (zstd with dictionary), which usually saves an
> additional 40-50% of RAM compared to non-dictionary-based compression
> algorithms. It works well even when an individual data item is smaller
> than 100 bytes. HBase Block Cache (Bucket Cache) uses this type of
> compression as well (maybe I am wrong here?), but it compresses the
> whole block.
>
> Performance-wise, I think Row Cache should be much faster than Block Cache
> if the cached blocks are compressed (a point read then has to decompress
> and decode the whole block).
>
> Another limitation of the Block (Bucket) Cache is its high metadata
> overhead (100+ bytes vs 12-16 bytes in Row Cache). All metadata in Row
> Cache (Carrot Cache) is off-heap as well.
>
> The repo has a nice write-up on when Row Cache is preferable to Block
> Cache.
>
>
>
> On Mon, Jan 12, 2026 at 5:27 PM Andor Molnár <[email protected]> wrote:
>
>> Thanks Vladimir.
>>
>> I think this would be a great addition to HBase.
>>
>> Don't the benefits of row-based caching strongly depend on the use
>> case?
>> What’s the advantage if clients don’t always need the entire row, just a
>> subset of cells?
>> Is block cache more performant and memory efficient in this case?
>>
>> Regards,
>> Andor
>>
>>
>>
>>
>> > On Jan 4, 2026, at 13:02, Vladimir Rodionov <[email protected]>
>> wrote:
>> >
>> > Hello HBase community,
>> >
>> > I’d like to start a discussion around a feature that exists in related
>> > systems but is still missing in Apache HBase: row-level caching.
>> >
>> > Both *Cassandra* and *Google Bigtable* provide a row cache for hot rows.
>> > Bigtable recently revisited this area and reported measurable gains for
>> > single-row reads. HBase today relies almost entirely on *block cache*,
>> > which is excellent for scans and predictable access patterns, but can be
>> > inefficient for *small random reads*, *hot rows spanning multiple
>> > blocks*, and *cloud / object-store–backed deployments*.
>> >
>> > To explore this gap, I’ve been working on an *HBase Row Cache for HBase
>> > 2.x*, implemented as a *RegionObserver coprocessor*, and I’d appreciate
>> > feedback from HBase developers and operators.
>> >
>> > *Project*:
>> > https://github.com/VladRodionov/hbase-row-cache
>> >
>> > *Background / motivation (cloud focus):*
>> > https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
>> >
>> > What This Is
>> >
>> >   - Row-level cache for HBase 2.x (coprocessor-based)
>> >   - Powered by *Carrot Cache* (mostly off-heap, GC-friendly)
>> >   - Multi-level cache (L1/L2/L3)
>> >   - Read-through caching of table : rowkey : column-family
>> >   - Cache invalidation on any mutation of the corresponding row+CF
>> >   - Designed for *read-mostly, random-access* workloads
>> >   - Can be enabled per table or per column family
>> >   - Typically used *instead of*, not alongside, block cache
>> > *Block Cache vs Row Cache (Conceptual)*
>> >
>> > Aspect                            | Block Cache                 | Row Cache
>> > ----------------------------------+-----------------------------+-----------------------------------
>> > Cached unit                       | HFile block (e.g. 64KB)     | Row / column family
>> > Optimized for                     | Scans, sequential access    | Random small reads, hot rows
>> > Memory efficiency for small reads | Low (unused data in blocks) | High (cache only requested data)
>> > Rows spanning multiple blocks     | Multiple blocks cached      | Single cache entry
>> > Read-path CPU cost                | Decode & merge every read   | Amortized across hits
>> > Cloud / object store fit          | Necessary but expensive     | Reduces memory & I/O amplification
>> >
>> > Block cache remains essential; row cache targets a *different
>> > optimization point*.
>> >
>> > *Non-Goals (Important)*
>> >
>> >   - Not proposing removal or replacement of block cache
>> >   - Not suggesting this be merged into HBase core
>> >   - Not targeting scan-heavy or sequential workloads
>> >   - Not eliminating row reconstruction entirely
>> >   - Not optimized for write-heavy or highly mutable tables
>> >   - Not changing HBase storage or replication semantics
>> >
>> > This is an *optional optimization* for a specific class of workloads.
>> >
>> > *Why I’m Posting*
>> >
>> > This is *not a merge proposal*, but a request for discussion:
>> >
>> >
>> >   1. Do you see *row-level caching* as relevant for modern HBase
>> >      deployments?
>> >   2. Are there workloads where block cache alone is insufficient today?
>> >   3. Is a coprocessor-based approach reasonable for experimentation?
>> >   4. Are there historical or architectural reasons why row cache never
>> >      landed in HBase?
>> >
>> > Any feedback—positive or critical—is very welcome.
>> >
>> > Best regards,
>> >
>> > Vladimir Rodionov
>>
>>
