Re: [DISCUSS] Row Cache for HBase 2.x/3.x – feedback from the community

Vladimir Rodionov Tue, 13 Jan 2026 10:20:24 -0800

Thanks for the pointer — I wasn’t aware of HBASE-29585 when I sent the
note, appreciate you sharing it.


From a quick look, it seems the current effort is focused on a
Caffeine-based implementation integrated into HBase internals, which is
great to see moving forward.

The motivation behind my proposal was slightly different: exploring whether
a row cache could be implemented in a more non-intrusive way
(coprocessor-only), with minimal Java heap footprint and optional
compression (dictionary-based),
mainly to understand the trade-offs around metadata overhead and memory
density. It sounds like the core use case is already being addressed in the
ongoing work, so I won’t push this further on the list.
I’ll keep following HBASE-29585 with interest, and happy to share
observations if they turn out to be useful.


Thanks again for the reference.



On Tue, Jan 13, 2026 at 7:43 AM 张铎(Duo Zhang) <[email protected]> wrote:

> There is already a jira issue and a contributor is working on it.
>
> https://issues.apache.org/jira/browse/HBASE-29585
>
> Charles Connell via dev <[email protected]> wrote on Tue, Jan 13, 2026 at 23:02:
> >
> > I'm interested in trying out the row cache for some of our data sets
> > at HubSpot. No timeline available yet, although I'm sure it will be
> > before the end of 2026. I'm excited to see what I can do for us.
> >
> > On Mon, Jan 12, 2026 at 10:52 PM Vladimir Rodionov
> > <[email protected]> wrote:
> > >
> > > Forgot to mention: Row Cache can be easily made cache implementation
> > > agnostic (Caffeine, EHCache) if it matters.
> > >
> > >
> > > On Mon, Jan 12, 2026 at 6:27 PM Vladimir Rodionov <[email protected]>
> > > wrote:
> > >
> > > > Andor, my answers to your questions are below:
> > > >
> > > > > Don't the benefits of row-based caching strongly depend on the use
> > > > > case?
> > > >
> > > > Sure. It targets point queries, not scan operations. The repo wiki page
> > > > I linked lists numerous use cases where a row cache is useful:
> > > > https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
> > > >
> > > > > What’s the advantage if clients don’t always need the entire row, just a
> > > > > subset of cells?
> > > > Yes, this is a known limitation of the current version. There is an open
> > > > ticket to support "sparse" rows:
> > > > https://github.com/VladRodionov/hbase-row-cache/issues/26
> > > >
> > > > > Is block cache more performant and memory efficient in this case?
> > > >
> > > > The only use case where block cache will be more performant is a scan
> > > > operation, which involves multiple rows. These caches are complementary,
> > > > not mutually exclusive. Row Cache has a serious advantage in point queries
> > > > (it can do up to 100 Kops on full-row reads, where each row is 3 families
> > > > with 3 columns and 10 versions). Block cache is more suitable for larger
> > > > operations, such as a scan of multiple rows.
> > > > Row cache can be enabled/disabled per table and per column family.
> > > >
> > > > From a RAM usage perspective, Row Cache (Carrot Cache) uses an advanced data
> > > > compression scheme (zstd with dictionary), which usually saves an
> > > > additional 40-50% of RAM compared to non-dictionary-based compression
> > > > algorithms. It works well even if the individual data item is less than
> > > > 100 bytes. HBase Block Cache (Bucket Cache) uses this type of compression
> > > > as well (maybe I am wrong here?), but it compresses the whole block.
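
To make the point about dictionaries and small items concrete, here is a minimal sketch. It does not use zstd or Carrot Cache: it swaps in DEFLATE with a preset dictionary from the JDK's java.util.zip, which follows the same idea (shared context supplied up front so that a tiny item can still compress). The class name, the sample dictionary and the sample row are all made up for illustration, and the size difference it shows says nothing about the 40-50% figure quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: DEFLATE with a preset dictionary (java.util.zip),
// standing in for zstd dictionary compression. Not Carrot Cache code.
final class DictionaryCompressionSketch {

    // Compresses one small item; a non-null dictionary is used as a preset dictionary.
    static byte[] compress(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            if (dictionary != null) {
                deflater.setDictionary(dictionary);
            }
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Reverses compress(); the same dictionary must be supplied if one was used.
    static byte[] decompress(byte[] compressed, byte[] dictionary, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] output = new byte[originalLength];
            int written = 0;
            while (written < originalLength && !inflater.finished()) {
                int n = inflater.inflate(output, written, originalLength - written);
                if (n == 0) {
                    if (inflater.needsDictionary() && dictionary != null) {
                        inflater.setDictionary(dictionary);
                    } else {
                        break; // truncated input or missing dictionary
                    }
                }
                written += n;
            }
            return Arrays.copyOf(output, written);
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Hypothetical dictionary holding fragments that many rows share.
        byte[] dictionary = ("{\"user_id\":,\"status\":\"inactive\",\"status\":\"active\","
                + "\"region\":\"us-east-1\"}").getBytes(StandardCharsets.UTF_8);
        // Hypothetical small row value (well under 100 bytes).
        byte[] row = "{\"user_id\":48213,\"status\":\"active\",\"region\":\"us-east-1\"}"
                .getBytes(StandardCharsets.UTF_8);

        byte[] plain = compress(row, null);
        byte[] withDictionary = compress(row, dictionary);
        byte[] restored = decompress(withDictionary, dictionary, row.length);

        System.out.println("raw=" + row.length
                + " plain=" + plain.length
                + " dictionary=" + withDictionary.length
                + " roundTrip=" + Arrays.equals(row, restored));
    }
}
```

Without a dictionary, an item this small has almost no internal repetition for the compressor to exploit. With one, most of the item becomes back-references into the shared dictionary, which is why per-item compression can stay effective below 100 bytes.
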
> > > >
> > > > Performance-wise, I think Row Cache should be much faster than Block Cache
> > > > if the cached blocks are compressed (you will need to decompress and decode
> > > > the whole block on a point read).
> > > >
> > > > Another limitation of the Block (Bucket) cache is its high metadata overhead
> > > > (roughly 100+ bytes vs 12-16 bytes in Row Cache). All metadata in Row Cache
> > > > (Carrot Cache) is off-heap as well.
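
A back-of-the-envelope sketch of what those metadata figures would mean at scale. The 100-byte and 16-byte values come from the message above and are read here as per cached entry, which is an assumption. The entry count (10 million) and payload size (100 bytes) are also assumptions picked purely for illustration. Keep in mind that a block cache entry is a whole block while a row cache entry is a row, so entry counts in the two caches are not directly comparable.

```java
// Illustration only: arithmetic on the per-entry metadata figures quoted above.
final class MetadataOverheadSketch {

    // Total bytes spent on metadata for a given number of cached entries.
    static long totalMetadataBytes(long entries, int metadataBytesPerEntry) {
        return Math.multiplyExact(entries, (long) metadataBytesPerEntry);
    }

    // Share of each cached entry (payload + metadata) that is metadata.
    static double metadataFraction(int payloadBytes, int metadataBytesPerEntry) {
        return (double) metadataBytesPerEntry / (payloadBytes + metadataBytesPerEntry);
    }

    public static void main(String[] args) {
        long entries = 10_000_000L; // assumed number of cached entries
        int payloadBytes = 100;     // assumed payload size per entry

        for (int perEntry : new int[] {100, 16}) {
            System.out.printf("%d B/entry: %d bytes of metadata, %.1f%% of each entry%n",
                    perEntry,
                    totalMetadataBytes(entries, perEntry),
                    100.0 * metadataFraction(payloadBytes, perEntry));
        }
    }
}
```

Under these assumptions, metadata is half of every cached entry at 100 bytes per entry and about 14% at 16 bytes per entry, which is where the memory density difference for small items comes from.
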
> > > >
> > > > The repo has a nice write-up on when Row Cache is preferable to the
> > > > Block cache.
> > > >
> > > >
> > > >
> > > > On Mon, Jan 12, 2026 at 5:27 PM Andor Molnár <[email protected]> wrote:
> > > >
> > > >> Thanks Vladimir.
> > > >>
> > > >> I think this would be a great addition to HBase.
> > > >>
> > > >> Don't the benefits of row-based caching strongly depend on the use
> > > >> case?
> > > >> What’s the advantage if clients don’t always need the entire row, just a
> > > >> subset of cells?
> > > >> Is block cache more performant and memory efficient in this case?
> > > >>
> > > >> Regards,
> > > >> Andor
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> > On Jan 4, 2026, at 13:02, Vladimir Rodionov <[email protected]>
> > > >> > wrote:
> > > >> >
> > > >> > Hello HBase community,
> > > >> >
> > > >> > I’d like to start a discussion around a feature that exists in related
> > > >> > systems but is still missing in Apache HBase: row-level caching.
> > > >> >
> > > >> > Both *Cassandra* and *Google Bigtable* provide a row cache for hot rows.
> > > >> > Bigtable recently revisited this area and reported measurable gains for
> > > >> > single-row reads. HBase today relies almost entirely on *block cache*,
> > > >> > which is excellent for scans and predictable access patterns, but can be
> > > >> > inefficient for *small random reads*, *hot rows spanning multiple blocks*,
> > > >> > and *cloud / object-store–backed deployments*.
> > > >> >
> > > >> > To explore this gap, I’ve been working on an *HBase Row Cache for HBase 2.x*,
> > > >> > implemented as a *RegionObserver coprocessor*, and I’d appreciate feedback
> > > >> > from HBase developers and operators.
> > > >> >
> > > >> > *Project*:
> > > >> >
> > > >> > https://github.com/VladRodionov/hbase-row-cache
> > > >> >
> > > >> >
> > > >> > *Background / motivation (cloud focus):*
> > > >> >
> > > >> >
> > > >>
> https://github.com/VladRodionov/hbase-row-cache/wiki/HBase:-Why-Block-Cache-Alone-Is-No-Longer-Enough-in-the-Cloud
> > > >> >
> > > >> > What This Is
> > > >> >
> > > >> >
> > > >> >   -
> > > >> >
> > > >> >   Row-level cache for HBase 2.x (coprocessor-based)
> > > >> >   -
> > > >> >
> > > >> >   Powered by *Carrot Cache* (mostly off-heap, GC-friendly)
> > > >> >   -
> > > >> >
> > > >> >   Multi-level cache (L1/L2/L3)
> > > >> >   -
> > > >> >
> > > >> >   Read-through caching of table : rowkey : column-family
> > > >> >   -
> > > >> >
> > > >> >   Cache invalidation on any mutation of the corresponding row+CF
> > > >> >   -
> > > >> >
> > > >> >   Designed for *read-mostly, random-access* workloads
> > > >> >   -
> > > >> >
> > > >> >   Can be enabled per table or per column family
> > > >> >   -
> > > >> >
> > > >> >   Typically used *instead of*, not alongside, block cache
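
The read-through and invalidate-on-mutation behaviour in the list above can be sketched in plain Java. This is not the coprocessor code from the repo and it uses no HBase or Carrot Cache APIs: the class name, the string-valued rows and the loader function (standing in for the normal HBase read path) are all hypothetical, chosen only to show the control flow.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Illustration only: a read-through cache keyed by (table, row key, column family)
// with invalidation on mutation. Plain Java, no HBase or Carrot Cache APIs.
final class ReadThroughRowCacheSketch {

    private final Map<List<String>, String> cache = new ConcurrentHashMap<>();
    private final Function<List<String>, String> loader; // stands in for the normal read path

    ReadThroughRowCacheSketch(Function<List<String>, String> loader) {
        this.loader = loader;
    }

    // Read path: serve from the cache, otherwise load from the store and remember the result.
    String get(String table, String rowKey, String family) {
        return cache.computeIfAbsent(List.of(table, rowKey, family), loader);
    }

    // Write path: any mutation of the row + column family drops the cached entry.
    void onMutation(String table, String rowKey, String family) {
        cache.remove(List.of(table, rowKey, family));
    }

    public static void main(String[] args) {
        Map<List<String>, String> store = new ConcurrentHashMap<>(); // pretend backing table
        AtomicInteger loads = new AtomicInteger();
        List<String> key = List.of("users", "row-1", "cf");
        store.put(key, "v1");

        ReadThroughRowCacheSketch rowCache = new ReadThroughRowCacheSketch(k -> {
            loads.incrementAndGet();
            return store.get(k);
        });

        rowCache.get("users", "row-1", "cf");        // miss: loads from the store
        rowCache.get("users", "row-1", "cf");        // hit: no load
        store.put(key, "v2");                        // the mutation reaches the store ...
        rowCache.onMutation("users", "row-1", "cf"); // ... and invalidates the cached entry
        String value = rowCache.get("users", "row-1", "cf"); // miss again: reloads

        System.out.println(value + " after " + loads.get() + " loads");
    }
}
```

Running it prints "v2 after 2 loads": the second read is served from the cache, and the read after the mutation goes back to the store. The sketch ignores the ordering problem between a concurrent load and a mutation of the same row, which a real implementation has to handle.
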
> > > >> >
> > > >> > *Block Cache vs Row Cache (Conceptual)*
> > > >> >
> > > >> > *Aspect*                          | *Block Cache*               | *Row Cache*
> > > >> > ----------------------------------+-----------------------------+-----------------------------------
> > > >> > Cached unit                       | HFile block (e.g. 64KB)     | Row / column family
> > > >> > Optimized for                     | Scans, sequential access    | Random small reads, hot rows
> > > >> > Memory efficiency for small reads | Low (unused data in blocks) | High (cache only requested data)
> > > >> > Rows spanning multiple blocks     | Multiple blocks cached      | Single cache entry
> > > >> > Read-path CPU cost                | Decode & merge every read   | Amortized across hits
> > > >> > Cloud / object store fit          | Necessary but expensive     | Reduces memory & I/O amplification
> > > >> >
> > > >> > Block cache remains essential; row cache targets a *different optimization
> > > >> > point*.
> > > >> >
> > > >> > *Non-Goals (Important)*
> > > >> >
> > > >> >
> > > >> >   -
> > > >> >
> > > >> >   Not proposing removal or replacement of block cache
> > > >> >   -
> > > >> >
> > > >> >   Not suggesting this be merged into HBase core
> > > >> >   -
> > > >> >
> > > >> >   Not targeting scan-heavy or sequential workloads
> > > >> >   -
> > > >> >
> > > >> >   Not eliminating row reconstruction entirely
> > > >> >   -
> > > >> >
> > > >> >   Not optimized for write-heavy or highly mutable tables
> > > >> >   -
> > > >> >
> > > >> >   Not changing HBase storage or replication semantics
> > > >> >
> > > >> > This is an *optional optimization* for a specific class of workloads.
> > > >> >
> > > >> > *Why I’m Posting*
> > > >> >
> > > >> > This is *not a merge proposal*, but a request for discussion:
> > > >> >
> > > >> >
> > > >> >   1.
> > > >> >
> > > >> >   Do you see *row-level caching* as relevant for modern HBase
> > > >> deployments?
> > > >> >   2.
> > > >> >
> > > >> >   Are there workloads where block cache alone is insufficient
> today?
> > > >> >   3.
> > > >> >
> > > >> >   Is a coprocessor-based approach reasonable for experimentation?
> > > >> >   4.
> > > >> >
> > > >> >   Are there historical or architectural reasons why row cache
> never
> > > >> landed
> > > >> >   in HBase?
> > > >> >
> > > >> > Any feedback—positive or critical—is very welcome.
> > > >> >
> > > >> > Best regards,
> > > >> >
> > > >> > Vladimir Rodionov
> > > >>
> > > >>
>
