Hi Vignesh,

> Is the lower-level CacheBackend considered the intended caching
> strategy for NoSQL, [...]

Yes.

> or is there still interest in also supporting InMemoryEntityCache
> for consistency with the other backends?

I'm not sure the term "consistency" is applicable here.

One can certainly put the Entity Cache on top of NoSQL Persistence.
However, the underlying implementation is very different between NoSQL
and JDBC.

So, even with the Entity Cache in the call path, I do not think we can talk
about consistency in implementation between NoSQL and JDBC.

What does the Entity Cache achieve?

IMHO, it prevents multiple database lookups at the Persistence layer issued
by the Resolver.

The NoSQL backend cache achieves the same effect.
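
To make that effect concrete, here is a minimal read-through cache sketch in
Java. It is only an illustration: ReadThroughCacheSketch, its loader function
and invalidate() are hypothetical names, not Polaris classes, and it is not
meant to reflect how the Entity Cache or the NoSQL CacheBackend is actually
implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache; illustration only, not Polaris code. */
final class ReadThroughCacheSketch<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  // Stands in for the database lookup (assumption: it never returns null).
  private final Function<K, V> loader;

  ReadThroughCacheSketch(Function<K, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value, calling the loader only on a cache miss. */
  V get(K key) {
    return cache.computeIfAbsent(key, loader);
  }

  /** Drops the entry so that the next get() goes back to the loader. */
  void invalidate(K key) {
    cache.remove(key);
  }
}
```

With a loader that counts its invocations, repeated get() calls for the same
key reach the loader once, and again only after invalidate(). That is the
effect I mean, regardless of the layer at which the cache sits.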

Do you see another effect of the Entity Cache that may be worth implementing
on the NoSQL call paths?

> Are there any benchmarks or hit-rate numbers comparing the
> NoSQL persistence cache vs the JDBC/TreeMap entity cache?

I'm not sure it is technically possible to compare the performance of the
Entity Cache vs. the NoSQL backend cache in isolation. The two caches sit
on different call paths. I believe meaningful comparisons are possible only
at the common API level, which is the REST Catalog API.

Pierre created a nice benchmarking tool for that [1].

Unfortunately, it does not look like anyone is available these days to run
those benchmarks with scientific rigour :)

If you're interested, please do make such a comparison and we can certainly
discuss this in more detail. This is probably going to be an iterative
process. I'm posting some preliminary thoughts below.

Attention to the environment setup and to data collection / analysis is
going to be essential to make the JDBC vs. NoSQL comparison meaningful.

I'd suggest setting up the testbed so that all resources (network, disk,
CPU, memory) are utilized well below their limits, and then comparing
response times.

Another approach is to load the system until the first failure and compare
the saturated requests-per-second rates.
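
For the data analysis side, both approaches boil down to simple summaries.
Below is a rough Java sketch, assuming response times are collected as
millisecond samples. LatencySummarySketch and its methods are hypothetical
names and are not part of the benchmarking tool in [1].

```java
import java.util.Arrays;

/** Hypothetical helper for summarizing benchmark runs; illustration only. */
final class LatencySummarySketch {

  /** Nearest-rank percentile of the samples, for p in (0, 100]. */
  static long percentile(long[] samplesMillis, int p) {
    if (samplesMillis.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    if (p <= 0 || p > 100) {
      throw new IllegalArgumentException("p must be in (0, 100]");
    }
    long[] sorted = samplesMillis.clone();
    Arrays.sort(sorted);
    // Nearest rank: smallest rank r such that r / n >= p / 100.
    int rank = (int) Math.ceil(p * (double) sorted.length / 100.0);
    return sorted[rank - 1];
  }

  /** Throughput over a measurement window. */
  static double requestsPerSecond(long completedRequests, long elapsedMillis) {
    if (elapsedMillis <= 0) {
      throw new IllegalArgumentException("elapsedMillis must be positive");
    }
    return completedRequests * 1000.0 / elapsedMillis;
  }
}
```

Comparing percentiles (e.g. p50 / p90 / p99) rather than averages makes tail
latency visible, which is where cache misses tend to show up.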

[1] https://github.com/apache/polaris-tools/tree/main/benchmarks

Cheers,
Dmitri.

On Thu, Jun 25, 2026 at 8:17 AM vignesh a <[email protected]> wrote:

> Hi Dmitri,
>
> Thanks for the pointer. I now understand the split.
>
> At the PolarisMetaStoreManager / Resolver layer, my original observation
> holds: NoSqlMetaStoreManagerFactory.getOrCreateEntityCache() returns null,
> and NoSqlMetaStoreManager doesn't implement the change-tracking methods
> that InMemoryEntityCache needs. So the Resolver bypasses the entity cache
> that the JDBC and TreeMap backends use.
>
> However, as you noted, NoSQL has caching one level down. The per-realm
> Persistence is wrapped by PersistenceCacheDecorator → CachingPersistenceImpl
> backed by CaffeineCacheBackend (enabled by default via
> polaris.persistence.cache.enable). This cache intercepts fetches and
> reference lookups and invalidates on writes.
>
> A couple of questions before I follow up on #4874:
>
> Is the lower-level CacheBackend considered the intended caching strategy
> for NoSQL, or is there still interest in also supporting
> InMemoryEntityCache for consistency with the other backends?
>
> Are there any benchmarks or hit-rate numbers comparing the NoSQL
> persistence cache vs the JDBC/TreeMap entity cache? It would be useful to
> know how different the behavior is under load.
>
> If the current design is intentional and performs well, I'll update the
> issue with a summary and possibly send a small docs PR clarifying the NoSQL
> caching architecture.
>
> Cheers,
> Vignesh
>
> On Wed, 24 Jun 2026 at 04:30, Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Hi Vignesh,
> >
> > NoSQL Persistence has caching at a different level [1]. Not every
> > Persistence SPI call hits the database.
> >
> > [1] https://github.com/apache/polaris/blob/c6d966d6a701e356284671d81d5ce1af94bf8e7e/persistence/nosql/persistence/api/src/main/java/org/apache/polaris/persistence/nosql/api/cache/CacheBackend.java#L35
> >
> > Cheers,
> > Dmitri.
> >
> > On Tue, Jun 23, 2026 at 4:37 PM vignesh a <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I opened GitHub issue #4874
> > > <https://github.com/apache/polaris/issues/4874> after noticing something
> > > while reading through the metastore implementations, and I wanted to get
> > > some feedback before diving into a PR.
> > >
> > > From what I can tell, the NoSQL backend currently doesn't use the entity
> > > cache at all. Since NoSqlMetaStoreManager doesn't implement change
> > > tracking, NoSqlMetaStoreManagerFactory ends up returning null instead of
> > > creating an InMemoryEntityCache.
> > >
> > > That means every Resolver operation - principal lookups, catalog
> > > resolution, privilege checks, location validation, and so on - goes
> > > directly to the backing store. By comparison, the JDBC and in-memory
> > > TreeMap implementations both benefit from the existing cache.
> > >
> > > The details are in issue #4874
> > > <https://github.com/apache/polaris/issues/4874>, but I was curious about
> > > the intent here.
> > >
> > > A few questions:
> > >
> > >    - Is this a known limitation, or is it something that simply hasn't
> > >      been addressed yet?
> > >    - Is the expected long-term solution to add change tracking support
> > >      for NoSQL?
> > >    - Has anyone considered a lighter-weight approach for NoSQL caching,
> > >      or are there consistency concerns that make that undesirable?
> > >    - More generally, should we expect similar performance
> > >      characteristics across the supported metastore backends, or is
> > >      this difference intentional?
> > >
> > > The NoSQL backend is a supported production backend, so the lack of
> > > caching stood out to me as a potentially significant behavioral
> > > difference rather than just an implementation detail.
> > >
> > > I'd appreciate any context before I spend time exploring solutions.
> > >
> > > Thanks,
> > >
> > > Vignesh
> > >
> >
>
