Hi all,
I still think that there is value in providing more expressive filters
out of the box, especially if we wire metrics into the events subsystem.
It doesn't have to be implemented in CEL, if that's the concern. For
example, the Jakarta Expression Language is a well-established
standard and it can perfectly express things like:
namespace == "demo"
catalog == "prod" && namespace == "sales"
!table.startsWith("tmp_")
So I don't see this as a huge effort.
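To make the semantics of the three expressions above concrete, here is a minimal sketch that writes them as plain Java predicates. It deliberately does not use Jakarta EL or CEL (both need an extra dependency), and the MetricsFilterSketch class and Target record are hypothetical names invented for this illustration, not existing Polaris code.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch only; none of these names exist in Polaris. */
class MetricsFilterSketch {

    /** The three attributes the example expressions refer to. */
    record Target(String catalog, String namespace, String table) {}

    // namespace == "demo"
    static final Predicate<Target> DEMO_NAMESPACE =
            t -> t.namespace().equals("demo");

    // catalog == "prod" && namespace == "sales"
    static final Predicate<Target> PROD_SALES =
            t -> t.catalog().equals("prod") && t.namespace().equals("sales");

    // !table.startsWith("tmp_")
    static final Predicate<Target> NOT_TEMP_TABLE =
            t -> !t.table().startsWith("tmp_");

    public static void main(String[] args) {
        List<Target> targets = List.of(
                new Target("prod", "sales", "orders"),
                new Target("prod", "sales", "tmp_orders"),
                new Target("dev", "demo", "events"));

        // Print how each filter evaluates for each sample target.
        for (Target t : targets) {
            System.out.printf("%s demo=%b prodSales=%b notTemp=%b%n",
                    t,
                    DEMO_NAMESPACE.test(t),
                    PROD_SALES.test(t),
                    NOT_TEMP_TABLE.test(t));
        }
    }
}
```

An expression language would let operators supply such predicates as configuration strings instead of compiled code; the evaluation result for a given catalog, namespace, and table would be the same.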
Just my 2 cents.
Thanks,
Alex
On Thu, Jun 11, 2026 at 2:40 AM Yufei Gu <[email protected]> wrote:
>
> I think the feature request makes sense, but I would separate a couple of
> concerns here.
>
> First, retention and purge feel like a different topic to me. Filtering
> can reduce the amount of data we write, but it should not be the mechanism
> we rely on for cleanup. Even with filtering, we still need a clear
> retention or purge story for persisted metrics.
>
> Second, I am not sure that table or namespace level filtering is a common
> enough use case to justify the extra configuration complexity. It may be
> useful for some troubleshooting cases, but it also makes the feature harder
> to explain and operate.
>
> My instinct is that filtering by metrics type, for example scan metrics vs
> commit metrics, may be a simpler and more useful first step. These two
> types can have different volume and different operational value, so
> allowing users to enable one without the other might cover most cases.
>
> Also, with the event based approach proposed in the other thread[1], users
> can already implement arbitrary filtering in custom event listeners. In
> that world, filtering on metrics type becomes trivial, since scan and
> commit metrics are naturally represented as different event types. That
> makes me wonder whether we need catalog, namespace, or table level
> filtering in Polaris itself right away.
>
> We can always add more fine grained filtering later if we see concrete
> demand, or rely on the custom event listener for special persistence
> requirements.
>
> 1. https://lists.apache.org/thread/x9j8nscvy8hq61tyn01mj8yp6n9of0kp
>
> Yufei
>
>
> On Mon, Jun 8, 2026 at 8:37 PM Dmitri Bourlatchkov <[email protected]> wrote:
>
> > Hi Yong,
> >
> > Include/exclude lists and glob patterns work as well.
> >
> > As for CEL, my understanding is that the previous community consensus was
> > to remove it, hence issue #3847.
> >
> > Let's see what other people think in this context given this fresh use
> > case.
> >
> > Cheers,
> > Dmitri.
> >
> > On Mon, Jun 8, 2026 at 10:04 PM Yong Zheng <[email protected]> wrote:
> >
> > > Hello Dmitri,
> > >
> > > I was thinking something along the lines of exposing catalog, namespace,
> > > and table name to CEL. For example:
> > > namespace == "demo"
> > > catalog == "prod" && namespace == "sales"
> > > !table.startsWith("tmp_")
> > >
> > > This would allow users to enable metrics for specific catalogs,
> > > namespaces, or tables during troubleshooting, or exclude noisy tables.
> > >
> > > That said, I'm not sure we need the flexibility of CEL right away. I'm
> > > wondering if we should start with something simpler, such as
> > > include/exclude lists or glob patterns, which may be easier to configure
> > > and understand.
> > >
> > > I'm pretty open-ended on the implementation. If we end up using CEL for
> > > maintenance utilities, it may make sense to use the same approach here as
> > > well so we can reuse the code and provide a consistent experience across
> > > features.
> > >
> > > Thanks,
> > > Yong Zheng
> > >
> > > On 2026/06/09 01:08:40 Dmitri Bourlatchkov wrote:
> > > > Hi Yong,
> > > >
> > > > The feature request sounds reasonable to me. I think other users would
> > > > appreciate this feature too.
> > > >
> > > > How do you envision defining these filters?
> > > >
> > > > > I think CEL expressions can be a good fit. They are currently used in
> > > > > NoSQL maintenance tasks, but concerns have been raised about using CEL
> > > > > (Nessie's cel-java impl.), which are tracked in [3847].
> > > >
> > > > [3847] https://github.com/apache/polaris/issues/3847
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > > On Sun, Jun 7, 2026 at 2:52 PM Yong Zheng <[email protected]> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > > Currently we have polaris.iceberg-metrics.reporting and the ability to
> > > > > > persist those metrics to the backend. By default, this can be enabled
> > > > > > by changing the log level for org.apache.polaris.service.reporting to
> > > > > > INFO for log-based metrics, and by setting
> > > > > > polaris.iceberg-metrics.reporting.type to persistent if we want the
> > > > > > metrics persisted on the backend. Currently this setting is all or
> > > > > > nothing: with it enabled, metrics for all tables are
> > > > > > reported/persisted. Should we introduce a filter (include/exclude type
> > > > > > settings) that lets people fine-tune what to include or exclude
> > > > > > (defaulting to include all)?
> > > > >
> > > > > > There are a couple of use cases, such as:
> > > > > > 1. exclude noisy tables
> > > > > > 2. enable metrics for a given namespace during troubleshooting without
> > > > > > enabling all (e.g. only certain tables are user facing and we would
> > > > > > want to closely monitor the performance metrics on them, while other
> > > > > > tables may be batched and latency is not as sensitive for those)
> > > > > > 3. avoid potential storage growth, as there is no cleanup job atm
> > > > > > (raised in a different ML thread), and avoid extra I/O to the backend
> > > > > > RDS if metrics for the majority of the tables are not necessary
> > > > >
> > > > > Thanks,
> > > > > Yong Zheng
> > > > >
> > > >
> > >
> >