Re: [DISCUSS] Facilitate the forwarding use cases of Iceberg Scan and Commit Metrics via Event

Alexandre Dutra Mon, 15 Jun 2026 09:09:17 -0700

Hey Romain,

> Any planned default (in official docker image) alternative to EL?


I don't think so.

Re: javascript / graaljs, I'm indeed afraid the final image size will
grow considerably, not to mention CPU overhead and dealing with all
the CVEs. The nice thing about Jakarta EL is that it is fast,
lightweight and already included in the server image.

I think CEL would be a great fit, but CEL has met some pushback
because the historical implementation is hosted under Project Nessie [1].
That said, the Google CEL implementation [2] seems to have gained some
momentum lately and could be a good alternative (though I haven't tried it).

> I still wonder why there are these factories whereas the CDI container does 
> handle it already well enough?

If I understand your question correctly, it's because the same filter
type can be instantiated N times with different configurations, e.g.:

polaris.event-filter.filter1.type = jakarta-el
polaris.event-filter.filter2.type = jakarta-el
polaris.event-filter.filter1.include=attributes.catalog_name == "dev"
polaris.event-filter.filter2.include=attributes.catalog_name == "prod"
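
To make the "same type, N instances" point concrete, here is a minimal Java
sketch. All names in it (`EventFilter`, `FilterConfig`, `parse`, `build`) are
hypothetical and are not the API from PR #4773. `toyEquals` is a toy stand-in
that only understands `attributes.<key> == "<value>"`; it is not a Jakarta EL
engine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Predicate;

public class NamedFilterFactoryDemo {

    // Hypothetical event shape: a flat attribute map (not the real Polaris event type).
    record Event(Map<String, String> attributes) {}

    // Hypothetical filter contract: test() returns true to let the event through.
    interface EventFilter extends Predicate<Event> {}

    // One configuration block per filter *name*, e.g. polaris.event-filter.filter1.*
    record FilterConfig(String name, String type, String include) {}

    static final String PREFIX = "polaris.event-filter.";

    // Groups "<PREFIX><name>.<key>=<value>" properties into one FilterConfig per name.
    static Map<String, FilterConfig> parse(Map<String, String> props) {
        Map<String, Map<String, String>> byName = new TreeMap<>();
        props.forEach((key, value) -> {
            if (!key.startsWith(PREFIX)) return;
            String rest = key.substring(PREFIX.length());
            int dot = rest.indexOf('.');
            if (dot < 0) return;
            byName.computeIfAbsent(rest.substring(0, dot), n -> new HashMap<>())
                  .put(rest.substring(dot + 1), value.trim());
        });
        Map<String, FilterConfig> configs = new TreeMap<>();
        byName.forEach((name, kv) ->
            configs.put(name, new FilterConfig(name, kv.get("type"), kv.get("include"))));
        return configs;
    }

    // Toy stand-in for a real Jakarta EL engine: only understands attributes.<key> == "<value>".
    static EventFilter toyEquals(FilterConfig cfg) {
        String[] parts = cfg.include().split("==", 2);
        String key = parts[0].trim().substring("attributes.".length());
        String expected = parts[1].trim().replace("\"", "");
        return event -> expected.equals(event.attributes().get(key));
    }

    // Factories are looked up by *type* but invoked once per *name*:
    // two names with the same type yield two independently configured instances.
    static Map<String, EventFilter> build(Map<String, FilterConfig> configs,
                                          Map<String, Function<FilterConfig, EventFilter>> factories) {
        Map<String, EventFilter> filters = new TreeMap<>();
        configs.forEach((name, cfg) -> {
            Function<FilterConfig, EventFilter> factory =
                cfg.type() == null ? null : factories.get(cfg.type());
            if (factory == null) {
                throw new IllegalArgumentException("Unknown filter type: " + cfg.type());
            }
            filters.put(name, factory.apply(cfg));
        });
        return filters;
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of(
            "polaris.event-filter.filter1.type", "jakarta-el",
            "polaris.event-filter.filter2.type", "jakarta-el",
            "polaris.event-filter.filter1.include", "attributes.catalog_name == \"dev\"",
            "polaris.event-filter.filter2.include", "attributes.catalog_name == \"prod\"");
        Map<String, Function<FilterConfig, EventFilter>> factories =
            Map.of("jakarta-el", NamedFilterFactoryDemo::toyEquals); // toy, not real EL
        Map<String, EventFilter> filters = build(parse(props), factories);

        Event devEvent = new Event(Map.of("catalog_name", "dev"));
        System.out.println(filters.get("filter1").test(devEvent)); // true
        System.out.println(filters.get("filter2").test(devEvent)); // false
    }
}
```

The type (`jakarta-el`) only selects which factory runs; each name (`filter1`,
`filter2`) gets its own independently configured instance.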

Thanks,
Alex

[1]: https://lists.apache.org/thread/f28pp26s3mrgwpw2zrvq7snzr08z78sr
[2]: https://github.com/google/cel-java

On Mon, Jun 15, 2026 at 5:32 PM Romain Manni-Bucau wrote:
>
> Hi Alexandre,
>
> Any planned default (in official docker image) alternative to EL? In my
> past experience, EL proved insufficient as soon as you need some logic
> flow.
> Off the top of my head, I can think of Jython or JavaScript (using the
> GraalJS extension), but both are a bit heavy. On the other hand, both can
> be sandboxed, which is key if end users can inject expressions at some
> point.
> Lua is much lighter to embed; the language is less well known and less
> user-friendly, but it could still be a compromise.
>
> Side question: I still wonder why there are these factories whereas the CDI
> container does handle it already well enough? Relying on CDI would avoid
> having to specialize the default factory by adding a new one, which is
> always nicer for vendors/integrators.
>
> Romain Manni-Bucau
>
>
> On Mon, Jun 15, 2026 at 3:45 PM Alexandre Dutra wrote:
>
> > Hi Yong, hi all,
> >
> > Since the ability to filter seems to be of concern, I went ahead and
> > implemented an EventFilter API with a first implementation based on
> > Jakarta EL:
> >
> > https://github.com/apache/polaris/pull/4773
> >
> > In the above PR, event filters form a composable chain:
> >
> > emitter -> filters (0..N) -> sanitizer (0..1) -> listener
> >
> > It's easy enough to create another EventFilter implementation to do
> > some event sampling, as you suggested.
> >
> > (Note: the goal is to do the same with sanitizers and make them 0..N
> > as well in the delivery pipeline.)
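
A minimal Java sketch of that chain, with a sampling filter of the kind Yong
suggested. The `Event` record and the use of `Predicate`, `UnaryOperator` and
`Consumer` as stand-ins for filters, sanitizer and listener are assumptions for
illustration only, not the interfaces in PR #4773.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class EventPipelineSketch {

    // Hypothetical, simplified event; not the Polaris event type.
    record Event(String type, Map<String, String> attributes) {}

    private final List<Predicate<Event>> filters;            // 0..N, all must accept
    private final Optional<UnaryOperator<Event>> sanitizer;  // 0..1
    private final Consumer<Event> listener;

    EventPipelineSketch(List<Predicate<Event>> filters,
                        Optional<UnaryOperator<Event>> sanitizer,
                        Consumer<Event> listener) {
        this.filters = List.copyOf(filters);
        this.sanitizer = sanitizer;
        this.listener = listener;
    }

    // Called by the emitter. Returns true if the event reached the listener.
    boolean emit(Event event) {
        for (Predicate<Event> filter : filters) {
            if (!filter.test(event)) {
                return false; // short-circuit: later filters never see this event
            }
        }
        listener.accept(sanitizer.map(s -> s.apply(event)).orElse(event));
        return true;
    }

    // Deterministic 1-in-n sampling filter: keeps the 1st, (n+1)th, (2n+1)th... event it sees.
    static Predicate<Event> sampleEvery(long n) {
        if (n <= 0) throw new IllegalArgumentException("n must be > 0");
        AtomicLong seen = new AtomicLong();
        return event -> seen.getAndIncrement() % n == 0;
    }

    public static void main(String[] args) {
        List<Event> delivered = new ArrayList<>();
        Predicate<Event> prodOnly = e -> "prod".equals(e.attributes().get("catalog_name"));
        UnaryOperator<Event> dropPrincipal =
            e -> new Event(e.type(), Map.of("catalog_name", e.attributes().get("catalog_name")));

        EventPipelineSketch pipeline = new EventPipelineSketch(
            List.of(prodOnly, sampleEvery(5)), Optional.of(dropPrincipal), delivered::add);

        for (int i = 0; i < 10; i++) {
            pipeline.emit(new Event("ScanReport", Map.of("catalog_name", "prod", "principal", "alice")));
            pipeline.emit(new Event("ScanReport", Map.of("catalog_name", "dev", "principal", "bob")));
        }
        // 10 prod events pass prodOnly; the sampler keeps 2 of them (its 1st and 6th).
        System.out.println(delivered.size()); // 2
    }
}
```

Because filters short-circuit, order matters: a sampler placed after a catalog
filter keeps 1 in N of the matching events, not 1 in N of all events.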
> >
> > Thanks,
> > Alex
> >
> > On Mon, Jun 15, 2026 at 8:13 AM Yong Zheng wrote:
> > >
> > > Hello team,
> > >
> > > Thanks Yufei for the summary and for raising this discussion.
> > >
> > > One concern I have is around filtering. Without some form of filtering or
> > > sampling, blindly writing every metrics event to logs (when debug logging
> > > is enabled) or persisting every metric to a backend could introduce
> > > significant overhead.
> > >
> > > For Polaris itself, processing and forwarding large volumes of scan
> > > metrics can consume resources that would otherwise be available for
> > > serving catalog requests. For deployments that persist metrics to a
> > > database, the additional storage, indexing, and write workload can also
> > > consume compute resources that could be used for core catalog operations
> > > instead.
> > >
> > > This is one of the reasons I think filtering should remain part of the
> > > discussion regardless of whether metrics are delivered through JDBC
> > > persistence, the event framework, or another implementation. High-volume
> > > scan metrics can easily generate far more traffic than many deployments
> > > actually need to retain or analyze.
> > >
> > > I agree with the SPI direction, but I do not think it addresses the
> > > underlying scalability concern by itself. The SPI provides flexibility in
> > > how metrics are handled, but the load is ultimately determined by the
> > > implementation behind it. Without some form of filtering or sampling,
> > > high-volume scan metrics can still generate substantial load regardless
> > > of whether the implementation uses JDBC persistence, event forwarding,
> > > logging, or something else.
> > >
> > > Thanks,
> > > Yong Zheng
> > >
> > > On Thu, Jun 11, 2026 at 10:30 PM EJ Wang wrote:
> > >
> > > > Hi Yufei, Alex,
> > > >
> > > > Thanks Yufei for writing this up, and Alex for spelling out the
> > > > operational concerns. My read is that both points are compatible if we
> > > > are clear about the layering.
> > > >
> > > > I agree that Iceberg scan/commit metrics often behave like structured
> > > > telemetry events: append-only, high-volume, usually consumed
> > > > asynchronously, and often forwarded to external systems.
> > > > Events/listeners are a natural fit for that kind of delivery path.
> > > >
> > > > I also agree with Alex that event delivery does not make persistence,
> > > > filtering, retention, payload sizing, or performance free. Those are
> > > > real concerns, especially for high-volume scan reports.
> > > >
> > > > The way I would reconcile these is to distinguish the default battery
> > > > from extension implementations.
> > > >
> > > > The alignment from the latest metrics sync
> > > > <https://docs.google.com/document/d/100h7c4damrUzVuquYbBHM0EvA4LSWuW2IT2dN_7nYVA/edit?pli=1&tab=t.k96s2xyqr5u1#heading=h.uvb454otvxc0>
> > > > was not that Polaris should pick JDBC, events, or external telemetry as
> > > > the one built-in metrics subsystem. It was closer to this: Polaris
> > > > should define a clean metrics reporting/emitting boundary, ship a small,
> > > > safe default, and let deployments choose implementation paths behind
> > > > that boundary.
> > > >
> > > > Under that framing, I would not make event
> > > > forwarding/Prometheus/Grafana/custom routing the default battery itself.
> > > > I would frame it as a useful non-default extension implementation of
> > > > the metrics reporting/emitting path.
> > > >
> > > > Concretely, I think the split could be:
> > > >
> > > > 1.  Polaris exposes a stable Iceberg metrics reporting/emitting SPI.
> > > > 2.  The built-in default battery stays minimal: based on the latest
> > > >     notes, no-op or log-only is enough as the safe OSS default.
> > > > 3.  Durable JDBC metrics storage is one named extension implementation
> > > >     of that SPI, not part of core persistence.
> > > > 4.  Event-based forwarding can be another named extension
> > > >     implementation of that SPI, where the listener/extension owns
> > > >     delivery, filtering, retention, payload handling, and
> > > >     destination-specific behavior.
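
The four-point split above could be sketched in Java roughly as follows. This
is an illustration under assumptions: `MetricsReporter`, `MetricsReport`, the
registry names and the counter keys are hypothetical, not the SPI proposed in
PR 4115, and `CommitOnlyCollector` is an in-memory stand-in for a JDBC or
event-forwarding extension that owns its own filtering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Logger;

public class MetricsSpiSketch {

    enum ReportKind { SCAN, COMMIT }

    // Hypothetical, simplified report; Iceberg's ScanReport/CommitReport carry far more.
    record MetricsReport(ReportKind kind, String table, Map<String, Long> counters) {}

    // (1) The stable boundary: the catalog only ever calls this.
    interface MetricsReporter {
        void report(MetricsReport report);
    }

    // (2) Safe defaults: drop everything, or log at FINE (debug-like) level.
    static final class NoOpReporter implements MetricsReporter {
        @Override public void report(MetricsReport report) {}
    }

    static final class LogOnlyReporter implements MetricsReporter {
        private static final Logger LOG = Logger.getLogger(LogOnlyReporter.class.getName());
        @Override public void report(MetricsReport report) {
            LOG.fine(() -> "metrics report: " + report); // message built only if FINE is enabled
        }
    }

    // (3)/(4) Extensions plug in behind the same interface. This in-memory one stands in
    // for a JDBC or event-forwarding implementation and owns its own filtering.
    static final class CommitOnlyCollector implements MetricsReporter {
        final List<MetricsReport> kept = new ArrayList<>();
        @Override public void report(MetricsReport report) {
            if (report.kind() == ReportKind.COMMIT) kept.add(report);
        }
    }

    // Picks an implementation by a (hypothetical) configured name; unknown names fall back to no-op.
    static MetricsReporter select(String name, Map<String, Supplier<MetricsReporter>> registry) {
        return registry.getOrDefault(name, NoOpReporter::new).get();
    }

    public static void main(String[] args) {
        CommitOnlyCollector collector = new CommitOnlyCollector();
        Map<String, Supplier<MetricsReporter>> registry = Map.of(
            "noop", NoOpReporter::new,
            "log", LogOnlyReporter::new,
            "commit-collector", () -> collector);

        MetricsReporter reporter = select("commit-collector", registry);
        reporter.report(new MetricsReport(ReportKind.SCAN, "db.t1", Map.of("result-data-files", 12L)));
        reporter.report(new MetricsReport(ReportKind.COMMIT, "db.t1", Map.of("added-data-files", 3L)));
        System.out.println(collector.kept.size()); // 1
    }
}
```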
> > > >
> > > > That keeps the useful part of Yufei's proposal: deployments that want
> > > > Grafana/dashboard integration or custom telemetry routing can choose an
> > > > event/listener-based implementation. It also keeps Alex's concerns
> > > > scoped to the implementation that chooses that delivery model, instead
> > > > of making them requirements for every Polaris deployment or for the
> > > > built-in default.
> > > >
> > > > So I am generally +1 on exploring the event-forwarding path, with the
> > > > layering caveat that I would treat it as an extension implementation of
> > > > the metrics reporting/emitting SPI, not as replacing the default battery
> > > > or collapsing metrics into core event persistence.
> > > >
> > > > Once that boundary is clear (which is what I'm pushing for in PR 4115
> > > > <https://github.com/apache/polaris/pull/4115#pullrequestreview-4481873839>),
> > > > integrations become implementation choices rather than architectural
> > > > changes.
> > > >
> > > > Thanks,
> > > > -ej
> > > >
> > > > On Thu, Jun 11, 2026 at 5:41 AM Alexandre Dutra wrote:
> > > >
> > > > > > listeners can already implement whatever filtering logic they need
> > > > >
> > > > > True, but I think they would be reinventing the wheel quite often.
> > > > > There are some common filtering patterns, such as filtering by
> > > > > catalog, namespace, or table names or IDs. If we could provide these
> > > > > filters out of the box, that would benefit many listeners.
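
A reusable out-of-the-box filter for those common patterns might look like the
minimal Java sketch below. The attribute keys (`catalog_name`, `namespace`,
`table_name`) are assumptions and may not match the actual Polaris event
attributes; filtering by IDs would work the same way with different keys.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class ScopeFilterSketch {

    // Hypothetical attribute keys; the real Polaris event attributes may be named differently.
    static final String CATALOG = "catalog_name";
    static final String NAMESPACE = "namespace";
    static final String TABLE = "table_name";

    // Hypothetical, simplified event carrying only flat attributes.
    record Event(Map<String, String> attributes) {}

    // Reusable scope filter. An empty set means "no restriction at this level";
    // a non-empty set requires the attribute to be present and listed.
    static Predicate<Event> scope(Set<String> catalogs, Set<String> namespaces, Set<String> tables) {
        return event -> matches(catalogs, event.attributes().get(CATALOG))
            && matches(namespaces, event.attributes().get(NAMESPACE))
            && matches(tables, event.attributes().get(TABLE));
    }

    private static boolean matches(Set<String> allowed, String value) {
        // Null check first: Set.of(...).contains(null) throws NullPointerException.
        return allowed.isEmpty() || (value != null && allowed.contains(value));
    }

    public static void main(String[] args) {
        Predicate<Event> prodSales = scope(Set.of("prod"), Set.of("sales"), Set.of());
        Event hit = new Event(Map.of(CATALOG, "prod", NAMESPACE, "sales", TABLE, "orders"));
        Event miss = new Event(Map.of(CATALOG, "prod", NAMESPACE, "hr", TABLE, "people"));
        System.out.println(prodSales.test(hit));  // true
        System.out.println(prodSales.test(miss)); // false
    }
}
```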
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > >
> > > > > On Thu, Jun 11, 2026 at 3:35 AM Yufei Gu wrote:
> > > > > >
> > > > > > Thanks all for the feedback! It seems we have some initial consensus
> > > > > > that using the event framework for metrics delivery is a reasonable
> > > > > > direction worth exploring. Most of the discussion now appears to be
> > > > > > around implementation details and operational considerations.
> > > > > >
> > > > > > 1. Benchmarking is a great idea, and using the existing tool makes
> > > > > > sense. I don't see it as a blocker, though. The volume of scan
> > > > > > metrics should be similar to, or even lower than, the volume of
> > > > > > LoadTable requests. Some clients may not send scan metrics at all. If
> > > > > > we're comfortable supporting LoadTable events, I'm not sure why
> > > > > > metrics events would require a fundamentally different validation
> > > > > > path, though benchmarking would certainly help us tune the event bus
> > > > > > and listener configuration.
> > > > > >
> > > > > > 2. I agree that separating the datasource for event and metrics
> > > > > > persistence is an active and worthwhile discussion. I think we should
> > > > > > continue that work regardless of the direction we take here.
> > > > > >
> > > > > > 3. Agreed on evaluating payload sizes. That said, it doesn't seem like
> > > > > > a major concern to me given that we already support larger payloads
> > > > > > in some existing events.
> > > > > >
> > > > > > 4. Filtering is a valid use case. My thinking is that custom event
> > > > > > listeners can already implement whatever filtering logic they need.
> > > > > > I'm not sure we need a generic filtering framework in Polaris itself
> > > > > > yet, but I'm open to further discussion if we find common
> > > > > > requirements across deployments.
> > > > > >
> > > > > > 5. Schema migration is a good point and something we should keep in
> > > > > > mind if metrics are persisted.
> > > > > >
> > > > > > 6. I also agree with Dmitri that we can continue improving the RDBMS
> > > > > > schema evolution story. That feels largely orthogonal to this
> > > > > > proposal, so perhaps it's best discussed in a separate thread.
> > > > > >
> > > > > > Thanks,
> > > > > > Yufei
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 10, 2026 at 12:56 PM Dmitri Bourlatchkov wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > +1 to all points from Alex's email.
> > > > > > >
> > > > > > > Re: Metrics Persistence, I believe we ought to make it as smooth as
> > > > > > > possible from the Polaris code maintenance perspective. Therefore, I
> > > > > > > propose starting the work to isolate the existing metrics schema
> > > > > > > from the MetaStore schema in parallel with the event bus work. I
> > > > > > > think it will be beneficial in its own right, regardless of how the
> > > > > > > event bus work progresses.
> > > > > > >
> > > > > > > PR [4397] is but the first step in that direction.
> > > > > > >
> > > > > > > Side note: we probably do not need to copy the whole schema SQL
> > > > > > > file on every revision, but I'm contemplating starting a separate
> > > > > > > thread on that.
> > > > > > >
> > > > > > > Once a separate metrics schema is established, I think it will be
> > > > > > > natural to also allow it to be on a different JDBC DataSource than
> > > > > > > the MetaStore schema.
> > > > > > >
> > > > > > > If the event bus work is successful, JDBC Metrics Persistence can
> > > > > > > become one of possibly many consumers of metrics events.
> > > > > > >
> > > > > > > With this approach, it should also be possible to write metrics to
> > > > > > > the database in batches. IIRC, Venkateshwaran brought this point up
> > > > > > > in the latest Metrics Sync meeting.
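
Consumer-side batching could be sketched as follows: a listener buffers metrics
events and hands them to a sink in fixed-size batches. This is a hypothetical,
minimal sketch; a real JDBC sink would typically use
`PreparedStatement.addBatch()` and `executeBatch()`, and would also need a
time-based flush and error handling, both omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchingConsumerSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // hypothetical sink, e.g. a JDBC batch writer
    private final List<T> buffer = new ArrayList<>();

    public BatchingConsumerSketch(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called once per metrics event; writes only when a full batch has accumulated.
    public synchronized void onEvent(T event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) flush();
    }

    // Also call on shutdown (and ideally on a timer) so a partial batch is not lost.
    public synchronized void flush() {
        if (buffer.isEmpty()) return;
        sink.accept(new ArrayList<>(buffer)); // hand the sink a copy, then reuse the buffer
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingConsumerSketch<String> consumer =
            new BatchingConsumerSketch<>(100, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 250; i++) consumer.onEvent("report-" + i);
        consumer.flush();
        System.out.println(batchSizes); // [100, 100, 50]
    }
}
```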
> > > > > > >
> > > > > > > Metrics filtering can probably progress in parallel too. I think it
> > > > > > > is a useful feature.
> > > > > > >
> > > > > > > [4397] https://github.com/apache/polaris/pull/4397
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jun 10, 2026 at 9:56 AM Alexandre Dutra wrote:
> > > > > > >
> > > > > > > > Hi Yufei,
> > > > > > > >
> > > > > > > > The proposal to leverage the events subsystem for metrics delivery
> > > > > > > > is quite appealing, though it requires a thorough evaluation of
> > > > > > > > the potential performance overhead.
> > > > > > > >
> > > > > > > > My primary considerations are as follows:
> > > > > > > >
> > > > > > > > 1) Given that scan reports can trigger a high volume of events, we
> > > > > > > > should conduct rigorous testing, potentially using the Polaris
> > > > > > > > benchmark tool. We need to determine the right configuration for
> > > > > > > > the event bus and for the event listener executor.
> > > > > > > >
> > > > > > > > 2) While the events subsystem handles dispatch and delivery
> > > > > > > > natively, it doesn't provide persistence for free. My recollection
> > > > > > > > is that we were pursuing the idea of a metrics persistence system
> > > > > > > > with a dedicated schema and possibly a separate datasource, a
> > > > > > > > process initiated by a recently merged PR [1]. Is that still the
> > > > > > > > case? Furthermore, we'd need to implement data retention and
> > > > > > > > purging, including for the current events table [2].
> > > > > > > >
> > > > > > > > 3) If we consider the events table for metrics storage, we must
> > > > > > > > evaluate average payload sizes. Although a PR [3] was introduced
> > > > > > > > to prune large payloads (such as table metadata), this
> > > > > > > > functionality is still in its early stages and will evolve.
> > > > > > > > Similar pruning would be necessary for metrics reports if they are
> > > > > > > > big.
> > > > > > > >
> > > > > > > > 4) As Yong suggested [4], we may still require more sophisticated
> > > > > > > > metrics filtering. The events subsystem currently only allows
> > > > > > > > filtering by event type or event category, which may not be
> > > > > > > > granular enough for our needs (as of today, it would only allow
> > > > > > > > distinguishing scan reports from commit reports). In that regard,
> > > > > > > > I would welcome the opportunity to implement a generic EventFilter
> > > > > > > > interface with a default implementation based on CEL.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Alex
> > > > > > > >
> > > > > > > > [1]: https://github.com/apache/polaris/pull/4397
> > > > > > > > [2]: https://lists.apache.org/thread/krmddx8myov926sd0mbh4ogy8sdgrfgq
> > > > > > > > [3]: https://github.com/apache/polaris/pull/4225
> > > > > > > > [4]: https://lists.apache.org/thread/ogskc1szctkg5n0tdj0cm3pfkowcwx4z
> > > > > > > >
> > > > > > > > On Wed, Jun 10, 2026 at 2:04 AM Yufei Gu wrote:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I've been thinking about how Polaris should support Iceberg scan
> > > > > > > > > and commit metrics. A few challenges have come up in recent
> > > > > > > > > discussions:
> > > > > > > > > 1. Synchronous metrics persistence chokes Polaris persistence
> > > > > > > > > due to the high volume of scan metrics [3].
> > > > > > > > > 2. We spent considerable time figuring out metrics persistence,
> > > > > > > > > including the schema, SPIs, and REST APIs [4].
> > > > > > > > > 3. Metrics filtering remains a challenge [1].
> > > > > > > > > 4. We need to figure out how to purge metrics because they keep
> > > > > > > > > growing [2].
> > > > > > > > >
> > > > > > > > > Looking at these challenges, most of them are not really metrics
> > > > > > > > > problems. They are transport, delivery, retention, and lifecycle
> > > > > > > > > problems that the existing event framework already addresses. I'd
> > > > > > > > > like to propose using the event system to facilitate the current
> > > > > > > > > use cases of Iceberg scan and commit metrics rather than
> > > > > > > > > introducing a separate Polaris metrics subsystem. The metrics for
> > > > > > > > > current use cases are fundamentally events with structured
> > > > > > > > > telemetry attached. They are append-only, generated by IRC
> > > > > > > > > endpoints, typically consumed asynchronously, and often forwarded
> > > > > > > > > to external systems. Since Polaris already needs to support them
> > > > > > > > > as part of IRC, treating them as event types seems like a natural
> > > > > > > > > fit.
> > > > > > > > >
> > > > > > > > > More importantly, I think Polaris should remain a catalog service
> > > > > > > > > and telemetry producer rather than a metrics warehouse. Instead of
> > > > > > > > > introducing a dedicated metrics subsystem along with storage,
> > > > > > > > > retention, query, and scaling concerns, we could build on the
> > > > > > > > > existing event framework:
> > > > > > > > >
> > > > > > > > >    - Emit them through the existing event mechanism. We will do
> > > > > > > > >    that anyway, given it's an IRC endpoint.
> > > > > > > > >    - Let custom event listeners route them to the destination of
> > > > > > > > >    choice, such as Prometheus, Grafana, RDBMSs, or other systems.
> > > > > > > > >    - Reuse the existing event lifecycle, retention, and delivery
> > > > > > > > >    models. If temporary persistence is still required, the
> > > > > > > > >    existing event table can serve that purpose. The payload size
> > > > > > > > >    is manageable, given that we already put the LoadTable/LoadView
> > > > > > > > >    responses in events.
> > > > > > > > >
> > > > > > > > > This approach also gives deployments flexibility to filter,
> > > > > > > > > sample, or redirect high-volume scan metrics without Polaris
> > > > > > > > > needing backend-specific metric storage behavior. For example,
> > > > > > > > > event listeners can choose which metric events to process, so we
> > > > > > > > > don't need to implement metric filtering logic [1].
> > > > > > > > >
> > > > > > > > > In short, my proposal is: events provide the transport and
> > > > > > > > > lifecycle mechanism, while downstream metrics systems remain
> > > > > > > > > responsible for storage, querying, aggregation, and visualization.
> > > > > > > > >
> > > > > > > > > Curious what others think.
> > > > > > > > >
> > > > > > > > > 1. https://lists.apache.org/thread/ogskc1szctkg5n0tdj0cm3pfkowcwx4z
> > > > > > > > > 2. https://lists.apache.org/thread/5nst0f2ygnl2gj3j910q7m8nk2fvokc7
> > > > > > > > > 3. https://lists.apache.org/thread/zp2rvsdkq3mb46722o0hfl0zh7kdqyr8
> > > > > > > > > 4. https://lists.apache.org/thread/qj1y7cw4dygcnczmymdwkfkp4ysq41ts
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yufei
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> >
