Forgot to add one thing - we need to make sure data migration handles any changes to persisted metrics. I know it is left to the implementor to do, but this will be a breaking schema change.
- anand

From: Anand Kumar Sankaran <[email protected]>
Date: Wednesday, June 10, 2026 at 11:40 AM
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] Facilitate the forwarding use cases of Iceberg Scan and Commit Metrics via Event

Hi Yufei,

My first proposal and PR for metrics persistence suggested we use the events framework to persist metrics 😊. At that time, it was decided that these are two separate approaches and should be treated differently.

I am generally supportive of this approach, but as Alex suggests below, having a separate datasource for metrics and events (irrespective of collapsing this into events) will definitely help.

- anand

From: Alexandre Dutra <[email protected]>
Date: Wednesday, June 10, 2026 at 6:56 AM
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] Facilitate the forwarding use cases of Iceberg Scan and Commit Metrics via Event

Hi Yufei,

The proposal to leverage the events subsystem for metrics delivery is quite appealing, though it requires a thorough evaluation regarding potential performance overhead. My primary considerations are as follows:

1) Given that scan reports can trigger a high volume of events, we should conduct rigorous testing, potentially using the Polaris benchmark tool. We need to determine the right configuration for the event bus and for the event listener executor.

2) While the events subsystem handles dispatch and delivery natively, it doesn't give persistence for free. My recollection is that we were pursuing the idea of a metrics persistence system with a unique schema and possibly a separate datasource, a process initiated by a recently merged PR [1]. Is that still the case?
Furthermore, we'd need to implement data retention and purging, including for the current events table [2].

3) If we consider the events table for metrics storage, we must evaluate average payload sizes. Although a PR [3] was introduced to prune large payloads (such as table metadata), this functionality is still in its early stages and will evolve. Similar pruning would be necessary for metrics reports if they are big.

4) As Yong suggested [4], we may still require more sophisticated metrics filtering. The events subsystem currently only allows filtering by event type or event category, which may not be granular enough for our needs (as of today, it would only allow distinguishing scan vs metrics reports). In that regard, I would welcome the opportunity to implement a generic EventFilter interface with a default implementation based on CEL.

Thanks,
Alex

[1]: https://github.com/apache/polaris/pull/4397
[2]: https://lists.apache.org/thread/krmddx8myov926sd0mbh4ogy8sdgrfgq
[3]: https://github.com/apache/polaris/pull/4225
[4]: https://lists.apache.org/thread/ogskc1szctkg5n0tdj0cm3pfkowcwx4z

On Wed, Jun 10, 2026 at 2:04 AM Yufei Gu <[email protected]> wrote:
>
> Hi all,
>
> I've been thinking about how Polaris should support Iceberg scan and commit
> metrics. A few challenges have come up in recent discussions:
> 1. Sync metrics persistence chokes Polaris persistence due to the high
>    volume of scan metrics [3].
> 2.
>    We spent considerable time figuring out the metrics persistence,
>    including the schema, SPIs, REST APIs [4].
> 3. Metric filtering remains a challenge [1].
> 4. We need to figure out how to purge metrics because they keep growing [2].
>
> Looking at these challenges, most of them are not really metrics problems.
> They are transport, delivery, retention, and lifecycle problems that the
> existing event framework already addresses. I'd like to propose using the
> event system to facilitate the current use cases of Iceberg scan and commit
> metrics rather than introducing a separate Polaris metrics subsystem. The
> metrics for current use cases are fundamentally events with structured
> telemetry attached. They are append only, generated by IRC endpoints,
> typically consumed asynchronously, and often forwarded to external systems.
> Since Polaris already needs to support them as part of IRC, treating them
> as event types seems like a natural fit.
>
> More importantly, I think Polaris should remain a catalog service and
> telemetry producer rather than a metrics warehouse. Instead of introducing
> a dedicated metrics subsystem along with storage, retention, query, and
> scaling concerns, we could build on the existing event framework:
>
> - Emit them through the existing event mechanism. We will do that anyway
>   given it's an IRC endpoint.
> - Let custom event listeners route them to the destination of choice,
>   such as Prometheus, Grafana, RDBMSs, or other systems.
> - Reuse the existing event lifecycle, retention, and delivery models. If
>   temporary persistence is still required, the existing event table can serve
>   that purpose. The payload size is manageable given that we have put the
>   loadTable/LoadView response in events.
>
> This approach also gives deployments flexibility to filter, sample, or
> redirect high volume scan metrics without Polaris needing backend specific
> metric storage behavior.
> For example, event listeners can choose which
> metric events to process. We don't need to implement metric filtering logic
> [1].
>
> In short, my proposal is: Events provide the transport and lifecycle
> mechanism, while downstream metrics systems remain responsible for storage,
> querying, aggregation, and visualization.
>
> Curious what others think.
>
> 1. https://lists.apache.org/thread/ogskc1szctkg5n0tdj0cm3pfkowcwx4z
> 2. https://lists.apache.org/thread/5nst0f2ygnl2gj3j910q7m8nk2fvokc7
> 3. https://lists.apache.org/thread/zp2rvsdkq3mb46722o0hfl0zh7kdqyr8
> 4. https://lists.apache.org/thread/qj1y7cw4dygcnczmymdwkfkp4ysq41ts
>
> Yufei
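
To make the listener-side filtering and sampling discussed in this thread concrete, here is a minimal sketch in plain Java. It is not the Polaris events API: MetricsEvent, EventFilter, and SamplingForwarder are hypothetical names invented for illustration, and the "keep one in every N scan reports" policy is only an assumption about how a deployment might tame scan volume. A CEL-backed filter, as suggested above, could be one implementation of the same hypothetical interface.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical event shape for illustration only; not a Polaris class. */
final class MetricsEvent {
    final String type;                 // "scan" or "commit"
    final String table;                // table identifier, e.g. "db.orders"
    final Map<String, Long> counters;  // structured telemetry attached to the event

    MetricsEvent(String type, String table, Map<String, Long> counters) {
        this.type = type;
        this.table = table;
        this.counters = counters;
    }
}

/** Hypothetical filter contract; a CEL-based implementation could plug in here. */
interface EventFilter {
    boolean accept(MetricsEvent event);
}

/**
 * Hypothetical listener: drops events rejected by the filter, forwards every
 * accepted commit report, and forwards one in every N accepted scan reports.
 */
final class SamplingForwarder {
    private final EventFilter filter;
    private final int scanSampleEvery;
    private final Consumer<MetricsEvent> sink;  // stands in for an external metrics system
    private long acceptedScans = 0;

    SamplingForwarder(EventFilter filter, int scanSampleEvery, Consumer<MetricsEvent> sink) {
        if (scanSampleEvery < 1) {
            throw new IllegalArgumentException("scanSampleEvery must be >= 1");
        }
        this.filter = filter;
        this.scanSampleEvery = scanSampleEvery;
        this.sink = sink;
    }

    /** Synchronized so the sampling counter stays consistent across dispatcher threads. */
    synchronized void onEvent(MetricsEvent event) {
        if (!filter.accept(event)) {
            return;  // filtered out before any downstream cost is paid
        }
        if ("scan".equals(event.type)) {
            // Forward the 1st, (N+1)th, (2N+1)th, ... accepted scan report.
            boolean keep = acceptedScans % scanSampleEvery == 0;
            acceptedScans++;
            if (!keep) {
                return;
            }
        }
        sink.accept(event);
    }
}
```

A deployment would wire something like this with a filter such as `e -> e.type.equals("commit") || e.table.equals("db.orders")` and a sink that writes to its metrics backend. Whether sampling belongs in the listener or in a shared filter layer is exactly the open question raised in point 4 above.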
