Hello Nándor and Dmitri,

I agree this is becoming more important as we persist more data in the Polaris backend. Today we have at least the events tables and the persisted Iceberg metrics tables that need some form of cleanup and retention management.
The admin tool approach sounds reasonable to me. It gives operators control over when cleanup runs and lets them use existing scheduling mechanisms such as k8s CronJobs. It would also be nice to avoid building a separate cleanup solution for every feature. If we go down the admin tool route, perhaps we can have a common maintenance framework that supports events cleanup, metrics cleanup, engine-specific maintenance tasks (for example, rebuilding indexes), and future maintenance operations. I am open on the implementation details.

One thing that I think would be beneficial is introducing a maintenance section in the Polaris helm chart. That would allow operators to configure and schedule maintenance tasks without having to create separate one-off charts or jobs for each task.

Thanks,
Yong Zheng

On Mon, Jun 8, 2026 at 8:01 PM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi Yong,
>
> Thanks for starting this discussion!
>
> From my POV the Admin tool does look like a good fit for this capability.
> It is similar to the NoSQL maintenance task [3395].
>
> I believe end users could then schedule the maintenance runs according to
> their deployment mechanics, e.g. via k8s jobs.
>
> I made an attempt at refactoring the Admin CLI for pluggability in terms of
> sub-commands in [3947]. We could revive that PR if there's community
> interest. The Metrics / Events maintenance tasks could then be plugged in
> similarly to NoSQL maintenance.
>
> [3395] https://github.com/apache/polaris/pull/3395
>
> [3947] https://github.com/apache/polaris/pull/3947
>
> Cheers,
> Dmitri.
>
> On Sun, Jun 7, 2026 at 2:34 PM Yong Zheng <[email protected]> wrote:
>
> > Hello,
> >
> > A while back Alex raised https://github.com/apache/polaris/issues/2573
> > for requesting a mechanism to purge the events table. Recently there is a
> > persisted iceberg metrics also got introduced (
> > https://github.com/apache/polaris/pull/3385) and this created two tables
> > (read and write metrics tables) which we also lack the life cycle
> > management and tables size should grow indefinitely. We will likely need a
> > mechanism to handle both.
> >
> > I am wondering what does community thinks about this? Should this be part
> > of admin tool where admins/ops should make the call on when to clean up or
> > should we have a janitor process that runs automatically (users will need
> > to provide rules on what to cleanup such as time based TTL).
> >
> > Thanks,
> > Yong Zheng
