Hello Nándor and Dmitri,

I agree this is becoming more important as we persist more data in the
Polaris backend. Today we have at least the events tables and the persisted
Iceberg metrics tables that need some form of cleanup and retention
management.

The admin tool approach sounds reasonable to me. It gives operators control
over when cleanup runs and lets them use existing scheduling mechanisms
such as Kubernetes CronJobs.

It would also be nice to avoid building a separate cleanup solution for
every feature. If we go down the admin tool route, perhaps we can have a
common maintenance framework that supports events cleanup, metrics cleanup,
engine-specific maintenance tasks (for example, rebuilding indexes), as
well as future maintenance operations.
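
As a rough illustration of what such a common framework could look like,
here is a minimal Python sketch. Everything in it is hypothetical:
MaintenanceTask, MaintenanceRegistry and ttl_cleanup are names made up for
this example, not existing Polaris APIs, and in-memory lists stand in for
the events and metrics tables.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class MaintenanceTask:
    """A named maintenance operation (hypothetical, for illustration only)."""
    name: str
    # Receives the current time and returns the number of rows it removed.
    run: Callable[[datetime], int]


class MaintenanceRegistry:
    """Holds all known tasks so a single entry point can run any subset."""

    def __init__(self) -> None:
        self._tasks: Dict[str, MaintenanceTask] = {}

    def register(self, task: MaintenanceTask) -> None:
        if task.name in self._tasks:
            raise ValueError(f"duplicate task: {task.name}")
        self._tasks[task.name] = task

    def run(self, names: List[str], now: datetime) -> Dict[str, int]:
        # Run the requested tasks in order and report rows removed per task.
        return {name: self._tasks[name].run(now) for name in names}


def ttl_cleanup(rows: List[dict], ttl: timedelta) -> Callable[[datetime], int]:
    """Build a task body that drops rows whose "ts" is older than the TTL.

    `rows` is an in-memory stand-in for a table; a real task would issue a
    DELETE against the backend instead.
    """
    def task(now: datetime) -> int:
        cutoff = now - ttl
        before = len(rows)
        rows[:] = [row for row in rows if row["ts"] >= cutoff]
        return before - len(rows)

    return task
```

With this shape, events cleanup and metrics cleanup would be two
registrations of the same TTL helper with different tables and retention
periods, and an engine-specific task such as an index rebuild would simply
be another registered MaintenanceTask.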

I am open on the implementation details. One thing that I think would be
beneficial is introducing a maintenance section in the Polaris Helm chart.
That would allow operators to configure and schedule maintenance tasks
without having to create separate one-off charts or jobs for each task.

Thanks,
Yong Zheng


On Mon, Jun 8, 2026 at 8:01 PM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi Yong,
>
> Thanks for starting this discussion!
>
> From my POV the Admin tool does look like a good fit for this capability.
> It is similar to the NoSQL maintenance task [3395].
>
> I believe end users could then schedule the maintenance runs according to
> their deployment mechanics, e.g. via k8s jobs.
>
> I made an attempt at refactoring the Admin CLI for pluggability in terms of
> sub-commands in [3947]. We could revive that PR if there's community
> interest. The Metrics / Events maintenance tasks could then be plugged in
> similarly to NoSQL maintenance.
>
> [3395] https://github.com/apache/polaris/pull/3395
>
> [3947] https://github.com/apache/polaris/pull/3947
>
> Cheers,
> Dmitri.
>
> On Sun, Jun 7, 2026 at 2:34 PM Yong Zheng <[email protected]> wrote:
>
> > Hello,
> >
> > A while back Alex raised https://github.com/apache/polaris/issues/2573
> > requesting a mechanism to purge the events table. More recently,
> > persisted Iceberg metrics were introduced (
> > https://github.com/apache/polaris/pull/3385), which created two tables
> > (read and write metrics tables). These also lack life cycle management,
> > so their size will grow indefinitely. We will likely need a mechanism to
> > handle both.
> >
> > I am wondering what the community thinks about this. Should this be part
> > of the admin tool, where admins/ops decide when to clean up, or should we
> > have a janitor process that runs automatically (users would need to
> > provide rules on what to clean up, such as a time-based TTL)?
> >
> > Thanks,
> > Yong Zheng
> >
>
