Re: Provide a mechanism to purge the events/metrics table

Adnan Hemani via dev Fri, 12 Jun 2026 17:42:37 -0700

Hi all,

Sorry for the late reply. +1 to needing this. I think I mentioned
during the initial events proposal that setting retention boundaries is
necessary for this system to be scalable.


Keeping the job in the Admin tool makes sense to me. The only thing I'd
like to see (an implementation detail, but I'm stating it upfront) is the
ability to set the retention period through the Polaris configuration
(with a potential manual override). Although running this maintenance job
is inherently destructive, Polaris must ensure that Events and Metrics
maintenance jobs always respect pre-set retention limits so that an admin
doesn't accidentally delete data they did not intend to delete.
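
To make the retention guard concrete, here is a minimal Python sketch of
one way the check could work. It is not Polaris code: RetentionPolicy,
purge_cutoff, and the force flag are hypothetical names, and treating a
shorter-than-configured override as something that needs explicit
confirmation is only my assumption about how a "manual override" might
behave.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    # Hypothetical: stands in for a value read from Polaris configuration.
    configured_retention: timedelta


def purge_cutoff(
    policy: RetentionPolicy,
    now: datetime,
    override: Optional[timedelta] = None,
    force: bool = False,
) -> datetime:
    """Return the timestamp before which rows may be purged."""
    retention = policy.configured_retention
    if override is not None:
        # A shorter override deletes more data than configured, so it
        # requires explicit confirmation (hypothetical `force` flag).
        if override < retention and not force:
            raise ValueError(
                f"override {override} is shorter than configured "
                f"retention {retention}; pass force=True to proceed"
            )
        retention = override
    return now - retention
```

With a 30-day configured retention, purge_cutoff(policy, now) returns
now minus 30 days; a 7-day override raises an error unless force=True is
passed, while a 60-day override is accepted because it deletes less.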

Best,
Adnan Hemani

On Fri, Jun 12, 2026 at 9:03 AM Yong Zheng <[email protected]> wrote:

> Hi all,
>
> Thanks for the feedback. I will start working on the generic cronjob
> support in helm first. Then the newer maintenance jobs can be
> plug-and-play.
>
> Thanks,
> Yong Zheng
>
> > On Jun 11, 2026, at 8:31 AM, Alexandre Dutra <[email protected]> wrote:
> >
> > Hi all,
> >
> > I like the idea of a maintenance section in the Helm chart that would
> > create Jobs or CronJobs delegating to various admin commands. This
> > design looks clean to me, and corresponds to how the admin tool was
> > designed to be used.
> >
> > Thanks,
> > Alex
> >
> >> On Thu, Jun 11, 2026 at 1:17 AM Yong Zheng <[email protected]> wrote:
> >>
> >> Yes, the helm maintenance section will create a k8s cronjob. For
> >> non-k8s environments, you will just need to invoke the CLI periodically
> >> with your job orchestrator.
> >>
> >> Thanks,
> >> Yong Zheng
> >>
> >>>> On Jun 10, 2026, at 4:59 PM, Dmitri Bourlatchkov <[email protected]> wrote:
> >>>
> >>> Hi Nandor,
> >>>
> >>> I was thinking about a k8s cron job too for OSS charts.
> >>>
> >>> In non-k8s environments, users will have to find a way to call the new
> >>> admin tool command.
> >>>
> >>> Cheers,
> >>> Dmitri.
> >>>
> >>>> On Wed, Jun 10, 2026 at 3:55 PM Nándor Kollár <[email protected]> wrote:
> >>>>
> >>>> +1 for the Helm chart maintenance section too. Would that create a k8s
> >>>> cron job, which periodically executes the cleanup admin command?
> >>>> Should customers who don't use Kubernetes handle the scheduling in
> >>>> their own system, for example by configuring a cron job on a VM?
> >>>>
> >>>> Dmitri Bourlatchkov <[email protected]> wrote (on Tue, Jun 9, 2026, 5:34):
> >>>>>
> >>>>> Hi Yong,
> >>>>>
> >>>>> +1 to adding a maintenance section to the helm chart.
> >>>>>
> >>>>> Cheers,
> >>>>> Dmitri.
> >>>>>
> >>>>> On Mon, Jun 8, 2026 at 10:13 PM Yong Zheng <[email protected]> wrote:
> >>>>>
> >>>>>> Hello Nándor and Dmitri,
> >>>>>>
> >>>>>> I agree this is becoming more important as we persist more data in the
> >>>>>> Polaris backend. Today we have at least the events tables and the
> >>>>>> persisted Iceberg metrics tables that need some form of cleanup and
> >>>>>> retention management.
> >>>>>>
> >>>>>> The admin tool approach sounds reasonable to me. It gives operators
> >>>>>> control over when cleanup runs and allows them to use existing
> >>>>>> scheduling mechanisms such as k8s cron jobs.
> >>>>>>
> >>>>>> It would also be nice to avoid building a separate cleanup solution for
> >>>>>> every feature. If we go down the admin tool route, perhaps we can have a
> >>>>>> common maintenance framework that supports events cleanup, metrics
> >>>>>> cleanup, engine-specific maintenance tasks (for example, rebuilding
> >>>>>> indexes), as well as future maintenance operations.
> >>>>>>
> >>>>>> I am pretty open on the implementation details. One thing that I
> >>>>>> think would be beneficial is introducing a maintenance section in the
> >>>>>> Polaris helm chart. That would allow operators to configure and
> >>>>>> schedule maintenance tasks without having to create separate one-off
> >>>>>> charts or jobs for each task.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Yong Zheng
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Jun 8, 2026 at 8:01 PM Dmitri Bourlatchkov <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi Yong,
> >>>>>>>
> >>>>>>> Thanks for starting this discussion!
> >>>>>>>
> >>>>>>> From my POV the Admin tool does look like a good fit for this
> >>>>>>> capability. It is similar to the NoSQL maintenance task [3395].
> >>>>>>>
> >>>>>>> I believe end users could then schedule the maintenance runs
> >>>>>>> according to their deployment mechanics, e.g. via k8s jobs.
> >>>>>>>
> >>>>>>> I made an attempt at refactoring the Admin CLI for pluggability in
> >>>>>>> terms of sub-commands in [3947]. We could revive that PR if there's
> >>>>>>> community interest. The Metrics / Events maintenance tasks could then
> >>>>>>> be plugged in similarly to NoSQL maintenance.
> >>>>>>>
> >>>>>>> [3395] https://github.com/apache/polaris/pull/3395
> >>>>>>>
> >>>>>>> [3947] https://github.com/apache/polaris/pull/3947
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Dmitri.
> >>>>>>>
> >>>>>>> On Sun, Jun 7, 2026 at 2:34 PM Yong Zheng <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> A while back Alex raised https://github.com/apache/polaris/issues/2573
> >>>>>>>> to request a mechanism to purge the events table. Recently, persisted
> >>>>>>>> Iceberg metrics were also introduced
> >>>>>>>> (https://github.com/apache/polaris/pull/3385), which created two
> >>>>>>>> tables (read and write metrics tables) that also lack life cycle
> >>>>>>>> management, so their size will grow indefinitely. We will likely need
> >>>>>>>> a mechanism to handle both.
> >>>>>>>>
> >>>>>>>> I am wondering what the community thinks about this. Should this be
> >>>>>>>> part of the admin tool, where admins/ops make the call on when to
> >>>>>>>> clean up, or should we have a janitor process that runs automatically
> >>>>>>>> (users would need to provide rules on what to clean up, such as a
> >>>>>>>> time-based TTL)?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Yong Zheng
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
>
