Re: Provide a mechanism to purge the events/metrics table

Yong Zheng Fri, 12 Jun 2026 09:02:49 -0700

Hi all,

Thanks for the feedback. I will start working on generic CronJob support in
the Helm chart first. Newer maintenance jobs can then be plug-and-play.


Thanks,
Yong Zheng

> On Jun 11, 2026, at 8:31 AM, Alexandre Dutra <[email protected]> wrote:
> 
> Hi all,
> 
> I like the idea of a maintenance section in the Helm chart that would
> create Jobs or CronJobs delegating to various admin commands. This
> design looks clean to me, and corresponds to how the admin tool was
> designed to be used.
> 
> Thanks,
> Alex
> 
>> On Thu, Jun 11, 2026 at 1:17 AM Yong Zheng <[email protected]> wrote:
>> 
>> Yes, the Helm maintenance section will create a k8s CronJob. For
>> non-k8s environments, you will just need to invoke the CLI periodically
>> with your job orchestrator.
>> 
>> Thanks,
>> Yong Zheng
>> 
>>>> On Jun 10, 2026, at 4:59 PM, Dmitri Bourlatchkov <[email protected]> wrote:
>>> 
>>> Hi Nandor,
>>> 
>>> I was thinking about a k8s cron job too for OSS charts.
>>> 
>>> In non-k8s environments, users will have to find a way to call the new
>>> admin tool command.
>>> 
>>> Cheers,
>>> Dmitri.
>>> 
>>>> On Wed, Jun 10, 2026 at 3:55 PM Nándor Kollár <[email protected]> wrote:
>>>> 
>>>> +1 for the Helm chart maintenance section too. Would that create a k8s
>>>> cron job, which periodically executes the cleanup admin command?
>>>> Customers who don't use Kubernetes would handle the scheduling in
>>>> their own system, for example by configuring a cron job on a VM?
>>>> 
>>>> On Tue, Jun 9, 2026 at 5:34 AM Dmitri Bourlatchkov <[email protected]> wrote:
>>>>> 
>>>>> Hi Yong,
>>>>> 
>>>>> +1 to adding a maintenance section to the helm chart.
>>>>> 
>>>>> Cheers,
>>>>> Dmitri.
>>>>> 
>>>>> On Mon, Jun 8, 2026 at 10:13 PM Yong Zheng <[email protected]> wrote:
>>>>> 
>>>>>> Hello Nándor and Dmitri,
>>>>>> 
>>>>>> I agree this is becoming more important as we persist more data in the
>>>>>> Polaris backend. Today we have at least the events tables and the
>>>>>> persisted Iceberg metrics tables that need some form of cleanup and
>>>>>> retention management.
>>>>>> 
>>>>>> The admin tool approach sounds reasonable to me. It gives operators
>>>>>> control over when cleanup runs and allows them to use existing
>>>>>> scheduling mechanisms such as k8s cron jobs.
>>>>>> 
>>>>>> It would also be nice to avoid building a separate cleanup solution for
>>>>>> every feature. If we go down the admin tool route, perhaps we can have a
>>>>>> common maintenance framework that supports events cleanup, metrics
>>>>>> cleanup, engine-specific maintenance tasks (for example, rebuilding
>>>>>> indexes), as well as future maintenance operations.
>>>>>> 
>>>>>> I am pretty open on the implementation details. One thing that I
>>>>>> think would be beneficial is introducing a maintenance section in the
>>>>>> Polaris Helm chart. That would allow operators to configure and
>>>>>> schedule maintenance tasks without having to create separate one-off
>>>>>> charts or jobs for each task.
>>>>>> 
>>>>>> Thanks,
>>>>>> Yong Zheng
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jun 8, 2026 at 8:01 PM Dmitri Bourlatchkov <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Yong,
>>>>>>> 
>>>>>>> Thanks for starting this discussion!
>>>>>>> 
>>>>>>> From my POV the Admin tool does look like a good fit for this
>>>>>>> capability. It is similar to the NoSQL maintenance task [3395].
>>>>>>> 
>>>>>>> I believe end users could then schedule the maintenance runs
>>>>>>> according to their deployment mechanics, e.g. via k8s jobs.
>>>>>>> 
>>>>>>> I made an attempt at refactoring the Admin CLI for pluggability in
>>>>>>> terms of sub-commands in [3947]. We could revive that PR if there's
>>>>>>> community interest. The Metrics / Events maintenance tasks could then
>>>>>>> be plugged in similarly to NoSQL maintenance.
>>>>>>> 
>>>>>>> [3395] https://github.com/apache/polaris/pull/3395
>>>>>>> 
>>>>>>> [3947] https://github.com/apache/polaris/pull/3947
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Dmitri.
>>>>>>> 
>>>>>>> On Sun, Jun 7, 2026 at 2:34 PM Yong Zheng <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> A while back Alex raised https://github.com/apache/polaris/issues/2573
>>>>>>>> requesting a mechanism to purge the events table. Recently, persisted
>>>>>>>> Iceberg metrics were also introduced
>>>>>>>> (https://github.com/apache/polaris/pull/3385), which created two tables
>>>>>>>> (read and write metrics tables) that also lack lifecycle management, so
>>>>>>>> their size will grow indefinitely. We will likely need a mechanism to
>>>>>>>> handle both.
>>>>>>>> 
>>>>>>>> I am wondering what the community thinks about this. Should this be
>>>>>>>> part of the admin tool, where admins/ops make the call on when to clean
>>>>>>>> up, or should we have a janitor process that runs automatically (users
>>>>>>>> would need to provide rules on what to clean up, such as a time-based
>>>>>>>> TTL)?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Yong Zheng
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>
