Hi all, we have implemented a partition-based data TTL management, which lets us manage TTL for Hudi partitions by size, expiration time, and sub-partition count. When a partition is detected as outdated, we use the delete-partition interface to delete it, which generates a replace commit marking the data as deleted. The real deletion is then done by the clean service.
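To make the idea concrete, here is a minimal sketch of what the expiry decision could look like. This is illustrative only: `PartitionStats`, `TtlPolicy`, and `is_expired` are hypothetical names, not part of Hudi's actual API; the three limits mirror the size, expiration-time, and sub-partition-count criteria described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class PartitionStats:
    """Hypothetical per-partition metrics collected by the TTL service."""
    path: str
    size_bytes: int
    last_modified: datetime
    sub_partition_count: int

@dataclass
class TtlPolicy:
    """Hypothetical TTL policy; any unset limit is simply not checked."""
    max_size_bytes: Optional[int] = None
    max_age: Optional[timedelta] = None
    max_sub_partitions: Optional[int] = None

    def is_expired(self, stats: PartitionStats, now: datetime) -> bool:
        # A partition is considered outdated if it violates any configured limit.
        if self.max_size_bytes is not None and stats.size_bytes > self.max_size_bytes:
            return True
        if self.max_age is not None and now - stats.last_modified > self.max_age:
            return True
        if (self.max_sub_partitions is not None
                and stats.sub_partition_count > self.max_sub_partitions):
            return True
        return False
```

Partitions flagged by such a check would then be handed to the delete-partition call, which writes the replace commit; the cleaner removes the underlying files later.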
If the community is interested in this idea, maybe we can propose an RFC to discuss it in detail.

> On Oct 19, 2022, at 10:06, Vinoth Chandar <vin...@apache.org> wrote:
>
> +1 love to discuss this on a RFC proposal.
>
> On Tue, Oct 18, 2022 at 13:11 Alexey Kudinkin <ale...@onehouse.ai> wrote:
>
>> That's a very interesting idea.
>>
>> Do you want to take a stab at writing a full proposal (in the form of an RFC)
>> for it?
>>
>> On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang <hbgstc...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> Do we have a plan to integrate data TTL into Hudi, so we don't have to
>>> schedule an offline Spark job to delete outdated data? Just set a TTL
>>> config, then the writer or some offline service will delete old data as
>>> expected.