Good idea, this is definitely worth an RFC. Btw, should it only depend on Hudi's partitions? I feel it should be a more general feature, since sometimes customers' data cannot be updated across partitions.
On Wed, Oct 19, 2022 at 11:07 AM stream2000 <[email protected]> wrote:

> Hi all, we have implemented partition-based data TTL management, which
> lets us manage TTL for Hudi partitions by size, expiration time, and
> sub-partition count. When a partition is detected as outdated, we use the
> delete partition interface to delete it, which generates a replace commit
> to mark the data as deleted. The real deletion is then done by the clean
> service.
>
> If the community is interested in this idea, maybe we can propose an RFC
> to discuss it in detail.
>
> > On Oct 19, 2022, at 10:06, Vinoth Chandar <[email protected]> wrote:
> >
> > +1 love to discuss this on a RFC proposal.
> >
> > On Tue, Oct 18, 2022 at 13:11 Alexey Kudinkin <[email protected]> wrote:
> >
> >> That's a very interesting idea.
> >>
> >> Do you want to take a stab at writing a full proposal (in the form of
> >> an RFC) for it?
> >>
> >> On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang <[email protected]> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Do we have a plan to integrate data TTL into Hudi, so we don't have to
> >>> schedule an offline Spark job to delete outdated data? Just set a TTL
> >>> config, and then the writer or some offline service will delete old
> >>> data as expected.
> >>>
> >>
> >

--
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure
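To make the proposal concrete, here is a minimal Python sketch of the partition-expiry decision described above (a partition is considered outdated by age, size, or sub-partition count, and its path is then handed to Hudi's delete-partition interface). The `Partition` and `TtlPolicy` structures, field names, and thresholds are hypothetical illustrations, not Hudi APIs:

```python
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class Partition:
    """Hypothetical stand-in for a partition's metadata."""
    path: str
    size_bytes: int
    last_modified: float  # epoch seconds
    sub_partitions: List[str] = field(default_factory=list)

@dataclass
class TtlPolicy:
    """Hypothetical TTL config: a partition is outdated if ANY limit is exceeded."""
    max_age_seconds: float = float("inf")
    max_size_bytes: int = 2**63 - 1
    max_sub_partitions: int = 2**31 - 1

def is_outdated(p: Partition, policy: TtlPolicy, now: float) -> bool:
    # Any one of the three policies (age, size, sub-partition count) suffices.
    return (
        now - p.last_modified > policy.max_age_seconds
        or p.size_bytes > policy.max_size_bytes
        or len(p.sub_partitions) > policy.max_sub_partitions
    )

def partitions_to_delete(parts, policy, now=None):
    """Return partition paths a TTL service would pass to the
    delete-partition interface (which generates a replace commit;
    physical deletion is left to the clean service)."""
    now = time.time() if now is None else now
    return [p.path for p in parts if is_outdated(p, policy, now)]
```

In this sketch the TTL service only marks partitions for deletion; as the thread notes, the replace commit records the logical delete and the clean service performs the actual file removal later.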
