Great proposal. Partition TTL is a good starting point; we can extend it to
other TTL strategies such as column-based TTL, and make it customizable and
pluggable. Looking forward to the RFC!
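
To make the discussion concrete, here is a rough sketch of what a pluggable
TTL strategy could look like. Everything below is hypothetical: TtlStrategy,
PartitionStat, and KeepByTimeTtlStrategy are placeholder names for the RFC
discussion, not existing Hudi APIs.

import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/** Pluggable strategy that decides which partitions have expired. */
interface TtlStrategy {
  /** Returns the partition paths that should be deleted. */
  List<String> getExpiredPartitions(List<PartitionStat> partitions);
}

/** Minimal per-partition metadata a strategy would need. */
class PartitionStat {
  final String path;
  final Instant lastModified;
  final long totalBytes;

  PartitionStat(String path, Instant lastModified, long totalBytes) {
    this.path = path;
    this.lastModified = lastModified;
    this.totalBytes = totalBytes;
  }
}

/** Time-based partition TTL: expire partitions untouched for longer than the retention window. */
class KeepByTimeTtlStrategy implements TtlStrategy {
  private final Duration retention;

  KeepByTimeTtlStrategy(Duration retention) {
    this.retention = retention;
  }

  @Override
  public List<String> getExpiredPartitions(List<PartitionStat> partitions) {
    Instant cutoff = Instant.now().minus(retention);
    return partitions.stream()
        .filter(p -> p.lastModified.isBefore(cutoff)) // expired if not written within retention
        .map(p -> p.path)
        .collect(Collectors.toList());
  }
}

The expired partitions returned by a strategy would then be handed to the
existing delete-partition operation, so the deletion is recorded as a replace
commit and the files are physically removed later by the clean service, as
described in the quoted proposal below. Size-based, sub-partition-count-based,
or column-based strategies would simply be other implementations of the same
interface.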

On Wed, Oct 19, 2022 at 11:40 AM Jian Feng <jian.f...@shopee.com.invalid>
wrote:

> Good idea,
> this is definitely worth an RFC.
> BTW, should it only depend on Hudi's partitions? I feel it should be a
> more general feature, since sometimes customers' data cannot be updated
> across partitions.
>
>
> On Wed, Oct 19, 2022 at 11:07 AM stream2000 <18889897...@163.com> wrote:
>
> > Hi all, we have implemented partition-based data TTL management, with
> > which we can manage TTL for Hudi partitions by size, expiration time, and
> > sub-partition count. When a partition is detected as outdated, we use the
> > delete partition interface to delete it, which generates a replace commit
> > to mark the data as deleted. The real deletion is then done by the clean
> > service.
> >
> >
> > If the community is interested in this idea, maybe we can propose an RFC
> > to discuss it in detail.
> >
> >
> > > On Oct 19, 2022, at 10:06, Vinoth Chandar <vin...@apache.org> wrote:
> > >
> > > +1, would love to discuss this in an RFC proposal.
> > >
> > > On Tue, Oct 18, 2022 at 13:11 Alexey Kudinkin <ale...@onehouse.ai>
> > wrote:
> > >
> > >> That's a very interesting idea.
> > >>
> > >> Do you want to take a stab at writing a full proposal (in the form of
> > an RFC) for it?
> > >> for it?
> > >>
> > >> On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang <hbgstc...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> Do we have a plan to integrate data TTL into Hudi, so we don't have to
> > >>> schedule an offline Spark job to delete outdated data? We would just
> > >>> set a TTL config, and then the writer or some offline service would
> > >>> delete old data as expected.
> > >>>
> > >>
> >
> >
>
> --
> *Jian Feng,冯健*
> Shopee | Engineer | Data Infrastructure
>


-- 
Best,
Shiyan
