Good idea, this is definitely worth an RFC.
Btw, should it only depend on Hudi's partitions? I feel it should be a more
general feature, since sometimes customers' data cannot be updated across
partitions.
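The mechanism described in the quoted message below (expire a partition by size, age, or sub-partition count, then hand it to the delete-partition operation) could be sketched roughly like this. This is only an illustrative sketch, not actual Hudi code: the function and parameter names here are hypothetical, and only the policy check is shown.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of the partition-TTL policy from the thread: a
# partition is considered expired when it exceeds ANY configured limit
# (age, total size, or number of sub-partitions). None of these names
# are real Hudi APIs.

def is_partition_expired(last_modified, size_bytes, sub_partition_count,
                         max_age=None, max_size_bytes=None,
                         max_sub_partitions=None, now=None):
    """Return True if the partition exceeds any configured TTL limit."""
    now = now or datetime.utcnow()
    if max_age is not None and now - last_modified > max_age:
        return True
    if max_size_bytes is not None and size_bytes > max_size_bytes:
        return True
    if max_sub_partitions is not None and sub_partition_count > max_sub_partitions:
        return True
    return False

# Partitions flagged here would then be passed to Hudi's delete-partition
# operation, which writes a replace commit marking the data as deleted;
# the clean service later removes the files physically.
```

An offline job (or the writer itself) could run this check over the table's partitions on each schedule and feed the expired ones to the delete-partition call in one batch.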


On Wed, Oct 19, 2022 at 11:07 AM stream2000 <[email protected]> wrote:

> Hi all, we have implemented a partition-based data TTL management feature,
> with which we can manage TTL for Hudi partitions by size, expiration time,
> and sub-partition count. When a partition is detected as outdated, we use
> the delete-partition interface to delete it, which generates a replace
> commit to mark the data as deleted. The real deletion is then done by the
> clean service.
>
>
> If the community is interested in this idea, maybe we can propose an RFC
> to discuss it in detail.
>
>
> > On Oct 19, 2022, at 10:06, Vinoth Chandar <[email protected]> wrote:
> >
> > +1, would love to discuss this in an RFC proposal.
> >
> > On Tue, Oct 18, 2022 at 13:11 Alexey Kudinkin <[email protected]>
> wrote:
> >
> >> That's a very interesting idea.
> >>
> >> Do you want to take a stab at writing a full proposal (in the form of
> RFC)
> >> for it?
> >>
> >> On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang <[email protected]>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Do we have a plan to integrate data TTL into Hudi, so we don't have to
> >>> schedule an offline Spark job to delete outdated data? Just set a TTL
> >>> config, and then the writer or some offline service will delete old
> >>> data as expected.
> >>>
> >>
>
>

-- 
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure
