Hi all, we have implemented partition-based data TTL management, which lets us 
manage the TTL of Hudi partitions by size, expiration time, or sub-partition 
count. When a partition is detected as expired, we use the delete-partition 
interface to remove it, which generates a replace commit marking the data as 
deleted. The actual deletion is then performed by the clean service.
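To make the flow concrete, here is a rough sketch (not our actual implementation) of how an expired partition could be dropped through Hudi's Spark datasource `delete_partition` write operation. The option names below are from Hudi's public write config; the table name, partition values, and the helper function are just illustrative:

```python
# Sketch: marking expired Hudi partitions as deleted via a replace commit.
# The delete_partition operation only logs the replacement; physical file
# removal happens later when the clean service runs.

def build_delete_partition_opts(table_name, partitions):
    """Build Hudi write options that request deletion of the given
    partition paths (hypothetical helper for illustration)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "delete_partition",
        "hoodie.datasource.write.partitions.to.delete": ",".join(partitions),
    }

opts = build_delete_partition_opts("events", ["dt=2021-01-01", "dt=2021-01-02"])

# With a live SparkSession `spark` and table base path, the write would
# look roughly like (commented out since it needs a running cluster):
# spark.createDataFrame([], schema).write.format("hudi") \
#     .options(**opts).mode("append").save(base_path)
```

In our design the TTL policy (size / expiration time / sub-partition count) decides which partition paths to feed into such a call; the replace commit then makes the deletion visible to readers immediately while leaving cleanup to the cleaner.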


If the community is interested in this idea, we can propose an RFC to discuss 
it in detail.


> On Oct 19, 2022, at 10:06, Vinoth Chandar <vin...@apache.org> wrote:
> 
> +1 love to discuss this on a RFC proposal.
> 
> On Tue, Oct 18, 2022 at 13:11 Alexey Kudinkin <ale...@onehouse.ai> wrote:
> 
>> That's a very interesting idea.
>> 
>> Do you want to take a stab at writing a full proposal (in the form of RFC)
>> for it?
>> 
>> On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang <hbgstc...@gmail.com>
>> wrote:
>> 
>>> Hi all,
>>> 
>>> Do we have a plan to integrate data TTL into Hudi, so we don't have to
>>> schedule an offline Spark job to delete outdated data? Just set a TTL
>>> config, and the writer or some offline service will delete old data as
>>> expected.
>>> 
>> 
