[ 
https://issues.apache.org/jira/browse/HUDI-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7104.
---------------------------
    Resolution: Fixed

> Cleaner could miss to clean up some files w/ savepoint interplay 
> -----------------------------------------------------------------
>
>                 Key: HUDI-7104
>                 URL: https://issues.apache.org/jira/browse/HUDI-7104
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: cleaning, savepoint, table-service
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0, 1.0.0
>
>
> Let's say partitioning is day-based and keyed on the created date, so older 
> partitions generally do not get any new data after a few days. 
>  
> Let's say a savepoint was added to one day and later removed: 
> day1: cleaned up. 
> day2: savepoint added, so the cleaner ignored it. 
> day3: cleaned up. 
> day4: earliest commit to retain, based on cleaner configs. 
>  
> So, with this table/timeline state, if we remove the savepointed commit, data 
> pertaining to day2 will never be cleaned by the cleaner, since it is earlier 
> than the earliest commit to retain. 
>  
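
The scenario above can be sketched as a toy simulation. This is not Hudi's
cleaner code: ToyTable, incremental_clean, the integer instant times, and the
file names are all hypothetical, and the model assumes a cleaner that only
scans commits between the boundary recorded by the previous clean and the new
earliest commit to retain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyTable:
    # Instant time -> partition written by that commit (hypothetical model).
    commits: Dict[int, str]
    # Partition -> superseded files that are waiting to be cleaned.
    stale_files: Dict[str, Set[str]]
    # Instant times currently protected by a savepoint.
    savepoints: Set[int] = field(default_factory=set)
    # Boundary recorded by the previous clean; older commits are not rescanned.
    last_earliest_retained: int = 0


def incremental_clean(table: ToyTable, earliest_to_retain: int) -> List[str]:
    """Clean partitions touched by commits in [last boundary, new boundary)."""
    deleted: List[str] = []
    for instant in sorted(table.commits):
        if not (table.last_earliest_retained <= instant < earliest_to_retain):
            continue  # outside the window this clean looks at
        if instant in table.savepoints:
            continue  # savepointed commit: its files are skipped
        partition = table.commits[instant]
        deleted.extend(sorted(table.stale_files[partition]))
        table.stale_files[partition].clear()
    # Assumption: the next clean starts scanning from this boundary.
    table.last_earliest_retained = earliest_to_retain
    return deleted


table = ToyTable(
    commits={1: "day1", 2: "day2", 3: "day3", 4: "day4"},
    stale_files={f"day{d}": {f"day{d}/f_old"} for d in range(1, 5)},
    savepoints={2},
)

# First clean: day4 is the earliest commit to retain, day2 is savepointed.
first_pass = incremental_clean(table, earliest_to_retain=4)

# The savepoint is removed, then the cleaner runs again.
table.savepoints.discard(2)
second_pass = incremental_clean(table, earliest_to_retain=4)

print(first_pass)                # day1 and day3 files were deleted
print(second_pass)               # nothing is deleted on the second run
print(table.stale_files["day2"]) # the day2 file is still there
```

In this model the second run never revisits day2, because commit 2 is already
below the recorded boundary, which mirrors the behaviour the description
reports.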



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
