the-other-tim-brown opened a new pull request, #13660:
URL: https://github.com/apache/hudi/pull/13660

   ### Change Logs
   
   In the case of cleaner policy `KEEP_LATEST_FILE_VERSIONS`, the 
`earliestCommitToRetain` is never set in the cleaner plan or clean metadata. 
The savepoint operation currently relies on this field to determine whether the 
instant is valid for a savepoint. If that value is not set, the code treats 
every instant as valid, which is incorrect.
   
   To handle the case where the `KEEP_LATEST_FILE_VERSIONS` policy is used, we 
instead ensure that the instant provided to the savepoint is no older than the 
last commit before the last clean operation. Since the clean metadata does not 
include the number of file versions it kept, this is the only commit that we 
can guarantee is fully intact with this cleaner policy.
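
The check described above can be sketched roughly as follows. This is an illustrative model only, not the code in this PR: the function names, the parameters, and the representation of the timeline as a sorted list of instant-time strings are all assumptions made for the example. It also assumes that "last commit before the last clean" means the latest completed commit whose instant time is strictly less than the clean's instant time, and that instant times have equal length so they compare correctly as strings.

```python
from bisect import bisect_left
from typing import Optional, Sequence

KEEP_LATEST_FILE_VERSIONS = "KEEP_LATEST_FILE_VERSIONS"


def earliest_savepointable_instant(
    policy: str,
    commits: Sequence[str],
    last_clean_time: Optional[str],
    earliest_commit_to_retain: Optional[str],
) -> Optional[str]:
    """Return the oldest instant that may be savepointed, or None for no lower bound.

    Hypothetical helper: `commits` is a sorted list of completed commit instant
    times, and `last_clean_time` is the instant time of the most recent clean.
    """
    if policy == KEEP_LATEST_FILE_VERSIONS:
        # earliestCommitToRetain is never populated for this policy, so fall
        # back to the last commit that completed before the last clean.
        if last_clean_time is None:
            return None  # assumption: nothing has been cleaned, so no bound
        idx = bisect_left(commits, last_clean_time)  # commits strictly before the clean
        return commits[idx - 1] if idx > 0 else None
    # Other policies: rely on the value recorded in the clean metadata.
    return earliest_commit_to_retain


def is_valid_savepoint_instant(
    instant: str,
    policy: str,
    commits: Sequence[str],
    last_clean_time: Optional[str] = None,
    earliest_commit_to_retain: Optional[str] = None,
) -> bool:
    """True if `instant` is no older than the earliest savepointable instant."""
    lower_bound = earliest_savepointable_instant(
        policy, commits, last_clean_time, earliest_commit_to_retain
    )
    return lower_bound is None or instant >= lower_bound
```

With commits `100`, `200`, `300`, `400` and a clean at `350`, this sketch accepts a savepoint at `300` or `400` and rejects one at `200`, since only `300` onward is guaranteed to be intact.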
   
   ### Impact
   
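   Savepoint creation now validates the requested instant under the 
`KEEP_LATEST_FILE_VERSIONS` cleaner policy instead of treating every instant 
as valid.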
   
   ### Risk level (write none, low, medium or high below)
   
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
     ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

