the-other-tim-brown opened a new pull request, #13660:
URL: https://github.com/apache/hudi/pull/13660
### Change Logs
With the `KEEP_LATEST_FILE_VERSIONS` cleaner policy, the
`earliestCommitToRetain` is never set in the cleaner plan or the clean metadata.
The savepoint operation currently relies on this field to determine whether an
instant is valid for a savepoint. When that value is not set, the code assumes
that all instants are valid, which is not accurate.
To handle the `KEEP_LATEST_FILE_VERSIONS` policy, we instead ensure that the
instant provided to the savepoint is no older than the last commit before the
last clean operation. Because the clean metadata does not record how many file
versions were kept, this is the only commit that we can guarantee is fully
intact under this cleaner policy.
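
The check described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the class name, method names, and parameters are hypothetical, instant times are modeled as plain strings that sort lexicographically, and the commit list is assumed to be sorted in ascending order. The handling of a present `earliestCommitToRetain` is also an assumption about the existing behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the Hudi codebase.
class SavepointValidationSketch {

  // Returns the latest commit time that is strictly before the last clean,
  // or empty if no commit precedes the clean. Assumes ascending order.
  static Optional<String> lastCommitBeforeClean(List<String> sortedCommitTimes,
                                                String lastCleanTime) {
    Optional<String> result = Optional.empty();
    for (String commitTime : sortedCommitTimes) {
      if (commitTime.compareTo(lastCleanTime) < 0) {
        result = Optional.of(commitTime);
      } else {
        break;
      }
    }
    return result;
  }

  static boolean isValidSavepointInstant(String savepointTime,
                                         Optional<String> earliestCommitToRetain,
                                         Optional<String> lastCleanTime,
                                         List<String> sortedCommitTimes) {
    // Assumed existing behavior: when the field is set, use it as the lower bound.
    if (earliestCommitToRetain.isPresent()) {
      return savepointTime.compareTo(earliestCommitToRetain.get()) >= 0;
    }
    // No clean has run yet, so nothing could have been removed.
    if (!lastCleanTime.isPresent()) {
      return true;
    }
    // Fallback for KEEP_LATEST_FILE_VERSIONS: the savepoint instant must be
    // no older than the last commit before the last clean.
    return lastCommitBeforeClean(sortedCommitTimes, lastCleanTime.get())
        .map(boundary -> savepointTime.compareTo(boundary) >= 0)
        .orElse(true);
  }

  public static void main(String[] args) {
    List<String> commits = Arrays.asList("001", "002", "003", "005");
    Optional<String> lastClean = Optional.of("004");
    // "003" is the last commit before the clean at "004".
    System.out.println(isValidSavepointInstant("002", Optional.empty(), lastClean, commits));
    System.out.println(isValidSavepointInstant("003", Optional.empty(), lastClean, commits));
  }
}
```

In this sketch, a savepoint at `002` is treated as invalid because it is older than `003`, the last commit before the clean at `004`, while a savepoint at `003` or later is accepted.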
### Impact
Adds a safety check when creating savepoints on tables that use the
`KEEP_LATEST_FILE_VERSIONS` cleaner policy.
### Risk level (write none, low, medium or high below)
None
### Documentation Update
_Describe any necessary documentation update if there is any new feature,
config, or user-facing change. If not, put "none"._
- _The config description must be updated if new configs are added or the
default values of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website.
Please create a Jira ticket, attach the ticket number here and follow the
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to
make changes to the website._
### Contributor's checklist
- [ ] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed