uicosp opened a new pull request, #6173:
URL: https://github.com/apache/paimon/pull/6173
[core] Fix checkpoint recovery failure for compacted changelog files
### Purpose
Fixes checkpoint recovery failures when using the precommit-compact
functionality introduced in commit "[flink] add coordinate and worker operator
for small changelog files compaction" (#4380).
**Root Cause:**
Compacted changelog files have two types of file names:
1. Real files: `compacted-changelog-xxx$bid-len.cc-format`
2. Fake files: `compacted-changelog-xxx$bid-len-off-len2.cc-format`
Fake file names point to segments of real files and do not exist in the
filesystem. The `checkFilesExistence` method checked these fake file paths
directly, reported them as missing, and so caused recovery to fail.
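The two name shapes can be told apart by counting the dash-separated fields after `$`: two for a real file, four for a fake one. The hypothetical `CompactedChangelogName` class below (not part of the PR) sketches that parse. It assumes the extension is everything after the last `.` and that `len` is the real file's length; the PR does not spell out either detail.

```java
import java.util.Optional;

/**
 * Hypothetical parser for compacted changelog file names (illustration only,
 * not the PR's CompactedChangelogPathResolver). Assumed shapes:
 *   real: <prefix>$<bid>-<len>.<ext>
 *   fake: <prefix>$<bid>-<len>-<off>-<len2>.<ext>
 */
final class CompactedChangelogName {
    final String prefix;      // text before '$', e.g. "compacted-changelog-xxx"
    final int bucket;         // bid: bucket whose directory holds the real file
    final long length;        // len: assumed to be the real file's length
    final long offset;        // off: segment start (fake names only, else -1)
    final long segmentLength; // len2: segment length (fake names only, else -1)
    final String extension;   // e.g. "cc-parquet" (assumed format suffix)

    private CompactedChangelogName(String prefix, int bucket, long length,
                                   long offset, long segmentLength, String extension) {
        this.prefix = prefix;
        this.bucket = bucket;
        this.length = length;
        this.offset = offset;
        this.segmentLength = segmentLength;
        this.extension = extension;
    }

    /** A fake name carries the two extra off/len2 fields. */
    boolean isFake() {
        return offset >= 0;
    }

    /** The real file name this name points into (a real name maps to itself). */
    String realFileName() {
        return prefix + "$" + bucket + "-" + length + "." + extension;
    }

    /** Returns empty for names that do not match either shape. */
    static Optional<CompactedChangelogName> parse(String fileName) {
        int dollar = fileName.lastIndexOf('$');
        int dot = fileName.lastIndexOf('.');
        if (dollar < 0 || dot < dollar) {
            return Optional.empty(); // not a compacted changelog name
        }
        String prefix = fileName.substring(0, dollar);
        String extension = fileName.substring(dot + 1);
        String[] fields = fileName.substring(dollar + 1, dot).split("-");
        try {
            if (fields.length == 2) {
                return Optional.of(new CompactedChangelogName(prefix,
                        Integer.parseInt(fields[0]), Long.parseLong(fields[1]),
                        -1, -1, extension));
            }
            if (fields.length == 4) {
                return Optional.of(new CompactedChangelogName(prefix,
                        Integer.parseInt(fields[0]), Long.parseLong(fields[1]),
                        Long.parseLong(fields[2]), Long.parseLong(fields[3]), extension));
            }
        } catch (NumberFormatException e) {
            // fall through: a field was not numeric
        }
        return Optional.empty();
    }
}
```

For example, `compacted-changelog-abc$0-1024-512-256.cc-parquet` parses as a fake name whose `realFileName()` is `compacted-changelog-abc$0-1024.cc-parquet`.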
**Solution:**
- Created `CompactedChangelogPathResolver` utility class to resolve fake
file paths to real file paths
- Modified `TableCommitImpl.checkFilesExistence()` to resolve all paths
before checking existence
- Added deduplication logic since multiple fake files may resolve to the
same real file
- Path resolution rules:
  - Real files (`xxx$bid-len.cc-format`): return original path
  - Fake files (`xxx$bid-len-off-len2.cc-format`): resolve to
    `bucket-bid/xxx$bid-len.cc-format`
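A minimal sketch of the resolve, deduplicate, then check flow, assuming plain `/`-separated string paths and that a fake file's grandparent directory is the partition directory containing `bucket-<bid>`. `ChangelogExistenceCheckSketch` and its `Predicate<String>` existence check are hypothetical stand-ins, not the actual `CompactedChangelogPathResolver` or `TableCommitImpl` code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of resolving changelog paths before an existence check. */
final class ChangelogExistenceCheckSketch {

    /** Applies the resolution rules; real names and other files pass through unchanged. */
    static String resolve(String path) {
        int slash = path.lastIndexOf('/');
        String fileName = path.substring(slash + 1);
        int dollar = fileName.lastIndexOf('$');
        int dot = fileName.lastIndexOf('.');
        if (dollar < 0 || dot < dollar) {
            return path; // not a compacted changelog file name
        }
        String[] fields = fileName.substring(dollar + 1, dot).split("-");
        if (fields.length != 4) {
            return path; // real file "xxx$bid-len.ext" (or unrecognized): keep original path
        }
        // Fake file "xxx$bid-len-off-len2.ext": drop off/len2 to get the real name.
        // (Numeric validation of the fields is omitted in this sketch.)
        String bid = fields[0];
        String realName = fileName.substring(0, dollar + 1) + bid + "-" + fields[1]
                + fileName.substring(dot);
        // Swap the fake file's own bucket directory for bucket-<bid>.
        String bucketDir = slash >= 0 ? path.substring(0, slash) : "";
        String partitionDir = bucketDir.substring(0, bucketDir.lastIndexOf('/') + 1);
        return partitionDir + "bucket-" + bid + "/" + realName;
    }

    /**
     * Resolves every path, drops duplicates (several fake files may share one
     * real file), then returns the resolved paths that do not exist.
     */
    static List<String> missingFiles(List<String> paths, Predicate<String> exists) {
        Set<String> resolved = new LinkedHashSet<>();
        for (String path : paths) {
            resolved.add(resolve(path));
        }
        List<String> missing = new ArrayList<>();
        for (String path : resolved) {
            if (!exists.test(path)) {
                missing.add(path);
            }
        }
        return missing;
    }
}
```

Because resolution happens before the existence check, two fake names that point into the same real file cost a single lookup, and the check never asks the filesystem about a path that was never written.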
### Tests
- Added unit tests in `CompactedChangelogPathResolverTest` to verify path
resolution logic
- Existing checkpoint recovery tests should now pass with compacted
changelog files
### API and Format
No changes to public API or storage format. This is an internal fix for
file path resolution.
### Documentation
No new features introduced. This is a bug fix for existing
precommit-compact functionality.
--
This is an automated message from the Apache Git Service.