kiyeonjeon21 opened a new pull request, #15957: URL: https://github.com/apache/iceberg/pull/15957
## Summary

`RemoveDanglingDeleteFiles` always operated on the main branch. There was no way to target a specific branch, and `RewriteDataFilesSparkAction` did not forward its branch when invoking the action internally.

This PR:

- Adds a `toBranch(String)` default method to the `RemoveDanglingDeleteFiles` API
- Implements branch-aware metadata reads and commits in `RemoveDanglingDeletesSparkAction`
- Forwards the branch from `RewriteDataFilesSparkAction` to the dangling delete removal step

Closes #15369

## Changes

- **API**: Added `toBranch(String)` with a default `UnsupportedOperationException` to avoid breaking changes (revapi passes)
- **Spark (v3.4, v3.5, v4.0)**: Metadata table reads are scoped to the branch snapshot via `snapshot-id` option. Commits are directed to the branch via `RewriteFiles.toBranch(branch)`
- **Spark (v4.1)**: Uses `SparkTable.create(metadataTable, TimeTravel)` instead of `snapshot-id` option, since time travel options were reworked in Spark 4.1
- **RewriteDataFilesSparkAction** (all versions): Now passes its `branch` field to `RemoveDanglingDeletesSparkAction`
- **Tests**: Added `testBranchSupport` and `testBranchWithDanglingDeletes` for v3.5, v4.0, v4.1

## Notes

- Unpartitioned table early return is kept as-is. The `findDanglingDeletes` SQL relies on `data_file.partition` which does not exist for unpartitioned tables. Addressing unpartitioned tables would require a different query strategy and is better handled separately.
- AI tools were used to assist with code exploration and drafting. I reviewed and tested all changes locally.

## Test plan

- [x] `./gradlew :iceberg-api:revapi` passes
- [x] `TestRemoveDanglingDeleteAction` passes (18 tests, 0 failures on Spark 4.1)
- [x] `./gradlew spotlessCheck` passes
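
## Illustration

The compatibility approach described under Changes (a `toBranch(String)` default method that throws `UnsupportedOperationException`) can be sketched roughly as below. This is a simplified, self-contained illustration and not the code in this PR: `DanglingDeleteRemoval`, `LegacyRemoval`, `BranchAwareRemoval`, and the string-returning `execute()` are hypothetical stand-ins for the real Iceberg interfaces and result types.

```java
import java.util.Objects;

// Hypothetical stand-in for the action API; not the real Iceberg interface.
interface DanglingDeleteRemoval {

  // A default method keeps pre-existing implementations compiling and linking.
  // Implementations that do not override it fail loudly instead of silently
  // operating on the main branch.
  default DanglingDeleteRemoval toBranch(String branch) {
    throw new UnsupportedOperationException(
        getClass().getName() + " does not support toBranch");
  }

  // Simplified: returns the name of the branch the action would commit to.
  String execute();
}

// An implementation written before toBranch existed; it needs no changes.
class LegacyRemoval implements DanglingDeleteRemoval {
  @Override
  public String execute() {
    return "main";
  }
}

// A branch-aware implementation overrides the default and records the target.
class BranchAwareRemoval implements DanglingDeleteRemoval {
  private String branch = "main";

  @Override
  public DanglingDeleteRemoval toBranch(String branch) {
    this.branch = Objects.requireNonNull(branch, "branch must not be null");
    return this;
  }

  @Override
  public String execute() {
    return branch;
  }
}
```

With these stand-ins, `new BranchAwareRemoval().toBranch("audit").execute()` targets the `audit` branch, while calling `toBranch` on `LegacyRemoval` throws. Throwing rather than ignoring the argument means a caller that forwards a branch, as `RewriteDataFilesSparkAction` now does, cannot end up modifying main by accident.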
