kiyeonjeon21 opened a new pull request, #15957:
URL: https://github.com/apache/iceberg/pull/15957

   ## Summary
   
   `RemoveDanglingDeleteFiles` always operated on the main branch. There was no 
way to target a specific branch, and `RewriteDataFilesSparkAction` did not 
forward its branch when invoking the action internally.
   
   This PR:
   - Adds a `toBranch(String)` default method to the 
`RemoveDanglingDeleteFiles` API
   - Implements branch-aware metadata reads and commits in 
`RemoveDanglingDeletesSparkAction`
   - Forwards the branch from `RewriteDataFilesSparkAction` to the dangling 
delete removal step
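
To illustrate the first and third points, here is a minimal sketch of the default-method pattern and the branch forwarding. All type names below (`DanglingDeleteRemoval`, `LegacyRemoval`, `BranchAwareRemoval`, `RewriteSketch`) are hypothetical stand-ins, not the Iceberg classes, and `execute()` returns the target branch name only so the behaviour is observable.

```java
// Hypothetical stand-ins for illustration; these are not the Iceberg types.
interface DanglingDeleteRemoval {
  // The default body lets implementations written before this method existed compile unchanged.
  default DanglingDeleteRemoval toBranch(String branch) {
    throw new UnsupportedOperationException(
        getClass().getName() + " does not support toBranch");
  }

  // Returns the branch the action would commit to (a stand-in for a real result).
  String execute();
}

// An implementation that predates toBranch: it inherits the throwing default.
class LegacyRemoval implements DanglingDeleteRemoval {
  @Override
  public String execute() {
    return "main";
  }
}

// A branch-aware implementation overrides the default and remembers the target.
class BranchAwareRemoval implements DanglingDeleteRemoval {
  private String branch = "main";

  @Override
  public DanglingDeleteRemoval toBranch(String targetBranch) {
    this.branch = targetBranch;
    return this;
  }

  @Override
  public String execute() {
    return branch;
  }
}

// Stand-in for an outer rewrite action that forwards its own branch.
class RewriteSketch {
  private final String branch;

  RewriteSketch(String branch) {
    this.branch = branch;
  }

  String removeDanglingDeletes() {
    return new BranchAwareRemoval().toBranch(branch).execute();
  }

  public static void main(String[] args) {
    System.out.println(new RewriteSketch("audit").removeDanglingDeletes());
  }
}
```

In this sketch, callers of `LegacyRemoval.toBranch` fail fast at runtime rather than silently operating on the main branch.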
   
   Closes #15369
   
   ## Changes
   
   - **API**: Added `toBranch(String)` as a default method that throws `UnsupportedOperationException`, so existing implementations are not broken (revapi passes)
   - **Spark (v3.4, v3.5, v4.0)**: Metadata table reads are scoped to the branch snapshot via the `snapshot-id` read option. Commits are directed to the branch via `RewriteFiles.toBranch(branch)`
   - **Spark (v4.1)**: Uses `SparkTable.create(metadataTable, TimeTravel)` instead of the `snapshot-id` option, since time travel options were reworked in Spark 4.1
   - **RewriteDataFilesSparkAction** (all versions): Now passes its `branch` 
field to `RemoveDanglingDeletesSparkAction`
   - **Tests**: Added `testBranchSupport` and `testBranchWithDanglingDeletes` 
for v3.5, v4.0, v4.1
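
As a rough model of the branch-scoped read described for Spark 3.4 to 4.0, the sketch below resolves a branch name to a snapshot ID and builds the read options from it. The class `BranchReadOptions`, the method `forBranch`, and the `refs` map are assumptions made for illustration; the actual action resolves the snapshot from the table's refs and passes the option to a Spark reader, which is not shown here.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of Iceberg or of this PR.
class BranchReadOptions {
  // refs maps a branch name to the snapshot ID at its head (a toy stand-in for table refs).
  static Map<String, String> forBranch(Map<String, Long> refs, String branch) {
    Long snapshotId = refs.get(branch);
    if (snapshotId == null) {
      throw new IllegalArgumentException("Unknown branch: " + branch);
    }
    // "snapshot-id" is the read option named in the PR description.
    return Map.of("snapshot-id", String.valueOf(snapshotId));
  }

  public static void main(String[] args) {
    Map<String, Long> refs = Map.of("main", 100L, "audit", 205L);
    System.out.println(forBranch(refs, "audit"));
  }
}
```

Pinning the read to a snapshot ID means the metadata scan and the later commit both refer to the same branch state, which is the property the change relies on.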
   
   ## Notes
   
   - The early return for unpartitioned tables is kept as-is. The `findDanglingDeletes` SQL relies on `data_file.partition`, which does not exist for unpartitioned tables. Supporting them would require a different query strategy and is better handled separately.
   - AI tools were used to assist with code exploration and drafting. I 
reviewed and tested all changes locally.
   
   ## Test plan
   
   - [x] `./gradlew :iceberg-api:revapi` passes
   - [x] `TestRemoveDanglingDeleteAction` passes (18 tests, 0 failures on Spark 
4.1)
   - [x] `./gradlew spotlessCheck` passes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

