sachinnn99 opened a new pull request, #16171: URL: https://github.com/apache/iceberg/pull/16171
## Summary

Closes #16138.

The `RemoveDanglingDeleteFiles` action currently exists only for Spark. Flink users have no equivalent, causing equality delete files to accumulate indefinitely in V2 tables managed by the Flink maintenance pipeline.

This PR adds `RemoveDanglingDeleteFiles` support to the Flink maintenance API across all three Flink versions (v1.20, v2.0, v2.1).

**Approach:**
- Follows the `ExpireSnapshots` pattern (single `ProcessFunction` operator)
- Implements the same dangling detection algorithm as `RemoveDanglingDeletesSparkAction` using the Iceberg Java API directly via `ManifestReader` iteration (no Spark DataFrames)
- Handles both sequence-number-based dangling detection (position/equality deletes) and reference-based detection (deletion vectors)

**New files (identical across all three Flink versions):**
- `RemoveDanglingDeleteFiles.java` - API builder extending `MaintenanceTaskBuilder`
- `RemoveDanglingDeleteFilesProcessor.java` - Core operator with dangling detection logic
- `TestRemoveDanglingDeleteFiles.java` - Tests covering partitioned deletes, equality delete edge cases, unpartitioned tables, and no-op scenarios

## Test plan

- [x] `TestRemoveDanglingDeleteFiles` passes on Flink v1.20, v2.0, v2.1
- [x] All existing maintenance tests pass (no regressions)
- [x] Compilation succeeds across all Flink versions

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
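
For readers unfamiliar with the two detection modes named under **Approach**, the following is a minimal, dependency-free sketch of the idea, not the code in this PR. It assumes the applicability rules from the Iceberg table spec: a position delete applies to data files whose data sequence number is less than or equal to its own, and an equality delete applies only to data files with a strictly smaller one. `DanglingDeleteSketch`, `Entry`, `Content`, and the string partition key are hypothetical stand-ins for live manifest entries and do not correspond to Iceberg classes. Leaving delete files untouched in partitions that have no live data files is a conservative choice made for this sketch; the PR description does not say how that case is handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DanglingDeleteSketch {
  enum Content { DATA, POSITION_DELETES, EQUALITY_DELETES, DELETION_VECTOR }

  /** Hypothetical stand-in for a live manifest entry; not an Iceberg type. */
  static final class Entry {
    final String path;
    final String partition;          // simplified key; a real one would combine spec id and partition tuple
    final Content content;
    final long dataSequenceNumber;
    final String referencedDataFile; // only meaningful for deletion vectors, else null

    Entry(String path, String partition, Content content, long seq, String referencedDataFile) {
      this.path = path;
      this.partition = partition;
      this.content = content;
      this.dataSequenceNumber = seq;
      this.referencedDataFile = referencedDataFile;
    }
  }

  /** Returns paths of delete files that can no longer apply to any live data file. */
  static List<String> findDangling(List<Entry> liveEntries) {
    // Pass 1: minimum data sequence number per partition, plus the set of live data file paths.
    Map<String, Long> minDataSeq = new HashMap<>();
    Set<String> liveDataPaths = new HashSet<>();
    for (Entry e : liveEntries) {
      if (e.content == Content.DATA) {
        minDataSeq.merge(e.partition, e.dataSequenceNumber, Long::min);
        liveDataPaths.add(e.path);
      }
    }

    // Pass 2: classify each delete file.
    List<String> dangling = new ArrayList<>();
    for (Entry e : liveEntries) {
      if (e.content == Content.DATA) {
        continue;
      }
      if (e.content == Content.DELETION_VECTOR) {
        // Reference-based: dangling when the referenced data file is no longer live.
        if (!liveDataPaths.contains(e.referencedDataFile)) {
          dangling.add(e.path);
        }
        continue;
      }
      Long min = minDataSeq.get(e.partition);
      if (min == null) {
        continue; // assumption of this sketch: no live data files in the partition, leave as-is
      }
      // Sequence-number-based: strict '<' for position deletes, '<=' for equality deletes.
      boolean isDangling = e.content == Content.POSITION_DELETES
          ? e.dataSequenceNumber < min
          : e.dataSequenceNumber <= min;
      if (isDangling) {
        dangling.add(e.path);
      }
    }
    return dangling;
  }

  /** Small made-up example; file names and sequence numbers are illustrative only. */
  static List<Entry> sample() {
    return Arrays.asList(
        new Entry("data-5.parquet", "p=1", Content.DATA, 5L, null),
        new Entry("data-7.parquet", "p=1", Content.DATA, 7L, null),
        new Entry("pos-4.parquet", "p=1", Content.POSITION_DELETES, 4L, null),
        new Entry("pos-5.parquet", "p=1", Content.POSITION_DELETES, 5L, null),
        new Entry("eq-5.parquet", "p=1", Content.EQUALITY_DELETES, 5L, null),
        new Entry("eq-6.parquet", "p=1", Content.EQUALITY_DELETES, 6L, null),
        new Entry("eq-1.parquet", "p=2", Content.EQUALITY_DELETES, 1L, null),
        new Entry("dv-live.puffin", "p=1", Content.DELETION_VECTOR, 8L, "data-7.parquet"),
        new Entry("dv-gone.puffin", "p=1", Content.DELETION_VECTOR, 8L, "data-3.parquet"));
  }

  public static void main(String[] args) {
    // prints [pos-4.parquet, eq-5.parquet, dv-gone.puffin]
    System.out.println(findDangling(sample()));
  }
}
```

In the sample, `pos-5.parquet` survives because a position delete at sequence number 5 can still apply to `data-5.parquet`, while `eq-5.parquet` is dangling because an equality delete never applies to a data file with the same sequence number.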
