lawofcycles opened a new pull request, #3237: URL: https://github.com/apache/iceberg-python/pull/3237
When `_OverwriteFiles._deleted_entries()` creates DELETED manifest entries, it now sets `snapshot_id` to the current (deleting) snapshot's ID instead of retaining the original INSERT snapshot's ID.

Closes #3236

# Rationale for this change

According to the [Iceberg spec (Manifest Entry Fields)](https://iceberg.apache.org/spec/#manifest-entry-fields), `snapshot_id` for a DELETED entry (status=2) should be the ID of the snapshot in which the file was deleted. However, `_OverwriteFiles._deleted_entries()` was copying the original entry's `snapshot_id` (from the INSERT snapshot) into the new DELETED entry. This caused downstream consumers that filter manifest entries by `snapshot_id` (e.g. Iceberg Java's `IncrementalChangelogScan`) to silently miss DELETED files, breaking CDC pipelines.

## Are these changes tested?

Added `test_manifest_entry_snapshot_id_after_partial_deletes` in `tests/integration/test_deletes.py`.

## Are there any user-facing changes?

N/A

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
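The behavior described in the rationale of the PR description can be illustrated with a minimal, self-contained sketch. It is not PyIceberg's implementation: the `Entry` dataclass and the `deleted_entries` and `entries_for_snapshot` helpers are hypothetical stand-ins, and only the status codes (0 = EXISTING, 1 = ADDED, 2 = DELETED) come from the Iceberg spec.

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class Status(IntEnum):
    # Manifest entry status codes as defined by the Iceberg spec.
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class Entry:
    # Hypothetical, simplified stand-in for a manifest entry.
    status: Status
    snapshot_id: int
    file_path: str


def deleted_entries(live_entries, deleting_snapshot_id):
    # Mark each entry as DELETED and stamp it with the deleting snapshot's ID,
    # rather than carrying over the ID of the snapshot that added the file.
    return [
        replace(entry, status=Status.DELETED, snapshot_id=deleting_snapshot_id)
        for entry in live_entries
    ]


def entries_for_snapshot(entries, snapshot_id):
    # Hypothetical consumer that only looks at entries written by one snapshot,
    # similar in spirit to an incremental changelog scan.
    return [entry for entry in entries if entry.snapshot_id == snapshot_id]


# A file added in snapshot 100 is deleted by an overwrite in snapshot 200.
added = [Entry(Status.ADDED, 100, "data/a.parquet")]
deleted = deleted_entries(added, deleting_snapshot_id=200)

# The consumer scanning snapshot 200 now sees the delete.
changes = entries_for_snapshot(deleted, 200)
```

If the DELETED entry kept `snapshot_id=100`, the same filter for snapshot 200 would return an empty list, which is the silent miss the rationale describes.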
