lawofcycles opened a new pull request, #3237:
URL: https://github.com/apache/iceberg-python/pull/3237

   When _OverwriteFiles._deleted_entries() creates DELETED manifest entries, it 
now sets snapshot_id to the current (deleting) snapshot's ID instead of 
retaining the original INSERT snapshot's ID.
   
   Closes #3236
   
   # Rationale for this change
   According to the [Iceberg spec (Manifest Entry 
Fields)](https://iceberg.apache.org/spec/#manifest-entry-fields), `snapshot_id` 
for a DELETED entry (status=2) should be the snapshot ID in which the file was 
deleted. However, `_OverwriteFiles._deleted_entries()` was copying the original 
entry's `snapshot_id` (from the INSERT snapshot) into the new DELETED entry.
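
   A minimal sketch of the behavior change, using a toy model rather than pyiceberg's real classes. `Entry`, `Status`, `deleted_entries_old`, and `deleted_entries_fixed` are hypothetical names introduced only for illustration; the status codes follow the spec's 0/1/2 values.

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class Status(IntEnum):
    # Manifest entry status codes as listed in the Iceberg spec.
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class Entry:
    # Hypothetical stand-in for a manifest entry; not pyiceberg's class.
    status: Status
    snapshot_id: int
    file_path: str


def deleted_entries_old(live, deleting_snapshot_id):
    # Previous behavior: only the status changes, so the entry keeps the
    # snapshot_id of the snapshot that added the file.
    return [replace(e, status=Status.DELETED) for e in live]


def deleted_entries_fixed(live, deleting_snapshot_id):
    # Fixed behavior: the DELETED entry carries the deleting snapshot's ID.
    return [
        replace(e, status=Status.DELETED, snapshot_id=deleting_snapshot_id)
        for e in live
    ]


live = [Entry(Status.ADDED, snapshot_id=100, file_path="a.parquet")]
old = deleted_entries_old(live, deleting_snapshot_id=200)
fixed = deleted_entries_fixed(live, deleting_snapshot_id=200)
```

   In this sketch, `old[0].snapshot_id` stays at 100 (the adding snapshot), while `fixed[0].snapshot_id` is 200 (the deleting snapshot), which is what the spec describes for status=2.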
   
   This caused downstream consumers that filter manifest entries by 
`snapshot_id` (e.g. Iceberg Java's `IncrementalChangelogScan`) to silently miss 
DELETED files, breaking CDC pipelines.
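
   To illustrate why a stale `snapshot_id` hides deletes, here is a toy consumer that selects DELETED entries belonging to one snapshot. It is not Iceberg Java's actual `IncrementalChangelogScan` logic; `Entry` and `files_deleted_in` are hypothetical names, and the snapshot IDs 100 and 200 are made up.

```python
from typing import NamedTuple

DELETED = 2  # status code for a deleted file in the Iceberg spec


class Entry(NamedTuple):
    # Hypothetical stand-in for a manifest entry.
    status: int
    snapshot_id: int
    file_path: str


def files_deleted_in(entries, snapshot_id):
    # Keep only DELETED entries attributed to the requested snapshot.
    return [
        e.file_path
        for e in entries
        if e.status == DELETED and e.snapshot_id == snapshot_id
    ]


# Entry written with the adding snapshot's ID (100) instead of the
# deleting snapshot's ID (200).
stale = [Entry(DELETED, 100, "a.parquet")]
# Entry written with the deleting snapshot's ID.
correct = [Entry(DELETED, 200, "a.parquet")]

missed = files_deleted_in(stale, 200)
found = files_deleted_in(correct, 200)
```

   With the stale entry, the filter for snapshot 200 returns nothing and the delete goes unreported; with the corrected entry, the same filter returns the deleted file.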
   
   ## Are these changes tested?
   Added `test_manifest_entry_snapshot_id_after_partial_deletes` in 
`tests/integration/test_deletes.py`. 
   
   ## Are there any user-facing changes?
   N/A


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

