tusharchou opened a new pull request, #3149:
URL: https://github.com/apache/iceberg-python/pull/3149

   ## Rationale
   
   While reviewing PR #3011 (manifest pruning optimization), I identified a correctness
   gap that affects tables whose partition spec has evolved.
   
   When `dynamic_partition_overwrite` is called on a table whose current snapshot contains
   manifests with mixed `partition_spec_id`s, the delete predicate was built using only the
   **current** partition spec. This caused `inclusive_projection` to fail silently when
   evaluating older manifests: the predicate contained field references (e.g. `region`) that
   have no corresponding partition field in the old spec, so the manifest evaluator skipped
   those manifests entirely. The result is silent data duplication: stale rows from old-spec
   manifests are never deleted.
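
The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not pyiceberg code: manifests are plain dicts, the `day` field and the helper `naive_match` are hypothetical, and the matching rule only imitates the reported behavior (a predicate field missing from a manifest's spec makes that manifest non-matching).

```python
# Toy model only: plain dicts stand in for manifests; this is not pyiceberg code.
# Assumption: spec 0 partitions by `day`, spec 1 partitions by `day` and `region`.
manifests = [
    {"spec_id": 0, "partition": {"day": "2024-01-01"}},
    {"spec_id": 1, "partition": {"day": "2024-01-01", "region": "eu"}},
]

# Predicate derived from the new data files using only the current spec (spec 1).
current_spec_predicate = {"day": "2024-01-01", "region": "eu"}

_MISSING = object()


def naive_match(manifest, predicate):
    """Imitate the reported bug: a field absent from the manifest's spec
    never compares equal, so the manifest is treated as non-matching."""
    return all(
        manifest["partition"].get(field, _MISSING) == value
        for field, value in predicate.items()
    )


selected = [m["spec_id"] for m in manifests if naive_match(m, current_spec_predicate)]
skipped = [m["spec_id"] for m in manifests if not naive_match(m, current_spec_predicate)]

print("selected for delete:", selected)
print("skipped (stale rows survive):", skipped)
```

In this model the spec-0 manifest is skipped even though it holds rows for the overwritten day, which is the duplication the PR describes.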
   
   ## Changes
   
   - `pyiceberg/table/__init__.py`: `dynamic_partition_overwrite` now iterates over all
     `partition_spec_id`s present in the current snapshot and builds a per-spec delete
     predicate, projecting the new data files' partition values into each historical spec's
     coordinate space before evaluating.
   
   - `tests/integration/test_manifest_pruning_spec_evolution.py`: two regression tests added:
     1. Mixed-spec snapshot: overwrite a partition present under both spec-0 and spec-1.
     2. Overwrite a partition that exists **only** in spec-0 manifests (the silent data
        duplication case: no exception raised, wrong rows survive).
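
The per-spec approach in the first bullet can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: specs are modelled as tuples of field names, and `project_to_spec`, `build_per_spec_predicates`, and `matches` are hypothetical helpers that do not exist in pyiceberg.

```python
# Toy model only: names and data below are illustrative assumptions.
# Assumption: spec 0 partitions by `day`, spec 1 partitions by `day` and `region`.
SPECS = {0: ("day",), 1: ("day", "region")}

manifests = [
    {"spec_id": 0, "partition": {"day": "2024-01-01"}},
    {"spec_id": 0, "partition": {"day": "2024-01-02"}},
    {"spec_id": 1, "partition": {"day": "2024-01-01", "region": "eu"}},
    {"spec_id": 1, "partition": {"day": "2024-01-01", "region": "us"}},
]

# Partition values of the newly written data files (expressed in the current spec).
new_partition = {"day": "2024-01-01", "region": "eu"}


def project_to_spec(values, spec_fields):
    """Keep only the partition values that the given spec actually has."""
    return {field: values[field] for field in spec_fields if field in values}


def build_per_spec_predicates(values, specs, spec_ids):
    """Build one delete predicate per spec id present in the snapshot."""
    return {spec_id: project_to_spec(values, specs[spec_id]) for spec_id in spec_ids}


def matches(manifest, predicates):
    """Evaluate a manifest against the predicate built for its own spec.
    An empty projection matches every manifest of that spec (inclusive)."""
    predicate = predicates[manifest["spec_id"]]
    return all(manifest["partition"].get(f) == v for f, v in predicate.items())


spec_ids_in_snapshot = sorted({m["spec_id"] for m in manifests})
predicates = build_per_spec_predicates(new_partition, SPECS, spec_ids_in_snapshot)
to_delete = [m for m in manifests if matches(m, predicates)]

print(predicates)
print(to_delete)
```

Here the spec-0 manifest for `2024-01-01` is selected alongside the spec-1 `eu` manifest, while the other day and the `us` region are left alone, mirroring the two regression scenarios listed above.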
   
   ## Are these changes tested?
   
   Yes. Two new integration tests use the SQLite in-memory catalog; no external
   services are required.
   
   ## Are there any user-facing changes?
   
   Yes. `dynamic_partition_overwrite` now correctly deletes all matching rows across all
   historical partition specs, fixing silent data duplication on evolved tables.
   
   ## Related
   - Fixes #3148
   - Related to #3011 (manifest pruning optimization that exposed this gap)
   - Related to #1108 (prior spec evolution fix in manifest rewriting by @Fokko)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

