geruh opened a new pull request, #2255:
URL: https://github.com/apache/iceberg-python/pull/2255

   Closes #1210
   
   # Summary
   
   This work was primarily done by @rutb327 while I provided guidance!
   
   This PR adds equality delete read support to PyIceberg by implementing a delete file index that matches delete files to data files, mirroring the behavior of `DeleteFileIndex` in [Iceberg Core](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/DeleteFileIndex.java). With this index in place, table scans can now read equality deletes.
   
   ## Design details
   
   ### Delete File Index
   The new `DeleteFileIndex` class centralizes handling of all delete file 
types: positional deletes, equality deletes, and deletion vectors. It organizes 
deletes by type (equality vs. positional), partition (using `PartitionMap` for 
spec-aware grouping), and path (for path-specific positional deletes). This 
enables efficient lookup during table scans, reducing unnecessary delete file 
processing.
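
To make the grouping concrete, here is a minimal sketch of an index keyed by delete type, partition, and path. It is only an illustration: `ToyDeleteFile`, `ToyDeleteIndex`, and their fields are hypothetical names, not the classes added in this PR, and a plain `(spec_id, partition)` tuple stands in for `PartitionMap`. It also leaves out details such as sequence numbers and deletion vectors.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional, Tuple

# Hypothetical, simplified stand-ins for illustration only.
PartitionKey = Tuple[int, Tuple]  # (spec_id, partition values)


@dataclass(frozen=True)
class ToyDeleteFile:
    path: str
    kind: str  # "equality" or "position"
    spec_id: int
    partition: Tuple
    # Set only for position deletes that target a single data file.
    referenced_data_file: Optional[str] = None


class ToyDeleteIndex:
    def __init__(self) -> None:
        self._equality: DefaultDict[PartitionKey, List[ToyDeleteFile]] = defaultdict(list)
        self._position: DefaultDict[PartitionKey, List[ToyDeleteFile]] = defaultdict(list)
        self._by_path: DefaultDict[str, List[ToyDeleteFile]] = defaultdict(list)

    def add(self, delete: ToyDeleteFile) -> None:
        key = (delete.spec_id, delete.partition)
        if delete.kind == "equality":
            self._equality[key].append(delete)
        elif delete.referenced_data_file is not None:
            # Path-specific position delete: indexed by the data file it targets.
            self._by_path[delete.referenced_data_file].append(delete)
        else:
            self._position[key].append(delete)

    def for_data_file(self, data_path: str, spec_id: int, partition: Tuple) -> List[ToyDeleteFile]:
        """Return only the delete files that can apply to the given data file."""
        key = (spec_id, partition)
        return (
            self._equality.get(key, [])
            + self._position.get(key, [])
            + self._by_path.get(data_path, [])
        )
```

A scan would call `for_data_file` once per data file, so delete files from other partitions or targeting other paths are never opened for that file.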
   
   
   ### Equality Delete Support
   
   Equality delete files are loaded as PyArrow Tables together with their equality ids. Tables that share the same set of equality ids are grouped so they can be applied together, which reduces the number of anti-join operations.
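
The grouping idea can be sketched in plain Python. This is an assumption-laden stand-in, not the PR's code: the actual implementation works on PyArrow Tables and field ids, whereas here rows are dicts, column names stand in for equality ids, and the function names are hypothetical. The names for ids 2 and 5 in the sample data are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Row = Dict[str, object]
ColumnSet = Tuple[str, ...]


def group_deletes_by_columns(
    delete_files: Iterable[Tuple[Sequence[str], List[Row]]],
) -> Dict[ColumnSet, Set[Tuple]]:
    """Merge delete rows from all files that share the same equality columns."""
    grouped: Dict[ColumnSet, Set[Tuple]] = defaultdict(set)
    for columns, rows in delete_files:
        cols = tuple(sorted(columns))
        for row in rows:
            grouped[cols].add(tuple(row[c] for c in cols))
    return dict(grouped)


def apply_equality_deletes(data_rows: List[Row], grouped: Dict[ColumnSet, Set[Tuple]]) -> List[Row]:
    """Anti-join: keep rows whose key matches no delete key, one pass per column set."""
    kept = list(data_rows)
    for cols, keys in grouped.items():
        kept = [row for row in kept if tuple(row[c] for c in cols) not in keys]
    return kept


# Two delete files on the same column set collapse into one group,
# so the data is filtered once instead of twice.
data = [
    {"id": 1, "data": "Alice"},
    {"id": 2, "data": "Bob"},    # hypothetical name
    {"id": 3, "data": "Charlie"},
    {"id": 4, "data": "David"},
    {"id": 5, "data": "Eve"},    # hypothetical name
    {"id": 6, "data": "Frank"},
]
deletes = [(["id"], [{"id": 2}]), (["id"], [{"id": 5}])]

grouped = group_deletes_by_columns(deletes)
result = apply_equality_deletes(data, grouped)
print([row["id"] for row in result])  # [1, 3, 4, 6]
```

With PyArrow, the filter step would presumably be a `left anti` join per group rather than a Python loop; the point of the sketch is only that the number of joins scales with the number of distinct equality id sets, not with the number of delete files.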
   
   
   # Testing
   Added tests ported from the Iceberg Core [DeleteFileIndex](https://github.com/apache/iceberg/blob/main/core/src/test/java/org/apache/iceberg/DeleteFileIndexTestBase.java#L45) test suite, plus some tests with dummy files. Also did some manual testing with a Flink setup.
   
   ```
   table_eq with only equality deletes on id=2, id=5
   +---+-------+
   | id|   data|
   +---+-------+
   |  1|  Alice|
   |  3|Charlie|
   |  4|  David|
   |  6|  Frank|
   +---+-------+
   
   table_eq_pos with equality deletes and positional delete at position 3
   +---+-----+
   | id| data|
   +---+-----+
   |  1|Alice|
   |  4|David|
   |  6|Frank|
   +---+-----+
   ```
   
   
   # Are there any user-facing changes?
   
   Yes, tables with equality deletes can now be read.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

