The GitHub Actions job "Python CI" on iceberg-python.git/main has succeeded.
Run started by GitHub user kevinjqliu (triggered by kevinjqliu).

Head commit for run:
2d6a1b97bd0facc8000377b98373eff6433c47dc / geruh <[email protected]>
feat: Add DeleteFileIndex to improve position delete lookup (#2918)

Related to #2255.

# Rationale for this change

This PR is one piece of the existing DFI PR in #2255. It removes the
existing delete-to-data file matching logic and instead indexes delete
files for efficient lookup.

The previous implementation:
1. Scanned all delete files with sequence number >= data file's sequence
number
2. Created a new `_InclusiveMetricsEvaluator` instance for each data
file
3. Evaluated every candidate delete file against the data file's path

Now we extend this workflow with a `DeleteFileIndex` that:
- Indexes path-specific DVs
- Indexes partition-scoped deletes by (spec_id, partition record)
- Uses `bisect_left` for sequence number filtering
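
As a rough illustration of the indexing idea above, here is a minimal sketch. It is not the PR's code: `DeleteFile`, `SequenceSortedDeletes`, `SimpleDeleteFileIndex`, and all method names are hypothetical stand-ins, and the only behavior assumed is what is listed above (path-keyed deletes, partition-keyed deletes by `(spec_id, partition)`, and `bisect_left` over sorted sequence numbers).

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple


@dataclass(frozen=True)
class DeleteFile:
    """Hypothetical stand-in for a delete file entry."""
    path: str
    sequence_number: int


class SequenceSortedDeletes:
    """Delete files kept sorted by sequence number."""

    def __init__(self) -> None:
        self._seqs: List[int] = []
        self._files: List[DeleteFile] = []

    def add(self, delete_file: DeleteFile) -> None:
        # Insert at the position that keeps both lists sorted by sequence number.
        pos = bisect_left(self._seqs, delete_file.sequence_number)
        self._seqs.insert(pos, delete_file.sequence_number)
        self._files.insert(pos, delete_file)

    def filter(self, data_seq: int) -> List[DeleteFile]:
        # Keep deletes whose sequence number is >= the data file's,
        # found by binary search instead of a full scan.
        start = bisect_left(self._seqs, data_seq)
        return self._files[start:]


class SimpleDeleteFileIndex:
    """Illustrative index: one bucket per data file path, one per partition."""

    def __init__(self) -> None:
        self._by_path: Dict[str, SequenceSortedDeletes] = {}
        self._by_partition: Dict[Tuple[int, Hashable], SequenceSortedDeletes] = {}

    def add_path_delete(self, data_path: str, delete_file: DeleteFile) -> None:
        self._by_path.setdefault(data_path, SequenceSortedDeletes()).add(delete_file)

    def add_partition_delete(
        self, spec_id: int, partition: Hashable, delete_file: DeleteFile
    ) -> None:
        key = (spec_id, partition)
        self._by_partition.setdefault(key, SequenceSortedDeletes()).add(delete_file)

    def for_data_file(
        self, data_path: str, spec_id: int, partition: Hashable, data_seq: int
    ) -> List[DeleteFile]:
        # Two dictionary lookups replace evaluating every candidate delete file.
        result: List[DeleteFile] = []
        by_path = self._by_path.get(data_path)
        if by_path is not None:
            result.extend(by_path.filter(data_seq))
        by_partition = self._by_partition.get((spec_id, partition))
        if by_partition is not None:
            result.extend(by_partition.filter(data_seq))
        return result
```

In this sketch, looking up the deletes for one data file costs two hash lookups plus a binary search per bucket, rather than a pass over every delete file.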

This aligns with the Java implementation of the
[DeleteFileIndex](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/DeleteFileIndex.java),
adapted to the existing Python infrastructure.

## Are these changes tested?

New tests were added, and existing tests continue to pass.

## Are there any user-facing changes?

No

Report URL: https://github.com/apache/iceberg-python/actions/runs/21267255495

With regards,
GitHub Actions via GitBox
