The GitHub Actions job "CI" on iceberg-rust.git/main has failed.
Run started by GitHub user liurenjie1024 (triggered by liurenjie1024).

Head commit for run:
a329a3b756710b71c49dfd4b43060a4cc2731d3f / Lo <[email protected]>
feat: Implement shared delete file loading and caching for ArrowReader (#1941)

## Which issue does this PR close?

- Closes #.

## What changes are included in this PR?

Currently, ArrowReader instantiates a new CachingDeleteFileLoader (and
consequently a new DeleteFilter) for each FileScanTask when calling
load_deletes. This results in the DeleteFilter state being isolated per
task. If multiple tasks reference the same delete file (common in
positional deletes), that delete file is re-read and re-parsed for every
task, leading to significant performance overhead and redundant I/O.

  Changes

* Shared State: Moved the DeleteFilter instance into the
  CachingDeleteFileLoader struct. Since ArrowReader holds a single
  CachingDeleteFileLoader instance across its lifetime, the DeleteFilter
  state is now effectively shared across all file scan tasks processed
  by that reader.
* Positional Delete Caching: Implemented a state machine for loading
  positional delete files (PosDelState) in DeleteFilter.
    * Added try_start_pos_del_load: Coordinates concurrent access to the
      same positional delete file.
    * Added finish_pos_del_load: Signals completion of loading.
* Synchronization: Introduced a WaitFor state. Unlike equality deletes
  (which are accessed asynchronously), positional deletes are accessed
  synchronously by ArrowReader. Therefore, if a task encounters a file
  that is currently being loaded by another task, it must asynchronously
  wait (notify.notified().await) during the loading phase to ensure the
  data is fully populated before ArrowReader proceeds.
* Refactoring: Updated load_file_for_task and related types in
  CachingDeleteFileLoader to support the new caching logic and carry
  file paths through the loading context.
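
The load coordination described in the list above can be sketched roughly
as follows. This is a minimal illustration, not the code from the PR: only
the names PosDelState, try_start_pos_del_load, finish_pos_del_load and
WaitFor come from the description, while the signatures, the
PosDelLoadAction enum, wait_until_loaded and count_physical_reads are
hypothetical. To stay self-contained it also swaps the async
notify.notified().await for a blocking std::sync::Condvar and uses threads
in place of async tasks.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical answer given to a task that asks to load a delete file.
#[derive(Debug, PartialEq, Eq)]
enum PosDelLoadAction {
    /// First task to ask: it must load the file, then call finish_pos_del_load.
    Load,
    /// Another task is already loading the file: wait for it to finish.
    WaitFor,
    /// The file is fully loaded: nothing to do.
    AlreadyLoaded,
}

/// Per-file loading state (assumed shape; the PR only names the type).
#[derive(Clone, Copy, PartialEq, Eq)]
enum PosDelState {
    Loading,
    Loaded,
}

/// Simplified stand-in for the shared DeleteFilter.
#[derive(Default)]
struct DeleteFilter {
    states: Mutex<HashMap<String, PosDelState>>,
    loaded: Condvar, // std replacement for the async Notify in the description
}

impl DeleteFilter {
    /// Decide, under the lock, what the calling task should do for `path`.
    fn try_start_pos_del_load(&self, path: &str) -> PosDelLoadAction {
        let mut states = self.states.lock().unwrap();
        match states.get(path).copied() {
            Some(PosDelState::Loaded) => PosDelLoadAction::AlreadyLoaded,
            Some(PosDelState::Loading) => PosDelLoadAction::WaitFor,
            None => {
                states.insert(path.to_string(), PosDelState::Loading);
                PosDelLoadAction::Load
            }
        }
    }

    /// Mark `path` as loaded and wake every waiting task.
    fn finish_pos_del_load(&self, path: &str) {
        let mut states = self.states.lock().unwrap();
        states.insert(path.to_string(), PosDelState::Loaded);
        self.loaded.notify_all();
    }

    /// Block until `path` reaches the Loaded state.
    fn wait_until_loaded(&self, path: &str) {
        let mut states = self.states.lock().unwrap();
        while states.get(path).copied() != Some(PosDelState::Loaded) {
            states = self.loaded.wait(states).unwrap();
        }
    }
}

/// Run `tasks` threads against one delete file and count physical reads.
fn count_physical_reads(tasks: usize) -> usize {
    let filter = Arc::new(DeleteFilter::default());
    let reads = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let (filter, reads) = (Arc::clone(&filter), Arc::clone(&reads));
            thread::spawn(move || {
                let path = "pos-del.parquet"; // hypothetical file name
                match filter.try_start_pos_del_load(path) {
                    PosDelLoadAction::Load => {
                        // Stand-in for reading and parsing the delete file.
                        reads.fetch_add(1, Ordering::SeqCst);
                        filter.finish_pos_del_load(path);
                    }
                    PosDelLoadAction::WaitFor => filter.wait_until_loaded(path),
                    PosDelLoadAction::AlreadyLoaded => {}
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    reads.load(Ordering::SeqCst)
}
```

In this sketch only the first caller for a given path receives Load, so
count_physical_reads reports a single read no matter how many tasks
reference the file.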

## Are these changes tested?

Added test_caching_delete_file_loader_caches_results to verify that
repeated loads of the same delete file return shared memory objects.
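
As a rough illustration of what "shared memory objects" can mean in such
a check, the sketch below compares two results with Arc::ptr_eq. It is
not the test from the PR: CachingLoader, its load method, the file names
and the hard-coded positions are all hypothetical stand-ins.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical loader that caches parsed delete positions per file path.
#[derive(Default)]
struct CachingLoader {
    cache: Mutex<HashMap<String, Arc<Vec<u64>>>>,
}

impl CachingLoader {
    /// Return the cached positions for `path`, parsing only on first use.
    fn load(&self, path: &str) -> Arc<Vec<u64>> {
        let mut cache = self.cache.lock().unwrap();
        cache
            .entry(path.to_string())
            // Stand-in for reading the file: fixed, made-up row positions.
            .or_insert_with(|| Arc::new(vec![1, 5, 9]))
            .clone()
    }
}

/// True when two loads of the same path return the same allocation.
fn repeated_loads_share_memory() -> bool {
    let loader = CachingLoader::default();
    let first = loader.load("pos-del.parquet");
    let second = loader.load("pos-del.parquet");
    Arc::ptr_eq(&first, &second)
}
```

Arc::ptr_eq compares allocations rather than contents, so it separates a
genuinely cached result from a second, equal-looking parse.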

Report URL: https://github.com/apache/iceberg-rust/actions/runs/20418833068

With regards,
GitHub Actions via GitBox
