The GitHub Actions job "Bindings Python CI" on iceberg-rust.git/main has succeeded. Run started by GitHub user liurenjie1024 (triggered by liurenjie1024).
Head commit for run:
a329a3b756710b71c49dfd4b43060a4cc2731d3f / Lo <[email protected]>
feat: Implement shared delete file loading and caching for ArrowReader (#1941)

## Which issue does this PR close?

- Closes #.

## What changes are included in this PR?

Currently, ArrowReader instantiates a new CachingDeleteFileLoader (and consequently a new DeleteFilter) for each FileScanTask when calling load_deletes. This results in the DeleteFilter state being isolated per task. If multiple tasks reference the same delete file (common in positional deletes), that delete file is re-read and re-parsed for every task, leading to significant performance overhead and redundant I/O.

Changes

* Shared State: Moved the DeleteFilter instance into the CachingDeleteFileLoader struct. Since ArrowReader holds a single CachingDeleteFileLoader instance across its lifetime, the DeleteFilter state is now effectively shared across all file scan tasks processed by that reader.
* Positional Delete Caching: Implemented a state machine for loading positional delete files (PosDelState) in DeleteFilter.
  * Added try_start_pos_del_load: Coordinates concurrent access to the same positional delete file.
  * Added finish_pos_del_load: Signals completion of loading.
* Synchronization: Introduced a WaitFor state. Unlike equality deletes (which are accessed asynchronously), positional deletes are accessed synchronously by ArrowReader. Therefore, if a task encounters a file that is currently being loaded by another task, it must asynchronously wait (notify.notified().await) during the loading phase to ensure the data is fully populated before ArrowReader proceeds.
* Refactoring: Updated load_file_for_task and related types in CachingDeleteFileLoader to support the new caching logic and carry file paths through the loading context.

## Are these changes tested?

Added test_caching_delete_file_loader_caches_results to verify that repeated loads of the same delete file return shared memory objects.

Report URL: https://github.com/apache/iceberg-rust/actions/runs/20418833078

With regards,
GitHub Actions via GitBox
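
For readers who want a concrete picture of the positional delete loading state machine described in the commit message above, here is a minimal sketch. It is not the iceberg-rust implementation. It is synchronous and swaps the async notify.notified().await for a std Condvar so that it compiles without an async runtime. Only the names PosDelState, WaitFor, try_start_pos_del_load and finish_pos_del_load come from the commit message; their signatures here, along with SharedDeleteState, LoadAction, wait_for and the Vec<u64> payload, are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical per-file state; the real PosDelState may look different.
enum PosDelState {
    /// Some task has claimed the file and is still reading it.
    Loading,
    /// The file has been parsed; the result is shared by reference count.
    Loaded(Arc<Vec<u64>>),
}

/// What a caller should do after asking to load a delete file (assumed shape).
#[derive(Debug, PartialEq)]
enum LoadAction {
    /// First caller: read the file, then call `finish_pos_del_load`.
    Load,
    /// Another caller is loading the file: wait until it finishes.
    WaitFor,
    /// The file is already cached: nothing to read.
    AlreadyLoaded,
}

/// Stand-in for the state shared across all file scan tasks of one reader.
#[derive(Default)]
struct SharedDeleteState {
    files: Mutex<HashMap<String, PosDelState>>,
    loaded: Condvar,
}

impl SharedDeleteState {
    /// Claims the file for loading, or reports that someone else has it.
    fn try_start_pos_del_load(&self, path: &str) -> LoadAction {
        let mut files = self.files.lock().unwrap();
        match files.get(path) {
            None => {
                files.insert(path.to_string(), PosDelState::Loading);
                LoadAction::Load
            }
            Some(PosDelState::Loading) => LoadAction::WaitFor,
            Some(PosDelState::Loaded(_)) => LoadAction::AlreadyLoaded,
        }
    }

    /// Publishes the parsed positions and wakes every waiting caller.
    fn finish_pos_del_load(&self, path: &str, positions: Vec<u64>) {
        let mut files = self.files.lock().unwrap();
        files.insert(path.to_string(), PosDelState::Loaded(Arc::new(positions)));
        self.loaded.notify_all();
    }

    /// Blocks until `path` is loaded, then returns the shared result.
    /// Blocks forever if no caller ever finishes loading `path`.
    fn wait_for(&self, path: &str) -> Arc<Vec<u64>> {
        let mut files = self.files.lock().unwrap();
        loop {
            if let Some(PosDelState::Loaded(data)) = files.get(path) {
                return Arc::clone(data);
            }
            files = self.loaded.wait(files).unwrap();
        }
    }
}
```

In this sketch, the first task to call try_start_pos_del_load for a path gets Load and is responsible for calling finish_pos_del_load. Any later task gets WaitFor or AlreadyLoaded and ends up holding the same Arc, which is the sharing behavior the added test is described as checking.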
