The GitHub Actions job "Nightly PyPI Build" on iceberg-rust.git/main has failed.
Run started by GitHub user Fokko (triggered by Fokko).

Head commit for run:
16906c127d521395a789a9019350e467cc34d063 / Lo <[email protected]>
fix: stack overflow when loading large equality deletes (#1915)

## Which issue does this PR close?

- Closes #.

## What changes are included in this PR?

A stack overflow occurs when processing data files containing a large
number of equality deletes (e.g., more than 6,000 rows).
This happens because parse_equality_deletes_record_batch_stream
previously built the final predicate by calling .and() once per row in
a loop:
```rust
result_predicate = result_predicate.and(row_predicate.not());
```
This resulted in a deeply nested, left-skewed tree structure with a
depth equal to the number of rows (N). When rewrite_not() (which uses a
recursive visitor pattern) was subsequently called on this structure, or
when the structure was dropped, the call stack limit was exceeded.
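
A minimal sketch of why the linear fold grows the tree by one level per row. The `Pred` enum, `depth`, and `fold_linear` below are hypothetical stand-ins written for illustration; they are not the iceberg-rust `Predicate` type or its API, and the row count is kept small so the recursive helper itself stays safe.

```rust
// Toy predicate tree (assumption: a simplified stand-in, not the real type).
enum Pred {
    AlwaysTrue,
    Leaf(u32),
    Not(Box<Pred>),
    And(Box<Pred>, Box<Pred>),
}

impl Pred {
    fn and(self, other: Pred) -> Pred {
        Pred::And(Box::new(self), Box::new(other))
    }

    fn not(self) -> Pred {
        Pred::Not(Box::new(self))
    }
}

// Recursive depth: any recursive visitor needs roughly this many stack frames.
fn depth(p: &Pred) -> usize {
    match p {
        Pred::AlwaysTrue | Pred::Leaf(_) => 1,
        Pred::Not(inner) => 1 + depth(inner),
        Pred::And(l, r) => 1 + depth(l).max(depth(r)),
    }
}

// Mirrors the old accumulation pattern: each row wraps the previous result
// in one more `And`, so the left spine gains one level per row.
fn fold_linear(rows: u32) -> Pred {
    let mut result = Pred::AlwaysTrue;
    for row in 0..rows {
        result = result.and(Pred::Leaf(row).not());
    }
    result
}
```

In this toy model, `depth(&fold_linear(n))` is `n + 2` for any `n >= 1`, so both the recursion depth of a visitor and the recursion depth of the drop grow linearly with the number of rows.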

Changes
1. Balanced Tree Construction: Refactored the predicate combination
logic. Instead of linear accumulation, row predicates are collected and
combined using a pairwise combination approach to build a balanced tree.
This reduces the tree depth from O(N) to O(log N).
2. Early Rewrite: rewrite_not() is now called immediately on each
individual row predicate before they are combined. This ensures we are
combining simplified predicates and avoids traversing a massive
unoptimized tree later.
3. Regression Test: Added
test_large_equality_delete_batch_stack_overflow, which processes 20,000
equality delete rows to verify the fix.
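
The pairwise combination described in item 1 can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `Pred`, `depth`, and `combine_balanced` are hypothetical names, and the toy enum stands in for the real predicate type.

```rust
// Toy predicate tree (assumption: a simplified stand-in, not the real type).
enum Pred {
    Leaf(u32),
    And(Box<Pred>, Box<Pred>),
}

// Recursive depth; safe here because the balanced tree is shallow.
fn depth(p: &Pred) -> usize {
    match p {
        Pred::Leaf(_) => 1,
        Pred::And(l, r) => 1 + depth(l).max(depth(r)),
    }
}

// Combine neighbours pairwise, round by round, until one predicate remains.
// Each round halves the number of predicates, so the depth is O(log N).
fn combine_balanced(mut preds: Vec<Pred>) -> Option<Pred> {
    while preds.len() > 1 {
        let mut next = Vec::with_capacity((preds.len() + 1) / 2);
        let mut iter = preds.into_iter();
        while let Some(left) = iter.next() {
            match iter.next() {
                Some(right) => next.push(Pred::And(Box::new(left), Box::new(right))),
                // Odd element out: carry it unchanged into the next round.
                None => next.push(left),
            }
        }
        preds = next;
    }
    // Returns None for empty input, the single predicate otherwise.
    preds.pop()
}
```

With 20,000 leaves, the same count the regression test uses, this toy version needs 15 rounds and yields a tree of depth 16, compared with a depth on the order of 20,000 for the linear fold.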

## Are these changes tested?
- [x] New regression test test_large_equality_delete_batch_stack_overflow passed.
- [x] All existing tests in arrow::caching_delete_file_loader passed.

Co-authored-by: Renjie Liu <[email protected]>

Report URL: https://github.com/apache/iceberg-rust/actions/runs/20216205951

With regards,
GitHub Actions via GitBox
