cbb330 opened a new pull request, #49011: URL: https://github.com/apache/arrow/pull/49011
## Summary

Part 3/15 of ORC predicate pushdown implementation.

⚠️ **Depends on PR #49009 and PR 2 being merged first**

Adds lazy evaluation infrastructure to OrcFileFragment:
- Only process statistics for fields referenced in predicate
- Cache statistics expressions to avoid recomputation
- Track which fields have been processed
- Efficient: O(fields_in_predicate) not O(all_fields)

## Changes

- Add `statistics_expressions_` cache to OrcFileFragment
- Add `statistics_expressions_complete_` tracking
- Add metadata caching infrastructure
- Add `EnsureMetadataCached()` for lazy loading

## Performance Impact

For a file with 100 columns and predicate on 2 columns:
- Without lazy evaluation: Process 100 × N stripes
- With lazy evaluation: Process 2 × N stripes (50x reduction)

**Part of stacked PR series. Review after PR 2.**
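To illustrate the lazy evaluation described above, here is a minimal standalone sketch of a per-field statistics cache with completion tracking. It is not the code in this PR: `LazyStatsFragment`, `MinMax`, `EnsureStatistics`, `MayMatchGreaterThan`, and the `Loader` callback are hypothetical names chosen for illustration, and the sketch uses plain integer min/max values rather than Arrow expressions or the ORC reader.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one column's min/max statistics in one stripe.
struct MinMax {
  long min;
  long max;
};

// Hypothetical sketch of a fragment that derives statistics lazily.
// Illustrative only; these are not the actual OrcFileFragment members.
class LazyStatsFragment {
 public:
  // loader(stripe, field) simulates decoding one column's statistics
  // for one stripe from file metadata.
  using Loader = std::function<MinMax(std::size_t, std::size_t)>;

  LazyStatsFragment(std::size_t num_fields, std::size_t num_stripes,
                    Loader loader)
      : num_stripes_(num_stripes),
        loader_(std::move(loader)),
        cache_(num_fields),
        complete_(num_fields, false) {}

  // Load and cache statistics only for the fields the predicate references.
  void EnsureStatistics(const std::vector<std::size_t>& fields) {
    for (std::size_t field : fields) {
      assert(field < cache_.size());
      if (complete_[field]) continue;  // already processed, reuse the cache
      cache_[field].reserve(num_stripes_);
      for (std::size_t stripe = 0; stripe < num_stripes_; ++stripe) {
        cache_[field].push_back(loader_(stripe, field));
        ++loads_;
      }
      complete_[field] = true;
    }
  }

  // Returns false only when the stripe cannot contain rows with field > value.
  bool MayMatchGreaterThan(std::size_t stripe, std::size_t field, long value) {
    EnsureStatistics({field});
    return cache_[field][stripe].max > value;
  }

  // Number of (stripe, field) statistics decoded so far.
  std::size_t loads() const { return loads_; }

 private:
  std::size_t num_stripes_;
  Loader loader_;
  std::vector<std::vector<MinMax>> cache_;  // cache_[field][stripe]
  std::vector<bool> complete_;              // which fields are fully cached
  std::size_t loads_ = 0;
};
```

With 100 fields, 10 stripes, and a predicate touching 2 fields, this sketch invokes the loader 2 × 10 = 20 times instead of 100 × 10 = 1000, and repeated calls for the same fields do no further work.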
