xudong963 opened a new pull request, #9694: URL: https://github.com/apache/arrow-rs/pull/9694
# Which issue does this PR close? - Related to https://github.com/apache/datafusion/issues/19028 # Rationale for this change When DataFusion evaluates a Parquet scan with filter pushdown, it uses row group statistics to determine which row groups to scan. In many real-world queries, the predicate matches **all** rows in some (or all) row groups — for example, a time-range filter where entire row groups fall within the range, or a `WHERE status != 'DELETED'` filter on data that contains no deleted rows. Today, even when row group statistics **prove** that every row satisfies the predicate, the `RowFilter` is still evaluated row-by-row during decoding. This means the filter columns are decoded and the predicate expression is evaluated for every row — work that produces no useful filtering and can be expensive, especially when filter columns are large (e.g., strings) or the predicate is complex. This PR adds a mechanism to skip `RowFilter` evaluation entirely for row groups that are known to be "fully matched" based on statistics. The caller (e.g., DataFusion) determines which row groups are fully matched during row group pruning and passes that information to the reader builder. During decoding, fully matched row groups skip straight to data materialization, bypassing filter column decoding and predicate evaluation. **Benchmark results** (from a DataFusion benchmark using `ParquetPushDecoder` — 20 row groups × 50K rows, with a 1KB string payload column): | Scenario | Time | vs. no optimization | |----------|------|-------------------| | Filter pushdown, no skip | 53.3 ms | baseline | | Filter pushdown, with skip | 27.3 ms | **~49% faster** | | No pushdown at all | 59.8 ms | — | When all row groups are fully matched, skipping the filter brings performance close to having no filter at all, while still benefiting from row group pruning on row groups that don't fully match. # What changes are included in this PR? 1. **New builder method `with_fully_matched_row_groups(Vec<usize>)`** on `ArrowReaderBuilder` — allows callers to specify which row groups have all rows matching the filter predicate. 2. **Skip filter in `RowGroupReaderBuilder::try_transition()`** — when a row group is in the fully-matched set, the `Start` state transitions directly to `StartData`, bypassing the `Filters` / `WaitingOnFilterData` states entirely. The filter is preserved (put back into `self.filter`) for subsequent non-fully-matched row groups. 3. **Plumbed through all decoder paths** — the field is propagated through `ParquetPushDecoderBuilder`, `ParquetRecordBatchStreamBuilder` (async), and ignored in the sync reader (which processes one row group at a time). **Design choices:** - The fully-matched set is stored as a `HashSet<usize>` on `RowGroupReaderBuilder` for O(1) lookup, rather than on `RowGroupDecoderState`, so the state enum size is unchanged (preserving the existing 200-byte size test). - The API uses `Option<Vec<usize>>` at the builder level and converts to `HashSet` internally. # Are these changes tested? The optimization is exercised by an end-to-end benchmark in DataFusion that uses `ParquetPushDecoder` directly (the same code path used by DataFusion's async Parquet opener). The benchmark verifies correctness by asserting the expected row count. Unit tests can be added if reviewers prefer — happy to add tests that verify: - Fully matched row groups skip filter evaluation and still return all rows - Non-fully-matched row groups in the same scan still have filters applied - The API is a no-op when no fully matched row groups are specified # Are there any user-facing changes? Yes — a new public method `ArrowReaderBuilder::with_fully_matched_row_groups()` is added. This is a purely additive, non-breaking change. Existing code is unaffected since the default is `None` (no row groups are marked as fully matched). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
