xudong963 opened a new pull request, #9694:
URL: https://github.com/apache/arrow-rs/pull/9694

   # Which issue does this PR close?
   
   - Related to https://github.com/apache/datafusion/issues/19028
   
   # Rationale for this change
   
   When DataFusion evaluates a Parquet scan with filter pushdown, it uses row 
group statistics to determine which row groups to scan. In many real-world 
queries, the predicate matches **all** rows in some (or all) row groups — for 
example, a time-range filter where entire row groups fall within the range, or 
a `WHERE status != 'DELETED'` filter on data that contains no deleted rows.
   
   Today, even when row group statistics **prove** that every row satisfies the 
predicate, the `RowFilter` is still evaluated row-by-row during decoding. This 
means the filter columns are decoded and the predicate expression is evaluated 
for every row — work that produces no useful filtering and can be expensive, 
especially when filter columns are large (e.g., strings) or the predicate is 
complex.
   
   This PR adds a mechanism to skip `RowFilter` evaluation entirely for row 
groups that are known to be "fully matched" based on statistics. The caller 
(e.g., DataFusion) determines which row groups are fully matched during row 
group pruning and passes that information to the reader builder. During 
decoding, fully matched row groups skip straight to data materialization, 
bypassing filter column decoding and predicate evaluation.
   
   **Benchmark results** (from a DataFusion benchmark using 
`ParquetPushDecoder` — 20 row groups × 50K rows, with a 1KB string payload 
column):
   
   | Scenario | Time | vs. no optimization |
   |----------|------|-------------------|
   | Filter pushdown, no skip | 53.3 ms | baseline |
   | Filter pushdown, with skip | 27.3 ms | **~49% faster** |
   | No pushdown at all | 59.8 ms | — |
   
   When all row groups are fully matched, skipping the filter brings 
performance close to having no filter at all, while still benefiting from row 
group pruning on row groups that don't fully match.
   
   # What changes are included in this PR?
   
   1. **New builder method `with_fully_matched_row_groups(Vec<usize>)`** on 
`ArrowReaderBuilder` — allows callers to specify which row groups have all rows 
matching the filter predicate.
   
   2. **Skip filter in `RowGroupReaderBuilder::try_transition()`** — when a row 
group is in the fully-matched set, the `Start` state transitions directly to 
`StartData`, bypassing the `Filters` / `WaitingOnFilterData` states entirely. 
The filter is preserved (put back into `self.filter`) for subsequent 
non-fully-matched row groups.
   
   3. **Plumbed through all decoder paths** — the field is propagated through 
`ParquetPushDecoderBuilder`, `ParquetRecordBatchStreamBuilder` (async), and 
ignored in the sync reader (which processes one row group at a time).
   
   **Design choices:**
   - The fully-matched set is stored as a `HashSet<usize>` on 
`RowGroupReaderBuilder` for O(1) lookup, rather than on `RowGroupDecoderState`, 
so the state enum size is unchanged (preserving the existing 200-byte size 
test).
   - The API uses `Option<Vec<usize>>` at the builder level and converts to 
`HashSet` internally.
   
   # Are these changes tested?
   
   The optimization is exercised by an end-to-end benchmark in DataFusion that 
uses `ParquetPushDecoder` directly (the same code path used by DataFusion's 
async Parquet opener). The benchmark verifies correctness by asserting the 
expected row count.
   
   Unit tests can be added if reviewers prefer — happy to add tests that verify:
   - Fully matched row groups skip filter evaluation and still return all rows
   - Non-fully-matched row groups in the same scan still have filters applied
   - The API is a no-op when no fully matched row groups are specified
   
   # Are there any user-facing changes?
   
   Yes — a new public method 
`ArrowReaderBuilder::with_fully_matched_row_groups()` is added. This is a 
purely additive, non-breaking change. Existing code is unaffected since the 
default is `None` (no row groups are marked as fully matched).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to