adriangb opened a new pull request, #22236:
URL: https://github.com/apache/datafusion/pull/22236

   ## Which issue does this PR close?
   
   - Part of #22144 (Adaptive filter pushdown), split into a reviewable stack. 
This is **PR 3 of 4**.
   
   ## Rationale for this change
   
   The cost model that decides where each filter conjunct runs (row-level, 
post-scan, or dropped) is large enough to review on its own, separate from the 
scan plumbing that consumes it.
   
   ## What changes are included in this PR?
   
   - `SelectivityTracker`: a cross-file cost model that accumulates per-filter 
selectivity and throughput statistics and, using a confidence interval, 
partitions filter conjuncts into row-level / post-scan / dropped buckets.
   - `total_compressed_bytes` helper in `row_filter` (column-byte sizing used 
by the tracker).
   - A criterion benchmark for the tracker.
   
   Nothing wires the tracker into the parquet scan yet — that is the final PR 
in the stack. A few `pub(crate)` items only exercised by that integration carry 
a temporary `#[expect(dead_code)]`, removed in PR 4.
   
   ## Are these changes tested?
   
   Yes — ~45 unit tests cover the partition / promote / demote / drop logic.
   
   ## Are there any user-facing changes?
   
   New `pub` module `selectivity` in the `datafusion-datasource-parquet` crate 
(Rust path `datafusion_datasource_parquet::selectivity`). No behavior change: 
no production code path uses it yet.
   
   ---
   
   **Stacked PR — diff is cumulative against `main`.** Review the top commit 
*"feat: add SelectivityTracker adaptive filter cost model"*; the commits below 
it are PRs #22234 and #22235.
   
   Stack (review/merge in order):
   1. #22234 — OptionalFilterPhysicalExpr + proto
   2. #22235 — Per-conjunct pruning statistics
   3. **this PR** — SelectivityTracker cost model
   4. Adaptive parquet scan integration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

