adriangb opened a new pull request, #22236: URL: https://github.com/apache/datafusion/pull/22236
## Which issue does this PR close?

- Part of #22144 (Adaptive filter pushdown), split into a reviewable stack. This is **PR 3 of 4**.

## Rationale for this change

The cost model that decides where each filter conjunct runs (row-level, post-scan, or dropped) is large enough to review on its own, separate from the scan plumbing that consumes it.

## What changes are included in this PR?

- `SelectivityTracker`: a cross-file cost model that accumulates per-filter selectivity and throughput statistics and, using a confidence interval, partitions filter conjuncts into row-level / post-scan / dropped buckets.
- `total_compressed_bytes` helper in `row_filter` (column-byte sizing used by the tracker).
- A criterion benchmark for the tracker.

Nothing wires the tracker into the parquet scan yet; that is the final PR in the stack. A few `pub(crate)` items only exercised by that integration carry a temporary `#[expect(dead_code)]`, removed in PR 4.

## Are these changes tested?

Yes: ~45 unit tests cover the partition / promote / demote / drop logic.

## Are there any user-facing changes?

New `pub` module `datafusion-datasource-parquet::selectivity`. No behavior change: no production code path uses it yet.

---

**Stacked PR: diff is cumulative against `main`.** Review the top commit *"feat: add SelectivityTracker adaptive filter cost model"*; the commits below it are PRs #22234 and #22235.

Stack (review/merge in order):

1. #22234: OptionalFilterPhysicalExpr + proto
2. #22235: Per-conjunct pruning statistics
3. **this PR**: SelectivityTracker cost model
4. Adaptive parquet scan integration

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
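For readers who want a feel for the bucketing described under "What changes are included in this PR?", here is a minimal, self-contained sketch of a confidence-interval partition. It is not the code in this PR. `Placement`, `FilterStats`, `classify`, the `optional` flag, the two thresholds, and the choice of a Wilson score interval are all assumptions made for illustration, and the sketch ignores the throughput statistics the real tracker also accumulates.

```rust
/// Where a filter conjunct is evaluated.
/// Hypothetical names for illustration; not the API added by this PR.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Placement {
    RowLevel,
    PostScan,
    Dropped,
}

/// Running row counts for one filter, accumulated across files.
#[derive(Debug, Default, Clone, Copy)]
struct FilterStats {
    rows_seen: u64,
    rows_passed: u64,
}

impl FilterStats {
    /// Fold in the counts observed while scanning one more file.
    fn update(&mut self, seen: u64, passed: u64) {
        self.rows_seen += seen;
        self.rows_passed += passed;
    }

    /// Wilson score interval for the fraction of rows that pass the filter.
    /// `z = 1.96` corresponds to roughly 95% confidence.
    /// Returns `None` until at least one row has been observed.
    fn pass_rate_interval(&self, z: f64) -> Option<(f64, f64)> {
        if self.rows_seen == 0 {
            return None;
        }
        let n = self.rows_seen as f64;
        let p = self.rows_passed as f64 / n;
        let z2 = z * z;
        let denom = 1.0 + z2 / n;
        let center = (p + z2 / (2.0 * n)) / denom;
        let half = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt() / denom;
        Some(((center - half).max(0.0), (center + half).min(1.0)))
    }
}

/// Assumed policy: promote to row-level only when the filter confidently
/// removes most rows, drop only an optional filter that confidently removes
/// almost nothing, and otherwise evaluate post-scan.
fn classify(stats: &FilterStats, optional: bool, promote_below: f64, drop_above: f64) -> Placement {
    match stats.pass_rate_interval(1.96) {
        Some((_, hi)) if hi < promote_below => Placement::RowLevel,
        Some((lo, _)) if optional && lo > drop_above => Placement::Dropped,
        _ => Placement::PostScan,
    }
}

fn main() {
    let mut selective = FilterStats::default();
    selective.update(10_000, 200); // about 2% of rows pass

    let mut unselective = FilterStats::default();
    unselective.update(10_000, 9_990); // almost every row passes

    let mut tiny_sample = FilterStats::default();
    tiny_sample.update(10, 0); // looks selective, but too few rows to trust

    println!("{:?}", classify(&selective, false, 0.2, 0.95));
    println!("{:?}", classify(&unselective, true, 0.2, 0.95));
    println!("{:?}", classify(&tiny_sample, false, 0.2, 0.95));
}
```

The interval, rather than the raw pass rate, is what keeps a filter with only a handful of observed rows in the post-scan bucket until more evidence accumulates across files.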
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
