adriangb commented on PR #22144:
URL: https://github.com/apache/datafusion/pull/22144#issuecomment-4464041143
## Split into a reviewable stack
This experiment has been broken into 4 stacked PRs so each piece can be
reviewed (and where possible, merged) on its own. Each builds on the previous;
the diffs are cumulative against `main`, but every PR adds exactly **one new
commit** — review that commit.
1. **#22234 — `OptionalFilterPhysicalExpr` + proto** (+400)
A transparent `PhysicalExpr` wrapper that marks a filter as safe to drop
without affecting correctness, plus proto round-trip support. Purely
additive with no callers yet, so it is inert until something reads the marker.
2. **#22235 — Per-conjunct pruning statistics** (+500/-18)
`PruningPredicate::try_new_tagged_conjuncts` / `prune_per_conjunct` and
the row-group / page-index variants surface per-conjunct effectiveness as a
free side effect of the pruning pass. Existing untagged paths unchanged.
3. **#22236 — `SelectivityTracker` cost model** (+2973)
The cross-file cost model that partitions filter conjuncts into row-level
/ post-scan / dropped buckets, with ~45 unit tests and a benchmark. Not yet
wired into the scan.
4. **#22237 — Adaptive parquet scan integration** (+1823/-624)
Wires it all together: `AdaptiveParquetStream`, re-partitioning at
row-group boundaries, integration with the fully-matched run splitting from
#21637, the hash-join `OptionalFilterPhysicalExpr` wrap, and config knobs.
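To make the three buckets from PR 3 concrete, here is a minimal, std-only sketch of how a conjunct might be classified. It is not the `SelectivityTracker` API: `Conjunct`, `Observed`, `Bucket`, `classify`, and both thresholds are hypothetical names chosen for illustration, and the real cost model likely weighs more than a pass rate. The `optional` flag stands in for the `OptionalFilterPhysicalExpr` marker from PR 1.

```rust
/// Hypothetical stand-in for one filter conjunct (not DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
pub struct Conjunct {
    pub name: String,
    /// True when the conjunct carries the "optional" marker, i.e. dropping it
    /// cannot change query results (for example, a pushed-down join filter).
    pub optional: bool,
}

/// Hypothetical counters accumulated across files for one conjunct.
#[derive(Debug, Clone, Copy)]
pub struct Observed {
    pub rows_seen: u64,
    pub rows_passed: u64,
}

impl Observed {
    /// Fraction of rows that passed the filter, or `None` with no evidence yet.
    pub fn pass_rate(&self) -> Option<f64> {
        if self.rows_seen == 0 {
            None
        } else {
            Some(self.rows_passed as f64 / self.rows_seen as f64)
        }
    }
}

/// Where a conjunct is evaluated for the next row group.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Bucket {
    /// Evaluated inside the decoder as a row-level filter.
    RowLevel,
    /// Evaluated on decoded batches after the scan.
    PostScan,
    /// Not evaluated at all; only legal for optional conjuncts.
    Dropped,
}

/// Assumed policy: selective filters go row-level, unselective optional
/// filters are dropped, and everything else (including filters with no
/// observations yet) runs post-scan.
pub fn classify(c: &Conjunct, obs: Observed, row_level_max: f64, drop_min: f64) -> Bucket {
    match obs.pass_rate() {
        None => Bucket::PostScan,
        Some(rate) if rate <= row_level_max => Bucket::RowLevel,
        Some(rate) if c.optional && rate >= drop_min => Bucket::Dropped,
        Some(_) => Bucket::PostScan,
    }
}
```

The point of the sketch is the invariant, not the thresholds: a conjunct may only land in `Dropped` if it is marked optional, so a mandatory filter that turns out to be unselective falls back to `PostScan` rather than being skipped.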
### Notes
- Each layer compiles and passes clippy (`-D warnings`) independently.
- **PRs 1–3 have no external dependency** and can merge on their own merits.
- **PR 4** pins a custom `arrow-rs` branch for the push-decoder
`StrategySwap` APIs — it cannot merge upstream until those APIs land in a
released `arrow-rs`.
This PR remains as the integration reference / discussion thread.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]