adriangb opened a new pull request, #20160: URL: https://github.com/apache/datafusion/pull/20160
## Summary

Add `SelectivityAwareFilterExpr`, a wrapper `PhysicalExpr` that tracks filter selectivity at runtime and automatically disables filters that aren't pruning enough rows.

This addresses the issue where dynamic filters from `HashJoinExec` can be expensive to evaluate for little benefit when the build side covers most of the probe side values.

## Key Features

- **Selectivity threshold**: Filter disabled when `rows_passed / rows_total >= threshold`
- **Minimum rows**: Statistics collected for `min_rows` before making a decision
- **Generation-aware reset**: Resets when inner filter updates (e.g., hash table built)
- **Permanent disable**: Once disabled, stays disabled for rest of query
- **Disabled behavior**: Returns all-true array to bypass filter evaluation

## New Configuration Options

Added to `OptimizerOptions`:

- `enable_dynamic_filter_selectivity_tracking` (default: `false`)
- `dynamic_filter_selectivity_threshold` (default: `0.95`)
- `dynamic_filter_min_rows_for_selectivity` (default: `10000`)

## Files Changed

| File | Changes |
|------|---------|
| `datafusion/physical-expr/src/expressions/selectivity_aware_filter.rs` | **NEW** - Core wrapper implementation |
| `datafusion/physical-expr/src/expressions/mod.rs` | Add module and re-export |
| `datafusion/common/src/config.rs` | Add 3 new config options to `OptimizerOptions` |
| `datafusion/physical-plan/src/joins/hash_join/exec.rs` | Wrap dynamic filter with selectivity tracker |
| `datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt` | Add integration tests |

## Test plan

- [x] Unit tests for `SelectivityAwareFilterExpr` (6 tests)
- [x] Hash join tests (367 tests pass)
- [x] SQL logic tests for config options
- [x] Verify queries return correct results with selectivity tracking enabled

🤖 Generated with [Claude Code](https://claude.com/claude-code)
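For readers who want to see the decision rule from "Key Features" in one place, here is a minimal, single-threaded sketch. It is not the code in this PR: `SelectivityTracker`, `evaluate`, and `run_example` are hypothetical names, a `Vec<bool>` stands in for an Arrow boolean array, and the generation is passed in as a plain `u64`. The sketch also assumes the threshold is re-checked on every batch once `min_rows` has been reached, which the description above does not spell out.

```rust
/// Hypothetical stand-in for the wrapper's bookkeeping; not the PR's actual type.
pub struct SelectivityTracker {
    threshold: f64,   // disable when rows_passed / rows_total >= threshold
    min_rows: u64,    // rows to observe before making a decision
    rows_total: u64,
    rows_passed: u64,
    generation: u64,  // generation of the inner filter the counters belong to
    disabled: bool,
}

impl SelectivityTracker {
    pub fn new(threshold: f64, min_rows: u64) -> Self {
        Self { threshold, min_rows, rows_total: 0, rows_passed: 0, generation: 0, disabled: false }
    }

    pub fn is_disabled(&self) -> bool {
        self.disabled
    }

    /// Evaluate one batch. `inner` computes the filter mask and is only
    /// called while the filter is still enabled.
    pub fn evaluate<F>(&mut self, generation: u64, num_rows: usize, inner: F) -> Vec<bool>
    where
        F: FnOnce() -> Vec<bool>,
    {
        // Permanent disable: bypass the inner filter and keep every row.
        if self.disabled {
            return vec![true; num_rows];
        }
        // Generation-aware reset: statistics from an older inner filter are dropped.
        if generation != self.generation {
            self.generation = generation;
            self.rows_total = 0;
            self.rows_passed = 0;
        }
        let mask = inner();
        debug_assert_eq!(mask.len(), num_rows);
        self.rows_total += mask.len() as u64;
        self.rows_passed += mask.iter().filter(|&&keep| keep).count() as u64;
        // Only decide once enough rows have been observed.
        if self.rows_total >= self.min_rows {
            let selectivity = self.rows_passed as f64 / self.rows_total as f64;
            if selectivity >= self.threshold {
                self.disabled = true;
            }
        }
        mask
    }
}

/// Feeds two 5000-row batches that pass every row, using the default
/// threshold (0.95) and minimum rows (10000) listed above.
pub fn run_example() -> (bool, bool) {
    let mut tracker = SelectivityTracker::new(0.95, 10_000);
    tracker.evaluate(1, 5_000, || vec![true; 5_000]);
    let after_first = tracker.is_disabled();
    tracker.evaluate(1, 5_000, || vec![true; 5_000]);
    (after_first, tracker.is_disabled())
}
```

In this sketch the filter stays enabled after the first batch because only 5000 of the required 10000 rows have been seen, and is disabled after the second because every row passed. A real implementation shared across partitions would need atomics or a lock around the counters, which is omitted here.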
