adriangb opened a new pull request, #20160:
URL: https://github.com/apache/datafusion/pull/20160

   ## Summary
   
   Add `SelectivityAwareFilterExpr`, a wrapper `PhysicalExpr` that tracks
   filter selectivity at runtime and automatically disables filters that aren't
   pruning enough rows. This addresses the issue where dynamic filters from
   `HashJoinExec` can be expensive to evaluate for little benefit when the build
   side covers most of the probe side values.
   
   ## Key Features
   
   - **Selectivity threshold**: The filter is disabled when `rows_passed / rows_total >= threshold`.
   - **Minimum rows**: Statistics are collected for `min_rows` rows before a decision is made.
   - **Generation-aware reset**: Statistics reset when the inner filter updates (e.g., when the hash table is built).
   - **Permanent disable**: Once disabled, the filter stays disabled for the rest of the query.
   - **Disabled behavior**: A disabled filter returns an all-true array, bypassing evaluation of the inner filter.
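
The decision logic described by these bullets can be sketched roughly as follows. This is a simplified model and not the PR's actual code: the `SelectivityTracker` type, its fields, and the `observe` method are hypothetical names, and the exact ordering of the generation reset relative to the permanent-disable check is an assumption.

```rust
/// Hypothetical, simplified model of the tracking logic (not the PR's code).
#[derive(Debug)]
pub struct SelectivityTracker {
    threshold: f64,   // disable when rows_passed / rows_total >= threshold
    min_rows: u64,    // rows to observe before making a decision
    rows_passed: u64,
    rows_total: u64,
    generation: u64,  // generation of the inner filter last seen
    disabled: bool,
}

impl SelectivityTracker {
    pub fn new(threshold: f64, min_rows: u64) -> Self {
        Self {
            threshold,
            min_rows,
            rows_passed: 0,
            rows_total: 0,
            generation: 0,
            disabled: false,
        }
    }

    /// Records one batch and returns `true` if the inner filter should
    /// still be evaluated for subsequent batches.
    pub fn observe(&mut self, generation: u64, passed: u64, total: u64) -> bool {
        // Assumption: once disabled, nothing re-enables the filter.
        if self.disabled {
            return false;
        }
        // The inner filter changed (e.g. hash table built): start over.
        if generation != self.generation {
            self.generation = generation;
            self.rows_passed = 0;
            self.rows_total = 0;
        }
        self.rows_passed += passed;
        self.rows_total += total;
        // Only decide once enough rows have been seen.
        if self.rows_total > 0 && self.rows_total >= self.min_rows {
            let ratio = self.rows_passed as f64 / self.rows_total as f64;
            if ratio >= self.threshold {
                self.disabled = true;
            }
        }
        !self.disabled
    }

    pub fn is_disabled(&self) -> bool {
        self.disabled
    }
}
```

In this sketch, a caller would skip the inner filter and emit an all-true mask whenever `observe` has returned `false`.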
   
   ## New Configuration Options
   
   Added to `OptimizerOptions`:
   - `enable_dynamic_filter_selectivity_tracking` (default: `false`)
   - `dynamic_filter_selectivity_threshold` (default: `0.95`)  
   - `dynamic_filter_min_rows_for_selectivity` (default: `10000`)
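
To illustrate how the three options and their defaults might interact, here is a standalone sketch. The struct `SelectivityTrackingOptions`, the field types, and the `would_disable` helper are hypothetical; only the option names and default values come from the list above, and the real options live in `OptimizerOptions`.

```rust
/// Hypothetical mirror of the three options; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct SelectivityTrackingOptions {
    pub enable_dynamic_filter_selectivity_tracking: bool,
    pub dynamic_filter_selectivity_threshold: f64,
    pub dynamic_filter_min_rows_for_selectivity: usize,
}

impl Default for SelectivityTrackingOptions {
    fn default() -> Self {
        Self {
            enable_dynamic_filter_selectivity_tracking: false,
            dynamic_filter_selectivity_threshold: 0.95,
            dynamic_filter_min_rows_for_selectivity: 10_000,
        }
    }
}

impl SelectivityTrackingOptions {
    /// Returns `true` if a filter with these observed counts would be
    /// disabled under the given options (illustrative helper only).
    pub fn would_disable(&self, rows_passed: usize, rows_total: usize) -> bool {
        // Tracking is opt-in: with the default settings nothing is disabled.
        if !self.enable_dynamic_filter_selectivity_tracking {
            return false;
        }
        // Not enough rows observed yet to make a decision.
        if rows_total == 0 || rows_total < self.dynamic_filter_min_rows_for_selectivity {
            return false;
        }
        let ratio = rows_passed as f64 / rows_total as f64;
        ratio >= self.dynamic_filter_selectivity_threshold
    }
}
```

Under these assumptions, a filter that passes 9,500 of 10,000 rows sits exactly at the default threshold and would be disabled once tracking is enabled, while the same counts with the default `false` setting leave the filter untouched.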
   
   ## Files Changed
   
   | File | Changes |
   |------|---------|
   | `datafusion/physical-expr/src/expressions/selectivity_aware_filter.rs` | **NEW** - Core wrapper implementation |
   | `datafusion/physical-expr/src/expressions/mod.rs` | Add module and re-export |
   | `datafusion/common/src/config.rs` | Add 3 new config options to `OptimizerOptions` |
   | `datafusion/physical-plan/src/joins/hash_join/exec.rs` | Wrap dynamic filter with selectivity tracker |
   | `datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt` | Add integration tests |
   
   ## Test plan
   
   - [x] Unit tests for `SelectivityAwareFilterExpr` (6 tests)
   - [x] Hash join tests (367 tests pass)
   - [x] SQL logic tests for config options
   - [x] Verify queries return correct results with selectivity tracking enabled
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

