wirybeaver opened a new pull request, #22523: URL: https://github.com/apache/datafusion/pull/22523
## Which issue does this PR close?

Related to https://github.com/apache/datafusion-ballista/issues/1359

## Rationale

Ballista's Adaptive Query Execution (AQE) planner re-invokes DataFusion's full `PhysicalOptimizer` chain after every completed stage. `FilterPushdown::new_post_optimization()` is not idempotent on plans containing `HashJoinExec`.

In the `Post` phase, `HashJoinExec::gather_filters_for_pushdown` unconditionally creates a new `DynamicFilterPhysicalExpr` and installs it on the probe-side child via `with_self_filter`. After pass 1, the join already carries `dynamic_filter: Some(...)`, and the shared `Arc<DynamicFilterPhysicalExpr>` is already wired into the probe-side scan's predicate. On pass 2, a *second* dynamic filter is created and ANDed onto the existing predicate, producing `DynamicFilter AND DynamicFilter`. Each subsequent pass adds another duplicate, so the predicate keeps growing with every AQE replan.

## What changes are included in this PR?

- **Guard in `HashJoinExec::gather_filters_for_pushdown`**: skip dynamic-filter creation when `self.dynamic_filter.is_some()`, i.e. when a previous pass has already installed one. The existing `Arc` remains valid and correctly wired into the probe-side scan.
- **Comment** explaining why the guard is needed (AQE replan context).
- **Test** `post_phase_is_idempotent_on_hash_join` in `tests/physical_optimizer/filter_pushdown.rs`: builds a `HashJoinExec`, runs `FilterPushdown::new_post_optimization()` twice, and asserts structural equality via `get_plan_string`.

## Are these changes tested?

Yes. The new test fails without the fix (the plan strings diverge because of the duplicated dynamic-filter predicates) and passes with it.

## Are there any user-facing changes?

No. Dynamic filter pushdown is an internal optimization; the idempotence guard only affects re-optimization scenarios (AQE).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
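The effect of the `dynamic_filter.is_some()` guard can be illustrated with a small, self-contained model. This is a sketch only: `DynFilter`, `Scan`, `Join`, `pushdown_unguarded`, and `pushdown_guarded` are hypothetical stand-ins, not DataFusion's `HashJoinExec` / `DynamicFilterPhysicalExpr` API. The model captures just the behavior described in the PR: an unguarded pass always creates a fresh filter and ANDs it onto the probe-side predicate, while the guarded pass returns early once the join already holds a filter.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a dynamic filter expression (NOT DataFusion's type).
#[derive(Debug)]
struct DynFilter {
    id: usize,
}

// Hypothetical probe-side scan whose predicate is a conjunction of filters.
#[derive(Debug, Default)]
struct Scan {
    predicates: Vec<Arc<DynFilter>>,
}

impl Scan {
    // Render the predicate as an `AND` chain, loosely imitating a plan string.
    fn predicate_string(&self) -> String {
        self.predicates
            .iter()
            .map(|f| format!("DynamicFilter#{}", f.id))
            .collect::<Vec<_>>()
            .join(" AND ")
    }
}

// Hypothetical join node that remembers the filter it installed.
#[derive(Debug, Default)]
struct Join {
    dynamic_filter: Option<Arc<DynFilter>>,
    probe: Scan,
    next_id: usize,
}

impl Join {
    // Unguarded pass: always creates a new filter and ANDs it onto the probe side.
    fn pushdown_unguarded(&mut self) {
        let filter = Arc::new(DynFilter { id: self.next_id });
        self.next_id += 1;
        self.probe.predicates.push(Arc::clone(&filter));
        self.dynamic_filter = Some(filter);
    }

    // Guarded pass: a previous pass already installed a filter, so keep it.
    fn pushdown_guarded(&mut self) {
        if self.dynamic_filter.is_some() {
            return;
        }
        self.pushdown_unguarded();
    }
}

fn main() {
    // Two unguarded passes duplicate the filter.
    let mut unguarded = Join::default();
    unguarded.pushdown_unguarded();
    unguarded.pushdown_unguarded();
    println!("unguarded: {}", unguarded.probe.predicate_string());

    // Two guarded passes leave the predicate unchanged after the first one.
    let mut guarded = Join::default();
    guarded.pushdown_guarded();
    let after_first = guarded.probe.predicate_string();
    guarded.pushdown_guarded();
    assert_eq!(after_first, guarded.probe.predicate_string());

    // The join and the probe-side scan still share the same Arc.
    let installed = guarded.dynamic_filter.as_ref().unwrap();
    assert!(Arc::ptr_eq(installed, &guarded.probe.predicates[0]));
    println!("guarded:   {}", guarded.probe.predicate_string());
}
```

Running `main` prints `unguarded: DynamicFilter#0 AND DynamicFilter#1` followed by `guarded:   DynamicFilter#0`. In the guarded model a second pass is a no-op and the original `Arc` stays shared between join and scan; that is the same property the PR's test checks at the plan level by comparing `get_plan_string` output across two optimizer passes.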
