wirybeaver opened a new pull request, #22523:
URL: https://github.com/apache/datafusion/pull/22523

   ## Which issue does this PR close?
   
   Related to https://github.com/apache/datafusion-ballista/issues/1359
   
   ## Rationale
   
   Ballista's Adaptive Query Execution (AQE) planner re-invokes DataFusion's 
full `PhysicalOptimizer` chain after every completed stage. That is only safe 
if each rule is idempotent, and `FilterPushdown::new_post_optimization()` is 
not idempotent on plans containing `HashJoinExec`.
   
   In the `Post` phase, `HashJoinExec::gather_filters_for_pushdown` 
unconditionally creates a new `DynamicFilterPhysicalExpr` and installs it on 
the probe-side child via `with_self_filter`. After pass 1 the join already 
carries a `dynamic_filter: Some(...)`, and the shared 
`Arc<DynamicFilterPhysicalExpr>` is already wired into the probe-side scan's 
predicate. On pass 2 a *second* dynamic filter is created and ANDed onto the 
existing predicate, producing `DynamicFilter AND DynamicFilter`. Each 
subsequent pass adds another duplicate, compounding indefinitely in AQE replan 
loops.
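
   The compounding can be illustrated with a toy model. This is only a sketch: `DynamicFilter`, `ProbeScan`, `ToyHashJoin`, `post_pass_unguarded`, and `conjuncts_after` are hypothetical stand-ins invented for illustration and are not DataFusion APIs. The model assumes the probe-side predicate is a simple conjunction of filters.

```rust
use std::sync::Arc;

// Hypothetical stand-in for DataFusion's `DynamicFilterPhysicalExpr`.
#[derive(Debug)]
struct DynamicFilter;

// Toy probe-side scan: its predicate is modeled as a list of ANDed filters.
#[derive(Debug, Default)]
struct ProbeScan {
    conjuncts: Vec<Arc<DynamicFilter>>,
}

// Toy hash join that remembers the filter it most recently created.
#[derive(Debug, Default)]
struct ToyHashJoin {
    dynamic_filter: Option<Arc<DynamicFilter>>,
    probe: ProbeScan,
}

impl ToyHashJoin {
    // Models the unguarded Post-phase behaviour described above:
    // every pass creates a fresh filter and ANDs it onto the probe side.
    fn post_pass_unguarded(&mut self) {
        let filter = Arc::new(DynamicFilter);
        self.probe.conjuncts.push(Arc::clone(&filter));
        self.dynamic_filter = Some(filter);
    }
}

// Runs `passes` optimizer passes over a fresh join and returns how many
// dynamic filters end up ANDed into the probe-side predicate.
fn conjuncts_after(passes: usize) -> usize {
    let mut join = ToyHashJoin::default();
    for _ in 0..passes {
        join.post_pass_unguarded();
    }
    join.probe.conjuncts.len()
}
```

   In this model the number of conjuncts equals the number of passes, which mirrors the `DynamicFilter AND DynamicFilter` growth seen under AQE replanning.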
   
   ## What changes are included in this PR?
   
   - **Guard in `HashJoinExec::gather_filters_for_pushdown`**: skip 
dynamic-filter creation when `self.dynamic_filter.is_some()`, meaning a 
previous pass already installed one. The existing `Arc` remains valid and 
correctly wired into the probe-side scan.
   - **Comment** explaining why the guard is needed (AQE replan context).
   - **Test** `post_phase_is_idempotent_on_hash_join` in 
`tests/physical_optimizer/filter_pushdown.rs`: builds a `HashJoinExec`, runs 
`FilterPushdown::new_post_optimization()` twice, and asserts structural 
equality via `get_plan_string`.
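
   The shape of the guard and of the idempotence check can be sketched on a toy model. This is not the code in the PR: `DynamicFilter`, `ToyHashJoin`, `post_pass_guarded`, `plan_string`, and `plans_after_two_passes` are hypothetical names used only for illustration, and `plan_string` merely plays the role that `get_plan_string` plays in the real test.

```rust
use std::sync::Arc;

// Hypothetical stand-in for DataFusion's `DynamicFilterPhysicalExpr`.
#[derive(Debug)]
struct DynamicFilter;

// Toy hash join: tracks its own filter and the probe-side predicate,
// modeled as a list of ANDed filters.
#[derive(Debug, Default)]
struct ToyHashJoin {
    dynamic_filter: Option<Arc<DynamicFilter>>,
    probe_predicate: Vec<Arc<DynamicFilter>>,
}

impl ToyHashJoin {
    // Guarded Post-phase pass: if a previous pass already installed a
    // filter, leave the existing Arc and the probe predicate untouched.
    fn post_pass_guarded(&mut self) {
        if self.dynamic_filter.is_some() {
            return;
        }
        let filter = Arc::new(DynamicFilter);
        self.probe_predicate.push(Arc::clone(&filter));
        self.dynamic_filter = Some(filter);
    }

    // Renders the plan as text so two passes can be compared structurally.
    fn plan_string(&self) -> String {
        let parts: Vec<&str> = self
            .probe_predicate
            .iter()
            .map(|_| "DynamicFilter")
            .collect();
        if parts.is_empty() {
            "HashJoin: probe predicate = true".to_string()
        } else {
            format!("HashJoin: probe predicate = {}", parts.join(" AND "))
        }
    }
}

// Runs the guarded pass twice and returns the rendered plan after each pass.
fn plans_after_two_passes() -> (String, String) {
    let mut join = ToyHashJoin::default();
    join.post_pass_guarded();
    let first = join.plan_string();
    join.post_pass_guarded();
    let second = join.plan_string();
    (first, second)
}
```

   With the guard in place, both rendered plans are identical and contain a single `DynamicFilter`, which is the property the new test asserts on the real plan.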
   
   ## Are these changes tested?
   
   Yes. The new test fails without the fix (plan strings diverge due to 
duplicated dynamic filter predicates) and passes with it.
   
   ## Are there any user-facing changes?
   
   No. Dynamic filter pushdown is an internal optimization; the idempotence 
guard only affects re-optimization scenarios (AQE).
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

