wirybeaver opened a new pull request, #22522: URL: https://github.com/apache/datafusion/pull/22522
## Which issue does this PR close?

Related to https://github.com/apache/datafusion-ballista/issues/1359

## Rationale

Ballista's Adaptive Query Execution (AQE) planner re-invokes DataFusion's full `PhysicalOptimizer` chain after every completed stage (`AdaptivePlanner::replan_stages`). Rules that are not idempotent (`rule(rule(x)) != rule(x)`) stack execution-plan nodes on each pass.

`OutputRequirements::new_add_mode()` wraps the plan root in an `OutputRequirementExec` to preserve global ordering/distribution requirements. On a second pass, the existing wrapper reports `maintains_input_order() == [true]` and `required_input_ordering() == [None]`, so `require_top_ordering_helper` recurses through it and produces a *second* wrapper, yielding `OutputRequirementExec(OutputRequirementExec(...))`. Each AQE replan adds another layer.

## What changes are included in this PR?

- **Guard in `require_top_ordering()`**: if the plan root is already an `OutputRequirementExec`, return it unchanged. This makes the rule idempotent at the cost of a single type check on the plan root in single-pass use.
- **Doc-comment updates** on `new_add_mode()` and `require_top_ordering()` documenting the idempotence guarantee.
- **Two tests** in `tests/physical_optimizer/output_requirements.rs`:
  - `add_mode_is_idempotent_on_bare_scan`: a bare `ParquetExec` (exercises the `is_changed = false` path).
  - `add_mode_is_idempotent_on_sorted_plan`: `SortExec → ParquetExec` (exercises the `is_changed = true` path).

## Are these changes tested?

Yes. Two new tests run the rule twice on distinct fixtures and assert structural equality via `get_plan_string`. Both fail without the fix (double-wrapped `OutputRequirementExec`) and pass with it.

## Are there any user-facing changes?

No. `OutputRequirementExec` is an internal ancillary node stripped before execution; the idempotence guard only affects re-optimization scenarios (AQE).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
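The double-wrapping described in the Rationale, and why a root-level guard fixes it, can be reproduced with a small self-contained Rust sketch. Everything in it is hypothetical and heavily simplified: the `Plan` enum (with `ScanExec` standing in for `ParquetExec`), `helper`, `require_top_ordering_unguarded`, `require_top_ordering_guarded`, and `plan_string` are illustrative stand-ins, not DataFusion's `ExecutionPlan`, `OutputRequirements`, or `get_plan_string` APIs. The model only captures the two behaviors the PR relies on: a sort at the top gets wrapped (`is_changed = true`), and the wrapper is order-preserving with no ordering requirement of its own, so the helper recurses through it.

```rust
// Hypothetical, simplified plan model; NOT DataFusion's ExecutionPlan API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,                           // stands in for a leaf scan such as ParquetExec
    Sort(Box<Plan>),                // stands in for SortExec
    OutputRequirement(Box<Plan>),   // stands in for OutputRequirementExec
}

// Models require_top_ordering_helper: returns (new_plan, is_changed).
fn helper(plan: &Plan) -> (Plan, bool) {
    match plan {
        // A top-level sort gets wrapped so its ordering is preserved.
        Plan::Sort(_) => (Plan::OutputRequirement(Box::new(plan.clone())), true),
        // The wrapper maintains input order and requires none itself,
        // so the helper recurses through it and rebuilds it around the result.
        Plan::OutputRequirement(child) => {
            let (new_child, changed) = helper(child);
            (Plan::OutputRequirement(Box::new(new_child)), changed)
        }
        Plan::Scan => (plan.clone(), false),
    }
}

// Pre-fix behavior: if nothing changed, wrap the root unconditionally.
fn require_top_ordering_unguarded(plan: &Plan) -> Plan {
    let (new_plan, changed) = helper(plan);
    if changed {
        new_plan
    } else {
        Plan::OutputRequirement(Box::new(new_plan))
    }
}

// Post-fix behavior: an already-wrapped root is returned unchanged.
fn require_top_ordering_guarded(plan: &Plan) -> Plan {
    if matches!(plan, Plan::OutputRequirement(_)) {
        return plan.clone();
    }
    require_top_ordering_unguarded(plan)
}

// Renders one indented line per node, loosely mimicking get_plan_string.
fn plan_string(plan: &Plan) -> Vec<String> {
    let mut lines = Vec::new();
    let mut node = Some(plan);
    let mut depth = 0;
    while let Some(p) = node {
        let (name, child) = match p {
            Plan::Scan => ("ScanExec", None),
            Plan::Sort(c) => ("SortExec", Some(&**c)),
            Plan::OutputRequirement(c) => ("OutputRequirementExec", Some(&**c)),
        };
        lines.push(format!("{}{}", "  ".repeat(depth), name));
        node = child;
        depth += 1;
    }
    lines
}

fn main() {
    let sorted = Plan::Sort(Box::new(Plan::Scan));
    // Apply each variant twice, as an AQE replan would.
    let unguarded = require_top_ordering_unguarded(&require_top_ordering_unguarded(&sorted));
    let guarded = require_top_ordering_guarded(&require_top_ordering_guarded(&sorted));
    println!("unguarded, two passes:\n{}", plan_string(&unguarded).join("\n"));
    println!("guarded, two passes:\n{}", plan_string(&guarded).join("\n"));
}
```

In this model both the bare-scan and sorted-plan fixtures end up double-wrapped after two unguarded passes, matching the two code paths the PR's tests target. Checking the root before calling the helper, rather than teaching the helper to stop at the wrapper, keeps the change local to the entry point.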
