zhuqi-lucas opened a new issue, #22520: URL: https://github.com/apache/datafusion/issues/22520
### Describe the bug / inefficiency `ensure_distribution` in `datafusion/physical-optimizer/src/ensure_requirements/enforce_distribution.rs` unconditionally calls `plan.with_new_children(children_plans)` after collecting the (possibly redistributed) children, even when none of those children were actually replaced. For nodes like `ProjectionExec`, `with_new_children` goes through `try_new` and recomputes the schema, equivalence properties, and output ordering. When the children are byte-identical Arcs as the input, that recomputation is pure overhead. ### To Reproduce Profiling a representative deep plan stack (30 stacked `ProjectionExec` over a sorted parquet scan, no join / aggregate / unmet ordering — the common point-query shape) shows `ProjectionExec::with_new_children` dominating `ensure_distribution` time: - `ensure_distribution` total: 2.87s out of a 60s sample - `ProjectionExec::with_new_children`: 1.94s (56% of the rule) - `OneOfExec::with_new_children`: 0.21s - `SortExec::with_new_children`: 0.11s - Other: 0.61s Micro-benchmark on the same plan shape, 5000 iterations: - Current code: 852.7 ms total, 170.55 us/call - With the fix: 296.8 ms total, 59.36 us/call - ~2.87x speedup, -65% CPU per call ### Expected behavior When `ensure_distribution` has not replaced any of the children (all `children_plans` are `Arc::ptr_eq` to the original `plan.children()`), reuse the existing `plan` Arc unchanged instead of calling `with_new_children`. The `UnionExec` to `InterleaveExec` special case still needs to run first because it intentionally creates a new node even when child Arcs are unchanged. ### Additional context This shows up most clearly on workloads dominated by point queries against parquet with deep projection chains, where no distribution change applies at most plan nodes. The fix is local to `ensure_distribution` and does not touch the per-node `with_new_children` implementations. I have a fix ready and will open a PR referencing this issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
