mingmwang opened a new issue #1965:
URL: https://github.com/apache/arrow-datafusion/issues/1965
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
The current with_new_children() method in ExecutionPlan trait is an abstract
method and each physical plan implements the method with their specific
versions. But a common optimization is omitted by almost all the
implementations is if the new children provided in the params effectively share
the same references with the existing children, there is no need to re-create
the struct. And recreate new struct might cause its parent plan node to
recreate itself which will cause many unnecessary plan node creation during
plan/optimization time. Besides performance impact, it will make the plan
debugging very hard for example If I want to
check why the a new plan node(like shuffle writer) was added to the plan
tree, and I add a debug breakpoint to the shuffle writer node's constructor, I
will see mostly it was invoked by with_new_children().
```
pub trait ExecutionPlan: Debug + Send + Sync {
/// Returns a new plan where all children were replaced by new plans.
/// The size of `children` must be equal to the size of
`ExecutionPlan::children()`.
fn with_new_children(
&self,
children: Vec<Arc<dyn ExecutionPlan>>,
) -> Result<Arc<dyn ExecutionPlan>>;
```
A new approach is proposed here is to have a default implement in
ExecutionPlan to include some common logic and optimizations like children size
check the fast path check and then delegate to the specific version:
```
pub trait ExecutionPlan: Debug + Send + Sync {
/// Returns a new plan where all children were replaced by new plans.
/// The size of `children` must be equal to the size of
`ExecutionPlan::children()`.
fn with_new_children(
self: Arc<Self>
children: Vec<Arc<dyn ExecutionPlan>>,
) -> Result<Arc<dyn ExecutionPlan>> {
/// children size check here
/// children reference check, if the input children's Arc share the
same raw point with current existing children, return self.
with_new_children_internal()
}
fn with_new_children_internal(
self: Arc<Self>
children: Vec<Arc<dyn ExecutionPlan>>,
) -> Result<Arc<dyn ExecutionPlan>>;
```
We can leverage the Arc::ptr_eq() to do the reference check.
https://doc.rust-lang.org/stable/std/sync/struct.Arc.html#method.ptr_eq
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features
you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]