mingmwang opened a new issue #1965:
URL: https://github.com/apache/arrow-datafusion/issues/1965


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   The current with_new_children() method in ExecutionPlan trait is an abstract 
method and each physical plan implements the method with their specific 
versions. But a common optimization is omitted by almost all the 
implementations is if the new children provided in the params effectively share 
the same references with the existing children, there is no need to re-create 
the struct. And recreate new struct might cause its parent plan node to 
recreate itself which will cause many unnecessary plan node creation during 
plan/optimization time. Besides performance impact, it will make the plan 
debugging very hard for example If I want to
   check why the a new plan node(like shuffle writer) was added to the plan 
tree, and I add a debug breakpoint to the shuffle writer node's constructor, I 
will see mostly it was invoked by  with_new_children().
    
   ```
    pub trait ExecutionPlan: Debug + Send + Sync {
   
       /// Returns a new plan where all children were replaced by new plans.
       /// The size of `children` must be equal to the size of 
`ExecutionPlan::children()`.
       fn with_new_children(
           &self,
           children: Vec<Arc<dyn ExecutionPlan>>,
       ) -> Result<Arc<dyn ExecutionPlan>>;
   
   ```
   
   A new approach is proposed here is to have a default implement in 
ExecutionPlan to include some common logic and optimizations like children size 
check the fast path check and then delegate to the specific version:
   
   ```
    pub trait ExecutionPlan: Debug + Send + Sync {
   
       /// Returns a new plan where all children were replaced by new plans.
       /// The size of `children` must be equal to the size of 
`ExecutionPlan::children()`.
       fn with_new_children(
           self: Arc<Self>
           children: Vec<Arc<dyn ExecutionPlan>>,
       ) -> Result<Arc<dyn ExecutionPlan>> {
        /// children size check here
       
        /// children reference check, if the input children's Arc share the 
same raw point with current existing children, return self.
   
        with_new_children_internal()
   
       }
   
      fn with_new_children_internal(
           self: Arc<Self>
           children: Vec<Arc<dyn ExecutionPlan>>,
       ) -> Result<Arc<dyn ExecutionPlan>>;
   
   
   ```
   
   We can leverage the Arc::ptr_eq() to do the reference check.
   
   https://doc.rust-lang.org/stable/std/sync/struct.Arc.html#method.ptr_eq
   
   
   **Describe the solution you'd like**
   A clear and concise description of what you want to happen.
   
   **Describe alternatives you've considered**
   A clear and concise description of any alternative solutions or features 
you've considered.
   
   **Additional context**
   Add any other context or screenshots about the feature request here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to