alamb commented on issue #19351: URL: https://github.com/apache/datafusion/issues/19351#issuecomment-3814027151
> I believe that current design (store state directly in plans) is not friendly for re-using. If we consider how this is done, for example in PostgreSQL, the state of plans and expressions needed for execution is stored separately from the planned expressions themselves. This allows re-using plans and their expressions without the need for constant recreation. For a reference, see the [function](https://github.com/postgres/postgres/blob/9b9eaf08ab2dc22c691b22e59f1574e0f1bcc822/src/backend/executor/execExpr.c#L108-L143).

In my mind the "prepare for execution" step you describe is basically what `execute` does: it creates the execution state in the returned `Stream`.

I don't understand why the design of `reset_plan_states` doesn't work for your use case. There are several implementations of `ExecutionPlan::reset_state` that don't do a deep copy and seem pretty fast (they copy some `Arc`s):

https://github.com/apache/datafusion/blob/ead8209803770773980fafaf0fc622bb606be0ee/datafusion/physical-plan/src/joins/cross_join.rs#L281-L291

The [default](https://github.com/apache/datafusion/blob/ca904b30861c2aa4dd8c1ec261da9268e2f65fe2/datafusion/physical-plan/src/execution_plan.rs#L234-L233) implementation calls `with_new_children`:

```rust
fn reset_state(self: Arc<Self>) -> Result<Arc<dyn ExecutionPlan>> {
    let children = self.children().into_iter().cloned().collect();
    self.with_new_children(children)
}
```

which you identified as being expensive. However, maybe we could just implement an optimized version of `reset_plan_states` for operators like `ProjectionExec` and `GroupByHashExec` 🤔

I think the main challenge is when `ExecutionPlan`s need to coordinate sharing / communication across multiple nodes (e.g. dynamic filters), which is tricky.
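
To make the "optimized reset" idea concrete, here is a rough, self-contained sketch. It does not use DataFusion: `ToyPlan` and `ToyProjection` are hypothetical stand-ins invented for illustration, and the deep copy inside `with_new_children` only simulates an expensive rebuild. The point is just the shape: the default `reset_state` rebuilds the node, while an operator-specific override shares the immutable planned parts via `Arc` and clears only the per-run state.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Toy stand-in for a plan node; NOT DataFusion's real `ExecutionPlan` trait.
trait ToyPlan {
    fn children(&self) -> Vec<Arc<dyn ToyPlan>>;

    /// Rebuilds the node from scratch (the potentially expensive path).
    fn with_new_children(self: Arc<Self>, children: Vec<Arc<dyn ToyPlan>>) -> Arc<dyn ToyPlan>;

    /// Default reset: rebuild the node around its existing children.
    fn reset_state(self: Arc<Self>) -> Arc<dyn ToyPlan> {
        let children = self.children();
        self.with_new_children(children)
    }

    // Accessors used only to inspect results in this sketch.
    fn exprs(&self) -> Arc<Vec<String>>;
    fn rows_emitted(&self) -> usize;
    fn record_rows(&self, n: usize);
}

/// Hypothetical projection-like operator: shareable planned expressions
/// plus a counter that plays the role of per-run execution state.
struct ToyProjection {
    exprs: Arc<Vec<String>>,
    children: Vec<Arc<dyn ToyPlan>>,
    rows_emitted: AtomicUsize,
}

impl ToyProjection {
    fn new(exprs: Vec<String>) -> Arc<Self> {
        Arc::new(Self {
            exprs: Arc::new(exprs),
            children: Vec::new(),
            rows_emitted: AtomicUsize::new(0),
        })
    }
}

impl ToyPlan for ToyProjection {
    fn children(&self) -> Vec<Arc<dyn ToyPlan>> {
        self.children.clone()
    }

    fn with_new_children(self: Arc<Self>, children: Vec<Arc<dyn ToyPlan>>) -> Arc<dyn ToyPlan> {
        // Assumption for illustration: a full rebuild deep-copies the expressions.
        Arc::new(ToyProjection {
            exprs: Arc::new((*self.exprs).clone()),
            children,
            rows_emitted: AtomicUsize::new(0),
        })
    }

    /// Optimized override: bump reference counts, zero the run state.
    fn reset_state(self: Arc<Self>) -> Arc<dyn ToyPlan> {
        Arc::new(ToyProjection {
            exprs: Arc::clone(&self.exprs),
            children: self.children.clone(),
            rows_emitted: AtomicUsize::new(0),
        })
    }

    fn exprs(&self) -> Arc<Vec<String>> {
        Arc::clone(&self.exprs)
    }

    fn rows_emitted(&self) -> usize {
        self.rows_emitted.load(Ordering::Relaxed)
    }

    fn record_rows(&self, n: usize) {
        self.rows_emitted.fetch_add(n, Ordering::Relaxed);
    }
}
```

Calling `Arc::clone(&plan).reset_state()` on a `ToyProjection` yields a node whose counter is zero and whose `exprs` is the same allocation as the original, whereas going through `with_new_children` allocates a fresh copy. The sketch resets a single node only; it does not recurse into children and does not address state shared across nodes, which is the harder case mentioned above.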
