alamb commented on code in PR #17028: URL: https://github.com/apache/datafusion/pull/17028#discussion_r2252606364
########## datafusion/physical-plan/src/execution_plan.rs: ########## @@ -194,6 +194,31 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>>; + /// Reset any internal state within this [`ExecutionPlan`]. + /// + /// This method is called when an [`ExecutionPlan`] needs to be re-executed, + /// such as in recursive queries. Unlike [`ExecutionPlan::with_new_children`], this method + /// ensures that any stateful components (e.g., [`DynamicFilterPhysicalExpr`]) + /// are reset to their initial state. + /// + /// The default implementation simply calls [`ExecutionPlan::with_new_children`] with the existing children, + /// effectively creating a new instance of the [`ExecutionPlan`] with the same children but without + /// necessarily resetting any internal state. Implementations that require resetting of some + /// internal state should override this method to provide the necessary logic. + /// + /// This method should *not* reset state recursively for children, as it is expected that + /// it will be called from within a walk of the execution plan tree so that it will be called on each child later + /// or was already called on each child. + /// + /// Note to implementers: unlike [`ExecutionPlan::with_new_children`] this method does not accept new children as an argument, + /// thus it is expected that any cached plan properties will remain valid after the reset. + /// + /// [`DynamicFilterPhysicalExpr`]: datafusion_physical_expr::expressions::DynamicFilterPhysicalExpr Review Comment: I wonder if we should also add a note to DynamicFilterPhysicalExpr saying any ExecutionPlan that uses them should also implement `reset_state` ########## datafusion/physical-plan/src/sorts/sort.rs: ########## @@ -1116,10 +1127,46 @@ impl ExecutionPlan for SortExec { self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>> { - let mut new_sort = SortExec::new(self.expr.clone(), Arc::clone(&children[0])) - .with_fetch(self.fetch) - .with_preserve_partitioning(self.preserve_partitioning); - new_sort.filter = self.filter.clone(); + let mut new_sort = self.cloned(); + assert!( + children.len() == 1, + "SortExec should have exactly one child" + ); + new_sort.input = Arc::clone(&children[0]); + // Recompute the properties based on the new input since they may have changed. + let (cache, sort_prefix) = Self::compute_properties( + &new_sort.input, + new_sort.expr.clone(), + new_sort.preserve_partitioning, + ) + .expect(concat!( + "Safety: we had already been calling `compute_properties(...).unwrap()` in `new()` ", + "and it seems to be okay", + "\n", + "We assumed that doing the same thing here directly instead ", + "of calling `new()` (as we did before this commit) is also okay but it's possible that ", + "implementations have drifted and this is no longer safe even if `new()` still works, ", + "for example if `new()` now does something different than just calling `compute_properties(...).unwrap()`", + "\n", + "This is clearly a bug, please report it!" Review Comment: I think we should return [`DatafusionError::Internal`](https://docs.rs/datafusion/latest/datafusion/common/enum.DataFusionError.html#variant.Internal) here rather than panic'ing as it is a better UX (if you want fail fast in debug builds, perhaps you could add a `debug_assert`) I also recommend converting the explanation into comments and leaving the panic message like "Internal inconsistency in SortExec" The rationale is that if a user sees this message it is not going to mean anything to them and they can't fix it, and this text will obscure the conclusion (this is a bug they can not do anything to fix). A developer will come to the source location and can read the comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org