peter-toth commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2122829132
> What do we think about merging this PR and filing a follow on ticket to unify the APIs? I'm ok with merging the current state of the PR. But I was also thinking about how to improve it: As far as I see we have 3 options here: 1. Adjust the current `TreeNode` implementations to be compatible with the more strict `apply_ref()` / `visit_ref()` and then replace the current `apply()` and `visit()` implementation to their `..._ref()` versions. This change would require: 1. adding a new method to all `DynTreeNode`s to retrun references to their children, 2. fixing `LogicalPlan::Join` as it only contains the left and right expressions in its `on: Vec<(Expr, Expr)>` field and `LogicalPlan:apply_expressions()` creates a temporary `Expr::eq` instance on the top of the pairs. But creating such temporary objects during visit is not compatible with the stricter API so we should change `LogicalPlan::Join` to contain `Expr::eq`s directly. 3. fixing `Expr::Exists`, `Expr::InSubquery` and `Expr::ScalarSubquery` as they only contain the `Subquery` strcuts and `LogicalPlan::apply_subqueries()` creates a temporary `LogicalPlan::Subquery` instance. We should change the 3 expressions to contain `LogicalPlan::Subquery`s directly. 2. Keep the current `apply()` / `visit()` and their `..._ref()` variants separate like this PR does. Maybe we could adjust the the implementation to not offer `apply_ref()` / `visit_ref()` (instead of their current throwing an error behaviour) on `TreeNodes` that doesn't support it (`DynTreeNode`s). 3. Implement `apply()` / `visit()` in a way that the references used in the API closures and `f_down()` / `f_up()` methods encode if the reference is a strict reference (whose lifetime match the root node's lifetime) or a "loose" reference whose reference doesn't match (used for returning references to temporary node instances). I.e. we could introduce a new tree node reference type: ``` pub enum TreeNodeRefImpl<'n, 'a, N: TreeNode> { Permanent(&'n N), // reference matches the root node's lifetime: 'n Temporary(&'a N), // reference doesn't match the root node's lifetime } ``` This change would require to introduce a `TreeNodeRef` trait and moving the `apply()` / `visit()` functionality to there. Both the `TreeNode`'s `apply_children()` implementations and the API user's closures and `f_down()` / `f_up()` methods would need to take into account if the current `TreeNodeRef` is permanent or temporary, which would make the whole thing quite complex... So far I've been playing with 3. but it became very complex and still doesn't fully work. Also, I'm no longer sure that such an API makes sense as the API user can't specify what kind of references they want. E.g. in CSE (https://github.com/apache/datafusion/pull/10473) I need permanent references and can't do anything with other references whose lifetime doesn't match the root node's lifetime. So now I'm working on implementing 1., but it will be a pervasive change as there are 56 (`ExecutionPlan`) + 15 (`PhysicalExpr`) `DynTreeNode` implementations in the current codebase. They currently offer `fn children(&self) -> Vec<Arc<dyn ExecutionPlan>>` and `fn children(&self) -> Vec<Arc<dyn PhysicalExpr>>` methods to get their children and now I'm changing those to return `Vec<&Arc<dyn ...>>`. But if you have a better idea please share. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org