peter-toth commented on PR #10543:
URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2122829132

   > What do we think about merging this PR and filing a follow on ticket to 
unify the APIs?
   
   I'm ok with merging the current state of the PR. But I was also thinking 
about how to improve it:
   
   As far as I see we have 3 options here:
   1. Adjust the current `TreeNode` implementations to be compatible with the 
more strict `apply_ref()` / `visit_ref()` and then replace the current 
`apply()` and `visit()` implementation to their `..._ref()` versions.
      This change would require:
      1. adding a new method to all `DynTreeNode`s to retrun references to 
their children,
      2. fixing `LogicalPlan::Join` as it only contains the left and right 
expressions in its `on: Vec<(Expr, Expr)>` field and 
`LogicalPlan:apply_expressions()` creates a temporary `Expr::eq` instance on 
the top of the pairs. But creating such temporary objects during visit is not 
compatible with the stricter API so we should change `LogicalPlan::Join` to 
contain `Expr::eq`s directly.
      3. fixing `Expr::Exists`, `Expr::InSubquery` and `Expr::ScalarSubquery` 
as they only contain the `Subquery` strcuts and 
`LogicalPlan::apply_subqueries()` creates a temporary `LogicalPlan::Subquery` 
instance. We should change the 3 expressions to contain 
`LogicalPlan::Subquery`s directly.
   2. Keep the current `apply()` / `visit()` and their `..._ref()` variants 
separate like this PR does. Maybe we could adjust the the implementation to not 
offer `apply_ref()` / `visit_ref()` (instead of their current throwing an error 
behaviour) on `TreeNodes` that doesn't support it (`DynTreeNode`s).
   3. Implement `apply()` / `visit()` in a way that the references used in the 
API closures and `f_down()` / `f_up()` methods  encode if the reference is a 
strict reference (whose lifetime match the root node's lifetime) or a "loose" 
reference whose reference doesn't match (used for returning references to 
temporary node instances).
      I.e. we could introduce a new tree node reference type:
      ```
      pub enum TreeNodeRefImpl<'n, 'a, N: TreeNode> {
          Permanent(&'n N), // reference matches the root node's lifetime: 'n
          Temporary(&'a N), // reference doesn't match the root node's lifetime
      }
      ```
      This change would require to introduce a `TreeNodeRef` trait and moving 
the `apply()` /  `visit()` functionality to there. Both the `TreeNode`'s 
`apply_children()` implementations and the API user's closures and `f_down()` / 
`f_up()` methods would need to take into account if the current `TreeNodeRef` 
is permanent or temporary, which would make the whole thing quite complex...
   
   So far I've been playing with 3. but it became very complex and still 
doesn't fully work. Also, I'm no longer sure that such an API makes sense as 
the API user can't specify what kind of references they want. E.g. in CSE 
(https://github.com/apache/datafusion/pull/10473) I need permanent references 
and can't do anything with other references whose lifetime doesn't match the 
root node's lifetime.
   
   So now I'm working on implementing 1., but it will be a pervasive change as 
there are 56 (`ExecutionPlan`) + 15 (`PhysicalExpr`) `DynTreeNode` 
implementations in the current codebase. They currently offer `fn 
children(&self) -> Vec<Arc<dyn ExecutionPlan>>` and `fn children(&self) -> 
Vec<Arc<dyn PhysicalExpr>>` methods to get their children and now I'm changing 
those to return `Vec<&Arc<dyn ...>>`.
   But if you have a better idea please share.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to