peter-toth commented on PR #10543:
URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2122829132
> What do we think about merging this PR and filing a follow on ticket to
unify the APIs?
I'm ok with merging the current state of the PR. But I was also thinking
about how to improve it:
As far as I see we have 3 options here:
1. Adjust the current `TreeNode` implementations to be compatible with the
more strict `apply_ref()` / `visit_ref()` and then replace the current
`apply()` and `visit()` implementation to their `..._ref()` versions.
This change would require:
1. adding a new method to all `DynTreeNode`s to retrun references to
their children,
2. fixing `LogicalPlan::Join` as it only contains the left and right
expressions in its `on: Vec<(Expr, Expr)>` field and
`LogicalPlan:apply_expressions()` creates a temporary `Expr::eq` instance on
the top of the pairs. But creating such temporary objects during visit is not
compatible with the stricter API so we should change `LogicalPlan::Join` to
contain `Expr::eq`s directly.
3. fixing `Expr::Exists`, `Expr::InSubquery` and `Expr::ScalarSubquery`
as they only contain the `Subquery` strcuts and
`LogicalPlan::apply_subqueries()` creates a temporary `LogicalPlan::Subquery`
instance. We should change the 3 expressions to contain
`LogicalPlan::Subquery`s directly.
2. Keep the current `apply()` / `visit()` and their `..._ref()` variants
separate like this PR does. Maybe we could adjust the the implementation to not
offer `apply_ref()` / `visit_ref()` (instead of their current throwing an error
behaviour) on `TreeNodes` that doesn't support it (`DynTreeNode`s).
3. Implement `apply()` / `visit()` in a way that the references used in the
API closures and `f_down()` / `f_up()` methods encode if the reference is a
strict reference (whose lifetime match the root node's lifetime) or a "loose"
reference whose reference doesn't match (used for returning references to
temporary node instances).
I.e. we could introduce a new tree node reference type:
```
pub enum TreeNodeRefImpl<'n, 'a, N: TreeNode> {
Permanent(&'n N), // reference matches the root node's lifetime: 'n
Temporary(&'a N), // reference doesn't match the root node's lifetime
}
```
This change would require to introduce a `TreeNodeRef` trait and moving
the `apply()` / `visit()` functionality to there. Both the `TreeNode`'s
`apply_children()` implementations and the API user's closures and `f_down()` /
`f_up()` methods would need to take into account if the current `TreeNodeRef`
is permanent or temporary, which would make the whole thing quite complex...
So far I've been playing with 3. but it became very complex and still
doesn't fully work. Also, I'm no longer sure that such an API makes sense as
the API user can't specify what kind of references they want. E.g. in CSE
(https://github.com/apache/datafusion/pull/10473) I need permanent references
and can't do anything with other references whose lifetime doesn't match the
root node's lifetime.
So now I'm working on implementing 1., but it will be a pervasive change as
there are 56 (`ExecutionPlan`) + 15 (`PhysicalExpr`) `DynTreeNode`
implementations in the current codebase. They currently offer `fn
children(&self) -> Vec<Arc<dyn ExecutionPlan>>` and `fn children(&self) ->
Vec<Arc<dyn PhysicalExpr>>` methods to get their children and now I'm changing
those to return `Vec<&Arc<dyn ...>>`.
But if you have a better idea please share.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]