peter-toth commented on code in PR #8891:
URL: https://github.com/apache/arrow-datafusion/pull/8891#discussion_r1504778879
##########
datafusion/common/src/tree_node.rs:
##########
@@ -333,35 +530,45 @@ pub trait DynTreeNode {
&self,
arc_self: Arc<Self>,
new_children: Vec<Arc<Self>>,
- ) -> Result<Arc<Self>>;
+ ) -> Result<Transformed<Arc<Self>>>;
}
/// Blanket implementation for Arc for any tye that implements
/// [`DynTreeNode`] (such as [`Arc<dyn PhysicalExpr>`])
impl<T: DynTreeNode + ?Sized> TreeNode for Arc<T> {
/// Apply the closure `F` to the node's children
- fn apply_children<F>(&self, op: &mut F) -> Result<VisitRecursion>
+ fn apply_children<F>(&self, f: &mut F) -> Result<TreeNodeRecursion>
where
- F: FnMut(&Self) -> Result<VisitRecursion>,
+ F: FnMut(&Self) -> Result<TreeNodeRecursion>,
{
+ let mut tnr = TreeNodeRecursion::Continue;
for child in self.arc_children() {
- handle_tree_recursion!(op(&child)?)
+ tnr = f(&child)?;
+ handle_visit_recursion!(tnr)
}
- Ok(VisitRecursion::Continue)
+ Ok(tnr)
}
- fn map_children<F>(self, transform: F) -> Result<Self>
+ fn map_children<F>(self, f: F) -> Result<Transformed<Self>>
where
- F: FnMut(Self) -> Result<Self>,
+ F: FnMut(Self) -> Result<Transformed<Self>>,
{
let children = self.arc_children();
if !children.is_empty() {
- let new_children =
- children.into_iter().map(transform).collect::<Result<_>>()?;
+ let t = children.into_iter().map_till_continue_and_collect(f)?;
+ // TODO: Currently `assert_eq!(t.transformed, t2.transformed)`
fails as
Review Comment:
Let me cc @berkaysynnada here as he opened a PR:
https://github.com/peter-toth/arrow-datafusion/pull/1 to this PR with his
suggestions. In general, let us merge that PR first and then come back your
comments @alamb as some of those will be fixed by the PR.
But regarding this particular comment I think we should discuss how to
handle the 3 `TODO`s I left in the this PR.
In the `map_children()` method of `LogicalPlan`, `Arc<DynTreeNode>` and
`ConcreteTreeNode` we actually have 2 options to calculate and then propagate
up the `transformed` boolean flag in the `Transformed` struct:
1. We apply the transformation closure (`f`) to all children and if any of
the applications returned `transformed=true` we propagate up`true`. "Or"ing the
flags is done in `map_till_continue_and_collect()`. This is actually
`t.transformed` in the above code. As `f` is provided by the API user the
"quality" of `t.transformed` depends if the user filled the `transformed`
correctly.
2. In `LogicalPlan`, `Arc<DynTreeNode>` and `ConcreteTreeNode` we detect if
any change was made. E.g. here in `Arc<DynTreeNode>` this happens in
`self.with_new_arc_children()` where addresses of the old and new children are
compared. (In `LogicalPlan` we compare the children in `map_children()`.) This
is `t2.transformed` in the above code.
(Please note that applying `f` on the current node can also return
`transformed=true` and that also gets "or"ed into the final `transformed` state
in both 1. and 2. but that isn't relevant here.)
So `t.transformed` depends on if the API user while `t2.transformed` depends
on if we can detect the change correctly (e.g. in `Expr` we don't even have
`t2.transformed` as if would be costly to compare expressions).
Now the issue is that currently `t.transformed` and `t2.transformed` are not
equal (if you put that assert back there will be many test failures.)
So the question is if we should always propagate up the `transformed` value
from the API user closures (this PR does that) or we should override it when we
detected changes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]