Re: [PR] Consolidate `TreeNode` transform and rewrite APIs [arrow-datafusion]

via GitHub Tue, 27 Feb 2024 11:10:51 -0800


peter-toth commented on code in PR #8891:
URL: https://github.com/apache/arrow-datafusion/pull/8891#discussion_r1504778879



##########
datafusion/common/src/tree_node.rs:
##########
@@ -333,35 +530,45 @@ pub trait DynTreeNode {
         &self,
         arc_self: Arc<Self>,
         new_children: Vec<Arc<Self>>,
-    ) -> Result<Arc<Self>>;
+    ) -> Result<Transformed<Arc<Self>>>;
 }
 
 /// Blanket implementation for Arc for any tye that implements
 /// [`DynTreeNode`] (such as [`Arc<dyn PhysicalExpr>`])
 impl<T: DynTreeNode + ?Sized> TreeNode for Arc<T> {
     /// Apply the closure `F` to the node's children
-    fn apply_children<F>(&self, op: &mut F) -> Result<VisitRecursion>
+    fn apply_children<F>(&self, f: &mut F) -> Result<TreeNodeRecursion>
     where
-        F: FnMut(&Self) -> Result<VisitRecursion>,
+        F: FnMut(&Self) -> Result<TreeNodeRecursion>,
     {
+        let mut tnr = TreeNodeRecursion::Continue;
         for child in self.arc_children() {
-            handle_tree_recursion!(op(&child)?)
+            tnr = f(&child)?;
+            handle_visit_recursion!(tnr)
         }
-        Ok(VisitRecursion::Continue)
+        Ok(tnr)
     }
 
-    fn map_children<F>(self, transform: F) -> Result<Self>
+    fn map_children<F>(self, f: F) -> Result<Transformed<Self>>
     where
-        F: FnMut(Self) -> Result<Self>,
+        F: FnMut(Self) -> Result<Transformed<Self>>,
     {
         let children = self.arc_children();
         if !children.is_empty() {
-            let new_children =
-                children.into_iter().map(transform).collect::<Result<_>>()?;
+            let t = children.into_iter().map_till_continue_and_collect(f)?;
+            // TODO: Currently `assert_eq!(t.transformed, t2.transformed)` 
fails as

Review Comment:
   Let me cc @berkaysynnada here as he opened a PR: 
https://github.com/peter-toth/arrow-datafusion/pull/1 to this PR with his 
suggestions. In general, let us merge that PR first and then come back your 
comments @alamb as some of those will be fixed by the PR.
   
   But regarding this particular comment I think we should discuss how to 
handle the 3 `TODO`s I left in the this PR.
   In the `map_children()` method of `LogicalPlan`, `Arc<DynTreeNode>` and 
`ConcreteTreeNode` we actually have 2 options to calculate and then propagate 
up the `transformed` boolean flag in the `Transformed` struct:
   
   1. We apply the transformation closure (`f`) to all children and if any of 
the applications returned `transformed=true` we propagate up`true`. "Or"ing the 
flags is done in `map_till_continue_and_collect()`. This is actually 
`t.transformed` in the above code. As `f` is provided by the API user the 
"quality" of `t.transformed` depends if the user filled the `transformed` 
correctly.
   
   2. In `LogicalPlan`, `Arc<DynTreeNode>` and `ConcreteTreeNode` we detect if 
any change was made. E.g. here in `Arc<DynTreeNode>` this happens in 
`self.with_new_arc_children()` where addresses of the old and new children are 
compared. (In `LogicalPlan` we compare the children in `map_children()`.) This 
is `t2.transformed` in the above code.
   
   (Please note that applying `f` on the current node can also return 
`transformed=true` and that also gets "or"ed into the final `transformed` state 
in both 1. and 2. but that isn't relevant here.)
   
   So `t.transformed` depends on if the API user while `t2.transformed` depends 
on if we can detect the change correctly (e.g. in `Expr` we don't even have 
`t2.transformed` as if would be costly to compare expressions).
   
   Now the issue is that currently `t.transformed` and `t2.transformed` are not 
equal (if you put that assert back there will be many test failures.)
   
   So the question is if we should always propagate up the `transformed` value 
from the API user closures (this PR does that) or we should override it when we 
detected changes?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Consolidate `TreeNode` transform and rewrite APIs [arrow-datafusion]

Reply via email to