peter-toth commented on code in PR #10832:
URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632225209


##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> {
             return Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump));
         }
 
-        let (up_index, expr_id) = &self.id_array[self.down_index];
+        let (up_index, expr_id, aliased) = &self.id_array[self.down_index];
         self.down_index += 1;
 
+        self.aliased |= aliased;

Review Comment:
   So I was refering to expression trees like: `(((a + b) - c) as "alias1") * 
(a + b)`
   
   I think currently in this case `a + b` gets extracted as a common expression 
and the original expression becomes:
   `(((common_1 as "a + b") - c) as "alias1") * (common_1 as "a + b")`
   (`common_1` is not the actual name of the exracted expression but let's use 
if for the sake of simplicity.)
   
   Now, instead of the above this PR could drop the first added alias (`as  "a 
+ b"`) becuase the first `a + b` has an alias ancestor node (`as "alias1"`), 
but it shouldn't drop the 2nd alias because the 2nd `a + b` doesn't have an 
alias ancestor.
   
   Using a flag (`self.aliased`) caluclated in `f_down()`, to decide if we can 
drop an alias won't work. This is because `f_down()` gets called on all nodes 
in the subtree of the first child (`((a + b) - c) as "alias1"`) of `*` before 
it gets called on the 2nd child (`a + b`). So when `f_down()` encounters the 
2nd `a + b`, `self.aliased` is true and you miss adding a necessary alias.
   
   What I think this PR should do is maintain a stack of booleans and when 
`f_down()` is called push a new boolean item, and when `f_up()` is called pop 
one. A new boolean item should deppend on the previous item and if the current 
`expr` is `Alias`.
   
   (An alternative to maintaing a stack could be to use the `..._with_payload` 
APIs to propagate down some kind of "has alias ancestor" state, but I haven't 
finished adding that API yet: https://github.com/apache/datafusion/pull/8664.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to