[GitHub] [arrow-datafusion] alex-natzka commented on issue #4575: `common_sub_expression_eliminate` exists bug

GitBox Tue, 13 Dec 2022 04:06:23 -0800


alex-natzka commented on issue #4575:
URL: https://github.com/apache/arrow-datafusion/issues/4575#issuecomment-1348387667


   Hi, I'm afraid I won't have time to look into this. I did go over the error 
message though, and here's what I think is happening:
   
   - I think some optimization rule other than `common_sub_expression_eliminate` transforms the filter
     ```t2.t2_int < Int64(10) OR t1.t1_int > Int64(2) AND t2.t2_name != Utf8("w")```
     into
     ```(t2.t2_int < UInt32(10) OR t1.t1_int > UInt32(2)) AND (t2.t2_int < UInt32(10) OR t2.t2_name != Utf8("w"))```
     which is weird but technically not wrong: it just distributes the OR over the AND.
   - Then `common_sub_expression_eliminate` sees that `t2.t2_int < UInt32(10)` 
occurs twice, so it adds a projection before the filter where this expression 
gets computed. (This is correct and expected.) The projection creates the 
additional column `t2.t2_int < UInt32(10)UInt32(10)t2.t2_int` as an alias for 
`t2.t2_int < UInt32(10)`.
   - This changes the output schema of the filter, because the extra column is 
never projected out again.
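
For what it's worth, the rewrite in the first bullet can be sanity-checked with a small truth table. This is only a sketch in plain Python, not DataFusion code: `sql_or`, `sql_and`, `original` and `rewritten` are hypothetical helpers, and `None` stands in for SQL NULL so that three-valued logic is covered as well.

```python
from itertools import product

# SQL truth values; None stands in for NULL (assumption of this sketch).
VALUES = (True, False, None)

def sql_or(a, b):
    # Three-valued OR: TRUE wins, otherwise NULL is contagious.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def sql_and(a, b):
    # Three-valued AND: FALSE wins, otherwise NULL is contagious.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def original(a, b, c):
    # a OR b AND c, with AND binding tighter than OR.
    return sql_or(a, sql_and(b, c))

def rewritten(a, b, c):
    # (a OR b) AND (a OR c), the form seen after the rewrite.
    return sql_and(sql_or(a, b), sql_or(a, c))

# Collect every input combination on which the two forms disagree.
mismatches = [
    (a, b, c)
    for a, b, c in product(VALUES, repeat=3)
    if original(a, b, c) is not rewritten(a, b, c)
]
print(mismatches)
```

Here `a`, `b` and `c` play the roles of `t2.t2_int < 10`, `t1.t1_int > 2` and `t2.t2_name != "w"`. An empty `mismatches` list means the two filters accept exactly the same rows.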
   
   The last point is the problem IMO. I guess `common_sub_expression_eliminate` 
would need to add another projection _after_ the filter that gets rid of the 
extra column.
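
To illustrate the idea, here is a toy model of the plan shapes in plain Python. It is a sketch under loud assumptions: `Scan`, `Projection`, `Filter`, `cse_without_cleanup` and `cse_with_cleanup` are hypothetical names made up for this example, not DataFusion's API, and the three-column scan is invented; only the alias string and the predicate are taken from the plan above.

```python
from dataclasses import dataclass

# Alias of the extracted common subexpression, as it appears in the plan.
CSE_ALIAS = "t2.t2_int < UInt32(10)UInt32(10)t2.t2_int"

@dataclass
class Scan:
    columns: list

    def schema(self):
        return list(self.columns)

@dataclass
class Projection:
    child: object
    columns: list

    def schema(self):
        # A projection outputs exactly the columns it lists.
        return list(self.columns)

@dataclass
class Filter:
    child: object
    predicate: str

    def schema(self):
        # A filter drops rows, never columns: it passes its input schema through.
        return self.child.schema()

def cse_without_cleanup(plan):
    # Compute the shared expression once in a projection below the filter.
    pre = Projection(plan.child, plan.child.schema() + [CSE_ALIAS])
    return Filter(pre, plan.predicate)

def cse_with_cleanup(plan):
    # Same rewrite, plus a projection above the filter restoring the old schema.
    return Projection(cse_without_cleanup(plan), plan.schema())

# Hypothetical input columns, for illustration only.
scan = Scan(["t1.t1_int", "t2.t2_int", "t2.t2_name"])
original = Filter(
    scan,
    '(t2.t2_int < UInt32(10) OR t1.t1_int > UInt32(2)) '
    'AND (t2.t2_int < UInt32(10) OR t2.t2_name != Utf8("w"))',
)

leaky = cse_without_cleanup(original)
fixed = cse_with_cleanup(original)
print(leaky.schema())
print(fixed.schema())
```

In this model `leaky.schema()` carries the alias column that the original filter never had, while `fixed.schema()` matches `original.schema()` again.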


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

