alex-natzka commented on issue #4575:
URL:
https://github.com/apache/arrow-datafusion/issues/4575#issuecomment-1348387667
Hi, I'm afraid I won't have time to look into this. I did go over the error
message, though, and here's what I think is happening:
- I think some optimization rule other than
`common_sub_expression_eliminate` transforms the filter
`t2.t2_int < Int64(10) OR t1.t1_int > Int64(2) AND t2.t2_name != Utf8("w")`
into
`(t2.t2_int < UInt32(10) OR t1.t1_int > UInt32(2)) AND (t2.t2_int < UInt32(10) OR t2.t2_name != Utf8("w"))`,
which is weird but technically not wrong: `AND` binds tighter than `OR`, so
this is just the distributive law `A OR (B AND C) = (A OR B) AND (A OR C)`.
- Then `common_sub_expression_eliminate` sees that `t2.t2_int < UInt32(10)`
occurs twice, so it adds a projection before the filter in which this
expression is computed once. (This is correct and expected.) The projection
creates the additional column `t2.t2_int < UInt32(10)UInt32(10)t2.t2_int` as
an alias for `t2.t2_int < UInt32(10)`.
- This changes the output schema, because nothing projects the extra column
out again.
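The first two steps can be reproduced with a toy model. This is a sketch only: it uses a hypothetical miniature `Expr` enum (not DataFusion's real `Expr`), treats each comparison as an opaque boolean leaf, and ignores the `Int64`/`UInt32` literal types and NULLs (the distributive law also holds under SQL's three-valued logic). The helpers `pred`, `and`, `or`, `distribute` and `count_subexprs` are made up for illustration. It applies the rewrite, brute-forces all truth assignments to confirm equivalence, and counts subexpressions the way a CSE pass might, which shows `t2.t2_int < 10` is the one duplicate.

```rust
use std::collections::HashMap;

/// Hypothetical miniature boolean expression (NOT DataFusion's `Expr`).
/// Leaf predicates are opaque strings such as "t2.t2_int < 10".
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Pred(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

use Expr::*;

fn pred(s: &str) -> Expr {
    Pred(s.to_string())
}
fn and(a: Expr, b: Expr) -> Expr {
    And(Box::new(a), Box::new(b))
}
fn or(a: Expr, b: Expr) -> Expr {
    Or(Box::new(a), Box::new(b))
}

/// Distribute OR over AND at the top level:
/// A OR (B AND C)  =>  (A OR B) AND (A OR C)
fn distribute(e: &Expr) -> Expr {
    if let Or(a, rhs) = e {
        if let And(b, c) = &**rhs {
            let a = (**a).clone();
            return and(or(a.clone(), (**b).clone()), or(a, (**c).clone()));
        }
    }
    e.clone()
}

/// Evaluate with each leaf's truth value looked up in `env`.
fn eval(e: &Expr, env: &HashMap<String, bool>) -> bool {
    match e {
        Pred(p) => env[p],
        And(a, b) => eval(a, env) && eval(b, env),
        Or(a, b) => eval(a, env) || eval(b, env),
    }
}

/// Count every subexpression (leaves included), as a CSE pass might.
fn count_subexprs(e: &Expr, counts: &mut HashMap<Expr, usize>) {
    *counts.entry(e.clone()).or_insert(0) += 1;
    match e {
        Pred(_) => {}
        And(a, b) | Or(a, b) => {
            count_subexprs(a, counts);
            count_subexprs(b, counts);
        }
    }
}

fn main() {
    let (a, b, c) = ("t2.t2_int < 10", "t1.t1_int > 2", "t2.t2_name != 'w'");
    // AND binds tighter than OR, so the original filter is A OR (B AND C).
    let original = or(pred(a), and(pred(b), pred(c)));
    let rewritten = distribute(&original);

    // Brute-force check over all 8 truth assignments of A, B, C.
    for bits in 0..8u8 {
        let env: HashMap<String, bool> = [a, b, c]
            .iter()
            .enumerate()
            .map(|(i, p)| (p.to_string(), ((bits >> i) & 1) == 1))
            .collect();
        assert_eq!(eval(&original, &env), eval(&rewritten, &env));
    }

    // Only A occurs more than once in (A OR B) AND (A OR C).
    let mut counts = HashMap::new();
    count_subexprs(&rewritten, &mut counts);
    let dupes: Vec<&Expr> = counts.iter().filter(|&(_, &n)| n > 1).map(|(e, _)| e).collect();
    println!("duplicated subexpressions: {:?}", dupes);
    assert_eq!(dupes, vec![&pred(a)]);
}
```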
The last point is the problem IMO. I guess `common_sub_expression_eliminate`
would need to add another projection _after_ the filter that gets rid of the
extra column.
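The schema problem and the suggested fix can be modelled the same way. In this sketch a hypothetical `Plan` enum tracks only output column names (it is not DataFusion's `LogicalPlan`), and a single `Scan` stands in for the join input. The hypothetical `cse_filter` mimics what the comment describes: it puts a projection that adds the alias column below the filter and, only when `restore_schema` is true, adds another projection after the filter that drops it again.

```rust
/// Hypothetical, heavily simplified logical plan: every node only knows the
/// names of its output columns. This is NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone)]
enum Plan {
    Scan { columns: Vec<String> },
    /// Outputs exactly the listed columns (including any aliases).
    Projection { input: Box<Plan>, columns: Vec<String> },
    /// A filter passes its input schema through unchanged.
    Filter { input: Box<Plan>, predicate: String },
}

impl Plan {
    fn schema(&self) -> Vec<String> {
        match self {
            Plan::Scan { columns } | Plan::Projection { columns, .. } => columns.clone(),
            Plan::Filter { input, .. } => input.schema(),
        }
    }
}

/// Hypothetical model of CSE on a filter over `input`: a projection below the
/// filter computes the shared expression once under `alias`, and the filter
/// uses `predicate` (which refers to that alias). If `restore_schema` is true,
/// a projection after the filter drops the alias column again.
fn cse_filter(input: Plan, alias: &str, predicate: &str, restore_schema: bool) -> Plan {
    let original = input.schema();
    let mut with_alias = original.clone();
    with_alias.push(alias.to_string());

    let filter = Plan::Filter {
        input: Box::new(Plan::Projection { input: Box::new(input), columns: with_alias }),
        predicate: predicate.to_string(),
    };

    if restore_schema {
        Plan::Projection { input: Box::new(filter), columns: original }
    } else {
        filter
    }
}

fn main() {
    let scan = Plan::Scan {
        columns: vec!["t1.t1_int".into(), "t2.t2_int".into(), "t2.t2_name".into()],
    };
    let alias = "t2.t2_int < UInt32(10)UInt32(10)t2.t2_int";
    // Illustrative predicate text only; `alias` stands for the CSE column.
    let predicate = "(alias OR t1.t1_int > 2) AND (alias OR t2.t2_name != 'w')";
    let expected = scan.schema();

    // Without the trailing projection the alias column leaks into the output.
    let buggy = cse_filter(scan.clone(), alias, predicate, false);
    assert_eq!(buggy.schema().len(), expected.len() + 1);
    assert_ne!(buggy.schema(), expected);

    // With it, the output schema matches the original again.
    let fixed = cse_filter(scan, alias, predicate, true);
    assert_eq!(fixed.schema(), expected);
    println!("{:?}", fixed.schema());
}
```

With `restore_schema` set to false the alias column shows up in the output schema, which is the mismatch described above; the trailing projection restores the original column list.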
--
This is an automated message from the Apache Git Service.