alamb commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2077771173
> I updated the tests, but I'm still unsure if it's okay to ignore this change in behavior. It seems like this could be a regression in other cases where we might be converting things to dictionaries for no reason. I think the reason to convert to a dictonary is that the other side of the comparsion is already a dictionary, which thus avoids convering *both* arguments (though I may be missing something) Maybe @viirya has some thoughts given he worked on coercion as part of https://github.com/apache/datafusion/pull/9459 recently > It would be nice to have some issue tracking this, or maybe there is one already. Perhaps https://github.com/apache/datafusion/issues/5928 ? There appear to be a bunch of open tickets about coercion https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+coercion > A possibly easy change to make in the short-term is to refactor away from using `comparison_coercion` for non-comparisons and instead use coercion logic specifically tailored to those expressions. This seems reasonable to me (aka have special rules for `coerce`). Other functions like `BETWEEN` I think can be thought of as syntactic sugar for comparisons (namely `x > low AND x < high`) > I just feels like we're playing Jenga with type coercion logic right now. The coercion logic should prioritize avoiding expensive casts, I think. The key to ensuring we don't break existing code is to rely on the tests. I agree the type coercion logic could be improved -- specifically I think it needs to have some encapsulation rather than being spread out through a bunch of free functions. Is this something you are interested in helping out with? I think the first thing to do would be to try and explain how the current logic works (and in that process we will likely uncover ways to make it better). Then, would improve the code the structure to encapsulate the logic (into structs / enums perhaps -- like I did recently in https://github.com/apache/datafusion/pull/10216). Once the logic is more encapsulated, then I think we'll be able to reason about it and feel good about how it fits into the overall picture I think the difference in casting a scalar integer to a scalar dictionary is neglible. The difference casting a column to a different type is likely subtantial (though casting string --> dictionary doesn't require any data copying) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org