erratic-pattern commented on code in PR #10221:
URL: https://github.com/apache/datafusion/pull/10221#discussion_r1581447540
##########
datafusion/expr/src/type_coercion/binary.rs:
##########
@@ -638,22 +638,26 @@ fn dictionary_coercion(
preserve_dictionaries: bool,
) -> Option<DataType> {
use arrow::datatypes::DataType::*;
+ let maybe_preserve_dictionaries =
+ |index_type: &Box<DataType>, value_type: DataType| -> DataType {
+ if preserve_dictionaries {
+ Dictionary(index_type.clone(), Box::new(value_type))
+ } else {
+ value_type
+ }
+ };
match (lhs_type, rhs_type) {
(
Dictionary(_lhs_index_type, lhs_value_type),
Dictionary(_rhs_index_type, rhs_value_type),
) => comparison_coercion(lhs_value_type, rhs_value_type),
Review Comment:
I agree with @jayzhan211 that we probably don't want to cast everything to
dictionaries in the way that we are currently doing it, and what we really want
is a way for the optimizer to avoid expensive casts of dictionary columns, and
more generally to just avoid column casts in favor of casting constants and
scalar expressions.
I think what we have works fine for now and fixes performance issues we're
seeing on dictionary columns, but should be improved for the general case in
subsequent PRs that redesign the type coercion logic.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]