jayzhan211 commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1580522622
########## datafusion/expr/src/type_coercion/binary.rs: ########## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option<DataType> { use arrow::datatypes::DataType::*; + let maybe_preserve_dictionaries = + |index_type: &Box<DataType>, value_type: DataType| -> DataType { + if preserve_dictionaries { + Dictionary(index_type.clone(), Box::new(value_type)) + } else { + value_type + } + }; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: I think what we need to solve the issue is avoiding `casting from value to dict for column`, because casting for column is costly compare with casting constant. Given the example, if we preserve dict, we still ends up casting column (utf8) to Dict (i32,utf8), but in this case, we can cast the const from i64 to utf8 and it is enough. ``` statement ok create table test as values ('row1', arrow_cast(1, 'Utf8')), ('row2', arrow_cast(2, 'Utf8')), ('row3', arrow_cast(3, 'Utf8')) ; # query using an string '1' which must be coerced into a dictionary string query TT SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32, Int64)'); ---- row2 2 query TT explain SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32, Int64)'); ---- logical_plan 01)Filter: CAST(test.column2 AS Dictionary(Int32, Utf8)) = Dictionary(Int32, Utf8("2")) 02)--TableScan: test projection=[column1, column2] physical_plan 01)CoalesceBatchesExec: target_batch_size=8192 02)--FilterExec: CAST(column2@1 AS Dictionary(Int32, Utf8)) = 2 03)----MemoryExec: partitions=1, partition_sizes=[1] statement ok drop table test; ``` expect plan ``` 01)Filter: test.column2 = Utf8("2") 02)--TableScan: test projection=[column1, column2] physical_plan 01)CoalesceBatchesExec: target_batch_size=8192 02)--FilterExec: column2@1 = 2 03)----MemoryExec: partitions=1, partition_sizes=[1] ``` I think we should not preserve dict, but need specialization on comparing dict vs non-dict case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org