jayzhan211 commented on code in PR #10221:
URL: https://github.com/apache/datafusion/pull/10221#discussion_r1580522622
##########
datafusion/expr/src/type_coercion/binary.rs:
##########
@@ -638,22 +638,26 @@ fn dictionary_coercion(
preserve_dictionaries: bool,
) -> Option<DataType> {
use arrow::datatypes::DataType::*;
+ let maybe_preserve_dictionaries =
+ |index_type: &Box<DataType>, value_type: DataType| -> DataType {
+ if preserve_dictionaries {
+ Dictionary(index_type.clone(), Box::new(value_type))
+ } else {
+ value_type
+ }
+ };
match (lhs_type, rhs_type) {
(
Dictionary(_lhs_index_type, lhs_value_type),
Dictionary(_rhs_index_type, rhs_value_type),
) => comparison_coercion(lhs_value_type, rhs_value_type),
Review Comment:
I think what we need to solve the issue is avoiding `casting from value to
dict for column`, because casting for column is costly compare with casting
constant.
Given the example, if we preserve dict, we still ends up casting column
(utf8) to Dict (i32,utf8), but in this case, we can cast the const from i64 to
utf8 and it is enough.
```
statement ok
create table test as values
('row1', arrow_cast(1, 'Utf8')),
('row2', arrow_cast(2, 'Utf8')),
('row3', arrow_cast(3, 'Utf8'))
;
# query using an string '1' which must be coerced into a dictionary string
query TT
SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32,
Int64)');
----
row2 2
query TT
explain SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32,
Int64)');
----
logical_plan
01)Filter: CAST(test.column2 AS Dictionary(Int32, Utf8)) = Dictionary(Int32,
Utf8("2"))
02)--TableScan: test projection=[column1, column2]
physical_plan
01)CoalesceBatchesExec: target_batch_size=8192
02)--FilterExec: CAST(column2@1 AS Dictionary(Int32, Utf8)) = 2
03)----MemoryExec: partitions=1, partition_sizes=[1]
statement ok
drop table test;
```
expect plan
```
01)Filter: test.column2 = Utf8("2")
02)--TableScan: test projection=[column1, column2]
physical_plan
01)CoalesceBatchesExec: target_batch_size=8192
02)--FilterExec: column2@1 = 2
03)----MemoryExec: partitions=1, partition_sizes=[1]
```
I think we should not preserve dict, but need specialization on comparing
dict vs non-dict case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]