jayzhan211 commented on code in PR #10221:
URL: https://github.com/apache/datafusion/pull/10221#discussion_r1580522622


##########
datafusion/expr/src/type_coercion/binary.rs:
##########
@@ -638,22 +638,26 @@ fn dictionary_coercion(
     preserve_dictionaries: bool,
 ) -> Option<DataType> {
     use arrow::datatypes::DataType::*;
+    let maybe_preserve_dictionaries =
+        |index_type: &Box<DataType>, value_type: DataType| -> DataType {
+            if preserve_dictionaries {
+                Dictionary(index_type.clone(), Box::new(value_type))
+            } else {
+                value_type
+            }
+        };
     match (lhs_type, rhs_type) {
         (
             Dictionary(_lhs_index_type, lhs_value_type),
             Dictionary(_rhs_index_type, rhs_value_type),
         ) => comparison_coercion(lhs_value_type, rhs_value_type),

Review Comment:
   I think what we need to solve the issue is avoiding `casting from value to 
dict for column`, because casting for column is costly compare with casting 
constant.
   
   Given the example, if we preserve dict, we still ends up casting column 
(utf8) to Dict (i32,utf8), but in this case, we can cast the const from i64 to 
utf8 and it is enough.
   
   ```
   statement ok
   create table test as values
     ('row1', arrow_cast(1, 'Utf8')),
     ('row2', arrow_cast(2, 'Utf8')),
     ('row3', arrow_cast(3, 'Utf8'))
   ;
   
   # query using an string '1' which must be coerced into a dictionary string
   query TT
   SELECT * from test where column2 =  arrow_cast(2, 'Dictionary(Int32, 
Int64)');
   ----
   row2 2
   
   query TT
   explain SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32, 
Int64)');
   ----
   logical_plan
   01)Filter: CAST(test.column2 AS Dictionary(Int32, Utf8)) = Dictionary(Int32, 
Utf8("2"))
   02)--TableScan: test projection=[column1, column2]
   physical_plan
   01)CoalesceBatchesExec: target_batch_size=8192
   02)--FilterExec: CAST(column2@1 AS Dictionary(Int32, Utf8)) = 2
   03)----MemoryExec: partitions=1, partition_sizes=[1]
   
   statement ok
   drop table test;
   ```
   
   expect plan
   ```
   01)Filter: test.column2 = Utf8("2")
   02)--TableScan: test projection=[column1, column2]
   physical_plan
   01)CoalesceBatchesExec: target_batch_size=8192
   02)--FilterExec: column2@1 = 2
   03)----MemoryExec: partitions=1, partition_sizes=[1]
   ```
   
   
   I think we should not preserve dict, but need specialization on  comparing 
dict vs non-dict case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to