[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2794: Use coerced type in inlist expr planning

GitBox Mon, 27 Jun 2022 00:09:07 -0700


Ted-Jiang commented on code in PR #2794:
URL: https://github.com/apache/arrow-datafusion/pull/2794#discussion_r907044238



##########
datafusion/expr/src/binary_rule.rs:
##########
@@ -185,6 +186,17 @@ fn comparison_order_coercion(
         .or_else(|| null_coercion(lhs_type, rhs_type))
 }
 
+fn string_numeric_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
+    use arrow::datatypes::DataType::*;
+    match (lhs_type, rhs_type) {

Review Comment:
   I tested on `748b6a65a5fa801595fd80a3c7b2728be3c9cdaa` (not this commit):
   
   ```
   explain select * from part where p_partkey in (1, 2, '3');
   
   +---------------+------------------------------------------------------------
   | plan_type     | plan
   +---------------+------------------------------------------------------------
   | logical_plan  | Projection: #part.p_partkey, #part.p_name, #part.p_mfgr, #part.p_brand, #part.p_type, #part.p_size, #part.p_container, #part.p_retailprice, #part.p_comment
   |               |   Filter: #part.p_partkey IN ([Int64(1), Int64(2), Utf8("3")])
   |               |     TableScan: part projection=Some([p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment]), partial_filters=[#part.p_partkey IN ([Int64(1), Int64(2), Utf8("3")])]
   | physical_plan | ProjectionExec: expr=[p_partkey@0 as p_partkey, p_name@1 as p_name, p_mfgr@2 as p_mfgr, p_brand@3 as p_brand, p_type@4 as p_type, p_size@5 as p_size, p_container@6 as p_container, p_retailprice@7 as p_retailprice, p_comment@8 as p_comment]
   |               |   CoalesceBatchesExec: target_batch_size=4096
   |               |     FilterExec: p_partkey@0 IN ([Literal { value: Int64(1) }, Literal { value: Int64(2) }, CastExpr { expr: Literal { value: Utf8("3") }, cast_type: Int64, cast_options: CastOptions { safe: false } }])
   |               |       RepartitionExec: partitioning=RoundRobinBatch(16)
   |               |         ParquetExec: limit=None, partitions=[/Users/yangjiang/test-data/tpch-1g-oneFile/part/part-00000-3a3c2777-00d3-4c27-b917-4ff2145123dc-c000.snappy.parquet], projection=[p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment]
   |               |
   +---------------+------------------------------------------------------------
   ```
   
   Here `int, int, utf8` is cast to `int, int, int`.
   
   In my opinion, after applying this patch it will instead be `int, int, utf8` cast to `utf8, utf8, utf8`.
   I think that when the list of values is large, we construct a HashSet (see https://github.com/apache/arrow-datafusion/pull/2156), so coercing to `int` should give better performance when building the HashSet. Am I right? 😄
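
To make the trade-off in the comment above concrete, here is a minimal sketch of the two coercion directions for an IN list containing `1, 2, '3'`. It uses only the Rust standard library. `Lit`, `build_int_set`, and `build_string_set` are hypothetical names for illustration, not DataFusion or Arrow APIs, and returning `None` on an unparsable string is an assumption made for the sketch, not a claim about how the real cast behaves.

```rust
use std::collections::HashSet;

/// Toy stand-in for a literal in an IN list (hypothetical; not DataFusion's ScalarValue).
#[derive(Debug, Clone, PartialEq)]
enum Lit {
    Int64(i64),
    Utf8(String),
}

/// Direction A: coerce every list value to Int64 by parsing string literals.
/// Returns None if any string literal is not a valid integer (an assumption of this sketch).
fn build_int_set(list: &[Lit]) -> Option<HashSet<i64>> {
    list.iter()
        .map(|v| match v {
            Lit::Int64(i) => Some(*i),
            Lit::Utf8(s) => s.parse::<i64>().ok(),
        })
        .collect()
}

/// Direction B: coerce every list value to Utf8 by formatting integers as strings.
fn build_string_set(list: &[Lit]) -> HashSet<String> {
    list.iter()
        .map(|v| match v {
            Lit::Int64(i) => i.to_string(),
            Lit::Utf8(s) => s.clone(),
        })
        .collect()
}
```

With direction A the set holds fixed-width `i64` keys, so an integer column can be probed directly. With direction B every list value becomes a `String`, and an integer column would presumably also need to be converted to strings before probing, which is the extra cost the comment is asking about.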



