Ted-Jiang commented on code in PR #2794: URL: https://github.com/apache/arrow-datafusion/pull/2794#discussion_r907044238
##########
datafusion/expr/src/binary_rule.rs:
##########
@@ -185,6 +186,17 @@ fn comparison_order_coercion(
         .or_else(|| null_coercion(lhs_type, rhs_type))
 }
 
+fn string_numeric_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
+    use arrow::datatypes::DataType::*;
+    match (lhs_type, rhs_type) {

Review Comment:
I tested on `748b6a65a5fa801595fd80a3c7b2728be3c9cdaa` (not this commit):

```
explain select * from part where p_partkey in (1, 2, '3');
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                      |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: #part.p_partkey, #part.p_name, #part.p_mfgr, #part.p_brand, #part.p_type, #part.p_size, #part.p_container, #part.p_retailprice, #part.p_comment                                                                                                               |
|               |   Filter: #part.p_partkey IN ([Int64(1), Int64(2), Utf8("3")])                                                                                                                                                                                                            |
|               |     TableScan: part projection=Some([p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment]), partial_filters=[#part.p_partkey IN ([Int64(1), Int64(2), Utf8("3")])]                                                                  |
| physical_plan | ProjectionExec: expr=[p_partkey@0 as p_partkey, p_name@1 as p_name, p_mfgr@2 as p_mfgr, p_brand@3 as p_brand, p_type@4 as p_type, p_size@5 as p_size, p_container@6 as p_container, p_retailprice@7 as p_retailprice, p_comment@8 as p_comment]                           |
|               |   CoalesceBatchesExec: target_batch_size=4096                                                                                                                                                                                                                             |
|               |     FilterExec: p_partkey@0 IN ([Literal { value: Int64(1) }, Literal { value: Int64(2) }, CastExpr { expr: Literal { value: Utf8("3") }, cast_type: Int64, cast_options: CastOptions { safe: false } }])                                                                 |
|               |       RepartitionExec: partitioning=RoundRobinBatch(16)                                                                                                                                                                                                                   |
|               |         ParquetExec: limit=None, partitions=[/Users/yangjiang/test-data/tpch-1g-oneFile/part/part-00000-3a3c2777-00d3-4c27-b917-4ff2145123dc-c000.snappy.parquet], projection=[p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment] |
|               |                                                                                                                                                                                                                                                                           |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

Here `int, int, utf8` is cast to `int, int, int`. In my opinion, after applying this patch it will instead be cast to `utf8, utf8, utf8`.

When `list_values_size` is large, we construct a hash set (see https://github.com/apache/arrow-datafusion/pull/2156), so I think casting to `int` will give better performance when building the hash set. Am I right? 😄
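
To make the performance question concrete, here is a minimal, self-contained sketch. It does not use DataFusion's or arrow's APIs: `Literal`, `build_int_set`, and `build_utf8_set` are hypothetical names introduced only for illustration. The two functions mimic the two coercion outcomes discussed above (the whole list cast to `Int64` versus the whole list cast to `Utf8`) before a `HashSet` is built.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an IN-list literal; this is not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int64(i64),
    Utf8(String),
}

/// Mimics coercing the list to Int64: strings are parsed as integers, and a
/// string that is not a valid integer makes the whole coercion fail (None).
pub fn build_int_set(list: &[Literal]) -> Option<HashSet<i64>> {
    list.iter()
        .map(|lit| match lit {
            Literal::Int64(v) => Some(*v),
            Literal::Utf8(s) => s.parse::<i64>().ok(),
        })
        .collect()
}

/// Mimics coercing the list to Utf8: every integer is formatted as a string,
/// which allocates one `String` per list element.
pub fn build_utf8_set(list: &[Literal]) -> HashSet<String> {
    list.iter()
        .map(|lit| match lit {
            Literal::Int64(v) => v.to_string(),
            Literal::Utf8(s) => s.clone(),
        })
        .collect()
}
```

For the list `(1, 2, '3')`, `build_int_set` produces a set of three 8-byte integers, while `build_utf8_set` allocates a `String` per element and hashes variable-length bytes. That is the cost the `int` path would presumably avoid; this is a reasoning aid under the assumptions above, not a benchmark.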