mongo360 opened a new pull request, #35200: URL: https://github.com/apache/doris/pull/35200
Description: The sql execute much slow when the literal value with string format in `in predicate`; and the real data is integral type。 ``` mysql> set enable_nereids_planner = false; Query OK, 0 rows affected (0.03 sec) mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10; +------------+---------------+ | id | sum(`clicks`) | +------------+---------------+ | 787934713 | 2838 | | 306960695 | 339 | +------------+---------------+ 2 rows in set (1.81 sec) mysql> set enable_nereids_planner = true; Query OK, 0 rows affected (0.02 sec) mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10; +------------+-------------+ | id | sum(clicks) | +------------+-------------+ | 787934713 | 2838 | | 306960695 | 339 | +------------+-------------+ 2 rows in set (28.14 sec) ``` Reason: In legacy planner, the string literal with convert to integral value, but in the nereids planner do not do this convert and with do string matching in BE。 Solved: do process string literal with numeric in `in predicate` like in `comparison predicate`; test table: ``` create table a_table( k1 BIGINT NOT NULL, k2 VARCHAR(100) NOT NULL, v1 INT SUM NULL DEFAULT "0" ) ENGINE=OLAP AGGREGATE KEY(k1,k2) distributed BY hash(k1) buckets 2 properties("replication_num" = "1"); insert into a_table values (10, 'name1', 10),(20, 'name2', 10); explain plan select * from a_table where k1 in ('10', '20001'); ``` before optimize: ``` +--------------------------------------------------------------------------------------------------------------------------------------+ | Explain String(Nereids Planner) | +--------------------------------------------------------------------------------------------------------------------------------------+ | ========== PARSED PLAN (time: 1ms) ========== | | UnboundResultSink[4] ( ) | | +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] ) | | +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') ) | | +--LogicalCheckPolicy ( ) | | +--UnboundRelation ( id=RelationId#0, nameParts=a_table ) | | | | ========== ANALYZED PLAN (time: 2ms) ========== | | LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] ) | | +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] ) | | +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) | | +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET ) | | | | ========== REWRITTEN PLAN (time: 6ms) ========== | | LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] ) | | +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) | | +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) | | | | ========== OPTIMIZED PLAN (time: 6ms) ========== | | PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] ) | | +--PhysicalDistribute[87]@1 ( stats=0.33, distributionSpec=DistributionSpecGather ) | | +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) | | +--PhysicalOlapScan[a_table]@0 ( stats=1 ) | +--------------------------------------------------------------------------------------------------------------------------------------+ ``` after optimize: ``` +--------------------------------------------------------------------------------------------------------------------------------------+ | Explain String(Nereids Planner) | +--------------------------------------------------------------------------------------------------------------------------------------+ | ========== PARSED PLAN (time: 15ms) ========== | | UnboundResultSink[4] ( ) | | +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] ) | | +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') ) | | +--LogicalCheckPolicy ( ) | | +--UnboundRelation ( id=RelationId#0, nameParts=a_table ) | | | | ========== ANALYZED PLAN (time: 11ms) ========== | | LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] ) | | +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] ) | | +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) ) | | +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET ) | | | | ========== REWRITTEN PLAN (time: 12ms) ========== | | LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] ) | | +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) ) | | +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) | | | | ========== OPTIMIZED PLAN (time: 4ms) ========== | | PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] ) | | +--PhysicalDistribute[87]@1 ( stats=0, distributionSpec=DistributionSpecGather ) | | +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) ) | | +--PhysicalOlapScan[a_table]@0 ( stats=2 ) | +--------------------------------------------------------------------------------------------------------------------------------------+ ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org