mongo360 opened a new pull request, #35200:
URL: https://github.com/apache/doris/pull/35200

   Description:
      The sql execute much slow when the literal value with string format in 
`in predicate`; and the real data is integral type。
   ```
   mysql> set enable_nereids_planner = false;
   Query OK, 0 rows affected (0.03 sec)
   
   mysql> select id,sum(clicks) from a_table where id in ('787934713', 
'306960695') group by id limit 10;
   +------------+---------------+
   | id | sum(`clicks`) |
   +------------+---------------+
   |  787934713 |          2838 |
   |  306960695 |           339 |
   +------------+---------------+
   2 rows in set (1.81 sec)
   
   mysql> set enable_nereids_planner = true;
   Query OK, 0 rows affected (0.02 sec)
   
   mysql> select id,sum(clicks) from a_table where id in ('787934713', 
'306960695') group by id limit 10;
   +------------+-------------+
   | id | sum(clicks) |
   +------------+-------------+
   |  787934713 |        2838 |
   |  306960695 |         339 |
   +------------+-------------+
   2 rows in set (28.14 sec)
   ```
   
   Reason:
   In legacy planner, the string literal with convert to integral value, but in 
the nereids planner do not do this convert and with do string matching in BE。
   
   Solved:
   do process string literal with numeric in `in predicate` like in `comparison 
predicate`;
   test table:
   ```
   create table a_table(
       k1 BIGINT NOT NULL,
       k2 VARCHAR(100) NOT NULL,
       v1 INT SUM NULL DEFAULT "0"
   ) ENGINE=OLAP
   AGGREGATE KEY(k1,k2)
   distributed BY hash(k1) buckets 2
   properties("replication_num" = "1");
   insert into a_table values (10, 'name1', 10),(20, 'name2', 10);
   explain plan select * from a_table where k1 in ('10', '20001');
   ```
   before optimize:
   ```
   
+--------------------------------------------------------------------------------------------------------------------------------------+
   | Explain String(Nereids Planner)                                            
                                                          |
   
+--------------------------------------------------------------------------------------------------------------------------------------+
   | ========== PARSED PLAN (time: 1ms) ==========                              
                                                          |
   | UnboundResultSink[4] (  )                                                  
                                                          |
   | +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )          
                                                          |
   |    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )            
                                                          |
   |       +--LogicalCheckPolicy (  )                                           
                                                          |
   |          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )         
                                                          |
   |                                                                            
                                                          |
   | ========== ANALYZED PLAN (time: 2ms) ==========                            
                                                          |
   | LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                   
                                                          |
   | +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], 
excepts=[] )                                                    |
   |    +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', 
'20001') )                                                      |
   |       +--LogicalOlapScan ( qualified=internal.db.a_table, 
indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
   |                                                                            
                                                          |
   | ========== REWRITTEN PLAN (time: 6ms) ==========                           
                                                          |
   | LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                   
                                                          |
   | +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') 
)                                                         |
   |    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, 
selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
   |                                                                            
                                                          |
   | ========== OPTIMIZED PLAN (time: 6ms) ==========                           
                                                          |
   | PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                  
                                                          |
   | +--PhysicalDistribute[87]@1 ( stats=0.33, 
distributionSpec=DistributionSpecGather )                                       
           |
   |    +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN 
('10001', '20001') )                                       |
   |       +--PhysicalOlapScan[a_table]@0 ( stats=1 )                           
                                                          |
   
+--------------------------------------------------------------------------------------------------------------------------------------+
   ```
   after optimize:
   ```
   
+--------------------------------------------------------------------------------------------------------------------------------------+
   | Explain String(Nereids Planner)                                            
                                                          |
   
+--------------------------------------------------------------------------------------------------------------------------------------+
   | ========== PARSED PLAN (time: 15ms) ==========                             
                                                          |
   | UnboundResultSink[4] (  )                                                  
                                                          |
   | +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )          
                                                          |
   |    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )            
                                                          |
   |       +--LogicalCheckPolicy (  )                                           
                                                          |
   |          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )         
                                                          |
   |                                                                            
                                                          |
   | ========== ANALYZED PLAN (time: 11ms) ==========                           
                                                          |
   | LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                   
                                                          |
   | +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], 
excepts=[] )                                                    |
   |    +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) )              
                                                          |
   |       +--LogicalOlapScan ( qualified=internal.db.a_table, 
indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
   |                                                                            
                                                          |
   | ========== REWRITTEN PLAN (time: 12ms) ==========                          
                                                          |
   | LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                   
                                                          |
   | +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) )                 
                                                          |
   |    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, 
selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
   |                                                                            
                                                          |
   | ========== OPTIMIZED PLAN (time: 4ms) ==========                           
                                                          |
   | PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                  
                                                          |
   | +--PhysicalDistribute[87]@1 ( stats=0, 
distributionSpec=DistributionSpecGather )                                       
              |
   |    +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) )  
                                                          |
   |       +--PhysicalOlapScan[a_table]@0 ( stats=2 )                           
                                                          |
   
+--------------------------------------------------------------------------------------------------------------------------------------+
   ```
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org

Reply via email to