morningman opened a new issue #3693:
URL: https://github.com/apache/incubator-doris/issues/3693


   **Describe the bug**
   In the test, we found that a simple single-table aggregate query, even in 
the case of hit rollup, still can not achieve a high QPS.
   
   Table: `id INT, name varchar(32), v1 INT SUM`
   
   Query: `SELECT SUM(v1) FROM tbl WHERE id IN (1,2,3,4,5,6, ... (more than 
100))`;
   
   The phenomenon is that the CPU util is 100%, and the delay of a single query 
is significantly increased.
   
   **Why**
   I found that Doris use BE config `doris_max_scan_key_num` to limit both the 
number of scan keys and the number of conditions of a single column that can be 
pushed down to the storage engine.
   
   In above example, when the elements of IN predicate is larger than 
`doris_max_scan_key_num`,
   no conditions will be pushed to storage engine for column `id`, So that none 
of index of column `id` will be used. This causes the storage engine to read 
all the rows and need to filter at the query layer, thus making CPU util high.
   
   **How to solve**
   The BE config `doris_max_scan_key_num` should only be used to limit the 
number of scan keys. And we need another config to limit the number of 
conditions of a single column that can be pushed down to the storage engine.
   
   **Solution**
   I add a new BE config `max_pushdown_conditions_per_column`, default is 1024.
   
   The test case is as follows:
   
   1. Env: 3 BE and 1 FE, Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, 56 core
   2. Table:
   ```
   
+-----------------------------+--------------+------+-------+------------+-------+
   | Field                       | Type         | Null | Key   | Default    | 
Extra |
   
+-----------------------------+--------------+------+-------+------------+-------+
   | date_partition              | DATE         | No   | true  | 1900-01-01 |   
    |
   | user_name                   | VARCHAR(256) | No   | true  |            |   
    |
   | id                          | INT          | No   | true  | N/A        |   
    |
   ...
   ...                    
   | commit                      | INT          | No   | false | 0          | 
SUM   |
   
+-----------------------------+--------------+------+-------+------------+-------+
   25 rows in set (0.01 sec)
   
   about 3674904 rows
   ```
   3. SQL:
   ```
   SELECT SUM(`commit`)  FROM tbl WHERE date_partition >= '2019-11-11' AND 
date_partition <= '2020-05-11' AND id IN (16730, 74818, 2575, 63428, 15571, 
74345, 55241, 52375, 18464, 36687, 22306, 49033, 37124, 14697, 75679, 72319, 
50219, 74723, 50778, 37782, 22152, 3573, 1567, 65673, 4366, 52784, 46973, 
32716, 37056, 75585, 18418, 66659, 72058, 5423, 78679, 10656, 65591, 64949, 
26560, 2709, 60128, 64406, 52200, 41163, 60266, 21974, 79801, 32669, 25025, 
6374, 56828, 80599, 13163, 30660, 64337, 2134, 75150, 2985, 66069, 19964, 6054, 
24464, 36888, 22353, 37502, 42207, 6411, 53184, 45586, 6169, 39933, 40313, 
69984, 52314, 34314, 29295, 43091, 51925, 39434, 839, 27399, 25053, 12930, 
39808, 68532, 77795, 33113, 34423, 28728, 38296, 53803, 55817, 63150, 3177, 
53253, 27514, 24087, 1097, 21326, 33732, 63689, 52950, 1160, 68816, 76697, 
55888, 26789, 42823, 63517, 67566, 38109, 41345, 39837, 78576, 32666, 37680, 
78646, 55641, 5221, 80058, 65142, 40914, 72088, 55113, 1184, 45254, 54959, 
37503, 1881, 66861, 29902, 2017, 34485, 74905, 29711, 3691, 2508, 66662, 23710, 
58297, 42831, 80003, 38781, 13246, 21610, 4402, 52361, 47242, 73721, 17362, 
51488, 239, 61924, 38887, 29551, 11765, 15387);
   ```
   4. Test tool: Apache jmeter, 50 threads.
   
   5. Result:
   
   Origin Doris version:
   
   251.5/s Avg:   198ms Min:    68ms Max:   805ms, CPU util: 92%
   
   New Doris version:
   
   604.5/s Avg:    82ms Min:    30ms Max:   369ms, CPU util: 65%
   
   New Doris version with bitmap index on column `id`:
   
   734.9/s Avg:    67ms Min:    18ms Max:   331ms, CPU util: 50%
   
   
![image](https://user-images.githubusercontent.com/2899462/82917994-b21f3a80-9fa6-11ea-8a89-04f340e55246.png)
   
   The improve is significant.
   
   I aslo add 2 new session variable `max_scan_key_num` and 
`doris_max_scan_key_num` which can overwrite the 2 configs in BE config. 
Because there is no CBO in Doris now, we need provide a method to let user 
change these 2 config to meet different query scenarios.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org

Reply via email to