Juan Yu created IMPALA-6431:
-------------------------------

             Summary: Disable codegen if query kudu table with PK predicate
                 Key: IMPALA-6431
                 URL: https://issues.apache.org/jira/browse/IMPALA-6431
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Juan Yu


In near-real-time use cases, many Kudu queries are filtered by primary key, 
so they only need to process a small amount of data. In such scenarios, codegen 
can account for a large share of the total query CPU usage. When 
running with high concurrency, this can easily saturate the CPU and hurt 
throughput. Disabling codegen not only improves throughput but also reduces CPU 
usage.

Here are some test results:

Create table test(
...
PRIMARY KEY (pickup_datetime, medallion, hack_license, vendor_id)
)
PARTITION BY HASH (medallion, vendor_id) PARTITIONS 10
stored as KUDU;

Q1 - select count(*) from trip_data_kudu
where pickup_datetime='2013-01-09 20:33:00';

Q2 - select passenger_count, avg(trip_time_in_secs), count(1)
from trip_data_kudu where pickup_datetime='2013-01-09 20:33:00'
group by 1 order by 2 desc;

Q3 - select * from trip_data_kudu
where pickup_datetime='2013-01-09 20:33:00' and vendor_id='CMT';

 
|Query (concurrency 16)|QPS|CPU usage|
|Q1 (codegen enabled)|42|76%|
|Q1 (codegen disabled)|250|58%|
| | | |
|Q2 (codegen enabled)|7.7|85%|
|Q2 (codegen disabled)|78|65%|
| | | |
|Q3 (codegen enabled)|52|75%|
|Q3 (codegen disabled)|185|60%|

 

Note that Impala doesn't have per-partition cardinality stats for Kudu tables, 
so these queries cannot benefit from the DISABLE_CODEGEN_ROWS_THRESHOLD 
optimization.
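
As a manual workaround sketch, codegen can be turned off for a session with 
the existing DISABLE_CODEGEN query option. This issue does not state how the 
"codegen disabled" runs were configured, so treating this option as the 
mechanism is an assumption. Q1 is reused from the test queries.

```sql
-- Assumption: codegen is disabled through the session-level query option;
-- the issue does not say how the test runs were set up.
SET DISABLE_CODEGEN=true;

-- Q1 from the tests: a primary-key-filtered query that touches few rows.
select count(*) from trip_data_kudu
where pickup_datetime='2013-01-09 20:33:00';

-- Re-enable codegen for later queries in the same session.
SET DISABLE_CODEGEN=false;
```

The option applies to every query in the session, including large scans that 
would benefit from codegen, which is why a planner-side decision for 
PK-filtered Kudu queries is requested instead.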

 



