Tim Armstrong created IMPALA-7655:
-------------------------------------
Summary: Codegen output for conditional functions (if,isnull,
coalesce) is very suboptimal
Key: IMPALA-7655
URL: https://issues.apache.org/jira/browse/IMPALA-7655
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Tim Armstrong
https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation
involving an if() function was very slow, 10x slower than the equivalent
version using a case:
{noformat}
[localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case
when l_orderkey is NULL then 1 else NULL end) from
tpch10_parquet.lineitem;summary;
NUM_NODES set to 1
MT_DOP set to 1
Query: select count(case when l_orderkey is NULL then 1 else NULL end) from
tpch10_parquet.lineitem
Query submitted at: 2018-10-04 11:17:31 (Coordinator:
http://tarmstrong-box:25000)
Query progress can be monitored at:
http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a1964200000000
+----------------------------------------------------------+
| count(case when l_orderkey is null then 1 else null end) |
+----------------------------------------------------------+
| 0 |
+----------------------------------------------------------+
Fetched 1 row(s) in 0.51s
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem
| Est. Peak Mem | Detail |
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
| 01:AGGREGATE | 1 | 44.03ms | 44.03ms | 1 | 1 | 25.00 KB
| 10.00 MB | FINALIZE |
| 00:SCAN HDFS | 1 | 411.57ms | 411.57ms | 59.99M | -1 | 16.61 MB
| 88.00 MB | tpch10_parquet.lineitem |
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
[localhost:21000] default> set num_nodes=1; set mt_dop=1; select
count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
NUM_NODES set to 1
MT_DOP set to 1
Query: select count(if(l_orderkey is NULL, 1, NULL)) from
tpch10_parquet.lineitem
Query submitted at: 2018-10-04 11:23:07 (Coordinator:
http://tarmstrong-box:25000)
Query progress can be monitored at:
http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca2600000000
+----------------------------------------+
| count(if(l_orderkey is null, 1, null)) |
+----------------------------------------+
| 0 |
+----------------------------------------+
Fetched 1 row(s) in 1.01s
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem
| Est. Peak Mem | Detail |
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
| 01:AGGREGATE | 1 | 422.07ms | 422.07ms | 1 | 1 | 25.00 KB
| 10.00 MB | FINALIZE |
| 00:SCAN HDFS | 1 | 511.13ms | 511.13ms | 59.99M | -1 | 16.61 MB
| 88.00 MB | tpch10_parquet.lineitem |
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
{noformat}
It turns out that this is because we don't have good codegen support for
ConditionalFunction, and just fall back to emitting a call to the interpreted
path:
https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28
See CaseExpr for an example of much better codegen support:
https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]