[ https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657532#comment-16657532 ]
Paul Rogers edited comment on IMPALA-7655 at 10/23/18 9:33 PM:
---------------------------------------------------------------

Work on this ticket required a broad review of conditional functions, as summarized in IMPALA-7747. The notes below focus on the functions covered in this ticket.

h4. {{ISNULL(a, b)}}

BE: An alias for this function exists in {{impala_functions.py}}, with a special implementation in {{conditional-functions.[h|cc]}}.

*Suggestion:* Rewrite as:

{code:sql}
CASE WHEN a IS NULL THEN b ELSE a END
{code}

Since {{isnull()}} would vanish from the plan after this transform, remove the BE implementation. Ensure that the entry in {{impala_functions.py}} remains so that the function appears in the list of built-in functions.

h4. {{NVL(a, b)}} \\ {{IFNULL(a, b)}}

FE, {{SimplifyConditional}}: Treated the same as {{ISNULL(a, b)}}, but not rewritten to that form.

BE: Aliases for these functions exist in {{impala_functions.py}}.

*Suggestion:* Rewrite to {{ISNULL(a, b)}} to keep things tidy.

h4. {{IF(cond, trueExpr, falseExpr)}}

FE: {{SimplifyConditional}} performs basic simplifications.

BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only function to allow short-circuit argument evaluation.

*Suggestion:* Rewrite in the FE to:

{code:sql}
CASE WHEN cond THEN trueExpr ELSE falseExpr END
{code}

{{IF()}} will then vanish from the plan, so remove the BE implementation, leaving the entry in {{impala_functions.py}}.

h4. {{COALESCE(e1, e2, … en)}}

FE: {{SimplifyConditional}} performs basic simplifications.

BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only function to allow short-circuit argument evaluation.

*Suggestion:* Rewrite in the FE to:

{noformat}
CASE [WHEN ei IS NOT NULL THEN ei]* ELSE en END
{noformat}

When doing so, extend two existing optimizations.

1. Remove not only leading null values, but all null values.
2. Special-case not just the last non-null literal; rather, on encountering the first such value, drop all remaining terms.

{{COALESCE()}} will then vanish from the plan, so remove the BE implementation.

h4. Remove {{conditional-functions.[h|cc]}}

Since the above removes the three special conditional functions, remove {{conditional-functions.[h|cc]}} as well.

> Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-7655
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7655
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation involving an if() function was very slow, 10x slower than the equivalent version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case when l_orderkey is NULL then 1 else NULL end) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator: http://tarmstrong-box:25000)
> Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a1964200000000
> +----------------------------------------------------------+
> | count(case when l_orderkey is null then 1 else null end) |
> +----------------------------------------------------------+
> | 0                                                        |
> +----------------------------------------------------------+
> Fetched 1 row(s) in 0.51s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 44.03ms  | 44.03ms  | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 411.57ms | 411.57ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:23:07 (Coordinator: http://tarmstrong-box:25000)
> Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca2600000000
> +----------------------------------------+
> | count(if(l_orderkey is null, 1, null)) |
> +----------------------------------------+
> | 0                                      |
> +----------------------------------------+
> Fetched 1 row(s) in 1.01s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 422.07ms | 422.07ms | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 511.13ms | 511.13ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> {noformat}
> It turns out that this is because we don't have good codegen support for ConditionalFunction, and just fall back to emitting a call to the interpreted path:
> https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28
> See CaseExpr for an example of much better codegen support:
> https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178
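To make the rewrites suggested in the comment above concrete, here is a minimal Python sketch of the three transformations to {{CASE}}, including the two extended {{COALESCE()}} optimizations. It is an illustration only, not Impala's FE code: it works on SQL text fragments rather than an expression tree, the function names ({{rewrite_if}}, {{rewrite_isnull}}, {{rewrite_coalesce}}) are hypothetical, and the literal detection is a deliberately crude assumption (integers, decimals and single-quoted strings).

```python
import re
from typing import List

# Hypothetical helpers: a real rewrite rule would inspect expression nodes,
# not strings. Literal detection here is an assumption for illustration.
_NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")


def is_null_literal(expr: str) -> bool:
    """True if the fragment is the NULL literal."""
    return expr.strip().upper() == "NULL"


def is_non_null_literal(expr: str) -> bool:
    """True for a numeric or single-quoted string literal (crude check)."""
    e = expr.strip()
    if _NUMERIC.fullmatch(e):
        return True
    return len(e) >= 2 and e[0] == "'" and e[-1] == "'"


def rewrite_if(cond: str, true_expr: str, false_expr: str) -> str:
    """IF(cond, t, f) -> CASE WHEN cond THEN t ELSE f END."""
    return f"CASE WHEN {cond} THEN {true_expr} ELSE {false_expr} END"


def rewrite_isnull(a: str, b: str) -> str:
    """ISNULL(a, b) -> CASE WHEN a IS NULL THEN b ELSE a END."""
    return f"CASE WHEN {a} IS NULL THEN {b} ELSE {a} END"


def rewrite_coalesce(args: List[str]) -> str:
    """COALESCE(e1, ..., en) -> CASE [WHEN ei IS NOT NULL THEN ei]* ELSE en END."""
    # Optimization 1: drop every NULL literal, not only leading ones.
    terms = [a.strip() for a in args if not is_null_literal(a)]
    # Optimization 2: the first non-null literal always wins,
    # so any terms after it can never be reached.
    for i, term in enumerate(terms):
        if is_non_null_literal(term):
            terms = terms[: i + 1]
            break
    if not terms:
        return "NULL"
    if len(terms) == 1:
        return terms[0]
    whens = " ".join(f"WHEN {t} IS NOT NULL THEN {t}" for t in terms[:-1])
    return f"CASE {whens} ELSE {terms[-1]} END"


if __name__ == "__main__":
    print(rewrite_if("l_orderkey IS NULL", "1", "NULL"))
    print(rewrite_isnull("a", "b"))
    print(rewrite_coalesce(["NULL", "a", "NULL", "b", "0", "c"]))
```

Under these assumptions, the {{if()}} expression from the slow query in the issue description becomes the same {{CASE}} expression as the fast query, which is the point of doing the rewrite in the FE.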