Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12797


Change subject: IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
......................................................................

IMPALA-4356,IMPALA-7331: codegen all ScalarExprs

Based on initial draft patch by Pooja Nilangekar.

Codegen'd expressions can be executed in two ways - either by
being called directly from a fully codegend function, or from
interpreted code via a function pointer (previously
ScalarFnCall::scalar_fn_wrapper_).

This change moves the function pointer from ScalarFnCall to its
base class ScalarExpr, so the full expr tree can be codegen'd, not
just the ScalarFnCall subtrees. The key refactoring and improvements
are:
* ScalarExpr::Get*Val() switches between interpreted and the codegen'd
  function pointer code paths in an inline function, avoiding a
  virtual function call to ScalarFnCal::Get*Val().
* Boilerplate logic is moved to ScalarExpr::GetCodegendComputeFn(),
  which calls a virtual function GetCodegenComputeFnImpl().
* ScalarFnCall's logic for deciding whether to interpret or codegen is
  better abstracted and exposed to ScalarExpr as IsInterpretable()
  and ShouldCodegen() methods.
* The ScalarExpr::codegend_compute_fn_ function pointer is only
  populated for expressions that are "codgen entry points". The
  include the roots of expr trees and non-root expressions
  where the parent expression calls Get*Val() from the
  pseudo-codegend GetCodegendComputeFnWrapper().
* ScalarFnCall is always initialised for interpreted execution.
  Othe the function pointer is needed for non-root expressions,
  e.g. to support ScalarExprEvaluator::GetConstantVal().
* Latent bugs/gaps for codegen of CollectionVal are fixed. CollectionVal
  is modified to use the StringVal memory layout to allow code sharing
  with StringVal. These fixes allowed simplification of
  IsNotEmptyPredicate codegen (from IMPALA-7657).

IMPALA-7331 (CHAR codegen support functions) is also fixed because
it was simpler to enable CHAR codegen within ScalarExpr than to carry
forward the exiting CHAR workarounds from ScalarFnCall. The CHAR-specific
codegen support required in the scalar expr subsystem is very limited.
StringVal intermediates are used everywhere. Only SlotRef actually
operates on the different tuple layout, and the required codegen support
for SlotRef already exists for UDA intermediates anyway.

Testing:
* Ran exhaustive tests.

Perf:
* Ran a basic insert benchmark, which went from 10.1s to 7.6s
  create table foo stored as parquet as
  select case when l_orderkey % 2 = 0 then 'aaa' else 'bbb' end
  from tpch30_parquet.lineitem;
* Added perf regression test to tpcds-insert, similar to the manual
  benchmark.
* Ran single-node TPC-H with large and small scale factors, to estimate
  impact on execution perf and query startup time, respectively.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.84    | -0.18%     | 4.49       | -0.31% 
        |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval   |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.58   | 2.47        |   +4.18% 
  |   1.29%   |   0.88%        | 5     |   +4.12%       | 2.31    | 5.81   |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 4.81   | 4.61        |   +4.33% 
  |   2.18%   |   2.15%        | 5     |   +3.91%       | 1.73    | 3.09   |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 26.45  | 26.16       |   +1.09% 
  |   0.37%   |   0.50%        | 5     |   +1.36%       | 2.02    | 3.94   |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 15.92  | 15.75       |   +1.09% 
  |   2.87%   |   1.65%        | 5     |   +0.88%       | 0.29    | 0.73   |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.38   | 2.35        |   +1.12% 
  |   1.64%   |   1.11%        | 5     |   +0.80%       | 1.15    | 1.26   |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 2.94   | 2.91        |   +1.13% 
  |   7.68%   |   5.37%        | 5     |   -0.34%       | -0.29   | 0.27   |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 18.10  | 18.02       |   +0.42% 
  |   2.70%   |   0.56%        | 5     |   +0.28%       | 0.29    | 0.34   |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 4.72   | 4.72        |   -0.04% 
  |   1.20%   |   1.65%        | 5     |   +0.05%       | 0.00    | -0.04  |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 3.92   | 3.93        |   -0.26% 
  |   1.08%   |   2.36%        | 5     |   +0.20%       | 0.58    | -0.23  |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.27   | 1.27        |   -0.28% 
  |   0.22%   |   0.88%        | 5     |   +0.09%       | 0.29    | -0.68  |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.64   | 2.65        |   -0.45% 
  |   1.65%   |   0.65%        | 5     |   -0.24%       | -0.58   | -0.57  |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 3.10   | 3.13        |   -0.76% 
  |   1.47%   |   1.12%        | 5     |   -0.21%       | -0.29   | -0.93  |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.20   | 1.21        |   -0.80% 
  |   2.26%   |   2.47%        | 5     |   -0.82%       | -1.15   | -0.53  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 1.97   | 1.99        |   -1.37% 
  |   1.84%   |   3.21%        | 5     |   -0.47%       | -0.58   | -0.83  |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 11.53  | 11.63       |   -0.91% 
  |   0.46%   |   0.49%        | 5     |   -0.95%       | -2.02   | -3.08  |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 5.13   | 5.21        |   -1.51% 
  |   2.24%   |   4.05%        | 5     |   -0.94%       | -0.58   | -0.73  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 3.61   | 3.66        |   -1.40% 
  |   0.66%   |   0.79%        | 5     |   -1.33%       | -1.73   | -3.05  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 19.42  | 19.71       |   -1.52% 
  |   1.34%   |   1.39%        | 5     |   -1.22%       | -1.44   | -1.76  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 5.08   | 5.15        |   -1.49% 
  |   1.34%   |   0.73%        | 5     |   -1.35%       | -1.44   | -2.20  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.42   | 3.49        |   -1.92% 
  |   0.93%   |   1.47%        | 5     |   -1.53%       | -1.15   | -2.49  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.15   | 1.19        |   -3.17% 
  |   2.27%   |   1.95%        | 5     |   -4.21%       | -1.15   | -2.41  |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 9.26   | 9.63        |   -3.85% 
  |   0.62%   |   0.59%        | 5     |   -3.78%       | -2.31   | -10.25 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+

Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version:          impalad version 3.2.0-SNAPSHOT RELEASE ()
Baseline Impala Version: impalad version 3.2.0-SNAPSHOT RELEASE (2019-03-19)

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(2)  | parquet / none / none | 0.90    | -0.08%     | 0.80       | -0.05% 
        |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+
| TPCH(2)  | TPCH-Q18 | parquet / none / none | 1.22   | 1.19        |   +1.93% 
  |   3.81%   |   4.46%        | 20    |   +3.34%       | 1.62    | 1.46  |
| TPCH(2)  | TPCH-Q10 | parquet / none / none | 0.74   | 0.73        |   +1.97% 
  |   3.36%   |   2.94%        | 20    |   +0.97%       | 1.88    | 1.95  |
| TPCH(2)  | TPCH-Q11 | parquet / none / none | 0.49   | 0.48        |   +1.91% 
  |   6.19%   |   4.64%        | 20    |   +0.25%       | 0.95    | 1.09  |
| TPCH(2)  | TPCH-Q4  | parquet / none / none | 0.43   | 0.43        |   +1.99% 
  |   6.26%   |   5.86%        | 20    |   +0.15%       | 0.92    | 1.03  |
| TPCH(2)  | TPCH-Q15 | parquet / none / none | 0.50   | 0.49        |   +1.82% 
  |   7.32%   |   6.35%        | 20    |   +0.26%       | 1.01    | 0.83  |
| TPCH(2)  | TPCH-Q1  | parquet / none / none | 0.98   | 0.97        |   +0.79% 
  |   4.64%   |   2.73%        | 20    |   +0.36%       | 0.77    | 0.65  |
| TPCH(2)  | TPCH-Q19 | parquet / none / none | 0.83   | 0.83        |   +0.65% 
  |   3.33%   |   2.80%        | 20    |   +0.44%       | 2.18    | 0.67  |
| TPCH(2)  | TPCH-Q14 | parquet / none / none | 0.62   | 0.62        |   +0.97% 
  |   2.86%   |   1.00%        | 20    |   +0.04%       | 0.13    | 1.42  |
| TPCH(2)  | TPCH-Q3  | parquet / none / none | 0.88   | 0.87        |   +0.57% 
  |   2.17%   |   1.74%        | 20    |   +0.29%       | 1.15    | 0.92  |
| TPCH(2)  | TPCH-Q12 | parquet / none / none | 0.53   | 0.53        |   +0.27% 
  |   4.58%   |   5.78%        | 20    |   +0.46%       | 1.47    | 0.16  |
| TPCH(2)  | TPCH-Q17 | parquet / none / none | 0.72   | 0.72        |   +0.15% 
  |   3.64%   |   5.55%        | 20    |   +0.21%       | 0.86    | 0.10  |
| TPCH(2)  | TPCH-Q21 | parquet / none / none | 2.05   | 2.05        |   +0.21% 
  |   1.99%   |   2.37%        | 20    |   +0.01%       | 0.25    | 0.30  |
| TPCH(2)  | TPCH-Q5  | parquet / none / none | 1.28   | 1.27        |   +0.24% 
  |   1.61%   |   1.80%        | 20    |   -0.02%       | -0.57   | 0.44  |
| TPCH(2)  | TPCH-Q13 | parquet / none / none | 1.27   | 1.27        |   -0.34% 
  |   1.69%   |   1.83%        | 20    |   -0.20%       | -1.65   | -0.61 |
| TPCH(2)  | TPCH-Q7  | parquet / none / none | 1.72   | 1.73        |   -0.55% 
  |   2.40%   |   1.69%        | 20    |   -0.03%       | -0.42   | -0.83 |
| TPCH(2)  | TPCH-Q8  | parquet / none / none | 1.27   | 1.28        |   -0.68% 
  |   3.10%   |   3.89%        | 20    |   -0.06%       | -0.54   | -0.62 |
| TPCH(2)  | TPCH-Q6  | parquet / none / none | 0.36   | 0.36        |   -0.84% 
  |   0.79%   |   3.51%        | 20    |   -0.07%       | -0.36   | -1.04 |
| TPCH(2)  | TPCH-Q2  | parquet / none / none | 0.65   | 0.65        |   -1.17% 
  |   4.76%   |   5.99%        | 20    |   -0.05%       | -0.25   | -0.69 |
| TPCH(2)  | TPCH-Q9  | parquet / none / none | 1.59   | 1.62        |   -2.01% 
  |   1.45%   |   5.12%        | 20    |   -0.16%       | -1.24   | -1.69 |
| TPCH(2)  | TPCH-Q20 | parquet / none / none | 0.68   | 0.69        |   -1.73% 
  |   4.35%   |   4.43%        | 20    |   -0.49%       | -1.74   | -1.25 |
| TPCH(2)  | TPCH-Q22 | parquet / none / none | 0.38   | 0.40        |   -2.89% 
  |   7.42%   |   6.39%        | 20    |   -0.21%       | -0.66   | -1.34 |
| TPCH(2)  | TPCH-Q16 | parquet / none / none | 0.59   | 0.62        |   -4.01% 
  |   6.33%   |   5.83%        | 20    |   -4.72%       | -1.39   | -2.13 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+

Change-Id: I839d7a3a2f5e1309c33a1f66013ef11628c5dc11
---
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/codegen-anyval.h
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/aggregator.cc
M be/src/exec/exec-node.cc
M be/src/exec/filter-context.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/union-node.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/agg-fn.cc
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/compound-predicates.cc
M be/src/exprs/compound-predicates.h
M be/src/exprs/conditional-functions-ir.cc
M be/src/exprs/conditional-functions.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/is-not-empty-predicate.cc
M be/src/exprs/is-not-empty-predicate.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/exprs/null-literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-ir.cc
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
A be/src/exprs/scalar-expr.inline.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
D be/src/exprs/slot-ref-ir.cc
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/tuple-is-null-predicate.cc
M be/src/exprs/tuple-is-null-predicate.h
M be/src/exprs/valid-tuple-id.cc
M be/src/exprs/valid-tuple-id.h
M be/src/runtime/CMakeLists.txt
R be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/tuple.cc
M be/src/service/fe-support.cc
M be/src/udf/udf-internal.h
M be/src/util/tuple-row-compare.cc
M 
testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test
M testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
A testdata/workloads/tpcds-insert/queries/expr-insert.test
M tests/query_test/test_codegen.py
M tests/query_test/test_tpcds_queries.py
66 files changed, 899 insertions(+), 803 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/12797/4
--
To view, visit http://gerrit.cloudera.org:8080/12797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I839d7a3a2f5e1309c33a1f66013ef11628c5dc11
Gerrit-Change-Number: 12797
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

Reply via email to