[Impala-ASF-CR] IMPALA-11470: Add Cache For Codegen Functions

Yida Wu (Code Review) Tue, 06 Dec 2022 11:28:50 -0800

Yida Wu has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/19181 )


Change subject: IMPALA-11470: Add Cache For Codegen Functions
......................................................................

IMPALA-11470: Add Cache For Codegen Functions

This patch adds a cache for codegen functions to improve the
performance of sub-second queries.

The main idea is to store the codegen functions in a cache and
reuse them when appropriate, avoiding repeated LLVM optimization,
which can take hundreds of milliseconds.

The cache is a singleton instance per daemon and contains multiple
cache entries. Each cache entry is at the fragment level: it stores
all the codegen functions of one fragment. If an identical fragment
arrives again, it should find all the codegen functions it needs in
that cache entry and skip the repeated optimization.
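
As a rough illustration of the fragment-level layout described above,
the sketch below models a per-process singleton that maps one key to
one entry holding all functions of a fragment. Every name here
(CodegenCacheSketch, FragmentEntry, CompiledFn) is hypothetical and not
taken from the patch; eviction, capacity accounting, and locking are
left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a handle to a compiled codegen function.
using CompiledFn = void*;

// One entry per fragment: all of that fragment's codegen functions.
struct FragmentEntry {
  std::vector<CompiledFn> functions;
};

// Hypothetical sketch of a per-process singleton cache.
class CodegenCacheSketch {
 public:
  static CodegenCacheSketch& Instance() {
    static CodegenCacheSketch cache;  // created once per process
    return cache;
  }

  // Returns the entry for this key, or nullptr on a miss.
  const FragmentEntry* Lookup(const std::string& key) const {
    auto it = entries_.find(key);
    return it == entries_.end() ? nullptr : &it->second;
  }

  // Stores (or overwrites) the entry for this key.
  void Store(const std::string& key, FragmentEntry entry) {
    assert(!key.empty());
    entries_[key] = std::move(entry);
  }

 private:
  CodegenCacheSketch() = default;
  std::unordered_map<std::string, FragmentEntry> entries_;
};
```

On a hit, the caller would reuse every function in the entry instead of
optimizing and compiling the module again.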

The module bitcode is used as the key to the cache. It is
generated before module optimization and final compilation. If
codegen_cache_mode is NORMAL, which is the default, the full bitcode
string is stored as the key. If codegen_cache_mode is set to
OPTIMAL, the stored key contains only the hash code and the total
length of the full key, to reduce memory consumption.
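
The difference between the two key modes can be sketched as follows.
This is a minimal illustration under assumptions, not the code in the
patch: CacheMode, CacheKey, and MakeKey are hypothetical names, and
std::hash stands in for whatever hash function the patch actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical mirror of the NORMAL / OPTIMAL modes.
enum class CacheMode { NORMAL, OPTIMAL };

// Hypothetical key type: the bytes that would be stored in the cache.
struct CacheKey {
  std::string data;
};

CacheKey MakeKey(const std::string& bitcode, CacheMode mode) {
  assert(!bitcode.empty());
  if (mode == CacheMode::NORMAL) {
    // NORMAL: the full module bitcode is the key.
    return CacheKey{bitcode};
  }
  // OPTIMAL: keep only a hash code and the length of the full key.
  // std::hash is an assumption used for illustration.
  uint64_t hash = std::hash<std::string>{}(bitcode);
  uint64_t len = bitcode.size();
  std::string data;
  data.append(reinterpret_cast<const char*>(&hash), sizeof(hash));
  data.append(reinterpret_cast<const char*>(&len), sizeof(len));
  return CacheKey{data};
}
```

In this sketch the OPTIMAL key is 16 bytes regardless of module size,
which is where the memory saving comes from. Two different modules with
the same hash and length would map to the same key.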

Also, KrpcDataStreamSenderConfig::CodegenHashRow() is changed to
take the hash seed as an argument, because a fragment can't hit the
cache if a dynamic hash seed is embedded in the codegen function.
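
A toy model of why an embedded seed defeats the cache: if the seed is a
constant inside the generated code, the module text (and therefore the
key) changes whenever the seed changes. The functions below are
hypothetical and emit made-up pseudo-IR strings, not real LLVM IR or
Impala code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical: seed baked into the generated function body.
std::string EmitHashRowWithEmbeddedSeed(uint64_t seed) {
  assert(seed != 0);
  return "define i64 @HashRow(row* %r) { hash(%r, " +
         std::to_string(seed) + ") }";
}

// Hypothetical: seed passed in at call time, so the body is constant.
std::string EmitHashRowWithSeedArg() {
  return "define i64 @HashRow(row* %r, i64 %seed) { hash(%r, %seed) }";
}
```

With the embedded form, two runs with different seeds produce
different text and can never share a cache entry. With the argument
form, the text is identical across runs.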

The codegen cache is disabled automatically for a fragment that
uses a native UDF, because it can lead to a crash in that case. The
UDF is loaded into the LLVM execution engine's global mapping rather
than into the LLVM module, but the cache key is the LLVM module
bitcode, which does not reflect a change of the UDF address. If the
UDF is reloaded at runtime, for example after a database is
recreated, the cache could return an old UDF address and cause a
crash. The cache stays disabled for this case until there is a
better solution; IMPALA-11771 is filed to follow up.

The patch also introduces a new startup flag and new query options
for configuring and operating the feature.
Start option for configuration:
  - codegen_cache_capacity: The capacity of the cache, if set to 0,
    codegen cache is disabled.

Query option for operations:
  - disable_codegen_cache: Codegen cache will be disabled when it
    is set to true.

  - codegen_cache_mode: Defined by a new enum type
    TCodeGenCacheMode with four values: NORMAL and OPTIMAL, plus
    NORMAL_DEBUG and OPTIMAL_DEBUG, which are the debug variants of
    the first two.
    With NORMAL, the full key is stored in the cache. Each entry
    costs more memory because the key is the bitcode of the LLVM
    module, which can be large.
    With OPTIMAL, the cache stores only the hash code and length of
    the key. This greatly reduces memory consumption, but key
    collisions are possible.
    The debug modes behave the same as the non-debug modes but
    enable more logging and statistics, so they may be slower.
    Only valid when disable_codegen_cache is set to false.

New impalad metrics:
  - impala.codegen-cache.misses
  - impala.codegen-cache.entries-in-use
  - impala.codegen-cache.entries-in-use-bytes
  - impala.codegen-cache.entries-evicted
  - impala.codegen-cache.hits
  - impala.codegen-cache.entry-sizes

New profile Metrics:
  - CodegenCacheLookupTime
  - CodegenCacheSaveTime
  - ModuleBitcodeGenTime
  - NumCachedFunctions

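TPCH-1 performance evaluation (8 iterations) on AWS m5a.4xlarge.
The first iteration is excluded from the results to show the
benefit of the cache. The first Delta(Avg) column compares Cached
with NoCache; the second compares Cached with NoCodegen.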
Query     Cached(s) NoCache(s) Delta(Avg) NoCodegen(s)  Delta(Avg)
TPCH-Q1    0.39      1.02       -61.76%     5.59         -93.02%
TPCH-Q2    0.56      1.21       -53.72%     0.47         19.15%
TPCH-Q3    0.37      0.77       -51.95%     0.43         -13.95%
TPCH-Q4    0.36      0.51       -29.41%     0.33         9.09%
TPCH-Q5    0.39      1.1        -64.55%     0.39         0%
TPCH-Q6    0.24      0.27       -11.11%     0.77         -68.83%
TPCH-Q7    0.39      1.2        -67.5%      0.39         0%
TPCH-Q8    0.58      1.46       -60.27%     0.45         28.89%
TPCH-Q9    0.8       1.38       -42.03%     1            -20%
TPCH-Q10   0.6       1.03       -41.75%     0.85         -29.41%
TPCH-Q11   0.3       0.93       -67.74%     0.2          50%
TPCH-Q12   0.28      0.48       -41.67%     0.38         -26.32%
TPCH-Q13   1.11      1.22       -9.02%      1.16         -4.31%
TPCH-Q14   0.55      0.78       -29.49%     0.45         22.22%
TPCH-Q15   0.33      0.73       -54.79%     0.44         -25%
TPCH-Q16   0.32      0.78       -58.97%     0.41         -21.95%
TPCH-Q17   0.56      0.84       -33.33%     0.89         -37.08%
TPCH-Q18   0.54      0.92       -41.3%      0.89         -39.33%
TPCH-Q19   0.35      2.34       -85.04%     0.35         0%
TPCH-Q20   0.34      0.98       -65.31%     0.31         9.68%
TPCH-Q21   0.83      1.14       -27.19%     0.86         -3.49%
TPCH-Q22   0.26      0.52       -50%        0.25         4%
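
The Delta(Avg) values in the table are consistent with a relative
change of the cached time against the baseline in the column to its
left. A small sketch of that arithmetic, with PercentDelta as a
hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Relative change of the cached run against a baseline, in percent.
// Negative means the cached run was faster than the baseline.
double PercentDelta(double cached_s, double baseline_s) {
  assert(baseline_s > 0.0);
  return (cached_s - baseline_s) / baseline_s * 100.0;
}
```

For TPCH-Q1, PercentDelta(0.39, 1.02) is about -61.76 and
PercentDelta(0.39, 5.59) is about -93.02, matching the two delta
columns of that row.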

The results show a clear improvement over codegen without the
cache (the default setting). Compared to codegen disabled, however,
the codegen cache is, as expected, not always faster for short
queries. This is probably because the cached path still needs time
to prepare the codegen functions and generate the module bitcode
used as the key. If that preparation costs more than the codegen
functions save, especially for extremely short queries, the result
can be slower than running without codegen. There could be room
for improvement in the future.

We also measured the total cache entry size for the TPCH queries.
The data below shows the total codegen cache space used by each
query. OPTIMAL mode reduces the cache size considerably. The key is
the only difference between the two modes, so the reduction comes
from the much smaller key described above.

Query     Normal(KB)  Optimal(KB)
TPCH-Q1     604.1       50.9
TPCH-Q2     973.4       135.5
TPCH-Q3     561.1       36.5
TPCH-Q4     423.3       41.1
TPCH-Q5     866.9       93.3
TPCH-Q6     295.9       4.9
TPCH-Q7     1105.4      124.5
TPCH-Q8     1382.6      211
TPCH-Q9     1041.4      119.5
TPCH-Q10    738.4       65.4
TPCH-Q11    1201.6      136.3
TPCH-Q12    452.8       46.7
TPCH-Q13    541.3       48.1
TPCH-Q14    696.8       102.8
TPCH-Q15    1148.1      95.2
TPCH-Q16    740.6       77.4
TPCH-Q17    990.1       133.4
TPCH-Q18    376         70.8
TPCH-Q19    1280.1      179.5
TPCH-Q20    1260.9      180.7
TPCH-Q21    722.5       66.8
TPCH-Q22    713.1       49.8

Tests:
Ran exhaustive tests.
Added E2e testcase TestCodegenCache.
Added unit testcase LlvmCodeGenCacheTest.

Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
---
M be/src/codegen/CMakeLists.txt
A be/src/codegen/llvm-codegen-cache-test.cc
A be/src/codegen/llvm-codegen-cache.cc
A be/src/codegen/llvm-codegen-cache.h
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exprs/scalar-expr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-state.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/test-env.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/metrics.json
A testdata/workloads/functional-query/queries/QueryTest/codegen-cache-udf.test
M tests/common/test_result_verifier.py
A tests/custom_cluster/test_codegen_cache.py
24 files changed, 1,845 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/19181/17
--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 17
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
