Yida Wu has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/19181 )
Change subject: IMPALA-11470: Add Cache For Codegen Functions
......................................................................

IMPALA-11470: Add Cache For Codegen Functions

The patch adds support for caching CodeGen functions to improve the
performance of sub-second queries.

The main idea is to store the codegen functions in a cache and reuse
them when appropriate, avoiding repeated LLVM optimization, which can
take hundreds of milliseconds.

In this patch, we implement a cache to store codegen functions.
The cache is a singleton instance for each daemon and contains
multiple cache entries. Each cache entry is at the fragment level:
it stores all the codegen functions of a fragment. If exactly the
same fragment comes again, it should be able to find all the codegen
functions it needs in that cache entry, saving the optimization time.

The module bitcode is used as the key to the cache. It is generated
before the module optimization and final compilation.
If codegen_cache_mode is NORMAL, which is the default, we store the
full bitcode string as the key. Otherwise, if codegen_cache_mode is
set to OPTIMAL, we store a key containing only the hash code and the
total length of the full key, to reduce memory consumption.

Also, KrpcDataStreamSenderConfig::CodegenHashRow() is changed to pass
the hash seed as an argument, because the fragment can't hit the cache
if a dynamic hash seed is used within the codegen function.

Codegen cache is disabled automatically for a fragment using a native
UDF, because it can lead to a crash in this case.
The reason is that the UDF is loaded into the LLVM execution engine's
global mapping instead of the LLVM module. The current key to the
cache entry uses the LLVM module bitcode, which can't reflect a change
of the UDF address if the UDF is reloaded at runtime (for example,
after database recreation), so using an old UDF address from the cache
could lead to a crash. The cache is disabled in this case until there
is a better solution; filed IMPALA-11771 to follow up.

The patch also introduces the following new startup flag and query
options for configuring and operating the feature.

Start option for configuration:
  - codegen_cache_capacity: The capacity of the cache. If set to 0,
    the codegen cache is disabled.

Query option for operations:
  - disable_codegen_cache: The codegen cache is disabled when this is
    set to true.
  - codegen_cache_mode: Defined by a new enum type TCodeGenCacheMode.
    There are four types: NORMAL and OPTIMAL, plus NORMAL_DEBUG and
    OPTIMAL_DEBUG, which are the debug modes of the first two.
    With NORMAL, the full key is stored in the cache. This costs more
    memory per entry because the key is the bitcode of the LLVM
    module, which can be large.
    With OPTIMAL, the cache stores only the hash code and length of
    the key. This significantly reduces memory consumption, but hash
    collisions become possible.
    The debug modes behave the same as the non-debug modes but enable
    more logs or statistics, so they could be slower.
    Only valid when disable_codegen_cache is set to false.
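
The difference between the NORMAL and OPTIMAL key modes, and the effect
of codegen_cache_capacity, can be sketched roughly as below. This is an
illustrative Python model only, not the C++ implementation in
llvm-codegen-cache.cc. The names CacheMode, make_key and
CodegenCacheSketch are hypothetical, truncated SHA-256 stands in for
whatever hash the patch uses, and both the LRU eviction policy and the
entry-size accounting are assumptions.

```python
import hashlib
from collections import OrderedDict
from enum import Enum


class CacheMode(Enum):
    # Mirrors the two non-debug values of TCodeGenCacheMode.
    NORMAL = "NORMAL"
    OPTIMAL = "OPTIMAL"


def make_key(bitcode: bytes, mode: CacheMode) -> bytes:
    """Build a cache key from the (pre-optimization) module bitcode."""
    if mode is CacheMode.NORMAL:
        # Full bitcode as key: no collisions, but the key can be large.
        return bitcode
    # OPTIMAL: keep only a hash plus the length of the full key.
    # Assumption: truncated SHA-256 is a stand-in for the real hash.
    digest = hashlib.sha256(bitcode).digest()[:16]
    return digest + len(bitcode).to_bytes(8, "little")


class CodegenCacheSketch:
    """Per-daemon cache with one entry per fragment (hypothetical model)."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes  # 0 disables the cache
        self.used_bytes = 0
        self.entries = OrderedDict()  # key -> (functions, size)
        self.hits = 0
        self.misses = 0
        self.evicted = 0

    def lookup(self, key: bytes):
        if self.capacity_bytes == 0:
            return None  # cache disabled
        if key not in self.entries:
            self.misses += 1
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return self.entries[key][0]

    def store(self, key: bytes, functions: dict) -> bool:
        if self.capacity_bytes == 0:
            return False
        # Assumption: an entry is charged for its key plus its functions.
        size = len(key) + sum(len(code) for code in functions.values())
        if size > self.capacity_bytes:
            return False
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key)[1]
        # Assumption: least recently used entries are evicted first.
        while self.used_bytes + size > self.capacity_bytes:
            _, (_, old_size) = self.entries.popitem(last=False)
            self.used_bytes -= old_size
            self.evicted += 1
        self.entries[key] = (functions, size)
        self.used_bytes += size
        return True
```

In this model an OPTIMAL key is always 24 bytes regardless of module
size, which is where the memory saving comes from. Two different
modules with the same hash and length would map to the same entry,
which is the collision risk mentioned above.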

New impalad metrics:
  - impala.codegen-cache.misses
  - impala.codegen-cache.entries-in-use
  - impala.codegen-cache.entries-in-use-bytes
  - impala.codegen-cache.entries-evicted
  - impala.codegen-cache.hits
  - impala.codegen-cache.entry-sizes

New profile metrics:
  - CodegenCacheLookupTime
  - CodegenCacheSaveTime
  - ModuleBitcodeGenTime
  - NumCachedFunctions

TPCH-1 performance evaluation (8 iterations) on AWS m5a.4xlarge.
The first iteration is excluded from the results to show the benefit
of the cache:

Query     Cached(s)  NoCache(s)  Delta(Avg)  NoCodegen(s)  Delta(Avg)
TPCH-Q1   0.39       1.02        -61.76%     5.59          -93.02%
TPCH-Q2   0.56       1.21        -53.72%     0.47          19.15%
TPCH-Q3   0.37       0.77        -51.95%     0.43          -13.95%
TPCH-Q4   0.36       0.51        -29.41%     0.33          9.09%
TPCH-Q5   0.39       1.1         -64.55%     0.39          0%
TPCH-Q6   0.24       0.27        -11.11%     0.77          -68.83%
TPCH-Q7   0.39       1.2         -67.5%      0.39          0%
TPCH-Q8   0.58       1.46        -60.27%     0.45          28.89%
TPCH-Q9   0.8        1.38        -42.03%     1             -20%
TPCH-Q10  0.6        1.03        -41.75%     0.85          -29.41%
TPCH-Q11  0.3        0.93        -67.74%     0.2           50%
TPCH-Q12  0.28       0.48        -41.67%     0.38          -26.32%
TPCH-Q13  1.11       1.22        -9.02%      1.16          -4.31%
TPCH-Q14  0.55       0.78        -29.49%     0.45          22.22%
TPCH-Q15  0.33       0.73        -54.79%     0.44          -25%
TPCH-Q16  0.32       0.78        -58.97%     0.41          -21.95%
TPCH-Q17  0.56       0.84        -33.33%     0.89          -37.08%
TPCH-Q18  0.54       0.92        -41.3%      0.89          -39.33%
TPCH-Q19  0.35       2.34        -85.04%     0.35          0%
TPCH-Q20  0.34       0.98        -65.31%     0.31          9.68%
TPCH-Q21  0.83       1.14        -27.19%     0.86          -3.49%
TPCH-Q22  0.26       0.52        -50%        0.25          4%

Compared to codegen without the cache (the default setting), every
query is faster with the cache, by 9.02% (Q13) to 85.04% (Q19).
However, compared to codegen disabled, the codegen cache is, as
expected, not always faster for short queries. This is probably
because the codegen cache still needs some time to prepare the codegen
functions and generate the module bitcode used as the key. If that
preparation time is larger than the benefit from the codegen
functions, especially for extremely short queries, the result can be
slower than not using codegen at all. There could be room for
improvement in the future.

Tests:
Ran exhaustive tests.
Added E2e testcase TestCodegenCache.
Added unit testcase LlvmCodeGenCacheTest.

Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
---
M be/src/codegen/CMakeLists.txt
A be/src/codegen/llvm-codegen-cache-test.cc
A be/src/codegen/llvm-codegen-cache.cc
A be/src/codegen/llvm-codegen-cache.h
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exprs/scalar-expr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-state.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/test-env.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/metrics.json
A testdata/workloads/functional-query/queries/QueryTest/codegen-cache-udf.test
M tests/common/test_result_verifier.py
A tests/custom_cluster/test_codegen_cache.py
24 files changed, 1,785 insertions(+), 55 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/19181/15
--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 15
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>