Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/19181 )
Change subject: IMPALA-11470: Add Cache For Codegen Functions ...................................................................... Patch Set 6: (10 comments) Thanks Michael and Joe for the review. This patch mainly changes the hash code of a CodeGenCacheKey to 128bits to reduce the collision chance and fixes a memory leak issue during cache entry eviction due to unable to release a const shared pointer in the entry. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@70 PS5, Line 70: size_t operator()(const HashCode& h) const { > So this takes the first 8 bytes of the key and returns them? Yes, set the first sizeof(HashCode) bytes of the key as the hash code. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@81 PS5, Line 81: > It looks like everywhere you use this, you convert it to a slice. So maybe Done http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@92 PS5, Line 92: void set_key(string key) { key_.swap(key); } > I'm confused why there's a constructor class at all. Why isn't this the con I think the main purpose of setting a Constructor is to be able to collect and store the data from different objects like fragment-state or query-state to finally generate a key via some complex computation, while keeping the CodeGenCacheKey as a simpler structure. Right now, only the bitcode string is needed for the key, but the next version might include more things to replace the bitcode string, so I prefer to keep the Constructor as an entry to construct the key for now. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@195 PS5, Line 195: void RollbackHitCount() { > nit: don't need to say "simultaneously" twice. "The purpose of it is to pre Done http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@51 PS5, Line 51: } > nit: if you want to guarantee a pointer can't be null, pass a reference ins The codebase convention is to use by pointer instead of by reference for an output parameter to make it more obvious. (Even though not all the places did) https://gerrit.cloudera.org/#/c/16318/24/be/src/runtime/tmp-file-mgr-internal.h@386 http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@54 PS5, Line 54: void CodeGenCacheKeyConstructor::construct( > This is the only place GeneralElementSize is used, and the name isn't very It could be useful if we have a key analyzer later. Changed the name to ElementLengthType, hope it looks clearer. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@66 PS5, Line 66: string result; > I think we reduce the randomness in the hash by including 8 null bytes at t Done http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@122 PS5, Line 122: reinterpret_cast<const CodeGenCacheEntry*>(cache_->Value(handle).data()); > My biggest concern is what happens during hash collisions. This patch doesn This is a good idea. Changed to use MurmurHash3_x64_128 and added comments. From the previous discussion, we may change to use the full bitcode which is debug mode if collision is found or simply disable the cache, but the chance should be quite small. I think currently the problem is the bitcode string is too large, if we have a smaller key in the next version, we may use that to replace hash code in non-debug mode. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen.cc@1137 PS5, Line 1137: // Get pointers to all codegen'd functions. : for (int i = 0; i < fns_to_jit_compile_.size(); ++i) { > Could we fall back to regular codegen path if we encounter this? Changed to log a warning. But found an issue when using getFunctionAddress with a non-existent function name, it will hit assertion "report_fatal_error called with success value" within the execution engine , tested it in the old impala without cache, same. Looks like a bug in llvm 5 to me, not sure if llvm 16 would be good. But for the codegen cache, the engine doesn't expect to have non-existent function names unless we are getting a wrong cache with the same key (hash code). One solution to me is adding function names as part of the key to make sure an engine from a cache hit should have all the functions, or putting names in the cache entry, double check them before calling getFunctionAddress. http://gerrit.cloudera.org:8080/#/c/19181/4/tests/custom_cluster/test_codegen_cache.py File tests/custom_cluster/test_codegen_cache.py: http://gerrit.cloudera.org:8080/#/c/19181/4/tests/custom_cluster/test_codegen_cache.py@27 PS4, Line 27: class TestCodegenCache(CustomClusterTestSuite): > It would be nice to have some negative test cases for the times when we dis Seems a bit difficult to detect because it is fragment level disabled instead of query level, and for simplest case like "select udf(x)", the udf is called and output a const value before reaching the executor, so it doesn't contain the udf function in the codegen cache. But from my manual tests on several cases, they look good. Will add testcases for this later. -- To view, visit http://gerrit.cloudera.org:8080/19181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1 Gerrit-Change-Number: 19181 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com> Gerrit-Comment-Date: Tue, 15 Nov 2022 18:44:11 +0000 Gerrit-HasComments: Yes