Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19181 )
Change subject: IMPALA-11470: Add Cache For Codegen Functions ...................................................................... Patch Set 9: (5 comments) http://gerrit.cloudera.org:8080/#/c/19181/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19181/9//COMMIT_MSG@34 PS9, Line 34: As a limitation, we don't cache the udf functions, because it > Yes, if any udf is in the fragment, so the execution engine of that fragmen I agree that adding UDF support in this patch does not worth the risk. On the other hand it would be great to have a regression test that can more or less reliably reproduce the issue to help further investigation. It would be also good to know what kind of UDFs are problematic - my guess is native, and generally Hive UDFs should be safe as in that case we do not handle the loaded module in c++. We discussed this a bit with Daniel, and one theory is that reusing the same codegend module can crash if the module was reloaded and the function addresses have changed, as we are saving the function pointers during codegen: https://github.com/apache/impala/blob/84fa6d210d3966e5ece8b4ac84ff8bd8780dec4e/be/src/codegen/llvm-codegen.cc#L860 https://github.com/apache/impala/blob/84fa6d210d3966e5ece8b4ac84ff8bd8780dec4e/be/src/codegen/llvm-codegen.cc#L916 If pointer set in addGlobalMapping does not affect the bitcode, then the module can be incorrectly reused. http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.h@38 PS9, Line 38: If the cache is in a debug mode, : /// the content of the key will be after the hash code, which could be a combination : /// of several keys, otherwise, in non-debug mode, the CodeGenCacheKey to be stored : /// to the cache would only contain a hash code to reduce the memory consumption I wonder why the full key is stored in CodeGenCacheKey in debug mode - wouldn't it make debug mode more efficient if the full key was stored in the value and during lookup first get the value for the key and then check that the full key actually matches? This wouldn't allow keys with the same hash to coexist in the cache, but this is the same behavior as in non-debug mode. http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.cc@185 PS9, Line 185: // Because the Cache::Insert() requires the caller to avoid multiple handles for : // the same key to inser > Thanks for pointing this out. I didn't notice it, and surprised the cache c The two caches we linked are a bit different - when implementing LIRS eviction policy Joe McDonnell copied the Kudu cache, added a new implementation and also this comment. https://gerrit.cloudera.org/c/15306/25/be/src/util/cache/cache.h It is possible that this comment is not relevant for LRU cache, but it would be better to consult this with Joe. http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1131 PS9, Line 1131: LOG(INFO) << DebugCacheEntryString(cache_key, true /*is_lookup*/, : state_->query_options().codegen_cache_debug_mode, suc); > Yeah, but non-debug mode mainly contains the query id and the hash, they ar No, I agree with the current log level. http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1160 PS9, Line 1160: void* jitted_function = reinterpret_cast<void*>( : execution_engine_cached_->getFunctionAddress(function->getName())); > Not sure how to get those from the jitted_function, but took a look at the Ok. -- To view, visit http://gerrit.cloudera.org:8080/19181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1 Gerrit-Change-Number: 19181 Gerrit-PatchSet: 9 Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com> Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com> Gerrit-Comment-Date: Wed, 23 Nov 2022 17:05:30 +0000 Gerrit-HasComments: Yes