Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19181 )

Change subject: IMPALA-11470: Add Cache For Codegen Functions
......................................................................


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19181/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19181/9//COMMIT_MSG@34
PS9, Line 34: As a limitation, we don't cache the udf functions, because it
> Yes, if any udf is in the fragment, so the execution engine of that fragmen
I agree that adding UDF support in this patch does not worth the risk. On the 
other hand it would be great to have a regression test that can more or less 
reliably reproduce the issue to help further investigation. It would be also 
good to know what kind of UDFs are problematic - my guess is native, and 
generally Hive UDFs should be safe as in that case we do not handle the loaded 
module in c++.

We discussed this a bit with Daniel, and one theory is that  reusing the same 
codegend module can crash if the module was reloaded and the function addresses 
have changed, as we are saving the function pointers during codegen:
https://github.com/apache/impala/blob/84fa6d210d3966e5ece8b4ac84ff8bd8780dec4e/be/src/codegen/llvm-codegen.cc#L860
https://github.com/apache/impala/blob/84fa6d210d3966e5ece8b4ac84ff8bd8780dec4e/be/src/codegen/llvm-codegen.cc#L916
If pointer set in addGlobalMapping does not affect the bitcode, then the module 
can be incorrectly reused.


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.h@38
PS9, Line 38: If the cache is in a debug mode,
            : /// the content of the key will be after the hash code, which 
could be a combination
            : /// of several keys, otherwise, in non-debug mode, the 
CodeGenCacheKey to be stored
            : /// to the cache would only contain a hash code to reduce the 
memory consumption
I wonder why the full key is stored in CodeGenCacheKey in debug mode - wouldn't 
it make debug mode more efficient if the full key was stored in the value and 
during lookup first get the value for the key and then check that the full key 
actually matches? This wouldn't allow keys with the same hash to coexist in the 
cache, but this is the same behavior as in non-debug mode.


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.cc@185
PS9, Line 185:     // Because the Cache::Insert() requires the caller to avoid 
multiple handles for
             :     // the same key to inser
> Thanks for pointing this out. I didn't notice it, and surprised the cache c
The two caches we linked are a bit different - when implementing LIRS eviction 
policy Joe McDonnell copied the Kudu cache, added a new implementation and also 
this comment. https://gerrit.cloudera.org/c/15306/25/be/src/util/cache/cache.h
It is possible that this comment is not relevant for LRU cache, but it would be 
better to consult this with Joe.


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1131
PS9, Line 1131:   LOG(INFO) << DebugCacheEntryString(cache_key, true 
/*is_lookup*/,
              :       state_->query_options().codegen_cache_debug_mode, suc);
> Yeah, but non-debug mode mainly contains the query id and the hash, they ar
No, I agree with the current log level.


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1160
PS9, Line 1160:       void* jitted_function = reinterpret_cast<void*>(
              :           
execution_engine_cached_->getFunctionAddress(function->getName()));
> Not sure how to get those from the jitted_function, but took a look at the
Ok.



--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 9
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Nov 2022 17:05:30 +0000
Gerrit-HasComments: Yes

Reply via email to