Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19181 )
Change subject: IMPALA-11470: Add Cache For Codegen Functions ...................................................................... Patch Set 5: (9 comments) http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@70 PS5, Line 70: Slice hash_code_slice() { return Slice(key_.c_str(), sizeof(HashCode)); } So this takes the first 8 bytes of the key and returns them? http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@81 PS5, Line 81: string& str() { return key_; } It looks like everywhere you use this, you convert it to a slice. So maybe the getter should be str_slice and return a Slice. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@92 PS5, Line 92: class CodeGenCacheKeyConstructor { I'm confused why there's a constructor class at all. Why isn't this the constructor function for CodeGenCacheKey? This should be straightforward: class CodeGenCacheKey { CodeGenCacheKey(std::string& source_content) { /* compute the key and store to key_ */ } // accessor methods here ... private: std::string key_; } Looking at how this is initialized in llvm-codegen.cc, looks like it's useful to have a default constructor and be able to check it's not empty, so those can stay. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@195 PS5, Line 195: /// The purpose of it is to prevent the same key simultaneously to be inserted nit: don't need to say "simultaneously" twice. "The purpose of it is to prevent the same key to be inserted simultaneously." would suffice. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@51 PS5, Line 51: DCHECK(cache_key != nullptr); nit: if you want to guarantee a pointer can't be null, pass a reference instead. But I'm suggesting to significantly restructure how you're creating CodeGenCacheKeys, so this function should go away. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@54 PS5, Line 54: int len_general_size = sizeof(CodeGenCacheKey::GeneralElementSize); This is the only place GeneralElementSize is used, and the name isn't very descriptive. You're really just allocating space for 2 integers. Taking sizeof(int) and adding a comment that it's for the key length and content length would probably be clearer. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@66 PS5, Line 66: hashcode = HashUtil::MurmurHash2_64(result.c_str(), length, 0 /*seed*/); I think we reduce the randomness in the hash by including 8 null bytes at the beginning of it (from the uninitialized hoshcode). http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@76 PS5, Line 76: codegen_cache_hits_ = metrics->AddCounter("impala.codegen-cache.hits", 0); nit: we could do all the rest of this method in the constructor, and probably don't need to use unique_ptr and explicit reset as much. But I understand the reason for a separate Init function, and once you have that the rest isn't terrible. http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@122 PS5, Line 122: Slice key = cache_key.hash_code_slice(); My biggest concern is what happens during hash collisions. This patch doesn't have any discussion of that. Are there things we could check - like call signature - to reduce the likelihood we try to use the wrong bytecode? Can we consider using MurmurHash3_x64_128 for a larger hash space? -- To view, visit http://gerrit.cloudera.org:8080/19181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1 Gerrit-Change-Number: 19181 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com> Gerrit-Comment-Date: Fri, 11 Nov 2022 00:19:07 +0000 Gerrit-HasComments: Yes