Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19181 )

Change subject: IMPALA-11470: Add Cache For Codegen Functions
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@70
PS5, Line 70:   Slice hash_code_slice() { return Slice(key_.c_str(), 
sizeof(HashCode)); }
So this takes the first 8 bytes of the key and returns them?


http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@81
PS5, Line 81:   string& str() { return key_; }
It looks like everywhere you use this, you convert it to a slice. So maybe the 
getter should be str_slice and return a Slice.


http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@92
PS5, Line 92: class CodeGenCacheKeyConstructor {
I'm confused why there's a constructor class at all. Why isn't this the 
constructor function for CodeGenCacheKey?

This should be straightforward:

class CodeGenCacheKey {
  CodeGenCacheKey(std::string& source_content) { /* compute the key and store 
to key_ */ }

  // accessor methods here
  ...
private:
  std::string key_;
}

Looking at how this is initialized in llvm-codegen.cc, looks like it's useful 
to have a default constructor and be able to check it's not empty, so those can 
stay.


http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.h@195
PS5, Line 195:   /// The purpose of it is to prevent the same key 
simultaneously to be inserted
nit: don't need to say "simultaneously" twice. "The purpose of it is to prevent 
the same key to be inserted simultaneously." would suffice.


http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@51
PS5, Line 51:   DCHECK(cache_key != nullptr);
nit: if you want to guarantee a pointer can't be null, pass a reference 
instead. But I'm suggesting to significantly restructure how you're creating 
CodeGenCacheKeys, so this function should go away.


http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@54
PS5, Line 54:   int len_general_size = 
sizeof(CodeGenCacheKey::GeneralElementSize);
This is the only place GeneralElementSize is used, and the name isn't very 
descriptive. You're really just allocating space for 2 integers. Taking 
sizeof(int) and adding a comment that it's for the key length and content 
length would probably be clearer.


http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@66
PS5, Line 66:   hashcode = HashUtil::MurmurHash2_64(result.c_str(), length, 0 
/*seed*/);
I think we reduce the randomness in the hash by including 8 null bytes at the 
beginning of it (from the uninitialized hoshcode).


http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@76
PS5, Line 76:   codegen_cache_hits_ = 
metrics->AddCounter("impala.codegen-cache.hits", 0);
nit: we could do all the rest of this method in the constructor, and probably 
don't need to use unique_ptr and explicit reset as much. But I understand the 
reason for a separate Init function, and once you have that the rest isn't 
terrible.


http://gerrit.cloudera.org:8080/#/c/19181/5/be/src/codegen/llvm-codegen-cache.cc@122
PS5, Line 122:   Slice key = cache_key.hash_code_slice();
My biggest concern is what happens during hash collisions. This patch doesn't 
have any discussion of that.

Are there things we could check - like call signature - to reduce the 
likelihood we try to use the wrong bytecode?

Can we consider using MurmurHash3_x64_128 for a larger hash space?



--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 5
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Comment-Date: Fri, 11 Nov 2022 00:19:07 +0000
Gerrit-HasComments: Yes

Reply via email to