[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20211 ) Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. Patch Set 2: I can't see that 'bool disable_codegen_cache_' be/src/runtime/query-state.h is ever set to anything but 'false', maybe we can remove that variable? -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 23 Aug 2023 14:07:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Michael Smith has removed a vote on this change. Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. Removed Code-Review+1 by Michael Smith -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20211 ) Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc@1384 PS1, Line 1384: DCHECK(cache_key_); > '!cache_key_->empty()' is not checked in the new version, is it? Good catch, I was unclear what part I was referring to and missed the removal of ->empty on review. -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 18 Aug 2023 19:56:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20211 ) Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. Patch Set 2: (21 comments) http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@9 PS2, Line 9: the Nit: "the" not needed. http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@10 PS2, Line 10: the Nit: "the" not needed. http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@11 PS2, Line 11: Nit: is enabled http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@13 PS2, Line 13: the Nit: "the" not needed. http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@18 PS2, Line 18: reducing eliminating? http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@18 PS2, Line 18: of Nit: for? http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@19 PS2, Line 19: if cache exists "if an appropriate cache entry exists" could be better. http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@21 PS2, Line 21: Did the perf a-b test with async codegen and codegen cache enabled, Shouldn't the performance improve? http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen-cache-test.cc@599 PS2, Line 599: NULL Nit: 'nullptr' would be better. http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen-cache-test.cc@608 PS2, Line 608: NULL Nit: 'nullptr' would be better. http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.h@351 PS2, Line 351: PrepareFinalizeModule() must be called before FinalizeModule() This should come in the second sentence, after "This should be called after all functions to JIT have been added to the module via AddFunctionToJit() ..." http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.h@897 PS2, Line 897: be Nit: no need for "be". http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.h@910 PS2, Line 910: /// If true, the module has been prepared and is ready for compilation. Could add that it is set in PrepareFinalizeModule(). http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc@1384 PS1, Line 1384: DCHECK(cache_key_); > Done '!cache_key_->empty()' is not checked in the new version, is it? Also, I think 'cache_key_ != nullptr' is more readable as it's more explicit. I, at least, have always used this longer form in Impala code. http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc@1277 PS2, Line 1277: DCHECK(!is_prepared_); We could also keep "DCHECK(!is_compiled_);" just in case. http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc@1279 PS2, Line 1279: DCHECK(can_return != nullptr); I'd put it after the previous DCHECK(s) before setting 'is_prepared_', so all precondition checking is done first. http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc@1340 PS2, Line 1340: DCHECK(cache_key_); I don't think this checks that 'cache_key_' is not empty, CodeGenCacheKey doesn't have operator bool(). This will only check that 'cache_key_' does not hold a NULL pointer, which is never the case because of L1327. http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc@1382 PS2, Line 1382: bool codegen_cache_enabled = state_->codegen_cache_enabled() && codegen_cache_enabled_; In PrepareFinalizeModule() you didn't extract it into a variable - I think both approaches are ok but it's better if it's the same in both functions. http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc@1384 PS2, Line 1384: DCHECK(cache_key_); See comment on L1340. If you only want to make sure it's not NULL, my opinion is that writing 'cache_key_ != nullptr' explicitly is better. http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/runtime/fragment-state.cc File be/src/runtime/fragment-state.cc: http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/runtime/fragment-state.cc@131 PS2, Line 131: Status status = llvm_codegen->PrepareFinalizeModule(&can_return); With this change, the work performed by PrepareFinalizeModule() is always done synchronously even if codegen caching is disabled. Separating PrepareFinalizeModule() (and running it synchronously) is only useful for performance if codegen caching is enabled. This could cause a performance regression when codegen caching is
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20211 ) Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. Patch Set 2: Thanks Michael for the review. Passed exhaustive tests. -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 08 Aug 2023 19:39:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20211 ) Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13725/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Tue, 08 Aug 2023 18:33:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/20211 ) Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. IMPALA-12253: Improve the integration of codegen cache and async codegen To improve the integration of codegen cache and async codegen, the seperate the FinalizeModule() into two parts, PrepareFinalizeModule() and FinalizeModule(). If codegen cache enabled, PrepareFinalizeModule() would generate the key of the module, so that if there is a cache hit, we could skip the FinalizeModule() no matter in async or sync. Otherwise, if the cache misses, we would keep going and call FinalizeModule() to finish the compilation of the module. This helps the codegen cache to have the advantage of async codegen for the first visit while reducing the need of creating new threads for async codegen if cache exists. Did the perf a-b test with async codegen and codegen cache enabled, didn't see any significant performance regression. Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 --- M be/src/benchmarks/expr-benchmark.cc M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exprs/expr-codegen-test.cc M be/src/runtime/fragment-state.cc M be/src/runtime/test-env.cc M be/src/service/fe-support.cc 10 files changed, 125 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/20211/2 -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20211 ) Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. Patch Set 1: (5 comments) Overall makes sense. http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc@1384 PS1, Line 1384: DCHECK(cache_key_ != nullptr && !cache_key_->empty()); nit: https://en.cppreference.com/w/cpp/memory/unique_ptr/operator_bool is equivalent. http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc@1386 PS1, Line 1386: store_cache_status = StoreCache(*(cache_key_.get())); nit: https://en.cppreference.com/w/cpp/memory/unique_ptr/operator* is equivalent. Also applies to line 1341. http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/runtime/fragment-state.cc File be/src/runtime/fragment-state.cc: http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/runtime/fragment-state.cc@128 PS1, Line 128: bool can_return = false; Why initialize can_return here but not in fe-support.cc? Seems inconsistent. http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/runtime/fragment-state.cc@139 PS1, Line 139: RETURN_IF_ERROR(llvm_codegen->FinalizeModuleAsync(event_sequence)); Is there some sort of test we could add that FinalizeModule and FinalizeModuleAsync are skipped on a cache hit? http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/service/fe-support.cc@247 PS1, Line 247: status = codegen->FinalizeModule(); Maybe DCHECK(status.ok()) here. -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Tue, 18 Jul 2023 18:20:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20211 ) Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13565/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Tue, 18 Jul 2023 01:48:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20211 Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. IMPALA-12253: Improve the integration of codegen cache and async codegen To improve the integration of codegen cache and async codegen, the seperate the FinalizeModule() into two parts, PrepareFinalizeModule() and FinalizeModule(). If codegen cache enabled, PrepareFinalizeModule() would generate the key of the module, so that if there is a cache hit, we could skip the FinalizeModule() no matter in async or sync. Otherwise, if the cache misses, we would keep going and call FinalizeModule() to finish the compilation of the module. This helps the codegen cache to have the advantage of async codegen for the first visit while reducing the need of creating new threads for async codegen if cache exists. Did the perf a-b test with async codegen and codegen cache enabled, didn't see any significant performance regression. Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 --- M be/src/benchmarks/expr-benchmark.cc M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exprs/expr-codegen-test.cc M be/src/runtime/fragment-state.cc M be/src/service/fe-support.cc 9 files changed, 80 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/20211/1 -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Michael Smith