[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen

2023-08-23 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20211 )

Change subject: IMPALA-12253: Improve the integration of codegen cache and 
async codegen
..


Patch Set 2:

I can't see that 'bool disable_codegen_cache_' be/src/runtime/query-state.h is 
ever set to anything but 'false', maybe we can remove that variable?


--
To view, visit http://gerrit.cloudera.org:8080/20211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
Gerrit-Change-Number: 20211
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Wed, 23 Aug 2023 14:07:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen

2023-08-18 Thread Michael Smith (Code Review)
Michael Smith has removed a vote on this change.

Change subject: IMPALA-12253: Improve the integration of codegen cache and 
async codegen
..


Removed Code-Review+1 by Michael Smith 
--
To view, visit http://gerrit.cloudera.org:8080/20211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
Gerrit-Change-Number: 20211
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen

2023-08-18 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20211 )

Change subject: IMPALA-12253: Improve the integration of codegen cache and 
async codegen
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc@1384
PS1, Line 1384: DCHECK(cache_key_);
> '!cache_key_->empty()' is not checked in the new version, is it?
Good catch, I was unclear what part I was referring to and missed the removal 
of ->empty on review.



--
To view, visit http://gerrit.cloudera.org:8080/20211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
Gerrit-Change-Number: 20211
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 18 Aug 2023 19:56:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen

2023-08-17 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20211 )

Change subject: IMPALA-12253: Improve the integration of codegen cache and 
async codegen
..


Patch Set 2:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@9
PS2, Line 9: the
Nit: "the" not needed.


http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@10
PS2, Line 10: the
Nit: "the" not needed.


http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@11
PS2, Line 11:
Nit: is enabled


http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@13
PS2, Line 13: the
Nit: "the" not needed.


http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@18
PS2, Line 18: reducing
eliminating?


http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@18
PS2, Line 18: of
Nit: for?


http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@19
PS2, Line 19: if cache exists
"if an appropriate cache entry exists" could be better.


http://gerrit.cloudera.org:8080/#/c/20211/2//COMMIT_MSG@21
PS2, Line 21: Did the perf a-b test with async codegen and codegen cache 
enabled,
Shouldn't the performance improve?


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen-cache-test.cc@599
PS2, Line 599: NULL
Nit: 'nullptr' would be better.


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen-cache-test.cc@608
PS2, Line 608: NULL
Nit: 'nullptr' would be better.


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.h@351
PS2, Line 351: PrepareFinalizeModule() must be called before FinalizeModule()
This should come in the second sentence, after "This should be called after all 
functions to JIT have been added to the module via AddFunctionToJit() ..."


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.h@897
PS2, Line 897: be
Nit: no need for "be".


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.h@910
PS2, Line 910:   /// If true, the module has been prepared and is ready for 
compilation.
Could add that it is set in PrepareFinalizeModule().


http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc@1384
PS1, Line 1384: DCHECK(cache_key_);
> Done
'!cache_key_->empty()' is not checked in the new version, is it?

Also, I think 'cache_key_ != nullptr' is more readable as it's more explicit. 
I, at least, have always used this longer form in Impala code.


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc@1277
PS2, Line 1277:   DCHECK(!is_prepared_);
We could also keep "DCHECK(!is_compiled_);" just in case.


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc@1279
PS2, Line 1279:   DCHECK(can_return != nullptr);
I'd put it after the previous DCHECK(s) before setting 'is_prepared_', so all 
precondition checking is done first.


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc@1340
PS2, Line 1340: DCHECK(cache_key_);
I don't think this checks that 'cache_key_' is not empty, CodeGenCacheKey 
doesn't have operator bool().
This will only check that 'cache_key_' does not hold a NULL pointer, which is 
never the case because of L1327.


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc@1382
PS2, Line 1382:   bool codegen_cache_enabled = state_->codegen_cache_enabled() 
&& codegen_cache_enabled_;
In PrepareFinalizeModule() you didn't extract it into a variable - I think both 
approaches are ok but it's better if it's the same in both functions.


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/codegen/llvm-codegen.cc@1384
PS2, Line 1384: DCHECK(cache_key_);
See comment on L1340.
If you only want to make sure it's not NULL, my opinion is that writing 
'cache_key_ != nullptr' explicitly is better.


http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/runtime/fragment-state.cc
File be/src/runtime/fragment-state.cc:

http://gerrit.cloudera.org:8080/#/c/20211/2/be/src/runtime/fragment-state.cc@131
PS2, Line 131:   Status status = 
llvm_codegen->PrepareFinalizeModule(&can_return);
With this change, the work performed by PrepareFinalizeModule() is always done 
synchronously even if codegen caching is disabled.

Separating PrepareFinalizeModule() (and running it synchronously) is only 
useful for performance if codegen caching is enabled.

This could cause a performance regression when codegen caching is

[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen

2023-08-08 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20211 )

Change subject: IMPALA-12253: Improve the integration of codegen cache and 
async codegen
..


Patch Set 2:

Thanks Michael for the review. Passed exhaustive tests.


--
To view, visit http://gerrit.cloudera.org:8080/20211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
Gerrit-Change-Number: 20211
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Tue, 08 Aug 2023 19:39:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen

2023-08-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20211 )

Change subject: IMPALA-12253: Improve the integration of codegen cache and 
async codegen
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13725/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
Gerrit-Change-Number: 20211
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 08 Aug 2023 18:33:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen

2023-08-08 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/20211 )

Change subject: IMPALA-12253: Improve the integration of codegen cache and 
async codegen
..

IMPALA-12253: Improve the integration of codegen cache and async codegen

To improve the integration of codegen cache and async codegen, the
seperate the FinalizeModule() into two parts, PrepareFinalizeModule()
and FinalizeModule(). If codegen cache enabled,
PrepareFinalizeModule() would generate the key of the module, so that
if there is a cache hit, we could skip the FinalizeModule() no matter
in async or sync. Otherwise, if the cache misses, we would keep going
and call FinalizeModule() to finish the compilation of the module.

This helps the codegen cache to have the advantage of async codegen
for the first visit while reducing the need of creating new threads
for async codegen if cache exists.

Did the perf a-b test with async codegen and codegen cache enabled,
didn't see any significant performance regression.

Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/fragment-state.cc
M be/src/runtime/test-env.cc
M be/src/service/fe-support.cc
10 files changed, 125 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/20211/2
--
To view, visit http://gerrit.cloudera.org:8080/20211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
Gerrit-Change-Number: 20211
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen

2023-07-18 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20211 )

Change subject: IMPALA-12253: Improve the integration of codegen cache and 
async codegen
..


Patch Set 1:

(5 comments)

Overall makes sense.

http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc@1384
PS1, Line 1384: DCHECK(cache_key_ != nullptr && !cache_key_->empty());
nit: https://en.cppreference.com/w/cpp/memory/unique_ptr/operator_bool is 
equivalent.


http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/codegen/llvm-codegen.cc@1386
PS1, Line 1386: store_cache_status = StoreCache(*(cache_key_.get()));
nit: https://en.cppreference.com/w/cpp/memory/unique_ptr/operator* is 
equivalent. Also applies to line 1341.


http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/runtime/fragment-state.cc
File be/src/runtime/fragment-state.cc:

http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/runtime/fragment-state.cc@128
PS1, Line 128:   bool can_return = false;
Why initialize can_return here but not in fe-support.cc? Seems inconsistent.


http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/runtime/fragment-state.cc@139
PS1, Line 139: 
RETURN_IF_ERROR(llvm_codegen->FinalizeModuleAsync(event_sequence));
Is there some sort of test we could add that FinalizeModule and 
FinalizeModuleAsync are skipped on a cache hit?


http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/20211/1/be/src/service/fe-support.cc@247
PS1, Line 247:   status = codegen->FinalizeModule();
Maybe DCHECK(status.ok()) here.



--
To view, visit http://gerrit.cloudera.org:8080/20211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
Gerrit-Change-Number: 20211
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 18 Jul 2023 18:20:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen

2023-07-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20211 )

Change subject: IMPALA-12253: Improve the integration of codegen cache and 
async codegen
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13565/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
Gerrit-Change-Number: 20211
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 18 Jul 2023 01:48:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen

2023-07-17 Thread Yida Wu (Code Review)
Yida Wu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20211


Change subject: IMPALA-12253: Improve the integration of codegen cache and 
async codegen
..

IMPALA-12253: Improve the integration of codegen cache and async codegen

To improve the integration of codegen cache and async codegen, the
seperate the FinalizeModule() into two parts, PrepareFinalizeModule()
and FinalizeModule(). If codegen cache enabled,
PrepareFinalizeModule() would generate the key of the module, so that
if there is a cache hit, we could skip the FinalizeModule() no matter
in async or sync. Otherwise, if the cache misses, we would keep going
and call FinalizeModule() to finish the compilation of the module.

This helps the codegen cache to have the advantage of async codegen
for the first visit while reducing the need of creating new threads
for async codegen if cache exists.

Did the perf a-b test with async codegen and codegen cache enabled,
didn't see any significant performance regression.

Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/fragment-state.cc
M be/src/service/fe-support.cc
9 files changed, 80 insertions(+), 19 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/20211/1
--
To view, visit http://gerrit.cloudera.org:8080/20211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908
Gerrit-Change-Number: 20211
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Michael Smith