Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3686: Introduce ExecNode::Codegen()
......................................................................


Patch Set 1:

(16 comments)

The overall cleanup looks good. 

Do we have any rough numbers on how long it takes to codegen a few exprs, since 
this will likely enable codegen for exprs where it didn't previously? Would be 
helpful to understand the potential performance impact.

http://gerrit.cloudera.org:8080/#/c/4651/1//COMMIT_MSG
Commit Message:

PS1, Line 7:  IMPALA-3686
wrong JIRA number: IMPALA-3638

Also it's not clear to me that this fully resolves the limitations with Expr 
codegen that the JIRA was sort-of tracking. The other big caveat was that only 
subtrees with ScalarFnCall at the root are codegen'd. 

Maybe we should just file another JIRA for that issue? Potentially it could be 
worth fixing it while we're touching the code, since it would be a similar 
transformation of how Codegen works in Exprs, e.g. having a Expr::Codegen() 
function that calls GetCodegendComputeFunction().


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 168: 
Unnecessary blank line.


Line 174:   if (state->codegen_enabled()) {
It would be great if we could avoid duplicating this check in every Codegen() 
method. Maybe we should turn the logic around so that we only call Codegen() on 
each ExecNode if codegen is enabled. It looks like if it's disabled we should 
just add the exec option "Codegen Disabled by Query Option" or something along 
those lines.

E.g. we could have a ExecNode::CodegenTree() function that walks the tree and 
either calls Codegen() on each node or add the exec option.


Line 833:   LlvmCodeGen* codegen = state->codegen();
Maybe we should just pass the codegen object in as an argument?


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 295:   runtime_profile()->AddInfoString(HDFS_SPLIT_STATS_DESC, str.str());
Extra blank line.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 419:   LlvmCodeGen* codegen = state->codegen();
IMO we should be passing in the codegen object rather than the state in most 
(all?) of these cases.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 378: Status Literal::GetCodegendComputeFn(RuntimeState* state, 
llvm::Function** fn) {
I feel like this should take codegen as an argument instead of state.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

PS1, Line 98: UNLIKELY
UNLIKELY seems unnecessary since this isn't a hot path.


Line 99:       return Status(Substitute("Cannot handle IR UDF '$0': ", 
fn_.name.function_name,
This seems like something that would break queries that previously worked - 
should call it out in the commit message at least. I think running IR UDFs with 
DISABLE_CODEGEN is a weird enough corner case that it should be ok to break it.

We should also make sure that we don't break the IR UDF tests on exhaustive in 
case they run with that combination of ExecOptions.


Line 105:     DCHECK(NumFixedArgs() <= 8);
How do we avoid hitting this DCHECK? Do we have test coverage for this?


Line 136:     // TODO: don't do this for child exprs
This is an interesting point that we should make sure we fix when we clean up 
expr codegen.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

Line 194:   plan_->Codegen(state);
Should we do this after we prepared the sink, instead of in the middle of the 
prepare phase?

Also, should we add a Codegen method to the DataSink interface now so that we 
have that interface defined too? Or is that a follow-up patch?


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 257: it has already been called.
No-op if the codegen object has already been created.


PS1, Line 257: codegen_
maybe don't mention internal state. Instead something like "Create a codegen 
object that can be accessed via codegen()"


PS1, Line 257:  This is
             :   /// created on first use.
Not sure if the last sentence adds much.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 89:   // Codegen is disabled unless the expression contains external UDF 
available only
This seems inconsistent with the behaviour in ScalarFnCall. Shouldn't we fail 
the query at this point if codegen is disabled rather than forcing it on?


-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to