Tim Armstrong has posted comments on this change. Change subject: IMPALA-4080, IMPALA-3686: Introduce ExecNode::Codegen() ......................................................................
Patch Set 1: (16 comments) The overall cleanup looks good. Do we have any rough numbers on how long it takes to codegen a few exprs, since this will likely enable codegen for exprs where it didn't previously? Would be helpful to understand the potential performance impact. http://gerrit.cloudera.org:8080/#/c/4651/1//COMMIT_MSG Commit Message: PS1, Line 7: IMPALA-3686 wrong JIRA number: IMPALA-3638 Also it's not clear to me that this fully resolves the limitations with Expr codegen that the JIRA was sort-of tracking. The other big caveat was that only subtrees with ScalarFnCall at the root are codegen'd. Maybe we should just file another JIRA for that issue? Potentially it could be worth fixing it while we're touching the code, since it would be a similar transformation of how Codegen works in Exprs, e.g. having a Expr::Codegen() function that calls GetCodegendComputeFunction(). http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 168: Unnecessary blank line. Line 174: if (state->codegen_enabled()) { It would be great if we could avoid duplicating this check in every Codegen() method. Maybe we should turn the logic around so that we only call Codegen() on each ExecNode if codegen is enabled. It looks like if it's disabled we should just add the exec option "Codegen Disabled by Query Option" or something along those lines. E.g. we could have a ExecNode::CodegenTree() function that walks the tree and either calls Codegen() on each node or add the exec option. Line 833: LlvmCodeGen* codegen = state->codegen(); Maybe we should just pass the codegen object in as an argument? http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 295: runtime_profile()->AddInfoString(HDFS_SPLIT_STATS_DESC, str.str()); Extra blank line. http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/old-hash-table.cc File be/src/exec/old-hash-table.cc: Line 419: LlvmCodeGen* codegen = state->codegen(); IMO we should be passing in the codegen object rather than the state in most (all?) of these cases. http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/literal.cc File be/src/exprs/literal.cc: Line 378: Status Literal::GetCodegendComputeFn(RuntimeState* state, llvm::Function** fn) { I feel like this should take codegen as an argument instead of state. http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: PS1, Line 98: UNLIKELY UNLIKELY seems unnecessary since this isn't a hot path. Line 99: return Status(Substitute("Cannot handle IR UDF '$0': ", fn_.name.function_name, This seems like something that would break queries that previously worked - should call it out in the commit message at least. I think running IR UDFs with DISABLE_CODEGEN is a weird enough corner case that it should be ok to break it. We should also make sure that we don't break the IR UDF tests on exhaustive in case they run with that combination of ExecOptions. Line 105: DCHECK(NumFixedArgs() <= 8); How do we avoid hitting this DCHECK? Do we have test coverage for this? Line 136: // TODO: don't do this for child exprs This is an interesting point that we should make sure we fix when we clean up expr codegen. http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/plan-fragment-executor.cc File be/src/runtime/plan-fragment-executor.cc: Line 194: plan_->Codegen(state); Should we do this after we prepared the sink, instead of in the middle of the prepare phase? Also, should we add a Codegen method to the DataSink interface now so that we have that interface defined too? Or is that a follow-up patch? http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS1, Line 257: it has already been called. No-op if the codegen object has already been created. PS1, Line 257: codegen_ maybe don't mention internal state. Instead something like "Create a codegen object that can be accessed via codegen()" PS1, Line 257: This is : /// created on first use. Not sure if the last sentence adds much. http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 89: // Codegen is disabled unless the expression contains external UDF available only This seems inconsistent with the behaviour in ScalarFnCall. Shouldn't we fail the query at this point if codegen is disabled rather than forcing it on? -- To view, visit http://gerrit.cloudera.org:8080/4651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes