[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

Tim Armstrong (Code Review) Fri, 21 Oct 2016 13:54:35 -0700

Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
......................................................................



Patch Set 4:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/codegen/codegen-anyval.h
File be/src/codegen/codegen-anyval.h:

PS4, Line 242: if (val.is_null) goto null_block;
> Can you please elaborate this comment with code snippet like the following 
Done


http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS4, Line 1065: can ignore the NULL bit of its destination value is
              :     // ignored
> ?
Done


PS4, Line 1067: set
> did you mean with NULL bit cleared ?
No it's set - the value should start off as NULL, since sum() of an empty set 
returns NULL.


Line 1599:     RETURN_IF_ERROR(agg_expr->GetCodegendComputeFn(state_, 
&agg_expr_fn));
> DCHECK(agg_expr_fn != NULL);
Done


PS4, Line 1620: src.CodegenBranchIfNull(&builder, ret_block);
> It seems that we emit this check for all paths except for the UDA path.  Wh
This is deliberate and a correctness fix that is needed to match the 
interpreted path. In general the UDA may do some processing on NULL values so 
we need to pass them in. E.g. see the test UDA I added that counts the # of 
NULLs in anticipation of using this codegen for arbitrary UDAs.

Filtering out src NULLs before calling the UDA is a special case optimisation 
that we can do for aggregate functions where we know the semantics..


PS4, Line 1628: dst_type.type != TYPE_TIMESTAMP
              :       && !dst_type.IsStringType()
> This seems to be reused in multiple places. May be it's worth factoring thi
I factored this out and made it a positive check so that most of the conditions 
are one line now and it's clearer what cases are actually handled.


PS4, Line 1637: dst_type.type != TYPE_TIMESTAMP
              :       && !dst_type.IsStringType()
> It may be easier to read if we do the check once up front:
I think I prefer this way because the implementation is directly adjacent to 
the conditions for when it should be.

E.g. with the alternative factoring, If I'm looking at case agg_op == SUM, I 
have to look in two places and decode a more complex expression to figure out 
what type/agg_op combinations that code is handling.

Also since the UDA interface is the "generic" approach and the others are 
special-case optimisations, so it makes more sense to me to check for the 
special cases then false back to the generic one.

I did some cleanup and added a comment up the top to make it clearer how it 
should be read.


PS4, Line 1676: src.CodegenBranchIfNull(&builder, ret_block);
> Actually, referring to the comment above again, it seems that we should jus
I find it easier to follow this way since the logic for each case is grouped 
together. E.g. all the logic for UpdateSlot() on a UDA fits on a screen, rather 
than having some of NULL handling down here and some 100 lines up.


PS4, Line 1685: resulting in
> with results stored in
Done


PS4, Line 1725: codegen->GetFunction(symbol);
> Should we consider adding "bool clone" as the second argument similar to th
Done.


PS4, Line 1745:     Value* input_lowered_ptr =
              :         
codegen->CreateEntryBlockAlloca(builder->GetInsertBlock()->getParent(),
              :             LlvmCodeGen::NamedVariable("input_lowered_ptr",
              :                                             
input_vals[i].value()->getType()));
              :     builder->CreateStore(input_vals[i].value(), 
input_lowered_ptr);
              :     Type* input_unlowered_ptr_type = 
CodegenAnyVal::GetUnloweredPtrType(
              :         codegen, 
evaluator->input_expr_ctxs()[i]->root()->type());
              :     Value* input_unlowered_ptr = builder->CreateBitCast(
              :         input_lowered_ptr, input_unlowered_ptr_type, 
"input_unlowered_ptr");
> Do you think it's worth factoring this logic out to a function given it's r
There was actually already a helper in CodegenAnyVal that did most of the work. 
Improved/cleaned that up a bit then used it.


http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/exprs/compound-predicates.cc
File be/src/exprs/compound-predicates.cc:

PS4, Line 68:  
> Please consider removing these extra blank spaces too.
Done


http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

PS4, Line 603:   Value* tuple_bytes = builder->CreateBitCast(tuple, 
codegen->ptr_type());
             :   Constant* null_byte_offset =
             :       ConstantInt::get(codegen->int_type(), 
null_indicator_offset_.byte_offset);
             :   Value* null_byte_ptr =
             :       builder->CreateInBoundsGEP(tuple_bytes, null_byte_offset, 
"null_byte_ptr");
> What do you think about factoring this code snippet as a utility function f
Done. Also now sharing the code with SlotRef.


-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

Reply via email to