Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions ......................................................................
Patch Set 9: (8 comments) Rebased to pick up IMPALA-3884 http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: PS7, Line 459: > that Done http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: Line 184: /// lowered type. This *Val should be non-null. The output variable is called 'name'. > nit: space Done http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS7, Line 347: : /// TODO: Return Status inst > Avoid cloning if possible to reduce compilation time. Done http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS7, Line 1660: ond, the : // value of the slot was initialized in the right way in InitAggSlots() (e.g. 0 for : // SUM) that we get the right result if UpdateSlot() pretends that the NULL bit of : // 'dst' is unse > Please also see comments above. I tried to clean this up a bit. My intention was to document the details of the initialisation in InitAggSlot() and provide a pointer here. PS7, Line 1675: dst.SetIsNull(slot_desc->CodegenIsNull(codegen, &builder, agg_tuple_arg)); > Why is this not in the old code or was it done somewhere else ? It wasn't in the old code - the old code didn't handle arbitrary UDAs, just a subset of the builtins. E.g. the old code would have given wrong results if we changed SUM() so that it returned NULL on overflow. http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exec/partitioned-aggregation-node.h File be/src/exec/partitioned-aggregation-node.h: PS7, Line 649: 'dst_val' should contain the previous value of the aggregate : /// function, and 'updated_dst_val' is set to the new value after the Update or Merge : /// operation is applied. > Just curious why we separate 'dst_val' and 'result_val'. Doesn't 'result_va We need two different CodegenAnyVal, since it really represents the value (which changes) rather than the memory location. We could use an in-out argument instead of separate input and output arguments but this seemed simpler to me. I renamed 'result_val' to 'updated_dst_val' to hopefully make the connection clearer (the naming was confusing). http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: PS7, Line 496: false > Please feel free to ignore but you can consider setting this second argumen It seems clearer to make it explicit, given it's not obvious what the "default" behaviour should be. http://gerrit.cloudera.org:8080/#/c/4655/7/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: Line 141: # Verify codegen was enabled for all stages of the aggregation. > Can remove this restriction now that IMPALA-3884 is in master Done -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes