Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg ......................................................................
IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg Since commit d2d3f4c (on asf-master), TAggregateExpr contains the logical input types of the Aggregate Expr. The reason they are included is that merging aggregate expressions will have input tyes of the intermediate values which aren't necessarily the same as the input types. For instance, NDV() uses a binary blob as its intermediate value and it's passed to its merge aggregate expressions as a StringVal but the input type of NDV() in the query could be DecimalVal. In this case, we consider DecimalVal as the logical input type while StringVal is the intermediate type. The logical input types are accessed by the BE via GetConstFnAttr() during interpretation and constant propagation during codegen. To handle distinct aggregate expressions (e.g. select count(distinct)), the FE uses 2-phase aggregation by introducing an extra phase of split/merge aggregation in which the distinct aggregate expressions' inputs are coverted and added to the group-by expressions in the first phase while the non-distinct aggregate expressions go through the normal split/merge treatement. The bug is that the existing code incorrectly propagates the intermediate types of the non-grouping aggregate expressions as the logical input types to the merging aggregate expressions in the second phase of aggregation. The input aggregate expressions for the non-distinct aggregate expressions in the second phase aggregation are already merging aggregate expressions (from phase one) in which case we should not treat its input types as logical input types. This change fixes the problem above by checking if the input aggregate expression passed to FunctionCallExpr.createMergeAggCall() is already a merging aggregate expression. If so, it will use the logical input types recorded in its 'mergeAggInputFn_' as references for its logical input types instead of the aggregate expression input types themselves. Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Reviewed-on: http://gerrit.cloudera.org:8080/6724 Reviewed-by: Alex Behm <alex.b...@cloudera.com> Tested-by: Impala Public Jenkins --- M be/src/testutil/test-udas.cc M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/query_test/test_udfs.py 5 files changed, 153 insertions(+), 52 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com>