Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )
Change subject: IMPALA-11205: Implement CORR() function ...................................................................... Patch Set 7: (16 comments) The patch is looking good now! http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@291 PS7, Line 291: // nit: add a space after "//" http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@320 PS7, Line 320: if nit: need a space after "if" http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@321 PS7, Line 321: state->sumx += src1.val; : state->sumy += src2.val; : state->sum_squaredx += src1.val * src1.val; : state->sum_squaredy += src2.val * src2.val; : state->prod += src1.val * src2.val; : ++state->count; We can extract these into an update method, e.g. static inline void CorrUpdate(double x, double y, CorrState* state) and then use it in TimestampCorrUpdate() as well. http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@336 PS7, Line 336: if nit: need a space after "if" http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@337 PS7, Line 337: state->sumx -= src1.val; : state->sumy -= src2.val; : state->sum_squaredx -= src1.val * src1.val; : state->sum_squaredy -= src2.val * src2.val; : state->prod -= src1.val * src2.val; : --state->count; We can extract these into a method too. http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@355 PS7, Line 355: if nit: need a space after "if" http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@376 PS7, Line 376: if nit: need a space after if http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@401 PS7, Line 401: ctx->Free(src.ptr); We should not free it here since we still use 'state' at following lines. I think we should free it before each 'return' statement. Another solution is getting its values into local variables and then free it. http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@408 PS7, Line 408: if( nit: add a space after "if" http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions.h@74 PS7, Line 74: static const StringVal CorrSerialize(FunctionContext* ctx, const StringVal& src); We can remove this http://gerrit.cloudera.org:8080/#/c/18413/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/18413/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1322 PS7, Line 1322: Corr nit: Add a space after "//" http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test File testdata/workloads/functional-query/queries/QueryTest/uda.test: http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@10 PS7, Line 10: select corr(ps_availqty,ps_supplycost) from tpch.partsupp; Sorry to be back and forth, but we'd better move these tests to testdata/workloads/functional-query/queries/QueryTest/aggregation.test All the builtin aggregate functions are tested there. http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@24 PS7, Line 24: select corr(d,e) from functional.nulltable; Do we have test coverage on only one side is NULL? http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@41 PS7, Line 41: select corr(utctime,localtime) from functional.alltimezones; Cool! http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@48 PS7, Line 48: select corr(int_col, int_col) Could you also add the 'id' column here? Also please comment why the results are NULLs. http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@69 PS7, Line 69: select corr(double_col,double_col) Could you add the 'year' column here? -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward <pranav.lo...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <pranav.lo...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Comment-Date: Tue, 26 Apr 2022 07:51:17 +0000 Gerrit-HasComments: Yes