[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296 PS10, Line 296: double prod; > AFAIK, a similar approach is used in hive and other aggregate functions of Seems ok to stick with double as other databases do this and there does not appear to be a standard. I would recommend comparing results with postgres including cases that are likely to lose precision mixing small and large numbers. http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539 PS10, Line 1539: > Sure, I'll include those. Also test empty table and all zero values cases. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 05 May 2022 17:54:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 10: (2 comments) > Patch Set 10: > > (1 comment) > > > Patch Set 10: > > > > (4 comments) > > > > I would prefer to see the covar_pop and covar_samp functions implemented in > > the same patch as well. http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@317 PS10, Line 317: state->sumx += x; > AFAIK, Hive and other aggregate functions of impala like avg() aren't check I'll add overflow handling http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@389 PS10, Line 389: dst_state->sumx += src_state->sumx; > AFAIK, Hive and other aggregate functions of impala like avg() aren't check I'll add overflow handling -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 02 May 2022 11:33:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 10: (1 comment) > Patch Set 10: > > (4 comments) > > I would prefer to see the covar_pop and covar_samp functions implemented in > the same patch as well. http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539 PS10, Line 1539: > Add OLAP partition/window testcases. If not supported, they can be negative Sure, I'll include those. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 29 Apr 2022 07:37:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 10: > Patch Set 10: > > Sure, I'll include those as well. I mean the OLAP partition tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 29 Apr 2022 07:06:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 10: Sure, I'll include those as well. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 29 Apr 2022 07:04:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 10: (3 comments) > Patch Set 10: > > (4 comments) > > I would prefer to see the covar_pop and covar_samp functions implemented in > the same patch as well. Since this jira was focused on corr(), should we change its description or create a new jira for covariance functions? http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296 PS10, Line 296: double prod; > These and any other computation variables may need to be precise types if t AFAIK, a similar approach is used in hive and other aggregate functions of impala, that's why we kept it double. Double supports precision types like Decimal. There's a test included as well just to check this behaviour. http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@317 PS10, Line 317: state->sumx += x; > Handle overflow? AFAIK, Hive and other aggregate functions of impala like avg() aren't checking explicitly for overflow. Do you want any specific check or behavior in case of overflow, probably returning null or giving an error? http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@389 PS10, Line 389: dst_state->sumx += src_state->sumx; > Handle overflow? AFAIK, Hive and other aggregate functions of impala like avg() aren't checking explicitly for overflow. Do you want any specific check or behavior in case of overflow, probably returning null or giving an error? -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 29 Apr 2022 07:03:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 10: (4 comments) I would prefer to see the covar_pop and covar_samp functions implemented in the same patch as well. http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296 PS10, Line 296: double prod; These and any other computation variables may need to be precise types if the output type is precise. http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@317 PS10, Line 317: state->sumx += x; Handle overflow? http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@389 PS10, Line 389: dst_state->sumx += src_state->sumx; Handle overflow? http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539 PS10, Line 1539: Add OLAP partition/window testcases. If not supported, they can be negative tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 27 Apr 2022 12:16:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 10: Code-Review+1 (5 comments) LGTM. I'll see if other guys want to review this. http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@339 PS10, Line 339: nit: remove the space http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@350 PS10, Line 350: nit: remove the space http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@364 PS10, Line 364: nit: fix the indention http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@365 PS10, Line 365: nit: remove blank lines http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/uda.test File testdata/workloads/functional-query/queries/QueryTest/uda.test: http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/uda.test@180 PS10, Line 180: nit: could you revert the changes in this file? -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 27 Apr 2022 09:39:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10496/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 27 Apr 2022 08:58:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10495/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 27 Apr 2022 08:52:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 10: (14 comments) > Patch Set 7: > > (16 comments) > > The patch is looking good now! http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@291 PS7, Line 291: // > nit: add a space after "//" Done http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@320 PS7, Line 320: st > nit: need a space after "if" Done http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@321 PS7, Line 321: state->prod += x * y; : ++state->count; : } : : static inline void CorrRemoveState(double x, double y, CorrState* state){ : state->sumx -= x; > We can extract these into an update method, e.g. Done http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@336 PS7, Line 336: if > nit: need a space after "if" Done http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@337 PS7, Line 337: CorrState* state = reinterpret_cast(dst->ptr); : if (!isnan(src1.val) && !isnan(src2.val)) { : CorrUpdateState(src1.val, src2.val , state); : } : } : > We can extract these into a method too. Done http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@355 PS7, Line 355: co > nit: need a space after "if" Done http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@376 PS7, Line 376: ns > nit: need a space after if Done http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@401 PS7, Line 401: return DoubleVal::null(); > We should not free it here since we still use 'state' at following lines. I Approach 1 worked so implemented that http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@408 PS7, Line 408: if > nit: add a space after "if" Done http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test File testdata/workloads/functional-query/queries/QueryTest/uda.test: http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@10 PS7, Line 10: select test_count(int_col) from functional.alltypestiny; > Sorry to be back and forth, but we'd better move these tests to testdata/wo Sure, done http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@24 PS7, Line 24: select sum_small_decimal(c3) from functional.decimal_tiny; > Do we have test coverage on only one side is NULL? Yes, added http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@41 PS7, Line 41: TYPES > Cool! Ack http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@48 PS7, Line 48: RESULTS > Could you also add the 'id' column here? Done http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@69 PS7, Line 69: TYPES > Could you add the 'year' column here? Done -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 27 Apr 2022 08:41:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. IMPALA-11205: Implement CORR() function CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. This UDAF is tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test M testdata/workloads/functional-query/queries/QueryTest/uda.test 5 files changed, 274 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/10 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. IMPALA-11205: Implement CORR() function CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. This UDAF is tested with a few query tests written in uda.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test M testdata/workloads/functional-query/queries/QueryTest/uda.test 5 files changed, 274 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/9 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 7: (16 comments) The patch is looking good now! http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@291 PS7, Line 291: // nit: add a space after "//" http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@320 PS7, Line 320: if nit: need a space after "if" http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@321 PS7, Line 321: state->sumx += src1.val; : state->sumy += src2.val; : state->sum_squaredx += src1.val * src1.val; : state->sum_squaredy += src2.val * src2.val; : state->prod += src1.val * src2.val; : ++state->count; We can extract these into an update method, e.g. static inline void CorrUpdate(double x, double y, CorrState* state) and then use it in TimestampCorrUpdate() as well. http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@336 PS7, Line 336: if nit: need a space after "if" http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@337 PS7, Line 337: state->sumx -= src1.val; : state->sumy -= src2.val; : state->sum_squaredx -= src1.val * src1.val; : state->sum_squaredy -= src2.val * src2.val; : state->prod -= src1.val * src2.val; : --state->count; We can extract these into a method too. http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@355 PS7, Line 355: if nit: need a space after "if" http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@376 PS7, Line 376: if nit: need a space after if http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@401 PS7, Line 401: ctx->Free(src.ptr); We should not free it here since we still use 'state' at following lines. I think we should free it before each 'return' statement. Another solution is getting its values into local variables and then free it. http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@408 PS7, Line 408: if( nit: add a space after "if" http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions.h@74 PS7, Line 74: static const StringVal CorrSerialize(FunctionContext* ctx, const StringVal& src); We can remove this http://gerrit.cloudera.org:8080/#/c/18413/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/18413/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1322 PS7, Line 1322: Corr nit: Add a space after "//" http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test File testdata/workloads/functional-query/queries/QueryTest/uda.test: http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@10 PS7, Line 10: select corr(ps_availqty,ps_supplycost) from tpch.partsupp; Sorry to be back and forth, but we'd better move these tests to testdata/workloads/functional-query/queries/QueryTest/aggregation.test All the builtin aggregate functions are tested there. http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@24 PS7, Line 24: select corr(d,e) from functional.nulltable; Do we have test coverage on only one side is NULL? http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@41 PS7, Line 41: select corr(utctime,localtime) from functional.alltimezones; Cool! http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@48 PS7, Line 48: select corr(int_col, int_col) Could you also add the 'id' column here? Also please comment why the results are NULLs. http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@69 PS7, Line 69: select corr(double_col,double_col) Could you add the 'year' column here? -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerr
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10487/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 25 Apr 2022 10:08:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. IMPALA-11205: Implement CORR() function CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. This UDAF is tested with a few query tests written in uda.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/uda.test 4 files changed, 250 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/7 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@366 PS4, Line 366: return result; > Yeah, we can remove it, it'll get cleared after the function pops from the Oh, I think we need this. Otherwise we'll see warnings like WARNINGS: UDF WARNING: Memory leaked via FunctionContext::Allocate() Look at other udas, they use StringValSerializeOrFinalize() for such simple purpose, i.e. copy and free. I think we don't need this CorrSerialize() method, just do the same as other udas, e.g. https://github.com/apache/impala/blob/1358700740dbeff799f6a6a95f95b1d9fe7281d2/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java#L1376 http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@373 PS4, Line 373: // Calculating correlation coefficient using Pearson Product Moment Correlation formula > Yeah, we can remove it. I intended to remove it, but probably forgot. Based on the above comment, I think we do need this. http://gerrit.cloudera.org:8080/#/c/18413/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/5/be/src/exprs/aggregate-functions-ir.cc@372 PS5, Line 372: if (state->count == 0 || state->count == 1) return DoubleVal::null(); Can we add test cases for these? http://gerrit.cloudera.org:8080/#/c/18413/5/be/src/exprs/aggregate-functions-ir.cc@374 PS5, Line 374: ((state->prod*state->count) - (state->sumx*state->sumy)) / : (powstate->count*state->sum_squaredx) - : (state->sumx*state->sumx)) * ((state->count*state->sum_squaredy) - :(state->sumy*state->sumy))), 0.5)); Can we format this a little bit? There are too many brackets. Maybe this is better: int64_t n = state->count; double r1 = n * state->prod - state->sumx * state->sumy; double r2 = (n * state->sum_squaredx - state->sumx * state->sumx) * (n * state->sum_squaredy - state->sumy * state->sumy); double corr = r1 / sqrt(r2); Note that we can replace pow(x, 0.5) with sqrt(x). We should also check if r2 is 0 and return NULL in that case. Hive does it this way: https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L366 http://gerrit.cloudera.org:8080/#/c/18413/5/testdata/workloads/functional-query/queries/QueryTest/uda.test File testdata/workloads/functional-query/queries/QueryTest/uda.test: http://gerrit.cloudera.org:8080/#/c/18413/5/testdata/workloads/functional-query/queries/QueryTest/uda.test@19 PS5, Line 19: NULL Is this correct? In Hive, it's 1.0 http://gerrit.cloudera.org:8080/#/c/18413/5/testdata/workloads/functional-query/queries/QueryTest/uda.test@26 PS5, Line 26: corr(cast(timestamp_col as int),cast(timestamp_col as int)) Hive doesn't need to cast timestamp in the query. Can we improve this? -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 21 Apr 2022 03:27:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 5: (9 comments) > Patch Set 4: > > (9 comments) http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG@10 PS4, Line 10: t > nit: remove the leading space Done http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@292 PS4, Line 292: // type and returns the correlation coefficient between them. > Could you comment the formula here? Done http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@359 PS4, Line 359: dst_state->count += src_state->count; > nit: remove this blank line Done http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@366 PS4, Line 366: return result; > Could you explain why we need to free it here? It's not allocated by this f Yeah, we can remove it, it'll get cleared after the function pops from the stack trace so its not needed. http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@373 PS4, Line 373: // Calculating correlation coefficient using Pearson Product Moment Correlation formula > Why do we free it here? We still use the CorrState below. Yeah, we can remove it. I intended to remove it, but probably forgot. http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@375 PS4, Line 375: (powstate->count*state->sum_squaredx) - : (state->sumx*state->sumx)) * ((state->count*state->sum_squaredy) - :(state->sumy*state->sumy))), 0.5)); : return DoubleVal(corr); : } > I just realized that we are using a different formula than Hive's: Yes, they're equivalent. Both are using Pearson's correlation coefficient itself. In hive, there are functions for covariance and standard deviation, so the formula might look a little different. But on simplifying its the same. http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test File testdata/workloads/functional-query/queries/QueryTest/uda.test: http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@12 PS4, Line 12: 0.000321849166368 > I checked the test codes and found that it already handles rounding inaccur Done http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@15 PS4, Line 15: > Can we add tests on NULL values? Some tpcds tables contain NULLs, e.g. cata Done http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@32 PS4, Line 32: > Could you add tests for other types as well? E.g. Sure, done! -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 18 Apr 2022 09:23:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10461/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 18 Apr 2022 07:36:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. IMPALA-11205: Implement CORR() function CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. This UDAF is tested with a few query tests written in uda.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/uda.test 4 files changed, 173 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/5 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG@10 PS4, Line 10: nit: remove the leading space http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@292 PS4, Line 292: // type and returns the correlation coefficient between them. Could you comment the formula here? http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@359 PS4, Line 359: nit: remove this blank line http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@366 PS4, Line 366: ctx->Free(src.ptr); Could you explain why we need to free it here? It's not allocated by this function. http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@373 PS4, Line 373: ctx->Free(src.ptr); Why do we free it here? We still use the CorrState below. http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@375 PS4, Line 375: // Calculating correlation coefficient using Pearson Product Moment Correlation formula : double corr = ((state->prod*state->count) - (state->sumx*state->sumy)) / : (powstate->count*state->sum_squaredx) - : (state->sumx*state->sumx)) * ((state->count*state->sum_squaredy) - :(state->sumy*state->sumy))), 0.5)); I just realized that we are using a different formula than Hive's: https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L49-L60 Are they equivalent? http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test File testdata/workloads/functional-query/queries/QueryTest/uda.test: http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@12 PS4, Line 12: '0.00032' I checked the test codes and found that it already handles rounding inaccurary: https://github.com/apache/impala/blob/31bd2a47036e7be3a0a32f873b60d2f70dcd8c9f/tests/common/test_result_verifier.py#L176-L185 So we are safe to use double results directly. BTW, can we add more corr() on other columns? http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@15 PS4, Line 15: Can we add tests on NULL values? Some tpcds tables contain NULLs, e.g. catalog_sales has NULLs in cs_sold_date_sk column. http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@32 PS4, Line 32: select corr(double_col,double_col) from functional.alltypes; Could you add tests for other types as well? E.g. select corr(tinyint_col, tinyint_col), corr(smallint_col, smallint_col), corr(int_col, int_col), corr(bigint_col, bigint_col), corr(float_col, float_col), corr(double_col, double_col), corr(timestamp_col, timestamp_col) from functional.alltypes; 'alltypes' table doesn't have decimal types. We can use the `decimal_tbl` table. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 15 Apr 2022 01:04:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10449/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 14 Apr 2022 08:02:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. IMPALA-11205: Implement CORR() function CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. This UDAF is tested with a few query tests written in uda.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/uda.test 4 files changed, 151 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/4 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/10447/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 14 Apr 2022 05:02:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. IMPALA-11205: Implement CORR() function CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. This UDAF is tested with a few query tests written in uda.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/uda.test 4 files changed, 153 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/3 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/10446/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 14 Apr 2022 04:51:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. IMPALA-11205: Implement CORR() function CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. This UDAF is tested with a few query tests written in uda.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/uda.test 4 files changed, 153 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/2 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10440/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Apr 2022 12:19:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
pranav.lo...@cloudera.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18413 Change subject: IMPALA-11205: Implement CORR() function .. IMPALA-11205: Implement CORR() function CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. This UDAF is tested with a few query tests written in uda.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/uda.test 4 files changed, 146 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward
[Impala-ASF-CR] IMPALA-11205: Implement CORR() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement CORR() function .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@315 PS1, Line 315: void AggregateFunctions::CorrUpdate(FunctionContext* ctx, const DoubleVal& src1, const DoubleVal& src2, line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@329 PS1, Line 329: void AggregateFunctions::CorrRemove(FunctionContext* ctx, const DoubleVal& src1, const DoubleVal& src2, line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@346 PS1, Line 346: void AggregateFunctions::CorrMerge(FunctionContext* ctx, const StringVal& src, StringVal* dst) { line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@361 PS1, Line 361: const StringVal AggregateFunctions::CorrSerialize(FunctionContext* ctx, const StringVal& src) { line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@374 PS1, Line 374: double corr = ((state->prod*state->count) - (state->sumx*state->sumy)) / (powstate->count*state->sum_squaredx) - (state->sumx*state->sumx)) * ((state->count*state->sum_squaredy) - (state->sumy*state->sumy))), 0.5)); line too long (221 > 90) http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions.h@69 PS1, Line 69: static void CorrUpdate(FunctionContext* ctx, const DoubleVal& src1, const DoubleVal& src2, line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions.h@71 PS1, Line 71: static void CorrRemove(FunctionContext* ctx, const DoubleVal& src1, const DoubleVal& src2, line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 13 Apr 2022 11:58:06 + Gerrit-HasComments: Yes