pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 5:

(9 comments)

> Patch Set 4:
>
> (9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG@10
PS4, Line 10: t
> nit: remove the leading space
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@292
PS4, Line 292: // type and returns the correlation coefficient between them.
> Could you comment the formula here?
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@359
PS4, Line 359:   dst_state->count += src_state->count;
> nit: remove this blank line
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@366
PS4, Line 366:   return result;
> Could you explain why we need to free it here? It's not allocated by this f
Yeah, we can remove it, it'll get cleared after the function pops from the 
stack trace so its not needed.


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@373
PS4, Line 373:   // Calculating correlation coefficient using Pearson Product 
Moment Correlation formula
> Why do we free it here? We still use the CorrState below.
Yeah, we can remove it. I intended to remove it, but probably forgot.


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@375
PS4, Line 375:       (pow((((state->count*state->sum_squaredx) -
             :       (state->sumx*state->sumx)) * 
((state->count*state->sum_squaredy) -
             :                    (state->sumy*state->sumy))), 0.5));
             :   return DoubleVal(corr);
             : }
> I just realized that we are using a different formula than Hive's:
Yes, they're equivalent. Both are using Pearson's correlation coefficient 
itself. In hive, there are functions for covariance and standard deviation, so 
the formula might look a little different. But on simplifying its the same.


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@12
PS4, Line 12: 0.000321849166368
> I checked the test codes and found that it already handles rounding inaccur
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@15
PS4, Line 15: ====
> Can we add tests on NULL values? Some tpcds tables contain NULLs, e.g. cata
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@32
PS4, Line 32: ====
> Could you add tests for other types as well? E.g.
Sure, done!



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <pranav.lo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pranav.lo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Mon, 18 Apr 2022 09:23:02 +0000
Gerrit-HasComments: Yes

Reply via email to