[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-05-05 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296
PS10, Line 296:   double prod;
> AFAIK, a similar approach is used in hive and other aggregate functions of
Seems ok to stick with double as other databases do this and there does not 
appear to be a standard. I would recommend comparing results with postgres 
including cases that are likely to lose precision mixing small and large 
numbers.


http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539
PS10, Line 1539:
> Sure, I'll include those.
Also test empty table and all zero values cases.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 05 May 2022 17:54:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-05-02 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 10:

(2 comments)

> Patch Set 10:
>
> (1 comment)
>
> > Patch Set 10:
> >
> > (4 comments)
> >
> > I would prefer to see the covar_pop and covar_samp functions implemented in 
> > the same patch as well.

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@317
PS10, Line 317:   state->sumx += x;
> AFAIK, Hive and other aggregate functions of impala like avg() aren't check
I'll add overflow handling


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@389
PS10, Line 389:   dst_state->sumx += src_state->sumx;
> AFAIK, Hive and other aggregate functions of impala like avg() aren't check
I'll add overflow handling



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 02 May 2022 11:33:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 10:

(1 comment)

> Patch Set 10:
>
> (4 comments)
>
> I would prefer to see the covar_pop and covar_samp functions implemented in 
> the same patch as well.

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539
PS10, Line 1539: 
> Add OLAP partition/window testcases. If not supported, they can be negative
Sure, I'll include those.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 29 Apr 2022 07:37:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 10:

> Patch Set 10:
>
> Sure, I'll include those as well.

I mean the OLAP partition tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 29 Apr 2022 07:06:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 10:

Sure, I'll include those as well.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 29 Apr 2022 07:04:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 10:

(3 comments)

> Patch Set 10:
>
> (4 comments)
>
> I would prefer to see the covar_pop and covar_samp functions implemented in 
> the same patch as well.

Since this jira was focused on corr(), should we change its description or 
create a new jira for covariance functions?

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296
PS10, Line 296:   double prod;
> These and any other computation variables may need to be precise types if t
AFAIK, a similar approach is used in hive and other aggregate functions of 
impala, that's why we kept it double. Double supports precision types like 
Decimal. There's a test included as well just to check this behaviour.


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@317
PS10, Line 317:   state->sumx += x;
> Handle overflow?
AFAIK, Hive and other aggregate functions of impala like avg() aren't checking 
explicitly for overflow. Do you want any specific check or behavior in case of 
overflow, probably returning null or giving an error?


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@389
PS10, Line 389:   dst_state->sumx += src_state->sumx;
> Handle overflow?
AFAIK, Hive and other aggregate functions of impala like avg() aren't checking 
explicitly for overflow. Do you want any specific check or behavior in case of 
overflow, probably returning null or giving an error?



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 29 Apr 2022 07:03:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-27 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 10:

(4 comments)

I would prefer to see the covar_pop and covar_samp functions implemented in the 
same patch as well.

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296
PS10, Line 296:   double prod;
These and any other computation variables may need to be precise types if the 
output type is precise.


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@317
PS10, Line 317:   state->sumx += x;
Handle overflow?


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@389
PS10, Line 389:   dst_state->sumx += src_state->sumx;
Handle overflow?


http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539
PS10, Line 1539:
Add OLAP partition/window testcases. If not supported, they can be negative 
tests.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 27 Apr 2022 12:16:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-27 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 10: Code-Review+1

(5 comments)

LGTM. I'll see if other guys want to review this.

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@339
PS10, Line 339:
nit: remove the space


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@350
PS10, Line 350:
nit: remove the space


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@364
PS10, Line 364:
nit: fix the indention


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@365
PS10, Line 365: 
nit: remove blank lines


http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/uda.test@180
PS10, Line 180: 
nit: could you revert the changes in this file?



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 27 Apr 2022 09:39:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10496/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 27 Apr 2022 08:58:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10495/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 27 Apr 2022 08:52:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-27 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 10:

(14 comments)

> Patch Set 7:
>
> (16 comments)
> 
> The patch is looking good now!

http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@291
PS7, Line 291: //
> nit: add a space after "//"
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@320
PS7, Line 320: st
> nit: need a space after "if"
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@321
PS7, Line 321:   state->prod += x * y;
 :   ++state->count;
 : }
 :
 : static inline void CorrRemoveState(double x, double y, 
CorrState* state){
 :   state->sumx -= x;
> We can extract these into an update method, e.g.
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@336
PS7, Line 336: if
> nit: need a space after "if"
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@337
PS7, Line 337:   CorrState* state = reinterpret_cast(dst->ptr);
 :   if (!isnan(src1.val) && !isnan(src2.val)) {
 : CorrUpdateState(src1.val, src2.val , state);
 :   }
 : }
 :
> We can extract these into a method too.
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@355
PS7, Line 355: co
> nit: need a space after "if"
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@376
PS7, Line 376: ns
> nit: need a space after if
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@401
PS7, Line 401: return DoubleVal::null();
> We should not free it here since we still use 'state' at following lines. I
Approach 1 worked so implemented that


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@408
PS7, Line 408: if
> nit: add a space after "if"
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@10
PS7, Line 10: select test_count(int_col) from functional.alltypestiny;
> Sorry to be back and forth, but we'd better move these tests to testdata/wo
Sure, done


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@24
PS7, Line 24: select sum_small_decimal(c3) from functional.decimal_tiny;
> Do we have test coverage on only one side is NULL?
Yes, added


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@41
PS7, Line 41:  TYPES
> Cool!
Ack


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@48
PS7, Line 48:  RESULTS
> Could you also add the 'id' column here?
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@69
PS7, Line 69:  TYPES
> Could you add the 'year' column here?
Done



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 27 Apr 2022 08:41:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-27 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them. This UDAF
is tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
M testdata/workloads/functional-query/queries/QueryTest/uda.test
5 files changed, 274 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/10
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-27 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them. This UDAF
is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
M testdata/workloads/functional-query/queries/QueryTest/uda.test
5 files changed, 274 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/9
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-26 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 7:

(16 comments)

The patch is looking good now!

http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@291
PS7, Line 291: //
nit: add a space after "//"


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@320
PS7, Line 320: if
nit: need a space after "if"


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@321
PS7, Line 321: state->sumx += src1.val;
 : state->sumy += src2.val;
 : state->sum_squaredx += src1.val * src1.val;
 : state->sum_squaredy += src2.val * src2.val;
 : state->prod += src1.val * src2.val;
 : ++state->count;
We can extract these into an update method, e.g.

  static inline void CorrUpdate(double x, double y, CorrState* state)

and then use it in TimestampCorrUpdate() as well.


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@336
PS7, Line 336: if
nit: need a space after "if"


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@337
PS7, Line 337: state->sumx -= src1.val;
 : state->sumy -= src2.val;
 : state->sum_squaredx -= src1.val * src1.val;
 : state->sum_squaredy -= src2.val * src2.val;
 : state->prod -= src1.val * src2.val;
 : --state->count;
We can extract these into a method too.


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@355
PS7, Line 355: if
nit: need a space after "if"


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@376
PS7, Line 376: if
nit: need a space after if


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@401
PS7, Line 401:   ctx->Free(src.ptr);
We should not free it here since we still use 'state' at following lines. I 
think we should free it before each 'return' statement.

Another solution is getting its values into local variables and then free it.


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@408
PS7, Line 408: if(
nit: add a space after "if"


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions.h@74
PS7, Line 74:   static const StringVal CorrSerialize(FunctionContext* ctx, 
const StringVal& src);
We can remove this


http://gerrit.cloudera.org:8080/#/c/18413/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/18413/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1322
PS7, Line 1322: Corr
nit: Add a space after "//"


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@10
PS7, Line 10: select corr(ps_availqty,ps_supplycost) from tpch.partsupp;
Sorry to be back and forth, but we'd better move these tests to 
testdata/workloads/functional-query/queries/QueryTest/aggregation.test
All the builtin aggregate functions are tested there.


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@24
PS7, Line 24: select corr(d,e) from functional.nulltable;
Do we have test coverage on only one side is NULL?


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@41
PS7, Line 41: select corr(utctime,localtime) from functional.alltimezones;
Cool!


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@48
PS7, Line 48: select corr(int_col, int_col)
Could you also add the 'id' column here?

Also please comment why the results are NULLs.


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@69
PS7, Line 69: select corr(double_col,double_col)
Could you add the 'year' column here?



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10487/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 25 Apr 2022 10:08:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-25 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them. This UDAF
is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 250 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/7
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-20 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@366
PS4, Line 366:   return result;
> Yeah, we can remove it, it'll get cleared after the function pops from the
Oh, I think we need this. Otherwise we'll see warnings like

 WARNINGS: UDF WARNING: Memory leaked via FunctionContext::Allocate()

Look at other udas, they use StringValSerializeOrFinalize() for such simple 
purpose, i.e. copy and free. I think we don't need this CorrSerialize() method, 
just do the same as other udas, e.g.
https://github.com/apache/impala/blob/1358700740dbeff799f6a6a95f95b1d9fe7281d2/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java#L1376


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@373
PS4, Line 373:   // Calculating correlation coefficient using Pearson Product 
Moment Correlation formula
> Yeah, we can remove it. I intended to remove it, but probably forgot.
Based on the above comment, I think we do need this.


http://gerrit.cloudera.org:8080/#/c/18413/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/5/be/src/exprs/aggregate-functions-ir.cc@372
PS5, Line 372:   if (state->count == 0 || state->count == 1) return 
DoubleVal::null();
Can we add test cases for these?


http://gerrit.cloudera.org:8080/#/c/18413/5/be/src/exprs/aggregate-functions-ir.cc@374
PS5, Line 374: ((state->prod*state->count) - (state->sumx*state->sumy)) /
 :   (powstate->count*state->sum_squaredx) -
 :   (state->sumx*state->sumx)) * 
((state->count*state->sum_squaredy) -
 :(state->sumy*state->sumy))), 0.5));
Can we format this a little bit? There are too many brackets. Maybe this is 
better:

  int64_t n = state->count;
  double r1 = n * state->prod - state->sumx * state->sumy;
  double r2 = (n * state->sum_squaredx - state->sumx * state->sumx) *
  (n * state->sum_squaredy - state->sumy * state->sumy);
  double corr = r1 / sqrt(r2);

Note that we can replace pow(x, 0.5) with sqrt(x).

We should also check if r2 is 0 and return NULL in that case. Hive does it this 
way:
https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L366


http://gerrit.cloudera.org:8080/#/c/18413/5/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/5/testdata/workloads/functional-query/queries/QueryTest/uda.test@19
PS5, Line 19: NULL
Is this correct? In Hive, it's 1.0


http://gerrit.cloudera.org:8080/#/c/18413/5/testdata/workloads/functional-query/queries/QueryTest/uda.test@26
PS5, Line 26: corr(cast(timestamp_col as int),cast(timestamp_col as int))
Hive doesn't need to cast timestamp in the query. Can we improve this?



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 21 Apr 2022 03:27:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-18 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 5:

(9 comments)

> Patch Set 4:
>
> (9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG@10
PS4, Line 10: t
> nit: remove the leading space
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@292
PS4, Line 292: // type and returns the correlation coefficient between them.
> Could you comment the formula here?
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@359
PS4, Line 359:   dst_state->count += src_state->count;
> nit: remove this blank line
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@366
PS4, Line 366:   return result;
> Could you explain why we need to free it here? It's not allocated by this f
Yeah, we can remove it, it'll get cleared after the function pops from the 
stack trace so its not needed.


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@373
PS4, Line 373:   // Calculating correlation coefficient using Pearson Product 
Moment Correlation formula
> Why do we free it here? We still use the CorrState below.
Yeah, we can remove it. I intended to remove it, but probably forgot.


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@375
PS4, Line 375:   (powstate->count*state->sum_squaredx) -
 :   (state->sumx*state->sumx)) * 
((state->count*state->sum_squaredy) -
 :(state->sumy*state->sumy))), 0.5));
 :   return DoubleVal(corr);
 : }
> I just realized that we are using a different formula than Hive's:
Yes, they're equivalent. Both are using Pearson's correlation coefficient 
itself. In hive, there are functions for covariance and standard deviation, so 
the formula might look a little different. But on simplifying its the same.


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@12
PS4, Line 12: 0.000321849166368
> I checked the test codes and found that it already handles rounding inaccur
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@15
PS4, Line 15: 
> Can we add tests on NULL values? Some tpcds tables contain NULLs, e.g. cata
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@32
PS4, Line 32: 
> Could you add tests for other types as well? E.g.
Sure, done!



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 18 Apr 2022 09:23:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10461/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 18 Apr 2022 07:36:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-18 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them. This UDAF
is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 173 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/5
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG@10
PS4, Line 10:
nit: remove the leading space


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@292
PS4, Line 292: // type and returns the correlation coefficient between them.
Could you comment the formula here?


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@359
PS4, Line 359:
nit: remove this blank line


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@366
PS4, Line 366:   ctx->Free(src.ptr);
Could you explain why we need to free it here? It's not allocated by this 
function.


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@373
PS4, Line 373:   ctx->Free(src.ptr);
Why do we free it here? We still use the CorrState below.


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@375
PS4, Line 375:   // Calculating correlation coefficient using Pearson Product 
Moment Correlation formula
 :   double corr = ((state->prod*state->count) - 
(state->sumx*state->sumy)) /
 :   (powstate->count*state->sum_squaredx) -
 :   (state->sumx*state->sumx)) * 
((state->count*state->sum_squaredy) -
 :(state->sumy*state->sumy))), 0.5));
I just realized that we are using a different formula than Hive's:
https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L49-L60

Are they equivalent?


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@12
PS4, Line 12: '0.00032'
I checked the test codes and found that it already handles rounding inaccurary:
https://github.com/apache/impala/blob/31bd2a47036e7be3a0a32f873b60d2f70dcd8c9f/tests/common/test_result_verifier.py#L176-L185

So we are safe to use double results directly.

BTW, can we add more corr() on other columns?


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@15
PS4, Line 15: 
Can we add tests on NULL values? Some tpcds tables contain NULLs, e.g. 
catalog_sales has NULLs in cs_sold_date_sk column.


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@32
PS4, Line 32: select corr(double_col,double_col) from functional.alltypes;
Could you add tests for other types as well? E.g.

select
  corr(tinyint_col, tinyint_col),
  corr(smallint_col, smallint_col),
  corr(int_col, int_col),
  corr(bigint_col, bigint_col),
  corr(float_col, float_col),
  corr(double_col, double_col),
  corr(timestamp_col, timestamp_col)
from functional.alltypes;

'alltypes' table doesn't have decimal types. We can use the `decimal_tbl` table.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 15 Apr 2022 01:04:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10449/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 14 Apr 2022 08:02:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-14 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
 the Pearson's correlation coefficient between them. This UDAF
 is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 151 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/4
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/10447/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 14 Apr 2022 05:02:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-13 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
 the Pearson's correlation coefficient between them. This UDAF
 is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 153 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/3
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/10446/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 14 Apr 2022 04:51:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-13 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns the 
Pearson's correlation coefficient between them. This UDAF is tested with a few 
query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 153 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/2
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10440/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Apr 2022 12:19:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-13 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18413


Change subject: IMPALA-11205: Implement CORR() function
..

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns the 
Pearson's correlation coefficient between them. This UDAF is tested with a few 
query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 146 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/1
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 


[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

2022-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@315
PS1, Line 315: void AggregateFunctions::CorrUpdate(FunctionContext* ctx, const 
DoubleVal& src1, const DoubleVal& src2,
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@329
PS1, Line 329: void AggregateFunctions::CorrRemove(FunctionContext* ctx, const 
DoubleVal& src1, const DoubleVal& src2,
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@346
PS1, Line 346: void AggregateFunctions::CorrMerge(FunctionContext* ctx, const 
StringVal& src, StringVal* dst) {
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@361
PS1, Line 361: const StringVal 
AggregateFunctions::CorrSerialize(FunctionContext* ctx, const StringVal& src) {
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@374
PS1, Line 374:   double corr = ((state->prod*state->count) - 
(state->sumx*state->sumy)) / (powstate->count*state->sum_squaredx) - 
(state->sumx*state->sumx)) * ((state->count*state->sum_squaredy) - 
(state->sumy*state->sumy))), 0.5));
line too long (221 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions.h@69
PS1, Line 69:   static void CorrUpdate(FunctionContext* ctx, const DoubleVal& 
src1, const DoubleVal& src2,
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions.h@71
PS1, Line 71:   static void CorrRemove(FunctionContext* ctx, const DoubleVal& 
src1, const DoubleVal& src2,
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 13 Apr 2022 11:58:06 +
Gerrit-HasComments: Yes