[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 22: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 13:06:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Reviewed-on: http://gerrit.cloudera.org:8080/19569
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 988 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-24 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 21: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:54:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9629/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:55:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 22: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:55:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13829/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:29:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-24 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 988 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/21
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/21/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/21/be/src/exprs/aggregate-functions-ir.cc@298
PS21, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:05:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-23 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 20:

> > Quanlong, do you think we can specify a range for these values instead of 
> > requiring exact matches?

Our tests allow the error of double to be no more than 10e-10. I think the 
second column results violate this.
https://github.com/apache/impala/blob/05bc485851a255b1b00bdb1a1086ba5527877768/tests/common/test_result_verifier.py#L232

We can convert the second column to be

 regr_intercept(ss_sold_time_sk, ss_quantity) / 1

to scale down the error.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 23 Aug 2023 12:42:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-22 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 20:

> Patch Set 20:
>
> Hi Pranav, do the expected values come from Hive or your Impala dev env?
>
> Quanlong, do you think we can specify a range for these values instead of 
> requiring exact matches?

In both essentially.
In impala: 0.0602719636627,51709.905412,1.87116649337e-08
In hive: 0.060271963662778746,51709.905411981366,1.8711664933729818E-8


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 10:48:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-22 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 20:

Hi Pranav, do the expected values come from Hive or your Impala dev env?

Quanlong, do you think we can specify a range for these values instead of 
requiring exact matches?


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 09:56:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-22 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 20:

> Patch Set 20:
>
> > Patch Set 20: Verified-1
> >
> > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9607/
>
> The failure is TestAggregationQueries.test_aggregation in 
> https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/372
>
> select regr_slope(ss_sold_time_sk, ss_quantity), 
> regr_intercept(ss_sold_time_sk, ss_quantity),
>  regr_r2(ss_sold_time_sk, ss_quantity) from tpcds.store_sales;
>
> (expected vs actual):
> 0.06027196366276166,51709.90541198131,1.871166493371902e-08 !=
> 0.06027196366273494,51709.90541198241,1.871166493370322e-08
>
> It seems a floating point error but it only occurs in the dockerised-test. 
> Are these expected values got from Hive, or from your local Impala runs? 
> @Pranav

Hi, yes, these are the expected values.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 08:33:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-22 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 20:

> Patch Set 20: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9607/

The failure is TestAggregationQueries.test_aggregation in 
https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/372

select regr_slope(ss_sold_time_sk, ss_quantity), 
regr_intercept(ss_sold_time_sk, ss_quantity),
 regr_r2(ss_sold_time_sk, ss_quantity) from tpcds.store_sales;

(expected vs actual):
0.06027196366276166,51709.90541198131,1.871166493371902e-08 !=
0.06027196366273494,51709.90541198241,1.871166493370322e-08

It seems a floating point error but it only occurs in the dockerised-test. Are 
these expected values got from Hive, or from your local Impala runs? @Pranav


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 07:55:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 20: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9607/


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 13:59:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-21 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 19: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 09:37:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9607/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 09:38:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 20: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 09:38:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-20 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 19: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 00:46:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13781/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 18 Aug 2023 20:35:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-18 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 19:

(5 comments)

> Patch Set 18:
>
> (5 comments)

http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG@20
PS16, Line 20: f
> Why is it needed? Why should the line be indented one space?
Sorry, I misunderstood, I though a space will also be needed. Done!


http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@316
PS18, Line 316:   AllocBuffer(ctx, dst, dst->len);
> You should initialise 'dst' with AllocBuffer(), otherwise 'dst' will not be
Done


http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@510
PS18, Line 510:   AllocBuffer(ctx, dst, dst->len);
> See L316.
Done


http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@516
PS18, Line 516: }
> Nit: still empty line.
Done


http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@748
PS18, Line 748:   if (UNLIKELY(dst->is_null)) {
> See L316.
Done



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 18 Aug 2023 20:10:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-18 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 988 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/19
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/19/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/19/be/src/exprs/aggregate-functions-ir.cc@298
PS19, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 18 Aug 2023 20:11:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-18 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 18:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG@20
PS16, Line 20:
> I think its needed
Why is it needed? Why should the line be indented one space?


http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@316
PS18, Line 316:   dst->ptr = ctx->Allocate(dst->len);
You should initialise 'dst' with AllocBuffer(), otherwise 'dst' will not be set 
to null on failure.


http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@510
PS18, Line 510:   dst->ptr = ctx->Allocate(dst->len);
See L316.


http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@516
PS18, Line 516:
Nit: still empty line.


http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@748
PS18, Line 748:   dst->ptr = ctx->Allocate(dst->len);
See L316.



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 18 Aug 2023 08:11:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13772/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Aug 2023 19:58:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13771/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Aug 2023 19:57:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-17 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 18:

(4 comments)

> Patch Set 18:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG@20
PS16, Line 20:
> Nit: unnecessary space.
I think its needed


http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@292
PS16, Line 292:  i
> Nit: no need for parentheses here.
Done


http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@316
PS16, Line 316:   dst->ptr = ctx->Allocate(dst->len);
> We need to deal with allocation failure here, e.g.
Done


http://gerrit.cloudera.org:8080/#/c/19569/16/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/19569/16/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1372
PS16, Line 1372: List
> nit: use 4 spaces indention
Done



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Aug 2023 19:35:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-17 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
 for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 987 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/18
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-17 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
 for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 987 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/17
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@298
PS18, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Aug 2023 19:34:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/17/be/src/exprs/aggregate-functions-ir.cc@298
PS17, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Aug 2023 19:32:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-17 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 16:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG@20
PS16, Line 20:
Nit: unnecessary space.


http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@292
PS16, Line 292: ()
Nit: no need for parentheses here.



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Aug 2023 10:43:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-08 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 16:

(3 comments)

> Patch Set 16:
>
> > Patch Set 16:
> >
> > (1 comment)
>
> Please ignore patch 15. I've removed 2 tests from the patch:
> select s_store_sk, regr_slope(s_number_employees, s_floor_space) over 
> (partition by s_city order by s_store_sk
>  rows between 1 preceding and 1 following) from tpcds.store;
> and
> select s_store_sk, regr_intercept(s_number_employees, s_floor_space) over 
> (partition by s_city order by s_store_sk
>  rows between 1 preceding and 1 following) from tpcds.store;
> as they were resulting in wrong results when compared to hive. The reason for 
> the wrong results does not correspond to the code but is because of different 
> implementaion for evaluating window functions in Hive and Impala.
> This was verified by runnig a query independent of regression functions.
> A separate jira can be filed to look further into it.

I filed IMPALA-12347 to track the different results with Hive due to floating 
point computation error.

http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@316
PS16, Line 316:   dst->ptr = ctx->Allocate(dst->len);
We need to deal with allocation failure here, e.g.
https://github.com/apache/impala/blob/8638255e5074f1342dfc452bca39f649a76612d6/be/src/exprs/aggregate-functions-ir.cc#L1678-L1682

CorrInit() and CovarInit() are also missing this.


http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@508
PS16, Line 508:
nit: empty line


http://gerrit.cloudera.org:8080/#/c/19569/16/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/19569/16/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1372
PS16, Line 1372:
nit: use 4 spaces indention



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 08 Aug 2023 08:35:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13712/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 04 Aug 2023 07:34:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-04 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 16:

> Patch Set 16:
>
> (1 comment)

Please ignore patch 15. I've removed 2 tests from the patch:
select s_store_sk, regr_slope(s_number_employees, s_floor_space) over 
(partition by s_city order by s_store_sk
 rows between 1 preceding and 1 following) from tpcds.store;
and
select s_store_sk, regr_intercept(s_number_employees, s_floor_space) over 
(partition by s_city order by s_store_sk
 rows between 1 preceding and 1 following) from tpcds.store;
as they were resulting in wrong results when compared to hive. The reason for 
the wrong results does not correspond to the code but is because of different 
implementaion for evaluating window functions in Hive and Impala.
This was verified by runnig a query independent of regression functions.
A separate jira can be filed to look further into it.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 04 Aug 2023 07:19:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-04 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
 for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 975 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/16
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@298
PS16, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 04 Aug 2023 07:11:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-07-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 14: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 10 Jul 2023 21:19:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-07-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9489/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 10 Jul 2023 17:09:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-29 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 14: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 29 Jun 2023 16:32:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-26 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 14:

(2 comments)

> Patch Set 14: Code-Review+1
>
> Thanks Pranav!

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@476
PS10, Line 476: const StringVal& src) {
> @Daniel, can we discuss about this offline.
Hi @Daniel, let me know about this.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@706
PS10, Line 706: c
> I checked the behavior of a few queries with hive and postgreSQL, our code
Hi @Daniel, before we freeze it, can you please just give it one more look to 
make sure we covered everything, I think we did, but just to be sure.



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 26 Jun 2023 08:33:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-26 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 14: Code-Review+1

Thanks Pranav!


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 26 Jun 2023 07:09:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13387/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 25 Jun 2023 19:13:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-25 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 14:

(7 comments)

> Patch Set 14:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@729
PS10, Line 729: // r = covar / (n)
> I don't think that's what Kurt meant as this will lead to a memory leak for
Done


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@420
PS13, Line 420: gA
> This still copies a source field which will not be modified. If we eliminat
Done


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@420
PS13, Line 420: do
> Nit: too much indentation here and in the following lines.
Done


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@444
PS13, Line 444: are very small, they c
> We usually capitalise constants. Also, instead of "estimator", I think "thr
Done


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@467
PS13, Line 467: | sta
> Use the newly defined constant also here.
Done


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@478
PS13, Line 478:   RegrInterceptGetValue(ctx, src);
> Memory leak, see comment on L729.
Done


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions.h@40
PS13, Line 40: FLOATING_POINT_ERROR_T
> About the name, see the comment at aggregate-functions-ir.cc L444.
Done



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 25 Jun 2023 18:54:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/14/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/14/be/src/exprs/aggregate-functions-ir.cc@298
PS14, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 25 Jun 2023 18:51:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-25 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
 for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,019 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/14
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-20 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 13:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@729
PS10, Line 729: // population covariance between two columns of numeric types 
respectively using
> Done
I don't think that's what Kurt meant as this will lead to a memory leak for 
NULLs - we allocate a buffer in RegrSlopeInit() but don't free it now.

I think Kurt meant something like this:

{
  DoubleVal r = src.is_null ? DoubleVal::null() :
  Regr_r2GetValue(ctx, src);
  ctx->Free(src.ptr);
  return r;
}


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@420
PS13, Line 420: nB
This still copies a source field which will not be modified. If we eliminate 
the copies of the source variables, this should be eliminated too.


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@420
PS13, Line 420:
Nit: too much indentation here and in the following lines.


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@444
PS13, Line 444: FloatingErrorEstimator
We usually capitalise constants. Also, instead of "estimator", I think 
"threshold" would be better, so for example:
FLOATING_POINT_ERROR_THRESHOLD or FP_ERROR_THRESHOLD.


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@467
PS13, Line 467: -1E-8
Use the newly defined constant also here.


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@478
PS13, Line 478:   if (UNLIKELY(src.is_null)) {
Memory leak, see comment on L729.


http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions.h@40
PS13, Line 40: FloatingErrorEstimator
About the name, see the comment at aggregate-functions-ir.cc L444.



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 20 Jun 2023 10:05:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13296/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 14 Jun 2023 23:03:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@298
PS13, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 14 Jun 2023 22:41:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-14 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
 for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,024 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/13
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 12:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/13292/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 14 Jun 2023 20:09:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-14 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
 for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,024 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/12
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/12/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/12/be/src/exprs/aggregate-functions-ir.cc@298
PS12, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 14 Jun 2023 19:50:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 11:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/13291/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 14 Jun 2023 18:54:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-14 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 11:

(23 comments)

> Patch Set 11:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712
PS9, Line 712: as 0.
> I see. In this case I don't think we should divide 'independent_var' and 'd
I see, but if the values become zero due to floating point rounding, shouldn't 
we deal with those specifically, as in return null or 1.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@317
PS10, Line 317:   *(reinterpret_cast(dst->ptr)) = {};
> Probably faster to use:
Sure, after running a pseudo code, it does perform a little better.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@342
PS10, Line 342: 
*(reinterpret_cast(sizeof(RegrSlopeState))) = {};
> Probably faster to use:
Sure, after running a pseudo code, it does perform a little better.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@386
PS10, Line 386:   if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) &&
> Can UtcToUnixTime be used here? Even better to add a function to convert di
Unfortunately no, since UtcToUnixTime takes a time_t* argument which would 
defeat the purpose of the whole operation. ToSubsecondUnixTime() takes a 
double* argument which converts the timestamp into double values. These values 
can then be used in the standard update and remove functions.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@402
PS10, Line 402:   if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) &&
> Can UtcToUnixTime be used here?
Unfortunately no, since UtcToUnixTime takes a time_t* argument which would 
defeat the purpose of the whole operation. ToSubsecondUnixTime() takes a 
double* argument which converts the timestamp into double values. These values 
can then be used in the standard update and remove functions.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@410
PS10, Line 410:   const RegrSlopeState* src_state = 
reinterpret_cast(src.ptr);
> declare const
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@415
PS10, Line 415: int64_t nA = dst_state->count;
> It's often faster to not copy these member references to variables, especia
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@423
PS10, Line 423:
> src fields are not updated and do not need to be copied.
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@443
PS10, Line 443:   // Since these values are very small, they can be ignored and 
rounded to 0.
> Declare as const
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@451
PS10, Line 451: DoubleVal 
AggregateFunctions::RegrSlopeFinalize(FunctionContext* ctx,
> eliminate this variable.
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@472
PS10, Line 472:  stat
> define as constant or use an existing one
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@476
PS10, Line 476: DoubleVal 
AggregateFunctions::RegrInterceptFinalize(FunctionContext* ctx,
> Check if removing these variables improves code generation.
@Daniel, can we discuss about this offline.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683
PS10, Line 683: const StringVal& src) {
> Add one more space so that "1" aligns with "NULL". Also applies to the next
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683
PS10, Line 683: r
> I don't see much value in surrounding "var_pop(y) = 0" and "var_pop(x) != 0
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@689
PS10, Line 689:   // CorrUpdate(), which we use to produce the intermediate 
values, has the opposite
> Declare const
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@700
PS10, Line 700:   DCHECK(independent_var >= FloatingErrorEstimator);
> Nit: I would leave an empty line before this.
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@706
PS10, Line 706:
> I've done some experiments with Hive and indeed we get the results when we
I checked the behavior of a few queries with hive and postgreSQL, our code is 
in sync with the

[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-14 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
 for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,024 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/11
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/11/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/11/be/src/exprs/aggregate-functions-ir.cc@298
PS11, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 14 Jun 2023 18:34:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-18 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 10:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@317
PS10, Line 317:   memset(dst->ptr, 0, dst->len);
Probably faster to use:
*(reinterpret_cast(buf)) = {};


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@342
PS10, Line 342: memset(state, 0, sizeof(RegrSlopeState));
Probably faster to use:
*(reinterpret_cast(buf)) = {};


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@386
PS10, Line 386:   if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) &&
Can UtcToUnixTime be used here? Even better to add a function to convert 
directly.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@402
PS10, Line 402:   if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) &&
Can UtcToUnixTime be used here?


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@410
PS10, Line 410:   RegrSlopeState* src_state = 
reinterpret_cast(src.ptr);
declare const


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@415
PS10, Line 415: int64_t nA = dst_state->count;
It's often faster to not copy these member references to variables, especially 
on CISC processors. If they are kept, the else branch can be eliminated and nB 
can be moved after the return.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@423
PS10, Line 423:   double yavgB = src_state->yavg;
src fields are not updated and do not need to be copied.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@443
PS10, Line 443:   RegrSlopeState* state = 
reinterpret_cast(src.ptr);
Declare as const


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@451
PS10, Line 451:   double regrSlope = state->covar / state->xvar;
eliminate this variable.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@472
PS10, Line 472: -1E-8
define as constant or use an existing one


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@476
PS10, Line 476:   double regrSlope = state->covar / state->xvar;
Check if removing these variables improves code generation.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@689
PS10, Line 689:   CorrState* state = reinterpret_cast(src.ptr);
Declare const


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@729
PS10, Line 729:   ctx->Free(src.ptr);
Changing function to have one Free call should reduce code size


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@751
PS10, Line 751:   memset(dst->ptr, 0, dst->len);
Probably faster to use:
*(reinterpret_cast(buf)) = {};


http://gerrit.cloudera.org:8080/#/c/19569/10/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/19569/10/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1344
PS10, Line 1344:
nit: extra WS



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 18 May 2023 16:31:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-18 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 10:

(8 comments)

Thanks, it is getting better and better. We'll still need to find out more 
about the problem of negative covariances of the dependent variable.

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712
PS9, Line 712: d_squared' can only be 0 if eit
> No, because covariance will also get divided by state->count which will can
I see. In this case I don't think we should divide 'independent_var' and 
'dependent_var' in the other places either, they are just checks for zero-ness.

If they are not zero before division it means they shouldn't be zero after 
division either, the only way they may become zero is by floating point 
rounding. But we never actually use the divided values later, so we don't have 
to treat them as zero in this case.
This involves L705-706 and L708 in Patch Set 10.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683
PS10, Line 683: //   1 if (var_pop(y) = 0) and (var_pop(x) != 0), 
else
Add one more space so that "1" aligns with "NULL". Also applies to the next 
line.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683
PS10, Line 683: (
I don't see much value in surrounding "var_pop(y) = 0" and "var_pop(x) != 0" 
with parentheses. What I thought of was

  ...
  1 if var_pop(y) = 0 (and var_pop(x) != 0), else
  ...

This puts "var_pop(x) != 0" in parentheses because it is redundant - it is 
implied by the previous line as it ends with "else". I would actually even omit 
"var_pop(x) != 0".


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@700
PS10, Line 700:   // dependent_var and independent_var become negative in 
certain cases due to floating
Nit: I would leave an empty line before this.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@706
PS10, Line 706: <
I've done some experiments with Hive and indeed we get the results when we 
treat this case like this, but it needs more investigation because it is not 
clear why Hive returns NULL in this case, for example for this query:

select s_store_sk, regr_r2(s_number_employees, s_floor_space) over (partition 
by s_city order by s_store_sk
 rows between 1 preceding and 1 following) from tpcds.store;

It doesn't seem to handle negative values here: 
https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBinarySetFunctions.java#L331

I'll try to do some more digging.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@713
PS10, Line 713: are
This is the one that should be "is", as in "if either 'dependent_var' or 
'independent_var' is 0".

In the next sentence it should be "are", as in "if both 'dependent_var' and 
'independent_var' are very small".


http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2444
PS7, Line 2444: dependednt
It is the dependent variable that becomes negative here.


http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2695
PS7, Line 2695: independe
It is the independent var that becomes negative here.



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 18 May 2023 13:05:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13068/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 17 May 2023 20:37:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-17 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 10:

(6 comments)

> Patch Set 9:
>
> (7 comments)

http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@702
PS7, Line 702: ignored and rounded
> I'll try to reproduce the situation myself and see what Hive does.
Sure


http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@683
PS9, Line 683: =
> There's no matching opening parenthesis. I think the opening paren should c
Done


http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@685
PS9, Line 685: // where y and x are the dependent and independent variables
> Fits on the previous line, after the if.
Done


http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@692
PS9, Line 692:   // 'x_var'. This is to avoid confusion, because for regr_r2() 
the dependent variable is
 :   // the first parameter and the independent variable is the 
second parameter, but
 :   // CorrUpdate(), which we use to produce the intermediate 
values, has the o
> This could come after assigning 'dependent_var' and 'independent_var', befo
Done


http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712
PS9, Line 712: d_squared' can only be 0 if eit
> Shouldn't we also divide these by state->count? In that case we could inclu
No, because covariance will also get divided by state->count which will cancel 
the divisions of dependent_var and independent_var.


http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@714
PS9, Line 714: ind
> Nit: is.
Done



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 17 May 2023 20:16:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-17 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
 for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,023 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/10
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@298
PS10, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 17 May 2023 20:16:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-12 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/19569/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19569/9//COMMIT_MSG@18
PS9, Line 18: regr_r2() takes two arguments of numeric type and returns the 
coefficient
This line is still too long (73 vs. 72).


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@702
PS7, Line 702:
> To cross check the results, I did run the queries on postgreSQL as well, an
I'll try to reproduce the situation myself and see what Hive does.


http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@683
PS9, Line 683: )
There's no matching opening parenthesis. I think the opening paren should come 
before the "and var_pop(x)".


http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@685
PS9, Line 685: //   (var_pop(y) != 0 and var_pop(x) != 0)
Fits on the previous line, after the if.


http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@692
PS9, Line 692:   // dependent_var and independent_var become negative in 
certain cases due to floating
 :   // point rounding error.
 :   // Since these values are very small, they can be ignored and 
rounded to 0.
This could come after assigning 'dependent_var' and 'independent_var', before 
the DCHECKs.


http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712
PS9, Line 712: dependent_var * independent_var
Shouldn't we also divide these by state->count? In that case we could include 
the division from the beginning, i.e. the definition of 'dependent_var' and 
'independent_var'.


http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@714
PS9, Line 714: are
Nit: is.



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 12 May 2023 15:15:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13008/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 11 May 2023 21:50:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-11 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the coefficient
of determination (also called R-squared or goodness of fit) for the
regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,024 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/9
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@298
PS9, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 11 May 2023 21:31:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12977/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 09 May 2023 17:57:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/8/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/8/be/src/exprs/aggregate-functions-ir.cc@298
PS8, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 09 May 2023 17:38:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-09 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 8:

(11 comments)

> Patch Set 7:
>
> (10 comments)

http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@299
PS7, Line 299:
> line too long (151 > 90)
Since its a hyperlink, it should be in the same line for easy access.


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@678
PS7, Line 678: // CorrState is reused for implementing regr_r2.
> In source files the max line width is 90. Many lines here are shorter than
Noted, done!


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@679
PS7, Line 679: ar
> Nit: no need for this 'is'.
Done


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@684
PS7, Line 684: //   power(corr(y, x),2) if
> I think using 'dependent' and 'independent' here in the doc comment is over
Done


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@685
PS7, Line 685: p
> You left out var_pop.
Done


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@688
PS7, Line 688: DoubleVal AggregateFunctions::Regr_r2GetValue(FunctionContext* 
ctx,
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@698
PS7, Line 698:   // CorrUpdate(), which we use to produce the intermediate 
values, has the opposite
> We should add a comment describing why we assign 'state->xvar' to 'dependen
Done


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@702
PS7, Line 702:
> This still doesn't seem right. If 'dependent_var' is less than zero, then w
To cross check the results, I did run the queries on postgreSQL as well, and 
this is how its functioning there, we can discuss more about it offline.


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@708
PS7, Line 708: lse if (dependent_var == 0.0) {
 : return 1;
> This is unnecessary here, the following lines contain this information.
Done


http://gerrit.cloudera.org:8080/#/c/19569/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/19569/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1344
PS7, Line 1344:
> Nit: trailing whitespace, should be removed.
There's usually a line break before the definitions of new functions. Please 
let me know if you mean something else.


http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2398
PS7, Line 2398: (state->xvar <= 0.0), without which the be
> This is not accurate, we shouldn't handle these cases as the same. See comm
Done, let me know if you think that is okay.



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 09 May 2023 17:37:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-05-09 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the coefficient
of determination (also called R-squared or goodness of fit) for the
regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,023 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/8
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-27 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 7:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/19569/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19569/7//COMMIT_MSG@18
PS7, Line 18: regr_r2() takes two arguments of numeric type and returns the 
coefficient
This line is still too long (73 vs. 73).


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@678
PS7, Line 678: // Implementation of regr_r2():
In source files the max line width is 90. Many lines here are shorter than that 
for no reason. We should normally not use shorter lines, only if a line break 
has a reason, for example a new paragraph.


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@679
PS7, Line 679: is
Nit: no need for this 'is'.


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@684
PS7, Line 684: // regr_2(dependent, independent) = NULL if var_pop(independent) 
= 0, else
I think using 'dependent' and 'independent' here in the doc comment is 
overkill, you see 'y' and 'x' in most descriptions, where 'y' is the dependent 
var and 'x' is the independent var.


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@685
PS7, Line 685: (
You left out var_pop.


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@698
PS7, Line 698:   double dependent_var = state->xvar;
We should add a comment describing why we assign 'state->xvar' to 'dependent' 
and not to 'independent' (and vice versa).

Something like this:

In this function we use 'dependent_var' and 'independent_var' instead of 
'y_var' and 'x_var'. This is to avoid confusion, because for regr_r2() the 
dependent variable is the first parameter and the independent variable is the 
second parameter, but CorrUpdate(), which we use to produce the intermediate 
values, has the opposite order. Our aggregate function framework passes the 
variables in order to CorrUpdate(), so in CorrUpdate() 'x' corresponds to the 
dependent variable of regr_r2() and 'y' to the independent variable of 
regr_r2().


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@702
PS7, Line 702: dependent_var < 0.0
This still doesn't seem right. If 'dependent_var' is less than zero, then we 
should treat it as zero, but if at the same time 'independent_var' is not zero, 
then we should return 1, like on L705.

Maybe it's easier if we do the following:
First check if independent_var is negative, if it is we set it to 0. Then we do 
the same with dependent_var. We remove the checks for negativity from L702, so 
we only have (state->count < 2 || independent_var == 0) as the condition.


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@708
PS7, Line 708: / Returns null if dependent_var and independent_var are very 
small and
 : // if their product might become 0 because of floating point 
underflow.
This is unnecessary here, the following lines contain this information.


http://gerrit.cloudera.org:8080/#/c/19569/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/19569/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1344
PS7, Line 1344:
Nit: trailing whitespace, should be removed.


http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2398
PS7, Line 2398: (state->xvar <= 0.0 || state->yvar <= 0.0)
This is not accurate, we shouldn't handle these cases as the same. See comment 
in AggregateFunctions::Regr_r2GetValue().
Applies also to the other tests.



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 27 Apr 2023 14:47:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12843/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 24 Apr 2023 08:12:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-24 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the coefficient
of determination (also called R-squared or goodness of fit) for the
regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,021 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/7
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@299
PS7, Line 299: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)


http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@688
PS7, Line 688: // where dependent and independent are the dependent and 
independent variables respectively.
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 24 Apr 2023 07:53:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12748/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 04 Apr 2023 22:42:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-04 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() 
and regr_r2()

The linear regression functions fit an ordinary-least-squares
regression line to a set of number pairs. They can be
used both as aggregate and analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope of the 
line.
regr_intercept() takes two arguments of numeric type and returns the y-intercept
of the regression line.
regr_r2() takes two arguments of numeric type and returns the coefficient
of determination (also called R-squared or goodness of fit) for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive.
The tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 993 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/5
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/5/be/src/exprs/aggregate-functions-ir.cc@299
PS5, Line 299: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 04 Apr 2023 22:23:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 4:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/12747/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 04 Apr 2023 21:31:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-04 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 4:

(54 comments)

> Patch Set 4:
>
> (1 comment)

A few more tests are yet to be added.

http://gerrit.cloudera.org:8080/#/c/19569/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19569/2//COMMIT_MSG@18
PS2, Line 18:
> You could have a newline after the ':'.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2//COMMIT_MSG@18
PS2, Line 18:
> Nit: Hive (capital H).
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@291
PS1, Line 291: // Implementation of regr_slope() and regr_intercept():
> Did you use the Oracle link as specification? Did you take the implementati
Yes, I've used the Oracle link just for specification, couldn't find these 
exact formulas on any paper but their constituent parts like xvar, yvar, covar 
etc are mentioned in papers and have a similar implementation as correlation 
and covariance functions.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@294
PS1, Line 294: he regression slope of the line and the y-int
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@294
PS1, Line 294: return
> Nit: return
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@296
PS1, Line 296: a set of nu
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@297
PS1, Line 297: // analytic functions.
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@303
PS1, Line 303: // regr_intercept() formula used:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@308
PS1, Line 308: a
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@317
PS1, Line 317:   dst->len = sizeof(RegrSlopeState);
> Not addressed yet.
As discussed, to maintain uniformity across all the other aggregate functions, 
it'd be better to stick with memset(). Open for discussion.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@341
PS1, Line 341:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@344
PS1, Line 344: memset(state, 0, sizeof(RegrSlopeState));
> Not addressed yet.
As discussed, to maintain uniformity across all the other aggregate functions, 
it'd be better to stick with memset(). Open for discussion.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@418
PS1, Line 418: int64_t nB = src_state->count;
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@421
PS1, Line 421:   return;
> Could use else if, and then no need for the condition "nA != 0".
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@442
PS1, Line 442:
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@452
PS1, Line 452:   }
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@453
PS1, Line 453: regr
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@457
PS1, Line 457: DoubleVal AggregateFunctions::RegrSlopeFinalize(FunctionContext* 
ctx,
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@459
PS1, Line 459:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@468
PS1, Line 468: DoubleVal 
AggregateFunctions::RegrInterceptGetValue(FunctionContext* ctx,
> line too long (97 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@479
PS1, Line 479:   double regrIntercept = state->yavg - (regrSlope * state->xavg);
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@480
PS1, Line 480: DoubleVal
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@486
PS1, Line 486: ctx->Free(src.ptr);
> line too long (97 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@488
PS1, Line 488:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@690
PS1, Line 690:
> This condition is superfluous, it follows from the ones above, but in case
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@691
PS1, Line 691: DoubleVal

[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-04 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() 
and regr_r2()

The linear regression functions fit an ordinary-least-squares
regression line to a set of number pairs. They can be
used both as aggregate and analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope of the 
line.
regr_intercept() takes two arguments of numeric type and returns the y-intercept
of the regression line.
regr_r2() takes two arguments of numeric type and returns the coefficient
of determination (also called R-squared or goodness of fit) for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive.
The tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 992 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/4
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/4/be/src/exprs/aggregate-functions-ir.cc@299
PS4, Line 299: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 04 Apr 2023 21:20:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12733/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 03 Apr 2023 07:22:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-04-03 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() 
and regr_r2()

The linear regression functions fit an ordinary-least-squares regression line
to a set of number pairs. They can be used both as aggregate and analytic 
functions.

regr_slope() takes two arguments of numeric type and returns the slope of the 
line.
regr_intercept() takes two arguments of numeric type and returns the y-intercept
of the regression line.
regr_r2() takes two arguments of numeric type and returns the coefficient
of determination (also called R-squared or goodness of fit) for the regression.

Testing: The functions are extensively tested and cross-checked with hive and 
FENG. The tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 994 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/2
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-03-06 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 1:

(34 comments)

Thanks Pranav!

I wonder if we could create some script that tested these functions against 
some other database which we would like to be compatible with, for example 
Hive. We should run the same queries in both engines and expect the same 
results. I'm not sure we can insert it into our test framework but it would be 
good if we could at least manually run a test like this before adding these 
functions.

The tests in aggregation.test would be a good starting point for the queries to 
test against Hive (or Postrgesql etc.).

http://gerrit.cloudera.org:8080/#/c/19569/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19569/1//COMMIT_MSG@9
PS1, Line 9: The linear regression functions fit an ordinary-least-squares 
regression line
Lines in the commit message should be at most 72 chars long.


http://gerrit.cloudera.org:8080/#/c/19569/1//COMMIT_MSG@10
PS1, Line 10: pairs which
It would be clearer this way:
"... pairs. They can be used ...".


http://gerrit.cloudera.org:8080/#/c/19569/1//COMMIT_MSG@16
PS1, Line 16: of determination (also called R-squared or goodness of fit) for 
the regression.
Please add a "Testing".


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@291
PS1, Line 291: // Implementation of regr_slope() and regr_intercept():
Did you use some source for either the implementation or the specification of 
these regression functions (i.e. some description you based your implementation 
on)? If so, please reference it here.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@294
PS1, Line 294: the regression slope and regression intercept
Now this information is written twice in this comment: here and starting on 
L297, which is a bit more detailed. I propose to move the more detailed version 
here and remove it from around L297.

Something like this:
"... two arguments of numeric types and return the regression slope of the line 
and the y-intercept of the regression line respectively."


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@294
PS1, Line 294: returns
Nit: return


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@296
PS1, Line 296: pairs which
Like in the commit message, "... pairs. They can be used ..." would be clearer.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@297
PS1, Line 297: // regr_slope() returns the slope of the line while 
regr_intercept() returns the
See comment on L294.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@303
PS1, Line 303: // // where x and y are the dependent and independent variables 
respectively.
The convention is that 'y' is the dependent variable and 'x' is the independent 
variable. The first argument of these functions is the dependent variable 
(which should be 'y'). (See for example 
https://docs.snowflake.com/en/sql-reference/functions/regr_intercept).

We should stick to this convention.

Also, the order of the arguments should be clearly described here, something 
like this:

"regr_slope(y, x) = covar_pop(x,y) / var_pop(x)
where y is the dependent variable and x is the independent variable."


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@308
PS1, Line 308: n
'n' is not introduced here, it should be "'count' times the variance of ...".
Applies also to the following line (L309).


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@317
PS1, Line 317:   memset(dst->ptr, 0, dst->len);
We could have a reset() member function in RegrSlopeState that sets each 
variable to 0.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@341
PS1, Line 341:   double deltaX = x - state->xavg;
Could move the calculation of the deltas to the else branch, they are only used 
there.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@344
PS1, Line 344: memset(state, 0, sizeof(RegrSlopeState));
Could use state->reset(), see L317.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@418
PS1, Line 418:   memcpy(dst_state, src_state, sizeof(RegrSlopeState));
Could use the copy assignment operator (*dst_state = *src_state) instead of 
mempcy.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@421
PS1, Line 421: if (nA != 0 && nB != 0) {
Could use else if, and then no need for the condition "nA != 0".


http://gerri

[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-03-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12521/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 02 Mar 2023 20:13:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-03-02 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19569


Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() 
and regr_r2()

The linear regression functions fit an ordinary-least-squares regression line
to a set of number pairs which can be used both as aggregate and analytic 
functions.

regr_slope() takes two arguments of numeric type and returns the slope of the 
line.
regr_intercept() takes two arguments of numeric type and returns the y-intercept
of the regression line.
regr_r2() takes two arguments of numeric type and returns the coefficient
of determination (also called R-squared or goodness of fit) for the regression.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 990 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/1
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-03-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@442
PS1, Line 442: DoubleVal AggregateFunctions::RegrSlopeGetValue(FunctionContext* 
ctx, const StringVal& src) {
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@457
PS1, Line 457: DoubleVal AggregateFunctions::RegrSlopeFinalize(FunctionContext* 
ctx, const StringVal& src) {
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@468
PS1, Line 468: DoubleVal 
AggregateFunctions::RegrInterceptGetValue(FunctionContext* ctx, const 
StringVal& src) {
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@486
PS1, Line 486: DoubleVal 
AggregateFunctions::RegrInterceptFinalize(FunctionContext* ctx, const 
StringVal& src) {
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@692
PS1, Line 692: DoubleVal AggregateFunctions::Regr_r2GetValue(FunctionContext* 
ctx, const StringVal& src) {
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@712
PS1, Line 712: DoubleVal AggregateFunctions::Regr_r2Finalize(FunctionContext* 
ctx, const StringVal& src) {
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 02 Mar 2023 19:53:53 +
Gerrit-HasComments: Yes