[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 22: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 13:06:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Reviewed-on: http://gerrit.cloudera.org:8080/19569 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 988 insertions(+), 8 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 23 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 21: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:54:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 22: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9629/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:55:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 22: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:55:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 21: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13829/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:29:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 988 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/21 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/21/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/21/be/src/exprs/aggregate-functions-ir.cc@298 PS21, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:05:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 20: > > Quanlong, do you think we can specify a range for these values instead of > > requiring exact matches? Our tests allow the error of double to be no more than 10e-10. I think the second column results violate this. https://github.com/apache/impala/blob/05bc485851a255b1b00bdb1a1086ba5527877768/tests/common/test_result_verifier.py#L232 We can convert the second column to be regr_intercept(ss_sold_time_sk, ss_quantity) / 1 to scale down the error. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 23 Aug 2023 12:42:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 20: > Patch Set 20: > > Hi Pranav, do the expected values come from Hive or your Impala dev env? > > Quanlong, do you think we can specify a range for these values instead of > requiring exact matches? In both essentially. In impala: 0.0602719636627,51709.905412,1.87116649337e-08 In hive: 0.060271963662778746,51709.905411981366,1.8711664933729818E-8 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 10:48:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 20: Hi Pranav, do the expected values come from Hive or your Impala dev env? Quanlong, do you think we can specify a range for these values instead of requiring exact matches? -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 09:56:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 20: > Patch Set 20: > > > Patch Set 20: Verified-1 > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9607/ > > The failure is TestAggregationQueries.test_aggregation in > https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/372 > > select regr_slope(ss_sold_time_sk, ss_quantity), > regr_intercept(ss_sold_time_sk, ss_quantity), > regr_r2(ss_sold_time_sk, ss_quantity) from tpcds.store_sales; > > (expected vs actual): > 0.06027196366276166,51709.90541198131,1.871166493371902e-08 != > 0.06027196366273494,51709.90541198241,1.871166493370322e-08 > > It seems a floating point error but it only occurs in the dockerised-test. > Are these expected values got from Hive, or from your local Impala runs? > @Pranav Hi, yes, these are the expected values. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 08:33:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 20: > Patch Set 20: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9607/ The failure is TestAggregationQueries.test_aggregation in https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/372 select regr_slope(ss_sold_time_sk, ss_quantity), regr_intercept(ss_sold_time_sk, ss_quantity), regr_r2(ss_sold_time_sk, ss_quantity) from tpcds.store_sales; (expected vs actual): 0.06027196366276166,51709.90541198131,1.871166493371902e-08 != 0.06027196366273494,51709.90541198241,1.871166493370322e-08 It seems a floating point error but it only occurs in the dockerised-test. Are these expected values got from Hive, or from your local Impala runs? @Pranav -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 07:55:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 20: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9607/ -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 13:59:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 19: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 09:37:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 20: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9607/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 09:38:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 20: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 09:38:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 19: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 00:46:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13781/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 18 Aug 2023 20:35:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 19: (5 comments) > Patch Set 18: > > (5 comments) http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG@20 PS16, Line 20: f > Why is it needed? Why should the line be indented one space? Sorry, I misunderstood, I though a space will also be needed. Done! http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@316 PS18, Line 316: AllocBuffer(ctx, dst, dst->len); > You should initialise 'dst' with AllocBuffer(), otherwise 'dst' will not be Done http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@510 PS18, Line 510: AllocBuffer(ctx, dst, dst->len); > See L316. Done http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@516 PS18, Line 516: } > Nit: still empty line. Done http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@748 PS18, Line 748: if (UNLIKELY(dst->is_null)) { > See L316. Done -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 18 Aug 2023 20:10:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 988 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/19 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/19/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/19/be/src/exprs/aggregate-functions-ir.cc@298 PS19, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 18 Aug 2023 20:11:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 18: (5 comments) http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG@20 PS16, Line 20: > I think its needed Why is it needed? Why should the line be indented one space? http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@316 PS18, Line 316: dst->ptr = ctx->Allocate(dst->len); You should initialise 'dst' with AllocBuffer(), otherwise 'dst' will not be set to null on failure. http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@510 PS18, Line 510: dst->ptr = ctx->Allocate(dst->len); See L316. http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@516 PS18, Line 516: Nit: still empty line. http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@748 PS18, Line 748: dst->ptr = ctx->Allocate(dst->len); See L316. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 18 Aug 2023 08:11:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13772/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 17 Aug 2023 19:58:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13771/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 17 Aug 2023 19:57:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 18: (4 comments) > Patch Set 18: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG@20 PS16, Line 20: > Nit: unnecessary space. I think its needed http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@292 PS16, Line 292: i > Nit: no need for parentheses here. Done http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@316 PS16, Line 316: dst->ptr = ctx->Allocate(dst->len); > We need to deal with allocation failure here, e.g. Done http://gerrit.cloudera.org:8080/#/c/19569/16/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/19569/16/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1372 PS16, Line 1372: List > nit: use 4 spaces indention Done -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 17 Aug 2023 19:35:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 987 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/18 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 987 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/17 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 18: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/18/be/src/exprs/aggregate-functions-ir.cc@298 PS18, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 17 Aug 2023 19:34:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/17/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/17/be/src/exprs/aggregate-functions-ir.cc@298 PS17, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 17 Aug 2023 19:32:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 16: (2 comments) http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19569/16//COMMIT_MSG@20 PS16, Line 20: Nit: unnecessary space. http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@292 PS16, Line 292: () Nit: no need for parentheses here. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 17 Aug 2023 10:43:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 16: (3 comments) > Patch Set 16: > > > Patch Set 16: > > > > (1 comment) > > Please ignore patch 15. I've removed 2 tests from the patch: > select s_store_sk, regr_slope(s_number_employees, s_floor_space) over > (partition by s_city order by s_store_sk > rows between 1 preceding and 1 following) from tpcds.store; > and > select s_store_sk, regr_intercept(s_number_employees, s_floor_space) over > (partition by s_city order by s_store_sk > rows between 1 preceding and 1 following) from tpcds.store; > as they were resulting in wrong results when compared to hive. The reason for > the wrong results does not correspond to the code but is because of different > implementaion for evaluating window functions in Hive and Impala. > This was verified by runnig a query independent of regression functions. > A separate jira can be filed to look further into it. I filed IMPALA-12347 to track the different results with Hive due to floating point computation error. http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@316 PS16, Line 316: dst->ptr = ctx->Allocate(dst->len); We need to deal with allocation failure here, e.g. https://github.com/apache/impala/blob/8638255e5074f1342dfc452bca39f649a76612d6/be/src/exprs/aggregate-functions-ir.cc#L1678-L1682 CorrInit() and CovarInit() are also missing this. http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@508 PS16, Line 508: nit: empty line http://gerrit.cloudera.org:8080/#/c/19569/16/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/19569/16/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1372 PS16, Line 1372: nit: use 4 spaces indention -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 08 Aug 2023 08:35:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13712/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 04 Aug 2023 07:34:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 16: > Patch Set 16: > > (1 comment) Please ignore patch 15. I've removed 2 tests from the patch: select s_store_sk, regr_slope(s_number_employees, s_floor_space) over (partition by s_city order by s_store_sk rows between 1 preceding and 1 following) from tpcds.store; and select s_store_sk, regr_intercept(s_number_employees, s_floor_space) over (partition by s_city order by s_store_sk rows between 1 preceding and 1 following) from tpcds.store; as they were resulting in wrong results when compared to hive. The reason for the wrong results does not correspond to the code but is because of different implementaion for evaluating window functions in Hive and Impala. This was verified by runnig a query independent of regression functions. A separate jira can be filed to look further into it. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 04 Aug 2023 07:19:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 975 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/16 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/16/be/src/exprs/aggregate-functions-ir.cc@298 PS16, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 04 Aug 2023 07:11:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 14: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 10 Jul 2023 21:19:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 14: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9489/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 10 Jul 2023 17:09:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 14: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 29 Jun 2023 16:32:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 14: (2 comments) > Patch Set 14: Code-Review+1 > > Thanks Pranav! http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@476 PS10, Line 476: const StringVal& src) { > @Daniel, can we discuss about this offline. Hi @Daniel, let me know about this. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@706 PS10, Line 706: c > I checked the behavior of a few queries with hive and postgreSQL, our code Hi @Daniel, before we freeze it, can you please just give it one more look to make sure we covered everything, I think we did, but just to be sure. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 26 Jun 2023 08:33:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 14: Code-Review+1 Thanks Pranav! -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 26 Jun 2023 07:09:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13387/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 25 Jun 2023 19:13:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 14: (7 comments) > Patch Set 14: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@729 PS10, Line 729: // r = covar / (n) > I don't think that's what Kurt meant as this will lead to a memory leak for Done http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@420 PS13, Line 420: gA > This still copies a source field which will not be modified. If we eliminat Done http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@420 PS13, Line 420: do > Nit: too much indentation here and in the following lines. Done http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@444 PS13, Line 444: are very small, they c > We usually capitalise constants. Also, instead of "estimator", I think "thr Done http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@467 PS13, Line 467: | sta > Use the newly defined constant also here. Done http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@478 PS13, Line 478: RegrInterceptGetValue(ctx, src); > Memory leak, see comment on L729. Done http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions.h@40 PS13, Line 40: FLOATING_POINT_ERROR_T > About the name, see the comment at aggregate-functions-ir.cc L444. Done -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 25 Jun 2023 18:54:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/14/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/14/be/src/exprs/aggregate-functions-ir.cc@298 PS14, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 25 Jun 2023 18:51:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,019 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/14 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 13: (7 comments) http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@729 PS10, Line 729: // population covariance between two columns of numeric types respectively using > Done I don't think that's what Kurt meant as this will lead to a memory leak for NULLs - we allocate a buffer in RegrSlopeInit() but don't free it now. I think Kurt meant something like this: { DoubleVal r = src.is_null ? DoubleVal::null() : Regr_r2GetValue(ctx, src); ctx->Free(src.ptr); return r; } http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@420 PS13, Line 420: nB This still copies a source field which will not be modified. If we eliminate the copies of the source variables, this should be eliminated too. http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@420 PS13, Line 420: Nit: too much indentation here and in the following lines. http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@444 PS13, Line 444: FloatingErrorEstimator We usually capitalise constants. Also, instead of "estimator", I think "threshold" would be better, so for example: FLOATING_POINT_ERROR_THRESHOLD or FP_ERROR_THRESHOLD. http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@467 PS13, Line 467: -1E-8 Use the newly defined constant also here. http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@478 PS13, Line 478: if (UNLIKELY(src.is_null)) { Memory leak, see comment on L729. http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions.h@40 PS13, Line 40: FloatingErrorEstimator About the name, see the comment at aggregate-functions-ir.cc L444. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 20 Jun 2023 10:05:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13296/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 14 Jun 2023 23:03:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/13/be/src/exprs/aggregate-functions-ir.cc@298 PS13, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 14 Jun 2023 22:41:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,024 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/13 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 12: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/13292/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 14 Jun 2023 20:09:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,024 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/12 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/12/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/12/be/src/exprs/aggregate-functions-ir.cc@298 PS12, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 14 Jun 2023 19:50:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 11: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/13291/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 11 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 14 Jun 2023 18:54:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 11: (23 comments) > Patch Set 11: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712 PS9, Line 712: as 0. > I see. In this case I don't think we should divide 'independent_var' and 'd I see, but if the values become zero due to floating point rounding, shouldn't we deal with those specifically, as in return null or 1. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@317 PS10, Line 317: *(reinterpret_cast(dst->ptr)) = {}; > Probably faster to use: Sure, after running a pseudo code, it does perform a little better. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@342 PS10, Line 342: *(reinterpret_cast(sizeof(RegrSlopeState))) = {}; > Probably faster to use: Sure, after running a pseudo code, it does perform a little better. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@386 PS10, Line 386: if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) && > Can UtcToUnixTime be used here? Even better to add a function to convert di Unfortunately no, since UtcToUnixTime takes a time_t* argument which would defeat the purpose of the whole operation. ToSubsecondUnixTime() takes a double* argument which converts the timestamp into double values. These values can then be used in the standard update and remove functions. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@402 PS10, Line 402: if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) && > Can UtcToUnixTime be used here? Unfortunately no, since UtcToUnixTime takes a time_t* argument which would defeat the purpose of the whole operation. ToSubsecondUnixTime() takes a double* argument which converts the timestamp into double values. These values can then be used in the standard update and remove functions. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@410 PS10, Line 410: const RegrSlopeState* src_state = reinterpret_cast(src.ptr); > declare const Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@415 PS10, Line 415: int64_t nA = dst_state->count; > It's often faster to not copy these member references to variables, especia Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@423 PS10, Line 423: > src fields are not updated and do not need to be copied. Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@443 PS10, Line 443: // Since these values are very small, they can be ignored and rounded to 0. > Declare as const Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@451 PS10, Line 451: DoubleVal AggregateFunctions::RegrSlopeFinalize(FunctionContext* ctx, > eliminate this variable. Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@472 PS10, Line 472: stat > define as constant or use an existing one Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@476 PS10, Line 476: DoubleVal AggregateFunctions::RegrInterceptFinalize(FunctionContext* ctx, > Check if removing these variables improves code generation. @Daniel, can we discuss about this offline. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683 PS10, Line 683: const StringVal& src) { > Add one more space so that "1" aligns with "NULL". Also applies to the next Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683 PS10, Line 683: r > I don't see much value in surrounding "var_pop(y) = 0" and "var_pop(x) != 0 Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@689 PS10, Line 689: // CorrUpdate(), which we use to produce the intermediate values, has the opposite > Declare const Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@700 PS10, Line 700: DCHECK(independent_var >= FloatingErrorEstimator); > Nit: I would leave an empty line before this. Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@706 PS10, Line 706: > I've done some experiments with Hive and indeed we get the results when we I checked the behavior of a few queries with hive and postgreSQL, our code is in sync with the
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,024 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/11 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 11 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/11/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/11/be/src/exprs/aggregate-functions-ir.cc@298 PS11, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 11 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 14 Jun 2023 18:34:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 10: (15 comments) http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@317 PS10, Line 317: memset(dst->ptr, 0, dst->len); Probably faster to use: *(reinterpret_cast(buf)) = {}; http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@342 PS10, Line 342: memset(state, 0, sizeof(RegrSlopeState)); Probably faster to use: *(reinterpret_cast(buf)) = {}; http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@386 PS10, Line 386: if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) && Can UtcToUnixTime be used here? Even better to add a function to convert directly. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@402 PS10, Line 402: if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) && Can UtcToUnixTime be used here? http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@410 PS10, Line 410: RegrSlopeState* src_state = reinterpret_cast(src.ptr); declare const http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@415 PS10, Line 415: int64_t nA = dst_state->count; It's often faster to not copy these member references to variables, especially on CISC processors. If they are kept, the else branch can be eliminated and nB can be moved after the return. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@423 PS10, Line 423: double yavgB = src_state->yavg; src fields are not updated and do not need to be copied. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@443 PS10, Line 443: RegrSlopeState* state = reinterpret_cast(src.ptr); Declare as const http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@451 PS10, Line 451: double regrSlope = state->covar / state->xvar; eliminate this variable. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@472 PS10, Line 472: -1E-8 define as constant or use an existing one http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@476 PS10, Line 476: double regrSlope = state->covar / state->xvar; Check if removing these variables improves code generation. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@689 PS10, Line 689: CorrState* state = reinterpret_cast(src.ptr); Declare const http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@729 PS10, Line 729: ctx->Free(src.ptr); Changing function to have one Free call should reduce code size http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@751 PS10, Line 751: memset(dst->ptr, 0, dst->len); Probably faster to use: *(reinterpret_cast(buf)) = {}; http://gerrit.cloudera.org:8080/#/c/19569/10/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/19569/10/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1344 PS10, Line 1344: nit: extra WS -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 May 2023 16:31:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 10: (8 comments) Thanks, it is getting better and better. We'll still need to find out more about the problem of negative covariances of the dependent variable. http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712 PS9, Line 712: d_squared' can only be 0 if eit > No, because covariance will also get divided by state->count which will can I see. In this case I don't think we should divide 'independent_var' and 'dependent_var' in the other places either, they are just checks for zero-ness. If they are not zero before division it means they shouldn't be zero after division either, the only way they may become zero is by floating point rounding. But we never actually use the divided values later, so we don't have to treat them as zero in this case. This involves L705-706 and L708 in Patch Set 10. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683 PS10, Line 683: // 1 if (var_pop(y) = 0) and (var_pop(x) != 0), else Add one more space so that "1" aligns with "NULL". Also applies to the next line. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683 PS10, Line 683: ( I don't see much value in surrounding "var_pop(y) = 0" and "var_pop(x) != 0" with parentheses. What I thought of was ... 1 if var_pop(y) = 0 (and var_pop(x) != 0), else ... This puts "var_pop(x) != 0" in parentheses because it is redundant - it is implied by the previous line as it ends with "else". I would actually even omit "var_pop(x) != 0". http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@700 PS10, Line 700: // dependent_var and independent_var become negative in certain cases due to floating Nit: I would leave an empty line before this. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@706 PS10, Line 706: < I've done some experiments with Hive and indeed we get the results when we treat this case like this, but it needs more investigation because it is not clear why Hive returns NULL in this case, for example for this query: select s_store_sk, regr_r2(s_number_employees, s_floor_space) over (partition by s_city order by s_store_sk rows between 1 preceding and 1 following) from tpcds.store; It doesn't seem to handle negative values here: https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBinarySetFunctions.java#L331 I'll try to do some more digging. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@713 PS10, Line 713: are This is the one that should be "is", as in "if either 'dependent_var' or 'independent_var' is 0". In the next sentence it should be "are", as in "if both 'dependent_var' and 'independent_var' are very small". http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2444 PS7, Line 2444: dependednt It is the dependent variable that becomes negative here. http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2695 PS7, Line 2695: independe It is the independent var that becomes negative here. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 May 2023 13:05:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13068/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 17 May 2023 20:37:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 10: (6 comments) > Patch Set 9: > > (7 comments) http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@702 PS7, Line 702: ignored and rounded > I'll try to reproduce the situation myself and see what Hive does. Sure http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@683 PS9, Line 683: = > There's no matching opening parenthesis. I think the opening paren should c Done http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@685 PS9, Line 685: // where y and x are the dependent and independent variables > Fits on the previous line, after the if. Done http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@692 PS9, Line 692: // 'x_var'. This is to avoid confusion, because for regr_r2() the dependent variable is : // the first parameter and the independent variable is the second parameter, but : // CorrUpdate(), which we use to produce the intermediate values, has the o > This could come after assigning 'dependent_var' and 'independent_var', befo Done http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712 PS9, Line 712: d_squared' can only be 0 if eit > Shouldn't we also divide these by state->count? In that case we could inclu No, because covariance will also get divided by state->count which will cancel the divisions of dependent_var and independent_var. http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@714 PS9, Line 714: ind > Nit: is. Done -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 17 May 2023 20:16:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,023 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/10 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@298 PS10, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 17 May 2023 20:16:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 9: (7 comments) http://gerrit.cloudera.org:8080/#/c/19569/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19569/9//COMMIT_MSG@18 PS9, Line 18: regr_r2() takes two arguments of numeric type and returns the coefficient This line is still too long (73 vs. 72). http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@702 PS7, Line 702: > To cross check the results, I did run the queries on postgreSQL as well, an I'll try to reproduce the situation myself and see what Hive does. http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@683 PS9, Line 683: ) There's no matching opening parenthesis. I think the opening paren should come before the "and var_pop(x)". http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@685 PS9, Line 685: // (var_pop(y) != 0 and var_pop(x) != 0) Fits on the previous line, after the if. http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@692 PS9, Line 692: // dependent_var and independent_var become negative in certain cases due to floating : // point rounding error. : // Since these values are very small, they can be ignored and rounded to 0. This could come after assigning 'dependent_var' and 'independent_var', before the DCHECKs. http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712 PS9, Line 712: dependent_var * independent_var Shouldn't we also divide these by state->count? In that case we could include the division from the beginning, i.e. the definition of 'dependent_var' and 'independent_var'. http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@714 PS9, Line 714: are Nit: is. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 12 May 2023 15:15:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13008/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 11 May 2023 21:50:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,024 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/9 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@298 PS9, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 11 May 2023 21:31:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12977/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 8 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 09 May 2023 17:57:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/8/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/8/be/src/exprs/aggregate-functions-ir.cc@298 PS8, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 8 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 09 May 2023 17:38:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 8: (11 comments) > Patch Set 7: > > (10 comments) http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@299 PS7, Line 299: > line too long (151 > 90) Since its a hyperlink, it should be in the same line for easy access. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@678 PS7, Line 678: // CorrState is reused for implementing regr_r2. > In source files the max line width is 90. Many lines here are shorter than Noted, done! http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@679 PS7, Line 679: ar > Nit: no need for this 'is'. Done http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@684 PS7, Line 684: // power(corr(y, x),2) if > I think using 'dependent' and 'independent' here in the doc comment is over Done http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@685 PS7, Line 685: p > You left out var_pop. Done http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@688 PS7, Line 688: DoubleVal AggregateFunctions::Regr_r2GetValue(FunctionContext* ctx, > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@698 PS7, Line 698: // CorrUpdate(), which we use to produce the intermediate values, has the opposite > We should add a comment describing why we assign 'state->xvar' to 'dependen Done http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@702 PS7, Line 702: > This still doesn't seem right. If 'dependent_var' is less than zero, then w To cross check the results, I did run the queries on postgreSQL as well, and this is how its functioning there, we can discuss more about it offline. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@708 PS7, Line 708: lse if (dependent_var == 0.0) { : return 1; > This is unnecessary here, the following lines contain this information. Done http://gerrit.cloudera.org:8080/#/c/19569/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/19569/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1344 PS7, Line 1344: > Nit: trailing whitespace, should be removed. There's usually a line break before the definitions of new functions. Please let me know if you mean something else. http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2398 PS7, Line 2398: (state->xvar <= 0.0), without which the be > This is not accurate, we shouldn't handle these cases as the same. See comm Done, let me know if you think that is okay. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 8 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 09 May 2023 17:37:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,023 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/8 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 8 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 7: (10 comments) http://gerrit.cloudera.org:8080/#/c/19569/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19569/7//COMMIT_MSG@18 PS7, Line 18: regr_r2() takes two arguments of numeric type and returns the coefficient This line is still too long (73 vs. 73). http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@678 PS7, Line 678: // Implementation of regr_r2(): In source files the max line width is 90. Many lines here are shorter than that for no reason. We should normally not use shorter lines, only if a line break has a reason, for example a new paragraph. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@679 PS7, Line 679: is Nit: no need for this 'is'. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@684 PS7, Line 684: // regr_2(dependent, independent) = NULL if var_pop(independent) = 0, else I think using 'dependent' and 'independent' here in the doc comment is overkill, you see 'y' and 'x' in most descriptions, where 'y' is the dependent var and 'x' is the independent var. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@685 PS7, Line 685: ( You left out var_pop. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@698 PS7, Line 698: double dependent_var = state->xvar; We should add a comment describing why we assign 'state->xvar' to 'dependent' and not to 'independent' (and vice versa). Something like this: In this function we use 'dependent_var' and 'independent_var' instead of 'y_var' and 'x_var'. This is to avoid confusion, because for regr_r2() the dependent variable is the first parameter and the independent variable is the second parameter, but CorrUpdate(), which we use to produce the intermediate values, has the opposite order. Our aggregate function framework passes the variables in order to CorrUpdate(), so in CorrUpdate() 'x' corresponds to the dependent variable of regr_r2() and 'y' to the independent variable of regr_r2(). http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@702 PS7, Line 702: dependent_var < 0.0 This still doesn't seem right. If 'dependent_var' is less than zero, then we should treat it as zero, but if at the same time 'independent_var' is not zero, then we should return 1, like on L705. Maybe it's easier if we do the following: First check if independent_var is negative, if it is we set it to 0. Then we do the same with dependent_var. We remove the checks for negativity from L702, so we only have (state->count < 2 || independent_var == 0) as the condition. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@708 PS7, Line 708: / Returns null if dependent_var and independent_var are very small and : // if their product might become 0 because of floating point underflow. This is unnecessary here, the following lines contain this information. http://gerrit.cloudera.org:8080/#/c/19569/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/19569/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1344 PS7, Line 1344: Nit: trailing whitespace, should be removed. http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2398 PS7, Line 2398: (state->xvar <= 0.0 || state->yvar <= 0.0) This is not accurate, we shouldn't handle these cases as the same. See comment in AggregateFunctions::Regr_r2GetValue(). Applies also to the other tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 27 Apr 2023 14:47:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12843/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 24 Apr 2023 08:12:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,021 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/7 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@299 PS7, Line 299: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@688 PS7, Line 688: // where dependent and independent are the dependent and independent variables respectively. line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 24 Apr 2023 07:53:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12748/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 04 Apr 2023 22:42:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 993 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/5 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/5/be/src/exprs/aggregate-functions-ir.cc@299 PS5, Line 299: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 04 Apr 2023 22:23:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 4: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/12747/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 04 Apr 2023 21:31:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 4: (54 comments) > Patch Set 4: > > (1 comment) A few more tests are yet to be added. http://gerrit.cloudera.org:8080/#/c/19569/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19569/2//COMMIT_MSG@18 PS2, Line 18: > You could have a newline after the ':'. Done http://gerrit.cloudera.org:8080/#/c/19569/2//COMMIT_MSG@18 PS2, Line 18: > Nit: Hive (capital H). Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@291 PS1, Line 291: // Implementation of regr_slope() and regr_intercept(): > Did you use the Oracle link as specification? Did you take the implementati Yes, I've used the Oracle link just for specification, couldn't find these exact formulas on any paper but their constituent parts like xvar, yvar, covar etc are mentioned in papers and have a similar implementation as correlation and covariance functions. http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@294 PS1, Line 294: he regression slope of the line and the y-int > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@294 PS1, Line 294: return > Nit: return Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@296 PS1, Line 296: a set of nu > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@297 PS1, Line 297: // analytic functions. > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@303 PS1, Line 303: // regr_intercept() formula used: > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@308 PS1, Line 308: a > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@317 PS1, Line 317: dst->len = sizeof(RegrSlopeState); > Not addressed yet. As discussed, to maintain uniformity across all the other aggregate functions, it'd be better to stick with memset(). Open for discussion. http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@341 PS1, Line 341: > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@344 PS1, Line 344: memset(state, 0, sizeof(RegrSlopeState)); > Not addressed yet. As discussed, to maintain uniformity across all the other aggregate functions, it'd be better to stick with memset(). Open for discussion. http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@418 PS1, Line 418: int64_t nB = src_state->count; > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@421 PS1, Line 421: return; > Could use else if, and then no need for the condition "nA != 0". Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@442 PS1, Line 442: > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@452 PS1, Line 452: } > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@453 PS1, Line 453: regr > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@457 PS1, Line 457: DoubleVal AggregateFunctions::RegrSlopeFinalize(FunctionContext* ctx, > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@459 PS1, Line 459: > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@468 PS1, Line 468: DoubleVal AggregateFunctions::RegrInterceptGetValue(FunctionContext* ctx, > line too long (97 > 90) Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@479 PS1, Line 479: double regrIntercept = state->yavg - (regrSlope * state->xavg); > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@480 PS1, Line 480: DoubleVal > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@486 PS1, Line 486: ctx->Free(src.ptr); > line too long (97 > 90) Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@488 PS1, Line 488: > Done Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@690 PS1, Line 690: > This condition is superfluous, it follows from the ones above, but in case Done http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@691 PS1, Line 691: DoubleVal
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 992 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/4 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/4/be/src/exprs/aggregate-functions-ir.cc@299 PS4, Line 299: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 04 Apr 2023 21:20:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12733/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 03 Apr 2023 07:22:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with hive and FENG. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 994 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/2 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 1: (34 comments) Thanks Pranav! I wonder if we could create some script that tested these functions against some other database which we would like to be compatible with, for example Hive. We should run the same queries in both engines and expect the same results. I'm not sure we can insert it into our test framework but it would be good if we could at least manually run a test like this before adding these functions. The tests in aggregation.test would be a good starting point for the queries to test against Hive (or Postrgesql etc.). http://gerrit.cloudera.org:8080/#/c/19569/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19569/1//COMMIT_MSG@9 PS1, Line 9: The linear regression functions fit an ordinary-least-squares regression line Lines in the commit message should be at most 72 chars long. http://gerrit.cloudera.org:8080/#/c/19569/1//COMMIT_MSG@10 PS1, Line 10: pairs which It would be clearer this way: "... pairs. They can be used ...". http://gerrit.cloudera.org:8080/#/c/19569/1//COMMIT_MSG@16 PS1, Line 16: of determination (also called R-squared or goodness of fit) for the regression. Please add a "Testing". http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@291 PS1, Line 291: // Implementation of regr_slope() and regr_intercept(): Did you use some source for either the implementation or the specification of these regression functions (i.e. some description you based your implementation on)? If so, please reference it here. http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@294 PS1, Line 294: the regression slope and regression intercept Now this information is written twice in this comment: here and starting on L297, which is a bit more detailed. I propose to move the more detailed version here and remove it from around L297. Something like this: "... two arguments of numeric types and return the regression slope of the line and the y-intercept of the regression line respectively." http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@294 PS1, Line 294: returns Nit: return http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@296 PS1, Line 296: pairs which Like in the commit message, "... pairs. They can be used ..." would be clearer. http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@297 PS1, Line 297: // regr_slope() returns the slope of the line while regr_intercept() returns the See comment on L294. http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@303 PS1, Line 303: // // where x and y are the dependent and independent variables respectively. The convention is that 'y' is the dependent variable and 'x' is the independent variable. The first argument of these functions is the dependent variable (which should be 'y'). (See for example https://docs.snowflake.com/en/sql-reference/functions/regr_intercept). We should stick to this convention. Also, the order of the arguments should be clearly described here, something like this: "regr_slope(y, x) = covar_pop(x,y) / var_pop(x) where y is the dependent variable and x is the independent variable." http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@308 PS1, Line 308: n 'n' is not introduced here, it should be "'count' times the variance of ...". Applies also to the following line (L309). http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@317 PS1, Line 317: memset(dst->ptr, 0, dst->len); We could have a reset() member function in RegrSlopeState that sets each variable to 0. http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@341 PS1, Line 341: double deltaX = x - state->xavg; Could move the calculation of the deltas to the else branch, they are only used there. http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@344 PS1, Line 344: memset(state, 0, sizeof(RegrSlopeState)); Could use state->reset(), see L317. http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@418 PS1, Line 418: memcpy(dst_state, src_state, sizeof(RegrSlopeState)); Could use the copy assignment operator (*dst_state = *src_state) instead of mempcy. http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@421 PS1, Line 421: if (nA != 0 && nB != 0) { Could use else if, and then no need for the condition "nA != 0". http://gerri
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12521/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 02 Mar 2023 20:13:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
pranav.lo...@cloudera.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19569 Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs which can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 990 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/1 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@442 PS1, Line 442: DoubleVal AggregateFunctions::RegrSlopeGetValue(FunctionContext* ctx, const StringVal& src) { line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@457 PS1, Line 457: DoubleVal AggregateFunctions::RegrSlopeFinalize(FunctionContext* ctx, const StringVal& src) { line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@468 PS1, Line 468: DoubleVal AggregateFunctions::RegrInterceptGetValue(FunctionContext* ctx, const StringVal& src) { line too long (97 > 90) http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@486 PS1, Line 486: DoubleVal AggregateFunctions::RegrInterceptFinalize(FunctionContext* ctx, const StringVal& src) { line too long (97 > 90) http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@692 PS1, Line 692: DoubleVal AggregateFunctions::Regr_r2GetValue(FunctionContext* ctx, const StringVal& src) { line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@712 PS1, Line 712: DoubleVal AggregateFunctions::Regr_r2Finalize(FunctionContext* ctx, const StringVal& src) { line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 02 Mar 2023 19:53:53 + Gerrit-HasComments: Yes