[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 5: Verified+1

--
To view, visit http://gerrit.cloudera.org:8080/17088
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Fri, 12 Mar 2021 16:13:47 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

IMPALA-10520: Implement ds_theta_intersect() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and intersects them into a
single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and intersect them to get
estimates based on the sketches of the partitions the user is
interested in. E.g.:
  SELECT ds_theta_estimate(ds_theta_intersect(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
    tested ds_theta_intersect() on a bigger dataset to check that
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches with grouping
    by l_shipdate and called ds_theta_intersect() on those sketches.

Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Reviewed-on: http://gerrit.cloudera.org:8080/17088
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 182 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
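The intersection semantics described in the merged commit message can be illustrated with a toy model. The following is a minimal sketch, assuming a simplified KMV-style Theta sketch (keep the k smallest hash values, with theta as the cut-off). It is not Impala's C++ implementation and does not use the Apache DataSketches library; the names `make_sketch`, `intersect`, `estimate` and the parameter `k` are hypothetical and exist only for this illustration.

```python
import hashlib


def _hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def make_sketch(values, k=1024):
    """Build a toy Theta sketch: a (theta, retained_hashes) pair.

    If at most k distinct hashes are seen, the sketch is exact (theta = 1.0).
    Otherwise theta is the (k+1)-th smallest hash and only the k hashes
    below it are retained.
    """
    hashes = sorted({_hash01(v) for v in values})
    if len(hashes) <= k:
        return 1.0, frozenset(hashes)
    return hashes[k], frozenset(hashes[:k])


def intersect(a, b):
    """Intersect two toy sketches.

    The result uses the smaller theta and keeps only the hashes that are
    present in both inputs and fall below that theta.
    """
    theta = min(a[0], b[0])
    retained = frozenset(h for h in a[1] & b[1] if h < theta)
    return theta, retained


def estimate(sketch):
    """Estimate the distinct count as retained hashes divided by theta."""
    theta, retained = sketch
    return len(retained) / theta


if __name__ == "__main__":
    # Two small "partitions" sharing the 50 values 50..99 (exact mode).
    part1 = make_sketch(range(0, 100))
    part5 = make_sketch(range(50, 150))
    print(estimate(intersect(part1, part5)))  # prints 50.0
```

In this toy model, the SQL example from the commit message corresponds to building one sketch per partition and folding `intersect` over the sketches selected by the WHERE clause, so the estimate counts values present in every selected partition rather than in any of them.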
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6958/ DRY_RUN=false

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Fri, 12 Mar 2021 10:29:40 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 5: Code-Review+2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Fri, 12 Mar 2021 10:29:39 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 4: Code-Review+2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Fri, 12 Mar 2021 10:28:40 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8297/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 04 Mar 2021 03:43:07 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17088/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17088/3//COMMIT_MSG@14
PS3, Line 14:
> nit: not needed
Done

http://gerrit.cloudera.org:8080/#/c/17088/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17088/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@271
PS3, Line 271: stimation, which is consistent
: # with direct estimation of these sketches.
> Could you add tests that cover the second part of this sentence so that we
Done

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 04 Mar 2021 03:24:15 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

IMPALA-10520: Implement ds_theta_intersect() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and intersects them into a
single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and intersect them to get
estimates based on the sketches of the partitions the user is
interested in. E.g.:
  SELECT ds_theta_estimate(ds_theta_intersect(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
    tested ds_theta_intersect() on a bigger dataset to check that
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches with grouping
    by l_shipdate and called ds_theta_intersect() on those sketches.

Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 182 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17088/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 3: Code-Review+1

(2 comments)

Thanks for this patch! Overall this looks great, I just had some minor comments.

http://gerrit.cloudera.org:8080/#/c/17088/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17088/3//COMMIT_MSG@14
PS3, Line 14: an
nit: not needed

http://gerrit.cloudera.org:8080/#/c/17088/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17088/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@271
PS3, Line 271: and checks if the intersection
: # produces the same result as if these sketches were used separately to get the estimates
Could you add tests that cover the second part of this sentence so that we can see what ds_theta_intersect() gives when processing the sketches separately (and to see if they in fact match the results of this test)?

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Tue, 02 Mar 2021 10:13:03 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8204/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Tue, 23 Feb 2021 11:48:49 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

IMPALA-10520: Implement ds_theta_intersect() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and intersects them into a
single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and intersect them to get an
estimates based on the partitions the user is interested in related
sketches. E.g.:
  SELECT ds_theta_estimate(ds_theta_intersect(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
    tested ds_theta_intersect() on a bigger dataset to check that
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches with grouping
    by l_shipdate and called ds_theta_intersect() on those sketches.

Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 163 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17088/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h@271
PS2, Line 271: static void DsThetaIntersectUpdate(
> line too long (93 > 90)
Done

http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h@273
PS2, Line 273: static StringVal DsThetaIntersectSerialize(FunctionContext*, const StringVal& src);
> line too long (92 > 90)
Done

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Tue, 23 Feb 2021 11:28:44 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8198/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Tue, 23 Feb 2021 02:07:42 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h@271
PS2, Line 271: static void DsThetaIntersectUpdate(FunctionContext*, const StringVal& src, StringVal* dst);
line too long (93 > 90)

http://gerrit.cloudera.org:8080/#/c/17088/2/be/src/exprs/aggregate-functions.h@273
PS2, Line 273: static void DsThetaIntersectMerge(FunctionContext*, const StringVal& src, StringVal* dst);
line too long (92 > 90)

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Tue, 23 Feb 2021 01:47:07 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10520: Implement ds_theta_intersect() function
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17088 )

Change subject: IMPALA-10520: Implement ds_theta_intersect() function

IMPALA-10520: Implement ds_theta_intersect() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and intersects them into a
single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and intersect them to get an
estimates based on the partitions the user is interested in related
sketches. E.g.:
  SELECT ds_theta_estimate(ds_theta_intersect(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
    tested ds_theta_intersect() on a bigger dataset to check that
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches with grouping
    by l_shipdate and called ds_theta_intersect() on those sketches.

Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 161 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17088/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins