[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function

IMPALA-10467: Implement ds_theta_union() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and, depending on which
partitions the user is interested in, union the relevant sketches
together to get an estimate. E.g.:
  SELECT
      ds_theta_estimate(ds_theta_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch, I also tested
    ds_theta_union() on a bigger dataset to check that the serialization,
    deserialization and merging steps work well. I took TPCH25.lineitem,
    created a number of sketches grouped by l_shipdate and called
    ds_theta_union() on those sketches.

Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Reviewed-on: http://gerrit.cloudera.org:8080/17048
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
7 files changed, 152 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
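The union semantics described in the commit message can be illustrated with a toy, in-memory model. The following is a minimal Python sketch assuming a simplified KMV-style theta sketch; the names `ThetaSketch`, `theta_union` and `hash63`, and the byte layout used for serialization, are hypothetical and are not the Apache DataSketches library, its binary format, or Impala's ds_theta_union() implementation. What it does mirror is the core rule of a theta union: take the smallest theta among the inputs and keep only the hashes below it, so sketches built per partition can be serialized, stored, deserialized and merged later.

```python
# Toy model of theta-sketch union semantics. HYPOTHETICAL: not the Apache
# DataSketches library, its binary format, or Impala's ds_theta_union().
import hashlib
import struct

HASH_BITS = 63
MAX_THETA = 1 << HASH_BITS  # theta == MAX_THETA means "exact mode, no sampling"
_HEADER = ">IQI"            # assumed layout: k, theta, number of hashes


def hash63(item):
    """Stable 63-bit hash: first 8 bytes of SHA-1, shifted right by one bit."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> 1


class ThetaSketch:
    """KMV-style sketch: retains at most k distinct hashes, all below theta."""

    def __init__(self, k=1024, theta=MAX_THETA, hashes=()):
        self.k = k
        self.theta = theta
        self.hashes = {h for h in hashes if h < theta}
        self._trim()

    def _trim(self):
        # Over capacity: the (k+1)-th smallest hash becomes theta and only
        # the k smallest hashes (all strictly below it) are kept.
        if len(self.hashes) > self.k:
            ordered = sorted(self.hashes)
            self.theta = ordered[self.k]
            self.hashes = set(ordered[: self.k])

    def update(self, item):
        h = hash63(item)
        if h < self.theta:
            self.hashes.add(h)
            if len(self.hashes) > self.k:
                # Exactly one entry too many: evict the largest, it becomes theta.
                self.theta = max(self.hashes)
                self.hashes.discard(self.theta)

    def estimate(self):
        # The retained hashes are a uniform sample at rate theta / MAX_THETA.
        return len(self.hashes) * MAX_THETA / self.theta

    def serialize(self):
        ordered = sorted(self.hashes)
        return struct.pack(_HEADER + f"{len(ordered)}Q",
                           self.k, self.theta, len(ordered), *ordered)

    @staticmethod
    def deserialize(buf):
        k, theta, n = struct.unpack_from(_HEADER, buf, 0)
        hashes = struct.unpack_from(f">{n}Q", buf, struct.calcsize(_HEADER))
        return ThetaSketch(k, theta, hashes)


def theta_union(serialized_sketches, k=1024):
    """Merge serialized sketches: use the smallest input theta, keep every
    hash below it, then trim back to at most k entries."""
    sketches = [ThetaSketch.deserialize(buf) for buf in serialized_sketches]
    if not sketches:
        return ThetaSketch(k)
    theta = min(s.theta for s in sketches)
    merged = set().union(*(s.hashes for s in sketches))
    return ThetaSketch(k, theta, merged)
```

For instance, building one sketch per partition_col value, serializing each, and passing only the buffers for partitions 1 and 5 to `theta_union()` mimics the `WHERE partition_col=1 OR partition_col=5` query in the commit message. Taking the minimum theta is what keeps the merged sample uniform: every input has retained all of its hashes below its own theta, and therefore all of its hashes below the smallest one, so the union behaves like sampling the combined data at a single rate.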
[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function

Patch Set 3: Verified+1

--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Fri, 19 Feb 2021 13:32:08 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function

Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8168/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Fri, 19 Feb 2021 08:12:34 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function

Patch Set 3: Code-Review+2

--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Fri, 19 Feb 2021 07:50:08 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function

Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6902/ DRY_RUN=false

--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Fri, 19 Feb 2021 07:50:09 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function

Patch Set 2: Code-Review+2

Thanks for implementing this! It seems that adding more and more DataSketches functionality is sometimes more copy-pasting and renaming than actually implementing something new :)

--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Fri, 19 Feb 2021 07:49:34 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Fucun Chu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function

IMPALA-10467: Implement ds_theta_union() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and, depending on which
partitions the user is interested in, union the relevant sketches
together to get an estimate. E.g.:
  SELECT
      ds_theta_estimate(ds_theta_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch, I also tested
    ds_theta_union() on a bigger dataset to check that the serialization,
    deserialization and merging steps work well. I took TPCH25.lineitem,
    created a number of sketches grouped by l_shipdate and called
    ds_theta_union() on those sketches.

Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
7 files changed, 152 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17048/2

--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function

Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8111/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Wed, 10 Feb 2021 01:52:23 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function

IMPALA-10467: Implement ds_theta_union() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and, depending on which
partitions the user is interested in, union the relevant sketches
together to get an estimate. E.g.:
  SELECT
      ds_theta_estimate(ds_theta_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch, I also tested
    ds_theta_union() on a bigger dataset to check that the serialization,
    deserialization and merging steps work well. I took TPCH25.lineitem,
    created a number of sketches grouped by l_shipdate and called
    ds_theta_union() on those sketches.

Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
7 files changed, 162 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17048/1

--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins