[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 23 Jul 2020 22:20:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. IMPALA-9633: Implement ds_hll_union() This function receives a set of sketches produced by ds_hll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and based on which partition the user is interested of the relevant sketches can be union-ed together to get an estimate. E.g.: SELECT ds_hll_estimate(ds_hll_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Note, currently there is a known limitation of unioning string types where some input sketches come from Impala and some from Hive. In this case if there is an overlap in the input data used by Impala and by Hive this overlapping data is still counted twice due to some string representation difference between Impala and Hive. For more details see: https://issues.apache.org/jira/browse/IMPALA-9939 Testing: - Apart from the automated tests I added to this patch I also tested ds_hll_union() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.linelitem, created a number of sketches with grouping by l_shipdate and called ds_hll_union() on those sketches. Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Reviewed-on: http://gerrit.cloudera.org:8080/16095 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exprs/CMakeLists.txt M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h A be/src/exprs/datasketches-common.cc A be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/hll_sketches_from_impala.parquet M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test M tests/query_test/test_datasketches.py 11 files changed, 271 insertions(+), 29 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6175/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 23 Jul 2020 17:01:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 8: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6171/ -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 23 Jul 2020 15:15:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 23 Jul 2020 11:24:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6171/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 23 Jul 2020 11:24:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 23 Jul 2020 11:15:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6568/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 13 Jul 2020 08:15:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6567/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 13 Jul 2020 08:13:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/16095/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test: http://gerrit.cloudera.org:8080/#/c/16095/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@202 PS5, Line 202: > Can you also add a union test with hll_sketches_from_hive? Sure, Done -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 13 Jul 2020 07:48:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16095 to look at the new patch set (#7). Change subject: IMPALA-9633: Implement ds_hll_union() .. IMPALA-9633: Implement ds_hll_union() This function receives a set of sketches produced by ds_hll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and based on which partition the user is interested of the relevant sketches can be union-ed together to get an estimate. E.g.: SELECT ds_hll_estimate(ds_hll_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Note, currently there is a known limitation of unioning string types where some input sketches come from Impala and some from Hive. In this case if there is an overlap in the input data used by Impala and by Hive this overlapping data is still counted twice due to some string representation difference between Impala and Hive. For more details see: https://issues.apache.org/jira/browse/IMPALA-9939 Testing: - Apart from the automated tests I added to this patch I also tested ds_hll_union() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.linelitem, created a number of sketches with grouping by l_shipdate and called ds_hll_union() on those sketches. Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 --- M be/src/exprs/CMakeLists.txt M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h A be/src/exprs/datasketches-common.cc A be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/hll_sketches_from_impala.parquet M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test M tests/query_test/test_datasketches.py 11 files changed, 271 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/7 -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16095 to look at the new patch set (#6). Change subject: IMPALA-9633: Implement ds_hll_union() .. IMPALA-9633: Implement ds_hll_union() This function receives a set of sketches produced by ds_hll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and based on which partition the user is interested of the relevant sketches can be union-ed together to get an estimate. E.g.: SELECT ds_hll_estimate(ds_hll_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Note, currently there is a known limitation of unioning string types where some input sketches come from Impala and some from Hive. In this case if there is an overlap in the input data used by Impala and by Hive this overlapping data is still counted twice due to some string representation difference between Impala and Hive. For more details see: https://issues.apache.org/jira/browse/IMPALA-9939 Testing: - Apart from the automated tests I added to this patch I also tested ds_hll_union() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.linelitem, created a number of sketches with grouping by l_shipdate and called ds_hll_union() on those sketches. Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 --- M be/src/exprs/CMakeLists.txt M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h A be/src/exprs/datasketches-common.cc A be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test M tests/query_test/test_datasketches.py 10 files changed, 271 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/6 -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 5: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16095/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test: http://gerrit.cloudera.org:8080/#/c/16095/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@202 PS5, Line 202: Can you also add a union test with hll_sketches_from_hive? -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 10 Jul 2020 11:20:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6539/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 09 Jul 2020 10:02:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/16095/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16095/4/be/src/exprs/aggregate-functions-ir.cc@1736 PS4, Line 1736: tten by DeserializeHllSketch( > Do these configs matter, or they are overwritten during deserialization? My These settings are overwritten in DeserializeHllSketch() to repesent the ones coming from the serialized sketch. Added a comment. http://gerrit.cloudera.org:8080/#/c/16095/4/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/16095/4/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1336 PS4, Line 1336: Lists.newArrayList(Type.STRING), Type.STRING, Type.STRING, : prefix + "14DsHllUnionInitEPN10impala_udf15FunctionContextEPNS1_9StringValE", : prefix + : "16DsHllUnionUpdateEPN10impala_udf15FunctionContextERKNS1_9StringValEPS4_", : prefix + : "15DsHllUnionMergeEPN10impala_udf15FunctionContextERKNS1_9StringValEPS4_", : prefix + > nit: indentation Done http://gerrit.cloudera.org:8080/#/c/16095/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test: http://gerrit.cloudera.org:8080/#/c/16095/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@163 PS2, Line 163: from hll_sketches_from_hive; > Add a test where the input data set contains valid sketches and nulls. Expe Done -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 09 Jul 2020 09:36:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16095 to look at the new patch set (#5). Change subject: IMPALA-9633: Implement ds_hll_union() .. IMPALA-9633: Implement ds_hll_union() This function receives a set of sketches produced by ds_hll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and based on which partition the user is interested of the relevant sketches can be union-ed together to get an estimate. E.g.: SELECT ds_hll_estimate(ds_hll_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch I also tested ds_hll_union() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.linelitem, created a number of sketches with grouping by l_shipdate and called ds_hll_union() on those sketches. Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 --- M be/src/exprs/CMakeLists.txt M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h A be/src/exprs/datasketches-common.cc A be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test 8 files changed, 240 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/5 -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/16095/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16095/4/be/src/exprs/aggregate-functions-ir.cc@1736 PS4, Line 1736: DS_SKETCH_CONFIG, DS_HLL_TYPE Do these configs matter, or they are overwritten during deserialization? My understanding is that we should be able to union sketches with different configs. See https://datasketches.apache.org/docs/Architecture/SketchCriteria.html / Mergeable With Different Size Parameters http://gerrit.cloudera.org:8080/#/c/16095/4/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/16095/4/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1336 PS4, Line 1336: Lists.newArrayList(Type.STRING), Type.STRING, Type.STRING, : prefix + "14DsHllUnionInitEPN10impala_udf15FunctionContextEPNS1_9StringValE", : prefix + "16DsHllUnionUpdateEPN10impala_udf15FunctionContextERKNS1_9StringValEPS4_", : prefix + "15DsHllUnionMergeEPN10impala_udf15FunctionContextERKNS1_9StringValEPS4_", : prefix + "19DsHllUnionSerializeEPN10impala_udf15FunctionContextERKNS1_9StringValE", : prefix + "18DsHllUnionFinalizeEPN10impala_udf15FunctionContextERKNS1_9StringValE", : true, false, true)); nit: indentation -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 07 Jul 2020 19:46:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6514/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 07 Jul 2020 15:31:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16095 to look at the new patch set (#3). Change subject: IMPALA-9633: Implement ds_hll_union() .. IMPALA-9633: Implement ds_hll_union() This function receives a set of sketches produced by ds_hll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and based on which partition the user is interested of the relevant sketches can be union-ed together to get an estimate. E.g.: SELECT ds_hll_estimate(ds_hll_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch I also tested ds_hll_union() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.linelitem, created a number of sketches with grouping by l_shipdate and called ds_hll_union() on those sketches. Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 --- M be/src/exprs/CMakeLists.txt M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h A be/src/exprs/datasketches-common.cc A be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test 8 files changed, 232 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/3 -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/16095/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test: http://gerrit.cloudera.org:8080/#/c/16095/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@163 PS2, Line 163: QUERY Add a test where the input data set contains valid sketches and nulls. Expectation is that nulls are ignored. -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 26 Jun 2020 07:04:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6368/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 18 Jun 2020 14:55:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16095 to look at the new patch set (#2). Change subject: IMPALA-9633: Implement ds_hll_union() .. IMPALA-9633: Implement ds_hll_union() This function receives a set of sketches produced by ds_hll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and based on which partition the user is interested of the relevant sketches can be union-ed together to get an estimate. E.g.: SELECT ds_hll_estimate(ds_hll_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch I also tested ds_hll_union() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.linelitem, created a number of sketches with grouping by l_shipdate and called ds_hll_union() on those sketches. Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 --- M be/src/exprs/CMakeLists.txt M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h A be/src/exprs/datasketches-common.cc A be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test 8 files changed, 244 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/2 -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6366/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 18 Jun 2020 13:15:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16095 ) Change subject: IMPALA-9633: Implement ds_hll_union() .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/16095/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16095/1/be/src/exprs/aggregate-functions-ir.cc@1686 PS1, Line 1686: StringVal AggregateFunctions::DsHllUnionFinalize(FunctionContext* ctx, const StringVal& src) { line too long (94 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 18 Jun 2020 12:31:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16095 Change subject: IMPALA-9633: Implement ds_hll_union() .. IMPALA-9633: Implement ds_hll_union() This function receives a set of sketches produced by ds_hll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and based on which partition the user is interested of the relevant sketches can be union-ed together to get an estimate. E.g.: SELECT ds_hll_estimate(ds_hll_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch I also tested ds_hll_union() on a bigger dataset to check that serialization, deserialization and merging steps work well. I took TPCH25.linelitem, created a number of sketches with grouping by l_shipdate and called ds_hll_union() on those sketches. Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 --- M be/src/exprs/CMakeLists.txt M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h A be/src/exprs/datasketches-common.cc A be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test 8 files changed, 243 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/1 -- To view, visit http://gerrit.cloudera.org:8080/16095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Gerrit-Change-Number: 16095 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab