[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 23 Jul 2020 22:20:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..

IMPALA-9633: Implement ds_hll_union()

This function receives a set of sketches produced by ds_hll_sketch()
and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and based on which partition
the user is interested of the relevant sketches can be union-ed
together to get an estimate. E.g.:
  SELECT
  ds_hll_estimate(ds_hll_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Note, currently there is a known limitation of unioning string types
where some input sketches come from Impala and some from Hive. In
this case if there is an overlap in the input data used by Impala and
by Hive this overlapping data is still counted twice due to some
string representation difference between Impala and Hive.
For more details see:
https://issues.apache.org/jira/browse/IMPALA-9939

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_hll_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.linelitem, created a number of sketches with grouping
by l_shipdate and called ds_hll_union() on those sketches.

Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Reviewed-on: http://gerrit.cloudera.org:8080/16095
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-common.cc
A be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/hll_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
M tests/query_test/test_datasketches.py
11 files changed, 271 insertions(+), 29 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6175/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 23 Jul 2020 17:01:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6171/


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 23 Jul 2020 15:15:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 23 Jul 2020 11:24:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6171/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 23 Jul 2020 11:24:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 23 Jul 2020 11:15:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6568/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 13 Jul 2020 08:15:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6567/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 13 Jul 2020 08:13:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-13 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16095/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test:

http://gerrit.cloudera.org:8080/#/c/16095/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@202
PS5, Line 202: 
> Can you also add a union test with hll_sketches_from_hive?
Sure, Done



--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 13 Jul 2020 07:48:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-13 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16095

to look at the new patch set (#7).

Change subject: IMPALA-9633: Implement ds_hll_union()
..

IMPALA-9633: Implement ds_hll_union()

This function receives a set of sketches produced by ds_hll_sketch()
and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and based on which partition
the user is interested of the relevant sketches can be union-ed
together to get an estimate. E.g.:
  SELECT
  ds_hll_estimate(ds_hll_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Note, currently there is a known limitation of unioning string types
where some input sketches come from Impala and some from Hive. In
this case if there is an overlap in the input data used by Impala and
by Hive this overlapping data is still counted twice due to some
string representation difference between Impala and Hive.
For more details see:
https://issues.apache.org/jira/browse/IMPALA-9939

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_hll_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.linelitem, created a number of sketches with grouping
by l_shipdate and called ds_hll_union() on those sketches.

Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-common.cc
A be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/hll_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
M tests/query_test/test_datasketches.py
11 files changed, 271 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/7
--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-13 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16095

to look at the new patch set (#6).

Change subject: IMPALA-9633: Implement ds_hll_union()
..

IMPALA-9633: Implement ds_hll_union()

This function receives a set of sketches produced by ds_hll_sketch()
and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and based on which partition
the user is interested of the relevant sketches can be union-ed
together to get an estimate. E.g.:
  SELECT
  ds_hll_estimate(ds_hll_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Note, currently there is a known limitation of unioning string types
where some input sketches come from Impala and some from Hive. In
this case if there is an overlap in the input data used by Impala and
by Hive this overlapping data is still counted twice due to some
string representation difference between Impala and Hive.
For more details see:
https://issues.apache.org/jira/browse/IMPALA-9939

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_hll_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.linelitem, created a number of sketches with grouping
by l_shipdate and called ds_hll_union() on those sketches.

Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-common.cc
A be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
M tests/query_test/test_datasketches.py
10 files changed, 271 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/6
--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-10 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16095/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test:

http://gerrit.cloudera.org:8080/#/c/16095/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@202
PS5, Line 202: 
Can you also add a union test with hll_sketches_from_hive?



--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 10 Jul 2020 11:20:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6539/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 09 Jul 2020 10:02:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-09 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16095/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16095/4/be/src/exprs/aggregate-functions-ir.cc@1736
PS4, Line 1736: tten by DeserializeHllSketch(
> Do these configs matter, or they are overwritten during deserialization? My
These settings are overwritten in DeserializeHllSketch() to repesent the ones 
coming from the serialized sketch. Added a comment.


http://gerrit.cloudera.org:8080/#/c/16095/4/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/16095/4/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1336
PS4, Line 1336: Lists.newArrayList(Type.STRING), Type.STRING, 
Type.STRING,
  : prefix + 
"14DsHllUnionInitEPN10impala_udf15FunctionContextEPNS1_9StringValE",
  : prefix +
  : 
"16DsHllUnionUpdateEPN10impala_udf15FunctionContextERKNS1_9StringValEPS4_",
  : prefix +
  : 
"15DsHllUnionMergeEPN10impala_udf15FunctionContextERKNS1_9StringValEPS4_",
  : prefix +
> nit: indentation
Done


http://gerrit.cloudera.org:8080/#/c/16095/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test:

http://gerrit.cloudera.org:8080/#/c/16095/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@163
PS2, Line 163: from hll_sketches_from_hive;
> Add a test where the input data set contains valid sketches and nulls. Expe
Done



--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 09 Jul 2020 09:36:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-09 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16095

to look at the new patch set (#5).

Change subject: IMPALA-9633: Implement ds_hll_union()
..

IMPALA-9633: Implement ds_hll_union()

This function receives a set of sketches produced by ds_hll_sketch()
and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and based on which partition
the user is interested of the relevant sketches can be union-ed
together to get an estimate. E.g.:
  SELECT
  ds_hll_estimate(ds_hll_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_hll_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.linelitem, created a number of sketches with grouping
by l_shipdate and called ds_hll_union() on those sketches.

Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-common.cc
A be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
8 files changed, 240 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/5
--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16095/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16095/4/be/src/exprs/aggregate-functions-ir.cc@1736
PS4, Line 1736: DS_SKETCH_CONFIG, DS_HLL_TYPE
Do these configs matter, or they are overwritten during deserialization? My 
understanding is that we should be able to union sketches with different 
configs. See 
https://datasketches.apache.org/docs/Architecture/SketchCriteria.html / 
Mergeable With Different Size Parameters


http://gerrit.cloudera.org:8080/#/c/16095/4/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/16095/4/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1336
PS4, Line 1336: Lists.newArrayList(Type.STRING), Type.STRING, 
Type.STRING,
  : prefix + 
"14DsHllUnionInitEPN10impala_udf15FunctionContextEPNS1_9StringValE",
  : prefix + 
"16DsHllUnionUpdateEPN10impala_udf15FunctionContextERKNS1_9StringValEPS4_",
  : prefix + 
"15DsHllUnionMergeEPN10impala_udf15FunctionContextERKNS1_9StringValEPS4_",
  : prefix + 
"19DsHllUnionSerializeEPN10impala_udf15FunctionContextERKNS1_9StringValE",
  : prefix + 
"18DsHllUnionFinalizeEPN10impala_udf15FunctionContextERKNS1_9StringValE",
  : true, false, true));
nit: indentation



--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 07 Jul 2020 19:46:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6514/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 07 Jul 2020 15:31:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-07-07 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16095

to look at the new patch set (#3).

Change subject: IMPALA-9633: Implement ds_hll_union()
..

IMPALA-9633: Implement ds_hll_union()

This function receives a set of sketches produced by ds_hll_sketch()
and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and based on which partition
the user is interested of the relevant sketches can be union-ed
together to get an estimate. E.g.:
  SELECT
  ds_hll_estimate(ds_hll_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_hll_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.linelitem, created a number of sketches with grouping
by l_shipdate and called ds_hll_union() on those sketches.

Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-common.cc
A be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
8 files changed, 232 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/3
--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-06-26 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16095/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test:

http://gerrit.cloudera.org:8080/#/c/16095/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@163
PS2, Line 163:  QUERY
Add a test where the input data set contains valid sketches and nulls. 
Expectation is that nulls are ignored.



--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 26 Jun 2020 07:04:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6368/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Jun 2020 14:55:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-06-18 Thread Gabor Kaszab (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16095

to look at the new patch set (#2).

Change subject: IMPALA-9633: Implement ds_hll_union()
..

IMPALA-9633: Implement ds_hll_union()

This function receives a set of sketches produced by ds_hll_sketch()
and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and based on which partition
the user is interested of the relevant sketches can be union-ed
together to get an estimate. E.g.:
  SELECT
  ds_hll_estimate(ds_hll_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_hll_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.linelitem, created a number of sketches with grouping
by l_shipdate and called ds_hll_union() on those sketches.

Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-common.cc
A be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
8 files changed, 244 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/2
--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6366/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Jun 2020 13:15:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16095 )

Change subject: IMPALA-9633: Implement ds_hll_union()
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16095/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16095/1/be/src/exprs/aggregate-functions-ir.cc@1686
PS1, Line 1686: StringVal 
AggregateFunctions::DsHllUnionFinalize(FunctionContext* ctx, const StringVal& 
src) {
line too long (94 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Jun 2020 12:31:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9633: Implement ds hll union()

2020-06-18 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16095


Change subject: IMPALA-9633: Implement ds_hll_union()
..

IMPALA-9633: Implement ds_hll_union()

This function receives a set of sketches produced by ds_hll_sketch()
and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and based on which partition
the user is interested of the relevant sketches can be union-ed
together to get an estimate. E.g.:
  SELECT
  ds_hll_estimate(ds_hll_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests I added to this patch I also
tested ds_hll_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.linelitem, created a number of sketches with grouping
by l_shipdate and called ds_hll_union() on those sketches.

Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-common.cc
A be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
8 files changed, 243 insertions(+), 34 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/1
--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab