Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16226 )

Change subject: IMPALA-9942: DataSketches HLL shouldn't take empty strings as 
distinct values
......................................................................

IMPALA-9942: DataSketches HLL shouldn't take empty strings as distinct values

In Hive, empty strings don't count as distinct values when querying
count(distinct) estimates using the Apache DataSketches HLL algorithm
on strings and varchars.
For compatibility's sake, Impala should not count them either.
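
The rule above can be illustrated with a minimal C++ sketch. This is not the
code from be/src/exprs/aggregate-functions-ir.cc: ExactDistinctCounter and
CountDistinctSkippingEmpty are hypothetical names, and an exact
std::unordered_set stands in for the DataSketches HLL sketch so that the
result is deterministic. Only the "skip empty strings on update" behaviour
is taken from the description.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an HLL sketch. It counts exactly instead of
// estimating, which keeps the example easy to verify.
struct ExactDistinctCounter {
  std::unordered_set<std::string> seen;

  // Assumed update rule, following the commit description:
  // an empty string is ignored and never becomes a distinct value.
  void Update(const std::string& value) {
    if (value.empty()) return;
    seen.insert(value);
  }

  std::size_t Estimate() const { return seen.size(); }
};

// Feeds every value through the counter and returns the distinct count.
std::size_t CountDistinctSkippingEmpty(const std::vector<std::string>& values) {
  ExactDistinctCounter counter;
  for (const auto& v : values) counter.Update(v);
  return counter.Estimate();
}
```

With this rule, an input consisting only of empty strings yields a count of
zero, and empty strings mixed into other values do not raise the count.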

Tests:
-added extra tests for hll with empty strings

Change-Id: Ie7648217bbe2f66b817788f131c062f349b1e9ad
Reviewed-on: http://gerrit.cloudera.org:8080/16226
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
2 files changed, 28 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie7648217bbe2f66b817788f131c062f349b1e9ad
Gerrit-Change-Number: 16226
Gerrit-PatchSet: 7
Gerrit-Owner: Adam Tamas <[email protected]>
Gerrit-Reviewer: Adam Tamas <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
