Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
......................................................................

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(<expr>, <scale>). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV(<expr>) functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(<expr>, <scale>) in which the NDV scale is provided by the 2nd
argument <scale>.

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct <expr>)` with `ndv(<expr>, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
+====================================================================+
|   Metric    | Count Distinct |    NDV2    |    NDV5    |    NDV10  |
+--------------------------------------------------------------------+
|  Memory(GB) |       3.83     |    1.84    |    1.85    |     1.89  |
| Duration(s) |      182.89    |   30.22    |    29.72   |     29.24 |
|  ErrorRate  |        0%      |    1.8%    |    1.17%   |     0.06% |
+====================================================================+

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Reviewed-on: http://gerrit.cloudera.org:8080/17306
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 314 insertions(+), 34 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 18
Gerrit-Owner: fifteencai <fifteen...@tencent.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: fifteencai <fifteen...@tencent.com>

Reply via email to