[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-05-30 Thread Anonymous Coward (Code Review)
lipeng...@sensorsdata.cn has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18574 )


Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..

IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

This commit optimizes plain count(*) queries on Iceberg tables.
When `org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP` can be
retrieved from the `org.apache.iceberg.BaseSnapshot#summary` of the
table's current snapshot, the count can be taken from that property,
which makes this kind of query very fast. If the property cannot be
retrieved, the query aggregates the `num_rows` of the Parquet
`file_metadata_` as usual.
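
The fast path and its fallback can be sketched as follows. This is a minimal Python model, not the patch's Java code: `total_records_from_summary`, `plain_count_star` and the `scan_and_count` callback are hypothetical names, and the summary is modelled as a plain string-to-string mapping. The key `"total-records"` is the value of Iceberg's `SnapshotSummary.TOTAL_RECORDS_PROP`. Treating a non-positive or unparsable value as "not retrieved" follows the code comment quoted later in the review of this change.

```python
from typing import Callable, Mapping, Optional

# Value of org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP.
TOTAL_RECORDS_PROP = "total-records"


def total_records_from_summary(summary: Mapping[str, str]) -> Optional[int]:
    """Return the row count from a snapshot summary, or None if unusable."""
    raw = summary.get(TOTAL_RECORDS_PROP)
    if raw is None:
        return None          # property missing from the summary
    try:
        total = int(raw)
    except ValueError:
        return None          # analogous to NumberFormatException in Java
    # A non-positive value is treated as an unsuccessful retrieval.
    return total if total > 0 else None


def plain_count_star(summary: Mapping[str, str],
                     scan_and_count: Callable[[], int]) -> int:
    """Answer count(*) from the summary, else fall back to the normal path."""
    total = total_records_from_summary(summary)
    if total is not None:
        return total
    # Hypothetical stand-in for aggregating num_rows from file metadata.
    return scan_and_count()
```

Because the fallback is an ordinary callable here, the sketch makes it visible that the regular execution path must still be available whenever the property is absent.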

A query can be optimized only if it meets all of the following requirements:
 - The SelectStmt has no WHERE clause
 - The SelectStmt has no GROUP BY clause
 - The FROM clause contains exactly one TableRef, and it is a BaseTableRef
 - The table is an Iceberg table
 - The SelectList contains only 'count(*)'

Testing:
 - Added end-to-end test
 - Existing tests

Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-compound-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-is-null-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
14 files changed, 310 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/18574/2
--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 


[Impala-ASF-CR] IMPALA-11233: Unset all query option

2022-05-30 Thread Xiaoqing Gao (Code Review)
Hello Quanlong Huang, Anonymous Coward (339), Gabor Kaszab, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18430

to look at the new patch set (#7).

Change subject: IMPALA-11233: Unset all query option
..

IMPALA-11233: Unset all query option

When using a JDBC connection pool, a connection may set some query
options. After the query finishes, the connection is closed and returned
to the pool. When the connection is used again, the query options set
earlier still take effect. We need a way for a SET statement to reset all
query options without restarting impalad.

Support UNSET statements in the SQL dialect. UNSET ALL unsets all query
options.
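
The pooling problem described above can be illustrated with a small Python model. It is only a sketch: `Session` and its methods are hypothetical, the default values are placeholders rather than Impala's real defaults, and nothing here talks to an actual impalad.

```python
class Session:
    """Hypothetical pooled connection that remembers per-session overrides."""

    # Placeholder defaults; the option names are used for illustration only.
    DEFAULTS = {"MEM_LIMIT": "0", "BATCH_SIZE": "0"}

    def __init__(self):
        self.overrides = {}

    def set_option(self, name, value):
        """Models `SET name=value`."""
        self.overrides[name.upper()] = value

    def unset_all(self):
        """Models `UNSET ALL`: drop every override, restoring the defaults."""
        self.overrides.clear()

    def effective(self, name):
        key = name.upper()
        return self.overrides.get(key, self.DEFAULTS[key])


pool = [Session()]

# First borrower changes an option and returns the connection.
conn = pool.pop()
conn.set_option("mem_limit", "1g")
pool.append(conn)

# Second borrower gets the same connection and inherits the override.
conn = pool.pop()
leaked = conn.effective("MEM_LIMIT")

# Resetting at borrow time removes the leaked state.
conn.unset_all()
after_reset = conn.effective("MEM_LIMIT")
```

In this model a pool could call the reset whenever a connection is borrowed or returned, so that no borrower depends on what the previous one set.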

Testing:
  - Add a test for unsetting all query options in test_hs2.py

Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
---
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/SetStmt.java
M shell/impala_shell.py
M tests/hs2/test_hs2.py
8 files changed, 91 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/18430/7

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Gerrit-Change-Number: 18430
Gerrit-PatchSet: 7
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward (339)
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiaoqing Gao 


[Impala-ASF-CR] IMPALA-11233: Unset all query option

2022-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18430 )

Change subject: IMPALA-11233: Unset all query option
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18430/7/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/18430/7/be/src/service/query-options.cc@1295
PS7, Line 1295:   QUERY_OPTS_TABLE
line has trailing whitespace




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Gerrit-Change-Number: 18430
Gerrit-PatchSet: 7
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward (339)
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiaoqing Gao 
Gerrit-Comment-Date: Mon, 30 May 2022 08:51:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11233: Unset all query option

2022-05-30 Thread Xiaoqing Gao (Code Review)
Hello Quanlong Huang, Anonymous Coward (339), Gabor Kaszab, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18430

to look at the new patch set (#8).

Change subject: IMPALA-11233: Unset all query option
..

IMPALA-11233: Unset all query option

When using a JDBC connection pool, a connection may set some query
options. After the query finishes, the connection is closed and returned
to the pool. When the connection is used again, the query options set
earlier still take effect. We need a way for a SET statement to reset all
query options without restarting impalad.

Support UNSET statements in the SQL dialect. UNSET ALL unsets all query
options.

Testing:
  - Add a test for unsetting all query options in test_hs2.py

Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
---
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/SetStmt.java
M shell/impala_shell.py
M tests/hs2/test_hs2.py
8 files changed, 90 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/18430/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Gerrit-Change-Number: 18430
Gerrit-PatchSet: 8
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward (339)
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiaoqing Gao 


[Impala-ASF-CR] IMPALA-11233: Unset all query option

2022-05-30 Thread Xiaoqing Gao (Code Review)
Xiaoqing Gao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18430 )

Change subject: IMPALA-11233: Unset all query option
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18430/6/fe/src/main/java/org/apache/impala/analysis/SetStmt.java
File fe/src/main/java/org/apache/impala/analysis/SetStmt.java:

http://gerrit.cloudera.org:8080/#/c/18430/6/fe/src/main/java/org/apache/impala/analysis/SetStmt.java@30
PS6, Line 30: private final String value_;
:   private final TQueryOptionTy
> Seems like the `SetStmt` can only be one of the following modes:
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Gerrit-Change-Number: 18430
Gerrit-PatchSet: 8
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward (339)
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiaoqing Gao 
Gerrit-Comment-Date: Mon, 30 May 2022 08:54:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10659/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 30 May 2022 08:54:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11233: Unset all query option

2022-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18430 )

Change subject: IMPALA-11233: Unset all query option
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10660/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Gerrit-Change-Number: 18430
Gerrit-PatchSet: 7
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward (339)
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiaoqing Gao 
Gerrit-Comment-Date: Mon, 30 May 2022 09:11:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11233: Unset all query option

2022-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18430 )

Change subject: IMPALA-11233: Unset all query option
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10661/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Gerrit-Change-Number: 18430
Gerrit-PatchSet: 8
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward (339)
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiaoqing Gao 
Gerrit-Comment-Date: Mon, 30 May 2022 09:12:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11233: Unset all query option

2022-05-30 Thread Anonymous Coward (Code Review)
Anonymous Coward (339) has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18430 )

Change subject: IMPALA-11233: Unset all query option
..


Patch Set 8: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Gerrit-Change-Number: 18430
Gerrit-PatchSet: 8
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward (339)
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiaoqing Gao 
Gerrit-Comment-Date: Mon, 30 May 2022 11:06:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-05-30 Thread Jian Zhang (Code Review)
Jian Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@464
PS2, Line 464: public static TResultSet getIcebergTotalRecordsProp(FeIcebergTable feTable) {
How about `s/feTable/table/` to stay consistent with other functions?


http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@486
PS2, Line 486: // When NumberFormatException is thrown or the TOTAL_RECORDS_PROP is not
 :   // positive, the retrieval is considered unsuccessful.
The comment seems to describe the opposite of the code block that follows,
which is a little confusing to readers. How about rewriting the comment to
explain why `0` is not returned when `total != null and total == 0`?


http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2228
PS2, Line 2228: if (!funcCallExpr.getParams().isStar()) return;
How about also optimizing `select count(constant) from tbl`, since
`count(constant)` is equivalent to `count(*)`?


http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2459
PS2, Line 2459: if (feTable instanceof FeIcebergTable) {
Since `optimizeQueryForIcebergTable()` already guarantees that the table
is an Iceberg table, when is this check needed?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 30 May 2022 12:29:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-05-30 Thread Xianqing He (Code Review)
Xianqing He has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2213
PS2, Line 2213: if (stmt.hasGroupByClause()) return;
I think the statement must not have a 'havingClause' either, and it would be
better to add some tests for statements that have a 'WhereClause',
'GroupByClause' or 'HavingClause'.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 30 May 2022 13:34:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-05-30 Thread Xianqing He (Code Review)
Xianqing He has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2158
PS2, Line 2158: optimizeQueryForIcebergTable(planCtx, analysisResult);
I think we can move this to #2027, since we do not need to create the plan.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 30 May 2022 14:15:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types

2022-05-30 Thread Quanlong Huang (Code Review)
Hello Qifan Chen, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16909

to look at the new patch set (#22).

Change subject: WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types
..

WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types

This patch adds support for UTF-8 aware varchar and char types. In
UTF-8 mode, when truncating UTF-8 varchar(N) and char(N) strings,
lengths will be counted by characters instead of bytes. So the
result string will have up to N characters.

The UTF8_MODE query option is first detected in FE when analyzing the
query. An 'is_utf8' label is added in Exprs and SlotDescriptors. They are
used in generating thrift objects and computing the tuple layouts. A
char(N) slot will occupy 4 * N bytes if it is a UTF-8 type, because a
UTF-8 character is encoded in 1 to 4 bytes. The slot will store up to
N characters.
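
Counting by characters instead of bytes can be sketched in Python as below. This is an illustration only, not the BE implementation: `truncate_utf8` and `char_slot_bytes` are hypothetical names, and the input is assumed to be valid UTF-8 (the commit message defers invalid sequences to IMPALA-10761). A byte starts a new character unless it is a continuation byte of the form `10xxxxxx`.

```python
def is_char_start(byte: int) -> bool:
    """True unless the byte is a UTF-8 continuation byte (10xxxxxx)."""
    return (byte & 0xC0) != 0x80


def truncate_utf8(data: bytes, max_chars: int) -> bytes:
    """Keep at most max_chars characters of valid UTF-8 encoded data."""
    chars_seen = 0
    for offset, byte in enumerate(data):
        if is_char_start(byte):
            if chars_seen == max_chars:
                # This byte begins character number max_chars + 1: cut here.
                return data[:offset]
            chars_seen += 1
    return data


def char_slot_bytes(n: int) -> int:
    """Slot width for a UTF-8 char(N): worst case of 4 bytes per character."""
    return 4 * n


encoded = "héllo€".encode("utf-8")     # 6 characters, 9 bytes
by_chars = truncate_utf8(encoded, 2)   # cuts after "hé"
by_bytes = encoded[:2]                 # cuts in the middle of "é"
```

Truncating to two bytes splits the two-byte "é", while truncating to two characters keeps a whole, decodable prefix.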

There is a gotcha that we should not add the label in Type.java, because
Type instances are shared across the FE. Query compilation reuses the
Type instances from the metadata. If we modify Type instances during
compilation, other queries in non-UTF8 mode will be affected.

However, in BE, we need the type-related classes (e.g. ColumnType,
TypeDesc) to carry the utf8 markers. It's impractical to check the
UTF8_MODE query option everywhere it is needed. E.g. in
AnyValUtil::SetAnyVal we can't access the query options. So we add the
'is_utf8' marker in TScalarType, ColumnType, TypeDesc to conveniently
recognize char(N) and varchar(N) types in UTF-8 mode. When generating
thrift objects in FE, Exprs and SlotDescriptors deliver 'is_utf8'
markers to TScalarTypes. They finally land in ColumnType and TypeDesc
instances.

Given that the UTF-8 mode is detected correctly, we just need to
truncate/pad the char/varchar strings with their length counted in
characters. To make sure we don't miss any places, this patch changes
ColumnType's 'len' field into two separate fields, 'char_len' and
'byte_len', representing the length in characters and the length in
bytes. Code should explicitly use the desired length.

Since char(N) slots always occupy 4N bytes, when converting char(N) to
other string types, we need to re-calculate the actual length in bytes
corresponding to N characters. We can optimize this in later patches,
e.g. store the actual length in the slot, or deal with UTF-8 char(N) in
the same way as varchar(N), i.e. reallocate the string space and just
store the pointer and length in the slot.
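
The re-calculation mentioned above can be sketched as a walk over lead bytes. The Python below is a hypothetical model, not the patch's code: `make_char_slot` pads the value with spaces to N characters and then fills the rest of the 4N-byte slot with zero bytes, and that filler is purely an assumption made so the example has something to skip. Valid UTF-8 is assumed.

```python
def utf8_seq_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence introduced by a lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a valid UTF-8 lead byte: %#x" % lead)


def byte_len_for_chars(slot: bytes, n_chars: int) -> int:
    """Number of bytes occupied by the first n_chars characters of slot."""
    pos = 0
    for _ in range(n_chars):
        if pos >= len(slot):
            break
        pos += utf8_seq_len(slot[pos])
    return min(pos, len(slot))


def make_char_slot(value: str, n: int) -> bytes:
    """Hypothetical char(N) slot: N space-padded characters in 4N bytes."""
    encoded = value[:n].ljust(n).encode("utf-8")
    return encoded + b"\x00" * (4 * n - len(encoded))  # assumed filler


slot = make_char_slot("日本", 3)        # 12-byte slot holding "日本 "
actual = byte_len_for_chars(slot, 3)   # bytes that belong to the value
```

Storing `actual` alongside the slot, one of the later optimizations the commit message proposes, would avoid repeating this scan on every conversion.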

This patch won't deal with invalid UTF-8 characters. It will be the
focus of IMPALA-10761.

Tests:
 - Add tests for reading char(N) and varchar(N) columns in UTF8_MODE.
 - Add truncating/padding tests
 - Kudu only supports VARCHAR currently. Add special tests for Kudu.
 - Add tests for writing CHAR(N)/VARCHAR(N) in UTF-8 mode.
 - Add test dimension on UTF8_MODE in some e2e tests, e.g.
   test_chars.py, test_ranger.py, test_insert_permutation.py, etc.
 - Add test coverage on UTF8_MODE=true in some tests of expr-test.

TODO: Run CORE tests with UTF8_MODE=true.

Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f
---
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner-test.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-stats.inline.h
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-data-converter.h
M be/src/exec/parquet/parquet-plain-test.cc
M be/src/exec/text-converter.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/anyval-util.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/conditional-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/math-functions-ir.cc
M be/src/exprs/operators-ir.cc
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/utility-functions-ir.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.inline.h
M be/src/runtime/string-value.h
M be/src/runtime/tuple.cc
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/service/fe-suppor

[Impala-ASF-CR] WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types

2022-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16909 )

Change subject: WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types
..


Patch Set 22:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16909/22/tests/authorization/test_ranger.py
File tests/authorization/test_ranger.py:

http://gerrit.cloudera.org:8080/#/c/16909/22/tests/authorization/test_ranger.py@34
PS22, Line 34: from tests.common.test_vector import ImpalaTestDimension
flake8: F401 'tests.common.test_vector.ImpalaTestDimension' imported but unused


http://gerrit.cloudera.org:8080/#/c/16909/22/tests/query_test/test_chars.py
File tests/query_test/test_chars.py:

http://gerrit.cloudera.org:8080/#/c/16909/22/tests/query_test/test_chars.py@23
PS22, Line 23: from tests.common.test_vector import ImpalaTestDimension
flake8: F401 'tests.common.test_vector.ImpalaTestDimension' imported but unused




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f
Gerrit-Change-Number: 16909
Gerrit-PatchSet: 22
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 May 2022 14:57:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types

2022-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16909 )

Change subject: WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types
..


Patch Set 22:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10662/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f
Gerrit-Change-Number: 16909
Gerrit-PatchSet: 22
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 May 2022 15:16:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP()

2022-05-30 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP()
..


Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@337
PS17, Line 337:   if (state->count > 0) --state->count;
If state->count is 1, it will be decreased to 0. We will hit divide-by-zero 
errors in the following lines.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@342
PS17, Line 342:   if (state->count > 1) {
If the original count is 2, it will be decreased to 1 at line 337. Then we will 
miss it here which is wrong IIUC.
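
Both comments are about checking the count before it is decremented. As an illustration only, here is a generic streaming covariance in Python with an explicit remove step. It is not the patch's C++ code: the state layout, the function names, and the choice to return `None` for undefined results are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CovarState:
    count: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    co_moment: float = 0.0  # running sum of (x - mean_x) * (y - mean_y)


def covar_update(s: CovarState, x: float, y: float) -> None:
    s.count += 1
    dx = x - s.mean_x                   # uses the mean before the update
    s.mean_x += dx / s.count
    s.mean_y += (y - s.mean_y) / s.count
    s.co_moment += dx * (y - s.mean_y)  # uses the mean after the update


def covar_remove(s: CovarState, x: float, y: float) -> None:
    # Decide on the ORIGINAL count: removing the last row resets the state
    # instead of dividing by a count that has just become zero.
    if s.count <= 1:
        s.count, s.mean_x, s.mean_y, s.co_moment = 0, 0.0, 0.0, 0.0
        return
    remaining = s.count - 1
    new_mean_x = (s.count * s.mean_x - x) / remaining
    s.co_moment -= (x - new_mean_x) * (y - s.mean_y)
    s.mean_y = (s.count * s.mean_y - y) / remaining
    s.mean_x = new_mean_x
    s.count = remaining


def covar_pop(s: CovarState) -> Optional[float]:
    return s.co_moment / s.count if s.count > 0 else None


def covar_samp(s: CovarState) -> Optional[float]:
    return s.co_moment / (s.count - 1) if s.count > 1 else None
```

Here the branch is taken on the count as it was before removal, so a state with two rows still goes through the arithmetic path and a state with one row never divides by zero.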




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 May 2022 05:08:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-05-30 Thread Anonymous Coward (Code Review)
lipeng...@sensorsdata.cn has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..

IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

This commit optimizes plain count(*) queries on Iceberg tables.
When `org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP` can be
retrieved from the `org.apache.iceberg.BaseSnapshot#summary` of the
table's current snapshot, the count can be taken from that property,
which makes this kind of query very fast. If the property cannot be
retrieved, the query aggregates the `num_rows` of the Parquet
`file_metadata_` as usual.

A query can be optimized only if it meets all of the following requirements:
 - The SelectStmt has no WHERE clause
 - The SelectStmt has no GROUP BY clause
 - The SelectStmt has no HAVING clause
 - The FROM clause contains exactly one TableRef, and it is a BaseTableRef
 - The table is an Iceberg table
 - The SelectList contains only 'count(*)'
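
Read together, the requirements above amount to one predicate over the analyzed statement. The following Python sketch models that predicate. It is hypothetical throughout: `TableRef`, `SelectStmt` and `is_plain_count_star` are simplified stand-ins invented for the example, not the classes or checks in `Frontend.java`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TableRef:
    name: str
    is_base_table: bool = True  # False models inline views and similar refs


@dataclass
class SelectStmt:
    select_list: List[str]
    table_refs: List[TableRef]
    where: Optional[str] = None
    group_by: List[str] = field(default_factory=list)
    having: Optional[str] = None


def is_plain_count_star(stmt: SelectStmt, iceberg_tables: Set[str]) -> bool:
    """True only if every listed requirement holds."""
    if stmt.where is not None:
        return False
    if stmt.group_by:
        return False
    if stmt.having is not None:
        return False
    if len(stmt.table_refs) != 1:
        return False
    ref = stmt.table_refs[0]
    if not ref.is_base_table or ref.name not in iceberg_tables:
        return False
    # Compare the select list ignoring case and spaces.
    items = [item.replace(" ", "").lower() for item in stmt.select_list]
    return items == ["count(*)"]
```

Each early return corresponds to one bullet, so a new restriction such as the HAVING check becomes one more guard.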

Testing:
 - Added end-to-end test
 - Existing tests

Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-compound-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-is-null-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
14 files changed, 324 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/18574/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-05-30 Thread Anonymous Coward (Code Review)
lipeng...@sensorsdata.cn has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@464
PS2, Line 464: public static TResultSet getIcebergTotalRecordsProp(FeIcebergTable table) {
> how about `s/feTable/table/` to keep consistent with other functions?
Done


http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@486
PS2, Line 486: }
 :   // When total is 0, no optimization is still very fast
> Seems like the comment explains the opposite of the following code block, w
Done


http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2158
PS2, Line 2158: optimizeQueryForIcebergTable(planCtx, analysisResult);
> I think we can move this to #2027 since we need not to create the plan
Creating a query plan is still required, because retrieving
TOTAL_RECORDS_PROP from the Iceberg snapshot summary can fail and the query
must then run as usual.


http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2213
PS2, Line 2213: if (stmt.hasGroupByClause()) return;
> I think the statement can't has 'havingClause' too and it's better to add s
Great advice


http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2228
PS2, Line 2228: if (!funcCallExpr.getFnName().getFunction().equ
> how about also optimizing the case of `select count(constant) from tbl` sin
Thx for your cr.
As shown in 
'https://gerrit.cloudera.org/#/c/18574/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-is-null-predicate-push-down.test',
'select count(constant) from TBL' is rewritten to 'count(*)', so this 
optimization already applies to it.

The relevant code is `org.apache.impala.rewrite.NormalizeCountStarRule`.


http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2459
PS2, Line 2459: request.getTable_name());
> I noticed that `optimizeQueryForIcebergTable()` has guaranteed that the tab
This keeps it consistent with the other functions named "doGetXXX" and double-checks the table type.



--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 31 May 2022 05:28:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-05-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10663/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 31 May 2022 05:46:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-05-30 Thread Jian Zhang (Code Review)
Jian Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 3: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2228
PS2, Line 2228: if (!funcCallExpr.getFnName().getFunction().equ
> Thx for your cr.
Done


http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2459
PS2, Line 2459: request.getTable_name());
> Be consistent with other functions named "doGetXXX" and double check table
Done



--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 31 May 2022 06:12:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP()

2022-05-30 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP()
..


Patch Set 17:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@293
PS17, Line 293: // using a stable one-pass algorithm
nit: please also mention "Welford's online algorithm".


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@355
PS17, Line 355: NULL
nit: we prefer nullptr to NULL.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@358
PS17, Line 358:   if (!isnan(src1.val) && !isnan(src2.val)) {
nit: we can merge this check into the one at line 354, i.e.

  if (src1.is_null || isnan(src1.val) || src2.is_null || isnan(src2.val)) return;
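
As a rough illustration of the merged check suggested above, here is a minimal 
sketch. `DoubleValSketch` and `ShouldSkipPair` are hypothetical names 
introduced only for this example; the struct is a stand-in with the `is_null` 
and `val` members used in the quoted code, not the actual Impala type.

```cpp
#include <cmath>

// Hypothetical stand-in for a nullable double argument (assumption: only the
// is_null and val members matter for this check).
struct DoubleValSketch {
  bool is_null;
  double val;
};

// Returns true when the input pair should be ignored by the update function,
// i.e. when either argument is NULL or NaN. A single early-return test like
// this replaces the separate NULL check and the nested NaN check.
inline bool ShouldSkipPair(const DoubleValSketch& src1,
                           const DoubleValSketch& src2) {
  return src1.is_null || std::isnan(src1.val) ||
         src2.is_null || std::isnan(src2.val);
}
```

With a helper like this, the update function could begin with 
`if (ShouldSkipPair(src1, src2)) return;` and leave the rest of its body 
unindented.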


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@371
PS17, Line 371:   if (!isnan(src1.val) && !isnan(src2.val)) {
nit: same as above. We can merge the check to line 367.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@412
PS17, Line 412:   if(src.ptr != NULL) {
nit: add a space after "if" and replace NULL with nullptr.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416
PS17, Line 416:   dst_state->count = src_state->count;
  :   dst_state->xavg = src_state->xavg;
  :   dst_state->yavg = src_state->yavg;
  :   dst_state->xvar = src_state->xvar;
  :   dst_state->yvar = src_state->yvar;
  :   dst_state->covar = src_state->covar;
nit: Can we simplify this by using memcpy? i.e.

  memcpy(dst_state, src_state, sizeof(CorrState));
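
A minimal sketch of the memcpy idea, assuming the state is a plain struct with 
the six fields listed in the quoted code. The `CorrState` layout and the 
`CopyCorrState` helper below are assumptions made for illustration and may not 
match the struct in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Assumed layout: field names are taken from the field-by-field copy quoted
// in the review; the types are a guess.
struct CorrState {
  int64_t count;
  double xavg;
  double yavg;
  double xvar;
  double yvar;
  double covar;
};

// memcpy is only well-defined for trivially copyable types, so check it at
// compile time.
static_assert(std::is_trivially_copyable<CorrState>::value,
              "CorrState must be trivially copyable to use memcpy");

// Copies the whole intermediate state in one call instead of six assignments.
inline void CopyCorrState(const CorrState* src_state, CorrState* dst_state) {
  assert(src_state != nullptr && dst_state != nullptr);
  std::memcpy(dst_state, src_state, sizeof(CorrState));
}
```

For a struct like this, the plain assignment `*dst_state = *src_state;` would 
have the same effect and may read more naturally.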


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@423
PS17, Line 423: if (nA != 0 && nB != 0) {
nit: replace "if" with "else if" to save one check, or add a "return" at the 
end of the above if-branch.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@453
PS17, Line 453: sqrt(state->xvar) / sqrt(state->yvar)
sqrt() is expensive. Let's change this to sqrt(state->xvar * state->yvar).
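
To make the suggestion concrete, here is a small sketch comparing the two 
forms. It assumes the quoted expression is the denominator of a 
`covar / sqrt(xvar) / sqrt(yvar)` computation; the function names are 
hypothetical and not taken from the patch.

```cpp
#include <cassert>
#include <cmath>

// Form assumed to be in the patch: two square roots and two divisions.
inline double CorrTwoSqrt(double covar, double xvar, double yvar) {
  return covar / std::sqrt(xvar) / std::sqrt(yvar);
}

// Suggested form: sqrt(a) * sqrt(b) == sqrt(a * b) for non-negative a and b,
// so one square root and one division are enough.
inline double CorrOneSqrt(double covar, double xvar, double yvar) {
  assert(xvar >= 0.0 && yvar >= 0.0);
  return covar / std::sqrt(xvar * yvar);
}
```

One trade-off to keep in mind: the product `xvar * yvar` can overflow or 
underflow for extreme inputs where the two separate square roots would not, so 
the two forms are not numerically identical in every case.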


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@493
PS17, Line 493: both terms
nit: both terms? Maybe you want to add another equation after line 491:

 // c_n = c_(n - 1) + (x_n - mx_n) * (y_n - my_(n - 1))
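
The recurrence above can be sketched as a one-pass, Welford-style update. This 
is only an illustration under assumptions: `CovarState`, `CovarUpdate`, 
`CovarPop` and `CovarSamp` are hypothetical names, and the field names simply 
mirror the ones quoted earlier in this review.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical running state for a one-pass covariance computation.
struct CovarState {
  int64_t count = 0;
  double xavg = 0.0;   // running mean of x (mx_n)
  double yavg = 0.0;   // running mean of y (my_n)
  double covar = 0.0;  // running co-moment c_n, not yet divided by n
};

// Applies c_n = c_(n - 1) + (x_n - mx_n) * (y_n - my_(n - 1)).
inline void CovarUpdate(double x, double y, CovarState* s) {
  assert(s != nullptr);
  ++s->count;
  // Update the x mean first so that s->xavg is mx_n in the term below.
  s->xavg += (x - s->xavg) / s->count;
  // s->yavg has not been updated yet, so it is still my_(n - 1) here.
  s->covar += (x - s->xavg) * (y - s->yavg);
  s->yavg += (y - s->yavg) / s->count;
}

// Population covariance; the caller is expected to check count > 0.
inline double CovarPop(const CovarState& s) { return s.covar / s.count; }

// Sample covariance; the caller is expected to check count > 1.
inline double CovarSamp(const CovarState& s) {
  return s.covar / (s.count - 1);
}
```

Feeding the pairs (1, 2), (2, 4), (3, 6) through `CovarUpdate` leaves the 
co-moment at 4, so the sample covariance is 2 and the population covariance is 
4/3, matching the two-pass definition.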



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 May 2022 06:57:09 +
Gerrit-HasComments: Yes