[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
lipeng...@sensorsdata.cn has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..

IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

This commit optimizes plain count(*) queries on Iceberg tables. When `org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP` can be retrieved from the current `org.apache.iceberg.BaseSnapshot#summary` of the Iceberg table, this kind of query can be very fast. If the property cannot be retrieved, the query aggregates the `num_rows` of the Parquet `file_metadata_` as usual.

A query must meet all of the following requirements to be optimized:
- SelectStmt does not have a WHERE clause
- SelectStmt does not have a GROUP BY clause
- The TableRefs of the FROM clause contain only one BaseTableRef
- The table is an Iceberg table
- SelectList contains only 'count(*)'

Testing:
- Added end-to-end test
- Existing tests

Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-compound-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-is-null-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
14 files changed, 310 insertions(+), 17 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/18574/2
--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward
[Impala-ASF-CR] IMPALA-11233: Unset all query option
Hello Quanlong Huang, Anonymous Coward (339), Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18430 to look at the new patch set (#7).

Change subject: IMPALA-11233: Unset all query option
..

IMPALA-11233: Unset all query option

When a JDBC connection pool is used, a connection may set some query options. After the query finishes, the connection is closed and returned to the pool. When the connection is used again, the previously set query options still take effect. We need a way for a SET statement to reset all query options without restarting impalad.

Support UNSET statements in the SQL dialect. UNSET ALL unsets all query options.

Testing:
- Added an UNSET ALL query option test in test_hs2.py

Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
---
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/SetStmt.java
M shell/impala_shell.py
M tests/hs2/test_hs2.py
8 files changed, 91 insertions(+), 17 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/18430/7
--
To view, visit http://gerrit.cloudera.org:8080/18430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Gerrit-Change-Number: 18430
Gerrit-PatchSet: 7
Gerrit-Owner: Xiaoqing Gao
Gerrit-Reviewer: Anonymous Coward (339)
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Xiaoqing Gao
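The pooled-connection problem described in this commit message can be illustrated with a toy model. This is only a sketch: `PooledSession`, `DEFAULTS`, and the tiny statement parser are hypothetical names made up for illustration, and the default values shown are placeholders rather than Impala's real defaults. The only thing taken from the change is the idea that UNSET ALL drops every session-level override.

```python
from dataclasses import dataclass, field

# Placeholder defaults for illustration only; not Impala's actual values.
DEFAULTS = {"MEM_LIMIT": "0", "BATCH_SIZE": "0"}


@dataclass
class PooledSession:
    """Toy session whose option overrides survive reuse from a pool."""
    overrides: dict = field(default_factory=dict)

    def execute(self, stmt: str) -> None:
        words = stmt.strip().rstrip(";").split()
        head = [w.upper() for w in words[:2]]
        if head == ["UNSET", "ALL"]:
            # Drop every override so the next user sees the defaults again.
            self.overrides.clear()
        elif words and words[0].upper() == "SET":
            key, _, value = " ".join(words[1:]).partition("=")
            self.overrides[key.strip().upper()] = value.strip()
        else:
            raise ValueError("unsupported statement in this sketch: " + stmt)

    def effective_options(self) -> dict:
        # Overrides shadow the defaults.
        return {**DEFAULTS, **self.overrides}


session = PooledSession()
session.execute("SET mem_limit=2g")   # first borrower changes an option
leaked = session.effective_options()  # next borrower would still see 2g
session.execute("UNSET ALL")          # reset before reuse
clean = session.effective_options()
```

A pool could issue UNSET ALL whenever a connection is returned, so that the next borrower starts from the defaults.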
[Impala-ASF-CR] IMPALA-11233: Unset all query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18430 ) Change subject: IMPALA-11233: Unset all query option .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/18430/7/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/18430/7/be/src/service/query-options.cc@1295 PS7, Line 1295: QUERY_OPTS_TABLE line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/18430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250 Gerrit-Change-Number: 18430 Gerrit-PatchSet: 7 Gerrit-Owner: Xiaoqing Gao Gerrit-Reviewer: Anonymous Coward (339) Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiaoqing Gao Gerrit-Comment-Date: Mon, 30 May 2022 08:51:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11233: Unset all query option
Hello Quanlong Huang, Anonymous Coward (339), Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18430 to look at the new patch set (#8).

Change subject: IMPALA-11233: Unset all query option
..

IMPALA-11233: Unset all query option

When a JDBC connection pool is used, a connection may set some query options. After the query finishes, the connection is closed and returned to the pool. When the connection is used again, the previously set query options still take effect. We need a way for a SET statement to reset all query options without restarting impalad.

Support UNSET statements in the SQL dialect. UNSET ALL unsets all query options.

Testing:
- Added an UNSET ALL query option test in test_hs2.py

Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
---
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/SetStmt.java
M shell/impala_shell.py
M tests/hs2/test_hs2.py
8 files changed, 90 insertions(+), 16 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/18430/8
--
To view, visit http://gerrit.cloudera.org:8080/18430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250
Gerrit-Change-Number: 18430
Gerrit-PatchSet: 8
Gerrit-Owner: Xiaoqing Gao
Gerrit-Reviewer: Anonymous Coward (339)
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Xiaoqing Gao
[Impala-ASF-CR] IMPALA-11233: Unset all query option
Xiaoqing Gao has posted comments on this change. ( http://gerrit.cloudera.org:8080/18430 ) Change subject: IMPALA-11233: Unset all query option .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/18430/6/fe/src/main/java/org/apache/impala/analysis/SetStmt.java File fe/src/main/java/org/apache/impala/analysis/SetStmt.java: http://gerrit.cloudera.org:8080/#/c/18430/6/fe/src/main/java/org/apache/impala/analysis/SetStmt.java@30 PS6, Line 30: private final String value_; : private final TQueryOptionTy > Seems like the `SetStmt` can only be one of the following modes: Done -- To view, visit http://gerrit.cloudera.org:8080/18430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250 Gerrit-Change-Number: 18430 Gerrit-PatchSet: 8 Gerrit-Owner: Xiaoqing Gao Gerrit-Reviewer: Anonymous Coward (339) Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiaoqing Gao Gerrit-Comment-Date: Mon, 30 May 2022 08:54:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18574 ) Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10659/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9 Gerrit-Change-Number: 18574 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 30 May 2022 08:54:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11233: Unset all query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18430 ) Change subject: IMPALA-11233: Unset all query option .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10660/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250 Gerrit-Change-Number: 18430 Gerrit-PatchSet: 7 Gerrit-Owner: Xiaoqing Gao Gerrit-Reviewer: Anonymous Coward (339) Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiaoqing Gao Gerrit-Comment-Date: Mon, 30 May 2022 09:11:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11233: Unset all query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18430 ) Change subject: IMPALA-11233: Unset all query option .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10661/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250 Gerrit-Change-Number: 18430 Gerrit-PatchSet: 8 Gerrit-Owner: Xiaoqing Gao Gerrit-Reviewer: Anonymous Coward (339) Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiaoqing Gao Gerrit-Comment-Date: Mon, 30 May 2022 09:12:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11233: Unset all query option
Anonymous Coward (339) has posted comments on this change. ( http://gerrit.cloudera.org:8080/18430 ) Change subject: IMPALA-11233: Unset all query option .. Patch Set 8: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabf23622daab733ddab20dd3ca73af6c9bd5c250 Gerrit-Change-Number: 18430 Gerrit-PatchSet: 8 Gerrit-Owner: Xiaoqing Gao Gerrit-Reviewer: Anonymous Coward (339) Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiaoqing Gao Gerrit-Comment-Date: Mon, 30 May 2022 11:06:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
Jian Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18574 ) Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java: http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@464 PS2, Line 464: public static TResultSet getIcebergTotalRecordsProp(FeIcebergTable feTable) { how about `s/feTable/table/` to keep consistent with other functions? http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@486 PS2, Line 486: // When NumberFormatException is thrown or the TOTAL_RECORDS_PROP is not : // positive, the retrieval is considered unsuccessful. Seems like the comment explains the opposite of the following code block, which is a little confusing to readers. How about rewriting the comment to explain why not return `0` when `total != null and total == 0`? http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2228 PS2, Line 2228: if (!funcCallExpr.getParams().isStar()) return; how about also optimizing the case of `select count(constant) from tbl` since `count(constant)` is equal to `count(*)`? http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2459 PS2, Line 2459: if (feTable instanceof FeIcebergTable) { I noticed that `optimizeQueryForIcebergTable()` has guaranteed that the table is an Iceberg table, when will this condition be matched? 
-- To view, visit http://gerrit.cloudera.org:8080/18574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9 Gerrit-Change-Number: 18574 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 30 May 2022 12:29:35 + Gerrit-HasComments: Yes
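The fallback behaviour discussed in the comment on line 486 (an unparsable or non-positive TOTAL_RECORDS_PROP counts as "not retrieved") might look roughly like the sketch below. It is not the code under review: `total_records_or_none` is a hypothetical helper, and the snapshot summary is modelled as a plain string-to-string mapping. The key "total-records" is the value of Iceberg's `SnapshotSummary.TOTAL_RECORDS_PROP`.

```python
from typing import Mapping, Optional

# Value of org.apache.iceberg.SnapshotSummary.TOTAL_RECORDS_PROP.
TOTAL_RECORDS_PROP = "total-records"


def total_records_or_none(summary: Mapping[str, str]) -> Optional[int]:
    """Return the row count from a snapshot summary, or None to signal
    that the caller should fall back to the normal count(*) plan."""
    raw = summary.get(TOTAL_RECORDS_PROP)
    if raw is None:
        return None  # property missing from the summary
    try:
        total = int(raw)
    except ValueError:
        return None  # analogous to the NumberFormatException case
    # Non-positive totals are treated as "not retrieved"; the regular
    # plan is cheap for an empty table anyway.
    return total if total > 0 else None
```

Returning `None` rather than `0` keeps "the table is empty" and "the summary could not be trusted" on the same fallback path, which appears to be the intent of the reviewed comment.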
[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
Xianqing He has posted comments on this change. ( http://gerrit.cloudera.org:8080/18574 ) Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2213 PS2, Line 2213: if (stmt.hasGroupByClause()) return; I think the statement can't has 'havingClause' too and it's better to add some tests for the statement has 'WhereClause','GroupByClause' or 'HavingClause' -- To view, visit http://gerrit.cloudera.org:8080/18574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9 Gerrit-Change-Number: 18574 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Xianqing He Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 30 May 2022 13:34:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
Xianqing He has posted comments on this change. ( http://gerrit.cloudera.org:8080/18574 ) Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2158 PS2, Line 2158: optimizeQueryForIcebergTable(planCtx, analysisResult); I think we can move this to #2027 since we need not to create the plan -- To view, visit http://gerrit.cloudera.org:8080/18574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9 Gerrit-Change-Number: 18574 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Xianqing He Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 30 May 2022 14:15:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types
Hello Qifan Chen, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16909 to look at the new patch set (#22).

Change subject: WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types
..

WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types

This patch adds support for UTF-8 aware varchar and char types. In UTF-8 mode, when truncating UTF-8 varchar(N) and char(N) strings, lengths are counted in characters instead of bytes, so the result string has up to N characters.

The UTF8_MODE query option is first detected in the FE when analyzing the query. An 'is_utf8' label is added to Exprs and SlotDescriptors. The labels are used when generating thrift objects and computing the tuple layouts. A char(N) slot occupies 4 * N bytes in UTF-8 mode, because a UTF-8 character is encoded in 1 to 4 bytes. The slot stores up to N characters.

There is a gotcha: we should not add the label in Type.java, because Type instances are shared across the FE. Query compilation reuses the Type instances from the metadata. If we modified Type instances during compilation, other queries in non-UTF8 mode would be affected.

However, in the BE we need the type-related classes (e.g. ColumnType, TypeDesc) to carry the utf8 markers. It's impractical to check the UTF8_MODE query option everywhere it is needed. E.g. in AnyValUtil::SetAnyVal we can't access the query options. So we add the 'is_utf8' marker to TScalarType, ColumnType and TypeDesc to conveniently recognize char(N) and varchar(N) types in UTF-8 mode. When generating thrift objects in the FE, Exprs and SlotDescriptors deliver the 'is_utf8' markers to TScalarTypes. They finally land in ColumnType and TypeDesc instances.

Once the UTF-8 mode is known, we just need to truncate/pad the char/varchar strings with their length counted in characters. To make sure we don't miss any places, this patch changes ColumnType's 'len' field into two separate fields, 'char_len' and 'byte_len', representing the length in characters and the length in bytes. Code should explicitly use the desired length.

Since char(N) slots always occupy 4N bytes, when converting char(N) to other string types we need to re-calculate the actual length in bytes corresponding to N characters. We can optimize this in later patches, e.g. store the actual length in the slot, or deal with UTF-8 char(N) in the same way as varchar(N), i.e. reallocate the string space and just store the pointer and length in the slot.

This patch won't deal with invalid UTF-8 characters. That will be the focus of IMPALA-10761.

Tests:
- Add tests for reading char(N) and varchar(N) columns in UTF8_MODE.
- Add truncating/padding tests.
- Kudu only supports VARCHAR currently. Add special tests for Kudu.
- Add tests for writing CHAR(N)/VARCHAR(N) in UTF-8 mode.
- Add test dimension on UTF8_MODE in some e2e tests, e.g. test_chars.py, test_ranger.py, test_insert_permutation.py, etc.
- Add test coverage on UTF8_MODE=true in some tests of expr-test.

TODO: Run CORE tests with UTF8_MODE=true.

Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner-ir.cc M be/src/exec/hdfs-avro-scanner-test.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/orc-column-readers.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/parquet/parquet-data-converter.h M be/src/exec/parquet/parquet-plain-test.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/literal.cc M be/src/exprs/math-functions-ir.cc M be/src/exprs/operators-ir.cc M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/utility-functions-ir.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.inline.h M be/src/runtime/string-value.h M be/src/runtime/tuple.cc M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/service/fe-suppor
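To make the character-versus-byte distinction from this commit message concrete, here is a minimal sketch of character-based truncation and padding. The function names are hypothetical and this is not the patch's C++ code. It assumes valid UTF-8 input (the commit message defers invalid sequences to IMPALA-10761) and assumes CHAR values are padded with ASCII spaces.

```python
def count_chars(data: bytes) -> int:
    """Count characters by counting bytes that are not 10xxxxxx continuations."""
    return sum(1 for b in data if b & 0xC0 != 0x80)


def truncate_utf8(data: bytes, max_chars: int) -> bytes:
    """Keep at most max_chars characters of valid UTF-8 data."""
    chars = 0
    for i, byte in enumerate(data):
        if byte & 0xC0 != 0x80:       # this byte starts a new character
            if chars == max_chars:
                return data[:i]       # cut right before the extra character
            chars += 1
    return data


def pad_char(data: bytes, n_chars: int) -> bytes:
    """Truncate to n_chars characters, then space-pad to exactly n_chars."""
    kept = truncate_utf8(data, n_chars)
    return kept + b" " * (n_chars - count_chars(kept))


def char_slot_bytes(n_chars: int) -> int:
    """Slot size for char(N) in UTF-8 mode as described above: 4 bytes per character."""
    return 4 * n_chars
```

For example, truncating the 6-byte string "你好" to one character keeps 3 bytes, whereas a byte-based varchar(1) would cut it in the middle of a character.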
[Impala-ASF-CR] WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16909 ) Change subject: WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types .. Patch Set 22: (2 comments) http://gerrit.cloudera.org:8080/#/c/16909/22/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/16909/22/tests/authorization/test_ranger.py@34 PS22, Line 34: from tests.common.test_vector import ImpalaTestDimension flake8: F401 'tests.common.test_vector.ImpalaTestDimension' imported but unused http://gerrit.cloudera.org:8080/#/c/16909/22/tests/query_test/test_chars.py File tests/query_test/test_chars.py: http://gerrit.cloudera.org:8080/#/c/16909/22/tests/query_test/test_chars.py@23 PS22, Line 23: from tests.common.test_vector import ImpalaTestDimension flake8: F401 'tests.common.test_vector.ImpalaTestDimension' imported but unused -- To view, visit http://gerrit.cloudera.org:8080/16909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f Gerrit-Change-Number: 16909 Gerrit-PatchSet: 22 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 30 May 2022 14:57:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16909 ) Change subject: WIP IMPALA-5675: Support UTF-8 VARCHAR and CHAR types .. Patch Set 22: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10662/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f Gerrit-Change-Number: 16909 Gerrit-PatchSet: 22 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 30 May 2022 15:16:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 17: (2 comments) http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@337 PS17, Line 337: if (state->count > 0) --state->count; If state->count is 1, it will be decreased to 0. We will hit divide-by-zero errors in the following lines. http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@342 PS17, Line 342: if (state->count > 1) { If the original count is 2, it will be decreased to 1 at line 337. Then we will miss it here which is wrong IIUC. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 31 May 2022 05:08:27 + Gerrit-HasComments: Yes
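The two review comments concern what happens when a row is removed from the aggregate state and the count drops to 1 or 0. One way to avoid both problems is to branch on the count before decrementing it. The sketch below uses a generic online co-moment update; it is not taken from aggregate-functions-ir.cc, and `CovarState`, `update`, and `remove` are hypothetical names. Returning `None` for undefined results is a convention of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CovarState:
    count: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    co_moment: float = 0.0  # running sum of (x - mean_x) * (y - mean_y)


def update(s: CovarState, x: float, y: float) -> None:
    s.count += 1
    dx = x - s.mean_x                  # uses the mean before this row
    s.mean_x += dx / s.count
    s.mean_y += (y - s.mean_y) / s.count
    s.co_moment += dx * (y - s.mean_y)  # uses the mean after this row


def remove(s: CovarState, x: float, y: float) -> None:
    if s.count <= 1:
        # Removing the last row: reset rather than divide by a zero count.
        s.count, s.mean_x, s.mean_y, s.co_moment = 0, 0.0, 0.0, 0.0
        return
    new_count = s.count - 1
    old_mean_y = s.mean_y
    s.mean_x = (s.count * s.mean_x - x) / new_count
    s.mean_y = (s.count * s.mean_y - y) / new_count
    # Exact inverse of the update step above.
    s.co_moment -= (x - s.mean_x) * (y - old_mean_y)
    s.count = new_count


def covar_pop(s: CovarState) -> Optional[float]:
    return s.co_moment / s.count if s.count > 0 else None


def covar_samp(s: CovarState) -> Optional[float]:
    return s.co_moment / (s.count - 1) if s.count > 1 else None
```

Because the branch tests the original count, going from 2 rows to 1 still adjusts the means and the co-moment, and going from 1 row to 0 never divides.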
[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
lipeng...@sensorsdata.cn has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..

IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

This commit optimizes plain count(*) queries on Iceberg tables. When `org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP` can be retrieved from the current `org.apache.iceberg.BaseSnapshot#summary` of the Iceberg table, this kind of query can be very fast. If the property cannot be retrieved, the query aggregates the `num_rows` of the Parquet `file_metadata_` as usual.

A query must meet all of the following requirements to be optimized:
- SelectStmt does not have a WHERE clause
- SelectStmt does not have a GROUP BY clause
- SelectStmt does not have a HAVING clause
- The TableRefs of the FROM clause contain only one BaseTableRef
- The table is an Iceberg table
- SelectList contains only 'count(*)'

Testing:
- Added end-to-end test
- Existing tests

Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-compound-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-is-null-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
14 files changed, 324 insertions(+), 17 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/18574/3
--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward
Gerrit-Reviewer: Gergely Fürnstáhl
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jian Zhang
Gerrit-Reviewer: Tamas Mate
Gerrit-Reviewer: Xianqing He
Gerrit-Reviewer: Zoltan Borok-Nagy
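The eligibility requirements listed in the commit message for patch set 3 can be read as a single predicate over the analyzed statement. The sketch below is only an illustration: `SelectStmtInfo` is a hypothetical, flattened stand-in for Impala's SelectStmt, and the real check in Frontend.java works on the analyzed statement tree rather than on strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SelectStmtInfo:
    """Hypothetical flattened view of an analyzed SELECT statement."""
    select_list: List[str]
    table_formats: List[str]   # one entry per base table in the FROM clause
    has_where: bool = False
    has_group_by: bool = False
    has_having: bool = False


def is_plain_count_star(stmt: SelectStmtInfo) -> bool:
    """True only if every requirement from the commit message holds."""
    if stmt.has_where or stmt.has_group_by or stmt.has_having:
        return False
    if stmt.table_formats != ["iceberg"]:   # exactly one table, and it is Iceberg
        return False
    items = [item.replace(" ", "").lower() for item in stmt.select_list]
    return items == ["count(*)"]
```

Any statement that fails the predicate simply goes through the normal planning path, so a conservative check costs nothing in correctness.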
[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
lipeng...@sensorsdata.cn has posted comments on this change. ( http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..

Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@464
PS2, Line 464: public static TResultSet getIcebergTotalRecordsProp(FeIcebergTable table) {
> how about `s/feTable/table/` to keep consistent with other functions?
Done

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@486
PS2, Line 486: }
             : // When total is 0, no optimization is still very fast
> Seems like the comment explains the opposite of the following code block, w
Done

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2158
PS2, Line 2158: optimizeQueryForIcebergTable(planCtx, analysisResult);
> I think we can move this to #2027 since we need not to create the plan
A query plan is still required, because retrieving TOTAL_RECORDS_PROP from the Iceberg snapshot summary can fail, and in that case the query falls back to the normal plan.

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2213
PS2, Line 2213: if (stmt.hasGroupByClause()) return;
> I think the statement can't has 'havingClause' too and it's better to add s
Great advice

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2228
PS2, Line 2228: if (!funcCallExpr.getFnName().getFunction().equ
> how about also optimizing the case of `select count(constant) from tbl` sin
Thx for your cr. As shown in 'https://gerrit.cloudera.org/#/c/18574/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-is-null-predicate-push-down.test', 'select count(constant) from TBL' is rewritten to 'count(*)', so this optimization already applies. The relevant code is `org.apache.impala.rewrite.NormalizeCountStarRule`.

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2459
PS2, Line 2459: request.getTable_name());
> I noticed that `optimizeQueryForIcebergTable()` has guaranteed that the tab
Be consistent with other functions named "doGetXXX" and double check table type.

--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward
Gerrit-Reviewer: Anonymous Coward
Gerrit-Reviewer: Gergely Fürnstáhl
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jian Zhang
Gerrit-Reviewer: Tamas Mate
Gerrit-Reviewer: Xianqing He
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Tue, 31 May 2022 05:28:23 +
Gerrit-HasComments: Yes
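The reply about `count(constant)` relies on an earlier rewrite that turns it into `count(*)` before the Iceberg check runs. The idea can be sketched on plain strings as below. This is not the logic of `NormalizeCountStarRule` itself, which operates on expression trees. `normalize_count` is a hypothetical helper, and it assumes only non-NULL numeric or single-quoted string literals qualify, since `count(NULL)` is 0 in SQL and must not become `count(*)`.

```python
import re


def normalize_count(expr: str) -> str:
    """Rewrite count(<non-NULL literal>) to count(*); leave anything else alone."""
    match = re.fullmatch(r"\s*count\(\s*(.+?)\s*\)\s*", expr, flags=re.IGNORECASE)
    if match is None:
        return expr
    arg = match.group(1)
    is_number = re.fullmatch(r"-?\d+(\.\d+)?", arg) is not None
    is_string = re.fullmatch(r"'[^']*'", arg) is not None
    if is_number or is_string:
        # A non-NULL constant is counted once per row, exactly like count(*).
        return "count(*)"
    return expr
```

With a rewrite of this kind applied first, a check that only accepts `count(*)` also covers `select count(1) from tbl` without any extra case.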
[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..

Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10663/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward
Gerrit-Reviewer: Anonymous Coward
Gerrit-Reviewer: Gergely Fürnstáhl
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jian Zhang
Gerrit-Reviewer: Tamas Mate
Gerrit-Reviewer: Xianqing He
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Tue, 31 May 2022 05:46:54 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
Jian Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..

Patch Set 3: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2228
PS2, Line 2228: if (!funcCallExpr.getFnName().getFunction().equ
> Thx for your cr.
Done

http://gerrit.cloudera.org:8080/#/c/18574/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2459
PS2, Line 2459: request.getTable_name());
> Be consistent with other functions named "doGetXXX" and double check table
Done

--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward
Gerrit-Reviewer: Anonymous Coward
Gerrit-Reviewer: Gergely Fürnstáhl
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jian Zhang
Gerrit-Reviewer: Tamas Mate
Gerrit-Reviewer: Xianqing He
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Tue, 31 May 2022 06:12:45 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP()
..

Patch Set 17:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@293
PS17, Line 293: // using a stable one-pass algorithm
nit: please also mention "Welford's online algorithm".

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@355
PS17, Line 355: NULL
nit: we prefer nullptr to NULL.

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@358
PS17, Line 358: if (!isnan(src1.val) && !isnan(src2.val)) {
nit: we can merge this check to line 354, i.e.
  if (src1.is_null || isnan(src1.val) || src2.is_null || isnan(src2.val)) return;

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@371
PS17, Line 371: if (!isnan(src1.val) && !isnan(src2.val)) {
nit: same as above. We can merge the check to line 367.

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@412
PS17, Line 412: if(src.ptr != NULL) {
nit: add a space after "if" and replace NULL with nullptr.

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416
PS17, Line 416: dst_state->count = src_state->count;
              : dst_state->xavg = src_state->xavg;
              : dst_state->yavg = src_state->yavg;
              : dst_state->xvar = src_state->xvar;
              : dst_state->yvar = src_state->yvar;
              : dst_state->covar = src_state->covar;
nit: Can we simplify this by using memcpy? i.e.
  memcpy(dst, &src, sizeof(CorrState))

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@423
PS17, Line 423: if (nA != 0 && nB != 0) {
nit: replace "if" with "else if" to save one check, or add a "return" at the end of the above if-branch.

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@453
PS17, Line 453: sqrt(state->xvar) / sqrt(state->yvar)
sqrt() is expensive. Let's change this to sqrt(state->xvar * state->yvar).

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@493
PS17, Line 493: both terms
nit: both terms? Maybe you want to add another equation after line 491:
  // c_n = c_(n - 1) + (x_n - mx_n) * (y_n - my_(n - 1))

--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward
Gerrit-Reviewer: Anonymous Coward
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Tue, 31 May 2022 06:57:09 +
Gerrit-HasComments: Yes
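
For readers following the review of aggregate-functions-ir.cc above, here is a minimal standalone sketch of the three points raised: the Welford-style one-pass update (using the covariance recurrence quoted in the last comment), merging two intermediate states, and finalizing with a single sqrt() call. It is not the code from the patch. The field names of CorrState are taken from the excerpt at line 416, but their meaning here (running means and sums of deviations) is an assumption, and CorrUpdate, CorrMerge and CorrFinalize are hypothetical names. Returning NaN for an undefined result is also an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical intermediate state. Field names mirror the excerpt quoted in
// the review; what each field holds in the actual patch is an assumption.
struct CorrState {
  int64_t count = 0;
  double xavg = 0.0;   // running mean of x
  double yavg = 0.0;   // running mean of y
  double xvar = 0.0;   // sum of squared deviations of x
  double yvar = 0.0;   // sum of squared deviations of y
  double covar = 0.0;  // sum of co-deviations of x and y
};

// Welford-style one-pass update with one (x, y) pair. NaN inputs are skipped.
void CorrUpdate(double x, double y, CorrState* state) {
  assert(state != nullptr);
  if (std::isnan(x) || std::isnan(y)) return;
  ++state->count;
  const double dx = x - state->xavg;  // deviation from the old mean of x
  const double dy = y - state->yavg;  // deviation from the old mean of y
  state->xavg += dx / state->count;
  state->yavg += dy / state->count;
  // c_n = c_(n - 1) + (x_n - mx_n) * (y_n - my_(n - 1))
  state->covar += (x - state->xavg) * dy;
  state->xvar += dx * (x - state->xavg);
  state->yvar += dy * (y - state->yavg);
}

// Merges src into dst with the pairwise combination formulas.
void CorrMerge(const CorrState& src, CorrState* dst) {
  assert(dst != nullptr);
  if (src.count == 0) return;
  if (dst->count == 0) {
    *dst = src;  // plain struct copy, no field-by-field assignment needed
    return;
  }
  const double nA = static_cast<double>(dst->count);
  const double nB = static_cast<double>(src.count);
  const double n = nA + nB;
  const double dx = src.xavg - dst->xavg;
  const double dy = src.yavg - dst->yavg;
  const double w = nA * nB / n;
  dst->xvar += src.xvar + dx * dx * w;
  dst->yvar += src.yvar + dy * dy * w;
  dst->covar += src.covar + dx * dy * w;
  dst->xavg += dx * nB / n;
  dst->yavg += dy * nB / n;
  dst->count += src.count;
}

// Pearson correlation using one sqrt() call instead of two.
// Returns NaN when the result is undefined (an assumption of this sketch).
double CorrFinalize(const CorrState& state) {
  const double denom = state.xvar * state.yvar;
  if (state.count < 2 || !(denom > 0.0)) return std::nan("");
  return state.covar / std::sqrt(denom);
}
```

One trade-off worth keeping in mind: multiplying the two sums before taking the square root saves a sqrt() call, but the product can overflow or underflow for extreme magnitudes where the two separate roots would not.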