[Impala-ASF-CR] IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18772 ) Change subject: IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a Gerrit-Change-Number: 18772 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Sat, 06 Aug 2022 03:21:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18772 ) Change subject: IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs .. IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs We've seen catalogd throws OutOfMemoryError when serializing large responses (i.e. size > 2GB). However, the related table names are missing in the logs. Admins would like to get the table names and blacklist those tables until they are optimized (e.g. reducing partitions). To improve the supportability, this patch adds logs in the Catalogd RPC code paths to log some details of the request, also add thread annotations to improve readability of jstacks. Tests: - Add unit tests for short descriptions of requests. - Manually add codes to throw OutOfMemoryError and verify the logs shown as expected. - Run test_concurrent_ddls.py and metadata tests. Capture jstacks and verify the thread annotations are shown. - Run CORE tests Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a Reviewed-on: http://gerrit.cloudera.org:8080/18772 Reviewed-by: Csaba Ringhofer Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/ColumnName.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/TableName.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java A fe/src/main/java/org/apache/impala/util/CatalogOpUtil.java A fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java 9 files changed, 513 insertions(+), 36 deletions(-) Approvals: Csaba Ringhofer: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/18772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a Gerrit-Change-Number: 18772 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate
[Impala-ASF-CR] IMPALA-11458: Update zlib, zstd, snappy
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18816 ) Change subject: IMPALA-11458: Update zlib, zstd, snappy .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If96fd5e8c49e581478b249bb8894001457742023 Gerrit-Change-Number: 18816 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Sat, 06 Aug 2022 02:54:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 ) Change subject: IMPALA-6684: Fix untracked memory in KRPC .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/krpc-data-stream-sender.cc@753 PS4, Line 753: // on some channel > I think it will make more sense to do this in Close() It works. http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.h@422 PS5, Line 422: us Serialize(OutboundRowBatch* output_batch, LockingFreePool* perm_free_pool, : RuntimeProfile::SummaryStatsCounter* tuple_data_stats_counter, We can return counters as integer, then update the profile counter in the caller. This will make unit-test code row-batch-serialize-test.cc easier. Same comment for line #531. http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.h@577 PS5, Line 577: /// row batch). If the distinct_tuples argument is non-null, full deduplication is nit: extra line -- To view, visit http://gerrit.cloudera.org:8080/18798 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82 Gerrit-Change-Number: 18798 Gerrit-PatchSet: 3 Gerrit-Owner: Omid Shahidi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Omid Shahidi Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 05 Aug 2022 23:51:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18772 ) Change subject: IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8407/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/18772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a Gerrit-Change-Number: 18772 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Fri, 05 Aug 2022 22:29:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18772 ) Change subject: IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs .. Patch Set 6: (1 comment) Thanks for your review! http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java@227 PS5, Line 227: Strin > Sure, I am ok with the follow up Jira, it could be also a nice ramp-up task Filed IMPALA-11478 -- To view, visit http://gerrit.cloudera.org:8080/18772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a Gerrit-Change-Number: 18772 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Fri, 05 Aug 2022 22:28:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] PROTOTYPE: IMPALA-11207: Use hadoop-cloud-storage for Cloud dependencies
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/18817 ) Change subject: PROTOTYPE: IMPALA-11207: Use hadoop-cloud-storage for Cloud dependencies .. Patch Set 1: Code-Review+1 This seems reasonable. Some of this code is only used at runtime and I'm not sure how good the test coverage is for it. However the fe/target/build-classpath.txt diff between this and just reverting the google-oauth-client change (since this version of hadoop-cloud-storage still uses 0.31) is 34d33 < /home/michael/.m2/repository/com/google/cloud/bigdataoss/gcs-connector/2.1.2.7.2.16.0-77/gcs-connector-2.1.2.7.2.16.0-77.jar 59d57 < /home/michael/.m2/repository/com/jayway/jsonpath/json-path/2.4.0/json-path-2.4.0.jar 144,145d141 < /home/michael/.m2/repository/org/apache/directory/server/apacheds-i18n/2.0.0.AM25/apacheds-i18n-2.0.0.AM25.jar < /home/michael/.m2/repository/org/apache/directory/server/apacheds-jdbm/2.0.0-M5/apacheds-jdbm-2.0.0-M5.jar 150a147 > /home/michael/.m2/repository/org/apache/hadoop/hadoop-cloud-storage/3.1.1.7.2.16.0-77/hadoop-cloud-storage-3.1.1.7.2.16.0-77.jar 226a224,227 > /home/michael/.m2/repository/org/apache/ranger/ranger-raz-hook-abfs/2.1.0.7.2.16.0-77/ranger-raz-hook-abfs-2.1.0.7.2.16.0-77.jar > /home/michael/.m2/repository/org/apache/ranger/ranger-raz-hook-s3/2.1.0.7.2.16.0-77/ranger-raz-hook-s3-2.1.0.7.2.16.0-77.jar > /home/michael/.m2/repository/org/apache/ranger/ranger-raz-intg/2.1.0.7.2.16.0-77/ranger-raz-intg-2.1.0.7.2.16.0-77.jar > /home/michael/.m2/repository/org/apache/ranger/ranger-raz-s3-lib/2.1.0.7.2.16.0-77/ranger-raz-s3-lib-2.1.0.7.2.16.0-77.jar 238,244d238 < /home/michael/.m2/repository/org/codehaus/groovy/groovy/3.0.7/groovy-3.0.7.jar < /home/michael/.m2/repository/org/codehaus/groovy/groovy-console/3.0.7/groovy-console-3.0.7.jar < /home/michael/.m2/repository/org/codehaus/groovy/groovy-groovysh/3.0.7/groovy-groovysh-3.0.7.jar < /home/michael/.m2/repository/org/codehaus/groovy/groovy-json/3.0.7/groovy-json-3.0.7.jar < /home/michael/.m2/repository/org/codehaus/groovy/groovy-swing/3.0.7/groovy-swing-3.0.7.jar < /home/michael/.m2/repository/org/codehaus/groovy/groovy-templates/3.0.7/groovy-templates-3.0.7.jar < /home/michael/.m2/repository/org/codehaus/groovy/groovy-xml/3.0.7/groovy-xml-3.0.7.jar 276d269 < /home/michael/.m2/repository/org/fusesource/jansi/jansi/1.18/jansi-1.18.jar which doesn't seem too concerning. -- To view, visit http://gerrit.cloudera.org:8080/18817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a1631289f990513823c2b17eb9241cc1b5a7ffd Gerrit-Change-Number: 18817 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 05 Aug 2022 22:18:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11458: Update zlib, zstd, snappy
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18816 ) Change subject: IMPALA-11458: Update zlib, zstd, snappy .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11108/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If96fd5e8c49e581478b249bb8894001457742023 Gerrit-Change-Number: 18816 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 05 Aug 2022 22:10:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11458: Update zlib, zstd, snappy
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18816 ) Change subject: IMPALA-11458: Update zlib, zstd, snappy .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8406/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If96fd5e8c49e581478b249bb8894001457742023 Gerrit-Change-Number: 18816 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 05 Aug 2022 22:02:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11458: Update zlib, zstd, snappy
Hello Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18816 to look at the new patch set (#2). Change subject: IMPALA-11458: Update zlib, zstd, snappy .. IMPALA-11458: Update zlib, zstd, snappy Updates zlib to 1.2.12, zstd to 1.5.2, and snappy to 1.1.9. Requires a new toolchain build. The snappy update made some parquet files in the stats-extrapolation.test slightly larger, leading to slightly different estimates. I've checked that the data stored is unchanged with `select * from alltypes` on tables written before and after this change. Change-Id: If96fd5e8c49e581478b249bb8894001457742023 --- M bin/impala-config.sh M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test 2 files changed, 7 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/18816/2 -- To view, visit http://gerrit.cloudera.org:8080/18816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If96fd5e8c49e581478b249bb8894001457742023 Gerrit-Change-Number: 18816 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] PROTOTYPE: IMPALA-11207: Use hadoop-cloud-storage for Cloud dependencies
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18817 ) Change subject: PROTOTYPE: IMPALA-11207: Use hadoop-cloud-storage for Cloud dependencies .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a1631289f990513823c2b17eb9241cc1b5a7ffd Gerrit-Change-Number: 18817 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 05 Aug 2022 20:59:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 ) Change subject: IMPALA-6684: Fix untracked memory in KRPC .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11107/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18798 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82 Gerrit-Change-Number: 18798 Gerrit-PatchSet: 5 Gerrit-Owner: Omid Shahidi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Omid Shahidi Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 05 Aug 2022 20:20:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC
Omid Shahidi has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18798 ) Change subject: IMPALA-6684: Fix untracked memory in KRPC .. IMPALA-6684: Fix untracked memory in KRPC During serialization of an row batch header, a tuple_data_ is created which will hold the compressed tuple data for an outbound row batch. We would like this tuple data to be trackable as it is responsible for a significant portion of untrackable memory from the krpc data stream sender. By using free pool, we are able to allocate tuple data and compression scratch and account for it in the memory tracker of the KrpcDataStreamSender. This solution creates a RAII class responsible for memory allocation and changes the existing code to use a char buffer pointed by a char* tuple_data_ instead of the previously used std::string tuple_data_. The thrift implementation is left unchanged and the protobuf implementation is seperated. Testing: Passed core tests and ran a single node benchmark which shows no regression Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82 --- M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h 4 files changed, 296 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/5 -- To view, visit http://gerrit.cloudera.org:8080/18798 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82 Gerrit-Change-Number: 18798 Gerrit-PatchSet: 5 Gerrit-Owner: Omid Shahidi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Omid Shahidi Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 ) Change subject: IMPALA-6684: Fix untracked memory in KRPC .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.cc@400 PS4, Line 400: Benchmarks (be/src/benchmarks/row-batch-serialize-benchmark.cc) and tests : /// (be/src/runtime/row-batch-serialize-test.cc) for serialization use TRowBatch and : /// Thrift so we need to keep the old implementation so they don't fail. Change benchmark and unit-test code for your new RowBatch::Serialize() -- To view, visit http://gerrit.cloudera.org:8080/18798 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82 Gerrit-Change-Number: 18798 Gerrit-PatchSet: 4 Gerrit-Owner: Omid Shahidi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Omid Shahidi Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 05 Aug 2022 17:56:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] PROTOTYPE: IMPALA-11207: Use hadoop-cloud-storage for Cloud dependencies
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18817 ) Change subject: PROTOTYPE: IMPALA-11207: Use hadoop-cloud-storage for Cloud dependencies .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8405/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a1631289f990513823c2b17eb9241cc1b5a7ffd Gerrit-Change-Number: 18817 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 05 Aug 2022 16:15:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11408: Fill missing partition columns when INSERT INTO iceberg tbl (col list)
lipeng...@sensorsdata.cn has posted comments on this change. ( http://gerrit.cloudera.org:8080/18790 ) Change subject: IMPALA-11408: Fill missing partition columns when INSERT INTO iceberg_tbl (col_list) .. Patch Set 3: (1 comment) awesome, thanks for cr! http://gerrit.cloudera.org:8080/#/c/18790/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: http://gerrit.cloudera.org:8080/#/c/18790/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@907 PS2, Line 907: // Add partition key expressions in the order of the Iceberg partition fields. > We can also make things work by only removing this precondition check becau Done -- To view, visit http://gerrit.cloudera.org:8080/18790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40c733755d65e5c81a12ffe09b6d16ed5d115368 Gerrit-Change-Number: 18790 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 05 Aug 2022 16:08:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11408: Fill missing partition columns when INSERT INTO iceberg tbl (col list)
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18790 ) Change subject: IMPALA-11408: Fill missing partition columns when INSERT INTO iceberg_tbl (col_list) .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11106/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40c733755d65e5c81a12ffe09b6d16ed5d115368 Gerrit-Change-Number: 18790 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 05 Aug 2022 16:06:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11408: Fill missing partition columns when INSERT INTO iceberg tbl (col list)
lipeng...@sensorsdata.cn has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18790 ) Change subject: IMPALA-11408: Fill missing partition columns when INSERT INTO iceberg_tbl (col_list) .. IMPALA-11408: Fill missing partition columns when INSERT INTO iceberg_tbl (col_list) In the case of INSERT INTO iceberg_tbl (col_a, col_b, ...), if the partition columns of Iceberg table are not in the columns permutation, in order for data to be written to the default partition '__HIVE_DEFAULT_PARTITION__' we will fill the missing partition columns with NullLiteral. Testing: - add e2e tests Change-Id: I40c733755d65e5c81a12ffe09b6d16ed5d115368 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test 3 files changed, 165 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/18790/3 -- To view, visit http://gerrit.cloudera.org:8080/18790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40c733755d65e5c81a12ffe09b6d16ed5d115368 Gerrit-Change-Number: 18790 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-11436: Change search bind authentication parameters
Tamas Mate has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18819 Change subject: IMPALA-11436: Change search bind authentication parameters .. IMPALA-11436: Change search bind authentication parameters Impala's search bind authentication intends to mimic Spring's behaviour. However, the login username and user dn paremeters were swapped for group searches compared to Spring. This change intends to allign these parameters. For user search, Spring uses {0} to replace the login username. Meanwhile, during group search {0} is used to replace the login user dn and {1} is used to replace the login username. Testing: - Ran LdapSearchBindImpalaShellTest frontend tests Change-Id: I9808566a348f7c6200b0571fbc05e67f720f2075 --- M be/src/util/ldap-search-bind.cc M docs/topics/impala_ldap.xml M fe/src/test/java/org/apache/impala/customcluster/LdapSearchBindImpalaShellTest.java 3 files changed, 21 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/18819/1 -- To view, visit http://gerrit.cloudera.org:8080/18819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9808566a348f7c6200b0571fbc05e67f720f2075 Gerrit-Change-Number: 18819 Gerrit-PatchSet: 1 Gerrit-Owner: Tamas Mate
[Impala-ASF-CR] IMPALA-9482: Support for BINARY columns
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16066 ) Change subject: IMPALA-9482: Support for BINARY columns .. Patch Set 18: (1 comment) http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/runtime/descriptors.h@256 PS14, Line 256: return col_descs_[slot_desc->col_path().back()]; > Hmm, I realized that I have to think nested binaries through more carefully Checked how Hive handles this, it seems to utf8 encode the nested binary with replace: select named_struct("b", unhex("00112233445566778899AABBCCDDEEFF")); result: {"b":"3DUfw} select array(unhex("00112233445566778899AABBCCDDEEFF")); result: ["3DUfw] A weird behavior around nested binary is that it is not quoted (unlike string): select named_struct("s", "a", "b", cast("a" as binary)); {"s":"a","b":a} I have also checked what happens when we read these back from Impala: - if we return the whole struct, the non-valid utf8 binary is replaced with an empty string - if we return the whole array, it is returned escaped: "\000\021\"3DUfw"] - if the member/item is returned directly, we print it the same way as Hive - if we hex() the result back, it returns it correctly, so Hive does not write it in a lossy way I am thinking about not allowing nested binary for now, and enable them once the struct/array printing functions are unified + I have verified whether not quoting them is intentional in Hive. -- To view, visit http://gerrit.cloudera.org:8080/16066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582 Gerrit-Change-Number: 16066 Gerrit-PatchSet: 18 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 05 Aug 2022 11:22:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9482: Support for BINARY columns
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16066 ) Change subject: IMPALA-9482: Support for BINARY columns .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11105/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582 Gerrit-Change-Number: 16066 Gerrit-PatchSet: 18 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 05 Aug 2022 11:04:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9482: Support for BINARY columns
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16066 ) Change subject: IMPALA-9482: Support for BINARY columns .. Patch Set 18: (1 comment) http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/service/hs2-util.cc File be/src/service/hs2-util.cc: http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/service/hs2-util.cc@872 PS14, Line 872: AuxColumnType aux_type(columnType); > Done Reverted this, as while gcc compiled it, clang had issue I could not solve. This is strange to me, as case TYPE_DECIMAL: also has some local variables, but I did not want to dive too deeply. -- To view, visit http://gerrit.cloudera.org:8080/16066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582 Gerrit-Change-Number: 16066 Gerrit-PatchSet: 18 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 05 Aug 2022 10:46:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9482: Support for BINARY columns
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16066 ) Change subject: IMPALA-9482: Support for BINARY columns .. Patch Set 18: (6 comments) http://gerrit.cloudera.org:8080/#/c/16066/18/tests/hs2/test_fetch.py File tests/hs2/test_fetch.py: http://gerrit.cloudera.org:8080/#/c/16066/18/tests/hs2/test_fetch.py@69 PS18, Line 69: " flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16066/18/tests/hs2/test_fetch.py@89 PS18, Line 89: " flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16066/18/tests/hs2/test_fetch.py@100 PS18, Line 100: " flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16066/18/tests/hs2/test_fetch.py@113 PS18, Line 113: " flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16066/18/tests/hs2/test_fetch.py@123 PS18, Line 123: " flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16066/18/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: http://gerrit.cloudera.org:8080/#/c/16066/18/tests/query_test/test_udfs.py@116 PS18, Line 116: y flake8: E501 line too long (92 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/16066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582 Gerrit-Change-Number: 16066 Gerrit-PatchSet: 18 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 05 Aug 2022 10:44:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9482: Support for BINARY columns
Hello Quanlong Huang, Gabor Kaszab, Zoltan Borok-Nagy, Attila Jeges, Tim Armstrong, Steve Carlin, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16066 to look at the new patch set (#18). Change subject: IMPALA-9482: Support for BINARY columns .. IMPALA-9482: Support for BINARY columns This patch adds support for BINARY columns for all table formats with the exception of Kudu. In Hive the main difference between STRING and BINARY is that STRING is assumed to be UTF8 encoded, while BINARY can be any byte array. Some other differences in Hive: - BINARY can be only cast from/to STRING - Only a small subset of built-in STRING functions support BINARY. - In several file formats (e.g. text) BINARY is base64 encoded. - No NDV is calculated during COMPUTE STATISTICS. As Impala doesn't treat STRINGs as UTF8, BINARY and STRING become nearly identical, especially from the backend's perspective. For this reason, BINARY is implemented a bit differently compared to other types: while the frontend treats STRING and BINARY as two separate types, most of the backend uses PrimitiveType::TYPE_STRING for BINARY too, e.g. in SlotDesc. Only the following parts of backend need to differentiate between STRING and BINARY: - table scanners - table writers - HS2/Beeswax service These parts have access to column metadata, which allows to add special handling for BINARY. Only a very few builtins are allowed for BINARY at the moment: - length - min/max/count - coalesce and similar "selector" functions Other STRING functions can be only used by casting to STRING first. Adding support for more of these functions is very easy, as simply the BINARY type has to be "connected" to the already existing STRING function's signature. Functions where the result depends on utf8_mode need to ensure that with BINARY it always works as if utf8_mode=0 (for example length() is mapped to bytes() as length count utf8 chars if utf8_mode=1). All kinds of UDFs (native, Hive legacy, Hive generic) support BINARY, though in case of legacy Hive UDFs it is only supported if the argument and return types are set explicitely to ensure backward compatibility. See IMPALA-11340 for details. The original plan was to behave as close to Hive as possible, but I realized that Hive has more relaxed casting rules than Impala, which led to STRING<->BINARY casts being necessary in more cases in Impala. This was needed to disallow passing a BINARY to functions that expect a STRING argument. An example for the difference is that in INSERT ... VALUES () string literals need to be explicitly cast to BINARY, while this is not needed in Hive. Testing: - Added functional.binary_tbl for all file formats (except Kudu) to test scanning. - Removed functional.unsupported_types and related tests, as now Impala supports all (non-complex) types that Hive does. - Added FE/EE tests mainly based on the ones added to the DATE type Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582 --- M be/src/exec/file-metadata-utils.cc M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-scan-node.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/orc-metadata-utils.cc M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/parquet/parquet-metadata-utils.cc M be/src/exec/parquet/parquet-metadata-utils.h M be/src/exec/text-converter.cc M be/src/exec/text-converter.h M be/src/exec/text-converter.inline.h M be/src/exprs/expr-test.cc M be/src/exprs/utility-functions-ir.cc M be/src/exprs/utility-functions.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/service/hs2-util.cc M be/src/service/hs2-util.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/query-result-set.cc M be/src/testutil/test-udfs.cc M be/src/util/coding-util.cc M be/src/util/coding-util.h M bin/rat_exclude_files.txt M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java M fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/PrimitiveType.java M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java M fe/src/main/java/org/apache/impala/catalog/ScalarType.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/sr
[Impala-ASF-CR] IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18772 ) Change subject: IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs .. Patch Set 6: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java@227 PS5, Line 227: Strin > I can try to add more. But some RPCs are easy to get their target. E.g. get Sure, I am ok with the follow up Jira, it could be also a nice ramp-up task for someone new to Impala. If someone touches this area, it could also make sense to create a class hierarchy like CatalogJniOp with the basic common skeleton of this functions http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java@261 PS5, Line 261: JniUtil.logResponse(res.length, start, params, "exe > We calculate the duration in JniUtil.logResponse() and add warnings for slo yeah, I think that this can be often useful, e.g. to get a picture of catalog/HMS speed with a simple grep on the log -- To view, visit http://gerrit.cloudera.org:8080/18772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a Gerrit-Change-Number: 18772 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Fri, 05 Aug 2022 10:33:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10453: Support file pruning via runtime filters on Iceberg
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18531 ) Change subject: IMPALA-10453: Support file pruning via runtime filters on Iceberg .. Patch Set 15: Thank you for the reviews everyone! -- To view, visit http://gerrit.cloudera.org:8080/18531 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7762e1238bdf236b85d2728881a402a2bb41f36a Gerrit-Change-Number: 18531 Gerrit-PatchSet: 15 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 05 Aug 2022 10:36:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18772 ) Change subject: IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11104/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a Gerrit-Change-Number: 18772 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Fri, 05 Aug 2022 09:04:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18772 ) Change subject: IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2633 PS5, Line 2633: > We have already created a ThreadNameAnnotator in JniCatalog.ExecDdl, right? Oops, this is part of a previous patch. I should have reverted them. http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java@227 PS5, Line 227: Strin > Is there a reason for creating a ThreadAnnotator in some cases, but not doi I can try to add more. But some RPCs are easy to get their target. E.g. getCatalogDelta() is only called when statestore polls catalog updates. Some are dead codes, e.g. getDbs, getTableNames, getFunctions. We can remove them in a follow-up JIRA. http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java@261 PS5, Line 261: JniUtil.logResponse(res.length, start, params, "exe > optional: logging the duration could be useful (or we do this somewhere els We calculate the duration in JniUtil.logResponse() and add warnings for slow response (>60s) there. We can add duration here to ease performance comparison. http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/util/CatalogOpUtil.java File fe/src/main/java/org/apache/impala/util/CatalogOpUtil.java: http://gerrit.cloudera.org:8080/#/c/18772/5/fe/src/main/java/org/apache/impala/util/CatalogOpUtil.java@133 PS5, Line 133: if (req.isSetDb_name()) { : if (req.is_refresh) cmd += "FUNCTIONS IN "; : cmd += "DATABASE " + req.getDb_name(); : } else if (req.isSetTable_name()) { > Is this the right order to check these? So if we are refreshing a table, wo This is consistent with the checks in execResetMetadata(). The name is misleading that "Table_name" here is not just the table name but a TTableName which consists of the db and table string: https://github.com/apache/impala/blob/29a26a536c004e4d16fdb56e1bb5a41605ae4e30/common/thrift/CatalogObjects.thrift#L171-L177 https://github.com/apache/impala/blob/29a26a536c004e4d16fdb56e1bb5a41605ae4e30/common/thrift/CatalogService.thrift#L278-L280 If the target is a database, the db_name is set: https://github.com/apache/impala/blob/29a26a536c004e4d16fdb56e1bb5a41605ae4e30/common/thrift/CatalogService.thrift#L286-L287 -- To view, visit http://gerrit.cloudera.org:8080/18772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a Gerrit-Change-Number: 18772 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Fri, 05 Aug 2022 08:44:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18772 to look at the new patch set (#6). Change subject: IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs .. IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs We've seen catalogd throws OutOfMemoryError when serializing large responses (i.e. size > 2GB). However, the related table names are missing in the logs. Admins would like to get the table names and blacklist those tables until they are optimized (e.g. reducing partitions). To improve the supportability, this patch adds logs in the Catalogd RPC code paths to log some details of the request, also add thread annotations to improve readability of jstacks. Tests: - Add unit tests for short descriptions of requests. - Manually add codes to throw OutOfMemoryError and verify the logs shown as expected. - Run test_concurrent_ddls.py and metadata tests. Capture jstacks and verify the thread annotations are shown. - Run CORE tests Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a --- M fe/src/main/java/org/apache/impala/analysis/ColumnName.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/TableName.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java A fe/src/main/java/org/apache/impala/util/CatalogOpUtil.java A fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java 9 files changed, 513 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/18772/6 -- To view, visit http://gerrit.cloudera.org:8080/18772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a Gerrit-Change-Number: 18772 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate
[Impala-ASF-CR] IMPALA-11474: Codegen Tuple size in sorter-ir.cc
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18802 ) Change subject: IMPALA-11474: Codegen Tuple size in sorter-ir.cc .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/18802/5/be/src/exec/partial-sort-node.cc File be/src/exec/partial-sort-node.cc: http://gerrit.cloudera.org:8080/#/c/18802/5/be/src/exec/partial-sort-node.cc@98 PS5, Line 98: children_[0]->row_descriptor_->tuple_descriptors()[0]->byte_size() Looked at this after seeing that the tests failed in Kudu inserts (which uses partial sort) This doesn't look right to me, as the child's tuple size can be different than the materialized sort tuple size. The child can also have multiple tuples. We should use row_descriptor_'s tuple size similarly to SortNode. -- To view, visit http://gerrit.cloudera.org:8080/18802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4161a61db1782dc448dae9a1d4c1d120b055b3c Gerrit-Change-Number: 18802 Gerrit-PatchSet: 5 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Fri, 05 Aug 2022 07:13:44 + Gerrit-HasComments: Yes