[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types
k.venureddy2...@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/21186 ) Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted types .. Patch Set 8: Code-Review+1 (1 comment) Just a nit. Looks good to me. http://gerrit.cloudera.org:8080/#/c/21186/8/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/21186/8/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1336 PS8, Line 1336: currentEventId, MetastoreEvents.AlterTableEvent.ALTER_TABLE_EVENT_TYPE, nit: Can we use getNextMetastoreEventsInBatchesForTable() here and also in createTableCore() ? -- To view, visit http://gerrit.cloudera.org:8080/21186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 Gerrit-Change-Number: 21186 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 12 Apr 2024 03:57:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21186 ) Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted types .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/21186/8/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/21186/8/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@3293 PS8, Line 3293: .getNextMetastoreEventsInBatches(catalog_, beforeDropEventId, > Nit: this can also point to the newly refactored method getNextMetastoreEve This is fetching events for db which don't specify the table name. If we pass in a null table name, we need changes in getNextMetastoreEventsInBatchesForTable to check for that. I think it's ok to leave it as-is. -- To view, visit http://gerrit.cloudera.org:8080/21186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 Gerrit-Change-Number: 21186 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 12 Apr 2024 00:27:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12998: Add SHOW METADATA TABLES to ignored DDL
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21290 ) Change subject: IMPALA-12998: Add SHOW_METADATA_TABLES to ignored DDL .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15877/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21290 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I69f7de9756aa730d70cd9187c9f869d5bcf67fce Gerrit-Change-Number: 21290 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 11 Apr 2024 22:40:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12998: Add SHOW METADATA TABLES to ignored DDL
Michael Smith has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21290 Change subject: IMPALA-12998: Add SHOW_METADATA_TABLES to ignored DDL .. IMPALA-12998: Add SHOW_METADATA_TABLES to ignored DDL Adds SHOW_METADATA_TABLES to the list of ignored DDL in workload management. Fixes DCHECK failure when running Impala's full test suite with 'enable_workload_mgmt'. Change-Id: I69f7de9756aa730d70cd9187c9f869d5bcf67fce --- M be/src/service/workload-management.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21290/1 -- To view, visit http://gerrit.cloudera.org:8080/21290 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I69f7de9756aa730d70cd9187c9f869d5bcf67fce Gerrit-Change-Number: 21290 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the DIRECTED distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate NULL file paths which are also not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Reviewed-on: http://gerrit.cloudera.org:8080/21258 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h M be/src/runtime/krpc-data-stream-sender.cc M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/6348b186d3705f6b-370ecfbb_152551971_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first_and_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_single.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_three_nulls.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/same_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/3a813d5e-fc0b-485f-bbba-010972a9f20a-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/e90d28aa-cd17-4655-ad04-aa3711792576-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/snap-5852039568708655222-1-3a813d5e-fc0b-485f-bbba-010972a9f20a.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M tests/query_test/test_iceberg.py 21 files changed, 219 insertions(+), 104 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 21:27:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 18:36:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12963: Return parent PID when children spawned
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21278 ) Change subject: IMPALA-12963: Return parent PID when children spawned .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15876/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21278 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I214e79507c717340863d27f68f6ea54c169e4090 Gerrit-Change-Number: 21278 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 11 Apr 2024 18:34:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21271 ) Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f Gerrit-Change-Number: 21271 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 18:24:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21271 ) Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values .. IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values The Base64Encode function in coding-util.h with signature bool Base64Encode(const char* in, int64_t in_len, int64_t out_max, char* out, int64_t* out_len); fails if '*out_len', when passed to the function, contains a value that does not fit in a 32 bit integer. Internally we use the int sasl_encode64(const char *in, unsigned inlen, char *out, unsigned outmax, unsigned *outlen); function and explicitly cast 'out_len' to 'unsigned*'. The error is that the called sasl_encode64() function only sets the four lower bytes of '*out_len' (assuming that 'unsigned' is a 32 bit integer), and if the upper bytes are not all zero, the resulting value of '*out_len' will be incorrect. This change changes the type of 'out_len' from 'int64_t*' to 'unsigned*' to match the type that sasl_encode64() expects. Base64Decode() is also updated to use 'unsigned*'. Before this change it used an intermediate 32 bit local variable to avoid this issue. Testing: - added a regression test in coding-util-test.cc Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f Reviewed-on: http://gerrit.cloudera.org:8080/21271 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exec/text-converter.inline.h M be/src/exprs/string-functions-ir.cc M be/src/util/coding-util-test.cc M be/src/util/coding-util.cc M be/src/util/coding-util.h M be/src/util/runtime-profile.cc 6 files changed, 43 insertions(+), 16 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f Gerrit-Change-Number: 21271 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12963: Return parent PID when children spawned
Hello Riza Suminto, Yida Wu, Jason Fehr, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21278 to look at the new patch set (#3). Change subject: IMPALA-12963: Return parent PID when children spawned .. IMPALA-12963: Return parent PID when children spawned Returns the original PID for a command rather than any children that may be active. This happens during graceful shutdown in UBSAN tests. Also updates 'kill' to use the version of 'get_pid' that logs details to help with debugging. Moves try block in test_query_log.py to after client2 has been initialized. Removes 'drop table' on unique_database, since test suite already handles cleanup. Change-Id: I214e79507c717340863d27f68f6ea54c169e4090 --- M tests/common/impala_cluster.py M tests/custom_cluster/test_query_log.py 2 files changed, 72 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/21278/3 -- To view, visit http://gerrit.cloudera.org:8080/21278 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I214e79507c717340863d27f68f6ea54c169e4090 Gerrit-Change-Number: 21278 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21285 ) Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 Gerrit-Change-Number: 21285 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 17:35:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21284 ) Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. Patch Set 5: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/21284/2/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/21284/2/shell/impala_client.py@980 PS2, Line 980: self._check_hs2_rpc_status(resp.status) > About TestHS2FaultInjection::test_close_dml: Ah, I assumed the comment referred to num_deleted_rows but looks like it was never updated to include that. It'd be nice to have StrictHS2Client::close_dml match the return layout even if not used. But not required. http://gerrit.cloudera.org:8080/#/c/21284/2/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/21284/2/tests/shell/test_shell_commandline.py@932 PS2, Line 932: def _test_dml_reporting(self, vector, create_sql, db, is_kudu): > Yes ... but as I learned the Python3 + beeswax combination is only run in e Ack -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 11 Apr 2024 17:12:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 16:17:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10535/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 16:17:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 16:10:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11495: Add glibc version and effective locale to the Web UI
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21252 ) Change subject: IMPALA-11495: Add glibc version and effective locale to the Web UI .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15875/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia69c4d63df4beae29f5261691a8dcdd04b931de7 Gerrit-Change-Number: 21252 Gerrit-PatchSet: 2 Gerrit-Owner: Saurabh Katiyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Saurabh Katiyal Gerrit-Comment-Date: Thu, 11 Apr 2024 16:11:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15874/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 15:52:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. Patch Set 3: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@135 PS1, Line 135: // Skip the unsupported type and set it to NULL > I was wrong here, DATE is present in for example the 'all_files' metadata t good catch! http://gerrit.cloudera.org:8080/#/c/21269/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21269/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@801 PS3, Line 801: row_regex:'{2147483546:"aGRmczovL2xvY2FsaG9zdDoyMDUwMC90ZXN0LXdhcmVob3VzZS9pY2ViZXJnX3Rlc3QvaGFkb29wX2NhdGFsb2cvaWNlL2ljZWJlcmdfcXVlcnlfbWV0YWRhdGEvZGF0YS8.*",2147483545:"AAA="}','{2147483546:"aGRmczovL2xvY2FsaG9zdDoyMDUwMC90ZXN0LXdhcmVob3VzZS9pY2ViZXJnX3Rlc3QvaGFkb29wX2NhdGFsb2cvaWNlL2ljZWJlcmdfcXVlcnlfbWV0YWRhdGEvZGF0YS8.*",2147483545:"AAA="}' > We have file paths here and the file names will be different for each data Can't we provide fixed file names by using a table with copied files/manifests? I am more worried about the prefix of the path, e.g. S3:// vs HDFS:// -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Thu, 11 Apr 2024 15:47:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11495: Add glibc version and effective locale to the Web UI
Saurabh Katiyal has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/21252 ) Change subject: IMPALA-11495: Add glibc version and effective locale to the Web UI .. IMPALA-11495: Add glibc version and effective locale to the Web UI Added a new section "Other Info" in root page for WebUI of catalog, coordinator and statestore displaying effective locale and glibc version. Change-Id: Ia69c4d63df4beae29f5261691a8dcdd04b931de7 --- M be/src/util/default-path-handlers.cc M tests/webserver/test_web_pages.py M www/root.tmpl 3 files changed, 17 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/21252/2 -- To view, visit http://gerrit.cloudera.org:8080/21252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia69c4d63df4beae29f5261691a8dcdd04b931de7 Gerrit-Change-Number: 21252 Gerrit-PatchSet: 2 Gerrit-Owner: Saurabh Katiyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Saurabh Katiyal
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/15873/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Thu, 11 Apr 2024 15:33:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 7: (1 comment) Thanks for the comment! http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py@1455 PS7, Line 1455: if vector.get_value('exec_option')['disable_optimized_iceberg_v2_read'] == 0: > Here the code says that "if we don't disable the V2 read optimisations then Ah, those negations... you're right. Anyway, it revealed a bug in the test code that I fixed in the new PS. -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 15:28:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21258 to look at the new patch set (#10). Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the DIRECTED distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate NULL file paths which are also not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h M be/src/runtime/krpc-data-stream-sender.cc M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/6348b186d3705f6b-370ecfbb_152551971_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first_and_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_single.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_three_nulls.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/same_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/3a813d5e-fc0b-485f-bbba-010972a9f20a-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/e90d28aa-cd17-4655-ad04-aa3711792576-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/snap-5852039568708655222-1-3a813d5e-fc0b-485f-bbba-010972a9f20a.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M tests/query_test/test_iceberg.py 21 files changed, 219 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/10 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 440 insertions(+), 153 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/3 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/21269/3/be/src/util/jni-util.h File be/src/util/jni-util.h: http://gerrit.cloudera.org:8080/#/c/21269/3/be/src/util/jni-util.h@115 PS3, Line 115: /// is more restricted, see https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical_ReleasePrimitiveArrayCritical line too long (162 > 90) -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Thu, 11 Apr 2024 15:11:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21284 ) Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15872/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 11 Apr 2024 14:46:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21284 ) Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/21284/4/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/21284/4/tests/shell/test_shell_commandline.py@961 PS4, Line 961: Kude > typo: Kudu Done -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 11 Apr 2024 14:23:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Hello Peter Rozsa, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21284 to look at the new patch set (#5). Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. IMPALA-12990: Fix impala-shell handling of unset rows_deleted The issue occurred in Python 3 when 0 rows were deleted from Iceberg. It could also happen in other DMLs with older Impala servers where TDmlResult.rows_deleted was not set. See the Jira for details of the error. Testing: Extended shell tests for Kudu DML reporting to also cover Iceberg. Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 --- M shell/impala_client.py M tests/custom_cluster/test_hs2_fault_injection.py M tests/shell/test_shell_commandline.py 3 files changed, 65 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/21284/5 -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 11: (6 comments) http://gerrit.cloudera.org:8080/#/c/21257/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21257/11//COMMIT_MSG@19 PS11, Line 19: rather : than sum of it (48) Can this get higher than the amount of slots per executor? Based on IMPALA-8998 the query is refused in this case. http://gerrit.cloudera.org:8080/#/c/21257/11//COMMIT_MSG@28 PS11, Line 28: which will be a closer resemblance of maximum : parallel execution of fragment instances. Does PLANNER_CPU_ASK always calculate a greater or equal number of slots than LARGEST_FRAGMENT? http://gerrit.cloudera.org:8080/#/c/21257/11/testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test File testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test: http://gerrit.cloudera.org:8080/#/c/21257/11/testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test@2 PS11, Line 2: QUERY It would be very nice to add an explain for the same query with the same query options to show plan. At the moment it is hard for me to think through how exactly we got to the results below - seeing a plan + having a few pointers in comments could help a lot. Besides documentation, this would also ensure that if the plan changes in the future, the test case with the explain will break, making it much easier to investigate the cause of the failure. http://gerrit.cloudera.org:8080/#/c/21257/11/testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test@31 PS11, Line 31: RESULTS Are the results actually important here? If the goal is to verify that slot_count_strategy doesn't change the results of the queries, then it would be better to add it to the dimensions of some large query suite, e.g. test_queries.py http://gerrit.cloudera.org:8080/#/c/21257/11/tests/custom_cluster/test_executor_groups.py File tests/custom_cluster/test_executor_groups.py: http://gerrit.cloudera.org:8080/#/c/21257/11/tests/custom_cluster/test_executor_groups.py@1245 PS11, Line 1245: # CoreCount={total=16 trace=F15:3+F01:1+F14:3+F03:1+F13:3+F05:1+F12:3+F07:1}, This is very useful to help in understanding the patch, but it would be better to point to specific query plan, as the plan (and fragment numbers) may change in future. http://gerrit.cloudera.org:8080/#/c/21257/11/tests/query_test/test_processing_cost.py File tests/query_test/test_processing_cost.py: http://gerrit.cloudera.org:8080/#/c/21257/11/tests/query_test/test_processing_cost.py@42 PS11, Line 42: add_mandatory_exec_option(cls, 'slot_count_strategy', 'planner_cpu_ask') IMO moving these to SET statements in the .test file would be clearer. It would also allow running the same query with both slot_count_strategy and planner_cpu_ask to show that it leads to different estimates. -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 14:21:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 ) Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator .. Patch Set 7: (5 comments) I took the liberty of editing few nits along with the rebases. I leave my comments here for some that is not quite straightforward and need discussion. http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@72 PS7, Line 72: private static final double AGG_EXPR_COST_COEFFICIENT = 0.07; : // Cost per input row with single grouping expr with single agg expr : private static final double AGG_INPUT_SINGLE_GROUP_COST_COEFFICIENT = 0.2925; : // Cost per output row produced with single grouping expr and single agg expr : private static final double AGG_OUTPUT_SINGLE_GROUP_COST_COEFFICIENT = 2.6072; : // Cost per input row with multiple grouping exprs and single agg expr : private static final double AGG_INPUT_MULTI_GROUP_COST_COEFFICIENT = 1.3741; : // Cost per output row produced with multiple grouping exprs and single agg expr : private static final double AGG_OUTPUT_MULTI_GROUP_COST_COEFFICIENT = 4.5285; nit: Personally, I prefer related constant names to have common prefix, so these names becomes: COST_COEFFICIENT_AGG_EXPR COST_COEFFICIENT_AGG_INPUT COST_COEFFICIENT_AGG_OUTPUT COST_COEFFICIENT_AGG_INPUT_MULTI_GROUP COST_COEFFICIENT_AGG_OUTPUT_MULTI_GROUP I think it is easier to distinguish them if they have common prefix. Same comment for other coefficient constants. http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@582 PS7, Line 582: fragment_.getNumInstances() I'm not sure about using getNumInstances() here. traverseEffectiveParallelism() will change parallelism/num_instances later. If traverseEffectiveParallelism() increase/decrease num instances, intermediateOutputCardinality will be underestimated/overestimated. I understand this is a chicken-and-egg problem. I wonder if simply using Analyzer.getMinParallelismPerNode() * getNumNodes() is enough. That would be the approximate lower bound for intermediateOutputCardinality. What is your observation during benchmarking on this issue? http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java File fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java@116 PS7, Line 116: fragment_.getNumInstances() Another chicken-and-egg problem of using getNumInstances() before traverseEffectiveParallelism(). http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@633 PS7, Line 633: getNumInstances Another chicken-and-egg problem of using getNumInstances() before traverseEffectiveParallelism()? http://gerrit.cloudera.org:8080/#/c/21279/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test: http://gerrit.cloudera.org:8080/#/c/21279/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@132 PS7, Line 132:HDFS partitions=1824/1824 files=1824 size=199.18MB :stored statistics: : table: rows=2.88M size=199.18MB nit: unnecessary change. Small bytes variability that is not asserted in Planner test. -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 7 Gerrit-Owner: David Rorke Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 14:14:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15871/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 13:49:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/21257/10/testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test File testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test: http://gerrit.cloudera.org:8080/#/c/21257/10/testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test@31 PS10, Line 31: RESULTS : 'AAABBAAA' > This assertion failed in dockerised environment. Done -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 13:26:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10534/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 13:28:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Hello Kurt Deschler, Abhishek Rawat, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21257 to look at the new patch set (#11). Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. IMPALA-12980: Translate CpuAsk into admission control slots Impala has a concept of "admission control slots" - the amount of parallelism that should be allowed on an Impala daemon. This defaults to the number of processors per executor and can be overridden with -–admission_control_slots flag. Admission control slot accounting is described in IMPALA-8998. It compute 'slots_to_use' for each backend based on the max number of instances of any fragment on that backend. This can lead to underestimation, because multiple non-blocking fragments with the same number of instance counts, say 4 non-blocking fragments each with 12 instances, only request the max instance (12) admission slots rather than sum of it (48), making each of 12 cores oversubscribed by 4x. When COMPUTE_PROCESSING_COST is enabled, Planner will generate a CpuAsk number that represents the cpu requirement of that query over a particular executor group set. This number is an estimation of the largest number of query fragment instances that can run in parallel without waiting, given by the blocking operator analysis. Therefore, the fragment trace that sums into that CpuAsk number can be translated into 'slots_to_use' as well, which will be a closer resemblance of maximum parallel execution of fragment instances. This patch adds a new query option called SLOT_COUNT_STRATEGY to control which admission control slot accounting to use. There are two possible values: - LARGEST_FRAGMENT, which is the original algorithm from IMPALA-8998. This is still the default value for the SLOT_COUNT_STRATEGY option. - PLANNER_CPU_ASK, which will follow fragment trace that contributes towards CpuAsk number. To do the PLANNER_CPU_ASK strategy, the Planner will mark fragments that contribute to CpuAsk as dominant fragments. It also passes max_slot_per_executor information that it knows about the executor group set to the scheduler. AvgAdmissionSlotsPerExecutor counter is added to describe what Planner thinks the average 'slots_to_use' per backend will be, which follows this formula: AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors) Actual 'slots_to_use' in each backend may differ than AvgAdmissionSlotsPerExecutor, depending on what is scheduled on that backend. 'slots_to_use' will be shown as 'AdmissionSlots' counter under each executor profile node. Testing: - Update test_executors.py with AvgAdmissionSlotsPerExecutor assertion. - Pass test_tpcds_queries.py::TestTpcdsQueryWithProcessingCost. Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 --- M be/src/scheduling/admission-controller-test.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/scheduler.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/CoreCount.java M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test M tests/custom_cluster/test_executor_groups.py A tests/query_test/test_processing_cost.py M tests/query_test/test_tpcds_queries.py 19 files changed, 534 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/21257/11 -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/21257/10/testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test File testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test: http://gerrit.cloudera.org:8080/#/c/21257/10/testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test@31 PS10, Line 31: row_regex: .* AdmissionSlots: 8 .* : row_regex: .* AdmissionSlots: 4 .* This assertion failed in dockerised environment. In regular 3 node minicluster, fragment assignment is 8, 4, 4. In dockerised env, fragment assignment is 7, 4, 5. Will remove this. The assertion in L26 should be sufficient. -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 13:21:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/21271 ) Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f Gerrit-Change-Number: 21271 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 13:19:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21271 ) Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10533/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f Gerrit-Change-Number: 21271 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 13:20:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21271 ) Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f Gerrit-Change-Number: 21271 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 13:20:11 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: Bump Impyla version to 0.20a1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21286 ) Change subject: WIP: Bump Impyla version to 0.20a1 .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21286 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I297a2a34f3e688555ce8572b6c7fffbd34423f2d Gerrit-Change-Number: 21286 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 11 Apr 2024 12:57:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21285 ) Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs .. Patch Set 2: Code-Review+2 (1 comment) Thanks for the comment! Carry +2 http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG@9 PS1, Line 9: using > nit: using Done -- To view, visit http://gerrit.cloudera.org:8080/21285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 Gerrit-Change-Number: 21285 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 12:33:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
Hello Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21285 to look at the new patch set (#2). Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs .. IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs Since we are using IcebergBufferedDeleteSink, which sorts the data before flushing, there is no need to add a SORT node before the sink. Testing: * updated planner tests Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 --- M fe/src/main/java/org/apache/impala/planner/Planner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test 2 files changed, 7 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/21285/2 -- To view, visit http://gerrit.cloudera.org:8080/21285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 Gerrit-Change-Number: 21285 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 10: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10530/ -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 12:36:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21285 ) Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10532/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 Gerrit-Change-Number: 21285 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 12:33:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/21284 ) Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/21284/2/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/21284/2/tests/shell/test_shell_commandline.py@925 PS2, Line 925: def test_iceberg_dml_reporting(self, vector, unique_database): > Changed the function to make the Kudu / Iceberg differences more explicit a Thank you, it's clearer now. http://gerrit.cloudera.org:8080/#/c/21284/4/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/21284/4/tests/shell/test_shell_commandline.py@961 PS4, Line 961: Kude typo: Kudu -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 11 Apr 2024 12:25:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21284 ) Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15870/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 11 Apr 2024 11:30:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21284 ) Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15869/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 11 Apr 2024 11:20:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21284 ) Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/21284/2/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/21284/2/shell/impala_client.py@980 PS2, Line 980: self._check_hs2_rpc_status(resp.status) > This seems right given how it's used in impala_shell.py. About TestHS2FaultInjection::test_close_dml: The "return" was never reached there as the expectation is that an exception is thrown during the RPC. Updated the code as it was quite misleading. about ImpalaBeeswaxClient::close_dml: thanks for catching it, updated the beeswax case StrictHS2Client::close_dml: TDmlResult is only returned in Impala, so the strict client doesn't use it. Hive has a different way to report number of modified rows, which is not used in Impala. If interested, IMPALA-12647 has details about the Impala and Hive solution >Also the comment describing ImpalaClient::close_dml The comment I found was about row errors, not deleted rows, and for row errors this is still true. Updated the comment as it was stale and didn't mention deleted rows. http://gerrit.cloudera.org:8080/#/c/21284/2/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/21284/2/tests/shell/test_shell_commandline.py@925 PS2, Line 925: def test_iceberg_dml_reporting(self, vector, unique_database): > Reusing the Kudu DML sequence makes this new test compact but a bit complex Changed the function to make the Kudu / Iceberg differences more explicit and added some comments. If you think that it is still too complex, I can split the function to two. http://gerrit.cloudera.org:8080/#/c/21284/2/tests/shell/test_shell_commandline.py@932 PS2, Line 932: def _test_dml_reporting(self, vector, create_sql, db, is_kudu): > Do these get run with both HS2 and Beeswax? Yes ... but as I learned the Python3 + beeswax combination is only run in exhaustive tests so the tests I ran didn't catch it. This is a bit unlucky situation, I think that we should run all protocols with Python2 and Python3 in core tests. -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 11 Apr 2024 11:16:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Hello Peter Rozsa, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21284 to look at the new patch set (#4). Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. IMPALA-12990: Fix impala-shell handling of unset rows_deleted The issue occurred in Python 3 when 0 rows were deleted from Iceberg. It could also happen in other DMLs with older Impala servers where TDmlResult.rows_deleted was not set. See the Jira for details of the error. Testing: Extended shell tests for Kudu DML reporting to also cover Iceberg. Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 --- M shell/impala_client.py M tests/custom_cluster/test_hs2_fault_injection.py M tests/shell/test_shell_commandline.py 3 files changed, 65 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/21284/4 -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Hello Peter Rozsa, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21284 to look at the new patch set (#3). Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. IMPALA-12990: Fix impala-shell handling of unset rows_deleted The issue occurred in Python 3 when 0 rows were deleted from Iceberg. It could also happen in other DMLs with older Impala servers where TDmlResult.rows_deleted was not set. See the Jira for details of the error. Testing: Extended shell tests for Kudu DML reporting to also cover Iceberg. Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 --- M shell/impala_client.py M tests/custom_cluster/test_hs2_fault_injection.py M tests/shell/test_shell_commandline.py 3 files changed, 61 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/21284/3 -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] [TEST][WIP]IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: [TEST][WIP]IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@83 PS3, Line 83: if (!hive_compat) { > That was previously possible. The following works for me before this change The exception is thrown here in the catalogd: https://github.com/apache/impala/blob/70c35425d3f8ac68b23fb8e0d08e12ee763965d7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1770 It is a Preconditions check. The partition name is not found in the map. 'partName' is "p=12?!@%23$%25^&%2A()|%3F%2F.~`", while 'nameToPartitionMap_' contains "p=12?!@%23$%25%5E&%2A()|%3F%2F.~`". The difference is the caret ("^"). What I think happens is that the HMS encodes it to "%5E", and that is what is present in the 'nameToPartitionMap_' map. But we don't encode it, hence 'partName' contains "^". Indeed, if I add the mapping {'^', "%5E"} in 'specialCharacterMap' in this file, there is no error. -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 11 Apr 2024 10:38:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12990: Fix impala-shell handling of unset rows deleted
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/21284 ) Change subject: IMPALA-12990: Fix impala-shell handling of unset rows_deleted .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/21284/2/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/21284/2/tests/shell/test_shell_commandline.py@925 PS2, Line 925: def test_iceberg_dml_reporting(self, vector, unique_database): Reusing the Kudu DML sequence makes this new test compact but a bit complex. Maybe creating a separate test sequence for Iceberg should simplify it, at the cost of adding duplicate rows, but these duplicates, in my opinion, can be reasonable as they have different semantics. -- To view, visit http://gerrit.cloudera.org:8080/21284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Gerrit-Change-Number: 21284 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 11 Apr 2024 09:31:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py@1455 PS7, Line 1455: if vector.get_value('exec_option')['disable_optimized_iceberg_v2_read'] == 0: > We only test DIRECTED mode + V2 operator (which go in hand with each other) Here the code says that "if we don't disable the V2 read optimisations then skip the test" or in other words "if we have V2 read opts then skip". I might misread this code or we don't test the DIRECTED mode here only the old ANTI JOIN case. -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 09:02:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP: Bump Impyla version to 0.20a1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21286 ) Change subject: WIP: Bump Impyla version to 0.20a1 .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15868/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21286 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I297a2a34f3e688555ce8572b6c7fffbd34423f2d Gerrit-Change-Number: 21286 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 11 Apr 2024 08:16:42 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: Bump Impyla version to 0.20a1
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21286 to look at the new patch set (#2). Change subject: WIP: Bump Impyla version to 0.20a1 .. WIP: Bump Impyla version to 0.20a1 Change-Id: I297a2a34f3e688555ce8572b6c7fffbd34423f2d --- M infra/python/deps/requirements.txt M tests/common/impala_connection.py 2 files changed, 3 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/21286/2 -- To view, visit http://gerrit.cloudera.org:8080/21286 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I297a2a34f3e688555ce8572b6c7fffbd34423f2d Gerrit-Change-Number: 21286 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP: Bump Impyla version to 0.20a1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21286 ) Change subject: WIP: Bump Impyla version to 0.20a1 .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10531/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21286 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I297a2a34f3e688555ce8572b6c7fffbd34423f2d Gerrit-Change-Number: 21286 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 11 Apr 2024 07:52:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21267 ) Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@107 PS1, Line 107: fileDescs_ = new ArrayList<>(fileDescs_); : Collections.sort(fileDescs_); > I'm experimenting with your suggestion and see that it would bring too much I see, thanks Gabor for investigating this. -- To view, visit http://gerrit.cloudera.org:8080/21267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3 Gerrit-Change-Number: 21267 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 07:43:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 ) Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15867/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 6 Gerrit-Owner: David Rorke Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 07:37:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15865/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 07:36:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15866/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 11 Apr 2024 07:36:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12920: Support ai generate text built-in function for OpenAI's chat completion API
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21168 ) Change subject: IMPALA-12920: Support ai_generate_text built-in function for OpenAI's chat completion API .. IMPALA-12920: Support ai_generate_text built-in function for OpenAI's chat completion API Added support for following built-in functions: - ai_generate_text_default(prompt) - ai_generate_text(ai_endpoint, prompt, ai_model, ai_api_key_jceks_secret, additional_params) 'ai_endpoint', 'ai_model' and 'ai_api_key_jceks_secret' are flagfile options. 'ai_generate_text_default(prompt)' syntax expects all these to be set to proper values. The other syntax, will try to use the provided input parameter values, but fallback to instance level values if the inputs are NULL or empty. Only public OpenAI (api.openai.com) and Azure OpenAI (openai.azure.com) API endpoints are currently supported. Exposed these functions in FunctionContext so that they can also be called from UDFs: - ai_generate_text_default(context, model) - ai_generate_text(context, ai_endpoint, prompt, ai_model, ai_api_key_jceks_secret, additional_params) Testing: - Added unit tests for AiGenerateTextInternal function - Added fe test for JniFrontend::getSecretFromKeyStore - Ran manual tests to make sure Impala can talk with OpenAI LLMs using 'ai_generate_text' built-in function. Example sql: select ai_generate_text("https://api.openai.com/v1/chat/completions;, "hello", "gpt-3.5-turbo", "open-ai-key", '{"temperature": 0.9, "model": "gpt-4"}') - Tested using standalone UDF SDK and made sure that the UDFs can invoke BuiltInFunctions (ai_generate_text and ai_generate_text_default) Change-Id: Id4446957f6030bab1f985fdd69185c3da07d7c4b Reviewed-on: http://gerrit.cloudera.org:8080/21168 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exprs/CMakeLists.txt A be/src/exprs/ai-functions-ir.cc A be/src/exprs/ai-functions.h A be/src/exprs/ai-functions.inline.h M be/src/exprs/expr-test.cc M be/src/exprs/scalar-expr-evaluator.cc M be/src/runtime/exec-env.cc M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/udf/udf-internal.h M be/src/udf/udf.cc M be/src/udf/udf.h M be/src/udf_samples/udf-sample.cc M be/src/udf_samples/udf-sample.h M be/src/util/jni-util.h M bin/load-data.py M bin/rat_exclude_files.txt M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/service/JniFrontendTest.java M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py A testdata/jceks/.gitkeep 22 files changed, 784 insertions(+), 6 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id4446957f6030bab1f985fdd69185c3da07d7c4b Gerrit-Change-Number: 21168 Gerrit-PatchSet: 19 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12920: Support ai generate text built-in function for OpenAI's chat completion API
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21168 ) Change subject: IMPALA-12920: Support ai_generate_text built-in function for OpenAI's chat completion API .. Patch Set 18: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4446957f6030bab1f985fdd69185c3da07d7c4b Gerrit-Change-Number: 21168 Gerrit-PatchSet: 18 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 11 Apr 2024 07:25:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 ) Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/21279/4/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java: http://gerrit.cloudera.org:8080/#/c/21279/4/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@78 PS4, Line 78: total-cost=" > Replace with "total-cost=" Done http://gerrit.cloudera.org:8080/#/c/21279/4/tests/custom_cluster/test_executor_groups.py File tests/custom_cluster/test_executor_groups.py: http://gerrit.cloudera.org:8080/#/c/21279/4/tests/custom_cluster/test_executor_groups.py@1105 PS4, Line 1105: > flake8: E261 at least two spaces before inline comment Done -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 6 Gerrit-Owner: David Rorke Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 07:21:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10530/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 07:23:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21285 ) Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs .. Patch Set 1: Code-Review+2 (1 comment) Thanks! +2 with a nit http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG@9 PS1, Line 9: useing nit: using -- To view, visit http://gerrit.cloudera.org:8080/21285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 Gerrit-Change-Number: 21285 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 11 Apr 2024 07:19:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/21277/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21277/1//COMMIT_MSG@16 PS1, Line 16: MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly. > The EG selection should only need the unbounded number for all EG. The last Done http://gerrit.cloudera.org:8080/#/c/21277/2/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/21277/2/fe/src/main/java/org/apache/impala/service/Frontend.java@2300 PS2, Line 2300: > Might want to use max(cores_requirement, cores_requirement_unbounded) here, I go ahead with Math.max(cores_requirement, cores_requirement_unbounded). -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 11 Apr 2024 07:18:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/21257/9/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/21257/9/be/src/scheduling/scheduler.cc@1212 PS9, Line 1212: be_max_instances = > CPC cpu costing also apply to coordinator fragments, but excluded during EG Add tests/query_test/test_processing_cost.py in ps10 http://gerrit.cloudera.org:8080/#/c/21257/9/fe/src/main/java/org/apache/impala/planner/CoreCount.java File fe/src/main/java/org/apache/impala/planner/CoreCount.java: http://gerrit.cloudera.org:8080/#/c/21257/9/fe/src/main/java/org/apache/impala/planner/CoreCount.java@148 PS9, Line 148: } > I think totalWithoutCoordinator() method will be useful. I'll test this. Done -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 11 Apr 2024 07:14:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Hello Kurt Deschler, Abhishek Rawat, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21257 to look at the new patch set (#10). Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. IMPALA-12980: Translate CpuAsk into admission control slots Impala has a concept of "admission control slots" - the amount of parallelism that should be allowed on an Impala daemon. This defaults to the number of processors per executor and can be overridden with -–admission_control_slots flag. Admission control slot accounting is described in IMPALA-8998. It compute 'slots_to_use' for each backend based on the max number of instances of any fragment on that backend. This can lead to underestimation, because multiple non-blocking fragments with the same number of instance counts, say 4 non-blocking fragments each with 12 instances, only request the max instance (12) admission slots rather than sum of it (48), making each of 12 cores oversubscribed by 4x. When COMPUTE_PROCESSING_COST is enabled, Planner will generate a CpuAsk number that represents the cpu requirement of that query over a particular executor group set. This number is an estimation of the largest number of query fragment instances that can run in parallel without waiting, given by the blocking operator analysis. Therefore, the fragment trace that sums into that CpuAsk number can be translated into 'slots_to_use' as well, which will be a closer resemblance of maximum parallel execution of fragment instances. This patch adds a new query option called SLOT_COUNT_STRATEGY to control which admission control slot accounting to use. There are two possible values: - LARGEST_FRAGMENT, which is the original algorithm from IMPALA-8998. This is still the default value for the SLOT_COUNT_STRATEGY option. - PLANNER_CPU_ASK, which will follow fragment trace that contributes towards CpuAsk number. To do the PLANNER_CPU_ASK strategy, the Planner will mark fragments that contribute to CpuAsk as dominant fragments. It also passes max_slot_per_executor information that it knows about the executor group set to the scheduler. AvgAdmissionSlotsPerExecutor counter is added to describe what Planner thinks the average 'slots_to_use' per backend will be, which follows this formula: AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors) Actual 'slots_to_use' in each backend may differ than AvgAdmissionSlotsPerExecutor, depending on what is scheduled on that backend. 'slots_to_use' will be shown as 'AdmissionSlots' counter under each executor profile node. Testing: - Update test_executors.py with AvgAdmissionSlotsPerExecutor assertion. - Pass test_tpcds_queries.py::TestTpcdsQueryWithProcessingCost. Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 --- M be/src/scheduling/admission-controller-test.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/scheduler.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/CoreCount.java M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-query/queries/QueryTest/processing_cost_admission_slots.test M tests/custom_cluster/test_executor_groups.py A tests/query_test/test_processing_cost.py M tests/query_test/test_tpcds_queries.py 19 files changed, 536 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/21257/10 -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Riza Suminto has uploaded a new patch set (#6) to the change originally created by David Rorke. ( http://gerrit.cloudera.org:8080/21279 ) Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator .. IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator This patch improves the accuracy of the CPU ProcessingCost estimates for several of the CPU intensive operators by basing the costs on benchmark data. The general approach for a given operator was to run a set of queries that exercised the operator under various conditions (e.g. large vs small row sizes and row counts, varying NDV, different file formats, etc) and capture the CPU time spent per unit of work (the unit of work might be measured as some number of rows, some number of bytes, some number of predicates evaluated, or some combination of these). The data was then analyzed in an attempt to fit a simple model that would allow us to predict CPU consumption of a given operator based on information available at planning time. For example, the CPU ProcessingCost for a Parquet scan is estimated as: TotalCost = (0.0144 * BytesMaterialized) + (0.0281 * Rows * Predicate Count) The coefficients (0.0144 and 0.0281) are derived from benchmarking scans under a variety of conditions. Similar cost functions and coefficients were derived for all of the benchmarked operators. The coefficients for all the operators are normalized such that a single unit of cost equates to roughly 100 nanoseconds of CPU time on a r5d.4xlarge instance. So we would predict an operator with a cost of 10,000,000 would complete in roughly one second on a single core. Limitations: * Costing only addresses CPU time spent and doesn't account for any IO or other wait time. * Benchmarking scenarios didn't provide comprehensive coverage of the full range of data types, distributions, etc. More thorough benchmarking could improve the costing estimates further. * This initial patch only covers a subset of the operators, focusing on those that are most common and most CPU intensive. Specifically the following operators are covered by this patch. All others continue to use the previous ProcessingCost code: AggregationNode DataStreamSink (exchange sender) ExchangeNode HashJoinNode HdfsScanNode HdfsTableSink NestedLoopJoinNode SortNode UnionNode Benchmark-based costing of the remaining operators will be covered by a future patch. Future patches will automate the collection and analysis of the benchmark data and the computation of the cost coefficients to simplify maintenance of the costing as performance changes over time. Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca --- M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Hello Kurt Deschler, Abhishek Rawat, David Rorke, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21277 to look at the new patch set (#4). Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. IMPALA-12988: Calculate an unbounded version of CpuAsk Planner calculates CpuAsk through a recursive call beginning at Planner.computeBlockingAwareCores(), which is called after Planner.computeEffectiveParallelism(). It does blocking operator analysis over the selected degree of parallelism that was decided during computeEffectiveParallelism() traversal. That selected degree of parallelism, however, is already bounded by min and max parallelism config, derived from PROCESSING_COST_MIN_THREADS and MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly. This patch calculates an unbounded version of CpuAsk that is not bounded by min and max parallelism config. It is purely based on the fragment's ProcessingCost and query plan relationship constraint (for example, the number of JOIN BUILDER fragments should equal the number of destination JOIN fragments for partitioned join). During executor group set selection, Frontend should use the maximum between bounded CpuAsk and unbounded CpuAsk numbers to avoid assigning a query to a small executor group set too soon. The last executor group set stays as the "catch-all" executor group set. After this patch, the "max-parallelism" field in the query plan will all be set with maximum parallelism based on ProcessingCost. The CpuAsk counter is changed to show the maximum between bounded CpuAsk and unbounded CpuAsk after scaling against the QUERY_CPU_COUNT_DIVISOR option. A new counter CpuAskBounded shows the bounded CpuAsk after scaling. The EffectiveParallelism counter remains unchanged, showing bounded CpuAsk before scaling. Testing: - Update and pass FE test TpcdsCpuCostPlannerTest and PlannerTest#testProcessingCost. - Pass EE test tests/query_test/test_tpcds_queries.py - Pass custom cluster test tests/custom_cluster/test_executor_groups.py Change-Id: I5441e31088f90761062af35862be4ce09d116923 --- M be/src/scheduling/scheduler.cc M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q28.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q29.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q32.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q36.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q37.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q38.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q40.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43.test M
[Impala-ASF-CR] IMPALA-11871: Skip permissions check on HDFS files if Ranger is enabled
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20221 ) Change subject: IMPALA-11871: Skip permissions check on HDFS files if Ranger is enabled .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9 Gerrit-Change-Number: 20221 Gerrit-PatchSet: 10 Gerrit-Owner: Halim Kim Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Halim Kim Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 11 Apr 2024 06:27:29 + Gerrit-HasComments: No