[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9482/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 16 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 04:17:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7479/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 04:03:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 17: (2 comments) http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450 PS14, Line 450: > See my other comment. In some cases, FastLimitCheckExceededRows is 0, e.g. SET debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0; http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729 PS14, Line 2729: 1 > I see. What I mean is to assure the value is greater than 0. Done -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 04:02:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 ) Change subject: IMPALA-9857: Batching of consecutive partition events .. Patch Set 6: (2 comments) Thanks Vihang for introducing a way for batch event processing. I will rebase my patch for IMPALA-10923 on top of this patch. http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@270 PS6, Line 270: i=0, j=1 nit: spaces around assignment operator http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1803 PS6, Line 1803: for (T event : batchedEvents_) { It seems ignoredPartitions still being processed here. -- To view, visit http://gerrit.cloudera.org:8080/17848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73 Gerrit-Change-Number: 17848 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 22 Sep 2021 03:55:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#16). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 19 files changed, 147 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/16 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 16 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10922: Fix wrong sarg on ORC Date column in release build
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17861 ) Change subject: IMPALA-10922: Fix wrong sarg on ORC Date column in release build .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9481/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17861 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieda467c822cf5c6f0edc0ddfe4da61c46b04e51f Gerrit-Change-Number: 17861 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Sep 2021 02:42:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10922: Fix wrong sarg on ORC Date column in release build
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17861 Change subject: IMPALA-10922: Fix wrong sarg on ORC Date column in release build .. IMPALA-10922: Fix wrong sarg on ORC Date column in release build One line of code is wrapped with DCHECK in generating ORC Date type literal for predicate pushdown. Release build will ignore all codes wrapped in DCHECK so that line of logic is missing. This patch fixes it by moving the code outside of DCHECK. Tests: - Reproduced the issue and verified the fix in release build. Change-Id: Ieda467c822cf5c6f0edc0ddfe4da61c46b04e51f --- M be/src/exec/hdfs-orc-scanner.cc 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/17861/1 -- To view, visit http://gerrit.cloudera.org:8080/17861 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ieda467c822cf5c6f0edc0ddfe4da61c46b04e51f Gerrit-Change-Number: 17861 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-10876: Support to download JWKS from given URL
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17802 ) Change subject: IMPALA-10876: Support to download JWKS from given URL .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9479/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic6ac8cf0010c13db30219776d1d275709bf211df Gerrit-Change-Number: 17802 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 21 Sep 2021 22:19:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/9480/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 2 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 Sep 2021 22:09:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17859/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17859/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@509 PS2, Line 509: org.apache.impala.catalog.Table tbl = getTableAndAcquireWriteLock(partition.getDbName(), line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/17859/2/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java File fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java: http://gerrit.cloudera.org:8080/#/c/17859/2/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@367 PS2, Line 367: HdfsTable currentTbl = (HdfsTable) catalog_.getTable(TEST_DB_NAME, tblNameLowerCase); line too long (97 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 2 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 Sep 2021 22:01:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP]: Initial commit: Add logic to sync to latest event id
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17756 ) Change subject: [WIP]: Initial commit: Add logic to sync to latest event id .. Patch Set 31: (10 comments) http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc@116 PS31, Line 116: enable_catalogd_cache_sync_to_latest_event_id > may be a better name would be enable_sync_to_latest_event_on_ddls. The "cat Yes, that sounds better. Will rename it http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2141 PS31, Line 2141: if (tbl == null) return null; : LOG.debug("table {} exits in cache, last synced id {}", tbl.getFullName(), : tbl.getLastSyncedEventId()); : // if no validWriteIdList is provided, we return the tbl if its loaded : if (tbl.isLoaded() && validWriteIdList == null) { : LOG.debug("returning already loaded table {}", tbl.getFullName()); : return tbl; : } > can you change these to trace? They might spew a lot of logs otherwise. Ack http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@269 PS31, Line 269: NewMetastoreEventsFactory > Do we need a new factory? Can we not override the isSelfEvent() method dire The new factory is not really required at this point in time and we can override the existing isSelfEvent() method. Thats why I had added a TODO as well comment to get suggestion on the same http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@841 PS31, Line 841: if (!catalog_.tryLockDb(db)) { : throw new CatalogException(String.format("Couldn't acquire lock on db %s", : db.getName())); : } : catalog_.getLock().writeLock().unlock(); : boolean shouldSkip = false; : if (db.getLastSyncedEventId() >= eventId) { : LOG.info("Skipping event {} on db {} since it is already synced till event id {}", : eventInfo, dbName_, eventId); : shouldSkip = true; : } : db.getLock().unlock(); > please use try .. finally pattern when dealing with locks. Also make sure y Yes I usually follow try finally pattern for locks. Didn't do it here because b/w db lock and unlock we are not doing anything that can throw an error. Nevertheless, will fix it. There is one more way to deal with it: We can skip acquiring lock on db and can simply check if db.getLastSyncedEventId() >= eventId here. If the condition passes, we anyways acquire lock on db when processing an event. At that time, we again compare db's last synced event id with event id of the current event. Thoughts? http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@264 PS31, Line 264: import > not sure why are we commenting these? These imports are not required. They were accidently added. Will remove them http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@316 PS31, Line 316: metastoreEventsMetrics_ > I think it would be very confusing to have same named metrics at 2 differen We could reuse the existing metrics. However I thought that it would be more transparent if every user of metastore events i.e catalogServiceCatalog, CatalogOpExecutor etc track the metrics of events processed by them individually. Thoughts? http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@438 PS31, Line 438: LOG.info("Adding db {} to catalogd, current event id {}", msDb.getName(), : currentEventId); : addDbToCatalog(currentEventId, msDb); > Why do we need to add the db here? Shouldn't syncT
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17859 to look at the new patch set (#2). Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/test/java/org/apache/impala/catalog/AlterDatabaseTest.java A fe/src/test/java/org/apache/impala/catalog/MetastoreApiTestUtils.java M fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java M fe/src/test/java/org/apache/impala/catalog/metastore/AbstractCatalogMetastoreTest.java M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsFileMetadataTest.java A fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M tests/custom_cluster/test_metastore_service.py 27 files changed, 3,325 insertions(+), 250 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17859/2 -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 2 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10876: Support to download JWKS from given URL
Wenzhe Zhou has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17802 ) Change subject: IMPALA-10876: Support to download JWKS from given URL .. IMPALA-10876: Support to download JWKS from given URL This patch added functionality to download JWKS from a given URL and support key rotation by periodically checking the JWKS URL for updates. We use Kudu's EasyCurl wrapper to download file from the given URL. curl was added to native-toolchain. This patch modified makefiles and bootstrap_toolchain.py to integrate libcurl and libkudu_curl_util. Upgraded source code for Kudu's EasyCurl wrapper (curl_util.cc and curl_util.h in kudu/util) from the top of Kudu upstream (commit hash: 7a4061e87e). Added end-end JWT authentication test cases with JWKS specified as HTTP/HTTPS URL. Testing: - Passed core run, including new test cases. Change-Id: Ic6ac8cf0010c13db30219776d1d275709bf211df --- M CMakeLists.txt M be/CMakeLists.txt M be/src/kudu/util/CMakeLists.txt M be/src/kudu/util/curl_util-test.cc M be/src/kudu/util/curl_util.cc M be/src/kudu/util/curl_util.h M be/src/rpc/authentication.cc M be/src/service/impala-server.cc M be/src/util/jwt-util-internal.h M be/src/util/jwt-util-test.cc M be/src/util/jwt-util.cc M be/src/util/jwt-util.h M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/rat_exclude_files.txt A cmake_modules/FindCurl.cmake M fe/src/test/java/org/apache/impala/customcluster/CustomClusterRunner.java M fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java M fe/src/test/java/org/apache/impala/customcluster/JwtWebserverTest.java A testdata/jwt/jwks_es256.json 20 files changed, 725 insertions(+), 192 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/17802/4 -- To view, visit http://gerrit.cloudera.org:8080/17802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic6ac8cf0010c13db30219776d1d275709bf211df Gerrit-Change-Number: 17802 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10876: Support to download JWKS from given URL
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17802 ) Change subject: IMPALA-10876: Support to download JWKS from given URL .. Patch Set 3: (3 comments) Add a working thread to periodically check the JWKS URL for updates. http://gerrit.cloudera.org:8080/#/c/17802/3/be/src/util/jwt-util.cc File be/src/util/jwt-util.cc: http://gerrit.cloudera.org:8080/#/c/17802/3/be/src/util/jwt-util.cc@585 PS3, Line 585: curl_global_init(CURL_GLOBAL_DEFAULT); : : // Initialize curl session. : curl_handle = curl_easy_init(); : if (curl_handle == nullptr) { : status = Status("Error to initialize curl session"); : goto cleanup; : } : curl_easy_setopt(curl_handle, CURLOPT_URL, jwks_url.c_str()); : curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, CurlWriteDownloadBufferCallback); : curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void*)&chunk); : curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, "libcurl-agent/1.0"); > It's worth looking at Kudu's EasyCurl wrapper to see if it would work for u Will change code to use Kudu's EasyCurl wrapper. The source code of Kudu's EasyCurl were updated in Kudu's upstream repo. Need to upgrade the corresponding code from Kudu. http://gerrit.cloudera.org:8080/#/c/17802/3/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java File fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java: http://gerrit.cloudera.org:8080/#/c/17802/3/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java@227 PS3, Line 227: String webserverArgs = "--webserver_port=25010"; > It would be good to make it very clear that the statestore is serving up th Will change variable names as suggested to make code more readable. http://gerrit.cloudera.org:8080/#/c/17802/3/www/jwt/jwks_rs256.json File www/jwt/jwks_rs256.json: http://gerrit.cloudera.org:8080/#/c/17802/3/www/jwt/jwks_rs256.json@1 PS3, Line 1: : { : "keys": [ : { "use": "sig", "kty": "RSA", "kid": "public:c424b67b-fe28-45d7-b015-f79da50b5b21", "alg": "RS256", "n": "uGbXWiK3dQTyCbX5xdE4yCuYp0AF2d15Qq1JSXT_lx8CEcXb9RbDddl8jGDv-spi5qPa8qEHiK7FwV2KpRE983wGPnYsAm9BxLFb4YrLYcDFOIGULuk2FtrPS512Qea1bXASuvYXEpQNpGbnTGVsWXI9C-yjHztqyL2h8P6mlThPY9E9ue2fCqdgixfTFIF9Dm4SLHbphUS2iw7w1JgT69s7of9-I9l5lsJ9cozf1rxrXX4V1u_SotUuNB3Fp8oB4C1fLBEhSlMcUJirz1E8AziMCxS-VrRPDM-zfvpIJg3JljAh3PJHDiLu902v9w-Iplu1WyoB2aPfitxEhRN0Yw", "e": "AQAB" }, : { "use": "sig", "kty": "RSA", "kid": "public:9b9d0b47-b9ed-4ba6-9180-52fc5b161a3a", "alg": "RS256", "n": "xzYuc22QSst_dS7geYYK5l5kLxU0tayNdixkEQ17ix-CUcUbKIsnyftZxaCYT46rQtXgCaYRdJcbB3hmyrOavkhTpX79xJZnQmfuamMbZBqitvscxW9zRR9tBUL6vdi_0rpoUwPMEh8-Bw7CgYR0FK0DhWYBNDfe9HKcyZEv3max8Cdq18htxjEsdYO0iwzhtKRXomBWTdhD5ykd_fACVTr4-KEY-IeLvubHVmLUhbE5NgWXxrRpGasDqzKhCTmsa2Ysf712rl57SlH0Wz_Mr3F7aM9YpErzeYLrl0GhQr9BVJxOvXcVd4kmY-XkiCcrkyS1cnghnllh-LCwQu1sYw", "e": "AQAB" } : ] : } > Everything in the www/ directory gets shipped as part of the docker image b Move the JWKS file back to directory testdata/jwt, and copy the JWKS files temporarily to root directory of Web server when running test and delete it at the end of test. -- To view, visit http://gerrit.cloudera.org:8080/17802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic6ac8cf0010c13db30219776d1d275709bf211df Gerrit-Change-Number: 17802 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 21 Sep 2021 21:46:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 ) Change subject: IMPALA-9857: Batching of consecutive partition events .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9478/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73 Gerrit-Change-Number: 17848 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Sep 2021 20:09:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 ) Change subject: IMPALA-9857: Batching of consecutive partition events .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@955 PS5, Line 955: > nit: too much indent Done http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1027 PS5, Line 1027: > nit: +2 indent Done http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2713 PS5, Line 2713: return hmsPartToHdfsPart.size(); > Shouldn't we return hmsPartToHdfsPart.size() since those are the actual par yes, thanks for spotting that. Fixed. http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@279 PS2, Line 279: if (!current.canBeBatched(next)) { > Currently one of the conditions of batching two events together is - event Yes, the events definitely can be intermingled from multiple tables. In case of ALTER_PARTITION event it is less likely to happen since these events are transactional in nature (publish all or none) and they hold a eventId lock in HMS when being generated. However, this is highly dependent on the source application which generates the events. For example if Hive makes 10 alterPartition calls instead of 1 alterPartitions() for 10 partitions, the event stream may include events which are inter-mingled between different tables. The batching logic currently optimizes for the common case of consecutive ALTER_PARTITION or INSERT events being generated by a single query. More sophisticated batching schemes like you mentioned can be explored but may increase the scope of the patch and I am inclined to commit this first and then incrementally improve it if needed. My primary concern about batching non-consecutive events is that we need to make sure that we are not introducing any subtle bugs because of the the fact that we are refreshing the partitions out of order then event stream. http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641 PS2, Line 1641: MetastoreEventPropertyKey.CATALOG_VERSION.getKey(), "-1")); > I understand that there may not be significant performance gain. I still fe Okay. Let me look into this and see how much additional changes will need to made for this. http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1811 PS5, Line 1811: Ref > nit: indentation is off My IDE seems to like doing this. Not sure how to fix it. Manually edited it. http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java File fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java@100 PS5, Line 100: dx); > This precondition is checked by the Java runtime. makes sense. Removed it. -- To view, visit http://gerrit.cloudera.org:8080/17848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73 Gerrit-Change-Number: 17848 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Sep 2021 19:46:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events
Vihang Karajgaonkar has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17848 ) Change subject: IMPALA-9857: Batching of consecutive partition events .. IMPALA-9857: Batching of consecutive partition events This patch improves the performance of events processor by batching together consecutive ALTER_PARTITION or INSERT events. Currently, without this patch, if the events stream consists of a lot of consecutive ALTER_PARTITION events which cannot be skipped, events processor will refresh partition from each event one by one. Similarly, in case of INSERT events in a partition events processor refresh one partition at a time. By batching together such consecutive ALTER_PARTITION or INSERT events, events processor needs to take lock on the table only once per batch and can refresh all the partitions from the events using multiple threads. For transactional (acid) tables, this provides even significant performance gain since currently we refresh the whole table in case of ALTER_PARTITION or INSERT partition events. By batching them together, events processor will refresh the table once per batch. The batch of eligible ALTER_PARTITION and INSERT events will be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event respectively. Performance tests: In order to simulate bunch of ALTER_PARTITION and INSERT events, a simple test was performed by running the following query from hive: insert into store_sales_copy partition(ss_sold_date_sk) select * from store_sales; This query generates 1824 ALTER_PARTITION and 1824 INSERT events and time taken to process all the events generated was measured before and after the patch for external and ACID table. Table Type Before After == External table 75 sec 25 sec ACID tables 313 sec 47 sec Additionally, the patch also fixes a minor bug in evaluateSelfEvent() method which should return false when serviceId does not match. Testing Done: 1. Added new tests which cover the batching logic of events. 2. Exhaustive tests. Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M tests/custom_cluster/test_events_custom_configs.py 9 files changed, 920 insertions(+), 185 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/6 -- To view, visit http://gerrit.cloudera.org:8080/17848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73 Gerrit-Change-Number: 17848 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 ) Change subject: IMPALA-10921 Add script to compare TPCDS runs. .. Patch Set 1: (2 comments) Hi Amogh, Thanks for submitting the script. Besides the flake8 errors, I have few comments and request to incorporate. http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py File bin/diagnostics/experimental/tpcds_run_comparator.py: http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@211 PS1, Line 211: def print_results(ht_mem_res, op_mem_res): It will be great if we can add option to print this output to 2 csv files. http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@245 PS1, Line 245: if (i == 23 or i == 24): : filenames = [str(i)+"_a.txt", str(i)+"_b.txt"] For iterating the profiles, what if we just use simple filename matching between the baseline dir and new measurement dir? The reason is that the naming convention is not always standardized. I have seen variation such as '23_a.txt', '23a.txt', 'tpcds-23a.txt', and so on. Using simple filename match will also make the script more general, not specific to just TPC-DS. -- To view, visit http://gerrit.cloudera.org:8080/17855 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6 Gerrit-Change-Number: 17855 Gerrit-PatchSet: 1 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Sep 2021 18:17:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10714: Defer advancing read page until the buffer is attached
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17853 ) Change subject: IMPALA-10714: Defer advancing read page until the buffer is attached .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9477/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17853 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83 Gerrit-Change-Number: 17853 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 21 Sep 2021 17:49:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10714: Defer advancing read page until the buffer is attached
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17853 ) Change subject: IMPALA-10714: Defer advancing read page until the buffer is attached .. Patch Set 3: (3 comments) Thank you for the feedback! After expanding the test case, I found that fix from patch set 2 is still incorrect. Delayed advancement of read page should only allowed in read-write stream. Patch set 3 add checks for 'has_write_iterator_' to make sure that the stream is indeed a read-write stream. http://gerrit.cloudera.org:8080/#/c/17853/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17853/2//COMMIT_MSG@10 PS2, Line 10: Stream > nit. expects Done http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc@1388 PS2, Line 1388: eservation to alloc > May need to add the following additional test cases This is now addressed by: 1. TestUnpinAfterFullStreamRead(num_rows, buffer_size, max_num_pages, false, true, false); 2. TestUnpinAfterFullStreamRead(num_rows, buffer_size, max_num_pages, true, false, ...); http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc@1397 PS2, Line 1397: > May need to test #pages == 1. I believe this is now addressed by TestUnpinAfterFullStreamRead with attach_on_read=true and refill_before_unpin=false. With this argument, the stream only have 1 page left before UnpinStream() is called. -- To view, visit http://gerrit.cloudera.org:8080/17853 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83 Gerrit-Change-Number: 17853 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 21 Sep 2021 17:40:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10714: Defer advancing read page until the buffer is attached
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17853 ) Change subject: IMPALA-10714: Defer advancing read page until the buffer is attached .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17853/3/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: http://gerrit.cloudera.org:8080/#/c/17853/3/be/src/runtime/buffered-tuple-stream-test.cc@1490 PS3, Line 1490: VerifyStreamState(&stream, num_pages, num_pinned_pages, num_unpinned_pages, buffer_size); line too long (93 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17853 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83 Gerrit-Change-Number: 17853 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 21 Sep 2021 17:28:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10714: Defer advancing read page until the buffer is attached
Hello Quanlong Huang, Qifan Chen, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17853 to look at the new patch set (#3). Change subject: IMPALA-10714: Defer advancing read page until the buffer is attached .. IMPALA-10714: Defer advancing read page until the buffer is attached If a BufferedTupleStream is a read-write stream and set up with attach_on_read = true, BufferedTupleStream::NextReadPage() expects that the page buffer is attached to the output row batch before advancing the read iterator. However, BufferedTupleStream::GetNextInternal() will not attach the page buffer of a completely read page if it is a read-write page. BufferedTupleStream::UnpinStream() need to defer advancing the read page if this situation happens. This patch adds BE test StreamStateTest.UnpinFullyExhaustedReadPage that simulates the corner case. This patch also moves BE test DeferAdvancingReadPage and ShortDebugString into class StreamStateTest to reduce friend class declaration in buffered-tuple-stream.h Testing: - Run and pass BE test StreamStateTest.UnpinFullyExhaustedReadPage. Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83 --- M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h 3 files changed, 214 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17853/3 -- To view, visit http://gerrit.cloudera.org:8080/17853 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83 Gerrit-Change-Number: 17853 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 ) Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables .. Patch Set 1: (8 comments) Looks very reasonable. It is not clear to me how a user select a particular iceberg table snapshot to use in a query. http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11 PS1, Line 11: and the HDFS : block locations were not consistent across the reload > This is not clear to me: if the block locations have changed between querie +1. I think the fix is applicable when an iceberg table of a particular snapshot is to be scanned. In doing so for the same table instance, the partition info about the snapshot can be reused. When only the last version of the table is to be scanned, my impression is that the patch should not be applied. http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@110 PS1, Line 110: "Failed to load table: %s.%s", msTable.getDbName(), msTable.getTableName()), e); nit probably should include the snapshot id in the message, if it is not in e. http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1227 PS1, Line 1227: a nit. an http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1268 PS1, Line 1268: addSummary(response, "Updated table."); Updated table after set table properties http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1272 PS1, Line 1272: Updated table. Updated table after unset table properties http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1295 PS1, Line 1295: if (!needsToUpdateHms) { : loadTableMetadata(tbl, newCatalogVersion, true, true, null, "ALTER Iceberg TABLE " + : params.getAlter_type().name()); : catalog_.addVersionsForInflightEvents(false, tbl, newCatalogVersion); : addTableToCatalogUpdate(tbl, wantMinimalResult, response.result); : return false; : } Should this block of code be executed within 1283-1286 too? That is, if txn is needed, then this block of code is protected by the txn. http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@113 PS1, Line 113: addColumn nit. addColumns()? http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@339 PS1, Line 339: the table parameters nit. update property in 'txn' -- To view, visit http://gerrit.cloudera.org:8080/17857 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5 Gerrit-Change-Number: 17857 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 21 Sep 2021 17:23:08 + Gerrit-HasComments: Yes