[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 1 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 Sep 2021 05:40:57 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 ) Change subject: [WIP] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table. .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/9476/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 1 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 Sep 2021 00:25:12 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table.
Amogh Margoor has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17860 Change subject: [WIP] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table. .. [WIP] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table. Currently, entire row is materialized, before filtering upon it during scan. Instead, cost can be saved if only the columns required for filtering are materialized first and then rest of the columns are materialized only for rows surviving after filter. Performance: TBD Testing: TBD Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/hdfs-columnar-scanner-ir.cc M be/src/exec/hdfs-columnar-scanner.cc M be/src/exec/hdfs-columnar-scanner.h M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-collection-column-reader.cc M be/src/exec/parquet/parquet-collection-column-reader.h M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-readers.h M be/src/exec/scratch-tuple-batch.h 12 files changed, 344 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/17860/1 -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 1 Gerrit-Owner: Amogh Margoor
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9475/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 1 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 20 Sep 2021 23:50:34 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP]: Initial commit to acquire table/database lock in metastore server before any HMS operation
Sourabh Goyal has abandoned this change. ( http://gerrit.cloudera.org:8080/17703 ) Change subject: [WIP]: Initial commit to acquire table/database lock in metastore server before any HMS operation .. Abandoned Consolidating multiple patches into one i.e https://gerrit.cloudera.org/c/17859/ -- To view, visit http://gerrit.cloudera.org:8080/17703 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I085eab20db61282daf4549ddbcc018aaf63cc361 Gerrit-Change-Number: 17703 Gerrit-PatchSet: 9 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] [WIP]: Initial commit: Add logic to sync to latest event id
Sourabh Goyal has abandoned this change. ( http://gerrit.cloudera.org:8080/17756 ) Change subject: [WIP]: Initial commit: Add logic to sync to latest event id .. Abandoned Consolidating multiple patches into one i.e https://gerrit.cloudera.org/c/17859/ -- To view, visit http://gerrit.cloudera.org:8080/17756 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ia822d15725d0a9a9ad1398e10ed4ae3288d0e9ad Gerrit-Change-Number: 17756 Gerrit-PatchSet: 31 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7478/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 1 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 20 Sep 2021 23:31:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17859/1/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17859/1/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@704 PS1, Line 704: org.apache.impala.catalog.Table tbl = getTableAndAcquireWriteLock(partition.getDbName(), line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 1 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 20 Sep 2021 23:29:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Sourabh Goyal has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17859 Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/test/java/org/apache/impala/catalog/AlterDatabaseTest.java A fe/src/test/java/org/apache/impala/catalog/MetastoreApiTestUtils.java M fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java M fe/src/test/java/org/apache/impala/catalog/metastore/AbstractCatalogMetastoreTest.java M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsFileMetadataTest.java A fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M tests/custom_cluster/test_metastore_service.py 27 files changed, 3,535 insertions(+), 240 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17859/1 -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 1 Gerrit-Owner: Sourabh Goyal
[Impala-ASF-CR] [WIP]: Initial commit: Add logic to sync to latest event id
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17756 ) Change subject: [WIP]: Initial commit: Add logic to sync to latest event id .. Patch Set 31: (12 comments) I will take another pass at this today or tomorrow since this patch is pretty big. http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc@116 PS31, Line 116: enable_catalogd_cache_sync_to_latest_event_id may be a better name would be enable_sync_to_latest_event_on_ddls. The "catalogd_cache" part of this config name is redundant. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2141 PS31, Line 2141: if (tbl == null) return null; : LOG.debug("table {} exits in cache, last synced id {}", tbl.getFullName(), : tbl.getLastSyncedEventId()); : // if no validWriteIdList is provided, we return the tbl if its loaded : if (tbl.isLoaded() && validWriteIdList == null) { : LOG.debug("returning already loaded table {}", tbl.getFullName()); : return tbl; : } can you change these to trace? They might spew a lot of logs otherwise. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java File fe/src/main/java/org/apache/impala/catalog/TableLoader.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@62 PS31, Line 62: eventFactory_ It would be good to avoid having eventFactory_ here since it leaks events processor details out in unrelated classes like TableLoader. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@188 PS31, Line 188: initMetrics This is a weird place to have this method? Why are we initializing events processor metrics in TableLoader? http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@269 PS31, Line 269: NewMetastoreEventsFactory Do we need a new factory? Can we not override the isSelfEvent() method directly in the MetastoreEvent.isSelfEvent() to look for enable_catalogd_cache_sync_to_latest_event_id config and return false early. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@841 PS31, Line 841: if (!catalog_.tryLockDb(db)) { : throw new CatalogException(String.format("Couldn't acquire lock on db %s", : db.getName())); : } : catalog_.getLock().writeLock().unlock(); : boolean shouldSkip = false; : if (db.getLastSyncedEventId() >= eventId) { : LOG.info("Skipping event {} on db {} since it is already synced till event id {}", : eventInfo, dbName_, eventId); : shouldSkip = true; : } : db.getLock().unlock(); please use try .. finally pattern when dealing with locks. Also make sure you have some form of UnlockWriteLockIfErronouslyLocked() in the finally block. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@264 PS31, Line 264: import not sure why are we commenting these? http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@316 PS31, Line 316: metastoreEventsMetrics_ I think it would be very confusing to have same named metrics at 2 different places in the code. Why can't we reuse existing metastore event metrics? http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@438 PS31, Line 438: LOG.info("Adding db {} to catalogd, current event id {}", msDb.getName(), : currentEventId); : addDbToCatalog(currentEventId, msDb); Why do we
[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 ) Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@31 PS1, Line 31: service ID and new catalog version for the Iceberg table > Doesn't Iceberg have something similar to catalog version, e.g. something l +1 if we have native version number support in Iceberg it is better to use that detect self-events. I am okay doing it in a separate patch if needed. The self-events logic for ALTER events is up for a overhaul in https://gerrit.cloudera.org/#/c/17756/ which would use the eventId based detection instead of catalog version number. http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@348 PS1, Line 348: hdfsTable_ : .load(false, msClient, msTable_, true, true, false, null, null,null, reason); : pathHashToFileDescMap_ = Utils.loadAllPartition(this); This is probably unrelated to this patch I was curious to understand what is the difference between the file descriptors which are loaded by hdfsTable_.load() and the loadAllPartition(this) method here. Is there a overlap in these 2 collections of file descriptors? http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@481 PS1, Line 481: TGetPartialCatalogObjectResponse resp = : getHdfsTable().getPartialInfo(req, missingPartialInfos); : if (req.table_info_selector.want_iceberg_snapshot) { : resp.table_info.setIceberg_snapshot( : FeIcebergTable.Utils.createTIcebergSnapshot(this)); : } are we sending some file descriptors twice here? Some from the hdfsTable and some from the pathHashToFileDescMap? http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@116 PS1, Line 116: pathHashToFileDescMap_ Am I correct in understanding that pathHashToFileDescMap_ is always loaded from the snapshot? If yes, does catalogd need to load the filedescriptors in the hdfsTable_ backing this iceberg table? http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2915 PS1, Line 2915: newCatalogVersion = catalog_.incrementAndGetCatalogVersion(); It is not clear to me why do we need to increment the catalog version here again? http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2918 PS1, Line 2918: iceTxn.commitTransaction(); : return newCatalogVersion; Can you move these statements into a finally block and avoid code duplication in the if block and later down below? http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6070 PS1, Line 6070: addVersionsForInflightEvents Can we add test coverage for this in the test_self_events? -- To view, visit http://gerrit.cloudera.org:8080/17857 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5 Gerrit-Change-Number: 17857 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 20 Sep 2021 20:48:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 ) Change subject: IMPALA-9857: Batching of consecutive partition events .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2713 PS5, Line 2713: return hmsPartitions.size(); Shouldn't we return hmsPartToHdfsPart.size() since those are the actual partitions we are reloading in the cache? http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@279 PS2, Line 279: if (!current.canBeBatched(next)) { Currently one of the conditions of batching two events together is - event ids should be consecutive. This means that if there are events intermingled from multiple tables, then this batching logic won't be very effective. Do we see intermingled events in production? If yes, should we improve the batching logic here? http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641 PS2, Line 1641: MetastoreEventPropertyKey.CATALOG_VERSION.getKey(), "-1")); > Since the default value of polling is 1s and we fetch the events in the bat I understand that there may not be significant performance gain. I still feel that we should batch alter partition events from other clusters as well (and ignore self events when actually processing a batched event) considering that we would be syncing table till latest event id in future where self event logic will not be applicable anymore. -- To view, visit http://gerrit.cloudera.org:8080/17848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73 Gerrit-Change-Number: 17848 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 20 Sep 2021 20:31:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 ) Change subject: IMPALA-9857: Batching of consecutive partition events .. Patch Set 5: (4 comments) Did a first round. The change looks really great. I can take a deeper look on Wednesday. http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@955 PS5, Line 955: nit: too much indent http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1027 PS5, Line 1027: nit: +2 indent http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1811 PS5, Line 1811: nit: indentation is off http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java File fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java: http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java@100 PS5, Line 100: idx >=0 && idx < insertEventIds_.size() This precondition is checked by the Java runtime. -- To view, visit http://gerrit.cloudera.org:8080/17848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73 Gerrit-Change-Number: 17848 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 20 Sep 2021 19:23:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/9474/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Mon, 20 Sep 2021 18:41:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2604 PS1, Line 2604: public boolean reloadTableIfExists(String dbName, String tblName, long eventId, String reason) line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2610 PS1, Line 2610: LOG.debug("Not reloading the table {}.{} since it is recreated.", dbName, tblName); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1709 PS1, Line 1709: Preconditions.checkState(eventType_.equals(MetastoreEventType.ALLOC_WRITE_ID_EVENT)); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1712 PS1, Line 1712: MetastoreEventsProcessor.getMessageDeserializer().getAllocWriteIdMessage(event.getMessage()); line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1081 PS1, Line 1081: Path parentPath = partition == null ? new Path(msTbl.getSd().getLocation()) : new Path(partition.getSd().getLocation()); line too long (124 > 90) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1083 PS1, Line 1083: List newFiles = addFilesToDirectory(deltaPath, "testFile.", totalNumberOfFilesToAdd, false); line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2283 PS1, Line 2283: private void testInsertIntoTransactionalTable(String tblName, boolean toAbort, boolean isPartitioned) line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2297 PS1, Line 2297: long writeId = MetastoreShim.allocateTableWriteId(client.getHiveClient(), txnId, TEST_DB_NAME, tblName); line too long (110 > 90) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2304 PS1, Line 2304: partition = client.getHiveClient().getPartition(TEST_DB_NAME, tblName, Arrays.asList("1")); line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3088 PS1, Line 3088: TEST_DB_NAME, tblName, test_insert_tbl, updated_partitions, isOverwrite, txnId, writeId); line too long (97 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 20 Sep 2021 18:29:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17858 Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Trigger partition level refreshing for addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config incremental_refresh_acid, which can switch on/off the fine-grained table refreshing. Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 12 files changed, 672 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/1 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 15: (2 comments) Just like to double check on the expected test results. http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450 PS14, Line 450: > I modified the unit test because autotest failed.FastLimitCheckExceededRows See my other comment. http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729 PS14, Line 2729: 0 > Usually, rowbatch contains 1024 rows, stream agg processeed a rowbatch, the I see. What I mean is to assure the value is greater than 0. So maybe use [1-9][0-9]*? -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 15 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 20 Sep 2021 16:13:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10714: Defer advancing read page until the buffer is attached
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17853 ) Change subject: IMPALA-10714: Defer advancing read page until the buffer is attached .. Patch Set 2: (3 comments) Looks good!. Some additional test cases may be needed to assure the new logic is sound. http://gerrit.cloudera.org:8080/#/c/17853/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17853/2//COMMIT_MSG@10 PS2, Line 10: expect nit. expects http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc@1388 PS2, Line 1388: PrepareForReadWrite May need to add the following additional test cases 1. PrepareForRead(true, ...) for read only use case; 2. PrepareForReadWrite(false, ...) that will not delay the advance of the 1st read page. http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc@1397 PS2, Line 1397: 2 May need to test #pages == 1. -- To view, visit http://gerrit.cloudera.org:8080/17853 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83 Gerrit-Change-Number: 17853 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 20 Sep 2021 16:06:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 ) Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables .. Patch Set 1: (6 comments) Some high level comments. The code looks good to me so far, but I will go through it once more. http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11 PS1, Line 11: and the HDFS : block locations were not consistent across the reload This is not clear to me: if the block locations have changed between queries, then it makes sense to schedule them differently, but if they didn't change (which seems the more common case to me), then how are they different? http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@25 PS1, Line 25: Fixing the above revealed another bug. Before this patch we didn't : handle self-events of Iceberg tables. Can you add more information about this? For example what events are generated in HMS for different Iceberg operations? I guess that schema changes will be treated similarly to the case when Impala calls HMS directly, but I can't imagine what happens in case of INSERTs at the moment. http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@31 PS1, Line 31: service ID and new catalog version for the Iceberg table Doesn't Iceberg have something similar to catalog version, e.g. something like "snapshot id", that could be used to check whether we need to reload data? The current way we handle self-events is a horrible hack IMO which is needed because HMS doesn't provide a usable version number for table/partition metadata. Just hoping that Iceberg does have something like that :) http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@34 PS1, Line 34: * added e2e test for the scan range assignment It is a bit tricky, but self-event handling can be also tested, see https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java#L508 http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java: http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@104 PS1, Line 104: properties.setProperty("external.table.purge", "TRUE"); Is this change related to the ones mentioned in the commit message? http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@151 PS1, Line 151: properties.setProperty(IcebergTable.ICEBERG_CATALOG, : tableProps.get(IcebergTable.ICEBERG_CATALOG)); Is this change related to the ones mentioned in the commit message? -- To view, visit http://gerrit.cloudera.org:8080/17857 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5 Gerrit-Change-Number: 17857 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 20 Sep 2021 15:40:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17847 ) Change subject: IMPALA-10838: Error when struct returned from WITH() .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9473/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 Gerrit-Change-Number: 17847 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 20 Sep 2021 13:34:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 ) Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9472/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17857 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5 Gerrit-Change-Number: 17857 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 20 Sep 2021 13:33:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17847 ) Change subject: IMPALA-10838: Error when struct returned from WITH() .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/17847/5/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java: http://gerrit.cloudera.org:8080/#/c/17847/5/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@119 PS5, Line 119: // Even though we are in a struct, it gives null, probably because it is not fully analyzed. line too long (96 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 Gerrit-Change-Number: 17847 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 20 Sep 2021 13:13:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()
Daniel Becker has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17847 ) Change subject: IMPALA-10838: Error when struct returned from WITH() .. IMPALA-10838: Error when struct returned from WITH() The following query fails: ''' with sub as ( select id, outer_struct from functional_orc_def.complextypes_nested_structs) select sub.id, sub.outer_struct.inner_struct2 from sub; ''' with the following error: ''' ERROR: IllegalStateException: Illegal reference to non-materialized tuple: debugname=InlineViewRef sub alias=sub tid=6 ''' while if 'outer_struct.inner_struct2' is added to the select list of the inline view, the query works as expected. This change fixes the problem by two modifications: - if a field of a struct needs to be materialised, also materialise all of its enclosing structs (ancestors) - in expression substitutions (in org.apache.impala.analysis.Expr.substituteImpl), if there is no substitution mapping for a field of a struct but there is a mapping for an enclosing struct, find the mapping for that enclosing struct and from its subexpressions use the one corresponding to the original expression for substitution This change also changes the way struct fields are materialised: until now, if a member of a struct was needed to be materialised, the whole struct, including other members of the struct were materialised. This behaviour can lead to using significantly more memory than necessaty if we for example query a single member of a large struct. This change modifies this behaviour so that we only materialise the struct members that are actually needed. Tests: - added queries that are fixed by this change (including the one above) in nested-struct-in-select-list.test Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 --- M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test 6 files changed, 227 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/17847/5 -- To view, visit http://gerrit.cloudera.org:8080/17847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 Gerrit-Change-Number: 17847 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17857 Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables .. IMPALA-10914: Consistently schedule scan ranges for Iceberg tables Before this patch Impala inconsistently scheduled scan ranges for Iceberg tables on HDFS, in local catalog mode. It did so because LocalIcebergTable reloaded all the files descriptors, and the HDFS block locations were not consistent across the reloads. Impala's scheduler uses the block location list for scan range assignment, hence the assignments were inconsistent between queries. This has a negative effect on caching and hence hit performance quite badly. It is redundant and expensive to reload file descriptors for each query in local catalog mode. This patch extends the GetPartialInfo() RPC with Iceberg-specific snapshot information. It means that the coordinator is now able to fetch Iceberg data file descriptors from the CatalogD. This way scan range assignment becomes consistent because we reuse the same file descriptors with the same block location information. Fixing the above revealed another bug. Before this patch we didn't handle self-events of Iceberg tables. When an Iceberg table is stored in the HiveCatalog it means that Iceberg will update the HMS table on modifications. Then Catalogd processes these modifications again when they were arrive via the event notification mechanism. I fixed this by creating Iceberg transactions in which I set the catalog service ID and new catalog version for the Iceberg table. Testing: * added e2e test for the scan range assignment Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5 --- M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test M tests/query_test/test_iceberg.py 16 files changed, 290 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/17857/1 -- To view, visit http://gerrit.cloudera.org:8080/17857 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5 Gerrit-Change-Number: 17857 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17847 ) Change subject: IMPALA-10838: Error when struct returned from WITH() .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9471/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 Gerrit-Change-Number: 17847 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 20 Sep 2021 10:25:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17847 ) Change subject: IMPALA-10838: Error when struct returned from WITH() .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17847/4/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java: http://gerrit.cloudera.org:8080/#/c/17847/4/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@119 PS4, Line 119: // Even though we are in a struct, it gives null, probably because it is not fully analyzed. line too long (96 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 Gerrit-Change-Number: 17847 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 20 Sep 2021 10:03:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()
Daniel Becker has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17847 ) Change subject: IMPALA-10838: Error when struct returned from WITH() .. IMPALA-10838: Error when struct returned from WITH() The following query fails: ''' with sub as ( select id, outer_struct from functional_orc_def.complextypes_nested_structs) select sub.id, sub.outer_struct.inner_struct2 from sub; ''' with the following error: ''' ERROR: IllegalStateException: Illegal reference to non-materialized tuple: debugname=InlineViewRef sub alias=sub tid=6 ''' while if 'outer_struct.inner_struct2' is added to the select list of the inline view, the query works as expected. This change fixes the problem by two modifications: - if a field of a struct needs to be materialised, also materialise all of its enclosing structs (ancestors) - in expression substitutions (in org.apache.impala.analysis.Expr.substituteImpl), if there is no substitution mapping for a field of a struct but there is a mapping for an enclosing struct, find the mapping for that enclosing struct and from its subexpressions use the one corresponding to the original expression for substitution This change also changes the way struct fields are materialised: until now, if a member of a struct was needed to be materialised, the whole struct, including other members of the struct were materialised. This behaviour can lead to using significantly more memory than necessaty if we for example query a single member of a large struct. This change modifies this behaviour so that we only materialise the struct members that are actually needed. Tests: - added queries that are fixed by this change (including the one above) in nested-struct-in-select-list.test Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 --- M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test 6 files changed, 197 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/17847/4 -- To view, visit http://gerrit.cloudera.org:8080/17847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 Gerrit-Change-Number: 17847 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins