[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1045 PS4, Line 1045: alterTableOrViewRename(tbl, : TableName.fromThrift(params.getRename_params().getNew_table_name()), : newCatalogVersion, wantMinimalResult, response); : return; I don't see this change applied to renames, while the commit message suggests that it is applied to all DDLs. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1263 PS4, Line 1263: boolean syncToLatestEventId = : BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls(); : if (syncToLatestEventId) { : MetastoreEventsProcessor.syncToLatestEventId(catalog_, tbl, : catalog_.getEventFactoryForSyncToLatestEvent()); refactor idea: this could be moved to a function like CatalogOpExecutor.trySyncToLatestEventId(). If it returns true, then reloading the table / handling in-flight events could be skipped. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1270 PS4, Line 1270: if (tbl.hasInProgressModification()) Preconditions.checkState(reloadMetadata); This seems valid even in the syncToLatestEventId case. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1276 PS4, Line 1276: catalog_.addVersionsForInflightEvents(false, tbl, newCatalogVersion); Are we using in flight event lists for anything when enableSyncToLatestEventOnDdls is true? If not, then checking the lists could be skipped in the event processor. This would solve IMPALA-12461. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1658 PS4, Line 1658: syncToLatestEventId What kind of guarantees do we have for the state of the event processor when syncing the events for a table? I don't see any synchronization - is it possible that it processes events from HMS polling at the same time? My concern is about processing events for the same table - can't this lead to processing the event twice, once in syncToLatestEventId, and once on the the event processor thread? http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7004 PS4, Line 7004: try { : createInsertEvents((FeFsTable) table, update.getUpdated_partitions(), : addedPartitionNames, update.is_overwrite, tblTxn); : } catch (Exception e) { : LOG.error("Failed to fire insert events for table {}", table.getFullName(), e); : } Allowing createInsertEvents() to fail is more problematic in case of enableSyncToLatestEventOnDdls, as we rely on creating these events to reload the table. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7031 PS4, Line 7031: catalog_.addVersionsForInflightEvents(false, table, newCatalogVersion); Can't we skip this too if the events will by synced for the table? http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7040 PS4, Line 7040: // update metastore table/partitions for insert command : boolean syncToLatestEventId = : BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls(); : if (syncToLatestEventId) { : MetastoreEventsProcessor.syncToLatestEventId(catalog_, table, : catalog_.getEventFactoryForSyncToLatestEvent()); : } else { : loadTableMetadata(table, newCatalogVersion, true, false, partsToLoadMetadata, : partitionToEventId, "INSERT", update.getDebug_action()); : } The commit message and the flag name suggests that we only apply this for DDLs, while here it is done for a DML. I also see that it is applied to TRUNCATE. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG@8 PS4, Line 8: for all DDLS Can you add to the title whether Impala syncs the events before or after the DDL operation? I think that this a critical info about the intention of the patch http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG@20 PS4, Line 20: Once HIVE-27499 is implemented, we can update the : snapshot only if there are any pending events. Currently, there is no : efficient way to identify if there are pending events for a db/table. Won't there be always a pending event, as the DDL from Impala creates a new event for the DB/Table? http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1396 PS3, Line 1396: versionLock_.writeLock().lock(); > We always obtain a write lock on the table in CatalogOpExecutor before doin I think that we also need to hold versionLock_ before calling this, as the locking protocol states that version lock should be acquired before table lock. An example where we take version lock first and then the table lock is CatalogServiceCatalog.getOrLoadTable Note that this is probably irrelevant, see comment at line 1391. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1391 PS4, Line 1391: updateMetastoreTable I couldn't find any place where we call this function - is it still used? http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1546 PS4, Line 1546: alterTableRename(testTblName, newTblName, TEST_DB_NAME); : eventsProcessor_.processEvents(); What will happen if a rename is detected during a DDL and not in eventsProcessor_.processEvents();? E.g. a t1 is renamed to t2, then back to t1, and Impala tries to do a DDL on t1. I don't think that we really need a proper handling for this, but it shouldn't cause global issues in the catalog. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 27 Oct 2023 16:14:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. Patch Set 4: (4 comments) > Patch Set 3: > > (4 comments) http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1388 PS3, Line 1388: table ob > nit: "table" Ack http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1396 PS3, Line 1396: versionLock_.writeLock().lock(); > We need the table write lock to do this. Please add a precondition check at We always obtain a write lock on the table in CatalogOpExecutor before doing this (Eg: CatalogOpExecutor L#1634). Nevertheless, it is good to add a precondition here. http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1033 PS3, Line 1033: long oldCatalogVersion = tbl.getCatalogVersion(); > Why do we move tryWriteLock() after getCatalogVersion()? Not an intended change. Let me undo this. http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1040 PS3, Line 1040: if (params.getAlter_type() == TAlterTableType.RENAME_VIEW > I might miss something so got confused. After IMPALA-10926, in CatalogMetas Addressed in the latest patch. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 26 Oct 2023 21:05:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. Patch Set 4: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/14267/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 26 Oct 2023 20:02:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20367 to look at the new patch set (#4). Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients The idea is that when any DDL operation is performed by Impala, it also syncs the db/table to its latest event ID as per HMS. This way updates to a db/table's are applied in the same order as they appear in the Notification log table in HMS which ensures consistency. Currently catalogD applies any updates received from Impala clients in-place. Instead it should perform an HMS operation first and then replay all the HMS events since the last synced event id. Implementation: when the enable_sync_to_latest_event_on_ddls flag is set to true, we fetch the latest snapshot of db/table and update it in the catalogD's cache. Once HIVE-27499 is implemented, we can update the snapshot only if there are any pending events. Currently, there is no efficient way to identify if there are pending events for a db/table. Set 'enable_sync_to_latest_event_on_ddls' to true and 'invalidate_hms_cache_on_ddls' to false to use this feature. Testing: 1) Added few tests in the MetaStoreEventProcessorForTest to verify this feature that simulates the metadata sync between HMS and Impala. 2) Added few tests in the CatalogHmsSyncToLatestEventIdTest class to the metadata sync between HMS end point, Catalog Metastore Server and Impala. The HMS end point serves as common interface to metadata changes outside the current Impala service such as Hive, Spark or other Impala service. Also verified the table lastSyncEventId is updated after the events are sync and confirmed that metastore event processor ignored these synced events. Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java 6 files changed, 553 insertions(+), 101 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/4 -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1388 PS3, Line 1388: database nit: "table" http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1396 PS3, Line 1396: tbl.setMetaStoreTable(msTbl); We need the table write lock to do this. Please add a precondition check at the beginning of the method: Preconditions.checkState(tbl.isWriteLockedByCurrentThread()); http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1033 PS3, Line 1033: tryWriteLock(tbl); Why do we move tryWriteLock() after getCatalogVersion()? http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1040 PS3, Line 1040: tbl = catalog_.updateMetastoreTable(msTbl, tbl); I might miss something so got confused. After IMPALA-10926, in CatalogMetastoreServiceHandler we perform the catalog operation in the following steps: 1. Acquire write lock on the table 2. Perform ddl operation in HMS 3. Sync table till the latest event id (as per HMS) since its last synced event id. Also update the last-synced-event-id at the end. #2 updates HMS and #3 updates the local catalog cache based on the events only. Catalogd won't modify the table cache based on the ddl request. Changes from different sources (local impala, external impala or other HMS clients) are applied in the order as they are executed in HMS, which brings us a higher consistency. However, the current patch just fetch the latest HMS object and use it to update the local cache before applying the ddl request. The last-synced-event-id is not updated at the end. This doesn't match what you mentioned in the commit message: > Currently catalogD applies any updates received from Impala clients in-place. Instead it should perform an HMS operation first and then replay all the HMS events since the last synced event id. Could you explain more about the current solution? -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 19 Oct 2023 11:01:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14190/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 16 Oct 2023 19:14:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1006 PS2, Line 1006: msTbl > How do we deal with the case when table is renamed through external source( If the rename event happened outside of metastore, how do you expect it sync to catalogD? CatalogD can deal with events present in the metastore. http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1007 PS2, Line 1007: tbl = catalog_.updateMetastoreTable(msTbl, tbl); > We don't seem to reload table ? Probably table schema, partitions would hav We don't have to reload entire hdfs table for 2 reasons. 1) It's a very costly operation to reload hdfs tables. (even during alter table query from impala, we only reload what is really required in the table like table schema or file schema). 2) We always access the metastore table from the cached impala representation of the table, so we would apply the current change on top of the latest metastore's table and reload the part that is really required. All in all, I believe reloading the table without knowing what changed is too aggressive and it would have complications in the performance. http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1011 PS2, Line 1011: long oldCatalogVersion = tbl.getCatalogVersion(); > I believe, we need to get the oldCatalogVersion before updating tbl. Becaus Ack http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2398 PS2, Line 2398: long newCatalogVersion = catalog_.incrementAndGetCatalogVersion(); > It is good to get the newCatalogVersion after catalog_.updateMetastoreTable Ack -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 2 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 16 Oct 2023 18:49:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20367 to look at the new patch set (#3). Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients The idea is that when any DDL operation is performed by Impala, it also syncs the db/table to its latest event ID as per HMS. This way updates to a db/table's are applied in the same order as they appear in the Notification log table in HMS which ensures consistency. Currently catalogD applies any updates received from Impala clients in-place. Instead it should perform an HMS operation first and then replay all the HMS events since the last synced event id. Implementation: when the enable_sync_to_latest_event_on_ddls flag is set to true, we fetch the latest snapshot of db/table and update it in the catalogD's cache. Once HIVE-27499 is implemented, we can update the snapshot only if there are any pending events. Currently, there is no efficient way to identify if there are pending events for a db/table. Set enable_sync_to_latest_event_on_ddls to true to use this feature. Testing: Added few tests in the MetaStoreEventProcessorForTest to verify this feature. Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 4 files changed, 150 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/3 -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
k.venureddy2...@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. Patch Set 2: (4 comments) few observations http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1006 PS2, Line 1006: msTbl How do we deal with the case when table is renamed through external source(hive/other catalogd) and the rename event is still not synced on this catalogd ? http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1007 PS2, Line 1007: tbl = catalog_.updateMetastoreTable(msTbl, tbl); We don't seem to reload table ? Probably table schema, partitions would have changed. http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1011 PS2, Line 1011: long oldCatalogVersion = tbl.getCatalogVersion(); I believe, we need to get the oldCatalogVersion before updating tbl. Because addTableToCatalogUpdate() is done only when oldCatalogVersion is different from tbl.getCatalogVersion(). http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2398 PS2, Line 2398: long newCatalogVersion = catalog_.incrementAndGetCatalogVersion(); It is good to get the newCatalogVersion after catalog_.updateMetastoreTable(). Otherwise newCatalogVersion would be smaller than table.getCatalogVersion(). -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 2 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 29 Sep 2023 20:16:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG@8 PS1, Line 8: Impala clien > I think that impala shell is misleading here, as any client initiate DDLs. Ack http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG@10 PS1, Line 10: by Impala, it al > Same as for the title, IMO "performed by Impala" would be better. Ack http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG@14 PS1, Line 14: Impala clien > same as for the title Ack http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG@20 PS1, Line 20: HIVE-27499 > Do you think that this feature can practically work before HIVE-27499? My Before HIVE-27499, we'll always fetch the HMS metadata no matter the event processor is lagging or not. Unfortunately, It does add some overhead. But this overhead ensures consistency. Yeah, any thing can happen between RPC call and processing of DDL, as it happens today and we cannot do anything about it. http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG@24 PS1, Line 24: Set enable_sync_to_latest_event_on_ddls to true to use this feature. > This flag is not exposed yet, we can only set it in tests. This is global flag https://github.com/apache/impala/blob/master/be/src/catalog/catalog-server.cc#L121 and by default it is disabled. http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1369 PS1, Line 1369: the table with the given metastore table > This function seems to be about tables, not databases. Ack http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1012 PS1, Line 1012: // Get a new catalog version to assign to the table being altered. > Calling an HMS RPC when the global catalog version lock is locked is proble Valid point. We should always do RPC call before locking the catalog version. http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1488 PS1, Line 1488: impala shell > We are not using the shell here (or any Impala client) here, right? My unde Ack. I'm just simulating what would happen once the request lands on Impala's catalog OpExecutor. http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1498 PS1, Line 1498: BackendConfig.INSTANCE.setHMSPollingIntervalInSeconds(10); > Will this have any effect? We seem check this only once, when the event pro Ah!! I thought this would take into effect. Is there any other way to make the event processor lagging in the Java unit tests? We would want to this in Java tests because we need to verify loaded table in java. There is no way we could verify this in end-to-end python tests. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 2 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 05 Sep 2023 19:46:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13796/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 2 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 22 Aug 2023 02:02:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20367 to look at the new patch set (#2). Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients .. IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients The idea is that when any DDL operation is performed by Impala, it also syncs the db/table to its latest event ID as per HMS. This way updates to a db/table's are applied in the same order as they appear in the Notification log table in HMS which ensures consistency. Currently catalogD applies any updates received from Impala clients in-place. Instead it should perform an HMS operation first and then replay all the HMS events since the last synced event id. Implementation: when the enable_sync_to_latest_event_on_ddls flag is set to true, we fetch the latest snapshot of db/table and update it in the catalogD's cache. Once HIVE-27499 is implemented, we can update the snapshot only if there are any pending events. Currently, there is no efficient way to identify if there are pending events for a db/table. Set enable_sync_to_latest_event_on_ddls to true to use this feature. Testing: Added few tests in the MetaStoreEventProcessorForTest to verify this feature. Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 4 files changed, 149 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/2 -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 2 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala