Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20367

to look at the new patch set (#5).

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id after the DDL operation for all DDLS from Impala clients

The idea is that whenever Impala performs a DDL operation, catalogD also syncs the affected db/table to its latest event ID in HMS. This way, updates to a db/table are applied in the same order in which they appear in the HMS notification log table, which ensures consistency.

Currently, catalogD applies updates received from Impala clients in place. Instead, it should perform the HMS operation first and then replay all HMS events since the last synced event ID.

Implementation: when the enable_sync_to_latest_event_on_ddls flag is set to true, we perform the DDL operation first (i.e., the HMS operation) and then sync the db/table in catalogD's cache to the latest HMS event for that db/table. Currently, we fetch all events greater than the db/table's lastSyncEventId and filter them in the events processor so that only the current db/table's events are synced. Once HIVE-27499 is implemented, we can fetch only the events for the respective db/table and process them directly. Currently, there is no efficient way to identify whether a db/table has pending events.

Set 'enable_sync_to_latest_event_on_ddls' to true and 'invalidate_hms_cache_on_ddls' to false to use this feature.

Note: We don't use MetastoreEventsProcessor to modify the cache for the ALTER TABLE RENAME operation, because the cache changes it requires are complex. Also, this feature is used to modify the cache for the DML operation 'truncate table'.
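The sync-after-DDL flow described above can be sketched as a small, self-contained model. This is a hypothetical illustration, not the code in CatalogOpExecutor or MetastoreEventsProcessor: `HmsEvent`, `CachedTable`, `usesEventSync`, `syncTableToLatestEvent` and `isAlreadySynced` are invented names, the HMS notification log is an in-memory list, and "replaying" an event only records its ID. It shows the two ideas from the commit message: after the HMS operation, take the events newer than the table's lastSyncEventId, keep only those for that table, and apply them in log order; then advance lastSyncEventId so the background events processor can skip them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of syncing a cached table to the latest HMS event after a DDL. */
public class SyncToLatestEventSketch {

  /** Stand-in (hypothetical) for one row of the HMS notification log. */
  record HmsEvent(long eventId, String dbName, String tblName, String eventType) {}

  /** Stand-in (hypothetical) for a table entry in catalogd's cache. */
  static final class CachedTable {
    final String dbName;
    final String tblName;
    long lastSyncEventId;
    final List<Long> appliedEventIds = new ArrayList<>();

    CachedTable(String dbName, String tblName, long lastSyncEventId) {
      this.dbName = dbName;
      this.tblName = tblName;
      this.lastSyncEventId = lastSyncEventId;
    }
  }

  /**
   * Assumed gate: the feature is on only with enable_sync_to_latest_event_on_ddls=true and
   * invalidate_hms_cache_on_ddls=false; ALTER TABLE RENAME is excluded, as the commit message notes.
   */
  static boolean usesEventSync(boolean enableSyncToLatestEventOnDdls,
      boolean invalidateHmsCacheOnDdls, boolean isRename) {
    return enableSyncToLatestEventOnDdls && !invalidateHmsCacheOnDdls && !isRename;
  }

  /**
   * Called after the HMS operation succeeds. Walks the events in notification-log order and
   * applies only those newer than lastSyncEventId that target this table.
   */
  static void syncTableToLatestEvent(CachedTable tbl, List<HmsEvent> eventsInLogOrder) {
    for (HmsEvent e : eventsInLogOrder) {
      if (e.eventId() <= tbl.lastSyncEventId) {
        continue; // already applied to the cache
      }
      if (!e.dbName().equals(tbl.dbName) || !e.tblName().equals(tbl.tblName)) {
        continue; // event for another db/table: filtered out client-side
      }
      tbl.appliedEventIds.add(e.eventId()); // "replay" the event (just recorded here)
      tbl.lastSyncEventId = e.eventId();
    }
  }

  /** For an event targeting tbl: the background events processor can skip it if already synced. */
  static boolean isAlreadySynced(HmsEvent e, CachedTable tbl) {
    return e.eventId() <= tbl.lastSyncEventId;
  }

  public static void main(String[] args) {
    List<HmsEvent> log = List.of(
        new HmsEvent(101, "db1", "t1", "ALTER_TABLE"),
        new HmsEvent(102, "db1", "t2", "ALTER_TABLE"),
        new HmsEvent(103, "db1", "t1", "ADD_PARTITION"));
    CachedTable t1 = new CachedTable("db1", "t1", 100);
    if (usesEventSync(true, false, false)) {
      syncTableToLatestEvent(t1, log);
    }
    System.out.println(t1.appliedEventIds + " " + t1.lastSyncEventId); // [101, 103] 103
    System.out.println(isAlreadySynced(log.get(2), t1)); // true
  }
}
```

Running `main` prints `[101, 103] 103` and then `true`: event 102 belongs to t2 and is filtered out, and event 103 is afterwards treated as already synced. The client-side filter mirrors the current approach described in the commit message; with HIVE-27499, that filtering could instead happen when fetching events from HMS.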
Testing:
1) Added a few tests in MetaStoreEventProcessorForTest that verify this feature by simulating the metadata sync between HMS and Impala.
2) Added a few tests in the CatalogHmsSyncToLatestEventIdTest class to verify the metadata sync between the HMS endpoint, the Catalog Metastore Server and Impala. The HMS endpoint serves as a common interface for metadata changes made outside the current Impala service, such as by Hive, Spark or another Impala service.
Also verified that the table's lastSyncEventId is updated after the events are synced, and confirmed that the metastore events processor ignores these synced events.

Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
5 files changed, 511 insertions(+), 92 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>