Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17576 )
Change subject: IMPALA-10746: Drop table/db from catalog cache when drop table/db HMS apis are accessed from catalog's metastore server. ...................................................................... Patch Set 13: (5 comments) http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2515 PS10, Line 2515: Table existingTbl = db.getTable(tblName); : if (existingTbl == null) return null; > Isn't this same as previous? It is. We need existingTbl to get eventId which is set in newly initialized incompleteTable. http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@462 PS10, Line 462: : // Parse db name. Throw error if parsing fails. : String dbName; : String catName; : String[] catAndDbName = : > This code seems to have a race condition, The database could have been adde Good catch! We should remove line no. 463 - 467 and let catalogOpExecutor_.removeDbIfNotAddedLater() take care of whether to remove table of not. Moreover since drop table/db calls in CatalogOpExecutor add entry into deleteEventLog, the same should happen with drop table/db calls here which is missing. We can do the following: 1. In removeDbIfnotAddedLater() and removeTableIfNotAddedLater(), if the table and db is removed successfully, add drop event id into deleteEntryLog while still holding metastoreDdlLock. 2. CatalogOpExecutor should call removeDbIfNotAddedLater() and removeTableIfNotAddedLater() to drop table/db. Thoughts ? http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@456 PS13, Line 456: if (!invalidateCacheOnDDLs_) { : LOG.debug("invalidateCacheOnDDLs_ flag is false, skipping " + : "cache update for hms operation drop_database on db: " + : databaseName); : return; : } : : // Parse db name. Throw error if parsing fails. : String dbName; : String catName; : String[] catAndDbName = : MetaStoreUtils.parseDbName(databaseName, serverConf_); : catName = catAndDbName[0]; : dbName = catAndDbName[1]; : // get DB to drop from catalog cache : Db dbToDrop = catalog_.getDb(dbName); : if(dbToDrop == null) { : // dbToDrop doesn't exist in cache : return; : } : List<NotificationEvent> events; : try { : events = MetastoreEventsProcessor : .getNextMetastoreEvents(catalog_, currentEventId, : event -> event.getEventType() : .equalsIgnoreCase(MetastoreEvents.DropDatabaseEvent : .DROP_DATABASE_EVENT_TYPE) : && catName.equalsIgnoreCase(event.getCatName()) : && dbName.equalsIgnoreCase(event.getDbName())); : } catch (ImpalaException e) { : String errorMsg = "Unable to process Drop database event for db: " + : dbToDrop.getName(); : LOG.error(errorMsg, e); : throw new MetaException(errorMsg); : } : if (events.isEmpty()) { : throw new MetaException( : "Drop database event not received. Check if notification " + : "events are configured in hive metastore"); : } : long dropEventId = events.get(events.size() - 1).getEventId(); : boolean isRemoved = : catalogOpExecutor_.removeDbIfNotAddedLater(dropEventId, : dbToDrop.getMetaStoreDb()); : if (isRemoved) { : LOG.info("Removed database: " + databaseName + : " from cache due to HMS API: drop_database"); > it seems cleaner to refactor this into a separate method just like we have I intentionally didn't do it since there is only one drop db api. Let me know if you feel otherwise. http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2968 PS13, Line 2968: Map<String, String> tblProperties = catalogTbl.getMetaStoreTable().getParameters(); : if (tblProperties == null || MetaStoreUtils.isTransactionalTable(tblProperties)) { > Can we reuse AcidUtils.isTransactionalTable(catalogTbl.getMetaStoreTable(). How would it help? If we use Impala's AcidUtils method the logic would be: Map<String, String> tblProperties = catalogTbl.getMetaStoreTable().getParameters(); if (tblProperties == null || AcidUtils.isTransactionalTable(tblProperties)) { } http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@3003 PS13, Line 3003: if (!invalidateCacheOnDDLs_) { > When will we set this to false ? invalidateCacheOnDDLs_ By default invalidateCacheOnDDLs_ is set to True. Since it triggers the full reload of the table (after invalidating it from cache), a user can set it to False if there are any performance concerns. The side effect of setting flag to false would be - external tables would be eventually consistent. -- To view, visit http://gerrit.cloudera.org:8080/17576 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2ad2630e2028b8ad26a6272ee766b27e0935c Gerrit-Change-Number: 17576 Gerrit-PatchSet: 13 Gerrit-Owner: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Comment-Date: Thu, 29 Jul 2021 12:35:27 +0000 Gerrit-HasComments: Yes