Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17576 )

Change subject: IMPALA-10746: Drop table/db from catalog cache when drop 
table/db HMS apis are accessed from catalog's metastore server.
......................................................................


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2515
PS10, Line 2515:       Table existingTbl = db.getTable(tblName);
               :       if (existingTbl == null) return null;
> Isn't this same as previous?
It is. We need existingTbl to get eventId which is set in newly initialized 
incompleteTable.


http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@462
PS10, Line 462:
              :     // Parse db name. Throw error if parsing fails.
              :     String dbName;
              :     String catName;
              :     String[] catAndDbName =
              :
> This code seems to have a race condition, The database could have been adde
Good catch!
We should remove line no. 463 - 467 and let 
catalogOpExecutor_.removeDbIfNotAddedLater() take care of whether to remove 
table of not.

Moreover since drop table/db calls in CatalogOpExecutor add entry into 
deleteEventLog, the same should happen with drop table/db calls here which is 
missing.

We can do the following:
1. In removeDbIfnotAddedLater() and removeTableIfNotAddedLater(), if the table 
and db is removed successfully, add drop event id into deleteEntryLog while 
still holding metastoreDdlLock.

2. CatalogOpExecutor should call removeDbIfNotAddedLater() and 
removeTableIfNotAddedLater() to drop table/db.

Thoughts ?


http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@456
PS13, Line 456: if (!invalidateCacheOnDDLs_) {
              :       LOG.debug("invalidateCacheOnDDLs_ flag is false, skipping 
" +
              :           "cache update for hms operation drop_database on db: 
" +
              :           databaseName);
              :       return;
              :     }
              :
              :     // Parse db name. Throw error if parsing fails.
              :     String dbName;
              :     String catName;
              :     String[] catAndDbName =
              :         MetaStoreUtils.parseDbName(databaseName, serverConf_);
              :     catName = catAndDbName[0];
              :     dbName = catAndDbName[1];
              :     // get DB to drop from catalog cache
              :     Db dbToDrop = catalog_.getDb(dbName);
              :     if(dbToDrop == null) {
              :       // dbToDrop doesn't exist in cache
              :       return;
              :     }
              :     List<NotificationEvent> events;
              :     try {
              :       events = MetastoreEventsProcessor
              :           .getNextMetastoreEvents(catalog_, currentEventId,
              :               event -> event.getEventType()
              :                   
.equalsIgnoreCase(MetastoreEvents.DropDatabaseEvent
              :                       .DROP_DATABASE_EVENT_TYPE)
              :                   && 
catName.equalsIgnoreCase(event.getCatName())
              :                   && 
dbName.equalsIgnoreCase(event.getDbName()));
              :     } catch (ImpalaException e) {
              :       String errorMsg = "Unable to process Drop database event 
for db: " +
              :           dbToDrop.getName();
              :       LOG.error(errorMsg, e);
              :       throw new MetaException(errorMsg);
              :     }
              :     if (events.isEmpty()) {
              :       throw new MetaException(
              :           "Drop database event not received. Check if 
notification " +
              :               "events are configured in hive metastore");
              :     }
              :     long dropEventId = events.get(events.size() - 
1).getEventId();
              :     boolean isRemoved =
              :         catalogOpExecutor_.removeDbIfNotAddedLater(dropEventId,
              :             dbToDrop.getMetaStoreDb());
              :     if (isRemoved) {
              :       LOG.info("Removed database: " + databaseName +
              :           " from cache due to HMS API: drop_database");
> it seems cleaner to refactor this into a separate method just like we have
I intentionally didn't do it since there is only one drop db api. Let me know 
if you feel otherwise.


http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2968
PS13, Line 2968: Map<String, String> tblProperties = 
catalogTbl.getMetaStoreTable().getParameters();
               :     if (tblProperties == null || 
MetaStoreUtils.isTransactionalTable(tblProperties)) {
> Can we reuse AcidUtils.isTransactionalTable(catalogTbl.getMetaStoreTable().
How would it help? If we use Impala's AcidUtils method the logic would be:

 Map<String, String> tblProperties = 
catalogTbl.getMetaStoreTable().getParameters();
  if (tblProperties == null || AcidUtils.isTransactionalTable(tblProperties)) {

}


http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@3003
PS13, Line 3003:     if (!invalidateCacheOnDDLs_) {
> When will we set this to false ? invalidateCacheOnDDLs_
By default invalidateCacheOnDDLs_ is set to True. Since it triggers the full 
reload of the table (after invalidating it from cache), a user can set it to 
False if there are any performance concerns. The side effect of setting flag to 
false would be - external tables would be eventually consistent.



--
To view, visit http://gerrit.cloudera.org:8080/17576
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2ad2630e2028b8ad26a6272ee766b27e0935c
Gerrit-Change-Number: 17576
Gerrit-PatchSet: 13
Gerrit-Owner: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Jul 2021 12:35:27 +0000
Gerrit-HasComments: Yes

Reply via email to