Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
......................................................................


Patch Set 14:

(49 comments)

Sorry for adding some comments on the older patch sets. I reviewed this over 
multiple days and the update was updated during that. Many of these comments 
are questions to understand things better.

http://gerrit.cloudera.org:8080/#/c/17859/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17859/11//COMMIT_MSG@7
PS11, Line 7: Sync db/table in catalog cache to latest HMS event id when 
performing
            : DDL operations via catalog HMS endpoints
If this patch is close to getting merged, now is a good to add more details 
here and follow the commit message format styles.


http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG@7
PS12, Line 7: IMPALA-10926: Sync db/table in catalog cache to latest HMS event 
id when performing
            : DDL operations via catalog HMS endpoints
Since this patch is not a WIP anymore, can you please follow the commit message 
conventions (limit the subject to 72 chars) and add a detailed description of 
change?


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@457
PS6, Line 457:     for(int i = 0; i < numTables; i++) {
             :       Table tbl = tables[i];
             :       if (!tryWriteLock(tbl)) {
             :         LOG.debug("Could not acquire write lock on table: " + 
tbl.getFullName());
             :         // unlock previously locked tables
             :         for(int j = 0; j < i; j++) {
             :           tables[j].releaseWriteLock();
             :         }
             :         return false;
             :       }
             :       // unlock version write lock for all tables
             :       // except last
             :       if (i < numTables-1) {
             :         versionLock_.writeLock().unlock();
             :       }
I think the versionLock.writeLock().unlock() should be moved out finally block. 
Also if tryWriteLock() throws an exception the locks on the previous tables is 
not released. It is critical to make sure that this method is releasing locks 
correctly under error conditions.


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3652
PS6, Line 3652: setEventFactoryForSyncToLatestEvent
annotate with @VisibleForTesting if this was for testing.


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3658
PS6, Line 3658: public
nit, missing newline.


http://gerrit.cloudera.org:8080/#/c/17859/11/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/11/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@459
PS11, Line 459: tryWriteLock
I had left some comments in the older gerrit url for this patch. Can you please 
address them?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@457
PS12, Line 457:     for(int i = 0; i < numTables; i++) {
              :       Table tbl = tables[i];
              :       if (!tryWriteLock(tbl)) {
              :         LOG.debug("Could not acquire write lock on table: " + 
tbl.getFullName());
              :         // unlock previously locked tables
              :         for(int j = 0; j < i; j++) {
              :           tables[j].releaseWriteLock();
              :         }
              :         return false;
              :       }
              :       // unlock version write lock for all tables
              :       // except last
              :       if (i < numTables-1) {
              :         versionLock_.writeLock().unlock();
              :       }
a RuntimeException thrown at line 459 will not release the table locks as well 
as the versionLock. Can you please make this more robust? Specifically, we want 
locks on all the tables or on none. A typical way to implement this would be
List<Table> lockedTbls = new ArrayList<>(tables.length);
try {
  for (Table tbl : tables) {
    if (!tryWriteLock(tbl)) throw new CatalogException("Could not acquire lock 
on " + tbl.fullName());
    lockedTbls.add(tbl);
  }
} finally {
  try {
  if (lockedTbls.size() != tables.length) {
    for (Table tbl : lockedTbls) tbl.releaseWriteLock();
  }
  } catch (Exception e) {
    LOG.error("Some write locks may not have been released on the tables in + 
lockedTbls.toString()", e);
  }
  // versionLock_ must be released before leaving the method.
  versionLock_.writeLock().unlock();
}


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2672
PS12, Line 2672: getTable
why do we need to override this method?


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115
PS6, Line 115: lastSyncedEventId_
please add a comment here explaining what the value of this field signifies.


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115
PS6, Line 115: volatile
need this?


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@132
PS6, Line 132:   }
add a newline.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Db.java@115
PS12, Line 115: lastSyncedEventId_
nit, add a comment on what this field represents and why it is volatile.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Db.java@132
PS12, Line 132:   }
nit, add a single blank line.


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Table.java@186
PS6, Line 186: volatile
not sure I understand why this needs to be volatile?


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Table.java@211
PS6, Line 211: }
new line.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@186
PS12, Line 186: lastSyncedEventId_
nit, add a comment on what this field represents and why it is volatile.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@212
PS12, Line 212: public
nit, single line space.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@170
PS12, Line 170: initMetrics
since these metrics are not getting exposed anywhere. can we remove these?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@168
PS12, Line 168: start
do we need these changes to this class?


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@166
PS6, Line 166: getInstance
can be simply renamed to get()


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@461
PS12, Line 461: shouldSkipWhenSyncingToLatestEventId
why are we not just using isSelfEvent() here? Would it not be cleaner to change 
its implementation to use lastSyncedEventId instead of self event identifiers?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@576
PS12, Line 576: boolean
is this commented for a reason?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@800
PS12, Line 800:         if (tbl == null) {
              :           infoLog("Skipping on table {}.{} since it does not 
exist in cache", dbName,
              :               tblName);
              :           return true;
              :         }
if the table is not present do we need to check if it is in the deletelog?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@990
PS12, Line 990: InsertEvent
I think we should implement a check for lastSyncedEventId based self-event 
detection in this class as well.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1175
PS12, Line 1175: !shouldSkipHelper(tableBefore_.getDbName(), 
tableBefore_.getTableName()) &&
               :           shouldSkipHelper(tableAfter_.getDbName(), 
tableAfter_.getTableName())
It is not clear to me why we are checking on both the before and after table?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1699
PS12, Line 1699: AlterPartitionEvent
Why does this class not implement shouldSkipWhenSyncingToLatestEventId?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@328
PS12, Line 328: shouldBeAlreadyLocked
Is there a caller which passes this as false?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@329
PS12, Line 329: Preconditions.checkState(tbl.isWriteLockedByCurrentThread(),
              :           String.format("Write lock is not held on table %s by 
current thread",
              :               tbl.getFullName()));
I think this preconditions check should be done in any case to catch bugs where 
a caller is calling this method without holding the lock and passing 
shouldBeAlreadyLocked as false.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@344
PS12, Line 344: if (events.isEmpty()) {
              :         LOG.debug("table {} synced till event id {}. No new HMS 
events to process from "
              :                 + "event id: {}", tbl.getFullName(), 
lastEventId,
              :             lastEventId + 1);
              :         return;
              :       }
it looks like the lastSyncedEventId represents the lastEventId of the table not 
the lastEventId globally. We should update the documentation of the field to 
reflect that accurately.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@354
PS12, Line 354: TODO:
              :        1. Should we stop after processing drop_table
              :           event because drop_table event would remove table
              :           from cache and subsequent create_table event would add
              :           new table object in cache?
Since the objective of this method to sync this table to its latest eventid, I 
think we should continue processing inspite of a drop table event. E.g if the 
table has been recreated since the lastSyncedEvent, it makes sense to me that 
this table will be recreated by this method.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@386
PS12, Line 386: tbl.setLastSyncedEventId(currentEvent.getEventId());
can this logic be moved to the event.processIfEnabled() itself?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@458
PS12, Line 458: TODO: should we ignore case
yes, case should be ignored. Also I think we should make sure that the catalog 
name is same as well or we skip the events on non-default catalogs like events 
processor.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@478
PS12, Line 478: event.getTableName() == null
not sure I get this part. Why do we need to do this?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@533
PS12, Line 533: storeEventFactory.get
can we use the singleton way to get the factory here?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@283
PS12, Line 283: dropDbIfExists
why not sync to latest event here?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@857
PS12, Line 857: // TODO: Check if HMS events are generated for
              :       // both source and dest table. I think exchange partition
              :       // generates drop_partition event for source table
              :       // and add_partition event for destination table
Can this be removed now?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@905
PS12, Line 905:       // TODO: Check if HMS events are generated for
              :       // both source and dest table. I think exchange partition
              :       // generates drop_partition event for source table
              :       // and add_partition event for destination table
remove if verified


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1105
PS12, Line 1105: dropTableIfExists
why not sync to latest event here?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1279
PS12, Line 1279: renameTable
Do we need a preconditions check to confirm that table is locked by this thread?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1402
PS12, Line 1402:       catalog_.addIncompleteTable(dbName, tblName, 
createEventId);
               :       LOG.info("Added incomplete table {}.{} with create event 
id: {}", dbName, tblName,
               :           createEventId);
               :       // sync to latest event ID
               :       tbl = getTableAndAcquireWriteLock(dbName, tblName, 
apiName);
               :       catalog_.getLock().writeLock().unlock();
               :       syncToLatestEventId(tbl, apiName);
intuitively it probably makes more sense to let the syncToLatestEventId create 
the table for you.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@747
PS12, Line 747: TODO: Should we recreate table if its create event id
              :       // does not match the one passed in fn argument?
That sounds like a error condition to me. We could add a 
Preconditions.checkState(existingTable.getCreateEventId() >= eventId) in such a 
case.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@909
PS12, Line 909:   if (dbToAlter == null) {
              :       LOG.debug("Event id: {}, not altering db {} since it does 
not exist in catalogd",
              :           eventId, dbName);
              :
check deleteEventLog here?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@926
PS12, Line 926: updateDbIfExists
not sure if you should use this method here because it takes a version lock 
again. It looks like updateDbIfExists isn't used anymore so it would be good to 
remove it altogether and do the update from this method itself. You should also 
get the catalogVersion for the update before releasing the versionLock on 919 
so that we don't need to take it again.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@757
PS12, Line 757: TODO
still WIP?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1611
PS12, Line 1611:         if 
(!BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls()) {
               :           assertEquals(EventProcessorStatus.NEEDS_INVALIDATE,
               :               eventsProcessor_.getStatus());
               :         }
not sure I fully understand this.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
File 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@199
PS12, Line 199: tsProcessor_.processEve
Will this cause flakiness of the test since the HMS is shared among multiple 
tests. If another test is running concurrently which generates the events, this 
check will fail.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@240
PS12, Line 240: // stored in catalog cache
              :             assertTrue(tbl.getPartitions().size() == 0);
              :
not sure I understand this. The add_partitions API in HMS handler should sync 
the table to latest event here. Why are we asserting that there would not be 
partitions?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@550
PS12, Line 550:
can you also assert here that lastSyncedEventId > createEventId


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@561
PS12, Line 561: log_.setMetastoreEventProcessor(eventsPr
not sure what is the goal of this test? This is testing functionality which was 
pre-existing and mostly covered in MetastoreEventsProcessorTest, right?



--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 14
Gerrit-Owner: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Oct 2021 19:27:41 +0000
Gerrit-HasComments: Yes

Reply via email to