Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 )
Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints ...................................................................... Patch Set 14: (49 comments) Sorry for adding some comments on the older patch sets. I reviewed this over multiple days and the update was updated during that. Many of these comments are questions to understand things better. http://gerrit.cloudera.org:8080/#/c/17859/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17859/11//COMMIT_MSG@7 PS11, Line 7: Sync db/table in catalog cache to latest HMS event id when performing : DDL operations via catalog HMS endpoints If this patch is close to getting merged, now is a good to add more details here and follow the commit message format styles. http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG@7 PS12, Line 7: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing : DDL operations via catalog HMS endpoints Since this patch is not a WIP anymore, can you please follow the commit message conventions (limit the subject to 72 chars) and add a detailed description of change? http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@457 PS6, Line 457: for(int i = 0; i < numTables; i++) { : Table tbl = tables[i]; : if (!tryWriteLock(tbl)) { : LOG.debug("Could not acquire write lock on table: " + tbl.getFullName()); : // unlock previously locked tables : for(int j = 0; j < i; j++) { : tables[j].releaseWriteLock(); : } : return false; : } : // unlock version write lock for all tables : // except last : if (i < numTables-1) { : versionLock_.writeLock().unlock(); : } I think the versionLock.writeLock().unlock() should be moved out finally block. Also if tryWriteLock() throws an exception the locks on the previous tables is not released. It is critical to make sure that this method is releasing locks correctly under error conditions. http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3652 PS6, Line 3652: setEventFactoryForSyncToLatestEvent annotate with @VisibleForTesting if this was for testing. http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3658 PS6, Line 3658: public nit, missing newline. http://gerrit.cloudera.org:8080/#/c/17859/11/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/11/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@459 PS11, Line 459: tryWriteLock I had left some comments in the older gerrit url for this patch. Can you please address them? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@457 PS12, Line 457: for(int i = 0; i < numTables; i++) { : Table tbl = tables[i]; : if (!tryWriteLock(tbl)) { : LOG.debug("Could not acquire write lock on table: " + tbl.getFullName()); : // unlock previously locked tables : for(int j = 0; j < i; j++) { : tables[j].releaseWriteLock(); : } : return false; : } : // unlock version write lock for all tables : // except last : if (i < numTables-1) { : versionLock_.writeLock().unlock(); : } a RuntimeException thrown at line 459 will not release the table locks as well as the versionLock. Can you please make this more robust? Specifically, we want locks on all the tables or on none. A typical way to implement this would be List<Table> lockedTbls = new ArrayList<>(tables.length); try { for (Table tbl : tables) { if (!tryWriteLock(tbl)) throw new CatalogException("Could not acquire lock on " + tbl.fullName()); lockedTbls.add(tbl); } } finally { try { if (lockedTbls.size() != tables.length) { for (Table tbl : lockedTbls) tbl.releaseWriteLock(); } } catch (Exception e) { LOG.error("Some write locks may not have been released on the tables in + lockedTbls.toString()", e); } // versionLock_ must be released before leaving the method. versionLock_.writeLock().unlock(); } http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2672 PS12, Line 2672: getTable why do we need to override this method? http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java File fe/src/main/java/org/apache/impala/catalog/Db.java: http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115 PS6, Line 115: lastSyncedEventId_ please add a comment here explaining what the value of this field signifies. http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115 PS6, Line 115: volatile need this? http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@132 PS6, Line 132: } add a newline. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Db.java File fe/src/main/java/org/apache/impala/catalog/Db.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Db.java@115 PS12, Line 115: lastSyncedEventId_ nit, add a comment on what this field represents and why it is volatile. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Db.java@132 PS12, Line 132: } nit, add a single blank line. http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Table.java@186 PS6, Line 186: volatile not sure I understand why this needs to be volatile? http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Table.java@211 PS6, Line 211: } new line. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@186 PS12, Line 186: lastSyncedEventId_ nit, add a comment on what this field represents and why it is volatile. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@212 PS12, Line 212: public nit, single line space. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java File fe/src/main/java/org/apache/impala/catalog/TableLoader.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@170 PS12, Line 170: initMetrics since these metrics are not getting exposed anywhere. can we remove these? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@168 PS12, Line 168: start do we need these changes to this class? http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@166 PS6, Line 166: getInstance can be simply renamed to get() http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@461 PS12, Line 461: shouldSkipWhenSyncingToLatestEventId why are we not just using isSelfEvent() here? Would it not be cleaner to change its implementation to use lastSyncedEventId instead of self event identifiers? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@576 PS12, Line 576: boolean is this commented for a reason? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@800 PS12, Line 800: if (tbl == null) { : infoLog("Skipping on table {}.{} since it does not exist in cache", dbName, : tblName); : return true; : } if the table is not present do we need to check if it is in the deletelog? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@990 PS12, Line 990: InsertEvent I think we should implement a check for lastSyncedEventId based self-event detection in this class as well. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1175 PS12, Line 1175: !shouldSkipHelper(tableBefore_.getDbName(), tableBefore_.getTableName()) && : shouldSkipHelper(tableAfter_.getDbName(), tableAfter_.getTableName()) It is not clear to me why we are checking on both the before and after table? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1699 PS12, Line 1699: AlterPartitionEvent Why does this class not implement shouldSkipWhenSyncingToLatestEventId? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@328 PS12, Line 328: shouldBeAlreadyLocked Is there a caller which passes this as false? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@329 PS12, Line 329: Preconditions.checkState(tbl.isWriteLockedByCurrentThread(), : String.format("Write lock is not held on table %s by current thread", : tbl.getFullName())); I think this preconditions check should be done in any case to catch bugs where a caller is calling this method without holding the lock and passing shouldBeAlreadyLocked as false. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@344 PS12, Line 344: if (events.isEmpty()) { : LOG.debug("table {} synced till event id {}. No new HMS events to process from " : + "event id: {}", tbl.getFullName(), lastEventId, : lastEventId + 1); : return; : } it looks like the lastSyncedEventId represents the lastEventId of the table not the lastEventId globally. We should update the documentation of the field to reflect that accurately. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@354 PS12, Line 354: TODO: : 1. Should we stop after processing drop_table : event because drop_table event would remove table : from cache and subsequent create_table event would add : new table object in cache? Since the objective of this method to sync this table to its latest eventid, I think we should continue processing inspite of a drop table event. E.g if the table has been recreated since the lastSyncedEvent, it makes sense to me that this table will be recreated by this method. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@386 PS12, Line 386: tbl.setLastSyncedEventId(currentEvent.getEventId()); can this logic be moved to the event.processIfEnabled() itself? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@458 PS12, Line 458: TODO: should we ignore case yes, case should be ignored. Also I think we should make sure that the catalog name is same as well or we skip the events on non-default catalogs like events processor. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@478 PS12, Line 478: event.getTableName() == null not sure I get this part. Why do we need to do this? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@533 PS12, Line 533: storeEventFactory.get can we use the singleton way to get the factory here? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@283 PS12, Line 283: dropDbIfExists why not sync to latest event here? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@857 PS12, Line 857: // TODO: Check if HMS events are generated for : // both source and dest table. I think exchange partition : // generates drop_partition event for source table : // and add_partition event for destination table Can this be removed now? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@905 PS12, Line 905: // TODO: Check if HMS events are generated for : // both source and dest table. I think exchange partition : // generates drop_partition event for source table : // and add_partition event for destination table remove if verified http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1105 PS12, Line 1105: dropTableIfExists why not sync to latest event here? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1279 PS12, Line 1279: renameTable Do we need a preconditions check to confirm that table is locked by this thread? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1402 PS12, Line 1402: catalog_.addIncompleteTable(dbName, tblName, createEventId); : LOG.info("Added incomplete table {}.{} with create event id: {}", dbName, tblName, : createEventId); : // sync to latest event ID : tbl = getTableAndAcquireWriteLock(dbName, tblName, apiName); : catalog_.getLock().writeLock().unlock(); : syncToLatestEventId(tbl, apiName); intuitively it probably makes more sense to let the syncToLatestEventId create the table for you. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@747 PS12, Line 747: TODO: Should we recreate table if its create event id : // does not match the one passed in fn argument? That sounds like a error condition to me. We could add a Preconditions.checkState(existingTable.getCreateEventId() >= eventId) in such a case. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@909 PS12, Line 909: if (dbToAlter == null) { : LOG.debug("Event id: {}, not altering db {} since it does not exist in catalogd", : eventId, dbName); : check deleteEventLog here? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@926 PS12, Line 926: updateDbIfExists not sure if you should use this method here because it takes a version lock again. It looks like updateDbIfExists isn't used anymore so it would be good to remove it altogether and do the update from this method itself. You should also get the catalogVersion for the update before releasing the versionLock on 919 so that we don't need to take it again. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@757 PS12, Line 757: TODO still WIP? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1611 PS12, Line 1611: if (!BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls()) { : assertEquals(EventProcessorStatus.NEEDS_INVALIDATE, : eventsProcessor_.getStatus()); : } not sure I fully understand this. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java File fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@199 PS12, Line 199: tsProcessor_.processEve Will this cause flakiness of the test since the HMS is shared among multiple tests. If another test is running concurrently which generates the events, this check will fail. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@240 PS12, Line 240: // stored in catalog cache : assertTrue(tbl.getPartitions().size() == 0); : not sure I understand this. The add_partitions API in HMS handler should sync the table to latest event here. Why are we asserting that there would not be partitions? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@550 PS12, Line 550: can you also assert here that lastSyncedEventId > createEventId http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@561 PS12, Line 561: log_.setMetastoreEventProcessor(eventsPr not sure what is the goal of this test? This is testing functionality which was pre-existing and mostly covered in MetastoreEventsProcessorTest, right? -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 14 Gerrit-Owner: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Comment-Date: Tue, 05 Oct 2021 19:27:41 +0000 Gerrit-HasComments: Yes