Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17756 )
Change subject: [WIP]: Initial commit: Add logic to sync to latest event id ...................................................................... Patch Set 31: (12 comments) I will take another pass at this today or tomorrow since this patch is pretty big. http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc@116 PS31, Line 116: enable_catalogd_cache_sync_to_latest_event_id may be a better name would be enable_sync_to_latest_event_on_ddls. The "catalogd_cache" part of this config name is redundant. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2141 PS31, Line 2141: if (tbl == null) return null; : LOG.debug("table {} exits in cache, last synced id {}", tbl.getFullName(), : tbl.getLastSyncedEventId()); : // if no validWriteIdList is provided, we return the tbl if its loaded : if (tbl.isLoaded() && validWriteIdList == null) { : LOG.debug("returning already loaded table {}", tbl.getFullName()); : return tbl; : } can you change these to trace? They might spew a lot of logs otherwise. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java File fe/src/main/java/org/apache/impala/catalog/TableLoader.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@62 PS31, Line 62: eventFactory_ It would be good to avoid having eventFactory_ here since it leaks events processor details out in unrelated classes like TableLoader. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@188 PS31, Line 188: initMetrics This is a weird place to have this method? Why are we initializing events processor metrics in TableLoader? http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@269 PS31, Line 269: NewMetastoreEventsFactory Do we need a new factory? Can we not override the isSelfEvent() method directly in the MetastoreEvent.isSelfEvent() to look for enable_catalogd_cache_sync_to_latest_event_id config and return false early. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@841 PS31, Line 841: if (!catalog_.tryLockDb(db)) { : throw new CatalogException(String.format("Couldn't acquire lock on db %s", : db.getName())); : } : catalog_.getLock().writeLock().unlock(); : boolean shouldSkip = false; : if (db.getLastSyncedEventId() >= eventId) { : LOG.info("Skipping event {} on db {} since it is already synced till event id {}", : eventInfo, dbName_, eventId); : shouldSkip = true; : } : db.getLock().unlock(); please use try .. finally pattern when dealing with locks. Also make sure you have some form of UnlockWriteLockIfErronouslyLocked() in the finally block. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@264 PS31, Line 264: import not sure why are we commenting these? http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@316 PS31, Line 316: metastoreEventsMetrics_ I think it would be very confusing to have same named metrics at 2 different places in the code. Why can't we reuse existing metastore event metrics? http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@438 PS31, Line 438: LOG.info("Adding db {} to catalogd, current event id {}", msDb.getName(), : currentEventId); : addDbToCatalog(currentEventId, msDb); Why do we need to add the db here? Shouldn't syncToLatestEventId(db, apiName) do it for us? http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3948 PS31, Line 3948: // set table's last sycned event id if no error occurred and : // table's last synced event id < current event id : if (!errorOccured && syncToLatestEventId && : table.getLastSyncedEventId() < eventId) { : table.setLastSyncedEventId(eventId); : } may be a cleaner way to do this is add if (syncToLatestEventId) table.setLastSyncedEventId(eventId); just before line 3942. You may have to keep a variable to track the number of partitions added and return that on line 3942. http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1613 PS31, Line 1613: // if sync to latest event id is enabled, then the event is skipped : // since table does not exist in cache : if (!BackendConfig.INSTANCE.enableCatalogdCacheSyncToLatestEventId()) { : assertEquals(EventProcessorStatus.NEEDS_INVALIDATE, : eventsProcessor_.getStatus()); : } Isn't this breaking an existing case when the flag is turned on? http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsFileMetadataTest.java File fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsFileMetadataTest.java: http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsFileMetadataTest.java@47 PS31, Line 47: import org.slf4j.Logger; : import org.slf4j.LoggerFactory; skip these changes if not needed? -- To view, visit http://gerrit.cloudera.org:8080/17756 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia822d15725d0a9a9ad1398e10ed4ae3288d0e9ad Gerrit-Change-Number: 17756 Gerrit-PatchSet: 31 Gerrit-Owner: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Comment-Date: Mon, 20 Sep 2021 21:15:55 +0000 Gerrit-HasComments: Yes