Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17756 )

Change subject: [WIP]: Initial commit: Add logic to sync to latest event id
......................................................................


Patch Set 31:

(12 comments)

I will take another pass at this today or tomorrow since this patch is pretty 
big.

http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc@116
PS31, Line 116: enable_catalogd_cache_sync_to_latest_event_id
may be a better name would be enable_sync_to_latest_event_on_ddls. The 
"catalogd_cache" part of this config name is redundant.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2141
PS31, Line 2141:       if (tbl == null) return null;
               :       LOG.debug("table {} exits in cache, last synced id {}", 
tbl.getFullName(),
               :           tbl.getLastSyncedEventId());
               :       // if no validWriteIdList is provided, we return the tbl 
if its loaded
               :       if (tbl.isLoaded() && validWriteIdList == null) {
               :         LOG.debug("returning already loaded table {}", 
tbl.getFullName());
               :         return tbl;
               :       }
can you change these to trace? They might spew a lot of logs otherwise.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@62
PS31, Line 62: eventFactory_
It would be good to avoid having eventFactory_ here since it leaks events 
processor details out in unrelated classes like TableLoader.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@188
PS31, Line 188: initMetrics
This is a weird place to have this method? Why are we initializing events 
processor metrics in TableLoader?


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@269
PS31, Line 269: NewMetastoreEventsFactory
Do we need a new factory? Can we not override the isSelfEvent() method directly 
in the MetastoreEvent.isSelfEvent() to look for 
enable_catalogd_cache_sync_to_latest_event_id config and return false early.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@841
PS31, Line 841:       if (!catalog_.tryLockDb(db)) {
              :         throw new CatalogException(String.format("Couldn't 
acquire lock on db %s",
              :             db.getName()));
              :       }
              :       catalog_.getLock().writeLock().unlock();
              :       boolean shouldSkip = false;
              :       if (db.getLastSyncedEventId() >= eventId) {
              :         LOG.info("Skipping event {} on db {} since it is 
already synced till event id {}",
              :             eventInfo, dbName_, eventId);
              :         shouldSkip = true;
              :       }
              :       db.getLock().unlock();
please use try .. finally pattern when dealing with locks. Also make sure you 
have some form of UnlockWriteLockIfErronouslyLocked() in the finally block.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@264
PS31, Line 264: import
not sure why are we commenting these?


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@316
PS31, Line 316: metastoreEventsMetrics_
I think it would be very confusing to have same named metrics at 2 different 
places in the code. Why can't we reuse existing metastore event metrics?


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@438
PS31, Line 438:       LOG.info("Adding db {} to catalogd, current event id {}", 
msDb.getName(),
              :           currentEventId);
              :       addDbToCatalog(currentEventId, msDb);
Why do we need to add the db here? Shouldn't syncToLatestEventId(db, apiName) 
do it for us?


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3948
PS31, Line 3948:       //  set table's last sycned event id  if no error 
occurred and
               :       //  table's last synced event id < current event id
               :       if (!errorOccured && syncToLatestEventId &&
               :           table.getLastSyncedEventId() < eventId) {
               :         table.setLastSyncedEventId(eventId);
               :       }
may be a cleaner way to do this is add
if (syncToLatestEventId) table.setLastSyncedEventId(eventId);
just before line 3942. You may have to keep a variable to track the number of 
partitions added and return that on line 3942.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1613
PS31, Line 1613:         // if sync to latest event id is enabled, then the 
event is skipped
               :         // since table does not exist in cache
               :         if 
(!BackendConfig.INSTANCE.enableCatalogdCacheSyncToLatestEventId()) {
               :           assertEquals(EventProcessorStatus.NEEDS_INVALIDATE,
               :               eventsProcessor_.getStatus());
               :         }
Isn't this breaking an existing case when the flag is turned on?


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsFileMetadataTest.java
File 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsFileMetadataTest.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsFileMetadataTest.java@47
PS31, Line 47: import org.slf4j.Logger;
             : import org.slf4j.LoggerFactory;
skip these changes if not needed?



--
To view, visit http://gerrit.cloudera.org:8080/17756
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia822d15725d0a9a9ad1398e10ed4ae3288d0e9ad
Gerrit-Change-Number: 17756
Gerrit-PatchSet: 31
Gerrit-Owner: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Sep 2021 21:15:55 +0000
Gerrit-HasComments: Yes

Reply via email to