Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 )
Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints ...................................................................... Patch Set 19: (23 comments) http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG@7 PS12, Line 7: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing : DDL operations via catalog HMS endpoints > Ack ping http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2672 PS12, Line 2672: per.getT > I think the intention for this change was - currently for removeTable() api I am not sure I fully understand. If this change is not needed for the patch, please remove this. It makes sense to me serialize write operations on the table. It is not clear to me why 2 read operations on a table should be serialized on the read lock for version number. The code doesn't change the table version number. Any concurrent table modification operation atomically replaces the table which is modified. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@288 PS19, Line 288: syncToLatestEventFactory_ Based on my understanding this field here is unnecessary and is complicating the code unnecessarily. Can we remove this field altogether? From what I looked at, this is used in MetastoreEventsProcessor.syncToLatestEventId() which is called from tableLoader and CatalogMetastoreServiceHandler. Within the method MetastoreEventsProcessor.syncToLatestEventId() it is only used to get the events so ultimately, the only purpose of having this is to disable isSelfEvent() method for certain event types. Why can't we just do it in the isSelfEvent() itself (look for enable_sync_to_latest_event_on_ddls and if it enabled return false.) The advantage of doing this would be we would get rid of this unnecessarily complicated way to initialize this factory and then inject in the Catalog from JniCatalog. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@316 PS19, Line 316: numLoadingThreads_ this is unused and can be removed. http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java File fe/src/main/java/org/apache/impala/catalog/Db.java: http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115 PS6, Line 115: > The intention was - by making lastSyncedEventId_ volatile, the get and set All the places where I see this variable getting accessed, I see that it is under the db lock. Am I missing something? http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java File fe/src/main/java/org/apache/impala/catalog/Db.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@134 PS19, Line 134: Preconditions.checkState(eventId <= lastSyncedEventId_, : String.format("create event id: %s to be set for db %s should " : + "be <= lastSyncedEvent id: %s", eventId, getName(), lastSyncedEventId_)); : */ Pls remove if not needed anymore. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@153 PS19, Line 153: /* : Preconditions.checkState(eventId >= createEventId_, : String.format("last synced event id: %s to be set for db %s should " : + "be >= createEvent id: %s", eventId, getName(), createEventId_)); : */ pls remove if not needed. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@186 PS12, Line 186: > nit, add a comment on what this field represents and why it is volatile. Thanks for adding the comment. But similar to the Db.java's volatile keyword, I don't see any code which is accessing this variable without a lock on the table object. I would prefer to not keep it volatile unless it is absolutely necessary since AFAIK, introducing volatile keyword will disable certain compiler optimizations not just for this variable but for other variables as well to make sure that the "happens-before" relationship is maintained. See https://www.baeldung.com/java-volatile#happens-before for example. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@60 PS19, Line 60: import org.apache.log4j.Logger; please remove this http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@77 PS19, Line 77: org.slf4j. this is not needed if you remove line 60 http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@213 PS19, Line 213: Preconditions.checkState(eventId <= lastSyncedEventId_, : String.format("create event id: %s to be set for table %s should " : + "be <= lastSyncedEvent id: %s", eventId, getFullName(), : lastSyncedEventId_)); : */ remove if not needed. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@220 PS19, Line 220: // TODO: Should we reset lastSyncedEvent Id if it is less than event Id? : // If we don't reset it - we may start syncing table from an event id which : // is less than create event id the createEventId is set when the table is created, I am not sure when would it happen that lastSyncedEventId will be different than -1 in such a case. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@234 PS19, Line 234: Preconditions.checkState(eventId >= createEventId_, : String.format("last synced event id: %s to be set for table %s " : + "should be >= createEvent id: %s", eventId, getFullName(), : createEventId_)); : */ pls remove if not needed. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java File fe/src/main/java/org/apache/impala/catalog/TableLoader.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@170 PS12, Line 170: initMetrics > Are you suggesting to pass metrics as null in I was suggesting that we should not pass the metrics in which case we would use the MetastoreEventsProcessor's metrics object. In CatalogMetastoreServiceHandler, we can explicitly pass a metrics object since the intent there is to have separate metrics eventually. In the long run these metrics should be moved to table objects instead of keeping them at a eventsprocessor or metastoreServiceHandler's level. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@34 PS19, Line 34: import remove if not necessary? In fact all the changes to this file can be removed from the patch since I don't see any relevant code changes related to this patch. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@461 PS12, Line 461: > Sorry I didn't understand your comment Actually, I see why you have a new abstract method now. However, instead of calling it from processIfEnabled() it would be cleaner in my opinion to have all this logic in one single method for readability reasons. isSelfEvent() is already hooked with the metrics correctly so it would make sense to just modify that to something like below: boolean isSelfEvent() { boolean canBeSkipped = false; if (BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls()) { canBeSkipped = shouldSkipWhenSyncingToLatestEventId(); } else { canBeSkipped = catalog_.evaluateSelfEvent(getSelfEventContext())); } if (canBeSkipped) { metrics_.getCounter(MetastoreEventsProcessor.EVENTS_SKIPPED_METRIC) .inc(getNumberOfEvents()); infoLog("Incremented events skipped counter to {}", metrics_.getCounter(MetastoreEventsProcessor.EVENTS_SKIPPED_METRIC) .getCount()); } return canBeSkipped; } http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@316 PS19, Line 316: EventFactoryForSyncToLatestEvent Ideally would be great to get rid of this class since all it is really doing is to return false on isSelfEvent() for some events which can be done using the config check directly in MetastoreEventFactory http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@479 PS19, Line 479: BackendConfig pls see my comment about moving this logic into isSelfEvent() so that there is only one place where all the self-event evaluation (both legacy and new way). http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@875 PS19, Line 875: tbl.getLastSyncedEventId() == -1 Why do we need to logically AND with this condition here? http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@371 PS19, Line 371: if (currentEvent.isDropEvent()) { if we are stopping at the drop event it would be weird I feel. For instance, after table load there would not be any table. I think it would make sense to keep things simple and process all the events. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@435 PS19, Line 435: if (currentEvent.isDropEvent()) { same as previous comment. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1939 PS19, Line 1939: null I am surprised we are not hitting NPE due to this and other similar calls below. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java@40 PS19, Line 40: import org.apache.impala.catalog.TableLoadingMgr; : import org.apache.impala.catalog.events.EventFactory; please remove the unused imports -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 19 Gerrit-Owner: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Comment-Date: Thu, 21 Oct 2021 18:25:35 +0000 Gerrit-HasComments: Yes