Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 )
Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints ...................................................................... Patch Set 20: (20 comments) http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG@7 PS12, Line 7: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing : DDL operations via catalog HMS endpoints > ping @Vihang: I already have this in my to-do list. Will write a detailed commit message. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2672 PS12, Line 2672: lse; > I am not sure I fully understand. If this change is not needed for the patc I understand and as discussed over call, I will remove this method http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@288 PS19, Line 288: syncToLatestEventFactory_ > Based on my understanding this field here is unnecessary and is complicatin @Vihang: We do need to pass event factory object to syncToLatestEventId() in metastoreEventProcessor since it is a static method. However I agree that we should not create a new event factory and instead modify isSelfEvent() to accomodate sync to latest event id flag. One issue that I see is - if event processing is disabled then MetastoreEventProcessor is not initialized and there would be no way to access eventFactory object from it. Few ways to solve this: 1. Decouple MetastoreEventProcessor and EventFactory creation. In JniCatalog, we can create a common event factory that would be set in EventProcessor as well as used elsewhere. Doing so, we need to make sure that the factory is thread safe. 2. Make MetastoreEventFactory singleton and use it from all the places. This would avoid JniCatalog route. I had tried this approach in the past and encountered some test failures. Didn't investigate the failures in depth but it appeared to be race conditions issues. I can take a shot at it again if the approach seems cleaner. Let me know your thoughts. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@316 PS19, Line 316: > this is unused and can be removed. Ack http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java File fe/src/main/java/org/apache/impala/catalog/Db.java: http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115 PS6, Line 115: > All the places where I see this variable getting accessed, I see that it is As discussed over call, we will keep it as volatile so that EventProcessor can check if it needs to process an event or not without acquiring readlock on db/table. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java File fe/src/main/java/org/apache/impala/catalog/Db.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@134 PS19, Line 134: LOG.debug("createEventId_ for db: {} set to: {}", getName(), createEventId_); : if (lastSyncedEventId_ < eventId) { : setLastSyncedEventId(eventId); : } > Pls remove if not needed anymore. Sure. I had added this check earlier but then saw some failures in the tests related to ddls from Impala shell . For now, I will add a TODO comment and we can address it in a follow up jira. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@153 PS19, Line 153: /** : * Creates a Db object with no tables based on the given TDatabase thrift struct. : */ : public static Db fromTDatabase(TDatabase db) { : ret > pls remove if not needed. Same as previous comment http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@186 PS12, Line 186: > Thanks for adding the comment. But similar to the Db.java's volatile keywor As discussed, we will keep the variable as volatile so that event processor can read it (to check if event should be skipped or not) without acquiring read lock on table object. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@212 PS12, Line 212: crea > nit, single line space. Ack http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@60 PS19, Line 60: > please remove this Ack http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@77 PS19, Line 77: Logger LOG > this is not needed if you remove line 60 Ack http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@220 PS19, Line 220: } : : public long getLastSyncedEventId( > the createEventId is set when the table is created, I am not sure when woul I did see an occurrence of this scenario in catalogOpExecutor code but don't remember clearly now. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@234 PS19, Line 234: * Returns if the given HMS table is an external table (uses table type if : * available or else uses table properties). Implementation is based on org.apache : * .hadoop.hive.metastore.utils.MetaStoreUtils.isExternalTable() : */ : publi > pls remove if not needed. Sure I will add a TODO for now and we can take it up in a follow up jira. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java File fe/src/main/java/org/apache/impala/catalog/TableLoader.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@170 PS12, Line 170: alidate met > I was suggesting that we should not pass the metrics in which case we would I understand. However reusing MetastoreEventProcessor's metrics seems to be tricky because: What if event processing is disabled? In that case eventProcessor will be an instance of NoOpEventProcessor. Also ExternalEventProcessor currently does not have getMetrics() api. We can expose this api and create a default metrics for NoOpEventProcessor as well. Let me know what you think. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@34 PS19, Line 34: import > remove if not necessary? In fact all the changes to this file can be remove Sure. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@461 PS12, Line 461: return dbName_; } > Actually, I see why you have a new abstract method now. However, instead of As discussed over call, we will keep this method separate from isSelfEvent() because when sync to latest event id is enabled by default, we can then get rid of self event logic http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@875 PS19, Line 875: > Why do we need to logically AND with this condition here? As discussed, I will add a comment for the same. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@371 PS19, Line 371: * @param db > if we are stopping at the drop event it would be weird I feel. For instance As discussed over call, there are few complications in processing events beyond drop event. For example - as a result of drop event, the table would be dropped from the cache. If we continue processing next events, the first such event would be create_table. While processing that event, we would again have to acquire write lock on the new table. But this method assumes that the write lock is already acquired. For now, we have discussed to break after drop event. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1939 PS19, Line 1939: pro > I am surprised we are not hitting NPE due to this and other similar calls b It is because we are not processing these events. But I agree that we shouldn't pass null metrics. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java@40 PS19, Line 40: import org.apache.impala.catalog.events.ExternalEventsProcessor; : import org.apache.impala.catalog.events.MetastoreEven > please remove the unused imports Sure. -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 20 Gerrit-Owner: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Comment-Date: Fri, 22 Oct 2021 18:01:10 +0000 Gerrit-HasComments: Yes