Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 )
Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id after the DDL operation for all DDLS from Impala clients ...................................................................... Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/20367/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/20367/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@352 PS5, Line 352: metrics_ Metrics is not thread-safe. We shouldn't use the one of event-processor. I think we should create an individual Metrics object for each DDL/DML operation and pass it here. In the future, we can show those metrics in the DDL profiles. http://gerrit.cloudera.org:8080/#/c/20367/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@358 PS5, Line 358: * @param catalog : * @param tbl: Catalog table to be synced : * @param eventFactory : * @param metrics : * @throws CatalogException : * @throws MetastoreNotificationException nit: remove empty parameter comments. They seem verbose. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1276 PS4, Line 1276: } > This is used to identify if the events are from other Impala services. This is used to identify if the events are from the current Impala service. I think we can skip these self-event codes when syncToLatestEventId is on. If the lastSyncEventId are always updated correctly, all stale events will be skipped, and self events is one kind of stale events. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7004 PS4, Line 7004: // Add catalog service id and the 'newCatalogVersion' to the table parameters. : // This way we can avoid reloading the table on self-events (Iceberg generates : // an ALTER TABLE statement to set the new metadata_location). : IcebergCatalogOpExecutor.addCatalogVersionToTxn(iceTxn, : catalog_.getCatalogServiceId(), newCatalogVersion); : > Ack. Reverted the change for insert operations. +1 for this. I don't see this is addressed in the latest patch set (PS5). I think what Csaba means is when syncToLatestEventId is enabled, the catch-clause should return an error. The reason is without the events being fired, we can't apply the changes in syncToLatestEventId(). http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7031 PS4, Line 7031: } > I wasn't too sure about Iceberg tables so I didn't touch this part. Also, l +1 for Csaba's suggestion. 'table' is an object of org.apache.impala.catalog.Table which has 'lastSyncEventId'. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7040 PS4, Line 7040: /** : * Populates insert event data and calls fireInsertEvents() if external event : * processing is enabled. : * This method is replicating what Hive does in case a table or partition is inserts : * into. There are 2 cases: : * 1. If the table is transactional, we should first generate ADD_PARTITION events : * for new partitions which are generated. This is taken care of in the updateCatalog : * method. Additionally, for each partition including existing partitions which were : * inserted into, this method creates an ACID_WRITE event using the HMS API : * ad > IMPALA-10926, https://gerrit.cloudera.org/c/17859/42/fe/src/main/java/org/a I think to be consistent, we should do this for all catalog operations, including DDL, DML and REFRESH/INVALIDATE. The flag of syncToLatestEventId means, whenever an operation is performed on a table, we sync its metadata to the latest state and update its lastSyncedEventId. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 5 Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Comment-Date: Fri, 03 Nov 2023 09:38:27 +0000 Gerrit-HasComments: Yes