Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 )
Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients ...................................................................... Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1388 PS3, Line 1388: database nit: "table" http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1396 PS3, Line 1396: tbl.setMetaStoreTable(msTbl); We need the table write lock to do this. Please add a precondition check at the beginning of the method: Preconditions.checkState(tbl.isWriteLockedByCurrentThread()); http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1033 PS3, Line 1033: tryWriteLock(tbl); Why do we move tryWriteLock() after getCatalogVersion()? http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1040 PS3, Line 1040: tbl = catalog_.updateMetastoreTable(msTbl, tbl); I might miss something so got confused. After IMPALA-10926, in CatalogMetastoreServiceHandler we perform the catalog operation in the following steps: 1. Acquire write lock on the table 2. Perform ddl operation in HMS 3. Sync table till the latest event id (as per HMS) since its last synced event id. Also update the last-synced-event-id at the end. #2 updates HMS and #3 updates the local catalog cache based on the events only. Catalogd won't modify the table cache based on the ddl request. Changes from different sources (local impala, external impala or other HMS clients) are applied in the order as they are executed in HMS, which brings us a higher consistency. However, the current patch just fetch the latest HMS object and use it to update the local cache before applying the ddl request. The last-synced-event-id is not updated at the end. This doesn't match what you mentioned in the commit message: > Currently catalogD applies any updates received from Impala clients in-place. Instead it should perform an HMS operation first and then replay all the HMS events since the last synced event id. Could you explain more about the current solution? -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Comment-Date: Thu, 19 Oct 2023 11:01:02 +0000 Gerrit-HasComments: Yes