Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 )
Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs ...................................................................... Patch Set 21: (26 comments) http://gerrit.cloudera.org:8080/#/c/20367/21//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20367/21//COMMIT_MSG@27 PS21, Line 27: Set 'enable_sync_to_latest_event_on_ddls' to true. > nit: more contents here? We just need to set the flag 'enable_sync_to_latest_event_on_ddls' right? I don't think we need anything else. http://gerrit.cloudera.org:8080/#/c/20367/21/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/20367/21/be/src/catalog/catalog-server.cc@139 PS21, Line 139: "cache_directive_id, cache_replication", "This configuration is used to whitelist " > nit: could you mention this change in the commit message? Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2676 PS21, Line 2676: tbl.setLastRefreshEventId(eventId, isSetLastSyncEventId); > Why do we still update lastRefreshEventId if isSetLastSyncEventId=false in Ack, regarding the 'isFullReload' naming convention. When the table is reloaded and it is not a fullReload, then we need to setRefreshEventId to currentHMSEventId so that older events can be skipped. http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2825 PS21, Line 2825: accessLevel_ = getAvailableAccessLevel(getFullName(), tblLocation, > This could be a heavy operation for large tables that have many partitions Ack. Agree with you. It happens today when sync_to_latest_events flag is not set. http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1623 PS21, Line 1623: !Objects.equals(beforeSd.getSerdeInfo(), afterSd.getSerdeInfo())) { > For supportability, we'd better add logs about why we return true here. Ack http://gerrit.cloudera.org:8080/#/c/20367/20/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/20/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6612 PS20, Line 6612: boolean syncToLatestEventId = > not used Ack http://gerrit.cloudera.org:8080/#/c/20367/20/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7098 PS20, Line 7098: if (update.isSetDebug_action()) { > We can remove this check since nulls are checked inside DebugUtils.executeD Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@404 PS21, Line 404: private final Metrics metrics_ = new Metrics(); > We will use this in concurrent DDL/DMLs. Based on the class comment, Metric Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1147 PS21, Line 1147: if (!syncToLatestEventId) { > Could you add a comment saying that HdfsTable is not updated in alterTableA Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1187 PS21, Line 1187: if (!syncToLatestEventId) { > Please also add a comment saying that HdfsTable is not updated in alterTabl Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4362 PS21, Line 4362: Also creates and adds new : * HdfsPartitions to the corresponding HdfsTable. > nit: add "if enableSyncToLatestEventOnDdls is false". Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@5234 PS21, Line 5234: * Also drops the corresponding partitions from its Hdfs table. > nit: add "if enableSyncToLatestEventOnDdls is false" Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1498 PS21, Line 1498: BackendConfig.INSTANCE.setInvalidateCatalogdHMSCacheOnDDLs(false); > nit: do we still need to set this flag in this test? Not required. Forgot to remove this. http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1511 PS21, Line 1511: eventsProcessor_.processEvents(); > Can we also verify all events are skipped? We can add a wrapper for this, e Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1513 PS21, Line 1513: tbl.getLastSyncedEventId() == eventsProcessor_.getCurrentEventId()); > Can we get the value of tbl.getLastSyncedEventId() before processEvents()? Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3533 PS21, Line 3533: throws ImpalaException > nit: remove "throws ImpalaException" Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3585 PS21, Line 3585: throws ImpalaException > nit: remove "throws ImpalaException" Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3638 PS21, Line 3638: throws ImpalaException { > nit: remove "throws ImpalaException" Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3772 PS21, Line 3772: throws ImpalaException > nit: remove "throws ImpalaException" Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3801 PS21, Line 3801: throws ImpalaException > nit: remove "throws ImpalaException" Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3889 PS21, Line 3889: throws ImpalaException > nit: remove "throws ImpalaException" Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3918 PS21, Line 3918: throws ImpalaException > nit: remove "throws ImpalaException". Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3947 PS21, Line 3947: throws ImpalaException { > nit: remove "throws ImpalaException" Ack http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java File fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java: http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@247 PS21, Line 247: prevSyncedEventId); > Both processEvents() at L232 and execDdlRequest() will update last synced e Ack http://gerrit.cloudera.org:8080/#/c/20367/21/tests/custom_cluster/test_sync_to_latest_hms_events.py File tests/custom_cluster/test_sync_to_latest_hms_events.py: http://gerrit.cloudera.org:8080/#/c/20367/21/tests/custom_cluster/test_sync_to_latest_hms_events.py@47 PS21, Line 47: Each test in : this class should start a hms_server using the catalogd flag --start_hms_server=true, : enable_sync_to_latest_event_on_ddls=true, invalidate_hms_cache_on_ddls=false. > This becomes stale now. Only enable_sync_to_latest_event_on_ddls=true is ne Ack http://gerrit.cloudera.org:8080/#/c/20367/21/tests/custom_cluster/test_sync_to_latest_hms_events.py@60 PS21, Line 60: def test_truncate_cleans_hdfs_files(self, unique_database): > These codes are copied from test_ddl.py. We'd better reuse the method inste Ack. I did refactoring wherever it is possible to do. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 21 Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Comment-Date: Fri, 05 Jan 2024 00:47:15 +0000 Gerrit-HasComments: Yes