Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 )
Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs ...................................................................... Patch Set 12: (10 comments) http://gerrit.cloudera.org:8080/#/c/20367/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20367/7//COMMIT_MSG@29 PS7, Line 29: et 'file_metadata_reload_pro > https://github.com/apache/impala/blob/master/be/src/catalog/catalog-server. 'invalidate_hms_cache_on_ddls' is used only if 'start_hms_server' is true. 'start_hms_server' is false by default. So I don't think we should set 'invalidate_hms_cache_on_ddls' to false here. Or do we have any code change depend on this flag? http://gerrit.cloudera.org:8080/#/c/20367/7//COMMIT_MSG@31 PS7, Line 31: : Note: We don't modify the cache using MetastoreEventsProcessor for : alter table rename > Done. https://issues.apache.org/jira/browse/IMPALA-12553 Yeah, we can mention it. http://gerrit.cloudera.org:8080/#/c/20367/12/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/20367/12/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2816 PS12, Line 2816: accessLevel_ = getAvailableAccessLevel(getFullName(), tblLocation, Could you add a comment on why this is not updated in processing the events (so we need it here)? Do we need the same for HdfsPartition? http://gerrit.cloudera.org:8080/#/c/20367/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20367/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@766 PS12, Line 766: nit: duplicated whitespaces http://gerrit.cloudera.org:8080/#/c/20367/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1530 PS12, Line 1530: skipTableMetadataReload_ = canSkipTableMetadataReload(tableBefore_, tableAfter_); It's unclear to me why we can skip reloading the HMS table if StorageDescriptor changes. Could you add some comments? BTW, if this is not needed in this patch, can we add them in a separate patch? http://gerrit.cloudera.org:8080/#/c/20367/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6667 PS7, Line 6667: // ACID tables, there is a Jira to cover this: HIVE-22062. : // 2: If no need for a full table reload then fetch partition level : // writeIds and reload only > https://gerrit.cloudera.org/c/20367/4/fe/src/main/java/org/apache/impala/se Sorry that I was wrong in that comment since I didn't consider the use case of REFRESH for file-only modification. For such case, there are no events to catch up the changes. But it seems we are good since catalog_.reloadTable() already updates the lastSyncedEventId. http://gerrit.cloudera.org:8080/#/c/20367/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6690 PS7, Line 6690: if (updatedThriftTable == null) { > Partition-level refresh events are updated in place in the cache. Here is t The test just adds a new partition in HMS and then runs partition-level REFRESH. What if there are events on other partitions? Do we apply them as well, so partition-level REFRESH will update the whole table? To be specific, if we modify several partitions outside Impala and run partition-level REFRESH on only one partition, what's the behavior when syncToLatestEventId is on? The modification can be HMS changes or file-only changes. http://gerrit.cloudera.org:8080/#/c/20367/12/tests/custom_cluster/test_sync_to_latest_hms_events.py File tests/custom_cluster/test_sync_to_latest_hms_events.py: http://gerrit.cloudera.org:8080/#/c/20367/12/tests/custom_cluster/test_sync_to_latest_hms_events.py@35 PS12, Line 35: "--start_hms_server=true "\ : "--hms_port=5899 "\ : "--fallback_to_hms_on_errors=true "\ : "--invalidate_hms_cache_on_ddls=false "\ I don't see we have HMS clients connecting to this port (5899) in this test. Do we really need these flags? http://gerrit.cloudera.org:8080/#/c/20367/12/tests/custom_cluster/test_sync_to_latest_hms_events.py@57 PS12, Line 57: @classmethod : def get_workload(self): : return 'functional-query' nit: move this after the below comment. http://gerrit.cloudera.org:8080/#/c/20367/12/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: http://gerrit.cloudera.org:8080/#/c/20367/12/tests/metadata/test_recover_partitions.py@261 PS12, Line 261: @SkipIfCatalogV2.impala_8489() We can remove this since IMPALA-8489 has been resolved. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 12 Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Comment-Date: Tue, 05 Dec 2023 08:22:16 +0000 Gerrit-HasComments: Yes