Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
......................................................................


Patch Set 12:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/20367/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20367/7//COMMIT_MSG@29
PS7, Line 29: et 'file_metadata_reload_pro
> https://github.com/apache/impala/blob/master/be/src/catalog/catalog-server.
'invalidate_hms_cache_on_ddls' is used only if 'start_hms_server' is true. 
'start_hms_server' is false by default. So I don't think we should set 
'invalidate_hms_cache_on_ddls' to false here. Or do we have any code change 
depend on this flag?


http://gerrit.cloudera.org:8080/#/c/20367/7//COMMIT_MSG@31
PS7, Line 31:
            : Note: We don't modify the cache using MetastoreEventsProcessor for
            : alter table rename
> Done. https://issues.apache.org/jira/browse/IMPALA-12553
Yeah, we can mention it.


http://gerrit.cloudera.org:8080/#/c/20367/12/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/20367/12/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2816
PS12, Line 2816:         accessLevel_ = getAvailableAccessLevel(getFullName(), 
tblLocation,
Could you add a comment on why this is not updated in processing the events (so 
we need it here)? Do we need the same for HdfsPartition?


http://gerrit.cloudera.org:8080/#/c/20367/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20367/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@766
PS12, Line 766:
nit: duplicated whitespaces


http://gerrit.cloudera.org:8080/#/c/20367/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1530
PS12, Line 1530:       skipTableMetadataReload_ = 
canSkipTableMetadataReload(tableBefore_, tableAfter_);
It's unclear to me why we can skip reloading the HMS table if StorageDescriptor 
changes. Could you add some comments?

BTW, if this is not needed in this patch, can we add them in a separate patch?


http://gerrit.cloudera.org:8080/#/c/20367/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6667
PS7, Line 6667:               //     ACID tables, there is a Jira to cover 
this: HIVE-22062.
              :               //   2: If no need for a full table reload then 
fetch partition level
              :               //     writeIds and reload only
> https://gerrit.cloudera.org/c/20367/4/fe/src/main/java/org/apache/impala/se
Sorry that I was wrong in that comment since I didn't consider the use case of 
REFRESH for file-only modification. For such case, there are no events to catch 
up the changes. But it seems we are good since catalog_.reloadTable() already 
updates the lastSyncedEventId.


http://gerrit.cloudera.org:8080/#/c/20367/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6690
PS7, Line 6690:       if (updatedThriftTable == null) {
> Partition-level refresh events are updated in place in the cache. Here is t
The test just adds a new partition in HMS and then runs partition-level 
REFRESH. What if there are events on other partitions? Do we apply them as 
well, so partition-level REFRESH will update the whole table?

To be specific, if we modify several partitions outside Impala and run 
partition-level REFRESH on only one partition, what's the behavior when 
syncToLatestEventId is on? The modification can be HMS changes or file-only 
changes.


http://gerrit.cloudera.org:8080/#/c/20367/12/tests/custom_cluster/test_sync_to_latest_hms_events.py
File tests/custom_cluster/test_sync_to_latest_hms_events.py:

http://gerrit.cloudera.org:8080/#/c/20367/12/tests/custom_cluster/test_sync_to_latest_hms_events.py@35
PS12, Line 35: "--start_hms_server=true "\
             :                                      "--hms_port=5899 "\
             :                                      
"--fallback_to_hms_on_errors=true "\
             :                                      
"--invalidate_hms_cache_on_ddls=false "\
I don't see we have HMS clients connecting to this port (5899) in this test. Do 
we really need these flags?


http://gerrit.cloudera.org:8080/#/c/20367/12/tests/custom_cluster/test_sync_to_latest_hms_events.py@57
PS12, Line 57:   @classmethod
             :   def get_workload(self):
             :     return 'functional-query'
nit: move this after the below comment.


http://gerrit.cloudera.org:8080/#/c/20367/12/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

http://gerrit.cloudera.org:8080/#/c/20367/12/tests/metadata/test_recover_partitions.py@261
PS12, Line 261:   @SkipIfCatalogV2.impala_8489()
We can remove this since IMPALA-8489 has been resolved.



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 12
Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Dec 2023 08:22:16 +0000
Gerrit-HasComments: Yes

Reply via email to