Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
......................................................................


Patch Set 21:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/20367/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20367/21//COMMIT_MSG@27
PS21, Line 27: Set 'enable_sync_to_latest_event_on_ddls' to true.
> nit: more contents here?
We just need to set the flag 'enable_sync_to_latest_event_on_ddls' right? I 
don't think we need anything else.


http://gerrit.cloudera.org:8080/#/c/20367/21/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/20367/21/be/src/catalog/catalog-server.cc@139
PS21, Line 139:     "cache_directive_id, cache_replication", "This 
configuration is used to whitelist "
> nit: could you mention this change in the commit message?
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2676
PS21, Line 2676:         tbl.setLastRefreshEventId(eventId, 
isSetLastSyncEventId);
> Why do we still update lastRefreshEventId if isSetLastSyncEventId=false in
Ack, regarding the 'isFullReload' naming convention.
When the table is reloaded and it is not a fullReload, then we need to 
setRefreshEventId to currentHMSEventId so that older events can be skipped.


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2825
PS21, Line 2825:         accessLevel_ = getAvailableAccessLevel(getFullName(), 
tblLocation,
> This could be a heavy operation for large tables that have many partitions
Ack. Agree with you. It happens today when sync_to_latest_events flag is not 
set.


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1623
PS21, Line 1623:             !Objects.equals(beforeSd.getSerdeInfo(), 
afterSd.getSerdeInfo())) {
> For supportability, we'd better add logs about why we return true here.
Ack


http://gerrit.cloudera.org:8080/#/c/20367/20/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/20/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6612
PS20, Line 6612:     boolean syncToLatestEventId =
> not used
Ack


http://gerrit.cloudera.org:8080/#/c/20367/20/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7098
PS20, Line 7098:         if (update.isSetDebug_action()) {
> We can remove this check since nulls are checked inside DebugUtils.executeD
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@404
PS21, Line 404:   private final Metrics metrics_ = new Metrics();
> We will use this in concurrent DDL/DMLs. Based on the class comment, Metric
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1147
PS21, Line 1147:             if (!syncToLatestEventId) {
> Could you add a comment saying that HdfsTable is not updated in alterTableA
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1187
PS21, Line 1187:             if (!syncToLatestEventId) {
> Please also add a comment saying that HdfsTable is not updated in alterTabl
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4362
PS21, Line 4362: Also creates and adds new
               :    * HdfsPartitions to the corresponding HdfsTable.
> nit: add "if enableSyncToLatestEventOnDdls is false".
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@5234
PS21, Line 5234:    * Also drops the corresponding partitions from its Hdfs 
table.
> nit: add "if enableSyncToLatestEventOnDdls is false"
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1498
PS21, Line 1498:       
BackendConfig.INSTANCE.setInvalidateCatalogdHMSCacheOnDDLs(false);
> nit: do we still need to set this flag in this test?
Not required. Forgot to remove this.


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1511
PS21, Line 1511:       eventsProcessor_.processEvents();
> Can we also verify all events are skipped? We can add a wrapper for this, e
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1513
PS21, Line 1513:           tbl.getLastSyncedEventId() == 
eventsProcessor_.getCurrentEventId());
> Can we get the value of tbl.getLastSyncedEventId() before processEvents()?
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3533
PS21, Line 3533: throws ImpalaException
> nit: remove "throws ImpalaException"
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3585
PS21, Line 3585: throws ImpalaException
> nit: remove "throws ImpalaException"
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3638
PS21, Line 3638:       throws ImpalaException {
> nit: remove "throws ImpalaException"
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3772
PS21, Line 3772: throws ImpalaException
> nit: remove "throws ImpalaException"
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3801
PS21, Line 3801: throws ImpalaException
> nit: remove "throws ImpalaException"
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3889
PS21, Line 3889: throws ImpalaException
> nit: remove "throws ImpalaException"
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3918
PS21, Line 3918: throws ImpalaException
> nit: remove "throws ImpalaException".
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3947
PS21, Line 3947:       throws ImpalaException {
> nit: remove "throws ImpalaException"
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
File 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java:

http://gerrit.cloudera.org:8080/#/c/20367/21/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@247
PS21, Line 247:                 prevSyncedEventId);
> Both processEvents() at L232 and execDdlRequest() will update last synced e
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/tests/custom_cluster/test_sync_to_latest_hms_events.py
File tests/custom_cluster/test_sync_to_latest_hms_events.py:

http://gerrit.cloudera.org:8080/#/c/20367/21/tests/custom_cluster/test_sync_to_latest_hms_events.py@47
PS21, Line 47: Each test in
             :   this class should start a hms_server using the catalogd flag 
--start_hms_server=true,
             :   enable_sync_to_latest_event_on_ddls=true, 
invalidate_hms_cache_on_ddls=false.
> This becomes stale now. Only enable_sync_to_latest_event_on_ddls=true is ne
Ack


http://gerrit.cloudera.org:8080/#/c/20367/21/tests/custom_cluster/test_sync_to_latest_hms_events.py@60
PS21, Line 60:   def test_truncate_cleans_hdfs_files(self, unique_database):
> These codes are copied from test_ddl.py. We'd better reuse the method inste
Ack. I did refactoring wherever it is possible to do.



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 21
Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Jan 2024 00:47:15 +0000
Gerrit-HasComments: Yes

Reply via email to