Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
after the DDL operation for all DDLS from Impala clients
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/20367/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/20367/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@352
PS5, Line 352: metrics_
Metrics is not thread-safe. We shouldn't use the one of event-processor.
I think we should create an individual Metrics object for each DDL/DML 
operation and pass it here. In the future, we can show those metrics in the DDL 
profiles.


http://gerrit.cloudera.org:8080/#/c/20367/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@358
PS5, Line 358:    * @param catalog
             :    * @param tbl: Catalog table to be synced
             :    * @param eventFactory
             :    * @param metrics
             :    * @throws CatalogException
             :    * @throws MetastoreNotificationException
nit: remove empty parameter comments. They seem verbose.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1276
PS4, Line 1276:       }
> This is used to identify if the events are from other Impala services.
This is used to identify if the events are from the current Impala service. I 
think we can skip these self-event codes when syncToLatestEventId is on. If the 
lastSyncEventId are always updated correctly, all stale events will be skipped, 
and self events is one kind of stale events.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7004
PS4, Line 7004:           // Add catalog service id and the 'newCatalogVersion' 
to the table parameters.
              :           // This way we can avoid reloading the table on 
self-events (Iceberg generates
              :           // an ALTER TABLE statement to set the new 
metadata_location).
              :           
IcebergCatalogOpExecutor.addCatalogVersionToTxn(iceTxn,
              :               catalog_.getCatalogServiceId(), 
newCatalogVersion);
              :
> Ack. Reverted the change for insert operations.
+1 for this. I don't see this is addressed in the latest patch set (PS5). I 
think what Csaba means is when syncToLatestEventId is enabled, the catch-clause 
should return an error. The reason is without the events being fired, we can't 
apply the changes in syncToLatestEventId().


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7031
PS4, Line 7031:     }
> I wasn't too sure about Iceberg tables so I didn't touch this part. Also, l
+1 for Csaba's suggestion. 'table' is an object of 
org.apache.impala.catalog.Table which has 'lastSyncEventId'.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7040
PS4, Line 7040:   /**
              :    * Populates insert event data and calls fireInsertEvents() 
if external event
              :    * processing is enabled.
              :    * This method is replicating what Hive does in case a table 
or partition is inserts
              :    * into. There are 2 cases:
              :    * 1. If the table is transactional, we should first generate 
ADD_PARTITION events
              :    * for new partitions which are generated. This is taken care 
of in the updateCatalog
              :    * method. Additionally, for each partition including 
existing partitions which were
              :    * inserted into, this method creates an ACID_WRITE event 
using the HMS API
              :    * ad
> IMPALA-10926, https://gerrit.cloudera.org/c/17859/42/fe/src/main/java/org/a
I think to be consistent, we should do this for all catalog operations, 
including DDL, DML and REFRESH/INVALIDATE. The flag of syncToLatestEventId 
means, whenever an operation is performed on a table, we sync its metadata to 
the latest state and update its lastSyncedEventId.



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Nov 2023 09:38:27 +0000
Gerrit-HasComments: Yes

Reply via email to