Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1388
PS3, Line 1388: database
nit: "table"


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1396
PS3, Line 1396:       tbl.setMetaStoreTable(msTbl);
We need the table write lock to do this. Please add a precondition check at the 
beginning of the method:

 Preconditions.checkState(tbl.isWriteLockedByCurrentThread());


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1033
PS3, Line 1033:     tryWriteLock(tbl);
Why do we move tryWriteLock() after getCatalogVersion()?


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1040
PS3, Line 1040:       tbl = catalog_.updateMetastoreTable(msTbl, tbl);
I might miss something so got confused. After IMPALA-10926, in 
CatalogMetastoreServiceHandler we perform the catalog operation in the 
following steps:

1. Acquire write lock on the table
2. Perform ddl operation in HMS
3. Sync table till the latest event id (as per HMS) since its last synced event 
id. Also update the last-synced-event-id at the end.

#2 updates HMS and #3 updates the local catalog cache based on the events only. 
Catalogd won't modify the table cache based on the ddl request. Changes from 
different sources (local impala, external impala or other HMS clients) are 
applied in the order as they are executed in HMS, which brings us a higher 
consistency.

However, the current patch just fetch the latest HMS object and use it to 
update the local cache before applying the ddl request. The 
last-synced-event-id is not updated at the end. This doesn't match what you 
mentioned in the commit message:

> Currently catalogD applies any updates received from Impala clients in-place.
Instead it should perform an HMS operation first and then replay all
the HMS events since the last synced event id.

Could you explain more about the current solution?



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Oct 2023 11:01:02 +0000
Gerrit-HasComments: Yes

Reply via email to