[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-10-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1045
PS4, Line 1045:   alterTableOrViewRename(tbl,
  :   
TableName.fromThrift(params.getRename_params().getNew_table_name()),
  :   newCatalogVersion, wantMinimalResult, response);
  :   return;
I don't see this change applied to renames, while the commit message suggests 
that it is applied to all DDLs.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1263
PS4, Line 1263:   boolean syncToLatestEventId =
  :   
BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls();
  :   if (syncToLatestEventId) {
  : MetastoreEventsProcessor.syncToLatestEventId(catalog_, 
tbl,
  : catalog_.getEventFactoryForSyncToLatestEvent());
refactor idea: this could be moved to a function like 
CatalogOpExecutor.trySyncToLatestEventId(). If it returns true, then reloading 
the table / handling in-flight events could be skipped.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1270
PS4, Line 1270: if (tbl.hasInProgressModification()) 
Preconditions.checkState(reloadMetadata);
This seems valid even in the syncToLatestEventId case.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1276
PS4, Line 1276:   catalog_.addVersionsForInflightEvents(false, tbl, 
newCatalogVersion);
Are we using in flight event lists for anything when 
enableSyncToLatestEventOnDdls is true? If not, then checking the lists could be 
skipped in the event processor. This would solve IMPALA-12461.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1658
PS4, Line 1658: syncToLatestEventId
What kind of guarantees do we have for the state of the event processor when 
syncing the events for a table? I don't see any synchronization - is it 
possible that it processes events from HMS polling at the same time? My concern 
is about processing events for the same table - can't this lead to processing 
the event twice, once in syncToLatestEventId, and once on the the event 
processor thread?


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7004
PS4, Line 7004:   try {
  : createInsertEvents((FeFsTable) table, 
update.getUpdated_partitions(),
  : addedPartitionNames, update.is_overwrite, tblTxn);
  :   } catch (Exception e) {
  : LOG.error("Failed to fire insert events for table {}", 
table.getFullName(), e);
  :   }
Allowing createInsertEvents() to fail is more problematic in case of 
enableSyncToLatestEventOnDdls, as we rely on creating these events to reload 
the table.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7031
PS4, Line 7031:   catalog_.addVersionsForInflightEvents(false, table, 
newCatalogVersion);
Can't we skip this too if the events will by synced for the table?


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7040
PS4, Line 7040:   // update metastore table/partitions for insert command
  :   boolean syncToLatestEventId =
  :   
BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls();
  :   if (syncToLatestEventId) {
  : MetastoreEventsProcessor.syncToLatestEventId(catalog_, 
table,
  : catalog_.getEventFactoryForSyncToLatestEvent());
  :   } else {
  : loadTableMetadata(table, newCatalogVersion, true, 
false, partsToLoadMetadata,
  : partitionToEventId, "INSERT", 
update.getDebug_action());
  :   }
The commit message and the flag name suggests that we only apply this for DDLs, 
while here it is done for a DML. I also see that it is applied to TRUNCATE.



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c

[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-10-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG@8
PS4, Line 8: for all DDLS
Can you add to the title whether Impala syncs the events before or after the 
DDL operation? I think that this a critical info about the intention  of the 
patch


http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG@20
PS4, Line 20: Once HIVE-27499 is implemented, we can update the
: snapshot only if there are any pending events. Currently, there 
is no
: efficient way to identify if there are pending events for a 
db/table.
Won't there be always a pending event, as the DDL from Impala creates a new 
event for the DB/Table?


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1396
PS3, Line 1396: versionLock_.writeLock().lock();
> We always obtain a write lock on the table in CatalogOpExecutor before doin
I think that we also need to hold versionLock_ before calling this, as the 
locking protocol states that version lock should be acquired before table lock. 
An example where we take version lock first and then the table lock is 
CatalogServiceCatalog.getOrLoadTable



Note that this is probably irrelevant, see comment at line 1391.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1391
PS4, Line 1391: updateMetastoreTable
I couldn't find any place where we call this function - is it still used?


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1546
PS4, Line 1546:   alterTableRename(testTblName, newTblName, TEST_DB_NAME);
  :   eventsProcessor_.processEvents();
What will happen if a rename is detected during a DDL and not in 
eventsProcessor_.processEvents();? E.g. a t1 is renamed to t2, then back to t1, 
and Impala tries to do a DDL on t1.

I don't think that we really need a proper handling for this, but it shouldn't 
cause global issues in the catalog.



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 27 Oct 2023 16:14:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-10-26 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..


Patch Set 4:

(4 comments)

> Patch Set 3:
>
> (4 comments)

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1388
PS3, Line 1388: table ob
> nit: "table"
Ack


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1396
PS3, Line 1396: versionLock_.writeLock().lock();
> We need the table write lock to do this. Please add a precondition check at
We always obtain a write lock on the table in CatalogOpExecutor before doing 
this (Eg: CatalogOpExecutor L#1634). Nevertheless, it is good to add a 
precondition here.


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1033
PS3, Line 1033: long oldCatalogVersion = tbl.getCatalogVersion();
> Why do we move tryWriteLock() after getCatalogVersion()?
Not an intended change. Let me undo this.


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1040
PS3, Line 1040:   if (params.getAlter_type() == TAlterTableType.RENAME_VIEW
> I might miss something so got confused. After IMPALA-10926, in CatalogMetas
Addressed in the latest patch.



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 26 Oct 2023 21:05:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..


Patch Set 4:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/14267/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 26 Oct 2023 20:02:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-10-26 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20367

to look at the new patch set (#4).

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..

IMPALA-10976: Sync db/table in catalogD to latest
HMS event id for all DDLS from Impala clients

The idea is that when any DDL operation is performed by Impala, it also
syncs the db/table to its latest event ID as per HMS. This way updates
to a db/table's are applied in the same order as they appear in the
Notification log table in HMS which ensures consistency. Currently
catalogD applies any updates received from Impala clients in-place.
Instead it should perform an HMS operation first and then replay all
the HMS events since the last synced event id.

Implementation: when the enable_sync_to_latest_event_on_ddls flag is
set to true, we fetch the latest snapshot of db/table and update it in
the catalogD's cache. Once HIVE-27499 is implemented, we can update the
snapshot only if there are any pending events. Currently, there is no
efficient way to identify if there are pending events for a db/table.

Set 'enable_sync_to_latest_event_on_ddls' to true and
'invalidate_hms_cache_on_ddls' to false to use this feature.

Testing:
1) Added few tests in the MetaStoreEventProcessorForTest to verify this
feature that simulates the metadata sync between HMS and Impala.
2) Added few tests in the CatalogHmsSyncToLatestEventIdTest class to
the metadata sync between HMS end point, Catalog Metastore Server and
Impala. The HMS end point serves as common interface to metadata
changes outside the current Impala service such as Hive, Spark or other
Impala service. Also verified the table lastSyncEventId is updated
after the events are sync and confirmed that metastore event processor
ignored these synced events.

Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
6 files changed, 553 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/4
--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-10-19 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1388
PS3, Line 1388: database
nit: "table"


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1396
PS3, Line 1396:   tbl.setMetaStoreTable(msTbl);
We need the table write lock to do this. Please add a precondition check at the 
beginning of the method:

 Preconditions.checkState(tbl.isWriteLockedByCurrentThread());


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1033
PS3, Line 1033: tryWriteLock(tbl);
Why do we move tryWriteLock() after getCatalogVersion()?


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1040
PS3, Line 1040:   tbl = catalog_.updateMetastoreTable(msTbl, tbl);
I might miss something so got confused. After IMPALA-10926, in 
CatalogMetastoreServiceHandler we perform the catalog operation in the 
following steps:

1. Acquire write lock on the table
2. Perform ddl operation in HMS
3. Sync table till the latest event id (as per HMS) since its last synced event 
id. Also update the last-synced-event-id at the end.

#2 updates HMS and #3 updates the local catalog cache based on the events only. 
Catalogd won't modify the table cache based on the ddl request. Changes from 
different sources (local impala, external impala or other HMS clients) are 
applied in the order as they are executed in HMS, which brings us a higher 
consistency.

However, the current patch just fetch the latest HMS object and use it to 
update the local cache before applying the ddl request. The 
last-synced-event-id is not updated at the end. This doesn't match what you 
mentioned in the commit message:

> Currently catalogD applies any updates received from Impala clients in-place.
Instead it should perform an HMS operation first and then replay all
the HMS events since the last synced event id.

Could you explain more about the current solution?



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 19 Oct 2023 11:01:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-10-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14190/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 16 Oct 2023 19:14:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-10-16 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1006
PS2, Line 1006: msTbl
> How do we deal with the case when table is renamed through external source(
If the rename event happened outside of metastore, how do you expect it sync to 
catalogD?
CatalogD can deal with events present in the metastore.


http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1007
PS2, Line 1007:   tbl = catalog_.updateMetastoreTable(msTbl, tbl);
> We don't seem to reload table ? Probably table schema, partitions would hav
We don't have to reload entire hdfs table for 2 reasons. 1) It's a very costly 
operation to reload hdfs tables. (even during alter table query from impala, we 
only reload what is really required in the table like table schema or file 
schema). 2) We always access the metastore table from the cached impala 
representation of the table, so we would apply the current change on top of the 
latest metastore's table and reload the part that is really required.
All in all, I believe reloading the table without knowing what changed is too 
aggressive and it would have complications in the performance.


http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1011
PS2, Line 1011: long oldCatalogVersion = tbl.getCatalogVersion();
> I believe, we need to get the oldCatalogVersion before updating tbl. Becaus
Ack


http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2398
PS2, Line 2398:   long newCatalogVersion = 
catalog_.incrementAndGetCatalogVersion();
> It is good to get the newCatalogVersion after catalog_.updateMetastoreTable
Ack



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 16 Oct 2023 18:49:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-10-16 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20367

to look at the new patch set (#3).

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..

IMPALA-10976: Sync db/table in catalogD to latest
HMS event id for all DDLS from Impala clients

The idea is that when any DDL operation is performed by Impala, it also
syncs the db/table to its latest event ID as per HMS. This way updates
to a db/table's are applied in the same order as they appear in the
Notification log table in HMS which ensures consistency. Currently
catalogD applies any updates received from Impala clients in-place.
Instead it should perform an HMS operation first and then replay all
the HMS events since the last synced event id.

Implementation: when the enable_sync_to_latest_event_on_ddls flag is
set to true, we fetch the latest snapshot of db/table and update it in
the catalogD's cache. Once HIVE-27499 is implemented, we can update the
snapshot only if there are any pending events. Currently, there is no
efficient way to identify if there are pending events for a db/table.

Set enable_sync_to_latest_event_on_ddls to true to use this feature.

Testing:
Added few tests in the MetaStoreEventProcessorForTest to verify this
feature.

Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
4 files changed, 150 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/3
--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-09-29 Thread Anonymous Coward (Code Review)
k.venureddy2...@gmail.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..


Patch Set 2:

(4 comments)

few observations

http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1006
PS2, Line 1006: msTbl
How do we deal with the case when table is renamed through external 
source(hive/other catalogd) and the rename event is still not synced on this 
catalogd ?


http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1007
PS2, Line 1007:   tbl = catalog_.updateMetastoreTable(msTbl, tbl);
We don't seem to reload table ? Probably table schema, partitions would have 
changed.


http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1011
PS2, Line 1011: long oldCatalogVersion = tbl.getCatalogVersion();
I believe, we need to get the oldCatalogVersion before updating tbl. Because 
addTableToCatalogUpdate() is done only when oldCatalogVersion is different from 
tbl.getCatalogVersion().


http://gerrit.cloudera.org:8080/#/c/20367/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2398
PS2, Line 2398:   long newCatalogVersion = 
catalog_.incrementAndGetCatalogVersion();
It is good to get the newCatalogVersion after catalog_.updateMetastoreTable(). 
Otherwise newCatalogVersion would be smaller than table.getCatalogVersion().



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 29 Sep 2023 20:16:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-09-05 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG@8
PS1, Line 8: Impala clien
> I think that impala shell is misleading here, as any client initiate DDLs.
Ack


http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG@10
PS1, Line 10: by Impala, it al
> Same as for the title, IMO "performed by Impala" would be better.
Ack


http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG@14
PS1, Line 14: Impala clien
> same as for the title
Ack


http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG@20
PS1, Line 20:  HIVE-27499
> Do you think that this feature can practically work before  HIVE-27499? My
Before HIVE-27499, we'll always fetch the HMS metadata no matter the event 
processor is lagging or not. Unfortunately, It does add some overhead. But this 
overhead ensures consistency.
Yeah, any thing can happen between RPC call and processing of DDL, as it 
happens today and we cannot do anything about it.


http://gerrit.cloudera.org:8080/#/c/20367/1//COMMIT_MSG@24
PS1, Line 24: Set enable_sync_to_latest_event_on_ddls to true to use this 
feature.
> This flag is not exposed yet, we can only set it in tests.
This is global flag 
https://github.com/apache/impala/blob/master/be/src/catalog/catalog-server.cc#L121
 and by default it is disabled.


http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1369
PS1, Line 1369: the table with the given metastore table
> This function seems to be about tables, not databases.
Ack


http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1012
PS1, Line 1012: // Get a new catalog version to assign to the table being 
altered.
> Calling an HMS RPC when the global catalog version lock is locked is proble
Valid point. We should always do RPC call before locking the catalog version.


http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1488
PS1, Line 1488: impala shell
> We are not using the shell here (or any Impala client) here, right? My unde
Ack. I'm just simulating what would happen once the request lands on Impala's 
catalog OpExecutor.


http://gerrit.cloudera.org:8080/#/c/20367/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1498
PS1, Line 1498: BackendConfig.INSTANCE.setHMSPollingIntervalInSeconds(10);
> Will this have any effect? We seem check this only once, when the event pro
Ah!! I thought this would take into effect.
Is there any other way to make the event processor lagging in the Java unit 
tests? We would want to this in Java tests because we need to verify loaded 
table in java.
There is no way we could verify this in end-to-end python tests.



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 05 Sep 2023 19:46:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-08-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13796/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 22 Aug 2023 02:02:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients

2023-08-21 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20367

to look at the new patch set (#2).

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
..

IMPALA-10976: Sync db/table in catalogD to latest
HMS event id for all DDLS from Impala clients

The idea is that when any DDL operation is performed by Impala, it also
syncs the db/table to its latest event ID as per HMS. This way updates
to a db/table's are applied in the same order as they appear in the
Notification log table in HMS which ensures consistency. Currently
catalogD applies any updates received from Impala clients in-place.
Instead it should perform an HMS operation first and then replay all
the HMS events since the last synced event id.

Implementation: when the enable_sync_to_latest_event_on_ddls flag is
set to true, we fetch the latest snapshot of db/table and update it in
the catalogD's cache. Once HIVE-27499 is implemented, we can update the
snapshot only if there are any pending events. Currently, there is no
efficient way to identify if there are pending events for a db/table.

Set enable_sync_to_latest_event_on_ddls to true to use this feature.

Testing:
Added few tests in the MetaStoreEventProcessorForTest to verify this
feature.

Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
4 files changed, 149 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/2
--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala