Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer,
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20367

to look at the new patch set (#23).

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
......................................................................

IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

The idea is that when Impala performs any DDL/DML operation, it also
syncs the db/table to its latest event ID as per HMS. This way, updates
to a db/table are applied in the same order as they appear in the
Notification log table in HMS, which ensures consistency. Currently,
catalogD applies any updates received from Impala clients in-place.
Instead, it should perform the HMS operation first and then replay all
the HMS events since the last synced event ID.

Implementation: when the enable_sync_to_latest_event_on_ddls flag is
set to true, we do the DDL/DML operation first, i.e., perform the HMS
operation, and then sync the db/table in catalogD's cache to the
latest event in HMS for the corresponding db/table. Currently, we fetch
all events greater than the db/table's lastSyncEventId and filter them
in the events processor to sync only the current db/table's events. Once
HIVE-27499 is implemented, we can directly fetch the events only for
the respective db/table and process them. Until then, there is no
efficient way to identify whether there are pending events for a
db/table.
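
The fetch-and-filter step described above can be illustrated with a
minimal sketch. This is not the code in this change: Event, CachedTable,
sync_to_latest_event and fetch_events_after are hypothetical names, and
advancing lastSyncEventId only to the last matching event is an
assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Event:
    # Hypothetical stand-in for one HMS notification log entry.
    event_id: int
    db: str
    table: str
    payload: str


@dataclass
class CachedTable:
    # Hypothetical stand-in for a table in the catalogD cache.
    db: str
    name: str
    last_sync_event_id: int = 0
    applied: List[str] = field(default_factory=list)


def sync_to_latest_event(
    table: CachedTable,
    fetch_events_after: Callable[[int], Iterable[Event]],
) -> int:
    """Replay pending events for one table in notification-log order."""
    # Fetch every event newer than the table's last synced event ID.
    pending = sorted(
        fetch_events_after(table.last_sync_event_id),
        key=lambda e: e.event_id,
    )
    for event in pending:
        # Filter on our side: events for other tables are returned too.
        if (event.db, event.table) != (table.db, table.name):
            continue
        table.applied.append(event.payload)
        # Assumption: advance only to the last event applied to this table.
        table.last_sync_event_id = event.event_id
    return table.last_sync_event_id


# Usage with an in-memory list standing in for the notification log.
log = [
    Event(1, "db1", "t1", "create"),
    Event(2, "db1", "t2", "create"),
    Event(3, "db1", "t1", "add_partition"),
]
t1 = CachedTable("db1", "t1", last_sync_event_id=1)
sync_to_latest_event(t1, lambda last: [e for e in log if e.event_id > last])
```

With a server-side filter such as the one proposed in HIVE-27499, the
client-side filtering loop in this sketch would no longer be needed.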

To enable this behavior, set 'enable_sync_to_latest_event_on_ddls' to
true.

Note: We don't modify the cache using MetastoreEventsProcessor for the
alter table rename operation, as this is a complex operation with
respect to cache modification (IMPALA-12553 has more details about
this). We also don't modify the cache this way for the truncate table
operation (IMPALA-12636 has more details about this), nor for the
'refresh table'/'invalidate metadata table' commands. A few more
properties, such as numFiles, numRows and totalSize, are added to the
'file_metadata_reload_properties' config (see IMPALA-11534 for more
details).

Testing:
1) Added a few tests in MetastoreEventsProcessorTest that verify this
feature by simulating the metadata sync between HMS and Impala.
2) Added a few tests in the CatalogHmsSyncToLatestEventIdTest class to
verify the metadata sync between the HMS endpoint, Catalog Metastore
Server and Impala. The HMS endpoint serves as a common interface to
metadata changes made outside the current Impala service, such as by
Hive, Spark or another Impala service. Also verified that the table's
lastSyncEventId is updated after the events are synced and confirmed
that the metastore event processor ignores these synced events.
3) Added some end-to-end tests in test_sync_to_latest_hms_events.py
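
The check in item 2, that the event processor ignores events which were
already synced, can be pictured with the sketch below. It assumes the
decision is a plain comparison against the table's lastSyncEventId; the
names TableState, should_skip and process are hypothetical and are not
taken from the Impala code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableState:
    # Hypothetical view of a cached table: only the last synced event ID.
    last_sync_event_id: int


def should_skip(event_id: int, table: TableState) -> bool:
    """Assumed rule: an event at or below the last synced ID is already applied."""
    return event_id <= table.last_sync_event_id


def process(event_ids: List[int], table: TableState) -> List[int]:
    """Apply events in order and return the IDs that were actually applied."""
    applied = []
    for event_id in sorted(event_ids):
        if should_skip(event_id, table):
            continue
        applied.append(event_id)
        table.last_sync_event_id = event_id
    return applied


# A DDL already synced the table up to event 12, so 11 and 12 are ignored.
state = TableState(last_sync_event_id=12)
applied_ids = process([11, 12, 13], state)
```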

Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
---
M be/src/catalog/catalog-server.cc
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
A tests/custom_cluster/test_sync_to_latest_hms_events.py
A tests/metadata/__init__.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
M tests/metadata/test_event_processing.py
M tests/metadata/test_recover_partitions.py
18 files changed, 1,106 insertions(+), 368 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/23
--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 23
Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
