[jira] [Created] (IMPALA-11179) INVALIDATE METADATA might not bring up the table when event processor is enabled

Fri, 11 Mar 2022 01:26:19 -0800

Quanlong Huang created IMPALA-11179:
---------------------------------------

             Summary: INVALIDATE METADATA <table> might not bring up the table 
when event processor is enabled
                 Key: IMPALA-11179
                 URL: https://issues.apache.org/jira/browse/IMPALA-11179
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Quanlong Huang


When a table is created outside Impala (e.g. via Hive/Spark), we use INVALIDATE 
METADATA <table> (IM) to bring it up in Impala. However, when the event 
processor is enabled, IM might not work if the database and the table are 
created in quick succession outside Impala.

The following test code reveals the bug:
{code:python}
  def test_describe_after_invalidate(self, unique_name):
    self.run_stmt_in_hive("create database %s" % unique_name)
    self.run_stmt_in_hive("create table %s.tbl (x int)" % unique_name)
    self.client.execute("invalidate metadata %s.tbl" % unique_name)
    self.client.execute("describe %s.tbl" % unique_name)
{code}
The last DESCRIBE statement will fail.
If the event processor is disabled (by setting --hms_event_polling_interval_s=0 
in catalogd), the test passes.

*Root Cause Analysis*

Catalogd receives and applies the createDatabase event and then publishes the 
catalog update to statestore. Logs:
{code:java}
I0311 12:00:03.479416 23584 MetastoreEventsProcessor.java:787] Received 3 
events. Start event id : 84034
...
I0311 12:00:03.546059 23584 MetastoreEvents.java:617] EventId: 84036 EventType: 
CREATE_DATABASE Successfully added database 
test_describe_after_invalidate_a8718cc4
I0311 12:00:03.843058 24858 catalog-server.cc:813] Collected update: 
1:DATABASE:test_describe_after_invalidate_a8718cc4, version=1940, original 
size=246, compressed size=174
I0311 12:00:03.843199 24858 catalog-server.cc:813] Collected update: 
1:CATALOG_SERVICE_ID, version=1940, original size=60, compressed size=58
I0311 12:00:05.839917 24865 catalog-server.cc:400] A catalog update with 2 
entries is assembled. Catalog version: 1940 Last sent catalog version: 
1939{code}
Before catalogd receives the createTable event, the IM request arrives:
{code:java}
I0311 12:00:05.939095 24876 TAcceptQueueServer.cpp:340] New connection to 
server CatalogService from client <Host: ::ffff:127.0.0.1 Port: 37536>
I0311 12:00:05.957701 25399 CatalogServiceCatalog.java:2592] Invalidating table 
metadata: test_describe_after_invalidate_a8718cc4.tbl
{code}
It then returns *{color:#FF0000}only{color}* the table object to the 
coordinator, because the database update has already been published to the 
statestore. However, the coordinator hasn't received the database update from 
the statestore yet. When the coordinator applies the table update, the table 
is ignored because its parent database does not exist.
{code:java}
I0311 12:00:05.911598 25120 Frontend.java:1636] 
4841a8e164843bef:b2539dd900000000] Analyzing query: invalidate metadata 
test_describe_after_invalidate_a8718cc4.tbl db: default
I0311 12:00:05.915292 25120 BaseAuthorizationChecker.java:112] 
4841a8e164843bef:b2539dd900000000] Authorization check took 3 ms 
I0311 12:00:05.915319 25120 Frontend.java:1679] 
4841a8e164843bef:b2539dd900000000] Analysis and authorization finished.
I0311 12:00:05.938761 25120 client-request-state.cc:754] 
4841a8e164843bef:b2539dd900000000] DDL exec mode=asynchronous
I0311 12:00:05.991802 25398 ImpaladCatalog.java:223] 
4841a8e164843bef:b2539dd900000000] Adding: 
TABLE:test_describe_after_invalidate_a8718cc4.tbl version: 1941 size: 79
I0311 12:00:05.991894 25398 ImpaladCatalog.java:460] 
4841a8e164843bef:b2539dd900000000] Parent database of table does not exist: 
test_describe_after_invalidate_a8718cc4.tbl
{code}
The related code: ImpaladCatalog#addTable()
{code:java}
private void addTable(TTable thriftTable, List<THdfsPartition> newPartitions,
    long catalogVersion) throws TableLoadingException {
  Db db = getDb(thriftTable.db_name);
  if (db == null) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("Parent database of table does not exist: " + 
          thriftTable.db_name + "." + thriftTable.tbl_name);
    }   
    return;
  }
{code}
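
The ordering problem described above can be reproduced with a small toy model. This is only an illustrative sketch, not Impala code: the CoordinatorCatalog class, the replay() helper and the tuple format of the updates are all hypothetical, and the only behavior borrowed from the excerpt above is the early return when the parent database is missing.

```python
class CoordinatorCatalog:
    """Toy stand-in for the coordinator's local catalog cache (hypothetical)."""

    def __init__(self):
        # Maps db name -> {table name -> catalog version}.
        self.dbs = {}

    def add_db(self, db_name):
        self.dbs.setdefault(db_name, {})

    def add_table(self, db_name, tbl_name, version):
        """Mirror the early return: drop the table if its database is unknown."""
        db = self.dbs.get(db_name)
        if db is None:
            return False  # update silently ignored
        db[tbl_name] = version
        return True

    def has_table(self, db_name, tbl_name):
        return tbl_name in self.dbs.get(db_name, {})


def replay(updates):
    """Apply updates in arrival order and return the resulting catalog."""
    coordinator = CoordinatorCatalog()
    for update in updates:
        if update[0] == "db":
            coordinator.add_db(update[1])
        else:
            coordinator.add_table(update[1], update[2], update[3])
    return coordinator


# Order from the logs: the IM response (table, version 1941) is applied
# before the statestore delivers the database (version 1940).
racy = replay([("table", "db1", "tbl", 1941), ("db", "db1", 1940)])
# Order that would work: database first, then table.
ordered = replay([("db", "db1", 1940), ("table", "db1", "tbl", 1941)])

print(racy.has_table("db1", "tbl"))     # False
print(ordered.has_table("db1", "tbl"))  # True
```

In the racy ordering the table update is lost for good, because nothing re-sends it after the database arrives, which matches the failing DESCRIBE in the test.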

*Proposed Solution*
Currently there is no way for catalogd to know the catalog version of each 
coordinator. The version should be carried in the catalog request, so catalogd 
can return exactly what the coordinator needs. In this case, catalogd should 
return the database object and the table object together.
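
A rough sketch of that idea, under stated assumptions: build_im_response(), its parameters, and the "DATABASE:..." / "TABLE:..." key strings are hypothetical names invented for illustration, and no such field exists in the current catalog request as far as this report describes. The point is only that once catalogd knows the coordinator's catalog version, it can tell that the parent database has not reached the coordinator yet.

```python
def build_im_response(catalog_objects, table_key, parent_db_key,
                      coordinator_version):
    """Return the object keys to send back for INVALIDATE METADATA <table>.

    catalog_objects maps a key such as "DATABASE:db1" to the catalog
    version at which that object was last changed (hypothetical layout).
    coordinator_version is the version the coordinator reports in its
    request (hypothetical field).
    """
    response = []
    # The coordinator has not seen the database yet if the database's
    # version is newer than the version the coordinator reports.
    if catalog_objects[parent_db_key] > coordinator_version:
        response.append(parent_db_key)
    response.append(table_key)
    return response


objects = {"DATABASE:db1": 1940, "TABLE:db1.tbl": 1941}

# Coordinator is still at 1939: it needs the database and the table.
print(build_im_response(objects, "TABLE:db1.tbl", "DATABASE:db1", 1939))
# Coordinator already applied 1940: the table alone is enough.
print(build_im_response(objects, "TABLE:db1.tbl", "DATABASE:db1", 1940))
```

Putting the database ahead of the table in the response keeps the coordinator-side check for the parent database satisfied when the objects are applied in order.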



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
