[ https://issues.apache.org/jira/browse/IMPALA-9101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967783#comment-16967783 ]
Vihang Karajgaonkar commented on IMPALA-9101:
---------------------------------------------

I am going to use this JIRA to fix another issue with self-events, which happens if the table or partition is reloaded before the event is received. Currently, for partitions, if a partition refresh happens between the two steps below, we lose the information about in-flight events, which causes an unnecessary refresh.
1. A DDL runs on the partition and an event is generated.
2. The event is received; the events processor detects whether it is a self-event and refreshes the partition if needed.

> Unneccessary REFRESH due to wrong self-event detection
> ------------------------------------------------------
>
>                 Key: IMPALA-9101
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9101
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Vihang Karajgaonkar
>            Priority: Critical
>
> In {{CatalogOpExecutor.alterTable()}}, we call {{addVersionsForInflightEvents()}} whether or not the AlterTable operation changes anything. If nothing changes, no HMS RPCs are sent, so no event is generated. The event processor ends up waiting on a nonexistent self-event. All subsequent self-events are then treated as outside events, and an unnecessary REFRESH/INVALIDATE is performed on this table.
> Code:
> {code:java}
> private void alterTable(TAlterTableParams params, TDdlExecResponse response)
>     throws ImpalaException {
>   ....
>   tryLock(tbl);
>   // Get a new catalog version to assign to the table being altered.
>   long newCatalogVersion = catalog_.incrementAndGetCatalogVersion();
>   addCatalogServiceIdentifiers(tbl, catalog_.getCatalogServiceId(), newCatalogVersion);
>   ....
>   // now that HMS alter operation has succeeded, add this version to list of inflight
>   // events in catalog table if event processing is enabled
>   catalog_.addVersionsForInflightEvents(tbl, newCatalogVersion); // <---- We should check before calling this.
> }
> {code}
> Reproduce:
> {code:sql}
> create table testtbl (col int) partitioned by (p1 int, p2 int);
> alter table testtbl add partition (p1=2,p2=6);
> alter table testtbl add if not exists partition (p1=2,p2=6);
> -- After this point, can't detect self-events on this table
> alter table testtbl add partition (p1=2,p2=7);
> {code}
> Catalogd logs:
> {code:bash}
> I1029 07:41:15.310956 8546 HdfsTable.java:630] Loaded file and block metadata for default.testtbl partitions: p1=2/p2=6
> I1029 07:41:15.892410 8321 MetastoreEventsProcessor.java:480] Received 1 events. Start event id : 11463
> I1029 07:41:15.895717 8321 MetastoreEvents.java:396] EventId: 11464 EventType: ADD_PARTITION Creating event 11464 of type ADD_PARTITION on table default.testtbl
> I1029 07:41:15.940225 8321 MetastoreEvents.java:241] Total number of events received: 1 Total number of events filtered out: 0
> I1029 07:41:15.940414 8321 MetastoreEvents.java:385] EventId: 11464 EventType: ADD_PARTITION Not processing the event as it is a self-event
> #### Correctly recognized as a self-event ^^^^
> I1029 07:41:16.829824 8329 catalog-server.cc:641] Collected update: 1:TABLE:default.testtbl, version=1385, original size=4438, compressed size=1216
> I1029 07:41:16.831853 8329 catalog-server.cc:641] Collected update: 1:CATALOG_SERVICE_ID, version=1385, original size=60, compressed size=58
> I1029 07:41:18.827137 8339 catalog-server.cc:337] A catalog update with 2 entries is assembled. Catalog version: 1385 Last sent catalog version: 1384
> #### No events for adding partition p1=2,p2=6 again. But we still bump the catalog version.
> I1029 07:45:38.900974 8329 catalog-server.cc:641] Collected update: 1:CATALOG_SERVICE_ID, version=1386, original size=60, compressed size=58
> I1029 07:45:40.899353 8339 catalog-server.cc:337] A catalog update with 1 entries is assembled. Catalog version: 1386 Last sent catalog version: 1385
> #### Creating partition p1=2,p2=7
> I1029 07:45:48.827221 8546 HdfsTable.java:630] Loaded file and block metadata for default.testtbl partitions: p1=2/p2=7
> I1029 07:45:48.904234 8329 catalog-server.cc:641] Collected update: 1:TABLE:default.testtbl, version=1387, original size=4886, compressed size=1251
> I1029 07:45:48.905262 8329 catalog-server.cc:641] Collected update: 1:CATALOG_SERVICE_ID, version=1387, original size=60, compressed size=58
> I1029 07:45:49.523567 8321 MetastoreEventsProcessor.java:480] Received 1 events. Start event id : 11464
> I1029 07:45:49.524150 8321 MetastoreEvents.java:396] EventId: 11465 EventType: ADD_PARTITION Creating event 11465 of type ADD_PARTITION on table default.testtbl
> I1029 07:45:49.527262 8321 MetastoreEvents.java:241] Total number of events received: 1 Total number of events filtered out: 0
> I1029 07:45:49.530278 8321 MetastoreEvents.java:385] EventId: 11465 EventType: ADD_PARTITION Trying to refresh 1 partitions added to table default.testtbl in the event
> I1029 07:45:49.531026 8321 CatalogServiceCatalog.java:2572] Refreshing partition metadata: default.testtbl p1=2/p2=7 (processing ADD_PARTITION event from HMS)
> #### Unnecessary REFRESH ^^^^
> I1029 07:45:49.604936 8321 HdfsTable.java:630] Loaded file and block metadata for default.testtbl partitions: p1=2/p2=7
> I1029 07:45:49.605069 8321 CatalogServiceCatalog.java:2594] Refreshed partition metadata: default.testtbl p1=2/p2=7
> I1029 07:45:49.605273 8321 MetastoreEvents.java:385] EventId: 11465 EventType: ADD_PARTITION Refreshed 1 partitions of table default.testtbl
> I1029 07:45:50.901763 8339 catalog-server.cc:337] A catalog update with 2 entries is assembled. Catalog version: 1387 Last sent catalog version: 1386
> I1029 07:45:50.904940 8329 catalog-server.cc:641] Collected update: 1:TABLE:default.testtbl, version=1388, original size=4886, compressed size=1251
> I1029 07:45:50.905792 8329 catalog-server.cc:641] Collected update: 1:CATALOG_SERVICE_ID, version=1388, original size=60, compressed size=58
> I1029 07:45:52.902602 8339 catalog-server.cc:337] A catalog update with 2 entries is assembled. Catalog version: 1388 Last sent catalog version: 1387
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
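
Editorial illustration of the failure mode described in the issue. This is a minimal sketch, not Impala's code: the class `SelfEventSketch`, its nested `InflightVersions`, and the methods `register`, `isSelfEvent`, and `replay` are all hypothetical names. It also assumes, purely for illustration, that the events processor matches an incoming event only against the oldest pending in-flight version; the real matching logic may differ. The catalog versions 1385, 1386, and 1387 are taken from the catalogd logs in the issue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SelfEventSketch {
  /** Hypothetical stand-in for a table's list of in-flight catalog versions. */
  static final class InflightVersions {
    private final Deque<Long> pending = new ArrayDeque<>();

    /**
     * Registers a version for a finished DDL. With the guard enabled, the
     * version is recorded only if an HMS RPC was actually sent, because only
     * then will a matching event arrive later.
     */
    void register(long version, boolean hmsRpcSent, boolean guarded) {
      if (!guarded || hmsRpcSent) pending.addLast(version);
    }

    /**
     * Assumption for illustration: an event counts as a self-event only if
     * its version equals the oldest pending version, which is then consumed.
     */
    boolean isSelfEvent(long versionInEvent) {
      Long head = pending.peekFirst();
      if (head != null && head == versionInEvent) {
        pending.removeFirst();
        return true;
      }
      return false;
    }
  }

  /** Replays the three ALTER TABLE statements from the reproduction steps. */
  static boolean[] replay(boolean guarded) {
    InflightVersions versions = new InflightVersions();
    boolean[] detected = new boolean[2];
    // add partition (p1=2,p2=6): HMS RPC sent, an ADD_PARTITION event follows.
    versions.register(1385L, true, guarded);
    detected[0] = versions.isSelfEvent(1385L);
    // add if not exists partition (p1=2,p2=6): nothing changes, no event arrives.
    versions.register(1386L, false, guarded);
    // add partition (p1=2,p2=7): HMS RPC sent, an ADD_PARTITION event follows.
    versions.register(1387L, true, guarded);
    detected[1] = versions.isSelfEvent(1387L);
    return detected;
  }

  public static void main(String[] args) {
    boolean[] unguarded = replay(false);
    boolean[] guarded = replay(true);
    System.out.println("without check: " + unguarded[0] + ", " + unguarded[1]);
    System.out.println("with check:    " + guarded[0] + ", " + guarded[1]);
  }
}
```

Under these assumptions, the run without the check recognizes the first event but misses the second, because the stale version 1386 is never consumed. With the check, both events are recognized as self-events.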