[ https://issues.apache.org/jira/browse/IMPALA-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724936#comment-17724936 ]
Quanlong Huang commented on IMPALA-11768: ----------------------------------------- Yeah, catching InvalidObjectException makes sense to me. > EventProcessor should ignore failures on non-existing dbs/tables > ---------------------------------------------------------------- > > Key: IMPALA-11768 > URL: https://issues.apache.org/jira/browse/IMPALA-11768 > Project: IMPALA > Issue Type: Bug > Components: Catalog > Reporter: Quanlong Huang > Assignee: Sai Hemanth Gantasala > Priority: Critical > Attachments: > catalogd.84223c9b5251.invalid-user.log.INFO.20221129-132600.1.gz, > hive-metastore.log.gz, hive-server2.log.gz, > impalad.b99660bbd523.invalid-user.log.INFO.20221129-132601.1.bz2 > > > We recently saw test failures in > [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/6697/] > {noformat} > metadata.test_event_processing.TestEventProcessing.test_event_based_replication > metadata.test_event_processing.TestEventProcessing.test_empty_partition_events > metadata.test_event_processing.TestEventProcessing.test_transactional_insert_events > metadata.test_event_processing.TestEventProcessing.test_insert_events > metadata.test_event_processing.TestEventProcessing.test_empty_partition_events_transactional > {noformat} > The errors are similar: > {code:python} > metadata/test_event_processing.py:179: in test_event_based_replication > self.__run_event_based_replication_tests() > metadata/test_event_processing.py:194: in __run_event_based_replication_tests > EventProcessorUtils.wait_for_event_processing(self) > util/event_processor_utils.py:65: in wait_for_event_processing > within {1} seconds".format(current_event_id, timeout)) > E Exception: Event processor did not sync till last known event id 21676 > within 10 seconds > {code} > Stderr: > {noformat} > SET > client_identifier=metadata/test_event_processing.py::TestEventProcessing::()::test_event_based_replication; > -- connecting to: localhost:21000 > -- connecting to localhost:21050 with impyla > -- 2022-11-29 14:10:14,332 INFO MainThread: Closing active operation > -- connecting to localhost:28000 with impyla > -- 2022-11-29 14:10:14,346 INFO MainThread: Closing active operation > -- 2022-11-29 14:10:19,364 INFO MainThread: Waiting until events > processor syncs to event id:21676 > {noformat} > Looking into the catalogd logs, the event-processor is already in the > NEEDS_INVALIDATE state so it's paused: > {noformat} > W1129 14:10:09.401335 92 MetastoreEventsProcessor.java:844] Event > processing is skipped since status is NEEDS_INVALIDATE. Last synced event id > is 21538 > {noformat} > The last event it processed is an INSERT event. The metadata reloading failed: > {noformat} > I1129 14:10:07.365427 92 MetastoreEventsProcessor.java:805] Received 36 > events. Start event id : 21538 > I1129 14:10:07.367424 92 MetastoreEvents.java:631] EventId: 21552 > EventType: DROP_DATABASE Creating event 21552 of type DROP_DATABASE on > database test_views_ddl_b1c33615 > I1129 14:10:07.371448 92 MetastoreEvents.java:631] EventId: 21573 > EventType: DROP_DATABASE Creating event 21573 of type DROP_DATABASE on > database test_compute_stats_9c95c5d8 > I1129 14:10:07.371696 92 MetastoreEvents.java:281] Total number of events > received: 36 Total number of events filtered out: 0 > I1129 14:10:07.371767 92 MetastoreEvents.java:285] Incremented skipped > metric to 2631 > I1129 14:10:07.371951 92 HdfsTable.java:2744] Reloading partition > metadata: test_alter_table_hdfs_caching_95073aa2.i1670e_alter x=1 (INSERT > event) > W1129 14:10:07.381459 92 RetryingMetaStoreClient.java:269] MetaStoreClient > lost connection. Attempting to reconnect (1 of 1) after 1s. > getPartitionsByNames > Java exception follows: > org.apache.thrift.TApplicationException: Internal error processing > get_partitions_by_names_req > ... > {noformat} > It then fails in the retry, and needs manual invalidate metadata: > {noformat} > I1129 14:10:08.381718 92 RetryingMetaStoreClient.java:185] > RetryingMetaStoreClient trying reconnect as ubuntu (auth:SIMPLE) > ... > I1129 14:10:08.395550 92 HiveMetaStoreClient.java:821] Connected to > metastore. > I1129 14:10:08.400802 92 MetastoreEventsProcessor.java:1026] Time elapsed > in processing event batch: 1s035ms > E1129 14:10:08.401016 92 MetastoreEventsProcessor.java:859] Event > processing needs a invalidate command to resolve the state > Java exception follows: > org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException: > EventId: 21539 EventType: INSERT Refresh partition on table {} partition {} > failed. Event processing cannot continue. Issue an invalidate metadata > command to reset the event processor state. > at > org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.processPartitionInserts(MetastoreEvents.java:1298) > at > org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.process(MetastoreEvents.java:1277) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:514) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1013) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:851) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.impala.catalog.CatalogException: Unexpected error while > retrieving partitions for table > test_alter_table_hdfs_caching_95073aa2.i1670e_alter > at > org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames(HdfsTable.java:2768) > at > org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExist(CatalogOpExecutor.java:4501) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions(MetastoreEvents.java:879) > at > org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.processPartitionInserts(MetastoreEvents.java:1295) > ... 11 more > Caused by: org.apache.thrift.TApplicationException: Internal error processing > get_partitions_by_names_req > at > org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_names_req(ThriftHiveMetastore.java:4162) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_names_req(ThriftHiveMetastore.java:4149) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNamesInternal(HiveMetaStoreClient.java:2500) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNames(HiveMetaStoreClient.java:2491) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNames(HiveMetaStoreClient.java:2468) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNames(HiveMetaStoreClient.java:2445) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNames(HiveMetaStoreClient.java:2409) > at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212) > at com.sun.proxy.$Proxy11.getPartitionsByNames(Unknown Source) > at > org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames(HdfsTable.java:2749) > ... 14 more > {noformat} > In HMS logs, the internal error is that the table is dropped: > {noformat} > 2022-11-29T14:10:08,398 ERROR [TThreadPoolServer WorkerProcess-28] > metastore.RetryingHMSHandler: > InvalidObjectException(message:hive.test_alter_table_hdfs_caching_95073aa2.i1670e_alter > table not found) > ... > 2022-11-29T14:10:08,398 ERROR [TThreadPoolServer WorkerProcess-28] > thrift.ProcessFunction: Internal error processing get_partitions_by_names_req > org.apache.hadoop.hive.metastore.api.InvalidObjectException: > hive.test_alter_table_hdfs_caching_95073aa2.i1670e_alter table not found > ... > {noformat} > It's dropped by a concurrent DROP DATABASE CASCADE command. In the catalogd > logs: > {noformat} > I1129 14:10:07.005285 104 JniCatalog.java:257] execDdl request: > DROP_DATABASE test_alter_table_hdfs_caching_95073aa2 issued by ubuntu > ... > I1129 14:10:07.387348 104 CatalogServiceCatalog.java:3290] Should wait for > next update for DATABASE:test_alter_table_hdfs_caching_95073aa2: older > version 4077 is sent. Expects a version >= 4727. > {noformat} > When failed to process the event, the event-processor should doulbe check > whether the target (db/table) still exists in HMS in case of such concurrent > drop operations. If the target no longer exists, just skip the failure and > invalidate the corresponding catalog object. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org