[ https://issues.apache.org/jira/browse/IMPALA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
bharath v resolved IMPALA-7961.
-------------------------------
    Resolution: Fixed
    Fix Version/s: Impala 3.2.0

> Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-7961
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7961
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.12.0, Impala 3.1.0
>            Reporter: bharath v
>            Assignee: bharath v
>            Priority: Critical
>             Fix For: Impala 3.2.0
>
>         Attachments: 0001-Repro-of-IMPALA-7961.patch
>
>
> When the catalog server is under heavy load with concurrent updates to
> objects, queries with SYNC_DDL can fail with the following message.
>
> *User facing error message:*
> {noformat}
> ERROR: CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 3 attempts.The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
> {noformat}
>
> *Exception from the catalog server log:*
> {noformat}
> I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic version (msec): 1088
> I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic version (msec): 12625
> I1031 00:00:49.168851 1131986 jni-util.cc:230] org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 3 attempts.The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
>     at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
>     at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
>     at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)
>     ::::
> {noformat}
>
> *What this means*
> The catalog operation itself succeeds (the change has been committed to HMS
> and to the catalog server cache), but the catalog server notices that
> broadcasting the changes is taking longer than expected (for whatever
> reason) and, instead of continuing to wait, fails fast. The coordinators
> are expected to eventually sync up in the background.
>
> *Problem*
> - This violates the contract of the SYNC_DDL query option, since the query
> returns early.
> - This is a behavioral regression from the pre-IMPALA-5058 state, where
> queries would wait forever for SYNC_DDL-based changes to propagate.
>
> *Notes*
> - Introduced by IMPALA-5058.
> - Based on the occurrences of this issue, we narrowed it down to a specific
> kind of DDL (see Jira comments).
> - My understanding is that this also applies to Catalog V2 (or LocalCatalog
> mode), since we still rely on the CatalogServer for DDL orchestration and
> hence it takes this codepath.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
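
As an illustration of the fail-fast behavior described under "What this means", here is a minimal Java sketch that contrasts a bounded wait (giving up after a fixed number of attempts, as the error message suggests) with an unbounded wait (the pre-IMPALA-5058 behavior described above). This is not Impala's code: the class, method, and exception names are hypothetical, and the list of "covered versions" is a simulated stand-in for successive catalog topic updates.

```java
import java.util.List;

// Hypothetical sketch: none of these names come from the Impala code base.
class SyncDdlWaitSketch {
    // Stands in for the CatalogException quoted in the report.
    static class FailFastException extends RuntimeException {
        FailFastException(String message) { super(message); }
    }

    /**
     * Simulates waiting for a DDL's changes to appear in a catalog topic update.
     *
     * @param coveredVersions highest catalog version covered by each successive
     *                        topic update, in order (simulated input)
     * @param ddlVersion      catalog version produced by the DDL
     * @param maxAttempts     number of topic updates to inspect before giving up;
     *                        Integer.MAX_VALUE models an unbounded wait
     * @return 1-based index of the first topic update that covers ddlVersion
     */
    static int waitForCoveringUpdate(List<Long> coveredVersions, long ddlVersion,
                                     int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            if (attempt > maxAttempts) {
                // Bounded wait: the DDL is already committed, but the caller
                // is told that the broadcast was not confirmed.
                throw new FailFastException(
                    "gave up after " + maxAttempts + " attempts");
            }
            if (attempt > coveredVersions.size()) {
                // Only reachable in this simulation, which has finite input.
                throw new IllegalStateException("simulation ran out of topic updates");
            }
            if (coveredVersions.get(attempt - 1) >= ddlVersion) {
                return attempt;
            }
        }
    }
}
```

With simulated updates covering versions 10, 20, 30 and 40, a DDL at version 35 is only covered by the fourth update. A bounded wait with maxAttempts = 3 throws even though the DDL itself has been applied, whereas an unbounded wait returns once the fourth update arrives. Under heavy concurrent load, a DDL may need more topic updates than the bound allows, which is the failure mode reported here.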