[ https://issues.apache.org/jira/browse/IMPALA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
bharath v updated IMPALA-7961:
------------------------------
    Priority: Critical  (was: Major)

> Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-7961
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7961
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: bharath v
>            Priority: Critical
>
> When the catalog server is under heavy load with concurrent updates to objects,
> queries with SYNC_DDL can fail with the following message.
> *User facing error message:*
> {noformat}
> ERROR: CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 3 attempts.The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
> {noformat}
> *Exception from the catalog server log:*
> {noformat}
> I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic version (msec): 1088
> I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic version (msec): 12625
> I1031 00:00:49.168851 1131986 jni-util.cc:230] org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 3 attempts.The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
>     at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
>     at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
>     at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)
>     ::::
> {noformat}
> *What this means*
> The catalog operation itself succeeded: the change has been committed to HMS
> and to the Catalog server cache. However, the Catalog server noticed that
> broadcasting the change was taking longer than expected (for whatever reason),
> and instead of continuing to wait, it failed fast. The coordinators are
> expected to eventually sync up in the background.
> *Problem*
> - This violates the contract of the SYNC_DDL query option, since the query
> returns before the change has been propagated to all coordinators.
> - This is a behavioral regression from the pre-IMPALA-5058 behavior, where
> queries would wait forever for SYNC_DDL-based changes to propagate.
> *Notes*
> - The usual suspect is heavily concurrent catalog operations combined with
> long-running DDLs.
> - Introduced by IMPALA-5058.
> Please refer to the JIRA comment for a technical explanation of why this is
> happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org