[ https://issues.apache.org/jira/browse/IMPALA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell resolved IMPALA-9549.
-----------------------------------
    Fix Version/s: Impala 4.0
   Target Version: Impala 4.0
       Resolution: Fixed

> Impalad startup fails to wait for catalogd to startup when using local catalog
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-9549
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9549
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>             Fix For: Impala 4.0
>
>
> Since Impala coordinators and executors may be starting up at the same time
> as the catalogd, they should be tolerant of delays in the catalogd starting
> up. When using local catalog (use_local_catalog=true), the Impalads fail with
> the following error if the catalogd startup is delayed:
> {noformat}
> I0323 14:22:03.151849 29565 jni-util.cc:288] org.apache.impala.catalog.local.LocalCatalogException: Unable to load database names
> I0323 14:22:03.151849 29565 jni-util.cc:288] org.apache.impala.catalog.local.LocalCatalogException: Unable to load database names
>     at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:94)
>     at org.apache.impala.catalog.local.LocalCatalog.getDbs(LocalCatalog.java:83)
>     at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:753)
>     at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:220)
> Caused by: org.apache.thrift.TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:382)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:174)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:583)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:578)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.loadDbList(CatalogdMetaProvider.java:577)
>     at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:92)
>     ... 3 more
> Caused by: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
>     at org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native Method)
>     at org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:440)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:380)
>     ... 9 more
> I0323 14:22:03.217051 29565 status.cc:126] LocalCatalogException: Unable to load database names
> CAUSED BY: TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
> {noformat}
> What happens is that the ImpalaServer constructor calls
> ImpalaServer::UpdateCatalogMetrics()
> ([https://github.com/apache/impala/blob/3b833902519fb8f0ef9b5fd20919c5fd85d22fcf/be/src/service/impala-server.cc#L452]).
> UpdateCatalogMetrics() is maintaining two metrics that track the number
> of databases and the number of tables. This ends up calling
> org.apache.impala.catalog.local.LocalCatalog.getDbs(), which calls loadDbs()
> ([https://github.com/apache/impala/blob/ca0785ec206f27f06d8d6fd1b710779e548bbd8e/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java#L83]).
> loadDbs() requires a connection to catalogd and will fail if it cannot
> connect.
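
To make the failure mode concrete, here is a minimal C++ sketch. It is not Impala's code: FakeCatalogd, LoadDbs, CatalogMetrics, and both UpdateCatalogMetrics variants are hypothetical stand-ins. The eager variant models the startup path described above, where listing databases needs a live connection and the error propagates. The tolerant variant shows one possible way to survive a delayed catalogd, and is not necessarily the fix that was committed for this issue.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for catalogd: reachable only once it has started.
struct FakeCatalogd {
  bool started = false;
  std::vector<std::string> db_names{"default", "functional"};
};

// Models the local-catalog behaviour: listing databases always needs a
// round trip to catalogd and fails if the connection cannot be opened.
std::vector<std::string> LoadDbs(const FakeCatalogd& catalogd) {
  if (!catalogd.started) {
    throw std::runtime_error(
        "Couldn't open transport (connect() failed: Connection refused)");
  }
  return catalogd.db_names;
}

// -1 means "not known yet".
struct CatalogMetrics {
  long num_dbs = -1;
};

// Eager variant: mirrors the failing startup path. Any connection error
// propagates to the caller.
void UpdateCatalogMetricsEager(const FakeCatalogd& catalogd,
                               CatalogMetrics* metrics) {
  metrics->num_dbs = static_cast<long>(LoadDbs(catalogd).size());
}

// Tolerant variant (an assumption, for illustration only): on failure the
// metrics are left untouched and the caller is told to try again later.
bool UpdateCatalogMetricsTolerant(const FakeCatalogd& catalogd,
                                  CatalogMetrics* metrics) {
  try {
    metrics->num_dbs = static_cast<long>(LoadDbs(catalogd).size());
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

With `started == false`, the eager variant throws while the tolerant one returns false and leaves `num_dbs` at -1. Once `started` is set to true, both populate the metric.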
> Importantly, this all happens before waiting for the catalogd to start up in
> the regular ImpalaServer::Start():
> {code:cpp}
> if (FLAGS_is_coordinator) exec_env_->frontend()->WaitForCatalog();
> {code}
>
> In the old catalog implementation (use_local_catalog=false), the getDbs()
> call on the catalog returns whatever values it already has and does not try
> to contact the catalogd. This is why the regular (non-local-catalog) case
> does not see this problem.
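
The ordering issue described in the report comes down to "wait until catalogd is reachable, then touch catalog-dependent state". The following C++ sketch illustrates that pattern with a bounded polling loop. WaitForCatalogReady and its parameters are hypothetical names chosen for this example; this is not the implementation of Impala's WaitForCatalog().

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls `is_catalog_ready` until it returns true or `max_attempts` probes
// have been made, sleeping `delay` between probes. Returns true if the
// catalog became ready, false if all attempts were used up.
bool WaitForCatalogReady(const std::function<bool()>& is_catalog_ready,
                         int max_attempts,
                         std::chrono::milliseconds delay) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (is_catalog_ready()) return true;
    // No need to sleep after the final failed probe.
    if (attempt + 1 < max_attempts) std::this_thread::sleep_for(delay);
  }
  return false;
}
```

A caller would run this before any step that needs catalogd, such as a metrics update that lists databases, and decide for itself whether a false result is fatal or just a reason to defer that step.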