Joe McDonnell created IMPALA-9549:
-------------------------------------
Summary: Impalad startup fails to wait for catalogd to start up
when using local catalog
Key: IMPALA-9549
URL: https://issues.apache.org/jira/browse/IMPALA-9549
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 4.0
Reporter: Joe McDonnell
Since Impala coordinators and executors may be starting up at the same time as
the catalogd, they should be tolerant of delays in the catalogd starting up.
When using local catalog (use_local_catalog=true), the Impalads fail with the
following error if the catalogd startup is delayed:
{noformat}
I0323 14:22:03.151849 29565 jni-util.cc:288] org.apache.impala.catalog.local.LocalCatalogException: Unable to load database names
I0323 14:22:03.151849 29565 jni-util.cc:288] org.apache.impala.catalog.local.LocalCatalogException: Unable to load database names
at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:94)
at org.apache.impala.catalog.local.LocalCatalog.getDbs(LocalCatalog.java:83)
at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:753)
at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:220)
Caused by: org.apache.thrift.TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:382)
at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:174)
at org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:583)
at org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:578)
at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
at org.apache.impala.catalog.local.CatalogdMetaProvider.loadDbList(CatalogdMetaProvider.java:577)
at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:92)
... 3 more
Caused by: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
at org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native Method)
at org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:440)
at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:380)
... 9 more
I0323 14:22:03.217051 29565 status.cc:126] LocalCatalogException: Unable to load database names
CAUSED BY: TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
{noformat}
The ImpalaServer constructor calls ImpalaServer::UpdateCatalogMetrics()
([https://github.com/apache/impala/blob/3b833902519fb8f0ef9b5fd20919c5fd85d22fcf/be/src/service/impala-server.cc#L452]).
UpdateCatalogMetrics() maintains two metrics that track the number of
databases and the number of tables. To compute them, it ends up calling
org.apache.impala.catalog.local.LocalCatalog.getDbs(), which calls loadDbs()
([https://github.com/apache/impala/blob/ca0785ec206f27f06d8d6fd1b710779e548bbd8e/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java#L83]).
loadDbs() requires a connection to catalogd and fails if it cannot connect.
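One way to make a metrics refresh of this kind tolerant of an unreachable catalogd is to treat a failed fetch as "no update" rather than as a fatal error. The following is only a minimal sketch under stated assumptions: CatalogCounts and TryRefreshCatalogCounts are hypothetical names, not Impala code, and the fetch callback is assumed to throw std::runtime_error when the connection is refused (Impala's backend reports errors differently). It is not necessarily the fix that was adopted for this issue.

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the two gauges described above.
struct CatalogCounts {
  int64_t num_databases = 0;
  int64_t num_tables = 0;
};

// Hypothetical helper: refreshes `gauges` from `fetch`, which is assumed to
// throw std::runtime_error when catalogd cannot be reached. On failure the
// gauges keep their previous values and the error text is stored in `error`,
// so the caller can log it and continue starting up.
inline bool TryRefreshCatalogCounts(const std::function<CatalogCounts()>& fetch,
                                    CatalogCounts* gauges, std::string* error) {
  try {
    *gauges = fetch();
    return true;
  } catch (const std::runtime_error& e) {
    *error = e.what();
    return false;
  }
}
```

With this shape, a caller in a constructor could ignore a false return and rely on a later refresh once catalogd is up.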
Importantly, this all happens before ImpalaServer::Start() waits for the
catalogd to start up:
{code:cpp}
if (FLAGS_is_coordinator) exec_env_->frontend()->WaitForCatalog();
{code}
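For illustration, waiting on a dependency that may come up late can be sketched as a bounded poll with exponential backoff. WaitUntilReady below is a hypothetical helper, not WaitForCatalog() and not any Impala API; the readiness check, attempt count, and delays are all assumptions chosen for the example.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of Impala: polls `is_ready` until it returns
// true or `max_attempts` is exhausted, doubling the sleep between attempts
// (capped at `max_delay`). Returns true if the dependency became ready.
inline bool WaitUntilReady(const std::function<bool()>& is_ready,
                           int max_attempts,
                           std::chrono::milliseconds initial_delay,
                           std::chrono::milliseconds max_delay) {
  std::chrono::milliseconds delay = initial_delay;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (is_ready()) return true;
    if (attempt == max_attempts) break;  // No point sleeping after the last try.
    std::this_thread::sleep_for(delay);
    delay = std::min(delay * 2, max_delay);
  }
  return false;
}
```

Any call that needs catalogd, such as the first metrics refresh, would then run only after a wait like this has returned true.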
In the old catalog implementation (use_local_catalog=false), getDbs() returns
whatever the impalad's catalog currently holds and does not try to contact the
catalogd, so startup does not hit this problem in that mode.
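The difference between the two modes can be shown with a toy model. Both classes below are hypothetical and greatly simplified; they are not the Impala catalog classes. How the snapshot gets populated is outside the sketch, and the on-demand fetch is assumed to throw std::runtime_error when the remote side is unreachable.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model of the old behavior: database names live in a local snapshot, so
// a read returns whatever is there (possibly nothing) and never fails.
class SnapshotCatalog {
 public:
  void ApplyUpdate(std::vector<std::string> dbs) { dbs_ = std::move(dbs); }
  std::vector<std::string> GetDbs() const { return dbs_; }

 private:
  std::vector<std::string> dbs_;
};

// Toy model of the local-catalog behavior: a read fetches on demand, so any
// error raised by the fetch propagates to the caller.
class OnDemandCatalog {
 public:
  explicit OnDemandCatalog(std::function<std::vector<std::string>()> fetch)
      : fetch_(std::move(fetch)) {}
  std::vector<std::string> GetDbs() const { return fetch_(); }

 private:
  std::function<std::vector<std::string>()> fetch_;
};
```

In this model, a GetDbs() call made before the remote side is up yields an empty list from SnapshotCatalog but an exception from OnDemandCatalog, which mirrors the startup failure described in this issue.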