[ 
https://issues.apache.org/jira/browse/IMPALA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-9549.
-----------------------------------
     Fix Version/s: Impala 4.0
    Target Version: Impala 4.0
        Resolution: Fixed

> Impalad startup fails to wait for catalogd to startup when using local catalog
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-9549
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9549
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>             Fix For: Impala 4.0
>
>
> Since Impala coordinators and executors may be starting up at the same time 
> as the catalogd, they should be tolerant of delays in the catalogd starting 
> up. When using local catalog (use_local_catalog=true), the Impalads fail with 
> the following error if the catalogd startup is delayed:
> {noformat}
> I0323 14:22:03.151849 29565 jni-util.cc:288] 
> org.apache.impala.catalog.local.LocalCatalogException: Unable to load 
> database names
> I0323 14:22:03.151849 29565 jni-util.cc:288] 
> org.apache.impala.catalog.local.LocalCatalogException: Unable to load 
> database names
>  at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:94)
>  at org.apache.impala.catalog.local.LocalCatalog.getDbs(LocalCatalog.java:83)
>  at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:753)
>  at 
> org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:220)
> Caused by: org.apache.thrift.TException: 
> org.apache.impala.common.InternalException: Couldn't open transport for 
> localhost:26000 (connect() failed: Connection refused)
>  at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:382)
>  at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:174)
>  at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:583)
>  at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:578)
>  at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
>  at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadDbList(CatalogdMetaProvider.java:577)
>  at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:92)
>  ... 3 more
> Caused by: org.apache.impala.common.InternalException: Couldn't open 
> transport for localhost:26000 (connect() failed: Connection refused)
>  at org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native 
> Method)
>  at 
> org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:440)
>  at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:380)
>  ... 9 more
> I0323 14:22:03.217051 29565 status.cc:126] LocalCatalogException: Unable to 
> load database names
> CAUSED BY: TException: org.apache.impala.common.InternalException: Couldn't 
> open transport for localhost:26000 (connect() failed: Connection 
> refused){noformat}
> The root cause is that the ImpalaServer constructor calls 
> ImpalaServer::UpdateCatalogMetrics() 
> ([https://github.com/apache/impala/blob/3b833902519fb8f0ef9b5fd20919c5fd85d22fcf/be/src/service/impala-server.cc#L452]
>  ). UpdateCatalogMetrics() maintains two metrics: the number of databases 
> and the number of tables. Computing them calls 
> org.apache.impala.catalog.local.LocalCatalog.getDbs(), which calls loadDbs() 
> ([https://github.com/apache/impala/blob/ca0785ec206f27f06d8d6fd1b710779e548bbd8e/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java#L83]
>  ). loadDbs() requires a connection to catalogd and throws a 
> LocalCatalogException if it cannot connect.
> Importantly, all of this happens in the constructor, before 
> ImpalaServer::Start() waits for the catalogd to start up:
> {code:java}
> if (FLAGS_is_coordinator) exec_env_->frontend()->WaitForCatalog();
> {code}
>  
> In the old catalog implementation (use_local_catalog=false), the getDbs() 
> call on the catalog returns whatever values it already has and does not try 
> to contact the catalogd, which is why the legacy catalog mode does not hit 
> this problem.
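
The description above asks that Impalads tolerate a delayed catalogd. The following is a minimal C++ sketch of that idea only, and it is not the change committed for IMPALA-9549. The names WaitForCatalogSketch, CatalogProbe and FakeCatalogd are hypothetical and do not exist in Impala; the attempt count and delay are arbitrary assumptions. The sketch polls a catalog probe until it answers, instead of failing on the first refused connection.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <thread>

// Hypothetical stand-in for a catalog request; not an Impala API.
// Returns the database count, or std::nullopt while catalogd is unreachable.
using CatalogProbe = std::function<std::optional<int>()>;

// Polls `probe` until it succeeds or `max_attempts` is exhausted, sleeping
// `delay` between attempts. Returns the database count on success and
// std::nullopt if catalogd never became reachable.
std::optional<int> WaitForCatalogSketch(const CatalogProbe& probe,
                                        int max_attempts,
                                        std::chrono::milliseconds delay) {
  assert(max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (std::optional<int> num_dbs = probe()) return num_dbs;
    // Do not sleep after the final failed attempt.
    if (attempt < max_attempts) std::this_thread::sleep_for(delay);
  }
  return std::nullopt;
}

// Simulated catalogd (assumption for illustration) that refuses the first
// `failures` requests and then reports `num_dbs` databases.
struct FakeCatalogd {
  int failures;
  int num_dbs;
  int calls = 0;

  std::optional<int> GetDbCount() {
    ++calls;
    if (calls <= failures) return std::nullopt;  // models "Connection refused"
    return num_dbs;
  }
};
```

A caller could run WaitForCatalogSketch before the first metrics update and, on std::nullopt, skip the update rather than abort startup. Whether to retry, skip, or reorder the calls is a design choice this sketch does not settle.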



