Hello Anurag Mantripragada, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15561

to look at the new patch set (#3).

Change subject: IMPALA-9549: Handle catalogd startup delays when using local catalog
......................................................................

IMPALA-9549: Handle catalogd startup delays when using local catalog

Impalads should tolerate delays in catalogd startup. Currently, when
running with the local catalog (use_local_catalog=true), impalad
startup can fail if catalogd startup is delayed. ImpalaServer's
constructor calls ImpalaServer::UpdateCatalogMetrics(), which
maintains two metrics counting the number of tables and databases.
This call happens before the code in ImpalaServer::Start() that waits
for the catalogd to start (added by IMPALA-4704), so there is no
guarantee that catalogd is running. UpdateCatalogMetrics() ends up
calling getDbs() in the frontend catalog. LocalCatalog::getDbs() tries
to load the databases (and thus contacts catalogd), and this call
fails if catalogd is not running, which in turn fails impalad startup.

use_local_catalog=false is immune to this only because it does not
contact catalogd in Catalog::getDbs().

This change moves the UpdateCatalogMetrics() call from the
ImpalaServer constructor to ImpalaServer::Start(), after the impalad
has already waited for the catalogd to start up. It also limits the
call to run only in coordinators.
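
As a rough illustration of the ordering problem and the fix, here is a
small self-contained Python toy model. It is not Impala code: ToyCatalogd,
ToyImpalaServer, CatalogdUnavailable, and the update_in_constructor switch
are hypothetical names invented for this sketch, and the real logic lives
in C++ in be/src/service/impala-server.cc. The sketch only assumes that
contacting catalogd before it is ready raises an error.

```python
import threading


class CatalogdUnavailable(Exception):
    """Raised when the toy catalogd is contacted before it is up."""


class ToyCatalogd:
    """Hypothetical stand-in for catalogd with a delayed startup."""

    def __init__(self, startup_delay_s):
        self._ready = threading.Event()
        # Flip the ready flag after the delay, on a background timer.
        timer = threading.Timer(startup_delay_s, self._ready.set)
        timer.daemon = True
        timer.start()

    def wait_until_ready(self, timeout_s):
        return self._ready.wait(timeout_s)

    def get_dbs(self):
        # Like the local catalog path, this needs a running catalogd.
        if not self._ready.is_set():
            raise CatalogdUnavailable("catalogd is not running yet")
        return ["default", "functional"]


class ToyImpalaServer:
    """Hypothetical stand-in for ImpalaServer (toy model only)."""

    def __init__(self, catalogd, is_coordinator, update_in_constructor):
        self.catalogd = catalogd
        self.is_coordinator = is_coordinator
        self.update_in_constructor = update_in_constructor
        self.num_dbs = None
        if update_in_constructor:
            # Old ordering: metrics are refreshed before any wait.
            self.update_catalog_metrics()

    def update_catalog_metrics(self):
        self.num_dbs = len(self.catalogd.get_dbs())

    def start(self, timeout_s=5.0):
        # Wait for catalogd first, then refresh metrics.
        if not self.catalogd.wait_until_ready(timeout_s):
            raise CatalogdUnavailable("timed out waiting for catalogd")
        # New ordering: only coordinators make this call in the toy model.
        if self.is_coordinator and not self.update_in_constructor:
            self.update_catalog_metrics()
```

With the old ordering (update_in_constructor=True), constructing the
server against a still-starting ToyCatalogd raises CatalogdUnavailable.
With the new ordering, construction succeeds and start() refreshes the
count once the toy catalogd is ready.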

Testing:
 - Added a test to custom_cluster.test_catalog_wait that delays catalogd
   startup by 60 seconds and verifies that the impalads start up
   successfully. This test fails prior to this change.
 - Hand tested to verify that the metrics maintained by
   UpdateCatalogMetrics() are not meaningfully changed: running
   coordinators and executors have the right counts after they
   successfully start up.

Change-Id: I1b5a94c59faaaa25927a169dcb58f310ce6b1044
---
M be/src/catalog/catalog-server.cc
M be/src/common/global-flags.cc
M be/src/service/impala-server.cc
M tests/custom_cluster/test_catalog_wait.py
4 files changed, 66 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/15561/3
--
To view, visit http://gerrit.cloudera.org:8080/15561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b5a94c59faaaa25927a169dcb58f310ce6b1044
Gerrit-Change-Number: 15561
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <anu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
