[ 
https://issues.apache.org/jira/browse/IMPALA-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933844#comment-16933844
 ] 

ASF subversion and git services commented on IMPALA-8634:
---------------------------------------------------------

Commit b96b3b0b1ca97e5d756392a159e22dfcd8bcae71 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b96b3b0 ]

IMPALA-8634: Catalog client should retry RPCs

Add retries to catalogd RPCs. Previously, connection failures triggered
a retry, but failures on the actual RPC did not trigger a retry. This
change replaces all usages of ClientCache::DoRpc() in the
CatalogOpExecutor with ClientCache::DoRpcWithRetry(). This change moves
the connection retry loop to DoRpcWithRetry(), instead of relying on the
ClientCache to retry the connection.
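
For readers unfamiliar with the pattern, here is a minimal sketch of a
retry loop with exponential backoff, the approach the issue description
suggests. It is not the code from this commit: RpcStatus,
RetryWithBackoff, and their parameters are hypothetical names chosen for
illustration, and the real DoRpcWithRetry() signature and backoff policy
may differ.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical status type for this sketch; Impala's real Status class differs.
struct RpcStatus {
  bool ok;
  std::string msg;
};

// Calls `rpc` until it succeeds or `max_attempts` attempts have been made.
// Sleeps between attempts, starting at `initial_backoff` and doubling the
// delay after every failure. Because the whole call is wrapped, a failure to
// connect and a failure of the RPC itself are retried by the same loop.
// All names here are illustrative assumptions, not Impala's API.
inline RpcStatus RetryWithBackoff(const std::function<RpcStatus()>& rpc,
    int max_attempts, std::chrono::milliseconds initial_backoff) {
  assert(max_attempts >= 1);
  std::chrono::milliseconds backoff = initial_backoff;
  RpcStatus status{false, "not attempted"};
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    status = rpc();
    if (status.ok) return status;
    // Do not sleep after the final attempt; just report the last error.
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(backoff);
      backoff *= 2;
    }
  }
  return status;
}
```

A caller would pass a callable that opens the connection and issues the
RPC, so that a catalogd restart between attempts is absorbed by the loop
rather than surfaced to the query as a failure.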

This patch is based on IMPALA-8904, which adds similar functionality to
statestore RPCs.

Testing:
* Renamed test_statestore_rpc_errors.py to test_services_rpc_errors.py
and added new tests for catalogd RPC errors
* Added new tests to test_restart_services.py
* Ran core tests

Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Reviewed-on: http://gerrit.cloudera.org:8080/14246
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Catalog client should be resilient to temporary Catalog outage
> --------------------------------------------------------------
>
>                 Key: IMPALA-8634
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8634
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 3.2.0
>            Reporter: Michael Ho
>            Assignee: Sahil Takiar
>            Priority: Critical
>
> Currently, when the catalog server is down, catalog clients will fail all 
> RPCs sent to it. In essence, DDL queries will fail and the Impala service 
> becomes a lot less functional. Catalog clients should consider retrying 
> failed RPCs with some exponential backoff in between while the catalog server is 
> being restarted after crashing. We probably need to add [a test 
> |https://github.com/apache/impala/blob/master/tests/custom_cluster/test_restart_services.py]
>  to exercise the paths of catalog restart to verify coordinators are 
> resilient to it.
> cc'ing [~stakiar], [~joemcdonnell], [~twm378]



