[ https://issues.apache.org/jira/browse/KUDU-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418350#comment-17418350 ]

ASF subversion and git services commented on KUDU-1620:
-------------------------------------------------------

Commit 41ebabf2eb618b33fd30ad1821ccbda9d6390010 in kudu's branch 
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=41ebabf ]

[rpc] KUDU-75: refresh DNS entries if proxies hit a network error

This patch aims to tackle the following issues that revolve around
changes in addresses at runtime.
- KUDU-1885: master long-lived tserver proxies need to be re-resolved in
  case nodes are assigned different addresses; today we just retry at
  the same location forever.
- KUDU-1620: tablet consensus long-lived proxies need to be re-resolved
  on failure.
- C++ clients' uses of RemoteTabletServer also hold long-lived proxies
  and are likely to run into similar problems if tservers are restarted
  and assigned new physical addresses.

It addresses these by plumbing a DnsResolver into the rpc::Proxy class
and chaining the asynchronous RPC callback to an asynchronous refresh of
the address, using the newly introduced refreshing capabilities of the
DnsResolver.

The new style of proxy isn't currently used, but a test is added
exercising the new functionality.

Change-Id: I777d169bd3a461294e5721f05071b726ced70f7e
Reviewed-on: http://gerrit.cloudera.org:8080/17839
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
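
The refresh-on-failure mechanism described above can be sketched roughly as
follows. This is an illustrative toy, not Kudu's actual API: the class names
mirror the commit message (DnsResolver, Proxy), but the methods, the
synchronous resolution, and the single-retry policy are simplifying
assumptions for the sake of a self-contained example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy resolver: caches host -> address mappings. In the real system
// resolution is asynchronous; it is synchronous here for brevity.
class DnsResolver {
 public:
  void SetEntry(const std::string& host, const std::string& addr) {
    table_[host] = addr;
  }
  std::string Resolve(const std::string& host) const {
    auto it = table_.find(host);
    return it != table_.end() ? it->second : "";
  }
 private:
  std::map<std::string, std::string> table_;
};

// Toy proxy: on a network error it re-resolves its cached address
// before retrying, instead of retrying the stale address forever.
class Proxy {
 public:
  Proxy(DnsResolver* resolver, std::string host)
      : resolver_(resolver),
        host_(std::move(host)),
        cached_addr_(resolver_->Resolve(host_)) {}

  // Issue an RPC via `send` (returns false on network error); on
  // failure, refresh the address from DNS and retry once.
  bool Call(const std::function<bool(const std::string&)>& send) {
    if (send(cached_addr_)) return true;
    cached_addr_ = resolver_->Resolve(host_);  // refresh on failure
    return send(cached_addr_);
  }

  const std::string& cached_addr() const { return cached_addr_; }

 private:
  DnsResolver* resolver_;
  std::string host_;
  std::string cached_addr_;
};
```

For example, if a tserver known as "tserver-1" restarts at a new physical
address, the first send fails, the proxy re-resolves, and the retry lands
at the new location; without the refresh, the proxy would keep retrying
the old address indefinitely.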


> Consensus peer proxy hostnames should be reresolved on failure
> --------------------------------------------------------------
>
>                 Key: KUDU-1620
>                 URL: https://issues.apache.org/jira/browse/KUDU-1620
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 1.0.0
>            Reporter: Adar Dembo
>            Priority: Major
>              Labels: docker
>
> Noticed this while documenting the workflow to replace a dead master, which 
> currently bypasses Raft config changes in favor of having the replacement 
> master "masquerade" as the dead master via DNS changes.
> Internally we never rebuild consensus peer proxies in the event of network 
> failure; we assume that the peer will return at the same location. Nominally 
> this is reasonable; allowing peers to change host/port information on the fly 
> is tricky and has yet to be implemented. But we should at least retry the 
> DNS resolution; not doing so forces the workflow to include steps to restart 
> the existing masters, which creates a (small) availability outage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
