[ https://issues.apache.org/jira/browse/KUDU-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418350#comment-17418350 ]
ASF subversion and git services commented on KUDU-1620:
-------------------------------------------------------

Commit 41ebabf2eb618b33fd30ad1821ccbda9d6390010 in kudu's branch refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=41ebabf ]

[rpc] KUDU-75: refresh DNS entries if proxies hit a network error

This patch aims to tackle the following issues that revolve around
changes in addresses at runtime.
- KUDU-1885: master long-lived tserver proxies need to be re-resolved in
  case nodes are assigned different addresses; today we just retry at
  the same location forever.
- KUDU-1620: tablet consensus long-lived proxies need to be re-resolved
  on failure.
- C++ clients' usages of RemoteTabletServer also have long-lived proxies
  and are likely to run into similar problems if tservers are restarted
  and assigned new physical addresses.

It addresses this by plumbing a DnsResolver into the rpc::Proxy class,
and chaining the asynchronous callback to an asynchronous refresh of the
address with the newly introduced refreshing capabilities of the
DnsResolver.

The new style of proxy isn't currently used, but a test is added
exercising the new functionality.

Change-Id: I777d169bd3a461294e5721f05071b726ced70f7e
Reviewed-on: http://gerrit.cloudera.org:8080/17839
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <aser...@cloudera.com>

> Consensus peer proxy hostnames should be reresolved on failure
> --------------------------------------------------------------
>
> Key: KUDU-1620
> URL: https://issues.apache.org/jira/browse/KUDU-1620
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: 1.0.0
> Reporter: Adar Dembo
> Priority: Major
> Labels: docker
>
> Noticed this while documenting the workflow to replace a dead master, which
> currently bypasses Raft config changes in favor of having the replacement
> master "masquerade" as the dead master via DNS changes.
> Internally we never rebuild consensus peer proxies in the event of network
> failure; we assume that the peer will return at the same location. Nominally
> this is reasonable; allowing peers to change host/port information on the fly
> is tricky and has yet to be implemented. But, we should at least retry the
> DNS resolution; not doing so forces the workflow to include steps to restart
> the existing masters, which creates a (small) availability outage.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
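
For readers who want the idea in miniature, the re-resolve-on-failure behavior described in the commit message and the issue can be sketched as below. This is an illustrative Python sketch, not Kudu's C++ code: ReresolvingProxy, NetworkError, and the resolve/send callables are all hypothetical names, and the sketch is synchronous, whereas the patch chains asynchronous callbacks through the DnsResolver.

```python
from typing import Callable, Tuple

Address = Tuple[str, int]


class NetworkError(Exception):
    """Raised by the hypothetical transport when a peer is unreachable."""


class ReresolvingProxy:
    """Hypothetical long-lived proxy that re-resolves its hostname on failure."""

    def __init__(self, host: str, port: int,
                 resolve: Callable[[str], str],
                 send: Callable[[Address, str], str]) -> None:
        self._host = host
        self._port = port
        self._resolve = resolve  # hostname -> IP string (stand-in for a DNS resolver)
        self._send = send        # (address, request) -> response; may raise NetworkError
        # Resolve once up front and cache the result, as a long-lived proxy would.
        self._addr: Address = (resolve(host), port)

    def call(self, request: str) -> str:
        try:
            return self._send(self._addr, request)
        except NetworkError:
            # Refresh the cached address instead of retrying the stale one forever.
            refreshed = (self._resolve(self._host), self._port)
            if refreshed == self._addr:
                raise  # DNS did not change; surface the original error.
            self._addr = refreshed
            return self._send(self._addr, request)
```

The sketch retries only once, and only when the refreshed address differs from the cached one, so an unchanged DNS entry still surfaces the original error to the caller. How the actual patch schedules retries is not stated in this message.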