[ https://issues.apache.org/jira/browse/KUDU-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418349#comment-17418349 ]
ASF subversion and git services commented on KUDU-1885:
-------------------------------------------------------

Commit 41ebabf2eb618b33fd30ad1821ccbda9d6390010 in kudu's branch refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=41ebabf ]

[rpc] KUDU-75: refresh DNS entries if proxies hit a network error

This patch aims to tackle the following issues that revolve around
changes in addresses at runtime.
- KUDU-1885: master long-lived tserver proxies need to be re-resolved in
  case nodes are assigned different addresses; today we just retry at
  the same location forever.
- KUDU-1620: tablet consensus long-lived proxies need to be re-resolved
  on failure.
- C++ clients' usages of RemoteTabletServer also have long-lived proxies
  and are likely to run into similar problems if tservers are restarted
  and assigned new physical addresses.

It addresses this by plumbing a DnsResolver into the rpc::Proxy class,
and chaining the asynchronous callback to an asynchronous refresh of the
address with the newly introduced refreshing capabilities of the
DnsResolver.

The new style of proxy isn't currently used, but a test is added
exercising the new functionality.

Change-Id: I777d169bd3a461294e5721f05071b726ced70f7e
Reviewed-on: http://gerrit.cloudera.org:8080/17839
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <aser...@cloudera.com>


> Master caches DNS name resolution forever
> -----------------------------------------
>
>                 Key: KUDU-1885
>                 URL: https://issues.apache.org/jira/browse/KUDU-1885
>             Project: Kudu
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.3.0
>            Reporter: Adar Dembo
>            Priority: Major
>
> TSDescriptor::GetTSAdminProxy() and TSDescriptor::GetConsensusProxy() will
> return the same proxy instances over and over. Normally, this is a reasonable
> optimization. But suppose the IP address of the tserver changes (due to a
> DHCP lease expiring or some such). Now these methods will be returning
> unusable proxies, and there's no way to "reset" them.
> Admittedly this scenario is a little contrived: if a tserver's IP address
> suddenly changes, a bunch of other stuff will break too. The tserver will
> probably need to be restarted (since it's bound to a socket whose address no
> longer exists), and consensus may be thoroughly wrecked due to built-in
> host/port assumptions (see KUDU-418).
> An issue like this was reported by a user in Slack, who was running a master
> and tserver on the same box. The symptom was "half-open" communication
> between them: the tserver could heartbeat to the master, but the master could
> not send RPCs to the tserver.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
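
The re-resolution idea in the commit message above (a long-lived proxy
that refreshes its DNS entry when a call hits a network error) can be
illustrated with a minimal, synchronous sketch. This is not Kudu's
implementation: SketchResolver, SketchProxy, CallStatus, and the
transport callback are hypothetical names invented for illustration, and
the real patch chains asynchronous callbacks rather than retrying inline.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a resolver with a cache that can be refreshed.
class SketchResolver {
 public:
  using Lookup = std::function<std::string(const std::string& host)>;
  explicit SketchResolver(Lookup lookup) : lookup_(std::move(lookup)) {}

  // Returns the cached address if there is one, otherwise looks it up.
  std::string Resolve(const std::string& host) {
    auto it = cache_.find(host);
    if (it != cache_.end()) return it->second;
    return Refresh(host);
  }

  // Bypasses the cache, performs a fresh lookup, and overwrites the entry.
  std::string Refresh(const std::string& host) {
    std::string addr = lookup_(host);
    cache_[host] = addr;
    return addr;
  }

 private:
  Lookup lookup_;
  std::map<std::string, std::string> cache_;
};

enum class CallStatus { kOk, kNetworkError };

// Hypothetical long-lived proxy that remembers the hostname, not just the
// address it resolved to when it was created.
class SketchProxy {
 public:
  using Transport = std::function<CallStatus(const std::string& addr)>;

  SketchProxy(std::string host, SketchResolver* resolver, Transport transport)
      : host_(std::move(host)),
        resolver_(resolver),
        transport_(std::move(transport)) {
    assert(resolver_ != nullptr);
    addr_ = resolver_->Resolve(host_);
  }

  // Sends one call. On a network error, refreshes the DNS entry and retries
  // once, but only if the hostname now maps to a different address.
  CallStatus Call() {
    CallStatus status = transport_(addr_);
    if (status != CallStatus::kNetworkError) return status;
    std::string fresh = resolver_->Refresh(host_);
    if (fresh == addr_) return status;  // Address unchanged: surface the error.
    addr_ = fresh;
    return transport_(addr_);
  }

  const std::string& address() const { return addr_; }

 private:
  std::string host_;
  SketchResolver* resolver_;
  Transport transport_;
  std::string addr_;
};
```

In this sketch the refresh happens only on the failure path, so the
common case still reuses the cached address, and the proxy retries only
when the refreshed address differs from the one that just failed.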