[ https://issues.apache.org/jira/browse/KUDU-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362395#comment-17362395 ]
ASF subversion and git services commented on KUDU-2302:
-------------------------------------------------------

Commit f9647149a49ddb87ea0ecf069eab3b5ec0217136 in kudu's branch refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=f964714 ]

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Reviewed-on: http://gerrit.cloudera.org:8080/17585
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <aser...@cloudera.com>


> Leader crashes if it can't resolve DNS address of a peer
> --------------------------------------------------------
>
>                 Key: KUDU-2302
>                 URL: https://issues.apache.org/jira/browse/KUDU-2302
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, master, tserver
>    Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1,
>                      1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0
>            Reporter: Todd Lipcon
>            Assignee: Andrew Wong
>            Priority: Critical
>              Labels: crash, roadmap-candidate, stability
>
> In BecomeLeader we call:
> {code}
> CHECK_OK(BecomeLeaderUnlocked());
> {code}
> This will fail if it fails to resolve the address of one of its peers.
> Instead it should probably continue to be leader but consider attempts to RPC
> to that peer to be failed due to network resolution (with periodic retries of
> resolution)
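
For illustration only, the behaviour the commit message describes (start a Peer without a proxy, then retry building the proxy the next time one is needed) might be sketched roughly as follows. This is not the code from the patch: Proxy, ProxyFactory, and this simplified Peer class with its SendRequest method are hypothetical stand-ins, and the real Kudu classes have different interfaces.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an RPC proxy to a remote replica.
struct Proxy {
  std::string resolved_addr;
};

// Hypothetical factory: returns nullptr when the peer's address
// cannot be resolved (for example, on a DNS failure).
using ProxyFactory =
    std::function<std::unique_ptr<Proxy>(const std::string& host)>;

// Simplified peer that tolerates a missing proxy instead of crashing.
class Peer {
 public:
  Peer(std::string host, ProxyFactory factory)
      : host_(std::move(host)), factory_(std::move(factory)) {
    // Best-effort attempt at construction time; failure is not fatal.
    proxy_ = factory_(host_);
  }

  // Returns false if no proxy could be built, so the caller can treat
  // the attempt as a failed RPC and try again later.
  bool SendRequest() {
    if (!proxy_) {
      proxy_ = factory_(host_);  // Retry resolution lazily.
      if (!proxy_) return false;
    }
    ++requests_sent_;  // A real peer would issue the RPC here.
    return true;
  }

  bool has_proxy() const { return proxy_ != nullptr; }
  int requests_sent() const { return requests_sent_; }

 private:
  std::string host_;
  ProxyFactory factory_;
  std::unique_ptr<Proxy> proxy_;
  int requests_sent_ = 0;
};
```

In this sketch a resolution failure surfaces as a failed send rather than a fatal check, which matches the issue's suggestion that the leader keep its role and treat RPCs to the unresolvable peer as failed until resolution succeeds.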