[ https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370735#comment-16370735 ]
Eron Wright commented on ZOOKEEPER-2982:
-----------------------------------------
Attached 'fixed.log', which demonstrates the behavior after the fix is applied.
Let me know if you also need to see the output from an unpatched cluster (I
would prefer not to spend the time to get that).
> Re-try DNS hostname -> IP resolution
> ------------------------------------
>
> Key: ZOOKEEPER-2982
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.0, 3.5.1, 3.5.3
> Reporter: Eron Wright
> Priority: Blocker
> Fix For: 3.5.4, 3.6.0
>
> Attachments: fixed.log
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4. Some portions of the fix
> haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started
> before all peer addresses are resolvable, that server may cache a negative
> lookup result and forever fail to resolve the address. For example,
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a headless Service
> may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95] - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>     at java.net.Socket.connect(Socket.java:589)
>     at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
>     at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not
> resolvable when the server started but became resolvable shortly thereafter.
> The server should eventually connect once the name resolves, but it never does.
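The trace is consistent with how `java.net.InetSocketAddress` behaves: its `(String, int)` constructor attempts DNS resolution once, and if that lookup fails the object stays flagged as unresolved for its whole lifetime, so every later `Socket.connect` on it throws `UnknownHostException`. The following is a minimal sketch of the re-try behavior the issue title asks for, assuming the caller can rebuild the address before each attempt. It is not the ZOOKEEPER-1506 or ZOOKEEPER-2982 patch; the class `ReResolvingConnector`, the `ConnectAttempt` callback, and the `maxAttempts`/`backoffMillis` parameters are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

/** Hypothetical helper illustrating DNS re-resolution; not part of ZooKeeper. */
final class ReResolvingConnector {

    /** Hypothetical callback: one attempt to use a resolved address, e.g. open a socket. */
    @FunctionalInterface
    interface ConnectAttempt<T> {
        T connect(InetSocketAddress addr) throws IOException;
    }

    /**
     * An InetSocketAddress resolves its host once, in its constructor, and never retries.
     * If that lookup failed, build a new instance from the same host string and port,
     * which triggers a fresh lookup. Already-resolved addresses are returned unchanged.
     */
    static InetSocketAddress reResolve(InetSocketAddress addr) {
        if (!addr.isUnresolved()) {
            return addr;
        }
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    /**
     * Tries up to maxAttempts times, re-resolving before each attempt and sleeping
     * backoffMillis between failures. Rethrows the last IOException if every attempt fails.
     */
    static <T> T connectWithRetry(InetSocketAddress addr, ConnectAttempt<T> attempt,
                                  int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        InetSocketAddress current = addr;
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            current = reResolve(current);
            try {
                if (current.isUnresolved()) {
                    // Still no DNS record: count this as a failed attempt and retry later.
                    throw new UnknownHostException(current.getHostString());
                }
                return attempt.connect(current);
            } catch (IOException e) {
                last = e;
                if (i + 1 < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last;
    }
}
```

In a quorum setting, the callback would typically create a `Socket`, call `connect(addr, timeoutMillis)`, and close the socket if the connect fails. Re-resolving only unresolved addresses leaves a working address untouched; a deployment whose peer IPs can change (for example, rescheduled Kubernetes pods) might prefer to rebuild the address on every attempt. Note that the JVM also keeps its own negative lookup cache, controlled by the `networkaddress.cache.negative.ttl` security property (10 seconds by default), so a fresh lookup can keep failing until that entry expires.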