[ https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038180#comment-17038180 ]
Michael Dürr commented on ZOOKEEPER-2164:
-----------------------------------------

[~symat]

??Still, it is a question for me whether this ticket was originally about this issue or not. Some of the comments seem to indicate that people were hitting the 0.0.0.0 issues, but the original description mentions ZooKeeper 3.4.5, and that cannot be the issue you and I were talking about here. I still have to look into that.??

I'm one of the guys who experienced this behavior with a Docker setup based on version 3.5.5 [#comment-16940855]. I never had problems with any of the 3.4.X versions. Like [~paxi], we have to rely on 0.0.0.0 as the local address: the Docker images cannot bind to the external host address, and we cannot use FQDNs either.

I would be really grateful if you could provide a fix for this issue!

> fast leader election keeps failing
> ----------------------------------
>
>                 Key: ZOOKEEPER-2164
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.4.5
>            Reporter: Michi Mutsuzaki
>            Assignee: Mate Szalay-Beko
>            Priority: Major
>             Fix For: 3.7.0, 3.5.8
>
>
> I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader.
> When I shut down 2, 1 and 3 keep going back to leader election. Here is what
> seems to be happening.
> - Both 1 and 3 elect 3 as the leader.
> - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a
> follower.
> - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't
> time out for 5 seconds:
> https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
> - By the time 3 receives votes, 1 has given up trying to connect to 3:
> https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247
> I'm using 3.4.5, but it looks like this part of the code hasn't changed for a
> while, so I'm guessing later versions have the same issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
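
The timing race described in the quoted issue can be sketched as plain arithmetic. This is only an illustration, not ZooKeeper code: the class and method names are hypothetical, and the follower's retry budget (5 attempts, 1 second apart, each refused immediately) is an assumption made for the example. Only the 5-second connect delay comes from the report.

```java
class ElectionTimingSketch {
    // Time node 3 spends blocked in its connection attempt to the dead node 2
    // (5 seconds, as stated in the issue description).
    static final long CONNECT_TIMEOUT_MS = 5_000;

    /**
     * Time at which the follower (node 1) gives up connecting to the elected leader.
     * Assumes every attempt is refused immediately and the follower pauses
     * retryPauseMs between attempts. Both parameters are hypothetical here.
     */
    static long followerGivesUpAtMs(int attempts, long retryPauseMs) {
        return (attempts - 1) * retryPauseMs;
    }

    /** The election settles only if the leader is ready before the follower gives up. */
    static boolean electionSucceeds(long leaderReadyAtMs, long followerGivesUpAtMs) {
        return leaderReadyAtMs <= followerGivesUpAtMs;
    }

    public static void main(String[] args) {
        long giveUpAt = followerGivesUpAtMs(5, 1_000);
        System.out.println("follower gives up at " + giveUpAt + " ms");
        System.out.println("leader ready at " + CONNECT_TIMEOUT_MS + " ms");
        System.out.println("election succeeds: "
                + electionSucceeds(CONNECT_TIMEOUT_MS, giveUpAt));
    }
}
```

Under these assumed numbers the follower's retry window closes before node 3 starts accepting followers, so both nodes fall back to leader election, which matches the loop described in the report.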