[
https://issues.apache.org/jira/browse/ZOOKEEPER-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001460#comment-13001460
]
Vishal K commented on ZOOKEEPER-1005:
-------------------------------------
{quote}
I agree that the follower should not wait forever, and 40 seconds may be too
long. But these settings can be configure in zoo.cfg. I also think that the
fact that the leader takes so long to actually start the leader process is a
problem (which should be addressed separately, and may be related to our
particular failure simulation) and is a definite contributing factor to this
issue.
I am just surprised that the documentation indicates that follower should
continue to attempt to connect to the leader for the time specified by
initLimit, but that does not happen in this case. So the choice of initLimit is
a worthy debate, but does not seem to change the fact that a follower should be
trying to connect to leader until the initLimit time has elapsed (according to
documentation). The timeout on the socket is set to self.tickTime *
self.initLimit so this seems to indicate that the initLimit time is adhered to,
but only in certain classes of failures.
{quote}
Per my understanding, the leader waits for tickTime * initLimit and gives up
leadership if a majority of followers do not join the leader in that time. So I
think the text is written from leader's perspective. The follower does not wait
for that amount of time, but keeps trying to run leader election and then joins
the leader. In your case, the follower was not able to join the leader after 5
attempts. The follower will start new a leader election, but that will not
result in a new leader election round at the leader. The leader will wait for
40 seconds before giving up leadership. In the meantime, the follower will
complete the second round of election, elect the same leader and again try to
join the leader. I would expect that this process should complete soon (at
least within 40 seconds of leader wait time).
Without logs it will be hard to pinpoint the problem. Can you also take a stack
trace at the leader and the follower?
Can you give details about your experiment (you mentioned you also replaced a
node). Also, what is your SyncLimit? There have been some fixes in the code
path in question (ZOOKEEPER-822 and ZOOKEEPER-900). Higher syncLimit values or
failure of a node during leader election can cause significant delays without
these fixes.
Note that there have been many fixes in leader election since ZK 3.2.2.
We are committed to zookeeper 3.2.2 for the moment, since we do not have
sufficient resources to test a later zookeeper in time for inclusion in our
next product release. We will be implementing other workarounds for the moment,
but it seemed like a good idea to raise the issue and see whether this is the
desired behavior.
> Zookeeper servers fail to elect a leader succesfully.
> -----------------------------------------------------
>
> Key: ZOOKEEPER-1005
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1005
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.2.2
> Environment: zookeeper-3.2.2; debian
> Reporter: Alexandre Hardy
> Fix For: 3.4.0
>
>
> We were running 3 zookeeper servers, and simulated a failure on one of the
> servers.
> The one zookeeper node follows the other, but has trouble connecting. It
> looks like the following exception is the cause:
> {noformat}
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] --
> [org.apache.zookeeper.server.quorum.QuorumPeer] FOLLOWING
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] --
> [org.apache.zookeeper.server.ZooKeeperServer] Created server
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] --
> [org.apache.zookeeper.server.quorum.Follower] Following
> zookeeper3/192.168.131.11:2888
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] --
> [org.apache.zookeeper.server.quorum.Follower] Unexpected exception, tries=0
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING
> java.net.ConnectException: -- Connection refused
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at
> java.net.PlainSocketImpl.socketConnect(Native Method)
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at
> java.net.Socket.connect(Socket.java:546)
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549)
> {noformat}
> The last exception while connecting was:
> {noformat}
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR [zookeeper] --
> [org.apache.zookeeper.server.quorum.Follower] Unexpected exception
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR java.net.ConnectException: --
> Connection refused
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at
> java.net.PlainSocketImpl.socketConnect(Native Method)
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at
> java.net.Socket.connect(Socket.java:546)
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549)
> 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] --
> [org.apache.zookeeper.server.quorum.Follower] Exception when following the
> leader
> {noformat}
> The leader started leading a bit later
> {noformat}
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.quorum.FastLeaderElection] Notification: 0,
> 94489312534, 25, 2, LOOKING, LOOKING, 0
> 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.quorum.FastLeaderElection] Adding vote
> 2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] --
> [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to
> 1 at election address zookeeper2/192.168.132.10:3888
> 2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING -- at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
> 2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.quorum.QuorumPeer] LEADING
> {noformat}
> But at that time the follower had already terminated and started a new
> election, so the leader failed:
> {noformat}
> 2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.ZooKeeperServer] Created server
> 2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.persistence.FileSnap] Reading snapshot
> /var/lib/zookeeper/version-2/snapshot.1600007d16
> 2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.persistence.FileTxnSnapLog] Snapshotting:
> 1600007d16
> 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] --
> [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to
> 1 at election address zookeeper2/192.168.132.10:3888
> 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
> 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
> 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323)
> 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296)
> 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new
> notification.
> 2011-03-01T14:03:11+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new
> notification.
> 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] --
> [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to
> 1 at election address zookeeper2/192.168.132.10:3888
> 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
> 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
> 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323)
> 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296)
> 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new
> notification.
> 2011-03-01T14:03:32+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new
> notification.
> 2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --
> [org.apache.zookeeper.server.quorum.Leader] Shutdown called
> 2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO -- at
> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:371)
> 2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO -- at
> org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:297)
> 2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO -- at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:562)
> {noformat}
> From http://zookeeper.apache.org/doc/r3.2.2/zookeeperStarted.html:
> {quote}
> The new entry, initLimit is timeouts ZooKeeper uses to limit the length of
> time the ZooKeeper servers in quorum have to connect to a leader
> {quote}
> Since we have initLimit=10 and tickTime=4000, we should have 40 seconds for a
> zookeeper server to contact the leader.
> However, in the source code
> src/java/main/org/apache/zookeeper/server/quorum/Follower.java:
> {noformat}
> 152 for (int tries = 0; tries < 5; tries++) {
> 153 try {
> 154 //sock = new Socket();
> 155 //sock.setSoTimeout(self.tickTime *
> self.initLimit);
> 156 sock.connect(addr, self.tickTime *
> self.syncLimit);
> 157 sock.setTcpNoDelay(nodelay);
> 158 break;
> 159 } catch (IOException e) {
> 160 if (tries == 4) {
> 161 LOG.error("Unexpected exception",e);
> 162 throw e;
> 163 } else {
> 164 LOG.warn("Unexpected exception,
> tries="+tries,e);
> 165 sock = new Socket();
> 166 sock.setSoTimeout(self.tickTime *
> self.initLimit);
> 167 }
> 168 }
> 169 Thread.sleep(1000);
> 170 }
> {noformat}
> It appears as if we only have 4 seconds to contact the leader. The timeouts
> are applied to the socket, but do not take into account that the zookeeper
> leader may not have started its zookeeper service yet.
> Is this the expected behaviour? Or is the expected behaviour that followers
> should always be able to connect to the leader?
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira