[jira] [Commented] (ZOOKEEPER-1697) large snapshots can cause continuous quorum failure

Flavio Junqueira (JIRA) Fri, 10 May 2013 00:53:21 -0700

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653624#comment-13653624 ]


Flavio Junqueira commented on ZOOKEEPER-1697:
---------------------------------------------

Ah, great question. In my view, the UPTODATE message is the commit of the 
synchronization phase of recovery, so strictly no ack is needed; we keep it 
for backward compatibility. As currently implemented, you're right that the 
first ack received in the LearnerHandler#run() while(true) loop is the 
response to UPTODATE, so it could be considered part of the synchronization 
phase. 

However, if we follow this interpretation, I think there is a discrepancy with 
what is implemented. The leader must gather a quorum of supporters within 
initLimit ticks. Once it has that quorum, the leader starts running and 
believes that everyone is synced up. But the UPTODATE message actually goes 
out only after the leader has already started running:

{code}
            /*
             * Wait until leader starts up
             */
            synchronized(leader.zk){
                while(!leader.zk.isRunning() && !this.isInterrupted()){
                    leader.zk.wait(20);
                }
            }
            // Mutation packets will be queued during the serialize,
            // so we need to mark when the peer can actually start
            // using the data
            //
            LOG.debug("Sending UPTODATE message to " + sid);
            queuedPackets.add(new QuorumPacket(Leader.UPTODATE, -1, null, null));
{code}  

From the leader's perspective, it is already established when 
leader.zk.isRunning() is true, and in my understanding we should be giving 
syncLimit ticks for acks once that predicate is true. Now, in practice a tick 
should be long enough to enable the leader to get a response for the UPTODATE 
message. I think what we are really discussing here is definitions, which is 
great. 
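
To put rough numbers on this, here is a minimal sketch using the values quoted 
in the issue description below (ticktime of 2000, syncLimit of about 10). The 
class and method names are made up for illustration, and the tick at which the 
leader starts running is an assumed input, not something taken from the 
ZooKeeper code.

```java
// Illustrative only: TickBudgetSketch and its methods are hypothetical names,
// not part of ZooKeeper.
final class TickBudgetSketch {

    // Last tick at which an ack still counts, if the syncLimit clock starts
    // at the tick where leader.zk.isRunning() first becomes true.
    static long ackDeadlineTick(long leaderRunningTick, int syncLimit) {
        return leaderRunningTick + syncLimit;
    }

    // Wall-clock length, in milliseconds, of a window of the given number of ticks.
    static long windowMillis(int tickTimeMs, int ticks) {
        return (long) tickTimeMs * ticks;
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000;        // "ticktime of 2000" from the issue description
        int syncLimit = 10;           // "10 or so" from the issue description
        long leaderRunningTick = 150; // assumed value, for illustration only

        System.out.println("ack deadline tick: "
                + ackDeadlineTick(leaderRunningTick, syncLimit));
        System.out.println("ack window (ms): "
                + windowMillis(tickTimeMs, syncLimit));
    }
}
```

With these values the window for the ack to UPTODATE is about 20 seconds, which 
is why a single tick should normally be more than enough.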
                
> large snapshots can cause continuous quorum failure
> ---------------------------------------------------
>
>                 Key: ZOOKEEPER-1697
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1697
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3, 3.5.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.5.0, 3.4.6
>
>         Attachments: ZOOKEEPER-1697_branch34.patch, 
> ZOOKEEPER-1697_branch34.patch, ZOOKEEPER-1697.patch, ZOOKEEPER-1697.patch
>
>
> I keep seeing this on the leader:
> 2013-04-30 01:18:39,754 INFO
> org.apache.zookeeper.server.quorum.Leader: Shutdown called
> java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 2
> at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:447)
> at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:422)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
> The followers are downloading the snapshot when this happens, and are
> trying to do their first ACK to the leader, the ack fails with broken
> pipe.
> In this case the snapshots are large and the config has increased the
> initLimit. syncLimit is small - 10 or so with ticktime of 2000. Note
> this is 3.4.3 with ZOOKEEPER-1521 applied.
> I originally speculated that
> https://issues.apache.org/jira/browse/ZOOKEEPER-1521 might be related.
> I thought I might have broken something for this environment. That
> doesn't look to be the case.
> As it looks now it seems that 1521 didn't go far enough. The leader
> verifies that all followers have ACK'd to the leader within the last
> "syncLimit" time period. This runs all the time in the background on
> the leader to identify the case where a follower drops. In this case
> the followers take so long to load the snapshot that this check fails
> the very first time, as a result the leader drops (not enough ack'd
> followers w/in the sync limit) and re-election happens. This repeats
> forever. (the above error)
> this is the call:
> org.apache.zookeeper.server.quorum.LearnerHandler.synced() that's at
> odds.
> look at setting of tickOfLastAck in
> org.apache.zookeeper.server.quorum.LearnerHandler.run()
> It's not set until the follower first acks - in this case I can see
> that the followers are not getting to the ack prior to the leader
> shutting down due to the error log above.
> It seems that synced() should probably use the init limit until the
> first ack comes in from the follower. I also see that while tickOfLastAck
> and leader.self.tick are shared between two threads, there is no
> synchronization of the shared resources.
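
The suggestion at the end of the description (use the init limit until the 
first ack arrives, then fall back to the sync limit) can be sketched roughly 
as follows. This is only an illustration under assumptions, not the code in 
any patch attached to this issue: the class AckDeadlineSketch, its field, and 
its method signatures are hypothetical, and ticks are passed in explicitly 
rather than read from leader.self.tick. The deadline field is volatile 
because, as the description notes, it is written by one thread and read by 
another.

```java
// Hypothetical sketch, not the actual LearnerHandler: all names here are
// assumptions made for illustration.
final class AckDeadlineSketch {

    private final int syncLimit;

    // Written by the handler thread on each ack, read by the leader thread
    // in its periodic check, hence volatile.
    private volatile long tickOfNextAckDeadline;

    AckDeadlineSketch(long currentTick, int initLimit, int syncLimit) {
        this.syncLimit = syncLimit;
        // Until the first ack arrives, the follower may still be loading a
        // large snapshot, so it gets initLimit ticks.
        this.tickOfNextAckDeadline = currentTick + initLimit;
    }

    // Called whenever an ack is received from the follower.
    void onAck(long currentTick) {
        // After any ack, the follower only gets syncLimit ticks.
        tickOfNextAckDeadline = currentTick + syncLimit;
    }

    // Called by the leader's background check.
    boolean synced(long currentTick) {
        return currentTick <= tickOfNextAckDeadline;
    }

    public static void main(String[] args) {
        // initLimit of 100 is an assumed value; the issue only says it was increased.
        AckDeadlineSketch handler = new AckDeadlineSketch(0, 100, 10);
        System.out.println("tick 50, no ack yet: " + handler.synced(50));
        handler.onAck(60);
        System.out.println("tick 71, last ack at 60: " + handler.synced(71));
    }
}
```

In this sketch a follower that has not acked yet is not counted as dropped 
until initLimit ticks have passed, so a slow snapshot load alone does not 
force the leader to shut down.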

