Ok. Seems like a pretty bad problem, so I filed a JIRA (https://issues.apache.org/jira/browse/ZOOKEEPER-962). Hopefully we can get a fix out ASAP. Depending on how busy I am over Christmas, I might have time to code it up later this week.
C

-----Original Message-----
From: Benjamin Reed [mailto:[email protected]]
Sent: Tuesday, December 21, 2010 1:53 AM
To: [email protected]
Subject: Re: Question about leader/follower coherence

oh right, that sync :)

i think you are correct camille. there is a problem that was introduced when we put in fast sync. we need to synchronize the queuing of the diff messages with the startForwarding since there is an assumption that the toBeApplied list will not change between sending missing commits and startForwarding. unfortunately that code is very broken! tragically the synchronized block around the DIFF calculation is working with a copy of the log, so the synchronized is not helping. (this broke in ZOOKEEPER-783)

i think that synchronizing on the real committedList and putting the startForwarding inside that sync block will fix it. when we send a DIFF we also need to update the "updates" variable to the last zxid sent, otherwise we end up sending duplicates.

thanx for catching this camille!

ben

On 12/20/2010 03:17 PM, Fournier, Camille F. [Tech] wrote:
> Oh, I did not articulate myself well. I mean the sync when a follower starts
> up ("syncWithLeader" as it were), which doesn't seem to use the actual sync
> feature. Or does it and I'm just not seeing where it is in the code?
>
> It seems like we rely on the LearnerHandler thread startup to capture all of
> the missing committed transactions in the SNAP or DIFF, but I don't see
> anything (especially in the DIFF case) that is preventing us from committing
> more transactions before we actually start forwarding updates to the new
> follower.
>
> Let me explain using my example from ZOOKEEPER-919. Assume we have quorum
> already, so the leader can be processing transactions while my follower is
> starting up.
>
> I'm a follower at zxid N-5, the leader is at N. I send my FOLLOWERINFO packet
> to the leader with that information.
> The leader gets the proposals from its
> committed log (time T1), then syncs on the proposal list (LearnerHandler line
> 267. Why? It's a copy of the underlying proposal list... this might be part
> of our problem). I check to see if the peerLastZxid is within my max and min
> committed log and it is, so I'm going to send a diff. I set the zxidToSend to
> be the maxCommittedLog at time T3 (we already know this is sketchy), and
> forward the proposals from my copied proposal list starting at the
> peerLastZxid+1 up to the last proposal transaction (as seen at time T1).
>
> After I have queued up all those diffs to send, I tell the leader to
> startForwarding updates to this follower (line 308).
>
> So, let's say that at time T2 I actually swap out the leader to the thread
> that is handling the various request processors, and see that I got enough
> votes to commit zxid N+1. I commit N+1 and so my maxCommittedLog at T3 is
> N+1, but this proposal is not in the list of proposals that I got back at
> time T1, so I don't forward this diff to the client. Additionally, I
> processed the commit and removed it from my leader's toBeApplied list. So
> when I call startForwarding for this new follower, I don't see this
> transaction as a transaction to be forwarded.
>
> There's one problem. Let's also imagine, however, that I commit N+1 at time
> T4. The maxCommittedLog value is consistent with the max of the diff packets
> I am going to send the follower. But, I still committed N+1 and removed it
> from the toBeApplied list before calling startForwarding with this follower.
> How does the follower get this transaction? Does it?
>
> To put it another way, here is the thread interaction, hopefully formatted so
> you can read it...
>
> LearnerHandlerThread                  RequestProcessorThread
> T1(LH): get list of proposals (COPY)
>                                       T2(RPT): commit N+1, remove from toBeApplied
> T3(LH): get maxCommittedLog
> T4(LH): send diffs from view at T1
> T5(LH): startForwarding
>
> Or
>
> LearnerHandlerThread                  RequestProcessorThread
> T1(LH): get list of proposals (COPY)
> T2(LH): get maxCommittedLog
>                                       T3(RPT): commit N+1, remove from toBeApplied
> T4(LH): send diffs from view at T1
> T5(LH): startForwarding
>
> I'm trying to figure out what, if anything, keeps the requests from being
> committed, removed, and never seen by the follower before it fully starts up.
>
> Thanks,
> C
>
>
> -----Original Message-----
> From: Benjamin Reed [mailto:[email protected]]
> Sent: Monday, December 20, 2010 4:06 PM
> To: [email protected]
> Subject: Re: Question about leader/follower coherence
>
> it turns out that there is a simple answer.
>
> first the sync guarantee: the client will see the effect of all
> operations that happened before the sync started.
>
> to make that guarantee we just need to make sure that the follower that
> the client is connected to has all transactions that were in flight when
> the sync was received.
>
> as a side note let me point out that if you do a write, even if it
> fails, you will get the same guarantee as the sync, but it will be
> heavier weight because the write result will get pushed through the
> atomic broadcast.
>
> to implement sync, the follower forwards the sync to the leader. the
> processing pipeline at the follower will delay any requests after the
> sync until the leader replies to the sync. when the leader gets the sync
> there are two things that can happen:
>
> 1) there aren't any outstanding transactions: the leader queues a sync
> reply to the follower. it will get queued behind any pending operations
> that were previously sent to the follower.
>
> 2) there are outstanding transactions: leader notes the zxid of the last
> outstanding transaction and installs a trigger to queue the sync reply
> when that zxid gets committed.
>
> because everything is processed in order once a follower processes a
> sync reply that follower will have processed all operations started
> before the sync. note that the implementation has a stronger guarantee
> than needed because it covers all operations started at the leader
> before the sync. however, it is hard to reason about "started before"
> since the leader determines the ordering.
>
> ok, that was a rather long simple answer :)
>
> ben
>
> On 12/20/2010 11:22 AM, Fournier, Camille F. [Tech] wrote:
>> Hi everyone,
>>
>> A simple question with a possibly not simple answer:
>> For transactions that happen and are committed on the leader/in the cluster
>> (given a cluster with quorum already) during the time in which a new
>> follower is being synced (sending diffs, sync, etc), what mechanism is it
>> that ensures that those transactions also make it to the follower that was
>> syncing at that time?
>>
>> Thanks,
>> Camille
>>
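
For readers following the thread, here is a minimal Java sketch of the fix ben describes: queue the missing commits and register the follower for forwarding inside one critical section, on the same lock the commit path uses, so no commit can slip in between the two steps. Everything in it is a stand-in and an assumption for illustration (DiffSyncSketch, FollowerQueue, commit, sendDiffAndStartForwarding, and zxids as bare longs); it is not the LearnerHandler or Leader code, and not necessarily what the eventual ZOOKEEPER-962 patch looks like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the leader; names do not match the ZooKeeper classes.
final class DiffSyncSketch {
    // A follower's outgoing packet queue, reduced to a list of zxids.
    static final class FollowerQueue {
        private final List<Long> queued = new ArrayList<>();
        synchronized void queue(long zxid) { queued.add(zxid); }
        synchronized List<Long> snapshot() { return new ArrayList<>(queued); }
    }

    private final Object lock = new Object();
    private final List<Long> committedLog = new ArrayList<>();
    private final List<FollowerQueue> forwarding = new ArrayList<>();

    // Commit path (the "RequestProcessorThread" in the thread above).
    void commit(long zxid) {
        synchronized (lock) {
            committedLog.add(zxid);
            // Forward to every follower that has already been registered.
            for (FollowerQueue f : forwarding) {
                f.queue(zxid);
            }
        }
    }

    // Learner-handler path: queue the diff and start forwarding under the
    // same lock as commit(), working on the real log rather than a copy.
    // Returns the last zxid queued, so the caller can avoid sending duplicates.
    long sendDiffAndStartForwarding(long peerLastZxid, FollowerQueue follower) {
        synchronized (lock) {
            long lastSent = peerLastZxid;
            for (long zxid : committedLog) {
                if (zxid > peerLastZxid) {
                    follower.queue(zxid);
                    lastSent = zxid;
                }
            }
            forwarding.add(follower);
            return lastSent;
        }
    }

    public static void main(String[] args) {
        DiffSyncSketch leader = new DiffSyncSketch();
        for (long z = 1; z <= 5; z++) {
            leader.commit(z);
        }
        FollowerQueue follower = new FollowerQueue();
        leader.sendDiffAndStartForwarding(2L, follower); // follower is at zxid 2
        leader.commit(6L);
        leader.commit(7L);
        System.out.println(follower.snapshot()); // prints [3, 4, 5, 6, 7]
    }
}
```

Because commit() and sendDiffAndStartForwarding() take the same lock, a commit either lands in the log before the diff is computed (and is sent as part of the diff) or runs after the follower is registered (and is forwarded). The T2 and T3 interleavings in the diagrams above cannot occur in this sketch.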
