[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984500#action_12984500
 ] 

Benjamin Reed commented on ZOOKEEPER-962:
-----------------------------------------

yeah TRUNC almost never happens, so it would be hard to recreate :)

> leader/follower coherence issue when follower is receiving a DIFF
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-962
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-962
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2
>            Reporter: Camille Fournier
>            Assignee: ChiaHung Lin
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: ZOOKEEPER-962.patch, ZOOKEEPER-962_2.patch, 
> ZOOKEEPER-962_3.patch, ZOOKEEPER-962_4.patch, ZOOKEEPER-962_5.patch
>
>
> From mailing list:
> It seems like we rely on the LearnerHandler thread startup to capture all of 
> the missing committed
> transactions in the SNAP or DIFF, but I don't see anything (especially in the 
> DIFF case) that
> is preventing us from committing more transactions before we actually start 
> forwarding updates
> to the new follower.
> Let me explain using my example from ZOOKEEPER-919. Assume we have quorum 
> already, so the
> leader can be processing transactions while my follower is starting up.
> I'm a follower at zxid N-5, the leader is at N. I send my FOLLOWERINFO packet 
> to the leader
> with that information. The leader gets the proposals from its committed log 
> (time T1), then
> syncs on the proposal list (LearnerHandler line 267. Why? It's a copy of the 
> underlying proposal
> list... this might be part of our problem). I check to see if the 
> peerLastZxid is within my
> max and min committed log and it is, so I'm going to send a diff. I set the 
> zxidToSend to
> be the maxCommittedLog at time T3 (we already know this is sketchy), and 
> forward the proposals
> from my copied proposal list starting at the peerLastZxid+1 up to the last 
> proposal transaction
> (as seen at time T1).
> After I have queued up all those diffs to send, I tell the leader to 
> startForwarding updates
> to this follower (line 308). 
> So, let's say that at time T2 I actually swap out the leader to the thread 
> that is handling
> the various request processors, and see that I got enough votes to commit 
> zxid N+1. I commit
> N+1 and so my maxCommittedLog at T3 is N+1, but this proposal is not in the 
> list of proposals
> that I got back at time T1, so I don't forward this diff to the follower. 
> Additionally, I processed
> the commit and removed it from my leader's toBeApplied list. So when I call 
> startForwarding
> for this new follower, I don't see this transaction as a transaction to be 
> forwarded. 
> That's one problem. Let's also imagine, however, that I commit N+1 at time 
> T4. The maxCommittedLog
> value is consistent with the max of the diff packets I am going to send the 
> follower. But,
> I still committed N+1 and removed it from the toBeApplied list before calling 
> startForwarding
> with this follower. How does the follower get this transaction? Does it?
> To put it another way, here is the thread interaction, hopefully formatted so 
> you can read
> it...
>               LearnerHandlerThread               RequestProcessorThread
> T1(LH):       get list of proposals (COPY)
> T2(RPT):                                         commit N+1, remove from toBeApplied
> T3(LH):       get maxCommittedLog
> T4(LH):       send diffs from view at T1
> T5(LH):       startForwarding
> Or
> T1(LH):       get list of proposals (COPY)
> T2(LH):       get maxCommittedLog
> T3(RPT):                                         commit N+1, remove from toBeApplied
> T4(LH):       send diffs from view at T1
> T5(LH):       startForwarding
> I'm trying to figure out what, if anything, keeps the requests from being 
> committed, removed,
> and never seen by the follower before it fully starts up. 
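
The interleaving described in the report can be reproduced in a small toy model. The sketch below is not ZooKeeper code: the class DiffRaceSketch, its fields, and the logLock are hypothetical stand-ins for the leader's committed log, its toBeApplied list, and the set of followers being forwarded to. It assumes a leader at N = 5 with N+1 = 6 proposed but not yet committed, and a follower syncing from zxid 0. The unguarded run forces the commit to land between the proposal copy (T1) and startForwarding (T5); the guarded run holds a read lock across that whole window while commits take the write lock, which is one possible way to close the gap and not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model only; none of these names are ZooKeeper's real classes or fields. */
public class DiffRaceSketch {
    private final List<Long> committedLog = new ArrayList<>();        // zxids already committed
    private final List<Long> toBeApplied = new ArrayList<>();         // proposed, awaiting commit
    private final List<TreeSet<Long>> forwarding = new ArrayList<>(); // registered followers
    private final ReentrantReadWriteLock logLock = new ReentrantReadWriteLock();

    public DiffRaceSketch(long n) {
        for (long z = 1; z <= n; z++) committedLog.add(z);
        toBeApplied.add(n + 1);
    }

    /** Request-processor side: commit a zxid and forward it to registered followers. */
    public void commit(long zxid) {
        logLock.writeLock().lock();
        try {
            toBeApplied.remove(Long.valueOf(zxid));
            committedLog.add(zxid);
            for (TreeSet<Long> f : forwarding) f.add(zxid);
        } finally {
            logLock.writeLock().unlock();
        }
    }

    /** Learner-handler side: returns the zxids the follower ends up receiving. */
    public TreeSet<Long> syncFollower(long peerLastZxid, boolean guarded)
            throws InterruptedException {
        TreeSet<Long> follower = new TreeSet<>();
        long pending = toBeApplied.get(0);
        Thread committer = new Thread(() -> commit(pending));
        if (guarded) logLock.readLock().lock();
        try {
            List<Long> snapshot = new ArrayList<>(committedLog); // T1: copy of the proposals
            committer.start();                     // the commit of N+1 arrives mid-sync
            if (!guarded) committer.join();        // unguarded: it completes right here
            for (long z : snapshot) {
                if (z > peerLastZxid) follower.add(z); // send DIFF from the T1 view
            }
            follower.addAll(toBeApplied);          // startForwarding: queue pending proposals
            forwarding.add(follower);              // and register for future commits
        } finally {
            if (guarded) logLock.readLock().unlock();
        }
        committer.join(); // guarded: the commit could only proceed after the unlock
        return follower;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unguarded: " + new DiffRaceSketch(5).syncFollower(0, false));
        System.out.println("guarded:   " + new DiffRaceSketch(5).syncFollower(0, true));
    }
}
```

In the unguarded run the follower ends with zxids 1 through 5 and never sees 6, because 6 was absent from the T1 copy and already removed from toBeApplied by the time the follower was registered. In the guarded run the commit cannot proceed until the follower is registered, so 6 reaches it.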

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
