Hi Alex, It works because of the following:

1- A follower F processes operations from a client in FIFO order; say, as in 
your example, that a client submits sync + read;
2- The sync will be processed by the leader and its response sent back to the 
follower, queued after all pending updates that the follower hasn't yet processed;
3- The follower will process all pending updates before processing the response 
of the sync;
4- Once the follower processes the sync, it picks the read operation to 
process. It reads the local state of the follower and returns to the client.

When we process the read in Step 4, we have applied all pending updates the 
leader had for the follower by the time the read request started. 
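The steps above can be sketched as a small simulation (all names here are illustrative, not ZooKeeper code): the leader appends the sync marker to F's update stream after every pending update, and F's FIFO processing means the read runs only once all of those updates have been applied locally.

```python
# Minimal sketch of why FIFO-ordered sync + read on a follower observes
# every update the leader had pending when the read started.
# Hypothetical names; not actual ZooKeeper classes.
from collections import deque

class Leader:
    def __init__(self):
        self.follower_queue = deque()  # stream of messages to follower F

    def commit(self, update):
        self.follower_queue.append(("UPDATE", update))

    def sync(self):
        # sync carries no state change: the leader simply inserts it into
        # F's update stream, ordering it after all pending updates.
        self.follower_queue.append(("SYNC", None))

class Follower:
    def __init__(self, leader):
        self.leader = leader
        self.state = []  # follower's local state

    def sync_then_read(self):
        # The client submits sync + read; FIFO ordering means the read is
        # processed only after the sync response arrives.
        self.leader.sync()
        while True:
            kind, payload = self.leader.follower_queue.popleft()
            if kind == "UPDATE":
                self.state.append(payload)  # apply pending update first
            else:
                # Sync response reached F: every earlier update is applied,
                # so the read is served from up-to-date local state.
                return list(self.state)

leader = Leader()
f = Follower(leader)
leader.commit("w1")
leader.commit("w2")
assert f.sync_then_read() == ["w1", "w2"]
```

The key point the sketch illustrates is that no agreement round is needed: ordering the sync marker within F's FIFO stream is enough to flush the channel.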

This implementation is a bit of a hack because it doesn't follow the same code 
path as the other operations that go to the leader, but it avoids some 
unnecessary steps, which is important for fast reads. In the sync case, the 
other followers don't really need to know about it (there is nothing to be 
updated), so the leader simply inserts it into F's sequence of updates, 
ordering it after all pending updates.

-Flavio

On Sep 27, 2012, at 9:12 AM, Alexander Shraer wrote:

> Hi Flavio,
> 
>> Starting a read operation concurrently with a sync implies that the result 
>> of the read will not miss an update committed before the read started.
> 
> I thought that the intention of sync was to give something like
> linearizable reads, so if you invoke a sync and then a read, your read
> is guaranteed to (at least) see any write which completed before the
> sync began. Is this the intention? If so, how is this achieved
> without running agreement on the sync op?
> 
> Thanks,
> Alex
> 
> On Thu, Sep 27, 2012 at 12:05 AM, Flavio Junqueira <[email protected]> wrote:
>> sync simply flushes the channel between the leader and the follower that 
>> forwarded the sync operation, so it doesn't go through the full zab 
>> pipeline. Flushing means that all pending updates from the leader to the 
>> follower are received by the time sync completes. Starting a read operation 
>> concurrently with a sync implies that the result of the read will not miss 
>> an update committed before the read started.
>> 
>> -Flavio
>> 
>> On Sep 27, 2012, at 3:43 AM, Alexander Shraer wrote:
>> 
>>> It's strange that sync doesn't run through agreement; I always
>>> assumed that it did... exactly for the reason you say:
>>> you may trust your leader, but I may have a different leader, and your
>>> leader may not have detected it yet and still think it's the leader.
>>> 
>>> This seems like a bug to me.
>>> 
>>> Similarly to Paxos, ZooKeeper's safety guarantees don't (or shouldn't)
>>> depend on timing assumptions.
>>> Only progress guarantees depend on time.
>>> 
>>> Alex
>>> 
>>> 
>>> On Wed, Sep 26, 2012 at 4:41 PM, John Carrino <[email protected]> 
>>> wrote:
>>>> I have some pretty strong requirements in terms of consistency where
>>>> reading from followers that may be behind in terms of updates isn't ok for
>>>> my use case.
>>>> 
>>>> One error case that worries me is if a follower and leader are partitioned
>>>> off from the network.  A new leader is elected, but the follower and old
>>>> leader don't know about it.
>>>> 
>>>> Normally I think sync was made for this purpose, but I looked at the sync
>>>> code and if there aren't any outstanding proposals the leader sends the
>>>> sync right back to the client without first verifying that it still has
>>>> quorum, so this won't work for my use case.
>>>> 
>>>> At the core of the issue, all I really need is a call that will make its
>>>> way to the leader, ping its followers, ensure it still has a
>>>> quorum, and return success.
>>>> 
>>>> Basically a getCurrentLeaderEpoch() method that will be forwarded to the
>>>> leader; the leader will ensure it still has quorum and return its epoch.  I
>>>> can use this primitive to implement all the other properties I want to
>>>> verify (assuming that my client will never connect to an older epoch after
>>>> this call returns). Also the nice thing about this method is that it will
>>>> not have to hit disk and the latency should just be a round trip to the
>>>> followers.
>>>> 
>>>> Most of the guarantees offered by ZooKeeper are time-based and rely on
>>>> clocks and expiring timers, but I'm hoping to offer some guarantees in
>>>> spite of busted clocks, horrible GC perf, VM suspends, and any other way
>>>> time is broken.
>>>> 
>>>> Also if people are interested I can go into more detail about what I am
>>>> trying to write.
>>>> 
>>>> -jc
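
[The quorum-check primitive John describes above can be sketched as follows. This is a hypothetical illustration of the proposed behavior, not ZooKeeper code: `get_current_leader_epoch` and all names are assumptions; the leader counts itself toward a majority quorum and returns its epoch only if enough followers acknowledge its ping.]

```python
# Hypothetical sketch of the proposed getCurrentLeaderEpoch() primitive:
# the leader pings its followers and returns its epoch only if a quorum
# (including itself) still acknowledges it. Illustrative names only.

def get_current_leader_epoch(leader_epoch, follower_acks, ensemble_size):
    """Return the leader's epoch if a quorum still acknowledges it.

    follower_acks: number of followers that answered the ping.
    The leader counts itself, so it needs a majority overall.
    """
    quorum = ensemble_size // 2 + 1
    if follower_acks + 1 >= quorum:  # +1 for the leader itself
        return leader_epoch
    raise RuntimeError("leader lost quorum; epoch not confirmed")

# 5-server ensemble: leader + 2 follower acks = 3 >= quorum of 3, so it succeeds
assert get_current_leader_epoch(7, 2, 5) == 7
```

Note that, as John observes, this check needs no disk write: the latency is a single round trip from the leader to its followers.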
>> 
