Re: Leader election

Michael Han Wed, 12 Dec 2018 19:00:01 -0800

>> Can we reduce this time by configuring "syncLimit" and "tickTime" to
>> let's say 5 seconds? Can we have a strong guarantee on this time bound?


It's not possible to guarantee the time bound, because of the FLP
impossibility result (reliable failure detection is not possible in an
asynchronous environment). It is certainly possible, though, to tune the
parameters to some reasonable value that fits your environment (which would
be driven by the SLA of your service).
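
As a rough illustration of the tuning arithmetic, the sketch below converts
"tickTime" and "syncLimit" into milliseconds. The class and method names are
hypothetical and only for this example; the 2x and 20x session timeout bounds
are the documented defaults for minSessionTimeout and maxSessionTimeout. The
results are configuration targets, not guaranteed bounds.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ZkTiming:
    """Hypothetical helper, not part of any ZooKeeper client library."""
    tick_time_ms: int  # zoo.cfg "tickTime", in milliseconds
    sync_limit: int    # zoo.cfg "syncLimit", in ticks

    def follower_lag_limit_ms(self) -> int:
        # A follower that falls further behind the leader than this
        # is dropped by the leader.
        return self.tick_time_ms * self.sync_limit

    def session_timeout_range_ms(self) -> Tuple[int, int]:
        # Default minimum and maximum negotiable session timeouts.
        return 2 * self.tick_time_ms, 20 * self.tick_time_ms


# Commonly used sample values versus a tighter setting aimed at 5 seconds.
common = ZkTiming(tick_time_ms=2000, sync_limit=5)
tight = ZkTiming(tick_time_ms=1000, sync_limit=5)

print(common.follower_lag_limit_ms(), tight.follower_lag_limit_ms())
print(tight.session_timeout_range_ms())
```

Lowering "tickTime" also lowers the smallest session timeout a client can
negotiate, so overly tight values may cause spurious session expirations on
a slow network or during GC pauses.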

>> As described above - you might use 'sync'+'read' to avoid this problem.

I am afraid sync + read would not be correct in all cases here. The
state of the world (e.g. leaders) could change between the sync and the read
operation. What we need here is a linearizable read, which means read
operations would also have to go through quorum consensus. That might be a
nice feature to have for ZooKeeper (for reference, etcd implements
linearizable reads). Also, note that ZooKeeper sync has bugs (sync should be
a quorum operation itself, but it's not implemented that way).
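
To make the failure mode concrete, here is a toy model of one way sync + read
can return stale data when sync is not a quorum operation. It is not
ZooKeeper code: the Server class, its fields, and the scenario are
assumptions made up for illustration.

```python
class Server:
    """Toy replica: the last applied zxid and the value at that zxid."""

    def __init__(self, name, zxid, value):
        self.name = name
        self.zxid = zxid
        self.value = value
        self.leader = None  # the server this replica believes is leader

    def sync(self):
        # Non-quorum sync: catch up to whatever the believed leader has,
        # without checking that a quorum still follows that leader.
        if self.leader is not None and self.leader.zxid > self.zxid:
            self.zxid = self.leader.zxid
            self.value = self.leader.value

    def read(self):
        # Reads are served from local state only.
        return self.value


# A is the old leader; B is a follower still attached to A.
old_leader = Server("A", zxid=5, value="leader=A")
follower = Server("B", zxid=4, value="leader=?")
follower.leader = old_leader

# A partition isolates A and B. The majority side elects C and commits
# a newer write that A and B never see.
new_leader = Server("C", zxid=6, value="leader=C")

follower.sync()          # catches up to A, which is itself stale
stale = follower.read()  # served locally from B

print(stale, "vs", new_leader.read())
```

In this model the client did everything suggested (sync, then read, on the
same server) and still observed the old state, which is why a linearizable
read would need quorum confirmation.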

On Fri, Dec 7, 2018 at 2:11 AM Maciej Smoleński <jezd...@gmail.com> wrote:

> On Fri, Dec 7, 2018 at 3:03 AM Michael Borokhovich <michael...@gmail.com>
> wrote:
>
> > We are planning to run Zookeeper nodes embedded with the client nodes.
> > I.e., each client runs also a ZK node. So, network partition will
> > disconnect a ZK node and not only the client.
> > My concern is about the following statement from the ZK documentation:
> >
> > "Timeliness: The clients view of the system is guaranteed to be
> > up-to-date within a certain time bound. (*On the order of tens of
> > seconds.*) Either system changes will be seen by a client within this
> > bound, or the client will detect a service outage."
> >
>
> This is related to the fact that ZooKeeper server handles reads from its
> local state - without communicating with other ZooKeeper servers.
> This design ensures scalability for read dominated workloads.
> In this approach client might receive data which is not up to date (it
> might not contain updates from other ZooKeeper servers (quorum)).
> Parameter 'syncLimit' (counted in ticks of 'tickTime') limits how far a
> ZooKeeper server's local state may fall behind the leader before the
> server is dropped.
> A client read will normally retrieve data from a state not older than
> the bound described by 'syncLimit'.
>
> However, a ZooKeeper client can always force retrieval of data which is up
> to date.
> It needs to issue a 'sync' command to the ZooKeeper server before issuing
> a 'read'.
> With 'sync', the ZooKeeper server will synchronize its local state with
> the global state.
> The later 'read' will be handled from the updated state.
> The client should be careful here - so that it communicates with the same
> ZooKeeper server for both 'sync' and 'read'.
>
>
> > What are these "*tens of seconds*"? Can we reduce this time by
> > configuring "syncLimit" and "tickTime" to let's say 5 seconds? Can we
> > have a strong guarantee on this time bound?
> >
>
> As described above - you might use 'sync'+'read' to avoid this problem.
>
>
> >
> >
> > On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman
> > <jor...@jordanzimmerman.com> wrote:
> >
> > > > Old service leader will detect network partition max 15 seconds after
> > > > it happened.
> > >
> > > If the old service leader is in a very long GC it will not detect the
> > > partition. In the face of VM pauses, etc. it's not possible to avoid 2
> > > leaders for a short period of time.
> > >
> > > -JZ
> >
>
