Makes sense. Thanks, Ted. We will design our system to cope with the short periods where we might have two leaders.
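
Concretely, we plan to fence every externally visible write behind a versioned znode, so a write from a stale leader is rejected rather than applied. A minimal sketch with the plain ZooKeeper Java client (the /app/epoch path and the FencedWriter helper are made up for illustration):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch: each leader records the znode version it observed on a
    // shared epoch znode when it won the election, and makes every
    // subsequent write conditional on that version. If a newer leader
    // has since bumped the epoch, the conditional setData fails with
    // BadVersion instead of silently overwriting the new leader's state.
    public class FencedWriter {
        private final ZooKeeper zk;
        private final int epochVersion; // version seen when we became leader

        public FencedWriter(ZooKeeper zk, int epochVersion) {
            this.zk = zk;
            this.epochVersion = epochVersion;
        }

        public void fencedWrite(byte[] payload) throws Exception {
            try {
                // Conditional write: succeeds only while our epoch is current.
                zk.setData("/app/epoch", payload, epochVersion);
            } catch (KeeperException.BadVersionException e) {
                // Another leader has taken over; stop acting as leader.
                throw new IllegalStateException("lost leadership", e);
            }
        }
    }

This doesn't prevent two leaders from existing briefly; it just makes the stale one harmless for anything that goes through ZK.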
On Thu, Dec 6, 2018 at 11:03 PM Ted Dunning <ted.dunn...@gmail.com> wrote:

> ZK is able to guarantee that there is only one leader for the purposes
> of updating ZK data. That is because all commits have to originate with
> the current quorum leader and then be acknowledged by a quorum of the
> current cluster. If the leader can't get enough acks, then it has de
> facto lost leadership.
>
> The problem comes when there is another system that depends on ZK data.
> Such data might record which node is the leader for some other purpose.
> That node will only assume it has become leader if it succeeds in
> writing to ZK, but if there is a partition, the old leader may not be
> notified that another leader has established itself until some time
> after it has happened. Of course, if the erstwhile leader tried to
> validate its position with a write to ZK, that write would fail, but
> you can't spend 100% of your time doing that.
>
> It all comes down to the fact that a ZK client determines that it is
> connected to a ZK cluster member by pinging, and that cluster member
> sees heartbeats from the leader. The fact is, though, that you can't
> tune these pings to be faster than some level, because you start to see
> lots of false positives for loss of connection. Remember that it isn't
> just loss of connection that is the point here; any kind of delay would
> have the same effect. Getting these ping intervals below one second
> makes for a very twitchy system.
>
> On Fri, Dec 7, 2018 at 11:03 AM Michael Borokhovich
> <michael...@gmail.com> wrote:
>
> > We are planning to run the Zookeeper nodes embedded with the client
> > nodes, i.e., each client also runs a ZK node. So a network partition
> > will disconnect a ZK node, not only the client.
> > My concern is about the following statement from the ZK documentation:
> >
> > "Timeliness: The clients view of the system is guaranteed to be
> > up-to-date within a certain time bound. (*On the order of tens of
> > seconds.*) Either system changes will be seen by a client within this
> > bound, or the client will detect a service outage."
> >
> > What are these "*tens of seconds*"? Can we reduce this time by
> > configuring "syncLimit" and "tickTime" to, let's say, 5 seconds? Can
> > we have a strong guarantee on this time bound?
> >
> > On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman
> > <jor...@jordanzimmerman.com> wrote:
> >
> > > > Old service leader will detect network partition max 15 seconds
> > > > after it happened.
> > >
> > > If the old service leader is in a very long GC, it will not detect
> > > the partition. In the face of VM pauses, etc., it's not possible to
> > > avoid 2 leaders for a short period of time.
> > >
> > > -JZ
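
To put rough numbers on Ted's timing point: session timeouts are negotiated between 2*tickTime and 20*tickTime, so with the default tickTime=2000 the ceiling is 40 seconds, which is where "tens of seconds" comes from. A zoo.cfg sketch with illustrative values only (not a recommendation):

    # Sessions are negotiated between 2*tickTime and 20*tickTime,
    # so tickTime=500 bounds every session to the 1s..10s range.
    tickTime=500
    # Ticks a follower may take for its initial sync with the leader (5s here).
    initLimit=10
    # Ticks a follower may lag behind the leader before being dropped (2s here).
    syncLimit=4

So the bound is tunable, but as Ted says, pushing it much below a second trades staleness for false positives, and none of it covers a stalled JVM, hence the fencing above.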