Hello,

Ensuring reliability requires either using consensus directly in your service or changing the service to use a distributed log/journal (e.g. BookKeeper).

However, the following idea is simple and in many situations good enough: if you configure the session timeout to 15 seconds, then a partitioned ZooKeeper client will be disconnected after at most 15 seconds, so the old service leader will detect the network partition at most 15 seconds after it happened. The new service leader should stay idle for more than that initial 15 seconds (let's say 30 seconds). This way you avoid the situation with 2 concurrently working leaders.
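For illustration, a sketch of that idea using Apache Curator's LeaderSelector could look roughly like the code below. The connect string, the znode path and the doLeaderWork() hook are placeholders, and the timeouts are just the example values from above.

// Sketch only: timeouts mirror the example above; the connect string,
// znode path and doLeaderWork() are hypothetical placeholders.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class DelayedLeaderExample {
    static final int SESSION_TIMEOUT_MS = 15_000; // old leader notices the partition within ~15s
    static final long INITIAL_IDLE_MS = 30_000;   // > session timeout, so the old leader has stepped down

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",     // placeholder connect string
                SESSION_TIMEOUT_MS, 5_000, new ExponentialBackoffRetry(1_000, 3));
        client.start();

        LeaderSelector selector = new LeaderSelector(client, "/my-service/leader",
                new LeaderSelectorListenerAdapter() {
                    // The adapter cancels leadership (interrupting this thread)
                    // when the connection is SUSPENDED or LOST.
                    @Override
                    public void takeLeadership(CuratorFramework cf) throws Exception {
                        // Stay idle longer than the session timeout so that a leader
                        // stranded on the other side of a partition has stepped down.
                        Thread.sleep(INITIAL_IDLE_MS);
                        doLeaderWork();               // placeholder for the real work loop
                    }
                });
        selector.autoRequeue();                       // rejoin the election after losing leadership
        selector.start();

        Thread.currentThread().join();                // keep the demo process alive
    }

    static void doLeaderWork() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            Thread.sleep(1_000);                      // placeholder work
        }
    }
}

The only essential property is that INITIAL_IDLE_MS stays comfortably larger than the session timeout, i.e. larger than the time the old leader needs to notice the disconnection.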
The described solution has time dependencies, though, and in some situations it leads to an incorrect state. For example, high load on a machine might cause the ZooKeeper client to detect the disconnection only after 60 seconds (instead of the expected 15 seconds). In such a situation there will be 2 concurrent leaders.

Maciej

On Thu, Dec 6, 2018 at 8:09 PM Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:

> > it seems like the inconsistency may be caused by the partition of the
> > ZooKeeper cluster itself
>
> Yes - there are many ways in which you can end up with 2 leaders. However,
> if properly tuned and configured, it will be for a few seconds at most.
> During a GC pause no work is being done anyway. But, this stuff is very
> tricky. Requiring an atomically unique leader is actually a design smell
> and you should reconsider your architecture.
>
> > Maybe we can use a more lightweight Hazelcast for example?
>
> There is no distributed system that can guarantee a single leader. Instead
> you need to adjust your design and algorithms to deal with this (using
> optimistic locking, etc.).
>
> -Jordan
>
> > On Dec 6, 2018, at 1:52 PM, Michael Borokhovich <michael...@gmail.com> wrote:
> >
> > Thanks Jordan,
> >
> > Yes, I will try Curator.
> > Also, beyond the problem described in the Tech Note, it seems like the
> > inconsistency may be caused by the partition of the ZooKeeper cluster
> > itself. E.g., if a "leader" client is connected to the partitioned ZK node,
> > it may not be notified at the same time as the other clients connected to
> > the other ZK nodes. So, another client may take leadership while the
> > current leader is still unaware of the change. Is that true?
> >
> > Another follow-up question: if ZooKeeper can guarantee a single leader, is
> > it worth using it just for leader election? Maybe we can use something
> > more lightweight, Hazelcast for example?
> >
> > Michael.
> >
> > On Thu, Dec 6, 2018 at 4:50 AM Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
> >
> >> It is not possible to achieve the level of consistency you're after in an
> >> eventually consistent system such as ZooKeeper. There will always be an
> >> edge case where two ZooKeeper clients will believe they are leaders
> >> (though for a short period of time). In terms of how it affects Apache
> >> Curator, we have this Tech Note on the subject:
> >> https://cwiki.apache.org/confluence/display/CURATOR/TN10
> >> (the description is true for any ZooKeeper client, not just Curator
> >> clients). If you do still intend to use a ZooKeeper lock/leader, I suggest
> >> you try Apache Curator, as writing these "recipes" is not trivial and they
> >> have many gotchas that aren't obvious.
> >>
> >> -Jordan
> >>
> >> http://curator.apache.org
> >>
> >>> On Dec 5, 2018, at 6:20 PM, Michael Borokhovich <michael...@gmail.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> We have a service that runs on 3 hosts for high availability. However, at
> >>> any given time, exactly one instance must be active.
> >>> So, we are thinking of using leader election with ZooKeeper.
> >>> To this end, on each service host we also start a ZK server, so we have a
> >>> 3-node ZK cluster and each service instance is a client of its dedicated
> >>> ZK server.
> >>> Then, we implement leader election on top of ZooKeeper using a basic
> >>> recipe:
> >>> https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_leaderElection
> >>>
> >>> I have the following questions/doubts regarding the approach:
> >>>
> >>> 1. It seems like we can run into inconsistency issues when a network
> >>> partition occurs. The ZooKeeper documentation says that the inconsistency
> >>> period may last "tens of seconds". Am I understanding correctly that
> >>> during this time we may have 0 or 2 leaders?
> >>> 2. Is it possible to reduce this inconsistency time (let's say to 3
> >>> seconds) by tweaking the tickTime and syncLimit parameters?
> >>> 3. Is there a way to guarantee exactly one leader all the time? Should we
> >>> implement a more complex leader election algorithm than the one suggested
> >>> in the recipe (using ephemeral_sequential nodes)?
> >>>
> >>> Thanks,
> >>> Michael.
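P.S. To make the "optimistic locking" point above a bit more concrete: instead of assuming there is exactly one leader, each leader can attach a monotonically increasing fencing token to its writes (for example an epoch counter, or the czxid of the znode it created during the election), and the shared resource rejects any write whose token is older than one it has already accepted. A rough, hypothetical sketch:

// Illustration only: every write carries a monotonically increasing fencing
// token and the shared resource rejects anything older than what it has
// already seen. All names here are hypothetical.
public class FencedResource {
    private long highestTokenSeen = Long.MIN_VALUE;

    // Returns false (and does nothing) if the caller's token is stale,
    // i.e. a newer leader has already written.
    public synchronized boolean write(long fencingToken, String key, String value) {
        if (fencingToken < highestTokenSeen) {
            return false;                 // stale leader: ignore the write
        }
        highestTokenSeen = fencingToken;
        applyWrite(key, value);           // placeholder for the real mutation
        return true;
    }

    private void applyWrite(String key, String value) {
        // placeholder
    }
}

With such a check in place, a second leader that has not yet noticed the partition can still issue writes, but they are simply rejected, so the short window with two leaders does not corrupt the shared state.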