> it seems like the
> inconsistency may be caused by the partition of the Zookeeper cluster
> itself
Yes - there are many ways in which you can end up with 2 leaders. However,
if properly tuned and configured, it will be for a few seconds at most.
During a GC pause no work is being done anyway. But, this stuff is very
tricky. Requiring an atomically unique leader is actually a design smell
and you should reconsider your architecture.

> Maybe we can use something more
> lightweight, Hazelcast for example?

There is no distributed system that can guarantee a single leader. Instead
you need to adjust your design and algorithms to deal with this (using
optimistic locking, etc.).
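
For example, something along these lines (a rough, untested sketch - the
znode path and payload are made up): make every write to shared state a
conditional setData, so a write from a stale leader fails instead of
silently overwriting what the new leader did:

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class OptimisticUpdate {
    // Returns true if our conditional write won, false if someone else
    // (possibly a second "leader") changed the node since we read it.
    static boolean tryUpdate(ZooKeeper zk, String path, byte[] newData)
            throws KeeperException, InterruptedException {
        Stat stat = new Stat();
        zk.getData(path, false, stat);      // remember the version we read
        try {
            // succeeds only if nobody has written since our read
            zk.setData(path, newData, stat.getVersion());
            return true;
        } catch (KeeperException.BadVersionException e) {
            // we lost the race - re-read, re-check leadership, retry or step down
            return false;
        }
    }
}

The point is that safety comes from the version check on every write, not
from the assumption that only one process believes it is the leader at any
given instant.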
-Jordan

> On Dec 6, 2018, at 1:52 PM, Michael Borokhovich <[email protected]> wrote:
>
> Thanks Jordan,
>
> Yes, I will try Curator.
> Also, beyond the problem described in the Tech Note, it seems like the
> inconsistency may be caused by the partition of the Zookeeper cluster
> itself. E.g., if a "leader" client is connected to the partitioned ZK node,
> it may not be notified at the same time as the other clients connected to
> the other ZK nodes. So, another client may take leadership while the
> current leader is still unaware of the change. Is it true?
>
> Another follow-up question. If Zookeeper can guarantee a single leader, is
> it worth using it just for leader election? Maybe we can use something more
> lightweight, Hazelcast for example?
>
> Michael.
>
>
> On Thu, Dec 6, 2018 at 4:50 AM Jordan Zimmerman <[email protected]>
> wrote:
>
>> It is not possible to achieve the level of consistency you're after in an
>> eventually consistent system such as ZooKeeper. There will always be an
>> edge case where two ZooKeeper clients will believe they are leaders
>> (though for a short period of time). In terms of how it affects Apache
>> Curator, we have this Tech Note on the subject:
>> https://cwiki.apache.org/confluence/display/CURATOR/TN10
>> (the description is true for any ZooKeeper client, not just Curator
>> clients). If you do still intend to use a ZooKeeper lock/leader, I suggest
>> you try Apache Curator, as writing these "recipes" is not trivial and they
>> have many gotchas that aren't obvious.
>>
>> -Jordan
>>
>> http://curator.apache.org
>>
>>
>> On Dec 5, 2018, at 6:20 PM, Michael Borokhovich <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> We have a service that runs on 3 hosts for high availability. However, at
>>> any given time, exactly one instance must be active. So, we are thinking
>>> of using leader election with Zookeeper.
>>> To this end, on each service host we also start a ZK server, so we have a
>>> 3-node ZK cluster and each service instance is a client to its dedicated
>>> ZK server.
>>> Then, we implement a leader election on top of Zookeeper using a basic
>>> recipe:
>>> https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_leaderElection.
>>>
>>> I have the following questions/doubts regarding the approach:
>>>
>>> 1. It seems like we can run into inconsistency issues when a network
>>> partition occurs. Zookeeper documentation says that the inconsistency
>>> period may last "tens of seconds". Am I understanding correctly that
>>> during this time we may have 0 or 2 leaders?
>>> 2. Is it possible to reduce this inconsistency time (let's say to 3
>>> seconds) by tweaking tickTime and syncLimit parameters?
>>> 3. Is there a way to guarantee exactly one leader all the time? Should we
>>> implement a more complex leader election algorithm than the one suggested
>>> in the recipe (using ephemeral_sequential nodes)?
>>>
>>> Thanks,
>>> Michael.
>>
>>
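
PS - since you said you'll try Curator: for "exactly one active instance at
a time" the LeaderSelector recipe is probably the closest fit. A rough,
untested sketch (the connect string, znode path and work loop below are
placeholders):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "host1:2181,host2:2181,host3:2181",   // placeholder connect string
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        LeaderSelector selector = new LeaderSelector(client, "/myapp/leader",
                new LeaderSelectorListenerAdapter() {
                    @Override
                    public void takeLeadership(CuratorFramework cf) throws Exception {
                        // We are leader only while this method is running. If the
                        // ZK session is suspended or lost, Curator interrupts this
                        // thread - stop working and return promptly.
                        while (!Thread.currentThread().isInterrupted()) {
                            // doWork();  // placeholder for the active instance's job
                            Thread.sleep(1000);
                        }
                    }
                });
        selector.autoRequeue();   // rejoin the election after losing leadership
        selector.start();

        Thread.currentThread().join();   // keep the demo process alive
    }
}

Even with this, the window described in TN10 still exists - the old leader
may keep running briefly before it notices it has lost leadership - so
combine it with the conditional-write approach above rather than relying on
leadership alone.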
