Thanks, Maciej. That sounds good. We will try tuning the parameters so that we at least have a known upper limit on the inconsistency interval.
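
For reference, this is roughly how we read the two parameters. It is a minimal sketch, assuming 'syncLimit' is counted in ticks and 'tickTime' is in milliseconds, so their product is the limit on how far a follower may lag. The class and method names are made up for illustration and are not part of ZooKeeper. The number is only a configured limit, not a hard guarantee: as Jordan notes in the quoted thread, a long GC or VM pause can exceed it.

```java
// Hypothetical helper, not part of ZooKeeper: turns zoo.cfg values into a time limit.
class StalenessBound {
    // tickTimeMs: 'tickTime' in milliseconds; syncLimitTicks: 'syncLimit' in ticks.
    static long maxFollowerLagMillis(int tickTimeMs, int syncLimitTicks) {
        // Widen to long before multiplying to avoid int overflow on large values.
        return (long) tickTimeMs * syncLimitTicks;
    }

    public static void main(String[] args) {
        // Example values only: 1 s ticks and a sync limit of 5 ticks.
        System.out.println(maxFollowerLagMillis(1000, 5) + " ms"); // prints "5000 ms"
    }
}
```

With these example values the limit comes out at the 5 seconds mentioned in the question.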
On Fri, Dec 7, 2018 at 2:11 AM Maciej Smoleński <jezd...@gmail.com> wrote:

> On Fri, Dec 7, 2018 at 3:03 AM Michael Borokhovich <michael...@gmail.com>
> wrote:
>
> > We are planning to run ZooKeeper nodes embedded with the client nodes,
> > i.e., each client also runs a ZK node. So a network partition will
> > disconnect a ZK node and not only the client.
> > My concern is about the following statement from the ZK documentation:
> >
> > "Timeliness: The clients view of the system is guaranteed to be up-to-date
> > within a certain time bound. (*On the order of tens of seconds.*) Either
> > system changes will be seen by a client within this bound, or the client
> > will detect a service outage."
> >
>
> This is related to the fact that a ZooKeeper server handles reads from its
> local state, without communicating with other ZooKeeper servers.
> This design ensures scalability for read-dominated workloads.
> With this approach a client might receive data which is not up to date (it
> might not contain updates from other ZooKeeper servers (quorum)).
> The 'syncLimit' parameter (counted in ticks of 'tickTime') bounds how far a
> ZooKeeper server's local state may lag behind the global state.
> A client read operation will retrieve data from a state no older than
> the bound given by 'syncLimit'.
>
> However, a ZooKeeper client can always force retrieval of data which is up
> to date.
> It needs to issue a 'sync' command to the ZooKeeper server before issuing
> the 'read'.
> With 'sync', the ZooKeeper server will synchronize its local state with the
> global state.
> The later 'read' will be handled from the updated state.
> The client should be careful here, so that it communicates with the same
> ZooKeeper server for both 'sync' and 'read'.
>
> >
> > What are these "*tens of seconds*"? Can we reduce this time by configuring
> > "syncLimit" and "tickTime" to, let's say, 5 seconds? Can we have a strong
> > guarantee on this time bound?
> >
>
> As described above, you might use 'sync' + 'read' to avoid this problem.
>
> >
> > On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman <jor...@jordanzimmerman.com>
> > wrote:
> >
> > > > Old service leader will detect network partition max 15 seconds after
> > > > it happened.
> > >
> > > If the old service leader is in a very long GC it will not detect the
> > > partition. In the face of VM pauses, etc. it's not possible to avoid 2
> > > leaders for a short period of time.
> > >
> > > -JZ
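
To make the 'sync' + 'read' idea from the thread concrete, here is a toy in-memory model of a server that answers reads from its local state and catches up only when asked. Everything in it (ToyLeader, ToyFollower, the path "/service/leader") is hypothetical and is NOT the ZooKeeper API. In the real Java client the analogous calls would, as far as I know, be `ZooKeeper.sync(path, callback, ctx)` followed by `getData` on the same handle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: stands in for the "global state" held by the quorum.
class ToyLeader {
    // Ordered log of committed writes, each entry is {key, value}.
    final List<String[]> log = new ArrayList<>();

    void write(String key, String value) {
        log.add(new String[] {key, value});
    }
}

// Toy model only: a server that serves reads from its own local state.
class ToyFollower {
    private final ToyLeader leader;
    private final Map<String, String> localState = new HashMap<>();
    private int applied = 0; // number of leader log entries applied locally

    ToyFollower(ToyLeader leader) {
        this.leader = leader;
    }

    // Reads never contact the leader, so the result may be stale.
    String read(String key) {
        return localState.get(key);
    }

    // Apply every write the leader has committed so far.
    void sync() {
        while (applied < leader.log.size()) {
            String[] entry = leader.log.get(applied++);
            localState.put(entry[0], entry[1]);
        }
    }
}

class SyncThenReadDemo {
    public static void main(String[] args) {
        ToyLeader leader = new ToyLeader();
        ToyFollower follower = new ToyFollower(leader);

        leader.write("/service/leader", "node-A");
        follower.sync();
        leader.write("/service/leader", "node-B"); // follower has not seen this yet

        System.out.println(follower.read("/service/leader")); // stale read: node-A
        follower.sync();                                       // same server as the read
        System.out.println(follower.read("/service/leader")); // fresh read: node-B
    }
}
```

The sync and the read go to the same ToyFollower instance, which mirrors the point about using the same ZooKeeper server for both operations.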