Hi tison,

Thank you for the information.

> what will happen if the client decides session expired but the server hold a
> valid session and reconnect

This will not happen inside a single `ZooKeeper` client. Once a client
concludes that a session has expired, it will not try to establish a
connection anymore. I rarely see session transition[1] from one client
to another in practice. Theoretically, it is possible to reestablish a
session that was expired solely by the client, but I don't think that is
the main focus of ZooKeeper. Regardless of this feature, a client has to
do a quorum operation to gain a "happens-before relation" or
linearizability[2] on the data tree in a session transition. That is,
from my point of view there is no need to prove such a relation.

If this is a concern, I think we could also make this feature a
configurable option (including the `4/3` factor) in `ZooKeeperOptions`[3].

> Curator has done this client-side expiration with a similar algorithm for a
> long time and I didn't hear any issues reported. So such a solution can be
> battle-tested.

I was actually somewhat surprised that ZooKeeper lacked this when I first
attempted to fix this `endless connection loss`[4]. :-)

[1]: https://github.com/apache/zookeeper/blob/03a36d08e257c43e8377e5549d5524805fc6b8bb/zookeeper-server/src/main/java/org/apache/zookeeper/ZooKeeper.java#L851
[2]: https://zookeeper.apache.org/doc/r3.9.0/zookeeperInternals.html#sc_consistency
[3]: https://github.com/apache/zookeeper/pull/2001/files#diff-e19fc4f18a3bb65b0ecf90d98d01cb2d7705afffc0249d7e3a6f8c2655cfd702R32
[4]: https://github.com/apache/zookeeper/pull/1847#pullrequestreview-1433893017

Best,
Kezhu Wang

On Fri, Sep 1, 2023 at 4:53 PM tison <wander4...@gmail.com> wrote:
>
> IIUC the major issue here is what will happen if the client decides session
> expired but the server hold a valid session and reconnect.
>
> 4/3 time may best effort do the expiration after the server expires the
> session, but we need to prove a happens-before relation or think of the
> issues described above.
>
> However, Curator has done this client-side expiration with a similar
> algorithm for a long time and I didn't hear any issues reported. So such a
> solution can be battle-tested.
>
> Best,
> tison.
>
>
> Kezhu Wang <kez...@gmail.com> wrote on Fri, Sep 1, 2023 at 16:24:
>
> > Hi all,
> >
> > ZooKeeper session will expire approximately after negotiated session
> > timeout. Currently, client will learn this after successful contact to
> > ZooKeeper cluster. This exposes an endless client side connection loss
> > when client can't reach ZooKeeper cluster due to either incomplete
> > connection string or whole cluster downtime.
> >
> > There is a `SessionTimeoutException` in `ClientCnxn`, but it never
> > counts as session expiration.
> >
> > Possibly at least four jira issues reported the behavior described above.
> >
> > * ZOOKEEPER-2188[1]: client connection hung up because of dead loop
> > * ZOOKEEPER-4412[2]: client blocked too long before session timeout
> > * ZOOKEEPER-4508[3]: ZooKeeper client run to endless loop in
> >   ClientCnxn.SendThread.run if all server down
> > * ZOOKEEPER-4692[4]: Handle SessionTimeoutException in Java client
> >
> > I propose to add an `expirationTimeout` in `ClientCnxn` to deal with
> > this. The value could be approximately `4/3` of `connectTimeout` or
> > `negotiatedSessionTimeout` depending on stage. I opened a pr[5] for
> > evaluation.
> >
> > Any suggestions? Thanks!
> >
> > [1]: https://issues.apache.org/jira/browse/ZOOKEEPER-2188
> > [2]: https://issues.apache.org/jira/browse/ZOOKEEPER-4412
> > [3]: https://issues.apache.org/jira/browse/ZOOKEEPER-4508
> > [4]: https://issues.apache.org/jira/browse/ZOOKEEPER-4692
> > [5]: https://github.com/apache/zookeeper/pull/2058
> >
> > Best,
> > Kezhu Wang
> >
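
For readers following the thread, the proposal quoted above (an
`expirationTimeout` of roughly `4/3` of `connectTimeout` or
`negotiatedSessionTimeout`, depending on stage) could be sketched roughly
as below. This is only an illustration and not the code in the PR: the
class `ExpirationTracker`, its method names, and the choice to measure
from the last successful server contact are all assumptions made for the
example.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of ZooKeeper) that decides, purely on the
 * client side, when a session should be treated as expired.
 */
final class ExpirationTracker {
    private final long connectTimeoutMs;
    // 0 means "no session negotiated yet", i.e. still in the connect stage.
    private long negotiatedSessionTimeoutMs = 0;
    // Monotonic timestamp (nanoseconds) of the last successful server contact.
    private long lastContactNanos;

    ExpirationTracker(long connectTimeoutMs, long nowNanos) {
        this.connectTimeoutMs = connectTimeoutMs;
        this.lastContactNanos = nowNanos;
    }

    /** The 4/3 factor from the thread, applied to a base timeout (integer division). */
    static long expirationTimeoutMs(long baseTimeoutMs) {
        return baseTimeoutMs * 4 / 3;
    }

    /** Switch from the connect stage to the established-session stage. */
    void onSessionEstablished(long negotiatedTimeoutMs, long nowNanos) {
        this.negotiatedSessionTimeoutMs = negotiatedTimeoutMs;
        this.lastContactNanos = nowNanos;
    }

    /** Record any successful contact with a server (assumed to reset the deadline). */
    void onServerContact(long nowNanos) {
        this.lastContactNanos = nowNanos;
    }

    /** True once the client has gone without server contact for the whole timeout. */
    boolean isExpired(long nowNanos) {
        long baseMs = negotiatedSessionTimeoutMs > 0
                ? negotiatedSessionTimeoutMs
                : connectTimeoutMs;
        long limitNanos = TimeUnit.MILLISECONDS.toNanos(expirationTimeoutMs(baseMs));
        return nowNanos - lastContactNanos >= limitNanos;
    }
}
```

A caller would pass `System.nanoTime()` as `nowNanos` and, once
`isExpired` returns true, stop reconnecting and surface a session
expiration instead of another connection loss. The timestamps are passed
in explicitly only to keep the sketch easy to test.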