Hi all, Before ZOOKEEPER-4508[1], clients could run into endless session reestablishment if zk cluster is unavailable.
1. For established sessions, the client tries to reestablish the session endlessly. Thus, the client will not get `Expired` until the cluster comes back. 2. In the case of no established session, the client tries to establish a brand-new session endlessly. I authored pr-2058[2] to fix them both by issuing `Expired` after approximately 4/3 * sessionTimeout as I think: 1. The first is a bug from clients' perspective. `Expired` deserves prompt awareness. 2. Clients should be aware of failure after a period of time, so as not to block clients indefinitely. In investigating ZOOKEEPER-4921[3], I found that Twitter Finagle[4] probably depends on the endless retry to establish a brand-new session. I think we have roads to go: 1. Treat endless retry in case of unestablished session as a bug. This will require third-party applications to fix themselves once they depend on the endless retry. I could author a fix for Finagle. But I doubt whether this endless retry is an already established feature for some third libraries. 2. Rollback to endless retry in 3.9.4. And provide customized `newSessionTimeout` to control timeout to establish a brand-new session through `ZooKeeperBuilder`[5] in 3.10. Together with ZOOKEEPER-4508, we will get `Expired` for established sessions and gain timeout control in establishing brand-new sessions without breaking possible third party applications. I am kindly leaning towards the second approach. What do you think ? Best, Kezhu [1]: https://issues.apache.org/jira/browse/ZOOKEEPER-4508 [2]: https://github.com/apache/zookeeper/pull/2058 [3]: https://issues.apache.org/jira/browse/ZOOKEEPER-4921 [4]: https://github.com/twitter/finagle/blob/develop/finagle-serversets/src/main/java/com/twitter/finagle/common/zookeeper/ZooKeeperClient.java#L357 [5]: https://issues.apache.org/jira/browse/ZOOKEEPER-4697