Hi all,

Before ZOOKEEPER-4508[1], clients could run into endless session
reestablishment if zk cluster is unavailable.

1. For established sessions, the client tries to reestablish the
session endlessly. Thus, the client will not get `Expired` until the
cluster comes back.
2. In the case of no established session, the client tries to
establish a brand-new session endlessly.

I authored pr-2058[2] to fix them both by issuing `Expired` after
approximately 4/3 * sessionTimeout as I think:
1. The first is a bug from clients' perspective. `Expired` deserves
prompt awareness.
2. Clients should be aware of failure after a period of time, so as
not to block clients indefinitely.

In investigating ZOOKEEPER-4921[3], I found that Twitter Finagle[4]
probably depends on the endless retry to establish a brand-new
session.

I think we have roads to go:
1. Treat endless retry in case of unestablished session as a bug. This
will require third-party applications to fix themselves once they
depend on the endless retry. I could author a fix for Finagle. But I
doubt whether this endless retry is an already established feature for
some third libraries.
2. Rollback to endless retry in 3.9.4. And provide customized
`newSessionTimeout` to control timeout to establish a brand-new
session through `ZooKeeperBuilder`[5] in 3.10. Together with
ZOOKEEPER-4508, we will get `Expired` for established sessions and
gain timeout control in establishing brand-new sessions without
breaking possible third party applications.

I am kindly leaning towards the second approach.

What do you think ?

Best,
Kezhu

[1]: https://issues.apache.org/jira/browse/ZOOKEEPER-4508
[2]: https://github.com/apache/zookeeper/pull/2058
[3]: https://issues.apache.org/jira/browse/ZOOKEEPER-4921
[4]: 
https://github.com/twitter/finagle/blob/develop/finagle-serversets/src/main/java/com/twitter/finagle/common/zookeeper/ZooKeeperClient.java#L357
[5]: https://issues.apache.org/jira/browse/ZOOKEEPER-4697

Reply via email to