Hi tison,

Thank you for your information.

> what will happen if the client decides session expired but the server hold a 
> valid session and reconnect

This will not happen inside a single `ZooKeeper` client. Once a client
concludes a session as expired it will not try to establish connection
anymore.

I rarely see usage of session transition[1] from one client to
another. Theoretically, it is possible to reestablish a session
expired solely by client. But I don't think it is the main focus of
ZooKeeper. Client has to do a quorum operation to gain "happens-before
relation" or linearizability[2] on the data tree in session transition
regardless of this feature. That is, there is no need to prove such a
relation from my point of review. If this is a concern, I think we
could also make this feature a configurable option(including the
`4/3`)  in `ZooKeeperOptions`[3].

> Curator has done this client-side expiration with a similar algorithm for a 
> long time and I didn't hear any issues reported. So such a solution can be 
> battle-tested.

I was actually somewhat surprised by such a lack in ZooKeeper in my
first attempt to fix this `endless connection loss`[4]. :-)

[1]: 
https://github.com/apache/zookeeper/blob/03a36d08e257c43e8377e5549d5524805fc6b8bb/zookeeper-server/src/main/java/org/apache/zookeeper/ZooKeeper.java#L851
[2]: 
https://zookeeper.apache.org/doc/r3.9.0/zookeeperInternals.html#sc_consistency
[3]: 
https://github.com/apache/zookeeper/pull/2001/files#diff-e19fc4f18a3bb65b0ecf90d98d01cb2d7705afffc0249d7e3a6f8c2655cfd702R32
[4]: https://github.com/apache/zookeeper/pull/1847#pullrequestreview-1433893017

Best,
Kezhu Wang

On Fri, Sep 1, 2023 at 4:53 PM tison <wander4...@gmail.com> wrote:
>
> IIUC the major issue here is what will happen if the client decides session
> expired but the server hold a valid session and reconnect.
>
> 4/3 time may best effort do the expiration after the server expires the
> session, but we need to prove a happens-before relation or think of the
> issues described above.
>
> However, Curator has done this client-side expiration with a similar
> algorithm for a long time and I didn't hear any issues reported. So such a
> solution can be battle-tested.
>
> Best,
> tison.
>
>
> Kezhu Wang <kez...@gmail.com> 于2023年9月1日周五 16:24写道:
>
> > Hi all,
> >
> > ZooKeeper session will expire approximately after negotiated session
> > timeout. Currently, client will learn this after successful contact to
> > ZooKeeper cluster. This exposes an endless client side connection loss
> > when client can't reach ZooKeeper cluster due to either incomplete
> > connection string or whole cluster downtime.
> >
> > There is a `SessionTimeoutException` in `CliientCnxn`, but it never
> > counts as session expiration.
> >
> > Possibly at least four jira issues reported the behavior described above.
> >
> > * ZOOKEEPER-2188[1]: client connection hung up because of dead loop
> > * ZOOKEEPER-4412[2]: client blocked too long before session timeout
> > * ZOOKEEPER-4508[3]: ZooKeeper client run to endless loop in
> > ClientCnxn.SendThread.run if all server down
> > * ZOOKEEPER-4692[4]: Handle SessionTimeoutException in Java client
> >
> > I propose to add an `expirationTimeout` in `ClientCnxn` to deal with
> > this. The value could be approximately `4/3` of `connectTimeout` or
> > `negotiatedSessionTimeout` depending on stage. I opened a pr[5] for
> > evaluation.
> >
> > Any suggestions ? Thanks!
> >
> > [1]: https://issues.apache.org/jira/browse/ZOOKEEPER-2188
> > [2]: https://issues.apache.org/jira/browse/ZOOKEEPER-4412
> > [3]: https://issues.apache.org/jira/browse/ZOOKEEPER-4508
> > [4]: https://issues.apache.org/jira/browse/ZOOKEEPER-4692
> > [5]: https://github.com/apache/zookeeper/pull/2058
> >
> > Best,
> > Kezhu Wang
> >

Reply via email to