[
https://issues.apache.org/jira/browse/ZOOKEEPER-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946974#comment-17946974
]
Kezhu Wang commented on ZOOKEEPER-4921:
---------------------------------------
> Are you saying that starting with version 3.9.3, the application is now
> responsible for managing session establishment itself?
{{org.apache.zookeeper.ZooKeeper}} manages a single session, which tolerates
network downtime within the session timeout. Beyond that, applications (Finagle,
Curator, etc.) have to manage the transition across sessions themselves.
> In our case, the Zookeeper library is embedded within the Twitter Finagle
> library, creating multiple layers of abstraction that make it nearly
> impossible for us to override this behavior.
It seems that Finagle's
[ZooKeeperClient.get|https://github.com/twitter/finagle/blob/develop/finagle-serversets/src/main/java/com/twitter/finagle/common/zookeeper/ZooKeeperClient.java#L357]
relies on endless retrying while establishing a brand new session. We
could fix it there.
Alternatively, we could provide an option for ZooKeeper to loop until a customized
timeout when establishing a new session. I think getting {{Expired}} and then
looping until connected might be a common pattern.
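For illustration only, here is a rough sketch of what such an application-side loop could look like. It is not Finagle's or ZooKeeper's code: the {{SessionRetry}} class, the {{establish}} method and its parameters are hypothetical names, and the {{attempt}} callback is assumed to create a new {{org.apache.zookeeper.ZooKeeper}} handle, wait for {{SyncConnected}}, and throw if that does not happen in time.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: retry establishing a brand new session until a deadline. */
public final class SessionRetry {
    private SessionRetry() {}

    /**
     * Calls attempt repeatedly until it returns normally or timeout elapses.
     * In a real application, attempt would (by assumption) construct a new
     * ZooKeeper handle and block until its watcher reports SyncConnected,
     * throwing if the handle could not connect.
     */
    public static <T> T establish(Callable<T> attempt, Duration timeout, Duration pause)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Exception last;
            try {
                return attempt.call();
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                TimeoutException te =
                        new TimeoutException("no session established within " + timeout);
                te.initCause(last);
                throw te;
            }
            // Sleep for the pause, but never past the deadline.
            long remainingMillis = Math.max(1, remainingNanos / 1_000_000);
            Thread.sleep(Math.min(pause.toMillis(), remainingMillis));
        }
    }
}
```

The application would call something like this from the code path that handles {{Expired}}, so the customized timeout bounds how long it keeps trying to get a new session instead of retrying endlessly.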
> Zookeeper Client 3.9.3 Fails to Reconnect After Network Failures
> ----------------------------------------------------------------
>
> Key: ZOOKEEPER-4921
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4921
> Project: ZooKeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.9.3
> Reporter: Chuong Tran
> Priority: Critical
>
> After upgrading the Java Zookeeper client to version 3.9.3, we observed that
> it is not resilient to brief network disruptions, such as a short VPN blip.
> In such cases, the client attempts to reconnect only once, and if
> unsuccessful, the session expires.
> {quote}Apr 23, 2025 10:19:23 AM
> com.twitter.finagle.common.zookeeper.ZooKeeperClient$3 process
> INFO: Zookeeper session expired. Event: WatchedEvent state:Expired type:None
> path:null zxid: -1
> {quote}
> In contrast, the previous version (3.9.2) would continuously retry until the
> network connection was restored, maintaining the session more reliably.
> I believe it's a new issue with this change:
> https://issues.apache.org/jira/browse/ZOOKEEPER-4508
>
> Steps to repro:
> # Open VPN.
> # Start the application which connects to the Zookeeper server with the VPN.
> # Disable VPN for a couple of minutes.
> # Observe the application.
> # Enable the VPN again.
> {quote}3.9.3:
> "message" : "Session 0x0 for server XXX, Closing socket connection.
> Attempting reconnect except it is a SessionExpiredException or
> SessionTimeoutException.",
> "stackTrace" : "o.a.z.ClientCnxn$SessionTimeoutException: Client session
> timed out, have not heard from server in 5590ms for session id 0x0
> at o.a.z.ClientCnxn$SendThread.run(ClientCnxn.java:1253)
> {quote}
> 3.9.2: Application will be reconnected successfully.