[
https://issues.apache.org/jira/browse/ZOOKEEPER-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kevin Wikant updated ZOOKEEPER-4840:
------------------------------------
Affects Version/s: 3.9.1
> Repeated SessionExpiredException after Zookeeper daemon restart
> ---------------------------------------------------------------
>
> Key: ZOOKEEPER-4840
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4840
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.9.1
> Reporter: Kevin Wikant
> Priority: Major
>
> h2. TL;DR
> After upgrading Apache Zookeeper & Apache Curator, our application is
> experiencing new, unexpected behaviour after events such as:
> * Zookeeper server(s) being restarted
> * Zookeeper server failing & being replaced
> h2. Background
> The application uses Zookeeper for leader election & metadata storage. It
> runs on 3 hosts, each of which also runs 1 Zookeeper server/daemon.
> Previously the application was running on Zookeeper version 3.5.10 & Curator
> version 4.3.0.
> After upgrading to Zookeeper version 3.9.1 & Curator version 5.2.0, a new edge
> case was observed: after Zookeeper daemons are restarted/failed, the
> application (i.e. the Zookeeper client) enters a 15+ minute loop of repeatedly
> logging "{{SessionExpiredException}}".
> These repeated "{{SessionExpiredException}}" errors do not indicate a full
> Zookeeper client/communication outage, because DEBUG logs show that other
> Zookeeper sessions are communicating just fine. Unfortunately, the
> "{{SessionExpiredException}}" logs do not show the ID of the session that is
> encountering the exception.
> h2. Symptoms
> When using Zookeeper version 3.9.1 & Curator version 5.2.0, after
> restarting/failing some of the Zookeeper daemons:
> # All 3 Zookeeper clients experience connection failures lasting a few
> seconds after the Zookeeper daemons are failed/restarted.
> # These connection failures resolve shortly afterwards without any action
> needed.
> # Around 1 minute after the Zookeeper daemons were failed/restarted, all 3
> Zookeeper clients start repeatedly logging "{{SessionExpiredException}}".
> # The "{{SessionExpiredException}}" is repeatedly logged for 15+
> minutes. During this time there are no connectivity issues. We can see from
> the Zookeeper server logs that all 3 Zookeeper servers are receiving regular
> traffic from the clients.
> # Interestingly, each Zookeeper server is not receiving any requests from
> the local Zookeeper client for the duration of the period where
> "{{SessionExpiredException}}" is repeatedly logged by the clients.
> However, each Zookeeper server is receiving regular traffic from the 2 remote
> Zookeeper clients.
> The evidence suggests that this is a client-side issue & the
> "{{SessionExpiredException}}" is being thrown before the request is even
> sent to the Zookeeper server.
>
>
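The last symptom in the description (each server sees traffic from the 2 remote clients but none from its co-located client) can be checked mechanically once the client hosts seen by each server have been extracted from the server logs. The following is only a minimal sketch of that check: the function name, the input shape, and the host names are hypothetical and do not come from the report, and the log extraction itself is not shown.

```python
from typing import Dict, List, Set


def servers_missing_local_client(clients_seen: Dict[str, Set[str]]) -> List[str]:
    """Return the servers that received no requests from their co-located client.

    clients_seen maps a server's host name to the set of client host names
    that sent it at least one request during the observation window.
    """
    return sorted(
        server for server, clients in clients_seen.items() if server not in clients
    )


# Hypothetical observation window for a 3-host deployment (host names are made up).
observed = {
    "host-a": {"host-b", "host-c"},
    "host-b": {"host-a", "host-c"},
    "host-c": {"host-a", "host-b"},
}

# Every server is missing its local client, which matches the reported symptom.
print(servers_missing_local_client(observed))
```

An empty result for the same window would mean the local clients are reaching their servers, which would weaken the client-side hypothesis.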