Kevin Wikant created ZOOKEEPER-4840:
---------------------------------------
Summary: Repeated SessionExpiredException after Zookeeper daemon
restart
Key: ZOOKEEPER-4840
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4840
Project: ZooKeeper
Issue Type: Bug
Reporter: Kevin Wikant
## Background
The application uses Zookeeper for leader election & metadata storage. It
runs on 3 hosts, each of which also runs 1 Zookeeper daemon.
Previously the application ran on Zookeeper version 3.5.10 & Curator
version 4.3.0.
After upgrading to Zookeeper version 3.9.1 & Curator version 5.2.0, a new edge
case was observed: after Zookeeper daemons are restarted/failed, the
application (i.e. the Zookeeper client) enters a 15+ minute loop of repeatedly
logging "SessionExpiredException".
These repeated "SessionExpiredException" logs are not indicative of a full
Zookeeper client outage, because DEBUG logs show that other Zookeeper sessions
are communicating just fine. Unfortunately, the "SessionExpiredException" logs
do not show the associated Session ID.
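
Since the missing Session ID makes it hard to match a client-side exception to a server-side session, one option is to log the ID from the application itself. The sketch below is only an illustration and not part of the affected application: the class name `SessionIdFormat` and the sample value are made up. It renders an ID as `0x` followed by hex, which is the form Zookeeper's own log lines use. In a real client the `long` would presumably come from `ZooKeeper.getSessionId()` (reachable in Curator via `getZookeeperClient().getZooKeeper()`).

```java
public class SessionIdFormat {
    // Renders a session ID as "0x" + lowercase hex so it can be
    // compared directly with the session IDs in Zookeeper log lines.
    static String format(long sessionId) {
        return "0x" + Long.toHexString(sessionId);
    }

    public static void main(String[] args) {
        // Made-up session ID, for illustration only.
        long sessionId = 0x100000a1b2c0001L;
        System.out.println("SessionExpiredException on session " + format(sessionId));
    }
}
```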
## Symptoms
When using Zookeeper version 3.9.1 & Curator version 5.2.0, after
restarting/failing some of the Zookeeper daemons:
# All 3 Zookeeper clients experience connection failures lasting a few
seconds after the Zookeeper daemons are failed/restarted.
# These connection failures resolve shortly without any action
needed.
# Around 1 minute after the Zookeeper daemons are failed/restarted, all 3
Zookeeper clients start repeatedly logging "SessionExpiredException".
# The "SessionExpiredException" is repeatedly logged for 15+ minutes.
During this time there are no connectivity issues. The Zookeeper server
logs show that all 3 Zookeeper servers are receiving regular traffic from
the clients.
# Interestingly, each Zookeeper server receives no requests from its
local Zookeeper client for the duration of the period in which
"SessionExpiredException" is repeatedly logged. However, each Zookeeper
server does receive regular traffic from the 2 remote Zookeeper clients.
The evidence suggests that this is a client-side issue & that the
"SessionExpiredException" is being thrown before the request is even
sent to the Zookeeper server.
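
To put a number on the length of the loop described above, the client log can be scanned for the first and last "SessionExpiredException" entries. This is a minimal sketch under stated assumptions: the class name `ExpiryWindow`, the timestamp layout `yyyy-MM-dd HH:mm:ss,SSS` at the start of each line, and the sample lines are all hypothetical and are not taken from the application's actual logs.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class ExpiryWindow {
    // Assumed (hypothetical) timestamp layout at the start of each log line.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int TS_LENGTH = 23;

    // Returns the time between the first and last line that mentions
    // SessionExpiredException, or Duration.ZERO if there is none.
    static Duration window(List<String> lines) {
        LocalDateTime first = null;
        LocalDateTime last = null;
        for (String line : lines) {
            if (line.length() < TS_LENGTH || !line.contains("SessionExpiredException")) {
                continue;
            }
            LocalDateTime t = LocalDateTime.parse(line.substring(0, TS_LENGTH), TS);
            if (first == null) {
                first = t;
            }
            last = t;
        }
        return first == null ? Duration.ZERO : Duration.between(first, last);
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> sample = List.of(
                "2024-06-01 10:00:05,000 WARN connection lost",
                "2024-06-01 10:01:00,000 ERROR SessionExpiredException",
                "2024-06-01 10:16:30,000 ERROR SessionExpiredException");
        System.out.println("Loop lasted " + window(sample).toMinutes() + " minutes");
    }
}
```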