[ https://issues.apache.org/jira/browse/ZOOKEEPER-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Wikant updated ZOOKEEPER-4840:
------------------------------------
    Description:
h2. Background

Application is using ZooKeeper for leader election & metadata storage. The application runs on 3 hosts, each of which also runs 1 ZooKeeper daemon.

Previously the application was running on ZooKeeper version 3.5.10 & Curator version 4.3.0.

After upgrading to ZooKeeper version 3.9.1 & Curator version 5.2.0, a new edge case was observed: after ZooKeeper daemons are restarted/failed, the application (i.e. the ZooKeeper client) enters a 15+ minute loop of repeatedly logging {{SessionExpiredException}}.

These repeated {{SessionExpiredException}} logs are not indicative of a full ZooKeeper client outage, because DEBUG logs show that other ZooKeeper sessions are communicating just fine. Unfortunately, the {{SessionExpiredException}} logs do not show the associated session ID.

h2. Symptoms

When using ZooKeeper version 3.9.1 & Curator version 5.2.0, after restarting/failing some of the ZooKeeper daemons:
# All 3 ZooKeeper clients experience connection failures lasting a few seconds after the ZooKeeper daemons are failed/restarted.
# These connection failures resolve shortly afterwards without any action needed.
# Around 1 minute after the ZooKeeper daemons are failed/restarted, all 3 ZooKeeper clients start repeatedly logging {{SessionExpiredException}}.
# The {{SessionExpiredException}} is repeatedly logged for 15+ minutes. During this time there are no connectivity issues. The ZooKeeper server logs show that all 3 ZooKeeper servers are receiving regular traffic from the clients.
# Interestingly, each ZooKeeper server receives no requests from its local ZooKeeper client for the duration of the period in which {{SessionExpiredException}} is repeatedly logged. However, each ZooKeeper server is receiving regular traffic from the 2 remote ZooKeeper clients.
The evidence suggests that this is a client-side issue & the {{SessionExpiredException}} is being thrown before the request is even sent to the ZooKeeper server.

> Repeated SessionExpiredException after Zookeeper daemon restart
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4840
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4840
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Kevin Wikant
>            Priority: Major