Ryan Ruel created ZOOKEEPER-4428:
------------------------------------
Summary: ZooKeeper leaks "SyncThread" threads when leadership
connection times out and is reestablished
Key: ZOOKEEPER-4428
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4428
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.6.3
Environment:
1. On a follower node in an established ZooKeeper ensemble, run the following
command to count the SyncThread threads:
ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
2. On the leader, run the following iptables command to drop traffic coming
from the follower used in Step 1:
iptables -A INPUT -s <Follower IP Address> -j DROP
3. Watch the ZooKeeper logs on the nodes and wait for the connection to drop
due to timeout.
4. On the leader, run the following iptables command to re-enable traffic
coming from the follower used in Step 1:
iptables -D INPUT -s <Follower IP Address> -j DROP
5. Watch the ZooKeeper logs on the nodes and wait for the connection to the
leader to be reestablished.
6. On the follower node used in Step 1, check the number of SyncThread threads
again. The value should have increased by one and should stay at that level
indefinitely:
ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
Reporter: Ryan Ruel
In a production environment with some connectivity problems, the ZooKeeper
server was found to be using over 1000 threads named "SyncThread" that were
never freed.
The server logs indicate that these nodes were experiencing connection
timeouts to the leader.
A test environment (described in the "Environment" field of this ticket)
showed that these connection timeouts appear to be what leaks these threads.
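
The thread-count check from the reproduction steps can be wrapped in a small
script so that the count is sampled the same way before and after the
connection drop. This is only a sketch, assuming a Linux host with procps "ps";
the function names count_named_threads and sample_sync_threads are hypothetical
helpers and are not part of ZooKeeper.

```shell
#!/bin/sh
# Count the lines on stdin that contain the given thread name.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# function usable under "set -e"; grep still prints 0 in that case.
count_named_threads() {
    grep -c -- "$1" || true
}

# Hypothetical helper: print a timestamped SyncThread count for one PID.
# Usage: sample_sync_threads <pid>
sample_sync_threads() {
    count=$(ps -T -p "$1" | count_named_threads SyncThread)
    printf '%s SyncThread=%s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$count"
}
```

On the follower, something like the following could then log one sample per
minute while the iptables rule is added and removed on the leader (the process
name mdtzookeeper is the one used in the steps above):
while sleep 60; do sample_sync_threads "$(pidof mdtzookeeper)"; done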
--
This message was sent by Atlassian Jira
(v8.20.1#820001)