Ryan Ruel created ZOOKEEPER-4428:
------------------------------------

             Summary: ZooKeeper leaks "SyncThread" threads when leadership connection times out and is reestablished
                 Key: ZOOKEEPER-4428
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4428
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.6.3
         Environment:
# On a follower node of an established ZooKeeper ensemble, issue the following command to determine the number of SyncThread threads:
    ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
# Issue the following iptables command on the leader to drop traffic coming from the follower used in Step 1:
    iptables -A INPUT -s <Follower IP Address> -j DROP
# Watch the ZooKeeper logs on the nodes and wait for the connection to drop due to timeout.
# Issue the following iptables command on the leader to re-enable traffic coming from the follower used in Step 1:
    iptables -D INPUT -s <Follower IP Address> -j DROP
# Watch the ZooKeeper logs on the nodes and wait for the connection to the leader to be reestablished.
# On the follower node (used in Step 1), check the number of SyncThread threads again. The value should have increased by one and should stay pinned there indefinitely:
    ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
            Reporter: Ryan Ruel


In a production environment with some connectivity problems, the ZooKeeper server was found to be using over 1000 threads named "SyncThread" that were never freed.

The server logs indicate that these nodes were experiencing connection timeouts to the leader.

A test environment (described in the "Environment" field of this ticket) showed that these connection timeouts are what seem to be leaking the threads.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
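
The thread-count check from the reproduction steps can be wrapped in a small helper so the value is easy to compare before and after the connection drop. This is a minimal sketch and not part of the original report: the function names count_sync_threads and sample_sync_threads are hypothetical, and it assumes a procps-style ps whose -T output lists one thread per line with the thread name in the last column.

```shell
#!/bin/sh

# Count the lines on stdin that contain "SyncThread".
# grep -c prints the count; "|| true" keeps the exit status zero
# when the count is 0 (grep itself exits 1 on no match).
count_sync_threads() {
    grep -c 'SyncThread' || true
}

# Hypothetical convenience wrapper: count SyncThread threads of a live process.
# $1 is the PID of the ZooKeeper server, for example `pidof mdtzookeeper`
# on the reporter's system.
sample_sync_threads() {
    ps -T -p "$1" | count_sync_threads
}
```

Running sample_sync_threads with the server PID once before blocking traffic and once after the follower has reconnected should show the count growing by one per timeout cycle if the leak described above is present.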