Ryan Ruel created ZOOKEEPER-4428:
------------------------------------

             Summary: ZooKeeper leaks "SyncThread" threads when leadership 
connection times out and is reestablished 
                 Key: ZOOKEEPER-4428
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4428
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.6.3
         Environment: # On a follower node of an established ZooKeeper 
ensemble, run the following command to count the SyncThread threads:

ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc -l


 # On the leader, run the following iptables command to drop traffic coming 
from the follower used in Step 1:

iptables -A INPUT -s <Follower IP Address> -j DROP


 # Watch the zookeeper logs on the nodes and wait for the connection to drop 
due to timeout.


 # On the leader, run the following iptables command to re-enable traffic 
coming from the follower used in Step 1:

iptables -D INPUT -s <Follower IP Address> -j DROP


 # Watch the zookeeper logs on the nodes and wait for the connection to the 
leader to reestablish.


 # On the follower node (used in Step 1), check the number of SyncThread 
threads again. The count will have increased by one and remains at that 
value indefinitely:

ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc -l
            Reporter: Ryan Ruel


In a production environment with connectivity problems, the ZooKeeper server 
was found to be running over 1000 threads named "SyncThread" that were never 
freed.

The server logs show that these nodes were experiencing connection timeouts 
to the leader.

A test environment (described in the "Environment" field of this ticket) 
showed that these connection timeouts, followed by reconnection to the 
leader, appear to be what leaks these threads.
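
The thread count used in the reproduction steps can be wrapped in a small 
helper for repeated checks. This is only a sketch: the function name 
count_sync_threads is hypothetical, and the process name mdtzookeeper is taken 
from the commands in this ticket and will likely differ on other installs.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin that mention "SyncThread".
# Intended to be fed `ps -T` output, one line per thread.
count_sync_threads() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so `|| true` keeps the helper's exit status clean.
    grep -c 'SyncThread' || true
}

# Usage (assumes the server process is named mdtzookeeper, as in this ticket):
#   ps -T -p "$(pidof mdtzookeeper)" | count_sync_threads
```

Running the usage line before the traffic is dropped and again after the 
follower reconnects should show whether the count has grown.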



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
