All,

After looking into this bug report:

https://issues.apache.org/jira/browse/ZOOKEEPER-2615

I believe we have a system-wide race with watches on the server. AFAICT, a request that sets a watch can be in flight at the same time its connection is being closed. If the in-flight request is executed after this line of NIOServerCnxn.close:

    if (zkServer != null) { zkServer.removeCnxn(this); }

the watches will be added and never cleaned up.

This is particularly bad for the SetWatches command, which re-creates a client's watches when it reconnects to a server after being disconnected. A single SetWatches can create a large number of new watches, so the leak is correspondingly bigger, such as the one mentioned in the ticket above.

I haven't gotten all the way through creating a test that reproduces this yet, but I believe I can reproduce it locally with various sleep statements.

If you have thoughts on the right approach for a fix, LMK in the ticket or here.

C
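
P.S. To make the suspected ordering concrete, here is a minimal single-threaded sketch in Java. It is not ZooKeeper code: WatchLeakSketch, WatchTable, addWatch, countFor and leakedWatches are hypothetical names, and removeCnxn here only mimics the role of the cleanup call quoted above. The interleaving is forced by running the steps in order instead of using real threads, so it illustrates the hypothesis and does not prove it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WatchLeakSketch {

    /** Hypothetical stand-in for a server-side watch table: path -> connection ids. */
    static class WatchTable {
        private final Map<String, Set<Long>> watches = new HashMap<>();

        synchronized void addWatch(String path, long cnxnId) {
            watches.computeIfAbsent(path, p -> new HashSet<>()).add(cnxnId);
        }

        /** Mimics connection cleanup: drop every watch owned by cnxnId. */
        synchronized void removeCnxn(long cnxnId) {
            watches.values().forEach(ids -> ids.remove(cnxnId));
            watches.values().removeIf(Set::isEmpty);
        }

        /** Number of paths still watched on behalf of cnxnId. */
        synchronized int countFor(long cnxnId) {
            int count = 0;
            for (Set<Long> ids : watches.values()) {
                if (ids.contains(cnxnId)) {
                    count++;
                }
            }
            return count;
        }
    }

    /** Replays the suspected ordering and returns how many watches are left behind. */
    public static int leakedWatches(List<String> pathsInLateRequest) {
        WatchTable table = new WatchTable();
        long cnxnId = 1L;

        // A watch registered while the connection was healthy.
        table.addWatch("/existing", cnxnId);

        // Step 1: the connection closes and cleanup runs once.
        table.removeCnxn(cnxnId);

        // Step 2: a SetWatches-style request that was already in flight
        // executes after cleanup and registers watches for the dead connection.
        for (String path : pathsInLateRequest) {
            table.addWatch(path, cnxnId);
        }

        // Nothing calls removeCnxn for this connection again, so these remain.
        return table.countFor(cnxnId);
    }

    public static void main(String[] args) {
        System.out.println(leakedWatches(List.of("/a", "/b", "/c"))); // prints 3
    }
}
```

In this sketch the leak grows with the number of paths in the late request, which is why a SetWatches arriving after cleanup would be the worst case.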