All,

After looking into this bug report:
https://issues.apache.org/jira/browse/ZOOKEEPER-2615

I believe we have a system-wide race with watches on the server. AFAICT, a
request with a watch can be in flight at the same time a connection is
being closed. If the in-flight request is executed after this line of
NIOServerCnxn.close:

        if (zkServer != null) {
            zkServer.removeCnxn(this);
        }

the watches will be added but never cleaned up.
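
To make the ordering concrete, here is a minimal single-threaded sketch of the suspected interleaving. It is a toy model, not ZooKeeper code: ToyWatchTable, its methods, and leakedWatches are hypothetical names made up for illustration, and the only thing it shares with the server is the order of events described above (cleanup runs first, then the in-flight request registers its watch).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WatchLeakSketch {
    /** Toy stand-in for a watch table: path -> ids of watching connections. */
    static final class ToyWatchTable {
        private final Map<String, Set<Long>> watchers = new HashMap<>();

        void addWatch(String path, long cnxnId) {
            watchers.computeIfAbsent(path, p -> new HashSet<>()).add(cnxnId);
        }

        /** Drops every watch owned by the connection, as cleanup on close would. */
        void removeCnxn(long cnxnId) {
            watchers.values().forEach(ids -> ids.remove(cnxnId));
            watchers.values().removeIf(Set::isEmpty);
        }

        int watchCount() {
            return watchers.values().stream().mapToInt(Set::size).sum();
        }
    }

    /** Returns how many watches remain once the connection is gone. */
    static int leakedWatches(boolean requestRunsAfterCleanup) {
        ToyWatchTable table = new ToyWatchTable();
        long cnxnId = 1L;
        if (!requestRunsAfterCleanup) {
            table.addWatch("/a", cnxnId);   // request handled before close
        }
        table.removeCnxn(cnxnId);           // cleanup performed by close
        if (requestRunsAfterCleanup) {
            table.addWatch("/a", cnxnId);   // in-flight request handled after close
        }
        return table.watchCount();
    }

    public static void main(String[] args) {
        System.out.println("request before cleanup: " + leakedWatches(false)); // prints 0
        System.out.println("request after cleanup:  " + leakedWatches(true));  // prints 1
    }
}
```

In the second case nothing ever calls removeCnxn for that connection again, so the entry stays in the table indefinitely.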

This is particularly bad for the SetWatches command, which re-creates
watches when a client reconnects to a server after being disconnected. A
single SetWatches command can create a large number of new watches, causing
a bigger leak such as the one mentioned in the ticket above.

I have not yet finished a test that reproduces this, but I believe I can
reproduce it locally with various sleep statements. If you have thoughts on
the right approach for a fix, LMK in the ticket or here.
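
As a possible alternative to sleep statements, the ordering could be forced with a latch. The sketch below is again a toy under stated assumptions: ForcedInterleavingSketch and runOnce are hypothetical names, the set stands in for the server's watch table, and a real test would need some hook in the server to hold the request at the right point, which this does not show.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class ForcedInterleavingSketch {
    /** Runs cleanup and a delayed request on two threads; returns watches left over. */
    static int runOnce() {
        // Toy watch table: ids of connections that currently hold a watch.
        Set<Long> watchers = ConcurrentHashMap.newKeySet();
        long cnxnId = 1L;
        CountDownLatch cleanupDone = new CountDownLatch(1);

        Thread closer = new Thread(() -> {
            watchers.remove(cnxnId);   // stands in for the cleanup done on close
            cleanupDone.countDown();   // signal that cleanup has finished
        });
        Thread request = new Thread(() -> {
            try {
                cleanupDone.await();   // hold the in-flight request until cleanup is done
                watchers.add(cnxnId);  // the watch is registered too late
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        request.start();
        closer.start();
        try {
            request.join();
            closer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return watchers.size();
    }

    public static void main(String[] args) {
        System.out.println("watches left after close: " + runOnce());
    }
}
```

Because the request thread cannot proceed until the latch is released, the bad ordering happens on every run rather than only when the timing lines up.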

C
