Github user nicktrav commented on the issue:
https://github.com/apache/zookeeper/pull/330
@DanBenediktson - I've been looking into writing a test for this patch, but
I can't seem to reproduce the case you describe in the original ticket.
Specifically:
> The exact code path it goes through in this case is complicated, because
> there has to be a previously-closed socket still waiting in the selector
> (otherwise, the first timeout evaluation will not fail because "now" still
> hasn't been updated, and then the actual connect timeout will be applied in
> ClientCnxnSocket.doTransport()) so that select() will harvest the IO from the
> previous socket and updateNow(), resulting in the next loop through
> ClientCnxnSocket.SendThread.run() observing the spurious timeout and failing.
Are you able to provide some more details on how the client can get into
this state? Walking through the code, I'm having difficulty understanding how
the client can end up in a reconnect loop.
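
One way to picture the scenario in the quoted description is the simplified
model below. It is not ZooKeeper code: the class `StaleClockSketch`, its
methods, and the timing values are all hypothetical. It only assumes that
timeouts are evaluated against a cached "now" that is refreshed explicitly, so
a timestamp recorded from a stale "now" can later make time spent sleeping
look like time spent connecting.

```java
// Hypothetical model of a cached-clock timeout check; not the ZooKeeper client.
public class StaleClockSketch {
    private long now;        // cached timestamp, refreshed only by updateNow()
    private long lastHeard;  // cached "now" at the time of the last recorded activity

    // Refresh the cached clock (in this model, the caller supplies the time).
    public void updateNow(long wallClockMillis) { now = wallClockMillis; }

    // Record activity using the cached clock, which may be stale.
    public void updateLastHeard() { lastHeard = now; }

    public long idleMillis() { return now - lastHeard; }

    public boolean timedOut(long timeoutMillis) { return idleMillis() >= timeoutMillis; }

    public static void main(String[] args) {
        StaleClockSketch c = new StaleClockSketch();
        long connectTimeout = 500;

        c.updateNow(1_000);   // clock last refreshed at t=1000
        // Assume the thread now sleeps ~800 ms before reconnecting,
        // without refreshing the cached clock.
        c.updateLastHeard();  // records t=1000 although wall time is ~1800

        // The first evaluation passes because "now" is still stale.
        System.out.println(c.timedOut(connectTimeout)); // false

        // Something refreshes the clock shortly after the connect attempt starts.
        c.updateNow(1_810);

        // Idle time is computed as 810 ms, although only ~10 ms were spent connecting.
        System.out.println(c.timedOut(connectTimeout)); // true
    }
}
```

In this model the second check fails only because the clock is refreshed
between the two evaluations, which is the part I have not been able to trigger
in a test.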
We are keen to see this patch land as it would make a fix for
ZOOKEEPER-2869 inherently safer.