It doesn't ring a bell, but it might be worth having a look at the logs to see if there is anything unusual.
Just to clarify, was the number of outstanding requests growing or constant? I suppose the server was following/leading and operations were going through, otherwise it'd have dropped the connection to the leader or leadership.

-Flavio

> On 17 Feb 2015, at 18:01, Marshall McMullen <marshall.mcmul...@gmail.com> wrote:
>
> Greetings,
>
> We saw an issue recently that I've never seen before and am hoping I can
> get some clarity on what may cause this and whether it's a known issue. We
> had a 5-node ensemble and were unable to connect to one of the ZooKeeper
> instances. When trying to connect with zkCli it would time out. When I
> connected via telnet and issued the srvr four-letter word, I was surprised
> to see that this one server reported a massive number of 'Outstanding'
> requests. I'd never seen that really be anything other than 0 before. On
> the ZK dev guide it says:
>
> "outstanding is the number of queued requests, this increases when the
> server is under load and is receiving more sustained requests than it can
> process, ie the request queue". I looked at all the ZK servers in my
> ensemble:
>
> for ip in 101 102 103 104 105; do echo srvr | nc 172.21.20.${ip} 2181 | grep Outstanding; done
> Outstanding: 0
> Outstanding: 0
> Outstanding: 0
> Outstanding: 0
> Outstanding: 18876
>
> I eventually killed ZK on the affected server and everything corrected
> itself and Outstanding went to zero and I was able to connect again.
>
> Is this something anyone's familiar with? I have logs if it would be
> helpful.
>
> Thanks!
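
If this happens again, one way to tell whether the outstanding count is growing or constant would be to sample srvr a few times before restarting the server. The following is only a rough sketch built on the same echo srvr | nc command quoted above. The function names (outstanding_from_srvr, trend, sample_outstanding) and the sample count and interval are made up for illustration and are not part of ZooKeeper.

```shell
#!/bin/sh
# Illustrative sketch only: these helper names are hypothetical, not ZooKeeper tools.

# Read srvr output on stdin and print the number on the "Outstanding:" line.
outstanding_from_srvr() {
    awk -F': *' '$1 == "Outstanding" { print $2; exit }'
}

# Classify a series of samples given as arguments.
# "growing" means the value never dropped and rose at least once.
trend() {
    prev=$1
    shift
    up=0
    down=0
    for cur in "$@"; do
        if [ "$cur" -gt "$prev" ]; then up=1; fi
        if [ "$cur" -lt "$prev" ]; then down=1; fi
        prev=$cur
    done
    if [ "$up" -eq 1 ] && [ "$down" -eq 0 ]; then
        echo growing
    elif [ "$up" -eq 0 ] && [ "$down" -eq 1 ]; then
        echo shrinking
    elif [ "$up" -eq 0 ] && [ "$down" -eq 0 ]; then
        echo constant
    else
        echo fluctuating
    fi
}

# Sample one server several times. The defaults (port 2181, 5 samples,
# 2 seconds apart) are arbitrary assumptions.
sample_outstanding() {
    host=$1
    port=${2:-2181}
    count=${3:-5}
    interval=${4:-2}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo srvr | nc "$host" "$port" | outstanding_from_srvr
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example (not run here):
#   trend $(sample_outstanding 172.21.20.105)
```

A result of "growing" would suggest the request queue is backing up, while "constant" at a large value would suggest the server is stuck rather than overloaded. Either way the logs are still the place to look for the cause.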