i've seen this before when the configuration files haven't been set up properly, so i would check the configuration first. if the leader is still the leader, it must have active followers connected to it; otherwise it would give up leadership. i would run netstat on it to find out who those followers are.
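
A minimal sketch of how the "who thinks it is the leader" check could be scripted instead of telnetting to each server by hand. It assumes the servers answer the `stat` four-letter command on client port 2181 and that the reply contains `Mode:` and `Zxid:` lines. The function names and the `fetch` parameter are illustrative, not part of ZooKeeper.

```python
import socket


def four_letter_word(host, port=2181, cmd=b"stat", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Extract the 'Mode' and 'Zxid' fields from a stat reply."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Mode", "Zxid"):
            info[key.strip()] = value.strip()
    return info


def check_ensemble(hosts, fetch=four_letter_word):
    """Query every host; 'fetch' is injectable so the parsing can be tested offline."""
    results = {}
    for host in hosts:
        try:
            results[host] = parse_stat(fetch(host))
        except OSError as exc:  # refused, unreachable, timed out, bad hostname
            results[host] = {"error": str(exc)}
    return results


def find_leaders(results):
    """Return the hosts whose reply reported 'Mode: leader'."""
    return sorted(h for h, info in results.items() if info.get("Mode") == "leader")
```

Calling `find_leaders(check_ensemble(["zk1", "zk2", "zk3"]))` (hypothetical hostnames) should return exactly one host in a healthy ensemble. More than one entry reproduces the situation described in the quoted message, and the `Zxid` values in the results show which server has diverged.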

ben

On Tue, Dec 20, 2011 at 9:00 AM, Marshall McMullen <marshall.mcmul...@gmail.com> wrote:
> Zookeeper devs,
>
> I've got a cluster with 3 servers in the ensemble all running 3.4.0. After
> a few days of successful operation, we observed all zookeeper reads and
> writes began failing every time. In our log files, the error being reported
> is INVALID_STATE. I then telnetted to port 2181 on all three servers and
> was surprised to see that *two* of these servers both report they are the
> leader! Two of the nodes are in agreement on the Zxid, and one of the nodes
> is way out of whack with a much much larger Zxid. The node that all writes
> are flowing through is the one with the much higher Zxid.
>
> Has anyone ever seen this before? What can I do to diagnose this problem
> and resolve it? I was considering killing zookeeper on the node that should
> not be the leader (the one with the wrong Zxid) and removing the zookeeper
> data directory, then restarting zookeeper on that node. Any other ideas?
>
> I appreciate any help.