Really, the logs are critical here. If you can provide them, it would shed light on what's going on.
Patrick

On Tue, Dec 20, 2011 at 10:13 AM, Benjamin Reed <[email protected]> wrote:
> i've seen it before when the configuration files haven't been setup
> properly. i would check the configuration. if the leader is still the
> leader, it must have active followers connected to it, otherwise it
> would give up leadership. i would use netstat to find out who they
> are.
>
> ben
>
> On Tue, Dec 20, 2011 at 9:00 AM, Marshall McMullen
> <[email protected]> wrote:
>> Zookeeper devs,
>>
>> I've got a cluster with 3 servers in the ensemble all running 3.4.0. After
>> a few days of successful operation, we observed all zookeeper reads and
>> writes began failing every time. In our log files, the error being reported
>> is INVALID_STATE. I then telnetted to port 2181 on all three servers and
>> was surprised to see that *two* of these servers both report they are the
>> leader! Two of the nodes are in agreement on the Zxid, and one of the nodes
>> is way out of whack with a much much larger Zxid. The node that all writes
>> are flowing through is the one with the much higher Zxid.
>>
>> Has anyone ever seen this before? What can I do to diagnose this problem
>> and resolve it? I was considering killing zookeeper on the node that should
>> not be the leader (the one with the wrong Zxid) and removing the zookeeper
>> data directory, then restarting zookeeper on that node. Any other ideas?
>>
>> I appreciate any help.
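
The manual telnet check described in the original message can be scripted so the three servers are compared side by side. What follows is only a minimal sketch, assuming the servers answer the `stat` four-letter command on port 2181 and that its reply contains `Mode:` and `Zxid:` lines. The host names are hypothetical placeholders, and the helper names (`four_letter_word`, `parse_status`, `find_leaders`) are made up for this illustration; they are not part of ZooKeeper.

```python
import re
import socket


def four_letter_word(host, port=2181, cmd="stat", timeout=5.0):
    """Send a four-letter command to one server and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_status(text):
    """Pull the Mode and Zxid fields out of a 'stat' reply (None if absent)."""
    mode = re.search(r"^Mode:\s*(\S+)", text, re.MULTILINE)
    zxid = re.search(r"^Zxid:\s*(0x[0-9a-fA-F]+)", text, re.MULTILINE)
    return {
        "mode": mode.group(1) if mode else None,
        "zxid": int(zxid.group(1), 16) if zxid else None,
    }


def find_leaders(statuses):
    """Return the hosts that report themselves as leader, sorted by name."""
    return sorted(h for h, s in statuses.items() if s["mode"] == "leader")


if __name__ == "__main__":
    # Hypothetical host names; substitute the real ensemble members.
    hosts = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
    statuses = {}
    for host in hosts:
        try:
            statuses[host] = parse_status(four_letter_word(host))
        except OSError as exc:
            print(f"{host}: unreachable ({exc})")
            continue
        print(f"{host}: {statuses[host]}")

    leaders = find_leaders(statuses)
    if len(leaders) > 1:
        print("More than one server reports Mode: leader:", ", ".join(leaders))
```

This only reports what each server claims about itself. It does not show which followers are actually connected to which leader (the netstat check suggested above), and it is no substitute for the server logs.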
