ZooKeeper devs, I've got a cluster with 3 servers in the ensemble, all running 3.4.0. After a few days of successful operation, all ZooKeeper reads and writes began failing every time. The error reported in our log files is INVALID_STATE. I then telnetted to port 2181 on all three servers and was surprised to see that *two* of them report that they are the leader! Two of the nodes agree on the Zxid, and the third is way off, with a much larger Zxid. The node that all writes are flowing through is the one with the much higher Zxid.
Has anyone ever seen this before? What can I do to diagnose and resolve this problem? I was considering killing ZooKeeper on the node that should not be the leader (the one with the wrong Zxid), removing its ZooKeeper data directory, and then restarting ZooKeeper on that node. Any other ideas? I appreciate any help.
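In case it helps anyone reproduce the check, here is a rough Python sketch of what I did by hand over telnet. It assumes the `srvr` four-letter command on client port 2181 and that the reply contains `Mode:` and `Zxid:` lines. The hostnames `zk1`, `zk2`, and `zk3` are placeholders, not our real servers, and the helper names are made up for this example.

```python
import socket


def four_letter_word(host, port=2181, cmd=b"srvr", timeout=5.0):
    """Send a four-letter command and return the full reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode_and_zxid(reply):
    """Extract (mode, zxid) from a reply; either may be None if missing."""
    mode = None
    zxid = None
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        value = value.strip()
        if key == "Mode":
            mode = value
        elif key == "Zxid":
            zxid = int(value, 16)  # reported as hex, e.g. 0x100000005
    return mode, zxid


def find_leaders(statuses):
    """Return the hosts (sorted) whose reported mode is 'leader'."""
    return sorted(h for h, (mode, _) in statuses.items() if mode == "leader")


if __name__ == "__main__":
    # Placeholder hostnames; replace with the real ensemble members.
    hosts = ["zk1", "zk2", "zk3"]
    statuses = {}
    for host in hosts:
        try:
            statuses[host] = parse_mode_and_zxid(four_letter_word(host))
        except OSError as exc:
            print(f"{host}: unreachable ({exc})")
            continue
        mode, zxid = statuses[host]
        zxid_text = hex(zxid) if zxid is not None else "unknown"
        print(f"{host}: mode={mode} zxid={zxid_text}")
    leaders = find_leaders(statuses)
    if len(leaders) > 1:
        print("More than one server reports leader:", ", ".join(leaders))
```

Running this against the ensemble should print one line per server, plus a warning line if more than one of them claims to be the leader, which is the situation I am seeing.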