Hi all,

I recently worked on fixing a flaky test, testPortChange(), which is related to ZOOKEEPER-2000.
This is what I have figured out:

* Servers (1) and (2) were followers; (3) was the leader.
* A client connected to (1) and did a reconfig().
* (1) and (2) formed a quorum, the reconfig succeeded, and the call returned.
* (3) still thinks it is the leader, so it is still using LeaderZooKeeperServer.
* A client connected to (3) did a sync(), and the sync didn't go through a quorum. THE CLIENT THAT DID THE SYNC() GETS THE WRONG BEHAVIOR. There's a split brain here for sync().
* Then (3) gradually moves to the new quorum config.

I'm proposing to change sync() so that it needs quorum acks. I've privately talked with my friend Xiang Li, who's working on etcd. He previously had a similar experience and ended up changing sync to go through a quorum.

Since this change affects the behavior of sync(), I'm asking in public whether there are any concerns, or any assumptions about the current behavior. Let's discuss it here.

Best,
--
*- Hongchao Deng*
*Software Engineer*
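
P.S. To make the scenario above concrete, here is a toy Python model of it. This is only a sketch and not ZooKeeper code: the Server class, the config_version and last_zxid fields, the zxid values 10 and 11, and the rule "a server acks only if the leader's config is at least as new as its own" are all assumptions made up for illustration. It compares a sync() answered locally by the stale leader with a sync() that has to collect acks from a majority.

```python
from dataclasses import dataclass


@dataclass
class Server:
    sid: int
    config_version: int = 0  # hypothetical: version of the ensemble config applied
    last_zxid: int = 0       # hypothetical: last committed transaction seen


class StaleLeaderError(Exception):
    """Raised when a would-be leader cannot collect acks from a majority."""


def local_sync(leader: Server) -> int:
    # Behavior in the scenario: the self-believed leader answers on its own.
    return leader.last_zxid


def quorum_sync(leader: Server, ensemble: list) -> int:
    # Proposed behavior (toy rule, an assumption): a server acks only if the
    # leader's config is at least as new as its own. The leader acks itself.
    acks = sum(1 for s in ensemble if s.config_version <= leader.config_version)
    if 2 * acks <= len(ensemble):
        raise StaleLeaderError(
            f"server {leader.sid} got {acks}/{len(ensemble)} acks"
        )
    return leader.last_zxid


def run_scenario():
    s1, s2, s3 = Server(1, 0, 10), Server(2, 0, 10), Server(3, 0, 10)
    ensemble = [s1, s2, s3]

    # (1) and (2) commit the reconfig; (3) has not noticed yet.
    for s in (s1, s2):
        s.config_version, s.last_zxid = 1, 11

    # sync() answered locally by (3) reflects stale state.
    stale_zxid = local_sync(s3)

    # sync() that needs quorum acks is rejected instead of succeeding silently.
    try:
        quorum_sync(s3, ensemble)
        rejected = False
    except StaleLeaderError:
        rejected = True

    return stale_zxid, s1.last_zxid, rejected
```

In this model the local sync on (3) reports a zxid older than what (1) and (2) have already committed, while the quorum version fails, so the client would find out that (3) is stale and could retry elsewhere.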
