lujingyu created ZOOKEEPER-4981:
-----------------------------------
Summary: One node has neither crashed nor restarted again, but its
information is inconsistent with other nodes
Key: ZOOKEEPER-4981
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4981
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.6.3
Reporter: lujingyu
Attachments: C1ZK1-1.log, C1ZK2.log, C1ZK3.log, C1ZK4.log,
fuzzlog.txt, zookeeper--server-C1ZK1.txt, zookeeper--server-C1ZK2.txt,
zookeeper--server-C1ZK3.txt, zookeeper--server-C1ZK4.txt,
zookeeper--server-C1ZK5.txt
Timeline of the triggered bug:
# A cluster with five nodes: zk1, zk2, zk3, zk4, zk5. zk3 is the leader.
# client1 creates a znode "/bug";
# client1 creates a znode "/delete";
# then zk5 crashes (a follower, but likely the server client1 was connected to, because the fuzzlog file shows that client1 was briefly disconnected from the cluster before it set znode "/bug" to "nice");
# then client1 reconnects to the cluster and sets the value of znode "/bug" to "nice";
# client1 reads znode "/bug" and gets the value "nice";
# client1 deletes znode "/delete";
# client1 creates the ephemeral znode /eph;
# after that, zk4 crashes (important: it is the server client1 is connected to);
# then client2 connects to the cluster and can still read the "/eph" znode, which should have been deleted after client1 leaves. (A minimal client-side sketch of this sequence follows the list.)
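For reference, a minimal sketch of the client-side operation sequence above, written against the standard org.apache.zookeeper.ZooKeeper client API. The connect string, session timeout, and data values are illustrative assumptions taken from the scenario below, and the two crash points are injected by the fault injector, not by client code:
{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class EphemeralLeakRepro {
    public static void main(String[] args) throws Exception {
        // Assumed connect string covering all five servers on client port 11181.
        String connect = "172.30.0.2:11181,172.30.0.3:11181,172.30.0.4:11181,"
                       + "172.30.0.5:11181,172.30.0.6:11181";
        ZooKeeper client1 = new ZooKeeper(connect, 15000, event -> {});

        client1.create("/bug", "hello".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);
        client1.create("/delete", "hello".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);

        // <-- zk5 (a follower) is crashed here by the fault injector.

        client1.setData("/bug", "nice".getBytes(), -1);
        byte[] value = client1.getData("/bug", false, new Stat());
        System.out.println("read /bug = " + new String(value)); // expected "nice"
        client1.delete("/delete", -1);
        client1.create("/eph", "ephem".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL);

        // <-- zk4 (the server client1 is connected to) is crashed here.

        // client2 connects afterwards; once client1's session is gone, /eph
        // should no longer be visible, but in the reported run it still is.
        ZooKeeper client2 = new ZooKeeper(connect, 15000, event -> {});
        Stat eph = client2.exists("/eph", false);
        System.out.println("/eph stat: " + eph);
    }
}
{code}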
*********************************************************************************
The actual testing scenario is as follows:
The cluster has five nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3),
C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6)
# 2025-09-05 19:19:22,788 [ZK1Cli] - INFO - build connection with zookeeper
(Client1 establishes a connection with C1ZK4 [observed from the log])
# 2025-09-05 19:19:23,025 [ZK1Cli] - INFO - created znode /bug hello
# 2025-09-05 19:19:23,036 [ZK1Cli] - INFO - created znode /delete hello
# 2025-09-05 19:19:22,887 prepare to crash node C1ZK5 before the I/O operation performed by C1ZK1 (C1ZK1 is about to create the file "/home/zk1/evaluation/zk-3.6.3/zkData/version-2/log.100000001"):
{code:java}
java.io.FileOutputStream.<init>(FileOutputStream.java:213),
java.io.FileOutputStream.<init>(FileOutputStream.java:162),
org.apache.zookeeper.server.persistence.FileTxnLog.append$$PHOSPHORTAGGED(FileTxnLog.java:287),
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append$$PHOSPHORTAGGED(FileTxnSnapLog.java:582),
org.apache.zookeeper.server.ZKDatabase.append$$PHOSPHORTAGGED(ZKDatabase.java:641),
org.apache.zookeeper.server.SyncRequestProcessor.run$$PHOSPHORTAGGED(SyncRequestProcessor.java:181),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0)
{code}
# 2025-09-05 19:19:23,724 [ZK1Cli] - INFO - set znode /bug nice
# 2025-09-05 19:19:23,738 [ZK1Cli] - INFO - read znode /bug:nice
# 2025-09-05 19:19:23,758 [ZK1Cli] - INFO - deleted znode /delete
# 2025-09-05 19:19:23,770 [ZK1Cli] - INFO - created ephemeral znode /eph ephem
# Client1 successfully creates "/eph".
# 2025-09-05 19:19:26,278 prepare to crash node C1ZK4
{code:java}
org.apache.zookeeper.server.quorum.QuorumPacket.serialize$$PHOSPHORTAGGED(QuorumPacket.java:68),
org.apache.jute.BinaryOutputArchive.writeRecord$$PHOSPHORTAGGED(BinaryOutputArchive.java:126),
org.apache.zookeeper.server.quorum.Learner.writePacketNow$$PHOSPHORTAGGED(Learner.java:194),
org.apache.zookeeper.server.quorum.Learner.writePacket$$PHOSPHORTAGGED(Learner.java:186),
org.apache.zookeeper.server.quorum.Learner.request$$PHOSPHORTAGGED(Learner.java:254),
org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run$$PHOSPHORTAGGED(FollowerRequestProcessor.java:104),
org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java)
{code}
# 2025-09-05 19:19:27,124 ZK1Cli2 connects to the cluster and then incorrectly reads the ephemeral znode /eph created by Client1 (a per-server consistency-probe sketch follows the checker output below).
# 2025-09-05 19:19:39,760 [ZKChecker] - INFO - server C1ZK3:11181 and server
C1ZK1:11181 have different number of znodes:[/zookeeper/quota, /bug,
/zookeeper] [/zookeeper/quota, /bug, /eph, /zookeeper]
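A sketch of an equivalent per-server consistency probe: connecting to each server directly means reads are served from that server's local replica, so, modulo normal sync lag, differing child lists expose the divergence reported above. The addresses and port are taken from the scenario; servers that have crashed will simply fail to answer:
{code:java}
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeConsistencyProbe {
    public static void main(String[] args) throws Exception {
        // One direct connection per server so each read hits only that replica.
        String[] servers = {
            "172.30.0.2:11181", "172.30.0.3:11181", "172.30.0.4:11181",
            "172.30.0.5:11181", "172.30.0.6:11181"
        };
        for (String server : servers) {
            ZooKeeper zk = new ZooKeeper(server, 15000, event -> {});
            try {
                List<String> children = zk.getChildren("/", false);
                System.out.println(server + " -> " + children);
            } finally {
                zk.close();
            }
        }
    }
}
{code}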
**************************************************************************
The following excerpts are from the logs of each server. The transaction 0x100000001, "9/5/25 7:19:22 PM CST session 0x5005888b8400000 cxid 0x0 zxid 0x100000001 createSession 15000", records the session created for client1. (C1ZK3 is the leader, so it has no such log entry.) Judging from the timestamps below, C1ZK5 logged this transaction earliest, so it appears to have been the first server client1 connected to.
C1ZK4: 2025-09-05 19:19:22,932 [myid:4] - WARN
[QuorumPeer[myid=4](plain=[0:0:0:0:0:0:0:0]:11181)(secure=disabled):Follower@170]
- Got zxid 0x100000001 expected 0x1
C1ZK1: 2025-09-05 19:19:22,889 [myid:1] - WARN
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:11181)(secure=disabled):Follower@170]
- Got zxid 0x100000001 expected 0x1
C1ZK2: 2025-09-05 19:19:22,889 [myid:2] - WARN
[QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:11181)(secure=disabled):Follower@170]
- Got zxid 0x100000001 expected 0x1
C1ZK5: 2025-09-05 19:19:22,884 [myid:5] - WARN
[QuorumPeer[myid=5](plain=[0:0:0:0:0:0:0:0]:11181)(secure=disabled):Follower@170]
- Got zxid 0x100000001 expected 0x1
So it is really confusing why only C1ZK1's znode list differs from the other servers'. C1ZK1 is neither the node that crashed nor a node that was rebooted before Client2 connected to the cluster.
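For completeness, a hedged sketch of how the attached transaction logs can be dumped programmatically to compare what each server actually persisted; it uses the server-side FileTxnLog/TxnIterator API shipped in the ZooKeeper jar, and the log-directory path is an assumption based on the file name quoted earlier. A createSession entry at zxid 0x100000001 with session 0x5005888b8400000 should correspond to the session discussed above:
{code:java}
import java.io.File;
import org.apache.zookeeper.server.persistence.FileTxnLog;
import org.apache.zookeeper.server.persistence.TxnLog;
import org.apache.zookeeper.txn.TxnHeader;

public class TxnLogDump {
    public static void main(String[] args) throws Exception {
        // Assumed location of one server's transaction logs.
        File logDir = new File("/home/zk1/evaluation/zk-3.6.3/zkData/version-2");
        FileTxnLog txnLog = new FileTxnLog(logDir);
        // Iterate every transaction starting from the first zxid.
        TxnLog.TxnIterator it = txnLog.read(1);
        while (it.getHeader() != null) {
            TxnHeader hdr = it.getHeader();
            System.out.printf("zxid=0x%s session=0x%s cxid=0x%s type=%d%n",
                    Long.toHexString(hdr.getZxid()),
                    Long.toHexString(hdr.getClientId()),
                    Long.toHexString(hdr.getCxid()),
                    hdr.getType());
            if (!it.next()) {
                break;
            }
        }
        it.close();
    }
}
{code}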