[
https://issues.apache.org/jira/browse/ZOOKEEPER-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829769#comment-17829769
]
LINFUSHOU commented on ZOOKEEPER-4444:
--------------------------------------
This issue happened in my environment as well, and afterwards the follower's
data was inconsistent with the leader's: a znode had been deleted, but could
still be read from one follower.
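To confirm the divergence, I read the same znode from each server individually
with the plain Java client; a minimal sketch (the connect strings and the znode
path below are placeholders, not my actual setup):
{code:java}
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Reads the same path from every server separately; on a healthy ensemble
// all servers eventually agree that a deleted znode is gone.
public class ConsistencyCheck {
    public static void main(String[] args) throws Exception {
        String path = "/some/deleted/znode";                           // placeholder
        String[] servers = {"node1:2181", "node2:2181", "node3:2181"}; // placeholders
        for (String server : servers) {
            ZooKeeper zk = new ZooKeeper(server, 15000, event -> { });
            try {
                Stat stat = zk.exists(path, false); // null means the znode is absent
                System.out.println(server + " -> "
                        + (stat == null ? "absent" : "PRESENT, mzxid=0x" + Long.toHexString(stat.getMzxid())));
            } finally {
                zk.close();
            }
        }
    }
}
{code}
Log from the affected follower (myid=3) while it was synchronizing: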
{noformat}
2023-09-22 06:14:59,309 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:9639)(secure=disabled):Learner@737][] - Learner received NEWLEADER message
2023-09-22 06:14:59,310 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:9639)(secure=disabled):QuorumPeer@921][] - Peer state changed: following - synchronization
2023-09-22 06:14:59,310 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:9639)(secure=disabled):CommitProcessor@491][] - Configuring CommitProcessor with readBatchSize -1 commitBatchSize 1
2023-09-22 06:14:59,310 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:9639)(secure=disabled):CommitProcessor@452][] - Configuring CommitProcessor with 28 worker threads.
2023-09-22 06:14:59,310 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:9639)(secure=disabled):FollowerRequestProcessor@59][] - Initialized FollowerRequestProcessor with zookeeper.follower.skipLearnerRequestToNextProcessor as false
2023-09-22 06:14:59,311 [myid:3] - WARN [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:9639)(secure=disabled):MBeanRegistry@110][] - Failed to register MBean InMemoryDataTree
2023-09-22 06:14:59,311 [myid:3] - WARN [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:9639)(secure=disabled):LearnerZooKeeperServer@104][] - Failed to register with JMX
org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3,name2=Follower,name3=InMemoryDataTree
    at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
    at org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:106)
    at org.apache.zookeeper.server.quorum.LearnerZooKeeperServer.registerJMX(LearnerZooKeeperServer.java:102)
    at org.apache.zookeeper.server.ZooKeeperServer.startupWithServerState(ZooKeeperServer.java:715)
    at org.apache.zookeeper.server.ZooKeeperServer.startupWithoutServing(ZooKeeperServer.java:698)
    at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:760)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:108)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1522)
2023-09-22 06:15:00,292 [myid:3] - WARN [NIOWorkerThread-18:NIOServerCnxn@380][] - Close of session 0x0
ZooKeeperServer not running
    at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:554)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
    at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2023-09-22 06:15:00,292 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370041 is <= 30065371040 for 5
2023-09-22 06:15:00,292 [myid:3] - WARN [NIOWorkerThread-10:NIOServerCnxn@380][] - Close of session 0x0
ZooKeeperServer not running
    at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:554)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
    at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2023-09-22 06:15:00,293 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370042 is <= 30065371040 for 1
2023-09-22 06:15:00,293 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370043 is <= 30065371040 for 1
2023-09-22 06:15:00,293 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370044 is <= 30065371040 for 1
2023-09-22 06:15:00,294 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370045 is <= 30065371040 for 2
2023-09-22 06:15:00,294 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370046 is <= 30065371040 for 5
2023-09-22 06:15:00,294 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370047 is <= 30065371040 for 1
2023-09-22 06:15:00,294 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370048 is <= 30065371040 for 1
2023-09-22 06:15:00,294 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370049 is <= 30065371040 for 2
2023-09-22 06:15:00,294 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370050 is <= 30065371040 for 5
2023-09-22 06:15:00,294 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370051 is <= 30065371040 for 2
2023-09-22 06:15:00,294 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370052 is <= 30065371040 for 5
2023-09-22 06:15:00,294 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370053 is <= 30065371040 for 1
2023-09-22 06:15:00,294 [myid:3] - WARN [SyncThread:3:FileTxnLog@275][] - Current zxid 30065370054 is <= 30065371040 for 1
{noformat}
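For reference, the "Current zxid X is <= Y for T" warnings are emitted by
FileTxnLog.append() when the follower is asked to log a transaction whose zxid
is not newer than the last zxid it has already appended (the trailing number is
the txn type: 1 = create, 2 = delete, 5 = setData). A simplified model of that
guard (toy code, not verbatim ZooKeeper source):
{code:java}
// Toy model of the zxid guard in FileTxnLog#append that produces the
// "Current zxid X is <= Y for T" warnings above.
public class TxnLogGuard {
    private long lastZxidSeen = -1;

    public synchronized boolean append(long zxid, int txnType) {
        if (zxid <= lastZxidSeen) {
            // Incoming txn is not newer than the last appended one; the real
            // method warns like this but still appends the txn to the log.
            System.out.printf("WARN Current zxid %d is <= %d for %d%n", zxid, lastZxidSeen, txnType);
        } else {
            lastZxidSeen = zxid;
        }
        return true;
    }
}
{code}
Note that the txn is still appended after the warning, so the follower's txn
log ends up containing overlapping zxid ranges.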
> Follower doesn't get synchronized after process restart
> -------------------------------------------------------
>
> Key: ZOOKEEPER-4444
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4444
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.6.3
> Reporter: Andreas Weber
> Priority: Major
>
> Hi folks, I've got an issue with 3.6.3.
> I can't provide a simple test, because it seems to depend on timing in a
> cluster environment, but I tried to reduce the scenario as far as possible:
> * zookeeper cluster with 5 nodes, all of them full voting members (no Observers)
> * start some parallel threads which do some writes in the cluster (e.g.
> create/delete znodes; a sketch of such writers follows these steps)
> * kill one of the zookeeper processes, let's say on node X (where node X is
> not the Leader)
> * restart zookeeper process on node X
> * wait a few seconds
> * kill zookeeper process on node X again
> * restart zookeeper process on node X again
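> A minimal sketch of the writer threads from the second step (connect string,
> paths and thread count are placeholders, not the actual setup):
> {code:java}
> import org.apache.zookeeper.CreateMode;
> import org.apache.zookeeper.ZooDefs;
> import org.apache.zookeeper.ZooKeeper;
>
> // Each thread repeatedly creates and deletes its own znode to keep a steady
> // stream of write transactions flowing through the ensemble.
> public class WriteLoad {
>     public static void main(String[] args) throws Exception {
>         String connect = "node1:2181,node2:2181,node3:2181,node4:2181,node5:2181";
>         for (int i = 0; i < 8; i++) {
>             final String path = "/load-" + i;
>             new Thread(() -> {
>                 try {
>                     ZooKeeper zk = new ZooKeeper(connect, 15000, event -> { });
>                     while (true) {
>                         zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
>                         zk.delete(path, -1);
>                     }
>                 } catch (Exception e) {
>                     e.printStackTrace();
>                 }
>             }).start();
>         }
>     }
> }
> {code}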
> In some cases (every 3-4 runs) I see the following in the log of node X:
> After first restart of node X:
> {noformat}
> WARN persistence.FileTxnLog - Current zxid 4294968525 is <= 4294969524 for 15
> WARN persistence.FileTxnLog - Current zxid 4294968526 is <= 4294969524 for 15
> WARN persistence.FileTxnLog - Current zxid 4294968527 is <= 4294969524 for 15
> ... (this kind of WARN is repeated some hundred times)
> WARN quorum.SendAckRequestProcessor - Closing connection to leader, exception during packet send java.net.SocketException: Socket closed ...
> ... (this kind of WARN is repeated some hundred times)
> {noformat}
> After second restart of node X:
> {noformat}
> ERROR persistence.FileTxnSnapLog - 4294970146(highestZxid) > 4294969147(next log) for type 2
> WARN server.DataTree - Message:Digests are not matching. Value is Zxid. Value:4294969147
> ERROR server.DataTree - First digest mismatch on txn: 360466402305310720,3870,4294969147,1639258399998,2, ..., expected digest is 2,1365261838770, actual digest is 1098406565142,
> ERROR persistence.FileTxnSnapLog - 4294970146(highestZxid) > 4294969148(next log) for type 2
> ERROR persistence.FileTxnSnapLog - 4294970146(highestZxid) > 4294969149(next log) for type 5
> ERROR persistence.FileTxnSnapLog - 4294970146(highestZxid) > 4294969150(next log) for type 2
> ... (this kind of ERROR is repeated some hundred times)
> {noformat}
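> For context: ZooKeeper 3.6 ships a transaction digest feature in which the
> leader records a digest of its DataTree alongside each txn and the follower
> compares it with the digest of its own tree after applying that txn. A
> simplified model of the check behind "First digest mismatch on txn" (names
> are illustrative, not verbatim ZooKeeper code):
> {code:java}
> // Toy model: a digest mismatch at some zxid means this follower's in-memory
> // tree has diverged from the state the leader had at that zxid.
> public class DigestCheck {
>     static boolean compareDigest(long zxid, long expectedDigest, long actualTreeDigest) {
>         if (expectedDigest != actualTreeDigest) {
>             System.err.printf("First digest mismatch on txn zxid=%d, expected digest is %d, actual digest is %d%n",
>                     zxid, expectedDigest, actualTreeDigest);
>             return false;
>         }
>         return true;
>     }
> }
> {code}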
> And afterwards (in the actual application), ZooKeeper on node X seems to have
> a different view of the cluster state and doesn't get synchronized, at least
> not for a few hours.
> This leads, for example, to phantom reads of znodes that were deleted a long
> time ago.
> (The resulting behaviour looks a bit similar to what is described in
> ZOOKEEPER-3911.)
> This does not happen with ZooKeeper 3.6.2! (At least I can't reproduce it
> with that version.)