[ https://issues.apache.org/jira/browse/ZOOKEEPER-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781196#comment-13781196 ]
yuxin.yan commented on ZOOKEEPER-1768:
--------------------------------------

Firstly, thanks for your attention. Maybe I haven't explained the problem clearly. The problem is similar to ZOOKEEPER-1115. I copy the log below:

2013-09-27 15:18:43,172 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot /data/zookeeper/version-2/snapshot.200a74d3b
2013-09-27 15:19:10,358 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] - New election. My id = 4, proposed zxid=0x200a74d3b
2013-09-27 15:19:10,359 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x200a74d3b (n.zxid), 0x3 (n.round), LOOKING (n.state), 4 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,359 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), LEADING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,359 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,359 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 5 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,360 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@738] - FOLLOWING
2013-09-27 15:19:10,360 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /data/zookeeper/version-2 snapdir /data/zookeeper/version-2
2013-09-27 15:19:10,360 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 27191
2013-09-27 15:19:10,363 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@322] - Getting a diff from the leader 0x200a74e6d
2013-09-27 15:19:10,364 - WARN [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@373] - Got zxid 0x200a74d3c expected 0x1
2013-09-27 15:19:10,376 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] - Snapshotting: 0x200a74e6d to /data/zookeeper/version-2/snapshot.200a74e6d
2013-09-27 15:19:13,935 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x200a74d48 (n.zxid), 0x3 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEPoch), FOLLOWING (my state)
2013-09-27 15:19:35,856 - WARN [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
    at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:366)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2013-09-27 15:19:35,856 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2013-09-27 15:19:35,857 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] - Shutting down
2013-09-27 15:19:35,857 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - shutting down
2013-09-27 15:19:35,857 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
2013-09-27 15:19:35,861 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot /data/zookeeper/version-2/snapshot.200a74e6d
2013-09-27 15:19:42,387 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.65.18:35227
2013-09-27 15:19:42,457 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-09-27 15:19:42,457 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.65.18:35227 (no session established for client)
2013-09-27 15:19:55,880 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.65.16:39338
2013-09-27 15:19:55,881 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-09-27 15:19:55,881 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.65.16:39338 (no session established for client)
2013-09-27 15:19:57,588 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:53888
2013-09-27 15:19:57,589 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:662)
2013-09-27 15:19:57,589 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:53888 (no session established for client)
2013-09-27 15:20:02,939 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] - New election. My id = 4, proposed zxid=0x200a74e6d
2013-09-27 15:20:02,939 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x200a74e6d (n.zxid), 0x3 (n.round), LOOKING (n.state), 4 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), LEADING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 5 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@738] - FOLLOWING
2013-09-27 15:20:02,941 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /data/zookeeper/version-2 snapdir /data/zookeeper/version-2
2013-09-27 15:20:02,941 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 27083
2013-09-27 15:20:02,944 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@322] - Getting a diff from the leader 0x200a74f72
2013-09-27 15:20:02,944 - WARN [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@373] - Got zxid 0x200a74e6e expected 0x1
2013-09-27 15:20:02,956 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] - Snapshotting: 0x200a74f72 to /data/zookeeper/version-2/snapshot.200a74f72
2013-09-27 15:20:03,366 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.65.16:39411
2013-09-27 15:20:03,367 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:662)
2013-09-27 15:20:03,367 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.65.16:39411 (no session established for client)
2013-09-27 15:20:17,792 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x200a74e7c (n.zxid), 0x3 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEPoch), FOLLOWING (my state)
2013-09-27 15:20:28,825 - WARN [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
    at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:366)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2013-09-27 15:20:28,826 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2013-09-27 15:20:28,826 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] - Shutting down
2013-09-27 15:20:28,826 - INFO [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - shutting down
2013-09-27 15:2

This means the node repeatedly elects the leader and syncs from it, writing a new snapshot on every cycle, so the number of snapshots on the node keeps growing until the disk is full. Could you give me any help?

Thanks!
yyx,

> Cluster fails election loop until the device is full
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-1768
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1768
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.4.5
>            Reporter: yuxin.yan
>         Attachments: zk_debug.log.2013-09-25.log, zoo.cfg
>
>
> Hi,
> I have a five-node cluster running version 3.4.5, and I found that one node is
> offline.
> First I restarted the node, but I get "Error contacting service. It is
> probably not running." I also found that the node keeps electing the leader and
> keeps syncing the snapshot logs, and the device fills up every ten minutes.
> Could someone help me? I will put the log and zoo.cfg in the attachment.
> Thanks all.
> yyx,

--
This message was sent by Atlassian JIRA
(v6.1#6144)