[ https://issues.apache.org/jira/browse/ZOOKEEPER-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781196#comment-13781196 ]

yuxin.yan commented on ZOOKEEPER-1768:
--------------------------------------

First, thanks for your attention. Maybe I didn't explain the problem clearly.
The problem is similar to ZOOKEEPER-1115. I have copied the log below:
2013-09-27 15:18:43,172 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot /data/zookeeper/version-2/snapshot.200a74d3b
2013-09-27 15:19:10,358 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] - New election. My id =  4, proposed zxid=0x200a74d3b
2013-09-27 15:19:10,359 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x200a74d3b (n.zxid), 0x3 (n.round), LOOKING (n.state), 4 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,359 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), LEADING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,359 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,359 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 5 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,360 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@738] - FOLLOWING
2013-09-27 15:19:10,360 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /data/zookeeper/version-2 snapdir /data/zookeeper/version-2
2013-09-27 15:19:10,360 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 27191
2013-09-27 15:19:10,363 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@322] - Getting a diff from the leader 0x200a74e6d
2013-09-27 15:19:10,364 - WARN  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@373] - Got zxid 0x200a74d3c expected 0x1
2013-09-27 15:19:10,376 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] - Snapshotting: 0x200a74e6d to /data/zookeeper/version-2/snapshot.200a74e6d
2013-09-27 15:19:13,935 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x200a74d48 (n.zxid), 0x3 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEPoch), FOLLOWING (my state)
2013-09-27 15:19:35,856 - WARN  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
        at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:366)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2013-09-27 15:19:35,856 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2013-09-27 15:19:35,857 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] - Shutting down
2013-09-27 15:19:35,857 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - shutting down
2013-09-27 15:19:35,857 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
2013-09-27 15:19:35,861 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot /data/zookeeper/version-2/snapshot.200a74e6d
2013-09-27 15:19:42,387 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.65.18:35227
2013-09-27 15:19:42,457 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-09-27 15:19:42,457 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.65.18:35227 (no session established for client)
2013-09-27 15:19:55,880 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.65.16:39338
2013-09-27 15:19:55,881 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-09-27 15:19:55,881 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.65.16:39338 (no session established for client)
2013-09-27 15:19:57,588 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:53888
2013-09-27 15:19:57,589 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:662)
2013-09-27 15:19:57,589 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:53888 (no session established for client)
2013-09-27 15:20:02,939 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] - New election. My id =  4, proposed zxid=0x200a74e6d
2013-09-27 15:20:02,939 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x200a74e6d (n.zxid), 0x3 (n.round), LOOKING (n.state), 4 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), LEADING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 5 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@738] - FOLLOWING
2013-09-27 15:20:02,941 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /data/zookeeper/version-2 snapdir /data/zookeeper/version-2
2013-09-27 15:20:02,941 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 27083
2013-09-27 15:20:02,944 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@322] - Getting a diff from the leader 0x200a74f72
2013-09-27 15:20:02,944 - WARN  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@373] - Got zxid 0x200a74e6e expected 0x1
2013-09-27 15:20:02,956 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] - Snapshotting: 0x200a74f72 to /data/zookeeper/version-2/snapshot.200a74f72
2013-09-27 15:20:03,366 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.65.16:39411
2013-09-27 15:20:03,367 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:662)
2013-09-27 15:20:03,367 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.65.16:39411 (no session established for client)
2013-09-27 15:20:17,792 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x200a74e7c (n.zxid), 0x3 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEPoch), FOLLOWING (my state)
2013-09-27 15:20:28,825 - WARN  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
        at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:366)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2013-09-27 15:20:28,826 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2013-09-27 15:20:28,826 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] - Shutting down
2013-09-27 15:20:28,826 - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - shutting down
2013-09-27 15:2
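
When reading the zxids in this log, it may help to split each one into its two halves: a ZooKeeper zxid carries the leader epoch in its high 32 bits and a per-epoch transaction counter in its low 32 bits. The small Java sketch below decodes a few zxids copied from the log; the class name ZxidDecode is hypothetical and nothing here calls ZooKeeper itself. Decoded this way, node 4's zxids belong to epoch 2, while the zxid advertised in the notifications from servers 2, 3 and 5 (0x10015588a) belongs to epoch 1.

```java
// Hypothetical helper (not part of ZooKeeper): splits zxids into (epoch, counter).
// Assumes the standard zxid layout: epoch in the high 32 bits,
// per-epoch transaction counter in the low 32 bits.
public class ZxidDecode {
    // High 32 bits: the epoch of the leader that issued this zxid.
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits: the transaction counter within that epoch.
    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long[] zxids = {
            0x200a74d3bL, // node 4's proposed zxid (FastLeaderElection@740)
            0x10015588aL, // n.zxid in notifications from servers 2, 3 and 5
            0x200a74d3cL, // zxid node 4 got during sync (Learner@373 warning)
            0x200a74e6dL  // zxid of the diff from the leader (Learner@322)
        };
        for (long z : zxids) {
            System.out.printf("0x%x -> epoch %d, counter 0x%x%n", z, epoch(z), counter(z));
        }
    }
}
```

Compile and run it with javac ZxidDecode.java followed by java ZxidDecode; further zxids from the log can be added to the array.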

The log shows that the node keeps re-running leader election and re-syncing
from the leader, and each sync writes another snapshot (snapshot.200a74e6d,
then snapshot.200a74f72). So the number of snapshots on the node keeps growing,
and the disk will eventually fill up. Could you give me any help? Thanks!
yyx,
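
Since every sync cycle in the log ends with a new snapshot file, one way to confirm that the disk growth comes from this loop is to list the snapshot files in the data directory, ordered by the zxid in their names. The Java sketch below assumes snapshots are named snapshot.<hex zxid>, as in the log, and uses the datadir from the log (/data/zookeeper/version-2) as a default path; the class name SnapshotCensus is hypothetical, and the program only reads file names and sizes.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical diagnostic: lists ZooKeeper snapshot files ordered by zxid.
// Assumes snapshot files are named "snapshot.<hex zxid>".
public class SnapshotCensus {
    // Returns the zxid encoded in "snapshot.<hex>", or -1 if the name does not match.
    static long zxidOf(String fileName) {
        String prefix = "snapshot.";
        if (!fileName.startsWith(prefix)) return -1;
        try {
            long z = Long.parseLong(fileName.substring(prefix.length()), 16);
            return z >= 0 ? z : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        // Default directory taken from the "datadir" value in the log above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/data/zookeeper/version-2");
        List<Path> snaps = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "snapshot.*")) {
            for (Path p : ds) {
                if (zxidOf(p.getFileName().toString()) >= 0) snaps.add(p);
            }
        }
        // Oldest snapshot (lowest zxid) first.
        snaps.sort(Comparator.comparingLong(p -> zxidOf(p.getFileName().toString())));
        long total = 0;
        for (Path p : snaps) {
            long size = Files.size(p);
            total += size;
            System.out.printf("%s  %d bytes%n", p.getFileName(), size);
        }
        System.out.printf("%d snapshots, %d bytes total%n", snaps.size(), total);
    }
}
```

Running it a few minutes apart (for example java SnapshotCensus /data/zookeeper/version-2) should show whether the snapshot count rises each time the node goes from LOOKING to FOLLOWING and back.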

> Cluster fails election loop until the device is full
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-1768
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1768
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.4.5
>            Reporter: yuxin.yan
>         Attachments: zk_debug.log.2013-09-25.log, zoo.cfg
>
>
> Hi,
> I have a five-node cluster running version 3.4.5, and I found that one node
> is offline.
> First I restarted the node, but then I got "Error contacting service. It is
> probably not running." I found that the node keeps running leader election
> and keeps syncing snapshots from the leader, and the device fills up every
> ten minutes.
> Could someone help me? I have attached the log and zoo.cfg.
> Thanks all.
> yyx,



--
This message was sent by Atlassian JIRA
(v6.1#6144)
