[ https://issues.apache.org/jira/browse/ZOOKEEPER-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17945195#comment-17945195 ]

Hany commented on ZOOKEEPER-2106:
---------------------------------

Hi there, has this issue been resolved? We have run into it too.

> Error when reading from leader causes JVM to hang
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-2106
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2106
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.5
>            Reporter: Robert Joseph Evans
>            Priority: Critical
>
> I tried looking through existing JIRAs for something like this, but the
> closest I came was ZOOKEEPER-2104. It looks very similar, but I don't know
> if it really is the same thing. Essentially, we had a 5-node ensemble for a
> large Storm cluster. Several of the nodes hit an error like the following
> at the same time:
> {code}
> WARN  [RecvWorker:2:QuorumCnxManager$RecvWorker@762] - Connection broken for id 2, my id = 4, error =
> java.io.EOFException
>       at java.io.DataInputStream.readInt(DataInputStream.java:392)
>       at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
> WARN  [RecvWorker:2:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
> WARN  [SendWorker:2:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
> java.lang.InterruptedException
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
>       at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
>       at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
>       at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
>       at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
> WARN  [SendWorker:2:QuorumCnxManager$SendWorker@688] - Send worker leaving thread
> WARN  [QuorumPeer[myid=4]/0.0.0.0:50512:Follower@89] - Exception when following the leader
> java.net.SocketException: Connection reset
>       at java.net.SocketInputStream.read(SocketInputStream.java:189)
>       at java.net.SocketInputStream.read(SocketInputStream.java:121)
>       at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>       at java.io.DataInputStream.readInt(DataInputStream.java:387)
>       at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>       at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>       at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>       at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
>       at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> INFO  [QuorumPeer[myid=4]/0.0.0.0:50512:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
>       at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
> {code}
> After that, all of the client connections are shut down:
> {code}
> INFO  [QuorumPeer[myid=4]/0.0.0.0:50512:NIOServerCnxn@1001] - Closed socket connection for client ...
> {code}
> but the JVM itself never shuts down:
> {code}
> INFO  [QuorumPeer[myid=4]/0.0.0.0:50512:FollowerZooKeeperServer@139] - Shutting down
> INFO  [QuorumPeer[myid=4]/0.0.0.0:50512:ZooKeeperServer@419] - shutting down
> INFO  [QuorumPeer[myid=4]/0.0.0.0:50512:FollowerRequestProcessor@105] - Shutting down
> INFO  [QuorumPeer[myid=4]/0.0.0.0:50512:CommitProcessor@181] - Shutting down
> INFO  [FollowerRequestProcessor:4:FollowerRequestProcessor@95] - FollowerRequestProcessor exited loop!
> INFO  [QuorumPeer[myid=4]/0.0.0.0:50512:FinalRequestProcessor@415] - shutdown of request processor complete
> INFO  [CommitProcessor:4:CommitProcessor@150] - CommitProcessor exited loop!
> WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:50512:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:50512:NIOServerCnxn@1001] - Closed socket connection for client /... (no session established for client)
> INFO  [QuorumPeer[myid=4]/0.0.0.0:50512:SyncRequestProcessor@175] - Shutting down
> INFO  [SyncThread:4:SyncRequestProcessor@155] - SyncRequestProcessor exited!
> INFO  [QuorumPeer[myid=4]/0.0.0.0:50512:QuorumPeer@670] - LOOKING
> {code}
> After that, every new connection to that node is accepted and then immediately
> closed with "ZooKeeperServer not running". The node stays in this state
> indefinitely, until the process is manually restarted; after a restart it
> recovers.
> We have seen this happen on multiple servers at the same time, which left the
> entire ensemble unusable.
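Until the root cause is fixed, one way to catch a node in this state is to probe it from outside. Below is a minimal sketch, not part of ZooKeeper, that sends the "srvr" four-letter word to the client port and flags the node as stuck only if it keeps reporting that it is not serving across a whole window of probes, so a normal, short leader election is not mistaken for the hang. The class and method names are hypothetical, and the exact "srvr" reply text (the "not currently serving requests" message and the "Mode:" line) is assumed from 3.4.x behavior and should be checked against the version in use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical watchdog helper; not part of ZooKeeper itself. */
public class ZkServingProbe {

    // Assumed reply to "srvr" when no ZooKeeperServer is attached (3.4.x).
    static final String NOT_SERVING =
            "This ZooKeeper instance is not currently serving requests";

    /** Sends a four-letter word to the client port and returns the raw reply. */
    static String fourLetterWord(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            sock.setSoTimeout(timeoutMs);
            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            sock.shutdownOutput(); // signal that the command is complete
            InputStream in = sock.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) { // server closes after replying
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** True if the reply reports a quorum role rather than "not serving". */
    static boolean isServing(String reply) {
        if (reply == null || reply.contains(NOT_SERVING)) {
            return false;
        }
        for (String line : reply.split("\\r?\\n")) {
            if (line.startsWith("Mode: ")) { // e.g. "Mode: follower"
                return true;
            }
        }
        return false;
    }

    /** Stuck only if every probe in the window fails; one success clears it. */
    static boolean looksStuck(String host, int port, int probes, long intervalMs)
            throws InterruptedException {
        for (int i = 0; i < probes; i++) {
            try {
                if (isServing(fourLetterWord(host, port, "srvr", 5000))) {
                    return false;
                }
            } catch (IOException e) {
                // An unreachable port counts as "not serving" for this probe.
            }
            if (i + 1 < probes) {
                Thread.sleep(intervalMs);
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        // 10 probes, 30 s apart: about 4.5 minutes of continuous "not serving".
        boolean stuck = looksStuck(host, port, 10, 30_000L);
        System.out.println(stuck ? "STUCK: consider restarting this node" : "serving");
        System.exit(stuck ? 1 : 0);
    }
}
```

This could be run from cron or a monitoring agent with the host and client port (50512 in the report above) as arguments; exit status 1 means the node never reported a "Mode:" line during the window. For context on the ensemble-wide outage: a 5-node ensemble needs 3 servers for a majority, so if three or more nodes get stuck at once, the whole ensemble stops serving.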



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
