[ https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739750#comment-16739750 ]
Brian Nixon commented on ZOOKEEPER-3240: ---------------------------------------- This may be the same issue as detected in ZOOKEEPER-2669. The two share certain similarities but I haven't looked into it. > Close socket on Learner shutdown to avoid dangling socket > --------------------------------------------------------- > > Key: ZOOKEEPER-3240 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240 > Project: ZooKeeper > Issue Type: Improvement > Components: server > Affects Versions: 3.6.0 > Reporter: Brian Nixon > Priority: Minor > > There was a Learner that had two connections to the Leader after that Learner > hit an unexpected exception during flush txn to disk, which will shutdown > previous follower instance and restart a new one. > > {quote}2018-10-26 02:31:35,568 ERROR > [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from > thread : SyncThread:3 > java.io.IOException: Input/output error > at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at > java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72) > at > java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172) > 2018-10-26 02:31:35,568 INFO [SyncThread:3:ZooKeeperServerListenerImpl@42] - > Thread SyncThread:3 exits, error code 1 > 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - > SyncRequestProcessor exited!{quote} > > It is supposed to close the previous socket, but it doesn't seem to be done > anywhere in the code. This leaves the socket open with no one reading from > it, and caused the queue full and blocked on sender. > > Since the LearnerHandler didn't shutdown gracefully, the learner queue size > keeps growing, the JVM heap size on leader keeps growing and added pressure > to the GC, and cause high GC time and latency in the quorum. > > The simple fix is to gracefully shutdown the socket. -- This message was sent by Atlassian JIRA (v7.6.3#76005)