[ https://issues.apache.org/jira/browse/ZOOKEEPER-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108046#comment-17108046 ]
benwang li edited comment on ZOOKEEPER-3829 at 5/15/20, 8:04 AM:
-----------------------------------------------------------------

We start `CommitProcessor` [here|https://github.com/apache/zookeeper/blob/e87bad6774e7269ef21a156aff9dad089ef54794/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/CommitProcessor.java#L455]. We shut down `CommitProcessor` [here|https://github.com/apache/zookeeper/blob/e87bad6774e7269ef21a156aff9dad089ef54794/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/CommitProcessor.java#L637]. But when the `start` method is called again, the `workerPool` does not work anymore. I have attached the node D log as `d.log`, where we can see this happen.

{code:java}
308 2020-05-14 18:04:12,022 [myid:4] - INFO [FollowerRequestProcessor:4:FollowerRequestProcessor@110] - FollowerRequestProcessor exited loop!
309 2020-05-14 18:04:12,022 [myid:4] - INFO [CommitProcessor:4:CommitProcessor@195] - CommitProcessor exited loop!
310 2020-05-14 18:04:12,023 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2183)(secure=disabled):FinalRequestProcessor@514] - shutdown of request processor complete
311 2020-05-14 18:04:12,024 [myid:4] - DEBUG [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2183)(secure=disabled):FileTxnLog$FileTxnIterator@655] - Created new input stream /data1/zookeeper/logs/version-2/log.2a0000000b
312 2020-05-14 18:04:12,024 [myid:4] - DEBUG [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2183)(secure=disabled):FileTxnLog$FileTxnIterator@658] - Created new input archive /data1/zookeeper/logs/version-2/log.2a0000000b
313 2020-05-14 18:04:12,024 [myid:4] - DEBUG [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2183)(secure=disabled):FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /data1/zookeeper/logs/version-2/log.2a0000000b
314 --
315 2020-05-14 18:04:29,000 [myid:4] - DEBUG [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2183)(secure=disabled):SessionTrackerImpl@274] - Adding session
0x3082f5048fc0000
316 2020-05-14 18:04:29,000 [myid:4] - DEBUG [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2183)(secure=disabled):SessionTrackerImpl@274] - Adding session 0x40a33f8f3f40002
317 2020-05-14 18:04:29,000 [myid:4] - DEBUG [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2183)(secure=disabled):SessionTrackerImpl@274] - Adding session 0x40a33f8f3f40000
318 2020-05-14 18:04:29,000 [myid:4] - DEBUG [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2183)(secure=disabled):SessionTrackerImpl@274] - Adding session 0x40a33f8f3f40001
319 2020-05-14 18:04:29,000 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2183)(secure=disabled):CommitProcessor@256] - Configuring CommitProcessor with 24 worker threads.
320 2020-05-14 18:04:29,002 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2183)(secure=disabled):ContainerManager@64] - Using checkIntervalMs=60000 maxPerMinute=10000
321 2020-05-14 18:04:29,003 [myid:4] - DEBUG [LearnerHandler-/146.196.79.232:38708:LearnerHandler@534] - Sending UPTODATE message to 3
{code}

> Zookeeper refuses request after node expansion
> ----------------------------------------------
>
>                 Key: ZOOKEEPER-3829
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3829
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.6
>            Reporter: benwang li
>            Priority: Major
>         Attachments: d.log
>
>
> It's easy to reproduce this bug.
> {code:java}
> Step 1. Deploy 3 nodes A, B, C with configuration A,B,C.
> Step 2. Deploy node `D` with configuration `A,B,C,D`. The cluster state is OK at this point.
> Step 3. Restart nodes A, B, C with configuration A,B,C,D. The leader will then be D and the cluster hangs: it still accepts the `mntr` command, but other commands such as `ls /` are blocked.
> Step 4. Restart node D. The cluster state is back to normal.
> {code}
>
> We have looked into the code of version 3.5.6, and we think the problem may be in `workerPool`.
> When `CommitProcessor` shuts down, it shuts down `workerPool` as well, but the `workerPool` object still exists. It will never work again, yet the cluster still thinks it's OK.
>
> I think the bug may still exist in the master branch.
> We have tested a fix on our machines that resets `workerPool` to null. If that's OK, please assign this issue to me, and then I'll create a PR.
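The pattern described in the issue (a pool that is shut down but kept, then reused on the next `start`) can be illustrated with a minimal sketch. This is not ZooKeeper code: `RestartablePoolSketch`, `Processor`, and the `resetOnShutdown` flag are hypothetical names, and a plain `java.util.concurrent.ExecutorService` stands in for ZooKeeper's own worker pool. With an `ExecutorService` the stale pool rejects work with an exception, whereas the report describes requests that block, so treat this only as an analogy for the proposed "reset `workerPool` to null" fix.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

final class RestartablePoolSketch {

    /** Hypothetical stand-in for a processor that owns a worker pool. */
    static final class Processor {
        private final boolean resetOnShutdown;
        private ExecutorService workerPool;

        Processor(boolean resetOnShutdown) {
            this.resetOnShutdown = resetOnShutdown;
        }

        void start() {
            // A pool is only created when none exists, so a pool that was
            // shut down but not cleared is silently reused.
            if (workerPool == null) {
                workerPool = Executors.newFixedThreadPool(2);
            }
        }

        /** Returns true if the pool accepted the work item. */
        boolean submit(Runnable work) {
            try {
                workerPool.execute(work);
                return true;
            } catch (RejectedExecutionException e) {
                return false; // the pool was already shut down
            }
        }

        void shutdown() {
            if (workerPool == null) {
                return;
            }
            workerPool.shutdown();
            try {
                workerPool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            if (resetOnShutdown) {
                // Assumed fix: drop the reference so start() builds a fresh pool.
                workerPool = null;
            }
        }
    }

    /** Runs start, shutdown, start again, and reports whether work is accepted. */
    static boolean acceptsWorkAfterRestart(boolean resetOnShutdown) {
        Processor p = new Processor(resetOnShutdown);
        p.start();
        p.submit(() -> { });
        p.shutdown();
        p.start(); // second start, as after the restart in the report
        boolean accepted = p.submit(() -> { });
        p.shutdown();
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + acceptsWorkAfterRestart(false));
        System.out.println("with reset:    " + acceptsWorkAfterRestart(true));
    }
}
```

In this sketch the second `start` only yields a usable pool when `shutdown` has cleared the reference. Whether the real fix belongs in `shutdown` or in `start` is a design decision for the actual patch.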