[ https://issues.apache.org/jira/browse/ZOOKEEPER-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110847#comment-17110847 ]
benwang li edited comment on ZOOKEEPER-3829 at 5/19/20, 4:49 AM:
-----------------------------------------------------------------

[~symat] Hi, I have pushed my code to my repo.

`git clone --depth=50 --branch=ZOOKEEPER-3829 g...@github.com:sundy-li/zookeeper.git`

With this fix, we can no longer reproduce this issue (although when we use `docker-compose down`, we still hit it). You can try it with:

{code:java}
docker-compose --project-name zookeeper --file 3_nodes_zk_mounted_data_folder.yml up -d
docker-compose --project-name zookeeper --file 4_nodes_zk_mounted_data_folder.yml create zoo4
docker-compose --project-name zookeeper --file 4_nodes_zk_mounted_data_folder.yml start zoo4
docker-compose --project-name zookeeper --file 3_nodes_zk_mounted_data_folder.yml stop zoo1
docker-compose --project-name zookeeper --file 3_nodes_zk_mounted_data_folder.yml stop zoo2
docker-compose --project-name zookeeper --file 3_nodes_zk_mounted_data_folder.yml stop zoo3
## If we use "docker-compose --project-name zookeeper --file 3_nodes_zk_mounted_data_folder.yml down" instead,
## we can reproduce this issue even with the fix, which is weird.
docker-compose --project-name zookeeper --file 4_nodes_zk_mounted_data_folder.yml up -d
{code}

Yet I believe the `localSessionsEnabled` configuration matters; maybe there is more than one bug causing this issue.

was (Author: sundyli):
[~symat] Hi, I have pushed my code to my repo.

`git clone --depth=50 --branch=ZOOKEEPER-3829 g...@github.com:sundy-li/zookeeper.git`

With this fix, we can no longer reproduce this issue. I still think it is a problem of reusing the `workerPool`; you can try that.

> Zookeeper refuses requests after node expansion
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-3829
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3829
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.6
>            Reporter: benwang li
>            Priority: Major
>         Attachments: d.log, screenshot-1.png
>
> It's easy to reproduce this bug.
> {code:java}
> // code placeholder
>
> Step 1. Deploy 3 nodes A, B, C with configuration A,B,C.
> Step 2. Deploy node D with configuration A,B,C,D; the cluster state is ok now.
> Step 3. Restart nodes A, B, C with configuration A,B,C,D. The leader will then be D and the cluster hangs: it still accepts the `mntr` command, but other commands such as `ls /` are blocked.
> Step 4. Restart node D; the cluster state is back to normal now.
> {code}
>
> We have looked into the code of the 3.5.6 version, and we found it may be an issue with the `workerPool`.
> The `CommitProcessor` shutdown also shuts down the `workerPool`, but the `workerPool` reference still exists. It will never work anymore, yet the cluster still thinks it's ok.
>
> I think the bug may still exist in the master branch.
> We have tested it on our machines by resetting the `workerPool` to null. If that's ok, please assign this issue to me, and then I'll create a PR.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
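The suspected `workerPool` problem described above can be sketched with a hypothetical minimal model (these are not the actual ZooKeeper classes; `PoolOwner` here only stands in for `CommitProcessor` holding a shared executor): shutting down the pool while keeping the stale reference means a restarted processor submits work to an executor that will never run it, whereas dropping the reference on shutdown forces the next start to create a fresh pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the reported bug and the proposed fix.
// PoolOwner stands in for a processor that holds a shared worker pool.
class PoolOwner {
    private ExecutorService workerPool; // stands in for ZooKeeper's worker pool

    public void start() {
        // Because shutdown() nulls the field, a restart creates a fresh pool.
        if (workerPool == null) {
            workerPool = Executors.newFixedThreadPool(2);
        }
    }

    public boolean submit(Runnable task) {
        if (workerPool == null || workerPool.isShutdown()) {
            return false; // a dead pool would silently reject all work
        }
        workerPool.execute(task);
        return true;
    }

    public void shutdown() {
        if (workerPool != null) {
            workerPool.shutdown();
            workerPool = null; // the proposed fix: drop the stale reference
        }
    }
}
```

Without the `workerPool = null` line, a shutdown/start cycle would leave `workerPool` pointing at a terminated executor, so `submit` would keep returning false even though the owner believes it has restarted, which matches the "cluster still thinks it's ok" symptom.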