[ https://issues.apache.org/jira/browse/ZOOKEEPER-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110857#comment-17110857 ]
Keli Wang edited comment on ZOOKEEPER-3829 at 5/19/20, 5:13 AM:
----------------------------------------------------------------

[~sundyli] I tried your reproduction steps with your fixed branch, but zkCli.sh still got stuck in the CONNECTING state.

{code}
zookeeper-docker-test on master [!] took 3m55s
❯ docker-compose --project-name zookeeper --file 4_nodes_zk_mounted_data_folder.yml down
Stopping zookeeper_zoo2_1 ... done
Stopping zookeeper_zoo3_1 ... done
Stopping zookeeper_zoo1_1 ... done
Stopping zookeeper_zoo4_1 ... done
Removing zookeeper_zoo2_1 ... done
Removing zookeeper_zoo3_1 ... done
Removing zookeeper_zoo1_1 ... done
Removing zookeeper_zoo4_1 ... done
Removing network zookeeper_default

zookeeper-docker-test on master [!] took 11s
❯ rm -rf data

zookeeper-docker-test on master [!]
❯ exa
3_nodes_2_networks_zk.yml 3_nodes_zk.yml 3_nodes_zk_jdk_12.yml 3_nodes_zk_no_wildcard_addr.yml 4_nodes_2_networks_zk.yml conf logs start_zookeeper.sh
3_nodes_digest_quorum_auth.yml 3_nodes_zk_dynamic_config.yml 3_nodes_zk_mounted_data_folder.yml 3_nodes_zk_server_hostname_changed.yml 4_nodes_zk_mounted_data_folder.yml LICENSE.txt README.md

zookeeper-docker-test on master [!]
❯ docker-compose --project-name zookeeper --file 3_nodes_zk_mounted_data_folder.yml up -d
Creating network "zookeeper_default" with the default driver
Creating zookeeper_zoo1_1 ... done
Creating zookeeper_zoo2_1 ... done
Creating zookeeper_zoo3_1 ... done

zookeeper-docker-test on master [!]
❯ docker-compose --project-name zookeeper --file 4_nodes_zk_mounted_data_folder.yml create zoo4
WARNING: The create command is deprecated. Use the up command with the --no-start flag instead.
Creating zookeeper_zoo4_1 ... done

zookeeper-docker-test on master [!]
❯ docker-compose --project-name zookeeper --file 4_nodes_zk_mounted_data_folder.yml start zoo4
Starting zoo4 ... done

zookeeper-docker-test on master [!]
❯ docker-compose --project-name zookeeper --file 3_nodes_zk_mounted_data_folder.yml stop zoo1
docker-compose --project-name zookeeper --file 3_nodes_zk_mounted_data_folder.yml stop zoo2
docker-compose --project-name zookeeper --file 3_nodes_zk_mounted_data_folder.yml stop zoo3
Stopping zookeeper_zoo1_1 ... done
Stopping zookeeper_zoo2_1 ... done
Stopping zookeeper_zoo3_1 ... done

zookeeper-docker-test on master [!] took 32s
❯ docker-compose --project-name zookeeper --file 4_nodes_zk_mounted_data_folder.yml up -d
Recreating zookeeper_zoo2_1 ...
Recreating zookeeper_zoo2_1 ... done
Recreating zookeeper_zoo3_1 ... done
Recreating zookeeper_zoo1_1 ... done

zookeeper-docker-test on master [!]
❯ docker exec -it zookeeper_zoo4_1 /bin/bash /zookeeper/bin/zkCli.sh
Connecting to localhost:2181
2020-05-19 05:10:25,321 [myid:] - INFO [main:Environment@98] - Client environment:zookeeper.version=3.7.0-SNAPSHOT-f87c14dd8a984b5850b9afc9a5c9358f5420877e, built on 2020-05-19 05:03 UTC
2020-05-19 05:10:25,326 [myid:] - INFO [main:Environment@98] - Client environment:host.name=zoo4
2020-05-19 05:10:25,326 [myid:] - INFO [main:Environment@98] - Client environment:java.version=1.8.0_222
2020-05-19 05:10:25,333 [myid:] - INFO [main:Environment@98] - Client environment:java.vendor=Oracle Corporation
2020-05-19 05:10:25,333 [myid:] - INFO [main:Environment@98] - Client environment:java.home=/usr/local/openjdk-8
2020-05-19 05:10:25,333 [myid:] - INFO [main:Environment@98] - Client environment:java.class.path=/zookeeper/bin/../zookeeper-server/target/classes:/zookeeper/bin/../build/classes:/zookeeper/bin/../zookeeper-server/target/lib/*.jar:/zookeeper/bin/../build/lib/*.jar:/zookeeper/bin/../lib/zookeeper-prometheus-metrics-3.7.0-SNAPSHOT.jar:/zookeeper/bin/../lib/zookeeper-jute-3.7.0-SNAPSHOT.jar:/zookeeper/bin/../lib/zookeeper-3.7.0-SNAPSHOT.jar:/zookeeper/bin/../lib/snappy-java-1.1.7.jar:/zookeeper/bin/../lib/slf4j-log4j12-1.7.25.jar:/zookeeper/bin/../lib/slf4j-api-1.7.25.jar:/zookeeper/bin/../lib/simpleclient_servlet-0.6.0.jar:/zookeeper/bin/../lib/simpleclient_hotspot-0.6.0.jar:/zookeeper/bin/../lib/simpleclient_common-0.6.0.jar:/zookeeper/bin/../lib/simpleclient-0.6.0.jar:/zookeeper/bin/../lib/netty-transport-native-unix-common-4.1.48.Final.jar:/zookeeper/bin/../lib/netty-transport-native-epoll-4.1.48.Final.jar:/zookeeper/bin/../lib/netty-transport-4.1.48.Final.jar:/zookeeper/bin/../lib/netty-resolver-4.1.48.Final.jar:/zookeeper/bin/../lib/netty-handler-4.1.48.Final.jar:/zookeeper/bin/../lib/netty-common-4.1.48.Final.jar:/zookeeper/bin/../lib/netty-codec-4.1.48.Final.jar:/zookeeper/bin/../lib/netty-buffer-4.1.48.Final.jar:/zookeeper/bin/../lib/metrics-core-3.2.5.jar:/zookeeper/bin/../lib/log4j-1.2.17.jar:/zookeeper/bin/../lib/json-simple-1.1.1.jar:/zookeeper/bin/../lib/jline-2.14.6.jar:/zookeeper/bin/../lib/jetty-util-9.4.24.v20191120.jar:/zookeeper/bin/../lib/jetty-servlet-9.4.24.v20191120.jar:/zookeeper/bin/../lib/jetty-server-9.4.24.v20191120.jar:/zookeeper/bin/../lib/jetty-security-9.4.24.v20191120.jar:/zookeeper/bin/../lib/jetty-io-9.4.24.v20191120.jar:/zookeeper/bin/../lib/jetty-http-9.4.24.v20191120.jar:/zookeeper/bin/../lib/javax.servlet-api-3.1.0.jar:/zookeeper/bin/../lib/jackson-databind-2.10.3.jar:/zookeeper/bin/../lib/jackson-core-2.10.3.jar:/zookeeper/bin/../lib/jackson-annotations-2.10.3.jar:/zookeeper/bin/../lib/commons-lang-2.6.jar:/zookeeper/bin/../lib/commons-cli-1.2.jar:/zookeeper/bin/../lib/audience-annotations-0.5.0.jar:/zookeeper/bin/../zookeeper-*.jar:/zookeeper/bin/../zookeeper-server/src/main/resources/lib/*.jar:/zookeeper/bin/../conf:
2020-05-19 05:10:25,334 [myid:] - INFO [main:Environment@98] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2020-05-19 05:10:25,334 [myid:] - INFO [main:Environment@98] - Client environment:java.io.tmpdir=/tmp
2020-05-19 05:10:25,334 [myid:] - INFO [main:Environment@98] - Client environment:java.compiler=<NA>
2020-05-19 05:10:25,334 [myid:] - INFO [main:Environment@98] - Client environment:os.name=Linux
2020-05-19 05:10:25,335 [myid:] - INFO [main:Environment@98] - Client environment:os.arch=amd64
2020-05-19 05:10:25,335 [myid:] - INFO [main:Environment@98] - Client environment:os.version=4.19.76-linuxkit
2020-05-19 05:10:25,335 [myid:] - INFO [main:Environment@98] - Client environment:user.name=root
2020-05-19 05:10:25,335 [myid:] - INFO [main:Environment@98] - Client environment:user.home=/root
2020-05-19 05:10:25,336 [myid:] - INFO [main:Environment@98] - Client environment:user.dir=/
2020-05-19 05:10:25,336 [myid:] - INFO [main:Environment@98] - Client environment:os.memory.free=23MB
2020-05-19 05:10:25,338 [myid:] - INFO [main:Environment@98] - Client environment:os.memory.max=228MB
2020-05-19 05:10:25,338 [myid:] - INFO [main:Environment@98] - Client environment:os.memory.total=31MB
2020-05-19 05:10:25,347 [myid:] - INFO [main:ZooKeeper@633] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@65e579dc
2020-05-19 05:10:25,351 [myid:] - INFO [main:X509Util@77] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2020-05-19 05:10:25,362 [myid:] - INFO [main:ClientCnxnSocket@239] - jute.maxbuffer value is 1048575 Bytes
2020-05-19 05:10:25,374 [myid:] - INFO [main:ClientCnxn@1714] - zookeeper.request.timeout value is 0.
feature enabled=false
Welcome to ZooKeeper!
2020-05-19 05:10:25,394 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1159] - Opening socket connection to server localhost/127.0.0.1:2181.
2020-05-19 05:10:25,395 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1161] - SASL config status: Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2020-05-19 05:10:25,407 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@993] - Socket connection established, initiating session, client: /127.0.0.1:40788, server: localhost/127.0.0.1:2181
[zk: localhost:2181(CONNECTING) 0] 2020-05-19 05:10:49,712 [myid:] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 0
{code}

was (Author: keliwang):
[~sundyli] I tried your reproduction steps with your fixed branch, but zkCli.sh still got stuck in the CONNECTING state.

> Zookeeper refuses request after node expansion
> ----------------------------------------------
>
>                 Key: ZOOKEEPER-3829
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3829
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.6
>            Reporter: benwang li
>            Priority: Major
>         Attachments: d.log, screenshot-1.png
>
>
> It's easy to reproduce this bug.
> {code:java}
> // code placeholder
>
> Step 1. Deploy 3 nodes A, B, C with configuration A,B,C.
> Step 2. Deploy node `D` with configuration `A,B,C,D`; the cluster state is OK now.
> Step 3. Restart nodes A, B, C with configuration A,B,C,D. The leader will then be D and the cluster hangs: it still accepts the `mntr` command, but other commands such as `ls /` are blocked.
> Step 4. Restart node D; the cluster state is back to normal.
>
> {code}
>
> We have looked into the code of version 3.5.6 and found that the issue may lie in `workerPool`.
> When `CommitProcessor` shuts down, it also shuts down `workerPool`, but the `workerPool` object still exists. It will never work again, yet the cluster still thinks it is OK.
>
> I think the bug may still exist in the master branch.
> We have tested this on our machines by resetting `workerPool` to null. If that is OK, please assign this issue to me, and I'll create a PR.
>
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
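
The `workerPool` hypothesis in the quoted description can be illustrated with a minimal Java sketch. It is not ZooKeeper code: `Processor`, `resetOnShutdown`, `trySubmit` and `acceptsWorkAfterRestart` are hypothetical names, and a plain `ExecutorService` stands in for the real pool, so the failure appears here as a rejected task rather than a blocked request. The assumption being illustrated is that the pool is only created when the field is null, so a pool that was shut down but not reset is reused after a restart.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class WorkerPoolSketch {

    /** Hypothetical stand-in for a request processor that owns a worker pool. */
    static class Processor {
        private final boolean resetOnShutdown;
        private ExecutorService workerPool;

        Processor(boolean resetOnShutdown) {
            this.resetOnShutdown = resetOnShutdown;
        }

        void start() {
            // Assumption: the pool is only created when the field is null.
            if (workerPool == null) {
                workerPool = Executors.newFixedThreadPool(2);
            }
        }

        void shutdown() {
            if (workerPool != null) {
                workerPool.shutdown();
                if (resetOnShutdown) {
                    // The workaround described in the issue: drop the stale
                    // reference so that the next start() builds a fresh pool.
                    workerPool = null;
                }
            }
        }

        /** Returns false when the pool refuses the task. */
        boolean trySubmit(Runnable task) {
            try {
                workerPool.submit(task);
                return true;
            } catch (RejectedExecutionException e) {
                return false;
            }
        }
    }

    /** Runs start, shutdown, start again, then tries to submit one task. */
    static boolean acceptsWorkAfterRestart(boolean resetOnShutdown) {
        Processor p = new Processor(resetOnShutdown);
        p.start();
        p.shutdown();
        p.start();
        boolean accepted = p.trySubmit(() -> { });
        p.shutdown();
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println("stale pool accepts work: " + acceptsWorkAfterRestart(false));
        System.out.println("reset pool accepts work: " + acceptsWorkAfterRestart(true));
    }
}
```

With the reference left in place the second `start()` is a no-op and the submitted task is refused; with the reference reset, a new pool is created and the task is accepted. Whether this matches the actual cause of the hang reported above would still need to be confirmed against the server logs.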