[ https://issues.apache.org/jira/browse/ZOOKEEPER-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108106#comment-17108106 ]
Mate Szalay-Beko commented on ZOOKEEPER-3829:
---------------------------------------------

I failed to reproduce your case. I created docker compose files ([https://github.com/symat/zookeeper-docker-test]) and, using 3.5.6, I executed these steps:
 * start A,B,C with config (A,B,C)
 * start D with config (A,B,C,D)
 * stop A
 * start A with config (A,B,C,D)
 * stop B
 * start B with config (A,B,C,D)
 * stop C
 * start C with config (A,B,C,D)

At the end, everything worked fine for me: the leader was D, all nodes were up and formed a quorum (A,B,C,D), and zkCli worked ({{ls /}}).

There must be some difference between your reproduction and mine. Can you please share your zoo.cfg? Mine looks like this:
{code:java}
dataDir=/data
dataLogDir=/datalog
tickTime=2000
initLimit=5
syncLimit=2
autopurge.snapRetainCount=3
autopurge.purgeInterval=0
maxClientCnxns=60
standaloneEnabled=true
admin.enableServer=true
localSessionsEnabled=true
localSessionsUpgradingEnabled=true
4lw.commands.whitelist=stat, ruok, conf, isro, wchc, wchp, srvr, mntr, cons
clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory

# server host/port config in my case (when I have 4 nodes)
server.1=zoo1:2888:3888;2181
server.2=zoo2:2888:3888;2181
server.3=zoo3:2888:3888;2181
server.4=zoo4:2888:3888;2181
{code}

I checked the log file you uploaded, but I don't really see why you think the problem is with the CommitProcessor. Maybe I'm missing something. Is this the full log file from your D node?

I also checked the code. I think the {{CommitProcessor}} class should never be reused after {{shutdown()}} is called. After a new leader election, a new {{LeaderZooKeeperServer}} / {{FollowerZooKeeperServer}} / {{ObserverZooKeeperServer}} object is created (depending on the role of the given server), with a fresh {{CommitProcessor}} and a new {{workerPool}}. So AFAICT (based only on a high-level look at the code) it shouldn't really matter whether {{workerPool}} is set to null in the shutdown method. But maybe I just don't follow your reasoning, or I missed something in the code. Feel free to create a PR, then we can see what you suggest. Have you already tried your proposed fix and seen that it solves your original issue? (For illustration, a small standalone sketch of the pool-reuse pitfall follows the quoted issue below.)

> Zookeeper refuses request after node expansion
> ----------------------------------------------
>
>                 Key: ZOOKEEPER-3829
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3829
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.6
>            Reporter: benwang li
>            Priority: Major
>         Attachments: d.log
>
>
> It's easy to reproduce this bug.
> {code:java}
> // code placeholder
>
> Step 1. Deploy 3 nodes A,B,C with configuration A,B,C.
> Step 2. Deploy node D with configuration A,B,C,D; the cluster state is OK now.
> Step 3. Restart nodes A,B,C with configuration A,B,C,D. The leader will then be D and the cluster hangs: it still accepts the `mntr` command, but other commands like `ls /` are blocked.
> Step 4. Restart node D; the cluster state is back to normal.
>
> {code}
>
> We have looked into the code of the 3.5.6 version, and we found it may be an issue with the `workerPool`.
> The `CommitProcessor` shuts down and shuts down the `workerPool`, but the `workerPool` reference still exists. It will never work anymore, yet the cluster still thinks it's OK.
>
> I think the bug may still exist on the master branch.
> We have tested it on our machines by resetting the `workerPool` to null. If that's OK, please assign this issue to me, and then I'll create a PR.
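For illustration only (not ZooKeeper code): a minimal, self-contained Java sketch of the pool-reuse pitfall discussed above. The class and method names here are made up; the sketch only shows that an {{ExecutorService}} that has been shut down rejects all new work, so a component that can be restarted after {{shutdown()}} must either rebuild its pool (e.g. by nulling it in {{shutdown()}} and recreating it in {{start()}}, roughly what the reporter proposes for {{workerPool}}) or must never be reused.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Standalone sketch, not ZooKeeper's CommitProcessor: a processor-like class
// whose worker pool must be recreated if the class is restarted after shutdown.
public class RestartablePool {

    private ExecutorService workerPool;

    public synchronized void start(int numThreads) {
        // Rebuild the pool if shutdown() nulled it out; reusing the old,
        // already-terminated pool would reject every submitted task.
        if (workerPool == null) {
            workerPool = Executors.newFixedThreadPool(numThreads);
        }
    }

    public synchronized void submit(Runnable task) {
        workerPool.submit(task);
    }

    public synchronized void shutdown() {
        if (workerPool != null) {
            workerPool.shutdown();
            workerPool = null; // the proposed reset, so a later start() builds a fresh pool
        }
    }

    public static void main(String[] args) {
        RestartablePool p = new RestartablePool();
        p.start(2);
        p.submit(() -> System.out.println("first run"));
        p.shutdown();
        p.start(2); // without the reset above, the next submit() would throw RejectedExecutionException
        p.submit(() -> System.out.println("second run"));
        p.shutdown();
    }
}
{code}
Whether this scenario can actually occur in the server (i.e. whether a {{CommitProcessor}} is ever restarted after being shut down) is exactly the open question in the comment above; the sketch only illustrates why the reporter's reset would matter if it can.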