[ https://issues.apache.org/jira/browse/ZOOKEEPER-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110780#comment-17110780 ]

Keli Wang edited comment on ZOOKEEPER-3829 at 5/19/20, 2:53 AM:
----------------------------------------------------------------

{code:java}
if (curQV.getVersion() == 0 && curQV.getVersion() == lastSeenQV.getVersion()) {
    // This was added in ZOOKEEPER-1783. The initial config has version 0 (not explicitly
    // specified by the user; the lack of version in a config file is interpreted as version=0).
    // As soon as a config is established we would like to increase its version so that it
    // takes precedence over other initial configs that were not established (such as a config
    // of a server trying to join the ensemble, which may be a partial view of the system, not the full config).
    // We chose to set the new version to the one of the NEWLEADER message. However, before we can do that
    // there must be agreement on the new version, so we can only change the version when sending/receiving UPTODATE,
    // not when sending/receiving NEWLEADER. In other words, we can't change curQV here since it's the committed quorum verifier,
    // and there's still no agreement on the new version that we'd like to use. Instead, we use
    // lastSeenQuorumVerifier which is being sent with NEWLEADER message
    // so it's a good way to let followers know about the new version. (The original reason for sending
    // lastSeenQuorumVerifier with NEWLEADER is so that the leader completes any potentially uncommitted reconfigs
    // that it finds before starting to propose operations. Here we're reusing the same code path for
    // reaching consensus on the new version number.)

    // It is important that this is done before the leader executes waitForEpochAck,
    // so before LearnerHandlers return from their waitForEpochAck
    // hence before they construct the NEWLEADER message containing
    // the last-seen-quorumverifier of the leader, which we change below
    try {
        QuorumVerifier newQV = self.configFromString(curQV.toString());
        newQV.setVersion(zk.getZxid());
        self.setLastSeenQuorumVerifier(newQV, true);
    } catch (Exception e) {
        throw new IOException(e);
    }
}
{code}
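The guarded branch in the snippet can be modeled in isolation: only when both the committed and the last-seen config are unversioned (version 0) does the leader build a copy of the committed config, stamp it with the current zxid, and store that copy as last-seen, leaving the committed config untouched. The following is a minimal sketch under that reading; `ToyQuorumConfig` and `nextLastSeen` are hypothetical stand-ins, not ZooKeeper's `QuorumVerifier` API.

{code:java}
import java.util.Map;
import java.util.TreeMap;

public class ConfigVersionBumpSketch {

    // Hypothetical immutable stand-in for a QuorumVerifier: member ids plus a version.
    record ToyQuorumConfig(Map<Long, String> members, long version) {
        ToyQuorumConfig withVersion(long newVersion) {
            // Copy instead of mutating, so the committed config stays untouched.
            return new ToyQuorumConfig(new TreeMap<>(members), newVersion);
        }
    }

    // Mirrors the guarded branch: an unversioned (0) initial config whose committed and
    // last-seen versions agree gets a last-seen copy stamped with the zxid.
    static ToyQuorumConfig nextLastSeen(ToyQuorumConfig current, ToyQuorumConfig lastSeen, long zxid) {
        if (current.version() == 0 && current.version() == lastSeen.version()) {
            return current.withVersion(zxid);
        }
        return lastSeen; // already versioned: leave last-seen as it is
    }

    public static void main(String[] args) {
        Map<Long, String> abcd = new TreeMap<>(Map.of(1L, "A", 2L, "B", 3L, "C", 4L, "D"));
        ToyQuorumConfig committed = new ToyQuorumConfig(abcd, 0);
        ToyQuorumConfig lastSeen = nextLastSeen(committed, committed, 0x100000000L);
        System.out.println(Long.toHexString(lastSeen.version())); // prints 100000000
        System.out.println(committed.version());                  // prints 0
    }
}
{code}

Copying rather than mutating reflects the comment's point that curQV is the committed verifier and must not change until agreement is reached at UPTODATE.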

[~symat] In the code above, can the leader always overwrite lastSeenQuorumVerifier
with its latest quorumVerifier when dynamic reconfig is disabled? If
lastSeenQuorumVerifier is the same as quorumVerifier, then allowedToCommit
should always be true.
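The proposal in the question can be made concrete with a toy model. The rule used below, that the leader may keep committing only if it is a voting member of the last-seen config, is a simplified assumption of how allowedToCommit is decided, not ZooKeeper's actual code. `ToyQuorumConfig`, `allowedToCommit(...)`, `lastSeenToUse(...)` and the stale A,B,C last-seen config are all hypothetical inputs for illustration, not a claim about what node D actually received.

{code:java}
import java.util.Set;

public class AllowedToCommitSketch {

    // Hypothetical stand-in for a quorum config: voting member ids and a version.
    record ToyQuorumConfig(Set<Long> votingMembers, long version) {}

    // ASSUMED, simplified rule: the leader may commit only if it is still a voting
    // member of the last-seen config it sent with NEWLEADER.
    static boolean allowedToCommit(long selfId, ToyQuorumConfig lastSeen) {
        return lastSeen.votingMembers().contains(selfId);
    }

    // The proposal: with dynamic reconfig disabled, use the committed config as last-seen.
    static ToyQuorumConfig lastSeenToUse(boolean reconfigEnabled,
                                         ToyQuorumConfig current,
                                         ToyQuorumConfig lastSeen) {
        return reconfigEnabled ? lastSeen : current;
    }

    public static void main(String[] args) {
        long leaderD = 4L;
        ToyQuorumConfig current = new ToyQuorumConfig(Set.of(1L, 2L, 3L, 4L), 0x100000000L);
        // Hypothetical stale last-seen config that does not contain D.
        ToyQuorumConfig staleLastSeen = new ToyQuorumConfig(Set.of(1L, 2L, 3L), 0x200000000L);

        System.out.println(allowedToCommit(leaderD, lastSeenToUse(true, current, staleLastSeen)));  // prints false
        System.out.println(allowedToCommit(leaderD, lastSeenToUse(false, current, staleLastSeen))); // prints true
    }
}
{code}

Under this assumed rule, a leader that belongs to its own committed config always passes once last-seen equals committed, which is the property the question relies on.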




> Zookeeper refuses request after node expansion
> ----------------------------------------------
>
>                 Key: ZOOKEEPER-3829
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3829
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.6
>            Reporter: benwang li
>            Priority: Major
>         Attachments: d.log, screenshot-1.png
>
>
> It's easy to reproduce this bug.
> {code:java}
>  
> Step 1. Deploy 3 nodes A, B, C with configuration A,B,C.
> Step 2. Deploy node D with configuration A,B,C,D; the cluster state is OK
> now.
> Step 3. Restart nodes A, B, C with configuration A,B,C,D. The leader will then
> be D and the cluster hangs: it still accepts the `mntr` command, but other
> commands such as `ls /` block.
> Step 4. Restart node D; the cluster state is back to normal.
>  
> {code}
>  
> We looked into the 3.5.6 code and found that the problem may be the
> `workerPool`.
> When `CommitProcessor` shuts down, it also shuts down `workerPool`, but the
> `workerPool` reference is not cleared. The pool never processes work again,
> yet the cluster still thinks it is OK.
>  
> I think the bug may still exist in the master branch.
> We tested a fix on our machines that resets `workerPool` to null. If this
> approach is OK, please assign this issue to me and I'll create a PR.
>  
>  
>  
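The reporter's hypothesis (a stopped pool that is still referenced gets reused after a restart) can be illustrated with a toy model. `ToyCommitProcessor` and `ToyWorkerPool` are hypothetical stand-ins, not ZooKeeper classes. The assumptions that the pool is created lazily only when the field is null, and that a stopped pool silently drops work instead of throwing, are ours, chosen to mirror the observed hang rather than taken from the source.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class WorkerPoolRestartSketch {

    // Hypothetical worker pool: once stopped, it silently drops scheduled work (assumed behavior).
    static final class ToyWorkerPool {
        private boolean stopped = false;
        final List<String> executed = new ArrayList<>();

        void schedule(String request) {
            if (stopped) {
                return; // dropped: the client is never answered, so it appears to hang
            }
            executed.add(request); // "run" the request synchronously
        }

        void stop() {
            stopped = true;
        }
    }

    // Hypothetical processor that creates its pool lazily in start().
    static final class ToyCommitProcessor {
        private final boolean resetPoolOnShutdown;
        ToyWorkerPool workerPool;

        ToyCommitProcessor(boolean resetPoolOnShutdown) {
            this.resetPoolOnShutdown = resetPoolOnShutdown;
        }

        void start() {
            if (workerPool == null) { // a stale, stopped pool is NOT replaced here
                workerPool = new ToyWorkerPool();
            }
        }

        void shutdown() {
            if (workerPool != null) {
                workerPool.stop();
                if (resetPoolOnShutdown) {
                    workerPool = null; // proposed fix: the next start() builds a fresh pool
                }
            }
        }

        void process(String request) {
            workerPool.schedule(request);
        }
    }

    // Start, shut down, restart, submit one request; returns how many requests actually ran.
    static int runAfterRestart(boolean resetPoolOnShutdown) {
        ToyCommitProcessor p = new ToyCommitProcessor(resetPoolOnShutdown);
        p.start();
        p.shutdown();
        p.start();
        p.process("ls /");
        return p.workerPool.executed.size();
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + runAfterRestart(false)); // prints "without reset: 0"
        System.out.println("with reset: " + runAfterRestart(true));     // prints "with reset: 1"
    }
}
{code}

In this model, nulling the field on shutdown is enough because start() already guards pool creation with a null check; the alternative of always creating a new pool in start() would also work but could leak a still-running pool if start() were called twice.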



--
This message was sent by Atlassian Jira
(v8.3.4#803005)