[GitHub] [zookeeper] symat commented on a change in pull request #1356: ZOOKEEPER-3829: backward compatibility fix for rolling restart without dynamic reconfig

2020-05-26 Thread GitBox


symat commented on a change in pull request #1356:
URL: https://github.com/apache/zookeeper/pull/1356#discussion_r430210333



##
File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java
##
@@ -943,6 +944,7 @@ public synchronized boolean tryToCommit(Proposal p, long 
zxid, SocketAddress fol
 self.processReconfig(newQV, designatedLeader, zk.getZxid(), true);
 
 if (designatedLeader != self.getId()) {
+LOG.info("Committing a reconfiguration; this leader is not the 
designated leader anymore, setting allowedToCommit = false");

Review comment:
   thanks, good idea

##
File path: 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ReconfigExceptionTest.java
##
@@ -90,6 +90,16 @@ public void tearDown() throws Exception {
 @Test(timeout = 1)
 public void testReconfigDisabled() throws InterruptedException {
 QuorumPeerConfig.setReconfigEnabled(false);
+
+// for thsi test we need to restart the quorum peers to get the config 
change,

Review comment:
   thanks!

##
File path: 
zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/ReconfigRollingRestartCompatibilityTest.java
##
@@ -209,6 +209,237 @@ public void testRollingRestartWithMembershipChange() 
throws Exception {
 }
 }
 
+@Test
+public void testRollingRestartWithExtendedMembershipConfig() throws 
Exception {
+// in this test we are performing rolling restart with extended quorum 
config, see ZOOKEEPER-3829
+
+// Start a quorum with 3 members
+int serverCount = 3;
+String config = generateNewQuorumConfig(serverCount);
+QuorumPeerTestBase.MainThread[] mt = new 
QuorumPeerTestBase.MainThread[serverCount];
+List joiningServers = new ArrayList<>();
+for (int i = 0; i < serverCount; i++) {
+mt[i] = new QuorumPeerTestBase.MainThread(i, clientPorts.get(i), 
config, false);
+mt[i].start();
+joiningServers.add(serverAddress.get(i));
+}
+for (int i = 0; i < serverCount; i++) {
+assertTrue("waiting for server " + i + " being up", 
ClientBase.waitForServerUp("127.0.0.1:" + clientPorts.get(i), 
CONNECTION_TIMEOUT));
+}
+for (int i = 0; i < serverCount; i++) {
+verifyQuorumConfig(i, joiningServers, null);
+verifyQuorumMembers(mt[i]);
+}
+
+// Create updated config with 4 members
+List newServers = new ArrayList<>(joiningServers);
+config = updateExistingQuorumConfig(Arrays.asList(3), new 
ArrayList<>());
+newServers.add(serverAddress.get(3));
+serverCount = serverAddress.size();
+assertEquals("Server count should be 4 after config update.", 
serverCount, 4);
+
+// We are adding one new server to the ensemble. The new server should 
be started with the new config
+mt = Arrays.copyOf(mt, mt.length + 1);
+mt[3] = new QuorumPeerTestBase.MainThread(3, clientPorts.get(3), 
config, false);
+mt[3].start();
+assertTrue("waiting for server 3 being up", 
ClientBase.waitForServerUp("127.0.0.1:" + clientPorts.get(3), 
CONNECTION_TIMEOUT));
+verifyQuorumConfig(3, newServers, null);
+verifyQuorumMembers(mt[3]);
+
+// Now we restart the first 3 servers, one-by-one with the new config
+for (int i = 0; i < 3; i++) {
+mt[i].shutdown();
+
+assertTrue(String.format("Timeout during waiting for server %d to 
go down", i),
+   ClientBase.waitForServerDown("127.0.0.1:" + 
clientPorts.get(i), ClientBase.CONNECTION_TIMEOUT));
+
+mt[i] = new QuorumPeerTestBase.MainThread(i, clientPorts.get(i), 
config, false);
+mt[i].start();
+assertTrue("waiting for server " + i + " being up", 
ClientBase.waitForServerUp("127.0.0.1:" + clientPorts.get(i), 
CONNECTION_TIMEOUT));
+verifyQuorumConfig(i, newServers, null);
+verifyQuorumMembers(mt[i]);
+}
+
+// now verify that all nodes can handle traffic
+for (int i = 0; i < 4; ++i) {
+ZooKeeper zk = ClientBase.createZKClient("127.0.0.1:" + 
clientPorts.get(i));
+ReconfigTest.testNormalOperation(zk, zk, false);
+}
+
+for (int i = 0; i < 4; ++i) {
+mt[i].shutdown();
+}
+}
+
+
+@Test
+public void 
testRollingRestartWithExtendedMembershipConfigRestartingLeaderFirst() throws 
Exception {
+// in this test we are performing rolling restart with extended quorum 
config, see ZOOKEEPER-3829
+// first we start the new nodes, then restart the current leader
+// this is a special case, as the old servers will not be able to join 
to the new nodes after they became leader
+
+// Start a quorum with 3 members
+int serverCount = 3;
+String config = 

[GitHub] [zookeeper] symat commented on a change in pull request #1356: ZOOKEEPER-3829: backward compatibility fix for rolling restart without dynamic reconfig

2020-05-22 Thread GitBox


symat commented on a change in pull request #1356:
URL: https://github.com/apache/zookeeper/pull/1356#discussion_r429061004



##
File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java
##
@@ -614,7 +614,7 @@ void lead() throws IOException, InterruptedException {
 
 QuorumVerifier lastSeenQV = self.getLastSeenQuorumVerifier();
 QuorumVerifier curQV = self.getQuorumVerifier();
-if (curQV.getVersion() == 0 && curQV.getVersion() == 
lastSeenQV.getVersion()) {
+if (!QuorumPeerConfig.isReconfigEnabled() || (curQV.getVersion() 
== 0 && curQV.getVersion() == lastSeenQV.getVersion())) {

Review comment:
   I think I'll revert this part of the change in the next commit.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [zookeeper] symat commented on a change in pull request #1356: ZOOKEEPER-3829: backward compatibility fix for rolling restart without dynamic reconfig

2020-05-21 Thread GitBox


symat commented on a change in pull request #1356:
URL: https://github.com/apache/zookeeper/pull/1356#discussion_r428731977



##
File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java
##
@@ -614,7 +614,7 @@ void lead() throws IOException, InterruptedException {
 
 QuorumVerifier lastSeenQV = self.getLastSeenQuorumVerifier();
 QuorumVerifier curQV = self.getQuorumVerifier();
-if (curQV.getVersion() == 0 && curQV.getVersion() == 
lastSeenQV.getVersion()) {
+if (!QuorumPeerConfig.isReconfigEnabled() || (curQV.getVersion() 
== 0 && curQV.getVersion() == lastSeenQV.getVersion())) {

Review comment:
   Thanks for checking!
   
   > What's being done here is updating the config version to the current 
leader zxid
   
   I added the extra condition because here also the lastSeenQuorumVerifier is 
set to be equal to the currentQuorumVerifyer. The problem I found in 
ZOOKEEPER-3814 was that old server addresses were still propagated through the 
lastSeenQuorumVerifier to newly restarted nodes when dynamic reconfig was 
disabled. So my idea was to reset the lastSeenQuorumVerifier to the current 
config here. Maybe a better way would be to not update lastSeenQuorumVerifier 
when we process the NEWLEADER message (if dynamic reconfig is disabled)? 
   
   > This is done only once, when the ensemble is initial.
   
   Isn't the config version always 0 when dynamic reconfig is disabled?
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [zookeeper] symat commented on a change in pull request #1356: ZOOKEEPER-3829: backward compatibility fix for rolling restart without dynamic reconfig

2020-05-19 Thread GitBox


symat commented on a change in pull request #1356:
URL: https://github.com/apache/zookeeper/pull/1356#discussion_r427212264



##
File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java
##
@@ -1508,20 +1510,25 @@ private synchronized void startZkServer() {
  newLeaderProposal.ackSetsToString(),
  Long.toHexString(zk.getZxid()));
 
-/*
- * ZOOKEEPER-1324. the leader sends the new config it must complete
- *  to others inside a NEWLEADER message (see LearnerHandler where
- *  the NEWLEADER message is constructed), and once it has enough
- *  acks we must execute the following code so that it applies the
- *  config to itself.
- */
-QuorumVerifier newQV = self.getLastSeenQuorumVerifier();
+if (QuorumPeerConfig.isReconfigEnabled()) {
+/*
+ * ZOOKEEPER-1324. the leader sends the new config it must complete
+ *  to others inside a NEWLEADER message (see LearnerHandler where
+ *  the NEWLEADER message is constructed), and once it has enough
+ *  acks we must execute the following code so that it applies the
+ *  config to itself.
+ */
+QuorumVerifier newQV = self.getLastSeenQuorumVerifier();
 
-Long designatedLeader = getDesignatedLeader(newLeaderProposal, 
zk.getZxid());
+Long designatedLeader = getDesignatedLeader(newLeaderProposal, 
zk.getZxid());
 
-self.processReconfig(newQV, designatedLeader, zk.getZxid(), true);
-if (designatedLeader != self.getId()) {
-allowedToCommit = false;
+self.processReconfig(newQV, designatedLeader, zk.getZxid(), true);
+if (designatedLeader != self.getId()) {
+LOG.warn("This leader is not the designated leader, it will be 
initialized with allowedToCommit = false");
+allowedToCommit = false;
+}
+} else {
+LOG.debug("Reconfig feature is disabled, skip designatedLeader 
calculation and reconfig processing.");

Review comment:
   I agree with the LOG.info (this message will be printed only once when 
the leader starts to lead after an election. Yet, the message can help 
debugging production issues, when debug logs are disabled)
   
   The tests are simulating the "backward compatibility" case when 
`QuorumPeerConfig.isReconfigEnabled()==false`. I don't think that changing the 
config with rolling-restart would be advised when dynamic-reconfig is enabled. 
I think when the reconfig is enabled, the new config can propagate during the 
leader election, so the rolling-restart will be unneccessary.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org