[ https://issues.apache.org/jira/browse/ZOOKEEPER-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haoze Wu updated ZOOKEEPER-4419: -------------------------------- Description: We were doing some testing in ZooKeeper 3.6.2, and found that sometimes a follower may take longer time (about 10s) to join the quorum. In our cluster, usually we expect that the leader and followers are ready to provide service in 1 or 2 seconds. *Analysis* We did some investigations and found that the root cause is the exception thrown from `Learner.createSocket` ([https://github.com/apache/zookeeper/blob/release-3.6.2/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java#L410-L423]). `Learner.createSocket` is used in `Learner$LeaderConnector.connectToLeader`. The code snippet ([https://github.com/apache/zookeeper/blob/release-3.6.2/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java#L344-L407]) is as follows: {code:java} private Socket connectToLeader() throws IOException, X509Exception, InterruptedException { Socket sock = createSocket(); // line 345 // ... int remainingTimeout; long startNanoTime = nanoTime(); for (int tries = 0; tries < 5 && socket.get() == null; tries++) { try { remainingTimeout = connectTimeout - (int) ((nanoTime() - startNanoTime) / 1_000_000); if (remainingTimeout <= 0) { LOG.error(...); throw new IOException("connectToLeader exceeded on retries."); } sockConnect(sock, address, Math.min(connectTimeout, remainingTimeout)); // ... break; } catch (IOException e) { remainingTimeout = connectTimeout - (int) ((nanoTime() - startNanoTime) / 1_000_000); if (remainingTimeout <= leaderConnectDelayDuringRetryMs) { LOG.error(...); throw e; } else if (tries >= 4) { LOG.error(...); throw e; } else { LOG.warn(...); sock = createSocket(); // line 400 } } Thread.sleep(leaderConnectDelayDuringRetryMs); } return sock; } {code} The retry (5 times) in this code implies that sometimes exceptions are expected to appear, and the program should proceed to try again. However, `Learner.createSocket` is invoked in either line 345 or line 400. And it may also throw exceptions. But it’s not within the try block. Therefore, this exception will turn to `Follower.followLeader` (the caller of `Learner$LeaderConnector.connectToLeader`). And then this server will change from FOLLOWING state to LOOKING state. `Learner.createSocket` may throw exceptions in line 417 or 421: {code:java} /** * Creating a simple or and SSL socket. * This can be overridden in tests to fake already connected sockets for connectToLeader. */ protected Socket createSocket() throws X509Exception, IOException { Socket sock; if (self.isSslQuorum()) { sock = self.getX509Util().createSSLSocket(); // line 417 } else { sock = new Socket(); } sock.setSoTimeout(self.tickTime * self.initLimit); // line 421 return sock; } {code} The reason for throwing the exception may be [https://stackoverflow.com/questions/22423063/java-exception-on-sslsocket-creation] or [https://stackoverflow.com/questions/7064788/when-does-java-net-socket-setsotimeout-throws-socketexception.] We reproduced this bug by intentionally injecting an IOException right before the invocation of `Learner.createSocket` in line 345. We only grant one single injection in a testing run of a ZooKeeper cluster (3 nodes). Our analysis proves to be true. In terms of the symptom, sometimes the affected follower takes more time to join the quorum; sometimes the leader election is also affected, and the whole cluster does the re-election, meaning that every node is affected. *Fix* We propose that the `Learner.createSocket` should be moved to the try-catch block. We have created a pull request. was: We were doing some testing in ZooKeeper 3.6.2, and found that sometimes a follower may take longer time (about 10s) to join the quorum. In our cluster, usually we expect that the leader and followers are ready to provide service in 1 or 2 seconds. *Analysis* We did some investigations and found that the root cause is the exception thrown from `Learner.createSocket` ([https://github.com/apache/zookeeper/blob/release-3.6.2/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java#L410-L423]). `Learner.createSocket` is used in `Learner$LeaderConnector.connectToLeader`. The code snippet ([https://github.com/apache/zookeeper/blob/release-3.6.2/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java#L344-L407]) is as follows: {code:java} private Socket connectToLeader() throws IOException, X509Exception, InterruptedException { Socket sock = createSocket(); // line 345 // ... int remainingTimeout; long startNanoTime = nanoTime(); for (int tries = 0; tries < 5 && socket.get() == null; tries++) { try { remainingTimeout = connectTimeout - (int) ((nanoTime() - startNanoTime) / 1_000_000); if (remainingTimeout <= 0) { LOG.error(...); throw new IOException("connectToLeader exceeded on retries."); } sockConnect(sock, address, Math.min(connectTimeout, remainingTimeout)); // ... break; } catch (IOException e) { remainingTimeout = connectTimeout - (int) ((nanoTime() - startNanoTime) / 1_000_000); if (remainingTimeout <= leaderConnectDelayDuringRetryMs) { LOG.error(...); throw e; } else if (tries >= 4) { LOG.error(...); throw e; } else { LOG.warn(...); sock = createSocket(); // line 400 } } Thread.sleep(leaderConnectDelayDuringRetryMs); } return sock; } {code} The retry (5 times) in this code implies that sometimes exceptions are expected to appear, and the program should proceed to try again. However, `Learner.createSocket` is invoked in either line 345 or line 400. And it may also throw exceptions. But it’s not within the try block. Therefore, this exception will turn to `Follower.followLeader` (the caller of `Learner$LeaderConnector.connectToLeader`). And then this server will change from FOLLOWING state to LOOKING state. `Learner.createSocket` may throw exceptions in line 417 or 421: {code:java} /** * Creating a simple or and SSL socket. * This can be overridden in tests to fake already connected sockets for connectToLeader. */ protected Socket createSocket() throws X509Exception, IOException { Socket sock; if (self.isSslQuorum()) { sock = self.getX509Util().createSSLSocket(); // line 417 } else { sock = new Socket(); } sock.setSoTimeout(self.tickTime * self.initLimit); // line 421 return sock; } {code} The reason for throwing the exception may be [https://stackoverflow.com/questions/22423063/java-exception-on-sslsocket-creation] or [https://stackoverflow.com/questions/7064788/when-does-java-net-socket-setsotimeout-throws-socketexception.] We reproduced this bug by intentionally injecting an IOException right before the invocation of `Learner.createSocket` in line 345. We only grant one single injection in a testing run of a ZooKeeper cluster (3 nodes). Our analysis proves to be true. In terms of the symptom, sometimes the affected follower takes more time to join the quorum; sometimes the leader election is also affected, and the whole cluster does the re-election, meaning that every node is affected. *Fix* We propose that the `Learner.createSocket` should be moved to the try-catch block. We will create a pull request. > The potential exception in Learner$LeaderConnector.connectToLeader may cause > unnecessary re-election or service delay > --------------------------------------------------------------------------------------------------------------------- > > Key: ZOOKEEPER-4419 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4419 > Project: ZooKeeper > Issue Type: Bug > Components: server > Affects Versions: 3.6.2 > Reporter: Haoze Wu > Priority: Major > > We were doing some testing in ZooKeeper 3.6.2, and found that sometimes a > follower may take longer time (about 10s) to join the quorum. In our cluster, > usually we expect that the leader and followers are ready to provide service > in 1 or 2 seconds. > *Analysis* > We did some investigations and found that the root cause is the exception > thrown from `Learner.createSocket` > ([https://github.com/apache/zookeeper/blob/release-3.6.2/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java#L410-L423]). > `Learner.createSocket` is used in `Learner$LeaderConnector.connectToLeader`. > The code snippet > ([https://github.com/apache/zookeeper/blob/release-3.6.2/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java#L344-L407]) > is as follows: > {code:java} > private Socket connectToLeader() throws IOException, X509Exception, > InterruptedException { > Socket sock = createSocket(); // line 345 > // ... > int remainingTimeout; > long startNanoTime = nanoTime(); > for (int tries = 0; tries < 5 && socket.get() == null; tries++) { > try { > remainingTimeout = connectTimeout - (int) ((nanoTime() - > startNanoTime) / 1_000_000); > if (remainingTimeout <= 0) { > LOG.error(...); > throw new IOException("connectToLeader exceeded on > retries."); > } > sockConnect(sock, address, Math.min(connectTimeout, > remainingTimeout)); > // ... > break; > } catch (IOException e) { > remainingTimeout = connectTimeout - (int) ((nanoTime() - > startNanoTime) / 1_000_000); > if (remainingTimeout <= leaderConnectDelayDuringRetryMs) { > LOG.error(...); > throw e; > } else if (tries >= 4) { > LOG.error(...); > throw e; > } else { > LOG.warn(...); > sock = createSocket(); // line 400 > } > } > Thread.sleep(leaderConnectDelayDuringRetryMs); > } > return sock; > } > {code} > The retry (5 times) in this code implies that sometimes exceptions are > expected to appear, and the program should proceed to try again. However, > `Learner.createSocket` is invoked in either line 345 or line 400. And it may > also throw exceptions. But it’s not within the try block. Therefore, this > exception will turn to `Follower.followLeader` (the caller of > `Learner$LeaderConnector.connectToLeader`). And then this server will change > from FOLLOWING state to LOOKING state. > `Learner.createSocket` may throw exceptions in line 417 or 421: > {code:java} > /** > * Creating a simple or and SSL socket. > * This can be overridden in tests to fake already connected sockets for > connectToLeader. > */ > protected Socket createSocket() throws X509Exception, IOException { > Socket sock; > if (self.isSslQuorum()) { > sock = self.getX509Util().createSSLSocket(); // line 417 > } else { > sock = new Socket(); > } > sock.setSoTimeout(self.tickTime * self.initLimit); // line 421 > return sock; > } {code} > The reason for throwing the exception may be > [https://stackoverflow.com/questions/22423063/java-exception-on-sslsocket-creation] > or > [https://stackoverflow.com/questions/7064788/when-does-java-net-socket-setsotimeout-throws-socketexception.] > We reproduced this bug by intentionally injecting an IOException right before > the invocation of `Learner.createSocket` in line 345. We only grant one > single injection in a testing run of a ZooKeeper cluster (3 nodes). Our > analysis proves to be true. In terms of the symptom, sometimes the affected > follower takes more time to join the quorum; sometimes the leader election is > also affected, and the whole cluster does the re-election, meaning that every > node is affected. > *Fix* > We propose that the `Learner.createSocket` should be moved to the try-catch > block. We have created a pull request. -- This message was sent by Atlassian Jira (v8.20.1#820001)