[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695441#comment-16695441 ]

Michael K. Edwards commented on ZOOKEEPER-2836:
---

Fix needed for 3.5.5?

> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --
>
> Key: ZOOKEEPER-2836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection, quorum
> Affects Versions: 3.4.6
> Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
>   Java Version: jdk64/jdk1.8.0_40
>   zookeeper version: 3.4.6.2.3.2.0-2950
> Reporter: Amarjeet Singh
> Assignee: gaoshu
> Priority: Critical
>
> The QuorumCnxManager Listener thread blocks on ServerSocket accept, but we are
> getting SocketTimeoutException on our boxes after 49 days 17 hours. As per the
> current code there are 3 retries, and after that it logs "As I'm leaving
> the listener thread, I won't be able to participate in leader election any
> longer: $/$:3888". Once server nodes reach this state and
> we restart or add a new node, it fails to join the cluster and logs 'WARN
> QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open
> channel to 3 at election address $/$:3888'.
> As there is no timeout specified for the ServerSocket it should never
> time out, but there are some already discussed issues where people have seen
> this issue and added checks for SocketTimeoutException explicitly, like
> https://issues.apache.org/jira/browse/KARAF-3325.
> I think we need to handle SocketTimeoutException along similar lines for
> ZooKeeper as well.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139876#comment-16139876 ]

Hadoop QA commented on ZOOKEEPER-2836:
--

+1 overall. GitHub Pull Request Build

    +1 @author. The patch does not contain any @author tags.
    +0 tests included. The patch appears to be a documentation patch that doesn't require tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/962//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/962//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/962//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139864#comment-16139864 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on the issue:

    https://github.com/apache/zookeeper/pull/336

    gaoshu, thank you very much. @hanm
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139863#comment-16139863 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on the issue:

    https://github.com/apache/zookeeper/pull/336

    OK, I have made this patch apply to the 3.5 branch: [PR](https://github.com/apache/zookeeper/pull/344)
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139862#comment-16139862 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

GitHub user bitgaoshu opened a pull request:

    https://github.com/apache/zookeeper/pull/344

    ZOOKEEPER-2836: Fix SocketTimeoutException

    apply patch from master to branch-3.5

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bitgaoshu/zookeeper branch-3.5

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/344.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #344

commit 86701965dba3dac9aad9cc5263661e09255c766a
Author: bitgaoshu
Date:   2017-08-24T06:23:32Z

    ZOOKEEPER-2836: Fix SocketTimeoutException
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139558#comment-16139558 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/336

    @bitgaoshu What's your username on JIRA (https://issues.apache.org/jira/projects/ZOOKEEPER)? Please create an account on JIRA if you haven't and let me know the user name. Since you contributed the patch, I should assign the JIRA issue to you so you get the credits on your contribution.
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138773#comment-16138773 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user skamille commented on the issue:

    https://github.com/apache/zookeeper/pull/336

    @bitgaoshu if you could make this patch apply to the 3.5 branch I will commit it there as well. Unfortunately that branch has a significant divergence from master and I suspect we want this fix in before we release 3.6.
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137737#comment-16137737 ]

Hudson commented on ZOOKEEPER-2836:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3509 (See [https://builds.apache.org/job/ZooKeeper-trunk/3509/])
ZOOKEEPER-2836: fix SocketTimeoutException (camille: rev 52aff3eca439bba70f2b4d175ce331754dcd03db)
* (edit) src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137655#comment-16137655 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user skamille commented on the issue:

    https://github.com/apache/zookeeper/pull/336

    +1 merging
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137653#comment-16137653 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/zookeeper/pull/336
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134030#comment-16134030 ]

Hadoop QA commented on ZOOKEEPER-2836:
--

+1 overall. GitHub Pull Request Build

    +1 @author. The patch does not contain any @author tags.
    +0 tests included. The patch appears to be a documentation patch that doesn't require tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/949//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/949//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/949//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134000#comment-16134000 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r134085352

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -638,13 +639,22 @@ public void run() {
         LOG.info("My election bind port: " + addr.toString());
         setName(addr.toString());
         ss.bind(addr);
    +    ss.setSoTimeout(10 * 1000); // Ten seconds
    --- End diff --

    I also think the timeout does not need to be set. I simply followed [this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354).
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133998#comment-16133998 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r134085348

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -638,13 +639,22 @@ public void run() {
         LOG.info("My election bind port: " + addr.toString());
         setName(addr.toString());
         ss.bind(addr);
    +    ss.setSoTimeout(10 * 1000); // Ten seconds
    +    long acceptStartTime = System.currentTimeMillis();
         while (!shutdown) {
    -        client = ss.accept();
    -        setSockOpts(client);
    -        LOG.info("Received connection request "
    -                + client.getRemoteSocketAddress());
    -        receiveConnection(client);
    -        numRetries = 0;
    +        try {
    +            client = ss.accept();
    +            setSockOpts(client);
    +            LOG.info("Received connection request "
    +                    + client.getRemoteSocketAddress());
    +            receiveConnection(client);
    +            numRetries = 0;
    +        } catch (SocketTimeoutException e) {
    +            LOG.warn("The socket is listening for the election accepted "
    +                    + "an unexpected timeout ["
    +                    + (System.currentTimeMillis() - acceptStartTime) + "]milliseconds"
    +                    + "after the call to accept(). is this an instance of bug ZOOKEEPER-2836?");
    --- End diff --

    I agree. I will leave the timeout at 0 by default and leave a log statement.
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133178#comment-16133178 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user skamille commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133994057

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -638,13 +639,22 @@ public void run() {
         LOG.info("My election bind port: " + addr.toString());
         setName(addr.toString());
         ss.bind(addr);
    +    ss.setSoTimeout(10 * 1000); // Ten seconds
    +    long acceptStartTime = System.currentTimeMillis();
         while (!shutdown) {
    -        client = ss.accept();
    -        setSockOpts(client);
    -        LOG.info("Received connection request "
    -                + client.getRemoteSocketAddress());
    -        receiveConnection(client);
    -        numRetries = 0;
    +        try {
    +            client = ss.accept();
    +            setSockOpts(client);
    +            LOG.info("Received connection request "
    +                    + client.getRemoteSocketAddress());
    +            receiveConnection(client);
    +            numRetries = 0;
    +        } catch (SocketTimeoutException e) {
    +            LOG.warn("The socket is listening for the election accepted "
    +                    + "an unexpected timeout ["
    +                    + (System.currentTimeMillis() - acceptStartTime) + "]milliseconds"
    +                    + "after the call to accept(). is this an instance of bug ZOOKEEPER-2836?");
    --- End diff --

    I don't love this error message and it doesn't make sense because we've set the socket timeout above so it will never time out based on that weird possible JVM error. So either we just continue after a timeout, or leave the timeout at 0 and leave a log statement that indicates it timed out unexpectedly but will retry, but the double fix doesn't really make sense to me.
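The second option in the review comment above (leave the timeout at 0, log an unexpected timeout, and retry) could look roughly like the following. This is a minimal sketch and not the ZooKeeper code: `ListenerLoopSketch`, `Acceptor`, `run` and `simulate` are hypothetical names introduced for illustration, and the `Acceptor` interface only stands in for `ServerSocket.accept()` so the loop can be exercised without a network.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ListenerLoopSketch {
    /** Hypothetical stand-in for ServerSocket.accept(), so the loop runs without a network. */
    interface Acceptor {
        Socket accept() throws IOException;
    }

    /**
     * Runs an accept loop until maxRetries non-timeout failures have accumulated
     * since the last successful accept. Returns the total number of accept() calls.
     */
    static int run(Acceptor acceptor, int maxRetries) {
        int numRetries = 0;
        int calls = 0;
        while (numRetries < maxRetries) {
            calls++;
            try {
                Socket client = acceptor.accept();
                // A real listener would hand the connection off instead of closing it.
                client.close();
                numRetries = 0;
            } catch (SocketTimeoutException e) {
                // Unexpected when SO_TIMEOUT is 0: report it and retry without
                // spending one of the limited retries.
                System.err.println("accept() timed out unexpectedly; retrying");
            } catch (IOException e) {
                numRetries++;
            }
        }
        return calls;
    }

    /** Simulates the given number of timeouts followed by persistent I/O failures. */
    static int simulate(int timeouts, int maxRetries) {
        int[] seen = {0};
        return run(() -> {
            if (seen[0]++ < timeouts) {
                throw new SocketTimeoutException("simulated timeout");
            }
            throw new IOException("simulated failure");
        }, maxRetries);
    }
}
```

With a real socket the loop would be driven as `ListenerLoopSketch.run(serverSocket::accept, 3)`. In the simulation, five timeouts followed by persistent failures with three retries produce eight `accept()` calls, because the timeouts do not count against the retry limit.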
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133137#comment-16133137 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user skamille commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133989279

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -638,13 +639,22 @@ public void run() {
         LOG.info("My election bind port: " + addr.toString());
         setName(addr.toString());
         ss.bind(addr);
    +    ss.setSoTimeout(10 * 1000); // Ten seconds
    --- End diff --

    By setting the timeout to 10s the timeout exception will fire potentially every 10s, is that what we want below?
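The behaviour the reviewer describes can be observed with a small standalone program. The sketch below is an illustration only (the class and method names are made up, and it uses a 50 ms timeout rather than 10 s so it finishes quickly): it binds a `ServerSocket` to an ephemeral loopback port that nobody connects to, so every `accept()` call ends in a `SocketTimeoutException` while the socket itself stays usable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;

final class SoTimeoutDemo {
    /**
     * Calls accept() the given number of times on a socket nobody connects to
     * and returns how many of those calls ended in a SocketTimeoutException.
     */
    static int countTimeouts(int timeoutMillis, int attempts) {
        int timeouts = 0;
        // Port 0 asks the OS for any free port; binding to loopback keeps it local.
        try (ServerSocket ss = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            ss.setSoTimeout(timeoutMillis);
            for (int i = 0; i < attempts; i++) {
                try {
                    ss.accept().close();
                } catch (SocketTimeoutException e) {
                    // The server socket is still valid after a timeout,
                    // so the next iteration can call accept() again.
                    timeouts++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return timeouts;
    }

    public static void main(String[] args) {
        System.out.println("timeouts: " + countTimeouts(50, 3));
    }
}
```

On an idle listener a non-zero timeout therefore turns the timeout exception into a regular event, once per timeout interval, which is the concern raised in the comment.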
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131788#comment-16131788 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133886561

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                 numRetries = 0;
             }
         } catch (IOException e) {
    -        if (shutdown) {
    --- End diff --

    Update: I had thought that `closeSocket(client);` should also be executed when an exception is thrown.
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131777#comment-16131777 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133885327

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                 numRetries = 0;
             }
         } catch (IOException e) {
    -        if (shutdown) {
    -            break;
    -        }
             LOG.error("Exception while listening", e);
    -        numRetries++;
    +        if (!(e instanceof SocketTimeoutException)) {
    --- End diff --

    Update: I checked the native method `java.net.PlainSocketImpl.socketAccept(Native Method)` in [openjdk](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/9d617cfd6717/src/solaris/native/java/net/PlainSocketImpl.c), **lines 709-721**, where a timeout of 0 is changed to -1, and a timeout of -1 is then interpreted as an infinite timeout. In some cases, [-1 was interpreted as a large positive integer](https://lwn.net/Articles/483078/), which would explain why this issue always happens after 49 days. This is only my wild conjecture.
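The conjecture above rests on one arithmetic fact: the 32-bit pattern of -1, read as an unsigned number, is 4294967295. The sketch below only demonstrates that reinterpretation in Java; it does not show, and is not meant to imply, that the JDK or the kernel actually performs this conversion on the accept path. The class and method names are hypothetical.

```java
final class UnsignedTimeout {
    /** Reads a signed 32-bit timeout value as if it were an unsigned 32-bit count. */
    static long asUnsigned(int timeout) {
        return Integer.toUnsignedLong(timeout);
    }

    public static void main(String[] args) {
        // -1 is all ones in two's complement, i.e. 2^32 - 1 when read as unsigned.
        System.out.println(asUnsigned(-1)); // prints 4294967295
    }
}
```

If such a value were ever treated as a millisecond timeout, it would no longer mean "wait forever" but "wait for 4294967295 ms".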
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131691#comment-16131691 ]

Hadoop QA commented on ZOOKEEPER-2836:
--

+1 overall. GitHub Pull Request Build

    +1 @author. The patch does not contain any @author tags.
    +0 tests included. The patch appears to be a documentation patch that doesn't require tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131588#comment-16131588 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

Github user maoling commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133865685

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                             numRetries = 0;
                         }
                     } catch (IOException e) {
    -                    if (shutdown) {
    -                        break;
    -                    }
                         LOG.error("Exception while listening", e);
    -                    numRetries++;
    +                    if (!(e instanceof SocketTimeoutException)) {
    --- End diff --

    - Can we reproduce this issue (haha, 49 days)? In theory this should never happen. Neither [KARAF-3325](https://issues.apache.org/jira/browse/KARAF-3325) nor [tomcat-56684](https://bz.apache.org/bugzilla/show_bug.cgi?id=56684) found the root cause; they just added some protection against the issue, like [this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354).
    - One hypothesis is that ServerSocket.accept() uses a default "infinite" value of 2^32 - 1 = 4294967295 ms when no timeout is specified or setSoTimeout(0) is called:

    > a call to accept() for this ServerSocket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the ServerSocket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout.

    That would explain why this issue always happens after 49 days 17 hours (4294967295/1000/60/60/24 = 49.7 days).
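The Javadoc passage quoted above describes two things that matter for the Listener thread: accept() raises SocketTimeoutException when a non-zero timeout expires, and the ServerSocket remains usable afterwards. The following is a minimal sketch of that documented behaviour only. The class and method names are made up for illustration, it uses a deliberately short explicit timeout, and it does not reproduce the 49-day case, where no timeout was set at all.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;

public class AcceptTimeoutDemo {
    /**
     * Opens a ServerSocket on an ephemeral loopback port, sets SO_TIMEOUT to
     * timeoutMillis, and calls accept() while no client is connecting.
     * Returns true if accept() timed out and the socket was still open afterwards.
     */
    public static boolean timesOutAndStaysOpen(int timeoutMillis) {
        try (ServerSocket ss = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            ss.setSoTimeout(timeoutMillis); // a value of 0 would mean "block forever"
            try {
                ss.accept().close();
                return false; // a client connected unexpectedly
            } catch (SocketTimeoutException e) {
                // The timeout expired; per the Javadoc the ServerSocket is still valid.
                return !ss.isClosed();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out, socket still open: " + timesOutAndStaysOpen(100));
    }
}
```

Because the socket is still valid after the timeout, a listener can simply call accept() again instead of closing and rebinding.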
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131577#comment-16131577 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

Github user maoling commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133864927

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                             numRetries = 0;
                         }
                     } catch (IOException e) {
    -                    if (shutdown) {
    --- End diff --

    Why do we need to move the code block at **Line650-Line652** to **Line665-Line667**?
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125339#comment-16125339 ]

Hadoop QA commented on ZOOKEEPER-2836:
--------------------------------------

+1 overall. GitHub Pull Request Build

    +1 @author. The patch does not contain any @author tags.
    +0 tests included. The patch appears to be a documentation patch that doesn't require tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/941//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/941//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/941//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125335#comment-16125335 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/334#discussion_r132887998

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,11 @@ public void run() {
                             numRetries = 0;
                         }
                     } catch (IOException e) {
    -                    if (shutdown) {
    -                        break;
    -                    }
                         LOG.error("Exception while listening", e);
    -                    numRetries++;
    +                    if (!(e instanceof SocketTimeoutException)) {
    +                        numRetries++;
    +                    }
    +                }finally {
    --- End diff --

    This is my first time committing code on GitHub. I opened a new [PR](https://github.com/apache/zookeeper/pull/336), which has been fixed according to your comments. Sorry for the inconvenience.
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125320#comment-16125320 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

GitHub user bitgaoshu opened a pull request:

    https://github.com/apache/zookeeper/pull/336

    ZOOKEEPER-2836 fix SocketTimeoutException

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bitgaoshu/zookeeper ZOOKEEPER-2836

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/336.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #336

commit 3653e6ddc21589355fb06c98aa60665fce4a4e24
Author: bitgaoshu
Date:   2017-08-14T07:02:16Z

    ZOOKEEPER-2836 fix SocketTimeoutException
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125184#comment-16125184 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/334#discussion_r132868731

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,11 @@ public void run() {
                             numRetries = 0;
                         }
                     } catch (IOException e) {
    -                    if (shutdown) {
    -                        break;
    -                    }
                         LOG.error("Exception while listening", e);
    -                    numRetries++;
    +                    if (!(e instanceof SocketTimeoutException)) {
    +                        numRetries++;
    +                    }
    +                }finally {
    --- End diff --

    1. I have removed the **finally** block in my latest code.
    2. According to the JIRA issue, the **SocketTimeoutException** was likely thrown about every 49 days, so the timeout may be -1 interpreted as an unsigned int. Alternatively, we can follow [Karaf](https://issues.apache.org/jira/browse/KARAF-3325) to fix this code, but I think it was thrown too often.
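The "unsigned int" guess in the comment above can be checked with a little arithmetic. The following is only a sketch of that calculation: it converts 2^32 - 1 milliseconds into days, hours and minutes. The class and method names are made up for illustration, and the result says nothing about where such a value would actually come from in the JVM or the operating system.

```java
import java.time.Duration;

public class TimeoutArithmetic {
    // 2^32 - 1: the value of -1 when stored in an unsigned 32-bit integer,
    // read here as a number of milliseconds (an assumption from the thread).
    public static final long MAX_UINT32_MILLIS = (1L << 32) - 1;

    /** Formats a millisecond count as whole days, hours and minutes. */
    public static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        long days = d.toDays();
        long hours = d.minusDays(days).toHours();
        long minutes = d.minusDays(days).minusHours(hours).toMinutes();
        return days + " days " + hours + " hours " + minutes + " minutes";
    }

    public static void main(String[] args) {
        // prints "4294967295 ms = 49 days 17 hours 2 minutes"
        System.out.println(MAX_UINT32_MILLIS + " ms = " + describe(MAX_UINT32_MILLIS));
    }
}
```

The result lines up with the "49days 17 hours" in the issue report, which is why the unsigned 32-bit millisecond counter is a plausible suspect, though the thread does not confirm it as the root cause.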
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124606#comment-16124606 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

Github user maoling commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/334#discussion_r132820615

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,11 @@ public void run() {
                             numRetries = 0;
                         }
                     } catch (IOException e) {
    -                    if (shutdown) {
    -                        break;
    -                    }
                         LOG.error("Exception while listening", e);
    -                    numRetries++;
    +                    if (!(e instanceof SocketTimeoutException)) {
    +                        numRetries++;
    +                    }
    +                }finally {
    --- End diff --

    1. Add a space between **}** and **finally**.
    2. Why do we need to move the code that closes the **ServerSocket** from the **catch** block to the **finally** block? This makes the code at **Line677-Line685** redundant.
    3. If **numRetries** does not count **SocketTimeoutException**s, could this code loop endlessly when SocketTimeoutExceptions keep occurring for a long time? Is this an appropriate way to handle SocketTimeoutException?
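Point 3 of the review above is easier to reason about with a stripped-down model of the retry loop. The sketch below is not the QuorumCnxManager code: the Acceptor interface, the class name and the maxIterations cap are hypothetical, introduced only so the loop can be exercised without real sockets and is guaranteed to terminate. It mirrors the proposed change in one respect only, namely that SocketTimeoutException does not increment numRetries while other IOExceptions do.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class ListenerRetrySketch {
    /** Hypothetical stand-in for the blocking ServerSocket.accept() call. */
    public interface Acceptor {
        void acceptOnce() throws IOException;
    }

    /**
     * Calls acceptOnce() repeatedly until maxRetries counted failures have
     * accumulated or maxIterations calls have been made.
     * Returns the value of the retry counter when the loop stops.
     */
    public static int run(Acceptor acceptor, int maxRetries, int maxIterations) {
        int numRetries = 0;
        for (int i = 0; i < maxIterations && numRetries < maxRetries; i++) {
            try {
                acceptor.acceptOnce();
                numRetries = 0; // a successful accept resets the counter
            } catch (SocketTimeoutException e) {
                // Timeouts are not counted: just go around and accept again.
            } catch (IOException e) {
                numRetries++; // any other I/O failure counts towards the limit
            }
        }
        return numRetries;
    }
}
```

With an acceptor that always times out, the counter never moves and only the artificial maxIterations cap ends the loop, which is exactly the endless-loop concern raised in the review. With an acceptor that always throws a plain IOException, the loop stops after maxRetries attempts, as it did before the change.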
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123067#comment-16123067 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

Github user bitgaoshu closed the pull request at:

    https://github.com/apache/zookeeper/pull/334
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123064#comment-16123064 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

GitHub user bitgaoshu opened a pull request:

    https://github.com/apache/zookeeper/pull/334

    ZOOKEEPER-2836 SocketTimeoutException

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bitgaoshu/zookeeper fix/ZOOKEEPER-2836

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/334.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #334

commit 313406fdb0bd247c897409d8dbf800f6a6d62ce4
Author: fengwei
Date:   2017-08-11T09:10:06Z

    ZOOKEEPER-2836 SocketTimeoutException