[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2018-11-21 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695441#comment-16695441
 ] 

Michael K. Edwards commented on ZOOKEEPER-2836:
---

Fix needed for 3.5.5?

> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --
>
> Key: ZOOKEEPER-2836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection, quorum
> Affects Versions: 3.4.6
> Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 
> x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version:  3.4.6.2.3.2.0-2950 
> Reporter: Amarjeet Singh
> Assignee: gaoshu
> Priority: Critical
>
> The QuorumCnxManager Listener thread blocks on ServerSocket.accept(), but we 
> are getting a SocketTimeoutException on our boxes after 49 days 17 hours. As per 
> the current code there are 3 retries, and after that it logs "_As I'm leaving 
> the listener thread, I won't be able to participate in leader election any 
> longer: $/$:3888__". Once server nodes reach this state and 
> we restart or add a new node, it fails to join the cluster and logs 'WARN  
> QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open 
> channel to 3 at election address $/$:3888'.
> As there is no timeout specified for the ServerSocket, it should never 
> time out, but there are some already-discussed issues where people have seen 
> this problem and added explicit checks for SocketTimeoutException 
> (see https://issues.apache.org/jira/browse/KARAF-3325).
> I think we need to handle SocketTimeoutException along similar lines in 
> ZooKeeper as well.
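
The report's premise, that a ServerSocket with no timeout set should block in accept() indefinitely, can be checked with a small sketch. The class and method names below are hypothetical and are not part of ZooKeeper; the only API facts relied on are that `ServerSocket.getSoTimeout()` returns 0 when no timeout is set and that 0 means an infinite timeout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper for illustration only; not ZooKeeper code.
final class DefaultSoTimeoutCheck {

    /** Returns SO_TIMEOUT of a freshly bound ServerSocket on which setSoTimeout() was never called. */
    static int defaultSoTimeout() {
        // Port 0 asks the OS for any free port; loopback keeps the socket local.
        try (ServerSocket ss = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return ss.getSoTimeout(); // 0 is documented as "infinite timeout"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default SO_TIMEOUT = " + defaultSoTimeout());
    }
}
```

If this prints 0, any SocketTimeoutException from accept() on such a socket is unexpected at the Java level, which is what the report describes.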



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139876#comment-16139876
 ] 

Hadoop QA commented on ZOOKEEPER-2836:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/962//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/962//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/962//console

This message is automatically generated.

[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139864#comment-16139864
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on the issue:

https://github.com/apache/zookeeper/pull/336
  
gaoshu, thank you very much. @hanm 


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139863#comment-16139863
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on the issue:

https://github.com/apache/zookeeper/pull/336
  
OK, I have made this patch apply to the 3.5 branch. 
[PR](https://github.com/apache/zookeeper/pull/344)


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139862#comment-16139862
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

GitHub user bitgaoshu opened a pull request:

https://github.com/apache/zookeeper/pull/344

ZOOKEEPER-2836: Fix SocketTimeoutException

apply patch from master to branch-3.5

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bitgaoshu/zookeeper branch-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/344.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #344


commit 86701965dba3dac9aad9cc5263661e09255c766a
Author: bitgaoshu 
Date:   2017-08-24T06:23:32Z

ZOOKEEPER-2836: Fix SocketTimeoutException




[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139558#comment-16139558
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/336
  
@bitgaoshu What's your username on JIRA 
(https://issues.apache.org/jira/projects/ZOOKEEPER)? Please create an account 
on JIRA if you haven't and let me know the user name. Since you contributed the 
patch, I should assign the JIRA issue to you so you get the credits on your 
contribution.


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138773#comment-16138773
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user skamille commented on the issue:

https://github.com/apache/zookeeper/pull/336
  
@bitgaoshu if you could make this patch apply to the 3.5 branch I will 
commit it there as well. Unfortunately that branch has a significant divergence 
from master and I suspect we want this fix in before we release 3.6


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137737#comment-16137737
 ] 

Hudson commented on ZOOKEEPER-2836:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3509 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/3509/])
ZOOKEEPER-2836: fix SocketTimeoutException (camille: rev 
52aff3eca439bba70f2b4d175ce331754dcd03db)
* (edit) src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137655#comment-16137655
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user skamille commented on the issue:

https://github.com/apache/zookeeper/pull/336
  
+1 merging


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137653#comment-16137653
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/336


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134030#comment-16134030
 ] 

Hadoop QA commented on ZOOKEEPER-2836:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/949//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/949//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/949//console

This message is automatically generated.

[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134000#comment-16134000
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r134085352
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -638,13 +639,22 @@ public void run() {
 LOG.info("My election bind port: " + addr.toString());
 setName(addr.toString());
 ss.bind(addr);
+ss.setSoTimeout(10 * 1000); // Ten seconds
--- End diff --

I also think the timeout does not need to be set. I simply followed 
[this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354).


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133998#comment-16133998
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r134085348
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -638,13 +639,22 @@ public void run() {
 LOG.info("My election bind port: " + addr.toString());
 setName(addr.toString());
 ss.bind(addr);
+ss.setSoTimeout(10 * 1000); // Ten seconds
+long acceptStartTime = System.currentTimeMillis();
 while (!shutdown) {
-client = ss.accept();
-setSockOpts(client);
-LOG.info("Received connection request "
-+ client.getRemoteSocketAddress());
-receiveConnection(client);
-numRetries = 0;
+try {
+client = ss.accept();
+setSockOpts(client);
+LOG.info("Received connection request "
+ + client.getRemoteSocketAddress());
+receiveConnection(client);
+numRetries = 0;
+} catch (SocketTimeoutException e) {
+LOG.warn("The socket is listening for the election accepted "
+ + "an unexpected timeout ["
+ + (System.currentTimeMillis() - acceptStartTime) + "]milliseconds"
+ + "after the call to accept(). is this an instance of bug ZOOKEEPER-2836?");
--- End diff --

I agree.  I will leave the timeout at 0 by default and leave a log 
statement.


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133178#comment-16133178
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user skamille commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133994057
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -638,13 +639,22 @@ public void run() {
 LOG.info("My election bind port: " + addr.toString());
 setName(addr.toString());
 ss.bind(addr);
+ss.setSoTimeout(10 * 1000); // Ten seconds
+long acceptStartTime = System.currentTimeMillis();
 while (!shutdown) {
-client = ss.accept();
-setSockOpts(client);
-LOG.info("Received connection request "
-+ client.getRemoteSocketAddress());
-receiveConnection(client);
-numRetries = 0;
+try {
+client = ss.accept();
+setSockOpts(client);
+LOG.info("Received connection request "
+ + client.getRemoteSocketAddress());
+receiveConnection(client);
+numRetries = 0;
+} catch (SocketTimeoutException e) {
+LOG.warn("The socket is listening for the election accepted "
+ + "an unexpected timeout ["
+ + (System.currentTimeMillis() - acceptStartTime) + "]milliseconds"
+ + "after the call to accept(). is this an instance of bug ZOOKEEPER-2836?");
--- End diff --

I don't love this error message and it doesn't make sense because we've set 
the socket timeout above so it will never time out based on that weird possible 
JVM error. So either we just continue after a timeout, or leave the timeout at 
0 and leave a log statement that indicates it timed out unexpectedly but will 
retry, but the double fix doesn't really make sense to me.
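
The second option described above (leave the timeout at 0, log an unexpected timeout, and retry) can be sketched roughly as follows. This is not the merged patch: `ListenerLoopSketch`, `AcceptStep`, and `maxIterations` are hypothetical names introduced so the loop can be exercised without a real socket, and `System.err` stands in for the logger. The retry limit of 3 is taken from the issue description.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical sketch for illustration only; not the actual QuorumCnxManager code.
final class ListenerLoopSketch {

    /** Stand-in for ss.accept() plus connection handling. */
    interface AcceptStep {
        void acceptOnce() throws IOException;
    }

    /**
     * Runs at most maxIterations accept steps. An unexpected SocketTimeoutException is
     * logged and retried without counting toward the retry limit; any other IOException
     * increments the retry counter. Returns true if the loop gave up because the
     * retry limit was reached.
     */
    static boolean run(AcceptStep step, int maxRetries, int maxIterations) {
        int numRetries = 0;
        for (int i = 0; i < maxIterations && numRetries < maxRetries; i++) {
            try {
                step.acceptOnce();
                numRetries = 0; // a successful accept resets the counter
            } catch (SocketTimeoutException e) {
                // Caught first because SocketTimeoutException is a subclass of IOException.
                System.err.println("accept() timed out unexpectedly, but will retry");
            } catch (IOException e) {
                System.err.println("Exception while listening: " + e);
                numRetries++;
            }
        }
        return numRetries >= maxRetries;
    }

    public static void main(String[] args) {
        boolean gaveUp = run(() -> { throw new SocketTimeoutException("simulated"); }, 3, 10);
        System.out.println("gave up after repeated timeouts: " + gaveUp);
    }
}
```

With this shape, repeated timeouts never push the listener out of its loop, while genuine I/O failures still hit the three-retry limit.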


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133137#comment-16133137
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user skamille commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133989279
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -638,13 +639,22 @@ public void run() {
 LOG.info("My election bind port: " + addr.toString());
 setName(addr.toString());
 ss.bind(addr);
+ss.setSoTimeout(10 * 1000); // Ten seconds
--- End diff --

By setting the timeout to 10s the timeout exception will fire potentially 
every 10s, is that what we want below?
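
To illustrate the concern, here is a minimal sketch showing that once SO_TIMEOUT is non-zero, every accept() call with no incoming connection ends in a SocketTimeoutException. The class name is hypothetical, and a 50 ms timeout is assumed in place of the patch's 10 seconds so the demo finishes quickly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo for illustration only; not ZooKeeper code.
final class AcceptTimeoutDemo {

    /** Calls accept() the given number of times with the given SO_TIMEOUT and counts timeouts. */
    static int countTimeouts(int soTimeoutMillis, int attempts) {
        int timeouts = 0;
        // Port 0 picks a free port on loopback; nothing is expected to connect to it.
        try (ServerSocket ss = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            ss.setSoTimeout(soTimeoutMillis);
            for (int i = 0; i < attempts; i++) {
                try (Socket client = ss.accept()) {
                    // Not expected in this demo: no client connects.
                } catch (SocketTimeoutException e) {
                    timeouts++; // fires once per accept() call that sees no connection
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return timeouts;
    }

    public static void main(String[] args) {
        System.out.println("timeouts in 3 attempts: " + countTimeouts(50, 3));
    }
}
```

On an idle election port, a handler that logs at warn level would therefore emit one message per timeout period.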


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131788#comment-16131788
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133886561
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
--- End diff --

Updated. I had previously assumed that `closeSocket(client);` should also be 
executed when an exception is thrown.


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131777#comment-16131777
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133885327
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
-break;
-}
 LOG.error("Exception while listening", e);
-numRetries++;
+if (!(e instanceof SocketTimeoutException)) {
--- End diff --

- Updated.

- I checked the native method `java.net.PlainSocketImpl.socketAccept(Native 
Method)` in 
[openjdk](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/9d617cfd6717/src/solaris/native/java/net/PlainSocketImpl.c),
 **lines 709-721**, where a timeout of 0 is changed to -1, and a timeout of -1 is 
then interpreted as an infinite timeout. In some cases, [-1 was interpreted as a 
large positive integer](https://lwn.net/Articles/483078/), which could explain 
why this issue always happened after 49 days. This is only my wild conjecture.
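
The arithmetic behind this conjecture can be checked directly. The sketch below assumes, purely for illustration, that the -1 timeout is reinterpreted as an unsigned 32-bit count of milliseconds; the class and method names are hypothetical. It only shows that the resulting duration matches the "49 days 17 hours" in the report and does not confirm the root cause.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not ZooKeeper or JDK code.
final class WrapAroundArithmetic {

    /** Reinterprets a signed 32-bit value as unsigned, e.g. -1 becomes 4294967295. */
    static long asUnsignedMillis(int signedMillis) {
        return Integer.toUnsignedLong(signedMillis);
    }

    /** Formats a millisecond count as "<days>d <hours>h <minutes>m", dropping seconds. */
    static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        long days = d.toDays();
        long hours = d.toHours() % 24;
        long minutes = d.toMinutes() % 60;
        return days + "d " + hours + "h " + minutes + "m";
    }

    public static void main(String[] args) {
        long wrapped = asUnsignedMillis(-1);
        System.out.println(wrapped + " ms = " + describe(wrapped));
        // prints "4294967295 ms = 49d 17h 2m"
    }
}
```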


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131691#comment-16131691
 ] 

Hadoop QA commented on ZOOKEEPER-2836:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//console

This message is automatically generated.


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131588#comment-16131588
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user maoling commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133865685
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
-break;
-}
 LOG.error("Exception while listening", e);
-numRetries++;
+if (!(e instanceof SocketTimeoutException)) {
--- End diff --

-  Can we reproduce this issue (haha, 49 days)? In theory this should never 
happen. According to 
[KARAF-3325](https://issues.apache.org/jira/browse/KARAF-3325) and 
[tomcat-56684](https://bz.apache.org/bugzilla/show_bug.cgi?id=56684), they also 
did not find the root cause; they just added some protection against the issue, 
like 
[this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354).
-  One assumption is that ServerSocket.accept() uses the default infinite value 
(2 ^ 32 - 1 = 4294967295) when no timeout is specified or after 
setSoTimeout(0). The `setSoTimeout` Javadoc says:
> a call to accept() for this ServerSocket will block for only this 
amount of time. If the timeout expires, a java.net.SocketTimeoutException is 
raised, though the ServerSocket is still valid. The option must be enabled 
prior to entering the blocking operation to have effect. The timeout must be > 
0. A timeout of zero is interpreted as an infinite timeout.

   So this issue always happened after 49 days 17 hours 
(4294967295/1000/60/60/24 = 49.7 days).
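
The arithmetic above can be checked with a short sketch that breaks 2^32 - 1 milliseconds into days, hours and minutes. The class and method names are hypothetical; the only input is the value quoted in the comment.

```java
import java.time.Duration;

final class TimeoutArithmetic {
    // Formats a millisecond count as whole days, plus the leftover hours and minutes.
    static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        long days = d.toDays();
        long hours = d.toHours() % 24;
        long minutes = d.toMinutes() % 60;
        return days + " days " + hours + " hours " + minutes + " minutes";
    }

    public static void main(String[] args) {
        long maxUnsigned32 = (1L << 32) - 1; // 4294967295
        System.out.println(describe(maxUnsigned32)); // 49 days 17 hours 2 minutes
    }
}
```

The result is consistent with the "49 days 17 hours" in the original report, which is why the unsigned 32-bit millisecond value is a plausible suspect, though this alone does not establish the root cause.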



[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131577#comment-16131577
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user maoling commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133864927
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
--- End diff --

Why do we need to move the code block at **Line650-Line652** to 
**Line665-Line667**?



[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125339#comment-16125339
 ] 

Hadoop QA commented on ZOOKEEPER-2836:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/941//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/941//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/941//console

This message is automatically generated.


[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125335#comment-16125335
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/334#discussion_r132887998
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,11 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
-break;
-}
 LOG.error("Exception while listening", e);
-numRetries++;
+if (!(e instanceof SocketTimeoutException)) {
+numRetries++;
+}
+}finally {
--- End diff --

This is my first time committing code on GitHub. I opened a new 
[pr](https://github.com/apache/zookeeper/pull/336), which has been fixed 
according to your comments. Sorry for the inconvenience.



[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125320#comment-16125320
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

GitHub user bitgaoshu opened a pull request:

https://github.com/apache/zookeeper/pull/336

ZOOKEEPER-2836 fix SocketTimeoutException



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bitgaoshu/zookeeper ZOOKEEPER-2836

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/336.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #336


commit 3653e6ddc21589355fb06c98aa60665fce4a4e24
Author: bitgaoshu 
Date:   2017-08-14T07:02:16Z

ZOOKEEPER-2836 fix SocketTimeoutException





[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125184#comment-16125184
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/334#discussion_r132868731
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,11 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
-break;
-}
 LOG.error("Exception while listening", e);
-numRetries++;
+if (!(e instanceof SocketTimeoutException)) {
+numRetries++;
+}
+}finally {
--- End diff --

1. I have removed the **finally** code block in my newest code.
2. According to the JIRA issue, the **SocketTimeoutException** was likely 
thrown about every 49 days, maybe because of an unsigned int -1. Or, we can 
follow [Karaf](https://issues.apache.org/jira/browse/KARAF-3325) to fix this 
code, but I think it was thrown too often.



[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124606#comment-16124606
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user maoling commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/334#discussion_r132820615
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,11 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
-break;
-}
 LOG.error("Exception while listening", e);
-numRetries++;
+if (!(e instanceof SocketTimeoutException)) {
+numRetries++;
+}
+}finally {
--- End diff --

1. Add a space between **}** and **finally**.
2. Why do we need to move the code that closes the **ServerSocket** from the 
**catch** code block to the **finally** code block? This will make the code at 
**Line677-Line685** redundant.
3. If **numRetries** doesn't count **SocketTimeoutException**s, could this code 
end up in an endless loop if SocketTimeoutExceptions keep happening for a long 
time? Is this way of handling SocketTimeoutException appropriate?
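
The trade-off raised in point 3 can be illustrated with a minimal, self-contained sketch of an accept loop that counts timeouts separately from other I/O errors. This is not the ZooKeeper code: `AcceptLoopSketch`, `Result`, `run`, `demo`, `MAX_RETRIES` and the `maxPasses` bound are all hypothetical names introduced here, and the short `SO_TIMEOUT` exists only to force a `SocketTimeoutException` quickly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class AcceptLoopSketch {
    // Hypothetical retry budget, mirroring the "3 times retry" in the report.
    static final int MAX_RETRIES = 3;

    // Counters returned for inspection; not part of ZooKeeper.
    static final class Result {
        int timeouts;
        int retries;
    }

    // Calls accept() at most maxPasses times. A SocketTimeoutException is
    // counted separately and does not consume the retry budget; any other
    // IOException does.
    static Result run(ServerSocket ss, int maxPasses) {
        Result r = new Result();
        for (int i = 0; i < maxPasses && r.retries < MAX_RETRIES; i++) {
            try (Socket client = ss.accept()) {
                r.retries = 0; // a successful accept resets the budget
            } catch (SocketTimeoutException e) {
                r.timeouts++;
            } catch (IOException e) {
                r.retries++;
            }
        }
        return r;
    }

    // Demo helper: binds an ephemeral loopback port with a short SO_TIMEOUT
    // so that accept() times out quickly when nobody connects.
    static Result demo(int maxPasses, int timeoutMillis) {
        try (ServerSocket ss = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            ss.setSoTimeout(timeoutMillis);
            return run(ss, maxPasses);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result r = demo(5, 50);
        System.out.println("timeouts=" + r.timeouts + " retries=" + r.retries);
    }
}
```

Without the `maxPasses` bound (or a shutdown flag), a listener that only ever sees timeouts would loop indefinitely, which is exactly the concern in point 3. Whether that is acceptable depends on how often the timeout can actually fire.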



[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123067#comment-16123067
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu closed the pull request at:

https://github.com/apache/zookeeper/pull/334



[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123064#comment-16123064
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

GitHub user bitgaoshu opened a pull request:

https://github.com/apache/zookeeper/pull/334

ZOOKEEPER-2836 SocketTimeoutException



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bitgaoshu/zookeeper fix/ZOOKEEPER-2836

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/334.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #334


commit 313406fdb0bd247c897409d8dbf800f6a6d62ce4
Author: fengwei 
Date:   2017-08-11T09:10:06Z

ZOOKEEPER-2836 SocketTimeoutException



