[
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131588#comment-16131588
]
ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------
Github user maoling commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/336#discussion_r133865685
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
numRetries = 0;
}
} catch (IOException e) {
- if (shutdown) {
- break;
- }
LOG.error("Exception while listening", e);
- numRetries++;
+ if (!(e instanceof SocketTimeoutException)) {
--- End diff --
- Can we reproduce this issue (haha, 49 days)? Theoretically, this should never
happen. According to
[KARAF-3325](https://issues.apache.org/jira/browse/KARAF-3325) and
[tomcat-56684](https://bz.apache.org/bugzilla/show_bug.cgi?id=56684), they also
did not find the root cause; they just did something like
[this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354)
to add some protection against the issue.
- One assumption is that ServerSocket.accept() uses a default "infinite" value
(2^32 - 1 = 4294967295 ms) when no timeout is specified or setSoTimeout(0) is
called. From the setSoTimeout Javadoc:
> a call to accept() for this ServerSocket will block for only this
> amount of time. If the timeout expires, a java.net.SocketTimeoutException is
> raised, though the ServerSocket is still valid. The option must be enabled
> prior to entering the blocking operation to have effect. The timeout
> must be > 0. A timeout of zero is interpreted as an infinite timeout.

That would explain why this issue always happens after 49 days 17
hours (4294967295/1000/60/60/24 = 49.7 days).
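As a quick check of the arithmetic above, here is a minimal sketch that converts 2^32 - 1 milliseconds into days, hours and minutes. It only verifies the numbers; it does not show that the JDK or the OS actually uses this value as its "infinite" timeout, which remains an assumption. The class and method names are made up for illustration.

```java
import java.time.Duration;

final class TimeoutWraparound {
    // 2^32 - 1, the value from the comment, interpreted as milliseconds.
    static final long MAX_UINT32_MILLIS = (1L << 32) - 1;

    // The same value as a Duration, so the JDK does the unit conversions.
    static Duration asDuration() {
        return Duration.ofMillis(MAX_UINT32_MILLIS);
    }

    public static void main(String[] args) {
        Duration d = asDuration();
        long days = d.toDays();            // whole days
        long hours = d.toHours() % 24;     // leftover whole hours
        long minutes = d.toMinutes() % 60; // leftover whole minutes
        System.out.printf("%d ms = %d days %d hours %d minutes%n",
                MAX_UINT32_MILLIS, days, hours, minutes);
    }
}
```

Running it reports 49 days 17 hours 2 minutes, which matches the "49 days 17 hours" seen in the report.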
> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection, quorum
> Affects Versions: 3.4.6
> Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1
> x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version: 3.4.6.2.3.2.0-2950
> Reporter: Amarjeet Singh
> Priority: Critical
>
> QuorumCnxManager Listener thread blocks the ServerSocket on accept, but we are
> getting SocketTimeoutException on our boxes after 49 days 17 hours. As per the
> current code there are 3 retries, and after that it logs "_As I'm leaving
> the listener thread, I won't be able to participate in leader election any
> longer: $<hostname>/$<ip>:3888__". Once server nodes reach this state and
> we restart or add a new node, it fails to join the cluster and logs 'WARN
> QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open
> channel to 3 at election address $<hostname>/$<ip>:3888'.
> As there is no timeout specified for the ServerSocket, it should never
> time out, but there are some already discussed issues where people have seen
> this behavior and added explicit checks for SocketTimeoutException, like
> https://issues.apache.org/jira/browse/KARAF-3325 .
> I think we need to handle SocketTimeoutException along similar lines for
> ZooKeeper as well.
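The handling proposed in the description can be illustrated with a small, self-contained sketch. This is not the QuorumCnxManager code or the actual patch: `TolerantAcceptLoop`, `TimeoutListener`, `acceptOne` and `demo` are hypothetical names, and the demo sets a short SO_TIMEOUT only to provoke `SocketTimeoutException` deterministically (the real listener sets no timeout). The point is that a timeout leaves the ServerSocket valid, so it should not count against the retry budget.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class TolerantAcceptLoop {
    // Mirrors the "3 retries" mentioned in the issue description.
    static final int MAX_RETRIES = 3;

    /** Hypothetical hook so the demo can react to each timeout. */
    interface TimeoutListener {
        void onTimeout(int timeoutsSoFar) throws IOException;
    }

    /** Accepts one connection; timeouts are retried without limit. */
    static Socket acceptOne(ServerSocket server, TimeoutListener listener)
            throws IOException {
        int numRetries = 0;
        int timeouts = 0;
        while (true) {
            try {
                return server.accept();
            } catch (SocketTimeoutException e) {
                // The ServerSocket is still valid: do not count this as a retry.
                timeouts++;
                listener.onTimeout(timeouts);
            } catch (IOException e) {
                // Any other I/O failure still consumes the retry budget.
                numRetries++;
                if (numRetries >= MAX_RETRIES) {
                    throw e;
                }
            }
        }
    }

    /** Provokes 5 timeouts (more than MAX_RETRIES), then connects a client. */
    static int demo() {
        int[] seen = {0};
        Socket[] client = new Socket[1];
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            server.setSoTimeout(20); // demo only: forces quick timeouts
            int port = server.getLocalPort();
            Socket accepted = acceptOne(server, n -> {
                seen[0] = n;
                if (n == 5) {
                    // The connection completes via the listen backlog.
                    client[0] = new Socket(loopback, port);
                }
            });
            accepted.close();
            client[0].close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("timeouts tolerated before accept: " + demo());
    }
}
```

In this sketch the loop survives five consecutive timeouts, more than the retry limit, and still accepts the next connection, whereas a loop that counted every IOException would have given up after the third.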