Amarjeet Singh created ZOOKEEPER-2836:
-----------------------------------------
Summary: QuorumCnxManager.Listener Thread Better handling of
SocketTimeoutException
Key: ZOOKEEPER-2836
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection, quorum
Affects Versions: 3.4.6
Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1
x86_64 GNU/Linux
Java Version: jdk64/jdk1.8.0_40
zookeeper version: 3.4.6.2.3.2.0-2950
Reporter: Amarjeet Singh
Priority: Critical
The QuorumCnxManager Listener thread blocks in ServerSocket.accept(), but on
our boxes it gets a SocketTimeoutException after 49 days 17 hours. In the
current code the listener retries 3 times and then logs "As I'm leaving
the listener thread, I won't be able to participate in leader election any
longer: $<hostname>/$<ip>:3888". Once a server node reaches this state and we
restart a node or add a new one, that node fails to join the cluster and logs
'WARN QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open
channel to 3 at election address $<hostname>/$<ip>:3888'.
Since no timeout is set on the ServerSocket, accept() should never time out.
However, other projects have already seen and discussed this behaviour and
added explicit checks for SocketTimeoutException, for example
https://issues.apache.org/jira/browse/KARAF-3325.
I think we need to handle SocketTimeoutException along similar lines in
ZooKeeper as well.
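
To illustrate the kind of handling proposed above, here is a minimal sketch of an accept loop that treats SocketTimeoutException as non-fatal and spends its retry budget only on other I/O errors. It is not the actual QuorumCnxManager.Listener code: the class name TolerantAcceptLoop, the run() method, the timeouts counter and the MAX_IO_RETRIES constant are all hypothetical, and the value 3 only mirrors the "3 times retry" mentioned in this report.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch; NOT ZooKeeper's QuorumCnxManager.Listener. */
class TolerantAcceptLoop {
    /** Assumed retry budget for genuine I/O failures (the report mentions 3 retries). */
    static final int MAX_IO_RETRIES = 3;

    /** Number of accept() timeouts seen so far; kept only for illustration. */
    final AtomicInteger timeouts = new AtomicInteger();

    /**
     * Accepts connections until the server socket is closed or the retry
     * budget for non-timeout I/O errors is exhausted.
     */
    void run(ServerSocket server, Consumer<Socket> handler) {
        int ioFailures = 0;
        while (!server.isClosed() && ioFailures < MAX_IO_RETRIES) {
            try {
                Socket client = server.accept();
                ioFailures = 0;            // a successful accept resets the budget
                handler.accept(client);
            } catch (SocketTimeoutException e) {
                // Non-fatal: count it and keep listening without
                // consuming the retry budget.
                timeouts.incrementAndGet();
            } catch (IOException e) {
                if (server.isClosed()) {
                    break;                 // closed on purpose: normal shutdown
                }
                ioFailures++;              // genuine failure: consume one retry
            }
        }
    }
}
```

With a loop shaped like this, a spurious timeout on accept() would only be counted (and could be logged), so the listener keeps serving the election port instead of giving up after three attempts. Whether the real listener should also re-create the ServerSocket after such a timeout is left open here.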