Amarjeet Singh created ZOOKEEPER-2836:
-----------------------------------------

             Summary: QuorumCnxManager.Listener Thread Better handling of 
SocketTimeoutException
                 Key: ZOOKEEPER-2836
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
             Project: ZooKeeper
          Issue Type: Bug
          Components: leaderElection, quorum
    Affects Versions: 3.4.6
         Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 
x86_64 GNU/Linux
Java Version: jdk64/jdk1.8.0_40
zookeeper version:  3.4.6.2.3.2.0-2950 
            Reporter: Amarjeet Singh
            Priority: Critical


QuorumCnxManager Listener thread blocks SocketServer on accept but we are 
getting SocketTimeoutException  on our boxes after 49days 17 hours . As per 
current code there is a 3 times retry and after that it says "_As I'm leaving 
the listener thread, I won't be able to participate in leader election any 
longer: $<hostname>/$<ip>:3888__" , Once server nodes reache this state and we 
restart or add a new node ,it fails to join cluster and logs 'WARN  
QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open 
channel to 3 at election address $<hostname>/$<ip>:3888' .


        As there is no timeout specified for ServerSocket it should never 
timeout but there are some already discussed issues where people have seen this 
issue and added checks for SocketTimeoutException explicitly like 
https://issues.apache.org/jira/browse/KARAF-3325 . 

        I think we need to handle SocketTimeoutException on similar lines for 
zookeeper as well 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to