[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124606#comment-16124606 ]
ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

Github user maoling commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/334#discussion_r132820615

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,11 @@ public void run() {
                             numRetries = 0;
                         }
                     } catch (IOException e) {
    -                    if (shutdown) {
    -                        break;
    -                    }
                         LOG.error("Exception while listening", e);
    -                    numRetries++;
    +                    if (!(e instanceof SocketTimeoutException)) {
    +                        numRetries++;
    +                    }
    +                }finally {

    --- End diff --

    1. Add a space between **}** and **finally**.
    2. Why do we need to move the code that closes the **ServerSocket** from the **catch** block to the **finally** block? This makes the code in **Line677-Line685** redundant.
    3. If **numRetries** doesn't count **SocketTimeoutException**s, could this code end up in an endless loop if SocketTimeoutExceptions keep occurring for a long time? Is this an appropriate way to handle SocketTimeoutException?


> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2836
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum
>    Affects Versions: 3.4.6
>         Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version: 3.4.6.2.3.2.0-2950
>            Reporter: Amarjeet Singh
>            Priority: Critical
>
> The QuorumCnxManager Listener thread blocks on the ServerSocket accept call, but we
> are getting a SocketTimeoutException on our boxes after 49 days 17 hours.
> As per the current code, there are 3 retries, and after that it logs "_As I'm leaving
> the listener thread, I won't be able to participate in leader election any
> longer: $<hostname>/$<ip>:3888__". Once server nodes reach this state and
> we restart or add a new node, it fails to join the cluster and logs 'WARN
> QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open
> channel to 3 at election address $<hostname>/$<ip>:3888'.
> As there is no timeout specified for the ServerSocket, it should never
> time out, but there are already-discussed issues where people have seen
> this behavior and added explicit checks for SocketTimeoutException, such as
> https://issues.apache.org/jira/browse/KARAF-3325.
> I think we need to handle SocketTimeoutException along similar lines for
> ZooKeeper as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
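
Review point 3 above asks whether leaving SocketTimeoutException out of numRetries could let the listener loop forever. One possible way to address that concern, shown here only as a minimal sketch and not as the code in the pull request, is to count timeouts separately and give them their own bound. The class AcceptRetrySketch, the Acceptor interface, the Exit enum, and the value of MAX_TIMEOUTS are all hypothetical names introduced for illustration; only the retry limit of 3 comes from the issue description.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class AcceptRetrySketch {

    /** Hypothetical stand-in for the blocking ServerSocket.accept() call. */
    interface Acceptor {
        void accept() throws IOException;
    }

    /** Why the listen loop ended (illustrative only). */
    enum Exit { SHUTDOWN, TOO_MANY_ERRORS, TOO_MANY_TIMEOUTS }

    static final int MAX_RETRIES = 3;   // retry limit described in the issue
    static final int MAX_TIMEOUTS = 5;  // assumed bound, not taken from ZooKeeper

    private volatile boolean shutdown = false;

    void halt() {
        shutdown = true;
    }

    Exit listen(Acceptor acceptor) {
        int numRetries = 0;           // non-timeout I/O failures since the last success
        int timeoutsSinceAccept = 0;  // timeouts since the last success
        while (!shutdown) {
            try {
                acceptor.accept();
                // A successful accept resets both counters.
                numRetries = 0;
                timeoutsSinceAccept = 0;
            } catch (SocketTimeoutException e) {
                // Timeouts do not consume a retry, but they are still bounded.
                if (++timeoutsSinceAccept >= MAX_TIMEOUTS) {
                    return Exit.TOO_MANY_TIMEOUTS;
                }
            } catch (IOException e) {
                if (++numRetries >= MAX_RETRIES) {
                    return Exit.TOO_MANY_ERRORS;
                }
            }
        }
        return Exit.SHUTDOWN;
    }
}
```

Whether a timeout bound should make the listener give up, or instead trigger re-creating the socket, is a design decision this sketch does not settle; it only shows that the two failure kinds can be tracked independently so neither can spin without limit.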