[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133998#comment-16133998 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r134085348
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -638,13 +639,22 @@ public void run() {
                         LOG.info("My election bind port: " + addr.toString());
                         setName(addr.toString());
                         ss.bind(addr);
    +                    ss.setSoTimeout(10 * 1000); // Ten seconds
    +                    long acceptStartTime = System.currentTimeMillis();
                         while (!shutdown) {
    -                        client = ss.accept();
    -                        setSockOpts(client);
    -                        LOG.info("Received connection request "
    -                                + client.getRemoteSocketAddress());
    -                        receiveConnection(client);
    -                        numRetries = 0;
    +                        try {
    +                            client = ss.accept();
    +                            setSockOpts(client);
    +                            LOG.info("Received connection request "
    +                                     + client.getRemoteSocketAddress());
    +                            receiveConnection(client);
    +                            numRetries = 0;
    +                        } catch (SocketTimeoutException e) {
    +                            LOG.warn("The socket is listening for the election accepted "
    +                                     + "an unexpected timeout ["
    +                                     + (System.currentTimeMillis() - acceptStartTime) + "] milliseconds "
    +                                     + "after the call to accept(). Is this an instance of bug ZOOKEEPER-2836?");
    --- End diff --
    
    I agree. I will leave the timeout at 0 (infinite) by default and keep a log statement.
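A minimal sketch of the approach agreed on above, using a hypothetical `TimeoutTolerantListener` class rather than ZooKeeper's actual `Listener`: the socket timeout stays configurable with 0 (block forever) as the default, and a `SocketTimeoutException` from `accept()` is logged and the loop keeps listening instead of counting it toward the retry limit. It also restarts the elapsed-time measurement before each `accept()` call; in the diff above, `acceptStartTime` is set once before the loop, so the logged value would keep growing across iterations. Other `IOException`s simply end the loop here for brevity, unlike the real code, which retries.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not ZooKeeper's QuorumCnxManager.Listener: an accept
// loop in which SocketTimeoutException is logged and ignored rather than
// treated as a failure that ends the thread.
class TimeoutTolerantListener implements Runnable {
    private final ServerSocket ss;
    private volatile boolean shutdown = false;
    final AtomicInteger timeouts = new AtomicInteger();
    final AtomicInteger accepted = new AtomicInteger();

    TimeoutTolerantListener(ServerSocket ss, int soTimeoutMs) throws IOException {
        this.ss = ss;
        // 0 is the JDK default and means accept() blocks with no timeout.
        ss.setSoTimeout(soTimeoutMs);
    }

    @Override
    public void run() {
        while (!shutdown) {
            // Start timing immediately before each accept() call.
            long acceptStart = System.nanoTime();
            try {
                Socket client = ss.accept();
                accepted.incrementAndGet();
                handle(client);
            } catch (SocketTimeoutException e) {
                long waitedMs = (System.nanoTime() - acceptStart) / 1_000_000;
                timeouts.incrementAndGet();
                System.err.println("accept() timed out after " + waitedMs + " ms; continuing to listen");
            } catch (IOException e) {
                // Closing the ServerSocket in shutdown() lands here; any other error is unexpected.
                if (!shutdown) {
                    System.err.println("accept() failed, leaving listener: " + e);
                }
                return;
            }
        }
    }

    // Placeholder: the real Listener passes the socket to receiveConnection();
    // this sketch just closes it.
    private void handle(Socket client) {
        try {
            client.close();
        } catch (IOException ignored) {
            // nothing useful to do in a sketch
        }
    }

    void shutdown() throws IOException {
        shutdown = true;
        ss.close(); // unblocks an accept() that is still waiting
    }
}
```

Passing `soTimeoutMs = 0` keeps the default behaviour of blocking indefinitely; a positive value is only safe together with the `SocketTimeoutException` branch. Timing with `System.nanoTime()` is a choice of this sketch: it is monotonic, so wall-clock adjustments do not distort the logged wait.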


> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2836
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum
>    Affects Versions: 3.4.6
>         Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version:  3.4.6.2.3.2.0-2950 
>            Reporter: Amarjeet Singh
>            Priority: Critical
>
> The QuorumCnxManager Listener thread blocks in ServerSocket.accept(), but on
> our boxes we get a SocketTimeoutException after 49 days 17 hours. With the
> current code, the listener retries 3 times and then logs "_As I'm leaving
> the listener thread, I won't be able to participate in leader election any
> longer: $<hostname>/$<ip>:3888_". Once server nodes reach this state, a node
> that we restart or add fails to join the cluster and logs 'WARN
> QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open
> channel to 3 at election address $<hostname>/$<ip>:3888'.
>         Since no timeout is specified for the ServerSocket, accept() should
> never time out, but others have already reported this behaviour and added
> explicit checks for SocketTimeoutException, e.g.
> https://issues.apache.org/jira/browse/KARAF-3325 .
>         I think ZooKeeper should handle SocketTimeoutException along similar
> lines.
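The reported failure time of 49 days 17 hours is close to 2^32 milliseconds. The short program below only does that arithmetic (the class name `TwoToThe32Millis` is made up for illustration). The match may hint at a 32-bit millisecond counter wrapping somewhere below the application, but that is a hypothesis; this report does not establish a cause.

```java
import java.util.concurrent.TimeUnit;

// Illustrative only: splits a millisecond count into days, hours, minutes and seconds.
class TwoToThe32Millis {
    static long[] breakdown(long millis) {
        long days = TimeUnit.MILLISECONDS.toDays(millis);
        long hours = TimeUnit.MILLISECONDS.toHours(millis) % 24;
        long minutes = TimeUnit.MILLISECONDS.toMinutes(millis) % 60;
        long seconds = TimeUnit.MILLISECONDS.toSeconds(millis) % 60;
        return new long[] {days, hours, minutes, seconds};
    }

    public static void main(String[] args) {
        long wrap = 1L << 32; // 4,294,967,296 ms, the range of an unsigned 32-bit counter
        long[] p = breakdown(wrap);
        // prints: 2^32 ms = 49 days 17 h 2 min 47 s
        System.out.printf("2^32 ms = %d days %d h %d min %d s%n", p[0], p[1], p[2], p[3]);
    }
}
```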



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
