Vishal Khandelwal created ZOOKEEPER-2447:
--------------------------------------------

             Summary: Zookeeper takes good delay when one of the quorum host is not reachable
                 Key: ZOOKEEPER-2447
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2447
             Project: ZooKeeper
          Issue Type: Bug
    Affects Versions: 3.5.0, 3.4.6
            Reporter: Vishal Khandelwal


StaticHostProvider.resolveAndShuffle adds all of the valid quorum addresses 
to a list, shuffles the list, and returns it to the client connection class. 
If the first node after shuffling happens to be unreachable, 
ClientCnxn.SendThread.run keeps trying to connect to the failed host until a 
timeout expires and only then moves on to a different node. This adds a random 
delay to the ZooKeeper connection whenever a host is down. Instead, we could 
check whether each host is reachable in StaticHostProvider and skip it if 
isReachable returns false, the same way we already do for 
UnknownHostException.

This can be tested with the following test code by supplying a valid host that 
is not reachable. For a quick test, comment out Collections.shuffle(tmpList, 
sourceOfRandomness); in StaticHostProvider.resolveAndShuffle.

{code}
  @Test
  public void test() throws Exception {
    EventsWatcher watcher = new EventsWatcher();
    QuorumUtil qu = new QuorumUtil(1);
    qu.startAll();

    // Put the valid-but-unreachable host first, followed by the live quorum.
    ZooKeeper zk =
        new ZooKeeper("<hostname>:2181," + qu.getConnString(), 180 * 1000, watcher);

    watcher.waitForConnected(CONNECTION_TIMEOUT * 5);
    Assert.assertTrue("connection Established", watcher.isConnected());
    zk.close();
  }
{code}

The following fix can be added to StaticHostProvider.resolveAndShuffle:

{code}
  if (taddr.isReachable(4000)) { // timeout in ms; can be some other value
    tmpList.add(new InetSocketAddress(taddr, address.getPort()));
  }
{code}
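
For illustration, the proposed filtering step can be sketched as a standalone 
helper. This is only a sketch under assumptions, not the actual 
StaticHostProvider code: the class ReachableHostFilter, its method names, the 
injectable probe, and the fallback to the unfiltered list when no host passes 
are all hypothetical. Only InetAddress.isReachable and Collections.shuffle 
are taken from the description above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical helper for illustration only; not part of ZooKeeper.
class ReachableHostFilter {

    // Probe based on InetAddress.isReachable; an I/O error counts as "not reachable".
    static Predicate<InetAddress> isReachableWithin(int timeoutMs) {
        return ip -> {
            try {
                return ip.isReachable(timeoutMs);
            } catch (IOException e) {
                return false;
            }
        };
    }

    // Keeps only addresses whose host passes the probe, then shuffles the result.
    static List<InetSocketAddress> filterAndShuffle(
            List<InetSocketAddress> resolved,
            Predicate<InetAddress> probe,
            Random sourceOfRandomness) {
        List<InetSocketAddress> tmpList = new ArrayList<>();
        for (InetSocketAddress address : resolved) {
            InetAddress taddr = address.getAddress(); // null if the address is unresolved
            if (taddr != null && probe.test(taddr)) {
                tmpList.add(address);
            }
        }
        // Assumption: if no host passes the probe, fall back to the full list
        // so the caller still has something to try.
        if (tmpList.isEmpty()) {
            tmpList.addAll(resolved);
        }
        Collections.shuffle(tmpList, sourceOfRandomness);
        return tmpList;
    }
}
```

A caller would pass ReachableHostFilter.isReachableWithin(4000) as the probe. 
Note that each unreachable host can block the probe for up to the full 
timeout, and that isReachable may report false for a live host when ICMP echo 
is filtered, so the timeout value and the probe itself are judgment calls.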



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)