[ https://issues.apache.org/jira/browse/ZOOKEEPER-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406524#comment-15406524 ]
Edward Ribeiro commented on ZOOKEEPER-2447:
-------------------------------------------

Hey [~fpj], no problem at all! :) Feel free to work on it [~vishk].

PS: I was absent last month, but getting back now. Still catching up with the latest ZK issues. :)

> Zookeeper adds good delay when one of the quorum host is not reachable
> -----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2447
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2447
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Vishal Khandelwal
>            Assignee: Vishal Khandelwal
>             Fix For: 3.5.3, 3.6.0
>
>         Attachments: ZOOKEEPER-2447-MinConnectTimeoutOnly.patch,
> ZOOKEEPER-2447.3.5.patch, withfix.txt, withoutFix.txt
>
>
> StaticHostProvider --> resolveAndShuffle method adds all of the addresses which
> are valid in the quorum to the list, shuffles them and sends them back to the
> client connection class. If, after shuffling, the first node happens to be one
> which is not reachable, ClientCnxn.SendThread.run will keep trying to connect
> to the failed host until a timeout and then move to a different node. This adds
> a random delay to the ZooKeeper connection when a host is down. Rather, we
> could check whether the host is reachable in StaticHostProvider and ignore it
> if isReachable is false, the same as we do for UnknownHostException.
> This can be tested using the following test code by providing a valid host
> which is not reachable.
> For a quick test, comment out Collections.shuffle(tmpList, sourceOfRandomness);
> in StaticHostProvider.resolveAndShuffle
> {code}
> @Test
> public void test() throws Exception {
>     EventsWatcher watcher = new EventsWatcher();
>     QuorumUtil qu = new QuorumUtil(1);
>     qu.startAll();
>
>     // The first entry is a resolvable host that is not reachable.
>     ZooKeeper zk =
>         new ZooKeeper("<hostname>:2181," + qu.getConnString(), 180 * 1000, watcher);
>
>     watcher.waitForConnected(CONNECTION_TIMEOUT * 5);
>     Assert.assertTrue("connection Established", watcher.isConnected());
>     zk.close();
> }
> {code}
> The following fix can be added to StaticHostProvider.resolveAndShuffle
> {code}
> // 4000 ms timeout; can be some other value
> if (taddr.isReachable(4000)) {
>     tmpList.add(new InetSocketAddress(taddr, address.getPort()));
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
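
The reachability filter proposed in the issue description could be sketched roughly as follows. This is a standalone illustration, not the attached patch and not ZooKeeper code: the class `ReachableHostFilter` and its methods `filterAndShuffle` and `isReachableWithin` are hypothetical names, and the probe is passed in as a `Predicate` only so the logic can be exercised without a network.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of ZooKeeper.
class ReachableHostFilter {

    // Keeps only the addresses whose IP passes the probe, then shuffles them.
    // Unresolved addresses (getAddress() == null) are dropped as well.
    static List<InetSocketAddress> filterAndShuffle(
            List<InetSocketAddress> candidates,
            Predicate<InetAddress> probe) {
        List<InetSocketAddress> kept = new ArrayList<>();
        for (InetSocketAddress addr : candidates) {
            InetAddress ip = addr.getAddress();
            if (ip != null && probe.test(ip)) {
                kept.add(addr);
            }
        }
        Collections.shuffle(kept);
        return kept;
    }

    // Probe based on InetAddress.isReachable; an I/O error counts as unreachable.
    static Predicate<InetAddress> isReachableWithin(int timeoutMs) {
        return ip -> {
            try {
                return ip.isReachable(timeoutMs);
            } catch (IOException e) {
                return false;
            }
        };
    }
}
```

A caller would use something like `ReachableHostFilter.filterAndShuffle(resolved, ReachableHostFilter.isReachableWithin(4000))`. Two caveats apply to any approach of this kind: `InetAddress.isReachable` typically relies on ICMP echo or a TCP connection to port 7, so a healthy server behind a firewall may be reported as unreachable, and the caller has to decide what to do when every host is filtered out.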