[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414284#comment-15414284
 ] 

Michael Han commented on ZOOKEEPER-2508:
----------------------------------------

Thanks for the quick update [~arshad.mohammad]. New tests are very clean with 
the abstraction, comparing the old tests where each test reinvented the wheel.

I have a few more comments on 04.patch:

* In "ObserverTest.testObserver" two checks are removed:
{code}
Assert.assertNotSame("Client is still connected to non-quorate cluster", 
KeeperState.SyncConnected,lastEvent.getState());
Assert.assertTrue("Client didn't reconnect to quorate ensemble (state was" + 
lastEvent.getState() + ")", 
    (KeeperState.SyncConnected==lastEvent.getState() ||         
    KeeperState.Expired==lastEvent.getState())); 
{code}

This is a functional change to the test case because the new abstraction only 
checks initially when the ZK handle was created. The removed checks seems very 
reasonable to have from a functional testing perspective, so I think this 
should be fixed. If abstracting these checks is cumbersome maybe we can just 
leave this test case as it is without using the new createZKClient introduced 
in this patch.

*  For 'public static ZooKeeper createZKClient(String cxnString, int 
sessionTimeout)', maybe using Assert explicitly is better than throwing a 
TimeoutException? For example:
{code}
public static ZooKeeper createZKClient(String cxnString, int sessionTimeout) 
throws Exception {
    CountdownWatcher watcher = new CountdownWatcher(); 
    ZooKeeper zk = new ZooKeeper(cxnString, sessionTimeout, watcher);
    try {
        watcher.waitForConnected(CONNECTION_TIMEOUT);
    } catch (InterruptedException e) {
        // Ignore interrupt.. not interested.
    } catch (TimeoutException e) {
        Assert.fail("ZooKeeper client can not connect to " + cxnString);
    }
    return zk;
}
{code}
By using Assert we make the contract clear that we always expect a functional 
connected ZK client before proceeding to execute rest of tests; it also makes 
the test report easier to read, in case such assumption is violated as we would 
know what exact expectation was failed (instead of digging into code and 
figured out why).

* I am thinking we should make the 'CONNECTION_TIMEOUT' a parameter in the 
interface of 'createZKClient', so each test case can tune the timeout based on 
their needs, given that there are a lot of variations of timeouts per test.

A side note regarding test timeout variation among tests: some requires as 
small as 5 seconds while others require 10, 20, 30 or more. This might be a by 
design but could also be an optimization opportunity for us to normalize the 
timeout value so tests could run faster. This however can be tackled in a 
separate JIRA.

> Many ZooKeeper tests are flaky because they proceed with zk operation without 
> connecting to ZooKeeper server.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2508
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2508
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: tests
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>             Fix For: 3.5.3
>
>         Attachments: ZOOKEEPER-2508-01.patch, ZOOKEEPER-2508-02.patch, 
> ZOOKEEPER-2508-03.patch, ZOOKEEPER-2508-04.patch
>
>
> Many ZooKeeper tests are flaky because they proceed with zk operation without 
> connecting to ZooKeeper server.
> Recently in our build 
> {{org.apache.zookeeper.server.ZooKeeperServerMainTest.testStandalone()}} 
> failed.
> After analyzing we found that it is failed because it is not waiting for 
> ZooKeeper client get connected to server. In this case normally zookeeper 
> client gets connected immediately but if not connected immediately the test 
> case is bound to fail.
> Not only ZooKeeperServerMainTest but there are many other classes which have 
> such test cases. This jira is to address all those test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to