[jira] [Commented] (CURATOR-355) Curator client fails when connecting to read-only ensemble

Jordan Zimmerman (JIRA) Mon, 10 Oct 2016 10:58:39 -0700

    [ 
https://issues.apache.org/jira/browse/CURATOR-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562989#comment-15562989
 ]


Jordan Zimmerman commented on CURATOR-355:
------------------------------------------

Firstly, the call to 
`client.getZookeeperClient().blockUntilConnectedOrTimedOut();` is unnecessary 
as Curator does this internally. 

Curator 3.0 has better connection timeout behavior than Curator 2.0. In 2.0, 
the connection timeout is applied for each iteration of the Retry Policy. So, 
in this case, you'd expect `getData()` to wait 15 seconds * 3, plus 5 seconds * 
3 for a total of one minute. In my recreation of your test that's exactly what 
I see:

```
        System.setProperty("readonlymode.enabled", "true");
        TestingCluster cluster = new TestingCluster(3);
        cluster.getServers().get(0).stop();
        cluster.getServers().get(1).stop();

        CuratorFrameworkFactory.Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
            .connectString(cluster.getConnectString())
            .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
            .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);

        CuratorFramework client = curatorClientBuilder.build();
        client.start();
        client.getZookeeperClient().blockUntilConnectedOrTimedOut();
        System.out.println("Successfully established the connection with ZooKeeper");

        client.getData().forPath("/");
        System.out.println("Done.");
```

With Curator 3.0, the time improves to just 15 seconds * 2: the connection 
timeout elapses twice, once for `blockUntilConnectedOrTimedOut()` and once 
for `getData()`. Note: `blockUntilConnectedOrTimedOut()` would have returned 
`false` in all cases, meaning you should not continue.
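
The arithmetic in this comment can be sketched in a few lines of plain Java. This is only a simplified model of the numbers quoted above, not Curator code: the class `TimeoutEstimate` and both method names are hypothetical, and the formulas assume exactly what the comment states (in 2.x, one connection timeout plus one retry sleep per retry iteration; in 3.0, one connection timeout per blocking call).

```java
// Simplified model of the wait times described in this comment.
// TimeoutEstimate and its methods are hypothetical names, not part of Curator.
public class TimeoutEstimate {

    // Curator 2.x, as described above: the connection timeout is applied on
    // every iteration of the retry policy, and the policy sleeps each time.
    static long curator2WaitMs(long connectionTimeoutMs, int retries, long sleepBetweenRetriesMs) {
        return retries * connectionTimeoutMs + retries * sleepBetweenRetriesMs;
    }

    // Curator 3.0, as described above: the connection timeout elapses once
    // per blocking call made while the client cannot connect.
    static long curator3WaitMs(long connectionTimeoutMs, int blockingCalls) {
        return blockingCalls * connectionTimeoutMs;
    }

    public static void main(String[] args) {
        // connectionTimeoutMs(15000) with RetryNTimes(3, 5000)
        System.out.println(curator2WaitMs(15000, 3, 5000)); // 60000 ms = one minute
        // blockUntilConnectedOrTimedOut() plus getData()
        System.out.println(curator3WaitMs(15000, 2));       // 30000 ms
    }
}
```

With the settings from the test, the model gives 60 seconds for 2.x and 30 seconds for 3.0, matching the figures above.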

> Curator client fails when connecting to read-only ensemble
> ----------------------------------------------------------
>
>                 Key: CURATOR-355
>                 URL: https://issues.apache.org/jira/browse/CURATOR-355
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.11.0
>            Reporter: Benjamin Jaton
>            Priority: Critical
>         Attachments: test2.log
>
>
> ZK is 3.5.1-alpha
> I have a 3-node ZK cluster with read-only mode enabled.
> 2 nodes are down, so one of them (QA-E8WIN11) is in read-only (verified by 
> using the ZK API manually). All the machines of the ensemble can be pinged 
> from the client.
> I'm using this piece of code:
> {code}
>               Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
>                               .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181")
>                               .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
>                               .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
>               CuratorFramework client = curatorClientBuilder.build();
>               client.start();
>               client.getZookeeperClient().blockUntilConnectedOrTimedOut();
>               System.out.println("Successfully established the connection with ZooKeeper");
>
>               client.getData().forPath("/");
>               System.out.println("Done.");{code}
> When Curator picks the host that is up first, it goes through very quickly. 
> When it picks the host that is down first (QA-E8WIN12), it seems to be stuck 
> at the getData() call for a very long time, and then eventually fails with a 
> ConnectionLossException (see attached log).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
