You can lose connections to ZK for a variety of reasons:

* Network partition
* Restart of the leader node (causes loss of ZK quorum while a new leader is 
elected)
* Slow network (causing a client to miss a heartbeat to the server)
* Random server crashes, etc.

The standard ZooKeeper client can transparently handle most of these. When it 
loses its connection to a server, it tries to reconnect to another server. 
However, it may fail to reconnect before the session times out. If that 
happens, the session expires and you must re-create the ZooKeeper handle. Also, 
while the client is trying to reconnect, ZooKeeper API calls will fail until 
the connection is re-established. If your code isn’t prepared to handle these 
cases, you’ll have problems.
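
To make the distinction concrete, here is a minimal sketch of how an 
application might react to each connection state. It is an illustration only: 
the SessionStateSketch class and its State and Action enums are hypothetical 
stand-ins (the real client reports states through Watcher.Event.KeeperState, 
e.g. SyncConnected, Disconnected and Expired), and the actions are just labels 
for what your own code would do.

```java
// Hypothetical stand-ins for illustration; the real states live in
// org.apache.zookeeper.Watcher.Event.KeeperState.
class SessionStateSketch {
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }
    enum Action { RESUME, WAIT_FOR_RECONNECT, RECREATE_HANDLE }

    // Maps a connection state to the reaction described above.
    // Every branch returns, so no state falls through into another.
    static Action actionFor(State state) {
        switch (state) {
            case SYNC_CONNECTED:
                // Connected (or reconnected) within the session timeout.
                return Action.RESUME;
            case DISCONNECTED:
                // The client is already trying other servers on its own;
                // the session may still be valid, so keep the handle.
                return Action.WAIT_FOR_RECONNECT;
            case EXPIRED:
                // The session is gone; only a new handle can recover.
                return Action.RECREATE_HANDLE;
            default:
                throw new IllegalArgumentException("unhandled state: " + state);
        }
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + actionFor(s));
        }
    }
}
```

A disconnect and an expiry call for different reactions: creating a new handle 
on every disconnect throws away a session that might still be valid. Watcher 
callbacks are also delivered on the client's event thread, so blocking inside 
one while waiting for a later event can keep that event from ever arriving.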

Curator helps in several ways (note: Curator is a wrapper around the standard 
ZooKeeper client):

* Curator has a built-in retry mechanism so that clients are immune to 
short-term connection losses. Normally the ZooKeeper client can reconnect to 
another server very quickly, so the first API call may fail, but Curator will 
retry it a few times and it’s likely to succeed on a subsequent try.

* Curator monitors the internal ZooKeeper connection. Any Curator API call will 
wait until the connection is established. If the ZooKeeper session fails, 
Curator will transparently create a new ZooKeeper instance. Most of the 
drudgery of managing the ZooKeeper connection is done for you by Curator.
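
As a rough model of the retry idea in the first bullet, the sketch below 
retries a failing call with exponential backoff using only the JDK. It is not 
Curator's code: RetrySketch, delayMs and callWithRetry are hypothetical names, 
and the retry count and delays are arbitrary. Curator's own retry behavior is 
configured through its retry policy classes (for example 
ExponentialBackoffRetry), and a real client would retry only on 
connection-type errors rather than on every exception.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; this is not Curator's code.
class RetrySketch {
    // Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
    static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(delay, maxMs);
    }

    // Runs `op`, retrying up to maxRetries times when it throws.
    static <T> T callWithRetry(Callable<T> op, int maxRetries, long baseMs, long maxMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the failure to the caller
                }
                Thread.sleep(delayMs(attempt, baseMs, maxMs));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: fails twice, as if the connection were down, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("connection loss (simulated)");
            }
            return "ok";
        }, 3, 10, 1000);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

The caller sees a single successful call as long as the connection comes back 
within the retry budget; if it does not, the last failure is still thrown and 
has to be handled.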

These are the main features that will help with connection problems in ZK. 
Additionally, Curator…

* Has dozens of pre-built recipes that are production-tested by thousands of 
sites: locks, leaders, caches, etc.

* Has lots of nice utilities that make writing new recipes much easier.

* Has APIs that work around well-known ZK edge cases, e.g. guaranteed deletes, 
protected sequential node creation, automatic parent node creation, etc.

That said, even with Curator, writing correct ZooKeeper applications is not 
easy. I usually tell people “Friends don’t let friends write ZooKeeper 
recipes”. If you want me to review some of your usages, I can do that.

I hope this helps.

-Jordan

On April 22, 2015 at 9:44:30 AM, Suresh Marru ([email protected]) wrote:

Hi Jordan,

Can you please advise us on this issue? Within Apache Airavata, we are using 
ZooKeeper for coordination of services. Intermittently, we see ZK connection 
loss errors. 

Is this an issue Curator will help mitigate? 

Can you please also shed some light on how to decide when to use ZK vs. 
Curator? 

Thanks,
Suresh

On Apr 22, 2015, at 12:59 AM, Lahiru Ginnaliya Gamathige <[email protected]> 
wrote:


---------- Forwarded message ----------
From: <[email protected]>
Date: Wed, Apr 22, 2015 at 12:51 AM
Subject: Re: Intermittent connection loss error
To: [email protected]


Hi Lahiru Ginnaliya Gamathige,

Once in a while there may be a time change caused by NTP, which causes all 
ZooKeeper client sessions to close. This may be the reason. Please refer to 
https://issues.apache.org/jira/browse/ZOOKEEPER-1366 (this has been fixed in 
3.5.1). This may help you.

Regards,
Indira Priyadharshini
______________________________________
From: Lahiru Ginnaliya Gamathige <[email protected]>
Sent: Tuesday, April 21, 2015 8:13 PM
To: [email protected]
Subject: Intermittent connection loss error

Hi Devs,

We are using ZK in Apache Airavata, and when we run it for some time, some
connections get lost and never reconnect. I get the following error, and
since I try to reconnect in my process method, it keeps trying and floods
the log. Of course I can fix the log issue, but I am not sure why this is
happening. I am using ZK in standalone mode, just a single instance, and
below are the log and the code I use to reconnect.

2015-04-08 09:43:10,785 [main-SendThread(gw111.iu.xsede.org:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server gw111.iu.xsede.org/149.165.228.109:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:66)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)


@Override
synchronized public void process(WatchedEvent watchedEvent) {
    logger.info(watchedEvent.getPath());
    synchronized (mutex) {
        Event.KeeperState state = watchedEvent.getState();
        logger.info(state.name());
        switch(state){
            case SyncConnected:
                mutex.notify();
            case Expired:case Disconnected:
                try {
                    mutex = -1;
                    zk = new ZooKeeper(AiravataZKUtils.getZKhostPort(),
                            AiravataZKUtils.getZKTimeout(), this);
                    synchronized (mutex) {
                        mutex.wait();  // waiting for the syncConnected event
                    }
                    storeServerConfig();
                } catch (IOException e) {
                    logger.error("Error while synchronizing with zookeeper", e);
                } catch (ApplicationSettingsException e) {
                    logger.error("Error while synchronizing with zookeeper", e);
                } catch (InterruptedException e) {
                    logger.error("Error while synchronizing with zookeeper", e);
                } catch (AiravataSystemException e) {
                    logger.error("Error while synchronizing with zookeeper", e);
                }
        }
    }
}


Lahiru


--
Research Assistant
Science Gateways Group
Indiana University



