I have seen session expire events mainly when we unplug the network between the node and the ZK servers (mostly while we were developing a failover controller framework with ZK). This may not be a usual scenario, but it can happen.
Coming to the BK case: when we lose the ZK handle's connectivity, simply replacing the handle may not always be possible, because we cannot be sure when exactly we will be able to create a new connection to ZK. So maybe the fix could be that BK clients throw an exception, since they cannot serve when ZK is not available, and let the application take action?

Regards,
Uma
________________________________
From: Flavio Junqueira [[email protected]]
Sent: Tuesday, May 01, 2012 6:59 PM
To: [email protected]
Subject: Re: ZooKeeper Session Expiration

I don't know if this is your case, but in the past we have seen such issues with zookeeper caused by GC pauses. I remember one case with hbase, and I think it is this one:

https://issues.apache.org/jira/browse/HBASE-1316

We have seen zookeeper clusters serving thousands of clients, so ~100 shouldn't be a problem. Still, session expiration is part of zookeeper, so we need to deal with it here as well.

-Flavio

On May 1, 2012, at 3:14 PM, John Nagro wrote:

Flavio -

We're trying to get to the bottom of it. As I understand it, in a properly configured and operating Zk cluster we should never see a session expiration exception. Globally (including all systems) we have seen them perhaps once a week for the last month - and it causes some issues in our system. We saw one last night, and bookkeeper had an issue a couple of days ago.

We do have a lot of nodes connecting to zookeeper for various things. We have a home-built configuration management tool that uses zk as the data store, the bookkeeper stuff obviously does, my coordination on top of the bookkeeper ledgers uses it, etc. So yes, lots of machines (dozens up to ~100) talk to this zk cluster in some fashion or another - we have other clusters too. Ultimately, more machines will talk to the configuration stuff in the long term. I could potentially move my zk stuff off that cluster if you think it would help.
-John

On Tue, May 1, 2012 at 8:44 AM, Flavio Junqueira <[email protected]> wrote:

This is definitely not ideal. If you lose your zookeeper session, then you're not able to close your open ledgers, which will force ledger recovery. It is not a correctness issue, but certainly inconvenient. We need to fix it, and I'm glad that Uma is already looking into it.

I'm curious about why you're getting session expirations, though. Is it frequent, or did you get it once? Do you have many nodes connecting to your ZooKeeper instance?

-Flavio

On May 1, 2012, at 2:07 PM, John Nagro wrote:

Thanks Uma - that is exactly what I am looking for. The way I am handling it now is to pass a bookkeeper client factory rather than an instance. When I encounter zk session expiration, I create a new client and discard the old one - getting a fresh set of connections to zk. Perhaps not ideal, but it gets the job done.

thanks!
-John

On Tue, May 1, 2012 at 12:09 AM, Uma Maheswara Rao G <[email protected]> wrote:

Hi John,

The BK client needs to handle session expire events from ZK. Here is the issue for that: BOOKKEEPER-225 (https://issues.apache.org/jira/browse/BOOKKEEPER-225). We will implement it soon.

I hope this answers your question. Please correct me if I have misinterpreted it.

Thanks a lot,
Uma
________________________________
From: John Nagro [[email protected]]
Sent: Tuesday, May 01, 2012 1:20 AM
To: [email protected]
Subject: ZooKeeper Session Expiration

Hello -

If I start seeing ZKExceptions in the Bk Client, which appear to be due to SessionExpiration errors... it seems that the BookKeeper client never recovers from that? Is that correct?

Thanks!
-John Nagro

flavio junqueira
senior research scientist
[email protected]
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300 fax (408) 349 3301
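
For reference, here is a minimal sketch of the workaround John describes in the thread (hand the application a client factory rather than a client instance, and build a fresh client when the session expires), combined with Uma's suggestion that the failure should surface to the application when recovery does not work. It does not use the real BookKeeper or ZooKeeper APIs: `LedgerClient`, `SessionExpiredException`, `RecreatingClient`, and the `maxAttempts` parameter are all hypothetical names invented for illustration, and a real wrapper would have to map the actual client exceptions onto them.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a session-expired failure; not a BookKeeper or ZooKeeper class. */
class SessionExpiredException extends Exception {
    SessionExpiredException(String message) {
        super(message);
    }
}

/** Hypothetical minimal client interface; a real wrapper would delegate to the actual client. */
interface LedgerClient extends AutoCloseable {
    String write(String entry) throws SessionExpiredException;

    @Override
    void close();
}

/** Holds a factory instead of a single instance, so the client can be rebuilt after expiration. */
class RecreatingClient {
    private final Supplier<LedgerClient> factory;
    private final int maxAttempts;
    private LedgerClient current;

    RecreatingClient(Supplier<LedgerClient> factory, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.factory = factory;
        this.maxAttempts = maxAttempts;
        this.current = factory.get();
    }

    /** Tries the write; on session expiration, discards the client and retries with a new one. */
    synchronized String write(String entry) throws SessionExpiredException {
        SessionExpiredException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return current.write(entry);
            } catch (SessionExpiredException e) {
                last = e;
                current.close();          // drop the old client and its connections
                current = factory.get();  // fresh client for the next attempt or next call
            }
        }
        // Recovery did not help: let the application decide what to do.
        throw last;
    }
}
```

Note that, as Flavio points out in the thread, a ledger left open by the discarded client cannot be closed cleanly and will go through ledger recovery, so this sketch only covers re-establishing a usable client, not the state of in-flight ledgers.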
