Assuming that clocks are usually not too out of step, Curator should be able to infer when the server would have terminated the existing session based on the clock. A little bit of thought would need to be put into resolving the race condition when you reconnect right as you were about to time out, in order to present a unified view of the state change, but that doesn't seem infeasible. This seems like exactly the kind of problem Curator should be solving.
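
To make the idea concrete, here is a rough sketch of the clock-based inference. Everything in it is hypothetical: `SessionExpiryEstimator`, its methods, and the slack parameter are names I made up for illustration, not Curator or ZooKeeper API. It also starts the timer when the client notices the disconnect, which only approximates when the server last heard from us.

```java
/**
 * Hypothetical helper (not part of Curator): guesses from the local clock
 * whether the server has probably expired our session by now.
 */
final class SessionExpiryEstimator {
    enum Inferred { CONNECTED, SUSPENDED, SESSION_LOST }

    private final long sessionTimeoutMs; // negotiated session timeout
    private final long clockSlackMs;     // assumed allowance for clock skew
    private long disconnectedAtMs = -1;  // -1 means "currently connected"

    SessionExpiryEstimator(long sessionTimeoutMs, long clockSlackMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.clockSlackMs = clockSlackMs;
    }

    /** Record the first moment we noticed the connection drop. */
    synchronized void onDisconnected(long nowMs) {
        if (disconnectedAtMs < 0) {
            disconnectedAtMs = nowMs;
        }
    }

    /**
     * Clear the timer after a successful reconnect. A real implementation
     * should let the server's verdict (same session or expired) override
     * any inference made here; that is the race mentioned above.
     */
    synchronized void onReconnected() {
        disconnectedAtMs = -1;
    }

    /** Infer the state at the given time (same time base as onDisconnected). */
    synchronized Inferred stateAt(long nowMs) {
        if (disconnectedAtMs < 0) {
            return Inferred.CONNECTED;
        }
        long elapsedMs = nowMs - disconnectedAtMs;
        // Only report SESSION_LOST once the timeout plus slack has passed,
        // so the event means "almost certainly lost", not "possibly lost".
        return elapsedMs >= sessionTimeoutMs + clockSlackMs
                ? Inferred.SESSION_LOST
                : Inferred.SUSPENDED;
    }
}
```

Callers would feed it a monotonic time source (for example `System.nanoTime() / 1_000_000`) rather than wall-clock time. Adding the slack keeps the inferred event conservative; a recipe that must stop acting before the server expires the session would subtract the slack instead.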

On Thu, Aug 20, 2015 at 11:21 AM, Jordan Zimmerman <[email protected]> wrote:

> Yeah, in hindsight LOST isn’t useful which is why all the recipes refer to
> SUSPENDED. Having a session-expired state is complicated in Curator as
> Curator sometimes re-creates the connection without a ZK generated event.
> So, the SESSION lost would have to be inferred.
>
> -Jordan
>
> On August 20, 2015 at 10:13:19 AM, Scott Blum ([email protected]) wrote:
>
> Ahh... that is confusing, and seems dubiously useful. I think 99% of the
> time I'd rather get an event that represents that the session is definitely
> lost.
>
> On Thu, Aug 20, 2015 at 10:53 AM, Jordan Zimmerman <[email protected]> wrote:
>
>> Maybe I'm confused, but I thought that's what ConnectionState SUSPENDED
>> vs. LOST was all about?
>>
>> It’s a big source of confusion with Curator. LOST does _not_ mean the
>> session was lost. It means Curator has given up after retries, etc. Because
>> Curator re-creates ZK handles internally the notion of a “session” is more
>> complicated than using raw ZooKeeper.
>>
>> -Jordan
>>
>> On August 20, 2015 at 9:50:56 AM, Scott Blum ([email protected]) wrote:
>>
>> Maybe I'm confused, but I thought that's what ConnectionState SUSPENDED
>> vs. LOST was all about?
>>
>> Maybe the recipes just need to be tweaked a bit?
>>
>> I always assumed ephemeral nodes would be gone on LOST but not gone if you
>> get a SUSPENDED followed by RECONNECTED.
>>
>> The one question I've always wondered about is what happens to Watchers on
>> SUSPENDED: do they all need to be re-applied, or will they still fire later
>> as long as you don't get LOST?
>>
>> On Thu, Aug 20, 2015 at 10:41 AM, Jordan Zimmerman <[email protected]> wrote:
>>
>> > I wonder if we can add error handling policies to Curator. Currently, the
>> > policy of all recipes is hard-coded to treat SUSPENDED as a type of lost
>> > session. We could change this to be injected like the retry policy. To
>> > solve this particular issue we’d also need to introduce a SESSION_LOST
>> > state of some type. This is complicated as Curator re-creates connections
>> > internally.
>> >
>> > Thoughts?
>> >
>> > -Jordan
>> >
>> > On August 20, 2015 at 2:10:52 AM, Dong Lei ([email protected]) wrote:
>> >
>> > Hi curator-devs:
>> >
>> > We use Spark in standalone mode, in which Spark leverages Curator to manage
>> > ZK connections and elect a leader. Our ZooKeeper may not be very stable, and
>> > we get "session suspended and reconnected" sometimes. The problem is that
>> > this kind of disassociation and reconnection triggers leader election quite
>> > often. And Spark's reaction to leadership switching can be very costly.
>> >
>> > So I'm thinking about whether it's possible to tolerate such failure cases
>> > if we can reconnect soon and the session is actually kept after the
>> > reconnection?
>> > Or does such a requirement make sense to you?
>> >
>> > Any advice will be appreciated.
>> >
>> > Thanks
>> > Dong Lei
