Hi Yunong, Zookeeper client shouldn't return ZOO_EXPIRED_SESSION_STATE unless the server tells the client the session is expired. Otherwise the client might think the session is expired when it isn't. In case of the single-threaded client, it looks like you need to keep calling zookeeper_interest() until select() succeeds. It would be great to have sample code / documentation.
--Michi On Mon, Nov 5, 2012 at 1:19 PM, Yunong Xiao <yjx...@gmail.com> wrote: > Michi, > > I don't think just returning error after checking SO_ERROR is sufficient here, > since this indicates connection loss but not session expiration. The client > could still connect to a different zk server on future iterations of > zookeeper_interest(), which will not occur if we return error. I think what is > needed here is to keep track of the amount of time that has elapsed since the > last time a connection was established to the server, while trying to connect > to servers in the list. If this time exceeds session_timeout, then return an > error such as ZOO_EXPIRED_SESSION_STATE. I've patched my local zookeeper > client > to keep track of this state. > > In addition, It's not clear to me how you would go about checking for SO_ERROR > within zookeeper.c, since you'll need to check SO_ERROR after calling > select(), > which -- at least in my case, since I'm using the single threaded client, and > I > am embedding in the node.js/libuv runtime -- needs to be outside of the C > client. I'd like to hear your thoughts on how this could be better achieved. I > would propose the following patch: > > 1) zookeeper_interest needs to return state about it's current connection > status, since the zhandle is opaque. This will allow consuming clients to > check > for the status of the non-blocking connect. > > 2) zookeeper_interest has to keep track of the last time it was connected to a > server, and return ZOO_EXPIRED_SESSION_STATE if it's still unable to connect > after the timeout exceeds the session timeout. > > 3) some sample code/documentation around checking for the status of > non-blocking connects in user code for consumers of zookeeper.h. > > Does this seem reasonable? Would you guys be open to taking my patch? > > -Yunong > > On Nov 5, 2012, at 10:33 AM, Michi Mutsuzaki <mi...@cs.stanford.edu> wrote: > >> Hi Yunong, >> >> Yes, this looks like a bug. The problem is that the C client is not >> handling the case when connect() returns EINPROGRESS or EWOULDBLOCK >> and eventually fails. I think the right fix is to check SO_ERROR after >> the socket becomes writable. Please go ahead and open a jira. >> >> Thanks! >> --Michi >> >> On Sun, Nov 4, 2012 at 2:11 PM, Yunong Xiao <yjx...@gmail.com> wrote: >>> I have a fairly simple single-threaded C client set up -- single-threaded >>> because we are embedding zk in the node.js/libuv runtime -- which consists >>> of >>> the following algorithm: >>> >>> zookeeper_interest(); select(); >>> // perform zookeeper api calls >>> zookeeper_process(); >>> >>> I've noticed that zookeeper_interest in the C client never returns error if >>> it >>> is unable to connect to the zk server. >>> >>> From the spec of the zookeeper_interest API, I see that zookeeper_interest >>> is >>> supposed to return ZCONNECTIONLOSS when disconnected from the client. >>> However, >>> digging into the code, I see that the client is making a non-blocking >>> connect >>> call >>> https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613 >>> , and returning ZOK >>> https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684 >>> >>> If we assume that the server is not up, this will mean that the subsequent >>> select() call would return 0, since the fd is not ready, and future calls to >>> zookeeper_interest will always return 0 and not the expected >>> ZCONNECTIONLOSS. >>> Thus an upstream client will never be aware that the connection is lost. >>> >>> I don't think this is the expected behavior. I have temporarily patched the >>> zk >>> C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's >>> still >>> unable to connect after session_timeout has been exceeded. >>> >>> Is this the right interpretation of the API? Are you guys open to taking the >>> patch I described? >>> >>> -Yunong >