Hi Yunong,

Zookeeper client shouldn't return ZOO_EXPIRED_SESSION_STATE unless the
server tells the client the session is expired. Otherwise the client
might think the session is expired when it isn't. In case of the
single-threaded client, it looks like you need to keep calling
zookeeper_interest() until select() succeeds. It would be great to
have sample code / documentation.

--Michi

On Mon, Nov 5, 2012 at 1:19 PM, Yunong Xiao <yjx...@gmail.com> wrote:
> Michi,
>
> I don't think just returning error after checking SO_ERROR is sufficient here,
> since this indicates connection loss but not session expiration. The client
> could still connect to a different zk server on future iterations of
> zookeeper_interest(), which will not occur if we return error. I think what is
> needed here is to keep track of the amount of time that has elapsed since the
> last time a connection was established to the server, while trying to connect
> to servers in the list. If this time exceeds session_timeout, then return an
> error such as ZOO_EXPIRED_SESSION_STATE. I've patched my local zookeeper 
> client
> to keep track of this state.
>
> In addition, It's not clear to me how you would go about checking for SO_ERROR
> within zookeeper.c, since you'll need to check SO_ERROR after calling 
> select(),
> which -- at least in my case, since I'm using the single threaded client, and 
> I
> am embedding in the node.js/libuv runtime -- needs to be outside of the C
> client. I'd like to hear your thoughts on how this could be better achieved. I
> would propose the following patch:
>
> 1) zookeeper_interest needs to return state about it's current connection
> status, since the zhandle is opaque. This will allow consuming clients to 
> check
> for the status of the non-blocking connect.
>
> 2) zookeeper_interest has to keep track of the last time it was connected to a
> server, and return ZOO_EXPIRED_SESSION_STATE if it's still unable to connect
> after the timeout exceeds the session timeout.
>
> 3) some sample code/documentation around checking for the status of
> non-blocking connects in user code for consumers of zookeeper.h.
>
> Does this seem reasonable? Would you guys be open to taking my patch?
>
> -Yunong
>
> On Nov 5, 2012, at 10:33 AM, Michi Mutsuzaki <mi...@cs.stanford.edu> wrote:
>
>> Hi Yunong,
>>
>> Yes, this looks like a bug. The problem is that the C client is not
>> handling the case when connect() returns EINPROGRESS or EWOULDBLOCK
>> and eventually fails. I think the right fix is to check SO_ERROR after
>> the socket becomes writable. Please go ahead and open a jira.
>>
>> Thanks!
>> --Michi
>>
>> On Sun, Nov 4, 2012 at 2:11 PM, Yunong Xiao <yjx...@gmail.com> wrote:
>>> I have a fairly simple single-threaded C client set up -- single-threaded
>>> because we are embedding zk in the node.js/libuv runtime -- which consists 
>>> of
>>> the following algorithm:
>>>
>>> zookeeper_interest(); select();
>>> // perform zookeeper api calls
>>> zookeeper_process();
>>>
>>> I've noticed that zookeeper_interest in the C client never returns error if 
>>> it
>>> is unable to connect to the zk server.
>>>
>>> From the spec of the zookeeper_interest API, I see that zookeeper_interest 
>>> is
>>> supposed to return ZCONNECTIONLOSS when disconnected from the client. 
>>> However,
>>> digging into the code, I see that the client is making a non-blocking 
>>> connect
>>> call
>>> https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613
>>> ,  and returning ZOK
>>> https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684
>>>
>>> If we assume that the server is not up, this will mean that the subsequent
>>> select() call would return 0, since the fd is not ready, and future calls to
>>> zookeeper_interest will always return 0 and not the expected 
>>> ZCONNECTIONLOSS.
>>> Thus an upstream client will never be aware that the connection is lost.
>>>
>>> I don't think this is the expected behavior. I have temporarily patched the 
>>> zk
>>> C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's 
>>> still
>>> unable to connect after session_timeout has been exceeded.
>>>
>>> Is this the right interpretation of the API? Are you guys open to taking the
>>> patch I described?
>>>
>>> -Yunong
>

Reply via email to