Re: zookeeper_interest returning ZOK on Connection Loss
On Mar 27, 2013, at 10:49 AM, Marshall McMullen marshall.mcmul...@gmail.com wrote: We already have a ZOO_NOTCONNECTED_STATE. Seems like that should work for these purposes... On Wed, Mar 27, 2013 at 11:23 AM, Michi Mutsuzaki mi...@cs.stanford.eduwrote: Hi Anthony, Is there a open jira for this patch? I don't have any problem integrating this patch. Only thing I'd suggest is to not call the event ZOO_EXPIRED_SESSION_STATE (which should be triggered only by the server). --Michi Michi-- Have you guys had a chance to look at the Jira and patch? I submitted the Jira https://issues.apache.org/jira/browse/ZOOKEEPER-1676 as well a link to my patch https://github.com/yunong/zookeeper/tree/release-3.4.3-patched a couple weeks ago. Just wondered if there's anything else you need from me in order to get this patch integrated. -Yunong
[jira] [Updated] (ZOOKEEPER-1676) C client zookeeper_interest returning ZOK on Connection Loss
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yunong Xiao updated ZOOKEEPER-1676: --- Component/s: c client C client zookeeper_interest returning ZOK on Connection Loss Key: ZOOKEEPER-1676 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1676 Project: ZooKeeper Issue Type: Bug Components: c client Affects Versions: 3.4.3 Environment: All Reporter: Yunong Xiao Priority: Critical I have a fairly simple single-threaded C client set up -- single-threaded because we are embedding zk in the node.js/libuv runtime -- which consists of the following algorithm: zookeeper_interest(); select(); // perform zookeeper api calls zookeeper_process(); I've noticed that zookeeper_interest in the C client never returns error if it is unable to connect to the zk server. From the spec of the zookeeper_interest API, I see that zookeeper_interest is supposed to return ZCONNECTIONLOSS when disconnected from the client. However, digging into the code, I see that the client is making a non-blocking connect call https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613 , and returning ZOK https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684 If we assume that the server is not up, this will mean that the subsequent select() call would return 0, since the fd is not ready, and future calls to zookeeper_interest will always return 0 and not the expected ZCONNECTIONLOSS. Thus an upstream client will never be aware that the connection is lost. I don't think this is the expected behavior. I have temporarily patched the zk C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's still unable to connect after session_timeout has been exceeded. I have included a patch for the client which fixes this for release 3.4.3 6b35e96 in this branch: https://github.com/yunong/zookeeper/tree/release-3.4.3-patched Here's the patch https://gist.github.com/yunong/efe869a0345867d54adf For more information, please see this email thread. http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201211.mbox/%3c11a8e7c3-4dde-45d8-abec-a8a4d32cf...@gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (ZOOKEEPER-1676) C client zookeeper_interest returning ZOK on Connection Loss
Yunong Xiao created ZOOKEEPER-1676: -- Summary: C client zookeeper_interest returning ZOK on Connection Loss Key: ZOOKEEPER-1676 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1676 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.4.3 Environment: All Reporter: Yunong Xiao Priority: Critical I have a fairly simple single-threaded C client set up -- single-threaded because we are embedding zk in the node.js/libuv runtime -- which consists of the following algorithm: zookeeper_interest(); select(); // perform zookeeper api calls zookeeper_process(); I've noticed that zookeeper_interest in the C client never returns error if it is unable to connect to the zk server. From the spec of the zookeeper_interest API, I see that zookeeper_interest is supposed to return ZCONNECTIONLOSS when disconnected from the client. However, digging into the code, I see that the client is making a non-blocking connect call https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613 , and returning ZOK https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684 If we assume that the server is not up, this will mean that the subsequent select() call would return 0, since the fd is not ready, and future calls to zookeeper_interest will always return 0 and not the expected ZCONNECTIONLOSS. Thus an upstream client will never be aware that the connection is lost. I don't think this is the expected behavior. I have temporarily patched the zk C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's still unable to connect after session_timeout has been exceeded. I have included a patch for the client which fixes this for release 3.4.3 6b35e96 in this branch: https://github.com/yunong/zookeeper/tree/release-3.4.3-patched Here's the patch https://gist.github.com/yunong/efe869a0345867d54adf For more information, please see this email thread. http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201211.mbox/%3c11a8e7c3-4dde-45d8-abec-a8a4d32cf...@gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: zookeeper_interest returning ZOK on Connection Loss
On Mar 27, 2013, at 10:23 AM, Michi Mutsuzaki mi...@cs.stanford.edu wrote: Is there a open jira for this patch? I don't have any problem integrating this patch. Only thing I'd suggest is to not call the event ZOO_EXPIRED_SESSION_STATE (which should be triggered only by the server). --Michi I've submitted the JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-1676 Of note I've submitted links to the patch: https://gist.github.com/yunong/efe869a0345867d54adf as well as my private branch of the patch https://github.com/yunong/zookeeper/tree/release-3.4.3-patched On Mar 27, 2013, at 10:49 AM, Marshall McMullen marshall.mcmul...@gmail.com wrote: We already have a ZOO_NOTCONNECTED_STATE. Seems like that should work for these purposes… At the time I wrote the patch, I don't think 3.4.3 had ZOO_NOTCONNECTED_STATE. If it's there now in the newer versions, then it certainly seems viable. -Yunong
Re: zookeeper_interest returning ZOK on Connection Loss
Michi -- On Nov 5, 2012, at 11:24 PM, Michi Mutsuzaki mi...@cs.stanford.edu wrote: On Mon, Nov 5, 2012 at 2:16 PM, Yunong Xiao yjx...@gmail.com wrote: What happens if the server is offline? In the scenario you described, select() will never succeed, and zookeeper_interest() will never inform the upstream consumer that the server is down. That's the crux of the problem here, when the server is unavailable, the client needs to know after some amount of time that the connection has failed. Otherwise the upstream consumer has no idea the connection to zookeeper has been severed and will hang indefinitely. If the server is offline, zookeeper client keeps trying to connect to the server indefinitely. It's up to the caller to decide when to give up. One possibility is to poll the connection state using zoo_state() and give up on the handle if it doesn't become connected after certain time. I certainly don't disagree here. What you're suggesting makes sense here -- indeed my patch does a variation of this -- however, it relies on the consuming client to keep track of the time since the client was last connected to the server. I dare say it'd be more preferable for zookeeper.c to keep track of this time internally, and inform the consuming client when this time has exceeded a certain threshold, such as session_timeout. The consuming client can then decide to give up. It seems like common functionality that all consumers would like to have as part of using the C client.
Re: zookeeper_interest returning ZOK on Connection Loss
Michi, I don't think just returning error after checking SO_ERROR is sufficient here, since this indicates connection loss but not session expiration. The client could still connect to a different zk server on future iterations of zookeeper_interest(), which will not occur if we return error. I think what is needed here is to keep track of the amount of time that has elapsed since the last time a connection was established to the server, while trying to connect to servers in the list. If this time exceeds session_timeout, then return an error such as ZOO_EXPIRED_SESSION_STATE. I've patched my local zookeeper client to keep track of this state. In addition, It's not clear to me how you would go about checking for SO_ERROR within zookeeper.c, since you'll need to check SO_ERROR after calling select(), which -- at least in my case, since I'm using the single threaded client, and I am embedding in the node.js/libuv runtime -- needs to be outside of the C client. I'd like to hear your thoughts on how this could be better achieved. I would propose the following patch: 1) zookeeper_interest needs to return state about it's current connection status, since the zhandle is opaque. This will allow consuming clients to check for the status of the non-blocking connect. 2) zookeeper_interest has to keep track of the last time it was connected to a server, and return ZOO_EXPIRED_SESSION_STATE if it's still unable to connect after the timeout exceeds the session timeout. 3) some sample code/documentation around checking for the status of non-blocking connects in user code for consumers of zookeeper.h. Does this seem reasonable? Would you guys be open to taking my patch? -Yunong On Nov 5, 2012, at 10:33 AM, Michi Mutsuzaki mi...@cs.stanford.edu wrote: Hi Yunong, Yes, this looks like a bug. The problem is that the C client is not handling the case when connect() returns EINPROGRESS or EWOULDBLOCK and eventually fails. I think the right fix is to check SO_ERROR after the socket becomes writable. Please go ahead and open a jira. Thanks! --Michi On Sun, Nov 4, 2012 at 2:11 PM, Yunong Xiao yjx...@gmail.com wrote: I have a fairly simple single-threaded C client set up -- single-threaded because we are embedding zk in the node.js/libuv runtime -- which consists of the following algorithm: zookeeper_interest(); select(); // perform zookeeper api calls zookeeper_process(); I've noticed that zookeeper_interest in the C client never returns error if it is unable to connect to the zk server. From the spec of the zookeeper_interest API, I see that zookeeper_interest is supposed to return ZCONNECTIONLOSS when disconnected from the client. However, digging into the code, I see that the client is making a non-blocking connect call https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613 , and returning ZOK https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684 If we assume that the server is not up, this will mean that the subsequent select() call would return 0, since the fd is not ready, and future calls to zookeeper_interest will always return 0 and not the expected ZCONNECTIONLOSS. Thus an upstream client will never be aware that the connection is lost. I don't think this is the expected behavior. I have temporarily patched the zk C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's still unable to connect after session_timeout has been exceeded. Is this the right interpretation of the API? Are you guys open to taking the patch I described? -Yunong
Re: zookeeper_interest returning ZOK on Connection Loss
Hi Michi, Otherwise the client might think the session is expired when it isn't. In case of the single-threaded client, it looks like you need to keep calling zookeeper_interest() until select() succeeds. What happens if the server is offline? In the scenario you described, select() will never succeed, and zookeeper_interest() will never inform the upstream consumer that the server is down. That's the crux of the problem here, when the server is unavailable, the client needs to know after some amount of time that the connection has failed. Otherwise the upstream consumer has no idea the connection to zookeeper has been severed and will hang indefinitely. Zookeeper client shouldn't return ZOO_EXPIRED_SESSION_STATE unless the server tells the client the session is expired. That's fair, I could return ZOO_CONNECTION_LOSS here instead. On Nov 5, 2012, at 1:59 PM, Michi Mutsuzaki mi...@cs.stanford.edu wrote: Hi Yunong, Zookeeper client shouldn't return ZOO_EXPIRED_SESSION_STATE unless the server tells the client the session is expired. Otherwise the client might think the session is expired when it isn't. In case of the single-threaded client, it looks like you need to keep calling zookeeper_interest() until select() succeeds. It would be great to have sample code / documentation. --Michi On Mon, Nov 5, 2012 at 1:19 PM, Yunong Xiao yjx...@gmail.com wrote: Michi, I don't think just returning error after checking SO_ERROR is sufficient here, since this indicates connection loss but not session expiration. The client could still connect to a different zk server on future iterations of zookeeper_interest(), which will not occur if we return error. I think what is needed here is to keep track of the amount of time that has elapsed since the last time a connection was established to the server, while trying to connect to servers in the list. If this time exceeds session_timeout, then return an error such as ZOO_EXPIRED_SESSION_STATE. I've patched my local zookeeper client to keep track of this state. In addition, It's not clear to me how you would go about checking for SO_ERROR within zookeeper.c, since you'll need to check SO_ERROR after calling select(), which -- at least in my case, since I'm using the single threaded client, and I am embedding in the node.js/libuv runtime -- needs to be outside of the C client. I'd like to hear your thoughts on how this could be better achieved. I would propose the following patch: 1) zookeeper_interest needs to return state about it's current connection status, since the zhandle is opaque. This will allow consuming clients to check for the status of the non-blocking connect. 2) zookeeper_interest has to keep track of the last time it was connected to a server, and return ZOO_EXPIRED_SESSION_STATE if it's still unable to connect after the timeout exceeds the session timeout. 3) some sample code/documentation around checking for the status of non-blocking connects in user code for consumers of zookeeper.h. Does this seem reasonable? Would you guys be open to taking my patch? -Yunong On Nov 5, 2012, at 10:33 AM, Michi Mutsuzaki mi...@cs.stanford.edu wrote: Hi Yunong, Yes, this looks like a bug. The problem is that the C client is not handling the case when connect() returns EINPROGRESS or EWOULDBLOCK and eventually fails. I think the right fix is to check SO_ERROR after the socket becomes writable. Please go ahead and open a jira. Thanks! --Michi On Sun, Nov 4, 2012 at 2:11 PM, Yunong Xiao yjx...@gmail.com wrote: I have a fairly simple single-threaded C client set up -- single-threaded because we are embedding zk in the node.js/libuv runtime -- which consists of the following algorithm: zookeeper_interest(); select(); // perform zookeeper api calls zookeeper_process(); I've noticed that zookeeper_interest in the C client never returns error if it is unable to connect to the zk server. From the spec of the zookeeper_interest API, I see that zookeeper_interest is supposed to return ZCONNECTIONLOSS when disconnected from the client. However, digging into the code, I see that the client is making a non-blocking connect call https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613 , and returning ZOK https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684 If we assume that the server is not up, this will mean that the subsequent select() call would return 0, since the fd is not ready, and future calls to zookeeper_interest will always return 0 and not the expected ZCONNECTIONLOSS. Thus an upstream client will never be aware that the connection is lost. I don't think this is the expected behavior. I have temporarily patched the zk C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's still unable to connect after session_timeout has been exceeded. Is this the right