[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2013-12-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Germán Blanco updated ZOOKEEPER-1057:
-

Attachment: ZOOKEEPER-1057.patch

The test is simpler and looks better if integrated into TestClient.cc.
The attached patch can be applied both to trunk and branch 3.4.
With this version, the test case passes for the single threaded version, but 
for the multithreaded version it hangs forever (or at least more than a few 
minutes).

 zookeeper c-client, connection to offline server fails to successfully 
 fallback to second zk host
 -

 Key: ZOOKEEPER-1057
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
 Environment: snowdutyrise-lm ~/- uname -a
 Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
 also observed on:
 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Reporter: Woody Anderson
Assignee: Michi Mutsuzaki
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1057-b3.4.patch, ZOOKEEPER-1057.patch, 
 ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch


 Hello, I'm a contributor to the node.js zookeeper module: 
 https://github.com/yfinkelstein/node-zookeeper
 I'm using zk 3.3.3 for the purposes of this issue, but I have validated that it 
 also fails on 3.3.1 and 3.3.2.
 I'm having an issue when trying to connect while one of my zookeeper servers 
 is offline.
 If the first server attempted is online, all is good.
 If the offline server is attempted first, then the client is never able to 
 connect to _any_ server.
 Inside zookeeper.c a connection loss (-4) is received, the socket is closed 
 and buffers are cleaned up. It then attempts the next server in the list and 
 creates a new socket (which gets the same fd as the previously closed socket), 
 but connecting fails, and it continues to fail seemingly forever.
 The nature of this failure is not that it gets -4 connection loss errors, but 
 that zookeeper_interest doesn't find any activity on the socket before 
 the user-provided timeout kicks things out. I don't want to have to wait 5 
 minutes, even if I could make myself.
 This is the message that follows the connection loss:
 2011-04-27 23:18:28,355:13485:ZOO_ERROR@handle_socket_error_msg@1530: Socket 
 [127.0.0.1:5020] zk retcode=-7, errno=60(Operation timed out): connection 
 timed out (exceeded timeout by 3ms)
 2011-04-27 23:18:28,355:13485:ZOO_ERROR@yield@213: yield:zookeeper_interest 
 returned error: -7 - operation timeout
 While investigating, I decided to comment out close(zh->fd) in handle_error 
 (zookeeper.c#1153).
 Now everything works (obviously I'm leaking an fd). Connecting to the second 
 host works immediately.
 This is the behavior I'm looking for, though I clearly don't want to leak the 
 fd, so I'm wondering why the fd re-use is causing this issue.
 close() is not returning an error (I checked, even though the current code 
 assumes success).
 I'm on OS X 10.6.7.
 I tried adding a setsockopt SO_LINGER (though I didn't want that to be a 
 solution); it didn't work.
 full debug traces are included in issue here: 
 https://github.com/yfinkelstein/node-zookeeper/issues/6
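
 The fallback behavior expected above can be illustrated outside of ZooKeeper 
 with plain POSIX sockets. What follows is only a minimal sketch under stated 
 assumptions, not the code in zookeeper.c: the names try_connect, 
 connect_first_available and loopback_addr are hypothetical, and the real 
 client drives its connection through zookeeper_interest rather than a 
 blocking poll(). It shows the sequence described in the report: attempt a 
 non-blocking connect, close the socket on failure (so the next socket() call 
 may well return the same fd), then move on to the next host in the list.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build a 127.0.0.1 address for a port (host byte order). */
struct sockaddr_in loopback_addr(unsigned short port)
{
    struct sockaddr_in a;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port);
    return a;
}

/* Try one non-blocking connect; return a connected fd, or -1 on failure. */
static int try_connect(const struct sockaddr_in *addr, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    int rc = connect(fd, (const struct sockaddr *)addr, sizeof(*addr));
    if (rc < 0 && errno != EINPROGRESS) {
        close(fd);              /* immediate failure, e.g. connection refused */
        return -1;
    }
    if (rc < 0) {
        /* Connect is in progress: wait until the socket is writable or the
         * timeout expires, then read the final result from SO_ERROR. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        int err = 0;
        socklen_t len = sizeof(err);
        if (poll(&pfd, 1, timeout_ms) <= 0 ||
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 ||
            err != 0) {
            close(fd);          /* the next socket() may reuse this fd number */
            return -1;
        }
    }
    return fd;
}

/* Walk the host list in order; report which host accepted the connection. */
int connect_first_available(const struct sockaddr_in *addrs, size_t count,
                            int timeout_ms, size_t *connected_index)
{
    assert(addrs != NULL && connected_index != NULL);
    for (size_t i = 0; i < count; i++) {
        int fd = try_connect(&addrs[i], timeout_ms);
        if (fd >= 0) {
            *connected_index = i;
            return fd;
        }
    }
    return -1;                  /* no host in the list accepted a connection */
}
```

 Called with a list whose first entry is an offline host, 
 connect_first_available should return a descriptor connected to the second 
 entry, which is the behavior I would expect from the client as well.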



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2013-12-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Germán Blanco updated ZOOKEEPER-1057:
-

Attachment: ZOOKEEPER-1057.patch

The attached patch has a proposed test case that passes both in trunk and 3.4.
It was my mistake, one zookeeper_close too many in the last patch.

 zookeeper c-client, connection to offline server fails to successfully 
 fallback to second zk host
 -

 Key: ZOOKEEPER-1057
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
 Environment: snowdutyrise-lm ~/- uname -a
 Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
 also observed on:
 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Reporter: Woody Anderson
Assignee: Michi Mutsuzaki
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1057-b3.4.patch, ZOOKEEPER-1057.patch, 
 ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch, 
 ZOOKEEPER-1057.patch




[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2013-12-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Germán Blanco updated ZOOKEEPER-1057:
-

Attachment: ZOOKEEPER-1057.patch

... and now with the deterministic connection order, to make sure it is not 
just luck that it was working.
I am very sorry for the spam, I think I need the holidays.

 zookeeper c-client, connection to offline server fails to successfully 
 fallback to second zk host
 -

 Key: ZOOKEEPER-1057
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
 Environment: snowdutyrise-lm ~/- uname -a
 Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
 also observed on:
 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Reporter: Woody Anderson
Assignee: Michi Mutsuzaki
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1057-b3.4.patch, ZOOKEEPER-1057.patch, 
 ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch, 
 ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch




[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2013-12-20 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1057:
---

Priority: Blocker  (was: Critical)

 zookeeper c-client, connection to offline server fails to successfully 
 fallback to second zk host
 -

 Key: ZOOKEEPER-1057
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
 Environment: snowdutyrise-lm ~/- uname -a
 Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
 also observed on:
 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Reporter: Woody Anderson
Assignee: Michi Mutsuzaki
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch




[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2013-12-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Germán Blanco updated ZOOKEEPER-1057:
-

Attachment: ZOOKEEPER-1057-b3.4.patch

ZOOKEEPER-1057-b3.4 is the port of Michi's test case to 3.4 branch. It fails 
for me.
It uses a standalone server, instead of TestQuorumServer, since I figured that 
we just need a server listening on one port to test this.

 zookeeper c-client, connection to offline server fails to successfully 
 fallback to second zk host
 -

 Key: ZOOKEEPER-1057
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
 Environment: snowdutyrise-lm ~/- uname -a
 Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
 also observed on:
 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Reporter: Woody Anderson
Assignee: Michi Mutsuzaki
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1057-b3.4.patch, ZOOKEEPER-1057.patch, 
 ZOOKEEPER-1057.patch




[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2013-12-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Germán Blanco updated ZOOKEEPER-1057:
-

Attachment: ZOOKEEPER-1057.patch

The trunk version of the b3.4 patch, for whatever it is worth.
I guess it will work just as Michi's patch.

 zookeeper c-client, connection to offline server fails to successfully 
 fallback to second zk host
 -

 Key: ZOOKEEPER-1057
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
 Environment: snowdutyrise-lm ~/- uname -a
 Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
 also observed on:
 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Reporter: Woody Anderson
Assignee: Michi Mutsuzaki
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1057-b3.4.patch, ZOOKEEPER-1057.patch, 
 ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch




[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2013-12-10 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1057:
---

Attachment: ZOOKEEPER-1057.patch

This patch adds a test to validate that the c client gets connected to the 
second server in the list if the first server is down when zookeeper_init is 
called.

 zookeeper c-client, connection to offline server fails to successfully 
 fallback to second zk host
 -

 Key: ZOOKEEPER-1057
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
 Environment: snowdutyrise-lm ~/- uname -a
 Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
 also observed on:
 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Reporter: Woody Anderson
Assignee: Michi Mutsuzaki
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1057.patch




[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2013-12-10 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1057:
---

Attachment: ZOOKEEPER-1057.patch

Trying again.

 zookeeper c-client, connection to offline server fails to successfully 
 fallback to second zk host
 -

 Key: ZOOKEEPER-1057
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
 Environment: snowdutyrise-lm ~/- uname -a
 Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
 also observed on:
 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Reporter: Woody Anderson
Assignee: Michi Mutsuzaki
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch




[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2011-07-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1057:
-

Fix Version/s: (was: 3.3.4)
   (was: 3.4.0)
   3.5.0

Not a blocker.

 zookeeper c-client, connection to offline server fails to successfully 
 fallback to second zk host
 -

 Key: ZOOKEEPER-1057
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
 Environment: snowdutyrise-lm ~/- uname -a
 Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
 also observed on:
 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Reporter: Woody Anderson
 Fix For: 3.5.0



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2011-05-12 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1057:
-

Fix Version/s: 3.4.0

 zookeeper c-client, connection to offline server fails to successfully 
 fallback to second zk host
 -

 Key: ZOOKEEPER-1057
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
 Environment: snowdutyrise-lm ~/- uname -a
 Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
 also observed on:
 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Reporter: Woody Anderson
 Fix For: 3.3.4, 3.4.0

