[jira] [Updated] (ZOOKEEPER-1856) zookeeper C-client can fail to switch from a dead server in a 3+ server ensemble if the client only has a 2 server list.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Han updated ZOOKEEPER-1856:
---
    Fix Version/s:     (was: 3.5.3)
                   3.5.4

> zookeeper C-client can fail to switch from a dead server in a 3+ server
> ensemble if the client only has a 2 server list.
>
>                 Key: ZOOKEEPER-1856
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1856
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Dutch T. Meyer
>            Assignee: Michi Mutsuzaki
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-1856.patch
>
>
> If a client has a 2 server list, and is currently connected to the last
> server in that list, and that server then goes offline, the addrvec_next()
> call in handle_error() will push the client to the start of the list and
> terminate the connection.
> Then, the zoo_cycle_next_server() call in zookeeper_interest() will be called
> in response to the connection failure, and the client will cycle back to the
> failed server.
> In this way, a client that has a list of only 2 servers can get stuck on the
> one failed server. This would only be an issue in an ensemble larger than 2
> of course, because failing 1 out of 2 would lead to quorum loss anyway.
> There are other harmonics possible if every other server in the list is
> failed, but this is simplest to reproduce in a 3 server ensemble where the
> client only knows about 2 servers, one of which then fails. There are
> probably some elegant fixes here, but I think the simplest is to add a flag
> to track whether a server has been accessed before, and if it hasn't, don't
> call zoo_cycle_next_server() at the top of the zookeeper_interest() function.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
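The double advance described in the report can be sketched with a small model. This is only an illustration, not code from the ZooKeeper C client or from the attached patch: server_cursor, cursor_next(), on_error(), on_interest() and reconnect_target() are hypothetical names, and the attempted flag is an assumed reading of the "has this server been accessed before" flag the reporter suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the client's server-list cursor; this is not
 * the real address-vector type used by the ZooKeeper C client. */
typedef struct {
    size_t count;      /* number of servers the client knows about */
    size_t cur;        /* index of the server to try next */
    bool   attempted;  /* assumed flag: has `cur` been tried yet? */
} server_cursor;

/* Advance to the next server, wrapping to the start of the list. */
static void cursor_next(server_cursor *c) {
    assert(c->count > 0);
    c->cur = (c->cur + 1) % c->count;
    c->attempted = false;
}

/* Models the advance made on connection failure
 * (the addrvec_next() call in handle_error() in the report). */
static void on_error(server_cursor *c) {
    cursor_next(c);
}

/* Models the top of zookeeper_interest() and returns the index the client
 * will dial. With use_flag == false it always cycles (the reported
 * behavior); with use_flag == true it cycles only when the current server
 * has already been tried (the suggested fix). */
static size_t on_interest(server_cursor *c, bool use_flag) {
    if (!use_flag || c->attempted) {
        cursor_next(c);
    }
    c->attempted = true;
    return c->cur;
}

/* Fail the connection to server `dead` and report which server the
 * client dials next. */
static size_t reconnect_target(size_t count, size_t dead, bool use_flag) {
    server_cursor c = { count, dead, true };
    on_error(&c);
    return on_interest(&c, use_flag);
}
```

In this model, reconnect_target(2, 1, false) returns 1, so the client goes straight back to the failed server, while reconnect_target(2, 1, true) returns 0 and the client reaches the surviving one.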
[jira] [Updated] (ZOOKEEPER-1856) zookeeper C-client can fail to switch from a dead server in a 3+ server ensemble if the client only has a 2 server list.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-1856:
    Priority: Major  (was: Minor)
[jira] [Updated] (ZOOKEEPER-1856) zookeeper C-client can fail to switch from a dead server in a 3+ server ensemble if the client only has a 2 server list.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-1856:
    Fix Version/s: 3.6.0
                   3.5.3
[jira] [Updated] (ZOOKEEPER-1856) zookeeper C-client can fail to switch from a dead server in a 3+ server ensemble if the client only has a 2 server list.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michi Mutsuzaki updated ZOOKEEPER-1856:
---
    Attachment: ZOOKEEPER-1856.patch