Re: Issues with leader shutdown in a 3-node zookeeper cluster
Hi Andor,

This issue never got a solution. I'm experiencing the same symptoms.
The ZooKeeper client doesn't connect to any node of the cluster. A NiFi
cluster that relies on ZooKeeper to elect its cluster coordinator can't
connect either.

Regards,
Diego
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Hi Sushil,

None of your leftover servers are responding to the client session
creation requests (client timeouts), but the socket can be established
correctly. Would you please share your server logs too?

Andor

> On 2019. Dec 3., at 1:14, Sushil Kumar wrote:
>
> I am still struggling to find the fix for this issue.
> Another problem I am facing is that I don't get any other emails except
> for Damien's. I am not saying that you guys do not reply; I am saying
> that I am not receiving those emails. Not sure what is going on; they
> are not even in the spam folder.
>
> On Wed, Nov 27, 2019 at 8:09 AM Sushil Kumar wrote:
>
>> Thanks Damien for the reply.
>>
>> That was something I had already tried.
>> I wrote a single IP in my notes to show that even specific running nodes
>> are not providing the connection.
>>
>> Can you by any chance include in this email the other people who have
>> replied earlier? I don't have their email addresses, since I never
>> received their replies and the archive does not show email addresses.
>>
>> On Tue, Nov 26, 2019, 11:41 PM Damien Diederen wrote:
>>
>>> Sushil,
>>>
>>> > I have put the gist of connection string and mntr outputs, I tried
>>> > connecting to the left-over quorum cluster without any luck.
>>> > https://gist.github.com/sushilkm/b8a540acc487830adaa5acae3a166d51
>>>
>>> Combining this, from your notes:
>>>
>>>     $ zkCli.sh -server "10.251.0.6:2181"
>>>
>>> with what Andor pointed out:
>>>
>>> >> zkCli.sh is trying to connect localhost only by default, if you run
>>> >> it without parameters.
>>> >>
>>> >> If the node that you're trying to connect to is down (which is
>>> >> completely fine, if you still have quorum), you should provide a
>>> >> connection string (list of nodes) with at least 1 running server.
>>>
>>> You are not running zkCli.sh without parameters, but you are only
>>> telling it about a single server; it thus doesn't have anywhere to fall
>>> back when that single node becomes unreachable.
>>>
>>> Try something like:
>>>
>>>     $ zkCli.sh -server "10.251.0.6:2181,10.251.0.X:2181,10.251.0.Y:2181"
>>>
>>> where 10.251.0.X and 10.251.0.Y are replaced by the addresses of the
>>> other ensemble members.
>>>
>>> (This is not specific to the "CLI"; other clients also have to be given
>>> a "sufficient" connection string to be able to failover. It doesn't
>>> *have* to reference the full ensemble, but providing a single member
>>> definitely won't cut it.)
>>>
>>> HTH, -D
>
> --
> Thanks
> Sushil Kumar
> +1-(206)-698-4116
Issues with leader shutdown in a 3-node zookeeper cluster
I am still struggling to find the fix for this issue.
Another problem I am facing is that I don't get any other emails except
for Damien's. I am not saying that you guys do not reply; I am saying
that I am not receiving those emails. Not sure what is going on; they
are not even in the spam folder.

On Wed, Nov 27, 2019 at 8:09 AM Sushil Kumar wrote:

> Thanks Damien for the reply.
>
> That was something I had already tried.
> I wrote a single IP in my notes to show that even specific running nodes
> are not providing the connection.
>
> Can you by any chance include in this email the other people who have
> replied earlier? I don't have their email addresses, since I never
> received their replies and the archive does not show email addresses.
>
> On Tue, Nov 26, 2019, 11:41 PM Damien Diederen wrote:
>
>> Sushil,
>>
>> > I have put the gist of connection string and mntr outputs, I tried
>> > connecting to the left-over quorum cluster without any luck.
>> > https://gist.github.com/sushilkm/b8a540acc487830adaa5acae3a166d51
>>
>> Combining this, from your notes:
>>
>>     $ zkCli.sh -server "10.251.0.6:2181"
>>
>> with what Andor pointed out:
>>
>> >> zkCli.sh is trying to connect localhost only by default, if you run
>> >> it without parameters.
>> >>
>> >> If the node that you're trying to connect to is down (which is
>> >> completely fine, if you still have quorum), you should provide a
>> >> connection string (list of nodes) with at least 1 running server.
>>
>> You are not running zkCli.sh without parameters, but you are only
>> telling it about a single server; it thus doesn't have anywhere to fall
>> back when that single node becomes unreachable.
>>
>> Try something like:
>>
>>     $ zkCli.sh -server "10.251.0.6:2181,10.251.0.X:2181,10.251.0.Y:2181"
>>
>> where 10.251.0.X and 10.251.0.Y are replaced by the addresses of the
>> other ensemble members.
>>
>> (This is not specific to the "CLI"; other clients also have to be given
>> a "sufficient" connection string to be able to failover. It doesn't
>> *have* to reference the full ensemble, but providing a single member
>> definitely won't cut it.)
>>
>> HTH, -D

--
Thanks
Sushil Kumar
+1-(206)-698-4116
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Sushil,

I am sorry, I did not send any other email :-)
I saw Damien is already giving sensible advice.

Enrico

On Wed, Nov 27, 2019 at 17:10, Sushil Kumar wrote:

> Thanks Damien for the reply.
>
> That was something I had already tried.
> I wrote a single IP in my notes to show that even specific running nodes
> are not providing the connection.
>
> Can you by any chance include in this email the other people who have
> replied earlier? I don't have their email addresses, since I never
> received their replies and the archive does not show email addresses.
>
> On Tue, Nov 26, 2019, 11:41 PM Damien Diederen wrote:
>
>> Sushil,
>>
>> > I have put the gist of connection string and mntr outputs, I tried
>> > connecting to the left-over quorum cluster without any luck.
>> > https://gist.github.com/sushilkm/b8a540acc487830adaa5acae3a166d51
>>
>> Combining this, from your notes:
>>
>>     $ zkCli.sh -server "10.251.0.6:2181"
>>
>> with what Andor pointed out:
>>
>> >> zkCli.sh is trying to connect localhost only by default, if you run
>> >> it without parameters.
>> >>
>> >> If the node that you're trying to connect to is down (which is
>> >> completely fine, if you still have quorum), you should provide a
>> >> connection string (list of nodes) with at least 1 running server.
>>
>> You are not running zkCli.sh without parameters, but you are only
>> telling it about a single server; it thus doesn't have anywhere to fall
>> back when that single node becomes unreachable.
>>
>> Try something like:
>>
>>     $ zkCli.sh -server "10.251.0.6:2181,10.251.0.X:2181,10.251.0.Y:2181"
>>
>> where 10.251.0.X and 10.251.0.Y are replaced by the addresses of the
>> other ensemble members.
>>
>> (This is not specific to the "CLI"; other clients also have to be given
>> a "sufficient" connection string to be able to failover. It doesn't
>> *have* to reference the full ensemble, but providing a single member
>> definitely won't cut it.)
>>
>> HTH, -D
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Thanks Damien for the reply.

That was something I had already tried.
I wrote a single IP in my notes to show that even specific running nodes
are not providing the connection.

Can you by any chance include in this email the other people who have
replied earlier? I don't have their email addresses, since I never
received their replies and the archive does not show email addresses.

On Tue, Nov 26, 2019, 11:41 PM Damien Diederen wrote:

> Sushil,
>
> > I have put the gist of connection string and mntr outputs, I tried
> > connecting to the left-over quorum cluster without any luck.
> > https://gist.github.com/sushilkm/b8a540acc487830adaa5acae3a166d51
>
> Combining this, from your notes:
>
>     $ zkCli.sh -server "10.251.0.6:2181"
>
> with what Andor pointed out:
>
> >> zkCli.sh is trying to connect localhost only by default, if you run
> >> it without parameters.
> >>
> >> If the node that you're trying to connect to is down (which is
> >> completely fine, if you still have quorum), you should provide a
> >> connection string (list of nodes) with at least 1 running server.
>
> You are not running zkCli.sh without parameters, but you are only
> telling it about a single server; it thus doesn't have anywhere to fall
> back when that single node becomes unreachable.
>
> Try something like:
>
>     $ zkCli.sh -server "10.251.0.6:2181,10.251.0.X:2181,10.251.0.Y:2181"
>
> where 10.251.0.X and 10.251.0.Y are replaced by the addresses of the
> other ensemble members.
>
> (This is not specific to the "CLI"; other clients also have to be given
> a "sufficient" connection string to be able to failover. It doesn't
> *have* to reference the full ensemble, but providing a single member
> definitely won't cut it.)
>
> HTH, -D
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Sushil,

> I have put the gist of connection string and mntr outputs, I tried
> connecting to the left-over quorum cluster without any luck.
> https://gist.github.com/sushilkm/b8a540acc487830adaa5acae3a166d51

Combining this, from your notes:

    $ zkCli.sh -server "10.251.0.6:2181"

with what Andor pointed out:

>> zkCli.sh is trying to connect localhost only by default, if you run
>> it without parameters.
>>
>> If the node that you're trying to connect to is down (which is
>> completely fine, if you still have quorum), you should provide a
>> connection string (list of nodes) with at least 1 running server.

You are not running zkCli.sh without parameters, but you are only
telling it about a single server; it thus doesn't have anywhere to fall
back when that single node becomes unreachable.

Try something like:

    $ zkCli.sh -server "10.251.0.6:2181,10.251.0.X:2181,10.251.0.Y:2181"

where 10.251.0.X and 10.251.0.Y are replaced by the addresses of the
other ensemble members.

(This is not specific to the "CLI"; other clients also have to be given
a "sufficient" connection string to be able to failover. It doesn't
*have* to reference the full ensemble, but providing a single member
definitely won't cut it.)

HTH, -D
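The connection-string advice in the message above can be sketched as a small shell helper. This is only an illustration, not something posted in the thread: the function name `build_connect_string` is hypothetical, and the addresses in the usage comment are placeholders for the real ensemble members.

```shell
# Join ensemble members into a "host:port,host:port,..." connection string,
# so that a client has other servers to fall back on if one is unreachable.
# Hypothetical helper: first argument is the client port, the rest are hosts.
# The body runs in a subshell (parentheses) so its variables stay local.
build_connect_string() (
  port="$1"
  shift
  result=""
  for host in "$@"; do
    # Prepend the accumulated string and a comma only once it is non-empty.
    result="${result:+$result,}${host}:${port}"
  done
  printf '%s\n' "$result"
)

# Usage (placeholder addresses; substitute the actual ensemble members):
#   zkCli.sh -server "$(build_connect_string 2181 10.251.0.6 10.251.0.7 10.251.0.8)"
```

Listing every member is not required, as the message notes, but it is the simplest way to be sure the string contains at least one running server.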
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Hello Andor/Enrico/Damien,

I am not sure why I am not receiving the emails sent by other users
except Damien. There are more replies on the archive board than in my
mailbox.

I have put the gist of connection string and mntr outputs, I tried
connecting to the left-over quorum cluster without any luck.
https://gist.github.com/sushilkm/b8a540acc487830adaa5acae3a166d51

Thanks
Sushil Kumar

On Fri, Nov 22, 2019 at 1:32 PM Damien Diederen wrote:

> Hi Sushil,
>
> > Did I miss something?
> > What is Andor's suggestion?
>
> It seems you missed this message:
>
> https://mail-archives.apache.org/mod_mbox/zookeeper-user/201911.mbox/%3Cacc8526c7a99cb71962fb2b6f8c824f772c8032f.camel%40apache.org%3E
>
> HTH, -D
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Hi Sushil,

> Did I miss something?
> What is Andor's suggestion?

It seems you missed this message:

https://mail-archives.apache.org/mod_mbox/zookeeper-user/201911.mbox/%3Cacc8526c7a99cb71962fb2b6f8c824f772c8032f.camel%40apache.org%3E

HTH, -D
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Hello Damien,

Did I miss something?
What is Andor's suggestion?

Thanks
Sushil

On Thu, Nov 21, 2019, 10:36 AM Damien Diederen wrote:

> Hi Sushil,
>
> > This is what I see when running the monitor command
> >
> > $ echo mntr | nc nifi-investigate-zk-zk-1 2181
> > zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on
> > 10/08/2019 20:18 GMT
> […]
>
> Okay, the other nodes seem to work fine, indeed; this is unrelated to
> the issue I have encountered in the past.
>
> You may have more luck following Andor's suggestion.
>
> Best, -D
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Hi Sushil,

> This is what I see when running the monitor command
>
> $ echo mntr | nc nifi-investigate-zk-zk-1 2181
> zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on
> 10/08/2019 20:18 GMT
[…]

Okay, the other nodes seem to work fine, indeed; this is unrelated to
the issue I have encountered in the past.

You may have more luck following Andor's suggestion.

Best, -D
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Hi Sushil,

zkCli.sh is trying to connect localhost only by default, if you run
it without parameters.

If the node that you're trying to connect to is down (which is
completely fine, if you still have quorum), you should provide a
connection string (list of nodes) with at least 1 running server.

Andor

-----Original Message-----
From: Sushil Kumar
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org
Subject: Issues with leader shutdown in a 3-node zookeeper cluster
Date: Tue, 19 Nov 2019 17:09:08 -0800

Hello

I am trying to run a 3-node ZooKeeper cluster.
It starts up fine and I am able to access it.
However, as soon as I shut down the leader, some other node out of the
left-overs becomes the primary node, which I believe is working as
expected.

However, if I try to connect using zkCli.sh in this state, it cannot
connect; it always remains in the connecting state, and there is no way
now that I can access my ZooKeeper cluster.

The only way I have been able to fix this is to stop all nodes and start
them in sequence.

A couple of questions.
First of all, that zkCli.sh behavior with the cluster does not look like
a happy path to me. I doubt that my cluster is behaving well. Now, if
this cluster is not working, why does my cluster status appear as
working ("LEADER/FOLLOWER") for each left-over node?

I tried this with a 5-node cluster and noticed exactly the same behavior.
So I wonder how people generally manage a working ZooKeeper cluster with
the leader going down.

Thanks
Sushil Kumar
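The "if you still have quorum" condition in the reply above comes down to simple arithmetic. The sketch below assumes ZooKeeper's default majority quorum (no custom quorum groups or weights configured); the function names are hypothetical and exist only for illustration.

```shell
# Smallest number of voting members that forms a majority of N: floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can be down while the remainder is still a majority.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Usage:
#   quorum_size 3          # majority needed in a 3-node ensemble
#   tolerated_failures 5   # nodes a 5-node ensemble can lose
```

Under that assumption, stopping only the leader of a 3-node or 5-node ensemble leaves enough members to elect a new one, which matches the LEADER/FOLLOWER status reported for the remaining nodes.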
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Hello Damien,

Thanks for replying back on this.
This is what I see when running the monitor command:

$ echo mntr | nc nifi-investigate-zk-zk-1 2181
zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
zk_avg_latency 0
zk_max_latency 9
zk_min_latency 0
zk_packets_received 607609
zk_packets_sent 607608
zk_num_alive_connections 2
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 9
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 281
zk_open_file_descriptor_count 68
zk_max_file_descriptor_count 4096

$ echo mntr | nc nifi-investigate-zk-zk-2 2181
zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
zk_avg_latency 0
zk_max_latency 17
zk_min_latency 0
zk_packets_received 41179
zk_packets_sent 41178
zk_num_alive_connections 3
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 9
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 281
zk_open_file_descriptor_count 70
zk_max_file_descriptor_count 4096
zk_followers 1
zk_synced_followers 1
zk_pending_syncs 0
zk_last_proposal_size 32
zk_max_proposal_size 125
zk_min_proposal_size 32

Regarding the hostname resolution: I am not using any zoo.conf;
hostnames are resolved by DNS itself.

Thanks
Sushil Kumar

On Wed, Nov 20, 2019 at 12:01 AM Damien Diederen wrote:

> Hi Sushil,
>
> > I am trying to run a 3-node ZooKeeper cluster.
> > It starts up fine and I am able to access it.
> > However, as soon as I shut down the leader, some other node out of the
> > left-overs becomes the primary node, which I believe is working as
> > expected.
>
> Are you sure about that? Does everything look normal if you issue a
> "monitor" command on one of the survivors, using either:
>
>     echo mntr | nc example.com 2181
>
> or by visiting:
>
>     http://example.com:8080/commands/monitor
>
> Or do you get a message such as "This ZooKeeper instance is not
> currently serving requests"?
>
> > However, if I try to connect using zkCli.sh in this state, it cannot
> > connect; it always remains in the connecting state, and there is no way
> > now that I can access my ZooKeeper cluster.
> >
> > The only way I have been able to fix this is to stop all nodes and start
> > them in sequence.
> >
> > A couple of questions.
> > First of all, that zkCli.sh behavior with the cluster does not look like
> > a happy path to me. I doubt that my cluster is behaving well. Now, if
> > this cluster is not working, why does my cluster status appear as
> > working ("LEADER/FOLLOWER") for each left-over node?
>
> I have seen such problems in some configurations where the ensemble was
> unable to recover due to flaky (?) host name resolution, and have found
> using IP addresses in zoo.conf to be more reliable. Are you using host
> names in zoo.conf?
>
> > I tried this with a 5-node cluster and noticed exactly the same behavior.
> > So I wonder how people generally manage a working ZooKeeper cluster with
> > the leader going down.
>
> Best, -D

--
Thanks
Sushil Kumar
+1-(206)-698-4116
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Hi Sushil,

> I am trying to run a 3-node ZooKeeper cluster.
> It starts up fine and I am able to access it.
> However, as soon as I shut down the leader, some other node out of the
> left-overs becomes the primary node, which I believe is working as
> expected.

Are you sure about that? Does everything look normal if you issue a
"monitor" command on one of the survivors, using either:

    echo mntr | nc example.com 2181

or by visiting:

    http://example.com:8080/commands/monitor

Or do you get a message such as "This ZooKeeper instance is not
currently serving requests"?

> However, if I try to connect using zkCli.sh in this state, it cannot
> connect; it always remains in the connecting state, and there is no way
> now that I can access my ZooKeeper cluster.
>
> The only way I have been able to fix this is to stop all nodes and start
> them in sequence.
>
> A couple of questions.
> First of all, that zkCli.sh behavior with the cluster does not look like
> a happy path to me. I doubt that my cluster is behaving well. Now, if
> this cluster is not working, why does my cluster status appear as
> working ("LEADER/FOLLOWER") for each left-over node?

I have seen such problems in some configurations where the ensemble was
unable to recover due to flaky (?) host name resolution, and have found
using IP addresses in zoo.conf to be more reliable. Are you using host
names in zoo.conf?

> I tried this with a 5-node cluster and noticed exactly the same behavior.
> So I wonder how people generally manage a working ZooKeeper cluster with
> the leader going down.

Best, -D
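To run the "monitor" check above against every survivor at once, the role of each node can be pulled out of the `mntr` output with a one-line filter. This is a rough sketch: the function name `server_state` and the host names in the usage comment are hypothetical, and it assumes `mntr` is an enabled four-letter command on the servers and that `nc` is installed.

```shell
# Read `mntr` output on stdin and print only the zk_server_state value
# (for example "leader" or "follower"). Prints nothing if the key is absent,
# e.g. when the server answers "This ZooKeeper instance is not currently
# serving requests" or when the connection fails.
server_state() {
  awk '$1 == "zk_server_state" { print $2 }'
}

# Usage against live nodes (placeholder host names):
#   for h in zk-0 zk-1 zk-2; do
#     printf '%s: ' "$h"
#     echo mntr | nc "$h" 2181 | server_state
#     echo
#   done
```

A healthy ensemble should report exactly one leader among the running nodes and followers for the rest.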
Re: Issues with leader shutdown in a 3-node zookeeper cluster
Hi Sushil,

On Wed, Nov 20, 2019, 02:22 Sushil Kumar wrote:

> Hello
>
> I am trying to run a 3-node ZooKeeper cluster.
> It starts up fine and I am able to access it.
> However, as soon as I shut down the leader, some other node out of the
> left-overs becomes the primary node, which I believe is working as
> expected.
>
> However, if I try to connect using zkCli.sh in this state

What does your connection string look like? Are you passing the list of
all of the servers?
From which machine are you using zkCli?

Enrico

> , it cannot
> connect; it always remains in the connecting state, and there is no way
> now that I can access my ZooKeeper cluster.
>
> The only way I have been able to fix this is to stop all nodes and start
> them in sequence.
>
> A couple of questions.
> First of all, that zkCli.sh behavior with the cluster does not look like
> a happy path to me. I doubt that my cluster is behaving well. Now, if
> this cluster is not working, why does my cluster status appear as
> working ("LEADER/FOLLOWER") for each left-over node?
>
> I tried this with a 5-node cluster and noticed exactly the same behavior.
> So I wonder how people generally manage a working ZooKeeper cluster with
> the leader going down.
>
> Thanks
> Sushil Kumar
Issues with leader shutdown in a 3-node zookeeper cluster
Hello

I am trying to run a 3-node ZooKeeper cluster.
It starts up fine and I am able to access it.
However, as soon as I shut down the leader, some other node out of the
left-overs becomes the primary node, which I believe is working as
expected.

However, if I try to connect using zkCli.sh in this state, it cannot
connect; it always remains in the connecting state, and there is no way
now that I can access my ZooKeeper cluster.

The only way I have been able to fix this is to stop all nodes and start
them in sequence.

A couple of questions.
First of all, that zkCli.sh behavior with the cluster does not look like
a happy path to me. I doubt that my cluster is behaving well. Now, if
this cluster is not working, why does my cluster status appear as
working ("LEADER/FOLLOWER") for each left-over node?

I tried this with a 5-node cluster and noticed exactly the same behavior.
So I wonder how people generally manage a working ZooKeeper cluster with
the leader going down.

Thanks
Sushil Kumar