Re: Issues with leader shutdown in a 3-node zookeeper cluster

2021-02-23 Thread diego2glez
Hi Andor,

This issue never got a solution, and I'm experiencing the same symptoms. The
ZooKeeper client doesn't connect to any node of the cluster. A NiFi cluster
that relies on ZooKeeper to choose its cluster coordinator can't connect either.

Regards,
Diego





Re: Issues with leader shutdown in a 3-node zookeeper cluster

2020-01-06 Thread Andor Molnar
Hi Sushil,

None of your leftover servers are responding to the client session creation 
requests (client timeouts), but the socket can be established correctly. Would 
you please share your server logs too?

Andor






Issues with leader shutdown in a 3-node zookeeper cluster

2019-12-02 Thread Sushil Kumar
I am still struggling to find a fix for this issue.
Another problem is that I don't receive any emails on this thread except
Damien's. I am not saying that nobody replies; I am saying that the replies
are not reaching me. I am not sure what is going on; they are not even in
the spam folder.


-- 
-- 

Thanks

Sushil Kumar
+1-(206)-698-4116


Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-29 Thread Enrico Olivelli
Sushil
I am sorry,
I did not send any other email :-)
I saw Damien is already giving sensible advice.

Enrico




Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-27 Thread Sushil Kumar
Thanks Damien for the reply.

That was something I had already tried.
I wrote a single IP in my notes to show that even specific running nodes
do not accept the connection.

Can you by any chance include in this email the other people who replied
earlier? I don't have their email addresses, since I never received their
replies and the archive does not show email addresses.




Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-26 Thread Damien Diederen


Sushil,

> I have put the gist of connection string and mntr outputs, i tried
> connecting to the left-over quorum cluster without any luck.
> https://gist.github.com/sushilkm/b8a540acc487830adaa5acae3a166d51

Combining this, from your notes:

$ zkCli.sh -server "10.251.0.6:2181"

with what Andor pointed out:

>> zkCli.sh is trying to connect localhost only by default, if you run
>> it without parameters.
>>
>> If the node that you're trying to connect to is down (which is
>> completely fine, if you still have quorum), you should provide a
>> connection string (list of nodes) with at least 1 running server.

You are not running zkCli.sh without parameters, but you are only
telling it about a single server; it thus doesn't have anywhere to fall
back when that single node becomes unreachable.

Try something like:

$ zkCli.sh -server "10.251.0.6:2181,10.251.0.X:2181,10.251.0.Y:2181"

where 10.251.0.X and 10.251.0.Y are replaced by the addresses of the
other ensemble members.

(This is not specific to the "CLI"; other clients also have to be given
a "sufficient" connection string to be able to failover.  It doesn't
*have* to reference the full ensemble, but providing a single member
definitely won't cut it.)
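
As a rough illustration of the point above, here is a minimal Python sketch
that assembles a comma-separated connection string from several ensemble
members and splits it back into (host, port) pairs. The function names and the
`zk-b.example` / `zk-c.example` hosts are hypothetical placeholders, not
addresses from this cluster; the sketch assumes IPv4 addresses or plain host
names and the default client port 2181.

```python
DEFAULT_CLIENT_PORT = 2181  # ZooKeeper's conventional client port


def build_connect_string(hosts, port=DEFAULT_CLIENT_PORT):
    """Join ensemble members into 'host1:port,host2:port,...'."""
    members = []
    for host in hosts:
        host = host.strip()
        if not host:
            continue
        # Keep an explicit port if the caller already supplied one.
        members.append(host if ":" in host else f"{host}:{port}")
    if not members:
        raise ValueError("at least one ensemble member is required")
    return ",".join(members)


def parse_connect_string(connect_string):
    """Split a connection string back into a list of (host, port) pairs."""
    pairs = []
    for member in connect_string.split(","):
        member = member.strip()
        if not member:
            continue
        if ":" in member:
            host, _, port = member.rpartition(":")
            pairs.append((host, int(port)))
        else:
            pairs.append((member, DEFAULT_CLIENT_PORT))
    return pairs


if __name__ == "__main__":
    # Hypothetical members; replace with the real ensemble addresses.
    print(build_connect_string(["10.251.0.6", "zk-b.example", "zk-c.example"]))
```

The resulting string is what would be passed to `zkCli.sh -server` or to a
client library, so that the client has more than one member to try.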

HTH, -D


Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-25 Thread Sushil Kumar
Hello Andor/Enrico/Damien

I am not sure why I am not receiving the emails sent by other users,
only those from Damien.

There are more replies on the archive board than in my mailbox.

I have put the connection string and mntr outputs in a gist. I tried
connecting to the left-over quorum cluster without any luck.
https://gist.github.com/sushilkm/b8a540acc487830adaa5acae3a166d51


Thanks
Sushil Kumar



Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-22 Thread Damien Diederen


Hi Sushil,

> Did I miss something?
> What is Andor's suggestion?

It seems you missed this message:


https://mail-archives.apache.org/mod_mbox/zookeeper-user/201911.mbox/%3Cacc8526c7a99cb71962fb2b6f8c824f772c8032f.camel%40apache.org%3E

HTH, -D


Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-21 Thread Sushil Kumar
Hello Damien

Did I miss something?
What is Andor's suggestion?

Thanks
Sushil




Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-21 Thread Damien Diederen


Hi Sushil,

> This is what I see when running the monitor command
>
> $ echo mntr | nc nifi-investigate-zk-zk-1 2181
> zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on
> 10/08/2019 20:18 GMT
[…]

Okay, the other nodes seem to work fine, indeed; this is unrelated to
the issue I have encountered in the past.

You may have more luck following Andor's suggestion.

Best, -D


Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-20 Thread Andor Molnar
Hi Sushil,

zkCli.sh is trying to connect localhost only by default, if you run it
without parameters.

If the node that you're trying to connect to is down (which is
completely fine, if you still have quorum), you should provide a
connection string (list of nodes) with at least 1 running server.
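
To make the "if you still have quorum" condition concrete, the sketch below
computes the majority needed by an ensemble. It assumes the default majority
quorum with every server a voting member (no observers, no weighted groups);
the function names are invented for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of voting servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size, running):
    """True if the running servers are enough to form a majority."""
    return running >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 5):
        print(n, "servers: quorum", quorum_size(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

Under these assumptions a 3-node ensemble keeps quorum with one server down,
and a 5-node ensemble with two down.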

Andor



-----Original Message-----
From: Sushil Kumar 
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org
Subject: Issues with leader shutdown in a 3-node zookeeper cluster
Date: Tue, 19 Nov 2019 17:09:08 -0800

Hello


I am trying to run a 3-node ZooKeeper cluster.
It starts up fine and I am able to access it.
However, as soon as I shut down the leader, one of the remaining nodes
becomes the new leader, which I believe is working as expected.

However, if I try to connect using zkCli.sh in this state, it cannot
connect; it always remains in the connecting state, and there is no way
for me to access my ZooKeeper cluster.

The only fix I have found is to stop all nodes and start them in
sequence.

A couple of questions.
First of all, that zkCli.sh behavior with the cluster does not look like
a happy path to me. I doubt that my cluster is behaving well. And if this
cluster is not working, why does the cluster status appear healthy
("LEADER/FOLLOWER") for each remaining node?

I tried this with a 5-node cluster and noticed exactly the same behavior.
So I wonder how people generally keep a ZooKeeper cluster working when the
leader goes down.

Thanks
Sushil Kumar



Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-20 Thread Sushil Kumar
Hello Damien
Thanks for replying back on this.

This is what I see when running the monitor command

$ echo mntr | nc nifi-investigate-zk-zk-1 2181
zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on
10/08/2019 20:18 GMT
zk_avg_latency 0
zk_max_latency 9
zk_min_latency 0
zk_packets_received 607609
zk_packets_sent 607608
zk_num_alive_connections 2
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 9
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 281
zk_open_file_descriptor_count 68
zk_max_file_descriptor_count 4096

$ echo mntr | nc nifi-investigate-zk-zk-2 2181
zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on
10/08/2019 20:18 GMT
zk_avg_latency 0
zk_max_latency 17
zk_min_latency 0
zk_packets_received 41179
zk_packets_sent 41178
zk_num_alive_connections 3
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 9
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 281
zk_open_file_descriptor_count 70
zk_max_file_descriptor_count 4096
zk_followers 1
zk_synced_followers 1
zk_pending_syncs 0
zk_last_proposal_size 32
zk_max_proposal_size 125
zk_min_proposal_size 32
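
For comparing outputs like the two above across nodes, a small parser can
help. This is only a sketch: it assumes each statistic sits on its own line
as a `zk_*` key followed by whitespace and a value, and it ignores lines that
do not start with `zk_` (such as the wrapped second half of the `zk_version`
line as it appears in this email). The function names are hypothetical.

```python
def parse_mntr(text):
    """Turn raw `mntr` output into a dict of key -> string value."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("zk_"):
            continue  # blank lines or wrapped continuation lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        stats[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return stats


def summarize(stats):
    """Extract the fields most relevant to a leader-failover check."""
    state = stats.get("zk_server_state", "unknown")
    summary = {"state": state}
    if state == "leader":
        # These two keys are only reported by the leader.
        summary["followers"] = int(stats.get("zk_followers", 0))
        summary["synced_followers"] = int(stats.get("zk_synced_followers", 0))
    return summary
```

Applied to the second output above, the summary would report a leader with
one follower and one synced follower.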

Regarding hostname resolution, I am not using any zoo.conf; hostnames are
resolved by DNS itself.

Thanks
Sushil Kumar



-- 
-- 

Thanks

Sushil Kumar
+1-(206)-698-4116


Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-20 Thread Damien Diederen


Hi Sushil,

> I am trying to run a 3-node zookeeper cluster.
> It starts up good and I am able to access it.
> However, as soon as I shutdown the leader, some other node out of
> left-overs becomes a primary node which I believe is working as expected.

Are you sure about that?  Does everything look normal if you issue a
"monitor" command on one of the survivors, using either:

echo mntr | nc example.com 2181

or by visiting:

http://example.com:8080/commands/monitor

Or do you get a message such as "This ZooKeeper instance is not
currently serving requests"?
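
The same check can be scripted without `nc`. The following is a minimal
sketch using only the Python standard library: it sends a four-letter command
over a plain TCP socket and reads until the server closes the connection. The
helper names are hypothetical, the host passed in is a placeholder, and it
assumes the command is enabled on the server.

```python
import socket

# The message quoted above, returned by a server that is up but not serving.
NOT_SERVING = "This ZooKeeper instance is not currently serving requests"


def four_letter_word(host, port=2181, command="mntr", timeout=5.0):
    """Send a four-letter command and return the raw text response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_serving(response):
    """True if the response is non-empty and is not the 'not serving' notice."""
    return bool(response.strip()) and NOT_SERVING not in response
```

Usage would look like `is_serving(four_letter_word("example.com"))`, run once
against each surviving node.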

> However, if I try to connect using the zkCli.sh in this state, it cannot
> connect, it always remains in connecting state, and there is no way now
> that I can access my zookeeper cluster.
>
> The only way I have been able to fix is stop all nodes and start then in
> sequence.
>
> Couple of questions.
> First of all that zkCli.sh behavior with the cluster does not looks
> something a happy path to me. I doubt if my cluster is behaving good. Now
> if this cluster is not working why does my cluster status appear working
> "LEADER/FOLLOWER" for each left over node.

I have seen such problems in some configurations where the ensemble was
unable to recover due to flaky (?) host name resolution, and have found
using IP addresses in zoo.conf to be more reliable.  Are you using host
names in zoo.conf?
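
One way to look for the kind of name-resolution trouble described above is to
resolve every ensemble host name from each node and compare the results. The
sketch below is an assumption-laden illustration, not a diagnosis tool: the
function names are hypothetical, and the resolver is injectable so the logic
can be exercised without real DNS.

```python
import socket


def resolve_members(hostnames, resolver=socket.gethostbyname_ex):
    """Map each host name to a sorted list of IPv4 addresses ([] on failure)."""
    results = {}
    for name in hostnames:
        try:
            _canonical, _aliases, addresses = resolver(name)
            results[name] = sorted(addresses)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            results[name] = []
    return results


def unresolved(results):
    """Names that did not resolve to any address."""
    return [name for name, addresses in results.items() if not addresses]
```

Running this on every node with the same list of ensemble host names, and
comparing the output, would show whether any member sees a different or
missing address.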

> I tried this with 5-node cluster and noticed exactly the same behavior.
> So I wonder how do people generally manage a working zookeeper cluster with
> leader going down.

Best, -D


Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-19 Thread Enrico Olivelli
Hi Sushil

On Wed, Nov 20, 2019, 02:22 Sushil Kumar wrote:

> Hello
>
>
> I am trying to run a 3-node zookeeper cluster.
> It starts up good and I am able to access it.
> However, as soon as I shutdown the leader, some other node out of
> left-overs becomes a primary node which I believe is working as expected.
>
> However, if I try to connect using the zkCli.sh in this state, it cannot
> connect, it always remains in connecting state, and there is no way now
> that I can access my zookeeper cluster.


What does your connection string look like? Are you passing the list of all
of the servers?
From which machine are you using zkCli?


Enrico


>
> The only way I have been able to fix is stop all nodes and start then in
> sequence.
>
> Couple of questions.
> First of all that zkCli.sh behavior with the cluster does not looks
> something a happy path to me. I doubt if my cluster is behaving good. Now
> if this cluster is not working why does my cluster status appear working
> "LEADER/FOLLOWER" for each left over node.
>
> I tried this with 5-node cluster and noticed exactly the same behavior.
> So I wonder how do people generally manage a working zookeeper cluster with
> leader going down.
>
> Thanks
> Sushil Kumar
>


Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-19 Thread Sushil Kumar
Hello


I am trying to run a 3-node ZooKeeper cluster.
It starts up fine and I am able to access it.
However, as soon as I shut down the leader, one of the remaining nodes
becomes the new leader, which I believe is working as expected.

However, if I try to connect using zkCli.sh in this state, it cannot
connect; it always remains in the connecting state, and there is no way
for me to access my ZooKeeper cluster.

The only fix I have found is to stop all nodes and start them in
sequence.

A couple of questions.
First of all, that zkCli.sh behavior with the cluster does not look like
a happy path to me. I doubt that my cluster is behaving well. And if this
cluster is not working, why does the cluster status appear healthy
("LEADER/FOLLOWER") for each remaining node?

I tried this with a 5-node cluster and noticed exactly the same behavior.
So I wonder how people generally keep a ZooKeeper cluster working when the
leader goes down.

Thanks
Sushil Kumar