[jira] [Commented] (ZOOKEEPER-3036) Unexpected exception in zookeeper

2021-03-22 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306375#comment-17306375
 ] 

Lasaro Camargos commented on ZOOKEEPER-3036:


This issue still happens in 3.5.8, on a 3-node cluster.
Are there any plans to address this issue?

> Unexpected exception in zookeeper
> -
>
> Key: ZOOKEEPER-3036
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.4.10
> Environment: 3 Zookeepers, 5 kafka servers
>Reporter: Oded
>Priority: Critical
>
> We hit an issue with one of the ZooKeeper servers (the Leader), causing the entire Kafka 
> cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR 
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected 
> exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>     at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>     at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN  
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - *** GOODBYE 
> /192.168.0.91:42490 
>  
> We would expect ZooKeeper to elect another Leader and the Kafka 
> cluster to continue working as expected, but that was not the case.
>  
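
The stack trace above is a blocking DataInputStream.readInt() hitting the socket's read timeout. A minimal, self-contained sketch (not ZooKeeper code; the 200 ms timeout is illustrative, whereas LearnerHandler derives its timeout from the quorum config, roughly tickTime multiples) of how that exception surfaces:

```java
import java.io.DataInputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             // Client connects but never writes, like a stalled follower.
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket peer = server.accept()) {
            peer.setSoTimeout(200); // illustrative; real value comes from the quorum config
            DataInputStream in = new DataInputStream(peer.getInputStream());
            try {
                in.readInt(); // blocks; no data ever arrives
            } catch (SocketTimeoutException e) {
                System.out.println("Read timed out");
            }
        }
    }
}
```

With no bytes ever written by the peer, the read reliably times out rather than returning.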



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-3775) Wrong message in IOException

2020-04-08 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078718#comment-17078718
 ] 

Lasaro Camargos commented on ZOOKEEPER-3775:


Thank you, [~phunt]. But [~shireennagdive] has already provided the PR and is 
probably going to be more active in the project than I am. Could you add her as a 
contributor as well, so I can reassign it to her?

Regards

> Wrong message in IOException
> 
>
> Key: ZOOKEEPER-3775
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3775
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Reporter: Lasaro Camargos
>Assignee: Lasaro Camargos
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The run method of QuorumCnxManager throws the following exception:
> if (length <= 0 || length > PACKETMAXSIZE) {
>     throw new IOException("Received packet with invalid packet: " + length);
> }
> Instead of the current string, the message should be "Received packet with 
> invalid length: "





[jira] [Commented] (ZOOKEEPER-3775) Wrong message in IOException

2020-04-08 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078609#comment-17078609
 ] 

Lasaro Camargos commented on ZOOKEEPER-3775:


Hi Shireen. I am not well versed in the processes followed by this project.

[~phunt], as an active member, could you chip in here? Maybe assign the bug and 
trigger the build?

Cheers






[jira] [Commented] (ZOOKEEPER-3775) Wrong message in IOException

2020-04-05 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17075945#comment-17075945
 ] 

Lasaro Camargos commented on ZOOKEEPER-3775:


Hi Shireen. I do not have the credentials needed to assign the JIRA. But given 
that it is a fairly simple issue, I would recommend that you create a PR and 
post it here and someone else will do it.

Cheers.






[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-04-03 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074812#comment-17074812
 ] 

Lasaro Camargos commented on ZOOKEEPER-3769:


thanks for driving this, [~symat]

 

> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Assignee: Mate Szalay-Beko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.1, 3.5.8
>
> Attachments: node1.log, node2.log, node3.log
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/company/service/data
>  dataLogDir=/company/service/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network, so I am wondering if bugs like ZOOKEEPER-3756 could 
> be causing it.
>  
> In the logs, node3 is killed at 11:17:14
> node2 is killed at 11:17:50 and node1 at 11:18:02
>  
>  
>  
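
For reference, initLimit and syncLimit in the config above are expressed in ticks; a small sketch of the effective timeouts this configuration implies, using ZooKeeper's standard tickTime multiplication:

```java
public class QuorumTimeouts {
    public static void main(String[] args) {
        int tickTime = 2000;  // ms, from the config above
        int initLimit = 30;   // ticks a follower may take for its initial sync
        int syncLimit = 3;    // ticks a follower may lag behind the leader

        // Standard ZooKeeper semantics: limits are multiples of tickTime.
        System.out.println("init timeout (ms): " + initLimit * tickTime);
        System.out.println("sync timeout (ms): " + syncLimit * tickTime);
    }
}
```

So followers in this setup get 60 s to sync initially but are dropped after lagging only 6 s, which is worth keeping in mind when reading the timestamps in the logs.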





[jira] [Created] (ZOOKEEPER-3775) Wrong message in IOException

2020-03-30 Thread Lasaro Camargos (Jira)
Lasaro Camargos created ZOOKEEPER-3775:
--

 Summary: Wrong message in IOException
 Key: ZOOKEEPER-3775
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3775
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Reporter: Lasaro Camargos


The run method of QuorumCnxManager throws the following exception:

if (length <= 0 || length > PACKETMAXSIZE) {
    throw new IOException("Received packet with invalid packet: " + length);
}

Instead of the current string, the message should be "Received packet with 
invalid length: "
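
A minimal sketch of the proposed fix (the PACKETMAXSIZE value here is illustrative, not the actual QuorumCnxManager constant): the guard is unchanged and only the exception message is corrected:

```java
import java.io.IOException;

public class PacketLengthCheck {
    // Illustrative stand-in for QuorumCnxManager's real limit.
    static final int PACKETMAXSIZE = 1024 * 512;

    static void checkLength(int length) throws IOException {
        if (length <= 0 || length > PACKETMAXSIZE) {
            // Corrected wording: it is the *length* that is invalid.
            throw new IOException("Received packet with invalid length: " + length);
        }
    }

    public static void main(String[] args) {
        try {
            checkLength(-4);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```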





[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-29 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070311#comment-17070311
 ] 

Lasaro Camargos commented on ZOOKEEPER-3769:


I had to backtrack on it happening with Netty. The factory was
misconfigured and it was actually running on NIO.
Regarding the version, I tried 3.5.5 and 3.5.7.

Lásaro


On Sun, Mar 29, 2020 at 4:26 AM ASF GitHub Bot (Jira) 








[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-27 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068985#comment-17068985
 ] 

Lasaro Camargos commented on ZOOKEEPER-3769:


[~symat], thanks for the updated patch. I gave it a spin and it is working, as 
in it's not regressing anything else. I cannot confirm that it handles the issue 
I had, as I still haven't managed to reproduce it.
Trying to answer your questions:
 # There is nothing particular to this setup; all are physical boxes, running 
on the same network, OS (CentOS 7), and Java version (12).
 # During the time the problem reproduced, I had multiple runs in which I just 
restarted the service, but also runs in which I cleaned the setup. It 
consistently reproduced, until it didn't. Whatever it was, it doesn't seem 
related to the snapshots.
 # Regarding dynamic reconfiguration, no, I haven't used it in this setup.
 # You had asked me if I had tried Netty. Please ignore my previous response; I 
didn't try it while the problem still reproduced.

Even if I cannot reproduce it, I still think this is a fix worth having. Please 
submit the PR.

Should I change the Jira name to better reflect what actually happened?

 






[jira] [Updated] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-26 Thread Lasaro Camargos (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lasaro Camargos updated ZOOKEEPER-3769:
---
Description: 
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
 initLimit=30
 syncLimit=3
 dataDir=/company/service/data
 dataLogDir=/company/service/log
 clientPort=2181
 snapCount=10
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 skipACL=yes
 preAllocSize=65536
 maxClientCnxns=0
 4lw.commands.whitelist=*
 admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
 server.2=companydemo2.snc4.companyinc.com:3000:4000
 server.3=companydemo3.snc4.companyinc.com:3000:4000

 

Could you have a look at the logs and help me figure this out? It seems like 
node 1 is not getting notifications back from node2, but I don't see anything 
wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could be 
causing it.

 

In the logs, node3 is killed at 11:17:14

node2 is killed at 11:17:50 and node1 at 11:18:02

 

 

 

  was:
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
 initLimit=30
 syncLimit=3
 dataDir=/hedvig/hpod/data
 dataLogDir=/hedvig/hpod/log
 clientPort=2181
 snapCount=10
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 skipACL=yes
 preAllocSize=65536
 maxClientCnxns=0
 4lw.commands.whitelist=*
 admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
 server.2=companydemo2.snc4.companyinc.com:3000:4000
 server.3=companydemo3.snc4.companyinc.com:3000:4000

 

Could you have a look at the logs and help me figure this out? It seems like 
node 1 is not getting notifications back from node2, but I don't see anything 
wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could be 
causing it.

 

In the logs, node3 is killed at 11:17:14

node2 is killed at 11:17:50 2 and node 1 at 11:18:02 

 

 

 







[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-26 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068032#comment-17068032
 ] 

Lasaro Camargos commented on ZOOKEEPER-3769:


I went back and looked into some older logs and could confirm that the 
WorkerReceiver died, and that's what caused the election to hang. However, the 
BufferUnderflowException was present in very few instances. Most of the time, 
it was a NegativeArraySizeException that was caught, but in pretty much the 
same situation, that is, after the connection to node3 was broken. The 
following are excerpts from node1 and node3. Let me know if you would like to 
have a look at the full logs.

03/23/20 10:14:45,772 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.ZooKeeperServer] (ZooKeeperServer.java:166) - 
Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 
4 datadir /company/service/log/version-2 snapdir 
/company/service/data/version-2

03/23/20 10:14:45,772 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.quorum.Learner] (Follower.java:69) - FOLLOWING - 
LEADER ELECTION TOOK - 9 MS

03/23/20 10:14:45,774 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG 
[org.apache.zookeeper.server.quorum.QuorumPeer] (QuorumPeer.java:202) - 
Resolved address for companydemo3.snc4.companyinc.com: 
companydemo3.snc4.companyinc.com/172.22.64.148

03/23/20 10:14:45,793 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE 
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i UNKNOWN17 
5 null

03/23/20 10:14:45,798 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE 
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i DIFF 
4001f null

03/23/20 10:14:45,799 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.quorum.Learner] (Learner.java:391) - Getting a 
diff from the leader 0x4001f

03/23/20 10:14:45,801 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE 
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i NEWLEADER 
5 null

03/23/20 10:14:45,801 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.quorum.Learner] (Learner.java:546) - Learner 
received NEWLEADER message

03/23/20 10:14:45,815 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE 
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i UPTODATE 
 null

03/23/20 10:14:45,816 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.quorum.Learner] (Learner.java:529) - Learner 
received UPTODATE message

03/23/20 10:14:45,816 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG 
[org.apache.zookeeper.server.quorum.QuorumPeer] (QuorumPeer.java:1916) - 
Reconfig feature is disabled, skip reconfig processing.

03/23/20 10:14:45,817 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.quorum.CommitProcessor] (CommitProcessor.java:256) 
- Configuring CommitProcessor with 32 worker threads.

03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] INFO 
[org.apache.zookeeper.server.quorum.QuorumCnxManager] 
(QuorumCnxManager.java:924) - Received connection request 172.22.30.98:58472

03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] 
DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager] 
(QuorumCnxManager.java:1038) - Address of remote peer: 3

03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] 
DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager] 
(QuorumCnxManager.java:1055) - Calling finish for 3

03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] 
DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager] 
(QuorumCnxManager.java:1072) - Removing entry from senderWorkerMap sid=3

03/23/20 10:14:46,065 [SendWorker:3] WARN 
[org.apache.zookeeper.server.quorum.QuorumCnxManager] 
(QuorumCnxManager.java:1143) - Interrupted while waiting for message on queue

java.lang.InterruptedException: null

at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
 ~[?:?]

at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
 ~[?:?]

at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) 
~[?:?]

at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
 ~[zookeeper-3.5.7.jar:3.5.7]

at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.acce
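
A NegativeArraySizeException in that code path is consistent with a garbage length prefix being read off a broken connection; a minimal sketch (not ZooKeeper code) of the failure mode, where a bogus length of -1 is read and then used to size a buffer:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class BrokenLengthDemo {
    public static void main(String[] args) throws IOException {
        // Four 0xFF bytes decode to the int -1: what a corrupted or
        // half-written length prefix can look like after the peer drops.
        byte[] wire = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        int length = in.readInt(); // yields -1
        try {
            byte[] msg = new byte[length]; // sizing an array with a negative length
        } catch (NegativeArraySizeException e) {
            System.out.println("Caught NegativeArraySizeException (length = " + length + ")");
        }
    }
}
```

If the thread reading these prefixes does not catch this unchecked exception, it dies silently, which matches the WorkerReceiver behavior described above.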

[jira] [Comment Edited] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-25 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066943#comment-17066943
 ] 

Lasaro Camargos edited comment on ZOOKEEPER-3769 at 3/25/20, 7:20 PM:
--

Thank you for the analysis, [~symat].

Wrt testing with Netty: before trying SASL I did try just Netty, but the 
behavior was exactly the same.

Wrt using an older JDK: I reverted all my changes to the configs and put 
back the original version, 3.5.5, but didn't get to try another JDK. The problem 
no longer reproduces, and I am still trying to figure out if/what I am missing 
that might have changed the setup.

 

Regarding not handling the BufferUnderflowException properly, yes, it makes 
sense; the thread died and wasn't recreated so no more messages were ever 
received.

 

 


was (Author: lasaro):
Thank you for the analysis, [~symat].

Wrt to testing with NETTY, before trying SASL I did try just NETTY, but the 
behavior was exactly the same.

Wrt to using an older JDK, I reverted all my changes to the configs and put 
back the original version, 3.5.5, but didn't get to try other JDK. The problem 
no longer reproduces and I am still trying to figure if/what I am missing that 
might have changed the setup.

 






[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-25 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066943#comment-17066943
 ] 

Lasaro Camargos commented on ZOOKEEPER-3769:


Thank you for the analysis, [~symat].

Wrt to testing with NETTY, before trying SASL I did try just NETTY, but the 
behavior was exactly the same.

Wrt to using an older JDK, I reverted all my changes to the configs and put 
back the original version, 3.5.5, but didn't get to try other JDK. The problem 
no longer reproduces and I am still trying to figure if/what I am missing that 
might have changed the setup.

 






[jira] [Comment Edited] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-24 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066245#comment-17066245
 ] 

Lasaro Camargos edited comment on ZOOKEEPER-3769 at 3/24/20, 11:02 PM:
---

After I enabled SASL in order to force the asynchronous creation of sockets, 
the problem no longer reproduces. Hence I am guessing this might be related to 
ZOOKEEPER-900.


was (Author: lasaro):
After I enabled SASL in order to force the asynchronous creation of sockets, 
the problem no longer reproduces.






[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-24 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066245#comment-17066245
 ] 

Lasaro Camargos commented on ZOOKEEPER-3769:


After I enabled SASL in order to force the asynchronous creation of sockets, 
the problem no longer reproduces.

> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Assignee: Mate Szalay-Beko
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/hedvig/hpod/data
>  dataLogDir=/hedvig/hpod/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node 1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could 
> be causing it.
>  
> In the logs, node3 is killed at 11:17:14
> node2 is killed at 11:17:50 and node 1 at 11:18:02
>  
>  
>  





[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-24 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066185#comment-17066185
 ] 

Lasaro Camargos commented on ZOOKEEPER-3769:


To complement the description of the behavior (the following is not covered by 
the logs): if I bring node 3 back up, it becomes the leader, node 2 becomes a 
follower, and node 1 does not finish the election.

If I stop and restart node 1, it then joins the cluster successfully.

It seems like the connection from node 1 to node 2 needs a "refresh" in order 
to work properly.

> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Assignee: Mate Szalay-Beko
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/hedvig/hpod/data
>  dataLogDir=/hedvig/hpod/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node 1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could 
> be causing it.
>  
> In the logs, node3 is killed at 11:17:14
> node2 is killed at 11:17:50 and node 1 at 11:18:02
>  
>  
>  





[jira] [Updated] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-24 Thread Lasaro Camargos (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lasaro Camargos updated ZOOKEEPER-3769:
---
Description: 
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
 initLimit=30
 syncLimit=3
 dataDir=/hedvig/hpod/data
 dataLogDir=/hedvig/hpod/log
 clientPort=2181
 snapCount=10
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 skipACL=yes
 preAllocSize=65536
 maxClientCnxns=0
 4lw.commands.whitelist=*
 admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
 server.2=companydemo2.snc4.companyinc.com:3000:4000
 server.3=companydemo3.snc4.companyinc.com:3000:4000

 

Could you have a look at the logs and help me figure this out? It seems like 
node 1 is not getting notifications back from node2, but I don't see anything 
wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could be 
causing it.

 

In the logs, node3 is killed at 11:17:14

node2 is killed at 11:17:50 and node 1 at 11:18:02

 

 

 

  was:
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
 initLimit=30
 syncLimit=3
 dataDir=/hedvig/hpod/data
 dataLogDir=/hedvig/hpod/log
 clientPort=2181
 snapCount=10
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 skipACL=yes
 preAllocSize=65536
 maxClientCnxns=0
 4lw.commands.whitelist=*
 admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
 server.2=companydemo2.snc4.companyinc.com:3000:4000
 server.3=companydemo3.snc4.companyinc.com:3000:4000

 

Could you have a look at the logs and help me figure this out? It seems like 
node 1 is not getting notifications back from node2, but I don't see anything 
wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could be 
causing it.

 

 


> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Assignee: Mate Szalay-Beko
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/hedvig/hpod/data
>  dataLogDir=/hedvig/hpod/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node 1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could 
> be causing it.
>  
> In the logs, node3 is killed at 11:17:14
> node2 is killed at 11:17:50 and node 1 at 11:18:02
>  
>  
>  





[jira] [Commented] (ZOOKEEPER-3756) Members failing to rejoin quorum

2020-03-24 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066127#comment-17066127
 ] 

Lasaro Camargos commented on ZOOKEEPER-3756:


Thanks for the feedback. I've opened ZOOKEEPER-3769 with a slightly different 
scenario that is problematic in the same way. To give a complete answer: I am 
not using 0.0.0.0 addresses (at least not explicitly) and I am not using 
containers.

[~symat], I appreciate your willingness to look into it. It's been troubling me 
for some time.

> Members failing to rejoin quorum
> 
>
> Key: ZOOKEEPER-3756
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3756
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection
>Affects Versions: 3.5.6, 3.5.7
>Reporter: Dai Shi
>Assignee: Mate Szalay-Beko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.1, 3.5.8
>
> Attachments: Dockerfile, configmap.yaml, docker-entrypoint.sh, 
> jmx.yaml, zoo-0.log, zoo-1.log, zoo-2.log, zoo-service.yaml, zookeeper.yaml
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Not sure if this is the place to ask, please close if it's not.
> I am seeing some behavior that I can't explain since upgrading to 3.5:
> In a 5 member quorum, when server 3 is the leader and each server has this in 
> their configuration: 
> {code:java}
> server.1=100.71.255.254:2888:3888:participant;2181
> server.2=100.71.255.253:2888:3888:participant;2181
> server.3=100.71.255.252:2888:3888:participant;2181
> server.4=100.71.255.251:2888:3888:participant;2181
> server.5=100.71.255.250:2888:3888:participant;2181{code}
> If servers 1 or 2 are restarted, they fail to rejoin the quorum with this in 
> the logs:
> {code:java}
> 2020-03-11 20:23:35,720 [myid:2] - INFO  
> [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - 
> LOOKING
> 2020-03-11 20:23:35,721 [myid:2] - INFO  
> [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885]
>  - New election. My id =  2, proposed zxid=0x1b8005f4bba
> 2020-03-11 20:23:35,733 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (3, 2)
> 2020-03-11 20:23:35,734 [myid:2] - INFO  
> [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection 
> request 100.126.116.201:36140
> 2020-03-11 20:23:35,735 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (4, 2)
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (5, 2)
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection 
> request 100.126.116.201:36142
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message 
> format version), 2 (n.leader), 0x1b8005f4bba (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 2 (n.sid), 0x1b8 (n.peerEPoch), LOOKING (my state)0 (n.config 
> version)
> 2020-03-11 20:23:35,742 [myid:2] - WARN  
> [SendWorker:3:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting 
> for message on queue
> java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
> at 
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
> at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
> at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
> at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
> 2020-03-11 20:23:35,744 [myid:2] - WARN  
> [SendWorker:3:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread  
> id 3 my id = 2
> 2020-03-11 20:23:35,745 [myid:2] - WARN  
> [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] - Interrupting 
> SendWorker{code}
> The only way I can seem to get them to rejoin the quorum is to restart the 
> leader.
> However, if I remove server 4 and 5 from the configuration of server 1 or 2 
> (so only servers 1, 2, and 3 remain in the configuration file), then they can 
> rejoin the quorum fine. Is this expected and am I doing something wrong? Any 
> help or explanation would be greatly appreciated. Thank you.
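The repeated "Have smaller server identifier, so dropping the connection: (3, 2)" lines in the quoted logs reflect ZooKeeper's election-connection tie-break: when two peers connect to each other on the election port, only the connection initiated by the larger server id survives. A minimal sketch of that rule (simplified; `keep_connection` is a hypothetical helper name, the real logic lives in `QuorumCnxManager`):

```python
# Simplified model of the QuorumCnxManager tie-break (a sketch, not the real
# implementation): between any pair of peers, only the election connection
# initiated by the larger server id is kept; the smaller id drops its own
# outbound attempt and waits for the larger peer to dial back.
def keep_connection(initiator_sid: int, acceptor_sid: int) -> bool:
    """True if the initiator's outbound connection survives the tie-break."""
    return initiator_sid > acceptor_sid

# A restarted server 2 dials peers 3, 4, and 5 and drops every outbound
# attempt, matching the "(3, 2)", "(4, 2)", "(5, 2)" log lines above; until
# those higher-id peers reconnect, server 2 cannot exchange notifications.
dropped = [(peer, 2) for peer in (3, 4, 5) if not keep_connection(2, peer)]
print(dropped)  # [(3, 2), (4, 2), (5, 2)]
```

If the higher-id side never re-initiates (for example because it still holds a stale connection to the restarted peer), the smaller-id server can stay in LOOKING indefinitely, which is consistent with the rejoin failures described in this issue.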





[jira] [Updated] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-24 Thread Lasaro Camargos (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lasaro Camargos updated ZOOKEEPER-3769:
---
Description: 
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
 initLimit=30
 syncLimit=3
 dataDir=/hedvig/hpod/data
 dataLogDir=/hedvig/hpod/log
 clientPort=2181
 snapCount=10
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 skipACL=yes
 preAllocSize=65536
 maxClientCnxns=0
 4lw.commands.whitelist=*
 admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
 server.2=companydemo2.snc4.companyinc.com:3000:4000
 server.3=companydemo3.snc4.companyinc.com:3000:4000

 

Could you have a look at the logs and help me figure this out? It seems like 
node 1 is not getting notifications back from node2, but I don't see anything 
wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could be 
causing it.

 

 

  was:
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
 initLimit=30
 syncLimit=3
 dataDir=/hedvig/hpod/data
 dataLogDir=/hedvig/hpod/log
 clientPort=2181
 snapCount=10
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 skipACL=yes
 preAllocSize=65536
 maxClientCnxns=0
 4lw.commands.whitelist=*
 admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
 server.2=companydemo2.snc4.companyinc.com:3000:4000
 server.3=companydemo3.snc4.companyinc.com:3000:4000

 

Could you have a look at the logs and help me figure this out? It seems like 
node 1 is not getting notifications back from node2, but I don't see anything 
wrong with the network so I am wondering if bugs like  

 

 


> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/hedvig/hpod/data
>  dataLogDir=/hedvig/hpod/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node 1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could 
> be causing it.
>  
>  





[jira] [Updated] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-24 Thread Lasaro Camargos (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lasaro Camargos updated ZOOKEEPER-3769:
---
Description: 
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
 initLimit=30
 syncLimit=3
 dataDir=/hedvig/hpod/data
 dataLogDir=/hedvig/hpod/log
 clientPort=2181
 snapCount=10
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 skipACL=yes
 preAllocSize=65536
 maxClientCnxns=0
 4lw.commands.whitelist=*
 admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
 server.2=companydemo2.snc4.companyinc.com:3000:4000
 server.3=companydemo3.snc4.companyinc.com:3000:4000

 

Could you have a look at the logs and help me figure this out? It seems like 
node 1 is not getting notifications back from node2, but I don't see anything 
wrong with the network so I am wondering if bugs like  

 

 

  was:
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
initLimit=30
syncLimit=3
dataDir=/hedvig/hpod/data
dataLogDir=/hedvig/hpod/log
clientPort=2181
snapCount=10
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
skipACL=yes
preAllocSize=65536
maxClientCnxns=0
4lw.commands.whitelist=*
admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
server.2=companydemo2.snc4.companyinc.com:3000:4000
server.3=companydemo3.snc4.companyinc.com:3000:4000

 

 

 

 


> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/hedvig/hpod/data
>  dataLogDir=/hedvig/hpod/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node 1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network so I am wondering if bugs like  
>  
>  





[jira] [Updated] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-24 Thread Lasaro Camargos (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lasaro Camargos updated ZOOKEEPER-3769:
---
Description: 
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
initLimit=30
syncLimit=3
dataDir=/hedvig/hpod/data
dataLogDir=/hedvig/hpod/log
clientPort=2181
snapCount=10
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
skipACL=yes
preAllocSize=65536
maxClientCnxns=0
4lw.commands.whitelist=*
admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
server.2=companydemo2.snc4.companyinc.com:3000:4000
server.3=companydemo3.snc4.companyinc.com:3000:4000

 

 

 

 

  was:
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

 

 

 


> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
> initLimit=30
> syncLimit=3
> dataDir=/hedvig/hpod/data
> dataLogDir=/hedvig/hpod/log
> clientPort=2181
> snapCount=10
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
> skipACL=yes
> preAllocSize=65536
> maxClientCnxns=0
> 4lw.commands.whitelist=*
> admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
> server.2=companydemo2.snc4.companyinc.com:3000:4000
> server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
>  
>  
>  





[jira] [Updated] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-24 Thread Lasaro Camargos (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lasaro Camargos updated ZOOKEEPER-3769:
---
Attachment: node1.log
node2.log
node3.log

> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
>  
>  
>  





[jira] [Created] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-24 Thread Lasaro Camargos (Jira)
Lasaro Camargos created ZOOKEEPER-3769:
--

 Summary: fast leader election does not end if leader is taken down
 Key: ZOOKEEPER-3769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.5.7
Reporter: Lasaro Camargos


In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

 

 

 





[jira] [Comment Edited] (ZOOKEEPER-3756) Members failing to rejoin quorum

2020-03-23 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065017#comment-17065017
 ] 

Lasaro Camargos edited comment on ZOOKEEPER-3756 at 3/23/20, 6:56 PM:
--

Dear all,

currently, I am consistently facing the following scenario while running 3.5.5 
and 3.5.7, which I believe is related to this bug:

3 nodes up.
 Node 3 stop -> node 2 is elected; node 1 follows.
 Node 3 start -> node 3 elected the leader; node 2 follows; node 1 is unable to 
elect.

Node 1 stop and start -> node 1 rejoins the quorum.
 Node 2 stop and start -> node 2 is unable to elect.
 Node 1 stop and start -> node 2 joins the quorum; node 1 joins the quorum
 Node 2 stop and start -> node 2 unable to join the quorum
 Node 3 stop and start -> node 3 elected the leader; node 2 follows; node 1 is 
unable to elect.

Reducing the cnxTimeout value didn't change the behavior.

I tested with this fix and now it is worse; after a round of restarts, there 
doesn't seem to be anything I can do to make node 1 finish the election.

This is such a nasty problem that I am wondering if there is something else to 
it, maybe my configuration. Could you point out what information would be 
useful to debug this further? Full logs?

 


was (Author: lasaro):
Dear all,

currently, I am consistently facing the following scenario while running 3.5.5 
and 3.5.7, which I believe is related to this bug:

3 nodes up.
Node 3 stop -> node 2 is elected; node 1 follows.
Node 3 start -> node 3 elected the leader; node 2 follows; node 1 is unable to 
elect.

Node 1 stop and start -> node 1 rejoins the quorum.
Node 2 stop and start -> node 2 is unable to elect.
Node 1 stop and start -> node 2 joins the quorum; node 1 joins the quorum
Node 2 stop and start -> node 2 unable to join the quorum
Node 3 stop and start -> node 3 elected the leader; node 2 follows; node 1 is 
unable to elect.

Reducing the cnxTimeout value didn't change the behavior.

I tested with this fix and now it is worse; after a round of restarts, there 
doesn't seem to be anything I can do to make node 1 finish the election.

 

> Members failing to rejoin quorum
> 
>
> Key: ZOOKEEPER-3756
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3756
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection
>Affects Versions: 3.5.6, 3.5.7
>Reporter: Dai Shi
>Assignee: Mate Szalay-Beko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.1
>
> Attachments: Dockerfile, configmap.yaml, docker-entrypoint.sh, 
> jmx.yaml, zoo-0.log, zoo-1.log, zoo-2.log, zoo-service.yaml, zookeeper.yaml
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Not sure if this is the place to ask, please close if it's not.
> I am seeing some behavior that I can't explain since upgrading to 3.5:
> In a 5 member quorum, when server 3 is the leader and each server has this in 
> their configuration: 
> {code:java}
> server.1=100.71.255.254:2888:3888:participant;2181
> server.2=100.71.255.253:2888:3888:participant;2181
> server.3=100.71.255.252:2888:3888:participant;2181
> server.4=100.71.255.251:2888:3888:participant;2181
> server.5=100.71.255.250:2888:3888:participant;2181{code}
> If servers 1 or 2 are restarted, they fail to rejoin the quorum with this in 
> the logs:
> {code:java}
> 2020-03-11 20:23:35,720 [myid:2] - INFO  
> [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - 
> LOOKING
> 2020-03-11 20:23:35,721 [myid:2] - INFO  
> [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885]
>  - New election. My id =  2, proposed zxid=0x1b8005f4bba
> 2020-03-11 20:23:35,733 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (3, 2)
> 2020-03-11 20:23:35,734 [myid:2] - INFO  
> [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection 
> request 100.126.116.201:36140
> 2020-03-11 20:23:35,735 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (4, 2)
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (5, 2)
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection 
> request 100.126.116.201:36142
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message 
> format version), 2 (n.leader), 0x1b8005f4bba (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 2 (n.sid), 0x1b8 (n.peerEPoch), LOOKING (my state)0 (n.co

[jira] [Commented] (ZOOKEEPER-3756) Members failing to rejoin quorum

2020-03-23 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065017#comment-17065017
 ] 

Lasaro Camargos commented on ZOOKEEPER-3756:


Dear all,

currently, I am consistently facing the following scenario while running 3.5.5 
and 3.5.7, which I believe is related to this bug:

3 nodes up.
Node 3 stop -> node 2 is elected; node 1 follows.
Node 3 start -> node 3 elected the leader; node 2 follows; node 1 is unable to 
elect.

Node 1 stop and start -> node 1 rejoins the quorum.
Node 2 stop and start -> node 2 is unable to elect.
Node 1 stop and start -> node 2 joins the quorum; node 1 joins the quorum
Node 2 stop and start -> node 2 unable to join the quorum
Node 3 stop and start -> node 3 elected the leader; node 2 follows; node 1 is 
unable to elect.

Reducing the cnxTimeout value didn't change the behavior.

I tested with this fix and now it is worse; after a round of restarts, there 
doesn't seem to be anything I can do to make node 1 finish the election.

 

> Members failing to rejoin quorum
> 
>
> Key: ZOOKEEPER-3756
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3756
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection
>Affects Versions: 3.5.6, 3.5.7
>Reporter: Dai Shi
>Assignee: Mate Szalay-Beko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.1
>
> Attachments: Dockerfile, configmap.yaml, docker-entrypoint.sh, 
> jmx.yaml, zoo-0.log, zoo-1.log, zoo-2.log, zoo-service.yaml, zookeeper.yaml
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Not sure if this is the place to ask, please close if it's not.
> I am seeing some behavior that I can't explain since upgrading to 3.5:
> In a 5 member quorum, when server 3 is the leader and each server has this in 
> their configuration: 
> {code:java}
> server.1=100.71.255.254:2888:3888:participant;2181
> server.2=100.71.255.253:2888:3888:participant;2181
> server.3=100.71.255.252:2888:3888:participant;2181
> server.4=100.71.255.251:2888:3888:participant;2181
> server.5=100.71.255.250:2888:3888:participant;2181{code}
> If servers 1 or 2 are restarted, they fail to rejoin the quorum with this in 
> the logs:
> {code:java}
> 2020-03-11 20:23:35,720 [myid:2] - INFO  
> [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - 
> LOOKING
> 2020-03-11 20:23:35,721 [myid:2] - INFO  
> [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885]
>  - New election. My id =  2, proposed zxid=0x1b8005f4bba
> 2020-03-11 20:23:35,733 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (3, 2)
> 2020-03-11 20:23:35,734 [myid:2] - INFO  
> [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection 
> request 100.126.116.201:36140
> 2020-03-11 20:23:35,735 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (4, 2)
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, 
> so dropping the connection: (5, 2)
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection 
> request 100.126.116.201:36142
> 2020-03-11 20:23:35,740 [myid:2] - INFO  
> [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message 
> format version), 2 (n.leader), 0x1b8005f4bba (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 2 (n.sid), 0x1b8 (n.peerEPoch), LOOKING (my state)0 (n.config 
> version)
> 2020-03-11 20:23:35,742 [myid:2] - WARN  
> [SendWorker:3:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting 
> for message on queue
> java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
> at 
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
> at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
> at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
> at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
> 2020-03-11 20:23:35,744 [myid:2] - WARN  
> [SendWorker:3:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread  
> id 3 my id = 2
> 2020-03-11 20:23:35,745 [myid:2] - WARN  
> [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] - Interruptin