[jira] [Created] (ZOOKEEPER-3920) Zookeeper clients timeout after leader change

2020-08-26 Thread Andre Price (Jira)
Andre Price created ZOOKEEPER-3920:
--

 Summary: Zookeeper clients timeout after leader change
 Key: ZOOKEEPER-3920
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3920
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.6.1
Reporter: Andre Price
 Attachments: zk_repro.zip

[Sorry, I believe this is a dupe of 
https://issues.apache.org/jira/browse/ZOOKEEPER-3828 and potentially 
https://issues.apache.org/jira/browse/ZOOKEEPER-3466, but I am not able to 
attach files there for some reason, so I am creating a new issue which 
hopefully allows me to.]

We are encountering an issue where failing over from the leader results in 
ZooKeeper clients not being able to connect successfully; they time out waiting 
for a response from the server. We are attempting to upgrade some existing 
ZooKeeper clusters from 3.4.14 to 3.6.1 (not sure if relevant, but stating it 
in case it helps with pinpointing the issue), and this upgrade is effectively 
blocked by the problem. We perform the rolling upgrade (followers first, then 
the leader last) and it seems to go successfully by all indicators, but we end 
up in the state described in this issue: if the leader changes (either due to a 
restart or to stopping the process), the cluster does not seem able to start 
new sessions.

I've gathered some TRACE logs from our servers and will attach them in the hope 
they can help figure this out.

Attached zk_repro.zip, which contains the following:
 * zoo.cfg used in one of the instances (they are all the same except that the 
local server's IP is 0.0.0.0 in each)
 * zoo.cfg.dynamic.next (I don't think this is used anywhere, but it is written 
by ZooKeeper at some point; based on its value, I think this happens when the 
first 3.6.1 container becomes leader. The file is present in all containers and 
is identical on all servers)
 * s{1,2,3}_zk.log - logs from each of the 3 servers. The estimated time of the 
repro start is indicated by the "// REPRO START" text and whitespace in the 
logs
 * repro_steps.txt - rough steps executed that result in the attached server 
logs

 

I'll summarize the repro here also:
 # Initially we have what appears to be a healthy 3-node ensemble, all running 
3.6.1. Server ids are 1, 2, 3 and 3 is the leader. Dynamic 
config/reconfiguration is disabled.
 # Invoke srvr on each node (to verify the setup and also create a bookmark in 
the logs).
 # Do a zkCli get of /zookeeper/quota, which succeeds.
 # Restart the leader (to the same image/config); server 2 now becomes leader, 
3 comes back as a follower.
 # Try to perform the same zkCli get, which times out (this get is done within 
the container).
 # Try to perform the same zkCli get from another machine; this also times out.
 # Invoke srvr on each node again (to verify that 2 is now the leader and to 
bookmark the logs).
 # Restart server 2 (3 becomes leader, 2 becomes follower).
 # Do a zkCli get of /zookeeper/quota, which succeeds.
 # Invoke srvr on each node again (to verify that 3 is the leader).

I tried to keep the other ZK traffic to a minimum, but there are likely some 
periodic mntr requests mixed in from our metrics scraper.
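
For reference, the failing get in step 5 corresponds to roughly the following 
minimal Java client sketch (the connect string, timeout, and class name are 
illustrative, not taken from the attached config; this is just to show where 
the timeout surfaces):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class QuotaReadRepro {
    public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        // Connect string and session timeout are illustrative placeholders.
        ZooKeeper zk = new ZooKeeper("server1:2181,server2:2181,server3:2181", 15000,
                event -> {
                    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                });
        try {
            // After the leader restart in step 4, either this wait or the read
            // below is where the timeout shows up.
            if (!connected.await(30, TimeUnit.SECONDS)) {
                System.out.println("never reached SyncConnected");
                return;
            }
            byte[] data = zk.getData("/zookeeper/quota", false, null);
            System.out.println("get succeeded, data length = "
                    + (data == null ? 0 : data.length));
        } finally {
            zk.close();
        }
    }
}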



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Issues on java lock recipe

2011-05-12 Thread Andre
Hello,

My colleagues and I are working with the Java lock recipe implementation.

We think we found two bugs in the code:

1) The first one is reported in this JIRA issue:
https://issues.apache.org/jira/browse/ZOOKEEPER-645.
The issue is that the znodes used to control the lock are ordered by
sessionID first and then by the sequence number. As earlier-connected
clients appear to have lower sessionID values than those that connected later,
whoever connects first gets the lock, disregarding anyone who already holds
the lock.
We've posted a patch on that JIRA issue, but it has not yet been reviewed.

2) The other bug is in unlock(). When calling unlock(), whether the client
holds the lock or is still waiting for it, the lock znode is removed. However,
if the client is not holding the lock, there is still a ZooKeeper watcher
waiting on the znode with the next lower sequence number, which is necessary
to avoid the herd effect in the recipe implementation. When the watcher tells
the lock implementation that the watched znode has been removed, the lock
recipe calls lock(). What happens then is that a new lock znode is created, so
the client is again (unwillingly) waiting for the lock (or may even end up
holding it). If that client doesn't do anything (such as unlocking over and
over until it eventually gets the lock and does a final unlock), there would
be a deadlock, because the client holds the lock without knowing it.
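
To make the scenario concrete, here is a rough sketch of the sequence we mean,
using the recipe's WriteLock API (assuming the recipe classes are on the
classpath under org.apache.zookeeper.recipes.lock; the connect strings, lock
path, and timeouts are illustrative, and this is not a test case we ran):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.recipes.lock.WriteLock;

public class UnlockWhileWaiting {
    public static void main(String[] args) throws Exception {
        Watcher noop = new Watcher() {
            public void process(WatchedEvent event) {
            }
        };
        ZooKeeper zkA = new ZooKeeper("localhost:2181", 15000, noop);
        ZooKeeper zkB = new ZooKeeper("localhost:2181", 15000, noop);

        WriteLock lockA = new WriteLock(zkA, "/locks/resource", ZooDefs.Ids.OPEN_ACL_UNSAFE);
        WriteLock lockB = new WriteLock(zkB, "/locks/resource", ZooDefs.Ids.OPEN_ACL_UNSAFE);

        lockA.lock();   // A acquires the lock (lowest sequence znode)
        lockB.lock();   // B queues behind A and watches A's znode
        lockB.unlock(); // B gives up: its lock znode is deleted...
        lockA.unlock(); // ...but when A's znode goes away, B's watcher fires and,
                        // without the guard in the attached patch, calls lock()
                        // again, so B may now hold the lock without knowing it

        Thread.sleep(2000); // give B's watcher time to fire (illustration only)
        zkA.close();
        zkB.close();
    }
}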

We've come up with a patch for this (it's attached to this email). Our
question is: should we post this patch on the same JIRA issue mentioned at the
beginning of this email, or should we open a new issue for this one?

Thanks,

Andre Esteve
http://www.lsd.ic.unicamp.br/mc715-1s2011/index.php/Main_Page (wiki in
Portuguese about our work, and others', using ZooKeeper)
Index: WriteLock.java
===================================================================
--- WriteLock.java	(revision 1102068)
+++ WriteLock.java	(working copy)
@@ -152,7 +152,10 @@
             LOG.debug("Watcher fired on path: " + event.getPath() + " state: " +
                     event.getState() + " type " + event.getType());
             try {
-                lock();
+                // avoid locking when not waiting for it
+                if (id != null) {
+                    lock();
+                }
             } catch (Exception e) {
                 LOG.warn("Failed to acquire lock: " + e, e);
             }


[jira] [Updated] (ZOOKEEPER-645) Bug in WriteLock recipe implementation?

2011-04-19 Thread Andre Esteve (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andre Esteve updated ZOOKEEPER-645:
---

Attachment: ZOOKEEPER-645-compareTo.patch

The compareTo patch aims to correct the ordering of the ZNodeName objects used 
to validate lock ownership.

The code in WriteLock gets a list of znodes and, for each znode, creates a 
ZNodeName object which is added to a sorted list.
The sorting was based on the full znode name, i.e. 
x-sessionID-ephemeral_number. As earlier-connected clients appear to have lower 
sessionID values than those that connected later, whoever connects first gets 
the lock, disregarding anyone who already holds the lock.

This patch simply changes the compareTo override in ZNodeName to consider only 
the sequence number instead of the full znode name. Since this class's objects 
are used only for this purpose, this seems to have done the trick =)
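
For reference, the ordering change amounts to something like the sketch below 
(the class and field names are illustrative, not the exact ones in the recipe):

// Order lock nodes by their trailing sequence number only, ignoring the
// session id embedded in the full node name.
public class SequenceOrderedName implements Comparable<SequenceOrderedName> {
    private final String name;  // full znode name, e.g. "x-<sessionId>-0000000003"
    private final int sequence; // trailing sequence number parsed from the name

    public SequenceOrderedName(String name) {
        this.name = name;
        this.sequence = Integer.parseInt(name.substring(name.lastIndexOf('-') + 1));
    }

    public int compareTo(SequenceOrderedName other) {
        // Compare strictly by sequence number so the oldest request wins,
        // regardless of which session created it.
        if (this.sequence < other.sequence) {
            return -1;
        }
        if (this.sequence > other.sequence) {
            return 1;
        }
        return 0;
    }

    public String getName() {
        return name;
    }
}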

However, getSessionID not being thread-safe is still an issue.

Could someone try it out and post the results?

[A discussion about this bug and some other issues with the lock recipe, as 
well as the contributors to this patch, can be found here (in Portuguese): 
http://www.lsd.ic.unicamp.br/mc715-1s2011/index.php/Grupo01]

 Bug in WriteLock recipe implementation?
 ---

 Key: ZOOKEEPER-645
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-645
 Project: ZooKeeper
  Issue Type: Bug
  Components: recipes
Affects Versions: 3.2.2
 Environment: 3.2.2 java 1.6.0_12
Reporter: Jaakko Laine
Assignee: Mahadev konar
Priority: Minor
 Fix For: 3.4.0

 Attachments: 645-fix-findPrefixInChildren.patch, 
 ZOOKEEPER-645-compareTo.patch


 Not sure, but there seem to be two issues in the example WriteLock:
 (1) ZNodeName is sorted according to session ID first, and then according to 
 znode sequence number. This might cause starvation, as lower session IDs 
 always get priority. WriteLock is not thread-safe in the first place, so 
 having the session ID involved in the compare operation does not seem to make 
 sense.
 (2) If findPrefixInChildren finds a previous ID, it should add dir in front of 
 the ID.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira