[jira] [Created] (ZOOKEEPER-3920) Zookeeper clients timeout after leader change
Andre Price created ZOOKEEPER-3920:
-----------------------------------

Summary: Zookeeper clients timeout after leader change
Key: ZOOKEEPER-3920
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3920
Project: ZooKeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.6.1
Reporter: Andre Price
Attachments: zk_repro.zip

[Sorry, I believe this is a dupe of https://issues.apache.org/jira/browse/ZOOKEEPER-3828 and potentially https://issues.apache.org/jira/browse/ZOOKEEPER-3466, but I am not able to attach files there for some reason, so I am creating a new issue which hopefully allows me to.]

We are encountering an issue where failing over from the leader results in ZooKeeper clients not being able to connect successfully; they time out waiting for a response from the server. We are attempting to upgrade some existing ZooKeeper clusters from 3.4.14 to 3.6.1 (not sure if relevant, but stating it in case it helps with pinpointing the issue), and this issue effectively blocks the upgrade. We perform the rolling upgrade (followers first, then the leader last) and by all indicators it seems to go successfully. But we end up in the state described in this issue: if the leader changes (either due to a restart or a stop), the cluster does not seem able to start new sessions.

I've gathered some TRACE logs from our servers and will attach them in the hope they can help figure this out. The attached zk_repro.zip contains the following:

* zoo.cfg used in one of the instances (they are all the same except for the local server's IP being 0.0.0.0 in each)
* zoo.cfg.dynamic.next (I don't think this is used anywhere, but it is written by ZooKeeper at some point - I think when the first 3.6.1 container becomes leader, based on the value; the file is present and identical on all servers)
* s{1,2,3}_zk.log - logs from each of the 3 servers. The estimated time of the repro start is indicated by the "// REPRO START" text and whitespace in the logs
* repro_steps.txt - rough steps executed that result in the attached server logs

I'll summarize the repro here as well:

# Initially it appears to be a healthy 3-node ensemble, all running 3.6.1. Server ids are 1, 2, 3 and 3 is the leader. Dynamic config/reconfiguration is disabled.
# Invoke srvr on each node (to verify the setup and also create a bookmark in the logs).
# Do a zkCli get of /zookeeper/quota, which succeeds.
# Restart the leader (to the same image/config); server 2 now becomes leader, 3 is back as a follower.
# Try to perform the same zkCli get, which times out (this get is done within the container).
# Try to perform the same zkCli get from another machine; this also times out.
# Invoke srvr on each node again (to verify that 2 is now the leader / bookmark the logs).
# Restart server 2 (3 becomes leader, 2 follower).
# Do a zkCli get of /zookeeper/quota, which succeeds.
# Invoke srvr on each node again (to verify that 3 is the leader).

I tried to keep the other ZK traffic to a minimum, but there are likely some periodic mntr requests mixed in from our metrics scraper.
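[For reference, a minimal Java sketch of the client-side check the repro steps perform with zkCli (session establishment plus a get of /zookeeper/quota), with a bounded wait so the "timed out waiting for the server" state is visible instead of hanging. The connect string, class name, and timeout values are placeholders, not taken from the attached configs.]

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class QuorumGetCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; the real ensemble addresses come from zoo.cfg.
        String connect = "server1:2181,server2:2181,server3:2181";

        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(connect, 15000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        try {
            // After the leader change, session establishment is what appears to hang,
            // so bound the wait instead of blocking forever.
            if (!connected.await(30, TimeUnit.SECONDS)) {
                System.err.println("Timed out waiting for a session (repro state)");
                return;
            }
            // Same read the zkCli steps perform.
            byte[] data = zk.getData("/zookeeper/quota", false, null);
            System.out.println("getData succeeded: " + (data == null ? 0 : data.length) + " bytes");
        } catch (KeeperException e) {
            System.err.println("getData failed: " + e);
        } finally {
            zk.close();
        }
    }
}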
Issues on java lock recipe
Hello,

My colleagues and I are working with the Java lock recipe implementation. We think we found two bugs in the code:

1) The first one is reported on this JIRA topic: https://issues.apache.org/jira/browse/ZOOKEEPER-645. The issue is that the znodes used to control the lock are ordered by session ID first and then by sequence number. Since clients that connected earlier appear to have lower session ID values than those that connected later, whoever connected first gets the lock, disregarding anyone who already holds it. We've posted a patch on that JIRA but it has not yet been reviewed.

2) The other bug is in unlock. When calling unlock(), whether you hold the lock or are still waiting for it, the lock znode is removed. However, if you are not holding the lock, there is still a ZooKeeper watcher waiting on the next znode with a lower sequence number, which is necessary to avoid the herd effect in the recipe implementation. When that watcher tells the lock implementation that the watched znode has been removed, the lock recipe calls lock(). What happens then is that a new lock znode is created, so the client is again (unwillingly) waiting for the lock (or may even end up holding it). If that client doesn't do anything about it (such as unlocking over and over until it eventually gets the lock and does a final unlock), there would be a deadlock, because the client holds the lock without knowing it.

We've come up with a patch for this (attached to this email). Our question is: should we post this patch on the same JIRA topic mentioned at the beginning of this email, or should we open a new topic for this issue?

Thanks,
Andre Esteve
http://www.lsd.ic.unicamp.br/mc715-1s2011/index.php/Main_Page (wiki in Portuguese about our work, and others', using ZooKeeper)

Index: WriteLock.java
===================================================================
--- WriteLock.java (revision 1102068)
+++ WriteLock.java (working copy)
@@ -152,7 +152,10 @@
             LOG.debug("Watcher fired on path: " + event.getPath() + " state: "
                     + event.getState() + " type " + event.getType());
             try {
-                lock();
+                // avoid locking when not waiting for it
+                if (id != null) {
+                    lock();
+                }
             } catch (Exception e) {
                 LOG.warn("Failed to acquire lock: " + e, e);
             }
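[To make bug 2 concrete, here is a small illustrative sketch of how the problem surfaces for a caller, assuming the recipe's WriteLock class from the ZooKeeper recipes lock module (org.apache.zookeeper.recipes.lock.WriteLock, with a constructor taking a ZooKeeper handle, lock directory, and ACL list, and lock()/unlock()/isOwner() methods). The connect string and lock path are placeholders; this is not the attached patch, just the scenario it guards against.]

import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.recipes.lock.WriteLock;

public class UnlockWhileWaitingDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string and lock directory.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });
        WriteLock lock = new WriteLock(zk, "/locks/demo", ZooDefs.Ids.OPEN_ACL_UNSAFE);

        boolean owner = lock.lock();   // enqueue; false if another client currently holds the lock
        if (!owner) {
            // Give up waiting. This deletes our lock znode, but (bug 2) the watcher on the
            // predecessor znode is still registered; when that znode goes away, the recipe
            // calls lock() again and silently re-enters the queue - possibly even becoming
            // the owner - without this client ever asking for the lock again.
            lock.unlock();
        }

        // Later, if the stale re-acquisition won, no other client can get the lock,
        // because nobody here knows that isOwner() has become true.
        System.out.println("isOwner after unlock: " + lock.isOwner());
        zk.close();
    }
}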
[jira] [Updated] (ZOOKEEPER-645) Bug in WriteLock recipe implementation?
[ https://issues.apache.org/jira/browse/ZOOKEEPER-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andre Esteve updated ZOOKEEPER-645:
-----------------------------------
Attachment: ZOOKEEPER-645-compareTo.patch

compareTo.patch aims to correct the ordering of the ZNodeName objects used to validate lock ownership. The code in WriteLock gets a list of znodes and for each znode creates a ZNodeName object, which is added to a sorted list. The sorting was based on the full znode name, i.e. x-sessionID-ephemeral_number. Since clients that connected earlier appear to have lower session ID values than those that connected later, whoever connected first gets the lock, disregarding anyone who already holds it. This patch simply changes the compareTo override in ZNodeName to consider just the sequence number instead of the full znode name; since this class's objects are used only for this purpose, this seems to have done the trick =)

However, getSessionID not being thread-safe is still an issue.

Could someone try it out and post the results?

[A discussion about this bug and some other issues in the lock recipe, as well as this patch's contributors, can be found here (in Portuguese): http://www.lsd.ic.unicamp.br/mc715-1s2011/index.php/Grupo01]

Bug in WriteLock recipe implementation?
---------------------------------------

Key: ZOOKEEPER-645
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-645
Project: ZooKeeper
Issue Type: Bug
Components: recipes
Affects Versions: 3.2.2
Environment: 3.2.2 java 1.6.0_12
Reporter: Jaakko Laine
Assignee: Mahadev konar
Priority: Minor
Fix For: 3.4.0
Attachments: 645-fix-findPrefixInChildren.patch, ZOOKEEPER-645-compareTo.patch

Not sure, but there seem to be two issues in the example WriteLock:

(1) ZNodeName is sorted according to session ID first, and then according to znode sequence number. This might cause starvation, as lower session IDs always get priority. WriteLock is not thread-safe in the first place, so having the session ID involved in the compare operation does not seem to make sense.

(2) if findPrefixInChildren finds a previous ID, it should add dir in front of the ID
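[A minimal, self-contained illustration of the ordering change described above (not the attached patch itself): given lock znodes named like x-<sessionId>-<sequence>, comparing by the trailing sequence number alone removes the session-ID bias that full-name ordering introduces. The class and constant names here are invented for the example.]

import java.util.Comparator;

public final class LockNodeOrder {

    // Extract the sequence ZooKeeper appends to sequential znodes (the part after
    // the last '-'); fall back to "no sequence" (sorts last) if parsing fails.
    static int sequenceOf(String znodeName) {
        int idx = znodeName.lastIndexOf('-');
        if (idx < 0 || idx == znodeName.length() - 1) {
            return Integer.MAX_VALUE;
        }
        try {
            return Integer.parseInt(znodeName.substring(idx + 1));
        } catch (NumberFormatException e) {
            return Integer.MAX_VALUE;
        }
    }

    // Order candidates by sequence number only, ignoring the session ID in the name.
    public static final Comparator<String> BY_SEQUENCE =
            Comparator.comparingInt(LockNodeOrder::sequenceOf);

    public static void main(String[] args) {
        // A client with a lower session ID but a later sequence no longer jumps the queue.
        String older = "x-72057611705712640-0000000003";
        String newer = "x-72057611705712641-0000000001";
        System.out.println(BY_SEQUENCE.compare(newer, older) < 0); // true: seq 1 sorts before seq 3
    }
}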