[jira] [Commented] (ZOOKEEPER-1360) QuorumTest.testNoLogBeforeLeaderEstablishment has several problems
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126316#comment-16126316 ]

Henry Robinson commented on ZOOKEEPER-1360:
-------------------------------------------

Not at all - haven't looked at this in years!

> QuorumTest.testNoLogBeforeLeaderEstablishment has several problems
> ------------------------------------------------------------------
>
> Key: ZOOKEEPER-1360
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1360
> Project: ZooKeeper
> Issue Type: Bug
> Components: tests
> Affects Versions: 3.4.2
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 3.5.4, 3.6.0
>
> After the apparently valid fix to ZOOKEEPER-1294, testNoLogBeforeLeaderEstablishment is failing for me about one time in four. While I'll investigate whether the patch in 1294 is ultimately to blame, reading the test brought to light a number of issues that appear to be bugs or in need of improvement:
> * As part of QuorumTest, an ensemble is already established by the fixture setup code, but it is apparently unused by the test, which uses QuorumUtil instead.
> * The test reads QuorumPeer.leader and QuorumPeer.follower without synchronization, which means that writes to those fields may not be published when we come to read them.
> * The return value of sem.tryAcquire is never checked.
> * The progress of the test is based on ad-hoc timings (25 * 500ms sleeps) and inscrutable numbers of iterations through the main loop (e.g. the semaphore blocking the final asserts is released only after the 2nd of 5 callbacks).
> * The test as a whole takes ~30s to run.
> The first three are easy to fix (as part of fixing the second, I intend to hide all members of QuorumPeer behind getters and setters); the fourth and fifth need a slightly deeper understanding of what the test is trying to achieve.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
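The unsynchronized-read problem called out above, and the proposed getter/setter fix, can be sketched as follows. This is an illustrative sketch with invented names, not the real QuorumPeer API: making the accessors synchronized gives writes and reads a happens-before edge, so an assignment made by the quorum thread is guaranteed visible to the test thread.

```java
// Sketch only: hide mutable QuorumPeer-style state behind synchronized
// accessors so a write in one thread is published to later reads in another.
// Class and field names here are hypothetical, not the real QuorumPeer.
public class PeerState {
    private Object leader;    // guarded by 'this'
    private Object follower;  // guarded by 'this'

    public synchronized void setLeader(Object l) { leader = l; }
    public synchronized Object getLeader() { return leader; }

    public synchronized void setFollower(Object f) { follower = f; }
    public synchronized Object getFollower() { return follower; }
}
```

With fields private and non-final, callers can no longer read a stale or half-published reference directly; every access goes through the lock.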
[jira] [Commented] (ZOOKEEPER-1697) large snapshots can cause continuous quorum failure
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653114#comment-13653114 ]

Henry Robinson commented on ZOOKEEPER-1697:
-------------------------------------------

[~phunt] - this seems _much_ clearer and easier to reason about.

> large snapshots can cause continuous quorum failure
> ---------------------------------------------------
>
> Key: ZOOKEEPER-1697
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1697
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3, 3.5.0
> Reporter: Patrick Hunt
> Assignee: Patrick Hunt
> Priority: Critical
> Fix For: 3.5.0, 3.4.6
> Attachments: ZOOKEEPER-1697_branch34.patch, ZOOKEEPER-1697_branch34.patch, ZOOKEEPER-1697.patch, ZOOKEEPER-1697.patch
>
> I keep seeing this on the leader:
> {noformat}
> 2013-04-30 01:18:39,754 INFO org.apache.zookeeper.server.quorum.Leader: Shutdown called
> java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 2
>         at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:447)
>         at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:422)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
> {noformat}
> The followers are downloading the snapshot when this happens, and are trying to do their first ACK to the leader; the ack fails with broken pipe. In this case the snapshots are large and the config has increased the initLimit. syncLimit is small - 10 or so, with a ticktime of 2000. Note this is 3.4.3 with ZOOKEEPER-1521 applied.
> I originally speculated that https://issues.apache.org/jira/browse/ZOOKEEPER-1521 might be related - I thought I might have broken something for this environment. That doesn't look to be the case. As it looks now it seems that 1521 didn't go far enough.
> The leader verifies that all followers have ACK'd to the leader within the last syncLimit time period. This runs all the time in the background on the leader to identify the case where a follower drops. In this case the followers take so long to load the snapshot that this check fails the very first time; as a result the leader drops (not enough ack'd followers w/in the sync limit) and re-election happens. This repeats forever (the above error).
> This is the call: org.apache.zookeeper.server.quorum.LearnerHandler.synced(). That's at odds with the setting of tickOfLastAck in org.apache.zookeeper.server.quorum.LearnerHandler.run() - it's not set until the follower first acks, and in this case I can see that the followers are not getting to the ack prior to the leader shutting down with the error logged above. It seems that synced() should probably use the init limit until the first ack comes in from the follower.
> I also see that while tickOfLastAck and leader.self.tick are shared between two threads, there is no synchronization of the shared resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
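The proposed change - judge a follower against initLimit until its first ACK, and against syncLimit only afterwards - can be sketched as below. The class and method names are invented for illustration; this is not the actual LearnerHandler.synced() implementation, just the liveness rule the report suggests.

```java
// Sketch of the suggested liveness check (hypothetical names). Until the
// follower's first ACK arrives, allow up to initLimit ticks (snapshot
// loading can legitimately take that long); after the first ACK, require
// an ACK within the last syncLimit ticks.
public class AckTracker {
    private final int initLimit;   // ticks allowed for initial sync
    private final int syncLimit;   // ticks allowed between steady-state ACKs
    private volatile long tickOfLastAck = -1;  // -1 => no ACK received yet

    public AckTracker(int initLimit, int syncLimit) {
        this.initLimit = initLimit;
        this.syncLimit = syncLimit;
    }

    public void ackReceived(long currentTick) {
        tickOfLastAck = currentTick;
    }

    public boolean synced(long currentTick) {
        if (tickOfLastAck < 0) {
            // Still in initial sync: measure against initLimit from tick 0.
            return currentTick <= initLimit;
        }
        return currentTick - tickOfLastAck <= syncLimit;
    }
}
```

The field is volatile here as a nod to the report's final point that tickOfLastAck is shared between two threads without synchronization.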
[jira] [Commented] (ZOOKEEPER-1346) Handle 4lws and monitoring on separate port
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468901#comment-13468901 ]

Henry Robinson commented on ZOOKEEPER-1346:
-------------------------------------------

+1, great idea. I do think it's important that we set out with the intention of deprecating the old protocol eventually. This is a good opportunity to properly establish a procedure for doing that. I suggest including both in the next major release (3.5?) and warning that the old protocol will be turned off in 3.6. Assuming all goes according to plan, we can eventually ship 3.6 with only a Jetty-based 4lw implementation.

> Handle 4lws and monitoring on separate port
> -------------------------------------------
>
> Key: ZOOKEEPER-1346
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1346
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Reporter: Camille Fournier
> Assignee: Camille Fournier
> Fix For: 3.5.0
> Attachments: ZOOKEEPER-1346_jetty.patch
>
> Move the 4lws to their own port, off of the client port, and support them properly via long-lived sessions instead of polling. Deprecate the 4lw support on the client port. Will enable us to enhance the functionality of the commands via extended command syntax, address security concerns, and fix bugs involving the socket close being received before all of the data on the client end.
[jira] [Assigned] (ZOOKEEPER-1238) when the linger time was changed for NIO the patch missed Netty
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson reassigned ZOOKEEPER-1238:
-----------------------------------------

    Assignee: Skye Wanderman-Milne

> when the linger time was changed for NIO the patch missed Netty
> ---------------------------------------------------------------
>
> Key: ZOOKEEPER-1238
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1238
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.0, 3.5.0
> Reporter: Patrick Hunt
> Assignee: Skye Wanderman-Milne
> Fix For: 3.5.0
> Attachments: ZOOKEEPER-1238.patch
>
> from NettyServerCnxn:
> bq. bootstrap.setOption("child.soLinger", 2);
> See ZOOKEEPER-1049
[jira] [Assigned] (ZOOKEEPER-1376) zkServer.sh does not correctly check for $SERVER_JVMFLAGS
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson reassigned ZOOKEEPER-1376:
-----------------------------------------

    Assignee: Skye Wanderman-Milne

> zkServer.sh does not correctly check for $SERVER_JVMFLAGS
> ---------------------------------------------------------
>
> Key: ZOOKEEPER-1376
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1376
> Project: ZooKeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.3, 3.3.4
> Reporter: Patrick Hunt
> Assignee: Skye Wanderman-Milne
> Priority: Minor
> Labels: newbie
> Fix For: 3.3.7, 3.4.5
> Attachments: ZOOKEEPER-1376.patch
>
> It will always include it even if not defined, although not much harm.
> {noformat}
> if [ x$SERVER_JVMFLAGS ]
> then
>     JVMFLAGS="$SERVER_JVMFLAGS $JVMFLAGS"
> fi
> {noformat}
> should use the std idiom.
[jira] [Commented] (ZOOKEEPER-1238) when the linger time was changed for NIO the patch missed Netty
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462237#comment-13462237 ]

Henry Robinson commented on ZOOKEEPER-1238:
-------------------------------------------

+1, patch looks good to me - the lack of tests isn't a problem for this change. I'll commit shortly.
[jira] [Commented] (ZOOKEEPER-1376) zkServer.sh does not correctly check for $SERVER_JVMFLAGS
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460911#comment-13460911 ]

Henry Robinson commented on ZOOKEEPER-1376:
-------------------------------------------

+1, patch looks good to me. I'll commit shortly to 3.3 and 3.4.
[jira] [Commented] (ZOOKEEPER-1361) Leader.lead iterates over 'learners' set without proper synchronisation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452560#comment-13452560 ]

Henry Robinson commented on ZOOKEEPER-1361:
-------------------------------------------

Hey - sorry for the delay. I don't think the extra synchronisation in sendPacket is strictly necessary (note that the forwardingFollowers lock is already held). However, I think that the scoped lock around queuePacket is probably not required and should be removed - but not the call to getForwardingFollowers. Make sense?

> Leader.lead iterates over 'learners' set without proper synchronisation
> -----------------------------------------------------------------------
>
> Key: ZOOKEEPER-1361
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1361
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.2
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 3.4.4, 3.5.0
> Attachments: zk-memory-leak-fix.patch, ZOOKEEPER-1361-3.4.patch, ZOOKEEPER-1361-no-whitespace.patch, ZOOKEEPER-1361.patch
>
> This block:
> {code}
> HashSet<Long> followerSet = new HashSet<Long>();
> for (LearnerHandler f : learners)
>     followerSet.add(f.getSid());
> {code}
> is executed without holding the lock on learners, so if there were ever a condition where a new learner was added during the initial sync phase, I'm pretty sure we'd see a concurrent modification exception. Certainly other parts of the code are very careful to lock on learners when iterating.
> It would be nice to use a {{ConcurrentHashMap}} to hold the learners instead, but I can't convince myself that this wouldn't introduce some correctness bugs. For example the following:
> * Learners contains A, B, C, D.
> * Thread 1 iterates over learners, and gets as far as B.
> * Thread 2 removes A, and adds E.
> * Thread 1 continues iterating and sees a learner view of A, B, C, D, E.
> This may be a bug if Thread 1 is counting the number of synced followers for a quorum count, since at no point was A, B, C, D, E a correct view of the quorum. In practice, I think this is actually ok, because I don't think ZK makes any strong ordering guarantees on learners joining or leaving (so we don't need a strong serialisability guarantee on learners), but I don't think I'll make that change for this patch. Instead I want to clean up the locking protocols on the follower / learner sets - to avoid another easy deadlock like the one we saw in ZOOKEEPER-1294 - and to do less with the lock held; i.e. to copy and then iterate over the copy rather than iterate over a locked set.
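The "copy under the lock, then iterate over the copy" cleanup described at the end can be sketched like this. The class is illustrative (it is not the real Leader/LearnerHandler code, and tracks bare sids rather than handler objects), but it shows the pattern: take a snapshot while holding the lock, then release the lock before doing any per-learner work.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of copy-then-iterate over a learner set (hypothetical class).
public class LearnerRegistry {
    private final Set<Long> learners = new HashSet<>();  // guarded by itself

    public void add(long sid) {
        synchronized (learners) { learners.add(sid); }
    }

    public void remove(long sid) {
        synchronized (learners) { learners.remove(sid); }
    }

    // Copy while holding the lock, then let callers iterate the copy freely:
    // no ConcurrentModificationException, and slow per-learner work cannot
    // block add()/remove() or deadlock against other lock orders.
    public List<Long> snapshot() {
        synchronized (learners) { return new ArrayList<>(learners); }
    }
}
```

The snapshot is a point-in-time view, which is exactly the weaker guarantee the comment argues is acceptable for learner membership.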
[jira] [Commented] (ZOOKEEPER-1514) FastLeaderElection - leader ignores the round information when joining a quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425397#comment-13425397 ]

Henry Robinson commented on ZOOKEEPER-1514:
-------------------------------------------

Hi Flavio - I don't really mind the check, it's just completely unnecessary (since listener == null => NPE => failed test). Let's keep it in if you think it is important.

What is a problem, and I agree not worth fixing here, is that this is yet another example of class members not being hidden behind getters / setters that maintain correct invariants. Anyone can set listener to null, because it's a non-final public member, so every read of that variable in code that mustn't crash has to defensively check that it's not null, when we should be relying on the class to do this for us.

Anyhow, this looks ok to me - +1, happy to commit.

> FastLeaderElection - leader ignores the round information when joining a quorum
> -------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1514
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1514
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.4
> Reporter: Patrick Hunt
> Assignee: Flavio Junqueira
> Priority: Critical
> Fix For: 3.4.4, 3.5.0, 3.3.7
> Attachments: ZOOKEEPER-1514.patch, ZOOKEEPER-1514.patch, ZOOKEEPER-1514.patch
>
> In the following case we have a 3 server ensemble. Initially all is well, zk3 is the leader. However zk3 fails, restarts, and rejoins the quorum as the new leader (was the old leader, still the leader after re-election). The existing two followers, zk1 and zk2, rejoin the new quorum again as followers of zk3.
> zk1 then fails, the data directory is deleted (so it has no state whatsoever) and it is restarted. However zk1 can never rejoin the quorum (even after an hour). During this time zk2 and zk3 are serving properly. Later all three servers are restarted and properly form a functional quorum.
> Here are some interesting log snippets. Nothing else of interest was seen in the logs during this time.
> zk3. This is where it becomes the leader after failing initially (as the leader). Notice the round is ahead of zk1 and zk2:
> {noformat}
> 2012-07-18 17:19:35,423 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id = 3, Proposed zxid = 77309411648
> 2012-07-18 17:19:35,423 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 (n.zxid), 832 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
> 2012-07-18 17:19:35,424 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 (n.zxid), 831 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
> 2012-07-18 17:19:35,424 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 (n.zxid), 831 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
> 2012-07-18 17:19:35,424 - INFO [QuorumPeer:/0.0.0.0:2181:QuorumPeer@655] - LEADING
> {noformat}
> zk1, which won't come back. Notice that zk3 is reporting the round as 831, while zk2 thinks that the round is 832:
> {noformat}
> 2012-07-18 17:31:12,015 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 77309411648 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
> 2012-07-18 17:31:12,016 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 (n.zxid), 831 (n.round), LEADING (n.state), 3 (n.sid), LOOKING (my state)
> 2012-07-18 17:31:12,017 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 (n.zxid), 832 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
> 2012-07-18 17:31:15,219 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 6400
> {noformat}
[jira] [Commented] (ZOOKEEPER-1514) FastLeaderElection - leader ignores the round information when joining a quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424286#comment-13424286 ]

Henry Robinson commented on ZOOKEEPER-1514:
-------------------------------------------

Flavio - this looks fine. The point I am trying to make about this bit of code:
{code}
if (listener != null) {
    listener.start();
} else {
    LOG.error("Null listener when initializing cnx manager");
    Assert.fail("Failed to create cnx manager");
}
{code}
is that there's no need for the null check, since if {{listener}} is null, there'll be an NPE thrown which will fail the test anyhow. Plus, looking at {{QuorumCnxManager.java:153}}, I can't see any way in which {{listener}} can be null, because it's unambiguously assigned to a {{new Listener()}}. Is there a case that I'm missing?

I know this doesn't really affect the functionality of the patch, but if these checks aren't necessary, it will be confusing to the reader in the future.
[jira] [Commented] (ZOOKEEPER-1521) LearnerHandler initLimit/syncLimit problems specifying follower socket timeout limits
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423210#comment-13423210 ]

Henry Robinson commented on ZOOKEEPER-1521:
-------------------------------------------

Good catch. Here's what I notice in branch-3.3:
* {{Leader.java:240}} - the initial read timeout is set to {{syncLimit}} ticks, but the first thing we wait for is an ACK from the follower saying that it has got up-to-date, which should be subject to {{initLimit}}.
* I also saw that in {{Learner.java:220}}, the connection should be established with {{initLimit}} as the connection timeout (this is not fixed in branch-3.4). However, because there's a retry loop, there's no guarantee that we will connect in less than initLimit or syncLimit. So {{initLimit}} is not a hard limit at all - but it already isn't one for other reasons.

In branch-3.4:
* {{LearnerHandler.java:336}} sets the initial timeout to {{initLimit}}, but never sets it back again after the ACK. And it should just be setting the timeout in {{Leader.java:254}} anyhow.

> LearnerHandler initLimit/syncLimit problems specifying follower socket timeout limits
> -------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1521
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1521
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3, 3.3.5, 3.5.0
> Reporter: Patrick Hunt
> Priority: Critical
> Fix For: 3.3.6, 3.4.4, 3.5.0
> Attachments: ZOOKEEPER-1521_br33.patch
>
> branch 3.3: The leader is expecting the follower to initialize in syncLimit time rather than initLimit. In LearnerHandler.run line 395 (branch33) we look for the ack from the follower with a timeout of syncLimit.
> branch 3.4+: seems like ZOOKEEPER-1136 introduced a regression while attempting to fix the problem. It sets the timeout as initLimit; however it never sets the timeout to syncLimit once the ack is received.
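The two timeout regimes discussed above can be sketched with plain java.net.Socket calls. The helper class and method names are invented; the real code wires these timeouts through LearnerHandler and Leader rather than a standalone class. The idea is simply: before the follower's first ACK the socket read timeout is initLimit * tickTime, and after it the timeout drops to syncLimit * tickTime.

```java
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helpers illustrating the initLimit -> syncLimit handover.
public class FollowerSocketTimeouts {

    // Apply the pre-first-ACK timeout; returns the value actually set, in ms.
    public static int configureInitial(Socket s, int initLimit, int tickTime) {
        return set(s, initLimit * tickTime);
    }

    // Apply the steady-state timeout once the first ACK has arrived.
    public static int afterFirstAck(Socket s, int syncLimit, int tickTime) {
        return set(s, syncLimit * tickTime);
    }

    private static int set(Socket s, int timeoutMs) {
        try {
            s.setSoTimeout(timeoutMs);
            return s.getSoTimeout();
        } catch (SocketException e) {
            throw new RuntimeException(e);
        }
    }
}
```

With an increased initLimit (as in the ZOOKEEPER-1697 environment above), this gives slow snapshot transfers room to finish before the first ACK is expected, without loosening the steady-state syncLimit.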
[jira] [Commented] (ZOOKEEPER-1514) FastLeaderElection - leader ignores the round information when joining a quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423443#comment-13423443 ]

Henry Robinson commented on ZOOKEEPER-1514:
-------------------------------------------

I'm not sure that removing the null checks would mean findbugs warnings (easy to try!) - and if the listener is null, the test will throw an NPE and fail anyhow, which seems like the right thing to do. So I would suggest just removing the null checks. What do you think?
[jira] [Commented] (ZOOKEEPER-1514) FastLeaderElection - leader ignores the round information when joining a quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420379#comment-13420379 ]

Henry Robinson commented on ZOOKEEPER-1514:
-------------------------------------------

Hey Flavio -

Thanks for fixing this so quickly! Patch looks really nice, a few nits:
* I don't think you need to duplicate {{createMsg}} in {{FLEBackwardElectionRound}}, since it's now in {{FLETestUtils}}.
* Could you add a comment to {{FLEBackwardElectionRound.testBackwardElectionRound}} describing the bug it's testing for, and I guess referencing this JIRA?
* If {{listener}} is {{null}} for {{QuorumCnxManager.Listener listener = cnxManagers[0].listener;}} and similar, shouldn't the test fail straight away? Under what circumstances would this be true?
* There's a small typo - 'instace' -> 'instance'.
[jira] [Commented] (ZOOKEEPER-1431) zkpython: async calls leak memory
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396393#comment-13396393 ]

Henry Robinson commented on ZOOKEEPER-1431:
-------------------------------------------

Patch looks good, I'll commit shortly.

> zkpython: async calls leak memory
> ---------------------------------
>
> Key: ZOOKEEPER-1431
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1431
> Project: ZooKeeper
> Issue Type: Bug
> Components: contrib-bindings
> Affects Versions: 3.4.3
> Environment: RHEL 6.0, self-built from 3.3.3 sources
> Reporter: johan rydberg
> Assignee: Kapil Thangavelu
> Fix For: 3.4.4, 3.5.0
> Attachments: pyzk-mem-leak-fix.diff, zk.patch, zktest3.py, zktest4.py
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> I'm seeing a memory leakage when using the aget method. It leaks tuples and dicts, both containing stats.
[jira] [Created] (ZOOKEEPER-1473) Committed proposal log retains triple the memory it needs to
Henry Robinson created ZOOKEEPER-1473: - Summary: Committed proposal log retains triple the memory it needs to Key: ZOOKEEPER-1473 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1473 Project: ZooKeeper Issue Type: Bug Reporter: Henry Robinson ZKDatabase.committedLog retains the past 500 transactions to enable fast catch-up. This works great, but it's using triple the memory it needs to by retaining three copies of the data part of any transaction. * The first is in {{committedLog[i].request.request.hb}} - a heap-allocated {{ByteBuffer}}. * The second is in {{committedLog[i].request.txn.data}} - a jute-serialised record of the transaction * The third is in {{committedLog[i].packet.data}} - also jute-serialised, seemingly uninitialised data. This means that a ZK-server could be using 1G of memory more than it should be in the worst case. We should use just one copy of the data, even if we really have to refer to it 3 times. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
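The fix the issue proposes, keeping a single copy of the payload even though it is referred to three times, can be illustrated with `ByteBuffer` views. This is a hypothetical sketch, not the actual `ZKDatabase`/`Request` classes: `threeViews` and `SharedPayload` are invented names, and the comments only map each view onto the three fields the issue lists.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: share one backing array across the three places
// the issue says currently hold independent copies of a transaction's data.
class SharedPayload {
    // duplicate()/asReadOnlyBuffer() create new views (independent
    // position/limit) over the SAME backing array -- no data is copied.
    static ByteBuffer[] threeViews(byte[] payload) {
        ByteBuffer master = ByteBuffer.wrap(payload);
        return new ByteBuffer[] {
            master.duplicate(),        // stand-in for request.request.hb
            master.duplicate(),        // stand-in for request.txn data
            master.asReadOnlyBuffer()  // stand-in for packet.data
        };
    }
}
```

All three views report the payload's bytes, but only one `byte[]` is retained on the heap, which is the memory saving the issue is after.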
[jira] [Updated] (ZOOKEEPER-1473) Committed proposal log retains triple the memory it needs to
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-1473: -- Description: ZKDatabase.committedLog retains the past 500 transactions to enable fast catch-up. This works great, but it's using triple the memory it needs to by retaining three copies of the data part of any transaction. * The first is in committedLog[i].request.request.hb - a heap-allocated {{ByteBuffer}}. * The second is in committedLog[i].request.txn.data - a jute-serialised record of the transaction * The third is in committedLog[i].packet.data - also jute-serialised, seemingly uninitialised data. This means that a ZK-server could be using 1G of memory more than it should be in the worst case. We should use just one copy of the data, even if we really have to refer to it 3 times. was: ZKDatabase.committedLog retains the past 500 transactions to enable fast catch-up. This works great, but it's using triple the memory it needs to by retaining three copies of the data part of any transaction. * The first is in {{committedLog[i].request.request.hb}} - a heap-allocated {{ByteBuffer}}. * The second is in {{committedLog[i].request.txn.data}} - a jute-serialised record of the transaction * The third is in {{committedLog[i].packet.data}} - also jute-serialised, seemingly uninitialised data. This means that a ZK-server could be using 1G of memory more than it should be in the worst case. We should use just one copy of the data, even if we really have to refer to it 3 times. Committed proposal log retains triple the memory it needs to Key: ZOOKEEPER-1473 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1473 Project: ZooKeeper Issue Type: Bug Reporter: Henry Robinson ZKDatabase.committedLog retains the past 500 transactions to enable fast catch-up. This works great, but it's using triple the memory it needs to by retaining three copies of the data part of any transaction. 
* The first is in committedLog[i].request.request.hb - a heap-allocated {{ByteBuffer}}. * The second is in committedLog[i].request.txn.data - a jute-serialised record of the transaction * The third is in committedLog[i].packet.data - also jute-serialised, seemingly uninitialised data. This means that a ZK-server could be using 1G of memory more than it should be in the worst case. We should use just one copy of the data, even if we really have to refer to it 3 times. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1461) Zookeeper C client doesn't check for NULL before dereferencing in prepend_string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266236#comment-13266236 ] Henry Robinson commented on ZOOKEEPER-1461: --- See ZOOKEEPER-1305 - this was fixed in trunk and 3.4, but not in 3.3. We should probably close this as a duplicate and commit 1305 to 3.3. See my comment there. Zookeeper C client doesn't check for NULL before dereferencing in prepend_string Key: ZOOKEEPER-1461 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1461 Project: ZooKeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.5 Reporter: Stephen Tyree Assignee: Stephen Tyree Fix For: 3.3.6 Attachments: ZOOKEEPER-1461.PATCH Original Estimate: 0h Remaining Estimate: 0h prepend_string, called before any checks for NULL in the c client for many API functions, has this line (zookeeper 3.3.5): if (zh->chroot == NULL) That means that before you check for NULL, you are dereferencing the pointer. This bug does not exist in the 3.4.* branch for whatever reason, but it still remains in the 3.3.* line. A patch which fixes it would make the line as follows: if (zh == NULL || zh->chroot == NULL) I would do that for you, but I don't know how to patch the 3.3.5 branch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1305) zookeeper.c:prepend_string func can dereference null ptr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266237#comment-13266237 ] Henry Robinson commented on ZOOKEEPER-1305: --- Hey Mahadev - Seems like some people are hitting this bug in 3.3 ZOOKEEPER-1461 - did you mean not to commit this to 3.3? If not, I'll go ahead and commit this there. Thanks, Henry zookeeper.c:prepend_string func can dereference null ptr Key: ZOOKEEPER-1305 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1305 Project: ZooKeeper Issue Type: Bug Components: c client Affects Versions: 3.3.3 Environment: All Reporter: Daniel Lescohier Assignee: Daniel Lescohier Labels: patch Fix For: 3.4.1, 3.5.0 Attachments: ZOOKEEPER-1305.patch, ZOOKEEPER-1305.patch Original Estimate: 0.5h Remaining Estimate: 0.5h All the callers of the function prepend_string make a call to prepend_string before checking that zhandle_t *zh is not null. At the top of prepend_string, zh is dereferenced without checking for a null ptr:
{noformat}
static char* prepend_string(zhandle_t *zh, const char* client_path) {
    char *ret_str;
    if (zh->chroot == NULL)
        return (char *) client_path;
{noformat}
I propose fixing this by adding the check here in prepend_string:
{noformat}
static char* prepend_string(zhandle_t *zh, const char* client_path) {
    char *ret_str;
    if (zh == NULL || zh->chroot == NULL)
        return (char *) client_path;
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1318) In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-1318: -- Attachment: ZOOKEEPER-1318.patch The error is a missing InvalidStateException. I've added the exception type, and confirmed that it shows up when session expiration occurs. In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly --- Key: ZOOKEEPER-1318 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1318 Project: ZooKeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.3 Environment: Mac OS X (at least) Reporter: Jim Fulton Attachments: ZOOKEEPER-1318.patch In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly.
{noformat}
zookeeper.state(h)
-112
zookeeper.get_children(h, '/')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
SystemError: error return without exception set
{noformat}
Let me know if you'd like me to work on a patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1318) In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-1318: -- Attachment: ZOOKEEPER-1318.patch Updated patch - InvalidStateException was already declared, just not dealt with in err_to_exception. This is a very simple patch, and tests are hard to write for session expired exceptions; also we don't have coverage for similar cases with other exceptions. In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly --- Key: ZOOKEEPER-1318 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1318 Project: ZooKeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.3 Environment: Mac OS X (at least) Reporter: Jim Fulton Assignee: Henry Robinson Fix For: 3.3.6, 3.4.4, 3.5.0 Attachments: ZOOKEEPER-1318.patch, ZOOKEEPER-1318.patch In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly.
{noformat}
zookeeper.state(h)
-112
zookeeper.get_children(h, '/')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
SystemError: error return without exception set
{noformat}
Let me know if you'd like me to work on a patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1097) Quota is not correctly rehydrated on snapshot reload
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055234#comment-13055234 ] Henry Robinson commented on ZOOKEEPER-1097: --- Just committed this to 3.3. Thanks Camille! Quota is not correctly rehydrated on snapshot reload Key: ZOOKEEPER-1097 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1097 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.3, 3.4.0 Reporter: Camille Fournier Assignee: Camille Fournier Priority: Blocker Fix For: 3.3.4, 3.4.0 Attachments: 1097.patch, ZOOKEEPER-1097, ZOOKEEPER-1097-33.patch, ZOOKEEPER-1097-whitespace.patch, ZOOKEEPER-1097-whitespace.patch, ZOOKEEPER-1097.patch, ZOOKEEPER-1097.patch traverseNode in DataTree will never actually traverse the limit nodes properly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1097) Quota is not correctly rehydrated on snapshot reload
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-1097: -- Attachment: ZOOKEEPER-1097-whitespace.patch Quota is not correctly rehydrated on snapshot reload Key: ZOOKEEPER-1097 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1097 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.3, 3.4.0 Reporter: Camille Fournier Assignee: Camille Fournier Priority: Blocker Fix For: 3.3.4, 3.4.0 Attachments: 1097.patch, ZOOKEEPER-1097, ZOOKEEPER-1097-whitespace.patch, ZOOKEEPER-1097-whitespace.patch, ZOOKEEPER-1097.patch, ZOOKEEPER-1097.patch traverseNode in DataTree will never actually traverse the limit nodes properly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (ZOOKEEPER-1095) Simple leader election recipe
Simple leader election recipe - Key: ZOOKEEPER-1095 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1095 Project: ZooKeeper Issue Type: Improvement Reporter: Henry Robinson Leader election recipe originally contributed to ZOOKEEPER-1080. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1080) Provide a Leader Election framework based on Zookeeper receipe
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049887#comment-13049887 ] Henry Robinson commented on ZOOKEEPER-1080: --- What we've got here are two different, but equally valid, approaches to building leader election. Since this isn't a core framework issue, we're not making a decision that everyone has to live with. Therefore there's no need for the committers to play kingmaker by only committing one of these patches. We've got room for both, just not on this JIRA. Here's what I suggest we do. * Eric - I've opened ZOOKEEPER-1095 for your contribution. Can you attach your recipe (as a diff, with copyright headers) to that ticket, and we'll work on getting it committed there? * Hari - leave your patch here, and one of the committers will do a code review shortly. Provide a Leader Election framework based on Zookeeper receipe -- Key: ZOOKEEPER-1080 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080 Project: ZooKeeper Issue Type: New Feature Components: contrib Affects Versions: 3.3.2 Reporter: Hari A V Fix For: 3.3.2 Attachments: LeaderElectionService.pdf, ZOOKEEPER-1080.patch, zkclient-0.1.0.jar, zookeeper-leader-0.0.1.tar.gz Currently Hadoop components such as NameNode and JobTracker are single point of failure. If Namenode or JobTracker goes down, there service will not be available until they are up and running again. If there was a Standby Namenode or JobTracker available and ready to serve when Active nodes go down, we could have reduced the service down time. Hadoop already provides a Standby Namenode implementation which is not fully a hot Standby. The common problem to be addressed in any such Active-Standby cluster is Leader Election and Failure detection. This can be done using Zookeeper as mentioned in the Zookeeper recipes. 
http://zookeeper.apache.org/doc/r3.3.3/recipes.html +Leader Election Service (LES)+ Any node that wants to participate in leader election can use this service, starting it with the required configuration. The service notifies each node whether it should start in Active or Standby mode, and also informs it of any mode changes at runtime. All other complexity is handled internally by the LES. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
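The leader-election recipe linked above boils down to one decision step: after each participant creates an EPHEMERAL_SEQUENTIAL znode, the participant whose znode has the lowest sequence number is leader, and everyone else watches only their immediate predecessor (avoiding a herd effect on the leader's node). Below is a sketch of just that pure decision logic; the ZooKeeper calls themselves are omitted, and `ElectionStep`/`predecessorToWatch` are invented names, not part of any ZooKeeper API.

```java
import java.util.Collections;
import java.util.List;

// Sketch of the decision step in the recipe (hypothetical helper, not
// part of ZooKeeper): given the children of the election znode and the
// name of the znode this participant created, decide whether we lead.
class ElectionStep {
    // Returns null if `mine` is the leader (smallest sequence number),
    // otherwise the name of the predecessor znode to set a watch on.
    static String predecessorToWatch(List<String> children, String mine) {
        // Fixed-width sequence suffixes make lexicographic order match
        // numeric order, so a plain sort is sufficient.
        Collections.sort(children);
        int i = children.indexOf(mine);
        return i <= 0 ? null : children.get(i - 1);
    }
}
```

When the watched predecessor disappears, the participant simply re-runs this step; it either becomes leader or finds a new predecessor to watch.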
[jira] [Created] (ZOOKEEPER-1094) Small improvements to LeaderElection and Vote classes
Small improvements to LeaderElection and Vote classes - Key: ZOOKEEPER-1094 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1094 Project: ZooKeeper Issue Type: Improvement Components: quorum Reporter: Henry Robinson Assignee: Henry Robinson Priority: Minor 1. o.a.z.q.Vote is a struct-style class, whose fields are public and not final. In general, we should prefer making the fields of these kind of classes final, and hiding them behind getters for the following reasons: * Marking them as final allows clients of the class not to worry about any synchronisation when accessing the fields * Hiding them behind getters allows us to change the implementation of the class without changing the API. Object creation is very cheap. It's ok to create new Votes rather than mutate existing ones. 2. Votes are mainly used in the LeaderElection class. In this class a map of addresses to votes is passed in to countVotes, which modifies the map contents inside an iterator (and therefore changes the object passed in by reference). This is pretty gross, so at the same time I've slightly refactored this method to return information about the number of validVotes in the ElectionResult class, which is returned by countVotes. 3. The previous implementation of countVotes was quadratic in the number of votes. It is possible to do this linearly. No real speed-up is expected as a result, but it salves the CS OCD in me :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
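The two main changes the issue describes, an immutable struct-style class with final fields behind getters, and a linear-time tally instead of a quadratic pairwise comparison, can be sketched as below. This is a hypothetical illustration, not the actual `o.a.z.q.Vote` or `countVotes`: the field set and the `winner` helper are simplified inventions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (not the real o.a.z.q.Vote): final fields mean readers need no
// synchronisation, and getters hide the representation behind the API.
// Object creation is cheap, so creating a new vote beats mutating one.
final class ImmutableVote {
    private final long id;
    private final long zxid;

    ImmutableVote(long id, long zxid) { this.id = id; this.zxid = zxid; }

    long getId()   { return id; }
    long getZxid() { return zxid; }

    // Single pass over the votes, tallying per candidate: O(n) overall,
    // versus comparing every vote against every other vote (O(n^2)).
    static Long winner(Iterable<ImmutableVote> votes, int quorum) {
        Map<Long, Integer> tally = new HashMap<>();
        for (ImmutableVote v : votes) {
            int count = tally.merge(v.getId(), 1, Integer::sum);
            if (count >= quorum) return v.getId();
        }
        return null; // no candidate reached a quorum
    }
}
```

Returning a result object (here just the winning id, in the issue an `ElectionResult` carrying the valid-vote count) is what lets `countVotes` stop mutating the caller's map in place.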
[jira] [Updated] (ZOOKEEPER-1094) Small improvements to LeaderElection and Vote classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-1094: -- Attachment: ZK-1094.patch Small improvements to LeaderElection and Vote classes - Key: ZOOKEEPER-1094 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1094 Project: ZooKeeper Issue Type: Improvement Components: quorum Reporter: Henry Robinson Assignee: Henry Robinson Priority: Minor Attachments: ZK-1094.patch 1. o.a.z.q.Vote is a struct-style class, whose fields are public and not final. In general, we should prefer making the fields of these kind of classes final, and hiding them behind getters for the following reasons: * Marking them as final allows clients of the class not to worry about any synchronisation when accessing the fields * Hiding them behind getters allows us to change the implementation of the class without changing the API. Object creation is very cheap. It's ok to create new Votes rather than mutate existing ones. 2. Votes are mainly used in the LeaderElection class. In this class a map of addresses to votes is passed in to countVotes, which modifies the map contents inside an iterator (and therefore changes the object passed in by reference). This is pretty gross, so at the same time I've slightly refactored this method to return information about the number of validVotes in the ElectionResult class, which is returned by countVotes. 3. The previous implementation of countVotes was quadratic in the number of votes. It is possible to do this linearly. No real speed-up is expected as a result, but it salves the CS OCD in me :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1094) Small improvements to LeaderElection and Vote classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-1094: -- Attachment: (was: ZK-1094.patch) Small improvements to LeaderElection and Vote classes - Key: ZOOKEEPER-1094 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1094 Project: ZooKeeper Issue Type: Improvement Components: quorum Reporter: Henry Robinson Assignee: Henry Robinson Priority: Minor Attachments: ZK-1094.patch 1. o.a.z.q.Vote is a struct-style class, whose fields are public and not final. In general, we should prefer making the fields of these kind of classes final, and hiding them behind getters for the following reasons: * Marking them as final allows clients of the class not to worry about any synchronisation when accessing the fields * Hiding them behind getters allows us to change the implementation of the class without changing the API. Object creation is very cheap. It's ok to create new Votes rather than mutate existing ones. 2. Votes are mainly used in the LeaderElection class. In this class a map of addresses to votes is passed in to countVotes, which modifies the map contents inside an iterator (and therefore changes the object passed in by reference). This is pretty gross, so at the same time I've slightly refactored this method to return information about the number of validVotes in the ElectionResult class, which is returned by countVotes. 3. The previous implementation of countVotes was quadratic in the number of votes. It is possible to do this linearly. No real speed-up is expected as a result, but it salves the CS OCD in me :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1094) Small improvements to LeaderElection and Vote classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-1094: -- Attachment: ZK-1094.patch Small improvements to LeaderElection and Vote classes - Key: ZOOKEEPER-1094 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1094 Project: ZooKeeper Issue Type: Improvement Components: quorum Reporter: Henry Robinson Assignee: Henry Robinson Priority: Minor Attachments: ZK-1094.patch 1. o.a.z.q.Vote is a struct-style class, whose fields are public and not final. In general, we should prefer making the fields of these kind of classes final, and hiding them behind getters for the following reasons: * Marking them as final allows clients of the class not to worry about any synchronisation when accessing the fields * Hiding them behind getters allows us to change the implementation of the class without changing the API. Object creation is very cheap. It's ok to create new Votes rather than mutate existing ones. 2. Votes are mainly used in the LeaderElection class. In this class a map of addresses to votes is passed in to countVotes, which modifies the map contents inside an iterator (and therefore changes the object passed in by reference). This is pretty gross, so at the same time I've slightly refactored this method to return information about the number of validVotes in the ElectionResult class, which is returned by countVotes. 3. The previous implementation of countVotes was quadratic in the number of votes. It is possible to do this linearly. No real speed-up is expected as a result, but it salves the CS OCD in me :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1094) Small improvements to LeaderElection and Vote classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048915#comment-13048915 ] Henry Robinson commented on ZOOKEEPER-1094: --- Sure - if you commit ZOOKEEPER-335 tonight, I'll rebase against trunk and repost the patch tomorrow. I just read the patch for ZOOKEEPER-335, and there should be very few conflicts. Thanks! Small improvements to LeaderElection and Vote classes - Key: ZOOKEEPER-1094 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1094 Project: ZooKeeper Issue Type: Improvement Components: quorum Reporter: Henry Robinson Assignee: Henry Robinson Priority: Minor Attachments: ZK-1094.patch 1. o.a.z.q.Vote is a struct-style class, whose fields are public and not final. In general, we should prefer making the fields of these kind of classes final, and hiding them behind getters for the following reasons: * Marking them as final allows clients of the class not to worry about any synchronisation when accessing the fields * Hiding them behind getters allows us to change the implementation of the class without changing the API. Object creation is very cheap. It's ok to create new Votes rather than mutate existing ones. 2. Votes are mainly used in the LeaderElection class. In this class a map of addresses to votes is passed in to countVotes, which modifies the map contents inside an iterator (and therefore changes the object passed in by reference). This is pretty gross, so at the same time I've slightly refactored this method to return information about the number of validVotes in the ElectionResult class, which is returned by countVotes. 3. The previous implementation of countVotes was quadratic in the number of votes. It is possible to do this linearly. No real speed-up is expected as a result, but it salves the CS OCD in me :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1080) Provide a Leader Election framework based on Zookeeper receipe
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048949#comment-13048949 ] Henry Robinson commented on ZOOKEEPER-1080: --- Hey Eric - this looks good. Protocol looks solid at the first pass. Some comments, based on a quick look: * I wouldn't try and delete the root node at STOP time. It seems prone to problems if you stop one node while others are starting / in a failed state and don't have ephemerals yet registered. Sequence numbers are a fairly abundant resource, and if it's possible to run out of them across several runs, it's definitely possible to run out of them in a single run. * That tuple support class is, imho, kinda gross. It would be clearer to use specific struct-type classes whose names correspond to the fields they're intended to hold. * 'Observers' is already a meaningful noun in ZK land, so it might be clearer to call them something else. Paxos uses Learners, but that's also taken inside ZK. Listeners? * Not a big deal, but I think you can break out of the for loop at the end of determineElectionStatus once the offer corresponding to the local node has been found. * I think addObserver / removeObserver probably need to synchronize on observers if you think you need to sync in dispatchEvent as well. * Is there any way to actually determine who the leader is (if not the local process)? Seems like this would be useful. Provide a Leader Election framework based on Zookeeper receipe -- Key: ZOOKEEPER-1080 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080 Project: ZooKeeper Issue Type: New Feature Components: contrib Affects Versions: 3.3.2 Reporter: Hari A V Attachments: LeaderElectionService.pdf, zookeeper-leader-0.0.1.tar.gz Currently Hadoop components such as NameNode and JobTracker are single point of failure. If Namenode or JobTracker goes down, there service will not be available until they are up and running again. 
If there was a Standby Namenode or JobTracker available and ready to serve when Active nodes go down, we could have reduced the service down time. Hadoop already provides a Standby Namenode implementation which is not fully a hot Standby. The common problem to be addressed in any such Active-Standby cluster is Leader Election and Failure detection. This can be done using Zookeeper as mentioned in the Zookeeper recipes. http://zookeeper.apache.org/doc/r3.3.3/recipes.html +Leader Election Service (LES)+ Any Node who wants to participate in Leader Election can use this service. They should start the service with required configurations. The service will notify the nodes whether they should be started as Active or Standby mode. Also they intimate any changes in the mode at runtime. All other complexities can be handled internally by the LES. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-423) Add getFirstChild API
[ https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027015#comment-13027015 ] Henry Robinson commented on ZOOKEEPER-423: -- Lukas - Good suggestion. Could you describe the use case you're thinking of, so we can properly weigh up the idea? Thanks - Henry Add getFirstChild API - Key: ZOOKEEPER-423 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423 Project: ZooKeeper Issue Type: New Feature Components: contrib-bindings, documentation, java client, server Reporter: Henry Robinson Attachments: ZOOKEEPER-423.patch When building the distributed queue for my tutorial blog post, it was pointed out to me that there's a serious inefficiency here. Informally, the items in the queue are created as sequential nodes. For a 'dequeue' call, all items are retrieved and sorted by name by the client in order to find the name of the next item to try and take. This costs O( n ) bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this doesn't scale very well. If the servers were able to maintain a data structure that allowed them to efficiently retrieve the children of a node in order of the zxid that created them this would make successful dequeue operations O( 1 ) at the cost of O( n ) memory on the server (to maintain, e.g. a singly-linked list as a queue). This is a win if it is generally true that clients only want the first child in creation order, rather than the whole set. We could expose this to the client via this API: getFirstChild(handle, path, name_buffer, watcher) which would have much the same semantics as getChildren, but only return one znode name. Sequential nodes would still allow the ordering of znodes to be made explicitly available to the client in one RPC should it need it. 
Although: since this ordering would now be available cheaply for every set of children, it's not completely clear that there would be that many use cases left for sequential nodes if this API was augmented with a getChildrenInCreationOrder call. However, that's for a different discussion. A halfway-house alternative with more flexibility is to add an 'order' parameter to getFirstChild and have the server compute the first child according to the requested order (creation time, update time, lexicographical order). This saves bandwidth at the expense of increased server load, although servers can be implemented to spend memory on pre-computing commonly requested orders. I am only in favour of this approach if servers maintain a data-structure for every possible order, and then the memory implications need careful consideration. [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
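The client-side cost the issue describes, fetching every child just to find the head of the queue, can be made concrete with a small sketch. This is hypothetical illustration code, not the proposed server-side `getFirstChild`: it assumes the standard fixed-width 10-digit sequence suffix of sequential znodes, and the `QueueHead` helper is an invented name.

```java
import java.util.List;

// Sketch: what a dequeuing client must do today -- scan ALL child names
// for the lowest sequence number. Note that even this O(n) scan doesn't
// remove the real cost the issue targets: shipping all n names over the
// wire on every dequeue, which getFirstChild would reduce to one name.
class QueueHead {
    // Parse the fixed-width 10-digit suffix of a sequential znode name.
    static long seq(String name) {
        return Long.parseLong(name.substring(name.length() - 10));
    }

    static String firstChild(List<String> children) {
        String best = null;
        for (String c : children)
            if (best == null || seq(c) < seq(best)) best = c;
        return best;
    }
}
```

Note that sorting is not actually required to find the minimum, so the O(n log n) sort in the quoted description is avoidable client-side; the O(n) bandwidth per dequeue is what only a server-side `getFirstChild` can eliminate.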
[jira] [Commented] (ZOOKEEPER-423) Add getFirstChild API
[ https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027030#comment-13027030 ] Henry Robinson commented on ZOOKEEPER-423: -- Thanks - I'm afraid I need a bit more clarification (I'm slow first thing in the morning :)). For a stack, each worker can call getLastChild which will give a LIFO ordering. Actually locking the nodes is not covered by this patch, although we could look at doing getAndDelete[First|Last]Child. If a worker could get the last N children, that could bring a benefit in terms of being able to batch process some nodes. Is that what you're describing? Henry Add getFirstChild API - Key: ZOOKEEPER-423 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423 Project: ZooKeeper Issue Type: New Feature Components: contrib-bindings, documentation, java client, server Reporter: Henry Robinson Attachments: ZOOKEEPER-423.patch When building the distributed queue for my tutorial blog post, it was pointed out to me that there's a serious inefficiency here. Informally, the items in the queue are created as sequential nodes. For a 'dequeue' call, all items are retrieved and sorted by name by the client in order to find the name of the next item to try and take. This costs O( n ) bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this doesn't scale very well. If the servers were able to maintain a data structure that allowed them to efficiently retrieve the children of a node in order of the zxid that created them this would make successful dequeue operations O( 1 ) at the cost of O( n ) memory on the server (to maintain, e.g. a singly-linked list as a queue). This is a win if it is generally true that clients only want the first child in creation order, rather than the whole set. 
We could expose this to the client via this API: getFirstChild(handle, path, name_buffer, watcher) which would have much the same semantics as getChildren, but only return one znode name. Sequential nodes would still allow the ordering of znodes to be made explicitly available to the client in one RPC should it need it. Although: since this ordering would now be available cheaply for every set of children, it's not completely clear that there would be that many use cases left for sequential nodes if this API was augmented with a getChildrenInCreationOrder call. However, that's for a different discussion. A halfway-house alternative with more flexibility is to add an 'order' parameter to getFirstChild and have the server compute the first child according to the requested order (creation time, update time, lexicographical order). This saves bandwidth at the expense of increased server load, although servers can be implemented to spend memory on pre-computing commonly requested orders. I am only in favour of this approach if servers maintain a data-structure for every possible order, and then the memory implications need careful consideration. [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
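The per-call cost the description complains about is easy to see in a sketch. This simulates the client-side dequeue logic over a list of sequential-node names rather than calling a live ensemble, and the names are hypothetical:

```python
# Simulates the client-side dequeue the description criticises: every call
# pulls the full child list and sorts it just to find the head of the queue.
# The "item-NNNNNNNNNN" suffix mirrors ZooKeeper's sequential-node counter.

def sequence_number(name):
    """Counter suffix of a sequential znode name, e.g. 'item-0000000002' -> 2."""
    return int(name.rsplit("-", 1)[1])

def dequeue_candidate(children):
    """O(n) bandwidth plus O(n log n) sorting per call; a server-side
    getFirstChild in creation order would return the same answer in O(1)."""
    if not children:
        return None
    return sorted(children, key=sequence_number)[0]

children = ["item-0000000007", "item-0000000002", "item-0000000010"]
assert dequeue_candidate(children) == "item-0000000002"
```

The proposed getFirstChild call would move exactly this selection to the server, which can keep the children in a creation-ordered linked list and answer without transferring or sorting the whole set.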
[jira] [Updated] (ZOOKEEPER-423) Add getFirstChild API
[ https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-423: - Attachment: ZOOKEEPER-423.patch Draft patch
[jira] [Updated] (ZOOKEEPER-423) Add getFirstChild API
[ https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-423: - Attachment: (was: ZOOKEEPER-423.patch)
[jira] Commented: (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975918#action_12975918 ] Henry Robinson commented on ZOOKEEPER-965: -- Hi Ted - You don't have to lose the idiomatic Java - I like the API. I'm just distinguishing between the API provided by the Java client and the API that the Java client calls in ZooKeeper, which should not be idiomatic. Currently that API is defined by the serialisation (and this is true for most API calls, not just multi(...)) rather than some abstract API signature which is realisable in every implementing language - I want to make sure that the serialisation is not the specification. Again, an Avro or Thrift or whatever IDL API would make these issues go away. However, the idiomatic changes here are so slight that having thought about it overnight I'm not too concerned about separating it out. It would be different if, e.g., the API was heavily object-oriented (for example a builder interface); then I would mandate that such an API should wrap a simple procedural API. It's pretty clear here what the multi(...) API means in all languages we're interested in. Thanks for listening :) Henry Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version or file state in the case of create. If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction.
This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include: - update to ZK clients to allow the new call - additional wire level request - on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form. - on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown. To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
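The reduction the description proposes - builder style as pure sugar over the list-based primitive - can be sketched as follows. All names here are hypothetical stand-ins; the real feature goes through ZooKeeper's wire protocol, not a local function:

```python
from dataclasses import dataclass

# Language-neutral op record: a type tag plus plain fields, standing in for
# the list-based primitive the description treats as fundamental.
@dataclass
class Op:
    kind: str          # "create" | "delete" | "setdata" | "check"
    path: str
    data: bytes = b""
    version: int = -1  # -1 = match any version

def multi(ops):
    """Stand-in for the atomic server call: validates the 1MB payload limit
    from the description, then just echoes the op kinds."""
    if sum(len(op.data) for op in ops) > 1024 * 1024:
        raise ValueError("transaction too large")
    return [op.kind for op in ops]

class Transaction:
    """Builder-style sugar; commit() reduces trivially to the list primitive."""
    def __init__(self):
        self._ops = []
    def create(self, path, data=b""):
        self._ops.append(Op("create", path, data))
        return self
    def delete(self, path, version=-1):
        self._ops.append(Op("delete", path, version=version))
        return self
    def commit(self):
        return multi(self._ops)

assert Transaction().create("/a").delete("/b").commit() == ["create", "delete"]
```

Because `Transaction` only accumulates `Op` records and hands them to `multi`, any semantics defined for the list form automatically hold for the builder form - which is the point of treating the list as primitive.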
[jira] Updated: (ZOOKEEPER-921) zkPython incorrectly checks for existence of required ACL elements
[ https://issues.apache.org/jira/browse/ZOOKEEPER-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-921: - Hadoop Flags: [Reviewed] +1 Nice work, particularly nice catch on the test-not-running bug. I'll commit this shortly. zkPython incorrectly checks for existence of required ACL elements -- Key: ZOOKEEPER-921 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-921 Project: ZooKeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.1, 3.4.0 Environment: Mac OS X 10.6.4, included Python 2.6.1 Reporter: Nicholas Knight Assignee: Nicholas Knight Fix For: 3.3.3, 3.4.0 Attachments: zktest.py, ZOOKEEPER-921.patch Calling {{zookeeper.create()}} seems, under certain circumstances, to be corrupting a subsequent call to Python's {{logging}} module. Specifically, if the node does not exist (but its parent does), I end up with a traceback like this when I try to make the logging call: {noformat} Traceback (most recent call last): File "zktest.py", line 21, in <module> logger.error("Boom?") File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1046, in error if self.isEnabledFor(ERROR): File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1206, in isEnabledFor return level >= self.getEffectiveLevel() File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1194, in getEffectiveLevel while logger: TypeError: an integer is required {noformat} But if the node already exists, or the parent does not exist, I get the appropriate NodeExists or NoNode exceptions. I'll be attaching a test script that can be used to reproduce this behavior. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
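For context on the bug title: zkPython represents an ACL as a plain Python dict. A complete presence-and-type check along the lines the fix needed might look like this sketch; the field names match the binding's ACL dicts, but the helper itself is hypothetical (the real check lives in the C extension):

```python
# Sketch of validating a zkPython-style ACL dict before unpacking it.
# Skipping any of these checks lets a malformed dict reach the C layer,
# which is the kind of failure the issue describes.

REQUIRED_ACL_KEYS = ("perms", "scheme", "id")

def check_acl(acl):
    """True only if every required element is present and correctly typed."""
    if not all(key in acl for key in REQUIRED_ACL_KEYS):
        return False
    return (isinstance(acl["perms"], int)
            and isinstance(acl["scheme"], str)
            and isinstance(acl["id"], str))

assert check_acl({"perms": 0x1F, "scheme": "world", "id": "anyone"})
assert not check_acl({"perms": 0x1F, "scheme": "world"})  # missing "id"
```

The traceback in the report is characteristic of a C extension corrupting interpreter state after a partial check: the failure surfaces later, in an unrelated `logging` call, rather than at the bad `create()`.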
[jira] Updated: (ZOOKEEPER-963) Make Forrest work with JDK6
[ https://issues.apache.org/jira/browse/ZOOKEEPER-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-963: - Hadoop Flags: [Reviewed] +1 It works! Oh happy day. I'll commit this asap to 3.3.3 and 3.4. Make Forrest work with JDK6 --- Key: ZOOKEEPER-963 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-963 Project: ZooKeeper Issue Type: Bug Components: build, documentation Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 3.3.3, 3.4.0 Attachments: ZOOKEEPER-963.1.patch.txt It's possible to make Forrest work with JDK6 by disabling sitemap validation in the forrest.properties file. See FOR-984 and PIG-1508 for more details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-837) cyclic dependency ClientCnxn, ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975701#action_12975701 ] Henry Robinson commented on ZOOKEEPER-837: -- Thomas - ZOOKEEPER-823 is not yet committed. Since this is a refactor, not a new feature or bug fix, and might require 823 (a much larger and more complex patch, which you are also working on) to be reworked, do you want to move this forward or wait? Is there a significant benefit to getting this in before 823? cyclic dependency ClientCnxn, ZooKeeper --- Key: ZOOKEEPER-837 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-837 Project: ZooKeeper Issue Type: Sub-task Affects Versions: 3.3.1 Reporter: Patrick Datko Assignee: Thomas Koch Fix For: 3.4.0 Attachments: ZOOKEEPER-837.patch, ZOOKEEPER-837.patch ZooKeeper instantiates ClientCnxn in its ctor with this and therefore builds a cyclic dependency graph between both objects. This means you can't have one without the other. So why bother to make them separate classes in the first place? ClientCnxn accesses ZooKeeper.state. State should rather be a property of ClientCnxn. And ClientCnxn accesses zooKeeper.get???Watches() in its method primeConnection(). I've not yet checked how this dependency could be better resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975702#action_12975702 ] Henry Robinson commented on ZOOKEEPER-965: -- Hi Ted - I took a quick look at the github branch. Looks really good, thanks. I've got a few comments on the code itself, but I'll save those until you post a patch. My main issue is the following: the 'multi' API call is expressed in terms of an iterable over a polymorphic type, both of which are Java features that aren't extant in C. To aid future language bindings authors and to make the implementation really easy to verify, I'd like to see an API signature that's very easily translated between languages. The iterable isn't too concerning (almost every language has *some* notion of lists) but the polymorphic op object should map onto some simpler struct type. I know that the serialisation is independent of the signature, so we could call it whatever we liked in any language, but I'd like to keep the core ZK API consistent across all bindings where possible and use wrappers in, for example, Python to provide more idiomatic interfaces. The serialisation may also change when we finally vote jute off the island, so we can't use that as the API spec. Indeed, we'll probably use Avro, where we have to write APIs in language-agnostic IDLs. So, to cut a long story short: any chance you can make the API a bit more language neutral? Then the op stuff can be a (very) thin wrapper. Shouldn't be a large change at all. You might consider chopping this up into a few JIRAs (apologies if you have and I've missed them) - core API, Java wrapper, finishing touches (like payload size limits). Excited to see this! Let me know how I can help.
Henry
[jira] Commented: (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975704#action_12975704 ] Henry Robinson commented on ZOOKEEPER-965: -- Hi Thomas - I really appreciate all your hard work cleaning up ZooKeeper's internals. I understand your frustration about the speed at which some tickets are moving. You've correctly identified that the committers have limited time, and particularly so over the holiday season. Hopefully we can pick up the pace now! However, I'm uncomfortable with the idea that ongoing refactoring work could block an often asked-for feature like this - particularly a JIRA (911) where there isn't yet consensus on the approach, or indeed an available patch. Open source projects see fluctuating participation, so we generally can't afford for issues to come with 'locks' on the code they touch, otherwise we run the risk of starvation :) So if this issue gets a patch with consensus before ZOOKEEPER-911, I'll be very happy to commit it and then to work with you on the extra changes in ZOOKEEPER-911 that this patch would cause. Henry
[jira] Updated: (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated ZOOKEEPER-965: - Assignee: Ted Dunning Assigning to Ted.