[jira] [Commented] (ZOOKEEPER-1360) QuorumTest.testNoLogBeforeLeaderEstablishment has several problems

2017-08-14 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126316#comment-16126316
 ] 

Henry Robinson commented on ZOOKEEPER-1360:
---

Not at all - haven't looked at this in years!

> QuorumTest.testNoLogBeforeLeaderEstablishment has several problems
> --
>
> Key: ZOOKEEPER-1360
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1360
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.4.2
>Reporter: Henry Robinson
>Assignee: Henry Robinson
> Fix For: 3.5.4, 3.6.0
>
>
> After the apparently valid fix to ZOOKEEPER-1294, 
> testNoLogBeforeLeaderEstablishment is failing for me about one time in four. 
> While I'll investigate whether the patch in ZOOKEEPER-1294 is ultimately to blame, 
> reading the test brought to light a number of issues that appear to be bugs 
> or in need of improvement:
> * As part of QuorumTest, an ensemble is already established by the fixture 
> setup code, but it is apparently unused by the test, which uses QuorumUtil. 
> * The test reads QuorumPeer.leader and QuorumPeer.follower without 
> synchronization, which means that writes to those fields may not be published 
> when we come to read them. 
> * The return value of sem.tryAcquire is never checked.
> * The progress of the test is based on ad-hoc timings (25 * 500ms sleeps) and 
> inscrutable numbers of iterations through the main loop (e.g. the semaphore 
> blocking the final asserts is released only after the 2nd of 5 
> callbacks)
> * The test as a whole takes ~30s to run
> The first three are easy to fix (as part of fixing the second, I intend to 
> hide all members of QuorumPeer behind getters and setters), the fourth and 
> fifth need a slightly deeper understanding of what the test is trying to 
> achieve.
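A minimal sketch of the first three fixes, purely for illustration (the class, member, and timeout values below are assumptions, not the actual QuorumPeer/QuorumTest code): read shared peer state through a synchronized getter so writes are published, and assert on tryAcquire's return value instead of discarding it.

{code}
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import static org.junit.Assert.assertTrue;

// Illustrative only - names are assumptions, not the real test code.
class LeaderStateExample {
    private Object leader; // stand-in for QuorumPeer.leader

    // Reading through a synchronized getter guarantees the write is published.
    synchronized Object getLeader() {
        return leader;
    }

    synchronized void setLeader(Object l) {
        leader = l;
    }

    // Check tryAcquire's return value so a timeout fails the test loudly
    // instead of silently falling through to the final asserts.
    static void awaitCallbacks(Semaphore sem, int permits) throws InterruptedException {
        assertTrue("callbacks did not complete in time",
                   sem.tryAcquire(permits, 30, TimeUnit.SECONDS));
    }
}
{code}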



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-1697) large snapshots can cause continuous quorum failure

2013-05-09 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653114#comment-13653114
 ] 

Henry Robinson commented on ZOOKEEPER-1697:
---

[~phunt] - this seems _much_ clearer and easier to reason about.  

 large snapshots can cause continuous quorum failure
 ---

 Key: ZOOKEEPER-1697
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1697
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3, 3.5.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1697_branch34.patch, 
 ZOOKEEPER-1697_branch34.patch, ZOOKEEPER-1697.patch, ZOOKEEPER-1697.patch


 I keep seeing this on the leader:
 2013-04-30 01:18:39,754 INFO
 org.apache.zookeeper.server.quorum.Leader: Shutdown called
 java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 2
 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:447)
 at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:422)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
 The followers are downloading the snapshot when this happens, and are
 trying to do their first ACK to the leader, the ack fails with broken
 pipe.
 In this case the snapshots are large and the config has increased the
 initLimit. syncLimit is small - 10 or so with ticktime of 2000. Note
 this is 3.4.3 with ZOOKEEPER-1521 applied.
 I originally speculated that
 https://issues.apache.org/jira/browse/ZOOKEEPER-1521 might be related.
 I thought I might have broken something for this environment. That
 doesn't look to be the case.
 As it looks now it seems that 1521 didn't go far enough. The leader
 verifies that all followers have ACK'd to the leader within the last
 syncLimit time period. This runs all the time in the background on
 the leader to identify the case where a follower drops. In this case
 the followers take so long to load the snapshot that this check fails
 the very first time, as a result the leader drops (not enough ack'd
 followers w/in the sync limit) and re-election happens. This repeats
 forever (the error logged above).
 This is the call that's at odds:
 org.apache.zookeeper.server.quorum.LearnerHandler.synced()
 Look at the setting of tickOfLastAck in
 org.apache.zookeeper.server.quorum.LearnerHandler.run():
 It's not set until the follower first acks - in this case I can see
 that the followers are not getting to the ack prior to the leader
 shutting down due to the error log above.
 It seems that sync() should probably use the init limit until the
 first ack comes in from the follower. I also see that while tickOfLastAck and 
 leader.self.tick are shared between two threads, there is no synchronization of 
 these shared resources.
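A rough sketch of the check being described, with assumed names (this is not the actual LearnerHandler code): elapsed ticks are measured against initLimit until the first ACK arrives, and against syncLimit only afterwards.

{code}
// Hypothetical illustration of the liveness check discussed above; field and
// method names are assumptions, not ZooKeeper's real implementation.
final class FollowerLiveness {
    private final int initLimit;   // ticks allowed for the initial sync (snapshot load)
    private final int syncLimit;   // ticks allowed between ACKs once in steady state
    private final long startTick;  // tick at which this handler started
    private volatile long tickOfLastAck = -1; // -1 means no ACK received yet

    FollowerLiveness(int initLimit, int syncLimit, long startTick) {
        this.initLimit = initLimit;
        this.syncLimit = syncLimit;
        this.startTick = startTick;
    }

    void ackReceived(long currentTick) {
        tickOfLastAck = currentTick;
    }

    boolean synced(long currentTick) {
        if (tickOfLastAck < 0) {
            // Still waiting for the first ACK: the follower may be loading a
            // large snapshot, so use the (normally larger) initLimit.
            return currentTick - startTick <= initLimit;
        }
        return currentTick - tickOfLastAck <= syncLimit;
    }
}
{code}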

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1346) Handle 4lws and monitoring on separate port

2012-10-03 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468901#comment-13468901
 ] 

Henry Robinson commented on ZOOKEEPER-1346:
---

+1, great idea.

I do think it's important that we set out with the intention of deprecating the 
old protocol eventually. This is a good opportunity to properly establish a 
procedure for doing that. I suggest including both in the next major release 
(3.5?) and warning that the old protocol will be turned off in 3.6. Assuming 
all goes according to plan, we can eventually ship 3.6 with only a Jetty-based 
4lw implementation. 

 Handle 4lws and monitoring on separate port
 ---

 Key: ZOOKEEPER-1346
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1346
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Camille Fournier
Assignee: Camille Fournier
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1346_jetty.patch


 Move the 4lws to their own port, off of the client port, and support them 
 properly via long-lived sessions instead of polling. Deprecate the 4lw 
 support on the client port. Will enable us to enhance the functionality of 
 the commands via extended command syntax, address security concerns and fix 
 bugs involving the socket close being received before all of the data on the 
 client end.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (ZOOKEEPER-1238) when the linger time was changed for NIO the patch missed Netty

2012-09-24 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reassigned ZOOKEEPER-1238:
-

Assignee: Skye Wanderman-Milne

 when the linger time was changed for NIO the patch missed Netty
 ---

 Key: ZOOKEEPER-1238
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1238
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.0, 3.5.0
Reporter: Patrick Hunt
Assignee: Skye Wanderman-Milne
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1238.patch


 from NettyServerCnxn:
 bq. bootstrap.setOption("child.soLinger", 2);
 See ZOOKEEPER-1049

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (ZOOKEEPER-1376) zkServer.sh does not correctly check for $SERVER_JVMFLAGS

2012-09-24 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reassigned ZOOKEEPER-1376:
-

Assignee: Skye Wanderman-Milne

 zkServer.sh does not correctly check for $SERVER_JVMFLAGS
 -

 Key: ZOOKEEPER-1376
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1376
 Project: ZooKeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.3.3, 3.3.4
Reporter: Patrick Hunt
Assignee: Skye Wanderman-Milne
Priority: Minor
  Labels: newbie
 Fix For: 3.3.7, 3.4.5

 Attachments: ZOOKEEPER-1376.patch


 It will always include it even if not defined, although not much harm.
 if [ "x$SERVER_JVMFLAGS" ]
 then
     JVMFLAGS="$SERVER_JVMFLAGS $JVMFLAGS"
 fi
 should use the std idiom.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1238) when the linger time was changed for NIO the patch missed Netty

2012-09-24 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462237#comment-13462237
 ] 

Henry Robinson commented on ZOOKEEPER-1238:
---

+1, patch looks good to me - the lack of tests isn't a problem for this change. 
I'll commit shortly. 

 when the linger time was changed for NIO the patch missed Netty
 ---

 Key: ZOOKEEPER-1238
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1238
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.0, 3.5.0
Reporter: Patrick Hunt
Assignee: Skye Wanderman-Milne
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1238.patch


 from NettyServerCnxn:
 bq. bootstrap.setOption("child.soLinger", 2);
 See ZOOKEEPER-1049

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1376) zkServer.sh does not correctly check for $SERVER_JVMFLAGS

2012-09-21 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460911#comment-13460911
 ] 

Henry Robinson commented on ZOOKEEPER-1376:
---

+1, patch looks good to me. I'll commit shortly to 3.3 and 3.4.

 zkServer.sh does not correctly check for $SERVER_JVMFLAGS
 -

 Key: ZOOKEEPER-1376
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1376
 Project: ZooKeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.3.3, 3.3.4
Reporter: Patrick Hunt
Priority: Minor
  Labels: newbie
 Fix For: 3.3.7, 3.4.5

 Attachments: ZOOKEEPER-1376.patch


 It will always include it even if not defined, although not much harm.
 if [ "x$SERVER_JVMFLAGS" ]
 then
     JVMFLAGS="$SERVER_JVMFLAGS $JVMFLAGS"
 fi
 should use the std idiom.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1361) Leader.lead iterates over 'learners' set without proper synchronisation

2012-09-10 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452560#comment-13452560
 ] 

Henry Robinson commented on ZOOKEEPER-1361:
---

Hey - sorry for the delay. I don't think the extra synchronisation in 
sendPacket is strictly necessary (because note that the forwardingFollowers 
lock is already held). However, I think that the scoped lock around queuePacket 
is probably not required and should be removed - but not the call to 
getForwardingFollowers. Make sense? 

 Leader.lead iterates over 'learners' set without proper synchronisation
 ---

 Key: ZOOKEEPER-1361
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1361
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.2
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 3.4.4, 3.5.0

 Attachments: zk-memory-leak-fix.patch, ZOOKEEPER-1361-3.4.patch, 
 ZOOKEEPER-1361-no-whitespace.patch, ZOOKEEPER-1361.patch


 This block:
 {code}
 HashSet<Long> followerSet = new HashSet<Long>();
 for (LearnerHandler f : learners)
     followerSet.add(f.getSid());
 {code}
 is executed without holding the lock on learners, so if there were ever a 
 condition where a new learner was added during the initial sync phase, I'm 
 pretty sure we'd see a concurrent modification exception. Certainly other 
 parts of the code are very careful to lock on learners when iterating. 
 It would be nice to use a {{ConcurrentHashMap}} to hold the learners instead, 
 but I can't convince myself that this wouldn't introduce some correctness 
 bugs. For example the following:
 Learners contains A, B, C, D
 Thread 1 iterates over learners, and gets as far as B.
 Thread 2 removes A, and adds E.
 Thread 1 continues iterating and sees a learner view of A, B, C, D, E
 This may be a bug if Thread 1 is counting the number of synced followers for 
 a quorum count, since at no point was A, B, C, D, E a correct view of the 
 quorum.
 In practice, I think this is actually ok, because I don't think ZK makes any 
 strong ordering guarantees on learners joining or leaving (so we don't need a 
 strong serialisability guarantee on learners) but I don't think I'll make 
 that change for this patch. Instead I want to clean up the locking protocols 
 on the follower / learner sets - to avoid another easy deadlock like the one 
 we saw in ZOOKEEPER-1294 - and to do less with the lock held; i.e. to copy 
 and then iterate over the copy rather than iterate over a locked set. 
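A minimal sketch of the "copy under the lock, then iterate the copy" pattern proposed above (the container and names are assumptions for illustration, not the real Leader/LearnerHandler fields):

{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: take a snapshot of the set while holding its lock, then
// do the slower work (counting sids, sending packets) on the copy.
final class LearnerRegistry {
    private final Set<Long> learnerSids = new HashSet<Long>();

    void add(long sid) {
        synchronized (learnerSids) {
            learnerSids.add(sid);
        }
    }

    List<Long> snapshot() {
        synchronized (learnerSids) {
            // Copying under the lock avoids ConcurrentModificationException;
            // iteration then happens without holding the lock.
            return new ArrayList<Long>(learnerSids);
        }
    }
}
{code}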

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1514) FastLeaderElection - leader ignores the round information when joining a quorum

2012-07-30 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425397#comment-13425397
 ] 

Henry Robinson commented on ZOOKEEPER-1514:
---

Hi Flavio - 

I don't really mind the check, it's just completely unnecessary (since listener 
== null => NPE => failed test). Let's keep it in if you think it is important. 

What is a problem, and I agree not worth fixing here, is that this is yet 
another example of class members not being hidden behind getters / setters that 
maintain correct invariants. Anyone can set listener to null, because it's a 
non-final public member, so every read of that variable in code that mustn't 
crash has to defensively check that it's not null, when we should be relying on 
the class to do this for us. 

Anyhow, this looks ok to me - +1, happy to commit. 

 FastLeaderElection - leader ignores the round information when joining a 
 quorum
 ---

 Key: ZOOKEEPER-1514
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1514
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.4
Reporter: Patrick Hunt
Assignee: Flavio Junqueira
Priority: Critical
 Fix For: 3.4.4, 3.5.0, 3.3.7

 Attachments: ZOOKEEPER-1514.patch, ZOOKEEPER-1514.patch, 
 ZOOKEEPER-1514.patch


 In the following case we have a 3 server ensemble.
 Initially all is well, zk3 is the leader.
 However zk3 fails, restarts, and rejoins the quorum as the new leader (was 
 the old leader, still the leader after re-election)
 The existing two followers, zk1 and zk2 rejoin the new quorum again as 
 followers of zk3.
 zk1 then fails, its data directory is deleted (so it has no state whatsoever), 
 and it is restarted. However zk1 can never rejoin the quorum (even after an hour). 
 During this time zk2 and zk3 are serving properly.
 Later, all three servers are restarted and properly form a functional 
 quorum.
 Here are some interesting log snippets. Nothing else of interest was seen in 
 the logs during this time:
 zk3. This is where it becomes the leader after failing initially (as the 
 leader). Notice the round is ahead of zk1 and zk2:
 {noformat}
 2012-07-18 17:19:35,423 - INFO  
 [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id =  3, 
 Proposed zxid = 77309411648
 2012-07-18 17:19:35,423 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 
 (n.zxid), 832 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [QuorumPeer:/0.0.0.0:2181:QuorumPeer@655] - 
 LEADING
 {noformat}
 zk1 which won't come back. Notice that zk3 is reporting the round as 831, 
 while zk2 thinks that the round is 832:
 {noformat}
 2012-07-18 17:31:12,015 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 77309411648 
 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
 2012-07-18 17:31:12,016 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), LEADING (n.state), 3 (n.sid), LOOKING (my state)
 2012-07-18 17:31:12,017 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 
 (n.zxid), 832 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
 2012-07-18 17:31:15,219 - INFO  
 [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 
 6400
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1514) FastLeaderElection - leader ignores the round information when joining a quorum

2012-07-28 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424286#comment-13424286
 ] 

Henry Robinson commented on ZOOKEEPER-1514:
---

Flavio - this looks fine. The point I am trying to make about this bit of code:

{code}
  if(listener != null){
    listener.start();
  } else {
    LOG.error("Null listener when initializing cnx manager");
    Assert.fail("Failed to create cnx manager");
  }
{code}

is that there's no need for the null check, since if {{listener}} is null, 
there'll be an NPE thrown which will fail the test anyhow. Plus, looking at 
{{QuorumCnxManager.java:153}}, I can't see any way in which {{listener}} can be 
null, because it's unambiguously assigned to a {{new Listener()}}. Is there a 
case that I'm missing?

I know this doesn't really affect the functionality of the patch, but if these 
checks aren't necessary, it will be confusing to the reader in the future. 

 FastLeaderElection - leader ignores the round information when joining a 
 quorum
 ---

 Key: ZOOKEEPER-1514
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1514
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.4
Reporter: Patrick Hunt
Assignee: Flavio Junqueira
Priority: Critical
 Fix For: 3.4.4, 3.5.0, 3.3.7

 Attachments: ZOOKEEPER-1514.patch, ZOOKEEPER-1514.patch, 
 ZOOKEEPER-1514.patch


 In the following case we have a 3 server ensemble.
 Initially all is well, zk3 is the leader.
 However zk3 fails, restarts, and rejoins the quorum as the new leader (was 
 the old leader, still the leader after re-election)
 The existing two followers, zk1 and zk2 rejoin the new quorum again as 
 followers of zk3.
 zk1 then fails, its data directory is deleted (so it has no state whatsoever), 
 and it is restarted. However zk1 can never rejoin the quorum (even after an hour). 
 During this time zk2 and zk3 are serving properly.
 Later, all three servers are restarted and properly form a functional 
 quorum.
 Here are some interesting log snippets. Nothing else of interest was seen in 
 the logs during this time:
 zk3. This is where it becomes the leader after failing initially (as the 
 leader). Notice the round is ahead of zk1 and zk2:
 {noformat}
 2012-07-18 17:19:35,423 - INFO  
 [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id =  3, 
 Proposed zxid = 77309411648
 2012-07-18 17:19:35,423 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 
 (n.zxid), 832 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [QuorumPeer:/0.0.0.0:2181:QuorumPeer@655] - 
 LEADING
 {noformat}
 zk1 which won't come back. Notice that zk3 is reporting the round as 831, 
 while zk2 thinks that the round is 832:
 {noformat}
 2012-07-18 17:31:12,015 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 77309411648 
 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
 2012-07-18 17:31:12,016 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), LEADING (n.state), 3 (n.sid), LOOKING (my state)
 2012-07-18 17:31:12,017 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 
 (n.zxid), 832 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
 2012-07-18 17:31:15,219 - INFO  
 [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 
 6400
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1521) LearnerHandler initLimit/syncLimit problems specifying follower socket timeout limits

2012-07-26 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423210#comment-13423210
 ] 

Henry Robinson commented on ZOOKEEPER-1521:
---

Good catch. Here's what I notice in branch-3.3:

* {{Leader.java:240}} - the initial read timeout is set to {{syncLimit}} ticks, 
but the first thing we wait for is an ACK from the follower saying that it has 
got up-to-date, which should be subject to {{initLimit}}
* I also saw that in {{Learner.java:220}}, the connection should be established 
with {{initLimit}} as the connection timeout (this is not fixed in branch-3.4). 
However, because there's a retry loop, there's no guarantee that we will 
connect in less than initLimit or syncLimit. So {{initLimit}} is not a hard 
limit at all; but it already isn't one for other reasons. 

In branch-3.4:

* {{LearnerHandler.java:336}} sets the initial timeout to {{initLimit}}, but 
never sets it back again after the ACK. And it should just be setting the 
timeout in {{Leader.java:254}} anyhow. 
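As a rough illustration of the intent here (a sketch with assumed names, not the actual code in Leader.java or LearnerHandler.java): the socket read timeout would be initLimit ticks until the ACK that ends the initial sync arrives, and syncLimit ticks afterwards.

{code}
import java.io.IOException;
import java.net.Socket;

// Hypothetical helper; the real code sets these timeouts inline rather than
// through a class like this.
final class LearnerSocketTimeouts {
    // While waiting for the first ACK (initial sync / snapshot download).
    static void beforeFirstAck(Socket s, int initLimit, int tickTime) throws IOException {
        s.setSoTimeout(initLimit * tickTime);
    }

    // Once the follower has ACKed and is in steady state.
    static void afterFirstAck(Socket s, int syncLimit, int tickTime) throws IOException {
        s.setSoTimeout(syncLimit * tickTime);
    }
}
{code}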



 LearnerHandler initLimit/syncLimit problems specifying follower socket 
 timeout limits
 -

 Key: ZOOKEEPER-1521
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1521
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3, 3.3.5, 3.5.0
Reporter: Patrick Hunt
Priority: Critical
 Fix For: 3.3.6, 3.4.4, 3.5.0

 Attachments: ZOOKEEPER-1521_br33.patch


 branch 3.3: The leader is expecting the follower to initialize in syncLimit 
 time rather than initLimit. In LearnerHandler run line 395 (branch33) we look 
 for the ack from the follower with a timeout of syncLimit.
 branch 3.4+: seems like ZOOKEEPER-1136 introduced a regression while 
 attempting to fix the problem. It sets the timeout as initLimit however it 
 never sets the timeout to syncLimit once the ack is received.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1514) FastLeaderElection - leader ignores the round information when joining a quorum

2012-07-26 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423443#comment-13423443
 ] 

Henry Robinson commented on ZOOKEEPER-1514:
---

I'm not sure that removing the null checks would mean findbugs warnings (easy 
to try!) - and if the listener is null, the test will throw an NPE and fail 
anyhow which seems like the right thing to do. So I would suggest just removing 
the null checks. What do you think?

 FastLeaderElection - leader ignores the round information when joining a 
 quorum
 ---

 Key: ZOOKEEPER-1514
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1514
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.4
Reporter: Patrick Hunt
Assignee: Flavio Junqueira
Priority: Critical
 Fix For: 3.4.4, 3.5.0, 3.3.7

 Attachments: ZOOKEEPER-1514.patch, ZOOKEEPER-1514.patch


 In the following case we have a 3 server ensemble.
 Initially all is well, zk3 is the leader.
 However zk3 fails, restarts, and rejoins the quorum as the new leader (was 
 the old leader, still the leader after re-election)
 The existing two followers, zk1 and zk2 rejoin the new quorum again as 
 followers of zk3.
 zk1 then fails, its data directory is deleted (so it has no state whatsoever), 
 and it is restarted. However zk1 can never rejoin the quorum (even after an hour). 
 During this time zk2 and zk3 are serving properly.
 Later, all three servers are restarted and properly form a functional 
 quorum.
 Here are some interesting log snippets. Nothing else of interest was seen in 
 the logs during this time:
 zk3. This is where it becomes the leader after failing initially (as the 
 leader). Notice the round is ahead of zk1 and zk2:
 {noformat}
 2012-07-18 17:19:35,423 - INFO  
 [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id =  3, 
 Proposed zxid = 77309411648
 2012-07-18 17:19:35,423 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 
 (n.zxid), 832 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [QuorumPeer:/0.0.0.0:2181:QuorumPeer@655] - 
 LEADING
 {noformat}
 zk1 which won't come back. Notice that zk3 is reporting the round as 831, 
 while zk2 thinks that the round is 832:
 {noformat}
 2012-07-18 17:31:12,015 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 77309411648 
 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
 2012-07-18 17:31:12,016 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), LEADING (n.state), 3 (n.sid), LOOKING (my state)
 2012-07-18 17:31:12,017 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 
 (n.zxid), 832 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
 2012-07-18 17:31:15,219 - INFO  
 [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 
 6400
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1514) FastLeaderElection - leader ignores the round information when joining a quorum

2012-07-22 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420379#comment-13420379
 ] 

Henry Robinson commented on ZOOKEEPER-1514:
---

Hey Flavio - 

Thanks for fixing this so quickly! Patch looks really nice, a few nits:

* I don't think you need to duplicate {{createMsg}} in 
{{FLEBackwardElectionRound}}, since it's now in {{FLETestUtils}}
* Could you add a comment to 
{{FLEBackwardElectionRound.testBackwardElectionRound}} describing the bug it's 
testing for, and I guess referencing this JIRA?
* If {{listener}} is {{null}} for {{QuorumCnxManager.Listener listener = 
cnxManagers[0].listener;}} and similar, shouldn't the test fail straight away? 
Under what circumstances would this be true?
* There's a small typo - 'instace' -> 'instance'


 FastLeaderElection - leader ignores the round information when joining a 
 quorum
 ---

 Key: ZOOKEEPER-1514
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1514
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.4
Reporter: Patrick Hunt
Assignee: Flavio Junqueira
Priority: Critical
 Fix For: 3.4.4, 3.5.0, 3.3.7

 Attachments: ZOOKEEPER-1514.patch, ZOOKEEPER-1514.patch


 In the following case we have a 3 server ensemble.
 Initially all is well, zk3 is the leader.
 However zk3 fails, restarts, and rejoins the quorum as the new leader (was 
 the old leader, still the leader after re-election)
 The existing two followers, zk1 and zk2 rejoin the new quorum again as 
 followers of zk3.
 zk1 then fails, its data directory is deleted (so it has no state whatsoever), 
 and it is restarted. However zk1 can never rejoin the quorum (even after an hour). 
 During this time zk2 and zk3 are serving properly.
 Later, all three servers are restarted and properly form a functional 
 quorum.
 Here are some interesting log snippets. Nothing else of interest was seen in 
 the logs during this time:
 zk3. This is where it becomes the leader after failing initially (as the 
 leader). Notice the round is ahead of zk1 and zk2:
 {noformat}
 2012-07-18 17:19:35,423 - INFO  
 [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id =  3, 
 Proposed zxid = 77309411648
 2012-07-18 17:19:35,423 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 
 (n.zxid), 832 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
 2012-07-18 17:19:35,424 - INFO  [QuorumPeer:/0.0.0.0:2181:QuorumPeer@655] - 
 LEADING
 {noformat}
 zk1 which won't come back. Notice that zk3 is reporting the round as 831, 
 while zk2 thinks that the round is 832:
 {noformat}
 2012-07-18 17:31:12,015 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 77309411648 
 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
 2012-07-18 17:31:12,016 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 7301480 
 (n.zxid), 831 (n.round), LEADING (n.state), 3 (n.sid), LOOKING (my state)
 2012-07-18 17:31:12,017 - INFO  [WorkerReceiver 
 Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 
 (n.zxid), 832 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
 2012-07-18 17:31:15,219 - INFO  
 [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 
 6400
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1431) zkpython: async calls leak memory

2012-06-18 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396393#comment-13396393
 ] 

Henry Robinson commented on ZOOKEEPER-1431:
---

Patch looks good, I'll commit shortly.

 zkpython: async calls leak memory
 -

 Key: ZOOKEEPER-1431
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1431
 Project: ZooKeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.4.3
 Environment: RHEL 6.0, self-built from 3.3.3 sources
Reporter: johan rydberg
Assignee: Kapil Thangavelu
 Fix For: 3.4.4, 3.5.0

 Attachments: pyzk-mem-leak-fix.diff, zk.patch, zktest3.py, zktest4.py

   Original Estimate: 1h
  Remaining Estimate: 1h

 I'm seeing a memory leakage when using the aget method.
 It leaks tuples and dicts, both containing stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (ZOOKEEPER-1473) Committed proposal log retains triple the memory it needs to

2012-05-29 Thread Henry Robinson (JIRA)
Henry Robinson created ZOOKEEPER-1473:
-

 Summary: Committed proposal log retains triple the memory it needs 
to
 Key: ZOOKEEPER-1473
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1473
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Henry Robinson


ZKDatabase.committedLog retains the past 500 transactions to enable fast 
catch-up. This works great, but it's using triple the memory it needs to by 
retaining three copies of the data part of any transaction.

* The first is in {{committedLog[i].request.request.hb}} - a heap-allocated 
{{ByteBuffer}}.
* The second is in {{committedLog[i].request.txn.data}} - a jute-serialised 
record of the transaction
* The third is in {{committedLog[i].packet.data}} - also jute-serialised, 
seemingly uninitialised data.

This means that a ZK-server could be using 1G of memory more than it should be 
in the worst case. We should use just one copy of the data, even if we really 
have to refer to it 3 times. 
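One possible shape of the "one copy, three references" idea, as a sketch with assumed names (not a proposal for the actual ZKDatabase structures): share a single backing buffer and hand out read-only views instead of keeping three serialised copies.

{code}
import java.nio.ByteBuffer;

// Sketch only; field and accessor names are assumptions for illustration.
final class SharedTxnData {
    private final ByteBuffer data;

    SharedTxnData(byte[] serialized) {
        this.data = ByteBuffer.wrap(serialized);
    }

    // Each view shares the same backing array, so the payload is stored once.
    ByteBuffer forRequest() { return data.asReadOnlyBuffer(); }
    ByteBuffer forTxn()     { return data.asReadOnlyBuffer(); }
    ByteBuffer forPacket()  { return data.asReadOnlyBuffer(); }
}
{code}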


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1473) Committed proposal log retains triple the memory it needs to

2012-05-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-1473:
--

Description: 
ZKDatabase.committedLog retains the past 500 transactions to enable fast 
catch-up. This works great, but it's using triple the memory it needs to by 
retaining three copies of the data part of any transaction.

* The first is in committedLog[i].request.request.hb - a heap-allocated 
{{ByteBuffer}}.
* The second is in committedLog[i].request.txn.data - a jute-serialised record 
of the transaction
* The third is in committedLog[i].packet.data - also jute-serialised, seemingly 
uninitialised data.

This means that a ZK-server could be using 1G of memory more than it should be 
in the worst case. We should use just one copy of the data, even if we really 
have to refer to it 3 times. 


  was:
ZKDatabase.committedLog retains the past 500 transactions to enable fast 
catch-up. This works great, but it's using triple the memory it needs to by 
retaining three copies of the data part of any transaction.

* The first is in {{committedLog[i].request.request.hb}} - a heap-allocated 
{{ByteBuffer}}.
* The second is in {{committedLog[i].request.txn.data}} - a jute-serialised 
record of the transaction
* The third is in {{committedLog[i].packet.data}} - also jute-serialised, 
seemingly uninitialised data.

This means that a ZK-server could be using 1G of memory more than it should be 
in the worst case. We should use just one copy of the data, even if we really 
have to refer to it 3 times. 



 Committed proposal log retains triple the memory it needs to
 

 Key: ZOOKEEPER-1473
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1473
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Henry Robinson

 ZKDatabase.committedLog retains the past 500 transactions to enable fast 
 catch-up. This works great, but it's using triple the memory it needs to by 
 retaining three copies of the data part of any transaction.
 * The first is in committedLog[i].request.request.hb - a heap-allocated 
 {{ByteBuffer}}.
 * The second is in committedLog[i].request.txn.data - a jute-serialised 
 record of the transaction
 * The third is in committedLog[i].packet.data - also jute-serialised, 
 seemingly uninitialised data.
 This means that a ZK-server could be using 1G of memory more than it should 
 be in the worst case. We should use just one copy of the data, even if we 
 really have to refer to it 3 times. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1461) Zookeeper C client doesn't check for NULL before dereferencing in prepend_string

2012-05-01 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266236#comment-13266236
 ] 

Henry Robinson commented on ZOOKEEPER-1461:
---

See ZOOKEEPER-1305 - this was fixed in trunk and 3.4, but not in 3.3. We should 
probably close this as a duplicate and commit 1305 to 3.3. See my comment 
there. 

 Zookeeper C client doesn't check for NULL before dereferencing in 
 prepend_string
 

 Key: ZOOKEEPER-1461
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1461
 Project: ZooKeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.5
Reporter: Stephen Tyree
Assignee: Stephen Tyree
 Fix For: 3.3.6

 Attachments: ZOOKEEPER-1461.PATCH

   Original Estimate: 0h
  Remaining Estimate: 0h

 prepend_string, called before any checks for NULL in the c client for many 
 API functions, has this line (zookeeper 3.3.5):
 if (zh->chroot == NULL)
 That means that before you check for NULL, you are dereferencing the pointer. 
 This bug does not exist in the 3.4.* branch for whatever reason, but it still 
 remains in the 3.3.* line. A patch which fixes it would make the line as 
 follows:
 if (zh == NULL || zh->chroot == NULL)
 I would do that for you, but I don't know how to patch the 3.3.5 branch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1305) zookeeper.c:prepend_string func can dereference null ptr

2012-05-01 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266237#comment-13266237
 ] 

Henry Robinson commented on ZOOKEEPER-1305:
---

Hey Mahadev - 

Seems like some people are hitting this bug in 3.3 (ZOOKEEPER-1461) - did you 
mean not to commit this to 3.3? If not, I'll go ahead and commit this there. 

Thanks,

Henry

 zookeeper.c:prepend_string func can dereference null ptr
 

 Key: ZOOKEEPER-1305
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1305
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3
 Environment: All
Reporter: Daniel Lescohier
Assignee: Daniel Lescohier
  Labels: patch
 Fix For: 3.4.1, 3.5.0

 Attachments: ZOOKEEPER-1305.patch, ZOOKEEPER-1305.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 All the callers of the function prepend_string make a call to prepend_string 
 before checking that zhandle_t *zh is not null. At the top of prepend_string, 
 zh is dereferenced without checking for a null ptr:
 static char* prepend_string(zhandle_t *zh, const char* client_path) {
 char *ret_str;
 if (zh->chroot == NULL)
 return (char *) client_path;
 I propose fixing this by adding the check here in prepend_string:
 static char* prepend_string(zhandle_t *zh, const char* client_path) {
 char *ret_str;
 if (zh==NULL || zh->chroot == NULL)
 return (char *) client_path;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1318) In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly

2012-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-1318:
--

Attachment: ZOOKEEPER-1318.patch

The error is a missing InvalidStateException. I've added the exception type, 
and confirmed that it shows up when session expiration occurs. 

 In Python binding, get_children (and get and exists, and probably others) 
 with expired session doesn't raise exception properly
 ---

 Key: ZOOKEEPER-1318
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1318
 Project: ZooKeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.3
 Environment: Mac OS X (at least)
Reporter: Jim Fulton
 Attachments: ZOOKEEPER-1318.patch


 In Python binding, get_children (and get and exists, and probably others) 
 with expired session doesn't raise exception properly.
 >>> zookeeper.state(h)
 -112
 >>> zookeeper.get_children(h, '/')
 Traceback (most recent call last):
   File "<console>", line 1, in <module>
 SystemError: error return without exception set
 Let me know if you'd like me to work on a patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1318) In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly

2012-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-1318:
--

Attachment: ZOOKEEPER-1318.patch

Updated patch - InvalidStateException was already declared, just not dealt with 
in err_to_exception.

This is a very simple patch, and tests are hard to write for session expired 
exceptions; also we don't have coverage for similar cases with other 
exceptions. 

 In Python binding, get_children (and get and exists, and probably others) 
 with expired session doesn't raise exception properly
 ---

 Key: ZOOKEEPER-1318
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1318
 Project: ZooKeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.3
 Environment: Mac OS X (at least)
Reporter: Jim Fulton
Assignee: Henry Robinson
 Fix For: 3.3.6, 3.4.4, 3.5.0

 Attachments: ZOOKEEPER-1318.patch, ZOOKEEPER-1318.patch


 In Python binding, get_children (and get and exists, and probably others) 
 with expired session doesn't raise exception properly.
 >>> zookeeper.state(h)
 -112
 >>> zookeeper.get_children(h, '/')
 Traceback (most recent call last):
   File "<console>", line 1, in <module>
 SystemError: error return without exception set
 Let me know if you'd like me to work on a patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1097) Quota is not correctly rehydrated on snapshot reload

2011-06-26 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055234#comment-13055234
 ] 

Henry Robinson commented on ZOOKEEPER-1097:
---

Just committed this to 3.3. Thanks Camille!

 Quota is not correctly rehydrated on snapshot reload
 

 Key: ZOOKEEPER-1097
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1097
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.3, 3.4.0
Reporter: Camille Fournier
Assignee: Camille Fournier
Priority: Blocker
 Fix For: 3.3.4, 3.4.0

 Attachments: 1097.patch, ZOOKEEPER-1097, ZOOKEEPER-1097-33.patch, 
 ZOOKEEPER-1097-whitespace.patch, ZOOKEEPER-1097-whitespace.patch, 
 ZOOKEEPER-1097.patch, ZOOKEEPER-1097.patch


 traverseNode in DataTree will never actually traverse the limit nodes 
 properly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1097) Quota is not correctly rehydrated on snapshot reload

2011-06-21 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-1097:
--

Attachment: ZOOKEEPER-1097-whitespace.patch

 Quota is not correctly rehydrated on snapshot reload
 

 Key: ZOOKEEPER-1097
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1097
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.3, 3.4.0
Reporter: Camille Fournier
Assignee: Camille Fournier
Priority: Blocker
 Fix For: 3.3.4, 3.4.0

 Attachments: 1097.patch, ZOOKEEPER-1097, 
 ZOOKEEPER-1097-whitespace.patch, ZOOKEEPER-1097-whitespace.patch, 
 ZOOKEEPER-1097.patch, ZOOKEEPER-1097.patch


 traverseNode in DataTree will never actually traverse the limit nodes 
 properly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (ZOOKEEPER-1095) Simple leader election recipe

2011-06-15 Thread Henry Robinson (JIRA)
Simple leader election recipe
-

 Key: ZOOKEEPER-1095
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1095
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Henry Robinson


Leader election recipe originally contributed to ZOOKEEPER-1080.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1080) Provide a Leader Election framework based on Zookeeper receipe

2011-06-15 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049887#comment-13049887
 ] 

Henry Robinson commented on ZOOKEEPER-1080:
---

What we've got here are two different, but equally valid, approaches to 
building leader election. Since this isn't a core framework issue, we're not 
making a decision that everyone has to live with. Therefore there's no need for 
the committers to play kingmaker by only committing one of these patches. We've 
got room for both, just not on this JIRA. 

Here's what I suggest we do. 

* Eric - I've opened ZOOKEEPER-1095 for your contribution. Can you attach your 
recipe (as a diff, with copyright headers) to that ticket, and we'll work on 
getting it committed there?
* Hari - leave your patch here, and one of the committers will do a code review 
shortly. 

 Provide a Leader Election framework based on Zookeeper receipe
 --

 Key: ZOOKEEPER-1080
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080
 Project: ZooKeeper
  Issue Type: New Feature
  Components: contrib
Affects Versions: 3.3.2
Reporter: Hari A V
 Fix For: 3.3.2

 Attachments: LeaderElectionService.pdf, ZOOKEEPER-1080.patch, 
 zkclient-0.1.0.jar, zookeeper-leader-0.0.1.tar.gz


 Currently Hadoop components such as the NameNode and JobTracker are single 
 points of failure.
 If the NameNode or JobTracker goes down, their service will not be available 
 until they are up and running again. If there were a Standby NameNode or 
 JobTracker available and ready to serve when Active nodes go down, we could 
 have reduced the service down time. Hadoop already provides a Standby 
 Namenode implementation which is not fully a hot Standby. 
 The common problem to be addressed in any such Active-Standby cluster is 
 Leader Election and Failure detection. This can be done using Zookeeper as 
 mentioned in the Zookeeper recipes.
 http://zookeeper.apache.org/doc/r3.3.3/recipes.html
 +Leader Election Service (LES)+
 Any node that wants to participate in Leader Election can use this service. 
 They should start the service with the required configurations. The service will 
 notify the nodes whether they should start in Active or Standby mode. 
 It also notifies them of any changes in the mode at runtime. All other complexities 
 can be handled internally by the LES.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (ZOOKEEPER-1094) Small improvements to LeaderElection and Vote classes

2011-06-13 Thread Henry Robinson (JIRA)
Small improvements to LeaderElection and Vote classes
-

 Key: ZOOKEEPER-1094
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1094
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor


1. o.a.z.q.Vote is a struct-style class, whose fields are public and not final. 

In general, we should prefer making the fields of these kind of classes final, 
and hiding them behind getters for the following reasons:

* Marking them as final allows clients of the class not to worry about any 
synchronisation when accessing the fields
* Hiding them behind getters allows us to change the implementation of the 
class without changing the API. 

Object creation is very cheap. It's ok to create new Votes rather than mutate 
existing ones. 

2. Votes are mainly used in the LeaderElection class. In this class a map of 
addresses to votes is passed in to countVotes, which modifies the map contents 
inside an iterator (and therefore changes the object passed in by reference). 
This is pretty gross, so at the same time I've slightly refactored this method 
to return information about the number of validVotes in the ElectionResult 
class, which is returned by countVotes. 

3. The previous implementation of countVotes was quadratic in the number of 
votes. It is possible to do this linearly. No real speed-up is expected as a 
result, but it salves the CS OCD in me :)
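A brief sketch of points 1 and 3, with an assumed field set and counting structure (not the actual Vote/LeaderElection code): an immutable Vote with final fields behind getters, and a single-pass tally instead of a quadratic scan.

{code}
import java.util.HashMap;
import java.util.Map;

// Illustration only; the real Vote carries more state than this.
final class Vote {
    private final long id;
    private final long zxid;

    Vote(long id, long zxid) {
        this.id = id;
        this.zxid = zxid;
    }

    long getId()   { return id; }
    long getZxid() { return zxid; }
}

final class VoteTally {
    // One pass over the votes: O(n) rather than comparing every pair.
    static Map<Long, Integer> count(Iterable<Vote> votes) {
        Map<Long, Integer> counts = new HashMap<Long, Integer>();
        for (Vote v : votes) {
            Integer prev = counts.get(v.getId());
            counts.put(v.getId(), prev == null ? 1 : prev + 1);
        }
        return counts;
    }
}
{code}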



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1094) Small improvements to LeaderElection and Vote classes

2011-06-13 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-1094:
--

Attachment: ZK-1094.patch

 Small improvements to LeaderElection and Vote classes
 -

 Key: ZOOKEEPER-1094
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1094
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor
 Attachments: ZK-1094.patch


 1. o.a.z.q.Vote is a struct-style class, whose fields are public and not 
 final. 
 In general, we should prefer making the fields of these kind of classes 
 final, and hiding them behind getters for the following reasons:
 * Marking them as final allows clients of the class not to worry about any 
 synchronisation when accessing the fields
 * Hiding them behind getters allows us to change the implementation of the 
 class without changing the API. 
 Object creation is very cheap. It's ok to create new Votes rather than mutate 
 existing ones. 
 2. Votes are mainly used in the LeaderElection class. In this class a map of 
 addresses to votes is passed in to countVotes, which modifies the map 
 contents inside an iterator (and therefore changes the object passed in by 
 reference). This is pretty gross, so at the same time I've slightly 
 refactored this method to return information about the number of validVotes 
 in the ElectionResult class, which is returned by countVotes. 
 3. The previous implementation of countVotes was quadratic in the number of 
 votes. It is possible to do this linearly. No real speed-up is expected as a 
 result, but it salves the CS OCD in me :)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1094) Small improvements to LeaderElection and Vote classes

2011-06-13 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-1094:
--

Attachment: (was: ZK-1094.patch)

 Small improvements to LeaderElection and Vote classes
 -

 Key: ZOOKEEPER-1094
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1094
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor
 Attachments: ZK-1094.patch


 1. o.a.z.q.Vote is a struct-style class, whose fields are public and not 
 final. 
 In general, we should prefer making the fields of these kind of classes 
 final, and hiding them behind getters for the following reasons:
 * Marking them as final allows clients of the class not to worry about any 
 synchronisation when accessing the fields
 * Hiding them behind getters allows us to change the implementation of the 
 class without changing the API. 
 Object creation is very cheap. It's ok to create new Votes rather than mutate 
 existing ones. 
 2. Votes are mainly used in the LeaderElection class. In this class a map of 
 addresses to votes is passed in to countVotes, which modifies the map 
 contents inside an iterator (and therefore changes the object passed in by 
 reference). This is pretty gross, so at the same time I've slightly 
 refactored this method to return information about the number of validVotes 
 in the ElectionResult class, which is returned by countVotes. 
 3. The previous implementation of countVotes was quadratic in the number of 
 votes. It is possible to do this linearly. No real speed-up is expected as a 
 result, but it salves the CS OCD in me :)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1094) Small improvements to LeaderElection and Vote classes

2011-06-13 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-1094:
--

Attachment: ZK-1094.patch

 Small improvements to LeaderElection and Vote classes
 -

 Key: ZOOKEEPER-1094
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1094
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor
 Attachments: ZK-1094.patch


 1. o.a.z.q.Vote is a struct-style class, whose fields are public and not 
 final. 
 In general, we should prefer making the fields of these kinds of classes 
 final, and hiding them behind getters for the following reasons:
 * Marking them as final allows clients of the class not to worry about any 
 synchronisation when accessing the fields
 * Hiding them behind getters allows us to change the implementation of the 
 class without changing the API. 
 Object creation is very cheap. It's ok to create new Votes rather than mutate 
 existing ones. 
 2. Votes are mainly used in the LeaderElection class. In this class a map of 
 addresses to votes is passed in to countVotes, which modifies the map 
 contents inside an iterator (and therefore changes the object passed in by 
 reference). This is pretty gross, so at the same time I've slightly 
 refactored this method to return information about the number of validVotes 
 in the ElectionResult class, which is returned by countVotes. 
 3. The previous implementation of countVotes was quadratic in the number of 
 votes. It is possible to do this linearly. No real speed-up is expected as a 
 result, but it salves the CS OCD in me :)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1094) Small improvements to LeaderElection and Vote classes

2011-06-13 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048915#comment-13048915
 ] 

Henry Robinson commented on ZOOKEEPER-1094:
---

Sure - if you commit ZOOKEEPER-335 tonight, I'll rebase against trunk and 
repost the patch tomorrow. I just read the patch for ZOOKEEPER-335, and there 
should be very few conflicts. Thanks!

 Small improvements to LeaderElection and Vote classes
 -

 Key: ZOOKEEPER-1094
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1094
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor
 Attachments: ZK-1094.patch


 1. o.a.z.q.Vote is a struct-style class, whose fields are public and not 
 final. 
 In general, we should prefer making the fields of these kinds of classes 
 final, and hiding them behind getters for the following reasons:
 * Marking them as final allows clients of the class not to worry about any 
 synchronisation when accessing the fields
 * Hiding them behind getters allows us to change the implementation of the 
 class without changing the API. 
 Object creation is very cheap. It's ok to create new Votes rather than mutate 
 existing ones. 
 2. Votes are mainly used in the LeaderElection class. In this class a map of 
 addresses to votes is passed in to countVotes, which modifies the map 
 contents inside an iterator (and therefore changes the object passed in by 
 reference). This is pretty gross, so at the same time I've slightly 
 refactored this method to return information about the number of validVotes 
 in the ElectionResult class, which is returned by countVotes. 
 3. The previous implementation of countVotes was quadratic in the number of 
 votes. It is possible to do this linearly. No real speed-up is expected as a 
 result, but it salves the CS OCD in me :)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1080) Provide a Leader Election framework based on Zookeeper recipe

2011-06-13 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048949#comment-13048949
 ] 

Henry Robinson commented on ZOOKEEPER-1080:
---

Hey Eric - this looks good. Protocol looks solid at the first pass. Some 
comments, based on a quick look:

* I wouldn't try and delete the root node at STOP time. It seems prone to 
problems if you stop one node while others are starting / in a failed state and 
don't have ephemerals yet registered. Sequence numbers are a fairly abundant 
resource, and if it's possible to run out of them across several runs, it's 
definitely possible to run out of them in a single run. 
* That tuple support class is, imho, kinda gross. It would be clearer to use 
specific struct-type classes whose names correspond to the fields they're 
intended to hold. 
* 'Observers' is already a meaningful noun in ZK land, so it might be clearer 
to call them something else. Paxos uses Learners, but that's also taken inside 
ZK. Listeners?
* Not a big deal, but I think you can break out of the for loop at the end of 
determineElectionStatus once the offer corresponding to the local node has been 
found. 
* I think addObserver / removeObserver probably need to synchronize on 
observers if you think you need to sync in dispatchEvent as well (see the 
sketch after this list). 
* Is there any way to actually determine who the leader is (if not the local 
process)? Seems like this would be useful.
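
On the synchronization point, a minimal thread-safe sketch (assuming an 
observers list and a dispatchEvent method as described above; the names and 
observer type are assumptions, not the patch's actual code):

{noformat}
import java.util.ArrayList;
import java.util.List;

class ElectionNotifier {
    // Assumed shape: a plain list of observer callbacks.
    private final List<Runnable> observers = new ArrayList<>();

    void addObserver(Runnable o) {
        synchronized (observers) {          // same lock as dispatchEvent
            observers.add(o);
        }
    }

    void removeObserver(Runnable o) {
        synchronized (observers) {
            observers.remove(o);
        }
    }

    void dispatchEvent() {
        synchronized (observers) {          // iterate under the same lock so a
            for (Runnable o : observers) {  // concurrent add/remove can't throw
                o.run();                    // ConcurrentModificationException
            }
        }
    }
}
{noformat}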

 Provide a Leader Election framework based on Zookeeper recipe
 --

 Key: ZOOKEEPER-1080
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080
 Project: ZooKeeper
  Issue Type: New Feature
  Components: contrib
Affects Versions: 3.3.2
Reporter: Hari A V
 Attachments: LeaderElectionService.pdf, zookeeper-leader-0.0.1.tar.gz


 Currently, Hadoop components such as NameNode and JobTracker are single points 
 of failure.
 If the NameNode or JobTracker goes down, their service will not be available 
 until they are up and running again. If a Standby NameNode or JobTracker were 
 available and ready to serve when the Active node goes down, the service 
 downtime could be reduced. Hadoop already provides a Standby NameNode 
 implementation, which is not a fully hot Standby. 
 The common problems to be addressed in any such Active-Standby cluster are 
 Leader Election and Failure Detection. These can be handled using Zookeeper, as 
 described in the Zookeeper recipes.
 http://zookeeper.apache.org/doc/r3.3.3/recipes.html
 +Leader Election Service (LES)+
 Any node that wants to participate in Leader Election can use this service. 
 Nodes start the service with the required configuration. The service notifies 
 each node whether it should start in Active or Standby mode, and also notifies 
 nodes of any change of mode at runtime. All other complexity 
 can be handled internally by the LES.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-423) Add getFirstChild API

2011-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027015#comment-13027015
 ] 

Henry Robinson commented on ZOOKEEPER-423:
--

Lukas - 

Good suggestion. Could you describe the use case you're thinking of, so we can 
properly weigh up the idea?

Thanks - 

Henry

 Add getFirstChild API
 -

 Key: ZOOKEEPER-423
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423
 Project: ZooKeeper
  Issue Type: New Feature
  Components: contrib-bindings, documentation, java client, server
Reporter: Henry Robinson
 Attachments: ZOOKEEPER-423.patch


 When building the distributed queue for my tutorial blog post, it was pointed 
 out to me that there's a serious inefficiency here. 
 Informally, the items in the queue are created as sequential nodes. For a 
 'dequeue' call, all items are retrieved and sorted by name by the client in 
 order to find the name of the next item to try and take. This costs O( n ) 
 bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this 
 doesn't scale very well. 
 If the servers were able to maintain a data structure that allowed them to 
 efficiently retrieve the children of a node in order of the zxid that created 
 them, this would make successful dequeue operations O( 1 ) at the cost of O( n 
 ) memory on the server (to maintain, e.g. a singly-linked list as a queue). 
 This is a win if it is generally true that clients only want the first child 
 in creation order, rather than the whole set. 
 We could expose this to the client via this API: getFirstChild(handle, path, 
 name_buffer, watcher) which would have much the same semantics as 
 getChildren, but only return one znode name. 
 Sequential nodes would still allow the ordering of znodes to be made 
 explicitly available to the client in one RPC should it need it. Although: 
 since this ordering would now be available cheaply for every set of children, 
 it's not completely clear that there would be that many use cases left for 
 sequential nodes if this API was augmented with a getChildrenInCreationOrder 
 call. However, that's for a different discussion. 
 A halfway-house alternative with more flexibility is to add an 'order' 
 parameter to getFirstChild and have the server compute the first child 
 according to the requested order (creation time, update time, lexicographical 
 order). This saves bandwidth at the expense of increased server load, 
 although servers can be implemented to spend memory on pre-computing commonly 
 requested orders. I am only in favour of this approach if servers maintain a 
 data-structure for every possible order, and then the memory implications 
 need careful consideration.
 [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.]
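
A minimal sketch of the server-side idea (not ZooKeeper code; the class and 
method names here are invented for illustration): keeping each node's children 
in creation order makes the first-child lookup O( 1 ):

{noformat}
import java.util.LinkedHashSet;

class ChildrenByCreation {
    // LinkedHashSet preserves insertion order, which here is creation (zxid) order.
    private final LinkedHashSet<String> children = new LinkedHashSet<>();

    synchronized void childCreated(String name) {
        children.add(name);       // appended as the znode is created
    }

    synchronized void childDeleted(String name) {
        children.remove(name);
    }

    // What a getFirstChild(handle, path, name_buffer, watcher) call would read.
    synchronized String firstChild() {
        return children.isEmpty() ? null : children.iterator().next();
    }
}
{noformat}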

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-423) Add getFirstChild API

2011-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027030#comment-13027030
 ] 

Henry Robinson commented on ZOOKEEPER-423:
--

Thanks - I'm afraid I need a bit more clarification (I'm slow first thing in 
the morning :)). 

For a stack, each worker can call getLastChild, which will give a LIFO ordering. 
Actually locking the nodes is not covered by this patch, although we could look 
at doing getAndDelete[First|Last]Child. 

If a worker could get the last N children, that could bring a benefit in terms 
of being able to batch process some nodes. Is that what you're describing?

Henry

 Add getFirstChild API
 -

 Key: ZOOKEEPER-423
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423
 Project: ZooKeeper
  Issue Type: New Feature
  Components: contrib-bindings, documentation, java client, server
Reporter: Henry Robinson
 Attachments: ZOOKEEPER-423.patch


 When building the distributed queue for my tutorial blog post, it was pointed 
 out to me that there's a serious inefficiency here. 
 Informally, the items in the queue are created as sequential nodes. For a 
 'dequeue' call, all items are retrieved and sorted by name by the client in 
 order to find the name of the next item to try and take. This costs O( n ) 
 bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this 
 doesn't scale very well. 
 If the servers were able to maintain a data structure that allowed them to 
 efficiently retrieve the children of a node in order of the zxid that created 
 them, this would make successful dequeue operations O( 1 ) at the cost of O( n 
 ) memory on the server (to maintain, e.g. a singly-linked list as a queue). 
 This is a win if it is generally true that clients only want the first child 
 in creation order, rather than the whole set. 
 We could expose this to the client via this API: getFirstChild(handle, path, 
 name_buffer, watcher) which would have much the same semantics as 
 getChildren, but only return one znode name. 
 Sequential nodes would still allow the ordering of znodes to be made 
 explicitly available to the client in one RPC should it need it. Although: 
 since this ordering would now be available cheaply for every set of children, 
 it's not completely clear that there would be that many use cases left for 
 sequential nodes if this API was augmented with a getChildrenInCreationOrder 
 call. However, that's for a different discussion. 
 A halfway-house alternative with more flexibility is to add an 'order' 
 parameter to getFirstChild and have the server compute the first child 
 according to the requested order (creation time, update time, lexicographical 
 order). This saves bandwidth at the expense of increased server load, 
 although servers can be implemented to spend memory on pre-computing commonly 
 requested orders. I am only in favour of this approach if servers maintain a 
 data-structure for every possible order, and then the memory implications 
 need careful consideration.
 [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-423) Add getFirstChild API

2011-04-06 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-423:
-

Attachment: ZOOKEEPER-423.patch

Draft patch

 Add getFirstChild API
 -

 Key: ZOOKEEPER-423
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423
 Project: ZooKeeper
  Issue Type: New Feature
  Components: contrib-bindings, documentation, java client, server
Reporter: Henry Robinson
 Attachments: ZOOKEEPER-423.patch


 When building the distributed queue for my tutorial blog post, it was pointed 
 out to me that there's a serious inefficiency here. 
 Informally, the items in the queue are created as sequential nodes. For a 
 'dequeue' call, all items are retrieved and sorted by name by the client in 
 order to find the name of the next item to try and take. This costs O( n ) 
 bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this 
 doesn't scale very well. 
 If the servers were able to maintain a data structure that allowed them to 
 efficiently retrieve the children of a node in order of the zxid that created 
 them, this would make successful dequeue operations O( 1 ) at the cost of O( n 
 ) memory on the server (to maintain, e.g. a singly-linked list as a queue). 
 This is a win if it is generally true that clients only want the first child 
 in creation order, rather than the whole set. 
 We could expose this to the client via this API: getFirstChild(handle, path, 
 name_buffer, watcher) which would have much the same semantics as 
 getChildren, but only return one znode name. 
 Sequential nodes would still allow the ordering of znodes to be made 
 explicitly available to the client in one RPC should it need it. Although: 
 since this ordering would now be available cheaply for every set of children, 
 it's not completely clear that there would be that many use cases left for 
 sequential nodes if this API was augmented with a getChildrenInCreationOrder 
 call. However, that's for a different discussion. 
 A halfway-house alternative with more flexibility is to add an 'order' 
 parameter to getFirstChild and have the server compute the first child 
 according to the requested order (creation time, update time, lexicographical 
 order). This saves bandwidth at the expense of increased server load, 
 although servers can be implemented to spend memory on pre-computing commonly 
 requested orders. I am only in favour of this approach if servers maintain a 
 data-structure for every possible order, and then the memory implications 
 need careful consideration.
 [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-423) Add getFirstChild API

2011-04-06 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-423:
-

Attachment: (was: ZOOKEEPER-423.patch)

 Add getFirstChild API
 -

 Key: ZOOKEEPER-423
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423
 Project: ZooKeeper
  Issue Type: New Feature
  Components: contrib-bindings, documentation, java client, server
Reporter: Henry Robinson

 When building the distributed queue for my tutorial blog post, it was pointed 
 out to me that there's a serious inefficiency here. 
 Informally, the items in the queue are created as sequential nodes. For a 
 'dequeue' call, all items are retrieved and sorted by name by the client in 
 order to find the name of the next item to try and take. This costs O( n ) 
 bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this 
 doesn't scale very well. 
 If the servers were able to maintain a data structure that allowed them to 
 efficiently retrieve the children of a node in order of the zxid that created 
 them, this would make successful dequeue operations O( 1 ) at the cost of O( n 
 ) memory on the server (to maintain, e.g. a singly-linked list as a queue). 
 This is a win if it is generally true that clients only want the first child 
 in creation order, rather than the whole set. 
 We could expose this to the client via this API: getFirstChild(handle, path, 
 name_buffer, watcher) which would have much the same semantics as 
 getChildren, but only return one znode name. 
 Sequential nodes would still allow the ordering of znodes to be made 
 explicitly available to the client in one RPC should it need it. Although: 
 since this ordering would now be available cheaply for every set of children, 
 it's not completely clear that there would be that many use cases left for 
 sequential nodes if this API was augmented with a getChildrenInCreationOrder 
 call. However, that's for a different discussion. 
 A halfway-house alternative with more flexibility is to add an 'order' 
 parameter to getFirstChild and have the server compute the first child 
 according to the requested order (creation time, update time, lexicographical 
 order). This saves bandwidth at the expense of increased server load, 
 although servers can be implemented to spend memory on pre-computing commonly 
 requested orders. I am only in favour of this approach if servers maintain a 
 data-structure for every possible order, and then the memory implications 
 need careful consideration.
 [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2010-12-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975918#action_12975918
 ] 

Henry Robinson commented on ZOOKEEPER-965:
--

Hi Ted - 

You don't have to lose the idiomatic Java - I like the API. I'm just 
distinguishing between the API provided by the Java client and the API that the 
Java client calls in ZooKeeper, which should not be idiomatic. Currently that 
API is defined by the serialisation (and this is true for most API calls, not 
just multi(...)) rather than some abstract API signature which is realisable in 
every implementing language - I want to make sure that the serialisation is not 
the specification. Again, an Avro or Thrift or whatever IDL API would make 
these issues go away.  

However, the idiomatic changes here are so slight that, having thought about it 
overnight, I'm not too concerned about separating it out. It would be different 
if, e.g., the API were heavily object-oriented (for example a builder interface); 
then I would mandate that such an API wrap a simple procedural API. It's 
pretty clear here what the multi(...) API means in all languages we're 
interested in. 

Thanks for listening :)

Henry

 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0


 The basic idea is to have a single method called multi that will accept a 
 list of create, delete, update or check objects each of which has a desired 
 version or file state in the case of create.  If all of the version and 
 existence constraints can be satisfied, then all updates will be done 
 atomically.
 Two API styles have been suggested.  One has a list as above and the other 
 style has a Transaction that allows builder-like methods to build a set of 
 updates and a commit method to finalize the transaction.  This can trivially 
 be reduced to the first kind of API so the list based API style should be 
 considered the primitive and the builder style should be implemented as 
 syntactic sugar.
 The total size of all the data in all updates and creates in a single 
 transaction should be limited to 1MB.
 Implementation-wise this capability can be done using standard ZK internals.  
 The changes include:
 - update to ZK clients to allow the new call
 - additional wire level request
 - on the server, in the code that converts transactions to idempotent form, 
 the code should be slightly extended to convert a list of operations to 
 idempotent form.
 - on the client, a down-rev server that rejects the multi-update should be 
 detected gracefully and an informative exception should be thrown.
 To facilitate shared development, I have established a github repository at 
 https://github.com/tdunning/zookeeper  and am happy to extend committer 
 status to anyone who agrees to donate their code back to Apache.  The final 
 patch will be attached to this bug as normal.
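
A sketch of the list-based style described above (illustrative only -- the 
eventual patch's Op types, fields and signatures may well differ):

{noformat}
import java.util.List;

// One flat record per requested change; the expectedVersion fields carry the
// constraints that must all hold for the batch to be applied.
abstract class Op {
    final String path;
    Op(String path) { this.path = path; }
}

class Create extends Op {
    final byte[] data;
    Create(String path, byte[] data) { super(path); this.data = data; }
}

class Delete extends Op {
    final int expectedVersion;
    Delete(String path, int expectedVersion) { super(path); this.expectedVersion = expectedVersion; }
}

class Check extends Op {
    final int expectedVersion;
    Check(String path, int expectedVersion) { super(path); this.expectedVersion = expectedVersion; }
}

interface MultiClient {
    // Either every op succeeds atomically or none of them is applied.
    void multi(List<Op> ops) throws Exception;
}

// Usage sketch:
//   client.multi(java.util.Arrays.asList(
//       new Check("/config", 3),
//       new Create("/config/child", data),
//       new Delete("/config/old", 7)));
{noformat}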

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-921) zkPython incorrectly checks for existence of required ACL elements

2010-12-28 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-921:
-

Hadoop Flags: [Reviewed]

+1 

Nice work, particularly nice catch on the test not running bug. I'll commit 
this shortly. 

 zkPython incorrectly checks for existence of required ACL elements
 --

 Key: ZOOKEEPER-921
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-921
 Project: ZooKeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.1, 3.4.0
 Environment: Mac OS X 10.6.4, included Python 2.6.1
Reporter: Nicholas Knight
Assignee: Nicholas Knight
 Fix For: 3.3.3, 3.4.0

 Attachments: zktest.py, ZOOKEEPER-921.patch


 Calling {{zookeeper.create()}} seems, under certain circumstances, to be 
 corrupting a subsequent call to Python's {{logging}} module.
 Specifically, if the node does not exist (but its parent does), I end up with 
 a traceback like this when I try to make the logging call:
 {noformat}
 Traceback (most recent call last):
   File "zktest.py", line 21, in <module>
     logger.error("Boom?")
   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1046, in error
     if self.isEnabledFor(ERROR):
   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1206, in isEnabledFor
     return level >= self.getEffectiveLevel()
   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1194, in getEffectiveLevel
     while logger:
 TypeError: an integer is required
 {noformat}
 But if the node already exists, or the parent does not exist, I get the 
 appropriate NodeExists or NoNode exceptions.
 I'll be attaching a test script that can be used to reproduce this behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-963) Make Forrest work with JDK6

2010-12-28 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-963:
-

Hadoop Flags: [Reviewed]

+1 

It works! Oh happy day. I'll commit this asap to 3.3.3 and 3.4. 

 Make Forrest work with JDK6
 ---

 Key: ZOOKEEPER-963
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-963
 Project: ZooKeeper
  Issue Type: Bug
  Components: build, documentation
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 3.3.3, 3.4.0

 Attachments: ZOOKEEPER-963.1.patch.txt


 It's possible to make Forrest work with JDK6 by disabling sitemap validation
 in the forrest.properties file. See FOR-984 and PIG-1508 for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-837) cyclic dependency ClientCnxn, ZooKeeper

2010-12-28 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975701#action_12975701
 ] 

Henry Robinson commented on ZOOKEEPER-837:
--

Thomas - ZOOKEEPER-823 is not yet committed. Since this is a refactor, not a 
new feature or bug fix, and might require 823 (a much larger and more complex 
patch, which you are also working on) to be reworked, do you want to move this 
forward or wait? Is there a significant benefit to getting this in before 823?

 cyclic dependency ClientCnxn, ZooKeeper
 ---

 Key: ZOOKEEPER-837
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-837
 Project: ZooKeeper
  Issue Type: Sub-task
Affects Versions: 3.3.1
Reporter: Patrick Datko
Assignee: Thomas Koch
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-837.patch, ZOOKEEPER-837.patch


 ZooKeeper instantiates ClientCnxn in its ctor with this, and therefore builds a 
 cyclic dependency graph between the two objects. This means you can't have the 
 one without the other, so why bother making them separate classes 
 in the first place?
 ClientCnxn accesses ZooKeeper.state; state should rather be a property of 
 ClientCnxn. ClientCnxn also accesses zooKeeper.get???Watches() in its method 
 primeConnection(). I have not yet checked how this dependency could be 
 resolved more cleanly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2010-12-28 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975702#action_12975702
 ] 

Henry Robinson commented on ZOOKEEPER-965:
--

Hi Ted - 

I took a quick look at the github branch. Looks really good, thanks.

I've got a few comments on the code itself, but I'll save those until you post 
a patch. My main issue is the following: the 'multi' API call is expressed in 
terms of an iterable over a polymorphic type, both of which are Java features 
that aren't extant in C. To aid future language bindings authors and to make 
the implementation really easy to verify, I'd like to see an API signature 
that's very easily translated between languages. The iterable isn't too 
concerning (almost every language has *some* notion of lists) but the 
polymorphic op object should map onto some simpler struct type. 

I know that the serialisation is independent of the signature, so we could call 
it whatever we liked in any language, but I'd like to keep the core ZK API 
consistent across all bindings where possible and use wrappers in, for example, 
Python to provide more idiomatic interfaces. The serialisation may also change 
when we finally vote jute off the island, so we can't use that as the API spec. 
Indeed, we'll probably use Avro, where we have to write APIs in 
language-agnostic IDLs.

So, to cut a long story short: any chance you can make the API a bit more 
language neutral? Then the op stuff can be a (very) thin wrapper. Shouldn't be 
a large change at all. 
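
For illustration only (the names and fields here are assumptions, not a 
proposed wire format), the kind of flat, struct-like op that would be trivially 
representable in C or any other binding:

{noformat}
// A type tag plus a fixed set of fields; fields not used by a given type are ignored.
enum OpType { CREATE, DELETE, SETDATA, CHECK }

class FlatOp {
    OpType type;
    String path;
    byte[] data;       // used by CREATE / SETDATA
    int version;       // version constraint for DELETE / SETDATA / CHECK
}
{noformat}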

You might consider chopping this up into a few JIRAs (apologies if you have and 
I've missed them) - core API, Java wrapper, finishing touches (like payload 
size limits).

Excited to see this! Let me know how I can help. 

Henry





 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning
 Fix For: 3.4.0


 The basic idea is to have a single method called multi that will accept a 
 list of create, delete, update or check objects each of which has a desired 
 version or file state in the case of create.  If all of the version and 
 existence constraints can be satisfied, then all updates will be done 
 atomically.
 Two API styles have been suggested.  One has a list as above and the other 
 style has a Transaction that allows builder-like methods to build a set of 
 updates and a commit method to finalize the transaction.  This can trivially 
 be reduced to the first kind of API so the list based API style should be 
 considered the primitive and the builder style should be implemented as 
 syntactic sugar.
 The total size of all the data in all updates and creates in a single 
 transaction should be limited to 1MB.
 Implementation-wise this capability can be done using standard ZK internals.  
 The changes include:
 - update to ZK clients to allow the new call
 - additional wire level request
 - on the server, in the code that converts transactions to idempotent form, 
 the code should be slightly extended to convert a list of operations to 
 idempotent form.
 - on the client, a down-rev server that rejects the multi-update should be 
 detected gracefully and an informative exception should be thrown.
 To facilitate shared development, I have established a github repository at 
 https://github.com/tdunning/zookeeper  and am happy to extend committer 
 status to anyone who agrees to donate their code back to Apache.  The final 
 patch will be attached to this bug as normal.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2010-12-28 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975704#action_12975704
 ] 

Henry Robinson commented on ZOOKEEPER-965:
--

Hi Thomas - 

I really appreciate all your hard work cleaning up ZooKeeper's internals. I 
understand your frustration about the speed at which some tickets are moving. 
You've correctly identified that the committers have limited time, and 
particularly so over the holiday season. Hopefully we can pick up the pace now!

However, I'm uncomfortable with the idea that ongoing refactoring work could 
block an often asked-for feature like this - particularly a JIRA (911) where 
there isn't yet consensus on the approach, or indeed an available patch. Open 
source projects see fluctuating participation, so we generally can't afford for 
issues to come with 'locks' on the code they touch; otherwise we run the risk 
of starvation :)

So if this issue gets a patch with consensus before ZOOKEEPER-911, I'll be very 
happy to commit it and then to work with you on the extra changes in 
ZOOKEEPER-911 that this patch would cause. 

Henry

 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning
 Fix For: 3.4.0


 The basic idea is to have a single method called multi that will accept a 
 list of create, delete, update or check objects each of which has a desired 
 version or file state in the case of create.  If all of the version and 
 existence constraints can be satisfied, then all updates will be done 
 atomically.
 Two API styles have been suggested.  One has a list as above and the other 
 style has a Transaction that allows builder-like methods to build a set of 
 updates and a commit method to finalize the transaction.  This can trivially 
 be reduced to the first kind of API so the list based API style should be 
 considered the primitive and the builder style should be implemented as 
 syntactic sugar.
 The total size of all the data in all updates and creates in a single 
 transaction should be limited to 1MB.
 Implementation-wise this capability can be done using standard ZK internals.  
 The changes include:
 - update to ZK clients to allow the new call
 - additional wire level request
 - on the server, in the code that converts transactions to idempotent form, 
 the code should be slightly extended to convert a list of operations to 
 idempotent form.
 - on the client, a down-rev server that rejects the multi-update should be 
 detected gracefully and an informative exception should be thrown.
 To facilitate shared development, I have established a github repository at 
 https://github.com/tdunning/zookeeper  and am happy to extend committer 
 status to anyone who agrees to donate their code back to Apache.  The final 
 patch will be attached to this bug as normal.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2010-12-28 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-965:
-

Assignee: Ted Dunning

Assigning to Ted.

 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0


 The basic idea is to have a single method called multi that will accept a 
 list of create, delete, update or check objects each of which has a desired 
 version or file state in the case of create.  If all of the version and 
 existence constraints can be satisfied, then all updates will be done 
 atomically.
 Two API styles have been suggested.  One has a list as above and the other 
 style has a Transaction that allows builder-like methods to build a set of 
 updates and a commit method to finalize the transaction.  This can trivially 
 be reduced to the first kind of API so the list based API style should be 
 considered the primitive and the builder style should be implemented as 
 syntactic sugar.
 The total size of all the data in all updates and creates in a single 
 transaction should be limited to 1MB.
 Implementation-wise this capability can be done using standard ZK internals.  
 The changes include:
 - update to ZK clients to allow the new call
 - additional wire level request
 - on the server, in the code that converts transactions to idempotent form, 
 the code should be slightly extended to convert a list of operations to 
 idempotent form.
 - on the client, a down-rev server that rejects the multi-update should be 
 detected gracefully and an informative exception should be thrown.
 To facilitate shared development, I have established a github repository at 
 https://github.com/tdunning/zookeeper  and am happy to extend committer 
 status to anyone who agrees to donate their code back to Apache.  The final 
 patch will be attached to this bug as normal.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.