[jira] [Commented] (ZOOKEEPER-866) Adding no disk persistence option in zookeeper.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010096#comment-13010096 ]

Jürgen Schumacher commented on ZOOKEEPER-866:
---------------------------------------------

Hi, I tried this patch with our application on ZooKeeper 3.3.3, because we do not care about persistence of data across complete system restarts, but we do need reliability when single ZooKeeper servers crash and restart later. Is it correct that with this patch the ZooKeeper ensemble loses all currently stored data when just the leader server crashes or is killed (our test ensemble consists of 5 nodes)? I would have expected that each follower holds the complete current data in memory and can continue to work on it when it becomes the new leader. Or is this assumption wrong? Thanks.

> Adding no disk persistence option in zookeeper.
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-866
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-866
>             Project: ZooKeeper
>          Issue Type: New Feature
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>             Fix For: 3.4.0
>         Attachments: ZOOKEEPER-nodisk.patch
>
> It has been seen that some folks would like to use ZooKeeper for very fine-grained locking. In their use case they are also fine with losing all old ZooKeeper state if they reboot ZooKeeper or ZooKeeper goes down. The use case is more of a runtime locking scheme wherein forgetting the state of locks is acceptable in case of a ZooKeeper reboot. Not logging to disk allows high throughput and low latency on writes to ZooKeeper. This would be a configuration option (of course the default would be logging to disk).
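[Editor's note: to make the runtime-locking use case above concrete, here is a minimal sketch of the well-known ephemeral-sequential lock recipe on the ZooKeeper Java API. The lock path and class name are illustrative, and for brevity it watches the lowest child rather than its immediate predecessor, so it has the herd effect that the full recipe avoids. Because all lock state lives in ephemeral znodes, a full-ensemble reboot that forgets everything simply releases every lock, which is exactly the tolerance the ticket describes.]

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch of the ephemeral-sequential lock recipe; LOCK_ROOT is a
    // placeholder path and is assumed to already exist.
    public class RuntimeLock {
        private static final String LOCK_ROOT = "/locks/my-resource"; // illustrative

        public static void acquire(ZooKeeper zk) throws Exception {
            // Each contender creates an ephemeral sequential child node.
            String me = zk.create(LOCK_ROOT + "/lock-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            while (true) {
                List<String> children = zk.getChildren(LOCK_ROOT, false);
                Collections.sort(children);
                String lowest = LOCK_ROOT + "/" + children.get(0);
                if (lowest.equals(me)) {
                    return; // lowest sequence number holds the lock
                }
                // Wait for the current holder's znode to disappear, then retry.
                final CountDownLatch gone = new CountDownLatch(1);
                Watcher watcher = new Watcher() {
                    public void process(WatchedEvent event) {
                        gone.countDown();
                    }
                };
                if (zk.exists(lowest, watcher) != null) {
                    gone.await();
                }
            }
        }
    }

Releasing the lock is just deleting the node, or letting the session expire.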
ZooKeeper_branch_3_3 - Build # 196 - Failure
See https://hudson.apache.org/hudson/job/ZooKeeper_branch_3_3/196/

###################################################################
########### LAST 60 LINES OF THE CONSOLE ###########
###################################################################
Started by timer
Building remotely on hadoop9
Checking out a fresh workspace because /grid/0/hudson/hudson-slave/workspace/ZooKeeper_branch_3_3/branch-3.3 doesn't exist
SCM check out aborted
Recording test results
ERROR: Publisher hudson.tasks.junit.JUnitResultArchiver aborted due to exception
java.lang.NullPointerException
    at hudson.tasks.junit.JUnitParser.parse(JUnitParser.java:83)
    at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:123)
    at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:135)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
    at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:644)
    at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:623)
    at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:601)
    at hudson.model.Build$RunnerImpl.post2(Build.java:159)
    at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:570)
    at hudson.model.Run.run(Run.java:1386)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:145)
Email was triggered for: Failure
Sending email for trigger: Failure

###################################################################
############### FAILED TESTS (if any) ##############
###################################################################
No tests ran.
ZooKeeper-trunk - Build # 1130 - Failure
See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/1130/

###################################################################
########### LAST 60 LINES OF THE CONSOLE ###########
###################################################################
Started by timer
Building remotely on hadoop5
SCM check out aborted
[FINDBUGS] Skipping publisher since build result is FAILURE
[WARNINGS] Skipping publisher since build result is FAILURE
Recording fingerprints
ERROR: Unable to record fingerprints because there's no workspace
Archiving artifacts
Recording test results
ERROR: Publisher hudson.tasks.junit.JUnitResultArchiver aborted due to exception
java.lang.NullPointerException
    at hudson.tasks.junit.JUnitParser.parse(JUnitParser.java:83)
    at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:123)
    at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:135)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
    at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:644)
    at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:623)
    at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:601)
    at hudson.model.Build$RunnerImpl.post2(Build.java:159)
    at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:570)
    at hudson.model.Run.run(Run.java:1386)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:145)
Publishing Javadoc
ERROR: Publisher hudson.tasks.JavadocArchiver aborted due to exception
java.lang.NullPointerException
    at hudson.tasks.JavadocArchiver.perform(JavadocArchiver.java:94)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
    at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:644)
    at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:623)
    at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:601)
    at hudson.model.Build$RunnerImpl.post2(Build.java:159)
    at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:570)
    at hudson.model.Run.run(Run.java:1386)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:145)
ERROR: Publisher hudson.plugins.clover.CloverPublisher aborted due to exception
java.lang.NullPointerException
    at hudson.plugins.clover.CloverPublisher.perform(CloverPublisher.java:137)
    at hudson.tasks.BuildStepMonitor$3.perform(BuildStepMonitor.java:36)
    at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:644)
    at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:623)
    at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:601)
    at hudson.model.Build$RunnerImpl.post2(Build.java:159)
    at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:570)
    at hudson.model.Run.run(Run.java:1386)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:145)
Email was triggered for: Failure
Sending email for trigger: Failure

###################################################################
############### FAILED TESTS (if any) ##############
###################################################################
No tests ran.
broken links on zookeeper.apache.org
is this the primary site now? it looks like the api doc is a 404 in recent releases:

from: http://zookeeper.apache.org/doc/r3.3.3/
http://zookeeper.apache.org/doc/r3.3.3/api/index.html = 404

similarly, from: http://zookeeper.apache.org/doc/r3.3.3/
http://zookeeper.apache.org/doc/r3.3.2/api/index.html = 404

--
n...@hep.cat
(^-^)
[jira] [Commented] (ZOOKEEPER-866) Adding no disk persistence option in zookeeper.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010167#comment-13010167 ]

Mahadev konar commented on ZOOKEEPER-866:
-----------------------------------------

Jürgen/Maya,

We had experimented with this patch quite a lot and realized that the throughput does not change much without logging to disk. The numbers were almost the same as with logging to disk; logging to disk wasn't a bottleneck. We had been trying to find out what might increase the throughput but didn't get a chance to work through it. Also, as Jürgen said, with this patch a crash of the leader server will bring down the whole cluster.
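[Editor's note: whether disk logging is the bottleneck in a particular deployment is easy to probe. Below is a rough sketch that times a burst of synchronous creates against a test ensemble; the connection string, session timeout, and the /bench parent path are placeholders, and the parent is assumed to exist. If I recall correctly, ZooKeeper also honors a -Dzookeeper.forceSync=no system property that skips the fsync while keeping the transaction log, a related but distinct experiment from this patch.]

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Rough write-rate probe; all names and addresses are placeholders.
    public class WriteBench {
        public static void main(String[] args) throws Exception {
            final CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, new Watcher() {
                public void process(WatchedEvent event) {
                    connected.countDown(); // first event is the connect event
                }
            });
            connected.await();

            int n = 10000;
            byte[] payload = new byte[100];
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                // Synchronous create: each call waits for quorum commit.
                zk.create("/bench/node-", payload,
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            }
            long ms = (System.nanoTime() - start) / 1000000L;
            System.out.println(n + " creates in " + ms + " ms ("
                    + (n * 1000L / Math.max(ms, 1)) + "/s)");
            zk.close();
        }
    }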
Re: broken links on zookeeper.apache.org
Thanks for pointing this out, I'll look into it - I suspect we didn't put up the api docs when we moved the site (it's a separate step as part of a release).

Patrick

On Wed, Mar 23, 2011 at 8:01 AM, nicholas harteau <n...@hep.cat> wrote:
> is this the primary site now? it looks like the api doc is a 404 in recent releases:
>
> from: http://zookeeper.apache.org/doc/r3.3.3/
> http://zookeeper.apache.org/doc/r3.3.3/api/index.html = 404
>
> similarly, from: http://zookeeper.apache.org/doc/r3.3.3/
> http://zookeeper.apache.org/doc/r3.3.2/api/index.html = 404
>
> --
> n...@hep.cat
> (^-^)
[jira] [Commented] (ZOOKEEPER-1001) Read from open ledger
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010170#comment-13010170 ]

Flavio Junqueira commented on ZOOKEEPER-1001:
---------------------------------------------

Hi guys, thanks for the comments. Utkarsh, I'm not sure I understand why you say ZK access should be done server side. If we need ZK access for open ledgers, I feel that it should be on the client, either through the application directly or through the BK client. Dhruba, what you suggest sounds like the proposal I posted yesterday. Could you please check it again?

> Read from open ledger
> ---------------------
>
>                 Key: ZOOKEEPER-1001
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1001
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: contrib-bookkeeper
>            Reporter: Flavio Junqueira
>         Attachments: zk-1001-design-doc.pdf, zk-1001-design-doc.pdf
>
> The BookKeeper client currently does not allow a client to read from an open ledger. That is, if the creator of a ledger is still writing to it (and the ledger is not closed), then an attempt to open the same ledger for reading will execute the code to recover the ledger, assuming that the ledger has not been correctly closed. It seems that there are applications that do require the ability to read from a ledger while it is being written to, and the main goal of this jira is to discuss possible implementations of this feature.
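[Editor's note: for readers following along, here is a sketch of the current BookKeeper client reader path, assuming a local ZK ensemble and a throwaway password. openLedger() runs recovery when the ledger was not closed cleanly, effectively sealing it against the original writer, which is precisely what a reader tailing a still-open ledger needs to avoid. The non-recovery read discussed in this JIRA would be new API, so nothing below shows the proposed feature itself.]

    import java.util.Enumeration;

    import org.apache.bookkeeper.client.BookKeeper;
    import org.apache.bookkeeper.client.LedgerEntry;
    import org.apache.bookkeeper.client.LedgerHandle;

    // Current reader path: opening a not-cleanly-closed ledger recovers it.
    public class LedgerReadSketch {
        public static void main(String[] args) throws Exception {
            BookKeeper bk = new BookKeeper("127.0.0.1:2181"); // ZK ensemble (placeholder)

            // Writer: create a ledger and append a few entries without closing it.
            LedgerHandle writer = bk.createLedger(
                    BookKeeper.DigestType.CRC32, "secret".getBytes());
            for (int i = 0; i < 10; i++) {
                writer.addEntry(("entry-" + i).getBytes());
            }

            // Reader: this triggers ledger recovery, the behavior at issue here.
            LedgerHandle reader = bk.openLedger(writer.getId(),
                    BookKeeper.DigestType.CRC32, "secret".getBytes());
            Enumeration<LedgerEntry> entries =
                    reader.readEntries(0, reader.getLastAddConfirmed());
            while (entries.hasMoreElements()) {
                System.out.println(new String(entries.nextElement().getEntry()));
            }
            reader.close();
            bk.close();
        }
    }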
Re: ZooKeeper-trunk - Build # 1130 - Failure
Nigel/Giri,

Are you guys trying out something with the builds?

thanks
mahadev

On Wed, Mar 23, 2011 at 5:55 AM, Apache Hudson Server <hud...@hudson.apache.org> wrote:
> See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/1130/
> [...]
[jira] [Commented] (ZOOKEEPER-866) Adding no disk persistence option in zookeeper.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010180#comment-13010180 ]

Jürgen Schumacher commented on ZOOKEEPER-866:
---------------------------------------------

Ok, thanks for the answer. I have to admit that we are abusing ZooKeeper and don't have a separate cluster or separate disks for it, and in such use cases the patch does increase the throughput greatly (sorry for this ;-). So I'd like to ask whether the data loss is just a problem of the patch that can probably be fixed (relatively) easily, or whether ZooKeeper really needs the persistence to provide its reliability. My (and my colleagues') impression from reading the docs and some mails on the user lists was that persistence (apart from keeping the state across complete system restarts, of course) is good for faster recovery when a server crashes and reintegrates with the ensemble, but that in principle it should work without any persistence at all by reading the state from the other servers, because all the necessary data is in memory anyway. Was this a misunderstanding?
[jira] [Commented] (ZOOKEEPER-1026) Sequence number assignment decreases after old node rejoins cluster
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010281#comment-13010281 ]

Jeremy Stribling commented on ZOOKEEPER-1026:
---------------------------------------------

Wow, thanks for the in-depth explanation. It makes sense to me, in terms of the timeline of events and what could go wrong, but I don't know enough about the ZooKeeper code to be able to verify it for sure. I would love to try out a patch for ZOOKEEPER-975 and see if that fixes the problem for me. (I added myself as a watcher for that bug.)

> Sequence number assignment decreases after old node rejoins cluster
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1026
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1026
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Jeremy Stribling
>         Attachments: ZOOKEEPER-1026.logs.tgz
>
> I ran into a weird case where a ZooKeeper server rejoins the cluster after missing several operations, and then a client creates a new sequential node that has a number earlier than the last node it created. I don't have full logs, or a live system in this state, or any data directories, just some partial server logs and the evidence as seen by the client. Haven't tried reproducing it yet, just wanted to see if anyone here had any ideas. Here's the scenario (probably more info than necessary, but trying to be complete):
>
> 1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes #1, #2, and #3 below)
> 2) Nodes periodically (and throughout this whole timeline) create sequential, non-ephemeral nodes under the /zkrsm parent node.
> 3) 5:46:57: Node #1 gets notified of /zkrsm/_record002116
> 4) 5:47:06: Node #1 restarts and rejoins
> 5) 5:49:26: Node #2 gets notified of /zkrsm/_record002708
> 6) 5:49:29: Node #2 restarts and rejoins
> 7) 5:52:01: Node #3 gets notified of /zkrsm/_record003291
> 8) 5:52:02: Node #3 restarts and begins the rejoining process
> 9) 5:52:08: Node #1 successfully creates /zkrsm/_record003348
> 10) 5:52:08: Node #2 dies after getting notified of /zkrsm/_record003348
> 11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't have wallclock timestamps, so not exactly sure on the ordering of this step)
> 12) 5:52:15: Node #1 successfully creates /zkrsm/_record003292
>
> Note that the node created in step #12 is lower than the one created in step #9, and is exactly one greater than the last node seen by node #3 before it restarted.
>
> Here is the sequence of session establishments as seen from the C client of node #1 after its restart (the IP address of node #1=13.0.0.11, #2=13.0.0.12, #3=13.0.0.13):
>
> 2011-03-18 05:46:59,838:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec780a302, negotiated timeout=6000
> 2011-03-18 05:49:32,194:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec782f512, negotiated timeout=6000
> 2011-03-18 05:52:02,352:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.12:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:08,583:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:13,834:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0xd72ec7856d0f0001, negotiated timeout=6000
>
> I will attach logs for all nodes after each of their restarts, and a partial log for node #3 from before its restart.
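[Editor's note: for context on the mechanism involved, the sequence suffix of a PERSISTENT_SEQUENTIAL znode comes from the parent znode's cversion, so successive creates under one parent should never yield a smaller number, which is what makes step 12 above a genuine bug. A minimal illustration follows; the path and suffix parsing are illustrative, not taken from the reporter's application.]

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // The server appends a zero-padded, monotonically increasing counter
    // to the requested path; the bug report shows it going backwards.
    public class SequenceDemo {
        static long createNext(ZooKeeper zk) throws Exception {
            String path = zk.create("/zkrsm/record-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            // e.g. "/zkrsm/record-0000003292" -> 3292
            return Long.parseLong(path.substring(path.length() - 10));
        }
    }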
[jira] [Assigned] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online
[ https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K reassigned ZOOKEEPER-975:
----------------------------------

    Assignee: Vishal K

> new peer goes in LEADING state even if ensemble is online
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-975
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-975
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.2
>            Reporter: Vishal K
>            Assignee: Vishal K
>             Fix For: 3.4.0
>         Attachments: ZOOKEEPER-975.patch, ZOOKEEPER-975.patch
>
> Scenario:
> 1. 2 of the 3 ZK nodes are online
> 2. Third node is attempting to join
> 3. Third node unnecessarily goes into LEADING state
> 4. Then the third node goes back to LOOKING (no majority of followers) and finally goes into FOLLOWING state.
>
> While going through the logs I noticed that a peer C that is trying to join an already formed cluster goes into LEADING state. This is because the QuorumCnxManager of A and B sends the entire history of notification messages to C. C receives the notification messages that were exchanged between A and B when they were forming the cluster. In FastLeaderElection.lookForLeader(), due to the following piece of code, C quits lookForLeader assuming that it is supposed to lead:
>
>     // If have received from all nodes, then terminate
>     if ((self.getVotingView().size() == recvset.size()) &&
>             (self.getQuorumVerifier().getWeight(proposedLeader) != 0)) {
>         self.setPeerState((proposedLeader == self.getId()) ?
>                 ServerState.LEADING : learningState());
>         leaveInstance();
>         return new Vote(proposedLeader, proposedZxid);
>     } else if (termPredicate(recvset,
>
> This can cause:
> 1. C to unnecessarily go into LEADING state, wait for tickTime * initLimit, and then restart FLE.
> 2. C to wait for 200 ms (finalizeWait) and then consider whatever notifications it has received to make a decision. C could potentially decide to follow an old leader, fail to connect to the leader, and then restart FLE. See the code below.
>
>     if (termPredicate(recvset,
>             new Vote(proposedLeader, proposedZxid, logicalclock))) {
>
>         // Verify if there is any change in the proposed leader
>         while ((n = recvqueue.poll(finalizeWait,
>                 TimeUnit.MILLISECONDS)) != null) {
>             if (totalOrderPredicate(n.leader, n.zxid,
>                     proposedLeader, proposedZxid)) {
>                 recvqueue.put(n);
>                 break;
>             }
>         }
>
> In general, this does not affect the correctness of FLE since C will eventually go back to FOLLOWING state (A and B won't vote for C). However, this delays C from joining the cluster, which can in turn affect the recovery time of an application.
>
> Proposal: A and B should send only the latest (most recent) notification instead of the entire history. Does this sound reasonable?
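[Editor's note: the proposal in the last paragraph, sending only the most recent notification instead of the whole history, amounts to collapsing each peer's outbound queue to a single slot. A hypothetical sketch of that idea in isolation; this is not the actual QuorumCnxManager code.]

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical "latest notification only" outbox: a new notification
    // replaces any queued-but-unsent one for the same peer, so a joining
    // peer sees current state rather than election history.
    public class LatestOnlyOutbox<N> {
        private final Map<Long, N> pending = new ConcurrentHashMap<Long, N>();

        /** Queue a notification for a peer, discarding any older unsent one. */
        public void offer(long peerId, N notification) {
            pending.put(peerId, notification);
        }

        /** Take the latest notification for a peer, or null if none pending. */
        public N poll(long peerId) {
            return pending.remove(peerId);
        }
    }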
[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online
[ https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010520#comment-13010520 ]

Hadoop QA commented on ZOOKEEPER-975:
-------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12474451/ZOOKEEPER-975.patch
against trunk revision 1082362.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    -1 patch. The patch command could not apply the patch.

Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/195//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1001) Read from open ledger
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010529#comment-13010529 ]

Utkarsh Srivastava commented on ZOOKEEPER-1001:
-----------------------------------------------

Ideally, all ZK access would be done by the bookie, so that the client is as thin as possible. This scheme has certain nice properties, like easy upgrades and changes to layout, as mentioned by Dhruba. But it's also a radical departure from the current BK design. I agree that accessing ZK through the BK client API is the only sane option right now.