[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822700#comment-15822700 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    As a side effect of a new commit (https://github.com/apache/zookeeper/commit/42c75b5f2457f8ea5b4106ce5dc1c34c330361c0) that triggers git mirror sync, this PR is closed :)

> Reinitialized servers should not participate in leader election
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-261
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-261
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: leaderElection, quorum
>            Reporter: Benjamin Reed
>
> A server that has lost its data should not participate in leader election
> until it has resynced with a leader. Our leader election algorithm and
> NEW_LEADER commit assume that the followers voting on a leader have not lost
> any of their data. We should have a flag in the data directory saying whether
> or not the data is preserved, so that the flag will be cleared if the data
> is ever cleared.
>
> Here is the problematic scenario: you have an ensemble of machines A, B,
> and C. C is down. The last transaction seen by C is z. A transaction, z+1, is
> committed on A and B. Now there is a power outage. B's data gets
> reinitialized. When power comes back up, B and C come up, but A does not. C
> will be elected leader and transaction z+1 is lost. (Note: this can happen
> even if all three machines are up and C just responds quickly. In that case C
> would tell A to truncate z+1 from its log.) In theory we haven't violated our
> 2f+1 guarantee, since A is failed and B still hasn't recovered from failure,
> but it would be nice if, when we lose quorum, the system stopped working
> rather than working incorrectly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
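
Editorial note: the A/B/C scenario in the issue description can be replayed with a toy model. The sketch below is not ZooKeeper's election code. `ElectionSketch` and `elect` are hypothetical names, the tie-breaking on epoch and server id that a real election performs is left out, and the rule that a server reporting a negative zxid abstains is an assumption drawn from the discussion on this issue.

```java
// Illustrative sketch only: not ZooKeeper's FastLeaderElection.
// ElectionSketch and elect() are hypothetical names.
class ElectionSketch {

    /**
     * Picks the live server with the highest last-seen zxid, provided the
     * number of eligible voters reaches a majority of the ensemble.
     * A negative zxid marks a server that suspects it lost data; such a
     * server abstains (an assumption of this sketch).
     * Returns null when no quorum of eligible voters exists.
     */
    static String elect(String[] liveServers, long[] lastZxids, int ensembleSize) {
        String best = null;
        long bestZxid = -1L;
        int eligible = 0;
        for (int i = 0; i < liveServers.length; i++) {
            if (lastZxids[i] < 0) {
                continue; // suspected data loss: neither votes nor stands
            }
            eligible++;
            if (lastZxids[i] > bestZxid) {
                bestZxid = lastZxids[i];
                best = liveServers[i];
            }
        }
        int quorum = ensembleSize / 2 + 1;
        return eligible >= quorum ? best : null;
    }

    public static void main(String[] args) {
        long z = 100L; // last transaction seen by C; A and B had committed z+1
        String[] live = {"B", "C"}; // A stays down after the power outage

        // B was reinitialized and reports zxid 0: C wins and z+1 is lost.
        System.out.println(elect(live, new long[] {0L, z}, 3));

        // B reports -1 and abstains: one eligible voter, so no leader.
        System.out.println(elect(live, new long[] {-1L, z}, 3));
    }
}
```

In the second call the ensemble stops rather than electing C, which is the behavior the issue asks for when quorum has effectively been lost.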
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822695#comment-15822695 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/zookeeper/pull/120
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822481#comment-15822481 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    Right, it is committed, confirmed. I was only checking the apache mirror on git (https://github.com/apache/zookeeper), instead of the apache git directly. I suspect there are some infra issues at Apache (the JIRA was done yesterday) that contribute to the bridging between various systems not working as expected, leading to this PR not being closed automatically.
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822476#comment-15822476 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user breed commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    it shows up in https://git-wip-us.apache.org/repos/asf?p=zookeeper.git
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822467#comment-15822467 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user breed commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    hmm, perhaps you are right. i don't think the script is working for me properly... checking.
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822464#comment-15822464 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    Also, https://issues.apache.org/jira/browse/ZOOKEEPER-261 does not have Fixed Version set, which implies that the JIRA was resolved manually (my guess) instead of automatically as part of the commit flow through https://github.com/apache/zookeeper/blob/master/zk-merge-pr.py - usually, if we use the merge script, it will take care of everything, including closing the PR and resolving the JIRA (provided the right credentials have been set up).
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822461#comment-15822461 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    I don't see this committed in master. No commit log from git, no notification emails. @breed Are you sure this is committed (and pushed to apache git)?
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822389#comment-15822389 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user breed commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    hmm this is committed. anyone understand why it doesn't autoclose?
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822369#comment-15822369 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user breed commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    committed. thanx everyone for reviewing and brian for your contribution.
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821844#comment-15821844 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user eribeiro commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    @enixon Thank you very much for the explanation! Makes sense, sorry for my misunderstanding.

    @breed Yep, agree. Let's push it forward. :+1:
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820065#comment-15820065 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r95721046

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -175,11 +193,20 @@ public long restore(DataTree dt, Map<Long, Integer> sessions,
                             "No snapshot found, but there are log entries. " +
                             "Something is broken!");
                 }
    -            /* TODO: (br33d) we should either put a ConcurrentHashMap on restore()
    -             *       or use Map on save() */
    -            save(dt, (ConcurrentHashMap<Long, Integer>)sessions);
    -            /* return a zxid of zero, since we the database is empty */
    -            return 0;
    +
    +            if (suspectEmptyDB) {
    +                /* return a zxid of -1, since we are possibly missing data */
    +                LOG.warn("Unexpected empty data tree, setting zxid to -1");
    --- End diff --

    Are we 100% sure the data tree is empty? Couldn't it be only partially complete? I mean the machine recorded up to transaction n, but lost transactions n+1, n+2, n+3, etc?
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820066#comment-15820066 ]

Edward Ribeiro commented on ZOOKEEPER-261:
------------------------------------------

I wrote the comment below on GH, but for whatever reason it was not posted here, so I am duplicating it just to see where/if I am mistaken. :)

"Hi @enixon,

I think your approach is very cool, for real. I only had time to give your patch a first pass now (hope to look closer soon, esp. at the tests), but I would like to ask a dumb question.

What if we change the approach and, instead of the initialize file being used for normal execution, we use a recover (or rejoin) file whose presence denotes an exceptional restart of a ZK node? That way, if and only if this file is present, we delete it and return -1L so that the node cannot take part in the elections until it catches up with the ensemble, etc. If this file is not present then we proceed as usual (i.e. return 0L). This way, we are dealing with the exceptional case by using the initialize/recover file.

For example: node C (from a 3 node ensemble) crashes due to disk full exceptions. Then the operator deletes the data/ directory and puts the recover file there.

In my humble (and naive) opinion, it would avoid some headaches for ops people who would forget to include the initialize file in a node or two, during rolling upgrades or other cases I can't think of right now. The presence of this file for normal execution changes the ordinary operation of a ZK node. So, we don't have to deal with changing the standard way of starting a ZK node. The recover file is for exceptional cases, where we want to make sure the restarting node cannot take part in an election.

PS: I didn't get the autocreateDB stuff either. But it's late at night here.

Wdyt?

/cc [~hanm] [~breed] [~fpj]
"

PS2: The scenario described in the JIRA is a good point in favor of an {{initialize}} file, because when B & C came back **automatically** the {{initialize}} file would be missing from both nodes, and the ensemble would grind to a halt because no one would be leader, right? Otherwise, if there were a script to **automatically** create those files on each node once the machine was turned on, then B & C would have the file created and we would be back to square one, right? Does what I am writing make any sense? Please, lecture me. :)
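
Editorial note: the two marker-file conventions discussed in this comment can be compared side by side. This is a minimal sketch of the decision each one makes when a server starts and finds no data on disk. `MarkerPolicySketch` and both method names are hypothetical, and the return values follow the -1L / 0L convention used in the thread rather than any confirmed ZooKeeper code.

```java
// Illustrative sketch only; class and method names are hypothetical.
class MarkerPolicySketch {

    /**
     * "initialize" marker (the approach in the pull request): the marker
     * vouches that an empty database is expected. Without it, and with
     * auto-creation disabled, the empty database is treated as suspect.
     */
    static long initializePolicy(boolean initializeFilePresent, boolean autoCreateDb) {
        boolean suspectEmptyDb = !initializeFilePresent && !autoCreateDb;
        return suspectEmptyDb ? -1L : 0L;
    }

    /**
     * "recover" marker (the alternative proposed in this comment): the
     * marker flags an exceptional restart; its absence means business as usual.
     */
    static long recoverPolicy(boolean recoverFilePresent) {
        return recoverFilePresent ? -1L : 0L;
    }

    public static void main(String[] args) {
        // JIRA scenario: B's data is wiped during the outage and nobody
        // places any marker before B restarts (auto-creation assumed off).
        System.out.println(initializePolicy(false, false)); // -1: B is held back
        System.out.println(recoverPolicy(false));           // 0: B looks normal
    }
}
```

The difference shows up exactly in the unattended case raised in PS2: with no operator involved, the "initialize" convention fails closed, while the "recover" convention depends on someone remembering to place the file.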
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819918#comment-15819918 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user enixon commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r95714487

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -167,6 +175,16 @@ public long restore(DataTree dt, Map<Long, Integer> sessions,
                 PlayBackListener listener) throws IOException {
             long deserializeResult = snapLog.deserialize(dt, sessions);
             FileTxnLog txnLog = new FileTxnLog(dataDir);
    +        boolean suspectEmptyDB;
    +        File initFile = new File(dataDir.getParent(), "initialize");
    +        if (initFile.exists()) {
    +            if (!initFile.delete()) {
    +                throw new IOException("Unable to delete initialization file " + initFile.toString());
    +            }
    +            suspectEmptyDB = false;
    +        } else {
    +            suspectEmptyDB = !autoCreateDB;
    --- End diff --

    I'm tempted to put the log line on the other side of the conditional, since this side is the expected case. We should only delete an initialize file once in the lifecycle of a given server, while the check against `autoCreateDB` will happen every other time the server is restarted.
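
Editorial note: the "delete once, then fall back to `autoCreateDB`" lifecycle described in this comment can be exercised outside ZooKeeper. The sketch below copies only the marker check from the quoted diff into a standalone class. `InitializeMarkerSketch`, `suspectEmptyDb`, `demo`, and the `dataRoot` parameter (standing in for `dataDir.getParent()`) are hypothetical, and the temporary directory exists purely for the demonstration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Illustrative sketch only; not the FileTxnSnapLog implementation.
class InitializeMarkerSketch {

    /**
     * Consumes the "initialize" marker if it exists and reports whether an
     * empty database should be treated as suspect on this start.
     */
    static boolean suspectEmptyDb(File dataRoot, boolean autoCreateDb) throws IOException {
        File initFile = new File(dataRoot, "initialize");
        if (initFile.exists()) {
            if (!initFile.delete()) {
                throw new IOException("Unable to delete initialization file " + initFile);
            }
            return false; // first start after initialization: empty is expected
        }
        return !autoCreateDb; // every later start: trust only the config flag
    }

    /** Simulates three consecutive starts against a scratch directory. */
    static boolean[] demo() {
        try {
            File dataRoot = Files.createTempDirectory("zk-sketch").toFile();
            File marker = new File(dataRoot, "initialize");
            if (!marker.createNewFile()) {
                throw new IOException("Could not create marker " + marker);
            }
            boolean first = suspectEmptyDb(dataRoot, false);  // marker present
            boolean second = suspectEmptyDb(dataRoot, false); // marker already consumed
            boolean third = suspectEmptyDb(dataRoot, true);   // auto-create enabled
            dataRoot.delete(); // scratch directory is empty again
            return new boolean[] {first, second, third};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // false true false
    }
}
```

Only the first start sees the marker. Afterwards the result is driven entirely by the auto-create setting, which matches the remark that the `else` branch is the common path.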
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819907#comment-15819907 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user enixon commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r95714170

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -167,6 +175,16 @@ public long restore(DataTree dt, Map<Long, Integer> sessions,
                 PlayBackListener listener) throws IOException {
             long deserializeResult = snapLog.deserialize(dt, sessions);
             FileTxnLog txnLog = new FileTxnLog(dataDir);
    +        boolean suspectEmptyDB;
    +        File initFile = new File(dataDir.getParent(), "initialize");
    +        if (initFile.exists()) {
    --- End diff --

    Nice optimization, I like it!
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819905#comment-15819905 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user enixon commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r95714094

    --- Diff: bin/zkServer-initialize.sh ---
    @@ -113,6 +113,8 @@ initialize() {
         else
             echo "No myid provided, be sure to specify it in $ZOO_DATADIR/myid if using non-standalone"
         fi
    +
    +    date > "$ZOO_DATADIR/initialize"
    --- End diff --

    True enough, `touch` is sufficient. Using `date` is an optimization I've included in other scripts in the past as a way of sneaking a bit more information into an otherwise meaningless file, but in this context it's probably just confusing.
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819849#comment-15819849 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r95711916

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -167,6 +175,16 @@ public long restore(DataTree dt, Map<Long, Integer> sessions,
                 PlayBackListener listener) throws IOException {
             long deserializeResult = snapLog.deserialize(dt, sessions);
             FileTxnLog txnLog = new FileTxnLog(dataDir);
    +        boolean suspectEmptyDB;
    +        File initFile = new File(dataDir.getParent(), "initialize");
    +        if (initFile.exists()) {
    --- End diff --

    Disclaimer: I am not used to the `Files` class, so you may have to make sure it doesn't alter the current behaviour if you decide to use it.
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819823#comment-15819823 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r95710948

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -167,6 +175,16 @@ public long restore(DataTree dt, Map<Long, Integer> sessions,
                 PlayBackListener listener) throws IOException {
             long deserializeResult = snapLog.deserialize(dt, sessions);
             FileTxnLog txnLog = new FileTxnLog(dataDir);
    +        boolean suspectEmptyDB;
    +        File initFile = new File(dataDir.getParent(), "initialize");
    +        if (initFile.exists()) {
    +            if (!initFile.delete()) {
    +                throw new IOException("Unable to delete initialization file " + initFile.toString());
    +            }
    +            suspectEmptyDB = false;
    +        } else {
    +            suspectEmptyDB = !autoCreateDB;
    --- End diff --

    IMO, it would be nice to put a `debug` (warn?) log message here. Something along the lines of "Initialize file not found! Using autoCreateDB attribute."
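A minimal sketch of the branch under review with the suggested log message added. This is not the patch's code: the class name `InitMarkerCheck`, the method `suspectEmptyDb`, the use of `java.util.logging` in place of ZooKeeper's own logger, and the log wording are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.logging.Logger;

final class InitMarkerCheck {
    private static final Logger LOG = Logger.getLogger(InitMarkerCheck.class.getName());

    // Hypothetical helper mirroring the diff: returns true when the database
    // may be empty because no "initialize" marker file was found.
    static boolean suspectEmptyDb(File dataDirParent, boolean autoCreateDb) throws IOException {
        File initFile = new File(dataDirParent, "initialize");
        if (initFile.exists()) {
            // Marker present: consume it so that it vouches for one startup only.
            if (!initFile.delete()) {
                throw new IOException("Unable to delete initialization file " + initFile);
            }
            return false;
        }
        // Suggested addition: make the fallback visible in the logs.
        LOG.warning("Initialize file not found, falling back to autoCreateDB=" + autoCreateDb);
        return !autoCreateDb;
    }
}
```

Whether the message belongs at `debug` or `warn` level is left open in the review; the sketch picks a warning only so that the fallback is visible with default logging settings.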
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819801#comment-15819801 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r95707659

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) throws IOException {
             txnLog = new FileTxnLog(this.dataDir);
             snapLog = new FileSnap(this.snapDir);
    +
    +        autoCreateDB = Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE,
    --- End diff --

    +1 with @hanm
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819800#comment-15819800 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r95707294

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -167,6 +175,16 @@ public long restore(DataTree dt, Map<Long, Integer> sessions,
                 PlayBackListener listener) throws IOException {
             long deserializeResult = snapLog.deserialize(dt, sessions);
             FileTxnLog txnLog = new FileTxnLog(dataDir);
    +        boolean suspectEmptyDB;
    +        File initFile = new File(dataDir.getParent(), "initialize");
    +        if (initFile.exists()) {
    --- End diff --

    Since Java 7 is the default, could we use the code below? The benefit is that it automatically throws an `IOException` if an I/O error happens and returns `false` if the file doesn't exist.

    ```
    if (Files.deleteIfExists(initFile.toPath())) {
        suspectEmptyDB = false;
    } else {
        (...)
    ```
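The `Files.deleteIfExists` suggestion can be sketched as a complete method. This is an illustration under stated assumptions, not the code that was merged: the class name `InitMarkerNio` and the method `suspectEmptyDb` are hypothetical, and the marker is addressed through a `Path` rather than the `File` used in the diff.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class InitMarkerNio {
    // Hypothetical helper: returns true when the database may be empty.
    static boolean suspectEmptyDb(Path dataDirParent, boolean autoCreateDb) throws IOException {
        Path initFile = dataDirParent.resolve("initialize");
        // deleteIfExists returns true if the file was deleted, false if it did
        // not exist, and throws IOException when the deletion itself fails.
        if (Files.deleteIfExists(initFile)) {
            return false;
        }
        return !autoCreateDb;
    }
}
```

Compared with the separate `exists()` and `delete()` calls, this folds the check and the deletion into one call, so a failed deletion surfaces as an exception without an explicit `throw`.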
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819803#comment-15819803 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r95709489

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -167,6 +175,16 @@ public long restore(DataTree dt, Map<Long, Integer> sessions,
                 PlayBackListener listener) throws IOException {
             long deserializeResult = snapLog.deserialize(dt, sessions);
             FileTxnLog txnLog = new FileTxnLog(dataDir);
    +        boolean suspectEmptyDB;
    --- End diff --

    Could we rename this to `recoveringDB` or `recoveringNode`? My rationale is: `suspectEmptyDB` looks vague to me, **plus**, __if I understood it right__, a node could have been shut down and restarted after some time. So its DB will not necessarily be empty, but it is in a recovering process, so we want to avoid it becoming the leader and messing up transactions performed while it was offline, right?
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819802#comment-15819802 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r95703179

    --- Diff: bin/zkServer-initialize.sh ---
    @@ -113,6 +113,8 @@ initialize() {
         else
             echo "No myid provided, be sure to specify it in $ZOO_DATADIR/myid if using non-standalone"
         fi
    +
    +    date > "$ZOO_DATADIR/initialize"
    --- End diff --

    Nit: If the sole purpose of this file is to act as a marker, regardless of its content, then a `touch $ZOO_DATADIR/initialize` would be enough, wouldn't it? Of course, `date` is fine as well, no problem.
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819633#comment-15819633 ]

Hadoop QA commented on ZOOKEEPER-261:
-------------------------------------

+1 overall. GitHub Pull Request Build

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 20 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/205//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/205//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/205//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819558#comment-15819558 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user enixon commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    Rebased onto the latest master to avoid any potential conflicts with @breed's changes for 2325.
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734431#comment-15734431 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    lgtm +1
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733423#comment-15733423 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user enixon commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    - add documentation on 'zookeeper.db.autocreate' to zookeeperAdmin.xml
    - extend bin/zkServer-initialize.sh to create the initialize file
    - treat failure to delete initialization file as fatal, throw IOException instead of logging a warning
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727937#comment-15727937 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r91235960

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) throws IOException {
             txnLog = new FileTxnLog(this.dataDir);
             snapLog = new FileSnap(this.snapDir);
    +
    +        autoCreateDB = Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE,
    --- End diff --

    >> Is that in accord with Zookeeper style?

    I see - I thought the new property was not end-user facing, since no associated documentation was added here. Since the property "zookeeper.db.autocreate" is exposed to users, some documentation could be added to ZooKeeperAdmin.html (similar to how the existing "zookeeper.datadir.autocreate" is documented there) to describe the motivation for and usage of the property.
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727911#comment-15727911 ]

ASF GitHub Bot commented on ZOOKEEPER-261:
------------------------------------------

Github user enixon commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r91234951

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) throws IOException {
             txnLog = new FileTxnLog(this.dataDir);
             snapLog = new FileSnap(this.snapDir);
    +
    +        autoCreateDB = Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE,
    --- End diff --

    I included `ZOOKEEPER_DB_AUTOCREATE` to allow users to opt out of the feature until they're ready to update their ensemble management tooling to support creating the new file. Is that in accord with Zookeeper style?

    On the question of style, `ZOOKEEPER_DB_AUTOCREATE_DEFAULT` exists purely because `ZOOKEEPER_DATADIR_AUTOCREATE_DEFAULT` exists above it in the file. If including the defaults as static constants isn't Zookeeper style then I'm happy to replace it with a string literal in the constructor.
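The opt-out described above boils down to reading a boolean system property with a default. A minimal sketch, assuming the property name `zookeeper.db.autocreate` quoted elsewhere in this thread; the default value `"true"` and the class name `DbAutoCreateFlag` are assumptions for illustration, not taken from the patch.

```java
final class DbAutoCreateFlag {
    // Property name as quoted in the thread.
    static final String ZOOKEEPER_DB_AUTOCREATE = "zookeeper.db.autocreate";
    // Assumed default: keep the pre-existing behaviour unless the user opts out.
    static final String ZOOKEEPER_DB_AUTOCREATE_DEFAULT = "true";

    // Returns the flag; any value other than "true" (case-insensitive) reads as false.
    static boolean autoCreateDb() {
        return Boolean.parseBoolean(
                System.getProperty(ZOOKEEPER_DB_AUTOCREATE, ZOOKEEPER_DB_AUTOCREATE_DEFAULT));
    }
}
```

Under these assumptions, a server started with `-Dzookeeper.db.autocreate=false` would treat a missing marker file as a suspect database, while an unset property leaves the flag enabled.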
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727293#comment-15727293 ]

ASF GitHub Bot commented on ZOOKEEPER-261:

Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r91208860

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) throws IOException {
             txnLog = new FileTxnLog(this.dataDir);
             snapLog = new FileSnap(this.snapDir);
    +
    +        autoCreateDB = Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE,
    --- End diff --

    It seems that the variable `autoCreateDB` (along with the properties `ZOOKEEPER_DB_AUTOCREATE_DEFAULT` and `ZOOKEEPER_DB_AUTOCREATE`) is used solely for testing purposes (to change control flow and get code coverage). If I understand correctly, maybe add some comments noting that these are test-only variables?
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727294#comment-15727294 ]

ASF GitHub Bot commented on ZOOKEEPER-261:

Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/120#discussion_r91208138

    --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
    @@ -165,8 +173,41 @@ public File getSnapDir() {
          */
         public long restore(DataTree dt, Map<Long, Integer> sessions, PlayBackListener listener) throws IOException {
    -        snapLog.deserialize(dt, sessions);
    +        long deserializeResult = snapLog.deserialize(dt, sessions);
             FileTxnLog txnLog = new FileTxnLog(dataDir);
    +        boolean suspectEmptyDB;
    +        File initFile = new File(dataDir.getParent(), "initialize");
    +        if (initFile.exists()) {
    +            if (!initFile.delete()) {
    +                LOG.warn("Unable to delete initialization file " + initFile.toString());
    --- End diff --

    It sounds like a pretty serious issue if the initialize file can't be cleaned up on startup of a new ensemble, for whatever reason, since the presence of this file is the key promise made here. Failing to clean it up could lead to the same inconsistent quorum state that this PR is trying to fix. So maybe throw an IOException here to abort the server start process and let the admin intervene instead?
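The suggestion in this review comment, failing fast instead of logging a warning, can be sketched as follows. This is only an illustration under stated assumptions: the class and method names (`InitFileSketch`, `consumeInitFile`) are hypothetical and do not appear in the pull request; only the file name "initialize" and the warning text come from the quoted diff.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of the pull request.
final class InitFileSketch {
    /**
     * Consumes the one-shot "initialize" marker in the given directory.
     *
     * @return true if the marker was present and has been deleted,
     *         false if no marker was present
     * @throws IOException if the marker exists but cannot be deleted, so that
     *         startup aborts rather than leaving a stale marker behind
     */
    static boolean consumeInitFile(File markerDir) throws IOException {
        File initFile = new File(markerDir, "initialize");
        if (!initFile.exists()) {
            return false;
        }
        if (!initFile.delete()) {
            throw new IOException("Unable to delete initialization file " + initFile);
        }
        return true;
    }
}
```

The reasoning behind throwing here is that a marker which survives a restart would let a later, data-less restart pass itself off as a fresh ensemble member, which is the situation the issue is trying to rule out.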
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727075#comment-15727075 ]

ASF GitHub Bot commented on ZOOKEEPER-261:

Github user enixon commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    Thanks, @hanm, let's see if editing the correct string into the title suffices or if I need to open up a new PR.
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727057#comment-15727057 ]

ASF GitHub Bot commented on ZOOKEEPER-261:

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/120

    Nit: the PR title should contain the string "ZOOKEEPER-261" for the merge script to work - otherwise comments made here will not be bridged to JIRA.

    https://issues.apache.org/jira/browse/ZOOKEEPER-261
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727016#comment-15727016 ]

Brian Nixon commented on ZOOKEEPER-261:

Ben and I discussed this offline.

When starting up without any local data, the safest thing to do is to treat the absence with extreme suspicion and not participate in voting until you can pull down the data tree from the rest of the ensemble. Such a server is not qualified to confirm which servers are up to date and could inadvertently elect a server that is missing some data.

The one exception is the creation of a fresh ensemble, when there is no data to repopulate the local data tree. It's not clear that an ensemble can detect on its own that it is in this state, since in the worst case every server will be subject to the same data-losing fault (in which case you should recover from backups instead of coming online with an empty database). This extra information needs to come from the admin.

With the changes from ZOOKEEPER-2325, a server with no local data tree starts with a zxid of 0. I'll submit a pull request that changes that initial zxid to -1 unless a special 'initialize' file is present in the data directory, and that removes voting privileges from members reporting -1. The idea is that creating the 'initialize' file alongside 'myid' will be a standard part of ensemble creation - the extra information from the admin. The 'initialize' file will be automatically cleaned up by the server, and subsequent restarts can treat a missing data directory as a sign that they are legitimately missing context (e.g. when being added to an existing ensemble).
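The rule proposed in this comment can be illustrated with a small, self-contained sketch. It is not the pull request's implementation: the class, the method names, and the simple majority check are assumptions chosen to show the idea that a server without data reports -1 unless the admin supplied the 'initialize' file, and that servers reporting -1 are not counted as voters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed rule; none of these names come from ZooKeeper.
final class UninitializedVoterSketch {
    static final long UNINITIALIZED_ZXID = -1L; // no data and no 'initialize' file
    static final long FRESH_ENSEMBLE_ZXID = 0L; // no data, but the admin vouched for it

    /** Picks the zxid a server reports at startup under the proposed rule. */
    static long initialZxid(boolean hasLocalData, long lastLoggedZxid, boolean initFilePresent) {
        if (hasLocalData) {
            return lastLoggedZxid;
        }
        return initFilePresent ? FRESH_ENSEMBLE_ZXID : UNINITIALIZED_ZXID;
    }

    /** Keeps only the servers that are allowed to vote (those not reporting -1). */
    static List<String> eligibleVoters(Map<String, Long> reportedZxids) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Long> entry : reportedZxids.entrySet()) {
            if (entry.getValue() != UNINITIALIZED_ZXID) {
                eligible.add(entry.getKey());
            }
        }
        return eligible;
    }

    /** Simple majority check over the full ensemble size. */
    static boolean hasQuorum(int eligibleCount, int ensembleSize) {
        return eligibleCount > ensembleSize / 2;
    }
}
```

Applied to the scenario in the issue description (A down, B reinitialized, C last saw z), B would report -1 and be excluded, leaving C as the only eligible voter in an ensemble of three. With no majority, no leader is elected, so the ensemble stops rather than electing C and losing z+1.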