[GitHub] zookeeper pull request #648: ZOOKEEPER-3156: Add in option to canonicalize h...
Github user revans2 closed the pull request at: https://github.com/apache/zookeeper/pull/648 ---
[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/648 Thanks ---
[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/652 @anmolnar sorry it took me so long. I was out on vacation. ---
[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/652 Ping. Any hope of getting this merged in? ---
[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/652 I just rebased to deal with the directories being moved around. ---
[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/652 Thanks for all of the reviews. I just rebased and squashed commits. ---
[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/648 Thanks for all of the reviews. I just rebased and squashed commits. ---
[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/652 Ping @eolivelli @lvfangmin @anmolnar. I think I have addressed all of the concerns on both branches. The test failure here looks unrelated to the changes that I made. ---
[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/652 @anmolnar and @lvfangmin I think I have addressed all of your review comments. I named the new class SaslServerPrincipal but if you have a different idea for a name I am happy to change it. ---
[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/648 There were some comments in the other pull request about the test code. I will address them in both places. ---
[GitHub] zookeeper pull request #652: ZOOKEEPER-3156: Add in option to canonicalize h...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/652#discussion_r221956355 --- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java --- @@ -793,7 +794,87 @@ public RWServerFoundException(String msg) { super(msg); } } - + +static class MockableInetSocketAddress { +private final InetSocketAddress addr; + +MockableInetSocketAddress(InetSocketAddress addr) { +this.addr = addr; +} + +public String getHostName() { +return addr.getHostName(); +} + +public MockableInetAddress getAddress() { +InetAddress ia = addr.getAddress(); +return ia == null ? null : new MockableInetAddress(ia); +} + +@Override +public String toString() { +return addr.toString(); +} +} + +static class MockableInetAddress { +private final InetAddress ia; + +MockableInetAddress(InetAddress ia) { +this.ia = ia; +} + +public String getCanonicalHostName() { +return ia.getCanonicalHostName(); +} + +public String getHostAddress() { +return ia.getHostAddress(); +} + +@Override +public String toString() { +return ia.toString(); +} +} +/** + * Get the name of the server principal for a SASL client. This is visible for testing purposes. + * @param addr the address of the host. + * @param clientConfig the configuration for the client. + * @return the name of the principal. + */ +static String getServerPrincipal(MockableInetSocketAddress addr, ZKClientConfig clientConfig) { +String principalUserName = clientConfig.getProperty(ZKClientConfig.ZK_SASL_CLIENT_USERNAME, +ZKClientConfig.ZK_SASL_CLIENT_USERNAME_DEFAULT); +String hostName = addr.getHostName(); + +boolean canonicalize = true; +try { +canonicalize = Boolean.parseBoolean(clientConfig.getProperty(ZKClientConfig.ZK_SASL_CLIENT_CANONICALIZE_HOSTNAME, + ZKClientConfig.ZK_SASL_CLIENT_CANONICALIZE_HOSTNAME_DEFAULT)); +} catch (IllegalArgumentException ea) { --- End diff -- The IllegalArgumentException is coming from the `Boolean.parseBoolean`. 
If it cannot parse the value as either true or false it will throw the exception and we end up with it being true, the default. If you want a log message I am happy to add one, because "no" will be interpreted as true, which might be confusing. ---
[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/648 I just added a test and made the code match more closely how master/3.5 handle the canonical host name. ---
[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/652 Added in the test. ---
[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/648 I tried to add in PowerMock too, but it conflicts with Mockito (PowerMock uses a much older version of mockito-core). I'll try to use a different PowerMock API instead of the Mockito one. ---
[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/648 Looks like it is mockito-all, not powermock-api-mockito. I can try to change it and see. ---
[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/648 So I am running into some issues with the tests, as all of the methods of InetSocketAddress are marked final and as such I cannot mock them. This means I would either need a set of host names that are real, accessible from any build host you would want to run the test on, and never going to be taken down, or I would need to create another class that wraps InetSocketAddress for the canonicalization, which would allow me to do the mocking at that level. Not sure which you would prefer. ---
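The wrapper approach described above can be sketched roughly as follows. This is a hypothetical standalone illustration, not the actual ZooKeeper code: a thin delegating class exposes only the calls the code under test needs, so a test can subclass (or mock) the wrapper instead of the final JDK methods, and never touches DNS.

```java
import java.net.InetSocketAddress;

// Hypothetical sketch: since InetSocketAddress's lookup methods are final
// and cannot be stubbed directly, wrap the few calls we need behind a
// plain class whose methods ARE overridable in tests.
class WrappedAddress {
    private final InetSocketAddress addr;

    WrappedAddress(InetSocketAddress addr) {
        this.addr = addr;
    }

    // Delegates in production; overridable in tests to return a fixed name.
    public String getHostName() {
        return addr.getHostName();
    }

    public int getPort() {
        return addr.getPort();
    }

    public static void main(String[] args) {
        // A test can substitute a fake without any real host name resolution.
        WrappedAddress fake = new WrappedAddress(null) {
            @Override
            public String getHostName() {
                return "zk1.example.com"; // made-up host, never resolved
            }
        };
        System.out.println(fake.getHostName());
    }
}
```

This is essentially the shape the PR's `MockableInetSocketAddress` class takes in the quoted diff: delegation for production use, plain overridable methods for tests.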
[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/648 Sorry yes. I'll get on the test case... ---
[GitHub] zookeeper pull request #652: ZOOKEEPER-3156: Add in option to canonicalize h...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/652#discussion_r221749153 --- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java --- @@ -1102,7 +1103,32 @@ private void startConnect(InetSocketAddress addr) throws IOException { private String getServerPrincipal(InetSocketAddress addr) { String principalUserName = clientConfig.getProperty(ZKClientConfig.ZK_SASL_CLIENT_USERNAME, ZKClientConfig.ZK_SASL_CLIENT_USERNAME_DEFAULT); -String serverPrincipal = principalUserName + "/" + addr.getHostString(); +String hostName = addr.getHostName(); + +boolean canonicalize = true; +try { +canonicalize = Boolean.parseBoolean(System.getProperty(ZKClientConfig.ZK_SASL_CLIENT_CANONICALIZE_HOSTNAME, --- End diff -- Great point, missed that on the port. ---
[GitHub] zookeeper pull request #652: ZOOKEEPER-3156: Add in option to canonicalize h...
GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/652 ZOOKEEPER-3156: Add in option to canonicalize host name This is the master and 3.5 version of #648. It should apply cleanly to both lines, but if you want a separate pull request for each I am happy to do it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-3156 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/652.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #652 commit 24aca04adf57d10df123e2d61f55e2eb1344c014 Author: Robert Evans Date: 2018-09-26T20:40:26Z ZOOKEEPER-3156: Add in option to canonicalize host name ---
[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/648 I just addressed the review comments; if things look good I am happy to squash commits. I'll also put up pull requests to the other lines, now that it looks like we have the majority of the code worked out. ---
[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/648 @anmolnar I think I have addressed all of your comments. Please have another look. ---
[GitHub] zookeeper pull request #648: ZOOKEEPER-3156: Add in option to canonicalize h...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/648#discussion_r221377101 --- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java --- @@ -990,6 +992,27 @@ private void sendPing() { private boolean saslLoginFailed = false; private void startConnect(InetSocketAddress addr) throws IOException { +boolean canonicalize = true; +try { +canonicalize = Boolean.parseBoolean(System.getProperty(ZK_SASL_CLIENT_CANONICALIZE_HOSTNAME, "true")); +} catch (IllegalArgumentException ea) { +//ignored ... +} + +if (canonicalize) { +try { +InetAddress ia = addr.getAddress(); +LOG.warn("ia {}", ia); +if (ia == null) { +ia = InetAddress.getByName(addr.getHostName()); +} +String host = (ia != null) ? ia.getCanonicalHostName() : addr.getHostName(); +addr = new InetSocketAddress(InetAddress.getByAddress(host, ia.getAddress()), addr.getPort()); --- End diff -- But you said that we already have the IP address, which is what it is going to use, so never mind, it should be fine. ---
[GitHub] zookeeper pull request #648: ZOOKEEPER-3156: Add in option to canonicalize h...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/648#discussion_r221376870 --- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java --- @@ -990,6 +992,27 @@ private void sendPing() { private boolean saslLoginFailed = false; private void startConnect(InetSocketAddress addr) throws IOException { +boolean canonicalize = true; +try { +canonicalize = Boolean.parseBoolean(System.getProperty(ZK_SASL_CLIENT_CANONICALIZE_HOSTNAME, "true")); +} catch (IllegalArgumentException ea) { +//ignored ... +} + +if (canonicalize) { +try { +InetAddress ia = addr.getAddress(); +LOG.warn("ia {}", ia); +if (ia == null) { +ia = InetAddress.getByName(addr.getHostName()); +} +String host = (ia != null) ? ia.getCanonicalHostName() : addr.getHostName(); +addr = new InetSocketAddress(InetAddress.getByAddress(host, ia.getAddress()), addr.getPort()); --- End diff -- I put it here because there is a possible race. The entire reason we have the setup we do is so that we can change the nodes in the cluster without changing the config on the client side. If the code that establishes the connection uses a different address from the code that creates the principal there is the possibility, because of the magic of DNS caching, that the connection would be made to a different box from the one the SASL client is expecting. If that happens, and we have the default Client section in the jaas conf, it will not result in an error. Instead the ZK client logs something about SASL failed and still connects but is not authorized to do anything. If it failed and retried a different node I would be happy to move it, but it does not and that failure would not be transparent to the end user. ---
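The race described in the comment above is why the canonicalization happens on the same resolved address used for the connection. A simplified sketch of the idea (assumptions: names and structure here are illustrative, not the exact ZooKeeper code, which reads the option from `ZKClientConfig` and handles more edge cases):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of building a SASL server principal from an address.
// Key point from the discussion: reverse-resolve the SAME InetAddress the
// connection will use, so a DNS cache refresh between two lookups cannot
// point the principal at a different box than the socket.
class PrincipalSketch {
    static String serverPrincipal(String user, InetAddress resolved,
                                  String configuredHost, boolean canonicalize) {
        String host = configuredHost;
        if (canonicalize && resolved != null) {
            // One reverse lookup, reused for both principal and connection.
            host = resolved.getCanonicalHostName();
        }
        return user + "/" + host;
    }

    public static void main(String[] args) throws UnknownHostException {
        // With canonicalization disabled, the configured host is used verbatim.
        String p = serverPrincipal("zookeeper",
                InetAddress.getByName("127.0.0.1"), "zk1", false);
        System.out.println(p);
    }
}
```

If the connection and the principal each resolved the host independently, a cached-then-refreshed DNS entry could make them disagree, producing the silent "SASL failed but still connected" behavior described above.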
[GitHub] zookeeper pull request #648: ZOOKEEPER-3156: Add in option to canonicalize h...
GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/648 ZOOKEEPER-3156: Add in option to canonicalize host name You can merge this pull request into a Git repository by running: $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-3156-3.4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/648.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #648 commit 7ecd6c9151c0e7be749cfead5f141d673a8663f6 Author: Robert Evans Date: 2018-09-26T20:40:26Z ZOOKEEPER-3156: Add in option to canonicalize host name ---
[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/453 Thanks @afine, I closed them. ---
[GitHub] zookeeper pull request #455: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 closed the pull request at: https://github.com/apache/zookeeper/pull/455 ---
[GitHub] zookeeper pull request #454: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 closed the pull request at: https://github.com/apache/zookeeper/pull/454 ---
[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/453 @afine all of the changes in this branch are now in the pull requests to the 3.4 and 3.5 branches. ---
[GitHub] zookeeper issue #455: ZOOKEEPER-2845: Apply commit log when restarting serve...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/455 I just rebased this and pulled in all of the changes made to the main test. ---
[GitHub] zookeeper issue #454: ZOOKEEPER-2845: Apply commit log when restarting serve...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/454 I just rebased this and pulled in all of the changes made to the main test. ---
[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/453 @afine I have addressed your most recent comments. If you want me to squash commits please let me know. I have a pull request for the 3.5 branch (#454) and for the 3.4 branch (#455). I will be spending some time porting the test to them, and will let you know when it is ready. ---
[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/453#discussion_r169662234 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -888,4 +923,103 @@ public void testWithOnlyMinSessionTimeout() throws Exception { maxSessionTimeOut, quorumPeer.getMaxSessionTimeout()); } +@Test +public void testFailedTxnAsPartOfQuorumLoss() throws Exception { +// 1. start up server and wait for leader election to finish +ClientBase.setupTestEnv(); +final int SERVER_COUNT = 3; +servers = LaunchServers(SERVER_COUNT); + +waitForAll(servers, States.CONNECTED); + +// we need to shutdown and start back up to make sure that the create session isn't the first transaction since +// that is rather innocuous. +servers.shutDownAllServers(); +waitForAll(servers, States.CONNECTING); +servers.restartAllServersAndClients(this); +waitForAll(servers, States.CONNECTED); + +// 2. kill all followers +int leader = servers.findLeader(); +Map outstanding = servers.mt[leader].main.quorumPeer.leader.outstandingProposals; +// increase the tick time to delay the leader going to looking +servers.mt[leader].main.quorumPeer.tickTime = 1; +LOG.warn("LEADER {}", leader); + +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +servers.mt[i].shutdown(); +} +} + +// 3. start up the followers to form a new quorum +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +servers.mt[i].start(); +} +} + +// 4. wait one of the follower to be the new leader +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +// Recreate a client session since the previous session was not persisted. +servers.restartClient(i, this); +waitForOne(servers.zk[i], States.CONNECTED); +} +} + +// 5. 
send a create request to old leader and make sure it's synced to disk, +//which means it acked from itself +try { +servers.zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, +CreateMode.PERSISTENT); +Assert.fail("create /zk" + leader + " should have failed"); +} catch (KeeperException e) { +} + +// just make sure that we actually did get it in process at the +// leader +Assert.assertEquals(1, outstanding.size()); +Proposal p = outstanding.values().iterator().next(); +Assert.assertEquals(OpCode.create, p.request.getHdr().getType()); + +// make sure it has a chance to write it to disk +int sleepTime = 0; +Long longLeader = new Long(leader); +while (!p.qvAcksetPairs.get(0).getAckset().contains(longLeader)) { +if (sleepTime > 2000) { +Assert.fail("Transaction not synced to disk within 1 second " + p.qvAcksetPairs.get(0).getAckset() ++ " expected " + leader); +} +Thread.sleep(100); +sleepTime += 100; +} + +// 6. wait for the leader to quit due to not enough followers and come back up as a part of the new quorum +sleepTime = 0; +Follower f = servers.mt[leader].main.quorumPeer.follower; +while (f == null || !f.isRunning()) { +if (sleepTime > 10_000) { +Assert.fail("Took too long for old leader to time out " + servers.mt[leader].main.quorumPeer.getPeerState()); +} +Thread.sleep(100); +sleepTime += 100; +f = servers.mt[leader].main.quorumPeer.follower; +} +servers.mt[leader].shutdown(); --- End diff -- It is a lot of very specific steps that make the data inconsistency show up. This is needed to force the transaction log to be replayed which has an edit in it that wasn't considered as a part of leader election. ---
[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/453 @afine and @anmolnar I think I have addressed all of your review comments, except for the one about the change to `waitForOne`, and I am happy to adjust however you want there. ---
[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/453#discussion_r168807943 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws Exception { maxSessionTimeOut, quorumPeer.getMaxSessionTimeout()); } +@Test +public void testTxnAheadSnapInRetainDB() throws Exception { +// 1. start up server and wait for leader election to finish +ClientBase.setupTestEnv(); +final int SERVER_COUNT = 3; +final int clientPorts[] = new int[SERVER_COUNT]; +StringBuilder sb = new StringBuilder(); +for (int i = 0; i < SERVER_COUNT; i++) { +clientPorts[i] = PortAssignment.unique(); +sb.append("server." + i + "=127.0.0.1:" + PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] + "\n"); +} +String quorumCfgSection = sb.toString(); + +MainThread mt[] = new MainThread[SERVER_COUNT]; +ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT]; +for (int i = 0; i < SERVER_COUNT; i++) { +mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection); +mt[i].start(); +zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this); +} + +waitForAll(zk, States.CONNECTED); + +// we need to shutdown and start back up to make sure that the create session isn't the first transaction since +// that is rather innocuous. +for (int i = 0; i < SERVER_COUNT; i++) { +mt[i].shutdown(); +} + +waitForAll(zk, States.CONNECTING); + +for (int i = 0; i < SERVER_COUNT; i++) { +mt[i].start(); +// Recreate a client session since the previous session was not persisted. +zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this); +} + +waitForAll(zk, States.CONNECTED); + +// 2. 
kill all followers +int leader = -1; +Map outstanding = null; +for (int i = 0; i < SERVER_COUNT; i++) { +if (mt[i].main.quorumPeer.leader != null) { +leader = i; +outstanding = mt[leader].main.quorumPeer.leader.outstandingProposals; +// increase the tick time to delay the leader going to looking +mt[leader].main.quorumPeer.tickTime = 1; +} +} +LOG.warn("LEADER {}", leader); + +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +mt[i].shutdown(); +} +} + +// 3. start up the followers to form a new quorum +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +mt[i].start(); +} +} + +// 4. wait one of the follower to be the leader +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +// Recreate a client session since the previous session was not persisted. +zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this); +waitForOne(zk[i], States.CONNECTED); +} +} + +// 5. send a create request to leader and make sure it's synced to disk, +//which means it acked from itself +try { +zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, +CreateMode.PERSISTENT); +Assert.fail("create /zk" + leader + " should have failed"); +} catch (KeeperException e) { +} + +// just make sure that we actually did get it in process at the +// leader +Assert.assertTrue(outstanding.size() == 1); +Proposal p = (Proposal) outstanding.values().iterator().next(); +Assert.assertTrue(p.request.getHdr().getType() == OpCode.create); + +// make sure it has a chance to write it to disk +Thread.sleep(1000); +p.qvAcksetPairs.get(0).getAckset().contains(leader); + +// 6. wait the leader to quit due to no enough followers +Thread.sleep(4000); +//waitForOne(zk[leader], States.CONNECTING); --- End diff -- done ---
[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/453#discussion_r168807976 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws Exception { maxSessionTimeOut, quorumPeer.getMaxSessionTimeout()); } +@Test +public void testTxnAheadSnapInRetainDB() throws Exception { --- End diff -- done ---
[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/453#discussion_r168807914 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws Exception { maxSessionTimeOut, quorumPeer.getMaxSessionTimeout()); } +@Test +public void testTxnAheadSnapInRetainDB() throws Exception { +// 1. start up server and wait for leader election to finish +ClientBase.setupTestEnv(); +final int SERVER_COUNT = 3; +final int clientPorts[] = new int[SERVER_COUNT]; +StringBuilder sb = new StringBuilder(); +for (int i = 0; i < SERVER_COUNT; i++) { +clientPorts[i] = PortAssignment.unique(); +sb.append("server." + i + "=127.0.0.1:" + PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] + "\n"); +} +String quorumCfgSection = sb.toString(); + +MainThread mt[] = new MainThread[SERVER_COUNT]; +ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT]; +for (int i = 0; i < SERVER_COUNT; i++) { +mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection); +mt[i].start(); +zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this); +} + +waitForAll(zk, States.CONNECTED); + +// we need to shutdown and start back up to make sure that the create session isn't the first transaction since +// that is rather innocuous. +for (int i = 0; i < SERVER_COUNT; i++) { +mt[i].shutdown(); +} + +waitForAll(zk, States.CONNECTING); + +for (int i = 0; i < SERVER_COUNT; i++) { +mt[i].start(); +// Recreate a client session since the previous session was not persisted. +zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this); +} + +waitForAll(zk, States.CONNECTED); + +// 2. 
kill all followers +int leader = -1; +Map outstanding = null; +for (int i = 0; i < SERVER_COUNT; i++) { +if (mt[i].main.quorumPeer.leader != null) { +leader = i; +outstanding = mt[leader].main.quorumPeer.leader.outstandingProposals; +// increase the tick time to delay the leader going to looking +mt[leader].main.quorumPeer.tickTime = 1; +} +} +LOG.warn("LEADER {}", leader); + +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +mt[i].shutdown(); +} +} + +// 3. start up the followers to form a new quorum +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +mt[i].start(); +} +} + +// 4. wait one of the follower to be the leader +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +// Recreate a client session since the previous session was not persisted. +zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this); +waitForOne(zk[i], States.CONNECTED); +} +} + +// 5. send a create request to leader and make sure it's synced to disk, +//which means it acked from itself +try { +zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, +CreateMode.PERSISTENT); +Assert.fail("create /zk" + leader + " should have failed"); +} catch (KeeperException e) { +} + +// just make sure that we actually did get it in process at the +// leader +Assert.assertTrue(outstanding.size() == 1); +Proposal p = (Proposal) outstanding.values().iterator().next(); --- End diff -- removed the cast ---
[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/453#discussion_r168807853 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws Exception { maxSessionTimeOut, quorumPeer.getMaxSessionTimeout()); } +@Test +public void testTxnAheadSnapInRetainDB() throws Exception { +// 1. start up server and wait for leader election to finish +ClientBase.setupTestEnv(); +final int SERVER_COUNT = 3; +final int clientPorts[] = new int[SERVER_COUNT]; +StringBuilder sb = new StringBuilder(); +for (int i = 0; i < SERVER_COUNT; i++) { +clientPorts[i] = PortAssignment.unique(); +sb.append("server." + i + "=127.0.0.1:" + PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] + "\n"); +} +String quorumCfgSection = sb.toString(); + +MainThread mt[] = new MainThread[SERVER_COUNT]; +ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT]; +for (int i = 0; i < SERVER_COUNT; i++) { +mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection); +mt[i].start(); +zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this); +} + +waitForAll(zk, States.CONNECTED); + +// we need to shutdown and start back up to make sure that the create session isn't the first transaction since +// that is rather innocuous. +for (int i = 0; i < SERVER_COUNT; i++) { +mt[i].shutdown(); +} + +waitForAll(zk, States.CONNECTING); + +for (int i = 0; i < SERVER_COUNT; i++) { +mt[i].start(); +// Recreate a client session since the previous session was not persisted. +zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this); +} + +waitForAll(zk, States.CONNECTED); + +// 2. 
kill all followers +int leader = -1; +Map outstanding = null; +for (int i = 0; i < SERVER_COUNT; i++) { +if (mt[i].main.quorumPeer.leader != null) { +leader = i; +outstanding = mt[leader].main.quorumPeer.leader.outstandingProposals; +// increase the tick time to delay the leader going to looking +mt[leader].main.quorumPeer.tickTime = 1; +} +} +LOG.warn("LEADER {}", leader); + +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +mt[i].shutdown(); +} +} + +// 3. start up the followers to form a new quorum +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +mt[i].start(); +} +} + +// 4. wait one of the follower to be the leader +for (int i = 0; i < SERVER_COUNT; i++) { +if (i != leader) { +// Recreate a client session since the previous session was not persisted. +zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this); +waitForOne(zk[i], States.CONNECTED); +} +} + +// 5. send a create request to leader and make sure it's synced to disk, +//which means it acked from itself +try { +zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, +CreateMode.PERSISTENT); +Assert.fail("create /zk" + leader + " should have failed"); +} catch (KeeperException e) { +} + +// just make sure that we actually did get it in process at the +// leader +Assert.assertTrue(outstanding.size() == 1); +Proposal p = (Proposal) outstanding.values().iterator().next(); +Assert.assertTrue(p.request.getHdr().getType() == OpCode.create); + +// make sure it has a chance to write it to disk +Thread.sleep(1000); --- End diff -- I was able to do what you said and drop the 1 second sleep, but the sleep at step 6 I am going to need something else because the leader is only in the looking state for 2 ms. Leader election happens way too fast for us to be able to catch that by polling. ---
[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/453#discussion_r168795646 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -435,7 +435,7 @@ private void waitForOne(ZooKeeper zk, States state) throws InterruptedException int iterations = ClientBase.CONNECTION_TIMEOUT / 500; while (zk.getState() != state) { if (iterations-- == 0) { -throw new RuntimeException("Waiting too long"); +throw new RuntimeException("Waiting too long " + zk.getState() + " != " + state); --- End diff -- @anmolnar and @afine I put this in for my own debugging and I forgot to remove it. If you want me to I am happy to either remove it or file a separate JIRA and put it up as a separate pull request, or just leave it. Either way is fine with me. ---
[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/453#discussion_r168794042
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] + "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to disk,
+// which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
--- End diff --
I will see if I can make it work. I agree I would love to kill as many of the sleeps as possible. ---
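The hard-coded sleeps discussed above can usually be replaced with a bounded polling loop. The sketch below is illustrative only: the helper name `waitFor` and its parameters are invented for this example and are not part of the ZooKeeper test infrastructure.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a condition with a deadline instead of a fixed
// Thread.sleep(). Returns true as soon as the condition holds, false on timeout.
class WaitFor {
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;   // gave up waiting
            }
            Thread.sleep(pollMs);
        }
        return true;
    }
}
```

A loop like this bounds the worst case the same way a sleep does, but returns as soon as the condition is met, which keeps the test fast in the common case.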
[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/453#discussion_r168793764
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] + "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
--- End diff --
I am not super familiar with the test infrastructure. If you have a suggestion I would love it; otherwise I will look around and see what I can come up with. ---
[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/453 @anmolnar I added in an updated version of the test in #310. The issue turned out to be a race condition where the original leader would time out clients and then would join the new quorum too quickly for the test to be able to detect it. I changed it so there is a hard coded sleep instead and then just shut down the leader. I would love to get rid of the hard coded sleep, but I wasn't really sure how to do it without making some major changes in the leader code to put in a synchronization point. If you really want me to do it I can, but it felt rather intrusive. I verified that when I comment out my code that does the fast forward the test fails and when I put it back the test passes. If this looks OK I'll try to port the test to the other release branches too. I also addressed your request to make some of the code common. ---
[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/453 @anmolnar I will add some kind of a test. I ran into a lot of issues with `testTxnAheadSnapInRetainDB`. I could not get it to run correctly against master as it would always end up electing the original leader again and the test would fail, but not because it had reproduced the issue. I finally just did development work based off of the [original patch](https://github.com/apache/zookeeper/compare/master...revans2:ZOOKEEPER-2845-updated-fix?expand=1) and verified that `testTxnAheadSnapInRetainDB` passed, or that if it failed it did so because of leader election. ---
[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/453#discussion_r167885280
--- Diff: src/java/main/org/apache/zookeeper/server/ZKDatabase.java ---
@@ -233,14 +233,32 @@ public long getDataTreeLastProcessedZxid() {
 * @throws IOException
 */
 public long loadDataBase() throws IOException {
-PlayBackListener listener=new PlayBackListener(){
+PlayBackListener listener = new PlayBackListener(){
 public void onTxnLoaded(TxnHeader hdr,Record txn){
 Request r = new Request(0, hdr.getCxid(),hdr.getType(), hdr, txn, hdr.getZxid());
 addCommittedProposal(r);
 }
 };
-long zxid = snapLog.restore(dataTree,sessionsWithTimeouts,listener);
+long zxid = snapLog.restore(dataTree, sessionsWithTimeouts, listener);
+initialized = true;
+return zxid;
+}
+
+/**
+ * Fast forward the database adding transactions from the committed log into memory.
+ * @return the last valid zxid.
+ * @throws IOException
+ */
+public long fastForwardDataBase() throws IOException {
+PlayBackListener listener = new PlayBackListener(){
--- End diff --
Will do ---
[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/453 Thank you to everyone who reviewed the patch, but with the help of Fangmin Lv I found one case that the original patch didn't cover. I have reworked the patch to cover that case, but to do so I had to take a completely different approach. I think this is a better approach because it reuses a lot of the code that was originally run to load the database from disk. So now instead of reloading the entire database from disk, we apply all of the uncommitted transactions in the log to the in memory database. This should put it in exactly the same state as if we had cleared the data and reloaded it from disk, but with much less overhead. ---
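The fast-forward approach described above can be sketched in a few lines. This is a minimal illustration of the idea only; `FastForwardSketch`, `InMemoryDb`, and `Txn` are invented names for this example, not the actual ZooKeeper classes. Instead of discarding the in-memory tree and re-reading everything from disk, we replay only the logged transactions that are newer than what the in-memory database has already processed.

```java
import java.util.List;

class FastForwardSketch {
    // Minimal stand-ins so the sketch is self-contained.
    static class Txn {
        final long zxid;
        Txn(long zxid) { this.zxid = zxid; }
    }

    interface InMemoryDb {
        long lastProcessedZxid();
        void apply(Txn t);
    }

    /** Applies every transaction newer than the DB's last zxid; returns the new last zxid. */
    static long fastForward(InMemoryDb db, List<Txn> committedLog) {
        long lastZxid = db.lastProcessedZxid();
        for (Txn t : committedLog) {
            if (t.zxid > lastZxid) {   // already-applied entries are skipped
                db.apply(t);
                lastZxid = t.zxid;
            }
        }
        return lastZxid;
    }
}
```

The key property is that the resulting in-memory state matches what a full reload from disk would produce, while touching only the tail of the log.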
[GitHub] zookeeper pull request #455: ZOOKEEPER-2845: Send a SNAP if transactions can...
GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/455 ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified. This is the version of #453 for the 3.4 branch You can merge this pull request into a Git repository by running: $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-3.4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/455.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #455 commit b035df19616424036afb1f31f345dedf26e3b2ae Author: Robert Evans Date: 2018-02-01T02:09:53Z ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified. ---
[GitHub] zookeeper pull request #454: ZOOKEEPER-2845: Send a SNAP if transactions can...
GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/454 ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified. (3.5) This is the version of #453 for the 3.5 branch You can merge this pull request into a Git repository by running: $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-3.5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/454.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #454 commit 70436249c830af0b129caf3d1bed2f55a2498b6b Author: Robert Evans Date: 2018-01-29T20:27:10Z ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified. ---
[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Send a SNAP if transactions can...
GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/453 ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified. I will be creating a patch/pull request for 3.4 and 3.5 too, but I wanted to get a pull request up for others to look at ASAP. I have a version of this based off of #310 at https://github.com/revans2/zookeeper/tree/ZOOKEEPER-2845-orig-test-patch but the test itself is flaky. Frequently leader election does not go as planned on the test and it ends up failing but not because it ended up in an inconsistent state. I am happy to answer any questions anyone has about the patch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/453.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #453 commit 0219b2c9e44527067cd5fed4b642729171721886 Author: Robert Evans Date: 2018-01-29T20:27:10Z ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified. ---
[GitHub] zookeeper issue #310: [ZOOKEEPER-2845][Test] Test used to reproduce the data...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/310 Apparently, for some reason I don't understand, if I don't run all of the tests in QuorumPeerMainTest, the old leader is elected again each time. ---
[GitHub] zookeeper issue #310: [ZOOKEEPER-2845][Test] Test used to reproduce the data...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/310 @lvfangmin I am trying to reproduce the issue you have seen here, and I have not been able to do so. The test either fails for me with the same leader being elected each time, or, on newer versions, with the leader's client CONNECTED instead of CONNECTING while waiting for it to quit with a timeout, and I am not sure that ever happens. How frequently would this test pass for you? ---
[GitHub] zookeeper issue #310: [ZOOKEEPER-2845][Test] Test used to reproduce the data...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/310 @lvfangmin any update on getting a pull request for the actual fix? ---
[GitHub] zookeeper issue #282: ZOOKEEPER-1782: Let a SASL super user be super
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/282 @hanm I added in the docs. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper issue #282: ZOOKEEPER-1782: Let a SASL super user be super
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/282 @hanm I didn't add in `zookeeper.superUser`. I am happy to document it if you tell me where to make that change; I don't know where the documentation source is stored. I am happy to add an additional `zookeeper.SASLAuthenticationProvider.superUser` config and document it instead, but removing `zookeeper.superUser` will technically break backwards compatibility. If that is what you want I can do it; I just want to be sure, as with the other breaking change that was suggested. ---
[GitHub] zookeeper issue #282: ZOOKEEPER-1782: Let a SASL super user be super
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/282 @arshadmohammad if everything looks good I will squash my commits ---
[GitHub] zookeeper pull request #282: ZOOKEEPER-1782: Let a SASL super user be super
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/282#discussion_r123330561
--- Diff: src/java/main/org/apache/zookeeper/server/auth/SASLAuthenticationProvider.java ---
@@ -38,11 +38,6 @@ public String getScheme() {
 }
 public boolean matches(String id,String aclExpr) {
-if (System.getProperty("zookeeper.superUser") != null) {
-if (id.equals(System.getProperty("zookeeper.superUser")) || id.equals(aclExpr)) {
- return true;
-}
-}
 if ((id.equals("super") || id.equals(aclExpr))) {
--- End diff --
done ---
[GitHub] zookeeper issue #282: ZOOKEEPER-1782: Let a SASL super user be super
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/282 @arshadmohammad I think I have addressed your review comments ---
[GitHub] zookeeper pull request #282: ZOOKEEPER-1782: Let a SASL super user be super
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/282#discussion_r122698936
--- Diff: src/java/main/org/apache/zookeeper/server/auth/SASLAuthenticationProvider.java ---
@@ -38,11 +38,6 @@ public String getScheme() {
 }
 public boolean matches(String id,String aclExpr) {
-if (System.getProperty("zookeeper.superUser") != null) {
-if (id.equals(System.getProperty("zookeeper.superUser")) || id.equals(aclExpr)) {
- return true;
-}
-}
 if ((id.equals("super") || id.equals(aclExpr))) {
--- End diff --
Sure. It looks like I am going to have to pull out the "super" user SASL login too:
https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/auth/SaslServerCallbackHandler.java#L42
https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/auth/SaslServerCallbackHandler.java#L99-L101
Do you know where the source for the documentation at https://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html is? Under the section "Authentication & Authorization Options" it talks about how to use the "super" user. I also found docs for this in a few other places on the open internet, and because this is a backwards-incompatible change I get a little nervous. Is there anything else I need to do to document a breaking change like this? ---
[GitHub] zookeeper issue #282: ZOOKEEPER-1782: Let a SASL super user be super
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/282 @hanm Thanks for the review. I think I have addressed all of your review comments. I originally put the patch up in 2013 and I have not touched it since so I honestly don't remember a lot about why I made the test that way to begin with. ---
[GitHub] zookeeper pull request #282: ZOOKEEPER-1782: Let a SASL super user be super
GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/282 ZOOKEEPER-1782: Let a SASL super user be super You can merge this pull request into a Git repository by running: $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-1782 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/282.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #282 commit 62ee0bcedbbbf34b6baf6effac24765d714928cc Author: Robert Evans Date: 2017-06-14T14:13:03Z ZOOKEEPER-1782: Let a SASL super user be super ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user revans2 closed the pull request at: https://github.com/apache/zookeeper/pull/157 ---
[GitHub] zookeeper pull request #180: ZOOKEEPER-2700 add JMX `takeSnapshot` method an...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/180#discussion_r102283801
--- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {
 public void takeSnapshot(){
--- End diff --
@flier Having SyncRequestProcessor look at the number of snapshot files feels like a bit of a hack, but I am no expert here. I would be fine with having SyncRequestProcessor delegate all throttling of snapshots to ZooKeeperServer, but then we need some sort of synchronization to prevent a snapshot from being taken when the server is not in a proper state (during the ZABP sync phase). ---
[GitHub] zookeeper pull request #180: ZOOKEEPER-2700 add JMX `takeSnapshot` method an...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/180#discussion_r102245094
--- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {
 public void takeSnapshot(){
--- End diff --
@eribeiro An AtomicBoolean is not enough, as the SyncRequestProcessor ignores it; only the JMX and admin commands use it.
@flier There are several things going on here, and I don't know the reasoning for all of them, but I can guess.
1. Only take one snapshot at a time. I don't think this is super critical, because it is not a correctness problem. Having multiple snapshots happening at the same time should work correctly even in the face of crashes, but it becomes a performance problem. There is a limited amount of CPU, memory, and most importantly disk bandwidth and IOPS. The last successful snapshot wins, so having multiple snapshots all going at the same time means we are likely to be doing a lot of wasted work even if everything does happen successfully. If we can schedule the snapshots so there is only one going on at a time, then the full bandwidth and IOPS can go to that snapshot. Even better would be to space them out, so that forcing a snapshot makes SyncRequestProcessor reset its counter and we don't take another one until X more transactions have been processed.
2. Taking a snapshot at the wrong time and then crashing can corrupt the DB on that node. This is potentially critical. The probability of this happening is very, very small, but if we can fix it so it can never happen, I would be much happier.
I think we can fix both issues at the same time by making JMX and Admin use SyncRequestProcessor instead of going to the ZooKeeperServer directly. If no SyncRequestProcessor is ready, then we can tell the user so they know the node is not ready to take a snapshot.
I also really would like to understand the use case for this command, because I just don't see much value to a user in forcing another snapshot if one is already in progress. I also would like to understand when you would want to take a snapshot, and whether these changes would too severely limit that. ---
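The "only one snapshot at a time" point above can be expressed with a compare-and-set gate. This is a hedged sketch with invented names (`SnapshotGate`, `tryTakeSnapshot`), not the PR's actual implementation:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SnapshotGate {
    private final AtomicBoolean snapshotInProgress = new AtomicBoolean(false);

    /** Runs the snapshot action unless one is already in flight; returns whether it ran. */
    boolean tryTakeSnapshot(Runnable snapshotAction) {
        if (!snapshotInProgress.compareAndSet(false, true)) {
            return false;   // another snapshot is running; skip the wasted work
        }
        try {
            snapshotAction.run();
            return true;
        } finally {
            snapshotInProgress.set(false);
        }
    }
}
```

The compareAndSet makes the check-then-act atomic, so two callers racing to start a snapshot cannot both win; the loser simply skips its snapshot instead of doubling the disk I/O.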
[GitHub] zookeeper issue #180: ZOOKEEPER-2700 add JMX `takeSnapshot` method and Jetty...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/180 Oh, I also had the idea that it might be nice to provide feedback on the snap command for both JMX and the Admin command. You could provide more than just the last zxid; you could also indicate whether we are going to do the snapshot or not, and why. ---
[GitHub] zookeeper pull request #180: ZOOKEEPER-2700 add JMX `takeSnapshot` method an...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/180#discussion_r10749
--- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {
 public void takeSnapshot(){
 try {
+lastSnapshotZxid = zkDb.getDataTreeLastProcessedZxid();
+isGeneratingSnapshot.incrementAndGet();
+
 txnLogFactory.save(zkDb.getDataTree(), zkDb.getSessionWithTimeOuts());
 } catch (IOException e) {
 LOG.error("Severe unrecoverable error, exiting", e);
 // This is a severe error that we cannot recover from,
 // so we need to exit
 System.exit(10);
+} finally {
+isGeneratingSnapshot.decrementAndGet();
 }
 }
+
+public boolean maybeTakeSnapshot() {
--- End diff --
Could you add some javadocs here? It would be nice to explain the difference between takeSnapshot and maybeTakeSnapshot. ---
[GitHub] zookeeper pull request #180: ZOOKEEPER-2700 add JMX `takeSnapshot` method an...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/180#discussion_r102229805
--- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {
 public void takeSnapshot(){
--- End diff --
We now have a lot of potential paths that take a snapshot: JMX, SyncRequestProcessor, Admin, and the sync stage in ZABP. In the past it was just SyncRequestProcessor and ZABP that would trigger snapshots. We didn't have much concern about collisions, because SyncRequestProcessor would only run as edits were being sent, and the edits would only be sent after the ZABP sync phase had completed. Now we have Admin and JMX possibly doing a snapshot in the background.
If a snapshot request comes in during the ZABP sync phase, after we have cleared out the in-memory DB but not yet applied the new snapshot, and we then crash before we can write out the new snapshot, we could end up with data corruption. This should be super, super rare, so I don't really know if we care all that much about it, but I think it is something that we can fix.
I am not an expert on the code, so I am not sure of the cleanest way to fix this, but it feels like maybeTakeSnapshot should be part of the SyncRequestProcessor instead of ZooKeeperServer. I don't know how simple it is to get to the SyncRequestProcessor from JMX/Admin, but if you can, then we can reset the edit count so we are not taking a lot of snapshots all at once, one right after another. It also means that we can truly avoid having more than one snapshot be taken at any time. ---
[GitHub] zookeeper pull request #180: ZOOKEEPER-2700 add `snap` command to take snaps...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/180#discussion_r101775786
--- Diff: src/java/main/org/apache/zookeeper/server/command/SnapCommand.java ---
@@ -0,0 +1,53 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.command;
+
+import org.apache.zookeeper.server.ServerCnxn;
+import org.apache.zookeeper.server.ZooKeeperThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.PrintWriter;
+
+public class SnapCommand extends AbstractFourLetterCommand {
+private static final Logger LOG = LoggerFactory.getLogger(SnapCommand.class);
+
+SnapCommand(PrintWriter pw, ServerCnxn serverCnxn) {
+super(pw,serverCnxn);
+}
+
+@Override
+public void commandRun() throws IOException {
+if (!isZKServerRunning()) {
+pw.println(ZK_NOT_SERVING);
+} else {
+Thread snapInProcess = new ZooKeeperThread("Snapshot Thread") {
--- End diff --
Wouldn't this also be a potential DoS attack? Especially with launching a new thread in the background for the snapshot. Could we put in some sort of throttling on taking the snapshot, so that we don't do it too frequently?
---
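The throttling suggested above could be as simple as refusing snapshot requests that arrive within a minimum interval of the previous one. A hypothetical sketch (the names `SnapshotThrottle` and `permit` are invented for illustration, not from the patch):

```java
class SnapshotThrottle {
    private final long minIntervalMs;
    private long lastSnapshotMs = Long.MIN_VALUE;   // sentinel: no snapshot yet

    SnapshotThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true (and records the time) if enough time has passed since the last snapshot. */
    synchronized boolean permit(long nowMs) {
        if (lastSnapshotMs != Long.MIN_VALUE && nowMs - lastSnapshotMs < minIntervalMs) {
            return false;   // too soon after the previous snapshot
        }
        lastSnapshotMs = nowMs;
        return true;
    }
}
```

Passing the clock value in as a parameter keeps the throttle deterministic and easy to test; a production version would call System.currentTimeMillis() at the call site.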
[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/157 @rakeshadr If it makes you feel any better we have been running with an older version of this patch in production for a while. We have used it as part of a rolling upgrade at least 10 times in production where if it were not there we would have had some very painful outages. I have also manually tested it at least 50 times shooting the leader under load (10,000 operations/second) on a 3.4 GB DB, watching it recover, and then validating the integrity of the DB to be sure we didn't get any corruption. ---
[GitHub] zookeeper issue #159: ZOOKEEPER-2678: Discovery and Sync can take a very lon...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/159 @hanm thanks for the reviews; I removed the incorrect comment. ---
[GitHub] zookeeper pull request #159: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/159#discussion_r100552192
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java ---
@@ -498,14 +504,19 @@ else if (qp.getType() == Leader.SNAP) {
 throw new Exception("changes proposed in reconfig");
 }
 }
-if (!snapshotTaken) { // true for the pre v1.0 case
- zk.takeSnapshot();
+if (isPreZAB1_0) {
+zk.takeSnapshot();
 self.setCurrentEpoch(newEpoch);
 }
 self.setZooKeeperServer(zk);
 self.adminServer.setZooKeeperServer(zk);
 break outerLoop;
-case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
+case Leader.NEWLEADER: // Getting NEWLEADER here instead of in discovery
+// means this is Zab 1.0
+// Create updatingEpoch file and remove it after current
--- End diff --
You are right; I will fix that. ---
[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/157 Is there any more I need to do to get this merged in? ---
[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/157 @afine I updated the test to spy on the `LearnerZooKeeperServer` instance and check if and when `takeSnapshot` was called. I think this fulfills your desires, but with tests there can always be more. So, if you do want more tests or other areas covered please let me know. ---
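The spy-based check described above can also be sketched without a mocking framework. A minimal hand-rolled spy that records if and when `takeSnapshot` was called might look like the following; the class names here are illustrative stand-ins, not ZooKeeper's actual `LearnerZooKeeperServer` API:

```java
import java.util.ArrayList;
import java.util.List;

public class SpyDemo {
    // Hypothetical stand-in for the class under test; ZooKeeper's real
    // LearnerZooKeeperServer is far more involved.
    static class SnapshotSource {
        void takeSnapshot() {
            // real snapshot work would go here
        }
    }

    // Hand-rolled spy: records the wall-clock time of every takeSnapshot
    // call, so a test can assert both whether and when it happened.
    static class SnapshotSpy extends SnapshotSource {
        final List<Long> callTimes = new ArrayList<>();

        @Override
        void takeSnapshot() {
            callTimes.add(System.currentTimeMillis());
            super.takeSnapshot();
        }
    }

    public static void main(String[] args) {
        SnapshotSpy spy = new SnapshotSpy();
        spy.takeSnapshot();
        System.out.println(spy.callTimes.size()); // prints 1
    }
}
```

A Mockito `spy` achieves the same effect with less code; the point is only that the test observes the call rather than the side effect.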
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r99347551 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java --- @@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, OutputArchive oa, Assert.assertEquals(1, f.self.getAcceptedEpoch()); Assert.assertEquals(1, f.self.getCurrentEpoch()); +//Wait for the transactions to be written out. The thread that writes them out +// does not send anything back when it is done. +long start = System.currentTimeMillis(); +while (createSessionZxid != f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) { --- End diff -- OK, will see what I can do. ---
[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/157 @hanm I addressed your review comments. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r98744573 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java --- @@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, OutputArchive oa, Assert.assertEquals(1, f.self.getAcceptedEpoch()); Assert.assertEquals(1, f.self.getCurrentEpoch()); +//Wait for the transactions to be written out. The thread that writes them out +// does not send anything back when it is done. +long start = System.currentTimeMillis(); +while (createSessionZxid != f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) { --- End diff -- It does seem a bit beyond the scope of this. But if you really want me to, I can look into it. ---
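The loop in the diff above is an instance of a common test pattern: poll until a condition holds or a deadline passes. A self-contained sketch of that pattern (the helper name and its use of `BooleanSupplier` are my own choices, not ZooKeeper's test code) could look like:

```java
import java.util.function.BooleanSupplier;

public class PollUntil {
    // Spin until the condition holds or timeoutMs elapses; returns whether
    // the condition was observed to hold before the deadline.
    static boolean pollUntil(BooleanSupplier condition, long timeoutMs) {
        long start = System.currentTimeMillis();
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() - start >= timeoutMs) {
                return false; // timed out
            }
            try {
                Thread.sleep(1); // back off briefly instead of hot-spinning
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long deadline = System.currentTimeMillis() + 20;
        // The clock reaches the deadline well inside the 5s budget.
        System.out.println(pollUntil(() -> System.currentTimeMillis() >= deadline, 5000)); // prints true
    }
}
```

A test would then assert on the boolean rather than sleeping for a fixed time, which is what makes tests like the one in the diff resilient to slow CI machines.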
[GitHub] zookeeper issue #159: ZOOKEEPER-2678: Discovery and Sync can take a very lon...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/159 This also applies cleanly to the 3.5 line. #157 is for the 3.4 line. ---
[GitHub] zookeeper pull request #158: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user revans2 closed the pull request at: https://github.com/apache/zookeeper/pull/158 ---
[GitHub] zookeeper issue #158: ZOOKEEPER-2678: Discovery and Sync can take a very lon...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/158 Closing this because master applies cleanly to 3.5 as well. ---
[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/157 @eribeiro will do. It is a clean cherry pick between master and 3.5. ---
[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/157 @fpj I addressed your review comments, and I also found another race in the ZAB test that I addressed. Apparently the log line I removed was taking enough time for the transactions to be fully flushed. When I removed it the test would occasionally fail. Please also take a look at #158 and #159 for master and branch 3.5. ---
[GitHub] zookeeper pull request #158: ZOOKEEPER-2678: Discovery and Sync can take a v...
GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/158 ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs This is the 3.5 version of #157 You can merge this pull request into a Git repository by running: $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678-3.5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/158.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #158 commit f871601abaf2420c674bb78658ceeb5083b3b04f Author: Robert (Bobby) Evans Date: 2017-01-19T19:50:32Z ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs ---
[GitHub] zookeeper pull request #159: ZOOKEEPER-2678: Discovery and Sync can take a v...
GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/159 ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs This is the master version of #157 You can merge this pull request into a Git repository by running: $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/159.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #159 ---
[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...
Github user revans2 commented on the issue: https://github.com/apache/zookeeper/pull/157 @fpj Thanks for the review. I will update the comments and start porting it to other lines. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r98449817 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java --- @@ -364,10 +367,12 @@ else if (qp.getType() == Leader.SNAP) { long lastQueued = 0; -// in V1.0 we take a snapshot when we get the NEWLEADER message, but in pre V1.0 +// in Zab V1.0 (ZK 3.4+) we might take a snapshot when we get the NEWLEADER message, but in pre V1.0 // we take the snapshot at the UPDATE, since V1.0 also gets the UPDATE (after the NEWLEADER) // we need to make sure that we don't take the snapshot twice. -boolean snapshotTaken = false; +boolean isPreZAB1_0 = true; +//If we are not going to take the snapshot be sure the edits are not applied in memory +boolean writeToEditLog = !snapshotNeeded; --- End diff -- Sorry about that, I am still a bit new at the internal terminology. I will update it. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/157 ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs

This patch addresses recovery time when a leader is lost on a large DB. It does this by not clearing the DB before leader election begins, and by avoiding taking a snapshot as part of the SYNC phase, specifically for a DIFF sync. It does this by buffering the proposals and commits, just like the code currently does for proposals/commits sent after the NEWLEADER and before the UPTODATE messages.

If a SNAP is sent we cannot avoid writing out the full snapshot, because there is no other way to make sure the on-disk DB is in sync with what is in memory; any edits to the edit log made before a background snapshot completed could be applied on top of an incorrect snapshot. The same optimization should work for TRUNC too, but I opted not to do it for TRUNC because TRUNC is rare, and TRUNC by its very nature already forces the DB to be reread after the edit logs are modified, so it would still not be fast.

In practice this makes it so that instead of taking 5+ minutes for the cluster to recover from losing a leader, it now takes about 3 seconds. I am happy to port this to 3.5 if it looks good.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/157.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #157 commit 5aa25620e0189b28d7040305272be2fda28126fb Author: Robert (Bobby) Evans Date: 2017-01-19T19:50:32Z ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs ---
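As a rough illustration of the buffering idea in the pull request above (queue proposals and commits during a DIFF sync instead of applying them to the in-memory DB right away), a simplified sketch might look like the following. The types and method names here are hypothetical, not ZooKeeper's actual `Learner` code, and the real protocol handles epochs, edit logs, and ordering guarantees that this sketch omits:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class DiffSyncSketch {
    enum PacketType { PROPOSAL, COMMIT, NEWLEADER, UPTODATE }

    record Packet(PacketType type, long zxid) {}

    // During a DIFF sync, queue each proposal instead of applying it
    // immediately; move it to "applied" only when its COMMIT arrives.
    // Because nothing is applied out of order, no snapshot is needed to
    // keep the on-disk state consistent with memory.
    static Deque<Long> syncWithDiff(List<Packet> packets) {
        Deque<Long> pending = new ArrayDeque<>();
        Deque<Long> applied = new ArrayDeque<>();
        for (Packet p : packets) {
            switch (p.type()) {
                case PROPOSAL -> pending.add(p.zxid());
                case COMMIT -> applied.add(pending.remove());
                case NEWLEADER, UPTODATE -> { /* handshake packets: nothing to buffer */ }
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        List<Packet> pkts = List.of(
                new Packet(PacketType.PROPOSAL, 1),
                new Packet(PacketType.COMMIT, 1),
                new Packet(PacketType.PROPOSAL, 2),
                new Packet(PacketType.COMMIT, 2),
                new Packet(PacketType.UPTODATE, 0));
        System.out.println(syncWithDiff(pkts)); // prints [1, 2]
    }
}
```

The contrast with a SNAP sync, as the description notes, is that a full snapshot rewrites the entire on-disk DB, so buffering alone cannot guarantee the edit log lines up with it.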
[GitHub] zookeeper pull request #148: ZOOKEEPER-2659 Log4j 2 migration
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/148#discussion_r96435980 --- Diff: ivy.xml --- @@ -41,13 +41,20 @@ - + + + + + + + --- End diff -- This is the version that log4j 2 uses itself, when you enable async logging. ---