[GitHub] zookeeper pull request #648: ZOOKEEPER-3156: Add in option to canonicalize h...

2018-11-06 Thread revans2
Github user revans2 closed the pull request at:

https://github.com/apache/zookeeper/pull/648


---


[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-11-06 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/648
  
Thanks


---


[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-31 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/652
  
@anmolnar sorry it took me so long.  I was out on vacation.


---


[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-16 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/652
  
Ping. Any hope of getting this merged in?


---


[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-05 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/652
  
I just rebased to deal with the directories being moved around.


---


[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-04 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/652
  
Thanks for all of the reviews. I just rebased and squashed the commits.


---


[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-04 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/648
  
Thanks for all of the reviews. I just rebased and squashed the commits.


---


[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-03 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/652
  
Ping @eolivelli @lvfangmin @anmolnar 

I think I have addressed all of the concerns on both branches.  The test 
failure here looks unrelated to the changes that I made.


---


[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-02 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/652
  
@anmolnar and @lvfangmin I think I have addressed all of your review 
comments.  I named the new class SaslServerPrincipal but if you have a 
different idea for a name I am happy to change it.


---


[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-02 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/648
  
There were some comments in the other pull request about the test code.  I 
will address them in both places.


---


[GitHub] zookeeper pull request #652: ZOOKEEPER-3156: Add in option to canonicalize h...

2018-10-02 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/652#discussion_r221956355
  
--- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java ---
@@ -793,7 +794,87 @@ public RWServerFoundException(String msg) {
             super(msg);
         }
     }
-
+
+    static class MockableInetSocketAddress {
+        private final InetSocketAddress addr;
+
+        MockableInetSocketAddress(InetSocketAddress addr) {
+            this.addr = addr;
+        }
+
+        public String getHostName() {
+            return addr.getHostName();
+        }
+
+        public MockableInetAddress getAddress() {
+            InetAddress ia = addr.getAddress();
+            return ia == null ? null : new MockableInetAddress(ia);
+        }
+
+        @Override
+        public String toString() {
+            return addr.toString();
+        }
+    }
+
+    static class MockableInetAddress {
+        private final InetAddress ia;
+
+        MockableInetAddress(InetAddress ia) {
+            this.ia = ia;
+        }
+
+        public String getCanonicalHostName() {
+            return ia.getCanonicalHostName();
+        }
+
+        public String getHostAddress() {
+            return ia.getHostAddress();
+        }
+
+        @Override
+        public String toString() {
+            return ia.toString();
+        }
+    }
+
+    /**
+     * Get the name of the server principal for a SASL client.  This is visible for testing purposes.
+     * @param addr the address of the host.
+     * @param clientConfig the configuration for the client.
+     * @return the name of the principal.
+     */
+    static String getServerPrincipal(MockableInetSocketAddress addr, ZKClientConfig clientConfig) {
+        String principalUserName = clientConfig.getProperty(ZKClientConfig.ZK_SASL_CLIENT_USERNAME,
+                ZKClientConfig.ZK_SASL_CLIENT_USERNAME_DEFAULT);
+        String hostName = addr.getHostName();
+
+        boolean canonicalize = true;
+        try {
+            canonicalize = Boolean.parseBoolean(clientConfig.getProperty(ZKClientConfig.ZK_SASL_CLIENT_CANONICALIZE_HOSTNAME,
+                    ZKClientConfig.ZK_SASL_CLIENT_CANONICALIZE_HOSTNAME_DEFAULT));
+        } catch (IllegalArgumentException ea) {
--- End diff --

The IllegalArgumentException is coming from `Boolean.parseBoolean`.  If it cannot parse the value as either true or false it throws the exception, and we end up with true, the default.  If you want a log I am happy to add it, because "no" being interpreted as true might be confusing.
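
For reference, a quick standalone check of `Boolean.parseBoolean` semantics. In standard Java it does not throw for unrecognized values; anything other than a case-insensitive "true" simply parses as false, which is worth keeping in mind when reasoning about what an unparseable value like "no" does to the default:

```java
public class ParseBooleanCheck {
    public static void main(String[] args) {
        // parseBoolean never throws: only a case-insensitive "true" yields true
        System.out.println(Boolean.parseBoolean("true"));  // true
        System.out.println(Boolean.parseBoolean("TRUE"));  // true
        System.out.println(Boolean.parseBoolean("no"));    // false
        System.out.println(Boolean.parseBoolean("junk"));  // false
    }
}
```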


---


[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-01 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/648
  
I just added a test and made the code match more closely how master/3.5 handle the canonical host name.


---


[GitHub] zookeeper issue #652: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-01 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/652
  
Added in the test.


---


[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-01 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/648
  
I tried to add in PowerMock too, but it conflicts with Mockito (PowerMock uses a much older version of mockito-core).  I'll try to use a different API for PowerMock instead of Mockito.


---


[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-01 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/648
  
Looks like it is mockito-all, not powermock-api-mockito.  I can try changing it and see.


---


[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-01 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/648
  
So I am running into some issues with the tests: all of the methods of InetSocketAddress are marked as final, so I cannot mock them.  This means I would either need a set of host names that are real, accessible from any build host you would want to run the test on, and never going to be taken down, or I would need to create another class that wraps InetSocketAddress for the canonicalization, which would allow me to do the mocking at that level.  Not sure which you would prefer.
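
The wrapper approach described here can be sketched in isolation (the class name below is only illustrative; the patch's actual wrapper appears in the `MockableInetSocketAddress` diff quoted elsewhere in this thread). The point is that a thin, non-final wrapper can be overridden or mocked even though `InetSocketAddress`'s own methods are final:

```java
import java.net.InetSocketAddress;

public class WrapperMockDemo {
    // Thin, non-final wrapper: its methods can be overridden (or mocked),
    // unlike the final methods on InetSocketAddress itself.
    static class MockableAddress {
        private final InetSocketAddress addr;

        MockableAddress(InetSocketAddress addr) {
            this.addr = addr;
        }

        public String getHostName() {
            return addr.getHostName();
        }
    }

    public static void main(String[] args) {
        // A hand-rolled "mock": override the wrapper to return a canned name,
        // so the test needs no DNS and no real host.
        MockableAddress fake = new MockableAddress(InetSocketAddress.createUnresolved("ignored", 2181)) {
            @Override
            public String getHostName() {
                return "fake.example.com";
            }
        };
        System.out.println(fake.getHostName());  // fake.example.com
    }
}
```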


---


[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-01 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/648
  
Sorry yes. I'll get on the test case...


---


[GitHub] zookeeper pull request #652: ZOOKEEPER-3156: Add in option to canonicalize h...

2018-10-01 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/652#discussion_r221749153
  
--- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java ---
@@ -1102,7 +1103,32 @@ private void startConnect(InetSocketAddress addr) throws IOException {
     private String getServerPrincipal(InetSocketAddress addr) {
         String principalUserName = clientConfig.getProperty(ZKClientConfig.ZK_SASL_CLIENT_USERNAME,
                 ZKClientConfig.ZK_SASL_CLIENT_USERNAME_DEFAULT);
-        String serverPrincipal = principalUserName + "/" + addr.getHostString();
+        String hostName = addr.getHostName();
+
+        boolean canonicalize = true;
+        try {
+            canonicalize = Boolean.parseBoolean(System.getProperty(ZKClientConfig.ZK_SASL_CLIENT_CANONICALIZE_HOSTNAME,
--- End diff --

Great point, missed that on the port.


---


[GitHub] zookeeper pull request #652: ZOOKEEPER-3156: Add in option to canonicalize h...

2018-10-01 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/652

ZOOKEEPER-3156: Add in option to canonicalize host name

This is the master and 3.5 version of #648.  It should apply cleanly to 
both lines, but if you want a separate pull request for each I am happy to do 
it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-3156

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #652


commit 24aca04adf57d10df123e2d61f55e2eb1344c014
Author: Robert Evans 
Date:   2018-09-26T20:40:26Z

ZOOKEEPER-3156: Add in option to canonicalize host name




---


[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-10-01 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/648
  
I just addressed the review comments; if things look good I am happy to squash the commits.  I'll also put up pull requests to the other lines, now that it looks like we have the majority of the code worked out.


---


[GitHub] zookeeper issue #648: ZOOKEEPER-3156: Add in option to canonicalize host nam...

2018-09-28 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/648
  
@anmolnar I think I have addressed all of your comments.  Please have 
another look.


---


[GitHub] zookeeper pull request #648: ZOOKEEPER-3156: Add in option to canonicalize h...

2018-09-28 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/648#discussion_r221377101
  
--- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java ---
@@ -990,6 +992,27 @@ private void sendPing() {
     private boolean saslLoginFailed = false;
 
     private void startConnect(InetSocketAddress addr) throws IOException {
+        boolean canonicalize = true;
+        try {
+            canonicalize = Boolean.parseBoolean(System.getProperty(ZK_SASL_CLIENT_CANONICALIZE_HOSTNAME, "true"));
+        } catch (IllegalArgumentException ea) {
+            //ignored ...
+        }
+
+        if (canonicalize) {
+            try {
+                InetAddress ia = addr.getAddress();
+                LOG.warn("ia {}", ia);
+                if (ia == null) {
+                    ia = InetAddress.getByName(addr.getHostName());
+                }
+                String host = (ia != null) ? ia.getCanonicalHostName() : addr.getHostName();
+                addr = new InetSocketAddress(InetAddress.getByAddress(host, ia.getAddress()), addr.getPort());
--- End diff --

But you said that we already have the IP address, which is what it is going 
to use, so never mind, it should be fine.


---


[GitHub] zookeeper pull request #648: ZOOKEEPER-3156: Add in option to canonicalize h...

2018-09-28 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/648#discussion_r221376870
  
--- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java ---
@@ -990,6 +992,27 @@ private void sendPing() {
     private boolean saslLoginFailed = false;
 
     private void startConnect(InetSocketAddress addr) throws IOException {
+        boolean canonicalize = true;
+        try {
+            canonicalize = Boolean.parseBoolean(System.getProperty(ZK_SASL_CLIENT_CANONICALIZE_HOSTNAME, "true"));
+        } catch (IllegalArgumentException ea) {
+            //ignored ...
+        }
+
+        if (canonicalize) {
+            try {
+                InetAddress ia = addr.getAddress();
+                LOG.warn("ia {}", ia);
+                if (ia == null) {
+                    ia = InetAddress.getByName(addr.getHostName());
+                }
+                String host = (ia != null) ? ia.getCanonicalHostName() : addr.getHostName();
+                addr = new InetSocketAddress(InetAddress.getByAddress(host, ia.getAddress()), addr.getPort());
--- End diff --

I put it here because there is a possible race.  The entire reason we have the setup we do is so that we can change the nodes in the cluster without changing the config on the client side.  If the code that establishes the connection uses a different address from the code that creates the principal, there is the possibility, because of the magic of DNS caching, that the connection would be made to a different box from the one the SASL client is expecting.  If that happens, and we have the default Client section in the jaas conf, it will not result in an error.  Instead the ZK client logs something about SASL failing and still connects, but is not authorized to do anything.  If it failed and retried a different node I would be happy to move it, but it does not, and that failure would not be transparent to the end user.
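
The resolve-once pattern being defended here can be sketched in a self-contained way (the class and method names below are just for illustration; the logic mirrors the hunk quoted above). Resolving the host a single time and deriving both the canonical name and the connect address from that one result means a cached-versus-fresh DNS answer cannot point the connection and the principal at different boxes:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class ResolveOnce {
    // Illustrative helper: one lookup, and both the canonical host name and
    // the connect address are derived from that single result.
    static InetSocketAddress canonicalize(InetSocketAddress addr) throws UnknownHostException {
        InetAddress ia = addr.getAddress();
        if (ia == null) {
            ia = InetAddress.getByName(addr.getHostName());  // the only forward lookup
        }
        String host = ia.getCanonicalHostName();
        // getByAddress performs no further resolution, so the name and IP stay paired
        return new InetSocketAddress(InetAddress.getByAddress(host, ia.getAddress()), addr.getPort());
    }

    public static void main(String[] args) throws Exception {
        InetSocketAddress out = canonicalize(new InetSocketAddress("127.0.0.1", 2181));
        System.out.println(out.getPort());  // 2181
    }
}
```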


---


[GitHub] zookeeper pull request #648: ZOOKEEPER-3156: Add in option to canonicalize h...

2018-09-28 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/648

ZOOKEEPER-3156: Add in option to canonicalize host name



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-3156-3.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/648.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #648


commit 7ecd6c9151c0e7be749cfead5f141d673a8663f6
Author: Robert Evans 
Date:   2018-09-26T20:40:26Z

ZOOKEEPER-3156: Add in option to canonicalize host name




---


[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...

2018-02-27 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/453
  
Thanks @afine, I closed them.


---


[GitHub] zookeeper pull request #455: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-27 Thread revans2
Github user revans2 closed the pull request at:

https://github.com/apache/zookeeper/pull/455


---


[GitHub] zookeeper pull request #454: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-27 Thread revans2
Github user revans2 closed the pull request at:

https://github.com/apache/zookeeper/pull/454


---


[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...

2018-02-21 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/453
  
@afine all of the changes in this branch are now in the pull requests to 
the 3.5 and 3.4 branches.


---


[GitHub] zookeeper issue #455: ZOOKEEPER-2845: Apply commit log when restarting serve...

2018-02-21 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/455
  
I just rebased this and pulled in all of the changes made to the main test.


---


[GitHub] zookeeper issue #454: ZOOKEEPER-2845: Apply commit log when restarting serve...

2018-02-21 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/454
  
I just rebased this and pulled in all of the changes made to the main test.


---


[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...

2018-02-21 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/453
  
@afine 

I have addressed your most recent comments.  If you want me to squash 
commits please let me know.

I have a pull request for the 3.5 branch #454 and for the 3.4 branch #455.  
I will be spending some time porting the test to them, and will let you know 
when it is ready.


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-21 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r169662234
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +923,103 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
         maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
     }
 
+    @Test
+    public void testFailedTxnAsPartOfQuorumLoss() throws Exception {
+        // 1. start up server and wait for leader election to finish
+        ClientBase.setupTestEnv();
+        final int SERVER_COUNT = 3;
+        servers = LaunchServers(SERVER_COUNT);
+
+        waitForAll(servers, States.CONNECTED);
+
+        // we need to shutdown and start back up to make sure that the create
+        // session isn't the first transaction since that is rather innocuous.
+        servers.shutDownAllServers();
+        waitForAll(servers, States.CONNECTING);
+        servers.restartAllServersAndClients(this);
+        waitForAll(servers, States.CONNECTED);
+
+        // 2. kill all followers
+        int leader = servers.findLeader();
+        Map outstanding = servers.mt[leader].main.quorumPeer.leader.outstandingProposals;
+        // increase the tick time to delay the leader going to looking
+        servers.mt[leader].main.quorumPeer.tickTime = 1;
+        LOG.warn("LEADER {}", leader);
+
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (i != leader) {
+                servers.mt[i].shutdown();
+            }
+        }
+
+        // 3. start up the followers to form a new quorum
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (i != leader) {
+                servers.mt[i].start();
+            }
+        }
+
+        // 4. wait for one of the followers to become the new leader
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (i != leader) {
+                // Recreate a client session since the previous session was not persisted.
+                servers.restartClient(i, this);
+                waitForOne(servers.zk[i], States.CONNECTED);
+            }
+        }
+
+        // 5. send a create request to the old leader and make sure it's synced to disk,
+        //    which means it acked from itself
+        try {
+            servers.zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE,
+                    CreateMode.PERSISTENT);
+            Assert.fail("create /zk" + leader + " should have failed");
+        } catch (KeeperException e) {
+        }
+
+        // just make sure that we actually did get it in process at the leader
+        Assert.assertEquals(1, outstanding.size());
+        Proposal p = outstanding.values().iterator().next();
+        Assert.assertEquals(OpCode.create, p.request.getHdr().getType());
+
+        // make sure it has a chance to write it to disk
+        int sleepTime = 0;
+        Long longLeader = new Long(leader);
+        while (!p.qvAcksetPairs.get(0).getAckset().contains(longLeader)) {
+            if (sleepTime > 2000) {
+                Assert.fail("Transaction not synced to disk within 2 seconds " + p.qvAcksetPairs.get(0).getAckset()
+                        + " expected " + leader);
+            }
+            Thread.sleep(100);
+            sleepTime += 100;
+        }
+
+        // 6. wait for the leader to quit due to not enough followers and come back up as a part of the new quorum
+        sleepTime = 0;
+        Follower f = servers.mt[leader].main.quorumPeer.follower;
+        while (f == null || !f.isRunning()) {
+            if (sleepTime > 10_000) {
+                Assert.fail("Took too long for old leader to time out " + servers.mt[leader].main.quorumPeer.getPeerState());
+            }
+            Thread.sleep(100);
+            sleepTime += 100;
+        }
+        servers.mt[leader].shutdown();
--- End diff --

It is a lot of very specific steps that make the data inconsistency show up.  
This is needed to force the transaction log to be replayed, which has an edit 
in it that wasn't considered as a part of leader election.


---


[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...

2018-02-16 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/453
  
@afine and @anmolnar I think I have addressed all of your review comments, 
except for the one about the change to `waitForOne`; I am happy to adjust 
however you want there.


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807943
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the 
create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was 
not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = 
mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to 
looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait for one of the followers to become the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session 
was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to 
disk,
+//which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), 
Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
+p.qvAcksetPairs.get(0).getAckset().contains(leader);
+
+// 6. wait for the leader to quit due to not enough followers
+Thread.sleep(4000);
+//waitForOne(zk[leader], States.CONNECTING);
--- End diff --

done


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807976
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
--- End diff --

done


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807914
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+... (test body identical to the diff quoted in full in an earlier review comment above; elided) ...
+
+// just make sure that we actually did get it in process at the leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
--- End diff --

removed the cast


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807853
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+... (test body identical to the diff quoted in full in an earlier review comment above; elided) ...
+
+// just make sure that we actually did get it in process at the leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
--- End diff --

I was able to do what you said and drop the 1 second sleep, but for the sleep 
at step 6 I am going to need something else, because the leader is only in the 
looking state for 2 ms.  Leader election happens way too fast for us to be able 
to catch that by polling.


---

[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168795646
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -435,7 +435,7 @@ private void waitForOne(ZooKeeper zk, States state) 
throws InterruptedException
 int iterations = ClientBase.CONNECTION_TIMEOUT / 500;
 while (zk.getState() != state) {
 if (iterations-- == 0) {
-throw new RuntimeException("Waiting too long");
+throw new RuntimeException("Waiting too long " + zk.getState() + " != " + state);
--- End diff --

@anmolnar  and @afine I put this in for my own debugging and I forgot to 
remove it.  If you want me to I am happy to either remove it or file a separate 
JIRA and put it up as a separate pull request, or just leave it.  Either way is 
fine with me.


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168794042
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] + "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was 
not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to disk,
+// which means it was acked by itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
--- End diff --

I will see if I can make it work.  I agree I would love to kill as many of 
the sleeps as possible.
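
For what it's worth, one way to kill fixed sleeps in tests like this is a generic poll-until-condition helper with a timeout, in the spirit of the existing `waitForOne`. A minimal sketch; all names below are illustrative, not from the ZooKeeper code base:

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Illustrative helper: replace a fixed Thread.sleep with polling a
// condition until it holds or a deadline passes.
class TestWaiter {
    static void waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

A fixed one-second sleep could then become something like `TestWaiter.waitFor(() -> outstanding.size() == 1, 5000, 50)`, which returns as soon as the condition holds and fails loudly on timeout instead of always paying the full second.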


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168793764
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] + "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
--- End diff --

I am not super familiar with the test infrastructure.  If you have a 
suggestion I would love it, otherwise I will look around and see what I can 
come up with.


---


[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...

2018-02-13 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/453
  
@anmolnar I added in an updated version of the test in #310. The issue 
turned out to be a race condition where the original leader would time out 
clients and then would join the new quorum too quickly for the test to be able 
to detect it.  I changed it so there is a hard coded sleep instead and then 
just shut down the leader.  I would love to get rid of the hard coded sleep, 
but I wasn't really sure how to do it without making some major changes in the 
leader code to put in a synchronization point.  If you really want me to do it 
I can, but it felt rather intrusive.

I verified that when I comment out my code that does the fast forward the 
test fails and when I put it back the test passes.  If this looks OK I'll try 
to port the test to the other release branches too.

I also addressed your request to make some of the code common.


---


[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...

2018-02-13 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/453
  
@anmolnar I will add some kind of a test.  I ran into a lot of issues with 
`testTxnAheadSnapInRetainDB`.  I could not get it to run correctly against 
master as it would always end up electing the original leader again and the 
test would fail, but not because it had reproduced the issue.  I finally just 
did development work based off of the [original 
patch](https://github.com/apache/zookeeper/compare/master...revans2:ZOOKEEPER-2845-updated-fix?expand=1)
 and verified that `testTxnAheadSnapInRetainDB` passed, or that if it failed it 
did so because of leader election.


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-13 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r167885280
  
--- Diff: src/java/main/org/apache/zookeeper/server/ZKDatabase.java ---
@@ -233,14 +233,32 @@ public long getDataTreeLastProcessedZxid() {
  * @throws IOException
  */
 public long loadDataBase() throws IOException {
-PlayBackListener listener=new PlayBackListener(){
+PlayBackListener listener = new PlayBackListener(){
 public void onTxnLoaded(TxnHeader hdr,Record txn){
Request r = new Request(0, hdr.getCxid(),hdr.getType(), hdr, txn, hdr.getZxid());
 addCommittedProposal(r);
 }
 };
 
-long zxid = snapLog.restore(dataTree,sessionsWithTimeouts,listener);
+long zxid = snapLog.restore(dataTree, sessionsWithTimeouts, listener);
+initialized = true;
+return zxid;
+}
+
+/**
+ * Fast forward the database adding transactions from the committed log into memory.
+ * @return the last valid zxid.
+ * @throws IOException
+ */
+public long fastForwardDataBase() throws IOException {
+PlayBackListener listener = new PlayBackListener(){
--- End diff --

Will do


---


[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...

2018-02-13 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/453
  
Thank you to everyone who reviewed the patch, but with the help of Fangmin 
Lv I found one case that the original patch didn't cover.  I have reworked the 
patch to cover that case, but to do so I had to take a completely different 
approach.

I think this is a better approach because it reuses a lot of the code that 
was originally run to load the database from disk.  So now instead of reloading 
the entire database from disk, we apply all of the uncommitted transactions in 
the log to the in memory database.  This should put it in exactly the same 
state as if we had cleared the data and reloaded it from disk, but with much 
less overhead.
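
The idea can be sketched in miniature: replay only the committed-log entries newer than what the in-memory tree already holds, rather than rebuilding the whole database from disk. Everything below (the `Txn` and `InMemoryDb` names included) is an illustrative stand-in, not the actual ZKDatabase code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins for a transaction and the in-memory database.
class Txn {
    final long zxid;
    final String path;
    final String data;
    Txn(long zxid, String path, String data) {
        this.zxid = zxid;
        this.path = path;
        this.data = data;
    }
}

class InMemoryDb {
    final Map<String, String> tree = new HashMap<>();
    long lastProcessedZxid;

    // Apply only transactions newer than what is already in memory,
    // leaving the tree as if it had been reloaded from disk.
    long fastForward(List<Txn> committedLog) {
        for (Txn t : committedLog) {
            if (t.zxid > lastProcessedZxid) { // skip already-applied txns
                tree.put(t.path, t.data);
                lastProcessedZxid = t.zxid;
            }
        }
        return lastProcessedZxid;
    }
}
```

The same end state as a full reload, but touching only the transactions the in-memory tree has not yet seen.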


---


[GitHub] zookeeper pull request #455: ZOOKEEPER-2845: Send a SNAP if transactions can...

2018-01-31 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/455

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.

This is the version of #453 for the 3.4 branch

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-3.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #455


commit b035df19616424036afb1f31f345dedf26e3b2ae
Author: Robert Evans 
Date:   2018-02-01T02:09:53Z

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.




---


[GitHub] zookeeper pull request #454: ZOOKEEPER-2845: Send a SNAP if transactions can...

2018-01-31 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/454

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified. (3.5)

This is the version of #453 for the 3.5 branch

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #454


commit 70436249c830af0b129caf3d1bed2f55a2498b6b
Author: Robert Evans 
Date:   2018-01-29T20:27:10Z

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.




---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Send a SNAP if transactions can...

2018-01-31 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/453

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.

I will be creating a patch/pull request for 3.4 and 3.5 too, but I wanted 
to get a pull request up for others to look at ASAP.

I have a version of this based off of #310 at 
https://github.com/revans2/zookeeper/tree/ZOOKEEPER-2845-orig-test-patch but 
the test itself is flaky.  Frequently leader election does not go as planned on 
the test and it ends up failing but not because it ended up in an inconsistent 
state.

I am happy to answer any questions anyone has about the patch.  

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2845-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #453


commit 0219b2c9e44527067cd5fed4b642729171721886
Author: Robert Evans 
Date:   2018-01-29T20:27:10Z

ZOOKEEPER-2845: Send a SNAP if transactions cannot be verified.




---


[GitHub] zookeeper issue #310: [ZOOKEEPER-2845][Test] Test used to reproduce the data...

2018-01-29 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/310
  
Apparently, for some reason I don't understand, if I don't run all of the 
tests in QuorumPeerMainTest, the old leader is elected again each time.


---


[GitHub] zookeeper issue #310: [ZOOKEEPER-2845][Test] Test used to reproduce the data...

2018-01-29 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/310
  
@lvfangmin 

I am trying to reproduce the issue you have seen here, and I have not been 
able to do so.  The test either fails for me with the same leader being elected 
each time, or, on newer versions, with the leader client connected instead of 
connecting, waiting for it to quit with a timeout that I am not sure ever 
happens.

How frequently would this test pass for you?


---


[GitHub] zookeeper issue #310: [ZOOKEEPER-2845][Test] Test used to reproduce the data...

2017-09-05 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/310
  
@lvfangmin any update on getting a pull request for the actual fix?


---


[GitHub] zookeeper issue #282: ZOOKEEPER-1782: Let a SASL super user be super

2017-06-23 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/282
  
@hanm I added in the docs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper issue #282: ZOOKEEPER-1782: Let a SASL super user be super

2017-06-22 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/282
  
@hanm I didn't add in `zookeeper.superUser`.  I am happy to document it if 
you tell me where to make that change.  I don't know where the documentation 
source is stored.  I am happy to add in an additional 
`zookeeper.SASLAuthenticationProvider.superUser` config and document it 
instead, but removing `zookeeper.superUser` will technically break backwards 
compatibility.  If that is what you want I can do it; I just want to be sure, 
as with the other breaking change that was suggested.
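
For context, the behavior under discussion boils down to a check like the following, a simplified sketch of the removed `zookeeper.superUser` logic, not the actual SASLAuthenticationProvider code:

```java
// Simplified sketch: a configurable super user via the
// zookeeper.superUser system property, alongside the hard-coded "super".
class SaslMatcherSketch {
    static boolean matches(String id, String aclExpr) {
        String superUser = System.getProperty("zookeeper.superUser");
        if (superUser != null && id.equals(superUser)) {
            return true; // configured super user matches any ACL expression
        }
        return id.equals("super") || id.equals(aclExpr);
    }
}
```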


---


[GitHub] zookeeper issue #282: ZOOKEEPER-1782: Let a SASL super user be super

2017-06-21 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/282
  
@arshadmohammad if everything looks good I will squash my commits


---


[GitHub] zookeeper pull request #282: ZOOKEEPER-1782: Let a SASL super user be super

2017-06-21 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/282#discussion_r123330561
  
--- Diff: src/java/main/org/apache/zookeeper/server/auth/SASLAuthenticationProvider.java ---
@@ -38,11 +38,6 @@ public String getScheme() {
 }
 
 public boolean matches(String id,String aclExpr) {
-if (System.getProperty("zookeeper.superUser") != null) {
-if (id.equals(System.getProperty("zookeeper.superUser")) || id.equals(aclExpr)) {
-  return true;
-}
-}
 if ((id.equals("super") || id.equals(aclExpr))) {
--- End diff --

done


---


[GitHub] zookeeper issue #282: ZOOKEEPER-1782: Let a SASL super user be super

2017-06-19 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/282
  
@arshadmohammad I think I have addressed your review comments


---


[GitHub] zookeeper pull request #282: ZOOKEEPER-1782: Let a SASL super user be super

2017-06-19 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/282#discussion_r122698936
  
--- Diff: src/java/main/org/apache/zookeeper/server/auth/SASLAuthenticationProvider.java ---
@@ -38,11 +38,6 @@ public String getScheme() {
 }
 
 public boolean matches(String id,String aclExpr) {
-if (System.getProperty("zookeeper.superUser") != null) {
-if (id.equals(System.getProperty("zookeeper.superUser")) || id.equals(aclExpr)) {
-  return true;
-}
-}
 if ((id.equals("super") || id.equals(aclExpr))) {
--- End diff --

Sure.  It looks like I am going to have to pull out the "super" user SASL 
Login too.


https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/auth/SaslServerCallbackHandler.java#L42

https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/auth/SaslServerCallbackHandler.java#L99-L101

Do you know where the source for the documentation at 
https://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html is? Under the 
section "Authentication & Authorization Options" it talks about how to use the 
"super" user. 

I also found docs for this in a few other places on the open internet and 
because this is a backwards incompatible change I get a little nervous, so is 
there anything else I need to do to document a breaking change like this?



---


[GitHub] zookeeper issue #282: ZOOKEEPER-1782: Let a SASL super user be super

2017-06-17 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/282
  
@hanm Thanks for the review.  I think I have addressed all of your review 
comments.  I originally put the patch up in 2013 and I have not touched it 
since so I honestly don't remember a lot about why I made the test that way to 
begin with.


---


[GitHub] zookeeper pull request #282: ZOOKEEPER-1782: Let a SASL super user be super

2017-06-14 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/282

ZOOKEEPER-1782: Let a SASL super user be super



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-1782

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/282.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #282


commit 62ee0bcedbbbf34b6baf6effac24765d714928cc
Author: Robert Evans 
Date:   2017-06-14T14:13:03Z

ZOOKEEPER-1782: Let a SASL super user be super




---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-04-18 Thread revans2
Github user revans2 closed the pull request at:

https://github.com/apache/zookeeper/pull/157


---


[GitHub] zookeeper pull request #180: ZOOKEEPER-2700 add JMX `takeSnapshot` method an...

2017-02-21 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/180#discussion_r102283801
  
--- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {
 
 public void takeSnapshot(){
--- End diff --

@flier Having SyncRequestProcessor look at the number of snapshot files 
feels like a bit of a hack, but I am no expert here.  I would be fine with 
having SyncRequestProcessor delegate all throttling of snapshots to 
ZooKeeperServer, but then we need some sort of synchronization to prevent a 
snapshot from being taken when it is not in a proper state (during the ZABP 
sync phase).


---


[GitHub] zookeeper pull request #180: ZOOKEEPER-2700 add JMX `takeSnapshot` method an...

2017-02-21 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/180#discussion_r102245094
  
--- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {
 
 public void takeSnapshot(){
--- End diff --

@eribeiro An AtomicBoolean is not enough as the SyncRequestProcessor 
ignores it, only the JMX and admin commands use it.

@flier there are several things going on here and I don't know the 
reasoning for all of them but I can guess.

1.  Only take one snapshot at a time.  I don't think this is super 
critical, because it is not a correctness problem.  Having multiple snapshots 
happening at the same time should work correctly even in the face of crashes, 
but it becomes a performance problem.  There is a limited amount of CPU, 
Memory, and most importantly disk bandwidth and IOps.  The last successful 
snapshot wins.  So having multiple snapshots all going at the same time means 
we are likely to be doing a lot of wasted work if everything does happen 
successfully.  If we can schedule the snapshots so there is only one going on 
at a time then the full bandwidth and IOps can go to that snapshot.  Even 
better would be to space them out, so if we force a snapshot it makes 
SyncRequestProcessor reset its counter so we don't take another one until X 
more transactions have been processed.

2. Taking a snapshot at the wrong time and then crashing can corrupt the DB 
on that node.  This is potentially critical.  The probability of this happening 
is very very small, but if we can fix it so it can never happen, I would be 
much happier.

I think we can fix both issues at the same time, by making JMX and Admin 
use SyncRequestProcessor instead of going to the ZookeeperServer directly.  If 
no SyncRequestProcessor is ready then we can report that to the user so they 
know the node is not ready to take a snapshot. 

I also really would like to understand the use case for this command, 
because I just don't see much value to a user to force another snapshot if one 
is already in progress.  I also would like to understand when you would want to 
take a snapshot and if these changes would too severely limit that.
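
The "only one snapshot at a time" part can be enforced with a compare-and-set guard so that concurrent callers are rejected rather than queued. A minimal sketch with illustrative names; this is not the actual ZooKeeperServer code, and it deliberately does not address the ZABP-sync-phase concern:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SnapshotGuard {
    private final AtomicBoolean inProgress = new AtomicBoolean(false);

    // Returns true if this caller won the right to take the snapshot;
    // false means another snapshot is already running.
    boolean maybeTakeSnapshot(Runnable doSnapshot) {
        if (!inProgress.compareAndSet(false, true)) {
            return false;
        }
        try {
            doSnapshot.run();
            return true;
        } finally {
            inProgress.set(false);
        }
    }
}
```

JMX and Admin callers would then get an immediate "not taken" answer instead of stacking a second snapshot on top of a running one.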


---


[GitHub] zookeeper issue #180: ZOOKEEPER-2700 add JMX `takeSnapshot` method and Jetty...

2017-02-21 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/180
  
Oh I also had the idea that it might be nice to provide feedback on the 
snap command for both JMX and the Admin command.  You could provide more than 
just the last zxid, but you could also indicate if we are going to do the 
snapshot or not, and why.


---


[GitHub] zookeeper pull request #180: ZOOKEEPER-2700 add JMX `takeSnapshot` method an...

2017-02-21 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/180#discussion_r10749
  
--- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {
 
 public void takeSnapshot(){
 try {
+lastSnapshotZxid = zkDb.getDataTreeLastProcessedZxid();
+isGeneratingSnapshot.incrementAndGet();
+
+txnLogFactory.save(zkDb.getDataTree(), zkDb.getSessionWithTimeOuts());
 } catch (IOException e) {
 LOG.error("Severe unrecoverable error, exiting", e);
 // This is a severe error that we cannot recover from,
 // so we need to exit
 System.exit(10);
+} finally {
+isGeneratingSnapshot.decrementAndGet();
 }
 }
 
+public boolean maybeTakeSnapshot() {
--- End diff --

Could you add some javadocs here?  It would be nice to explain the 
difference between takeSnapshot and maybeTakeSnapshot.


---


[GitHub] zookeeper pull request #180: ZOOKEEPER-2700 add JMX `takeSnapshot` method an...

2017-02-21 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/180#discussion_r102229805
  
--- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {
 
 public void takeSnapshot(){
--- End diff --

We now have a lot of potential paths to take a snapshot.  JMX, 
SyncRequestProcessor, Admin, and as part of the sync stage in ZABP.  In the 
past it was just the SyncRequestProcessor and ZABP that would trigger 
snapshots.  We didn't have much concern about collisions, because 
SyncRequestProcessor would only run as edits were being sent, and the edits 
would only be sent after the ZABP sync phase had completed.  Now we have Admin 
and JMX possibly doing a snapshot in the background.  

Now if a snapshot request comes in during the ZABP sync phase after we have 
cleared out the in-memory DB and not yet applied the new snapshot, and then we 
crash before we can write out the new snapshot we could end up with data 
corruption.  This should be super super rare so I don't really know if we care 
all that much about it, but I think it is something that we can fix.

I am not an expert on the code, so I am not sure of the cleanest way to fix 
this, but it feels like maybeTakeSnapshot should be part of the 
SyncRequestProcessor instead of ZooKeeperServer.  I don't know how simple it is 
to get to the SyncRequestProcessor from JMX/Admin but if you can then it means 
that we can reset the edit count so we are not taking a lot of snapshots all at 
once, one right after another.  It also means that we can truly avoid having 
more than one snapshot being taken at any time. 


---


[GitHub] zookeeper pull request #180: ZOOKEEPER-2700 add `snap` command to take snaps...

2017-02-17 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/180#discussion_r101775786
  
--- Diff: src/java/main/org/apache/zookeeper/server/command/SnapCommand.java ---
@@ -0,0 +1,53 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.command;
+
+import org.apache.zookeeper.server.ServerCnxn;
+import org.apache.zookeeper.server.ZooKeeperThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.PrintWriter;
+
+public class SnapCommand extends AbstractFourLetterCommand {
+private static final Logger LOG = LoggerFactory.getLogger(SnapCommand.class);
+
+SnapCommand(PrintWriter pw, ServerCnxn serverCnxn) {
+super(pw,serverCnxn);
+}
+
+@Override
+public void commandRun() throws IOException {
+if (!isZKServerRunning()) {
+pw.println(ZK_NOT_SERVING);
+} else {
+Thread snapInProcess = new ZooKeeperThread("Snapshot Thread") {
--- End diff --

Wouldn't this also be a potential DoS attack?  Especially with launching a 
new thread in the background for the snapshot.  Could we put in some sort of 
throttling on taking the snapshot, so that we don't do it too frequently?
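
One simple shape for such throttling is a minimum-interval check, so repeated `snap` requests inside a window are rejected cheaply before any thread is spawned. A sketch under assumed names, not the actual SnapCommand code:

```java
// Illustrative minimum-interval throttle for a snapshot trigger.
class SnapshotThrottle {
    private final long minIntervalMs;
    private long lastAllowedAtMs; // epoch millis of the last allowed snapshot

    SnapshotThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // Returns true and records the time if enough time has passed
    // since the last allowed snapshot; otherwise rejects the request.
    synchronized boolean tryAcquire(long nowMs) {
        if (nowMs - lastAllowedAtMs < minIntervalMs) {
            return false;
        }
        lastAllowedAtMs = nowMs;
        return true;
    }
}
```

The command handler would call `tryAcquire(System.currentTimeMillis())` and print a "snapshot recently taken" message on rejection, bounding how much work a flood of `snap` requests can cause.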


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...

2017-02-13 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/157
  
@rakeshadr If it makes you feel any better, we have been running an older 
version of this patch in production for a while.  We have used it as part of a 
rolling upgrade at least 10 times in production; without it we would have had 
some very painful outages.

I have also manually tested it at least 50 times by shooting the leader under 
load (10,000 operations/second) on a 3.4 GB DB, watching it recover, and then 
validating the integrity of the DB to be sure we didn't get any corruption.


---


[GitHub] zookeeper issue #159: ZOOKEEPER-2678: Discovery and Sync can take a very lon...

2017-02-10 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/159
  
@hanm Thanks for the reviews; I removed the incorrect comment.


---


[GitHub] zookeeper pull request #159: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-02-10 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/159#discussion_r100552192
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java ---
@@ -498,14 +504,19 @@ else if (qp.getType() == Leader.SNAP) {
throw new Exception("changes proposed in reconfig");
}
 }
-if (!snapshotTaken) { // true for the pre v1.0 case
-   zk.takeSnapshot();
+if (isPreZAB1_0) {
+zk.takeSnapshot();
 self.setCurrentEpoch(newEpoch);
 }
 self.setZooKeeperServer(zk);
 self.adminServer.setZooKeeperServer(zk);
 break outerLoop;
-case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
+case Leader.NEWLEADER: // Getting NEWLEADER here instead of in discovery
+// means this is Zab 1.0
+// Create updatingEpoch file and remove it after current
--- End diff --

You are right will fix that.


---


[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...

2017-02-09 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/157
  
Is there any more I need to do to get this merged in?


---


[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...

2017-02-06 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/157
  
@afine I updated the test to spy on the `LearnerZooKeeperServer` instance 
and check if and when `takeSnapshot` was called.  I think this covers what you 
asked for, but with tests there can always be more, so if you do want more tests 
or other areas covered, please let me know.
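The spying technique described above can be reduced to a dependency-free sketch: override the method under observation and record if and when it was called. The class names below are illustrative stand-ins, not the actual ZooKeeper test code (which spies on the real `LearnerZooKeeperServer`):

```java
// Minimal spy pattern: subclass the server, override takeSnapshot(),
// and record each invocation so a test can assert on it afterwards.
public class SnapshotSpyDemo {
    static class ZkServer {
        void takeSnapshot() { /* writes the snapshot in the real server */ }
    }

    static class SpyingZkServer extends ZkServer {
        int snapshotCalls = 0;
        long lastCallNanos = -1;

        @Override
        void takeSnapshot() {
            snapshotCalls++;                   // count the invocation
            lastCallNanos = System.nanoTime(); // record when it happened
            super.takeSnapshot();
        }
    }

    public static void main(String[] args) {
        SpyingZkServer spy = new SpyingZkServer();
        spy.takeSnapshot();
        System.out.println(spy.snapshotCalls); // prints 1
    }
}
```

In a real test suite the same effect is typically achieved with a mocking library's spy/verify facilities rather than a hand-written subclass.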


---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-02-03 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r99347551
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java ---
@@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, OutputArchive oa,
 Assert.assertEquals(1, f.self.getAcceptedEpoch());
 Assert.assertEquals(1, f.self.getCurrentEpoch());
 
+//Wait for the transactions to be written out. The thread that writes them out
+// does not send anything back when it is done.
+long start = System.currentTimeMillis();
+while (createSessionZxid != f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) {
--- End diff --

OK, I will see what I can do.


---


[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...

2017-01-31 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/157
  
@hanm I addressed your review comments.


---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-31 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r98744573
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java ---
@@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, OutputArchive oa,
 Assert.assertEquals(1, f.self.getAcceptedEpoch());
 Assert.assertEquals(1, f.self.getCurrentEpoch());
 
+//Wait for the transactions to be written out. The thread that writes them out
+// does not send anything back when it is done.
+long start = System.currentTimeMillis();
+while (createSessionZxid != f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) {
--- End diff --

It does seem a bit beyond the scope of this change, but if you really want me 
to, I can look into it.


---


[GitHub] zookeeper issue #159: ZOOKEEPER-2678: Discovery and Sync can take a very lon...

2017-01-30 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/159
  
This also applies cleanly to the 3.5 line; #157 is for the 3.4 line.


---


[GitHub] zookeeper pull request #158: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-30 Thread revans2
Github user revans2 closed the pull request at:

https://github.com/apache/zookeeper/pull/158


---


[GitHub] zookeeper issue #158: ZOOKEEPER-2678: Discovery and Sync can take a very lon...

2017-01-30 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/158
  
Closing this because master applies cleanly to 3.5 as well


---


[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...

2017-01-30 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/157
  
@eribeiro Will do.  It is a clean cherry-pick between master and 3.5.


---


[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...

2017-01-30 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/157
  
@fpj I addressed your review comments, and I also found and fixed another race 
in the ZAB test.  Apparently the log line I removed was taking just enough time 
for the transactions to be fully flushed; once it was gone, the test would 
occasionally fail.

Please also take a look at #158 and #159 for master and branch 3.5.


---


[GitHub] zookeeper pull request #158: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-30 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/158

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs

This is the 3.5 version of #157 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #158


commit f871601abaf2420c674bb78658ceeb5083b3b04f
Author: Robert (Bobby) Evans 
Date:   2017-01-19T19:50:32Z

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs




---


[GitHub] zookeeper pull request #159: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-30 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/159

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs

This is the master version of #157 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/159.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #159






---


[GitHub] zookeeper issue #157: ZOOKEEPER-2678: Discovery and Sync can take a very lon...

2017-01-30 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/157
  
@fpj Thanks for the review.  I will update the comments and start porting this 
to the other release lines.


---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-30 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r98449817
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java ---
@@ -364,10 +367,12 @@ else if (qp.getType() == Leader.SNAP) {
 
 long lastQueued = 0;
 
-// in V1.0 we take a snapshot when we get the NEWLEADER message, but in pre V1.0
+// in Zab V1.0 (ZK 3.4+) we might take a snapshot when we get the NEWLEADER message, but in pre V1.0
 // we take the snapshot at the UPDATE, since V1.0 also gets the UPDATE (after the NEWLEADER)
 // we need to make sure that we don't take the snapshot twice.
-boolean snapshotTaken = false;
+boolean isPreZAB1_0 = true;
+//If we are not going to take the snapshot be sure the edits are not applied in memory
+boolean writeToEditLog = !snapshotNeeded;
--- End diff --

Sorry about that, I am still a bit new to the internal terminology.  I will 
update it.


---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-26 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/157

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DB

This patch addresses recovery time when a leader is lost on a large DB.

It does this by not clearing the DB before leader election begins, and by 
avoiding taking a snapshot as part of the SYNC phase, specifically for a DIFF 
sync.  Instead, it buffers the proposals and commits, just like the code 
currently does for proposals/commits sent after the NEWLEADER and before the 
UPTODATE messages.

If a SNAP is sent we cannot avoid writing out the full snapshot, because there 
is no other way to make sure the on-disk DB is in sync with what is in memory. 
Otherwise, edits written to the edit log before a background snapshot completed 
could be applied on top of an incorrect snapshot.

The same optimization should work for TRUNC too, but I opted not to do it 
there because TRUNC is rare and by its very nature already forces the DB to be 
reread after the edit logs are modified, so it would still not be fast.

In practice this means that instead of taking 5+ minutes for the cluster to 
recover from losing a leader, it now takes about 3 seconds.

I am happy to port this to 3.5 if it looks good.
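The buffering idea described in the patch summary can be reduced to a small sketch: while syncing, incoming transactions are queued instead of triggering snapshots or in-memory applies; once the sync finishes, the queue is drained in order. All names here (`SyncBuffer`, `deliver`, `finishSync`) are hypothetical illustrations, not the actual `Learner` code:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch of buffering txns during a DIFF sync, then
// replaying them in order once the sync completes (UPTODATE).
public class SyncBuffer {
    // a "transaction" reduced to its zxid for illustration
    private final Queue<Long> pendingTxns = new ArrayDeque<>();
    private boolean syncing = true;

    /** While syncing, buffer the txn; afterwards apply it immediately. */
    public void deliver(long zxid) {
        if (syncing) {
            pendingTxns.add(zxid); // no snapshot, no apply yet
        } else {
            apply(zxid);
        }
    }

    /** Called when the sync completes: drain the buffer in order, then go live. */
    public long finishSync() {
        long lastApplied = -1;
        while (!pendingTxns.isEmpty()) {
            lastApplied = pendingTxns.poll();
            apply(lastApplied);
        }
        syncing = false;
        return lastApplied; // highest zxid applied from the buffer
    }

    private void apply(long zxid) { /* commit to the in-memory DB */ }
}
```

The point of the buffering is that no disk snapshot is needed for a DIFF sync: the follower's existing on-disk state plus the replayed buffer is already consistent.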

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #157


commit 5aa25620e0189b28d7040305272be2fda28126fb
Author: Robert (Bobby) Evans 
Date:   2017-01-19T19:50:32Z

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs




---


[GitHub] zookeeper pull request #148: ZOOKEEPER-2659 Log4j 2 migration

2017-01-17 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/148#discussion_r96435980
  
--- Diff: ivy.xml ---
@@ -41,13 +41,20 @@
 
   
 
-
+
+
+
+
+
+
+
--- End diff --

This is the version that Log4j 2 itself uses when you enable async logging.


---