[jira] [Commented] (ZOOKEEPER-2680) Correct DataNode.getChildren() inconsistent behavior.

2017-01-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842327#comment-15842327
 ] 

Hadoop QA commented on ZOOKEEPER-2680:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12849650/ZOOKEEPER-2680-01.patch
  against trunk revision 8771ffdaacb87126a485ae740558f6a288ab980b.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3571//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3571//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3571//console

This message is automatically generated.

> Correct DataNode.getChildren() inconsistent behavior.
> -
>
> Key: ZOOKEEPER-2680
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2680
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.9, 3.5.1
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2680-01.patch
>
>
> The DataNode.getChildren() API returns either null or an empty set when a 
> node has no children, depending on when the API is called. The API's behavior 
> should be changed so that it always returns an empty set if the node does 
> not have any children.
> *DataNode.getChildren() API Current Behavior:*
> # returns null initially:
> when a DataNode is created and no children have been added yet, 
> DataNode.getChildren() returns null
> # returns an empty set after all the children are deleted:
> create a node
> add a child
> delete the child
> DataNode.getChildren() returns an empty set.
> After the fix, DataNode.getChildren() should return an empty set in both of 
> the above cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Success: ZOOKEEPER-2680 PreCommit Build #3571

2017-01-26 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2680
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3571/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 489638 lines...]
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 2 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3571//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3571//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3571//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 5c27f3deaa1f8201df04bad536b334004f3d69d3 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/patchprocess' 
and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/patchprocess' 
are the same file

BUILD SUCCESSFUL
Total time: 21 minutes 1 second
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2680
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2464) NullPointerException on ContainerManager

2017-01-26 Thread Mohammad Arshad (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842312#comment-15842312
 ] 

Mohammad Arshad commented on ZOOKEEPER-2464:


[~randgalt], I created ZOOKEEPER-2680. 
Once ZOOKEEPER-2680 is fixed, this issue will be fixed automatically.

> NullPointerException on ContainerManager
> 
>
> Key: ZOOKEEPER-2464
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2464
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
>Reporter: Stefano Salmaso
>Assignee: Jordan Zimmerman
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ContainerManagerTest.java, ZOOKEEPER-2464.patch
>
>
> I would like to describe a problem that we are experiencing.
> We are using a cluster of 7 ZooKeeper servers, which we use to implement a 
> distributed lock with Curator 
> (http://curator.apache.org/curator-recipes/shared-reentrant-lock.html).
> We played with the servers to check that everything worked properly, 
> stopping and starting them to see that the system stayed healthy
> (like stop 03, stop 05, stop 06, start 05, start 06, start 03).
> We saw a strange behavior:
> the number of znodes grew without stopping (normally we had 4,000 or 5,000; 
> we got to 60,000 and then we stopped our application).
> In the ZooKeeper logs I saw this (on the leader only, once every minute):
> 2016-07-04 14:53:50,302 [myid:7] - ERROR 
> [ContainerManagerTask:ContainerManager$1@84] - Error checking containers
> java.lang.NullPointerException
>at 
> org.apache.zookeeper.server.ContainerManager.getCandidates(ContainerManager.java:151)
>at 
> org.apache.zookeeper.server.ContainerManager.checkContainers(ContainerManager.java:111)
>at 
> org.apache.zookeeper.server.ContainerManager$1.run(ContainerManager.java:78)
>at java.util.TimerThread.mainLoop(Timer.java:555)
>at java.util.TimerThread.run(Timer.java:505)
> We have not yet deleted the data ... so the problem can be reproduced on our 
> servers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2680) Correct DataNode.getChildren() inconsistent behavior.

2017-01-26 Thread Mohammad Arshad (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-2680:
---
Fix Version/s: 3.6.0
   3.5.3
   3.4.10

> Correct DataNode.getChildren() inconsistent behavior.
> -
>
> Key: ZOOKEEPER-2680
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2680
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.9, 3.5.1
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2680-01.patch
>
>
> The DataNode.getChildren() API returns either null or an empty set when a 
> node has no children, depending on when the API is called. The API's behavior 
> should be changed so that it always returns an empty set if the node does 
> not have any children.
> *DataNode.getChildren() API Current Behavior:*
> # returns null initially:
> when a DataNode is created and no children have been added yet, 
> DataNode.getChildren() returns null
> # returns an empty set after all the children are deleted:
> create a node
> add a child
> delete the child
> DataNode.getChildren() returns an empty set.
> After the fix, DataNode.getChildren() should return an empty set in both of 
> the above cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2680) Correct DataNode.getChildren() inconsistent behavior.

2017-01-26 Thread Mohammad Arshad (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-2680:
---
Attachment: ZOOKEEPER-2680-01.patch

> Correct DataNode.getChildren() inconsistent behavior.
> -
>
> Key: ZOOKEEPER-2680
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2680
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.9, 3.5.1
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
> Attachments: ZOOKEEPER-2680-01.patch
>
>
> The DataNode.getChildren() API returns either null or an empty set when a 
> node has no children, depending on when the API is called. The API's behavior 
> should be changed so that it always returns an empty set if the node does 
> not have any children.
> *DataNode.getChildren() API Current Behavior:*
> # returns null initially:
> when a DataNode is created and no children have been added yet, 
> DataNode.getChildren() returns null
> # returns an empty set after all the children are deleted:
> create a node
> add a child
> delete the child
> DataNode.getChildren() returns an empty set.
> After the fix, DataNode.getChildren() should return an empty set in both of 
> the above cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2680) Correct DataNode.getChildren() inconsistent behavior.

2017-01-26 Thread Mohammad Arshad (JIRA)
Mohammad Arshad created ZOOKEEPER-2680:
--

 Summary: Correct DataNode.getChildren() inconsistent behavior.
 Key: ZOOKEEPER-2680
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2680
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.1, 3.4.9
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


The DataNode.getChildren() API returns either null or an empty set when a node 
has no children, depending on when the API is called. The API's behavior 
should be changed so that it always returns an empty set if the node does not 
have any children.

*DataNode.getChildren() API Current Behavior:*
# returns null initially:
when a DataNode is created and no children have been added yet, 
DataNode.getChildren() returns null
# returns an empty set after all the children are deleted:
create a node
add a child
delete the child
DataNode.getChildren() returns an empty set.

After the fix, DataNode.getChildren() should return an empty set in both of the 
above cases.
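
A minimal sketch of the proposed behavior (a hypothetical simplification, not 
the attached patch; the real DataNode also carries data, ACL, and stat state):

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical simplification of org.apache.zookeeper.server.DataNode.
public class DataNode {
    // Stays null until the first child is added; this is what leaks out today.
    private Set<String> children = null;

    public synchronized void addChild(String child) {
        if (children == null) {
            children = new HashSet<String>();
        }
        children.add(child);
    }

    public synchronized void removeChild(String child) {
        if (children != null) {
            children.remove(child);
        }
    }

    // Proposed behavior: normalize the internal null so callers never see it,
    // whether or not a child was ever added.
    public synchronized Set<String> getChildren() {
        if (children == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(children);
    }
}
{code}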



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2044) CancelledKeyException in zookeeper 3.4.5

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15841038#comment-15841038
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2044:
---

Github user rakeshadr commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/156#discussion_r98146418
  
--- Diff: src/java/test/org/apache/zookeeper/server/NIOServerCnxnTest.java 
---
@@ -68,4 +69,41 @@ public void testOperationsAfterCnxnClose() throws 
IOException,
 zk.close();
 }
 }
+
+/**
+ * Mock extension of NIOServerCnxn to test for
+ * CancelledKeyException (ZOOKEEPER-2044).
+ */
+private static class MockNIOServerCnxn extends NIOServerCnxn {
+public MockNIOServerCnxn(NIOServerCnxn cnxn)
+throws IOException {
+super(cnxn.zkServer, cnxn.sock, cnxn.sk, cnxn.factory);
+}
+
+public void mockSendBuffer(ByteBuffer bb) throws Exception {
+super.internalSendBuffer(bb);
+}
+}
+
+@Test(timeout = 3)
+public void testValidSelectionKey() throws Exception {
+int oldTimeout = ClientBase.CONNECTION_TIMEOUT;
+ClientBase.CONNECTION_TIMEOUT = 3000;
+final ZooKeeper zk = createClient();
--- End diff --

Thanks @hanm for the analysis and the fix. Instead of directly changing 
the static value, how about simplifying the ZooKeeper client creation like 
below:

``final ZooKeeper zk = createZKClient(hostPort, 3000);``


> CancelledKeyException in zookeeper 3.4.5
> 
>
> Key: ZOOKEEPER-2044
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2044
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
> Environment: Red Hat Enterprise Linux Server release 6.2
>Reporter: shamjith antholi
>Assignee: Flavio Junqueira
>Priority: Minor
> Fix For: 3.4.10
>
> Attachments: ZOOKEEPER-2044.patch, ZOOKEEPER-2044.patch
>
>
> I am getting a CancelledKeyException in ZooKeeper (version 3.4.5). Please see 
> the log below. When this error is thrown, the connected Solr shard goes 
> down with the error "Failed to index metadata in 
> Solr,StackTrace=SolrError: HTTP status 503.Reason: 
> {"responseHeader":{"status":503,"QTime":204},"error":{"msg":"ClusterState 
> says we are the leader, but locally we don't think so","code":503"  and 
> ultimately the current activity goes down. Could you please suggest a 
> solution for this?
> ZooKeeper log 
> --
> 2014-09-16 02:58:47,799 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
> attempting to renew session 0x24868e7ca980003 at /172.22.0.5:58587
> 2014-09-16 02:58:47,800 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating 
> client: 0x24868e7ca980003
> 2014-09-16 02:58:47,802 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@588] - Invalid 
> session 0x24868e7ca980003 for client /172.22.0.5:58587, probably expired
> 2014-09-16 02:58:47,803 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed 
> socket connection for client /172.22.0.5:58587 which had sessionid 
> 0x24868e7ca980003
> 2014-09-16 02:58:47,810 [myid:1] - ERROR 
> [CommitProcessor:1:NIOServerCnxn@180] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
> at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
> at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1113)
> at org.apache.zookeeper.server.DataTree.setWatches(DataTree.java:1327)
> at 
> org.apache.zookeeper.server.ZKDatabase.setWatches(ZKDatabase.java:384)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:304)
> at 
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] zookeeper pull request #156: ZOOKEEPER-2044:CancelledKeyException in zookeep...

2017-01-26 Thread rakeshadr
Github user rakeshadr commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/156#discussion_r98146418
  
--- Diff: src/java/test/org/apache/zookeeper/server/NIOServerCnxnTest.java 
---
@@ -68,4 +69,41 @@ public void testOperationsAfterCnxnClose() throws 
IOException,
 zk.close();
 }
 }
+
+/**
+ * Mock extension of NIOServerCnxn to test for
+ * CancelledKeyException (ZOOKEEPER-2044).
+ */
+private static class MockNIOServerCnxn extends NIOServerCnxn {
+public MockNIOServerCnxn(NIOServerCnxn cnxn)
+throws IOException {
+super(cnxn.zkServer, cnxn.sock, cnxn.sk, cnxn.factory);
+}
+
+public void mockSendBuffer(ByteBuffer bb) throws Exception {
+super.internalSendBuffer(bb);
+}
+}
+
+@Test(timeout = 3)
+public void testValidSelectionKey() throws Exception {
+int oldTimeout = ClientBase.CONNECTION_TIMEOUT;
+ClientBase.CONNECTION_TIMEOUT = 3000;
+final ZooKeeper zk = createClient();
--- End diff --

Thanks @hanm for the analysis and the fix. Instead of directly changing 
the static value, how about simplifying the ZooKeeper client creation like 
below:

``final ZooKeeper zk = createZKClient(hostPort, 3000);``


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (ZOOKEEPER-2464) NullPointerException on ContainerManager

2017-01-26 Thread Jordan Zimmerman (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15841002#comment-15841002
 ] 

Jordan Zimmerman commented on ZOOKEEPER-2464:
-

[~arshad.mohammad] IMO it should be a separate issue. 

> NullPointerException on ContainerManager
> 
>
> Key: ZOOKEEPER-2464
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2464
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
>Reporter: Stefano Salmaso
>Assignee: Jordan Zimmerman
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ContainerManagerTest.java, ZOOKEEPER-2464.patch
>
>
> I would like to describe a problem that we are experiencing.
> We are using a cluster of 7 ZooKeeper servers, which we use to implement a 
> distributed lock with Curator 
> (http://curator.apache.org/curator-recipes/shared-reentrant-lock.html).
> We played with the servers to check that everything worked properly, 
> stopping and starting them to see that the system stayed healthy
> (like stop 03, stop 05, stop 06, start 05, start 06, start 03).
> We saw a strange behavior:
> the number of znodes grew without stopping (normally we had 4,000 or 5,000; 
> we got to 60,000 and then we stopped our application).
> In the ZooKeeper logs I saw this (on the leader only, once every minute):
> 2016-07-04 14:53:50,302 [myid:7] - ERROR 
> [ContainerManagerTask:ContainerManager$1@84] - Error checking containers
> java.lang.NullPointerException
>at 
> org.apache.zookeeper.server.ContainerManager.getCandidates(ContainerManager.java:151)
>at 
> org.apache.zookeeper.server.ContainerManager.checkContainers(ContainerManager.java:111)
>at 
> org.apache.zookeeper.server.ContainerManager$1.run(ContainerManager.java:78)
>at java.util.TimerThread.mainLoop(Timer.java:555)
>at java.util.TimerThread.run(Timer.java:505)
> We have not yet deleted the data ... so the problem can be reproduced on our 
> servers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2464) NullPointerException on ContainerManager

2017-01-26 Thread Jordan Zimmerman (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15841000#comment-15841000
 ] 

Jordan Zimmerman commented on ZOOKEEPER-2464:
-

[~eribeiro] - I think a test is too much for a 1-line change

> NullPointerException on ContainerManager
> 
>
> Key: ZOOKEEPER-2464
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2464
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
>Reporter: Stefano Salmaso
>Assignee: Jordan Zimmerman
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ContainerManagerTest.java, ZOOKEEPER-2464.patch
>
>
> I would like to describe a problem that we are experiencing.
> We are using a cluster of 7 ZooKeeper servers, which we use to implement a 
> distributed lock with Curator 
> (http://curator.apache.org/curator-recipes/shared-reentrant-lock.html).
> We played with the servers to check that everything worked properly, 
> stopping and starting them to see that the system stayed healthy
> (like stop 03, stop 05, stop 06, start 05, start 06, start 03).
> We saw a strange behavior:
> the number of znodes grew without stopping (normally we had 4,000 or 5,000; 
> we got to 60,000 and then we stopped our application).
> In the ZooKeeper logs I saw this (on the leader only, once every minute):
> 2016-07-04 14:53:50,302 [myid:7] - ERROR 
> [ContainerManagerTask:ContainerManager$1@84] - Error checking containers
> java.lang.NullPointerException
>at 
> org.apache.zookeeper.server.ContainerManager.getCandidates(ContainerManager.java:151)
>at 
> org.apache.zookeeper.server.ContainerManager.checkContainers(ContainerManager.java:111)
>at 
> org.apache.zookeeper.server.ContainerManager$1.run(ContainerManager.java:78)
>at java.util.TimerThread.mainLoop(Timer.java:555)
>at java.util.TimerThread.run(Timer.java:505)
> We have not yet deleted the data ... so the problem can be reproduced on our 
> servers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2125) SSL on Netty client-server communication

2017-01-26 Thread Shivam (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840986#comment-15840986
 ] 

Shivam commented on ZOOKEEPER-2125:
---

Can this fix be backported to the last stable release, 3.4.9?

> SSL on Netty client-server communication
> 
>
> Key: ZOOKEEPER-2125
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2125
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Hongchao Deng
>Assignee: Hongchao Deng
> Fix For: 3.5.1, 3.6.0
>
> Attachments: testKeyStore.jks, testTrustStore.jks, 
> ZOOKEEPER-2125-build.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch
>
>
> This patch adds support for SSL on Netty client-server communication. 
> 1. It supports keystore and truststore usage. 
> 2. It adds an additional ZK server port which supports SSL. This would be 
> useful for rolling upgrades.
> RB: https://reviews.apache.org/r/31277/
> The patch includes three files: 
> * a testing-purpose keystore and truststore under 
> "$(ZK_REPO_HOME)/src/java/test/data/ssl". You might need to create "ssl/".
> * latest ZOOKEEPER-2125.patch
> h2. How to use it
> You need to set some parameters on both ZK server and client.
> h3. Server
> You need to specify a listening SSL port in "zoo.cfg":
> {code}
> secureClientPort=2281
> {code}
> Just like what you did with "clientPort". And then set some jvm flags:
> {code}
> export 
> SERVER_JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
>  -Dzookeeper.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks 
> -Dzookeeper.ssl.keyStore.password=testpass 
> -Dzookeeper.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks 
> -Dzookeeper.ssl.trustStore.password=testpass"
> {code}
> Please change keystore and truststore parameters accordingly.
> h3. Client
> You need to set jvm flags:
> {code}
> export 
> CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
>  -Dzookeeper.client.secure=true 
> -Dzookeeper.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks 
> -Dzookeeper.ssl.keyStore.password=testpass 
> -Dzookeeper.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks 
> -Dzookeeper.ssl.trustStore.password=testpass"
> {code}
> Change keystore and truststore parameters accordingly.
> And then connect to the server's SSL port, in this case:
> {code}
> bin/zkCli.sh -server 127.0.0.1:2281
> {code}
> If you have any feedback, you are more than welcome to discuss it here!
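
For completeness, a minimal client sketch under the assumption that the 
CLIENT_JVMFLAGS above map one-to-one onto system properties (paths, password, 
and the no-op watcher are placeholders):

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class SecureClientSketch {
    public static void main(String[] args) throws Exception {
        // The same settings as CLIENT_JVMFLAGS, set programmatically.
        System.setProperty("zookeeper.clientCnxnSocket",
                "org.apache.zookeeper.ClientCnxnSocketNetty");
        System.setProperty("zookeeper.client.secure", "true");
        System.setProperty("zookeeper.ssl.keyStore.location",
                "/root/zookeeper/ssl/testKeyStore.jks");
        System.setProperty("zookeeper.ssl.keyStore.password", "testpass");
        System.setProperty("zookeeper.ssl.trustStore.location",
                "/root/zookeeper/ssl/testTrustStore.jks");
        System.setProperty("zookeeper.ssl.trustStore.password", "testpass");

        // Connect to the secureClientPort configured in zoo.cfg.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2281", 30000, new Watcher() {
            public void process(WatchedEvent event) { /* no-op */ }
        });
        // ... use zk ...
        zk.close();
    }
}
{code}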



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2659) Use log4j2 as a logging framework as log4j 1.X is now deprecated

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840802#comment-15840802
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2659:
---

Github user praste commented on the issue:

https://github.com/apache/zookeeper/pull/148
  
Are you sure you removed the reference to `slf4j-log4j12` from all the 
`ivy.xml` files?
I am not an ivy expert, but you can take a look at
http://stackoverflow.com/questions/5405310/find-hidden-dependencies-in-ivy 
and
http://ant.apache.org/ivy/history/latest-milestone/use/dependencytree.html


> Use log4j2 as a logging framework as log4j 1.X is now deprecated
> 
>
> Key: ZOOKEEPER-2659
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2659
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Pushkar Raste
>Assignee: Pushkar Raste
>Priority: Minor
> Attachments: zk_log4j2_migration.patch
>
>
> ZooKeeper currently uses {{log4j 1.X}} as the default logging framework. 
> {{log4j 1.X}} is now deprecated: http://logging.apache.org/log4j/1.2/
> This ticket is to track the effort to move ZooKeeper to {{log4j2}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] zookeeper issue #148: ZOOKEEPER-2659 Log4j 2 migration

2017-01-26 Thread praste
Github user praste commented on the issue:

https://github.com/apache/zookeeper/pull/148
  
Are you sure you removed the reference to `slf4j-log4j12` from all the 
`ivy.xml` files?
I am not an ivy expert, but you can take a look at
http://stackoverflow.com/questions/5405310/find-hidden-dependencies-in-ivy 
and
http://ant.apache.org/ivy/history/latest-milestone/use/dependencytree.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Failed: ZOOKEEPER- PreCommit Build #256

2017-01-26 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/256/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 25 lines...]
  Getting sizes
Done: 9
  Compressing objects
Done: 0
  Writing objects
Done: 10
  remote: Updating references
Merging refs/tags/changes/256
 > git rev-parse refs/tags/changes/256^{commit} # timeout=10
 > git merge 831e560a9396f021b9d77f2127b4a294d7cc8638 # timeout=10
 > git rev-parse branch-3.4^{commit} # timeout=10
Checking out Revision 74d5f228bc28391195e242b99f5c63f77ac12080 (branch-3.4)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 74d5f228bc28391195e242b99f5c63f77ac12080
 > git rev-parse origin/branch-3.4^{commit} # timeout=10
 > git rev-list d6bbfd76d24c044073764c5d074a9198c69fafab # timeout=10
No emails were triggered.
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[PreCommit-ZOOKEEPER-github-pr-build] $ /bin/bash 
/tmp/hudson9036623794741036736.sh
/home/jenkins/tools/java/latest1.7/bin/java
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 386177
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 6
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 10240
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited
Buildfile: 
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml

BUILD FAILED
Target "qa-test-pullrequest" does not exist in the project "ZooKeeper". 

Total time: 0 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
[description-setter] Could not determine description.
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (ZOOKEEPER-2044) CancelledKeyException in zookeeper 3.4.5

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840707#comment-15840707
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2044:
---

Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/156#discussion_r98120457
  
--- Diff: src/java/test/org/apache/zookeeper/server/NIOServerCnxnTest.java 
---
@@ -68,4 +69,38 @@ public void testOperationsAfterCnxnClose() throws 
IOException,
 zk.close();
 }
 }
+
+/**
+ * Mock extension of NIOServerCnxn to test for
+ * CancelledKeyException (ZOOKEEPER-2044).
+ */
+private static class MockNIOServerCnxn extends NIOServerCnxn {
+public MockNIOServerCnxn(NIOServerCnxn cnxn)
+throws IOException {
+super(cnxn.zkServer, cnxn.sock, cnxn.sk, cnxn.factory);
+}
+
+public void mockSendBuffer(ByteBuffer bb) throws Exception {
+super.internalSendBuffer(bb);
+}
+}
+
+@Test(timeout = 3)
+public void testValidSelectionKey() throws Exception {
+final ZooKeeper zk = createClient();
+try {
+Iterable<ServerCnxn> connections = 
serverFactory.getConnections();
+for (ServerCnxn serverCnxn : connections) {
+MockNIOServerCnxn mock = new 
MockNIOServerCnxn((NIOServerCnxn) serverCnxn);
+// Cancel the connection's selection key
+((NIOServerCnxn) 
serverCnxn).sock.keyFor(((NIOServerCnxnFactory) 
serverFactory).selector).cancel();
+mock.mockSendBuffer(ByteBuffer.allocate(8));
+}
+} catch (CancelledKeyException e) {
+LOG.error("Exception while sending bytes!", e);
+Assert.fail(e.toString());
+} finally {
+zk.close();
--- End diff --

@rakeshadr Good observation on the long running time of the test. This is 
definitely something we should fix. The actual delay indeed happens at client 
close, and the root cause is the session timeout: when a client closes itself, 
it sends a request to the server, and in our case this request packet is stuck 
forever because the server has cancelled the selector key; so the client 
session will eventually expire. By default, the timeout between client and 
server is 30 sec, and 2/3 of it - 20 sec - is exactly what it costs for a 
heartbeat to fail. I fixed this by adjusting the timeout value to 3 sec, just 
for this single test. PTAL.
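
In outline, that fix is the usual save/override/restore pattern around the 
static timeout (a sketch only; createClient() and ClientBase.CONNECTION_TIMEOUT 
come from the surrounding test class):

{code}
// Sketch: shrink the connection timeout for this one test so closing a
// connection whose selector key was cancelled fails fast, then restore
// the default so other tests are unaffected.
int oldTimeout = ClientBase.CONNECTION_TIMEOUT;
ClientBase.CONNECTION_TIMEOUT = 3000; // 3 s instead of the ~30 s default
try {
    final ZooKeeper zk = createClient();
    // ... cancel the selection key and exercise sendBuffer ...
    zk.close();
} finally {
    ClientBase.CONNECTION_TIMEOUT = oldTimeout;
}
{code}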


> CancelledKeyException in zookeeper 3.4.5
> 
>
> Key: ZOOKEEPER-2044
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2044
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
> Environment: Red Hat Enterprise Linux Server release 6.2
>Reporter: shamjith antholi
>Assignee: Flavio Junqueira
>Priority: Minor
> Fix For: 3.4.10
>
> Attachments: ZOOKEEPER-2044.patch, ZOOKEEPER-2044.patch
>
>
> I am getting a CancelledKeyException in ZooKeeper (version 3.4.5). Please see 
> the log below. When this error is thrown, the connected Solr shard goes 
> down with the error "Failed to index metadata in 
> Solr,StackTrace=SolrError: HTTP status 503.Reason: 
> {"responseHeader":{"status":503,"QTime":204},"error":{"msg":"ClusterState 
> says we are the leader, but locally we don't think so","code":503"  and 
> ultimately the current activity goes down. Could you please suggest a 
> solution for this?
> ZooKeeper log 
> --
> 2014-09-16 02:58:47,799 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
> attempting to renew session 0x24868e7ca980003 at /172.22.0.5:58587
> 2014-09-16 02:58:47,800 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating 
> client: 0x24868e7ca980003
> 2014-09-16 02:58:47,802 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@588] - Invalid 
> session 0x24868e7ca980003 for client /172.22.0.5:58587, probably expired
> 2014-09-16 02:58:47,803 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed 
> socket connection for client /172.22.0.5:58587 which had sessionid 
> 0x24868e7ca980003
> 2014-09-16 02:58:47,810 [myid:1] - ERROR 
> [CommitProcessor:1:NIOServerCnxn@180] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
> at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
> at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
> at 
> 

[GitHub] zookeeper pull request #156: ZOOKEEPER-2044:CancelledKeyException in zookeep...

2017-01-26 Thread hanm
Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/156#discussion_r98120457
  
--- Diff: src/java/test/org/apache/zookeeper/server/NIOServerCnxnTest.java 
---
@@ -68,4 +69,38 @@ public void testOperationsAfterCnxnClose() throws 
IOException,
 zk.close();
 }
 }
+
+/**
+ * Mock extension of NIOServerCnxn to test for
+ * CancelledKeyException (ZOOKEEPER-2044).
+ */
+private static class MockNIOServerCnxn extends NIOServerCnxn {
+public MockNIOServerCnxn(NIOServerCnxn cnxn)
+throws IOException {
+super(cnxn.zkServer, cnxn.sock, cnxn.sk, cnxn.factory);
+}
+
+public void mockSendBuffer(ByteBuffer bb) throws Exception {
+super.internalSendBuffer(bb);
+}
+}
+
+@Test(timeout = 3)
+public void testValidSelectionKey() throws Exception {
+final ZooKeeper zk = createClient();
+try {
+Iterable<ServerCnxn> connections = 
serverFactory.getConnections();
+for (ServerCnxn serverCnxn : connections) {
+MockNIOServerCnxn mock = new 
MockNIOServerCnxn((NIOServerCnxn) serverCnxn);
+// Cancel the connection's selection key
+((NIOServerCnxn) 
serverCnxn).sock.keyFor(((NIOServerCnxnFactory) 
serverFactory).selector).cancel();
+mock.mockSendBuffer(ByteBuffer.allocate(8));
+}
+} catch (CancelledKeyException e) {
+LOG.error("Exception while sending bytes!", e);
+Assert.fail(e.toString());
+} finally {
+zk.close();
--- End diff --

@rakeshadr Good observation on the long running time of the test. This is 
definitely something we should fix. The actual delay indeed happens at client 
close, and the root cause is the session timeout: when a client closes itself, 
it sends a request to the server, and in our case this request packet is stuck 
forever because the server has cancelled the selector key; so the client 
session will eventually expire. By default, the timeout between client and 
server is 30 sec, and 2/3 of it - 20 sec - is exactly what it costs for a 
heartbeat to fail. I fixed this by adjusting the timeout value to 3 sec, just 
for this single test. PTAL.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Paper

2017-01-26 Thread Patrick Hunt
On Thu, Jan 26, 2017 at 1:14 PM, Bobby Evans  wrote:

> We did think about ram disks a little, but the plan still is to have the
> source code, in one form or another, morph into a small distributed time
> series database for the metrics.  With that in mind we thought it would be
> better to take a step in that direction.  Yes a ram disk would likely have
> provided similar performance.  Although we would still have wanted to
> separate out just the metrics to the ram-disk-backed ZK, because we store
> other, more critical data in ZK too that needs stronger durability guarantees.
>
>
Indeed, that was my thinking as well - two ZKs, one with "ephemeral" data
and one with precious data. Your response makes sense and was expected; I only 
asked because I didn't see it mentioned in the document, and for many folks 
it's a good, if perhaps short-term, solution.

Regards,

Patrick


> - Bobby
>
>
> On Thursday, January 26, 2017, 10:36:01 AM CST, Patrick Hunt <
> ph...@apache.org> wrote:
> Very interesting results and real world insights. Thanks for
> creating/sharing.
>
> One thing I noticed is that you mentioned considering SSDs, had you also
> considered using ram disks? I've seen some scenarios where that has been
> very successful.
>
> Patrick
>
> On Thu, Jan 26, 2017 at 6:28 AM, Bobby Evans 
> wrote:
>
> > As one of the authors of Pacemaker in Apache Storm (and the paper), I am
> > happy to answer any questions about why we did it or how it works.  The
> > reality of it is that Storm was, and still is by default, abusing ZooKeeper
> > by trying to store a massive amount of metrics in it, instead of the
> > configuration/coordination it was designed for. And since Storm metrics
> > don't really need strong consistency, or even that much in terms of
> > reliability guarantees, we stood up a Netty server in front of a
> > ConcurrentHashMap (quite literally) and then wrote a client that could
> > handle fail-over.
> > It really is meant as a scalability stepping stone until we can get to the
> > point that all the metrics go to a TSDB that is actually designed for
> > metrics. But like I said, if you have any questions I am happy to answer
> > them.
> > Sadly, because of the way IEEE works, neither I nor my employer own the
> > copyright to that paper any more, so I can't even put a copy of it up for
> > you to read.
> >
> >
> > - Bobby
> >
> > On Thursday, January 26, 2017, 6:44:56 AM CST, ibrahim El-sanosi <
> > ibrahimsaba...@gmail.com> wrote:
> > Hi folks,
> >
> > There is a paper published recently, "PaceMaker: When ZooKeeper Arteries
> > Get Clogged in Storm Clusters" [1]. It may be worth reading.
> >
> > [1]
> > http://ieeexplore.ieee.org/document/7820303/
> >
> > Ibrahim
> >
>


Re: Paper

2017-01-26 Thread Bobby Evans
We did think about ram disks a little, but the plan still is to have the source 
code, in one form or another, morph into a small distributed time series 
database for the metrics.  With that in mind we thought it would be better to 
take a step in that direction.  Yes, a ram disk would likely have provided 
similar performance.  Although we would still have wanted to separate out just 
the metrics to the ram-disk-backed ZK, because we store other, more critical 
data in ZK too that needs stronger durability guarantees.

- Bobby

On Thursday, January 26, 2017, 10:36:01 AM CST, Patrick Hunt  
wrote:
Very interesting results and real world insights. Thanks for
creating/sharing.

One thing I noticed is that you mentioned considering SSDs, had you also
considered using ram disks? I've seen some scenarios where that has been
very successful.

Patrick

On Thu, Jan 26, 2017 at 6:28 AM, Bobby Evans 
wrote:

> As one of the authors of Pacemaker in Apache Storm (and the paper), I am
> happy to answer any questions about why we did it or how it works.  The
> reality of it is that Storm was, and still is by default, abusing ZooKeeper by
> trying to store a massive amount of metrics in it, instead of the
> configuration/coordination it was designed for. And since Storm metrics
> don't really need strong consistency, or even that much in terms of
> reliability guarantees, we stood up a Netty server in front of a
> ConcurrentHashMap (quite literally) and then wrote a client that could
> handle fail-over.
> It really is meant as a scalability stepping stone until we can get to the
> point that all the metrics go to a TSDB that is actually designed for
> metrics. But like I said, if you have any questions I am happy to answer
> them.
> Sadly, because of the way IEEE works, neither I nor my employer own the
> copyright to that paper any more, so I can't even put a copy of it up for you
> to read.
>
>
> - Bobby
>
> On Thursday, January 26, 2017, 6:44:56 AM CST, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
> Hi folks,
>
> There is a paper published recently, "PaceMaker: When ZooKeeper Arteries Get
> Clogged in Storm Clusters" [1]. It may be worth reading.
>
> [1]
> http://ieeexplore.ieee.org/document/7820303/
>
> Ibrahim
>
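
For readers following along, a minimal sketch of the core Bobby describes 
(hypothetical names; the Netty transport and the fail-over client are 
omitted) - a weakly consistent, purely in-memory store in place of durable 
znodes:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the Pacemaker core: metrics live only in memory,
// last write wins, and nothing is replicated or persisted.
public class InMemoryMetricsStore {
    private final ConcurrentMap<String, byte[]> store =
            new ConcurrentHashMap<String, byte[]>();

    public void put(String path, byte[] payload) {
        store.put(path, payload);
    }

    public byte[] get(String path) {
        return store.get(path);
    }

    public void delete(String path) {
        store.remove(path);
    }
}
{code}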


[jira] [Commented] (ZOOKEEPER-2464) NullPointerException on ContainerManager

2017-01-26 Thread Edward Ribeiro (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840464#comment-15840464
 ] 

Edward Ribeiro commented on ZOOKEEPER-2464:
---

Yeah... makes sense; it's pretty inconsistent behavior. It would require some 
defensive code, as {{DataNode.setChildren(null)}} could introduce the null 
again. In fact, it amounts to a total refactoring of {{DataNode}}, albeit a 
small class. Wdyt [~randgalt]? 
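
A sketch of the defensive code being discussed, as it might look inside 
{{DataNode}} (hypothetical, assuming a setter that takes the children set):

{code}
// Hypothetical defensive setter: normalize null so the invariant
// "children is never null" cannot be broken by callers.
public synchronized void setChildren(HashSet<String> children) {
    this.children = (children == null) ? new HashSet<String>() : children;
}
{code}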

> NullPointerException on ContainerManager
> 
>
> Key: ZOOKEEPER-2464
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2464
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
>Reporter: Stefano Salmaso
>Assignee: Jordan Zimmerman
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ContainerManagerTest.java, ZOOKEEPER-2464.patch
>
>
> I would like to describe a problem that we are experiencing.
> We are using a cluster of 7 ZooKeeper servers, which we use to implement a 
> distributed lock with Curator 
> (http://curator.apache.org/curator-recipes/shared-reentrant-lock.html).
> We played with the servers to check that everything worked properly, 
> stopping and starting them to see that the system stayed healthy
> (like stop 03, stop 05, stop 06, start 05, start 06, start 03).
> We saw a strange behavior:
> the number of znodes grew without stopping (normally we had 4,000 or 5,000; 
> we got to 60,000 and then we stopped our application).
> In the ZooKeeper logs I saw this (on the leader only, once every minute):
> 2016-07-04 14:53:50,302 [myid:7] - ERROR 
> [ContainerManagerTask:ContainerManager$1@84] - Error checking containers
> java.lang.NullPointerException
>at 
> org.apache.zookeeper.server.ContainerManager.getCandidates(ContainerManager.java:151)
>at 
> org.apache.zookeeper.server.ContainerManager.checkContainers(ContainerManager.java:111)
>at 
> org.apache.zookeeper.server.ContainerManager$1.run(ContainerManager.java:78)
>at java.util.TimerThread.mainLoop(Timer.java:555)
>at java.util.TimerThread.run(Timer.java:505)
> We have not yet deleted the data ... so the problem can be reproduced on our 
> servers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2464) NullPointerException on ContainerManager

2017-01-26 Thread Mohammad Arshad (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840421#comment-15840421
 ] 

Mohammad Arshad commented on ZOOKEEPER-2464:


The root cause of the problem is the inconsistent behavior of the 
DataNode.getChildren() API.
DataNode.getChildren() API current behavior:
# returns null initially:
when a DataNode is created and no children have been added yet, 
DataNode.getChildren() returns null
# returns an empty set after all the children are deleted:
create a node
add a child
delete the child
DataNode.getChildren() returns an empty set.

I think we should fix this issue by modifying the DataNode.getChildren() API 
to always return an empty set if there is no child.
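
To see where the null surfaces, a minimal sketch of the failing path (a 
hypothetical simplification of ContainerManager.getCandidates(); names are 
illustrative):

{code}
// Hypothetical simplification: a container node that was created but never
// given a child returns null from getChildren() before the fix.
Set<String> candidates = new HashSet<String>();
for (String containerPath : zkDb.getDataTree().getContainers()) {
    DataNode node = zkDb.getDataTree().getNode(containerPath);
    if (node != null) {
        Set<String> children = node.getChildren(); // null before the fix
        if (children.isEmpty()) {                  // NullPointerException here
            candidates.add(containerPath);
        }
    }
}
{code}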

> NullPointerException on ContainerManager
> 
>
> Key: ZOOKEEPER-2464
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2464
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
>Reporter: Stefano Salmaso
>Assignee: Jordan Zimmerman
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ContainerManagerTest.java, ZOOKEEPER-2464.patch
>
>
> I would like to describe a problem that we are experiencing.
> We are using a cluster of 7 ZooKeeper servers, which we use to implement a 
> distributed lock with Curator 
> (http://curator.apache.org/curator-recipes/shared-reentrant-lock.html).
> We played with the servers to check that everything worked properly, 
> stopping and starting them to see that the system stayed healthy
> (like stop 03, stop 05, stop 06, start 05, start 06, start 03).
> We saw a strange behavior:
> the number of znodes grew without stopping (normally we had 4,000 or 5,000; 
> we got to 60,000 and then we stopped our application).
> In the ZooKeeper logs I saw this (on the leader only, once every minute):
> 2016-07-04 14:53:50,302 [myid:7] - ERROR 
> [ContainerManagerTask:ContainerManager$1@84] - Error checking containers
> java.lang.NullPointerException
>at 
> org.apache.zookeeper.server.ContainerManager.getCandidates(ContainerManager.java:151)
>at 
> org.apache.zookeeper.server.ContainerManager.checkContainers(ContainerManager.java:111)
>at 
> org.apache.zookeeper.server.ContainerManager$1.run(ContainerManager.java:78)
>at java.util.TimerThread.mainLoop(Timer.java:555)
>at java.util.TimerThread.run(Timer.java:505)
> We have not yet deleted the data ... so the problem can be reproduced on our 
> servers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2659) Use log4j2 as a logging framework as log4j 1.X is now deprecated

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840226#comment-15840226
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2659:
---

Github user nerdyyatrice commented on the issue:

https://github.com/apache/zookeeper/pull/148
  
Hi, I tried the same approach and I got a bin place conflict, as some of the 
dependencies in my ivy.xml are still using log4j 1.2. I wonder how I can find 
out which dependency bin placed that, or is there a way to override that?


> Use log4j2 as a logging framework as log4j 1.X is now deprecated
> 
>
> Key: ZOOKEEPER-2659
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2659
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Pushkar Raste
>Assignee: Pushkar Raste
>Priority: Minor
> Attachments: zk_log4j2_migration.patch
>
>
> ZooKeeper currently uses {{log4j 1.X}} as the default logging framework. 
> {{log4j 1.X}} is now deprecated: http://logging.apache.org/log4j/1.2/
> This ticket is to track the effort to move ZooKeeper to {{log4j2}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] zookeeper issue #148: ZOOKEEPER-2659 Log4j 2 migration

2017-01-26 Thread nerdyyatrice
Github user nerdyyatrice commented on the issue:

https://github.com/apache/zookeeper/pull/148
  
Hi, I tried the same approach and I got a bin place conflict, as some of the 
dependencies in my ivy.xml are still using log4j 1.2. I wonder how I can find 
out which dependency bin placed that, or is there a way to override that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (ZOOKEEPER-2659) Use log4j2 as a logging framework as log4j 1.X is now deprecated

2017-01-26 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840125#comment-15840125
 ] 

Pushkar Raste commented on ZOOKEEPER-2659:
--

I am not a committer. 
Can someone take a look? 

> Use log4j2 as a logging framework as log4j 1.X is now deprecated
> 
>
> Key: ZOOKEEPER-2659
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2659
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Pushkar Raste
>Assignee: Pushkar Raste
>Priority: Minor
> Attachments: zk_log4j2_migration.patch
>
>
> ZooKeeper currently uses {{log4j 1.X}} as the default logging framework. 
> {{log4j 1.X}} is now deprecated: http://logging.apache.org/log4j/1.2/
> This ticket is to track the effort to move ZooKeeper to {{log4j2}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


ZooKeeper_branch35_solaris - Build # 411 - Still Failing

2017-01-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_solaris/411/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 468102 lines...]
[junit] 2017-01-26 17:17:16,769 [myid:] - INFO  [main:ClientBase@386] - 
CREATING server instance 127.0.0.1:11222
[junit] 2017-01-26 17:17:16,769 [myid:] - INFO  
[main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s 
sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 
kB direct buffers.
[junit] 2017-01-26 17:17:16,770 [myid:] - INFO  
[main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:11222
[junit] 2017-01-26 17:17:16,771 [myid:] - INFO  [main:ClientBase@361] - 
STARTING server instance 127.0.0.1:11222
[junit] 2017-01-26 17:17:16,771 [myid:] - INFO  [main:ZooKeeperServer@893] 
- minSessionTimeout set to 6000
[junit] 2017-01-26 17:17:16,771 [myid:] - INFO  [main:ZooKeeperServer@902] 
- maxSessionTimeout set to 6
[junit] 2017-01-26 17:17:16,771 [myid:] - INFO  [main:ZooKeeperServer@159] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch35_solaris/build/test/tmp/test5698106632771417782.junit.dir/version-2
 snapdir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch35_solaris/build/test/tmp/test5698106632771417782.junit.dir/version-2
[junit] 2017-01-26 17:17:16,772 [myid:] - INFO  [main:FileSnap@83] - 
Reading snapshot 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch35_solaris/build/test/tmp/test5698106632771417782.junit.dir/version-2/snapshot.b
[junit] 2017-01-26 17:17:16,774 [myid:] - INFO  [main:FileTxnSnapLog@320] - 
Snapshotting: 0xb to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch35_solaris/build/test/tmp/test5698106632771417782.junit.dir/version-2/snapshot.b
[junit] 2017-01-26 17:17:16,775 [myid:] - ERROR [main:ZooKeeperServer@505] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-01-26 17:17:16,775 [myid:] - INFO  
[main:FourLetterWordMain@85] - connecting to 127.0.0.1 11222
[junit] 2017-01-26 17:17:16,776 [myid:] - INFO  
[NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11222:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:53056
[junit] 2017-01-26 17:17:16,777 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@485] - Processing stat command from 
/127.0.0.1:53056
[junit] 2017-01-26 17:17:16,777 [myid:] - INFO  
[NIOWorkerThread-1:StatCommand@49] - Stat command output
[junit] 2017-01-26 17:17:16,777 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@614] - Closed socket connection for client 
/127.0.0.1:53056 (no session established for client)
[junit] 2017-01-26 17:17:16,778 [myid:] - INFO  [main:JMXEnv@228] - 
ensureParent:[InMemoryDataTree, StandaloneServer_port]
[junit] 2017-01-26 17:17:16,779 [myid:] - INFO  [main:JMXEnv@245] - 
expect:InMemoryDataTree
[junit] 2017-01-26 17:17:16,779 [myid:] - INFO  [main:JMXEnv@249] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port11222,name1=InMemoryDataTree
[junit] 2017-01-26 17:17:16,779 [myid:] - INFO  [main:JMXEnv@245] - 
expect:StandaloneServer_port
[junit] 2017-01-26 17:17:16,779 [myid:] - INFO  [main:JMXEnv@249] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port11222
[junit] 2017-01-26 17:17:16,780 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 17885
[junit] 2017-01-26 17:17:16,780 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 24
[junit] 2017-01-26 17:17:16,780 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testQuota
[junit] 2017-01-26 17:17:16,780 [myid:] - INFO  [main:ClientBase@543] - 
tearDown starting
[junit] 2017-01-26 17:17:16,852 [myid:] - INFO  [main:ZooKeeper@1322] - 
Session: 0x12658c14419 closed
[junit] 2017-01-26 17:17:16,852 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for 
session: 0x12658c14419
[junit] 2017-01-26 17:17:16,852 [myid:] - INFO  [main:ClientBase@513] - 
STOPPING server
[junit] 2017-01-26 17:17:16,852 [myid:] - INFO  
[ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - 
ConnnectionExpirerThread interrupted
[junit] 2017-01-26 17:17:16,853 [myid:] - INFO  
[NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] 
- selector thread exitted run method
[junit] 2017-01-26 17:17:16,852 [myid:] - INFO  

[jira] [Commented] (ZOOKEEPER-2659) Use log4j2 as a logging framework as log4j 1.X is now deprecated

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840018#comment-15840018
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2659:
---

Github user jvz commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/148#discussion_r98037086
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -413,13 +418,18 @@ public void testBadPeerAddressInQuorum() throws 
Exception {
 ClientBase.setupTestEnv();
 
 // setup the logger to capture all logs
+LoggerContext loggerContext =  (LoggerContext) 
LogManager.getContext(false);
--- End diff --

Oh sorry, I meant to get back to you on this much sooner. You can merge 
without using it; just thought it would be a less hacky test.


> Use log4j2 as a logging framework as log4j 1.X is now deprecated
> 
>
> Key: ZOOKEEPER-2659
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2659
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Pushkar Raste
>Assignee: Pushkar Raste
>Priority: Minor
> Attachments: zk_log4j2_migration.patch
>
>
> ZooKeeper currently uses {{log4j 1.X}} as the default logging framework. 
> {{log4j 1.X}} is now deprecated: http://logging.apache.org/log4j/1.2/
> This ticket is to track the effort to move ZooKeeper to {{log4j2}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] zookeeper pull request #148: ZOOKEEPER-2659 Log4j 2 migration

2017-01-26 Thread jvz
Github user jvz commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/148#discussion_r98037086
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -413,13 +418,18 @@ public void testBadPeerAddressInQuorum() throws 
Exception {
 ClientBase.setupTestEnv();
 
 // setup the logger to capture all logs
+LoggerContext loggerContext =  (LoggerContext) 
LogManager.getContext(false);
--- End diff --

Oh sorry, I meant to get back to you on this much sooner. You can merge 
without using it; just thought it would be a less hacky test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Paper

2017-01-26 Thread Patrick Hunt
Very interesting results and real world insights. Thanks for
creating/sharing.

One thing I noticed is that you mentioned considering SSDs, had you also
considered using ram disks? I've seen some scenarios where that has been
very successful.

Patrick

On Thu, Jan 26, 2017 at 6:28 AM, Bobby Evans 
wrote:

> As one of the authors of pacemaker in Apache Storm (and the paper), I am
> happy to answer any questions about why we did it or how it works.  The
> reality of it is storm was, and still is by default, abusing zookeeper by
> trying to store a massive amount of metrics in it, instead of the
> configuration/coordination it was designed for. And since storm metrics
> don't really need strong consistency or even that much in terms of
> reliability guarantees, we stood up a netty server in front of a
> ConcurrentHashMap (quite literally) and then wrote a client that could
> handle fail-over.
> It really is meant as a scalability stepping stone until we can get to the
> point that all the metrics go to a TSDB that is actually designed for
> metrics. But like I said if you have any questions I am happy to answer
> them.
> Sadly, because of the way IEEE works, neither I nor my employer owns the
> copyright to that paper any more, so I can't even put a copy of it up for
> you to read.
>
>
> - Bobby
>
> On Thursday, January 26, 2017, 6:44:56 AM CST, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote: Hi folks,
>
> There is a paper published recently, "PaceMaker: When ZooKeeper Arteries Get
> Clogged in Storm Clusters" [1]. It may be worth reading.
>
> [1]
> http://ieeexplore.ieee.org/document/7820303/?tp=;
> arnumber=7820303=Conference%20Publications=
> eWFob28uY29t=SEARCHALERT
>
> Ibrahim
>


Failed: ZOOKEEPER- PreCommit Build #255

2017-01-26 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/255/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 25 lines...]
  Getting sizes
Done: 18
  Compressing objects
Done: 0
  Writing objects
Done: 19
  remote: Updating references
Merging refs/tags/changes/255
 > git rev-parse refs/tags/changes/255^{commit} # timeout=10
 > git merge 5aa25620e0189b28d7040305272be2fda28126fb # timeout=10
 > git rev-parse branch-3.4^{commit} # timeout=10
Checking out Revision 5aa25620e0189b28d7040305272be2fda28126fb (branch-3.4)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5aa25620e0189b28d7040305272be2fda28126fb
 > git rev-parse origin/branch-3.4^{commit} # timeout=10
 > git rev-list d6bbfd76d24c044073764c5d074a9198c69fafab # timeout=10
No emails were triggered.
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[PreCommit-ZOOKEEPER-github-pr-build] $ /bin/bash 
/tmp/hudson9068826468290050932.sh
/home/jenkins/tools/java/latest1.7/bin/java
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 386177
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 6
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 10240
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited
Buildfile: 
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml

BUILD FAILED
Target "qa-test-pullrequest" does not exist in the project "ZooKeeper". 

Total time: 0 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
[description-setter] Could not determine description.
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
No tests ran.

[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-26 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/157

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DB

This patch addresses recovery time when a leader is lost on a large DB.

It does this by not clearing the DB before leader election begins, and by 
avoiding taking a snapshot as part of the SYNC phase, specifically for a DIFF 
sync. The snapshot is avoided by buffering the proposals and commits, just 
like the code currently does for proposals/commits sent after the NEWLEADER 
and before the UPTODATE messages.

If a SNAP is sent we cannot avoid writing out the full snapshot because 
there is no other way to make sure the disk DB is in sync with what is in 
memory.  So any edits to the edit log before a background snapshot happened 
could possibly be applied on top of an incorrect snapshot.

This same optimization should work for TRUNC too, but I opted not to do it 
for TRUNC because TRUNC is rare and TRUNC by its very nature already forces the 
DB to be reread after the edit logs are modified.  So it would still not be 
fast.

In practice, this means that instead of taking 5+ minutes to recover from 
losing a leader, the cluster now takes about 3 seconds.

I am happy to port this to 3.5 if it looks good.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #157


commit 5aa25620e0189b28d7040305272be2fda28126fb
Author: Robert (Bobby) Evans 
Date:   2017-01-19T19:50:32Z

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
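
For illustration, a toy sketch of the buffering idea described in the pull
request above (not the actual patch; Txn, DiffSyncBuffer, and the method names
are invented for this example): proposals/commits received during a DIFF sync
are queued in memory and replayed once UPTODATE arrives, so no snapshot is
needed to checkpoint the sync.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.function.Consumer;

    // Stand-in for a committed proposal received during a DIFF sync.
    final class Txn {
        final long zxid;
        Txn(long zxid) { this.zxid = zxid; }
    }

    final class DiffSyncBuffer {
        private final Deque<Txn> pending = new ArrayDeque<Txn>();

        // During the SYNC phase, queue each transaction instead of applying
        // it immediately and snapshotting afterwards.
        void onCommit(Txn txn) {
            pending.add(txn);
        }

        // When UPTODATE arrives, replay the buffered transactions against the
        // in-memory database. No snapshot is required because the on-disk
        // state was never discarded and the txn log remains authoritative.
        void onUpToDate(Consumer<Txn> applyToDb) {
            while (!pending.isEmpty()) {
                applyToDb.accept(pending.poll());
            }
        }
    }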


[jira] [Commented] (ZOOKEEPER-2678) Large databases take a long time to regain a quorum

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839860#comment-15839860
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2678:
---

GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/157

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DB

This patch addresses recovery time when a leader is lost on a large DB.

It does this by not clearing the DB before leader election begins, and by 
avoiding taking a snapshot as part of the SYNC phase, specifically for a DIFF 
sync. The snapshot is avoided by buffering the proposals and commits, just 
like the code currently does for proposals/commits sent after the NEWLEADER 
and before the UPTODATE messages.

If a SNAP is sent we cannot avoid writing out the full snapshot because 
there is no other way to make sure the disk DB is in sync with what is in 
memory.  So any edits to the edit log before a background snapshot happened 
could possibly be applied on top of an incorrect snapshot.

This same optimization should work for TRUNC too, but I opted not to do it 
for TRUNC because TRUNC is rare and TRUNC by its very nature already forces the 
DB to be reread after the edit logs are modified.  So it would still not be 
fast.

In practice, this means that instead of taking 5+ minutes to recover from 
losing a leader, the cluster now takes about 3 seconds.

I am happy to port this to 3.5 if it looks good.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #157


commit 5aa25620e0189b28d7040305272be2fda28126fb
Author: Robert (Bobby) Evans 
Date:   2017-01-19T19:50:32Z

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs




> Large databases take a long time to regain a quorum
> ---
>
> Key: ZOOKEEPER-2678
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2678
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.9, 3.5.2
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>
> I know this is long but please hear me out.
> I recently inherited a massive zookeeper ensemble.  The snapshot is 3.4 GB on 
> disk.  Because of its massive size we have been running into a number of 
> issues. There are lots of problems that we hope to fix with tuning GC etc, 
> but the big one right now, which is blocking us from making much progress on 
> the rest of them, is that when we lose a quorum because the leader left, for 
> whatever reason, it can take well over 5 mins for a new quorum to be 
> established. So we cannot tune the leader without risking downtime.
> We traced down where the time was being spent and found that each server was 
> clearing the database so it would be read back in again before leader 
> election even started.  Then as part of the sync phase each server will write 
> out a snapshot to checkpoint the progress it made as part of the sync.
> I will be putting up a patch shortly with some proposed changes in it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2659) Use log4j2 as a logging framework as log4j 1.X is now deprecated

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839840#comment-15839840
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2659:
---

Github user praste commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/148#discussion_r98013864
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -413,13 +418,18 @@ public void testBadPeerAddressInQuorum() throws 
Exception {
 ClientBase.setupTestEnv();
 
 // setup the logger to capture all logs
+LoggerContext loggerContext =  (LoggerContext) 
LogManager.getContext(false);
--- End diff --

@jvz is using ListAppender absolutely necessary?

Can we merge this with the current changes?


> Use log4j2 as a logging framework as log4j 1.X is now deprecated
> 
>
> Key: ZOOKEEPER-2659
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2659
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Pushkar Raste
>Assignee: Pushkar Raste
>Priority: Minor
> Attachments: zk_log4j2_migration.patch
>
>
> Zookeeper currently uses {{log4j 1.X}} as the default logging framework. 
> {{log4j 1.X}} is now deprecated http://logging.apache.org/log4j/1.2/
> This ticket is to track efforts to move zookeeper to {{log4j2}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2678) Large databases take a long time to regain a quorum

2017-01-26 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created ZOOKEEPER-2678:
--

 Summary: Large databases take a long time to regain a quorum
 Key: ZOOKEEPER-2678
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2678
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.2, 3.4.9
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


I know this is long but please hear me out.

I recently inherited a massive zookeeper ensemble.  The snapshot is 3.4 GB on 
disk.  Because of its massive size we have been running into a number of 
issues. There are lots of problems that we hope to fix with tuning GC etc, but 
the big one right now, which is blocking us from making much progress on the 
rest of them, is that when we lose a quorum because the leader left, for 
whatever reason, it can take well over 5 mins for a new quorum to be 
established. So we cannot tune the leader without risking downtime.

We traced down where the time was being spent and found that each server was 
clearing the database so it would be read back in again before leader election 
even started.  Then as part of the sync phase each server will write out a 
snapshot to checkpoint the progress it made as part of the sync.

I will be putting up a patch shortly with some proposed changes in it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] zookeeper pull request #148: ZOOKEEPER-2659 Log4j 2 migration

2017-01-26 Thread praste
Github user praste commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/148#discussion_r98013864
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -413,13 +418,18 @@ public void testBadPeerAddressInQuorum() throws 
Exception {
 ClientBase.setupTestEnv();
 
 // setup the logger to capture all logs
+LoggerContext loggerContext =  (LoggerContext) 
LogManager.getContext(false);
--- End diff --

@jvz is using ListAppender absolutely necessary?

Can we merge this with the current changes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Paper

2017-01-26 Thread Paul Asmuth
You can already find the paper on sci-hub.io (search for the DOI).

On Thu, Jan 26, 2017 at 3:17 PM, Jordan Zimmerman <
jor...@jordanzimmerman.com> wrote:

> Sad that such an important paper requires a fee. Is there a free version
> anywhere?
>
> -Jordan
>
> > On Jan 26, 2017, at 7:44 AM, ibrahim El-sanosi 
> wrote:
> >
> > Hi folks,
> >
> > There is a paper published recently, "PaceMaker: When ZooKeeper Arteries Get
> > Clogged in Storm Clusters" [1]. It may be worth reading.
> >
> > [1]
> > http://ieeexplore.ieee.org/document/7820303/?tp=;
> arnumber=7820303=Conference%20Publications=
> eWFob28uY29t=SEARCHALERT
> >
> > Ibrahim
>
>


-- 
Paul Asmuth
T: +31-622-351956
p...@asmuth.com

EventQL | DeepCortex GmbH
https://eventql.io/
Kantstraße 33
10625 Berlin


Re: Paper

2017-01-26 Thread Bobby Evans
As one of the authors of pacemaker in Apache Storm (and the paper), I am happy 
to answer any questions about why we did it or how it works.  The reality of it 
is storm was, and still is by default, abusing zookeeper by trying to store a 
massive amount of metrics in it, instead of the configuration/coordination it 
was designed for. And since storm metrics don't really need strong consistency 
or even that much in terms of reliability guarantees, we stood up a netty server 
in front of a ConcurrentHashMap (quite literally) and then wrote a client that 
could handle fail-over.
It really is meant as a scalability stepping stone until we can get to the 
point that all the metrics go to a TSDB that is actually designed for metrics. 
But like I said if you have any questions I am happy to answer them.
Sadly, because of the way IEEE works, neither I nor my employer owns the 
copyright to that paper any more, so I can't even put a copy of it up for you to 
read.


- Bobby

On Thursday, January 26, 2017, 6:44:56 AM CST, ibrahim El-sanosi 
 wrote: Hi folks,

There is a paper published recently, "PaceMaker: When ZooKeeper Arteries Get
Clogged in Storm Clusters" [1]. It may be worth reading.

[1]
http://ieeexplore.ieee.org/document/7820303/?tp==7820303=Conference%20Publications=eWFob28uY29t=SEARCHALERT

Ibrahim
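
As a rough illustration of the pattern Bobby describes, a server fronting a
ConcurrentHashMap (the class below is invented for this example and is not
Storm's Pacemaker API): state lives only in memory, so clients must be able to
fail over to another instance.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Toy in-memory heartbeat/metrics store: no persistence, no replication,
    // and therefore none of ZooKeeper's consistency guarantees.
    final class InMemoryHeartbeatStore {
        private final ConcurrentMap<String, byte[]> store =
                new ConcurrentHashMap<String, byte[]>();

        void put(String path, byte[] payload) { store.put(path, payload); }

        byte[] get(String path) { return store.get(path); }

        void delete(String path) { store.remove(path); }
    }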


Re: Paper

2017-01-26 Thread Jordan Zimmerman
Sad that such an important paper requires a fee. Is there a free version 
anywhere?

-Jordan

> On Jan 26, 2017, at 7:44 AM, ibrahim El-sanosi  
> wrote:
> 
> Hi folks,
> 
> There is a paper published recently, "PaceMaker: When ZooKeeper Arteries Get
> Clogged in Storm Clusters" [1]. It may be worth reading.
> 
> [1]
> http://ieeexplore.ieee.org/document/7820303/?tp==7820303=Conference%20Publications=eWFob28uY29t=SEARCHALERT
> 
> Ibrahim



ZooKeeper_branch34_solaris - Build # 1446 - Still Failing

2017-01-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_solaris/1446/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 197435 lines...]
[junit] 2017-01-26 13:54:46,272 [myid:] - INFO  [main:ZooKeeperServer@497] 
- shutting down
[junit] 2017-01-26 13:54:46,272 [myid:] - ERROR [main:ZooKeeperServer@472] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-01-26 13:54:46,272 [myid:] - INFO  
[main:SessionTrackerImpl@225] - Shutting down
[junit] 2017-01-26 13:54:46,272 [myid:] - INFO  
[main:PrepRequestProcessor@765] - Shutting down
[junit] 2017-01-26 13:54:46,272 [myid:] - INFO  
[main:SyncRequestProcessor@208] - Shutting down
[junit] 2017-01-26 13:54:46,272 [myid:] - INFO  [ProcessThread(sid:0 
cport:11221)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
[junit] 2017-01-26 13:54:46,272 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@186] - SyncRequestProcessor exited!
[junit] 2017-01-26 13:54:46,272 [myid:] - INFO  
[main:FinalRequestProcessor@402] - shutdown of request processor complete
[junit] 2017-01-26 13:54:46,273 [myid:] - INFO  
[main:FourLetterWordMain@62] - connecting to 127.0.0.1 11221
[junit] 2017-01-26 13:54:46,273 [myid:] - INFO  [main:JMXEnv@147] - 
ensureOnly:[]
[junit] 2017-01-26 13:54:46,274 [myid:] - INFO  [main:ClientBase@445] - 
STARTING server
[junit] 2017-01-26 13:54:46,274 [myid:] - INFO  [main:ClientBase@366] - 
CREATING server instance 127.0.0.1:11221
[junit] 2017-01-26 13:54:46,275 [myid:] - INFO  
[main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2017-01-26 13:54:46,275 [myid:] - INFO  [main:ClientBase@341] - 
STARTING server instance 127.0.0.1:11221
[junit] 2017-01-26 13:54:46,275 [myid:] - INFO  [main:ZooKeeperServer@173] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch34_solaris/build/test/tmp/test5802278521389583743.junit.dir/version-2
 snapdir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch34_solaris/build/test/tmp/test5802278521389583743.junit.dir/version-2
[junit] 2017-01-26 13:54:46,278 [myid:] - ERROR [main:ZooKeeperServer@472] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-01-26 13:54:46,278 [myid:] - INFO  
[main:FourLetterWordMain@62] - connecting to 127.0.0.1 11221
[junit] 2017-01-26 13:54:46,278 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@192] - 
Accepted socket connection from /127.0.0.1:34218
[junit] 2017-01-26 13:54:46,279 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@827] - Processing 
stat command from /127.0.0.1:34218
[junit] 2017-01-26 13:54:46,279 [myid:] - INFO  
[Thread-5:NIOServerCnxn$StatCommand@663] - Stat command output
[junit] 2017-01-26 13:54:46,279 [myid:] - INFO  
[Thread-5:NIOServerCnxn@1008] - Closed socket connection for client 
/127.0.0.1:34218 (no session established for client)
[junit] 2017-01-26 13:54:46,279 [myid:] - INFO  [main:JMXEnv@230] - 
ensureParent:[InMemoryDataTree, StandaloneServer_port]
[junit] 2017-01-26 13:54:46,280 [myid:] - INFO  [main:JMXEnv@247] - 
expect:InMemoryDataTree
[junit] 2017-01-26 13:54:46,280 [myid:] - INFO  [main:JMXEnv@251] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port11221,name1=InMemoryDataTree
[junit] 2017-01-26 13:54:46,281 [myid:] - INFO  [main:JMXEnv@247] - 
expect:StandaloneServer_port
[junit] 2017-01-26 13:54:46,281 [myid:] - INFO  [main:JMXEnv@251] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port11221
[junit] 2017-01-26 13:54:46,281 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@58] - Memory used 8996
[junit] 2017-01-26 13:54:46,281 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@63] - Number of threads 20
[junit] 2017-01-26 13:54:46,281 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - FINISHED TEST METHOD testQuota
[junit] 2017-01-26 13:54:46,281 [myid:] - INFO  [main:ClientBase@522] - 
tearDown starting
[junit] 2017-01-26 13:54:46,362 [myid:] - INFO  [main:ZooKeeper@684] - 
Session: 0x159db0e97a5 closed
[junit] 2017-01-26 13:54:46,362 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@519] - EventThread shut down for 
session: 0x159db0e97a5
[junit] 2017-01-26 13:54:46,362 [myid:] - INFO  [main:ClientBase@492] - 
STOPPING server
[junit] 2017-01-26 13:54:46,363 [myid:] - INFO  [main:ZooKeeperServer@497] 
- shutting down
[junit] 2017-01-26 13:54:46,363 [myid:] - 

Paper

2017-01-26 Thread ibrahim El-sanosi
Hi folks,

There is a paper published recently, "PaceMaker: When ZooKeeper Arteries Get
Clogged in Storm Clusters" [1]. It may be worth reading.

[1]
http://ieeexplore.ieee.org/document/7820303/?tp==7820303=Conference%20Publications=eWFob28uY29t=SEARCHALERT

Ibrahim


ZooKeeper_branch35_jdk8 - Build # 392 - Still Failing

2017-01-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/392/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 437582 lines...]
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-01-26 12:15:57,545 [myid:127.0.0.1:11348] - INFO  
[main-SendThread(127.0.0.1:11348):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11348. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-01-26 12:15:57,545 [myid:127.0.0.1:11348] - WARN  
[main-SendThread(127.0.0.1:11348):ClientCnxn$SendThread@1235] - Session 
0x202057b70ce for server 127.0.0.1/127.0.0.1:11348, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-01-26 12:15:57,650 [myid:127.0.0.1:11345] - INFO  
[main-SendThread(127.0.0.1:11345):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11345. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-01-26 12:15:57,651 [myid:127.0.0.1:11345] - WARN  
[main-SendThread(127.0.0.1:11345):ClientCnxn$SendThread@1235] - Session 
0x102057b70c9 for server 127.0.0.1/127.0.0.1:11345, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-01-26 12:15:57,853 [myid:127.0.0.1:11222] - INFO  
[main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-01-26 12:15:57,853 [myid:127.0.0.1:11222] - WARN  
[main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1235] - Session 
0x10205779eaf for server 127.0.0.1/127.0.0.1:11222, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-01-26 12:15:58,077 [myid:] - INFO  [ProcessThread(sid:0 
cport:11468)::PrepRequestProcessor@656] - Processed session termination for 
sessionid: 0x102057ecd12
[junit] 2017-01-26 12:15:58,090 [myid:] - INFO  [main:ZooKeeper@1322] - 
Session: 0x102057ecd12 closed
[junit] 2017-01-26 12:15:58,090 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for 
session: 0x102057ecd12
[junit] 2017-01-26 12:15:58,090 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 214901
[junit] 2017-01-26 12:15:58,091 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 2427
[junit] 2017-01-26 12:15:58,091 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testWatcherAutoResetWithLocal
[junit] 2017-01-26 12:15:58,091 [myid:] - INFO  [main:ClientBase@543] - 
tearDown starting
[junit] 2017-01-26 12:15:58,091 [myid:] - INFO  [main:ClientBase@513] - 
STOPPING server
[junit] 2017-01-26 12:15:58,091 [myid:] - INFO  
[main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:11468
[junit] 2017-01-26 12:15:58,090 [myid:] - INFO  
[SyncThread:0:MBeanRegistry@128] - Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port11468,name1=Connections,name2=127.0.0.1,name3=0x102057ecd12]
[junit] 2017-01-26 12:15:58,100 [myid:] - INFO  [main:ZooKeeperServer@533] 
- shutting down
[junit] 2017-01-26 12:15:58,101 [myid:] - ERROR [main:ZooKeeperServer@505] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-01-26 

Re: ZooKeeper 3.4.10 release discussion

2017-01-26 Thread Edward Ribeiro
Hi,

Rakesh and Flavio, what do you think about merging ZOOKEEPER-2622 to
branch-3.4 and include it in 3.4.10 besides branch-3.5 and master?

Edward

On Thu, Jan 26, 2017 at 8:20 AM, Flavio Junqueira  wrote:

> Here are a few comments on the proposal of changes to the release process:
>
> - It might be a better idea to preserve the HowToRelease document for
> future reference, clone the document, and change the cloned document to
> reflect the git commands rather than svn.
> - We still need to modify Step 2 to be git oriented, otherwise it will
> look odd that we have svn there.
> - In Step 4, I thought that we had informally agreed to rely on the git
> log rather than maintain the CHANGES.txt file. If we aren't all onboard
> with the idea of no longer using CHANGES.txt, then we need to discuss this
> separately.
> - Steps 5 and 6: I'm not sure why the steps to produce the release notes
> change. We still resolve issues on jira, which is pretty much the source of
> data for the release notes.
> - Step 10: I personally don't like using "git commit -a" unless you're
> pretty sure that it is what you want. A much safer approach is to run "git
> status" and "git add" to the individual files/directories.
> - Step 11: Why are we tagging with -s? Is that standard practice in other
> projects?
>
> -Flavio
>
> > On 26 Jan 2017, at 03:30, Rakesh Radhakrishnan 
> wrote:
> >
> > Agreed, will try to resolve ZK-2184. I have included this in the 3.4.10
> > release. I can see a few open review comments in the PR; I will probably
> > push once they are concluded.
> >
> > Thanks,
> > Rakesh
> >
> > On Thu, Jan 26, 2017 at 2:01 AM, Flavio Junqueira 
> wrote:
> >
> >> I'd like to have ZK-2184 in as well. I have seen many cases in which
> >> applications are affected by that problem. If folks can help me push it
> >> through, I'd appreciate.
> >>
> >> -Flavio
> >>
> >>> On 25 Jan 2017, at 17:01, Rakesh Radhakrishnan 
> >> wrote:
> >>>
> >>> I've reviewed the ZOOKEEPER-2044 pull request and added a few comments.
> >>> I hope this will be committed soon.
> >>>
> >>> I'm planning to keep the CHANGE.txt file for this release, but not
> >>> updating the commit history, considering that the git revision can be
> >>> used as a reference. Please see my comment https://goo.gl/wu5V2M in the
> >>> ZOOKEEPER-2672 jira.
> >>>
> >>> Some time back, I filtered the issues that were marked for 3.4.10 and
> >>> moved them out to the 3.4.11 release.
> >>>
> >>> Thanks,
> >>> Rakesh
> >>>
> >>> On Wed, Jan 25, 2017 at 5:41 AM, Michael Han 
> wrote:
> >>>
>  Hi Rakesh,
> 
>  Thanks for driving 3.4.10 release.
> 
>  I've been looking at https://issues.apache.org/
> >> jira/browse/ZOOKEEPER-2044
>  today. I think this could be a good addition to the 3.4.10 release - what
> do
> >> you
>  think? Should we get this in 3.4.10?
> 
> 
>  On Tue, Jan 24, 2017 at 9:13 AM, Rakesh Radhakrishnan <
> >> rake...@apache.org>
>  wrote:
> 
> > Hi folks,
> >
> > ZOOKEEPER-2573 fix is agreed and will be resolved soon. After
> >> committing
> > this jira, I'm planning to start cutting a release candidate based on
> >> my
> > proposed "HowToRelease" ZK cwiki changes.
> >
> > Appreciate feedback on proposed ZK cwiki https://cwiki.apache.org/
> > confluence/display/ZOOKEEPER/HowToRelease changes. Please refer my
> > previous
> > mail to understand more about it.
> >
> > Thanks,
> > Rakesh
> >
> > On Tue, Jan 17, 2017 at 12:11 PM, Rakesh Radhakrishnan <
>  rake...@apache.org
> >>
> > wrote:
> >
> >> OK. I have modified ZK cwiki page https://cwiki.apache.org/
> >> confluence/display/ZOOKEEPER/HowToRelease directly. Please review
> the
> > newly
> >> added lines in orange color to understand the changes. The following
> >> sections has been modified:
> >>
> >>  - *Updating the release branch -> modified steps **1, 4, 10, 11*
> >>  - *Building -> modified step 9*
> >>  - *Publishing -> modified step 1*
> >>
> >> Thanks,
> >> Rakesh
> >>
> >> On Tue, Jan 17, 2017 at 11:36 AM, Patrick Hunt 
>  wrote:
> >>
> >>> Perhaps you can make the changes directly on the wiki page as a
> > duplicate
> >>> line item under the original in a different color? It's hard for me
> >> to
> >>> really follow, esp as it's not a 1:1 replacement iiuc. Could you
> try
> >>> editing the wiki directly to start with, leave the original line
> and
>  add
> >>> the new line(s) but in another color or some other indication?
> >>>
> >>> Thanks Rakesh.
> >>>
> >>> Patrick
> >>>
> >>> On Mon, Jan 16, 2017 at 8:48 AM, Rakesh Radhakrishnan <
> > rake...@apache.org
> 
> >>> wrote:
> >>>
>  Hi folks,
> 
>  As we all know, 3.4.10 release 

[jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch

2017-01-26 Thread Henrik Nordvik (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839573#comment-15839573
 ] 

Henrik Nordvik commented on ZOOKEEPER-1416:
---

When I read ZOOKEEPER-153, it looks to me like the arguments against persistent 
watches are that 1. they are not suitable for clients that need to see every 
change (we use it as a cache, so we don't require every change), and 2. they 
don't provide a performance benefit when watching a single node, since you need 
to get the data anyway and can set the watch again at the same time. However, 
this changes when you watch a tree of nodes. With a persistent recursive watch 
you don't need one watch per child znode, which reduces the amount of 
bookkeeping that both the client and the server have to do.

> Persistent Recursive Watch
> --
>
> Key: ZOOKEEPER-1416
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, documentation, java client, server
>Reporter: Phillip Liu
>Assignee: Jordan Zimmerman
> Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode and when the znode changes 
> a Watch event is sent to the client. If there are thousands of znodes being 
> watched, when a client (re)connect, it would have to send thousands of watch 
> requests. At Facebook, we have this problem storing information for thousands 
> of db shards. Consequently a naming service that consumes the db shard 
> definition issues thousands of watch requests each time the service starts 
> or changes its client watcher.
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent 
> means no Watch reset is necessary after a watch-fire. Recursive means the 
> Watch applies to the node and descendant nodes. A Persistent Recursive Watch 
> behaves as follows:
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. 
> Setting a Recursive Watch means to set watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a 
> corresponding getData(..) on the znode is called; then the Recursive Watch 
> automatically re-applies the watch on the znode. This maintains the existing 
> Watch semantics on an individual znode.
> # A Recursive Watch overrides any watches placed on a descendant znode. 
> Practically this means the Recursive Watch Watcher callback is the one 
> receiving the event and event is delivered exactly once.
> A goal here is to reduce the number of semantic changes. The guarantee of no 
> intermediate watch event until data is read will be maintained. The only 
> difference is we will automatically re-add the watch after read. At the same 
> time we add the convenience of reducing the need to add multiple watches for 
> sibling znodes and in turn reduce the number of watch messages sent from the 
> client to the server.
> There are some implementation details that need to be hashed out. Initial 
> thinking is to have the Recursive Watch create per-node watches. This will 
> cause a lot of watches to be created on the server side. Currently, each 
> watch is stored as a single bit in a bit set relative to a session - up to 3 
> bits per client per znode. If there are 100m znodes with 100k clients, each 
> watching all nodes, then this strategy will consume approximately 3.75TB of 
> ram distributed across all Observers. Seems expensive.
> Alternatively, a blacklist of paths to not send Watches regardless of Watch 
> setting can be set each time a watch event from a Recursive Watch is fired. 
> The memory utilization is relative to the number of outstanding reads and at 
> worst case it's 1/3 * 3.75TB using the parameters given above.
> Otherwise, a relaxation of the no-intermediate-watch-event-until-read 
> guarantee is required. If the server can send watch events regardless of 
> whether one has already been fired without a corresponding read, then the 
> server can simply fire watch events without tracking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
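
As a back-of-envelope check of the 3.75TB figure quoted in the ticket above
(assuming, as stated there, 3 watch bits per client per znode):

    public class WatchMemoryEstimate {
        public static void main(String[] args) {
            long znodes  = 100_000_000L; // 100m znodes
            long clients = 100_000L;     // 100k clients, each watching every znode
            double bits  = (double) znodes * clients * 3; // 3 bits/client/znode
            double bytes = bits / 8;
            System.out.printf("%.2f TB%n", bytes / 1e12); // prints 3.75 TB
        }
    }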


[jira] [Commented] (ZOOKEEPER-2672) Remove CHANGE.txt

2017-01-26 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839537#comment-15839537
 ] 

Flavio Junqueira commented on ZOOKEEPER-2672:
-

I'm not aware of any dependency on CHANGES.txt, so I'm +1 for removing it. 
According to the project bylaws, I'd say that this change corresponds to a 
change to the code base, and as such, the vote is by lazy approval, switching 
to lazy majority in the case of at least one -1, where the binding votes are 
from active committers.

> Remove CHANGE.txt
> -
>
> Key: ZOOKEEPER-2672
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2672
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.4.9, 3.5.2
>Reporter: Michael Han
>Assignee: Michael Han
>
> CHANGE.txt has not been the source of truth for what's changed since we 
> migrated to git - most of the git commits in the last couple of months don't 
> update CHANGE.txt. Updating CHANGE.txt automatically during the commit flow 
> is non-trivial, and doing it manually is cumbersome and error prone.
> The consensus is we would rely on source control revision logs instead of 
> CHANGE.txt moving forward; see 
> https://www.mail-archive.com/dev@zookeeper.apache.org/msg37108.html for more 
> details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: ZooKeeper 3.4.10 release discussion

2017-01-26 Thread Flavio Junqueira
Here are a few comments on the proposal of changes to the release process:

- It might be a better idea to preserve the HowToRelease document for future 
reference, clone the document, and change the cloned document to reflect the 
git commands rather than svn.  
- We still need to modify Step 2 to be git oriented, otherwise it will look odd 
that we have svn there.
- In Step 4, I thought that we had informally agreed to rely on the git log 
rather than maintain the CHANGES.txt file. If we aren't all onboard with the 
idea of no longer using CHANGES.txt, then we need to discuss this separately.
- Steps 5 and 6: I'm not sure why the steps to produce the release notes 
change. We still resolve issues on jira, which is pretty much the source of 
data for the release notes.
- Step 10: I personally don't like using "git commit -a" unless you're pretty 
sure that it is what you want. A much safer approach is to run "git status" and 
"git add" to the individual files/directories.
- Step 11: Why are we tagging with -s? Is that standard practice in other 
projects?

-Flavio

> On 26 Jan 2017, at 03:30, Rakesh Radhakrishnan  wrote:
> 
> Agreed, will try to resolve ZK-2184. I have included this in the 3.4.10
> release. I can see a few open review comments in the PR; I will probably
> push once they are concluded.
> 
> Thanks,
> Rakesh
> 
> On Thu, Jan 26, 2017 at 2:01 AM, Flavio Junqueira  wrote:
> 
>> I'd like to have ZK-2184 in as well. I have seen many cases in which
>> applications are affected by that problem. If folks can help me push it
>> through, I'd appreciate.
>> 
>> -Flavio
>> 
>>> On 25 Jan 2017, at 17:01, Rakesh Radhakrishnan 
>> wrote:
>>> 
>>> I've reviewed the ZOOKEEPER-2044 pull request and added a few comments. I
>>> hope this will be committed soon.
>>> 
>>> I'm planning to keep the CHANGE.txt file for this release, but not
>>> updating the commit history, considering that the git revision can be used
>>> as a reference. Please see my comment https://goo.gl/wu5V2M in the
>>> ZOOKEEPER-2672 jira.
>>> 
>>> Some time back, I filtered the issues that were marked for 3.4.10 and
>>> moved them out to the 3.4.11 release.
>>> 
>>> Thanks,
>>> Rakesh
>>> 
>>> On Wed, Jan 25, 2017 at 5:41 AM, Michael Han  wrote:
>>> 
 Hi Rakesh,
 
 Thanks for driving 3.4.10 release.
 
 I've been looking at https://issues.apache.org/
>> jira/browse/ZOOKEEPER-2044
>  today. I think this could be a good addition to the 3.4.10 release - what do
>> you
 think? Should we get this in 3.4.10?
 
 
 On Tue, Jan 24, 2017 at 9:13 AM, Rakesh Radhakrishnan <
>> rake...@apache.org>
 wrote:
 
> Hi folks,
> 
> ZOOKEEPER-2573 fix is agreed and will be resolved soon. After
>> committing
> this jira, I'm planning to start cutting a release candidate based on
>> my
> proposed "HowToRelease" ZK cwiki changes.
> 
> Appreciate feedback on proposed ZK cwiki https://cwiki.apache.org/
> confluence/display/ZOOKEEPER/HowToRelease changes. Please refer my
> previous
> mail to understand more about it.
> 
> Thanks,
> Rakesh
> 
> On Tue, Jan 17, 2017 at 12:11 PM, Rakesh Radhakrishnan <
 rake...@apache.org
>> 
> wrote:
> 
>> OK. I have modified ZK cwiki page https://cwiki.apache.org/
>> confluence/display/ZOOKEEPER/HowToRelease directly. Please review the
> newly
>> added lines in orange color to understand the changes. The following
>> sections has been modified:
>> 
>>  - *Updating the release branch -> modified steps **1, 4, 10, 11*
>>  - *Building -> modified step 9*
>>  - *Publishing -> modified step 1*
>> 
>> Thanks,
>> Rakesh
>> 
>> On Tue, Jan 17, 2017 at 11:36 AM, Patrick Hunt 
 wrote:
>> 
>>> Perhaps you can make the changes directly on the wiki page as a
> duplicate
>>> line item under the original in a different color? It's hard for me
>> to
>>> really follow, esp as it's not a 1:1 replacement iiuc. Could you try
>>> editing the wiki directly to start with, leave the original line and
 add
>>> the new line(s) but in another color or some other indication?
>>> 
>>> Thanks Rakesh.
>>> 
>>> Patrick
>>> 
>>> On Mon, Jan 16, 2017 at 8:48 AM, Rakesh Radhakrishnan <
> rake...@apache.org
 
>>> wrote:
>>> 
 Hi folks,
 
 As we all know, 3.4.10 release is the first ZooKeeper release after
> the
 github repository migration. I have tried an attempt to modify the
> steps
 described in the '
 https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToRelease'
>>> page
 to
 make the release. Since this release is from an already created
> branch,
>>> I
 have focused only the branch related parts in cwiki and below
 sections
>>> in

[jira] [Commented] (ZOOKEEPER-2395) allow ant command line control of junit test jvm args

2017-01-26 Thread Edward Ribeiro (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839502#comment-15839502
 ] 

Edward Ribeiro commented on ZOOKEEPER-2395:
---

oops, *excuse me* [~hanm] for skipping this comment on ZK-2664, shame on me. :(

Agree with you. 

> allow ant command line control of junit test jvm args
> -
>
> Key: ZOOKEEPER-2395
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2395
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: build, tests
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.5.3, 3.6.0, 3.4.11
>
>
> We're seeing some failing jobs (see below) and the speculation is that it 
> might be due to ipv6 vs ipv4 usage. It would be nice to turn on "prefer ipv4" 
> in the jvm but there is no easy way to do that. I'll propose that we add a 
> variable to ant that's passed through to the jvm.
> 
> This is very odd. It failed 2 of the last three times it was run on H9
> with the following:
> 2016-03-20 06:06:18,480 [myid:] - INFO
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@74] - TEST METHOD FAILED
> testBindByAddress
> java.net.SocketException: No such device
> at java.net.NetworkInterface.isLoopback0(Native Method)
> at java.net.NetworkInterface.isLoopback(NetworkInterface.java:339)
> at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
> https://builds.apache.org/job/ZooKeeper_branch34/buildTimeTrend
> Why would it pass one of the times though if there is no loopback
> device on the host? That seems very odd!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
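
For reference, the "prefer ipv4" switch mentioned above is the standard JVM
property java.net.preferIPv4Stack; once ant forwards jvm args to the forked
test JVM, a one-liner like the following (illustrative, not part of the patch)
can confirm what the fork actually received:

    public class PreferIPv4Check {
        public static void main(String[] args) {
            // Prints "true" when started with -Djava.net.preferIPv4Stack=true.
            System.out.println(
                System.getProperty("java.net.preferIPv4Stack", "unset"));
        }
    }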


ZooKeeper-trunk-solaris - Build # 1480 - Still Failing

2017-01-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/1480/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 467006 lines...]
[junit] 2017-01-26 08:23:11,378 [myid:] - INFO  [main:ClientBase@401] - 
CREATING server instance 127.0.0.1:11222
[junit] 2017-01-26 08:23:11,378 [myid:] - INFO  
[main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s 
sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 
kB direct buffers.
[junit] 2017-01-26 08:23:11,379 [myid:] - INFO  
[main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:11222
[junit] 2017-01-26 08:23:11,380 [myid:] - INFO  [main:ClientBase@376] - 
STARTING server instance 127.0.0.1:11222
[junit] 2017-01-26 08:23:11,380 [myid:] - INFO  [main:ZooKeeperServer@894] 
- minSessionTimeout set to 6000
[junit] 2017-01-26 08:23:11,380 [myid:] - INFO  [main:ZooKeeperServer@903] 
- maxSessionTimeout set to 6
[junit] 2017-01-26 08:23:11,380 [myid:] - INFO  [main:ZooKeeperServer@160] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/build/test/tmp/test2622701376991085976.junit.dir/version-2
 snapdir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/build/test/tmp/test2622701376991085976.junit.dir/version-2
[junit] 2017-01-26 08:23:11,381 [myid:] - INFO  [main:FileSnap@83] - 
Reading snapshot 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/build/test/tmp/test2622701376991085976.junit.dir/version-2/snapshot.b
[junit] 2017-01-26 08:23:11,383 [myid:] - INFO  [main:FileTxnSnapLog@346] - 
Snapshotting: 0xb to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/build/test/tmp/test2622701376991085976.junit.dir/version-2/snapshot.b
[junit] 2017-01-26 08:23:11,384 [myid:] - ERROR [main:ZooKeeperServer@506] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-01-26 08:23:11,384 [myid:] - INFO  
[main:FourLetterWordMain@85] - connecting to 127.0.0.1 11222
[junit] 2017-01-26 08:23:11,385 [myid:] - INFO  
[NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11222:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:58510
[junit] 2017-01-26 08:23:11,386 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@485] - Processing stat command from 
/127.0.0.1:58510
[junit] 2017-01-26 08:23:11,386 [myid:] - INFO  
[NIOWorkerThread-1:StatCommand@49] - Stat command output
[junit] 2017-01-26 08:23:11,386 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@614] - Closed socket connection for client 
/127.0.0.1:58510 (no session established for client)
[junit] 2017-01-26 08:23:11,386 [myid:] - INFO  [main:JMXEnv@228] - 
ensureParent:[InMemoryDataTree, StandaloneServer_port]
[junit] 2017-01-26 08:23:11,387 [myid:] - INFO  [main:JMXEnv@245] - 
expect:InMemoryDataTree
[junit] 2017-01-26 08:23:11,388 [myid:] - INFO  [main:JMXEnv@249] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port11222,name1=InMemoryDataTree
[junit] 2017-01-26 08:23:11,388 [myid:] - INFO  [main:JMXEnv@245] - 
expect:StandaloneServer_port
[junit] 2017-01-26 08:23:11,388 [myid:] - INFO  [main:JMXEnv@249] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port11222
[junit] 2017-01-26 08:23:11,388 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 17907
[junit] 2017-01-26 08:23:11,388 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 24
[junit] 2017-01-26 08:23:11,389 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testQuota
[junit] 2017-01-26 08:23:11,389 [myid:] - INFO  [main:ClientBase@558] - 
tearDown starting
[junit] 2017-01-26 08:23:11,462 [myid:] - INFO  [main:ZooKeeper@1324] - 
Session: 0x12656d84b56 closed
[junit] 2017-01-26 08:23:11,462 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for 
session: 0x12656d84b56
[junit] 2017-01-26 08:23:11,462 [myid:] - INFO  [main:ClientBase@528] - 
STOPPING server
[junit] 2017-01-26 08:23:11,463 [myid:] - INFO  
[ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - 
ConnnectionExpirerThread interrupted
[junit] 2017-01-26 08:23:11,463 [myid:] - INFO  
[NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] 
- selector thread exitted run method
[junit] 2017-01-26 08:23:11,463 [myid:] - INFO  
[NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] 
-