[jira] [Commented] (ZOOKEEPER-1715) Upgrade netty version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788943#comment-13788943 ]

Patrick Hunt commented on ZOOKEEPER-1715:
-----------------------------------------

Looks like ZOOKEEPER-1763 is a dup of ZOOKEEPER-1715

Upgrade netty version
---------------------

Key: ZOOKEEPER-1715
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1715
Project: ZooKeeper
Issue Type: Improvement
Affects Versions: 3.4.5
Environment: zookeeper 3.4.5 uses netty 3.2.2, which was released in August 2010. The latest version of netty is 3.6.6 released May 2013. Zookeeper should consider upgrading.
Reporter: Sean Bridges
Assignee: Sean Bridges
Fix For: 3.5.0
Attachments: ZOOKEEPER-1715-2.patch, zookeeper-1715.patch, ZOOKEEPER-1715.patch

Upgrade netty version

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788947#comment-13788947 ]

Germán Blanco commented on ZOOKEEPER-1732:
------------------------------------------

Is there anything else to be done in this one?

ZooKeeper server unable to join established ensemble
----------------------------------------------------

Key: ZOOKEEPER-1732
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.4.5
Environment: Windows 7, Java 1.7
Reporter: Germán Blanco
Assignee: Germán Blanco
Priority: Blocker
Fix For: 3.4.6, 3.5.0
Attachments: CREATE_INCONSISTENCIES_patch.txt, zklog.tar.gz, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch

I have a test that does a rolling restart of three ZooKeeper servers, and it was failing from time to time. I ran the test in a loop until the failure showed up, and it seems that at some point one of the servers is unable to join the ensemble formed by the other two.
Failed: ZOOKEEPER-1715 PreCommit Build #1653
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1715
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1653/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 322684 lines...]
[exec] -1 overall. Here are the results of testing the latest attachment
[exec] http://issues.apache.org/jira/secure/attachment/12590934/zookeeper-1715.patch
[exec] against trunk revision 1530158.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec] +1 core tests. The patch passed core unit tests.
[exec]
[exec] +1 contrib tests. The patch passed contrib unit tests.
[exec]
[exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1653//testReport/
[exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1653//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1653//console
[exec]
[exec] This message is automatically generated.
[exec]
[exec] ==
[exec] Adding comment to Jira.
[exec] ==
[exec]
[exec] Comment added.
[exec] de233301f7dd6cf462c432bd5305e2d622f86b9e logged out
[exec]
[exec] ==
[exec] Finished build.
[exec] ==
[exec]

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1

Total time: 38 minutes 9 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1715
Email was triggered for: Failure
Sending email for trigger: Failure

### ## FAILED TESTS (if any) ##
All tests passed
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788954#comment-13788954 ]

Germán Blanco commented on ZOOKEEPER-832:
-----------------------------------------

Hello again,
would it make sense to have the client API report a new event to the application (INCONSISTENCY DETECTED), so that the application can take whatever action is required? The server still needs to reset the session information, so that event would need to be sneaked in before the client is disconnected (I guess that the client will be disconnected after the session information is reset in the server).

Invalid session id causes infinite loop during automatic reconnect
------------------------------------------------------------------

Key: ZOOKEEPER-832
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
Project: ZooKeeper
Issue Type: Improvement
Components: c client, java client
Affects Versions: 3.3.1
Environment: Mac OS X 10.6.4, JVM 1.6.0_20
Reporter: Ryan Holmes
Assignee: Germán Blanco
Fix For: 3.5.0

Steps to reproduce:
1.) Connect to a standalone server using the Java client.
2.) Stop the server.
3.) Delete the contents of the data directory (i.e. the persisted session data).
4.) Start the server.

The client now automatically tries to reconnect, but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, there appears to be no way to detect this situation on the client and therefore no way to recover.

The suggested improvement is to send an event to the default watcher indicating that the current state is "session invalid", similar to how the "session expired" state is handled.

Server log output (repeats indefinitely):
2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292
2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server
2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client)

Client log output (repeats indefinitely):
11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181
11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
    at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
    at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
    at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
    at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
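The reconnect loop in this report, and the effect of the suggested "session invalid" notification, can be illustrated with a small simulation. This is a toy sketch in Python, not ZooKeeper client code: `ToyServer`, `ToyClient`, the `notify_invalid` switch, and the `SESSION_INVALID` state name are all hypothetical. The only behavior taken from the report is that the server refuses a client that has seen a zxid newer than the server's own last zxid.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical state names for this sketch. SESSION_INVALID stands for the
# proposed new state; it is not an existing ZooKeeper client state.
CONNECTED = "CONNECTED"
SESSION_INVALID = "SESSION_INVALID"


@dataclass
class ToyServer:
    last_zxid: int  # highest transaction id the server still has on disk

    def accept(self, client_zxid: int) -> bool:
        # Mirrors the refusal in the server log: a client that has seen a
        # newer zxid than the server is rejected.
        return client_zxid <= self.last_zxid


@dataclass
class ToyClient:
    last_seen_zxid: int
    watcher: Callable[[str], None]
    attempts: int = 0

    def reconnect(self, server: ToyServer, max_attempts: int,
                  notify_invalid: bool) -> bool:
        """Try to reconnect; return True once connected, False otherwise."""
        while self.attempts < max_attempts:
            self.attempts += 1
            if server.accept(self.last_seen_zxid):
                self.watcher(CONNECTED)
                return True
            if notify_invalid:
                # Proposed behavior: tell the application and stop retrying.
                self.watcher(SESSION_INVALID)
                return False
            # Reported behavior: retry silently. The loop is bounded by
            # max_attempts only so that this demo terminates.
        return False


if __name__ == "__main__":
    wiped = ToyServer(last_zxid=0x0)  # data directory was deleted

    events: List[str] = []
    silent = ToyClient(last_seen_zxid=0x44, watcher=events.append)
    silent.reconnect(wiped, max_attempts=5, notify_invalid=False)
    print(silent.attempts, events)

    events = []
    notified = ToyClient(last_seen_zxid=0x44, watcher=events.append)
    notified.reconnect(wiped, max_attempts=5, notify_invalid=True)
    print(notified.attempts, events)
```

In the first run the application's watcher never hears anything, which is the gap the report describes. In the second run the watcher receives one event after the first refusal and the application can decide how to recover.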
[jira] [Commented] (ZOOKEEPER-1715) Upgrade netty version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788957#comment-13788957 ]

Hadoop QA commented on ZOOKEEPER-1715:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12590934/zookeeper-1715.patch
against trunk revision 1530158.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1653//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1653//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1653//console

This message is automatically generated.
Failed: ZOOKEEPER-1756 PreCommit Build #1654
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1756
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1654/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 285184 lines...]
[exec] -1 overall. Here are the results of testing the latest attachment
[exec] http://issues.apache.org/jira/secure/attachment/12607319/ZOOKEEPER-1756.patch
[exec] against trunk revision 1530158.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec] +1 core tests. The patch passed core unit tests.
[exec]
[exec] +1 contrib tests. The patch passed contrib unit tests.
[exec]
[exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1654//testReport/
[exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1654//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1654//console
[exec]
[exec] This message is automatically generated.
[exec]
[exec] ==
[exec] Adding comment to Jira.
[exec] ==
[exec]
[exec] Comment added.
[exec] 9988237dc8ab8af82658fc726cb573acd018ef2e logged out
[exec]
[exec] ==
[exec] Finished build.
[exec] ==
[exec]

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1

Total time: 31 minutes 41 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1756
Email was triggered for: Failure
Sending email for trigger: Failure

### ## FAILED TESTS (if any) ##
All tests passed
[jira] [Updated] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1147:
------------------------------------

Attachment: ZOOKEEPER-1147.patch

The current patch fails to apply because of a minor conflict in QuorumPeerMain.java. Attaching a new one that fixes the conflict.

Add support for local sessions
------------------------------

Key: ZOOKEEPER-1147
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Thawan Kooburat
Labels: api-change, scaling
Fix For: 3.5.0
Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch
Original Estimate: 840h
Remaining Estimate: 840h

This improvement is in the bucket of making ZooKeeper work at a large scale. We are planning on having about 1 million clients connect to a ZooKeeper ensemble through a set of 50-100 observers. The majority of these clients are read-only, i.e. they do not do any updates or create ephemeral nodes.

In ZooKeeper today, the client creates a session and the session creation is handled like any other update. In the above use case, the session create/drop workload can easily overwhelm an ensemble.

The following is a proposal for a local session, to support a larger number of connections.
1. The idea is to introduce a new type of session: the local session. A local session doesn't have the full functionality of a normal session.
2. Local sessions cannot create ephemeral nodes.
3. Once a local session is lost, you cannot re-establish it using the session-id/password. The session and its watches are gone for good.
4. When a local session connects, the session info is only maintained on the zookeeper server (in this case, an observer) that it is connected to. The leader is not aware of the creation of such a session and there is no state written to disk.
5. The pings and expiration are handled by the server that the session is connected to.

With the above changes, we can make ZooKeeper scale to a much larger number of clients without making the core ensemble a bottleneck.

In terms of API, there are two options being considered:
1. Let the client specify at connect time which kind of session it wants.
2. All sessions connect as local sessions and automatically get promoted to global sessions when they do an operation that requires a global session (e.g. creating an ephemeral node).

Chubby took the approach of lazily promoting all sessions to global, but I don't think that would work in our case, where we want to keep sessions that never create ephemeral nodes always local. Option 2 would make it more broadly usable, but option 1 would be easier to implement.

We are thinking of implementing option 1 as the first cut. There would be a client flag, IsLocalSession (much like the current readOnly flag), that would be used to determine whether to create a local session or a global session.
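The difference between the two API options in the proposal can be made concrete with a small model. The following is a toy sketch in Python under stated assumptions, not the patch attached to this issue: `ToyLeader`, `ToyObserver`, `LocalSessionError`, and the `auto_promote` switch are hypothetical names. It only encodes what the description states: local sessions live on the server the client is connected to, the leader is not involved in creating them, and they cannot create ephemeral nodes unless they are promoted to global sessions.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator


class LocalSessionError(Exception):
    """Raised when a local session attempts an operation that needs a global session."""


@dataclass
class ToyLeader:
    # Counts the session-creation updates the core ensemble had to process.
    session_txns: int = 0

    def create_global_session(self) -> None:
        self.session_txns += 1


@dataclass
class ToyObserver:
    leader: ToyLeader
    # False models option 1 (explicit session kind, no promotion),
    # True models option 2 (promote to global on demand).
    auto_promote: bool = False
    sessions: Dict[int, bool] = field(default_factory=dict)  # id -> is_local
    _ids: Iterator[int] = field(default_factory=lambda: itertools.count(1))

    def connect(self, is_local: bool) -> int:
        session_id = next(self._ids)
        if not is_local:
            # A global session is handled like any other update.
            self.leader.create_global_session()
        # A local session is only recorded on this server.
        self.sessions[session_id] = is_local
        return session_id

    def create_ephemeral(self, session_id: int, path: str) -> str:
        if self.sessions[session_id]:
            if not self.auto_promote:
                raise LocalSessionError(f"session {session_id} is local")
            # Option 2: promote lazily, which costs one leader update.
            self.leader.create_global_session()
            self.sessions[session_id] = False
        return path


if __name__ == "__main__":
    leader = ToyLeader()
    observer = ToyObserver(leader)
    for _ in range(1000):
        observer.connect(is_local=True)
    print("leader session updates after 1000 local connects:", leader.session_txns)
```

In this model a thousand local connects leave the leader's counter at zero, which is the scaling argument of the proposal. With `auto_promote=True` the counter grows only for sessions that actually create an ephemeral node.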
[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788998#comment-13788998 ]

Mahadev konar commented on ZOOKEEPER-1147:
------------------------------------------

[~fpj] looks like the patch is ready to get in. You want to look through before we commit?
Success: ZOOKEEPER-1147 PreCommit Build #1655
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1655/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 310900 lines...]
[exec] BUILD SUCCESSFUL
[exec] Total time: 0 seconds
[exec]
[exec] +1 overall. Here are the results of testing the latest attachment
[exec] http://issues.apache.org/jira/secure/attachment/12607325/ZOOKEEPER-1147.patch
[exec] against trunk revision 1530166.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 33 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec] +1 core tests. The patch passed core unit tests.
[exec]
[exec] +1 contrib tests. The patch passed contrib unit tests.
[exec]
[exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1655//testReport/
[exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1655//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1655//console
[exec]
[exec] This message is automatically generated.
[exec]
[exec] ==
[exec] Adding comment to Jira.
[exec] ==
[exec]
[exec] Comment added.
[exec] b5f9ffa97e8e23fc0cabc5fc2638600538e6431a logged out
[exec]
[exec] ==
[exec] Finished build.
[exec] ==
[exec]

BUILD SUCCESSFUL
Total time: 32 minutes 47 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1147
Email was triggered for: Success
Sending email for trigger: Success

### ## FAILED TESTS (if any) ##
All tests passed
[jira] [Commented] (ZOOKEEPER-1756) zookeeper_interest() in C client can return a timeval of 0
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789016#comment-13789016 ]

Eric Lindvall commented on ZOOKEEPER-1756:
------------------------------------------

Thanks!

zookeeper_interest() in C client can return a timeval of 0
----------------------------------------------------------

Key: ZOOKEEPER-1756
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1756
Project: ZooKeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.3.4, 3.4.5
Reporter: Eric Lindvall
Assignee: Eric Lindvall
Fix For: 3.4.6, 3.5.0
Attachments: 0001-Ensure-send_to-is-positive.patch, 0001-Ensure-send_to-is-positive.patch, ZOOKEEPER-1756-br34.patch, ZOOKEEPER-1756.patch, zookeeper-3.4.5-send_to-fix.patch

If the client is connected to a zookeeper server that has hung while there is an outstanding request, zookeeper_interest() can return a timeval of 0 because send_to will be negative.
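Why a negative send_to matters can be shown with a few lines of arithmetic. This is a minimal sketch in Python, not the C client code and not necessarily what the attached patches do: the function `next_wait_ms`, its parameters, and the way the wait is derived (the smaller of the two remaining deadlines, floored at zero) are assumptions made for illustration. Only the symptom, a wait of 0 caused by a negative send_to, comes from the report.

```python
def next_wait_ms(recv_to: int, send_to: int, ensure_positive: bool) -> int:
    """Return how long the caller should wait before calling back, in ms.

    recv_to: ms left until the receive deadline (hypothetical parameter).
    send_to: ms left until the next send/ping is due (hypothetical parameter).
    ensure_positive: apply the illustrative guard on send_to.
    """
    if ensure_positive and send_to <= 0:
        # The send deadline is already overdue and nothing can be sent while
        # a request is outstanding, so fall back to the receive deadline.
        send_to = recv_to
    # Assumed derivation: wait for the nearer deadline, never a negative time.
    return max(0, min(recv_to, send_to))


if __name__ == "__main__":
    # Server hung with a request outstanding: the send deadline passed 500 ms
    # ago, while the receive deadline is still 4000 ms away.
    print(next_wait_ms(recv_to=4000, send_to=-500, ensure_positive=False))
    print(next_wait_ms(recv_to=4000, send_to=-500, ensure_positive=True))
```

Without the guard the caller is told to wait 0 ms and would call back immediately, over and over, until the receive deadline finally expires. With the guard the caller sleeps until the receive deadline.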
[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789018#comment-13789018 ]

Hadoop QA commented on ZOOKEEPER-1147:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12607325/ZOOKEEPER-1147.patch
against trunk revision 1530166.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 33 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1655//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1655//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1655//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1551) Observers ignore txns that come after snapshot and UPTODATE
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789047#comment-13789047 ]

Flavio Junqueira commented on ZOOKEEPER-1551:
---------------------------------------------

b3.4: 1530035
trunk: 1530029

Observers ignore txns that come after snapshot and UPTODATE
-----------------------------------------------------------

Key: ZOOKEEPER-1551
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1551
Project: ZooKeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.4.3
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
Priority: Blocker
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1551-3.4.patch, ZOOKEEPER-1551-b3.4.patch, ZOOKEEPER-1551.patch, ZOOKEEPER-1551.patch, ZOOKEEPER-1551-trunk.patch, ZOOKEEPER-1551-trunk.patch, ZOOKEEPER-1551-trunk.patch

In Learner.java, txns that arrive after the learner has taken the snapshot (after the NEWLEADER packet) are stored in packetsNotCommitted. The follower has special logic to apply these txns at the end of the syncWithLeader() method. However, the observer ignores these txns completely, causing data inconsistency.
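The inconsistency described here can be reproduced in a toy model. The sketch below is Python, not the code in Learner.java: `ToyLearner`, its `apply_pending` flag, and the tuple layout of a txn are hypothetical. It only mirrors the description: txns received after the snapshot are queued, a follower applies the queue at the end of the sync, and an observer that drops the queue ends up with stale data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Txn = Tuple[int, str, str]  # (zxid, key, value), a layout assumed for this sketch


@dataclass
class ToyLearner:
    # True models the follower behavior, False the reported observer behavior.
    apply_pending: bool
    data: Dict[str, str] = field(default_factory=dict)
    packets_not_committed: List[Txn] = field(default_factory=list)

    def sync_with_leader(self, snapshot: Dict[str, str],
                         txns_after_snapshot: List[Txn]) -> None:
        # Start from the state captured in the snapshot.
        self.data = dict(snapshot)
        # Txns that arrive after the snapshot are queued, not applied yet.
        self.packets_not_committed.extend(txns_after_snapshot)
        if self.apply_pending:
            # Follower-like path: replay the queue in zxid order.
            for _zxid, key, value in sorted(self.packets_not_committed):
                self.data[key] = value
        # Either way the queue is emptied; if it was not replayed, the
        # queued txns are lost.
        self.packets_not_committed.clear()


if __name__ == "__main__":
    snapshot = {"/a": "1"}
    late_txns = [(0x11, "/b", "2"), (0x10, "/a", "3")]

    follower = ToyLearner(apply_pending=True)
    follower.sync_with_leader(snapshot, late_txns)

    observer = ToyLearner(apply_pending=False)
    observer.sync_with_leader(snapshot, late_txns)

    print("follower:", follower.data)
    print("observer:", observer.data)
```

After the same sync, the follower-like learner reflects both late txns while the observer-like learner still holds only the snapshot contents, which is the divergence the issue reports.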
ZooKeeper-trunk-solaris - Build # 694 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/694/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 208008 lines...]
[junit] 2013-10-08 09:07:33,034 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method
[junit] 2013-10-08 09:07:33,036 [myid:] - INFO [main:ZooKeeperServer@422] - shutting down
[junit] 2013-10-08 09:07:33,036 [myid:] - INFO [main:SessionTrackerImpl@180] - Shutting down
[junit] 2013-10-08 09:07:33,036 [myid:] - INFO [main:PrepRequestProcessor@929] - Shutting down
[junit] 2013-10-08 09:07:33,036 [myid:] - INFO [main:SyncRequestProcessor@190] - Shutting down
[junit] 2013-10-08 09:07:33,036 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop!
[junit] 2013-10-08 09:07:33,037 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited!
[junit] 2013-10-08 09:07:33,037 [myid:] - INFO [main:FinalRequestProcessor@427] - shutdown of request processor complete
[junit] 2013-10-08 09:07:33,038 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-10-08 09:07:33,038 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-10-08 09:07:33,039 [myid:] - INFO [main:ClientBase@414] - STARTING server
[junit] 2013-10-08 09:07:33,040 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test3393565285669046510.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test3393565285669046510.junit.dir/version-2
[junit] 2013-10-08 09:07:33,040 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers.
[junit] 2013-10-08 09:07:33,041 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-10-08 09:07:33,042 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test3393565285669046510.junit.dir/version-2/snapshot.b
[junit] 2013-10-08 09:07:33,045 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test3393565285669046510.junit.dir/version-2/snapshot.b
[junit] 2013-10-08 09:07:33,047 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-10-08 09:07:33,048 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:58144
[junit] 2013-10-08 09:07:33,049 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@828] - Processing stat command from /127.0.0.1:58144
[junit] 2013-10-08 09:07:33,050 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@677] - Stat command output
[junit] 2013-10-08 09:07:33,050 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@999] - Closed socket connection for client /127.0.0.1:58144 (no session established for client)
[junit] 2013-10-08 09:07:33,050 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-10-08 09:07:33,052 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-10-08 09:07:33,052 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-10-08 09:07:33,052 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-10-08 09:07:33,052 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-10-08 09:07:33,053 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-10-08 09:07:33,053 [myid:] - INFO [main:ClientBase@451] - tearDown starting
[junit] 2013-10-08 09:07:33,113 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down
[junit] 2013-10-08 09:07:33,113 [myid:] - INFO [main:ZooKeeper@777] - Session: 0x14197524173 closed
[junit] 2013-10-08 09:07:33,113 [myid:] - INFO [main:ClientBase@421] - STOPPING server
[junit] 2013-10-08 09:07:33,114 [myid:] - INFO
ZooKeeper-3.4-WinVS2008_java - Build # 318 - Still Failing
See https://builds.apache.org/job/ZooKeeper-3.4-WinVS2008_java/318/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 236893 lines...] [junit] 2013-10-08 11:16:08,530 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@968] - Opening socket connection to server 127.0.0.1/127.0.0.1:11221. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration) [junit] 2013-10-08 11:16:08,942 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[] [junit] 2013-10-08 11:16:08,943 [myid:] - INFO [main:ClientBase@414] - STARTING server [junit] 2013-10-08 11:16:08,943 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test1323789941894009492.junit.dir\version-2 snapdir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test1323789941894009492.junit.dir\version-2 [junit] 2013-10-08 11:16:08,945 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-10-08 11:16:09,001 [myid:] - INFO [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop! [junit] 2013-10-08 11:16:09,000 [myid:] - INFO [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop! 
[junit] 2013-10-08 11:16:09,044 [myid:] - INFO [main:FileSnap@83] - Reading snapshot f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test1323789941894009492.junit.dir\version-2\snapshot.b [junit] 2013-10-08 11:16:09,047 [myid:] - INFO [main:FileTxnSnapLog@240] - Snapshotting: 0xb to f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test1323789941894009492.junit.dir\version-2\snapshot.b [junit] 2013-10-08 11:16:09,146 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-10-08 11:16:09,147 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:53756 [junit] 2013-10-08 11:16:09,147 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@817] - Processing stat command from /127.0.0.1:53756 [junit] 2013-10-08 11:16:09,148 [myid:] - INFO [Thread-5:NIOServerCnxn$StatCommand@653] - Stat command output [junit] 2013-10-08 11:16:09,244 [myid:] - INFO [Thread-5:NIOServerCnxn@997] - Closed socket connection for client /127.0.0.1:53756 (no session established for client) [junit] 2013-10-08 11:16:09,245 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-10-08 11:16:09,246 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree [junit] 2013-10-08 11:16:09,246 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2013-10-08 11:16:09,345 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port [junit] 2013-10-08 11:16:09,345 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-10-08 11:16:09,345 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota [junit] 2013-10-08 11:16:09,346 [myid:] - INFO 
[main:ClientBase@451] - tearDown starting [junit] 2013-10-08 11:16:09,528 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@849] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session [junit] 2013-10-08 11:16:09,528 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:53753 [junit] 2013-10-08 11:16:09,529 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@863] - Client attempting to renew session 0x141979104c6 at /127.0.0.1:53753 [junit] 2013-10-08 11:16:09,546 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@619] - Established session 0x141979104c6 with negotiated timeout 3 for client /127.0.0.1:53753 [junit] 2013-10-08 11:16:09,547 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1228] - Session establishment complete on server 127.0.0.1/127.0.0.1:11221, sessionid = 0x141979104c6, negotiated timeout = 3 [junit] 2013-10-08 11:16:09,547 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x141979104c6 [junit] 2013-10-08 11:16:09,647 [myid:] - INFO [SyncThread:0:FileTxnLog@199] - Creating new log
ZooKeeper-trunk-WinVS2008_java - Build # 566 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008_java/566/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 256376 lines...] [junit] 2013-10-08 11:25:08,181 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test8407143303493842632.junit.dir\version-2 snapdir f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test8407143303493842632.junit.dir\version-2 [junit] 2013-10-08 11:25:08,190 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 4 worker threads, and 64 kB direct buffers. [junit] 2013-10-08 11:25:08,191 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-10-08 11:25:08,193 [myid:] - INFO [main:FileSnap@83] - Reading snapshot f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test8407143303493842632.junit.dir\version-2\snapshot.b [junit] 2013-10-08 11:25:08,282 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:55106 [junit] 2013-10-08 11:25:08,282 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@882] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session [junit] 2013-10-08 11:25:08,293 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test8407143303493842632.junit.dir\version-2\snapshot.b [junit] 2013-10-08 11:25:08,294 [myid:] - WARN [NIOWorkerThread-1:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running [junit] 2013-10-08 11:25:08,391 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@999] - 
Closed socket connection for client /127.0.0.1:55106 (no session established for client) [junit] 2013-10-08 11:25:08,392 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1124] - Unable to read additional data from server sessionid 0x14197993d4d, likely server has closed socket, closing socket connection and attempting reconnect [junit] 2013-10-08 11:25:08,393 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-10-08 11:25:08,493 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:55111 [junit] 2013-10-08 11:25:08,494 [myid:] - INFO [NIOWorkerThread-2:NIOServerCnxn@828] - Processing stat command from /127.0.0.1:55111 [junit] 2013-10-08 11:25:08,494 [myid:] - INFO [NIOWorkerThread-2:NIOServerCnxn$StatCommand@677] - Stat command output [junit] 2013-10-08 11:25:08,593 [myid:] - INFO [NIOWorkerThread-2:NIOServerCnxn@999] - Closed socket connection for client /127.0.0.1:55111 (no session established for client) [junit] 2013-10-08 11:25:08,594 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-10-08 11:25:08,595 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree [junit] 2013-10-08 11:25:08,595 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2013-10-08 11:25:08,693 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port [junit] 2013-10-08 11:25:08,693 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-10-08 11:25:08,693 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota [junit] 2013-10-08 11:25:08,693 [myid:] - INFO [main:ClientBase@451] - tearDown starting [junit] 2013-10-08 11:25:09,001 [myid:] - INFO 
[SessionTracker:SessionTrackerImpl@135] - SessionTrackerImpl exited loop! [junit] 2013-10-08 11:25:10,238 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1008] - Opening socket connection to server 127.0.0.1/127.0.0.1:11221. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration) [junit] 2013-10-08 11:25:10,239 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@882] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session [junit] 2013-10-08 11:25:10,239 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted
ZooKeeper-trunk-jdk7 - Build # 675 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk-jdk7/675/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 208355 lines...] [junit] 2013-10-08 10:27:00,768 [myid:] - INFO [main:Environment@99] - Server environment:user.dir=/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk [junit] 2013-10-08 10:27:00,780 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test7343977656047055373.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test7343977656047055373.junit.dir/version-2 [junit] 2013-10-08 10:27:00,792 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers. [junit] 2013-10-08 10:27:00,801 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-10-08 10:27:00,801 [myid:] - INFO [main:ClientBase@451] - tearDown starting [junit] 2013-10-08 10:27:00,801 [myid:] - INFO [main:ClientBase@421] - STOPPING server [junit] 2013-10-08 10:27:00,802 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[] [junit] 2013-10-08 10:27:00,806 [myid:] - INFO [main:ClientBase@476] - fdcount after test is: 45 at start it was 30 [junit] 2013-10-08 10:27:00,806 [myid:] - INFO [main:ClientBase@478] - sleeping for 20 secs [junit] 2013-10-08 10:27:00,808 [myid:] - INFO [main:ZKTestCase$1@66] - FAILED testQuota [junit] java.net.BindException: Address already in use [junit] at sun.nio.ch.Net.bind0(Native Method) [junit] at sun.nio.ch.Net.bind(Net.java:444) [junit] at sun.nio.ch.Net.bind(Net.java:436) [junit] at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) [junit] at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) [junit] at 
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) [junit] at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:684) [junit] at org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:127) [junit] at org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:120) [junit] at org.apache.zookeeper.test.ClientBase.createNewServerInstance(ClientBase.java:335) [junit] at org.apache.zookeeper.test.ClientBase.startServer(ClientBase.java:415) [junit] at org.apache.zookeeper.test.ClientBase.setUp(ClientBase.java:408) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [junit] at java.lang.reflect.Method.invoke(Method.java:606) [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) [junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) [junit] at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) [junit] at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) [junit] at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:52) [junit] at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) [junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69) [junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48) [junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) [junit] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) [junit] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) [junit] at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) [junit] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) [junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:292) [junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052) [junit] at
ZooKeeper_branch34_openjdk7 - Build # 362 - Failure
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/362/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 303389 lines...] [junit] 2013-10-08 10:31:01,811 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port [junit] 2013-10-08 10:31:01,811 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-10-08 10:31:01,811 [myid:] - INFO [main:ClientBase@421] - STOPPING server [junit] 2013-10-08 10:31:01,811 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@224] - NIOServerCnxn factory exited run method [junit] 2013-10-08 10:31:01,812 [myid:] - INFO [main:ZooKeeperServer@443] - shutting down [junit] 2013-10-08 10:31:01,812 [myid:] - INFO [main:SessionTrackerImpl@225] - Shutting down [junit] 2013-10-08 10:31:01,812 [myid:] - INFO [main:PrepRequestProcessor@743] - Shutting down [junit] 2013-10-08 10:31:01,812 [myid:] - INFO [main:SyncRequestProcessor@190] - Shutting down [junit] 2013-10-08 10:31:01,812 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop! [junit] 2013-10-08 10:31:01,812 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited! 
[junit] 2013-10-08 10:31:01,813 [myid:] - INFO [main:FinalRequestProcessor@415] - shutdown of request processor complete [junit] 2013-10-08 10:31:01,813 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-10-08 10:31:01,814 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[] [junit] 2013-10-08 10:31:01,815 [myid:] - INFO [main:ClientBase@414] - STARTING server [junit] 2013-10-08 10:31:01,815 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/branch-3.4/build/test/tmp/test5336527926794013213.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/branch-3.4/build/test/tmp/test5336527926794013213.junit.dir/version-2 [junit] 2013-10-08 10:31:01,816 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-10-08 10:31:01,816 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/branch-3.4/build/test/tmp/test5336527926794013213.junit.dir/version-2/snapshot.b [junit] 2013-10-08 10:31:01,819 [myid:] - INFO [main:FileTxnSnapLog@240] - Snapshotting: 0xb to /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/branch-3.4/build/test/tmp/test5336527926794013213.junit.dir/version-2/snapshot.b [junit] 2013-10-08 10:31:01,821 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-10-08 10:31:01,821 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:60368 [junit] 2013-10-08 10:31:01,822 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@817] - Processing stat command from /127.0.0.1:60368 [junit] 2013-10-08 10:31:01,822 [myid:] - INFO [Thread-4:NIOServerCnxn$StatCommand@653] - Stat command output [junit] 2013-10-08 
10:31:01,823 [myid:] - INFO [Thread-4:NIOServerCnxn@997] - Closed socket connection for client /127.0.0.1:60368 (no session established for client) [junit] 2013-10-08 10:31:01,823 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-10-08 10:31:01,824 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree [junit] 2013-10-08 10:31:01,825 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2013-10-08 10:31:01,825 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port [junit] 2013-10-08 10:31:01,825 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-10-08 10:31:01,825 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota [junit] 2013-10-08 10:31:01,825 [myid:] - INFO [main:ClientBase@451] - tearDown starting [junit] 2013-10-08 10:31:01,895 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x141979eb04d closed [junit] 2013-10-08 10:31:01,895 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@509] - EventThread shut down [junit] 2013-10-08 10:31:01,895 [myid:] - INFO [main:ClientBase@421] - STOPPING server [junit] 2013-10-08 10:31:01,896 [myid:] - INFO
[jira] [Commented] (ZOOKEEPER-1551) Observers ignore txns that come after snapshot and UPTODATE
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789110#comment-13789110 ]

Hudson commented on ZOOKEEPER-1551:
---
SUCCESS: Integrated in ZooKeeper-trunk #2082 (See [https://builds.apache.org/job/ZooKeeper-trunk/2082/])
ZOOKEEPER-1551. Observers ignore txns that come after snapshot and UPTODATE (thawan, fpj via thawan) (thawan: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530029)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Learner.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java

Observers ignore txns that come after snapshot and UPTODATE

Key: ZOOKEEPER-1551
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1551
Project: ZooKeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.4.3
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
Priority: Blocker
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1551-3.4.patch, ZOOKEEPER-1551-b3.4.patch, ZOOKEEPER-1551.patch, ZOOKEEPER-1551.patch, ZOOKEEPER-1551-trunk.patch, ZOOKEEPER-1551-trunk.patch, ZOOKEEPER-1551-trunk.patch

In Learner.java, txns that arrive after the learner has taken the snapshot (after the NEWLEADER packet) are stored in packetsNotCommitted. The follower has special logic to apply these txns at the end of the syncWithLeader() method. However, the observer ignores these txns completely, causing data inconsistency.
[jira] [Commented] (ZOOKEEPER-1771) ZooInspector authentication
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789109#comment-13789109 ]

Hudson commented on ZOOKEEPER-1771:
---
SUCCESS: Integrated in ZooKeeper-trunk #2082 (See [https://builds.apache.org/job/ZooKeeper-trunk/2082/])
ZOOKEEPER-1771. ZooInspector authentication (Benjamin Jaton via phunt) part 2 - fix license headers (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530158)
* /zookeeper/trunk/src/contrib/zooinspector/config/defaultConnectionSettings.cfg
* /zookeeper/trunk/src/contrib/zooinspector/config/defaultNodeViewers.cfg
* /zookeeper/trunk/src/contrib/zooinspector/src/java/org/apache/zookeeper/inspector/manager/ZooInspectorManagerImpl.java

ZooInspector authentication
---
Key: ZOOKEEPER-1771
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1771
Project: ZooKeeper
Issue Type: Improvement
Components: contrib
Affects Versions: 3.4.5, 3.5.0
Reporter: Benjamin Jaton
Assignee: Benjamin Jaton
Priority: Minor
Fix For: 3.4.6, 3.5.0
Attachments: Proposal_UI.png, ZOOKEEPER-1771-3.4.patch, ZOOKEEPER-1771-trunk.patch

ZooInspector doesn't support authentication, so it always connects as anonymous to the ensemble. It would be nice to be able to configure the authentication scheme+data in order to browse the nodes that have ACLs set.
[jira] [Commented] (ZOOKEEPER-1774) QuorumPeerMainTest fails consistently with complains about host assertion failure
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789111#comment-13789111 ] Hudson commented on ZOOKEEPER-1774: --- SUCCESS: Integrated in ZooKeeper-trunk #2082 (See [https://builds.apache.org/job/ZooKeeper-trunk/2082/]) ZOOKEEPER-1774. QuorumPeerMainTest fails consistently with complains about host assertion failure (phunt) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530157) * /zookeeper/trunk/CHANGES.txt * /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java QuorumPeerMainTest fails consistently with complains about host assertion failure --- Key: ZOOKEEPER-1774 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1774 Project: ZooKeeper Issue Type: Bug Components: quorum, tests Affects Versions: 3.4.6 Environment: Ubuntu 13.04 Linux version 3.8.0-30-generic (buildd@akateko) (gcc version 4.7.3 (Ubuntu/Linaro 4.7.3-1ubuntu1) ) #44-Ubuntu SMP Thu Aug 22 20:54:42 UTC 2013 java -version java version 1.6.0_45 Java(TM) SE Runtime Environment (build 1.6.0_45-b06) Java HotSpot(TM) Server VM (build 20.45-b01, mixed mode) Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Blocker Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1774-b34.patch, ZOOKEEPER-1774.patch QuorumPeerMainTest fails consistently with complains about host assertion failure. 
{noformat} 2013-10-01 16:09:17,962 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED testBadPeerAddressInQuorum java.lang.AssertionError: complains about host at org.junit.Assert.fail(Assert.java:91) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testBadPeerAddressInQuorum(QuorumPeerMainTest.java:434) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518) at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906) 2013-10-01 16:09:17,963 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testBadPeerAddressInQuorum java.lang.AssertionError: complains about host at org.junit.Assert.fail(Assert.java:91) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testBadPeerAddressInQuorum(QuorumPeerMainTest.java:434) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[jira] [Commented] (ZOOKEEPER-877) zkpython does not work with python3.1
[ https://issues.apache.org/jira/browse/ZOOKEEPER-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789112#comment-13789112 ]

Hudson commented on ZOOKEEPER-877:
--
SUCCESS: Integrated in ZooKeeper-trunk #2082 (See [https://builds.apache.org/job/ZooKeeper-trunk/2082/])
ZOOKEEPER-877. zkpython does not work with python3.1 (Daniel Enman via phunt) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530166)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/contrib/zkpython/src/c/zookeeper.c
* /zookeeper/trunk/src/contrib/zkpython/src/python/zk.py
* /zookeeper/trunk/src/contrib/zkpython/src/test/connection_test.py
* /zookeeper/trunk/src/contrib/zkpython/src/test/get_set_test.py
* /zookeeper/trunk/src/contrib/zkpython/src/test/zktestbase.py

zkpython does not work with python3.1
-
Key: ZOOKEEPER-877
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-877
Project: ZooKeeper
Issue Type: Bug
Components: contrib-bindings
Affects Versions: 3.3.1
Environment: linux+python3.1
Reporter: TuxRacer
Assignee: Daniel Enman
Fix For: 3.4.6, 3.5.0
Attachments: Doc.tgz, tests_py3k.tgz, ZOOKEEPER-877.patch, zookeeper.c, zookeeper.c.patch.v1, zookeeper.c.patch.v2, zookeeper.c.v2, zookeeper.rst

As written in the contrib/zkpython/README file: "Python >= 2.6 is required. We have tested against 2.6. We have not tested against 3.x."
This is probably more a 'new feature' request than a bug; anyway, compiling the Python module and importing it returns an error at load time:

python3.1
Python 3.1.2 (r312:79147, May 8 2010, 16:36:46)
[GCC 4.4.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import zookeeper
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /usr/local/lib/python3.1/dist-packages/zookeeper.so: undefined symbol: PyString_AsString

Are there any plans to support Python 3.x?
I also tried to write a 3.1 ctypes wrapper but the C API seems in fact to be written in C++, so python ctypes cannot be used.
[jira] [Commented] (ZOOKEEPER-1781) ZooKeeper Server fails if snapCount is set to 1
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789108#comment-13789108 ] Hudson commented on ZOOKEEPER-1781: --- SUCCESS: Integrated in ZooKeeper-trunk #2082 (See [https://builds.apache.org/job/ZooKeeper-trunk/2082/]) ZOOKEEPER-1781. ZooKeeper Server fails if snapCount is set to 1 (Takashi Ohnishi via phunt, breed) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530110) * /zookeeper/trunk/docs/index.pdf * /zookeeper/trunk/docs/javaExample.pdf * /zookeeper/trunk/docs/linkmap.pdf * /zookeeper/trunk/docs/recipes.pdf * /zookeeper/trunk/docs/releasenotes.pdf * /zookeeper/trunk/docs/zookeeperAdmin.html * /zookeeper/trunk/docs/zookeeperAdmin.pdf * /zookeeper/trunk/docs/zookeeperHierarchicalQuorums.pdf * /zookeeper/trunk/docs/zookeeperInternals.pdf * /zookeeper/trunk/docs/zookeeperJMX.pdf * /zookeeper/trunk/docs/zookeeperObservers.pdf * /zookeeper/trunk/docs/zookeeperOver.pdf * /zookeeper/trunk/docs/zookeeperProgrammers.pdf * /zookeeper/trunk/docs/zookeeperQuotas.pdf * /zookeeper/trunk/docs/zookeeperStarted.pdf * /zookeeper/trunk/docs/zookeeperTutorial.pdf ZOOKEEPER-1781. 
ZooKeeper Server fails if snapCount is set to 1 (Takashi Ohnishi via phunt, breed) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530108)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/docs/zookeeperAdmin.html
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/InvalidSnapCountTest.java

ZooKeeper Server fails if snapCount is set to 1

Key: ZOOKEEPER-1781
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1781
Project: ZooKeeper
Issue Type: Bug
Components: quorum
Affects Versions: 3.4.5
Reporter: Takashi Ohnishi
Assignee: Takashi Ohnishi
Priority: Minor
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1781.patch, ZOOKEEPER-1781.patch

If snapCount is set to 1, ZooKeeper Server can start but then fails with the error below:

2013-10-02 18:09:07,600 [myid:1] - ERROR [SyncThread:1:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
java.lang.IllegalArgumentException: n must be positive
at java.util.Random.nextInt(Random.java:300)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:93)

The source code appears to assume that snapCount is 2 or more, since snapCount/2 evaluates to 0 when snapCount is 1:

{code:title=org.apache.zookeeper.server.SyncRequestProcessor.java|borderStyle=solid}
91 // we do this in an attempt to ensure that not all of the servers
92 // in the ensemble take a snapshot at the same time
93 int randRoll = r.nextInt(snapCount/2);
{code}

I think this assumption is reasonable because snapCount = 1 is not a realistic setting. However, it may be better to mention this restriction in the documentation or to add validation in the source code.
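The snapCount failure reported in ZOOKEEPER-1781 can be reproduced outside ZooKeeper with a few lines of plain Java. The following is only a minimal sketch, not the committed fix: the class `SnapCountCheck` and the helper `validateSnapCount` are hypothetical names introduced here for illustration, and only the expression `r.nextInt(snapCount/2)` is taken from the report. It shows that integer division turns a snapCount of 1 into a bound of 0, which `java.util.Random.nextInt(int)` rejects, and one possible shape a validation could take.

```java
import java.util.Random;

public class SnapCountCheck {
    // Hypothetical validation helper (an assumption, not ZooKeeper's actual code):
    // rejects any value for which snapCount / 2 would be 0.
    static int validateSnapCount(int snapCount) {
        if (snapCount < 2) {
            throw new IllegalArgumentException(
                "snapCount must be 2 or more, got " + snapCount);
        }
        return snapCount;
    }

    // Mirrors the expression quoted from SyncRequestProcessor in the report.
    static int randRoll(Random r, int snapCount) {
        return r.nextInt(snapCount / 2);
    }

    public static void main(String[] args) {
        Random r = new Random();
        try {
            // 1 / 2 == 0 in integer arithmetic, so nextInt receives a non-positive bound.
            randRoll(r, 1);
        } catch (IllegalArgumentException e) {
            System.out.println("snapCount=1 fails: " + e.getMessage());
        }
        // With the smallest accepted value the bound is 1, so the roll is always 0.
        System.out.println("snapCount=2 roll: " + randRoll(r, validateSnapCount(2)));
    }
}
```

The exact exception message differs between Java versions, so the sketch prints whatever the runtime reports rather than assuming the text shown in the log above.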
[jira] [Updated] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-832:

Attachment: ZOOKEEPER-832.patch

The behavior in trunk is not consistent. For an ensemble, if the client has a higher zxid, then the session is closed. On the other hand, the standalone server produces the same loop reported in this JIRA. The change in the attached patch closes the session in the standalone server as well, avoiding the loop. Even though closing the session might not be the optimal solution, it is much better than an endless loop.

Invalid session id causes infinite loop during automatic reconnect
--
Key: ZOOKEEPER-832
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
Project: ZooKeeper
Issue Type: Improvement
Components: c client, java client
Affects Versions: 3.3.1
Environment: Mac OS X 10.6.4 JVM 1.6.0_20
Reporter: Ryan Holmes
Assignee: Germán Blanco
Fix For: 3.5.0
Attachments: ZOOKEEPER-832.patch

Steps to reproduce:
1.) Connect to a standalone server using the Java client.
2.) Stop the server.
3.) Delete the contents of the data directory (i.e. the persisted session data).
4.) Start the server.

The client now automatically tries to reconnect but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, it appears that there is no way to detect this situation on the client and therefore no way to recover. The suggested improvement is to send an event to the default watcher indicating that the current state is session invalid, similar to how the session expired state is handled.
Server log output (repeats indefinitely): 2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client) Client log output (repeats indefinitely): 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789150#comment-13789150 ]

Hadoop QA commented on ZOOKEEPER-832:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12607349/ZOOKEEPER-832.patch
against trunk revision 1530166.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1656//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1656//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1656//console

This message is automatically generated.

Invalid session id causes infinite loop during automatic reconnect
Key: ZOOKEEPER-832
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
Project: ZooKeeper
Issue Type: Improvement
Components: c client, java client
Affects Versions: 3.3.1, 3.5.0
Environment: Mac OS X 10.6.4 JVM 1.6.0_20
Reporter: Ryan Holmes
Assignee: Germán Blanco
Fix For: 3.5.0
Attachments: ZOOKEEPER-832.patch
Failed: ZOOKEEPER-832 PreCommit Build #1656
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-832 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1656/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 247971 lines...] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12607349/ZOOKEEPER-832.patch [exec] against trunk revision 1530166. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1656//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1656//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1656//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 8300c63d152a3492dc1a05eb1f00cfb11784b1b9 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1 Total time: 31 minutes 38 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-832 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-1756) zookeeper_interest() in C client can return a timeval of 0
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789322#comment-13789322 ]

Patrick Hunt commented on ZOOKEEPER-1756:

NP. [~lindvall] is it possible to add a test for this problem/change?

zookeeper_interest() in C client can return a timeval of 0
Key: ZOOKEEPER-1756
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1756
Project: ZooKeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.3.4, 3.4.5
Reporter: Eric Lindvall
Assignee: Eric Lindvall
Fix For: 3.4.6, 3.5.0
Attachments: 0001-Ensure-send_to-is-positive.patch, 0001-Ensure-send_to-is-positive.patch, ZOOKEEPER-1756-br34.patch, ZOOKEEPER-1756.patch, zookeeper-3.4.5-send_to-fix.patch

If the client is connected to a zookeeper server that has hung while there is an outstanding request, zookeeper_interest() can return a timeval of 0 because send_to will be negative.

--
This message was sent by Atlassian JIRA (v6.1#6144)
Re: Have zk cpp client?
afaik there is no c++ native library, see also: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZKClientBindings

Take a look at the code in src/contrib/zkfuse/src/ for an example of how c++ can be used to access the c library.

Patrick

On Tue, Oct 8, 2013 at 1:34 AM, 吴腾飞 wuteng...@yy.com wrote:
> Hi all,
> I am looking for a c++ client for zookeeper. Where can I find it?
> Thanks,
> Albert Wu
[jira] [Updated] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1477:
Fix Version/s: (was: 3.5.0)

Test failures with Java 7 on Mac OS X
Key: ZOOKEEPER-1477
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
Project: ZooKeeper
Issue Type: Bug
Components: server, tests
Affects Versions: 3.4.3
Environment: Mac OS X Lion (10.7.4)
Java version:
java version "1.7.0_04"
Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Reporter: Diwaker Gupta
Attachments: with-ZK-1550.txt

I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, including ZooKeeperTest. A common symptom was spurious {{ConnectionLossException}}:

{code}
2012-06-01 12:01:23,420 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED testDeleteRecursiveAsync
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
at org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
... (snipped)
{code}

As background, I was actually investigating some non-deterministic failures when using Netflix's Curator with Java 7 (see https://github.com/Netflix/curator/issues/79). After a while, I figured I should establish a clean ZK baseline first and realized it is actually a ZK issue, not a Curator issue. We are trying to migrate to Java 7 but this is a blocking issue for us right now.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1639) zk.getZKDatabase().deserializeSnapshot adds new system znodes instead of replacing existing ones
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789330#comment-13789330 ]

Patrick Hunt commented on ZOOKEEPER-1639:

[~fpj] [~shralex] is this something we need to worry about for 3.4.6?

zk.getZKDatabase().deserializeSnapshot adds new system znodes instead of replacing existing ones
Key: ZOOKEEPER-1639
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1639
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.5
Reporter: Alexander Shraer

Before the call to zk.getZKDatabase().deserializeSnapshot in Learner.java, zk.getZKDatabase().getDataTree().getNode("/zookeeper") == zk.getZKDatabase().getDataTree().procDataNode, which means that this is the same znode, as it should be. However, after this call, they are not equal. The node actually being used in client operations is zk.getZKDatabase().getDataTree().getNode("/zookeeper"), but the other old node procDataNode is still there and not replaced (in fact it is a final field).

--
This message was sent by Atlassian JIRA (v6.1#6144)
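The identity mismatch described in ZOOKEEPER-1639 can be pictured with a toy model: a map that is rebuilt from a snapshot, next to a final field that still points at the node created before the rebuild. Everything below (StaleNodeSketch, Node, procNode, deserialize) is a hypothetical stand-in and not the real DataTree code; it only shows how the two references can drift apart.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a data tree; NOT org.apache.zookeeper.server.DataTree.
final class StaleNodeSketch {
    static final class Node { }

    private final Map<String, Node> nodes = new HashMap<>();
    // Cached reference, analogous to the final procDataNode field in the report.
    final Node procNode = new Node();

    StaleNodeSketch() {
        nodes.put("/zookeeper", procNode);
    }

    Node getNode(String path) {
        return nodes.get(path);
    }

    // Stand-in for loading a snapshot: the map is refilled with fresh nodes,
    // but the final field cannot be reassigned and keeps the old object.
    void deserialize() {
        nodes.clear();
        nodes.put("/zookeeper", new Node());
    }

    public static void main(String[] args) {
        StaleNodeSketch tree = new StaleNodeSketch();
        System.out.println(tree.getNode("/zookeeper") == tree.procNode); // true
        tree.deserialize();
        System.out.println(tree.getNode("/zookeeper") == tree.procNode); // false
    }
}
```

Whether this divergence causes incorrect behavior in ZooKeeper itself is exactly the open question in the thread; the sketch does not answer it.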
[jira] [Comment Edited] (ZOOKEEPER-1639) zk.getZKDatabase().deserializeSnapshot adds new system znodes instead of replacing existing ones
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789337#comment-13789337 ]

Flavio Junqueira edited comment on ZOOKEEPER-1639 at 10/8/13 4:15 PM:

I'm missing some context here. What kind of incorrect behavior does this bug induce?

was (Author: fpj):
I'm missing some context here. What kind of incorrect behavior does this bug induces?

zk.getZKDatabase().deserializeSnapshot adds new system znodes instead of replacing existing ones
Key: ZOOKEEPER-1639
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1639
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.5
Reporter: Alexander Shraer

Before the call to zk.getZKDatabase().deserializeSnapshot in Learner.java, zk.getZKDatabase().getDataTree().getNode("/zookeeper") == zk.getZKDatabase().getDataTree().procDataNode, which means that this is the same znode, as it should be. However, after this call, they are not equal. The node actually being used in client operations is zk.getZKDatabase().getDataTree().getNode("/zookeeper"), but the other old node procDataNode is still there and not replaced (in fact it is a final field).

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1480) ClientCnxn(1161) can't get the current zk server add, so that - Session 0x for server null, unexpected error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789340#comment-13789340 ]

Patrick Hunt commented on ZOOKEEPER-1480:

[~nileader] can you update the patch and resubmit? Thanks!

ClientCnxn(1161) can't get the current zk server add, so that - Session 0x for server null, unexpected error
Key: ZOOKEEPER-1480
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1480
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.3
Reporter: Leader Ni
Assignee: Leader Ni
Labels: client, getCurrentZooKeeperAddr
Fix For: 3.5.0
Attachments: getCurrentZooKeeperAddr_for_3.4.3.patch, getCurrentZooKeeperAddr_for_branch3.4.patch

When ZooKeeper hits an unexpected error (other than SessionExpiredException, SessionTimeoutException and EndOfStreamException), ClientCnxn(1161) logs a message of the form "Session 0x for server null, unexpected error, closing socket connection and attempting reconnect". The log statement is at line 1161 in zookeeper-3.3.3.

We found that ZooKeeper uses ((SocketChannel)sockKey.channel()).socket().getRemoteSocketAddress() to get the server address. Sometimes, however, it logs "Session 0x for server null". When null is logged, the developer cannot determine which ZooKeeper address the client is connected or connecting to.

I added a method to class SendThread: InetSocketAddress org.apache.zookeeper.ClientCnxn.SendThread.getCurrentZooKeeperAddr(). Here:

/**
 * Returns the address to which the socket is connected.
 *
 * @return ip address of the remote side of the connection or null if not
 * connected
 */
@Override
SocketAddress getRemoteSocketAddress() {
// a lot could go wrong here, so rather than put in a bunch of code
// to check for nulls all down the chain let's do it the simple
// yet bulletproof way
.

--
This message was sent by Atlassian JIRA (v6.1#6144)
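One way to avoid logging "server null" is to remember the address the client last tried to connect to and fall back to it when the socket cannot report a remote address. The sketch below illustrates that idea only. ServerAddrSketch and describeServer are hypothetical names, and this is not the code from the attached getCurrentZooKeeperAddr patches.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;

// Hypothetical helper, not part of ClientCnxn: picks something useful to log
// when the socket itself cannot report a remote address.
final class ServerAddrSketch {
    /**
     * @param remote    result of socket().getRemoteSocketAddress(), may be null
     * @param attempted address the client last tried to connect to, may be null
     */
    static String describeServer(SocketAddress remote, InetSocketAddress attempted) {
        if (remote != null) {
            return remote.toString();
        }
        if (attempted != null) {
            // getHostString() does not trigger a reverse DNS lookup.
            return attempted.getHostString() + ":" + attempted.getPort() + " (not connected)";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        InetSocketAddress attempted = InetSocketAddress.createUnresolved("zk1.example", 2181);
        System.out.println("Session 0x0 for server " + describeServer(null, attempted));
    }
}
```

The host name zk1.example and the session id in the usage line are made up for the example.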
[jira] [Updated] (ZOOKEEPER-1659) Add JMX support for dynamic reconfiguration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-1659: Priority: Blocker (was: Major) We need to triage this at a minimum. Also effects on 4lw? Add JMX support for dynamic reconfiguration --- Key: ZOOKEEPER-1659 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1659 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Alexander Shraer Priority: Blocker Fix For: 3.5.0 We need to update JMX during reconfigurations. Currently, reconfiguration changes are not reflected in JConsole. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789357#comment-13789357 ]

Flavio Junqueira commented on ZOOKEEPER-1459:

I'm not sure what you mean by leaking in the description here, [~rakeshr]. In fact, I was hoping that this patch would solve the problem of deleting temporary files in unit tests on windows, but it didn't. I agree with [~fournc], this is not a blocker for 3.4.6, but it is ok to have it if we can get a fix.

Standalone ZooKeeperServer is not closing the transaction log files on shutdown
Key: ZOOKEEPER-1459
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1459
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.0
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch

When the standalone ZK server is shut down, it only clears the zkdatabase and does not close the transaction log streams. This leaks the transaction log streams.

ZooKeeperServer.java
{noformat}
if (zkDb != null) {
    zkDb.clear();
}
{noformat}

The suggestion is to close the zkDb as follows, which in turn takes care of the transaction logs:
{noformat}
if (zkDb != null) {
    zkDb.clear();
    try {
        zkDb.close();
    } catch (IOException ie) {
        LOG.warn("Error closing logs ", ie);
    }
}
{noformat}

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789358#comment-13789358 ]

Flavio Junqueira commented on ZOOKEEPER-1624:

[~thawan], would you have time to generate a patch for 3.4.6?

PrepRequestProcessor abort multi-operation incorrectly
Key: ZOOKEEPER-1624
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624
Project: ZooKeeper
Issue Type: Bug
Components: server
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
Priority: Critical
Labels: zk-review
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch

We found this issue when trying to issue multiple instances of the following multi-op concurrently:

multi {
1. create sequential node /a-
2. create node /b
}

The expected result is that only the first multi-op request succeeds and the rest of the requests fail because /b already exists.

However, the reported result is that the subsequent multi-ops failed because the sequential node creation failed, which should not be possible. Below are the return codes for each sub-op when issuing 3 instances of the above multi-op asynchronously:

1. ZOK, ZOK
2. ZOK, ZNODEEXISTS,
3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY,

After I added more debug logging, the cause turned out to be that PrepRequestProcessor rolls back outstandingChanges of the second multi-op incorrectly, causing sequential node name generation to be incorrect. Below are the sequential node names generated by PrepRequestProcessor:

1. create /a-0001
2. create /a-0003
3. create /a-0001

The bug is in the getPendingChanges() method. It fails to copy the ChangeRecord for the parent node (/), so rollbackPendingChanges() cannot restore the right previous change record of the parent node when aborting the second multi-op.

The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure whether there are other implications.

--
This message was sent by Atlassian JIRA (v6.1#6144)
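To see why the parent's record has to be part of what gets saved before a multi-op runs, consider the toy model below. It is a deliberately simplified sketch with hypothetical names (MultiRollbackSketch, runMulti, snapshotParent), not the real PrepRequestProcessor, and it does not reproduce the exact /a-0001, /a-0003, /a-0001 sequence from the report. It only shows that an aborted multi-op leaves the parent's sequence counter in the wrong state when the parent was left out of the saved records.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of save/rollback around a multi-op. NOT the real ZooKeeper logic.
final class MultiRollbackSketch {
    // Pending (uncommitted) sequence counter per parent path.
    private final Map<String, Integer> pendingCounter = new HashMap<>();
    // Pending (uncommitted) created paths.
    private final Set<String> pendingNodes = new HashSet<>();

    MultiRollbackSketch() {
        pendingCounter.put("/", 0);
    }

    /**
     * Runs "create sequential /a-, then create /b" as a single multi-op.
     *
     * @param snapshotParent whether the parent's record is saved for rollback
     * @return the sequential name generated, even if the multi-op was aborted
     */
    String runMulti(boolean snapshotParent) {
        Integer savedCounter = snapshotParent ? pendingCounter.get("/") : null;

        // Sub-op 1: generate the sequential name and bump the parent's counter.
        int seq = pendingCounter.get("/");
        String name = String.format("/a-%04d", seq);
        pendingCounter.put("/", seq + 1);
        pendingNodes.add(name);

        // Sub-op 2: fails if /b is already pending, which aborts the multi-op.
        if (!pendingNodes.add("/b")) {
            pendingNodes.remove(name);
            if (savedCounter != null) {
                pendingCounter.put("/", savedCounter); // restore the parent
            }
        }
        return name;
    }

    public static void main(String[] args) {
        MultiRollbackSketch bad = new MultiRollbackSketch();
        // Without the parent in the saved records, aborted attempts still
        // consume sequence numbers.
        System.out.println(bad.runMulti(false) + " " + bad.runMulti(false) + " " + bad.runMulti(false));
    }
}
```

With snapshotParent set to true, the second and third calls generate the same name because the counter is restored after each abort; with false, every aborted call advances the counter.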
[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789361#comment-13789361 ]

Patrick Hunt commented on ZOOKEEPER-1742:

Does this mean that we can't compile the c client on macos? If so we should increase the priority IMO.

make check doesn't work on macos
Key: ZOOKEEPER-1742
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
Project: ZooKeeper
Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
Fix For: 3.4.6, 3.5.0

There are two problems I have spotted when running make check with the C client. First, it complains that the sleep call is not defined in two test files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. Including unistd.h works. The second problem is with linker options. It complains that --wrap is not a valid option. I'm not sure how to deal with this one yet, since I'm not sure why we are using it.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789362#comment-13789362 ]

Flavio Junqueira commented on ZOOKEEPER-1742:

We can build the C client, but we can't run tests. At least I couldn't make it work on my computer, but I haven't had the time to look into it again.

make check doesn't work on macos
Key: ZOOKEEPER-1742
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
Project: ZooKeeper
Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
Fix For: 3.4.6, 3.5.0

There are two problems I have spotted when running make check with the C client. First, it complains that the sleep call is not defined in two test files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. Including unistd.h works. The second problem is with linker options. It complains that --wrap is not a valid option. I'm not sure how to deal with this one yet, since I'm not sure why we are using it.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1002) The Barrier sample code should create a EPHEMERAL znode instead of EPHEMERAL_SEQUENTIAL znode
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-1002: Assignee: Ching-Shen Chen The Barrier sample code should create a EPHEMERAL znode instead of EPHEMERAL_SEQUENTIAL znode - Key: ZOOKEEPER-1002 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1002 Project: ZooKeeper Issue Type: Bug Components: documentation Affects Versions: 3.3.2 Reporter: Ching-Shen Chen Assignee: Ching-Shen Chen Priority: Minor Labels: documentation Fix For: 3.4.6, 3.5.0 Attachments: zookeeper-1002.patch Please see the Barrier sample code from ZooKeeper Tutorial(http://zookeeper.apache.org/doc/r3.3.1/zookeeperTutorial.html#sc_barriers), that should enable a group of processes to synchronize the beginning and the end of a computation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13598463#comment-13598463 ] Patrick Hunt edited comment on ZOOKEEPER-1382 at 10/8/13 4:41 PM: -- I've opened a bug for this some time ago: ZOOKEEPER-1629 perhaps we should disable this test for now ? for example, if one of the Java tests fails, the C tests don't run at all. was (Author: shralex): I've opened a bug for this some time ago: https://issues.apache.org/jira/browse/ZOOKEEPER-1629 perhaps we should disable this test for now ? for example, if one of the Java tests fails, the C tests don't run at all. Zookeeper server holds onto dead/expired session ids in the watch data structures - Key: ZOOKEEPER-1382 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Critical Fix For: 3.4.6 Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch I've observed that zookeeper server holds onto expired session ids in the watcher data structures. The result is the wchp command reports session ids that cannot be found through cons/dump and those expired session ids sit there maybe until the server is restarted. 
Here are snippets from the client and the server logs that lead to this state, for one particular session id 0x134485fd7bcb26f - There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 and I'm using ZkClient to connect to the cluster From the application log - application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection On the leader zk, 225 - zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f On the server, the client was initially connected to, 223 - zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f Here are the log snippets from 226, which is the server, the client reconnected to, before getting session expired event - 2012-01-27 
09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367 2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired 2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f wchp output from 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l 3 wchp output from 223, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l 0 cons output from 223 and 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l 0 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l 0 So, what seems to have happened is that the client was able to re-register the watches on the new server (226), after it got disconnected from 223, inspite of having an expired session id. In NIOServerCnxn, I saw that after suspecting that a
Failed: ZOOKEEPER-1105 PreCommit Build #1657
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1105 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1657/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 274724 lines...] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12546985/ZOOKEEPER-1105v1.patch [exec] against trunk revision 1530166. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 core tests. The patch failed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1657//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1657//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1657//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 20eb47580074efbbe210a32c059e0158badaf0f9 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 2 Total time: 27 minutes 34 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1105 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-1105) c client zookeeper_close not send CLOSE_OP request to server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789368#comment-13789368 ] Hadoop QA commented on ZOOKEEPER-1105: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12546985/ZOOKEEPER-1105v1.patch against trunk revision 1530166. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1657//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1657//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1657//console This message is automatically generated. 
c client zookeeper_close not send CLOSE_OP request to server Key: ZOOKEEPER-1105 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1105 Project: ZooKeeper Issue Type: Bug Components: c client Affects Versions: 3.3.2, 3.4.3 Reporter: jiang guangran Assignee: lincoln.lee Fix For: 3.5.0 Attachments: zklog.txt, zktest.c, zktest.java, ZOOKEEPER-1105.patch, ZOOKEEPER-1105v1.patch In the zookeeper_close function, adaptor_finish is called before the CLOSE_OP request is sent to the server, so the CLOSE_OP request cannot be sent. As a result, the server's zookeeper.log contains many entries like these: 2011-06-22 00:23:02,323 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x1305970d66d2224, likely client has closed socket 2011-06-22 00:23:02,324 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /10.250.8.123:60257 which had sessionid 0x1305970d66d2224 2011-06-22 00:23:02,325 - ERROR [CommitProcessor:1:NIOServerCnxn@445] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) The Java client does not have this problem. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789374#comment-13789374 ] Flavio Junqueira commented on ZOOKEEPER-1732: - One thing I'm not happy about in your patch is that you use zero for don't-care values. For readability, I'd rather have different method calls or constants that reflect the fact that we are not taking those values into account. Adding comments to the code explaining what's going on sounds like a good thing to do. ZooKeeper server unable to join established ensemble Key: ZOOKEEPER-1732 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.4.5 Environment: Windows 7, Java 1.7 Reporter: Germán Blanco Assignee: Germán Blanco Priority: Blocker Fix For: 3.4.6, 3.5.0 Attachments: CREATE_INCONSISTENCIES_patch.txt, zklog.tar.gz, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch I have a test in which I do a rolling restart of three ZooKeeper servers and it was failing from time to time. I ran the tests in a loop until the failure came out and it seems that at some point one of the servers is unable to join the ensemble formed by the other two. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789376#comment-13789376 ] Camille Fournier commented on ZOOKEEPER-1624: - The problem is that ZOOKEEPER-1572 is necessary for the test he wrote for this. Do we want to push that into 3.4.6? PrepRequestProcessor abort multi-operation incorrectly -- Key: ZOOKEEPER-1624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Critical Labels: zk-review Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch We found this issue when trying to issue multiple instances of the following multi-op concurrently: multi { 1. create sequential node /a- 2. create node /b } The expected result is that only the first multi-op request should succeed and the rest should fail because /b already exists. However, the reported result is that a subsequent multi-op failed because the sequential node creation failed, which should not be possible. Below are the return codes for each sub-op when issuing 3 instances of the above multi-op asynchronously: 1. ZOK, ZOK 2. ZOK, ZNODEEXISTS, 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY, After I added more debug logging, the cause turned out to be that PrepRequestProcessor rolls back the outstandingChanges of the second multi-op incorrectly, causing sequential node name generation to be incorrect. Below are the sequential node names generated by PrepRequestProcessor: 1. create /a-0001 2. create /a-0003 3. create /a-0001 The bug is in the getPendingChanges() method. It fails to copy the ChangeRecord for the parent node (/). 
So rollbackPendingChanges() cannot restore the right previous change record of the parent node when aborting the second multi-op. The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure whether there are other implications. -- This message was sent by Atlassian JIRA (v6.1#6144)
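The rollback problem described in ZOOKEEPER-1624 can be illustrated with a minimal sketch. This is not the server's code: `ChangeRecordSketch`, `MultiRollbackSketch`, and every method name below are hypothetical stand-ins, and the sketch assumes the sequential suffix is derived from a counter held in the parent's pending record. The point it shows is that the snapshot taken before a multi-op must include the parent of each touched path, otherwise an abort leaves the parent's counter advanced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a per-path pending change record.
final class ChangeRecordSketch {
    final String path;
    final int childCount; // assumed to drive the next sequential suffix

    ChangeRecordSketch(String path, int childCount) {
        this.path = path;
        this.childCount = childCount;
    }
}

final class MultiRollbackSketch {
    // path -> latest pending (not yet committed) record
    private final Map<String, ChangeRecordSketch> outstanding = new HashMap<>();

    MultiRollbackSketch() {
        outstanding.put("/", new ChangeRecordSketch("/", 0));
    }

    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    // Save the records a multi-op may modify: each path AND its parent.
    // Leaving out the parent is the omission the issue describes.
    Map<String, ChangeRecordSketch> snapshot(String... paths) {
        Map<String, ChangeRecordSketch> saved = new HashMap<>();
        for (String p : paths) {
            saved.put(p, outstanding.get(p));                     // may be null
            saved.put(parentOf(p), outstanding.get(parentOf(p))); // may be null
        }
        return saved;
    }

    // Generate a sequential name and advance the parent's pending counter.
    String createSequential(String prefix) {
        String parent = parentOf(prefix);
        ChangeRecordSketch rec = outstanding.get(parent);
        int next = (rec == null) ? 0 : rec.childCount;
        outstanding.put(parent, new ChangeRecordSketch(parent, next + 1));
        return String.format("%s%010d", prefix, next);
    }

    // Restore every saved record; paths that had no record are removed again.
    void rollback(Map<String, ChangeRecordSketch> saved) {
        for (Map.Entry<String, ChangeRecordSketch> e : saved.entrySet()) {
            if (e.getValue() == null) {
                outstanding.remove(e.getKey());
            } else {
                outstanding.put(e.getKey(), e.getValue());
            }
        }
    }
}
```

With the parent included in the snapshot, an aborted multi-op and the request that follows it are offered the same sequential name, which is the behaviour the report expects.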
[jira] [Commented] (ZOOKEEPER-1002) The Barrier sample code should create a EPHEMERAL znode instead of EPHEMERAL_SEQUENTIAL znode
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789384#comment-13789384 ] Flavio Junqueira commented on ZOOKEEPER-1002: - I don't understand the rationale for this patch. Could you please describe what you're trying to achieve, [~chingshen]? The Barrier sample code should create a EPHEMERAL znode instead of EPHEMERAL_SEQUENTIAL znode - Key: ZOOKEEPER-1002 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1002 Project: ZooKeeper Issue Type: Bug Components: documentation Affects Versions: 3.3.2 Reporter: Ching-Shen Chen Assignee: Ching-Shen Chen Priority: Minor Labels: documentation Fix For: 3.4.6, 3.5.0 Attachments: zookeeper-1002.patch Please see the Barrier sample code from ZooKeeper Tutorial(http://zookeeper.apache.org/doc/r3.3.1/zookeeperTutorial.html#sc_barriers), that should enable a group of processes to synchronize the beginning and the end of a computation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789388#comment-13789388 ] Camille Fournier commented on ZOOKEEPER-1624: - Right. So should we push the patch without the Java test in 3.4.6? PrepRequestProcessor abort multi-operation incorrectly -- Key: ZOOKEEPER-1624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Critical Labels: zk-review Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789394#comment-13789394 ] Flavio Junqueira commented on ZOOKEEPER-1777: - I don't understand why we are not truncating. As I explained before this is what I would expect ZooKeeper to do. Missing ephemeral nodes in one of the members of the ensemble - Key: ZOOKEEPER-1777 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777 Project: ZooKeeper Issue Type: Bug Components: quorum Affects Versions: 3.4.5 Environment: Linux, Java 1.7 Reporter: Germán Blanco Assignee: Germán Blanco Priority: Blocker Fix For: 3.4.6, 3.5.0 Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz In a 3-servers ensemble, one of the followers doesn't see part of the ephemeral nodes that are present in the leader and the other follower. The 8 missing nodes in the follower that is not ok were created in the end of epoch 1, the ensemble is running in epoch 2. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1525) Plumb ZooKeeperServer object into auth plugins
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789400#comment-13789400 ] Hadoop QA commented on ZOOKEEPER-1525: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539665/ZOOKEEPER-1525.patch against trunk revision 1530166. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1658//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1658//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1658//console This message is automatically generated. Plumb ZooKeeperServer object into auth plugins -- Key: ZOOKEEPER-1525 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1525 Project: ZooKeeper Issue Type: Improvement Reporter: Warren Turkal Assignee: Patrick Hunt Fix For: 3.5.0 Attachments: ZOOKEEPER-1525.patch I want to plumb the ZooKeeperServer object into the auth plugins so that I can store authentication data in zookeeper itself. 
With access to the ZooKeeperServer object, I also have access to the ZKDatabase and can look up entries in the local copy of the zookeeper data. In order to implement this, I make sure that a ZooKeeperServer instance is passed in to the ProviderRegistry.initialize() method. Then initialize() will try to find a constructor for the AuthenticationProvider that takes a ZooKeeperServer instance. If the constructor is found, it will be used. Otherwise, initialize() will look for a constructor that takes no arguments and use that instead. -- This message was sent by Atlassian JIRA (v6.1#6144)
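The constructor lookup described for ProviderRegistry.initialize() in ZOOKEEPER-1525 can be sketched with plain reflection. This is only an illustration of the fallback order under stated assumptions, not the patch itself: `ServerHandleSketch` stands in for ZooKeeperServer, `ProviderSketch` for AuthenticationProvider, and `ProviderLoaderSketch.instantiate` is a hypothetical helper.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for the ZooKeeperServer object.
final class ServerHandleSketch { }

// Hypothetical stand-in for the AuthenticationProvider interface.
interface ProviderSketch {
    String describe();
}

// A provider that wants the server handle.
final class ServerAwareProvider implements ProviderSketch {
    private final ServerHandleSketch server;

    public ServerAwareProvider(ServerHandleSketch server) {
        this.server = server;
    }

    public String describe() {
        return "server-aware";
    }
}

// A provider that only has a no-argument constructor.
final class PlainProvider implements ProviderSketch {
    public PlainProvider() { }

    public String describe() {
        return "no-arg";
    }
}

final class ProviderLoaderSketch {
    // Prefer a constructor taking the server handle; otherwise fall back
    // to the no-argument constructor.
    static ProviderSketch instantiate(Class<? extends ProviderSketch> cls,
                                      ServerHandleSketch server) {
        try {
            try {
                Constructor<? extends ProviderSketch> withServer =
                        cls.getConstructor(ServerHandleSketch.class);
                return withServer.newInstance(server);
            } catch (NoSuchMethodException noServerCtor) {
                return cls.getConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            // Neither constructor is usable, or construction itself failed.
            throw new IllegalStateException("cannot instantiate " + cls.getName(), e);
        }
    }
}
```

Trying the richer constructor first keeps existing no-argument providers working unchanged, which matches the fallback order given in the description.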
Failed: ZOOKEEPER-1525 PreCommit Build #1658
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1525 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1658/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 254712 lines...] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12539665/ZOOKEEPER-1525.patch [exec] against trunk revision 1530166. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 core tests. The patch failed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1658//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1658//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1658//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 55a06cd9f28817357a297b2df264d7f2f370adce logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 2 Total time: 27 minutes 14 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1525 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 5 tests failed. REGRESSION: org.apache.zookeeper.test.SaslAuthDesignatedClientTest.testAuth Error Message: test failed :org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /path1 Stack Trace: junit.framework.AssertionFailedError: test failed :org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /path1 at org.apache.zookeeper.test.SaslAuthDesignatedClientTest.testAuth(SaslAuthDesignatedClientTest.java:80) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) REGRESSION: org.apache.zookeeper.test.SaslAuthDesignatedClientTest.testReadAccessUser Error Message: Unable to create znode Stack Trace: junit.framework.AssertionFailedError: Unable to create znode at org.apache.zookeeper.test.SaslAuthDesignatedClientTest.testReadAccessUser(SaslAuthDesignatedClientTest.java:120) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) REGRESSION: org.apache.zookeeper.test.SaslAuthDesignatedServerTest.testAuth Error Message: test failed :org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /path1 Stack Trace: junit.framework.AssertionFailedError: test failed :org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /path1 at
[jira] [Commented] (ZOOKEEPER-1756) zookeeper_interest() in C client can return a timeval of 0
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789484#comment-13789484 ] Eric Lindvall commented on ZOOKEEPER-1756: -- The only way I was able to repro it was with a test harness I wrote to replicate network failures, and I was only able to identify the problem by an excess of log messages. Here's the code I had used: https://github.com/zk-ruby/zookeeper/blob/master/zoomonkey/zoomonkey.rb zookeeper_interest() in C client can return a timeval of 0 -- Key: ZOOKEEPER-1756 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1756 Project: ZooKeeper Issue Type: Bug Components: c client Affects Versions: 3.3.4, 3.4.5 Reporter: Eric Lindvall Assignee: Eric Lindvall Fix For: 3.4.6, 3.5.0 Attachments: 0001-Ensure-send_to-is-positive.patch, 0001-Ensure-send_to-is-positive.patch, ZOOKEEPER-1756-br34.patch, ZOOKEEPER-1756.patch, zookeeper-3.4.5-send_to-fix.patch If the client is connected to a zookeeper server that has hung while there is an outstanding request, zookeeper_interest() can return a timeval of 0 because send_to will be negative. -- This message was sent by Atlassian JIRA (v6.1#6144)
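The arithmetic behind ZOOKEEPER-1756 can be sketched as follows. This is written in Java for consistency with the other examples here and does not reproduce the C client's code: `InterestTimeoutSketch`, `MIN_WAIT_MS`, and both method names are hypothetical, and the 1 ms floor is an assumed value. It only shows how a send deadline that has already passed yields a negative remainder, and how clamping to a positive floor avoids handing the caller a zero wait.

```java
final class InterestTimeoutSketch {
    // Hypothetical floor; any small positive value illustrates the idea.
    static final long MIN_WAIT_MS = 1;

    // Time left until the send deadline; negative once the deadline has passed.
    static long remainingMillis(long sendDeadlineMs, long nowMs) {
        return sendDeadlineMs - nowMs;
    }

    // Never report a wait below the floor, so a caller that polls with this
    // value does not spin without sleeping.
    static long waitMillis(long sendDeadlineMs, long nowMs) {
        return Math.max(MIN_WAIT_MS, remainingMillis(sendDeadlineMs, nowMs));
    }
}
```

A caller that feeds the result straight into a poll or select loop would otherwise return immediately on every iteration, which is consistent with the excess of log messages mentioned in the comment above.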
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789627#comment-13789627 ] Patrick Hunt commented on ZOOKEEPER-1624: - Can a test not be created that meets the criteria? PrepRequestProcessor abort multi-operation incorrectly -- Key: ZOOKEEPER-1624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Critical Labels: zk-review Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1784) Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789646#comment-13789646 ] Alexander Shraer commented on ZOOKEEPER-1784: - You're right - it does look like a typo. Good catch! If you'd like, please feel free to assign to yourself and submit a patch. Thanks, Alex Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus Key: ZOOKEEPER-1784 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1784 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Alexander Shraer If you look at Learner#syncWithLeader:
{noformat}
while (self.isRunning()) {
    readPacket(qp);
    switch(qp.getType()) {
    ...
    case Leader.INFORM:
    case Leader.INFORMANDACTIVATE:
        PacketInFlight packet = new PacketInFlight();
        packet.hdr = new TxnHeader();
        if (qp.getType() == Leader.COMMITANDACTIVATE) {
{noformat}
I guess qp.getType() == Leader.COMMITANDACTIVATE is a typo that should read qp.getType() == Leader.INFORMANDACTIVATE. Assigning to Alexander for now since this is part of ZOOKEEPER-107. -- This message was sent by Atlassian JIRA (v6.1#6144)
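Why the condition quoted in ZOOKEEPER-1784 looks like a typo can be shown with a small stand-alone sketch. The enum and both methods below are hypothetical and use none of the real Leader constants or packet classes. Inside a branch that is only entered for INFORM or INFORMANDACTIVATE, a comparison against COMMITANDACTIVATE can never be true, so the activation path is dead code until the comparison names INFORMANDACTIVATE.

```java
// Hypothetical packet types; the real values live in Leader and are not used here.
enum PacketTypeSketch { INFORM, INFORMANDACTIVATE, COMMITANDACTIVATE, OTHER }

final class SyncBranchSketch {
    // Mirrors the shape of the quoted code: the inner test can never match.
    static String handleAsWritten(PacketTypeSketch type) {
        switch (type) {
            case INFORM:
            case INFORMANDACTIVATE:
                if (type == PacketTypeSketch.COMMITANDACTIVATE) {
                    return "activate"; // unreachable from these two cases
                }
                return "inform";
            default:
                return "other";
        }
    }

    // Same shape with the comparison the report proposes.
    static String handleAsProposed(PacketTypeSketch type) {
        switch (type) {
            case INFORM:
            case INFORMANDACTIVATE:
                if (type == PacketTypeSketch.INFORMANDACTIVATE) {
                    return "activate";
                }
                return "inform";
            default:
                return "other";
        }
    }
}
```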
[jira] [Commented] (ZOOKEEPER-1639) zk.getZKDatabase().deserializeSnapshot adds new system znodes instead of replacing existing ones
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789660#comment-13789660 ] Alexander Shraer commented on ZOOKEEPER-1639: - The context is that I was trying to add a new system znode (for holding the configuration) and couldn't understand why the new node is initially there but suddenly disappears! The reason turned out to be that the node I attached it to is not the one being used after the call to deserializeSnapshot. I don't think it affects the users - just an internal cleanup issue. zk.getZKDatabase().deserializeSnapshot adds new system znodes instead of replacing existing ones Key: ZOOKEEPER-1639 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1639 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.4.5 Reporter: Alexander Shraer Before the call to zk.getZKDatabase().deserializeSnapshot in Learner.java, zk.getZKDatabase().getDataTree().getNode("/zookeeper") == zk.getZKDatabase().getDataTree().procDataNode, which means that this is the same znode, as it should be. However, after this call, they are not equal. The node actually being used in client operations is zk.getZKDatabase().getDataTree().getNode("/zookeeper"), but the other old node procDataNode is still there and not replaced (in fact it is a final field). -- This message was sent by Atlassian JIRA (v6.1#6144)
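The identity mismatch described in ZOOKEEPER-1639 can be reproduced in miniature. The sketch assumes that loading a snapshot repopulates the path map with freshly built node objects; `NodeSketch`, `TreeSketch`, and `restoreFromSnapshot` are hypothetical names, not the DataTree API. A final field captured at construction keeps pointing at the original object, while lookups by path return the replacement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical znode stand-in.
final class NodeSketch {
    final String origin;

    NodeSketch(String origin) {
        this.origin = origin;
    }
}

final class TreeSketch {
    private final Map<String, NodeSketch> nodes = new HashMap<>();

    // Captured once at construction, like the final field in the report.
    final NodeSketch procNode = new NodeSketch("constructor");

    TreeSketch() {
        nodes.put("/zookeeper", procNode);
    }

    NodeSketch getNode(String path) {
        return nodes.get(path);
    }

    // Assumed behaviour: the snapshot's node replaces the map entry,
    // but nothing updates the cached final field.
    void restoreFromSnapshot(String path, NodeSketch fromSnapshot) {
        nodes.put(path, fromSnapshot);
    }
}
```

Anything attached to `procNode` after a restore is invisible to path lookups, which matches the disappearing system znode described in the comment.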
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789664#comment-13789664 ] Flavio Junqueira commented on ZOOKEEPER-1624: - I understand that we typically don't add new features to bug fix releases, but I don't really see a problem with having ZOOKEEPER-1572 in b3.4; it is not really a whole new feature. PrepRequestProcessor abort multi-operation incorrectly -- Key: ZOOKEEPER-1624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Critical Labels: zk-review Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1699) Leader should timeout and give up leadership when losing quorum of last proposed configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Shraer updated ZOOKEEPER-1699: Priority: Blocker (was: Major) Leader should timeout and give up leadership when losing quorum of last proposed configuration -- Key: ZOOKEEPER-1699 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1699 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Alexander Shraer Priority: Blocker Fix For: 3.5.0 A leader gives up leadership when losing a quorum of the current configuration. This doesn't take into account any proposed configuration. So, if a reconfig operation is in progress and a quorum of the new configuration is not responsive, the leader will just get stuck waiting for it to ACK the reconfig operation, and will never timeout. -- This message was sent by Atlassian JIRA (v6.1#6144)
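The check that ZOOKEEPER-1699 asks for can be sketched as a predicate over both configurations. This is a simplified illustration that assumes plain majority quorums and identifies servers by numeric id; `QuorumCheckSketch` and its methods are hypothetical and do not correspond to the server's quorum verifier classes. The leader keeps leading only while it hears from a majority of the current configuration and, if a reconfig is in flight, from a majority of the proposed one as well.

```java
import java.util.Set;

final class QuorumCheckSketch {
    // Simple majority: strictly more than half of the members are responsive.
    static boolean hasMajority(Set<Long> members, Set<Long> responsive) {
        long alive = members.stream().filter(responsive::contains).count();
        return alive > members.size() / 2;
    }

    // proposed == null means no reconfiguration is in progress.
    static boolean shouldKeepLeading(Set<Long> current, Set<Long> proposed,
                                     Set<Long> responsive) {
        boolean currentOk = hasMajority(current, responsive);
        boolean proposedOk = proposed == null || hasMajority(proposed, responsive);
        return currentOk && proposedOk;
    }
}
```

Evaluating this predicate on the same timer that already guards the current configuration would let the leader time out instead of waiting indefinitely for ACKs from an unresponsive proposed quorum.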
[jira] [Commented] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789694#comment-13789694 ] Patrick Hunt commented on ZOOKEEPER-1777: - fwiw I upped the priority in order that we triage this issue appropriately. If we've done so and we feel confident that this is not a blocker, well don't consider me a blocker to making progress. (e.g. downgrading again and/or moving out to a future release, etc...) Missing ephemeral nodes in one of the members of the ensemble - Key: ZOOKEEPER-1777 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777 Project: ZooKeeper Issue Type: Bug Components: quorum Affects Versions: 3.4.5 Environment: Linux, Java 1.7 Reporter: Germán Blanco Assignee: Germán Blanco Priority: Blocker Fix For: 3.4.6, 3.5.0 Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789692#comment-13789692 ] Thawan Kooburat commented on ZOOKEEPER-1624: As I already commented earlier, the current Java test doesn't actually catch the bug due to a timing issue. I guess I will have to rewrite it to test PrepRequestProcessor directly (which is probably not going to rely on ZOOKEEPER-1572). If you want to commit this now, the patch itself has a proper and reliable (at least on my box) unit test in C. Our test infrastructure does run the C unit tests and report the results, right? I agree with Camille that it would be nice to have a Java test for server-side functionality, but it isn't strictly needed, right? PrepRequestProcessor abort multi-operation incorrectly -- Key: ZOOKEEPER-1624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Critical Labels: zk-review Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789704#comment-13789704 ] Flavio Junqueira commented on ZOOKEEPER-1777: - I'd like to understand why the truncation is not working, but since we don't actually guarantee correctness in such scenarios, I don't think it should block the release. Again, we can keep working on it until we produce a release candidate, but I'd like to make sure that we agree that it shouldn't block the release when time comes. Missing ephemeral nodes in one of the members of the ensemble - Key: ZOOKEEPER-1777 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777 Project: ZooKeeper Issue Type: Bug Components: quorum Affects Versions: 3.4.5 Environment: Linux, Java 1.7 Reporter: Germán Blanco Assignee: Germán Blanco Priority: Blocker Fix For: 3.4.6, 3.5.0 Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-1777: Priority: Critical (was: Blocker) Missing ephemeral nodes in one of the members of the ensemble - Key: ZOOKEEPER-1777 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777 Project: ZooKeeper Issue Type: Bug Components: quorum Affects Versions: 3.4.5 Environment: Linux, Java 1.7 Reporter: Germán Blanco Assignee: Germán Blanco Priority: Critical Fix For: 3.4.6, 3.5.0 Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1784) Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raul Gutierrez Segales updated ZOOKEEPER-1784: -- Attachment: ZOOKEEPER-1784.patch Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus Key: ZOOKEEPER-1784 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1784 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Attachments: ZOOKEEPER-1784.patch If you look at Learner#syncWithLeader: {noformat} while (self.isRunning()) { readPacket(qp); switch(qp.getType()) { ... case Leader.INFORM: case Leader.INFORMANDACTIVATE: PacketInFlight packet = new PacketInFlight(); packet.hdr = new TxnHeader(); if (qp.getType() == Leader.COMMITANDACTIVATE) { {noformat} I guess qp.getType() == Leader.COMMITANDACTIVATE is a typo that should read qp.getType() == Leader.INFORMANDACTIVATE. Assigning to Alexander for now since this is part of ZOOKEEPER-107. -- This message was sent by Atlassian JIRA (v6.1#6144)
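To make the suspected typo in ZOOKEEPER-1784 concrete, here is a minimal standalone sketch of the branch quoted in the description. It is not the Learner#syncWithLeader code: the class name, the helper methods, and the numeric packet-type values are placeholders chosen for illustration only. It shows that a check for COMMITANDACTIVATE placed inside the INFORM/INFORMANDACTIVATE case can never be true, while the proposed check for INFORMANDACTIVATE can.

```java
public class InformBranchSketch {
    // Placeholder packet-type values, used only in this sketch.
    static final int INFORM = 8;
    static final int COMMITANDACTIVATE = 9;
    static final int INFORMANDACTIVATE = 19;

    /** Mirrors the condition as quoted in the report. */
    static boolean takesActivateBranchAsReported(int type) {
        switch (type) {
            case INFORM:
            case INFORMANDACTIVATE:
                // Inside this case, type is INFORM or INFORMANDACTIVATE,
                // so this comparison is always false.
                return type == COMMITANDACTIVATE;
            default:
                return false;
        }
    }

    /** Mirrors the condition the reporter suggests instead. */
    static boolean takesActivateBranchAsProposed(int type) {
        switch (type) {
            case INFORM:
            case INFORMANDACTIVATE:
                return type == INFORMANDACTIVATE;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(takesActivateBranchAsReported(INFORMANDACTIVATE)); // prints false
        System.out.println(takesActivateBranchAsProposed(INFORMANDACTIVATE)); // prints true
    }
}
```

A test for this code path would presumably need to deliver an INFORMANDACTIVATE packet during sync and assert that the activation logic runs.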
[jira] [Assigned] (ZOOKEEPER-1784) Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raul Gutierrez Segales reassigned ZOOKEEPER-1784: - Assignee: Raul Gutierrez Segales (was: Alexander Shraer) Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus Key: ZOOKEEPER-1784 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1784 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Attachments: ZOOKEEPER-1784.patch If you look at Learner#syncWithLeader: {noformat} while (self.isRunning()) { readPacket(qp); switch(qp.getType()) { ... case Leader.INFORM: case Leader.INFORMANDACTIVATE: PacketInFlight packet = new PacketInFlight(); packet.hdr = new TxnHeader(); if (qp.getType() == Leader.COMMITANDACTIVATE) { {noformat} I guess qp.getType() == Leader.COMMITANDACTIVATE is a typo that should read qp.getType() == Leader.INFORMANDACTIVATE. Assigning to Alexander for now since this is part of ZOOKEEPER-107. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (ZOOKEEPER-952) scrub codebase for references to pre-TLP locations.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt resolved ZOOKEEPER-952. Resolution: Not A Problem scrub codebase for references to pre-TLP locations. --- Key: ZOOKEEPER-952 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-952 Project: ZooKeeper Issue Type: Sub-task Reporter: Patrick Hunt Assignee: Mahadev konar The codebase needs to be scrubbed of references to hadoop and old locations (web site, wiki, svn, mailing lists, etc...) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (ZOOKEEPER-951) monthly board reports for first 3 months (then quarterly reports)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt resolved ZOOKEEPER-951. Resolution: Implemented monthly board reports for first 3 months (then quarterly reports) - Key: ZOOKEEPER-951 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-951 Project: ZooKeeper Issue Type: Sub-task Reporter: Patrick Hunt Assignee: Patrick Hunt Board reporting guidelines can be found here: http://apache.org/foundation/board/reporting note that ZOOKEEPER-953 should also be addressed (branding checklist) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (ZOOKEEPER-940) Umbrella JIRA for move to TLP
[ https://issues.apache.org/jira/browse/ZOOKEEPER-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt resolved ZOOKEEPER-940. Resolution: Implemented Umbrella JIRA for move to TLP - Key: ZOOKEEPER-940 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-940 Project: ZooKeeper Issue Type: Task Reporter: Patrick Hunt Assignee: Patrick Hunt This is an umbrella jira for our move to TLP status. Please create subtasks for any issues you find related to the move. Note that INFRA-3228 is now closed, so a number of infra related issues have already been closed. This jira (subs) is for additional issues we need to address. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (ZOOKEEPER-929) hudson qabot incorrectly reporting issues as number 909 when the patch from 908 is the one being tested
[ https://issues.apache.org/jira/browse/ZOOKEEPER-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt resolved ZOOKEEPER-929. Resolution: Cannot Reproduce hudson qabot incorrectly reporting issues as number 909 when the patch from 908 is the one being tested --- Key: ZOOKEEPER-929 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-929 Project: ZooKeeper Issue Type: Bug Components: build Reporter: Patrick Hunt Assignee: Patrick Hunt Hi Nigel can you take a look at this? Following you'll see the email I got, notice that the patch is patch 908, however if you look at the hudson page it's linked to the change is documented as 909 patch file applied https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25/changes I looked at both jiras ZOOKEEPER-908 and ZOOKEEPER-909 both of these look good (the right names on patches) and qabot actually updated 908 with the comment (failure). However the change is listed as 909 which is wrong. [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12459361/ZOOKEEPER-908.patch [exec] against trunk revision 1033770. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. 
The patch passed contrib unit tests. [exec] [exec] Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//testReport/ [exec] Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//console [exec] [exec] This message is automatically generated. [exec] -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789762#comment-13789762 ] Patrick Hunt commented on ZOOKEEPER-900: [~vishalmlst] [~mahadev] what's the status on this? Should we close in preference to another jira? (as Mahadev suggested) FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: ZooKeeper Issue Type: Bug Reporter: Vishal Kher Assignee: Vishal Kher Priority: Critical Fix For: 3.5.0 Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2 From earlier email exchanges: 1. Blocking connects and accepts: a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires, it interrupts the AsyncConnect() thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in SenderWorker and RecvWorker threads since they block IO to the respective peer. b) The blocking IO problem is not just restricted to connectOne(), but also in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get peer's info (s.read(msgBuffer)). 
Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). Also the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized; otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify. Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822 -- This message was sent by Atlassian JIRA (v6.1#6144)
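The Selector-based approach mentioned in the ZOOKEEPER-900 description can be sketched as follows. This is only an illustration of a bounded, non-blocking connect with java.nio; it is not the QuorumCnxManager code and not the attached patch. The class name BoundedConnectSketch and the method connectWithTimeout are hypothetical names introduced here.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class BoundedConnectSketch {
    /**
     * Hypothetical helper: returns a connected channel, or null if the
     * connect did not complete within timeoutMs. Connection errors
     * (for example, connection refused) surface as IOException.
     */
    static SocketChannel connectWithTimeout(InetSocketAddress addr, long timeoutMs)
            throws IOException {
        SocketChannel ch = SocketChannel.open();
        Selector selector = Selector.open();
        try {
            ch.configureBlocking(false);
            if (ch.connect(addr)) {
                return ch; // connected immediately
            }
            ch.register(selector, SelectionKey.OP_CONNECT);
            long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
            while (true) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    break; // upper bound reached
                }
                if (selector.select(remainingMs) > 0) {
                    selector.selectedKeys().clear();
                    if (ch.finishConnect()) {
                        return ch;
                    }
                }
            }
            ch.close();
            return null;
        } catch (IOException e) {
            ch.close();
            throw e;
        } finally {
            selector.close(); // the channel itself stays open if it was returned
        }
    }

    public static void main(String[] args) throws IOException {
        // Local listener on an ephemeral port, standing in for a peer.
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();
            SocketChannel ch = connectWithTimeout(addr, 1000);
            System.out.println(ch != null && ch.isConnected());
            if (ch != null) {
                ch.close();
            }
        }
    }
}
```

Unlike the timer-plus-interrupt workaround described in the issue, no extra thread is needed here; the calling thread waits at most timeoutMs. A full fix would presumably also multiplex accepts and the initial handshake read on the same Selector so the Listener never blocks on a single peer.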
[jira] [Commented] (ZOOKEEPER-1784) Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789794#comment-13789794 ] Hadoop QA commented on ZOOKEEPER-1784: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607443/ZOOKEEPER-1784.patch against trunk revision 1530166. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1659//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1659//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1659//console This message is automatically generated. Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus Key: ZOOKEEPER-1784 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1784 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Attachments: ZOOKEEPER-1784.patch If you look at Learner#syncWithLeader: {noformat} while (self.isRunning()) { readPacket(qp); switch(qp.getType()) { ... 
case Leader.INFORM: case Leader.INFORMANDACTIVATE: PacketInFlight packet = new PacketInFlight(); packet.hdr = new TxnHeader(); if (qp.getType() == Leader.COMMITANDACTIVATE) { {noformat} I guess qp.getType() == Leader.COMMITANDACTIVATE is a typo that should read qp.getType() == Leader.INFORMANDACTIVATE. Assigning to Alexander for now since this is part of ZOOKEEPER-107. -- This message was sent by Atlassian JIRA (v6.1#6144)
Failed: ZOOKEEPER-1784 PreCommit Build #1659
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1784 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1659/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 257446 lines...] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12607443/ZOOKEEPER-1784.patch [exec] against trunk revision 1530166. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1659//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1659//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1659//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 05b0e721b1defb7b0731fd1856bbf7e42e44c6d4 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1 Total time: 32 minutes 12 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1784 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-1784) Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789799#comment-13789799 ] Raul Gutierrez Segales commented on ZOOKEEPER-1784: --- [~shralex]: so that code path, processing INFORMANDACTIVATE, doesn't have (it seems) a corresponding test case. Should we add one or extend an existing one to cover it? Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus Key: ZOOKEEPER-1784 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1784 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Attachments: ZOOKEEPER-1784.patch If you look at Learner#syncWithLeader: {noformat} while (self.isRunning()) { readPacket(qp); switch(qp.getType()) { ... case Leader.INFORM: case Leader.INFORMANDACTIVATE: PacketInFlight packet = new PacketInFlight(); packet.hdr = new TxnHeader(); if (qp.getType() == Leader.COMMITANDACTIVATE) { {noformat} I guess qp.getType() == Leader.COMMITANDACTIVATE is a typo that should read qp.getType() == Leader.INFORMANDACTIVATE. Assigning to Alexander for now since this is part of ZOOKEEPER-107. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (ZOOKEEPER-1037) Create BookKeeper subproject
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt resolved ZOOKEEPER-1037. - Resolution: Implemented Create BookKeeper subproject Key: ZOOKEEPER-1037 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1037 Project: ZooKeeper Issue Type: Task Components: contrib-bookkeeper, contrib-hedwig Reporter: Benjamin Reed move the hedwig and bookkeeper code to the bookkeeper subproject -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (ZOOKEEPER-1041) get hudson running on bookkeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt resolved ZOOKEEPER-1041. - Resolution: Implemented get hudson running on bookkeeper Key: ZOOKEEPER-1041 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1041 Project: ZooKeeper Issue Type: Sub-task Components: contrib-bookkeeper, contrib-hedwig Reporter: Benjamin Reed setup hudson to run on bookkeeper code -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1729) Add l4w command snap to trigger log rotation and snapshotting
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789806#comment-13789806 ] Patrick Hunt commented on ZOOKEEPER-1729: - Adding to JMX would also make sense imo (less familiar with the security support there though) Add l4w command snap to trigger log rotation and snapshotting Key: ZOOKEEPER-1729 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1729 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Minor The snap command can be used to trigger log rotation and snapshotting on each server. One use case for this command is to make server restart faster by issuing the snap command before restarting the server. This helps when the txnlog is large (due to txn size or number of txns). snap is a blocking command; it returns once the snapshot is written to disk, so it is safe to call it prior to restarting the server. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1729) Add l4w command snap to trigger log rotation and snapshotting
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789804#comment-13789804 ] Patrick Hunt commented on ZOOKEEPER-1729: - I'm fine to add this, but not as a 4lw. 4lw are read-only operations due to the lack of security. Keep in mind 4lw also shares the same port as the client port. We should add this to the Jetty implementation (ZOOKEEPER-1346) which can have proper security/auth constraints. Add l4w command snap to trigger log rotation and snapshotting Key: ZOOKEEPER-1729 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1729 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Minor The snap command can be used to trigger log rotation and snapshotting on each server. One use case for this command is to make server restart faster by issuing the snap command before restarting the server. This helps when the txnlog is large (due to txn size or number of txns). snap is a blocking command; it returns once the snapshot is written to disk, so it is safe to call it prior to restarting the server. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789807#comment-13789807 ] Camille Fournier commented on ZOOKEEPER-1624: - I'm comfortable with pushing the patch for 3.4 without the Java test. PrepRequestProcessor abort multi-operation incorrectly -- Key: ZOOKEEPER-1624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Critical Labels: zk-review Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch We found this issue when issuing multiple instances of the following multi-op concurrently: multi { 1. create sequential node /a- 2. create node /b } The expected result is that only the first multi-op request succeeds and the rest fail because /b already exists. However, the reported result is that the subsequent multi-ops failed because sequential node creation failed, which should not be possible. Below are the return codes for each sub-op when issuing 3 instances of the above multi-op asynchronously: 1. ZOK, ZOK 2. ZOK, ZNODEEXISTS, 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY, After I added more debug logging, the cause turned out to be that PrepRequestProcessor rolls back the outstandingChanges of the second multi-op incorrectly, causing sequential node name generation to be incorrect. Below are the sequential node names generated by PrepRequestProcessor: 1. create /a-0001 2. create /a-0003 3. create /a-0001 The bug is in the getPendingChanges() method. It fails to copy the ChangeRecord for the parent node (/). So rollbackPendingChanges() cannot restore the correct previous change record of the parent node when aborting the second multi-op. The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. 
I am not sure whether there are other implications. -- This message was sent by Atlassian JIRA (v6.1#6144)
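The rollback problem described in ZOOKEEPER-1624 can be illustrated with a toy model. This is not the PrepRequestProcessor code, and it does not reproduce the exact name sequence from the report. Every name in it (MultiRollbackSketch, snapshot, rollback, nextNameAfterAbort) is hypothetical, and the pending state is reduced to a per-parent sequence counter. It only shows why the parent's record has to be part of the state saved before a multi-op, assuming the sequence number is derived from the parent's pending record.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiRollbackSketch {
    // path -> pending sequence counter; a stand-in for a pending change record.
    final Map<String, Integer> outstanding = new HashMap<>();

    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    /** Saves the pending state for the given paths, optionally with their parents. */
    Map<String, Integer> snapshot(List<String> paths, boolean includeParents) {
        Map<String, Integer> saved = new HashMap<>();
        for (String p : paths) {
            saved.put(p, outstanding.get(p)); // null means "no pending record"
            if (includeParents) {
                String parent = parentOf(p);
                saved.put(parent, outstanding.get(parent));
            }
        }
        return saved;
    }

    /** Restores exactly what was saved; anything not saved is left untouched. */
    void rollback(Map<String, Integer> saved) {
        for (Map.Entry<String, Integer> e : saved.entrySet()) {
            if (e.getValue() == null) {
                outstanding.remove(e.getKey());
            } else {
                outstanding.put(e.getKey(), e.getValue());
            }
        }
    }

    /** Generates the next sequential name and bumps the parent's counter. */
    String createSequential(String prefix) {
        String parent = parentOf(prefix);
        int seq = outstanding.getOrDefault(parent, 0);
        outstanding.put(parent, seq + 1);
        return String.format("%s%010d", prefix, seq);
    }

    /** First multi commits, second multi is aborted, then one more create. */
    static String nextNameAfterAbort(boolean includeParents) {
        MultiRollbackSketch s = new MultiRollbackSketch();
        s.createSequential("/a-"); // first multi: /a-0000000000
        Map<String, Integer> saved = s.snapshot(Arrays.asList("/a-", "/b"), includeParents);
        s.createSequential("/a-"); // second multi, sub-op 1
        s.rollback(saved);         // sub-op 2 fails (/b exists), so abort
        return s.createSequential("/a-");
    }

    public static void main(String[] args) {
        System.out.println(nextNameAfterAbort(false)); // prints /a-0000000002
        System.out.println(nextNameAfterAbort(true));  // prints /a-0000000001
    }
}
```

When the parent is left out of the snapshot, the aborted multi-op still leaves its mark on the parent's counter, so later name generation no longer matches the committed state.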
[jira] [Commented] (ZOOKEEPER-1586) tarballs for zkfuse don't compile out of tree
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789810#comment-13789810 ] Patrick Hunt commented on ZOOKEEPER-1586: - Still an issue? Should we close this? (see my recent comment) tarballs for zkfuse don't compile out of tree - Key: ZOOKEEPER-1586 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1586 Project: ZooKeeper Issue Type: Bug Components: contrib-zkfuse Affects Versions: 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Attachments: ZOOKEEPER-1586.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789811#comment-13789811 ] Patrick Hunt commented on ZOOKEEPER-1499: - [~fournc], [~breed] [~shralex] is this still an issue? clientPort config changes not backwards-compatible -- Key: ZOOKEEPER-1499 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Camille Fournier Assignee: Benjamin Reed Priority: Blocker With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1558) Leader should not snapshot uncommitted state
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-1558: Affects Version/s: 3.4.6 Leader should not snapshot uncommitted state Key: ZOOKEEPER-1558 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1558 Project: ZooKeeper Issue Type: Sub-task Components: quorum Affects Versions: 3.4.6 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Priority: Blocker Fix For: 3.4.6 Attachments: ZOOKEEPER-1558.patch, ZOOKEEPER-1558.patch, ZOOKEEPER-1558.patch, ZOOKEEPER-1558.patch Leader currently takes a snapshot when it calls loadData in the beginning of the lead() method. The loaded data, however, may contain uncommitted state. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1430) add maven deploy support to the build
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-1430: Affects Version/s: 3.5.0 3.4.4 add maven deploy support to the build - Key: ZOOKEEPER-1430 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1430 Project: ZooKeeper Issue Type: Task Components: build Affects Versions: 3.4.4, 3.5.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Blocker Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1430-3.4.patch, ZOOKEEPER-1430.patch, ZOOKEEPER-1430.patch, ZOOKEEPER-1430-V1.PATCH, ZOOKEEPER-1430-V2.PATCH Infra is phasing out the current mechanism we use to deploy maven artifacts. We need to move to repository.apache.org (nexus). In particular we need to ensure the following artifacts get published: * zookeeper-3.x.y.jar * zookeeper-3.x.y-sources.jar * zookeeper-3.x.y-tests.jar * zookeeper-3.x.y-javadoc.jar In 3.4.4/3.4.5 we missed the tests jar which caused headaches for downstream users, such as Hadoop. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-871) ClientTest testClientCleanup is failing due to high fd count.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-871: --- Fix Version/s: (was: 3.5.0) ClientTest testClientCleanup is failing due to high fd count. - Key: ZOOKEEPER-871 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-871 Project: ZooKeeper Issue Type: Bug Reporter: Mahadev konar Priority: Blocker The fd count has increased. The tests are repeatedly failing on hudson machines. I think this is probably related to the netty server changes. We have to fix this before we release 3.4. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (ZOOKEEPER-871) ClientTest testClientCleanup is failing due to high fd count.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt resolved ZOOKEEPER-871. Resolution: Cannot Reproduce I don't believe we are seeing this currently. Reopen if it becomes an issue again. ClientTest testClientCleanup is failing due to high fd count. - Key: ZOOKEEPER-871 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-871 Project: ZooKeeper Issue Type: Bug Reporter: Mahadev konar Priority: Blocker Fix For: 3.5.0 The fd count has increased. The tests are repeatedly failing on hudson machines. I think this is probably related to the netty server changes. We have to fix this before we release 3.4. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1784) Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789819#comment-13789819 ] Alexander Shraer commented on ZOOKEEPER-1784: - I actually was just looking at this. The reconfig tests apparently are not testing the COMMITANDACTIVATE or OBSERVEANDACTIVATE paths in syncWithLeader currently (I had a test where a server misses a reconfig and learns about it later, but apparently it's not through this code path). The only test that I see using the INFORMANDACTIVATE path is Zab1_0Test (testNormalObserverRun). For COMMITANDACTIVATE, it's FollowerResyncConcurrencyTest, QuorumTest and Zab1_0Test. We should try to understand how the tests above are activating these paths. Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus Key: ZOOKEEPER-1784 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1784 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Attachments: ZOOKEEPER-1784.patch If you look at Learner#syncWithLeader: {noformat} while (self.isRunning()) { readPacket(qp); switch(qp.getType()) { ... case Leader.INFORM: case Leader.INFORMANDACTIVATE: PacketInFlight packet = new PacketInFlight(); packet.hdr = new TxnHeader(); if (qp.getType() == Leader.COMMITANDACTIVATE) { {noformat} I guess qp.getType() == Leader.COMMITANDACTIVATE is a typo that should read qp.getType() == Leader.INFORMANDACTIVATE. Assigning to Alexander for now since this is part of ZOOKEEPER-107. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789821#comment-13789821 ] Patrick Hunt commented on ZOOKEEPER-1624: - bq. it is not really a whole new feature fwiw a public facing client API change that adds async support to multi is a feature imo. PrepRequestProcessor abort multi-operation incorrectly -- Key: ZOOKEEPER-1624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Critical Labels: zk-review Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch We found this issue when issuing multiple instances of the following multi-op concurrently: multi { 1. create sequential node /a- 2. create node /b } The expected result is that only the first multi-op request succeeds and the rest fail because /b already exists. However, the reported result is that the subsequent multi-ops failed because sequential node creation failed, which should not be possible. Below are the return codes for each sub-op when issuing 3 instances of the above multi-op asynchronously: 1. ZOK, ZOK 2. ZOK, ZNODEEXISTS, 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY, After I added more debug logging, the cause turned out to be that PrepRequestProcessor rolls back the outstandingChanges of the second multi-op incorrectly, causing sequential node name generation to be incorrect. Below are the sequential node names generated by PrepRequestProcessor: 1. create /a-0001 2. create /a-0003 3. create /a-0001 The bug is in the getPendingChanges() method. It fails to copy the ChangeRecord for the parent node (/). 
So rollbackPendingChanges() cannot restore the correct previous change record of the parent node when aborting the second multi-op. The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure whether there are other implications. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1586) tarballs for zkfuse don't compile out of tree
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789827#comment-13789827 ] Raul Gutierrez Segales commented on ZOOKEEPER-1586: --- Yes, I believe it is. Actually there are two issues: a) the tarball is created with missing source files and b) (more importantly) the BUILD path for the C libs is wrong. Should be: {noformat} -AC_CHECK_LIB(zookeeper_mt, main, [ZOOKEEPER_LD=-L${ZOOKEEPER_PATH}/.libs -lzookeeper_mt],,[-L${ZOOKEEPER_PATH}/.libs]) +ZOOKEEPER_BUILD_PATH=${BUILD_PATH}/../../../build/c +AC_CHECK_LIB(zookeeper_mt, main, [ZOOKEEPER_LD=-L${ZOOKEEPER_BUILD_PATH}/.libs -lzookeeper_mt],,[-L${ZOOKEEPER_BUILD_PATH}/.libs]) {noformat} And the third thing would be the configure.ac doesn't state that boost is needed. I'll update the patch. tarballs for zkfuse don't compile out of tree - Key: ZOOKEEPER-1586 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1586 Project: ZooKeeper Issue Type: Bug Components: contrib-zkfuse Affects Versions: 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Attachments: ZOOKEEPER-1586.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1586) tarballs for zkfuse don't compile out of tree
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789832#comment-13789832 ] Patrick Hunt commented on ZOOKEEPER-1586: - Is this an issue in 3.4 branch as well? (affects field says just trunk, please update the jira appropriately (if necessary)) thx. tarballs for zkfuse don't compile out of tree - Key: ZOOKEEPER-1586 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1586 Project: ZooKeeper Issue Type: Bug Components: contrib-zkfuse Affects Versions: 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Attachments: ZOOKEEPER-1586.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789833#comment-13789833 ] Flavio Junqueira commented on ZOOKEEPER-1624: - Given the recent set of comments, I'm not sure it matters, but my point was simply that adding async calls for a feature that already exists is not really adding a whole new feature, just extending the scope of an existing one. For me, the feature is multi. But fine if we can make progress without it. PrepRequestProcessor abort multi-operation incorrectly -- Key: ZOOKEEPER-1624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Critical Labels: zk-review Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch We found this issue when trying to issue multiple instances of the following multi-op concurrently: multi { 1. create sequential node /a- 2. create node /b } The expected result is that only the first multi-op request should succeed and the rest of the requests should fail because /b already exists. However, the reported result is that the subsequent multi-ops failed because sequential node creation failed, which should not be possible. Below are the return codes for each sub-op when issuing 3 instances of the above multi-op asynchronously: 1. ZOK, ZOK 2. ZOK, ZNODEEXISTS, 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY, After adding more debug logging, I found that the cause is that PrepRequestProcessor rolls back the outstandingChanges of the second multi-op incorrectly, causing sequential node name generation to be incorrect. Below are the sequential node names generated by PrepRequestProcessor: 1. create /a-0001 2. create /a-0003 3. create /a-0001 The bug is in the getPendingChanges() method. It fails to copy the ChangeRecord for the parent node (/).
So rollbackPendingChanges() cannot restore the right previous change record of the parent node when aborting the second multi-op. The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure whether there are other implications. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789852#comment-13789852 ] Alexander Shraer commented on ZOOKEEPER-1499: - I tried this: clientPort is being read, but if you specified both clientPort and the port in the new format (after ;) they must be the same. Same for clientPortAddress. But seems like Camille is right in her comment that if you specified the clientPort with the new format you won't be able to connect to it using localhost. I'm not sure if this is a problem but I'm guessing that it may be because previously if you just specified clientPort, the IP was taken as localhost implicitly. Whereas now if you say a:b:c;d, a is taken as the ip for d. To achieve the same as before you can write a:b:c;localhost:d. While trying this I found a corner case missing in zkServer.sh -- if the specification uses the new format but still appears in the static configuration file (backward compatibility), zkServer.sh won't find the port (for example if you say ./bin/zkServer.sh status it will complain). Attached is a small patch for this. clientPort config changes not backwards-compatible -- Key: ZOOKEEPER-1499 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Camille Fournier Assignee: Benjamin Reed Priority: Blocker With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. -- This message was sent by Atlassian JIRA (v6.1#6144)
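The address inference described in the comment above (a:b:c;d versus a:b:c;localhost:d) can be sketched as follows. This is a minimal illustration of the behavior as stated in the comment, not ZooKeeper's parser; the function name client_endpoint is hypothetical, and the sketch assumes the spec contains at most one ';'.

```python
def client_endpoint(server_spec):
    """Return (host, port) for the client part of a server spec, or None.

    Follows the behavior described in the comment: with 'a:b:c;d' the
    server host 'a' is used as the client address, while with
    'a:b:c;localhost:d' the address is given explicitly.
    """
    if ";" not in server_spec:
        return None  # old format: no client port in the server spec
    server_part, client_part = server_spec.split(";", 1)
    server_host = server_part.split(":", 1)[0]
    if ":" in client_part:
        host, port = client_part.rsplit(":", 1)
    else:
        host, port = server_host, client_part
    return host, int(port)
```

For example, client_endpoint("a:2888:3888;2181") yields the server host "a" with port 2181, whereas client_endpoint("a:2888:3888;localhost:2181") yields "localhost" with the same port.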
[jira] [Updated] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Shraer updated ZOOKEEPER-1499: Attachment: zkServersh.patch clientPort config changes not backwards-compatible -- Key: ZOOKEEPER-1499 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Camille Fournier Assignee: Benjamin Reed Priority: Blocker Attachments: zkServersh.patch With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Camille Fournier updated ZOOKEEPER-1624: Attachment: ZOOKEEPER-1624-3.4 patch for 3.4 PrepRequestProcessor abort multi-operation incorrectly -- Key: ZOOKEEPER-1624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Critical Labels: zk-review Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1624-3.4, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch We found this issue when trying to issue multiple instances of the following multi-op concurrently: multi { 1. create sequential node /a- 2. create node /b } The expected result is that only the first multi-op request should succeed and the rest of the requests should fail because /b already exists. However, the reported result is that the subsequent multi-ops failed because sequential node creation failed, which should not be possible. Below are the return codes for each sub-op when issuing 3 instances of the above multi-op asynchronously: 1. ZOK, ZOK 2. ZOK, ZNODEEXISTS, 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY, After adding more debug logging, I found that the cause is that PrepRequestProcessor rolls back the outstandingChanges of the second multi-op incorrectly, causing sequential node name generation to be incorrect. Below are the sequential node names generated by PrepRequestProcessor: 1. create /a-0001 2. create /a-0003 3. create /a-0001 The bug is in the getPendingChanges() method. It fails to copy the ChangeRecord for the parent node (/). So rollbackPendingChanges() cannot restore the right previous change record of the parent node when aborting the second multi-op. The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure whether there are other implications.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-1499: Fix Version/s: 3.5.0 Assignee: Alexander Shraer (was: Benjamin Reed) clientPort config changes not backwards-compatible -- Key: ZOOKEEPER-1499 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Camille Fournier Assignee: Alexander Shraer Priority: Blocker Fix For: 3.5.0 Attachments: zkServersh.patch With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1785) Small fix in zkServer.sh to support new configuration format
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Shraer updated ZOOKEEPER-1785: Attachment: zkServersh.patch Small fix in zkServer.sh to support new configuration format Key: ZOOKEEPER-1785 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1785 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.5.0 Reporter: Alexander Shraer Assignee: Alexander Shraer Priority: Minor Fix For: 3.5.0 Attachments: zkServersh.patch The problem can be reproduced by running a server with the following type of config file:
dataDir=/Users/shralex/zookeeper-test/zookeeper1
syncLimit=2
initLimit=5
tickTime=2000
server.1=localhost:2721:2731:participant;2791
server.2=localhost:2722:2732:participant;2792
and then trying to do zkServer.sh status
Here I specified the servers using the new config format but still used the static config file and didn't include the clientPort key. zkServer.sh already supports the new configuration format, but expects server spec to appear in the dynamic config file if it uses the new format. So in the example above it will not find the client port.
The current logic for executing something like 'zkServer.sh status' is:
1. Look for clientPort keyword in the static config file
2. Look for the client port in the server spec in the dynamic config file
The attached patch adds an intermediate step:
1'. Look for the client port in the server spec in the static config file
-- This message was sent by Atlassian JIRA (v6.1#6144)
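The lookup order described in ZOOKEEPER-1785 (steps 1, 1' and 2) can be sketched as below. This is not the zkServer.sh logic itself, which is a shell script; it is a Python illustration under assumptions: the function names are hypothetical, the config files are passed in as text, and the server's own id (normally taken from the myid file) is passed as server_id.

```python
import re


def _client_port_key(cfg_text):
    """Step 1: look for a 'clientPort=NNNN' line."""
    for line in cfg_text.splitlines():
        m = re.match(r"\s*clientPort\s*=\s*(\d+)\s*$", line)
        if m:
            return int(m.group(1))
    return None


def _port_from_server_spec(cfg_text, server_id):
    """Steps 1' and 2: take the port after ';' in this server's spec line."""
    key = f"server.{server_id}="
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith(key) and ";" in line:
            client_part = line.split(";", 1)[1]
            # The client part is either 'port' or 'address:port'.
            return int(client_part.rsplit(":", 1)[-1])
    return None


def find_client_port(static_cfg, dynamic_cfg, server_id):
    """Try the static clientPort key, then the static server spec,
    then the dynamic server spec (dynamic_cfg may be None)."""
    port = _client_port_key(static_cfg)
    if port is None:
        port = _port_from_server_spec(static_cfg, server_id)
    if port is None and dynamic_cfg is not None:
        port = _port_from_server_spec(dynamic_cfg, server_id)
    return port
```

With the config file from the issue passed as static_cfg and server_id set to 1, step 1 finds nothing and step 1' picks up the port from the server.1 line. Without step 1', the lookup would fall through to the dynamic config file and fail when none exists.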
[jira] [Updated] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Shraer updated ZOOKEEPER-1499: Attachment: (was: zkServersh.patch) clientPort config changes not backwards-compatible -- Key: ZOOKEEPER-1499 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Camille Fournier Assignee: Alexander Shraer Priority: Blocker Fix For: 3.5.0 With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789869#comment-13789869 ] Alexander Shraer commented on ZOOKEEPER-1499: - I opened a separate Jira for the zkServer.sh change. I suggest closing this one, since I verified that the clientPort keyword is working. clientPort config changes not backwards-compatible -- Key: ZOOKEEPER-1499 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Camille Fournier Assignee: Alexander Shraer Priority: Blocker Fix For: 3.5.0 With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. -- This message was sent by Atlassian JIRA (v6.1#6144)
Failed: ZOOKEEPER-1499 PreCommit Build #1661
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1499 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1661/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 274948 lines...] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12607472/zkServersh.patch [exec] against trunk revision 1530166. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1661//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1661//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1661//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 884f190be581fcc3cb2bdcbf4332e4ece2207aa6 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1 Total time: 31 minutes 39 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1499 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789886#comment-13789886 ] Hadoop QA commented on ZOOKEEPER-1499: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607472/zkServersh.patch against trunk revision 1530166. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1661//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1661//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1661//console This message is automatically generated. clientPort config changes not backwards-compatible -- Key: ZOOKEEPER-1499 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Camille Fournier Assignee: Alexander Shraer Priority: Blocker Fix For: 3.5.0 With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. -- This message was sent by Atlassian JIRA (v6.1#6144)
Failed: ZOOKEEPER-1624 PreCommit Build #1662
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1662/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 63 lines...] [exec] == [exec] Applying patch. [exec] == [exec] == [exec] [exec] [exec] patching file src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java [exec] Hunk #1 FAILED at 198. [exec] 1 out of 1 hunk FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java.rej [exec] patching file src/c/tests/TestMulti.cc [exec] PATCH APPLICATION FAILED [exec] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12607473/ZOOKEEPER-1624-3.4 [exec] against trunk revision 1530166. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] -1 patch. The patch command could not apply the patch. [exec] [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1662//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 0c6b99eb0f6707da69869615f7e8dfac3e7459e8 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1 Total time: 43 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1624 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789887#comment-13789887 ] Hadoop QA commented on ZOOKEEPER-1624: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607473/ZOOKEEPER-1624-3.4 against trunk revision 1530166. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1662//console This message is automatically generated. PrepRequestProcessor abort multi-operation incorrectly -- Key: ZOOKEEPER-1624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Critical Labels: zk-review Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1624-3.4, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch We found this issue when trying to issue multiple instances of the following multi-op concurrently: multi { 1. create sequential node /a- 2. create node /b } The expected result is that only the first multi-op request should succeed and the rest of the requests should fail because /b already exists. However, the reported result is that the subsequent multi-ops failed because sequential node creation failed, which should not be possible. Below are the return codes for each sub-op when issuing 3 instances of the above multi-op asynchronously: 1. ZOK, ZOK 2. ZOK, ZNODEEXISTS, 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY, After adding more debug logging, I found that the cause is that PrepRequestProcessor rolls back the outstandingChanges of the second multi-op incorrectly, causing sequential node name generation to be incorrect.
Below are the sequential node names generated by PrepRequestProcessor: 1. create /a-0001 2. create /a-0003 3. create /a-0001 The bug is in the getPendingChanges() method. It fails to copy the ChangeRecord for the parent node (/). So rollbackPendingChanges() cannot restore the right previous change record of the parent node when aborting the second multi-op. The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure whether there are other implications. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789896#comment-13789896 ] Camille Fournier commented on ZOOKEEPER-1499: - I'm not sure I can remember anything about 2012 at this point. Seems strange that this is something fixed by a change to a shell script though, is that the way the config is always parsed now? clientPort config changes not backwards-compatible -- Key: ZOOKEEPER-1499 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Camille Fournier Assignee: Alexander Shraer Priority: Blocker Fix For: 3.5.0 With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1785) Small fix in zkServer.sh to support new configuration format
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789911#comment-13789911 ] Hadoop QA commented on ZOOKEEPER-1785: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607475/zkServersh.patch against trunk revision 1530166. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1663//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1663//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1663//console This message is automatically generated. 
Small fix in zkServer.sh to support new configuration format Key: ZOOKEEPER-1785 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1785 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.5.0 Reporter: Alexander Shraer Assignee: Alexander Shraer Priority: Minor Fix For: 3.5.0 Attachments: zkServersh.patch The problem can be reproduced by running a server with the following type of config file:
dataDir=/Users/shralex/zookeeper-test/zookeeper1
syncLimit=2
initLimit=5
tickTime=2000
server.1=localhost:2721:2731:participant;2791
server.2=localhost:2722:2732:participant;2792
and then trying to do zkServer.sh status
Here I specified the servers using the new config format but still used the static config file and didn't include the clientPort key. zkServer.sh already supports the new configuration format, but expects server spec to appear in the dynamic config file if it uses the new format. So in the example above it will not find the client port.
The current logic for executing something like 'zkServer.sh status' is:
1. Look for clientPort keyword in the static config file
2. Look for the client port in the server spec in the dynamic config file
The attached patch adds an intermediate step:
1'. Look for the client port in the server spec in the static config file
-- This message was sent by Atlassian JIRA (v6.1#6144)
Failed: ZOOKEEPER-1785 PreCommit Build #1663
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1785 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1663/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 283760 lines...] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12607475/zkServersh.patch [exec] against trunk revision 1530166. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1663//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1663//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1663//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 278b87f32aef3aa8312ab0d1874c8d4679e66de5 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1 Total time: 31 minutes 35 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1785 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789913#comment-13789913 ] Alexander Shraer commented on ZOOKEEPER-1499: - Hi Camille, There are two issues pointed out in this JIRA: 1) (in the description) clientPort no longer gets read, and 2) (in your first comment) a client can't use localhost:... I tried but can't reproduce issue 1; it seems like clientPort is being read. That's why I suggested closing the JIRA. For issue 2, I think you're right, and I think this follows from how the IP address is inferred if you don't explicitly specify it. Previously it was assumed to be 'localhost', whereas now it's assumed to be the IP from the server spec line. I'm not sure if this is a problem. I moved the patch to zkServer.sh to a separate JIRA, ZOOKEEPER-1785, since it solves a different problem. clientPort config changes not backwards-compatible -- Key: ZOOKEEPER-1499 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Camille Fournier Assignee: Alexander Shraer Priority: Blocker Fix For: 3.5.0 With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (ZOOKEEPER-1019) zkfuse doesn't list dependency on boost in README
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raul Gutierrez Segales reassigned ZOOKEEPER-1019: - Assignee: Raul Gutierrez Segales zkfuse doesn't list dependency on boost in README - Key: ZOOKEEPER-1019 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1019 Project: ZooKeeper Issue Type: Improvement Components: contrib Affects Versions: 3.4.0 Reporter: Karel Vervaeke Assignee: Raul Gutierrez Segales Original Estimate: 5m Remaining Estimate: 5m The README.txt under contrib/fuse doesn't list boost under Development build libraries -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1019) zkfuse doesn't list dependency on boost in README
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789923#comment-13789923 ] Raul Gutierrez Segales commented on ZOOKEEPER-1019: --- [~phunt]: I haven't seen the crashes. I'll upload a patch that adds:
{noformat}
AC_CHECK_LIB([boost], [main], [], [AC_MSG_ERROR(We need boost to build zkfuse)])
{noformat}
or:
{noformat}
AC_CHECK_HEADERS([boost/shared_ptr.hpp boost/shared_array.hpp boost/date_time/gregorian/gregorian.hpp],,AC_MSG_ERROR([boost library headers not found. Please install boost library.]))
{noformat}
or something similar to configure.ac (as well as updating the README). zkfuse doesn't list dependency on boost in README - Key: ZOOKEEPER-1019 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1019 Project: ZooKeeper Issue Type: Improvement Components: contrib Affects Versions: 3.4.0 Reporter: Karel Vervaeke Assignee: Raul Gutierrez Segales Original Estimate: 5m Remaining Estimate: 5m The README.txt under contrib/fuse doesn't list boost under Development build libraries -- This message was sent by Atlassian JIRA (v6.1#6144)