ZooKeeper_branch34_jdk7 - Build # 919 - Failure
See https://builds.apache.org/job/ZooKeeper_branch34_jdk7/919/

## LAST 60 LINES OF THE CONSOLE ##

[...truncated 213540 lines...]
    [junit] 2015-06-16 10:00:52,190 [myid:] - INFO [main:JMXEnv@246] - expect:StandaloneServer_port
    [junit] 2015-06-16 10:00:52,191 [myid:] - INFO [main:JMXEnv@250] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port11221
    [junit] 2015-06-16 10:00:52,191 [myid:] - INFO [main:ClientBase@490] - STOPPING server
    [junit] 2015-06-16 10:00:52,191 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@224] - NIOServerCnxn factory exited run method
    [junit] 2015-06-16 10:00:52,191 [myid:] - INFO [main:ZooKeeperServer@441] - shutting down
    [junit] 2015-06-16 10:00:52,192 [myid:] - INFO [main:SessionTrackerImpl@225] - Shutting down
    [junit] 2015-06-16 10:00:52,192 [myid:] - INFO [main:PrepRequestProcessor@768] - Shutting down
    [junit] 2015-06-16 10:00:52,192 [myid:] - INFO [main:SyncRequestProcessor@209] - Shutting down
    [junit] 2015-06-16 10:00:52,192 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@144] - PrepRequestProcessor exited loop!
    [junit] 2015-06-16 10:00:52,192 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited!
    [junit] 2015-06-16 10:00:52,193 [myid:] - INFO [main:FinalRequestProcessor@415] - shutdown of request processor complete
    [junit] 2015-06-16 10:00:52,193 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
    [junit] 2015-06-16 10:00:52,194 [myid:] - INFO [main:JMXEnv@146] - ensureOnly:[]
    [junit] 2015-06-16 10:00:52,195 [myid:] - INFO [main:ClientBase@443] - STARTING server
    [junit] 2015-06-16 10:00:52,196 [myid:] - INFO [main:ClientBase@364] - CREATING server instance 127.0.0.1:11221
    [junit] 2015-06-16 10:00:52,196 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
    [junit] 2015-06-16 10:00:52,196 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221
    [junit] 2015-06-16 10:00:52,197 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /x1/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk7/branch-3.4/build/test/tmp/test1362945604557324636.junit.dir/version-2 snapdir /x1/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk7/branch-3.4/build/test/tmp/test1362945604557324636.junit.dir/version-2
    [junit] 2015-06-16 10:00:52,201 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
    [junit] 2015-06-16 10:00:52,201 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:44487
    [junit] 2015-06-16 10:00:52,201 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@827] - Processing stat command from /127.0.0.1:44487
    [junit] 2015-06-16 10:00:52,202 [myid:] - INFO [Thread-4:NIOServerCnxn$StatCommand@663] - Stat command output
    [junit] 2015-06-16 10:00:52,202 [myid:] - INFO [Thread-4:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:44487 (no session established for client)
    [junit] 2015-06-16 10:00:52,202 [myid:] - INFO [main:JMXEnv@229] - ensureParent:[InMemoryDataTree, StandaloneServer_port]
    [junit] 2015-06-16 10:00:52,205 [myid:] - INFO [main:JMXEnv@246] - expect:InMemoryDataTree
    [junit] 2015-06-16 10:00:52,205 [myid:] - INFO [main:JMXEnv@250] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port11221,name1=InMemoryDataTree
    [junit] 2015-06-16 10:00:52,205 [myid:] - INFO [main:JMXEnv@246] - expect:StandaloneServer_port
    [junit] 2015-06-16 10:00:52,205 [myid:] - INFO [main:JMXEnv@250] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port11221
    [junit] 2015-06-16 10:00:52,206 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@58] - Memory used 10403
    [junit] 2015-06-16 10:00:52,206 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@63] - Number of threads 20
    [junit] 2015-06-16 10:00:52,206 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - FINISHED TEST METHOD testQuota
    [junit] 2015-06-16 10:00:52,206 [myid:] - INFO [main:ClientBase@520] - tearDown starting
    [junit] 2015-06-16 10:00:52,275 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x14dfbd07395 closed
    [junit] 2015-06-16 10:00:52,275 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@524] - EventThread shut down for session: 0x14dfbd07395
    [junit] 2015-06-16 10:00:52,276 [myid:] - INFO [main:ClientBase@490] - STOPPING server
    [junit] 2015-06-16 10:00:52,276 [myid:] - INFO [NIOServerCxn.F
ZooKeeper-trunk - Build # 2728 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk/2728/

## LAST 60 LINES OF THE CONSOLE ##

[...truncated 364234 lines...]
    [junit] 2015-06-16 10:58:54,715 [myid:] - INFO [main:QuorumUtil@254] - Shutting down leader election QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:14051)(secure=disabled)
    [junit] 2015-06-16 10:58:54,715 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:14051)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3]
    [junit] 2015-06-16 10:58:54,715 [myid:] - INFO [main:QuorumUtil@259] - Waiting for QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:14051)(secure=disabled) to exit thread
    [junit] 2015-06-16 10:58:54,715 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:14051)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3]
    [junit] 2015-06-16 10:58:54,715 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:14051)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.1]
    [junit] 2015-06-16 10:58:54,715 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:14051)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.2]
    [junit] 2015-06-16 10:58:54,715 [myid:] - INFO [main:FourLetterWordMain@63] - connecting to 127.0.0.1 14045
    [junit] 2015-06-16 10:58:54,716 [myid:] - INFO [main:QuorumUtil@243] - 127.0.0.1:14045 is no longer accepting client connections
    [junit] 2015-06-16 10:58:54,716 [myid:] - INFO [main:FourLetterWordMain@63] - connecting to 127.0.0.1 14048
    [junit] 2015-06-16 10:58:54,716 [myid:] - INFO [main:QuorumUtil@243] - 127.0.0.1:14048 is no longer accepting client connections
    [junit] 2015-06-16 10:58:54,716 [myid:] - INFO [main:FourLetterWordMain@63] - connecting to 127.0.0.1 14051
    [junit] 2015-06-16 10:58:54,716 [myid:] - INFO [main:QuorumUtil@243] - 127.0.0.1:14051 is no longer accepting client connections
    [junit] 2015-06-16 10:58:54,718 [myid:] - INFO [main:ZKTestCase$1@65] - SUCCEEDED testPortChange
    [junit] 2015-06-16 10:58:54,718 [myid:] - INFO [main:ZKTestCase$1@60] - FINISHED testPortChange
    [junit] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 245.326 sec, Thread: 2, Class: org.apache.zookeeper.test.ReconfigTest
    [junit] 2015-06-16 10:58:54,748 [myid:] - INFO [main-SendThread(127.0.0.1:13999):ClientCnxn$SendThread@1138] - Opening socket connection to server 127.0.0.1/127.0.0.1:13999. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2015-06-16 10:58:54,749 [myid:] - WARN [main-SendThread(127.0.0.1:13999):ClientCnxn$SendThread@1257] - Session 0x4028bf8086c for server 127.0.0.1/127.0.0.1:13999, unexpected error, closing socket connection and attempting reconnect
    [junit] java.net.ConnectException: Connection refused
    [junit] 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    [junit] 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
    [junit] 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    [junit] 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
    [junit] 2015-06-16 10:58:54,795 [myid:] - INFO [main-SendThread(127.0.0.1:13942):ClientCnxn$SendThread@1138] - Opening socket connection to server 127.0.0.1/127.0.0.1:13942. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2015-06-16 10:58:54,796 [myid:] - WARN [main-SendThread(127.0.0.1:13942):ClientCnxn$SendThread@1257] - Session 0x2028bf76d7f for server 127.0.0.1/127.0.0.1:13942, unexpected error, closing socket connection and attempting reconnect
    [junit] java.net.ConnectException: Connection refused
    [junit] 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    [junit] 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
    [junit] 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    [junit] 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
    [junit] 2015-06-16 10:58:54,937 [myid:] - INFO [main-SendThread(127.0.0.1:13939):ClientCnxn$SendThread@1138] - Opening socket connection to server 127.0.0.1/127.0.0.1:13939. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2015-06-16 10:58:54,937 [myid:] - WARN [main-SendThread(127.0.0.1:13939):ClientCnxn$SendThread@1257] - Session 0x1028bf76d80 for server 127.0.0.1/127.0.0.1:13939, unexpected error, closing socket connection and attempting reconnect
    [junit] java.net.ConnectException: Connection refused
    [jun
[jira] [Commented] (ZOOKEEPER-2212) distributed race condition related to QV version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587866#comment-14587866 ]

Hudson commented on ZOOKEEPER-2212:
-----------------------------------

FAILURE: Integrated in ZooKeeper-trunk #2728 (See [https://builds.apache.org/job/ZooKeeper-trunk/2728/])
ZOOKEEPER-2212: distributed race condition related to QV version (Akihiro Suda via rgs) (rgs: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1685685)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java

> distributed race condition related to QV version
> ------------------------------------------------
>
>                 Key: ZOOKEEPER-2212
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2212
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.5.0
>            Reporter: Akihiro Suda
>            Assignee: Akihiro Suda
>            Priority: Critical
>             Fix For: 3.5.1, 3.6.0
>
>         Attachments: 0001-ZOOKEEPER-2212-distributed-race-condition-related-to.patch, ZOOKEEPER-2212-v2.patch, ZOOKEEPER-2212-v3.patch
>
> When a joiner is listed as an observer in an initial config, the joiner should become a non-voting follower (not an observer) until reconfig is triggered. [(Link)|http://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html#sc_reconfig_general]
> I found a distributed race-condition situation where an observer keeps being an observer and cannot become a non-voting follower.
> This race condition happens when an observer receives an UPTODATE Quorum Packet from the leader:2888/tcp *after* receiving a Notification FLE Packet whose n.config version is larger than the observer's own, from leader:3888/tcp.
>
> h4. Detail
> * Problem: An observer cannot become a non-voting follower
> * Cause: Cannot restart FLE
> * Cause: In {{QuorumPeer.run()}}, cannot shutdown {{Observer}} [(Link)|https://github.com/apache/zookeeper/blob/98a3cabfa279833b81908d72f1c10ee9f598a045/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1014]
> * Cause: In {{QuorumPeer.run()}}, cannot return from {{Observer.observeLeader()}} [(Link)|https://github.com/apache/zookeeper/blob/98a3cabfa279833b81908d72f1c10ee9f598a045/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1010]
> * Cause: In {{Observer.observeLeader()}}, {{Learner.syncWithLeader()}} does not throw an exception of "changes proposed in reconfig" [(Link)|https://github.com/apache/zookeeper/blob/98a3cabfa279833b81908d72f1c10ee9f598a045/src/java/main/org/apache/zookeeper/server/quorum/Observer.java#L79]
> * Cause: In {{switch(qp.getType()) case UPTODATE}} of {{Learner.syncWithLeader()}} [(Link)|https://github.com/apache/zookeeper/blob/98a3cabfa279833b81908d72f1c10ee9f598a045/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L492-507], {{QuorumPeer.processReconfig()}} [(Link)|https://github.com/apache/zookeeper/blob/98a3cabfa279833b81908d72f1c10ee9f598a045/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1644] returns false with a log message like ["2 setQuorumVerifier called with known or old config 4294967296. Current version: 4294967296"|https://github.com/osrg/earthquake/blob/v0.1/example/zk-found-bug.ether/example-output/3.REPRODUCED/zk2.log] [(Link)|https://github.com/apache/zookeeper/blob/98a3cabfa279833b81908d72f1c10ee9f598a045/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1369]
> * Cause: The observer has already received a Notification Packet ({{n.config.version=4294967296}}) and invoked {{QuorumPeer.processReconfig()}} [(Link)|https://github.com/apache/zookeeper/blob/98a3cabfa279833b81908d72f1c10ee9f598a045/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L291-304]
>
> h4. How I found this bug
> I found this bug using [Earthquake|http://osrg.github.io/earthquake/], our open-source dynamic model checker for real implementations of distributed systems.
> Earthquake permutes C/Java function calls, Ethernet packets, and injected fault events in various orders so as to find implementation-level bugs of the distributed system.
> When Earthquake finds a bug, it automatically records [the event history|https://github.com/osrg/earthquake/blob/v0.1/example/zk-found-bug.ether/example-output/3.REPRODUCED/json] and helps the user analyze which permutation of events triggers the bug.
> I analyzed Earthquake's event histories and found that the bug is triggered when an observer receives an UPTODATE *after* receiving a specific kind of FLE packet.
>
> h4. How to reproduce this bug
> You can also easily reproduce the bug using Earthquake.
> I made a Docker container [osrg/earthquake-zookeeper-2212|https://registry
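The "known or old config" rejection at the core of the cause chain above can be illustrated with a minimal sketch. This is NOT the real QuorumPeer code, just the version gate it describes, with illustrative names: a proposed config is applied only when its version is strictly newer, so the config delivered via UPTODATE, whose version the earlier FLE notification has already installed, is silently dropped and FLE is never restarted.

```java
// Illustrative sketch of the config-version gate (hypothetical class;
// names do not match the actual ZooKeeper source).
public class ReconfigGate {
    private long currentVersion;

    public ReconfigGate(long initialVersion) {
        this.currentVersion = initialVersion;
    }

    // Apply a proposed config only if it is strictly newer; otherwise
    // drop it, like "setQuorumVerifier called with known or old config".
    public boolean processReconfig(long proposedVersion) {
        if (proposedVersion <= currentVersion) {
            return false; // known or old config: ignored
        }
        currentVersion = proposedVersion;
        return true;
    }
}
```

In the traced run, the FLE notification carries version 4294967296 first, so the later UPTODATE delivery of the same version is a no-op for the observer.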
ZooKeeper_branch35_jdk7 - Build # 329 - Failure
See https://builds.apache.org/job/ZooKeeper_branch35_jdk7/329/

## LAST 60 LINES OF THE CONSOLE ##

[...truncated 371915 lines...]
    [junit] 2015-06-16 12:28:15,641 [myid:] - INFO [SessionTracker:SessionTrackerImpl@158] - SessionTrackerImpl exited loop!
    [junit] 2015-06-16 12:28:16,987 [myid:] - INFO [main-SendThread(127.0.0.1:27383):ClientCnxn$SendThread@1138] - Opening socket connection to server 127.0.0.1/127.0.0.1:27383. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2015-06-16 12:28:16,988 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:27383:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:40693
    [junit] 2015-06-16 12:28:16,988 [myid:] - INFO [main-SendThread(127.0.0.1:27383):ClientCnxn$SendThread@980] - Socket connection established, initiating session, client: /127.0.0.1:40693, server: 127.0.0.1/127.0.0.1:27383
    [junit] 2015-06-16 12:28:16,990 [myid:] - INFO [NIOWorkerThread-2:ZooKeeperServer@936] - Client attempting to renew session 0x100fb47ec460001 at /127.0.0.1:40693
    [junit] 2015-06-16 12:28:16,990 [myid:] - INFO [NIOWorkerThread-2:ZooKeeperServer@645] - Established session 0x100fb47ec460001 with negotiated timeout 3 for client /127.0.0.1:40693
    [junit] 2015-06-16 12:28:16,991 [myid:] - INFO [main-SendThread(127.0.0.1:27383):ClientCnxn$SendThread@1400] - Session establishment complete on server 127.0.0.1/127.0.0.1:27383, sessionid = 0x100fb47ec460001, negotiated timeout = 3
    [junit] 2015-06-16 12:28:17,541 [myid:] - INFO [main-SendThread(127.0.0.1:27383):ClientCnxn$SendThread@1138] - Opening socket connection to server 127.0.0.1/127.0.0.1:27383. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2015-06-16 12:28:17,541 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:27383:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:40694
    [junit] 2015-06-16 12:28:17,541 [myid:] - INFO [main-SendThread(127.0.0.1:27383):ClientCnxn$SendThread@980] - Socket connection established, initiating session, client: /127.0.0.1:40694, server: 127.0.0.1/127.0.0.1:27383
    [junit] 2015-06-16 12:28:17,542 [myid:] - INFO [NIOWorkerThread-6:ZooKeeperServer@936] - Client attempting to renew session 0x100fb47ec46 at /127.0.0.1:40694
    [junit] 2015-06-16 12:28:17,543 [myid:] - INFO [NIOWorkerThread-6:ZooKeeperServer@645] - Established session 0x100fb47ec46 with negotiated timeout 3 for client /127.0.0.1:40694
    [junit] 2015-06-16 12:28:17,544 [myid:] - INFO [main-SendThread(127.0.0.1:27383):ClientCnxn$SendThread@1400] - Session establishment complete on server 127.0.0.1/127.0.0.1:27383, sessionid = 0x100fb47ec46, negotiated timeout = 3
    [junit] 2015-06-16 12:28:17,545 [myid:] - INFO [SyncThread:0:FileTxnLog@200] - Creating new log file: log.6
    [junit] 2015-06-16 12:28:17,605 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@58] - Memory used 59241
    [junit] 2015-06-16 12:28:17,605 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@63] - Number of threads 33
    [junit] 2015-06-16 12:28:17,605 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - FINISHED TEST METHOD testChildWatcherAutoResetWithChroot
    [junit] 2015-06-16 12:28:17,605 [myid:] - INFO [main:ClientBase@538] - tearDown starting
    [junit] 2015-06-16 12:28:17,606 [myid:] - INFO [ProcessThread(sid:0 cport:27383)::PrepRequestProcessor@640] - Processed session termination for sessionid: 0x100fb47ec46
    [junit] 2015-06-16 12:28:17,630 [myid:] - INFO [NIOWorkerThread-12:MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port27383,name1=Connections,name2=127.0.0.1,name3=0x100fb47ec46]
    [junit] 2015-06-16 12:28:17,630 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@542] - EventThread shut down for session: 0x100fb47ec46
    [junit] 2015-06-16 12:28:17,630 [myid:] - INFO [NIOWorkerThread-12:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:40694 which had sessionid 0x100fb47ec46
    [junit] 2015-06-16 12:28:17,630 [myid:] - INFO [main:ZooKeeper@1110] - Session: 0x100fb47ec46 closed
    [junit] 2015-06-16 12:28:17,632 [myid:] - INFO [ProcessThread(sid:0 cport:27383)::PrepRequestProcessor@640] - Processed session termination for sessionid: 0x100fb47ec460001
    [junit] 2015-06-16 12:28:17,655 [myid:] - INFO [NIOWorkerThread-14:MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port27383,name1=Connections,name2=127.0.0.1,name3=0x100fb47ec460001]
    [junit] 2015-06-16 12:28:17,655 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@542] - EventThread shut down for session: 0x100fb47ec460001
    [
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588049#comment-14588049 ]

Ziyou Wang commented on ZOOKEEPER-2172:
---------------------------------------

Hi [~shralex], thanks for the notification. I tried the 2212 patch today and found that it does reduce the chance of reproducing the problem. From the logs, I can see that this logic:

{noformat}
if (!rqv.equals(curQV)) {
    LOG.info("restarting leader election");
    self.shuttingDownLE = true;
    self.getElectionAlg().shutdown();
    break;
}
{noformat}

is executed once on node 2 and twice on node 3 in the success case. But I can still hit the problem, although it is harder than before. In the failing case, the logic above is executed only once, on node 3. Sorry for updating this JIRA late; I have been busy with some urgent issues.

> Cluster crashes when reconfig a new node as a participant
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-2172
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum, server
>    Affects Versions: 3.5.0
>         Environment: Ubuntu 12.04 + java 7
>            Reporter: Ziyou Wang
>            Priority: Critical
>         Attachments: node-1.log, node-2.log, node-3.log, zoo-1.log, zoo-2-1.log, zoo-2-2.log, zoo-2-3.log, zoo-2.log, zoo-3-1.log, zoo-3-2.log, zoo-3-3.log, zoo-3.log, zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, zookeeper-2.log, zookeeper-3.log
>
> The operations are quite simple: start three zk servers one by one, then reconfig the cluster to add the new one as a participant. When I add the third one, the zk cluster may enter a weird state and cannot recover.
> I found "2015-04-20 12:53:48,236 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig" in the node-1 log, so the first node received the reconfig cmd at 12:53:48. Later, it logged "2015-04-20 12:53:52,230 [myid:1] - ERROR [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception causing shutdown while sock still open" and "2015-04-20 12:53:52,231 [myid:1] - WARN [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE /10.0.0.2:55890 ". From then on, the first and second nodes rejected all client connections and the third node didn't join the cluster as a participant. The whole cluster was done.
> When the problem happened, all three nodes used the same dynamic config file zoo.cfg.dynamic.1005d, which only contained the first two nodes. But there was another, unused dynamic config file in the node-1 directory, zoo.cfg.dynamic.next, which already contained three nodes.
> When I extended the waiting time between starting the third node and reconfiguring the cluster, the problem didn't show up again. So it should be a race condition problem.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ZOOKEEPER-2216) Get the property hierarchy as a whole tree
Nabarun Mondal created ZOOKEEPER-2216:
--------------------------------------

             Summary: Get the property hierarchy as a whole tree
                 Key: ZOOKEEPER-2216
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2216
             Project: ZooKeeper
          Issue Type: Improvement
          Components: c client
    Affects Versions: 3.5.0
            Reporter: Nabarun Mondal
            Priority: Minor

I am logging this as a feature request. We use ZooKeeper pretty extensively; thanks for putting together a pretty awesome product!
As of now, there is no way to ask ZooKeeper for the whole property hierarchy as a single tree in one call. We would be grateful if you could provide a facility to get the whole property tree at once.
NOTE: I personally won't mind coding this, if you permit me.
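For context, today this has to be done client-side with one call per znode. A minimal sketch of that usual workaround, a depth-first walk, is below. The TreeFetch/walk names and the lookup function are illustrative, not part of any ZooKeeper API; in real use the lookup would be something like path -> zk.getChildren(path, false) on org.apache.zookeeper.ZooKeeper. Note this is not an atomic snapshot, since the tree can change between calls, which is part of why a single-call API would help.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical client-side tree walk: collects every path under "root",
// depth-first. "children" stands in for a ZooKeeper children lookup.
public class TreeFetch {
    public static List<String> walk(String root, Function<String, List<String>> children) {
        List<String> paths = new ArrayList<>();
        paths.add(root);
        for (String child : children.apply(root)) {
            // In ZooKeeper paths, only the root is spelled "/"; every
            // other path joins parent and child with a single slash.
            String childPath = root.equals("/") ? "/" + child : root + "/" + child;
            paths.addAll(walk(childPath, children));
        }
        return paths;
    }
}
```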
[jira] [Commented] (ZOOKEEPER-2095) Add Systemd startup/conf files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588467#comment-14588467 ]

Alex Elent commented on ZOOKEEPER-2095:
---------------------------------------

Not sure if this is relevant, but I was not able to get systemd working with Zookeeper without adding "Type=forking". This is my final config:

{noformat}
[Unit]
Description=Apache Zookeeper
After=network.target

[Service]
Type=forking
User=zookeeper
Group=zookeeper
SyslogIdentifier=zookeeper
Restart=always
RestartSec=0s
ExecStart=/usr/bin/zookeeper-server start
ExecStop=/usr/bin/zookeeper-server stop
ExecReload=/usr/bin/zookeeper-server restart

[Install]
WantedBy=multi-user.target
{noformat}

> Add Systemd startup/conf files
> ------------------------------
>
>                 Key: ZOOKEEPER-2095
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2095
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: contrib
>            Reporter: Guillaume ALAUX
>            Priority: Minor
>         Attachments: ZOOKEEPER-2095.patch
>
> As adoption of systemd by distributions grows, it would be nice to have systemd configuration and startup files for Zookeeper in the upstream tree. I would thus like to contribute the following patch, which brings the following systemd files:
> - {{sysusers.d_zookeeper.conf}}: creates the {{zookeeper}} Linux system user to run Zookeeper
> - {{tmpfiles.d_zookeeper.conf}}: creates the temporary {{/var/log/zookeeper}} and {{/var/lib/zookeeper}} directories
> - {{zookeeper.service}}: regular systemd startup _script_
> - {{zookeeper@.service}}: systemd startup _script_ for specific use (for instance when Zookeeper is invoked to support some other piece of software – [example for Kafka|http://pkgbuild.com/git/aur-mirror.git/tree/kafka/systemd_kafka.service#n3], [example for Storm|http://pkgbuild.com/git/aur-mirror.git/tree/storm/systemd_storm-nimbus.service#n3])
[jira] [Commented] (ZOOKEEPER-2095) Add Systemd startup/conf files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588493#comment-14588493 ]

Raul Gutierrez Segales commented on ZOOKEEPER-2095:
---------------------------------------------------

[~aelent]: what about using the config provided by the patch attached here?
[jira] [Commented] (ZOOKEEPER-2203) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588500#comment-14588500 ]

Alexander Shraer commented on ZOOKEEPER-2203:
---------------------------------------------

I propose to close this jira. The described issue is expected behaviour: server 3 is the only participant as far as it knows, so it cannot wait for any info from the others, which may just as well be down. If there is a proposal for a better bootstrapping method, that is probably an "improvement" rather than a bug and should get an appropriate JIRA of its own. If anyone objects, feel free to reopen.

> multiple leaders can be elected when configs conflict
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-2203
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2203
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.5.0
>            Reporter: Akihiro Suda
>
> This sequence leads the ensemble to a split-brain state:
> * Start server 1 (config=1:participant, 2:participant, 3:participant)
> * Start server 2 (config=1:participant, 2:participant, 3:participant)
> * 1 and 2 believe 2 is the leader
> * Start server 3 (config=1:observer, 2:observer, 3:participant)
> * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader
>
> Such a split-brain ensemble is very unstable. Znodes can be lost easily:
> * Create some znodes on 2
> * Restart 1 and 2
> * 1, 2 and 3 can think 3 is the leader
> * znodes created on 2 are lost, as 1 and 2 sync with 3
>
> I consider this behavior a bug: ZK should fail gracefully if a participant is listed as an observer in the config. In the current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members, and hence there is no message from the observers (1 and 2) to the new voter (3).
> I think FastLeaderElection.sendNotification() should send notifications to all the members, and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks.
> Any thoughts?
[jira] [Resolved] (ZOOKEEPER-2203) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Shraer resolved ZOOKEEPER-2203.
-----------------------------------------
    Resolution: Not A Problem
[jira] [Commented] (ZOOKEEPER-2095) Add Systemd startup/conf files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588526#comment-14588526 ]

Guillaume ALAUX commented on ZOOKEEPER-2095:
--------------------------------------------

That last service file uses shell scripts that run zookeeper "as a daemon" (i.e. in a forked process), hence the "Type=forking". The one I submitted does not fork, which, as far as I remember, allows for better log capturing. As Raul Gutierrez Segales suggested, I would go with a systemd service file that directly calls `java`, in order to keep all configuration in the systemd service file.
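For illustration, a non-forking unit of the kind described above might look like the sketch below. This is NOT the attached ZOOKEEPER-2095.patch; the install paths, config path, and JVM options are assumptions and would need adjusting per distribution. With the default Type=simple, systemd supervises the java process directly and captures its stdout/stderr in the journal, so no "Type=forking" and no wrapper script are needed.

{noformat}
[Unit]
Description=Apache ZooKeeper
After=network.target

[Service]
User=zookeeper
Group=zookeeper
SyslogIdentifier=zookeeper
# Type defaults to "simple": systemd tracks the java process itself.
# Paths below are illustrative assumptions, not from the attached patch.
ExecStart=/usr/bin/java -cp /usr/share/zookeeper/* org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/zoo.cfg
Restart=always

[Install]
WantedBy=multi-user.target
{noformat}

Note that the java launcher expands the classpath wildcard itself, so no shell is needed in ExecStart.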
Failed: ZOOKEEPER-2095 PreCommit Build #2772
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2095 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2772/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 373764 lines...] [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2772//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2772//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2772//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] c16b40b641442adecf10fb70a75b69f5a2d5377a logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1782: exec returned: 1 Total time: 13 minutes 22 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-ZOOKEEPER-Build #2752 Archived 24 artifacts Archive block size is 32768 Received 5 blocks and 33840175 bytes Compression is 0.5% Took 12 sec Recording test results Description set: ZOOKEEPER-2095 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-2095) Add Systemd startup/conf files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588539#comment-14588539 ] Hadoop QA commented on ZOOKEEPER-2095: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688324/ZOOKEEPER-2095.patch against trunk revision 1685685. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2772//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2772//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2772//console This message is automatically generated. > Add Systemd startup/conf files > -- > > Key: ZOOKEEPER-2095 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2095 > Project: ZooKeeper > Issue Type: Improvement > Components: contrib >Reporter: Guillaume ALAUX >Priority: Minor > Attachments: ZOOKEEPER-2095.patch > > > As adoption of systemd by distributions grows, it would be nice to have > systemd configuration and startup files for Zookeeper in the upstream tree. 
I > would thus like to contribute the following patch, which brings the following > systemd files: > - {{sysusers.d_zookeeper.conf}}: creates the {{zookeeper}} Linux system user to > run ZooKeeper > - {{tmpfiles.d_zookeeper.conf}}: creates temporary {{/var/log/zookeeper}} and > {{/var/lib/zookeeper}} directories > - {{zookeeper.service}}: regular systemd startup _script_ > - {{zookeeper@.service}}: systemd startup _script_ for specific use (for > instance when ZooKeeper is invoked to support some other piece of software – > [example for > Kafka|http://pkgbuild.com/git/aur-mirror.git/tree/kafka/systemd_kafka.service#n3], > [example for > Storm|http://pkgbuild.com/git/aur-mirror.git/tree/storm/systemd_storm-nimbus.service#n3]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2163) Introduce new ZNode type: container
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588543#comment-14588543 ] Rakesh R commented on ZOOKEEPER-2163: - +1 for the {{zookeeper-2163.15.patch}}, version changes to 3.5.1 > Introduce new ZNode type: container > --- > > Key: ZOOKEEPER-2163 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163 > Project: ZooKeeper > Issue Type: New Feature > Components: c client, java client, server >Affects Versions: 3.5.0 >Reporter: Jordan Zimmerman >Assignee: Jordan Zimmerman > Fix For: 3.5.1 > > Attachments: zookeeper-2163.10.patch, zookeeper-2163.11.patch, > zookeeper-2163.12.patch, zookeeper-2163.13.patch, zookeeper-2163.14.patch, > zookeeper-2163.15.patch, zookeeper-2163.3.patch, zookeeper-2163.5.patch, > zookeeper-2163.6.patch, zookeeper-2163.7.patch, zookeeper-2163.8.patch, > zookeeper-2163.9.patch > > > BACKGROUND > > A recurring problem for ZooKeeper users is garbage collection of parent > nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a > parent node under which participants create sequential nodes. When the > participant is done, it deletes its node. In practice, the ZooKeeper tree > begins to fill up with orphaned parent nodes that are no longer needed. The > ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can > become unstable due to the number of these nodes. > CURRENT SOLUTIONS > === > Apache Curator has a workaround solution for this by providing the Reaper > class which runs in the background looking for orphaned parent nodes and > deleting them. This isn’t ideal and it would be better if ZooKeeper supported > this directly. > PROPOSAL > = > ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes > to contain child nodes. This is not optimum as EPHEMERALs are tied to a > session and the general use case of parent nodes is for PERSISTENT nodes. 
> This proposal adds a new node type, CONTAINER. A CONTAINER node is the same > as a PERSISTENT node with the additional property that when its last child is > deleted, it is deleted (and CONTAINER nodes recursively up the tree are > deleted if empty). > CANONICAL USAGE > > {code}
> while (true) { // or some reasonable limit
>     try {
>         zk.create(path, ...);
>         break;
>     } catch (KeeperException.NoNodeException e) {
>         try {
>             zk.createContainer(containerPath, ...);
>         } catch (KeeperException.NodeExistsException ignore) {
>         }
>     }
> }
> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2145) Node can be seen but not deleted
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589343#comment-14589343 ] Marshall McMullen commented on ZOOKEEPER-2145: -- Has anyone had a chance to investigate this issue yet? > Node can be seen but not deleted > > > Key: ZOOKEEPER-2145 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2145 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.6 >Reporter: Frans Lawaetz > > I have a three-server ensemble that appears to be working fine in every > respect but for the fact that I can ls or get a znode but can not rmr it. > >[zk: localhost:2181(CONNECTED) 0] get > >/accumulo/9354e975-7e2a-4207-8c7b-5d36c0e7765d/masters/goal_state > CLEAN_STOP > cZxid = 0x15 > ctime = Fri Feb 20 13:37:59 CST 2015 > mZxid = 0x72 > mtime = Fri Feb 20 13:38:05 CST 2015 > pZxid = 0x15 > cversion = 0 > dataVersion = 2 > aclVersion = 0 > ephemeralOwner = 0x0 > dataLength = 10 > numChildren = 0 > [zk: localhost:2181(CONNECTED) 1] rmr > /accumulo/9354e975-7e2a-4207-8c7b-5d36c0e7765d/masters/goal_state > Node does not exist: > /accumulo/9354e975-7e2a-4207-8c7b-5d36c0e7765d/masters/goal_state > I have run a 'stat' against all three servers and they seem properly > structured with a leader and two followers. An md5sum of all zoo.cfg shows > them to be identical. > The problem seems localized to the accumulo/935 directory as I can create > and delete znodes outside of that path fine but not inside of it. 
> For example: > [zk: localhost:2181(CONNECTED) 12] create > /accumulo/9354e975-7e2a-4207-8c7b-5d36c0e7765d/fubar asdf > Node does not exist: /accumulo/9354e975-7e2a-4207-8c7b-5d36c0e7765d/fubar > [zk: localhost:2181(CONNECTED) 13] create /accumulo/fubar asdf > Created /accumulo/fubar > [zk: localhost:2181(CONNECTED) 14] ls /accumulo/fubar > [] > [zk: localhost:2181(CONNECTED) 15] rmr /accumulo/fubar > [zk: localhost:2181(CONNECTED) 16] > Here is my zoo.cfg: > tickTime=2000 > initLimit=10 > syncLimit=15 > dataDir=/data/extera/zkeeper/data > clientPort=2181 > maxClientCnxns=300 > autopurge.snapRetainCount=10 > autopurge.purgeInterval=1 > server.1=cdf61:2888:3888 > server.2=cdf62:2888:3888 > server.3=cdf63:2888:3888 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
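One debugging step sometimes worth trying for "visible but not deletable" znodes is to check whether the child name contains non-printable characters that the shell renders invisibly, so the path typed back does not match the stored name byte-for-byte. The sketch below is a hypothetical standalone helper under that assumption; `revealHiddenChars` is not part of the ZooKeeper API, and hidden characters are only one possible cause of the behavior reported here.

```java
public class ZnodeNameDebug {
    // Hypothetical debugging aid (not part of the ZooKeeper API): escape any
    // character outside printable ASCII so that hidden characters in a znode
    // name, which `ls` may render invisibly, become visible for comparison.
    static String revealHiddenChars(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c < 0x20 || c > 0x7e) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A trailing NUL would look like "goal_state" in the shell but could
        // never be retyped verbatim; escaping exposes it.
        System.out.println(revealHiddenChars("goal_state\u0000"));
    }
}
```

Applying this to the names returned by `getChildren()` on each of the three servers, and diffing the escaped output, would either expose a hidden-character mismatch or rule this hypothesis out.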