ZooKeeper-trunk-solaris - Build # 714 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/714/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 223549 lines...] [junit] 2013-10-28 09:03:16,243 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method [junit] 2013-10-28 09:03:16,244 [myid:] - INFO [main:ZooKeeperServer@428] - shutting down [junit] 2013-10-28 09:03:16,244 [myid:] - INFO [main:SessionTrackerImpl@183] - Shutting down [junit] 2013-10-28 09:03:16,244 [myid:] - INFO [main:PrepRequestProcessor@972] - Shutting down [junit] 2013-10-28 09:03:16,244 [myid:] - INFO [main:SyncRequestProcessor@190] - Shutting down [junit] 2013-10-28 09:03:16,244 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop! [junit] 2013-10-28 09:03:16,245 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited! [junit] 2013-10-28 09:03:16,245 [myid:] - INFO [main:FinalRequestProcessor@442] - shutdown of request processor complete [junit] 2013-10-28 09:03:16,245 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-10-28 09:03:16,246 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[] [junit] 2013-10-28 09:03:16,247 [myid:] - INFO [main:ClientBase@414] - STARTING server [junit] 2013-10-28 09:03:16,247 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test3436096022061781276.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test3436096022061781276.junit.dir/version-2 [junit] 2013-10-28 09:03:16,248 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, 
and 64 kB direct buffers. [junit] 2013-10-28 09:03:16,248 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-10-28 09:03:16,249 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test3436096022061781276.junit.dir/version-2/snapshot.b [junit] 2013-10-28 09:03:16,251 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test3436096022061781276.junit.dir/version-2/snapshot.b [junit] 2013-10-28 09:03:16,253 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-10-28 09:03:16,253 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:45802 [junit] 2013-10-28 09:03:16,254 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@828] - Processing stat command from /127.0.0.1:45802 [junit] 2013-10-28 09:03:16,254 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@677] - Stat command output [junit] 2013-10-28 09:03:16,255 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@999] - Closed socket connection for client /127.0.0.1:45802 (no session established for client) [junit] 2013-10-28 09:03:16,263 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-10-28 09:03:16,264 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree [junit] 2013-10-28 09:03:16,264 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2013-10-28 09:03:16,265 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port [junit] 2013-10-28 09:03:16,265 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-10-28 09:03:16,265 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota [junit] 2013-10-28 09:03:16,265 [myid:] - INFO [main:ClientBase@451] - tearDown starting [junit] 2013-10-28 09:03:16,324 [myid:] - INFO [main:ZooKeeper@777] - Session: 0x141fe4d844c closed [junit] 2013-10-28 09:03:16,324 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down [junit] 2013-10-28 09:03:16,324 [myid:] - INFO [main:ClientBase@421] - STOPPING server [junit] 2013-10-28 09:03:16,325 [myid:] - INFO
ZooKeeper-3.4-WinVS2008_java - Build # 336 - Still Failing
See https://builds.apache.org/job/ZooKeeper-3.4-WinVS2008_java/336/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 205817 lines...] [junit] 2013-10-28 10:02:45,815 [myid:] - INFO [main:SessionTrackerImpl@225] - Shutting down [junit] 2013-10-28 10:02:45,815 [myid:] - INFO [main:PrepRequestProcessor@761] - Shutting down [junit] 2013-10-28 10:02:45,815 [myid:] - INFO [main:SyncRequestProcessor@209] - Shutting down [junit] 2013-10-28 10:02:45,815 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop! [junit] 2013-10-28 10:02:45,815 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited! [junit] 2013-10-28 10:02:45,816 [myid:] - INFO [main:FinalRequestProcessor@415] - shutdown of request processor complete [junit] 2013-10-28 10:02:45,916 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-10-28 10:02:46,370 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@968] - Opening socket connection to server 127.0.0.1/127.0.0.1:11221. 
Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration) [junit] 2013-10-28 10:02:46,911 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[] [junit] 2013-10-28 10:02:46,912 [myid:] - INFO [main:ClientBase@414] - STARTING server [junit] 2013-10-28 10:02:46,912 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test9208320739183221583.junit.dir\version-2 snapdir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test9208320739183221583.junit.dir\version-2 [junit] 2013-10-28 10:02:46,922 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-10-28 10:02:46,926 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-10-28 10:02:46,926 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:57333 [junit] 2013-10-28 10:02:46,927 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@817] - Processing stat command from /127.0.0.1:57333 [junit] 2013-10-28 10:02:47,022 [myid:] - INFO [Thread-5:NIOServerCnxn$StatCommand@653] - Stat command output [junit] 2013-10-28 10:02:47,023 [myid:] - INFO [Thread-5:NIOServerCnxn@997] - Closed socket connection for client /127.0.0.1:57333 (no session established for client) [junit] 2013-10-28 10:02:47,023 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-10-28 10:02:47,025 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree [junit] 2013-10-28 10:02:47,025 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2013-10-28 10:02:47,122 [myid:] - INFO [main:JMXEnv@105] - 
expect:StandaloneServer_port [junit] 2013-10-28 10:02:47,122 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-10-28 10:02:47,122 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota [junit] 2013-10-28 10:02:47,122 [myid:] - INFO [main:ClientBase@451] - tearDown starting [junit] 2013-10-28 10:02:47,362 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@849] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session [junit] 2013-10-28 10:02:47,362 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:57326 [junit] 2013-10-28 10:02:47,362 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@861] - Client attempting to renew session 0x141fe83f892 at /127.0.0.1:57326 [junit] 2013-10-28 10:02:47,424 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@617] - Established session 0x141fe83f892 with negotiated timeout 3 for client /127.0.0.1:57326 [junit] 2013-10-28 10:02:47,425 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1228] - Session establishment complete on server 127.0.0.1/127.0.0.1:11221, sessionid = 0x141fe83f892, negotiated timeout = 3 [junit] 2013-10-28 10:02:47,425 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x141fe83f892 [junit] 2013-10-28
ZooKeeper_branch34_openjdk7 - Build # 382 - Failure
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/382/

### ## LAST 60 LINES OF THE CONSOLE ###
Started by timer
FATAL: null
java.lang.NullPointerException
	at hudson.model.Slave.createLauncher(Slave.java:347)
	at hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:612)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:543)
	at hudson.model.Run.execute(Run.java:1603)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:246)

### ## FAILED TESTS (if any) ##
No tests ran.
[jira] [Commented] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806721#comment-13806721 ]

Flavio Junqueira commented on ZOOKEEPER-1459:
-

bq. why this fix is affecting ZxidRolloverTest testcases.
I don't know, I just know that the test hangs on Windows. I haven't tested on Linux, but from the QA output I suppose it is not a problem.

bq. it would be good if you can share logs, threaddumps etc.
Ok, I'll upload when I have a chance.

Standalone ZooKeeperServer is not closing the transaction log files on shutdown
---
Key: ZOOKEEPER-1459
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1459
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.0
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch

When the standalone ZK server shuts down, it only clears the zkdatabase and does not close the transaction log streams. As a result, deleting the temporary files in unit tests on Windows fails.

ZooKeeperServer.java
{noformat}
if (zkDb != null) {
    zkDb.clear();
}
{noformat}

Suggestion to close the zkDb as follows; this in turn will take care of the transaction logs:
{noformat}
if (zkDb != null) {
    zkDb.clear();
    try {
        zkDb.close();
    } catch (IOException ie) {
        LOG.warn("Error closing logs ", ie);
    }
}
{noformat}

-- This message was sent by Atlassian JIRA (v6.1#6144)
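The clear-then-close order suggested in the issue can be tried in isolation with a minimal sketch. `DemoDatabase` and `ShutdownDemo` are hypothetical stand-ins written for this illustration, not ZooKeeper classes, and the single `FileOutputStream` only approximates a transaction log stream. The assumption being illustrated is the usual Windows behavior that a file with an open stream cannot be deleted.

```java
import java.io.Closeable;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the server's database; not the real ZKDatabase. */
final class DemoDatabase implements Closeable {
    private final FileOutputStream txnLog;
    private boolean open = true;

    DemoDatabase(Path logFile) throws IOException {
        this.txnLog = new FileOutputStream(logFile.toFile());
    }

    /** Drops in-memory state only; the log stream stays open. */
    void clear() {
        // nothing to clear in this sketch
    }

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() throws IOException {
        txnLog.close();
        open = false;
    }
}

final class ShutdownDemo {
    /** Same order as the suggestion: clear first, then close, reporting any failure. */
    static void shutdown(DemoDatabase db) {
        if (db != null) {
            db.clear();
            try {
                db.close();
            } catch (IOException ie) {
                System.err.println("Error closing logs: " + ie);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("demo-txnlog", ".tmp");
        DemoDatabase db = new DemoDatabase(log);
        shutdown(db);
        // With the stream closed, the temporary file can be removed.
        Files.delete(log);
        System.out.println("still exists: " + Files.exists(log));
    }
}
```

On Linux the delete would likely succeed even without the close, which may be why the problem only shows up in the Windows runs.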
[jira] [Commented] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806729#comment-13806729 ]

Rakesh R commented on ZOOKEEPER-1459:
-

OK. I tried in a Windows 7 environment but couldn't reproduce the problem. Is this consistently failing in your environment? I would also like to know which test case in ZxidRolloverTest is affected.
[jira] [Commented] (ZOOKEEPER-1554) Can't use zookeeper client without SASL
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806733#comment-13806733 ]

Germán Blanco commented on ZOOKEEPER-1554:
--

The code for the method in [l#comment-13468619] is the same in trunk and branch 3.4. Both check an additional condition, not only the one in the comment, and both carry a comment indicating that there is some work left to do. But the remaining work sounds more like an improvement. I only sent the question above because I thought that this JIRA should be closed.

Can't use zookeeper client without SASL
---
Key: ZOOKEEPER-1554
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1554
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.4
Reporter: Guillaume Nodet
Priority: Blocker
Fix For: 3.4.6, 3.5.0

The ZooKeeperSaslClient correctly detects that it should not use SASL when nothing is configured; however, the SendThread waits forever because clientTunneledAuthenticationInProgress() returns true instead of false.
[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807102#comment-13807102 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1732:
---

[~fpj], [~abranzyck]: did you guys test this patch when joining a cluster of servers running without this patch (i.e.: trunk, only without this patch)? After rolling the first 2 followers - in a 5 member ensemble - the 3rd follower fails to join with this:

{noformat}
2013-10-28 18:43:18,134 - INFO [WorkerReceiver[myid=4]] - Notification: 4 (n.leader), 0x890415 (n.zxid), 0x6 (n.round), LOOKING (n.state), 4 (n.sid), 0x89 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,134 - INFO [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x88002c (n.zxid), 0x (n.round), FOLLOWING (n.state), 0 (n.sid), 0x89 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,135 - INFO [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x88002c (n.zxid), 0x6 (n.round), LEADING (n.state), 2 (n.sid), 0x88 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,135 - INFO [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x88002c (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 3 (n.sid), 0x88 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,136 - INFO [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x88002c (n.zxid), 0x (n.round), FOLLOWING (n.state), 1 (n.sid), 0x89 (n.peerEPoch), LOOKING (my state)0 (n.config version)
{noformat}

I am guessing IGNOREVALUE (0x) as the round value is causing issues? What was the expected behavior here (i.e.: when dealing with cluster members without this patch during an upgrade)?
ZooKeeper server unable to join established ensemble
Key: ZOOKEEPER-1732
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.4.5
Environment: Windows 7, Java 1.7
Reporter: Germán Blanco
Assignee: Germán Blanco
Priority: Blocker
Fix For: 3.4.6, 3.5.0
Attachments: CREATE_INCONSISTENCIES_patch.txt, zklog.tar.gz, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-b3.4.patch, ZOOKEEPER-1732-b3.4.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch

I have a test in which I do a rolling restart of three ZooKeeper servers, and it was failing from time to time. I ran the tests in a loop until the failure came out, and it seems that at some point one of the servers is unable to join the ensemble formed by the other two.
[jira] [Commented] (ZOOKEEPER-1745) Wrong Import-Package in the META-INF/MANIFEST.MF of zookeeper 3.4.5 bundle
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807264#comment-13807264 ]

Arnoud Glimmerveen commented on ZOOKEEPER-1745:
---

Are you still experiencing these OSGi issues with trunk/3.4 branch [~xldai]? I think the issue described was already addressed by ZOOKEEPER-1334.

Wrong Import-Package in the META-INF/MANIFEST.MF of zookeeper 3.4.5 bundle
--
Key: ZOOKEEPER-1745
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1745
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.5
Environment: Java 7
Reporter: Xilai Dai
Assignee: Jean-Baptiste Onofré
Fix For: 3.4.6, 3.5.0

Import-Package: javax.management,org.apache.log4j,org.osgi.framework;version=[1.4,2.0),org.osgi.util.tracker;version=[1.1,2.0)

org.apache.log4j should be replaced by org.slf4j because, according to the source code, the ZooKeeper server classes import org.slf4j.* for logging. Currently, trying to create an instance of some of its classes in an OSGi container (e.g. Apache Karaf) fails with:

Caused by: java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
	at org.apache.zookeeper.server.quorum.QuorumPeerConfig.<clinit>(QuorumPeerConfig.java:46)
[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807553#comment-13807553 ]

Flavio Junqueira commented on ZOOKEEPER-1732:
-

Hmm, this is odd. I don't understand why the notifications don't all have the same round value, which would be the don't-care value in this case. The value is also not what I expected, so I might have done something wrong there. Let me have a closer look and report back. Thanks for reporting, [~rgs].
[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807565#comment-13807565 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1732:
---

What's wrong with the round values? I.e., the two new servers have IGNOREVALUE (sounds correct, right?) and the older followers have the current round value (i.e., 0x6). I thought the problem would be here:

{noformat}
 * @see https://issues.apache.org/jira/browse/ZOOKEEPER-1732
 */
outofelection.put(n.sid, new Vote(n.leader, IGNOREVALUE, IGNOREVALUE, n.peerEpoch, n.state));
if (termPredicate(outofelection, new Vote(n.leader, IGNOREVALUE, IGNOREVALUE, n.peerEpoch, n.state))
        && checkLeader(outofelection, n.leader, IGNOREVALUE)) {
{noformat}

IGNOREVALUE doesn't work here, because we are talking to un-patched cluster members. Sorry if I am completely misleading you :) That's as far as I got with my analysis today.
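The concern about a sentinel round value in a mixed ensemble can be illustrated with a deliberately simplified sketch. Everything here is hypothetical: the `Vote` class has only three fields, `IGNORE` is an arbitrary sentinel, and `strictMatch`/`wildcardMatch` are made-up comparisons, not the FastLeaderElection code. The sketch also does not model the differing `n.peerEPoch` values visible in the log above; it only shows how a field-by-field comparison and a wildcard-aware comparison can disagree about whether a quorum supports the same leader.

```java
import java.util.HashMap;
import java.util.Map;

final class WildcardVoteDemo {
    /** Hypothetical sentinel for "ignore this field"; not the real constant's value. */
    static final long IGNORE = -1L;

    /** Simplified vote: only the fields needed for this illustration. */
    static final class Vote {
        final long leader;
        final long round;
        final long peerEpoch;

        Vote(long leader, long round, long peerEpoch) {
            this.leader = leader;
            this.round = round;
            this.peerEpoch = peerEpoch;
        }

        /** Field-by-field comparison: a sentinel only matches another sentinel. */
        boolean strictMatch(Vote other) {
            return leader == other.leader && round == other.round && peerEpoch == other.peerEpoch;
        }

        /** Wildcard comparison: the round is skipped when either side carries the sentinel. */
        boolean wildcardMatch(Vote other) {
            boolean roundOk = round == IGNORE || other.round == IGNORE || round == other.round;
            return leader == other.leader && peerEpoch == other.peerEpoch && roundOk;
        }
    }

    /** Counts how many recorded votes support the candidate under the chosen comparison. */
    static int supporters(Map<Long, Vote> votes, Vote candidate, boolean wildcard) {
        int count = 0;
        for (Vote v : votes.values()) {
            if (wildcard ? v.wildcardMatch(candidate) : v.strictMatch(candidate)) {
                count++;
            }
        }
        return count;
    }

    /** Builds a mixed view: two members report the sentinel round, two report round 0x6. */
    static Map<Long, Vote> mixedVotes() {
        Map<Long, Vote> votes = new HashMap<>();
        votes.put(0L, new Vote(2, IGNORE, 0x88));
        votes.put(1L, new Vote(2, IGNORE, 0x88));
        votes.put(2L, new Vote(2, 0x6, 0x88));
        votes.put(3L, new Vote(2, 0x6, 0x88));
        return votes;
    }

    public static void main(String[] args) {
        Map<Long, Vote> votes = mixedVotes();
        Vote candidate = new Vote(2, IGNORE, 0x88);
        int quorum = 5 / 2 + 1; // majority of a 5-member ensemble
        System.out.println("strict:   " + supporters(votes, candidate, false) + " of " + quorum + " needed");
        System.out.println("wildcard: " + supporters(votes, candidate, true) + " of " + quorum + " needed");
    }
}
```

In this toy setup the strict comparison finds only two supporters, short of the three needed, while the wildcard comparison finds four. Whether the real code path behaves this way is exactly what the thread is still investigating.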
[jira] [Commented] (ZOOKEEPER-1745) Wrong Import-Package in the META-INF/MANIFEST.MF of zookeeper 3.4.5 bundle
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807583#comment-13807583 ]

Xilai Dai commented on ZOOKEEPER-1745:
--

Arnoud, We are still using zookeeper 3.3.6 and never tested with zookeeper trunk/3.4 branch.
[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807588#comment-13807588 ]

Flavio Junqueira commented on ZOOKEEPER-1732:
-

I see, my mental model of the problem ignored the fact that there were servers with newer and older versions, my bad. I think the IGNOREVALUE is not really being ignored. I'll come up with a fix, but I'll do it in a different jira.
Re: zxids from some epoch are a series of consecutive numbers, aren't they?
Yes, zxids are always consecutive within an epoch, and the protocol ensures that followers see the zxids in order.

On Fri, Oct 25, 2013 at 1:53 AM, 聂安 nieanan3...@163.com wrote:

hi all,

Are these log indexes (zxids) in one epoch a series of consecutive numbers? Do they go like this [0--1--2--3--4], with no number (such as 2) skipped?

Followers do little to prevent data inconsistency before they commit a proposal. So can I conclude that ZK always considers itself to be in a normal state once it forms the leader-followers relationship?

Thank you~

Regards
An.Nie
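A small sketch may make "consecutive within an epoch" concrete. It assumes the commonly documented zxid layout, with the epoch in the high 32 bits and a counter in the low 32 bits. `ZxidDemo` and its helper names are written for this illustration and are not ZooKeeper's own utility class.

```java
final class ZxidDemo {
    /** Epoch part: the high 32 bits of the zxid (assumed layout). */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Counter part: the low 32 bits of the zxid (assumed layout). */
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /** Packs an epoch and a counter into one 64-bit zxid. */
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    /** True when next directly follows prev inside the same epoch, with no gap. */
    static boolean isNextInEpoch(long prev, long next) {
        return epochOf(prev) == epochOf(next) && counterOf(next) == counterOf(prev) + 1;
    }

    public static void main(String[] args) {
        long a = makeZxid(5, 3);
        long b = makeZxid(5, 4);
        long skipped = makeZxid(5, 6);
        System.out.println(Long.toHexString(a) + " -> " + Long.toHexString(b)
                + " consecutive: " + isNextInEpoch(a, b));
        System.out.println(Long.toHexString(b) + " -> " + Long.toHexString(skipped)
                + " consecutive: " + isNextInEpoch(b, skipped));
    }
}
```

A follower-side check along these lines would flag a skipped counter value; a change of epoch is a separate case that this helper intentionally reports as not consecutive.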
[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807633#comment-13807633 ]

Thawan Kooburat commented on ZOOKEEPER-1732:

Maybe we should start considering an automated rolling upgrade test? In Jenkins we might be able to continuously grab the 3.4 branch, perform a rolling upgrade to 3.5, and verify that the quorum comes up.
[jira] [Commented] (ZOOKEEPER-1800) jenkins failure in testGetProposalFromTxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807654#comment-13807654 ]

Thawan Kooburat commented on ZOOKEEPER-1800:

Yeah, fsync time on these boxes is unbelievable.

2013-10-24 10:43:32,575 [myid:] - WARN [SyncThread:0:FileTxnLog@322] - fsync-ing the write ahead log in SyncThread:0 took 7333ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2013-10-24 10:43:33,900 [myid:] - WARN [SyncThread:0:FileTxnLog@322] - fsync-ing the write ahead log in SyncThread:0 took 1324ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2013-10-24 10:43:33,902 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED testGetProposalFromTxn
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /invalidsnap-129

jenkins failure in testGetProposalFromTxn
-
Key: ZOOKEEPER-1800
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1800
Project: ZooKeeper
Issue Type: Bug
Components: tests
Affects Versions: 3.5.0
Reporter: Patrick Hunt
Assignee: Thawan Kooburat
Fix For: 3.5.0

https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-jdk7/691/testReport/junit/org.apache.zookeeper.test/GetProposalFromTxnTest/testGetProposalFromTxn/

The test was introduced in ZOOKEEPER-1413 and seems to have failed twice so far this month.
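To get a feel for the fsync latencies quoted in the warnings above, the cost of forcing a file to disk can be timed directly on a build box. This is a minimal sketch, not ZooKeeper's FileTxnLog code: `FsyncTimer` is a made-up class and the 1000 ms threshold is an assumption chosen for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FsyncTimer {
    /** Hypothetical warning threshold for this sketch; not taken from the thread. */
    static final long WARN_THRESHOLD_MS = 1000;

    /** Writes the data, forces it to disk, and returns the force() duration in milliseconds. */
    static long timedAppend(FileChannel channel, byte[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        long start = System.nanoTime();
        channel.force(false); // flush file content; metadata updates are not required
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** True when a measured sync time exceeds the sketch's threshold. */
    static boolean isSlow(long elapsedMs) {
        return elapsedMs > WARN_THRESHOLD_MS;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("fsync-demo", ".log");
        try (FileChannel channel = FileChannel.open(log, StandardOpenOption.WRITE)) {
            long elapsedMs = timedAppend(channel, new byte[4096]);
            if (isSlow(elapsedMs)) {
                System.err.println("fsync took " + elapsedMs + "ms, above " + WARN_THRESHOLD_MS + "ms");
            } else {
                System.out.println("fsync took " + elapsedMs + "ms");
            }
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

Both sync times reported in the log, 7333 ms and 1324 ms, would count as slow under this threshold, which fits the connection loss the test then observed.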
[jira] [Commented] (ZOOKEEPER-1802) flakey test testResyncByTxnlogThenDiffAfterFollowerCrashes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807657#comment-13807657 ]

Thawan Kooburat commented on ZOOKEEPER-1802:

As a side effect of fixing the duplicate NEWLEADER packet (ZOOKEEPER-1324), lastProcessedZxid can differ between servers when the quorum starts up and there is no new request (they may point to the last txn from the previous epoch), as shown in the log here:

2013-10-24 10:42:07,301 [myid:] - INFO [main:FollowerResyncConcurrencyTest@588] - Timeout waiting for zxid to sync: leader 0x13ecc clean 0x2 restarted 0x13ecc

I can switch to relying on another method to verify that all servers have up-to-date data, instead of checking lastProcessedZxid.

flakey test testResyncByTxnlogThenDiffAfterFollowerCrashes
--
Key: ZOOKEEPER-1802
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1802
Project: ZooKeeper
Issue Type: Bug
Components: tests
Affects Versions: 3.5.0
Reporter: Patrick Hunt
Assignee: Thawan Kooburat

This test fails intermittently on trunk:
https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-jdk7/691/testReport/junit/org.apache.zookeeper.test/FollowerResyncConcurrencyTest/testResyncByTxnlogThenDiffAfterFollowerCrashes/
[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807673#comment-13807673 ]

Flavio Junqueira commented on ZOOKEEPER-1732:
-

Given that rolling upgrades seem to be very common, it doesn't sound like a bad idea to automate the testing. I think we can't do it with junit, or at least I don't know how.
[jira] [Commented] (BOOKKEEPER-628) Improve bookie registration interface
[ https://issues.apache.org/jira/browse/BOOKKEEPER-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806991#comment-13806991 ]

Ivan Kelly commented on BOOKKEEPER-628:
---

Do you mean push the listener stuff in a different JIRA?

Improve bookie registration interface
-
Key: BOOKKEEPER-628
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-628
Project: Bookkeeper
Issue Type: Improvement
Components: bookkeeper-client, bookkeeper-server
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 4.3.0
Attachments: BOOKKEEPER-628-interface-version-1.patch, BOOKKEEPER-628-interface-version-2.patch, BOOKKEEPER-628-interface-version.patch

The idea is to improve/generalize the bookie registration process
[jira] [Commented] (BOOKKEEPER-602) we should have request timeouts rather than channel timeout in PerChannelBookieClient
[ https://issues.apache.org/jira/browse/BOOKKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807440#comment-13807440 ]

Hadoop QA commented on BOOKKEEPER-602:

Testing JIRA BOOKKEEPER-602

Patch [0001-BOOKKEEPER-602-we-should-have-request-timeouts-rathe.patch|https://issues.apache.org/jira/secure/attachment/12610664/0001-BOOKKEEPER-602-we-should-have-request-timeouts-rathe.patch] downloaded at Mon Oct 28 22:26:39 UTC 2013

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 120
.{color:green}+1{color} the patch does adds/modifies 6 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 FINDBUGS{color}
.{color:green}+1{color} the patch does not seem to introduce new Findbugs warnings
{color:green}+1 TESTS{color}
.Tests run: 882
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:green}*+1 Overall result, good!, no -1s*{color}

The full output of the test-patch run is available at https://builds.apache.org/job/bookkeeper-trunk-precommit-build/514/

we should have request timeouts rather than channel timeout in PerChannelBookieClient

Key: BOOKKEEPER-602
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-602
Project: Bookkeeper
Issue Type: Bug
Affects Versions: 4.2.0, 4.2.1
Reporter: Sijie Guo
Assignee: Aniruddha
Fix For: 4.3.0
Attachments: 0001-BOOKKEEPER-602-we-should-have-request-timeouts-rathe.patch, BOOKKEEPER-602.diff, BOOKKEEPER-602.diff

Currently we only have readTimeout on the Netty channel. It fires only when there is no activity on that channel, so it can't track timeouts of individual requests. If a channel keeps seeing read-entry activity, that activity can mask a slow add-entry response, which hurts add latency.
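The distinction drawn in the BOOKKEEPER-602 description, between an idle timeout on the whole channel and a deadline per outstanding request, can be illustrated with a minimal Python sketch. This is not the attached patch and not BookKeeper's Java code: the class `RequestTimeoutTracker`, its methods, and the `"add"`/`"read"` labels are hypothetical names chosen for illustration, and the injectable clock is an assumption made so the behaviour can be checked without sleeping.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PendingRequest:
    kind: str                          # illustrative label, e.g. "add" or "read"
    deadline: float                    # absolute time at which the request expires
    on_timeout: Callable[[int], None]  # called with the request id on expiry


class RequestTimeoutTracker:
    """Hypothetical tracker: one deadline per request, not one per channel."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._pending: Dict[int, PendingRequest] = {}

    def sent(self, request_id: int, kind: str,
             on_timeout: Callable[[int], None]) -> None:
        # Record the deadline when the request is written to the channel.
        deadline = self._clock() + self._timeout_s
        self._pending[request_id] = PendingRequest(kind, deadline, on_timeout)

    def completed(self, request_id: int) -> bool:
        # A response arrived; returns False if the request already expired.
        return self._pending.pop(request_id, None) is not None

    def expire(self) -> List[int]:
        # Meant to be called periodically; other traffic on the channel
        # does not extend the deadline of a request that is still pending.
        now = self._clock()
        expired = [rid for rid, req in self._pending.items()
                   if req.deadline <= now]
        for rid in expired:
            self._pending.pop(rid).on_timeout(rid)
        return expired
```

With a channel-level read timeout, steady read responses would keep resetting the idle timer; here a slow add request still expires once its own deadline passes, regardless of how many reads complete in the meantime.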
[jira] [Commented] (BOOKKEEPER-602) we should have request timeouts rather than channel timeout in PerChannelBookieClient
[ https://issues.apache.org/jira/browse/BOOKKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807659#comment-13807659 ]

Sijie Guo commented on BOOKKEEPER-602:

+1, committing.
[jira] [Commented] (BOOKKEEPER-602) we should have request timeouts rather than channel timeout in PerChannelBookieClient
[ https://issues.apache.org/jira/browse/BOOKKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807672#comment-13807672 ]

Hudson commented on BOOKKEEPER-602:

SUCCESS: Integrated in bookkeeper-trunk #418 (See [https://builds.apache.org/job/bookkeeper-trunk/418/])
BOOKKEEPER-602: we should have request timeouts rather than channel timeout in PerChannelBookieClient (Aniruddha, ivank via sijie) (sijie: rev 1536584)
* /zookeeper/bookkeeper/trunk/CHANGES.txt
* /zookeeper/bookkeeper/trunk/bookkeeper-benchmark/src/main/java/org/apache/bookkeeper/benchmark/BenchReadThroughputLatency.java
* /zookeeper/bookkeeper/trunk/bookkeeper-benchmark/src/main/java/org/apache/bookkeeper/benchmark/BenchThroughputLatency.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ClientConfiguration.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieClient.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/LedgerCloseTest.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/SlowBookieTest.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestReadTimeout.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestSpeculativeRead.java
* /zookeeper/bookkeeper/trunk/hedwig-server/src/test/java/org/apache/hedwig/server/persistence/TestBookKeeperPersistenceManager.java
* /zookeeper/bookkeeper/trunk/hedwig-server/src/test/java/org/apache/hedwig/server/persistence/TestDeadlock.java
[jira] [Updated] (BOOKKEEPER-614) Generic stats interface, which multiple providers can be plugged into
[ https://issues.apache.org/jira/browse/BOOKKEEPER-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-614:

Attachment: 0001-generic-stats-api.patch

Generated a generic patch that combines the Twitter change with Ivan's change.

Generic stats interface, which multiple providers can be plugged into

Key: BOOKKEEPER-614
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-614
Project: Bookkeeper
Issue Type: Improvement
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Fix For: 4.3.0
Attachments: 0001-BOOKKEEPER-614-Generic-stats-interface-which-multipl.patch, 0001-generic-stats-api.patch

Currently we collect stats through JMX. Adding a new stat to JMX is cumbersome, and reading the stats out of JMX is painful if you're not on the same machine. As a consequence, we measure only a fraction of what we should. There are a couple of nice stats packages out there, such as twitter-stats [1] and codahale metrics [2], which would make collecting stats much easier. This JIRA is to provide a generic interface that a metrics backend can be plugged into.

[1] https://github.com/twitter/commons/tree/master/src/java/com/twitter/common/stats
[2] http://metrics.codahale.com/
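The idea in BOOKKEEPER-614 of a generic interface that different metrics backends plug into can be sketched roughly as below. This is a minimal Python illustration, not the API from the attached patches: `StatsProvider`, `Counter`, the two example providers, and the stat name `"add_entry"` are all hypothetical names assumed for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Counter(ABC):
    """Hypothetical backend-neutral counter."""

    @abstractmethod
    def inc(self, delta: int = 1) -> None: ...

    @abstractmethod
    def get(self) -> int: ...


class StatsProvider(ABC):
    """Hypothetical plug-in point: each metrics backend implements this."""

    @abstractmethod
    def counter(self, name: str) -> Counter: ...


class NullCounter(Counter):
    def inc(self, delta: int = 1) -> None:
        pass  # stats disabled: recording is a no-op

    def get(self) -> int:
        return 0


class NullStatsProvider(StatsProvider):
    def counter(self, name: str) -> Counter:
        return NullCounter()


class InMemoryCounter(Counter):
    def __init__(self) -> None:
        self._value = 0

    def inc(self, delta: int = 1) -> None:
        self._value += delta

    def get(self) -> int:
        return self._value


class InMemoryStatsProvider(StatsProvider):
    """Stand-in for a real backend; keeps counters in a dict by name."""

    def __init__(self) -> None:
        self._counters: Dict[str, InMemoryCounter] = {}

    def counter(self, name: str) -> Counter:
        return self._counters.setdefault(name, InMemoryCounter())


def record_add_entry(stats: StatsProvider) -> None:
    # Instrumented code depends only on the interface, never on a backend.
    stats.counter("add_entry").inc()
```

The point of the design is that call sites such as `record_add_entry` stay unchanged when the backend is swapped, and a no-op provider lets stats be switched off without touching instrumented code.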
[jira] [Updated] (BOOKKEEPER-615) Twitter stats implementation of stats interface
[ https://issues.apache.org/jira/browse/BOOKKEEPER-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-615:

Attachment: 0002-twitter-stats-provider.patch

Implements a Twitter stats provider for the generic stats API.

Twitter stats implementation of stats interface

Key: BOOKKEEPER-615
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-615
Project: Bookkeeper
Issue Type: Sub-task
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Fix For: 4.3.0
Attachments: 0002-twitter-stats-provider.patch

Implementation of the generic stats interface using twitter stats.