ZooKeeper-trunk-solaris - Build # 108 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/108/

### LAST 60 LINES OF THE CONSOLE ###

Started by timer
Building remotely on solaris1
hudson.util.IOException2: remote file operation failed: /export/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris at hudson.remoting.Channel@2e339415:solaris1
	at hudson.FilePath.act(FilePath.java:780)
	at hudson.FilePath.act(FilePath.java:766)
	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:731)
	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:676)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1195)
	at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:573)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:462)
	at hudson.model.Run.run(Run.java:1404)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:238)
Caused by: java.io.IOException: Remote call on solaris1 failed
	at hudson.remoting.Channel.call(Channel.java:690)
	at hudson.FilePath.act(FilePath.java:773)
	... 10 more
Caused by: java.lang.NoClassDefFoundError
	at hudson.scm.SubversionWorkspaceSelector.syncWorkspaceFormatFromMaster(SubversionWorkspaceSelector.java:85)
	at hudson.scm.SubversionSCM.createSvnClientManager(SubversionSCM.java:808)
	at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:751)
	at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:738)
	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2045)
	at hudson.remoting.UserRequest.perform(UserRequest.java:118)
	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
	at hudson.remoting.Request$2.run(Request.java:287)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
	at java.util.concurrent.FutureTask.run(FutureTask.java:123)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
	at java.lang.Thread.run(Thread.java:595)
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure

### FAILED TESTS (if any) ###

No tests ran.
ZooKeeper-trunk-jdk7 - Build # 149 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk-jdk7/149/

### LAST 60 LINES OF THE CONSOLE ###

[...truncated 137373 lines...]
[junit] 2012-01-18 10:06:13,467 [myid:] - INFO [main:ClientBase@417] - STOPPING server
[junit] 2012-01-18 10:06:13,467 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@240] - NIOServerCnxn factory exited run method
[junit] 2012-01-18 10:06:13,467 [myid:] - INFO [main:ZooKeeperServer@391] - shutting down
[junit] 2012-01-18 10:06:13,467 [myid:] - INFO [main:SessionTrackerImpl@220] - Shutting down
[junit] 2012-01-18 10:06:13,468 [myid:] - INFO [main:PrepRequestProcessor@711] - Shutting down
[junit] 2012-01-18 10:06:13,468 [myid:] - INFO [main:SyncRequestProcessor@173] - Shutting down
[junit] 2012-01-18 10:06:13,468 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@134] - PrepRequestProcessor exited loop!
[junit] 2012-01-18 10:06:13,468 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited!
[junit] 2012-01-18 10:06:13,468 [myid:] - INFO [main:FinalRequestProcessor@419] - shutdown of request processor complete
[junit] 2012-01-18 10:06:13,469 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2012-01-18 10:06:13,469 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2012-01-18 10:06:13,470 [myid:] - INFO [main:ClientBase@410] - STARTING server
[junit] 2012-01-18 10:06:13,471 [myid:] - INFO [main:ZooKeeperServer@143] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test3695124395202373686.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test3695124395202373686.junit.dir/version-2
[junit] 2012-01-18 10:06:13,471 [myid:] - INFO [main:NIOServerCnxnFactory@110] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2012-01-18 10:06:13,472 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test3695124395202373686.junit.dir/version-2/snapshot.b
[junit] 2012-01-18 10:06:13,473 [myid:] - INFO [main:FileTxnSnapLog@237] - Snapshotting: b
[junit] 2012-01-18 10:06:13,475 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2012-01-18 10:06:13,475 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@213] - Accepted socket connection from /127.0.0.1:54646
[junit] 2012-01-18 10:06:13,476 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@820] - Processing stat command from /127.0.0.1:54646
[junit] 2012-01-18 10:06:13,476 [myid:] - INFO [Thread-4:NIOServerCnxn$StatCommand@655] - Stat command output
[junit] 2012-01-18 10:06:13,476 [myid:] - INFO [Thread-4:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:54646 (no session established for client)
[junit] 2012-01-18 10:06:13,477 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2012-01-18 10:06:13,478 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2012-01-18 10:06:13,478 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2012-01-18 10:06:13,478 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2012-01-18 10:06:13,478 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2012-01-18 10:06:13,479 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2012-01-18 10:06:13,479 [myid:] - INFO [main:ClientBase@447] - tearDown starting
[junit] 2012-01-18 10:06:13,554 [myid:] - INFO [main:ZooKeeper@679] - Session: 0x134f047ef3e closed
[junit] 2012-01-18 10:06:13,554 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@511] - EventThread shut down
[junit] 2012-01-18 10:06:13,554 [myid:] - INFO [main:ClientBase@417] - STOPPING server
[junit] 2012-01-18 10:06:13,554 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@240] - NIOServerCnxn factory exited run method
[junit] 2012-01-18 10:06:13,555 [myid:] - INFO [main:ZooKeeperServer@391] - shutting down
[junit] 2012-01-18 10:06:13,555 [myid:] - INFO [main:SessionTrackerImpl@220] - Shutting down
[junit] 2012-01-18 10:06:13,555 [myid:] - INFO [main:PrepRequestProcessor@711] - Shutting down
[junit] 2012-01-18 10:06:13,555 [myid:] - INFO [main:SyncRequestProcessor@173]
ZooKeeper-trunk - Build # 1432 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk/1432/

### LAST 60 LINES OF THE CONSOLE ###

[...truncated 138452 lines...]
[junit] 2012-01-18 10:55:09,284 [myid:] - INFO [main:PrepRequestProcessor@711] - Shutting down
[junit] 2012-01-18 10:55:09,285 [myid:] - INFO [main:SyncRequestProcessor@173] - Shutting down
[junit] 2012-01-18 10:55:09,285 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@134] - PrepRequestProcessor exited loop!
[junit] 2012-01-18 10:55:09,285 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited!
[junit] 2012-01-18 10:55:09,285 [myid:] - INFO [main:FinalRequestProcessor@419] - shutdown of request processor complete
[junit] 2012-01-18 10:55:09,286 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2012-01-18 10:55:09,286 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2012-01-18 10:55:09,287 [myid:] - INFO [main:ClientBase@410] - STARTING server
[junit] 2012-01-18 10:55:09,288 [myid:] - INFO [main:ZooKeeperServer@143] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test6903943840352692016.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test6903943840352692016.junit.dir/version-2
[junit] 2012-01-18 10:55:09,288 [myid:] - INFO [main:NIOServerCnxnFactory@110] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2012-01-18 10:55:09,289 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test6903943840352692016.junit.dir/version-2/snapshot.b
[junit] 2012-01-18 10:55:09,291 [myid:] - INFO [main:FileTxnSnapLog@237] - Snapshotting: b
[junit] 2012-01-18 10:55:09,293 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2012-01-18 10:55:09,293 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@213] - Accepted socket connection from /127.0.0.1:58072
[junit] 2012-01-18 10:55:09,294 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@820] - Processing stat command from /127.0.0.1:58072
[junit] 2012-01-18 10:55:09,294 [myid:] - INFO [Thread-5:NIOServerCnxn$StatCommand@655] - Stat command output
[junit] 2012-01-18 10:55:09,294 [myid:] - INFO [Thread-5:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:58072 (no session established for client)
[junit] 2012-01-18 10:55:09,295 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2012-01-18 10:55:09,296 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2012-01-18 10:55:09,296 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2012-01-18 10:55:09,296 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2012-01-18 10:55:09,296 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2012-01-18 10:55:09,297 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2012-01-18 10:55:09,297 [myid:] - INFO [main:ClientBase@447] - tearDown starting
[junit] 2012-01-18 10:55:09,369 [myid:] - INFO [main:ZooKeeper@679] - Session: 0x134f074bb13 closed
[junit] 2012-01-18 10:55:09,369 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@511] - EventThread shut down
[junit] 2012-01-18 10:55:09,369 [myid:] - INFO [main:ClientBase@417] - STOPPING server
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@240] - NIOServerCnxn factory exited run method
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO [main:ZooKeeperServer@391] - shutting down
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO [main:SessionTrackerImpl@220] - Shutting down
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO [main:PrepRequestProcessor@711] - Shutting down
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO [main:SyncRequestProcessor@173] - Shutting down
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@134] - PrepRequestProcessor exited loop!
[junit] 2012-01-18 10:55:09,371 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited!
[junit] 2012-01-18 10:55:09,371 [myid:] - INFO [main:FinalRequestProcessor@419] - shutdown of request processor complete
[junit] 2012-01-18 10:55:09,372 [myid:] - INFO
Re: Timeouts and ping handling
I think it can be done. Looking through the code, it seems like it should be safe, modulo some stats set in the FinalRequestProcessor that may become less useful.

A question for the other zookeeper devs out there: is there a reason that we handle read-only operations in the first processor differently on the leader than on the followers? The leader (calling PrepRequestProcessor first) will do a session check for any of the read-only requests:

    zks.sessionTracker.checkSession(request.sessionId, request.getOwner());

but the FollowerRequestProcessor will just push these requests to its second processor and never check the session. What's the purpose of the session check on the leader but not the followers?

C

On Wed, Jan 18, 2012 at 4:26 PM, Manosiz Bhattacharyya manos...@gmail.com wrote:

> Hello,
>
> We are using Zookeeper-3.3.4 with client session timeouts of 5 seconds, and we see frequent timeouts. We have a cluster of 50 nodes (3 of which are ZK nodes) and each node has 5 client connections (a total of 250 connections to the ensemble).
>
> While investigating the zookeeper connections, we found that sometimes pings sent from the zookeeper client do not return from the server within 5 seconds, and the client connection gets disconnected. Digging deeper, it seems that pings are enqueued the same way as other requests in the three-stage request processing pipeline (prep, sync, finalize) in the ZK server. So if there are a lot of write operations from other active sessions ahead of a ping from an inactive session in the queues, the inactive session could time out.
>
> My question is whether we can return the ping request to the client immediately from the server, as the purpose of the ping request seems to be to serve as a heartbeat from relatively inactive sessions. If we kept a separate ping queue in the prep phase that forwards pings straight to the finalize phase, requests ahead of the ping that require I/O in the sync phase would not cause client timeouts. I assume pings do not impose any ordering on the database. I did take a cursory look at the code and thought that this could be done. Would really appreciate an opinion on this.
>
> As an aside, I should mention that increasing the session timeout to 20 seconds did improve the situation significantly. However, as we are using Zookeeper to monitor the health of our components, increasing the timeout means that we only learn of a component's death 20 seconds later. This is something we would definitely like to avoid, and we would like to go back to the 5-second timeout.
>
> Regards,
> Manosiz.
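[Editor's note: for concreteness, a minimal sketch of the queue-bypass idea under discussion. The types and wiring below are simplified stand-ins, not ZooKeeper's real classes; the real PrepRequestProcessor queues requests and runs on its own thread. Treat this as an illustration of the proposal, not a patch.]

    // Simplified stand-ins for ZooKeeper's Request/RequestProcessor types.
    interface RequestProcessor {
        void processRequest(Request request);
    }

    class Request {
        static final int PING = 11;  // stand-in for ZooDefs.OpCode.ping
        final int type;
        Request(int type) { this.type = type; }
    }

    // Proposed prep stage: pings bypass the sync (disk I/O) stage so they are
    // not queued behind writes from other sessions.
    class PrepStage implements RequestProcessor {
        private final RequestProcessor syncStage;   // next stage: logs to disk
        private final RequestProcessor finalStage;  // last stage: replies to client

        PrepStage(RequestProcessor syncStage, RequestProcessor finalStage) {
            this.syncStage = syncStage;
            this.finalStage = finalStage;
        }

        @Override
        public void processRequest(Request request) {
            if (request.type == Request.PING) {
                finalStage.processRequest(request);  // short-circuit heartbeats
            } else {
                syncStage.processRequest(request);   // normal three-stage path
            }
        }
    }

Note that Patrick's reply below argues against exactly this shortcut: a ping answered ahead of the pipeline reports a liveness the server cannot actually sustain.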
Re: Timeouts and ping handling
On Wed, Jan 18, 2012 at 2:03 PM, Camille Fournier cami...@apache.org wrote:

> I think it can be done. Looking through the code, it seems like it should be safe, modulo some stats set in the FinalRequestProcessor that may become less useful.

Turning around HBs at the head end of the server is a bad idea. If the server can't support the timeout you requested, then you are setting yourself up for trouble if you try to fake it. (Think through some of the failure cases...) This is not something you want to do.

Rather, first look at some of the more obvious issues such as GC, then disk (I've seen people go to ramdisks in some cases), then OS/net tuning, etc.

Patrick
robustness in the face of clock changes
I have seen a number of issues at client sites related to cavalier adjustments of clocks. Up to now, my response has been to simply say "don't do that", but lately it has been bugging me, and it seems like there should be a better solution.

The problem scenario involves a step-wise time change on a ZK server node, either forward or backwards. The issues are:

- a step backwards causes all of the timeouts to be extended by the amount of the step. Thus, if you set all clocks back by an hour, no session will time out for the next hour of real time. This is bad.

- a step forward of sufficient size will cause all live sessions to immediately time out.

To investigate solutions, I played around a bit with nanoTime and currentTimeMillis. My experiments verified that on Linux, nanoTime is, indeed, a timer, and currentTimeMillis is a reference to the absolute system clock. In my test program, I use both as the system time is modified, and I see stable behavior from nanoTime and the predictably goofy behavior from currentTimeMillis. My test code is at https://github.com/tdunning/timeSkew

From these tests, it seems that using nanoTime would be substantially better than using currentTimeMillis in ZK. I think that Camille brought this up a while ago, but I don't remember this going forward.

Right now, ZK is very delicate in the face of clock changes, and it seems that it could instead be very robust. Moreover, many naive admins and some experienced admins seem to have no clue about how to keep their clocks well behaved, so this delicate nature causes lots of problems. Should I try to prepare a patch?

One other thing that I see is that I can't find any way to cause a Java process to sleep for an elapsed time. All timer-related sleeps that I can find work relative to absolute time rather than intervals. The only work-around I have found is to use Thread.yield() in a polling loop, which is clearly only one half step above hideous. Relative to ZK, my question is whether there is any critical need anywhere in ZK for a timed sleep.
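[Editor's note: to make the distinction concrete, a minimal sketch along the lines of the experiment described (not the timeSkew code itself): sleep for a fixed interval and measure it with both clocks. If the system clock is stepped during the sleep, the currentTimeMillis delta jumps by the step, while the nanoTime delta stays near the true elapsed time.]

    // Minimal sketch: compare elapsed time measured with the monotonic timer
    // vs. the wall clock. Step the system clock (e.g. `date -s '+1 hour'`)
    // while this sleeps; only the wall-clock delta will jump.
    public class ClockCompare {
        public static void main(String[] args) throws InterruptedException {
            long wallStart = System.currentTimeMillis(); // absolute time, follows clock steps
            long monoStart = System.nanoTime();          // arbitrary-epoch timer, step-immune

            Thread.sleep(10_000);

            long wallElapsedMs = System.currentTimeMillis() - wallStart;
            long monoElapsedMs = (System.nanoTime() - monoStart) / 1_000_000L;

            System.out.println("wall clock says " + wallElapsedMs + " ms elapsed");
            System.out.println("monotonic timer says " + monoElapsedMs + " ms elapsed");
        }
    }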
Re: Timeouts and ping handling
Duh, I knew there was something I was forgetting. You can't process the session timeout faster than the server can process the full pipeline, so making pings come back faster just means you will have a false sense of liveness for your services.

The question about why the leaders and followers handle read-only requests differently still stands, though.

C

On Wed, Jan 18, 2012 at 5:45 PM, Patrick Hunt ph...@apache.org wrote:

> On Wed, Jan 18, 2012 at 2:03 PM, Camille Fournier cami...@apache.org wrote:
>> I think it can be done. Looking through the code, it seems like it should be safe, modulo some stats set in the FinalRequestProcessor that may become less useful.
>
> Turning around HBs at the head end of the server is a bad idea. If the server can't support the timeout you requested, then you are setting yourself up for trouble if you try to fake it. (Think through some of the failure cases...) This is not something you want to do.
>
> Rather, first look at some of the more obvious issues such as GC, then disk (I've seen people go to ramdisks in some cases), then OS/net tuning, etc.
>
> Patrick
[jira] [Created] (ZOOKEEPER-1365) Removing a duplicate function and another minor cleanup in QuorumPeer.java
Removing a duplicate function and another minor cleanup in QuorumPeer.java
---------------------------------------------------------------------------

                 Key: ZOOKEEPER-1365
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1365
             Project: ZooKeeper
          Issue Type: Improvement
          Components: server
            Reporter: Alexander Shraer
            Assignee: Alexander Shraer
            Priority: Trivial

- getMyId() and getId() in QuorumPeer are doing the same thing

- QuorumPeer.quorumPeers is being read directly from outside QuorumPeer, although we have the getter QuorumPeers.getView().

The purpose of this cleanup is to make it easier to later change the way QuorumPeer manages its list of peers (to support dynamic changes in this list).
Re: Timeouts and ping handling
On Wed, Jan 18, 2012 at 3:21 PM, Camille Fournier c...@renttherunway.com wrote:

> Duh, I knew there was something I was forgetting. You can't process the session timeout faster than the server can process the full pipeline, so making pings come back faster just means you will have a false sense of liveness for your services.

There's also this - we only send HBs when the client is not active. HBs check that the server is alive, but at the same time we're also letting the server know that we're alive. However, when the client is active (sending read/write ops) we don't need a HB; any read/write operation serves as the HB.

Say we send a read operation to the server: we won't send another HB to the server until the read operation result comes back (and then 1/3 of the timeout after that). In this case you can't take advantage of the hack that's been discussed. The read operation needs to complete; if it takes too long (as in this case), the session will time out as usual.

Now, if you have clients that are largely inactive, this may not matter too much, but depending on the use case you might get caught by this.

Patrick
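[Editor's note: a toy model of the client-side behavior described here. This is purely illustrative; ZooKeeper's actual logic lives in ClientCnxn's send thread and differs in detail, and the 1/3 fraction is taken from Patrick's description above.]

    // Illustrative model only (not ClientCnxn's code): the client pings only
    // when the connection has been idle, so an in-flight read both acts as the
    // heartbeat and delays the next explicit ping.
    class PingScheduler {
        private final long sessionTimeoutMs;
        private long lastActivityMs;    // last send or last response
        private boolean requestInFlight;

        PingScheduler(long sessionTimeoutMs, long nowMs) {
            this.sessionTimeoutMs = sessionTimeoutMs;
            this.lastActivityMs = nowMs;
        }

        void onSend(long nowMs)     { lastActivityMs = nowMs; requestInFlight = true; }
        void onResponse(long nowMs) { lastActivityMs = nowMs; requestInFlight = false; }

        /** Ping once the link has been idle for ~1/3 of the session timeout. */
        boolean shouldPing(long nowMs) {
            return !requestInFlight && (nowMs - lastActivityMs) >= sessionTimeoutMs / 3;
        }
    }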
[jira] [Updated] (ZOOKEEPER-1365) Removing a duplicate function and another minor cleanup in QuorumPeer.java
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Shraer updated ZOOKEEPER-1365:
----------------------------------------

    Attachment: ZOOKEEPER-1365.patch

trivial patch - no tests included

> Removing a duplicate function and another minor cleanup in QuorumPeer.java
> ---------------------------------------------------------------------------
>
>                  Key: ZOOKEEPER-1365
>                  URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1365
>              Project: ZooKeeper
>           Issue Type: Improvement
>           Components: server
>             Reporter: Alexander Shraer
>             Assignee: Alexander Shraer
>             Priority: Trivial
>          Attachments: ZOOKEEPER-1365.patch, ZOOKEEPER-1365.patch
>
> - getMyId() and getId() in QuorumPeer are doing the same thing
> - QuorumPeer.quorumPeers is being read directly from outside QuorumPeer, although we have the getter QuorumPeers.getView().
> The purpose of this cleanup is to make it easier to later change the way QuorumPeer manages its list of peers (to support dynamic changes in this list).
[jira] [Commented] (ZOOKEEPER-1327) there are still remnants of hadoop urls
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188906#comment-13188906 ]

Harsh J commented on ZOOKEEPER-1327:
------------------------------------

Hey devs, wide patches like this one are usually hard to maintain. Can someone do a quick review please, and commit it if it's good? I did a re-review of my changes and I am confident that no URLs have been broken. Tested each one out.

> there are still remnants of hadoop urls
> ---------------------------------------
>
>                 Key: ZOOKEEPER-1327
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1327
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Benjamin Reed
>            Assignee: Harsh J
>             Fix For: 3.4.3, 3.5.0
>
> there are still hadoop urls and references to zookeeper lists under the hadoop project in the sources.
[jira] [Commented] (ZOOKEEPER-973) bind() could fail on Leader because it does not setReuseAddress on its ServerSocket
[ https://issues.apache.org/jira/browse/ZOOKEEPER-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188917#comment-13188917 ]

Harsh J commented on ZOOKEEPER-973:
-----------------------------------

Hey devs, are there any further comments you'd like me to address on this patch? Do let me know.

> bind() could fail on Leader because it does not setReuseAddress on its ServerSocket
> ------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-973
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-973
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2
>            Reporter: Vishal Kher
>            Assignee: Harsh J
>            Priority: Trivial
>             Fix For: 3.5.0
>         Attachments: ZOOKEEPER-973.patch, ZOOKEEPER-973.patch
>
> setReuseAddress(true) should be used below.
>
>     Leader(QuorumPeer self, LeaderZooKeeperServer zk) throws IOException {
>         this.self = self;
>         try {
>             ss = new ServerSocket(self.getQuorumAddress().getPort());
>         } catch (BindException e) {
>             LOG.error("Couldn't bind to port " + self.getQuorumAddress().getPort(), e);
>             throw e;
>         }
>         this.zk = zk;
>     }
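[Editor's note: for illustration, one way the suggested fix could look. This is a sketch, not the attached ZOOKEEPER-973 patch: the key point is that SO_REUSEADDR must be set on an unbound socket, before bind().]

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.ServerSocket;

    // Sketch of the suggested fix: an unbound ServerSocket lets us set
    // SO_REUSEADDR before binding, so a leader restart is not blocked by a
    // previous socket lingering in TIME_WAIT.
    class ReuseAddressBind {
        static ServerSocket bindWithReuse(int port) throws IOException {
            ServerSocket ss = new ServerSocket();   // created unbound
            ss.setReuseAddress(true);               // must be set before bind()
            ss.bind(new InetSocketAddress(port));
            return ss;
        }
    }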
[jira] [Created] (ZOOKEEPER-1366) Zookeeper should be tolerant of clock adjustments
Zookeeper should be tolerant of clock adjustments
-------------------------------------------------

                 Key: ZOOKEEPER-1366
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1366
             Project: ZooKeeper
          Issue Type: Bug
            Reporter: Ted Dunning
             Fix For: 3.4.3

If you want to wreak havoc on a ZK based system, just do [date -s +1hour] and watch the mayhem as all sessions expire at once.

This shouldn't happen. Zookeeper could easily handle elapsed times as elapsed times rather than as differences between absolute times. The absolute times are subject to adjustment when the clock is set, while a timer is not subject to this problem. In Java, System.currentTimeMillis() gives you absolute time, while System.nanoTime() gives you time based on a timer from an arbitrary epoch.

I have done this and have been running tests now for some tens of minutes with no failures. I will set up a test machine to redo the build again on Ubuntu and post a patch here for discussion.
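[Editor's note: a hedged sketch of the pattern the report advocates. The class below is illustrative, not ZooKeeper's session tracker: deadlines computed from System.nanoTime() measure elapsed time, so a wall-clock step can neither expire sessions early nor extend them.]

    // Illustrative deadline helper (not ZooKeeper code). The subtraction-based
    // comparison is the form recommended for nanoTime(), since it stays correct
    // even if the counter wraps, unlike `nanoTime() >= deadline`.
    class Deadline {
        private final long deadlineNanos;

        Deadline(long timeoutMillis) {
            this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
        }

        boolean expired() {
            return System.nanoTime() - deadlineNanos >= 0;
        }

        long remainingMillis() {
            return Math.max(0, (deadlineNanos - System.nanoTime()) / 1_000_000L);
        }
    }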
[jira] [Commented] (ZOOKEEPER-1366) Zookeeper should be tolerant of clock adjustments
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188974#comment-13188974 ]

Ted Dunning commented on ZOOKEEPER-1366:
----------------------------------------

See https://github.com/tdunning/zookeeper for a work in progress. Tests seem good except for ReadOnlyModeTest. That may be failing due to unrelated issues.

> Zookeeper should be tolerant of clock adjustments
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-1366
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1366
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Ted Dunning
>             Fix For: 3.4.3
>
> If you want to wreak havoc on a ZK based system, just do [date -s +1hour] and watch the mayhem as all sessions expire at once.
> This shouldn't happen. Zookeeper could easily handle elapsed times as elapsed times rather than as differences between absolute times. The absolute times are subject to adjustment when the clock is set, while a timer is not subject to this problem. In Java, System.currentTimeMillis() gives you absolute time, while System.nanoTime() gives you time based on a timer from an arbitrary epoch.
> I have done this and have been running tests now for some tens of minutes with no failures. I will set up a test machine to redo the build again on Ubuntu and post a patch here for discussion.
[jira] [Commented] (BOOKKEEPER-153) Ledger can't be opened or closed due to zero-length metadata
[ https://issues.apache.org/jira/browse/BOOKKEEPER-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188403#comment-13188403 ]

Sijie Guo commented on BOOKKEEPER-153:
--------------------------------------

Discussed with Ivan offline: these ledgers are orphan ledgers (failed creations), which only affect the recovery tool. It would be better to handle this kind of ledger in the recovery tool, so I will remove the code changes in LedgerOpenOp and create another jira to handle it in the recovery tool.

> Ledger can't be opened or closed due to zero-length metadata
> -------------------------------------------------------------
>
>                 Key: BOOKKEEPER-153
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-153
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-client
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>         Attachments: BK-153.patch
>
> Currently, creating the ledger path and writing the ledger metadata are not done in one transaction, so if the bookkeeper client (the hub server uses the bookkeeper client) crashes in between, we end up with a ledger in zookeeper with zero-length metadata that we can't open or close. We should create the ledger path with the initial metadata to avoid this case. Besides that, we need to add code in openLedgerOp to handle zero-length metadata for backward compatibility.
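[Editor's note: the proposed direction amounts to collapsing the two steps into ZooKeeper's single, atomic create. A hedged illustration with a hypothetical helper, not the attached BK-153 patch:]

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.ZooKeeper;

    // Illustration only: a single zk.create() carrying the initial metadata is
    // one atomic server-side operation, so a client crash can never leave
    // behind a ledger znode with zero-length data.
    class AtomicLedgerCreate {
        static void createLedger(ZooKeeper zk, String ledgerPath, byte[] initialMetadata)
                throws KeeperException, InterruptedException {
            // Previously the path was created first and the metadata written
            // after; here both happen in one call or not at all.
            zk.create(ledgerPath, initialMetadata, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
    }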
[jira] [Updated] (BOOKKEEPER-153) Ledger can't be opened or closed due to zero-length metadata
[ https://issues.apache.org/jira/browse/BOOKKEEPER-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-153:
---------------------------------

    Attachment: BK-153.patch_v2

removes the code changes in LedgerOpenOp

> Ledger can't be opened or closed due to zero-length metadata
> -------------------------------------------------------------
>
>                 Key: BOOKKEEPER-153
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-153
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-client
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>         Attachments: BK-153.patch, BK-153.patch_v2
>
> Currently, creating the ledger path and writing the ledger metadata are not done in one transaction, so if the bookkeeper client (the hub server uses the bookkeeper client) crashes in between, we end up with a ledger in zookeeper with zero-length metadata that we can't open or close. We should create the ledger path with the initial metadata to avoid this case. Besides that, we need to add code in openLedgerOp to handle zero-length metadata for backward compatibility.
Build failed in Jenkins: bookkeeper-trunk #321
See https://builds.apache.org/job/bookkeeper-trunk/321/

------------------------------------------
Started by timer
Building remotely on solaris1
hudson.util.IOException2: remote file operation failed: https://builds.apache.org/job/bookkeeper-trunk/ws/ at hudson.remoting.Channel@16274e4b:solaris1
	at hudson.FilePath.act(FilePath.java:780)
	at hudson.FilePath.act(FilePath.java:766)
	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:731)
	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:676)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1195)
	at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:573)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:462)
	at hudson.model.Run.run(Run.java:1404)
	at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:481)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:238)
Caused by: java.io.IOException: Remote call on solaris1 failed
	at hudson.remoting.Channel.call(Channel.java:690)
	at hudson.FilePath.act(FilePath.java:773)
	... 10 more
Caused by: java.lang.NoClassDefFoundError
	at hudson.scm.SubversionWorkspaceSelector.syncWorkspaceFormatFromMaster(SubversionWorkspaceSelector.java:85)
	at hudson.scm.SubversionSCM.createSvnClientManager(SubversionSCM.java:808)
	at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:751)
	at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:738)
	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2045)
	at hudson.remoting.UserRequest.perform(UserRequest.java:118)
	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
	at hudson.remoting.Request$2.run(Request.java:287)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
	at java.util.concurrent.FutureTask.run(FutureTask.java:123)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
	at java.lang.Thread.run(Thread.java:595)
Jenkins build is unstable: bookkeeper-trunk #322
See https://builds.apache.org/job/bookkeeper-trunk/322/
[jira] [Created] (BOOKKEEPER-154) Garbage collect messages for those subscribers inactive/offline for a long time.
Garbage collect messages for those subscribers inactive/offline for a long time.
---------------------------------------------------------------------------------

                 Key: BOOKKEEPER-154
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-154
             Project: Bookkeeper
          Issue Type: New Feature
          Components: hedwig-client, hedwig-server
    Affects Versions: 4.0.0
            Reporter: Sijie Guo

Currently hedwig tracks subscribers' progress in order to garbage collect published messages. If a subscriber subscribes and then goes offline for a long time without unsubscribing, the messages published to its topic never get a chance to be garbage collected. A time-based garbage collection policy would be suitable for this case.
[jira] [Commented] (BOOKKEEPER-154) Garbage collect messages for those subscribers inactive/offline for a long time.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188902#comment-13188902 ]

Sijie Guo commented on BOOKKEEPER-154:
--------------------------------------

Currently we don't have a publish timestamp for each message, so it would not be easy to implement a time-based garbage collection policy in the hub server itself. The proposal is therefore to provide an offline tool that checks each subscriber's state and does time-based gc: if a subscriber has been inactive for a long time, the tool sends a CONSUME request on its behalf to consume up to the latest message.

The tool works as below. Loop over all topics; for each topic:

1) Find the subscribers that have been inactive for a long time: read the subscriber znodes to get their modification times. If a znode has not been modified for a long time, its subscriber has not been active for a long time.

2) Read the latest message id: we can parse the ledgers znode to get it. We did a similar thing in BOOKKEEPER-77.

3) Subscribe to the topic as each inactive subscriber found (if such a subscriber is actually online, the subscription will fail, and we should not send CONSUME for it), then send a CONSUME request to the hub server on its behalf to consume up to the latest message.

(A sketch of step 1 follows after the quoted issue text below.)

> Garbage collect messages for those subscribers inactive/offline for a long time.
> ---------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-154
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-154
>             Project: Bookkeeper
>          Issue Type: New Feature
>          Components: hedwig-client, hedwig-server
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>
> Currently hedwig tracks subscribers' progress in order to garbage collect published messages. If a subscriber subscribes and then goes offline for a long time without unsubscribing, the messages published to its topic never get a chance to be garbage collected. A time-based garbage collection policy would be suitable for this case.
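[Editor's note: a rough sketch of step 1 of the tool described above, using the plain ZooKeeper client API. The znode paths and the staleness threshold are hypothetical stand-ins (Hedwig's actual layout differs), and steps 2 and 3 are only indicated in comments.]

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Hypothetical sketch of step 1 of the proposed offline tool.
    public class StaleSubscriberFinder {
        // Hypothetical threshold: treat a week of inactivity as "a long time".
        static final long STALE_MS = 7L * 24 * 60 * 60 * 1000;

        static List<String> findStale(ZooKeeper zk, String topicPath)
                throws KeeperException, InterruptedException {
            List<String> stale = new ArrayList<>();
            long now = System.currentTimeMillis();
            String subsPath = topicPath + "/subscribers";   // hypothetical path
            for (String sub : zk.getChildren(subsPath, false)) {
                Stat stat = zk.exists(subsPath + "/" + sub, false);
                // The znode mtime records when the subscriber last updated its
                // consume position; an old mtime marks a long-inactive subscriber.
                if (stat != null && now - stat.getMtime() > STALE_MS) {
                    stale.add(sub);
                }
            }
            return stale;
        }
        // Step 2: parse the topic's ledgers znode for the latest message id
        //         (similar to BOOKKEEPER-77).
        // Step 3: subscribe as each stale subscriber (fails if it is online)
        //         and send a CONSUME request up to that message id.
    }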