ZooKeeper-trunk-solaris - Build # 108 - Failure

2012-01-18 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/108/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by timer
Building remotely on solaris1
hudson.util.IOException2: remote file operation failed: 
/export/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris at 
hudson.remoting.Channel@2e339415:solaris1
at hudson.FilePath.act(FilePath.java:780)
at hudson.FilePath.act(FilePath.java:766)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:731)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:676)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1195)
at 
hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:573)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:462)
at hudson.model.Run.run(Run.java:1404)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:238)
Caused by: java.io.IOException: Remote call on solaris1 failed
at hudson.remoting.Channel.call(Channel.java:690)
at hudson.FilePath.act(FilePath.java:773)
... 10 more
Caused by: java.lang.NoClassDefFoundError
at 
hudson.scm.SubversionWorkspaceSelector.syncWorkspaceFormatFromMaster(SubversionWorkspaceSelector.java:85)
at 
hudson.scm.SubversionSCM.createSvnClientManager(SubversionSCM.java:808)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:751)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:738)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2045)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:287)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
at java.lang.Thread.run(Thread.java:595)
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.


ZooKeeper-trunk-jdk7 - Build # 149 - Failure

2012-01-18 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-jdk7/149/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 137373 lines...]
[junit] 2012-01-18 10:06:13,467 [myid:] - INFO  [main:ClientBase@417] - 
STOPPING server
[junit] 2012-01-18 10:06:13,467 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@240] - 
NIOServerCnxn factory exited run method
[junit] 2012-01-18 10:06:13,467 [myid:] - INFO  [main:ZooKeeperServer@391] 
- shutting down
[junit] 2012-01-18 10:06:13,467 [myid:] - INFO  
[main:SessionTrackerImpl@220] - Shutting down
[junit] 2012-01-18 10:06:13,468 [myid:] - INFO  
[main:PrepRequestProcessor@711] - Shutting down
[junit] 2012-01-18 10:06:13,468 [myid:] - INFO  
[main:SyncRequestProcessor@173] - Shutting down
[junit] 2012-01-18 10:06:13,468 [myid:] - INFO  [ProcessThread(sid:0 
cport:-1)::PrepRequestProcessor@134] - PrepRequestProcessor exited loop!
[junit] 2012-01-18 10:06:13,468 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited!
[junit] 2012-01-18 10:06:13,468 [myid:] - INFO  
[main:FinalRequestProcessor@419] - shutdown of request processor complete
[junit] 2012-01-18 10:06:13,469 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2012-01-18 10:06:13,469 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[]
[junit] 2012-01-18 10:06:13,470 [myid:] - INFO  [main:ClientBase@410] - 
STARTING server
[junit] 2012-01-18 10:06:13,471 [myid:] - INFO  [main:ZooKeeperServer@143] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test3695124395202373686.junit.dir/version-2
 snapdir 
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test3695124395202373686.junit.dir/version-2
[junit] 2012-01-18 10:06:13,471 [myid:] - INFO  
[main:NIOServerCnxnFactory@110] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2012-01-18 10:06:13,472 [myid:] - INFO  [main:FileSnap@83] - 
Reading snapshot 
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test3695124395202373686.junit.dir/version-2/snapshot.b
[junit] 2012-01-18 10:06:13,473 [myid:] - INFO  [main:FileTxnSnapLog@237] - 
Snapshotting: b
[junit] 2012-01-18 10:06:13,475 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2012-01-18 10:06:13,475 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@213] - 
Accepted socket connection from /127.0.0.1:54646
[junit] 2012-01-18 10:06:13,476 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@820] - Processing 
stat command from /127.0.0.1:54646
[junit] 2012-01-18 10:06:13,476 [myid:] - INFO  
[Thread-4:NIOServerCnxn$StatCommand@655] - Stat command output
[junit] 2012-01-18 10:06:13,476 [myid:] - INFO  
[Thread-4:NIOServerCnxn@1000] - Closed socket connection for client 
/127.0.0.1:54646 (no session established for client)
[junit] 2012-01-18 10:06:13,477 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2012-01-18 10:06:13,478 [myid:] - INFO  [main:JMXEnv@105] - 
expect:InMemoryDataTree
[junit] 2012-01-18 10:06:13,478 [myid:] - INFO  [main:JMXEnv@108] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2012-01-18 10:06:13,478 [myid:] - INFO  [main:JMXEnv@105] - 
expect:StandaloneServer_port
[junit] 2012-01-18 10:06:13,478 [myid:] - INFO  [main:JMXEnv@108] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2012-01-18 10:06:13,479 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2012-01-18 10:06:13,479 [myid:] - INFO  [main:ClientBase@447] - 
tearDown starting
[junit] 2012-01-18 10:06:13,554 [myid:] - INFO  [main:ZooKeeper@679] - 
Session: 0x134f047ef3e closed
[junit] 2012-01-18 10:06:13,554 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@511] - EventThread shut down
[junit] 2012-01-18 10:06:13,554 [myid:] - INFO  [main:ClientBase@417] - 
STOPPING server
[junit] 2012-01-18 10:06:13,554 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@240] - 
NIOServerCnxn factory exited run method
[junit] 2012-01-18 10:06:13,555 [myid:] - INFO  [main:ZooKeeperServer@391] 
- shutting down
[junit] 2012-01-18 10:06:13,555 [myid:] - INFO  
[main:SessionTrackerImpl@220] - Shutting down
[junit] 2012-01-18 10:06:13,555 [myid:] - INFO  
[main:PrepRequestProcessor@711] - Shutting down
[junit] 2012-01-18 10:06:13,555 [myid:] - INFO  
[main:SyncRequestProcessor@173] 

ZooKeeper-trunk - Build # 1432 - Failure

2012-01-18 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/1432/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 138452 lines...]
[junit] 2012-01-18 10:55:09,284 [myid:] - INFO  
[main:PrepRequestProcessor@711] - Shutting down
[junit] 2012-01-18 10:55:09,285 [myid:] - INFO  
[main:SyncRequestProcessor@173] - Shutting down
[junit] 2012-01-18 10:55:09,285 [myid:] - INFO  [ProcessThread(sid:0 
cport:-1)::PrepRequestProcessor@134] - PrepRequestProcessor exited loop!
[junit] 2012-01-18 10:55:09,285 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited!
[junit] 2012-01-18 10:55:09,285 [myid:] - INFO  
[main:FinalRequestProcessor@419] - shutdown of request processor complete
[junit] 2012-01-18 10:55:09,286 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2012-01-18 10:55:09,286 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[]
[junit] 2012-01-18 10:55:09,287 [myid:] - INFO  [main:ClientBase@410] - 
STARTING server
[junit] 2012-01-18 10:55:09,288 [myid:] - INFO  [main:ZooKeeperServer@143] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test6903943840352692016.junit.dir/version-2
 snapdir 
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test6903943840352692016.junit.dir/version-2
[junit] 2012-01-18 10:55:09,288 [myid:] - INFO  
[main:NIOServerCnxnFactory@110] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2012-01-18 10:55:09,289 [myid:] - INFO  [main:FileSnap@83] - 
Reading snapshot 
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test6903943840352692016.junit.dir/version-2/snapshot.b
[junit] 2012-01-18 10:55:09,291 [myid:] - INFO  [main:FileTxnSnapLog@237] - 
Snapshotting: b
[junit] 2012-01-18 10:55:09,293 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2012-01-18 10:55:09,293 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@213] - 
Accepted socket connection from /127.0.0.1:58072
[junit] 2012-01-18 10:55:09,294 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@820] - Processing 
stat command from /127.0.0.1:58072
[junit] 2012-01-18 10:55:09,294 [myid:] - INFO  
[Thread-5:NIOServerCnxn$StatCommand@655] - Stat command output
[junit] 2012-01-18 10:55:09,294 [myid:] - INFO  
[Thread-5:NIOServerCnxn@1000] - Closed socket connection for client 
/127.0.0.1:58072 (no session established for client)
[junit] 2012-01-18 10:55:09,295 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2012-01-18 10:55:09,296 [myid:] - INFO  [main:JMXEnv@105] - 
expect:InMemoryDataTree
[junit] 2012-01-18 10:55:09,296 [myid:] - INFO  [main:JMXEnv@108] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2012-01-18 10:55:09,296 [myid:] - INFO  [main:JMXEnv@105] - 
expect:StandaloneServer_port
[junit] 2012-01-18 10:55:09,296 [myid:] - INFO  [main:JMXEnv@108] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2012-01-18 10:55:09,297 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2012-01-18 10:55:09,297 [myid:] - INFO  [main:ClientBase@447] - 
tearDown starting
[junit] 2012-01-18 10:55:09,369 [myid:] - INFO  [main:ZooKeeper@679] - 
Session: 0x134f074bb13 closed
[junit] 2012-01-18 10:55:09,369 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@511] - EventThread shut down
[junit] 2012-01-18 10:55:09,369 [myid:] - INFO  [main:ClientBase@417] - 
STOPPING server
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@240] - 
NIOServerCnxn factory exited run method
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO  [main:ZooKeeperServer@391] 
- shutting down
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO  
[main:SessionTrackerImpl@220] - Shutting down
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO  
[main:PrepRequestProcessor@711] - Shutting down
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO  
[main:SyncRequestProcessor@173] - Shutting down
[junit] 2012-01-18 10:55:09,370 [myid:] - INFO  [ProcessThread(sid:0 
cport:-1)::PrepRequestProcessor@134] - PrepRequestProcessor exited loop!
[junit] 2012-01-18 10:55:09,371 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited!
[junit] 2012-01-18 10:55:09,371 [myid:] - INFO  
[main:FinalRequestProcessor@419] - shutdown of request processor complete
[junit] 2012-01-18 10:55:09,372 [myid:] - INFO  

Re: Timeouts and ping handling

2012-01-18 Thread Camille Fournier
I think it can be done. Looking through the code, it seems like it should
be safe modulo some stats that are set in the FinalRequestProcessor that
may be less useful.

A question for the other zookeeper devs out there: is there a reason that
we handle read-only operations in the first processor differently on the
leader than on the followers? The leader (calling PrepRequestProcessor first)
will do a session check for any of the read-only requests:

 zks.sessionTracker.checkSession(request.sessionId, request.getOwner());

but the FollowerRequestProcessor just pushes these requests to its
second processor and never checks the session. What's the purpose of the
session check on the leader but not on the followers?
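
For readers following along, a tiny self-contained sketch of the contrast being asked about; the types and method bodies here are hypothetical stand-ins, not the actual PrepRequestProcessor/FollowerRequestProcessor code:

    // Hypothetical, simplified stand-in for the behavior described above.
    // Only the presence/absence of the session check is the point.
    class ReadPathSketch {
        interface SessionTracker { void checkSession(long sessionId, Object owner) throws Exception; }
        interface Processor { void processRequest(Request r); }
        static class Request { long sessionId; Object owner; boolean isReadOnly; }

        // Leader-style first processor: the session is checked even for reads.
        static void leaderStyle(Request r, SessionTracker tracker, Processor next) throws Exception {
            if (r.isReadOnly) {
                tracker.checkSession(r.sessionId, r.owner);  // the check quoted above
            }
            next.processRequest(r);
        }

        // Follower-style first processor: reads are forwarded with no session check.
        static void followerStyle(Request r, Processor next) {
            next.processRequest(r);  // read-only requests go straight to the next stage
        }
    }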

C

On Wed, Jan 18, 2012 at 4:26 PM, Manosiz Bhattacharyya
manos...@gmail.com wrote:

 Hello,

  We are using Zookeeper-3.3.4 with client session timeouts of 5 seconds,
 and we see frequent timeouts. We have a cluster of 50 nodes (3 of which are
 ZK nodes) and each node has 5 client connections (a total of 250 connections
 to the ensemble). While investigating the zookeeper connections, we found
 that sometimes pings sent from the zookeeper client do not return from
 the server within 5 seconds, and the client connection gets disconnected.
 Digging deeper, it seems that pings are enqueued the same way as other
 requests in the three-stage request-processing pipeline (prep, sync,
 finalize) in the ZK server. So if there are a lot of write operations from
 other active sessions ahead of a ping from an inactive session in the
 queues, the inactive session can time out.

 My question is whether the server can answer the client's ping request
 immediately, since the purpose of the ping seems to be to act as a
 heartbeat from relatively inactive sessions. If we kept a separate ping
 queue in the prep phase that forwards pings straight to the finalize phase,
 requests queued ahead of the ping that require I/O in the sync phase would
 no longer cause these client timeouts. I hope pings do not impose any
 ordering in the database. I did take a cursory look at the code and thought
 this could be done. I would really appreciate an opinion on this.

 As an aside, I should mention that increasing the session timeout to 20
 seconds did improve the problem significantly. However, as we are using
 Zookeeper to monitor the health of our components, increasing the timeout
 means that we only learn of a component's death 20 seconds later. This is
 something we would definitely try to avoid, and we would like to get back
 to the 5-second timeout.

 Regards,
 Manosiz.
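
A minimal, hypothetical sketch of the proposal in the quoted mail (this is not existing ZooKeeper code, and later replies in this thread argue against the idea):

    // Hypothetical sketch of the proposal: let ping requests skip the sync
    // stage and go straight to the final stage. Types are stand-ins.
    class PingBypassSketch {
        interface Processor { void processRequest(Request r); }
        static class Request { boolean isPing; }

        static void prepStage(Request r, Processor syncStage, Processor finalStage) {
            if (r.isPing) {
                finalStage.processRequest(r);  // heartbeat: no I/O needed, answer immediately
            } else {
                syncStage.processRequest(r);   // normal requests keep the prep -> sync -> final order
            }
        }
    }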



Re: Timeouts and ping handling

2012-01-18 Thread Patrick Hunt
On Wed, Jan 18, 2012 at 2:03 PM, Camille Fournier cami...@apache.org wrote:
 I think it can be done. Looking through the code, it seems like it should
 be safe modulo some stats that are set in the FinalRequestProcessor that
 may be less useful.


Turning around HBs at the head end of the server is a bad idea. If the
server can't support the timeout you requested then you are setting
yourself up for trouble if you try to fake it. (think through some of
the failure cases...)

This is not something you want to do. Rather first look at some of the
more obvious issues such as GC, then disk (I've seen ppl go to
ramdisks in some cases), then OS/net tuning etc

Patrick


robustness in the face of clock changes

2012-01-18 Thread Ted Dunning
I have seen a number of issues at client sites related to cavalier
adjustments of clocks.  Up to now, my response has been to simply say
"don't do that", but lately it has been bugging me and it seems like there
should be a better solution.

The problem scenario involves a step-wise time change on a ZK server node
either forward or backwards.  The issues are:

- a step backwards causes all of the timeouts to be extended by the amount
of the step.  Thus, if you set all clocks back by an hour, no session will
time out for the next hour of real-time.  This is bad.

- a step forward of sufficient size will cause all live sessions to
immediately time out.

To investigate solutions, I played around a bit with nanoTime and
currentTimeMillis.  My experiments verified that on Linux, nanoTime is,
indeed, a timer and currentTimeMillis is a reference to the absolute system
clock.  In my test program, I use both as the system time is modified and I
see stable behavior from nanoTime and the predictably goofy behavior from
currentTimeMillis.  My test code is at https://github.com/tdunning/timeSkew
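
A minimal illustration of the difference (this is not the linked test code; it just shows why an interval measured with nanoTime is unaffected if someone steps the wall clock during the sleep):

    // Measure the same 2-second interval two ways. If the system clock is
    // stepped (e.g. `date -s`) during the sleep, only the second result moves.
    public class ElapsedTimeDemo {
        public static void main(String[] args) throws InterruptedException {
            long startNanos = System.nanoTime();
            long startMillis = System.currentTimeMillis();

            Thread.sleep(2000);  // adjust the system clock in this window to see the effect

            long elapsedFromTimer = (System.nanoTime() - startNanos) / 1000000L;
            long elapsedFromClock = System.currentTimeMillis() - startMillis;

            System.out.println("nanoTime-based elapsed ms:          " + elapsedFromTimer);
            System.out.println("currentTimeMillis-based elapsed ms: " + elapsedFromClock);
        }
    }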

From these tests, it seems that using nanoTime would be substantially
better than using currentTimeMillis in ZK.  I think that Camille brought
this up a while ago, but I don't remember it going anywhere.  Right now,
ZK is very delicate in the face of clock changes, and it seems that it could
be made very robust.  Moreover, many naive admins and some experienced admins
seem to have no clue about how to keep their clocks well behaved, so this
delicate nature causes lots of problems.

Should I try to prepare a patch?

One other thing that I see is that I can't find any way to cause a Java
process to sleep for an elapsed time.  All timer-related sleeps that I can
find work relative to absolute time rather than intervals.  The only
work-around I have found is to use Thread.yield() in a polling loop, which
is clearly only one half step above hideous.
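
For concreteness, the polling work-around mentioned above might look like this (a sketch only, with the author's own caveat about how ugly it is):

    // Busy-ish "sleep" for an elapsed interval, immune to wall-clock steps:
    // the Thread.yield() polling work-around described above.
    final class ElapsedSleep {
        static void sleepElapsed(long millis) {
            final long deadline = System.nanoTime() + millis * 1000000L;
            while (System.nanoTime() < deadline) {
                Thread.yield();  // burns CPU; "one half step above hideous"
            }
        }
    }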

Relative to ZK, my question is whether there is any critical need anywhere in
ZK for a timed sleep.


Re: Timeouts and ping handling

2012-01-18 Thread Camille Fournier
Duh, I knew there was something I was forgetting. You can't process the
session timeout faster than the server can process the full pipeline, so
making pings come back faster just means you will have a false sense of
liveness for your services.

The question about why the leaders and followers handle read-only requests
differently still stands, though.

C

On Wed, Jan 18, 2012 at 5:45 PM, Patrick Hunt ph...@apache.org wrote:

 On Wed, Jan 18, 2012 at 2:03 PM, Camille Fournier cami...@apache.org
 wrote:
  I think it can be done. Looking through the code, it seems like it should
  be safe modulo some stats that are set in the FinalRequestProcessor that
  may be less useful.
 

 Turning around HBs at the head end of the server is a bad idea. If the
 server can't support the timeout you requested then you are setting
 yourself up for trouble if you try to fake it. (think through some of
 the failure cases...)

 This is not something you want to do. Rather first look at some of the
 more obvious issues such as GC, then disk (I've seen ppl go to
 ramdisks in some cases), then OS/net tuning etc

 Patrick



[jira] [Created] (ZOOKEEPER-1365) Removing a duplicate function and another minor cleanup in QuorumPeer.java

2012-01-18 Thread Alexander Shraer (Created) (JIRA)
Removing a duplicate function and another minor cleanup in QuorumPeer.java
--

 Key: ZOOKEEPER-1365
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1365
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Alexander Shraer
Assignee: Alexander Shraer
Priority: Trivial


- getMyId() and getId() in QuorumPeer are doing the same thing
- QuorumPeer.quorumPeers is being read directly from outside QuorumPeer, 
although we have the getter QuorumPeers.getView(). 

The purpose of this cleanup is to make it easier to later change the way 
QuorumPeer manages its list of peers (to support dynamic changes to this list).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Timeouts and ping handling

2012-01-18 Thread Patrick Hunt
On Wed, Jan 18, 2012 at 3:21 PM, Camille Fournier c...@renttherunway.com 
wrote:
 Duh, I knew there was something I was forgetting. You can't process the
 session timeout faster than the server can process the full pipeline, so
 making pings come back faster just means you will have a false sense of
 liveness for your services.

There's also this - we only send HBs when the client is not active.
HBs check that the server is alive but at the same time we're also
letting the server know that we're alive.

However, when the client is active (sending read/write ops) we don't
need a HB. Any read/write operation serves as the HB. Say we send a
read operation to the server; we won't send another HB to the server
until the read operation's result comes back (and then 1/3 of the timeout
after that). In this case you can't take advantage of the hack that's
been discussed. The read operation needs to complete; if it takes too
long (as in this case) the session will time out as usual. Now, if you
have clients that are largely inactive this may not matter too much,
but depending on the use case you might get caught by this.
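
A rough, hypothetical sketch of the client-side scheduling Patrick describes (field names are illustrative, not the real ClientCnxn code):

    // Hypothetical illustration: the client only pings when it has been idle,
    // and any in-flight request already serves as the heartbeat.
    class HeartbeatSchedulerSketch {
        long sessionTimeoutMs;
        long lastActivityMs;      // updated when a request is sent and when its result arrives
        boolean requestInFlight;  // true while waiting on an outstanding read/write

        boolean shouldSendPing(long nowMs) {
            if (requestInFlight) {
                return false;  // the pending operation is the heartbeat; wait for its reply
            }
            // once idle, ping after roughly 1/3 of the session timeout has passed
            return nowMs - lastActivityMs >= sessionTimeoutMs / 3;
        }
    }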

Patrick


[jira] [Updated] (ZOOKEEPER-1365) Removing a duplicate function and another minor cleanup in QuorumPeer.java

2012-01-18 Thread Alexander Shraer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Shraer updated ZOOKEEPER-1365:


Attachment: ZOOKEEPER-1365.patch

trivial patch - no tests included 

 Removing a duplicate function and another minor cleanup in QuorumPeer.java
 --

 Key: ZOOKEEPER-1365
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1365
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Alexander Shraer
Assignee: Alexander Shraer
Priority: Trivial
 Attachments: ZOOKEEPER-1365.patch, ZOOKEEPER-1365.patch


 - getMyId() and getId() in QuorumPeer are doing the same thing
 - QuorumPeer.quorumPeers is being read directly from outside QuorumPeer, 
 although we have the getter QuorumPeers.getView(). 
 The purpose of this cleanup is to later be able to change more easily the way 
 QuorumPeer manages its list of peers (to support dynamic changes in this 
 list).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1327) there are still remnants of hadoop urls

2012-01-18 Thread Harsh J (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188906#comment-13188906
 ] 

Harsh J commented on ZOOKEEPER-1327:


Hey devs, wide patches like this are usually hard to maintain. Can someone do a quick 
review please, and commit if it's good?

I did a re-review of my changes and I am confident that no URLs have been 
broken. Tested each one out.

 there are still remnants of hadoop urls
 ---

 Key: ZOOKEEPER-1327
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1327
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Harsh J
 Fix For: 3.4.3, 3.5.0


 there are still hadoop urls and references to zookeeper lists under the 
 hadoop project in the sources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-973) bind() could fail on Leader because it does not setReuseAddress on its ServerSocket

2012-01-18 Thread Harsh J (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188917#comment-13188917
 ] 

Harsh J commented on ZOOKEEPER-973:
---

Hey devs, are there any further comments you'd like me to address on this 
patch? Do let me know.

 bind() could fail on Leader because it does not setReuseAddress on its 
 ServerSocket 
 

 Key: ZOOKEEPER-973
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-973
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.2
Reporter: Vishal Kher
Assignee: Harsh J
Priority: Trivial
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-973.patch, ZOOKEEPER-973.patch


 setReuseAddress(true) should be used below.

 Leader(QuorumPeer self, LeaderZooKeeperServer zk) throws IOException {
     this.self = self;
     try {
         ss = new ServerSocket(self.getQuorumAddress().getPort());
     } catch (BindException e) {
         LOG.error("Couldn't bind to port "
                 + self.getQuorumAddress().getPort(), e);
         throw e;
     }
     this.zk = zk;
 }
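
One way the suggested change might look (a sketch only, not a committed patch; the create-unbound-then-bind pattern is used because SO_REUSEADDR has to be set before the socket is bound, and java.net.InetSocketAddress must be imported):

    // Sketch of the suggested fix: create the socket unbound, set SO_REUSEADDR,
    // then bind, so a lingering TIME_WAIT socket cannot make bind() fail.
    ss = new ServerSocket();
    ss.setReuseAddress(true);
    try {
        ss.bind(new InetSocketAddress(self.getQuorumAddress().getPort()));
    } catch (BindException e) {
        LOG.error("Couldn't bind to port " + self.getQuorumAddress().getPort(), e);
        throw e;
    }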

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (ZOOKEEPER-1366) Zookeeper should be tolerant of clock adjustments

2012-01-18 Thread Ted Dunning (Created) (JIRA)
Zookeeper should be tolerant of clock adjustments
-

 Key: ZOOKEEPER-1366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1366
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning
 Fix For: 3.4.3


If you want to wreak havoc on a ZK based system just do [date -s +1hour] and 
watch the mayhem as all sessions expire at once.

This shouldn't happen.  Zookeeper could easily handle elapsed times as 
elapsed times rather than as differences between absolute times.  The absolute 
times are subject to adjustment when the clock is set while a timer is not 
subject to this problem.  In Java, System.currentTimeMillis() gives you 
absolute time while System.nanoTime() gives you time based on a timer from an 
arbitrary epoch.

I have done this and have been running tests now for some tens of minutes with 
no failures.  I will set up a test machine to redo the build again on Ubuntu 
and post a patch here for discussion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1366) Zookeeper should be tolerant of clock adjustments

2012-01-18 Thread Ted Dunning (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188974#comment-13188974
 ] 

Ted Dunning commented on ZOOKEEPER-1366:


See https://github.com/tdunning/zookeeper for a work in progress.

Tests seem good except for ReadOnlyModeTest.  That may be failing due to 
unrelated issues.

 Zookeeper should be tolerant of clock adjustments
 -

 Key: ZOOKEEPER-1366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1366
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning
 Fix For: 3.4.3


 If you want to wreak havoc on a ZK based system just do [date -s +1hour] 
 and watch the mayhem as all sessions expire at once.
 This shouldn't happen.  Zookeeper could easily know handle elapsed times as 
 elapsed times rather than as differences between absolute times.  The 
 absolute times are subject to adjustment when the clock is set while a timer 
 is not subject to this problem.  In Java, System.currentTimeMillis() gives 
 you absolute time while System.nanoTime() gives you time based on a timer 
 from an arbitrary epoch.
 I have done this and have been running tests now for some tens of minutes 
 with no failures.  I will set up a test machine to redo the build again on 
 Ubuntu and post a patch here for discussion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (BOOKKEEPER-153) Ledger can't be opened or closed due to zero-length metadata

2012-01-18 Thread Sijie Guo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188403#comment-13188403
 ] 

Sijie Guo commented on BOOKKEEPER-153:
--

Discussed with Ivan offline: these ledgers are orphan ledgers (failed 
creation), which only affect the recovery tool. It would be better to handle such 
ledgers in the recovery tool, so I will remove the code changes in 
LedgerOpenOp and create another JIRA to handle it in the recovery tool.

 Ledger can't be opened or closed due to zero-length metadata
 

 Key: BOOKKEEPER-153
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-153
 Project: Bookkeeper
  Issue Type: Bug
  Components: bookkeeper-client
Affects Versions: 4.0.0
Reporter: Sijie Guo
Assignee: Sijie Guo
 Fix For: 4.1.0

 Attachments: BK-153.patch


 Currently, creating the ledger path and writing the ledger metadata are not done 
 in one transaction, so if the bookkeeper client (the hub server uses the bookkeeper 
 client) crashes in between, we end up with a ledger that exists in zookeeper with 
 zero-length metadata, and we can't open or close it.
 We should create the ledger path with its initial metadata to avoid this case. 
 Besides that, we need to add code in openLedgerOp to handle zero-length 
 metadata for backward compatibility.
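
A hedged sketch of the idea in the description: write the initial metadata in the same znode creation that makes the ledger path, so a crash cannot leave a zero-length node behind. The path and metadata bytes here are illustrative, not BookKeeper's actual code:

    // Illustrative only: one create() call instead of create() followed by setData().
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.ZooKeeper;

    class LedgerCreateSketch {
        void createLedgerNode(ZooKeeper zk, String ledgerPath, byte[] initialMetadata)
                throws KeeperException, InterruptedException {
            // A crash before this call leaves nothing behind; a crash after it
            // leaves a node that already has valid (non-empty) metadata.
            zk.create(ledgerPath, initialMetadata, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
    }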

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (BOOKKEEPER-153) Ledger can't be opened or closed due to zero-length metadata

2012-01-18 Thread Sijie Guo (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-153:
-

Attachment: BK-153.patch_v2

remove code changes in LedgerOpenOp

 Ledger can't be opened or closed due to zero-length metadata
 

 Key: BOOKKEEPER-153
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-153
 Project: Bookkeeper
  Issue Type: Bug
  Components: bookkeeper-client
Affects Versions: 4.0.0
Reporter: Sijie Guo
Assignee: Sijie Guo
 Fix For: 4.1.0

 Attachments: BK-153.patch, BK-153.patch_v2


 Currently creating ledger path and writing ledger metadata are not in a 
 transaction. so if the bookkeeper client (hub server uses bookkeeper client) 
 is crashed, we have a ledger existed in zookeeper with zero-length metadata. 
 we can't open/close it.
 we should create the ledger path with initial metadata to avoid such case. 
 besides that, we need to add code in openLedgerOp to handle zero-length 
 metadata for backward compatibility.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Build failed in Jenkins: bookkeeper-trunk #321

2012-01-18 Thread Apache Jenkins Server
See https://builds.apache.org/job/bookkeeper-trunk/321/

--
Started by timer
Building remotely on solaris1
hudson.util.IOException2: remote file operation failed: 
https://builds.apache.org/job/bookkeeper-trunk/ws/ at 
hudson.remoting.Channel@16274e4b:solaris1
at hudson.FilePath.act(FilePath.java:780)
at hudson.FilePath.act(FilePath.java:766)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:731)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:676)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1195)
at 
hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:573)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:462)
at hudson.model.Run.run(Run.java:1404)
at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:481)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:238)
Caused by: java.io.IOException: Remote call on solaris1 failed
at hudson.remoting.Channel.call(Channel.java:690)
at hudson.FilePath.act(FilePath.java:773)
... 10 more
Caused by: java.lang.NoClassDefFoundError
at 
hudson.scm.SubversionWorkspaceSelector.syncWorkspaceFormatFromMaster(SubversionWorkspaceSelector.java:85)
at 
hudson.scm.SubversionSCM.createSvnClientManager(SubversionSCM.java:808)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:751)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:738)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2045)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:287)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
at java.lang.Thread.run(Thread.java:595)



Jenkins build is unstable: bookkeeper-trunk #322

2012-01-18 Thread Apache Jenkins Server
See https://builds.apache.org/job/bookkeeper-trunk/322/




[jira] [Created] (BOOKKEEPER-154) Garbage collect messages for those subscribers inactive/offline for a long time.

2012-01-18 Thread Sijie Guo (Created) (JIRA)
Garbage collect messages for those subscribers inactive/offline for a long 
time. 
-

 Key: BOOKKEEPER-154
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-154
 Project: Bookkeeper
  Issue Type: New Feature
  Components: hedwig-client, hedwig-server
Affects Versions: 4.0.0
Reporter: Sijie Guo


Currently hedwig tracks subscribers' progress for garbage collecting published 
messages. If a subscriber subscribes and then stays offline for a long time without 
unsubscribing, the messages published on its topic never get a chance to be garbage 
collected.

A time-based garbage collection policy would be suitable for this case. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (BOOKKEEPER-154) Garbage collect messages for those subscribers inactive/offline for a long time.

2012-01-18 Thread Sijie Guo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188902#comment-13188902
 ] 

Sijie Guo commented on BOOKKEEPER-154:
--

Currently we don't have a publish timestamp for each message, so it would not be 
easy to implement such a time-based garbage collection policy in the hub server 
itself.

So a proposal is to provide an offline tool that checks each subscriber's state to do 
time-based GC: if a subscriber has been inactive for a long time, the offline tool 
sends a CONSUME request for that subscriber to consume up to the latest message.

The tool works as below (a rough sketch follows the steps):

Loop over all topics; for each topic:
1) Find the subscribers that have been inactive for a long time: read the subscriber 
znodes and check their modify time. If a znode has not been modified for a long 
time, that subscriber has not been active for a long time.
2) Read the latest message id: we can parse the ledgers znode to get it; we did a 
similar thing in BOOKKEEPER-77.
3) Subscribe to the topic on behalf of the inactive subscribers found (if a 
subscriber is actually online, the subscription will fail, and we should not do 
CONSUME for it), then send a CONSUME request to the hub server for each of them, 
to consume up to the latest message.
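
A very rough sketch of that loop. The znode paths, the inactivity threshold, and the hedwig-side helpers (MessageId, readLatestMessageId, trySubscribe, sendConsume) are hypothetical placeholders for the steps above; only the ZooKeeper getChildren/exists/getMtime calls are real API:

    // Hypothetical sketch of the offline GC tool described above.
    void gcInactiveSubscribers(ZooKeeper zk, long maxIdleMillis) throws Exception {
        long now = System.currentTimeMillis();
        for (String topic : zk.getChildren("/hedwig/topics", false)) {          // placeholder path
            String subsPath = "/hedwig/topics/" + topic + "/subscribers";
            MessageId latest = readLatestMessageId(zk, topic);                  // step 2, as in BOOKKEEPER-77
            for (String subscriber : zk.getChildren(subsPath, false)) {
                Stat stat = zk.exists(subsPath + "/" + subscriber, false);
                if (stat != null && now - stat.getMtime() > maxIdleMillis) {    // step 1: znode not modified for a long time
                    if (trySubscribe(topic, subscriber)) {                      // step 3: fails if the subscriber is online
                        sendConsume(topic, subscriber, latest);                 // tell the hub to consume up to the latest id
                    }
                }
            }
        }
    }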



 Garbage collect messages for those subscribers inactive/offline for a long 
 time. 
 -

 Key: BOOKKEEPER-154
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-154
 Project: Bookkeeper
  Issue Type: New Feature
  Components: hedwig-client, hedwig-server
Affects Versions: 4.0.0
Reporter: Sijie Guo

 Currently hedwig tracks subscribers progress for garbage collecting published 
 messages. If subscriber subscribe and becomes offline without unsubscribing 
 for a long time, those messages published in its topic have no chance to be 
 garbage collected.
 A time based garbage collection policy would be suitable for this case. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira