I found one problem: https://issues.apache.org/jira/browse/ZOOKEEPER-1144. I am not sure whether it is also causing Eugene's test to fail.

On Wed, Aug 3, 2011 at 12:32 PM, Patrick Hunt <[email protected]> wrote:
> Seems the observer (or the quorum itself) is failing to allow a client
> to connect:
>
> [junit] 2011-08-03 14:12:29,273 [myid:3] - INFO
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:11229:QuorumPeer@701] - OBSERVING
> ....
> [junit] 2011-08-03 14:12:29,359 [myid:] - INFO
> [main:ZooKeeper@427] - Initiating client connection,
> connectString=127.0.0.1:11229 sessionTimeout=30000
> watcher=org.apache.zookeeper.test.ObserverTest@6490832e
> [junit] 2011-08-03 14:12:29,378 [myid:] - INFO
> [main-SendThread():ClientCnxn$SendThread@888] - Opening socket
> connection to server /127.0.0.1:11229
> [junit] 2011-08-03 14:12:29,379 [myid:3] - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11229:NIOServerCnxnFactory@197]
> - Accepted socket connection from /127.0.0.1:56250
> [junit] 2011-08-03 14:12:29,379 [myid:] - INFO
> [main-SendThread(localhost:11229):ClientCnxn$SendThread@814] - Socket
> connection established to localhost/127.0.0.1:11229, initiating
> session
> [junit] 2011-08-03 14:12:29,380 [myid:3] - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11229:ZooKeeperServer@833] -
> Client attempting to establish new session at /127.0.0.1:56250
> [junit] 2011-08-03 14:12:53,356 [myid:2] - INFO
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11228:Leader@419] - Shutting down
>
> Notice that last line, ~24 seconds go by.
>
> Please file a bug on this. Blocker for 3.4.0.
>
> Can you try re-running your test, but modify it to attempt to have a
> client connect to a non-observer in the case that connecting to the
> observer fails? It would be interesting to see if this was an observer
> specific issue or not. (another thing perhaps to try is just have the
> existing client connect to a non-observer rather than the observer,
> run it a bunch of times and see if it happens)
>
> Patrick
>
> On Wed, Aug 3, 2011 at 7:16 AM, Eugene Koontz <[email protected]>
> wrote:
> > On 8/2/11 10:32 PM, Patrick Hunt wrote:
> >>
> >> What type of ec2 instance are you running on? I've seen some failures
> >> due to underpowered/underresourced systems.
> >>
> >> Is ObserverTest consistently failing?
> >>
> >> Patrick
> >>
> > Hi Patrick,
> > It's an m1.large. I have ulimit -a set so that open files and open
> > processes are at 100,000.
> >
> > If I run ObserverTest on its own using the attached shell script repeat.sh
> > (src/repeat.sh ObserverTest), it usually fails within 20 iterations;
> > (although in the following pastebin it took 38 iterations to fail).
> >
> >
> > It always fails in the same place, at ObserverTest.java:101:
> >
> > http://pastebin.com/BGNUb05t
> >
> > -Eugene
> >
> >
>
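
The roughly 24-second gap noted in the quoted log can be checked from the two timestamps themselves. Below is a minimal Python sketch, assuming log4j-style timestamps with a comma before the milliseconds; the names `parse_log_time` and `gap_seconds` are illustrative and not part of ZooKeeper or its test suite.

```python
from datetime import datetime

# Timestamps copied from the quoted log lines above.
SESSION_ATTEMPT = "2011-08-03 14:12:29,380"  # "Client attempting to establish new session"
LEADER_SHUTDOWN = "2011-08-03 14:12:53,356"  # "Shutting down"


def parse_log_time(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS,mmm' timestamp (assumed log format)."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def gap_seconds(start: str, end: str) -> float:
    """Return the elapsed time between two log timestamps, in seconds."""
    return (parse_log_time(end) - parse_log_time(start)).total_seconds()


if __name__ == "__main__":
    gap = gap_seconds(SESSION_ATTEMPT, LEADER_SHUTDOWN)
    print(f"gap between session attempt and leader shutdown: {gap:.3f} s")
```

The same helper could be pointed at any pair of lines from a failing run to see where the time goes, for example between "Accepted socket connection" and the next server-side message.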
