Seems the observer (or the quorum itself) is failing to allow a client
to connect:

[junit] 2011-08-03 14:12:29,273 [myid:3] - INFO
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:11229:QuorumPeer@701] - OBSERVING
....
    [junit] 2011-08-03 14:12:29,359 [myid:] - INFO
[main:ZooKeeper@427] - Initiating client connection,
connectString=127.0.0.1:11229 sessionTimeout=30000
watcher=org.apache.zookeeper.test.ObserverTest@6490832e
    [junit] 2011-08-03 14:12:29,378 [myid:] - INFO
[main-SendThread():ClientCnxn$SendThread@888] - Opening socket
connection to server /127.0.0.1:11229
    [junit] 2011-08-03 14:12:29,379 [myid:3] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11229:NIOServerCnxnFactory@197]
- Accepted socket connection from /127.0.0.1:56250
    [junit] 2011-08-03 14:12:29,379 [myid:] - INFO
[main-SendThread(localhost:11229):ClientCnxn$SendThread@814] - Socket
connection established to localhost/127.0.0.1:11229, initiating
session
    [junit] 2011-08-03 14:12:29,380 [myid:3] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11229:ZooKeeperServer@833] -
Client attempting to establish new session at /127.0.0.1:56250
    [junit] 2011-08-03 14:12:53,356 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11228:Leader@419] - Shutting down

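Notice the last line: ~24 seconds go by between the client attempting to
establish a new session (14:12:29) and the leader shutting down (14:12:53).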
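
To check the gap from the two timestamps rather than by eye, here is a minimal sketch in Java. It assumes the timestamp layout seen in the log lines above; the class name LogGap is made up for illustration and is not part of ZooKeeper.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of ZooKeeper.
final class LogGap {
    // Matches timestamps such as "2011-08-03 14:12:29,380" from the log above.
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    private LogGap() {}

    // Returns the time elapsed between two log timestamps.
    static Duration between(String earlier, String later) {
        return Duration.between(LocalDateTime.parse(earlier, LOG_TS),
                                LocalDateTime.parse(later, LOG_TS));
    }
}
```

Calling LogGap.between("2011-08-03 14:12:29,380", "2011-08-03 14:12:53,356") gives a duration of 23.976 seconds.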

Please file a bug on this. Blocker for 3.4.0.

Can you try re-running your test, but modify it so that a client
attempts to connect to a non-observer if connecting to the observer
fails? It would be interesting to see whether this is an
observer-specific issue or not. Another thing to try is to have the
existing client connect to a non-observer rather than the observer,
run it a bunch of times, and see if it still happens.
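
Roughly what I have in mind for the fallback, as a sketch only. FallbackConnect and firstReachable are hypothetical names, not existing ObserverTest or ZooKeeper code. The tryConnect predicate stands in for whatever the test already does to open a client and wait for it to connect, and is assumed to return false on timeout.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper for illustration only; not part of ZooKeeper or ObserverTest.
final class FallbackConnect {
    private FallbackConnect() {}

    /**
     * Tries each connect string in order and returns the first one for which
     * tryConnect reports success, or empty if every attempt failed.
     * tryConnect is an assumption: in the real test it would open a client
     * against the given connect string and wait for the session to come up.
     */
    static Optional<String> firstReachable(List<String> connectStrings,
                                           Predicate<String> tryConnect) {
        for (String connectString : connectStrings) {
            if (tryConnect.test(connectString)) {
                return Optional.of(connectString);
            }
        }
        return Optional.empty();
    }
}
```

With the ports from the log above, the list would be 127.0.0.1:11229 (the observer) followed by 127.0.0.1:11228 (a non-observer). If the result is the second entry, the observer refused the session while the rest of the quorum was still reachable; if the result is empty, the problem is not specific to the observer.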

Patrick

On Wed, Aug 3, 2011 at 7:16 AM, Eugene Koontz <ekoo...@hiro-tan.org> wrote:
> On 8/2/11 10:32 PM, Patrick Hunt wrote:
>>
>> What type of ec2 instance are you running on? I've seen some failures
>> due to underpowered/underresourced systems.
>>
>> Is ObserverTest consistently failing?
>>
>> Patrick
>>
> Hi Patrick,
>    It's an m1.large. I have ulimit -a set so that open files and open
> processes are at 100,000.
>
>  If I run ObserverTest on its own using the attached shell script repeat.sh
> (src/repeat.sh ObserverTest), it usually fails within 20 iterations
> (although in the following pastebin it took 38 iterations to fail).
>
>
> It always fails in the same place, at ObserverTest.java:101:
>
> http://pastebin.com/BGNUb05t
>
> -Eugene
>
>
