[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563719#comment-14563719 ]
Germán Blanco commented on ZOOKEEPER-832: ----------------------------------------- We crossed messages there, sorry. My intuition was actually wrong. There doesn't seem to be anything wrong with the C code. It seems this is a synchronization issue. The thread that listens for connections starts before the ZkDatabase is loaded, so it reads a low zxid and kills the session. This was not noticeable before, because the client looped until the zxid in the server was high enough and then connected. I think that the cleanest thing is to put a Latch and wait for the ZkDatabase to load before accepting connections. I will update the patch with this fix (the Latch to wait for the ZkDatabase to load) on top of the previous patch and that should pass the tests. > Invalid session id causes infinite loop during automatic reconnect > ------------------------------------------------------------------ > > Key: ZOOKEEPER-832 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832 > Project: ZooKeeper > Issue Type: Bug > Components: server > Affects Versions: 3.4.5, 3.5.0 > Environment: All > Reporter: Ryan Holmes > Assignee: Germán Blanco > Priority: Blocker > Fix For: 3.4.7, 3.5.2, 3.6.0 > > Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, > ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch > > > Steps to reproduce: > 1.) Connect to a standalone server using the Java client. > 2.) Stop the server. > 3.) Delete the contents of the data directory (i.e. the persisted session > data). > 4.) Start the server. > The client now automatically tries to reconnect but the server refuses the > connection because the session id is invalid. The client and server are now > in an infinite loop of attempted and rejected connections. While this > situation represents a catastrophic failure and the current behavior is not > incorrect, it appears that there is no way to detect this situation on the > client and therefore no way to recover. > The suggested improvement is to send an event to the default watcher > indicating that the current state is "session invalid", similar to how the > "session expired" state is handled. > Server log output (repeats indefinitely): > 2010-08-05 11:48:08,283 - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - > Accepted socket connection from /127.0.0.1:63292 > 2010-08-05 11:48:08,284 - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing > session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last > zxid is 0x0 client must try another server > 2010-08-05 11:48:08,284 - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed > socket connection for client /127.0.0.1:63292 (no session established for > client) > Client log output (repeats indefinitely): > 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - > Opening socket connection to server localhost/127.0.0.1:2181 > 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session > 0x12a3ae4e893000a for server null, unexpected error, closing socket > connection and attempting reconnect > java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) > 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring > exception during shutdown input > java.nio.channels.ClosedChannelException > at > sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) > at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) > 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring > exception during shutdown output > java.nio.channels.ClosedChannelException > at > sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) > at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) -- This message was sent by Atlassian JIRA (v6.3.4#6332)