[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562388#comment-14562388 ]
Germán Blanco commented on ZOOKEEPER-832: ----------------------------------------- It seems to me that with this patch applied the test is not flaky any more. It fails consistently. I have an intuition of why that might be, and I would like to fix the test (or the c client if the problem happens to be there). But I don't know the C code as well as I know the Java code, so help is very welcome. You can call it a client issue, and the proposal in my opinion does "the right thing" just as you propose (expire/close the *session*, which is what is under control of ZooKeeper code). > Invalid session id causes infinite loop during automatic reconnect > ------------------------------------------------------------------ > > Key: ZOOKEEPER-832 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832 > Project: ZooKeeper > Issue Type: Bug > Components: server > Affects Versions: 3.4.5, 3.5.0 > Environment: All > Reporter: Ryan Holmes > Assignee: Germán Blanco > Priority: Blocker > Fix For: 3.4.7, 3.5.2, 3.6.0 > > Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, > ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch > > > Steps to reproduce: > 1.) Connect to a standalone server using the Java client. > 2.) Stop the server. > 3.) Delete the contents of the data directory (i.e. the persisted session > data). > 4.) Start the server. > The client now automatically tries to reconnect but the server refuses the > connection because the session id is invalid. The client and server are now > in an infinite loop of attempted and rejected connections. While this > situation represents a catastrophic failure and the current behavior is not > incorrect, it appears that there is no way to detect this situation on the > client and therefore no way to recover. > The suggested improvement is to send an event to the default watcher > indicating that the current state is "session invalid", similar to how the > "session expired" state is handled. > Server log output (repeats indefinitely): > 2010-08-05 11:48:08,283 - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - > Accepted socket connection from /127.0.0.1:63292 > 2010-08-05 11:48:08,284 - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing > session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last > zxid is 0x0 client must try another server > 2010-08-05 11:48:08,284 - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed > socket connection for client /127.0.0.1:63292 (no session established for > client) > Client log output (repeats indefinitely): > 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - > Opening socket connection to server localhost/127.0.0.1:2181 > 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session > 0x12a3ae4e893000a for server null, unexpected error, closing socket > connection and attempting reconnect > java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) > 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring > exception during shutdown input > java.nio.channels.ClosedChannelException > at > sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) > at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) > 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring > exception during shutdown output > java.nio.channels.ClosedChannelException > at > sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) > at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) -- This message was sent by Atlassian JIRA (v6.3.4#6332)