[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089357#comment-13089357
 ] 

Laxman commented on ZOOKEEPER-832:
----------------------------------

Camille and Ben, Thanks for your response.

From Camille's explanation, I understand there are two scenarios.
*scenario #1* Deletion of the data directories without stopping the client.
*scenario #2* Deletion of some snapshots and restart of the quorum without 
stopping the client.

To summarize, there are two approaches here.

*approach #1*
{quote}just reorder the code so that we validate the session before doing the 
zxid check{quote}

This resolves scenario #1 only. Removing only some of the snapshots may leave 
the session data intact, so the session may still be valid. In that case the 
infinite loop persists: the session never expires, because we refresh its 
expiry interval every time we validate it.
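
To make the concern concrete, here is a minimal toy model of the reordered 
check. It is not the actual server code; ToyServer, connect() and retry() are 
hypothetical names, and the only behaviour taken from the discussion is that 
validating a session refreshes its expiry deadline.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; not ZooKeeper's server code. All names are hypothetical. */
public class SessionCheckOrderSketch {
    enum Result { SESSION_EXPIRED, REFUSED_ZXID_AHEAD, ACCEPTED }

    /** Minimal stand-in for a server's session table and last zxid. */
    static class ToyServer {
        final long lastZxid;
        final int timeoutTicks;
        // sessionId -> tick at which the session expires
        final Map<Long, Integer> expiryTick = new HashMap<>();
        int now = 0;

        ToyServer(long lastZxid, int timeoutTicks) {
            this.lastZxid = lastZxid;
            this.timeoutTicks = timeoutTicks;
        }

        void addSession(long sessionId) {
            expiryTick.put(sessionId, now + timeoutTicks);
        }

        /** Advance time by one tick and drop sessions whose deadline has passed. */
        void tick() {
            now++;
            expiryTick.values().removeIf(deadline -> deadline <= now);
        }

        /** Approach #1 ordering: validate the session first, then compare zxids. */
        Result connect(long sessionId, long clientLastZxid) {
            if (!expiryTick.containsKey(sessionId)) {
                return Result.SESSION_EXPIRED;
            }
            // Validation refreshes the expiry deadline (the behaviour described above).
            expiryTick.put(sessionId, now + timeoutTicks);
            if (clientLastZxid > lastZxid) {
                return Result.REFUSED_ZXID_AHEAD;
            }
            return Result.ACCEPTED;
        }
    }

    /** Client retries once per tick until it gets anything other than a zxid refusal. */
    static Result retry(ToyServer server, long sessionId, long clientZxid, int attempts) {
        Result last = null;
        for (int i = 0; i < attempts; i++) {
            last = server.connect(sessionId, clientZxid);
            if (last != Result.REFUSED_ZXID_AHEAD) {
                break;
            }
            server.tick();
        }
        return last;
    }

    public static void main(String[] args) {
        // Scenario #1: data directory wiped, so the session is unknown to the server.
        ToyServer wiped = new ToyServer(0x0L, 3);
        System.out.println("scenario #1: " + retry(wiped, 0x12a3L, 0x44L, 10));

        // Scenario #2: some snapshots removed, session survives but the zxid is lower.
        ToyServer partial = new ToyServer(0x10L, 3);
        partial.addSession(0x12a3L);
        System.out.println("scenario #2: " + retry(partial, 0x12a3L, 0x44L, 10));
    }
}
```

In this model the first case ends with a session-expired result on the first 
attempt, while the second case is still being refused after every retry, 
because each attempt pushes the expiry deadline forward again.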

*approach #2*
{quote}I have a proposal to send an error code that will allow the client to 
detect that all servers are at a lower zxid and close the session that I will 
probably implement for the release after 3.4, unless people are really 
clamoring for it.{quote}

Initially, our thoughts were in line with this approach. But on deeper 
analysis, we found that it involves more complexity and may also need protocol 
[Client-Server] changes. Here are some of the points identified.

** Introduce a new error code in ConnectionResponse. [Backward incompatibility 
and protocol change]
** The client has to interpret this error code. On this error, the client has 
to exclude the server from retries. At the same time, the client should not 
exclude a server that has merely timed out.
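
The client-side bookkeeping this would require can be sketched roughly as 
follows. This is an illustration under assumptions, not existing client code: 
Outcome.SERVER_BEHIND stands in for the proposed error code, which does not 
exist today, and Connector, Attempt and connect() are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy sketch only; SERVER_BEHIND models the proposed, not yet existing, error code. */
public class RetryExclusionSketch {
    enum Outcome { CONNECTED, TIMED_OUT, SERVER_BEHIND }

    /** Hypothetical hook for one connection attempt to one server. */
    interface Connector {
        Outcome tryConnect(String server);
    }

    /** Result of a connect run: the server reached, or whether every server was behind. */
    static final class Attempt {
        final String server;             // non-null only when a connection succeeded
        final boolean allServersBehind;  // true when every server reported a lower zxid

        Attempt(String server, boolean allServersBehind) {
            this.server = server;
            this.allServersBehind = allServersBehind;
        }
    }

    static Attempt connect(List<String> servers, Connector connector, int maxRounds) {
        Set<String> behind = new HashSet<>();
        for (int round = 0; round < maxRounds; round++) {
            for (String server : servers) {
                if (behind.contains(server)) {
                    continue; // excluded from retries: it told us its zxid is lower
                }
                Outcome outcome = connector.tryConnect(server);
                if (outcome == Outcome.CONNECTED) {
                    return new Attempt(server, false);
                }
                if (outcome == Outcome.SERVER_BEHIND) {
                    behind.add(server);
                }
                // TIMED_OUT: the server stays in the rotation for later rounds.
            }
            if (behind.containsAll(servers)) {
                // Every server is at a lower zxid; the caller would close the session here.
                return new Attempt(null, true);
            }
        }
        return new Attempt(null, false); // gave up, but not every server was behind
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("zk1:2181", "zk2:2181");
        Attempt attempt = connect(servers, s -> Outcome.SERVER_BEHIND, 3);
        System.out.println("all servers behind: " + attempt.allServersBehind);
    }
}
```

Even in this reduced form the client has to track per-server state and treat 
the two failure kinds differently.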

So, I feel *approach #2* brings unnecessary complexity to the system just to 
handle a negative scenario/use case.

Correct me if my understanding is wrong or if I'm missing something here.

*Camille*, if you agree, I can work on approach #2. Please suggest a better 
solution if you have one.

> Invalid session id causes infinite loop during automatic reconnect
> ------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-832
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: c client, java client
>    Affects Versions: 3.3.1
>         Environment: Mac OS X 10.6.4
> JVM 1.6.0_20
>            Reporter: Ryan Holmes
>             Fix For: 3.5.0
>
>
> Steps to reproduce:
> 1.) Connect to a standalone server using the Java client.
> 2.) Stop the server.
> 3.) Delete the contents of the data directory (i.e. the persisted session 
> data).
> 4.) Start the server.
> The client now automatically tries to reconnect but the server refuses the 
> connection because the session id is invalid. The client and server are now 
> in an infinite loop of attempted and rejected connections. While this 
> situation represents a catastrophic failure and the current behavior is not 
> incorrect, it appears that there is no way to detect this situation on the 
> client and therefore no way to recover.
> The suggested improvement is to send an event to the default watcher 
> indicating that the current state is "session invalid", similar to how the 
> "session expired" state is handled.
> Server log output (repeats indefinitely):
> 2010-08-05 11:48:08,283 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - 
> Accepted socket connection from /127.0.0.1:63292
> 2010-08-05 11:48:08,284 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing 
> session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last 
> zxid is 0x0 client must try another server
> 2010-08-05 11:48:08,284 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed 
> socket connection for client /127.0.0.1:63292 (no session established for 
> client)
> Client log output (repeats indefinitely):
> 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - 
> Opening socket connection to server localhost/127.0.0.1:2181
> 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 
> 0x12a3ae4e893000a for server null, unexpected error, closing socket 
> connection and attempting reconnect
> java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring 
> exception during shutdown input
> java.nio.channels.ClosedChannelException
>       at 
> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>       at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>       at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
> 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring 
> exception during shutdown output
> java.nio.channels.ClosedChannelException
>       at 
> sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>       at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>       at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
