GitHub user Randgalt commented on the issue:

    https://github.com/apache/curator/pull/262
  
    > -If the connection between client and server is lost a disconnected event 
is received essentially immediately.
    > -If a heart beat is missed it takes 2/3 of the session timeout.
    
    Thinking more about this... There are three scenarios:
    
    1. The internal ZK client detects a missed heartbeat, closes the connection 
and sends `Event.KeeperState.Disconnected` (note: this happens in 
`ClientCnxn.java`, in the `SendThread` class's `run()` method, inside the huge 
catch handler).
    2. The server detects a missed heartbeat and closes the connection, which 
causes the client to see a closed connection and most likely a 
`SocketException` (which is handled in that same catch clause).
    3. The connection fails for other reasons (TCP errors, server crashes, 
etc.). 
    
    Unfortunately, cases 2 and 3 look _exactly the same_ to the client. In case 
2, the client should ideally act as if 2/3 of the session timeout has elapsed. 
In case 3, the client should act as if none of the session timeout has elapsed. 
Worse still, if the entire ZK cluster is down, it could conceivably come back 
up without clients losing their sessions at all, because "time" in ZK is based 
on the leader's start time (I still think there's value in killing the session 
on the client side anyway, as clients could otherwise be left hanging 
indefinitely). 
    
    So, even if we can detect the client-side heartbeat miss (case 1), we have 
no way of mitigating cases 2 and 3. I'm not sure what to do. Maybe leave 
`StandardConnectionHandlingPolicy` as is and add a new, optional policy, 
`AggressiveConnectionHandlingPolicy`, that always assumes connection loss is 
due to missed heartbeats? Or, possibly, just do nothing and leave things as 
they are?
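
    To make the trade-off concrete, here is a rough, self-contained sketch of 
the two assumptions a client could make when it sees a disconnect. Every name 
in it (`SessionBudgetSketch`, `ElapsedAssumption`, `OPTIMISTIC`, `AGGRESSIVE`, 
`remainingMs`) is hypothetical and is not Curator or ZooKeeper API. It only 
shows the arithmetic, not how either policy is or would be implemented.

```java
/** Hypothetical sketch; none of these names exist in Curator or ZooKeeper. */
public class SessionBudgetSketch {

    /** How much of the session timeout the client assumes was used up at disconnect. */
    public interface ElapsedAssumption {
        long assumedElapsedMs(long sessionTimeoutMs);
    }

    /** Treats every disconnect like case 3: none of the session timeout has elapsed. */
    public static final ElapsedAssumption OPTIMISTIC = sessionTimeoutMs -> 0L;

    /** Treats every disconnect like case 2: 2/3 of the session timeout has elapsed. */
    public static final ElapsedAssumption AGGRESSIVE =
            sessionTimeoutMs -> (sessionTimeoutMs * 2) / 3;

    /** Time the client would wait after a disconnect before treating the session as lost. */
    public static long remainingMs(ElapsedAssumption assumption, long sessionTimeoutMs) {
        long elapsed = assumption.assumedElapsedMs(sessionTimeoutMs);
        return Math.max(0L, sessionTimeoutMs - elapsed);
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 30_000L; // example value only
        System.out.println("optimistic: " + remainingMs(OPTIMISTIC, sessionTimeoutMs) + " ms");
        System.out.println("aggressive: " + remainingMs(AGGRESSIVE, sessionTimeoutMs) + " ms");
    }
}
```

    With a 30 second session timeout, the aggressive assumption leaves a third 
of the timeout before the client gives up on the session, while the optimistic 
one leaves all of it. Since the client can't tell cases 2 and 3 apart, 
whichever assumption it picks will be wrong for one of them.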
    
    Thoughts?


