[
https://issues.apache.org/jira/browse/CURATOR-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856003#comment-15856003
]
ASF GitHub Bot commented on CURATOR-367:
----------------------------------------
Github user Randgalt commented on the issue:
https://github.com/apache/curator/pull/197
This is a significant change. I would like amble time to look at it.
@cammckenzie please do not merge.
> Curator may deliver RECONNECTED before LOST in case of session expiry
> ---------------------------------------------------------------------
>
> Key: CURATOR-367
> URL: https://issues.apache.org/jira/browse/CURATOR-367
> Project: Apache Curator
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.11.1
> Reporter: Zoltan Szekeres
>
> h2. Behaviour:
> We saw our code blocked at client.blockUntilConnected() after reconnected on
> session expiry.
> h2. Possible reason:
> After receiving a session expired event ConnectionState first resets the
> connection then notifies the parent watchers, where the CuratorEvent is
> created. In this case it seems the execution of the first zookeeper event
> thread was delayed before calling the parent watchers. Meanwhile a new
> zookeeper event thread was created due to calling reset and this new thread
> sent the SyncConnected event earlier than SessionExpired was sent to parent
> watchers in the first thread. This resulted ConnectionStateListener instances
> seeing the RECONNECTED before the LOST.
> h2. Logs:
> 2016-11-17T20:23:28.527Z [Thread-0-SendThread(] INFO
> ClientCnxn: Opening socket connection to server _
> 2016-11-17T20:23:28.535Z [Thread-0-SendThread(] INFO
> ClientCnxn: Socket connection established to _, initiating session
> 2016-11-17T20:23:28.576Z [Thread-0-SendThread(] INFO
> ClientCnxn: Unable to reconnect to ZooKeeper service, session
> 0xc585ba1e7b6adc2 has expired, closing socket connection
> 2016-11-17T20:23:28.576Z [Thread-0-EventThread] WARN
> ConnectionState: Session expired event received
> 2016-11-17T20:23:28.673Z [Thread-0-EventThread] INFO
> ZooKeeper: Initiating client connection, connectString=_ sessionTimeout=30000
> watcher=org.apache.curator.ConnectionState@6ddf3f9d
> 2016-11-17T20:23:28.691Z [Thread-0-SendThread(] INFO
> ClientCnxn: Opening socket connection to server _
> 2016-11-17T20:23:28.693Z [Thread-0-SendThread(] INFO
> ClientCnxn: Socket connection established to _, initiating session
> 2016-11-17T20:23:28.701Z [Thread-0-SendThread(] INFO
> ClientCnxn: Session establishment complete on server _, sessionid =
> 0x2585ba1e69ffeca, negotiated timeout = 30000
> 2016-11-17T20:23:28.701Z [Thread-0-EventThread] INFO
> ConnectionStateManager: State change: RECONNECTED
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread] INFO
> ConnectionStateManager: State change: LOST
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread] INFO
> ClientCnxn: EventThread shut down
> h2. Reproduction:
> I was only able to reproduce the behaviour by adding artificial Thread.sleep
> in ConnectionState#process before calling the parent watchers if the event is
> session expired.
> {code:title=ConnectionState#process}
> @Override
> public void process(WatchedEvent event)
> {
> if ( LOG_EVENTS )
> {
> log.debug("ConnectState watcher: " + event);
> }
> if ( event.getType() == Watcher.Event.EventType.None )
> {
> boolean wasConnected = isConnected.get();
> boolean newIsConnected = checkState(event.getState(),
> wasConnected);
> if ( newIsConnected != wasConnected )
> {
> isConnected.set(newIsConnected);
> connectionStartMs = System.currentTimeMillis();
> }
> }
> if (event.getState() == KeeperState.Expired)
> {
> System.err.println("::> sleep in ConnectionState#process");
> try {
> Thread.sleep(1000);
> } catch (InterruptedException e) {}
> }
> for ( Watcher parentWatcher : parentWatchers )
> {
> TimeTrace timeTrace = new
> TimeTrace("connection-state-parent-process", tracer.get());
> parentWatcher.process(event);
> timeTrace.commit();
> }
> }
> {code}
> h2. Some ideas for fix:
> * Add the event handling and calling parent watchers into a synchronized
> block.
> * Change the order of handling watched event and calling parent watchers
> (I'm not aware of the behaviour implications of this).
> * Move only calling reset to the end of the method "process".
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)