[ https://issues.apache.org/jira/browse/CURATOR-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855051#comment-15855051 ]
ASF GitHub Bot commented on CURATOR-367:
----------------------------------------

Github user Randgalt commented on the issue:

    https://github.com/apache/curator/pull/197

    I'm -1 on introducing a new dependency. Can the PR be re-worked to not use Mockito?

> Curator may deliver RECONNECTED before LOST in case of session expiry
> ---------------------------------------------------------------------
>
>                 Key: CURATOR-367
>                 URL: https://issues.apache.org/jira/browse/CURATOR-367
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.11.1
>            Reporter: Zoltan Szekeres
>
> h2. Behaviour:
> We saw our code blocked at client.blockUntilConnected() after reconnecting
> following a session expiry.
>
> h2. Possible reason:
> After receiving a session expired event, ConnectionState first resets the
> connection and then notifies the parent watchers, where the CuratorEvent is
> created. In this case it seems the first zookeeper event thread was delayed
> before it called the parent watchers. Meanwhile, a new zookeeper event thread
> was created by the call to reset, and this new thread delivered the
> SyncConnected event before the first thread delivered SessionExpired to the
> parent watchers. This resulted in ConnectionStateListener instances seeing
> RECONNECTED before LOST.
>
> h2. Logs:
> 2016-11-17T20:23:28.527Z [Thread-0-SendThread(] INFO ClientCnxn: Opening socket connection to server _
> 2016-11-17T20:23:28.535Z [Thread-0-SendThread(] INFO ClientCnxn: Socket connection established to _, initiating session
> 2016-11-17T20:23:28.576Z [Thread-0-SendThread(] INFO ClientCnxn: Unable to reconnect to ZooKeeper service, session 0xc585ba1e7b6adc2 has expired, closing socket connection
> 2016-11-17T20:23:28.576Z [Thread-0-EventThread] WARN ConnectionState: Session expired event received
> 2016-11-17T20:23:28.673Z [Thread-0-EventThread] INFO ZooKeeper: Initiating client connection, connectString=_ sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@6ddf3f9d
> 2016-11-17T20:23:28.691Z [Thread-0-SendThread(] INFO ClientCnxn: Opening socket connection to server _
> 2016-11-17T20:23:28.693Z [Thread-0-SendThread(] INFO ClientCnxn: Socket connection established to _, initiating session
> 2016-11-17T20:23:28.701Z [Thread-0-SendThread(] INFO ClientCnxn: Session establishment complete on server _, sessionid = 0x2585ba1e69ffeca, negotiated timeout = 30000
> 2016-11-17T20:23:28.701Z [Thread-0-EventThread] INFO ConnectionStateManager: State change: RECONNECTED
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread] INFO ConnectionStateManager: State change: LOST
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread] INFO ClientCnxn: EventThread shut down
>
> h2. Reproduction:
> I was only able to reproduce the behaviour by adding an artificial
> Thread.sleep in ConnectionState#process, before the parent watchers are
> called, when the event is session expired.
>
> {code:title=ConnectionState#process}
>     @Override
>     public void process(WatchedEvent event)
>     {
>         if ( LOG_EVENTS )
>         {
>             log.debug("ConnectState watcher: " + event);
>         }
>         if ( event.getType() == Watcher.Event.EventType.None )
>         {
>             boolean wasConnected = isConnected.get();
>             boolean newIsConnected = checkState(event.getState(), wasConnected);
>             if ( newIsConnected != wasConnected )
>             {
>                 isConnected.set(newIsConnected);
>                 connectionStartMs = System.currentTimeMillis();
>             }
>         }
>         if (event.getState() == KeeperState.Expired)
>         {
>             System.err.println("::> sleep in ConnectionState#process");
>             try {
>                 Thread.sleep(1000);
>             } catch (InterruptedException e) {}
>         }
>         for ( Watcher parentWatcher : parentWatchers )
>         {
>             TimeTrace timeTrace = new TimeTrace("connection-state-parent-process", tracer.get());
>             parentWatcher.process(event);
>             timeTrace.commit();
>         }
>     }
> {code}
>
> h2. Some ideas for fix:
> * Put the event handling and the calls to the parent watchers into a
> synchronized block.
> * Change the order of handling the watched event and calling the parent
> watchers (I'm not aware of the behavioural implications of this).
> * Move only the call to reset to the end of the method "process".

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
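
For illustration only, the first fix idea from the issue description (handling the event and notifying watchers under one lock) can be sketched without any Curator or ZooKeeper dependency. Everything below is hypothetical: the class OrderedDeliverySketch, its process(String) method, and the plain strings "Expired" and "SyncConnected" are stand-ins, not Curator code. The sketch assumes that the reset triggered by expiry starts a second thread that reports the new session while the first thread is still delayed, as in the reproduction above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ConnectionState; not part of Curator.
public class OrderedDeliverySketch {
    // Order in which the (simulated) parent watchers saw the states.
    private final List<String> delivered = new ArrayList<>();
    private Thread reconnectThread;

    // Because the whole method is synchronized, the thread started by the
    // simulated reset cannot deliver "SyncConnected" until the thread that
    // handles "Expired" has finished notifying.
    public synchronized void process(String state) {
        if (state.equals("Expired")) {
            // Simulates reset(): a new event thread reports the new session.
            reconnectThread = new Thread(() -> process("SyncConnected"));
            reconnectThread.start();
            try {
                // Simulates the delay used in the reproduction.
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // Simulates calling the parent watchers.
        delivered.add(state);
    }

    // Runs one expiry scenario and returns the observed delivery order.
    public static List<String> run() {
        OrderedDeliverySketch sketch = new OrderedDeliverySketch();
        Thread first = new Thread(() -> sketch.process("Expired"));
        first.start();
        try {
            first.join();
            sketch.reconnectThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (sketch) {
            return new ArrayList<>(sketch.delivered);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [Expired, SyncConnected]
    }
}
```

Without the synchronized keyword on process, the second thread would typically record "SyncConnected" during the sleep, which mirrors the RECONNECTED-before-LOST order in the logs. One trade-off of this approach is that listeners are invoked while a lock is held, so a slow or re-entrant listener could stall or deadlock event delivery; whether that is acceptable in Curator itself is not something this sketch settles.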