[ https://issues.apache.org/jira/browse/GEODE-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nabarun closed GEODE-4096.
--------------------------

> Race Condition between ConcurrentSerialGatewaySenderEventProcessor stopper thread and the _dispatchBatch method for the connection global variable.
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-4096
>                 URL: https://issues.apache.org/jira/browse/GEODE-4096
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: nabarun
>            Assignee: nabarun
>             Fix For: 1.4.0
>
>
> *+Order of execution for this race condition to occur+*.
> # _dispatchBatch tries to dispatch a batch of events but is unsuccessful for some reason.
> # It silently decides that the remote server may not be ready, so it wants to retry.
> # At the same time, we decide to stop the SerialGatewaySenderEventProcessor, so we call the stopper thread.
> # Before the threads are started on all the senders / dispatchers, it sets the isStopped flag for the SerialGatewaySenderEventProcessor to true.
> # The _dispatchBatch method, which is in retry mode, then makes a getConnection call to get the connection. This method checks the SerialGatewaySenderEventProcessor's isStopped flag. It sees that the flag is set, so it returns null.
> # This null is stored in the global variable connection for the dispatcher.
> # The _dispatchBatch method now sees that the connection is null, so it should raise an exception and call destroyConnection.
> # Meanwhile, an AckReaderThread is running and the stopper thread for the event processor wants to stop it, but the connection global variable has already been set to null by the getConnection call from _dispatchBatch.
> # Hence shutDownAckReaderThreadConnection is executed on null, and the AckReaderThread keeps running, stuck in socketRead0.
> # The problem is that the AckReaderThread acquires connectionLifeCycleLock.readLock to readAcknowledgement, but the destroyConnection calls from the stopper thread and from _dispatchBatch's exception handling code need connectionLifeCycleLock.writeLock, which they cannot acquire because the readLock is held by the AckReaderThread, causing a deadlock.
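
The lock interaction in the last three steps can be sketched in isolation. The following is a minimal Java illustration, not Geode code: the class name AckReaderDeadlockSketch, the method stopperCanDestroyConnection, and the CountDownLatch standing in for the blocking socket read are all hypothetical. Only the names connectionLifeCycleLock and AckReaderThread are borrowed from the description above. It uses tryLock with a timeout so that the demo terminates, whereas the real stopper thread would presumably block indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone model of the reported deadlock; not taken from Geode.
class AckReaderDeadlockSketch {

  /**
   * Returns true if the "stopper" obtains the write lock within the timeout.
   * closeSocketFirst == false models the shutdown call being executed on a
   * null connection: nothing ever unblocks the reader.
   */
  static boolean stopperCanDestroyConnection(boolean closeSocketFirst)
      throws InterruptedException {
    ReentrantReadWriteLock connectionLifeCycleLock = new ReentrantReadWriteLock();
    CountDownLatch readerHoldsLock = new CountDownLatch(1);
    // Stand-in for the socket: the reader blocks until this latch is released.
    CountDownLatch socketClosed = new CountDownLatch(1);

    Thread ackReader = new Thread(() -> {
      connectionLifeCycleLock.readLock().lock();
      try {
        readerHoldsLock.countDown();
        socketClosed.await(); // models being stuck in a blocking socket read
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        connectionLifeCycleLock.readLock().unlock();
      }
    }, "AckReaderThread-sketch");
    ackReader.setDaemon(true);
    ackReader.start();
    readerHoldsLock.await(); // make sure the read lock is held before continuing

    if (closeSocketFirst) {
      // Models a shutdown that still has a real connection to close.
      socketClosed.countDown();
    }

    // Models destroyConnection needing the write lock.
    boolean acquired =
        connectionLifeCycleLock.writeLock().tryLock(500, TimeUnit.MILLISECONDS);
    if (acquired) {
      connectionLifeCycleLock.writeLock().unlock();
    }

    socketClosed.countDown(); // clean up so the reader thread can exit
    ackReader.join(1000);
    return acquired;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("shutdown ran on null connection, write lock acquired: "
        + stopperCanDestroyConnection(false));
    System.out.println("socket closed before destroy, write lock acquired: "
        + stopperCanDestroyConnection(true));
  }
}
```

Running main is expected to report false for the first case and true for the second. The write lock only becomes available once something unblocks the reader, which is exactly what cannot happen when the shutdown call is handed a null connection.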