[ 
https://issues.apache.org/jira/browse/GEODE-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297460#comment-16297460
 ] 

ASF GitHub Bot commented on GEODE-4096:
---------------------------------------

nabarunnag commented on issue #1186: GEODE-4096: Fixed race condition for 
connection global variable
URL: https://github.com/apache/geode/pull/1186#issuecomment-352892881
 
 
   **Order of execution for this race condition to occur.**
   
   1. _dispatchBatch tries to dispatch a batch of events but fails.
   2. It silently assumes that the remote server may not be ready, so it 
decides to retry.
   3. At the same time, we decide to stop the 
SerialGatewaySenderEventProcessor, so we call the stopper thread.
   4. Before the threads are started on all the senders / dispatchers, the 
isStopped flag for the SerialGatewaySenderEventProcessor is set to true.
   5. The _dispatchBatch method, which is in retry mode, then calls 
getConnection to get the connection. That method checks the 
SerialGatewaySenderEventProcessor's isStopped flag, sees that the flag is set, 
and returns null.
   6. This null is stored in the dispatcher's global variable connection.
   7. _dispatchBatch now sees that the connection is null, so it should raise 
an exception and call destroyConnection.
   8. Meanwhile, an AckReaderThread is running and the stopper thread for the 
event processor wants to stop it, but the connection global variable has 
already been set to null by the getConnection call made from _dispatchBatch.
   9. Hence shutDownAckReaderThreadConnection is executed on null, and the 
AckReaderThread keeps running, stuck in socketRead0.
   10. The problem is that the AckReaderThread acquires 
connectionLifeCycleLock.readLock to call readAcknowledgement, while the 
destroyConnection calls from the stopper thread and from _dispatchBatch's 
exception handling code need connectionLifeCycleLock.writeLock. They cannot 
acquire it because the read lock is held by the AckReaderThread, which causes 
a deadlock.
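
The final step can be reproduced in isolation. The following is a minimal sketch, not the Geode code: the class and method names are hypothetical, a CountDownLatch stands in for the blocking socketRead0 call, and a plain ReentrantReadWriteLock stands in for the dispatcher's connection lifecycle lock. It shows that a thread asking for the write lock cannot get it while another thread is parked holding the read lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone sketch; none of these names come from Geode.
final class AckReaderDeadlockSketch {

  /**
   * Returns true if the "stopper" (the calling thread) manages to take the
   * write lock within timeoutMillis while the "ack reader" thread is blocked
   * holding the read lock.
   */
  static boolean writerAcquiresWhileReaderBlocked(long timeoutMillis) {
    // Stand-in for the dispatcher's connection lifecycle lock.
    ReentrantReadWriteLock lifeCycleLock = new ReentrantReadWriteLock();
    CountDownLatch readLockHeld = new CountDownLatch(1);
    CountDownLatch socketData = new CountDownLatch(1); // never "arrives" on its own

    Thread ackReader = new Thread(() -> {
      lifeCycleLock.readLock().lock();
      try {
        readLockHeld.countDown();
        socketData.await(); // stand-in for the blocking socket read
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lifeCycleLock.readLock().unlock();
      }
    });
    ackReader.setDaemon(true);
    ackReader.start();

    try {
      readLockHeld.await();
      // Stand-in for destroyConnection, which needs the write lock. A plain
      // lock() here would block forever; tryLock lets the sketch terminate.
      boolean acquired =
          lifeCycleLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
      if (acquired) {
        lifeCycleLock.writeLock().unlock();
      }
      return acquired;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    } finally {
      // Unblock the reader so the sketch cleans up after itself.
      socketData.countDown();
      try {
        ackReader.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) {
    // prints "write lock acquired: false"
    System.out.println("write lock acquired: " + writerAcquiresWhileReaderBlocked(200));
  }
}
```

In the scenario above, the reader would normally be unblocked by closing its connection, but the stopper thread finds null in the shared connection variable and so has nothing to close. That is why the writers wait indefinitely.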



> Race Condition between ConcurrentSerialGatewaySenderEventProcessor stopper 
> thread and the _dispatchBatch method for the connection global variable.
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-4096
>                 URL: https://issues.apache.org/jira/browse/GEODE-4096
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: nabarun
>            Assignee: nabarun
>
> *+Order of execution for this race condition to occur+*.
> # _dispatchBatch tries to dispatch a batch of events but fails.
> # It silently assumes that the remote server may not be ready, so it decides 
> to retry.
> # At the same time, we decide to stop the SerialGatewaySenderEventProcessor, 
> so we call the stopper thread.
> # Before the threads are started on all the senders / dispatchers, the 
> isStopped flag for the SerialGatewaySenderEventProcessor is set to true.
> # The _dispatchBatch method, which is in retry mode, then calls 
> getConnection to get the connection. That method checks the 
> SerialGatewaySenderEventProcessor's isStopped flag, sees that the flag is 
> set, and returns null.
> # This null is stored in the dispatcher's global variable connection.
> # _dispatchBatch now sees that the connection is null, so it should raise an 
> exception and call destroyConnection.
> # Meanwhile, an AckReaderThread is running and the stopper thread for the 
> event processor wants to stop it, but the connection global variable has 
> already been set to null by the getConnection call made from _dispatchBatch.
> # Hence shutDownAckReaderThreadConnection is executed on null, and the 
> AckReaderThread keeps running, stuck in socketRead0.
> # The problem is that the AckReaderThread acquires 
> connectionLifeCycleLock.readLock to call readAcknowledgement, while the 
> destroyConnection calls from the stopper thread and from _dispatchBatch's 
> exception handling code need connectionLifeCycleLock.writeLock. They cannot 
> acquire it because the read lock is held by the AckReaderThread, which 
> causes a deadlock.



