[ 
https://issues.apache.org/jira/browse/GEODE-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312081#comment-16312081
 ] 

ASF subversion and git services commented on GEODE-4096:
--------------------------------------------------------

Commit 6c37ff4a2a4fc6d609626b172793e3a123f9be82 in geode's branch 
refs/heads/develop from [~nnag]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=6c37ff4 ]

GEODE-4096: Fixed race condition for connection global variable

        * Information on how the race condition occurs is provided in the 
GEODE-4096 ticket.
        * getConnection before returning null and clearing out the global 
variable connection calls stop on the dispatcher.
        * This makes sure that AckReaderThreads for the dispatcher is shutdown 
and prevents lingering threads holding the connection life cycle lock.


> Race Condition between ConcurrentSerialGatewaySenderEventProcessor stopper 
> thread and the _dispatchBatch method for the connection global variable.
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-4096
>                 URL: https://issues.apache.org/jira/browse/GEODE-4096
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: nabarun
>            Assignee: nabarun
>
> *+Order of execution for this race condition to occur+*.
> #  _dispatchBatch is trying to dispatch a batch of events but was somehow 
> unsuccessful 
> # It silently decides that the remote server may not be ready so it wants to 
> retry
> # Same time we decide to stop the SerialGatewaySenderEventProcessor hence we 
> call the Stopper Thread.
> # Before the threads are started on all the senders / dispatchers it sets the 
> isStopped flag for the SerialGatewaySenderEventProcessor to true.
> # Then the _dispatchBatch method which was in retry mode makes a 
> getConnection call to get the connection. This method does a check on the 
> SerialGatewaySenderEventProcessor's isStopped flag. It sees that the flag is 
> set and this return null.
> # This null is stored in the global variable connection for the dispatcher.
> # Now that the _dispatchBatch method calls sees that the connection is null 
> it should raise an exception and destroyConnection.
> # Meanwhile there was a AckThreadReader that was running and the stopper 
> thread for the event processor wants to stop it, but since the connection 
> global variable was set to null by the get connection method call by 
> _disptachBatch.
> # Hence the shutDownAckReaderThreadConnection is executed on null and hence 
> the AckReaderThread continues to keep running - being stuck on socketRead0.
> # But the problem is that the AckReaderThread acquire a 
> connectionLifeCycle.readLock. to readAcknowledgement, but the 
> destroyConnection calls from the stopper thread and _dispatchBatch's 
> exception handling code needs a connectionLifeCycleLock.writeLock which they 
> can't because readLock is held by the AckReaderThread, causing a deadlock



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to