[ https://issues.apache.org/activemq/browse/AMQNET-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63060#action_63060 ]
Timothy Bish commented on AMQNET-289:
-------------------------------------

Applied the suggested fix in trunk. @Daniel, if you happen to have a stack trace on those three threads I'd love to see it; I would like to ensure there aren't any other points where this sort of thing can happen.

> Deadlock while sending a message after failover within a consumer
> -----------------------------------------------------------------
>
>                 Key: AMQNET-289
>                 URL: https://issues.apache.org/activemq/browse/AMQNET-289
>             Project: ActiveMQ .Net
>          Issue Type: Bug
>          Components: ActiveMQ
>    Affects Versions: 1.4.1
>         Environment: Windows 7 64 bits
>            Reporter: Morgan Martinet
>            Assignee: Jim Gomes
>            Priority: Critical
>             Fix For: 1.5.0
>
>         Attachments: deadlock.jpg, SessionExecutor.cs
>
>
> Scenario:
> - I have one producer that sends a request (with a temporary queue specified
>   in the Reply-to attribute) to a consumer, in a separate process.
> - both the producer and the consumer use the following connection string:
>   failover:(tcp://localhost:61616)?timeout=3000
> - the consumer, when processing the request, waits 10 seconds then sends a
>   response back, using the Reply-To attribute.
> - immediately after the message has been sent, while the consumer is waiting
>   for 10 secs, I restart the ActiveMQ broker.
> - once the consumer wakes up and tries to send its reply, it will
>   deadlock because of the failover.
> We have managed to identify the resources that deadlock:
> Thread1 - lock(reconnectMutex)
>   (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Transport\Failover\FailoverTransport.cs: line 366)
> Thread1 - wait on lock(this.consumers.SyncRoot)
>   (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Session.cs: line 830)
> Thread2 - lock(this.consumers.SyncRoot)
>   (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\SessionExecutor.cs: line 147)
> Thread2 - wait on lock(reconnectMutex)
>   (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Transport\Failover\FailoverTransport.cs: line 531)
>
> Patch:
> I managed to find a simple fix for this, by moving the consumer dispatch out
> of the this.consumers.SyncRoot lock in SessionExecutor.cs:
> {{
>         public void Dispatch(MessageDispatch dispatch)
>         {
>             try
>             {
>                 MessageConsumer consumer = null;
>                 lock(this.consumers.SyncRoot)
>                 {
>                     if(this.consumers.Contains(dispatch.ConsumerId))
>                     {
>                         consumer = this.consumers[dispatch.ConsumerId] as MessageConsumer;
>                     }
>                     // Note that consumer.Dispatch(...) was moved below, outside of the lock.
>                 }
>                 // If the consumer is not available, just ignore the message.
>                 // Otherwise, dispatch the message to the consumer.
>                 if(consumer != null) {
>                     consumer.Dispatch(dispatch);
>                 }
>             }
>             catch(Exception ex)
>             {
>                 Tracer.DebugFormat("Caught Exception While Dispatching: {0}", ex.Message );
>             }
>         }
> }}
>
> Note that I ran the unit tests before my patch and I got 3 failures. Then I
> got the same failures with my patch. So, I hope it didn't break anything but
> I'll let you find the best solution...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
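
For anyone who wants to see the reported lock ordering in isolation, here is a minimal sketch. It is not NMS code: LockOrderDemo, HoldThenTry and Run are hypothetical names, and the two plain objects are only stand-ins for reconnectMutex and this.consumers.SyncRoot. Each thread takes one lock and then tries the other with a timeout instead of blocking forever, so the program terminates and reports the inversion rather than hanging.

```csharp
using System;
using System.Threading;

public static class LockOrderDemo
{
    // Hypothetical stand-ins for the two locks named in the report.
    private static readonly object reconnectMutex = new object();
    private static readonly object consumersSyncRoot = new object();

    // Holds 'first', then tries to take 'second' with a timeout.
    // Returns true only if 'second' could be acquired.
    private static bool HoldThenTry(object first, object second, Barrier barrier)
    {
        lock (first)
        {
            // Wait until both threads hold their first lock.
            barrier.SignalAndWait();

            bool acquired = Monitor.TryEnter(second, TimeSpan.FromMilliseconds(200));
            if (acquired)
            {
                Monitor.Exit(second);
            }

            // Keep 'first' held until both attempts have finished.
            barrier.SignalAndWait();
            return acquired;
        }
    }

    public static Tuple<bool, bool> Run()
    {
        bool thread1Got = true;
        bool thread2Got = true;

        using (var barrier = new Barrier(2))
        {
            // Thread1: reconnectMutex, then consumers lock.
            var t1 = new Thread(() =>
                thread1Got = HoldThenTry(reconnectMutex, consumersSyncRoot, barrier));
            // Thread2: consumers lock, then reconnectMutex (opposite order).
            var t2 = new Thread(() =>
                thread2Got = HoldThenTry(consumersSyncRoot, reconnectMutex, barrier));

            t1.Start();
            t2.Start();
            t1.Join();
            t2.Join();
        }

        return Tuple.Create(thread1Got, thread2Got);
    }
}
```

Calling LockOrderDemo.Run() should return (False, False): neither thread can take its second lock while the other holds it, and with plain lock statements in place of Monitor.TryEnter both threads would wait forever. The patch quoted above avoids this by releasing this.consumers.SyncRoot before calling consumer.Dispatch, so Thread2 never holds that lock while waiting on reconnectMutex.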