[ https://issues.apache.org/jira/browse/QPIDJMS-534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy A. Bish resolved QPIDJMS-534.
-------------------------------------
Fix Version/s: 0.59.0
Resolution: Fixed

> BalancedProviderFuture.sync stuck forever during connection recovery
> --------------------------------------------------------------------
>
> Key: QPIDJMS-534
> URL: https://issues.apache.org/jira/browse/QPIDJMS-534
> Project: Qpid JMS
> Issue Type: Bug
> Components: qpid-jms-client
> Affects Versions: 0.42.0
> Reporter: Ravi Nirmal
> Assignee: Timothy A. Bish
> Priority: Major
> Fix For: 0.59.0
>
> Attachments: logs.txt, thread-dump.txt
>
>
> Recently, we observed an issue in our production environment where the
> BalancedProviderFuture.sync method gets stuck forever during connection
> recovery and never returns. We have observed this on 2 hosts in the last
> week; the only remedy is to restart the server.
> I am attaching a thread dump that shows the issue and how it blocks other
> threads; [^thread-dump.txt] has details of all the threads.
> h3. Details of Investigation
> * This issue happens on connection recovery during failover from one
> server to another.
> * By debugging, I can see that the BalancedProviderFuture.sync method is
> waiting for its state to be updated, and that state is updated by the
> AmqpProvider thread. In the thread dump I don't see any AmqpProvider thread
> in a stuck state, which indicates that AmqpProvider has done its job, yet
> the state of the given BalancedProviderFuture object is still not updated.
> * In the successful case, the state of the BalancedProviderFuture object
> is updated in the following sequence:
> ** JmsSession.onConnectionRecovery calls provider.create after
> creating the BalancedProviderFuture object.
> ** provider.create (aka AmqpProvider.create) starts a thread using the
> serializer. This create method has proper handling: it calls either
> pumpToProtonTransport OR request.onFailure (which updates the state of the
> BalancedProviderFuture in case of an exception).
> ** Once the above thread finishes (basically after
> pumpToProtonTransport), the serializer calls the AmqpProvider.onData
> method, which updates the state of the BalancedProviderFuture object.
> * I have observed that if an exception is thrown in the AmqpProvider.onData
> method, the state of the BalancedProviderFuture is not updated and the
> BalancedProviderFuture.sync method gets stuck forever. The exception can
> occur when the protonTransport tail is already closed (probably because of
> an idle timeout OR another transport-related issue).
> * I have also observed that in some cases (idle timeout OR transport
> errors), after the thread started by provider.create (aka
> AmqpProvider.create) completes, the serializer does not call
> AmqpProvider.onData but instead calls AmqpProvider.onTransportError OR
> AmqpProvider.onTransportClosed. I cannot see any handling that updates the
> state of the BalancedProviderFuture object in the onTransportError or
> onTransportClosed methods.
> * I am attaching [^logs.txt], which shows some errors; these errors appeared
> when the state of the BalancedProviderFuture was not updated and the sync
> method was stuck forever.
> * Please note we are using the URL
> failover:(amqp://localhost:5672,amqp://localhost:5682)?jms.sendTimeout=5000
> and Qpid JMS version 0.42.0.
> I found two older tickets, QPIDJMS-458 and QPIDJMS-464, that show a
> similar issue, but I believe this issue is different and might need to be
> fixed separately.
> Can someone please take a look? This has become a critical issue in our
> production environment, and we have no option other than restarting our
> services.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
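The failure mode described in the report can be modelled with a small stand-alone Java sketch. In it, a caller blocks in sync() on a future that only a provider callback can complete. Everything in the sketch is hypothetical: SketchFuture, processData, guardedProcessData, onTransportError and completes are illustrative names, not the Qpid JMS API. The sketch does not claim to reproduce the change that shipped in 0.59.0. It assumes a single-threaded executor stands in for the serializer. It also uses a bounded sync() so that the demonstration returns instead of hanging the way the reported client did.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a request future such as BalancedProviderFuture.
// It is NOT the Qpid JMS class: it only models "the caller waits until a
// provider thread calls onSuccess() or onFailure()".
class SketchFuture {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile Throwable error;

    void onSuccess() { done.countDown(); }

    void onFailure(Throwable cause) {
        error = cause;
        done.countDown();
    }

    // Bounded wait (an assumption of this sketch): returns false on timeout
    // instead of blocking forever.
    boolean sync(long timeout, TimeUnit unit) throws InterruptedException {
        return done.await(timeout, unit);
    }

    boolean isDone() { return done.getCount() == 0; }

    Throwable error() { return error; }
}

class StuckFutureSketch {
    // Hypothetical data-processing step (the role AmqpProvider.onData plays in
    // the report). It throws before completing the request if the transport is closed.
    static void processData(SketchFuture request, boolean transportClosed) {
        if (transportClosed) {
            throw new IllegalStateException("transport tail already closed");
        }
        request.onSuccess();
    }

    // Guarded variant: any failure in the data path fails the pending request.
    static void guardedProcessData(SketchFuture request, boolean transportClosed) {
        try {
            processData(request, transportClosed);
        } catch (RuntimeException e) {
            request.onFailure(e);
        }
    }

    // Hypothetical transport error/close handler: fail every outstanding
    // request so that no caller is left waiting, then forget them.
    static void onTransportError(List<SketchFuture> pending, Throwable cause) {
        for (SketchFuture request : pending) {
            request.onFailure(cause);
        }
        pending.clear();
    }

    // Runs one step on a single-threaded "serializer" and reports whether the
    // caller's sync() returned within 500 ms.
    static boolean completes(boolean guarded, boolean transportClosed) {
        ExecutorService serializer = Executors.newSingleThreadExecutor();
        SketchFuture request = new SketchFuture();
        try {
            // submit() stores any exception in the returned Future, which nobody
            // inspects here, so an unguarded failure is silent.
            serializer.submit(() -> {
                if (guarded) {
                    guardedProcessData(request, transportClosed);
                } else {
                    processData(request, transportClosed);
                }
            });
            return request.sync(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            serializer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded, transport open:   " + completes(false, false));
        System.out.println("unguarded, transport closed: " + completes(false, true));
        System.out.println("guarded, transport closed:   " + completes(true, true));

        List<SketchFuture> pending = new ArrayList<>();
        pending.add(new SketchFuture());
        onTransportError(pending, new IllegalStateException("transport closed"));
        System.out.println("pending after transport error: " + pending.size());
    }
}
```

Under these assumptions, main() should report true, then false after roughly 500 ms, then true, and finally 0 pending requests. The unguarded closed-transport case is the analogue of the stuck sync(): the exception escapes the data path and nothing ever completes the request. Either path can prevent the hang. One is catching the failure and calling onFailure. The other is failing all outstanding requests from the transport-error path.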