[jira] Commented: (AMQ-734) Network connections do not reconnect when using static: with failover=true
[ https://issues.apache.org/activemq/browse/AMQ-734?page=comments#action_37516 ]

Chris Hofstaedter commented on AMQ-734:
---

I've got what I think is a fix for AMQ-734. However, the fix is possibly heavy-handed in its implementation, so I'd like for someone to review it and provide comments regarding any potential unintended side effects.

I've attached to the issue in JIRA a junit case that is able to reproduce the issue 100% of the time as well as the necessary updates to DiscoveryNetworkConnector and DemandForwardingBridgeSupport.

Basically, the classes were stopping the local broker and bridge whenever any exception was encountered on the transport. However, some of the exceptions are ok and should not result in a stop. These include InvalidClientIDException, BrokerStoppedException, InvalidStateException, and SocketException. With the fix, for these specific exceptions, the broker/bridge are not stopped and fireServiceFailed is never called. For all other exceptions, they still are called.

In addition, the TransportListener.transportResumed implementation in DemandForwardingBridgeSupport never started the local/remote broker and never called triggerRemoteStartBridge. With this fix, they now do.

I've run the whole suite of tests and all of the tests that passed prior to my modifications continue to pass. In addition, the test case attached to the JIRA fails without the fixes and passes with the fixes 100% of the time.

Comments?
Network connections do not reconnect when using static: with failover=true
--
Key: AMQ-734
URL: https://issues.apache.org/activemq/browse/AMQ-734
Project: ActiveMQ
Issue Type: Bug
Components: Connector
Affects Versions: 4.0
Environment: winxp java1.5.6
Reporter: tao
Assigned To: Hiram Chirino
Priority: Critical
Fix For: 4.2.0, 4.1.1
Attachments: Amq734Test.java, Amq734Test.java, DemandForwardingBridgeSupport.java, DiscoveryNetworkConnector.java

If I unplug the RJ45 cable from the network card, wait a while, and then plug it back in, the network resumes, but the other computer keeps throwing errors and the network channel no longer works.

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/activemq/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira
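
As an illustration of the exception filtering described in Chris Hofstaedter's comment above, here is a minimal Java sketch. It is not the attached patch: the class name BridgeFailurePolicy and the method shouldStopBridge are hypothetical, the exceptions are matched by simple class name only so that the example runs without ActiveMQ or JMS on the classpath, and walking the cause chain is an assumption of this sketch rather than something the comment states.

```java
import java.net.SocketException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of ActiveMQ: decides whether a transport
// error should stop the bridge, based on the exception names in the comment.
public class BridgeFailurePolicy {

    // Simple class names of the exceptions the comment lists as tolerable.
    private static final Set<String> TOLERATED = new HashSet<>(Arrays.asList(
            "InvalidClientIDException",
            "BrokerStoppedException",
            "InvalidStateException",
            "SocketException"));

    // Returns false when the error (or any of its causes) is one of the
    // tolerated exceptions, true for everything else.
    public static boolean shouldStopBridge(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (TOLERATED.contains(t.getClass().getSimpleName())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A tolerated exception: the bridge would be left running.
        System.out.println(shouldStopBridge(new SocketException("connection reset")));
        // Any other exception: the bridge would still be stopped.
        System.out.println(shouldStopBridge(new IllegalArgumentException("unexpected")));
    }
}
```

In a real listener the result would presumably gate the stop and fireServiceFailed calls mentioned in the comment; how the actual patch wires this in is not shown in this thread.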
[jira] Commented: (AMQ-734) Network connections do not reconnect when using static: with failover=true
[ https://issues.apache.org/activemq/browse/AMQ-734?page=comments#action_37499 ]

Eric Wood commented on AMQ-734:
---

Hiram,

The issue was intermittent, so the best we can do is run it for long periods of time. So far we have run with 4.1 for over ten hours total, failed the network connection about 20 times, including both active and natural failures, and it has recovered every time and we had 0 message loss. With 4.0.2 we were successful in reproducing the error nearly every time we caused a failure. So it seems that the problem is resolved, but we still want to do more testing.

We are noticing a large number of inactivity timeouts. Our network is fairly low bandwidth/high latency, so we were wondering if there is a way, or one could be added, to adjust the inactivity timeout value. We think we could improve our throughput if we didn't time out so often. If you'd like, I'll open a separate JIRA case for this.

Regards,
Eric

Network connections do not reconnect when using static: with failover=true
--
Key: AMQ-734
URL: https://issues.apache.org/activemq/browse/AMQ-734
Project: ActiveMQ
Issue Type: Bug
Components: Connector
Affects Versions: 4.0
Environment: winxp java1.5.6
Reporter: tao
Assigned To: Hiram Chirino
Priority: Critical
Fix For: 4.2.0, 4.1.1
[jira] Commented: (AMQ-734) Network connections do not reconnect when using static: with failover=true
[ https://issues.apache.org/activemq/browse/AMQ-734?page=comments#action_37379 ]

Jef Gearhart commented on AMQ-734:
--

[[ Old comment, sent by email on Fri, 27 Oct 2006 16:09:56 -0500 ]]

Hi Eric,

I guess that's up to you. I believe the problems you and I both posted, however, are side effects of the overall problem. The failover transport 'reconnection' works, but restoration of the state of the previous connection is broken in multiple places (e.g. subscriptions, clientIds). I've peeked into the code a little bit, and I think both side effects could be addressed simultaneously, so my opinion is to just leave both descriptions here.

Network connections do not reconnect when using static: with failover=true
--
Key: AMQ-734
URL: https://issues.apache.org/activemq/browse/AMQ-734
Project: ActiveMQ
Issue Type: Bug
Components: Connector
Affects Versions: 4.0
Environment: winxp java1.5.6
Reporter: tao
Assigned To: Hiram Chirino
Priority: Critical
Fix For: 4.0.1, 4.1
[jira] Commented: (AMQ-734) Network connections do not reconnect when using static: with failover=true
[ https://issues.apache.org/activemq/browse/AMQ-734?page=comments#action_37307 ]

Eric Wood commented on AMQ-734:
---

To add some further clarification. We verified (via logs) that the receive side broker had timed out the send side broker. Then, when the send side broker tries to reconnect, the receive side broker says invalid client id apparently because it did not unregister the client when it timed out? During this test, we had numerous timeouts and successful recoveries, and then suddenly it broke down.

Looking at the previous posts a little more closely, I am now wondering if this is a different issue? Should I create a new issue?

Network connections do not reconnect when using static: with failover=true
--
Key: AMQ-734
URL: https://issues.apache.org/activemq/browse/AMQ-734
Project: ActiveMQ
Issue Type: Bug
Components: Connector
Affects Versions: 4.0
Environment: winxp java1.5.6
Reporter: tao
Assigned To: Hiram Chirino
Priority: Critical
Fix For: 4.0.1, 4.1
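
The failure mode Eric suspects above (a client ID that is never unregistered after a timeout, so the reconnect is rejected as "already connected") can be shown with a toy model. This is only a sketch under that hypothesis: ClientIdRegistryDemo is a hypothetical stand-in, not ActiveMQ's RegionBroker, and it throws IllegalStateException where the real broker reports javax.jms.InvalidClientIDException.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (hypothetical, not ActiveMQ code) of a broker-side registry of
// connected client IDs.
public class ClientIdRegistryDemo {

    private final Set<String> connected = new HashSet<>();

    // Registers a client ID; rejects it if the ID is still registered.
    public void addConnection(String clientId) {
        if (!connected.add(clientId)) {
            throw new IllegalStateException("Client: " + clientId + " already connected");
        }
    }

    // Unregisters a client ID, e.g. when its connection is torn down.
    public void removeConnection(String clientId) {
        connected.remove(clientId);
    }

    public static void main(String[] args) {
        ClientIdRegistryDemo registry = new ClientIdRegistryDemo();
        String id = "NC_CTSServer_inboundCTSClient";

        registry.addConnection(id);
        // The connection times out, but removeConnection(id) is never called.
        try {
            registry.addConnection(id); // reconnect attempt with the same fixed ID
        } catch (IllegalStateException e) {
            System.out.println("Reconnect rejected: " + e.getMessage());
        }

        // If the stale entry is removed first, the reconnect succeeds.
        registry.removeConnection(id);
        registry.addConnection(id);
        System.out.println("Reconnect accepted after cleanup");
    }
}
```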
[jira] Commented: (AMQ-734) Network connections do not reconnect when using static: with failover=true
[ https://issues.apache.org/activemq/browse/AMQ-734?page=comments#action_37294 ]

Eric Wood commented on AMQ-734:
---

We can reproduce this issue using the current build of 4.0.2. We are running a distributed queue between two embedded brokers over an unreliable network (wireless). We had been using 4.0.1, but the brokers did not reestablish the bridge when it was stopped due to inactivity timeout. We tried 4.0.2 and it worked great for a while, but eventually one of the reconnect attempts failed. On the receiver side broker (called CTSServer) we see in the logs wire format negotiation taking place, but on the send side broker (called CTSClient) we get the following in the log:

[java] 12:42:12,236 DEBUG DurableConduitBridge:69 - Forwarding messages for durable destination: queue://TEST.FOO
[java] 12:42:12,237 DEBUG DemandForwardingBridge:289 - stopping CTSClient bridge to CTSServer is disposed already ? false
[java] 12:42:12,240 DEBUG DemandForwardingBridge:308 - CTSClient bridge to CTSServer stopped
[java] 12:42:12,241 DEBUG DemandForwardingBridge:289 - stopping CTSClient bridge to CTSServer is disposed already ? true
[java] 12:42:12,242 DEBUG DemandForwardingBridge:308 - CTSClient bridge to CTSServer stopped
[java] 12:42:12,244 INFO NetworkConnector:96 - Establishing network connection between from vm://CTSClient?network=true to tcp://10.134.0.1:61616
[java] 12:42:15,986 DEBUG WireFormatNegotiator:65 - Sending: WireFormatInfo { version=1, properties={TightEncodingEnabled=true, TcpNoDelayEnabled=true, SizePrefixDisabled=false, StackTraceEnabled=true, MaxInactivityDuration=3, CacheEnabled=true}, magic=[A,c,t,i,v,e,M,Q]}
[java] 12:42:15,987 DEBUG TcpTransport:133 - TCP consumer thread starting
[java] 12:42:18,957 DEBUG WireFormatNegotiator:95 - Received WireFormat: WireFormatInfo { version=1, properties={StackTraceEnabled=true, TightEncodingEnabled=true, TcpNoDelayEnabled=true, SizePrefixDisabled=false, MaxInactivityDuration=3, CacheEnabled=true}, magic=[A,c,t,i,v,e,M,Q]}
[java] 12:42:18,957 DEBUG WireFormatNegotiator:102 - tcp:///10.134.0.1:61616 before negotiation: OpenWireFormat{version=1, cacheEnabled=false, stackTraceEnabled=false, tightEncodingEnabled=false, sizePrefixDisabled=false}
[java] 12:42:18,958 DEBUG WireFormatNegotiator:113 - tcp:///10.134.0.1:61616 after negotiation: OpenWireFormat{version=1, cacheEnabled=true, stackTraceEnabled=true, tightEncodingEnabled=true, sizePrefixDisabled=false}
[java] 12:42:21,838 DEBUG Service:221 - Async error occurred: javax.jms.InvalidClientIDException: Broker: CTSClient - Client: NC_CTSServer_inboundCTSClient already connected
[java] javax.jms.InvalidClientIDException: Broker: CTSClient - Client: NC_CTSServer_inboundCTSClient already connected
[java] at org.apache.activemq.broker.region.RegionBroker.addConnection(RegionBroker.java:180)
[java] at org.apache.activemq.broker.BrokerFilter.addConnection(BrokerFilter.java:70)
[java] at org.apache.activemq.advisory.AdvisoryBroker.addConnection(AdvisoryBroker.java:70)
[java] at org.apache.activemq.broker.BrokerFilter.addConnection(BrokerFilter.java:70)
[java] at org.apache.activemq.broker.MutableBrokerFilter.addConnection(MutableBrokerFilter.java:83)
[java] at org.apache.activemq.broker.AbstractConnection.processAddConnection(AbstractConnection.java:633)
[java] at org.apache.activemq.command.ConnectionInfo.visit(ConnectionInfo.java:120)
[java] at org.apache.activemq.broker.AbstractConnection.service(AbstractConnection.java:237)
[java] at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:61)
[java] at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:92)
[java] at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:67)
[java] at org.apache.activemq.transport.vm.VMTransport.oneway(VMTransport.java:77)
[java] at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:45)
[java] at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:59)
[java] at org.apache.activemq.network.DemandForwardingBridgeSupport.startLocalBridge(DemandForwardingBridgeSupport.java:225)
[java] at org.apache.activemq.network.DemandForwardingBridgeSupport$3.run(DemandForwardingBridgeSupport.java:191)
[java] 12:42:21,840 INFO DemandForwardingBridge:230 - Network connection between vm://CTSClient#728 and tcp:///10.134.0.1:61616(CTSServer) has been established.
[java] 12:42:21,851 INFO DemandForwardingBridge:432 - Network connection between vm://CTSClient#728 and tcp:///10.134.0.1:61616 shutdown due to a local error: javax.jms.InvalidClientIDException: Broker: CTSClient - Client:
[jira] Commented: (AMQ-734) Network connections do not reconnect when using static: with failover=true
[ https://issues.apache.org/activemq/browse/AMQ-734?page=comments#action_36274 ]

Rob Davies commented on AMQ-734:

The reason for the fixed client_id is durable topic consumers. The network bridge amalgamates multiple durable subscribers on the same topic into one durable subscriber, to avoid duplicates and improve performance.

There is a keep-alive protocol used with the tcp connector that should detect the network outage. It needs some more investigation to see why this doesn't appear to be working.

Network connections do not reconnect when using static: with failover=true
--
Key: AMQ-734
URL: https://issues.apache.org/activemq/browse/AMQ-734
Project: ActiveMQ
Type: Bug
Components: Connector
Versions: 4.0
Environment: winxp java1.5.6
Reporter: tao
Priority: Critical
Fix For: 4.0.1
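
To make the amalgamation Rob describes more concrete, here is a toy Java model in which any number of durable subscribers on one topic share a single bridge-side subscription. The class ConduitSubscriptionDemo and its methods are hypothetical and only illustrate the idea; they do not reflect how the ActiveMQ network bridge is actually implemented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, not ActiveMQ code): the bridge holds one
// subscription per topic no matter how many remote subscribers exist.
public class ConduitSubscriptionDemo {

    private final Map<String, Set<String>> subscribersByTopic = new HashMap<>();

    // Records a subscriber. Returns true only for the first subscriber on a
    // topic, i.e. when the bridge would need to create its one subscription.
    public boolean addSubscriber(String topic, String subscriberId) {
        Set<String> subscribers =
                subscribersByTopic.computeIfAbsent(topic, t -> new HashSet<>());
        boolean first = subscribers.isEmpty();
        subscribers.add(subscriberId);
        return first;
    }

    // Number of subscriptions the bridge itself holds: one per topic.
    public int bridgeSubscriptionCount() {
        return subscribersByTopic.size();
    }

    public static void main(String[] args) {
        ConduitSubscriptionDemo bridge = new ConduitSubscriptionDemo();
        bridge.addSubscriber("TEST.FOO", "subscriber-1");
        bridge.addSubscriber("TEST.FOO", "subscriber-2");
        System.out.println(bridge.bridgeSubscriptionCount());
    }
}
```

Because the bridge presents itself as a single durable subscriber, it needs a stable identity across reconnects, which is consistent with the fixed client_id mentioned in the comment.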
[jira] Commented: (AMQ-734) Network connections do not reconnect when using static: with failover=true
[ https://issues.apache.org/activemq/browse/AMQ-734?page=comments#action_36230 ]

Janet Cooper commented on AMQ-734:
--

(using active-mq-snapshot-30-may / linux)

Some comments:

Given a network of brokers in a demand forwarding environment:
Broker A + Broker B in a _real_ network on different hosts (a lot of bugs are not reproducible in a localhost environment)

UseCase 1:
Start Broker A
Start Broker B
Do a network-split (aka unplugging the cable) [There is an exception attached that will be generated in such a case]
Create new message m1 on Broker A (that is stored for later delivery)
Reconnect network
Brokers will reconnect with exception (see below)

It is actually possible to LOSE MESSAGES that way. I found out that under certain circumstances message m1 will not be delivered. Even restarting every broker does not help in such a case.

EXCEPTION ON UNPLUGGING:

INFO Service:185 - Async error occurred: java.lang.IllegalStateException: Cannot lookup a connection that had not been registered: ID:develop-43781-1149388950450-3:0
java.lang.IllegalStateException: Cannot lookup a connection that had not been registered: ID:develop-43781-1149388950450-3:0
at org.apache.activemq.broker.AbstractConnection.lookupConnectionState(AbstractConnection.java:241)
at org.apache.activemq.broker.AbstractConnection.processRemoveConnection(AbstractConnection.java:519)
at org.apache.activemq.command.RemoveInfo.visit(RemoveInfo.java:59)
at org.apache.activemq.broker.AbstractConnection.service(AbstractConnection.java:201)
at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:62)
at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:97)
at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:63)
at org.apache.activemq.transport.vm.VMTransport.oneway(VMTransport.java:76)
at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:44)
at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
at org.apache.activemq.network.DemandForwardingBridgeSupport$2.transportInterupted(DemandForwardingBridgeSupport.java:138)
at org.apache.activemq.transport.TransportFilter.transportInterupted(TransportFilter.java:98)
at org.apache.activemq.transport.TransportFilter.transportInterupted(TransportFilter.java:98)
at org.apache.activemq.transport.failover.FailoverTransport.handleTransportFailure(FailoverTransport.java:223)
at org.apache.activemq.transport.failover.FailoverTransport.access$300(FailoverTransport.java:53)
at org.apache.activemq.transport.failover.FailoverTransport$1.onException(FailoverTransport.java:111)
at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:94)
at org.apache.activemq.transport.WireFormatNegotiator.onException(WireFormatNegotiator.java:120)
at org.apache.activemq.transport.InactivityMonitor.onException(InactivityMonitor.java:149)
at org.apache.activemq.transport.TransportSupport.onException(TransportSupport.java:100)
at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:156)
at java.lang.Thread.run(Thread.java:595)

EXCEPTION ON RECONNECTING:

INFO Service:185 - Async error occurred: javax.jms.InvalidClientIDException: Broker: develop - Client: NC_testbroker_inbounddevelop already connected
javax.jms.InvalidClientIDException: Broker: develop - Client: NC_testbroker_inbounddevelop already connected
at org.apache.activemq.broker.region.RegionBroker.addConnection(RegionBroker.java:176)
at org.apache.activemq.broker.BrokerFilter.addConnection(BrokerFilter.java:69)
at org.apache.activemq.advisory.AdvisoryBroker.addConnection(AdvisoryBroker.java:69)
at org.apache.activemq.broker.BrokerFilter.addConnection(BrokerFilter.java:69)
at org.apache.activemq.broker.MutableBrokerFilter.addConnection(MutableBrokerFilter.java:82)
at org.apache.activemq.broker.AbstractConnection.processAddConnection(AbstractConnection.java:507)
at org.apache.activemq.command.ConnectionInfo.visit(ConnectionInfo.java:118)
at org.apache.activemq.broker.AbstractConnection.service(AbstractConnection.java:201)
at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:62)
at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:97)
at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:63)
at org.apache.activemq.transport.vm.VMTransport.oneway(VMTransport.java:76)
at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:44)
at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
at