After turning on the trace, here is that pattern we found. 1. connection issue detected at 2014-05-21 17:13:17,008 for vm://mq01#68026 2. MulticastDiscoveryAgent detected issue at 17:13:17,009, decides to wait for 5 seconds 3. Stopping connection at 2014-05-21 17:13:17,015 4. Stopping transport at 2014-05-21 17:13:17,016 5. Cleaning up connection resources: vm://mq01#68026 at 2014-05-21 17:13:17,016 6. remove connection id: mq01->mq02-46052-1400304319531-34022:1 at 2014-05-21 17:13:17,016 7. Unregistering MBean for consumer/producers starting from 2014-05-21 17:13:17,017 8. MulticastDiscoveryAgent kicking off recovery at 17:13:22, failed, InvalidClientIDException logged, wating 5 seconds 9. MulticastDiscoveryAgent kicking off recovery at 17:13:28, failed, InvalidClientIDException logged, wating 5 seconds 10. MulticastDiscoveryAgent kicking off recovery at 17:13:34, failed, InvalidClientIDException logged, wating 5 seconds 11. Unregistering MBean for consumer/producers completed at 2014-05-21 17:13:37,975 12. Connection Stopped: vm://mq01#68026 at 2014-05-21 17:13:38,012 13. MulticastDiscoveryAgent kicking off recovery at 17:13:39, and succeeded
The InvalidClientIDException we saw is actually indicating: 1. connection stopping process is till going on, the recovery attempts kicked of by MulticastDiscoveryAgent is not able to go through (and shouldn't) 2. that fact that we saw 4 InvalidClientIDException in a row after a "connection issue" seems caused by the long running resource cleaning up(like MBeans/consumers) 3. from the log the cleaning process is done by a single thread, and it takes 40 milli seconds to unregister a subscription ( I am seeing remote consumers mostly). for instance =====> ./mq01/tracemq01.log:2014-05-21 17:13:30,811 | DEBUG | mq01 removing consumer: mq01->mq02-46052-1400304319531-34022:1:1:274 for destination: queue://Template | org.apache.activemq.broker.region.AbstractRegion | ActiveMQ BrokerService[mq01] Task-22226 ./mq01/tracemq01.log:2014-05-21 17:13:30,849 | DEBUG | mq01 removing consumer: mq01->mq02-46052-1400304319531-34022:1:1:238 for destination: queue://Template | org.apache.activemq.broker.region.AbstractRegion | ActiveMQ BrokerService[mq01] Task-22226 4. since we saw have 500+ remote consumers on it. each time a connection issue happen, it takes 30~40 seconds to clean up the resources and reestablish new one. Questions: 1. is 500+ consumers on single broker node consider to be too much for a broker? 2. is not the 30~40 per destination performance too slow? is it normal? any way to tune it up? 3. shouldn't the clean up process running in parallel? -- View this message in context: http://activemq.2283324.n4.nabble.com/InvalidClientIDException-between-network-brokers-tp4681090p4681383.html Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
