Re: Consumers hanging on a queue although there are messages in it
It has been a while, so here's an update. The same problem has been occurring on and off for the past two months, and one suspect keeps coming back: message grouping. We have found and tried several things, and here are some of the findings: *Message groups cache* ActiveMQ defaults to an LRU cache of size 1024 for storing hashes of the JMSXGroupID message group header. We were grouping on a higher number of groups and could not find where to change this setting, so we reduced it to 1024 message groups in code. That did not help the 'hanging' problem at all. *Broker page size* Because the ActiveMQ broker sends all messages of a group to a single consumer, it needs to load messages into memory. When all messages in memory are for a single consumer, all other messages in the queue are not processed. Max page size is the parameter that lets the broker load more messages into memory, so that it will hopefully find messages for other consumers and the flow is not heavily impacted. That problem with message groups, combined with some kind of bug in the client and/or broker, seems to trigger the hanging state. When we simulate a lot of messages for a single broker, even within the max page size, we encounter the hanging state (although lately we see another variant, more below). Strangely, after restarting the client and failing over the broker, the hanging state disappears, so it must be something that builds up while running for a while rather than a full queue just after startup. After changing the maxPageSize (increasing it from 1000 to 1) we did see a major decline in incidents, so that definitely has an effect (and supports the theory above about how the hanging state is caused). The hanging state we have encountered recently is a failover transport handler in the client that seems to think the broker is down/unresponsive and blocks all consumers for a specific timeout (3 seconds by default, I think). After that timeout everything continues for a few seconds, and then the timeout is triggered again in an endless loop.
The only way we know to stop this is restarting the client and performing a broker failover. *Next steps* We are now researching how the number of consumers, maxPageSize and the client prefetch settings interact with each other, hoping to find good values for all of those parameters, mostly because the number of consumers directly affects the number of message groups per consumer. We have also upgraded to the latest ActiveMQ and Camel client libraries and the latest ActiveMQ broker. The broker has been running for quite some time now and the issues have continued; the client library update will be released to production soon. -- Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
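The message group cache described above can be pictured as a bounded LRU map from group ID to owning consumer. The sketch below is a simplified model built on java.util.LinkedHashMap, not ActiveMQ's own class. The class name GroupOwnerCache and the tiny capacity are hypothetical and chosen only for illustration. It shows one consequence of having more groups than cache slots, at least in this model: once a group is evicted, its next message is treated as a brand-new group and can land on a different consumer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model (NOT ActiveMQ's implementation): a bounded LRU map from
// message group ID to the consumer that currently owns the group.
class GroupOwnerCache {
    private final int capacity;
    private final LinkedHashMap<String, String> owners;

    GroupOwnerCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: iteration order runs from least to most recently used
        this.owners = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Drop the least recently used group once the cache is over capacity
                return size() > GroupOwnerCache.this.capacity;
            }
        };
    }

    // Returns the group's owner; an unknown (or evicted) group is assigned to candidate.
    String ownerFor(String groupId, String candidate) {
        String owner = owners.get(groupId); // get() also refreshes the entry's recency
        if (owner == null) {
            owner = candidate;
            owners.put(groupId, owner);     // may evict the least recently used group
        }
        return owner;
    }

    boolean isTracked(String groupId) {
        return owners.containsKey(groupId); // containsKey does not change recency
    }

    public static void main(String[] args) {
        GroupOwnerCache cache = new GroupOwnerCache(2); // hypothetical tiny capacity
        cache.ownerFor("g1", "c1");
        cache.ownerFor("g2", "c2");
        cache.ownerFor("g3", "c3"); // third group evicts g1, the least recently used
        System.out.println(cache.isTracked("g1"));      // false
        System.out.println(cache.ownerFor("g1", "c2")); // c2: g1 has moved consumers
    }
}
```

With the real default of 1024 the effect is the same, just at a larger scale: any workload with more live groups than cache slots keeps churning group ownership.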
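The page-size effect, and its interaction with prefetch and the number of consumers, can be illustrated with a toy dispatch pass. This is a sketch under stated assumptions, not the broker's actual dispatch algorithm. It assumes the broker only looks at the first maxPageSize messages, each group already has a fixed owning consumer, and each consumer has a fixed number of free prefetch slots. The class PageWindowModel and the consumer names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (an assumption, not the broker's dispatch code): one dispatch pass over
// the first maxPageSize messages, with fixed group owners and prefetch slots.
class PageWindowModel {
    static int dispatchOnePass(List<String> queueGroups, int maxPageSize,
                               Map<String, String> groupOwner,
                               Map<String, Integer> freePrefetch) {
        Map<String, Integer> slots = new HashMap<>(freePrefetch); // leave caller's map untouched
        int window = Math.min(maxPageSize, queueGroups.size());   // messages paged into memory
        int dispatched = 0;
        for (int i = 0; i < window; i++) {
            String owner = groupOwner.get(queueGroups.get(i));
            int free = slots.getOrDefault(owner, 0);
            if (free > 0) {              // owner has prefetch room: dispatch the message
                slots.put(owner, free - 1);
                dispatched++;
            }                            // otherwise it waits in memory for its owner
        }
        return dispatched;
    }

    public static void main(String[] args) {
        List<String> queue = List.of("A", "A", "A", "A", "B", "B"); // group ID per message
        Map<String, String> owners = Map.of("A", "c1", "B", "c2");
        Map<String, Integer> prefetch = Map.of("c1", 1, "c2", 5);
        System.out.println(dispatchOnePass(queue, 4, owners, prefetch)); // 1: c2 sits idle
        System.out.println(dispatchOnePass(queue, 6, owners, prefetch)); // 3
    }
}
```

With a window of 4, every paged-in message belongs to group A, whose owner c1 has a single free slot, so the idle consumer c2 gets nothing. Widening the window to 6 lets the pass reach group B's messages. In this model, fewer consumers (more groups per consumer) or smaller prefetch make the same starvation more likely.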
Replicated Message Store for ActiveMQ
Hi, We have been using ActiveMQ 5.x (upgraded to 5.14 last year) for our product, which has been in production for 3 years. We have been facing stability issues with the replicated LevelDB store (it was deprecated by the community after we went live with LevelDB; we have stuck with it because we achieved HA through the replicated message store, which is not available in KahaDB). Now we have reached a threshold where we can't tolerate any further LevelDB corruption and loss of the message store. We are looking for a possible way forward, and it would be great if the community could help us identify the right solution. Here is a high-level picture of our architecture. We have 3 brokers running on three machines, with N producers and N consumers that can be scaled independently. The 3 broker instances provide HA using replicated LevelDB, so that even with a 2-node failure the message queue remains available with zero message loss. 1) Replace LevelDB with KahaDB and use the Hadoop NFS provided by the Hadoop distributor MapR (faster than traditional NFS) to persist the KahaDB data, so that even if two of the brokers go down, the remaining broker can work on the data available through Hadoop NFS. I guess this would be the same as shared file storage, so it should work. Please confirm. 2) Replace ActiveMQ 5.x with ActiveMQ Artemis. I understand from the Artemis documentation that a replicated message store option is available. If Artemis is chosen, we are wondering about the code changes and effort required to adapt when it is released as ActiveMQ 6.x. Please enlighten me on these. - Subash Kunjupillai
Re: Both instances of ActiveMQ connected to kahadb after network outage
On Wed, Apr 4, 2018 at 9:18 AM, gbrown wrote: > We had a short outage on the network and once this came back both > instances in our master / slave setup were up and connectable. Once this was > discovered when messages on queues were not browsable or able to be > consumed > the instances were restarted after renaming the db.data file as other > methods to start (persistenceAdapter options) would not work. > > Once started the messages on the queues were gone so probably lost. > > We use an nfs4 mount point. > > ActiveMQ Version is 5.11.1 > > so can anyone help with > > 1. How is it possible that both master and slave connected to the KahaDB > It sure sounds like your NFS setup isn't successfully doing shared exclusive locks, even though it's an NFSv4 mount. http://activemq.2283324.n4.nabble.com/Unreliable-NFS-exclusive-locks-on-unreliable-networks-td4737992.html has some discussion of the NFS mount options that some other users are using, but I can't say that anyone's built a consensus around "these settings work and these other ones don't," so all you have to go on at the moment are these reports from other users. If you're able to tell us what settings you end up using that fix the problem (and you should plan on doing thorough testing, given that you've just demonstrated that your current settings appeared to work but didn't actually), maybe we can establish enough of a consensus among the community to consider documenting recommended values on the wiki. > 2. Is there any way I could have recovered that would have kept the messages > on the queues > db.data is the index, and is simply cached information derived from the actual journal files. It can be safely deleted without data loss, because it will simply be rebuilt from the journal files.
If all you deleted was that one file (which is what it sounds like) and you ended up not having messages upon restart, it means they had already been deleted from the journal files, and there wasn't anything you could have done to avoid losing the messages. If on the other hand you deleted *.log files in addition to db.data, then you could have avoided losing your messages by not deleting those journal files (*.log). I think from what you wrote that the message loss was unavoidable, unless your description of which files you deleted was incomplete. Tim
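The point that db.data is only an index derived from the journal can be made concrete with a deliberately simplified model. The record layout here (ADD/REMOVE entries) and the queue name "orders" are hypothetical and have nothing to do with KahaDB's real on-disk format. The model only shows the reasoning: rebuilding the index is a replay of the journal. Deleting the index therefore loses nothing, while a message whose removal is already recorded in the journal cannot be brought back by replay.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Deliberately simplified model (NOT KahaDB's on-disk format): the journal is the
// source of truth and the index is rebuilt from it by replaying every record.
class JournalReplay {
    static final class Entry {
        final String op;        // "ADD" (message stored) or "REMOVE" (message acked/deleted)
        final String queue;
        final String messageId;

        Entry(String op, String queue, String messageId) {
            this.op = op;
            this.queue = queue;
            this.messageId = messageId;
        }
    }

    // Rebuilds queue -> pending message IDs from the journal alone, as after deleting the index.
    static Map<String, Set<String>> rebuildIndex(List<Entry> journal) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Entry e : journal) {
            Set<String> pending = index.computeIfAbsent(e.queue, q -> new LinkedHashSet<>());
            if (e.op.equals("ADD")) {
                pending.add(e.messageId);
            } else if (e.op.equals("REMOVE")) {
                pending.remove(e.messageId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<Entry> journal = List.of(
                new Entry("ADD", "orders", "m1"),
                new Entry("ADD", "orders", "m2"),
                new Entry("REMOVE", "orders", "m1"));
        // m1's removal is in the journal, so replay cannot bring it back
        System.out.println(rebuildIndex(journal)); // {orders=[m2]}
    }
}
```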
Re: Unreliable NFS exclusive locks on unreliable networks
I don't know why this content wasn't posted to the mailing list when I sent it via email on 3/29, but since it wasn't, here it is again: A broker should be attempting to acquire the lock on startup, so if that's not working right, it seems to indicate problems with your NFS configuration. The settings used by a few other people can be found in http://activemq.2283324.n4.nabble.com/NFS-v4-locks-quot-given-up-quot-w-o-any-logging-td4709672.html. Can you please share what you're using? This is a weak point in the ActiveMQ documentation, so if we can get enough consensus about what settings are needed for correct behavior, I can update the wiki to capture that information. Also, I see that the StackOverflow post was removed. Can you summarize its content? I'm certainly concerned about the things you wrote about the isValid() method, and I'd like to understand more about that. Tim
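To make "acquire the lock on startup" concrete, here is a JDK-only sketch of an exclusive lock-file check. It assumes, without confirming it from the ActiveMQ source, that a shared-file master/slave lock behaves like java.nio's FileChannel.tryLock, which asks the operating system (and, on an NFS mount, the NFS lock service) for the lock. The file name prefix is hypothetical. Inside one JVM, the second attempt throws OverlappingFileLockException, which the sketch treats as "not acquired". Between two processes, tryLock would return null instead. This demo cannot reproduce NFS behaviour. A meaningful test would run two JVMs on different hosts against the same mount, including across a simulated network outage.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of an exclusive "lock file" check using only the JDK. Whether such a lock
// is honoured across hosts on NFS depends on the server and mount options.
class ExclusiveLockDemo {
    // Non-blocking attempt at an exclusive lock on the whole file; null means "not acquired".
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            return channel.tryLock(); // returns null if another process holds the lock
        } catch (OverlappingFileLockException e) {
            return null;              // this JVM already holds an overlapping lock
        }
    }

    // Simulates a "master" and a "slave" opening the same lock file; returns {master, slave}.
    static boolean[] lockTwice(Path lockFile) throws IOException {
        try (FileChannel master = FileChannel.open(lockFile, StandardOpenOption.WRITE);
             FileChannel slave = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            boolean masterGot = tryAcquire(master) != null;
            boolean slaveGot = tryAcquire(slave) != null;
            return new boolean[] {masterGot, slaveGot}; // locks are released when channels close
        }
    }

    // Runs lockTwice against a fresh temporary file and cleans it up afterwards.
    static boolean[] lockTwiceInTempFile() {
        try {
            Path lockFile = Files.createTempFile("hypothetical-broker", ".lock");
            try {
                return lockTwice(lockFile);
            } finally {
                Files.deleteIfExists(lockFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] got = lockTwiceInTempFile();
        System.out.println("master=" + got[0] + " slave=" + got[1]); // master=true slave=false
    }
}
```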
Re: failed to start ActiveMQ
OK, it sounds like I understood you correctly, then. Did you use the tools and techniques outlined in the wiki page I provided to determine which destination(s) contain the messages that are preventing the files from being deleted? Tim On Wed, Apr 4, 2018, 6:02 PM norinos wrote: > Sorry, my information is not enough. > I changed the ActiveMQ setting as follows, and restarted. > > - > offlineDurableSubscriberTimeout="12" > offlineDurableSubscriberTaskSchedule="18" > - > > In this case, the client application was not connected to the server. > So I assumed that the journal files will be deleted in a few minutes after > the server started up. > > But when the offline durable subscription cleanup is started, journal file > could not be deleted. > > Since the file was not deleted, I connected the application thought that I > could receive a message, but I could not receive the message.
Re: failed to start ActiveMQ
Sorry, my information was not sufficient. I changed the ActiveMQ settings as follows and restarted. - offlineDurableSubscriberTimeout="12" offlineDurableSubscriberTaskSchedule="18" - In this case, the client application was not connected to the server, so I assumed that the journal files would be deleted a few minutes after the server started up. But when the offline durable subscription cleanup started, the journal files could not be deleted. Since the files were not deleted, I connected the application, thinking that I could receive a message, but I could not receive it.
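The two settings above control a periodic sweep: a durable subscriber that has been offline longer than the timeout is destroyed at the next sweep. The model below assumes both values are in milliseconds, as the ActiveMQ documentation describes them. The exact comparison and sweep timing are assumptions, not the broker's code, and the 60 s / 30 s values in the example are hypothetical. The model shows why removal can lag the timeout by up to one sweep period. Destroying a subscription only releases its references to messages; whether a journal file can then be deleted is a separate question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Model of a periodic offline-durable-subscriber sweep (an assumption about the
// semantics, not ActiveMQ's code). All times are in milliseconds.
class OfflineSubscriberSweep {
    // Subscribers a sweep at nowMs would destroy; a negative timeout disables removal.
    static List<String> removedAt(Map<String, Long> offlineSinceMs, long nowMs, long timeoutMs) {
        List<String> removed = new ArrayList<>();
        if (timeoutMs < 0) {
            return removed;
        }
        for (Map.Entry<String, Long> e : new TreeMap<>(offlineSinceMs).entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                removed.add(e.getKey());
            }
        }
        return removed;
    }

    // First sweep (sweeps run at scheduleMs, 2*scheduleMs, ... after startup) that removes
    // a subscriber offline since offlineSinceMs. Requires timeoutMs >= 0 and scheduleMs > 0.
    static long firstRemovingSweep(long offlineSinceMs, long timeoutMs, long scheduleMs) {
        long sweep = scheduleMs;
        while (sweep - offlineSinceMs <= timeoutMs) {
            sweep += scheduleMs;
        }
        return sweep;
    }

    public static void main(String[] args) {
        // Hypothetical values: 60 s timeout, sweep every 30 s, subscriber offline since startup.
        System.out.println(firstRemovingSweep(0, 60_000, 30_000)); // 90000
        System.out.println(removedAt(Map.of("sub-a", 0L, "sub-b", 50_000L), 90_000, 60_000)); // [sub-a]
    }
}
```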
Both instances of ActiveMQ connected to kahadb after network outage
We had a short outage on the network, and once it came back both instances in our master/slave setup were up and connectable. This was discovered when messages on queues could not be browsed or consumed; the instances were then restarted after renaming the db.data file, because other methods of starting them (persistenceAdapter options) would not work. Once started, the messages on the queues were gone, so they are probably lost. We use an NFSv4 mount point. ActiveMQ version is 5.11.1. So can anyone help with: 1. How is it possible that both master and slave connected to KahaDB? 2. Is there any way I could have recovered that would have kept the messages on the queues?
Re: Older Message Not Consumed(Stay Untouched in queue) Newer Messages consumed
Sorry - they weren’t really shuffled. I don’t know exactly whether they were moved to the back of the queue or just held until their redelivery delay expired and then re-injected into the queue. We didn’t test enough to make that determination - we stopped as soon as we discovered that delayed redelivery on the broker side broke FIFO. > On Apr 3, 2018, at 9:59 PM, Tim Bain wrote: > > When you say "shuffled", do you simply mean that the message went to the > back of the queue when it got sent back for broker-side redelivery? Or do > you mean that actual randomization of all messages on the queue occurred? > > Tim > > On Tue, Apr 3, 2018 at 8:29 AM, Quinn Stevenson > wrote: > >> No - we weren’t using selectors. The only “special” feature we were using >> was Virtual Topics - we saw the order shuffled on the queues created for >> the Virtual Topic Consumers. >> >> >>> On Apr 2, 2018, at 11:20 PM, Tim Bain wrote: >>> >>> @RuralHunter, same question as to the OP: were you using selectors? Was it >>> possible that no consumer with a selector matching those messages was >>> online, and then one came online and consumed the message? >>> >>> Tim >>> >>> On Mon, Mar 26, 2018 at 8:31 AM, RuralHunter >> wrote: >>> I reported the same problem for 5.13.4. http://activemq.2283324.n4.nabble.com/Message-stuck-in-queue-td4720713.html
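Why broker-side delayed redelivery breaks FIFO can be seen in a small deterministic simulation. It models one of the two possibilities described above: the failed message is held for a delay and then re-injected at the back of the queue. It assumes a single consumer that takes one message per tick. The class name and message names are hypothetical. Under the other possibility (re-injection elsewhere in the queue), the messages consumed during the delay have still overtaken the failed one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy simulation (an assumption, not broker code) of broker-side delayed redelivery:
// a failed message is held for delayTicks and then re-injected at the back of the queue.
class DelayedRedeliveryOrder {
    static List<String> consumeOrder(List<String> initial, String failsOnce, int delayTicks) {
        if (delayTicks < 1) {
            throw new IllegalArgumentException("delayTicks must be >= 1");
        }
        Deque<String> queue = new ArrayDeque<>(initial);
        Map<Integer, String> heldUntil = new HashMap<>(); // tick -> message to re-inject
        List<String> consumed = new ArrayList<>();
        boolean alreadyFailed = false;
        for (int tick = 0; !queue.isEmpty() || !heldUntil.isEmpty(); tick++) {
            String due = heldUntil.remove(tick);
            if (due != null) {
                queue.addLast(due);                 // delay expired: back into the queue
            }
            String msg = queue.pollFirst();         // the consumer takes one message per tick
            if (msg == null) {
                continue;                           // nothing dispatchable this tick
            }
            if (msg.equals(failsOnce) && !alreadyFailed) {
                alreadyFailed = true;
                heldUntil.put(tick + delayTicks, msg); // rolled back, held by the broker
            } else {
                consumed.add(msg);
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(consumeOrder(List.of("A", "B", "C"), "A", 2)); // [B, C, A]
    }
}
```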
Re: expired messages --> DLQ ? (when expiration=0)
Well, before submitting an issue, I'd like to ask whether there's already a way to configure it like that. OK, got your point, I will open a request. 2018-04-04 17:29 GMT+05:00 Tim Bain : > If you think that message expiration should be checked when the publisher > publishes the message, you can submit an enhancement request in JIRA for > it. > > Tim > > On Tue, Apr 3, 2018, 11:15 PM Илья Шипицин wrote: > > > Tim, thank you for your investigation (looks like our client is buggy, > > we'll look at it). > > besides that, doesn't it look strange, accept message and drop it right > > away ? is there a possibility to reject such messages ? > > it will inform client in a better way > > > > 2018-04-04 9:12 GMT+05:00 Tim Bain : > > > > > Ilya, > > > > > > The output you're seeing is from the message as it's being put onto the > > > DLQ. (destination=queue://DLQ.EdiInbound) This means that it has > already > > > passed through the set of transformations that occur in > > > org.apache.activemq.broker.region.RegionBroker.sendToDeadLetterQueue() > > and > > > RegionBroker.stampAsExpired(), which set the originalExpiration > property > > to > > > the value of expiration on the original message and then set expiration > > to > > > the DLQ's expiration value (i.e. 0). > > > > > > What that means is that although you think you've set expiration to 0, > > you > > > haven't successfully done that; you can see from both the > > > originalExpiration property and the message of the Throwable in the > > > dlqDeliveryFailureCause that the message's expiration value was > > > 1522739054119, > > > which is why this message got expired to the DLQ. > > > > > > So if you're attempting to disable message expiration for this message, > > you > > > need to look at how you're doing that to figure out what you're doing > > > that's not properly configured. > > > http://activemq.apache.org/how-do-i-set-the-message-expiration.html > > might > > > be relevant, if you haven't already seen it.
> > > > > > Tim > > > > > > On Tue, Apr 3, 2018 at 7:22 AM, Илья Шипицин > > wrote: > > > > > > > hello, > > > > > > > > activemq.log:2018-04-03 10:42:57,961 | INFO | preProcessDispatch: > > > > MessageDispatch {commandId = 0, responseRequired = false, consumerId > = > > > > ID:dd-amq-app01. > > > > sd.kontur.ru-32887-1522738593874-4:1:1:1, destination = > > > > queue://DLQ.EdiInbound, message = ActiveMQTextMessage {commandId = 5, > > > > responseRequired = true, mes > > > > sageId = ID:vm-dc-test8-50009-636572273611169908-4:1826:1:1:1, > > > > originalDestination = queue://EdiInbound, originalTransactionId = > null, > > > > producerId = ID:vm- > > > > dc-test8-50009-636572273611169908-4:1826:1:1, destination = > > > > queue://DLQ.EdiInbound, transactionId = null, expiration = 0, > > timestamp = > > > > 1522739053119, arriv > > > > al = 0, brokerInTime = 1522739055628, brokerOutTime = 1522739053126, > > > > correlationId = null, replyTo = null, persistent = true, type = null, > > > > priority = 4, g > > > > roupID = null, groupSequence = 0, targetConsumerId = null, > compressed = > > > > false, userID = null, content = > > > > org.apache.activemq.util.ByteSequence@552349c1, ma > > > > rshalledProperties = null, dataStructure = null, redeliveryCounter = > 0, > > > > size = 1263, properties = {originalExpiration=1522739054119, > > > > EDI_CORRELATION_ID=21 > > > > 008, EDI_FILE_NAME=15.xml, EDI_DOC_TYPE=EDI_DOC_TYPE, > > > > BrokerPath=localhost,localhost, > > > > dlqDeliveryFailureCause=java.lang.Throwable: Message Expired. 
Expira > > > > tion:1522739054119}, readOnlyProperties = false, readOnlyBody = > false, > > > > droppable = false, jmsXGroupFirstForConsumer = false, text = > > > version="1.0" en > > > > coding="UTF-8"?> > > > > > > > > > > > > as I can see, "originalDestination = queue://EdiInbound" - so, we > tried > > > to > > > > deliver to EdiInbound > > > > expiration is 0 (we did not set it, it's default): expiration = 0 > > > > > > > > as we can see, message was expired and delivered to DLQ. it was not > > > > intended behaviour. > > > > also, documentation states that expiration = 0 means "no expiration" > > > > > > > > > > > > please, explain me. I do not understand how expiration=0 lead to DLQ > > > > actually. > > > > > > > > configuration is pretty generic, nothing special. I'll provide > > > > configuration if needed (also, from documentation I read that > > > expiration=0 > > > > is not configuration dependent) > > > > > > > > we run 5.15.3 > > > > > > > > cheers, > > > > Ilya Shipitsin > > > > > > > > > >
Re: failed to start ActiveMQ
I'm not understanding. Are you saying that after those durable subscriptions were deleted, there were no more unconsumed messages and so the journal files should have been deleted but were not? If I've understood correctly, http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html will let you determine why the files are being kept. (Is there anything in the DLQ?) Tim On Wed, Apr 4, 2018, 4:24 AM norinos wrote: > I tried deleting db.data and db.redo files, and starting up ActiveMQ. > This try succeeded.(ActiveMQ start successfully, and recreated db.data and > db.redo) > > But when the offline durable subscription cleanup is started, journal file > could not be deleted. > > The following message was logged to the activemq.log sometimes, and it was > no longer logged at the end. > In this state, the client cannot subscribe message. > > > > 2018-04-04 12:39:45,600 | INFO | Destroying durable subscriber due to > inactivity: > > DurableTopicSubscription-OjPgmaDmsor4oyyGGONiR18:EXACTLY_ONCE:OjPgmaDmsor4oyyGGONiR18, > id=OFFLINE:1:6137, active=false, destinations=1, total=0, pending=0, > dispatched=0, inflight=0, prefetchExtension=0 | > org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber > Cleanup Timer > 2018-04-04 12:41:06,857 | INFO | Destroying durable subscriber due to > inactivity: > > DurableTopicSubscription-xOyHFpTRyeCRG5Ymjszmi6Q:EXACTLY_ONCE:xOyHFpTRyeCRG5Ymjszmi6Q, > id=OFFLINE:1:3439, active=false, destinations=1, total=3, pending=3, > dispatched=0, inflight=0, prefetchExtension=0 | > org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber > Cleanup Timer > 2018-04-04 12:45:14,792 | INFO | Destroying durable subscriber due to > inactivity: > > DurableTopicSubscription-EySMl5bvJ5Ae2HZNqPmoizn:EXACTLY_ONCE:EySMl5bvJ5Ae2HZNqPmoizn, > id=OFFLINE:1:3459, active=false, destinations=1, total=0, pending=0, > dispatched=0, inflight=0, prefetchExtension=0 | > org.apache.activemq.broker.region.TopicRegion |
ActiveMQ Durable Subscriber > Cleanup Timer
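The page Tim links explains why log files remain. The core of it can be expressed as a reference rule: a data file can only go once nothing in it is still needed. The model below is deliberately simplified. It only tracks message references and the file currently being written. KahaDB's real cleanup also accounts for acks, transactions and other cross-file references, so treat this as an illustration of the reasoning. File numbers and message IDs are hypothetical. The practical consequence is that a single still-referenced message, for example one sitting in the DLQ or pending for a surviving durable subscription, pins its whole file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified model of journal-file cleanup (NOT KahaDB's actual GC, which also tracks
// acks, transactions and other cross-file references).
class JournalGcModel {
    // A data file may be deleted only if it is not the file currently being written
    // and none of its messages is still referenced (queue, durable subscription, DLQ...).
    static List<Integer> deletableFiles(Map<Integer, Set<String>> messagesByFile,
                                        Set<String> stillReferenced,
                                        int currentWriteFile) {
        List<Integer> deletable = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> e : new TreeMap<>(messagesByFile).entrySet()) {
            if (e.getKey() == currentWriteFile) {
                continue; // the active file is never a candidate
            }
            if (Collections.disjoint(e.getValue(), stillReferenced)) {
                deletable.add(e.getKey()); // nothing in this file is still needed
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> files = Map.of(
                1, Set.of("m1", "m2"),
                2, Set.of("m3"),
                3, Set.of("m4"));
        // One pending message (m2) pins all of file 1; file 3 is still being written.
        System.out.println(deletableFiles(files, Set.of("m2"), 3)); // [2]
    }
}
```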
Re: expired messages --> DLQ ? (when expiration=0)
If you think that message expiration should be checked when the publisher publishes the message, you can submit an enhancement request in JIRA for it. Tim On Tue, Apr 3, 2018, 11:15 PM Илья Шипицин wrote: > Tim, thank you for your investigation (looks like our client is buggy, > we'll look at it). > besides that, doesn't it look strange, accept message and drop it right > away ? is there a possibility to reject such messages ? > it will inform client in a better way > > 2018-04-04 9:12 GMT+05:00 Tim Bain : > > > Ilya, > > > > The output you're seeing is from the message as it's being put onto the > > DLQ. (destination=queue://DLQ.EdiInbound) This means that it has already > > passed through the set of transformations that occur in > > org.apache.activemq.broker.region.RegionBroker.sendToDeadLetterQueue() > and > > RegionBroker.stampAsExpired(), which set the originalExpiration property > to > > the value of expiration on the original message and then set expiration > to > > the DLQ's expiration value (i.e. 0). > > > > What that means is that although you think you've set expiration to 0, > you > > haven't successfully done that; you can see from both the > > originalExpiration property and the message of the Throwable in the > > dlqDeliveryFailureCause that the message's expiration value was > > 1522739054119, > > which is why this message got expired to the DLQ. > > > > So if you're attempting to disable message expiration for this message, > you > > need to look at how you're doing that to figure out what you're doing > > that's not properly configured. > > http://activemq.apache.org/how-do-i-set-the-message-expiration.html > might > > be relevant, if you haven't already seen it. > > > > Tim > > > > On Tue, Apr 3, 2018 at 7:22 AM, Илья Шипицин > wrote: > > > > > hello, > > > > > > activemq.log:2018-04-03 10:42:57,961 | INFO | preProcessDispatch: > > > MessageDispatch {commandId = 0, responseRequired = false, consumerId = > > > ID:dd-amq-app01.
> > > sd.kontur.ru-32887-1522738593874-4:1:1:1, destination = > > > queue://DLQ.EdiInbound, message = ActiveMQTextMessage {commandId = 5, > > > responseRequired = true, mes > > > sageId = ID:vm-dc-test8-50009-636572273611169908-4:1826:1:1:1, > > > originalDestination = queue://EdiInbound, originalTransactionId = null, > > > producerId = ID:vm- > > > dc-test8-50009-636572273611169908-4:1826:1:1, destination = > > > queue://DLQ.EdiInbound, transactionId = null, expiration = 0, > timestamp = > > > 1522739053119, arriv > > > al = 0, brokerInTime = 1522739055628, brokerOutTime = 1522739053126, > > > correlationId = null, replyTo = null, persistent = true, type = null, > > > priority = 4, g > > > roupID = null, groupSequence = 0, targetConsumerId = null, compressed = > > > false, userID = null, content = > > > org.apache.activemq.util.ByteSequence@552349c1, ma > > > rshalledProperties = null, dataStructure = null, redeliveryCounter = 0, > > > size = 1263, properties = {originalExpiration=1522739054119, > > > EDI_CORRELATION_ID=21 > > > 008, EDI_FILE_NAME=15.xml, EDI_DOC_TYPE=EDI_DOC_TYPE, > > > BrokerPath=localhost,localhost, > > > dlqDeliveryFailureCause=java.lang.Throwable: Message Expired. Expira > > > tion:1522739054119}, readOnlyProperties = false, readOnlyBody = false, > > > droppable = false, jmsXGroupFirstForConsumer = false, text = > > version="1.0" en > > > coding="UTF-8"?> > > > > > > > > > as I can see, "originalDestination = queue://EdiInbound" - so, we tried > > to > > > deliver to EdiInbound > > > expiration is 0 (we did not set it, it's default): expiration = 0 > > > > > > as we can see, message was expired and delivered to DLQ. it was not > > > intended behaviour. > > > also, documentation states that expiration = 0 means "no expiration" > > > > > > > > > please, explain me. I do not understand how expiration=0 lead to DLQ > > > actually. > > > > > > configuration is pretty generic, nothing special. 
I'll provide > > > configuration if needed (also, from documentation I read that > > expiration=0 > > > is not configuration dependent) > > > > > > we run 5.15.3 > > > > > > cheers, > > > Ilya Shipitsin > > > > > >
Re: failed to start ActiveMQ
I tried deleting the db.data and db.redo files and starting up ActiveMQ. This attempt succeeded (ActiveMQ started successfully and recreated db.data and db.redo). But when the offline durable subscription cleanup started, the journal files could not be deleted. The following messages were logged to activemq.log several times, and eventually they stopped being logged. In this state, the client cannot receive the subscribed messages. 2018-04-04 12:39:45,600 | INFO | Destroying durable subscriber due to inactivity: DurableTopicSubscription-OjPgmaDmsor4oyyGGONiR18:EXACTLY_ONCE:OjPgmaDmsor4oyyGGONiR18, id=OFFLINE:1:6137, active=false, destinations=1, total=0, pending=0, dispatched=0, inflight=0, prefetchExtension=0 | org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber Cleanup Timer 2018-04-04 12:41:06,857 | INFO | Destroying durable subscriber due to inactivity: DurableTopicSubscription-xOyHFpTRyeCRG5Ymjszmi6Q:EXACTLY_ONCE:xOyHFpTRyeCRG5Ymjszmi6Q, id=OFFLINE:1:3439, active=false, destinations=1, total=3, pending=3, dispatched=0, inflight=0, prefetchExtension=0 | org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber Cleanup Timer 2018-04-04 12:45:14,792 | INFO | Destroying durable subscriber due to inactivity: DurableTopicSubscription-EySMl5bvJ5Ae2HZNqPmoizn:EXACTLY_ONCE:EySMl5bvJ5Ae2HZNqPmoizn, id=OFFLINE:1:3459, active=false, destinations=1, total=0, pending=0, dispatched=0, inflight=0, prefetchExtension=0 | org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber Cleanup Timer