[jira] [Commented] (AMQ-5077) Improve performance of ConcurrentStoreAndDispatch
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985636#comment-13985636 ] Gary Tully commented on AMQ-5077:

On the store write thread delay: currently a single thread pulls from the pending async writes queue (so long as there is >= 1 pending write), so there is an implicit delay in many concurrent producers relying on a single thread. The difficulty was getting multiple writes queued up; a single connection can now queue up writes. To improve on this, use multiple concurrent producers.

Improve performance of ConcurrentStoreAndDispatch

Key: AMQ-5077
URL: https://issues.apache.org/jira/browse/AMQ-5077
Project: ActiveMQ
Issue Type: Wish
Components: Message Store
Affects Versions: 5.9.0
Environment: 5.9.0.redhat-610343
Reporter: Jason Shepherd
Assignee: Gary Tully
Attachments: Test combinations.xlsx, compDesPerf.tar.gz, topicRouting.zip

We have publishers publishing to a topic which has 5 topic -> queue routings, and the maximum attainable message rate is ~833 messages/sec, with each message around 5k in size. To test this I set up a JMS config with topic queues Topic TopicRouted.1 ... TopicRouted.11. Each topic has an increasing number of routings to queues, and a client is set up to subscribe to all the queues. Rough message rates:

routings  messages/sec
0         2500
1         1428
2         2000
3         1428
4
5         833

This occurs whether producerFlowControl is set to true or false, and with KahaDB disk syncing turned off. We also tried experimenting with concurrentStoreAndDispatch, but that didn't seem to help. LevelDB didn't give any notable performance improvement either. We also have asyncSend enabled on the producer, and have a requirement to use persistent messages. We have also experimented with sending messages in a transaction, but that hasn't really helped.
It seems like the producer throughput rate across all queue destinations, all connections and all publisher machines is limited by something on the broker, through a mechanism which is not producer flow control. I think the prime suspect is still contention on the index. We did some tests with the YourKit profiler. The profiler was attached to the broker at startup and allowed to run, and then a topic publisher was started, routing to 5 queues. Profiler statistics were reset, the publisher was allowed to run for 60 seconds, and then a profiling snapshot was taken. During that time, ~9600 messages were logged as sent, for a rate of ~160/sec. This ties in roughly with the invocation counts recorded in the snapshot (I think): ~43k calls. From what I can work out in the snapshot (filtering everything but org.apache.activemq.store.kahadb), for the 60 second sample period:

- 24.8 seconds elapsed in org.apache.activemq.store.kahadb.KahaDbTransactionStore$1.removeAsyncMessage(ConnectionContext, MessageAck)
- 18.3 seconds elapsed in org.apache.activemq.store.kahadb.KahaDbTransactionStore$1.asyncAddQueueMessage(ConnectionContext, Message, boolean)

From these, a further large portion of the time is spent inside MessageDatabase:

- org.apache.activemq.store.kahadb.MessageDatabase.process(KahaRemoveMessageCommand, Location): 10 secs elapsed
- org.apache.activemq.store.kahadb.MessageDatabase.process(KahaAddMessageCommand, Location): 8.5 secs elapsed

As both of these lock on indexLock.writeLock(), and both take place on the NIO transport threads, I think this accounts for at least some of the message throughput limits. As messages are added and removed from the index one by one, regardless of sync type settings, this adds a fair amount of overhead. While we're not synchronising on writes to disk, we are performing work on the NIO worker thread which can block on locks, and this could account for the behaviour we've seen client side.

To Reproduce:
1. Install a broker and use the attached configuration.
2. Use the 5.8.0 example ant script to consume from the queues TopicQueueRouted.1 - 5, e.g.:
ant consumer -Durl=tcp://localhost:61616 -Dsubject=TopicQueueRouted.1 -Duser=admin -Dpassword=admin -Dmax=-1
3. Use the modified version of the 5.8.0 example ant script (attached) to send messages to the topics TopicRouted.1 - 5, e.g.:
ant producer -Durl='tcp://localhost:61616?jms.useAsyncSend=true&wireFormat.tightEncodingEnabled=false&keepAlive=true&wireFormat.maxInactivityDuration=6&socketBufferSize=32768' -Dsubject=TopicRouted.1 -Duser=admin -Dpassword=admin -Dmax=1 -Dtopic=true -DsleepTime=0 -DmessageSize=5000
This modified version of the script prints the number of messages per second to the console.

-- This message was sent by Atlassian JIRA
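The single store-writer behaviour Gary describes in the comment at the top of this message can be sketched roughly as follows. This is an illustration only; the class and method names are hypothetical, not ActiveMQ's internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Many producer/transport threads enqueue async writes, but a single
// store thread drains the queue, so every write that accumulates while
// one batch is in progress is coalesced into the next batch (and the
// next disk sync). All names here are illustrative.
class SingleWriterJournal {
    final BlockingQueue<byte[]> pending = new LinkedBlockingQueue<>();
    int syncsPerformed = 0;

    // Called from any producer thread: just enqueue the write.
    void asyncWrite(byte[] payload) {
        pending.add(payload);
    }

    // Called only from the single store thread: one disk sync covers
    // every write that accumulated since the previous drain.
    int drainAndSync() {
        List<byte[]> batch = new ArrayList<>();
        pending.drainTo(batch);
        if (!batch.isEmpty()) {
            syncsPerformed++; // stand-in for a single fsync over the batch
        }
        return batch.size();
    }
}
```

With many concurrent producers the queue gains depth between drains, so each sync amortises over more messages; with one slow producer each drain sees at most one write, which is the implicit delay described above.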
[jira] [Commented] (AMQ-5077) Improve performance of ConcurrentStoreAndDispatch
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978299#comment-13978299 ] Gary Tully commented on AMQ-5077:

[~rwagg] I added a concurrentSend option to the composite destination and this reduces the latency because the writes can be batched.
{code}<compositeTopic name="TopicRouted.5.Embedded" forwardOnly="true" concurrentSend="true"/>{code}
Note, concurrentStoreAndDispatch gets in the way when there are no consumers, so for the graph/test [1] it is disabled. With concurrentStoreAndDispatch enabled, the pending write queue can build up some depth, which allows fast consumers to negate the write.

changes: http://git-wip-us.apache.org/repos/asf/activemq/commit/08bb172f
[1] http://www.chartgo.com/get.do?id=7f99485050
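A minimal sketch of the parallel fanout that a concurrentSend-style option enables: one inbound message is forwarded to every routed destination concurrently, so the per-destination store writes can overlap and be batched by the journal. The names are illustrative, not the CompositeDestinationFilter code; the string concatenation stands in for a real store-and-dispatch call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical parallel composite fanout.
class ConcurrentFanout {
    // Forwards one message to every destination concurrently and waits
    // for all forwards to complete before returning.
    static List<String> send(String message, List<String> destinations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, destinations.size()));
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String dest : destinations) {
                tasks.add(() -> dest + ":" + message); // stand-in for store-and-dispatch
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) { // blocks until all done
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because all forwards are in flight at once, the slowest destination bounds the latency of the whole fanout, rather than the sum of all destinations as with a serial loop.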
[jira] [Commented] (AMQ-5077) Improve performance of ConcurrentStoreAndDispatch
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956461#comment-13956461 ] Gary Tully commented on AMQ-5077:

Ahh sorry, you are correct: the CompositeDestinationFilter is taking on the fanout send and it ignores the producerWindow altogether. So I think there are two ways to improve this in a derivative CompositeDestinationFilter (a new one can be provided in xml config) that will still have the sends pending and will avoid the need for a routing topic:
1) introduce an executor that can forward in parallel, so we can make better use of concurrent store and dispatch and batching to disk for the composite fanout.
2) respect the producerWindow for the executor queue, such that lots of pending sends can accumulate, allowing producer bursts up to a limit.
[jira] [Commented] (AMQ-5077) Improve performance of ConcurrentStoreAndDispatch
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956570#comment-13956570 ] Richard Wagg commented on AMQ-5077:

Hi, I'm happy that those 2 changes would help the producer in not getting blocked by the JMS. It is also likely to improve throughput to the consumers, but I think we could make further improvements there. A new executor using concurrent store and dispatch is likely to result in more messages in flight to the consumers, but I'm still concerned that the performance of this (the ultimate queue writes/sec the broker can achieve) will be hard to determine, as it will be a function of how quick the underlying disk store is as well as the average round-trip time for the consumer to receive, process and ACK each message. If we have a large queue of messages being written to the disk store, and relatively quick consumers, then we can optimise away the disk writes, but I'm not sure that this is visible at the moment. I would still be interested in some form of delay queue for the disk store writes: a configurable property for a minimum delay to wait before writing messages through to the index/disk store, which you could benchmark against your expected consumer ACK round-trip time to determine whether you expect to be able to optimise away the disk writes completely in most situations. If this delay is small enough, and the producer window doesn't get decremented until either the disk store write completes or the consumer ACK arrives, we could still have some resilience against message loss with this in place. Do you think that would be a useful option, or would it cause more problems than it could solve?
Thanks, Richard
[jira] [Commented] (AMQ-5077) Improve performance of ConcurrentStoreAndDispatch
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955378#comment-13955378 ] Richard Wagg commented on AMQ-5077:

Calling connectionFactory.setProducerWindowSize() with sizes varying from 10k to 10Mb has no effect on the throughput I can attain. All stack traces I take of the producer catch it in code like:
{noformat}
"main" prio=10 tid=0x0bc3b000 nid=0x4109 runnable [0x41ebe000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
	at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115)
	at java.io.DataOutputStream.flush(DataOutputStream.java:123)
	at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:176)
	at org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:304)
	at org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:286)
	at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85)
	at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104)
	at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:658)
	- locked <0x00050f60c5e8> (a java.lang.Object)
	at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
	at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
	at org.apache.activemq.ActiveMQConnection.doAsyncSendPacket(ActiveMQConnection.java:1321)
	at org.apache.activemq.ActiveMQConnection.asyncSendPacket(ActiveMQConnection.java:1315)
	at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1853)
	- locked <0x00050f60c668> (a java.lang.Object)
	at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:289)
	at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:224)
	at org.apache.activemq.ActiveMQMessageProducerSupport.send(ActiveMQMessageProducerSupport.java:269)
{noformat}
My understanding of flow control and the producer window size:

Client side:
- The window size is set.
- Before each send, the current size of all messages in flight is checked to see if the window is exceeded.
- If producerWindow.waitForSpace() doesn't block, then the message is sent.
- After the message is sent, the producer in-flight size is incremented by the message size (and decremented when the ACK is received).

Broker side:
- Each queue has a memory limit set, as well as an overall memory limit and disk store limit.
- For each message dispatched to a given queue, each of these limits is checked.
- If any limit is hit and sendFailIfNoSpace is set to true, the producer should get an exception sent back.

In none of my tests have I caught any thread stuck inside the flow control handling logic. In all cases they're inside network code: producer side as above, broker side in something like:
{noformat}
"ActiveMQ NIO Worker 29" daemon prio=10 tid=0x1775d000 nid=0x6a0d runnable [0x4473d000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x000638b62430> (a org.apache.activemq.store.kahadb.KahaDBStore$StoreQueueTask$InnerFutureTask)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
	at java.util.concurrent.FutureTask.get(FutureTask.java:187)
	at org.apache.activemq.broker.region.Queue.doMessageSend(Queue.java:942)
	at org.apache.activemq.broker.region.Queue.send(Queue.java:727)
	at org.apache.activemq.broker.region.AbstractRegion.send(AbstractRegion.java:395)
	at org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:441)
	at org.apache.activemq.broker.jmx.ManagedRegionBroker.send(ManagedRegionBroker.java:297)
	at org.apache.activemq.broker.region.virtual.CompositeDestinationFilter.send(CompositeDestinationFilter.java:86)
	at org.apache.activemq.broker.region.AbstractRegion.send(AbstractRegion.java:395)
	at org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:441)
	at org.apache.activemq.broker.jmx.ManagedRegionBroker.send(ManagedRegionBroker.java:297)
	at org.apache.activemq.broker.CompositeDestinationBroker.send(CompositeDestinationBroker.java:96)
	at org.apache.activemq.broker.TransactionBroker.send(TransactionBroker.java:307)
{noformat}
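The client-side window bookkeeping in the bullets above can be modelled with a counting semaphore. This is a simplified sketch, not ActiveMQ's actual producer window implementation; the class and method names are illustrative:

```java
import java.util.concurrent.Semaphore;

// Hypothetical model of a producer window: permits represent the bytes
// the producer may still have in flight before send() blocks.
class ProducerWindow {
    private final Semaphore space; // permits = window size in bytes

    ProducerWindow(int windowSizeBytes) {
        space = new Semaphore(windowSizeBytes);
    }

    // Before each send: blocks (the "waitForSpace" step) if the
    // in-flight bytes would exceed the window.
    void beforeSend(int messageSize) throws InterruptedException {
        space.acquire(messageSize);
    }

    // On broker ACK: the in-flight size is decremented, which can
    // unblock a waiting sender.
    void onAck(int messageSize) {
        space.release(messageSize);
    }

    int available() { return space.availablePermits(); }
}
```

This also shows why a larger window doesn't help here: if the send is stalled inside the socket write itself, the producer never even reaches the window check for the next message.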
[jira] [Commented] (AMQ-5077) Improve performance of ConcurrentStoreAndDispatch
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950866#comment-13950866 ] Gary Tully commented on AMQ-5077:

With the router, a persistent message to a topic with no durable consumers is pass-through, so the messages back up in the subscription pending dispatch. This is not unlike a send to the virtual topic with a very large producerWindow; in that case the messages will back up (to the window limit) pending send. In both cases the messages are pending in the broker memory, but in the producerWindow [1] case, a failover client may retain the messages pending a reply, so on failover it would resend. It is really a case of where to store the messages in memory and whether they need recovery.
[1] org.apache.activemq.ActiveMQConnectionFactory#setProducerWindowSize
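For reference, configuring the producer window named in [1] looks like the following. This is a configuration sketch, not tested against a broker; the URL and window size are examples:

```java
import org.apache.activemq.ActiveMQConnectionFactory;

public class WindowConfig {
    public static void main(String[] args) throws Exception {
        // Example broker URL; asyncSend is required for the window to matter.
        ActiveMQConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616?jms.useAsyncSend=true");
        // Up to this many bytes of unacknowledged sends may be in flight
        // (and retained for resend on failover) before send() blocks.
        factory.setProducerWindowSize(1024 * 1024);
        // Connections created from this factory apply the window.
    }
}
```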
[jira] [Commented] (AMQ-5077) Improve performance of ConcurrentStoreAndDispatch
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951061#comment-13951061 ] Richard Wagg commented on AMQ-5077:

Leaving aside the question of message loss on failover, we have 2 goals here:
- getting the maximum possible throughput to consumers
- never blocking/delaying a producer until the JMS hits an understood/visible limit (memory/disk store).

I need to come up with a better test case to see how a larger producerWindowSize affects this, but for the moment I don't believe it's working as we would want it to. Currently the limit on the rate at which we're able to deliver messages from producers to consumers is the speed at which the JMS can write/remove messages from the index disk store. This happens in such a way that producers block on the send() call. In org.apache.activemq.store.kahadb.KahaDBStore:
public Future<Object> asyncAddQueueMessage(final ConnectionContext context, final Message message)
public void removeAsyncMessage(ConnectionContext context, MessageAck ack)

My reading of the code is that messages can be dispatched before the store task has completed, and if the ACK arrives before the store completes, then the store operation is cancelled. This also implies that a message could be delivered without being written to disk. I'm not sure at what point in this process the producer receives the ACK. If the consumer were quick enough to receive, process and ACK the messages in question, then we'd optimise away the need to ever write to the disk store, and not have an issue here. However, our SAN speed, combined with network latency, seems to ensure that the disk writes are nearly always in progress before the ACK arrives. In this case, all the work to write/remove messages from the disk store index, and the synchronisation overheads of doing this, happen on the NIO worker threads. This delays the producers in a way that isn't visible to them: whether sending messages sync or async, all the producer code sees is that calling send() takes longer. My understanding is that increasing the producer send window would allow it to keep more messages in flight before it has received ACKs for them, but would not help when it's blocked at the network level. I'll see if I can come up with a more specific test case that shows the effect of varying the producer send window.

What I think we're looking for is some option along the lines of ConcurrentDispatchThenStoreIfNeeded: first dispatch the message, then wait for a timeout period, and only persist the message to the disk store, incurring the disk/synchronisation penalties, if an ACK doesn't arrive in time. This would use a low (100ms?) timeout, respect memory limits on the broker for total messages in flight, and would allow the producer send rate to scale with the slowest consumer receive speed, rather than the sum of all queue writes possible on the JMS.

Current behaviour:
- Producer -> broker with topic -> queue routings -> consumers: producer is blocked by the speed at which the broker can write to all queues. Consumers receive messages at the speed the JMS can write. The queue write limit is global.
- Producer -> broker with embedded routing bean -> consumers (router waits for the send call to complete before acking the message received): producer is able to write up to the producer window size. The embedded bean is blocked by broker write speed; consumers receive messages at the speed the JMS can write. The queue write limit is global.

Ideal situation:
- Producer -> broker with topic -> queue routings and ConcurrentDispatchThenStoreIfNeeded set: producer is able to write up to the producer window size. The broker is able to dispatch to consumers at the consumer receive rate limit, only writing to disk if consumers slow down.

Does that make sense, or do you think I'm misunderstanding the issue here?
Thanks, Richard
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915908#comment-13915908 ] Gary Tully commented on AMQ-5077:
---
Some thoughts, without having looked at the particulars. For async producers: are you setting a producerWindow? That should allow pending sends to accumulate broker side. I think the root problem is ack contention for the index lock; some sort of ack batching would help there. One other thought: for composite destinations we could send to each destination in parallel. I don't think we do at the moment.

Improve performance of ConcurrentStoreAndDispatch
Key: AMQ-5077
URL: https://issues.apache.org/jira/browse/AMQ-5077
Project: ActiveMQ
Issue Type: Wish
Components: Message Store
Affects Versions: 5.9.0
Environment: 5.9.0.redhat-610343
Reporter: Jason Shepherd
Priority: Minor
Attachments: Test combinations.xlsx, compDesPerf.tar.gz

We have publishers publishing to a topic which has 5 topic-to-queue routings, and the maximum message rate attainable is ~833 messages/sec, with each message around 5k in size. To test this I set up a JMS config with topics TopicRouted.1 ... TopicRouted.11. Each topic has an increasing number of routings to queues, and a client is set up to subscribe to all the queues. Rough message rates:

routings  messages/sec
0         2500
1         1428
2         2000
3         1428
4
5         833

This occurs whether the broker config has producerFlowControl set to true or false, and with KahaDB disk syncing turned off. We also tried experimenting with concurrentStoreAndDispatch, but that didn't seem to help. LevelDB didn't give any notable performance improvement either. We also have asyncSend enabled on the producer, and have a requirement to use persistent messages. We have also experimented with sending messages in a transaction, but that hasn't really helped.
It seems like producer throughput across all queue destinations, all connections and all publisher machines is limited by something on the broker, through a mechanism which is not producer flow control. I think the prime suspect is still contention on the index. We did some tests with the YourKit profiler. The profiler was attached to the broker at startup and allowed to run, and then a topic publisher was started, routing to 5 queues. Profiler statistics were reset, the publisher was allowed to run for 60 seconds, and then a profiling snapshot was taken. During that time, ~9600 messages were logged as being sent, for a rate of ~160/sec. This ties in roughly with the invocation counts recorded in the snapshot (I think): ~43k calls. From what I can work out in the snapshot (filtering everything but org.apache.activemq.store.kahadb), for the 60 second sample period:

- 24.8 seconds elapsed in org.apache.activemq.store.kahadb.KahaDBTransactionStore$1.removeAsyncMessage(ConnectionContext, MessageAck).
- 18.3 seconds elapsed in org.apache.activemq.store.kahadb.KahaDBTransactionStore$1.asyncAddQueueMessage(ConnectionContext, Message, boolean).

From these, a further large portion of the time is spent inside MessageDatabase:

- org.apache.activemq.store.kahadb.MessageDatabase.process(KahaRemoveMessageCommand, Location): 10 secs elapsed.
- org.apache.activemq.store.kahadb.MessageDatabase.process(KahaAddMessageCommand, Location): 8.5 secs elapsed.

As both of these lock on indexLock.writeLock(), and both take place on the NIO transport threads, I think this accounts for at least some of the message throughput limits. As messages are added and removed from the index one by one, regardless of sync settings, this adds a fair amount of overhead. While we're not synchronising on writes to disk, we are performing work on the NIO worker thread which can block on locks, which could account for the behaviour we've seen client side.

To Reproduce:
1. Install a broker and use the attached configuration.
2. Use the 5.8.0 example ant script to consume from the queues TopicQueueRouted.1 - 5, e.g.:
ant consumer -Durl=tcp://localhost:61616 -Dsubject=TopicQueueRouted.1 -Duser=admin -Dpassword=admin -Dmax=-1
3. Use the modified version of the 5.8.0 example ant script (attached) to send messages to topics TopicRouted.1 - 5, e.g.:
ant producer -Durl='tcp://localhost:61616?jms.useAsyncSend=true&wireFormat.tightEncodingEnabled=false&keepAlive=true&wireFormat.maxInactivityDuration=6&socketBufferSize=32768' -Dsubject=TopicRouted.1 -Duser=admin -Dpassword=admin -Dtopic=true -DsleepTime=0 -Dmax=1 -DmessageSize=5000
This modified version of the script computes the number of messages per second and prints it to the console.
--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915975#comment-13915975 ] Richard Wagg commented on AMQ-5077:
---
Hi, we weren't setting a producer window before, so the default was used. I've tried some quick tests now: setting abnormally high/low sizes has no effect (1k, 10k, 100k, 1mb, 10mb). Is this used if flow control is disabled? Client side, we've seen calls blocked at the network level rather than in any AMQ-specific code; I think that regardless of the window size, it hasn't been hit at the time the network call blocks. I've been using mostly NMS clients for the testing, as we'd initially thought the problem was in the NMS library; I'll get a Java test set up next week and go through the code path it takes and where the time is spent waiting.

I think we can either say that singular acks contending on the index lock are a problem, or that we write the messages to the index too quickly in the first place. If we just want to handle the ideal case, where consumers are able to ACK the message near-instantaneously, then we could remove the need to write messages to the index and remove them in short order, just by delaying the writes by the roundtrip time from consumer to broker. It might reduce durability, but we've already accepted that tradeoff with other settings. So it makes sense to me to try to optimise away the index writes and contention, rather than attempt to batch up the acks, which might cause less contention but probably guarantees that the messages are written in the first place.
Thanks, Richard
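For reference, the producer window being discussed is a client-side setting. A sketch of the connection URI options, assuming the ActiveMQ 5.x Java client (the window value here is illustrative, not a recommendation):

```
tcp://localhost:61616?jms.useAsyncSend=true&jms.producerWindowSize=1024000
```

producerWindowSize is a per-producer cap, in bytes, on sends that have not yet been acknowledged by the broker, and it only takes effect for async sends. That may explain why varying it shows no effect here: if the bottleneck is broker-side index work surfacing as TCP back-pressure, the window is never the binding limit.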
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914756#comment-13914756 ] Richard Wagg commented on AMQ-5077:
---
Hi, attached are some more test configurations I've tried, and the results. Nothing seems to affect the throughput much, for better or worse. In none of the tests have the consumers had any message backlog. To try and sum up the problem:

- Under normal operation we want our producers to remain unblocked for as long as possible (flow control is fine, but to me that means the producer works uninterrupted up until a memory/disk limit is reached and then PFC kicks in, rather than a gradual degradation).
- Clients all run in CLIENT_ACKNOWLEDGE mode; messages are ACKed one by one.
- Both the diskstore and the network are relatively quick; tests running against topics show a roundtrip time of ~0.740ms (producer - broker - consumer - broker - producer reply).
- The ability of the producers to send messages is currently limited by some TCP-level limitation, due to the amount of work the broker is doing on its receive threads.
- The observed behaviour in producer code is that, whether sending sync or async, the call to producer.send() just blocks; so even async sends are affected by broker throughput, in a manner which isn't flow control.
- In none of our tests have the consumers ever had a large pending message count; the blocking factor is not consumer speed or memory/queue limits.
- Taking thread dumps throughout these tests, we can see contention around the synchronised access points in MessageDatabase: the process() methods taking KahaAddMessageCommand and KahaRemoveMessageCommand both lock around the page index.
- Any option to batch consumer acks might reduce the number of single-message removals, but would also delay the ACKs to the point where more is written to the store.
- Some options in the broker (ConcurrentStoreAndDispatch) are supposed to allow optimisations in this scenario, but appear to have limited or no effect.

I think there are 2 candidates for the root problem:

1. Disk writes are too quick (or too many disk writes are allowed to be in progress): by the time the roundtrip from broker to consumer and back completes, the KahaDB write is already done or in progress.
2. Thread contention stops the ACKs arriving in time to prevent the diskstore writes from happening, negating the benefit of concurrentStoreAndDispatch from a performance point of view (clients might receive messages quicker, but the broker still has to add and remove each message serially from the diskstore).

I think the second is most likely: we're effectively doing a lot of disk-based work that we don't need to do, just because the consumer ACKs aren't coming back quickly enough, or can't be received before the disk write is in progress. That causes a double hit, first on thread synchronisation and then again at the disk level, adding and removing the same message inside a small timeframe. Ideas welcome for configurations to try or areas to look at. Things I'm going to try:

- Add some debug logging to see the queue length of the asyncQueueJobQueue in KahaDBStore.
- Change the client ack mode: optimizeAcknowledge, DUPS_OK_ACKNOWLEDGE etc. I don't think it'll have much effect, but it's something else to rule out.

I would be interested in trying any code which would put the disk writes on a delay queue: dispatch straight away, and only write to KahaDB if the ACK hasn't arrived after a configurable interval. I'm still not sure that this much work should be done on the NIO send threads; even if the contention on the KahaDB store has to happen, I would expect requests to be allowed to queue up in memory until a PFC limit kicks in. Until that point I wouldn't expect producer send performance to be affected.
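The contention pattern summarised above — every add and every single-message ack taking indexLock.writeLock() on an NIO thread — is why ack batching came up earlier in the thread. A minimal self-contained sketch (made-up names, not ActiveMQ code) of why batching cuts lock traffic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration: removing messages from the index one by one takes the
// write lock once per message, while a batched remove takes it once
// per batch, amortising the lock hand-off cost across the batch.
class IndexSketch {
    private final ReentrantReadWriteLock indexLock = new ReentrantReadWriteLock();
    private final List<String> index = new ArrayList<>();
    int lockAcquisitions = 0;

    void add(String messageId) {
        indexLock.writeLock().lock();
        lockAcquisitions++;
        try {
            index.add(messageId);
        } finally {
            indexLock.writeLock().unlock();
        }
    }

    void removeOneByOne(List<String> messageIds) {
        for (String id : messageIds) {
            indexLock.writeLock().lock(); // contended once per message
            lockAcquisitions++;
            try {
                index.remove(id);
            } finally {
                indexLock.writeLock().unlock();
            }
        }
    }

    void removeBatched(List<String> messageIds) {
        indexLock.writeLock().lock(); // contended once per batch
        lockAcquisitions++;
        try {
            index.removeAll(messageIds);
        } finally {
            indexLock.writeLock().unlock();
        }
    }
}
```

The tradeoff Richard notes still applies: batching delays the individual acks, so more messages get written to the store before an ack has a chance to cancel the pending write.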
Diskstore performance for comparison:

[activemq@londngnfrjms01 lib]$ /opt/java/x64/jdk1.7.0_51/bin/java -cp activemq-kahadb-store-5.9.0.redhat-610350.jar org.apache.activemq.store.kahadb.disk.util.DiskBenchmark /jms_vsp/activemq-rh590/data/test.dat
Benchmarking: /jms_vsp/activemq-rh590/data/test.dat
Writes: 1023993 writes of size 4096 written in 10.69 seconds. 95789.805 writes/second. 374.17892 megs/second.
Sync Writes: 49746 writes of size 4096 written in 10.001 seconds. 4974.1025 writes/second. 19.430088 megs/second.
Reads: 5468429 reads of size 4096 read in 10.001 seconds. 546788.25 writes/second. 2135.8916 megs/second.
Re: [jira] [Commented] (AMQ-5077) Improve performance of ConcurrentStoreAndDispatch
Did I read that correctly - there's a possible concern that disk writes are too fast? -- View this message in context: http://activemq.2283324.n4.nabble.com/jira-Commented-AMQ-5077-Improve-performance-of-ConcurrentStoreAndDispatch-tp4678395p4678401.html Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914140#comment-13914140 ] Jason Shepherd commented on AMQ-5077:
---
This issue is also logged in the enterprise 6.1 branch here: https://issues.jboss.org/browse/ENTMQ-569