Re: System stalling
> Thanks, Jimmy. I have been looking into this issue a little more. I
> couldn't exactly duplicate your numbers as my test machine did not have
> sufficient memory, but I believe I have identified the key symptom (JIRA
> updated accordingly), though as yet not the root cause.
>
> As noted in the JIRA, it may be possible to tune your receivers to
> mitigate the issue. How feasible that is probably depends on how closely
> your real system follows the test scenario in the JIRA. For large
> messages, reducing the capacity seems to be the most effective
> improvement. As message size decreases, acknowledging in larger batches
> becomes more effective.

I tried lowering the capacity in the (admittedly extreme) testcase, and the issue was only resolved when I put it down to 1, which I can't do on my live systems as the performance would be too poor. However, as noted below, I've rerouted most of my large messages, which is okay for the time being.

> One other question was just to confirm that the case as reported does
> match your real system. Initially there was a suspicion that the ingest
> process was blocked on send, which I think would be a different issue.

I'm pretty sure this is causing at least some of my problems, as I've rerouted the main culprit of the large messages to a second broker, and now the main broker is much happier. The second broker exhibits the performance problems above, despite not having very many messages to process. However, I think I'm still seeing some sends taking >30s on the main broker, though a lot less frequently than before, and not causing ring queue overflows. That is possibly a separate issue, although it could well be caused by the poor IO performance of the VM it's running on, which should hopefully be resolved soon, so I will test further then.

> I'll do some more digging on what the root cause of the drop in
> throughput for large messages on a full ring queue might be and update
> the JIRA with any progress.

Thanks!

Jimmy
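For illustration, the receiver tuning suggested above (a lower capacity, acknowledging in larger batches) corresponds roughly to the following qpid::messaging C++ sketch. The broker URL, queue name, capacity and batch size are illustrative assumptions rather than values from the thread.

#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Duration.h>

using namespace qpid::messaging;

int main() {
    Connection connection("localhost:5672");  // illustrative broker URL
    connection.open();
    Session session = connection.createSession();

    // "ingest" is an illustrative queue name.
    Receiver receiver = session.createReceiver("ingest");

    // Lower the prefetch capacity; the thread suggests smaller values help
    // with large messages (1 removed the stall in the extreme testcase).
    receiver.setCapacity(10);

    const int batchSize = 100;  // acknowledge in larger batches for small messages
    int received = 0;
    Message msg;
    while (receiver.fetch(msg, Duration::SECOND * 5)) {
        // ... process msg ...
        if (++received % batchSize == 0) {
            session.acknowledge();  // acknowledge everything fetched so far
        }
    }
    session.acknowledge();
    connection.close();
    return 0;
}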
Re: Removing the bindings from qpid-cpp source tarball...
[X] Yes, remove the language bindings from qpid-cpp-${VER}

Haven't been able to compile the bindings from this package since 0.22 anyway!

Jimmy
Re: System stalling
> > Hi,
> >
> > I've finally managed to isolate the issue and can reproduce it with the
> > attached scripts. Running rx-test.pl followed by tx-test.pl results in a
> > system where the receiver can keep up with the producer (gets a message
> > every <1s) (tx-test 118% CPU, qpidd 97% CPU, rx-test 60% CPU). However, if
> > you stop rx-test and restart it (even after only a second or so), it starts
> > to take 2s+ to receive messages, going up to about 6s on my system, so the
> > ring quickly fills and overflows. Even if the producer is then stopped,
> > messages are still only received every 3s - with qpidd on 100% CPU and the
> > receiver on 5%. Also the resident size of qpidd reaches 5GB, yet the queue
> > is only 2GB.
> >
> > Hopefully I can now regain my sanity :)
>
> Well done! Unfortunately your scripts seem to have been stripped off at
> some stage. Could you attach them to a JIRA perhaps? This was with 0.22,
> right?

Created QPID-5135. Also wanted to thank everyone for their awesome help and support!

Jimmy
Re: System stalling
Hi,

I've finally managed to isolate the issue and can reproduce it with the attached scripts. Running rx-test.pl followed by tx-test.pl results in a system where the receiver can keep up with the producer (gets a message every <1s) (tx-test 118% CPU, qpidd 97% CPU, rx-test 60% CPU). However, if you stop rx-test and restart it (even after only a second or so), it starts to take 2s+ to receive messages, going up to about 6s on my system, so the ring quickly fills and overflows. Even if the producer is then stopped, messages are still only received every 3s - with qpidd on 100% CPU and the receiver on 5%. Also the resident size of qpidd reaches 5GB, yet the queue is only 2GB.

Hopefully I can now regain my sanity :)

Cheers,

Jimmy
Re: System stalling
> > Hi Ken,
> >
> > Had a play with oprofile... but seems to have lumped everything into glibc,
> > any ideas? The queue is setup as ring, max size 2GB.
>
> You probably need to install libstdc++-debuginfo & glibc-debuginfo to
> see more detail.

Right, installed debuginfo packages and now makes a little more sense:

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name              symbol name
6569     33.6337  libstdc++.so.6.0.8      __gnu_cxx::__atomic_add(int volatile*, int)
5117     26.1994  libstdc++.so.6.0.8      __gnu_cxx::__exchange_and_add(int volatile*, int)
2004     10.2606  libc-2.5.so             memcpy
1695      8.6785  libqpidbroker.so.2.0.0  void deque::_M_range_insert_aux<_Deque_iterator>(_Deque_iterator, _Deque_iterator, _Deque_iterator, forward_iterator_tag)
823       4.2138  libc-2.5.so             _int_malloc
Re: System stalling
> > Hi Ken,
> >
> > Had a play with oprofile... but seems to have lumped everything into glibc,
> > any ideas? The queue is setup as ring, max size 2GB.
>
> You probably need to install libstdc++-debuginfo & glibc-debuginfo to
> see more detail.

Will go and hunt them down for RHEL5. Also, on reflection, I should have said I had to use the timer interrupt as I'm on VMware. I'm wondering if the wall clock time attributed to glibc is actually time waiting on pthreads locks and epoll, which gets counted, whereas if I were using the hardware performance counters I'd get CPU time...

Jimmy
Re: System stalling
Hi Ted,

I don't have any flow control that I'm aware of. Will send the logs separately.

Cheers,

Jimmy

- Original Message -
From: Ted Ross
Sent: 09/06/13 02:02 PM
To: users@qpid.apache.org
Subject: Re: System stalling

Jimmy,

Do your ring queues have any flow-control configuration set up? This would be --flow-* thresholds in qpid-config. Also, it would be helpful to see the output of a pstack on the qpidd process when the condition occurs. I think almost everything happens under DispatchHandle::processEvent :)

-Ted

On 09/06/2013 09:50 AM, Jimmy Jones wrote:
> I've done some further digging, and managed to simplify the system a little
> to reproduce the problem. The system is now an external process that posts
> messages to the default headers exchange on my machine, which has a ring
> queue to receive effectively all messages from the default headers exchange,
> process them, and post to another headers exchange. There is now nothing
> listening on the subsequent headers exchange, and all exchanges are
> non-durable. I've also tried Fraser's suggestion of marking the link as
> unreliable on the queue which seems to have no effect (is there any way in
> the qpid utilities to confirm the link has been set to unreliable?)
>
> So essentially what happens is the system happily processes away, normally
> with an empty ring queue, sometimes it spikes up a bit and goes back down
> again, with my ingest process using ~70% CPU and qpidd ~50% CPU, on a machine
> with 8 CPU cores. However sometimes the queue spikes up to 2GB (the max),
> starts throwing messages away, and qpidd hits 100%+ CPU and the ingest process
> goes to about 3% CPU. I can see messages are being very slowly processed.
>
> I've tried attaching to qpidd with gdb a few times, and all threads apart
> from one seem to be idle in epoll_wait or pthread_cond_wait. The running
> thread always seems to be somewhere under DispatchHandle::processEvent.
>
> I'm at a bit of a loss for what I can do to fix this!
>
> Jimmy
>
> - Original Message -
> From: Fraser Adams
> Sent: 08/23/13 09:09 AM
> To: users@qpid.apache.org
> Subject: Re: System stalling
>
> Hi Jimmy, hope you are well!
>
> As an experiment one thing that you could try is messing with the link
> "reliability". As you know in the normal mode of operation it's
> necessary to periodically send acknowledgements from the consumer client
> application which then get passed back ultimately to the broker.
>
> I'm no expert on this but from my recollection if you are in a position
> particularly where circular queues are overflowing and you are
> continually trying to produce and consume and you have some fair level
> of prefetch/capacity on the consumer the mechanism for handling the
> acknowledgements on the broker is "sub-optimal" - I think it's a linear
> search or some such and there are conditions where catching up with
> acknowledgements becomes a bit "N squared".
>
> Gordon would be able to explain this way better than me - that's
> assuming this hypothesis is even relevant :-)
>
> Anyway if you try having a link: {reliability: unreliable} stanza in
> your consumer address string (as an example one of mine looks like the
> following - the address string syntax isn't exactly trivial :-)).
>
> string address = "test_consumer; {create: receiver, node: {x-declare:
> {auto-delete: True, exclusive: True, arguments: {'qpid.policy_type':
> ring, 'qpid.max_size': 1}}, x-bindings: [{exchange: 'amq.match',
> queue: 'test_consumer', key: 'test1', arguments: {x-match: all,
> data-format: test}}]}, link: {reliability: unreliable}}";
>
> Clearly your arguments would be different but hopefully it'll give you a
> kick start.
>
> The main down side of disabling link reliability is that if you have
> enabled prefetch and the consumer unexpectedly dies then all of the
> messages on the prefetch queue will be lost, whereas with reliable
> messaging the broker maintains references to all unacknowledged messages
> so would resend them (I *think* that's how it works.)
>
> At the very least it's a fairly simple tweak to your consumer addresses
> that might rule out (or point to) acknowledgement shenanigans as being
> the root of your problem. From my own experience I always end up blaming
> this first if I hit performance weirdness with ring queues :-)
>
> HTH,
> Frase
>
> On 21/08/13 17:08, Jimmy Jones wrote:
>>>>>> I've got a simple processing system using the 0.22 C++ broker, all
>>>>>> on one box, where an external system po
Re: System stalling
Hi Ken,

Had a play with oprofile... but seems to have lumped everything into glibc, any ideas? The queue is setup as ring, max size 2GB.

# qpid-stat -q
Queues
  queue  dur  autoDel  excl  msg  msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  ===================================================================================
  9d587baf-edc4-694b-bd43-f0accdf77a44:0.0  Y  Y  0  0  0  0  0  0  1  2
  ingest  Y  16.4k  49.9k  33.5k  2.13g  7.17g  5.04g  1  2

# opreport --long-filenames --session-dir=/root/oprof
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
          TIMER:0|
          samples|      %|
------------------
    78847 100.000 /usr/sbin/qpidd
          TIMER:0|
          samples|      %|
        ------------------
            53196 67.4674 /usr/lib64/libstdc++.so.6.0.8
            14220 18.0349 /usr/lib/libqpidbroker.so.2.0.0
             7368  9.3447 /lib64/libc-2.5.so
             3833  4.8613 /usr/lib/libqpidcommon.so.2.0.0
               93  0.1179 /usr/lib/libqpidtypes.so.1.0.0
               70  0.0888 /lib64/libpthread-2.5.so
               43  0.0545 /lib64/ld-2.5.so
               19  0.0241 /lib64/librt-2.5.so
                4  0.0051 /lib64/libuuid.so.1.2
                1  0.0013 /usr/sbin/qpidd

# opreport --demangle=smart --session-dir=/root/oprof --symbols `which qpidd`
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name              symbol name
53196    67.4674  libstdc++.so.6.0.8      /usr/lib64/libstdc++.so.6.0.8
10597    13.4400  libqpidbroker.so.2.0.0  void deque::_M_range_insert_aux<_Deque_iterator>(_Deque_iterator, _Deque_iterator, _Deque_iterator, forward_iterator_tag)
2922      3.7059  libqpidcommon.so.2.0.0  qpid::framing::AMQFrame::~AMQFrame()
2833      3.5930  libqpidbroker.so.2.0.0  deque::clear()
2486      3.1529  libc-2.5.so             _int_malloc
1882      2.3869  libc-2.5.so             _int_free
1757      2.2284  libc-2.5.so             malloc
589       0.7470  libc-2.5.so             memcpy
384       0.4870  libc-2.5.so             free
...

Cheers,

Jimmy

- Original Message -
From: Ken Giusti
Sent: 09/06/13 06:27 PM
To: users@qpid.apache.org
Subject: Re: System stalling

Hi Jimmy,

Have you ever used the oprofile tool before? http://oprofile.sourceforge.net/about/

I've found this tool useful when I need to get a sense of where the broker is spending its time, especially when it is compute-bound. You'll need to be able to install oprofile on the system that is running the broker, and you'd need root permission to run it.

The approach I take is to configure oprofile to analyze the broker, then perform whatever actions I need to get the broker into the compute bound state. Once the broker is acting up, I then trigger oprofile to start a capture. That results in a capture that best represents what the broker is doing when it is in that compute bound state.

It's been awhile since I used oprofile, but here's a summary of the commands I used last. First, the path to the broker executable for this example is /home/kgiusti/mrg/qpid/cpp/src/.libs/lt-qpidd. Be sure you're referencing the actual executable image and not the shell wrapper that autotools generates!

After starting the broker daemon, I delete any old oprofile configuration and captures. I then configure and start the oprofile daemon using the following commands (done as root):

$ rm -rf /root/oprof
$ rm -rf ~/.oprofile
$ opcontrol --shutdown
$ opcontrol --init
$ opcontrol --reset
$ opcontrol --setup --no-vmlinux --session-dir=/root/oprof --image=/home/kgiusti/mrg/qpid/cpp/src/.libs/lt-qpidd --separate=library --event=INST_RETIRED_ANY_P:6000:0:0:1 --cpu-buffer-size=100
$ opcontrol --start-daemon

Once that is done, you should try to reproduce the problem. Once the broker is in that weird state, start the capture:

$ opcontrol --start

Capture for a while, then stop the capture and dump the results:

$ opcontrol --stop
$ opreport --long-filenames --session-dir=/root/oprof

opreport will dump the methods where the broker is spending most of its compute time. You might need to also provide the paths to the link libraries, e.g.:

$ opreport --long-filenames --session-dir=/root/oprof -l /home/kgiusti/mrg/qpid/cpp/src/.libs/libqpidbroker.so.2.0.0

These notes are a bit old, and opcontrol/opreport's options may have changed a bit, but this should give you a general idea of how to use it.

-K

----- Original Message -
> From: "Jimmy Jones"
> To: users@qpid.apache.org
> Sent: Friday, September 6, 2013 9:50:17 AM
> Subject: Re: System stalling
>
> I've done some further digging, a
Re: System stalling
I've done some further digging, and managed to simplify the system a little to reproduce the problem. The system is now an external process that posts messages to the default headers exchange on my machine, which has a ring queue to receive effectively all messages from the default headers exchange, process them, and post to another headers exchange. There is now nothing listening on the subsequent headers exchange, and all exchanges are non-durable. I've also tried Fraser's suggestion of marking the link as unreliable on the queue, which seems to have no effect (is there any way in the qpid utilities to confirm the link has been set to unreliable?)

So essentially what happens is the system happily processes away, normally with an empty ring queue; sometimes it spikes up a bit and goes back down again, with my ingest process using ~70% CPU and qpidd ~50% CPU, on a machine with 8 CPU cores. However sometimes the queue spikes up to 2GB (the max), starts throwing messages away, and qpidd hits 100%+ CPU and the ingest process goes to about 3% CPU. I can see messages are being very slowly processed.

I've tried attaching to qpidd with gdb a few times, and all threads apart from one seem to be idle in epoll_wait or pthread_cond_wait. The running thread always seems to be somewhere under DispatchHandle::processEvent.

I'm at a bit of a loss for what I can do to fix this!

Jimmy

- Original Message -
From: Fraser Adams
Sent: 08/23/13 09:09 AM
To: users@qpid.apache.org
Subject: Re: System stalling

Hi Jimmy, hope you are well!

As an experiment one thing that you could try is messing with the link "reliability". As you know, in the normal mode of operation it's necessary to periodically send acknowledgements from the consumer client application, which then get passed back ultimately to the broker.

I'm no expert on this but from my recollection, if you are in a position particularly where circular queues are overflowing and you are continually trying to produce and consume, and you have some fair level of prefetch/capacity on the consumer, the mechanism for handling the acknowledgements on the broker is "sub-optimal" - I think it's a linear search or some such, and there are conditions where catching up with acknowledgements becomes a bit "N squared".

Gordon would be able to explain this way better than me - that's assuming this hypothesis is even relevant :-)

Anyway if you try having a link: {reliability: unreliable} stanza in your consumer address string (as an example one of mine looks like the following - the address string syntax isn't exactly trivial :-)).

string address = "test_consumer; {create: receiver, node: {x-declare: {auto-delete: True, exclusive: True, arguments: {'qpid.policy_type': ring, 'qpid.max_size': 1}}, x-bindings: [{exchange: 'amq.match', queue: 'test_consumer', key: 'test1', arguments: {x-match: all, data-format: test}}]}, link: {reliability: unreliable}}";

Clearly your arguments would be different but hopefully it'll give you a kick start.

The main down side of disabling link reliability is that if you have enabled prefetch and the consumer unexpectedly dies, then all of the messages on the prefetch queue will be lost, whereas with reliable messaging the broker maintains references to all unacknowledged messages so would resend them (I *think* that's how it works.)

At the very least it's a fairly simple tweak to your consumer addresses that might rule out (or point to) acknowledgement shenanigans as being the root of your problem. From my own experience I always end up blaming this first if I hit performance weirdness with ring queues :-)

HTH,
Frase

On 21/08/13 17:08, Jimmy Jones wrote:
>>>>> I've got a simple processing system using the 0.22 C++ broker, all
>>>>> on one box, where an external system posts messages to the default
>>>>> headers exchange, and an ingest process receives them using a ring
>>>>> queue, transforms them and outputs to a different headers exchange.
>>>>> Various other processes pick messages of interest off that exchange
>>>>> using ring queues. Recently however the system has been stalling -
>>>>> I'm still receiving lots of data from the other system, but the
>>>>> ingest process suddenly goes to <5% CPU usage and its queue fills up
>>>>> and messages start getting discarded from the ring, the follow on
>>>>> processes go to practically 0% CPU and qpidd hovers around 95-120%
>>>>> CPU (normally its ~75%) and the rest of the system pretty much goes
>>>>> idle (no swapping, there is free memory)
>>>>>
>>>>> I attached to the ing
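Fraser's suggestion boils down to adding a link: {reliability: unreliable} stanza to the consumer's address string. A minimal C++ sketch of a receiver using an address along those lines follows; the queue name, binding and qpid.max_size value are illustrative (adapted from Fraser's example), not the poster's real configuration.

#include <iostream>
#include <string>

#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Duration.h>

using namespace qpid::messaging;

int main() {
    // Address based on Fraser's example: a ring queue bound to amq.match,
    // consumed over an unreliable link (max_size here is illustrative).
    const std::string address =
        "test_consumer; {create: receiver, node: {x-declare: "
        "{auto-delete: True, exclusive: True, arguments: {'qpid.policy_type': ring, "
        "'qpid.max_size': 1000000}}, x-bindings: [{exchange: 'amq.match', "
        "queue: 'test_consumer', key: 'test1', arguments: {x-match: all, "
        "data-format: test}}]}, link: {reliability: unreliable}}";

    Connection connection("localhost:5672");  // illustrative broker URL
    connection.open();
    Session session = connection.createSession();
    Receiver receiver = session.createReceiver(address);

    Message msg;
    while (receiver.fetch(msg, Duration::SECOND * 10)) {
        std::cout << msg.getContent() << std::endl;
        // With an unreliable link the broker is not expected to hold messages
        // pending acknowledgement, which is the behaviour Fraser describes.
    }
    connection.close();
    return 0;
}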
Re: UTF8 / binary strings in dynamic languages
> 3. If the language string is an overloaded text/bytes type, as is
> regrettably quite common, what do we do then?
>
> The current answer to this question is "send it as vbin". That's very
> safe, insofar as it won't throw any sort of encoding exception. It
> does not, however, always honor what I think is the user's more
> typical intention: produce an ascii string at the other end.

I guess the problem is between dynamically and statically typed languages; if you stay with the same language you don't notice anything, but this slightly defeats the object of AMQP!

> So for 3, I'd like to consider the possibility of, by default, sending
> ambiguous language strings as ascii rendered to amqp str16. This
> requires an encoding step that may produce errors. And maybe that's
> just too obnoxious! That's what I'd like to know.

I'm not convinced, but I'm prepared to be convinced. If I put a binary value in a map and encoded it, some of the time it might be valid utf8, other times not. Could this lead to a class of subtle bugs where a receiver written in a statically typed language will work most of the time, when the value appears as a vbin, but not other times when it "accidentally" appears as a str16?

> In summary, if we have a way to determine what the user wanted (text
> or bytes), we should try to carry that through on the wire. At the
> following URL I've tried to map out what type information we can get
> for each language. Please update it as you please.
>
> https://cwiki.apache.org/confluence/display/qpid/Language+support+for+unambiguous+text+string+and+byte+array+types

I've just signed up, but don't seem to be able to edit the page? I'll add the stuff about utf8::upgrade when I can edit.

> On Wed, Aug 21, 2013 at 8:44 AM, Jimmy Jones wrote:
>>> > AFAIK in perl, if you include unicode characters in a string it'll
>>> > set the utf8 flag. If you don't include any unicode characters (eg. 7
>>> > bit ascii, or raw bytes) the flag won't be set. So given a perl
>>> > scalar that doesn't contain any utf8 characters, you don't know if
>>> > its a textual string (str16) or a binary string (vbin). There is a
>>> > is_utf8_string function, but that'll only tell you if the string
>>> > would be valid utf8, but it could be a binary string that happens to
>>> > be valid utf8, so that's not really safe.
>>>
>>> You can explicitly mark it as utf8 using utf8::upgrade() though, right?
>>> Certainly I tried that in a simple test and the property in question was
>>> then sent as str16.
>>
>> Yes, if I as a user had a string that was textual, I could call
>> utf8::upgrade() to ensure it got sent as str16. I guess this is similar in
>> concept to calling setEncoding in C++, although maybe less natural in a
>> dynamically typed language.
>
> It would be more reasonable to treat perl scalars as textual for our
> API if perl offered a good way to explicitly handle byte arrays. My
> (certainly insufficient) web browsing suggested that wasn't really
> available, or not in a form recommended for use. Any candidates for a
> serviceable explicitly-arbitrary-bytes-and-not-text-at-all "type" in
> perl?

Sorry, I don't know of any, although I'm no perl guru! I'll have another look though.
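For comparison with the Perl discussion above, the C++ client mentioned here ("calling setEncoding in C++") lets the sender state its intent on a qpid::types::Variant property. A minimal sketch, assuming Variant::setEncoding("utf8") is what marks the value as text for the codec; the property names and target address are illustrative, and the exact wire types chosen are not confirmed by this thread.

#include <string>

#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Sender.h>
#include <qpid/messaging/Message.h>
#include <qpid/types/Variant.h>

using namespace qpid::messaging;
using qpid::types::Variant;

int main() {
    Connection connection("localhost:5672");  // illustrative broker URL
    connection.open();
    Session session = connection.createSession();
    Sender sender = session.createSender("amq.match");  // illustrative target

    Message message("payload");

    // A value the sender knows is text: mark the encoding so it can be
    // carried as a string type rather than raw bytes.
    Variant text("hello world");
    text.setEncoding("utf8");
    message.getProperties()["data-format"] = text;

    // A value the sender knows is opaque bytes: leave the encoding unset
    // so it is treated as binary.
    Variant raw(std::string("\x01\x02\x03", 3));
    message.getProperties()["blob"] = raw;

    sender.send(message);
    connection.close();
    return 0;
}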
Re: System stalling
>>>> I've got a simple processing system using the 0.22 C++ broker, all
>>>> on one box, where an external system posts messages to the default
>>>> headers exchange, and an ingest process receives them using a ring
>>>> queue, transforms them and outputs to a different headers exchange.
>>>> Various other processes pick messages of interest off that exchange
>>>> using ring queues. Recently however the system has been stalling -
>>>> I'm still receiving lots of data from the other system, but the
>>>> ingest process suddenly goes to <5% CPU usage and its queue fills up
>>>> and messages start getting discarded from the ring, the follow on
>>>> processes go to practically 0% CPU and qpidd hovers around 95-120%
>>>> CPU (normally its ~75%) and the rest of the system pretty much goes
>>>> idle (no swapping, there is free memory)
>>>>
>>>> I attached to the ingest process with gdb and it was stuck in send
>>>> (waitForCapacity/waitForCompletionImpl) - I notice this can block.
>>>
>>> Is there any queue bound to the second headers exchange, i.e. to the one
>>> this ingest process is sending to, that is not a ring queue? (If you run
>>> qpid-config queue -r, you get a quick listing of the queues and their
>>> bindings).
>>
>> I've run qpid-config queue, and all my queues have --limit-policy=ring, apart
>> from a UUID one which I presume is qpid-config itself. Are there any other
>> useful debugging things I can do?
>
> What does qpid-stat -q show? Is it possible to test whether the broker
> is still responsive, e.g. by sending and receiving messages through a
> test queue/exchange? Are there any errors in the logs? Are any of the
> queues durable (and messages persistent)?

qpid-stat -q is all zeros in the msg & bytes columns, apart from the ingest queue, and another overflowing ring queue I have. I did run qpid-tool when the system was broken to dump some stats. msgTotalDequeues was slowly incrementing on the ingest queue, so I presume messages were still being delivered and the broker was responsive?

The only logging I've got is syslog, and I just see a warning about unsent data, presumably when the ingest process receives a SIGALRM. I'm happy to switch on more logging, what would you recommend?

None of my queues are durable, but I think incoming messages from the other system are marked as durable. The exchange that the ingest process sends to is durable, but I'm not setting any durable flags on outgoing messages (I presume the default is off).

> Another thing might be a ptrace of the broker process. Maybe two or
> three with a short delay between them.

I'll try this next time it goes haywire.

> For some reason it seems like the broker is not sending back
> confirmation to the sender in the ingest process, causing that to block.
> Ring queues shouldn't be subject to producer flow control so we need to
> figure out what other reason there could be for that.
Re: System stalling
> > I've got a simple processing system using the 0.22 C++ broker, all
> > on one box, where an external system posts messages to the default
> > headers exchange, and an ingest process receives them using a ring
> > queue, transforms them and outputs to a different headers exchange.
> > Various other processes pick messages of interest off that exchange
> > using ring queues. Recently however the system has been stalling -
> > I'm still receiving lots of data from the other system, but the
> > ingest process suddenly goes to <5% CPU usage and its queue fills up
> > and messages start getting discarded from the ring, the follow on
> > processes go to practically 0% CPU and qpidd hovers around 95-120%
> > CPU (normally its ~75%) and the rest of the system pretty much goes
> > idle (no swapping, there is free memory)
> >
> > I attached to the ingest process with gdb and it was stuck in send
> > (waitForCapacity/waitForCompletionImpl) - I notice this can block.
>
> Is there any queue bound to the second headers exchange, i.e. to the one
> this ingest process is sending to, that is not a ring queue? (If you run
> qpid-config queue -r, you get a quick listing of the queues and their
> bindings).

I've run qpid-config queue, and all my queues have --limit-policy=ring, apart from a UUID one which I presume is qpid-config itself. Are there any other useful debugging things I can do?

> If there was a queue to which messages were enqueued that started to
> apply producer flow control, then that would block your ingest process
> (and since the messages are still coming in, the broker would spend all
> its time just removing old ones to make space).

I'd expect the broker to use less CPU when discarding messages rather than shipping them to consumers? But I'm saying that without much knowledge of the code!

> > However given the rest of the system is idle when this problem occurs
> > I can't understand why this would happen. I added a SIGALRM handler
> > around send with a timeout of 30s and the process did sometimes get
> > killed. Looking at qpid-tool it does seem to still be processing
> > messages, just extremely slowly. My other observation is from
> > netstat, the Send-Q of qpidd to the ingest process is 16363, and the
> > Recv-Q and Send-Q of the ingest process are both 0.
> >
> > Any ideas on what might be happening are very welcome!
System stalling
Hi,

I've got a simple processing system using the 0.22 C++ broker, all on one box, where an external system posts messages to the default headers exchange, and an ingest process receives them using a ring queue, transforms them and outputs to a different headers exchange. Various other processes pick messages of interest off that exchange using ring queues. Recently however the system has been stalling - I'm still receiving lots of data from the other system, but the ingest process suddenly goes to <5% CPU usage and its queue fills up and messages start getting discarded from the ring, the follow on processes go to practically 0% CPU and qpidd hovers around 95-120% CPU (normally its ~75%) and the rest of the system pretty much goes idle (no swapping, there is free memory).

I attached to the ingest process with gdb and it was stuck in send (waitForCapacity/waitForCompletionImpl) - I notice this can block. However given the rest of the system is idle when this problem occurs I can't understand why this would happen. I added a SIGALRM handler around send with a timeout of 30s and the process did sometimes get killed. Looking at qpid-tool it does seem to still be processing messages, just extremely slowly. My other observation is from netstat: the Send-Q of qpidd to the ingest process is 16363, and the Recv-Q and Send-Q of the ingest process are both 0.

Any ideas on what might be happening are very welcome!

Cheers,

Jimmy
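A minimal sketch of the watchdog described above (a SIGALRM handler armed around the blocking send), using the qpid::messaging C++ API. The target exchange, property name and 30s window are illustrative assumptions; the poster's actual code is not included in the thread.

#include <csignal>
#include <cstdlib>
#include <unistd.h>

#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Sender.h>
#include <qpid/messaging/Message.h>

using namespace qpid::messaging;

// Runs if send() has not returned within the alarm window.
static void onAlarm(int) {
    const char msg[] = "send() exceeded 30s\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)n;
    std::abort();
}

int main() {
    std::signal(SIGALRM, onAlarm);

    Connection connection("localhost:5672");  // illustrative broker URL
    connection.open();
    Session session = connection.createSession();
    Sender sender = session.createSender("amq.match");  // illustrative target exchange

    Message message("transformed payload");
    message.getProperties()["data-format"] = "xyz";  // property from the thread's examples

    alarm(30);             // arm a 30 second watchdog around the blocking send
    sender.send(message);  // may block in waitForCapacity/waitForCompletionImpl
    alarm(0);              // disarm once the send returns

    connection.close();
    return 0;
}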
Re: Handling queue overflows
I'll second that - thanks Gordon!

- Original Message -
From: Fraser Adams
Sent: 07/17/13 07:05 PM
To: users@qpid.apache.org
Subject: Re: Handling queue overflows

Nice one Gordon, thanks!!

Once again you display your awesomeness :-)

Frase

On 17/07/13 10:02, Gordon Sim wrote:
> On 07/16/2013 09:54 PM, Jimmy Jones wrote:
>>> On 07/15/2013 07:05 PM, Fraser Adams wrote:
>>>> I'd have quite liked the option to be able to trigger message
>>>> delivery to the alternate exchange when being automatically removed
>>>> from a circular queue
>>>
>>> That would be a fairly easy change (see attached patch if interested).
>>>
>>> On 07/15/2013 08:21 PM, Jimmy Jones wrote:
>>>> I don't think i'll now need it, but is there interest in a limit
>>>> policy of sending to an alternate exchange?
>>>
>>> This is similar in some ways with the functionality above, except that
>>> (if I understand it correctly) you would want the newly arriving
>>> messages to be rerouted, rather than the oldest (or lowest priority)
>>> messages(?). Not too difficult to implement either, I don't think.
>>
>> I was actually after the same thing as Fraser - your attached patch does
>> the trick! Is it possible for something like this to make its way into
>> a release?
>
> I've committed it to trunk:
> https://issues.apache.org/jira/browse/QPID-4993,
> https://svn.apache.org/r1504058. I don't believe we have branched yet
> for 0.24 (though that is imminent), so this may just have squeezed in
> for that.
Re: Handling queue overflows
> On 07/15/2013 07:05 PM, Fraser Adams wrote:
> > I'd have quite liked the option to be able to trigger message
> > delivery to the alternate exchange when being automatically removed
> > from a circular queue
>
> That would be a fairly easy change (see attached patch if interested).
>
> On 07/15/2013 08:21 PM, Jimmy Jones wrote:
> > I don't think i'll now need it, but is there interest in a limit
> > policy of sending to an alternate exchange?
>
> This is similar in some ways with the functionality above, except that
> (if I understand it correctly) you would want the newly arriving
> messages to be rerouted, rather than the oldest (or lowest priority)
> messages(?). Not too difficult to implement either, I don't think.

I was actually after the same thing as Fraser - your attached patch does the trick! Is it possible for something like this to make its way into a release?

Thanks,

Jimmy
Re: Handling queue overflows
Hi Gordon,

Thanks for your swift and helpful reply! I had wrongly assumed that the pagesize was fixed at the platform page size. I think paging should suit my needs, I'll give it a go once 0.24 is released.

I don't think I'll now need it, but is there interest in a limit policy of sending to an alternate exchange?

Cheers,

Jimmy

- Original Message -
From: Gordon Sim
Sent: 07/15/13 02:59 PM
To: users@qpid.apache.org
Subject: Re: Handling queue overflows

On 07/15/2013 02:01 PM, Jimmy Jones wrote:
> Hi,
>
> I've got a system which can sometimes be a bit bursty, which would exhaust
> system memory if the queues were left unchecked. Therefore I've been using
> ring queues, which solve the problem quite nicely, apart from what happens to
> the "excess" messages. Ideally I'd like to buffer them to disk and process
> them at a later, quieter time. I've been digging around and can see a few
> options:
>
> 1) 0.24 will have flow to disk, which would be perfect but sometimes my
> messages are quite big (eg. 10MB) and this requires messages to be smaller
> than a page. Is this limitation likely to be removed soon?

The old mechanism (removed in 0.20) was called 'flow to disk'. I prefer to call the newer feature (to be released with 0.24) 'paging' or paged queue.

Though it is true that the queue's page size must be as large as the largest message, you can configure that page size. So you could have just a few pages allowed in memory per queue, but have each page be 10MB (the page size is configured as a multiple of the platform page size).

As to whether it is likely that the implementation gets updated to allow a message to span multiple pages... I'd say probably not. To be able to dispatch the message in parts without having the entire thing in memory would require a fair bit of work. And without that I don't see a great advantage over just having bigger pages. (Unless I'm missing something?)

> 2) 0.24 allows a "backup engine" to take over a loaded queue (QPID-4650), but
> this looks like it'd require a fair bit of legwork to implement said engine.
>
> 3) alternate-exchanges. These look pretty good for my needs, but I can't seem
> to get them to work! From reading some documentation, I thought they'd be good
> with a limit policy of reject - MRG 2 Installation & Configuration guide,
> 4.8.2 says for an alternate exchange specified for a queue: "Messages that
> are acquired and then rejected by a message consumer". However if I run the
> test below, messages only get routed to the alternate exchange when the queue
> is destroyed while containing messages, and not when messages are rejected
> because the queue is full. Presumably calling Session::reject would cause it
> to go to the alternate exchange, but should a limit policy of reject be the
> same?

The 'reject' policy is probably a little misleading given the other use of reject. What a 'reject' policy actually does is raise an AMQP 0-10 exception when the limit is reached, which effectively ends the session. Such messages are never routed to the alternate-exchange of the exchange or the queue. Having a client reject rather than accept a message is in fact entirely different, despite the (confusing) similarity in name.

I have also just added a new policy that causes a queue to self destruct when it reaches the preconfigured limit. That could possibly be of interest in conjunction with an alternate-exchange. What would happen would be that at the point the limit is reached, the queue will delete itself, re-routing any orphaned messages to the alternate-exchange if set. The deletion of the queue will result in any subscribing session being terminated, but won't result in the publisher's session hitting an exception. The issue there however is that messages published while the queue doesn't exist (i.e. before the subscriber re-establishes the session and recreates it) would be dropped (unless of course there were then no matching bindings, in which case it would be rerouted to the exchange's alternate-exchange). I suspect having spelled that all out it won't be a terribly appealing path...

> --8<--
>
> qpid-config add exchange headers test1
> qpid-config add exchange headers test1-overflow
>
> # drain for messages in normal case
> ./drain -f "normal; { create: receiver, node: {type: queue, x-declare:
> {exclusive: True, alternate-exchange: 'test1-overflow', arguments:
> {'qpid.max_size': 1024, 'qpid.policy_type': 'reject'}}, x-bindings:
> [{exchange: test1, arguments:{x-match:any, data-format: xyz}}]}}"
>
> # drain for messages in overflow case
> ./drain -f "overflow; { create: rec
Handling queue overflows
Hi,

I've got a system which can sometimes be a bit bursty, which would exhaust system memory if the queues were left unchecked. Therefore I've been using ring queues, which solve the problem quite nicely, apart from what happens to the "excess" messages. Ideally I'd like to buffer them to disk and process them at a later, quieter time. I've been digging around and can see a few options:

1) 0.24 will have flow to disk, which would be perfect but sometimes my messages are quite big (eg. 10MB) and this requires messages to be smaller than a page. Is this limitation likely to be removed soon?

2) 0.24 allows a "backup engine" to take over a loaded queue (QPID-4650), but this looks like it'd require a fair bit of legwork to implement said engine.

3) alternate-exchanges. These look pretty good for my needs, but I can't seem to get them to work! From reading some documentation, I thought they'd be good with a limit policy of reject - MRG 2 Installation & Configuration guide, 4.8.2 says for an alternate exchange specified for a queue: "Messages that are acquired and then rejected by a message consumer". However if I run the test below, messages only get routed to the alternate exchange when the queue is destroyed while containing messages, and not when messages are rejected because the queue is full. Presumably calling Session::reject would cause it to go to the alternate exchange, but should a limit policy of reject be the same?

Any ideas very welcome!

Jimmy

--8<--

qpid-config add exchange headers test1
qpid-config add exchange headers test1-overflow

# drain for messages in normal case
./drain -f "normal; { create: receiver, node: {type: queue, x-declare: {exclusive: True, alternate-exchange: 'test1-overflow', arguments: {'qpid.max_size': 1024, 'qpid.policy_type': 'reject'}}, x-bindings: [{exchange: test1, arguments:{x-match:any, data-format: xyz}}]}}"

# drain for messages in overflow case
./drain -f "overflow; { create: receiver, node: {type: queue, x-declare: {exclusive: True, arguments: {'qpid.max_size': 1024000, 'qpid.policy_type': 'ring'}}, x-bindings: [{exchange: test1-overflow, arguments:{x-match:any, data-format: xyz}}]}}"

./spout --content test -c 5 --property data-format=xyz test1
# works as expected, messages received by normal drain

./spout --content test -c 5 --property data-format=xyz test1-overflow
# works as expected, messages received by overflow drain

# Now ctrl-c normal drain, and queue will remain
# Send loads of messages, fills up q1
./spout --content test -c 500 --property data-format=xyz test1
# Blocks... and no messages sent to overflow drain

qpid-config del queue normal --force
# now messages appear in overflow drain
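For the consumer-side reject mentioned in option 3, a receiver that explicitly rejects a fetched message with Session::reject would look roughly like the sketch below. The queue name matches the "normal" queue from the test above, but the rejection condition is illustrative; per the MRG documentation quoted here, an acquired-then-rejected message should be routed to the queue's alternate exchange, which is distinct from the 'reject' limit policy behaviour discussed elsewhere in the thread.

#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Duration.h>

using namespace qpid::messaging;

int main() {
    Connection connection("localhost:5672");  // illustrative broker URL
    connection.open();
    Session session = connection.createSession();

    // Same "normal" queue as in the test above (alternate-exchange: test1-overflow).
    Receiver receiver = session.createReceiver("normal");

    Message msg;
    while (receiver.fetch(msg, Duration::SECOND * 10)) {
        if (msg.getContent().empty()) {
            // Example application-level check: explicitly reject the message.
            // Per the quoted MRG documentation, an acquired-then-rejected
            // message should be routed to the queue's alternate exchange.
            session.reject(msg);
        } else {
            // ... process msg ...
            session.acknowledge(msg);
        }
    }
    connection.close();
    return 0;
}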
Munin monitoring plugin
Hi,

For those who might be interested, I've just had my munin plugin to monitor qpid queues (msg depth, byte depth, ring discards, msg rate, byte rate) integrated into the munin-contrib repo. It's nowhere near as advanced as Cumin or Fraser Adams' GUI, but is good if you already use munin or want some simple graphs for monitoring.

Cheers,

Jimmy