Re: Dispatch Router: Wow. Large message test with different buffer sizes
On Wed, 17 Feb 2021 at 13:53, Michael Goulish wrote:
>
> Robbie -- thanks for questions!
>
> *The senders are noted as having a 10ms delay between sends, how exactly
> is that achieved?*
>
> My client (both sender and receiver are the same program, different flags) is
> implemented in C using the Proactor interface. When I run the sender
> 'throttled' here is what happens:
>
>   * When the sender gets a FLOW event, it calls pn_proactor_set_timeout()
>     to set a timeout of N milliseconds, where N is the integer argument to
>     the command-line 'throttle' flag.
>
>   * N milliseconds later, the sender gets the PN_PROACTOR_TIMEOUT event.
>     Then I 'wake' the connection.
>
>   * When the sender gets the WAKE event -- if it has not already sent all
>     its messages -- it sends one message -- and sets the timer again to the
>     same value.
>
> So, if I set a value of 10 msec for the throttle, the sender will send just
> a little less than 100 messages per second. A little less because it takes
> a little bit of time (very little) to actually send one message.

Ok, mainly I was just looking to tease out that it wasn't e.g. pausing the
reactor thread and effectively batching up sends for a single later IO.

> *Do the receivers receive flat out?*
>
> Yes, there is no form of throttling on the receivers.
>
> *Is that 1000 credit window from the receiver to router, or from the router
> to the sender, or both?*
>
> Credit is granted by the receiver and used by the sender. When the sender
> is *not* throttled, it just sends messages as fast as ever it can, until
> credit is exhausted.
>
> However I do *not* think that the router is able to simply pass on the
> number that it got from the receiver all the way back to the sender. I
> think the credit number that the sender gets is probably determined only by
> the configured 'capacity' of the router listener it is talking to.

Right, this is why I asked. Unless you are using link-routing, what the
client receiver grants has no bearing on what the client sender gets from the
router in between them. So your receiver is running flat out, issuing 1000
credits, while the sender is throttled to 100 msg/sec and only gets whatever
credit the router gives it (250 was the last default I think I recall?). That
seems very tilted towards the receivers being faster, and suggests they
should always be able to keep up if the router does.
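
To make that concrete, a rough proton-c sketch of the receiver side of the
credit window -- illustrative only, with an assumed CREDIT_WINDOW constant
and helper names, not the actual test client; the sender's credit comes
separately from the router listener's configured capacity ('linkCapacity' in
the router config, if I remember the attribute name, defaulting to 250):

    #include <proton/engine.h>

    #define CREDIT_WINDOW 1000   /* illustrative; the 1000-credit window above */

    /* Called once the receiving link has been created/attached:
     * advertise the whole window to the peer (here, the router). */
    static void grant_window(pn_link_t *receiver) {
      pn_link_flow(receiver, CREDIT_WINDOW);
    }

    /* Called from the event loop on each PN_DELIVERY, after the message has
     * been read and settled: top the credit back up so the window stays full
     * and the receiver keeps running flat out. */
    static void replenish(pn_link_t *receiver) {
      int outstanding = pn_link_credit(receiver);
      if (outstanding < CREDIT_WINDOW)
        pn_link_flow(receiver, CREDIT_WINDOW - outstanding);
    }
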
> *was there any discernible difference in the 512b test alone at the point
> the receive throughput looks to reduce?*
>
> I didn't have the eyes to see anything changing at that time. I didn't know
> there was that weird inflection point until I graphed the data -- I assumed
> it was just gradually slowing down.
>
> I fully expect we are going to decide to standardize on a larger buffer
> size -- probably 2K -- depending on tests that I am about to do on AMQP.
> Once I do the AMQP tests to support that decision I hope to pursue that
> interesting little inflection point fiercely.

I think it's worth pursuing. The test doesn't seem like an especially
overtaxing scenario to me, and indeed everything apparently handles it fine
for >200 sec, with the graphed throughput suggesting there should be no real
backlog, until things suddenly changed and throughput dropped. It's not
obvious to me why the buffer size should make such a distinct (or really,
any) difference in that case, and might it doing so suggest something
interesting?

I could understand it reducing CPU usage due to doing less work, but not so
much the rest unless it's maxed out already. If not, perhaps increasing the
size is just going to be covering something up, e.g. somehow just delaying
the same dropoff until a later unknown point, or requiring some more intense
load level to get into it.

Ted is right that using session flow control in addition would be useful, but
as each sender in this test maxes out at 20MB/s based on its delayed sends,
and the receivers should easily outstrip them, I'm not sure I would expect it
to make a difference in this test unless something else is already awry.

> On Mon, Feb 15, 2021 at 12:21 PM Robbie Gemmell wrote:
> >
> > On Sat, 13 Feb 2021 at 16:40, Ted Ross wrote:
> > >
> > > On Fri, Feb 12, 2021 at 1:47 PM Michael Goulish wrote:
> > > >
> > > > Well, *this* certainly made a difference!
> > > > I tried this test:
> > > >
> > > > *message size:* 200K bytes
> > > > *client-pairs:* 10
> > > > *sender pause between messages:* 10 msec
> > > > *messages per sender:* 10,000
> > > > *credit window:* 1000
> > > >
> > > > *Results:*
> > > >
> > > >              router buffer size
> > > >           512 bytes        4K bytes
> > > >          -------------------------------
> > > > CPU         517%            102%
> > > > Mem         711 MB          59 MB
> > > > Latency     26.9 *seconds*  2.486 *msec*
Re: Dispatch Router: Wow. Large message test with different buffer sizes
Robbie -- thanks for questions!

*The senders are noted as having a 10ms delay between sends, how exactly is
that achieved?*

My client (both sender and receiver are the same program, different flags) is
implemented in C using the Proactor interface. When I run the sender
'throttled' here is what happens:

  * When the sender gets a FLOW event, it calls pn_proactor_set_timeout() to
    set a timeout of N milliseconds, where N is the integer argument to the
    command-line 'throttle' flag.

  * N milliseconds later, the sender gets the PN_PROACTOR_TIMEOUT event. Then
    I 'wake' the connection.

  * When the sender gets the WAKE event -- if it has not already sent all its
    messages -- it sends one message -- and sets the timer again to the same
    value.

So, if I set a value of 10 msec for the throttle, the sender will send just a
little less than 100 messages per second. A little less because it takes a
little bit of time (very little) to actually send one message. (A rough
sketch of this event handling is included below.)

*Do the receivers receive flat out?*

Yes, there is no form of throttling on the receivers.

*Is that 1000 credit window from the receiver to router, or from the router
to the sender, or both?*

Credit is granted by the receiver and used by the sender. When the sender is
*not* throttled, it just sends messages as fast as ever it can, until credit
is exhausted.

However I do *not* think that the router is able to simply pass on the number
that it got from the receiver all the way back to the sender. I think the
credit number that the sender gets is probably determined only by the
configured 'capacity' of the router listener it is talking to.

*was there any discernible difference in the 512b test alone at the point the
receive throughput looks to reduce?*

I didn't have the eyes to see anything changing at that time. I didn't know
there was that weird inflection point until I graphed the data -- I assumed
it was just gradually slowing down.

I fully expect we are going to decide to standardize on a larger buffer size
-- probably 2K -- depending on tests that I am about to do on AMQP. Once I do
the AMQP tests to support that decision I hope to pursue that interesting
little inflection point fiercely.
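
In proton-c proactor terms, the throttled sender's handler ends up looking
roughly like the sketch below -- a simplified illustration with an assumed
app_t struct, pre-encoded message buffer, and 10 ms constant, not the exact
client code; setup of the proactor, connection and sender link is omitted:

    #include <proton/engine.h>
    #include <proton/proactor.h>

    #define THROTTLE_MS 10          /* the 'throttle' command-line value */

    typedef struct {
      pn_proactor_t   *proactor;
      pn_connection_t *connection;
      pn_link_t       *sender;
      const char      *msg_buf;     /* pre-encoded AMQP message */
      size_t           msg_len;
      int              sent;
      int              total;       /* messages per sender, e.g. 10,000 */
    } app_t;

    static void handle(app_t *app, pn_event_t *event) {
      switch (pn_event_type(event)) {
      case PN_LINK_FLOW:
        /* Credit arrived: arm the timer instead of sending immediately. */
        pn_proactor_set_timeout(app->proactor, THROTTLE_MS);
        break;

      case PN_PROACTOR_TIMEOUT:
        /* Timer fired: wake the connection to get back onto its thread. */
        pn_connection_wake(app->connection);
        break;

      case PN_CONNECTION_WAKE:
        /* Send exactly one message, then re-arm the timer. */
        if (app->sent < app->total && pn_link_credit(app->sender) > 0) {
          pn_delivery(app->sender,
                      pn_dtag((const char *)&app->sent, sizeof(app->sent)));
          pn_link_send(app->sender, app->msg_buf, app->msg_len);
          pn_link_advance(app->sender);
          app->sent++;
          pn_proactor_set_timeout(app->proactor, THROTTLE_MS);
        }
        break;

      default:
        break;
      }
    }
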
On Mon, Feb 15, 2021 at 12:21 PM Robbie Gemmell wrote:
>
> On Sat, 13 Feb 2021 at 16:40, Ted Ross wrote:
> >
> > On Fri, Feb 12, 2021 at 1:47 PM Michael Goulish wrote:
> > >
> > > Well, *this* certainly made a difference!
> > > I tried this test:
> > >
> > > *message size:* 200K bytes
> > > *client-pairs:* 10
> > > *sender pause between messages:* 10 msec
> > > *messages per sender:* 10,000
> > > *credit window:* 1000
> > >
> > > *Results:*
> > >
> > >              router buffer size
> > >           512 bytes        4K bytes
> > >          -------------------------------
> > > CPU         517%            102%
> > > Mem         711 MB          59 MB
> > > Latency     26.9 *seconds*  2.486 *msec*
> > >
> > > So with the large messages and our normal buffer size of 1/2 K, the
> > > router just got overwhelmed. What I recorded was average memory usage,
> > > but looking at the time sequence I see that its memory kept increasing
> > > steadily until the end of the test.
> >
> > With the large messages, the credit window is not sufficient to protect
> > the memory of the router. I think this test needs to use a limited
> > session window as well. This will put back-pressure on the senders much
> > earlier in the test. With 200Kbyte messages x 1000 credits x 10 senders,
> > there's a theoretical maximum of 2Gig of proton buffer memory that can
> > be consumed before the router core ever moves any data. It's interesting
> > that in the 4K-buffer case, the router core keeps up with the flow and
> > in the 512-byte case, it does not.
>
> The senders are noted as having a 10ms delay between sends, how exactly is
> that achieved? Do the receivers receive flat out? Is that 1000 credit
> window from the receiver to router, or from the router to the sender, or
> both?
>
> If the senders are slow compared to the receivers, and only 200MB/sec max
> is actually hitting the router as a result of the governed sends, I'm
> somewhat surprised the router would ever seem to accumulate as much data
> (noted as an average; any idea what was the peak?) in such a test unless
> something odd/interesting starts happening to it at the smaller buffer
> size after some time. From the other mail it seems it all plays nicely for
> >200 seconds and only then starts to behave differently, since delivery
> speed over time appears as expected from the governed sends, meaning there
> should be no accumulation, and then it noticeably reduces, meaning there
> must be some if the sends maintained their rate. There is a clear
> disparity in the CPU result between the two tests; was there any
> discernible difference in the 512b test alone at the point the receive
> throughput looks to reduce?
Re: Dispatch Router: Wow. Large message test with different buffer sizes
On Sat, 13 Feb 2021 at 16:40, Ted Ross wrote:
>
> On Fri, Feb 12, 2021 at 1:47 PM Michael Goulish wrote:
> >
> > Well, *this* certainly made a difference!
> > I tried this test:
> >
> > *message size:* 200K bytes
> > *client-pairs:* 10
> > *sender pause between messages:* 10 msec
> > *messages per sender:* 10,000
> > *credit window:* 1000
> >
> > *Results:*
> >
> >              router buffer size
> >           512 bytes        4K bytes
> >          -------------------------------
> > CPU         517%            102%
> > Mem         711 MB          59 MB
> > Latency     26.9 *seconds*  2.486 *msec*
> >
> > So with the large messages and our normal buffer size of 1/2 K, the
> > router just got overwhelmed. What I recorded was average memory usage,
> > but looking at the time sequence I see that its memory kept increasing
> > steadily until the end of the test.
>
> With the large messages, the credit window is not sufficient to protect the
> memory of the router. I think this test needs to use a limited session
> window as well. This will put back-pressure on the senders much earlier in
> the test. With 200Kbyte messages x 1000 credits x 10 senders, there's a
> theoretical maximum of 2Gig of proton buffer memory that can be consumed
> before the router core ever moves any data. It's interesting that in the
> 4K-buffer case, the router core keeps up with the flow and in the 512-byte
> case, it does not.

The senders are noted as having a 10ms delay between sends, how exactly is
that achieved? Do the receivers receive flat out? Is that 1000 credit window
from the receiver to router, or from the router to the sender, or both?

If the senders are slow compared to the receivers, and only 200MB/sec max is
actually hitting the router as a result of the governed sends, I'm somewhat
surprised the router would ever seem to accumulate as much data (noted as an
average; any idea what was the peak?) in such a test unless something
odd/interesting starts happening to it at the smaller buffer size after some
time. From the other mail it seems it all plays nicely for >200 seconds and
only then starts to behave differently, since delivery speed over time
appears as expected from the governed sends, meaning there should be no
accumulation, and then it noticeably reduces, meaning there must be some if
the sends maintained their rate. There is a clear disparity in the CPU result
between the two tests; was there any discernible difference in the 512b test
alone at the point the receive throughput looks to reduce?

> It appears that increasing the buffer size is a good idea. I don't think
> we've figured out how much the increase should be, however. We should look
> at interim sizes: 1K, 2K, maybe 1.5K and 3K. We want the smallest buffer
> size that gives us acceptable performance. If throughput, CPU, and memory
> use improve sharply with buffer size then level off, let's identify the
> "knee of the curve" and see what buffer size that represents.
>
> > Messages just sat there waiting to get processed, which is maybe why
> > their average latency was *10,000 times longer* than when I used the
> > large buffers.
> >
> > And Nothing Bad Happened in the 4K buffer test. No crash, all messages
> > delivered, normal shutdown.
> >
> > Now I will try a long-duration test to see if it survives that while
> > using the large buffers.
> >
> > If it does survive OK, we need to see what happens with large buffers as
> > message size varies from small to large.
Re: Dispatch Router: Wow. Large message test with different buffer sizes
On Fri, Feb 12, 2021 at 1:47 PM Michael Goulish wrote:
>
> Well, *this* certainly made a difference!
> I tried this test:
>
> *message size:* 200K bytes
> *client-pairs:* 10
> *sender pause between messages:* 10 msec
> *messages per sender:* 10,000
> *credit window:* 1000
>
> *Results:*
>
>              router buffer size
>           512 bytes        4K bytes
>          -------------------------------
> CPU         517%            102%
> Mem         711 MB          59 MB
> Latency     26.9 *seconds*  2.486 *msec*
>
> So with the large messages and our normal buffer size of 1/2 K, the router
> just got overwhelmed. What I recorded was average memory usage, but looking
> at the time sequence I see that its memory kept increasing steadily until
> the end of the test.

With the large messages, the credit window is not sufficient to protect the
memory of the router. I think this test needs to use a limited session window
as well. This will put back-pressure on the senders much earlier in the test.
With 200Kbyte messages x 1000 credits x 10 senders, there's a theoretical
maximum of 2Gig of proton buffer memory that can be consumed before the
router core ever moves any data. It's interesting that in the 4K-buffer case,
the router core keeps up with the flow and in the 512-byte case, it does not.

It appears that increasing the buffer size is a good idea. I don't think
we've figured out how much the increase should be, however. We should look at
interim sizes: 1K, 2K, maybe 1.5K and 3K. We want the smallest buffer size
that gives us acceptable performance. If throughput, CPU, and memory use
improve sharply with buffer size then level off, let's identify the "knee of
the curve" and see what buffer size that represents.

> Messages just sat there waiting to get processed, which is maybe why their
> average latency was *10,000 times longer* than when I used the large
> buffers.
>
> And Nothing Bad Happened in the 4K buffer test. No crash, all messages
> delivered, normal shutdown.
>
> Now I will try a long-duration test to see if it survives that while using
> the large buffers.
>
> If it does survive OK, we need to see what happens with large buffers as
> message size varies from small to large.
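
For reference, the limited session window suggested above looks roughly like
this with the proton-c API -- a generic sketch only; the 1 MB capacity is
illustrative, and which endpoint applies it depends on whose memory you are
trying to protect:

    #include <proton/engine.h>

    static pn_session_t *open_limited_session(pn_connection_t *conn) {
      pn_session_t *ssn = pn_session(conn);
      /* Capping the session's incoming capacity bounds the bytes the peer can
       * have in flight on this session, independent of link credit.  Without
       * it, each sender's 1000 credits x ~200 KB messages (~200 MB per
       * sender, ~2 GB across 10) could be buffered before back-pressure. */
      pn_session_set_incoming_capacity(ssn, 1024 * 1024);  /* ~1 MB, illustrative */
      pn_session_open(ssn);
      return ssn;
    }
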
Dispatch Router: Wow. Large message test with different buffer sizes
Well, *this* certainly made a difference!
I tried this test:

*message size:* 200K bytes
*client-pairs:* 10
*sender pause between messages:* 10 msec
*messages per sender:* 10,000
*credit window:* 1000

*Results:*

             router buffer size
          512 bytes        4K bytes
         -------------------------------
CPU         517%            102%
Mem         711 MB          59 MB
Latency     26.9 *seconds*  2.486 *msec*

So with the large messages and our normal buffer size of 1/2 K, the router
just got overwhelmed. What I recorded was average memory usage, but looking
at the time sequence I see that its memory kept increasing steadily until the
end of the test.

Messages just sat there waiting to get processed, which is maybe why their
average latency was *10,000 times longer* than when I used the large buffers.

And Nothing Bad Happened in the 4K buffer test. No crash, all messages
delivered, normal shutdown.

Now I will try a long-duration test to see if it survives that while using
the large buffers.

If it does survive OK, we need to see what happens with large buffers as
message size varies from small to large.
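
As a back-of-the-envelope check on why buffer size could matter so much here
(plain arithmetic, not output from the test tooling): the number of router
buffers a single ~200K byte message occupies at each candidate size, which is
one plausible contributor to the CPU and memory gap above.

    #include <stdio.h>

    int main(void) {
      const long msg_bytes = 200L * 1024;             /* ~200K byte message */
      const long sizes[] = {512, 1024, 2048, 4096};   /* sizes under discussion */
      for (int i = 0; i < 4; i++) {
        long buffers = (msg_bytes + sizes[i] - 1) / sizes[i];
        printf("%4ld-byte buffers: ~%ld buffers per message\n",
               sizes[i], buffers);
      }
      return 0;
    }
    /* 512 B -> 400 buffers, 4 KB -> 50 buffers: roughly 8x less per-buffer
     * allocation and list handling for the same payload. */
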