Re: Dispatch Router: Wow. Large message test with different buffer sizes

2021-02-17 Thread Robbie Gemmell
On Wed, 17 Feb 2021 at 13:53, Michael Goulish  wrote:
>
> Robbie -- thanks for questions!
>
>
> > *The senders are noted as having a 10ms delay between sends, how exactly
> > is that achieved?*
>
> My client (both sender and receiver are same program, different flags) is
> implemented in C using the Proactor interface.  When I run the sender
> 'throttled' here is what happens:
>
>   * When the sender gets a FLOW event, it calls pn_proactor_set_timeout()
> to set a timeout of N milliseconds, where N is the integer argument to the
> command line 'throttle' flag.
>
>   * N milliseconds later, the sender gets the PN_PROACTOR_TIMEOUT event.
> Then I 'wake' the connection.
>
>   * When the sender gets the WAKE event  -- if it has not already sent all
> its messages -- it sends one message -- and sets the timer again to the
> same value.
>
> So, if I set a value of 10 msec for the throttle, the sender will send just
> a little less than 100 messages per second.  A little less because it takes
> a little bit of time (very little) to actually send one message.
>
>

Ok, mainly I was just looking to tease out that it wasn't e.g. pausing
the reactor thread and effectively batching up sends for a single
later IO.

>
> *Do the receivers receive flat out? *
>
> Yes, there is no form of throttling on the receivers.
>
>
> *Is that 1000 credit window from the receiver to router, or from the router
> to the sender, or both?*
>
> Credit is granted by the receiver and used by the sender. When the sender
> is *not* throttled, it just sends messages as fast as ever it can, until
> credit is exhausted.
>
> However I do *not* think that the router is able to simply pass on the
> number that it got from the receiver all the way back to the sender. I
> think the credit number that the sender gets is probably determined only by
> the configured  'capacity' of the router listener it is talking to.
>

Right, this is why I asked. Unless you are using link-routing, what
the client receiver grants has no bearing on what the client sender
gets from the router in between them.

So your receiver is running flat out, issuing 1000 credits, while the
sender is throttled to 100 msg/sec and only gets whatever credit the
router gives it (250 was the last default, I think I recall?). That
seems heavily tilted toward the receivers being the faster side, and
like they should always be able to keep up if the router does.

>
> *was there any discernible difference in the 512b test alone at the point
> the receive throughput looks to reduce?*
>
> I didn't have the eyes to see anything changing at that time. I didn't know
> there was that weird inflection point until I graphed the data -- I assumed
> it was just gradually slowing down.
>
> I fully expect we are going to decide to standardize on a larger buffer
> size -- probably 2K -- depending on tests that I am about to do on AMQP.
> Once I do the AMQP tests to support that decision I hope to pursue that
> interesting little inflection point fiercely.
>

I think it's worth pursuing.

The test doesn't seem like an especially overtaxing scenario to me,
and indeed everything apparently handles it fine for >200 sec, with
the graphed throughput suggesting there should be no real backlog,
until things suddenly change and throughput drops. It's not obvious to
me why the buffer size should make such a distinct (or really, any)
difference in that case, and might its doing so suggest something
interesting? I could understand it reducing CPU usage due to doing
less work, but not so much the rest unless the CPU is maxed out
already. If it isn't, perhaps increasing the size is just going to be
covering something up, e.g. somehow delaying the same drop-off until a
later unknown point, or requiring some more intense load level to get
into it.

Ted is right that using session flow control in addition would be
useful, but since in this test each sender maxes out at 20MB/s based
on its delayed sends, and the receivers should easily outstrip them,
I'm not sure I would expect it to make a difference here unless
something else is already awry.



>
>
>
> On Mon, Feb 15, 2021 at 12:21 PM Robbie Gemmell 
> wrote:
>
> > On Sat, 13 Feb 2021 at 16:40, Ted Ross  wrote:
> > >
> > > On Fri, Feb 12, 2021 at 1:47 PM Michael Goulish 
> > wrote:
> > >
> > > > Well, *this* certainly made a difference!
> > > > I tried this test:
> > > >
> > > > *message size:*  200K bytes
> > > > *client-pairs:*  10
> > > > *sender pause between messages:*  10 msec
> > > > *messages per sender:*  10,000
> > > > *credit window:*  1000
> > > >
> > > >
> > > >
> > > >
> > > >   *Results:*
> > > >
> > > >                router buffer size
> > > >                512 bytes         4K bytes
> > > >   --------------------------------------------
> > > >   CPU          517%              102%
> > > >   Mem          711 MB            59 MB
> > > >   Latency      26.9 *seconds*    2.486 *msec*

Re: Dispatch Router: Wow. Large message test with different buffer sizes

2021-02-17 Thread Michael Goulish
Robbie -- thanks for questions!


*The senders are noted as having a 10ms delay between sends, how exactly
is that achieved?*

My client (both sender and receiver are same program, different flags) is
implemented in C using the Proactor interface.  When I run the sender
'throttled' here is what happens:

  * When the sender gets a FLOW event, it calls pn_proactor_set_timeout()
to set a timeout of N milliseconds, where N is the integer argument to the
command line 'throttle' flag.

  * N milliseconds later, the sender gets the PN_PROACTOR_TIMEOUT event.
Then I 'wake' the connection.

  * When the sender gets the WAKE event  -- if it has not already sent all
its messages -- it sends one message -- and sets the timer again to the
same value.

So, if I set a value of 10 msec for the throttle, the sender will send just
a little less than 100 messages per second.  A little less because it takes
a little bit of time (very little) to actually send one message.
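
To make that concrete, here is a rough sketch of what that proactor
event handling can look like in Proton C. The app_t struct, its field
names, and send_one_message() are hypothetical stand-ins for the real
client's state and send path; connection setup and error handling are
omitted.

#include <proton/connection.h>
#include <proton/event.h>
#include <proton/link.h>
#include <proton/proactor.h>

/* Hypothetical per-sender state; names are illustrative only. */
typedef struct app_t {
  pn_proactor_t   *proactor;
  pn_connection_t *connection;
  pn_link_t       *sender;
  int              throttle_ms;  /* value of the 'throttle' flag     */
  int              sent, total;  /* messages sent / messages to send */
} app_t;

/* Assumed to exist elsewhere in the client: encodes a pn_message_t
   and writes it to app->sender with pn_link_send()/pn_link_advance(). */
void send_one_message(app_t *app);

static void handle(app_t *app, pn_event_t *event) {
  switch (pn_event_type(event)) {

  case PN_LINK_FLOW:
    /* Credit arrived: rather than sending immediately, arm the timer. */
    pn_proactor_set_timeout(app->proactor, app->throttle_ms);
    break;

  case PN_PROACTOR_TIMEOUT:
    /* Timer fired: wake the connection so we get an event for it. */
    pn_connection_wake(app->connection);
    break;

  case PN_CONNECTION_WAKE:
    /* Send one message, if any remain, and re-arm the timer. */
    if (app->sent < app->total && pn_link_credit(app->sender) > 0) {
      send_one_message(app);
      app->sent++;
      pn_proactor_set_timeout(app->proactor, app->throttle_ms);
    }
    break;

  default:
    break;
  }
}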



*Do the receivers receive flat out? *

Yes, there is no form of throttling on the receivers.


*Is that 1000 credit window from the receiver to router, or from the router
to the sender, or both?*

Credit is granted by the receiver and used by the sender. When the sender
is *not* throttled, it just sends messages as fast as ever it can, until
credit is exhausted.

However I do *not* think that the router is able to simply pass on the
number that it got from the receiver all the way back to the sender. I
think the credit number that the sender gets is probably determined only by
the configured  'capacity' of the router listener it is talking to.
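
(That per-listener credit is the listener's linkCapacity setting in
qdrouterd.conf, if I recall correctly -- something like the sketch
below, where the value shown is only illustrative and 250 is, I
believe, the default.)

listener {
    host: 0.0.0.0
    port: amqp
    role: normal
    linkCapacity: 250
}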


*was there any discernible difference in the 512b test alone at the point
the receive throughput looks to reduce?*

I didn't have the eyes to see anything changing at that time. I didn't know
there was that weird inflection point until I graphed the data -- I assumed
it was just gradually slowing down.

I fully expect we are going to decide to standardize on a larger buffer
size -- probably 2K -- depending on tests that I am about to do on AMQP.
Once I do the AMQP tests to support that decision I hope to pursue that
interesting little inflection point fiercely.




On Mon, Feb 15, 2021 at 12:21 PM Robbie Gemmell 
wrote:

> On Sat, 13 Feb 2021 at 16:40, Ted Ross  wrote:
> >
> > On Fri, Feb 12, 2021 at 1:47 PM Michael Goulish 
> wrote:
> >
> > > Well, *this* certainly made a difference!
> > > I tried this test:
> > >
> > > *message size:*  200K bytes
> > > *client-pairs:*  10
> > > *sender pause between messages:*  10 msec
> > > *messages per sender:*  10,000
> > > *credit window:*  1000
> > >
> > >
> > >
> > >
> > >   *Results:*
> > >
> > >                router buffer size
> > >                512 bytes         4K bytes
> > >   --------------------------------------------
> > >   CPU          517%              102%
> > >   Mem          711 MB            59 MB
> > >   Latency      26.9 *seconds*    2.486 *msec*
> > >
> > >
> > > So with the large messages and our normal buffer size of 1/2 K, the
> router
> > > just got overwhelmed. What I recorded was average memory usage, but
> looking
> > > at the time sequence I see that its memory kept increasing steadily
> until
> > > the end of the test.
> > >
> >
> > With the large messages, the credit window is not sufficient to protect
> the
> > memory of the router.  I think this test needs to use a limited session
> > window as well.  This will put back-pressure on the senders much earlier
> in
> > the test.  With 200Kbyte messages x 1000 credits x 10 senders, there's a
> > theoretical maximum of 2Gig of proton buffer memory that can be consumed
> > before the router core ever moves any data.  It's interesting that in the
> > 4K-buffer case, the router core keeps up with the flow and in the
> 512-byte
> > case, it does not.
>
> The senders are noted as having a 10ms delay between sends, how
> exactly is that achieved? Do the receivers receive flat out? Is that
> 1000 credit window from the receiver to router, or from the router to
> the sender, or both?
>
> If the senders are slow compared to the receivers, and only 200MB/sec
> max is actually hitting the router as a result of the governed sends,
> I'm somewhat surprised the router would ever seem to accumulate as
> much data (noted as an average; any idea what the peak was?) in such
> a test, unless something odd/interesting starts happening to it at
> the smaller buffer size after some time. From the other mail it seems
> it all plays nicely for >200 seconds and only then starts to behave
> differently: delivery speed over time initially appears as expected
> from the governed sends, meaning there should be no accumulation, and
> then it noticeably reduces, meaning there must be some accumulation
> if the sends maintained their rate. There is a clear disparity in the
> CPU result between the two tests; was there any discernible
> difference in the 512b test alone at the point the receive throughput
> looks to reduce?

Re: Dispatch Router: Wow. Large message test with different buffer sizes

2021-02-15 Thread Robbie Gemmell
On Sat, 13 Feb 2021 at 16:40, Ted Ross  wrote:
>
> On Fri, Feb 12, 2021 at 1:47 PM Michael Goulish  wrote:
>
> > Well, *this* certainly made a difference!
> > I tried this test:
> >
> > *message size:*  200K bytes
> > *client-pairs:*  10
> > *sender pause between messages:*  10 msec
> > *messages per sender:*  10,000
> > *credit window:*  1000
> >
> >
> >
> >
> >   *Results:*
> >
> >                router buffer size
> >                512 bytes         4K bytes
> >   --------------------------------------------
> >   CPU          517%              102%
> >   Mem          711 MB            59 MB
> >   Latency      26.9 *seconds*    2.486 *msec*
> >
> >
> > So with the large messages and our normal buffer size of 1/2 K, the router
> > just got overwhelmed. What I recorded was average memory usage, but looking
> > at the time sequence I see that its memory kept increasing steadily until
> > the end of the test.
> >
>
> With the large messages, the credit window is not sufficient to protect the
> memory of the router.  I think this test needs to use a limited session
> window as well.  This will put back-pressure on the senders much earlier in
> the test.  With 200Kbyte messages x 1000 credits x 10 senders, there's a
> theoretical maximum of 2Gig of proton buffer memory that can be consumed
> before the router core ever moves any data.  It's interesting that in the
> 4K-buffer case, the router core keeps up with the flow and in the 512-byte
> case, it does not.

The senders are noted as having a 10ms delay between sends, how
exactly is that achieved? Do the receivers receive flat out? Is that
1000 credit window from the receiver to router, or from the router to
the sender, or both?

If the senders are slow compared to the receivers, and only 200MB/sec
max is actually hitting the router as a result of the governed sends,
I'm somewhat surprised the router would ever seem to accumulate as
much data (noted as an average; any idea what the peak was?) in such a
test, unless something odd/interesting starts happening to it at the
smaller buffer size after some time. From the other mail it seems it
all plays nicely for >200 seconds and only then starts to behave
differently: delivery speed over time initially appears as expected
from the governed sends, meaning there should be no accumulation, and
then it noticeably reduces, meaning there must be some accumulation if
the sends maintained their rate. There is a clear disparity in the CPU
result between the two tests; was there any discernible difference in
the 512b test alone at the point the receive throughput looks to
reduce?


>
> It appears that increasing the buffer size is a good idea.  I don't think
> we've figured out how much the increase should be, however.  We should look
> at interim sizes:  1K, 2K, maybe 1.5K and 3K.  We want the smallest buffer
> size that gives us acceptable performance.  If throughput, CPU, and memory
> use improve sharply with buffer size then level off, let's identify the
> "knee of the curve" and see what buffer size that represents.
>
>
> >
> > Messages just sat there waiting to get processed, which is maybe why their
> > average latency was *10,000 times longer* than when I used the large
> > buffers.
> >
> > And Nothing Bad Happened in the 4K buffer test. No crash, all messages
> > delivered, normal shutdown.
> >
> > Now I will try a long-duration test to see if it survives that while using
> > the large buffers.
> >
> > If it does survive OK, we need to see what happens with large buffers as
> > message size varies from small to large.
> >




Re: Dispatch Router: Wow. Large message test with different buffer sizes

2021-02-13 Thread Ted Ross
On Fri, Feb 12, 2021 at 1:47 PM Michael Goulish  wrote:

> Well, *this* certainly made a difference!
> I tried this test:
>
> *message size:*  200K bytes
> *client-pairs:*  10
> *sender pause between messages:*  10 msec
> *messages per sender:*  10,000
> *credit window:*  1000
>
>
>
>
>   *Results:*
>
>                router buffer size
>                512 bytes         4K bytes
>   --------------------------------------------
>   CPU          517%              102%
>   Mem          711 MB            59 MB
>   Latency      26.9 *seconds*    2.486 *msec*
>
>
> So with the large messages and our normal buffer size of 1/2 K, the router
> just got overwhelmed. What I recorded was average memory usage, but looking
> at the time sequence I see that its memory kept increasing steadily until
> the end of the test.
>

With the large messages, the credit window is not sufficient to protect the
memory of the router.  I think this test needs to use a limited session
window as well.  This will put back-pressure on the senders much earlier in
the test.  With 200Kbyte messages x 1000 credits x 10 senders, there's a
theoretical maximum of 2Gig of proton buffer memory that can be consumed
before the router core ever moves any data.  It's interesting that in the
4K-buffer case, the router core keeps up with the flow and in the 512-byte
case, it does not.
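
For these Proton C test clients, session-level flow control on the
receiving side could be applied by capping the session's incoming
capacity, roughly as sketched below. The open_receiver() helper and
the 10MB / 1000 figures are illustrative assumptions, not
recommendations.

#include <proton/connection.h>
#include <proton/link.h>
#include <proton/session.h>
#include <proton/terminus.h>

/* Illustrative limits only. */
#define SESSION_CAPACITY_BYTES (10 * 1024 * 1024)  /* cap on buffered session bytes   */
#define CREDIT_WINDOW          1000                /* per-link credit, as in the test */

/* Hypothetical receiver setup, called from the proactor event handler
   once the connection exists, before the session and link are opened. */
static void open_receiver(pn_connection_t *connection, const char *address) {
  pn_session_t *session = pn_session(connection);

  /* Limit how much incoming data Proton will buffer for this session.
     Proton derives the advertised session incoming window from this
     capacity and the connection's frame size, so the peer is
     back-pressured at the session level as well as by link credit. */
  pn_session_set_incoming_capacity(session, SESSION_CAPACITY_BYTES);
  pn_session_open(session);

  pn_link_t *receiver = pn_receiver(session, "receiver-1");
  pn_terminus_set_address(pn_link_source(receiver), address);
  pn_link_open(receiver);

  /* Grant link credit as before. */
  pn_link_flow(receiver, CREDIT_WINDOW);
}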

It appears that increasing the buffer size is a good idea.  I don't think
we've figured out how much the increase should be, however.  We should look
at interim sizes:  1K, 2K, maybe 1.5K and 3K.  We want the smallest buffer
size that gives us acceptable performance.  If throughput, CPU, and memory
use improve sharply with buffer size then level off, let's identify the
"knee of the curve" and see what buffer size that represents.


>
> Messages just sat there waiting to get processed, which is maybe why their
> average latency was *10,000 times longer* than when I used the large
> buffers.
>
> And Nothing Bad Happened in the 4K buffer test. No crash, all messages
> delivered, normal shutdown.
>
> Now I will try a long-duration test to see if it survives that while using
> the large buffers.
>
> If it does survive OK, we need to see what happens with large buffers as
> message size varies from small to large.
>


Dispatch Router: Wow. Large message test with different buffer sizes

2021-02-12 Thread Michael Goulish
Well, *this* certainly made a difference!
I tried this test:

*message size:*  200K bytes
*client-pairs:*  10
*sender pause between messages:*  10 msec
*messages per sender:*  10,000
*credit window:*  1000




  *Results:*

               router buffer size
               512 bytes         4K bytes
  --------------------------------------------
  CPU          517%              102%
  Mem          711 MB            59 MB
  Latency      26.9 *seconds*    2.486 *msec*


So with the large messages and our normal buffer size of 1/2 K, the router
just got overwhelmed. What I recorded was average memory usage, but looking
at the time sequence I see that its memory kept increasing steadily until
the end of the test.

Messages just sat there waiting to get processed, which is maybe why their
average latency was *10,000 times longer* than when I used the large
buffers.

And Nothing Bad Happened in the 4K buffer test. No crash, all messages
delivered, normal shutdown.

Now I will try a long-duration test to see if it survives that while using
the large buffers.

If it does survive OK, we need to see what happens with large buffers as
message size varies from small to large.