Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-03-25 Thread Willy Tarreau
Hi Ashwin,

On Mon, Mar 25, 2019 at 02:51:17PM -0700, Ashwin Neerabail wrote:
> Hi Willy,
> 
> I tested against the latest version in the haproxy source repo.
> Things got significantly worse. Even median latencies have shot up to 150ms
> (compared to 4ms for HAProxy 1.8), and p99 shot up above 1 second.
> One strange thing I observed in the stats page is that nbthread shows up as
> 64 (it's 1 for HAProxy 1.8). I am using the exact same configuration across
> both versions.

That's expected after 2.0-dev1, please have a look at the dev2 announce I
just sent. If you want a single thread, please just set "nbthread 1".
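Roughly like this in your global section (just an illustration, adapt it to
your own config):

    global
        # pin haproxy to a single thread instead of letting 2.0-dev
        # default to one thread per visible CPU
        nbthread 1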

> ctime and rtime are reporting higher values for 2.0 (though 1.8 is fine
> against the same backends at the same time).

What exact version did you try ? I've just issued 2.0-dev2 with important
fixes for issues that were causing some streams to starve for a while and
now I can't reproduce any such delay anymore. But given that some of the
pending issues were addressed yesterday evening, you guess why I'm
interested in knowing which exact version you tried ;-)

Thanks,
Willy



Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-03-25 Thread Ashwin Neerabail
Hi Willy,

I tested against the latest version in the haproxy source repo.
Things got significantly worse. Even median latencies have shot up to 150ms
(compared to 4ms for HAProxy 1.8), and p99 shot up above 1 second.
One strange thing I observed in the stats page is that nbthread shows up as
64 (it's 1 for HAProxy 1.8). I am using the exact same configuration across
both versions.
ctime and rtime are reporting higher values for 2.0 (though 1.8 is fine
against the same backends at the same time).

-Ashwin



Thanks,
Ashwin

On Fri, Mar 22, 2019 at 11:03 AM Ashwin Neerabail  wrote:

> Hey Willy,
>
> That's great news. Thanks for the quick action.
> I will verify and get back.
>
> Thanks,
> Ashwin
>
> On Fri, Mar 22, 2019 at 10:19 AM Willy Tarreau  wrote:
>
>> Hi Ashwin,
>>
>> We have found the root cause of this. The H2 streams were not getting
>> the fairness they deserved due to their wake-up ordering : it happened
>> very often that a stream interrupted on a mux buffer full condition could
>> be placed at the end of the list and/or have its place preempted by another
>> stream trying to send for the first time.
>>
>> We've pushed all the fixes for this in 2.0-dev for now and I'll backport
>> them to 1.9 early next week. It would be nice if you could give it a try
>> to confirm that it's now OK for you.
>>
>> Cheers,
>> Willy
>>
>


Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-03-22 Thread Ashwin Neerabail
Hey Willy,

That's great news. Thanks for the quick action.
I will verify and get back.

Thanks,
Ashwin

On Fri, Mar 22, 2019 at 10:19 AM Willy Tarreau  wrote:

> Hi Ashwin,
>
> We have found the root cause of this. The H2 streams were not getting
> the fairness they deserved due to their wake-up ordering : it happened
> very often that a stream interrupted on a mux buffer full condition could
> be placed at the end of the list and/or have its place preempted by another
> stream trying to send for the first time.
>
> We've pushed all the fixes for this in 2.0-dev for now and I'll backport
> them to 1.9 early next week. It would be nice if you could give it a try
> to confirm that it's now OK for you.
>
> Cheers,
> Willy
>


Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-03-22 Thread Willy Tarreau
Hi Ashwin,

We have found the root cause of this. The H2 streams were not getting
the fairness they deserved due to their wake-up ordering : it happened
very often that a stream interrupted on a mux buffer full condition could
be placed at the end of the list and/or have its place preempted by another
stream trying to send for the first time.

We've pushed all the fixes for this in 2.0-dev for now and I'll backport
them to 1.9 early next week. It would be nice if you could give it a try
to confirm that it's now OK for you.

Cheers,
Willy



Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-03-19 Thread Ashwin Neerabail
Hi Willy,

I have saved the stats page of both the server haproxy (egress) and the client
haproxy (ingress): https://cloud.box.com/s/5wnnm0mcrla1w28101g6jtjkod5p0ryt


- did you enable threads on 1.9 ?

On the test setup, for the client haproxy, it's set to 1 thread (1 core).
In production, yes.
On the server (ingress) haproxy, it's set to 3 threads (3 cores).


  - do you have a "maxconn" setting on your server lines ?
On the server:

pid = 30 (process #1, nbproc = 1, nbthread = 3)
uptime = 3d 23h31m15s
system limits: memmax = unlimited; ulimit-n = 40037
maxsock = 40037; maxconn = 2; maxpipes = 0
current conns = 1; current pipes = 0/0; conn rate = 0/sec
Running tasks: 1/28; idle = 100 %


  - if so do you know if you've ever had some queue on the
backend caused by this maxconn setting ? This can be seen
in the stats page under the "Queue/Max" column.

Cur/Max in Queue section is all 0.


  - do you observe connection retries in your stats page ? This
could explain the higher latency. Maybe connections time
out quickly and can't be reused, or maybe we fail to allocate
some from time to time due to a low file descriptor limit which
is hit earlier when server-side pools are enabled.

Retries: 0

  - do you observe the problem if you put "http-reuse always" on
your 1.8 setup as well (I guess not since you said it doesn't
fail on 1.9 as soon as you remove server pools)?
No. With http-reuse always on 1.8 there is no change (from aggressive) and
latency does not spike.
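For reference, a minimal sketch of what I set on the 1.8 side for that test
(only the reuse directive is shown; server lines and real names omitted):

    backend test-echo-server
        mode http
        # forced reuse for comparison; we normally run "http-reuse aggressive"
        http-reuse always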


On Mon, Mar 18, 2019 at 11:11 AM Willy Tarreau  wrote:

> Hi Ashwin,
>
> On Mon, Mar 18, 2019 at 10:57:45AM -0700, Ashwin Neerabail wrote:
> > Hi Willy,
> >
> > Thanks for the reply.
> >
> > My Test setup:
> > Client Server1 using local HAProxy 1.9 > 2 Backend servers  and
> > Client Server2 using local HAProxy 1.8 > same 2 backend servers.
> >
> > I am measuring latency from the client server.
> > So when I run 1000rps test , 50% of them end up on 1.9 and 50% on 1.8. So
> > if the backend servers have a problem , 1.8 should show similar high
> > latency too.
>
> Indeed.
>
> > However consistently only 1.9 client shows latency.
> >
> > I even tested this against real traffic in production against various
> > backends (Java Netty, Java Tomcat, Nginx). Across the board we saw
> > similar latency spikes when we tested 1.9.
>
> This is quite useful, especially with nginx which is known for not being
> too much bothered by idle connections and that we've extensively tested
> during the server pools design as well.
>
> Now I'll have some questions to dig this issue further :
>   - did you enable threads on 1.9 ?
>   - do you have a "maxconn" setting on your server lines ?
>   - if so do you know if you've ever had some queue on the
> backend caused by this maxconn setting ? This can be seen
> in the stats page under the "Queue/Max" column.
>   - do you observe connection retries in your stats page ? This
> could explain the higher latency. Maybe connections time
> out quickly and can't be reused, or maybe we fail to allocate
> some from time to time due to a low file descriptor limit which
> is hit earlier when server-side pools are enabled.
>   - do you observe the problem if you put "http-reuse always" on
> your 1.8 setup as well (I guess not since you said it doesn't
> fail on 1.9 as soon as you remove server pools)?
>
> Thanks,
> Willy
>


Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-03-18 Thread Willy Tarreau
Hi Ashwin,

On Mon, Mar 18, 2019 at 10:57:45AM -0700, Ashwin Neerabail wrote:
> Hi Willy,
> 
> Thanks for the reply.
> 
> My Test setup:
> Client Server1 using local HAProxy 1.9 > 2 Backend servers  and
> Client Server2 using local HAProxy 1.8 > same 2 backend servers.
> 
> I am measuring latency from the client server.
> So when I run 1000rps test , 50% of them end up on 1.9 and 50% on 1.8. So
> if the backend servers have a problem , 1.8 should show similar high
> latency too.

Indeed.

> However consistently only 1.9 client shows latency.
>
> I even tested this against real traffic in production against various
> backends (Java Netty, Java Tomcat, Nginx). Across the board we saw
> similar latency spikes when we tested 1.9.

This is quite useful, especially with nginx which is known for not being
too much bothered by idle connections and that we've extensively tested
during the server pools design as well.

Now I'll have some questions to dig this issue further :
  - did you enable threads on 1.9 ?
  - do you have a "maxconn" setting on your server lines ?
  - if so do you know if you've ever had some queue on the
backend caused by this maxconn setting ? This can be seen
in the stats page under the "Queue/Max" column.
  - do you observe connection retries in your stats page ? This
could explain the higher latency. Maybe connections time
out quickly and can't be reused, or maybe we fail to allocate
some from time to time due to a low file descriptor limit which
is hit earlier when server-side pools are enabled.
  - do you observe the problem if you put "http-reuse always" on
your 1.8 setup as well (I guess not since you said it doesn't
fail on 1.9 as soon as you remove server pools)?
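To illustrate the maxconn point, this is the kind of per-server setting I
mean (names and values below are only an example, not taken from your
config):

    backend app
        mode http
        # per-server concurrency cap; requests in excess of it wait in the
        # backend queue and show up under the Queue/Max column of the stats
        server srv1 10.0.0.1:8080 maxconn 100 check
        server srv2 10.0.0.2:8080 maxconn 100 check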

Thanks,
Willy



Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-03-18 Thread Ashwin Neerabail
Hi Willy,

Thanks for the reply.

My Test setup:
Client Server1 using local HAProxy 1.9 > 2 Backend servers  and
Client Server2 using local HAProxy 1.8 > same 2 backend servers.

I am measuring latency from the client server.
So when I run a 1000 rps test, 50% of the requests end up on 1.9 and 50% on
1.8, so if the backend servers have a problem, 1.8 should show similar high
latency too.
However, consistently only the 1.9 client shows the higher latency.

I even tested this against real traffic in production against various
backends (Java Netty, Java Tomcat, Nginx). Across the board we saw
similar latency spikes when we tested 1.9.

Thanks,
Ashwin



On Thu, Feb 28, 2019 at 8:17 PM Willy Tarreau  wrote:

> Ashwin,
>
> I've taken some time to read your tests completely now, and something
> bothers me :
>
> On Mon, Feb 25, 2019 at 11:11:08AM -0800, Ashwin Neerabail wrote:
> > > - by disabling server-side idle connections (using "pool-max-conn 0" on
> > >  the server) though "http-reuse never" should be equivalent
> > >
> > > This seems to have done the trick. Adding `pool-max-conn 0` or
> > > `http-reuse never` fixes the problem.
> > > 1.8 and 1.9 perform similarly (client app that calls haproxy is using
> > > connection pooling). *Unfortunately , we have legacy clients that close
> > > connections to front end for every request.*
>
> Well, the thing is that haproxy 1.8 doesn't have connection pooling and
> 1.9 does. So this means that there is no regression between 1.8 and 1.9
> when using the same features. However connection pooling exhibits extra
> latency. Are you really sure that your server remains performant when
> dealing with idle connections ? Maybe it has an accept dispatcher with
> a small queue and has trouble dealing with too many idle connections ?
>
> > > CPU Usage for 1.8 and 1.9 was same ~22%.
> > >
> > >- by placing an unconditional redirect rule in your backend so that we
> > >  check how it performs when the connection doesn't leave :
> > >  http-request redirect location /
> > >
> > > Tried adding monitor-uri and returning from remote haproxy rather than
> > > hitting backend server.
> > > Strangely , in this case I see nearly identical performance /CPU usage
> > > with 1.8 and 1.9 even with http reuse set to aggressive.
> > > CPU Usage for 1.8 and 1.9 was same ~35%.
> > > *Set up is Client > HAProxy > HAProxy (with monitor-uri) > Server.*
>
> Ah this test is extremely interesting! It indeed shows that the only
> difference appears when reaching the server. But if the server has
> trouble with idle connections, why don't you disable them on haproxy ?
> As you've seen you can simply do that with "pool-max-conn 0" on the
> server lines. You could even try with different values. It might be
> possible that past a certain point the server's accept queue explodes
> and that's when it starts to have problems. You could try with a limited
> value, e.g. "pool-max-conn 10" then "pool-max-conn 100" etc and see
> where it starts to break.
>
> Regards,
> Willy
>


Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-02-28 Thread Willy Tarreau
Ashwin,

I've taken some time to read your tests completely now, and something
bothers me :

On Mon, Feb 25, 2019 at 11:11:08AM -0800, Ashwin Neerabail wrote:
> > - by disabling server-side idle connections (using "pool-max-conn 0" on
> >  the server) though "http-reuse never" should be equivalent
> >
> > This seems to have done the trick. Adding `pool-max-conn 0` or `http-reuse
> > never` fixes the problem.
> > 1.8 and 1.9 perform similarly (client app that calls haproxy is using
> > connection pooling). *Unfortunately , we have legacy clients that close
> > connections to front end for every request.*

Well, the thing is that haproxy 1.8 doesn't have connection pooling and
1.9 does. So this means that there is no regression between 1.8 and 1.9
when using the same features. However connection pooling exhibits extra
latency. Are you really sure that your server remains performant when
dealing with idle connections ? Maybe it has an accept dispatcher with
a small queue and has trouble dealing with too many idle connections ?

> > CPU Usage for 1.8 and 1.9 was same ~22%.
> >
> >- by placing an unconditional redirect rule in your backend so that we
> >  check how it performs when the connection doesn't leave :
> >  http-request redirect location /
> >
> > Tried adding monitor-uri and returning from remote haproxy rather than
> > hitting backend server.
> > Strangely , in this case I see nearly identical performance /CPU usage
> > with 1.8 and 1.9 even with http reuse set to aggressive.
> > CPU Usage for 1.8 and 1.9 was same ~35%.
> > *Set up is Client > HAProxy > HAProxy (with monitor-uri) > Server.*

Ah this test is extremely interesting! It indeed shows that the only
difference appears when reaching the server. But if the server has
trouble with idle connections, why don't you disable them on haproxy ?
As you've seen you can simply do that with "pool-max-conn 0" on the
server lines. You could even try with different values. It might be
possible that past a certain point the server's accept queue explodes
and that's when it starts to have problems. You could try with a limited
value, e.g. "pool-max-conn 10" then "pool-max-conn 100" etc and see
where it starts to break.
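For example, something along these lines on your server lines (addresses and
values here are purely illustrative):

    backend test-echo-server
        mode http
        # start with a small idle-connection pool and raise it
        # (0 -> 10 -> 100 ...) until the latency problem reappears
        server srv1 10.0.0.1:8080 pool-max-conn 10 check
        server srv2 10.0.0.2:8080 pool-max-conn 10 check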

Regards,
Willy



Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-02-26 Thread Willy Tarreau
On Mon, Feb 25, 2019 at 11:11:08AM -0800, Ashwin Neerabail wrote:
> Any ideas on this ? Seeing issues with HAProxy 1.9 performance with
> connection pooling turned on.

No idea for now, we really need to find a way to accurately measure
this in order to spot when the problem happens. It could be anything.

Willy



Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-02-25 Thread Ashwin Neerabail
Any ideas on this? Seeing issues with HAProxy 1.9 performance with
connection pooling turned on.

Thanks.

On Wed, Feb 13, 2019 at 3:46 PM Ashwin Neerabail  wrote:

> Hi Willy,
>
> Thank you for the detailed response. Sorry for the delay in response.
>
> I ran all the combinations multiple times  to ensure consistent
> reproducibility.
> Here is what I found:
>
> Test Setup (same as last time):
> 2 Kube pods one running Haproxy 1.8.17 and another running
> 1.9.2  loadbalancing across 2 backend pods.
> Haproxy container is given 1 CPU , 1 GB Memory.
> 500 rps per pod test , latencies calculated for 1 min window.
>
> - previous results for comparison
> * Haproxy 1.9 - p99 is ~ 20ms , p95 is ~ 11ms , median is 5.5ms
> * Haproxy 1.8 - p99 is ~ 8ms , p95 is ~ 6ms, median is 4.5ms
> * Haproxy 1.9 : Memory usage - 130MB , CPU : 55% util
> * Haproxy 1.8 : Memory usage - 90MB , CPU : 45% util
>
>  - without SSL
> HAProxy 1.8 performs slightly better than 1.9
> * Haproxy 1.9 - p99 is ~ 9ms , p95 is ~ 3.5ms , median is 2.3ms*
> * Haproxy 1.8 - p99 is ~ 5ms , p95 is ~ 2.5ms, median is 1.7ms*
> CPU Usage is identical. (0.1% CPU)
>
> - by disabling server-side idle connections (using "pool-max-conn 0" on
>  the server) though "http-reuse never" should be equivalent
>
> This seems to have done the trick. Adding `pool-max-conn 0` or `http-reuse
> never` fixes the problem.
> 1.8 and 1.9 perform similarly (client app that calls haproxy is using
> connection pooling). *Unfortunately , we have legacy clients that close
> connections to front end for every request.*
> CPU Usage for 1.8 and 1.9 was same ~22%.
>
>- by placing an unconditional redirect rule in your backend so that we
>  check how it performs when the connection doesn't leave :
>  http-request redirect location /
>
> Tried adding monitor-uri and returning from remote haproxy rather than
> hitting backend server.
> Strangely , in this case I see nearly identical performance /CPU usage
> with 1.8 and 1.9 even with http reuse set to aggressive.
> CPU Usage for 1.8 and 1.9 was same ~35%.
> *Set up is Client > HAProxy > HAProxy (with monitor-uri) > Server.*
>
> If you're running harmless tests, you can pick the latest nightly snapshot
> of 2.0-dev which is very close to what 1.9.4 will be.
> I also tried the perf tests with 2.0-dev. It shows the same behavior as
> 1.9.
>
> If you have potential fixes / settings / other debugging steps that can be
> tweaked - I can test them out and publish the results.
> Thanks for your help.
>
> -Ashwin
>
>
> On Thu, Jan 31, 2019 at 1:43 PM Willy Tarreau  wrote:
>
>> Hi Ashwin,
>>
>> On Thu, Jan 31, 2019 at 10:32:33AM -0800, Ashwin Neerabail wrote:
>> > Hi,
>> >
>> > We are in process of upgrading to HAProxy 1.9 and we are seeing consistent
>> > high latency with HAProxy 1.9.2 as compared to 1.8.17 when using HTTP Mode
>> > (both with and without TLS). However no latency issues with TCP Mode.
>> >
>> > Test Setup:
>> > 2 Kube pods one running Haproxy 1.8.17 and another running 1.9.2
>> > loadbalancing across 2 backend pods.
>> > Haproxy container is given 1 CPU , 1 GB Memory.
>> > 500 rps per pod test , latencies calculated for 1 min window.
>> >
>> > Latencies as measured by client:
>> >
>> > *When running TCP Mode, the p99 latency between 1.9 and 1.8 is the same.*
>> > *When running HTTP Mode (with TLS),*
>> > *Haproxy 1.9 - p99 is ~ 20ms , p95 is ~ 11ms , median is 5.5ms*
>> > *Haproxy 1.8 - p99 is ~ 8ms , p95 is ~ 6ms, median is 4.5ms*
>>
>> The difference is huge, I'm wondering if it could be caused by a last TCP
>> segment being sent 40ms too late once in a while. Otherwise I'm having a
>> hard time imagining what can take so long a time at 500 Rps!
>>
>> In case you can vary some test parameters to try to narrow this down, it
>> would be interesting to try again :
>>- without SSL
>>- by disabling server-side idle connections (using "pool-max-conn 0" on
>>  the server) though "http-reuse never" should be equivalent
>>- by placing an unconditional redirect rule in your backend so that we
>>  check how it performs when the connection doesn't leave :
>>  http-request redirect location /
>>
>> > This increased latency is reproducible across multiple runs with 100%
>> > consistency.
>> > Haproxy reported metrics for connections and requests are the same for
>> > both 1.8 and 1.9.
>> >
>> > Haproxy 1.9 : Memory usage - 130MB , CPU : 55% util
>> > Haproxy 1.8 : Memory usage - 90MB , CPU : 45% util
>>
>> That's quite interesting, it could indicate some excessive SSL
>> renegotiations. Regarding the extra RAM, I have no idea though. It could
>> be the result of a leak though.
>>
>> Trying 1.9.3 would obviously help, since it fixes a number of issues, even
>> if at first glance I'm not spotting one which could explain this. And I'd
>> be interested in another attempt once 1.9.4 is ready since it fixes many
>> backend-side connection issues. If you're running harmless tests, 

Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-02-13 Thread Ashwin Neerabail
Hi Willy,

Thank you for the detailed response. Sorry for the delay in response.

I ran all the combinations multiple times to ensure consistent
reproducibility.
Here is what I found:

Test Setup (same as last time):
2 Kube pods, one running HAProxy 1.8.17 and another running
1.9.2, load balancing across 2 backend pods.
Haproxy container is given 1 CPU , 1 GB Memory.
500 rps per pod test , latencies calculated for 1 min window.

- previous results for comparison
* Haproxy 1.9 - p99 is ~ 20ms , p95 is ~ 11ms , median is 5.5ms
* Haproxy 1.8 - p99 is ~ 8ms , p95 is ~ 6ms, median is 4.5ms
* Haproxy 1.9 : Memory usage - 130MB , CPU : 55% util
* Haproxy 1.8 : Memory usage - 90MB , CPU : 45% util

 - without SSL
HAProxy 1.8 performs slightly better than 1.9
* Haproxy 1.9 - p99 is ~ 9ms , p95 is ~ 3.5ms , median is 2.3ms*
* Haproxy 1.8 - p99 is ~ 5ms , p95 is ~ 2.5ms, median is 1.7ms*
CPU Usage is identical. (0.1% CPU)

- by disabling server-side idle connections (using "pool-max-conn 0" on
 the server) though "http-reuse never" should be equivalent

This seems to have done the trick. Adding `pool-max-conn 0` or `http-reuse
never` fixes the problem.
1.8 and 1.9 perform similarly (client app that calls haproxy is using
connection pooling). *Unfortunately, we have legacy clients that close
connections to the front end for every request.*
CPU Usage for 1.8 and 1.9 was same ~22%.
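Concretely, the change that fixed it was just this on the server lines (a
sketch only; names and addresses are placeholders, not our real config):

    backend test-echo-server
        mode http
        # disable the server-side idle connection pool entirely
        server srv1 10.0.0.1:8080 pool-max-conn 0 check
        server srv2 10.0.0.2:8080 pool-max-conn 0 check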

   - by placing an unconditional redirect rule in your backend so that we
 check how it performs when the connection doesn't leave :
 http-request redirect location /

I tried adding monitor-uri and returning from the remote haproxy rather than
hitting the backend server.
Strangely, in this case I see nearly identical performance/CPU usage with
1.8 and 1.9 even with http-reuse set to aggressive.
CPU usage for 1.8 and 1.9 was the same, ~35%.
*Set up is Client > HAProxy > HAProxy (with monitor-uri) > Server.*
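Roughly, the second hop looked like this (the listener address and URI below
are placeholders, not the real config):

    frontend echo-monitor
        bind :8080
        mode http
        # answer this URI locally so the request never reaches the
        # backend servers
        monitor-uri /monitor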

If you're running harmless tests, you can pick the latest nightly snapshot
of 2.0-dev which is very close to what 1.9.4 will be.
I also tried the perf tests with 2.0-dev. It shows the same behavior as 1.9.

If you have potential fixes / settings / other debugging steps that can be
tweaked - I can test them out and publish the results.
Thanks for your help.

-Ashwin


On Thu, Jan 31, 2019 at 1:43 PM Willy Tarreau  wrote:

> Hi Ashwin,
>
> On Thu, Jan 31, 2019 at 10:32:33AM -0800, Ashwin Neerabail wrote:
> > Hi,
> >
> > We are in process of upgrading to HAProxy 1.9 and we are seeing consistent
> > high latency with HAProxy 1.9.2 as compared to 1.8.17 when using HTTP Mode
> > (both with and without TLS). However no latency issues with TCP Mode.
> >
> > Test Setup:
> > 2 Kube pods one running Haproxy 1.8.17 and another running 1.9.2
> > loadbalancing across 2 backend pods.
> > Haproxy container is given 1 CPU , 1 GB Memory.
> > 500 rps per pod test , latencies calculated for 1 min window.
> >
> > Latencies as measured by client:
> >
> > *When running TCP Mode, the p99 latency between 1.9 and 1.8 is the same.*
> > *When running HTTP Mode (with TLS),*
> > *Haproxy 1.9 - p99 is ~ 20ms , p95 is ~ 11ms , median is 5.5ms*
> > *Haproxy 1.8 - p99 is ~ 8ms , p95 is ~ 6ms, median is 4.5ms*
>
> The difference is huge, I'm wondering if it could be caused by a last TCP
> segment being sent 40ms too late once in a while. Otherwise I'm having a
> hard time imagining what can take so long a time at 500 Rps!
>
> In case you can vary some test parameters to try to narrow this down, it
> would be interesting to try again :
>- without SSL
>- by disabling server-side idle connections (using "pool-max-conn 0" on
>  the server) though "http-reuse never" should be equivalent
>- by placing an unconditional redirect rule in your backend so that we
>  check how it performs when the connection doesn't leave :
>  http-request redirect location /
>
> > This increased latency is reproducible across multiple runs with 100%
> > consistency.
> > Haproxy reported metrics for connections and requests are the same for
> > both 1.8 and 1.9.
> >
> > Haproxy 1.9 : Memory usage - 130MB , CPU : 55% util
> > Haproxy 1.8 : Memory usage - 90MB , CPU : 45% util
>
> That's quite interesting, it could indicate some excessive SSL
> renegotiations. Regarding the extra RAM, I have no idea though. It could
> be the result of a leak though.
>
> Trying 1.9.3 would obviously help, since it fixes a number of issues, even
> if at first glance I'm not spotting one which could explain this. And I'd
> be interested in another attempt once 1.9.4 is ready since it fixes many
> backend-side connection issues. If you're running harmless tests, you can
> pick the latest nightly snapshot of 2.0-dev which is very close to what
> 1.9.4 will be. But already, testing the points above to bisect the issues
> will help.
>
> > Please let me know if I can provide any more details on this.
>
> In 1.9 we also have the ability to watch more details (per-connection
> CPU timing, 

Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-01-31 Thread Willy Tarreau
Hi Ashwin,

On Thu, Jan 31, 2019 at 10:32:33AM -0800, Ashwin Neerabail wrote:
> Hi,
> 
> We are in process of upgrading to HAProxy 1.9 and we are seeing consistent
> high latency with HAProxy 1.9.2 as compared to 1.8.17 when using HTTP Mode
> ( both with and without TLS). However no latency issues with TCP Mode.
> 
> Test Setup:
> 2 Kube pods one running Haproxy 1.8.17 and another running 1.9.2
> loadbalancing across 2 backend pods.
> Haproxy container is given 1 CPU , 1 GB Memory.
> 500 rps per pod test , latencies calculated for 1 min window.
> 
> Latencies as measured by client:
> 
> *When running TCP Mode , the p99 latency between 1.9 and 1.8 is the same.*
> *When running HTTP Mode (with TLS),*
> *Haproxy 1.9 - p99 is ~ 20ms , p95 is ~ 11ms , median is 5.5ms*
> *Haproxy 1.8 - p99 is ~ 8ms , p95 is ~ 6ms, median is 4.5ms*

The difference is huge, I'm wondering if it could be caused by a last TCP
segment being sent 40ms too late once in a while. Otherwise I'm having a
hard time imagining what can take so long a time at 500 Rps!

In case you can vary some test parameters to try to narrow this down, it
would be interesting to try again :
   - without SSL
   - by disabling server-side idle connections (using "pool-max-conn 0" on
 the server) though "http-reuse never" should be equivalent
   - by placing an unconditional redirect rule in your backend so that we
 check how it performs when the connection doesn't leave :
 http-request redirect location /
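For the last point, a minimal sketch of what I mean (the backend name is just
an example):

    backend test-echo-server
        mode http
        # answer with a redirect directly from haproxy so the request
        # never reaches the server
        http-request redirect location /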

> This increased latency is reproducible across multiple runs with 100%
> consistency.
> Haproxy reported metrics for connections and requests are the same for both
> 1.8 and 1.9.
> 
> Haproxy 1.9 : Memory usage - 130MB , CPU : 55% util
> Haproxy 1.8 : Memory usage - 90MB , CPU : 45% util

That's quite interesting, it could indicate some excessive SSL
renegotiations. Regarding the extra RAM, I have no idea; it could
be the result of a leak.

Trying 1.9.3 would obviously help, since it fixes a number of issues, even
if at first glance I'm not spotting one which could explain this. And I'd
be interested in another attempt once 1.9.4 is ready since it fixes many
backend-side connection issues. If you're running harmless tests, you can
pick the latest nightly snapshot of 2.0-dev which is very close to what
1.9.4 will be. But already, testing the points above to bisect the issues
will help.

> Please let me know if I can provide any more details on this.

In 1.9 we also have the ability to watch more details (per-connection
CPU timing, stolen CPU, etc). Some of them may be immediately retrieved
using "show info" and "show activity" on the CLI during the test. Others
will require some config adjustments to log extra fields and will take
some time to diagnose. Since nothing stands out of the crowd in your
config, I don't think it's necessary for now.
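If we get to that point, the idea would be a custom log-format along these
lines (only a sketch using the classic per-phase timers, to be adapted to
your shared-frontend):

    frontend shared-frontend
        mode http
        # Tq=request, Tw=queue, Tc=connect, Tr=response, Tt=total; logging
        # them separately shows where the extra time is spent
        log-format "%ci:%cp [%t] %ft %b/%s %Tq/%Tw/%Tc/%Tr/%Tt %ST %B"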

Willy



High p99 latency with HAProxy 1.9 in http mode compared to 1.8

2019-01-31 Thread Ashwin Neerabail
Hi,

We are in the process of upgrading to HAProxy 1.9 and we are seeing
consistently higher latency with HAProxy 1.9.2 compared to 1.8.17 when using
HTTP Mode (both with and without TLS). However, there are no latency issues
with TCP Mode.

Test Setup:
2 Kube pods, one running HAProxy 1.8.17 and another running 1.9.2,
load balancing across 2 backend pods.
The HAProxy container is given 1 CPU and 1 GB of memory.
500 rps per pod test, latencies calculated over a 1 min window.

Latencies as measured by client:

*When running TCP Mode, the p99 latency between 1.9 and 1.8 is the same.*
*When running HTTP Mode (with TLS),*
*Haproxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms*
*Haproxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms*

This increased latency is reproducible across multiple runs with 100%
consistency.
Haproxy reported metrics for connections and requests are the same for both
1.8 and 1.9.

Haproxy 1.9 : Memory usage - 130MB , CPU : 55% util
Haproxy 1.8 : Memory usage - 90MB , CPU : 45% util

Please let me know if I can provide any more details on this.

Is there any setting I should be using? I have tried various combinations
of http-reuse and SSL tuning settings without much luck.

Here is the haproxy.cfg dump:

global
daemon
user container
group container
maxconn 2
log 127.0.0.1 local1
stats   socket /var/lib/haproxy/stats mode 666 level admin
pidfile /var/run/haproxy.pid

defaults
log  global
option   dontlognull
option   contstats
option   tcplog
maxconn  2
timeout  connect 5s
timeout  client  120m
timeout  server  120m
balance  roundrobin

listen stats
bind :8081
mode http
stats enable
stats uri /
stats refresh 5s
http-request set-log-level silent

backend test-echo-server...
mode http
timeout server 120m
timeout connect 5s
server... 1 check inter 30s rise 3 fall 2 ssl crt .pem ca-file
pem verify required verifyhost test-echo-server
server... 1 check inter 30s rise 3 fall 2 ssl crt .pem ca-file
pem verify required verifyhost test-echo-server

frontend shared-frontend
bind 127.0.0.1:80
mode http
option httplog
acl is_test-echo-server hdr_dom(host) -m reg ^test-echo-server\.localhost(:80)?$
use_backend test-echo-server... if is_test-echo-server
option http-keep-alive

Thanks ,
Ashwin