Hi Willy,

Thank you for the detailed response, and sorry for the delayed reply.
I ran all the combinations multiple times to make sure the results reproduce consistently. Here is what I found.

Test setup (same as last time):
- 2 Kube pods, one running HAProxy 1.8.17 and the other 1.9.2, load balancing across 2 backend pods.
- Each HAProxy container is given 1 CPU and 1 GB of memory.
- 500 rps per pod, latencies calculated over a 1-minute window.

Previous results, for comparison:
* HAProxy 1.9 - p99 ~20ms, p95 ~11ms, median 5.5ms
* HAProxy 1.8 - p99 ~8ms, p95 ~6ms, median 4.5ms
* HAProxy 1.9: memory usage 130MB, CPU 55% util
* HAProxy 1.8: memory usage 90MB, CPU 45% util

- Without SSL: HAProxy 1.8 still performs slightly better than 1.9.
  * HAProxy 1.9 - p99 ~9ms, p95 ~3.5ms, median 2.3ms
  * HAProxy 1.8 - p99 ~5ms, p95 ~2.5ms, median 1.7ms
  CPU usage is identical (0.1% CPU).

- Disabling server-side idle connections ("pool-max-conn 0" on the server; "http-reuse never" should be equivalent): this seems to have done the trick. Adding `pool-max-conn 0` or `http-reuse never` fixes the problem, and 1.8 and 1.9 perform similarly (the client app that calls HAProxy uses connection pooling). Unfortunately, we have legacy clients that close the frontend connection after every request. CPU usage for 1.8 and 1.9 was the same, ~22%.

- Unconditional redirect in the backend ("http-request redirect location /") so the connection doesn't leave: I tried adding monitor-uri and returning from the remote HAProxy rather than hitting the backend server. Strangely, in this case I see nearly identical performance and CPU usage with 1.8 and 1.9, even with http-reuse set to aggressive. CPU usage for 1.8 and 1.9 was the same, ~35%. Setup is Client > HAProxy > HAProxy (with monitor-uri) > Server.

- 2.0-dev nightly snapshot (which you said is very close to what 1.9.4 will be): I also ran the perf tests with 2.0-dev. It shows the same behavior as 1.9.
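For reference, here is a minimal sketch of the settings tested above; the backend/frontend names, server names, addresses, and ports are illustrative placeholders, not our real config:

```
# Backend used for the "no server-side idle connections" run.
# be_app / app1 / app2 and the addresses are placeholders.
backend be_app
    mode http
    http-reuse never                                  # disable reuse for the whole backend...
    server app1 10.0.0.1:8080 check pool-max-conn 0   # ...or disable the idle pool per server
    server app2 10.0.0.2:8080 check pool-max-conn 0

# Second hop in the Client > HAProxy > HAProxy > Server chain, answering
# the test URI locally via monitor-uri so the request never reaches a server.
frontend fe_monitor
    bind :8081
    mode http
    monitor-uri /health
```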
If you have potential fixes, settings, or other debugging steps to try, I can test them out and publish the results.

Thanks for your help.
-Ashwin

On Thu, Jan 31, 2019 at 1:43 PM Willy Tarreau <w...@1wt.eu> wrote:
> Hi Ashwin,
>
> On Thu, Jan 31, 2019 at 10:32:33AM -0800, Ashwin Neerabail wrote:
> > Hi,
> >
> > We are in process of upgrading to HAProxy 1.9 and we are seeing consistent
> > high latency with HAProxy 1.9.2 as compared to 1.8.17 when using HTTP Mode
> > (both with and without TLS). However no latency issues with TCP Mode.
> >
> > Test Setup:
> > 2 Kube pods one running Haproxy 1.8.17 and another running 1.9.2
> > loadbalancing across 2 backend pods.
> > Haproxy container is given 1 CPU, 1 GB Memory.
> > 500 rps per pod test, latencies calculated for 1 min window.
> >
> > Latencies as measured by client:
> >
> > *When running TCP Mode, the p99 latency between 1.9 and 1.8 is the same.*
> > *When running HTTP Mode (with TLS),*
> > *Haproxy 1.9 - p99 is ~20ms, p95 is ~11ms, median is 5.5ms*
> > *Haproxy 1.8 - p99 is ~8ms, p95 is ~6ms, median is 4.5ms*
>
> The difference is huge, I'm wondering if it could be caused by a last TCP
> segment being sent 40ms too late once in a while. Otherwise I'm having a
> hard time imagining what can take so long a time at 500 Rps!
>
> In case you can vary some test parameters to try to narrow this down, it
> would be interesting to try again:
> - without SSL
> - by disabling server-side idle connections (using "pool-max-conn 0" on
>   the server) though "http-reuse never" should be equivalent
> - by placing an unconditional redirect rule in your backend so that we
>   check how it performs when the connection doesn't leave:
>       http-request redirect location /
>
> > This increased latency is reproducible across multiple runs with 100%
> > consistency.
> > Haproxy reported metrics for connections and requests are the same for
> > both 1.8 and 1.9.
> >
> > Haproxy 1.9 : Memory usage - 130MB, CPU : 55% util
> > Haproxy 1.8 : Memory usage - 90MB, CPU : 45% util
>
> That's quite interesting, it could indicate some excessive SSL
> renegotiations. Regarding the extra RAM, I have no idea though. It could
> be the result of a leak.
>
> Trying 1.9.3 would obviously help, since it fixes a number of issues, even
> if at first glance I'm not spotting one which could explain this. And I'd
> be interested in another attempt once 1.9.4 is ready since it fixes many
> backend-side connection issues. If you're running harmless tests, you can
> pick the latest nightly snapshot of 2.0-dev which is very close to what
> 1.9.4 will be. But already, testing the points above to bisect the issues
> will help.
>
> > Please let me know if I can provide any more details on this.
>
> In 1.9 we also have the ability to watch more details (per-connection
> CPU timing, stolen CPU, etc). Some of them may be immediately retrieved
> using "show info" and "show activity" on the CLI during the test. Others
> will require some config adjustments to log extra fields and will take
> some time to diagnose. Since nothing stands out of the crowd in your
> config, I don't think it's necessary for now.
>
> Willy
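For anyone else following along: the "show info" / "show activity" counters mentioned above are read over the runtime stats socket. A sketch, assuming the socket path below matches the `stats socket` line in your global section (the path is a placeholder):

```shell
# Assumes haproxy.cfg exposes a runtime CLI socket, e.g.:
#   global
#       stats socket /var/run/haproxy.sock mode 600 level admin
#
# Dump process-wide counters (uptime, connection rates, memory, etc.):
echo "show info" | socat stdio /var/run/haproxy.sock

# Dump per-thread activity counters (loops, wakeups, stolen CPU, etc.):
echo "show activity" | socat stdio /var/run/haproxy.sock
```

Capturing these during the load test, for both 1.8 and 1.9 side by side, makes the diff easy to spot.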