Any ideas on this? We are seeing issues with HAProxy 1.9 performance with connection pooling turned on.
Thanks.

On Wed, Feb 13, 2019 at 3:46 PM Ashwin Neerabail <ash...@box.com> wrote:
> Hi Willy,
>
> Thank you for the detailed response. Sorry for the delay in responding.
>
> I ran all the combinations multiple times to ensure the results were
> consistently reproducible. Here is what I found:
>
> Test setup (same as last time):
> 2 Kube pods, one running HAProxy 1.8.17 and another running 1.9.2,
> load balancing across 2 backend pods.
> The HAProxy container is given 1 CPU and 1 GB of memory.
> 500 rps per pod, latencies calculated over a 1-minute window.
>
> - Previous results, for comparison:
> * HAProxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms
> * HAProxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms
> * HAProxy 1.9: memory usage 130MB, CPU 55% util
> * HAProxy 1.8: memory usage 90MB, CPU 45% util
>
> - Without SSL:
> HAProxy 1.8 performs slightly better than 1.9.
> * HAProxy 1.9 - p99 is ~ 9ms, p95 is ~ 3.5ms, median is 2.3ms
> * HAProxy 1.8 - p99 is ~ 5ms, p95 is ~ 2.5ms, median is 1.7ms
> CPU usage is identical (0.1% CPU).
>
> - Disabling server-side idle connections (using "pool-max-conn 0" on
> the server), though "http-reuse never" should be equivalent:
>
> This seems to have done the trick. Adding `pool-max-conn 0` or
> `http-reuse never` fixes the problem.
> 1.8 and 1.9 perform similarly (the client app that calls HAProxy is
> using connection pooling). *Unfortunately, we have legacy clients that
> close connections to the frontend after every request.*
> CPU usage for 1.8 and 1.9 was the same, ~22%.
>
> - Placing an unconditional redirect rule in the backend so that we
> check how it performs when the connection doesn't leave:
> http-request redirect location /
>
> I tried adding monitor-uri and returning from the remote HAProxy
> rather than hitting the backend server.
> Strangely, in this case I see nearly identical performance / CPU usage
> with 1.8 and 1.9, even with http-reuse set to aggressive.
> CPU usage for 1.8 and 1.9 was the same, ~35%.
> *The setup is Client > HAProxy > HAProxy (with monitor-uri) > Server.*
>
> If you're running harmless tests, you can pick the latest nightly
> snapshot of 2.0-dev, which is very close to what 1.9.4 will be.
> I also tried the perf tests with 2.0-dev. It shows the same behavior
> as 1.9.
>
> If you have potential fixes / settings / other debugging steps that
> can be tweaked, I can test them out and publish the results.
> Thanks for your help.
>
> -Ashwin
>
>
> On Thu, Jan 31, 2019 at 1:43 PM Willy Tarreau <w...@1wt.eu> wrote:
>
>> Hi Ashwin,
>>
>> On Thu, Jan 31, 2019 at 10:32:33AM -0800, Ashwin Neerabail wrote:
>> > Hi,
>> >
>> > We are in the process of upgrading to HAProxy 1.9 and we are seeing
>> > consistently higher latency with HAProxy 1.9.2 as compared to
>> > 1.8.17 when using HTTP mode (both with and without TLS). However,
>> > there are no latency issues with TCP mode.
>> >
>> > Test setup:
>> > 2 Kube pods, one running HAProxy 1.8.17 and another running 1.9.2,
>> > load balancing across 2 backend pods.
>> > The HAProxy container is given 1 CPU and 1 GB of memory.
>> > 500 rps per pod, latencies calculated over a 1-minute window.
>> >
>> > Latencies as measured by the client:
>> >
>> > *When running TCP mode, the p99 latency between 1.9 and 1.8 is the
>> > same.*
>> > *When running HTTP mode (with TLS):*
>> > *HAProxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms*
>> > *HAProxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms*
>>
>> The difference is huge. I'm wondering if it could be caused by a last
>> TCP segment being sent 40ms too late once in a while. Otherwise I'm
>> having a hard time imagining what can take so long at 500 rps!
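For readers following along, the settings exercised in the tests above look roughly like this in a config. This is only a sketch: the frontend/backend/server names, addresses, and the /health URI are placeholders, not taken from the actual setup.

```
# Sketch only -- names, addresses, and the monitor URI are placeholders.
frontend fe_http
    bind :8080
    mode http
    # Answer this URI locally without involving any server, as in the
    # monitor-uri test described above.
    monitor-uri /health
    default_backend be_app

backend be_app
    mode http
    # Either disable server-side connection reuse entirely...
    http-reuse never
    # ...or keep no idle connections to the server (equivalent effect,
    # per the thread): pool-max-conn 0 on the server line.
    server app1 10.0.0.10:8080 pool-max-conn 0
```

Only one of the two (`http-reuse never` or `pool-max-conn 0`) is needed to reproduce the workaround; both are shown here just to illustrate where each keyword goes.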
>>
>> In case you can vary some test parameters to try to narrow this down,
>> it would be interesting to try again:
>> - without SSL
>> - by disabling server-side idle connections (using "pool-max-conn 0"
>> on the server), though "http-reuse never" should be equivalent
>> - by placing an unconditional redirect rule in your backend so that
>> we check how it performs when the connection doesn't leave:
>> http-request redirect location /
>>
>> > This increased latency is reproducible across multiple runs with
>> > 100% consistency.
>> > HAProxy-reported metrics for connections and requests are the same
>> > for both 1.8 and 1.9.
>> >
>> > HAProxy 1.9: memory usage 130MB, CPU 55% util
>> > HAProxy 1.8: memory usage 90MB, CPU 45% util
>>
>> That's quite interesting; it could indicate some excessive SSL
>> renegotiations. Regarding the extra RAM, I have no idea, though it
>> could be the result of a leak.
>>
>> Trying 1.9.3 would obviously help, since it fixes a number of issues,
>> even if at first glance I'm not spotting one which could explain
>> this. And I'd be interested in another attempt once 1.9.4 is ready,
>> since it fixes many backend-side connection issues. If you're running
>> harmless tests, you can pick the latest nightly snapshot of 2.0-dev,
>> which is very close to what 1.9.4 will be. But already, testing the
>> points above to bisect the issue will help.
>>
>> > Please let me know if I can provide any more details on this.
>>
>> In 1.9 we also have the ability to watch more details (per-connection
>> CPU timing, stolen CPU, etc). Some of them may be immediately
>> retrieved using "show info" and "show activity" on the CLI during
>> the test. Others will require some config adjustments to log extra
>> fields and will take some time to diagnose. Since nothing stands out
>> of the crowd in your config, I don't think it's necessary for now.
>>
>> Willy
>>
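To pull the "show info" / "show activity" counters mentioned above, the runtime CLI has to be exposed on a stats socket first. A minimal sketch, assuming the socket path is arbitrary and socat is available on the host:

```
# Sketch: expose the HAProxy runtime CLI on a local socket.
# The path and permissions here are placeholders.
global
    stats socket /var/run/haproxy.sock mode 600 level admin

# Then, during a test run, query it from a shell:
#   echo "show info"     | socat stdio /var/run/haproxy.sock
#   echo "show activity" | socat stdio /var/run/haproxy.sock
```

Capturing the output at intervals during the load test makes it possible to compare the per-loop and CPU counters between the 1.8 and 1.9 runs.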