Any ideas on this? We are seeing issues with HAProxy 1.9 performance with connection pooling turned on.
Thanks.

On Wed, Feb 13, 2019 at 3:46 PM Ashwin Neerabail <ash...@box.com> wrote:
> Hi Willy,
>
> Thank you for the detailed response. Sorry for the delay in responding.
>
> I ran all the combinations multiple times to ensure the results were
> consistently reproducible. Here is what I found:
>
> Test setup (same as last time):
> 2 Kube pods, one running HAProxy 1.8.17 and another running 1.9.2,
> load balancing across 2 backend pods.
> The HAProxy container is given 1 CPU and 1 GB of memory.
> 500 rps per pod, latencies calculated over a 1-minute window.
>
> - Previous results, for comparison:
> * HAProxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms
> * HAProxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms
> * HAProxy 1.9: memory usage 130MB, CPU 55% util
> * HAProxy 1.8: memory usage 90MB, CPU 45% util
>
> - Without SSL:
> HAProxy 1.8 performs slightly better than 1.9.
> * HAProxy 1.9 - p99 is ~ 9ms, p95 is ~ 3.5ms, median is 2.3ms
> * HAProxy 1.8 - p99 is ~ 5ms, p95 is ~ 2.5ms, median is 1.7ms
> CPU usage is identical (0.1% CPU).
>
> - Disabling server-side idle connections (using "pool-max-conn 0" on
> the server), though "http-reuse never" should be equivalent:
>
> This seems to have done the trick. Adding `pool-max-conn 0` or
> `http-reuse never` fixes the problem.
> 1.8 and 1.9 perform similarly (the client app that calls HAProxy is
> using connection pooling). *Unfortunately, we have legacy clients that
> close connections to the frontend after every request.*
> CPU usage for 1.8 and 1.9 was the same, ~22%.
>
> - Placing an unconditional redirect rule in the backend so that we
> check how it performs when the connection doesn't leave:
> http-request redirect location /
>
> I tried adding monitor-uri and returning from the remote HAProxy
> rather than hitting the backend server.
> Strangely, in this case I see nearly identical performance / CPU usage
> with 1.8 and 1.9, even with http-reuse set to aggressive.
> CPU usage for 1.8 and 1.9 was the same, ~35%.
> *The setup is Client > HAProxy > HAProxy (with monitor-uri) > Server.*
>
> If you're running harmless tests, you can pick the latest nightly
> snapshot of 2.0-dev, which is very close to what 1.9.4 will be.
> I also tried the perf tests with 2.0-dev. It shows the same behavior
> as 1.9.
>
> If you have potential fixes / settings / other debugging steps that
> can be tweaked, I can test them out and publish the results.
> Thanks for your help.
>
> -Ashwin
>
>
> On Thu, Jan 31, 2019 at 1:43 PM Willy Tarreau <w...@1wt.eu> wrote:
>
>> Hi Ashwin,
>>
>> On Thu, Jan 31, 2019 at 10:32:33AM -0800, Ashwin Neerabail wrote:
>> > Hi,
>> >
>> > We are in the process of upgrading to HAProxy 1.9 and we are seeing
>> > consistently higher latency with HAProxy 1.9.2 as compared to
>> > 1.8.17 when using HTTP mode (both with and without TLS). However,
>> > there are no latency issues with TCP mode.
>> >
>> > Test setup:
>> > 2 Kube pods, one running HAProxy 1.8.17 and another running 1.9.2,
>> > load balancing across 2 backend pods.
>> > The HAProxy container is given 1 CPU and 1 GB of memory.
>> > 500 rps per pod, latencies calculated over a 1-minute window.
>> >
>> > Latencies as measured by the client:
>> >
>> > *When running TCP mode, the p99 latency between 1.9 and 1.8 is the
>> > same.*
>> > *When running HTTP mode (with TLS):*
>> > *HAProxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms*
>> > *HAProxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms*
>>
>> The difference is huge. I'm wondering if it could be caused by a last
>> TCP segment being sent 40ms too late once in a while. Otherwise I'm
>> having a hard time imagining what can take so long at 500 rps!
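For readers following along, the settings exercised in the tests above look roughly like this in a config. This is only a sketch: the frontend/backend/server names, addresses, and the /health URI are placeholders, not taken from the actual setup.

```
# Sketch only -- names, addresses, and the monitor URI are placeholders.
frontend fe_http
    bind :8080
    mode http
    # Answer this URI locally without involving any server, as in the
    # monitor-uri test described above.
    monitor-uri /health
    default_backend be_app

backend be_app
    mode http
    # Either disable server-side connection reuse entirely...
    http-reuse never
    # ...or keep no idle connections to the server (equivalent effect,
    # per the thread): pool-max-conn 0 on the server line.
    server app1 10.0.0.10:8080 pool-max-conn 0
```

Only one of the two (`http-reuse never` or `pool-max-conn 0`) is needed to reproduce the workaround; both are shown here just to illustrate where each keyword goes.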
>>
>> In case you can vary some test parameters to try to narrow this down,
>> it would be interesting to try again:
>> - without SSL
>> - by disabling server-side idle connections (using "pool-max-conn 0"
>> on the server), though "http-reuse never" should be equivalent
>> - by placing an unconditional redirect rule in your backend so that
>> we check how it performs when the connection doesn't leave:
>> http-request redirect location /
>>
>> > This increased latency is reproducible across multiple runs with
>> > 100% consistency.
>> > HAProxy-reported metrics for connections and requests are the same
>> > for both 1.8 and 1.9.
>> >
>> > HAProxy 1.9: memory usage 130MB, CPU 55% util
>> > HAProxy 1.8: memory usage 90MB, CPU 45% util
>>
>> That's quite interesting; it could indicate some excessive SSL
>> renegotiations. Regarding the extra RAM, I have no idea, though it
>> could be the result of a leak.
>>
>> Trying 1.9.3 would obviously help, since it fixes a number of issues,
>> even if at first glance I'm not spotting one which could explain
>> this. And I'd be interested in another attempt once 1.9.4 is ready,
>> since it fixes many backend-side connection issues. If you're running
>> harmless tests, you can pick the latest nightly snapshot of 2.0-dev,
>> which is very close to what 1.9.4 will be. But already, testing the
>> points above to bisect the issue will help.
>>
>> > Please let me know if I can provide any more details on this.
>>
>> In 1.9 we also have the ability to watch more details (per-connection
>> CPU timing, stolen CPU, etc). Some of them may be immediately
>> retrieved using "show info" and "show activity" on the CLI during
>> the test. Others will require some config adjustments to log extra
>> fields and will take some time to diagnose. Since nothing stands out
>> of the crowd in your config, I don't think it's necessary for now.
>>
>> Willy
>>
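To pull the "show info" / "show activity" counters mentioned above, the runtime CLI has to be exposed on a stats socket first. A minimal sketch, assuming the socket path is arbitrary and socat is available on the host:

```
# Sketch: expose the HAProxy runtime CLI on a local socket.
# The path and permissions here are placeholders.
global
    stats socket /var/run/haproxy.sock mode 600 level admin

# Then, during a test run, query it from a shell:
#   echo "show info"     | socat stdio /var/run/haproxy.sock
#   echo "show activity" | socat stdio /var/run/haproxy.sock
```

Capturing the output at intervals during the load test makes it possible to compare the per-loop and CPU counters between the 1.8 and 1.9 runs.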