Hi Willy,

Thank you for the detailed response, and sorry for the delayed reply.
I ran all the combinations multiple times to make sure the results reproduce consistently. Here is what I found.

Test setup (same as last time):
- 2 Kube pods, one running HAProxy 1.8.17 and the other 1.9.2, load balancing across 2 backend pods.
- Each HAProxy container is given 1 CPU and 1 GB of memory.
- 500 rps per pod, latencies calculated over a 1-minute window.

Previous results, for comparison:
* HAProxy 1.9 - p99 ~20ms, p95 ~11ms, median 5.5ms
* HAProxy 1.8 - p99 ~8ms, p95 ~6ms, median 4.5ms
* HAProxy 1.9: memory usage 130MB, CPU 55% util
* HAProxy 1.8: memory usage 90MB, CPU 45% util

- Without SSL: HAProxy 1.8 still performs slightly better than 1.9.
  * HAProxy 1.9 - p99 ~9ms, p95 ~3.5ms, median 2.3ms
  * HAProxy 1.8 - p99 ~5ms, p95 ~2.5ms, median 1.7ms
  CPU usage is identical (0.1% CPU).

- Disabling server-side idle connections ("pool-max-conn 0" on the server; "http-reuse never" should be equivalent): this seems to have done the trick. Adding `pool-max-conn 0` or `http-reuse never` fixes the problem, and 1.8 and 1.9 perform similarly (the client app that calls HAProxy uses connection pooling). Unfortunately, we have legacy clients that close the frontend connection after every request. CPU usage for 1.8 and 1.9 was the same, ~22%.

- Unconditional redirect in the backend ("http-request redirect location /") so the connection doesn't leave: I tried adding monitor-uri and returning from the remote HAProxy rather than hitting the backend server. Strangely, in this case I see nearly identical performance and CPU usage with 1.8 and 1.9, even with http-reuse set to aggressive. CPU usage for 1.8 and 1.9 was the same, ~35%. Setup is Client > HAProxy > HAProxy (with monitor-uri) > Server.

- 2.0-dev nightly snapshot (which you said is very close to what 1.9.4 will be): I also ran the perf tests with 2.0-dev. It shows the same behavior as 1.9.
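For reference, here is a minimal sketch of the settings tested above; the backend/frontend names, server names, addresses, and ports are illustrative placeholders, not our real config:

```
# Backend used for the "no server-side idle connections" run.
# be_app / app1 / app2 and the addresses are placeholders.
backend be_app
    mode http
    http-reuse never                                  # disable reuse for the whole backend...
    server app1 10.0.0.1:8080 check pool-max-conn 0   # ...or disable the idle pool per server
    server app2 10.0.0.2:8080 check pool-max-conn 0

# Second hop in the Client > HAProxy > HAProxy > Server chain, answering
# the test URI locally via monitor-uri so the request never reaches a server.
frontend fe_monitor
    bind :8081
    mode http
    monitor-uri /health
```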
If you have potential fixes, settings, or other debugging steps to try, I can test them out and publish the results.

Thanks for your help.
-Ashwin

On Thu, Jan 31, 2019 at 1:43 PM Willy Tarreau <w...@1wt.eu> wrote:
> Hi Ashwin,
>
> On Thu, Jan 31, 2019 at 10:32:33AM -0800, Ashwin Neerabail wrote:
> > Hi,
> >
> > We are in process of upgrading to HAProxy 1.9 and we are seeing consistent
> > high latency with HAProxy 1.9.2 as compared to 1.8.17 when using HTTP Mode
> > (both with and without TLS). However no latency issues with TCP Mode.
> >
> > Test Setup:
> > 2 Kube pods one running Haproxy 1.8.17 and another running 1.9.2
> > loadbalancing across 2 backend pods.
> > Haproxy container is given 1 CPU, 1 GB Memory.
> > 500 rps per pod test, latencies calculated for 1 min window.
> >
> > Latencies as measured by client:
> >
> > *When running TCP Mode, the p99 latency between 1.9 and 1.8 is the same.*
> > *When running HTTP Mode (with TLS),*
> > *Haproxy 1.9 - p99 is ~20ms, p95 is ~11ms, median is 5.5ms*
> > *Haproxy 1.8 - p99 is ~8ms, p95 is ~6ms, median is 4.5ms*
>
> The difference is huge, I'm wondering if it could be caused by a last TCP
> segment being sent 40ms too late once in a while. Otherwise I'm having a
> hard time imagining what can take so long a time at 500 Rps!
>
> In case you can vary some test parameters to try to narrow this down, it
> would be interesting to try again:
> - without SSL
> - by disabling server-side idle connections (using "pool-max-conn 0" on
>   the server) though "http-reuse never" should be equivalent
> - by placing an unconditional redirect rule in your backend so that we
>   check how it performs when the connection doesn't leave:
>       http-request redirect location /
>
> > This increased latency is reproducible across multiple runs with 100%
> > consistency.
> > Haproxy reported metrics for connections and requests are the same for
> > both 1.8 and 1.9.
> >
> > Haproxy 1.9 : Memory usage - 130MB, CPU : 55% util
> > Haproxy 1.8 : Memory usage - 90MB, CPU : 45% util
>
> That's quite interesting, it could indicate some excessive SSL
> renegotiations. Regarding the extra RAM, I have no idea though. It could
> be the result of a leak.
>
> Trying 1.9.3 would obviously help, since it fixes a number of issues, even
> if at first glance I'm not spotting one which could explain this. And I'd
> be interested in another attempt once 1.9.4 is ready since it fixes many
> backend-side connection issues. If you're running harmless tests, you can
> pick the latest nightly snapshot of 2.0-dev which is very close to what
> 1.9.4 will be. But already, testing the points above to bisect the issues
> will help.
>
> > Please let me know if I can provide any more details on this.
>
> In 1.9 we also have the ability to watch more details (per-connection
> CPU timing, stolen CPU, etc). Some of them may be immediately retrieved
> using "show info" and "show activity" on the CLI during the test. Others
> will require some config adjustments to log extra fields and will take
> some time to diagnose. Since nothing stands out of the crowd in your
> config, I don't think it's necessary for now.
>
> Willy
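For anyone else following along: the "show info" / "show activity" counters mentioned above are read over the runtime stats socket. A sketch, assuming the socket path below matches the `stats socket` line in your global section (the path is a placeholder):

```shell
# Assumes haproxy.cfg exposes a runtime CLI socket, e.g.:
#   global
#       stats socket /var/run/haproxy.sock mode 600 level admin
#
# Dump process-wide counters (uptime, connection rates, memory, etc.):
echo "show info" | socat stdio /var/run/haproxy.sock

# Dump per-thread activity counters (loops, wakeups, stolen CPU, etc.):
echo "show activity" | socat stdio /var/run/haproxy.sock
```

Capturing these during the load test, for both 1.8 and 1.9 side by side, makes the diff easy to spot.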