Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi Ashwin,

On Mon, Mar 25, 2019 at 02:51:17PM -0700, Ashwin Neerabail wrote:
> Hi Willy,
>
> I tested against the latest version in the haproxy source repo.
> Things got significantly worse. Even median latencies have shot up to 150ms
> (compared to 4ms for haproxy 1.8), and p99 shot up above 1 second.
> One strange thing I observed in the stats page is nbthread shows up as 64
> (it's 1 for HAProxy 1.8). I am using the exact same configuration across
> both versions.

That's expected after 2.0-dev1, please have a look at the dev2 announce I just sent. If you want a single thread, please just set "nbthread 1".

> ctime, rtime are reporting higher values for 2.0 (though 1.8 is fine
> against the same backends at the same time)

What exact version did you try? I've just issued 2.0-dev2 with important fixes for issues that were causing some streams to starve for a while, and now I can't reproduce any such delay anymore. But given that some of the pending issues were addressed yesterday evening, you can guess why I'm interested in knowing which exact version you tried ;-)

Thanks,
Willy
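Willy's pointer about the new threading default can be made explicit in the configuration; a minimal sketch of the relevant global section (only nbthread is the point here, the surrounding directive is illustrative):

```
global
    daemon
    nbthread 1    # 2.0-dev1 onwards defaults to one thread per CPU; pin it back to 1
```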
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi Willy,

I tested against the latest version in the haproxy source repo. Things got significantly worse. Even median latencies have shot up to 150ms (compared to 4ms for haproxy 1.8), and p99 shot up above 1 second. One strange thing I observed in the stats page is nbthread shows up as 64 (it's 1 for HAProxy 1.8). I am using the exact same configuration across both versions.

ctime, rtime are reporting higher values for 2.0 (though 1.8 is fine against the same backends at the same time).

Thanks,
Ashwin

On Fri, Mar 22, 2019 at 11:03 AM Ashwin Neerabail wrote:
> Hey Willy,
>
> That's great news. Thanks for the quick action.
> I will verify and get back.
>
> Thanks,
> Ashwin
>
> On Fri, Mar 22, 2019 at 10:19 AM Willy Tarreau wrote:
>
>> Hi Ashwin,
>>
>> We have found the root cause of this. The H2 streams were not getting
>> the fairness they deserved due to their wake-up ordering: it happened
>> very often that a stream interrupted on a mux buffer full condition could
>> be placed at the end of the list and/or its place preempted by another
>> stream trying to send for the first time.
>>
>> We've pushed all the fixes for this in 2.0-dev for now and I'll backport
>> them to 1.9 early next week. It would be nice if you could give it a try
>> to confirm that it's now OK for you.
>>
>> Cheers,
>> Willy
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hey Willy,

That's great news. Thanks for the quick action. I will verify and get back.

Thanks,
Ashwin

On Fri, Mar 22, 2019 at 10:19 AM Willy Tarreau wrote:
> Hi Ashwin,
>
> We have found the root cause of this. The H2 streams were not getting
> the fairness they deserved due to their wake-up ordering: it happened
> very often that a stream interrupted on a mux buffer full condition could
> be placed at the end of the list and/or its place preempted by another
> stream trying to send for the first time.
>
> We've pushed all the fixes for this in 2.0-dev for now and I'll backport
> them to 1.9 early next week. It would be nice if you could give it a try
> to confirm that it's now OK for you.
>
> Cheers,
> Willy
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi Ashwin,

We have found the root cause of this. The H2 streams were not getting the fairness they deserved due to their wake-up ordering: it happened very often that a stream interrupted on a mux buffer full condition could be placed at the end of the list and/or its place preempted by another stream trying to send for the first time.

We've pushed all the fixes for this in 2.0-dev for now and I'll backport them to 1.9 early next week. It would be nice if you could give it a try to confirm that it's now OK for you.

Cheers,
Willy
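The wake-up ordering problem Willy describes can be sketched with a toy model (an editor's illustration only, not HAProxy's actual scheduler code; all names here are invented): a stream re-queued at the tail of the send list after a buffer-full interruption is starved behind a steady flow of first-time senders, while one that keeps its place at the head is served immediately.

```python
from collections import deque

def streams_served_before_a(requeue_at_tail, newcomers=8):
    """Toy model of an H2 mux send list. Stream "A" was interrupted
    mid-send by a buffer-full condition; meanwhile `newcomers` fresh
    streams ask to send for the first time. Count how many of them
    get served before "A" does."""
    send_list = deque()
    if requeue_at_tail:
        # Pre-fix behavior: the interrupted stream lands behind newcomers.
        send_list.extend(f"new{i}" for i in range(newcomers))
        send_list.append("A")
    else:
        # Post-fix behavior: the interrupted stream keeps its place.
        send_list.append("A")
        send_list.extend(f"new{i}" for i in range(newcomers))
    served = 0
    while send_list[0] != "A":
        send_list.popleft()  # this stream gets the buffer first
        served += 1
    return served

# Unfair ordering starves "A" behind every first-time sender:
print(streams_served_before_a(True))   # 8
print(streams_served_before_a(False))  # 0
```

Under sustained load the unfair case repeats every time "A" blocks again, which is how a per-stream delay compounds into the p99 spikes seen in the thread.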
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi Willy,

I have saved the stats page of both the server haproxy (egress) and the client haproxy (ingress):
https://cloud.box.com/s/5wnnm0mcrla1w28101g6jtjkod5p0ryt

- did you enable threads on 1.9 ?
On the test setup, for the client haproxy, it's set to 1 thread (1 core). On production, yes. On the server ingress haproxy, it's set to 3 threads (3 cores).

- do you have a "maxconn" setting on your server lines ?
On the server:
pid = 30 (process #1, nbproc = 1, nbthread = 3)
uptime = 3d 23h31m15s
system limits: memmax = unlimited; ulimit-n = 40037
maxsock = 40037; maxconn = 2; maxpipes = 0
current conns = 1; current pipes = 0/0; conn rate = 0/sec
Running tasks: 1/28; idle = 100 %

- if so do you know if you've ever had some queue on the backend caused by this maxconn setting ? This can be seen in the stats page under the "Queue/Max" column.
Cur/Max in the Queue section is all 0.

- do you observe connection retries in your stats page ?
Retries: 0

- do you observe the problem if you put "http-reuse always" on your 1.8 setup as well (I guess not since you said it doesn't fail on 1.9 as soon as you remove server pools)?
No. With "http-reuse always" on 1.8 there is no change (from aggressive) and latency does not spike.

On Mon, Mar 18, 2019 at 11:11 AM Willy Tarreau wrote:
> Hi Ashwin,
>
> On Mon, Mar 18, 2019 at 10:57:45AM -0700, Ashwin Neerabail wrote:
> > Hi Willy,
> >
> > Thanks for the reply.
> >
> > My Test setup:
> > Client Server1 using local HAProxy 1.9 > 2 Backend servers and
> > Client Server2 using local HAProxy 1.8 > same 2 backend servers.
> >
> > I am measuring latency from the client server.
> > So when I run a 1000rps test, 50% of them end up on 1.9 and 50% on 1.8.
> > So if the backend servers have a problem, 1.8 should show similar high
> > latency too.
>
> Indeed.
>
> > However consistently only the 1.9 client shows latency.
> >
> > I even tested this against real traffic in production against various
> > backends (Java Netty, Java Tomcat, Nginx). Across the board we saw
> > similar latency spikes when we tested 1.9.
>
> This is quite useful, especially with nginx which is known for not being
> too much bothered by idle connections and which we've extensively tested
> during the server pools design as well.
>
> Now I'll have some questions to dig into this issue further:
> - did you enable threads on 1.9 ?
> - do you have a "maxconn" setting on your server lines ?
> - if so do you know if you've ever had some queue on the
>   backend caused by this maxconn setting ? This can be seen
>   in the stats page under the "Queue/Max" column.
> - do you observe connection retries in your stats page ? This
>   could explain the higher latency. Maybe connections time
>   out quickly and can't be reused, or maybe we fail to allocate
>   some from time to time due to a low file descriptor limit which
>   is hit earlier when server-side pools are enabled.
> - do you observe the problem if you put "http-reuse always" on
>   your 1.8 setup as well (I guess not since you said it doesn't
>   fail on 1.9 as soon as you remove server pools)?
>
> Thanks,
> Willy
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi Ashwin,

On Mon, Mar 18, 2019 at 10:57:45AM -0700, Ashwin Neerabail wrote:
> Hi Willy,
>
> Thanks for the reply.
>
> My Test setup:
> Client Server1 using local HAProxy 1.9 > 2 Backend servers and
> Client Server2 using local HAProxy 1.8 > same 2 backend servers.
>
> I am measuring latency from the client server.
> So when I run a 1000rps test, 50% of them end up on 1.9 and 50% on 1.8.
> So if the backend servers have a problem, 1.8 should show similar high
> latency too.

Indeed.

> However consistently only the 1.9 client shows latency.
>
> I even tested this against real traffic in production against various
> backends (Java Netty, Java Tomcat, Nginx). Across the board we saw
> similar latency spikes when we tested 1.9.

This is quite useful, especially with nginx which is known for not being too much bothered by idle connections and which we've extensively tested during the server pools design as well.

Now I'll have some questions to dig into this issue further:
- did you enable threads on 1.9 ?
- do you have a "maxconn" setting on your server lines ?
- if so do you know if you've ever had some queue on the backend caused by this maxconn setting ? This can be seen in the stats page under the "Queue/Max" column.
- do you observe connection retries in your stats page ? This could explain the higher latency. Maybe connections time out quickly and can't be reused, or maybe we fail to allocate some from time to time due to a low file descriptor limit which is hit earlier when server-side pools are enabled.
- do you observe the problem if you put "http-reuse always" on your 1.8 setup as well (I guess not since you said it doesn't fail on 1.9 as soon as you remove server pools)?

Thanks,
Willy
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi Willy,

Thanks for the reply.

My Test setup:
Client Server1 using local HAProxy 1.9 > 2 Backend servers and
Client Server2 using local HAProxy 1.8 > same 2 backend servers.

I am measuring latency from the client server. So when I run a 1000rps test, 50% of them end up on 1.9 and 50% on 1.8. So if the backend servers have a problem, 1.8 should show similar high latency too. However consistently only the 1.9 client shows latency.

I even tested this against real traffic in production against various backends (Java Netty, Java Tomcat, Nginx). Across the board we saw similar latency spikes when we tested 1.9.

Thanks,
Ashwin

On Thu, Feb 28, 2019 at 8:17 PM Willy Tarreau wrote:
> Ashwin,
>
> I've taken some time to read your tests completely now, and something
> bothers me:
>
> On Mon, Feb 25, 2019 at 11:11:08AM -0800, Ashwin Neerabail wrote:
> > > - by disabling server-side idle connections (using "pool-max-conn 0" on
> > >   the server) though "http-reuse never" should be equivalent
> >
> > This seems to have done the trick. Adding `pool-max-conn 0` or
> > `http-reuse never` fixes the problem.
> > 1.8 and 1.9 perform similarly (client app that calls haproxy is using
> > connection pooling). Unfortunately, we have legacy clients that close
> > connections to the front end for every request.
>
> Well, the thing is that haproxy 1.8 doesn't have connection pooling and
> 1.9 does. So this means that there is no regression between 1.8 and 1.9
> when using the same features. However connection pooling exhibits extra
> latency. Are you really sure that your server remains performant when
> dealing with idle connections ? Maybe it has an accept dispatcher with
> a small queue and has trouble dealing with too many idle connections ?
>
> > CPU Usage for 1.8 and 1.9 was the same, ~22%.
> >
> > > - by placing an unconditional redirect rule in your backend so that we
> > >   check how it performs when the connection doesn't leave :
> > >   http-request redirect location /
> >
> > Tried adding monitor-uri and returning from the remote haproxy rather
> > than hitting the backend server.
> > Strangely, in this case I see nearly identical performance/CPU usage
> > with 1.8 and 1.9 even with http-reuse set to aggressive.
> > CPU Usage for 1.8 and 1.9 was the same, ~35%.
> > Set up is Client > HAProxy > HAProxy (with monitor-uri) > Server.
>
> Ah this test is extremely interesting! It indeed shows that the only
> difference appears when reaching the server. But if the server has
> trouble with idle connections, why don't you disable them on haproxy ?
> As you've seen you can simply do that with "pool-max-conn 0" on the
> server lines. You could even try with different values. It might be
> possible that past a certain point the server's accept queue explodes
> and that's when it starts to have problems. You could try with a limited
> value, e.g. "pool-max-conn 10" then "pool-max-conn 100" etc and see
> where it starts to break.
>
> Regards,
> Willy
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Ashwin,

I've taken some time to read your tests completely now, and something bothers me:

On Mon, Feb 25, 2019 at 11:11:08AM -0800, Ashwin Neerabail wrote:
> > - by disabling server-side idle connections (using "pool-max-conn 0" on
> >   the server) though "http-reuse never" should be equivalent
>
> This seems to have done the trick. Adding `pool-max-conn 0` or
> `http-reuse never` fixes the problem.
> 1.8 and 1.9 perform similarly (client app that calls haproxy is using
> connection pooling). Unfortunately, we have legacy clients that close
> connections to the front end for every request.

Well, the thing is that haproxy 1.8 doesn't have connection pooling and 1.9 does. So this means that there is no regression between 1.8 and 1.9 when using the same features. However connection pooling exhibits extra latency. Are you really sure that your server remains performant when dealing with idle connections ? Maybe it has an accept dispatcher with a small queue and has trouble dealing with too many idle connections ?

> CPU Usage for 1.8 and 1.9 was the same, ~22%.
>
> > - by placing an unconditional redirect rule in your backend so that we
> >   check how it performs when the connection doesn't leave :
> >   http-request redirect location /
>
> Tried adding monitor-uri and returning from the remote haproxy rather
> than hitting the backend server.
> Strangely, in this case I see nearly identical performance/CPU usage
> with 1.8 and 1.9 even with http-reuse set to aggressive.
> CPU Usage for 1.8 and 1.9 was the same, ~35%.
> Set up is Client > HAProxy > HAProxy (with monitor-uri) > Server.

Ah this test is extremely interesting! It indeed shows that the only difference appears when reaching the server. But if the server has trouble with idle connections, why don't you disable them on haproxy ? As you've seen you can simply do that with "pool-max-conn 0" on the server lines. You could even try with different values. It might be possible that past a certain point the server's accept queue explodes and that's when it starts to have problems. You could try with a limited value, e.g. "pool-max-conn 10" then "pool-max-conn 100" etc and see where it starts to break.

Regards,
Willy
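Willy's bisection amounts to stepping pool-max-conn on the server lines until latency degrades; a hedged sketch of what such a server line could look like (the backend name, server name and address are placeholders, not from the thread):

```
backend app
    # step the idle-connection cap up until latency starts to degrade
    server srv1 192.0.2.10:443 pool-max-conn 0
    # then retry the same test with pool-max-conn 10, pool-max-conn 100, ...
```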
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
On Mon, Feb 25, 2019 at 11:11:08AM -0800, Ashwin Neerabail wrote:
> Any ideas on this ? Seeing issues with HAProxy 1.9 performance with
> connection pooling turned on.

No idea for now, we really need to find a way to accurately measure this in order to spot when the problem happens. It could be anything.

Willy
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Any ideas on this ? Seeing issues with HAProxy 1.9 performance with connection pooling turned on.

Thanks.

On Wed, Feb 13, 2019 at 3:46 PM Ashwin Neerabail wrote:
> Hi Willy,
>
> Thank you for the detailed response. Sorry for the delay in responding.
>
> I ran all the combinations multiple times to ensure consistent
> reproducibility. Here is what I found:
>
> Test Setup (same as last time):
> 2 Kube pods, one running Haproxy 1.8.17 and another running 1.9.2,
> load balancing across 2 backend pods.
> Haproxy container is given 1 CPU, 1 GB Memory.
> 500 rps per pod test, latencies calculated for a 1 min window.
>
> - previous results for comparison
> Haproxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms
> Haproxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms
> Haproxy 1.9: Memory usage - 130MB, CPU: 55% util
> Haproxy 1.8: Memory usage - 90MB, CPU: 45% util
>
> - without SSL
> HAProxy 1.8 performs slightly better than 1.9:
> Haproxy 1.9 - p99 is ~ 9ms, p95 is ~ 3.5ms, median is 2.3ms
> Haproxy 1.8 - p99 is ~ 5ms, p95 is ~ 2.5ms, median is 1.7ms
> CPU Usage is identical. (0.1% CPU)
>
> - by disabling server-side idle connections (using "pool-max-conn 0" on
>   the server) though "http-reuse never" should be equivalent
> This seems to have done the trick. Adding `pool-max-conn 0` or
> `http-reuse never` fixes the problem.
> 1.8 and 1.9 perform similarly (client app that calls haproxy is using
> connection pooling). Unfortunately, we have legacy clients that close
> connections to the front end for every request.
> CPU Usage for 1.8 and 1.9 was the same, ~22%.
>
> - by placing an unconditional redirect rule in your backend so that we
>   check how it performs when the connection doesn't leave :
>   http-request redirect location /
> Tried adding monitor-uri and returning from the remote haproxy rather
> than hitting the backend server.
> Strangely, in this case I see nearly identical performance/CPU usage
> with 1.8 and 1.9 even with http-reuse set to aggressive.
> CPU Usage for 1.8 and 1.9 was the same, ~35%.
> Set up is Client > HAProxy > HAProxy (with monitor-uri) > Server.
>
> If you're running harmless tests, you can pick the latest nightly
> snapshot of 2.0-dev which is very close to what 1.9.4 will be.
> I also tried the perf tests with 2.0-dev. It shows the same behavior
> as 1.9.
>
> If you have potential fixes / settings / other debugging steps that can
> be tweaked - I can test them out and publish the results.
> Thanks for your help.
>
> -Ashwin
>
> On Thu, Jan 31, 2019 at 1:43 PM Willy Tarreau wrote:
>> Hi Ashwin,
>>
>> On Thu, Jan 31, 2019 at 10:32:33AM -0800, Ashwin Neerabail wrote:
>> > Hi,
>> >
>> > We are in the process of upgrading to HAProxy 1.9 and we are seeing
>> > consistent high latency with HAProxy 1.9.2 as compared to 1.8.17 when
>> > using HTTP Mode (both with and without TLS). However no latency issues
>> > with TCP Mode.
>> >
>> > Test Setup:
>> > 2 Kube pods, one running Haproxy 1.8.17 and another running 1.9.2,
>> > load balancing across 2 backend pods.
>> > Haproxy container is given 1 CPU, 1 GB Memory.
>> > 500 rps per pod test, latencies calculated for a 1 min window.
>> >
>> > Latencies as measured by the client:
>> >
>> > When running TCP Mode, the p99 latency between 1.9 and 1.8 is the same.
>> > When running HTTP Mode (with TLS),
>> > Haproxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms
>> > Haproxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms
>>
>> The difference is huge, I'm wondering if it could be caused by a last TCP
>> segment being sent 40ms too late once in a while. Otherwise I'm having a
>> hard time imagining what can take so long a time at 500 rps!
>>
>> In case you can vary some test parameters to try to narrow this down, it
>> would be interesting to try again:
>> - without SSL
>> - by disabling server-side idle connections (using "pool-max-conn 0" on
>>   the server) though "http-reuse never" should be equivalent
>> - by placing an unconditional redirect rule in your backend so that we
>>   check how it performs when the connection doesn't leave :
>>   http-request redirect location /
>>
>> > This increased latency is reproducible across multiple runs with 100%
>> > consistency.
>> > Haproxy reported metrics for connections and requests are the same for
>> > both 1.8 and 1.9.
>> >
>> > Haproxy 1.9: Memory usage - 130MB, CPU: 55% util
>> > Haproxy 1.8: Memory usage - 90MB, CPU: 45% util
>>
>> That's quite interesting, it could indicate some excessive SSL
>> renegotiations. Regarding the extra RAM, I have no idea; it could be the
>> result of a leak.
>>
>> Trying 1.9.3 would obviously help, since it fixes a number of issues,
>> even if at first glance I'm not spotting one which could explain this.
>> And I'd be interested in another attempt once 1.9.4 is ready since it
>> fixes many backend-side connection issues. If you're running harmless
>> tests,
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi Willy,

Thank you for the detailed response. Sorry for the delay in responding.

I ran all the combinations multiple times to ensure consistent reproducibility. Here is what I found:

Test Setup (same as last time):
2 Kube pods, one running Haproxy 1.8.17 and another running 1.9.2, load balancing across 2 backend pods.
Haproxy container is given 1 CPU, 1 GB Memory.
500 rps per pod test, latencies calculated for a 1 min window.

- previous results for comparison
Haproxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms
Haproxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms
Haproxy 1.9: Memory usage - 130MB, CPU: 55% util
Haproxy 1.8: Memory usage - 90MB, CPU: 45% util

- without SSL
HAProxy 1.8 performs slightly better than 1.9:
Haproxy 1.9 - p99 is ~ 9ms, p95 is ~ 3.5ms, median is 2.3ms
Haproxy 1.8 - p99 is ~ 5ms, p95 is ~ 2.5ms, median is 1.7ms
CPU Usage is identical. (0.1% CPU)

- by disabling server-side idle connections (using "pool-max-conn 0" on the server) though "http-reuse never" should be equivalent
This seems to have done the trick. Adding `pool-max-conn 0` or `http-reuse never` fixes the problem. 1.8 and 1.9 perform similarly (client app that calls haproxy is using connection pooling). Unfortunately, we have legacy clients that close connections to the front end for every request.
CPU Usage for 1.8 and 1.9 was the same, ~22%.

- by placing an unconditional redirect rule in your backend so that we check how it performs when the connection doesn't leave : http-request redirect location /
Tried adding monitor-uri and returning from the remote haproxy rather than hitting the backend server. Strangely, in this case I see nearly identical performance/CPU usage with 1.8 and 1.9 even with http-reuse set to aggressive.
CPU Usage for 1.8 and 1.9 was the same, ~35%.
Set up is Client > HAProxy > HAProxy (with monitor-uri) > Server.

> If you're running harmless tests, you can pick the latest nightly
> snapshot of 2.0-dev which is very close to what 1.9.4 will be.
I also tried the perf tests with 2.0-dev. It shows the same behavior as 1.9.

If you have potential fixes / settings / other debugging steps that can be tweaked - I can test them out and publish the results. Thanks for your help.

-Ashwin

On Thu, Jan 31, 2019 at 1:43 PM Willy Tarreau wrote:
> Hi Ashwin,
>
> On Thu, Jan 31, 2019 at 10:32:33AM -0800, Ashwin Neerabail wrote:
> > Hi,
> >
> > We are in the process of upgrading to HAProxy 1.9 and we are seeing
> > consistent high latency with HAProxy 1.9.2 as compared to 1.8.17 when
> > using HTTP Mode (both with and without TLS). However no latency issues
> > with TCP Mode.
> >
> > Test Setup:
> > 2 Kube pods, one running Haproxy 1.8.17 and another running 1.9.2,
> > load balancing across 2 backend pods.
> > Haproxy container is given 1 CPU, 1 GB Memory.
> > 500 rps per pod test, latencies calculated for a 1 min window.
> >
> > Latencies as measured by the client:
> >
> > When running TCP Mode, the p99 latency between 1.9 and 1.8 is the same.
> > When running HTTP Mode (with TLS),
> > Haproxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms
> > Haproxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms
>
> The difference is huge, I'm wondering if it could be caused by a last TCP
> segment being sent 40ms too late once in a while. Otherwise I'm having a
> hard time imagining what can take so long a time at 500 rps!
>
> In case you can vary some test parameters to try to narrow this down, it
> would be interesting to try again:
> - without SSL
> - by disabling server-side idle connections (using "pool-max-conn 0" on
>   the server) though "http-reuse never" should be equivalent
> - by placing an unconditional redirect rule in your backend so that we
>   check how it performs when the connection doesn't leave :
>   http-request redirect location /
>
> > This increased latency is reproducible across multiple runs with 100%
> > consistency.
> > Haproxy reported metrics for connections and requests are the same for
> > both 1.8 and 1.9.
> >
> > Haproxy 1.9: Memory usage - 130MB, CPU: 55% util
> > Haproxy 1.8: Memory usage - 90MB, CPU: 45% util
>
> That's quite interesting, it could indicate some excessive SSL
> renegotiations. Regarding the extra RAM, I have no idea; it could be the
> result of a leak.
>
> Trying 1.9.3 would obviously help, since it fixes a number of issues,
> even if at first glance I'm not spotting one which could explain this.
> And I'd be interested in another attempt once 1.9.4 is ready since it
> fixes many backend-side connection issues. If you're running harmless
> tests, you can pick the latest nightly snapshot of 2.0-dev which is very
> close to what 1.9.4 will be. But already, testing the points above to
> bisect the issues will help.
>
> > Please let me know if I can provide any more details on this.
>
> In 1.9 we also have the ability to watch more details (per-connection
> CPU timing,
Re: High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi Ashwin,

On Thu, Jan 31, 2019 at 10:32:33AM -0800, Ashwin Neerabail wrote:
> Hi,
>
> We are in the process of upgrading to HAProxy 1.9 and we are seeing
> consistent high latency with HAProxy 1.9.2 as compared to 1.8.17 when
> using HTTP Mode (both with and without TLS). However no latency issues
> with TCP Mode.
>
> Test Setup:
> 2 Kube pods, one running Haproxy 1.8.17 and another running 1.9.2,
> load balancing across 2 backend pods.
> Haproxy container is given 1 CPU, 1 GB Memory.
> 500 rps per pod test, latencies calculated for a 1 min window.
>
> Latencies as measured by the client:
>
> When running TCP Mode, the p99 latency between 1.9 and 1.8 is the same.
> When running HTTP Mode (with TLS),
> Haproxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms
> Haproxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms

The difference is huge, I'm wondering if it could be caused by a last TCP segment being sent 40ms too late once in a while. Otherwise I'm having a hard time imagining what can take so long a time at 500 rps!

In case you can vary some test parameters to try to narrow this down, it would be interesting to try again:
- without SSL
- by disabling server-side idle connections (using "pool-max-conn 0" on the server) though "http-reuse never" should be equivalent
- by placing an unconditional redirect rule in your backend so that we check how it performs when the connection doesn't leave :
  http-request redirect location /

> This increased latency is reproducible across multiple runs with 100%
> consistency.
> Haproxy reported metrics for connections and requests are the same for
> both 1.8 and 1.9.
>
> Haproxy 1.9: Memory usage - 130MB, CPU: 55% util
> Haproxy 1.8: Memory usage - 90MB, CPU: 45% util

That's quite interesting, it could indicate some excessive SSL renegotiations. Regarding the extra RAM, I have no idea; it could be the result of a leak.

Trying 1.9.3 would obviously help, since it fixes a number of issues, even if at first glance I'm not spotting one which could explain this. And I'd be interested in another attempt once 1.9.4 is ready since it fixes many backend-side connection issues. If you're running harmless tests, you can pick the latest nightly snapshot of 2.0-dev which is very close to what 1.9.4 will be. But already, testing the points above to bisect the issues will help.

> Please let me know if I can provide any more details on this.

In 1.9 we also have the ability to watch more details (per-connection CPU timing, stolen CPU, etc). Some of them may be immediately retrieved using "show info" and "show activity" on the CLI during the test. Others will require some config adjustments to log extra fields and will take some time to diagnose. Since nothing stands out of the crowd in your config, I don't think it's necessary for now.

Willy
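The "show info" and "show activity" CLI commands Willy mentions are issued over the stats socket; a sketch assuming the socket path from Ashwin's config and socat as the client (any unix-socket client works):

```
echo "show info" | socat stdio /var/lib/haproxy/stats
echo "show activity" | socat stdio /var/lib/haproxy/stats
```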
High p99 latency with HAProxy 1.9 in http mode compared to 1.8
Hi,

We are in the process of upgrading to HAProxy 1.9 and we are seeing consistent high latency with HAProxy 1.9.2 as compared to 1.8.17 when using HTTP Mode (both with and without TLS). However no latency issues with TCP Mode.

Test Setup:
2 Kube pods, one running Haproxy 1.8.17 and another running 1.9.2, load balancing across 2 backend pods.
Haproxy container is given 1 CPU, 1 GB Memory.
500 rps per pod test, latencies calculated for a 1 min window.

Latencies as measured by the client:

When running TCP Mode, the p99 latency between 1.9 and 1.8 is the same.
When running HTTP Mode (with TLS),
Haproxy 1.9 - p99 is ~ 20ms, p95 is ~ 11ms, median is 5.5ms
Haproxy 1.8 - p99 is ~ 8ms, p95 is ~ 6ms, median is 4.5ms

This increased latency is reproducible across multiple runs with 100% consistency. Haproxy reported metrics for connections and requests are the same for both 1.8 and 1.9.

Haproxy 1.9: Memory usage - 130MB, CPU: 55% util
Haproxy 1.8: Memory usage - 90MB, CPU: 45% util

Please let me know if I can provide any more details on this. Is there any setting I should be using ? I have tried various combinations of http-reuse and tune ssl without much luck.

Here is the haproxy.cfg dump:

global
    daemon
    user container
    group container
    maxconn 2
    log 127.0.0.1 local1
    stats socket /var/lib/haproxy/stats mode 666 level admin
    pidfile /var/run/haproxy.pid

defaults
    log global
    option dontlognull
    option contstats
    option tcplog
    maxconn 2
    timeout connect 5s
    timeout client 120m
    timeout server 120m
    balance roundrobin

listen stats
    bind :8081
    mode http
    stats enable
    stats uri /
    stats refresh 5s
    http-request set-log-level silent

backend test-echo-server...
    mode http
    timeout server 120m
    timeout connect 5s
    server... 1 check inter 30s rise 3 fall 2 ssl crt .pem ca-file pem verify required verifyhost test-echo-server
    server... 1 check inter 30s rise 3 fall 2 ssl crt .pem ca-file pem verify required verifyhost test-echo-server

frontend shared-frontend
    bind 127.0.0.1:80
    mode http
    option httplog
    acl is_test-echo-server hdr_dom(host) -m reg ^test-echo-server\.localhost(:80)?$
    use_backend test-echo-server... if is_test-echo-server
    option http-keep-alive

Thanks,
Ashwin
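The median/p95/p99 figures traded throughout the thread can be reproduced from raw per-request latencies with the standard library; a minimal sketch (the sample window below is invented, roughly one minute of traffic at 500 rps):

```python
import statistics

def latency_percentiles(samples_ms):
    """Return (median, p95, p99) for a window of latency samples in ms."""
    # quantiles(n=100) returns the 99 cut points p1..p99 of the distribution
    cuts = statistics.quantiles(samples_ms, n=100)
    return statistics.median(samples_ms), cuts[94], cuts[98]

# ~30000 samples: mostly fast, with a small slow tail
window = [4.5] * 29000 + [6.0] * 700 + [8.0] * 300
p50, p95, p99 = latency_percentiles(window)
print(p50, p95, p99)
```

Note how a tail of only 1% slow requests leaves the median and p95 untouched while moving p99 sharply, which is exactly the shape of the regression reported here.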