Re: 1.8.1 Segfault + slowdown
update: we've disabled h2 on 1.8, and everything is running as expected again. haproxy no longer degrades performance, nor does it segfault, so the issues do seem to be related to h2.

Best regards (Med venlig hilsen)

*Peter Lindegaard Hansen*
*Software Developer / Partner*
Phone: +45 96 500 300 | Direct: 69 14 97 04 | Email: p...@tigermedia.dk
Tiger Media A/S | Gl. Gugvej 17C | 9000 Aalborg | Web: www.tigermedia.dk
For support questions, contact us at supp...@tigermedia.dk or by phone on 96 500 300 and your enquiry will be answered by the first available employee.

2017-12-19 11:36 GMT+01:00 Peter Lindegaard Hansen:
> Hi list,
>
> We upgraded from 1.5 to 1.8 recently - then to 1.8.1
>
> Now we're seeing segfaults and slowdowns with haproxy
>
> Repeating:
> Dec 19 11:14:26 haproxy02 kernel: [122635.295196] haproxy[29582]: segfault at 55d5152279b2 ip 7f9c2dcc5a28 sp 7fff07caf4b8 error 6 in libc-2.23.so[7f9c2dc26000+1c]
> Dec 19 11:14:26 haproxy02 systemd[1]: haproxy.service: Main process exited, code=exited, status=139/n/a
> Dec 19 11:14:26 haproxy02 systemd[1]: haproxy.service: Unit entered failed state.
> Dec 19 11:14:26 haproxy02 systemd[1]: haproxy.service: Failed with result 'exit-code'.
> Dec 19 11:14:26 haproxy02 systemd[1]: haproxy.service: Service hold-off time over, scheduling restart.
> Dec 19 11:14:26 haproxy02 systemd[1]: Stopped HAProxy Load Balancer.
> Dec 19 11:14:26 haproxy02 systemd[1]: Starting HAProxy Load Balancer...
> Dec 19 11:14:26 haproxy02 systemd[1]: Started HAProxy Load Balancer.
> Dec 19 11:14:27 haproxy02 kernel: [122636.578738] haproxy[31479]: segfault at 56409a8c1de2 ip 7fa5fa349a28 sp 7ffe66f4f688 error 6 in libc-2.23.so[7fa5fa2aa000+1c]
> Dec 19 11:14:27 haproxy02 systemd[1]: haproxy.service: Main process exited, code=exited, status=139/n/a
> Dec 19 11:14:27 haproxy02 systemd[1]: haproxy.service: Unit entered failed state.
> Dec 19 11:14:27 haproxy02 systemd[1]: haproxy.service: Failed with result 'exit-code'.
> Dec 19 11:14:27 haproxy02 systemd[1]: haproxy.service: Service hold-off time over, scheduling restart.
> Dec 19 11:14:27 haproxy02 systemd[1]: Stopped HAProxy Load Balancer.
> Dec 19 11:14:27 haproxy02 systemd[1]: Starting HAProxy Load Balancer...
> Dec 19 11:14:28 haproxy02 systemd[1]: Started HAProxy Load Balancer.
> Dec 19 11:14:28 haproxy02 kernel: [122637.569863] haproxy[31487]: segfault at 55cb4bd59857 ip 7f71e678aa28 sp 7fffb94427b8 error 6 in libc-2.23.so[7f71e66eb000+1c]
> Dec 19 11:14:28 haproxy02 systemd[1]: haproxy.service: Main process exited, code=exited, status=139/n/a
> Dec 19 11:14:28 haproxy02 systemd[1]: haproxy.service: Unit entered failed state.
> Dec 19 11:14:28 haproxy02 systemd[1]: haproxy.service: Failed with result 'exit-code'.
> Dec 19 11:14:28 haproxy02 systemd[1]: haproxy.service: Service hold-off time over, scheduling restart.
> Dec 19 11:14:28 haproxy02 systemd[1]: Stopped HAProxy Load Balancer.
> Dec 19 11:14:28 haproxy02 systemd[1]: Starting HAProxy Load Balancer...
> Dec 19 11:14:29 haproxy02 systemd[1]: Started HAProxy Load Balancer.
>
> At the same time in haproxy.log:
>
> (lots of ssl handshake failures...) then
> Dec 19 11:14:26 haproxy02 haproxy[29579]: [ALERT] 352/090058 (29579) : Current worker 29582 left with exit code 139
> Dec 19 11:14:26 haproxy02 haproxy[29579]: [ALERT] 352/090058 (29579) : exit-on-failure: killing every workers with SIGTERM
> Dec 19 11:14:26 haproxy02 haproxy[29579]: [WARNING] 352/090058 (29579) : All workers are left. Leaving... (139)
> Dec 19 11:14:27 haproxy02 haproxy[31476]: [ALERT] 352/111426 (31476) : Current worker 31479 left with exit code 139
> Dec 19 11:14:27 haproxy02 haproxy[31476]: [ALERT] 352/111426 (31476) : exit-on-failure: killing every workers with SIGTERM
> Dec 19 11:14:27 haproxy02 haproxy[31476]: [WARNING] 352/111426 (31476) : All workers are left. Leaving... (139)
> Dec 19 11:14:28 haproxy02 haproxy[31485]: [ALERT] 352/111428 (31485) : Current worker 31487 left with exit code 139
> Dec 19 11:14:28 haproxy02 haproxy[31485]: [ALERT] 352/111428 (31485) : exit-on-failure: killing every workers with SIGTERM
> Dec 19 11:14:28 haproxy02 haproxy[31485]: [WARNING] 352/111428 (31485) : All workers are left. Leaving... (139)
> Dec 19 11:14:29 haproxy02 haproxy[31493]: [ALERT] 352/111429 (31493) : Current worker 31496 left with exit code 139
> Dec 19 11:14:29 haproxy02 haproxy[31493]: [ALERT] 352/111429 (31493) : exit-on-failure: killing every workers with SIGTERM
> Dec 19 11:14:29 haproxy02 haproxy[31493]: [WARNING] 352/111429 (31493) : All workers are left. Leaving... (139)
> Dec 19 11:14:30 haproxy02 haproxy[31503]: [ALERT] 352/111429 (31503) : Current worker 31505 left with exit code 139
> Dec 19 11:14:30 haproxy02 haproxy[31503]: [ALERT] 352/111429 (31503) : exit-on-failure: killing every workers with SIGTERM
> Dec 19 11:14:30 haproxy02 haproxy[31503]: [WARNING]
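For reference, on 1.8 H2 is only used when it is advertised via ALPN on the bind line, so "disabling h2" comes down to no longer advertising it. A sketch of what that looks like (certificate path and port are assumed, not taken from the poster's config):

```haproxy
frontend fe-https
    # with h2 enabled, the bind line would advertise both protocols:
    #   bind :443 ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1
    # with h2 disabled, only HTTP/1.1 is advertised:
    bind :443 ssl crt /etc/haproxy/certs/site.pem alpn http/1.1
    default_backend be-app
```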
Re: Quick update on pending HTTP/2 issues
Hi again guys,

so another quick update on the subject:

- the currently known POST issues have been resolved a few days ago, requiring a significant number of changes which make the code better anyway, so it was not bad in the end;

- the abortonclose case has been solved as well. The issue was that in HTTP/1 there's no way for a browser to signal that it's aborting, so the best it can do is to close, resulting in a read0 being received; the option is there to indicate that haproxy must consider this read0 as an abort. In HTTP/2 we have RST_STREAM to explicitly abort, and the read0 is sent at the end of all requests to indicate that the request is complete, which created the confusion with the option above. The fix consists in making abortonclose not act on read0 for H2, and now it's OK.

I'd have liked to issue 1.8.2 with these fixes, but apparently a few other issues are pending. Let's say that we'll release before the end of the week with or without the fixes.

(PS: yes, I'm still thinking about issuing 1.7.10 ASAP, but the 1.8 issues have been diverting everyone *a lot* and it's not finished).

Willy
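For anyone unfamiliar with the option being discussed: abortonclose is set per proxy/backend. A minimal sketch of where it lives (backend and server names are assumed for illustration):

```haproxy
backend be-app
    # In H1, a client close (read0) is the only possible abort signal, so this
    # option tells haproxy to treat it as an abort. In H2, aborts arrive as
    # RST_STREAM and a read0 simply marks the end of the request, which is why
    # the fix described above makes the option not trigger on read0 there.
    option abortonclose
    server app1 10.0.0.1:8080 check
```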
Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1
On Thu, Dec 21, 2017 at 12:04:11AM +0100, Cyril Bonté wrote:
> Hi Greg,
>
> On 20/12/2017 at 22:42, Greg Nolle wrote:
> > Hi Andrew,
> >
> > Thanks for the info but I'm afraid I'm not seeing anything here that
> > would affect the issue I'm seeing, and by the way the docs don't
> > indicate that the cookie names have to match the server names.
>
> First, don't worry about the configuration, there is nothing wrong in it ;-)
>
> > That being said, I tried using your settings and am still seeing the
> > issue (see below for new full config). And like I say, this is only an
> > issue with v1.8.1, it works as expected in v1.7.9.
>
> I won't be able to look further tonight, but at least I could identify when
> the regression occurred: it's caused by the work done to prepare
> multi-threading, more specifically by this commit:
> http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=64cc49cf7
>
> I've added Emeric to the thread, maybe he'll be able to provide a fix faster
> than me (I won't be very available for the next days).

Thus I'll ping Emeric tomorrow as well so that we can issue 1.8.2 soon, in case someone wants to play with it on friday afternoon just before xmas :-)

Willy
Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1
Hi Greg,

On 20/12/2017 at 22:42, Greg Nolle wrote:
> Hi Andrew,
>
> Thanks for the info but I'm afraid I'm not seeing anything here that would
> affect the issue I'm seeing, and by the way the docs don't indicate that
> the cookie names have to match the server names.

First, don't worry about the configuration, there is nothing wrong in it ;-)

> That being said, I tried using your settings and am still seeing the issue
> (see below for new full config). And like I say, this is only an issue
> with v1.8.1, it works as expected in v1.7.9.

I won't be able to look further tonight, but at least I could identify when the regression occurred: it's caused by the work done to prepare multi-threading, more specifically by this commit:
http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=64cc49cf7

I've added Emeric to the thread, maybe he'll be able to provide a fix faster than me (I won't be very available for the next days).

--
Cyril Bonté
Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1
Hi Andrew,

Thanks for the info but I'm afraid I'm not seeing anything here that would affect the issue I'm seeing, and by the way the docs don't indicate that the cookie names have to match the server names.

That being said, I tried using your settings and am still seeing the issue (see below for new full config). And like I say, this is only an issue with v1.8.1, it works as expected in v1.7.9.

defaults
  mode http
  option redispatch
  retries 3
  timeout queue 20s
  timeout client 50s
  timeout connect 5s
  timeout server 50s

listen stats
  bind :1936
  stats enable
  stats uri /
  stats hide-version
  stats admin if TRUE

frontend main
  bind :9080
  default_backend main

backend main
  balance leastconn
  cookie SERVERID maxidle 30m maxlife 12h insert nocache indirect
  server server-1-google www.google.com:80 weight 100 cookie server-1-google check port 80 inter 4000 rise 2 fall 2 minconn 0 maxconn 0 on-marked-down shutdown-sessions
  server server-2-yahoo www.yahoo.com:80 weight 100 cookie server-2-yahoo check port 80 inter 4000 rise 2 fall 2 minconn 0 maxconn 0 on-marked-down shutdown-sessions

On Wed, Dec 20, 2017 at 8:57 PM, Andrew Smalley wrote:
> Also our cookie line looks as below
>
> cookie SERVERID maxidle 30m maxlife 12h insert nocache indirect
>
> Andruw Smalley
> Loadbalancer.org Ltd.
> www.loadbalancer.org
> +1 888 867 9504 / +44 (0)330 380 1064
> asmal...@loadbalancer.org
> Leave a Review | Deployment Guides | Blog
>
> On 20 December 2017 at 20:55, Andrew Smalley wrote:
> > Greg
> >
> > it's just been pointed out your cookies are wrong, they would usually
> > match your server name.
> > I would change this
> >
> > server server-1-google www.google.com:80 check cookie google
> > server server-2-yahoo www.yahoo.com:80 check cookie yahoo
> >
> > to this
> >
> > server server-1-google www.google.com:80 check cookie server-1-google
> > server server-2-yahoo www.yahoo.com:80 check cookie server-2-yahoo
> >
> > We use something like this as a default server line
> >
> > server RIP_Name 172.16.1.1 weight 100 cookie RIP_Name check port 80 inter 4000 rise 2 fall 2 minconn 0 maxconn 0 on-marked-down shutdown-sessions
> >
> > On 20 December 2017 at 20:52, Andrew Smalley wrote:
> >> Hi Greg
> >>
> >> Apologies, I was confused with the terminology we use here.
> >>
> >> Indeed MAINT should be the same as our HALT feature.
> >>
> >> Maybe you can share your config and we can see what's wrong?
> >>
> >> On 20 December 2017 at 20:45, Greg Nolle wrote:
> >>> Hi Andrew,
> >>>
> >>> I can't find any reference to a "HALTED" status in the manual. I'm
> >>> *not* referring to "DRAIN" though (which I would expect to behave as
> >>> you describe), I'm referring to "MAINT", i.e. disabling the backend
> >>> server. Here's the snippet from the management manual to clarify what
> >>> I'm referring to:
> >>>
> >>>> "Setting the state to "maint" disables any traffic to the server as
> >>>> well as any health checks"
> >>>
> >>> Best regards,
> >>> Greg
> >>>
> >>> On Wed, Dec 20, 2017 at 8:29 PM, Andrew Smalley wrote:
> >>>> Hi Greg
> >>>>
> >>>> You say traffic still goes to the real server when in MAINT mode.
> >>>> Assuming you mean DRAIN mode and not HALTED then this is expected.
> >>>>
> >>>> Existing connections still go to a server while DRAINING but no new
> >>>> connections will get there.
> >>>>
> >>>> If the real server is HALTED then no traffic gets to it.
> >>>>
> >>>> On 20 December 2017 at 20:26, Greg Nolle wrote:
> >>>>> When cookie persistence is used, it seems that the status of the
> >>>>> servers in the backend is ignored in v1.8.1. I try marking as MAINT
> >>>>> a backend server for which my browser has been given a cookie but
> >>>>> subsequent requests still go to that server (as verified in the
> >>>>> stats). The same issue happens when I use a stick table.
> >>>>>
> >>>>> I've included a simple example config where this happens at the
> >>>>> bottom. The exact same config in v1.7.9 gives the expected
> >>>>> behaviour that new requests are migrated to a different active
> >>>>> backend server.
Re: Haproxy SSL Termination performance issue
Hi, Johan

Thanks a lot

Mike

Sent from my HuaWei mate9 Phone

-------- Original message --------
Subject: Re: Haproxy SSL Termination performance issue
From: Johan Hendriks
To: haproxy@formilux.org, hongw...@163.com
Cc:

On Tue, 19 Dec 2017 at 16:16, hongw...@163.com wrote:
> Hi, Thierry.
>
> Thanks again.
>
> One more question about what you were talking about, can I think of it
> this way: assume we got an 8-core cpu, we use 7 of them for SSL
> termination and one for HTTP forwarding? If so, is there any document
> for this solution?
>
> Thanks a lot
>
> Mike

I think the following blog can help you with that.

https://medium.com/cagataygurturk/using-haproxy-in-multi-core-environments-68ee2d3ae39e

Regards
Johan
Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1
Also our cookie line looks as below

cookie SERVERID maxidle 30m maxlife 12h insert nocache indirect

Andruw Smalley
Loadbalancer.org Ltd.
www.loadbalancer.org
+1 888 867 9504 / +44 (0)330 380 1064
asmal...@loadbalancer.org

On 20 December 2017 at 20:55, Andrew Smalley wrote:
> Greg
>
> it's just been pointed out your cookies are wrong, they would usually
> match your server name.
> I would change this
>
> server server-1-google www.google.com:80 check cookie google
> server server-2-yahoo www.yahoo.com:80 check cookie yahoo
>
> to this
>
> server server-1-google www.google.com:80 check cookie server-1-google
> server server-2-yahoo www.yahoo.com:80 check cookie server-2-yahoo
>
> We use something like this as a default server line
>
> server RIP_Name 172.16.1.1 weight 100 cookie RIP_Name check port 80 inter 4000 rise 2 fall 2 minconn 0 maxconn 0 on-marked-down shutdown-sessions
>
> On 20 December 2017 at 20:52, Andrew Smalley wrote:
>> Hi Greg
>>
>> Apologies, I was confused with the terminology we use here.
>>
>> Indeed MAINT should be the same as our HALT feature.
>>
>> Maybe you can share your config and we can see what's wrong?
>>
>> On 20 December 2017 at 20:45, Greg Nolle wrote:
>>> Hi Andrew,
>>>
>>> I can't find any reference to a "HALTED" status in the manual. I'm
>>> *not* referring to "DRAIN" though (which I would expect to behave as
>>> you describe), I'm referring to "MAINT", i.e. disabling the backend
>>> server. Here's the snippet from the management manual to clarify what
>>> I'm referring to:
>>>
>>>> "Setting the state to "maint" disables any traffic to the server as
>>>> well as any health checks"
>>>
>>> Best regards,
>>> Greg
>>>
>>> On Wed, Dec 20, 2017 at 8:29 PM, Andrew Smalley wrote:
>>>> Hi Greg
>>>>
>>>> You say traffic still goes to the real server when in MAINT mode.
>>>> Assuming you mean DRAIN mode and not HALTED then this is expected.
>>>>
>>>> Existing connections still go to a server while DRAINING but no new
>>>> connections will get there.
>>>>
>>>> If the real server is HALTED then no traffic gets to it.
>>>>
>>>> On 20 December 2017 at 20:26, Greg Nolle wrote:
>>>>> When cookie persistence is used, it seems that the status of the
>>>>> servers in the backend is ignored in v1.8.1. I try marking as MAINT
>>>>> a backend server for which my browser has been given a cookie but
>>>>> subsequent requests still go to that server (as verified in the
>>>>> stats). The same issue happens when I use a stick table.
>>>>>
>>>>> I've included a simple example config where this happens at the
>>>>> bottom. The exact same config in v1.7.9 gives the expected
>>>>> behaviour that new requests are migrated to a different active
>>>>> backend server.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Many thanks,
>>>>> Greg
>>>>>
>>>>> defaults
>>>>>   mode http
>>>>>   option redispatch
>>>>>   retries 3
>>>>>   timeout queue 20s
>>>>>   timeout client 50s
>>>>>   timeout connect 5s
>>>>>   timeout server 50s
>>>>>
>>>>> listen stats
>>>>>   bind :1936
>>>>>   stats enable
>>>>>   stats uri /
>>>>>   stats hide-version
>>>>>   stats admin if TRUE
>>>>>
>>>>> frontend main
>>>>>   bind :9080
>>>>>   default_backend main
>>>>>
>>>>> backend main
>>>>>   balance leastconn
>>>>>   cookie SERVERID insert indirect nocache
>>>>>   server server-1-google www.google.com:80 check cookie google
>>>>>   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1
Greg

it's just been pointed out your cookies are wrong, they would usually match your server name. I would change this

server server-1-google www.google.com:80 check cookie google
server server-2-yahoo www.yahoo.com:80 check cookie yahoo

to this

server server-1-google www.google.com:80 check cookie server-1-google
server server-2-yahoo www.yahoo.com:80 check cookie server-2-yahoo

We use something like this as a default server line

server RIP_Name 172.16.1.1 weight 100 cookie RIP_Name check port 80 inter 4000 rise 2 fall 2 minconn 0 maxconn 0 on-marked-down shutdown-sessions

Andruw Smalley
Loadbalancer.org Ltd.
www.loadbalancer.org
+1 888 867 9504 / +44 (0)330 380 1064
asmal...@loadbalancer.org

On 20 December 2017 at 20:52, Andrew Smalley wrote:
> Hi Greg
>
> Apologies, I was confused with the terminology we use here.
>
> Indeed MAINT should be the same as our HALT feature.
>
> Maybe you can share your config and we can see what's wrong?
>
> On 20 December 2017 at 20:45, Greg Nolle wrote:
>> Hi Andrew,
>>
>> I can't find any reference to a "HALTED" status in the manual. I'm
>> *not* referring to "DRAIN" though (which I would expect to behave as
>> you describe), I'm referring to "MAINT", i.e. disabling the backend
>> server. Here's the snippet from the management manual to clarify what
>> I'm referring to:
>>
>>> "Setting the state to "maint" disables any traffic to the server as
>>> well as any health checks"
>>
>> Best regards,
>> Greg
>>
>> On Wed, Dec 20, 2017 at 8:29 PM, Andrew Smalley wrote:
>>> Hi Greg
>>>
>>> You say traffic still goes to the real server when in MAINT mode.
>>> Assuming you mean DRAIN mode and not HALTED then this is expected.
>>>
>>> Existing connections still go to a server while DRAINING but no new
>>> connections will get there.
>>>
>>> If the real server is HALTED then no traffic gets to it.
>>>
>>> On 20 December 2017 at 20:26, Greg Nolle wrote:
>>>> When cookie persistence is used, it seems that the status of the
>>>> servers in the backend is ignored in v1.8.1. I try marking as MAINT
>>>> a backend server for which my browser has been given a cookie but
>>>> subsequent requests still go to that server (as verified in the
>>>> stats). The same issue happens when I use a stick table.
>>>>
>>>> I've included a simple example config where this happens at the
>>>> bottom. The exact same config in v1.7.9 gives the expected behaviour
>>>> that new requests are migrated to a different active backend server.
>>>>
>>>> Any ideas?
>>>>
>>>> Many thanks,
>>>> Greg
>>>>
>>>> defaults
>>>>   mode http
>>>>   option redispatch
>>>>   retries 3
>>>>   timeout queue 20s
>>>>   timeout client 50s
>>>>   timeout connect 5s
>>>>   timeout server 50s
>>>>
>>>> listen stats
>>>>   bind :1936
>>>>   stats enable
>>>>   stats uri /
>>>>   stats hide-version
>>>>   stats admin if TRUE
>>>>
>>>> frontend main
>>>>   bind :9080
>>>>   default_backend main
>>>>
>>>> backend main
>>>>   balance leastconn
>>>>   cookie SERVERID insert indirect nocache
>>>>   server server-1-google www.google.com:80 check cookie google
>>>>   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1
Hi Greg

Apologies, I was confused with the terminology we use here.

Indeed MAINT should be the same as our HALT feature.

Maybe you can share your config and we can see what's wrong?

Andruw Smalley
Loadbalancer.org Ltd.
www.loadbalancer.org
+1 888 867 9504 / +44 (0)330 380 1064
asmal...@loadbalancer.org

On 20 December 2017 at 20:45, Greg Nolle wrote:
> Hi Andrew,
>
> I can't find any reference to a "HALTED" status in the manual. I'm
> *not* referring to "DRAIN" though (which I would expect to behave as
> you describe), I'm referring to "MAINT", i.e. disabling the backend
> server. Here's the snippet from the management manual to clarify what
> I'm referring to:
>
>> "Setting the state to "maint" disables any traffic to the server as
>> well as any health checks"
>
> Best regards,
> Greg
>
> On Wed, Dec 20, 2017 at 8:29 PM, Andrew Smalley wrote:
>> Hi Greg
>>
>> You say traffic still goes to the real server when in MAINT mode.
>> Assuming you mean DRAIN mode and not HALTED then this is expected.
>>
>> Existing connections still go to a server while DRAINING but no new
>> connections will get there.
>>
>> If the real server is HALTED then no traffic gets to it.
>>
>> On 20 December 2017 at 20:26, Greg Nolle wrote:
>>> When cookie persistence is used, it seems that the status of the
>>> servers in the backend is ignored in v1.8.1. I try marking as MAINT a
>>> backend server for which my browser has been given a cookie but
>>> subsequent requests still go to that server (as verified in the
>>> stats). The same issue happens when I use a stick table.
>>>
>>> I've included a simple example config where this happens at the
>>> bottom. The exact same config in v1.7.9 gives the expected behaviour
>>> that new requests are migrated to a different active backend server.
>>>
>>> Any ideas?
>>>
>>> Many thanks,
>>> Greg
>>>
>>> defaults
>>>   mode http
>>>   option redispatch
>>>   retries 3
>>>   timeout queue 20s
>>>   timeout client 50s
>>>   timeout connect 5s
>>>   timeout server 50s
>>>
>>> listen stats
>>>   bind :1936
>>>   stats enable
>>>   stats uri /
>>>   stats hide-version
>>>   stats admin if TRUE
>>>
>>> frontend main
>>>   bind :9080
>>>   default_backend main
>>>
>>> backend main
>>>   balance leastconn
>>>   cookie SERVERID insert indirect nocache
>>>   server server-1-google www.google.com:80 check cookie google
>>>   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1
Hi Andrew,

I can’t find any reference to a “HALTED” status in the manual. I’m *not* referring to “DRAIN” though (which I would expect to behave as you describe), I’m referring to "MAINT", i.e. disabling the backend server. Here’s the snippet from the management manual to clarify what I’m referring to:

> “Setting the state to “maint” disables any traffic to the server as well as
> any health checks"

Best regards,
Greg

On Wed, Dec 20, 2017 at 8:29 PM, Andrew Smalley wrote:
> Hi Greg
>
> You say traffic still goes to the real server when in MAINT mode.
> Assuming you mean DRAIN mode and not HALTED then this is expected.
>
> Existing connections still go to a server while DRAINING but no new
> connections will get there.
>
> If the real server is HALTED then no traffic gets to it.
>
> On 20 December 2017 at 20:26, Greg Nolle wrote:
>> When cookie persistence is used, it seems that the status of the
>> servers in the backend is ignored in v1.8.1. I try marking as MAINT a
>> backend server for which my browser has been given a cookie but
>> subsequent requests still go to that server (as verified in the
>> stats). The same issue happens when I use a stick table.
>>
>> I’ve included a simple example config where this happens at the
>> bottom. The exact same config in v1.7.9 gives the expected behaviour
>> that new requests are migrated to a different active backend server.
>>
>> Any ideas?
>>
>> Many thanks,
>> Greg
>>
>> defaults
>>   mode http
>>   option redispatch
>>   retries 3
>>   timeout queue 20s
>>   timeout client 50s
>>   timeout connect 5s
>>   timeout server 50s
>>
>> listen stats
>>   bind :1936
>>   stats enable
>>   stats uri /
>>   stats hide-version
>>   stats admin if TRUE
>>
>> frontend main
>>   bind :9080
>>   default_backend main
>>
>> backend main
>>   balance leastconn
>>   cookie SERVERID insert indirect nocache
>>   server server-1-google www.google.com:80 check cookie google
>>   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1
Hi Greg

You say traffic still goes to the real server when in MAINT mode. Assuming you mean DRAIN mode and not HALTED then this is expected.

Existing connections still go to a server while DRAINING but no new connections will get there.

If the real server is HALTED then no traffic gets to it.

Andruw Smalley
Loadbalancer.org Ltd.
www.loadbalancer.org
+1 888 867 9504 / +44 (0)330 380 1064
asmal...@loadbalancer.org

On 20 December 2017 at 20:26, Greg Nolle wrote:
> When cookie persistence is used, it seems that the status of the
> servers in the backend is ignored in v1.8.1. I try marking as MAINT a
> backend server for which my browser has been given a cookie but
> subsequent requests still go to that server (as verified in the
> stats). The same issue happens when I use a stick table.
>
> I’ve included a simple example config where this happens at the
> bottom. The exact same config in v1.7.9 gives the expected behaviour
> that new requests are migrated to a different active backend server.
>
> Any ideas?
>
> Many thanks,
> Greg
>
> defaults
>   mode http
>   option redispatch
>   retries 3
>   timeout queue 20s
>   timeout client 50s
>   timeout connect 5s
>   timeout server 50s
>
> listen stats
>   bind :1936
>   stats enable
>   stats uri /
>   stats hide-version
>   stats admin if TRUE
>
> frontend main
>   bind :9080
>   default_backend main
>
> backend main
>   balance leastconn
>   cookie SERVERID insert indirect nocache
>   server server-1-google www.google.com:80 check cookie google
>   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1
When cookie persistence is used, it seems that the status of the servers in the backend is ignored in v1.8.1. I try marking as MAINT a backend server for which my browser has been given a cookie but subsequent requests still go to that server (as verified in the stats). The same issue happens when I use a stick table.

I’ve included a simple example config where this happens at the bottom. The exact same config in v1.7.9 gives the expected behaviour that new requests are migrated to a different active backend server.

Any ideas?

Many thanks,
Greg

defaults
  mode http
  option redispatch
  retries 3
  timeout queue 20s
  timeout client 50s
  timeout connect 5s
  timeout server 50s

listen stats
  bind :1936
  stats enable
  stats uri /
  stats hide-version
  stats admin if TRUE

frontend main
  bind :9080
  default_backend main

backend main
  balance leastconn
  cookie SERVERID insert indirect nocache
  server server-1-google www.google.com:80 check cookie google
  server server-2-yahoo www.yahoo.com:80 check cookie yahoo
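For completeness: besides the stats admin page used above, a server can also be put into MAINT from the runtime API, which makes the issue easy to script when reproducing. A sketch of the extra config this needs (the socket path is assumed, not part of the posted config; the shell commands are shown as comments):

```haproxy
global
    # expose the runtime API so server state can be changed without the stats page
    stats socket /var/run/haproxy.sock level admin

# then, from a shell:
#   echo "set server main/server-1-google state maint" | socat stdio /var/run/haproxy.sock
#   echo "show servers state main"                     | socat stdio /var/run/haproxy.sock
```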
Re: Haproxy SSL Termination performance issue
On Tue, 19 Dec 2017 at 16:16, hongw...@163.com wrote:
> Hi, Thierry.
>
> Thanks again.
>
> One more question about what you were talking about, can I think of it
> this way: assume we got an 8-core cpu, we use 7 of them for SSL
> termination and one for HTTP forwarding? If so, is there any document
> for this solution?
>
> Thanks a lot
>
> Mike

I think the following blog can help you with that.

https://medium.com/cagataygurturk/using-haproxy-in-multi-core-environments-68ee2d3ae39e

Regards
Johan
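The split asked about (7 cores for SSL termination, 1 for plain HTTP forwarding) is classically done with nbproc, cpu-map and per-process bind lines. A rough sketch only, with certificate path, ports and core numbering assumed; check the version-specific cpu-map syntax before using:

```haproxy
global
    nbproc 8
    cpu-map 1 0            # process 1 on core 0: plain HTTP forwarding
    cpu-map auto:2-8 1-7   # processes 2-8 pinned one-per-core on cores 1-7: SSL

frontend fe-ssl
    # SSL termination handled only by processes 2-8
    bind :443 ssl crt /etc/haproxy/certs/site.pem process 2-8
    default_backend be-app

frontend fe-http
    # plain HTTP handled only by process 1
    bind :80 process 1
    default_backend be-app
```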
Re: Issue after upgrade from 1.7 to 1.8 related with active sessions
Hello Ricardo,

On Wed, Dec 20, 2017 at 05:00:33PM +0100, Ricardo Fraile wrote:
> Hello,
>
> After upgrade from 1.7.4 to 1.8.1, basically with the conf snippet at
> the end of the mail, the sessions started to grow, for example:
>
> 1.7.4:
> Active sessions: ~161
> Active sessions rate: ~425
>
> 1.8.1:
> Active sessions: ~6700
> Active sessions rate: ~350

Ah, that's not good :-(

> Looking into the linux (3.16.7) server, there are a high number of
> CLOSE_WAIT connections from the bind address of the listen service to
> the backend nodes.

Strange, I don't understand well what type of traffic could cause this except a loop, that sounds a bit unusual.

> System logs reported "TCP: too many orphaned sockets", but after
> increasing the net.ipv4.tcp_max_orphans value, the message stops but
> nothing changes.

Normally orphans correspond to closed sockets for which there are still data in the system's buffers, so this should be unrelated to the CLOSE_WAIT, unless there's a loop somewhere where a backend reconnects to the frontend, which can explain both situations at once when the timeout strikes.

> Haproxy logs reported for that listen the indicator "sD", but only with
> 1.8.

Thus a server timeout during the end of the transfer. That doesn't make much sense either.

> Any ideas to dig into the issue?

It would be very useful if you could share your configuration (please remove any sensitive info like stats passwords or IP addresses you prefer to keep private). When running 1.8, it would be useful to issue the following commands on the CLI and capture the output to a file:

  - "show sess all"
  - "show fd"

Warning, the first one will reveal a lot of info (internal addresses etc), so you may want to send it privately and not to the list if this is the case (though it takes longer to diagnose it :-)).

If you think you can reproduce this on a test machine out of production, that would be extremely useful. We have not noticed any single such issue on haproxy.org, which has delivered about 100 GB and 2 million requests over the last 2 weeks with this exact same version, so that makes me think that either the config or the type of traffic count a lot there to trigger the problem you are observing.

Regards,
Willy
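Both commands above go through the runtime socket, which the posted config does not declare. A sketch of the prerequisite, with the socket path assumed (capture commands shown as comments):

```haproxy
global
    # runtime API socket needed for "show sess all" / "show fd"
    stats socket /var/run/haproxy.sock level admin

# capture the output to files for the report:
#   echo "show sess all" | socat stdio /var/run/haproxy.sock > sess.txt
#   echo "show fd"       | socat stdio /var/run/haproxy.sock > fd.txt
```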
Issue after upgrade from 1.7 to 1.8 related with active sessions
Hello,

After upgrade from 1.7.4 to 1.8.1, basically with the conf snippet at the end of the mail, the sessions started to grow, for example:

1.7.4:
Active sessions: ~161
Active sessions rate: ~425

1.8.1:
Active sessions: ~6700
Active sessions rate: ~350

Looking into the linux (3.16.7) server, there are a high number of CLOSE_WAIT connections from the bind address of the listen service to the backend nodes.

System logs reported "TCP: too many orphaned sockets", but after increasing the net.ipv4.tcp_max_orphans value, the message stops but nothing changes.

Haproxy logs reported for that listen the indicator "sD", but only with 1.8.

Any ideas to dig into the issue?

Thanks,

defaults
  mode tcp
  retries 3
  option redispatch
  maxconn 10
  fullconn 10
  timeout connect 5s
  timeout server 50s
  timeout client 50s

listen proxy-tcp
  bind 192.168.1.1:80
  balance roundrobin
  server node1 192.168.1.10:80
  server node2 192.168.1.11:80
  server node3 192.168.1.12:80
Re: haproxy and solarflare onload
> > Apparently I'm not graphing conn_rate (I need to add it, but I have no
> > values now), because we're also sending all SSL traffic to other nodes
> > using TCP load balancing.

Update: I'm at around a 7.7k connection rate.
Re: haproxy and solarflare onload
On Wed, Dec 20, 2017 at 2:10 PM, Willy Tarreau wrote:
> On Wed, Dec 20, 2017 at 11:48:27AM +0100, Elias Abacioglu wrote:
> > Yes, I have one node running with a Solarflare SFN8522 2p 10Gbit/s,
> > currently without Onload enabled.
> > It has a 17.5k http_request_rate and ~26% server interrupts on cores 0
> > and 1, where the NIC IRQ is bound.
> >
> > And I have a similar node with an Intel X710 2p 10Gbit/s.
> > It has a 26.1k http_request_rate and ~26% server interrupts on cores 0
> > and 1, where the NIC IRQ is bound.
> >
> > Both nodes have 1 socket, an Intel Xeon CPU E3-1280 v6, and 32 GB RAM.
>
> In both cases this is very low performance. We're getting 245k req/s and
> 90k connections/s on a somewhat comparable Core i7-4790K on small objects,
> and are easily saturating 2 10G NICs with medium-sized objects. The
> problem I'm seeing is that if your cable is not saturated, you're supposed
> to be running at a higher request rate, and if it's saturated you should
> not observe the slightest difference between the two tests. In fact what
> I'm suspecting is that you're running with ~45kB objects and that your
> Intel NIC managed to reach the line rate, while in the same test the
> SFN8522 cannot even reach it. Am I wrong? If so, from what I remember from
> the 40G tests 2 years ago, you should be able to get close to 35-40G with
> such object sizes.

I forgot to mention that this was not a benchmark test. I tested with live traffic (/me hides in shame). That's the reason we aren't saturated, so it's not that we've hit the limit now. And why one node gets more traffic has to do with the different VIPs assigned; I'm not sure really why, because we split the VIPs evenly, but I suspect one of the VIPs gets more traffic.

And I can tell you that we can't reach 245k req/s. At this very moment, if I look at the Intel node, we've got ~70% CPU idle on cores 0+1 where the NIC IRQ is set, and ~58% on cores 2+3 where haproxy is running. And this node is currently at around 27k req/s.
With this math we would hit 100% CPU on cores 2+3 at around 47k req/s. So we spike in CPU usage before 50k req/s; we're not even close to 245k req/s. I guess I need to learn more tuning. That was my goal/vision with Solarflare: offload the CPU more so I can give more cores to haproxy.

Is there a metric that shows the average object size?

Apparently I'm not graphing conn_rate (I need to add it, but I have no values now), because we're also sending all SSL traffic to other nodes using TCP load balancing.

> Oh, just one thing: verify that you're not running with jumbo frames in
> the Solarflare case. Jumbo frames used to help *a lot* 10 years ago when
> they were saving interrupt processing time. Nowadays they instead hurt a
> lot, because allocating 9kB of contiguous memory at once for a packet is
> much more difficult than allocating only 1.5kB. Honestly, I don't remember
> having seen a single case over the last 5+ years where running with jumbo
> frames would permit reaching the same performance as no jumbo. GSO+GRO
> have helped a lot there as well!

Jumbo frames are not enabled; these nodes are connected directly to the Internet :) GSO+GRO are enabled for both Intel and Solarflare.
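The extrapolation above is a simple linear scaling of the current request rate by CPU utilization (reading the reported ~58% on cores 2+3 as busy time, which is what reproduces the ~47k figure). A sketch of the arithmetic, plus one way to answer the average-object-size question from the cumulative counters in "show stat" (bytes_out divided by total requests); the two counter values below are invented for illustration:

```shell
# ~27k req/s with the haproxy cores ~58% busy -> saturation estimate
awk 'BEGIN { printf "estimated saturation: ~%.0f req/s\n", 27000 / 0.58 }'
# prints: estimated saturation: ~46552 req/s

# No direct "avg object size" metric exists, but it can be derived from
# cumulative frontend counters (bytes_out / total requests). Values invented:
awk 'BEGIN { printf "avg response size: ~%.1f kB\n", 1215000000000 / 27000000 / 1024 }'
```

Note the extrapolation assumes haproxy scales linearly with CPU on those cores, which ignores softirq contention on cores 0+1, so treat ~47k as an upper bound.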
Re: haproxy and solarflare onload
On Wed, Dec 20, 2017 at 3:27 PM, Christian Ruppert wrote:
> Oh, btw, I'm just reading the onload documentation.
>
> Filters
> Filters are used to deliver packets received from the wire to the
> appropriate application. When filters are exhausted it is not possible to
> create new accelerated sockets. The general recommendation is that
> applications do not allocate more than 4096 filters - or applications
> should not create more than 4096 outgoing connections. The limit does not
> apply to inbound connections to a listening socket.

This feels severely limiting. The support team was talking about scalable filters, but that doesn't seem applicable when using virtual IPs.
Re: haproxy and solarflare onload
On Wed, Dec 20, 2017 at 1:11 PM, Christian Ruppert wrote:
> Hi Elias,
>
> I'm currently preparing a test setup including a SFN8522 + onload.
> How did you measure it? When did those errors (drops/discards?) appear,
> during a test or some real traffic?
> The first thing I did was updating the driver + firmware. Are both up to
> date in your case?
>
> I haven't measured / compared the SFN8522 against a X520 nor X710 yet,
> but do you have RSS / affinity or something related enabled/set? Intel
> has some features and Solarflare may have its own stuff.

Those errors appear during real traffic. My workflow:

* kill keepalived; the public VIPs get removed.
* restart HAProxy with onload (no errors here).
* start keepalived; the public VIPs get added.
* traffic starts to flow in and errors start appearing almost immediately.

The driver and firmware are the latest. Currently I have not set RSS, which means it creates one queue per core, in my case 4 per port. And I assign affinity to cores 0+1; haproxy gets cores 2+3.
Re: haproxy and solarflare onload
Oh, btw, I'm just reading the onload documentation.

Filters
Filters are used to deliver packets received from the wire to the appropriate application. When filters are exhausted it is not possible to create new accelerated sockets. The general recommendation is that applications do not allocate more than 4096 filters - or applications should not create more than 4096 outgoing connections. The limit does not apply to inbound connections to a listening socket.

On 2017-12-20 13:11, Christian Ruppert wrote:

Hi Elias,

I'm currently preparing a test setup including a SFN8522 + onload. How did you measure it? When did those errors (drops/discards?) appear, during a test or some real traffic? The first thing I did was updating the driver + firmware. Are both up to date in your case?

I haven't measured / compared the SFN8522 against a X520 nor X710 yet, but do you have RSS / affinity or something related enabled/set? Intel has some features and Solarflare may have its own stuff.

On 2017-12-20 11:48, Elias Abacioglu wrote:

Hi,

Yes on the LD_PRELOAD.

Yes, I have one node running with a Solarflare SFN8522 2p 10Gbit/s, currently without Onload enabled. It has a 17.5k http_request_rate and ~26% server interrupts on cores 0 and 1, where the NIC IRQ is bound.

And I have a similar node with an Intel X710 2p 10Gbit/s. It has a 26.1k http_request_rate and ~26% server interrupts on cores 0 and 1, where the NIC IRQ is bound.

Both nodes have 1 socket, an Intel Xeon CPU E3-1280 v6, and 32 GB RAM.

So without Onload, Solarflare performs worse than the X710, since it has the same amount of SI load with less traffic. And a side note: I haven't compared the ethtool settings between Intel and Solarflare, just running with the defaults of both cards.

I currently have a support ticket open with the Solarflare team about the issues I mentioned in my previous mail; if they sort that out I can perhaps set up a test server if I can manage to free up one server.
Then we can do some synthetic benchmarks with a set of parameters of your choosing.

Regards,
/Elias

On Wed, Dec 20, 2017 at 9:48 AM, Willy Tarreau wrote:

Hi Elias,

On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote:

Hi,

I recently bought a Solarflare NIC with (ScaleOut) Onload / OpenOnload to test it with HAProxy. Has anyone tried running haproxy with Solarflare onload functions?

After I started haproxy with onload, this started spamming the kernel log:

Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload] oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147 [1] 10.3.20.116:80 [2] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload] oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321 [3] 10.3.20.113:80 [4] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 [5] 10.3.20.30:445 [6] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 [5] 10.3.20.30:445 [6] failed (-16)

And this in the haproxy log:

Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached system memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached system memory limit at 9184 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.

Apparently I've hit the max hardware filter limit on the card. Does anyone here have experience running haproxy with onload features?

I've never got any report of any such test, though in the past I thought it would be nice to run such a test, at least to validate the perimeter covered by the library (you're using it as LD_PRELOAD, that's it?).
Mind sharing insights and advice on how to get a functional setup?

I really don't know what can reasonably be expected from code trying to partially bypass a part of the TCP stack, to be honest. From what I've read a long time ago, onload might be doing its work in a not very intrusive way, but judging by your messages above I'm having some doubts now.

Have you tried without this software, using the card normally? I mean, 2 years ago I had the opportunity to test haproxy on a dual-40G setup and we reached 60 Gbps of forwarded traffic with all machines in the test bench reaching their limits (and haproxy reaching 100% as well), so for me that proves that the TCP stack still scales extremely well, and that while such acceleration software might make sense for a next-generation NIC running on old hardware (e.g. when 400 Gbps NICs start to appear), I'm really not convinced that it makes any sense to use them on well supported setups like 2-4 10Gbps links which are very common nowadays.
Re: haproxy and solarflare onload
On Wed, Dec 20, 2017 at 11:48:27AM +0100, Elias Abacioglu wrote:
> Yes, I have one node running with a Solarflare SFN8522 2p 10Gbit/s,
> currently without Onload enabled.
> It has a 17.5k http_request_rate and ~26% server interrupts on cores 0
> and 1, where the NIC IRQ is bound.
>
> And I have a similar node with an Intel X710 2p 10Gbit/s.
> It has a 26.1k http_request_rate and ~26% server interrupts on cores 0
> and 1, where the NIC IRQ is bound.
>
> Both nodes have 1 socket, an Intel Xeon CPU E3-1280 v6, and 32 GB RAM.

In both cases this is very low performance. We're getting 245k req/s and 90k connections/s on a somewhat comparable Core i7-4790K on small objects, and are easily saturating 2 10G NICs with medium-sized objects. The problem I'm seeing is that if your cable is not saturated, you're supposed to be running at a higher request rate, and if it's saturated you should not observe the slightest difference between the two tests. In fact what I'm suspecting is that you're running with ~45kB objects and that your Intel NIC managed to reach the line rate, while in the same test the SFN8522 cannot even reach it. Am I wrong? If so, from what I remember from the 40G tests 2 years ago, you should be able to get close to 35-40G with such object sizes.

Oh, just one thing: verify that you're not running with jumbo frames in the Solarflare case. Jumbo frames used to help *a lot* 10 years ago when they were saving interrupt processing time. Nowadays they instead hurt a lot, because allocating 9kB of contiguous memory at once for a packet is much more difficult than allocating only 1.5kB. Honestly, I don't remember having seen a single case over the last 5+ years where running with jumbo frames would permit reaching the same performance as no jumbo. GSO+GRO have helped a lot there as well!

Cheers,
Willy
Re: haproxy and solarflare onload
Hi Elias,

I'm currently preparing a test setup including a SFN8522 + onload. How did you measure it? When did those errors (drops/discards?) appear, during a test or some real traffic? The first thing I did was updating the driver + firmware. Are both up to date in your case?

I haven't measured / compared the SFN8522 against a X520 nor X710 yet, but do you have RSS / affinity or something related enabled/set? Intel has some features and Solarflare may have its own stuff.

On 2017-12-20 11:48, Elias Abacioglu wrote:

Hi,

Yes on the LD_PRELOAD.

Yes, I have one node running with a Solarflare SFN8522 2p 10Gbit/s, currently without Onload enabled. It has a 17.5k http_request_rate and ~26% server interrupts on cores 0 and 1, where the NIC IRQ is bound.

And I have a similar node with an Intel X710 2p 10Gbit/s. It has a 26.1k http_request_rate and ~26% server interrupts on cores 0 and 1, where the NIC IRQ is bound.

Both nodes have 1 socket, an Intel Xeon CPU E3-1280 v6, and 32 GB RAM.

So without Onload, Solarflare performs worse than the X710, since it has the same amount of SI load with less traffic. And a side note: I haven't compared the ethtool settings between Intel and Solarflare, just running with the defaults of both cards.

I currently have a support ticket open with the Solarflare team about the issues I mentioned in my previous mail; if they sort that out I can perhaps set up a test server if I can manage to free up one server. Then we can do some synthetic benchmarks with a set of parameters of your choosing.

Regards,
/Elias

On Wed, Dec 20, 2017 at 9:48 AM, Willy Tarreau wrote:

Hi Elias,

On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote:

Hi,

I recently bought a Solarflare NIC with (ScaleOut) Onload / OpenOnload to test it with HAProxy. Has anyone tried running haproxy with Solarflare onload functions?
After I started haproxy with onload, this started spamming the kernel log:

Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload] oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147 [1] 10.3.20.116:80 [2] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload] oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321 [3] 10.3.20.113:80 [4] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 [5] 10.3.20.30:445 [6] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 [5] 10.3.20.30:445 [6] failed (-16)

And this in the haproxy log:

Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached system memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached system memory limit at 9184 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.

Apparently I've hit the max hardware filter limit on the card. Does anyone here have experience running haproxy with onload features?

I've never got any report of any such test, though in the past I thought it would be nice to run such a test, at least to validate the perimeter covered by the library (you're using it as LD_PRELOAD, that's it?).

Mind sharing insights and advice on how to get a functional setup?

I really don't know what can reasonably be expected from code trying to partially bypass a part of the TCP stack, to be honest. From what I've read a long time ago, onload might be doing its work in a not very intrusive way, but judging by your messages above I'm having some doubts now.
Have you tried without this software, using the card normally? I mean, 2 years ago I had the opportunity to test haproxy on a dual-40G setup and we reached 60 Gbps of forwarded traffic with all machines in the test bench reaching their limits (and haproxy reaching 100% as well), so for me that proves that the TCP stack still scales extremely well, and that while such acceleration software might make sense for a next-generation NIC running on old hardware (e.g. when 400 Gbps NICs start to appear), I'm really not convinced that it makes any sense to use them on well supported setups like 2-4 10Gbps links which are very common nowadays. I mean, I managed to run haproxy at 10Gbps 10 years ago on a core2-duo! Hardware has evolved quite a bit since :-)

Regards,
Willy

Links:
--
[1] http://10.3.54.43:4147
[2] http://10.3.20.116:80
[3] http://10.3.54.43:39321
[4] http://10.3.20.113:80
[5] http://10.3.54.43:62403
[6] http://10.3.20.30:445

--
Regards,
Christian Ruppert
Re: haproxy and solarflare onload
Hi,

Yes on the LD_PRELOAD.

Yes, I have one node running with a Solarflare SFN8522 2p 10Gbit/s, currently without Onload enabled. It has a 17.5k http_request_rate and ~26% server interrupts on cores 0 and 1, where the NIC IRQ is bound.

And I have a similar node with an Intel X710 2p 10Gbit/s. It has a 26.1k http_request_rate and ~26% server interrupts on cores 0 and 1, where the NIC IRQ is bound.

Both nodes have 1 socket, an Intel Xeon CPU E3-1280 v6, and 32 GB RAM.

So without Onload, Solarflare performs worse than the X710, since it has the same amount of SI load with less traffic. And a side note: I haven't compared the ethtool settings between Intel and Solarflare, just running with the defaults of both cards.

I currently have a support ticket open with the Solarflare team about the issues I mentioned in my previous mail; if they sort that out I can perhaps set up a test server if I can manage to free up one server. Then we can do some synthetic benchmarks with a set of parameters of your choosing.

Regards,
/Elias

On Wed, Dec 20, 2017 at 9:48 AM, Willy Tarreau wrote:
> Hi Elias,
>
> On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote:
> > Hi,
> >
> > I recently bought a Solarflare NIC with (ScaleOut) Onload / OpenOnload
> > to test it with HAProxy.
> >
> > Has anyone tried running haproxy with Solarflare onload functions?
> > After I started haproxy with onload, this started spamming the kernel log:
> > Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload] oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147 10.3.20.116:80 failed (-16)
> > Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload] oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321 10.3.20.113:80 failed (-16)
> > Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 10.3.20.30:445 failed (-16)
> > Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 10.3.20.30:445 failed (-16)
> >
> > And this in the haproxy log:
> > Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached system memory limit at 9931 sockets. Please check system tunables.
> > Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached system memory limit at 9184 sockets. Please check system tunables.
> > Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.
> > Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.
> >
> > Apparently I've hit the max hardware filter limit on the card.
> > Does anyone here have experience running haproxy with onload features?
>
> I've never got any report of any such test, though in the past I thought
> it would be nice to run such a test, at least to validate the perimeter
> covered by the library (you're using it as LD_PRELOAD, that's it?).
>
> > Mind sharing insights and advice on how to get a functional setup?
>
> I really don't know what can reasonably be expected from code trying to
> partially bypass a part of the TCP stack, to be honest.
> From what I've read a long time ago, onload might be doing its work in a
> not very intrusive way, but judging by your messages above I'm having
> some doubts now.
>
> Have you tried without this software, using the card normally? I mean,
> 2 years ago I had the opportunity to test haproxy on a dual-40G setup
> and we reached 60 Gbps of forwarded traffic with all machines in the
> test bench reaching their limits (and haproxy reaching 100% as well),
> so for me that proves that the TCP stack still scales extremely well,
> and that while such acceleration software might make sense for a
> next-generation NIC running on old hardware (e.g. when 400 Gbps NICs
> start to appear), I'm really not convinced that it makes any sense to
> use them on well supported setups like 2-4 10Gbps links which are very
> common nowadays. I mean, I managed to run haproxy at 10Gbps 10 years
> ago on a core2-duo! Hardware has evolved quite a bit since :-)
>
> Regards,
> Willy
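For readers unfamiliar with the "LD_PRELOAD" setup being discussed, a sketch of how haproxy is typically launched under Onload; the profile name and config path are illustrative assumptions, and the exact options depend on the installed Onload version:

```shell
# Sketch only: assumes the Onload package is installed and provides the
# "onload" wrapper and libonload.so. Paths and profile are examples.

# Via the wrapper, which sets up LD_PRELOAD plus a tuning profile:
onload --profile=latency haproxy -f /etc/haproxy/haproxy.cfg

# Or by preloading the acceleration library directly:
LD_PRELOAD=libonload.so haproxy -f /etc/haproxy/haproxy.cfg
```

With either form, every accelerated connection consumes a hardware filter on the NIC, which is how the 4096-filter limit discussed earlier in the thread comes into play.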
Re: haproxy and solarflare onload
Hi Elias,

On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote:
> Hi,
>
> I recently bought a Solarflare NIC with (ScaleOut) Onload / OpenOnload to
> test it with HAProxy.
>
> Has anyone tried running haproxy with Solarflare onload functions?
>
> After I started haproxy with onload, this started spamming the kernel log:
> Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload] oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147 10.3.20.116:80 failed (-16)
> Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload] oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321 10.3.20.113:80 failed (-16)
> Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 10.3.20.30:445 failed (-16)
> Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 10.3.20.30:445 failed (-16)
>
> And this in the haproxy log:
> Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached system memory limit at 9931 sockets. Please check system tunables.
> Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached system memory limit at 9184 sockets. Please check system tunables.
> Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.
> Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.
>
> Apparently I've hit the max hardware filter limit on the card.
> Does anyone here have experience running haproxy with onload features?

I've never got any report of any such test, though in the past I thought it would be nice to run such a test, at least to validate the perimeter covered by the library (you're using it as LD_PRELOAD, that's it?).

> Mind sharing insights and advice on how to get a functional setup?
I really don't know what can reasonably be expected from code trying to partially bypass a part of the TCP stack, to be honest. From what I've read a long time ago, onload might be doing its work in a not very intrusive way, but judging by your messages above I'm having some doubts now.

Have you tried without this software, using the card normally? I mean, 2 years ago I had the opportunity to test haproxy on a dual-40G setup and we reached 60 Gbps of forwarded traffic with all machines in the test bench reaching their limits (and haproxy reaching 100% as well), so for me that proves that the TCP stack still scales extremely well, and that while such acceleration software might make sense for a next-generation NIC running on old hardware (e.g. when 400 Gbps NICs start to appear), I'm really not convinced that it makes any sense to use them on well supported setups like 2-4 10Gbps links which are very common nowadays. I mean, I managed to run haproxy at 10Gbps 10 years ago on a core2-duo! Hardware has evolved quite a bit since :-)

Regards,
Willy
Re: [PATCH 1/2] Fix compiler warning in iprange.c
On Fri, Dec 15, 2017 at 10:21:29AM -0600, Ryan O'Hara wrote: > The declaration of main() in iprange.c did not specify a type, causing > a compiler warning [-Wimplicit-int]. This patch simply declares main() > to be type 'int' and calls exit(0) at the end of the function. Both patches applied, thank you Ryan. Willy
Re: [PATCH] BUG: NetScaler CIP handling is incorrect
Great, thank you guys!

Best regards,
Andreas

On Wed, Dec 20, 2017 at 7:06 AM, Willy Tarreau wrote:
> On Tue, Dec 19, 2017 at 11:10:58PM +, Bertrand Jacquin wrote:
> > Hi Andreas and Willy,
> >
> > Please find attached a patch series adding support for both the legacy
> > and standard CIP protocol while keeping compatibility with the current
> > configuration format.
>
> Excellent, now applied to 1.9; will backport it to 1.8 later.
>
> Thanks a lot guys. I've seen how many round trips it required to
> validate these changes on your respective infrastructures; it was
> a very productive cooperation!
>
> Cheers,
> Willy