Re: 1.8.1 Segfault + slowdown

2017-12-20 Thread Peter Lindegaard Hansen
Update:

We've disabled H2 on 1.8, and everything is running as expected again.
HAProxy no longer degrades in performance, nor does it segfault,
so the issues seem to be related to H2.
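
For context, H2 in 1.8 is typically enabled by advertising "h2" via ALPN on
the ssl bind line, so disabling it amounts to dropping it from that list. A
sketch of what that change looks like (the bind line, certificate path and
port are illustrative assumptions, not the actual config):

  # before:
  #   bind :443 ssl crt /etc/haproxy/site.pem alpn h2,http/1.1
  # after (HTTP/2 disabled):
  bind :443 ssl crt /etc/haproxy/site.pem alpn http/1.1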



Best regards,


*Peter Lindegaard Hansen*

*Software Developer / Partner*

Phone: +45 96 500 300 | Direct: 69 14 97 04 | Email: p...@tigermedia.dk
Tiger Media A/S | Gl. Gugvej 17C | 9000 Aalborg | Web: www.tigermedia.dk

For support questions, please contact us at supp...@tigermedia.dk or by phone
at 96 500 300, and your inquiry will be answered by the first available
employee.

2017-12-19 11:36 GMT+01:00 Peter Lindegaard Hansen :

> Hi list,
>
> We upgraded from 1.5 to 1.8 recently, then to 1.8.1.
>
> Now we're seeing segfaults and slowdowns with haproxy.
>
> Repeating:
> Dec 19 11:14:26 haproxy02 kernel: [122635.295196] haproxy[29582]: segfault
> at 55d5152279b2 ip 7f9c2dcc5a28 sp 7fff07caf4b8 error 6 in
> libc-2.23.so[7f9c2dc26000+1c]
> Dec 19 11:14:26 haproxy02 systemd[1]: haproxy.service: Main process
> exited, code=exited, status=139/n/a
> Dec 19 11:14:26 haproxy02 systemd[1]: haproxy.service: Unit entered failed
> state.
> Dec 19 11:14:26 haproxy02 systemd[1]: haproxy.service: Failed with result
> 'exit-code'.
> Dec 19 11:14:26 haproxy02 systemd[1]: haproxy.service: Service hold-off
> time over, scheduling restart.
> Dec 19 11:14:26 haproxy02 systemd[1]: Stopped HAProxy Load Balancer.
> Dec 19 11:14:26 haproxy02 systemd[1]: Starting HAProxy Load Balancer...
> Dec 19 11:14:26 haproxy02 systemd[1]: Started HAProxy Load Balancer.
> Dec 19 11:14:27 haproxy02 kernel: [122636.578738] haproxy[31479]: segfault
> at 56409a8c1de2 ip 7fa5fa349a28 sp 7ffe66f4f688 error 6 in
> libc-2.23.so[7fa5fa2aa000+1c]
> Dec 19 11:14:27 haproxy02 systemd[1]: haproxy.service: Main process
> exited, code=exited, status=139/n/a
> Dec 19 11:14:27 haproxy02 systemd[1]: haproxy.service: Unit entered failed
> state.
> Dec 19 11:14:27 haproxy02 systemd[1]: haproxy.service: Failed with result
> 'exit-code'.
> Dec 19 11:14:27 haproxy02 systemd[1]: haproxy.service: Service hold-off
> time over, scheduling restart.
> Dec 19 11:14:27 haproxy02 systemd[1]: Stopped HAProxy Load Balancer.
> Dec 19 11:14:27 haproxy02 systemd[1]: Starting HAProxy Load Balancer...
> Dec 19 11:14:28 haproxy02 systemd[1]: Started HAProxy Load Balancer.
> Dec 19 11:14:28 haproxy02 kernel: [122637.569863] haproxy[31487]: segfault
> at 55cb4bd59857 ip 7f71e678aa28 sp 7fffb94427b8 error 6 in
> libc-2.23.so[7f71e66eb000+1c]
> Dec 19 11:14:28 haproxy02 systemd[1]: haproxy.service: Main process
> exited, code=exited, status=139/n/a
> Dec 19 11:14:28 haproxy02 systemd[1]: haproxy.service: Unit entered failed
> state.
> Dec 19 11:14:28 haproxy02 systemd[1]: haproxy.service: Failed with result
> 'exit-code'.
> Dec 19 11:14:28 haproxy02 systemd[1]: haproxy.service: Service hold-off
> time over, scheduling restart.
> Dec 19 11:14:28 haproxy02 systemd[1]: Stopped HAProxy Load Balancer.
> Dec 19 11:14:28 haproxy02 systemd[1]: Starting HAProxy Load Balancer...
> Dec 19 11:14:29 haproxy02 systemd[1]: Started HAProxy Load Balancer.
>
>
> At the same time, in haproxy.log:
>
> (lots of SSL handshake failures...) then:
> Dec 19 11:14:26 haproxy02 haproxy[29579]: [ALERT] 352/090058 (29579) :
> Current worker 29582 left with exit code 139
> Dec 19 11:14:26 haproxy02 haproxy[29579]: [ALERT] 352/090058 (29579) :
> exit-on-failure: killing every workers with SIGTERM
> Dec 19 11:14:26 haproxy02 haproxy[29579]: [WARNING] 352/090058 (29579) :
> All workers are left. Leaving... (139)
> Dec 19 11:14:27 haproxy02 haproxy[31476]: [ALERT] 352/111426 (31476) :
> Current worker 31479 left with exit code 139
> Dec 19 11:14:27 haproxy02 haproxy[31476]: [ALERT] 352/111426 (31476) :
> exit-on-failure: killing every workers with SIGTERM
> Dec 19 11:14:27 haproxy02 haproxy[31476]: [WARNING] 352/111426 (31476) :
> All workers are left. Leaving... (139)
> Dec 19 11:14:28 haproxy02 haproxy[31485]: [ALERT] 352/111428 (31485) :
> Current worker 31487 left with exit code 139
> Dec 19 11:14:28 haproxy02 haproxy[31485]: [ALERT] 352/111428 (31485) :
> exit-on-failure: killing every workers with SIGTERM
> Dec 19 11:14:28 haproxy02 haproxy[31485]: [WARNING] 352/111428 (31485) :
> All workers are left. Leaving... (139)
> Dec 19 11:14:29 haproxy02 haproxy[31493]: [ALERT] 352/111429 (31493) :
> Current worker 31496 left with exit code 139
> Dec 19 11:14:29 haproxy02 haproxy[31493]: [ALERT] 352/111429 (31493) :
> exit-on-failure: killing every workers with SIGTERM
> Dec 19 11:14:29 haproxy02 haproxy[31493]: [WARNING] 352/111429 (31493) :
> All workers are left. Leaving... (139)
> Dec 19 11:14:30 haproxy02 haproxy[31503]: [ALERT] 352/111429 (31503) :
> Current worker 31505 left with exit code 139
> Dec 19 11:14:30 haproxy02 haproxy[31503]: [ALERT] 352/111429 (31503) :
> exit-on-failure: killing every workers with SIGTERM
> Dec 19 11:14:30 haproxy02 haproxy[31503]: [WARNING] 

Re: Quick update on pending HTTP/2 issues

2017-12-20 Thread Willy Tarreau
Hi again guys,

so another quick update on the subject:
  - the currently known POST issues were resolved a few days ago; this
required a significant number of changes which make the code better
anyway, so it was not bad in the end;

  - the abortonclose case has been solved as well. The issue was that in
HTTP/1 there is no way for a browser to signal that it is aborting, so
the best it can do is close the connection, resulting in a read0 being
received; the option is there to tell haproxy to consider this read0
as an abort. In HTTP/2 we have RST_STREAM to abort explicitly, and the
read0 is sent at the end of every request to indicate that the request
is complete, which created confusion with the option above. The fix
consists in ignoring abortonclose on read0 for H2, and now it's OK
(the option itself is shown below for reference).
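
For reference, the option in question is set in a backend or defaults
section; a minimal illustration (the backend name and server are
assumptions):

  backend app
      option abortonclose   # treat a client close (read0 in H1) as an abort
      server app1 192.168.0.10:80 check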

I'd have liked to issue 1.8.2 with these fixes, but apparently a few other
issues are pending. Let's say that we'll release before the end of the week
with or without the fixes.

(PS: yes I'm still thinking about issuing 1.7.10 ASAP, but the 1.8 issues
 have been diverting everyone *a lot* and it's not finished).

Willy



Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1

2017-12-20 Thread Willy Tarreau
On Thu, Dec 21, 2017 at 12:04:11AM +0100, Cyril Bonté wrote:
> Hi Greg,
> 
> Le 20/12/2017 à 22:42, Greg Nolle a écrit :
> > Hi Andrew,
> > 
> > Thanks for the info but I'm afraid I'm not seeing anything here that
> > would affect the issue I'm seeing, and by the way the docs don't
> > indicate that the cookie names have to match the server names.
> 
> First, don't worry about the configuration, there is nothing wrong in it ;-)
> 
> > That being said, I tried using your settings and am still seeing the
> > issue (see below for new full config). And like I say, this is only an
> > issue with v1.8.1, it works as expected in v1.7.9.
> 
> I won't be able to look further tonight, but at least I could identify when
> the regression occurred: it's caused by the work done to prepare
> multi-threading, more specifically by this commit:
> http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=64cc49cf7
> 
> I'm adding Emeric to the thread; maybe he'll be able to provide a fix faster
> than me (I won't be very available for the next few days).

Thus I'll ping Emeric tomorrow as well so that we can issue 1.8.2 soon, in
case someone wants to play with it on Friday afternoon just before Xmas :-)

Willy



Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1

2017-12-20 Thread Cyril Bonté

Hi Greg,

Le 20/12/2017 à 22:42, Greg Nolle a écrit :

Hi Andrew,

Thanks for the info but I’m afraid I’m not seeing anything here that 
would affect the issue I’m seeing, and by the way the docs don’t 
indicate that the cookie names have to match the server names.


First, don't worry about the configuration, there is nothing wrong with it ;-)

That being said, I tried using your settings and am still seeing the 
issue (see below for new full config). And like I say, this is only an 
issue with v1.8.1, it works as expected in v1.7.9.


I won't be able to look further tonight, but at least I could identify 
when the regression occurred: it's caused by the work done to prepare 
multi-threading, more specifically by this commit: 
http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=64cc49cf7


I'm adding Emeric to the thread; maybe he'll be able to provide a fix 
faster than me (I won't be very available for the next few days).


--
Cyril Bonté



Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1

2017-12-20 Thread Greg Nolle
Hi Andrew,

Thanks for the info but I’m afraid I’m not seeing anything here that would
affect the issue I’m seeing, and by the way the docs don’t indicate that
the cookie names have to match the server names.

That being said, I tried using your settings and am still seeing the issue
(see below for the new full config). And as I say, this is only an issue
with v1.8.1; it works as expected in v1.7.9.

defaults
  mode http
  option redispatch
  retries 3
  timeout queue 20s
  timeout client 50s
  timeout connect 5s
  timeout server 50s

listen stats
  bind :1936
  stats enable
  stats uri /
  stats hide-version
  stats admin if TRUE

frontend main
  bind :9080
  default_backend main

backend main
  balance leastconn
  cookie SERVERID maxidle 30m maxlife 12h insert nocache indirect
  server server-1-google www.google.com:80 weight 100 cookie server-1-google check port 80 inter 4000 rise 2 fall 2 minconn 0 maxconn 0 on-marked-down shutdown-sessions
  server server-2-yahoo www.yahoo.com:80 weight 100 cookie server-2-yahoo check port 80 inter 4000 rise 2 fall 2 minconn 0 maxconn 0 on-marked-down shutdown-sessions
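
For reference, the MAINT step can also be reproduced on the runtime API,
assuming a stats socket such as "stats socket /var/run/haproxy.sock level
admin" is configured (the socket path is illustrative):

  # put the server into maintenance, then bring it back
  echo "set server main/server-1-google state maint" | socat stdio /var/run/haproxy.sock
  echo "set server main/server-1-google state ready" | socat stdio /var/run/haproxy.sock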



On Wed, Dec 20, 2017 at 8:57 PM, Andrew Smalley 
wrote:

> Also our cookie line looks as below
>
> cookie SERVERID maxidle 30m maxlife 12h insert nocache indirect
> Andruw Smalley
>
> Loadbalancer.org Ltd.
>
> www.loadbalancer.org
> +1 888 867 9504 / +44 (0)330 380 1064
> asmal...@loadbalancer.org
>
> Leave a Review | Deployment Guides | Blog
>
>
> On 20 December 2017 at 20:55, Andrew Smalley 
> wrote:
> > Greg
> >
> > its just been pointed out your cookies are wrong, they would usually
> > match your server name.
> > I would change this
> >
> >   server server-1-google www.google.com:80 check cookie google
> >   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
> >
> >
> > to this
> >
> >   server server-1-google www.google.com:80 check cookie server-1-google
> >   server server-2-yahoo www.yahoo.com:80 check cookie server-2-yahoo
> >
> >
> > We use something like this as a default server line
> >
> > server RIP_Name 172.16.1.1  weight 100  cookie RIP_Name  check port
> > 80 inter 4000  rise 2  fall 2  minconn 0  maxconn 0  on-marked-down
> > shutdown-sessions
> > Andruw Smalley
> >
> > Loadbalancer.org Ltd.
> >
> > www.loadbalancer.org
> > +1 888 867 9504 / +44 (0)330 380 1064
> > asmal...@loadbalancer.org
> >
> > Leave a Review | Deployment Guides | Blog
> >
> >
> > On 20 December 2017 at 20:52, Andrew Smalley 
> wrote:
> >> Hi Greg
> >>
> >> Apologies  I was confused with the terminology we use here,
> >>
> >> Indeed MAINT should be the same as our HALT feature,
> >>
> >> Maybe you can share your config and we can see what's wrong?
> >>
> >>
> >> Andruw Smalley
> >>
> >> Loadbalancer.org Ltd.
> >>
> >> www.loadbalancer.org
> >> +1 888 867 9504 / +44 (0)330 380 1064
> >> asmal...@loadbalancer.org
> >>
> >> Leave a Review | Deployment Guides | Blog
> >>
> >>
> >> On 20 December 2017 at 20:45, Greg Nolle 
> wrote:
> >>> Hi Andrew,
> >>>
> >>> I can’t find any reference to a “HALTED” status in the manual. I’m
> >>> *not* referring to “DRAIN” though (which I would expect to behave as
> >>> you describe), I’m referring to "MAINT", i.e. disabling the backend
> >>> server. Here’s the snippet from the management manual to clarify what
> >>> I’m referring to:
> >>>
>  “Setting the state to “maint” disables any traffic to the server as
> well as any health checks"
> >>>
> >>> Best regards,
> >>> Greg
> >>>
> >>> On Wed, Dec 20, 2017 at 8:29 PM, Andrew Smalley
> >>>  wrote:
>  Hi Greg
> 
>  You say traffic still goes to the real server when in MAINT mode,
>  Assuming you mean DRAIN Mode and not HALTED then this is expected.
> 
>  Existing connections still goto a server while DRAINING but no new
>  connections will get there.
> 
>  If the real server is HALTED then no traffic gets to it.
> 
> 
>  Andruw Smalley
> 
>  Loadbalancer.org Ltd.
> 
>  www.loadbalancer.org
>  +1 888 867 9504 / +44 (0)330 380 1064
>  asmal...@loadbalancer.org
> 
>  Leave a Review | Deployment Guides | Blog
> 
> 
>  On 20 December 2017 at 20:26, Greg Nolle 
> wrote:
> > When cookie persistence is used, it seems that the status of the
> > servers in the backend is ignored in v1.8.1. I try marking as MAINT a
> > backend server for which my browser has been given a cookie but
> > subsequent requests still go to that server (as verified in the
> > stats). The same issue happens when I use a stick table.
> >
> > I’ve included a simple example config where this happens at the
> > bottom. The exact same config in v1.7.9 gives the expected behaviour
> > that new requests are migrated to a different active backend server.
> >
> > 

Re: Haproxy SSL Termination performance issue

2017-12-20 Thread hongw...@163.com
Hi, Johan

Thanks a lot

Mike

Sent from my HuaWei Mate 9 phone

 Original message 
Subject: Re: Haproxy SSL Termination performance issue
From: Johan Hendriks
To: haproxy@formilux.org, hongw...@163.com
Cc:

On Tue, 19 Dec 2017 at 16:16, hongw...@163.com wrote:

> Hi, Thierry.
>
> Thanks again.
>
> One more question about what you were talking about: can I think of it this
> way: assume we have an 8-core CPU, we use 7 of them for SSL termination and
> one for HTTP forwarding? If so, is there any documentation for this solution?
>
> Thanks a lot
>
> Mike

I think the following blog can help you with that.
https://medium.com/cagataygurturk/using-haproxy-in-multi-core-environments-68ee2d3ae39e

Regards
Johan


Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1

2017-12-20 Thread Andrew Smalley
Also our cookie line looks as below

 cookie SERVERID maxidle 30m maxlife 12h insert nocache indirect
Andruw Smalley

Loadbalancer.org Ltd.

www.loadbalancer.org
+1 888 867 9504 / +44 (0)330 380 1064
asmal...@loadbalancer.org

Leave a Review | Deployment Guides | Blog


On 20 December 2017 at 20:55, Andrew Smalley  wrote:
> Greg
>
> its just been pointed out your cookies are wrong, they would usually
> match your server name.
> I would change this
>
>   server server-1-google www.google.com:80 check cookie google
>   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
>
>
> to this
>
>   server server-1-google www.google.com:80 check cookie server-1-google
>   server server-2-yahoo www.yahoo.com:80 check cookie server-2-yahoo
>
>
> We use something like this as a default server line
>
> server RIP_Name 172.16.1.1  weight 100  cookie RIP_Name  check port
> 80 inter 4000  rise 2  fall 2  minconn 0  maxconn 0  on-marked-down
> shutdown-sessions
> Andruw Smalley
>
> Loadbalancer.org Ltd.
>
> www.loadbalancer.org
> +1 888 867 9504 / +44 (0)330 380 1064
> asmal...@loadbalancer.org
>
> Leave a Review | Deployment Guides | Blog
>
>
> On 20 December 2017 at 20:52, Andrew Smalley  
> wrote:
>> Hi Greg
>>
>> Apologies  I was confused with the terminology we use here,
>>
>> Indeed MAINT should be the same as our HALT feature,
>>
>> Maybe you can share your config and we can see what's wrong?
>>
>>
>> Andruw Smalley
>>
>> Loadbalancer.org Ltd.
>>
>> www.loadbalancer.org
>> +1 888 867 9504 / +44 (0)330 380 1064
>> asmal...@loadbalancer.org
>>
>> Leave a Review | Deployment Guides | Blog
>>
>>
>> On 20 December 2017 at 20:45, Greg Nolle  wrote:
>>> Hi Andrew,
>>>
>>> I can’t find any reference to a “HALTED” status in the manual. I’m
>>> *not* referring to “DRAIN” though (which I would expect to behave as
>>> you describe), I’m referring to "MAINT", i.e. disabling the backend
>>> server. Here’s the snippet from the management manual to clarify what
>>> I’m referring to:
>>>
 “Setting the state to “maint” disables any traffic to the server as well 
 as any health checks"
>>>
>>> Best regards,
>>> Greg
>>>
>>> On Wed, Dec 20, 2017 at 8:29 PM, Andrew Smalley
>>>  wrote:
 Hi Greg

 You say traffic still goes to the real server when in MAINT mode,
 Assuming you mean DRAIN Mode and not HALTED then this is expected.

 Existing connections still goto a server while DRAINING but no new
 connections will get there.

 If the real server is HALTED then no traffic gets to it.


 Andruw Smalley

 Loadbalancer.org Ltd.

 www.loadbalancer.org
 +1 888 867 9504 / +44 (0)330 380 1064
 asmal...@loadbalancer.org

 Leave a Review | Deployment Guides | Blog


 On 20 December 2017 at 20:26, Greg Nolle  wrote:
> When cookie persistence is used, it seems that the status of the
> servers in the backend is ignored in v1.8.1. I try marking as MAINT a
> backend server for which my browser has been given a cookie but
> subsequent requests still go to that server (as verified in the
> stats). The same issue happens when I use a stick table.
>
> I’ve included a simple example config where this happens at the
> bottom. The exact same config in v1.7.9 gives the expected behaviour
> that new requests are migrated to a different active backend server.
>
> Any ideas?
>
> Many thanks,
> Greg
>
> defaults
>   mode http
>   option redispatch
>   retries 3
>   timeout queue 20s
>   timeout client 50s
>   timeout connect 5s
>   timeout server 50s
>
> listen stats
>   bind :1936
>   stats enable
>   stats uri /
>   stats hide-version
>   stats admin if TRUE
>
> frontend main
>   bind :9080
>   default_backend main
>
> backend main
>   balance leastconn
>   cookie SERVERID insert indirect nocache
>   server server-1-google www.google.com:80 check cookie google
>   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
>




Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1

2017-12-20 Thread Andrew Smalley
Greg

It's just been pointed out that your cookies are wrong; they would usually
match your server name.
I would change this:

  server server-1-google www.google.com:80 check cookie google
  server server-2-yahoo www.yahoo.com:80 check cookie yahoo


to this

  server server-1-google www.google.com:80 check cookie server-1-google
  server server-2-yahoo www.yahoo.com:80 check cookie server-2-yahoo


We use something like this as a default server line

  server RIP_Name 172.16.1.1  weight 100  cookie RIP_Name  check port
80 inter 4000  rise 2  fall 2  minconn 0  maxconn 0  on-marked-down
shutdown-sessions
Andruw Smalley

Loadbalancer.org Ltd.

www.loadbalancer.org
+1 888 867 9504 / +44 (0)330 380 1064
asmal...@loadbalancer.org

Leave a Review | Deployment Guides | Blog


On 20 December 2017 at 20:52, Andrew Smalley  wrote:
> Hi Greg
>
> Apologies  I was confused with the terminology we use here,
>
> Indeed MAINT should be the same as our HALT feature,
>
> Maybe you can share your config and we can see what's wrong?
>
>
> Andruw Smalley
>
> Loadbalancer.org Ltd.
>
> www.loadbalancer.org
> +1 888 867 9504 / +44 (0)330 380 1064
> asmal...@loadbalancer.org
>
> Leave a Review | Deployment Guides | Blog
>
>
> On 20 December 2017 at 20:45, Greg Nolle  wrote:
>> Hi Andrew,
>>
>> I can’t find any reference to a “HALTED” status in the manual. I’m
>> *not* referring to “DRAIN” though (which I would expect to behave as
>> you describe), I’m referring to "MAINT", i.e. disabling the backend
>> server. Here’s the snippet from the management manual to clarify what
>> I’m referring to:
>>
>>> “Setting the state to “maint” disables any traffic to the server as well as 
>>> any health checks"
>>
>> Best regards,
>> Greg
>>
>> On Wed, Dec 20, 2017 at 8:29 PM, Andrew Smalley
>>  wrote:
>>> Hi Greg
>>>
>>> You say traffic still goes to the real server when in MAINT mode,
>>> Assuming you mean DRAIN Mode and not HALTED then this is expected.
>>>
>>> Existing connections still goto a server while DRAINING but no new
>>> connections will get there.
>>>
>>> If the real server is HALTED then no traffic gets to it.
>>>
>>>
>>> Andruw Smalley
>>>
>>> Loadbalancer.org Ltd.
>>>
>>> www.loadbalancer.org
>>> +1 888 867 9504 / +44 (0)330 380 1064
>>> asmal...@loadbalancer.org
>>>
>>> Leave a Review | Deployment Guides | Blog
>>>
>>>
>>> On 20 December 2017 at 20:26, Greg Nolle  wrote:
 When cookie persistence is used, it seems that the status of the
 servers in the backend is ignored in v1.8.1. I try marking as MAINT a
 backend server for which my browser has been given a cookie but
 subsequent requests still go to that server (as verified in the
 stats). The same issue happens when I use a stick table.

 I’ve included a simple example config where this happens at the
 bottom. The exact same config in v1.7.9 gives the expected behaviour
 that new requests are migrated to a different active backend server.

 Any ideas?

 Many thanks,
 Greg

 defaults
   mode http
   option redispatch
   retries 3
   timeout queue 20s
   timeout client 50s
   timeout connect 5s
   timeout server 50s

 listen stats
   bind :1936
   stats enable
   stats uri /
   stats hide-version
   stats admin if TRUE

 frontend main
   bind :9080
   default_backend main

 backend main
   balance leastconn
   cookie SERVERID insert indirect nocache
   server server-1-google www.google.com:80 check cookie google
   server server-2-yahoo www.yahoo.com:80 check cookie yahoo

>>>



Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1

2017-12-20 Thread Andrew Smalley
Hi Greg

Apologies, I was confused by the terminology we use here.

Indeed, MAINT should be the same as our HALT feature.

Maybe you can share your config and we can see what's wrong?


Andruw Smalley

Loadbalancer.org Ltd.

www.loadbalancer.org
+1 888 867 9504 / +44 (0)330 380 1064
asmal...@loadbalancer.org

Leave a Review | Deployment Guides | Blog


On 20 December 2017 at 20:45, Greg Nolle  wrote:
> Hi Andrew,
>
> I can’t find any reference to a “HALTED” status in the manual. I’m
> *not* referring to “DRAIN” though (which I would expect to behave as
> you describe), I’m referring to "MAINT", i.e. disabling the backend
> server. Here’s the snippet from the management manual to clarify what
> I’m referring to:
>
>> “Setting the state to “maint” disables any traffic to the server as well as 
>> any health checks"
>
> Best regards,
> Greg
>
> On Wed, Dec 20, 2017 at 8:29 PM, Andrew Smalley
>  wrote:
>> Hi Greg
>>
>> You say traffic still goes to the real server when in MAINT mode,
>> Assuming you mean DRAIN Mode and not HALTED then this is expected.
>>
>> Existing connections still goto a server while DRAINING but no new
>> connections will get there.
>>
>> If the real server is HALTED then no traffic gets to it.
>>
>>
>> Andruw Smalley
>>
>> Loadbalancer.org Ltd.
>>
>> www.loadbalancer.org
>> +1 888 867 9504 / +44 (0)330 380 1064
>> asmal...@loadbalancer.org
>>
>> Leave a Review | Deployment Guides | Blog
>>
>>
>> On 20 December 2017 at 20:26, Greg Nolle  wrote:
>>> When cookie persistence is used, it seems that the status of the
>>> servers in the backend is ignored in v1.8.1. I try marking as MAINT a
>>> backend server for which my browser has been given a cookie but
>>> subsequent requests still go to that server (as verified in the
>>> stats). The same issue happens when I use a stick table.
>>>
>>> I’ve included a simple example config where this happens at the
>>> bottom. The exact same config in v1.7.9 gives the expected behaviour
>>> that new requests are migrated to a different active backend server.
>>>
>>> Any ideas?
>>>
>>> Many thanks,
>>> Greg
>>>
>>> defaults
>>>   mode http
>>>   option redispatch
>>>   retries 3
>>>   timeout queue 20s
>>>   timeout client 50s
>>>   timeout connect 5s
>>>   timeout server 50s
>>>
>>> listen stats
>>>   bind :1936
>>>   stats enable
>>>   stats uri /
>>>   stats hide-version
>>>   stats admin if TRUE
>>>
>>> frontend main
>>>   bind :9080
>>>   default_backend main
>>>
>>> backend main
>>>   balance leastconn
>>>   cookie SERVERID insert indirect nocache
>>>   server server-1-google www.google.com:80 check cookie google
>>>   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
>>>
>>



Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1

2017-12-20 Thread Greg Nolle
Hi Andrew,

I can’t find any reference to a “HALTED” status in the manual. I’m
*not* referring to “DRAIN” though (which I would expect to behave as
you describe), I’m referring to "MAINT", i.e. disabling the backend
server. Here’s the snippet from the management manual to clarify what
I’m referring to:

> “Setting the state to “maint” disables any traffic to the server as well as 
> any health checks"

Best regards,
Greg

On Wed, Dec 20, 2017 at 8:29 PM, Andrew Smalley
 wrote:
> Hi Greg
>
> You say traffic still goes to the real server when in MAINT mode,
> Assuming you mean DRAIN Mode and not HALTED then this is expected.
>
> Existing connections still goto a server while DRAINING but no new
> connections will get there.
>
> If the real server is HALTED then no traffic gets to it.
>
>
> Andruw Smalley
>
> Loadbalancer.org Ltd.
>
> www.loadbalancer.org
> +1 888 867 9504 / +44 (0)330 380 1064
> asmal...@loadbalancer.org
>
> Leave a Review | Deployment Guides | Blog
>
>
> On 20 December 2017 at 20:26, Greg Nolle  wrote:
>> When cookie persistence is used, it seems that the status of the
>> servers in the backend is ignored in v1.8.1. I try marking as MAINT a
>> backend server for which my browser has been given a cookie but
>> subsequent requests still go to that server (as verified in the
>> stats). The same issue happens when I use a stick table.
>>
>> I’ve included a simple example config where this happens at the
>> bottom. The exact same config in v1.7.9 gives the expected behaviour
>> that new requests are migrated to a different active backend server.
>>
>> Any ideas?
>>
>> Many thanks,
>> Greg
>>
>> defaults
>>   mode http
>>   option redispatch
>>   retries 3
>>   timeout queue 20s
>>   timeout client 50s
>>   timeout connect 5s
>>   timeout server 50s
>>
>> listen stats
>>   bind :1936
>>   stats enable
>>   stats uri /
>>   stats hide-version
>>   stats admin if TRUE
>>
>> frontend main
>>   bind :9080
>>   default_backend main
>>
>> backend main
>>   balance leastconn
>>   cookie SERVERID insert indirect nocache
>>   server server-1-google www.google.com:80 check cookie google
>>   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
>>
>



Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1

2017-12-20 Thread Andrew Smalley
Hi Greg

You say traffic still goes to the real server when in MAINT mode.
Assuming you mean DRAIN mode and not HALTED, then this is expected.

Existing connections still go to a server while DRAINING, but no new
connections will get there.

If the real server is HALTED then no traffic gets to it.


Andruw Smalley

Loadbalancer.org Ltd.

www.loadbalancer.org
+1 888 867 9504 / +44 (0)330 380 1064
asmal...@loadbalancer.org

Leave a Review | Deployment Guides | Blog


On 20 December 2017 at 20:26, Greg Nolle  wrote:
> When cookie persistence is used, it seems that the status of the
> servers in the backend is ignored in v1.8.1. I try marking as MAINT a
> backend server for which my browser has been given a cookie but
> subsequent requests still go to that server (as verified in the
> stats). The same issue happens when I use a stick table.
>
> I’ve included a simple example config where this happens at the
> bottom. The exact same config in v1.7.9 gives the expected behaviour
> that new requests are migrated to a different active backend server.
>
> Any ideas?
>
> Many thanks,
> Greg
>
> defaults
>   mode http
>   option redispatch
>   retries 3
>   timeout queue 20s
>   timeout client 50s
>   timeout connect 5s
>   timeout server 50s
>
> listen stats
>   bind :1936
>   stats enable
>   stats uri /
>   stats hide-version
>   stats admin if TRUE
>
> frontend main
>   bind :9080
>   default_backend main
>
> backend main
>   balance leastconn
>   cookie SERVERID insert indirect nocache
>   server server-1-google www.google.com:80 check cookie google
>   server server-2-yahoo www.yahoo.com:80 check cookie yahoo
>



Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1

2017-12-20 Thread Greg Nolle
When cookie persistence is used, it seems that the status of the
servers in the backend is ignored in v1.8.1. I tried marking as MAINT a
backend server for which my browser has been given a cookie, but
subsequent requests still go to that server (as verified in the
stats). The same issue happens when I use a stick table.

I’ve included a simple example config where this happens at the
bottom. The exact same config in v1.7.9 gives the expected behaviour
that new requests are migrated to a different active backend server.

Any ideas?

Many thanks,
Greg

defaults
  mode http
  option redispatch
  retries 3
  timeout queue 20s
  timeout client 50s
  timeout connect 5s
  timeout server 50s

listen stats
  bind :1936
  stats enable
  stats uri /
  stats hide-version
  stats admin if TRUE

frontend main
  bind :9080
  default_backend main

backend main
  balance leastconn
  cookie SERVERID insert indirect nocache
  server server-1-google www.google.com:80 check cookie google
  server server-2-yahoo www.yahoo.com:80 check cookie yahoo
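
For comparison, the stick-table variant mentioned above would look something
like this (a sketch only; the table size and key are assumptions):

backend main
  balance leastconn
  stick-table type ip size 100k expire 30m
  stick on src
  server server-1-google www.google.com:80 check
  server server-2-yahoo www.yahoo.com:80 check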



Re: Haproxy SSL Termination performance issue

2017-12-20 Thread Johan Hendriks
On Tue, 19 Dec 2017 at 16:16, hongw...@163.com wrote:

> Hi, Thierry.
>
> Thanks again.
>
> One more question about what you were talking about: can I think of it this
> way: assume we have an 8-core CPU, we use 7 of them for SSL termination and
> one for HTTP forwarding? If so, is there any documentation for this solution?
>
> Thanks a lot
>
> Mike


I think the following blog can help you with that.

https://medium.com/cagataygurturk/using-haproxy-in-multi-core-environments-68ee2d3ae39e

Regards
Johan
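
For reference, a minimal sketch of the split Mike describes, in 1.7/1.8
multi-process style (process counts, certificate path, socket path and
addresses are illustrative assumptions):

global
    nbproc 8
    cpu-map 1 0        # process 1 on core 0: plain HTTP forwarding
    cpu-map 2-8 1-7    # processes 2-8 on cores 1-7: TLS termination

frontend fe_tls
    bind-process 2-8
    bind :443 ssl crt /etc/haproxy/site.pem
    default_backend be_to_http_process

backend be_to_http_process
    # hand the decrypted traffic to the HTTP process over a local socket
    server clear unix@/var/run/haproxy-clear.sock send-proxy-v2

frontend fe_http
    bind-process 1
    bind unix@/var/run/haproxy-clear.sock accept-proxy
    default_backend be_app

backend be_app
    server app1 192.168.0.10:80 check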





Re: Issue after upgrade from 1.7 to 1.8 related with active sessions

2017-12-20 Thread Willy Tarreau
Hello Ricardo,

On Wed, Dec 20, 2017 at 05:00:33PM +0100, Ricardo Fraile wrote:
> Hello,
> 
> After upgrade from 1.7.4 to 1.8.1, basically with the end of mail conf
> snippet, the sessions started to grow, as example:
> 
> 1.7.4:
> Active sessions: ~161
> Active sessions rate: ~425
> 
> 1.8.1:
> Active sessions: ~6700
> Active sessions rate: ~350

Ah that's not good :-(

> Looking into the linux (3.16.7) server, there are a high number of
> CLOSE_WAIT connections from the bind address of the listen service to
> the backend nodes.

Strange. I can't see what type of traffic could cause this except a loop;
that sounds a bit unusual.

> System logs reported "TCP: too many orphaned sockets", but after
> increase net.ipv4.tcp_max_orphans value, the message stops but nothing
> changes.

Normally orphans correspond to closed sockets for which there is still
data in the system's buffers, so this should be unrelated to the CLOSE_WAIT,
unless there's a loop somewhere where a backend reconnects to the frontend,
which could explain both situations at once when the timeout strikes.

> Haproxy logs reported for that listen the indicator "sD", but only with
> 1.8.

Thus a server timeout during the end of the transfer. That doesn't make
much sense either.

> Any ideas to dig into the issue?

It would be very useful if you shared your configuration (please remove
any sensitive info like stats passwords or IP addresses you prefer to keep
private). When running 1.8, it would also be useful to issue the following
commands on the CLI and capture the output to a file (as sketched below) :
  - "show sess all"
  - "show fd"

Warning, the first one will reveal a lot of info (internal addresses etc)
so you may want to send it privately and not to the list if this is the
case (though it takes longer to diagnose it :-)).

If you think you can reproduce this on a test machine out of production,
that would be extremely useful.

We have not noticed a single such issue on haproxy.org, which has delivered
about 100 GB and 2 million requests over the last 2 weeks with this exact
same version, so that makes me think that either the config or the type of
traffic matters a lot in triggering the problem you are observing.

Regards,
Willy



Issue after upgrade from 1.7 to 1.8 related with active sessions

2017-12-20 Thread Ricardo Fraile
Hello,


After upgrading from 1.7.4 to 1.8.1, basically with the conf snippet at the
end of this mail, the sessions started to grow, for example:

1.7.4:
Active sessions: ~161
Active sessions rate: ~425

1.8.1:
Active sessions: ~6700
Active sessions rate: ~350

Looking at the Linux (3.16.7) server, there is a high number of
CLOSE_WAIT connections from the bind address of the listen service to
the backend nodes.

System logs reported "TCP: too many orphaned sockets"; after increasing the
net.ipv4.tcp_max_orphans value the message stops, but nothing else changes.

The haproxy logs report the termination state "sD" for that listen, but only
with 1.8.


Any ideas to dig into the issue?



Thanks,





defaults
mode tcp
retries 3
option redispatch

maxconn 10
fullconn 10

timeout connect  5s
timeout server   50s
timeout client   50s

listen proxy-tcp
bind 192.168.1.1:80
balance roundrobin

server node1 192.168.1.10:80
server node2 192.168.1.11:80
server node3 192.168.1.12:80




Re: haproxy and solarflare onload

2017-12-20 Thread Elias Abacioglu
>
> Apparently I'm not graphing conn_rate (i need to add it, but I have no
> values now), cause we're also sending all SSL traffic to other nodes using
> TCP load balancing.
>

Update: I'm at around a 7.7k connection rate.


Re: haproxy and solarflare onload

2017-12-20 Thread Elias Abacioglu
On Wed, Dec 20, 2017 at 2:10 PM, Willy Tarreau  wrote:

> On Wed, Dec 20, 2017 at 11:48:27AM +0100, Elias Abacioglu wrote:
> > Yes, I have one node running with Solarflare SFN8522 2p 10Gbit/s
> currently
> > without Onload enabled.
> > it has 17.5K http_request_rate and ~26% server interrupts on core 0 and 1
> > where the NIC IRQ is bound to.
> >
> > And I have a similar node with Intel X710 2p 10Gbit/s.
> > It has 26.1K http_request_rate and ~26% server interrupts on core 0 and 1
> > where the NIC IRQ is bound to.
> >
> > both nodes have 1 socket, Intel Xeon CPU E3-1280 v6, 32 GB RAM.
>
> In both cases this is very low performance. We're getting 245k req/s and
> 90k
> connections/s oon a somewhat comparable Core i7-4790K on small objects and
> are easily saturating 2 10G NICs with medium sized objects. The problem I'm
> seeing is that if your cable is not saturated, you're supposed to be
> running
> at a higher request rate, and if it's saturated you should not observe the
> slightest difference between the two tests. In fact what I'm suspecting is
> that you're running with ~45kB objects and that your intel NIC managed to
> reach the line rate, and that in the same test the SFN8522 cannot even
> reach
> it. Am I wrong ? If so, from what I remember from the 40G tests 2 years
> ago,
> you should be able to get close to 35-40G with such object sizes.
>

I forgot to mention that this was not a benchmark test. I tested with live
traffic (/me hides in shame).
That's the reason we aren't saturated; it's not that we've hit the limit.
And the reason one node gets more traffic has to do with the different VIPs
assigned; I'm not really sure why, because we split the VIPs evenly, but I
suspect one of the VIPs gets more traffic.
And I can tell you that we can't reach 245k req/s. At this very moment, if I
look at the Intel node, we've got ~70% CPU idle on cores 0+1, where the NIC
IRQ is set, and ~58% on cores 2+3, where haproxy is running. This node is
currently at around 27k req/s. With this math we would hit 100% CPU on
cores 2+3 at around 47k req/s.
So we max out the CPU before 50k req/s; we're not even close to 245k req/s.
I guess I need to learn more tuning.
That was my goal/vision with Solarflare: offload the CPU more so I can give
more cores to haproxy.

Is there a metric that shows avg object size?

Apparently I'm not graphing conn_rate (I need to add it, but I have no
values now), because we're also sending all SSL traffic to other nodes using
TCP load balancing.



> Oh just one thing : verify that you're not running with jumbo frames on the
> solarflare case. Jumbo frames used to help *a lot* 10 years ago when they
> were saving interrupt processing time. Nowadays they instead hurt a lot
> because allocating 9kB of contiguous memory at once for a packet is much
> more difficult than allocating only 1.5kB. Honnestly I don't remember
> having
> seen a single case over the last 5+ years where running with jumbo frames
> would permit to reach the same performance as no jumbo. GSO+GRO have helped
> a lot there as well!
>

Jumbo frames are not enabled; these nodes are connected directly to the
Internet :)
GSO+GRO are enabled on both the Intel and the Solarflare cards.


Re: haproxy and solarflare onload

2017-12-20 Thread Elias Abacioglu
On Wed, Dec 20, 2017 at 3:27 PM, Christian Ruppert  wrote:

> Oh, btw, I'm just reading that onload documentation.
>
> 
> Filters
> Filters are used to deliver packets received from the wire to the
> appropriate
> application. When filters are exhausted it is not possible to create new
> accelerated
> sockets. The general recommendation is that applications do not allocate
> more than
> 4096 filters ‐ or applications should not create more than 4096 outgoing
> connections.
> The limit does not apply to inbound connections to a listening socket.
> 


This feels severely limiting.
Support was talking about scalable filters, but that doesn't seem
applicable when using virtual IPs.


Re: haproxy and solarflare onload

2017-12-20 Thread Elias Abacioglu
On Wed, Dec 20, 2017 at 1:11 PM, Christian Ruppert  wrote:

> Hi Elias,
>
> I'm currently preparing a test setup including a SFN8522 + onload.
> How did you measure it? When did those errors (drops/discard?) appear,
> during a test or some real traffic?
> The first thing I did is updating the driver + firmware. Is both up2date
> in your case?
>
> I haven't measured / compared the SFN8522 against a X520 nor X710 yet but
> do you have RSS / affinity or something related, enabled/set? Intel has
> some features and Solarflare may have its own stuff.


Those errors appear during real traffic.
My workflow:
* kill keepalived; the public VIPs get removed.
* restart HAProxy with onload (no errors here).
* start keepalived; the public VIPs get added.
* traffic starts to flow in and errors start appearing almost immediately.

And the driver and firmware are the latest.
Currently I have not set RSS, which means it creates one queue per core,
in my case 4 per port. I assign the IRQ affinity to cores 0+1; haproxy gets
cores 2+3.
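
For reference, that kind of pinning can be sketched like this (interface
name, IRQ numbers and config path are assumptions):

  # pin the NIC queue IRQs to cores 0 and 1
  echo 1 > /proc/irq/120/smp_affinity   # queue 0 -> core 0 (bitmask 0x1)
  echo 2 > /proc/irq/121/smp_affinity   # queue 1 -> core 1 (bitmask 0x2)
  echo 1 > /proc/irq/122/smp_affinity   # queue 2 -> core 0
  echo 2 > /proc/irq/123/smp_affinity   # queue 3 -> core 1

  # keep haproxy on cores 2 and 3 ("cpu-map" in the global section is the
  # haproxy-native equivalent)
  taskset -c 2,3 haproxy -f /etc/haproxy/haproxy.cfg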


Re: haproxy and solarflare onload

2017-12-20 Thread Christian Ruppert

Oh, btw, I'm just reading that onload documentation.


Filters
Filters are used to deliver packets received from the wire to the 
appropriate
application. When filters are exhausted it is not possible to create new 
accelerated
sockets. The general recommendation is that applications do not allocate 
more than

4096 filters ‐ or applications should not create more than 4096 outgoing
connections.
The limit does not apply to inbound connections to a listening socket.


On 2017-12-20 13:11, Christian Ruppert wrote:

Hi Elias,

I'm currently preparing a test setup including a SFN8522 + onload.
How did you measure it? When did those errors (drops/discard?) appear,
during a test or some real traffic?
The first thing I did is updating the driver + firmware. Is both
up2date in your case?

I haven't measured / compared the SFN8522 against a X520 nor X710 yet
but do you have RSS / affinity or something related, enabled/set?
Intel has some features and Solarflare may have its own stuff.

On 2017-12-20 11:48, Elias Abacioglu wrote:

Hi,

Yes on the LD_PRELOAD.

Yes, I have one node running with Solarflare SFN8522 2p 10Gbit/s
currently without Onload enabled.
it has 17.5K http_request_rate and ~26% server interrupts on core 0
and 1 where the NIC IRQ is bound to.

And I have a similar node with Intel X710 2p 10Gbit/s.
It has 26.1K http_request_rate and ~26% server interrupts on core 0
and 1 where the NIC IRQ is bound to.

both nodes have 1 socket, Intel Xeon CPU E3-1280 v6, 32 GB RAM.

So without Onload Solarflare performs worse than the X710 since it has
the same amount of SI load with less traffic. And a side note is that
I haven't compared the ethtool settings between Intel and Solarflare,
just running with the defaults of both cards.

I currently have a support ticket open with the Solarflare team to
about the issues I mentioned in my previous mail, if they sort that
out I can perhaps setup a test server if I can manage to free up one
server.
Then we can do some synthetic benchmarks with a set of parameters of
your choosing.

Regards,

/Elias

On Wed, Dec 20, 2017 at 9:48 AM, Willy Tarreau  wrote:


Hi Elias,

On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote:

Hi,

I recently bought a solarflare NIC with (ScaleOut) Onload /

OpenOnload to

test it with HAproxy.

Have anyone tried running haproxy with solarflare onload

functions?


After I started haproxy with onload, this started spamming on the

kernel

log:
Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload]
oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147

[1]

10.3.20.116:80 [2] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload]
oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321

[3]

10.3.20.113:80 [4] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload]
oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403

[5]

10.3.20.30:445 [6] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload]
oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403

[5]

10.3.20.30:445 [6] failed (-16)

And this in haproxy log:
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached

system

memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached

system

memory limit at 9184 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system

memory

limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system

memory

limit at 9931 sockets. Please check system tunables.


Apparently I've hit the max hardware filter limit on the  card.
Does anyone here have experience in running haproxy with onload

features?

I've never got any report of any such test, though in the past I
thought
it would be nice to run such a test, at least to validate the
perimeter
covered by the library (you're using it as LD_PRELOAD, that's it ?).


Mind sharing insights and advice on how to get a functional setup?


I really don't know what can reasonably be expected from code trying
to
partially bypass a part of the TCP stack to be honnest. From what
I've
read a long time ago, onload might be doing its work in a not very
intrusive way but judging by your messages above I'm having some
doubts
now.

Have you tried without this software, using the card normally ? I
mean,
2 years ago I had the opportunity to test haproxy on a dual-40G
setup
and we reached 60 Gbps of forwarded traffic with all machines in the
test bench reaching their limits (and haproxy reaching 100% as
well),
so for me that proves that the TCP stack still scales extremely well
and that while such acceleration software might make sense for a
next
generation NIC running on old hardware (eg: when 400 Gbps NICs start
to appear), I'm really not convinced that it makes any sense to use
them on well supported setups like 2-4 10Gbps links which are very

Re: haproxy and solarflare onload

2017-12-20 Thread Willy Tarreau
On Wed, Dec 20, 2017 at 11:48:27AM +0100, Elias Abacioglu wrote:
> Yes, I have one node running with Solarflare SFN8522 2p 10Gbit/s currently
> without Onload enabled.
> it has 17.5K http_request_rate and ~26% server interrupts on core 0 and 1
> where the NIC IRQ is bound to.
> 
> And I have a similar node with Intel X710 2p 10Gbit/s.
> It has 26.1K http_request_rate and ~26% server interrupts on core 0 and 1
> where the NIC IRQ is bound to.
> 
> both nodes have 1 socket, Intel Xeon CPU E3-1280 v6, 32 GB RAM.

In both cases this is very low performance. We're getting 245k req/s and 90k
connections/s on a somewhat comparable Core i7-4790K on small objects and
are easily saturating 2 10G NICs with medium sized objects. The problem I'm
seeing is that if your cable is not saturated, you're supposed to be running
at a higher request rate, and if it's saturated you should not observe the
slightest difference between the two tests. In fact what I'm suspecting is
that you're running with ~45kB objects and that your Intel NIC managed to
reach the line rate, and that in the same test the SFN8522 cannot even reach
it. Am I wrong ? If so, from what I remember from the 40G tests 2 years ago,
you should be able to get close to 35-40G with such object sizes.

Oh, just one thing: verify that you're not running with jumbo frames in the
solarflare case. Jumbo frames used to help *a lot* 10 years ago when they
were saving interrupt processing time. Nowadays they instead hurt a lot,
because allocating 9kB of contiguous memory at once for a packet is much
more difficult than allocating only 1.5kB. Honestly I don't remember having
seen a single case over the last 5+ years where running with jumbo frames
would permit reaching the same performance as no jumbo. GSO+GRO have helped
a lot there as well!
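
A quick way to check both points (the interface name is an assumption):

  # MTU check: jumbo frames would show e.g. "mtu 9000"
  ip link show eth0 | grep mtu

  # confirm GSO/GRO are enabled
  ethtool -k eth0 | grep -E 'generic-segmentation-offload|generic-receive-offload'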

Cheers,
Willy



Re: haproxy and solarflare onload

2017-12-20 Thread Christian Ruppert

Hi Elias,

I'm currently preparing a test setup including a SFN8522 + onload.
How did you measure it? When did those errors (drops/discard?) appear, 
during a test or some real traffic?
The first thing I did was update the driver + firmware. Are both up to date
in your case?


I haven't measured / compared the SFN8522 against an X520 or X710 yet,
but do you have RSS / affinity or anything related enabled/set? Intel
has some features and Solarflare may have its own stuff.


On 2017-12-20 11:48, Elias Abacioglu wrote:

Hi,

Yes on the LD_PRELOAD.

Yes, I have one node running with Solarflare SFN8522 2p 10Gbit/s
currently without Onload enabled.
it has 17.5K http_request_rate and ~26% server interrupts on core 0
and 1 where the NIC IRQ is bound to.

And I have a similar node with Intel X710 2p 10Gbit/s.
It has 26.1K http_request_rate and ~26% server interrupts on core 0
and 1 where the NIC IRQ is bound to.

both nodes have 1 socket, Intel Xeon CPU E3-1280 v6, 32 GB RAM.

So without Onload Solarflare performs worse than the X710 since it has
the same amount of SI load with less traffic. And a side note is that
I haven't compared the ethtool settings between Intel and Solarflare,
just running with the defaults of both cards.

I currently have a support ticket open with the Solarflare team to
about the issues I mentioned in my previous mail, if they sort that
out I can perhaps setup a test server if I can manage to free up one
server.
Then we can do some synthetic benchmarks with a set of parameters of
your choosing.

Regards,

/Elias

On Wed, Dec 20, 2017 at 9:48 AM, Willy Tarreau  wrote:


Hi Elias,

On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote:

Hi,

I recently bought a solarflare NIC with (ScaleOut) Onload /

OpenOnload to

test it with HAproxy.

Have anyone tried running haproxy with solarflare onload

functions?


After I started haproxy with onload, this started spamming on the

kernel

log:
Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload]
oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147

[1]

10.3.20.116:80 [2] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload]
oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321

[3]

10.3.20.113:80 [4] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload]
oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403

[5]

10.3.20.30:445 [6] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload]
oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403

[5]

10.3.20.30:445 [6] failed (-16)

And this in haproxy log:
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached

system

memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached

system

memory limit at 9184 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system

memory

limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system

memory

limit at 9931 sockets. Please check system tunables.


Apparently I've hit the max hardware filter limit on the  card.
Does anyone here have experience in running haproxy with onload

features?

I've never got any report of any such test, though in the past I
thought
it would be nice to run such a test, at least to validate the
perimeter
covered by the library (you're using it as LD_PRELOAD, that's it ?).


Mind sharing insights and advice on how to get a functional setup?


I really don't know what can reasonably be expected from code trying
to
partially bypass a part of the TCP stack to be honnest. From what
I've
read a long time ago, onload might be doing its work in a not very
intrusive way but judging by your messages above I'm having some
doubts
now.

Have you tried without this software, using the card normally ? I
mean,
2 years ago I had the opportunity to test haproxy on a dual-40G
setup
and we reached 60 Gbps of forwarded traffic with all machines in the
test bench reaching their limits (and haproxy reaching 100% as
well),
so for me that proves that the TCP stack still scales extremely well
and that while such acceleration software might make sense for a
next
generation NIC running on old hardware (eg: when 400 Gbps NICs start
to appear), I'm really not convinced that it makes any sense to use
them on well supported setups like 2-4 10Gbps links which are very
common nowadays. I mean, I managed to run haproxy at 10Gbps 10 years
ago on a core2-duo! Hardware has evolved quite a bit since :-)

Regards,
Willy






--
Regards,
Christian Ruppert



Re: haproxy and solarflare onload

2017-12-20 Thread Elias Abacioglu
Hi,

Yes on the LD_PRELOAD.

Yes, I have one node running with Solarflare SFN8522 2p 10Gbit/s currently
without Onload enabled.
it has 17.5K http_request_rate and ~26% server interrupts on core 0 and 1
where the NIC IRQ is bound to.

And I have a similar node with Intel X710 2p 10Gbit/s.
It has 26.1K http_request_rate and ~26% server interrupts on core 0 and 1
where the NIC IRQ is bound to.

both nodes have 1 socket, Intel Xeon CPU E3-1280 v6, 32 GB RAM.

So without Onload, the Solarflare performs worse than the X710, since it has
the same amount of SI load with less traffic. As a side note, I haven't
compared the ethtool settings between Intel and Solarflare; I'm just running
with the defaults on both cards.
I currently have a support ticket open with the Solarflare team about the
issues I mentioned in my previous mail. If they sort that out, I can perhaps
set up a test server if I manage to free one up.
Then we can do some synthetic benchmarks with a set of parameters of your
choosing.

Regards,
/Elias



On Wed, Dec 20, 2017 at 9:48 AM, Willy Tarreau  wrote:

> Hi Elias,
>
> On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote:
> > Hi,
> >
> > I recently bought a solarflare NIC with (ScaleOut) Onload / OpenOnload to
> > test it with HAproxy.
> >
> > Have anyone tried running haproxy with solarflare onload functions?
> >
> > After I started haproxy with onload, this started spamming on the kernel
> > log:
> > Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload]
> > oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147
> > 10.3.20.116:80 failed (-16)
> > Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload]
> > oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321
> > 10.3.20.113:80 failed (-16)
> > Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload]
> > oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403
> > 10.3.20.30:445 failed (-16)
> > Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload]
> > oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403
> > 10.3.20.30:445 failed (-16)
> >
> > And this in haproxy log:
> > Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached system
> > memory limit at 9931 sockets. Please check system tunables.
> > Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached system
> > memory limit at 9184 sockets. Please check system tunables.
> > Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory
> > limit at 9931 sockets. Please check system tunables.
> > Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory
> > limit at 9931 sockets. Please check system tunables.
> >
> >
> > Apparently I've hit the max hardware filter limit on the  card.
> > Does anyone here have experience in running haproxy with onload features?
>
> I've never got any report of any such test, though in the past I thought
> it would be nice to run such a test, at least to validate the perimeter
> covered by the library (you're using it as LD_PRELOAD, that's it ?).
>
> > Mind sharing insights and advice on how to get a functional setup?
>
> I really don't know what can reasonably be expected from code trying to
> partially bypass a part of the TCP stack to be honnest. From what I've
> read a long time ago, onload might be doing its work in a not very
> intrusive way but judging by your messages above I'm having some doubts
> now.
>
> Have you tried without this software, using the card normally ? I mean,
> 2 years ago I had the opportunity to test haproxy on a dual-40G setup
> and we reached 60 Gbps of forwarded traffic with all machines in the
> test bench reaching their limits (and haproxy reaching 100% as well),
> so for me that proves that the TCP stack still scales extremely well
> and that while such acceleration software might make sense for a next
> generation NIC running on old hardware (eg: when 400 Gbps NICs start
> to appear), I'm really not convinced that it makes any sense to use
> them on well supported setups like 2-4 10Gbps links which are very
> common nowadays. I mean, I managed to run haproxy at 10Gbps 10 years
> ago on a core2-duo! Hardware has evolved quite a bit since :-)
>
> Regards,
> Willy
>


Re: haproxy and solarflare onload

2017-12-20 Thread Willy Tarreau
Hi Elias,

On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote:
> Hi,
> 
> I recently bought a solarflare NIC with (ScaleOut) Onload / OpenOnload to
> test it with HAproxy.
> 
> Have anyone tried running haproxy with solarflare onload functions?
> 
> After I started haproxy with onload, this started spamming on the kernel
> log:
> Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload]
> oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147
> 10.3.20.116:80 failed (-16)
> Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload]
> oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321
> 10.3.20.113:80 failed (-16)
> Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload]
> oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403
> 10.3.20.30:445 failed (-16)
> Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload]
> oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403
> 10.3.20.30:445 failed (-16)
> 
> And this in haproxy log:
> Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached system
> memory limit at 9931 sockets. Please check system tunables.
> Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached system
> memory limit at 9184 sockets. Please check system tunables.
> Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory
> limit at 9931 sockets. Please check system tunables.
> Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory
> limit at 9931 sockets. Please check system tunables.
> 
> 
> Apparently I've hit the max hardware filter limit on the  card.
> Does anyone here have experience in running haproxy with onload features?

I've never got any report of any such test, though in the past I thought
it would be nice to run such a test, at least to validate the perimeter
covered by the library (you're using it as LD_PRELOAD, that's it ?).

> Mind sharing insights and advice on how to get a functional setup?

To be honest, I really don't know what can reasonably be expected from code
trying to partially bypass a part of the TCP stack. From what I've read a
long time ago, onload might be doing its work in a not very intrusive way,
but judging by your messages above I'm having some doubts now.

Have you tried without this software, using the card normally ? I mean,
2 years ago I had the opportunity to test haproxy on a dual-40G setup
and we reached 60 Gbps of forwarded traffic with all machines in the
test bench reaching their limits (and haproxy reaching 100% as well),
so for me that proves that the TCP stack still scales extremely well
and that while such acceleration software might make sense for a next
generation NIC running on old hardware (eg: when 400 Gbps NICs start
to appear), I'm really not convinced that it makes any sense to use
them on well supported setups like 2-4 10Gbps links which are very
common nowadays. I mean, I managed to run haproxy at 10Gbps 10 years
ago on a core2-duo! Hardware has evolved quite a bit since :-)

Regards,
Willy



Re: [PATCH 1/2] Fix compiler warning in iprange.c

2017-12-20 Thread Willy Tarreau
On Fri, Dec 15, 2017 at 10:21:29AM -0600, Ryan O'Hara wrote:
> The declaration of main() in iprange.c did not specify a type, causing
> a compiler warning [-Wimplicit-int]. This patch simply declares main()
> to be type 'int' and calls exit(0) at the end of the function.

Both patches applied, thank you Ryan.

Willy



Re: [PATCH] BUG: NetScaler CIP handling is incorrect

2017-12-20 Thread Andreas Mahnke
Great,

thank you guys!

Best regards,
Andreas

On Wed, Dec 20, 2017 at 7:06 AM, Willy Tarreau  wrote:

> On Tue, Dec 19, 2017 at 11:10:58PM +, Bertrand Jacquin wrote:
> > Hi Andreas and Willy,
> >
> > Please find attached a patch series adding support for both the legacy
> > and standard CIP protocols while keeping compatibility with the current
> > configuration format.
>
> Excellent, now applied to 1.9, will backport it to 1.8 later.
>
> Thanks a lot guys, I've seen how many round trips it required to
> validate these changes on your respective infrastructures, it was
> a very productive cooperation!
>
> Cheers,
> Willy
>