Re: SSL, peered sticky tables + nbproc > 1?

2014-05-02 Thread James Hogarth
On 2 May 2014 19:02, Willy Tarreau w...@1wt.eu wrote:

 On Fri, May 02, 2014 at 10:59:00AM -0700, Bryan Talbot wrote:
  It sounds like Jeff ran out of CPU for SSL terminations and that could
  be addressed as described by Willy here

  https://www.mail-archive.com/haproxy@formilux.org/msg13104.html

  and allow him to stay with a single-process stick table for the actual
  load balancing.

 Yes that's perfectly possible. And when we have proxy proto v2 with SSL
 info, it'll be even better :-)

 Willy



We've done quite a bit of work on this internally recently to provide SSL
multiprocess with sane load balancing.

There are a couple of small edge cases left, and then we intend to push it
up for comments...

I've literally just got home but I'll follow up in the office next week to
see how close we are.

James


Re: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-02-06 Thread James Hogarth
Hi,

Just providing some closure to this interesting edge case...


 Thank you for this detailed report, this is *very* useful. As you tracked
 the crash to happen inside openssl, I think you should file a report to
 centos/redhat because it's a security issue.


CentOS is bug-for-bug compatible (see recent libsvr2 issue there) and
seeing as we don't at this time have a support contract with RH, plus it
is not reproducible on the current release (the glibc in 6.5 fixes it), I
doubt there will be any traction in that direction.


 It's possible that the bug
 is easier to trigger with haproxy or with a specific version of it than
 other products, but nevertheless, no lib should ever crash depending on
 the traffic so I suspect there's an unchecked error code in it causing a
 NULL pointer to be dereferenced.


It wasn't a null pointer but an explicit choice of behaviour in libkrb5 it
seems...


 In 1.5-dev12, I believe we did not yet support SNI, which could be an
 explanation for the different behaviour between the two versions. I
 think that the chroot is needed to trigger the bug simply because the
 glibc does not find a file it looks up, and causes a different return
 code to be fed to openssl.


This seems likely ... Incidentally, blocking one of the 'ciphers of death'
by not permitting it on the bind also appears to avoid the code path - the
'cipher not permitted' gets triggered before whatever leads to the query
through libkrb that results in the SIGABRT.


 It would be useful to know if you can also
 trigger the issue using the legacy openssl library instead of the
 distro's version (just pick 1.0.0l or 1.0.1f from the site if you're
 willing to rebuild it).


We probably won't get a chance to do this and we're unlikely to move off of
the libssl supported by the distro due to the added maintenance overhead.



 Thanks a lot!
 Willy


Not a problem ... our Head of IS did a detailed write up on our
investigation process and findings at his blog if you are interested:

http://blog.tinola.com/?e=36

Cheers,

James


'packet of death' in 1.5-dev21.x86_64.el6_4

2014-01-30 Thread James Hogarth
Hi all,

We carried out an update from dev12 to dev21 as per my previous message to
the list and the specific issue I mentioned before no longer occurred -
good.

Unfortunately we hit a fairly major problem which summed up is a 'packet of
death' scenario that affects dev21 (have not built backwards in time yet to
determine the exact 'regression') but not dev12.

I mentioned this briefly in IRC but for archive's sake here's what we spent
the day debugging:

Steps to reproduce:
1) Build a C6.4 system using vault.centos.org
2) Install haproxy-1.5-dev21.x86_64
3) Set the sysctl properties net.ipv6.conf.default.disable_ipv6 = 1 and
net.ipv6.conf.all.disable_ipv6 = 1
4) Start a basic haproxy configuration with a chroot specified and an ssl
enabled listener.
5) Use a tool that can specify cipher strength (LOW or MEDIUM crashes but
HIGH does not) to open a connection to the listener such as ApacheBench eg:
ab -c 1 -n 1 -Z LOW https://targethost/

Results:
Haproxy receives a SIGABRT (an abort rather than a segfault) and the
process dies.

Expectation:
Haproxy carries on working without issue.

Workarounds:
As you can read above there are specific things that need to line up for
this to occur (and unfortunately we hit them all on our production systems
as it turns out).
1) If the glibc from 6.5 is installed (and yes boxes should be updated and
'there is only C6 not C6.X' should prevail) then no crash occurs.
2) If haproxy is not in a chroot then no crash occurs.
3) If ipv6 is not disabled (ie sysctl reports disable_ipv6 = 0) then no
crash occurs.
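
For anyone wanting to check the third condition programmatically rather
than via sysctl, here is a minimal sketch in C. It assumes a Linux box where
the setting is exposed at /proc/sys/net/ipv6/conf/all/disable_ipv6; the
function names read_sysctl_flag and ipv6_disabled are made up for
illustration.

```c
#include <stdio.h>

/* Default location of the "all" switch on Linux (assumed for this sketch). */
#define DISABLE_IPV6_ALL "/proc/sys/net/ipv6/conf/all/disable_ipv6"

/*
 * Reads a single integer flag from a procfs-style file.
 * Returns the value, or -1 if the file is missing or cannot be parsed
 * (for example when the ipv6 module is not loaded at all).
 */
int read_sysctl_flag(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Condition 3 from the list above: is IPv6 disabled system-wide? */
int ipv6_disabled(void)
{
    return read_sysctl_flag(DISABLE_IPV6_ALL) == 1;
}
```

A return of 1 from ipv6_disabled() means the box meets that condition; it
says nothing about the chroot or the glibc release, which still need to be
checked separately.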

Specifics:
We spent most of today in gdb, strace and ltrace working through the
specific codepaths and it would appear that SSL makes a call through
SSL_get_hostname which then calls through libkrb's
krb5int_get_fq_local_hostname (localhost, sizeof(localhost)). The routines
this calls end up with -5 (EAI_NODATA) being passed to
krb5int_translate_gai_error which then, if EAI_NODATA has not been defined
(which appears to be the case without __USE_GNU), calls abort().
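
To illustrate the pattern described above, here is a rough sketch in C. It
is not the libkrb5 source: map_gai_error, translate_or_abort and
GAI_UNMAPPED are hypothetical names and the errno values are placeholders.
It only shows how a case guarded by #ifdef EAI_NODATA disappears when the
macro is not visible, leaving -5 to fall through to abort().

```c
#include <errno.h>
#include <netdb.h>
#include <stdlib.h>

/* Sentinel used only in this sketch for "no case matched". */
#define GAI_UNMAPPED (-1)

/*
 * Hypothetical stand-in for a getaddrinfo() error translator.
 * The errno values chosen here are illustrative, not libkrb5's.
 */
int map_gai_error(int gai_err)
{
    switch (gai_err) {
    case EAI_AGAIN:
        return EAGAIN;
    case EAI_MEMORY:
        return ENOMEM;
    case EAI_NONAME:
        return ENOENT;
#ifdef EAI_NODATA
    /* This case is compiled in only when the headers expose EAI_NODATA. */
    case EAI_NODATA:
        return ENOENT;
#endif
    default:
        return GAI_UNMAPPED;
    }
}

/*
 * Mirrors the reported behaviour: a code with no matching case ends in
 * abort(), which the process sees as SIGABRT.
 */
int translate_or_abort(int gai_err)
{
    int mapped = map_gai_error(gai_err);

    if (mapped == GAI_UNMAPPED)
        abort();
    return mapped;
}
```

On glibc, building this with plain gcc should leave the EAI_NODATA case out,
so translate_or_abort(-5) aborts; adding -D_GNU_SOURCE should bring the case
back in.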

As you can see we went into some detail on this ... We're not sure at this
time why dev12 does not crash, or specifically what fixes in glibc for the
el6.5 point release avoid this code path and crash (updating openssl or
krb5-libs does not help, only updating glibc does).
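
If someone wants to see the raw lookup result on their own box, a small C
sketch like the one below could be run inside and outside the chroot. It
assumes the lookup boils down to gethostname() followed by getaddrinfo(),
which is our reading of the traces and not confirmed against the libkrb5
source; local_fqdn_gai_code is a hypothetical name and the
AI_CANONNAME | AI_ADDRCONFIG flags are a guess.

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Looks up the canonical name of the local host and returns the raw
 * getaddrinfo() result code (0 on success, an EAI_* value on failure).
 * On success buf holds the canonical name if one was returned.
 */
int local_fqdn_gai_code(char *buf, size_t len)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    assert(buf != NULL && len > 1);

    if (gethostname(buf, len) != 0)
        return EAI_SYSTEM;      /* errno holds the reason */
    buf[len - 1] = '\0';        /* may be unterminated if truncated */

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;   /* assumed flags */

    rc = getaddrinfo(buf, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    if (res->ai_canonname != NULL) {
        strncpy(buf, res->ai_canonname, len - 1);
        buf[len - 1] = '\0';
    }
    freeaddrinfo(res);
    return 0;
}
```

Printing the return value together with gai_strerror() from a small main(),
once normally and once after chroot(), would show whether the two
environments really hand back different codes.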

TL;DR: chroot haproxy, disable_ipv6 and run c6.4 with an SSL front end and
ab -c 1 -n 1 -Z LOW https://target/ is enough to crash your haproxy with a
SIGABRT.

I'm not sure how much more we are going to deep dive this given the
available workarounds but it's a heads up for anyone else that hits the
three criteria and an interesting problem for why, exactly, it happens ;)

If anyone has any thoughts or insights I'd be intrigued to hear them and if
you want to reproduce and have difficulties doing so I'd be happy to help.

Cheers,

James


Re: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-01-30 Thread James Hogarth
On 30 January 2014 22:21, Lukas Tribus luky...@hotmail.com wrote:

 Hi,


  If anyone has any thoughts or insights I'd be intrigued to hear them
  and if you want to reproduce and have difficulties doing so I'd be
  happy to help.

 Please provide the smallest config you can reproduce the problem with
 and the output of haproxy -vv. I cannot currently reproduce this.

Also, can you please provide:
 ldd haproxy


[root@localhost ~]# ldd /usr/sbin/haproxy
linux-vdso.so.1 =>  (0x7fff3e5d1000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x7fdd7c79e000)
libssl.so.10 => /usr/lib64/libssl.so.10 (0x7fdd7c541000)
libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x7fdd7c1a6000)
libpcreposix.so.0 => /usr/lib64/libpcreposix.so.0 (0x7fdd7bfa4000)
libpcre.so.0 => /lib64/libpcre.so.0 (0x7fdd7bd78000)
libc.so.6 => /lib64/libc.so.6 (0x7fdd7b9e4000)
libfreebl3.so => /lib64/libfreebl3.so (0x7fdd7b782000)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x7fdd7b53e000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x7fdd7b257000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x7fdd7b053000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x7fdd7ae27000)
libdl.so.2 => /lib64/libdl.so.2 (0x7fdd7ac22000)
libz.so.1 => /lib64/libz.so.1 (0x7fdd7aa0c000)
/lib64/ld-linux-x86-64.so.2 (0x7fdd7c9db000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x7fdd7a801000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x7fdd7a5fd000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x7fdd7a3e3000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7fdd7a1c6000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x7fdd79fa6000)



 yum info glibc


[root@localhost ~]# rpm -qi glibc
Name        : glibc                         Relocations: (not relocatable)
Version     : 2.12                          Vendor: CentOS
Release     : 1.107.el6_4.5                 Build Date: Mon 14 Oct 2013 09:14:18 BST
Install Date: Thu 30 Jan 2014 17:59:28 GMT  Build Host: c6b8.bsys.dev.centos.org
Group       : System Environment/Libraries  Source RPM: glibc-2.12-1.107.el6_4.5.src.rpm
Size        : 12947502                      License: LGPLv2+ and LGPLv2+ with exceptions and GPLv2+
Signature   : RSA/SHA1, Mon 14 Oct 2013 16:44:18 BST, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://sources.redhat.com/glibc/
Summary     : The GNU libc libraries



 yum info glibc-devel


[root@localhost ~]# rpm -qi glibc-devel
Name        : glibc-devel                   Relocations: (not relocatable)
Version     : 2.12                          Vendor: CentOS
Release     : 1.107.el6_4.5                 Build Date: Mon 14 Oct 2013 09:14:18 BST
Install Date: Thu 30 Jan 2014 17:59:35 GMT  Build Host: c6b8.bsys.dev.centos.org
Group       : Development/Libraries         Source RPM: glibc-2.12-1.107.el6_4.5.src.rpm
Size        : 988726                        License: LGPLv2+ and LGPLv2+ with exceptions and GPLv2+
Signature   : RSA/SHA1, Mon 14 Oct 2013 16:44:18 BST, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://sources.redhat.com/glibc/
Summary     : Object files for development using standard C libraries.


 yum info openssl


[root@localhost ~]# rpm -qi openssl
Name        : openssl                       Relocations: (not relocatable)
Version     : 1.0.0                         Vendor: CentOS
Release     : 27.el6_4.2                    Build Date: Mon 04 Mar 2013 22:23:22 GMT
Install Date: Thu 30 Jan 2014 14:57:25 GMT  Build Host: c6b9.bsys.dev.centos.org
Group       : System Environment/Libraries  Source RPM: openssl-1.0.0-27.el6_4.2.src.rpm
Size        : 3776670                       License: OpenSSL
Signature   : RSA/SHA1, Mon 04 Mar 2013 22:33:47 GMT, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://www.openssl.org/
Summary     : A general purpose cryptography library with TLS implementation


 yum info openssl-devel


[root@localhost ~]# rpm -qi openssl-devel
Name        : openssl-devel                 Relocations: (not relocatable)
Version     : 1.0.0                         Vendor: CentOS
Release     : 27.el6_4.2                    Build Date: Mon 04 Mar 2013 22:23:22 GMT
Install Date: Thu 30 Jan 2014 14:57:45 GMT  Build Host: c6b9.bsys.dev.centos.org
Group       : Development/Libraries         Source RPM: openssl-1.0.0-27.el6_4.2.src.rpm
Size        : 2162085                       License: OpenSSL
Signature   : RSA/SHA1, Mon 04 Mar 2013 22:33:47 GMT, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://www.openssl.org/



 Any other dependencies in haproxy? Did you build with compression or pcre
 support?


Make line in spec file is:

make USE_OPENSSL=1 %{?_smp_mflags} CPU=generic TARGET=linux26
USE_PCRE=1 USE_REGPARM=1 ADDINC=%{optflags} -I/usr/include/pcre



 If yes, please provide:
 yum info pcre



Re: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-01-30 Thread James Hogarth
On 30 January 2014 22:21, Lukas Tribus luky...@hotmail.com wrote:

 Please provide the smallest config you can reproduce the problem with
 and the output of haproxy -vv. I cannot currently reproduce this.


Sorry I missed config and -vvv :

 [root@localhost ~]# haproxy -vvv
HA-Proxy version 1.5-dev21-6b07bf7 +2013/12/17
Copyright 2000-2013 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = linux26
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing
  OPTIONS = USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built without zlib support (USE_ZLIB not set)
Compression algorithms supported : identity
Built with OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010
Running on OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IP_FREEBIND

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

[root@localhost ~]# cat /etc/haproxy/haproxy.cfg
global
   chroot /var/lib/haproxy
   pidfile /var/run/haproxy.pid
defaults
   clitimeout 18
   srvtimeout 18
   contimeout 4000
 listen test-fe
  bind 192.168.122.54:443 ssl crt /etc/pki/tls/private/cert.pem
  mode tcp
  server test-server 192.168.122.244:9090 check

Note you don't need a working backend server ... just as long as you're
listening the behaviour occurs.

Cheers,

James


Haproxy disconnecting SSL clients with minimal changes from last week

2014-01-27 Thread James Hogarth
Hi all,

SSL sessions appear to be being closed by haproxy (FIN being sent to client
and server).

This seems to be pretty intermittent in frequency but very reproducible
over time with enough data.

Haproxy is logging it as a client disconnect (cd) ...

This was fine last week and no changes have occurred in the system or
haproxy configuration excepting a new SSL certificate with an increase in
key size from 2048bit to 4096bit.

We are running haproxy 1.5-dev12 at the moment on CentOS 6.4 64bit.

We're having trouble isolating a specific reason we might be seeing this
now and not last week...

Has anyone seen anything like this before?

Cheers,

James


Re: Haproxy disconnecting SSL clients with minimal changes from last week

2014-01-27 Thread James Hogarth
On 27 Jan 2014 21:26, Baptiste bed...@gmail.com wrote:
 HAProxy 1.5-dev12 is the very first version with SSL inside.
 Please upgrade to the latest GIT version which includes so many fixes
 and improvements.

Hi Baptiste,

We figured that would be a good step and have indeed just done so...
Hopefully that does indeed resolve our issues.

Cheers

James


Re: Improving behaviour when nbproc > 1

2013-12-10 Thread James Hogarth
On 9 December 2013 22:23, Chris Burroughs chris.burrou...@gmail.com wrote:

 On 12/09/2013 05:02 PM, James Hogarth wrote:

 To answer yes it is against 1.5 ... The caveats are the peers don't work
 and the session table and load balancing can get messed up due to the lack
 of shared information between processes but if you just need to utilise
 multiple stat sockets and the rest doesn't matter so much then it works
 nicely.


 But these are the same caveats that nbproc has always had, not new ones,
 correct?


Correct - if nbproc was satisfactory before, the patch does not alter that
behaviour - it just provides the additional separated stat sockets.


Re: Improving behaviour when nbproc > 1

2013-12-09 Thread James Hogarth
On 4 Dec 2013 16:23, Chris Burroughs chris.burrou...@gmail.com wrote:

 On 12/03/2013 04:07 PM, Chris Burroughs wrote:

 This could just be me not being adept at email patches.  Sorry if this
 is obvious but is this supposed to apply against 1.4 or 1.5?


 To answer my own question this applies against 1.5.  I'm not sure of the
feasibility or desirability of backporting to 1.4.

 I ran with this in a production load test and as far I can tell it worked
as advertised.  We were able to run with nbproc and still have useful
looking stats sockets for haptop, ganglia etc.  At least in our use case,
stats sockets with this patch solve the primary objection to running with
nbproc.

Apologies for missing your message the other day.

To answer yes it is against 1.5 ... The caveats are the peers don't work
and the session table and load balancing can get messed up due to the lack
of shared information between processes but if you just need to utilise
multiple stat sockets and the rest doesn't matter so much then it works
nicely.


Re: %usr/sys nbproc

2013-11-26 Thread James Hogarth
On 26 Nov 2013 12:26, Chris Burroughs chris.burrou...@gmail.com wrote:
 Highly threaded Java apps happens to be what most of our servers are
used for and what we benchmarked for purchasing decisions.


 Finally assuming the single process performance can not be further
 improved I was considering the following setup:
   * core 0: eth0 interrupts
   * core 1: haproxy bound to eth0
   * core 2: eth1 interrupts
   * core 3: haproxy bound to eth1
   * core 4-5: ssl terminator


 I definitely agree. I know at least one setup which runs fine this way.
 It was a two-socket system, each with its own NIC and process. But here
 you're in the same situation, consider that you have 3 independent CPUs
 in the same box. The benefit of doing it this way is that you can still
 parallelize network interrupts to multiple cores without having the
 response traffic come to the wrong core (proxies are a hell to optimize
 because of their two sides).


 This setup (haproxy per NIC) was able to handle 50% more load than a
single haproxy. So from about 20k req/s to 30k.  This is a very nice bump
with what would otherwise be mostly idle cpu cores.  We found this to
be very complex to set up at the IP layer though (which isn't haproxy's
fault but in our particular circumstances might not be worth it).



 But I could not find too many examples of similar setups and was unsure
 if it was a viable long term configuration.


 Yes it is viable. The only limit right now is that you'll need to start
 two processes. In the future, when listeners reliably support the
 bind-process keyword, it will even be possible to centralize
 everything and have a dedicated stats socket for each.


Chris it sounds like we are in a very similar business given I'm in the
process of tuning almost the same case...

Have a look at the patch we sent to the list a few days or so ago providing
a stats socket per process.

Depending on your use case it might provide what you need... But note it
still has the other issues with nbproc > 1 such as losing peers, sticky
table issues for sessions, etc.