Re: SSL, peered sticky tables + nbproc > 1?
On 2 May 2014 19:02, Willy Tarreau w...@1wt.eu wrote:
> On Fri, May 02, 2014 at 10:59:00AM -0700, Bryan Talbot wrote:
> > It sounds like Jeff ran out of CPU for SSL terminations and that could be addressed as described by Willy here https://www.mail-archive.com/haproxy@formilux.org/msg13104.html and allow him to stay with a single-process stick table for the actual load balancing.
>
> Yes that's perfectly possible. And when we have proxy proto v2 with SSL info, it'll be even better :-)
>
> Willy

We've done quite a bit of work on this internally recently to provide SSL multiprocess with sane load balancing. There are a couple of small edge cases left, and then we intend to push it up for comments...

I've literally just got home but I'll follow up in the office next week to see how close we are.

James
Re: 'packet of death' in 1.5-dev21.x86_64.el6_4
Hi,

Just providing some closure to this interesting edge case...

> Thank you for this detailed report, this is *very* useful. As you tracked the crash to happen inside openssl, I think you should file a report to centos/redhat because it's a security issue.

CentOS is bug-for-bug compatible (see recent libsvr2 issue there), and seeing as we don't at this time have a support contract with RH, plus it is not reproducible on the current release (the glibc in 6.5 fixes it), I doubt there will be any traction in that direction.

> It's possible that the bug is easier to trigger with haproxy or with a specific version of it than other products, but nevertheless, no lib should ever crash depending on the traffic so I suspect there's an unchecked error code in it causing a NULL pointer to be dereferenced.

It wasn't a null pointer but an explicit choice of behaviour in libkrb5 it seems...

> In 1.5-dev12, I believe we did not yet support SNI, which could be an explanation for the different behaviour between the two versions. I think that the chroot is needed to trigger the bug simply because the glibc does not find a file it looks up, and causes a different return code to be fed to openssl.

This seems likely ... Incidentally, blocking one of the 'ciphers of death' by not permitting it on the bind also appears to avoid the code path - the 'cipher not permitted' check gets triggered before whatever leads to the query through libkrb that results in the SIGABRT.

> It would be useful to know if you can also trigger the issue using the legacy openssl library instead of the distro's version (just pick 1.0.0l or 1.0.1f from the site if you're willing to rebuild it).

We probably won't get a chance to do this, and we're unlikely to move off the libssl supported by the distro due to the added maintenance overhead.

> Thanks a lot!
> Willy

Not a problem ... our Head of IS did a detailed write-up on our investigation process and findings at his blog if you are interested: http://blog.tinola.com/?e=36

Cheers,
James
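For anyone wanting to audit a configuration for the cipher workaround mentioned above, here is a rough sketch. It assumes the restriction is expressed with the `ciphers` keyword on the `bind` line; the function name `ssl_binds_without_ciphers` is hypothetical, and the grep patterns are a simplification rather than a real config parser.

```shell
#!/bin/sh
# Hypothetical helper: print every SSL-enabled bind line in a haproxy
# config that carries no explicit "ciphers" setting. These are the
# listeners that accept whatever cipher classes the library allows.
ssl_binds_without_ciphers() {
    grep -E '^[[:space:]]*bind[[:space:]].*[[:space:]]ssl([[:space:]]|$)' "$1" \
        | grep -Ev '[[:space:]]ciphers[[:space:]]'
}

# Example use (path as in the config posted to this thread):
#   ssl_binds_without_ciphers /etc/haproxy/haproxy.cfg
```

A bind line ending in something like `ciphers HIGH` would not be listed; which cipher string actually suits a given deployment is not something this thread establishes.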
'packet of death' in 1.5-dev21.x86_64.el6_4
Hi all,

We carried out an update from dev12 to dev21 as per my previous message to the list and the specific issue I mentioned before no longer occurred - good.

Unfortunately we hit a fairly major problem which, summed up, is a 'packet of death' scenario that affects dev21 (have not built backwards in time yet to determine the exact 'regression') but not dev12.

I mentioned this briefly in IRC but for archive's sake here's what we spent the day debugging:

Steps to reproduce:

1) Build a C6.4 system using vault.centos.org
2) Install haproxy-1.5-dev21.x86_64
3) Set the sysctl properties net.ipv6.conf.default.disable_ipv6 = 1 and net.ipv6.conf.all.disable_ipv6 = 1
4) Start a basic haproxy configuration with a chroot specified and an ssl enabled listener.
5) Use a tool that can specify cipher strength (LOW or MEDIUM crashes but HIGH does not) to open a connection to the listener, such as ApacheBench, eg: ab -c 1 -n 1 -Z LOW https://targethost/

Results: Haproxy crashes - specifically it receives a SIGABRT - and the process dies.

Expectation: Haproxy carries on working without issue.

Workarounds: As you can read above, there are specific things that need to line up for this to occur (and unfortunately we hit them all on our production systems as it turns out).

1) If the glibc from 6.5 is installed (and yes boxes should be updated and 'there is only C6 not C6.X' should prevail) then no crash occurs.
2) If haproxy is not in a chroot then no crash occurs.
3) If ipv6 is not disabled (ie sysctl reports disable_ipv6 = 0) then no crash occurs.

Specifics: We spent most of the day in gdb, strace and ltrace today working through the specific codepaths and it would appear that SSL makes a call through SSL_get_hostname which then calls through libkrb krb5int_get_fq_local_hostname (localhost, sizeof(localhost)), and the routines this calls end up with -5 (EAI_NODATA) being passed to krb5int_translate_gai_error which then, if EAI_NODATA has not been defined (which appears might be the case without __USE_GNU), calls abort().

As you can see we went into some detail on this ... We're not sure at this time why dev12 does not crash, or specifically what fixes in glibc for the el6.5 point release avoid this code path and crash (updating openssl or krb5-libs does not help, only glibc).

TL;DR: chroot haproxy, disable_ipv6 and run c6.4 with an SSL front end, and ab -c 1 -n 1 -Z LOW https://target/ is enough to crash your haproxy with a SIGABRT.

I'm not sure how much more we are going to deep dive this given the available workarounds, but it's a heads up for anyone else that hits the three criteria and an interesting problem for why, exactly, it happens ;)

If anyone has any thoughts or insights I'd be intrigued to hear them, and if you want to reproduce and have difficulties doing so I'd be happy to help.

Cheers,
James
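As a quick way to see whether a box lines up with the three criteria above, something along these lines may help. This is only a sketch: the function names are hypothetical, the chroot check is a plain grep rather than a config parser, and treating any el6_4 glibc release as affected is an assumption based on the single build tested here (the report only establishes that the 6.5 glibc avoids the crash).

```shell
#!/bin/sh
# Hypothetical helpers for the three conditions from the report.
# Files are passed as arguments so each check can be pointed at a test file.

# True if the haproxy config contains a chroot directive.
cfg_has_chroot() {
    grep -Eq '^[[:space:]]*chroot[[:space:]]+[^[:space:]]' "$1"
}

# True if the given disable_ipv6 sysctl file contains 1.
ipv6_disabled() {
    [ "$(cat "$1" 2>/dev/null)" = "1" ]
}

# True if the glibc release string looks like an el6_4 build (assumption).
glibc_is_el6_4() {
    case "$1" in
        *el6_4*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a live system:
#   cfg_has_chroot /etc/haproxy/haproxy.cfg \
#     && ipv6_disabled /proc/sys/net/ipv6/conf/all/disable_ipv6 \
#     && glibc_is_el6_4 "$(rpm -q --qf '%{RELEASE}' glibc)" \
#     && echo "all three conditions match"
```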
Re: 'packet of death' in 1.5-dev21.x86_64.el6_4
On 30 January 2014 22:21, Lukas Tribus luky...@hotmail.com wrote:
> Hi,
>
> > If anyone has any thoughts or insights I'd be intrigued to hear them and if you want to reproduce and have difficulties doing so I'd be happy to help.
>
> Please provide the smallest config you can reproduce the problem with and the output of haproxy -vv. I cannot currently reproduce this.
>
> Also, can you please provide:
> ldd haproxy

[root@localhost ~]# ldd /usr/sbin/haproxy
linux-vdso.so.1 => (0x7fff3e5d1000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x7fdd7c79e000)
libssl.so.10 => /usr/lib64/libssl.so.10 (0x7fdd7c541000)
libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x7fdd7c1a6000)
libpcreposix.so.0 => /usr/lib64/libpcreposix.so.0 (0x7fdd7bfa4000)
libpcre.so.0 => /lib64/libpcre.so.0 (0x7fdd7bd78000)
libc.so.6 => /lib64/libc.so.6 (0x7fdd7b9e4000)
libfreebl3.so => /lib64/libfreebl3.so (0x7fdd7b782000)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x7fdd7b53e000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x7fdd7b257000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x7fdd7b053000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x7fdd7ae27000)
libdl.so.2 => /lib64/libdl.so.2 (0x7fdd7ac22000)
libz.so.1 => /lib64/libz.so.1 (0x7fdd7aa0c000)
/lib64/ld-linux-x86-64.so.2 (0x7fdd7c9db000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x7fdd7a801000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x7fdd7a5fd000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x7fdd7a3e3000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7fdd7a1c6000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x7fdd79fa6000)

> yum info glibc

[root@localhost ~]# rpm -qi glibc
Name        : glibc    Relocations: (not relocatable)
Version     : 2.12    Vendor: CentOS
Release     : 1.107.el6_4.5    Build Date: Mon 14 Oct 2013 09:14:18 BST
Install Date: Thu 30 Jan 2014 17:59:28 GMT    Build Host: c6b8.bsys.dev.centos.org
Group       : System Environment/Libraries    Source RPM: glibc-2.12-1.107.el6_4.5.src.rpm
Size        : 12947502    License: LGPLv2+ and LGPLv2+ with exceptions and GPLv2+
Signature   : RSA/SHA1, Mon 14 Oct 2013 16:44:18 BST, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://sources.redhat.com/glibc/
Summary     : The GNU libc libraries

> yum info glibc-devel

[root@localhost ~]# rpm -qi glibc-devel
Name        : glibc-devel    Relocations: (not relocatable)
Version     : 2.12    Vendor: CentOS
Release     : 1.107.el6_4.5    Build Date: Mon 14 Oct 2013 09:14:18 BST
Install Date: Thu 30 Jan 2014 17:59:35 GMT    Build Host: c6b8.bsys.dev.centos.org
Group       : Development/Libraries    Source RPM: glibc-2.12-1.107.el6_4.5.src.rpm
Size        : 988726    License: LGPLv2+ and LGPLv2+ with exceptions and GPLv2+
Signature   : RSA/SHA1, Mon 14 Oct 2013 16:44:18 BST, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://sources.redhat.com/glibc/
Summary     : Object files for development using standard C libraries.

> yum info openssl

[root@localhost ~]# rpm -qi openssl
Name        : openssl    Relocations: (not relocatable)
Version     : 1.0.0    Vendor: CentOS
Release     : 27.el6_4.2    Build Date: Mon 04 Mar 2013 22:23:22 GMT
Install Date: Thu 30 Jan 2014 14:57:25 GMT    Build Host: c6b9.bsys.dev.centos.org
Group       : System Environment/Libraries    Source RPM: openssl-1.0.0-27.el6_4.2.src.rpm
Size        : 3776670    License: OpenSSL
Signature   : RSA/SHA1, Mon 04 Mar 2013 22:33:47 GMT, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://www.openssl.org/
Summary     : A general purpose cryptography library with TLS implementation

> yum info openssl-devel

[root@localhost ~]# rpm -qi openssl-devel
Name        : openssl-devel    Relocations: (not relocatable)
Version     : 1.0.0    Vendor: CentOS
Release     : 27.el6_4.2    Build Date: Mon 04 Mar 2013 22:23:22 GMT
Install Date: Thu 30 Jan 2014 14:57:45 GMT    Build Host: c6b9.bsys.dev.centos.org
Group       : Development/Libraries    Source RPM: openssl-1.0.0-27.el6_4.2.src.rpm
Size        : 2162085    License: OpenSSL
Signature   : RSA/SHA1, Mon 04 Mar 2013 22:33:47 GMT, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://www.openssl.org/

> Any other dependencies in haproxy? Did you build with compression or pcre support?

Make line in spec file is:

make USE_OPENSSL=1 %{?_smp_mflags} CPU=generic TARGET=linux26 USE_PCRE=1 USE_REGPARM=1 ADDINC=%{optflags} -I/usr/include/pcre

> If yes, please provide:
> yum info pcre
Re: 'packet of death' in 1.5-dev21.x86_64.el6_4
On 30 January 2014 22:21, Lukas Tribus luky...@hotmail.com wrote:
> Please provide the smallest config you can reproduce the problem with and the output of haproxy -vv. I cannot currently reproduce this.

Sorry, I missed the config and -vvv:

[root@localhost ~]# haproxy -vvv
HA-Proxy version 1.5-dev21-6b07bf7 +2013/12/17
Copyright 2000-2013 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = linux26
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing
  OPTIONS = USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built without zlib support (USE_ZLIB not set)
Compression algorithms supported : identity
Built with OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010
Running on OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IP_FREEBIND

Available polling systems :
  epoll : pref=300, test result OK
  poll : pref=200, test result OK
  select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

[root@localhost ~]# cat /etc/haproxy/haproxy.cfg
global
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid

defaults
    clitimeout 18
    srvtimeout 18
    contimeout 4000

listen test-fe
    bind 192.168.122.54:443 ssl crt /etc/pki/tls/private/cert.pem
    mode tcp
    server test-server 192.168.122.244:9090 check

Note you don't need a working backend server ... as long as you're listening, the behaviour occurs.

Cheers,
James
Haproxy disconnecting SSL clients with minimal changes from last week
Hi all,

SSL sessions appear to be getting closed by haproxy (FIN being sent to client and server). This is intermittent in frequency but very reproducible over time with enough data. Haproxy is logging it as a client disconnect (cd) ...

This was fine last week and no changes have occurred in the system or haproxy configuration except for a new SSL certificate with an increase in key size from 2048 bit to 4096 bit.

We are running haproxy 1.5-dev12 at the moment on CentOS 6.4 64bit.

We're having trouble isolating a specific reason we might be seeing this now and not last week... Has anyone seen anything like this before?

Cheers,
James
Re: Haproxy disconnecting SSL clients with minimal changes from last week
On 27 Jan 2014 21:26, Baptiste bed...@gmail.com wrote:
> HAProxy 1.5-dev12 is the very first version with SSL inside. Please upgrade to the latest GIT version which includes so many fixes and improvements.

Hi Baptiste,

We figured that would be a good step and have indeed just done so... Hopefully that does indeed resolve our issues.

Cheers,
James
Re: Improving behaviour when nbproc > 1
On 9 December 2013 22:23, Chris Burroughs chris.burrou...@gmail.com wrote:
> On 12/09/2013 05:02 PM, James Hogarth wrote:
> > To answer yes it is against 1.5 ... The caveats are the peers don't work and the session table and load balancing can get messed up due to the lack of shared information between processes but if you just need to utilise multiple stat sockets and the rest doesn't matter so much then it works nicely.
>
> But these are the same caveats that nbproc has always had, not new ones, correct?

Correct - if nbproc was satisfactory before, the patch does not alter that behaviour; it just provides the additional separated stat sockets.
Re: Improving behaviour when nbproc > 1
On 4 Dec 2013 16:23, Chris Burroughs chris.burrou...@gmail.com wrote:
> On 12/03/2013 04:07 PM, Chris Burroughs wrote:
> > This could just be me not being adept at email patches. Sorry if this is obvious but is this supposed to apply against 1.4 or 1.5?
>
> To answer my own question this applies against 1.5. I'm not sure of the feasibility or desirability of backporting to 1.4.
>
> I ran with this in a production load test and as far as I can tell it worked as advertised. We were able to run with nbproc and still have useful looking stats sockets for haptop, ganglia etc. At least in our use case, stats sockets with this patch solve the primary objection to running with nbproc.

Apologies for missing your message the other day.

To answer: yes, it is against 1.5 ... The caveats are that the peers don't work, and the session table and load balancing can get messed up due to the lack of shared information between processes, but if you just need to utilise multiple stat sockets and the rest doesn't matter so much then it works nicely.
Re: %usr/sys nbproc
On 26 Nov 2013 12:26, Chris Burroughs chris.burrou...@gmail.com wrote:
> Highly threaded Java apps happen to be what most of our servers are used for and what we benchmarked for purchasing decisions.
>
> Finally, assuming the single process performance can not be further improved, I was considering the following setup:
> * core 0: eth0 interrupts
> * core 1: haproxy bound to eth0
> * core 2: eth1 interrupts
> * core 3: haproxy bound to eth1
> * core 4-5: ssl terminator
>
> I definitely agree. I know at least one setup which runs fine this way. It was a two-socket system, each with its own NIC and process. But here you're in the same situation, consider that you have 3 independent CPUs in the same box. The benefit of doing it this way is that you can still parallelize network interrupts to multiple cores without having the response traffic come to the wrong core (proxies are a hell to optimize because of their two sides).
>
> This setup (haproxy per NIC) was able to handle 50% more load than a single haproxy. So from about 20k req/s to 30k. This is a very nice bump with what would otherwise be mostly idle cpu cores. We found this to be very complex to set up at the IP layer though (which isn't haproxy's fault but in our particular circumstances might not be worth it).
>
> But I could not find too many examples of similar setups and was unsure if it was a viable long term configuration.
>
> Yes it is viable. The only limit right now is that you'll need to start two processes. In the future, when listeners reliably support the bind-process keyword, it will even be possible to centralize everything and have a dedicated stats socket for each.

Chris, it sounds like we are in a very similar business given I'm in the process of tuning almost the same case...

Have a look at the patch we sent to the list a few days or so ago providing a stats socket per process. Depending on your use case it might provide what you need...

But note it still has the other issues with nbproc > 1 such as losing peers, sticky table issues for sessions, etc
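For the core layout quoted above, the fiddly part tends to be the affinity masks, so here is a small sketch. It assumes the usual Linux convention that /proc/irq/<irq>/smp_affinity and `taskset <mask>` take a hexadecimal CPU bitmask; `core_mask`, the IRQ variable and the per-NIC config file names are placeholders, not anything from the setups described in this thread.

```shell
#!/bin/sh
# Hypothetical helper: hex CPU bitmask selecting a single core number.
core_mask() {
    printf '%x\n' "$((1 << $1))"
}

# Example use (needs root; IRQ numbers and config paths are placeholders):
#   core_mask 0 > /proc/irq/"$ETH0_IRQ"/smp_affinity            # core 0: eth0 interrupts
#   taskset "$(core_mask 1)" haproxy -f /etc/haproxy/eth0.cfg   # core 1: haproxy bound to eth0
#   core_mask 2 > /proc/irq/"$ETH1_IRQ"/smp_affinity            # core 2: eth1 interrupts
#   taskset "$(core_mask 3)" haproxy -f /etc/haproxy/eth1.cfg   # core 3: haproxy bound to eth1
```

Starting the two haproxy instances separately like this matches the "you'll need to start two processes" limitation mentioned in the quoted reply.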