RE: 'packet of death' in 1.5-dev21.x86_64.el6_4
Hi,

> Not a problem ... our Head of IS did a detailed write up on our investigation process and findings at his blog if you are interested: http://blog.tinola.com/?e=36

Thanks, that's really interesting and very detailed.

Someone from RedHat really should take a look at this.

Most likely EAI_NODATA is not defined in the libc, that's why upgrading libc helps and upgrading libkrb5 doesn't. So the real problem is that getaddrinfo() returns an error code unknown to the libc (other applications than libkrb5 may suffer from problems as well; although they probably don't abort()).

Looks like EAI_NODATA is deprecated, and it's already removed from FreeBSD for example, in favor of EAI_NONAME [1].

As for the workaround: you should be able to disable the kerberos ciphers in the HAProxy configuration, so that you can continue to run it in chroot. Or maybe compiling with -DEAI_NODATA=EAI_NONAME would help?

What are those ciphers anyway (openssl ciphers -v 'LOW')? I don't seem to have them here on Ubuntu ...

[1] http://krbdev.mit.edu/rt/Ticket/History.html?id=5518
Re: 'packet of death' in 1.5-dev21.x86_64.el6_4
On Fri, Feb 07, 2014 at 07:23:42PM +0100, Lukas Tribus wrote:
> Hi,
>
> > Not a problem ... our Head of IS did a detailed write up on our investigation process and findings at his blog if you are interested: http://blog.tinola.com/?e=36
>
> Thanks, that's really interesting and very detailed.

Indeed.

> Someone from RedHat really should take a look at this.
>
> Most likely EAI_NODATA is not defined in the libc, that's why upgrading libc helps and upgrading libkrb5 doesn't. So the real problem is that getaddrinfo() returns an error code unknown to the libc (other applications than libkrb5 may suffer from problems as well; although they probably don't abort()).

I've passed along the information to the appropriate people. Interesting that it is fixed in CentOS 6.5; it would be great to know how it was fixed. I took a quick look at krb5-libs and glibc and nothing jumped out at me.

Ryan

> Looks like EAI_NODATA is deprecated, and it's already removed from FreeBSD for example, in favor of EAI_NONAME [1].
>
> As for the workaround: you should be able to disable the kerberos ciphers in the HAProxy configuration, so that you can continue to run it in chroot. Or maybe compiling with -DEAI_NODATA=EAI_NONAME would help?
>
> What are those ciphers anyway (openssl ciphers -v 'LOW')? I don't seem to have them here on Ubuntu ...
>
> [1] http://krbdev.mit.edu/rt/Ticket/History.html?id=5518
Re: 'packet of death' in 1.5-dev21.x86_64.el6_4
Hi,

Just providing some closure to this interesting edge case...

> Thank you for this detailed report, this is *very* useful. As you tracked the crash to happen inside openssl, I think you should file a report to centos/redhat because it's a security issue.

CentOS is bug-for-bug compatible (see recent libsvr2 issue there) and seeing as we don't at this time have a support contract with RH, plus it is not reproducible on the current release (the glibc in 6.5 fixes it), I doubt there will be any traction in that direction.

> It's possible that the bug is easier to trigger with haproxy or with a specific version of it than other products, but nevertheless, no lib should ever crash depending on the traffic so I suspect there's an unchecked error code in it causing a NULL pointer to be dereferenced.

It wasn't a null pointer but an explicit choice of behaviour in libkrb5 it seems...

> In 1.5-dev12, I believe we did not yet support SNI, which could be an explanation for the different behaviour between the two versions. I think that the chroot is needed to trigger the bug simply because the glibc does not find a file it looks up, and causes a different return code to be fed to openssl.

This is possible and seems likely ... Incidentally, blocking one of the 'ciphers of death' by not permitting it on the bind also appears to avoid the code path - the 'cipher not permitted' check gets triggered before whatever leads to the query through libkrb that results in the SIGABRT.

> It would be useful to know if you can also trigger the issue using the legacy openssl library instead of the distro's version (just pick 1.0.0l or 1.0.1f from the site if you're willing to rebuild it).

We probably won't get a chance to do this and we're unlikely to move off of the libssl supported by the distro due to the added maintenance overhead.

> Thanks a lot!
> Willy

Not a problem ... our Head of IS did a detailed write up on our investigation process and findings at his blog if you are interested: http://blog.tinola.com/?e=36

Cheers,
James
Re: 'packet of death' in 1.5-dev21.x86_64.el6_4
Hi James,

On Thu, Feb 06, 2014 at 08:36:00PM +, James Hogarth wrote:
> CentOS is bug-for-bug compatible (see recent libsvr2 issue there) and seeing as we don't at this time have a support contract with RH, plus it is not reproducible on the current release (the glibc in 6.5 fixes it), I doubt there will be any traction in that direction.

Possible, but according to the page below, 6.4 is very recent and supposed to be maintained till 2015:

  https://access.redhat.com/site/support/policy/updates/errata/

So maybe they're interested in backporting the fix from 6.5 into 6.4. There are some Red Hat people here on the list, maybe they could relay that information internally (Ryan?).

> > It's possible that the bug is easier to trigger with haproxy or with a specific version of it than other products, but nevertheless, no lib should ever crash depending on the traffic so I suspect there's an unchecked error code in it causing a NULL pointer to be dereferenced.
>
> It wasn't a null pointer but an explicit choice of behaviour in libkrb5 it seems...

Indeed, but given the code you showed, I suspect that the abort() was put there a bit in a hurry or as a sign of despair. abort() followed by return -1 is pointless and not that common!

> > In 1.5-dev12, I believe we did not yet support SNI, which could be an explanation for the different behaviour between the two versions. I think that the chroot is needed to trigger the bug simply because the glibc does not find a file it looks up, and causes a different return code to be fed to openssl.
>
> This is possible and seems a likely possibility ... Incidentally blocking one of the 'ciphers of death' by not permitting it on the bind also appears to avoid the code path - the 'cipher not permitted' gets triggered before whatever leads to the query through libkrb that results in the SIGABRT.

Yes, so it seems you needed a perfect alignment of planets for this to happen, but in your environments the planets are always aligned :-/

> > It would be useful to know if you can also trigger the issue using the legacy openssl library instead of the distro's version (just pick 1.0.0l or 1.0.1f from the site if you're willing to rebuild it).
>
> We probably won't get a chance to do this and we're unlikely to move off of the libssl supported by the distro due to the added maintenance overhead.

I easily understand! Especially since you really want to rely on someone who knows it well to correctly backport only the right fixes and not the bogus ones from upstream...

> > Thanks a lot!
> > Willy
>
> Not a problem ... our Head of IS did a detailed write up on our investigation process and findings at his blog if you are interested: http://blog.tinola.com/?e=36

Ah cool, thanks for the link!

Cheers,
Willy
'packet of death' in 1.5-dev21.x86_64.el6_4
Hi all,

We carried out an update from dev12 to dev21 as per my previous message to the list and the specific issue I mentioned before no longer occurred - good.

Unfortunately we hit a fairly major problem which, summed up, is a 'packet of death' scenario that affects dev21 (we have not built backwards in time yet to determine the exact 'regression') but not dev12.

I mentioned this briefly in IRC but for archive's sake here's what we spent the day debugging:

Steps to reproduce:

1) Build a C6.4 system using vault.centos.org
2) Install haproxy-1.5-dev21.x86_64
3) Set the sysctl properties net.ipv6.conf.default.disable_ipv6 = 1 and net.ipv6.conf.all.disable_ipv6 = 1
4) Start a basic haproxy configuration with a chroot specified and an ssl enabled listener.
5) Use a tool that can specify cipher strength (LOW or MEDIUM crashes but HIGH does not) to open a connection to the listener, such as ApacheBench, eg:

   ab -c 1 -n 1 -Z LOW https://targethost/

Results: Haproxy crashes - specifically with a SIGABRT - and the process dies.

Expectation: Haproxy carries on working without issue.

Workarounds: As you can read above there are specific things that need to line up for this to occur (and unfortunately we hit them all on our production systems as it turns out).

1) If the glibc from 6.5 is installed (and yes boxes should be updated and 'there is only C6 not C6.X' should prevail) then no crash occurs.
2) If haproxy is not in a chroot then no crash occurs.
3) If ipv6 is not disabled (ie sysctl reports disable_ipv6 = 0) then no crash occurs.
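The conditions in the steps and workarounds above can be written down as a small predicate. This is only a sketch of the report's own observations, not something derived from the haproxy, glibc or libkrb5 sources; the `Host` class, its field names and both functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical description of a host; field names are illustrative."""
    glibc_from_el6_4: bool  # True for the 6.4 glibc, False with the 6.5 glibc
    chrooted: bool          # haproxy runs with a chroot configured
    ipv6_disabled: bool     # sysctl reports disable_ipv6 = 1


def parse_disable_ipv6(sysctl_value: str) -> bool:
    """Interpret the text of a disable_ipv6 sysctl value ("0" or "1")."""
    return sysctl_value.strip() == "1"


def crash_expected(host: Host, cipher_class: str) -> bool:
    """Return True only when every condition named in the report lines up."""
    # The report says LOW or MEDIUM crashes but HIGH does not.
    weak_cipher = cipher_class.upper() in {"LOW", "MEDIUM"}
    return (host.glibc_from_el6_4 and host.chrooted
            and host.ipv6_disabled and weak_cipher)
```

Flipping any single field to False, or using HIGH as the cipher class, makes `crash_expected` return False, which matches the three workarounds and the cipher observation in the report.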
Specifics: We spent most of the day in gdb, strace and ltrace today working through the specific codepaths and it would appear that SSL makes a call through SSL_get_hostname which then calls through libkrb krb5int_get_fq_local_hostname (localhost, sizeof(localhost)), and the routines this calls end up with -5 (EAI_NODATA) being passed to krb5int_translate_gai_error which then, if EAI_NODATA has not been defined (which appears might be the case without __USE_GNU), calls abort().

As you can see we went into some detail on this ...

We're not sure at this time why dev12 does not crash or specifically what fixes in glibc for the el6.5 point release avoid this code path and crash (updating openssl or krb5-libs does not help, only glibc).

TL;DR: chroot haproxy, disable_ipv6 and run C6.4 with an SSL front end, and ab -c 1 -n 1 -Z LOW https://target/ is enough to crash your haproxy with a SIGABRT.

I'm not sure how much more we are going to deep dive this given the available workarounds but it's a heads up for anyone else that hits the three criteria and an interesting problem for why, exactly, it happens ;)

If anyone has any thoughts or insights I'd be intrigued to hear them and if you want to reproduce and have difficulties doing so I'd be happy to help.

Cheers,
James
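To make the failure mode described above easier to follow, here is a small Python model of an error translator that gives up on any code it was not built to recognise. It is not the libkrb5 source: the function names, the `Aborted` exception (standing in for abort() and the resulting SIGABRT) and the message strings are all hypothetical. The numeric values are the glibc ones, with -5 for EAI_NODATA as quoted in the report, and the `use_gnu` flag mimics EAI_NODATA only being visible when __USE_GNU is defined.

```python
# glibc numeric values for a few getaddrinfo() error codes.
EAI_NONAME = -2
EAI_AGAIN = -3
EAI_FAIL = -4
EAI_NODATA = -5  # only visible in the headers with __USE_GNU


class Aborted(Exception):
    """Stands in for abort(): in C the process would die with SIGABRT."""


def known_codes(use_gnu: bool) -> dict:
    """Build the set of codes the translator was 'compiled' to know about."""
    codes = {
        EAI_NONAME: "name not found",
        EAI_AGAIN: "temporary failure",
        EAI_FAIL: "non-recoverable failure",
    }
    if use_gnu:
        codes[EAI_NODATA] = "no address for name"
    return codes


def translate_strict(code: int, use_gnu: bool) -> str:
    """Model of the reported behaviour: an unrecognised code aborts."""
    table = known_codes(use_gnu)
    if code not in table:
        raise Aborted(f"unrecognised resolver error {code}")
    return table[code]


def translate_defensive(code: int, use_gnu: bool) -> str:
    """Alternative sketch: fall back to a generic error instead of aborting."""
    return known_codes(use_gnu).get(code, f"unrecognised resolver error {code}")
```

With `use_gnu=False`, `translate_strict(EAI_NODATA, ...)` raises while `translate_defensive` returns a generic message. That is the difference between a library that takes the whole process down on an unexpected resolver result and one that reports an ordinary failure to its caller.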
RE: 'packet of death' in 1.5-dev21.x86_64.el6_4
Hi,

> If anyone has any thoughts or insights I'd be intrigued to hear them and if you want to reproduce and have difficulties doing so I'd be happy to help.

Please provide the smallest config you can reproduce the problem with and the output of haproxy -vv. I cannot currently reproduce this.

Also, can you please provide:

  ldd haproxy
  yum info glibc
  yum info glibc-devel
  yum info openssl
  yum info openssl-devel

Any other dependencies in haproxy? Did you build with compression or pcre support? If yes, please provide:

  yum info pcre
  yum info pcre-devel
  yum info zlib
  yum info zlib-devel

Are you able to reproduce this with an openssl s_client command? That would provide more SSL related output (like the cipher used to connect).

Regards,
Lukas
Re: 'packet of death' in 1.5-dev21.x86_64.el6_4
On 30 January 2014 22:21, Lukas Tribus luky...@hotmail.com wrote:
> Hi,
>
> > If anyone has any thoughts or insights I'd be intrigued to hear them and if you want to reproduce and have difficulties doing so I'd be happy to help.
>
> Please provide the smallest config you can reproduce the problem with and the output of haproxy -vv. I cannot currently reproduce this.
>
> Also, can you please provide:
>
> ldd haproxy

[root@localhost ~]# ldd /usr/sbin/haproxy
    linux-vdso.so.1 =>  (0x7fff3e5d1000)
    libcrypt.so.1 => /lib64/libcrypt.so.1 (0x7fdd7c79e000)
    libssl.so.10 => /usr/lib64/libssl.so.10 (0x7fdd7c541000)
    libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x7fdd7c1a6000)
    libpcreposix.so.0 => /usr/lib64/libpcreposix.so.0 (0x7fdd7bfa4000)
    libpcre.so.0 => /lib64/libpcre.so.0 (0x7fdd7bd78000)
    libc.so.6 => /lib64/libc.so.6 (0x7fdd7b9e4000)
    libfreebl3.so => /lib64/libfreebl3.so (0x7fdd7b782000)
    libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x7fdd7b53e000)
    libkrb5.so.3 => /lib64/libkrb5.so.3 (0x7fdd7b257000)
    libcom_err.so.2 => /lib64/libcom_err.so.2 (0x7fdd7b053000)
    libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x7fdd7ae27000)
    libdl.so.2 => /lib64/libdl.so.2 (0x7fdd7ac22000)
    libz.so.1 => /lib64/libz.so.1 (0x7fdd7aa0c000)
    /lib64/ld-linux-x86-64.so.2 (0x7fdd7c9db000)
    libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x7fdd7a801000)
    libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x7fdd7a5fd000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x7fdd7a3e3000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x7fdd7a1c6000)
    libselinux.so.1 => /lib64/libselinux.so.1 (0x7fdd79fa6000)

> yum info glibc

[root@localhost ~]# rpm -qi glibc
Name        : glibc                        Relocations: (not relocatable)
Version     : 2.12                              Vendor: CentOS
Release     : 1.107.el6_4.5                 Build Date: Mon 14 Oct 2013 09:14:18 BST
Install Date: Thu 30 Jan 2014 17:59:28 GMT  Build Host: c6b8.bsys.dev.centos.org
Group       : System Environment/Libraries  Source RPM: glibc-2.12-1.107.el6_4.5.src.rpm
Size        : 12947502                         License: LGPLv2+ and LGPLv2+ with exceptions and GPLv2+
Signature   : RSA/SHA1, Mon 14 Oct 2013 16:44:18 BST, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://sources.redhat.com/glibc/
Summary     : The GNU libc libraries

> yum info glibc-devel

[root@localhost ~]# rpm -qi glibc-devel
Name        : glibc-devel                  Relocations: (not relocatable)
Version     : 2.12                              Vendor: CentOS
Release     : 1.107.el6_4.5                 Build Date: Mon 14 Oct 2013 09:14:18 BST
Install Date: Thu 30 Jan 2014 17:59:35 GMT  Build Host: c6b8.bsys.dev.centos.org
Group       : Development/Libraries         Source RPM: glibc-2.12-1.107.el6_4.5.src.rpm
Size        : 988726                           License: LGPLv2+ and LGPLv2+ with exceptions and GPLv2+
Signature   : RSA/SHA1, Mon 14 Oct 2013 16:44:18 BST, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://sources.redhat.com/glibc/
Summary     : Object files for development using standard C libraries.

> yum info openssl

[root@localhost ~]# rpm -qi openssl
Name        : openssl                      Relocations: (not relocatable)
Version     : 1.0.0                             Vendor: CentOS
Release     : 27.el6_4.2                    Build Date: Mon 04 Mar 2013 22:23:22 GMT
Install Date: Thu 30 Jan 2014 14:57:25 GMT  Build Host: c6b9.bsys.dev.centos.org
Group       : System Environment/Libraries  Source RPM: openssl-1.0.0-27.el6_4.2.src.rpm
Size        : 3776670                          License: OpenSSL
Signature   : RSA/SHA1, Mon 04 Mar 2013 22:33:47 GMT, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://www.openssl.org/
Summary     : A general purpose cryptography library with TLS implementation

> yum info openssl-devel

[root@localhost ~]# rpm -qi openssl-devel
Name        : openssl-devel                Relocations: (not relocatable)
Version     : 1.0.0                             Vendor: CentOS
Release     : 27.el6_4.2                    Build Date: Mon 04 Mar 2013 22:23:22 GMT
Install Date: Thu 30 Jan 2014 14:57:45 GMT  Build Host: c6b9.bsys.dev.centos.org
Group       : Development/Libraries         Source RPM: openssl-1.0.0-27.el6_4.2.src.rpm
Size        : 2162085                          License: OpenSSL
Signature   : RSA/SHA1, Mon 04 Mar 2013 22:33:47 GMT, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://www.openssl.org/

> Any other dependencies in haproxy? Did you build with compression or pcre support?

Make line in spec file is:

  make USE_OPENSSL=1 %{?_smp_mflags} CPU=generic TARGET=linux26 USE_PCRE=1 USE_REGPARM=1 ADDINC=%{optflags} -I/usr/include/pcre

> If yes, please provide:
>
> yum info pcre
Re: 'packet of death' in 1.5-dev21.x86_64.el6_4
On 30 January 2014 22:21, Lukas Tribus luky...@hotmail.com wrote:
> Please provide the smallest config you can reproduce the problem with and the output of haproxy -vv. I cannot currently reproduce this.

Sorry I missed config and -vvv :

[root@localhost ~]# haproxy -vvv
HA-Proxy version 1.5-dev21-6b07bf7 +2013/12/17
Copyright 2000-2013 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = linux26
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing
  OPTIONS = USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built without zlib support (USE_ZLIB not set)
Compression algorithms supported : identity
Built with OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010
Running on OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300, test result OK
       poll : pref=200, test result OK
     select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

[root@localhost ~]# cat /etc/haproxy/haproxy.cfg
global
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid

defaults
    clitimeout 18
    srvtimeout 18
    contimeout 4000

listen test-fe
    bind 192.168.122.54:443 ssl crt /etc/pki/tls/private/cert.pem
    mode tcp
    server test-server 192.168.122.244:9090 check

Note you don't need a working backend server ... just as long as you're listening the behaviour occurs.

Cheers,
James
Re: 'packet of death' in 1.5-dev21.x86_64.el6_4
Hi James,

On Thu, Jan 30, 2014 at 11:20:07PM +, James Hogarth wrote:
> On 30 January 2014 22:21, Lukas Tribus luky...@hotmail.com wrote:
>
> > Please provide the smallest config you can reproduce the problem with and the output of haproxy -vv. I cannot currently reproduce this.
>
> Sorry I missed config and -vvv :
>
> [root@localhost ~]# haproxy -vvv
> HA-Proxy version 1.5-dev21-6b07bf7 +2013/12/17
> Copyright 2000-2013 Willy Tarreau w...@1wt.eu
(...)

Thank you for this detailed report, this is *very* useful. As you tracked the crash to happen inside openssl, I think you should file a report to centos/redhat because it's a security issue.

It's possible that the bug is easier to trigger with haproxy or with a specific version of it than other products, but nevertheless, no lib should ever crash depending on the traffic so I suspect there's an unchecked error code in it causing a NULL pointer to be dereferenced.

In 1.5-dev12, I believe we did not yet support SNI, which could be an explanation for the different behaviour between the two versions. I think that the chroot is needed to trigger the bug simply because the glibc does not find a file it looks up, and causes a different return code to be fed to openssl.

It would be useful to know if you can also trigger the issue using the legacy openssl library instead of the distro's version (just pick 1.0.0l or 1.0.1f from the site if you're willing to rebuild it).

Thanks a lot!
Willy