RE: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-02-07 Thread Lukas Tribus
Hi,


 Not a problem ... our Head of IS did a detailed write up on our
 investigation process and findings at his blog if you are interested:

 http://blog.tinola.com/?e=36

Thanks, that's really interesting and very detailed.


Someone from Red Hat really should take a look at this. Most likely
EAI_NODATA is not defined in the libc, that's why upgrading libc
helps and upgrading libkrb5 doesn't. So the real problem is that
getaddrinfo() returns an error code unknown to the libc (applications
other than libkrb5 may suffer from problems as well, although
they probably don't abort()).

Looks like EAI_NODATA is deprecated, and it's already removed from
FreeBSD for example, in favor of EAI_NONAME [1].


As for the workaround: you should be able to disable the Kerberos
ciphers in the HAProxy configuration, so that you can continue to run
it in a chroot. Or maybe compiling with -DEAI_NODATA=EAI_NONAME would
help?

What are those ciphers anyway (openssl ciphers -v 'LOW')? I don't
seem to have them here on Ubuntu ...



[1] http://krbdev.mit.edu/rt/Ticket/History.html?id=5518
  


Re: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-02-07 Thread Ryan O'Hara
On Fri, Feb 07, 2014 at 07:23:42PM +0100, Lukas Tribus wrote:
 Hi,
 
 
  Not a problem ... our Head of IS did a detailed write up on our
  investigation process and findings at his blog if you are interested:
 
  http://blog.tinola.com/?e=36
 
 Thanks, that's really interesting and very detailed.

Indeed.

 Someone from Red Hat really should take a look at this. Most likely
 EAI_NODATA is not defined in the libc, that's why upgrading libc
 helps and upgrading libkrb5 doesn't. So the real problem is that
 getaddrinfo() returns an error code unknown to the libc (applications
 other than libkrb5 may suffer from problems as well, although
 they probably don't abort()).

I've passed along the information to the appropriate
people. Interesting that it is fixed in CentOS 6.5; it would be great to
know how it was fixed. I took a quick look at krb5-libs and glibc and
nothing jumped out at me.

Ryan

 Looks like EAI_NODATA is deprecated, and it's already removed from
 FreeBSD for example, in favor of EAI_NONAME [1].
 
 
 As for the workaround: you should be able to disable the Kerberos
 ciphers in the HAProxy configuration, so that you can continue to run
 it in a chroot. Or maybe compiling with -DEAI_NODATA=EAI_NONAME would
 help?
 
 What are those ciphers anyway (openssl ciphers -v 'LOW')? I don't
 seem to have them here on Ubuntu ...
 
 
 
 [1] http://krbdev.mit.edu/rt/Ticket/History.html?id=5518  
   



Re: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-02-06 Thread James Hogarth
Hi,

Just providing some closure to this interesting edge case...


 Thank you for this detailed report, this is *very* useful. As you tracked
 the crash to happen inside openssl, I think you should file a report to
 centos/redhat because it's a security issue.


CentOS is bug-for-bug compatible (see recent libsvr2 issue there), and
seeing as we don't have a support contract with RH at this time, and it is
not reproducible on the current release (the glibc in 6.5 fixes it), I
doubt there will be any traction in that direction.


 It's possible that the bug
 is easier to trigger with haproxy or with a specific version of it than
 other products, but nevertheless, no lib should ever crash depending on
 the traffic so I suspect there's an unchecked error code in it causing a
 NULL pointer to be dereferenced.


It wasn't a null pointer but an explicit choice of behaviour in libkrb5 it
seems...


 In 1.5-dev12, I believe we did not yet support SNI, which could be an
 explanation for the different behaviour between the two versions. I
 think that the chroot is needed to trigger the bug simply because the
 glibc does not find a file it looks up, and causes a different return
 code to be fed to openssl.


This is possible and seems likely ... Incidentally, blocking
one of the 'ciphers of death' by not permitting it on the bind also appears
to avoid the code path - the 'cipher not permitted' gets triggered before
whatever leads to the query through libkrb that results in the SIGABRT.


 It would be useful to know if you can also
 trigger the issue using the legacy openssl library instead of the
 distro's version (just pick 1.0.0l or 1.0.1f from the site if you're
 willing to rebuild it).


We probably won't get a chance to do this and we're unlikely to move off of
the libssl supported by the distro due to the added maintenance overhead.



 Thanks a lot!
 Willy


Not a problem ... our Head of IS did a detailed write up on our
investigation process and findings at his blog if you are interested:

http://blog.tinola.com/?e=36

Cheers,

James


Re: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-02-06 Thread Willy Tarreau
Hi James,

On Thu, Feb 06, 2014 at 08:36:00PM +, James Hogarth wrote:
 CentOS is bug-for-bug compatible (see recent libsvr2 issue there), and
 seeing as we don't have a support contract with RH at this time, and it is
 not reproducible on the current release (the glibc in 6.5 fixes it), I
 doubt there will be any traction in that direction.

Possible, but according to the page below, 6.4 is very recent and supposed to
be maintained until 2015:

   https://access.redhat.com/site/support/policy/updates/errata/

So maybe they're interested in backporting the fix from 6.5 into 6.4. There
are some Red Hat people here on the list, maybe they could relay that
information internally (Ryan ?).

  It's possible that the bug
  is easier to trigger with haproxy or with a specific version of it than
  other products, but nevertheless, no lib should ever crash depending on
  the traffic so I suspect there's an unchecked error code in it causing a
  NULL pointer to be dereferenced.
 
 
 It wasn't a null pointer but an explicit choice of behaviour in libkrb5 it
 seems...

Indeed, but given the code you showed, I suspect that the abort() was put
there a bit in a hurry or as a sign of despair. abort() followed by return -1
is pointless and not that common!

  In 1.5-dev12, I believe we did not yet support SNI, which could be an
  explanation for the different behaviour between the two versions. I
  think that the chroot is needed to trigger the bug simply because the
  glibc does not find a file it looks up, and causes a different return
  code to be fed to openssl.
 
 
 This is possible and seems likely ... Incidentally, blocking
 one of the 'ciphers of death' by not permitting it on the bind also appears
 to avoid the code path - the 'cipher not permitted' gets triggered before
 whatever leads to the query through libkrb that results in the SIGABRT.

Yes so it seems you needed a perfect alignment of planets for this to
happen, but that in your environments, planets are always aligned :-/

  It would be useful to know if you can also
  trigger the issue using the legacy openssl library instead of the
  distro's version (just pick 1.0.0l or 1.0.1f from the site if you're
  willing to rebuild it).
 
 
 We probably won't get a chance to do this and we're unlikely to move off of
 the libssl supported by the distro due to the added maintenance overhead.

I easily understand! Especially since you really want to rely on someone
who knows it well to correctly backport only the right fixes and not the
bogus ones from upstream...

  Thanks a lot!
  Willy
 
 
 Not a problem ... our Head of IS did a detailed write up on our
 investigation process and findings at his blog if you are interested:
 
 http://blog.tinola.com/?e=36

Ah cool, thanks for the link!

Cheers,
Willy




'packet of death' in 1.5-dev21.x86_64.el6_4

2014-01-30 Thread James Hogarth
Hi all,

We carried out an update from dev12 to dev21 as per my previous message to
the list and the specific issue I mentioned before no longer occurred -
good.

Unfortunately we hit a fairly major problem which, summed up, is a 'packet of
death' scenario that affects dev21 (we have not yet built older versions to
determine the exact 'regression') but not dev12.

I mentioned this briefly in IRC but for archive's sake here's what we spent
the day debugging:

Steps to reproduce:
1) Build a C6.4 system using vault.centos.org
2) Install haproxy-1.5-dev21.x86_64
3) Set the sysctl properties net.ipv6.conf.default.disable_ipv6 = 1 and
net.ipv6.conf.all.disable_ipv6 = 1
4) Start a basic haproxy configuration with a chroot specified and an
SSL-enabled listener.
5) Use a tool that can specify cipher strength (LOW or MEDIUM crashes but
HIGH does not) to open a connection to the listener, such as ApacheBench, e.g.:
ab -c 1 -n 1 -Z LOW https://targethost/

Results:
Haproxy crashes - specifically with a SIGABRT, not a segfault - and the process dies.

Expectation:
Haproxy carries on working without issue.

Workarounds:
As you can read above, there are specific things that need to line up for this
to occur (and unfortunately we hit them all on our production systems, as it
turns out).
1) If the glibc from 6.5 is installed (and yes boxes should be updated and
'there is only C6 not C6.X' should prevail) then no crash occurs.
2) If haproxy is not in a chroot then no crash occurs.
3) If IPv6 is not disabled (i.e. sysctl reports disable_ipv6 = 0) then no
crash occurs.
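
For anyone wanting to check the IPv6 precondition programmatically, here is a
minimal sketch in C. It assumes the standard Linux procfs location for the
disable_ipv6 sysctl; the function names read_sysctl_int and ipv6_disabled are
made up for illustration and are not part of haproxy or any library.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a procfs-style sysctl file.
 * Returns the parsed value, or -1 if the file cannot be opened or parsed. */
int read_sysctl_int(const char *path)
{
    FILE *fp;
    int value = -1;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/* Hypothetical convenience wrapper: returns the raw sysctl value
 * (1 = IPv6 disabled, 0 = enabled) or -1 if it could not be read,
 * for example when the ipv6 module is not loaded. */
int ipv6_disabled(void)
{
    return read_sysctl_int("/proc/sys/net/ipv6/conf/all/disable_ipv6");
}
```

A return value of 1 from ipv6_disabled() would mean criterion 3 above is met
on that box; 0 or -1 would suggest this particular crash path is not in play.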

Specifics:
We spent most of the day in gdb, strace and ltrace working through
the specific code paths. It appears that SSL makes a call through
SSL_get_hostname, which then calls into libkrb's
krb5int_get_fq_local_hostname (localhost, sizeof(localhost)). The
routines this calls end up passing -5 (EAI_NODATA) to
krb5int_translate_gai_error, which then calls abort() if EAI_NODATA has
not been defined, which appears to be the case without __USE_GNU.
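
To make the failure mode concrete, here is a rough sketch in C of a
getaddrinfo() status translator that tolerates codes it was not compiled to
recognise. This is NOT the libkrb5 source: translate_gai_error and
local_canonical_name are hypothetical names, and the errno values chosen for
each case are assumptions. The point is only that the default branch returns
an error to the caller rather than calling abort().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose getaddrinfo() and EAI_* in strict modes */
#endif
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: map a getaddrinfo() status to an errno-style value.
 * Unknown codes are reported to the caller, never abort()ed on. */
int translate_gai_error(int gai_status)
{
    switch (gai_status) {
    case 0:
        return 0;
    case EAI_NONAME:
        return ENOENT;
#if defined(EAI_NODATA) && (EAI_NODATA != EAI_NONAME)
    case EAI_NODATA:              /* on glibc only visible with _GNU_SOURCE */
        return ENOENT;
#endif
    case EAI_AGAIN:
        return EAGAIN;
    case EAI_MEMORY:
        return ENOMEM;
    case EAI_FAMILY:
        return EAFNOSUPPORT;
    case EAI_SYSTEM:
        return errno != 0 ? errno : EIO;
    default:
        return EINVAL;            /* unrecognised code: fail this lookup only */
    }
}

/* Hypothetical caller: resolve the local host's canonical name into buf.
 * Returns 0 on success or an errno-style value on failure. */
int local_canonical_name(char *buf, size_t len)
{
    struct addrinfo hints, *res = NULL;
    int status;

    assert(buf != NULL && len > 0);
    if (gethostname(buf, len) != 0)
        return errno;
    buf[len - 1] = '\0';

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_flags = AI_CANONNAME;
    status = getaddrinfo(buf, NULL, &hints, &res);
    if (status != 0)
        return translate_gai_error(status);

    if (res->ai_canonname != NULL) {
        strncpy(buf, res->ai_canonname, len - 1);
        buf[len - 1] = '\0';
    }
    freeaddrinfo(res);
    return 0;
}
```

With a translator shaped like this, a -5 coming back from the resolver in a
build where EAI_NODATA is not visible would surface as a failed lookup instead
of killing the whole process.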

As you can see, we went into some detail on this ... We're not sure at this
time why dev12 does not crash, or specifically which fixes in glibc for the
el6.5 point release avoid this code path and crash (updating openssl or
krb5-libs does not help; only updating glibc does).

TL;DR: chroot haproxy, disable_ipv6 and run C6.4 with an SSL front end, and
ab -c 1 -n 1 -Z LOW https://target/ is enough to crash your haproxy with a
SIGABRT.

I'm not sure how much deeper we are going to dig into this given the
available workarounds, but it's a heads-up for anyone else who hits the
three criteria, and an interesting problem as to why, exactly, it happens ;)

If anyone has any thoughts or insights I'd be intrigued to hear them and if
you want to reproduce and have difficulties doing so I'd be happy to help.

Cheers,

James


RE: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-01-30 Thread Lukas Tribus
Hi,


 If anyone has any thoughts or insights I'd be intrigued to hear them  
 and if you want to reproduce and have difficulties doing so I'd be  
 happy to help.

Please provide the smallest config you can reproduce the problem with
and the output of haproxy -vv. I cannot currently reproduce this.

Also, can you please provide:
ldd haproxy
yum info glibc
yum info glibc-devel
yum info openssl
yum info openssl-devel

Any other dependencies in haproxy? Did you build with compression or pcre
support? If yes, please provide:
yum info pcre
yum info pcre-devel
yum info zlib
yum info zlib-devel


Are you able to reproduce this with an openssl s_client command? That
would provide more SSL related output (like the cipher used to connect).



Regards,

Lukas 


Re: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-01-30 Thread James Hogarth
On 30 January 2014 22:21, Lukas Tribus luky...@hotmail.com wrote:

 Hi,


  If anyone has any thoughts or insights I'd be intrigued to hear them
  and if you want to reproduce and have difficulties doing so I'd be
  happy to help.

 Please provide the smallest config you can reproduce the problem with
 and the output of haproxy -vv. I cannot currently reproduce this.

 Also, can you please provide:
 ldd haproxy


[root@localhost ~]# ldd /usr/sbin/haproxy
linux-vdso.so.1 =>  (0x7fff3e5d1000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x7fdd7c79e000)
libssl.so.10 => /usr/lib64/libssl.so.10 (0x7fdd7c541000)
libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x7fdd7c1a6000)
libpcreposix.so.0 => /usr/lib64/libpcreposix.so.0 (0x7fdd7bfa4000)
libpcre.so.0 => /lib64/libpcre.so.0 (0x7fdd7bd78000)
libc.so.6 => /lib64/libc.so.6 (0x7fdd7b9e4000)
libfreebl3.so => /lib64/libfreebl3.so (0x7fdd7b782000)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x7fdd7b53e000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x7fdd7b257000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x7fdd7b053000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x7fdd7ae27000)
libdl.so.2 => /lib64/libdl.so.2 (0x7fdd7ac22000)
libz.so.1 => /lib64/libz.so.1 (0x7fdd7aa0c000)
/lib64/ld-linux-x86-64.so.2 (0x7fdd7c9db000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x7fdd7a801000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x7fdd7a5fd000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x7fdd7a3e3000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7fdd7a1c6000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x7fdd79fa6000)



 yum info glibc


[root@localhost ~]# rpm -qi glibc
Name        : glibc                          Relocations: (not relocatable)
Version     : 2.12                           Vendor: CentOS
Release     : 1.107.el6_4.5                  Build Date: Mon 14 Oct 2013 09:14:18 BST
Install Date: Thu 30 Jan 2014 17:59:28 GMT   Build Host: c6b8.bsys.dev.centos.org
Group       : System Environment/Libraries   Source RPM: glibc-2.12-1.107.el6_4.5.src.rpm
Size        : 12947502                       License: LGPLv2+ and LGPLv2+ with exceptions and GPLv2+
Signature   : RSA/SHA1, Mon 14 Oct 2013 16:44:18 BST, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://sources.redhat.com/glibc/
Summary     : The GNU libc libraries



 yum info glibc-devel


[root@localhost ~]# rpm -qi glibc-devel
Name        : glibc-devel                    Relocations: (not relocatable)
Version     : 2.12                           Vendor: CentOS
Release     : 1.107.el6_4.5                  Build Date: Mon 14 Oct 2013 09:14:18 BST
Install Date: Thu 30 Jan 2014 17:59:35 GMT   Build Host: c6b8.bsys.dev.centos.org
Group       : Development/Libraries          Source RPM: glibc-2.12-1.107.el6_4.5.src.rpm
Size        : 988726                         License: LGPLv2+ and LGPLv2+ with exceptions and GPLv2+
Signature   : RSA/SHA1, Mon 14 Oct 2013 16:44:18 BST, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://sources.redhat.com/glibc/
Summary     : Object files for development using standard C libraries.


 yum info openssl


[root@localhost ~]# rpm -qi openssl
Name        : openssl                        Relocations: (not relocatable)
Version     : 1.0.0                          Vendor: CentOS
Release     : 27.el6_4.2                     Build Date: Mon 04 Mar 2013 22:23:22 GMT
Install Date: Thu 30 Jan 2014 14:57:25 GMT   Build Host: c6b9.bsys.dev.centos.org
Group       : System Environment/Libraries   Source RPM: openssl-1.0.0-27.el6_4.2.src.rpm
Size        : 3776670                        License: OpenSSL
Signature   : RSA/SHA1, Mon 04 Mar 2013 22:33:47 GMT, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://www.openssl.org/
Summary     : A general purpose cryptography library with TLS implementation


 yum info openssl-devel


[root@localhost ~]# rpm -qi openssl-devel
Name        : openssl-devel                  Relocations: (not relocatable)
Version     : 1.0.0                          Vendor: CentOS
Release     : 27.el6_4.2                     Build Date: Mon 04 Mar 2013 22:23:22 GMT
Install Date: Thu 30 Jan 2014 14:57:45 GMT   Build Host: c6b9.bsys.dev.centos.org
Group       : Development/Libraries          Source RPM: openssl-1.0.0-27.el6_4.2.src.rpm
Size        : 2162085                        License: OpenSSL
Signature   : RSA/SHA1, Mon 04 Mar 2013 22:33:47 GMT, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : http://www.openssl.org/



 Any other dependencies in haproxy? Did you build with compression or pcre
 support?


Make line in spec file is:

make USE_OPENSSL=1 %{?_smp_mflags} CPU=generic TARGET=linux26
USE_PCRE=1 USE_REGPARM=1 ADDINC=%{optflags} -I/usr/include/pcre



 If yes, please provide:
 yum info pcre



Re: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-01-30 Thread James Hogarth
On 30 January 2014 22:21, Lukas Tribus luky...@hotmail.com wrote:

 Please provide the smallest config you can reproduce the problem with
 and the output of haproxy -vv. I cannot currently reproduce this.


Sorry, I missed the config and -vvv:

[root@localhost ~]# haproxy -vvv
HA-Proxy version 1.5-dev21-6b07bf7 +2013/12/17
Copyright 2000-2013 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = linux26
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing
  OPTIONS = USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built without zlib support (USE_ZLIB not set)
Compression algorithms supported : identity
Built with OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010
Running on OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IP_FREEBIND

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

[root@localhost ~]# cat /etc/haproxy/haproxy.cfg
global
   chroot /var/lib/haproxy
   pidfile /var/run/haproxy.pid
defaults
   clitimeout 18
   srvtimeout 18
   contimeout 4000
listen test-fe
   bind 192.168.122.54:443 ssl crt /etc/pki/tls/private/cert.pem
   mode tcp
   server test-server 192.168.122.244:9090 check

Note that you don't need a working backend server ... as long as the
listener is up, the behaviour occurs.

Cheers,

James


Re: 'packet of death' in 1.5-dev21.x86_64.el6_4

2014-01-30 Thread Willy Tarreau
Hi James,

On Thu, Jan 30, 2014 at 11:20:07PM +, James Hogarth wrote:
 On 30 January 2014 22:21, Lukas Tribus luky...@hotmail.com wrote:
 
  Please provide the smallest config you can reproduce the problem with
  and the output of haproxy -vv. I cannot currently reproduce this.
 
 
 Sorry I missed config and -vvv :
 
  [root@localhost ~]# haproxy -vvv
 HA-Proxy version 1.5-dev21-6b07bf7 +2013/12/17
 Copyright 2000-2013 Willy Tarreau w...@1wt.eu
(...)

Thank you for this detailed report, this is *very* useful. As you tracked
the crash to happen inside openssl, I think you should file a report to
centos/redhat because it's a security issue. It's possible that the bug
is easier to trigger with haproxy or with a specific version of it than
other products, but nevertheless, no lib should ever crash depending on
the traffic so I suspect there's an unchecked error code in it causing a
NULL pointer to be dereferenced.

In 1.5-dev12, I believe we did not yet support SNI, which could be an
explanation for the different behaviour between the two versions. I
think that the chroot is needed to trigger the bug simply because the
glibc does not find a file it looks up, and causes a different return
code to be fed to openssl. It would be useful to know if you can also
trigger the issue using the legacy openssl library instead of the
distro's version (just pick 1.0.0l or 1.0.1f from the site if you're
willing to rebuild it).
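
One way to test the chroot theory is to check which resolver-related files are
missing inside the chroot directory. The sketch below does that in C. The file
list is an assumption based on what glibc's resolver commonly consults; this
thread does not establish which file, if any, is the one that matters here.
The function name count_missing_resolver_files is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed list of files glibc name resolution commonly reads. */
static const char *const resolver_files[] = {
    "/etc/resolv.conf",
    "/etc/hosts",
    "/etc/nsswitch.conf",
    "/etc/host.conf",
};
#define RESOLVER_FILE_COUNT (sizeof(resolver_files) / sizeof(resolver_files[0]))

/* Count how many of the files above are not readable under chroot_dir,
 * printing each missing one. A path too long for the buffer is counted
 * as missing. */
int count_missing_resolver_files(const char *chroot_dir)
{
    char path[4096];
    int missing = 0;
    size_t i;

    assert(chroot_dir != NULL);
    for (i = 0; i < RESOLVER_FILE_COUNT; i++) {
        int n = snprintf(path, sizeof(path), "%s%s", chroot_dir, resolver_files[i]);
        if (n < 0 || (size_t)n >= sizeof(path) || access(path, R_OK) != 0) {
            printf("missing in chroot: %s\n", resolver_files[i]);
            missing++;
        }
    }
    return missing;
}
```

Calling count_missing_resolver_files("/var/lib/haproxy") would match the chroot
path in the configuration posted earlier in the thread; an empty chroot would
report all four files as missing.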

Thanks a lot!
Willy