Re: HAproxy constant memory leak

2015-02-06 Thread Pavlos Parissis
On 06/02/2015 11:19 πμ, Georges-Etienne Legendre wrote:
> Hi Willy,
> 
> Yes, please send me the script.
> 

Willy,
If it isn't against the policies of this ML to send attachments and the
script is few kilobytes size, could you please send it to the list?

Thanks,
Pavlos




signature.asc
Description: OpenPGP digital signature


Re: HAproxy constant memory leak

2015-02-06 Thread Georges-Etienne Legendre
Hi Willy,

Yes, please send me the script.

Thanks!

-- Georges-Etienne

> Le 2015-02-06 à 01:55, Willy Tarreau  a écrit :
> 
> Hi Georges-Etienne,
> 
>> On Thu, Feb 05, 2015 at 09:10:25PM -0500, Georges-Etienne Legendre wrote:
>> Hi Willy,
>> 
>> I'm not sure how to document this leak. I don't know exactly how is
>> implemented the firewall SSL health check... Would the Wireshark trace be
>> enough to report the issue?
> 
> Yes I think it will definitely help, given that the exchange is very short,
> it basically is an SSL hello. It is important to report the strace output
> as well and to mention that haproxy is running in a chroot.
> 
> If that helps you, I can send you the script I wrote from your capture.
> You could try it to confirm it does the same effect as the firewall's
> checks. Once confirmed, it will be much easier to build a bug report
> given that you'll simply have to attach the script as a reproducer.
> 
> Best regards,
> Willy
> 



Re: HAproxy constant memory leak

2015-02-05 Thread Willy Tarreau
Hi Georges-Etienne,

On Thu, Feb 05, 2015 at 09:10:25PM -0500, Georges-Etienne Legendre wrote:
> Hi Willy,
> 
> I'm not sure how to document this leak. I don't know exactly how is
> implemented the firewall SSL health check... Would the Wireshark trace be
> enough to report the issue?

Yes I think it will definitely help, given that the exchange is very short,
it basically is an SSL hello. It is important to report the strace output
as well and to mention that haproxy is running in a chroot.

If that helps you, I can send you the script I wrote from your capture.
You could try it to confirm it does the same effect as the firewall's
checks. Once confirmed, it will be much easier to build a bug report
given that you'll simply have to attach the script as a reproducer.

Best regards,
Willy




Re: HAproxy constant memory leak

2015-02-05 Thread Georges-Etienne Legendre
Hi Willy,

I'm not sure how to document this leak. I don't know exactly how is
implemented the firewall SSL health check... Would the Wireshark trace be
enough to report the issue?

Thanks!

-- Georges-Etienne

On Tue, Feb 3, 2015 at 5:52 PM, Willy Tarreau  wrote:

> Hi Georges-Etienne,
>
> On Tue, Feb 03, 2015 at 08:09:15AM -0500, Georges-Etienne Legendre wrote:
> > Hi Willy,
> >
> > Thanks a lot for this investigation, it was really helpful.
> >
> > My OpenSSL is up-to-date on this server. I first tried to remove the
> chroot
> > statement. I'm pretty sure this in itself solved the leak, but I no
> longer
> > have the traces and couple of hours after, our Ops changed the SSL check
> to
> > a simple TCP check on port 443. So, I cannot confirm 100%.
> >
> > I can however confirm that I no longer experience the leak. I put back
> the
> > chroot command to be safer.
>
> OK that's great.
>
> > This also prompted me to tweak the SSL ciphers. I now use a more
> thoughtful
> > list of ciphers (
> > https://mozilla.github.io/server-side-tls/ssl-config-generator/) and
> > disabled SSLv3. This indeed disables KRB5.
> >
> > ssl-default-bind-ciphers
> >
> ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
> > ssl-default-bind-options no-sslv3
>
> Wow! When we introduced SSL, I expected that a lot of difficulties
> would come from it, but not that the ugliest config statements would
> come with it as well :-)
>
> > I will keep a close eye on the memory usage... HAproxy has been running
> for
> > about 16 hours now, and here is the ps output:
> > # ps -u nobody u
> > USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> > nobody   63985  0.5  0.0  53868 10960 ?Ss   Feb02   5:19
> > /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid
> >
> > Looks good :-)
>
> Yes indeed. Now I think it will really be important to report this leak
> to whomever it concerns (probably the distro vendor so that they decide
> whether it's in their own patches or in openssl upstream). My openssl
> version doesn't have krb5 and I have never understood what is needed to
> enable it nor what it provides. Crypto libs tend to be cryptic ...
>
> Willy
>
>


Re: HAproxy constant memory leak

2015-02-03 Thread Willy Tarreau
Hi Georges-Etienne,

On Tue, Feb 03, 2015 at 08:09:15AM -0500, Georges-Etienne Legendre wrote:
> Hi Willy,
> 
> Thanks a lot for this investigation, it was really helpful.
> 
> My OpenSSL is up-to-date on this server. I first tried to remove the chroot
> statement. I'm pretty sure this in itself solved the leak, but I no longer
> have the traces and couple of hours after, our Ops changed the SSL check to
> a simple TCP check on port 443. So, I cannot confirm 100%.
> 
> I can however confirm that I no longer experience the leak. I put back the
> chroot command to be safer.

OK that's great.

> This also prompted me to tweak the SSL ciphers. I now use a more thoughtful
> list of ciphers (
> https://mozilla.github.io/server-side-tls/ssl-config-generator/) and
> disabled SSLv3. This indeed disables KRB5.
> 
> ssl-default-bind-ciphers
> ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
> ssl-default-bind-options no-sslv3

Wow! When we introduced SSL, I expected that a lot of difficulties
would come from it, but not that the ugliest config statements would
come with it as well :-)

> I will keep a close eye on the memory usage... HAproxy has been running for
> about 16 hours now, and here is the ps output:
> # ps -u nobody u
> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> nobody   63985  0.5  0.0  53868 10960 ?Ss   Feb02   5:19
> /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid
> 
> Looks good :-)

Yes indeed. Now I think it will really be important to report this leak
to whomever it concerns (probably the distro vendor so that they decide
whether it's in their own patches or in openssl upstream). My openssl
version doesn't have krb5 and I have never understood what is needed to
enable it nor what it provides. Crypto libs tend to be cryptic ...

Willy




Re: HAproxy constant memory leak

2015-02-03 Thread Georges-Etienne Legendre
Hi Willy,

Thanks a lot for this investigation, it was really helpful.

My OpenSSL is up-to-date on this server. I first tried to remove the chroot
statement. I'm pretty sure this in itself solved the leak, but I no longer
have the traces and couple of hours after, our Ops changed the SSL check to
a simple TCP check on port 443. So, I cannot confirm 100%.

I can however confirm that I no longer experience the leak. I put back the
chroot command to be safer.

This also prompted me to tweak the SSL ciphers. I now use a more thoughtful
list of ciphers (
https://mozilla.github.io/server-side-tls/ssl-config-generator/) and
disabled SSLv3. This indeed disables KRB5.

ssl-default-bind-ciphers
ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
ssl-default-bind-options no-sslv3

I will keep a close eye on the memory usage... HAproxy has been running for
about 16 hours now, and here is the ps output:
# ps -u nobody u
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
nobody   63985  0.5  0.0  53868 10960 ?Ss   Feb02   5:19
/usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid

Looks good :-)

-- Georges-Etienne

On Mon, Feb 2, 2015 at 10:25 AM, Willy Tarreau  wrote:

> Georges-Etienne,
>
> your captures were extremely informative. While I cannot reproduce the
> behaviour here even by reinjecting the same health check requests, I'm
> seeing two really odd things in your trace below :
>
> We accept an SSL connection from the firewall :
>
> 08:15:52.297357 accept(6, {sa_family=AF_INET, sin_port=htons(32764),
> sin_addr=inet_addr("")}, [16]) = 1
>
> It sends 48 bytes :
>
> 08:15:52.297717 read(1, "\200.\1\3\0\0\25\0\0\0\20", 11) = 11
> 08:15:52.297831 read(1, 
> "\0\0\3\0\0\10\0\0\6\4\0\200\0\0\4\0\0\5O\0\0@\202J#i\242K7)\300\2536o\245=\23",
> 37) = 37
>
> Then we're checking for /etc/krb5.conf :
>
> 08:15:52.297984 stat("/etc/krb5.conf", 0x7fff544b1990) = -1 ENOENT (No
> such file or directory)
>
> Then trying to read some random :
>
> 08:15:52.298082 open("/dev/urandom", O_RDONLY) = -1 ENOENT (No such file
> or directory)
>
> Then trying to figure the local host name :
>
> 08:15:52.298187 uname({sys="Linux", node="", ...})
> = 0
>
> Then doing some netlink-based studd :
>
> 08:15:52.298316 socket(PF_NETLINK, SOCK_RAW, 0) = 2
> 08:15:52.298395 bind(2, {sa_family=AF_NETLINK, pid=0, groups=},
> 12) = 0
> 08:15:52.298471 getsockname(2, {sa_family=AF_NETLINK, pid=9103,
> groups=}, [12]) = 0
> 08:15:52.298550 sendto(2, "\24\0\0\0\26\0\1\3\210x\317T\0\0\0\0\0\0\0\0",
> 20, 0, {sa_family=AF_NETLINK, pid=0, groups=}, 12) = 20
> 08:15:52.298650 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
> groups=},
> msg_iov(1)=[{"0\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1\10\0\2\0\177\0\0\1\7\0\3\0lo\0\0<\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\n\0\0\0\10\0\1\0\n\0\35\22\10\0\2\0\n\0\35\22\10\0\4\0\n\0\35?\n\0\3\0bond0\0\0\0<\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\f\0\0\0\10\0\1\0\n\2\177\217\10\0\2\0\n\2\177\217\10\0\4\0\n\2\177\277\n\0\3\0bond2\0\0\0<\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\r\0\0\0\10\0\1\0\nZ\6j"...,
> 4096}], msg_controllen=0, msg_flags=0}, 0) = 356
> 08:15:52.298841 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
> groups=},
> msg_iov(1)=[{"@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377H\3\0\0H\3\0\0@
> \0\0\0\24\0\2\0\210x\317T\217#\0\0\n@
> \200\375\n\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37@
> \24\0\6\0\377\377\377\377\377\377\377\377f\5\0\0f\5\0\0@
> \0\0\0\24\0\2\0\210x\317T\217#\0\0\n@
> \200\375\v\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37A\24\0\6\0\377\377\377\377\377\377\377\377\232\5\0\0\232\5\0\0@\0\0\0\24\0\2\0"...,
> 4096}], msg_controllen=0, msg_flags=0}, 0) = 448
> 08:15:52.299059 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
> groups=},
> msg_iov(1)=[{"\24\0\0\0\3\0\2\0\210x\317T\217#\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377H\3\0\0H\3\0\0@
> \0\0\0\24\0\2\0\210x\317T\217#\0\0\n@
> \200\375\n\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37@
> \24\0\6\0\377\377\377\377\377\377\377\377f\5\0\0f

RE: HAproxy constant memory leak

2015-02-02 Thread Lukas Tribus
> OpenSSL sometimes acts stupidly like this inside a chroot. We've
> encountered a few issues in the past with openssl doing totally crazy
> stuff inside a chroot, including abort() on krb5-related things. From
> what I understood (others, please correct me if I'm wrong), such
> processing may be altered by the type of key or ciphers.
> 
> In my opinion, you should attempt two things :
> 
> 1) ensure that your ssl library is up to date (double checking doesn't
> cost much)
> 
> 2) try it again without the chroot statement to see if when openssl finds
> what it's looking for, the leak stops.
> 
> 3) maybe file a report to the openssl list about a memory leak in that
> exact situation, with the traces you sent to me. Maybe they'll want
> to have your public key as well to verify some assumptions about
> what could be done inside the lib with its properties.

I suppose you can disable KRB5 ciphers to be part of the negotiation by
adding !KRB5 to your cipher configuration. If that works, it is probably
the simplest workaround, unless you actually need KRB5.


Lukas

  

Re: HAproxy constant memory leak

2015-02-02 Thread Willy Tarreau
Georges-Etienne,

your captures were extremely informative. While I cannot reproduce the
behaviour here even by reinjecting the same health check requests, I'm
seeing two really odd things in your trace below :

We accept an SSL connection from the firewall :

08:15:52.297357 accept(6, {sa_family=AF_INET, sin_port=htons(32764), 
sin_addr=inet_addr("")}, [16]) = 1

It sends 48 bytes :

08:15:52.297717 read(1, "\200.\1\3\0\0\25\0\0\0\20", 11) = 11
08:15:52.297831 read(1, 
"\0\0\3\0\0\10\0\0\6\4\0\200\0\0\4\0\0\5O\0\0@\202J#i\242K7)\300\2536o\245=\23",
 37) = 37

Then we're checking for /etc/krb5.conf :

08:15:52.297984 stat("/etc/krb5.conf", 0x7fff544b1990) = -1 ENOENT (No such 
file or directory)

Then trying to read some random :

08:15:52.298082 open("/dev/urandom", O_RDONLY) = -1 ENOENT (No such file or 
directory)

Then trying to figure the local host name :

08:15:52.298187 uname({sys="Linux", node="", ...}) = 0

Then doing some netlink-based studd :

08:15:52.298316 socket(PF_NETLINK, SOCK_RAW, 0) = 2
08:15:52.298395 bind(2, {sa_family=AF_NETLINK, pid=0, groups=}, 12) = 0
08:15:52.298471 getsockname(2, {sa_family=AF_NETLINK, pid=9103, 
groups=}, [12]) = 0
08:15:52.298550 sendto(2, "\24\0\0\0\26\0\1\3\210x\317T\0\0\0\0\0\0\0\0", 20, 
0, {sa_family=AF_NETLINK, pid=0, groups=}, 12) = 20
08:15:52.298650 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=}, 
msg_iov(1)=[{"0\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1\10\0\2\0\177\0\0\1\7\0\3\0lo\0\0<\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\n\0\0\0\10\0\1\0\n\0\35\22\10\0\2\0\n\0\35\22\10\0\4\0\n\0\35?\n\0\3\0bond0\0\0\0<\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\f\0\0\0\10\0\1\0\n\2\177\217\10\0\2\0\n\2\177\217\10\0\4\0\n\2\177\277\n\0\3\0bond2\0\0\0<\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\r\0\0\0\10\0\1\0\nZ\6j"...,
 4096}], msg_controllen=0, msg_flags=0}, 0) = 356
08:15:52.298841 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=}, 
msg_iov(1)=[{"@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377H\3\0\0H\3\0\0@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n@\200\375\n\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37@\24\0\6\0\377\377\377\377\377\377\377\377f\5\0\0f\5\0\0@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n@\200\375\v\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37A\24\0\6\0\377\377\377\377\377\377\377\377\232\5\0\0\232\5\0\0@\0\0\0\24\0\2\0"...,
 4096}], msg_controllen=0, msg_flags=0}, 0) = 448
08:15:52.299059 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=}, 
msg_iov(1)=[{"\24\0\0\0\3\0\2\0\210x\317T\217#\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377H\3\0\0H\3\0\0@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n@\200\375\n\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37@\24\0\6\0\377\377\377\377\377\377\377\377f\5\0\0f\5\0\0@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n@\200\375\v\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37A\24\0\6\0\377\377\377\377\377\377\377\377\232\5\0\0\232\5\0\0@\0\0\0\24\0\2\0"...,
 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
08:15:52.299242 close(2)= 0

Then trying to open nsswitch.conf :

08:15:52.299353 open("/etc/nsswitch.conf", O_RDONLY) = -1 ENOENT (No such file 
or directory)

Then does the netlink + nsswitch dance a second time, followed by about
10 times the following with various domain name suffixes :

08:15:52.300841 open("/etc/resolv.conf", O_RDONLY) = -1 ENOENT (No such file or 
directory)
08:15:52.300938 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 2
08:15:52.301018 connect(2, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
08:15:52.301100 poll([{fd=2, events=POLLOUT}], 1, 0) = 1 ([{fd=2, 
revents=POLLOUT}])
08:15:52.301179 sendto(2, "\327\r\1\0\0\1\0\0\0\0\0\0\t_kerberos\f\0\0\20\0\1", 51, MSG_NOSIGNAL, NULL, 0) = 51
08:15:52.301296 poll([{fd=2, events=POLLIN}], 1, 5000) = 1 ([{fd=2, 
revents=POLLERR}])
08:15:52.301373 close(2)= 0

Etc. It does that *a lot*. A few times we're seeing brk() with an
increasing value though it's not huge enough to prove everything leaks
there, but it proves that it happens inside openssl, since it's between
a read() performed by openssl and a stat() performed by it as well :

08:16:02.055371 epoll_wait(0, {{EPOLLIN, {u32=6, u64=6}}, {EPOLLIN, {u32=5, 
u64=5}}}, 200, 159) = 2
08:16:02.055457 accept(6, {sa_family=AF_INET, sin_port=htons(13053), 
sin_addr=inet_addr()}, [16]) = 2
08:16:02.00 fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
08:16:02.055658 accept(6, 0x7fff544b3e70, [128]) = -1 EAGAIN (Resource 
temporarily unavailable)
08:16:02.055806 read(2, "\200.\1\3\0\0\25\0\0\0\20", 11) = 11
08:16:02.055908 read(2, "\0\0\3\0\0\10\0\0\6\4\0\200\0\0\4\0\0\5\233G\314"..., 
37) = 37
08:16:02.056005 brk(0x5062d000) = 0x5062d00

Re: HAproxy constant memory leak

2015-02-02 Thread Willy Tarreau
Hi Georges-Etienne,

On Mon, Feb 02, 2015 at 08:35:21AM -0500, Georges-Etienne Legendre wrote:
> Thanks for your help.
> 
> The configuration is now back to 5000 maxconn, and Haproxy has been running
> with this config over the last weekend. The memory footprint is now >1G.

OK, so there's no doubt about it.

> # ps -u nobody u
> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> nobody9103  0.7  3.9 1334192 1291740 ? Ss   Jan30  30:03
> /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid
> 
> # telnet localhost 1935
> show pools
> Dumping pools usage. Use SIGQUIT to flush them.
>   - Pool pipe (32 bytes) : 5 allocated (160 bytes), 5 used, 3 users [SHARED]
>   - Pool capture (64 bytes) : 0 allocated (0 bytes), 0 used, 1 users
> [SHARED]
>   - Pool channel (80 bytes) : 168 allocated (13440 bytes), 6 used, 1 users
> [SHARED]
>   - Pool task (112 bytes) : 149 allocated (16688 bytes), 67 used, 1 users
> [SHARED]
>   - Pool uniqueid (128 bytes) : 0 allocated (0 bytes), 0 used, 1 users
> [SHARED]
>   - Pool connection (320 bytes) : 168 allocated (53760 bytes), 6 used, 1
> users [SHARED]
>   - Pool hdr_idx (416 bytes) : 84 allocated (34944 bytes), 2 used, 1 users
> [SHARED]
>   - Pool session (864 bytes) : 85 allocated (73440 bytes), 3 used, 1 users
> [SHARED]
>   - Pool requri (1024 bytes) : 21 allocated (21504 bytes), 0 used, 1 users
> [SHARED]
>   - Pool buffer (16416 bytes) : 168 allocated (2757888 bytes), 6 used, 1
> users [SHARED]
> Total: 10 pools, 2971824 bytes allocated, 111984 used.

Impressive, nothing *seems* to be used here. So we're leaking somewhere else.
I spent the whole afternoon on Friday trying to reproduce various cases using
connection probes, with/without data, with/without request, etc, but couldn't
come to anything related to what you're seeing.

> I've also executed the tcpdump + strace for about 30 seconds. There should
> be no confidential info, but to be sure, I will send them to you personally.

Great, thank you, I'll study all this. I really appreciate your help on
this bug! I hope I can reproduce it and spot what is happening.

Best regards,
Willy




Re: HAproxy constant memory leak

2015-02-02 Thread Georges-Etienne Legendre
Thanks for your help.

The configuration is now back to 5000 maxconn, and Haproxy has been running
with this config over the last weekend. The memory footprint is now >1G.

# ps -u nobody u
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
nobody9103  0.7  3.9 1334192 1291740 ? Ss   Jan30  30:03
/usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid

# telnet localhost 1935
show pools
Dumping pools usage. Use SIGQUIT to flush them.
  - Pool pipe (32 bytes) : 5 allocated (160 bytes), 5 used, 3 users [SHARED]
  - Pool capture (64 bytes) : 0 allocated (0 bytes), 0 used, 1 users
[SHARED]
  - Pool channel (80 bytes) : 168 allocated (13440 bytes), 6 used, 1 users
[SHARED]
  - Pool task (112 bytes) : 149 allocated (16688 bytes), 67 used, 1 users
[SHARED]
  - Pool uniqueid (128 bytes) : 0 allocated (0 bytes), 0 used, 1 users
[SHARED]
  - Pool connection (320 bytes) : 168 allocated (53760 bytes), 6 used, 1
users [SHARED]
  - Pool hdr_idx (416 bytes) : 84 allocated (34944 bytes), 2 used, 1 users
[SHARED]
  - Pool session (864 bytes) : 85 allocated (73440 bytes), 3 used, 1 users
[SHARED]
  - Pool requri (1024 bytes) : 21 allocated (21504 bytes), 0 used, 1 users
[SHARED]
  - Pool buffer (16416 bytes) : 168 allocated (2757888 bytes), 6 used, 1
users [SHARED]
Total: 10 pools, 2971824 bytes allocated, 111984 used.



I've also executed the tcpdump + strace for about 30 seconds. There should
be no confidential info, but to be sure, I will send them to you personally.

Let me know if I can capture anything else that could be helpful.

Thanks!

-- Georges-Etienne

On Sat, Jan 31, 2015 at 8:40 AM, Willy Tarreau  wrote:

> On Sat, Jan 31, 2015 at 12:59:34AM +0100, Lukas Tribus wrote:
> > > The maxconn was set to 4096 before, and after 45 days, haproxy was
> > > using 20gigs...
> >
> > Ok, can you set maxconn back to 4096, reproduce the leak (to at least
> > a few gigabytes) and a run "show pools" a few times to see where
> > exactly the memory consumption comes from?
>
> Also, could you please send a network capture of the checks from
> the firewall to haproxy (if possible, taken on the haproxy side) ?
> It is possible that there is a specific sequence leading to an
> improper close (eg: some SSL structs not being released at certain
> steps in the handhskake, etc).
>
> Please use this to take your capture :
>
> tcpdump -vs0 -pi eth0 -w checks.cap host  and port
> 
>
> Wait for several seconds, then Ctrl-C. Be careful, your capture
> will contain all the traffic flowing between haproxy and the
> firewall's address facing it, so there might be confidential
> information there, only send to the list if you think it's OK.
>
> Ideally, in parallel you can try to strace haproxy during this
> capture :
>
>strace -tts200 -o checks.log -p $(pgrep haproxy)
>
> Thanks,
> Willy
>
>


Re: HAproxy constant memory leak

2015-01-31 Thread Georges-Etienne Legendre
The maxconn was set to 4096 before, and after 45 days, haproxy was using
20gigs...

What else could it be?

-- Georges-Etienne

On Fri, Jan 30, 2015 at 1:49 PM, Lukas Tribus  wrote:

> With "maxconn 5" this is expected behavior, because haproxy will use
> RAM up to an amount that is justified for 5 concurrent connections.
> Configure maxconn to a proper and real value and the RAM usage will be
> predictable. Lukas
>


Re: HAproxy constant memory leak

2015-01-31 Thread Willy Tarreau
On Sat, Jan 31, 2015 at 12:59:34AM +0100, Lukas Tribus wrote:
> > The maxconn was set to 4096 before, and after 45 days, haproxy was  
> > using 20gigs... 
> 
> Ok, can you set maxconn back to 4096, reproduce the leak (to at least
> a few gigabytes) and a run "show pools" a few times to see where
> exactly the memory consumption comes from?

Also, could you please send a network capture of the checks from
the firewall to haproxy (if possible, taken on the haproxy side) ?
It is possible that there is a specific sequence leading to an
improper close (eg: some SSL structs not being released at certain
steps in the handhskake, etc).

Please use this to take your capture :

tcpdump -vs0 -pi eth0 -w checks.cap host  and port 

Wait for several seconds, then Ctrl-C. Be careful, your capture
will contain all the traffic flowing between haproxy and the
firewall's address facing it, so there might be confidential
information there, only send to the list if you think it's OK.

Ideally, in parallel you can try to strace haproxy during this
capture :

   strace -tts200 -o checks.log -p $(pgrep haproxy)

Thanks,
Willy




RE: HAproxy constant memory leak

2015-01-31 Thread Lukas Tribus
> The maxconn was set to 4096 before, and after 45 days, haproxy was  
> using 20gigs... 

Ok, can you set maxconn back to 4096, reproduce the leak (to at least
a few gigabytes) and a run "show pools" a few times to see where
exactly the memory consumption comes from?

Lukas
  


RE: HAproxy constant memory leak

2015-01-31 Thread Lukas Tribus
With "maxconn 5" this is expected behavior, because haproxy will use
RAM up to an amount that is justified for 5 concurrent connections.

Configure maxconn to a proper and real value and the RAM usage will be
predictable.


Lukas

  

HAproxy constant memory leak

2015-01-28 Thread Georges-Etienne Legendre
Hi all,

We're deploying HAproxy and we're experiencing what apprears to be a memory
leak. After couple of days, HAproxy is consuming gigs of RAM.

Running a ps command returns:

# ps -u nobody u
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
nobody   29960  0.2  1.0 387220 343472 ?   Rs   Jan27   2:40
/usr/sbin/haproxy -D -f /etc/haproxy/haproxy


Yesterday, I've been tracking the VSZ memory and haproxy leaks ~655 bytes
per 30 minutes. Today, we're at 387220 bytes.

14:00 128396
14:10 131624
14:20 134868
14:30 138100


What is totally strange is that we have the same version of HAproxy running
in another environment, and there is no leak there... However, there is
just one difference: In the environment that leaks, there is a firewall
pinging HAproxy every second (TCP open/close). I see that HAproxy is
reporting those ping as request errors (ereq). Here is a trace when running
haproxy with the -d flag:

:web.accept(0005)=0009 from [10.90.19.3:25611]
:web.clicls[0009:]
:web.closed[0009:]
0002:web.accept(0005)=000a from [10.90.19.2:52213]
0002:web.clicls[000a:]
0002:web.closed[000a:]
0004:web.accept(0005)=000a from [10.90.19.2:52357]
0004:web.clicls[000a:]
0004:web.closed[000a:]
0007:web.accept(0005)=000a from [10.90.19.2:52502]
0007:web.clicls[000a:]
0007:web.closed[000a:]


Anyway idea what could be wrong?
Thanks!



Our configuration is pretty simple:

global
log 127.0.0.1 local1 info
chroot /usr/share/haproxy
maxconn 5
uid 99
gid 99
daemon
tune.ssl.default-dh-param 1024
stats socket :1935 level admin
stats timeout 2m

defaults
log global
modehttp
retries 3
timeout connect 5s
timeout client 60s
timeout server 120s
timeout queue 60s
timeout http-request 15s
timeout http-keep-alive 15s
balance roundrobin
option http-keep-alive
option forwardfor
option httpchk OPTIONS /
option httplog clf
option dontlognull

frontend web
bind *:80
bind *:443 ssl crt /etc/pki/tls/wildcardssl.pem

use_backend app_auth if { path_dir app-authnz }
use_backend app_stats if { path_dir app-stats }
use_backend app_search if { path_dir app-search }
use_backend app_settings if { path_dir app-settings }

backend app_auth
reqrep ^([^\ :]*)\ /[^/]*/(.*) \1\ /\2
server fe01 127.0.0.1:4 check
server fe02 10.2.127.144:4 check backup
server fe03 10.2.127.145:4 check backup
server fe04 10.2.127.146:4 check backup


Here is some additional details:

# haproxy -vv
HA-Proxy version 1.5.9 2014/11/25
Copyright 2000-2014 Willy Tarreau 

Build options :
  TARGET  = linux26
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IP_FREEBIND

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.


--


# telnet localhost 1935
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
show info;show stat;show pools
Name: HAProxy
Version: 1.5.9
Release_date: 2014/11/25
Nbproc: 1
Process_num: 1
Pid: 29960
Uptime: 0d 0h12m25s
Uptime_sec: 745
Memmax_MB: 0
Ulimit-n: 100033
Maxsock: 100033
Maxconn: 5
Hard_maxconn: 5
CurrConns: 1
CumConns: 1630
CumReq: 1684
MaxSslConns: 0
CurrSslConns: 0
CumSslConns: 856
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 4
ConnRateLimit: 0
MaxConnRate: 9
SessRate: 4
SessRateLimit: 0
MaxSessRate: 9
SslRate: 2
SslRateLimit: 0
MaxSslRate: 4
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 1
SslFrontendSessionReuse_pct: 100
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 10
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 18
Run_queue: 1
Idle_pct: 100
node: toro63nsfe01.pf.spop.ca
description:

#
pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,ra