Re: HAproxy constant memory leak
Hi Willy, Yes, please send me the script. Thanks! -- Georges-Etienne > Le 2015-02-06 à 01:55, Willy Tarreau a écrit : > > Hi Georges-Etienne, > >> On Thu, Feb 05, 2015 at 09:10:25PM -0500, Georges-Etienne Legendre wrote: >> Hi Willy, >> >> I'm not sure how to document this leak. I don't know exactly how is >> implemented the firewall SSL health check... Would the Wireshark trace be >> enough to report the issue? > > Yes I think it will definitely help, given that the exchange is very short, > it basically is an SSL hello. It is important to report the strace output > as well and to mention that haproxy is running in a chroot. > > If that helps you, I can send you the script I wrote from your capture. > You could try it to confirm it does the same effect as the firewall's > checks. Once confirmed, it will be much easier to build a bug report > given that you'll simply have to attach the script as a reproducer. > > Best regards, > Willy >
Re: HAproxy constant memory leak
Hi Willy, I'm not sure how to document this leak. I don't know exactly how is implemented the firewall SSL health check... Would the Wireshark trace be enough to report the issue? Thanks! -- Georges-Etienne On Tue, Feb 3, 2015 at 5:52 PM, Willy Tarreau wrote: > Hi Georges-Etienne, > > On Tue, Feb 03, 2015 at 08:09:15AM -0500, Georges-Etienne Legendre wrote: > > Hi Willy, > > > > Thanks a lot for this investigation, it was really helpful. > > > > My OpenSSL is up-to-date on this server. I first tried to remove the > chroot > > statement. I'm pretty sure this in itself solved the leak, but I no > longer > > have the traces and couple of hours after, our Ops changed the SSL check > to > > a simple TCP check on port 443. So, I cannot confirm 100%. > > > > I can however confirm that I no longer experience the leak. I put back > the > > chroot command to be safer. > > OK that's great. > > > This also prompted me to tweak the SSL ciphers. I now use a more > thoughtful > > list of ciphers ( > > https://mozilla.github.io/server-side-tls/ssl-config-generator/) and > > disabled SSLv3. This indeed disables KRB5. > > > > ssl-default-bind-ciphers > > > ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA > > ssl-default-bind-options no-sslv3 > > Wow! When we introduced SSL, I expected that a lot of difficulties > would come from it, but not that the ugliest config statements would > come with it as well :-) > > > I will keep a close eye on the memory usage... HAproxy has been running > for > > about 16 hours now, and here is the ps output: > > # ps -u nobody u > > USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND > > nobody 63985 0.5 0.0 53868 10960 ?Ss Feb02 5:19 > > /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid > > > > Looks good :-) > > Yes indeed. Now I think it will really be important to report this leak > to whomever it concerns (probably the distro vendor so that they decide > whether it's in their own patches or in openssl upstream). My openssl > version doesn't have krb5 and I have never understood what is needed to > enable it nor what it provides. Crypto libs tend to be cryptic ... > > Willy > >
Re: HAproxy constant memory leak
Hi Willy, Thanks a lot for this investigation, it was really helpful. My OpenSSL is up-to-date on this server. I first tried to remove the chroot statement. I'm pretty sure this in itself solved the leak, but I no longer have the traces and couple of hours after, our Ops changed the SSL check to a simple TCP check on port 443. So, I cannot confirm 100%. I can however confirm that I no longer experience the leak. I put back the chroot command to be safer. This also prompted me to tweak the SSL ciphers. I now use a more thoughtful list of ciphers ( https://mozilla.github.io/server-side-tls/ssl-config-generator/) and disabled SSLv3. This indeed disables KRB5. ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA ssl-default-bind-options no-sslv3 I will keep a close eye on the memory usage... HAproxy has been running for about 16 hours now, and here is the ps output: # ps -u nobody u USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND nobody 63985 0.5 0.0 53868 10960 ?Ss Feb02 5:19 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid Looks good :-) -- Georges-Etienne On Mon, Feb 2, 2015 at 10:25 AM, Willy Tarreau wrote: > Georges-Etienne, > > your captures were extremely informative. While I cannot reproduce the > behaviour here even by reinjecting the same health check requests, I'm > seeing two really odd things in your trace below : > > We accept an SSL connection from the firewall : > > 08:15:52.297357 accept(6, {sa_family=AF_INET, sin_port=htons(32764), > sin_addr=inet_addr("")}, [16]) = 1 > > It sends 48 bytes : > > 08:15:52.297717 read(1, "\200.\1\3\0\0\25\0\0\0\20", 11) = 11 > 08:15:52.297831 read(1, > "\0\0\3\0\0\10\0\0\6\4\0\200\0\0\4\0\0\5O\0\0@\202J#i\242K7)\300\2536o\245=\23", > 37) = 37 > > Then we're checking for /etc/krb5.conf : > > 08:15:52.297984 stat("/etc/krb5.conf", 0x7fff544b1990) = -1 ENOENT (No > such file or directory) > > Then trying to read some random : > > 08:15:52.298082 open("/dev/urandom", O_RDONLY) = -1 ENOENT (No such file > or directory) > > Then trying to figure the local host name : > > 08:15:52.298187 uname({sys="Linux", node="", ...}) > = 0 > > Then doing some netlink-based studd : > > 08:15:52.298316 socket(PF_NETLINK, SOCK_RAW, 0) = 2 > 08:15:52.298395 bind(2, {sa_family=AF_NETLINK, pid=0, groups=}, > 12) = 0 > 08:15:52.298471 getsockname(2, {sa_family=AF_NETLINK, pid=9103, > groups=}, [12]) = 0 > 08:15:52.298550 sendto(2, "\24\0\0\0\26\0\1\3\210x\317T\0\0\0\0\0\0\0\0", > 20, 0, {sa_family=AF_NETLINK, pid=0, groups=}, 12) = 20 > 08:15:52.298650 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0, > groups=}, > msg_iov(1)=[{"0\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1\10\0\2\0\177\0\0\1\7\0\3\0lo\0\0<\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\n\0\0\0\10\0\1\0\n\0\35\22\10\0\2\0\n\0\35\22\10\0\4\0\n\0\35?\n\0\3\0bond0\0\0\0<\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\f\0\0\0\10\0\1\0\n\2\177\217\10\0\2\0\n\2\177\217\10\0\4\0\n\2\177\277\n\0\3\0bond2\0\0\0<\0\0\0\24\0\2\0\210x\317T\217#\0\0\2\32\200\0\r\0\0\0\10\0\1\0\nZ\6j"..., > 4096}], msg_controllen=0, msg_flags=0}, 0) = 356 > 08:15:52.298841 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0, > groups=}, > msg_iov(1)=[{"@\0\0\0\24\0\2\0\210x\317T\217#\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377H\3\0\0H\3\0\0@ > \0\0\0\24\0\2\0\210x\317T\217#\0\0\n@ > \200\375\n\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37@ > \24\0\6\0\377\377\377\377\377\377\377\377f\5\0\0f\5\0\0@ > \0\0\0\24\0\2\0\210x\317T\217#\0\0\n@ > \200\375\v\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37A\24\0\6\0\377\377\377\377\377\377\377\377\232\5\0\0\232\5\0\0@\0\0\0\24\0\2\0"..., > 4096}], msg_controllen=0, msg_flags=0}, 0) = 448 > 08:15:52.299059 recvmsg(2, {msg_name(12)={sa_family=AF_NETLINK, pid=0, > groups=}, > msg_iov(1)=[{"\24\0\0\0\3\0\2\0\210x\317T\217#\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377H\3\0\0H\3\0\0@ > \0\0\0\24\0\2\0\210x\317T\217#\0\0\n@ > \200\375\n\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\3064k\377\376\256\37@ > \24\0\6\0\377\377\377\377\377\377\377\377f\5\0\0f
Re: HAproxy constant memory leak
Thanks for your help. The configuration is now back to 5000 maxconn, and Haproxy has been running with this config over the last weekend. The memory footprint is now >1G. # ps -u nobody u USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND nobody9103 0.7 3.9 1334192 1291740 ? Ss Jan30 30:03 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid # telnet localhost 1935 show pools Dumping pools usage. Use SIGQUIT to flush them. - Pool pipe (32 bytes) : 5 allocated (160 bytes), 5 used, 3 users [SHARED] - Pool capture (64 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED] - Pool channel (80 bytes) : 168 allocated (13440 bytes), 6 used, 1 users [SHARED] - Pool task (112 bytes) : 149 allocated (16688 bytes), 67 used, 1 users [SHARED] - Pool uniqueid (128 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED] - Pool connection (320 bytes) : 168 allocated (53760 bytes), 6 used, 1 users [SHARED] - Pool hdr_idx (416 bytes) : 84 allocated (34944 bytes), 2 used, 1 users [SHARED] - Pool session (864 bytes) : 85 allocated (73440 bytes), 3 used, 1 users [SHARED] - Pool requri (1024 bytes) : 21 allocated (21504 bytes), 0 used, 1 users [SHARED] - Pool buffer (16416 bytes) : 168 allocated (2757888 bytes), 6 used, 1 users [SHARED] Total: 10 pools, 2971824 bytes allocated, 111984 used. I've also executed the tcpdump + strace for about 30 seconds. There should be no confidential info, but to be sure, I will send them to you personally. Let me know if I can capture anything else that could be helpful. Thanks! -- Georges-Etienne On Sat, Jan 31, 2015 at 8:40 AM, Willy Tarreau wrote: > On Sat, Jan 31, 2015 at 12:59:34AM +0100, Lukas Tribus wrote: > > > The maxconn was set to 4096 before, and after 45 days, haproxy was > > > using 20gigs... > > > > Ok, can you set maxconn back to 4096, reproduce the leak (to at least > > a few gigabytes) and a run "show pools" a few times to see where > > exactly the memory consumption comes from? > > Also, could you please send a network capture of the checks from > the firewall to haproxy (if possible, taken on the haproxy side) ? > It is possible that there is a specific sequence leading to an > improper close (eg: some SSL structs not being released at certain > steps in the handhskake, etc). > > Please use this to take your capture : > > tcpdump -vs0 -pi eth0 -w checks.cap host and port > > > Wait for several seconds, then Ctrl-C. Be careful, your capture > will contain all the traffic flowing between haproxy and the > firewall's address facing it, so there might be confidential > information there, only send to the list if you think it's OK. > > Ideally, in parallel you can try to strace haproxy during this > capture : > >strace -tts200 -o checks.log -p $(pgrep haproxy) > > Thanks, > Willy > >
Re: HAproxy constant memory leak
The maxconn was set to 4096 before, and after 45 days, haproxy was using 20gigs... What else could it be? -- Georges-Etienne On Fri, Jan 30, 2015 at 1:49 PM, Lukas Tribus wrote: > With "maxconn 5" this is expected behavior, because haproxy will use > RAM up to an amount that is justified for 5 concurrent connections. > Configure maxconn to a proper and real value and the RAM usage will be > predictable. Lukas >
HAproxy constant memory leak
Hi all, We're deploying HAproxy and we're experiencing what apprears to be a memory leak. After couple of days, HAproxy is consuming gigs of RAM. Running a ps command returns: # ps -u nobody u USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND nobody 29960 0.2 1.0 387220 343472 ? Rs Jan27 2:40 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy Yesterday, I've been tracking the VSZ memory and haproxy leaks ~655 bytes per 30 minutes. Today, we're at 387220 bytes. 14:00 128396 14:10 131624 14:20 134868 14:30 138100 What is totally strange is that we have the same version of HAproxy running in another environment, and there is no leak there... However, there is just one difference: In the environment that leaks, there is a firewall pinging HAproxy every second (TCP open/close). I see that HAproxy is reporting those ping as request errors (ereq). Here is a trace when running haproxy with the -d flag: :web.accept(0005)=0009 from [10.90.19.3:25611] :web.clicls[0009:] :web.closed[0009:] 0002:web.accept(0005)=000a from [10.90.19.2:52213] 0002:web.clicls[000a:] 0002:web.closed[000a:] 0004:web.accept(0005)=000a from [10.90.19.2:52357] 0004:web.clicls[000a:] 0004:web.closed[000a:] 0007:web.accept(0005)=000a from [10.90.19.2:52502] 0007:web.clicls[000a:] 0007:web.closed[000a:] Anyway idea what could be wrong? Thanks! Our configuration is pretty simple: global log 127.0.0.1 local1 info chroot /usr/share/haproxy maxconn 5 uid 99 gid 99 daemon tune.ssl.default-dh-param 1024 stats socket :1935 level admin stats timeout 2m defaults log global modehttp retries 3 timeout connect 5s timeout client 60s timeout server 120s timeout queue 60s timeout http-request 15s timeout http-keep-alive 15s balance roundrobin option http-keep-alive option forwardfor option httpchk OPTIONS / option httplog clf option dontlognull frontend web bind *:80 bind *:443 ssl crt /etc/pki/tls/wildcardssl.pem use_backend app_auth if { path_dir app-authnz } use_backend app_stats if { path_dir app-stats } use_backend app_search if { path_dir app-search } use_backend app_settings if { path_dir app-settings } backend app_auth reqrep ^([^\ :]*)\ /[^/]*/(.*) \1\ /\2 server fe01 127.0.0.1:4 check server fe02 10.2.127.144:4 check backup server fe03 10.2.127.145:4 check backup server fe04 10.2.127.146:4 check backup Here is some additional details: # haproxy -vv HA-Proxy version 1.5.9 2014/11/25 Copyright 2000-2014 Willy Tarreau Build options : TARGET = linux26 CPU = generic CC = gcc CFLAGS = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1 Default settings : maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200 Encrypted password support via crypt(3): yes Built with zlib version : 1.2.3 Compression algorithms supported : identity, deflate, gzip Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013 Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013 OpenSSL library supports TLS extensions : yes OpenSSL library supports SNI : yes OpenSSL library supports prefer-server-ciphers : yes Built with PCRE version : 7.8 2008-09-05 PCRE library supports JIT : no (USE_PCRE_JIT not set) Built with transparent proxy support using: IP_TRANSPARENT IP_FREEBIND Available polling systems : epoll : pref=300, test result OK poll : pref=200, test result OK select : pref=150, test result OK Total: 3 (3 usable), will use epoll. -- # telnet localhost 1935 Trying ::1... telnet: connect to address ::1: Connection refused Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. show info;show stat;show pools Name: HAProxy Version: 1.5.9 Release_date: 2014/11/25 Nbproc: 1 Process_num: 1 Pid: 29960 Uptime: 0d 0h12m25s Uptime_sec: 745 Memmax_MB: 0 Ulimit-n: 100033 Maxsock: 100033 Maxconn: 5 Hard_maxconn: 5 CurrConns: 1 CumConns: 1630 CumReq: 1684 MaxSslConns: 0 CurrSslConns: 0 CumSslConns: 856 Maxpipes: 0 PipesUsed: 0 PipesFree: 0 ConnRate: 4 ConnRateLimit: 0 MaxConnRate: 9 SessRate: 4 SessRateLimit: 0 MaxSessRate: 9 SslRate: 2 SslRateLimit: 0 MaxSslRate: 4 SslFrontendKeyRate: 0 SslFrontendMaxKeyRate: 1 SslFrontendSessionReuse_pct: 100 SslBackendKeyRate: 0 SslBackendMaxKeyRate: 0 SslCacheLookups: 10 SslCacheMisses: 0 CompressBpsIn: 0 CompressBpsOut: 0 CompressBpsRateLim: 0 ZlibMemUsage: 0 MaxZlibMemUsage: 0 Tasks: 18 Run_queue: 1 Idle_pct: 100 node: toro63nsfe01.pf.spop.ca description: # pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,ra