Re: Crash with kernel error
Lukas,

1.6.3 didn't have any crashes. These crashes are sporadic and do not happen under load; there is very little traffic as we are not running production yet. The proxy starts fine and can run for hours before the crash.

Where would the core be generated? I set it up to run as user haproxy; would I have to adjust limits for that user?

Thank you for all your help,

On Wed, May 11, 2016 at 4:02 PM, Lukas Tribus wrote:
> Hi Sasha,
>
> so the crash happens sporadically after hours of production traffic? Or
> does it crash right away after you start it?
>
> You are saying this started with 1.6.4, what was the version you used
> before and that worked fine? 1.6.3?
>
> Before starting haproxy, enable core dumping like this:
>
> ulimit -c unlimited
>
> Confirm it's unlimited (right before starting haproxy from this shell):
>
> ulimit -c
>
> Disabling compiler optimizations will make sure the generated coredump is
> as meaningful as possible, you can do it like this:
>
> make clean; make CFLAGS="-O0 -g -fno-strict-aliasing -Wdeclaration-after-statement" TARGET=linux2628 USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1
>
> But be advised that there will be performance/cpu impact, so you better
> monitor it.
>
> When you have a coredump, you can provide a backtrace with gdb like this:
>
> gdb
>
> and issuing a "bt full"
>
> Regards,
>
> Lukas
Re: [PATCH] MEDIUM: init: allow directory as argument of -f
Hi Willy,

> > I don't receive all the mails from haproxy@formilux.org.
> > For example I didn't receive:
> > http://article.gmane.org/gmane.comp.web.haproxy/27795
>
> Well, this one was sent to you directly by Cyril, so you should have
> received it.

Indeed, it was this one: http://article.gmane.org/gmane.comp.web.haproxy/27802
I will check my anti-spam. Anyway:

> I've subscribed you by hand now.

I now receive all the mailing list mail. Thanks a lot.

> > > If tmp_str fails and wlt succeeds, wlt is not freed.
> > If tmp_str fails and wlt succeeds we still got the Alert and
> > everything is freed on exit.
>
> Yes I know but as I said, if/when such code later moves to its own
> function, this function might initially decide to exit then to let
> the caller take the decision and one day all of this will be used
> dynamically or from the CLI and then people discover a memory leak.
> And there are the valgrind users who send patches very often to fix
> such warnings that annoy them. I mean we spent a lot of time killing
> some such old issues that were not bugs initially and that became
> bugs later, so we try to be careful. We don't want to be the next
> openssl if you see what I mean :-)

Yes I see :-)
For the record I now always check my code with "valgrind --leak-check=full".

> > I create the function "void cfgfiles_expand_directories(void)", but
> > not the "load_config_file" one.
> > I am not accustomed to using goto and it's hard for me to use it
> > here as I actually don't see the point of it (in
> > cfgfiles_expand_directories).
>
> That's the best way to deal with error unrolling. I'm sad that
> teachers at school teach students not to use it because :
>   1) it's what the compiler implements anyway for all other
>      constructs
>   2) it's the only safe way to perform unrolling which resists to
>      code additions.
>
> We used to have some leaks in the past because we were not using it.
> When you have some session initialization code like this :
>
>     s = malloc(sizeof(*s));
>     if (!s)
>         return;
>
>     s->req = malloc(sizeof(*s->req));
>     if (!s->req) {
>         free(s);
>         return;
>     }
>
>     s->res = malloc(sizeof(*s->res));
>     if (!s->res) {
>         free(s->req);
>         free(s);
>         return;
>     }
>
>     s->txn = malloc(sizeof(*s->txn));
>     if (!s->txn) {
>         free(s->res);
>         free(s->req);
>         free(s);
>         return;
>     }
>
>     s->log = malloc(sizeof(*s->log));
>     if (!s->log) {
>         free(s->txn);
>         free(s->res);
>         free(s->req);
>         free(s);
>         return;
>     }
>
>     s->req_capture = malloc(sizeof(*s->req_capture));
>     if (!s->req_capture) {
>         free(s->log);
>         free(s->txn);
>         free(s->res);
>         free(s->req);
>         free(s);
>         return;
>     }
>
>     s->res_capture = malloc(sizeof(*s->res_capture));
>     if (!s->res_capture) {
>         free(s->req_capture);
>         free(s->log);
>         free(s->txn);
>         free(s->res);
>         free(s->req);
>         free(s);
>         return;
>     }
>
>     ... and so on for 10 entries ...
>
> Then you may already have bugs above due to the inevitable copy-paste,
> and once you insert a new field in the middle (eg: s->vars)
> you're pretty sure that you will miss it in one of the next "if"
> blocks because they are never as clear as above but themselves
> enclosed within other "if" blocks. And when you need to switch
> allocation order, that's even worse.
> But the horrible thing above can be safely turned into this :
>
>     s = malloc(sizeof(*s));
>     if (!s)
>         goto fail_s;
>
>     s->req = malloc(sizeof(*s->req));
>     if (!s->req)
>         goto fail_req;
>
>     s->res = malloc(sizeof(*s->res));
>     if (!s->res)
>         goto fail_res;
>
>     s->txn = malloc(sizeof(*s->txn));
>     if (!s->txn)
>         goto fail_txn;
>
>     s->log = malloc(sizeof(*s->log));
>     if (!s->log)
>         goto fail_log;
>
>     s->req_capture = malloc(sizeof(*s->req_capture));
>     if (!s->req_capture)
>         goto fail_req_cap;
>
>     s->res_capture = malloc(sizeof(*s->res_capture));
>     if (!s->res_capture)
>         goto fail_res_cap;
>
>     return s;
>
>   fail_res_cap:
>     free(s->req_capture);
>   fail_req_cap:
>     free(s->log);
>   fail_log:
>     free(s->txn);
>   fail_txn:
>     free(s->res);
>   fail_res:
>     free(s->req);
>   fail_req:
>     free(s);
>   fail_s:
>     return NULL;
>
> And a nice side effect is that when you look at the assembly code,
> it's much smaller and much more efficient.
>
> > It doesn't reduce the number of lines and, as after every alert we
> > call exit, there is no need to clean anything.
>
> As I explained above I agree on this but code correctness and code
> cleanness are two different things. We try to have a bit of
> modularity because we know that code moves and that it's better if it
> can move safely. For example I recently
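To make the quoted pattern concrete, here is a minimal compilable sketch of the same goto-based unrolling. It assumes a simplified, hypothetical struct session with only three members; the types and the names session_new and session_free are illustrative and are not HAProxy's actual code.

```c
#include <stdlib.h>

/* Hypothetical, simplified types: these are NOT HAProxy's real structures. */
struct buf { char data[64]; };

struct session {
    struct buf *req;
    struct buf *res;
    struct buf *txn;
};

/* Allocates a session and its members. On any failure, everything that was
 * allocated so far is released in reverse order and NULL is returned. */
struct session *session_new(void)
{
    struct session *s;

    s = malloc(sizeof(*s));
    if (!s)
        goto fail_s;

    s->req = malloc(sizeof(*s->req));
    if (!s->req)
        goto fail_req;

    s->res = malloc(sizeof(*s->res));
    if (!s->res)
        goto fail_res;

    s->txn = malloc(sizeof(*s->txn));
    if (!s->txn)
        goto fail_txn;

    return s;

    /* Each label frees the member allocated just before the failing step,
     * then falls through to the labels for the earlier steps. */
fail_txn:
    free(s->res);
fail_res:
    free(s->req);
fail_req:
    free(s);
fail_s:
    return NULL;
}

/* Releases a session created by session_new(); NULL is accepted. */
void session_free(struct session *s)
{
    if (!s)
        return;
    free(s->txn);
    free(s->res);
    free(s->req);
    free(s);
}
```

Adding a new member in the middle then means one allocation, one goto and one label, without touching the other error paths.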
Re: Crash with kernel error
Hi Sasha,

so the crash happens sporadically after hours of production traffic? Or does it crash right away after you start it?

You are saying this started with 1.6.4, what was the version you used before and that worked fine? 1.6.3?

Before starting haproxy, enable core dumping like this:

ulimit -c unlimited

Confirm it's unlimited (right before starting haproxy from this shell):

ulimit -c

Disabling compiler optimizations will make sure the generated coredump is as meaningful as possible, you can do it like this:

make clean; make CFLAGS="-O0 -g -fno-strict-aliasing -Wdeclaration-after-statement" TARGET=linux2628 USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

But be advised that there will be performance/cpu impact, so you better monitor it.

When you have a coredump, you can provide a backtrace with gdb like this:

gdb

and issuing a "bt full"

Regards,

Lukas
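When the process is not started from an interactive shell, the same limit can be raised programmatically. The sketch below only illustrates what "ulimit -c unlimited" amounts to underneath, using the POSIX getrlimit()/setrlimit() calls; the function name enable_core_dumps is made up for this example and is not part of HAProxy.

```c
#include <sys/resource.h>

/* Raises the soft core-file size limit of the calling process to its hard
 * limit. When the hard limit is unlimited, this is the equivalent of
 * "ulimit -c unlimited". Returns 0 on success, -1 on failure (errno set). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may always raise its soft limit up to the
     * hard limit, so this needs no special permissions. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

A small wrapper could call enable_core_dumps() and then exec haproxy, since resource limits are inherited across exec. On Linux, where the core file ends up is governed by /proc/sys/kernel/core_pattern; on many systems that is a file named core in the working directory of the process, but distributions may override it.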
Re: 100% cpu , epoll_wait()
Hi Sebastian,

On 11.05.2016 at 16:07, Sebastian Heid wrote:
> Hi,
>
> I updated from 1.5.17 to 1.5.18 today, but sadly this issue still exists
> in the latest version in our environment. However downgrading to 1.5.14
> "fixed" the issue for us.

Seems like a different issue then. Can you elaborate on what you are seeing? Sporadic 100% cpu load? Do you have to kill it or does it recover on its own?

Can you strace it?

Thanks,

Lukas
Re: Crash with kernel error
I apologize, the version is 1.6.5. I built it myself with zlib, openssl, pcre and linux2628 and I run it on CentOS 6.7.

I had several crashes starting from 1.6.4, but they were related to zlib, which was also weird:

Apr 21 15:11:25 node1lvs1-la kernel: haproxy[15586]: segfault at 3dbed94000 ip 003dbfe02866 sp 7fff12d6e9d8 error 4 in libz.so.1.2.3[3dbfe0+15000]
Apr 21 15:26:03 node1lvs1-la kernel: haproxy[23231]: segfault at 3dbed94000 ip 003dbfe02866 sp 7fffe19228f8 error 4 in libz.so.1.2.3[3dbfe0+15000]
Apr 21 15:28:25 node1lvs1-la kernel: haproxy[23809]: segfault at 3dbed94000 ip 003dbfe02866 sp 7fff05fdf728 error 4 in libz.so.1.2.3[3dbfe0+15000]
Apr 21 15:41:59 node1lvs1-la kernel: haproxy[24005]: segfault at 3dbed94000 ip 003dbfe02866 sp 7fffdb690788 error 4 in libz.so.1.2.3[3dbfe0+15000]
May 5 16:26:26 node1lvs1-la kernel: haproxy[26050]: segfault at 3dbed94000 ip 003dbfe02866 sp 7fff84c41e08 error 4 in libz.so.1.2.3[3dbfe0+15000]

I built 1.6.5 yesterday simply out of desperation, as I couldn't find anything on the web other than one post from 2013 about an issue that was supposedly fixed by the project maintainer back then. But it didn't help:

May 10 14:25:24 node1lvs1-la kernel: haproxy[18206]: segfault at 3dbed94000 ip 003dbfe02866 sp 7fff9670ae58 error 4 in libz.so.1.2.3[3dbfe0+15000]

So I updated zlib to 1.2.7-15, building the rpm from the CentOS 7 repo. At that point I had the crash with different data:

May 10 17:36:33 node1lvs1-la kernel: haproxy[24074]: segfault at 3dbed94000 ip 003dbea897fb sp 7fffc7278e68 error 4 in libc-2.12.so[3dbea0+18a000]

I tested the hardware including memory and it all passed the diagnostic tests. I upgraded the kernel from 3.8 to 4.5.1; again, not sure if this will make any sense.
This morning again:

May 11 06:14:34 node1lvs1-la kernel: haproxy[10599]: segfault at 3dbed94000 ip 003dbea8993e sp 7ffeaec50968 error 4 in libc-2.12.so[3dbea0+18a000]

Here is -vv

haproxy -vv
HA-Proxy version 1.6.5 2016/05/10
Copyright 2000-2016 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300, test result OK
       poll : pref=200, test result OK
     select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Haproxy.cfg

global
    debug
    maxconn 4096
    user haproxy
    group haproxy
    daemon
    # nbproc 16
    #ca-base /etc/ssl
    #crt-base /etc/ssl
    log 127.0.0.1:514 local2 debug
    tune.ssl.default-dh-param 2048
    tune.zlib.windowsize 15

defaults
    log global
    option tcplog
    option httplog
    option logasap
    maxconn 4096
    mode http
    # Add x-forwarded-for header.
    option forwardfor
    option originalto
    option http-pretend-keepalive
    option http-server-close
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    # Long timeout for WebSocket connections.
    timeout tunnel 1h
    # Gzip some content back to client
    compression algo deflate gzip
    compression type text/html text/plain text/javascript application/javascript application/xml text/css

listen admin_stats
    bind *:8800
    mode http
    stats enable
    stats hide-version
    stats uri /stats
    stats refresh 10s
    stats realm Haproxy\ Statistics
    stats auth xcastadmin:Antilopa!

backend http_redir
    mode http
    option nolinger
    server web

I have multiple other files separated in conf.d. I can ship them to you but I am not sure what is the best way to do it.

Also, how do you configure haproxy to generate a meaningful coredump in cases like this?

On Wed, May 11, 2016 at 6:13 AM, Nenad Merdanovic wrote:
> Hello,
>
> On 5/11/2016 10:16 AM, Alex Litvak wrote:
> > Haproxy 1.6.15 crashes with following error
> >
> > haproxy[24074]: segfault at 3dbed94000 ip 003dbea897fb sp
> > 7fffc7278e68 error 4 in libc-2.12.so[3dbea0+18a000]
> >
>
> Are you able to reliably reproduce this? Please post the output of
> 'haproxy -vv', send an anonymized version of your configuration file. I
> am not sure what you mean by 1.6.15, because
Re: 100% cpu , epoll_wait()
Hi,

I updated from 1.5.17 to 1.5.18 today, but sadly this issue still exists in the latest version in our environment. However downgrading to 1.5.14 "fixed" the issue for us.

Running CentOS6, Linux XXX 2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Build options :
  TARGET  = linux26
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing
  OPTIONS = USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built without zlib support (USE_ZLIB not set)
Compression algorithms supported : identity
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300, test result OK
       poll : pref=200, test result OK
     select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Bye,
Sebastian

-Original Message-
> From: Jim Freeman
> Sent: Fri 22 April 2016 01:21
> To: HAProxy
> Subject: Re: 100% cpu , epoll_wait()
>
> [ Apologies for consuming yet more vertical space ]
>
> With this in .cfg :
> log-format ±{"date":"%t","lbtype":"haproxy","lbname":"%H","cip":"%ci","pid":"%pid","name_f":"%f","name_b":"%b","name_s":"%s","time_cr":"%Tq","time_dq":"%Tw","time_sc":"%Tc","time_sr":"%Tr","time_t":"%Tt","scode":"%ST","bytes_c":"%U","bytes_s":"%B","termstat":"%ts","con_act":"%ac","con_frnt":"%fc","con_back":"%bc","con_srv":"%sc","rtry":"%rc","queue_s":"%sq","queue_b":"%bq","rqst":"%r","hdrs":"%hr"}
>
> , these requests logged with large %Tt (one request for favicon.ico,
> which gets answered?):
> =
> 4/21/16
> 3:06:36.268 PM
> { [-]
>   bytes_c: 578
>   bytes_s: 2485558
>   cip: 10.107.152.81
>   con_act: 43
>   con_back: 0
>   con_frnt: 0
>   con_srv: 0
>   date: 21/Apr/2016:21:06:36.268
>   hdrs: {Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.}
>   lbname: haproxy01
>   lbtype: haproxy
>   name_b: haproxy_stats
>   name_f: haproxy_stats
>   name_s:
>   pid: 20030
>   queue_b: 0
>   queue_s: 0
>   rqst: GET /favicon.ico HTTP/1.1
>   rtry: 0
>   scode: 200
>   termstat: LR
>   time_cr: 5874
>   time_dq: 0
>   time_sc: 0
>   time_sr: 0
>   time_t: 992288
> }
> host = haproxy01.a source = /logs/haproxy.log sourcetype = haproxy
>
> 4/21/16
> 3:06:36.268 PM
> { [-]
>   bytes_c: 577
>   bytes_s: 3091670
>   cip: 10.107.152.81
>   con_act: 198
>   con_back: 0
>   con_frnt: 1
>   con_srv: 0
>   date: 21/Apr/2016:21:06:36.268
>   hdrs: {Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.}
>   lbname: haproxy01
>   lbtype: haproxy
>   name_b: haproxy_stats
>   name_f: haproxy_stats
>   name_s:
>   pid: 20030
>   queue_b: 0
>   queue_s: 0
>   rqst: GET / HTTP/1.1
>   rtry: 0
>   scode: 200
>   termstat: LR
>   time_cr: 107
>   time_dq: 0
>   time_sc: 0
>   time_sr: 0
>   time_t: 2493
> }
> host = haproxy01.a source = /logs/haproxy.log sourcetype = haproxy
>
> 4/21/16
> 3:05:06.722 PM
> { [-]
>   bytes_c: 577
>   bytes_s: 2448514
>   cip: 10.107.152.81
>   con_act: 1133
>   con_back: 0
>   con_frnt: 0
>   con_srv: 0
>   date: 21/Apr/2016:21:05:06.722
>   hdrs: {Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.}
>   lbname: haproxy01
>   lbtype: haproxy
>   name_b: haproxy_stats
>   name_f: haproxy_stats
>   name_s:
>   pid: 20030
>   queue_b: 0
>   queue_s: 0
>   rqst: GET / HTTP/1.1
>   rtry: 0
>   scode: 200
>   termstat: LR
>   time_cr: 126
>   time_dq: 0
>   time_sc: 0
>   time_sr: 0
>   time_t: 88490
> }
> host = haproxy01.a source = /logs/haproxy.log sourcetype = haproxy
>
> On Thu, Apr 21, 2016 at 5:10 PM, Jim Freeman wrote:
> > Another alert+followup :
> >
> > Cpu pegged again - connected to host and ran :
> > ==
> > # netstat -pantu | egrep "(^Proto|:5)"
> > Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
> > tcp 0 0 0.0.0.0:5 0.0.0.0:* LISTEN 7944/haproxy
> > tcp 0 0 10.33.176.98:5 10.34.157.166:53155 TIME_WAIT -
> > tcp 0 191520 10.33.176.98:5 10.107.152.81:59029 ESTABLISHED 20030/haproxy
> > tcp 0 0 10.33.176.98:5 10.34.155.182:43154 TIME_WAIT -
> > tcp 0 0
Re: Crash with kernel error
Hello,

On 5/11/2016 10:16 AM, Alex Litvak wrote:
> Haproxy 1.6.15 crashes with following error
>
> haproxy[24074]: segfault at 3dbed94000 ip 003dbea897fb sp
> 7fffc7278e68 error 4 in libc-2.12.so[3dbea0+18a000]
>

Are you able to reliably reproduce this? Please post the output of 'haproxy -vv', send an anonymized version of your configuration file. I am not sure what you mean by 1.6.15, because 1.6.5 was just released, but haproxy -vv output will clear that up.

Regards,
Nenad
Crash with kernel error
Haproxy 1.6.15 crashes with following error

haproxy[24074]: segfault at 3dbed94000 ip 003dbea897fb sp 7fffc7278e68 error 4 in libc-2.12.so[3dbea0+18a000]
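The kernel line above can be read field by field: the value after "segfault at" is the faulting address, ip is the instruction pointer, sp is the stack pointer, and on x86 the number after "error" is the page-fault error code, which is a bit mask. The sketch below parses such a line and decodes the mask. The names segfault_info, parse_segfault and describe_fault are hypothetical helpers written for this illustration; the bit meanings are those of the x86 page-fault error code.

```c
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits, as printed after "error". */
#define PF_PROT  0x01  /* set: protection violation, clear: page not present */
#define PF_WRITE 0x02  /* set: write access, clear: read access */
#define PF_USER  0x04  /* set: fault happened in user mode */
#define PF_INSTR 0x10  /* set: fault happened on an instruction fetch */

/* Hypothetical container for the fields of one kernel segfault line. */
struct segfault_info {
    unsigned long long addr; /* faulting address */
    unsigned long long ip;   /* instruction pointer */
    unsigned long long sp;   /* stack pointer */
    unsigned long long err;  /* page-fault error code */
};

/* Extracts the hexadecimal fields from a kernel "segfault at" line.
 * Returns 0 on success, -1 if the line does not match. */
int parse_segfault(const char *line, struct segfault_info *out)
{
    const char *p = strstr(line, "segfault at ");

    if (!p)
        return -1;
    if (sscanf(p, "segfault at %llx ip %llx sp %llx error %llx",
               &out->addr, &out->ip, &out->sp, &out->err) != 4)
        return -1;
    return 0;
}

/* Writes a short description of the error code into out and returns the
 * snprintf() result. */
int describe_fault(unsigned long long err, char *out, size_t len)
{
    const char *access = (err & PF_INSTR) ? "instruction fetch"
                       : (err & PF_WRITE) ? "write" : "read";

    return snprintf(out, len, "%s-mode %s, %s",
                    (err & PF_USER) ? "user" : "kernel",
                    access,
                    (err & PF_PROT) ? "protection violation"
                                    : "page not present");
}
```

For the line in this report, error 4 decodes to a user-mode read of a page that is not mapped, at address 3dbed94000.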