Re: Crash with kernel error

2016-05-11 Thread Sasha Litvak
Lukas,

1.6.3 didn't have any crashes.  These crashes are sporadic and do not
happen under load; there is very little traffic as we are not running
production yet.  The proxy starts fine and can run for hours before
the crash.
Where would the core be generated?  I set haproxy up to run as user
haproxy; would I have to adjust limits for that user?

Thank you for all your help,


On Wed, May 11, 2016 at 4:02 PM, Lukas Tribus  wrote:

> Hi Sasha,
>
>
> so the crash happens sporadically after hours of production traffic? Or
> does it crash right away after you start it?
>
>
> You are saying this started with 1.6.4, what was the version you used
> before and that worked fine? 1.6.3?
>
>
> Before starting haproxy, enable core dumping like this:
>
> ulimit -c unlimited
>
>
> Confirm it's unlimited (right before starting haproxy from this shell):
>
> ulimit -c
>
>
>
> Disabling compiler optimizations will make sure the generated coredump is
> as meaningful as possible, you can do it like this:
>
> make clean; make CFLAGS="-O0 -g -fno-strict-aliasing
> -Wdeclaration-after-statement" TARGET=linux2628 USE_ZLIB=1 USE_OPENSSL=1
> USE_PCRE=1
>
>
> But be advised that there will be performance/cpu impact, so you better
> monitor it.
>
>
> When you have a coredump, you can provide a backtrace with gdb like this:
>
> gdb  
>
> and then issuing "bt full" at the gdb prompt
>
>
>
>
> Regards,
>
> Lukas
>
>
>


Re: [PATCH] MEDIUM: init: allow directory as argument of -f

2016-05-11 Thread Maxime de Roucy
Hi Willy,

> > I don't receive all the mails from haproxy@formilux.org.
> > For example I didn't receive:
> > http://article.gmane.org/gmane.comp.web.haproxy/27795
> 
> Well, this one was sent to you directly by Cyril, so you should have
> received it.

Indeed, it was this one:
 http://article.gmane.org/gmane.comp.web.haproxy/27802

I will check my anti-spam.

Anyway :
> I've subscribed you by hand now.

I now receive all the mailing list mail.
Thanks a lot.

> > > If tmp_str fails and wlt succeeds, wlt is not freed.
> > If tmp_str fails and wlt succeeds we still get the Alert and
> > everything is freed on exit.
> 
> Yes I know but as I said, if/when such code later moves to its own
> function, this function might initially decide to exit then to let
> the caller take the decision and one day all of this will be used
> dynamically or from the CLI and then people discover a memory leak.
> And there are the valgrind users who send patches very often to fix
> such warnings that annoy them. I mean we spent a lot of time killing
> some such old issues that were not bugs initially and that became
> bugs later, so we try to be careful. We don't want to be the next
> openssl if you see what I mean :-)

Yes I see :-)

For the record I now always check my code with
"valgrind --leak-check=full".

> > I created the function "void cfgfiles_expand_directories(void)", but
> > not the "load_config_file" one.
> > I am not accustomed to using goto, and it's hard for me to use it
> > here as I actually don't see the point of it (in
> > cfgfiles_expand_directories).
> 
> That's the best way to deal with error unrolling. I'm sad that
> teachers at school teach students not to use it, because:
>   1) it's what the compiler implements anyway for all other
>      constructs
>   2) it's the only safe way to perform unrolling that resists
>      code additions.
> 
> We used to have some leaks in the past because we were not using it.
> When you have some session initialization code like this :
> 
> s = malloc(sizeof(*s));
> if (!s)
>     return;
> 
> s->req = malloc(sizeof(*s->req));
> if (!s->req) {
>     free(s);
>     return;
> }
> 
> s->res = malloc(sizeof(*s->res));
> if (!s->res) {
>     free(s->req);
>     free(s);
>     return;
> }
> 
> s->txn = malloc(sizeof(*s->txn));
> if (!s->txn) {
>     free(s->res);
>     free(s->req);
>     free(s);
>     return;
> }
> 
> s->log = malloc(sizeof(*s->log));
> if (!s->log) {
>     free(s->txn);
>     free(s->res);
>     free(s->req);
>     free(s);
>     return;
> }
> 
> s->req_capture = malloc(sizeof(*s->req_capture));
> if (!s->req_capture) {
>     free(s->log);
>     free(s->txn);
>     free(s->res);
>     free(s->req);
>     free(s);
>     return;
> }
> 
> s->res_capture = malloc(sizeof(*s->res_capture));
> if (!s->res_capture) {
>     free(s->req_capture);
>     free(s->log);
>     free(s->txn);
>     free(s->res);
>     free(s->req);
>     free(s);
>     return;
> }
> 
> ... and so on for 10 entries ...
> 
> Then you may already have bugs above due to the inevitable copy-
> paste, and once you insert a new field in the middle (eg: s->vars)
> you're pretty sure that you will miss it in one of the next "if"
> blocks because they are never as clear as above but themselves
> enclosed within other "if" blocks. And when you need to switch
> allocation order, that's even worse. But the horrible thing above can
> be safely turned into this :
> 
> s = malloc(sizeof(*s));
> if (!s)
>     goto fail_s;
> 
> s->req = malloc(sizeof(*s->req));
> if (!s->req)
>     goto fail_req;
> 
> s->res = malloc(sizeof(*s->res));
> if (!s->res)
>     goto fail_res;
> 
> s->txn = malloc(sizeof(*s->txn));
> if (!s->txn)
>     goto fail_txn;
> 
> s->log = malloc(sizeof(*s->log));
> if (!s->log)
>     goto fail_log;
> 
> s->req_capture = malloc(sizeof(*s->req_capture));
> if (!s->req_capture)
>     goto fail_req_cap;
> 
> s->res_capture = malloc(sizeof(*s->res_capture));
> if (!s->res_capture)
>     goto fail_res_cap;
> 
> return s;
> 
>  fail_res_cap:
>     free(s->req_capture);
>  fail_req_cap:
>     free(s->log);
>  fail_log:
>     free(s->txn);
>  fail_txn:
>     free(s->res);
>  fail_res:
>     free(s->req);
>  fail_req:
>     free(s);
>  fail_s:
>     return NULL;
> 
> And a nice side effect is that when you look at the assembly code,
> it's much smaller and much more efficient.
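
The unrolling pattern quoted above can be tried in isolation. What follows is
only a sketch under stated assumptions, not haproxy code: the types
"struct chunk" and "struct session", the three-member layout, and the
demo_alloc()/demo_free() wrappers (which count live blocks and can be told to
fail on the Nth call) are all invented for illustration. It shows that a
failure at any step releases exactly what was allocated before it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types, reduced to three members for illustration. */
struct chunk { char data[64]; };

struct session {
    struct chunk *req;
    struct chunk *res;
    struct chunk *txn;
};

/* Demo allocator state: demo_fail_at == 0 means "never fail";
 * otherwise the call whose number equals demo_fail_at returns NULL. */
static int demo_fail_at;
static int demo_calls;
static int demo_live;   /* number of blocks currently allocated */

static void *demo_alloc(size_t size)
{
    void *p;

    if (++demo_calls == demo_fail_at)
        return NULL;
    p = malloc(size);
    if (p)
        demo_live++;
    return p;
}

static void demo_free(void *p)
{
    if (!p)
        return;
    assert(demo_live > 0);
    demo_live--;
    free(p);
}

struct session *session_new(void)
{
    struct session *s;

    s = demo_alloc(sizeof(*s));
    if (!s)
        goto fail_s;

    s->req = demo_alloc(sizeof(*s->req));
    if (!s->req)
        goto fail_req;

    s->res = demo_alloc(sizeof(*s->res));
    if (!s->res)
        goto fail_res;

    s->txn = demo_alloc(sizeof(*s->txn));
    if (!s->txn)
        goto fail_txn;

    return s;

    /* Each label releases what was allocated before the failing step,
     * then falls through to the labels below it. */
 fail_txn:
    demo_free(s->res);
 fail_res:
    demo_free(s->req);
 fail_req:
    demo_free(s);
 fail_s:
    return NULL;
}

/* Releases a fully built session in reverse allocation order. */
void session_free(struct session *s)
{
    if (!s)
        return;
    demo_free(s->txn);
    demo_free(s->res);
    demo_free(s->req);
    demo_free(s);
}
```

Setting demo_fail_at to 3 before calling session_new() makes the s->res
allocation fail; the function then returns NULL and demo_live is back at 0,
which is the property the label chain is meant to guarantee. Adding a new
member only requires one new allocation step and one new label.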
> 
> > It doesn't reduce the number of lines and, as after every alert we
> > call exit, there is no need to clean anything.
> 
> As I explained above I agree on this but code correctness and code
> cleanness are two different things. We try to have a bit of
> modularity because we know that code moves and that it's better if it
> can move safely. For example I recently 

Re: Crash with kernel error

2016-05-11 Thread Lukas Tribus

Hi Sasha,


so the crash happens sporadically after hours of production traffic? Or 
does it crash right away after you start it?



You are saying this started with 1.6.4, what was the version you used 
before and that worked fine? 1.6.3?



Before starting haproxy, enable core dumping like this:

ulimit -c unlimited


Confirm it's unlimited (right before starting haproxy from this shell):

ulimit -c
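
The limit that "ulimit -c" reports can also be read and raised from inside a
process. The following is a minimal sketch, assuming a POSIX system with
getrlimit()/setrlimit(); the function names raise_core_limit() and
core_limit_is_unlimited() are made up for this example and are not part of
haproxy. Note that the limit alone may not be enough: on Linux the location of
the core file is governed by kernel.core_pattern, and a process that changes
its user ID may have dumps suppressed depending on fs.suid_dumpable.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Raise the soft core-size limit to the hard limit. This matches what
 * "ulimit -c unlimited" achieves when the hard limit is itself unlimited.
 * On success, stores the resulting limits in *out and returns 0;
 * returns -1 if getrlimit() or setrlimit() fails.
 */
int raise_core_limit(struct rlimit *out)
{
    assert(out != NULL);

    if (getrlimit(RLIMIT_CORE, out) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard one. */
    out->rlim_cur = out->rlim_max;
    if (setrlimit(RLIMIT_CORE, out) != 0)
        return -1;

    /* Re-read so the caller sees what the kernel actually applied. */
    return getrlimit(RLIMIT_CORE, out) == 0 ? 0 : -1;
}

/* Returns 1 if core files of any size are allowed, 0 otherwise. */
int core_limit_is_unlimited(const struct rlimit *rl)
{
    return rl->rlim_cur == RLIM_INFINITY;
}
```

If core_limit_is_unlimited() still returns 0 after raise_core_limit(), the
hard limit is the constraint, and it has to be raised by a privileged user
(for example through the limits configured for the account that starts the
service).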



Disabling compiler optimizations will make sure the generated coredump 
is as meaningful as possible, you can do it like this:


make clean; make CFLAGS="-O0 -g -fno-strict-aliasing 
-Wdeclaration-after-statement" TARGET=linux2628 USE_ZLIB=1 USE_OPENSSL=1 
USE_PCRE=1



But be advised that there will be performance/cpu impact, so you better 
monitor it.



When you have a coredump, you can provide a backtrace with gdb like this:

gdb  

and then issuing "bt full" at the gdb prompt




Regards,

Lukas





Re: 100% cpu , epoll_wait()

2016-05-11 Thread Lukas Tribus

Hi Sebastian,


On 11.05.2016 at 16:07, Sebastian Heid wrote:

Hi,

I updated from 1.5.17 to 1.5.18 today, but sadly this issue still exists in the latest 
version in our environment. However downgrading to 1.5.14 "fixed" the issue for 
us.


Seems like a different issue then. Can you elaborate what you are 
seeing? Sporadic 100% cpu load? Do you have to kill it or does it 
recover on its own? Can you strace it?




Thanks,

Lukas




Re: Crash with kernel error

2016-05-11 Thread Sasha Litvak
I apologize, the version is 1.6.5.  I built it myself with zlib, openssl,
pcre and linux2628, and I run it on CentOS 6.7.

I have had several crashes starting with 1.6.4, but they were related
to zlib, which was also weird:

Apr 21 15:11:25 node1lvs1-la kernel: haproxy[15586]: segfault at 3dbed94000
ip 003dbfe02866 sp 7fff12d6e9d8 error 4 in
libz.so.1.2.3[3dbfe0+15000]
Apr 21 15:26:03 node1lvs1-la kernel: haproxy[23231]: segfault at 3dbed94000
ip 003dbfe02866 sp 7fffe19228f8 error 4 in
libz.so.1.2.3[3dbfe0+15000]
Apr 21 15:28:25 node1lvs1-la kernel: haproxy[23809]: segfault at 3dbed94000
ip 003dbfe02866 sp 7fff05fdf728 error 4 in
libz.so.1.2.3[3dbfe0+15000]
Apr 21 15:41:59 node1lvs1-la kernel: haproxy[24005]: segfault at 3dbed94000
ip 003dbfe02866 sp 7fffdb690788 error 4 in
libz.so.1.2.3[3dbfe0+15000]
May  5 16:26:26 node1lvs1-la kernel: haproxy[26050]: segfault at 3dbed94000
ip 003dbfe02866 sp 7fff84c41e08 error 4 in
libz.so.1.2.3[3dbfe0+15000]

I built 1.6.5 yesterday simply out of desperation, as I couldn't find anything
on the web other than one post from 2013 about an issue that was supposedly
fixed by the project maintainer back then.

But it didn't help:

May 10 14:25:24 node1lvs1-la kernel: haproxy[18206]: segfault at 3dbed94000
ip 003dbfe02866 sp 7fff9670ae58 error 4 in
libz.so.1.2.3[3dbfe0+15000]

So I updated zlib to 1.2.7-15 building rpm from CentOS 7 repo.

At that point I had the crash with different data

May 10 17:36:33 node1lvs1-la kernel: haproxy[24074]: segfault at 3dbed94000
ip 003dbea897fb sp 7fffc7278e68 error 4 in libc-2.12.so
[3dbea0+18a000]

I tested the hardware, including memory, and it all passed the diagnostic tests.
I upgraded the kernel from 3.8 to 4.5.1; again, I am not sure if this makes any
sense.

This morning again

May 11 06:14:34 node1lvs1-la kernel: haproxy[10599]: segfault at 3dbed94000
ip 003dbea8993e sp 7ffeaec50968 error 4 in libc-2.12.so
[3dbea0+18a000]
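
Every kernel line above reports "error 4". On x86 this number is the page-fault
error code, a small bit field. The sketch below decodes it; it assumes the x86
bit layout (the build is -m64 -march=x86-64), and the names FAULT_*,
struct fault_info, decode_fault() and describe_fault() are invented for this
example.

```c
#include <assert.h>
#include <stdio.h>

/* x86 page-fault error code bits, as printed after "error". */
#define FAULT_PROT  0x01u  /* 1: protection violation, 0: page not present */
#define FAULT_WRITE 0x02u  /* 1: write access, 0: read access */
#define FAULT_USER  0x04u  /* 1: fault raised in user mode, 0: kernel mode */
#define FAULT_INSTR 0x10u  /* 1: fault on an instruction fetch */

struct fault_info {
    int protection;
    int write;
    int user;
    int ifetch;
};

/* Splits the error code into one flag per bit. */
struct fault_info decode_fault(unsigned code)
{
    struct fault_info fi;

    fi.protection = (code & FAULT_PROT) != 0;
    fi.write      = (code & FAULT_WRITE) != 0;
    fi.user       = (code & FAULT_USER) != 0;
    fi.ifetch     = (code & FAULT_INSTR) != 0;
    return fi;
}

/* Writes a one-line description of `code` into buf and returns buf. */
char *describe_fault(unsigned code, char *buf, size_t len)
{
    struct fault_info fi = decode_fault(code);

    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s-mode %s, %s",
             fi.user ? "user" : "kernel",
             fi.ifetch ? "instruction fetch" : (fi.write ? "write" : "read"),
             fi.protection ? "protection violation" : "page not present");
    return buf;
}
```

For code 4 only the user bit is set, so describe_fault() yields
"user-mode read, page not present": the process read from an address that was
not mapped, which is consistent with the same fault address showing up both in
the libz and in the libc crashes.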

Here is -vv

haproxy -vv
HA-Proxy version 1.6.5 2016/05/10
Copyright 2000-2016 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing
-Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
IP_FREEBIND

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Haproxy.cfg

global
  debug
  maxconn 4096
  user haproxy
  group haproxy
  daemon
  # nbproc 16
  #ca-base /etc/ssl
  #crt-base /etc/ssl
  log 127.0.0.1:514 local2 debug
  tune.ssl.default-dh-param 2048
  tune.zlib.windowsize 15

defaults
  log global
  option tcplog
  option httplog
  option logasap
  maxconn 4096
  mode http
  # Add x-forwarded-for header.
  option forwardfor
  option originalto
  option http-pretend-keepalive
  option http-server-close
  timeout connect 5s
  timeout client 30s
  timeout server 30s
  # Long timeout for WebSocket connections.
  timeout tunnel 1h

  # Gzip some content back to client
  compression algo deflate gzip
  compression type text/html text/plain text/javascript
application/javascript application/xml text/css


listen admin_stats
  bind *:8800
  mode http
  stats enable
  stats hide-version
  stats uri /stats
  stats refresh 10s
  stats realm Haproxy\ Statistics
  stats auth xcastadmin:Antilopa!


backend http_redir
  mode http
  option nolinger
  server web 

I have multiple other files separated in conf.d. I can ship them to you, but
I am not sure what the best way to do it is.
Also, how do you configure haproxy to generate a meaningful coredump in
cases like this?


On Wed, May 11, 2016 at 6:13 AM, Nenad Merdanovic  wrote:

> Hello,
>
> On 5/11/2016 10:16 AM, Alex Litvak wrote:
> > Haproxy 1.6.15 crashes with following error
> >
> > haproxy[24074]: segfault at 3dbed94000 ip 003dbea897fb sp
> > 7fffc7278e68 error 4 in libc-2.12.so[3dbea0+18a000]
> >
> >
>
> Are you able to reliably reproduce this? Please post the output of
> 'haproxy -vv', send an anonymized version of your configuration file. I
> am not sure what you mean by 1.6.15, because 

Re: 100% cpu , epoll_wait()

2016-05-11 Thread Sebastian Heid
Hi,

I updated from 1.5.17 to 1.5.18 today, but sadly this issue still exists in the 
latest version in our environment. However downgrading to 1.5.14 "fixed" the 
issue for us.

Running CentOS6, Linux XXX 2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Build options :
  TARGET  = linux26
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing
  OPTIONS = USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built without zlib support (USE_ZLIB not set)
Compression algorithms supported : identity
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IP_FREEBIND

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.


Bye,
Sebastian
 
 
-Original Message-
> From: Jim Freeman 
> Sent: Fri 22 April 2016 01:21
> To: HAProxy 
> Subject: Re: 100% cpu , epoll_wait()
> 
> [ Apologies for consuming yet more vertical space ]
> 
> With this in .cfg :
> log-format 
> ±{"date":"%t","lbtype":"haproxy","lbname":"%H","cip":"%ci","pid":"%pid","name_f":"%f","name_b":"%b","name_s":"%s","time_cr":"%Tq","time_dq":"%Tw","time_sc":"%Tc","time_sr":"%Tr","time_t":"%Tt","scode":"%ST","bytes_c":"%U","bytes_s":"%B","termstat":"%ts","con_act":"%ac","con_frnt":"%fc","con_back":"%bc","con_srv":"%sc","rtry":"%rc","queue_s":"%sq","queue_b":"%bq","rqst":"%r","hdrs":"%hr"}
> 
> , these requests logged with large %Tt (one request for favicon.ico,
> which gets answered?):
> =
> 4/21/16
> 3:06:36.268 PM
> { [-]
> bytes_c:  578
> bytes_s:  2485558
> cip:  10.107.152.81
> con_act:  43
> con_back:  0
> con_frnt:  0
> con_srv:  0
> date:  21/Apr/2016:21:06:36.268
> hdrs:  {Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.}
> lbname:  haproxy01
> lbtype:  haproxy
> name_b:  haproxy_stats
> name_f:  haproxy_stats
> name_s:  
> pid:  20030
> queue_b:  0
> queue_s:  0
> rqst:  GET /favicon.ico HTTP/1.1
> rtry:  0
> scode:  200
> termstat:  LR
> time_cr:  5874
> time_dq:  0
> time_sc:  0
> time_sr:  0
> time_t:  992288
> }
> host = haproxy01.a source = /logs/haproxy.log sourcetype = haproxy
> 
> 4/21/16
> 3:06:36.268 PM
> { [-]
> bytes_c:  577
> bytes_s:  3091670
> cip:  10.107.152.81
> con_act:  198
> con_back:  0
> con_frnt:  1
> con_srv:  0
> date:  21/Apr/2016:21:06:36.268
> hdrs:  {Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.}
> lbname:  haproxy01
> lbtype:  haproxy
> name_b:  haproxy_stats
> name_f:  haproxy_stats
> name_s:  
> pid:  20030
> queue_b:  0
> queue_s:  0
> rqst:  GET / HTTP/1.1
> rtry:  0
> scode:  200
> termstat:  LR
> time_cr:  107
> time_dq:  0
> time_sc:  0
> time_sr:  0
> time_t:  2493
> }
> host = haproxy01.a source = /logs/haproxy.log sourcetype = haproxy
> 
> 4/21/16
> 3:05:06.722 PM
> { [-]
> bytes_c:  577
> bytes_s:  2448514
> cip:  10.107.152.81
> con_act:  1133
> con_back:  0
> con_frnt:  0
> con_srv:  0
> date:  21/Apr/2016:21:05:06.722
> hdrs:  {Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.}
> lbname:  haproxy01
> lbtype:  haproxy
> name_b:  haproxy_stats
> name_f:  haproxy_stats
> name_s:  
> pid:  20030
> queue_b:  0
> queue_s:  0
> rqst:  GET / HTTP/1.1
> rtry:  0
> scode:  200
> termstat:  LR
> time_cr:  126
> time_dq:  0
> time_sc:  0
> time_sr:  0
> time_t:  88490
> }
> host = haproxy01.a source = /logs/haproxy.log sourcetype = haproxy
> 
> On Thu, Apr 21, 2016 at 5:10 PM, Jim Freeman  wrote:
> > Another alert+followup :
> >
> > Cpu pegged again - connected to host and ran :
> > ==
> > # netstat -pantu | egrep "(^Proto|:5)"
> > Proto Recv-Q Send-Q Local Address    Foreign Address      State        PID/Program name
> > tcp        0      0 0.0.0.0:5        0.0.0.0:*            LISTEN       7944/haproxy
> > tcp        0      0 10.33.176.98:5   10.34.157.166:53155  TIME_WAIT    -
> > tcp        0 191520 10.33.176.98:5   10.107.152.81:59029  ESTABLISHED  20030/haproxy
> > tcp        0      0 10.33.176.98:5   10.34.155.182:43154  TIME_WAIT    -
> > tcp0  0 

Re: Crash with kernel error

2016-05-11 Thread Nenad Merdanovic
Hello,

On 5/11/2016 10:16 AM, Alex Litvak wrote:
> Haproxy 1.6.15 crashes with following error 
> 
> haproxy[24074]: segfault at 3dbed94000 ip 003dbea897fb sp 
> 7fffc7278e68 error 4 in libc-2.12.so[3dbea0+18a000]
> 
> 

Are you able to reliably reproduce this? Please post the output of
'haproxy -vv', send an anonymized version of your configuration file. I
am not sure what you mean by 1.6.15, because 1.6.5 was just released,
but haproxy -vv output will clear that up.

Regards,
Nenad



Crash with kernel error

2016-05-11 Thread Alex Litvak
Haproxy 1.6.15 crashes with following error 

haproxy[24074]: segfault at 3dbed94000 ip 003dbea897fb sp 
7fffc7278e68 error 4 in libc-2.12.so[3dbea0+18a000]