haproxy 1.4.7 segfaults under load around 1k connections

2010-06-16 Morten Gade Sørensen
Hi there

We have been running haproxy 1.4.1 perfectly fine without any problems for 
several months.
Today haproxy started crashing with the following message in /var/log/messages:

Jun 16 13:26:21 lb2 kernel: [ 3352.666283] haproxy[4679]: segfault at 8 ip 0043e462 sp 7fff0dac92c8 error 4 in haproxy[40+54000]

I tried upgrading to 1.4.7 with no luck.

We primarily use haproxy for TCP load balancing of SMTP, RPC (Exchange 2010)
and RDP. The load might have increased slightly over the last few days, and
this might be what triggers it.
Initially I suspected it was because only 512 MB of RAM had been assigned to
the machine (running x86_64 SuSE on ESX), so I increased that to 1 GB.
ulimit -n also showed 1024, so I increased that to 32768 and set sysctl
fs.file-max = 65535, but that didn't seem to make any difference. I also tried
disabling some logging, just to see if that had anything to do with it.

It seems to crash every time the total connection count gets to around 800-1000...

lb2:~ # uname -a
Linux lb2 2.6.31.12-0.1-default #1 SMP 2010-01-27 08:20:11 +0100 x86_64 x86_64 x86_64 GNU/Linux

lb2:~ # haproxy -vv
HA-Proxy version 1.4.7 2010/06/07
Copyright 2000-2010 Willy Tarreau <w...@1wt.eu>

Build options :
  TARGET  = linux26
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g
  OPTIONS = USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
 sepoll : pref=400,  test result OK
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.

## gdb session ##

lb2:/usr/local/sbin # gdb ./haproxy
GNU gdb (GDB) SUSE (6.8.91.20090930-2.4)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
...
Reading symbols from /usr/local/sbin/haproxy...done.
(gdb) run -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -db
Starting program: /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -db
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
Try: zypper install -C "debuginfo(build-id)=591af1afa33f255704fb6a60859b93d00e205302"
Missing separate debuginfo for /lib64/libcrypt.so.1
Try: zypper install -C "debuginfo(build-id)=b4127c6e9abfb7711018173fc6010b5853a5a781"
Missing separate debuginfo for /lib64/libpcreposix.so.0
Try: zypper install -C "debuginfo(build-id)=89f575b0c91220f553dff19b0f5d8ac786cf3e21"
Missing separate debuginfo for /lib64/libpcre.so.0
Try: zypper install -C "debuginfo(build-id)=faf1aba9b565a29c99ce1d3944978347d6209cc3"
Missing separate debuginfo for /lib64/libc.so.6
Try: zypper install -C "debuginfo(build-id)=b5ded0f18b9b11c5cd6b26387426ead562c332f8"

Program received signal SIGSEGV, Segmentation fault.
eb32_lookup_ge (root=0x656df0, x=2160585045) at ebtree/eb32tree.c:212
212 troot = (eb_untag(troot, EB_LEFT))->b[EB_RGHT];
(gdb)
(gdb) bt
#0  eb32_lookup_ge (root=0x656df0, x=2160585045) at ebtree/eb32tree.c:212
#1  0x004080de in process_runnable_tasks (next=0x7fffe64c) at src/task.c:206
#2  0x00402829 in run_poll_loop () at src/haproxy.c:966
#3  0x0040426f in main (argc=, argv=0x7fffe788) at src/haproxy.c:1240
(gdb) list
207		while (eb_gettag(troot) != EB_LEFT)
208			/* Walking up from right branch, so we cannot be below root */
209			troot = (eb_root_to_node(eb_untag(troot, EB_RGHT)))->node_p;
210
211		/* Note that <troot> cannot be NULL at this stage */
212		troot = (eb_untag(troot, EB_LEFT))->b[EB_RGHT];
213		if (eb_clrtag(troot) == NULL)
214			return NULL;
215
216		node = eb32_entry(eb_walk_down(troot, EB_LEFT), struct eb32_node, node);


## /etc/haproxy/haproxy.cfg ##

global
    log 127.0.0.1 local0
    #log 127.0.0.1 local1 notice
    #log netwatch local0 info
    maxconn 5000
    #chroot /usr/share/haproxy
    stats socket /var/run/haproxy.stat mode 600
    user haproxy
    group haproxy
    daemon
    #debug
    #quiet

defaults
    log global
    mode http
    option dontlognull
    option redispatch
    retries 3
    maxconn 5000
    timeout connect 5000
    timeout client 5
    timeout server 5

# Axapta Terminal Services
listen AXTS_rdp 10.131.25.20:3389
    mode tcp
    balance roundrobin
    option tcpka
    #option tcplog
    timeout connect 10s
    timeout client 3h
    timeout server 3h
    monitor-net 10.131.25.62/32
    server axts1 10.131.25.21 weight 1 check
    server axts2 10.131.25.22 weight 1 check

# CAS HTTP balancing
listen CAS_http 10.131.25.75:80
    mode http
    balance source
    option httplog
    option httpclose
    option forwardfor
    monitor-net 10.131.25.62/32
    server cas1 10.131.25.73 weight 1 check
    server cas2 10.1

Re: haproxy 1.4.7 segfaults under load around 1k connections

2010-06-16 Morten Gade Sørensen
Hi again

Just an update on this issue.
I managed to make it crash instantly by telnetting to the CAS_smtp listener port
from the host defined in the monitor-net option.
It connects and then instantly disconnects without being load-balanced (which
is the expected behaviour for the monitor-net host).
The problem is that haproxy segfaults right after, or sometimes only after 4-5
tries.
If I remove the monitor-net option from the listener section, it stops crashing.

vr was very helpful in #hapr...@irc.gnu.org and he was able to replicate the 
error. - Thanks vr!!

--
Med venlig hilsen
Fleggaard IT

Morten Gade Sørensen
Network Engineer

Tlf: +45 7230 3999
Fax: +45 7230 3998
Mail: m...@fleggaard.dk
Web: www.fleggaard-holding.dk



On 16/06/2010, at 13.57, Morten Gade Sørensen wrote:

[...]

Re: haproxy 1.4.7 segfaults under load around 1k connections

2010-06-16 Hervé COMMOWICK

Hello mgades,

Willy sent me the patch that fixes this bug.
It works for me; can you test it on your configuration?

On 06/16/2010 03:16 PM, Morten Gade Sørensen wrote:
> Hi again
>
> Just an update on this issue.
> I had it crash instantly by telnetting on the CAS_smtp listener port from the host defined in the monitor-net option.
> It connects and then instantly disconnects without being load-balanced (which is expected behaviour for the monitor-net host).
> The problem is that haproxy segfaults right after - or maybe after 4-5 tries.
> If I remove the monitor-net option in the listener section it stops crashing.
>
> vr was very helpful in #hapr...@irc.gnu.org and he was able to replicate the error. - Thanks vr!!

You're welcome. :)

Hervé.

--
Your Network supports your *BUSINESS !*
Appliances de *contrôle d'activité* et d'*optimisation* du réseau
EXCELIANCE - Rule your Network ! - www.exceliance.fr
ZAC des Metz - 3 Rue du petit robinson
78350 Jouy en Josas
Tél: +33 1 30 67 60 74 - Fax: +33 1 75 43 40 70

From 7ec37ed4e0b24535cd20e12ac2b3774b128f6875 Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Wed, 16 Jun 2010 17:17:39 +0200
Subject: [BUG] client: don't add a new session to the list too early

Adding a new session to the sessions list too early can cause it to
indefinitely remain in the list if a request from a monitor-net comes
in TCP mode, because the session will then not be removed from the
list. This issue causes crashes very soon after when this happens.

It should be backported to 1.3 too.
---
 src/client.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/src/client.c b/src/client.c
index be0c902..476ad9f 100644
--- a/src/client.c
+++ b/src/client.c
@@ -126,7 +126,6 @@ int event_accept(int fd) {
 			goto out_close;
 		}
 
-		LIST_ADDQ(&sessions, &s->list);
 		LIST_INIT(&s->back_refs);
 
 		s->flags = 0;
@@ -146,6 +145,8 @@ int event_accept(int fd) {
 			s->flags |= SN_MONITOR;
 		}
 
+		LIST_ADDQ(&sessions, &s->list);
+
 		if ((t = task_new()) == NULL) { /* disable this proxy for a while */
 			Alert("out of memory in event_accept().\n");
 			EV_FD_CLR(fd, DIR_RD);
-- 
1.6.0.4



Re: haproxy 1.4.7 segfaults under load around 1k connections

2010-06-16 Morten Gade Sørensen
Hi Hervé

I have applied the patch, and it seems to do the trick - thanks Hervé/Willy.

--
Med venlig hilsen
Fleggaard IT

Morten Gade Sørensen
Network Engineer

Tlf: +45 7230 3999
Fax: +45 7230 3998
Mail: m...@fleggaard.dk
Web: www.fleggaard-holding.dk



On 16/06/2010, at 18.27, Hervé COMMOWICK wrote:

[...]

<0001--BUG-client-don-t-add-a-new-session-to-the-list-to.patch>