Hi again

Just an update on this issue.
I can make it crash by telnetting to the CAS_smtp listener port from the 
host defined in the monitor-net option.
The connection is accepted and then closed immediately without being 
load-balanced (which is the expected behaviour for the monitor-net host).
The problem is that haproxy segfaults right afterwards, sometimes on the 
first attempt and sometimes only after 4-5 tries.
If I remove the monitor-net option from the listener section, it stops crashing.
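
For anyone who wants to repeat the test without typing telnet by hand, here is a minimal sketch of the same connect-and-disconnect sequence. The function name `poke_listener` and its parameters are made up for illustration. It only mimics what telnet did in my test and has to be run from the host listed in monitor-net to take the same code path.

```python
import socket


def poke_listener(host, port, attempts=5, timeout=2.0):
    """Open a TCP connection to host:port and close it right away, repeatedly.

    Returns how many attempts connected before the first failure.
    (Hypothetical helper, not part of haproxy.)
    """
    connected = 0
    for _ in range(attempts):
        try:
            # Leaving the "with" block closes the socket immediately.
            with socket.create_connection((host, port), timeout=timeout):
                connected += 1
        except OSError:
            # Refused or timed out: the listener may already have gone away.
            break
    return connected
```

Calling `poke_listener("10.131.25.75", 25)` targets the CAS_smtp listener. A return value lower than `attempts` suggests the listener stopped answering part-way through, which would match the 4-5 tries I saw.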

vr was very helpful on #hapr...@irc.gnu.org and was able to reproduce the 
error. Thanks, vr!

--
Best regards
Fleggaard IT

Morten Gade Sørensen
Network Engineer

Tel: +45 7230 3999
Fax: +45 7230 3998
Mail: m...@fleggaard.dk
Web: www.fleggaard-holding.dk



On 16/06/2010, at 13.57, Morten Gade Sørensen wrote:

Hi there

We have been running haproxy 1.4.1 without any problems for several months.
Today haproxy started crashing with the following message in /var/log/messages:

Jun 16 13:26:21 lb2 kernel: [ 3352.666283] haproxy[4679]: segfault at 8 ip 000000000043e462 sp 00007fff0dac92c8 error 4 in haproxy[400000+54000]
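
In case it helps with reading the kernel line above, here is a small sketch that splits it into its fields. The helper name `decode_segfault` is made up. The interpretation of the error code assumes the usual x86 page-fault bits (bit 0: page present, bit 1: write access, bit 2: user mode), and the numbers are read as hexadecimal.

```python
import re

# Matches the kernel's "segfault at ... ip ... sp ... error ... in obj[base+size]" message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the decoded fields of a kernel segfault log line (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"], 16)
    return {
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset_in_object": int(m["ip"], 16) - int(m["base"], 16),
        "object": m["obj"],
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }
```

For the line above this gives a user-mode read of a page that is not mapped, at address 0x8, with the faulting instruction at offset 0x3e462 inside the haproxy binary. A fault address that small usually points to a field being read through a NULL pointer, though the log line alone cannot confirm that.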

I tried upgrading to 1.4.7 with no luck.

We primarily use haproxy for TCP load balancing of SMTP, RPC (Exchange 2010) 
and RDP. The load may have increased slightly over the last few days, and this 
might be what triggers the crash.
Initially I suspected the cause was that only 512 MB of RAM had been assigned to 
the machine (running x86_64 SuSE on ESX), so I increased that to 1 GB.
Also, ulimit -n showed 1024, so I increased that to 32768 and set sysctl 
fs.file-max = 65535, but that didn't seem to make any difference. I also tried 
disabling some logging, just to see whether that had anything to do with it.
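
As a rough sanity check on the file descriptor limit, here is a sketch that compares the current RLIMIT_NOFILE against an estimate of what a proxy needs. It assumes two descriptors per proxied connection (client side plus server side) plus one per listening socket. The `reserve` of 32 for logs, health checks and the like is an arbitrary guess, and both function names are made up.

```python
import resource


def fds_needed(maxconn, listeners, fds_per_connection=2, reserve=32):
    """Estimate the descriptors needed for maxconn proxied connections (rough guess)."""
    return maxconn * fds_per_connection + listeners + reserve


def check_nofile(maxconn, listeners):
    """Return (needed, soft_limit, enough) for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = fds_needed(maxconn, listeners)
    enough = soft == resource.RLIM_INFINITY or soft >= needed
    return needed, soft, enough
```

With maxconn 5000 and the 10 listening sockets in the configuration below, the estimate is 10042, far above the original limit of 1024. As far as I know, haproxy computes and raises its own limit at startup when it has the privileges to do so, so the shell's ulimit -n is not necessarily the value the running process ends up with.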

It seems to crash every time the total connection count reaches around 800-1000...

lb2:~ # uname -a
Linux lb2 2.6.31.12-0.1-default #1 SMP 2010-01-27 08:20:11 +0100 x86_64 x86_64 
x86_64 GNU/Linux

lb2:~ # haproxy -vv
HA-Proxy version 1.4.7 2010/06/07
Copyright 2000-2010 Willy Tarreau <w...@1wt.eu>

Build options :
  TARGET  = linux26
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g
  OPTIONS = USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
     sepoll : pref=400,  test result OK
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.

###### gdb session #####

lb2:/usr/local/sbin # gdb ./haproxy
GNU gdb (GDB) SUSE (6.8.91.20090930-2.4)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/sbin/haproxy...done.
(gdb) run -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -db
Starting program: /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -db
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
Try: zypper install -C "debuginfo(build-id)=591af1afa33f255704fb6a60859b93d00e205302"
Missing separate debuginfo for /lib64/libcrypt.so.1
Try: zypper install -C "debuginfo(build-id)=b4127c6e9abfb7711018173fc6010b5853a5a781"
Missing separate debuginfo for /lib64/libpcreposix.so.0
Try: zypper install -C "debuginfo(build-id)=89f575b0c91220f553dff19b0f5d8ac786cf3e21"
Missing separate debuginfo for /lib64/libpcre.so.0
Try: zypper install -C "debuginfo(build-id)=faf1aba9b565a29c99ce1d3944978347d6209cc3"
Missing separate debuginfo for /lib64/libc.so.6
Try: zypper install -C "debuginfo(build-id)=b5ded0f18b9b11c5cd6b26387426ead562c332f8"

Program received signal SIGSEGV, Segmentation fault.
eb32_lookup_ge (root=0x656df0, x=2160585045) at ebtree/eb32tree.c:212
212 troot = (eb_untag(troot, EB_LEFT))->b[EB_RGHT];
(gdb)
(gdb) bt
#0  eb32_lookup_ge (root=0x656df0, x=2160585045) at ebtree/eb32tree.c:212
#1  0x00000000004080de in process_runnable_tasks (next=0x7fffffffe64c) at src/task.c:206
#2  0x0000000000402829 in run_poll_loop () at src/haproxy.c:966
#3  0x000000000040426f in main (argc=<value optimized out>, argv=0x7fffffffe788) at src/haproxy.c:1240
(gdb) list
207 while (eb_gettag(troot) != EB_LEFT)
208 /* Walking up from right branch, so we cannot be below root */
209 troot = (eb_root_to_node(eb_untag(troot, EB_RGHT)))->node_p;
210
211 /* Note that <troot> cannot be NULL at this stage */
212 troot = (eb_untag(troot, EB_LEFT))->b[EB_RGHT];
213 if (eb_clrtag(troot) == NULL)
214 return NULL;
215
216 node = eb32_entry(eb_walk_down(troot, EB_LEFT), struct eb32_node, node);


###### /etc/haproxy/haproxy.cfg ######

global
log 127.0.0.1 local0
#log 127.0.0.1 local1 notice
#log netwatch local0 info
maxconn 5000
#chroot /usr/share/haproxy
stats socket /var/run/haproxy.stat mode 600
user haproxy
group haproxy
daemon
#debug
#quiet

defaults
log global
mode    http
option dontlognull
option redispatch
retries 3
maxconn 5000
timeout connect 5000
timeout client 50000
timeout server 50000

# Axapta Terminal Services
listen AXTS_rdp 10.131.25.20:3389
mode tcp
balance roundrobin
option tcpka
#option tcplog
timeout connect 10s
timeout client 3h
timeout server 3h
monitor-net 10.131.25.62/32
server axts1 10.131.25.21 weight 1 check
server axts2 10.131.25.22 weight 1 check

# CAS HTTP balancing
listen CAS_http 10.131.25.75:80
mode http
balance source
option httplog
option httpclose
option forwardfor
monitor-net 10.131.25.62/32
server cas1 10.131.25.73 weight 1 check
server cas2 10.131.25.74 weight 1 check

# CAS HTTPS balancing
listen CAS_https 10.131.25.75:443
mode tcp
balance source
#option tcplog
option ssl-hello-chk
option tcpka
timeout client 3h
timeout server 3h
#clitimeout 180000
#srvtimeout 180000
#contimeout 4000
monitor-net 10.131.25.62/32
server cas1 10.131.25.73 weight 1 check
server cas2 10.131.25.74 weight 1 check

# RPC Endpoint map
listen CAS_rpc 10.131.25.75:135,10.131.25.75:55000,10.131.25.75:55001
mode tcp
balance source
#option tcplog
timeout client 3h
timeout server 3h
monitor-net 10.131.25.62/32
server cas1 10.131.25.73 weight 1 check
server cas2 10.131.25.74 weight 1 check

# CAS POP3s balancing
listen CAS_pop3s 10.131.25.75:995
mode tcp
balance roundrobin
#option tcplog
monitor-net 10.131.25.62/32
server cas1 10.131.25.73 weight 1 check
server cas2 10.131.25.74 weight 1 check

# CAS SMTP balancing
listen CAS_smtp 10.131.25.75:25
mode tcp
balance roundrobin
#option tcplog
monitor-net 10.131.25.62/32
server cas1 10.131.25.73 weight 1 check
server cas2 10.131.25.74 weight 1 check

# Calle Terminal Services
listen CATS_rdp 10.131.25.25:3389
mode tcp
balance roundrobin
option tcpka
#option tcplog
timeout connect 10s
timeout client 3h
timeout server 3h
monitor-net 10.131.25.62/32
server cats1 10.131.25.26 weight 1 check
server cats2 10.131.25.27 weight 1 check

listen stats
# disabled
bind       :8888
stats uri /

###########

Thanks in advance!




