Hi Will,

On Wed, Apr 10, 2013 at 05:00:33PM -0700, Will Glass-Husain wrote:
> Hi,
>
> I have more info on this. It seems that this problem occurs when I bring a
> haproxy server up and there's already an existing haproxy server in the
> peer group
> --> Server A is running and handles requests
> --> Server B comes up and immediately dies.
>
> I believe if both servers are already up and the request using the stick
> table goes to A, B dies at that point as well. (Need to check this case
> further).
>
> One thing to note is that I am using two stick tables used by about 10 back
> ends. Excerpt:
>
> peers mypeers
>     peer www2-new 10.0.3.174:1024
>     peer www1-new 10.0.2.85:1024
>
> backend simulate
>     option httpchk OPTIONS /simulate/api/status
>     stick-table type string len 40 size 5M expire 30m peers mypeers
>     stick store-response set-cookie(SIMULATE_STICKY_SESSION) table simulate
>     stick on cookie(SIMULATE_STICKY_SESSION) table simulate
>     stick on url_param(SIMULATE_STICKY_SESSION,;) table simulate
>
>     server app1 10.0.2.11:8080 cookie app1 check inter 10000
>     server app2 10.0.3.11:8080 cookie app2 check inter 10000
>
> backend simulate-qa
>     option httpchk OPTIONS /simulate/api/status
>
>     stick-table type string len 40 size 5M expire 30m peers mypeers
>     stick store-response set-cookie(SIMULATE_STICKY_SESSION) table simulate-qa
>     stick on cookie(SIMULATE_STICKY_SESSION) table simulate-qa
>     stick on url_param(SIMULATE_STICKY_SESSION,;) table simulate-qa
>
>     server qa1 10.0.2.125:8080 cookie qa1 check inter 10000
>     server qa2 10.0.3.125:8080 cookie qa2 check inter 10000
>
>
> Here's the gdb output, following instructions from Lukas.
>
> -------------------
>
> gdb ./haproxy /tmp/core-haproxy-6-117-126-25675-1365638027
>
> Reading symbols from /home/wglass/haproxy-1.5-dev18/haproxy...done.
> [New LWP 25675]
>
> warning: Can't read pathname for load map: Input/output error.
> Core was generated by `./haproxy -f /etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid'.
> Program terminated with signal 6, Aborted.
> #0 0x00007f1c2fd5c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>
> (gdb) bt
> #0 0x00007f1c2fd5c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007f1c2fd5fb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x00007f1c2fd9a39e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3 0x00007f1c2fe30807 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #4 0x00007f1c2fe307d0 in __stack_chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #5 0x000000000043f902 in peer_io_handler (si=0xeec028) at src/peers.c:1041

I don't like this one; it looks like something has overflowed the stack
in this function. I don't understand what, since there are only pointers
on this stack, no user data. Maybe one of them is improperly referenced.

Could you please check whether dev17 has the same issue, just in case?
There were very few changes to the peers code between the two, but I
don't remember any such report recently.

Hmmm, are some of your stick-tables of type "string"? I'm seeing
something I don't like at all in the peers code: stkey.key uses its buf
without allocating room for it. It could very well be the cause of what
you're seeing.

So in practice, it will work as long as the string you're receiving is
shorter than the size of the union, which is that of an in6_addr
(16 bytes), but above that it will do random crap (unless I'm mistaken,
of course).

Thanks,
Willy
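
To illustrate the suspected problem, here is a minimal sketch in C. It
is not the actual haproxy code: `union key_data`, `key_fits_inline`,
`struct safe_key`, `safe_key_set` and `KEY_MAX_LEN` are all hypothetical
names, and the union layout is only an assumption based on the
description above (largest member is an IPv6 address). It shows why a
40-byte string key cannot live in a 16-byte union, and what reserving
explicit room for the key might look like.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>   /* struct in_addr, struct in6_addr */

/* Hypothetical stand-in for the key storage: a union whose largest
 * member is an IPv6 address (16 bytes). The real haproxy definition
 * may differ. */
union key_data {
    struct in_addr  ip;
    struct in6_addr ipv6;
    unsigned int    integer;
    unsigned char   buf[sizeof(struct in6_addr)];
};

/* Assumed to match "len 40" in the stick-table configuration. */
#define KEY_MAX_LEN 40

/* Returns 1 if a string key of key_len bytes fits in the union alone,
 * 0 if writing it there would run past the end of the union. */
static int key_fits_inline(size_t key_len)
{
    return key_len <= sizeof(union key_data);
}

/* One possible fix: a key that carries explicitly reserved room for
 * the longest configured string, plus a terminating NUL. */
struct safe_key {
    unsigned char buf[KEY_MAX_LEN + 1];
    size_t len;
};

/* Copies len bytes from src into key. Returns 0 on success, or -1
 * (without writing anything) if the string would not fit. */
static int safe_key_set(struct safe_key *key, const char *src, size_t len)
{
    assert(key != NULL && src != NULL);
    if (len > KEY_MAX_LEN)
        return -1;
    memcpy(key->buf, src, len);
    key->buf[len] = '\0';
    key->len = len;
    return 0;
}
```

With this sketch, `key_fits_inline(40)` returns 0, which matches the
reasoning above: a key longer than 16 bytes written into the bare union
would spill into whatever sits next to it on the stack, and the stack
protector then aborts the process.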

