I sent some strace logs to Willy in private email
and meanwhile I've been taking a crash course on haproxy internals.

I only have a superficial understanding (or misunderstanding) from my initial
look, but perhaps the following are some clues to what is happening.
The test case fails for me about 80% of the time and is sensitive to timing
and to delays introduced by debugger tracing.

The scenario is that a server restarts, causing two client connections to
reconnect. The first client reconnects on fd=8 and blasts some data.
haproxy attempts a new server connection on fd=9, but the server isn't fully
available yet, so haproxy closes fd=9. The first client session isn't
terminated yet, though, because it looks like haproxy retries server
connections (see the illustrative config knobs below).
However, the first client's stream still references fd=9 on the server side.
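
(Illustrative only -- these aren't necessarily our settings, but I believe
these are the standard config knobs that govern that retry behaviour:)

defaults
    retries 3           # connection retries after a failed server connect
    timeout connect 5s  # per-attempt connect timeout
    option redispatch   # may switch servers on the last retry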

The second client connects and is assigned a recycled fd=9.
haproxy starts reading whatever data it can from both clients
while retrying server connections.
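
Side note for anyone following along: the fd number reuse itself is standard
POSIX behaviour, since the kernel always hands out the lowest free descriptor
number. A tiny standalone demo (nothing haproxy-specific):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int a = open("/dev/null", O_RDONLY);   /* lowest free fd, e.g. 3 */
    int b = open("/dev/null", O_RDONLY);   /* next one up, e.g. 4 */
    printf("a=%d b=%d\n", a, b);

    close(a);                              /* frees the lower number */

    int c = open("/dev/null", O_RDONLY);   /* gets a's old number back */
    printf("c=%d (same number a had)\n", c);
    return 0;
}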

Eventually a stream_interface connection wake handler for the first session
(input fd=8) tries to signal its server side that data is available, perhaps
through stream_int_chk_snd_conn. That function doesn't appear in the stack
trace, but si_chk_snd is an inline that dispatches through si->ops->chk_snd,
so the intermediate frame may simply have been optimized away. Something then
looks bad to the handler (some hand waving here) and it directs the server
side to stop both reading and writing.
** But the server-side stream_interface still refers to fd=9, which is
actually now the input to the second session. The second session hangs.
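
To make my hand waving concrete, here is my mental model of the hazard as a
heavily simplified, hypothetical C sketch. The struct and both functions are
made up for illustration; only fd_stop_both() is a real haproxy function,
stubbed out here so the sketch stands alone:

#include <unistd.h>

/* Stand-in for haproxy's fd_stop_both(), which disables both poll
 * directions for the given fd. */
static void fd_stop_both(int fd) { (void)fd; }

/* Hypothetical struct, nothing like the real haproxy types. */
struct conn {
    int fd;                     /* cached descriptor number */
};

static void server_conn_failed(struct conn *srv)
{
    close(srv->fd);             /* fd=9 returns to the kernel's free pool */
    /* srv->fd still says 9; the stream keeps this conn around
     * because a server retry is pending. */
}

static void wake_handler(struct conn *srv)
{
    /* By now a new accept() has been handed fd=9, so this call
     * silences the *second* session's client socket instead. */
    fd_stop_both(srv->fd);
}

int main(void)
{
    struct conn srv = { .fd = 9 };
    server_conn_failed(&srv);   /* first session's server connect fails */
    /* ...second client is accepted here and receives fd=9... */
    wake_handler(&srv);         /* acts on the recycled number */
    return 0;
}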

Here is a high-level trace of some hopefully relevant events, starting with
the accept of the first reconnection (fd=8) and running up to the point where
fd_stop_both(fd=9) is called on the second session's input fd as part of
processing the first session.

After the backtrace I also print (from gdb) the connections attached to the
first session's two stream interfaces: note that the client side holds fd=8
while the server side still holds fd=9, which by now actually belongs to the
second session.

00000002:test_og_in.accept(0007)=0008 from [10.59.159.20:36494]
create_server_socket: fd=0009
epoll_ctl: sts=0, op=1, fd=0008 ev=2001, eo=21, en=25
== do_poll ==
fd_process_cached_events: fd=0009, eo=32, en=12
epoll_ctl: sts=0, op=1, fd=0009 ev=4, eo=12, en=52
== do_poll ==
epoll: fd=0008 e=00000001, n=01, fdtab.ev.old=00, fdtab.ev.new=01
== do_poll ==
epoll: fd=0008 e=00000001, n=01, fdtab.ev.old=00, fdtab.ev.new=01
epoll: fd=0009 e=0000001c, n=1c, fdtab.ev.old=00, fdtab.ev.new=1c
fd_stop_both: fd=0009
fd_process_cached_events: fd=0009, eo=72, en=62
close: fd=0009
== do_poll ==
epoll: fd=0007 e=00000001, n=01, fdtab.ev.old=01, fdtab.ev.new=01
00000003:test_og_in.accept(0007)=0009 from [10.59.159.20:36495]
fd_process_cached_events: fd=0009, eo=23, en=21
create_server_socket: fd=000a
epoll_ctl: sts=0, op=1, fd=0009 ev=2001, eo=21, en=25
== do_poll ==
epoll: fd=0009 e=00000001, n=01, fdtab.ev.old=00, fdtab.ev.new=01
fd_process_cached_events: fd=0009, eo=27, en=25
epoll_ctl: sts=0, op=1, fd=000a ev=4, eo=12, en=52
== do_poll ==
epoll: fd=0009 e=00000001, n=01, fdtab.ev.old=00, fdtab.ev.new=01
fd_process_cached_events: fd=0009, eo=27, en=25
== do_poll ==
epoll: fd=0009 e=00000001, n=01, fdtab.ev.old=00, fdtab.ev.new=01
fd_process_cached_events: fd=0009, eo=27, en=25
== do_poll ==
epoll: fd=0008 e=00000001, n=01, fdtab.ev.old=00, fdtab.ev.new=01
epoll: fd=000a e=0000001c, n=1c, fdtab.ev.old=00, fdtab.ev.new=1c
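
Decoding the hex masks above (they're the standard epoll event bits): the
client registrations ev=2001 are EPOLLIN|EPOLLRDHUP, the pending connects
ev=4 are EPOLLOUT, and the e=0000001c reported for fd=0009 and fd=000a is
EPOLLOUT|EPOLLERR|EPOLLHUP, i.e. the connect attempt failed. A quick
throwaway decoder:

#include <stdio.h>
#include <sys/epoll.h>

/* Print the epoll event bits set in a mask like the e=/ev= values
 * in the trace above. */
static void decode(unsigned int e)
{
    printf("%08x =", e);
    if (e & EPOLLIN)    printf(" IN");
    if (e & EPOLLPRI)   printf(" PRI");
    if (e & EPOLLOUT)   printf(" OUT");
    if (e & EPOLLERR)   printf(" ERR");
    if (e & EPOLLHUP)   printf(" HUP");
    if (e & EPOLLRDHUP) printf(" RDHUP");
    printf("\n");
}

int main(void)
{
    decode(0x2001);  /* client socket registration: IN RDHUP */
    decode(0x0004);  /* connect in progress: OUT */
    decode(0x001c);  /* failed connect: OUT ERR HUP */
    return 0;
}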

Program received signal SIGINT, Interrupt.
0x00007ffff762b067 in __GI_raise (sig=sig@entry=2) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0  0x00007ffff762b067 in __GI_raise (sig=sig@entry=2) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x0000000000416542 in fd_stop_both (fd=9) at src/fd.c:399
#2  0x000000000045fe78 in si_chk_snd (si=<optimized out>) at include/proto/stream_interface.h:342
#3  stream_int_notify (si=si@entry=0x7411c8) at src/stream_interface.c:497
#4  0x00000000004603a4 in si_conn_wake_cb (conn=0x740d50) at src/stream_interface.c:571
#5  0x000000000043f8d6 in conn_fd_handler (fd=<optimized out>) at src/connection.c:135
#6  0x0000000000415f04 in fd_process_cached_events () at src/fd.c:226
#7  0x00000000004070c2 in run_poll_loop () at src/haproxy.c:1591
#8  0x0000000000404017 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:1933
(gdb)


(gdb) frame
#3  stream_int_notify (si=si@entry=0x7421c8) at src/stream_interface.c:497
497                     si_chk_snd(si_opposite(si));


(gdb) print *((struct connection *)(si->end))
$8 = {obj_type = OBJ_TYPE_CONN, err_code = 0 '\000', send_proxy_ofs = 0, flags = 2134790,
  ctrl = 0x6c31c0 <proto_tcpv4>, xprt = 0x6c2600 <raw_sock>,
  data = 0x6c2980 <si_conn_cb>, xprt_ctx = 0x0, owner = 0x7421c8, xprt_st = 0,
  t = {sock = {fd = 8}}, target = 0x703970, list = {n = 0x741d90, p = 0x741d90},
  proxy_netns = 0x0, addr = {from = {ss_family = 2, __ss_align = 0,
      __ss_padding = "\016\000\000\000\000\000\000\000\240\302\231\367\377\177\000\000\016\275J\000\000\000\000\000U\262f\367\377\177\000\000\016\000\000\000\000\000\000\000Q\251f\367\377\177\000\000\240\302\231\367\377\177\000\000\001", '\000' <repeats 39 times>, "\240\302\231\367\377\177\000\000\"\373e\367\377\177\000"},
    to = {ss_family = 0, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}}}

(gdb) print *((struct connection *)(si_opposite(si)->end))
$9 = {obj_type = OBJ_TYPE_CONN, err_code = 0 '\000', send_proxy_ofs = 0, flags = 6070272,
  ctrl = 0x6c31c0 <proto_tcpv4>, xprt = 0x6c2600 <raw_sock>,
  data = 0x6c2980 <si_conn_cb>, xprt_ctx = 0x0, owner = 0x7421e8, xprt_st = 0,
  t = {sock = {fd = 9}}, target = 0x70d390, list = {n = 0x742310, p = 0x742310},
  proxy_netns = 0x0, addr = {from = {ss_family = 0, __ss_align = 0,
      __ss_padding = '\000' <repeats 111 times>},
    to = {ss_family = 2, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}}}

-tim




On 2016-07-20 4:37 PM, Willy Tarreau wrote:
> On Wed, Jul 20, 2016 at 02:27:28PM -0600, Tim Butler wrote:
> > Hi Willy,
> >
> > First of all, thanks for your time and your quick reply.
> > This is my first dive into the code, so that is why the lack of polling
> > seemed suspicious. I see now that the cached events are used to avoid
> > polling.
> >
> > I applied both the 'timeout server-fin' and 'tcp-ut' changes
> > with no effect.
>
> OK.
>
> > More than that, at the time the 2 client sessions reconnect,
> > there are no server connections.
> > The second set of traces I sent appears to show haproxy making
> > multiple connection attempts to the server for the first reconnected
> > client until the server connection succeeds.
> > Then comes a server reconnect for the second session, but in the failure
> > case that client is just hanging out "disabled for read". No data is read
> > for the second session even though netstat shows a full recv buffer on
> > the second socket and all sockets (client/haproxy/server) are in
> > CONNECTED states. The clients connect to haproxy and then blast a bunch
> > of data even though the haproxy server connections are still underway.
> > Even if the clients closed after sending some data, wouldn't the data
> > sent before the close make it to a server? In this case I'm not expecting
> > the clients to close, but they could have a defect.
>
> Well, it's still difficult for me to get the detailed sequence here. Could
> you please try to run strace -tts200 -o log -p $pid on the haproxy process
> during a successful case and a failed one? It should show whether or not
> something unexpected happens.
>
> Thanks,
> Willy

