Hi Tim,

On Tue, Jul 19, 2016 at 09:42:44PM -0600, Tim Butler wrote:
> Hi,
> 
> I'm looking for any enlightenment or suggestions on pursuing
> the following problem.
> 
> After bouncing a server with two tcp-mode sessions,
> my second reconnecting session hangs,
> even though the frontend socket recv buffer is full
> and the server is connected. netstat shows all parties connected
> with data on haproxy's doorstep.
> 
> It may very well be that the client code is doing something bad,
> but my final hung state appears as if haproxy has stopped processing the
> session even though the frontend and backend are connected.
> Based on the traces below, it appears to me that in the failure case,
> the frontend fd is removed from polling, even though netstat says data is
> available and the server is connected.

The fact that it's removed from polling only means that no more events are
expected from it. Here's what I'm seeing:

> epoll_ctl: op=2, fd=9, ev=0, eo=24, en=20    <-- fe fd is deleted from polling and never added again

The old event state was 0x24, which means "ready for write, polling for read".
Its removal (op=2 is EPOLL_CTL_DEL) only means that no polling is needed
anymore, and that is the case here: the new state 0x20 means "ready for
write, disabled for read". To me this suggests that a close on read was
received from the client and forwarded to the server side, and that we're now
waiting for the server to close in turn before forwarding that close to the
client.
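To make such traces easier to read, here is a small decoder for the "op",
"eo" and "en" fields. It's only a sketch: the flag names and the layout (read
state in the low 4 bits, write state in the next 4, with 0x1/0x2/0x4 meaning
active/ready/polled) are assumptions inferred from the 0x24 and 0x20 values
discussed above, not copied from HAProxy's headers. Only the epoll_ctl() op
constants come from <sys/epoll.h>.

```c
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical per-direction flags, inferred from the 0x24/0x20 values
 * discussed above (not taken from HAProxy's own headers). */
#define EV_ACTIVE 0x1u
#define EV_READY  0x2u
#define EV_POLLED 0x4u
#define SHIFT_RD  0   /* read state: low nibble   */
#define SHIFT_WR  4   /* write state: next nibble */

/* Maps the "op=" field of the epoll_ctl() trace to the Linux constant name. */
const char *epoll_op_name(int op)
{
    switch (op) {
    case EPOLL_CTL_ADD: return "EPOLL_CTL_ADD";
    case EPOLL_CTL_DEL: return "EPOLL_CTL_DEL";
    case EPOLL_CTL_MOD: return "EPOLL_CTL_MOD";
    default:            return "unknown";
    }
}

/* Writes the flags set for one direction ("ready", "polled", ...) into buf,
 * or "disabled" when none of the three assumed flags is set. */
void describe_dir(unsigned int state, int shift, char *buf, size_t len)
{
    unsigned int s = (state >> shift) & 0x7u;
    size_t n;

    if (s == 0) {
        snprintf(buf, len, "disabled");
        return;
    }
    snprintf(buf, len, "%s%s%s",
             (s & EV_ACTIVE) ? "active " : "",
             (s & EV_READY)  ? "ready "  : "",
             (s & EV_POLLED) ? "polled " : "");
    n = strlen(buf);
    if (n > 0 && buf[n - 1] == ' ')
        buf[n - 1] = '\0';   /* drop the trailing separator */
}

/* Nonzero if either direction still asks the poller to watch the fd. */
int needs_polling(unsigned int state)
{
    return ((state >> SHIFT_RD) & EV_POLLED) != 0 ||
           ((state >> SHIFT_WR) & EV_POLLED) != 0;
}
```

With these helpers, the read side of 0x24 decodes as "polled" and that of
0x20 as "disabled", while the write side stays "ready" in both; and
needs_polling(0x20) is false, which is consistent with the fd being deleted
from epoll because nothing is left to wait for, not because it was lost.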

Since you're working in TCP mode, I suspect you're relying on long timeouts;
if the server remains unreachable in the meantime, the session may have to
wait for the full timeout to expire. You may want to experiment with
"timeout server-fin": if it fixes the behaviour, that will confirm this is
what you're observing. Note that on Linux you also have "tcp-ut" on the server
side to indicate how long the kernel should consider the socket alive when
the peer does not acknowledge data. It's even more robust, as it doesn't have
to wait for your client to time out first.
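For reference, the kernel mechanism behind this kind of timeout is, as far as
I know, Linux's TCP_USER_TIMEOUT socket option (available since 2.6.37),
expressed in milliseconds: the maximum time transmitted data may remain
unacknowledged before the kernel gives up on the connection. The sketch below
only shows the raw setsockopt()/getsockopt() calls on a plain socket. The
helper names are made up for illustration, and it assumes "tcp-ut" maps onto
this option rather than documenting how HAProxy applies it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sets TCP_USER_TIMEOUT (milliseconds) on fd.
 * Assumes Linux >= 2.6.37. Returns 0 on success, -1 on error. */
int set_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}

/* Hypothetical helper: reads the value back to check the kernel kept it. */
int get_user_timeout(int fd, unsigned int *timeout_ms)
{
    socklen_t len = sizeof(*timeout_ms);

    return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms, &len);
}

/* Creates an IPv4 TCP socket with TCP_USER_TIMEOUT already applied, or
 * returns -1 (closing the socket) if either step fails. */
int tcp_socket_with_user_timeout(unsigned int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (set_user_timeout(fd, timeout_ms) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For example, tcp_socket_with_user_timeout(30000) returns a socket on which
the kernel would abort the connection once sent data has gone unacknowledged
for roughly 30 seconds, independently of any application-level timeout.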

Please test this and keep us updated.

thanks!
Willy