On 2017-01-17 19:58, Willy Tarreau wrote:
> I'm surprised that it only kicks in "sometimes". That makes me wonder
> whether the problem is not here.
I'm starting to wonder if it might be related to network latency between the
client and HAProxy. More details below.
> - it would be useful to check (using netstat -atnp) if these eternal
>   FIN_WAIT2 sockets still belong to haproxy or are real orphans. I
>   suspect that most of them are still attached to haproxy given your
>   timeouts; if you manage to find some very old ones still attached
>   to haproxy then it would mean the issue is in haproxy.
Yes, in most cases if I select a random socket in FIN_WAIT2 from
netstat, there will be an HAProxy session associated with that socket. e.g.
$ netstat -an | grep FIN_WA | grep 122.56.198.232:26734
tcp 0 0 10.24.20.104:993 122.56.198.232:26734 FIN_WAIT2
$ echo "show sess" | sudo socat unix-connect:/var/lib/haproxy/stats
stdio | grep 122.56.198.232:26734
0x7f906fdb7050: proto=tcpv4 src=122.56.198.232:26734
fe=hosting-imaps_proxy-external-vip be=hosting-imaps_proxy-external-vip
srv=imap1 ts=08 age=25m42s calls=3
rq[f=848202h,i=0,an=00h,rx=14m45s,wx=,ax=]
rp[f=04a020h,i=0,an=00h,rx=,wx=,ax=] s0=[7,0h,fd=6684,ex=]
s1=[7,8h,fd=10039,ex=] exp=14m45s
> - your idea of a race is interesting. I'm wondering what happens if
>   the connection spends more than 60s still attached to haproxy and
>   only then becomes orphaned. It could be possible that the kernel
>   timeout only strikes when the socket age reaches the timeout while
>   already being orphaned.
In my testing, I find that most of the time, 'timeout client-fin' is
applied. HAProxy sends a FIN to the client, which acks the FIN, but does
not send a FIN of its own. The HAProxy session moves into the client-fin
timeout period of 30 seconds. After 30 seconds, the HAProxy session is
removed. The FIN_WAIT2 connection state is then maintained by the kernel
for tcp_fin_timeout of 60 seconds, before being removed altogether.
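The 60-second figure is the kernel's timeout for orphaned FIN_WAIT2 sockets; a minimal sketch for checking it directly (assumes a Linux /proc filesystem):

```python
# Read the kernel's timeout for orphaned FIN_WAIT2 sockets (Linux only).
# This is the ~60s window that starts once haproxy drops the session.
with open("/proc/sys/net/ipv4/tcp_fin_timeout") as f:
    fin_timeout = int(f.read().strip())

print(f"net.ipv4.tcp_fin_timeout = {fin_timeout}s")
```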
Aha! Testing just now, I managed to catch one.
To test, I open a connection to the HAProxy VIP with my python test
client. After 3 minutes, an idle timeout on the backend server causes
the connection to be closed. My client doesn't close its end though,
leaving the client side connection in FIN_WAIT2.
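For anyone wanting to reproduce this locally, the behaviour of my test client can be sketched roughly like so (loopback-only; the server thread is a stand-in for the backend's idle timeout, and all names and timings here are made up):

```python
import socket
import threading
import time

def idle_server(port_holder, idle_timeout=0.5):
    # Stand-in for the backend: accept one connection, sit idle for the
    # timeout, then close its side, which sends a FIN to the client.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port_holder.append(srv.getsockname()[1])
    conn, _ = srv.accept()
    time.sleep(idle_timeout)
    conn.close()   # server-side FIN; server socket heads for FIN_WAIT2
    srv.close()

port_holder = []
t = threading.Thread(target=idle_server, args=(port_holder,))
t.start()
while not port_holder:
    time.sleep(0.01)

# The "misbehaving" client: connect, then never close our side.
client = socket.socket()
client.connect(("127.0.0.1", port_holder[0]))

# recv() returning b"" tells us the peer sent a FIN...
assert client.recv(1024) == b""
# ...but we deliberately keep our socket open, leaving the peer's
# side of the connection parked in FIN_WAIT2 until its timeout fires.
t.join()
print("peer closed; client side intentionally left open")
```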
Here's the HAProxy session at 2m19s:
$ echo "show sess" | sudo socat unix-connect:/var/lib/haproxy/stats
stdio | grep 146.185.142.165
0x7f9075ba98c0: proto=tcpv4 src=146.185.142.165:56198
fe=hosting-imap_proxy-external-vip be=hosting-imap_proxy-external-vip
srv=imap1 ts=08 age=2m19s calls=2
rq[f=808000h,i=0,an=00h,rx=32m41s,wx=,ax=]
rp[f=048202h,i=0,an=00h,rx=32m41s,wx=,ax=] s0=[7,8h,fd=14490,ex=]
s1=[7,8h,fd=14919,ex=] exp=32m41s
And then again at 3m16s after the backend has closed its side of the
connection:
$ echo "show sess" | sudo socat unix-connect:/var/lib/haproxy/stats
stdio | grep 146.185.142.165
0x7f9075ba98c0: proto=tcpv4 src=146.185.142.165:56198
fe=hosting-imap_proxy-external-vip be=hosting-imap_proxy-external-vip
srv=imap1 ts=08 age=3m16s calls=3
rq[f=808000h,i=0,an=00h,rx=31m44s,wx=,ax=]
rp[f=04a020h,i=0,an=00h,rx=,wx=,ax=] s0=[7,0h,fd=14490,ex=]
s1=[7,8h,fd=14919,ex=] exp=31m44s
$ netstat -an | grep 146.185.142.165:56198
tcp 0 0 10.24.20.104:143 146.185.142.165:56198 FIN_WAIT2
Note that the exp value is 31m44s, not ~14s as we'd expect if timeout
client-fin had been applied.
To compare, here's what usually happens. At age=2m59s just before the
server disconnection:
$ echo "show sess" | sudo socat unix-connect:/var/lib/haproxy/stats
stdio | grep 146.185.142.165
0x7f907f8e1ef0: proto=tcpv4 src=146.185.142.165:56202
fe=hosting-imap_proxy-external-vip be=hosting-imap_proxy-external-vip
srv=imap1 ts=08 age=2m59s calls=2
rq[f=808000h,i=0,an=00h,rx=32m,wx=,ax=]
rp[f=048202h,i=0,an=00h,rx=32m,wx=,ax=] s0=[7,8h,fd=20385,ex=]
s1=[7,8h,fd=20440,ex=] exp=32m
And then at 3m:
$ echo "show sess" | sudo socat unix-connect:/var/lib/haproxy/stats
stdio | grep 146.185.142.165
0x7f907f8e1ef0: proto=tcpv4 src=146.185.142.165:56202
fe=hosting-imap_proxy-external-vip be=hosting-imap_proxy-external-vip
srv=imap1 ts=08 age=3m calls=3 rq[f=808000h,i=0,an=00h,rx=29s,wx=,ax=]
rp[f=04a020h,i=0,an=00h,rx=,wx=,ax=] s0=[7,0h,fd=20385,ex=]
s1=[7,8h,fd=20440,ex=] exp=29s
Here you can see exp=29s, suggesting that the client-fin timeout has
kicked in.
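Spotting stuck sessions in bulk is easier if the exp durations are converted to seconds and compared against timeout client-fin; a hypothetical helper (the 30s value is the client-fin timeout from my config, mentioned above):

```python
import re

def to_seconds(dur):
    # Convert HAProxy "show sess" durations like "31m44s" or "29s"
    # into plain seconds.
    total = 0
    for value, unit in re.findall(r"(\d+)([hms])", dur):
        total += int(value) * {"h": 3600, "m": 60, "s": 1}[unit]
    return total

CLIENT_FIN = 30  # timeout client-fin, in seconds

# Any exp value well above client-fin suggests the timeout was
# never applied to that session.
print(to_seconds("31m44s") > CLIENT_FIN)  # True: this session is stuck
print(to_seconds("29s") > CLIENT_FIN)     # False: timeout applied
```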
I still don't know what causes the two different behaviours. It's quite
difficult to trigger the abnormal behaviour in my testing (only happens
maybe 1 time in 10), but with a session rate of up to 80/s on the
production service, it occurs frequently.
I don't have enough data to back this up yet, but I wonder if network
latency might be involved somehow. For most of today, testing from a
machine with a 3ms RTT to the service, I was unable to reproduce the
problem. Just now though, I've tested from a VM in Amsterdam with a
350ms RTT, and have been able to reproduce the problem 2 times in 4
attempts. Anecdata at this point, but it lends support to the race
condition theory.
Thanks,
Richard