On 2017-01-17 19:58, Willy Tarreau wrote:
> I'm surprised that it only kicks in "sometimes". That makes me wonder
> whether the problem is not here.

I'm starting to wonder if it might be related to network latency between the 
client and HAProxy. More details below.

>    - it would be useful to check (using netstat -atnp) if these eternal
>      FIN_WAIT2 sockets still belong to haproxy or are real orphans. I
>      suspect that most of them are still attached to haproxy given your
>      timeouts; if you manage to find some very old ones still attached
>      to haproxy then it would mean the issue is in haproxy.
Yes, in most cases, if I select a random socket in FIN_WAIT2 from netstat, there will be an HAProxy session associated with that socket. For example:

$ netstat -an | grep FIN_WA | grep 122.56.198.232:26734
tcp        0      0 10.24.20.104:993        122.56.198.232:26734 FIN_WAIT2
$ echo "show sess" | sudo socat unix-connect:/var/lib/haproxy/stats stdio | grep 122.56.198.232:26734 0x7f906fdb7050: proto=tcpv4 src=122.56.198.232:26734 fe=hosting-imaps_proxy-external-vip be=hosting-imaps_proxy-external-vip srv=imap1 ts=08 age=25m42s calls=3 rq[f=848202h,i=0,an=00h,rx=14m45s,wx=,ax=] rp[f=04a020h,i=0,an=00h,rx=,wx=,ax=] s0=[7,0h,fd=6684,ex=] s1=[7,8h,fd=10039,ex=] exp=14m45s

>    - your idea of a race is interesting. I'm wondering what happens if
>      the connection spends more than 60s still attached to haproxy and
>      only then becomes orphaned. It could be possible that the kernel
>      timeout only strikes when the socket age reaches the timeout while
>      already being orphaned.
In my testing, I find that most of the time 'timeout client-fin' is applied. HAProxy sends a FIN to the client, which ACKs the FIN but does not send a FIN of its own. The HAProxy session then enters the 30-second client-fin timeout period. After 30 seconds, the HAProxy session is removed, and the kernel maintains the FIN_WAIT2 state for tcp_fin_timeout of 60 seconds before removing the connection altogether.
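
So with 'timeout client-fin 30s' in my config, and with the kernel at its default, a socket should survive roughly 30s + 60s = 90s after HAProxy sends its FIN. A trivial sanity check (a sketch; it only assumes the 30s value from my config and reads the kernel's actual fin timeout rather than assuming 60s):

# Back-of-the-envelope for the normal lifetime of one of these sockets.
CLIENT_FIN_TIMEOUT = 30   # seconds, from 'timeout client-fin' in the haproxy config

with open("/proc/sys/net/ipv4/tcp_fin_timeout") as f:
    tcp_fin_timeout = int(f.read())

print("haproxy holds the session for %ds after sending its FIN" % CLIENT_FIN_TIMEOUT)
print("kernel then keeps the orphaned FIN_WAIT2 socket for %ds" % tcp_fin_timeout)
print("expected total after the close: %ds" % (CLIENT_FIN_TIMEOUT + tcp_fin_timeout))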

Aha! Testing just now, I managed to catch one.

To test, I open a connection to the HAProxy VIP with my Python test client. After 3 minutes, an idle timeout on the backend server causes the connection to be closed. My client doesn't close its end though, leaving the client-side connection in FIN_WAIT2.
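
For reference, the test client is roughly along these lines (a minimal sketch; the address and port are placeholders). It never writes or closes its socket, it just reads until the peer sends its FIN and then sits there:

#!/usr/bin/env python3
# Minimal sketch of the test client described above: connect, stay idle until
# the backend's idle timeout closes the connection, then deliberately keep our
# end open so the proxy's side of the connection is left in FIN_WAIT2.
import socket
import time

VIP = ("haproxy-vip.example.com", 143)   # placeholder address and port

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(VIP)
print("connected from", s.getsockname())

# Read until the far end sends its FIN (recv() returns b'' on EOF). We never
# write anything, so after ~3 minutes the backend's idle timeout fires.
while True:
    data = s.recv(4096)
    if not data:
        print("peer closed its side; keeping ours open")
        break
    print("received:", data)

# Do NOT close or shutdown the socket: just sit here, leaving the proxy's
# side of the connection stuck in FIN_WAIT2.
time.sleep(3600)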

Here's the HAProxy session at 2m19s:

$ echo "show sess" | sudo socat unix-connect:/var/lib/haproxy/stats stdio | grep 146.185.142.165 0x7f9075ba98c0: proto=tcpv4 src=146.185.142.165:56198 fe=hosting-imap_proxy-external-vip be=hosting-imap_proxy-external-vip srv=imap1 ts=08 age=2m19s calls=2 rq[f=808000h,i=0,an=00h,rx=32m41s,wx=,ax=] rp[f=048202h,i=0,an=00h,rx=32m41s,wx=,ax=] s0=[7,8h,fd=14490,ex=] s1=[7,8h,fd=14919,ex=] exp=32m41s

And then again at 3m16s after the backend has closed its side of the connection:

$ echo "show sess" | sudo socat unix-connect:/var/lib/haproxy/stats stdio | grep 146.185.142.165 0x7f9075ba98c0: proto=tcpv4 src=146.185.142.165:56198 fe=hosting-imap_proxy-external-vip be=hosting-imap_proxy-external-vip srv=imap1 ts=08 age=3m16s calls=3 rq[f=808000h,i=0,an=00h,rx=31m44s,wx=,ax=] rp[f=04a020h,i=0,an=00h,rx=,wx=,ax=] s0=[7,0h,fd=14490,ex=] s1=[7,8h,fd=14919,ex=] exp=31m44s
$ netstat -an | grep 146.185.142.165:56198
tcp        0      0 10.24.20.104:143        146.185.142.165:56198 FIN_WAIT2

Note that the exp value is 31m44s, not the ~14s we'd expect if timeout client-fin had been applied (the 30s client-fin timeout minus the ~16s that have passed since the backend closed its side).

To compare, here's what usually happens. At age=2m59s just before the server disconnection:

$ echo "show sess" | sudo socat unix-connect:/var/lib/haproxy/stats stdio | grep 146.185.142.165 0x7f907f8e1ef0: proto=tcpv4 src=146.185.142.165:56202 fe=hosting-imap_proxy-external-vip be=hosting-imap_proxy-external-vip srv=imap1 ts=08 age=2m59s calls=2 rq[f=808000h,i=0,an=00h,rx=32m,wx=,ax=] rp[f=048202h,i=0,an=00h,rx=32m,wx=,ax=] s0=[7,8h,fd=20385,ex=] s1=[7,8h,fd=20440,ex=] exp=32m

And then at 3m:

$ echo "show sess" | sudo socat unix-connect:/var/lib/haproxy/stats stdio | grep 146.185.142.165 0x7f907f8e1ef0: proto=tcpv4 src=146.185.142.165:56202 fe=hosting-imap_proxy-external-vip be=hosting-imap_proxy-external-vip srv=imap1 ts=08 age=3m calls=3 rq[f=808000h,i=0,an=00h,rx=29s,wx=,ax=] rp[f=04a020h,i=0,an=00h,rx=,wx=,ax=] s0=[7,0h,fd=20385,ex=] s1=[7,8h,fd=20440,ex=] exp=29s

Here you can see exp=29s, suggesting that the client-fin timeout has kicked in.

I still don't know what causes the two different behaviours. The abnormal behaviour is quite difficult to trigger in my testing (it only happens maybe 1 time in 10), but with a session rate of up to 80/s on the production service, it occurs frequently.

I don't have enough data to back this up yet, but I wonder if network latency is involved somehow. For most of today, testing from a machine with a 3ms RTT to the service, I was unable to reproduce the problem. Just now, though, I tested from a VM in Amsterdam with a 350ms RTT and was able to reproduce the problem in 2 of 4 attempts. It's anecdata at this point, but it would support the race condition theory.

Thanks,
Richard