Hi,
We have a setup where there's haproxy (1.4.15) in front of a few apache
servers. We have a test suite that occasionally fails with 502 when going
through haproxy (this doesn't happen when going directly to one of the
apache servers). I was able to have a tcp capture while that was happening,
and I do see some strange behavior. From what I can see haproxy sends a
request to the apache server, then hangs up the tcp connection (while
response is arriving), and reopens a second tcp connection on which it
sends nothing. After 20 seconds apache closes down the second connection
and haproxy sends 502 to the client.
The haproxy log shows the following:
May 17 14:03:20 localhost haproxy[3695]:
10.214.131.22:38909[17/May/2012:14:02:59.347] api-http-ipv4
radosgw-http/plana93-1
0/0/2/-1/21020 502 204 - - SH-- 0/0/0/0/1 0/0 "GET
/jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1"
Which means, if I decipher it correctly, that the server hung the
connection after more than 20 seconds. However, looking at the server logs
I couldn't find evidence for that happening. I do see the server responding
immediately.
I was taking a tcpdump while that was happening, and I do see some strange
behavior. The haproxy log shows the following (captured on the haproxy
machine):
10.214.131.22 client
10.214.131.25 haproxy
10.214.133.23 apache
1. client <-> haproxy
Client sends request to haproxy
18073 2.787839 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [SYN]
Seq=0 Win=14600 Len=0 MSS=1460 TSV=25089868 TSER=0 WS=7
18074 2.787867 10.214.131.25 -> 10.214.131.22 TCP http > 38909 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=25177841 TSER=25089868 WS=7
18075 2.787948 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [ACK]
Seq=1 Ack=1 Win=14720 Len=0 TSV=25089868 TSER=25177841
18076 2.788028 10.214.131.22 -> 10.214.131.25 HTTP GET
/jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
2. haproxy <-> apache (connection 1)
haproxy sends request to apache
18077 2.788185 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [SYN]
Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
18078 2.789498 10.214.133.23 -> 10.214.131.25 TCP http > 39579 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
18079 2.789530 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [ACK]
Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
18080 2.789576 10.214.131.25 -> 10.214.133.23 HTTP GET
/jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
18081 2.789605 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [FIN, ACK]
Seq=333 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
3. haproxy <-> apache (connection 2)
haproxy creates a second tcp connection to the same apache server
18082 2.789650 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [SYN]
Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
18083 2.790896 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
18084 2.790926 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [ACK]
Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
4. haproxy <-> apache (connection 1)
apache returns http header + part of the data. HTTP header is valid.
18085 2.790938 10.214.133.23 -> 10.214.131.25 TCP http > 39579 [ACK]
Seq=1 Ack=333 Win=15616 Len=0 TSV=248773204 TSER=25177841
18086 2.797494 10.214.133.23 -> 10.214.131.25 HTTP HTTP/1.1 200 OK
(application/octet-stream)
haproxy sends reset on the tcp connection
18087 2.797525 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [RST]
Seq=334 Win=0 Len=0
apache sends more data
18088 2.797545 10.214.133.23 -> 10.214.131.25 HTTP Continuation or
non-HTTP traffic
haproxy sends another reset
18089 2.797558 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [RST]
Seq=334 Win=0 Len=0
5. haproxy -> client
ack for the previous packet
18090 2.825478 10.214.131.25 -> 10.214.131.22 TCP http > 38909 [ACK]
Seq=1 Ack=282 Win=15616 Len=0 TSV=25177845 TSER=25089868
...
6. haproxy <-> apache (connection 2)
haproxy resending SYN, ACK (after a second): could it be that it lost the
first ACK response?
18208 3.788175 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773304 TSER=25177841 WS=7
18209 3.788197 10.214.131.25 -> 10.214.133.23 TCP [TCP Dup ACK 18084#1]
39580 > http [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSV=25177941 TSER=248773304
SLE=0 SRE=1
...
<~20 seconds later>
7. haproxy <-> apache (connection 2)
apache hangs up the connection as haproxy didn't send anything on it
18557 23.808745 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [FIN, ACK]
Seq=1 Ack=1 Win=14592 Len=0 TSV=248775306 TSER=25177941
18558 23.808876 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [RST, ACK]
Seq=1 Ack=2 Win=14720 Len=0 TSV=25179943 TSER=248775306
haproxy <-> client
close connection
18559 23.808917 10.214.131.25 -> 10.214.131.22 HTTP HTTP/1.0 502 Bad
Gateway (text/html)
18560 23.809737 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [FIN, ACK]
Seq=282 Ack=206 Win=15744 Len=0 TSV=25091970 TSER=25179943
HTTP request header
GET /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
Host: plana15.front.sepia.ceph.com
Accept-Encoding: identity
Date: Thu, 17 May 2012 18:02:59 GMT
Content-Length: 0
Authorization: AWS MDM21HVPZOL66IC8OIOF:2pAaDJ8w5YLi5ZAwL7f7ZGxMNPo=
User-Agent: Boto/2.3.0 (linux2)
X-Forwarded-For: 10.214.131.22
Connection: close
HTTP response header (apache -> haproxy on connection 1):
HTTP/1.1 200 OK
Date: Thu, 17 May 2012 18:02:59 GMT
Server: Apache/2.2.20 (Ubuntu)
Accept-Ranges: bytes
Last-Modified: Thu, 17 May 2012 18:02:59 GMT
ETag: "3580afcd8aaf507ab9860473ab48186b"
Content-Length: 4194304
Connection: close
Content-Type: application/octet-stream
We also see the same symptoms on a different environment that runs 1.5dev7,
though I didn't take any network capture so it might be a different issue.
Thanks,
Yehuda
And here is the haproxy config:
global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
maxconn 4096
daemon
defaults
log global
mode http
option httplog
option dontlognull
retries 3
option redispatch
maxconn 2000
contimeout 5000
frontend api-http-ipv4
mode http
bind 10.214.131.25:80
default_backend radosgw-http
option forwardfor
option httpclose
clitimeout 50000
reqidel ^X-Forwarded-For:.*
backend radosgw-http
balance leastconn
srvtimeout 50000
http-check expect ! rstatus ^5
option httpchk HEAD /
timeout check 6
timeout connect 2
timeout http-request 3
server plana88-1 10.214.133.28:80 check inter 2000 rise 2 fall 5
server plana88-2 10.214.133.28:81 check inter 2000 rise 2 fall 5
server plana93-1 10.214.133.23:80 check inter 2000 rise 2 fall 5
server plana93-2 10.214.133.23:81 check inter 2000 rise 2 fall 5
listen stats 0.0.0.0:9000
mode http
stats enable
stats uri /
stats refresh 15sec
stats show-desc clitimeout 50000
srvtimeout 50000
contimeout 5000
clitimeout 50000