Hi,

   We have a setup where there's haproxy (1.4.15) in front of a few apache
servers. We have a test suite that occasionally fails with 502 when going
through haproxy (this doesn't happen when going directly to one of the
apache servers). I was able to have a tcp capture while that was happening,
and I do see some strange behavior. From what I can see haproxy sends a
request to the apache server, then hangs up the tcp connection (while
response is arriving), and reopens a second tcp connection on which it
sends nothing. After 20 seconds apache closes down the second connection
and haproxy sends 502 to the client.

The haproxy log shows the following:

May 17 14:03:20 localhost haproxy[3695]:
10.214.131.22:38909[17/May/2012:14:02:59.347] api-http-ipv4
radosgw-http/plana93-1
0/0/2/-1/21020 502 204 - - SH-- 0/0/0/0/1 0/0 "GET
/jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1"

Which means, if I decipher it correctly, that the server hung the
connection after more than 20 seconds. However, looking at the server logs
I couldn't find evidence for that happening. I do see the server responding
immediately.
I was taking a tcpdump while that was happening, and I do see some strange
behavior. The haproxy log shows the following (captured on the haproxy
machine):

10.214.131.22 client
10.214.131.25 haproxy
10.214.133.23 apache


1. client <-> haproxy

 Client sends request to haproxy

18073   2.787839 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [SYN]
Seq=0 Win=14600 Len=0 MSS=1460 TSV=25089868 TSER=0 WS=7
18074   2.787867 10.214.131.25 -> 10.214.131.22 TCP http > 38909 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=25177841 TSER=25089868 WS=7
18075   2.787948 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [ACK]
Seq=1 Ack=1 Win=14720 Len=0 TSV=25089868 TSER=25177841
18076   2.788028 10.214.131.22 -> 10.214.131.25 HTTP GET
/jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1


2. haproxy <-> apache (connection 1)

haproxy sends request to apache

18077   2.788185 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [SYN]
Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
18078   2.789498 10.214.133.23 -> 10.214.131.25 TCP http > 39579 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
18079   2.789530 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [ACK]
Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
18080   2.789576 10.214.131.25 -> 10.214.133.23 HTTP GET
/jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
18081   2.789605 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [FIN, ACK]
Seq=333 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204

3. haproxy <-> apache (connection 2)

haproxy creates a second tcp connection to the same apache server

18082   2.789650 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [SYN]
Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
18083   2.790896 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
18084   2.790926 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [ACK]
Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204

4. haproxy <-> apache (connection 1)

apache returns http header + part of the data. HTTP header is valid.

18085   2.790938 10.214.133.23 -> 10.214.131.25 TCP http > 39579 [ACK]
Seq=1 Ack=333 Win=15616 Len=0 TSV=248773204 TSER=25177841
18086   2.797494 10.214.133.23 -> 10.214.131.25 HTTP HTTP/1.1 200 OK
 (application/octet-stream)

haproxy sends reset on the tcp connection
18087   2.797525 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [RST]
Seq=334 Win=0 Len=0

apache sends more data
18088   2.797545 10.214.133.23 -> 10.214.131.25 HTTP Continuation or
non-HTTP traffic

haproxy sends another reset
18089   2.797558 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [RST]
Seq=334 Win=0 Len=0

5. haproxy -> client

ack for the previous packet

18090   2.825478 10.214.131.25 -> 10.214.131.22 TCP http > 38909 [ACK]
Seq=1 Ack=282 Win=15616 Len=0 TSV=25177845 TSER=25089868

...

6. haproxy <-> apache (connection 2)

haproxy resending SYN, ACK (after a second): could it be that it lost the
first ACK response?

18208   3.788175 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773304 TSER=25177841 WS=7
18209   3.788197 10.214.131.25 -> 10.214.133.23 TCP [TCP Dup ACK 18084#1]
39580 > http [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSV=25177941 TSER=248773304
SLE=0 SRE=1

...
<~20 seconds later>

7. haproxy <-> apache (connection 2)

apache hangs up the connection as haproxy didn't send anything on it

18557  23.808745 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [FIN, ACK]
Seq=1 Ack=1 Win=14592 Len=0 TSV=248775306 TSER=25177941
18558  23.808876 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [RST, ACK]
Seq=1 Ack=2 Win=14720 Len=0 TSV=25179943 TSER=248775306

haproxy <-> client

close connection


18559  23.808917 10.214.131.25 -> 10.214.131.22 HTTP HTTP/1.0 502 Bad
Gateway  (text/html)
18560  23.809737 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [FIN, ACK]
Seq=282 Ack=206 Win=15744 Len=0 TSV=25091970 TSER=25179943



HTTP request header

GET /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
Host: plana15.front.sepia.ceph.com
Accept-Encoding: identity
Date: Thu, 17 May 2012 18:02:59 GMT
Content-Length: 0
Authorization: AWS MDM21HVPZOL66IC8OIOF:2pAaDJ8w5YLi5ZAwL7f7ZGxMNPo=
User-Agent: Boto/2.3.0 (linux2)
X-Forwarded-For: 10.214.131.22
Connection: close

HTTP response header (apache -> haproxy on connection 1):

HTTP/1.1 200 OK
Date: Thu, 17 May 2012 18:02:59 GMT
Server: Apache/2.2.20 (Ubuntu)
Accept-Ranges: bytes
Last-Modified: Thu, 17 May 2012 18:02:59 GMT
ETag: "3580afcd8aaf507ab9860473ab48186b"
Content-Length: 4194304
Connection: close
Content-Type: application/octet-stream


We also see the same symptoms on a different environment that runs 1.5dev7,
though I didn't take any network capture so it might be a different issue.

Thanks,
Yehuda

And here is the haproxy config:

global
log 127.0.0.1 local0
 log 127.0.0.1 local1 notice
maxconn 4096
daemon

defaults
log global
mode http
option httplog
 option dontlognull
retries 3
option redispatch
 maxconn 2000
contimeout 5000

frontend api-http-ipv4
 mode http
bind 10.214.131.25:80
default_backend radosgw-http
 option forwardfor
option httpclose
clitimeout 50000
 reqidel ^X-Forwarded-For:.*

backend radosgw-http
balance leastconn
 srvtimeout 50000
        http-check expect ! rstatus ^5
        option httpchk HEAD /
        timeout check 6
        timeout connect 2
        timeout http-request 3
server plana88-1 10.214.133.28:80 check inter 2000 rise 2 fall 5
server plana88-2 10.214.133.28:81 check inter 2000 rise 2 fall 5
 server plana93-1 10.214.133.23:80 check inter 2000 rise 2 fall 5
server plana93-2 10.214.133.23:81 check inter 2000 rise 2 fall 5

listen stats 0.0.0.0:9000
mode http
stats enable
 stats uri /
stats refresh 15sec
stats show-desc clitimeout 50000
 srvtimeout 50000
contimeout 5000
clitimeout 50000

Reply via email to