Bug#517534: Potential HTTP Timeout race condition may cause 'Zero Sized Reply' error

Steven Chamberlain Sat, 28 Feb 2009 04:57:39 -0800

Package: squid3
Version: 3.0.PRE5-5
Severity: important

Hi,


I intermittently see 'Zero Sized Reply' error pages, generated by Squid,
during web browsing.  I'm using only the default configuration except
for some ACLs allowing clients to connect.  This is with the latest
version of the squid3 package in the current 'etch' (oldstable)
distribution.  I'm not sure if later versions of the package are affected.

After some investigation, I believe this is due to a race condition
which can affect connections to webservers with Keep-Alive enabled.
After such a webserver responds to an HTTP request, it will keep the
connection open, usually (in the case of Apache at least) until some
timeout is reached.

If Squid uses this open connection to make a further HTTP request just
as the webserver reaches this timeout, the connection could be closed by
the remote end before the new request is received.  When Squid sees the
connection close, it produces the 'Zero Sized Reply' error.

This could be a widespread problem because many webservers do enable the
Keep-Alive setting and have timeouts configured, as this seems a
sensible way to configure a webserver.

I believe the race condition is more likely to occur when the
webserver's timeout value is short, and/or when the connection between
the proxy server and webserver has a high latency.
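To make the timing argument concrete, here is a deliberately simplified
model.  The assumptions are mine, not measured from Squid or Apache: one
shared clock, a server that closes exactly `timeout` seconds after its
idle timer starts, and the same one-way delay in both directions.  A
request is lost if it reaches the server after the close but left the
proxy before the FIN could have arrived, so the vulnerable window is one
round trip wide.  The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KeepAliveModel:
    """Idealised timing model of one idle keep-alive connection.

    All times are seconds on a single shared clock (an assumption;
    real hosts have independent clocks).
    """
    last_activity: float   # when the server's idle timer started
    timeout: float         # server's keep-alive timeout, e.g. 5 s
    one_way_delay: float   # assumed symmetric proxy<->server delay

    @property
    def close_time(self) -> float:
        # Moment the server gives up waiting and sends its FIN.
        return self.last_activity + self.timeout

    def race_window(self) -> tuple[float, float]:
        # Requests sent strictly inside (start, end) arrive after the
        # close, yet leave the proxy before the FIN reaches it.
        start = self.close_time - self.one_way_delay
        end = self.close_time + self.one_way_delay
        return start, end

    def request_is_lost(self, send_time: float) -> bool:
        start, end = self.race_window()
        return start < send_time < end


# Illustrative numbers: 5 s timeout, 65 ms assumed one-way delay.
m = KeepAliveModel(last_activity=0.0, timeout=5.0, one_way_delay=0.065)
print(m.request_is_lost(4.9), m.request_is_lost(5.0), m.request_is_lost(5.1))
# False True False
```

In this model the window's width depends only on latency; a short
timeout matters because it puts the close time close to the ordinary
gaps between page requests, so more reuse attempts land near it.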

I see this race condition as something of a limitation of HTTP, and I
see no alternative but to enable the retry_on_error setting in
squid.conf as a workaround (which I have *not fully tested*).

Perhaps there is a way for Squid to see if the HTTP request was ACK'd by
the FIN packet from the webserver trying to close the connection?
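TCP acknowledgement numbers are cumulative: an ack number names the next
byte the sender expects, so every earlier byte has been received.  The
question above therefore reduces to comparing the FIN's ack number with
the end of the request's sequence range.  A sketch using tcpdump's
relative sequence numbers (the helper name is hypothetical, and this is
not something Squid is known to do):

```python
def request_acknowledged(req_start: int, req_len: int, ack: int) -> bool:
    """Return True if a cumulative ACK covers every byte of a request.

    req_start/req_len describe the request segment (tcpdump prints it
    as start:end(len)); ack is the acknowledgement number carried by a
    segment from the peer, i.e. the next byte the peer expects.
    Plain >= ignores 32-bit sequence wrap-around, which is fine for
    small relative sequence numbers like tcpdump's.
    """
    req_end = req_start + req_len  # one past the request's last byte
    return ack >= req_end


# Values from the trace: request "P 5182:5909(727)".
print(request_acknowledged(5182, 727, 5182))  # the server's FIN: ack 5182
print(request_acknowledged(5182, 727, 5910))  # the server's final ACK
# False
# True
```

So the FIN in the trace did not acknowledge the new request, which
suggests the server closed before having seen it.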

I am concerned that retry_on_error may have side effects in cases where
the initial HTTP request *was* in fact received and acted upon by the
webserver, but the response didn't make it back to the proxy.  When the
proxy server re-sends the request, a transaction may be processed
twice, but I think this is no different from a website visitor clicking
a link or button twice by accident, or refreshing the page manually upon
seeing an error message.  So, I'd argue that web-based applications
should be designed to handle this situation anyway.
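One way to narrow the side-effect concern is the rule RFC 2616 section
8.1.4 gives for clients recovering from an asynchronous close: retry
automatically only idempotent methods (section 9.1.2).  Below is a
hypothetical policy function illustrating that rule; it is not Squid's
actual retry_on_error logic.

```python
# Methods RFC 2616 section 9.1.2 treats as idempotent.
IDEMPOTENT_METHODS = frozenset(
    {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}
)


def may_retry(method: str, reused_connection: bool, response_bytes: int) -> bool:
    """Hypothetical policy: decide whether to resend a request automatically.

    A retry is only considered when the request went out on a reused
    keep-alive connection (the race described above) and the server
    sent nothing back before closing.  Non-idempotent methods such as
    POST are left for the user to resubmit.
    """
    if not reused_connection or response_bytes > 0:
        return False
    return method.upper() in IDEMPOTENT_METHODS


print(may_retry("GET", True, 0), may_retry("POST", True, 0))
# True False
```

Under this rule the GET request logged below would be retried, while a
form POST would still produce an error page rather than risk a double
submission.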

Below is a packet dump showing part of an HTTP connection where this
behaviour occurred.  The webserver has Keep-Alive enabled, and had
already handled a number of queries during this connection.  Manual
testing with 'telnet' suggested that the server's timeout between HTTP
requests was 5 seconds.
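The manual telnet test can be scripted: once the response has been read
in full, block on the socket and time how long the server waits before
closing (recv() returning an empty bytes object signals the peer's FIN).
A minimal sketch; the function name and the 30-second cap are my own
choices:

```python
import socket
import time


def seconds_until_peer_closes(sock: socket.socket, limit: float = 30.0) -> float:
    """Time how long an idle connection stays open before the peer closes it.

    Call this right after the last response has been fully read; any
    unexpected extra data is read and discarded.  Raises socket.timeout
    (an alias of TimeoutError on Python 3.10+) if the connection stays
    silent and open for `limit` seconds.
    """
    sock.settimeout(limit)
    start = time.monotonic()
    while True:
        data = sock.recv(4096)
        if not data:  # b"" means the peer sent FIN
            return time.monotonic() - start
```

Against a real webserver you would first connect, send a request such
as HEAD with a Host header, and read up to the blank line that ends the
response headers before calling the function.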


11:08:00.137905 IP 82.70.43.22.59378 > 173.66.90.19.8080:
P 4388:5182(794) ack 9626 win 55 <nop,nop,timestamp 453885651 2700681>
  (proxy sends HTTP request)

11:08:00.269235 IP 173.66.90.19.8080 > 82.70.43.22.59378:
. 9626:11030(1404) ack 5182 win 65535 <nop,nop,timestamp 2700720 453885651>
  (webserver ACKs the request,
   and sends an HTTP response, part 1 of 2,
    which includes a Connection: Keep-Alive header)

11:08:00.269427 IP 82.70.43.22.59378 > 173.66.90.19.8080:
. ack 11030 win 61 <nop,nop,timestamp 453885783 2700720>
  (proxy ACKs HTTP response part 1 of 2)

11:08:00.271284 IP 173.66.90.19.8080 > 82.70.43.22.59378:
P 11030:11144(114) ack 5182 win 65535 <nop,nop,timestamp 2700720 453885651>
  (webserver ACKs the previous ACK,
   and sends part 2 of 2 of the HTTP response)

11:08:00.271405 IP 82.70.43.22.59378 > 173.66.90.19.8080:
. ack 11144 win 61 <nop,nop,timestamp 453885785 2700720>
  (proxy ACKs part 2 of 2 of the HTTP response)

11:08:00.411658 IP 173.66.90.19.8080 > 82.70.43.22.59378:
. ack 5182 win 65535 <nop,nop,timestamp 2700722 453885785>
  (webserver ACKs the proxy's first ACK from earlier)

11:08:05.733992 IP 82.70.43.22.59378 > 173.66.90.19.8080:
P 5182:5909(727) ack 11144 win 61 <nop,nop,timestamp 453891247 2700722>
  (proxy sends a new HTTP request, ~5 seconds later)

11:08:05.789920 IP 173.66.90.19.8080 > 82.70.43.22.59378:
F 11144:11144(0) ack 5182 win 65535 <nop,nop,timestamp 2700776 453885785>
  (webserver ACKs the proxy's second ACK from earlier,
   and tries to close the connection --
    this packet was probably sent around 100-140ms earlier, so the new
    HTTP request wouldn't have arrived yet)

11:08:05.791683 IP 82.70.43.22.59378 > 173.66.90.19.8080:
F 5909:5909(0) ack 11145 win 61 <nop,nop,timestamp 453891305 2700776>
  (proxy agrees to close the connection)

11:08:05.897023 IP 173.66.90.19.8080 > 82.70.43.22.59378:
. ack 5910 win 64808 <nop,nop,timestamp 2700777 453891247>
  (webserver ACKs the second HTTP request, which was received while the
    connection was still half-open;
   this ACK also confirms to the proxy that the server is now closed)
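The intervals that matter can be read straight off the tcpdump
timestamps above.  A small helper (hypothetical name) converting them to
seconds:

```python
from datetime import datetime


def ts(stamp: str) -> float:
    """Convert a tcpdump HH:MM:SS.ffffff timestamp to seconds since midnight."""
    t = datetime.strptime(stamp, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


# Round trip for the first request: request sent -> first response segment.
rtt = ts("11:08:00.269235") - ts("11:08:00.137905")
# Idle time from the server's last ACK to the proxy's new request.
idle = ts("11:08:05.733992") - ts("11:08:00.411658")
# How long after sending the new request the proxy saw the server's FIN.
fin_lag = ts("11:08:05.789920") - ts("11:08:05.733992")

print(f"rtt={rtt:.6f}s idle={idle:.6f}s fin_lag={fin_lag:.6f}s")
# rtt=0.131330s idle=5.322334s fin_lag=0.055928s
```

Assuming the round trip was still about 131 ms, a FIN arriving only
~56 ms after the request was sent must have left the server before the
request could reach it.  The figure is also close to the 58 ms elapsed
time in the access.log entry.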


Squid generated the following corresponding access.log entry:

1235819285.792     58 192.168.0.9 TCP_MISS/502 2606 GET http://...:8080/... steven/steven NONE/- text/html
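The first field is a Unix timestamp with millisecond precision and the
second is the elapsed time in milliseconds.  A quick parse (the helper
is hypothetical; field names follow Squid's native log format, with the
user field kept exactly as logged) ties the entry to the trace and the
error page:

```python
from datetime import datetime, timezone

LINE = ("1235819285.792     58 192.168.0.9 TCP_MISS/502 2606 "
        "GET http://...:8080/... steven/steven NONE/- text/html")

# Field order of Squid's native access.log format.
FIELDS = ("time", "elapsed_ms", "client", "action_status", "bytes",
          "method", "url", "user", "hierarchy_peer", "content_type")


def parse_native(line: str) -> dict:
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    entry = dict(zip(FIELDS, values))
    action, status = entry["action_status"].split("/")
    entry["action"], entry["status"] = action, int(status)
    entry["utc"] = datetime.fromtimestamp(float(entry["time"]), tz=timezone.utc)
    return entry


e = parse_native(LINE)
print(e["utc"].strftime("%Y-%m-%d %H:%M:%S"), e["elapsed_ms"], e["status"])
# 2009-02-28 11:08:05 58 502
```

The 11:08:05 UTC time matches the "Generated" line of the error page
below, and the 502 status is the one recorded for that page.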


The following message was sent to the proxy's client:

ERROR
The requested URL could not be retrieved

While trying to retrieve the URL: http://...:8080/...

The following error was encountered:

    * Zero Sized Reply

Squid did not receive any data for this request.

Your cache administrator is webmaster.
Generated Sat, 28 Feb 2009 11:08:05 GMT by localhost (squid/3.0.PRE5)


Regards,
-- 
Steven Chamberlain
ste...@pyro.eu.org



