On Tuesday, February 14, 2017 2:34:27 PM CET Adam Sampson wrote:
> Hi wget maintainers,
>
> I've just built wget-1.19.1 on several GNU/Linux machines, and found
> that Test-504.py fails sometimes on a slow-ish ARMv7 system with Python
> 3.6.0. Attached are tcpdump traces of the test succeeding and failing on
> the same machine, and the log from the failure.
>
> In this test, wget makes three requests against the test HTTP server,
> with the first two getting 504 responses and the third succeeding.
> Looking at the trace, the 504 responses don't look correct: the server
> is operating in pipelined mode, but the 504 response contains no
> Content-Length header, so there's no way for wget to know where the body
> of the response ends (other than the connection being closed; RFC 2616
> section 4.4).
>
> The test succeeds when wget manages to read the headers and body of the
> second response, closes the connection in disgust, and opens a new
> connection for the final request (which is probably not what the test
> author intended). It fails when wget sends its third request before it's
> seen the body of the second response, and then tries to parse the body
> as the response to the third message; the test output then includes:
>
>   200 No headers, assuming HTTP/0.9
>
> and wget waits until it times out.
>
> I think the right fix here is to have the test server send a proper
> Content-Length header in its 504 response?
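
The framing problem described in the report can be illustrated with a minimal Python sketch. It is not code from wget's test suite: `build_response` and `split_responses` are hypothetical helpers, and the reader handles only the Content-Length case (no chunked encoding, no partial reads). It only shows why a second response on the same connection cannot be located when the first one carries no Content-Length.

```python
def build_response(status, reason, body=b"", with_length=True):
    """Serialize a minimal HTTP/1.1 response as bytes (hypothetical helper)."""
    lines = [f"HTTP/1.1 {status} {reason}"]
    if with_length:
        # Content-Length tells the client exactly where the body ends.
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def split_responses(stream):
    """Split back-to-back responses in `stream` using Content-Length only.

    Raises ValueError when a response has no Content-Length, because its
    body could then only be delimited by the connection being closed.
    """
    responses = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")
        if not sep:
            raise ValueError("incomplete header block")
        header_lines = head.decode("ascii").split("\r\n")
        length = None
        for line in header_lines[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("no Content-Length: end of body is unknown")
        # Keep the status line and exactly `length` bytes of body.
        responses.append((header_lines[0], rest[:length]))
        stream = rest[length:]
    return responses


# Two responses on one connection, both framed with Content-Length.
framed = (build_response(504, "Gateway Timeout", b"Gateway Timeout")
          + build_response(200, "OK", b"hello"))

# The same exchange, but the 504 omits Content-Length.
unframed = (build_response(504, "Gateway Timeout", b"oops", with_length=False)
            + build_response(200, "OK", b"hello"))
```

With `framed`, `split_responses` can separate the 504 from the 200. With `unframed`, it has no way to tell where the 504 body stops, which corresponds to the situation in which wget ends up parsing leftover body bytes as the next response.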

Thanks for reporting.

Well, wget's 504 handling bypasses most of the checks normally done for non-2xx
status codes (incl. downloading the body and saving it to WARC files). That
seems like improper handling.

I moved the 504 code to where it belongs (IMO) and created the attached patch.
Please test and report back if it works for you.

Regards, Tim
From 8b5e487bf38b0dcd8d2fc1002e486f418bf52f20 Mon Sep 17 00:00:00 2001
From: Tim Rühsen <tim.rueh...@gmx.de>
Date: Tue, 14 Feb 2017 16:20:26 +0100
Subject: [PATCH] Fix 504 status handling

---
 src/http.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/http.c b/src/http.c
index 898e1841..788a29ff 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3476,7 +3476,7 @@ gethttp (const struct url *u, struct url *original_url, struct http_stat *hs,

 #ifdef HAVE_METALINK
   /* We need to check for the Metalink data in the very first response
-     we get from the server (before redirectionrs, authorization, etc.).  */
+     we get from the server (before redirections, authorization, etc.).  */
   if (metalink)
     {
       hs->metalink = metalink_from_http (resp, hs, u);
@@ -3496,7 +3496,7 @@ gethttp (const struct url *u, struct url *original_url, struct http_stat *hs,
       uerr_t auth_err = RETROK;
       bool retry;
       /* Normally we are not interested in the response body.
-         But if we are writing a WARC file we are: we like to keep everyting.  */
+         But if we are writing a WARC file we are: we like to keep everything.  */
       if (warc_enabled)
         {
           int _err;
@@ -3556,6 +3556,7 @@ gethttp (const struct url *u, struct url *original_url, struct http_stat *hs,
         pconn.authorized = true;
     }

+/*
   if (statcode == HTTP_STATUS_GATEWAY_TIMEOUT)
     {
       hs->len = 0;
@@ -3568,7 +3569,7 @@ gethttp (const struct url *u, struct url *original_url, struct http_stat *hs,
       retval = GATEWAYTIMEOUT;
       goto cleanup;
     }
-
+*/

   {
     uerr_t ret = check_file_output (u, hs, resp, hdrval, sizeof hdrval);
@@ -3910,8 +3911,8 @@ gethttp (const struct url *u, struct url *original_url, struct http_stat *hs,
               retval = _err;
               goto cleanup;
             }
-          else
-            CLOSE_FINISH (sock);
+
+          CLOSE_FINISH (sock);
         }
       else
         {
@@ -3934,7 +3935,14 @@ gethttp (const struct url *u, struct url *original_url, struct http_stat *hs,
             CLOSE_INVALIDATE (sock);
         }

-      retval = RETRFINISHED;
+      if (statcode == HTTP_STATUS_GATEWAY_TIMEOUT)
+        {
+          /* xfree (hs->message); */
+          retval = GATEWAYTIMEOUT;
+        }
+      else
+        retval = RETRFINISHED;
+
       goto cleanup;
     }

--
2.11.0
