On Wednesday, August 24, 2016 8:21:31 PM CEST Dale R. Worley wrote:
> This is the change that I'm interested in.  I don't expect this to be
> put into the distribution without a lot of discussion.
>
> This version changes the behavior of --recursive:  If a file is
> downloaded, it will be scanned for links to follow.  This differs from
> the current behavior, in which the URL from which the contents were
> obtained (after any redirections) is further checked to see if that URL
> passes the recursion limitations.
>
> This patch also includes a test to verify the new behavior.
>
> I worry that this is a substantial change of behavior.  OTOH, the
> current behavior seems to be very unintuitive.  And the fact that there
> is no test for this behavior suggests that people have not been
> depending on it.
>
> Comments?
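To make the difference concrete, here is a minimal Python sketch of the two policies Dale describes for a page that was fetched through a redirect. It is a simplified model, not wget's code. `dir_included` is a hypothetical stand-in for the -I/--include-directories check (wget's real matching also handles wildcards), and the function names are made up for illustration.

```python
import posixpath
from typing import Sequence
from urllib.parse import urlparse


def dir_included(url: str, include_dirs: Sequence[str]) -> bool:
    """Hypothetical, simplified -I check: is the URL's directory at or below one of include_dirs?"""
    directory = posixpath.dirname(urlparse(url).path).lstrip("/")
    return any(directory == d or directory.startswith(d + "/") for d in include_dirs)


def scan_current(final_url: str, include_dirs: Sequence[str]) -> bool:
    # Current behavior: the URL reached after all redirections must itself
    # pass the recursion limits before the page is scanned for links.
    return dir_included(final_url, include_dirs)


def scan_proposed(final_url: str, include_dirs: Sequence[str]) -> bool:
    # Proposed behavior: every downloaded file is scanned for links,
    # wherever the redirections ended up (arguments kept for symmetry).
    return True


# a/File1.html redirects to b/File1.html, and the crawl uses -I a.
final = "http://localhost/b/File1.html"
print(scan_current(final, ["a"]), scan_proposed(final, ["a"]))  # False True
```

Under the current policy the links inside the redirected page are never followed, even though the crawl was started inside a/.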

Here is a less invasive patch for review & discussion.
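The patch below keeps the post-redirect check but overrides two of its outcomes. A minimal Python model of the patched descend_redirect() decision follows. The enum is a hypothetical mirror of a few of wget's reject_reason values, OTHER stands in for every rejection reason the patch does not touch, and a Python set stands in for wget's blacklist hash table.

```python
from enum import Enum, auto
from typing import Set


class RejectReason(Enum):
    # Hypothetical mirror of a few of wget's reject_reason values.
    SUCCESS = auto()  # WG_RR_SUCCESS
    LIST = auto()     # WG_RR_LIST: rejected by the -I/-X directory lists
    REGEX = auto()    # WG_RR_REGEX: rejected by --accept-regex/--reject-regex
    OTHER = auto()    # stand-in for all remaining reasons


# Reasons the patch chooses to ignore for redirect targets.
IGNORED_FOR_REDIRECTS = {RejectReason.LIST, RejectReason.REGEX}


def descend_redirect(reason: RejectReason, url: str, blacklist: Set[str]) -> RejectReason:
    """Model of the patched decision: may wget descend into the redirect target?"""
    if reason is RejectReason.SUCCESS:
        blacklist.add(url)
    elif reason in IGNORED_FOR_REDIRECTS:
        # Directory-list and regex rejections are overridden: mark the URL
        # as seen and treat the redirect as accepted.
        blacklist.add(url)
        reason = RejectReason.SUCCESS
    return reason
```

With this rule a page redirected out of the -I directories is still scanned, while rejections for any other reason are left unchanged.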

WDYT?

Regards, Tim
From e5164a8260139a3bb64308350e16b37e454d942a Mon Sep 17 00:00:00 2001
From: Tim Rühsen <tim.rueh...@gmx.de>
Date: Fri, 7 Oct 2016 11:41:34 +0200
Subject: [PATCH] Amend redirection behavior

* src/recur.c (descend_redirect): Ignore WG_RR_LIST and WG_RR_REGEX
  for redirections.
* testenv/Makefile.am: Add Test-recursive-redirect.py.
* testenv/Test-recursive-redirect.py: New test.

Test-recursive-redirect.py written by Dale R. Worley.

Reported-by: "Dale R. Worley" <wor...@ariadne.com>
---
 src/recur.c                        |  6 ++++
 testenv/Makefile.am                |  1 +
 testenv/Test-recursive-redirect.py | 64 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+)
 create mode 100644 testenv/Test-recursive-redirect.py

diff --git a/src/recur.c b/src/recur.c
index a195dd4..1469e31 100644
--- a/src/recur.c
+++ b/src/recur.c
@@ -806,6 +806,12 @@ descend_redirect (const char *redirected, struct url *orig_parsed, int depth,

   if (reason == WG_RR_SUCCESS)
     blacklist_add (blacklist, upos->url->url);
+  else if (reason == WG_RR_LIST || reason == WG_RR_REGEX)
+    {
+      DEBUGP (("Redirection \"%s\" rejected by -I/-X or regex rules; descending anyway.\n", redirected));
+      blacklist_add (blacklist, upos->url->url);
+      reason = WG_RR_SUCCESS;
+    }
   else
     DEBUGP (("Redirection \"%s\" failed the test.\n", redirected));

diff --git a/testenv/Makefile.am b/testenv/Makefile.am
index 8f61907..3febec7 100644
--- a/testenv/Makefile.am
+++ b/testenv/Makefile.am
@@ -105,6 +105,7 @@ if HAVE_PYTHON3
     Test-Post.py                                    \
     Test-recursive-basic.py                         \
     Test-recursive-include.py                       \
+    Test-recursive-redirect.py                      \
     Test-redirect.py                                \
     Test-redirect-crash.py                          \
     Test--rejected-log.py                           \
diff --git a/testenv/Test-recursive-redirect.py b/testenv/Test-recursive-redirect.py
new file mode 100644
index 0000000..8a114a5
--- /dev/null
+++ b/testenv/Test-recursive-redirect.py
@@ -0,0 +1,64 @@
+#!/usr/bin/env python3
+from sys import exit
+from test.http_test import HTTPTest
+from test.base_test import HTTP, HTTPS
+from misc.wget_file import WgetFile
+
+"""
+    Test that --recursive scans a page reached by a redirect that --include-directories would reject.
+"""
+############# File Definitions ###############################################
+File1 = """<html><body>
+<a href=\"/a/File2.html\">text</a>
+<a href=\"/b/File3.html\">text</a>
+</body></html>"""
+File2 = "With lemon or cream?"
+File3 = "Surely you're joking Mr. Feynman"
+
+File1_rules = {
+    "Response"          : 301,
+    "SendHeader"        : {"Location" : "/b/File1.html"}
+}
+
+File1_File = WgetFile ("a/File1.html", "", rules=File1_rules)
+File1_Redirected = WgetFile ("b/File1.html", File1)
+File1_Retrieved = WgetFile ("a/File1.html", File1)
+File2_File = WgetFile ("a/File2.html", File2)
+File3_File = WgetFile ("b/File3.html", File3)
+
+WGET_OPTIONS = "--recursive --no-host-directories --include-directories=a"
+WGET_URLS = [["a/File1.html"]]
+
+Servers = [HTTP]
+
+Files = [[File1_Redirected, File1_File, File2_File, File3_File]]
+Existing_Files = []
+
+ExpectedReturnCode = 0
+ExpectedDownloadedFiles = [File1_Retrieved, File2_File]
+Request_List = [["GET /a/File1.html",
+                 "GET /b/File1.html",
+                 "GET /a/File2.html"]]
+
+################ Pre and Post Test Hooks #####################################
+pre_test = {
+    "ServerFiles"       : Files,
+    "LocalFiles"        : Existing_Files
+}
+test_options = {
+    "WgetCommands"      : WGET_OPTIONS,
+    "Urls"              : WGET_URLS
+}
+post_test = {
+    "ExpectedFiles"     : ExpectedDownloadedFiles,
+    "ExpectedRetcode"   : ExpectedReturnCode
+}
+
+err = HTTPTest (
+                pre_hook=pre_test,
+                test_params=test_options,
+                post_hook=post_test,
+                protocols=Servers
+).begin ()
+
+exit (err)
--
2.9.3
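For readers without the testenv harness at hand, the scenario exercised by Test-recursive-redirect.py can be replayed with a toy crawler. This is a simplified Python sketch, not wget's algorithm: the in-memory SERVER table, crawl() and its patched flag are hypothetical, links are extracted with a regex rather than an HTML parser, and depth limits and all other checks are ignored. It assumes, as the test does, that a redirected page is saved under the originally requested name.

```python
import re
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical in-memory copy of the test's server files:
# path -> (HTTP status, Location header or body).
SERVER: Dict[str, Tuple[int, str]] = {
    "/a/File1.html": (301, "/b/File1.html"),
    "/b/File1.html": (200, '<a href="/a/File2.html">text</a>\n'
                           '<a href="/b/File3.html">text</a>'),
    "/a/File2.html": (200, "With lemon or cream?"),
    "/b/File3.html": (200, "Surely you're joking Mr. Feynman"),
}

HREF = re.compile(r'href="([^"]+)"')


def included(path: str, include_dir: str) -> bool:
    # Simplified -I check: the path must lie below /<include_dir>/.
    return path.startswith("/" + include_dir + "/")


def crawl(start: str, include_dir: str, patched: bool) -> Tuple[List[str], Dict[str, str]]:
    requests: List[str] = []
    saved: Dict[str, str] = {}
    queue = deque([start])
    seen = {start}
    while queue:
        path = queue.popleft()
        final = path
        status, payload = SERVER[final]
        requests.append("GET " + final)
        # Redirects are followed in both versions.
        while status in (301, 302):
            final = payload
            status, payload = SERVER[final]
            requests.append("GET " + final)
        # The page is saved under the name that was originally requested.
        saved[path.lstrip("/")] = payload
        # The patch only changes whether a redirected page is scanned for links.
        if not (patched or included(final, include_dir)):
            continue
        for link in HREF.findall(payload):
            if link not in seen and included(link, include_dir):
                seen.add(link)
                queue.append(link)
    return requests, saved


reqs, files = crawl("/a/File1.html", "a", patched=True)
print(reqs)           # ['GET /a/File1.html', 'GET /b/File1.html', 'GET /a/File2.html']
print(sorted(files))  # ['a/File1.html', 'a/File2.html']
```

With patched=False the same call stops after the first page: only a/File1.html is saved, because its final URL /b/File1.html fails the -I a check.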
