So I've run into another version of the problem: I'm using
--page-requisites, and they're getting filtered in much the same way as
redirections. However, the new fixes don't change that behavior.
The example case is that
$ wget --mirror --convert-links --page-requisites --limit-rate=20k \
--include-directories=/assignments \
http://www.iana.org/assignments/index.html
does not fetch the CSS specified by
http://www.iana.org/assignments/index.html in
<link rel="stylesheet" media="screen" href="../_css/2015.1/screen.css"/>
which is http://www.iana.org/_css/2015.1/screen.css.
It looks like requisite URLs are marked by setting the link_inline_p
flag of struct urlpos. If that flag is set and opt.page_requisites is
set, then test 4 of download_child (the --no-parent test) is suppressed.
This change seems to add the same logic as is applied to redirections:
diff --git a/src/recur.c b/src/recur.c
index 1469e31..b1f9109 100644
--- a/src/recur.c
+++ b/src/recur.c
@@ -462,6 +462,12 @@ retrieve_tree (struct url *start_url_parsed, struct iri *pi)
r = download_child (child, url_parsed, depth,
start_url_parsed, blacklist, i);
+ if (child->link_inline_p &&
+ (r == WG_RR_LIST || r == WG_RR_REGEX))
+ {
+ DEBUGP (("Ignoring decision for page requisite, decided to load it.\n"));
+ r = WG_RR_SUCCESS;
+ }
if (r == WG_RR_SUCCESS)
{
ci = iri_new ();
and it has the expected effect: the requisites for index.html are
downloaded.
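To make the intended decision explicit, here is a minimal standalone sketch of the logic, not the actual Wget code. Only WG_RR_SUCCESS, WG_RR_LIST, WG_RR_REGEX and link_inline_p are taken from the change; WG_RR_OTHER, under_directory, directory_test and override_for_requisite are hypothetical names, and the directory test is a plain prefix match standing in for the real --include-directories handling.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for Wget's reject reasons.  WG_RR_OTHER is an
   invented placeholder for every reason the change does not touch.  */
typedef enum { WG_RR_SUCCESS, WG_RR_LIST, WG_RR_REGEX, WG_RR_OTHER } reject_reason;

/* Simplified model of a directory test (assumption: plain prefix match,
   no wildcards): true if PATH lies at or below DIR.  */
bool
under_directory (const char *path, const char *dir)
{
  size_t n = strlen (dir);
  return strncmp (path, dir, n) == 0 && (path[n] == '/' || path[n] == '\0');
}

/* Reject with WG_RR_LIST when PATH is outside the included directory.  */
reject_reason
directory_test (const char *path, const char *include_dir)
{
  return under_directory (path, include_dir) ? WG_RR_SUCCESS : WG_RR_LIST;
}

/* The override: a page requisite (link_inline_p set) that was rejected
   only by a directory list or a regex is downloaded anyway.  Any other
   rejection reason is left as it was.  */
reject_reason
override_for_requisite (reject_reason r, bool link_inline_p)
{
  if (link_inline_p && (r == WG_RR_LIST || r == WG_RR_REGEX))
    return WG_RR_SUCCESS;
  return r;
}
```

In this model, directory_test ("/_css/2015.1/screen.css", "/assignments") gives WG_RR_LIST, and passing that result to override_for_requisite with link_inline_p true turns it into WG_RR_SUCCESS, which is the behavior seen for the stylesheet above.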
I've attached a patch for this that includes an update to the manual
page, although that update doesn't mention the suppression of the
--no-parent test.
Dale
diff --git a/doc/wget.texi b/doc/wget.texi
index f42773e..04d1562 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -2289,7 +2289,11 @@ wget -p http://@var{site}/1.html
@end example
Note that Wget will behave as if @samp{-r} had been specified, but only
-that single page and its requisites will be downloaded. Links from that
+that single page and its requisites will be downloaded.
+(As with @samp{-r}, the @samp{--include-directories},
+@samp{--exclude-directories}, @samp{--accept-regex}, and @samp{--reject-regex}
+tests are not applied to page requisites.)
+Links from that
page to external documents will not be followed. Actually, to download
a single page and all its requisites (even if they exist on separate
websites), and make sure the lot displays properly locally, this author
diff --git a/src/recur.c b/src/recur.c
index 1469e31..fdb1d2e 100644
--- a/src/recur.c
+++ b/src/recur.c
@@ -462,6 +462,12 @@ retrieve_tree (struct url *start_url_parsed, struct iri *pi)
r = download_child (child, url_parsed, depth,
start_url_parsed, blacklist, i);
+ if (child->link_inline_p &&
+ (r == WG_RR_LIST || r == WG_RR_REGEX))
+ {
+ DEBUGP (("Ignoring decision for page requisite, decided to load it.\n"));
+ r = WG_RR_SUCCESS;
+ }
if (r == WG_RR_SUCCESS)
{
ci = iri_new ();