On 30/04/15 14:04, User Goblin wrote:
My initial idea was to parse wget's -o output and figure out which files
still need to be downloaded, and then feed them via -i when continuing the
download. This led me to the conclusion that I'd need two pieces of
functionality, (1) machine-parseable output of -o, and (2) a way to convert
a partially downloaded directory structure to links that still need
downloading.
I could work around (1); the output of -o is just hard to parse.
For (2), I could use lynx or w3m or something like that, but then I am never
sure that the links produced are the same ones that wget produces. Therefore
I'd love an option like `wget --extract-links ./index.html` that'd just
read an html file and produce a list of links on output. Or perhaps an
assertion that some other tool like urlscan will do it exactly the same way
as wget.
I wrote such a program some time ago, but it was never merged into wget. See
“Exposing wget functionality for extracting links from a web page”
https://lists.gnu.org/archive/html/bug-wget/2013-09/msg00079.html
0001-Moved-free_urlpos.patch no longer applies cleanly, so I'm attaching a
rebased one (it's a trivial change, though).
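For anyone who has not looked at the function being moved: it walks a singly linked list and saves the next pointer before freeing each node, so it never reads freed memory. Below is a minimal standalone sketch of that pattern. It is not wget code: struct node, copy_string(), push_node() and free_nodes() are hypothetical stand-ins, plain malloc/free replace wget's allocation wrappers, and the real struct urlpos has more fields.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for wget's struct urlpos (the real one has
   more fields and stores a parsed URL, not a plain string). */
struct node
{
  char *url;         /* owned string, may be NULL */
  char *local_name;  /* owned string, may be NULL */
  struct node *next;
};

/* Return a heap copy of S, or NULL if S is NULL or allocation fails. */
char *
copy_string (const char *s)
{
  char *copy;

  if (!s)
    return NULL;
  copy = malloc (strlen (s) + 1);
  if (copy)
    strcpy (copy, s);
  return copy;
}

/* Prepend a node to HEAD and return the new head.  On allocation
   failure the list is returned unchanged. */
struct node *
push_node (struct node *head, const char *url, const char *local_name)
{
  struct node *n = malloc (sizeof *n);

  if (!n)
    return head;
  n->url = copy_string (url);
  n->local_name = copy_string (local_name);
  n->next = head;
  return n;
}

/* Free the whole list and return how many nodes were released.
   The next pointer is read before the node itself is freed. */
int
free_nodes (struct node *l)
{
  int freed = 0;

  while (l)
    {
      struct node *next = l->next;
      free (l->url);         /* free (NULL) is a no-op */
      free (l->local_name);
      free (l);
      l = next;
      ++freed;
    }
  return freed;
}
```

A caller would build a list with push_node() and release it with a single free_nodes() call. The return count is only there to make the sketch easy to check; the function in the patch returns void.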
From 1335548e721486cd77717c6cb938f9927e63f0fc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81ngel=20Gonz=C3=A1lez?= <[email protected]>
Date: Mon, 4 May 2015 22:37:12 +0200
Subject: [PATCH] Move free_urlpos()
---
src/html-url.c | 15 +++++++++++++++
src/retr.c | 15 ---------------
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/src/html-url.c b/src/html-url.c
index 0743587..143fecc 100644
--- a/src/html-url.c
+++ b/src/html-url.c
@@ -870,3 +870,18 @@ cleanup_html_url (void)
   if (interesting_attributes)
     hash_table_destroy (interesting_attributes);
 }
+
+/* Free the linked list of urlpos. */
+void
+free_urlpos (struct urlpos *l)
+{
+  while (l)
+    {
+      struct urlpos *next = l->next;
+      if (l->url)
+        url_free (l->url);
+      xfree_null (l->local_name);
+      xfree (l);
+      l = next;
+    }
+}
diff --git a/src/retr.c b/src/retr.c
index f60da6e..0bca092 100644
--- a/src/retr.c
+++ b/src/retr.c
@@ -1180,21 +1180,6 @@ sleep_between_retrievals (int count)
     }
 }

-/* Free the linked list of urlpos. */
-void
-free_urlpos (struct urlpos *l)
-{
-  while (l)
-    {
-      struct urlpos *next = l->next;
-      if (l->url)
-        url_free (l->url);
-      xfree (l->local_name);
-      xfree (l);
-      l = next;
-    }
-}
-
 /* Rotate FNAME opt.backups times */
 void
 rotate_backups(const char *fname)
--
2.3.7