On Mon, 01 Aug 2016 16:32:30 +0200
Tim Ruehsen <tim.rueh...@gmx.de> wrote:

> On Saturday, July 30, 2016 9:41:48 PM CEST Matthew White wrote:
> > Hello!
> > 
> > I think that sometimes it could help to keep downloaded Metalink's files
> > which have a bad hash.
> > 
> > The default wget behaviour is to delete such files.
> > 
> > This patch provides a way to keep files which have a bad hash through the
> > option --keep-badhash. It appends the suffix .badhash to the file name,
> > except without overwriting existing files. In the latter case, an unique
> > suffix is appended after .badhash.
> > 
> > I made this patch working on the following branch:
> > master (latest 20cac2c5ab3d63aacfba35fb10878a2d490e2377)
> > git://git.savannah.gnu.org/wget.git
> > 
> > What do you think?
> 
> Hi Matthew,
> 
> good work !
> 
> While your FSF assignment is underway (PM), we can continue polishing.
> 
> Didn't test your code yet,  but from what I see, there are just these peanuts:
> 
> 1.
> +  bhash = malloc (strlen (name) + strlen (".badhash") + 1);
> +  strcat (strcpy (bhash, name), ".badhash");
> 
> Please consider using concat_strings() from util.h.
> And please leave an empty line between var declaration and code - just for 
> readability.
> 
> 2. 
> Please add 'link' and 'unlink' to bootstrap.conf. Gnulib will emulate these 
> on 
> platforms where they are not available (or have a slightly different 
> behavior). I guess we simply forgot 'unlink' when we switched to gnulib.
> 
> 3.
> The (GNU) commit messages should ideally look like:
> 
> one line of short description
> <empty line>
> file changes
> <empty line>
> long description
> 
> For example:
> Add new option --keep-badhash
> 
> * src/init.c: Add keepbadhash
> * src/main.c: Add keep-badhash
> * src/options.h: Add keep_badhash
> * doc/wget.texi: Add docs for --keep-badhash
> * src/metalink.h: Add prototypes badhash_suffix(), badhash_or_remove()
> * src/metalink.c: New functions badhash_suffix(), badhash_or_remove().
>   (retrieve_from_metalink): Call append .badhash()
> 
> With --keep-badhash, append .badhash to Metalink's files with checksum
> mismatch, except without overwriting existing files.
> 
> Without --keep-badhash, remove downloaded files with checksum mismatch
> (this conforms to the old behaviour).
> 
> [This also applies to to your other patches]
> 
> 4. 
> Please not only write 'keepbadhash', but also what you did (e.g. remove, 
> rename, add, ...), see my example above.
> 
> Those ChangeLog entries should allow finding changes / faults / regressions 
> etc. even when a versioning system is not at hand, e.g. when unpacking the 
> sources from a tarball. (Not debatable that consulting commit messages 
> directly is much more powerful.)
> 
> 5.
> You write "except without overwriting existing files", maybe you should 
> mention 
> appending a counter instead of overwriting existent files !?
> 
> 
> Regards, Tim

Thanks Tim!

I really needed your guidance. I sent the modified patches to bug-wget@gnu.org.

I believe there are more things to fix.

Consider the following after applying the attached patch:
* src/metalink.c (retrieve_from_metalink): line 124: If the download is 
interrupted (output_stream isn't NULL), and there are more urls for the same 
file (we are still in the download loop), switching to the next url should 
resume the download (instead than start it over).

In this patch I added a fix to rename/remove the fully downloaded file 
(output_stream is NULL), but there was an error (probably checksum) and we are 
still in the download loop (more urls for the same file). But as said before, 
switching to the next url should continue the download:
* src/metalink.c (retrieve_from_metalink): line 131: Rename/remove fully 
downloaded file on error

I still have to investigate the problem...

Regards,
Matthew

-- 
Matthew White <mehw.is...@inventati.org>
>From 8c29eedbb95230fc8938ad3cfbb93e84effd7960 Mon Sep 17 00:00:00 2001
From: Matthew White <mehw.is...@inventati.org>
Date: Thu, 28 Jul 2016 20:21:48 +0200
Subject: [PATCH 1/2] Add new option --keep-badhash to keep Metalink's files
 with a bad hash

* src/init.c: Add keepbadhash
* src/main.c: Add keep-badhash
* src/options.h: Add keep_badhash
* doc/wget.texi: Add docs for --keep-badhash
* src/metalink.h: Add prototypes badhash_suffix(), badhash_or_remove()
* src/metalink.c: New functions badhash_suffix(), badhash_or_remove().
  (retrieve_from_metalink): Call badhash_or_remove() on download error

With --keep-badhash, append .badhash to Metalink's files with checksum
mismatch. (retrieve_from_metalink): unique_create() may append another
suffix to avoid overwriting existing files.

Without --keep-badhash, remove downloaded files with checksum mismatch
(this conforms to the old behaviour).
---
 doc/wget.texi  |  6 ++++++
 src/init.c     |  1 +
 src/main.c     |  3 +++
 src/metalink.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 src/metalink.h |  3 +++
 src/options.h  |  1 +
 6 files changed, 67 insertions(+), 7 deletions(-)

diff --git a/doc/wget.texi b/doc/wget.texi
index f6f0fbc..909696c 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -512,6 +512,12 @@ href if none was specified.
 Downloads files covered in local Metalink @var{file}. Metalink version 3
 and 4 are supported.
 
+@cindex keep-badhash
+@item --keep-badhash
+Keeps downloaded Metalink's files with a bad hash. It appends .badhash
+to the name of Metalink's files which have a checksum mismatch, except
+without overwriting existing files.
+
 @cindex metalink-over-http
 @item --metalink-over-http
 Issues HTTP HEAD request instead of GET and extracts Metalink metadata
diff --git a/src/init.c b/src/init.c
index 06d2e44..dc2d9a6 100644
--- a/src/init.c
+++ b/src/init.c
@@ -237,6 +237,7 @@ static const struct {
   { "input-metalink",   &opt.input_metalink,    cmd_file },
 #endif
   { "iri",              &opt.enable_iri,        cmd_boolean },
+  { "keepbadhash",      &opt.keep_badhash,      cmd_boolean },
   { "keepsessioncookies", &opt.keep_session_cookies, cmd_boolean },
   { "limitrate",        &opt.limit_rate,        cmd_bytes },
   { "loadcookies",      &opt.cookies_input,     cmd_file },
diff --git a/src/main.c b/src/main.c
index 4d69e03..57c7703 100644
--- a/src/main.c
+++ b/src/main.c
@@ -343,6 +343,7 @@ static struct cmdline_option option_data[] =
     { "input-metalink", 0, OPT_VALUE, "input-metalink", -1 },
 #endif
     { "iri", 0, OPT_BOOLEAN, "iri", -1 },
+    { "keep-badhash", 0, OPT_BOOLEAN, "keepbadhash", -1 },
     { "keep-session-cookies", 0, OPT_BOOLEAN, "keepsessioncookies", -1 },
     { "level", 'l', OPT_VALUE, "reclevel", -1 },
     { "limit-rate", 0, OPT_VALUE, "limitrate", -1 },
@@ -703,6 +704,8 @@ Download:\n"),
        --unlink                    remove file before clobber\n"),
 #ifdef HAVE_METALINK
     N_("\
+       --keep-badhash              keep files with checksum mismatch (append .badhash)\n"),
+    N_("\
        --metalink-over-http        use Metalink metadata from HTTP response headers\n"),
     N_("\
        --preferred-location        preferred location for Metalink resources\n"),
diff --git a/src/metalink.c b/src/metalink.c
index 5d7e9b2..818c052 100644
--- a/src/metalink.c
+++ b/src/metalink.c
@@ -119,13 +119,20 @@ retrieve_from_metalink (const metalink_t* metalink)
           retr_err = METALINK_RETR_ERROR;
 
           /* If output_stream is not NULL, then we have failed on
-             previous resource and are retrying. Thus, remove the file.  */
+             previous resource and are retrying. Thus, rename/remove
+             the file.  */
           if (output_stream)
             {
               fclose (output_stream);
               output_stream = NULL;
-              if (unlink (filename))
-                logprintf (LOG_NOTQUIET, "unlink: %s\n", strerror (errno));
+              badhash_or_remove (filename);
+              xfree (filename);
+            }
+          else if (filename)
+            {
+              /* Rename/remove the file downloaded previously before
+                 downloading it again.  */
+              badhash_or_remove (filename);
               xfree (filename);
             }
 
@@ -532,15 +539,13 @@ gpg_skip_verification:
 #endif
       last_retr_err = retr_err == RETROK ? last_retr_err : retr_err;
 
-      /* Remove the file if error encountered or if option specified.
+      /* Rename the file if error encountered; remove if option specified.
          Note: the file has been downloaded using *_loop. Therefore, it
          is not necessary to keep the file for continuated download.  */
       if ((retr_err != RETROK || opt.delete_after)
            && filename != NULL && file_exists_p (filename))
         {
-          logprintf (LOG_VERBOSE, _("Removing %s.\n"), quote (filename));
-          if (unlink (filename))
-            logprintf (LOG_NOTQUIET, "unlink: %s\n", strerror (errno));
+          badhash_or_remove (filename);
         }
       if (output_stream)
         {
@@ -558,6 +563,47 @@ gpg_skip_verification:
   return last_retr_err;
 }
 
+/* Append the suffix ".badhash" to the file NAME, except without
+   overwriting an existing file with that name and suffix.  */
+void
+badhash_suffix (char *name)
+{
+  char *bhash, *uname;
+
+  bhash = concat_strings (name, ".badhash", (char *)0);
+  uname = unique_name (bhash, false);
+
+  logprintf (LOG_VERBOSE, _("Renaming ‘%s’ to ‘%s’.\n"), name, uname);
+
+  if (link (name, uname))
+    logprintf (LOG_NOTQUIET, "link: %s\n", strerror (errno));
+  else if (unlink (name))
+    logprintf (LOG_NOTQUIET, "unlink: %s\n", strerror (errno));
+
+  xfree (bhash);
+  xfree (uname);
+}
+
+/* Append the suffix ".badhash" to the file NAME, except without
+   overwriting an existing file with that name and suffix.
+
+   Remove the file NAME if the option --delete-after is specified, or
+   if the option --keep-badhash isn't set.  */
+void
+badhash_or_remove (char *name)
+{
+  if (opt.delete_after || !opt.keep_badhash)
+    {
+      logprintf (LOG_VERBOSE, _("Removing %s.\n"), quote (name));
+      if (unlink (name))
+        logprintf (LOG_NOTQUIET, "unlink: %s\n", strerror (errno));
+    }
+  else
+    {
+      badhash_suffix(name);
+    }
+}
+
 int metalink_res_cmp (const void* v1, const void* v2)
 {
   const metalink_resource_t *res1 = *(metalink_resource_t **) v1,
diff --git a/src/metalink.h b/src/metalink.h
index e98c210..020fdf5 100644
--- a/src/metalink.h
+++ b/src/metalink.h
@@ -47,6 +47,9 @@ uerr_t retrieve_from_metalink (const metalink_t *metalink);
 
 int metalink_res_cmp (const void *res1, const void *res2);
 
+void badhash_suffix (char *name);
+void badhash_or_remove (char *name);
+
 bool find_key_value (const char *start,
                      const char *end,
                      const char *key,
diff --git a/src/options.h b/src/options.h
index b2e31a8..63b9bba 100644
--- a/src/options.h
+++ b/src/options.h
@@ -260,6 +260,7 @@ struct options
   bool cookies;                 /* whether cookies are used. */
   char *cookies_input;          /* file we're loading the cookies from. */
   char *cookies_output;         /* file we're saving the cookies to. */
+  bool keep_badhash;            /* Keep files with checksum mismatch. */
   bool keep_session_cookies;    /* whether session cookies should be
                                    saved and loaded. */
 
-- 
2.7.3

>From 81885f5af2d13ff398006585d5160461bb6cf645 Mon Sep 17 00:00:00 2001
From: Matthew White <mehw.is...@inventati.org>
Date: Mon, 1 Aug 2016 18:27:32 +0200
Subject: [PATCH 2/2] Add 'link' and 'unlink' to bootstrap.conf

* bootstrap.conf: Add 'link' and 'unlink'

Gnulib will emulate these on platforms where they are not available
(or have a slightly different behavior).
---
 bootstrap.conf | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/bootstrap.conf b/bootstrap.conf
index 6d73192..6824def 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -56,6 +56,7 @@ ioctl
 iconv
 iconv-h
 langinfo
+link
 listen
 maintainer-makefile
 mbiter
@@ -90,6 +91,7 @@ strtok_r
 strtoll
 timegm
 tmpdir
+unlink
 unlocked-io
 update-copyright
 vasprintf
-- 
2.7.3

Reply via email to