On Thu, 04 Aug 2016 17:08:20 +0200
Tim Ruehsen <tim.rueh...@gmx.de> wrote:

> On Thursday, August 4, 2016 12:40:49 PM CEST Matthew White wrote:
> > On Thu, 04 Aug 2016 12:08:50 +0200
> > 
> > Tim Ruehsen <tim.rueh...@gmx.de> wrote:
> > > On Wednesday, August 3, 2016 2:40:14 PM CEST Matthew White wrote:
> > > > On Wed, 03 Aug 2016 14:05:06 +0200
> > > > 
> > > > Tim Ruehsen <tim.rueh...@gmx.de> wrote:
> > > > > On Tuesday, August 2, 2016 11:27:28 AM CEST Matthew White wrote:
> > > > > > On Mon, 01 Aug 2016 16:32:30 +0200
> > > > > > 
> > > > > > Tim Ruehsen <tim.rueh...@gmx.de> wrote:
> > > > > > > On Saturday, July 30, 2016 9:41:48 PM CEST Matthew White wrote:
> > > > > > > > Hello!
> > > > > > > > 
> > > > > > > > I think that sometimes it could help to keep downloaded
> > > > > > > > Metalink's
> > > > > > > > files
> > > > > > > > which have a bad hash.
> > > > > > > > 
> > > > > > > > The default wget behaviour is to delete such files.
> > > > > > > > 
> > > > > > > > This patch provides a way to keep files which have a bad hash
> > > > > > > > through
> > > > > > > > the
> > > > > > > > option --keep-badhash. It appends the suffix .badhash to the
> > > > > > > > file
> > > > > > > > name,
> > > > > > > > except without overwriting existing files. In the latter case,
> > > > > > > > an
> > > > > > > > unique
> > > > > > > > suffix is appended after .badhash.
> > > > > > > > 
> > > > > > > > I made this patch working on the following branch:
> > > > > > > > master (latest 20cac2c5ab3d63aacfba35fb10878a2d490e2377)
> > > > > > > > git://git.savannah.gnu.org/wget.git
> > > > > > > > 
> > > > > > > > What do you think?
> > > > > > > 
> > > > > > > Hi Matthew,
> > > > > > > 
> > > > > > > good work !
> > > > > > > 
> > > > > > > While your FSF assignment is underway (PM), we can continue
> > > > > > > polishing.
> > > > > > > 
> > > > > > > Didn't test your code yet,  but from what I see, there are just
> > > > > > > these
> > > > > > > peanuts:
> > > > > > > 
> > > > > > > 1.
> > > > > > > +  bhash = malloc (strlen (name) + strlen (".badhash") + 1);
> > > > > > > +  strcat (strcpy (bhash, name), ".badhash");
> > > > > > > 
> > > > > > > Please consider using concat_strings() from util.h.
> > > > > > > And please leave an empty line between var declaration and code -
> > > > > > > just
> > > > > > > for
> > > > > > > readability.
> > > > > > > 
> > > > > > > 2.
> > > > > > > Please add 'link' and 'unlink' to bootstrap.conf. Gnulib will
> > > > > > > emulate
> > > > > > > these on platforms where they are not available (or have a
> > > > > > > slightly
> > > > > > > different behavior). I guess we simply forgot 'unlink' when we
> > > > > > > switched
> > > > > > > to gnulib.
> > > > > > > 
> > > > > > > 3.
> > > > > > > The (GNU) commit messages should ideally look like:
> > > > > > > 
> > > > > > > one line of short description
> > > > > > > <empty line>
> > > > > > > file changes
> > > > > > > <empty line>
> > > > > > > long description
> > > > > > > 
> > > > > > > For example:
> > > > > > > Add new option --keep-badhash
> > > > > > > 
> > > > > > > * src/init.c: Add keepbadhash
> > > > > > > * src/main.c: Add keep-badhash
> > > > > > > * src/options.h: Add keep_badhash
> > > > > > > * doc/wget.texi: Add docs for --keep-badhash
> > > > > > > * src/metalink.h: Add prototypes badhash_suffix(),
> > > > > > > badhash_or_remove()
> > > > > > > * src/metalink.c: New functions badhash_suffix(),
> > > > > > > badhash_or_remove().
> > > > > > > 
> > > > > > >   (retrieve_from_metalink): Call append .badhash()
> > > > > > > 
> > > > > > > With --keep-badhash, append .badhash to Metalink's files with
> > > > > > > checksum
> > > > > > > mismatch, except without overwriting existing files.
> > > > > > > 
> > > > > > > Without --keep-badhash, remove downloaded files with checksum
> > > > > > > mismatch
> > > > > > > (this conforms to the old behaviour).
> > > > > > > 
> > > > > > > [This also applies to to your other patches]
> > > > > > > 
> > > > > > > 4.
> > > > > > > Please not only write 'keepbadhash', but also what you did (e.g.
> > > > > > > remove,
> > > > > > > rename, add, ...), see my example above.
> > > > > > > 
> > > > > > > Those ChangeLog entries should allow finding changes / faults /
> > > > > > > regressions
> > > > > > > etc. even when a versioning system is not at hand, e.g. when
> > > > > > > unpacking
> > > > > > > the
> > > > > > > sources from a tarball. (Not debatable that consulting commit
> > > > > > > messages
> > > > > > > directly is much more powerful.)
> > > > > > > 
> > > > > > > 5.
> > > > > > > You write "except without overwriting existing files", maybe you
> > > > > > > should
> > > > > > > mention appending a counter instead of overwriting existent files
> > > > > > > !?
> > > > > > > 
> > > > > > > 
> > > > > > > Regards, Tim
> > > > > > 
> > > > > > Thanks Tim!
> > > > > > 
> > > > > > I really needed your guidance. I sent the modified patches to
> > > > > > bug-wget@gnu.org.
> > > > > > 
> > > > > > I believe there are more things to fix.
> > > > > > 
> > > > > > Consider the following after applying the attached patch:
> > > > > > * src/metalink.c (retrieve_from_metalink): line 124: If the download
> > > > > > is
> > > > > > interrupted (output_stream isn't NULL), and there are more urls for
> > > > > > the
> > > > > > same file (we are still in the download loop), switching to the next
> > > > > > url
> > > > > > should resume the download (instead than start it over).
> > > > > > 
> > > > > > In this patch I added a fix to rename/remove the fully downloaded
> > > > > > file
> > > > > > (output_stream is NULL), but there was an error (probably checksum)
> > > > > > and
> > > > > > we
> > > > > > are still in the download loop (more urls for the same file). But as
> > > > > > said
> > > > > > before, switching to the next url should continue the download: *
> > > > > > src/metalink.c (retrieve_from_metalink): line 131: Rename/remove
> > > > > > fully
> > > > > > downloaded file on error
> > > > > > 
> > > > > > I still have to investigate the problem...
> > > > > 
> > > > > So, I wait with 0001-... !?
> > > > > 
> > > > > 0002-... has been pushed (extended with 'symlink').
> > > > > 
> > > > > Thanks for your contribution.
> > > > > 
> > > > > Tim
> > > > 
> > > > Ok for 0002. Thanks.
> > > > 
> > > > There's no problem applying the 0001.
> > > 
> > > Applied and pushed !
> > > 
> > > > The "if the stream got interrupted, then restart the download with the
> > > > next
> > > > url" (output_stream isn't NULL) was already there before the patch 0001.
> > > > 
> > > > [debug is required to know when output_stream isn't NULL]
> > > > 
> > > > commit 7fad76db4cdb7a0fe7e5aa0dd88f5faaf8f4cdc8
> > > > * src/metalink.c (retrieve_from_metalink): line 124: 'if
> > > > (output_stream)'
> > > > remove file and restart download with next url
> > > > 
> > > > With the 0001, if --keep-badhash is used the file is renamed instead
> > > > than
> > > > removed, even when "output_stream is NULL".
> > > > 
> > > > I have to look through the "stream interrupted" situation.
> > > > 
> > > > I'm guessing that the download is not resumed.
> > > > 
> > > > What do you suggest?
> > > 
> > > We need a  document where we define wanted behavior, create tests and
> > > amend the code.
> > > 
> > > Regards, Tim
> > 
> > Ok, I just documented a test. See the attached tarball.
> > 
> > Also there is a patch attached (there's a copy in the tarball too),
> > providing the suggested bugfix.
> 
> I made that test without your patch => crash due to badhash_or_remove(NULL).
> With your patch, valgrind says:
> 
> ==15068== Syscall param open(filename) points to unaddressable byte(s)
> ==15068==    at 0x6848F40: __open_nocancel (syscall-template.S:84)
> ==15068==    by 0x67DFF7D: _IO_file_open (fileops.c:221)
> ==15068==    by 0x67E009F: _IO_file_fopen@@GLIBC_2.2.5 (fileops.c:328)
> ==15068==    by 0x67D4F03: __fopen_internal (iofopen.c:86)
> ==15068==    by 0x434FAE: retrieve_from_metalink (metalink.c:184)
> ==15068==    by 0x406C63: main (main.c:2025)
> ==15068==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==15068== 
> 
> Can you reproduce this ?
> 
> Tim

Thanks Tim for spotting that out. I did some tests.

Please, don't push the patches, let me understand the bugs first.

Vanilla commit e3fb4c3859d2705fb2de80801b0bba2b64bea1a1:
* Doesn't understand metalink:file "path/file" name format
* Downloads "path/file" as "file" in the working directory
* Fails the "path/file" checksum (cannot find "path/file")

Vanilla commit, test.meta4: name="bash-4.3-patches/bash43-001"
$ wget --input-metalink=test.meta4
$ wget -c --input-metalink=test.meta4
Both the aboves: File OK, checksum fails (cannot find "path/file").

Vanilla commit, test2.meta4: name="bash43-001"
$ wget --input-metalink=test2.meta4
File OK, checksum OK.
$ wget -c --input-metalink=test2.meta4
Segmentation fault (I'm working on it).

Patch 0001 http://lists.gnu.org/archive/html/bug-wget/2016-08/msg00021.html
$ wget --input-metalink=test.meta4
$ wget -c --input-metalink=test.meta4
$ wget --input-metalink=test2.meta4
$ wget -c --input-metalink=test2.meta4
All the aboves: File OK, checksum OK (understands "path/file").

Patch 0002 http://lists.gnu.org/archive/html/bug-wget/2016-08/msg00033.html
$ wget --input-metalink=test.meta4
$ wget -c --input-metalink=test.meta4
Both the aboves: File OK, checksum fails (cannot find "path/file").
$ wget --input-metalink=test2.meta4
File OK, checksum OK.
$ wget -c --input-metalink=test2.meta4
File OK, checksum fails (what?! I'm working on it).

0001 + 0002: All OK.

I posted 0001 and 0002 before (see links above), I attach them here too.

PS: sorry for the boring message.

Later,
Matthew

-- 
Matthew White <mehw.is...@inventati.org>
>From a7978778db4501b872b0946346c2dedd8e12769d Mon Sep 17 00:00:00 2001
From: Matthew White <mehw.is...@inventati.org>
Date: Thu, 28 Jul 2016 17:10:46 +0200
Subject: [PATCH 1/2] Bugfix: create/download/verify Metalink's files with a
 "path/file" format

* src/metalink.c (retrieve_from_metalink): Set filename to mfile->name
  to create/download/verify files with a "path/file" format

Bug:
* src/utils.c (unique_create, fopen_excl): cannot create "path/file"
* src/metalink.c (retrieve_from_metalink): when filename is NULL,
  "path/file" is downloaded as "file", and it cannot be verified

The directory information contained in a metalink:file element shall
be used in the same way the option --directory-prefix will.

  <file name="dirA/dirB/file.gz">

  --directory-prefix="dirA/dirB/file.gz"

This patch conforms to the RFC5854 specification:
  The Metalink Download Description Format
  4.1.2.1.  The "name" Attribute
  https://tools.ietf.org/html/rfc5854#section-4.1.2.1
---
 src/metalink.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/metalink.c b/src/metalink.c
index 9f34b32..dd16db2 100644
--- a/src/metalink.c
+++ b/src/metalink.c
@@ -156,6 +156,13 @@ retrieve_from_metalink (const metalink_t* metalink)
               output_stream = unique_create (mfile->name, true, &filename);
               output_stream_regular = true;
 
+              /* Bug: utils.h (unique_create): cannot create a
+                 "path/file" tree.  Bugfix: If filename is set to
+                 "path/file", a tree is created in the same way as
+                 --directory-prefix does (see RFC5854).  */
+              if (filename == NULL && mfile->name)
+                filename = xstrdup (mfile->name);
+
               /* Store the real file name for displaying in messages.  */
               opt.output_document = filename;
 
-- 
2.7.3

>From 80c1b02a08c20d946f0dfd8848c27250edfde34a Mon Sep 17 00:00:00 2001
From: Matthew White <mehw.is...@inventati.org>
Date: Thu, 4 Aug 2016 11:35:42 +0200
Subject: [PATCH 2/2] Bugfix: Continue download when retrying with the next
 metalink:url

* src/metalink.c (retrieve_from_metalink): If output_stream isn't
  NULL, continue download with the next mres->url

Bug:
* src/metalink.c (retrieve_from_metalink): If output_stream isn't
  NULL, restart download with the next mres->url

Keep the download progress while iterating metalink:url.  In such
scenario, closing output_stream means to lose the download progress.
---
 src/metalink.c | 50 ++++++++++++++++++++++----------------------------
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/src/metalink.c b/src/metalink.c
index 1799e3a..2e296f9 100644
--- a/src/metalink.c
+++ b/src/metalink.c
@@ -119,24 +119,6 @@ retrieve_from_metalink (const metalink_t* metalink)
 
           retr_err = METALINK_RETR_ERROR;
 
-          /* If output_stream is not NULL, then we have failed on
-             previous resource and are retrying. Thus, rename/remove
-             the file.  */
-          if (output_stream)
-            {
-              fclose (output_stream);
-              output_stream = NULL;
-              badhash_or_remove (filename);
-              xfree (filename);
-            }
-          else if (filename)
-            {
-              /* Rename/remove the file downloaded previously before
-                 downloading it again.  */
-              badhash_or_remove (filename);
-              xfree (filename);
-            }
-
           /* Parse our resource URL.  */
           iri = iri_new ();
           set_uri_encoding (iri, opt.locale, true);
@@ -156,17 +138,29 @@ retrieve_from_metalink (const metalink_t* metalink)
               /* Avoid recursive Metalink from HTTP headers.  */
               bool _metalink_http = opt.metalink_over_http;
 
-              /* Assure proper local file name regardless of the URL
-                 of particular Metalink resource.
-                 To do that we create the local file here and put
-                 it as output_stream. We restore the original configuration
-                 after we are finished with the file.  */
-              if (opt.always_rest)
-                /* continue previous download */
-                output_stream = fopen (mfile->name, "ab");
+              /* If output_stream is not NULL, then we have failed on
+                 previous resource and are retrying. Thus, continue
+                 with the next resource.  Do not close output_stream
+                 while iterating over the resources, or the download
+                 progress will be lost.  */
+              if (output_stream)
+                {
+                  DEBUGP (("Previous resource failed, continue with next resource.\n"));
+                }
               else
-                /* create a file with an unique name */
-                output_stream = unique_create (mfile->name, true, &filename);
+                {
+                  /* Assure proper local file name regardless of the URL
+                     of particular Metalink resource.
+                     To do that we create the local file here and put
+                     it as output_stream. We restore the original configuration
+                     after we are finished with the file.  */
+                  if (opt.always_rest)
+                    /* continue previous download */
+                    output_stream = fopen (mfile->name, "ab");
+                  else
+                    /* create a file with an unique name */
+                    output_stream = unique_create (mfile->name, true, &filename);
+                }
 
               output_stream_regular = true;
 
-- 
2.7.3

Attachment: test.meta4
Description: application/metalink4

Attachment: test2.meta4
Description: application/metalink4

Attachment: pgpFkKnAdWtuo.pgp
Description: PGP signature

Reply via email to