Hi,

Ouch. In jigdo-lite it is not easy to have the downloaded files verified
against the checksums of the expected FileParts.

Steve, I need a decision on which direction I should go:

- Check .jigdo MD5s by jigdo-lite.

- Check by jigdo-file, with a new option --warn-unused-file to enable
  my "POSSIBLE FILE CORRUPTION" test when jigdo-lite is cycling between
  downloading and jigdo-file "make-image" scanning.
  (I expect this test to produce lots of false positives if jigdo-file
   were to use it while scanning a large local pool tree.)

- Declare "Won't fix" and have other fun.

---------------------------------------------------------------------
Things which are so far ok for an MD5 check in jigdo-lite:

The list of files to download is obtained by a run of
  jigdo-file print-missing-all ...
This is not too bad, because it delivers not only a list of possible URLs
per file (usually one per file) but also an MD5 in jigdo-file's modified
base64 encoding.
The jigdo-file command MD5SUM is supposed to produce a disk file's MD5 in
the same format, so a comparison would be possible.
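
For illustration, a minimal sketch of such a comparison (assumptions of
mine: "jigdo-file md5sum FILE" prints the encoded digest as the first
word of its output line; $file and $expected are placeholders, $expected
holding the value found after "MD5Sum:"):

  got=$(jigdo-file md5sum "$file" | awk '{print $1}')
  if test "$got" != "$expected"; then
    echo "MD5 mismatch: $file" >&2
  fi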

If I add "http://archive.debian.org/..." to the [Servers] list in .jigdo,
I get, per missing file, two URLs, one encoded MD5, and an empty line.

  http://archive.debian.org/.../openssh-client-udeb_5.5p1-6+squeeze3_amd64.udeb
  http://us.cdimage.debian.org/.../openssh/openssh-client-udeb_5.5p1-6+squeeze3_amd64.udeb
  MD5Sum:BjBWgpWgZYkV0gdXgcpm5A

  http://archive.debian.org/.../reiserfsprogs-udeb_3.6.21-1_amd64.udeb
  http://us.cdimage.debian.org/.../reiserfsprogs-udeb_3.6.21-1_amd64.udeb
  MD5Sum:HEsrTtJufOa50DKzAIQ3EA
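
Just to illustrate the shape of that output, a rough parsing sketch (not
jigdo-lite code; it assumes the print-missing-all output was saved to a
hypothetical file "missing.list"):

  urls=
  while read -r line; do
    case "$line" in
      MD5Sum:*) echo "${line#MD5Sum:} $urls"  # one record: md5 url1 url2 ...
                urls= ;;
      "")       ;;                            # blank line between files
      *)        urls="$urls $line" ;;         # collect this file's URLs
    esac
  done < missing.list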

jigdo-lite seems to expect up to 8 such URLs per file.
See it counting on its fingers at line 591:
    for pass in x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx; do
      ...
      while $readLine url <&3; do
        count="x$count"
        ...
        if test "$count" != "$pass"; then continue; fi

Up to 10 collected URLs are then handed as arguments to function 
  fetchAndMerge
which not only downloads them but also runs jigdo-file to put them
into the emerging ISO. So this is where verifying would have to happen.

I made a plan for how to hand the MD5s of the URLs as further arguments to
fetchAndMerge. Since the encoded MD5s are single words, one could pass
them as the first argument, do a "shift 1", and then give the remaining
arguments to function "fetch" for download.

---------------------------------------------------------------------
But then it becomes ugly:

Now fetchAndMerge has URLs for wget and corresponding MD5s for files.
It would need to deduce the file paths from the URLs in order to run
jigdo-file MD5SUM on them.
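
A purely hypothetical sketch of that deduction; whether the downloaded
copy really ends up under the URL's last path component is exactly the
kind of thing I do not dare to guess:

  path=$(basename "$url")   # assumption: local name == last URL component
  jigdo-file md5sum "$path"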

jigdo-file MAKE-IMAGE gets the root of the file pool. I do not dare to
guess whether only the freshly downloaded files are in there.
If other files are present, the correspondence between downloaded files
and MD5s would fall apart.

---------------------------------------------------------------------
Possible workaround:

I am now exploring the effort to introduce a new option for jigdo-file:

  --warn-unused-file [make-image] Complain if a submitted file matches
                   none of the wanted checksums

which controls whether the messages
  POSSIBLE FILE CORRUPTION: Offered file did not fit into the template.
  POSSIBLY CORRUPTED: `...path...'
are emitted when a file does not match any checksum wanted by the template.

The option will be disabled by default. jigdo-lite function fetchAndMerge
could set it.
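
For illustration, roughly how such a call could look; --image and
--template are existing jigdo-file options, while the shell variables and
the trailing download directory argument are placeholders of mine:

  jigdo-file make-image --image="$image" --template="$template" \
             --warn-unused-file "$downloadDir"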

I am still unsure whether jigdo-lite should use it by default. In my
tests with "netinst" and "businesscard" images it produced no false positives.
If it encounters surplus files which were not freshly downloaded, it
would report them but not confuse them with others.

I can invest a few dozen GB of network traffic into larger tests, if desired.

---------------------------------------------------------------------

Well, given that this is only for the unusual case of damaged files on
the fallback server, one could easily argue that the risk of a regression
is not outweighed by the potential benefit.


Have a nice day :)

Thomas
