I had asked questions on the expert, newbie, and rsync lists about using
rsync to correct a bad md5sum on an iso image.  I finally got it to work
(instructions below).  Somebody told me (in an email) that it will not
work to upgrade from one version of an iso to another (like Mandrake 8.0
beta 2 to final), but I don't believe that.  I believe it will work, but
I haven't tried it yet.  I thought I'd wait a few days or longer to let
traffic on the mirrors die down.

When I try it, I will:

-Make an archive copy of my existing local copy of the iso.  Put a
working copy in a directory with space for at least one additional copy
of the iso and rename it to match the name of the iso that I am trying
to "download".  

-If I have space for two copies, I will put one in a subdirectory of
this directory and then specify the subdirectory name in the
--compare-dest option of rsync.

(See my notes below about the --partial and the --compare-dest options.
If you specify the --partial option (which I think is a good idea) and
your transfer is interrupted, the original iso will be replaced by the
partially transferred copy.  Sometimes it will make sense to continue
rsync with the partially transferred copy (if the transfer was almost
complete), but more often it will make sense to restart rsync from the
beginning, using the original iso.  If you don't save a copy somewhere,
you will be forced to continue with the partially transferred iso.  When
you continue with a partially transferred copy, you are no longer
reaping the benefit of rsync but basically doing a "pure download" of
the remainder of the file.  If the --compare-dest option worked as I
thought it might, this would not be an issue, but, so far, the
--compare-dest option has not worked for me (more discussion below).)
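
The setup steps above might be sketched roughly as follows.  This is
only an illustration: a tiny dummy file stands in for the real iso, and
the directory names (archive, compare) and the old file name
(mandrake-beta.iso) are made up for the example.  Giving the
--compare-dest copy the same name as the target file is my assumption
about what rsync expects.

```shell
#!/bin/sh
# Sketch of the setup steps. A tiny dummy file stands in for the real iso.
set -eu

work=$(mktemp -d)                          # hypothetical download directory
old_iso="$work/mandrake-beta.iso"          # hypothetical name of the iso I already have
new_name="mandrakefreq-i586-20010316.iso"  # name of the iso on the mirror

printf 'stand-in for iso data\n' > "$old_iso"

# 1. Archive copy that rsync never touches.
mkdir "$work/archive"
cp -p "$old_iso" "$work/archive/"

# 2. Optional second copy in a subdirectory, for --compare-dest.
mkdir "$work/compare"
cp -p "$old_iso" "$work/compare/$new_name"

# 3. Working copy, renamed to match the file being "downloaded".
mv "$old_iso" "$work/$new_name"

ls -l "$work" "$work/archive" "$work/compare"
```

With a real iso, the archive copy could of course live on another disk
or on a CD.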

-Look at the mirror site to find the full name and path of the iso I
want to download (use a web browser).

-Find the first directory in the rsync path (rsync calls it a module) by running:

rsync carroll.cac.psu.edu::

(The rsync server establishes a relative path to the iso file -- in the
rsync command you must specify this relative path, not the full path.)

-Use a command line like (with the PWD set to the directory I want to
download to):

rsync -a -vv --progress --partial [--compare-dest=<subdirectory_name>] \
carroll.cac.psu.edu::mandrake-iso/mandrakefreq-i586-20010316.iso \
mandrakefreq-i586-20010316.iso

(Note the full path to the file (on carroll) is something like
/pub/.../.../mandrake-iso/mandrakefreq-... -- the rsync path (on
carroll) starts with mandrake-iso.)

(Note the double colon.  That works on carroll.  There are other
variations for specifying the rsync server, some with a single colon,
some using the word rsync.  You don't get a choice: you have to specify
the one that is right for the server you want to download from and the
communication method you want to use (ssh, rsh, or something else).  See
the man pages for rsync.  When I learn more about these other options I
will modify these instructions.  Note that I don't think anybody needs
to use ssh to download a Mandrake iso -- who are you keeping a secret
from?)
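
For reference, here is a small sketch of the variations mentioned above.
As far as I know, the double-colon form and the rsync:// form both talk
to an rsync server (daemon), while the single-colon form goes through a
remote shell such as ssh or rsh and needs a login on the server.  The
script only prints the command lines; it does not contact any server.
The user name and the /some/path prefix in the last form are
hypothetical.

```shell
#!/bin/sh
# Three ways of naming a remote file. Nothing is transferred here;
# the script only builds and prints the command lines.
set -eu

host="carroll.cac.psu.edu"
module="mandrake-iso"
file="mandrakefreq-i586-20010316.iso"

# rsync daemon, double-colon form (the one used above).
daemon_colon="rsync -a --partial ${host}::${module}/${file} ."

# rsync daemon, URL form; names the same module and file.
daemon_url="rsync -a --partial rsync://${host}/${module}/${file} ."

# Remote shell (ssh here), single colon. The user name and the path
# are hypothetical; an anonymous mirror will usually not offer this.
remote_shell="rsync -a --partial -e ssh user@${host}:/some/path/${file} ."

printf '%s\n' "$daemon_colon" "$daemon_url" "$remote_shell"
```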

(The --compare-dest parameter is optional, see discussion above and
below.)

You must make sure that the timestamp or the file size of your local
copy differs from the remote copy.  If both match, rsync will assume the
local and remote iso are the same and will not attempt to sync them.
Touch the local file if necessary.
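
A minimal sketch of the touch step, using a dummy file in place of the
iso.  The old date given to the file first is arbitrary and is only
there so the change shows up in the listing.

```shell
#!/bin/sh
# Show that touch changes the modification time rsync looks at.
set -eu

work=$(mktemp -d)
iso="$work/mandrakefreq-i586-20010316.iso"   # dummy stand-in for the iso
printf 'stand-in for iso data\n' > "$iso"

# Give the file an old, known timestamp (1 Jan 2001, 00:00) ...
touch -t 200101010000 "$iso"
before=$(ls -l "$iso")

# ... then touch it so the modification time becomes "now".
touch "$iso"
after=$(ls -l "$iso")

echo "before: $before"
echo "after:  $after"
```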

-Do not specify the -c option!

(You might think that you can use the -c option to force a full checksum
check on the files.  This did not work for me.  I got the message
"unexpected EOF on read-timeout", and a short time later the rsync
process quit.  What I didn't realize until later is that this error
message means the rsync server has quit.  The rsync server often quit
for me when I used the -c option, generally because something on the
server end decided to kill rsync.  I don't know whether this was due to
some parameters of the rsync server, or something like an external
watchdog deciding the rsync server was not making sufficient progress.
I do know that the -c option uses a lot of processor resources on the
server side and that, at least for me and carroll, I often got the
"read-timeout" message until I stopped specifying -c.)

-The -a option stands for archive and is a way of specifying several
options at once.  It will make sure that the timestamp, owner, group,
and permissions are set by rsync to match the server's copy (it also
turns on recursion and the copying of symlinks and device files, which
don't matter for a single iso).  (Note that if you try to use rsync in
dos/Windows you may run into a problem because dos/Windows can only
resolve timestamps to an even number of seconds.  If the original iso
file has a timestamp with an odd number of seconds, you will have
trouble getting the md5sum to come out correctly.  There may be some
workarounds.  I used rsync under Linux so I did not have to deal with
any issues here.)

-I read some of Andrew Tridgell's thesis (somewhere at rsync.samba.org).
Rsync divides a file into blocks (IIRC, somewhere between 3000 and 8000
bytes per block, and there is a command line parameter, --block-size, to
adjust the block size).  As long as there are blocks of this size that
are identical between the remote file and the local file, rsync will
find those duplicate blocks and not transfer them, thus saving the time
that would otherwise be required to transfer those blocks.  I suspect
that there are enough matching blocks between beta 3 and the final
Mandrake 8.0 to make rsync worthwhile.  Some other notes about using
rsync:

--If you are trying to rsync a Linux text file with a dos/Windows text
file, rsync may be useless if one copy uses the <cr><lf> line ending
convention (dos) and the other file uses the <lf> line ending convention
(Linux).  If lines are 80 characters, no 3000 character blocks will
match, because every line differs by one byte.  There are some potential
workarounds to this problem that I will not discuss here, except to
suggest using sed on your local copy to convert all <cr><lf>s to <lf>s
or vice versa, if necessary, before running rsync.  I would not expect
any problems like this while dealing with an iso, and I did not have any
such problems even though I downloaded the file using rsync under Linux,
and then copied the file to a Windows machine to burn the CD.  (When I
got the md5sum right in Linux, it stayed correct after the transfer to
Windows.)
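
The sed conversion suggested above could look something like this.  It
is a sketch on a two-line dummy text file (the file names are made up),
and it should never be run on an iso or any other binary file.

```shell
#!/bin/sh
# Convert line endings with sed, in both directions, on a small text file.
set -eu

work=$(mktemp -d)
dos="$work/dos.txt"              # <cr><lf> line endings
unix="$work/unix.txt"            # <lf> line endings
roundtrip="$work/dos-again.txt"  # converted back to <cr><lf>

printf 'line one\r\nline two\r\n' > "$dos"

cr=$(printf '\r')                # a literal carriage return

# dos -> Linux: strip the <cr> at the end of every line.
sed "s/${cr}\$//" "$dos" > "$unix"

# Linux -> dos: add a <cr> at the end of every line.
sed "s/\$/${cr}/" "$unix" > "$roundtrip"

wc -c "$dos" "$unix" "$roundtrip"
```

The converted file is one byte per line shorter than the dos file, and
converting it back gives a byte-for-byte copy of the dos original.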

--rsync must have space for at least one extra copy of the iso file in
your local directory.  As rsync works, it copies portions of the file
from the server or from the local copy (for matching blocks) and
constructs a new copy of the iso under a temporary file name.  When
rsync has reconstructed the file, it deletes the original file, and
renames the reconstructed copy to match the original file's name, date,
etc.
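
To get a feel for how much of a new file could be rebuilt from the local
copy, here is a toy sketch that counts how many fixed-size blocks of a
"new" file also appear somewhere in an "old" file.  It is not how rsync
itself works: rsync looks for matches at every byte offset using a
rolling checksum, while this only looks at blocks cut at fixed
positions, so it can only give a rough lower bound.  The 16-byte block
size, the file names, and the use of cksum are choices made for the
demonstration, and the approach is far too slow for a real iso.

```shell
#!/bin/sh
# Toy estimate: how many fixed-size blocks of NEW already exist in OLD?
set -eu

block=16                      # toy block size in bytes
work=$(mktemp -d)
old="$work/old.bin"           # stands in for the local copy
new="$work/new.bin"           # stands in for the file on the server

# Four 16-byte blocks each; only the second block differs.
for i in 1 2 3 4; do printf '%016d' "$i"; done > "$old"
for i in 1 9 3 4; do printf '%016d' "$i"; done > "$new"

# Print one checksum line per fixed-size block of file $1,
# using $2 as the name of a scratch directory.
block_sums() {
    dir="$work/$2"
    mkdir "$dir"
    split -b "$block" "$1" "$dir/blk."
    for f in "$dir"/blk.*; do
        cksum < "$f"
    done
}

block_sums "$old" old.d | sort -u > "$work/old.sums"
block_sums "$new" new.d > "$work/new.sums"

total=$(($(wc -l < "$work/new.sums")))
# Count the new-file blocks whose checksum line also occurs for the old file.
matched=$(grep -c -F -x -f "$work/old.sums" "$work/new.sums" || true)

echo "$matched of $total blocks in the new file already exist locally"
```

For the two dummy files this reports 3 of 4 blocks, since only the
second block was changed.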

If rsync is interrupted before completion, the temporary file is lost,
and rsync starts over from scratch unless you have specified the
--partial option.  I recommend you specify the --partial option, but if
the transfer is interrupted you will have to make a decision.  First see
my notes (below) about the --compare-dest option.  Assuming the
--compare-dest option does not work the way I hoped it would work, you
must choose whether to start the rsync process over from the beginning
(by replacing the partially transferred file with the original local
copy of the iso), or continuing from where you left off.  If you
continue from where you left off, the remainder of the transfer will be
basically a "pure download" -- rsync will have no local copy of the file
to check for block matches.
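
The choice between continuing and starting over could be sketched like
this.  The 90 percent cut-off is an arbitrary number picked for
illustration, not a recommendation, and the sizes in the example calls
are made up.  In practice the full size would come from the mirror's
file listing and the partial size from ls -l on the local file.

```shell
#!/bin/sh
# Decide whether to continue with a partial file or restart from the
# original local copy, based on how much has been transferred.
set -eu

threshold=90     # hypothetical cut-off, in percent

# Print "continue" or "restart" given the partial and full sizes in bytes.
decide() {
    partial_size=$1
    full_size=$2
    # Divide first so the arithmetic stays small even on 32-bit shells.
    percent=$((partial_size / (full_size / 100)))
    if [ "$percent" -ge "$threshold" ]; then
        echo "continue"
    else
        echo "restart"
    fi
}

# Made-up sizes for a 650 MB iso, interrupted early and interrupted late.
decide 120000000 681574400
decide 640000000 681574400
```

"restart" here means copying the archived original back over the
partially transferred file before running rsync again.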

In reading about the --compare-dest option, it sounded like I could keep
two local copies of the file I wanted to update, one in the download
directory and one in a subdirectory specified in the --compare-dest
option.  I hoped that if I restarted rsync after an interruption, it
would continue building the partially transferred file but use the file
in the --compare-dest directory for comparison to the original and as
"raw material" to avoid transferring duplicate blocks.  It did not work
for me, but it might have been because:

-I did not set things up properly.  My hard disk does not have space
for three copies of the iso, so I burned a CD-ROM and used it for the
--compare-dest copy.  If I had more hard disk space, I would try this
again with the third copy in a subdirectory below the directory I am
transferring the iso into.

-The --compare-dest option only works for entire files.  That is, if you
are transferring multiple files, those that have been successfully
transferred are recognized as completed by rsync and are not rechecked,
while those that have not been transferred use the files in
--compare-dest as the raw material for the resumption of the rsync
process.

-I misunderstood the description and purpose of --compare-dest.

-There is a bug in --compare-dest.

Hope this helps!

After I get my TWiki up and working, I will rewrite these instructions
and put them on my TWiki site.  (I apologize because these are not as
clear as I would like -- they need to be rewritten -- I just don't have
the time right now.)  I would appreciate corrections or clarifications
to these instructions, especially with respect to the --compare-dest
option.

Randy Kramer

CB wrote:
> 
> On Thu, Apr 19, 2001 at 02:27:37PM -0700, Rusty Carruth wrote:
> > There was a question a while back on the rsync list (I think)
> > about updating iso images using rsync, thus keeping you from
> > having to load an entire 600meg iso image when you already
> > have the beta just before that one.
> 
> I don't think rsync works that way.  It compares on a file by file basis
> and when it finds a file that's different, it downloads it and replaces
> it on the local machine.  Since the iso file changes, it downloads the
> new iso file.  The whole 600 megs.  You're not getting the savings you
> think you are.
> 
> The one advantage you might get is that ftp might be traffic
> shaped, whereas rsync might not.  Notice the two _might_s in there.
> Those were strictly guesses.
> --
> Blue skies...           Todd
> | Get a bigger hammer!   |  Sometimes you get what you want.      |
> | http://www.mrball.net  |  Sometimes you get experience.         |
> | http://faq.mrball.net  |                     --unknown origin   |
