On Wed, Apr 10, 2002 at 08:29:49PM +1000, Martijn van Oosterhout wrote:
> On Wed, Apr 10, 2002 at 09:22:50AM +0200, Michael Bramer wrote:
> > On Wed, Apr 10, 2002 at 10:25:22AM +1000, Martijn van Oosterhout wrote:
> > > With the standard rsync algorithm, the rsync checksum files would actually
> > > be 8 times larger than the original file (you need to store the checksum
> > > for each possible block in the file).
> >
> > I don't see why the checksum file would be larger than the original file.
> > If the checksum file is larger, we will have more bytes to download...
> > That was not the goal.
>
> That's because the client does not download the checksums. Look below.
>
> > Maybe I don't understand the rsync algorithm...
> >
> > IMHO the rsync algorithm is:
> > 1.) Computer B splits file B into blocks.
> > 2.) It calculates two checksums per block:
> >     a.) a weak ``rolling'' 32-bit checksum
> >     b.) an md5sum
> > 3.) Computer B sends these to computer A.
> > 4.) Computer A searches file A for parts with the same checksums as
> >     file B.
> > 5.) Computer A requests the unmatched blocks from computer B and
> >     builds file B.
> >
> > I got this from /usr/share/doc/rsync/tech_report.tex.gz
>
> Computer A wants to download a file F from computer B.
>
> 1. Computer A splits its version into blocks and calculates the checksum
> for each block.
> 2. Computer A sends this list to computer B. This should be <1% of the size
> of the original file, depending on the block size.
> 3. Computer B takes this list and does the rolling checksum over the file.
> Basically, it calculates the checksum for bytes 0-1023 and looks for it in
> the list from the client. If it's a match, it sends back a string indicating
> which block it is; otherwise it sends byte 0. Then it calculates the
> checksum of bytes 1-1024 and does the same. The rolling checksum is just an
> optimisation.
> 4. Computer A receives a list of "tokens", which are either bytes of data or
> indications of which block to copy from the original file.
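Step 3 only works because the weak checksum is cheap to "roll" from one window to the next. Below is a minimal Python sketch, not rsync's own code, of the 32-bit weak checksum as defined in the rsync tech report: two 16-bit sums a and b, combined as a + 2^16 * b. The function names are made up for illustration. It checks that rolling the window one byte at a time gives the same values as recomputing every window from scratch.

```python
M = 1 << 16  # modulus for each 16-bit half of the weak checksum


def weak_checksum(window):
    """Compute the (a, b) halves of the weak checksum from scratch, O(n)."""
    n = len(window)
    a = sum(window) % M
    # b weights the first byte by n, the second by n-1, ..., the last by 1
    b = sum((n - i) * x for i, x in enumerate(window)) % M
    return a, b


def rolling_checksums(data, n):
    """Weak checksum of every n-byte window of data, updated in O(1) per step."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    sums = [a + (b << 16)]
    for k in range(1, len(data) - n + 1):
        out_byte, in_byte = data[k - 1], data[k + n - 1]
        a = (a - out_byte + in_byte) % M  # drop the leaving byte, add the new one
        b = (b - n * out_byte + a) % M    # uses the already-updated a
        sums.append(a + (b << 16))
    return sums


if __name__ == "__main__":
    data = b"The quick brown fox jumps over the lazy dog. " * 3
    n = 16
    rolled = rolling_checksums(data, n)
    recomputed = []
    for k in range(len(data) - n + 1):
        a, b = weak_checksum(data[k:k + n])
        recomputed.append(a + (b << 16))
    print(rolled == recomputed)
```

Each step costs O(1) instead of O(n), which is what makes testing every byte offset (0-1023, 1-1024, ...) affordable; as noted above, it is an optimisation, and the results are identical to recomputing.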
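The four steps Martijn lists can be put together in a toy, in-memory Python sketch. This is an illustration under simplifying assumptions, not the real rsync protocol: the signature covers only full-size blocks, MD5 stands in for the strong checksum, and the token format ("copy", idx) / ("lit", byte) as well as the function names are invented here.

```python
import hashlib

M = 1 << 16  # modulus for each 16-bit half of the weak checksum


def weak(window):
    """Weak checksum halves (a, b) of a window, computed from scratch."""
    n = len(window)
    a = sum(window) % M
    b = sum((n - i) * x for i, x in enumerate(window)) % M
    return a, b


def signature(old, n):
    """Steps 1-2 (client A): weak + MD5 checksum of each full n-byte block of A's copy."""
    table = {}
    for idx in range(len(old) // n):
        block = old[idx * n:(idx + 1) * n]
        a, b = weak(block)
        table.setdefault(a + (b << 16), []).append((idx, hashlib.md5(block).digest()))
    return table


def delta(new, table, n):
    """Step 3 (server B): slide a window over B's file and emit tokens,
    ("copy", idx) for a block A already has, ("lit", byte) otherwise."""
    tokens, k, ab = [], 0, None
    while k + n <= len(new):
        if ab is None:  # (re)start the rolling checksum
            ab = weak(new[k:k + n])
        a, b = ab
        hit = None
        candidates = table.get(a + (b << 16), [])
        if candidates:  # weak match: confirm with MD5
            strong = hashlib.md5(new[k:k + n]).digest()
            hit = next((idx for idx, s in candidates if s == strong), None)
        if hit is not None:
            tokens.append(("copy", hit))
            k, ab = k + n, None  # jump past the matched block
        else:
            tokens.append(("lit", new[k]))
            if k + n < len(new):  # roll the window one byte forward
                out_byte, in_byte = new[k], new[k + n]
                a = (a - out_byte + in_byte) % M
                b = (b - n * out_byte + a) % M
                ab = (a, b)
            k += 1
    tokens.extend(("lit", x) for x in new[k:])  # tail shorter than a block
    return tokens


def patch(old, tokens, n):
    """Step 4 (client A): rebuild B's file from the tokens and A's own copy."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "copy":
            out += old[value * n:(value + 1) * n]
        else:
            out.append(value)
    return bytes(out)


if __name__ == "__main__":
    old = b"The quick brown fox jumps over the lazy dog. " * 4
    new = old[:50] + b"INSERTED" + old[50:]
    n = 16
    tokens = delta(new, signature(old, n), n)
    assert patch(old, tokens, n) == new
    copies = sum(1 for kind, _ in tokens if kind == "copy")
    print(f"{copies} blocks copied, {len(tokens) - copies} literal bytes sent")
```

The weak checksum is only a cheap filter; a block is reported as a match only after the MD5 comparison confirms it, so a weak-checksum collision costs time but never corrupts the rebuilt file.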
All OK. I wrote the same above, except for point 4, and you swapped A and B...

> Notice that:
> a. The server (computer B) does *all* the work.

If you use A as the server, the client does all the work.

> c. Precalculating checksums on the client is useless
> d. Precalculating checksums on the server is also useless because the
> storage would be more (remember, checksum for bytes 0-1023, then for 1-1024,
> 2-1025, etc). It's faster to calculate them than to load them off disk.

Precalculating the _block_ checksums is _not_ useless. These checksums are
only <1% of the size of the original file (depending on the block size).

> So, the main difference between what you are proposing is 1 versus 2
> requests per file. And rsync definitely only has one.

The main difference is: the client, not the server, does all the work!

> Besides, look at the other posts on this thread. Diff requires less download
> than rsync.

I read them, but I don't understand it. But that is not the problem. IMHO
diff is a kind of hack, and a cached rsync is a nice framework. But this is
only my taste...

Maybe I should read the rsync source code... Done.

OK, with the normal rsync program the client computes the block checksums
and the server searches in the file...

Thanks for your help.

Regards,
Grisu
--
Michael Bramer - a Debian Linux Developer http://www.debsupport.de
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
"Bumblebees really can sting, but they only do so in extreme exceptional
situations. In such situations, NT no longer does anything at all."
from d.a.s.r
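The disagreement over point d is mostly arithmetic. A back-of-the-envelope Python sketch follows; the entry sizes are assumptions for illustration only: 20 bytes per block (a 4-byte weak checksum plus a full 16-byte MD5) for the precalculated block list, and 8 bytes per entry for a table with one entry per byte offset, which is one way to arrive at the "8 times larger" figure quoted at the top. The 10 MiB file size is hypothetical.

```python
import math


def block_table_size(file_size, block_size, entry_bytes=20):
    """One (weak, strong) entry per block, as in a precalculated block checksum list."""
    return math.ceil(file_size / block_size) * entry_bytes


def offset_table_size(file_size, block_size, entry_bytes=8):
    """One entry per possible window start (bytes 0-1023, 1-1024, 2-1025, ...)."""
    return max(file_size - block_size + 1, 0) * entry_bytes


if __name__ == "__main__":
    size = 10 * 2**20  # hypothetical 10 MiB file
    for bs in (512, 1024, 2048, 4096):
        per_block = block_table_size(size, bs) / size
        per_offset = offset_table_size(size, bs) / size
        print(f"block {bs:5d}: per-block list {per_block:6.2%}, "
              f"per-offset table {per_offset:5.2f}x")
```

Under these assumed sizes the per-offset table is roughly 8x the file whatever the block size, while the per-block list shrinks as 20 / block_size and stays under 1% only for blocks larger than 2000 bytes (1024-byte blocks give about 1.95%, 4096-byte blocks about 0.49%).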
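A rough Python sketch of the role reversal argued for here (a "cached rsync"), under stated assumptions: the server computes the block checksum list of the new file once and caches it next to the file; the client downloads that list, runs the rolling search over its own old copy, and then fetches only the blocks it could not find. That costs two requests per file, the 1-versus-2 difference Martijn points out. The function names and the `fetch` callback (standing in for something like an HTTP range request) are hypothetical, not part of rsync, and the final short block is simply always fetched to keep the sketch small.

```python
import hashlib

M = 1 << 16  # modulus for each 16-bit half of the weak checksum


def weak(window):
    """Combined 32-bit weak checksum of a window, computed from scratch."""
    n = len(window)
    a = sum(window) % M
    b = sum((n - i) * x for i, x in enumerate(window)) % M
    return a + (b << 16)


def server_index(new, n):
    """Server: computed once per file and cached; one (weak, MD5) pair per block."""
    return [(weak(new[i:i + n]), hashlib.md5(new[i:i + n]).digest())
            for i in range(0, len(new), n)]


def client_search(old, index, n):
    """Client: roll over its OLD copy; map each server block it already has to an offset."""
    wanted = {}
    for idx, (w, s) in enumerate(index):
        wanted.setdefault(w, []).append((idx, s))
    found = {}
    if len(old) < n:
        return found
    a = sum(old[:n]) % M
    b = sum((n - i) * x for i, x in enumerate(old[:n])) % M
    for k in range(len(old) - n + 1):
        if k > 0:  # roll: drop old[k-1], add old[k+n-1]
            a = (a - old[k - 1] + old[k + n - 1]) % M
            b = (b - n * old[k - 1] + a) % M
        candidates = wanted.get(a + (b << 16))
        if candidates:  # weak match: confirm with MD5
            strong = hashlib.md5(old[k:k + n]).digest()
            for idx, s in candidates:
                if s == strong and idx not in found:
                    found[idx] = k
    return found


def assemble(old, index, n, found, fetch):
    """Client: copy known blocks from the old file, fetch the rest from the server."""
    out = bytearray()
    for idx in range(len(index)):
        if idx in found:
            out += old[found[idx]:found[idx] + n]
        else:
            out += fetch(idx)  # hypothetical second request for a missing block
    return bytes(out)


if __name__ == "__main__":
    old = b"The quick brown fox jumps over the lazy dog. " * 4
    new = old[:50] + b"INSERTED" + old[50:]
    n = 16
    index = server_index(new, n)
    found = client_search(old, index, n)
    fetched = []

    def fetch(idx):
        fetched.append(idx)
        return new[idx * n:(idx + 1) * n]

    assert assemble(old, index, n, found, fetch) == new
    print(f"fetched {len(fetched)} of {len(index)} blocks")
```

Here the server only serves a static, precomputed list plus plain block ranges, and all of the per-byte searching happens on the client.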