Tom Rothamel is working on a project called debdiff that works toward the same goal. Please read his announcement thread, which is archived at http://www.debian.org/Lists-Archives/debian-devel-0002/msg00391.htm.
I like the idea of rsync modules, but the concept your project misses is that in a compressed file, such as a .deb, even a small addition or removal near the beginning changes the compressed bytes after it, which ruins rsync's speed bonus because it then has to send nearly everything. Take a look at Tom's code; I think you'll find it interesting.

Andrea Mennucc1 ([EMAIL PROTECTED]) wrote:
>
> hi everybody
>
> I have implemented a good idea for reducing download stress for
> everybody who is mirroring a lot of data using rsync, like the people
> who are mirroring Debian GNU/Linux: currently, many Debian "leaf
> mirrors" use rsync to mirror from the main .debian.org hosts.
>
> rsync contains a wonderful algorithm to speed up downloads when
> mirroring files which have only minor differences; the only problem is
> that this algorithm is ALMOST NEVER used when mirroring a Debian
> repository. Indeed, whenever a new version of a package is entered in
> the Debian repository, the package has a different name, so rsync just
> does a full download. In summary, rsync currently gives some speedup
> only when it downloads Packages.gz files, or when it skips an already
> existing package.
>
> Well, I have just implemented a simple way to use the algorithm even
> when downloading the .debs.
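For reference, rsync's speedup comes from a rolling weak checksum: the side holding the old file sends checksums of its fixed-size blocks, and the side holding the new file slides a window over it one byte at a time, so a block can be recognized at any offset. Below is a minimal Python sketch of that matching step. The block size, the use of MD5 as the strong check, and all function names are illustrative assumptions, not rsync's actual code. It counts how many bytes of a new file could be copied from an old one.

```python
import hashlib

BLOCK = 64       # illustrative block size; real rsync chooses its own
MOD = 1 << 16    # weak checksum modulus, as in the rsync technical report


def weak_checksum(block):
    """Weak checksum (a, b) of a block, computed from scratch."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def matched_bytes(old, new, n=BLOCK):
    """Count bytes of `new` that can be copied from whole blocks of `old`."""
    # Side holding `old`: map weak checksum -> strong digests of its blocks.
    table = {}
    for off in range(0, len(old) - n + 1, n):
        blk = old[off:off + n]
        table.setdefault(weak_checksum(blk), set()).add(hashlib.md5(blk).digest())

    if len(new) < n:
        return 0
    # Side holding `new`: scan it with a rolling window.
    matched, i = 0, 0
    a, b = weak_checksum(new[:n])
    while i + n <= len(new):
        blk = new[i:i + n]
        if (a, b) in table and hashlib.md5(blk).digest() in table[(a, b)]:
            matched += n            # whole block found: jump past it
            i += n
            if i + n <= len(new):
                a, b = weak_checksum(new[i:i + n])
        else:
            if i + n < len(new):    # no match: roll the window one byte
                a, b = roll(a, b, new[i], new[i + n], n)
            i += 1
    return matched


old = b"".join(hashlib.sha256(str(k).encode()).digest() for k in range(200))
new = b"XYZ" + old  # small insertion at the very beginning
print(matched_bytes(old, new), "of", len(new), "bytes reusable")
# prints: 6400 of 6403 bytes reusable
```

On uncompressed data, an insertion at the very beginning therefore costs only the inserted bytes. Compressed data such as a .deb behaves differently: the compressed bytes after an edit typically differ from that point on, so few blocks match, which is what the proposed "gzip" and "deb" modules would address by decompressing before comparing.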
>
> Here is a simple example. Suppose the current situation is
>   $REMOTE::/pub/debian/dist/bin/dpkg_2.deb
> whereas locally we have
>   /debian/dist/bin/dpkg_1.deb
>
> When rsync looks for a local version of
>   /debian/dist/bin/dpkg_2.deb
> and there is none, rsync does
>   ls -t /debian/dist/bin/dpkg_*
> and takes the most recent file it finds.
>
> This way, rsync will use the file /debian/dist/bin/dpkg_1.deb to try
> to speed up the download of $REMOTE::/pub/debian/dist/bin/dpkg_2.deb
> (using its fabulous algorithm).
>
> BIG PRO: my new "rsync" is totally compatible with the old one.
>
> Conclusion: this idea would make all Debian mirror people happier
> (especially if they mirror "unstable"; consider that, often, when a new
> version of a package is released, only small changes are made...
> sometimes only the .postinst, or such, is really changed; this may,
> though, be masked by the compression, alas: but see TODO).
>
> I attach two files: the first is a diff showing where, in the
> "rsync 2.4.1" source code tree, I have made modifications; the second
> is a .tgz of all the new and modified files you need to build the new
> rsync. To build, first download the source code (see
> rsync.samba.org/rsync/download.html), then unpack the file
> rsync.diffsrc.tgz in the source tree, and build.
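The lookup in the example above amounts to picking the newest local file that shares the package-name prefix. Here is a minimal Python sketch of that rule, assuming Debian's `package_version...` file naming (the package name is everything before the first underscore) and using modification time as `ls -t` does. The function name is hypothetical, and this is not the C code in the patch.

```python
import glob
import os


def find_previous_version(local_path):
    """Hypothetical helper: choose a basis file for `local_path`.

    If the file already exists locally, use it. Otherwise return the most
    recently modified file in the same directory whose name starts with
    "<package>_" (what `ls -t <dir>/<package>_*` would list first), or
    None if there is no such file.
    """
    if os.path.exists(local_path):
        return local_path
    directory, name = os.path.split(local_path)
    package = name.split("_", 1)[0]  # "dpkg_2.deb" -> "dpkg"
    pattern = os.path.join(glob.escape(directory), glob.escape(package) + "_*")
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)  # newest, like ls -t


# With the paths from the example, find_previous_version(
#     "/debian/dist/bin/dpkg_2.deb") returns "/debian/dist/bin/dpkg_1.deb"
# when that is the newest dpkg_* file in the directory.
```

Matching on `<package>_` rather than `<package>` keeps, for example, `dpkg-dev_*.deb` from being chosen as a basis for `dpkg`.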
>
> You may also get the compiled binary directly as
>   ftp://tonelli.sns.it/pub/rsync/rsync
> and all the new code together in
>   ftp://tonelli.sns.it/pub/rsync
>
> TODO: there are some potentially good ideas here:
>
> 1) The idea is to add "modules" to rsync: a "gzip" module, a "deb"
>    module, an "rpm" module...; currently, modules just look for an
>    older local version of the file.
>
>    In a future version, each module would apply to a certain type of
>    file and create another file to pass to rsync, so that this other
>    file may lead to more speedup: e.g., the "gzip" module would unzip
>    files before doing comparisons, and the "deb" module would unzip
>    the data.tar.gz part of a package.
>
>    CONS: this would not be backward compatible, of course.
>
>    The idea is that a module may provide the following calls:
>      find_alternative_version_MOD()
>      receive_file_MOD()
>      send_file_MOD()
>
>    Currently, only find_alternative_version_deb() is implemented.
>
>    If rsync uses only the find_alternative_version_MOD() calls, then
>    it is backward compatible with the usual version (in a sense, it is
>    doing what the option --compare-dest already does, only in a
>    smarter way).
>
>    I have not yet implemented any receive_file_MOD() or
>    send_file_MOD(): these would need a change in the protocol, and I
>    hope that the rsync authors will give permission.
>
> 1b) My idea (not sure) is that rsync may work if provided with named
>    pipes instead of files: indeed, according to the technical report,
>    it needs to read the local and remote files only once, and then it
>    writes the local file without ever seeking backwards; then the
>    modules above would not need to use disk space and create
>    temporary files.
>
> 2) For faster apt-get downloading, it may be possible to do the same
>    trick WHEN UPGRADING INSTALLED PACKAGES!
>    Here is the idea: "apt-get creates a local version of the package
>    (using dpkg-repack) and does the rsync to get the remote version".
>
> --
> Andrea C. Mennucci, Scuola Normale Superiore, Pisa, Italy

--
(jacob kuntz)
[EMAIL PROTECTED]
[EMAIL PROTECTED],underworld}.net
(megabite systems)
"think free speech, not free beer."