Hey rsync people,

Here's a really radical idea and a possible future direction for the rsync project to explore.
It occurs to me that tar and rsync are closely related in their purposes. "tar -c (blah) | tar -x" can be used to copy files; rsync's setup with a sender process and a receiver process is strikingly similar. The only major conceptual difference is that the rsync protocol uses two-way communication to transmit only what has changed, while tar always transmits a complete snapshot of a collection of files.

Since both tar and rsync read and write filesystems in great detail, they have many analogous sections of source code. For example, both programs set permissions on received/extracted files in two passes: first by supplying a mode to "open" and then with an explicit "chmod". Many options correspond, and not just the obvious "preserve-this, preserve-that" ones: the sending-end "--chmod" option that can be added to rsync with a distributed patch is analogous to tar's "--mode" option.

So I am led to ask: is there a practical way to merge tar and rsync into one program whose focus is capturing and recreating collections of files? This program could be invoked in many different ways to copy between archives and files, both local and remote.

But something tells me that it would be a pain to get this program to communicate in two "modes", two-way protocol and complete snapshot. I am inclined to use a concept that has already appeared plenty in pluggable multimedia systems: multiple "sources" and "sinks". Let's standardize on a single two-way protocol based on the rsync one. Then there can be a filesystem source, an archive source, a filesystem sink, and an archive sink. The user can then run an arbitrary source and an arbitrary sink, possibly on different machines, and they communicate through a pipe using the two-way protocol.

The difference is that, when the source says "do you already have a file at path X that has the same checksum as mine?" the archive sink will say "umm...give me the whole file, please" while the filesystem sink might say "yes, I do".
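
To make the source/sink exchange concrete, here is a rough Python sketch of that one question-and-answer step. Everything in it is hypothetical: the class names, the has/store methods, and the in-memory dicts standing in for real filesystems and archives are made up for illustration. It also works at whole-file granularity with a SHA-256 digest, which is a simplification and not how the actual rsync protocol compares or transfers files.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Whole-file digest; an assumption of this sketch, not rsync's scheme."""
    return hashlib.sha256(data).hexdigest()


class DictSource:
    """Hypothetical source: a dict stands in for a filesystem or archive."""

    def __init__(self, files):
        self.files = files

    def entries(self):
        # Announce every path together with the checksum of its contents.
        for path in sorted(self.files):
            yield path, checksum(self.files[path])

    def read(self, path):
        return self.files[path]


class FilesystemSink:
    """Hypothetical filesystem sink: can look at what it already holds."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def has(self, path, digest):
        # "Yes, I do" only when the path exists and the contents match.
        data = self.files.get(path)
        return data is not None and checksum(data) == digest

    def store(self, path, data):
        self.files[path] = data


class StreamArchiveSink:
    """Hypothetical archive sink writing a fresh stream: nothing to consult."""

    def __init__(self):
        self.records = []

    def has(self, path, digest):
        # "Give me the whole file, please", every time.
        return False

    def store(self, path, data):
        self.records.append((path, data))


def transfer(source, sink):
    """Run the exchange; return the paths whose data had to be sent."""
    sent = []
    for path, digest in source.entries():
        if not sink.has(path, digest):
            sink.store(path, source.read(path))
            sent.append(path)
    return sent
```

With a source holding a.txt and b.txt, a FilesystemSink that already has an identical a.txt only receives b.txt, while a StreamArchiveSink receives both. The transfer loop is the same in either case; only the sink's answer differs.
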
Actually, an archive sink whose options dictate that it send the archive to _standard_output_ will always behave in this fashion, but an archive sink that has an old archive to consult might be able to optimize away much of the transmission. One could even synchronize changes between two archive files on different machines, possibly translating permissions or other attributes in the process.

Of course, there could be different sources and sinks for different archive formats and maybe even for exotic filesystems that don't use the POSIX interface. These could even be packaged separately in shared libraries and loaded by an rsync core that carries out the standard protocol.

There's one obvious drawback to this approach. It will probably be noticeably slower than traditional tar at simple archiving operations since the data has to pass through one more pipe. Shared memory or even running both source and sink in the same process are likely to help for local transfers.

-- 
Matt McCutchen, ``hashproduct''
[EMAIL PROTECTED] -- http://mysite.verizon.net/hashproduct/