Thu, 2 Nov 2000 17:56:56, Bennett Todd wrote:

> 2000-11-02-15:17:27 Andy Small:
> > I searched the archive of last 3 months of this list for a FAQ
> > posting, but I could not find one.
>
> I haven't seen such a document, but this mailing list seems to work
> pretty well, and the repetition rate I've seen hasn't been enough to
> drive me to FAQ.

Thanks.  I guess I did enough due diligence in looking for a FAQ before I posted to
the list.  :-)

> > Where does rsync store the filelist that it builds?
>
> I wouldn't call this one an FAQ, not yet anyway. The file list is in
> memory; space required to run rsync grows linearly in the number of
> files in the directory hierarchy to be copied, and the whole thing
> is built in memory before the copying starts. This can be a
> performance hit in some settings. There's been discussion about
> doing away with this, letting rsync work through something like a
> depth-first sorted-order traversal or some such, both ends could do
> that in synch, with memory requirements often down to O(logN) from
> O(N), but as far as I know it hasn't progressed beyond
> talk; presumably, nobody with the expertise and tuits to try to
> tackle the coding is severely inconvenienced by the performance
> consequences of the current implementation.
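To make the memory trade-off above concrete, here is a rough Python sketch (not
rsync's actual code) that contrasts building the whole list up front with a lazy,
sorted, depth-first walk.  The function names are made up for illustration.  The
lazy version holds only the listings of the directories on the current descent
path, which is small for typical trees but is not guaranteed to be O(logN).

```python
import os
from typing import Iterator


def build_full_list(root: str) -> list[str]:
    """Collect every file path under root before returning: O(N) memory."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the descent order deterministic
        for name in sorted(filenames):
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def walk_sorted(root: str, rel: str = "") -> Iterator[str]:
    """Yield relative file paths one at a time in sorted depth-first order.

    Two ends walking identical trees this way would visit files in the same
    order without either side storing the complete list.
    """
    here = os.path.join(root, rel) if rel else root
    with os.scandir(here) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        child = os.path.join(rel, entry.name) if rel else entry.name
        if entry.is_dir(follow_symlinks=False):
            yield from walk_sorted(root, child)  # descend before moving on
        else:
            yield child
```

A caller would consume `walk_sorted(root)` in a `for` loop and act on each path as
it arrives, instead of waiting for the full list.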

You are describing RDIST.  The problem with RDIST is that it copes badly with
(a) high latency and/or (b) thousands of files to synch.  Say your round-trip time
is 1000 ms (not unusual for international links) and you have 30,000 files to
synch.  At roughly one round trip per file, that is 30,000 seconds: RDIST takes
more than 8 hours (!!) to work.
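
The 8-hour figure is simple arithmetic.  A quick Python check, assuming one
serialized round trip per file (my simplification; the function name and the
`trips_per_file` parameter are made up for illustration):

```python
def serialized_sync_seconds(files: int, round_trip_ms: float,
                            trips_per_file: int = 1) -> float:
    """Time spent purely waiting on the network when every file costs
    `trips_per_file` round trips and nothing is pipelined."""
    return files * trips_per_file * round_trip_ms / 1000.0


seconds = serialized_sync_seconds(30_000, 1000)
hours = seconds / 3600
print(f"{seconds:.0f} s = {hours:.1f} h")  # 30000 s = 8.3 h
```

That is latency alone, before a single byte of file data is transferred.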

I like RSYNC precisely because it does *not* have both ends working through the
tree in lockstep.

> > When you fork off rsync processes in rapid succession, does
> > it have to build the list for each process?
>
> Yup.
>
> > Say you want to sync from ONE source server to FIVE destination
> > servers, and you have PERL doing the process management portion
> > (i.e. forking, waiting, etc...).
>
> There will be 5 complete scans of the whole src tree done, building
> 5 in-memory data structures representing the entire tree, and if the
> first one started hasn't finished before the last one gets far
> enough along to start the actual copying, all 5 will be in VM at
> once.
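
For reference, the fork-and-wait arrangement being discussed looks roughly like
this.  It is sketched in Python rather than PERL, the host names are placeholders,
and `-a` is just one plausible choice of rsync options.  Each child is an
independent rsync that scans the source tree for itself, which is why five scans
happen.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical destinations; substitute real hosts and paths.
DESTINATIONS = [f"dest{i}.example.com:/data/" for i in range(1, 6)]


def build_commands(src: str, dests: Sequence[str]) -> list[list[str]]:
    """One rsync invocation per destination; each scans src on its own."""
    return [["rsync", "-a", src, dest] for dest in dests]


def run_all(commands: Sequence[Sequence[str]],
            spawn: Callable = subprocess.Popen) -> list[int]:
    """Start every command at once, then wait for each and collect exit codes."""
    children = [spawn(cmd) for cmd in commands]  # the "fork" step
    return [child.wait() for child in children]  # the "wait" step


# Usage (not executed here):
#   exit_codes = run_all(build_commands("/data/", DESTINATIONS))
```

Because all children start together, their in-memory file lists can overlap in
time, which is the VM pressure described above.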

If rsync could write that filelist from memory to a configurable place (like
/tmp/rsync.filelist) and reuse it until a configurable time-out (like 10 minutes)
expired, I would be very happy!
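
To show what I mean, here is a minimal Python sketch of the idea.  It is a wish,
not an existing rsync feature, and all names are made up for illustration.  It
scans the tree once, saves the list to a cache file, and reuses the file until it
is older than the time-out.  Handing the cached list to rsync is not shown, and a
real version would need a separate cache file per source tree.

```python
import json
import os
import time


def scan_tree(root: str) -> list[str]:
    """Walk root once and return every file path relative to it, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def cached_filelist(root: str, cache_path: str,
                    max_age_s: float = 600.0) -> list[str]:
    """Return the file list for root, rescanning only if the cache is stale."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_s:
            with open(cache_path, encoding="utf-8") as fh:
                return json.load(fh)
    except (OSError, ValueError):
        pass  # missing or unreadable cache: fall through and rescan
    files = scan_tree(root)
    tmp = f"{cache_path}.{os.getpid()}.tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(files, fh)
    os.replace(tmp, cache_path)  # atomic swap so readers never see a partial file
    return files
```

With something like this, the five processes started within the time-out would
share one scan.  The cost is that files created after the scan are missed until
the cache expires.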

-asmall

