On Fri, Jun 22, 2007 at 03:33:31PM -0400, George Georgalis wrote:
>On Tue, Jun 05, 2007 at 11:11:27AM -0700, Chuck Wolber wrote:
>>On Tue, 5 Jun 2007, Paul Slootman wrote:
>>
>>> > In any case, what's the general consensus behind using the
>>> > --hard-links option on large (100GB and above) images? Does it still
>>> > use a ton of memory? Or has that situation been alleviated?
>>>
>>> The size of the filesystem isn't relevant, the number of hard-linked
>>> files is. It still uses a certain amount of memory for each hard-linked
>>> file, but the situation is a lot better than with earlier rsync
>>> versions. (As always, make sure you use the newest version.)
>>
>>In our case, we store images as hardlinks and would like an easy way to
>>migrate images from one backup server to another. We currently do it with
>>a script that does a combination of rsync'ing and cp -al. Our layout is
>>similar to:
>>
>>image_dir
>>| -- img1
>>| -- img2 (~99% hardlinked to img1)
>>| -- img3 (~99% hardlinked to img2)
>> .
>> .
>> .
>>` -- imgN (~99% hardlinked to img(N-1))
>>
>>
>>Each image in image_dir is hundreds of thousands of files. It seems to me
>>that even a small amount of memory for each hardlinked file is going to
>>clobber even the most stout of machines (at least by 2007 standards) if I
>>tried a wholesale rsync of image_dir using --hard-links. No?
>>
>>If so, then is a "hard link rich environment" an assumption that can be
>>used to make an optimization of some sort?
>
>I had a C program which would scan directory points and on some
>criteria (I forget exactly, size and mtime?), it would decide to
>unlink one file and link the name to the other. I could look for
>it but no guarantees I'll find it, or soon... it was designed for
>identical files with different names.
>
>you could tar transfer then minimize with the program. of course
>everyone on this list would prefer to use rsync, maybe the
>algorithm could be integrated in? :) maybe I can find the code.
>it was written by a very senior individual...
the program is http://www.ka9q.net/code/dupmerge/ and it is about 200
lines of well-commented C. However, there may be a bug that allocates
too much memory (one block per file), so my application runs out of
memory. :\

If you (anyone) can work that out and/or bring the feature into rsync,
that would be great. Please keep the author and myself in the loop!

// George

--
George Georgalis, information systems scientist <IXOYE><
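
PS: for anyone who wants to experiment before digging into the real
source, a rough, untested sketch of the core trick is below; it is NOT
the dupmerge code, and the file names and buffer size in it are just
placeholders. Given two paths on the same filesystem, it compares them
with a small fixed buffer and, if they are byte-for-byte identical,
unlinks the second and re-creates it as a hard link to the first, so
memory use stays constant no matter how big or how many the files are:

/*
 * linkdup.c -- illustrative sketch only, not dupmerge itself.
 * usage: linkdup keep-file dup-file
 * If the two files are on the same filesystem, not already the same
 * inode, and byte-for-byte identical, dup-file is unlinked and
 * re-created as a hard link to keep-file.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

static int same_contents(const char *a, const char *b)
{
    char bufa[65536], bufb[65536];      /* fixed buffers, constant memory */
    FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb");
    int same = -1;

    if (!fa || !fb)
        goto out;
    for (same = 1;;) {
        size_t na = fread(bufa, 1, sizeof bufa, fa);
        size_t nb = fread(bufb, 1, sizeof bufb, fb);
        if (na != nb || memcmp(bufa, bufb, na) != 0) {
            same = 0;
            break;
        }
        if (na == 0)                    /* both hit EOF together */
            break;
    }
out:
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}

int main(int argc, char **argv)
{
    struct stat sa, sb;

    if (argc != 3) {
        fprintf(stderr, "usage: %s keep-file dup-file\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &sa) != 0 || stat(argv[2], &sb) != 0) {
        perror("stat");
        return 1;
    }
    if (sa.st_dev != sb.st_dev) {       /* hard links can't cross filesystems */
        fprintf(stderr, "different filesystems, nothing to do\n");
        return 1;
    }
    if (sa.st_ino == sb.st_ino)         /* already the same file */
        return 0;
    if (sa.st_size != sb.st_size || same_contents(argv[1], argv[2]) != 1) {
        fprintf(stderr, "files differ, not merging\n");
        return 1;
    }
    if (unlink(argv[2]) != 0 || link(argv[1], argv[2]) != 0) {
        perror("unlink/link");
        return 1;
    }
    printf("merged %s -> %s\n", argv[2], argv[1]);
    return 0;
}

the real program takes a whole list of files (sorting candidates by
size first, if I remember right, so only equal-sized files ever get
compared) and is much more careful about races and error cases; the
above is only meant to show that per-file memory doesn't have to grow
the way it apparently does in the version on the site.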