On Fri, 2008-01-04 at 16:21 -0500, Boris Toloknov wrote:
> Ming Zhang wrote:
> > On Fri, 2008-01-04 at 15:05 -0500, Boris Toloknov wrote:
> > > Ming Zhang wrote:
> > > > On Fri, 2008-01-04 at 14:12 -0500, Boris Toloknov wrote:
> > > > > Ming Zhang wrote:
> > > > > > On Thu, 2008-01-03 at 20:19 -0500, Boris Toloknov wrote:
> > > > > > > Hi,
> > > > > > > It seems that rsync transfers files whose names were
> > > > > > > changed or which were moved to another directory since
> > > > > > > the previous synchronization. I think the ability not to
> > > > > > > transfer (large) files that are already present on the
> > > > > > > other computer would be very helpful. Right before rsync
> > > > > > > transfers a large file, it could check whether there are
> > > > > > > other files with the same size (and maybe the same mtime)
> > > > > > > on the destination computer. If the destination computer
> > > > > > > has such files, it could be asked to find a file with the
> > > > > > > given MD5. If one is found, there is no need to transfer
> > > > > > > that file. A local copy/rename/move can be performed
> > > > > > > instead.
> > > > > >
> > > > > > Let us say you have N files in one directory and you rename
> > > > > > the directory. So for N files, you need to check all M files
> > > > > > on the destination side to see whether each is the renamed
> > > > > > one. So you do NxM comparisons, and this is not scalable at
> > > > > > all...
> > > > >
> > > > > I think that a hash could be used instead. The destination
> > > > > computer (at least) must have a list of all the files in the
> > > > > destination directory. The key = size + mtime and the value =
> > > > > pointer to the file entry in the list.
> > > > > Actually, for that operation it would be better to have that
> > > > > list and hash on the sending computer.
> > > >
> > > > rsync 3.0 introduces incremental scan to avoid the OOM issue, so
> > > > the hash needs to be optional as well... Also, I think this hash
> > > > can be used to detect hard links at the same time. For normal
> > > > use, it should be OK.
> > >
> > > I agree that with incremental scan the "move/rename" feature can
> > > be optional. Anyway, to minimize memory usage (if necessary), a
> > > sorted list can be used instead of a hash, and the list of all
> > > files could be stored in a temporary file with buffered access to
> > > it. In that case the key = size + mtime, value = offset in the
> > > file with the list.
> >
> > Another issue is that rsync needs to build this list up front before
> > handling file transfers. This can take quite some time on a huge
> > file system (when I say huge, I mean a file system with 20-100
> > million files)...
> >
> > Also, rsync already has some rename detection. Check the command
> > line options, please.
>
> I don't mind having "move/rename" detection as an optional feature
> that is turned off by default. Actually, that list doesn't have to
> contain all the files. Files smaller than some configurable size (for
> example 100KB) don't need to be in the list. So it likely won't take
> much memory or time (for sorting) even on huge systems. Scanning the
> file tree takes some time, though. A 1TB HDD filled up with
> 100,000,000 files has an average file size of about 10KB.
>
> I have 2.6.9 and didn't find any command line option for rename
> detection. I just found that there is a patch, "--detect-renamed".
> But it seems that patch doesn't detect files which were moved to
> another directory. The "News file" for 3.0.0pre7 doesn't have anything
> about rename detection.
I must be remembering the feature because of this patch. Another way is
to use inotify: generate a list of moved files, pass the list to the
receiver side, and handle the list before running rsync.

> Boris

--
Ming Zhang
@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881
--------------------------------------------
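
For illustration only, the size + mtime index discussed in this thread could be sketched roughly as below. This is not how rsync or the "--detect-renamed" patch works. The function names (`build_index`, `find_moved_copy`, `file_md5`) and the `min_size` parameter are hypothetical, and the sketch assumes both trees are reachable as local paths, whereas a real implementation would split the work between sender and receiver.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root, min_size=100 * 1024):
    """Map (size, mtime) -> list of paths under root.

    Files smaller than min_size are left out, as suggested in the
    thread, to keep the index small. Symlinks are skipped.
    """
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            info = os.stat(path)
            if info.st_size >= min_size:
                index[(info.st_size, int(info.st_mtime))].append(path)
    return index


def find_moved_copy(src_path, index):
    """Return a destination path with the same content, or None.

    The MD5 is computed only when at least one destination file shares
    the source file's size and mtime, so most files cost one lookup.
    """
    info = os.stat(src_path)
    candidates = index.get((info.st_size, int(info.st_mtime)), [])
    if not candidates:
        return None
    wanted = file_md5(src_path)
    for candidate in candidates:
        if file_md5(candidate) == wanted:
            return candidate
    return None
```

With an index like this, each of the N source files costs one dictionary lookup instead of a scan over all M destination files, which is the point made against the NxM objection. The index still has to be built up front by walking the whole destination tree, so the scan-time concern raised above still applies.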