Re: Using --fuzzy

2014-11-17 Thread Matthias Schniedermeyer
On 16.11.2014 18:38, Karl O. Pinc wrote:
> On 11/16/2014 03:53:12 PM, Joe wrote:
> > I have a lot of files (and directories) (up to a few hundred at a time)
> > that I get from various sources. Some time after I get them (after they
> > are already backed up), I often have to move them around and normalize
> > their names.
> >
> > When I do this, rsync sees them as unrelated to the copies of these
> > files which are already on the backup destination.
>
> I don't know if it suits your use case but
> you could consider using hardlinks.

It should be noted that using hardlinks has one major caveat:
order.

It only saves a copy when the new hardlink appears in the hierarchy
AFTER the original file.

(This is true for incremental-recursion mode (the default for >=3.0). It might
work differently for <3.0 or with --no-inc-recursive, but I haven't tried.)

Otherwise rsync will copy the new file and later hard link the
old file to the new file, not the other way around.

So I personally use a directory '.z' in the root of a hierarchy where
each file has an additional hardlink, so I can move files around in the
hierarchy however I want.
That way rsync sees the '.z' directory first and acts accordingly.


Such a directory can be created after the fact.
Make a directory that is LAST in sort order. Assuming plain ASCII
filenames:
mkdir zzz
Then link all files into that directory and rsync (don't forget to add
-H).
Then rename it to be first in sort order (on both sides!):
mv zzz .z

And after you have changed your procedures to create
the additional hardlink, you are free to move files around without rsync
having to copy them each time they are moved.
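
The "link all files into that directory" step could be sketched as below. This is an illustration under assumptions, not the exact procedure used in this post: `link_tree` is a hypothetical helper, naming each link after its inode number is an arbitrary choice to avoid name collisions, `stat -c %i` assumes GNU coreutils, and file names containing newlines are not handled. The rsync and rename steps appear only as comments because they depend on your own paths.

```shell
#!/bin/sh
# link_tree ROOT FARM: give every regular file under ROOT an extra
# hard link inside ROOT/FARM (hypothetical helper, for illustration).
link_tree() {
    root=$1
    farm=$2
    mkdir -p "$root/$farm"
    # Prune the farm directory itself so existing links are not linked again.
    find "$root" -path "$root/$farm" -prune -o -type f -print |
    while IFS= read -r f; do
        # The inode number is unique per file within one filesystem.
        ino=$(stat -c %i "$f")
        [ -e "$root/$farm/$ino" ] || ln "$f" "$root/$farm/$ino"
    done
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/src/incoming"
echo data > "$demo/src/incoming/a.txt"
echo more > "$demo/src/b.txt"
link_tree "$demo/src" zzz

# Afterwards, roughly as described above (paths are placeholders):
#   rsync -aH src/ /backup/dest/src/
#   mv src/zzz src/.z                               # on the source
#   mv /backup/dest/src/zzz /backup/dest/src/.z     # and on the destination
```

Running `link_tree` again after new files arrive only adds the missing links, so it could be part of the regular procedure.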

After deleting files you can use:
find .z -type f -links 1 -delete
to find and delete files that don't have an additional hardlink.
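
Since -delete is irreversible, it may be worth previewing the matches first. A minimal sketch on a throwaway directory (the layout and file names are made up for the demo):

```shell
#!/bin/sh
# Build a tiny example: one file still in use, one orphan left in .z.
work=$(mktemp -d)
mkdir -p "$work/.z" "$work/docs"
echo keep > "$work/docs/keep.txt"
ln "$work/docs/keep.txt" "$work/.z/keep"   # two names, link count 2
echo gone > "$work/.z/orphan"              # only name left, link count 1

# Preview: list what would be removed.
find "$work/.z" -type f -links 1 -print

# Then remove the orphans for real.
find "$work/.z" -type f -links 1 -delete
```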




-- 

Matthias
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Using --fuzzy

2014-11-17 Thread Joe
I'm going to have to digest this for a while. It makes sense, but I have
to work on it a bit before I understand it enough to actually apply it.

This would make a good howto article.

Thanks to both of you.


Re: Using --fuzzy

2014-11-16 Thread Karl O. Pinc
On 11/16/2014 03:53:12 PM, Joe wrote:
> I have a lot of files (and directories) (up to a few hundred at a time)
> that I get from various sources. Some time after I get them (after they
> are already backed up), I often have to move them around and normalize
> their names.
>
> When I do this, rsync sees them as unrelated to the copies of these
> files which are already on the backup destination.

I don't know if it suits your use case but
you could consider using hardlinks.

If, instead of moving the files, you hardlinked them
then rsync with -H would see the files as being the same.

(Hardlinking can only be done within a filesystem.)

Then you'd have to delete the original filenames and
rsync again.

This is only practicable if it's easy to delete
the old filenames, say, if all the new files
arrive in a single directory that can later
be deleted.
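
The link-then-delete sequence might look like the sketch below. The directory names, the file names and the `sync_backup` wrapper are hypothetical; the wrapper skips quietly when rsync is not installed so the demo runs anywhere. Note that "sorted" comes after "incoming" in sort order, which matters for the ordering caveat raised elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical layout: files arrive in src/incoming and get their
# final names under src/sorted.
work=$(mktemp -d)
mkdir -p "$work/src/incoming" "$work/src/sorted" "$work/backup"
echo payload > "$work/src/incoming/IMG_0001.raw"

sync_backup() {
    # -a archive mode, -H preserve hard links,
    # --delete remove names that vanished from the source.
    if command -v rsync >/dev/null 2>&1; then
        rsync -aH --delete "$work/src/" "$work/backup/"
    fi
}

sync_backup    # 1. first backup: the file is copied once

# 2. hard link to the new name instead of using mv
ln "$work/src/incoming/IMG_0001.raw" "$work/src/sorted/holiday-001.raw"

sync_backup    # 3. with -H the new name can be linked on the destination

rm -r "$work/src/incoming"   # 4. drop the old names

sync_backup    # 5. --delete removes the old names from the backup too
```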



Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein


Re: Using --fuzzy

2014-11-16 Thread Joe
Great idea which I will keep in mind for other cases!

In this case, however, the backups are on separate partitions on
external USB drives (I have a notebook), so hard links won't work.

Joe



Re: Using --fuzzy

2014-11-16 Thread Karl O. Pinc
The backups can be on separate partitions.  What must be on one partition is 
the file and its hard link.
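
A quick way to check that two source directories share a filesystem before relying on ln is to compare their device numbers. This is only a heuristic sketch: `same_fs` is a hypothetical helper, `stat -c %d` assumes GNU coreutils, and the directory names are made up.

```shell
#!/bin/sh
# same_fs A B: succeed when A and B report the same device number.
same_fs() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Demo: two directories created side by side on the source.
work=$(mktemp -d)
mkdir "$work/incoming" "$work/sorted"

if same_fs "$work/incoming" "$work/sorted"; then
    echo "same filesystem: hard links between these directories should work"
else
    echo "different filesystems: ln would fail with a cross-device error"
fi
```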

Karl k...@meme.com
Free Software: You don't pay back, you pay forward.
-- Robert A. Heinlein