I'm copying files using --link-dest to avoid duplication.  I'm also
using a de-duplicator (rmlint) to further reduce duplication.  For
files that are duplicates, I have rmlint set to use the timestamp of the
oldest file.

This ends up with starting conditions where the source of a copy might
have been:

   [root@archive3 tmp]# ls -lin --time-style=full-iso  /tmp/SRC/{a,b}
   75676805 -rw-r--r--. 1 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/SRC/a
   75687257 -rw-r--r--. 1 0 0 4 2023-07-08 14:18:45.497699620 -0400 /tmp/SRC/b
   [root@archive3 tmp]# 

while the previously copied (and de-duplicated) result is:

   [root@archive3 tmp]# ls -lin --time-style=full-iso  /tmp/DEST0/{a,b}
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/a
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/b
   [root@archive3 tmp]#

Note that files 'a' and 'b' in the result directory DEST0 share an
inode (are hardlinked) while they are separate inodes (with identical
content but different timestamps) in the source directory SRC.
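
For anyone who wants to reproduce these starting conditions, here is a
minimal sketch.  It assumes GNU coreutils, uses a scratch directory
from mktemp instead of my real /tmp/SRC and /tmp/DEST0, and makes up
the 4-byte file content ('foo' plus a newline); cp and ln stand in for
the earlier rsync and rmlint runs.

```shell
#!/bin/sh
# Hypothetical scratch directory; the post itself uses /tmp/SRC and /tmp/DEST0.
work=$(mktemp -d)
mkdir "$work/SRC" "$work/DEST0"

# Two source files: identical (assumed) content, different mtimes.
printf 'foo\n' > "$work/SRC/a"
printf 'foo\n' > "$work/SRC/b"
touch -d '2023-07-08 12:36:42 -0400' "$work/SRC/a"
touch -d '2023-07-08 14:18:45 -0400' "$work/SRC/b"

# Earlier, de-duplicated copy: one inode, two names, the older timestamp.
cp -p "$work/SRC/a" "$work/DEST0/a"
ln "$work/DEST0/a" "$work/DEST0/b"

# -ef is true when both names refer to the same device and inode.
if [ "$work/DEST0/a" -ef "$work/DEST0/b" ]; then
    echo "DEST0: a and b are hardlinked"
fi
if [ ! "$work/SRC/a" -ef "$work/SRC/b" ]; then
    echo "SRC: a and b are separate inodes"
fi
```

The link count here ends up as 2 rather than the 4 shown above, since
this sketch creates only one de-duplicated copy.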

If I create a new copy using --link-dest as follows, I get my desired
behavior:

   [root@archive3 tmp]# ls -lin --time-style=full-iso  /tmp/{SRC,DEST*}/{a,b,c}
   ls: cannot access /tmp/SRC/c: No such file or directory
   ls: cannot access /tmp/DEST*/c: No such file or directory
   75676810 -rw-r--r--. 2 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/a
   75676810 -rw-r--r--. 2 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/b
   75676805 -rw-r--r--. 1 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/SRC/a
   75687257 -rw-r--r--. 1 0 0 4 2023-07-08 14:18:45.497699620 -0400 /tmp/SRC/b
   [root@archive3 tmp]# rsync --itemize-changes  -rlpgoD --size-only --link-dest=/tmp/DEST0 /tmp/SRC/ /tmp/DEST1
   [root@archive3 tmp]# ls -lin --time-style=full-iso  /tmp/{SRC,DEST*}/{a,b,c}
   ls: cannot access /tmp/SRC/c: No such file or directory
   ls: cannot access /tmp/DEST*/c: No such file or directory
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/a
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/b
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST1/a
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST1/b
   75676805 -rw-r--r--. 1 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/SRC/a
   75687257 -rw-r--r--. 1 0 0 4 2023-07-08 14:18:45.497699620 -0400 /tmp/SRC/b
   [root@archive3 tmp]# 

Note that DEST1/a and DEST1/b share an inode because of --link-dest
and the fact that the source 'a' matches DEST0/a while the source 'b'
matches DEST0/b (comparing with --size-only).

What I don't like from this set of options to rsync is what happens
with 'c' in this case:

   [root@archive3 tmp]# ls -lin --time-style=full-iso  /tmp/{SRC,DEST*}/{a,b,c}
   ls: cannot access /tmp/DEST*/c: No such file or directory
   75676810 -rw-r--r--. 2 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/a
   75676810 -rw-r--r--. 2 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/b
   75676805 -rw-r--r--. 1 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/SRC/a
   75687257 -rw-r--r--. 1 0 0 4 2023-07-08 14:18:45.497699620 -0400 /tmp/SRC/b
   75691117 -rw-r--r--. 1 0 0 4 2023-07-08 10:00:00.000000000 -0400 /tmp/SRC/c
   [root@archive3 tmp]# rsync --itemize-changes  -rlpgoD --size-only --link-dest=/tmp/DEST0 /tmp/SRC/ /tmp/DEST1
   >f+++++++++ c
   [root@archive3 tmp]# ls -lin --time-style=full-iso  /tmp/{SRC,DEST*}/{a,b,c}
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/a
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/b
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST1/a
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST1/b
   75693529 -rw-r--r--. 1 0 0 4 2023-07-08 14:50:08.670091884 -0400 /tmp/DEST1/c
   75676805 -rw-r--r--. 1 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/SRC/a
   75687257 -rw-r--r--. 1 0 0 4 2023-07-08 14:18:45.497699620 -0400 /tmp/SRC/b
   75691117 -rw-r--r--. 1 0 0 4 2023-07-08 10:00:00.000000000 -0400 /tmp/SRC/c
   [root@archive3 tmp]# 

Note that the timestamp of 'c' was not preserved in the copy.  While in
the case of 'a' and 'b' I didn't care which of the two timestamps was
used, I do want the timestamp taken from one of the source files; I
just don't care which.  The copy of 'c' breaks this, as the timestamp of
DEST1/c is the time of the copy, not that of SRC/c.
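
The symptom is easy to check mechanically.  Here is a small sketch,
assuming GNU stat and touch; same_mtime is a helper name I made up, and
plain cp versus cp -p stands in for rsync without and with --times, so
this only illustrates the check, not rsync itself.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when both files have the same mtime
# (compared in whole seconds since the epoch).
same_mtime() {
    [ "$(stat -c %Y "$1")" = "$(stat -c %Y "$2")" ]
}

demo=$(mktemp -d)
printf 'foo\n' > "$demo/c"
touch -d '2023-07-08 10:00:00 -0400' "$demo/c"

cp    "$demo/c" "$demo/c.plain"      # mtime becomes the time of the copy
cp -p "$demo/c" "$demo/c.preserved"  # mtime carried over from the source

same_mtime "$demo/c" "$demo/c.plain"     || echo "c.plain: timestamp not preserved"
same_mtime "$demo/c" "$demo/c.preserved" && echo "c.preserved: timestamp preserved"
```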

The solution should be obvious: add --times (or replace -rlpgoD with
-a).  However, this breaks the --link-dest behavior:

   [root@archive3 tmp]# ls -lin --time-style=full-iso  /tmp/{SRC,DEST*}/{a,b,c}
   ls: cannot access /tmp/SRC/c: No such file or directory
   ls: cannot access /tmp/DEST*/c: No such file or directory
   75676810 -rw-r--r--. 2 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/a
   75676810 -rw-r--r--. 2 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/b
   75676805 -rw-r--r--. 1 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/SRC/a
   75687257 -rw-r--r--. 1 0 0 4 2023-07-08 14:18:45.497699620 -0400 /tmp/SRC/b
   [root@archive3 tmp]# rsync --itemize-changes  -rlpgoD --size-only --times --link-dest=/tmp/DEST0 /tmp/SRC/ /tmp/DEST1
   .d..t...... ./
   cf..t...... b
   [root@archive3 tmp]# ls -lin --time-style=full-iso  /tmp/{SRC,DEST*}/{a,b,c}
   ls: cannot access /tmp/SRC/c: No such file or directory
   ls: cannot access /tmp/DEST*/c: No such file or directory
   75676810 -rw-r--r--. 3 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/a
   75676810 -rw-r--r--. 3 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/b
   75676810 -rw-r--r--. 3 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST1/a
   75689925 -rw-r--r--. 1 0 0 4 2023-07-08 14:18:45.497699620 -0400 /tmp/DEST1/b
   75676805 -rw-r--r--. 1 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/SRC/a
   75687257 -rw-r--r--. 1 0 0 4 2023-07-08 14:18:45.497699620 -0400 /tmp/SRC/b
   [root@archive3 tmp]# 

Because rsync is preserving the timestamp of SRC/b in DEST1/b, DEST1/b
no longer shares the inode of DEST1/a and DEST0/{a,b}.  That's
reasonable for these options, but not what I want.

I want to use one of the source timestamps.  I may not care which, but
it should be one of them.  I don't want the timestamp on a copied file
to be the time of the copy.

I've come up with a solution which works but which feels like cheating
or abusing --modify-window.

   [root@archive3 tmp]# ls -lin --time-style=full-iso  /tmp/{SRC,DEST*}/{a,b,c}
   ls: cannot access /tmp/DEST*/c: No such file or directory
   75676810 -rw-r--r--. 2 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/a
   75676810 -rw-r--r--. 2 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/b
   75676805 -rw-r--r--. 1 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/SRC/a
   75687257 -rw-r--r--. 1 0 0 4 2023-07-08 14:18:45.497699620 -0400 /tmp/SRC/b
   75690561 -rw-r--r--. 1 0 0 0 2023-07-08 10:00:00.000000000 -0400 /tmp/SRC/c
   [root@archive3 tmp]# rsync --itemize-changes  -rlpgoD --times --modify-window=99999 --size-only --link-dest=/tmp/DEST0 /tmp/SRC/ /tmp/DEST1
   >f+++++++++ c
   [root@archive3 tmp]# ls -lin --time-style=full-iso  /tmp/{SRC,DEST*}/{a,b,c}
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/a
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST0/b
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST1/a
   75676810 -rw-r--r--. 4 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/DEST1/b
   75689931 -rw-r--r--. 1 0 0 0 2023-07-08 10:00:00.000000000 -0400 /tmp/DEST1/c
   75676805 -rw-r--r--. 1 0 0 4 2023-07-08 12:36:42.955935831 -0400 /tmp/SRC/a
   75687257 -rw-r--r--. 1 0 0 4 2023-07-08 14:18:45.497699620 -0400 /tmp/SRC/b
   75690561 -rw-r--r--. 1 0 0 0 2023-07-08 10:00:00.000000000 -0400 /tmp/SRC/c
   [root@archive3 tmp]# 

In this case, all is well.  DEST*/{a,b} share an inode and DEST1/c has
the timestamp from SRC/c.  The use of --times caused the timestamp of
DEST1/c to be correct while --modify-window=99999 appears to have let
DEST1/b share an inode with DEST0/{a,b} and DEST1/a since the timestamp
of SRC/b is within 99999 seconds of the timestamp on DEST0/b.
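
The arithmetic behind "within 99999 seconds" can be checked with a
short sketch.  It assumes GNU date and drops the fractional seconds
from the listings above; the variable names are mine.

```shell
#!/bin/sh
# mtimes of SRC/b and DEST0/b from the listings above, whole seconds only.
src_b=$(date -d '2023-07-08 14:18:45 -0400' +%s)
dest0_b=$(date -d '2023-07-08 12:36:42 -0400' +%s)

delta=$((src_b - dest0_b))
echo "difference: ${delta}s"

# 99999 seconds is a little under 28 hours.
if [ "$delta" -le 99999 ]; then
    echo "within --modify-window=99999"
fi
```

This prints a difference of 6123s (1 hour, 42 minutes, 3 seconds), so
any window of at least that many seconds would have had the same effect
for this particular pair.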

It works, and it makes sense.  Is this really the proper way to do
this?  It feels like cheating because I'm using --modify-window to
force a particular result (copy vs. link of 'b') as opposed to
choosing matches.  I understand that "choosing matches" is in this case
what determines the result (copy vs. link), but ... it still feels like
a misuse.
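
To make my mental model explicit, here is a rough sketch of the rule I
think I'm seeing.  This is only a model of the three runs above, not
rsync's actual logic; decide and its arguments are names I invented,
and it assumes GNU stat.

```shell
#!/bin/sh
# Hypothetical model of the observed behaviour; NOT rsync's real code.
# Usage: decide SRC_FILE REF_FILE PRESERVE_TIMES(0|1) WINDOW_SECONDS
# Prints "link" or "copy".
decide() {
    local src="$1" ref="$2" preserve_times="$3" window="$4"
    local s_size r_size s_mtime r_mtime delta

    # Nothing in the --link-dest directory to link against (the 'c' case).
    if [ ! -e "$ref" ]; then echo copy; return; fi

    s_size=$(stat -c %s "$src");  r_size=$(stat -c %s "$ref")
    s_mtime=$(stat -c %Y "$src"); r_mtime=$(stat -c %Y "$ref")

    # --size-only: a size mismatch means the files do not match.
    if [ "$s_size" -ne "$r_size" ]; then echo copy; return; fi

    # Without --times, the mtime played no part in the result I saw.
    if [ "$preserve_times" -eq 0 ]; then echo link; return; fi

    # With --times, the mtimes must agree to within the window.
    delta=$((s_mtime - r_mtime))
    if [ "$delta" -lt 0 ]; then delta=$((-delta)); fi
    if [ "$delta" -le "$window" ]; then echo link; else echo copy; fi
}
```

Under this model, "decide SRC/b DEST0/b 1 0" gives copy and
"decide SRC/b DEST0/b 1 99999" gives link, which is what I observed.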

Is there a better approach?  Am I nuts?

Thanks.



