Patch for rsync --link-dest won't link even if existing file is out of date (fwd)

2015-04-06 Thread Robert Bell

Folks,
We faced a similar situation to that which Ken described - we recycle
backup directories, for good reason.

There is a patch to solve the problem.

Our systems administrator provided the following description of the
patches we use:


1. rsync_link_dest improvement

by Bryant Hansen

Normally, existing files in the destination are never hard-linked from the
--link-dest directory; changed files are transferred over the wire instead.
This patch changes that behaviour to link from --link-dest, which is a major
performance enhancement in our environment.

2. Warnings for --max-size ignored files are displayed if -w/--warning
is specified

by Rowan McKenzie (CSIRO SC)

Warnings for --max-size ignored files are displayed if -w/--warning is
specified. Normally, --max-size causes files to be silently ignored!

3. Only output '=' notifications when -v/--verbose specified

by Rowan McKenzie (CSIRO SC)

Only output '#' notifications when -v/--verbose is specified (it's a patch
to the rsync_link_dest_from_bryant patch). This reduces clutter by
suppressing a large class of false positives.



I hope you can find these.


(All we need now for rsync perfection for our backups is a solution to
the problem of metadata changes being propagated across all directories
for hard-linked files - we would rather new copies be made than lose the
old metadata.)


Regards

Rob.

Dr Robert C. Bell
HPC National Partnerships | Scientific Computing
Information Management and Technology
CSIRO
T +61 3 9669 8102 Alt +61 3 8601 3810 Mob +61 428 108 333
robert.b...@csiro.au | www.csiro.au | wiki.csiro.au/display/ASC/
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia


-- Forwarded message --
Date: Mon, 6 Apr 2015 01:51:21 -0400
From: Ken Chase rsync-list-m...@sizone.org
To: rsync@lists.samba.org
Subject: rsync --link-dest won't link even if existing file is out of date

Feature request: allow --link-dest dir to be linked to even if file exists
in target.

This statement from the man page is adhered to too strongly IMHO:

This option works best when copying into an empty destination hierarchy, as
rsync treats existing files as definitive (so it never looks in the link-dest
dirs when a destination file already exists).

I was surprised by this behaviour, as the general aim with rsync is to be
efficient and save space.

When the file is out of date but exists in the --l-d target, it would be great
if it could be removed and linked. If an option was supplied to request this
behaviour, I'd actually throw some money at making it happen.  (And a further
option to retain a copy if inode permissions/ownership would otherwise be
changed.)

Reasoning:

I back up many servers with --link-dest that have filesystems of 10+M files on
them.  I do not delete old backups - deleting alone takes 60 min or more per
tree - just so rsync can recreate them all in an empty target dir when only 1%
of files change per day (a full recreate takes 3-5 hrs per backup!).

Instead, I cycle them in with mv $olddate $today and then rsync --del --link-dest
over them - that takes 30-60 min, depending. (Yes, there is some risk of
permission drift; I'm mostly interested in contents, though.)  The problem is
that if a file exists AT ALL, even out of date, a new copy is written over top
of it, per the man page decree above.

Thus much more disk space is used. Running this scheme - moving old backups
into place to be written over - accumulates many copies of the exact same file
over time.  Running pax -rpl over the copies before rsyncing to them works (and
saves much space!), but takes a very long time, as it traverses and compares two
large backup trees while thrashing the same device (on the order of 3-5x the
rsync's time: 3-5 hrs for pax - hardlink(1) is far worse; I suspect some
non-linear algorithm therein, as it ran 3-5x slower than pax again).
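The link-duplication pre-pass described above can be sketched with GNU cp -al (my substitution - the poster ran pax and hardlink(1), not cp, but the linking step is the same): the new tree shares every inode with the old one, so a subsequent rsync only breaks links for files that actually changed.

```shell
#!/bin/sh
# Sketch: replicate a backup tree as hard links before rsyncing over it,
# so unchanged files keep sharing one inode. cp -al stands in here for
# the poster's pax run; both link rather than copy.
set -e
work=$(mktemp -d)
mkdir -p "$work/old/dir"
echo "payload" > "$work/old/dir/file.txt"

cp -al "$work/old" "$work/new"   # link, don't copy

# Both paths report the same inode number.
ls -i "$work/old/dir/file.txt" "$work/new/dir/file.txt"
```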

I have detailed an example of this scenario at

http://unix.stackexchange.com/questions/193308/rsyncs-link-dest-option-does-not-link-identical-files-if-an-old-file-exists

which also indicates --delete-before and --whole-file do not help at all.

/kc
--
Ken Chase - k...@heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.

--

Re: Downloading specific files with rsync while keeping the original directory structure.

2015-04-06 Thread Hongyi Zhao
On Mon, 06 Apr 2015 00:34:37 -0400, Kevin Korb wrote:

 See --relative though it will need a little bit of massaging to avoid
 the debian dir.

Good, thanks a lot.  The following command does the trick:

rsync -vR -P rsync://ftp.cn.debian.org/debian/./dists/Debian7.8/main/binary-i386/Packages.gz .

Regards
-- 
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync --link-dest won't link even if existing file is out of date

2015-04-06 Thread Clint Olsen
Not to mention the fact that ZFS requires considerable hardware resources
(CPU & memory) to perform well. It also requires you to learn a whole new
terminology to wrap your head around it.

It's certainly not a trivial swap to say the least...

Thanks,

-Clint

On Mon, Apr 6, 2015 at 9:12 AM, Ken Chase rsync-list-m...@sizone.org
wrote:

 This has been a consideration. But it pains me that a tiny change/addition
 to the rsync option set would save much time and space for other legit use
 cases.

 We know rsync very well, we don't know ZFS very well (licensing kept the
 tech out of our linux-centric operations). We've been using it but we're
 not experts yet.

 Thanks for the suggestion.

 /kc

 On Mon, Apr 06, 2015 at 12:07:05PM -0400, Kevin Korb said:
   [quoted text trimmed]

Re: rsync --link-dest won't link even if existing file is out of date

2015-04-06 Thread Kevin Korb

ZFS does have big RAM requirements - 8GB of RAM is pretty much the
minimum.  As for CPU, anything new enough to sit on a motherboard with
8GB of RAM should be fine.

On 04/06/2015 12:25 PM, Clint Olsen wrote:
 [quoted text trimmed]

- -- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
Kevin Korb  Phone:(407) 252-6853
Systems Administrator   Internet:
FutureQuest, Inc.   ke...@futurequest.net  (work)
Orlando, Floridak...@sanitarium.net (personal)
Web page:   http://www.sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~

Re: rsync --link-dest won't link even if existing file is out of date

2015-04-06 Thread Kevin Korb

Since you are in an environment with millions of files I highly
recommend that you move to ZFS storage and use ZFS's subvolume
snapshots instead of --link-dest.  It is much more space efficient,
rsync run time efficient, and the old backups can be deleted in
seconds.  Rsync doesn't have to understand anything about ZFS.  You
just rsync to the same directory every time and have ZFS do a snapshot
on that directory between runs.

On 04/06/2015 01:51 AM, Ken Chase wrote:
 [quoted text trimmed]



Re: rsync --link-dest won't link even if existing file is out of date

2015-04-06 Thread Ken Chase
This has been a consideration. But it pains me that a tiny change/addition
to the rsync option set would save much time and space for other legit use
cases.

We know rsync very well, we don't know ZFS very well (licensing kept the
tech out of our linux-centric operations). We've been using it but we're
not experts yet.

Thanks for the suggestion.

/kc

On Mon, Apr 06, 2015 at 12:07:05PM -0400, Kevin Korb said:
  [quoted text trimmed]



Re: rsync --link-dest won't link even if existing file is out of date

2015-04-06 Thread Kevin Korb

It is actually pretty simple...
Instead of mkdir you run: zfs create [options] zfspath (the dataset is mounted at /path/to/directory)
When the rsync run finishes you do: zfs snapshot zfspath@date
When you want to delete an old backup you do: zfs destroy zfspath@date

To list the datasets (or snapshots): zfs list [-t snapshot]
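Put together, the cycle might look like this (a sketch only: the pool and dataset names are made up for illustration, and these commands require a configured ZFS pool, so don't run them as-is):

```shell
# Assumes a pool named "tank"; all names illustrative.
zfs create tank/backups/host1                 # once, instead of mkdir
rsync -a --del server:/data/ /tank/backups/host1/
zfs snapshot tank/backups/host1@2015-04-06    # freeze this run
zfs list -t snapshot                          # review existing snapshots
zfs destroy tank/backups/host1@2015-03-06     # drop an old backup in seconds
```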

On 04/06/2015 12:12 PM, Ken Chase wrote:
 [quoted text trimmed]
 
