rsync Folks,

The following explanatory text is by me and the patches are by Rowan
McKenzie for use by the Advanced Scientific Computing group at CSIRO.

This patch builds upon the --link-dest patch by Bryant Hansen (Thanks heaps!).

1. The original patch provided an alternate behaviour for rsync when
using the --link-dest option.

When the source and link-dest areas hold identical files but the
destination area holds a different file, standard rsync updates the
destination by copying the file from source to destination; the
patch instead updates the destination by hard-linking from the
link-dest area.
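
The difference can be sketched for a single file as a small shell
function.  This is only an illustration, not code from rsync or from the
patch: the name link_or_copy is hypothetical, and cmp (a content
comparison) stands in for rsync's own check, which by default looks at
size and modification time.

```sh
#!/bin/sh
# Hypothetical illustration of the patched --link-dest behaviour for one file.
# Usage: link_or_copy SRC_FILE LINK_DEST_FILE DEST_FILE
# Assumption: cmp stands in for rsync's own file comparison.
link_or_copy() (
    src=$1 ref=$2 dst=$3
    if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
        echo "unchanged"            # destination is already up to date
    elif [ -f "$ref" ] && cmp -s "$src" "$ref"; then
        # Patched behaviour: replace the stale destination file with a
        # hard link to the identical file in the link-dest area.
        # (Standard rsync would copy from the source here instead.)
        ln -f "$ref" "$dst" && echo "linked"
    else
        # No usable link-dest copy: write a new file and rename it into
        # place, so an inode shared with an older backup is not overwritten.
        cp -p "$src" "$dst.tmp.$$" && mv -f "$dst.tmp.$$" "$dst" && echo "copied"
    fi
)
```

Called with a stale destination file and a matching link-dest file, the
function prints "linked" and the destination ends up sharing an inode
with the link-dest copy.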

Why do we want this behaviour?

In our backup system, which we have been running with rsync since 2007,
we recycle old backup destinations as the targets for new backups.
This is an efficiency gain: we don't want to create a new area holding,
in several cases, millions of files when we have an older area which
is nearly up to date.  (With mature backups of user areas, we typically
find that our daily backups have a churn of 0.5% of files and 1% of
data.)

The original patch ensures we get the maximum amount of hard-linking
available.
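
One way to check how much hard-linking a backup area actually achieved
is to count the files with more than one link.  The helper below is a
hypothetical sketch (the name linked_share is mine) and assumes file
names contain no newlines, since it counts lines of find output.

```sh
#!/bin/sh
# Hypothetical helper: report how many regular files under a backup
# directory have more than one hard link, i.e. share their storage with
# a copy elsewhere (normally another backup generation).
# Assumption: file names contain no newlines, because lines are counted.
linked_share() (
    dir=$1
    total=$(( $(find "$dir" -type f | wc -l) ))
    linked=$(( $(find "$dir" -type f -links +1 | wc -l) ))
    echo "$linked of $total files are hard-linked"
)
```

For example, "linked_share current" run after a backup with the churn
quoted above should report nearly all files as hard-linked.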

However, the original patch unconditionally outputs messages for every
file hard-linked under the scenario outlined above.  Our modified
patch makes output of the diagnostic message controlled by the -v option.


2. In addition, the patch adds one more feature and option.  Some time
ago, we wished to avoid repeated daily backups of large files that were
being appended to each day (they were the outputs of computational
models).  We used the --max-size parameter to skip these files;
however, we did not like the lack of warning about skipped files
(standard rsync reports them only at -vv).  This patch adds another
parameter, --warn (short form -w), to select the output of a warning
message when files bigger than the selected --max-size are skipped.
The message is of the form:

    big_file is over max-size
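
On a system without the patch, a similar list can be produced on the
source side with find.  The sketch below is not part of the patch; the
function name over_max_size is hypothetical, and the byte count in the
example assumes that rsync treats the "GB" suffix as a multiple of
1000^3, as I understand the manual page.

```sh
#!/bin/sh
# Hypothetical helper: list regular files under DIR that are larger than
# BYTES, in the same form as the rsync message quoted above.
# Usage: over_max_size DIR BYTES
over_max_size() (
    cd "$1" || exit 1
    # "-size +Nc" matches files strictly larger than N bytes.
    find . -type f -size +"$2"c -exec printf '%s is over max-size\n' {} +
)

# Example (assumption: --max-size=8.0GB means 8 * 1000^3 bytes):
# over_max_size /path/to/source_dir 8000000000
```

The names are printed relative to DIR with a leading "./", which
differs slightly from rsync's own output.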


3. For information: our backups are controlled from the destination of
the backups (pull rather than push as Kevin Korb recently advised).
We use the rsync daemon capability.

The destinations of our backups are file systems subject to HSM
(Hierarchical Storage Management), using SGI's Data Migration
Facility (DMF).

A typical command we use is the following (but I have shortened the
paths and addresses).


    rsync --password-file=not_for_your_eyes --numeric-ids -a --stats \
        --one-file-system --max-size=8.0GB --warn --whole-file \
        --link-dest=previous --delete \
        root@source_host::backups/source_dir current

--password-file=not_for_your_eyes
        . for the daemon
--numeric-ids
        . since the userids on the source are not always available on
          the destination
-a
        . archive mode
--stats
        . statistics
--one-file-system
        . do not cross file system boundaries: stops the backup of
          everything mounted when backing up /
--max-size=8.0GB
        . to skip large files
--warn
        . NEW parameter - warn of skipped files because of --max-size
--whole-file
        . essential when the destination is subject to HSM: otherwise,
          files will be recalled from offline storage so that the rsync
          delta-transfer algorithm can read them
--link-dest=previous
        . pointer to previous backup: to provide a source of files for
          hard-linking
--delete
        . essential when the destination is a recycled directory, to
          ensure superseded files are deleted
root@source_host::backups/source_dir
        . the source specification: username@source_host, module
          specification, and source directory
current
        . the destination directory.

We use an extended Tower of Hanoi scheme to manage the keeping of
backups.  It is highly recommended for its ability to provide sensible
keeping of backups matched to the likelihood of restores, and because
it avoids messy management using dates and times.
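
For readers who have not met the idea, the basic Tower of Hanoi rotation
can be sketched in a few lines of shell.  This shows only the textbook
scheme, not our extended one, and the function name hanoi_level and its
arguments are hypothetical.

```sh
#!/bin/sh
# Basic Tower of Hanoi rotation (not the extended scheme described above).
# Usage: hanoi_level SESSION MAX_LEVEL
# Prints which backup set to overwrite for backup session SESSION (1, 2, ...):
# level 0 is reused every 2nd session, level 1 every 4th, level 2 every
# 8th, and so on, with MAX_LEVEL as the highest (longest-lived) set.
hanoi_level() (
    n=$1 max=$2 level=0
    # The level is the number of times 2 divides the session number,
    # capped at MAX_LEVEL.
    while [ "$level" -lt "$max" ] && [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
        level=$((level + 1))
    done
    echo "$level"
)

for session in 1 2 3 4 5 6 7 8; do
    printf 'session %d -> set %d\n' "$session" "$(hanoi_level "$session" 3)"
done
```

With four sets (levels 0 to 3), sessions 1 to 8 use sets 0 1 0 2 0 1 0 3,
so the higher sets hold progressively older backups without any
reference to dates.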



Regards
Rob. Bell              e-mail: robert.b...@csiro.au
--
Dr Robert C. Bell, BSc (Hons) PhD
Technical Services Manager
Advanced Scientific Computing
CSIRO IM&T

Phone: +61 3 9669 8102 | Mobile: +61 428 108 333 | CSIRO 93 3810
robert.b...@csiro.au | http://www.csiro.au/ | http://www.hpsc.csiro.au/
Addresses:
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia
# CSIRO-ASC patches for rsync.
#
# Note: name this so that it's called after rsync_link_dest_from_bryant
#
# Rowan McKenzie 27/7/2012

# Warnings for --max-size ignored files are displayed if -w/--warn is specified
diff -Naur orig/options.c new/options.c
--- orig/options.c	2012-07-26 16:36:59.201899312 +1000
+++ new/options.c	2012-07-26 16:49:40.065898039 +1000
@@ -173,6 +173,7 @@
 char *dest_option = NULL;
 
 int verbose = 0;
+int warn = 0;
 int quiet = 0;
 int output_motd = 1;
 int log_before_transfer = 0;
@@ -313,6 +314,7 @@
   rprintf(F,"\n");
   rprintf(F,"Options\n");
   rprintf(F," -v, --verbose               increase verbosity\n");
+  rprintf(F," -w, --warn                  display warnings\n");
   rprintf(F," -q, --quiet                 suppress non-error messages\n");
   rprintf(F,"     --no-motd               suppress daemon-mode MOTD (see manpage caveat)\n");
   rprintf(F," -c, --checksum              skip based on checksum, not mod-time & size\n");
@@ -453,6 +455,7 @@
   {"help",             0,  POPT_ARG_NONE,   0, OPT_HELP, 0, 0 },
   {"version",          0,  POPT_ARG_NONE,   0, OPT_VERSION, 0, 0},
   {"verbose",         'v', POPT_ARG_NONE,   0, 'v', 0, 0 },
+  {"warn",            'w', POPT_ARG_NONE,   0, 'w', 0, 0 },
   {"no-verbose",       0,  POPT_ARG_VAL,    &verbose, 0, 0, 0 },
   {"no-v",             0,  POPT_ARG_VAL,    &verbose, 0, 0, 0 },
   {"quiet",           'q', POPT_ARG_NONE,   0, 'q', 0, 0 },
@@ -671,6 +674,7 @@
   rprintf(F,"     --log-file-format=FMT   override the \"log format\" setting\n");
   rprintf(F,"     --sockopts=OPTIONS      specify custom TCP options\n");
   rprintf(F," -v, --verbose               increase verbosity\n");
+  rprintf(F," -w, --warn                  display warnings\n");
   rprintf(F," -4, --ipv4                  prefer IPv4\n");
   rprintf(F," -6, --ipv6                  prefer IPv6\n");
   rprintf(F,"     --help                  show this help screen\n");
@@ -698,6 +702,7 @@
   {"server",           0,  POPT_ARG_NONE,   &am_server, 0, 0, 0 },
   {"temp-dir",        'T', POPT_ARG_STRING, &tmpdir, 0, 0, 0 },
   {"verbose",         'v', POPT_ARG_NONE,   0, 'v', 0, 0 },
+  {"warn",            'w', POPT_ARG_NONE,   0, 'w', 0, 0 },
   {"no-verbose",       0,  POPT_ARG_VAL,    &verbose, 0, 0, 0 },
   {"no-v",             0,  POPT_ARG_VAL,    &verbose, 0, 0, 0 },
   {"help",            'h', POPT_ARG_NONE,   0, 'h', 0, 0 },
@@ -978,6 +983,10 @@
 					verbose++;
 					break;
 
+				case 'w':
+					warn++;
+					break;
+
 				default:
 					rprintf(FERROR,
 					    "rsync: %s: %s (in daemon mode)\n",
@@ -1095,6 +1104,10 @@
 			verbose++;
 			break;
 
+		case 'w':
+			warn++;
+			break;
+
 		case 'q':
 			quiet++;
 			break;

# Warnings for --max-size ignored files are displayed if -w/--warn is specified
diff -Naur orig/generator.c new/generator.c
--- orig/generator.c	2012-07-26 16:56:13.773898292 +1000
+++ new/generator.c	2012-07-26 16:44:19.597902412 +1000
@@ -24,6 +24,7 @@
 #include "same-inode.h"
 
 extern int verbose;
+extern int warn;
 extern int dry_run;
 extern int do_xfers;
 extern int stdout_format_has_i;
@@ -1796,7 +1796,7 @@
 	}
 
 	if (max_size > 0 && F_LENGTH(file) > max_size) {
-		if (verbose > 1) {
+		if (verbose > 1 || warn) {
 			if (solo_file)
 				fname = f_name(file, NULL);
 			rprintf(FINFO, "%s is over max-size\n", fname);

# Only output '=>' notifications when -v/--verbose specified (it's a patch to the rsync_link_dest_from_bryant patch)
diff -Naur orig/generator.c new/generator.c
--- orig/generator.c	2012-07-26 16:56:13.773898292 +1000
+++ new/generator.c	2012-07-26 16:44:19.597902412 +1000
@@ -1066,7 +1066,7 @@
 			 * doesn't exist... */
 			if (!statres || stat_err == ENOENT) {
 				if (!SAME_INODE(sxp->st, st_tmp)) {
-					if (!statres) {
+					if (!statres && verbose) {
 						rprintf(FCLIENT, "%s => %s\n", fname, cmpbuf);
 					}
 					if (!hard_link_one(file, fname, cmpbuf, 1)) {