On Thu, Oct 6, 2011 at 4:01 PM, Benjamin R. Haskell <rs...@benizi.com> wrote:
> It sounds like you missed the point of Kevin's message (in the other fork of 
> this thread).  The point wasn't to use
> `du`, it was that you can run your stats against the backed-up files, not the 
> source.  Then you're only running stats
> against the results of running the backup using the filters, so you don't 
> need to filter them again.

I got that but neglected to respond to the whole group.  My mistake.
The backups are performed with BackupPC to a central server, where
compression and de-duplication happen.  While it's true that each
user consumes less actual storage on the backup server because of
those, I have no problem hiding that from them and reporting their
uncompressed, un-deduplicated usage instead.  It has more of an
effect that way, if you know what I mean.

> If that doesn't make sense or isn't possible (backups are on some remote 
> server), then just use your rsync command
> with '--list-only', and post-process that list.
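
For what it's worth, if I'm reading the --list-only output format
right (permissions, size, date, time, name), post-processing that
list into a total would be roughly:

$ rsync --list-only -a . | awk '{gsub(/,/,"",$2); sum += $2} END {print sum}'

though I assume that total also counts the directory entries
themselves.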

I've been tinkering with --verbose and --dry-run, then parsing the
total size out of the last line of the output, and I think I'm close.
Curiously, when I leave out the --filter option to get a baseline, I
don't get the same result as "du".

$ du -sb . | awk '{print $1}'
508625653

$ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
506037893
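
Possibly less fragile than grabbing a positional field off the last
line: --stats labels its numbers, so (assuming the "Total file size"
line keeps its label, and stripping any digit grouping) something
like

$ rsync --dry-run --stats -a . /tmp/does_not_exist | awk -F': *' '/^Total file size/ {gsub(/[^0-9]/,"",$2); print $2}'

should keep working even if the summary line changes shape.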

The difference is small and probably negligible for this purpose, but
I'm still curious where it comes from.  Maybe there are some sparse
files in there somewhere, or maybe "du -sb" is counting the apparent
size of the directory entries themselves.
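
If I get curious enough to chase it down, a couple of quick checks
come to mind (GNU find: %S is the sparseness ratio, %s the apparent
size in bytes; these are just guesses at the usual suspects):

$ find . -type f -printf '%S\t%p\n' | awk '$1 < 1'
$ find . -type d -printf '%s\n' | awk '{sum += $1} END {print sum}'

The first lists sparse-file candidates, the second totals the bytes
"du" counts for the directories themselves.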

Paul