Estimating backup usage with dir-merge filter

2011-10-05 Thread Paul Dugas
I use --filter='dir-merge .backup-filter' to allow my users to
designate portions of their home directories that should be excluded
from my rsync-based backup system.  I'm looking for a way to
periodically generate a report that shows the amount of backup space
being used by each user.  I've tinkered with writing my own script
that processes any filter files into --exclude parameters for "du" but
recently, I've been wondering if there's an easier way that would use
rsync itself, the --filter argument, and --dry-run.  Anyone ever run
into something like this?
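
A minimal sketch of the rsync-only idea described above, assuming an rsync whose --stats output includes a "Total file size:" line. The helper names parse_total_size and filtered_usage, the /home layout, and the placeholder destination are illustrative and not part of any existing setup. Files excluded by a .backup-filter never enter rsync's file list, so they should not be counted in the total.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Read `rsync --stats` output on stdin and print the byte count from the
# "Total file size:" line. Newer rsync versions print thousands
# separators, so every non-digit character is stripped from the value.
parse_total_size() {
    awk -F': ' '/^Total file size:/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Dry-run the backup of one directory with the per-directory filter files
# applied, and print the total size rsync would consider, in bytes.
#   $1 = directory to measure
filtered_usage() {
    # /tmp/does_not_exist is a placeholder destination; with --dry-run
    # nothing is written there.
    rsync -a --dry-run --stats \
        --filter='dir-merge .backup-filter' \
        "$1"/ /tmp/does_not_exist | parse_total_size
}

# Example report, assuming home directories live under /home:
#   for home in /home/*; do
#       printf '%s\t%s\n' "$(filtered_usage "$home")" "$home"
#   done
```

Keeping the parsing in its own function means it can be checked against saved --stats output without touching the real home directories.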

P
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Estimating backup usage with dir-merge filter

2011-10-06 Thread Paul Dugas
I appreciate the suggestions so far but I know how to measure usage with
'du' et al. The hitch here is that I want to exclude files the
--filter='dir-merge .rsync-filter' excludes. Hence the thought to use rsync
itself.
On Oct 6, 2011 11:02 AM, "K S Braunsdorf"  wrote:
>>that processes any filter files into --exclude parameters for "du" but
>>recently, I've been wondering if there's an easier way that would use
>
> If your backups are all on a single partition you might try quot(8)
> ("quot -- display disk space occupied by each user"). I wrote a
> very simple perl script to munge quot output to create a "diskhogs"
> report about 20 years ago, and I still use it today. I suggest you
> take the output of
> quot -kvf $BACKUP_DEVICE
>
> and filter it to fit your needs. If you can't find a "quot" for your
> OS I might have a C program that works as a replacement.
>
> --ksb at_host sac.fedex.com

Re: Estimating backup usage with dir-merge filter

2011-10-06 Thread Paul Dugas
On Thu, Oct 6, 2011 at 4:01 PM, Benjamin R. Haskell  wrote:
> It sounds like you missed the point of Kevin's message (in the other fork
> of this thread).  The point wasn't to use `du`, it was that you can run
> your stats against the backed-up files, not the source.  Then you're only
> running stats against the results of running the backup using the filters,
> so you don't need to filter them again.

I got that but neglected to respond to the whole group.  My mistake.
The backups are being performed using BackupPC to a central server
where compression and de-duplication are done.  While it's true that
the actual storage on the backup server being consumed by each user is
less because of these, I don't have any problem hiding this from them
and telling them what their uncompressed and duplicated usage is
instead.  It has more of an effect that way, if you know what I mean.

> If that doesn't make sense or isn't possible (backups are on some remote
> server), then just use your rsync command with '--list-only', and
> post-process that list.
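
The --list-only suggestion could be post-processed along these lines. This is only a sketch: it assumes the usual listing layout (permissions, size, date, time, name) and counts regular files only. The name sum_list_sizes is illustrative.

```shell
# Sum the size column of `rsync --list-only` output read from stdin.
# Assumes the size is the second whitespace-separated field and that
# regular files are the lines whose permission string starts with "-".
# Thousands separators printed by newer rsync versions are stripped.
sum_list_sizes() {
    awk '$1 ~ /^-/ { gsub(/[^0-9]/, "", $2); total += $2 }
         END { printf "%.0f\n", total }'
}

# Example, using the same filter as the real backup:
#   rsync -a --list-only --filter='dir-merge .backup-filter' "$HOME"/ | sum_list_sizes
```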

I've been tinkering with using --verbose and --dry-run then parsing
the total size out of the last line of the output and I think I'm
close.  Curiously, when I don't include the --filter option as a
baseline, I'm not getting the same results as "du".

$ du -sb . | awk '{print $1}'
508625653

$ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
506037893

The difference is minimal and probably negligible for this purpose but
I'm still curious where it's coming from.  Maybe there are some sparse
files in there somewhere.
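
One possible contributor to the gap, offered as an assumption and not something confirmed in this thread: "du -sb" also adds the apparent size of every directory entry, whereas rsync's total is, as far as I know, built from file lengths. Since -b implies apparent sizes, sparse files seem an unlikely cause. The sketch below relies on GNU find's -printf and sums regular files and directories separately so the figures can be compared; sum_sizes_of_type is an illustrative name.

```shell
# Sum the apparent sizes (st_size) of all entries of one type under a
# directory. Requires GNU find for -printf.
#   $1 = directory, $2 = find type letter (f = regular file, d = directory)
sum_sizes_of_type() {
    find "$1" -type "$2" -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Example comparison against the two numbers above:
#   sum_sizes_of_type . f    # regular files only
#   sum_sizes_of_type . d    # directory entries, which du -sb also counts
```

If the regular-file sum lands close to rsync's figure, directory entries would account for most of the difference; hard-linked files, which du counts once, are another thing to check.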

Paul

Re: Estimating backup usage with dir-merge filter

2011-10-07 Thread Paul Dugas
On Thu, Oct 6, 2011 at 6:49 PM, Henri Shustak  wrote:
>> I've been tinkering with using --verbose and --dry-run then parsing
>> the total size out of the last line of the output and I think I'm
>> close.  Curiously, when I don't include the --filter option as a
>> baseline, I'm not getting the same results as "du".
>>
>> $ du -sb . | awk '{print $1}'
>> 508625653
>>
>> $ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
>> 506037893
>>
>> The difference is minimal and probably negligible for this purpose but
>> I'm still curious where it's coming from.  Maybe there are some sparse
>> files in there somewhere.
>
> Do you have the same discrepancy if you use the --stats option?

Yes.  Using --stats, the last line of the output is unchanged, and the
"Total file size:" line in the additional output reports the same figure.

Paul