I'm not sure what you're doing with your --verify...
It *sounds* like you want a full CRC style check of the *current*
files after the backup is complete. (i.e. File X gets updated with a
delta, and you want to verify that file X is the same both on the
source and destination locations/drives.)
Yes, although it's more of an internal consistency check within the
rdiff-backup repository itself. I'm looking for a way to quickly
verify the integrity of my entire rdiff-backup repository.
In my scenario the repository is synced to an external USB drive that
gets rotated each day (i.e. each day I put yesterday's drive in
storage and bring a different drive out of storage to use for the
next backup). I use rsync to transfer my rdiff-backup repository (which
gets updated daily) to the USB drive. Then I run rdiff-backup
--verify-at-time to verify that the files on the USB drive are not
corrupt. But lately this has been taking too long.
Does that make sense?
Yes, and the USB connection may explain the longish verify times,
since it's somewhat slow, compared to a SATA drive connected directly
to the controller...
USB probably does have something to do with how long it takes. But on
the other hand yafic can do a full verify in 1/4 of the time on the
same drive with the same data, etc. So maybe rdiff-backup could be
made to be faster?
But I see that you want to verify the "local" RDiff repository to the
"off-line" one.
I'm not sure what you mean by this statement... I want to do an
internal consistency check on my rdiff-backup repository after it's
been rsync'd to the USB disk. I need to be sure that the data on the
USB disk is valid. I am doing the verify on the USB drive because that
is the last place that the data will be copied before it goes into
secure storage (for up to a month, but normally just a few days).
Maybe an outline of my data flow will help you to understand what I'm
trying to accomplish.
First the hardware:
- Xserve with raid array - this is being backed up with rdiff-backup
- Firewire 800 drive attached to Xserve - staging location for
rdiff-backup repository, gets a new revision each night
- Mac Mini - remote backup "server"
- USB 2.0 drive attached to Mac Mini - gets a copy of the rdiff-backup
repo from the Firewire 800 drive on the Xserve
Now the data flow:
- Xserve runs rdiff-backup from raid array to local firewire drive
- Xserve runs rdiff-backup --verify-at-time 0B on local firewire drive
to verify integrity of most recent revision (this step may not be
necessary)
- Mac Mini runs rsync to copy rdiff-backup repo from Xserve firewire
800 drive to local USB drive
- Mac Mini would now like to verify the integrity of the rdiff-backup
repository that it just rsync'd to the USB drive
During this last step I would rather not tie up any resources on the
Xserve. Instead, I want to do a fully local (to Mac Mini) verification
of the rdiff-backup repository. This verification should let me know
if any link in the (hardware) chain is failing: is the firewire 800
staging drive failing? is the USB drive failing?
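
To make that last step concrete, here is a minimal sketch of a fully
local check in the spirit of yafic: hash every file into a manifest on
the staging side, then recompute the hashes on the Mac Mini and compare.
This is not how rdiff-backup verifies anything; the function names
(file_sha256, build_manifest, compare_manifests) are hypothetical, and
the approach only detects files that differ from whatever was hashed
when the manifest was built. It says nothing about whether the
increments are logically consistent.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash one file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and special files
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def compare_manifests(expected, actual):
    """Return sorted lists of (missing, unexpected, changed) relative paths."""
    missing = sorted(set(expected) - set(actual))
    unexpected = sorted(set(actual) - set(expected))
    changed = sorted(
        path for path in set(expected) & set(actual)
        if expected[path] != actual[path]
    )
    return missing, unexpected, changed
```

In this sketch the Xserve would run build_manifest on the Firewire
staging repository and ship the result along with the rsync, and the
Mac Mini would run build_manifest on the USB copy and pass both to
compare_manifests. Three empty lists mean the two copies hold the same
bytes.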
Not sure how to do that - I'd guess you could do it with some other
tools - not storing the hashes - just a full compare each time. (How
big is the repository? [I think you said, but I don't recall.])
100 GB mirror + 80 GB of rdiff data. So almost 200 GB
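
For a sense of scale, a back-of-envelope estimate of how long one full
read of that much data takes. The 30 MB/s figure is purely an
assumption for illustration, not something measured on this drive, and
full_read_hours is a hypothetical helper.

```python
def full_read_hours(total_gb, throughput_mb_per_s):
    """Hours needed to read total_gb once at a sustained throughput (decimal units)."""
    seconds = (total_gb * 1000) / throughput_mb_per_s
    return seconds / 3600


repo_gb = 100 + 80         # mirror + rdiff data, from the figures above
assumed_mb_per_s = 30      # ASSUMPTION: illustrative sustained read rate, not measured
hours = full_read_hours(repo_gb, assumed_mb_per_s)
print(f"{repo_gb} GB at {assumed_mb_per_s} MB/s: {hours:.1f} h for one full pass")
```

Any verify that has to read every byte cannot finish faster than this
floor, whatever tool does the reading.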
---
But I'd guess your "local" repository isn't on the same disks as the
data, right?
Right.
If so, then it's probably not a huge deal if it takes 20 hours to
check the local repository against the remote. [Though I guess all
that disk channel activity might impact other disk throughput too...]
The drive will be moved to a secure location, so it needs to happen as
quickly as possible. If we have a disaster (fire, etc.) a backup
doesn't do us much good if the most recent snapshot is still online
being verified (and hence consumed by the fire).
(Add a controller? Dunno...)
I use a similar system and I don't verify the local repository to the
remote, though perhaps I should. (I trust rsync to make sure they're
the same...since it's not just copying the files - it's doing hash
matches like RDiff...)
Even if rsync verifies that they're the same this is only a false
sense of security since the staging repo (the source that rsync copied
from) could be corrupt and you'll never know it. This corruption could
be sneaking into old revisions which you don't bother to verify
because it takes too long. There needs to be some way to verify that
ALL of the data is fully intact after it's been copied...
--verify-at-time almost gets there, but not quite. It could get you
there if you have lots of time to do a verify-at-time for each revision
in the repo, but I'm guessing that would be prohibitively expensive in
most cases.
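
To show what "a verify-at-time for each revision" would look like, here
is a rough sketch that builds one command per backup session using the
"NB" time format (0B is the current mirror, 1B the session before it,
and so on). The helper names and the session_count input are
hypothetical, and treating a non-zero exit status as a failed verify is
an assumption about rdiff-backup's behavior that I haven't confirmed.

```python
import subprocess


def verify_commands(repo_path, session_count):
    """Build one `rdiff-backup --verify-at-time NB <repo>` command per session.

    session_count is an assumed input: the number of backup sessions
    held in the repository.
    """
    return [
        ["rdiff-backup", "--verify-at-time", f"{n}B", repo_path]
        for n in range(session_count)
    ]


def run_all(commands):
    """Run each command in turn and return the ones that exited non-zero.

    ASSUMPTION: a non-zero exit status signals a failed verify.
    """
    failed = []
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(command)
    return failed
```

Each command is a separate pass over the repository, so the total time
grows with the number of sessions, which is exactly the cost problem
described above.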
BTW, is this on a windows platform? (Curious...) Ah, probably not
since yafic isn't... :)
Nope. All machines are running Mac OS. I have aspirations to add some
Windows machines at some point, but that's not likely until I get a
faster verify.
~ Daniel
_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki