Ben Escoto wrote:
Yes, I just checked some patches into the unstable tree which do just
that. So right now each mirror_metadata entry for a regular file has
an "SHA1Digest" field with the 40-character hex digest in it. Only
hash writing has been added--it doesn't actually do anything with the
information yet.
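As an illustration of what such a field would hold, here is a minimal sketch that computes the 40-character SHA-1 hex digest of a regular file with Python's standard hashlib module. The helper name sha1_hex_digest, the chunk size, and the temporary demo file are assumptions made for this example; this is not the code that was checked into the unstable tree.

```python
import hashlib
import os
import tempfile


def sha1_hex_digest(path, chunk_size=64 * 1024):
    """Return the SHA-1 digest of a file as a 40-character hex string.

    Hypothetical helper for illustration; reads the file in chunks so
    large files do not have to fit in memory.
    """
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Demo on a throwaway file (the file and its contents are made up).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world\n")
demo_digest = sha1_hex_digest(tmp.name)
os.unlink(tmp.name)

print(len(demo_digest), demo_digest)
```

A 160-bit digest is 20 raw bytes, which is why its hex form is 40 characters long.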
I'm not sure about a speed penalty, but I just recently realized the
biggest drawback of writing hashes. Hash data is incompressible, and
so a 160 bit hash like SHA1 will add at least 20 bytes (and probably
more) per regular file to the size of the compressed mirror_metadata
file.
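The incompressibility claim can be checked with a rough experiment: generate many SHA-1 hex digests, compress them, and look at the compressed size per digest. This is only a sketch. It uses zlib (the same deflate algorithm gzip uses) on a bare list of digests, whereas a real mirror_metadata file interleaves them with other fields; the function name and the sample size are assumptions.

```python
import hashlib
import zlib


def compressed_bytes_per_digest(n=10000):
    """Compress n newline-separated SHA-1 hex digests and return the
    average number of compressed bytes each digest costs."""
    # Digests of distinct inputs look random to the compressor.
    digests = [hashlib.sha1(str(i).encode()).hexdigest() for i in range(n)]
    blob = "\n".join(digests).encode("ascii")
    return len(zlib.compress(blob, 9)) / n


per_digest = compressed_bytes_per_digest()
print(f"{per_digest:.1f} compressed bytes per 40-character hex digest")
```

The hex encoding itself compresses (each character carries only 4 bits), but the result cannot drop below the 20 bytes of actual hash data per file.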
At least for my usage, this approximately triples the size of the
mirror_metadata file. I like to keep about a year's worth of backups
of my files, and I have about a million files. So adding the hashes
would turn each of my mirror_metadata files from 12MB as they are now
to 32MB+. Over a year that would cost about 8GB.
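The figures above can be roughly reproduced with back-of-the-envelope arithmetic. The file count, digest size, and current file size come from the message; one backup session per day (and so one mirror_metadata file per day) is an assumption, as is the use of decimal megabytes.

```python
FILES = 1_000_000        # "about a million files"
DIGEST_BYTES = 20        # 160-bit SHA-1, the lower bound per file
CURRENT_MB = 12          # current compressed mirror_metadata size
BACKUPS_PER_YEAR = 365   # assumption: one backup session per day

# Extra compressed data per mirror_metadata file, in decimal MB.
extra_mb_per_backup = FILES * DIGEST_BYTES / 1_000_000
new_size_mb = CURRENT_MB + extra_mb_per_backup
# Extra data accumulated over a year of backups, in decimal GB.
extra_gb_per_year = extra_mb_per_backup * BACKUPS_PER_YEAR / 1000

print(extra_mb_per_backup)   # 20.0
print(new_size_mb)           # 32.0
print(extra_gb_per_year)     # 7.3
```

Since 20 bytes per file is a lower bound, the real yearly cost would land somewhat above 7.3 GB, in line with the "about 8GB" estimate.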
Before this, I had assumed I would just turn hashing on and not expose
any option. But with this tradeoff I think we need to give people
the option. So what do you think the default should be? Keep the
hashes and triple the size of the mirror_metadata file?
Without the hashes, the 8 GB of metadata would be a little under 3 GB. In my
opinion, that doesn't matter much, though I tend to forget that some people just
don't have the resources to spare. But a difference of 5 GB for a year? One
question you can ask is: "Is your data worth the investment of a bit of HD
space?" In my opinion, this can be standard behaviour, not something hidden
behind an option.
A disadvantage of options is that it is possible to forget to supply one or two
now and then. I do my backups from scripts, but what if someone does not? What will
happen if you forget --store-checksums? Will you have an archive in which some
files are checksummed and some are not? Or will rdiff-backup simply enable
checksums if you're backing up to an archive which already has them?
In the end, my opinion is this: either make it standard behaviour, or, if it is
enabled with an option, make sure that once it is enabled for the first backup,
subsequent backups have it enabled implicitly. Likewise, if it is first disabled
and then enabled sometime later, keep it enabled for subsequent backups as well.
Perhaps these issues justify just having it enabled all the time.
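The "sticky" behaviour proposed above can be sketched as a small decision rule: checksums are written if the option is given now or if the archive already contains them. Everything here is hypothetical. The function names, the archive_has_checksums state, and the --store-checksums flag itself are illustrations of the proposal, not existing rdiff-backup behaviour.

```python
def should_store_checksums(archive_has_checksums, option_given):
    """Sticky rule: once an archive has checksums, keep writing them,
    even if the (hypothetical) --store-checksums option is omitted."""
    return archive_has_checksums or option_given


def run_sessions(option_per_session):
    """Simulate successive backups to one archive and report, per
    session, whether checksums would be written."""
    archive_has_checksums = False
    history = []
    for option_given in option_per_session:
        enabled = should_store_checksums(archive_has_checksums, option_given)
        # The archive remembers that checksums were written at least once.
        archive_has_checksums = archive_has_checksums or enabled
        history.append(enabled)
    return history


# Option forgotten in the first session, given in the second, then forgotten again.
sticky_history = run_sessions([False, True, False, False])
print(sticky_history)  # [False, True, True, True]
```

With this rule, forgetting the option never silently downgrades an archive, though sessions recorded before it was first enabled still lack checksums.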
BTW, what is your plan for when someone upgrades to an rdiff-backup
version which enables checksums? Will an archive be backwards compatible with
older versions?
_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki