Re: [BackupPC-users] BackupPC V4 and --checksum
> IIUC, you want a way to check the integrity of the pool files on the
> server side.

Yes

> BackupPC 3 used to have such a function, by re-checksumming and
> verifying some percentage of the pool during a nightly (can't remember
> the details, and I don't have the v3 docs available).

Found it here:
https://backuppc.github.io/backuppc/BackupPC-3.3.2.html#Rsync-checksum-caching

The wording further confirms that V4 won't checksum the files once they're added to the pool, contrary to what I believed.

> If you want to do this for yourself, it's pretty easy with a cronjob.
> Just compare, for all files in $topDir/pool/*/*/, their md5sum with the
> filename. Same = good, not the same = bad.
>
> If your pool is compressed, pipe the compressed files in
> $topDir/cpool/*/*/ through pigz [1] (which, as opposed to gzip, can
> handle the headerless gz format used there), as in the following piece
> of bash:
>
>     digest=$(pigz -dc $file | md5sum -b | cut -d' ' -f1)
>
> Now, check if $digest == $file, and you have a sanity check. (It's
> slightly more annoying to find out where $file was referenced in case it
> is corrupted; but it's possible, and I recommend not to worry about that
> until it happens.)

Perfect, thanks! I can then use --checksum to verify the client, and a script to checksum the server off-line from time to time. The best of both worlds :)

Regards,
Guillermo

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
Re: [BackupPC-users] BackupPC V4 and --checksum
Hi,

On 2018-07-28 20:04, Guillermo Rozas wrote:
> Agreed, that is my situation. I'm reasonably sure of the system (UPS,
> Debian stable, ext4), but as my backups are relatively small (1[...] can
> trade some extra hours of backup once in a while for the extra peace of
> mind.

IIUC, you want a way to check the integrity of the pool files on the server side. BackupPC 3 used to have such a function, by re-checksumming and verifying some percentage of the pool during a nightly run (can't remember the details, and I don't have the v3 docs available).

If you want to do this for yourself, it's pretty easy with a cronjob. Just compare, for all files in $topDir/pool/*/*/, their md5sum with the filename. Same = good, not the same = bad.

If your pool is compressed, pipe the compressed files in $topDir/cpool/*/*/ through pigz [1] (which, as opposed to gzip, can handle the headerless gz format used there), as in the following piece of bash:

    digest=$(pigz -dc "$file" | md5sum -b | cut -d' ' -f1)

Now, check whether $digest matches the file name of $file, and you have a sanity check. (It's slightly more annoying to find out where $file was referenced in case it is corrupted; but it's possible, and I recommend not worrying about that until it happens.)

Of course, you can easily scrub only a part of your pool: just choose how many subdirectories you want to process each night.

[1] https://zlib.net/pigz/

HTH,
Alex
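The scrub described above could be sketched as a small POSIX shell script. This is only a sketch under assumptions, not a tested BackupPC tool: the function names verify_pool_file and scrub_pool are made up for illustration, pool file names are assumed to be lowercase hex strings whose first 32 characters are the MD5 digest of the uncompressed content, and the decompressor is passed as trailing arguments (pigz -dc for cpool, as suggested above; plain cat when none is given).

```shell
#!/bin/sh
# Sketch: compare each pool file's MD5 digest with its file name.
# ASSUMPTIONS (not confirmed by the thread): the function names, and that
# the first 32 hex characters of a pool file name are the MD5 digest.

# verify_pool_file FILE [DECOMPRESSOR [ARGS...]]
# Succeeds if the MD5 of the (decompressed) content matches the file name.
verify_pool_file() (
    file=$1
    shift
    [ "$#" -gt 0 ] || set -- cat          # uncompressed pool: read as-is
    expected=$(printf '%s\n' "${file##*/}" | cut -c1-32)
    digest=$("$@" < "$file" | md5sum -b | cut -d' ' -f1)
    [ "$digest" = "$expected" ]
)

# scrub_pool DIR [DECOMPRESSOR [ARGS...]]
# Prints "MISMATCH <path>" for every file under DIR whose digest differs
# from its name; returns non-zero if any mismatch was found.
scrub_pool() (
    dir=$1
    shift
    find "$dir" -type f | {
        status=0
        while IFS= read -r file; do
            name=${file##*/}
            # Skip anything that does not look like a hex digest name.
            case $name in
                *[!0-9a-f]*) continue ;;
            esac
            [ "${#name}" -ge 32 ] || continue
            if ! verify_pool_file "$file" "$@"; then
                printf 'MISMATCH %s\n' "$file"
                status=1
            fi
        done
        exit "$status"
    }
)

# Example invocations (paths are placeholders):
#   scrub_pool "$topDir/pool"
#   scrub_pool "$topDir/cpool" pigz -dc
```

Because scrub_pool walks whatever directory it is given, a nightly job could pass a single first-level subdirectory of the pool instead of the whole tree, which is one way to scrub only a part of the pool each night.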
Re: [BackupPC-users] BackupPC V4 and --checksum
Hi! Thanks for the answer.

> > ... rsync --checksum only checksums files on the client, not the
> > server. I find this strange because not only the manual says
> > otherwise ...
>
> It is not clear to me what document ("manual") you are reading which
> leads you to the conclusions which you seem to have drawn. If you can
> give links to the document(s), and quote(s), that might assist.

I was reading the same passages you mentioned, but interpreting them in a different way.

> [quote]
> * Uses full-file MD5 digests, which are stored in the directory attrib
> files. Each backup directory only contains an empty attrib file whose
> name includes its own MD5 digest, which is used to look up the attrib
> file's contents in the pool. In turn, that file contains the metadata
> for every file in that directory, including each files's MD5 digest.
> [/quote]
>
> I take this to mean that, in order to find the checksums for the files
> on the client, the server looks in the files in its data directory for
> that client precisely because, when it does so, it does NOT then need
> to read pool files (to re-calculate the checksums) because it has done
> that work already and saved the results in the filesystem.

Agreed, this could be an interpretation. However, a bit below it says:

[quote]
* rsync-bpc doesn't support checksum caching
[/quote]

Which I interpreted as 'It uses the MD5 digest names only for file reference, but it doesn't rely on them for file integrity. Therefore, it will checksum the files again'. After that, my mind was set: I knew BackupPC already had the checksums, but I thought they were not used by rsync-bpc.

Your email prompted me to re-frame that, and sure enough there is this comment in https://github.com/backuppc/rsync-bpc/blob/master/checksum.c:

[quote]
* Try to grab the digest from the attributes, which are both MD5 for protocol >= 30.
* Otherwise fall through and do it the slow way.
[/quote]

So does this settle the question? In V4, rsync-bpc uses the attributes' MD5 as a cache for the full-file checksum (which is used by --checksum), but it doesn't have caching capabilities for the block checksums (used by --ignore-times)?

> Naturally, using this approach, you rely on the integrity of the
> previously saved pool data.

Agreed, that is my situation. I'm reasonably sure of the system (UPS, Debian stable, ext4), but as my backups are relatively small (1[...] can trade some extra hours of backup once in a while for the extra peace of mind.

> This seems to me further to confirm my interpretation of the earlier
> quote, and also to suggest the behaviour which you yourself describe
> in your posts. It explicitly refers to "a more conservative approach"
> which may be what you want.

Yes. However, as the same documentation says:

[quote]
* The use of rsync --checksum allows BackupPC to guess a potential match anywhere in the pool, even on a first-time backup. In that case, the usual rsync block checksums are still exchanged to make sure the complete file is identical.
[/quote]

I thought it would be better to use --checksum. But if --checksum doesn't actually checksum the files on the server each time, I agree that using --ignore-times is a better fit for my use case at this point.

Thanks.

Regards,
Guillermo
Re: [BackupPC-users] BackupPC V4 and --checksum
Hi there,

On Sat, 28 Jul 2018, Guillermo Rozas wrote:
> ... rsync --checksum only checksums files on the client, not the
> server. I find this strange because not only the manual says
> otherwise ...

It is not clear to me what document ("manual") you are reading which leads you to the conclusions which you seem to have drawn. If you can give links to the document(s), and quote(s), that might assist.

Quoting from the file .../backuppc-master/doc-src/BackupPC.pod which I downloaded today from Github:

[quote]
* Uses full-file MD5 digests, which are stored in the directory attrib files. Each backup directory only contains an empty attrib file whose name includes its own MD5 digest, which is used to look up the attrib file's contents in the pool. In turn, that file contains the metadata for every file in that directory, including each files's MD5 digest.
[/quote]

I take this to mean that, in order to find the checksums for the files on the client, the server looks in the files in its data directory for that client precisely because, when it does so, it does NOT then need to read pool files (to re-calculate the checksums) because it has done that work already and saved the results in the filesystem.

Naturally, using this approach, you rely on the integrity of the previously saved pool data. That seems to me to be a very reasonable approach if, for example,

(1) you are confident of the reliability of your power supply, your hardware, and your choices of OS and filesystem;
(2) the backup server is dedicated to the task (perhaps even if it is a shared server but the backup data store is on a dedicated partition) so you can be confident that errant processes will not unexpectedly damage the data; and
(3) neither life nor limb will depend on the backup.

Also from the same document:

[quote]
* An rsync "full" backup now uses --checksum (instead of --ignore-times), which is much more efficient on the server side - the server just needs to check the full-file checksum computed by the client, together with the mtime, nlinks, size attributes, to see if the file has changed. If you want a more conservative approach, you can change it back to --ignore-times, which requires the server to send block checksums to the client.
[/quote]

This seems to me further to confirm my interpretation of the earlier quote, and also to suggest the behaviour which you yourself describe in your posts. It explicitly refers to "a more conservative approach" which may be what you want.

--
73,
Ged.
Re: [BackupPC-users] BackupPC V4 and --checksum
Hi,

does anybody have a tip on this? Both my computers (Win10, Ubuntu 17.10) have already aged to the point of making their first full backups on their own, and the 'problem' persists: rsync --checksum only checksums files on the client, not the server.

I find this strange because not only does the manual say otherwise, but there are comments on this list and even a feature request mentioning the slow checksumming on the server during full backups!

I'd appreciate any help to elucidate what's happening on my system. I need checksumming working on the server for data integrity reasons.

Thanks!

Regards,
Guillermo

On Tue, Jun 26, 2018 at 10:44 PM Guillermo Rozas wrote:
> Hi,
>
> I've recently installed BackupPC 4.2.1 on my home server (ARMBIAN
> 5.38), and I'm trying to understand the behavior of rsync's --checksum
> option on V4.
>
> According to the docs, V4 doesn't have checksum caching, so I was
> expecting the server to read and checksum all the files during a full
> backup. What I'm seeing is the complete opposite: it seems my server
> is not checksumming any file during a full backup with the --checksum
> option. Reading operations on the backup disk are kept to a minimum
> (less than 300kB/s). Is this the expected behavior or maybe I have a
> problem somewhere?
>
> For reference, the same backup using --ignore-times maxes out the
> server's capacity at 20MB/s. In both cases the client behaves as
> expected, checksumming everything. Actually, the duration of a
> full --checksum backup is essentially defined by the reading speed of
> the client disk, although I know that the server's reading speed is
> much slower (all this is running a full backup immediately after an
> incremental, so no files are transferred).
>
> Any help will be appreciated. I can send log files if needed.
>
> Best regards,
> Guillermo
[BackupPC-users] BackupPC V4 and --checksum
Hi,

I've recently installed BackupPC 4.2.1 on my home server (ARMBIAN 5.38), and I'm trying to understand the behavior of rsync's --checksum option on V4.

According to the docs, V4 doesn't have checksum caching, so I was expecting the server to read and checksum all the files during a full backup. What I'm seeing is the complete opposite: it seems my server is not checksumming any file during a full backup with the --checksum option. Reading operations on the backup disk are kept to a minimum (less than 300kB/s). Is this the expected behavior or maybe I have a problem somewhere?

For reference, the same backup using --ignore-times maxes out the server's capacity at 20MB/s. In both cases the client behaves as expected, checksumming everything. Actually, the duration of a full --checksum backup is essentially defined by the reading speed of the client disk, although I know that the server's reading speed is much slower (all this is running a full backup immediately after an incremental, so no files are transferred).

Any help will be appreciated. I can send log files if needed.

Best regards,
Guillermo