Ah yes... I remember from v3 days that the block checksums were
appended to the pool files...

Craig Barratt via BackupPC-users wrote at about 21:28:49 -0700 on Sunday, June 7, 2020:
> Jeff,
>
> Yes, that's correct.
>
> In v4 a full backup using --checksum will compare all the metadata and
> full-file checksum. Any file that matches all those will be presumed
> unchanged. In v4 the server load for a full is very low, since all that
> metadata (including the full-file checksum) is stored and easily accessed
> without needing to look at the file contents at all. An incremental backup
> just checks all the metadata and not the full-file checksum, which is fast
> on both the server and client side. v4 also supports incremental-only (by
> periodically filling a backup), in cases where that is sufficient.
> However, that's more risky and not the default.
>
> In v3, a full backup checks the block-based deltas and full-file checksum
> for every file. That's a lot more work and seems unnecessary. You can get
> that behavior in v4 too by replacing --checksum with --ignore-times, but
> it's a lot more expensive on the server side since v4 doesn't cache the
> block and full-file checksums.
>
> While md5 collisions can be constructed with various properties, the chance
> of a random file change creating a hash collision is 2^-128, as you note.
>
> Craig
>
> On Sun, Jun 7, 2020 at 9:11 PM <backu...@kosowsky.org> wrote:
>
> > Silly me... the '--checksum' is only for 'Full' so that explains the
> > difference between 'incrementals' and 'fulls'... along with presumably
> > why my case wasn't caught by an incremental.
> >
> > I still don't fully understand the comment referencing V3 and replacing
> > --checksum with --ignore-times.
> >
> > Is the point that v3 compared both full file and block
> > checksums while in v4 --checksum only compares full file checksums?
> > And so v3 is more conservative since there might be checksum
> > collisions of 2 non-identical files at the file-checksum level that
> > would be unmasked by checksum differences at the block level?
> > (presumably a very rare event -- presumably < 2^-128 since the hash
> > itself is 128 bits and the times and size are also checked)
> >
> > "" wrote at about 23:54:14 -0400 on Sunday, June 7, 2020:
> > > Can someone clarify how --checksum works in v4?
> > > And specifically, when could it get 'fooled' thinking 2 files are
> > > identical when they really aren't...
> > >
> > > According to config.pl:
> > >
> > >     The --checksum argument causes the client to send full-file
> > >     checksum for every file (meaning the client reads every file and
> > >     computes the checksum, which is sent with the file list). On the
> > >     server, rsync_bpc will skip any files that have a matching
> > >     full-file checksum, and size, mtime and number of hardlinks. Any
> > >     file that has different attributes will be updated using the block
> > >     rsync algorithm.
> > >
> > >     In V3, full backups applied the block rsync algorithm to every
> > >     file, which is a lot slower but a bit more conservative. To get
> > >     that behavior, replace --checksum with --ignore-times.
> > >
> > >
> > > While according to the 'rsync' man pages:
> > >     -c, --checksum
> > >       This changes the way rsync checks if the files have been changed
> > >       and are in need of a transfer. Without this option, rsync uses a
> > >       "quick check" that (by default) checks if each file’s size and time
> > >       of last modification match between the sender and receiver. This
> > >       option changes this to compare a 128-bit checksum for each file
> > >       that has a matching size.
> > >       Generating the checksums means that both
> > >       sides will expend a lot of disk I/O reading all the data in the
> > >       files in the transfer (and this is prior to any reading that will
> > >       be done to transfer changed files), so this can slow things down
> > >       significantly.
> > >
> > >
> > > Note by default:
> > >     $Conf{RsyncFullArgsExtra} = ['--checksum'];
> > >
> > > So in v4:
> > >  - Do incrementals and fulls differ in how/when checksums are used?
> > >  - For each case, what situations would cause BackupPC to be fooled?
> > >  - Specifically, I don't understand the comment of replacing --checksum
> > >    with --ignore-times since the rsync definition of --checksum
> > >    says that it doesn't look at times but a 128-bit file checksum.
> > >
> > > The reason I ask is that I recompiled a Debian package (happens to be
> > > libbackuppc-xs-perl) to pull in the latest version 0.60. But I forgot
> > > to change the date in the Changelog. When installing the package, the
> > > file dates were the same even though the content and file md5sums for
> > > some files had changed.
> > >
> > > Specifically,
> > > /usr/lib/x86_64-linux-gnu/perl5/5.26/auto/BackupPC/XS/XS.so
> > > had the same size (and date due to my mistake) but a different file
> > > md5sum.
> > >
> > > And an incremental backup didn't detect this difference...
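To make the three cases discussed in this thread concrete, here is a rough Python sketch of the skip decision as Craig describes it. It is only a toy model and not rsync_bpc code: `FileMeta`, `describe`, `needs_transfer` and the mode names are made up for illustration, and the metadata compared is limited to the fields quoted from config.pl (size, mtime, number of hardlinks).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file record; not BackupPC's real attribute format."""
    size: int
    mtime: int
    nlinks: int
    md5: str  # full-file digest


def describe(content: bytes, mtime: int, nlinks: int = 1) -> FileMeta:
    """Build a record for some file content (illustration only)."""
    return FileMeta(len(content), mtime, nlinks, hashlib.md5(content).hexdigest())


def needs_transfer(client: FileMeta, server: FileMeta, mode: str) -> bool:
    """Return True if the file would go through the block rsync algorithm.

    mode is one of (names are assumptions of this sketch):
      "incremental"  - compare metadata only, no full-file checksum
      "checksum"     - compare metadata plus full-file checksum
      "ignore-times" - never skip; every file goes through block rsync
    """
    if mode == "ignore-times":
        return True
    same_meta = (client.size, client.mtime, client.nlinks) == (
        server.size, server.mtime, server.nlinks)
    if mode == "incremental":
        return not same_meta
    if mode == "checksum":
        return not (same_meta and client.md5 == server.md5)
    raise ValueError(f"unknown mode: {mode}")


# The rebuilt-package case: same size and mtime, different content.
old = describe(b"XS.so build A", mtime=1591500000)
new = describe(b"XS.so build B", mtime=1591500000)

for mode in ("incremental", "checksum", "ignore-times"):
    print(mode, needs_transfer(new, old, mode))
```

Run as-is, this prints False for "incremental" and True for the other two modes. That matches the XS.so case above: with identical size and mtime, a metadata-only comparison sees nothing to do, while a comparison that includes the full-file checksum notices the change.
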
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/