Ah yes... I remember from v3 days that the block checksums were
appended to the pool files...
Craig Barratt via BackupPC-users wrote at about 21:28:49 -0700 on Sunday, June 7, 2020:
 > Jeff,
 > 
 > Yes, that's correct.
 > 
 > In v4 a full backup using --checksum will compare all the metadata and
 > full-file checksum.  Any file that matches all those will be presumed
 > unchanged.  In v4 the server load for a full is very low, since all that
 > metadata (including the full-file checksum) is stored and easily accessed
 > without needing to look at the file contents at all.  An incremental backup
 > just checks all the metadata and not the full-file checksum, which is fast
 > on both the server and client side.  V4 also supports incremental-only (by
 > periodically filling a backup), in cases where that is sufficient.
 > However, that's more risky and not the default.
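
To make the full vs. incremental distinction concrete, here is a minimal sketch in Python (not BackupPC's or rsync_bpc's actual code). The `FileMeta` record and the `make_meta` and `can_skip` helpers are hypothetical names for illustration only; the attribute set (size, mtime, hardlink count) is taken from the config.pl text quoted further down, and the real implementation may well compare more attributes.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file record; not BackupPC's actual attribute format."""
    size: int
    mtime: int
    nlinks: int
    md5: str  # full-file digest as a hex string


def make_meta(data: bytes, mtime: int, nlinks: int = 1) -> FileMeta:
    """Build a record from file contents plus the metadata we care about."""
    return FileMeta(len(data), mtime, nlinks, hashlib.md5(data).hexdigest())


def can_skip(client: FileMeta, server: FileMeta, full: bool) -> bool:
    """Return True if the file is presumed unchanged and is not transferred."""
    same_meta = (client.size, client.mtime, client.nlinks) == (
        server.size, server.mtime, server.nlinks)
    if not full:
        # Incremental: only the metadata is compared.
        return same_meta
    # Full with --checksum: metadata and full-file checksum must both match.
    return same_meta and client.md5 == server.md5


stored = make_meta(b"old contents", mtime=1591500000)
current = make_meta(b"new contents", mtime=1591500000)  # same size, same mtime

print(can_skip(current, stored, full=False))  # True
print(can_skip(current, stored, full=True))   # False
```

Under these assumptions, a file whose contents changed while size and mtime stayed the same is skipped by an incremental and picked up by the next full.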
 > 
 > In v3, a full backup checks the block-based deltas and full-file checksum
 > for every file.  That's a lot more work and seems unnecessary.  You can get
 > that behavior in v4 too by replacing --checksum with --ignore-times, but
 > it's a lot more expensive on the server side since v4 doesn't cache the
 > block and full-file checksums.
 > 
 > While md5 collisions can be constructed with various properties, the chance
 > of a random file change creating a hash collision is 2^-128, as you note.
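
For a sense of scale, the arithmetic behind that figure can be worked through as below. This is only a back-of-the-envelope sketch: it assumes the digest behaves like a uniformly random 128-bit value and that changes are accidental rather than adversarial. The `collision_bound` helper and the figure of 10^12 changes are hypothetical, chosen purely for illustration.

```python
from fractions import Fraction

# Chance that one random change to a file leaves the 128-bit digest unchanged.
p_single = Fraction(1, 2**128)


def collision_bound(changes: int) -> Fraction:
    """Union bound: the probability that at least one of `changes` random
    changes goes unnoticed is at most changes * 2^-128 (capped at 1)."""
    return min(Fraction(1), changes * p_single)


print(f"{float(p_single):.2e}")                 # 2.94e-39
print(f"{float(collision_bound(10**12)):.2e}")  # 2.94e-27
```

Even across a trillion changed files, the bound stays far below any realistic hardware error rate.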
 > 
 > Craig
 > 
 > 
 > On Sun, Jun 7, 2020 at 9:11 PM <backu...@kosowsky.org> wrote:
 > 
 > > Silly me... the '--checksum' is only for 'Full' so that explains the
 > > difference between 'incrementals' and 'fulls'... along with presumably
 > > why my case wasn't caught by an incremental.
 > >
 > > I still don't fully understand the comment referencing V3 and replacing
 > > --checksum with --ignore-times.
 > >
 > > Is the point that v3 compared both full file and block
 > > checksums while in v4 --checksum only compares full file checksums?
 > > And so v3 is more conservative since there might be checksum
 > > collisions of 2 non-identical files at the file-checksum level that
 > > would be unmasked by checksum differences at the block level?
 > > (presumably a very rare event -- presumably < 2^-128 since the hash
 > > itself is 128 bits and the times and size are also checked)
 > >
 > > "" wrote at about 23:54:14 -0400 on Sunday, June 7, 2020:
 > >  > Can someone clarify how --checksum works in v4?
 > >  > And specifically, when could it get 'fooled' thinking 2 files are
 > >  > identical when they really aren't...
 > >  >
 > >  > According to config.pl:
 > >  >
 > >  >    The --checksum argument causes the client to send full-file
 > >  >    checksum for every file (meaning the client reads every file and
 > >  >    computes the checksum, which is sent with the file list).  On the
 > >  >    server, rsync_bpc will skip any files that have a matching
 > >  >    full-file checksum, and size, mtime and number of hardlinks.  Any
 > >  >    file that has different attributes will be updated using the block
 > >  >    rsync algorithm.
 > >  >
 > >  >    In V3, full backups applied the block rsync algorithm to every
 > >  >    file, which is a lot slower but a bit more conservative.  To get
 > >  >    that behavior, replace --checksum with --ignore-times.
 > >  >
 > >  >
 > >  > While according to the 'rsync' man pages:
 > >  >    -c, --checksum
 > >  >    This changes the way rsync checks if the files have been changed
 > >  >    and are in need of a transfer.  Without this option, rsync uses a
 > >  >    "quick check" that (by default) checks if each file’s size and time
 > >  >    of last modification match between the sender and receiver.  This
 > >  >    option changes this to compare a 128-bit checksum for each file
 > >  >    that has a matching size.  Generating the checksums means that both
 > >  >    sides will expend a lot of disk I/O reading all the data in the
 > >  >    files in the transfer (and this is prior to any reading that will
 > >  >    be done to transfer changed files), so this can slow things down
 > >  >    significantly.
 > >  >
 > >  >
 > >  > Note by default:
 > >  > $Conf{RsyncFullArgsExtra} = ['--checksum'];
 > >  >
 > >  > So in v4:
 > >  > - Do incrementals and fulls differ in how/when checksums are used?
 > >  > - For each case, what situations would cause BackupPC to be fooled?
 > >  > - Specifically, I don't understand the comment of replacing --checksum
 > >  >   with --ignore-times since the rsync definition of --checksum
 > >  >   says that it doesn't look at times but a 128-bit file checksum.
 > >  >
 > >  > The reason I ask is that I recompiled a debian package (happens to be
 > >  > libbackuppc-xs-perl) to pull in the latest version 0.60. But I forgot
 > >  > to change the date in the Changelog. When installing the package, the
 > >  > file dates were the same even though the content and file md5sums for
 > >  > some files had changed.
 > >  >
 > >  > Specifically,
 > >  > /usr/lib/x86_64-linux-gnu/perl5/5.26/auto/BackupPC/XS/XS.so
 > >  > had the same size (and date due to my mistake) but a different file
 > >  > md5sum.
 > >  >
 > >  > And an incremental backup didn't detect this difference...
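
The situation described above is easy to reproduce outside BackupPC. The following Python sketch is not rsync's code; `quick_check_equal` and `md5_equal` are hypothetical helpers, and the file names and timestamp are made up. It writes two files of equal size, forces identical modification times, and shows that a size-plus-mtime comparison calls them the same while a full-file MD5 comparison does not.

```python
import hashlib
import os
import tempfile


def quick_check_equal(a: str, b: str) -> bool:
    """Size-and-mtime comparison, in the spirit of rsync's "quick check"."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def md5_equal(a: str, b: str) -> bool:
    """Full-file comparison via MD5 digests (reads both files completely)."""
    def digest(path: str) -> str:
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()
    return digest(a) == digest(b)


with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "XS.so.old")
    new = os.path.join(tmp, "XS.so.new")
    with open(old, "wb") as f:
        f.write(b"build A")
    with open(new, "wb") as f:
        f.write(b"build B")  # same length, different bytes
    # Force identical timestamps, mimicking a rebuild with an unchanged date.
    os.utime(old, (1591500000, 1591500000))
    os.utime(new, (1591500000, 1591500000))

    quick = quick_check_equal(old, new)
    full = md5_equal(old, new)
    print(quick, full)  # True False
```

A check that only looks at size and mtime has no way to notice this kind of change; only a comparison that reads the contents does.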
 > >  >
 > >  >
 > >  > _______________________________________________
 > >  > BackupPC-users mailing list
 > >  > BackupPC-users@lists.sourceforge.net
 > >  > List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
 > >  > Wiki:    http://backuppc.wiki.sourceforge.net
 > >  > Project: http://backuppc.sourceforge.net/
