Re: [BackupPC-users] BackupPC V4 and --checksum

2018-07-30 Thread Guillermo Rozas
>
> IIUC, you want a way to check the integrity of the pool files on the
> server side.
>

Yes


> BackupPC 3 used to have such a function, by re-checksumming and
> verifying some percentage of the pool during a nightly (can't remember
> the details, and I don't have the v3 docs available).
>

Found it here:
https://backuppc.github.io/backuppc/BackupPC-3.3.2.html#Rsync-checksum-caching

The wording further confirms that V4 won't checksum the files once they're
added to the pool, contrary to what I believed.


> If you want to do this for yourself, it's pretty easy with a cronjob.
> Just compare, for all files in $topDir/pool/*/*/, their md5sum with the
> filename. Same = good, not the same = bad.
> If your pool is compressed, pipe the compressed files in
> $topDir/cpool/*/*/ through pigz [1] (which, as opposed to gzip, can
> handle the headerless gz format used there), as in the following piece
> of bash:
>
>   digest=$(pigz -dc "$file" | md5sum -b | cut -d' ' -f1)
>
> Now, check if $digest equals the file name (the basename of $file), and
> you have a sanity check. (It's slightly more annoying to find out where
> $file was referenced in case it is corrupted; but it's possible, and I
> recommend not to worry about that until it happens.)
>

Perfect, thanks! I can then use --checksum to verify the client, and a
script to checksum the server off-line from time to time. The best of both
worlds :)

Regards,
Guillermo
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] BackupPC V4 and --checksum

2018-07-30 Thread Alexander Kobel

Hi,

On 2018-07-28 20:04, Guillermo Rozas wrote:
> Agreed, that is my situation. I'm reasonably sure of the system (UPS,
> Debian stable, ext4), but as my backups are relatively small (1 [...] can
> trade some extra hours of backup once in a while for the extra peace
> of mind.


IIUC, you want a way to check the integrity of the pool files on the 
server side.
BackupPC 3 used to have such a function, by re-checksumming and 
verifying some percentage of the pool during a nightly (can't remember 
the details, and I don't have the v3 docs available).


If you want to do this for yourself, it's pretty easy with a cronjob. 
Just compare, for all files in $topDir/pool/*/*/, their md5sum with the 
filename. Same = good, not the same = bad.
If your pool is compressed, pipe the compressed files in 
$topDir/cpool/*/*/ through pigz [1] (which, as opposed to gzip, can 
handle the headerless gz format used there), as in the following piece 
of bash:


  digest=$(pigz -dc "$file" | md5sum -b | cut -d' ' -f1)

Now, check if $digest equals the file name (the basename of $file), and
you have a sanity check. (It's slightly more annoying to find out where
$file was referenced in case it is corrupted; but it's possible, and I
recommend not to worry about that until it happens.)
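
A minimal sketch of that check, written as two POSIX shell functions. It is
not part of BackupPC: the names verify_pool_file and scrub_dir are
hypothetical, and it assumes that the first 32 characters of a pool file's
name are the hex MD5 digest of its uncompressed content. The decompressor is
passed as an argument, so the same code can serve the uncompressed pool (cat)
and, per the description above, the compressed one (pigz -dc):

```shell
#!/bin/sh
# Sketch only: compare each pool file's MD5 digest with its file name.
# Assumption: the first 32 characters of the name are the hex MD5 digest
# of the uncompressed content.

# verify_pool_file FILE DECOMPRESSOR [ARGS...]
# Succeeds (exit 0) when the digest of the decoded content matches the name.
verify_pool_file() (
    file=$1; shift
    expected=$(basename "$file" | cut -c1-32)
    # The decompressor reads the pool file on stdin and writes to stdout.
    actual=$("$@" < "$file" | md5sum | cut -d' ' -f1)
    [ "$expected" = "$actual" ]
)

# scrub_dir DIR DECOMPRESSOR [ARGS...]
# Checks every regular file below DIR, prints one line per mismatch and
# returns 1 if at least one file failed the check.
scrub_dir() (
    dir=$1; shift
    status=0
    find "$dir" -type f | {
        while IFS= read -r f; do
            verify_pool_file "$f" "$@" || { echo "MISMATCH: $f"; status=1; }
        done
        exit "$status"
    }
)
```

Possible invocations, with $topDir standing for your own BackupPC data
directory:

  scrub_dir "$topDir/pool" cat
  scrub_dir "$topDir/cpool" pigz -dc

Mismatches are only reported; nothing in the pool is modified.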


Of course, you can easily scrub only a part of your pool, just choose 
how many subdirectories you want to process each night.
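
One way to pick "a part of the pool" per night is to split the top-level
subdirectories into a fixed number of slices and process one slice per run.
The following is only a sketch under that assumption; select_slice is a
hypothetical helper, not a BackupPC tool, and it makes no assumption about
how the subdirectories are named:

```shell
#!/bin/sh
# select_slice POOL_DIR SLICES INDEX
# Prints every top-level subdirectory of POOL_DIR whose position (in the
# shell's sorted glob order, counted from 0) satisfies
# position % SLICES == INDEX.
select_slice() (
    pool=$1
    slices=$2
    index=$3
    i=0
    for d in "$pool"/*/; do
        # An unmatched glob stays literal; skip it.
        [ -d "$d" ] || continue
        if [ $((i % slices)) -eq "$index" ]; then
            printf '%s\n' "${d%/}"
        fi
        i=$((i + 1))
    done
)

# Example for a nightly cron job that covers the pool once a week
# (date +%u prints 1..7, so the index runs over 0..6):
#   select_slice "$topDir/cpool" 7 "$(( $(date +%u) % 7 ))"
```

Each printed directory can then be handed to whatever per-file check you
use, such as the pigz/md5sum comparison above.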



  1: https://zlib.net/pigz/


HTH,
Alex





Re: [BackupPC-users] BackupPC V4 and --checksum

2018-07-28 Thread Guillermo Rozas
Hi! Thanks for the answer


> > ... rsync --checksum only checksums files on the client, not the
> > server.  I find this strange because not only the manual says
> > otherwise ...
>
> It is not clear to me what document ("manual") you are reading which
> leads you to the conclusions which you seem to have drawn.  If you can
> give links to the document(s), and quote(s), that might assist.
>

I was reading the same passages you mentioned, but interpreting them in a
different way.


> [quote]
> *   Uses full-file MD5 digests, which are stored in the directory attrib
>  files. Each backup directory only contains an empty attrib file whose
>  name includes its own MD5 digest, which is used to look up the attrib
>  file's contents in the pool. In turn, that file contains the metadata
>  for every file in that directory, including each file's MD5 digest.
> [/quote]
>
> I take this to mean that, in order to find the checksums for the files
> on the client, the server looks in the files in its data directory for
> that client precisely because, when it does so, it does NOT then need
> to read pool files (to re-calculate the checksums) because it has done
> that work already and saved the results in the filesystem.


Agreed, this could be an interpretation. However, a bit below it says:

[quote]
*  rsync-bpc doesn't support checksum caching
[/quote]

Which I interpreted as 'It uses the MD5 digest names only for file
reference, but it doesn't rely on them for file integrity. Therefore, it
will checksum the files again'. After that, my mind was set: I knew
BackupPC already had the checksums, but I thought they were not used by
rsync-bpc.

Your email prompted me to re-examine that, and sure enough there is this
comment on https://github.com/backuppc/rsync-bpc/blob/master/checksum.c:

[quote]
* Try to grab the digest from the attributes, which are both MD5 for
protocol >= 30.
* Otherwise fall through and do it the slow way.
[/quote]

So does this settle the question? In V4, rsync-bpc uses the attributes' MD5
as a cache for the full-file checksum (which is used by --checksum), but it
has no caching for the block checksums (used by --ignore-times)?

> Naturally, using this approach, you rely on the integrity of the
> previously saved pool data.


Agreed, that is my situation. I'm reasonably sure of the system (UPS, Debian
stable, ext4), but as my backups are relatively small (1 [...] can trade some
extra hours of backup once in a while for the extra peace of mind.

> This seems to me further to confirm my interpretation of the earlier
> quote, and also to suggest the behaviour which you yourself describe
> in your posts.  It explicitly refers to "a more conservative approach"
> which may be what you want.
>

Yes. However, as the same documentation says:

[quote]
* The use of rsync --checksum allows BackupPC to guess a potential match
anywhere in the pool, even on a first-time backup. In that case, the usual
rsync block checksums are still exchanged to make sure the complete file is
identical.
[/quote]

I thought it would be better to use --checksum. But if --checksum doesn't
actually checksum the files on the server each time, I agree that using
--ignore-times is a better fit for my use case at this point. Thanks.

Regards,
Guillermo


Re: [BackupPC-users] BackupPC V4 and --checksum

2018-07-28 Thread G.W. Haywood via BackupPC-users

Hi there,

On Sat, 28 Jul 2018, Guillermo Rozas wrote:


> ... rsync --checksum only checksums files on the client, not the
> server.  I find this strange because not only the manual says
> otherwise ...


It is not clear to me what document ("manual") you are reading which
leads you to the conclusions which you seem to have drawn.  If you can
give links to the document(s), and quote(s), that might assist.

Quoting from the file .../backuppc-master/doc-src/BackupPC.pod which I
downloaded today from Github:

[quote]
*   Uses full-file MD5 digests, which are stored in the directory attrib
files. Each backup directory only contains an empty attrib file whose
name includes its own MD5 digest, which is used to look up the attrib
file's contents in the pool. In turn, that file contains the metadata
for every file in that directory, including each file's MD5 digest.
[/quote]

I take this to mean that, in order to find the checksums for the files
on the client, the server looks in the files in its data directory for
that client precisely because, when it does so, it does NOT then need
to read pool files (to re-calculate the checksums) because it has done
that work already and saved the results in the filesystem.  Naturally,
using this approach, you rely on the integrity of the previously saved
pool data.  That seems to me to be a very reasonable approach if, for
example, (1) you are confident of the reliability of your power supply,
your hardware, and your choices of OS and filesystem; (2) the backup
server is dedicated to the task (perhaps even if it is a shared server
but the backup data store is on a dedicated partition) so you can be
confident that errant processes will not unexpectedly damage the data;
and (3) neither life nor limb will depend on the backup.

Also from the same document:

[quote]
*   An rsync "full" backup now uses --checksum (instead of --ignore-times),
which is much more efficient on the server side - the server just
needs to check the full-file checksum computed by the client,
together with the mtime, nlinks, size attributes, to see if the
file has changed. If you want a more conservative approach, you
can change it back to --ignore-times, which requires the server to
send block checksums to the client.
[/quote]

This seems to me further to confirm my interpretation of the earlier
quote, and also to suggest the behaviour which you yourself describe
in your posts.  It explicitly refers to "a more conservative approach"
which may be what you want.

--

73,
Ged.



Re: [BackupPC-users] BackupPC V4 and --checksum

2018-07-28 Thread Guillermo Rozas
Hi,

Does anybody have a tip on this? Both my computers (Win10, Ubuntu 17.10)
have already aged to the point of making their first full backups on their
own, and the 'problem' persists: rsync --checksum only checksums files on
the client, not the server.

I find this strange because not only the manual says otherwise, but there
are comments on this list and even a feature request mentioning the slow
checksumming on the server during full backups!

I'd appreciate any help to elucidate what's happening on my system. I need
checksumming working on the server for data integrity reasons. Thanks!

Regards,
Guillermo

On Tue, Jun 26, 2018 at 10:44 PM Guillermo Rozas wrote:

> Hi,
>
> I've recently installed BackupPC 4.2.1 on my home server (ARMBIAN
> 5.38), and I'm trying to understand the behavior of rsync's --checksum
> option on V4.
>
> According to the docs, V4 doesn't have checksum caching, so I was
> expecting the server to read and checksum all the files during a full
> backup. What I'm seeing is the complete opposite: it seems my server
> is not checksumming any file during a full backup with the --checksum
> option. Reading operations on the backup disk are kept to a minimum
> (less than 300kB/s). Is this the expected behavior or maybe I have a
> problem somewhere?
>
> For reference, the same backup using --ignore-times maxes out the
> server's capacity at 20MB/s. In both cases the client behaves as
> expected, checksumming everything. Actually, the duration of a
> full --checksum backup is essentially defined by the reading speed of
> the client disk, although I know that the server's reading speed is
> much slower (all this is running a full backup immediately after an
> incremental, so no files are transferred).
>
> Any help will be appreciated. I can send log files if needed.
> Best regards,
> Guillermo
>


[BackupPC-users] BackupPC V4 and --checksum

2018-06-26 Thread Guillermo Rozas
Hi,

I've recently installed BackupPC 4.2.1 on my home server (ARMBIAN
5.38), and I'm trying to understand the behavior of rsync's --checksum
option on V4.

According to the docs, V4 doesn't have checksum caching, so I was
expecting the server to read and checksum all the files during a full
backup. What I'm seeing is the complete opposite: it seems my server
is not checksumming any file during a full backup with the --checksum
option. Reading operations on the backup disk are kept to a minimum
(less than 300kB/s). Is this the expected behavior or maybe I have a
problem somewhere?

For reference, the same backup using --ignore-times maxes out the
server's capacity at 20MB/s. In both cases the client behaves as
expected, checksumming everything. Actually, the duration of a
full --checksum backup is essentially defined by the reading speed of
the client disk, although I know that the server's reading speed is
much slower (all this is running a full backup immediately after an
incremental, so no files are transferred).

Any help will be appreciated. I can send log files if needed.
Best regards,
Guillermo
