Re: [BackupPC-users] Bug in rsync_bpc --sparse?

2018-01-21 Thread Craig Barratt via BackupPC-users
Alex,

I pushed some changes to rsync-bpc (and to 3.0.9) a few weeks ago so that it
now exits with an error if --sparse is specified.
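
If --sparse is still lurking in your config, a quick check (assuming the stock
config location; adjust the paths for your install) is something like:

  % grep -n 'sparse' /etc/BackupPC/config.pl /etc/BackupPC/pc/*.pl

and then drop it from $Conf{RsyncArgs} / $Conf{RsyncArgsExtra} before the next
backup runs.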

Craig

On Fri, Dec 15, 2017 at 2:31 AM, Alexander Kobel  wrote:

> Hi Craig,
>
> thanks for your swift reply.
>
> On 2017-12-15 05:17, Craig Barratt via BackupPC-users wrote:
>
>> Unfortunately sparse files are not supported by rsync_bpc, and there are
>> no plans to do so.
>>
>
> Okay. Not a big impact for BackupPC's files anyway - I just thought it was
> safe and harmless, but I've been proven wrong...
>
>> I should make it a fatal error if that option is specified.
>>
>
> Yes, that would be great to avoid future mistakes.
>
>> I believe a full backup (without --sparse of course) should update the
>> files to their correct state.
>>
>
> Okay. Just so I understand: the MD5 hashes are generated on the server
> side, correct? So a file that was transferred incorrectly will not be
> stored under the hash of the original file? And a full backup does not
> just skip based on size, times and names, but checks the actual hash of
> the file? In that case I see why running a full backup should resolve
> everything.
>
>
> May I ask you for a short comment on point d) as well? If it's
> complicated, don't bother. But I found earlier questions on the mailing
> list about decompressing an entire pool in order to use ZFS or Btrfs
> compression, and while converting the pool is officially unsupported, I
> might try it if (and only if) my assumptions about what needs to be done
> are correct.
>
>> d) On my *actual* server, I used compression. This incident taught
>> me to verify some of the files manually, and to perhaps migrate to
>> filesystem compression (which I had planned anyway) to keep things
>> as simple as possible.
>>d.1) BackupPC_zcat for verifying/decompressing has considerable
>> overhead for a large set of small files (even when pointed
>> directly to the pool files). From what I can tell, Adler's pigz [2]
>> implementation supports headerless zlib files and is *way* faster.
>> Also, all my tests show that files decompress to output with the
>> expected hashes encoded in the filename. However, I remembered that
>> BackupPC's compression flushes mid-stream, apparently much like
>> pigz. Are BackupPC's compressed files *fully* in default zlib
>> format, or do I need to expect trouble with large files in corner
>> cases?
>>d.2) Conceptually, what is needed to convert an entire v4 pool to
>> uncompressed storage? Is it just
>> - decompression of all files from cpool/??/?? to pool/??/??
>> (identical names, because hashes are computed on the decompressed
>> data)
>> - move poolCnt files from cpool/?? to pool/??
>> - replace compression level in pc/$host/backups and
>> pc/$host/nnn/backupInfo
>>or do any refCnt files need to be touched as well?
>>
>
>
> Thanks a lot,
> Alex
>
>


Re: [BackupPC-users] Bug in rsync_bpc --sparse?

2017-12-15 Thread Alexander Kobel

Hi Craig,

thanks for your swift reply.

On 2017-12-15 05:17, Craig Barratt via BackupPC-users wrote:
Unfortunately sparse files are not supported by rsync_bpc, and there are 
no plans to do so.


Okay. Not a big impact for BackupPC's files anyway - I just thought it
was safe and harmless, but I've been proven wrong...



I should make it a fatal error if that option is specified.


Yes, that would be great to avoid future mistakes.

I believe a full backup (without --sparse of course) should update the 
files to their correct state.


Okay. Just so I understand: the MD5 hashes are generated on the server
side, correct? So a file that was transferred incorrectly will not be
stored under the hash of the original file? And a full backup does not
just skip based on size, times and names, but checks the actual hash of
the file? In that case I see why running a full backup should resolve
everything.
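
(For what it's worth, my own spot check on the uncompressed test pool is
roughly the following - file name hypothetical, default TopDir assumed; a v4
pool file is named after the MD5 digest of its uncompressed content:

  % sum=$(md5sum core.db | cut -d' ' -f1)
  % sudo find /var/lib/backuppc/pool -type f -name "$sum"

If nothing turns up, only the corrupted variant made it into the pool.)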



May I ask you for a short comment on point d) as well? If it's
complicated, don't bother. But I found earlier questions on the mailing
list about decompressing an entire pool in order to use ZFS or Btrfs
compression, and while converting the pool is officially unsupported, I
might try it if (and only if) my assumptions about what needs to be done
are correct.

d) On my *actual* server, I used compression. This incident taught
me to verify some of the files manually, and to perhaps migrate to
filesystem compression (which I had planned anyway) to keep things
as simple as possible.
   d.1) BackupPC_zcat for verifying/decompressing has considerable
overhead for a large set of small files (even when pointed
directly to the pool files). From what I can tell, Adler's pigz [2]
implementation supports headerless zlib files and is *way* faster.
Also, all my tests show that files decompress to output with the
expected hashes encoded in the filename. However, I remembered that
BackupPC's compression flushes mid-stream, apparently much like
pigz. Are BackupPC's compressed files *fully* in default zlib
format, or do I need to expect trouble with large files in corner cases?
   d.2) Conceptually, what is needed to convert an entire v4 pool to
uncompressed storage? Is it just
    - decompression of all files from cpool/??/?? to pool/??/??
(identical names, because hashes are computed on the decompressed data)
    - move poolCnt files from cpool/?? to pool/??
    - replace compression level in pc/$host/backups and
pc/$host/nnn/backupInfo
   or do any refCnt files need to be touched as well?
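
To make d.1 and d.2 a bit more concrete, this is roughly what I have in mind -
untested and unsupported of course, TopDir assumed to be the default, run as
the backuppc user, and the slow-but-known-correct check done with
BackupPC_zcat rather than pigz:

  # d.1: spot-check cpool files - decompressed content must hash to the file's name
  topdir=/var/lib/backuppc
  find "$topdir/cpool" -type f ! -name 'poolCnt*' | while read -r f; do
      want=$(basename "$f")
      got=$(BackupPC_zcat "$f" | md5sum | cut -d' ' -f1)
      [ "$want" = "$got" ] || echo "MISMATCH: $f -> $got"
  done

  # d.2: rough conversion sketch (server stopped, verified copy of the pool at hand!)
  find "$topdir/cpool" -type f ! -name 'poolCnt*' | while read -r f; do
      rel=${f#"$topdir/cpool/"}                    # e.g. 12/34/<digest>
      mkdir -p "$topdir/pool/$(dirname "$rel")"
      BackupPC_zcat "$f" > "$topdir/pool/$rel"     # same name: digest is over uncompressed data
  done
  for d in "$topdir"/cpool/??; do                  # move the per-directory poolCnt files
      mkdir -p "$topdir/pool/$(basename "$d")"
      mv "$d"/poolCnt* "$topdir/pool/$(basename "$d")/" 2>/dev/null
  done
  # ...and set the compress level to 0 in pc/$host/backups and pc/$host/nnn/backupInfo
  # (exact field positions assumed unknown here - that is part of what I'm asking)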



Thanks a lot,
Alex





Re: [BackupPC-users] Bug in rsync_bpc --sparse?

2017-12-14 Thread Craig Barratt via BackupPC-users
Unfortunately sparse files are not supported by rsync_bpc, and there are no
plans to do so.

I should make it a fatal error if that option is specified.

I believe a full backup (without --sparse of course) should update the
files to their correct state.

Craig

On Thu, Dec 14, 2017 at 4:59 AM, Alexander Kobel  wrote:

> Dear all,
>
> a while ago I started to include '--sparse' in RsyncArgs.  Now I have
> realized that some files in my backups are corrupted - namely, some of
> those backed up after the configuration change.
>
> I tried to track down the problem and it appears that --sparse is indeed
> the culprit.  I have attached a file that demonstrates the problem, tested
> on Arch Linux's most recent backuppc package, which contains BackupPC-4.1.5,
> BackupPC-XS 0.57 and rsync_bpc 3.0.9.10.  The build script can be seen at
> [1], but it's pretty much as vanilla as it gets.  Rsync on the client side
> is again from Arch's official repo, version 3.1.2, protocol 31.  In fact,
> the client is the same machine in this case, but the connection still goes
> through ssh, to simulate the same conditions.
> FWIW, I disabled compression for the test.  Otherwise, the configuration
> of the live server is essentially vanilla except for some excludes.
>
> The original file in this test is core.db, with md5sum 9bde...  Without
> --sparse, the file is stored correctly in $topDir/pool/09/de/9bde
> With --sparse, however, this file does not appear.  Via the web interface,
> I recovered the file as core.db--sparse, with md5sum 7a9e..., and it turned
> out that this is indeed stored in the corresponding pool folder.
>
> So I get:
>
> % md5sum core*
> 9bde60210cc9a3d57fd6aa69d2a964d3  core.db
> 7a9e44d8fcc02d98a2a28d8d9069f448  core.db--sparse
> 9bde60210cc9a3d57fd6aa69d2a964d3  core.db_no--sparse
>
> % diff <(hexdump core.db) <(hexdump core.db--sparse)
> 3584c3584
> < 000dff0 489e 25a7 d831 8731 6ca0 46a3 5dd7 00ee
> ---
> > 000dff0 489e 25a7 d831 8731 6ca0 46a3 5dd7 90ee
> 8072c8072
> < 001f870 5d5a 4a81  0008
> ---
> > 001f870 5d5a 4a81  ad08
>
> Test files straight out of /dev/urandom are unaffected, which was the
> crucial hint that led me to --sparse in the first place - they have no
> sparsity at all.  However, I even found some highly compressed data (e.g.,
> JPEGs) that are affected.
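>
> (If anyone wants to reproduce this without my core.db, a synthetic file with
> long zero runs should do - the file name is arbitrary, something like:
>
> % truncate -s 64M sparse-test.bin
> % dd if=/dev/urandom of=sparse-test.bin bs=1M count=1 seek=8 conv=notrunc
> % dd if=/dev/urandom of=sparse-test.bin bs=1M count=1 seek=40 conv=notrunc
> % md5sum sparse-test.bin
>
> then back it up once with and once without --sparse in RsyncArgs and compare
> the md5sums of the restored copies against the original.)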
>
>
> So, couple of questions.
>
> a) Can someone confirm (preferably not on a live host and not with real
> data, of course)?
>
> b) Is --sparse plainly not supported by rsync_bpc? If so, should it be
> disabled?
>
> c) Is there any way to reliably list, verify and, if possible, retransfer
> files that have been transferred within the past n days (where n is,
> obviously, today minus the date I introduced --sparse in my args) without
> retransferring *everything*?  (I'm not sure whether any files, especially
> from full runs, have been skipped entirely - or, in other words, whether
> the rolling hashes of rsync have been correct or not.)
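>
> (The closest I have come up with myself - admittedly crude - is to dump the
> latest backup of a suspect share with BackupPC_tarCreate and compare it
> against the live client; host and share names below are placeholders:
>
> % sudo -u backuppc BackupPC_tarCreate -h myhost -n -1 -s /home . > /tmp/home.tar
> % mkdir /tmp/restore && tar -xf /tmp/home.tar -C /tmp/restore
> % diff -r /tmp/restore /home    # adjust to wherever the share lands inside the tar
>
> but that reads everything rather than just the last n days.)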
>
>
> And finally, somewhat unrelated:
>
> d) On my *actual* server, I used compression. This incident taught me to
> verify some of the files manually, and to perhaps migrate to filesystem
> compression (which I had planned anyway) to keep things as simple as
> possible.
>   d.1) BackupPC_zcat for verifying/decompressing has considerable overhead
> for a large set of small files (even when pointed directly to the pool
> files). From what I can tell, Adler's pigz [2] implementation supports
> headerless zlib files and is *way* faster. Also, all my tests show that
> files decompress to output with the expected hashes encoded in the filename.
> However, I remembered that BackupPC's compression flushes mid-stream,
> apparently much like pigz. Are BackupPC's compressed files *fully* in
> default zlib format, or do I need to expect trouble with large files in
> corner cases?
>   d.2) Conceptually, what is needed to convert an entire v4 pool to
> uncompressed storage? Is it just
>- decompression of all files from cpool/??/?? to pool/??/?? (identical
> names, because hashes are computed on the decompressed data)
>- move poolCnt files from cpool/?? to pool/??
>- replace compression level in pc/$host/backups and
> pc/$host/nnn/backupInfo
>   or do any refCnt files need to be touched as well?
>
>
> [1] https://git.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/backuppc
> [2] https://zlib.net/pigz/
>
>
> Best,
> Alex
>