Re: [BackupPC-users] Find file given digest (and a decompression error)

2018-11-27 Thread Guillermo Rozas
>
> > Of course, you're right :) (although pigz failed only in 2 files out of
> > several thousands).
>
> oh well, I was wondering about that. I've yet to see such a file (and
> probably never will, because I disabled pool compression for good and
> now use btrfs' lzop filesystem-based compression), but...
>

I've found a third one; all of them are 6GB+ ISOs. I'm starting to see a pattern :P

> BackupPC_zcat decompresses both files correctly and their checksums are
> > correct now. However, at least with one of the files there is something
> > fishy going on because the compressed version is 60KB, the decompressed
> > is 7GB!
>
> I'd bet that those two files are extremely sparse.
> There are good reasons for such a file to be generated: e.g., from a
> ddrescue run that skipped lots of bad areas on a drive, or a VM disk
> image with a recently formatted partition, or similar. On many modern
> file systems supporting sparse files, the overhead for the holes in the
> file is negligible, so it's easier from a user perspective to allocate
> the "full" file and rely on the filesystem's abilities to optimize
> storage and access.
> However, some of BackupPC's transfer methods (in particular, rsync)
> cannot handle sparse files natively, but since they compress so well,
> that's hardly an issue for transfer or storage on the server.
>

Thanks for the nice explanation. Unfortunately, in this case the reason was
rather more mundane: me failing to properly count the digits of a big
number...


> The reason why I recommended pigz (unfortunately without an appropriate
> disclaimer) is that it
> - never failed on me, for the files I had around at that time, and
> - it was *magnitudes* faster than BackupPC_zcat.
>
> But I had a severely CPU-limited machine; YMMV with a more powerful CPU.
> Depending on your use case (and performance experience), it might still
> be clever to run pigz first and only run BackupPC_zcat if there is a
> mismatch. If a pigz-decompressed file matches the expected hash, I'd bet
> at approximately 1 : 2^64 that no corruption happened.
>

I'm very severely CPU-limited (Banana Pi), so this can make a huge
difference. I tested it by checking two top-level cpool dirs (roughly 1/64
~ 1.5% of the pool), comparing pigz, zlib-flate and BackupPC_zcat. On my
system (a rough way to reproduce the comparison is sketched after the list):
- both pigz and zlib-flate are much faster than BackupPC_zcat; they take
around a quarter of the time to check the files
- pigz is marginally faster than zlib-flate
- BackupPC_zcat puts the lowest load on the CPU; zlib-flate's load is 30-35%
higher, and pigz's is a whopping 80-100% higher (pigz's load actually goes
above 2 on this 2-core system)
- of course, BackupPC_zcat has the advantage of always working; zlib-flate
and pigz fail on the same files (very few)
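
For reference, a rough way to reproduce such a comparison on a single pool
file (the paths are only examples; the location of BackupPC_zcat depends on
the install):

    # Pick one compressed pool file (hypothetical path, just for illustration)
    f=/var/lib/backuppc/cpool/12/34/0123456789abcdef0123456789abcdef

    # Wall-clock and CPU time for each decompressor; output discarded
    time /usr/share/backuppc/bin/BackupPC_zcat "$f" > /dev/null
    time zlib-flate -uncompress < "$f" > /dev/null
    time pigz -dc < "$f" > /dev/null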

Based on this data, I modified my script to normally run zlib-flate to check
the files, and to re-check every failure with BackupPC_zcat before calling it
a real error. I think this strikes the best balance between load on the
system and time spent checking the pool (I can traverse the entire pool in 32
days with ~30 min of checking every day).
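
A minimal sketch of that approach (not the actual script), assuming a
BackupPC v4 cpool layout where each pool file is named after the MD5 digest
of its uncompressed contents; all paths are examples:

    #!/bin/bash
    # Check one top-level cpool directory per run: decompress each pool file,
    # hash it, and compare against the file name. zlib-flate (from qpdf) is
    # the fast path; BackupPC_zcat is the authoritative re-check on mismatch.

    POOL=/var/lib/backuppc/cpool                  # adjust to your TopDir
    ZCAT=/usr/share/backuppc/bin/BackupPC_zcat    # adjust to your install
    DIR="$POOL/${1:?usage: $0 <top-level dir, e.g. 1e>}"

    # Only look at digest-named files, skipping any bookkeeping files.
    find "$DIR" -type f | grep -E '/[0-9a-f]{32,}$' | while read -r f; do
        name=$(basename "$f")
        sum=$(zlib-flate -uncompress < "$f" 2>/dev/null | md5sum | cut -d' ' -f1)
        if [ "$sum" != "$name" ]; then
            # Re-check with BackupPC_zcat before declaring a real error.
            sum=$("$ZCAT" "$f" | md5sum | cut -d' ' -f1)
            [ "$sum" != "$name" ] && echo "MISMATCH: $f"
        fi
    done

Fed a few different top-level directories from cron each day, something like
this spreads the full-pool check over roughly a month, as described above.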

> I'll check those 2 files tonight, and hopefully
> > have a script working by the weekend.
>
> Cool! If you don't mind and are allowed to, please share here...
>

The check script is almost there; I want to verify it for a couple more days
before sharing it. The find script seems a bit harder to code than I first
thought :)

Cheers,
Guillermo


Re: [BackupPC-users] Find file given digest (and a decompression error)

2018-11-27 Thread Alexander Kobel

Hi,

On 27.11.18 13:40, Guillermo Rozas wrote:

> > Pigz doesn't correctly support BackupPC compressed files, although
> > it will in some cases.  The reported error is likely the problem.
> > Please use BackupPC_zcat instead and report back.
>
> Of course, you're right :) (although pigz failed only in 2 files out of
> several thousands).


oh well, I was wondering about that. I've yet to see such a file (and 
probably never will, because I disabled pool compression for good and 
now use btrfs' lzop filesystem-based compression), but...


> BackupPC_zcat decompresses both files correctly and their checksums are
> correct now. However, at least with one of the files there is something
> fishy going on because the compressed version is 60KB, the decompressed
> is 7GB!


... from what I understand, the main difference between "standard"
gzip/pigz and BackupPC's variant is that the latter adds additional
"sync" operations when a (e.g. sparse) file compresses way better than
usual. In that case, BackupPC's zipping mechanism ensures that
decompression only requires a fixed amount of memory, at the expense
that extremely compressible data does not compress to 0.001%, but
only to 0.01% or something. (I'm lacking the details, sorry.)
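
As a rough illustration (only a sketch with qpdf's zlib-flate, not
necessarily BackupPC's exact on-disk format), a file made of two
concatenated zlib streams already shows the symptom:

    # Two back-to-back zlib streams in one file.
    printf 'part one\n' | zlib-flate -compress  >  demo.z
    printf 'part two\n' | zlib-flate -compress  >> demo.z

    # A plain single-stream decompressor typically recovers only the first
    # stream and flags the remainder (pigz reports it as trailing junk, as
    # seen on the two pool files above); BackupPC_zcat handles such files.
    pigz -dc < demo.z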


I'd bet that those two files are extremely sparse.
There are good reasons for such a file to be generated: e.g., from a 
ddrescue run that skipped lots of bad areas on a drive, or a VM disk 
image with a recently formatted partition, or similar. On many modern 
file systems supporting sparse files, the overhead for the holes in the 
file is negligible, so it's easier from a user perspective to allocate 
the "full" file and rely on the filesystem's abilities to optimize 
storage and access.
However, some of BackupPC's transfer methods (in particular, rsync)
cannot handle sparse files natively, but since they compress so well,
that's hardly an issue for transfer or storage on the server.
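
A quick way to see the effect (a sketch assuming GNU coreutils; the size is
arbitrary):

    # Create a 1 GiB file that is entirely a hole, i.e. no blocks allocated.
    truncate -s 1G sparse.img

    du -h --apparent-size sparse.img   # logical size: 1.0G
    du -h sparse.img                   # allocated size: ~0
    gzip -c sparse.img | wc -c         # compresses to a tiny fraction of that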



The reason why I recommended pigz (unfortunately without an appropriate 
disclaimer) is that it

- never failed on me, for the files I had around at that time, and
- it was *magnitudes* faster than BackupPC_zcat.

But I had a severely CPU-limited machine; YMMV with a more powerful CPU.
Depending on your use case (and performance experience), it might still 
be clever to run pigz first and only run BackupPC_zcat if there is a 
mismatch. If a pigz-decompressed file matches the expected hash, I'd bet 
at approximately 1 : 2^64 that no corruption happened.



> Which brings me to:
>
> > I added a guide to the Wiki to find out where a pool file is
> > referenced.
>
> That's great, thanks!


Indeed, thanks!

> I'll check those 2 files tonight, and hopefully
> have a script working by the weekend.


Cool! If you don't mind and are allowed to, please share here...


Cheers,
Alex





Re: [BackupPC-users] Find file given digest (and a decompression error)

2018-11-27 Thread Guillermo Rozas
>
> Pigz doesn't correctly support BackupPC compressed files, although it will
> in some cases.  The reported error is likely the problem.  Please use
> BackupPC_zcat instead and report back.
>

Of course, you're right :) (although pigz failed only in 2 files out of
several thousands).

BackupPC_zcat decompresses both files correctly and their checksums are
correct now. However, at least with one of the files there is something
fishy going on because the compressed version is 60KB, the decompressed is
7GB! Which brings me to:

> I added a guide to the Wiki to find out where a pool file is referenced.
>

That's great, thanks! I'll check those 2 files tonight, and hopefully have
a script working by the weekend.

Best regards,
Guillermo


Re: [BackupPC-users] Find file given digest (and a decompression error)

2018-11-26 Thread Craig Barratt via BackupPC-users
Guillermo,

> Actually, I'm still unsure if there is a problem with the files themselves
> or with the decompression, as pigz complains that "trailing junk was
> ignored" on both files. Any idea why that could be?


Pigz doesn't correctly support BackupPC compressed files, although it will
in some cases.  The reported error is likely the problem.  Please use
BackupPC_zcat instead and report back.

I added a guide to the Wiki to find out where a pool file is referenced.

Craig


On Mon, Nov 26, 2018 at 4:31 PM Guillermo Rozas wrote:

> Hi,
>
> following the advice from Alex [1] I successfully created a script to
> check the cpool on the server for checksum errors. So far I've found 2
> files whose names do not match their checksums. Is there a simple way to
> find which paths correspond to those files, given their MD5 digest?
>
> Actually, I'm still unsure if there is a problem with the files themselves
> or with the decompression, as pigz complains that "trailing junk was
> ignored" on both files. Any idea why could be that?
>
> Best regards,
> Guillermo
>
> [1] https://sourceforge.net/p/backuppc/mailman/message/36379588/
>


[BackupPC-users] Find file given digest (and a decompression error)

2018-11-26 Thread Guillermo Rozas
Hi,

following the advice from Alex [1] I successfully created a script to check
the cpool on the server for checksum errors. So far I've found 2 files
whose names do not match their checksums. Is there a simple way to find
which paths correspond to those files, given their MD5 digest?

Actually, I'm still unsure if there is a problem with the files themselves
or with the decompression, as pigz complains that "trailing junk was
ignored" on both files. Any idea why could be that?

Best regards,
Guillermo

[1] https://sourceforge.net/p/backuppc/mailman/message/36379588/
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/