Hi,

On 1/27/21 10:58 PM, backu...@kosowsky.org wrote:
I know this question has been asked in the more distant past, but I
would like to get the latest views, as relevant to backuppc 4.x

I have my TopDir on a btrfs filesystem which has file-level
compression capabilities (using the mount option -o compress=lzo for
example).

I use the same setup, both as a daily driver on my machines and for my BackupPC pool. And I've been an early adopter of zstd instead of lzo, which I cannot praise highly enough.
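For reference, this is the kind of mount option I mean. A minimal /etc/fstab sketch, where the mount point and the level are just examples (3 is btrfs' default zstd level):

  # /etc/fstab
  UUID=<pool-fs-uuid>  /var/lib/backuppc  btrfs  compress=zstd:3,noatime  0  0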

I can do either:
1. Cpool with no btrfs compression
2. Pool with btrfs compression
3. Cpool plus btrfs compression (presumably no advantage)

Correct IMHO. The compressed cpool data will not compress any further. So I'll only comment on scenarios 1 and 2.

Throughout, I'll assume rsync transfers. Educated guess: the arguments hold for tar and rsyncd. For smb, no idea; decompression speed could be even more relevant.

I would like to understand the pros/cons of #1 vs. #2, considering
among other things:
1. Backup speed, including:
      - Initial backup of new files
      - Subsequent incremental/full backups of the same (unchanged) file
      - Subsequent incremental/full backups of the same changed file

For initial backups and changes, it depends on your BackupPC server CPU. The zlib compression in BackupPC is *way* more resource hungry than lzop or zstd. You probably want to make sure that the network bandwidth is the bottleneck rather than compressor throughput:

  # pv shows how fast each compressor can consume input;
  # compare that rate against your network bandwidth
  pv $somebigfile | gzip -c > /dev/null
  pv $somebigfile | zstd -c > /dev/null
  pv $somebigfile | lzop -c > /dev/null


+/- multithreading, check for yourself.
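For example, if your zstd build is multithreaded, -T0 uses all cores and gives you the upper bound:

  pv $somebigfile | zstd -T0 -c > /dev/null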

Unchanged files are essentially free with both cpool and pool+btrfs-comp for incrementals, but require decompression for full backups, except with rsync (as the hashes are always built over the uncompressed content). Same for the nightlies, where integrity checks over your pool data are done. Decompression is significantly faster than compression, of course, but still differs vastly between the three algorithms. For fast full backups, you might want to ensure that you can decompress several times faster than your network throughput.
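As a rough sketch for the decompression side, assuming you kept compressed copies of $somebigfile from the test above (the file names are just placeholders):

  gzip -dc $somebigfile.gz  | pv > /dev/null
  zstd -dc $somebigfile.zst | pv > /dev/null
  lzop -dc $somebigfile.lzo | pv > /dev/null

Here pv measures the rate of uncompressed output, which is the figure to compare against your network bandwidth.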

2. Storage efficiency, including:
      - Raw compression efficiency of each file

Cpool does file-level compression, btrfs does block-level compression. The difference is measurable, but not huge (roughly 1 to 2% in compression ratio in my experience for the same algorithm, i.e. zstd at block vs. file level). Btrfs also includes logic to not even attempt compression if a block looks like it's not going to compress well; in practice that's hardly ever an issue.

So, yes, using zlib at the same compression level, btrfs compresses slightly worse than BackupPC. But btrfs also offers lzo and zstd.
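If you want the numbers for your own pool, compsize (packaged as btrfs-compsize on Debian-like distros; the path below is just an example) reports the actual on-disk ratio per compression algorithm:

  sudo compsize /var/lib/backuppc/pool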

      - Ability to take advantage of btrfs extent deduplication for 2
        distinct files that share some or all of the same (uncompressed) content

Won't work with cpool compression.
For pool+btrfs-comp, it's hard to assess; it depends on how your data changes. Effectively, this only helps with large files that are mostly identical, such as VM images. Block-level dedup is difficult, it's only available as offline dedup in btrfs, and you risk losing every backup that references a shared block if the single remaining copy of that block gets corrupted. For me that's a no-go, but YMMV, in particular with RAID-1.

File level deduplication is irrelevant, because BackupPC takes care of that by pooling.
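If you want to experiment with extent-level dedup despite the caveats above, duperemove is the usual offline tool for btrfs; the path is again just an example, and I'd try it on a scratch subvolume first:

  duperemove -dhr /var/lib/backuppc/pool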

3. Robustness in case of disk crashes, file corruption, file system
    corruption, other types of "bit rot" etc.
    (note my btrfs filesystem is in a btrfs-native Raid-1
    configuration)

DISCLAIMER: These are instances holding personal data of a few people. I care about the data, but there are no lives or jobs at stake.


Solid in my experience. Make sure to perform regular scrubs and check that you get informed about problems. On my backup system, I only ever saw problems once, when the HDD was about to die. No RAID to help, so this was fatal for a dozen files, which I had to recover from a second off-site BackupPC server.
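What I mean by regular scrubs, as a minimal cron sketch (path and schedule are just examples; -B keeps the scrub in the foreground so its summary ends up in the cron mail):

  # /etc/cron.d/btrfs-scrub
  0 3 * * Sun  root  btrfs scrub start -B /var/lib/backuppc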

On my laptops, I saw scrub errors five or six times after power losses under heavy load. That's less than one occasion per year, but still, it happened.

On a side note, theoretically you won't need nightly pool checks if you run btrfs scrub at the same rate.

With kernel 5.10 being an LTS release, we even have a stable kernel (plus a fallback) supporting xxhash/blake2/sha256 checksums, which is great at least from a theoretical perspective.
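Note that the checksum algorithm is chosen when the filesystem is created (the device name is a placeholder, and as far as I know you can't convert an existing filesystem in place):

  mkfs.btrfs --csum xxhash /dev/sdX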



In case there *is* a defect, however, there aren't a whole lot of recovery options on btrfs. I wasn't able to recover from any of the above scrub errors; I had to delete the affected files.


In the past, it seems like the tradeoffs were not always clear so
hoping the above outline will help flesh out the details...

Looking for both real-world experience as well as theoretical
observations :)

Theoretically, pool+btrfs-comp with zstd is hard to beat. You won't find a better trade-off between resource usage and compression ratio these days.

Also, I believe it's cleaner and more elegant to keep compression separate from BackupPC. Storing and retrieving files efficiently is what filesystems are for; BackupPC is busy enough already with rotating backups, deduplication and transfers.
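For completeness, on a fresh install scenario 2 boils down to disabling BackupPC's own compression in config.pl and letting the mount option shown earlier do the work (switching an existing cpool over is a different story):

  $Conf{CompressLevel} = 0;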



Practically, it depends on how much trust you put into btrfs.

Definitive answers for that one are expected to be available immediately after the emacs-vs-vi question is settled for good.



HTH,
Alex
