On 12/20/2018 09:40 PM, Marek Marczykowski-Górecki wrote:
Thanks for doing this!

I haven't really looked at the code, but I have a more generic comment:

The idea of small, frequent snapshots to collect modified-block bitmaps
is neat. But I'd really, really like to avoid inventing yet another
backup archive format. The current Qubes backup format has its own
limitations, and while I have some ideas[1] how to plug incremental
backups in there, I don't think there is a future in that. On the other
hand, there are already solutions using a very similar approach for
handling incremental backups (basically, do not differentiate between
"full", "incremental" and "differential" backups, but split the data set
into chunks and send only those not already present in the backup archive).
And those already have established formats, including encryption and
integrity protection. Specifically, I'm looking into two of them:
  - duplicity
  - BorgBackup
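
As a rough illustration of the chunk-based approach described above, here is a minimal Python sketch. It is an assumption-laden toy: it uses fixed-size chunks and an in-memory dict as the "archive", and it does not reflect the actual archive format or chunking strategy of duplicity or BorgBackup. The names `backup`, `restore` and `store` are hypothetical.

```python
import hashlib


def backup(data, store, chunk_size=4096):
    """Add to `store` only the chunks it does not already hold.

    `store` is a plain dict mapping sha256 hex digest -> chunk bytes
    (a stand-in for a real backup archive). Returns the ordered list of
    chunk digests (the "recipe") and the number of chunks actually sent.
    """
    recipe = []
    sent = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:
            # Only chunks that are new to the archive are transferred.
            store[key] = chunk
            sent += 1
        recipe.append(key)
    return recipe, sent


def restore(recipe, store):
    """Rebuild a data set from its recipe; every backup restores the same way."""
    return b"".join(store[key] for key in recipe)
```

With this layout there is no separate "full" or "incremental" backup type: the second run over a slightly modified data set simply ends up sending fewer chunks.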


I think it's about time someone in open source created an analog to the Time Machine sparsebundle format, just because it's so effective and _simple_: fixed-size chunks of the volume stored as files with filenames representing addresses, and manifests with sha256 hashes. There's scarcely anything more to it than that, and it's simple enough to be processed by shell commands like find, cp and zcat (see 'spbk-assemble' for a functional example).
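
To make the layout concrete, here is a minimal Python sketch of this class of format. It is not Sparsebak's actual code or on-disk layout: the chunk size, the `x<hex offset>` filename scheme, the `manifest` filename, and the function names `write_chunks` and `assemble` are all assumptions made for illustration, and compression is left out.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # assumed chunk size, for illustration only


def write_chunks(volume_path, dest_dir, chunk_size=CHUNK_SIZE):
    """Split a volume into fixed-size chunk files named by their byte offset.

    Also writes a plain-text manifest with one "<sha256> <filename>" line
    per chunk, and returns the same data as a list of tuples.
    """
    os.makedirs(dest_dir, exist_ok=True)
    manifest = []
    with open(volume_path, "rb") as vol:
        offset = 0
        while True:
            data = vol.read(chunk_size)
            if not data:
                break
            name = "x%016x" % offset  # filename encodes the chunk address
            with open(os.path.join(dest_dir, name), "wb") as out:
                out.write(data)
            manifest.append((hashlib.sha256(data).hexdigest(), name))
            offset += len(data)
    with open(os.path.join(dest_dir, "manifest"), "w") as mf:
        for digest, name in manifest:
            mf.write("%s %s\n" % (digest, name))
    return manifest


def assemble(src_dir, out_path):
    """Rebuild the volume, verifying each chunk against the manifest."""
    with open(os.path.join(src_dir, "manifest")) as mf, \
            open(out_path, "wb") as out:
        for line in mf:
            digest, name = line.split()
            with open(os.path.join(src_dir, name), "rb") as chunk:
                data = chunk.read()
            if hashlib.sha256(data).hexdigest() != digest:
                raise ValueError("hash mismatch in chunk " + name)
            out.seek(int(name[1:], 16))  # address comes from the filename
            out.write(data)
```

Because each chunk's position comes from its filename, a later session only needs to write the chunks that changed, and restoring is a matter of taking the newest copy of each address.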

This already works on millions of Mac systems where people expect it to provide hourly backups without noticeably affecting system resources. This class of format I don't mind creating; I think Apple chose well.

As for borg, I'm not sure a heavy emphasis on deduplication is appropriate for many PC applications. It's a resource drain that leads to complex archive formats on the back end. And my initial testing suggests the dedup efficacy is oversold: Sparsebak can sometimes produce smaller multi-generation archives even without dedup.

(We should also consider that among users of de-duplicating filesystems like ZFS and Btrfs, the feature is rarely used in an always-on fashion because of its resource cost.)

But mainly I have doubts about adopting a program like borg as a back end while intending to replace its... back end ...in service of the original goal which is to replace its... front end. This situation speaks to the fact that borg was not designed for this type of use case; it wants files as input, and the docs only briefly mention whole-volume data sets when they suggest using image files. That's why using it here is still largely academic.

duplicity has the nice property that it's easy to integrate with
"untrusted backup storage" - like an AppVM, or even sending to some cloud
service via an AppVM. Because its model explicitly assumes it, and it has
an API for that. Rudd-O has even written such a plugin already: [2].

Duplicity is far from having that low-maintenance Time Machine quality that quickly prunes old backups instead of filling up the destination volume and requiring user administration in the form of deleting entire archives and establishing new ones. You might as well adapt qvm-backup to accommodate increments; either way, you get tar files.


One issue with duplicity is its usage of gpg, which should be done
carefully, because gpg is rather keen on passing untrusted data through
a lot of code paths unless explicitly told what to do. Even in the PoC
linked above, Rudd-O suggested some non-default gpg options...

As for BorgBackup, AFAIR the encryption scheme is done better there (this
needs verification), but on the other hand, it doesn't have such a
flexible API for plugging in an alternative store for the backup. It can
back up either to a local directory or to a remote server speaking a
BorgBackup-specific protocol (in practice - running the borg tool over ssh).
But the threat model does assume the possibility of that server being
malicious, and I believe the client is written with that assumption.

It would be a good idea to talk with the BorgBackup developers (of which at
least one does use Qubes OS ;) ) about possible integration here. I think
this could include these areas:
  - more abstract and simplified handling of remote repositories (like
    duplicity)


Actually this is one of Sparsebak's strong points... very low interactivity during remote operations.

--

Chris Laprise, tas...@posteo.net
https://github.com/tasket
https://twitter.com/ttaskett
PGP: BEE2 20C5 356E 764A 73EB  4AB3 1DC4 D106 F07F 1886
