On 12/31/2018 08:49 AM, Brendan Hoar wrote:
On Saturday, December 29, 2018 at 2:30:12 PM UTC-5, Chris Laprise wrote:
Also note that we'd like to have at least some level of hiding metadata
- - like VM names (leaked through file names).

I have an idea for a relatively simple obfuscation layer that could even
re-order the transmission of chunks in addition to concealing filenames.
It would use an additional index with randomized names and the order
shuffled. Implementing this, I surmise, could improve robustness of the
encryption.
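
A minimal sketch of what such an index might look like, assuming chunk files are identified by name and that random 128-bit hex names are acceptable on the destination. The function name and the sample paths are hypothetical and not taken from sparsebak:

```python
import secrets

def build_obfuscation_index(chunk_names):
    """Map each real chunk name to a random name and pick a shuffled send order.

    Returns (index, send_order): index maps real name -> random name,
    send_order lists the random names in the order they would be sent.
    """
    # 16 random bytes -> 32 hex characters; collisions are practically impossible.
    index = {name: secrets.token_hex(16) for name in chunk_names}
    send_order = list(index.values())
    # Shuffle with an OS-backed RNG so the order reveals nothing about the volume layout.
    secrets.SystemRandom().shuffle(send_order)
    return index, send_order

# Hypothetical chunk names that would otherwise leak the VM name.
names = ["vm-work-private/x0000", "vm-work-private/x0001", "vm-work-private/x0002"]
index, send_order = build_obfuscation_index(names)
```

The index itself would presumably need to be kept locally or stored encrypted, since anyone who can read it can undo the renaming.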
...
Yes, keeping in mind the chunk size I'm using currently is 128kB with
fixed boundaries. I've experimented with simple retroactive dedup based
on sorting the manifest hashes and that can save a little space with
almost no time/power cost. This could be done at send time to save
bandwidth, but that savings may not be worth it. OTOH, if we expect some
users to back up related cloned VMs (common with templates) the potential
savings then become very significant even with this simple method.
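
The sort-based approach can be sketched roughly as follows, assuming the manifest is available as (hash, chunk path) pairs. The function name, the manifest layout and the sample paths are illustrative assumptions, not sparsebak's actual format:

```python
from itertools import groupby
from operator import itemgetter

def find_duplicate_chunks(manifest):
    """Group chunks by hash after sorting; return (kept_path, duplicate_paths) pairs.

    manifest: iterable of (hash_hex, chunk_path) tuples.
    """
    # Sorting puts identical hashes next to each other, so one pass finds all matches.
    ordered = sorted(manifest, key=itemgetter(0))
    duplicates = []
    for _digest, group in groupby(ordered, key=itemgetter(0)):
        paths = [path for _, path in group]
        if len(paths) > 1:
            # Keep the first copy; the rest are candidates for space reclamation.
            duplicates.append((paths[0], paths[1:]))
    return duplicates

# Hypothetical manifest entries for a VM and its clone.
manifest = [
    ("aa11", "vm-work/x0000"),
    ("bb22", "vm-work/x0001"),
    ("aa11", "vm-clone/x0000"),
]
dupes = find_duplicate_chunks(manifest)
```

Here `dupes` pairs `vm-work/x0000` with its duplicate `vm-clone/x0000`. What happens to the duplicates afterwards (for instance, replacing them with links to the kept copy) is left open in this sketch.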

I tend to keep one or two clones of each template some number of weeks of updates behind, just in 
case an update (especially a *-testing update) goes awry. I think this approach is useful for most 
folks who are trying to balance "more secure by updating regularly" and "able to 
manually recover when a template stops working". So: a backup regime that can dedupe on some 
level would be very welcome.

Q: Speaking of hashes (this is regardless of the encryption question): are the 
hashes in sparsebak salted per qubes system (or backup set?)...or would the 
same hash on two different (non-cloned) Qubes systems match for the same 
content?

They're not salted, so there are likely to be at least some matches between systems. However, the precise version of the template you started with on each system (preferably the same version) will play a role in how much can be matched. To increase the chance of matching, the chunk size used for backups could be reduced as well.
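
To illustrate the difference, here is a small sketch comparing an unsalted hash with a keyed one. SHA-256 and HMAC are assumptions chosen for the example (the thread does not say which hash sparsebak uses), and the per-system secrets are made up:

```python
import hashlib
import hmac

# One 128kB chunk of identical content, as it might appear on two systems.
chunk = b"\x00" * (128 * 1024)

def plain_hash(data):
    """Unsalted hash: depends only on the content."""
    return hashlib.sha256(data).hexdigest()

def salted_hash(data, salt):
    """Keyed hash: depends on the content and a per-system secret."""
    return hmac.new(salt, data, hashlib.sha256).hexdigest()

# Hypothetical per-system secrets.
salt_a = b"system-A-secret"
salt_b = b"system-B-secret"

same_unsalted = plain_hash(chunk) == plain_hash(bytes(chunk))
same_salted = salted_hash(chunk, salt_a) == salted_hash(chunk, salt_b)
```

With unsalted hashes the same content matches everywhere (`same_unsalted` is true), which is what allows matching across systems but also lets an observer recognize known content. With per-system salts the hashes no longer match (`same_salted` is false).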

Note that templates are usually very compressible, so it's possible that compression -- and the incremental delta technique -- will make a bigger difference than dedup overall. This is especially true since re-doing initial backups isn't needed, so the vast majority of backups are increments, which are probably small.

Also, this won't save bandwidth in the near future. Dedup will begin as an optional process that retroactively reclaims space on the destination after send operations. I do have an idea for efficient matching before transmission, but it will take several months to work out.

-

On a different performance note, I'd like to mention something about the ballyhooed borg. In my tests with 8GB volume data and 3GB of incremental updates over wifi, clearing the obsolete 3GB in pruning operations took borg 65sec while pruning out the same 3GB with sparsebak took only 11sec.

That's a factor of about six, and a difference of almost a minute.

This raises questions about the apparent need for borg to re-write some data blocks (or at least large amounts of metadata) in what is probably a highly interactive process. Assuming a user might want backup frequency like they're used to on a Mac with Time Machine, this means pruning will eventually start to kick in many times each day; it's a significant factor in performance and may also have ramifications for security.

-


And Chris: thanks for all your contributions to Qubes usability, I really 
appreciate it.

Brendan

Thanks for the feedback! And if you have suggestions for the release name ("Sparsebak" it will not be!) I'd like to hear them.


--

Chris Laprise, tas...@posteo.net
https://github.com/tasket
https://twitter.com/ttaskett
PGP: BEE2 20C5 356E 764A 73EB  4AB3 1DC4 D106 F07F 1886
