On Tue, 2014-10-21 at 13:31 +0200, Lennart Poettering wrote:
> On Mon, 13.10.14 08:44, Alexander Larsson ([email protected]) wrote:
>
> > In some sense it is unavoidable. We have to tie the exact file data to
> > the signature. However, does this mean we have to shove random bits at
> > the kernel rather than going through the syscall interface?
> >
> > btrfs-receive is a userspace tool that uses the regular userspace i/o
> > syscalls to do its modifications. How does this propose to handle the
> > signatures? If it can do it, why would it not be possible to do
> > ourselves?
>
> Sure, it's possible to implement our own btrfs send/recv
> implementation in userspace.
>
> At LPC we sat down with Chris Mason about this, and it's certainly an
> option for us, the code for serializing/deserializing things is
> supposedly not that difficult.

btrfs-send is implemented in kernel space (as an ioctl), but all it
generates is an array of "op + data" tuples which btrfs-receive applies
using the normal syscalls (e.g. op=write, data=file,offset,content, or
op=rename, data=src,dest). This is very nice for things like an
incremental backup of a database where only some blocks of the file
changed. However, for an app upgrade you generally rebuild from scratch,
you don't actually modify the previous release. The delta must then be
generated by a userspace tool, e.g. by rsyncing the new release over the
old, so the use of btrfs-send is really just a way to encode the output
of rsync.

> > > Also, the hardlink farms are certainly not pretty.
> >
> > They are not pretty, sure. However they are very widely available, and
> > the *only* solution that allows page-cache sharing between images, and
> > "trivial" deduplication between unrelated images. I don't think we
> > should too easily dismiss it.
>
> So, we asked Chris about dedup. He basically said that online dedup is
> there, and will be done implicitly when you do btrfs recv hence. Or in
> other words, dedup is really nothing we need to actively think about
> if we use btrfs, it's just there.

That is only dedup of the parent<->child though, not between unrelated
images. And the dedup is only on the disk, not in page cache.

> > > Harald has been playing around with some build logic that makes sure
> > > that rebuilt app updates are efficiently shipped as btrfs send/recv,
> > > with stable inode numbers and stuff.
> >
> > How exactly do you envision this would work in practice for updates? Say
> > you have an application that receives regular updates (major and minor).
> > At any time the user comes in and does a fetch-from-scratch, or an update
> > between two essentially "random" versions. What does the server store?
> > A copy of each full image? Only for major versions? Delta in between each
> > consecutive image? Delta between each possible image pair?
>
> Well, it could certainly generate the diffs on the fly, by looking at
> the actual btrfs volumes with their subvolumes. However, I'd assume
> we'd pre-generate relevant deltas in advance, maybe in logarithmically
> increasing distances.

You don't want the servers to be doing "smart" things; they are
typically very dumb mirroring systems that just store and deliver plain
files. So, in the btrfs case one would have to store minimally the
initial version and all incremental deltas, and then to decrease the
amount users have to download you have to start duplicating this by
adding various kinds of deltas and full versions, plus some kind of
indexing system for these so you know what is available. Not
impossible, but it's not trivial either, and you'll have to duplicate a
lot of data for the initial download of a "random" (i.e. not the
first) version to be fast.

_______________________________________________
gnome-os-list mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/gnome-os-list
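
To make the "indexing system" point concrete, here is a rough sketch of what a client would have to do with such an index: given the full images and deltas a dumb mirror happens to store, pick the cheapest set of files to download. Nothing here is an existing tool or format. The function name `cheapest_download`, the shape of the `fulls` and `deltas` tables, and the byte sizes are all made up for illustration, and it assumes a delta can only be applied on top of exactly the version it was generated against.

```python
import heapq
from itertools import count


def cheapest_download(fulls, deltas, have, want):
    """Return (total_bytes, steps) for the cheapest way to reach `want`.

    fulls:  {version: size} for the full images the mirror stores
    deltas: {(from_version, to_version): size} for the stored deltas
    have:   version already installed locally, or None for a fresh fetch
    Returns None if `want` cannot be reached from what is available.
    (Hypothetical index layout, not a real on-disk or wire format.)
    """
    outgoing = {}
    for (src, dst), size in deltas.items():
        outgoing.setdefault(src, []).append((dst, size))

    tie = count()  # tie-breaker so the heap never compares step lists
    heap = [(size, next(tie), v, [("full", v)]) for v, size in fulls.items()]
    if have is not None:
        heap.append((0, next(tie), have, []))  # already on disk, costs nothing
    heapq.heapify(heap)

    done = set()
    while heap:  # plain Dijkstra over versions, edge weight = bytes to fetch
        cost, _, version, steps = heapq.heappop(heap)
        if version == want:
            return cost, steps
        if version in done:
            continue
        done.add(version)
        for dst, size in outgoing.get(version, []):
            if dst not in done:
                step = ("delta", version, dst)
                heapq.heappush(heap, (cost + size, next(tie), dst, steps + [step]))
    return None


if __name__ == "__main__":
    # One full image (v1) plus consecutive deltas and one longer-distance delta.
    fulls = {1: 100}
    deltas = {(1, 2): 10, (2, 3): 10, (3, 4): 10, (1, 3): 15}
    print(cheapest_download(fulls, deltas, None, 4))  # fresh install of v4
    print(cheapest_download(fulls, deltas, 2, 4))     # upgrade from v2 to v4
```

With only the initial version stored as a full image, a fresh install of a late version has to fetch that image plus a chain of deltas. Making it cheap means adding more full images or longer-distance deltas to the tables, which is exactly the duplication described above.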
