Re: backing up a file server with many subvolumes

Peter Grandi Sun, 26 Mar 2017 13:25:08 -0700

> [ ... ] In each filesystem subdirectory are incremental
> snapshot subvolumes for that filesystem.  [ ... ] The scheme
> is something like this:


> <top>/backup/<machine>/<filesystem>/<many snapshot subvolumes>

BTW hopefully this does not amount to too many subvolumes in
the '.../backup/' volume, because that can create complications,
where "too many" IIRC is more than a few dozen (even if a low
number of hundreds is still doable).

> I'd like to try to back up (duplicate) the file server
> filesystem containing these snapshot subvolumes for each
> remote machine. The problem is that I don't think I can use
> send/receive to do this. "Btrfs send" requires "read-only"
> snapshots, and snapshots are not recursive as yet.

Why is that a problem? What is a recursive snapshot?

> I think there are too many subvolumes which change too often
> to make doing this without recursion practical.

It is not clear to me how the «incremental snapshot subvolumes
for that filesystem» are made, whether with RSYNC or 'send' and
'receive' itself. It is also not clear to me why those snapshots
«change too often», why would they change at all? Once a backup
is made in whichever way to an «incremental snapshot», why would
that «incremental snapshot» ever change but for being deleted?

There are some tools that rely on the specific abilities of
'send' with options '-p' and '-c' to save a lot of network
bandwidth and target storage space; you might be interested
in searching for them.

Anyhow I'll repeat here part of an answer to a similar message:
issues like yours usually stem from an incomplete understanding
of 'send' and 'receive', which user "darkling" explained fairly
well on IRC:

> When you use -c, you're telling the FS that it can expect to
> find a sent copy of that subvol on the receiving side, and
> that anything shared with it can be sent by reference. OK, so
> with -c on its own, you're telling the FS that "all the data
> in this subvol already exists on the remote".

> So, when you send your subvol, *all* of the subvol's metadata
> is sent, and where that metadata refers to an extent that's
> shared with the -c subvol, the extent data isn't sent, because
> it's known to be on the other end already, and can be shared
> directly from there.

> OK. So, with -p, there's a "base" subvol. The send subvol and
> the -p reference subvol are both snapshots of that base (at
> different times). The -p reference subvol, as with -c, is
> assumed to be on the remote FS. However, because it's known to
> be an earlier version of the same data, you can be more
> efficient in the sending by saying "start from the earlier
> version, and modify it in this way to get the new version"

> So, with -p, not all of the metadata is sent, because you know
> you've already got most of it on the remote in the form of the
> earlier version.

> So -p is "take this thing and apply these differences to it"
> and -c is "build this thing from scratch, but you can share
> some of the data with these sources"
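
To make the two modes concrete, here is a minimal sketch in
Python that only builds and prints the 'btrfs send' command
lines; it does not run anything. The helper names 'send_cmd' and
'as_pipeline' and all the paths are made up for illustration and
are not from the IRC log or from the original poster's setup.

```python
import shlex

def send_cmd(snapshot, parent=None, clone_sources=()):
    """Build (but do not run) an argument list for 'btrfs send'."""
    cmd = ["btrfs", "send"]
    if parent is not None:
        # -p: send only the differences relative to this earlier snapshot
        cmd += ["-p", parent]
    for src in clone_sources:
        # -c: all metadata is sent, but extents shared with this
        # subvolume are sent by reference instead of as data
        cmd += ["-c", src]
    cmd.append(snapshot)
    return cmd

def as_pipeline(send, receive_dir):
    """Render 'btrfs send ... | btrfs receive DIR' as text, for display only."""
    recv = ["btrfs", "receive", receive_dir]
    quote = lambda args: " ".join(shlex.quote(a) for a in args)
    return quote(send) + " | " + quote(recv)

# Hypothetical read-only snapshots of one backed-up filesystem.
OLD = "/top/backup/hostA/root/2017-03-25"
NEW = "/top/backup/hostA/root/2017-03-26"

incremental = send_cmd(NEW, parent=OLD)            # "apply these differences"
with_clones = send_cmd(NEW, clone_sources=[OLD])   # "build from scratch, share data"

print(as_pipeline(incremental, "/mnt/copy/hostA/root"))
print(as_pipeline(with_clones, "/mnt/copy/hostA/root"))
```

In both cases the snapshot named by '-p' or '-c' is assumed to be
already present, unchanged, on the receiving side.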

Some additional details are also here:

  http://logs.tvrrug.org.uk/logs/%23btrfs/2016-06-29.html#2016-06-29T22:39:59

The requirement for read-only snapshots exists because that way
it is fairly certain that the same data is on both the origin
and target volumes.
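
As a small illustration of the read-only point, the sketch below
builds (without running) the command that takes a read-only
snapshot and the command that checks the 'ro' property of an
existing one. The function names and paths are hypothetical; the
'-r' option of 'btrfs subvolume snapshot' and 'btrfs property
get' are the standard btrfs-progs ones.

```python
def readonly_snapshot_cmd(source, dest):
    """Argument list for taking a read-only snapshot of 'source' at 'dest'."""
    # -r makes the new snapshot read-only, which 'btrfs send' requires
    return ["btrfs", "subvolume", "snapshot", "-r", source, dest]

def readonly_check_cmd(subvol):
    """Argument list for querying the 'ro' property of a subvolume."""
    # -ts restricts the property lookup to the subvolume object type
    return ["btrfs", "property", "get", "-ts", subvol, "ro"]

# Hypothetical paths, for illustration only.
live = "/top/backup/hostA/root/current"
snap = "/top/backup/hostA/root/2017-03-26"

print(" ".join(readonly_snapshot_cmd(live, snap)))
print(" ".join(readonly_check_cmd(snap)))
```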

It may help to compare with RSYNC: it has to scan both the full
origin and target trees, because it cannot be told that there is
a parent tree that is the same on origin and target; but with
option '--link-dest' it can do something similar to 'send -c'.
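
For comparison, a rough sketch of the RSYNC side, again only
building the command line rather than running it. With
'--link-dest' files that are unchanged relative to the named
directory are hard-linked from it instead of being copied. The
helper name 'rsync_cmd' and the paths are assumptions made up
for this example.

```python
def rsync_cmd(src, dest, link_dest=None):
    """Build (but do not run) an rsync argument list for one backup run."""
    cmd = ["rsync", "-a"]
    if link_dest is not None:
        # Unchanged files are hard-linked from this earlier backup tree.
        cmd.append("--link-dest=" + link_dest)
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [src.rstrip("/") + "/", dest]
    return cmd

# Hypothetical paths, for illustration only.
previous = "/top/backup/hostA/root/2017-03-25"
today = "/top/backup/hostA/root/2017-03-26"

print(" ".join(rsync_cmd("hostA:/", today, link_dest=previous)))
```

Unlike 'send -p', RSYNC still has to walk the whole origin tree
and compare it against the '--link-dest' tree on every run.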