Re: Copy on write of unmodified data

Hugo Mills Wed, 25 May 2016 05:29:08 -0700

On Wed, May 25, 2016 at 07:45:23AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-05-25 04:58, H. Peter Anvin wrote:
> >Hi,
> >
> >I'm looking at using a btrfs with snapshots to implement a generational
> >backup capacity.  However, doing it the naïve way would have the side
> >effect that for a file that has been partially modified, after
> >snapshotting the file would be written with *mostly* the same data.  How
> >does btrfs' COW algorithm deal with that?  If necessary I might want to
> >write some smarter user space utilities for this.
> >
> I might be completely incorrect about this, but here's what I
> believe happens in this case:
> 1. If the file is small enough that it gets stored in-line in the
> metadata, you can't avoid COW for the whole file.
> 2. If the file is less than the block size (16k is the current
> default in mkfs.btrfs for reasonably sized filesystems), then you
> also can't avoid COW for the whole file.
> 3. If the file is larger than the block size, COW will only happen
> per-block, and extents will get split at block boundaries to
> minimize the amount of duplication.
> 
> This of course requires that the updates are done by partial
> re-writes instead of a replace-by-rename semantic which is
> particularly popular among various software tools.
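Point 3 above comes down to simple arithmetic. The sketch below computes which blocks an in-place write touches and how much new data a block-granular COW filesystem would have to write for it. It is a simplification that ignores inline extents, compression and nodatacow files. The helper name `cow_footprint` is hypothetical, and the 4096-byte default is an assumption: btrfs allocates file data in units of the sector size (typically 4 KiB), while 16 KiB is the default metadata node size.

```python
def cow_footprint(offset: int, length: int, block_size: int = 4096) -> tuple[range, int]:
    """Return (indices of blocks touched, bytes of new data) for an in-place write.

    Hypothetical helper: assumes the filesystem COWs whole blocks, so a write
    covering only part of a block still causes that full block to be written
    elsewhere, while untouched blocks stay shared with existing snapshots.
    """
    if length <= 0:
        return range(0), 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    blocks = range(first, last + 1)
    return blocks, len(blocks) * block_size


# A 100-byte write straddling the 8 KiB boundary touches two 4 KiB blocks...
print(cow_footprint(8190, 100))         # (range(1, 3), 8192)
# ...but only one block if the block size were 16 KiB.
print(cow_footprint(8190, 100, 16384))  # (range(0, 1), 16384)
```

For a large file, the rest of the data stays shared with earlier snapshots; only the handful of touched blocks is duplicated.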


   The reason it's popular is that it can be made atomic -- either the
updates all make it to the named file, or they don't (obviously, only
if it's done in the right way, which many applications don't). If you
overwrite in place, then it can't be an atomic update.
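Hugo does not spell out "the right way" here. A commonly used POSIX pattern is: write a temporary file in the same directory, fsync it, rename it over the target, then fsync the directory. The Python sketch below follows that pattern under those assumptions; `atomic_replace` is a hypothetical name. Note that it writes every block anew, so on btrfs the new file shares nothing with older snapshots, which is exactly the trade-off discussed above.

```python
import os
import tempfile


def atomic_replace(path: str, data: bytes) -> None:
    """Replace path's contents so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # rename() only works within one filesystem, so create the temp file
    # next to the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".replace-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data must be on disk *before* the rename
        try:
            os.chmod(tmp_path, os.stat(path).st_mode & 0o7777)
        except FileNotFoundError:
            pass  # new file: keep mkstemp's 0600 mode
        os.replace(tmp_path, path)  # the atomic step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry change itself
    finally:
        os.close(dir_fd)
```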

   You could get both effects (minimal replacement and atomic update)
if you reflink copy the file, update in place on the copy, and then
replace it atomically, but that of course needs the tool to support it
and fall back to a sane default if reflinks aren't available.
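The reflink-then-rename approach can be sketched in Python for Linux. Assumptions: the copy is made with the `FICLONE` ioctl (the generic form of btrfs's clone ioctl, available since Linux 4.5 and supported by btrfs and by XFS with reflink enabled), updates arrive as `(offset, bytes)` pairs, and `patch_atomically` is an illustrative name, not an existing tool. When cloning fails, it falls back to an ordinary byte copy, one possible "sane default": the update is still atomic, it just loses the block sharing.

```python
import fcntl
import os
import tempfile

# Linux FICLONE ioctl number. Python 3.12+ exports it as fcntl.FICLONE;
# older versions need the raw value _IOW(0x94, 9, int).
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def _clone_or_copy(src_fd: int, dst_fd: int) -> bool:
    """Reflink src into dst if possible, else copy bytes. True if reflinked."""
    try:
        fcntl.ioctl(dst_fd, FICLONE, src_fd)
        return True
    except OSError:
        pass  # no reflink support (ext4, tmpfs, cross-device, ...)
    os.lseek(src_fd, 0, os.SEEK_SET)
    while True:
        chunk = os.read(src_fd, 1 << 20)
        if not chunk:
            return False
        view = memoryview(chunk)
        while view:  # os.write may write less than asked
            view = view[os.write(dst_fd, view):]


def patch_atomically(path: str, patches) -> bool:
    """Apply (offset, bytes) patches to a reflinked copy, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".patch-")
    done = False
    try:
        src_fd = os.open(path, os.O_RDONLY)
        try:
            reflinked = _clone_or_copy(src_fd, tmp_fd)
            os.fchmod(tmp_fd, os.fstat(src_fd).st_mode & 0o7777)
        finally:
            os.close(src_fd)
        # Overwrite only the changed ranges; in a reflinked btrfs copy only
        # the blocks these writes touch need new extents.
        for offset, data in patches:
            os.pwrite(tmp_fd, data, offset)
        os.fsync(tmp_fd)
        done = True
    finally:
        os.close(tmp_fd)
        if not done:
            os.unlink(tmp_path)
    os.replace(tmp_path, path)  # readers see the old or new file, never a mix
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the rename durable
    finally:
        os.close(dir_fd)
    return reflinked
```

The return value tells the caller whether the cheap path was taken, which a backup tool could log or use to warn that snapshots will not share data.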

   Hugo.

> FWIW, while I don't use BTRFS like this (I just use snapshots to get
> a consistent state to copy out for backups, usually doing the actual
> backup using SquashFS), one of my friends uses rsync together with
> BTRFS to do incremental backups of his personal systems.  He runs
> rsync with --inplace on the system being backed up to copy things
> out to a dedicated subvolume on his backup device, and then
> snapshots the subvolume after each backup (and uses a snapshot
> thinning system similar to that used by snapper).  While it's not
> quite as efficient as it could be, it still works well.
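The rsync-plus-snapshot routine described above might look roughly like this. It is a sketch only: the paths, the timestamp naming scheme and the `backup_commands` helper are hypothetical; `--delete` (mirroring deletions) and `--no-whole-file` are additions, not part of the setup described; and snapshot thinning is left out. `--no-whole-file` matters because rsync defaults to `--whole-file` when both paths are local, which would rewrite every block of a changed file and defeat the sharing that `--inplace` is meant to preserve.

```python
import datetime
import os
import shlex
import subprocess


def backup_commands(source: str, subvol: str, snapdir: str,
                    when: datetime.datetime) -> list[list[str]]:
    """Build the rsync and snapshot commands for one backup generation."""
    snapshot = os.path.join(snapdir, when.strftime("%Y-%m-%dT%H%M%S"))
    return [
        # --inplace: update changed files in place instead of via a temp file,
        # so unchanged blocks stay shared with earlier snapshots.
        # --no-whole-file: use the delta algorithm even for local paths.
        ["rsync", "-a", "--delete", "--inplace", "--no-whole-file",
         source.rstrip("/") + "/", subvol],
        # Freeze this generation as a read-only snapshot.
        ["btrfs", "subvolume", "snapshot", "-r", subvol, snapshot],
    ]


def run_backup(source: str, subvol: str, snapdir: str, dry_run: bool = True) -> None:
    for argv in backup_commands(source, subvol, snapdir, datetime.datetime.now()):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
```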
> 
> Alternatively, if you're backing up a BTRFS filesystem to another
> one, you can keep around the previous backup snapshot and do an
> incremental send against that, which will result in proper sharing
> of blocks.  I used to use this before I decided that I wanted better
> space efficiency for backups than BTRFS can currently offer.
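The incremental send approach can be sketched as follows. The snapshot paths and helper names are hypothetical. `btrfs send -p <parent>` requires both snapshots to be read-only and the parent to already exist on the receiving side; the pipe is wired up with `subprocess` following the shell-pipeline pattern from the Python documentation.

```python
import subprocess


def send_receive_commands(parent: str, snapshot: str,
                          dest_dir: str) -> tuple[list[str], list[str]]:
    """argv lists for `btrfs send -p parent snapshot | btrfs receive dest_dir`."""
    # -p: send only the differences from `parent`, so data shared between the
    # two snapshots is also shared on the receiving filesystem.
    return (["btrfs", "send", "-p", parent, snapshot],
            ["btrfs", "receive", dest_dir])


def incremental_send(parent: str, snapshot: str, dest_dir: str) -> None:
    send_argv, recv_argv = send_receive_commands(parent, snapshot, dest_dir)
    sender = subprocess.Popen(send_argv, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_argv, stdin=sender.stdout)
    sender.stdout.close()  # lets `send` get SIGPIPE if `receive` exits early
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if send_rc or recv_rc:
        raise RuntimeError(f"btrfs send exited {send_rc}, receive exited {recv_rc}")
```

After a successful run, the new snapshot becomes the parent for the next incremental send, and the older one can be thinned on both sides.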

-- 
Hugo Mills             | A diverse working environment: Di longer you vork
hugo@... carfax.org.uk | here, di verse it gets
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
