Duncan <1i5t5.dun...@cox.net> wrote:

[...]

That is difficult to wrap your mind around, but well explained. ;-)

> A snapshot thus looks much like a crash in terms of NOCOW file integrity
> since the blocks of a NOCOW file are simply snapshotted in-place, and
> there's already no checksumming or file integrity verification on such
> files -- they're simply directly written in-place (with the exception of
> a single COW write when a writable snapshotted NOCOW file diverges from
> the shared snapshot version).
> 
> But as I said, the applications themselves are normally designed to
> handle and recover from crashes, and in fact, having btrfs try to manage
> it too only complicates things and can actually make it impossible for
> the app to recover what it would have otherwise recovered just fine.
> 
> So it should be with these NOCOW in-place snapshotted files, too.  If a
> NOCOW file is put back into operation from a snapshot, and the file was
> being written to at snapshot time, it'll very likely trigger exactly the
> same response from the application as a crash while writing would have
> triggered, but, the point is, such applications are normally designed to
> deal with just that, and thus, they should recover just as they would
> from a crash.  If they could recover from a crash, it shouldn't be an
> issue.  If they couldn't, well...

So we agree that taking a snapshot looks like a crash from the
application's perspective. That means if there are facilities to instruct
the application to suspend its operations first, you should use them, as
in the InnoDB case:

http://dev.mysql.com/doc/refman/5.1/en/lock-tables.html:

| FLUSH TABLES WITH READ LOCK;
| SHOW MASTER STATUS;
| SYSTEM xfs_freeze -f /var/lib/mysql;
| SYSTEM YOUR_SCRIPT_TO_CREATE_SNAPSHOT.sh;
| SYSTEM xfs_freeze -u /var/lib/mysql;
| UNLOCK TABLES;
| EXIT;

Only that way do you get consistent snapshots without triggering crash
recovery (which might otherwise throw away unrecoverable transactions or
otherwise harm your data for the sake of consistency). In this respect,
InnoDB behaves more or less like a VM filesystem image on btrfs, so the
same approach should be taken for VM images if possible. I think VMware
has facilities to prepare the guest for a snapshot (they are triggered
when you take snapshots with VMware itself, which, by the way, usually
takes much longer than a btrfs snapshot does).

Take xfs for example: although it is crash-safe, it prefers to zero out
your files for security reasons during log replay, because it is
crash-safe only for metadata. If the metadata has already allocated blocks
but the file data has not yet been written, a recovered file could
otherwise end up with wrong content, so it is cleared out. This _IS_NOT_
the situation you want for VM images with xfs inside, hosted on btrfs,
when taking a snapshot. You should trigger xfs_freeze in the guest before
taking the btrfs snapshot on the host.
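
As a rough sketch of that ordering (freeze in the guest, snapshot on the
host, thaw in the guest even if the snapshot fails), something like the
following could work. The function names, the idea of reaching the guest
through a command prefix such as ssh, and all paths are my own assumptions
for illustration; only xfs_freeze -f/-u and "btrfs subvolume snapshot -r"
are real commands here.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen_guest_fs(guest_cmd_prefix, guest_mountpoint, runner):
    """Freeze the guest filesystem and always thaw it again afterwards.

    guest_cmd_prefix is a hypothetical way to reach the guest, for example
    ["ssh", "root@guest"]; an empty list runs the commands locally.
    """
    freeze = list(guest_cmd_prefix) + ["xfs_freeze", "-f", guest_mountpoint]
    thaw = list(guest_cmd_prefix) + ["xfs_freeze", "-u", guest_mountpoint]
    runner(freeze)
    try:
        yield
    finally:
        # Thaw even if the snapshot step raised, so the guest is not
        # left frozen.
        runner(thaw)


def snapshot_frozen_guest(guest_cmd_prefix, guest_mountpoint,
                          host_subvolume, snapshot_path,
                          runner=subprocess.check_call):
    """Take a read-only btrfs snapshot on the host while the guest fs is frozen."""
    with frozen_guest_fs(guest_cmd_prefix, guest_mountpoint, runner):
        runner(["btrfs", "subvolume", "snapshot", "-r",
                host_subvolume, snapshot_path])
```

Passing runner=print instead of the default gives a dry run that only
shows the three commands in the order they would be executed.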

I think the same holds true for most other metadata-only journalling file
systems, which probably do not even zero out files during recovery and
just silently leave your files corrupted after crash recovery.

So in case of a crash or a snapshot (which look the same from the
application's perspective), btrfs' capabilities won't help you (at least
in the nocow case, and probably in the cow case too, because the VM guest
may write blocks out of order without being able to pass write barriers
down to the btrfs cow mechanism). Taking snapshots of database files or VM
images without proper preparation only guarantees you crash-like rollback
situations. Taking snapshots at short intervals only makes this worse, on
top of all the other downsides that frequent snapshots have within btrfs.

I think this is important to understand for people planning to do
automated snapshots of such file data. Making a file nocow only helps the
situation during normal operation - but after a snapshot, a nocow file is
effectively cow again: the first write to each block still shared with the
old generation is copied into the new subvolume generation.

-- 
Replies to list only preferred.
