Duncan <1i5t5.dun...@cox.net> wrote:
> [...]
Difficult to twist your mind around that, but well explained. ;-)

> A snapshot thus looks much like a crash in terms of NOCOW file integrity
> since the blocks of a NOCOW file are simply snapshotted in-place, and
> there's already no checksumming or file integrity verification on such
> files -- they're simply directly written in-place (with the exception of
> a single COW write when a writable snapshotted NOCOW file diverges from
> the shared snapshot version).
>
> But as I said, the applications themselves are normally designed to
> handle and recover from crashes, and in fact, having btrfs try to manage
> it too only complicates things and can actually make it impossible for
> the app to recover what it would have otherwise recovered just fine.
>
> So it should be with these NOCOW in-place snapshotted files, too. If a
> NOCOW file is put back into operation from a snapshot, and the file was
> being written to at snapshot time, it'll very likely trigger exactly the
> same response from the application as a crash while writing would have
> triggered, but, the point is, such applications are normally designed to
> deal with just that, and thus, they should recover just as they would
> from a crash. If they could recover from a crash, it shouldn't be an
> issue. If they couldn't, well...

So we agree that taking a snapshot looks like a crash from the application's perspective.
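To make the quoted argument concrete, here is a toy Python model of a NOCOW file whose physical blocks carry reference counts. It is a hypothetical sketch, not how btrfs is implemented internally; `BlockStore`, `NocowFile` and the block contents are invented for illustration. It shows two things: a snapshot taken between two in-place writes of one logical record preserves a half-updated record, exactly what a crash at that moment would leave; and the first write to a block still shared with a snapshot is relocated (the single COW write), while later writes to that block go in place again.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy model only (hypothetical, not btrfs internals): physical blocks carry a
# reference count, and a file is a list of physical block numbers.
@dataclass
class BlockStore:
    data: Dict[int, str] = field(default_factory=dict)  # block -> contents
    refs: Dict[int, int] = field(default_factory=dict)  # block -> refcount
    next_block: int = 0

    def alloc(self, contents: str) -> int:
        n = self.next_block
        self.next_block += 1
        self.data[n] = contents
        self.refs[n] = 1
        return n


@dataclass
class NocowFile:
    store: BlockStore
    blocks: List[int]

    def write(self, index: int, contents: str) -> bool:
        """Overwrite one logical block; return True if the write had to COW."""
        phys = self.blocks[index]
        if self.store.refs[phys] > 1:
            # Still shared with a snapshot: it cannot be overwritten in place,
            # so this one write relocates the block (the single COW write).
            self.store.refs[phys] -= 1
            self.blocks[index] = self.store.alloc(contents)
            return True
        # Exclusively owned: plain in-place NOCOW overwrite.
        self.store.data[phys] = contents
        return False

    def snapshot(self) -> "NocowFile":
        """Share every current block with a new file, like a subvolume snapshot."""
        for phys in self.blocks:
            self.store.refs[phys] += 1
        return NocowFile(self.store, list(self.blocks))

    def read(self) -> List[str]:
        return [self.store.data[p] for p in self.blocks]


# The application updates a two-block record: block 0 first, then block 1.
store = BlockStore()
live = NocowFile(store, [store.alloc("A1"), store.alloc("B1")])
live.write(0, "A2")
snap = live.snapshot()      # snapshot lands between the two writes
print(live.write(1, "B2"))  # True: first write to a shared block is COW
print(live.write(1, "B3"))  # False: the relocated block is now written in place
print(snap.read())          # ['A2', 'B1']: a torn record, as after a crash
print(live.read())          # ['A2', 'B3']
```

In this model, calling `snapshot()` before every write, which is roughly what very frequent automated snapshots amount to, turns every such write into a COW write.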
Since a snapshot looks like a crash to the application, if there are facilities to instruct the application to suspend its operations first, you should use them, as in the InnoDB case (http://dev.mysql.com/doc/refman/5.1/en/lock-tables.html):

| FLUSH TABLES WITH READ LOCK;
| SHOW MASTER STATUS;
| SYSTEM xfs_freeze -f /var/lib/mysql;
| SYSTEM YOUR_SCRIPT_TO_CREATE_SNAPSHOT.sh;
| SYSTEM xfs_freeze -u /var/lib/mysql;
| UNLOCK TABLES;
| EXIT;

Only that way do you get consistent snapshots without triggering crash recovery (which might otherwise throw away unrecoverable transactions or otherwise harm your data for the sake of consistency).

InnoDB is more or less like a VM filesystem image on btrfs in this case, so the same approach should be taken for VM images where possible. I think VMware has facilities to prepare the guest for a snapshot (they are triggered when you take snapshots with VMware itself, and by the way, that usually takes much longer than a btrfs snapshot does).

Take XFS for example: although it is crash-safe, it prefers to zero out your files during log replay for security reasons, because it is crash-safe only for metadata. If the metadata has already allocated blocks but the file data has not yet been written, a recovered file could otherwise end up with wrong content, so it is cleared out instead. This _IS_NOT_ the situation you want for VM images with XFS inside, hosted on btrfs, when taking a snapshot. You should trigger xfs_freeze in the guest before taking the btrfs snapshot on the host. I think the same holds true for most other metadata-only journalling filesystems, which probably do not even zero out files during recovery and just silently corrupt them.

So in case of a crash or a snapshot (which look the same from the application's perspective), btrfs' capabilities won't help you here: at least not in the NOCOW case, and probably not in the COW case either, because the VM guest may write blocks out of order without being able to pass write barriers down to the btrfs COW mechanism.
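The quiesce, snapshot, resume sequence described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the MySQL procedure itself: it assumes the quiesce and resume steps are external commands, and it takes a read-only snapshot with `btrfs subvolume snapshot -r`. For a libvirt/QEMU guest with the QEMU guest agent installed, `virsh domfsfreeze` and `virsh domfsthaw` would be one option; that is swapped in here in place of the VMware facility mentioned above. The function name `quiesced_snapshot`, the domain name and the paths are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# A runner executes one command. It is injectable so the sequence can be
# exercised without touching a real system.
Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(list(cmd), check=True)


def quiesced_snapshot(freeze_cmd: Sequence[str], thaw_cmd: Sequence[str],
                      source: str, dest: str, runner: Runner = run) -> None:
    """Quiesce, take a read-only btrfs snapshot of `source` at `dest`, resume.

    The thaw command runs in a `finally` block, so the guest or application
    is resumed even if the snapshot command fails.
    """
    runner(freeze_cmd)
    try:
        runner(["btrfs", "subvolume", "snapshot", "-r", source, dest])
    finally:
        runner(thaw_cmd)


# Example call (hypothetical domain name and paths; needs root, btrfs-progs,
# libvirt, and the QEMU guest agent running inside the guest):
# quiesced_snapshot(["virsh", "domfsfreeze", "vm1"],
#                   ["virsh", "domfsthaw", "vm1"],
#                   "/srv/vm-images", "/srv/snapshots/vm-images-1")
```

Putting the thaw in `finally` matters more than it looks: a guest left frozen because the snapshot step failed stalls all of its I/O until someone notices.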
Taking snapshots of database files or VM images without proper preparation only guarantees you crash-like rollback situations. Taking snapshots at short intervals doesn't fix this; it only makes things worse, with all the extra downsides that has within btrfs. I think this is important to understand for anyone planning automated snapshots of such file data. Making a file NOCOW only helps during normal operation: after a snapshot, a NOCOW file is effectively COW again for every block still shared with the old generation, until that block has been written once in the new subvolume generation.

-- 
Replies to list only preferred.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html