Chris Murphy <li...@colorremedies.com> wrote:

>> If the database/virtual machine/whatever is crash safe, then the
>> atomic state that a snapshot grabs will be useful.
> 
> How fast is this state fixed on disk from the time of the snapshot
> command? Loosely speaking. I'm curious if this is < 1 second; a few
> seconds; or possibly up to the 30 second default commit interval? And also
> if it's even related to the commit interval time at all?

Such constructs can only be crash-safe if write barriers are passed down 
through the CoW logic of btrfs to the storage layer. That will probably 
never happen. Atomic and transactional updates cannot happen without write 
barriers or synchronous writes. To make it work otherwise, you need to 
design the storage layers from the ground up to work without write 
barriers, for example with battery-backed write caches, synchronous logical 
file-system layers, etc. Otherwise, the transactional/atomic writes of a 
database/VM/whatever have an undefined state at the lowest storage layer.
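
To illustrate what "synchronous writes" means from the application side, 
here is a minimal Python sketch of the usual write-then-fsync-then-rename 
pattern. The name durable_write is made up for this example, and the 
durability it aims for only holds if every layer below (file system, block 
layer, disk cache) actually honors the flush, which is the point in 
question above.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` so a crash leaves either old or new content.

    Hypothetical helper for illustration; it relies on the lower storage
    layers honoring fsync().
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory (same file system).
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # synchronous: wait until data is on disk
        os.replace(tmp, path)     # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # Flush the directory entry as well, so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the fsync calls are silently dropped somewhere on the way down, the 
rename may reach the disk before the data does, and the application's 
atomicity guarantee is gone.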

> I'm also curious what happens to files that are presently writing. e.g.
> I'm writing a 1GB file to subvol A and before it completes I snapshot
> subvol A into A.1. If I go find the file I was writing to, in A.1, what's
> its state? Truncated? Or are in-progress writes permitted to complete
> if it's a rw snapshot? Any difference in behavior if it's an ro snapshot?

I have wondered about that many times, too. What happens to files being 
written to? I suppose the snapshot takes the current state of the blocks as 
they are at that moment, ignoring pending writes. This means the file being 
written to is probably left in a limbo state.

For example, xfs has an option to freeze the file system to take atomic 
snapshots. You can use that feature to take consistent snapshots of MySQL 
InnoDB files to create a hot-copy backup. But: you first need to instruct 
MySQL to complete its transactions and pause before running xfs_freeze; 
after that is done, you can resume MySQL operations. That tells me that it 
is probably not safe to take snapshots of online databases, even if they 
are crash-safe (and as far as I know, InnoDB is designed to be crash-safe).
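
The ordering of that procedure can be sketched in Python as follows. Only 
the xfs_freeze -f / -u calls are real commands; quiesce, resume and 
take_snapshot are placeholder callables supplied by the caller (for 
example, whatever you use to pause MySQL and to create the snapshot), and 
the function names are made up for this example.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen(mountpoint, run=subprocess.run):
    """Freeze an xfs file system for the duration of the with-block."""
    run(["xfs_freeze", "-f", mountpoint], check=True)
    try:
        yield
    finally:
        # Always unfreeze, even if taking the snapshot failed.
        run(["xfs_freeze", "-u", mountpoint], check=True)


def consistent_snapshot(mountpoint, quiesce, resume, take_snapshot,
                        run=subprocess.run):
    """Quiesce the application, freeze, snapshot, unfreeze, resume.

    quiesce/resume/take_snapshot are hypothetical caller-supplied
    callables; `run` is injectable so the order can be checked without
    touching a real file system.
    """
    quiesce()
    try:
        with frozen(mountpoint, run):
            take_snapshot()
    finally:
        resume()
```

The try/finally nesting is the important part: the file system is always 
unfrozen and the database always resumed, even when the snapshot step 
fails.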

A solution, probably in the far future, could be that a btrfs snapshot 
informs all current file writers to complete their transactions and atomic 
operations, waits until each one signals a ready state, takes the snapshot, 
and then signals the processes to resume operations. For this, the btrfs 
driver could offer some sort of subscription, similar to what inotify 
offers. Processes subscribe to notification broadcasts, and btrfs can wait 
for every process to report a consistent file state. If I remember right, 
reiser4 offered a similar feature (approaching the problem from the 
opposite side): processes were offered an interface to start and commit 
transactions within reiser4. If btrfs had such information from file 
writers, it could take consistent snapshots of online 
databases/VMs/whatever (given that, in the VM case, the guest could pass 
this information to the host). Whatever approach is taken, however, it will 
make the time needed to create a snapshot nondeterministic, since processes 
may not finish their transactions within a reasonable time...
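
A toy in-process simulation of that idea in Python, just to make the 
handshake and the timing problem concrete. Nothing like this exists in 
btrfs; Writer, on_prepare, on_resume and coordinated_snapshot are all 
invented names, and threads stand in for the subscribed processes.

```python
import threading
import time


class Writer:
    """Hypothetical subscribed file writer (simulated with a thread)."""

    def __init__(self, finish_delay):
        self.finish_delay = finish_delay  # time to finish the open transaction
        self.ready = threading.Event()
        self.resumed = threading.Event()

    def on_prepare(self):
        # Finish the in-flight transaction in the background, then report.
        def work():
            time.sleep(self.finish_delay)
            self.ready.set()
        threading.Thread(target=work, daemon=True).start()

    def on_resume(self):
        self.resumed.set()


def coordinated_snapshot(writers, take_snapshot, timeout):
    """Broadcast 'prepare', wait for all writers, snapshot, then resume.

    Returns False without taking a snapshot if any writer fails to
    report ready before the shared deadline.
    """
    for w in writers:
        w.on_prepare()
    deadline = time.monotonic() + timeout
    try:
        for w in writers:
            remaining = max(0.0, deadline - time.monotonic())
            if not w.ready.wait(remaining):
                return False  # a writer did not finish in time
        take_snapshot()
        return True
    finally:
        for w in writers:
            w.on_resume()  # writers are always told to continue
```

The timeout is where the nondeterminism shows up: one slow writer either 
delays the snapshot for everyone or forces it to be abandoned.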

-- 
Replies to list only preferred.
