[Qemu-devel] Storage requirements for live migration

Anthony Liguori Thu, 10 Nov 2011 16:11:58 -0800

I did a brain dump of my understanding of the various storage requirements for live migration. I think it's accurate but I may have misunderstood some details, so I would appreciate review.

I think given sections (1) and (2), the only viable thing is to require cache=none unless we get new interfaces to flush caches.

Section (3) talks about image formats. As I mentioned elsewhere in the thread, I think the best we can do right now is have a block layer interface to quiesce the image format. I think reopen may be a viable short term strategy for qcow2 but I think for raw, we should just make the quiesce operation a nop.


http://wiki.qemu.org/Migration/Storage

Inlined below for ease of review.

Regards,

Anthony Liguori

Migration in QEMU is designed assuming cache coherent shared storage and raw format block devices. There are some cases where migration will also work with more weakly coherent shared storage. This wiki page attempts to outline those scenarios. It also attempts to iterate through the reasons why various image formats do not support migration even with shared storage.


== NFS ==

=== Background ===

NFS only offers close-to-open cache coherence. This means that the only guarantee provided by the protocol is that if you close a file in a client A and then open the file in another client B, client B will see client A's changes.
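As an illustration of close-to-open coherence, here is a minimal toy model in Python. It is not an NFS implementation: the class and method names are invented for this sketch, and it models the weakest behaviour the guarantee permits, where a client talks to the server only on open and close. Real clients usually revalidate more often than this.

```python
class ToyServer:
    """Holds the authoritative contents of each file (hypothetical model)."""

    def __init__(self):
        self.files = {}


class ToyClient:
    """Caches file contents; contacts the server only on open() and close()."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def open(self, name):
        # Revalidate on open: fetch whatever the server currently holds.
        self.cache[name] = self.server.files.get(name, b"")

    def read(self, name):
        # Reads are served from the local cache, which may be stale.
        return self.cache[name]

    def write(self, name, data):
        # Writes stay in the local cache until close().
        self.cache[name] = data
        self.dirty.add(name)

    def close(self, name):
        # Flush on close, but only if this client modified the file.
        if name in self.dirty:
            self.server.files[name] = self.cache[name]
            self.dirty.discard(name)
        del self.cache[name]


def demo_close_to_open():
    server = ToyServer()
    server.files["disk.img"] = b"old"
    a, b = ToyClient(server), ToyClient(server)

    a.open("disk.img")
    b.open("disk.img")
    a.write("disk.img", b"new")
    stale = b.read("disk.img")        # A has not closed yet

    a.close("disk.img")
    still_stale = b.read("disk.img")  # B has not reopened yet

    b.close("disk.img")
    b.open("disk.img")
    fresh = b.read("disk.img")        # close on A, then open on B
    return stale, still_stale, fresh
```

In this model client B only sees A's write after A has closed the file and B has reopened it, which is the one ordering the protocol promises.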

The way migration works in QEMU, the source stops the guest after it sends all of the required data but does not immediately free any resources. This makes migration more reliable since it avoids the Two Generals Problem, allowing a reliable third node to make the final decision about whether migration was successful.

As soon as the destination receives all of the data, it immediately starts the guest. This means that the reliable third node is not in the critical path of migration downtime but can still recover a failed migration.

Since the source never knows that the destination is okay, the only way to support NFS robustly would be to close all files on the source before sending the last chunk of migration data. This would mean that if any failure occurred after this point, the VM would be lost.


=== In Practice ===

A Linux NFS server that exports with 'sync' offers stronger coherency than NFS guarantees. This is an implementation detail, not a guarantee as far as I know. If the client sends a read request, then any data that has been acknowledged as completed with a stable write by any other client will be returned without the need to close and reopen the file.

A file opened with O_DIRECT with the Linux NFS client code will always issue a protocol read operation given a userspace read() call. This means that if you issue stable writes (fsync) on the source and then use O_DIRECT to read on the destination, you can safely access the same file without reopening.
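The access pattern described above can be sketched in Python for Linux. This is an illustrative sketch, not QEMU's code: the helper names and the 4096-byte alignment are assumptions, and the fallback to a buffered read exists only so the example also runs on filesystems that reject O_DIRECT. A buffered read does not give the coherence property discussed here.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; the real requirement depends on the storage


def stable_write(path, data):
    """Write data, then fsync so it is stable before we return."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)


def _read_into(path, flags, buf):
    fd = os.open(path, flags)
    try:
        return os.readv(fd, [buf])
    finally:
        os.close(fd)


def direct_read(path, length=BLOCK):
    """Read from offset 0 with O_DIRECT; length should be a multiple of BLOCK."""
    # An anonymous mmap is page aligned, which O_DIRECT reads require.
    buf = mmap.mmap(-1, length)
    try:
        try:
            flags = os.O_RDONLY | getattr(os, "O_DIRECT", 0)
            count = _read_into(path, flags, buf)
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
            # Filesystem refused O_DIRECT: fall back to a buffered read.
            count = _read_into(path, os.O_RDONLY, buf)
        return buf[:count]
    finally:
        buf.close()
```

On the source side, stable_write() stands in for a guest write followed by a flush. On the destination side, direct_read() stands in for a read issued with cache=none.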


=== Conclusion ===

Migration with QEMU is safe, in practice, when using Linux as an NFS server and client when both the source and destination are using cache=none for the disks and a raw file.
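For concreteness, here is a small Python sketch that builds matching command lines for the source and destination. The helper names, image path, and port are assumptions made for this example. The -drive and -incoming options and the qemu-system-x86_64 binary name are real, but the arguments needed for a particular guest will differ.

```python
def drive_args(image_path):
    """Return -drive arguments for a raw image opened with cache=none."""
    # QEMU option strings use ',' as a separator, so a literal comma in
    # the file name has to be doubled.
    escaped = image_path.replace(",", ",,")
    return ["-drive", "file=%s,format=raw,cache=none" % escaped]


def source_cmdline(image_path):
    return ["qemu-system-x86_64"] + drive_args(image_path)


def destination_cmdline(image_path, port):
    # The destination uses the same drive options and waits for the
    # incoming migration stream on the given TCP port.
    return source_cmdline(image_path) + ["-incoming", "tcp:0:%d" % port]
```

For example, destination_cmdline("/mnt/nfs/guest.img", 4444) yields a command line ending in "-incoming tcp:0:4444", where both the path and the port are placeholders.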


== iSCSI/Direct Attached Storage ==

iSCSI has a similar cache coherency guarantee to direct attached storage (via fibre channel). Any read request will return data that has been acknowledged as written by another client.

Since QEMU issues read() requests in userspace, Linux normally uses the page cache. The Linux page cache is not coherent across multiple nodes so the only way to safely access storage coherently is to bypass the Linux page cache via cache=none.


=== Conclusion ===

iSCSI, FC, or other forms of direct attached storage are only safe to use with live migration if you use cache=none and a raw image.


== Clustered File Systems ==

Clustered File Systems such as GPFS, Ceph, Glusterfs, or GFS2 are safe to use with live migration regardless of the caching option used as long as raw images are used.


== Image Formats ==

Image formats are not safe to use with live migration. The reason is that QEMU caches data for image formats and does not have a mechanism to flush those caches. The following attempts to describe the issues with the various formats.


=== QCOW2 ===

QCOW2 caches two forms of data, cluster metadata (L1/L2 data, refcount table, etc.) and mutable header information (file size, snapshot entries, etc.).


This cached data needs to be discarded on the destination once the source has stopped writing to the image and before the destination starts the guest.
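To make the stale-metadata problem concrete, here is a toy Python model. It does not reflect the QCOW2 on-disk format or QEMU's block layer: the classes, the dict standing in for the shared image, and the invalidate_cache() method are all hypothetical. It only shows why a format driver that caches metadata needs a hook to drop it, and why the same hook can be a no-op for raw.

```python
class ToyRawDriver:
    """Raw images cache no format metadata in this model."""

    def __init__(self, image):
        self.image = image

    def invalidate_cache(self):
        # Nothing cached, so nothing to discard.
        pass


class ToyQcow2Driver:
    """Caches an L1 table and the virtual size when the image is opened."""

    def __init__(self, image):
        self.image = image
        self._load()

    def _load(self):
        self.l1_table = list(self.image["l1_table"])
        self.size = self.image["size"]

    def allocate(self, index, offset):
        # Update the cached table and write it through to shared storage.
        self.l1_table[index] = offset
        self.image["l1_table"][index] = offset

    def invalidate_cache(self):
        # Throw away cached metadata and reread it from shared storage.
        self._load()


def demo_stale_metadata():
    shared_image = {"l1_table": [0, 0], "size": 1 << 20}
    source = ToyQcow2Driver(shared_image)
    destination = ToyQcow2Driver(shared_image)  # opened before migration ends

    source.allocate(1, 0x50000)                 # guest keeps running on source
    before = destination.l1_table[1]            # stale cached entry

    destination.invalidate_cache()              # source has stopped writing
    after = destination.l1_table[1]
    return before, after
```

In this model the destination opens the image while the source is still allocating clusters, so its cached table is stale until it is discarded and reread.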

=== QED ===

QED caches similar data to QCOW2. In addition, the QED header has a dirty flag that must be handled specially in the case of live migration.


=== Raw Files ===

Technically, the file size of a raw file is mutable metadata that QEMU caches. This is only applicable when using online image resizing. If you avoid online image resizing during live migration, raw files are completely safe provided the storage used meets the above requirements.
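The conclusions on this page can be summarised as a small decision function. This is a sketch only: the function name and the storage labels are invented for this example, and it encodes nothing beyond the rules stated above. The NFS case additionally assumes Linux as both server and client, as described in the NFS section.

```python
# Storage that is safe with any cache mode (clustered file systems).
COHERENT_ANY_CACHE = {"clusterfs"}

# Storage that is only safe when the host page cache is bypassed.
NEEDS_CACHE_NONE = {"nfs", "iscsi", "fc", "das"}


def migration_safe(storage, cache, image_format, online_resize=False):
    """Return True if live migration is safe according to this page."""
    if image_format != "raw":
        # Image formats cache metadata that QEMU cannot flush.
        return False
    if online_resize:
        # The cached file size of a raw image would become stale.
        return False
    if storage in COHERENT_ANY_CACHE:
        return True
    if storage in NEEDS_CACHE_NONE:
        return cache == "none"
    # Unknown storage type: be conservative.
    return False
```

For example, migration_safe("nfs", "none", "raw") is True, while migration_safe("nfs", "writeback", "raw") and migration_safe("clusterfs", "none", "qcow2") are both False.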
