On Wed, Sep 11, 2013 at 05:30:10PM +0300, Filippos Giannakos wrote:
> I stumbled upon this link [1] which among other things contains the following:
> 
> "iSCSI, FC, or other forms of direct attached storage are only safe to use 
> with
> live migration if you use cache=none."
> 
> How valid is this assertion with current QEMU versions?
> 
> I checked out the source code and was left with the impression that
> during migration, and *before* handing control over to the destination, a
> flush is performed on all disks of the VM. Since the VM is started on the
> destination only after the flush is done, its very first read will bring
> consistent data from disk.
> 
> I can understand that in the corner case where the storage device has
> already been mapped and perhaps has data in the page cache of the
> destination node, there is no way to invalidate it, so the VM will read
> stale data despite the flushes performed on the source node.
> 
> In our case, we provision VMs using our custom storage layer, called
> Archipelago [2], which presents volumes as block devices in the host. We would
> like to run VMs in cache=writeback mode. If we guarantee externally that there
> will be no incoherent cached data on the destination host of the migration
> (e.g., by making sure the volume is not mapped on the destination node before
> the migration), would it be safe to do so?
> 
> Can you comment on the aforementioned approach? Please let me know if there's
> something I have misunderstood.
> 
> [1] http://wiki.qemu.org/Migration/Storage
> [2] http://www.synnefo.org/docs/archipelago/latest

Hi Filippos,
Late response but this may help start the discussion...

Cache consistency during migration was discussed a lot on the mailing
list.  You might be able to find threads from about 2 years ago that
discuss this in detail.

Here is what I remember:

During migration the QEMU process on the destination host must be
started.  When QEMU starts up it opens the image file and reads the
first sector (for disk geometry and image format probing).  At this
point the destination would populate its page cache while the source is
still running the guest.
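
To make this concrete, the destination QEMU is typically launched ahead
of time with something along these lines (illustrative only; the path
and port are placeholders):

  qemu-system-x86_64 -m 1024 \
      -drive file=/dev/vol1,if=virtio,cache=writeback \
      -incoming tcp:0:4444

The -drive on that command line is what gets opened and probed at
startup.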

We're in trouble because the destination host has stale pages in its
page cache.  Hence the recommendation to use cache=none.

There are a few things to look at if you are really eager to use
cache=writeback:

1. Can you avoid geometry probing?  I think that by setting the geometry
   options on the -drive you can skip probing; see hw/block/hd-geometry.c
   and the example command line just after this list.

2. Can you avoid format probing?  Use -drive format=raw to skip format
   probing.

3. Make sure to use raw image files.  Do not use an image format like
   qcow2, since that would require reading the header and metadata
   before migration handover.

4. Check if ioctl(BLKFLSBUF) can be used.  Unfortunately it requires
   CAP_SYS_ADMIN, so the QEMU process cannot issue it when running
   without privileges.  Perhaps an external tool like libvirt could
   issue it, but that's tricky since live migration handover is a
   delicate operation: it's important to avoid dependencies between
   multiple processes to keep guest downtime low and to avoid the
   possibility of failures.  A rough sketch of such a helper follows
   below.
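
Putting items 1 to 3 together, the -drive might end up looking roughly
like this (a sketch only; the device path and geometry values are made
up, and the exact option names are worth double-checking against your
QEMU version):

  -drive file=/dev/vol1,if=virtio,cache=writeback,format=raw,cyls=16383,heads=16,secs=63

The idea is that format=raw skips the format probe and explicit
cyls/heads/secs skip the geometry guess, so QEMU should have no reason
to read the first sector before handover.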
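
For item 4, here is a rough sketch of what a small privileged helper
could do on the destination host before QEMU is started there.  This is
untested and only meant as an illustration; the device path is a
placeholder:

  /* blkflsbuf.c: write back and drop cached pages for a block device.
   * BLKFLSBUF needs CAP_SYS_ADMIN, so this has to run privileged. */
  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>   /* BLKFLSBUF */

  int main(int argc, char **argv)
  {
      const char *dev = (argc > 1) ? argv[1] : "/dev/vol1"; /* placeholder */
      int fd = open(dev, O_RDONLY);

      if (fd < 0) {
          perror("open");
          return 1;
      }
      /* Flush dirty pages and invalidate the page cache for this device */
      if (ioctl(fd, BLKFLSBUF, 0) < 0) {
          perror("ioctl(BLKFLSBUF)");
          close(fd);
          return 1;
      }
      close(fd);
      return 0;
  }

The same effect can be had with "blockdev --flushbufs /dev/vol1" from
util-linux, which may be simpler than adding a custom tool.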

So you might be able to get away with cache=writeback *if* you carefully
study the code and double-check with strace that the destination QEMU
process does not access the image file before handover has completed.
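
For example, something along these lines (adjust the syscall list and
the QEMU command line to your setup):

  strace -f -e trace=open,read,pread64 -o /tmp/qemu-dest.trace \
      qemu-system-x86_64 ... -incoming tcp:0:4444

Then check the trace for reads of the image file between QEMU startup
and the end of incoming migration.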

Stefan
