On 20.04.2018 at 05:21, Stefan Hajnoczi wrote:
> On Thu, Apr 19, 2018 at 10:18:33AM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefa...@redhat.com) wrote:
> > > On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*. Use
> > > this to drop page cache on the destination host during shared storage
> > > migration. This way the destination host will read the latest copy of
> > > the data and will not use stale data from the page cache.
> > >
> > > The flow is as follows:
> > >
> > > 1. Source host writes out all dirty pages and inactivates drives.
> > > 2. QEMU_VM_EOF is sent on migration stream.
> > > 3. Destination host invalidates caches before accessing drives.
> > >
> > > This patch enables live migration even with -drive cache.direct=off.
> > >
> > > * Terms and conditions may apply, please see patch for details.
> > >
> > > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> > > ---
> > >  block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 39 insertions(+)
> > >
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 3794c0007a..df4f52919f 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
> > >      return ret | BDRV_BLOCK_OFFSET_VALID;
> > >  }
> > >
> > > +static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
> > > +                                                 Error **errp)
> > > +{
> > > +    BDRVRawState *s = bs->opaque;
> > > +    int ret;
> > > +
> > > +    ret = fd_open(bs);
> > > +    if (ret < 0) {
> > > +        error_setg_errno(errp, -ret, "The file descriptor is not open");
> > > +        return;
> > > +    }
> > > +
> > > +    if (s->open_flags & O_DIRECT) {
> > > +        return; /* No host kernel page cache */
> > > +    }
> > > +
> > > +#if defined(__linux__)
> > > +    /* This sets the scene for the next syscall... */
> > > +    ret = bdrv_co_flush(bs);
> > > +    if (ret < 0) {
> > > +        error_setg_errno(errp, -ret, "flush failed");
> > > +        return;
> > > +    }
> > > +
> > > +    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
> > > +     * process. These limitations are okay because we just fsynced the file,
> > > +     * we don't use mmap, and the file should not be in use by other processes.
> > > +     */
> > > +    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
> >
> > What happens if I try a migrate between two qemu's on the same host?
> > (Which I, and avocado, both use for testing; I think users
> > occasionally do for QEMU updates).
>
> The steps quoted from the commit description:
>
> 1. Source host writes out all dirty pages and inactivates drives.
> 2. QEMU_VM_EOF is sent on migration stream.
> 3. Destination host invalidates caches before accessing drives.
>
> When we reach Step 3 the source QEMU is not doing I/O (no pages are
> locked). The destination QEMU does bdrv_co_flush() so even if pages are
> still dirty (that shouldn't happen since the source already drained and
> flushed) they will be written out and pages will be clean. Therefore
> fadvise really invalidates all resident pages.
>
> FWIW when writing this patch I tested with both QEMUs on the same host.

Which is actually unnecessary overhead on localhost because the local
kernel page cache can't be incoherent with itself. But I don't think
it's a real problem either.

Kevin
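
For reference, the flush-then-invalidate sequence discussed in the quoted
text can be tried outside QEMU. What follows is only a minimal sketch,
assuming Linux and plain POSIX calls: drop_page_cache() is a hypothetical
helper name (not a QEMU function), and fsync() stands in for
bdrv_co_flush(). It is not the code from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of QEMU: write back dirty pages and then
 * ask the kernel to drop the cached pages of an already-open file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int drop_page_cache(int fd)
{
    int ret;

    /* Dirty pages are not invalidated, so write them back first. */
    if (fsync(fd) < 0) {
        return -errno;
    }

    /*
     * Offset 0 with length 0 covers the whole file.
     * posix_fadvise() returns the error number directly and does not
     * set errno, so negate its return value instead of reading errno.
     */
    ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    return ret == 0 ? 0 : -ret;
}
```

As in the quoted comment, the call is advisory: pages that are locked or
mmapped by some process stay resident, so a return value of 0 does not by
itself prove that every page was dropped.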