> On 19.02.2021 at 22:14, David Hildenbrand <dhild...@redhat.com> wrote:
>
>
>>> On 19.02.2021 at 22:10, Peter Xu <pet...@redhat.com> wrote:
>>>
>>> On Fri, Feb 19, 2021 at 03:50:52PM -0500, Peter Xu wrote:
>>> Andrey,
>>>
>>>> On Fri, Feb 19, 2021 at 09:57:37AM +0300, Andrey Gruzdev wrote:
>>>> For the discards that happen before snapshot is started, I need to dig
>>>> into the Linux and QEMU virtio-balloon code more to get clear with it.
>>>
>>> Yes it's very tricky how the error could trigger.
>>>
>>> Let's think of the sequence below:
>>>
>>> - Start a guest with init_on_free=1 set and also a virtio-balloon device
>>>
>>> - Guest frees a page P and zeroes it (since init_on_free=1). Now P
>>>   contains all zeros.
>>>
>>> - Virtio-balloon reports this page to host, MADV_DONTNEED sent, then this
>>>   page is dropped on the host.
>>>
>>> - Start live snapshot, wr-protect all pages (but not including page P
>>>   because it's currently missing). Let's call it $SNAPSHOT1.
>>>
>>> - Guest does alloc_page(__GFP_ZERO), accidentally fetching this page P,
>>>   which is returned
>>>
>>> - So far, page P is still all zero (which is good!), then guest uses page
>>>   P and writes data to it (say, now P has data P1 rather than all zeros).
>>>
>>> - Live snapshot saves page P, whose content is P1 rather than all zeros.
>>>
>>> - Live snapshot completed. Saved as $SNAPSHOT1.
>>>
>>> Then when we load snapshot $SNAPSHOT1, we'll have P containing data P1.
>>> After the snapshot is loaded, when the guest allocates again with
>>> alloc_page(__GFP_ZERO) on this page P, the guest kernel "thought" this
>>> page is all-zero already, so memzero() is skipped even if __GFP_ZERO is
>>> provided. Then this page P (with content P1) gets returned for the
>>> alloc_page(__GFP_ZERO) even though __GFP_ZERO is set. That could break
>>> the caller of alloc_page().
>>>
>>>> Anyhow I'm quite sure that adding a global MISSING handler for
>>>> snapshotting is too heavy and not really needed.
>>>
>>> UFFDIO_ZEROPAGE installs a zero pfn and that should be all of it. There'll
>>> definitely be overhead, but it may not be as huge as imagined. Live
>>> snapshot is great in that we have a point-in-time image of the guest
>>> without stopping the guest, so taking slightly longer won't be a huge
>>> loss to us either.
>>>
>>> Actually we can also think of other ways to work around it. One way is
>>> that we can pre-fault all guest pages before wr-protect. Note that we
>>> don't need to write to the guest page because a read would suffice, since
>>> uffd-wp would also work with the zero pfn. It's just that this workaround
>>> won't help on saving snapshot disk space, but it seems to work. It would
>>> be great if you have other workarounds, maybe as you said UFFDIO_ZEROPAGE
>>> is not the only route.
>>
>> Wait.. it actually seems to also solve the disk usage issue.. :)
>>
>> We should just need to make sure to prohibit the balloon before starting
>> to pre-fault read on all guest ram. Seems awkward, but also seems to
>> work.. Hmm..
>
> A shiver just went down my spine. Please don't, just for the sake of
> creating a snapshot.
>
> (Just imagine you don't have a shared zeropage...)

... and I just remembered we read all memory either way. Gah.

I have some patches to make snapshots fly with virtio-mem so exactly that won't happen. But they depend on vfio support, so it might take a while.
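
For anyone following along, the sequence Peter describes and the pre-fault workaround can be sketched as a toy Python model. This is only an illustration under stated assumptions: plain dicts stand in for host RAM and the snapshot, a "page" is four bytes, and every name here (snapshot_content_of_p, alloc_zeroed_after_restore, guest_believes_zero) is made up for the sketch and is not QEMU or kernel code.

```python
ZERO = bytes(4)   # toy "page" of four zero bytes
P1 = b"data"      # what the guest writes into page P during the snapshot
P = 0             # index of the page from the scenario


def snapshot_content_of_p(prefault: bool) -> bytes:
    """Return what the snapshot ends up storing for page P."""
    host = {}     # populated host pages; P is missing after MADV_DONTNEED
    if prefault:
        # Workaround: read-touch the page first, so a zero page is mapped
        # and P can be write-protected like any other page.
        host[P] = host.get(P, ZERO)
    protected = set(host)   # wr-protect only pages that are present
    saved = {}

    # Guest allocates P and writes P1 while the snapshot is running.
    if P in protected:
        saved[P] = host[P]  # write fault: old content is saved first
    host[P] = P1

    # Background pass saves every page not already captured by a fault.
    saved.setdefault(P, host.get(P, ZERO))
    return saved[P]


def alloc_zeroed_after_restore(prefault: bool) -> bytes:
    """Model the guest's alloc_page(__GFP_ZERO) on P after loading."""
    ram = {P: snapshot_content_of_p(prefault)}
    guest_believes_zero = {P}   # restored guest state: P is free and zeroed
    if P not in guest_believes_zero:
        ram[P] = ZERO           # zeroing only happens for other pages
    return ram[P]
```

Calling alloc_zeroed_after_restore(False) models the broken case, where the caller gets P1 back instead of zeros; with prefault=True the snapshot holds the point-in-time zero content. The model says nothing about the cost of reading all guest RAM, which is the objection raised above.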