On Fri, Feb 19, 2021 at 03:50:52PM -0500, Peter Xu wrote:
> Andrey,
>
> On Fri, Feb 19, 2021 at 09:57:37AM +0300, Andrey Gruzdev wrote:
> > For the discards that happen before the snapshot is started, I need to dig
> > into the Linux and QEMU virtio-balloon code more to get clear on it.
>
> Yes, it's very tricky how the error could trigger.
>
> Let's think of the sequence below:
>
>   - Start a guest with init_on_free=1 set and also a virtio-balloon device.
>
>   - The guest frees a page P and zeroes it (since init_on_free=1). Now P
>     contains all zeros.
>
>   - Virtio-balloon reports this page to the host, MADV_DONTNEED is sent, and
>     the page is dropped on the host.
>
>   - Start a live snapshot and wr-protect all pages (not including page P,
>     because it's currently missing). Let's call it $SNAPSHOT1.
>
>   - The guest does alloc_page(__GFP_ZERO), which happens to pick page P and
>     return it.
>
>   - So far page P is still all zero (which is good!). Then the guest uses
>     page P and writes data to it (say, P now holds data P1 rather than all
>     zeros).
>
>   - Live snapshot saves page P, which contains P1 rather than all zeros.
>
>   - Live snapshot completes. Saved as $SNAPSHOT1.
>
> Then when we load $SNAPSHOT1, P will contain data P1. After the snapshot is
> loaded, when the guest allocates page P again with alloc_page(__GFP_ZERO),
> the guest kernel "thinks" the page is already all-zero, so memzero() is
> skipped even though __GFP_ZERO is provided. Page P (with content P1) is then
> returned for alloc_page(__GFP_ZERO) even though __GFP_ZERO is set. That
> could break the caller of alloc_page().
>
> > Anyhow I'm quite sure that adding a global MISSING handler for
> > snapshotting is too heavy and not really needed.
>
> UFFDIO_ZEROPAGE installs a zero pfn and that should be all of it. There'll
> definitely be overhead, but it may not be as huge as imagined.
> Live snapshot is great in that we get a point-in-time image of the guest
> without stopping it, so taking slightly longer won't be a huge loss to us
> either.
>
> Actually we can also think of other ways to work around it. One way is to
> pre-fault all guest pages before wr-protect. Note that we don't need to write
> to the guest pages, because reads suffice: uffd-wp also works with the zero
> pfn. This workaround won't help save snapshot disk space, but it seems to
> work. It would be great if you have other workarounds; maybe, as you said,
> UFFDIO_ZEROPAGE is not the only route.
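A toy model of the sequence above may make the failure easier to follow. It is a sketch under simplifying assumptions, not QEMU or kernel code: `Host`, `run()` and the `prefault` flag are hypothetical names, host memory is a Python dict in which a missing key stands for a page dropped by MADV_DONTNEED, a read of a dropped page maps zeros, only pages present at wr-protect time get protected, and pages not saved by a write fault are copied at the end rather than concurrently. Without pre-faulting, the image records P as free and zeroed but holds P1; pre-faulting with reads before wr-protect makes the guest's write to P trap, so the zero content is saved first.

```python
PAGE_SIZE = 4096
ZERO = bytes(PAGE_SIZE)
P1 = b"\x01" * PAGE_SIZE  # data the guest writes into page P


class Host:
    """Toy host view of guest RAM (hypothetical model, not QEMU code)."""

    def __init__(self, npages):
        self.npages = npages
        self.pages = {pfn: ZERO for pfn in range(npages)}  # missing key = dropped
        self.protected = set()  # pages currently write-protected via uffd-wp
        self.saved = {}         # snapshot image: pfn -> content

    def discard(self, pfn):
        # Free page reporting ends in MADV_DONTNEED: the page is dropped.
        self.pages.pop(pfn, None)
        self.protected.discard(pfn)

    def read(self, pfn):
        # A read of a dropped page maps zeros (the zero page in real life).
        return self.pages.setdefault(pfn, ZERO)

    def write(self, pfn, data):
        if pfn in self.protected:
            # uffd-wp fault: save the point-in-time content, then unprotect.
            self.saved.setdefault(pfn, self.pages[pfn])
            self.protected.discard(pfn)
        self.pages[pfn] = data

    def start_snapshot(self, prefault):
        if prefault:
            for pfn in range(self.npages):
                self.read(pfn)  # read-only touch; no data is written
        # Only pages present right now can be write-protected.
        self.protected = set(self.pages)

    def finish_snapshot(self):
        # Copy every page not already saved by a write-protect fault.
        for pfn in range(self.npages):
            if pfn not in self.saved:
                self.saved[pfn] = self.read(pfn)
        return self.saved


def run(prefault):
    """Return what alloc_page(__GFP_ZERO) would hand out for P after loading."""
    P = 0
    host = Host(npages=2)
    free_zeroed = set()

    host.write(P, ZERO)          # guest frees P; init_on_free=1 zeroes it
    free_zeroed.add(P)
    host.discard(P)              # virtio-balloon reports P; host drops it

    host.start_snapshot(prefault)
    # Assumption: guest metadata (the free list) lives in protected pages,
    # so the image records it as it was at snapshot start.
    snapshot_free = set(free_zeroed)

    free_zeroed.discard(P)       # alloc_page(__GFP_ZERO) picks P, skips zeroing
    assert host.read(P) == ZERO  # still all zeros at this point
    host.write(P, P1)            # guest now uses P

    image = host.finish_snapshot()
    # After loading, P is "free and already zero", so zeroing is skipped again.
    assert P in snapshot_free
    return image[P]


print(run(prefault=False) == ZERO)  # False: stale P1 leaks to the caller
print(run(prefault=True) == ZERO)   # True
```

The read-only touch in `start_snapshot()` mirrors the point that reads suffice: for private anonymous memory on Linux a read fault normally maps the shared zero page instead of allocating a new one, and, per the thread, uffd-wp works with the zero pfn.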
Wait... it actually seems to also solve the disk usage issue. :) We'd just need
to make sure to prohibit the balloon before starting to pre-fault (read) all
guest RAM. Seems awkward, but it also seems to work... Hmm.

--
Peter Xu
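The ordering proposed in the reply (inhibit the balloon, then pre-fault by reading all guest RAM, then wr-protect) can be modelled the same way. Again this is only a sketch: `Ram`, `balloon_inhibited`, `start_snapshot()` and `finish_snapshot()` are hypothetical names rather than QEMU APIs, and a single free page report is assumed to arrive between the pre-fault pass and wr-protect. `start_snapshot()` returns the pages left unprotected, each of which would be a hole like page P in the sequence above.

```python
PAGE_SIZE = 4096
ZERO = bytes(PAGE_SIZE)


class Ram:
    """Toy host view of guest RAM (hypothetical model, not QEMU code)."""

    def __init__(self, npages):
        self.npages = npages
        self.pages = {pfn: ZERO for pfn in range(npages)}  # missing key = dropped
        self.protected = set()         # pages write-protected via uffd-wp
        self.balloon_inhibited = False

    def balloon_discard(self, pfn):
        # Hypothetical policy: refuse free page reports while inhibited.
        if self.balloon_inhibited:
            return False
        self.pages.pop(pfn, None)      # MADV_DONTNEED drops the page
        self.protected.discard(pfn)
        return True

    def read(self, pfn):
        # A read of a dropped page maps zeros; nothing is written.
        return self.pages.setdefault(pfn, ZERO)


def start_snapshot(ram, inhibit_balloon, racing_report):
    """Pre-fault by reading, then wr-protect; return unprotected pages."""
    if inhibit_balloon:
        ram.balloon_inhibited = True   # must precede the pre-fault pass
    for pfn in range(ram.npages):
        ram.read(pfn)
    # A free page report arriving after the pre-fault pass but before
    # wr-protect is applied.
    ram.balloon_discard(racing_report)
    ram.protected = set(ram.pages)     # only present pages can be protected
    return set(range(ram.npages)) - ram.protected


def finish_snapshot(ram):
    # Once the image is complete, drop write protection and let
    # balloon reporting resume.
    ram.protected.clear()
    ram.balloon_inhibited = False


print(start_snapshot(Ram(4), inhibit_balloon=False, racing_report=2))  # {2}
print(start_snapshot(Ram(4), inhibit_balloon=True, racing_report=2))   # set()
```

Since dropped pages come back from the pre-fault pass as all zeros, a writer that special-cases zero pages (QEMU's RAM migration code does detect zero pages) would spend little image space on them, which may be why the reply suggests the disk usage concern goes away as well.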