> On 19.02.2021 at 22:10, Peter Xu <pet...@redhat.com> wrote:
>
> On Fri, Feb 19, 2021 at 03:50:52PM -0500, Peter Xu wrote:
>> Andrey,
>>
>>> On Fri, Feb 19, 2021 at 09:57:37AM +0300, Andrey Gruzdev wrote:
>>> For the discards that happen before snapshot is started, I need to dig into
>>> Linux and QEMU virtio-balloon code more to get clear with it.
>>
>> Yes it's very tricky on how the error could trigger.
>>
>> Let's think of below sequence:
>>
>> - Start a guest with init_on_free=1 set and also a virtio-balloon device
>>
>> - Guest frees a page P and zeroes it (since init_on_free=1). Now P contains
>>   all zeros.
>>
>> - Virtio-balloon reports this page to host, MADV_DONTNEED sent, then this
>>   page is dropped on the host.
>>
>> - Start live snapshot, wr-protect all pages (but not including page P because
>>   it's currently missing). Let's call it $SNAPSHOT1.
>>
>> - Guest does alloc_page(__GFP_ZERO), accidentally fetching this page P, and
>>   it is returned
>>
>> - So far, page P is still all zero (which is good!), then guest uses page P
>>   and writes data to it (say, now P has data P1 rather than all zeros).
>>
>> - Live snapshot saves page P, which contains P1 rather than all zeros.
>>
>> - Live snapshot completed. Saved as $SNAPSHOT1.
>>
>> Then when we load snapshot $SNAPSHOT1, we'll have P containing data P1. After
>> the snapshot is loaded, when the guest allocates again with
>> alloc_page(__GFP_ZERO) on this page P, since the guest kernel "thought" this
>> page is all-zero already, memzero() is skipped even if __GFP_ZERO is provided.
>> Then this page P (with content P1) gets returned for the
>> alloc_page(__GFP_ZERO) even if __GFP_ZERO is set. That could break the caller
>> of alloc_page().
>>
>>> Anyhow I'm quite sure that adding global MISSING handler for snapshotting
>>> is too heavy and not really needed.
>>
>> UFFDIO_ZEROCOPY installs a zero pfn and that should be all of it. There'll
>> definitely be overhead, but it may not be as huge as imagined. Live snapshot
>> is great in that we have a point-in-time image of the guest without stopping
>> the guest, so taking slightly longer won't be a huge loss to us either.
>>
>> Actually we can also think of other ways to work around it. One way is we can
>> pre-fault all guest pages before wr-protect. Note that we don't need to write
>> to the guest page because a read would suffice, since uffd-wp would also work
>> with zero pfn. It's just that this workaround won't help on saving snapshot
>> disk space, but it seems to work. It would be great if you have other
>> workarounds, maybe as you said UFFDIO_ZEROCOPY is not the only route.
>
> Wait.. it actually seems to also solve the disk usage issue.. :)
>
> We should just need to make sure to prohibit balloon before starting to
> pre-fault read on all guest ram. Seems awkward, but also seems to work..
> Hmm..

A shiver just went down my spine. Please don't do that just for the sake of creating a snapshot. (Just imagine you don't have a shared zeropage...)

> --
> Peter Xu
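
For anyone following the thread, the sequence quoted above can be sketched as a toy model of a single page P. This is not QEMU or kernel code: the function names, the string "P1" and the boolean flags are all hypothetical, made up purely for illustration, and the model only captures the ordering of events described in the quoted mail (page missing at snapshot start, guest write during the snapshot, guest skipping the zeroing after restore).

```python
ZEROS = "zeros"  # stands for an all-zero page


def snapshot_page(present_at_start, prefault, written_during_snapshot):
    """Toy model: content stored in the snapshot for a page that was
    all zeros (from the guest's point of view) when the snapshot started.

    All parameters are hypothetical illustration knobs:
      present_at_start        -- False if the host dropped the page (MADV_DONTNEED)
      prefault                -- True to read-fault missing pages before wr-protect
      written_during_snapshot -- data the guest writes while the snapshot runs, or None
    """
    content = ZEROS
    # Pre-fault workaround: a read fault makes the missing page present.
    present = present_at_start or prefault
    # Only present pages end up write-protected.
    protected = present
    if written_during_snapshot is not None:
        if protected:
            # The write faults, so the old content is saved before it changes.
            return content
        # Unprotected page: the write lands without the snapshot noticing.
        content = written_during_snapshot
    # The background pass saves whatever the page holds at that moment.
    return content


def alloc_zeroed_after_restore(saved_content, guest_thinks_zero):
    """Toy model of alloc_page(__GFP_ZERO) after loading the snapshot:
    the guest skips zeroing when it believes the page is already zero."""
    return saved_content if guest_thinks_zero else ZEROS


# Sequence from the quoted mail: P missing at snapshot start, guest writes P1.
saved = snapshot_page(present_at_start=False, prefault=False,
                      written_during_snapshot="P1")
broken = alloc_zeroed_after_restore(saved, guest_thinks_zero=True)

# Same sequence with the pre-fault workaround applied first.
saved_pf = snapshot_page(present_at_start=False, prefault=True,
                         written_during_snapshot="P1")
fixed = alloc_zeroed_after_restore(saved_pf, guest_thinks_zero=True)
```

In this model `broken` ends up holding the stale data while `fixed` is all zeros, which matches the argument in the quoted mail. It says nothing about what pre-faulting costs in memory or time, which is what the objection in the reply is about.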