> On 19.02.2021 at 22:10, Peter Xu <pet...@redhat.com> wrote:
> 
> On Fri, Feb 19, 2021 at 03:50:52PM -0500, Peter Xu wrote:
>> Andrey,
>> 
>>> On Fri, Feb 19, 2021 at 09:57:37AM +0300, Andrey Gruzdev wrote:
>>> For the discards that happen before the snapshot is started, I need to dig
>>> into the Linux and QEMU virtio-balloon code more to get clear on it.
>> 
>> Yes, how the error could trigger is very tricky.
>>
>> Consider the sequence below:
>> 
>>  - Start a guest with init_on_free=1 set and also a virtio-balloon device
>> 
>>  - Guest frees a page P and zeroes it (since init_on_free=1). Now P contains
>>    all zeros.
>>
>>  - Virtio-balloon reports this page to the host, MADV_DONTNEED is issued,
>>    and the page is dropped on the host.
>> 
>>  - Start live snapshot, wr-protect all pages (but not including page P,
>>    because it's currently missing).  Let's call it $SNAPSHOT1.
>> 
>>  - Guest does alloc_page(__GFP_ZERO), which happens to fetch this page P and
>>    return it.
>> 
>>  - So far, page P is still all zero (which is good!), then guest uses page P
>>    and writes data to it (say, now P has data P1 rather than all zeros).
>> 
>>  - Live snapshot saves page P, whose content is now P1 rather than all zeros.
>> 
>>  - Live snapshot completed.  Saved as $SNAPSHOT1.
>> 
>> Then, when we load snapshot $SNAPSHOT1, P will contain data P1.  After the
>> snapshot is loaded, when the guest allocates page P again with
>> alloc_page(__GFP_ZERO), the guest kernel "thinks" this page is already
>> all-zero, so memzero() is skipped even though __GFP_ZERO is provided.  Page P
>> (with content P1) is then returned for alloc_page(__GFP_ZERO) despite
>> __GFP_ZERO being set.  That could break the caller of alloc_page().
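
The sequence above can be walked through with a toy model. This is only a sketch under loud assumptions: every name in it (Guest, Snapshot, prezeroed and so on) is hypothetical, pages are plain dictionary entries, and the guest's "this free page is already zero" belief is modelled as a per-page set rather than the way the kernel really tracks it. It is not QEMU or Linux code.

```python
# Toy model of the sequence above. All names are hypothetical; this is not
# QEMU or Linux code, and pages are just dictionary entries.
ZERO = bytes(8)


class Guest:
    def __init__(self):
        self.mem = {"P": b"old data"}  # page name -> contents
        self.present = {"P"}           # pages currently backed on the host
        self.prezeroed = set()         # free pages the guest believes are zero

    def free_page(self, page):
        self.mem[page] = ZERO          # init_on_free=1: zero on free
        self.prezeroed.add(page)

    def balloon_report(self, page):
        self.present.discard(page)     # host drops the page (MADV_DONTNEED)

    def alloc_page_zero(self, page):
        # Models alloc_page(__GFP_ZERO): zeroing is skipped for a page the
        # guest already believes to be zero.
        if page not in self.prezeroed:
            self.mem[page] = ZERO
        self.prezeroed.discard(page)
        self.present.add(page)         # first touch faults the page back in
        return page

    def write(self, page, data, snapshot=None):
        if snapshot is not None:
            snapshot.before_write(self, page)
        self.mem[page] = data


class Snapshot:
    def __init__(self, guest):
        self.protected = set(guest.present)    # missing pages are not covered
        self.prezeroed = set(guest.prezeroed)  # guest state at snapshot time
        self.saved = {}

    def before_write(self, guest, page):
        # A write to a wr-protected page saves the old contents first.
        if page in self.protected and page not in self.saved:
            self.saved[page] = guest.mem[page]

    def finish(self, guest):
        # Pages never caught by a write fault are saved as they are now.
        for page, data in guest.mem.items():
            self.saved.setdefault(page, data)

    def load(self):
        guest = Guest()
        guest.mem = dict(self.saved)
        guest.present = set(self.saved)
        guest.prezeroed = set(self.prezeroed)
        return guest


guest = Guest()
guest.free_page("P")
guest.balloon_report("P")
snap = Snapshot(guest)               # P is missing, so it is not protected
guest.alloc_page_zero("P")
guest.write("P", b"P1 data!", snap)  # this write is not trapped
snap.finish(guest)

restored = snap.load()
page = restored.alloc_page_zero("P")
print(restored.mem[page])            # stale data instead of zeros
```

In this model the restored guest still believes P is zero, so the "zeroed" allocation hands back the stale P1 contents.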
>> 
>>> Anyhow I'm quite sure that adding global MISSING handler for snapshotting
>>> is too heavy and not really needed.
>> 
>> UFFDIO_ZEROPAGE installs a zero pfn and that should be all of it.  There'll
>> definitely be overhead, but it may not be as large as imagined.  Live
>> snapshot is great in that we get a point-in-time image of the guest without
>> stopping the guest, so taking slightly longer won't be a huge loss to us
>> either.
>> 
>> Actually we can also think of other ways to work around it.  One way is to
>> pre-fault all guest pages before wr-protect.  Note that we don't need to
>> write to the guest page, because a read would suffice, since uffd-wp would
>> also work with a zero pfn.  This workaround just won't help save snapshot
>> disk space, but it seems to work.  It would be great if you have other
>> workarounds; as you said, maybe UFFDIO_ZEROPAGE is not the only route.
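
A minimal sketch of the read-only pre-fault idea, assuming a Unix host and using an anonymous private mapping as a stand-in for guest RAM. The helper name prefault_read is hypothetical, the uffd-wp registration that would follow is not shown, and real code would live in QEMU's C sources rather than Python.

```python
import mmap

PAGE = mmap.PAGESIZE


def prefault_read(region):
    """Read one byte from every page of the mapping so that each page is
    touched by a read access. Returns the number of pages touched."""
    touched = 0
    for offset in range(0, len(region), PAGE):
        _ = region[offset]  # a read is enough, nothing is written
        touched += 1
    return touched


# Anonymous private mapping standing in for guest RAM (Unix-only flags).
ram = mmap.mmap(-1, 16 * PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
pages = prefault_read(ram)
```

Only reads are issued, so the contents of the mapping are unchanged after the loop.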
> 
> Wait.. it actually seems to also solve the disk usage issue.. :)
> 
> We should just need to make sure to prohibit balloon before starting to
> pre-fault read on all guest ram.  Seems awkward, but also seems to work..
> Hmm..

A shiver just went down my spine. Please don't do that just for the sake of
creating a snapshot.

(Just imagine you don't have a shared zeropage...)


> -- 
> Peter Xu
> 
