On Thu, Feb 11, 2021 at 12:21:51PM +0300, Andrey Gruzdev wrote:
> On 09.02.2021 23:31, Peter Xu wrote:
> > On Tue, Feb 09, 2021 at 03:09:28PM -0500, Peter Xu wrote:
> > > Hi, David, Andrey,
> > > 
> > > On Tue, Feb 09, 2021 at 08:06:58PM +0100, David Hildenbrand wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > just stumbled over this, quick question:
> > > > > > 
> > > > > > > I recently played with UFFD_WP and noticed that write protection is
> > > > > > > only effective on pages/ranges that already have pages populated
> > > > > > > (IOW: !pte_none() in the kernel).
> > > > > > 
> > > > > > > In case memory was never populated (or was discarded using e.g.,
> > > > > > > madvise(MADV_DONTNEED)), write-protection will be skipped silently
> > > > > > > and you won't get WP events for applicable pages.
> > > > > > 
> > > > > > > So if someone writes to a yet unpopulated page ("zero"), you won't
> > > > > > > get WP events.
> > > > > > 
> > > > > > > I can spot that you do a single uffd_change_protection() on the
> > > > > > > whole RAMBlock.
> > > > > > 
> > > > > > How are you handling that scenario, or why don't you have to handle
> > > > > > that scenario?
> > > Good catch..  Indeed I overlooked that as well when reviewing the code.
> > > 
> > > > > Hi David,
> > > > > 
> > > > > I really wonder if such a problem exists.. If we are talking about a
> > > > I immediately ran into this issue with my simplest test cases. :)
> > > > 
> > > > > > write to an unpopulated page, we should get the first page fault on
> > > > > > the non-present page and populate it with protection bits from the
> > > > > > respective vma.  For a UFFD_WP vma, the page will be populated
> > > > > > non-writable.  So we'll get another page fault on the present but
> > > > > > read-only page and go to handle_userfault.
> > > The problem is that even if the page is read-only, it does not yet have the
> > > uffd-wp bit set, so it won't really trigger the handle_userfault() path.
> > > 
> > > > You might have to register also for MISSING faults and place zero pages.
> > > So I think what's missing for live snapshot is indeed to register with both
> > > missing & wp mode.
> > > 
> > > Then we'll receive two messages: For wp, we do like before.  For missing, we
> > > do UFFDIO_ZEROPAGE and at the same time dump this page as a zero page.
> > > 
> > > I bet live snapshot didn't encounter this issue simply because normal live
> > > snapshots would still work, especially when there's the guest OS. Say, the
> > > worst case is we could have migrated some zero pages with some random data
> > > filled in along with the snapshot, however all these pages were zero pages
> > > and not used by the guest OS after all, then when we load a snapshot we
> > > won't easily notice either..
> > I'm thinking of some way to verify this from the live snapshot pov, and I've
> > got an idea so I'll just share it...  Maybe we need a guest application that
> > does something like below:
> > 
> >    - mmap() a huge lot of memory
> > 
> >    - call mlockall(), so that pages will be provisioned in the guest but
> >      without data written.  IIUC on the host these pages should be backed by
> >      missing pages as long as guest app doesn't write.  Then...
> > 
> >    - the app starts to read input from user:
> > 
> >      - If user inputs "dirty" and enter: it'll start to dirty the whole
> >        range.  Writing a non-zero value to the 1st byte of each page would
> >        suffice.
> > 
> >      - If user inputs "check" and enter: it'll read the whole memory chunk
> >        to see whether all the pages are zero pages.  If it reads any
> >        non-zero page, it should bail out and print an error.
> > 
> > With the help of above program, we can do below to verify the live snapshot
> > worked as expected on zero pages:
> > 
> >    - Guest: start above program, don't input yet (so waiting to read either
> >      "dirty" or "check" command)
> > 
> >    - Host: start live snapshot
> > 
> >    - Guest: input "dirty" command, so start quickly dirtying the ram
> > 
> >    - Host: live snapshot completes
> > 
> > Then to verify the snapshot image, we do:
> > 
> >    - Host: load the snapshot we've got
> > 
> >    - Guest: (should still be in the state of waiting for cmd) this time we
> >      enter "check"
> > 
> > Thanks,
> > 
> Hi David, Peter,
> 
> A little unexpected behavior, from my point of view, for UFFD
> write-protection.  So, that means that UFFD_WP protection/events work only
> for locked memory?  I'm now looking at the kernel implementation, to
> understand..

Not really; it definitely works for all memory that we've touched.  My
previous example wanted to let the guest app use a not-yet-allocated page.  I
figured mlockall() might achieve that, hence I proposed such an example
assuming that may verify the zero page issue on live snapshot.  So if my
understanding is correct, if we run the above scenario, the current live
snapshot might fail that app when we do the "check" command at the end, by
finding non-zero pages.
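
To make the guest side a bit more concrete, here is a rough Python sketch of
what such an app could look like.  It is only an illustration, not something
from the series: the helper names (allocate, dirty, check, run) are made up,
it assumes a Linux guest, and the mlockall() step is left out to keep it
short, so whether untouched pages really stay missing on the host is still an
assumption to be verified.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE


def allocate(num_pages):
    """Map anonymous private memory; nothing is written to it here."""
    return mmap.mmap(-1, num_pages * PAGE_SIZE,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)


def dirty(buf):
    """Write a non-zero value to the first byte of every page."""
    for off in range(0, len(buf), PAGE_SIZE):
        buf[off] = 1


def check(buf):
    """Return the offset of the first non-zero page, or None if all are zero."""
    zero_page = bytes(PAGE_SIZE)
    for off in range(0, len(buf), PAGE_SIZE):
        if buf[off:off + PAGE_SIZE] != zero_page:
            return off
    return None


def run(commands, buf):
    """Process "dirty"/"check" commands, one per line (hypothetical driver)."""
    for line in commands:
        cmd = line.strip()
        if cmd == "dirty":
            dirty(buf)
            print("dirtied", len(buf) // PAGE_SIZE, "pages")
        elif cmd == "check":
            bad = check(buf)
            if bad is not None:
                print("ERROR: non-zero page at offset", bad)
                return False
            print("all pages are zero")
    return True
```

In the guest this would be driven with something like
run(sys.stdin, allocate(N)) (after "import sys"), with N picked large enough
that the "dirty" pass overlaps with the snapshot.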

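Why I'd expect the non-zero pages can be shown with a toy model in Python.
This is not kernel or QEMU code; the class and its fields are invented for
illustration, and it only encodes the behavior discussed in this thread:
write-protect sticks to populated pages only, a wp fault saves the old
content, and with missing mode registered a missing fault saves a zero page.

```python
class ToySnapshot:
    """Toy model of a write-protect based snapshot over a few pages."""

    def __init__(self, pages, register_missing):
        # pages[i] is the page's content (an int), or None if never populated.
        self.pages = list(pages)
        self.register_missing = register_missing
        # Assumption from the thread: only populated pages get protected.
        self.wp = {i for i, p in enumerate(self.pages) if p is not None}
        self.saved = {}  # page index -> content stored in the snapshot

    def guest_write(self, i, value):
        if self.pages[i] is None:
            if self.register_missing:
                # MISSING event: dump the page as a zero page.
                self.saved.setdefault(i, 0)
            # The page gets populated with zeroes either way.
            self.pages[i] = 0
        elif i in self.wp:
            # WP event: save the old content, then lift the protection.
            self.saved.setdefault(i, self.pages[i])
            self.wp.discard(i)
        self.pages[i] = value

    def finish(self):
        # Background pass: save every page that was not saved on a fault.
        for i, p in enumerate(self.pages):
            self.saved.setdefault(i, 0 if p is None else p)
        return [self.saved[i] for i in range(len(self.pages))]


def snapshot_after_writes(register_missing):
    snap = ToySnapshot([7, None], register_missing)
    snap.guest_write(0, 9)  # populated page: caught by wp
    snap.guest_write(1, 5)  # unpopulated page: caught only with missing mode
    return snap.finish()
```

With wp only, snapshot_after_writes(False) ends up holding the new value 5
for the second page, which is the kind of non-zero page "check" would trip
over; with both modes registered, snapshot_after_writes(True) keeps it as 0.
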
Thanks,

-- 
Peter Xu

