On Tue, Feb 25, 2020 at 08:44:56AM +0100, David Hildenbrand wrote:
> On 24.02.20 23:49, Peter Xu wrote:
> > On Fri, Feb 21, 2020 at 05:42:04PM +0100, David Hildenbrand wrote:
> >> When we partially change mappings (esp., mmap over parts of an existing
> >> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
> >> registered, the handler will implicitly be unregistered from the parts that
> >> changed.
> >>
> >> Trying to place pages onto mappings where there is no longer a handler
> >> registered will fail. Let's make sure that any waiter is woken up - we
> >> have to do that manually.
> >>
> >> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
> >>
> >> This is mainly a preparation for RAM blocks with resizable allocations,
> >> where the mapping of the invalid RAM range will change. The source will
> >> keep sending pages that are outside of the new (shrunk) RAM size. We have
> >> to treat these pages like they would have been migrated, but can
> >> essentially simply drop the content (ignore the placement error).
> >>
> >> Keep printing a warning on EINVAL, to avoid hiding other (programming)
> >> issues. ENOENT is unique.
> >>
> >> Cc: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
> >> Cc: Juan Quintela <quint...@redhat.com>
> >> Cc: Peter Xu <pet...@redhat.com>
> >> Cc: Andrea Arcangeli <aarca...@redhat.com>
> >> Signed-off-by: David Hildenbrand <da...@redhat.com>
> >> ---
> >>  migration/postcopy-ram.c | 37 +++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 37 insertions(+)
> >>
> >> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> >> index c68caf4e42..f023830b9a 100644
> >> --- a/migration/postcopy-ram.c
> >> +++ b/migration/postcopy-ram.c
> >> @@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
> >>      range_struct.start = (uintptr_t)host_addr;
> >>      range_struct.len = length;
> >>
> >> +    /*
> >> +     * In case the mapping was partially changed since we enabled userfault
> >> +     * (e.g., via qemu_ram_remap()), the userfaultfd handler was already removed
> >> +     * for the mappings that changed. Unregistering will, however, still work
> >> +     * and ignore mappings without a registered handler.
> >> +     */
> >
> > Ideally we should still only unregister what we have registered.
> > After all we do have this information because we know what we
> > registered, we know what has unmapped (in your new resize() hook, when
> > postcopy_state==RUNNING).
>
> Not in the case of qemu_ram_remap(). And whatever you propose will
> require synchronization (see my other mail) and more complicated
> handling than this. uffd allows you to handle races with mmap changes in
> a very elegant way (e.g., -ENOENT, or unregister ignoring changed mappings).

All writers to the new postcopy_min_length should have BQL already.
The only one left is the last cleanup_range(), where we can take the BQL
for a while. However...

> >
> > An extreme example is when we register with pages in range [A, B),
> > then shrink it to [A, C), then we mapped something else within [C, B)
> > (note, with virtio-mem logically B can be very big and C can be very
> > small, it means [C, B) can cover quite some address space). Then if:
> >
> > - [C, B) memory type is not compatible with uffd, or
>
> That will never happen in the near future. Without resizable allocations:
> - All memory is either anonymous or from a single fd
>
> In addition, right now, only anonymous memory can be used for resizable
> RAM. However, with resizable allocations we could have:
> - All used_length memory is either anonymous or from a single fd
> - All remaining memory is either anonymous or from a single fd
>
> Everything else does not make any sense IMHO and I don't think this is
> relevant long term. You cannot arbitrarily map things into the
> used_length part of a RAMBlock. That would contradict its page_size
> and its fd. E.g., you would break qemu_ram_remap().

... I think this persuaded me. :) You are right they can still be
protected until max_length with PROT_NONE.

Would you mind adding some of the above to the comment above the
unregister of uffd?

Thanks,

--
Peter Xu
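
For reference, the error handling described in the quoted commit message can be sketched roughly as below. This is only an illustration, not the code from the patch: the enum, classify_place_result() and needs_manual_wake() are hypothetical names, and it assumes that ENOENT is the errno reported when a page is placed onto a mapping that no longer has a userfaultfd handler, while EINVAL and everything else still count as real errors that deserve a warning.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of one page placement attempt (not a QEMU type). */
enum place_result {
    PLACE_OK,       /* the page was placed */
    PLACE_DROPPED,  /* no handler on this mapping, content can be discarded */
    PLACE_FAILED,   /* unexpected error, the caller should warn and fail */
};

/*
 * Classify the result of a placement ioctl such as UFFDIO_COPY.
 * ret is the ioctl return value, err is errno captured right after it.
 * Assumption: ENOENT means the target mapping has lost its userfaultfd
 * handler (e.g. after the range was remapped or the block was shrunk).
 */
static enum place_result classify_place_result(int ret, int err)
{
    if (ret == 0) {
        return PLACE_OK;
    }
    if (err == ENOENT) {
        return PLACE_DROPPED;
    }
    /* EINVAL and anything else: keep reporting, do not hide bugs. */
    return PLACE_FAILED;
}

/*
 * Per the commit message, waiters have to be woken up manually when the
 * placement itself could not go through on such a mapping.
 */
static bool needs_manual_wake(enum place_result r)
{
    return r == PLACE_DROPPED;
}
```

A caller would capture errno right after the ioctl, pass both values to classify_place_result(), treat PLACE_DROPPED like a successfully migrated page, and wake any waiter itself in that case.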