On Wed, Jul 12, 2017 at 05:00:04PM +0200, Andrea Arcangeli wrote:
> On Tue, Jul 11, 2017 at 12:22:32PM +0800, Peter Xu wrote:
> > On Wed, Jun 28, 2017 at 08:00:40PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
> > >
> > > Cause the vhost-user client to be woken up whenever:
> > > a) We place a page in postcopy mode
> >
> > Just to make sure I understand it correctly - UFFDIO_COPY will only
> > wake up the waiters on the same userfaultfd context, so we don't need
> > to wake up QEMU userfaultfd (vcpu threads), but we need to explicitly
> > wake up other ufds/threads, like vhost-user backends. Am I right?
>
> Yes.
>
> Every "uffd" represents one and only one "mm" (i.e. a process). So
> there is no way a single UFFDIO_COPY can wake the faults happening on
> a process different from the "mm" the uffd is associated with.
>
> vhost-bridge being a different process requires a UFFDIO_WAKE on its
> own uffd it passed to qemu in addition to the UFFDIO_COPY that, like
> you said, implicitly wakes the userfaults happening on the qemu process
> (vcpus, iothread, dataplane, etc.).
>
> On a side note, there's a way not to wake userfaults implicitly in
> UFFDIO_COPY in case you want to wake userfaults in batches, but nobody
> uses that for now (uffdio_copy.mode |= UFFDIO_COPY_MODE_DONTWAKE).
>
> It'd be theoretically nice to optimize away the additional enter/exit
> kernel introduced by the UFFDIO_WAKE and the translation table as
> well.
>
> What we could do is to add a UFFDIO_BIND that takes an "fd" as
> parameter to the ioctl to bind the two uffds together. Then we could
> push logical offsets in addition to the virtual address ranges when
> calling UFFDIO_REGISTER_LOGICAL (the logical offsets would then match
> the guest physical addresses) so that UFFDIO_COPY_LOGICAL would
> then be able to get a logical range to wake up that the kernel would
> translate into virtual addresses for all uffds bound together.
> Pushing offsets into UFFDIO_REGISTER was David's idea.
>
> That would eliminate the enter/exit kernel for the explicit
> UFFDIO_WAKE and calling a single UFFDIO_COPY would be enough.
>
> Alternatively we should make the uffd work based on file offsets
> instead of virtual addresses, but that would involve changes to
> filesystems and it would only move the needle on top of tmpfs
> (shared=on/off no difference) and hugetlbfs. It would be enough for
> vhost-bridge.
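A minimal C sketch of the wake pattern described above, assuming Linux's <linux/userfaultfd.h> and assuming QEMU holds both its own uffd and the client's uffd (received over the vhost-user socket), plus the address of the same guest page in each process (the userland translation table). The helper names make_copy, make_range, place_page_and_wake and place_pages_batched are hypothetical, for illustration only; this is not QEMU's implementation. Handling of partial copies (reported in copy.copy) and retries is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Build a UFFDIO_COPY request. With dontwake set, the kernel fills the
 * page but leaves the faulting threads of this uffd asleep. */
static struct uffdio_copy make_copy(uint64_t dst, const void *src,
                                    uint64_t len, int dontwake)
{
    struct uffdio_copy c;

    assert(len > 0);
    memset(&c, 0, sizeof(c));
    c.dst = dst;
    c.src = (uint64_t)(uintptr_t)src;
    c.len = len;
    c.mode = dontwake ? UFFDIO_COPY_MODE_DONTWAKE : 0;
    return c;
}

/* Build the range argument for UFFDIO_WAKE. */
static struct uffdio_range make_range(uint64_t start, uint64_t len)
{
    struct uffdio_range r = { .start = start, .len = len };
    return r;
}

/*
 * Hypothetical helper: place one page through QEMU's uffd, then wake the
 * client's faults on the same page. qemu_hva and client_hva address that
 * page in each process. Returns 0 or -errno.
 */
static int place_page_and_wake(int qemu_uffd, uint64_t qemu_hva,
                               int client_uffd, uint64_t client_hva,
                               const void *src, uint64_t len)
{
    struct uffdio_copy copy = make_copy(qemu_hva, src, len, 0);
    struct uffdio_range range = make_range(client_hva, len);

    /* Fills the page and implicitly wakes QEMU's own faulting threads. */
    if (ioctl(qemu_uffd, UFFDIO_COPY, &copy) < 0)
        return -errno;
    /* The client is a different mm, so it needs its own explicit wake. */
    if (ioctl(client_uffd, UFFDIO_WAKE, &range) < 0)
        return -errno;
    return 0;
}

/*
 * Hypothetical batched variant: copy npages contiguous pages with
 * DONTWAKE, then issue one UFFDIO_WAKE per uffd over the whole range.
 */
static int place_pages_batched(int qemu_uffd, uint64_t qemu_hva,
                               int client_uffd, uint64_t client_hva,
                               const char *src, uint64_t page_size,
                               uint64_t npages)
{
    struct uffdio_range qemu_range = make_range(qemu_hva, page_size * npages);
    struct uffdio_range client_range = make_range(client_hva, page_size * npages);

    for (uint64_t i = 0; i < npages; i++) {
        struct uffdio_copy copy = make_copy(qemu_hva + i * page_size,
                                            src + i * page_size, page_size, 1);
        if (ioctl(qemu_uffd, UFFDIO_COPY, &copy) < 0)
            return -errno;
    }
    /* DONTWAKE suppressed the implicit wake, so QEMU needs one too. */
    if (ioctl(qemu_uffd, UFFDIO_WAKE, &qemu_range) < 0)
        return -errno;
    if (ioctl(client_uffd, UFFDIO_WAKE, &client_range) < 0)
        return -errno;
    return 0;
}
```

The batched variant gives up UFFDIO_COPY's implicit wake in exchange for a single explicit UFFDIO_WAKE on each uffd, so it only saves syscalls when several pages are placed back to back.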

Really glad to learn about these ideas.

> Usually the uffd fault lives at the higher level of the virtual memory
> subsystem and never deals with file offsets, so if we can get away with
> logical ranges per-uffd for UFFDIO_REGISTER and UFFDIO_COPY, it may be
> simpler and easier to extend automatically to all memory types
> supported by uffd (including anon, which has no file offset).
>
> No major improvement is to be expected from such an enhancement though,
> so it's not very high priority to implement. It's not even clear if
> the complexity is worth it. Doing one more syscall per page, I think,
> might be measurable only on a very fast network. The current way of
> operation, where uffds are independent of each other and the translation
> table is transferred by userland means, is quite optimal already and
> much simpler. Furthermore, for hugetlbfs the performance difference
> most certainly wouldn't be measurable, as the enter/exit kernel would
> be diluted by a factor of 512 compared to 4k userfaults.

Indeed, performance-critical scenarios should be using huge pages, and
that means the extra WAKE will have an even smaller impact.

Thanks Andrea!

--
Peter Xu