On Tue, Jul 11, 2017 at 12:22:32PM +0800, Peter Xu wrote:
> On Wed, Jun 28, 2017 at 08:00:40PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
> > 
> > Cause the vhost-user client to be woken up whenever:
> >   a) We place a page in postcopy mode
> 
> Just to make sure I understand it correctly - UFFDIO_COPY will only
> wake up the waiters on the same userfaultfd context, so we don't need
> to wake up QEMU userfaultfd (vcpu threads), but we need to explicitly
> wake up other ufds/threads, like vhost-user backends. Am I right?

Yes.

Every "uffd" represents one and only one "mm" (i.e. a process). So
there is no way a single UFFDIO_COPY can wake the faults happening on
a process different from the "mm" the uffd is associated with.

vhost-bridge, being a different process, requires a UFFDIO_WAKE on its
own uffd (the one it passed to qemu) in addition to the UFFDIO_COPY,
which, as you said, implicitly wakes the userfaults happening in the
qemu process (vcpus, iothread, dataplane, etc.).
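
As a rough illustration of the sequence just described (not taken from
the patch series), the two ioctls could be issued as below. The helper
names and the idea of passing both file descriptors to one function are
assumptions made for this sketch; only struct uffdio_copy, struct
uffdio_range, UFFDIO_COPY and UFFDIO_WAKE come from the kernel header.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Build a UFFDIO_COPY request; mode 0 means the copy wakes waiters on
 * the same uffd implicitly. */
struct uffdio_copy make_copy_req(uint64_t dst, uint64_t src, uint64_t len)
{
    struct uffdio_copy c;
    memset(&c, 0, sizeof(c));
    c.dst = dst;
    c.src = src;
    c.len = len;
    c.mode = 0;
    return c;
}

/* Build the range argument for UFFDIO_WAKE. */
struct uffdio_range make_wake_req(uint64_t start, uint64_t len)
{
    struct uffdio_range r;
    memset(&r, 0, sizeof(r));
    r.start = start;
    r.len = len;
    return r;
}

/* Hypothetical helper: place one page through qemu's uffd, then wake
 * the same page explicitly on the uffd the client handed over.
 * client_addr is the page's address in the client's address space.
 * Returns 0 on success, -1 if either ioctl fails (errno is set). */
int place_page_and_wake_client(int qemu_uffd, int client_uffd,
                               uint64_t qemu_dst, uint64_t src,
                               uint64_t client_addr, uint64_t len)
{
    struct uffdio_copy c = make_copy_req(qemu_dst, src, len);
    if (ioctl(qemu_uffd, UFFDIO_COPY, &c) == -1)
        return -1;

    struct uffdio_range r = make_wake_req(client_addr, len);
    if (ioctl(client_uffd, UFFDIO_WAKE, &r) == -1)
        return -1;
    return 0;
}
```

The second ioctl is the extra kernel enter/exit per page that the rest
of this mail is about.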

On a side note, there's a way not to wake userfaults implicitly in
UFFDIO_COPY, in case you want to wake userfaults in batches, but nobody
uses that for now (uffdio_copy.mode |= UFFDIO_COPY_MODE_DONTWAKE).
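
A minimal sketch of what such batching might look like, assuming a
contiguous run of pages on a single uffd. The function names are
hypothetical; the flag and the ioctls are the ones from
linux/userfaultfd.h.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Build a copy request that does NOT wake waiters by itself. */
struct uffdio_copy make_dontwake_copy(uint64_t dst, uint64_t src,
                                      uint64_t len)
{
    struct uffdio_copy c;
    memset(&c, 0, sizeof(c));
    c.dst = dst;
    c.src = src;
    c.len = len;
    c.mode |= UFFDIO_COPY_MODE_DONTWAKE;
    return c;
}

/* Hypothetical helper: copy npages pages without waking anybody, then
 * wake the whole range with one UFFDIO_WAKE.
 * Returns 0 on success, -1 on the first failing ioctl. On failure no
 * wake is issued here, so a real caller would still have to wake the
 * pages that were already placed. */
int copy_batch_then_wake(int uffd, uint64_t dst, uint64_t src,
                         uint64_t page_size, unsigned int npages)
{
    for (unsigned int i = 0; i < npages; i++) {
        struct uffdio_copy c = make_dontwake_copy(dst + i * page_size,
                                                  src + i * page_size,
                                                  page_size);
        if (ioctl(uffd, UFFDIO_COPY, &c) == -1)
            return -1;
    }

    struct uffdio_range r;
    memset(&r, 0, sizeof(r));
    r.start = dst;
    r.len = page_size * npages;
    if (ioctl(uffd, UFFDIO_WAKE, &r) == -1)
        return -1;
    return 0;
}
```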

It'd be theoretically nice to optimize away the additional kernel
enter/exit introduced by the UFFDIO_WAKE, and the translation table as
well.
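
For reference, the translation table amounts to something like the
lookup below: given an address in qemu's mapping of guest memory, find
the matching address in the client's mapping so it can be passed to
UFFDIO_WAKE on the client's uffd. The struct layout and names are
assumptions made for this sketch, not the format the series actually
uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one shared memory region as mapped in
 * qemu and in the vhost-user client. */
struct region_map {
    uint64_t qemu_base;   /* start of the region in qemu's mm */
    uint64_t client_base; /* start of the same region in the client's mm */
    uint64_t size;        /* region length in bytes */
};

/* Translate qemu_addr into the client's address space.
 * Returns 0 and stores the result in *client_addr if some region
 * covers qemu_addr, -1 otherwise. */
int qemu_to_client_addr(const struct region_map *tab, size_t n,
                        uint64_t qemu_addr, uint64_t *client_addr)
{
    for (size_t i = 0; i < n; i++) {
        if (qemu_addr >= tab[i].qemu_base &&
            qemu_addr - tab[i].qemu_base < tab[i].size) {
            *client_addr = tab[i].client_base +
                           (qemu_addr - tab[i].qemu_base);
            return 0;
        }
    }
    return -1;
}
```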

What we could do is add a UFFDIO_BIND ioctl that takes an "fd" as
parameter to bind the two uffds together. Then we could push logical
offsets in addition to the virtual address ranges when calling
UFFDIO_REGISTER_LOGICAL (the logical offsets would then match the
guest physical addresses), so that UFFDIO_COPY_LOGICAL would be able
to take a logical range to wake up, which the kernel would translate
into virtual addresses for all the uffds bound together. Pushing
offsets into UFFDIO_REGISTER was David's idea.

That would eliminate the kernel enter/exit for the explicit
UFFDIO_WAKE, and calling a single UFFDIO_COPY would be enough.

Alternatively we could make the uffd work based on file offsets
instead of virtual addresses, but that would involve changes to
filesystems and it would only move the needle on top of tmpfs
(shared=on/off makes no difference) and hugetlbfs. It would be enough
for vhost-bridge.

Usually the uffd fault lives at the higher level of the virtual memory
subsystem and never deals with file offsets, so if we can get away with
logical ranges per-uffd for UFFDIO_REGISTER and UFFDIO_COPY, it may be
simpler and easier to extend automatically to all memory types
supported by uffd (including anon, which has no file offset).

No major improvement is to be expected from such an enhancement,
though, so it's not very high priority to implement. It's not even
clear if the complexity is worth it. Doing one more syscall per page,
I think, might be measurable only on a very fast network. The current
way of operation, where the uffds are independent of each other and the
translation table is transferred by userland means, is quite optimal
already and much simpler. Furthermore, for hugetlbfs the performance
difference almost certainly wouldn't be measurable, as the kernel
enter/exit would be diluted by a factor of 512 compared to 4k
userfaults.
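
The factor of 512 is just the ratio of the page sizes, assuming 2 MiB
hugetlbfs pages (the x86-64 default; other hugepage sizes give other
ratios). As a trivial worked example, with a made-up function name:

```c
#include <stdint.h>

/* Number of base-page userfaults that one hugepage userfault stands
 * in for, i.e. how much the per-fault syscall overhead is diluted. */
uint64_t faults_per_hugepage(uint64_t hugepage_size,
                             uint64_t base_page_size)
{
    return hugepage_size / base_page_size;
}
```

With hugepage_size = 2 MiB and base_page_size = 4 KiB this gives
2097152 / 4096 = 512.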

Thanks,
Andrea
