On 01/12/2025 18:35, Peter Xu wrote:
On Mon, Dec 01, 2025 at 04:48:22PM +0000, Nikita Kalyazin wrote:
I believe I found the precise point where we convinced ourselves that minor
support was sufficient: [1].  If at this moment we don't find that reasoning
valid anymore, then indeed implementing missing is the only option.

[1] https://lore.kernel.org/kvm/[email protected]

Now after I re-read the discussion, I may have made a wrong statement
there, sorry.  I could have got slightly confused on when the write()
syscall can be involved.

I agree that if you want to get an event on a cache miss, with the current
uffd definitions and with pre-population forbidden, then a MISSING trap is
required.  That holds with or without UFFDIO_COPY being available.

Do I understand it right that UFFDIO_COPY is not allowed in your case, but
only write()?

No, UFFDIO_COPY would work perfectly fine.  We will still use write()
whenever we resolve stage-2 faults, as they aren't visible to UFFD.  When a
userfault occurs at an offset that already has a page in the cache, we will
have to keep using UFFDIO_CONTINUE, so it looks like both will be required:

 - user mapping major fault -> UFFDIO_COPY (fills the cache and sets up userspace PT)
 - user mapping minor fault -> UFFDIO_CONTINUE (only sets up userspace PT)
 - stage-2 fault -> write() (only fills the cache)
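
The mapping above could be sketched as a small dispatch function.  This is
only an illustration: the enum and function names are made up for this
example and are not kernel or userfaultfd identifiers, and a real handler
would issue the corresponding ioctl() or write() rather than return a tag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not part of any uffd API. */
enum fault_source {
    FAULT_USER_MAPPING, /* fault on the userspace mapping, visible to UFFD */
    FAULT_STAGE2,       /* stage-2 fault, not visible to UFFD */
};

enum resolution {
    RESOLVE_UFFDIO_COPY,     /* fills the cache and sets up userspace PT */
    RESOLVE_UFFDIO_CONTINUE, /* only sets up userspace PT */
    RESOLVE_WRITE,           /* only fills the cache */
};

/*
 * Pick how a fault is resolved.  page_in_cache says whether the offset
 * already has a page in the cache (minor fault) or not (major fault).
 * The list above does not split stage-2 faults by cache state, so this
 * sketch ignores page_in_cache for them.
 */
enum resolution pick_resolution(enum fault_source src, bool page_in_cache)
{
    assert(src == FAULT_USER_MAPPING || src == FAULT_STAGE2);

    if (src == FAULT_STAGE2)
        return RESOLVE_WRITE;

    return page_in_cache ? RESOLVE_UFFDIO_CONTINUE : RESOLVE_UFFDIO_COPY;
}
```

Here only the user-mapping branch depends on the cache state, which is why
both UFFDIO_COPY and UFFDIO_CONTINUE stay in use alongside write().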


One way to work around this might be to introduce a new UFFD_FEATURE bit
allowing the MINOR registration to trap all pgtable faults, which would
change the MINOR fault semantics.

This would work equally well for us.  I suppose these MINOR+MAJOR semantics
would be more intrusive from the API point of view, though.


That'll need some further thought; meanwhile we may also want to make sure
the old shmem/hugetlbfs semantics are kept (e.g. they should fail MINOR
registration if the new feature bit is enabled in the ctx somehow; or we
support them properly in the codebase).

Thanks,

--
Peter Xu


