On Tue, Jan 31, 2023 at 02:48:54PM -0500, Peter Xu wrote:
> On Thu, Jan 26, 2023 at 12:26:45PM -0500, Peter Xu wrote:
> > On Thu, Jan 26, 2023 at 03:59:33PM +0000, Daniel P. Berrangé wrote:
> > > On Thu, Jan 26, 2023 at 10:25:05AM -0500, Peter Xu wrote:
> > > > On Thu, Jan 26, 2023 at 02:15:11PM +0000, Dr. David Alan Gilbert wrote:
> > > > > * Michal Prívozník (mpriv...@redhat.com) wrote:
> > > > > > On 1/25/23 23:40, Peter Xu wrote:
> > > > > > > The new /dev/userfaultfd handle is superior to the system call, with
> > > > > > > better permission control, and it also works in a restricted seccomp
> > > > > > > environment.
> > > > > > >
> > > > > > > The new device was only introduced in v6.1, so we need a header update.
> > > > > > >
> > > > > > > Please have a look, thanks.
> > > > > >
> > > > > > I was wondering whether it would make sense/be possible for the mgmt app
> > > > > > (libvirt) to pass an FD for /dev/userfaultfd instead of QEMU opening it
> > > > > > itself. But looking into the code, libvirt would need to do that when
> > > > > > spawning QEMU, because that's when QEMU itself initializes its internal
> > > > > > state and queries userfaultfd caps.
> > > > >
> > > > > You also have to be careful about what the userfaultfd semantics are; I
> > > > > can't remember them, but if you open it in one process and pass it to
> > > > > another process, which process's address space are you trying to
> > > > > monitor?
> > > >
> > > > Yes, it's a problem. The kernel always fetches the current mm_struct*,
> > > > which represents the current virtual address space context, when
> > > > creating the uffd handle (for either the syscall or the ioctl() approach).
> > >
> > > At what point does the process address space get associated? When
> > > /dev/userfaultfd is opened, or only when ioctl(USERFAULTFD_IOC_NEW)
> > > is called? If it is the former, then we have no choice: QEMU must open
> > > it. If it is the latter, then libvirt can open /dev/userfaultfd and pass
> > > it to QEMU, which can then do the ioctl(USERFAULTFD_IOC_NEW).
> >
> > Good point. It should be the latter, so it should be doable.
> >
> > What would be the best interface for QEMU to detect the fd being passed to
> > it? IIUC qemu_open() requires the name to be /dev/fdset/*, but there's no
> > existing cmdline option that tells QEMU which fd number to fetch from the
> > fdset to use as the /dev/userfaultfd descriptor.
> >
> > monitor_get_fd() seems more appropriate: we can define a unique string so
> > libvirt can preset the descriptor with the same string attached to it, and
> > then I can try monitor_get_fd() first, before trying open() or the syscall.
> > Daniel/Michal, any input here from the libvirt side?
>
> I just noticed that monitor_get_fd() is bound to a specific monitor, so
> it's not clear which one belongs to libvirt. If we use qemu_open() and
> add-fd, I think we need another QEMU cmdline option to set the fd path, iiuc.
>
> I can also leave that for later if opening /dev/userfaultfd already
> resolves the immediate problem in containers.
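
For illustration only, the creation order discussed above (use a descriptor preset by the management app if there is one, else open /dev/userfaultfd, else fall back to the legacy userfaultfd syscall) might look roughly like the C sketch below. It assumes Linux and the v6.1 uapi values for USERFAULTFD_IOC_NEW; uffd_create() and uffd_handshake() are hypothetical helper names, not QEMU functions, and how QEMU would actually receive the preset descriptor (monitor_get_fd(), add-fd, or a new cmdline option) is still open in this thread, so here it is simply a parameter.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Pre-v6.1 kernel headers lack the /dev/userfaultfd ioctl; these values
 * mirror the v6.1 uapi header (this is what the "header update" is for). */
#ifndef USERFAULTFD_IOC_NEW
#define USERFAULTFD_IOC 0xAA
#define USERFAULTFD_IOC_NEW _IO(USERFAULTFD_IOC, 0x00)
#endif

/* Hypothetical helper: return a userfaultfd bound to the CALLING process's
 * address space, or -1 on failure. preset_dev_fd is an already-open
 * /dev/userfaultfd descriptor handed over by a management app, or -1 to
 * open the device here. A preset descriptor stays owned by the caller. */
static int uffd_create(int preset_dev_fd, int flags)
{
    int dev = preset_dev_fd >= 0 ? preset_dev_fd
                                 : open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);

    if (dev >= 0) {
        /* The mm_struct is captured here, by whoever issues the ioctl,
         * not by whoever called open(). */
        int uffd = ioctl(dev, USERFAULTFD_IOC_NEW, flags);
        if (dev != preset_dev_fd)
            close(dev); /* only close what we opened ourselves */
        if (uffd >= 0)
            return uffd;
    }

    /* Fall back to the legacy system call, which a restrictive seccomp
     * policy or sysctl setting may reject. */
#ifdef __NR_userfaultfd
    return (int)syscall(__NR_userfaultfd, flags);
#else
    errno = ENOSYS;
    return -1;
#endif
}

/* Mandatory UFFDIO_API handshake before any other ioctl on a new uffd. */
static int uffd_handshake(int uffd)
{
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    return ioctl(uffd, UFFDIO_API, &api);
}
```

Because the address space is bound at ioctl(USERFAULTFD_IOC_NEW) time, a /dev/userfaultfd descriptor opened by libvirt and passed down would still yield a uffd that monitors QEMU's own memory, as long as QEMU issues the ioctl itself.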

I don't have any great ideas really. If we assume /dev/userfaultfd is
accessible to QEMU, we can ignore it.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|