On Tue, May 28, 2013 at 08:17:42PM +0300, Michael S. Tsirkin wrote:
> On Tue, May 28, 2013 at 12:00:38PM -0500, Anthony Liguori wrote:
> > Julian Stecklina <jstec...@os.inf.tu-dresden.de> writes:
> >
> > > On 05/28/2013 12:10 PM, Luke Gorrie wrote:
> > >> On 27 May 2013 11:34, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> > >>
> > >> vhost_net is about connecting a virtio-net speaking process to a
> > >> tun-like device. The problem you are trying to solve is connecting a
> > >> virtio-net speaking process to Snabb Switch.
> > >>
> > >> Yep!
> > >
> > > Since I am on a similar path as Luke, let me share another idea.
> > >
> > > What about extending qemu in a way to allow PCI device models to be
> > > implemented in another process.
> >
> > We aren't going to support any interface that enables out-of-tree
> > devices. This is just plugins in a different form with even more
> > downsides. You cannot easily keep track of dirty info, and the guest
> > physical address translation to host is difficult to keep in sync
> > (imagine the complexity of memory hotplug).
> >
> > Basically, it's easy to hack up but extremely hard to do something that
> > works correctly overall.
> >
> > There isn't a compelling reason to implement something like this other
> > than avoiding getting code into QEMU. Best to just submit your device
> > to QEMU for inclusion.
> >
> > If you want to avoid copying in a vswitch, better to use something like
> > vmsplice as I outlined in another thread.
> >
> > > This is not as hard as it may sound.
> > > qemu would open a domain socket to this process and map VM memory over
> > > to the other side. This can be accomplished by having file descriptors
> > > in qemu to VM memory (reusing -mem-path code) and passing those over the
> > > domain socket. The other side can then just mmap them. The socket would
> > > also be used for configuration and I/O by the guest on the PCI
> > > I/O/memory regions. You could also use this to do IRQs or use eventfds,
> > > whatever works better.
> > >
> > > To have a zero-copy userspace switch, the switch would offer virtio-net
> > > devices to any qemu that wants to connect to it and implement the
> > > complete device logic itself. Since it has access to all guest memory,
> > > it can just do memcpy for packet data. Of course, this only works for
> > > 64-bit systems, because you need vast amounts of virtual address space.
> > > In my experience, doing this in userspace is _way less painful_.
> > >
> > > If you can get away with polling in the switch, the overhead of doing
> > > all this in userspace is zero. And as long as you can rate-limit
> > > explicit notifications over the socket, even that overhead should be
> > > okay.
> > >
> > > Opinions?
> >
> > I don't see any compelling reason to do something like this. It's
> > jumping through a tremendous number of hoops to avoid putting code that
> > belongs in QEMU in tree.
> >
> > Regards,
> >
> > Anthony Liguori
> >
> > > Julian
>
> OTOH an in-tree device that runs in a separate process would
> be useful, e.g. for security.
> For example, we could limit a virtio-net device process
> to only access tap and vhost files.
Restricting the process to tap or vhost files only is good for security.
I'm not sure it has many advantages over a QEMU process under SELinux,
though.

Obviously, when the switch process has shared memory access to multiple
guests' RAM, the security is worse than a QEMU process solution but
better than a vhost kernel solution.

So the security story is not a clear win.

Stefan
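
To make the fd-passing scheme Julian describes more concrete, here is a
minimal sketch of the receiving (switch) side: it accepts a connection on
a Unix domain socket, receives one guest-memory file descriptor sent with
SCM_RIGHTS, and mmaps it. The socket path, the single-region memory
layout, and sending the region size in the same message are illustrative
assumptions for this sketch, not what QEMU's -mem-path code or any
existing switch actually does.

/* Sketch only: receive one guest-RAM fd over a Unix domain socket
 * (sent with SCM_RIGHTS) and map it, so virtio rings and packet
 * buffers can be accessed with plain loads/stores and memcpy. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/un.h>

/* Receive one fd plus the region size (size sent in the data part
 * of the same message, an arbitrary choice for this example). */
static int recv_guest_mem_fd(int sock, size_t *size)
{
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;        /* ensure cmsg alignment */
    } ctrl;
    struct iovec iov = { .iov_base = size, .iov_len = sizeof(*size) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    strncpy(addr.sun_path, "/tmp/switch.sock", sizeof(addr.sun_path) - 1);

    int listener = socket(AF_UNIX, SOCK_STREAM, 0);
    unlink(addr.sun_path);
    if (listener < 0 ||
        bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0) {
        perror("socket/bind/listen");
        return 1;
    }

    int conn = accept(listener, NULL, NULL);
    size_t size = 0;
    int mem_fd = recv_guest_mem_fd(conn, &size);
    if (mem_fd < 0) {
        fprintf(stderr, "no guest memory fd received\n");
        return 1;
    }

    /* Map guest RAM into the switch's address space. */
    void *guest_ram = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_SHARED, mem_fd, 0);
    if (guest_ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    printf("mapped %zu bytes of guest RAM at %p\n", size, guest_ram);
    /* ... virtio-net device emulation would run here ... */
    munmap(guest_ram, size);
    return 0;
}

A real implementation would need one mapping per RAM region, a protocol
for device configuration and for doorbell/interrupt eventfds, and some
way to track dirty pages and memory hotplug -- exactly the bookkeeping
Anthony points out is hard to keep in sync from outside QEMU.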