On Fri, Sep 11, 2015 at 05:39:07PM +0200, Claudio Fontana wrote:
> On 09.09.2015 09:06, Michael S. Tsirkin wrote:
> > On Mon, Sep 07, 2015 at 02:38:34PM +0200, Claudio Fontana wrote:
> >> Coming late to the party,
> >>
> >> On 31.08.2015 16:11, Michael S. Tsirkin wrote:
> >>> Hello!
> >>> During the KVM forum, we discussed supporting virtio on top
> >>> of ivshmem. I have considered it, and came up with an alternative
> >>> that has several advantages over that - please see below.
> >>> Comments welcome.
> >>
> >> As Jan mentioned, we actually discussed a virtio-shmem device which would
> >> incorporate the advantages of ivshmem (so no need for a separate ivshmem
> >> device), which would use the well-known virtio interface, taking advantage
> >> of the new virtio-1 virtqueue layout to split r/w and read-only rings as
> >> seen from the two sides, and make use also of BAR0, which has been freed up
> >> for use by the device.
> >>
> >> This way it would be possible to share the rings and the actual memory for
> >> the buffers in the PCI BARs. The guest VMs could decide to use the shared
> >> memory regions directly as prepared by the hypervisor (in the jailhouse
> >> case) or QEMU/KVM, or perform their own validation on the input depending
> >> on the use case.
> >>
> >> Of course the communication between VMs needs in this case to be
> >> pre-configured and is quite static (which is actually beneficial in our
> >> use case).
> >>
> >> But still, in your proposed solution, each VM needs to be pre-configured to
> >> communicate with a specific other VM using a separate device, right?
> >>
> >> But I wonder if we are addressing the same problem: in your case you are
> >> looking at having a shared memory pool for all VMs, potentially visible to
> >> all VMs (the vhost-user case), while in the virtio-shmem proposal we
> >> discussed we were assuming specific, different regions for every channel.
> >>
> >> Ciao,
> >>
> >> Claudio
> >
> > The problem, as I see it, is to allow inter-VM communication with
> > polling (to get very low latencies), but polling within VMs only, without
> > the need to run a host thread (which, when polling, uses up a host CPU).
> >
> > What was proposed was to simply change virtio to allow
> > "offset within BAR" instead of PA.
>
> There are many consequences to this; offset within BAR alone is not enough,
> and there are multiple things at the virtio level that need sorting out.
> Also we need to consider virtio-mmio etc.
>
> > This would allow VM2VM communication if there are only 2 VMs,
> > but if data needs to be sent to multiple VMs, you
> > must copy it.
>
> Not necessarily; however, getting it to work (sharing the backend window and
> arbitrating the multicast) is really hard.
>
> >
> > Additionally, it's a single-purpose feature: you can use it from
> > a userspace PMD, but Linux will never use it.
> >
> > My proposal is a superset: don't require that BAR memory is
> > used; use IOMMU translation tables.
> > This way, data can be sent to multiple VMs by sharing the same
> > memory with them all.
>
> Can you describe in detail how your proposal deals with the arbitration
> necessary for multicast handling?
Basically it falls out naturally. Consider a Linux guest as an example, and
assume dynamic mappings for simplicity. Multicast is done by a bridge on the
guest side. That code clones the skb (reference-counting the page fragments)
and passes it to multiple ports. Each of these will program the IOMMU to allow
read access to the fragments for the relevant device. (A rough sketch of what
this could look like is at the end of this mail.)

> >
> > It is still possible to put data in some device BAR if that's
> > what the guest wants to do: just program the IOMMU to limit
> > virtio to the memory range that is within this BAR.
> >
> > Another advantage here is that the feature is more generally useful.
> >
> >>>
> >>> -----
> >>>
> >>> Existing solutions to userspace switching between VMs on the
> >>> same host are vhost-user and ivshmem.
> >>>
> >>> vhost-user works by mapping the memory of all VMs being bridged into the
> >>> switch's memory space.
> >>>
> >>> By comparison, ivshmem works by exposing a shared region of memory to all
> >>> VMs. VMs are required to use this region to store packets. The switch only
> >>> needs access to this region.
> >>>
> >>> Another difference between vhost-user and ivshmem surfaces when polling
> >>> is used. With vhost-user, the switch is required to handle
> >>> data movement between VMs; if polling is used, this means that one host CPU
> >>> needs to be sacrificed for this task.
> >>>
> >>> This is easiest to understand when one of the VMs is
> >>> used with VF pass-through. This can be shown schematically as below:
> >>>
> >>> +-- VM1 --------------+            +---VM2------------+
> >>> | virtio-pci          +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- NIC
> >>> +---------------------+            +------------------+
> >>>
> >>> With ivshmem, in theory, communication can happen directly, with the two VMs
> >>> polling the shared memory region.
> >>>
> >>> I won't spend time listing advantages of vhost-user over ivshmem.
> >>> Instead, having identified two advantages of ivshmem over vhost-user,
> >>> below is a proposal to extend vhost-user to gain the advantages
> >>> of ivshmem.
> >>>
> >>> 1. virtio in the guest can be extended to allow support
> >>> for IOMMUs. This provides the guest with full flexibility
> >>> about which memory is readable or writable by each device.
> >>> By setting up a virtio device for each other VM we need to
> >>> communicate with, the guest gets full control of its security: from
> >>> mapping all memory (like with current vhost-user), to only
> >>> mapping buffers used for networking (like ivshmem), to
> >>> transient mappings for the duration of data transfer only.
> >>> This also allows use of VFIO within guests, for improved
> >>> security.
> >>>
> >>> vhost-user would need to be extended to send the
> >>> mappings programmed by the guest IOMMU.
> >>>
> >>> 2. qemu can be extended to serve as a vhost-user client:
> >>> receive remote VM mappings over the vhost-user protocol, and
> >>> map them into another VM's memory.
> >>> This mapping can take, for example, the form of
> >>> a BAR of a PCI device, which I'll call here vhost-pci -
> >>> with bus addresses allowed
> >>> by VM1's IOMMU mappings being translated into
> >>> offsets within this BAR within VM2's physical
> >>> memory space.
> >>>
> >>> Since the translation can be a simple one, VM2
> >>> can perform it within its vhost-pci device driver.
> >>>
> >>> While this setup would be the most useful with polling,
> >>> VM1's ioeventfd can also be mapped to
> >>> another VM2's irqfd, and vice versa, such that VMs
> >>> can trigger interrupts to each other without the need
> >>> for a helper thread on the host.
> >>>
> >>> The resulting channel might look something like the following:
> >>>
> >>> +-- VM1 --------------+  +---VM2-----------+
> >>> | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> >>> +---------------------+  +-----------------+
> >>>
> >>> Comparing the two diagrams, a vhost-user thread on the host is
> >>> no longer required, reducing the host CPU utilization when
> >>> polling is active. At the same time, VM2 cannot access all of VM1's
> >>> memory - it is limited by the IOMMU configuration set up by VM1.
> >>>
> >>> Advantages over ivshmem:
> >>>
> >>> - more flexibility: endpoint VMs do not have to place data at any
> >>>   specific locations to use the device; in practice this likely
> >>>   means fewer data copies.
> >>> - better standardization/code reuse:
> >>>   virtio changes within guests would be fairly easy to implement
> >>>   and would also benefit other backends besides vhost-user;
> >>>   standard hotplug interfaces can be used to add and remove these
> >>>   channels as VMs are added or removed.
> >>> - migration support:
> >>>   it's easy to implement since ownership of memory is well defined.
> >>>   For example, during migration VM2 can notify the hypervisor of VM1
> >>>   by updating the dirty bitmap each time it writes into VM1's memory.
> >>>
> >>> Thanks,
> >>
> >>
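
To make the multicast answer above a bit more concrete, here is a rough,
untested sketch in guest-Linux terms of how a bridge could hand the same
payload to several such devices without copying: skb_clone() shares the data
by reference, and each port's driver maps it through the DMA API, which with
a guest-visible IOMMU yields a per-device mapping of the same pages. The port
array and the enqueue step are placeholders, and fragment handling plus most
error paths are omitted.

/* Untested sketch - not part of the proposal itself. */
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/dma-mapping.h>

static void br_multicast_xmit(struct sk_buff *skb,
                              struct net_device **ports, int nports)
{
        int i;

        for (i = 0; i < nports; i++) {
                struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);
                struct device *dma_dev = ports[i]->dev.parent;
                dma_addr_t addr;

                if (!clone)
                        continue;

                /*
                 * One IOMMU mapping per target device, all pointing at the
                 * same underlying pages - no data copy.  Page fragments
                 * would be mapped the same way with dma_map_page().
                 */
                addr = dma_map_single(dma_dev, clone->data,
                                      skb_headlen(clone), DMA_TO_DEVICE);
                if (dma_mapping_error(dma_dev, addr)) {
                        kfree_skb(clone);
                        continue;
                }

                /*
                 * 'addr' is the bus address that goes into this port's
                 * virtqueue descriptor; enqueueing the clone on port i
                 * is left out here.
                 */
        }
        consume_skb(skb);
}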
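
Likewise, here is one possible reading of the "simple translation" that VM2's
vhost-pci driver would perform, quoted above. The driver only needs to know
which VM1 bus-address ranges were granted by VM1's IOMMU and where each of
them sits inside the BAR; all structure and field names below are made up for
illustration, since the proposal does not specify the layout of such a table.

/* Untested sketch - hypothetical region table and names. */
#include <stdint.h>
#include <stddef.h>

struct vhost_pci_region {
        uint64_t bus_addr;   /* start of the range as VM1's driver sees it */
        uint64_t size;
        uint64_t bar_offset; /* where that range sits inside the BAR */
};

struct vhost_pci_dev {
        void *bar;                        /* vhost-pci BAR mapped in VM2 */
        struct vhost_pci_region *regions; /* e.g. read from device config */
        unsigned int nregions;
};

/*
 * Translate a descriptor address taken from VM1's virtqueue into a pointer
 * usable by VM2; returns NULL if VM1 never granted that range.
 */
static void *vhost_pci_map(struct vhost_pci_dev *d, uint64_t bus_addr,
                           uint64_t len)
{
        unsigned int i;

        for (i = 0; i < d->nregions; i++) {
                struct vhost_pci_region *r = &d->regions[i];

                if (bus_addr >= r->bus_addr &&
                    bus_addr + len <= r->bus_addr + r->size)
                        return (uint8_t *)d->bar + r->bar_offset +
                               (bus_addr - r->bus_addr);
        }
        return NULL;
}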
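
Finally, a rough sketch of the ioeventfd/irqfd pairing mentioned above: the
same eventfd is registered as an ioeventfd for VM1's doorbell and as an irqfd
for VM2, so a notification needs no host thread. In reality the two VMs live
in different QEMU processes and the eventfd would be passed over the
vhost-user socket; both VM fds appear in one function here only to keep the
example short, and the doorbell address and GSI are placeholders.

/* Untested sketch - the eventfd wiring only, no error cleanup. */
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int wire_doorbell_to_irq(int vm1_fd, int vm2_fd,
                                uint64_t doorbell_gpa, uint32_t gsi)
{
        int efd = eventfd(0, 0);
        struct kvm_ioeventfd io = {
                .addr = doorbell_gpa,  /* VM1's virtio notify register */
                .len  = 2,             /* 16-bit queue-notify write */
                .fd   = efd,
        };
        struct kvm_irqfd irq = {
                .fd  = efd,
                .gsi = gsi,            /* interrupt to inject into VM2 */
        };

        if (efd < 0)
                return -1;

        /* KVM signals 'efd' whenever VM1 writes to doorbell_gpa ... */
        if (ioctl(vm1_fd, KVM_IOEVENTFD, &io) < 0)
                return -1;

        /* ... and the very same 'efd', as an irqfd, raises 'gsi' in VM2. */
        return ioctl(vm2_fd, KVM_IRQFD, &irq);
}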