On 2017-01-20 12:54, Wang, Wei W wrote:
> On Tuesday, January 17, 2017 5:46 PM, Jan Kiszka wrote:
>> On 2017-01-17 10:13, Wang, Wei W wrote:
>>> Hi Jan,
>>>
>>> On Monday, January 16, 2017 9:10 PM, Jan Kiszka wrote:
>>>> On 2017-01-16 13:41, Marc-André Lureau wrote:
>>>>> On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kis...@siemens.com> wrote:
>>>>>
>>>>> some of you may know that we are using a shared memory device similar to ivshmem in the partitioning hypervisor Jailhouse [1].
>>>>>
>>>>> We started out compatible with the original ivshmem that QEMU implements, but we quickly deviated in some details, and in recent months even more. Some of the deviations are related to making the implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is aiming at safety-critical systems and, therefore, a small code base. Other changes address deficits in the original design, like missing life-cycle management.
>>>>>
>>>>> Now the question is whether there is interest in defining a common new revision of this device and maybe also of some protocols used on top, such as virtual network links. Ideally, this would enable us to share Linux drivers. We will definitely go for upstreaming at least a network driver such as [2], a UIO driver and maybe also a serial port/console.
>>>>>
>>>>> This sounds like duplicating efforts done with virtio and vhost-pci. Have you looked at Wei Wang's proposal?
>>>>
>>>> I haven't followed it recently, but the original concept was about introducing an IOMMU model into the picture, and that's complexity-wise a no-go for us (we can do this whole thing in less than 500 lines; even virtio itself is more complex). IIUC, the alternative to an IOMMU is mapping the whole frontend VM memory into the backend VM - that's security/safety-wise an absolute no-go.
>>>
>>> Though the virtio based solution might be complex for you, a big advantage is that we have lots of people working to improve virtio. For example, the upcoming virtio 1.1 has vring improvements, and we can easily upgrade all the virtio based solutions, such as vhost-pci, to take advantage of them. From the long-term perspective, I think this kind of complexity is worthwhile.
>>
>> We will adopt the virtio 1.1 ring formats. That's one reason why there is also still a bidirectional shared memory region: to host the new descriptors (while keeping the payload safely in the unidirectional regions).
>
> The vring example I gave might be confusing, sorry about that. My point is that every part of virtio keeps maturing and improving over time. Personally, I find it helpful to have a new device developed and maintained in such an active and popular model. Also, as new features are gradually added in the future, a simple device could become complex.
We can't afford becoming more complex; that is the whole point. Complexity shall go into the guest, not the hypervisor, when it is really needed.

> Doing a theoretical analysis of the performance: the traditional shared memory mechanism, sharing an intermediate memory, requires 2 copies to get a packet transmitted. It's not just the one additional copy compared to the 1-copy solution; I think there are some more things we need to take into account:

1-copy (+ potential transfers to userspace, but that's the same for everyone) is conceptually possible, definitely under stacks like DPDK. However, Linux skbs are currently not prepared for picking up shmem-backed packets; we already looked into this. Likely addressable, though.

> 1) there is extra ring operation overhead on both the sending and the receiving side to access the shared memory (i.e. IVSHMEM);
> 2) an extra protocol is needed to use the shared memory;
> 3) the number of pieces of shared memory allocated from the host = C(n,2), where n is the number of VMs. For example, for 20 VMs that all want to talk to each other, 190 pieces of memory would be allocated from the host.

Well, only if all VMs need to talk to all others directly. On real setups, you would add direct links for heavy traffic and otherwise do software switching. Moreover, those links would only have to be backed by physical memory all the time in static setups. Also, we didn't completely rule out a shmem bus with multiple peers connected. That's just looking for a strong use case - and then a robust design, of course.

> That being said, if people really want the 2-copy solution, we can also have vhost-pci support it that way as a new feature (not sure if you would be interested in collaborating on the project): with the new feature added, the master VM sends only a piece of memory (equivalent to IVSHMEM, but allocated by the guest) to the slave over the vhost-user protocol, and the vhost-pci device on the slave side only hosts that piece of shared memory.

I'm all in for something that allows stripping vhost-pci down to something that - while staying secure - is simple and /also/ allows static configurations. But I'm not yet seeing that this would still be virtio or vhost-pci. What would be the minimal viable vhost-pci device set from your POV? What would have to be provided by the hypervisor for that?

Jan

--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
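
To make the 2-copy path discussed above concrete, here is a minimal sketch of a sender/receiver pair exchanging packets through one intermediate shared region. The layout, the names (shm_region, shm_send, shm_recv) and the sizes are illustrative assumptions only; they do not describe the actual ivshmem 2.0 layout, the virtio 1.1 ring format, or the vhost-pci proposal.

/*
 * Illustrative sketch only: a 2-copy transfer over one shared region.
 * The layout and names are assumptions, not the actual ivshmem 2.0 or
 * vhost-pci ring format.
 */
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE 2048
#define NUM_SLOTS 256

struct shm_desc {
    uint32_t len;                       /* payload length in the slot */
};

struct shm_region {                     /* lives in the shared window */
    volatile uint32_t tx_idx;           /* free-running producer index */
    volatile uint32_t rx_idx;           /* free-running consumer index */
    struct shm_desc desc[NUM_SLOTS];
    uint8_t data[NUM_SLOTS][SLOT_SIZE];
};

/* Copy #1: the sender copies its private buffer into the shared region. */
static int shm_send(struct shm_region *r, const void *buf, uint32_t len)
{
    uint32_t idx = r->tx_idx;

    if (len > SLOT_SIZE || idx - r->rx_idx == NUM_SLOTS)
        return -1;                      /* oversized packet or ring full */

    memcpy(r->data[idx % NUM_SLOTS], buf, len);
    r->desc[idx % NUM_SLOTS].len = len;
    __sync_synchronize();               /* publish payload before index */
    r->tx_idx = idx + 1;
    /* here the sender would ring a doorbell to interrupt the peer */
    return 0;
}

/* Copy #2: the receiver copies the payload out into its private buffer. */
static int shm_recv(struct shm_region *r, void *buf, uint32_t buflen)
{
    uint32_t idx = r->rx_idx;
    uint32_t len;

    if (idx == r->tx_idx)
        return -1;                      /* ring empty */

    len = r->desc[idx % NUM_SLOTS].len;
    if (len > buflen)
        return -1;
    memcpy(buf, r->data[idx % NUM_SLOTS], len);
    __sync_synchronize();
    r->rx_idx = idx + 1;
    return (int)len;
}

The two memcpy() calls are the two copies referred to above; a 1-copy scheme would presumably publish a descriptor pointing into memory the receiver can consume directly, which is where the skb limitation mentioned earlier comes in.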