On Thu, Mar 22, 2018 at 04:55:39PM +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 03:15:31PM +0800, Tiwei Bie wrote:
> > This patch set does some small extensions to the vhost-user protocol
> > to support VFIO based accelerators, and makes it possible to get
> > performance similar to VFIO based PCI passthru while keeping
> > the virtio device emulation in QEMU.
>
> I love your patches!
> Yet there are some things to improve.
> Posting comments separately as individual messages.
>

Thank you so much! :-) It may take me some time to address
all your comments. They're really helpful! I'll try to address
and reply to these comments in the next few days. Thanks again!
I do appreciate it!

Best regards,
Tiwei Bie

> >
> > How does accelerator accelerate vhost (data path)
> > =================================================
> >
> > Any virtio ring compatible device can potentially be used as a
> > vhost data path accelerator. We can set up the accelerator based
> > on the information (e.g. memory table, features, ring info, etc.)
> > available on the vhost backend, and the accelerator will be able
> > to use the virtio ring provided by the virtio driver in the VM
> > directly. So the virtio driver in the VM can exchange e.g. network
> > packets with the accelerator directly via the virtio ring. That is
> > to say, we will be able to use the accelerator to accelerate the
> > vhost data path. We call it vDPA: vhost Data Path Acceleration.
> >
> > Notice: although the accelerator can talk with the virtio driver
> > in the VM via the virtio ring directly, the control path events
> > (e.g. device start/stop) in the VM will still be trapped and handled
> > by QEMU, and QEMU will deliver such events to the vhost backend
> > via the standard vhost protocol.
> >
> > The link below is an example showing how to set up such an
> > environment via nested VMs. In this case, the virtio device in the
> > outer VM is the accelerator. It will be used to accelerate the
> > virtio device in the inner VM. In reality, we could use a virtio
> > ring compatible hardware device as the accelerator.
> >
> > http://dpdk.org/ml/archives/dev/2017-December/085044.html
> >
> > The above example doesn't require any changes to QEMU, but it has
> > lower performance compared with the traditional VFIO based PCI
> > passthru. And that's the problem this patch set wants to solve.
> >
> > The performance issue of vDPA/vhost-user and solutions
> > ======================================================
> >
> > For a vhost-user backend, the critical issue in vDPA is that the
> > data path performance is relatively low and some host threads are
> > needed for the data path, because the mechanisms necessary to
> > support the following are missing:
> >
> > 1) the guest driver notifies the device directly;
> > 2) the device interrupts the guest directly;
> >
> > So this patch set does some small extensions to the vhost-user
> > protocol to make both of them possible. It leverages the same
> > mechanisms (e.g. EPT and Posted-Interrupt on Intel platforms) as
> > PCI passthru.
> >
> > A new protocol feature bit is added to negotiate the accelerator
> > feature support. Two new slave message types are added to control
> > the notify region and queue interrupt passthru for each queue.
> > From the point of view of the vhost-user protocol design, it's very
> > flexible. The passthru can be enabled/disabled for each queue
> > individually, and it's possible to accelerate each queue with a
> > different device. More design and implementation details can be
> > found in the last patch.
> >
> > Difference between vDPA and PCI passthru
> > ========================================
> >
> > The key difference between PCI passthru and vDPA is that, in vDPA,
> > only the data path of the device (e.g. DMA ring, notify region and
> > queue interrupt) is passed through to the VM; the device control
> > path (e.g. PCI configuration space and MMIO regions) is still
> > defined and emulated by QEMU.
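
(A side note on the "two new slave message types" mentioned above:
to make the description a bit more concrete, below is a rough,
purely illustrative sketch of what the payloads of such messages
could look like. The struct and field names here are assumptions
made up for this sketch; the authoritative definitions are in the
last patch and in the docs/interop/vhost-user.txt update.)

    /* Illustrative sketch only -- not the definitions from the patches. */
    #include <stdint.h>

    /*
     * Slave -> master: describe where the accelerator's notify area
     * for one queue lives inside an fd (passed as ancillary data), so
     * QEMU can map it into the notify region of the emulated
     * virtio-pci device. Guest doorbell writes then reach the hardware
     * directly instead of trapping into QEMU and the vhost backend.
     */
    typedef struct VhostUserVringNotifyArea {
        uint64_t queue_idx; /* which virtqueue this applies to */
        uint64_t size;      /* size of the notify area; 0 disables passthru */
        uint64_t offset;    /* offset of the area within the passed fd */
    } VhostUserVringNotifyArea;

    /*
     * Slave -> master: hand QEMU a VFIO group fd for the accelerator
     * so the queue interrupt can be routed to the guest via KVM
     * irqfd / posted interrupts instead of being relayed by a host
     * thread.
     */
    typedef struct VhostUserVringVfioGroup {
        uint64_t queue_idx; /* which virtqueue this applies to */
        uint64_t flags;     /* e.g. enable/disable interrupt passthru */
    } VhostUserVringVfioGroup;
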
> >
> > The benefits of keeping virtio device emulation in QEMU compared
> > with virtio device PCI passthru include (but are not limited to):
> >
> > - a consistent device interface for the guest OS in the VM;
> > - maximum flexibility in the hardware (i.e. the accelerators) design;
> > - leveraging the existing virtio live-migration framework;
> >
> > Why extend vhost-user for vDPA
> > ==============================
> >
> > We have already implemented various virtual switches (e.g. OVS-DPDK)
> > based on vhost-user for VMs in the cloud. They are purely software
> > running on CPU cores. When we have accelerators for such NFVi
> > applications, it's ideal if the applications could keep using the
> > original interface (i.e. vhost-user netdev) with QEMU, and the
> > infrastructure is able to decide when and how to switch between the
> > CPU and the accelerators within that interface. And the switching
> > between the CPU and the accelerators can be done flexibly and
> > quickly inside the applications.
> >
> > More details about this can be found in Cunming's discussions on
> > the RFC patch set.
> >
> > Update notes
> > ============
> >
> > The IOMMU feature bit check is removed in this version, because:
> >
> > The IOMMU feature is negotiable. When an accelerator is used and
> > it doesn't support the virtual IOMMU, its driver just won't provide
> > this feature bit when the vhost library queries its features. And
> > if it does support the virtual IOMMU, its driver can provide this
> > feature bit. It's not reasonable to add this limitation in this
> > patch set.
> >
> > The previous links:
> > RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> > v1: http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
> >
> > v1 -> v2:
> > - Add some explanations about why vhost-user is extended in the commit log (Paolo);
> > - Bug fix in slave_read() according to Stefan's fix in DPDK;
> > - Remove the IOMMU feature check and related commit log;
> > - Some minor refinements;
> > - Rebase to the latest QEMU;
> >
> > RFC -> v1:
> > - Add some details about how vDPA works in the cover letter (Alexey)
> > - Add some details about the OVS offload use case in the cover letter (Jason)
> > - Move PCI specific stuff out of vhost-user (Jason)
> > - Handle the virtual IOMMU case (Jason)
> > - Move the VFIO group management code into vfio/common.c (Alex)
> > - Various refinements;
> > (approximately sorted by comment posting time)
> >
> > Tiwei Bie (6):
> >   vhost-user: support receiving file descriptors in slave_read
> >   vhost-user: introduce shared vhost-user state
> >   virtio: support adding sub-regions for notify region
> >   vfio: support getting VFIOGroup from groupfd
> >   vfio: remove DPRINTF() definition from vfio-common.h
> >   vhost-user: add VFIO based accelerators support
> >
> >  Makefile.target                 |   4 +
> >  docs/interop/vhost-user.txt     |  57 +++++++++
> >  hw/scsi/vhost-user-scsi.c       |   6 +-
> >  hw/vfio/common.c                |  97 +++++++++++++++-
> >  hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
> >  hw/virtio/virtio-pci.c          |  48 ++++++++
> >  hw/virtio/virtio-pci.h          |   5 +
> >  hw/virtio/virtio.c              |  39 +++++++
> >  include/hw/vfio/vfio-common.h   |  11 +-
> >  include/hw/virtio/vhost-user.h  |  34 ++++++
> >  include/hw/virtio/virtio-scsi.h |   6 +-
> >  include/hw/virtio/virtio.h      |   5 +
> >  include/qemu/osdep.h            |   1 +
> >  net/vhost-user.c                |  30 ++---
> >  scripts/create_config           |   3 +
> >  15 files changed, 561 insertions(+), 33 deletions(-)
> >  create mode 100644 include/hw/virtio/vhost-user.h
> >
> > --
> > 2.11.0
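
(One more side note, on the IOMMU point in the quoted update notes:
the idea is simply that a backend driver for an accelerator that
cannot work behind a virtual IOMMU never advertises the IOMMU
feature bit, so the normal feature negotiation handles the rest. A
minimal sketch of that idea follows; the function name and the
surrounding code are made up for illustration, only the feature
bit number comes from the virtio spec.)

    /* Illustrative sketch only -- names below are made up. */
    #include <stdint.h>

    /* VIRTIO_F_IOMMU_PLATFORM is feature bit 33 in the virtio spec. */
    #define EXAMPLE_VIRTIO_F_IOMMU_PLATFORM 33

    /* Feature bits a hypothetical accelerator driver reports when the
     * vhost library queries its features. */
    static uint64_t example_accel_get_features(int supports_viommu)
    {
        uint64_t features = 0; /* plus the device's other feature bits */

        if (supports_viommu) {
            /* Only offer the IOMMU feature when the accelerator can
             * actually honor it; otherwise the bit is simply never
             * advertised and won't be negotiated. */
            features |= 1ULL << EXAMPLE_VIRTIO_F_IOMMU_PLATFORM;
        }
        return features;
    }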