On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote:
> On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote:
> > Vhost-pci is a point-to-point based inter-VM communication solution.
> > This patch series implements the vhost-pci-net device setup and
> > emulation. The device is implemented as a virtio device, and it is set
> > up via the vhost-user protocol to get the necessary info (e.g. the
> > memory info of the remote VM, vring info).
> >
> > Currently, only the fundamental functions are implemented. More
> > features, such as MQ and live migration, will be updated in the future.
> >
> > The DPDK PMD of vhost-pci has been posted to the dpdk mailing list here:
> > http://dpdk.org/ml/archives/dev/2017-November/082615.html
> 
> I have asked questions about the scope of this feature.  In particular,
> I think it's best to support all device types rather than just
> virtio-net.  Here is a design document that shows how this can be
> achieved.
> 
> What I'm proposing is different from the current approach:
> 1. It's a PCI adapter (see below for justification)
> 2. The vhost-user protocol is exposed by the device (not handled 100% in
>    QEMU).  Ultimately I think your approach would also need to do this.
> 
> I'm not implementing this and not asking you to implement it.  Let's
> just use this for discussion so we can figure out what the final
> vhost-pci will look like.
> 
> Please let me know what you think, Wei, Michael, and others.
> 

Thanks for sharing the thoughts. If I understand it correctly, the key 
difference is that this approach relays every vhost-user message to the 
guest. I'm not sure about the benefits of doing this.
To make the data plane (i.e. the driver sending/receiving packets) work, 
the memory info and vring info are mostly enough. Other things like 
callfd and kickfd don't need to be sent to the guest; QEMU needs them 
only for the eventfd and irqfd setup.
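For illustration, the callfd wiring would be roughly the following (a
minimal sketch, not the actual QEMU code; the gsi value stands in for
whatever MSI route the device uses):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Wire the eventfd received via VHOST_USER_SET_VRING_CALL into the
     * guest as an irqfd: when the remote side signals the eventfd, KVM
     * injects the interrupt without a trip through QEMU userspace. */
    static int wire_callfd_as_irqfd(int vm_fd, int callfd, uint32_t gsi)
    {
        struct kvm_irqfd irqfd = {
            .fd  = callfd,  /* eventfd from the vhost-user master */
            .gsi = gsi,     /* interrupt route for this vring */
        };
        return ioctl(vm_fd, KVM_IRQFD, &irqfd);
    }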

> ---
> vhost-pci device specification
> -------------------------------
> The vhost-pci device allows guests to act as vhost-user slaves.  This enables
> appliance VMs like network switches or storage targets to back devices in
> other VMs.  VM-to-VM communication is possible without vmexits using
> polling mode drivers.
> 
> The vhost-user protocol has been used to implement virtio devices in
> userspace processes on the host.  vhost-pci maps the vhost-user protocol to
> a PCI adapter so guest software can perform virtio device emulation.
> This is useful in environments where high-performance VM-to-VM
> communication is necessary or where it is preferable to deploy device
> emulation as VMs instead of host userspace processes.
> 
> The vhost-user protocol involves file descriptor passing and shared
> memory.  This precludes vhost-user slave implementations over
> virtio-vsock, virtio-serial, or TCP/IP.  Therefore a new device type is
> needed to expose the vhost-user protocol to guests.
> 
> The vhost-pci PCI adapter has the following resources:
> 
> Queues (used for vhost-user protocol communication):
> 1. Master-to-slave messages
> 2. Slave-to-master messages
> 
> Doorbells (used for slave->guest/master events):
> 1. Vring call (one doorbell per virtqueue)
> 2. Vring err (one doorbell per virtqueue)
> 3. Log changed
> 
> Interrupts (used for guest->slave events):
> 1. Vring kick (one MSI per virtqueue)
> 
> Shared Memory BARs:
> 1. Guest memory
> 2. Log
> 
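To make the resource list above concrete, a purely hypothetical map
could look like this (the draft does not assign BAR numbers or offsets,
so these are placeholders):

    /* Hypothetical vhost-pci resource map, for illustration only. */
    enum vhost_pci_bar {
        VHOST_PCI_BAR_MSG_QUEUES = 0, /* m2s and s2m message queues */
        VHOST_PCI_BAR_DOORBELLS  = 1, /* vring call/err, log changed */
        VHOST_PCI_BAR_GUEST_MEM  = 2, /* master VM memory, mapped */
        VHOST_PCI_BAR_LOG        = 3, /* dirty log shared memory */
    };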
> Master-to-slave queue:
> The following vhost-user protocol messages are relayed from the
> vhost-user master.  Each message follows the vhost-user protocol
> VhostUserMsg layout.
> 
> Messages that include file descriptor passing are relayed but do not
> carry file descriptors.  The relevant resources (doorbells, interrupts,
> or shared memory BARs) are initialized from the file descriptors prior
> to the message becoming available on the Master-to-Slave queue.
> 
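For reference, the VhostUserMsg layout from the vhost-user specification
is roughly the following (payload members abbreviated; see the spec for
the full union):

    #include <stdint.h>

    typedef struct VhostUserMsg {
        uint32_t request;   /* VHOST_USER_* message type */
        uint32_t flags;     /* version, reply and reply-ack bits */
        uint32_t size;      /* size of the payload that follows */
        union {
            uint64_t u64;                           /* e.g. feature bits */
            struct { uint32_t index, num; } state;  /* vring state */
            /* ... vring addr, memory table, log, etc. ... */
        } payload;
    } __attribute__((packed)) VhostUserMsg;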
> Resources must only be used after the corresponding vhost-user message
> has been received.  For example, the Vring call doorbell can only be
> used after VHOST_USER_SET_VRING_CALL becomes available on the
> Master-to-Slave queue.
> 
> Messages must be processed in order.
> 
> The following vhost-user protocol messages are relayed:
>  * VHOST_USER_GET_FEATURES
>  * VHOST_USER_SET_FEATURES
>  * VHOST_USER_GET_PROTOCOL_FEATURES
>  * VHOST_USER_SET_PROTOCOL_FEATURES
>  * VHOST_USER_SET_OWNER
>  * VHOST_USER_SET_MEM_TABLE
>    The shared memory is available in the corresponding BAR.
>  * VHOST_USER_SET_LOG_BASE
>    The shared memory is available in the corresponding BAR.
>  * VHOST_USER_SET_LOG_FD
>    The logging file descriptor can be signalled through the logging
>    virtqueue.
>  * VHOST_USER_SET_VRING_NUM
>  * VHOST_USER_SET_VRING_ADDR
>  * VHOST_USER_SET_VRING_BASE
>  * VHOST_USER_GET_VRING_BASE
>  * VHOST_USER_SET_VRING_KICK
>    This message is still needed because it may indicate only polling
>    mode is supported.
>  * VHOST_USER_SET_VRING_CALL
>    This message is still needed because it may indicate only polling
>    mode is supported.
>  * VHOST_USER_SET_VRING_ERR
>  * VHOST_USER_GET_QUEUE_NUM
>  * VHOST_USER_SET_VRING_ENABLE
>  * VHOST_USER_SEND_RARP
>  * VHOST_USER_NET_SET_MTU
>  * VHOST_USER_SET_SLAVE_REQ_FD
>  * VHOST_USER_IOTLB_MSG
>  * VHOST_USER_SET_VRING_ENDIAN
> 
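As a sketch of how a guest driver could consume this queue (the queue
pop and handler helpers here are hypothetical, not part of the draft):

    /* Drain the Master-to-Slave queue, strictly in order. */
    static void vhost_pci_poll_m2s_queue(struct vhost_pci_dev *dev)
    {
        VhostUserMsg msg;

        while (m2s_queue_pop(dev, &msg)) {  /* hypothetical helper */
            switch (msg.request) {
            case VHOST_USER_SET_MEM_TABLE:
                /* Memory is already mapped in the BAR; just record
                 * the region layout from the payload. */
                record_mem_regions(dev, &msg);
                break;
            case VHOST_USER_SET_VRING_ADDR:
                setup_vring(dev, &msg);
                break;
            case VHOST_USER_SET_VRING_ENABLE:
                start_vring(dev, &msg);
                break;
            default:
                /* Handle the remaining messages listed above. */
                break;
            }
        }
    }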
> Slave-to-Master queue:
> Messages added to the Slave-to-Master queue are sent to the vhost-user
> master.  Each message follows the vhost-user protocol VhostUserMsg layout.
> 
> The following vhost-user protocol messages are relayed:
> 
>  * VHOST_USER_SLAVE_IOTLB_MSG
> 
> Theory of Operation:
> When the vhost-pci adapter is detected the queues must be set up by the
> driver.  Once the driver is ready the vhost-pci device begins relaying
> vhost-user protocol messages over the Master-to-Slave queue.  The
> driver must follow the vhost-user protocol specification to implement
> virtio device initialization and virtqueue processing.
> 
> Notes:
> The vhost-user UNIX domain socket connects two host processes.  The
> slave process interprets messages and initializes vhost-pci resources
> (doorbells, interrupts, shared memory BARs) based on them before
> relaying via the Master-to-Slave queue.  All messages are relayed, even
> if they only pass a file descriptor, because the message itself may act
> as a signal (e.g. virtqueue is now enabled).
> 
> vhost-pci is a PCI adapter instead of a virtio device to allow
> doorbells and interrupts to be connected to the virtio device in the
> master VM in the most efficient way possible.  This means the Vring
> call doorbell can be an ioeventfd that signals an irqfd inside the host
> kernel without host userspace involvement.  The Vring kick interrupt
> can be an irqfd that is signalled by the master VM's virtqueue
> ioeventfd.
> 


This looks the same as the implementation of inter-VM notification in v2:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg450005.html
which is fig. 4 here: 
https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf

When the vhost-pci driver kicks its tx queue, the host signals the irqfd 
of virtio-net's rx queue. I think this already bypasses the host 
userspace (thanks to the fast mmio implementation).
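For completeness, the kernel-only path on the host side would be wired
up roughly like this (a sketch; doorbell_gpa stands for whatever guest
physical address the doorbell register is mapped at):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/eventfd.h>
    #include <linux/kvm.h>

    /* A doorbell write in the slave VM signals an eventfd (ioeventfd),
     * and the same eventfd is registered as an irqfd in the master VM,
     * so the vring call never leaves the host kernel. */
    static int wire_doorbell(int slave_vm_fd, int master_vm_fd,
                             uint64_t doorbell_gpa, uint32_t master_gsi)
    {
        int fd = eventfd(0, EFD_CLOEXEC);
        struct kvm_ioeventfd io = {
            .addr = doorbell_gpa,
            .len  = 4,              /* 32-bit doorbell register */
            .fd   = fd,
        };
        struct kvm_irqfd irq = { .fd = fd, .gsi = master_gsi };

        if (fd < 0 || ioctl(slave_vm_fd, KVM_IOEVENTFD, &io) < 0)
            return -1;
        return ioctl(master_vm_fd, KVM_IRQFD, &irq);
    }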

Best,
Wei
