Re: [PATCH] vhost: introduce vDPA based backend

2020-02-19 Thread Tiwei Bie
On Wed, Feb 19, 2020 at 09:11:02AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 19, 2020 at 10:52:38AM +0800, Tiwei Bie wrote:
> > > > +static int __init vhost_vdpa_init(void)
> > > > +{
> > > > +   int r;
> > > > +
> > > > +   idr_init(&vhost_vdpa.idr);
> > > > +   mutex_init(&vhost_vdpa.mutex);
> > > > +   init_waitqueue_head(&vhost_vdpa.release_q);
> > > > +
> > > > +   /* /dev/vhost-vdpa/$vdpa_device_index */
> > > > +   vhost_vdpa.class = class_create(THIS_MODULE, "vhost-vdpa");
> > > > +   if (IS_ERR(vhost_vdpa.class)) {
> > > > +   r = PTR_ERR(vhost_vdpa.class);
> > > > +   goto err_class;
> > > > +   }
> > > > +
> > > > +   vhost_vdpa.class->devnode = vhost_vdpa_devnode;
> > > > +
> > > > +   r = alloc_chrdev_region(&vhost_vdpa.devt, 0, MINORMASK + 1,
> > > > +   "vhost-vdpa");
> > > > +   if (r)
> > > > +   goto err_alloc_chrdev;
> > > > +
> > > > +   cdev_init(&vhost_vdpa.cdev, &vhost_vdpa_fops);
> > > > +   r = cdev_add(&vhost_vdpa.cdev, vhost_vdpa.devt, MINORMASK + 1);
> > > > +   if (r)
> > > > +   goto err_cdev_add;
> > > 
> > > It is very strange, is the intention to create a single global char
> > > dev?
> > 
> > No. It's to create a per-vdpa char dev named
> > vhost-vdpa/$vdpa_device_index under /dev.
> > 
> > I followed the code in VFIO which creates char dev
> > vfio/$GROUP dynamically, e.g.:
> > 
> > https://github.com/torvalds/linux/blob/b1da3acc781c/drivers/vfio/vfio.c#L2164-L2180
> > https://github.com/torvalds/linux/blob/b1da3acc781c/drivers/vfio/vfio.c#L373-L387
> > https://github.com/torvalds/linux/blob/b1da3acc781c/drivers/vfio/vfio.c#L1553
> > 
> > Is it something unwanted?
> 
> Yes it is unwanted. This is some special pattern for vfio's unique
> needs. 
> 
> Since this has a struct device for each char dev instance please use
> the normal cdev_device_add() driven pattern here, or justify why it
> needs to be special like this.

I see. Thanks! I will embed the cdev in each vhost_vdpa
structure directly.

Regards,
Tiwei

> 
> Jason
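For reference, a minimal sketch of the cdev_device_add() pattern being
suggested here, with the cdev embedded in each vhost_vdpa instance. The
struct layout, the vhost_vdpa_fops table and the devt bookkeeping are
illustrative assumptions, not code from the posted patch:

  struct vhost_vdpa {
          struct cdev cdev;       /* embedded per-instance char dev */
          struct device dev;      /* struct device for this vdpa instance */
          /* ... driver state ... */
  };

  static int vhost_vdpa_register(struct vhost_vdpa *v, dev_t devt)
  {
          int r;

          device_initialize(&v->dev);
          v->dev.devt = devt;
          dev_set_name(&v->dev, "vhost-vdpa-%u", MINOR(devt));

          cdev_init(&v->cdev, &vhost_vdpa_fops);
          v->cdev.owner = THIS_MODULE;

          /* Registers the cdev and the struct device together,
           * so no single global cdev is needed. */
          r = cdev_device_add(&v->cdev, &v->dev);
          if (r)
                  put_device(&v->dev);
          return r;
  }

Teardown would then be the matching cdev_device_del() followed by
put_device() on the embedded device.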


Re: [PATCH V2 3/5] vDPA: introduce vDPA bus

2020-02-18 Thread Tiwei Bie
On Tue, Feb 18, 2020 at 01:56:12PM +, Jason Gunthorpe wrote:
> On Mon, Feb 17, 2020 at 02:08:03PM +0800, Jason Wang wrote:
> 
> > I thought you were copied in the patch [1], maybe we can move vhost related
> > discussion there to avoid confusion.
> >
> > [1] https://lwn.net/Articles/811210/
> 
> Wow, that is .. confusing.
> 
> So this is supposed to duplicate the uAPI of vhost-user? But it is
> open coded and duplicated because .. vdpa?

Do you mean the vhost-user in DPDK? There is no vhost-user
in the Linux kernel.

Thanks,
Tiwei

> 
> > So it's cheaper and simpler to introduce a new bus instead of refactoring a
> > well-known bus and API where bunches of drivers and devices have been
> > implemented for years.
> 
> If your reason for this approach is to ease the implementation then you
> should talk about it in the cover letters/etc
> 
> Maybe it is reasonable to do this because the rework is too great, I
> don't know, but to me this whole thing looks rather messy. 
> 
> Remember this stuff is all uAPI as it shows up in sysfs, so you can
> easily get stuck with it forever.
> 
> Jason


Re: [PATCH] vhost: introduce vDPA based backend

2020-02-18 Thread Tiwei Bie
On Tue, Feb 18, 2020 at 09:53:59AM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 31, 2020 at 11:36:51AM +0800, Tiwei Bie wrote:
> 
> > +static int vhost_vdpa_alloc_minor(struct vhost_vdpa *v)
> > +{
> > +   return idr_alloc(&vhost_vdpa.idr, v, 0, MINORMASK + 1,
> > +GFP_KERNEL);
> > +}
> 
> Please don't use idr in new code, use xarray directly
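For comparison, a minimal sketch of the same minor allocation done with an
xarray, assuming the global idr (and its external mutex) is replaced by a
DEFINE_XARRAY_ALLOC; names are illustrative:

  static DEFINE_XARRAY_ALLOC(vhost_vdpa_minors);

  static int vhost_vdpa_alloc_minor(struct vhost_vdpa *v)
  {
          u32 minor;
          int r;

          /* xa_alloc() does its own locking, unlike idr + mutex. */
          r = xa_alloc(&vhost_vdpa_minors, &minor, v,
                       XA_LIMIT(0, MINORMASK), GFP_KERNEL);
          return r < 0 ? r : minor;
  }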
> 
> > +static int vhost_vdpa_probe(struct device *dev)
> > +{
> > +   struct vdpa_device *vdpa = dev_to_vdpa(dev);
> > +   const struct vdpa_config_ops *ops = vdpa->config;
> > +   struct vhost_vdpa *v;
> > +   struct device *d;
> > +   int minor, nvqs;
> > +   int r;
> > +
> > +   /* Currently, we only accept the network devices. */
> > +   if (ops->get_device_id(vdpa) != VIRTIO_ID_NET) {
> > +   r = -ENOTSUPP;
> > +   goto err;
> > +   }
> > +
> > +   v = kzalloc(sizeof(*v), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
> > +   if (!v) {
> > +   r = -ENOMEM;
> > +   goto err;
> > +   }
> > +
> > +   nvqs = VHOST_VDPA_VQ_MAX;
> > +
> > +   v->vqs = kmalloc_array(nvqs, sizeof(struct vhost_virtqueue),
> > +  GFP_KERNEL);
> > +   if (!v->vqs) {
> > +   r = -ENOMEM;
> > +   goto err_alloc_vqs;
> > +   }
> > +
> > +   mutex_init(&v->mutex);
> > +   atomic_set(&v->opened, 0);
> > +
> > +   v->vdpa = vdpa;
> > +   v->nvqs = nvqs;
> > +   v->virtio_id = ops->get_device_id(vdpa);
> > +
> > +   mutex_lock(&vhost_vdpa.mutex);
> > +
> > +   minor = vhost_vdpa_alloc_minor(v);
> > +   if (minor < 0) {
> > +   r = minor;
> > +   goto err_alloc_minor;
> > +   }
> > +
> > +   d = device_create(vhost_vdpa.class, NULL,
> > + MKDEV(MAJOR(vhost_vdpa.devt), minor),
> > + v, "%d", vdpa->index);
> > +   if (IS_ERR(d)) {
> > +   r = PTR_ERR(d);
> > +   goto err_device_create;
> > +   }
> > +
> 
> I can't understand what this messing around with major/minor numbers
> does. Without allocating a cdev via cdev_add/etc there is only a
> single char dev in existence here. This and the stuff in
> vhost_vdpa_open() looks non-functional.

I followed the code in VFIO. Please see more details below.

> 
> > +static void vhost_vdpa_remove(struct device *dev)
> > +{
> > +   DEFINE_WAIT_FUNC(wait, woken_wake_function);
> > +   struct vhost_vdpa *v = dev_get_drvdata(dev);
> > +   int opened;
> > +
> > +   add_wait_queue(&vhost_vdpa.release_q, &wait);
> > +
> > +   do {
> > +   opened = atomic_cmpxchg(&v->opened, 0, 1);
> > +   if (!opened)
> > +   break;
> > +   wait_woken(&wait, TASK_UNINTERRUPTIBLE, HZ * 10);
> > +   } while (1);
> > +
> > +   remove_wait_queue(&vhost_vdpa.release_q, &wait);
> 
> *barf* use the normal refcount pattern please
> 
> read side:
> 
>   refcount_inc_not_zero(uses)
>   //stuff
>   if (refcount_dec_and_test(uses))
>  complete(completer)
> 
> destroy side:
>   if (refcount_dec_and_test(uses))
>  complete(completer)
>   wait_for_completion(completer)
>   // refcount now permanently == 0
> 
> Use a completion in driver code
> 
> > +   mutex_lock(&vhost_vdpa.mutex);
> > +   device_destroy(vhost_vdpa.class,
> > +  MKDEV(MAJOR(vhost_vdpa.devt), v->minor));
> > +   vhost_vdpa_free_minor(v->minor);
> > +   mutex_unlock(&vhost_vdpa.mutex);
> > +   kfree(v->vqs);
> > +   kfree(v);
> 
> This use-after-frees vs vhost_vdpa_open prior to it setting the open
> bit. Maybe use xarray, rcu and kfree_rcu ..
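A fleshed-out sketch of the refcount + completion pattern suggested above,
which also closes the open-vs-remove race mentioned here. It assumes the
cdev is embedded in struct vhost_vdpa and that probe does
refcount_set(&v->users, 1) and init_completion(&v->released); all names are
illustrative, not the patch's code:

  static int vhost_vdpa_open(struct inode *inode, struct file *f)
  {
          struct vhost_vdpa *v = container_of(inode->i_cdev,
                                              struct vhost_vdpa, cdev);

          /* Fails once the remove path has dropped the initial reference. */
          if (!refcount_inc_not_zero(&v->users))
                  return -ENODEV;
          f->private_data = v;
          return 0;
  }

  static int vhost_vdpa_release(struct inode *inode, struct file *f)
  {
          struct vhost_vdpa *v = f->private_data;

          if (refcount_dec_and_test(&v->users))
                  complete(&v->released);
          return 0;
  }

  static void vhost_vdpa_remove(struct vhost_vdpa *v)
  {
          /* Drop the initial reference taken at probe time. */
          if (refcount_dec_and_test(&v->users))
                  complete(&v->released);
          /* Wait until every open file has dropped its reference. */
          wait_for_completion(&v->released);
          /* The refcount is now permanently zero; freeing v is safe. */
          kfree(v);
  }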
> 
> > +static int __init vhost_vdpa_init(void)
> > +{
> > +   int r;
> > +
> > +   idr_init(&vhost_vdpa.idr);
> > +   mutex_init(&vhost_vdpa.mutex);
> > +   init_waitqueue_head(&vhost_vdpa.release_q);
> > +
> > +   /* /dev/vhost-vdpa/$vdpa_device_index */
> > +   vhost_vdpa.class = class_create(THIS_MODULE, "vhost-vdpa");
> > +   if (IS_ERR(vhost_vdpa.class)) {
> > +   r = PTR_ERR(vhost_vdpa.class);
> > +   goto err_class;
> > +   }
> > +
> > +   vhost_vdpa.class->devnode = vhost_vdpa_devnode;
> > +
> > +   r = alloc_chrdev_region(&vhost_vdpa.devt, 0, MINORMASK + 1,
> > +   "vhost-vdpa");
> > +   if (r)
> > +   goto err_alloc_chrdev;
> > +
> > +   cdev_init(&vhost_vdpa.cdev, &vhost_vdpa_fops);
> > +   r = cdev_add(&vhost_vdpa.cdev, vh

Re: [PATCH] vhost: introduce vDPA based backend

2020-02-04 Thread Tiwei Bie
On Tue, Feb 04, 2020 at 02:46:16PM +0800, Jason Wang wrote:
> On 2020/2/4 2:01 PM, Michael S. Tsirkin wrote:
> > On Tue, Feb 04, 2020 at 11:30:11AM +0800, Jason Wang wrote:
> > > 5) generate diffs of memory table and using IOMMU API to setup the dma
> > > mapping in this method
> > Frankly I think that's a bunch of work. Why not a MAP/UNMAP interface?
> > 
> 
> Sure, so that basically VHOST_IOTLB_UPDATE/INVALIDATE I think?

Do you mean we let userspace use only VHOST_IOTLB_UPDATE/INVALIDATE
to do the DMA mapping in the vhost-vdpa case? When a vIOMMU isn't available,
userspace will set msg->iova to the GPA; otherwise userspace will set
msg->iova to the GIOVA, and the vhost-vdpa module will get the HPA from msg->uaddr?

Thanks,
Tiwei

> 
> Thanks
> 
> 
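For reference, a hedged userspace-side sketch of the IOTLB message flow
being discussed, modelled on the existing vhost IOTLB uAPI
(struct vhost_msg_v2 with VHOST_IOTLB_UPDATE); whether vhost-vdpa reuses
this exact write() path is precisely what is being debated here:

  #include <string.h>
  #include <unistd.h>
  #include <linux/types.h>
  #include <linux/vhost.h>

  /* Map one region: iova is a GPA (no vIOMMU) or a GIOVA (with vIOMMU);
   * uaddr is the HVA the kernel would pin to obtain the HPA. */
  static int vhost_iotlb_map(int vhost_fd, __u64 iova, __u64 size, __u64 uaddr)
  {
          struct vhost_msg_v2 msg;

          memset(&msg, 0, sizeof(msg));
          msg.type = VHOST_IOTLB_MSG_V2;
          msg.iotlb.iova  = iova;
          msg.iotlb.size  = size;
          msg.iotlb.uaddr = uaddr;
          msg.iotlb.perm  = VHOST_ACCESS_RW;
          msg.iotlb.type  = VHOST_IOTLB_UPDATE;

          return write(vhost_fd, &msg, sizeof(msg)) == sizeof(msg) ? 0 : -1;
  }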

Re: [PATCH] vhost: introduce vDPA based backend

2020-02-04 Thread Tiwei Bie
On Tue, Feb 04, 2020 at 11:30:11AM +0800, Jason Wang wrote:
> On 2020/1/31 11:36 AM, Tiwei Bie wrote:
> > This patch introduces a vDPA based vhost backend. This
> > backend is built on top of the same interface defined
> > in virtio-vDPA and provides a generic vhost interface
> > for userspace to accelerate the virtio devices in guest.
> > 
> > This backend is implemented as a vDPA device driver on
> > top of the same ops used in virtio-vDPA. It will create a
> > char device entry named vhost-vdpa/$vdpa_device_index
> > for userspace to use. Userspace can use vhost ioctls on
> > top of this char device to setup the backend.
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> > This patch depends on below series:
> > https://lkml.org/lkml/2020/1/16/353
> > 
> > Please note that _SET_MEM_TABLE isn't fully supported yet.
> > Comments would be appreciated!
> > 
> > Changes since last patch (https://lkml.org/lkml/2019/11/18/1068)
> > - Switch to the vDPA bus;
> > - Switch to vhost's own chardev;
> > 
> >   drivers/vhost/Kconfig|  12 +
> >   drivers/vhost/Makefile   |   3 +
> >   drivers/vhost/vdpa.c | 705 +++
> >   include/uapi/linux/vhost.h   |  21 +
> >   include/uapi/linux/vhost_types.h |   8 +
> >   5 files changed, 749 insertions(+)
> >   create mode 100644 drivers/vhost/vdpa.c
> > 
> > diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> > index f21c45aa5e07..13e6a94d0243 100644
> > --- a/drivers/vhost/Kconfig
> > +++ b/drivers/vhost/Kconfig
> > @@ -34,6 +34,18 @@ config VHOST_VSOCK
> > To compile this driver as a module, choose M here: the module will be called
> > vhost_vsock.
> > +config VHOST_VDPA
> > +   tristate "Vhost driver for vDPA based backend"
> > +   depends on EVENTFD && VDPA
> > +   select VHOST
> > +   default n
> > +   ---help---
> > +   This kernel module can be loaded in host kernel to accelerate
> > +   guest virtio devices with the vDPA based backends.
> > +
> > +   To compile this driver as a module, choose M here: the module
> > +   will be called vhost_vdpa.
> > +
> >   config VHOST
> > tristate
> >   depends on VHOST_IOTLB
> > diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
> > index df99756fbb26..a65e9f4a2c0a 100644
> > --- a/drivers/vhost/Makefile
> > +++ b/drivers/vhost/Makefile
> > @@ -10,6 +10,9 @@ vhost_vsock-y := vsock.o
> >   obj-$(CONFIG_VHOST_RING) += vringh.o
> > +obj-$(CONFIG_VHOST_VDPA) += vhost_vdpa.o
> > +vhost_vdpa-y := vdpa.o
> > +
> >   obj-$(CONFIG_VHOST)   += vhost.o
> >   obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
> > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > new file mode 100644
> > index ..631d994d37ac
> > --- /dev/null
> > +++ b/drivers/vhost/vdpa.c
> > @@ -0,0 +1,705 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2018-2020 Intel Corporation.
> > + *
> > + * Author: Tiwei Bie 
> > + *
> > + * Thanks to Jason Wang and Michael S. Tsirkin for the valuable
> > + * comments and suggestions.  And thanks to Cunming Liang and
> > + * Zhihong Wang for all their support.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "vhost.h"
> > +
> > +enum {
> > +   VHOST_VDPA_FEATURES =
> > +   (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
> > +   (1ULL << VIRTIO_F_ANY_LAYOUT) |
> > +   (1ULL << VIRTIO_F_VERSION_1) |
> > +   (1ULL << VIRTIO_F_IOMMU_PLATFORM) |
> > +   (1ULL << VIRTIO_F_RING_PACKED) |
> > +   (1ULL << VIRTIO_F_ORDER_PLATFORM) |
> > +   (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
> > +   (1ULL << VIRTIO_RING_F_EVENT_IDX),
> > +
> > +   VHOST_VDPA_NET_FEATURES = VHOST_VDPA_FEATURES |
> > +   (1ULL << VIRTIO_NET_F_CSUM) |
> > +   (1ULL << VIRTIO_NET_F_GUEST_CSUM) |
> > +   (1ULL << VIRTIO_NET_F_MTU) |
> > +   (1ULL << VIRTIO_NET_F_MAC) |
> > +   (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
> > +   (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
> > +   (1ULL << VIRTIO_NET_F_GUEST_ECN) |
> > +   

Re: [PATCH] vhost: introduce vDPA based backend

2020-01-30 Thread Tiwei Bie
On Thu, Jan 30, 2020 at 09:12:57PM -0800, Randy Dunlap wrote:
> On 1/30/20 7:56 PM, Randy Dunlap wrote:
> > Hi,
> > 
> > On 1/30/20 7:36 PM, Tiwei Bie wrote:
> >> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> >> index f21c45aa5e07..13e6a94d0243 100644
> >> --- a/drivers/vhost/Kconfig
> >> +++ b/drivers/vhost/Kconfig
> >> @@ -34,6 +34,18 @@ config VHOST_VSOCK
> >>To compile this driver as a module, choose M here: the module will be called
> >>vhost_vsock.
> >>  
> >> +config VHOST_VDPA
> >> +  tristate "Vhost driver for vDPA based backend"
> 
> oops, missed this one:
>  vDPA-based

Will fix. Thanks!

> 
> >> +  depends on EVENTFD && VDPA
> >> +  select VHOST
> >> +  default n
> >> +  ---help---
> >> +  This kernel module can be loaded in host kernel to accelerate
> >> +  guest virtio devices with the vDPA based backends.
> > 
> >   vDPA-based
> > 
> >> +
> >> +  To compile this driver as a module, choose M here: the module
> >> +  will be called vhost_vdpa.
> >> +
> > 
> > The preferred Kconfig style nowadays is
> > (a) use "help" instead of "---help---"
> > (b) indent the help text with one tab + 2 spaces
> > 
> > and don't use "default n" since that is already the default.
> > 
> >>  config VHOST
> >>tristate
> >>  depends on VHOST_IOTLB
> > 
> > thanks.
> > 
> 
> 
> -- 
> ~Randy
> 


Re: [PATCH] vhost: introduce vDPA based backend

2020-01-30 Thread Tiwei Bie
On Thu, Jan 30, 2020 at 07:56:43PM -0800, Randy Dunlap wrote:
> Hi,
> 
> On 1/30/20 7:36 PM, Tiwei Bie wrote:
> > diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> > index f21c45aa5e07..13e6a94d0243 100644
> > --- a/drivers/vhost/Kconfig
> > +++ b/drivers/vhost/Kconfig
> > @@ -34,6 +34,18 @@ config VHOST_VSOCK
> > To compile this driver as a module, choose M here: the module will be called
> > vhost_vsock.
> >  
> > +config VHOST_VDPA
> > +   tristate "Vhost driver for vDPA based backend"
> > +   depends on EVENTFD && VDPA
> > +   select VHOST
> > +   default n
> > +   ---help---
> > +   This kernel module can be loaded in host kernel to accelerate
> > +   guest virtio devices with the vDPA based backends.
> 
> vDPA-based

Will fix this and other similar ones in the patch. Thanks!

> 
> > +
> > +   To compile this driver as a module, choose M here: the module
> > +   will be called vhost_vdpa.
> > +
> 
> The preferred Kconfig style nowadays is
> (a) use "help" instead of "---help---"
> (b) indent the help text with one tab + 2 spaces
> 
> and don't use "default n" since that is already the default.

Will fix in the next version.

Thanks,
Tiwei

> 
> >  config VHOST
> > tristate
> >  depends on VHOST_IOTLB
> 
> thanks.
> -- 
> ~Randy
> 


[PATCH] vhost: introduce vDPA based backend

2020-01-30 Thread Tiwei Bie
This patch introduces a vDPA based vhost backend. This
backend is built on top of the same interface defined
in virtio-vDPA and provides a generic vhost interface
for userspace to accelerate the virtio devices in guest.

This backend is implemented as a vDPA device driver on
top of the same ops used in virtio-vDPA. It will create a
char device entry named vhost-vdpa/$vdpa_device_index
for userspace to use. Userspace can use vhost ioctls on
top of this char device to setup the backend.
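As an illustration of that flow, a hedged userspace sketch; the device path
and the minimal ioctl sequence below are assumptions for illustration, not a
documented ABI:

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/types.h>
  #include <linux/vhost.h>

  int main(void)
  {
          __u64 features;
          int fd = open("/dev/vhost-vdpa/0", O_RDWR);  /* $vdpa_device_index == 0 */

          if (fd < 0)
                  return 1;
          if (ioctl(fd, VHOST_SET_OWNER, NULL))        /* claim the backend */
                  return 1;
          if (ioctl(fd, VHOST_GET_FEATURES, &features))
                  return 1;
          printf("device features: 0x%llx\n", (unsigned long long)features);
          /* VHOST_SET_FEATURES, VHOST_SET_VRING_* etc. would follow. */
          return 0;
  }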

Signed-off-by: Tiwei Bie 
---
This patch depends on below series:
https://lkml.org/lkml/2020/1/16/353

Please note that _SET_MEM_TABLE isn't fully supported yet.
Comments would be appreciated!

Changes since last patch (https://lkml.org/lkml/2019/11/18/1068)
- Switch to the vDPA bus;
- Switch to vhost's own chardev;

 drivers/vhost/Kconfig|  12 +
 drivers/vhost/Makefile   |   3 +
 drivers/vhost/vdpa.c | 705 +++
 include/uapi/linux/vhost.h   |  21 +
 include/uapi/linux/vhost_types.h |   8 +
 5 files changed, 749 insertions(+)
 create mode 100644 drivers/vhost/vdpa.c

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index f21c45aa5e07..13e6a94d0243 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -34,6 +34,18 @@ config VHOST_VSOCK
To compile this driver as a module, choose M here: the module will be called
vhost_vsock.
 
+config VHOST_VDPA
+   tristate "Vhost driver for vDPA based backend"
+   depends on EVENTFD && VDPA
+   select VHOST
+   default n
+   ---help---
+   This kernel module can be loaded in host kernel to accelerate
+   guest virtio devices with the vDPA based backends.
+
+   To compile this driver as a module, choose M here: the module
+   will be called vhost_vdpa.
+
 config VHOST
tristate
 depends on VHOST_IOTLB
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index df99756fbb26..a65e9f4a2c0a 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,6 +10,9 @@ vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
+obj-$(CONFIG_VHOST_VDPA) += vhost_vdpa.o
+vhost_vdpa-y := vdpa.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
 
 obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
new file mode 100644
index ..631d994d37ac
--- /dev/null
+++ b/drivers/vhost/vdpa.c
@@ -0,0 +1,705 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018-2020 Intel Corporation.
+ *
+ * Author: Tiwei Bie 
+ *
+ * Thanks to Jason Wang and Michael S. Tsirkin for the valuable
+ * comments and suggestions.  And thanks to Cunming Liang and
+ * Zhihong Wang for all their support.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vhost.h"
+
+enum {
+   VHOST_VDPA_FEATURES =
+   (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
+   (1ULL << VIRTIO_F_ANY_LAYOUT) |
+   (1ULL << VIRTIO_F_VERSION_1) |
+   (1ULL << VIRTIO_F_IOMMU_PLATFORM) |
+   (1ULL << VIRTIO_F_RING_PACKED) |
+   (1ULL << VIRTIO_F_ORDER_PLATFORM) |
+   (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
+   (1ULL << VIRTIO_RING_F_EVENT_IDX),
+
+   VHOST_VDPA_NET_FEATURES = VHOST_VDPA_FEATURES |
+   (1ULL << VIRTIO_NET_F_CSUM) |
+   (1ULL << VIRTIO_NET_F_GUEST_CSUM) |
+   (1ULL << VIRTIO_NET_F_MTU) |
+   (1ULL << VIRTIO_NET_F_MAC) |
+   (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
+   (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
+   (1ULL << VIRTIO_NET_F_GUEST_ECN) |
+   (1ULL << VIRTIO_NET_F_GUEST_UFO) |
+   (1ULL << VIRTIO_NET_F_HOST_TSO4) |
+   (1ULL << VIRTIO_NET_F_HOST_TSO6) |
+   (1ULL << VIRTIO_NET_F_HOST_ECN) |
+   (1ULL << VIRTIO_NET_F_HOST_UFO) |
+   (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
+   (1ULL << VIRTIO_NET_F_STATUS) |
+   (1ULL << VIRTIO_NET_F_SPEED_DUPLEX),
+};
+
+/* Currently, only network backend w/o multiqueue is supported. */
+#define VHOST_VDPA_VQ_MAX  2
+
+struct vhost_vdpa {
+   /* The lock is to protect this structure. */
+   struct mutex mutex;
+   struct vhost_dev vdev;
+   struct vhost_virtqueue *vqs;
+   struct vdpa_device *vdpa;
+   struct device *dev;
+   atomic_t opened;
+   int nvqs;
+   int virtio_id;
+   int minor;
+};
+
+static struct {
+   /* The lock is to protect this structure. */
+   struct mutex mutex;
+   struct class *class;
+   struct idr idr;
+   struct cdev cdev;
+   dev_t devt;
+   wait_queue_head_t release_q;
+} vhost_vdpa;
+
+static const u64 v

[PATCH v6] vhost: introduce mdev based hardware backend

2019-11-06 Thread Tiwei Bie
This patch introduces a mdev based hardware vhost backend.
This backend is built on top of the same abstraction used
in virtio-mdev and provides a generic vhost interface for
userspace to accelerate the virtio devices in guest.

This backend is implemented as a mdev device driver on top
of the same mdev device ops used in virtio-mdev but using
a different mdev class id, and it will register the device
as a VFIO device for userspace to use. Userspace can setup
the IOMMU with the existing VFIO container/group APIs and
then get the device fd with the device name. After getting
the device fd, userspace can use vhost ioctls on top of it
to setup the backend.
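For context, a hedged sketch of the VFIO container/group flow referred to
above; the group path and the mdev device name (its UUID) are placeholders:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  static int get_vhost_mdev_fd(const char *group_path, const char *dev_name)
  {
          int container = open("/dev/vfio/vfio", O_RDWR);
          int group = open(group_path, O_RDWR);   /* e.g. "/dev/vfio/12" */

          if (container < 0 || group < 0)
                  return -1;
          /* Attach the group to the container, then pick an IOMMU model. */
          if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0)
                  return -1;
          if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
                  return -1;
          /* DMA mappings are then set up with VFIO_IOMMU_MAP_DMA on the
           * container, and the device fd is looked up by name. */
          return ioctl(group, VFIO_GROUP_GET_DEVICE_FD, dev_name);
  }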

Signed-off-by: Tiwei Bie 
---
This patch depends on below series:
https://lkml.org/lkml/2019/11/6/538

v5 -> v6:
- Filter out VHOST_SET_LOG_BASE/VHOST_SET_LOG_FD (Jason);
- Simplify len/off check (Jason);
- Address checkpatch warnings, some of them are ignored
  to keep the coding style consistent with existing ones;

v4 -> v5:
- Rebase on top of virtio-mdev series v8;
- Use the virtio_ops of mdev_device in vhost-mdev (Jason);
- Some minor improvements on commit log;

v3 -> v4:
- Rebase on top of virtio-mdev series v6;
- Some minor tweaks and improvements;

v2 -> v3:
- Fix the return value (Jason);
- Don't cache unnecessary information in vhost-mdev (Jason);
- Get rid of the memset in open (Jason);
- Add comments for VHOST_SET_MEM_TABLE, ... (Jason);
- Filter out unsupported features in vhost-mdev (Jason);
- Add _GET_DEVICE_ID ioctl (Jason);
- Add _GET_CONFIG/_SET_CONFIG ioctls (Jason);
- Drop _GET_QUEUE_NUM ioctl (Jason);
- Fix the copy-paste errors in _IOW/_IOR usage;
- Some minor fixes and improvements;

v1 -> v2:
- Replace _SET_STATE with _SET_STATUS (MST);
- Check status bits at each step (MST);
- Report the max ring size and max number of queues (MST);
- Add missing MODULE_DEVICE_TABLE (Jason);
- Only support the network backend w/o multiqueue for now;
- Some minor fixes and improvements;
- Rebase on top of virtio-mdev series v4;

RFC v4 -> v1:
- Implement vhost-mdev as a mdev device driver directly and
  connect it to VFIO container/group. (Jason);
- Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
  meaningless HVA->GPA translations (Jason);

RFC v3 -> RFC v4:
- Build vhost-mdev on top of the same abstraction used by
  virtio-mdev (Jason);
- Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);

RFC v2 -> RFC v3:
- Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
  based vhost protocol on top of vfio-mdev (Jason);

RFC v1 -> RFC v2:
- Introduce a new VFIO device type to build a vhost protocol
  on top of vfio-mdev;

 drivers/vfio/mdev/mdev_core.c|  21 ++
 drivers/vhost/Kconfig|  12 +
 drivers/vhost/Makefile   |   3 +
 drivers/vhost/mdev.c | 556 +++
 include/linux/mdev.h |   5 +
 include/uapi/linux/vhost.h   |  21 ++
 include/uapi/linux/vhost_types.h |   8 +
 7 files changed, 626 insertions(+)
 create mode 100644 drivers/vhost/mdev.c

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index c58253404ed5..d855be5afbae 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -99,6 +99,27 @@ mdev_get_virtio_ops(struct mdev_device *mdev)
 }
 EXPORT_SYMBOL(mdev_get_virtio_ops);
 
+/*
+ * Specify the vhost device ops for the mdev device, this
+ * must be called during create() callback for vhost mdev device.
+ */
+void mdev_set_vhost_ops(struct mdev_device *mdev,
+   const struct mdev_virtio_device_ops *vhost_ops)
+{
+   mdev_set_class(mdev, MDEV_CLASS_ID_VHOST);
+   mdev->virtio_ops = vhost_ops;
+}
+EXPORT_SYMBOL(mdev_set_vhost_ops);
+
+/* Get the vhost device ops for the mdev device. */
+const struct mdev_virtio_device_ops *
+mdev_get_vhost_ops(struct mdev_device *mdev)
+{
+   WARN_ON(mdev->class_id != MDEV_CLASS_ID_VHOST);
+   return mdev->virtio_ops;
+}
+EXPORT_SYMBOL(mdev_get_vhost_ops);
+
 struct device *mdev_dev(struct mdev_device *mdev)
 {
return &mdev->dev;
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 3d03ccbd1adc..062cada28f89 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -34,6 +34,18 @@ config VHOST_VSOCK
To compile this driver as a module, choose M here: the module will be called
vhost_vsock.
 
+config VHOST_MDEV
+   tristate "Vhost driver for Mediated devices"
+   depends on EVENTFD && VFIO && VFIO_MDEV
+   select VHOST
+   default n
+   ---help---
+   This kernel module can be loaded in host kernel to accelerate
+   guest virtio devices with the mediated device based backends.
+
+   To compile this driver as a module, choose M here: the module will
+   be called vhost_mdev.
+
 config VHOST
tristate
---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile

Re: [PATCH v5] vhost: introduce mdev based hardware backend

2019-11-06 Thread Tiwei Bie
On Thu, Nov 07, 2019 at 12:08:08PM +0800, Jason Wang wrote:
> On 2019/11/6 10:49 PM, Tiwei Bie wrote:
> > > > > > > + default:
> > > > > > > + /*
> > > > > > > +  * VHOST_SET_MEM_TABLE, VHOST_SET_LOG_BASE, and
> > > > > > > +  * VHOST_SET_LOG_FD are not used yet.
> > > > > > > +  */
> > > > > > If we don't even use them, there's probably no need to call
> > > > > > vhost_dev_ioctl(). This may help to avoid confusion when we want to 
> > > > > > develop
> > > > > > new API for e.g dirty page tracking.
> > > > > Good point. It's better to reject these ioctls for now.
> > > > > 
> > > > > PS. One thing I may need to clarify is that, we need the
> > > > > VHOST_SET_OWNER ioctl to get the vq->handle_kick to work.
> > > > > So if we don't call vhost_dev_ioctl(), we will need to
> > > > > call vhost_dev_set_owner() directly.
> > > I may miss something, it looks to me that there's no owner check in
> > > vhost_vring_ioctl() and the vhost_poll_start() can make sure handle_kick
> > > works?
> > Yeah, there is no owner check in vhost_vring_ioctl().
> > IIUC, vhost_poll_start() will start polling the file. And when
> > event arrives, vhost_poll_wakeup() will be called, and it will
> > queue work to work_list and wakeup worker to finish the work.
> > And the worker is created by vhost_dev_set_owner().
> > 
> 
> Right, rethink about this. It looks to me we need:
> 
> - Keep VHOST_SET_OWNER, this could be used for future control vq where it
> needs a kthread to access the userspace memory
> 
> - Temporarily filter  SET_LOG_BASE and SET_LOG_FD until we finalize the API
> for dirty page tracking.
> 
> - For kick through kthread, it looks sub-optimal but we can address this in
> the future, e.g call handle_vq_kick directly in vhost_poll_queue (probably a
> flag for vhost_poll) and deal with the synchronization in vhost_poll_flush
> carefully.

OK.

Thanks,
Tiwei

> 
> Thanks
> 
> 

Re: [PATCH v5] vhost: introduce mdev based hardware backend

2019-11-06 Thread Tiwei Bie
On Thu, Nov 07, 2019 at 12:16:01PM +0800, Jason Wang wrote:
> On 2019/11/6 10:39 PM, Tiwei Bie wrote:
> > On Wed, Nov 06, 2019 at 07:59:02AM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Nov 05, 2019 at 07:53:32PM +0800, Tiwei Bie wrote:
> > > > This patch introduces a mdev based hardware vhost backend.
> > > > This backend is built on top of the same abstraction used
> > > > in virtio-mdev and provides a generic vhost interface for
> > > > userspace to accelerate the virtio devices in guest.
> > > > 
> > > > This backend is implemented as a mdev device driver on top
> > > > of the same mdev device ops used in virtio-mdev but using
> > > > a different mdev class id, and it will register the device
> > > > as a VFIO device for userspace to use. Userspace can setup
> > > > the IOMMU with the existing VFIO container/group APIs and
> > > > then get the device fd with the device name. After getting
> > > > the device fd, userspace can use vhost ioctls on top of it
> > > > to setup the backend.
> > > > 
> > > > Signed-off-by: Tiwei Bie 
> > > So at this point, looks like the only thing missing is IFC, and then all
> > > these patches can go in.
> > > But as IFC is still being worked on anyway, it makes sense to
> > > address the minor comments meanwhile so we don't need
> > > patches on top.
> > > Right?
> > Yeah, of course.
> > 
> > Thanks,
> > Tiwei
> 
> 
> Please send V6 and I will ack there.

Got it, I will send it soon.

Thanks!
Tiwei

Re: [PATCH v5] vhost: introduce mdev based hardware backend

2019-11-06 Thread Tiwei Bie
On Wed, Nov 06, 2019 at 09:20:20PM +0800, Jason Wang wrote:
> On 2019/11/6 8:57 PM, Michael S. Tsirkin wrote:
> > On Wed, Nov 06, 2019 at 08:22:50PM +0800, Tiwei Bie wrote:
> > > On Wed, Nov 06, 2019 at 03:54:45PM +0800, Jason Wang wrote:
> > > > On 2019/11/5 7:53 PM, Tiwei Bie wrote:
> > > > > This patch introduces a mdev based hardware vhost backend.
> > > > > This backend is built on top of the same abstraction used
> > > > > in virtio-mdev and provides a generic vhost interface for
> > > > > userspace to accelerate the virtio devices in guest.
> > > > > 
> > > > > This backend is implemented as a mdev device driver on top
> > > > > of the same mdev device ops used in virtio-mdev but using
> > > > > a different mdev class id, and it will register the device
> > > > > as a VFIO device for userspace to use. Userspace can setup
> > > > > the IOMMU with the existing VFIO container/group APIs and
> > > > > then get the device fd with the device name. After getting
> > > > > the device fd, userspace can use vhost ioctls on top of it
> > > > > to setup the backend.
> > > > > 
> > > > > Signed-off-by: Tiwei Bie 
> > > > 
> > > > Looks good to me. Only minor nits which could be addressed on top.
> > > > 
> > > > Reviewed-by: Jason Wang 
> > > Thanks!
> > > 
> > > > 
> > > > > ---
> > > > > This patch depends on below series:
> > > > > https://lkml.org/lkml/2019/11/5/217
> > > > > 
> > > > > v4 -> v5:
> > > > > - Rebase on top of virtio-mdev series v8;
> > > > > - Use the virtio_ops of mdev_device in vhost-mdev (Jason);
> > > > > - Some minor improvements on commit log;
> > > > > 
> > > > > v3 -> v4:
> > > > > - Rebase on top of virtio-mdev series v6;
> > > > > - Some minor tweaks and improvements;
> > > > > 
> > > > > v2 -> v3:
> > > > > - Fix the return value (Jason);
> > > > > - Don't cache unnecessary information in vhost-mdev (Jason);
> > > > > - Get rid of the memset in open (Jason);
> > > > > - Add comments for VHOST_SET_MEM_TABLE, ... (Jason);
> > > > > - Filter out unsupported features in vhost-mdev (Jason);
> > > > > - Add _GET_DEVICE_ID ioctl (Jason);
> > > > > - Add _GET_CONFIG/_SET_CONFIG ioctls (Jason);
> > > > > - Drop _GET_QUEUE_NUM ioctl (Jason);
> > > > > - Fix the copy-paste errors in _IOW/_IOR usage;
> > > > > - Some minor fixes and improvements;
> > > > > 
> > > > > v1 -> v2:
> > > > > - Replace _SET_STATE with _SET_STATUS (MST);
> > > > > - Check status bits at each step (MST);
> > > > > - Report the max ring size and max number of queues (MST);
> > > > > - Add missing MODULE_DEVICE_TABLE (Jason);
> > > > > - Only support the network backend w/o multiqueue for now;
> > > > > - Some minor fixes and improvements;
> > > > > - Rebase on top of virtio-mdev series v4;
> > > > > 
> > > > > RFC v4 -> v1:
> > > > > - Implement vhost-mdev as a mdev device driver directly and
> > > > > connect it to VFIO container/group. (Jason);
> > > > > - Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
> > > > > meaningless HVA->GPA translations (Jason);
> > > > > 
> > > > > RFC v3 -> RFC v4:
> > > > > - Build vhost-mdev on top of the same abstraction used by
> > > > > virtio-mdev (Jason);
> > > > > - Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);
> > > > > 
> > > > > RFC v2 -> RFC v3:
> > > > > - Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
> > > > > based vhost protocol on top of vfio-mdev (Jason);
> > > > > 
> > > > > RFC v1 -> RFC v2:
> > > > > - Introduce a new VFIO device type to build a vhost protocol
> > > > > on top of vfio-mdev;
> > > > > 
> > > > >drivers/vfio/mdev/mdev_core.c|  21 ++
> > > > >drivers/vhost/Kconfig|  12 +
> > > > >drivers/vhost/Makefile   |   3 +
> > > > >drivers/vhost/mdev.c | 553 

Re: [PATCH v5] vhost: introduce mdev based hardware backend

2019-11-06 Thread Tiwei Bie
On Wed, Nov 06, 2019 at 07:59:02AM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 05, 2019 at 07:53:32PM +0800, Tiwei Bie wrote:
> > This patch introduces a mdev based hardware vhost backend.
> > This backend is built on top of the same abstraction used
> > in virtio-mdev and provides a generic vhost interface for
> > userspace to accelerate the virtio devices in guest.
> > 
> > This backend is implemented as a mdev device driver on top
> > of the same mdev device ops used in virtio-mdev but using
> > a different mdev class id, and it will register the device
> > as a VFIO device for userspace to use. Userspace can setup
> > the IOMMU with the existing VFIO container/group APIs and
> > then get the device fd with the device name. After getting
> > the device fd, userspace can use vhost ioctls on top of it
> > to setup the backend.
> > 
> > Signed-off-by: Tiwei Bie 
> 
> So at this point, looks like the only thing missing is IFC, and then all
> these patches can go in.
> But as IFC is still being worked on anyway, it makes sense to
> address the minor comments meanwhile so we don't need
> patches on top.
> Right?

Yeah, of course.

Thanks,
Tiwei


Re: [PATCH v5] vhost: introduce mdev based hardware backend

2019-11-06 Thread Tiwei Bie
On Wed, Nov 06, 2019 at 03:54:45PM +0800, Jason Wang wrote:
> On 2019/11/5 7:53 PM, Tiwei Bie wrote:
> > This patch introduces a mdev based hardware vhost backend.
> > This backend is built on top of the same abstraction used
> > in virtio-mdev and provides a generic vhost interface for
> > userspace to accelerate the virtio devices in guest.
> > 
> > This backend is implemented as a mdev device driver on top
> > of the same mdev device ops used in virtio-mdev but using
> > a different mdev class id, and it will register the device
> > as a VFIO device for userspace to use. Userspace can setup
> > the IOMMU with the existing VFIO container/group APIs and
> > then get the device fd with the device name. After getting
> > the device fd, userspace can use vhost ioctls on top of it
> > to setup the backend.
> > 
> > Signed-off-by: Tiwei Bie 
> 
> 
> Looks good to me. Only minor nits which could be addressed on top.
> 
> Reviewed-by: Jason Wang 

Thanks!

> 
> 
> > ---
> > This patch depends on below series:
> > https://lkml.org/lkml/2019/11/5/217
> > 
> > v4 -> v5:
> > - Rebase on top of virtio-mdev series v8;
> > - Use the virtio_ops of mdev_device in vhost-mdev (Jason);
> > - Some minor improvements on commit log;
> > 
> > v3 -> v4:
> > - Rebase on top of virtio-mdev series v6;
> > - Some minor tweaks and improvements;
> > 
> > v2 -> v3:
> > - Fix the return value (Jason);
> > - Don't cache unnecessary information in vhost-mdev (Jason);
> > - Get rid of the memset in open (Jason);
> > - Add comments for VHOST_SET_MEM_TABLE, ... (Jason);
> > - Filter out unsupported features in vhost-mdev (Jason);
> > - Add _GET_DEVICE_ID ioctl (Jason);
> > - Add _GET_CONFIG/_SET_CONFIG ioctls (Jason);
> > - Drop _GET_QUEUE_NUM ioctl (Jason);
> > - Fix the copy-paste errors in _IOW/_IOR usage;
> > - Some minor fixes and improvements;
> > 
> > v1 -> v2:
> > - Replace _SET_STATE with _SET_STATUS (MST);
> > - Check status bits at each step (MST);
> > - Report the max ring size and max number of queues (MST);
> > - Add missing MODULE_DEVICE_TABLE (Jason);
> > - Only support the network backend w/o multiqueue for now;
> > - Some minor fixes and improvements;
> > - Rebase on top of virtio-mdev series v4;
> > 
> > RFC v4 -> v1:
> > - Implement vhost-mdev as a mdev device driver directly and
> >connect it to VFIO container/group. (Jason);
> > - Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
> >meaningless HVA->GPA translations (Jason);
> > 
> > RFC v3 -> RFC v4:
> > - Build vhost-mdev on top of the same abstraction used by
> >virtio-mdev (Jason);
> > - Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);
> > 
> > RFC v2 -> RFC v3:
> > - Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
> >based vhost protocol on top of vfio-mdev (Jason);
> > 
> > RFC v1 -> RFC v2:
> > - Introduce a new VFIO device type to build a vhost protocol
> >on top of vfio-mdev;
> > 
> >   drivers/vfio/mdev/mdev_core.c|  21 ++
> >   drivers/vhost/Kconfig|  12 +
> >   drivers/vhost/Makefile   |   3 +
> >   drivers/vhost/mdev.c | 553 +++
> >   include/linux/mdev.h |   5 +
> >   include/uapi/linux/vhost.h   |  18 +
> >   include/uapi/linux/vhost_types.h |   8 +
> >   7 files changed, 620 insertions(+)
> >   create mode 100644 drivers/vhost/mdev.c
> > 
> > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> > index c58253404ed5..d855be5afbae 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -99,6 +99,27 @@ mdev_get_virtio_ops(struct mdev_device *mdev)
> >   }
> >   EXPORT_SYMBOL(mdev_get_virtio_ops);
> > +/*
> > + * Specify the vhost device ops for the mdev device, this
> > + * must be called during create() callback for vhost mdev device.
> > + */
> > +void mdev_set_vhost_ops(struct mdev_device *mdev,
> > +   const struct mdev_virtio_device_ops *vhost_ops)
> > +{
> > +   mdev_set_class(mdev, MDEV_CLASS_ID_VHOST);
> > +   mdev->virtio_ops = vhost_ops;
> > +}
> > +EXPORT_SYMBOL(mdev_set_vhost_ops);
> > +
> > +/* Get the vhost device ops for the mdev device. */
> > +const struct mdev_virtio_device_ops *
> > +mdev_get_vhost_ops(struct mdev_device *mdev)
> > +{
> > +   WARN_ON(mdev->clas

[PATCH v5] vhost: introduce mdev based hardware backend

2019-11-05 Thread Tiwei Bie
This patch introduces a mdev based hardware vhost backend.
This backend is built on top of the same abstraction used
in virtio-mdev and provides a generic vhost interface for
userspace to accelerate the virtio devices in guest.

This backend is implemented as a mdev device driver on top
of the same mdev device ops used in virtio-mdev but using
a different mdev class id, and it will register the device
as a VFIO device for userspace to use. Userspace can setup
the IOMMU with the existing VFIO container/group APIs and
then get the device fd with the device name. After getting
the device fd, userspace can use vhost ioctls on top of it
to setup the backend.

Signed-off-by: Tiwei Bie 
---
This patch depends on below series:
https://lkml.org/lkml/2019/11/5/217

v4 -> v5:
- Rebase on top of virtio-mdev series v8;
- Use the virtio_ops of mdev_device in vhost-mdev (Jason);
- Some minor improvements on commit log;

v3 -> v4:
- Rebase on top of virtio-mdev series v6;
- Some minor tweaks and improvements;

v2 -> v3:
- Fix the return value (Jason);
- Don't cache unnecessary information in vhost-mdev (Jason);
- Get rid of the memset in open (Jason);
- Add comments for VHOST_SET_MEM_TABLE, ... (Jason);
- Filter out unsupported features in vhost-mdev (Jason);
- Add _GET_DEVICE_ID ioctl (Jason);
- Add _GET_CONFIG/_SET_CONFIG ioctls (Jason);
- Drop _GET_QUEUE_NUM ioctl (Jason);
- Fix the copy-paste errors in _IOW/_IOR usage;
- Some minor fixes and improvements;

v1 -> v2:
- Replace _SET_STATE with _SET_STATUS (MST);
- Check status bits at each step (MST);
- Report the max ring size and max number of queues (MST);
- Add missing MODULE_DEVICE_TABLE (Jason);
- Only support the network backend w/o multiqueue for now;
- Some minor fixes and improvements;
- Rebase on top of virtio-mdev series v4;

RFC v4 -> v1:
- Implement vhost-mdev as a mdev device driver directly and
  connect it to VFIO container/group. (Jason);
- Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
  meaningless HVA->GPA translations (Jason);

RFC v3 -> RFC v4:
- Build vhost-mdev on top of the same abstraction used by
  virtio-mdev (Jason);
- Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);

RFC v2 -> RFC v3:
- Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
  based vhost protocol on top of vfio-mdev (Jason);

RFC v1 -> RFC v2:
- Introduce a new VFIO device type to build a vhost protocol
  on top of vfio-mdev;

 drivers/vfio/mdev/mdev_core.c|  21 ++
 drivers/vhost/Kconfig|  12 +
 drivers/vhost/Makefile   |   3 +
 drivers/vhost/mdev.c | 553 +++
 include/linux/mdev.h |   5 +
 include/uapi/linux/vhost.h   |  18 +
 include/uapi/linux/vhost_types.h |   8 +
 7 files changed, 620 insertions(+)
 create mode 100644 drivers/vhost/mdev.c

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index c58253404ed5..d855be5afbae 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -99,6 +99,27 @@ mdev_get_virtio_ops(struct mdev_device *mdev)
 }
 EXPORT_SYMBOL(mdev_get_virtio_ops);
 
+/*
+ * Specify the vhost device ops for the mdev device, this
+ * must be called during create() callback for vhost mdev device.
+ */
+void mdev_set_vhost_ops(struct mdev_device *mdev,
+   const struct mdev_virtio_device_ops *vhost_ops)
+{
+   mdev_set_class(mdev, MDEV_CLASS_ID_VHOST);
+   mdev->virtio_ops = vhost_ops;
+}
+EXPORT_SYMBOL(mdev_set_vhost_ops);
+
+/* Get the vhost device ops for the mdev device. */
+const struct mdev_virtio_device_ops *
+mdev_get_vhost_ops(struct mdev_device *mdev)
+{
+   WARN_ON(mdev->class_id != MDEV_CLASS_ID_VHOST);
+   return mdev->virtio_ops;
+}
+EXPORT_SYMBOL(mdev_get_vhost_ops);
+
 struct device *mdev_dev(struct mdev_device *mdev)
 {
return &mdev->dev;
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 3d03ccbd1adc..062cada28f89 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -34,6 +34,18 @@ config VHOST_VSOCK
To compile this driver as a module, choose M here: the module will be called
vhost_vsock.
 
+config VHOST_MDEV
+   tristate "Vhost driver for Mediated devices"
+   depends on EVENTFD && VFIO && VFIO_MDEV
+   select VHOST
+   default n
+   ---help---
+   This kernel module can be loaded in host kernel to accelerate
+   guest virtio devices with the mediated device based backends.
+
+   To compile this driver as a module, choose M here: the module will
+   be called vhost_mdev.
+
 config VHOST
tristate
---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..ad9c0f8c6d8c 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,4 +10,7 @@ vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
+obj-$(CONFIG_VHOST_MDEV) += vhost_md

Re: [PATCH v4] vhost: introduce mdev based hardware backend

2019-11-01 Thread Tiwei Bie
On Fri, Nov 01, 2019 at 03:17:39PM +0800, Jason Wang wrote:
> On 2019/10/31 10:01 PM, Tiwei Bie wrote:
> > This patch introduces a mdev based hardware vhost backend.
> > This backend is built on top of the same abstraction used
> > in virtio-mdev and provides a generic vhost interface for
> > userspace to accelerate the virtio devices in guest.
> > 
> > This backend is implemented as a mdev device driver on top
> > of the same mdev device ops used in virtio-mdev but using
> > a different mdev class id, and it will register the device
> > as a VFIO device for userspace to use. Userspace can setup
> > the IOMMU with the existing VFIO container/group APIs and
> > then get the device fd with the device name. After getting
> > the device fd of this device, userspace can use vhost ioctls
> > to setup the backend.
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> > This patch depends on below series:
> > https://lkml.org/lkml/2019/10/30/62
> > 
> > v3 -> v4:
> > - Rebase on top of virtio-mdev series v6;
> > - Some minor tweaks and improvements;
> > 
> > v2 -> v3:
> > - Fix the return value (Jason);
> > - Don't cache unnecessary information in vhost-mdev (Jason);
> > - Get rid of the memset in open (Jason);
> > - Add comments for VHOST_SET_MEM_TABLE, ... (Jason);
> > - Filter out unsupported features in vhost-mdev (Jason);
> > - Add _GET_DEVICE_ID ioctl (Jason);
> > - Add _GET_CONFIG/_SET_CONFIG ioctls (Jason);
> > - Drop _GET_QUEUE_NUM ioctl (Jason);
> > - Fix the copy-paste errors in _IOW/_IOR usage;
> > - Some minor fixes and improvements;
> > 
> > v1 -> v2:
> > - Replace _SET_STATE with _SET_STATUS (MST);
> > - Check status bits at each step (MST);
> > - Report the max ring size and max number of queues (MST);
> > - Add missing MODULE_DEVICE_TABLE (Jason);
> > - Only support the network backend w/o multiqueue for now;
> > - Some minor fixes and improvements;
> > - Rebase on top of virtio-mdev series v4;
> > 
> > RFC v4 -> v1:
> > - Implement vhost-mdev as a mdev device driver directly and
> >connect it to VFIO container/group. (Jason);
> > - Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
> >meaningless HVA->GPA translations (Jason);
> > 
> > RFC v3 -> RFC v4:
> > - Build vhost-mdev on top of the same abstraction used by
> >virtio-mdev (Jason);
> > - Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);
> > 
> > RFC v2 -> RFC v3:
> > - Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
> >based vhost protocol on top of vfio-mdev (Jason);
> > 
> > RFC v1 -> RFC v2:
> > - Introduce a new VFIO device type to build a vhost protocol
> >on top of vfio-mdev;
> > 
> >   drivers/vfio/mdev/mdev_core.c|  20 ++
> >   drivers/vfio/mdev/mdev_private.h |   1 +
> >   drivers/vhost/Kconfig|  12 +
> >   drivers/vhost/Makefile   |   3 +
> >   drivers/vhost/mdev.c | 556 +++
> >   include/linux/mdev.h |   5 +
> >   include/uapi/linux/vhost.h   |  18 +
> >   include/uapi/linux/vhost_types.h |   8 +
> >   8 files changed, 623 insertions(+)
> >   create mode 100644 drivers/vhost/mdev.c
> > 
> > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> > index 22ca589750d8..109dbac01a8f 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -96,6 +96,26 @@ mdev_get_virtio_ops(struct mdev_device *mdev)
> >   }
> >   EXPORT_SYMBOL(mdev_get_virtio_ops);
> > +/* Specify the vhost device ops for the mdev device, this
> > + * must be called during create() callback for vhost mdev device.
> > + */
> > +void mdev_set_vhost_ops(struct mdev_device *mdev,
> > +   const struct virtio_mdev_device_ops *vhost_ops)
> > +{
> > +   mdev_set_class(mdev, MDEV_CLASS_ID_VHOST);
> > +   mdev->vhost_ops = vhost_ops;
> > +}
> > +EXPORT_SYMBOL(mdev_set_vhost_ops);
> > +
> > +/* Get the vhost device ops for the mdev device. */
> > +const struct virtio_mdev_device_ops *
> > +mdev_get_vhost_ops(struct mdev_device *mdev)
> > +{
> > +   WARN_ON(mdev->class_id != MDEV_CLASS_ID_VHOST);
> > +   return mdev->vhost_ops;
> > +}
> > +EXPORT_SYMBOL(mdev_get_vhost_ops);
> > +
> >   struct device *mdev_dev(struct mdev_device *mdev)
> >   {
> > return &mdev->dev;
> > diff --git a/drivers/vfio/mdev/mdev_private.h 
> >

[PATCH v4] vhost: introduce mdev based hardware backend

2019-10-31 Thread Tiwei Bie
This patch introduces a mdev based hardware vhost backend.
This backend is built on top of the same abstraction used
in virtio-mdev and provides a generic vhost interface for
userspace to accelerate the virtio devices in guest.

This backend is implemented as a mdev device driver on top
of the same mdev device ops used in virtio-mdev but using
a different mdev class id, and it will register the device
as a VFIO device for userspace to use. Userspace can setup
the IOMMU with the existing VFIO container/group APIs and
then get the device fd with the device name. After getting
the device fd of this device, userspace can use vhost ioctls
to setup the backend.

Signed-off-by: Tiwei Bie 
---
This patch depends on below series:
https://lkml.org/lkml/2019/10/30/62

v3 -> v4:
- Rebase on top of virtio-mdev series v6;
- Some minor tweaks and improvements;

v2 -> v3:
- Fix the return value (Jason);
- Don't cache unnecessary information in vhost-mdev (Jason);
- Get rid of the memset in open (Jason);
- Add comments for VHOST_SET_MEM_TABLE, ... (Jason);
- Filter out unsupported features in vhost-mdev (Jason);
- Add _GET_DEVICE_ID ioctl (Jason);
- Add _GET_CONFIG/_SET_CONFIG ioctls (Jason);
- Drop _GET_QUEUE_NUM ioctl (Jason);
- Fix the copy-paste errors in _IOW/_IOR usage;
- Some minor fixes and improvements;

v1 -> v2:
- Replace _SET_STATE with _SET_STATUS (MST);
- Check status bits at each step (MST);
- Report the max ring size and max number of queues (MST);
- Add missing MODULE_DEVICE_TABLE (Jason);
- Only support the network backend w/o multiqueue for now;
- Some minor fixes and improvements;
- Rebase on top of virtio-mdev series v4;

RFC v4 -> v1:
- Implement vhost-mdev as a mdev device driver directly and
  connect it to VFIO container/group. (Jason);
- Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
  meaningless HVA->GPA translations (Jason);

RFC v3 -> RFC v4:
- Build vhost-mdev on top of the same abstraction used by
  virtio-mdev (Jason);
- Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);

RFC v2 -> RFC v3:
- Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
  based vhost protocol on top of vfio-mdev (Jason);

RFC v1 -> RFC v2:
- Introduce a new VFIO device type to build a vhost protocol
  on top of vfio-mdev;

 drivers/vfio/mdev/mdev_core.c|  20 ++
 drivers/vfio/mdev/mdev_private.h |   1 +
 drivers/vhost/Kconfig|  12 +
 drivers/vhost/Makefile   |   3 +
 drivers/vhost/mdev.c | 556 +++
 include/linux/mdev.h |   5 +
 include/uapi/linux/vhost.h   |  18 +
 include/uapi/linux/vhost_types.h |   8 +
 8 files changed, 623 insertions(+)
 create mode 100644 drivers/vhost/mdev.c

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 22ca589750d8..109dbac01a8f 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -96,6 +96,26 @@ mdev_get_virtio_ops(struct mdev_device *mdev)
 }
 EXPORT_SYMBOL(mdev_get_virtio_ops);
 
+/* Specify the vhost device ops for the mdev device, this
+ * must be called during create() callback for vhost mdev device.
+ */
+void mdev_set_vhost_ops(struct mdev_device *mdev,
+   const struct virtio_mdev_device_ops *vhost_ops)
+{
+   mdev_set_class(mdev, MDEV_CLASS_ID_VHOST);
+   mdev->vhost_ops = vhost_ops;
+}
+EXPORT_SYMBOL(mdev_set_vhost_ops);
+
+/* Get the vhost device ops for the mdev device. */
+const struct virtio_mdev_device_ops *
+mdev_get_vhost_ops(struct mdev_device *mdev)
+{
+   WARN_ON(mdev->class_id != MDEV_CLASS_ID_VHOST);
+   return mdev->vhost_ops;
+}
+EXPORT_SYMBOL(mdev_get_vhost_ops);
+
 struct device *mdev_dev(struct mdev_device *mdev)
 {
return &mdev->dev;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 7b47890c34e7..5597c846e52f 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -40,6 +40,7 @@ struct mdev_device {
union {
const struct vfio_mdev_device_ops *vfio_ops;
const struct virtio_mdev_device_ops *virtio_ops;
+   const struct virtio_mdev_device_ops *vhost_ops;
};
 };
 
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 3d03ccbd1adc..062cada28f89 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -34,6 +34,18 @@ config VHOST_VSOCK
To compile this driver as a module, choose M here: the module will be called
vhost_vsock.
 
+config VHOST_MDEV
+   tristate "Vhost driver for Mediated devices"
+   depends on EVENTFD && VFIO && VFIO_MDEV
+   select VHOST
+   default n
+   ---help---
+   This kernel module can be loaded in host kernel to accelerate
+   guest virtio devices with the mediated device based backends.
+
+   To compile this driver as a module, choose M here: the module will
+   be calle

Re: [RFC] vhost_mdev: add network control vq support

2019-10-30 Thread Tiwei Bie
On Wed, Oct 30, 2019 at 03:04:37PM +0800, Jason Wang wrote:
> On 2019/10/30 2:17 PM, Tiwei Bie wrote:
> > On Tue, Oct 29, 2019 at 06:51:32PM +0800, Jason Wang wrote:
> >> On 2019/10/29 6:17 PM, Tiwei Bie wrote:
> >>> This patch adds the network control vq support in vhost-mdev.
> >>> A vhost-mdev specific op is introduced to allow parent drivers
> >>> to handle the network control commands that come from userspace.
> >> Probably work for userspace driver but not kernel driver.
> > Exactly. This is only for userspace.
> >
> > I got your point now. In virtio-mdev kernel driver case,
> > the ctrl-vq can be special as well.
> >
> 
> Then maybe it's better to introduce vhost-mdev-net on top?
> 
> Looking at the other type of virtio device:
> 
> - console have two control virtqueues when multiqueue port is enabled
> 
> - SCSI has controlq + eventq
> 
> - GPU has controlq
> 
> - Crypto device has one controlq
> 
> - Socket has eventq
> 
> ...

Thanks for the list! It would look dirty to define specific
commands and types in the vhost UAPI for each of them in the
future. It's definitely much better to find an approach
that solves it once and for all, if possible.

Just a quick thought: considering all vhost-mdev does
is forward settings between the parent and userspace,
I'm wondering whether it's possible to make the argp
opaque in the vhost-mdev UAPI and just introduce one generic
ioctl command to deliver these device-specific commands
(which are opaque to vhost-mdev, as vhost-mdev just passes
the pointer -- argp) defined by the spec.

I'm also fine with exposing ctrlq to userspace directly.
PS. It's interesting that some devices have more than
one ctrlq. I need to take a close look first..


> 
> Thanks
> 
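Purely as an illustration of the "one generic, opaque ioctl" idea floated
above -- this is not an existing or proposed uAPI, and the command number
and structure are made up:

  /* Hypothetical: vhost-mdev would only forward the opaque payload to the
   * parent driver; the payload layout would be defined by the virtio spec
   * for the given device class. */
  struct vhost_mdev_dev_cmd {
          __u32 class;    /* virtio device id, e.g. VIRTIO_ID_NET */
          __u32 len;      /* length of the payload buffer */
          __u64 payload;  /* userspace pointer to the spec-defined command */
  };

  #define VHOST_MDEV_DEV_CMD _IOW(VHOST_VIRTIO, 0x7f, struct vhost_mdev_dev_cmd)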

Re: [RFC] vhost_mdev: add network control vq support

2019-10-30 Thread Tiwei Bie
On Tue, Oct 29, 2019 at 06:51:32PM +0800, Jason Wang wrote:
> On 2019/10/29 6:17 PM, Tiwei Bie wrote:
> > This patch adds the network control vq support in vhost-mdev.
> > A vhost-mdev specific op is introduced to allow parent drivers
> > to handle the network control commands that come from userspace.
> 
> Probably work for userspace driver but not kernel driver.

Exactly. This is only for userspace.

I got your point now. In virtio-mdev kernel driver case,
the ctrl-vq can be special as well.

> 
> 
> >
> > Signed-off-by: Tiwei Bie 
> > ---
> > This patch depends on below patch:
> > https://lkml.org/lkml/2019/10/29/335
> >
> >  drivers/vhost/mdev.c | 37 ++--
> >  include/linux/virtio_mdev_ops.h  | 10 +
> >  include/uapi/linux/vhost.h   |  7 ++
> >  include/uapi/linux/vhost_types.h |  6 ++
> >  4 files changed, 58 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vhost/mdev.c b/drivers/vhost/mdev.c
> > index 35b2fb33e686..c9b3eaa77405 100644
> > --- a/drivers/vhost/mdev.c
> > +++ b/drivers/vhost/mdev.c
> > @@ -47,6 +47,13 @@ enum {
> > (1ULL << VIRTIO_NET_F_HOST_UFO) |
> > (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
> > (1ULL << VIRTIO_NET_F_STATUS) |
> > +   (1ULL << VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) |
> > +   (1ULL << VIRTIO_NET_F_CTRL_VQ) |
> > +   (1ULL << VIRTIO_NET_F_CTRL_RX) |
> > +   (1ULL << VIRTIO_NET_F_CTRL_VLAN) |
> > +   (1ULL << VIRTIO_NET_F_CTRL_RX_EXTRA) |
> > +   (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) |
> > +   (1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR) |
> > (1ULL << VIRTIO_NET_F_SPEED_DUPLEX),
> >  };
> >  
> > @@ -362,6 +369,29 @@ static long vhost_mdev_vring_ioctl(struct vhost_mdev 
> > *m, unsigned int cmd,
> > return r;
> >  }
> >  
> > +/*
> > + * Device specific (e.g. network) ioctls.
> > + */
> > +static long vhost_mdev_dev_ioctl(struct vhost_mdev *m, unsigned int cmd,
> > +void __user *argp)
> > +{
> > +   struct mdev_device *mdev = m->mdev;
> > +   const struct virtio_mdev_device_ops *ops = mdev_get_vhost_ops(mdev);
> > +
> > +   switch (m->virtio_id) {
> > +   case VIRTIO_ID_NET:
> > +   switch (cmd) {
> > +   case VHOST_MDEV_NET_CTRL:
> > +   if (!ops->net.ctrl)
> > +   return -ENOTSUPP;
> > +   return ops->net.ctrl(mdev, argp);
> > +   }
> > +   break;
> > +   }
> > +
> > +   return -ENOIOCTLCMD;
> > +}
> 
> As you commented above, vhost-mdev then needs device-specific stuff.

Yeah. But this device specific stuff is quite small and
simple. It's just to forward the settings between parent
and userspace. But I totally agree it would be really
great if we could avoid it in an elegant way.

> 
> 
> > +
> >  static int vhost_mdev_open(void *device_data)
> >  {
> > struct vhost_mdev *m = device_data;
> > @@ -460,8 +490,11 @@ static long vhost_mdev_unlocked_ioctl(void 
> > *device_data,
> >  * VHOST_SET_LOG_FD are not used yet.
> >  */
> > r = vhost_dev_ioctl(&m->dev, cmd, argp);
> > -   if (r == -ENOIOCTLCMD)
> > -   r = vhost_mdev_vring_ioctl(m, cmd, argp);
> > +   if (r == -ENOIOCTLCMD) {
> > +   r = vhost_mdev_dev_ioctl(m, cmd, argp);
> > +   if (r == -ENOIOCTLCMD)
> > +   r = vhost_mdev_vring_ioctl(m, cmd, argp);
> > +   }
> > }
> >  
> > mutex_unlock(&m->mutex);
> > diff --git a/include/linux/virtio_mdev_ops.h 
> > b/include/linux/virtio_mdev_ops.h
> > index d417b41f2845..622861804ebd 100644
> > --- a/include/linux/virtio_mdev_ops.h
> > +++ b/include/linux/virtio_mdev_ops.h
> > @@ -20,6 +20,8 @@ struct virtio_mdev_callback {
> > void *private;
> >  };
> >  
> > +struct vhost_mdev_net_ctrl;
> > +
> >  /**
> >   * struct vfio_mdev_device_ops - Structure to be registered for each
> >   * mdev device to register the device for virtio/vhost drivers.
> > @@ -151,6 +153,14 @@ struct virtio_mdev_device_ops {
> >  
> > /* Mdev device ops */
> > u64 (*get_mdev_features)(struct mdev_device *mdev);
> > +
> > +   /* Vhost-mdev (MDEV_CLASS_ID_VHOST) specific ops */
> > +   un

Re: [PATCH v3] vhost: introduce mdev based hardware backend

2019-10-29 Thread Tiwei Bie
On Wed, Oct 30, 2019 at 09:55:57AM +0800, Jason Wang wrote:
> On 2019/10/29 6:07 PM, Tiwei Bie wrote:
> > This patch introduces a mdev based hardware vhost backend.
> > This backend is built on top of the same abstraction used
> > in virtio-mdev and provides a generic vhost interface for
> > userspace to accelerate the virtio devices in guest.
> >
> > This backend is implemented as a mdev device driver on top
> > of the same mdev device ops used in virtio-mdev but using
> > a different mdev class id, and it will register the device
> > as a VFIO device for userspace to use. Userspace can setup
> > the IOMMU with the existing VFIO container/group APIs and
> > then get the device fd with the device name. After getting
> > the device fd of this device, userspace can use vhost ioctls
> > to setup the backend.
> 
> 
> Hi Tiwei:
> 
> The patch looks good overall, just few comments & nits.

Thanks for the review! I do appreciate it.

> 
> 
> >
> > Signed-off-by: Tiwei Bie 
> > ---
> > This patch depends on below series:
> > https://lkml.org/lkml/2019/10/23/614
> >
> > v2 -> v3:
> > - Fix the return value (Jason);
> > - Don't cache unnecessary information in vhost-mdev (Jason);
> > - Get rid of the memset in open (Jason);
> > - Add comments for VHOST_SET_MEM_TABLE, ... (Jason);
> > - Filter out unsupported features in vhost-mdev (Jason);
> > - Add _GET_DEVICE_ID ioctl (Jason);
> > - Add _GET_CONFIG/_SET_CONFIG ioctls (Jason);
> > - Drop _GET_QUEUE_NUM ioctl (Jason);
> > - Fix the copy-paste errors in _IOW/_IOR usage;
> > - Some minor fixes and improvements;
> >
> > v1 -> v2:
> > - Replace _SET_STATE with _SET_STATUS (MST);
> > - Check status bits at each step (MST);
> > - Report the max ring size and max number of queues (MST);
> > - Add missing MODULE_DEVICE_TABLE (Jason);
> > - Only support the network backend w/o multiqueue for now;
> > - Some minor fixes and improvements;
> > - Rebase on top of virtio-mdev series v4;
> >
> > RFC v4 -> v1:
> > - Implement vhost-mdev as a mdev device driver directly and
> >   connect it to VFIO container/group. (Jason);
> > - Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
> >   meaningless HVA->GPA translations (Jason);
> >
> > RFC v3 -> RFC v4:
> > - Build vhost-mdev on top of the same abstraction used by
> >   virtio-mdev (Jason);
> > - Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);
> >
> > RFC v2 -> RFC v3:
> > - Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
> >   based vhost protocol on top of vfio-mdev (Jason);
> >
> > RFC v1 -> RFC v2:
> > - Introduce a new VFIO device type to build a vhost protocol
> >   on top of vfio-mdev;
> >
> >  drivers/vfio/mdev/mdev_core.c|  20 ++
> >  drivers/vfio/mdev/mdev_private.h |   1 +
> >  drivers/vhost/Kconfig|  12 +
> >  drivers/vhost/Makefile   |   3 +
> >  drivers/vhost/mdev.c | 554 +++
> >  include/linux/mdev.h |   5 +
> >  include/uapi/linux/vhost.h   |  18 +
> >  include/uapi/linux/vhost_types.h |   8 +
> >  8 files changed, 621 insertions(+)
> >  create mode 100644 drivers/vhost/mdev.c
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> > index 9b00c3513120..3cfd787d605c 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -96,6 +96,26 @@ mdev_get_virtio_ops(struct mdev_device *mdev)
> >  }
> >  EXPORT_SYMBOL(mdev_get_virtio_ops);
> >  
> > +/* Specify the vhost device ops for the mdev device, this
> > + * must be called during create() callback for vhost mdev device.
> > + */
> > +void mdev_set_vhost_ops(struct mdev_device *mdev,
> > +   const struct virtio_mdev_device_ops *vhost_ops)
> > +{
> > +   mdev_set_class(mdev, MDEV_CLASS_ID_VHOST);
> > +   mdev->vhost_ops = vhost_ops;
> > +}
> > +EXPORT_SYMBOL(mdev_set_vhost_ops);
> > +
> > +/* Get the vhost device ops for the mdev device. */
> > +const struct virtio_mdev_device_ops *
> > +mdev_get_vhost_ops(struct mdev_device *mdev)
> > +{
> > +   WARN_ON(mdev->class_id != MDEV_CLASS_ID_VHOST);
> > +   return mdev->vhost_ops;
> > +}
> > +EXPORT_SYMBOL(mdev_get_vhost_ops);
> > +
> >  struct device *mdev_dev(struct mdev_device *mdev)
> >  {
> > return &mdev->dev;
> > diff --git a/drivers/vfio/mdev/mdev_private.h 
&

Re: [PATCH v2] vhost: introduce mdev based hardware backend

2019-10-29 Thread Tiwei Bie
On Tue, Oct 29, 2019 at 06:48:27PM +0800, Jason Wang wrote:
> On 2019/10/29 5:57 PM, Tiwei Bie wrote:
> > On Mon, Oct 28, 2019 at 11:50:49AM +0800, Jason Wang wrote:
> >> On 2019/10/28 9:58 AM, Tiwei Bie wrote:
> >>> On Fri, Oct 25, 2019 at 08:16:26AM -0400, Michael S. Tsirkin wrote:
> >>>> On Fri, Oct 25, 2019 at 05:54:55PM +0800, Jason Wang wrote:
> >>>>> On 2019/10/24 6:42 PM, Jason Wang wrote:
> >>>>>> Yes.
> >>>>>>
> >>>>>>
> >>>>>>>    And we should try to avoid
> >>>>>>> putting ctrl vq and Rx/Tx vqs in the same DMA space to prevent
> >>>>>>> guests having the chance to bypass the host (e.g. QEMU) to
> >>>>>>> setup the backend accelerator directly.
> >>>>>> That's really good point.  So when "vhost" type is created, parent
> >>>>>> should assume addr of ctrl_vq is hva.
> >>>>>>
> >>>>>> Thanks
> >>>>> This works for vhost but not virtio since there's no way for virtio 
> >>>>> kernel
> >>>>> driver to differ ctrl_vq with the rest when doing DMA map. One possible
> >>>>> solution is to provide DMA domain isolation between virtqueues. Then 
> >>>>> ctrl vq
> >>>>> can use its dedicated DMA domain for the work.
> >>> It might not be a bad idea to let the parent drivers distinguish
> >>> between virtio-mdev mdevs and vhost-mdev mdevs in ctrl-vq handling
> >>> by mdev's class id.
> >> Yes, that should work, I have something probable better, see below.
> >>
> >>
> >>>>> Anyway, this could be done in the future. We can have a version first 
> >>>>> that
> >>>>> doesn't support ctrl_vq.
> >>> +1, thanks
> >>>
> >>>>> Thanks
> >>>> Well no ctrl_vq implies either no offloads, or no XDP (since XDP needs
> >>>> to disable offloads dynamically).
> >>>>
> >>>>  if (!virtio_has_feature(vi->vdev, 
> >>>> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
> >>>>  && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> >>>>  virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> >>>>  virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> >>>>  virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
> >>>>  virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) 
> >>>> {
> >>>>  NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
> >>>> implementing LRO/CSUM, disable LRO/CSUM first");
> >>>>  return -EOPNOTSUPP;
> >>>>  }
> >>>>
> >>>> neither is very attractive.
> >>>>
> >>>> So yes ok just for development but we do need to figure out how it will
> >>>> work down the road in production.
> >>> Totally agree.
> >>>
> >>>> So really this specific virtio net device does not support control vq,
> >>>> instead it supports a different transport specific way to send commands
> >>>> to device.
> >>>>
> >>>> Some kind of extension to the transport? Ideas?
> >> So it's basically an issue of isolating DMA domains. Maybe we can start 
> >> with
> >> transport API for querying per vq DMA domain/ASID?
> >>
> >> - for vhost-mdev, userspace can query the DMA domain for each specific
> >> virtqueue. For control vq, mdev can return id for software domain, for the
> >> rest mdev will return id of VFIO domain. Then userspace know that it should
> >> use different API for preparing the virtqueue, e.g for vq other than 
> >> control
> >> vq, it should use VFIO DMA API. The control vq it should use hva instead.
> >>
> >> - for virito-mdev, we can introduce per-vq DMA device, and route DMA 
> >> mapping
> >> request for control vq back to mdev instead of the hardware. (We can wrap
> >> them into library or helpers to ease the development of vendor physical
> >> drivers).
> > Thanks for this proposal! I'm thinking about it these days.
> > I think it might be too complicated. I'm wondering whether we
> > can have something simpler. I will post a RFC patch to show
> > my idea today.
> 
> 
> Thanks, will check.
> 
> Btw, for virtio-mdev, the change should be very minimal, will post an
> RFC as well. For vhost-mdev, it could be just a helper to return an ID
> for DMA domain like ID_VFIO or ID_HVA.
> 
> Or a more straightforward way is to force queues like control vq to use PA.

Will check. Thanks!
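(My current reading of the helper idea, as a sketch only -- the names
and values below are made up, not something from a posted patch:)

	/* returned per vq by a new op, e.g. (*get_vq_dma_domain)(mdev, idx) */
	#define VIRTIO_MDEV_DOMAIN_VFIO  0  /* map through VFIO / the IOMMU       */
	#define VIRTIO_MDEV_DOMAIN_HVA   1  /* software domain, addresses are HVAs */

so the caller would query this per virtqueue and use the VFIO DMA API
for the data vqs while passing plain HVAs for the control vq.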

> 
> 
> >
> > Thanks,
> > Tiwei
> >
> 

[RFC] vhost_mdev: add network control vq support

2019-10-29 Thread Tiwei Bie
This patch adds the network control vq support in vhost-mdev.
A vhost-mdev specific op is introduced to allow parent drivers
to handle the network control commands that come from userspace.

Signed-off-by: Tiwei Bie 
---
This patch depends on below patch:
https://lkml.org/lkml/2019/10/29/335

 drivers/vhost/mdev.c | 37 ++--
 include/linux/virtio_mdev_ops.h  | 10 +
 include/uapi/linux/vhost.h   |  7 ++
 include/uapi/linux/vhost_types.h |  6 ++
 4 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/mdev.c b/drivers/vhost/mdev.c
index 35b2fb33e686..c9b3eaa77405 100644
--- a/drivers/vhost/mdev.c
+++ b/drivers/vhost/mdev.c
@@ -47,6 +47,13 @@ enum {
(1ULL << VIRTIO_NET_F_HOST_UFO) |
(1ULL << VIRTIO_NET_F_MRG_RXBUF) |
(1ULL << VIRTIO_NET_F_STATUS) |
+   (1ULL << VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) |
+   (1ULL << VIRTIO_NET_F_CTRL_VQ) |
+   (1ULL << VIRTIO_NET_F_CTRL_RX) |
+   (1ULL << VIRTIO_NET_F_CTRL_VLAN) |
+   (1ULL << VIRTIO_NET_F_CTRL_RX_EXTRA) |
+   (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) |
+   (1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR) |
(1ULL << VIRTIO_NET_F_SPEED_DUPLEX),
 };
 
@@ -362,6 +369,29 @@ static long vhost_mdev_vring_ioctl(struct vhost_mdev *m, 
unsigned int cmd,
return r;
 }
 
+/*
+ * Device specific (e.g. network) ioctls.
+ */
+static long vhost_mdev_dev_ioctl(struct vhost_mdev *m, unsigned int cmd,
+void __user *argp)
+{
+   struct mdev_device *mdev = m->mdev;
+   const struct virtio_mdev_device_ops *ops = mdev_get_vhost_ops(mdev);
+
+   switch (m->virtio_id) {
+   case VIRTIO_ID_NET:
+   switch (cmd) {
+   case VHOST_MDEV_NET_CTRL:
+   if (!ops->net.ctrl)
+   return -ENOTSUPP;
+   return ops->net.ctrl(mdev, argp);
+   }
+   break;
+   }
+
+   return -ENOIOCTLCMD;
+}
+
 static int vhost_mdev_open(void *device_data)
 {
struct vhost_mdev *m = device_data;
@@ -460,8 +490,11 @@ static long vhost_mdev_unlocked_ioctl(void *device_data,
 * VHOST_SET_LOG_FD are not used yet.
 */
r = vhost_dev_ioctl(&m->dev, cmd, argp);
-   if (r == -ENOIOCTLCMD)
-   r = vhost_mdev_vring_ioctl(m, cmd, argp);
+   if (r == -ENOIOCTLCMD) {
+   r = vhost_mdev_dev_ioctl(m, cmd, argp);
+   if (r == -ENOIOCTLCMD)
+   r = vhost_mdev_vring_ioctl(m, cmd, argp);
+   }
}
 
mutex_unlock(&m->mutex);
diff --git a/include/linux/virtio_mdev_ops.h b/include/linux/virtio_mdev_ops.h
index d417b41f2845..622861804ebd 100644
--- a/include/linux/virtio_mdev_ops.h
+++ b/include/linux/virtio_mdev_ops.h
@@ -20,6 +20,8 @@ struct virtio_mdev_callback {
void *private;
 };
 
+struct vhost_mdev_net_ctrl;
+
 /**
  * struct vfio_mdev_device_ops - Structure to be registered for each
  * mdev device to register the device for virtio/vhost drivers.
@@ -151,6 +153,14 @@ struct virtio_mdev_device_ops {
 
/* Mdev device ops */
u64 (*get_mdev_features)(struct mdev_device *mdev);
+
+   /* Vhost-mdev (MDEV_CLASS_ID_VHOST) specific ops */
+   union {
+   struct {
+   int (*ctrl)(struct mdev_device *mdev,
+   struct vhost_mdev_net_ctrl __user *ctrl);
+   } net;
+   };
 };
 
 void mdev_set_virtio_ops(struct mdev_device *mdev,
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 061a2824a1b3..3693b2cba0c4 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -134,4 +134,11 @@
 /* Get the max ring size. */
 #define VHOST_MDEV_GET_VRING_NUM   _IOR(VHOST_VIRTIO, 0x76, __u16)
 
+/* VHOST_MDEV device specific defines */
+
+/* Send virtio-net commands. The commands follow the same definition
+ * of the virtio-net commands defined in virtio-spec.
+ */
+#define VHOST_MDEV_NET_CTRL	_IOW(VHOST_VIRTIO, 0x77, struct vhost_mdev_net_ctrl *)
+
 #endif
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 7b105d0b2fb9..e76b4d8e35e5 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -127,6 +127,12 @@ struct vhost_mdev_config {
__u8 buf[0];
 };
 
+struct vhost_mdev_net_ctrl {
+   __u8 class;
+   __u8 cmd;
+   __u8 cmd_data[0];
+} __attribute__((packed));
+
 /* Feature bits */
 /* Log all write descriptors. Can be changed while device is active. */
 #define VHOST_F_LOG_ALL 26
-- 
2.23.0
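As a usage note for the ioctl above: a caller could pack a virtio-net
control command and its payload right after the header, roughly like
this (sketch only; it assumes <sys/ioctl.h>, <linux/virtio_net.h> and
the uapi additions from this patch, and vhost_fd plus the MAC value
are made up):

	struct {
		struct vhost_mdev_net_ctrl hdr;
		__u8 mac[6];
	} __attribute__((packed)) ctrl = {
		.hdr.class = VIRTIO_NET_CTRL_MAC,
		.hdr.cmd   = VIRTIO_NET_CTRL_MAC_ADDR_SET,
		.mac       = { 0x52, 0x54, 0x00, 0x12, 0x34, 0x56 },
	};

	if (ioctl(vhost_fd, VHOST_MDEV_NET_CTRL, &ctrl) < 0)
		perror("VHOST_MDEV_NET_CTRL");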


[PATCH v3] vhost: introduce mdev based hardware backend

2019-10-29 Thread Tiwei Bie
This patch introduces a mdev based hardware vhost backend.
This backend is built on top of the same abstraction used
in virtio-mdev and provides a generic vhost interface for
userspace to accelerate the virtio devices in guest.

This backend is implemented as a mdev device driver on top
of the same mdev device ops used in virtio-mdev but using
a different mdev class id, and it will register the device
as a VFIO device for userspace to use. Userspace can setup
the IOMMU with the existing VFIO container/group APIs and
then get the device fd with the device name. After getting
the device fd of this device, userspace can use vhost ioctls
to setup the backend.

Signed-off-by: Tiwei Bie 
---
This patch depends on below series:
https://lkml.org/lkml/2019/10/23/614

v2 -> v3:
- Fix the return value (Jason);
- Don't cache unnecessary information in vhost-mdev (Jason);
- Get rid of the memset in open (Jason);
- Add comments for VHOST_SET_MEM_TABLE, ... (Jason);
- Filter out unsupported features in vhost-mdev (Jason);
- Add _GET_DEVICE_ID ioctl (Jason);
- Add _GET_CONFIG/_SET_CONFIG ioctls (Jason);
- Drop _GET_QUEUE_NUM ioctl (Jason);
- Fix the copy-paste errors in _IOW/_IOR usage;
- Some minor fixes and improvements;

v1 -> v2:
- Replace _SET_STATE with _SET_STATUS (MST);
- Check status bits at each step (MST);
- Report the max ring size and max number of queues (MST);
- Add missing MODULE_DEVICE_TABLE (Jason);
- Only support the network backend w/o multiqueue for now;
- Some minor fixes and improvements;
- Rebase on top of virtio-mdev series v4;

RFC v4 -> v1:
- Implement vhost-mdev as a mdev device driver directly and
  connect it to VFIO container/group. (Jason);
- Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
  meaningless HVA->GPA translations (Jason);

RFC v3 -> RFC v4:
- Build vhost-mdev on top of the same abstraction used by
  virtio-mdev (Jason);
- Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);

RFC v2 -> RFC v3:
- Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
  based vhost protocol on top of vfio-mdev (Jason);

RFC v1 -> RFC v2:
- Introduce a new VFIO device type to build a vhost protocol
  on top of vfio-mdev;

 drivers/vfio/mdev/mdev_core.c|  20 ++
 drivers/vfio/mdev/mdev_private.h |   1 +
 drivers/vhost/Kconfig|  12 +
 drivers/vhost/Makefile   |   3 +
 drivers/vhost/mdev.c | 554 +++
 include/linux/mdev.h |   5 +
 include/uapi/linux/vhost.h   |  18 +
 include/uapi/linux/vhost_types.h |   8 +
 8 files changed, 621 insertions(+)
 create mode 100644 drivers/vhost/mdev.c

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 9b00c3513120..3cfd787d605c 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -96,6 +96,26 @@ mdev_get_virtio_ops(struct mdev_device *mdev)
 }
 EXPORT_SYMBOL(mdev_get_virtio_ops);
 
+/* Specify the vhost device ops for the mdev device, this
+ * must be called during create() callback for vhost mdev device.
+ */
+void mdev_set_vhost_ops(struct mdev_device *mdev,
+   const struct virtio_mdev_device_ops *vhost_ops)
+{
+   mdev_set_class(mdev, MDEV_CLASS_ID_VHOST);
+   mdev->vhost_ops = vhost_ops;
+}
+EXPORT_SYMBOL(mdev_set_vhost_ops);
+
+/* Get the vhost device ops for the mdev device. */
+const struct virtio_mdev_device_ops *
+mdev_get_vhost_ops(struct mdev_device *mdev)
+{
+   WARN_ON(mdev->class_id != MDEV_CLASS_ID_VHOST);
+   return mdev->vhost_ops;
+}
+EXPORT_SYMBOL(mdev_get_vhost_ops);
+
 struct device *mdev_dev(struct mdev_device *mdev)
 {
return &mdev->dev;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 7b47890c34e7..5597c846e52f 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -40,6 +40,7 @@ struct mdev_device {
union {
const struct vfio_mdev_device_ops *vfio_ops;
const struct virtio_mdev_device_ops *virtio_ops;
+   const struct virtio_mdev_device_ops *vhost_ops;
};
 };
 
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 3d03ccbd1adc..062cada28f89 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -34,6 +34,18 @@ config VHOST_VSOCK
To compile this driver as a module, choose M here: the module will be 
called
vhost_vsock.
 
+config VHOST_MDEV
+   tristate "Vhost driver for Mediated devices"
+   depends on EVENTFD && VFIO && VFIO_MDEV
+   select VHOST
+   default n
+   ---help---
+   This kernel module can be loaded in host kernel to accelerate
+   guest virtio devices with the mediated device based backends.
+
+   To compile this driver as a module, choose M here: the module will
+   be called vhost_mdev.
+
 config VHOST
tristate
---help---
diff --git a/drivers/vhost/Makefil

Re: [PATCH v2] vhost: introduce mdev based hardware backend

2019-10-29 Thread Tiwei Bie
On Mon, Oct 28, 2019 at 11:50:49AM +0800, Jason Wang wrote:
> On 2019/10/28 9:58 AM, Tiwei Bie wrote:
> > On Fri, Oct 25, 2019 at 08:16:26AM -0400, Michael S. Tsirkin wrote:
> > > On Fri, Oct 25, 2019 at 05:54:55PM +0800, Jason Wang wrote:
> > > > On 2019/10/24 6:42 PM, Jason Wang wrote:
> > > > > Yes.
> > > > > 
> > > > > 
> > > > > >    And we should try to avoid
> > > > > > putting ctrl vq and Rx/Tx vqs in the same DMA space to prevent
> > > > > > guests having the chance to bypass the host (e.g. QEMU) to
> > > > > > setup the backend accelerator directly.
> > > > > 
> > > > > That's really good point.  So when "vhost" type is created, parent
> > > > > should assume addr of ctrl_vq is hva.
> > > > > 
> > > > > Thanks
> > > > 
> > > > This works for vhost but not virtio since there's no way for virtio 
> > > > kernel
> > > > driver to differ ctrl_vq with the rest when doing DMA map. One possible
> > > > solution is to provide DMA domain isolation between virtqueues. Then 
> > > > ctrl vq
> > > > can use its dedicated DMA domain for the work.
> > It might not be a bad idea to let the parent drivers distinguish
> > between virtio-mdev mdevs and vhost-mdev mdevs in ctrl-vq handling
> > by mdev's class id.
> 
> 
> Yes, that should work, I have something probable better, see below.
> 
> 
> > 
> > > > Anyway, this could be done in the future. We can have a version first 
> > > > that
> > > > doesn't support ctrl_vq.
> > +1, thanks
> > 
> > > > Thanks
> > > Well no ctrl_vq implies either no offloads, or no XDP (since XDP needs
> > > to disable offloads dynamically).
> > > 
> > >  if (!virtio_has_feature(vi->vdev, 
> > > VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
> > >  && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > >  virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > >  virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > >  virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
> > >  virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
> > >  NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
> > > implementing LRO/CSUM, disable LRO/CSUM first");
> > >  return -EOPNOTSUPP;
> > >  }
> > > 
> > > neither is very attractive.
> > > 
> > > So yes ok just for development but we do need to figure out how it will
> > > work down the road in production.
> > Totally agree.
> > 
> > > So really this specific virtio net device does not support control vq,
> > > instead it supports a different transport specific way to send commands
> > > to device.
> > > 
> > > Some kind of extension to the transport? Ideas?
> 
> 
> So it's basically an issue of isolating DMA domains. Maybe we can start with
> transport API for querying per vq DMA domain/ASID?
> 
> - for vhost-mdev, userspace can query the DMA domain for each specific
> virtqueue. For control vq, mdev can return id for software domain, for the
> rest mdev will return id of VFIO domain. Then userspace know that it should
> use different API for preparing the virtqueue, e.g for vq other than control
> vq, it should use VFIO DMA API. The control vq it should use hva instead.
> 
> - for virito-mdev, we can introduce per-vq DMA device, and route DMA mapping
> request for control vq back to mdev instead of the hardware. (We can wrap
> them into library or helpers to ease the development of vendor physical
> drivers).

Thanks for this proposal! I've been thinking about it these days.
I think it might be too complicated. I'm wondering whether we
can have something simpler. I will post an RFC patch to show
my idea today.

Thanks,
Tiwei

> 
> Thanks
> 
> 
> > > 
> > > 
> > > -- 
> > > MST
> 

Re: [PATCH v2] vhost: introduce mdev based hardware backend

2019-10-27 Thread Tiwei Bie
On Fri, Oct 25, 2019 at 08:16:26AM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 25, 2019 at 05:54:55PM +0800, Jason Wang wrote:
> > On 2019/10/24 6:42 PM, Jason Wang wrote:
> > > 
> > > Yes.
> > > 
> > > 
> > > >   And we should try to avoid
> > > > putting ctrl vq and Rx/Tx vqs in the same DMA space to prevent
> > > > guests having the chance to bypass the host (e.g. QEMU) to
> > > > setup the backend accelerator directly.
> > > 
> > > 
> > > That's really good point.  So when "vhost" type is created, parent
> > > should assume addr of ctrl_vq is hva.
> > > 
> > > Thanks
> > 
> > 
> > This works for vhost but not virtio since there's no way for virtio kernel
> > driver to differ ctrl_vq with the rest when doing DMA map. One possible
> > solution is to provide DMA domain isolation between virtqueues. Then ctrl vq
> > can use its dedicated DMA domain for the work.

It might not be a bad idea to let the parent drivers distinguish
between virtio-mdev mdevs and vhost-mdev mdevs in ctrl-vq handling
by mdev's class id.
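A sketch of what that could look like on the parent side (illustration
only; mdev_get_class_id() and MY_CTRL_VQ_INDEX are made-up names, the
posted series keeps the class id private to the mdev core):

	static bool ctrl_vq_uses_hva(struct mdev_device *mdev, u16 qid)
	{
		return qid == MY_CTRL_VQ_INDEX &&
		       mdev_get_class_id(mdev) == MDEV_CLASS_ID_VHOST;
	}

i.e. for a vhost-mdev instance the parent would treat ctrl vq addresses
as HVAs, while for a virtio-mdev instance it would map the ctrl vq like
any other vq.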

> > 
> > Anyway, this could be done in the future. We can have a version first that
> > doesn't support ctrl_vq.

+1, thanks

> > 
> > Thanks
> 
> Well no ctrl_vq implies either no offloads, or no XDP (since XDP needs
> to disable offloads dynamically).
> 
> if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
> && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
> virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
> NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
> implementing LRO/CSUM, disable LRO/CSUM first");
> return -EOPNOTSUPP;
> }
> 
> neither is very attractive.
> 
> So yes ok just for development but we do need to figure out how it will
> work down the road in production.

Totally agree.

> 
> So really this specific virtio net device does not support control vq,
> instead it supports a different transport specific way to send commands
> to device.
> 
> Some kind of extension to the transport? Ideas?
> 
> 
> -- 
> MST

Re: [PATCH v2] vhost: introduce mdev based hardware backend

2019-10-24 Thread Tiwei Bie
On Thu, Oct 24, 2019 at 04:32:42PM +0800, Jason Wang wrote:
> On 2019/10/24 4:03 PM, Jason Wang wrote:
> > On 2019/10/24 12:21 PM, Tiwei Bie wrote:
> > > On Wed, Oct 23, 2019 at 06:29:21PM +0800, Jason Wang wrote:
> > > > On 2019/10/23 6:11 PM, Tiwei Bie wrote:
> > > > > On Wed, Oct 23, 2019 at 03:25:00PM +0800, Jason Wang wrote:
> > > > > > On 2019/10/23 3:07 PM, Tiwei Bie wrote:
> > > > > > > On Wed, Oct 23, 2019 at 01:46:23PM +0800, Jason Wang wrote:
> > > > > > > > On 2019/10/23 11:02 AM, Tiwei Bie wrote:
> > > > > > > > > On Tue, Oct 22, 2019 at 09:30:16PM +0800, Jason Wang wrote:
> > > > > > > > > > On 2019/10/22 5:52 PM, Tiwei Bie wrote:
> > > > > > > > > > > This patch introduces a mdev based hardware vhost backend.
> > > > > > > > > > > This backend is built on top of the same abstraction used
> > > > > > > > > > > in virtio-mdev and provides a generic vhost interface for
> > > > > > > > > > > userspace to accelerate the virtio devices in guest.
> > > > > > > > > > > 
> > > > > > > > > > > This backend is implemented as a mdev device driver on top
> > > > > > > > > > > of the same mdev device ops used in virtio-mdev but using
> > > > > > > > > > > a different mdev class id, and it will register the device
> > > > > > > > > > > as a VFIO device for userspace to use. Userspace can setup
> > > > > > > > > > > the IOMMU with the existing VFIO container/group APIs and
> > > > > > > > > > > then get the device fd with the device name. After getting
> > > > > > > > > > > the device fd of this device, userspace can use vhost 
> > > > > > > > > > > ioctls
> > > > > > > > > > > to setup the backend.
> > > > > > > > > > > 
> > > > > > > > > > > Signed-off-by: Tiwei Bie 
> > > > > > > > > > > ---
> > > > > > > > > > > This patch depends on below series:
> > > > > > > > > > > https://lkml.org/lkml/2019/10/17/286
> > > > > > > > > > > 
> > > > > > > > > > > v1 -> v2:
> > > > > > > > > > > - Replace _SET_STATE with _SET_STATUS (MST);
> > > > > > > > > > > - Check status bits at each step (MST);
> > > > > > > > > > > - Report the max ring size and max number of queues (MST);
> > > > > > > > > > > - Add missing MODULE_DEVICE_TABLE (Jason);
> > > > > > > > > > > - Only support the network backend w/o multiqueue for now;
> > > > > > > > > > Any idea on how to extend it to support
> > > > > > > > > > devices other than net? I think we
> > > > > > > > > > want a generic API or an API that could
> > > > > > > > > > be made generic in the future.
> > > > > > > > > > 
> > > > > > > > > > Do we want to e.g having a generic vhost
> > > > > > > > > > mdev for all kinds of devices or
> > > > > > > > > > introducing e.g vhost-net-mdev and vhost-scsi-mdev?
> > > > > > > > > One possible way is to do what vhost-user does. I.e. Apart 
> > > > > > > > > from
> > > > > > > > > the generic ring, features, ... related ioctls, we also 
> > > > > > > > > introduce
> > > > > > > > > device specific ioctls when we need them. As vhost-mdev just 
> > > > > > > > > needs
> > > > > > > > > to forward configs between parent and userspace and even won't
> > > > > > > > > cache any info when possible,
> > > > > > > > So it looks to me this is only possible if we
> > > > > > > > expose e.g set_config and
> > > > > > > > get_config to userspace.
> > > > > > > The set_config and get_config interface isn't really everything
> > > > > > > of device specific settings. We also have ctrl

Re: [PATCH v2] vhost: introduce mdev based hardware backend

2019-10-23 Thread Tiwei Bie
On Wed, Oct 23, 2019 at 06:29:21PM +0800, Jason Wang wrote:
> On 2019/10/23 6:11 PM, Tiwei Bie wrote:
> > On Wed, Oct 23, 2019 at 03:25:00PM +0800, Jason Wang wrote:
> > > On 2019/10/23 3:07 PM, Tiwei Bie wrote:
> > > > On Wed, Oct 23, 2019 at 01:46:23PM +0800, Jason Wang wrote:
> > > > > On 2019/10/23 11:02 AM, Tiwei Bie wrote:
> > > > > > On Tue, Oct 22, 2019 at 09:30:16PM +0800, Jason Wang wrote:
> > > > > > > On 2019/10/22 5:52 PM, Tiwei Bie wrote:
> > > > > > > > This patch introduces a mdev based hardware vhost backend.
> > > > > > > > This backend is built on top of the same abstraction used
> > > > > > > > in virtio-mdev and provides a generic vhost interface for
> > > > > > > > userspace to accelerate the virtio devices in guest.
> > > > > > > > 
> > > > > > > > This backend is implemented as a mdev device driver on top
> > > > > > > > of the same mdev device ops used in virtio-mdev but using
> > > > > > > > a different mdev class id, and it will register the device
> > > > > > > > as a VFIO device for userspace to use. Userspace can setup
> > > > > > > > the IOMMU with the existing VFIO container/group APIs and
> > > > > > > > then get the device fd with the device name. After getting
> > > > > > > > the device fd of this device, userspace can use vhost ioctls
> > > > > > > > to setup the backend.
> > > > > > > > 
> > > > > > > > Signed-off-by: Tiwei Bie 
> > > > > > > > ---
> > > > > > > > This patch depends on below series:
> > > > > > > > https://lkml.org/lkml/2019/10/17/286
> > > > > > > > 
> > > > > > > > v1 -> v2:
> > > > > > > > - Replace _SET_STATE with _SET_STATUS (MST);
> > > > > > > > - Check status bits at each step (MST);
> > > > > > > > - Report the max ring size and max number of queues (MST);
> > > > > > > > - Add missing MODULE_DEVICE_TABLE (Jason);
> > > > > > > > - Only support the network backend w/o multiqueue for now;
> > > > > > > Any idea on how to extend it to support devices other than net? I 
> > > > > > > think we
> > > > > > > want a generic API or an API that could be made generic in the 
> > > > > > > future.
> > > > > > > 
> > > > > > > Do we want to e.g having a generic vhost mdev for all kinds of 
> > > > > > > devices or
> > > > > > > introducing e.g vhost-net-mdev and vhost-scsi-mdev?
> > > > > > One possible way is to do what vhost-user does. I.e. Apart from
> > > > > > the generic ring, features, ... related ioctls, we also introduce
> > > > > > device specific ioctls when we need them. As vhost-mdev just needs
> > > > > > to forward configs between parent and userspace and even won't
> > > > > > cache any info when possible,
> > > > > So it looks to me this is only possible if we expose e.g set_config 
> > > > > and
> > > > > get_config to userspace.
> > > > The set_config and get_config interface isn't really everything
> > > > of device specific settings. We also have ctrlq in virtio-net.
> > > 
> > > Yes, but it could be processed by the exist API. Isn't it? Just set ctrl 
> > > vq
> > > address and let parent to deal with that.
> > I mean how to expose ctrlq related settings to userspace?
> 
> 
> I think it works like:
> 
> 1) userspace find ctrl_vq is supported
> 
> 2) then it can allocate memory for ctrl vq and set its address through
> vhost-mdev
> 
> 3) userspace can populate ctrl vq itself

I see. That is to say, userspace, e.g. QEMU, will program the
ctrl vq with the existing VHOST_*_VRING_* ioctls, and parent
drivers should know that the addresses used in the ctrl vq are
host virtual addresses in vhost-mdev's case.
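A rough sketch of that flow from the userspace side (illustration
only; error handling is omitted, and ctrl_qid plus the ring buffers
are made up):

	struct vhost_vring_state num = { .index = ctrl_qid, .num = 64 };
	struct vhost_vring_addr addr = {
		.index           = ctrl_qid,
		.desc_user_addr  = (__u64)(uintptr_t)desc,	/* HVAs, not IOVAs */
		.avail_user_addr = (__u64)(uintptr_t)avail,
		.used_user_addr  = (__u64)(uintptr_t)used,
	};

	ioctl(vhost_fd, VHOST_SET_VRING_NUM, &num);
	ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr);

The rings here live in ordinary process memory, which is why the parent
has to treat these addresses differently from the data vqs.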

> 
> 
> > 
> > > 
> > > > > > I think it might be better to do
> > > > > > this in one generic vhost-mdev module.
> > > > > Looking at definitions of VhostUserRequest in qemu, it mixed generic 
> > > > > API
> > > > > with device specific API. If we want go this ways (a generic 

Re: [PATCH v2] vhost: introduce mdev based hardware backend

2019-10-23 Thread Tiwei Bie
On Wed, Oct 23, 2019 at 03:25:00PM +0800, Jason Wang wrote:
> On 2019/10/23 3:07 PM, Tiwei Bie wrote:
> > On Wed, Oct 23, 2019 at 01:46:23PM +0800, Jason Wang wrote:
> > > On 2019/10/23 11:02 AM, Tiwei Bie wrote:
> > > > On Tue, Oct 22, 2019 at 09:30:16PM +0800, Jason Wang wrote:
> > > > > On 2019/10/22 5:52 PM, Tiwei Bie wrote:
> > > > > > This patch introduces a mdev based hardware vhost backend.
> > > > > > This backend is built on top of the same abstraction used
> > > > > > in virtio-mdev and provides a generic vhost interface for
> > > > > > userspace to accelerate the virtio devices in guest.
> > > > > > 
> > > > > > This backend is implemented as a mdev device driver on top
> > > > > > of the same mdev device ops used in virtio-mdev but using
> > > > > > a different mdev class id, and it will register the device
> > > > > > as a VFIO device for userspace to use. Userspace can setup
> > > > > > the IOMMU with the existing VFIO container/group APIs and
> > > > > > then get the device fd with the device name. After getting
> > > > > > the device fd of this device, userspace can use vhost ioctls
> > > > > > to setup the backend.
> > > > > > 
> > > > > > Signed-off-by: Tiwei Bie 
> > > > > > ---
> > > > > > This patch depends on below series:
> > > > > > https://lkml.org/lkml/2019/10/17/286
> > > > > > 
> > > > > > v1 -> v2:
> > > > > > - Replace _SET_STATE with _SET_STATUS (MST);
> > > > > > - Check status bits at each step (MST);
> > > > > > - Report the max ring size and max number of queues (MST);
> > > > > > - Add missing MODULE_DEVICE_TABLE (Jason);
> > > > > > - Only support the network backend w/o multiqueue for now;
> > > > > Any idea on how to extend it to support devices other than net? I 
> > > > > think we
> > > > > want a generic API or an API that could be made generic in the future.
> > > > > 
> > > > > Do we want to e.g having a generic vhost mdev for all kinds of 
> > > > > devices or
> > > > > introducing e.g vhost-net-mdev and vhost-scsi-mdev?
> > > > One possible way is to do what vhost-user does. I.e. Apart from
> > > > the generic ring, features, ... related ioctls, we also introduce
> > > > device specific ioctls when we need them. As vhost-mdev just needs
> > > > to forward configs between parent and userspace and even won't
> > > > cache any info when possible,
> > > 
> > > So it looks to me this is only possible if we expose e.g set_config and
> > > get_config to userspace.
> > The set_config and get_config interface isn't really everything
> > of device specific settings. We also have ctrlq in virtio-net.
> 
> 
> Yes, but it could be processed by the exist API. Isn't it? Just set ctrl vq
> address and let parent to deal with that.

I mean, how do we expose ctrlq-related settings to userspace?

> 
> 
> > 
> > > 
> > > > I think it might be better to do
> > > > this in one generic vhost-mdev module.
> > > 
> > > Looking at definitions of VhostUserRequest in qemu, it mixed generic API
> > > with device specific API. If we want go this ways (a generic vhost-mdev),
> > > more questions needs to be answered:
> > > 
> > > 1) How could userspace know which type of vhost it would use? Do we need 
> > > to
> > > expose virtio subsystem device in for userspace this case?
> > > 
> > > 2) That generic vhost-mdev module still need to filter out unsupported
> > > ioctls for a specific type. E.g if it probes a net device, it should 
> > > refuse
> > > API for other type. This in fact a vhost-mdev-net but just not modularize 
> > > it
> > > on top of vhost-mdev.
> > > 
> > > 
> > > > > > - Some minor fixes and improvements;
> > > > > > - Rebase on top of virtio-mdev series v4;
> > [...]
> > > > > > +
> > > > > > +static long vhost_mdev_get_features(struct vhost_mdev *m, u64 
> > > > > > __user *featurep)
> > > > > > +{
> > > > > > +   if (copy_to_user(featurep, >features, sizeof(m->features)))
> > > > > > +   return -EFAULT;
> > > >

Re: [PATCH v2] vhost: introduce mdev based hardware backend

2019-10-23 Thread Tiwei Bie
On Wed, Oct 23, 2019 at 01:46:23PM +0800, Jason Wang wrote:
> On 2019/10/23 11:02 AM, Tiwei Bie wrote:
> > On Tue, Oct 22, 2019 at 09:30:16PM +0800, Jason Wang wrote:
> > > On 2019/10/22 5:52 PM, Tiwei Bie wrote:
> > > > This patch introduces a mdev based hardware vhost backend.
> > > > This backend is built on top of the same abstraction used
> > > > in virtio-mdev and provides a generic vhost interface for
> > > > userspace to accelerate the virtio devices in guest.
> > > > 
> > > > This backend is implemented as a mdev device driver on top
> > > > of the same mdev device ops used in virtio-mdev but using
> > > > a different mdev class id, and it will register the device
> > > > as a VFIO device for userspace to use. Userspace can setup
> > > > the IOMMU with the existing VFIO container/group APIs and
> > > > then get the device fd with the device name. After getting
> > > > the device fd of this device, userspace can use vhost ioctls
> > > > to setup the backend.
> > > > 
> > > > Signed-off-by: Tiwei Bie 
> > > > ---
> > > > This patch depends on below series:
> > > > https://lkml.org/lkml/2019/10/17/286
> > > > 
> > > > v1 -> v2:
> > > > - Replace _SET_STATE with _SET_STATUS (MST);
> > > > - Check status bits at each step (MST);
> > > > - Report the max ring size and max number of queues (MST);
> > > > - Add missing MODULE_DEVICE_TABLE (Jason);
> > > > - Only support the network backend w/o multiqueue for now;
> > > 
> > > Any idea on how to extend it to support devices other than net? I think we
> > > want a generic API or an API that could be made generic in the future.
> > > 
> > > Do we want to e.g having a generic vhost mdev for all kinds of devices or
> > > introducing e.g vhost-net-mdev and vhost-scsi-mdev?
> > One possible way is to do what vhost-user does. I.e. Apart from
> > the generic ring, features, ... related ioctls, we also introduce
> > device specific ioctls when we need them. As vhost-mdev just needs
> > to forward configs between parent and userspace and even won't
> > cache any info when possible,
> 
> 
> So it looks to me this is only possible if we expose e.g set_config and
> get_config to userspace.

The set_config and get_config interfaces don't really cover all of
the device-specific settings. We also have the ctrlq in virtio-net.

> 
> 
> > I think it might be better to do
> > this in one generic vhost-mdev module.
> 
> 
> Looking at definitions of VhostUserRequest in qemu, it mixed generic API
> with device specific API. If we want go this ways (a generic vhost-mdev),
> more questions needs to be answered:
> 
> 1) How could userspace know which type of vhost it would use? Do we need to
> expose virtio subsystem device in for userspace this case?
> 
> 2) That generic vhost-mdev module still need to filter out unsupported
> ioctls for a specific type. E.g if it probes a net device, it should refuse
> API for other type. This in fact a vhost-mdev-net but just not modularize it
> on top of vhost-mdev.
> 
> 
> > 
> > > 
> > > > - Some minor fixes and improvements;
> > > > - Rebase on top of virtio-mdev series v4;
[...]
> > > > +
> > > > +static long vhost_mdev_get_features(struct vhost_mdev *m, u64 __user 
> > > > *featurep)
> > > > +{
> > > > +   if (copy_to_user(featurep, &m->features, sizeof(m->features)))
> > > > +   return -EFAULT;
> > > 
> > > As discussed in previous version do we need to filter out MQ feature here?
> > I think it's more straightforward to let the parent drivers
> > filter out the unsupported features. Otherwise it would be tricky
> > when we want to add more features to the vhost-mdev module,
> 
> 
> It's as simple as remove the feature from blacklist?

It's not really that easy. It may break the old drivers.
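To make the two options concrete (sketch only; SUPPORTED_FEATURES
stands in for whatever whitelist vhost-mdev ends up using, it isn't a
real symbol):

	/* (a) filter in vhost-mdev itself: */
	m->features = ops->get_features(mdev) & SUPPORTED_FEATURES;

	/* (b) filter in the parent driver, so the class never sees it: */
	features &= ~(1ULL << VIRTIO_NET_F_MQ);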

> 
> 
> > i.e. if
> > the parent drivers expose unsupported features and rely on
> > vhost-mdev to filter them out, these features will be exposed
> > to userspace automatically when they are enabled in vhost-mdev
> > in the future.
> 
> 
> The issue is, it's only that vhost-mdev knows its own limitation. E.g in
> this patch, vhost-mdev only implements a subset of transport API, but parent
> doesn't know about that.
> 
> Still MQ as an example, there's no way (or no need) for parent to know that
> vhost-mdev does not support MQ.

The mdev is a MDEV_CLAS

Re: [PATCH v2] vhost: introduce mdev based hardware backend

2019-10-22 Thread Tiwei Bie
On Tue, Oct 22, 2019 at 09:30:16PM +0800, Jason Wang wrote:
> On 2019/10/22 5:52 PM, Tiwei Bie wrote:
> > This patch introduces a mdev based hardware vhost backend.
> > This backend is built on top of the same abstraction used
> > in virtio-mdev and provides a generic vhost interface for
> > userspace to accelerate the virtio devices in guest.
> > 
> > This backend is implemented as a mdev device driver on top
> > of the same mdev device ops used in virtio-mdev but using
> > a different mdev class id, and it will register the device
> > as a VFIO device for userspace to use. Userspace can setup
> > the IOMMU with the existing VFIO container/group APIs and
> > then get the device fd with the device name. After getting
> > the device fd of this device, userspace can use vhost ioctls
> > to setup the backend.
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> > This patch depends on below series:
> > https://lkml.org/lkml/2019/10/17/286
> > 
> > v1 -> v2:
> > - Replace _SET_STATE with _SET_STATUS (MST);
> > - Check status bits at each step (MST);
> > - Report the max ring size and max number of queues (MST);
> > - Add missing MODULE_DEVICE_TABLE (Jason);
> > - Only support the network backend w/o multiqueue for now;
> 
> 
> Any idea on how to extend it to support devices other than net? I think we
> want a generic API or an API that could be made generic in the future.
> 
> Do we want to e.g having a generic vhost mdev for all kinds of devices or
> introducing e.g vhost-net-mdev and vhost-scsi-mdev?

One possible way is to do what vhost-user does. I.e. apart from
the generic ring, features, ... related ioctls, we also introduce
device-specific ioctls when we need them. As vhost-mdev just needs
to forward configs between the parent and userspace, and won't even
cache any info when possible, I think it might be better to do
this in one generic vhost-mdev module.

> 
> 
> > - Some minor fixes and improvements;
> > - Rebase on top of virtio-mdev series v4;
> > 
> > RFC v4 -> v1:
> > - Implement vhost-mdev as a mdev device driver directly and
> >connect it to VFIO container/group. (Jason);
> > - Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
> >meaningless HVA->GPA translations (Jason);
> > 
> > RFC v3 -> RFC v4:
> > - Build vhost-mdev on top of the same abstraction used by
> >virtio-mdev (Jason);
> > - Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);
> > 
> > RFC v2 -> RFC v3:
> > - Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
> >based vhost protocol on top of vfio-mdev (Jason);
> > 
> > RFC v1 -> RFC v2:
> > - Introduce a new VFIO device type to build a vhost protocol
> >on top of vfio-mdev;
> > 
> >   drivers/vfio/mdev/mdev_core.c |  12 +
> >   drivers/vhost/Kconfig |   9 +
> >   drivers/vhost/Makefile|   3 +
> >   drivers/vhost/mdev.c  | 415 ++
> >   include/linux/mdev.h  |   3 +
> >   include/uapi/linux/vhost.h|  13 ++
> >   6 files changed, 455 insertions(+)
> >   create mode 100644 drivers/vhost/mdev.c
> > 
> > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> > index 5834f6b7c7a5..2963f65e6648 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -69,6 +69,18 @@ void mdev_set_virtio_ops(struct mdev_device *mdev,
> >   }
> >   EXPORT_SYMBOL(mdev_set_virtio_ops);
> > +/* Specify the vhost device ops for the mdev device, this
> > + * must be called during create() callback for vhost mdev device.
> > + */
> > +void mdev_set_vhost_ops(struct mdev_device *mdev,
> > +   const struct virtio_mdev_device_ops *vhost_ops)
> > +{
> > +   WARN_ON(mdev->class_id);
> > +   mdev->class_id = MDEV_CLASS_ID_VHOST;
> > +   mdev->device_ops = vhost_ops;
> > +}
> > +EXPORT_SYMBOL(mdev_set_vhost_ops);
> > +
> >   const void *mdev_get_dev_ops(struct mdev_device *mdev)
> >   {
> > return mdev->device_ops;
> > diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> > index 3d03ccbd1adc..7b5c2f655af7 100644
> > --- a/drivers/vhost/Kconfig
> > +++ b/drivers/vhost/Kconfig
> > @@ -34,6 +34,15 @@ config VHOST_VSOCK
> > To compile this driver as a module, choose M here: the module will be 
> > called
> > vhost_vsock.
> > +config VHOST_MDEV
> > +   tristate "Vhost driver for Mediated devices"
> > +   depends on EVENTFD &

[PATCH v2] vhost: introduce mdev based hardware backend

2019-10-22 Thread Tiwei Bie
This patch introduces a mdev based hardware vhost backend.
This backend is built on top of the same abstraction used
in virtio-mdev and provides a generic vhost interface for
userspace to accelerate the virtio devices in guest.

This backend is implemented as a mdev device driver on top
of the same mdev device ops used in virtio-mdev but using
a different mdev class id, and it will register the device
as a VFIO device for userspace to use. Userspace can setup
the IOMMU with the existing VFIO container/group APIs and
then get the device fd with the device name. After getting
the device fd of this device, userspace can use vhost ioctls
to setup the backend.

Signed-off-by: Tiwei Bie 
---
This patch depends on below series:
https://lkml.org/lkml/2019/10/17/286

v1 -> v2:
- Replace _SET_STATE with _SET_STATUS (MST);
- Check status bits at each step (MST);
- Report the max ring size and max number of queues (MST);
- Add missing MODULE_DEVICE_TABLE (Jason);
- Only support the network backend w/o multiqueue for now;
- Some minor fixes and improvements;
- Rebase on top of virtio-mdev series v4;

RFC v4 -> v1:
- Implement vhost-mdev as a mdev device driver directly and
  connect it to VFIO container/group. (Jason);
- Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
  meaningless HVA->GPA translations (Jason);

RFC v3 -> RFC v4:
- Build vhost-mdev on top of the same abstraction used by
  virtio-mdev (Jason);
- Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);

RFC v2 -> RFC v3:
- Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
  based vhost protocol on top of vfio-mdev (Jason);

RFC v1 -> RFC v2:
- Introduce a new VFIO device type to build a vhost protocol
  on top of vfio-mdev;

 drivers/vfio/mdev/mdev_core.c |  12 +
 drivers/vhost/Kconfig |   9 +
 drivers/vhost/Makefile|   3 +
 drivers/vhost/mdev.c  | 415 ++
 include/linux/mdev.h  |   3 +
 include/uapi/linux/vhost.h|  13 ++
 6 files changed, 455 insertions(+)
 create mode 100644 drivers/vhost/mdev.c

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 5834f6b7c7a5..2963f65e6648 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -69,6 +69,18 @@ void mdev_set_virtio_ops(struct mdev_device *mdev,
 }
 EXPORT_SYMBOL(mdev_set_virtio_ops);
 
+/* Specify the vhost device ops for the mdev device, this
+ * must be called during create() callback for vhost mdev device.
+ */
+void mdev_set_vhost_ops(struct mdev_device *mdev,
+   const struct virtio_mdev_device_ops *vhost_ops)
+{
+   WARN_ON(mdev->class_id);
+   mdev->class_id = MDEV_CLASS_ID_VHOST;
+   mdev->device_ops = vhost_ops;
+}
+EXPORT_SYMBOL(mdev_set_vhost_ops);
+
 const void *mdev_get_dev_ops(struct mdev_device *mdev)
 {
return mdev->device_ops;
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 3d03ccbd1adc..7b5c2f655af7 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -34,6 +34,15 @@ config VHOST_VSOCK
To compile this driver as a module, choose M here: the module will be 
called
vhost_vsock.
 
+config VHOST_MDEV
+   tristate "Vhost driver for Mediated devices"
+   depends on EVENTFD && VFIO && VFIO_MDEV
+   select VHOST
+   default n
+   ---help---
+   Say M here to enable the vhost_mdev module for use with
+   the mediated device based hardware vhost accelerators.
+
 config VHOST
tristate
---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..ad9c0f8c6d8c 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,4 +10,7 @@ vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
+obj-$(CONFIG_VHOST_MDEV) += vhost_mdev.o
+vhost_mdev-y := mdev.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/drivers/vhost/mdev.c b/drivers/vhost/mdev.c
new file mode 100644
index ..5f9cae61018c
--- /dev/null
+++ b/drivers/vhost/mdev.c
@@ -0,0 +1,415 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018-2019 Intel Corporation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vhost.h"
+
+/* Currently, only network backend w/o multiqueue is supported. */
+#define VHOST_MDEV_VQ_MAX  2
+
+struct vhost_mdev {
+   /* The lock is to protect this structure. */
+   struct mutex mutex;
+   struct vhost_dev dev;
+   struct vhost_virtqueue *vqs;
+   int nvqs;
+   u64 status;
+   u64 features;
+   u64 acked_features;
+   bool opened;
+   struct mdev_device *mdev;
+};
+
+static void handle_vq_kick(struct vhost_work *work)
+{
+   struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
+ poll.work);
+   struct vhost_mdev

Re: [PATCH V4 4/6] mdev: introduce virtio device and its device ops

2019-10-18 Thread Tiwei Bie
On Thu, Oct 17, 2019 at 06:48:34PM +0800, Jason Wang wrote:
> + * @get_vq_state:Get the state for a virtqueue
> + *   @mdev: mediated device
> + *   @idx: virtqueue index
> + *   Returns virtqueue state (last_avail_idx)
> + * @get_vq_align:Get the virtqueue align requirement
> + *   for the device
> + *   @mdev: mediated device
> + *   Returns virtqueue algin requirement
> + * @get_features:Get virtio features supported by the device
> + *   @mdev: mediated device
> + *   Returns the virtio features support by the
> + *   device
> + * @get_features:Set virtio features supported by the driver

s/get_features/set_features/


> + *   configration space
> + * @get_mdev_features:   Get the feature of virtio mdev device
> + *   @mdev: mediated device
> + *   Returns the mdev features (API) support by
> + *   the device.
> + * @get_generation:  Get device generaton
> + *   @mdev: mediated device
> + *   Returns u32: device generation
> + */
> +struct virtio_mdev_device_ops {
> + /* Virtqueue ops */
> + int (*set_vq_address)(struct mdev_device *mdev,
> +   u16 idx, u64 desc_area, u64 driver_area,
> +   u64 device_area);
> + void (*set_vq_num)(struct mdev_device *mdev, u16 idx, u32 num);
> + void (*kick_vq)(struct mdev_device *mdev, u16 idx);
> + void (*set_vq_cb)(struct mdev_device *mdev, u16 idx,
> +   struct virtio_mdev_callback *cb);
> + void (*set_vq_ready)(struct mdev_device *mdev, u16 idx, bool ready);
> + bool (*get_vq_ready)(struct mdev_device *mdev, u16 idx);
> + int (*set_vq_state)(struct mdev_device *mdev, u16 idx, u64 state);
> + u64 (*get_vq_state)(struct mdev_device *mdev, u16 idx);
> +
> + /* Device ops */
> + u16 (*get_vq_align)(struct mdev_device *mdev);
> + u64 (*get_features)(struct mdev_device *mdev);
> + int (*set_features)(struct mdev_device *mdev, u64 features);
> + void (*set_config_cb)(struct mdev_device *mdev,
> +   struct virtio_mdev_callback *cb);
> + u16 (*get_vq_num_max)(struct mdev_device *mdev);
> + u32 (*get_device_id)(struct mdev_device *mdev);
> + u32 (*get_vendor_id)(struct mdev_device *mdev);
> + u8 (*get_status)(struct mdev_device *mdev);
> + void (*set_status)(struct mdev_device *mdev, u8 status);
> + void (*get_config)(struct mdev_device *mdev, unsigned int offset,
> +void *buf, unsigned int len);
> + void (*set_config)(struct mdev_device *mdev, unsigned int offset,
> +const void *buf, unsigned int len);
> + u64 (*get_mdev_features)(struct mdev_device *mdev);

Do we need a .set_mdev_features method as well?

It's not very clear what mdev_features means.
Does it mean the vhost backend features?

https://github.com/torvalds/linux/blob/0e2adab6cf285c41e825b6c74a3aa61324d1132c/include/uapi/linux/vhost.h#L93-L94


> + u32 (*get_generation)(struct mdev_device *mdev);
> +};
> +
> +void mdev_set_virtio_ops(struct mdev_device *mdev,
> +  const struct virtio_mdev_device_ops *virtio_ops);
> +
> +#endif
> -- 
> 2.19.1
> 


Re: [PATCH] vhost: introduce mdev based hardware backend

2019-09-27 Thread Tiwei Bie
On Fri, Sep 27, 2019 at 03:14:42PM +0800, Jason Wang wrote:
> On 2019/9/27 12:54 PM, Tiwei Bie wrote:
> > > > +
> > > > +   /*
> > > > +* In vhost-mdev, userspace should pass ring addresses
> > > > +* in guest physical addresses when IOMMU is disabled or
> > > > +* IOVAs when IOMMU is enabled.
> > > > +*/
> > > A question here, consider we're using noiommu mode. If guest physical
> > > address is passed here, how can a device use that?
> > > 
> > > I believe you meant "host physical address" here? And it also have the
> > > implication that the HPA should be continuous (e.g using hugetlbfs).
> > The comment is talking about the virtual IOMMU (i.e. iotlb in vhost).
> > It should be rephrased to cover the noiommu case as well. Thanks for
> > spotting this.
> 
> 
> So the question still, if GPA is passed how can it be used by the
> virtio-mdev device?

Sorry if I didn't make it clear.
Of course, GPAs can't be passed in noiommu mode.


> 
> Thanks
> 

Re: [PATCH] vhost: introduce mdev based hardware backend

2019-09-26 Thread Tiwei Bie
On Fri, Sep 27, 2019 at 11:46:06AM +0800, Jason Wang wrote:
> On 2019/9/26 12:54 PM, Tiwei Bie wrote:
> > +
> > +static long vhost_mdev_start(struct vhost_mdev *m)
> > +{
> > +   struct mdev_device *mdev = m->mdev;
> > +   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
> > +   struct virtio_mdev_callback cb;
> > +   struct vhost_virtqueue *vq;
> > +   int idx;
> > +
> > +   ops->set_features(mdev, m->acked_features);
> > +
> > +   mdev_add_status(mdev, VIRTIO_CONFIG_S_FEATURES_OK);
> > +   if (!(mdev_get_status(mdev) & VIRTIO_CONFIG_S_FEATURES_OK))
> > +   goto reset;
> > +
> > +   for (idx = 0; idx < m->nvqs; idx++) {
> > +   vq = &m->vqs[idx];
> > +
> > +   if (!vq->desc || !vq->avail || !vq->used)
> > +   break;
> > +
> > +   if (ops->set_vq_state(mdev, idx, vq->last_avail_idx))
> > +   goto reset;
> 
> 
> If we do set_vq_state() in SET_VRING_BASE, we won't need this step here.

Yeah, I plan to do it in the next version.
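The plan, roughly (sketch only, not the posted code; s is the
struct vhost_vring_state copied in by the surrounding handler):

	case VHOST_SET_VRING_BASE:
		if (copy_from_user(&s, argp, sizeof(s)))
			return -EFAULT;
		return ops->set_vq_state(mdev, s.index, s.num);

so the start path no longer has to replay last_avail_idx.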

> 
> 
> > +
> > +   /*
> > +* In vhost-mdev, userspace should pass ring addresses
> > +* in guest physical addresses when IOMMU is disabled or
> > +* IOVAs when IOMMU is enabled.
> > +*/
> 
> 
> A question here, consider we're using noiommu mode. If guest physical
> address is passed here, how can a device use that?
> 
> I believe you meant "host physical address" here? And it also have the
> implication that the HPA should be continuous (e.g using hugetlbfs).

The comment is talking about the virtual IOMMU (i.e. iotlb in vhost).
It should be rephrased to cover the noiommu case as well. Thanks for
spotting this.


> > +
> > +   switch (cmd) {
> > +   case VHOST_MDEV_SET_STATE:
> > +   r = vhost_set_state(m, argp);
> > +   break;
> > +   case VHOST_GET_FEATURES:
> > +   r = vhost_get_features(m, argp);
> > +   break;
> > +   case VHOST_SET_FEATURES:
> > +   r = vhost_set_features(m, argp);
> > +   break;
> > +   case VHOST_GET_VRING_BASE:
> > +   r = vhost_get_vring_base(m, argp);
> > +   break;
> 
> 
> Does it mean the SET_VRING_BASE may only take affect after
> VHOST_MEV_SET_STATE?

Yeah, in this version, SET_VRING_BASE won't set the base on the
device directly. But I plan not to delay this anymore in the next
version, to support SET_STATUS.

> 
> 
> > +   default:
> > +   r = vhost_dev_ioctl(&m->dev, cmd, argp);
> > +   if (r == -ENOIOCTLCMD)
> > +   r = vhost_vring_ioctl(&m->dev, cmd, argp);
> > +   }
> > +
> > +   mutex_unlock(&m->mutex);
> > +   return r;
> > +}
> > +
> > +static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
> > +   .name   = "vfio-vhost-mdev",
> > +   .open   = vhost_mdev_open,
> > +   .release= vhost_mdev_release,
> > +   .ioctl  = vhost_mdev_unlocked_ioctl,
> > +};
> > +
> > +static int vhost_mdev_probe(struct device *dev)
> > +{
> > +   struct mdev_device *mdev = mdev_from_dev(dev);
> > +   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
> > +   struct vhost_mdev *m;
> > +   int nvqs, r;
> > +
> > +   m = kzalloc(sizeof(*m), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
> > +   if (!m)
> > +   return -ENOMEM;
> > +
> > +   mutex_init(>mutex);
> > +
> > +   nvqs = ops->get_queue_max(mdev);
> > +   m->nvqs = nvqs;
> 
> 
> The name could be confusing, get_queue_max() is to get the maximum number of
> entries for a virtqueue supported by this device.

OK. It might be better to rename it to something like:

get_vq_num_max()

which is more consistent with the set_vq_num().

> 
> It looks to me that we need another API to query the maximum number of
> virtqueues supported by the device.

Yeah.
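
Just to sketch what I have in mind for the ops (the names below are
tentative, nothing final):

	/* max number of entries supported by a single virtqueue */
	u16 (*get_vq_num_max)(struct mdev_device *mdev);

	/* possible new op: number of virtqueues supported by the device */
	u16 (*get_vq_count)(struct mdev_device *mdev);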

Thanks,
Tiwei


> 
> Thanks
> 
> 
> > +
> > +   m->vqs = kmalloc_array(nvqs, sizeof(struct vhost_virtqueue),
> > +  GFP_KERNEL);
> > +   if (!m->vqs) {
> > +   r = -ENOMEM;
> > +   goto err;
> > +   }
> > +
> > +   r = vfio_add_group_dev(dev, _vhost_mdev_dev_ops, m);
> > +   if (r)
> > +   goto err;
> > +
> > +   m->features = ops->get_features(mdev);
> > +   m->mdev = mdev;
> > +   return 0;
> > +
> > +err:
> 

Re: [PATCH] vhost: introduce mdev based hardware backend

2019-09-26 Thread Tiwei Bie
On Fri, Sep 27, 2019 at 11:51:35AM +0800, Jason Wang wrote:
> On 2019/9/27 上午11:46, Jason Wang wrote:
> > +
> > +static struct mdev_class_id id_table[] = {
> > +    { MDEV_ID_VHOST },
> > +    { 0 },
> > +};
> > +
> > +static struct mdev_driver vhost_mdev_driver = {
> > +    .name    = "vhost_mdev",
> > +    .probe    = vhost_mdev_probe,
> > +    .remove    = vhost_mdev_remove,
> > +    .id_table = id_table,
> > +};
> > +
> 
> 
> And you probably need to add MODULE_DEVICE_TABLE() as well.

Yeah, thanks!
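
For reference, I guess it would simply be something like this (assuming
the mdev class id table support from your series):

static struct mdev_class_id id_table[] = {
	{ MDEV_ID_VHOST },
	{ 0 },
};
MODULE_DEVICE_TABLE(mdev, id_table);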


> 
> Thanks
> 

Re: [PATCH] vhost: introduce mdev based hardware backend

2019-09-26 Thread Tiwei Bie
On Thu, Sep 26, 2019 at 09:26:22AM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 26, 2019 at 09:14:39PM +0800, Tiwei Bie wrote:
> > > 4. Does device need to limit max ring size?
> > > 5. Does device need to limit max number of queues?
> > 
> > I think so. It's helpful to have ioctls to report the max
> > ring size and max number of queues.
> 
> Also, let's not repeat the vhost net mistakes, let's lock
> everything to the order required by the virtio spec,
> checking status bits at each step.
> E.g.:
>   set backend features
>   set features
>   detect and program vqs
>   enable vqs
>   enable driver
> 
> and check status at each step to force the correct order.
> e.g. don't allow enabling vqs after driver ok, etc

Got it. Thanks a lot!
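
Roughly, I understand it as adding checks like the following in the
corresponding paths (just a sketch, untested):

	/* e.g. reject SET_FEATURES once FEATURES_OK has been set: */
	if (mdev_get_status(m->mdev) & VIRTIO_CONFIG_S_FEATURES_OK)
		return -EBUSY;

	/* and reject enabling a vq once DRIVER_OK has been set: */
	if (mdev_get_status(m->mdev) & VIRTIO_CONFIG_S_DRIVER_OK)
		return -EBUSY;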

Regards,
Tiwei

> 
> -- 
> MST


Re: [PATCH] vhost: introduce mdev based hardware backend

2019-09-26 Thread Tiwei Bie
On Thu, Sep 26, 2019 at 04:35:18AM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 26, 2019 at 12:54:27PM +0800, Tiwei Bie wrote:
[...]
> > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > index 40d028eed645..5afbc2f08fa3 100644
> > --- a/include/uapi/linux/vhost.h
> > +++ b/include/uapi/linux/vhost.h
> > @@ -116,4 +116,12 @@
> >  #define VHOST_VSOCK_SET_GUEST_CID  _IOW(VHOST_VIRTIO, 0x60, __u64)
> >  #define VHOST_VSOCK_SET_RUNNING	_IOW(VHOST_VIRTIO, 0x61, int)
> >  
> > +/* VHOST_MDEV specific defines */
> > +
> > +#define VHOST_MDEV_SET_STATE   _IOW(VHOST_VIRTIO, 0x70, __u64)
> > +
> > +#define VHOST_MDEV_S_STOPPED   0
> > +#define VHOST_MDEV_S_RUNNING   1
> > +#define VHOST_MDEV_S_MAX   2
> > +
> >  #endif
> 
> So assuming we have an underlying device that behaves like virtio:

I think they are really good questions/suggestions. Thanks!

> 
> 1. Should we use SET_STATUS maybe?

I like this idea. I will give it a try.

> 2. Do we want a reset ioctl?

I think it is helpful. If we use SET_STATUS, maybe we
can use it to support the reset.
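
E.g. a fragment for the ioctl switch (the ioctl name and number are
hypothetical, assuming a local u8 status):

	case VHOST_SET_STATUS:
		if (copy_from_user(&status, argp, sizeof(status)))
			return -EFAULT;
		if (status == 0)
			mdev_reset(m->mdev);	/* writing 0 acts as reset, per the virtio spec */
		else
			mdev_set_status(m->mdev, status);
		break;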

> 3. Do we want ability to enable rings individually?

I will make it possible at least in the vhost layer.

> 4. Does device need to limit max ring size?
> 5. Does device need to limit max number of queues?

I think so. It's helpful to have ioctls to report the max
ring size and max number of queues.
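
E.g. (hypothetical names and numbers, just to illustrate):

#define VHOST_MDEV_GET_VRING_NUM_MAX	_IOR(VHOST_VIRTIO, 0x71, __u16)
#define VHOST_MDEV_GET_QUEUE_NUM	_IOR(VHOST_VIRTIO, 0x72, __u16)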

Thanks!
Tiwei


> 
> -- 
> MST


[PATCH] vhost: introduce mdev based hardware backend

2019-09-25 Thread Tiwei Bie
This patch introduces a mdev based hardware vhost backend.
This backend is built on top of the same abstraction used
in virtio-mdev and provides a generic vhost interface for
userspace to accelerate the virtio devices in guest.

This backend is implemented as a mdev device driver on top
of the same mdev device ops used in virtio-mdev but using
a different mdev class id, and it will register the device
as a VFIO device for userspace to use. Userspace can setup
the IOMMU with the existing VFIO container/group APIs and
then get the device fd with the device name. After getting
the device fd of this device, userspace can use vhost ioctls
to setup the backend.
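
To illustrate the intended userspace flow (a rough sketch only; error
handling is omitted, and the group number and UUID below are examples):

	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open("/dev/vfio/12", O_RDWR);	/* IOMMU group of the mdev */

	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);
	/* ... VFIO_IOMMU_MAP_DMA calls to program the DMA mappings ... */

	int device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
			   "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001");

	/* vhost ioctls then go directly to the device fd */
	ioctl(device, VHOST_GET_FEATURES, &features);
	ioctl(device, VHOST_SET_FEATURES, &features);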

Signed-off-by: Tiwei Bie 
---
This patch depends on below series:
https://lkml.org/lkml/2019/9/24/357

RFC v4 -> v1:
- Implement vhost-mdev as a mdev device driver directly and
  connect it to VFIO container/group. (Jason);
- Pass ring addresses as GPAs/IOVAs in vhost-mdev to avoid
  meaningless HVA->GPA translations (Jason);

RFC v3 -> RFC v4:
- Build vhost-mdev on top of the same abstraction used by
  virtio-mdev (Jason);
- Introduce vhost fd and pass VFIO fd via SET_BACKEND ioctl (MST);

RFC v2 -> RFC v3:
- Reuse vhost's ioctls instead of inventing a VFIO regions/irqs
  based vhost protocol on top of vfio-mdev (Jason);

RFC v1 -> RFC v2:
- Introduce a new VFIO device type to build a vhost protocol
  on top of vfio-mdev;

 drivers/vhost/Kconfig  |   9 +
 drivers/vhost/Makefile |   3 +
 drivers/vhost/mdev.c   | 381 +
 include/uapi/linux/vhost.h |   8 +
 4 files changed, 401 insertions(+)
 create mode 100644 drivers/vhost/mdev.c

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 3d03ccbd1adc..decf0be8efe9 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -34,6 +34,15 @@ config VHOST_VSOCK
To compile this driver as a module, choose M here: the module will be 
called
vhost_vsock.
 
+config VHOST_MDEV
+   tristate "Vhost driver for Mediated devices"
+   depends on EVENTFD && VFIO && VFIO_MDEV
+   select VHOST
+   default n
+   ---help---
+   Say M here to enable the vhost_mdev module for use with
+   the mediated device based hardware vhost accelerators
+
 config VHOST
tristate
---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..ad9c0f8c6d8c 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,4 +10,7 @@ vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
+obj-$(CONFIG_VHOST_MDEV) += vhost_mdev.o
+vhost_mdev-y := mdev.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/drivers/vhost/mdev.c b/drivers/vhost/mdev.c
new file mode 100644
index ..1c12a25b86a2
--- /dev/null
+++ b/drivers/vhost/mdev.c
@@ -0,0 +1,381 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018-2019 Intel Corporation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vhost.h"
+
+struct vhost_mdev {
+   /* The lock is to protect this structure. */
+   struct mutex mutex;
+   struct vhost_dev dev;
+   struct vhost_virtqueue *vqs;
+   int nvqs;
+   u64 state;
+   u64 features;
+   u64 acked_features;
+   bool opened;
+   struct mdev_device *mdev;
+};
+
+static u8 mdev_get_status(struct mdev_device *mdev)
+{
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
+
+   return ops->get_status(mdev);
+}
+
+static void mdev_set_status(struct mdev_device *mdev, u8 status)
+{
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
+
+   return ops->set_status(mdev, status);
+}
+
+static void mdev_add_status(struct mdev_device *mdev, u8 status)
+{
+   status |= mdev_get_status(mdev);
+   mdev_set_status(mdev, status);
+}
+
+static void mdev_reset(struct mdev_device *mdev)
+{
+   mdev_set_status(mdev, 0);
+}
+
+static void handle_vq_kick(struct vhost_work *work)
+{
+   struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
+ poll.work);
+   struct vhost_mdev *m = container_of(vq->dev, struct vhost_mdev, dev);
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(m->mdev);
+
+   ops->kick_vq(m->mdev, vq - m->vqs);
+}
+
+static irqreturn_t vhost_mdev_virtqueue_cb(void *private)
+{
+   struct vhost_virtqueue *vq = private;
+   struct eventfd_ctx *call_ctx = vq->call_ctx;
+
+   if (call_ctx)
+   eventfd_signal(call_ctx, 1);
+   return IRQ_HANDLED;
+}
+
+static long vhost_mdev_reset(struct vhost_mdev *m)
+{
+   struct mdev_device *mdev = m->mdev;
+
+   mdev_reset(mdev);
+   mdev_add_status(mdev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+   mdev_add_status(mdev, VIRTIO_CONFIG_S_DRIVER);
+   return 0;
+}
+
+static l

Re: [RFC v4 3/3] vhost: introduce mdev based hardware backend

2019-09-19 Thread Tiwei Bie
On Tue, Sep 17, 2019 at 03:26:30PM +0800, Jason Wang wrote:
> On 2019/9/17 上午9:02, Tiwei Bie wrote:
> > diff --git a/drivers/vhost/mdev.c b/drivers/vhost/mdev.c
> > new file mode 100644
> > index ..8c6597aff45e
> > --- /dev/null
> > +++ b/drivers/vhost/mdev.c
> > @@ -0,0 +1,462 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2018-2019 Intel Corporation.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "vhost.h"
> > +
> > +struct vhost_mdev {
> > +   struct mutex mutex;
> > +   struct vhost_dev dev;
> > +   struct vhost_virtqueue *vqs;
> > +   int nvqs;
> > +   u64 state;
> > +   u64 features;
> > +   u64 acked_features;
> > +   struct vfio_group *vfio_group;
> > +   struct vfio_device *vfio_device;
> > +   struct mdev_device *mdev;
> > +};
> > +
> > +/*
> > + * XXX
> > + * We assume virtio_mdev.ko exposes below symbols for now, as we
> > + * don't have a proper way to access parent ops directly yet.
> > + *
> > + * virtio_mdev_readl()
> > + * virtio_mdev_writel()
> > + */
> > +extern u32 virtio_mdev_readl(struct mdev_device *mdev, loff_t off);
> > +extern void virtio_mdev_writel(struct mdev_device *mdev, loff_t off, u32 
> > val);
> 
> 
> Need to consider a better approach, I feel we should do it through some kind
> of mdev driver instead of talk to mdev device directly.

Yeah, a better approach is really needed here.
Besides, we may want a way to allow accessing the mdev
device_ops proposed in below series outside the
drivers/vfio/mdev/ directory.

https://lkml.org/lkml/2019/9/12/151

I.e. allow putting mdev drivers outside above directory.
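
E.g. if something like a mdev_get_dev_ops() accessor were exported, an
mdev driver living outside that directory could simply do:

	const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);

	ops->set_features(mdev, features);
	ops->kick_vq(mdev, qid);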


> > +
> > +   for (queue_id = 0; queue_id < m->nvqs; queue_id++) {
> > +   vq = >vqs[queue_id];
> > +
> > +   if (!vq->desc || !vq->avail || !vq->used)
> > +   break;
> > +
> > +   virtio_mdev_writel(mdev, VIRTIO_MDEV_QUEUE_NUM, vq->num);
> > +
> > +   if (!vhost_translate_ring_addr(vq, (u64)vq->desc,
> > +  vhost_get_desc_size(vq, vq->num),
> > +  ))
> > +   return -EINVAL;
> 
> 
> Interesting, any reason for doing such kinds of translation to HVA? I
> believe the addr should already be an IOVA that has been mapped by VFIO.

Currently, in the software based vhost-kernel and vhost-user
backends, QEMU will pass ring addresses as HVA in SET_VRING_ADDR
ioctl when iotlb isn't enabled. If it's OK to let QEMU pass GPA
in vhost-mdev in this case, then this translation won't be needed.
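
For context, these are the addresses SET_VRING_ADDR carries today (a
sketch; the desc/avail/used values come from QEMU and are interpreted as
HVA for vhost-kernel/vhost-user without iotlb, and would be GPA/IOVA in
the vhost-mdev case):

	struct vhost_vring_addr addr = {
		.index           = idx,
		.desc_user_addr  = desc_addr,
		.avail_user_addr = avail_addr,
		.used_user_addr  = used_addr,
	};
	ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr);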

Thanks,
Tiwei

Re: [RFC v4 0/3] vhost: introduce mdev based hardware backend

2019-09-19 Thread Tiwei Bie
On Fri, Sep 20, 2019 at 09:30:58AM +0800, Jason Wang wrote:
> On 2019/9/19 下午11:45, Tiwei Bie wrote:
> > On Thu, Sep 19, 2019 at 09:08:11PM +0800, Jason Wang wrote:
> > > On 2019/9/18 下午10:32, Michael S. Tsirkin wrote:
> > > > > > > So I have some questions:
> > > > > > > 
> > > > > > > 1) Compared to method 2, what's the advantage of creating a new 
> > > > > > > vhost char
> > > > > > > device? I guess it's for keep the API compatibility?
> > > > > > One benefit is that we can avoid doing vhost ioctls on
> > > > > > VFIO device fd.
> > > > > Yes, but any benefit from doing this?
> > > > It does seem a bit more modular, but it's certainly not a big deal.
> > > Ok, if we go this way, it could be as simple as provide some callback to
> > > vhost, then vhost can just forward the ioctl through parent_ops.
> > > 
> > > > > > > 2) For method 2, is there any easy way for user/admin to 
> > > > > > > distinguish e.g
> > > > > > > ordinary vfio-mdev for vhost from ordinary vfio-mdev?
> > > > > > I think device-api could be a choice.
> > > > > Ok.
> > > > > 
> > > > > 
> > > > > > > I saw you introduce
> > > > > > > ops matching helper but it's not friendly to management.
> > > > > > The ops matching helper is just to check whether a given
> > > > > > vfio-device is based on a mdev device.
> > > > > > 
> > > > > > > 3) A drawback of 1) and 2) is that it must follow vfio_device_ops 
> > > > > > > that
> > > > > > > assumes the parameter comes from userspace, it prevents support 
> > > > > > > kernel
> > > > > > > virtio drivers.
> > > > > > > 
> > > > > > > 4) So comes the idea of method 3, since it register a new 
> > > > > > > vhost-mdev driver,
> > > > > > > we can use device specific ops instead of VFIO ones, then we can 
> > > > > > > have a
> > > > > > > common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
> > > > > > As the above draft shows, this requires introducing a new
> > > > > > VFIO device driver. I think Alex's opinion matters here.
> > > Just to clarify, a new type of mdev driver but provides dummy
> > > vfio_device_ops for VFIO to make container DMA ioctl work.
> > I see. Thanks! IIUC, you mean we can provide a very tiny
> > VFIO device driver in drivers/vhost/mdev.c, e.g.:
> > 
> > static int vfio_vhost_mdev_open(void *device_data)
> > {
> > if (!try_module_get(THIS_MODULE))
> > return -ENODEV;
> > return 0;
> > }
> > 
> > static void vfio_vhost_mdev_release(void *device_data)
> > {
> > module_put(THIS_MODULE);
> > }
> > 
> > static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
> > .name   = "vfio-vhost-mdev",
> > .open   = vfio_vhost_mdev_open,
> > .release= vfio_vhost_mdev_release,
> > };
> > 
> > static int vhost_mdev_probe(struct device *dev)
> > {
> > struct mdev_device *mdev = to_mdev_device(dev);
> > 
> > ... Check the mdev device_id proposed in ...
> > ... https://lkml.org/lkml/2019/9/12/151 ...
> 
> 
> To clarify, this should be done through the id_table fields in
> vhost_mdev_driver, and it should claim it supports virtio-mdev device only:
> 
> 
> static struct mdev_class_id id_table[] = {
>     { MDEV_ID_VIRTIO },
>     { 0 },
> };
> 
> 
> static struct mdev_driver vhost_mdev_driver = {
>     ...
>     .id_table = id_table,
> }

In this way, both of virtio-mdev and vhost-mdev will try to
take this device. We may want a way to let vhost-mdev take this
device only when users explicitly ask it to do it. Or maybe we
can have a different MDEV_ID for vhost-mdev but share the device
ops with virtio-mdev.

> 
> 
> > 
> > return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
> 
> 
> And in vfio_vhost_mdev_ops, all it needs is to just implement vhost-net
> ioctls and translate them to the virtio-mdev transport (e.g. the device_ops
> I proposed, ioctls, or whatever other method) API.

I see, so my previous understanding is basically correct:

https://lkml.org/lkml/2019/9/17/332

I.e. we won't have a separate vhost fd and we will do all 

Re: [RFC v4 0/3] vhost: introduce mdev based hardware backend

2019-09-19 Thread Tiwei Bie
On Thu, Sep 19, 2019 at 09:08:11PM +0800, Jason Wang wrote:
> On 2019/9/18 下午10:32, Michael S. Tsirkin wrote:
> > > > > So I have some questions:
> > > > > 
> > > > > 1) Compared to method 2, what's the advantage of creating a new vhost 
> > > > > char
> > > > > device? I guess it's for keep the API compatibility?
> > > > One benefit is that we can avoid doing vhost ioctls on
> > > > VFIO device fd.
> > > Yes, but any benefit from doing this?
> > It does seem a bit more modular, but it's certainly not a big deal.
> 
> Ok, if we go this way, it could be as simple as provide some callback to
> vhost, then vhost can just forward the ioctl through parent_ops.
> 
> > 
> > > > > 2) For method 2, is there any easy way for user/admin to distinguish 
> > > > > e.g
> > > > > ordinary vfio-mdev for vhost from ordinary vfio-mdev?
> > > > I think device-api could be a choice.
> > > Ok.
> > > 
> > > 
> > > > > I saw you introduce
> > > > > ops matching helper but it's not friendly to management.
> > > > The ops matching helper is just to check whether a given
> > > > vfio-device is based on a mdev device.
> > > > 
> > > > > 3) A drawback of 1) and 2) is that it must follow vfio_device_ops that
> > > > > assumes the parameter comes from userspace, it prevents support kernel
> > > > > virtio drivers.
> > > > > 
> > > > > 4) So comes the idea of method 3, since it register a new vhost-mdev 
> > > > > driver,
> > > > > we can use device specific ops instead of VFIO ones, then we can have 
> > > > > a
> > > > > common API between vDPA parent and vhost-mdev/virtio-mdev drivers.
> > > > As the above draft shows, this requires introducing a new
> > > > VFIO device driver. I think Alex's opinion matters here.
> 
> Just to clarify, a new type of mdev driver but provides dummy
> vfio_device_ops for VFIO to make container DMA ioctl work.

I see. Thanks! IIUC, you mean we can provide a very tiny
VFIO device driver in drivers/vhost/mdev.c, e.g.:

static int vfio_vhost_mdev_open(void *device_data)
{
if (!try_module_get(THIS_MODULE))
return -ENODEV;
return 0;
}

static void vfio_vhost_mdev_release(void *device_data)
{
module_put(THIS_MODULE);
}

static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
.name   = "vfio-vhost-mdev",
.open   = vfio_vhost_mdev_open,
.release= vfio_vhost_mdev_release,
};

static int vhost_mdev_probe(struct device *dev)
{
struct mdev_device *mdev = to_mdev_device(dev);

... Check the mdev device_id proposed in ...
... https://lkml.org/lkml/2019/9/12/151 ...

return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
}

static void vhost_mdev_remove(struct device *dev)
{
vfio_del_group_dev(dev);
}

static struct mdev_driver vhost_mdev_driver = {
.name   = "vhost_mdev",
.probe  = vhost_mdev_probe,
.remove = vhost_mdev_remove,
};

So we can bind above mdev driver to the virtio-mdev compatible
mdev devices when we want to use vhost-mdev.

After binding above driver to the mdev device, we can setup IOMMU
via VFIO and get VFIO device fd of this mdev device, and pass it
to vhost fd (/dev/vhost-mdev) with a SET_BACKEND ioctl.
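
So after the usual VFIO container/group setup, the vhost side would
roughly look like this (sketch only; the backend struct layout below is a
placeholder, nothing that has been posted):

	int vhost = open("/dev/vhost-mdev", O_RDWR);

	struct vhost_mdev_backend backend = {
		.group_fd  = group_fd,
		.device_fd = device_fd,
	};
	ioctl(vhost, VHOST_MDEV_SET_BACKEND, &backend);

	/* the remaining vhost ioctls (features, vrings, ...) go to the vhost fd */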

Thanks,
Tiwei

> 
> Thanks
> 
> 
> > > Yes, it is.
> > > 
> > > Thanks
> > > 
> > > 

Re: [RFC v4 0/3] vhost: introduce mdev based hardware backend

2019-09-17 Thread Tiwei Bie
On Tue, Sep 17, 2019 at 11:32:03AM +0800, Jason Wang wrote:
> On 2019/9/17 上午9:02, Tiwei Bie wrote:
> > This RFC is to demonstrate below ideas,
> > 
> > a) Build vhost-mdev on top of the same abstraction defined in
> > the virtio-mdev series [1];
> > 
> > b) Introduce /dev/vhost-mdev to do vhost ioctls and support
> > setting mdev device as backend;
> > 
> > Now the userspace API looks like this:
> > 
> > - Userspace generates a compatible mdev device;
> > 
> > - Userspace opens this mdev device with VFIO API (including
> >doing IOMMU programming for this mdev device with VFIO's
> >container/group based interface);
> > 
> > - Userspace opens /dev/vhost-mdev and gets vhost fd;
> > 
> > - Userspace uses vhost ioctls to setup vhost (userspace should
> >do VHOST_MDEV_SET_BACKEND ioctl with VFIO group fd and device
> >fd first before doing other vhost ioctls);
> > 
> > Only compile test has been done for this series for now.
> 
> 
> Have a hard thought on the architecture:

Thanks a lot! Do appreciate it!

> 
> 1) Create a vhost char device and pass vfio mdev device fd to it as a
> backend and translate vhost-mdev ioctl to virtio mdev transport (e.g
> read/write). DMA was done through the VFIO DMA mapping on the container that
> is attached.

Yeah, that's what we are doing in this series.
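
For reference, the DMA programming part is just the standard VFIO type1
interface, e.g. (buf/iova/size/container_fd come from the caller):

	struct vfio_iommu_type1_dma_map dma_map = {
		.argsz = sizeof(dma_map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (__u64)(uintptr_t)buf,	/* HVA backing the region */
		.iova  = iova,			/* GPA, or IOVA when a vIOMMU is used */
		.size  = size,
	};
	ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);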

> 
> We have two more choices:
> 
> 2) Use vfio-mdev but do not create vhost-mdev device, instead, just
> implement vhost ioctl on vfio_device_ops, and translate them into
> virtio-mdev transport or just pass ioctl to parent.

Yeah. Instead of introducing /dev/vhost-mdev char device, do
vhost ioctls on VFIO device fd directly. That's what we did
in RFC v3.

> 
> 3) Don't use vfio-mdev, create a new vhost-mdev driver, during probe still
> try to add dev to vfio group and talk to parent with device specific ops

If my understanding is correct, this means we need to introduce
a new VFIO device driver to replace the existing vfio-mdev driver
in our case. Below is a quick draft just to show my understanding:

#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include "mdev_private.h"

/* XXX: we need a proper way to include below vhost header. */
#include "../../vhost/vhost.h"

static int vfio_vhost_mdev_open(void *device_data)
{
if (!try_module_get(THIS_MODULE))
return -ENODEV;

/* ... */
vhost_dev_init(...);

return 0;
}

static void vfio_vhost_mdev_release(void *device_data)
{
/* ... */
module_put(THIS_MODULE);
}

static long vfio_vhost_mdev_unlocked_ioctl(void *device_data,
   unsigned int cmd, unsigned long arg)
{
struct mdev_device *mdev = device_data;
struct mdev_parent *parent = mdev->parent;

/*
 * Use vhost ioctls.
 *
 * We will have a different parent_ops design.
 * And potentially, we can share the same parent_ops
 * with virtio_mdev.
 */
switch (cmd) {
case VHOST_GET_FEATURES:
parent->ops->get_features(mdev, ...);
break;
/* ... */
}

return 0;
}

static ssize_t vfio_vhost_mdev_read(void *device_data, char __user *buf,
size_t count, loff_t *ppos)
{
/* ... */
return 0;
}

static ssize_t vfio_vhost_mdev_write(void *device_data, const char __user *buf,
 size_t count, loff_t *ppos)
{
/* ... */
return 0;
}

static int vfio_vhost_mdev_mmap(void *device_data, struct vm_area_struct *vma)
{
/* ... */
return 0;
}

static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
.name   = "vfio-vhost-mdev",
.open   = vfio_vhost_mdev_open,
.release= vfio_vhost_mdev_release,
.ioctl  = vfio_vhost_mdev_unlocked_ioctl,
.read   = vfio_vhost_mdev_read,
.write  = vfio_vhost_mdev_write,
.mmap   = vfio_vhost_mdev_mmap,
};

static int vfio_vhost_mdev_probe(struct device *dev)
{
struct mdev_device *mdev = to_mdev_device(dev);

/* ... */
return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
}

static void vfio_vhost_mdev_remove(struct device *dev)
{
/* ... */
vfio_del_group_dev(dev);
}

static struct mdev_driver vfio_vhost_mdev_driver = {
.name   = "vfio_vhost_mdev",
.probe  = vfio_vhost_mdev_probe,
.remove = vfio_vhost_mdev_remove,
};

static int __init vfio_vhost_mdev_init(void)
{
return mdev_register_driver(&vfio_vhost_mdev_driver, THIS_MODULE);
}
module_init(vfio_vhost_mdev_init)

static void __exit vfio_vhost

[RFC v4 3/3] vhost: introduce mdev based hardware backend

2019-09-16 Thread Tiwei Bie
More details about this patch can be found from the cover
letter for now. Only compile test has been done for now.

Signed-off-by: Tiwei Bie 
---
 drivers/vhost/Kconfig|   9 +
 drivers/vhost/Makefile   |   3 +
 drivers/vhost/mdev.c | 462 +++
 drivers/vhost/vhost.c|  39 ++-
 drivers/vhost/vhost.h|   6 +
 include/uapi/linux/vhost.h   |  10 +
 include/uapi/linux/vhost_types.h |   5 +
 7 files changed, 528 insertions(+), 6 deletions(-)
 create mode 100644 drivers/vhost/mdev.c

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 3d03ccbd1adc..ef9783156d2e 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -34,6 +34,15 @@ config VHOST_VSOCK
To compile this driver as a module, choose M here: the module will be 
called
vhost_vsock.
 
+config VHOST_MDEV
+   tristate "Mediated device based hardware vhost accelerator"
+   depends on EVENTFD && VFIO && VFIO_MDEV
+   select VHOST
+   default n
+   ---help---
+   Say Y here to enable the vhost_mdev module
+   for use with hardware vhost accelerators
+
 config VHOST
tristate
---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..ad9c0f8c6d8c 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,4 +10,7 @@ vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
+obj-$(CONFIG_VHOST_MDEV) += vhost_mdev.o
+vhost_mdev-y := mdev.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/drivers/vhost/mdev.c b/drivers/vhost/mdev.c
new file mode 100644
index ..8c6597aff45e
--- /dev/null
+++ b/drivers/vhost/mdev.c
@@ -0,0 +1,462 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018-2019 Intel Corporation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vhost.h"
+
+struct vhost_mdev {
+   struct mutex mutex;
+   struct vhost_dev dev;
+   struct vhost_virtqueue *vqs;
+   int nvqs;
+   u64 state;
+   u64 features;
+   u64 acked_features;
+   struct vfio_group *vfio_group;
+   struct vfio_device *vfio_device;
+   struct mdev_device *mdev;
+};
+
+/*
+ * XXX
+ * We assume virtio_mdev.ko exposes below symbols for now, as we
+ * don't have a proper way to access parent ops directly yet.
+ *
+ * virtio_mdev_readl()
+ * virtio_mdev_writel()
+ */
+extern u32 virtio_mdev_readl(struct mdev_device *mdev, loff_t off);
+extern void virtio_mdev_writel(struct mdev_device *mdev, loff_t off, u32 val);
+
+static u8 mdev_get_status(struct mdev_device *mdev)
+{
+   return virtio_mdev_readl(mdev, VIRTIO_MDEV_STATUS);
+}
+
+static void mdev_set_status(struct mdev_device *mdev, u8 status)
+{
+   virtio_mdev_writel(mdev, VIRTIO_MDEV_STATUS, status);
+}
+
+static void mdev_add_status(struct mdev_device *mdev, u8 status)
+{
+   status |= mdev_get_status(mdev);
+   mdev_set_status(mdev, status);
+}
+
+static void mdev_reset(struct mdev_device *mdev)
+{
+   mdev_set_status(mdev, 0);
+}
+
+static void handle_vq_kick(struct vhost_work *work)
+{
+   struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
+ poll.work);
+   struct vhost_mdev *m = container_of(vq->dev, struct vhost_mdev, dev);
+
+   virtio_mdev_writel(m->mdev, VIRTIO_MDEV_QUEUE_NOTIFY, vq - m->vqs);
+}
+
+static long vhost_mdev_start_backend(struct vhost_mdev *m)
+{
+   struct mdev_device *mdev = m->mdev;
+   u64 features = m->acked_features;
+   u64 addr;
+   struct vhost_virtqueue *vq;
+   int queue_id;
+
+   features |= 1ULL << VIRTIO_F_IOMMU_PLATFORM;
+
+   virtio_mdev_writel(mdev, VIRTIO_MDEV_DRIVER_FEATURES_SEL, 1);
+   virtio_mdev_writel(mdev, VIRTIO_MDEV_DRIVER_FEATURES,
+  (u32)(features >> 32));
+
+   virtio_mdev_writel(mdev, VIRTIO_MDEV_DRIVER_FEATURES_SEL, 0);
+   virtio_mdev_writel(mdev, VIRTIO_MDEV_DRIVER_FEATURES,
+  (u32)features);
+
+   mdev_add_status(mdev, VIRTIO_CONFIG_S_FEATURES_OK);
+   if (!(mdev_get_status(mdev) & VIRTIO_CONFIG_S_FEATURES_OK))
+   return -ENODEV;
+
+   for (queue_id = 0; queue_id < m->nvqs; queue_id++) {
+   vq = &m->vqs[queue_id];
+
+   if (!vq->desc || !vq->avail || !vq->used)
+   break;
+
+   virtio_mdev_writel(mdev, VIRTIO_MDEV_QUEUE_NUM, vq->num);
+
+   if (!vhost_translate_ring_addr(vq, (u64)vq->desc,
+  vhost_get_desc_size(vq, vq->num),
+  &addr))
+   return -EINVAL;
+
+   virtio_mdev_writel(mdev, VIRTIO_MDEV_QUEUE_DESC_LOW, addr);
+   virtio

[RFC v4 1/3] vfio: support getting vfio device from device fd

2019-09-16 Thread Tiwei Bie
This patch introduces the support for getting VFIO device
from VFIO device fd. With this support, it's possible for
vhost to get VFIO device from the group fd and device fd
set by the userspace.

Signed-off-by: Tiwei Bie 
---
 drivers/vfio/vfio.c  | 25 +
 include/linux/vfio.h |  4 
 2 files changed, 29 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 388597930b64..697fd079bb3f 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -890,6 +890,31 @@ static struct vfio_device 
*vfio_device_get_from_name(struct vfio_group *group,
return device;
 }
 
+struct vfio_device *vfio_device_get_from_fd(struct vfio_group *group,
+   int device_fd)
+{
+   struct fd f;
+   struct vfio_device *it, *device = ERR_PTR(-ENODEV);
+
+   f = fdget(device_fd);
+   if (!f.file)
+   return ERR_PTR(-EBADF);
+
+   mutex_lock(&group->device_lock);
+   list_for_each_entry(it, &group->device_list, group_next) {
+   if (it == f.file->private_data) {
+   device = it;
+   vfio_device_get(device);
+   break;
+   }
+   }
+   mutex_unlock(&group->device_lock);
+
+   fdput(f);
+   return device;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_from_fd);
+
 /*
  * Caller must hold a reference to the vfio_device
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e42a711a2800..e75b24fd7c5c 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -15,6 +15,8 @@
 #include 
 #include 
 
+struct vfio_group;
+
 /**
  * struct vfio_device_ops - VFIO bus driver device callbacks
  *
@@ -50,6 +52,8 @@ extern int vfio_add_group_dev(struct device *dev,
 
 extern void *vfio_del_group_dev(struct device *dev);
 extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
+extern struct vfio_device *vfio_device_get_from_fd(struct vfio_group *group,
+  int device_fd);
 extern void vfio_device_put(struct vfio_device *device);
 extern void *vfio_device_data(struct vfio_device *device);
 
-- 
2.17.1



[RFC v4 2/3] vfio: support checking vfio driver by device ops

2019-09-16 Thread Tiwei Bie
This patch introduces the support for checking the VFIO driver
by device ops. And vfio-mdev's device ops is also exported to
make it possible to check whether a VFIO device is based on a
mdev device.

Signed-off-by: Tiwei Bie 
---
 drivers/vfio/mdev/vfio_mdev.c | 3 ++-
 drivers/vfio/vfio.c   | 7 +++
 include/linux/vfio.h  | 7 +++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
index 30964a4e0a28..e0f31c5a5db2 100644
--- a/drivers/vfio/mdev/vfio_mdev.c
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -98,7 +98,7 @@ static int vfio_mdev_mmap(void *device_data, struct 
vm_area_struct *vma)
return parent->ops->mmap(mdev, vma);
 }
 
-static const struct vfio_device_ops vfio_mdev_dev_ops = {
+const struct vfio_device_ops vfio_mdev_dev_ops = {
.name   = "vfio-mdev",
.open   = vfio_mdev_open,
.release= vfio_mdev_release,
@@ -107,6 +107,7 @@ static const struct vfio_device_ops vfio_mdev_dev_ops = {
.write  = vfio_mdev_write,
.mmap   = vfio_mdev_mmap,
 };
+EXPORT_SYMBOL_GPL(vfio_mdev_dev_ops);
 
 static int vfio_mdev_probe(struct device *dev)
 {
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 697fd079bb3f..1145110909e4 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1806,6 +1806,13 @@ long vfio_external_check_extension(struct vfio_group 
*group, unsigned long arg)
 }
 EXPORT_SYMBOL_GPL(vfio_external_check_extension);
 
+bool vfio_device_ops_match(struct vfio_device *device,
+  const struct vfio_device_ops *ops)
+{
+   return device->ops == ops;
+}
+EXPORT_SYMBOL_GPL(vfio_device_ops_match);
+
 /**
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e75b24fd7c5c..741c5bb567a8 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -56,6 +56,8 @@ extern struct vfio_device *vfio_device_get_from_fd(struct 
vfio_group *group,
   int device_fd);
 extern void vfio_device_put(struct vfio_device *device);
 extern void *vfio_device_data(struct vfio_device *device);
+extern bool vfio_device_ops_match(struct vfio_device *device,
+ const struct vfio_device_ops *ops);
 
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
@@ -199,4 +201,9 @@ extern int vfio_virqfd_enable(void *opaque,
  void *data, struct virqfd **pvirqfd, int fd);
 extern void vfio_virqfd_disable(struct virqfd **pvirqfd);
 
+/*
+ * VFIO device ops
+ */
+extern const struct vfio_device_ops vfio_mdev_dev_ops;
+
 #endif /* VFIO_H */
-- 
2.17.1



[RFC v4 0/3] vhost: introduce mdev based hardware backend

2019-09-16 Thread Tiwei Bie
This RFC is to demonstrate below ideas,

a) Build vhost-mdev on top of the same abstraction defined in
   the virtio-mdev series [1];

b) Introduce /dev/vhost-mdev to do vhost ioctls and support
   setting mdev device as backend;

Now the userspace API looks like this:

- Userspace generates a compatible mdev device;

- Userspace opens this mdev device with VFIO API (including
  doing IOMMU programming for this mdev device with VFIO's
  container/group based interface);

- Userspace opens /dev/vhost-mdev and gets vhost fd;

- Userspace uses vhost ioctls to setup vhost (userspace should
  do VHOST_MDEV_SET_BACKEND ioctl with VFIO group fd and device
  fd first before doing other vhost ioctls);

Only compile test has been done for this series for now.

RFCv3: https://patchwork.kernel.org/patch/7785/

[1] https://lkml.org/lkml/2019/9/10/135

Tiwei Bie (3):
  vfio: support getting vfio device from device fd
  vfio: support checking vfio driver by device ops
  vhost: introduce mdev based hardware backend

 drivers/vfio/mdev/vfio_mdev.c|   3 +-
 drivers/vfio/vfio.c  |  32 +++
 drivers/vhost/Kconfig|   9 +
 drivers/vhost/Makefile   |   3 +
 drivers/vhost/mdev.c | 462 +++
 drivers/vhost/vhost.c|  39 ++-
 drivers/vhost/vhost.h|   6 +
 include/linux/vfio.h |  11 +
 include/uapi/linux/vhost.h   |  10 +
 include/uapi/linux/vhost_types.h |   5 +
 10 files changed, 573 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vhost/mdev.c

-- 
2.17.1



Re: [RFC PATCH 3/4] virtio: introudce a mdev based transport

2019-09-10 Thread Tiwei Bie
On Wed, Sep 11, 2019 at 10:52:03AM +0800, Jason Wang wrote:
> On 2019/9/11 上午9:47, Tiwei Bie wrote:
> > On Tue, Sep 10, 2019 at 04:19:34PM +0800, Jason Wang wrote:
> > > This patch introduces a new mdev transport for virtio. This is used to
> > > use kernel virtio driver to drive the mediated device that is capable
> > > of populating virtqueue directly.
> > > 
> > > A new virtio-mdev driver will be registered to the mdev bus, when a
> > > new virtio-mdev device is probed, it will register the device with
> > > mdev based config ops. This means, unlike the exist hardware
> > > transport, this is a software transport between mdev driver and mdev
> > > device. The transport was implemented through:
> > > 
> > > - configuration access was implemented through parent_ops->read()/write()
> > > - vq/config callback was implemented through parent_ops->ioctl()
> > > 
> > > This transport is derived from the virtio MMIO protocol and was written
> > > for the kernel driver. But for the transport itself, the design goal is to
> > > be generic enough to support userspace driver (this part will be added
> > > in the future).
> > > 
> > > Note:
> > > - current mdev assume all the parameter of parent_ops was from
> > >userspace. This prevents us from implementing the kernel mdev
> > >driver. For a quick POC, this patch just abuse those parameter and
> > >assume the mdev device implementation will treat them as kernel
> > >pointer. This should be addressed in the formal series by extending
> > >mdev_parent_ops.
> > > - for a quick POC, I just derive the transport from MMIO, I'm pretty sure
> > >there's a lot of optimization space for this.
> > > 
> > > Signed-off-by: Jason Wang 
> > > ---
> > >   drivers/vfio/mdev/Kconfig|   7 +
> > >   drivers/vfio/mdev/Makefile   |   1 +
> > >   drivers/vfio/mdev/virtio_mdev.c  | 500 +++
> > >   include/uapi/linux/virtio_mdev.h | 131 
> > >   4 files changed, 639 insertions(+)
> > >   create mode 100644 drivers/vfio/mdev/virtio_mdev.c
> > >   create mode 100644 include/uapi/linux/virtio_mdev.h
> > > 
> > [...]
> > > diff --git a/include/uapi/linux/virtio_mdev.h 
> > > b/include/uapi/linux/virtio_mdev.h
> > > new file mode 100644
> > > index ..8040de6b960a
> > > --- /dev/null
> > > +++ b/include/uapi/linux/virtio_mdev.h
> > > @@ -0,0 +1,131 @@
> > > +/*
> > > + * Virtio mediated device driver
> > > + *
> > > + * Copyright 2019, Red Hat Corp.
> > > + *
> > > + * Based on Virtio MMIO driver by ARM Ltd, copyright ARM Ltd. 2011
> > > + *
> > > + * This header is BSD licensed so anyone can use the definitions to 
> > > implement
> > > + * compatible drivers/servers.
> > > + *
> > > + * Redistribution and use in source and binary forms, with or without
> > > + * modification, are permitted provided that the following conditions
> > > + * are met:
> > > + * 1. Redistributions of source code must retain the above copyright
> > > + *notice, this list of conditions and the following disclaimer.
> > > + * 2. Redistributions in binary form must reproduce the above copyright
> > > + *notice, this list of conditions and the following disclaimer in the
> > > + *documentation and/or other materials provided with the 
> > > distribution.
> > > + * 3. Neither the name of IBM nor the names of its contributors
> > > + *may be used to endorse or promote products derived from this 
> > > software
> > > + *without specific prior written permission.
> > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 
> > > ``AS IS'' AND
> > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 
> > > PURPOSE
> > > + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 
> > > CONSEQUENTIAL
> > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE 
> > > GOODS
> > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 
> > > STRICT
> > > + * LIABILITY, OR TORT (IN

Re: [RFC PATCH 3/4] virtio: introudce a mdev based transport

2019-09-10 Thread Tiwei Bie
On Tue, Sep 10, 2019 at 04:19:34PM +0800, Jason Wang wrote:
> This patch introduces a new mdev transport for virtio. This is used to
> use kernel virtio driver to drive the mediated device that is capable
> of populating virtqueue directly.
> 
> A new virtio-mdev driver will be registered to the mdev bus, when a
> new virtio-mdev device is probed, it will register the device with
> mdev based config ops. This means, unlike the exist hardware
> transport, this is a software transport between mdev driver and mdev
> device. The transport was implemented through:
> 
> - configuration access was implemented through parent_ops->read()/write()
> - vq/config callback was implemented through parent_ops->ioctl()
> 
> This transport is derived from the virtio MMIO protocol and was written for
> the kernel driver. But for the transport itself, the design goal is to
> be generic enough to support userspace driver (this part will be added
> in the future).
> 
> Note:
> - current mdev assume all the parameter of parent_ops was from
>   userspace. This prevents us from implementing the kernel mdev
>   driver. For a quick POC, this patch just abuse those parameter and
>   assume the mdev device implementation will treat them as kernel
>   pointer. This should be addressed in the formal series by extending
>   mdev_parent_ops.
> - for a quick POC, I just derive the transport from MMIO, I'm pretty sure
>   there's a lot of optimization space for this.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vfio/mdev/Kconfig|   7 +
>  drivers/vfio/mdev/Makefile   |   1 +
>  drivers/vfio/mdev/virtio_mdev.c  | 500 +++
>  include/uapi/linux/virtio_mdev.h | 131 
>  4 files changed, 639 insertions(+)
>  create mode 100644 drivers/vfio/mdev/virtio_mdev.c
>  create mode 100644 include/uapi/linux/virtio_mdev.h
> 
[...]
> diff --git a/include/uapi/linux/virtio_mdev.h 
> b/include/uapi/linux/virtio_mdev.h
> new file mode 100644
> index ..8040de6b960a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_mdev.h
> @@ -0,0 +1,131 @@
> +/*
> + * Virtio mediated device driver
> + *
> + * Copyright 2019, Red Hat Corp.
> + *
> + * Based on Virtio MMIO driver by ARM Ltd, copyright ARM Ltd. 2011
> + *
> + * This header is BSD licensed so anyone can use the definitions to implement
> + * compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *may be used to endorse or promote products derived from this software
> + *without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS 
> IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#ifndef _LINUX_VIRTIO_MDEV_H
> +#define _LINUX_VIRTIO_MDEV_H
> +
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * Ioctls
> + */
> +
> +struct virtio_mdev_callback {
> + irqreturn_t (*callback)(void *);
> + void *private;
> +};
> +
> +#define VIRTIO_MDEV 0xAF
> +#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
> +  struct virtio_mdev_callback)
> +#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
> + struct virtio_mdev_callback)
> +
> +#define VIRTIO_MDEV_DEVICE_API_STRING	"virtio-mdev"
> +
> +/*
> + * Control registers
> + */
> +
> +/* Magic value ("virt" string) - Read Only */
> +#define VIRTIO_MDEV_MAGIC_VALUE  0x000
> +
> +/* Virtio device version - Read Only */
> +#define VIRTIO_MDEV_VERSION  0x004
> +
> +/* Virtio device ID - Read Only */
> +#define VIRTIO_MDEV_DEVICE_ID	0x008
> +
> +/* Virtio vendor ID - Read Only */
> +#define VIRTIO_MDEV_VENDOR_ID	0x00c
> +
> +/* Bitmask of the features 

Re: [RFC v3] vhost: introduce mdev based hardware vhost backend

2019-09-03 Thread Tiwei Bie
On Tue, Sep 03, 2019 at 07:26:03AM -0400, Michael S. Tsirkin wrote:
> On Wed, Aug 28, 2019 at 01:37:12PM +0800, Tiwei Bie wrote:
> > Details about this can be found here:
> > 
> > https://lwn.net/Articles/750770/
> > 
> > What's new in this version
> > ==
> > 
> > There are three choices based on the discussion [1] in RFC v2:
> > 
> > > #1. We expose a VFIO device, so we can reuse the VFIO container/group
> > > based DMA API and potentially reuse a lot of VFIO code in QEMU.
> > >
> > > But in this case, we have two choices for the VFIO device interface
> > > (i.e. the interface on top of VFIO device fd):
> > >
> > > A) we may invent a new vhost protocol (as demonstrated by the code
> > >in this RFC) on VFIO device fd to make it work in VFIO's way,
> > >i.e. regions and irqs.
> > >
> > > B) Or as you proposed, instead of inventing a new vhost protocol,
> > >we can reuse most existing vhost ioctls on the VFIO device fd
> > >directly. There should be no conflicts between the VFIO ioctls
> > >(type is 0x3B) and VHOST ioctls (type is 0xAF) currently.
> > >
> > > #2. Instead of exposing a VFIO device, we may expose a VHOST device.
> > > And we will introduce a new mdev driver vhost-mdev to do this.
> > > It would be natural to reuse the existing kernel vhost interface
> > > (ioctls) on it as much as possible. But we will need to invent
> > > some APIs for DMA programming (reusing VHOST_SET_MEM_TABLE is a
> > > choice, but it's too heavy and doesn't support vIOMMU by itself).
> > 
> > This version is more like a quick PoC to try Jason's proposal on
> > reusing vhost ioctls. And the second way (#1/B) in above three
> > choices was chosen in this version to demonstrate the idea quickly.
> > 
> > Now the userspace API looks like this:
> > 
> > - VFIO's container/group based IOMMU API is used to do the
> >   DMA programming.
> > 
> > - Vhost's existing ioctls are used to setup the device.
> > 
> > And the device will report device_api as "vfio-vhost".
> > 
> > Note that, there are dirty hacks in this version. If we decide to
> > go this way, some refactoring in vhost.c/vhost.h may be needed.
> > 
> > PS. The direct mapping of the notify registers isn't implemented
> > in this version.
> > 
> > [1] https://lkml.org/lkml/2019/7/9/101
> > 
> > Signed-off-by: Tiwei Bie 
> 
> 
> 
> > +long vhost_mdev_ioctl(struct mdev_device *mdev, unsigned int cmd,
> > + unsigned long arg)
> > +{
> > +   void __user *argp = (void __user *)arg;
> > +   struct vhost_mdev *vdpa;
> > +   unsigned long minsz;
> > +   int ret = 0;
> > +
> > +   if (!mdev)
> > +   return -EINVAL;
> > +
> > +   vdpa = mdev_get_drvdata(mdev);
> > +   if (!vdpa)
> > +   return -ENODEV;
> > +
> > +   switch (cmd) {
> > +   case VFIO_DEVICE_GET_INFO:
> > +   {
> > +   struct vfio_device_info info;
> > +
> > +   minsz = offsetofend(struct vfio_device_info, num_irqs);
> > +
> > +   if (copy_from_user(&info, (void __user *)arg, minsz)) {
> > +   ret = -EFAULT;
> > +   break;
> > +   }
> > +
> > +   if (info.argsz < minsz) {
> > +   ret = -EINVAL;
> > +   break;
> > +   }
> > +
> > +   info.flags = VFIO_DEVICE_FLAGS_VHOST;
> > +   info.num_regions = 0;
> > +   info.num_irqs = 0;
> > +
> > +   if (copy_to_user((void __user *)arg, &info, minsz)) {
> > +   ret = -EFAULT;
> > +   break;
> > +   }
> > +
> > +   break;
> > +   }
> > +   case VFIO_DEVICE_GET_REGION_INFO:
> > +   case VFIO_DEVICE_GET_IRQ_INFO:
> > +   case VFIO_DEVICE_SET_IRQS:
> > +   case VFIO_DEVICE_RESET:
> > +   ret = -EINVAL;
> > +   break;
> > +
> > +   case VHOST_MDEV_SET_STATE:
> > +   ret = vhost_set_state(vdpa, argp);
> > +   break;
> > +   case VHOST_GET_FEATURES:
> > +   ret = vhost_get_features(vdpa, argp);
> > +   break;
> > +   case VHOST_SET_FEATURES:
> > +   ret = vhost_set_features(vdpa, argp);
> > +   break;
> > +   case VHOST_GET_VRING_BASE:
> > +

Re: [RFC v3] vhost: introduce mdev based hardware vhost backend

2019-09-02 Thread Tiwei Bie
On Mon, Sep 02, 2019 at 12:15:05PM +0800, Jason Wang wrote:
> On 2019/8/28 下午1:37, Tiwei Bie wrote:
> > Details about this can be found here:
> > 
> > https://lwn.net/Articles/750770/
> > 
> > What's new in this version
> > ==
> > 
> > There are three choices based on the discussion [1] in RFC v2:
> > 
> > > #1. We expose a VFIO device, so we can reuse the VFIO container/group
> > >  based DMA API and potentially reuse a lot of VFIO code in QEMU.
> > > 
> > >  But in this case, we have two choices for the VFIO device interface
> > >  (i.e. the interface on top of VFIO device fd):
> > > 
> > >  A) we may invent a new vhost protocol (as demonstrated by the code
> > > in this RFC) on VFIO device fd to make it work in VFIO's way,
> > > i.e. regions and irqs.
> > > 
> > >  B) Or as you proposed, instead of inventing a new vhost protocol,
> > > we can reuse most existing vhost ioctls on the VFIO device fd
> > > directly. There should be no conflicts between the VFIO ioctls
> > > (type is 0x3B) and VHOST ioctls (type is 0xAF) currently.
> > > 
> > > #2. Instead of exposing a VFIO device, we may expose a VHOST device.
> > >  And we will introduce a new mdev driver vhost-mdev to do this.
> > >  It would be natural to reuse the existing kernel vhost interface
> > >  (ioctls) on it as much as possible. But we will need to invent
> > >  some APIs for DMA programming (reusing VHOST_SET_MEM_TABLE is a
> > >  choice, but it's too heavy and doesn't support vIOMMU by itself).
> > This version is more like a quick PoC to try Jason's proposal on
> > reusing vhost ioctls. And the second way (#1/B) in above three
> > choices was chosen in this version to demonstrate the idea quickly.
> > 
> > Now the userspace API looks like this:
> > 
> > - VFIO's container/group based IOMMU API is used to do the
> >DMA programming.
> > 
> > - Vhost's existing ioctls are used to setup the device.
> > 
> > And the device will report device_api as "vfio-vhost".
> > 
> > Note that, there are dirty hacks in this version. If we decide to
> > go this way, some refactoring in vhost.c/vhost.h may be needed.
> > 
> > PS. The direct mapping of the notify registers isn't implemented
> >  in this version.
> > 
> > [1] https://lkml.org/lkml/2019/7/9/101
> 
> 
> Thanks for the patch, see comments inline.
> 
> 
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> >   drivers/vhost/Kconfig  |   9 +
> >   drivers/vhost/Makefile |   3 +
> >   drivers/vhost/mdev.c   | 382 +
> >   include/linux/vhost_mdev.h |  58 ++
> >   include/uapi/linux/vfio.h  |   2 +
> >   include/uapi/linux/vhost.h |   8 +
> >   6 files changed, 462 insertions(+)
> >   create mode 100644 drivers/vhost/mdev.c
> >   create mode 100644 include/linux/vhost_mdev.h
[...]
> > +
> > +   break;
> > +   }
> > +   case VFIO_DEVICE_GET_REGION_INFO:
> > +   case VFIO_DEVICE_GET_IRQ_INFO:
> > +   case VFIO_DEVICE_SET_IRQS:
> > +   case VFIO_DEVICE_RESET:
> > +   ret = -EINVAL;
> > +   break;
> > +
> > +   case VHOST_MDEV_SET_STATE:
> > +   ret = vhost_set_state(vdpa, argp);
> > +   break;
> 
> 
> So this is used to start or stop the device. This means if userspace want to
> drive a network device, the API is not 100% compatible. Any blocker for
> this? E.g for SET_BACKEND, we can pass a fd and then identify the type of
> backend.

This is a legacy from the previous RFC code. I didn't try to
get rid of it while getting this POC to work. I can try to make
the vhost ioctls fully compatible with the existing userspace
if possible.

> 
> Another question is, how can user know the type of a device?

Maybe we can introduce an attribute in $UUID/ to tell the type.
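
E.g. a per-device attribute hooked up through mdev_parent_ops->mdev_attr_groups,
roughly like this (sketch only; the helper that returns the virtio device id
is a placeholder):

static ssize_t virtio_id_show(struct device *dev,
			      struct device_attribute *attr, char *buf)
{
	struct mdev_device *mdev = mdev_from_dev(dev);

	/* virtio device id, e.g. 1 for net, as defined by the virtio spec */
	return sprintf(buf, "%u\n", parent_get_device_id(mdev));
}
static DEVICE_ATTR_RO(virtio_id);

static struct attribute *vhost_mdev_dev_attrs[] = {
	&dev_attr_virtio_id.attr,
	NULL,
};
static const struct attribute_group vhost_mdev_dev_group = {
	.attrs = vhost_mdev_dev_attrs,
};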

> 
> 
> > +   case VHOST_GET_FEATURES:
> > +   ret = vhost_get_features(vdpa, argp);
> > +   break;
> > +   case VHOST_SET_FEATURES:
> > +   ret = vhost_set_features(vdpa, argp);
> > +   break;
> > +   case VHOST_GET_VRING_BASE:
> > +   ret = vhost_get_vring_base(vdpa, argp);
> > +   break;
> > +   default:
> > +   ret = vhost_dev_ioctl(&vdpa->dev, cmd, argp);
> > +   if (ret == -ENOIOCTLCMD)
> > +   ret = vhost_vring_ioctl(&vdpa->dev, cmd, 

[RFC v3] vhost: introduce mdev based hardware vhost backend

2019-08-27 Thread Tiwei Bie
Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==

There are three choices based on the discussion [1] in RFC v2:

> #1. We expose a VFIO device, so we can reuse the VFIO container/group
> based DMA API and potentially reuse a lot of VFIO code in QEMU.
>
> But in this case, we have two choices for the VFIO device interface
> (i.e. the interface on top of VFIO device fd):
>
> A) we may invent a new vhost protocol (as demonstrated by the code
>in this RFC) on VFIO device fd to make it work in VFIO's way,
>i.e. regions and irqs.
>
> B) Or as you proposed, instead of inventing a new vhost protocol,
>we can reuse most existing vhost ioctls on the VFIO device fd
>directly. There should be no conflicts between the VFIO ioctls
>(type is 0x3B) and VHOST ioctls (type is 0xAF) currently.
>
> #2. Instead of exposing a VFIO device, we may expose a VHOST device.
> And we will introduce a new mdev driver vhost-mdev to do this.
> It would be natural to reuse the existing kernel vhost interface
> (ioctls) on it as much as possible. But we will need to invent
> some APIs for DMA programming (reusing VHOST_SET_MEM_TABLE is a
> choice, but it's too heavy and doesn't support vIOMMU by itself).

This version is more like a quick PoC to try Jason's proposal on
reusing vhost ioctls. And the second way (#1/B) in above three
choices was chosen in this version to demonstrate the idea quickly.

Now the userspace API looks like this:

- VFIO's container/group based IOMMU API is used to do the
  DMA programming.

- Vhost's existing ioctls are used to setup the device.

And the device will report device_api as "vfio-vhost".

Note that, there are dirty hacks in this version. If we decide to
go this way, some refactoring in vhost.c/vhost.h may be needed.

PS. The direct mapping of the notify registers isn't implemented
in this version.

[1] https://lkml.org/lkml/2019/7/9/101

Signed-off-by: Tiwei Bie 
---
 drivers/vhost/Kconfig  |   9 +
 drivers/vhost/Makefile |   3 +
 drivers/vhost/mdev.c   | 382 +
 include/linux/vhost_mdev.h |  58 ++
 include/uapi/linux/vfio.h  |   2 +
 include/uapi/linux/vhost.h |   8 +
 6 files changed, 462 insertions(+)
 create mode 100644 drivers/vhost/mdev.c
 create mode 100644 include/linux/vhost_mdev.h

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 3d03ccbd1adc..2ba54fcf43b7 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -34,6 +34,15 @@ config VHOST_VSOCK
To compile this driver as a module, choose M here: the module will be 
called
vhost_vsock.
 
+config VHOST_MDEV
+   tristate "Hardware vhost accelerator abstraction"
+   depends on EVENTFD && VFIO && VFIO_MDEV
+   select VHOST
+   default n
+   ---help---
+   Say Y here to enable the vhost_mdev module
+   for use with hardware vhost accelerators
+
 config VHOST
tristate
---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..ad9c0f8c6d8c 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,4 +10,7 @@ vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
+obj-$(CONFIG_VHOST_MDEV) += vhost_mdev.o
+vhost_mdev-y := mdev.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/drivers/vhost/mdev.c b/drivers/vhost/mdev.c
new file mode 100644
index ..6bef1d9ae2e6
--- /dev/null
+++ b/drivers/vhost/mdev.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018-2019 Intel Corporation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vhost.h"
+
+struct vhost_mdev {
+   struct vhost_dev dev;
+   bool opened;
+   int nvqs;
+   u64 state;
+   u64 acked_features;
+   u64 features;
+   const struct vhost_mdev_device_ops *ops;
+   struct mdev_device *mdev;
+   void *private;
+   struct vhost_virtqueue vqs[];
+};
+
+static void handle_vq_kick(struct vhost_work *work)
+{
+   struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
+ poll.work);
+   struct vhost_mdev *vdpa = container_of(vq->dev, struct vhost_mdev, dev);
+
+   vdpa->ops->notify(vdpa, vq - vdpa->vqs);
+}
+
+static int vhost_set_state(struct vhost_mdev *vdpa, u64 __user *statep)
+{
+   u64 state;
+
+   if (copy_from_user(&state, statep, sizeof(state)))
+   return -EFAULT;
+
+   if (state >= VHOST_MDEV_S_MAX)
+   return -EINVAL;
+
+   if (vdpa->state == state)
+   return 0;
+
+   mutex_lock(&vdpa->dev.mutex);
+
+   vdpa->state = state;
+
+   switch (vdpa->stat

[PATCH 2/2] vhost/test: fix build for vhost test

2019-08-27 Thread Tiwei Bie
Since vhost_exceeds_weight() was introduced, callers need to specify
the packet weight and byte weight in vhost_dev_init(). Note that, the
packet weight isn't counted in this patch to keep the original behavior
unchanged.

Fixes: e82b9b0727ff ("vhost: introduce vhost_exceeds_weight()")
Cc: stable@vger.kernel.org
Signed-off-by: Tiwei Bie 
---
 drivers/vhost/test.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index ac4f762c4f65..7804869c6a31 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -22,6 +22,12 @@
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_TEST_WEIGHT 0x8
 
+/* Max number of packets transferred before requeueing the job.
+ * Using this limit prevents one virtqueue from starving others with
+ * pkts.
+ */
+#define VHOST_TEST_PKT_WEIGHT 256
+
 enum {
VHOST_TEST_VQ = 0,
VHOST_TEST_VQ_MAX = 1,
@@ -80,10 +86,8 @@ static void handle_vq(struct vhost_test *n)
}
vhost_add_used_and_signal(&n->dev, vq, head, 0);
total_len += len;
-   if (unlikely(total_len >= VHOST_TEST_WEIGHT)) {
-   vhost_poll_queue(&vq->poll);
+   if (unlikely(vhost_exceeds_weight(vq, 0, total_len)))
break;
-   }
}
 
mutex_unlock(&vq->mutex);
@@ -115,7 +119,8 @@ static int vhost_test_open(struct inode *inode, struct file 
*f)
dev = &n->dev;
vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV);
+   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
+  VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT);
 
f->private_data = n;
 
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/2] vhost/test: fix build for vhost test

2019-08-27 Thread Tiwei Bie
Since the commit below, callers need to specify the iov_limit in
vhost_dev_init() explicitly.

Fixes: b46a0bf78ad7 ("vhost: fix OOB in get_rx_bufs()")
Cc: sta...@vger.kernel.org
Signed-off-by: Tiwei Bie 
---
 drivers/vhost/test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 9e90e969af55..ac4f762c4f65 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -115,7 +115,7 @@ static int vhost_test_open(struct inode *inode, struct file 
*f)
dev = &n->dev;
vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX);
+   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV);
 
f->private_data = n;
 
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-10 Thread Tiwei Bie
On Wed, Jul 10, 2019 at 10:26:10AM +0800, Jason Wang wrote:
> On 2019/7/9 下午2:33, Tiwei Bie wrote:
> > On Tue, Jul 09, 2019 at 10:50:38AM +0800, Jason Wang wrote:
> > > On 2019/7/8 下午2:16, Tiwei Bie wrote:
> > > > On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
> > > > > On Thu, 4 Jul 2019 14:21:34 +0800
> > > > > Tiwei Bie  wrote:
> > > > > > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > > > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > > > > > Details about this can be found here:
> > > > > > > > > > > > 
> > > > > > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > > > > > 
> > > > > > > > > > > > What's new in this version
> > > > > > > > > > > > ==
> > > > > > > > > > > > 
> > > > > > > > > > > > A new VFIO device type is introduced - vfio-vhost. This 
> > > > > > > > > > > > addressed
> > > > > > > > > > > > some comments from 
> > > > > > > > > > > > here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > > > > > 
> > > > > > > > > > > > Below is the updated device interface:
> > > > > > > > > > > > 
> > > > > > > > > > > > Currently, there are two regions of this device: 1) 
> > > > > > > > > > > > CONFIG_REGION
> > > > > > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to 
> > > > > > > > > > > > setup the
> > > > > > > > > > > > device; 2) NOTIFY_REGION 
> > > > > > > > > > > > (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > > > > > > > can be used to notify the device.
> > > > > > > > > > > > 
> > > > > > > > > > > > 1. CONFIG_REGION
> > > > > > > > > > > > 
> > > > > > > > > > > > The region described by CONFIG_REGION is the main 
> > > > > > > > > > > > control interface.
> > > > > > > > > > > > Messages will be written to or read from this region.
> > > > > > > > > > > > 
> > > > > > > > > > > > The message type is determined by the `request` field 
> > > > > > > > > > > > in message
> > > > > > > > > > > > header. The message size is encoded in the message 
> > > > > > > > > > > > header too.
> > > > > > > > > > > > The message format looks like this:
> > > > > > > > > > > > 
> > > > > > > > > > > > struct vhost_vfio_op {
> > > > > > > > > > > > __u64 request;
> > > > > > > > > > > > __u32 flags;
> > > > > > > > > > > > /* Flag values: */
> > > > > > > > > > > >   #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need 
> > > > > > > > > > > > reply */
> > > > > > > > > > > > __u32 size;
> > > > > > > > > > > > union {
> > > > > > > > > > > > __u64 u64;
> > > > > > > > > > > > struct vhost_vring_state state;
> > > > > > > > > > > > struct vhost_vring_addr addr;
> > > > > > > > > > > > } payload;
> > > > > > > > > > > > };
> > > > &

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-09 Thread Tiwei Bie
On Tue, Jul 09, 2019 at 10:50:38AM +0800, Jason Wang wrote:
> On 2019/7/8 下午2:16, Tiwei Bie wrote:
> > On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
> > > On Thu, 4 Jul 2019 14:21:34 +0800
> > > Tiwei Bie  wrote:
> > > > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > > > Details about this can be found here:
> > > > > > > > > > 
> > > > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > > > 
> > > > > > > > > > What's new in this version
> > > > > > > > > > ==
> > > > > > > > > > 
> > > > > > > > > > A new VFIO device type is introduced - vfio-vhost. This 
> > > > > > > > > > addressed
> > > > > > > > > > some comments from 
> > > > > > > > > > here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > > > 
> > > > > > > > > > Below is the updated device interface:
> > > > > > > > > > 
> > > > > > > > > > Currently, there are two regions of this device: 1) 
> > > > > > > > > > CONFIG_REGION
> > > > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to 
> > > > > > > > > > setup the
> > > > > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), 
> > > > > > > > > > which
> > > > > > > > > > can be used to notify the device.
> > > > > > > > > > 
> > > > > > > > > > 1. CONFIG_REGION
> > > > > > > > > > 
> > > > > > > > > > The region described by CONFIG_REGION is the main control 
> > > > > > > > > > interface.
> > > > > > > > > > Messages will be written to or read from this region.
> > > > > > > > > > 
> > > > > > > > > > The message type is determined by the `request` field in 
> > > > > > > > > > message
> > > > > > > > > > header. The message size is encoded in the message header 
> > > > > > > > > > too.
> > > > > > > > > > The message format looks like this:
> > > > > > > > > > 
> > > > > > > > > > struct vhost_vfio_op {
> > > > > > > > > > __u64 request;
> > > > > > > > > > __u32 flags;
> > > > > > > > > > /* Flag values: */
> > > > > > > > > >  #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need 
> > > > > > > > > > reply */
> > > > > > > > > > __u32 size;
> > > > > > > > > > union {
> > > > > > > > > > __u64 u64;
> > > > > > > > > > struct vhost_vring_state state;
> > > > > > > > > > struct vhost_vring_addr addr;
> > > > > > > > > > } payload;
> > > > > > > > > > };
> > > > > > > > > > 
> > > > > > > > > > The existing vhost-kernel ioctl cmds are reused as the 
> > > > > > > > > > message
> > > > > > > > > > requests in above structure.
> > > > > > > > > Still a comments like V1. What's the advantage of inventing a 
> > > > > > > > > new protocol?
> > > > > > > > I'm trying to make it work in VFIO's way..
> > > > > > > > > I believe either of the following should be better:
> > > > > > > > > 
> > > > > > > > > - using vhost ioctl, 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-08 Thread Tiwei Bie
On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
> On Thu, 4 Jul 2019 14:21:34 +0800
> Tiwei Bie  wrote:
> > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午9:08, Tiwei Bie wrote:  
> > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:  
> > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:  
> > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:  
> > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:  
> > > > > > > > Details about this can be found here:
> > > > > > > > 
> > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > 
> > > > > > > > What's new in this version
> > > > > > > > ==
> > > > > > > > 
> > > > > > > > A new VFIO device type is introduced - vfio-vhost. This 
> > > > > > > > addressed
> > > > > > > > some comments from 
> > > > > > > > here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > 
> > > > > > > > Below is the updated device interface:
> > > > > > > > 
> > > > > > > > Currently, there are two regions of this device: 1) 
> > > > > > > > CONFIG_REGION
> > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > > > can be used to notify the device.
> > > > > > > > 
> > > > > > > > 1. CONFIG_REGION
> > > > > > > > 
> > > > > > > > The region described by CONFIG_REGION is the main control 
> > > > > > > > interface.
> > > > > > > > Messages will be written to or read from this region.
> > > > > > > > 
> > > > > > > > The message type is determined by the `request` field in message
> > > > > > > > header. The message size is encoded in the message header too.
> > > > > > > > The message format looks like this:
> > > > > > > > 
> > > > > > > > struct vhost_vfio_op {
> > > > > > > > __u64 request;
> > > > > > > > __u32 flags;
> > > > > > > > /* Flag values: */
> > > > > > > > #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > > > __u32 size;
> > > > > > > > union {
> > > > > > > > __u64 u64;
> > > > > > > > struct vhost_vring_state state;
> > > > > > > > struct vhost_vring_addr addr;
> > > > > > > > } payload;
> > > > > > > > };
> > > > > > > > 
> > > > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > > > requests in above structure.  
> > > > > > > Still a comments like V1. What's the advantage of inventing a new 
> > > > > > > protocol?  
> > > > > > I'm trying to make it work in VFIO's way..
> > > > > >   
> > > > > > > I believe either of the following should be better:
> > > > > > > 
> > > > > > > - using vhost ioctl,  we can start from 
> > > > > > > SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > extend it with e.g notify region. The advantages is that all 
> > > > > > > exist userspace
> > > > > > > program could be reused without modification (or minimal 
> > > > > > > modification). And
> > > > > > > vhost API hides lots of details that is not necessary to be 
> > > > > > > understood by
> > > > > > > application (e.g in the case of container).  
> > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > using the existing vfio_mdev) for mdev device?  
> > > > > Can we simply add them i

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-04 Thread Tiwei Bie
On Fri, Jul 05, 2019 at 08:30:00AM +0800, Jason Wang wrote:
> On 2019/7/4 下午3:02, Tiwei Bie wrote:
> > On Thu, Jul 04, 2019 at 02:35:20PM +0800, Jason Wang wrote:
> > > On 2019/7/4 下午2:21, Tiwei Bie wrote:
> > > > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > > > Details about this can be found here:
> > > > > > > > > > 
> > > > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > > > 
> > > > > > > > > > What's new in this version
> > > > > > > > > > ==
> > > > > > > > > > 
> > > > > > > > > > A new VFIO device type is introduced - vfio-vhost. This 
> > > > > > > > > > addressed
> > > > > > > > > > some comments from 
> > > > > > > > > > here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > > > 
> > > > > > > > > > Below is the updated device interface:
> > > > > > > > > > 
> > > > > > > > > > Currently, there are two regions of this device: 1) 
> > > > > > > > > > CONFIG_REGION
> > > > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to 
> > > > > > > > > > setup the
> > > > > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), 
> > > > > > > > > > which
> > > > > > > > > > can be used to notify the device.
> > > > > > > > > > 
> > > > > > > > > > 1. CONFIG_REGION
> > > > > > > > > > 
> > > > > > > > > > The region described by CONFIG_REGION is the main control 
> > > > > > > > > > interface.
> > > > > > > > > > Messages will be written to or read from this region.
> > > > > > > > > > 
> > > > > > > > > > The message type is determined by the `request` field in 
> > > > > > > > > > message
> > > > > > > > > > header. The message size is encoded in the message header 
> > > > > > > > > > too.
> > > > > > > > > > The message format looks like this:
> > > > > > > > > > 
> > > > > > > > > > struct vhost_vfio_op {
> > > > > > > > > > __u64 request;
> > > > > > > > > > __u32 flags;
> > > > > > > > > > /* Flag values: */
> > > > > > > > > >   #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need 
> > > > > > > > > > reply */
> > > > > > > > > > __u32 size;
> > > > > > > > > > union {
> > > > > > > > > > __u64 u64;
> > > > > > > > > > struct vhost_vring_state state;
> > > > > > > > > > struct vhost_vring_addr addr;
> > > > > > > > > > } payload;
> > > > > > > > > > };
> > > > > > > > > > 
> > > > > > > > > > The existing vhost-kernel ioctl cmds are reused as the 
> > > > > > > > > > message
> > > > > > > > > > requests in above structure.
> > > > > > > > > Still a comments like V1. What's the advantage of inventing a 
> > > > > > > > > new protocol?
> > > > > > > > I'm trying to make it work in VFIO's way..
> > > > > > > > 
> > > > > > > > > I believe either of the following should be better:
> > > > > > > > > 
> > > > > > > > > - using 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-04 Thread Tiwei Bie
On Thu, Jul 04, 2019 at 02:35:20PM +0800, Jason Wang wrote:
> On 2019/7/4 下午2:21, Tiwei Bie wrote:
> > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > Details about this can be found here:
> > > > > > > > 
> > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > 
> > > > > > > > What's new in this version
> > > > > > > > ==
> > > > > > > > 
> > > > > > > > A new VFIO device type is introduced - vfio-vhost. This 
> > > > > > > > addressed
> > > > > > > > some comments from 
> > > > > > > > here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > 
> > > > > > > > Below is the updated device interface:
> > > > > > > > 
> > > > > > > > Currently, there are two regions of this device: 1) 
> > > > > > > > CONFIG_REGION
> > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > > > can be used to notify the device.
> > > > > > > > 
> > > > > > > > 1. CONFIG_REGION
> > > > > > > > 
> > > > > > > > The region described by CONFIG_REGION is the main control 
> > > > > > > > interface.
> > > > > > > > Messages will be written to or read from this region.
> > > > > > > > 
> > > > > > > > The message type is determined by the `request` field in message
> > > > > > > > header. The message size is encoded in the message header too.
> > > > > > > > The message format looks like this:
> > > > > > > > 
> > > > > > > > struct vhost_vfio_op {
> > > > > > > > __u64 request;
> > > > > > > > __u32 flags;
> > > > > > > > /* Flag values: */
> > > > > > > >  #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > > > __u32 size;
> > > > > > > > union {
> > > > > > > > __u64 u64;
> > > > > > > > struct vhost_vring_state state;
> > > > > > > > struct vhost_vring_addr addr;
> > > > > > > > } payload;
> > > > > > > > };
> > > > > > > > 
> > > > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > > > requests in above structure.
> > > > > > > Still a comments like V1. What's the advantage of inventing a new 
> > > > > > > protocol?
> > > > > > I'm trying to make it work in VFIO's way..
> > > > > > 
> > > > > > > I believe either of the following should be better:
> > > > > > > 
> > > > > > > - using vhost ioctl,  we can start from 
> > > > > > > SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > extend it with e.g notify region. The advantages is that all 
> > > > > > > exist userspace
> > > > > > > program could be reused without modification (or minimal 
> > > > > > > modification). And
> > > > > > > vhost API hides lots of details that is not necessary to be 
> > > > > > > understood by
> > > > > > > application (e.g in the case of container).
> > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > using the existing vfio_mdev) for mdev device?
> > > > > Can we simply add them into ioctl of mdev_parent_ops?
> > > &

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-04 Thread Tiwei Bie
On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > Details about this can be found here:
> > > > > > 
> > > > > > https://lwn.net/Articles/750770/
> > > > > > 
> > > > > > What's new in this version
> > > > > > ==
> > > > > > 
> > > > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > > > some comments from here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > 
> > > > > > Below is the updated device interface:
> > > > > > 
> > > > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > can be used to notify the device.
> > > > > > 
> > > > > > 1. CONFIG_REGION
> > > > > > 
> > > > > > The region described by CONFIG_REGION is the main control interface.
> > > > > > Messages will be written to or read from this region.
> > > > > > 
> > > > > > The message type is determined by the `request` field in message
> > > > > > header. The message size is encoded in the message header too.
> > > > > > The message format looks like this:
> > > > > > 
> > > > > > struct vhost_vfio_op {
> > > > > > __u64 request;
> > > > > > __u32 flags;
> > > > > > /* Flag values: */
> > > > > > #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > __u32 size;
> > > > > > union {
> > > > > > __u64 u64;
> > > > > > struct vhost_vring_state state;
> > > > > > struct vhost_vring_addr addr;
> > > > > > } payload;
> > > > > > };
> > > > > > 
> > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > requests in above structure.
> > > > > Still a comments like V1. What's the advantage of inventing a new 
> > > > > protocol?
> > > > I'm trying to make it work in VFIO's way..
> > > > 
> > > > > I believe either of the following should be better:
> > > > > 
> > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL 
> > > > > and
> > > > > extend it with e.g notify region. The advantages is that all exist 
> > > > > userspace
> > > > > program could be reused without modification (or minimal 
> > > > > modification). And
> > > > > vhost API hides lots of details that is not necessary to be 
> > > > > understood by
> > > > > application (e.g in the case of container).
> > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > using the existing vfio_mdev) for mdev device?
> > > Can we simply add them into ioctl of mdev_parent_ops?
> > Right, either way, these ioctls have to be and just need to be
> > added in the ioctl of the mdev_parent_ops. But another thing we
> > also need to consider is that which file descriptor the userspace
> > will do the ioctl() on. So I'm wondering do you mean let the
> > userspace do the ioctl() on the VFIO device fd of the mdev
> > device?
> > 
> 
> Yes.

Got it! I'm not sure what's Alex opinion on this. If we all
agree with this, I can do it in this way.

> Is there any other way btw?

Just a quick thought.. Maybe totally a bad idea. I was thinking
whether it would be odd to do non-VFIO's ioctls on VFIO's device
fd. So I was wondering whether it's possible to allow binding
another mdev driver (e.g. vhost_mdev) to the supported mdev
devices. The new mdev driver, vhost_mdev, can provide similar
ways to let userspace open the mdev device and do the vhost ioctls
on it. To distinguish with the vfio_mdev compatible mdev devices,
the device API of the new vhost_mdev compatible mdev devices
might be e.g. "vhost-net" for net?

So in VFIO case, the device will be for passthru directly. And
in VHOST case, the device can be used to accelerate the existing
virtualized devices.
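
A rough sketch of the userspace flow I have in mind (the
"/dev/vhost-mdev-0" node name and the surrounding setup are made
up; only the vhost ioctls themselves are the existing ones from
<linux/vhost.h>):

#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Illustrative only -- cleanup on the error paths is omitted. */
static int setup_vhost_mdev(uint64_t wanted, int kickfd, int callfd)
{
	uint64_t features;
	int fd = open("/dev/vhost-mdev-0", O_RDWR);

	if (fd < 0)
		return fd;

	/* Feature negotiation reuses the vhost-kernel ioctls as-is. */
	if (ioctl(fd, VHOST_GET_FEATURES, &features))
		return -1;
	features &= wanted;
	if (ioctl(fd, VHOST_SET_FEATURES, &features))
		return -1;

	/* Kick/call eventfds, same as with vhost-net today. */
	struct vhost_vring_file kick = { .index = 0, .fd = kickfd };
	struct vhost_vring_file call = { .index = 0, .fd = callfd };

	if (ioctl(fd, VHOST_SET_VRING_KICK, &kick) ||
	    ioctl(fd, VHOST_SET_VRING_CALL, &call))
		return -1;

	return fd;
}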

How do you think?

Thanks,
Tiwei
> 
> Thanks
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
On Wed, Jul 03, 2019 at 12:31:57PM -0600, Alex Williamson wrote:
> On Wed,  3 Jul 2019 17:13:39 +0800
> Tiwei Bie  wrote:
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 8f10748dac79..6c5718ab7eeb 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -201,6 +201,7 @@ struct vfio_device_info {
> >  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)   /* vfio-amba device */
> >  #define VFIO_DEVICE_FLAGS_CCW  (1 << 4)/* vfio-ccw device */
> >  #define VFIO_DEVICE_FLAGS_AP   (1 << 5)/* vfio-ap device */
> > +#define VFIO_DEVICE_FLAGS_VHOST(1 << 6)/* vfio-vhost device */
> > __u32   num_regions;/* Max region index + 1 */
> > __u32   num_irqs;   /* Max IRQ index + 1 */
> >  };
> > @@ -217,6 +218,7 @@ struct vfio_device_info {
> >  #define VFIO_DEVICE_API_AMBA_STRING"vfio-amba"
> >  #define VFIO_DEVICE_API_CCW_STRING "vfio-ccw"
> >  #define VFIO_DEVICE_API_AP_STRING  "vfio-ap"
> > +#define VFIO_DEVICE_API_VHOST_STRING   "vfio-vhost"
> >  
> >  /**
> >   * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
> > @@ -573,6 +575,23 @@ enum {
> > VFIO_CCW_NUM_IRQS
> >  };
> >  
> > +/*
> > + * The vfio-vhost bus driver makes use of the following fixed region and
> > + * IRQ index mapping. Unimplemented regions return a size of zero.
> > + * Unimplemented IRQ types return a count of zero.
> > + */
> > +
> > +enum {
> > +   VFIO_VHOST_CONFIG_REGION_INDEX,
> > +   VFIO_VHOST_NOTIFY_REGION_INDEX,
> > +   VFIO_VHOST_NUM_REGIONS
> > +};
> > +
> > +enum {
> > +   VFIO_VHOST_VQ_IRQ_INDEX,
> > +   VFIO_VHOST_NUM_IRQS
> > +};
> > +
> 
> Note that the vfio API has evolved a bit since vfio-pci started this
> way, with fixed indexes for pre-defined region types.  We now support
> device specific regions which can be identified by a capability within
> the REGION_INFO ioctl return data.  This allows a bit more flexibility,
> at the cost of complexity, but the infrastructure already exists in
> kernel and QEMU to make it relatively easy.  I think we'll have the
> same support for interrupts soon too.  If you continue to pursue the
> vfio-vhost direction you might want to consider these before committing
> to fixed indexes.  Thanks,

Thanks for the details! Will give it a try!

Thanks,
Tiwei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > Details about this can be found here:
> > > > 
> > > > https://lwn.net/Articles/750770/
> > > > 
> > > > What's new in this version
> > > > ==
> > > > 
> > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > > > 
> > > > Below is the updated device interface:
> > > > 
> > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > can be used to notify the device.
> > > > 
> > > > 1. CONFIG_REGION
> > > > 
> > > > The region described by CONFIG_REGION is the main control interface.
> > > > Messages will be written to or read from this region.
> > > > 
> > > > The message type is determined by the `request` field in message
> > > > header. The message size is encoded in the message header too.
> > > > The message format looks like this:
> > > > 
> > > > struct vhost_vfio_op {
> > > > __u64 request;
> > > > __u32 flags;
> > > > /* Flag values: */
> > > >#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > __u32 size;
> > > > union {
> > > > __u64 u64;
> > > > struct vhost_vring_state state;
> > > > struct vhost_vring_addr addr;
> > > > } payload;
> > > > };
> > > > 
> > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > requests in above structure.
> > > 
> > > Still a comments like V1. What's the advantage of inventing a new 
> > > protocol?
> > I'm trying to make it work in VFIO's way..
> > 
> > > I believe either of the following should be better:
> > > 
> > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > extend it with e.g notify region. The advantages is that all exist 
> > > userspace
> > > program could be reused without modification (or minimal modification). 
> > > And
> > > vhost API hides lots of details that is not necessary to be understood by
> > > application (e.g in the case of container).
> > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > or introducing another mdev driver (i.e. vhost_mdev instead of
> > using the existing vfio_mdev) for mdev device?
> 
> 
> Can we simply add them into ioctl of mdev_parent_ops?

Right, either way, these ioctls have to be and just need to be
added in the ioctl of the mdev_parent_ops. But another thing we
also need to consider is that which file descriptor the userspace
will do the ioctl() on. So I'm wondering do you mean let the
userspace do the ioctl() on the VFIO device fd of the mdev
device?

> 
> 
> > 
[...]
> > > > 3. VFIO interrupt ioctl API
> > > > 
> > > > VFIO interrupt ioctl API is used to setup device interrupts.
> > > > IRQ-bypass can also be supported.
> > > > 
> > > > Currently, the data path interrupt can be configured via the
> > > > VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
> > > 
> > > How about DMA API? Do you expect to use VFIO IOMMU API or using vhost
> > > SET_MEM_TABLE? VFIO IOMMU API is more generic for sure but with
> > > SET_MEM_TABLE DMA can be done at the level of parent device which means it
> > > can work for e.g the card with on-chip IOMMU.
> > Agree. In this RFC, it assumes userspace will use VFIO IOMMU API
> > to do the DMA programming. But like what you said, there could be
> > a problem when using cards with on-chip IOMMU.
> 
> 
> Yes, another issue is SET_MEM_TABLE can not be used to update just a part of
> the table. This seems less flexible than VFIO API but it could be extended.

Agree.

> 
> 
> > 
> > > And what's the plan for vIOMMU?
> > As this RFC assumes userspace will use VFIO IOMMU API, userspace
> > just needs to follow the same way like what vfio-pci device does
> > in QEMU to

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > Details about this can be found here:
> > 
> > https://lwn.net/Articles/750770/
> > 
> > What's new in this version
> > ==
> > 
> > A new VFIO device type is introduced - vfio-vhost. This addressed
> > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > 
> > Below is the updated device interface:
> > 
> > Currently, there are two regions of this device: 1) CONFIG_REGION
> > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > can be used to notify the device.
> > 
> > 1. CONFIG_REGION
> > 
> > The region described by CONFIG_REGION is the main control interface.
> > Messages will be written to or read from this region.
> > 
> > The message type is determined by the `request` field in message
> > header. The message size is encoded in the message header too.
> > The message format looks like this:
> > 
> > struct vhost_vfio_op {
> > __u64 request;
> > __u32 flags;
> > /* Flag values: */
> >   #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > __u32 size;
> > union {
> > __u64 u64;
> > struct vhost_vring_state state;
> > struct vhost_vring_addr addr;
> > } payload;
> > };
> > 
> > The existing vhost-kernel ioctl cmds are reused as the message
> > requests in above structure.
> 
> 
> Still a comments like V1. What's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..

> I believe either of the following should be better:
> 
> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> extend it with e.g notify region. The advantages is that all exist userspace
> program could be reused without modification (or minimal modification). And
> vhost API hides lots of details that is not necessary to be understood by
> application (e.g in the case of container).

Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?

> 
> - using PCI layout, then you don't even need to re-invent notifiy region at
> all and we can pass-through them to guest.

Like what you said previously, virtio has transports other than PCI.
And it will look a bit odd when using transports other than PCI..

> 
> Personally, I prefer vhost ioctl.

+1

> 
> 
> > 
[...]
> > 
> > 3. VFIO interrupt ioctl API
> > 
> > VFIO interrupt ioctl API is used to setup device interrupts.
> > IRQ-bypass can also be supported.
> > 
> > Currently, the data path interrupt can be configured via the
> > VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
> 
> 
> How about DMA API? Do you expect to use VFIO IOMMU API or using vhost
> SET_MEM_TABLE? VFIO IOMMU API is more generic for sure but with
> SET_MEM_TABLE DMA can be done at the level of parent device which means it
> can work for e.g the card with on-chip IOMMU.

Agree. In this RFC, it assumes userspace will use VFIO IOMMU API
to do the DMA programming. But like what you said, there could be
a problem when using cards with on-chip IOMMU.
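
For reference, the type1 map call userspace would issue in that
model is roughly the following (container/group setup omitted,
names illustrative):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int map_guest_memory(int container_fd, void *vaddr,
			    uint64_t iova, uint64_t size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)vaddr,
		.iova  = iova,
		.size  = size,
	};

	/* After this, the device can DMA to/from [iova, iova + size). */
	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}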

> 
> And what's the plan for vIOMMU?

As this RFC assumes userspace will use VFIO IOMMU API, userspace
just needs to follow the same way like what vfio-pci device does
in QEMU to support vIOMMU.

> 
> 
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> >   drivers/vhost/Makefile |   2 +
> >   drivers/vhost/vdpa.c   | 770 +
> >   include/linux/vdpa_mdev.h  |  72 
> >   include/uapi/linux/vfio.h  |  19 +
> >   include/uapi/linux/vhost.h |  25 ++
> >   5 files changed, 888 insertions(+)
> >   create mode 100644 drivers/vhost/vdpa.c
> >   create mode 100644 include/linux/vdpa_mdev.h
> 
> 
> We probably need some sample parent device implementation. It could be a
> software datapath like e.g we can start from virtio-net device in guest or a
> vhost/tap on host.

Yeah, something like this would be interesting!

Thanks,
Tiwei

> 
> Thanks
> 
> 
> > 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here: https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
__u64 request;
__u32 flags;
/* Flag values: */
 #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
__u32 size;
union {
__u64 u64;
struct vhost_vring_state state;
struct vhost_vring_addr addr;
} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.

Each message will be written to or read from this region at offset 0:

int vhost_vfio_write(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
struct vhost_vfio *vfio = dev->opaque;
int ret;

ret = pwrite64(vfio->device_fd, op, count, vfio->config_offset);
if (ret != count)
return -1;

return 0;
}

int vhost_vfio_read(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
struct vhost_vfio *vfio = dev->opaque;
uint64_t request = op->request;
int ret;

ret = pread64(vfio->device_fd, op, count, vfio->config_offset);
if (ret != count || request != op->request)
return -1;

return 0;
}

It's quite straightforward to set things to the device. Just need to
write the message to device directly:

int vhost_vfio_set_features(struct vhost_dev *dev, uint64_t features)
{
struct vhost_vfio_op op;

op.request = VHOST_SET_FEATURES;
op.flags = 0;
op.size = sizeof(features);
op.payload.u64 = features;

return vhost_vfio_write(dev, &op);
}

To get things from the device, two steps are needed.
Take VHOST_GET_FEATURE as an example:

int vhost_vfio_get_features(struct vhost_dev *dev, uint64_t *features)
{
struct vhost_vfio_op op;
int ret;

op.request = VHOST_GET_FEATURES;
op.flags = VHOST_VFIO_NEED_REPLY;
op.size = 0;

/* Just need to write the header */
ret = vhost_vfio_write(dev, &op);
if (ret != 0)
goto out;

/* `op` wasn't changed during write */
op.flags = 0;
op.size = sizeof(*features);

ret = vhost_vfio_read(dev, &op);
if (ret != 0)
goto out;

*features = op.payload.u64;
out:
return ret;
}

2. NOTIFIY_REGION (mmap-able)

The region described by NOTIFY_REGION will be used to notify
the device.

Each queue will have a page for notification, and it can be mapped
to VM (if hardware also supports), and the virtio driver in the VM
will be able to notify the device directly.

The region described by NOTIFY_REGION is also write-able. If
the accelerator's notification register(s) cannot be mapped to
the VM, write() can also be used to notify the device. Something
like this:

void notify_relay(void *opaque)
{
..
offset = host_page_size * queue_idx;

ret = pwrite64(vfio->device_fd, &queue_idx, sizeof(queue_idx),
vfio->notify_offset + offset);
..
}
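
For the mmap path, the idea is along these lines (illustrative only;
it assumes the parent device exposes each queue's notify page as a
mmap-able part of NOTIFY_REGION):

void *map_notify_page(struct vhost_vfio *vfio, int queue_idx)
{
	off_t offset = vfio->notify_offset + host_page_size * queue_idx;

	/* The returned page can then be written directly (or mapped
	 * into the VM) instead of going through pwrite64(). */
	return mmap(NULL, host_page_size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, vfio->device_fd, offset);
}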

3. VFIO interrupt ioctl API

VFIO interrupt ioctl API is used to setup device interrupts.
IRQ-bypass can also be supported.

Currently, the data path interrupt can be configured via the
VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
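
Hooking a virtqueue's callfd up through that index is the usual VFIO
eventfd plumbing. Roughly (illustrative only, error handling omitted):

int set_vq_callfd(int device_fd, int callfd)
{
	char buf[sizeof(struct vfio_irq_set) + sizeof(int)];
	struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;

	irq_set->argsz = sizeof(buf);
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
			 VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = VFIO_VHOST_VQ_IRQ_INDEX;
	irq_set->start = 0;	/* virtqueue 0 */
	irq_set->count = 1;
	memcpy(&irq_set->data, &callfd, sizeof(int));

	return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
}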

Signed-off-by: Tiwei Bie 
---
 drivers/vhost/Makefile |   2 +
 drivers/vhost/vdpa.c   | 770 +
 include/linux/vdpa_mdev.h  |  72 
 include/uapi/linux/vfio.h  |  19 +
 include/uapi/linux/vhost.h |  25 ++
 5 files changed, 888 insertions(+)
 create mode 100644 drivers/vhost/vdpa.c
 create mode 100644 include/linux/vdpa_mdev.h

diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..cabb71095940 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,4 +10,6 @@ vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
+obj-$(CONFIG_VHOST_VFIO) += vdpa.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
new file mode 100644
index ..5c9426e2a091
--- /dev/null
+++ b

Re: [PATCH] virtio: drop internal struct from UAPI

2019-02-02 Thread Tiwei Bie
On Fri, Feb 01, 2019 at 05:16:01PM -0500, Michael S. Tsirkin wrote:
> There's no reason to expose struct vring_packed in UAPI - if we do we
> won't be able to change or drop it, and it's not part of any interface.
> 
> Let's move it to virtio_ring.c
> 
> Cc: Tiwei Bie 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/virtio/virtio_ring.c |  7 ++-
>  include/uapi/linux/virtio_ring.h | 10 --
>  2 files changed, 6 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 412e0c431d87..1c85b3423182 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -152,7 +152,12 @@ struct vring_virtqueue {
>   /* Available for packed ring */
>   struct {
>   /* Actual memory layout for this queue. */
> - struct vring_packed vring;
> + struct {
> + unsigned int num;
> + struct vring_packed_desc *desc;
> + struct vring_packed_desc_event *driver;
> + struct vring_packed_desc_event *device;
> + } vring;
>  
>   /* Driver ring wrap counter. */
>   bool avail_wrap_counter;
> diff --git a/include/uapi/linux/virtio_ring.h 
> b/include/uapi/linux/virtio_ring.h
> index 2414f8af26b3..4c4e24c291a5 100644
> --- a/include/uapi/linux/virtio_ring.h
> +++ b/include/uapi/linux/virtio_ring.h
> @@ -213,14 +213,4 @@ struct vring_packed_desc {
>   __le16 flags;
>  };
>  
> -struct vring_packed {
> - unsigned int num;
> -
> - struct vring_packed_desc *desc;
> -
> - struct vring_packed_desc_event *driver;
> -
> - struct vring_packed_desc_event *device;
> -};
> -
>  #endif /* _UAPI_LINUX_VIRTIO_RING_H */
> -- 
> MST

Acked-by: Tiwei Bie 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2] virtio: support VIRTIO_F_ORDER_PLATFORM

2019-01-23 Thread Tiwei Bie
This patch introduces the support for VIRTIO_F_ORDER_PLATFORM.
If this feature is negotiated, the driver must use the barriers
suitable for hardware devices. Otherwise, the device and driver
are assumed to be implemented in software, that is they can be
assumed to run on identical CPUs in an SMP configuration. Thus
a weaker form of memory barriers is sufficient to yield better
performance.

It is recommended that an add-in card based PCI device offers
this feature for portability. The device will fail to operate
further or will operate in a slower emulation mode if this
feature is offered but not accepted.

Signed-off-by: Tiwei Bie 
---
v2:
- Add more explanations in commit log (MST);

 drivers/virtio/virtio_ring.c   | 8 
 include/uapi/linux/virtio_config.h | 6 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cd7e755484e3..27d3f057493e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1609,6 +1609,9 @@ static struct virtqueue *vring_create_virtqueue_packed(
!context;
vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
 
+   if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
+   vq->weak_barriers = false;
+
vq->packed.ring_dma_addr = ring_dma_addr;
vq->packed.driver_event_dma_addr = driver_event_dma_addr;
vq->packed.device_event_dma_addr = device_event_dma_addr;
@@ -2079,6 +2082,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
!context;
vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
 
+   if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
+   vq->weak_barriers = false;
+
vq->split.queue_dma_addr = 0;
vq->split.queue_size_in_bytes = 0;
 
@@ -2213,6 +2219,8 @@ void vring_transport_features(struct virtio_device *vdev)
break;
case VIRTIO_F_RING_PACKED:
break;
+   case VIRTIO_F_ORDER_PLATFORM:
+   break;
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
diff --git a/include/uapi/linux/virtio_config.h 
b/include/uapi/linux/virtio_config.h
index 1196e1c1d4f6..ff8e7dc9d4dd 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -78,6 +78,12 @@
 /* This feature indicates support for the packed virtqueue layout. */
 #define VIRTIO_F_RING_PACKED   34
 
+/*
+ * This feature indicates that memory accesses by the driver and the
+ * device are ordered in a way described by the platform.
+ */
+#define VIRTIO_F_ORDER_PLATFORM36
+
 /*
  * Does the device support Single Root I/O Virtualization?
  */
-- 
2.17.1
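
For context: VIRTIO_F_ORDER_PLATFORM ends up clearing vq->weak_barriers,
which selects the device-visible barrier flavour instead of the SMP-only
one in the ring helpers. A simplified sketch of that selection (not part
of this patch):

static inline void virtio_wmb(bool weak_barriers)
{
	if (weak_barriers)
		virt_wmb();	/* SMP-only ordering, cheaper */
	else
		dma_wmb();	/* ordering visible to a real device */
}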

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio: support VIRTIO_F_ORDER_PLATFORM

2019-01-23 Thread Tiwei Bie
On Tue, Jan 22, 2019 at 11:04:29PM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 23, 2019 at 01:03:46AM +0800, Tiwei Bie wrote:
> > This patch introduces the support for VIRTIO_F_ORDER_PLATFORM.
> > When this feature is negotiated, driver will use the barriers
> > suitable for hardware devices.
> > 
> > Signed-off-by: Tiwei Bie 
> 
> Could you pls add a bit more explanation in the commit log?
> E.g. which configurations are broken without this patch?
> How severe is the problem?

Sure. Will do that.

Thanks

> 
> I'm trying to decide whether this belongs in 5.0 or 5.1.
> 
> > ---
> >  drivers/virtio/virtio_ring.c   | 8 
> >  include/uapi/linux/virtio_config.h | 6 ++
> >  2 files changed, 14 insertions(+)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index cd7e755484e3..27d3f057493e 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1609,6 +1609,9 @@ static struct virtqueue 
> > *vring_create_virtqueue_packed(
> > !context;
> > vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> >  
> > +   if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
> > +   vq->weak_barriers = false;
> > +
> > vq->packed.ring_dma_addr = ring_dma_addr;
> > vq->packed.driver_event_dma_addr = driver_event_dma_addr;
> > vq->packed.device_event_dma_addr = device_event_dma_addr;
> > @@ -2079,6 +2082,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
> > index,
> > !context;
> > vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> >  
> > +   if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
> > +   vq->weak_barriers = false;
> > +
> > vq->split.queue_dma_addr = 0;
> > vq->split.queue_size_in_bytes = 0;
> >  
> > @@ -2213,6 +2219,8 @@ void vring_transport_features(struct virtio_device 
> > *vdev)
> > break;
> > case VIRTIO_F_RING_PACKED:
> > break;
> > +   case VIRTIO_F_ORDER_PLATFORM:
> > +   break;
> > default:
> > /* We don't understand this bit. */
> > __virtio_clear_bit(vdev, i);
> > diff --git a/include/uapi/linux/virtio_config.h 
> > b/include/uapi/linux/virtio_config.h
> > index 1196e1c1d4f6..ff8e7dc9d4dd 100644
> > --- a/include/uapi/linux/virtio_config.h
> > +++ b/include/uapi/linux/virtio_config.h
> > @@ -78,6 +78,12 @@
> >  /* This feature indicates support for the packed virtqueue layout. */
> >  #define VIRTIO_F_RING_PACKED   34
> >  
> > +/*
> > + * This feature indicates that memory accesses by the driver and the
> > + * device are ordered in a way described by the platform.
> > + */
> > +#define VIRTIO_F_ORDER_PLATFORM36
> > +
> >  /*
> >   * Does the device support Single Root I/O Virtualization?
> >   */
> > -- 
> > 2.17.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] virtio: support VIRTIO_F_ORDER_PLATFORM

2019-01-22 Thread Tiwei Bie
This patch introduces the support for VIRTIO_F_ORDER_PLATFORM.
When this feature is negotiated, driver will use the barriers
suitable for hardware devices.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c   | 8 
 include/uapi/linux/virtio_config.h | 6 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cd7e755484e3..27d3f057493e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1609,6 +1609,9 @@ static struct virtqueue *vring_create_virtqueue_packed(
!context;
vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
 
+   if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
+   vq->weak_barriers = false;
+
vq->packed.ring_dma_addr = ring_dma_addr;
vq->packed.driver_event_dma_addr = driver_event_dma_addr;
vq->packed.device_event_dma_addr = device_event_dma_addr;
@@ -2079,6 +2082,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
!context;
vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
 
+   if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
+   vq->weak_barriers = false;
+
vq->split.queue_dma_addr = 0;
vq->split.queue_size_in_bytes = 0;
 
@@ -2213,6 +2219,8 @@ void vring_transport_features(struct virtio_device *vdev)
break;
case VIRTIO_F_RING_PACKED:
break;
+   case VIRTIO_F_ORDER_PLATFORM:
+   break;
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
diff --git a/include/uapi/linux/virtio_config.h 
b/include/uapi/linux/virtio_config.h
index 1196e1c1d4f6..ff8e7dc9d4dd 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -78,6 +78,12 @@
 /* This feature indicates support for the packed virtqueue layout. */
 #define VIRTIO_F_RING_PACKED   34
 
+/*
+ * This feature indicates that memory accesses by the driver and the
+ * device are ordered in a way described by the platform.
+ */
+#define VIRTIO_F_ORDER_PLATFORM36
+
 /*
  * Does the device support Single Root I/O Virtualization?
  */
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC 3/3] virtio_ring: use new vring flags

2018-12-08 Thread Tiwei Bie
On Fri, Dec 07, 2018 at 01:10:48PM -0500, Michael S. Tsirkin wrote:
> On Fri, Dec 07, 2018 at 04:48:42PM +0800, Tiwei Bie wrote:
> > Switch to using the _SPLIT_ and _PACKED_ variants of vring flags
> > in split ring and packed ring respectively.
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> > @@ -502,7 +505,8 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> > }
> > }
> > /* Last one doesn't continue. */
> > -   desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > +   desc[prev].flags &= cpu_to_virtio16(_vq->vdev,
> > +   (u16)~BIT(VRING_SPLIT_DESC_F_NEXT));
> >  
> > if (indirect) {
> > /* Now that the indirect table is filled in, map it. */
> 
> I kind of dislike it that this forces use of a cast here.
> Kind of makes it more fragile. Let's use a temporary instead?

I tried something like this:

u16 mask = ~BIT(VRING_SPLIT_DESC_F_NEXT);

And it will still cause the warning:

warning: large integer implicitly truncated to unsigned type [-Woverflow]
  u16 mask = ~BIT(VRING_SPLIT_DESC_F_NEXT);

If the cast isn't wanted, maybe use ~(1 << VRING_SPLIT_DESC_F_NEXT)
directly?

> 
> > -- 
> > 2.17.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC 2/3] virtio_ring: add VIRTIO_RING_NO_LEGACY

2018-12-08 Thread Tiwei Bie
On Fri, Dec 07, 2018 at 01:05:35PM -0500, Michael S. Tsirkin wrote:
> On Fri, Dec 07, 2018 at 04:48:41PM +0800, Tiwei Bie wrote:
> > Introduce VIRTIO_RING_NO_LEGACY to support disabling legacy
> > macros and layout definitions.
> > 
> > Suggested-by: Michael S. Tsirkin 
> > Signed-off-by: Tiwei Bie 
> > ---
> > VRING_AVAIL_ALIGN_SIZE, VRING_USED_ALIGN_SIZE and VRING_DESC_ALIGN_SIZE
> > are not pre-virtio 1.0, but can also be disabled by VIRTIO_RING_NO_LEGACY
> > in this patch, because their names are not consistent with other names.
> > Not sure whether this is a good idea. If we want this, we may also want
> > to define _SPLIT_ version for them.
> 
> I don't think it's a good idea to have alignment in there - the point of
> NO_LEGACY is to help catch bugs not to sanitize coding style IMHO.
> 
> And spec calls "legacy" the 0.X interfaces, let's not muddy the waters.

Make sense. Thanks!

> 
> > 
> >  include/uapi/linux/virtio_ring.h | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/include/uapi/linux/virtio_ring.h 
> > b/include/uapi/linux/virtio_ring.h
> > index 9b0c0d92ab62..192573827850 100644
> > --- a/include/uapi/linux/virtio_ring.h
> > +++ b/include/uapi/linux/virtio_ring.h
> > @@ -37,6 +37,7 @@
> >  #include 
> >  #include 
> >  
> > +#ifndef VIRTIO_RING_NO_LEGACY
> >  /*
> >   * Notice: unlike other _F_ flags, below flags are defined as shifted
> >   * values instead of shifts for compatibility.
> > @@ -51,6 +52,7 @@
> >  #define VRING_USED_F_NO_NOTIFY 1
> >  /* Same as VRING_SPLIT_AVAIL_F_NO_INTERRUPT. */
> >  #define VRING_AVAIL_F_NO_INTERRUPT 1
> > +#endif /* VIRTIO_RING_NO_LEGACY */
> >  
> >  /* Mark a buffer as continuing via the next field in split ring. */
> >  #define VRING_SPLIT_DESC_F_NEXT0
> > @@ -151,6 +153,7 @@ struct vring {
> > struct vring_used *used;
> >  };
> >  
> > +#ifndef VIRTIO_RING_NO_LEGACY
> >  /* Alignment requirements for vring elements.
> >   * When using pre-virtio 1.0 layout, these fall out naturally.
> >   */
> > @@ -203,6 +206,7 @@ static inline unsigned vring_size(unsigned int num, 
> > unsigned long align)
> >  + align - 1) & ~(align - 1))
> > + sizeof(__virtio16) * 3 + sizeof(struct vring_used_elem) * num;
> >  }
> > +#endif /* VIRTIO_RING_NO_LEGACY */
> >  
> >  /* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */
> >  /* Assuming a given event_idx value from the other side, if
> > -- 
> > 2.17.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC 0/3] virtio_ring: define flags as shifts consistently

2018-12-08 Thread Tiwei Bie
On Fri, Dec 07, 2018 at 01:11:42PM -0500, Michael S. Tsirkin wrote:
> On Fri, Dec 07, 2018 at 04:48:39PM +0800, Tiwei Bie wrote:
> > This is a follow up of the discussion in this thread:
> > https://patchwork.ozlabs.org/patch/1001015/#2042353
> 
> How was this tested? I'd suggest building virtio
> before and after the changes, stripped binary
> should be exactly the same.

Sure, I will do the test with scripts/objdiff.

> 
> 
> > Tiwei Bie (3):
> >   virtio_ring: define flags as shifts consistently
> >   virtio_ring: add VIRTIO_RING_NO_LEGACY
> >   virtio_ring: use new vring flags
> > 
> >  drivers/virtio/virtio_ring.c | 100 ++-
> >  include/uapi/linux/virtio_ring.h |  61 +--
> >  2 files changed, 103 insertions(+), 58 deletions(-)
> > 
> > -- 
> > 2.17.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC 2/3] virtio_ring: add VIRTIO_RING_NO_LEGACY

2018-12-07 Thread Tiwei Bie
Introduce VIRTIO_RING_NO_LEGACY to support disabling legacy
macros and layout definitions.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Tiwei Bie 
---
VRING_AVAIL_ALIGN_SIZE, VRING_USED_ALIGN_SIZE and VRING_DESC_ALIGN_SIZE
are not pre-virtio 1.0, but can also be disabled by VIRTIO_RING_NO_LEGACY
in this patch, because their names are not consistent with other names.
Not sure whether this is a good idea. If we want this, we may also want
to define _SPLIT_ version for them.

 include/uapi/linux/virtio_ring.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 9b0c0d92ab62..192573827850 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -37,6 +37,7 @@
 #include 
 #include 
 
+#ifndef VIRTIO_RING_NO_LEGACY
 /*
  * Notice: unlike other _F_ flags, below flags are defined as shifted
  * values instead of shifts for compatibility.
@@ -51,6 +52,7 @@
 #define VRING_USED_F_NO_NOTIFY 1
 /* Same as VRING_SPLIT_AVAIL_F_NO_INTERRUPT. */
 #define VRING_AVAIL_F_NO_INTERRUPT 1
+#endif /* VIRTIO_RING_NO_LEGACY */
 
 /* Mark a buffer as continuing via the next field in split ring. */
 #define VRING_SPLIT_DESC_F_NEXT0
@@ -151,6 +153,7 @@ struct vring {
struct vring_used *used;
 };
 
+#ifndef VIRTIO_RING_NO_LEGACY
 /* Alignment requirements for vring elements.
  * When using pre-virtio 1.0 layout, these fall out naturally.
  */
@@ -203,6 +206,7 @@ static inline unsigned vring_size(unsigned int num, 
unsigned long align)
 + align - 1) & ~(align - 1))
+ sizeof(__virtio16) * 3 + sizeof(struct vring_used_elem) * num;
 }
+#endif /* VIRTIO_RING_NO_LEGACY */
 
 /* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */
 /* Assuming a given event_idx value from the other side, if
-- 
2.17.1
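
For completeness: a consumer that wants the legacy names gone would just
define the guard before the include, e.g.:

#define VIRTIO_RING_NO_LEGACY
#include <linux/virtio_ring.h>

analogous to how VIRTIO_PCI_NO_LEGACY is already used for virtio_pci.h.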

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC 1/3] virtio_ring: define flags as shifts consistently

2018-12-07 Thread Tiwei Bie
Introduce _SPLIT_ and/or _PACKED_ variants for VRING_DESC_F_*,
VRING_AVAIL_F_NO_INTERRUPT and VRING_USED_F_NO_NOTIFY. These
variants are defined as shifts instead of shifted values for
consistency with other _F_ flags.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Tiwei Bie 
---
 include/uapi/linux/virtio_ring.h | 57 ++--
 1 file changed, 40 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 2414f8af26b3..9b0c0d92ab62 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -37,29 +37,52 @@
 #include 
 #include 
 
-/* This marks a buffer as continuing via the next field. */
+/*
+ * Notice: unlike other _F_ flags, below flags are defined as shifted
+ * values instead of shifts for compatibility.
+ */
+/* Same as VRING_SPLIT_DESC_F_NEXT. */
 #define VRING_DESC_F_NEXT  1
-/* This marks a buffer as write-only (otherwise read-only). */
+/* Same as VRING_SPLIT_DESC_F_WRITE. */
 #define VRING_DESC_F_WRITE 2
-/* This means the buffer contains a list of buffer descriptors. */
+/* Same as VRING_SPLIT_DESC_F_INDIRECT. */
 #define VRING_DESC_F_INDIRECT  4
-
-/*
- * Mark a descriptor as available or used in packed ring.
- * Notice: they are defined as shifts instead of shifted values.
- */
-#define VRING_PACKED_DESC_F_AVAIL  7
-#define VRING_PACKED_DESC_F_USED   15
-
-/* The Host uses this in used->flags to advise the Guest: don't kick me when
- * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
- * will still kick if it's out of buffers. */
+/* Same as VRING_SPLIT_USED_F_NO_NOTIFY. */
 #define VRING_USED_F_NO_NOTIFY 1
-/* The Guest uses this in avail->flags to advise the Host: don't interrupt me
- * when you consume a buffer.  It's unreliable, so it's simply an
- * optimization.  */
+/* Same as VRING_SPLIT_AVAIL_F_NO_INTERRUPT. */
 #define VRING_AVAIL_F_NO_INTERRUPT 1
 
+/* Mark a buffer as continuing via the next field in split ring. */
+#define VRING_SPLIT_DESC_F_NEXT0
+/* Mark a buffer as write-only (otherwise read-only) in split ring. */
+#define VRING_SPLIT_DESC_F_WRITE   1
+/* Mean the buffer contains a list of buffer descriptors in split ring. */
+#define VRING_SPLIT_DESC_F_INDIRECT2
+
+/*
+ * The Host uses this in used->flags in split ring to advise the Guest:
+ * don't kick me when you add a buffer.  It's unreliable, so it's simply
+ * an optimization.  Guest will still kick if it's out of buffers.
+ */
+#define VRING_SPLIT_USED_F_NO_NOTIFY   0
+/*
+ * The Guest uses this in avail->flags in split ring to advise the Host:
+ * don't interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.
+ */
+#define VRING_SPLIT_AVAIL_F_NO_INTERRUPT   0
+
+/* Mark a buffer as continuing via the next field in packed ring. */
+#define VRING_PACKED_DESC_F_NEXT   0
+/* Mark a buffer as write-only (otherwise read-only) in packed ring. */
+#define VRING_PACKED_DESC_F_WRITE  1
+/* Mean the buffer contains a list of buffer descriptors in packed ring. */
+#define VRING_PACKED_DESC_F_INDIRECT   2
+
+/* Mark a descriptor as available or used in packed ring. */
+#define VRING_PACKED_DESC_F_AVAIL  7
+#define VRING_PACKED_DESC_F_USED   15
+
 /* Enable events in packed ring. */
 #define VRING_PACKED_EVENT_FLAG_ENABLE 0x0
 /* Disable events in packed ring. */
-- 
2.17.1



[RFC 0/3] virtio_ring: define flags as shifts consistently

2018-12-07 Thread Tiwei Bie
This is a follow up of the discussion in this thread:
https://patchwork.ozlabs.org/patch/1001015/#2042353

Tiwei Bie (3):
  virtio_ring: define flags as shifts consistently
  virtio_ring: add VIRTIO_RING_NO_LEGACY
  virtio_ring: use new vring flags

 drivers/virtio/virtio_ring.c | 100 ++-
 include/uapi/linux/virtio_ring.h |  61 +--
 2 files changed, 103 insertions(+), 58 deletions(-)

-- 
2.17.1



Re: [PATCH net-next v3 01/13] virtio: add packed ring types and macros

2018-11-30 Thread Tiwei Bie
On Fri, Nov 30, 2018 at 11:46:57AM -0500, Michael S. Tsirkin wrote:
> On Sat, Dec 01, 2018 at 12:24:16AM +0800, Tiwei Bie wrote:
> > On Fri, Nov 30, 2018 at 10:53:07AM -0500, Michael S. Tsirkin wrote:
> > > On Fri, Nov 30, 2018 at 11:37:37PM +0800, Tiwei Bie wrote:
> > > > On Fri, Nov 30, 2018 at 08:52:42AM -0500, Michael S. Tsirkin wrote:
> > > > > On Fri, Nov 30, 2018 at 02:01:06PM +0100, Maxime Coquelin wrote:
> > > > > > On 11/30/18 1:47 PM, Michael S. Tsirkin wrote:
> > > > > > > On Fri, Nov 30, 2018 at 05:53:40PM +0800, Tiwei Bie wrote:
> > > > > > > > On Fri, Nov 30, 2018 at 04:10:55PM +0800, Jason Wang wrote:
> > > > > > > > > 
> > > > > > > > > On 2018/11/21 下午6:03, Tiwei Bie wrote:
> > > > > > > > > > Add types and macros for packed ring.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Tiwei Bie 
> > > > > > > > > > ---
> > > > > > > > > >include/uapi/linux/virtio_config.h |  3 +++
> > > > > > > > > >include/uapi/linux/virtio_ring.h   | 52 
> > > > > > > > > > ++
> > > > > > > > > >2 files changed, 55 insertions(+)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/include/uapi/linux/virtio_config.h 
> > > > > > > > > > b/include/uapi/linux/virtio_config.h
> > > > > > > > > > index 449132c76b1c..1196e1c1d4f6 100644
> > > > > > > > > > --- a/include/uapi/linux/virtio_config.h
> > > > > > > > > > +++ b/include/uapi/linux/virtio_config.h
> > > > > > > > > > @@ -75,6 +75,9 @@
> > > > > > > > > > */
> > > > > > > > > >#define VIRTIO_F_IOMMU_PLATFORM  33
> > > > > > > > > > +/* This feature indicates support for the packed virtqueue 
> > > > > > > > > > layout. */
> > > > > > > > > > +#define VIRTIO_F_RING_PACKED   34
> > > > > > > > > > +
> > > > > > > > > >/*
> > > > > > > > > > * Does the device support Single Root I/O 
> > > > > > > > > > Virtualization?
> > > > > > > > > > */
> > > > > > > > > > diff --git a/include/uapi/linux/virtio_ring.h 
> > > > > > > > > > b/include/uapi/linux/virtio_ring.h
> > > > > > > > > > index 6d5d5faa989b..2414f8af26b3 100644
> > > > > > > > > > --- a/include/uapi/linux/virtio_ring.h
> > > > > > > > > > +++ b/include/uapi/linux/virtio_ring.h
> > > > > > > > > > @@ -44,6 +44,13 @@
> > > > > > > > > >/* This means the buffer contains a list of buffer 
> > > > > > > > > > descriptors. */
> > > > > > > > > >#define VRING_DESC_F_INDIRECT4
> > > > > > > > > > +/*
> > > > > > > > > > + * Mark a descriptor as available or used in packed ring.
> > > > > > > > > > + * Notice: they are defined as shifts instead of shifted 
> > > > > > > > > > values.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > This looks inconsistent to previous flags, any reason for 
> > > > > > > > > using shifts?
> > > > > > > > 
> > > > > > > > Yeah, it was suggested to use shifts, as _F_ should be a bit
> > > > > > > > number, not a shifted value:
> > > > > > > > 
> > > > > > > > https://patchwork.ozlabs.org/patch/942296/#1989390
> > > > > > > 
> > > > > > > But let's add a _SPLIT_ variant that uses shifts consistently.
> > > > > > 
> > > > > > Maybe we could avoid adding SPLIT and PACKED, but define as follow:
> > > > > > 
> > > > > > #define VRING_DESC_F_INDIRECT_SHIFT 2
> > > > > > #define VRING_DESC_F_INDIRECT (1ull <<  VRING_DESC_F_IN

Re: [PATCH net-next v3 01/13] virtio: add packed ring types and macros

2018-11-30 Thread Tiwei Bie
On Fri, Nov 30, 2018 at 10:53:07AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 30, 2018 at 11:37:37PM +0800, Tiwei Bie wrote:
> > On Fri, Nov 30, 2018 at 08:52:42AM -0500, Michael S. Tsirkin wrote:
> > > On Fri, Nov 30, 2018 at 02:01:06PM +0100, Maxime Coquelin wrote:
> > > > On 11/30/18 1:47 PM, Michael S. Tsirkin wrote:
> > > > > On Fri, Nov 30, 2018 at 05:53:40PM +0800, Tiwei Bie wrote:
> > > > > > On Fri, Nov 30, 2018 at 04:10:55PM +0800, Jason Wang wrote:
> > > > > > > 
> > > > > > > On 2018/11/21 下午6:03, Tiwei Bie wrote:
> > > > > > > > Add types and macros for packed ring.
> > > > > > > > 
> > > > > > > > Signed-off-by: Tiwei Bie 
> > > > > > > > ---
> > > > > > > >include/uapi/linux/virtio_config.h |  3 +++
> > > > > > > >include/uapi/linux/virtio_ring.h   | 52 
> > > > > > > > ++
> > > > > > > >2 files changed, 55 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/include/uapi/linux/virtio_config.h 
> > > > > > > > b/include/uapi/linux/virtio_config.h
> > > > > > > > index 449132c76b1c..1196e1c1d4f6 100644
> > > > > > > > --- a/include/uapi/linux/virtio_config.h
> > > > > > > > +++ b/include/uapi/linux/virtio_config.h
> > > > > > > > @@ -75,6 +75,9 @@
> > > > > > > > */
> > > > > > > >#define VIRTIO_F_IOMMU_PLATFORM  33
> > > > > > > > +/* This feature indicates support for the packed virtqueue 
> > > > > > > > layout. */
> > > > > > > > +#define VIRTIO_F_RING_PACKED   34
> > > > > > > > +
> > > > > > > >/*
> > > > > > > > * Does the device support Single Root I/O Virtualization?
> > > > > > > > */
> > > > > > > > diff --git a/include/uapi/linux/virtio_ring.h 
> > > > > > > > b/include/uapi/linux/virtio_ring.h
> > > > > > > > index 6d5d5faa989b..2414f8af26b3 100644
> > > > > > > > --- a/include/uapi/linux/virtio_ring.h
> > > > > > > > +++ b/include/uapi/linux/virtio_ring.h
> > > > > > > > @@ -44,6 +44,13 @@
> > > > > > > >/* This means the buffer contains a list of buffer 
> > > > > > > > descriptors. */
> > > > > > > >#define VRING_DESC_F_INDIRECT4
> > > > > > > > +/*
> > > > > > > > + * Mark a descriptor as available or used in packed ring.
> > > > > > > > + * Notice: they are defined as shifts instead of shifted 
> > > > > > > > values.
> > > > > > > 
> > > > > > > 
> > > > > > > This looks inconsistent to previous flags, any reason for using 
> > > > > > > shifts?
> > > > > > 
> > > > > > Yeah, it was suggested to use shifts, as _F_ should be a bit
> > > > > > number, not a shifted value:
> > > > > > 
> > > > > > https://patchwork.ozlabs.org/patch/942296/#1989390
> > > > > 
> > > > > But let's add a _SPLIT_ variant that uses shifts consistently.
> > > > 
> > > > Maybe we could avoid adding SPLIT and PACKED, but define as follow:
> > > > 
> > > > #define VRING_DESC_F_INDIRECT_SHIFT 2
> > > > #define VRING_DESC_F_INDIRECT (1ull <<  VRING_DESC_F_INDIRECT_SHIFT)
> > > > 
> > > > #define VRING_DESC_F_AVAIL_SHIFT 7
> > > > #define VRING_DESC_F_AVAIL (1ull << VRING_DESC_F_AVAIL_SHIFT)
> > > > 
> > > > I think it would be better for consistency.
> > > 
> > > I don't think that will help. the problem is that
> > > most of the existing virtio code consistently uses _F_ as shifts.
> > > So we just need to do something about these 5 being inconsistent:
> > > 
> > > include/uapi/linux/virtio_ring.h:#define VRING_DESC_F_NEXT  1
> > > include/uapi/linux/virtio_ring.h:#define VRING_DESC_F_WRITE 2
> > > include/uapi/linux/virtio_ring.h:#define VRING_DESC_F_INDIRECT  4
> > > include/uapi/linux/virtio_ring.

Re: [PATCH net-next v3 01/13] virtio: add packed ring types and macros

2018-11-30 Thread Tiwei Bie
On Fri, Nov 30, 2018 at 08:52:42AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 30, 2018 at 02:01:06PM +0100, Maxime Coquelin wrote:
> > On 11/30/18 1:47 PM, Michael S. Tsirkin wrote:
> > > On Fri, Nov 30, 2018 at 05:53:40PM +0800, Tiwei Bie wrote:
> > > > On Fri, Nov 30, 2018 at 04:10:55PM +0800, Jason Wang wrote:
> > > > > 
> > > > > On 2018/11/21 下午6:03, Tiwei Bie wrote:
> > > > > > Add types and macros for packed ring.
> > > > > > 
> > > > > > Signed-off-by: Tiwei Bie 
> > > > > > ---
> > > > > >include/uapi/linux/virtio_config.h |  3 +++
> > > > > >include/uapi/linux/virtio_ring.h   | 52 
> > > > > > ++
> > > > > >2 files changed, 55 insertions(+)
> > > > > > 
> > > > > > diff --git a/include/uapi/linux/virtio_config.h 
> > > > > > b/include/uapi/linux/virtio_config.h
> > > > > > index 449132c76b1c..1196e1c1d4f6 100644
> > > > > > --- a/include/uapi/linux/virtio_config.h
> > > > > > +++ b/include/uapi/linux/virtio_config.h
> > > > > > @@ -75,6 +75,9 @@
> > > > > > */
> > > > > >#define VIRTIO_F_IOMMU_PLATFORM  33
> > > > > > +/* This feature indicates support for the packed virtqueue layout. 
> > > > > > */
> > > > > > +#define VIRTIO_F_RING_PACKED   34
> > > > > > +
> > > > > >/*
> > > > > > * Does the device support Single Root I/O Virtualization?
> > > > > > */
> > > > > > diff --git a/include/uapi/linux/virtio_ring.h 
> > > > > > b/include/uapi/linux/virtio_ring.h
> > > > > > index 6d5d5faa989b..2414f8af26b3 100644
> > > > > > --- a/include/uapi/linux/virtio_ring.h
> > > > > > +++ b/include/uapi/linux/virtio_ring.h
> > > > > > @@ -44,6 +44,13 @@
> > > > > >/* This means the buffer contains a list of buffer descriptors. 
> > > > > > */
> > > > > >#define VRING_DESC_F_INDIRECT4
> > > > > > +/*
> > > > > > + * Mark a descriptor as available or used in packed ring.
> > > > > > + * Notice: they are defined as shifts instead of shifted values.
> > > > > 
> > > > > 
> > > > > This looks inconsistent to previous flags, any reason for using 
> > > > > shifts?
> > > > 
> > > > Yeah, it was suggested to use shifts, as _F_ should be a bit
> > > > number, not a shifted value:
> > > > 
> > > > https://patchwork.ozlabs.org/patch/942296/#1989390
> > > 
> > > But let's add a _SPLIT_ variant that uses shifts consistently.
> > 
> > Maybe we could avoid adding SPLIT and PACKED, but define as follow:
> > 
> > #define VRING_DESC_F_INDIRECT_SHIFT 2
> > #define VRING_DESC_F_INDIRECT (1ull <<  VRING_DESC_F_INDIRECT_SHIFT)
> > 
> > #define VRING_DESC_F_AVAIL_SHIFT 7
> > #define VRING_DESC_F_AVAIL (1ull << VRING_DESC_F_AVAIL_SHIFT)
> > 
> > I think it would be better for consistency.
> 
> I don't think that will help. the problem is that
> most of the existing virtio code consistently uses _F_ as shifts.
> So we just need to do something about these 5 being inconsistent:
> 
> include/uapi/linux/virtio_ring.h:#define VRING_DESC_F_NEXT  1
> include/uapi/linux/virtio_ring.h:#define VRING_DESC_F_WRITE 2
> include/uapi/linux/virtio_ring.h:#define VRING_DESC_F_INDIRECT  4
> include/uapi/linux/virtio_ring.h:#define VRING_USED_F_NO_NOTIFY 1
> include/uapi/linux/virtio_ring.h:#define VRING_AVAIL_F_NO_INTERRUPT 1
> 
> I do not want all of virtio to become verbose with _SHIFT, ergo
> we need to change the above 5 to have names which are with _F_ and
> have the bit number.

How about something like this:

#define VRING_COMM_DESC_F_NEXT  0
#define VRING_COMM_DESC_F_WRITE 1
#define VRING_COMM_DESC_F_INDIRECT  2

#define VRING_SPLIT_USED_F_NO_NOTIFY0
#define VRING_SPLIT_AVAIL_F_NO_INTERRUPT0

or

#define VRING_SPLIT_DESC_F_NEXT 0
#define VRING_SPLIT_DESC_F_WRITE1
#define VRING_SPLIT_DESC_F_INDIRECT 2

#define VRING_SPLIT_USED_F_NO_NOTIFY0
#define VRING_SPLIT_AVAIL_F_NO_INTERRUPT0

#define VRING_PACKED_DESC_F_NEXT0
#define VRING_PACKED

Re: [PATCH net-next v3 01/13] virtio: add packed ring types and macros

2018-11-30 Thread Tiwei Bie
On Fri, Nov 30, 2018 at 04:10:55PM +0800, Jason Wang wrote:
> 
> On 2018/11/21 下午6:03, Tiwei Bie wrote:
> > Add types and macros for packed ring.
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> >   include/uapi/linux/virtio_config.h |  3 +++
> >   include/uapi/linux/virtio_ring.h   | 52 
> > ++
> >   2 files changed, 55 insertions(+)
> > 
> > diff --git a/include/uapi/linux/virtio_config.h 
> > b/include/uapi/linux/virtio_config.h
> > index 449132c76b1c..1196e1c1d4f6 100644
> > --- a/include/uapi/linux/virtio_config.h
> > +++ b/include/uapi/linux/virtio_config.h
> > @@ -75,6 +75,9 @@
> >*/
> >   #define VIRTIO_F_IOMMU_PLATFORM   33
> > +/* This feature indicates support for the packed virtqueue layout. */
> > +#define VIRTIO_F_RING_PACKED   34
> > +
> >   /*
> >* Does the device support Single Root I/O Virtualization?
> >*/
> > diff --git a/include/uapi/linux/virtio_ring.h 
> > b/include/uapi/linux/virtio_ring.h
> > index 6d5d5faa989b..2414f8af26b3 100644
> > --- a/include/uapi/linux/virtio_ring.h
> > +++ b/include/uapi/linux/virtio_ring.h
> > @@ -44,6 +44,13 @@
> >   /* This means the buffer contains a list of buffer descriptors. */
> >   #define VRING_DESC_F_INDIRECT 4
> > +/*
> > + * Mark a descriptor as available or used in packed ring.
> > + * Notice: they are defined as shifts instead of shifted values.
> 
> 
> This looks inconsistent to previous flags, any reason for using shifts?

Yeah, it was suggested to use shifts, as _F_ should be a bit
number, not a shifted value:

https://patchwork.ozlabs.org/patch/942296/#1989390
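
To make that concrete, a small sketch (illustrative only, the helper
name is hypothetical): with the bit-number convention, callers build the
flag word by shifting, e.g. a driver marking a packed descriptor
available would compute:

static inline u16 packed_avail_flags(bool avail_wrap_counter)
{
	/* AVAIL is set to the driver's wrap counter, USED to its inverse,
	 * so the device sees AVAIL != USED for an available descriptor. */
	return (avail_wrap_counter  << VRING_PACKED_DESC_F_AVAIL) |
	       (!avail_wrap_counter << VRING_PACKED_DESC_F_USED);
}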

> 
> 
> > + */
> > +#define VRING_PACKED_DESC_F_AVAIL  7
> > +#define VRING_PACKED_DESC_F_USED   15
> > +
> >   /* The Host uses this in used->flags to advise the Guest: don't kick me 
> > when
> >* you add a buffer.  It's unreliable, so it's simply an optimization.  
> > Guest
> >* will still kick if it's out of buffers. */
> > @@ -53,6 +60,23 @@
> >* optimization.  */
> >   #define VRING_AVAIL_F_NO_INTERRUPT1
> > +/* Enable events in packed ring. */
> > +#define VRING_PACKED_EVENT_FLAG_ENABLE 0x0
> > +/* Disable events in packed ring. */
> > +#define VRING_PACKED_EVENT_FLAG_DISABLE0x1
> > +/*
> > + * Enable events for a specific descriptor in packed ring.
> > + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> > + * Only valid if VIRTIO_RING_F_EVENT_IDX has been negotiated.
> > + */
> > +#define VRING_PACKED_EVENT_FLAG_DESC   0x2
> 
> 
> Any reason for using _FLAG_ instead of _F_?

Yeah, it was suggested to not use _F_, as these are values,
should not have _F_:

https://patchwork.ozlabs.org/patch/942296/#1989390
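
In other words (sketch only, helper name made up): the _FLAG_ constants
are complete values written into the flags field, not bits to OR
together:

static inline void packed_disable_events(struct vring_packed_desc_event *driver)
{
	/* Write the whole value (0, 1 or 2); these are not bit masks. */
	driver->flags = cpu_to_le16(VRING_PACKED_EVENT_FLAG_DISABLE);
}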

Regards,
Tiwei

> 
> Thanks
> 
> 
> > +
> > +/*
> > + * Wrap counter bit shift in event suppression structure
> > + * of packed ring.
> > + */
> > +#define VRING_PACKED_EVENT_F_WRAP_CTR  15
> > +
> >   /* We support indirect buffer descriptors */
> >   #define VIRTIO_RING_F_INDIRECT_DESC   28
> > @@ -171,4 +195,32 @@ static inline int vring_need_event(__u16 event_idx, 
> > __u16 new_idx, __u16 old)
> > return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
> >   }
> > +struct vring_packed_desc_event {
> > +   /* Descriptor Ring Change Event Offset/Wrap Counter. */
> > +   __le16 off_wrap;
> > +   /* Descriptor Ring Change Event Flags. */
> > +   __le16 flags;
> > +};
> > +
> > +struct vring_packed_desc {
> > +   /* Buffer Address. */
> > +   __le64 addr;
> > +   /* Buffer Length. */
> > +   __le32 len;
> > +   /* Buffer ID. */
> > +   __le16 id;
> > +   /* The flags depending on descriptor type. */
> > +   __le16 flags;
> > +};
> > +
> > +struct vring_packed {
> > +   unsigned int num;
> > +
> > +   struct vring_packed_desc *desc;
> > +
> > +   struct vring_packed_desc_event *driver;
> > +
> > +   struct vring_packed_desc_event *device;
> > +};
> > +
> >   #endif /* _UAPI_LINUX_VIRTIO_RING_H */

[PATCH net-next v3 13/13] virtio_ring: advertize packed ring layout

2018-11-21 Thread Tiwei Bie
Advertize the packed ring layout support.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 40e4d3798d16..cd7e755484e3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2211,6 +2211,8 @@ void vring_transport_features(struct virtio_device *vdev)
break;
case VIRTIO_F_IOMMU_PLATFORM:
break;
+   case VIRTIO_F_RING_PACKED:
+   break;
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
-- 
2.14.5



[PATCH net-next v3 12/13] virtio_ring: disable packed ring on unsupported transports

2018-11-21 Thread Tiwei Bie
Currently, ccw, vop and remoteproc need some legacy virtio
APIs to create or access virtio rings, which are not supported
by packed ring. So disable packed ring on these transports
for now.

Signed-off-by: Tiwei Bie 
---
 drivers/misc/mic/vop/vop_main.c| 13 +
 drivers/remoteproc/remoteproc_virtio.c | 13 +
 drivers/s390/virtio/virtio_ccw.c   | 14 ++
 3 files changed, 40 insertions(+)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 3633202e18f4..6b212c8b78e7 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -129,6 +129,16 @@ static u64 vop_get_features(struct virtio_device *vdev)
return features;
 }
 
+static void vop_transport_features(struct virtio_device *vdev)
+{
+   /*
+* Packed ring isn't enabled on virtio_vop for now,
+* because virtio_vop uses vring_new_virtqueue() which
+* creates virtio rings on preallocated memory.
+*/
+   __virtio_clear_bit(vdev, VIRTIO_F_RING_PACKED);
+}
+
 static int vop_finalize_features(struct virtio_device *vdev)
 {
unsigned int i, bits;
@@ -141,6 +151,9 @@ static int vop_finalize_features(struct virtio_device *vdev)
/* Give virtio_ring a chance to accept features. */
vring_transport_features(vdev);
 
+   /* Give virtio_vop a chance to accept features. */
+   vop_transport_features(vdev);
+
memset_io(out_features, 0, feature_len);
bits = min_t(unsigned, feature_len,
 sizeof(vdev->features)) * 8;
diff --git a/drivers/remoteproc/remoteproc_virtio.c 
b/drivers/remoteproc/remoteproc_virtio.c
index de21f620b882..183fc42a510a 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -214,6 +214,16 @@ static u64 rproc_virtio_get_features(struct virtio_device 
*vdev)
return rsc->dfeatures;
 }
 
+static void rproc_transport_features(struct virtio_device *vdev)
+{
+   /*
+* Packed ring isn't enabled on remoteproc for now,
+* because remoteproc uses vring_new_virtqueue() which
+* creates virtio rings on preallocated memory.
+*/
+   __virtio_clear_bit(vdev, VIRTIO_F_RING_PACKED);
+}
+
 static int rproc_virtio_finalize_features(struct virtio_device *vdev)
 {
struct rproc_vdev *rvdev = vdev_to_rvdev(vdev);
@@ -224,6 +234,9 @@ static int rproc_virtio_finalize_features(struct 
virtio_device *vdev)
/* Give virtio_ring a chance to accept features */
vring_transport_features(vdev);
 
+   /* Give virtio_rproc a chance to accept features. */
+   rproc_transport_features(vdev);
+
/* Make sure we don't have any features > 32 bits! */
BUG_ON((u32)vdev->features != vdev->features);
 
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 97b6f197f007..406d1f64ad65 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -765,6 +765,17 @@ static u64 virtio_ccw_get_features(struct virtio_device 
*vdev)
return rc;
 }
 
+static void ccw_transport_features(struct virtio_device *vdev)
+{
+   /*
+* Packed ring isn't enabled on virtio_ccw for now,
+* because virtio_ccw uses some legacy accessors,
+* e.g. virtqueue_get_avail() and virtqueue_get_used()
+* which aren't available in packed ring currently.
+*/
+   __virtio_clear_bit(vdev, VIRTIO_F_RING_PACKED);
+}
+
 static int virtio_ccw_finalize_features(struct virtio_device *vdev)
 {
struct virtio_ccw_device *vcdev = to_vc_device(vdev);
@@ -791,6 +802,9 @@ static int virtio_ccw_finalize_features(struct 
virtio_device *vdev)
/* Give virtio_ring a chance to accept features. */
vring_transport_features(vdev);
 
+   /* Give virtio_ccw a chance to accept features. */
+   ccw_transport_features(vdev);
+
features->index = 0;
features->features = cpu_to_le32((u32)vdev->features);
/* Write the first half of the feature bits to the host. */
-- 
2.14.5



[PATCH net-next v3 11/13] virtio_ring: leverage event idx in packed ring

2018-11-21 Thread Tiwei Bie
Leverage the EVENT_IDX feature in packed ring to suppress
events when it's available.
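
The key encoding used below: the 16-bit off_wrap word carries the event
index in bits 0..14 and the wrap counter in bit 15
(VRING_PACKED_EVENT_F_WRAP_CTR). A condensed sketch of the decode, which
mirrors the logic added to virtqueue_kick_prepare_packed() (the helper
itself is not in the patch):

static inline void packed_event_decode(u16 off_wrap,
				       u16 *event_idx, bool *wrap_counter)
{
	*wrap_counter = off_wrap >> VRING_PACKED_EVENT_F_WRAP_CTR;
	*event_idx    = off_wrap & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR);
}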

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 77 
 1 file changed, 71 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b63eee2034e7..40e4d3798d16 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1222,7 +1222,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
 static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
-   u16 flags;
+   u16 new, old, off_wrap, flags, wrap_counter, event_idx;
bool needs_kick;
union {
struct {
@@ -1240,6 +1240,8 @@ static bool virtqueue_kick_prepare_packed(struct 
virtqueue *_vq)
 */
virtio_mb(vq->weak_barriers);
 
+   old = vq->packed.next_avail_idx - vq->num_added;
+   new = vq->packed.next_avail_idx;
vq->num_added = 0;
 
snapshot.u32 = *(u32 *)vq->packed.vring.device;
@@ -1248,7 +1250,20 @@ static bool virtqueue_kick_prepare_packed(struct 
virtqueue *_vq)
LAST_ADD_TIME_CHECK(vq);
LAST_ADD_TIME_INVALID(vq);
 
-   needs_kick = (flags != VRING_PACKED_EVENT_FLAG_DISABLE);
+   if (flags != VRING_PACKED_EVENT_FLAG_DESC) {
+   needs_kick = (flags != VRING_PACKED_EVENT_FLAG_DISABLE);
+   goto out;
+   }
+
+   off_wrap = le16_to_cpu(snapshot.off_wrap);
+
+   wrap_counter = off_wrap >> VRING_PACKED_EVENT_F_WRAP_CTR;
+   event_idx = off_wrap & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR);
+   if (wrap_counter != vq->packed.avail_wrap_counter)
+   event_idx -= vq->packed.vring.num;
+
+   needs_kick = vring_need_event(event_idx, new, old);
+out:
END_USE(vq);
return needs_kick;
 }
@@ -1365,6 +1380,18 @@ static void *virtqueue_get_buf_ctx_packed(struct 
virtqueue *_vq,
vq->packed.used_wrap_counter ^= 1;
}
 
+   /*
+* If we expect an interrupt for the next entry, tell host
+* by writing event index and flush out the write before
+* the read in the next get_buf call.
+*/
+   if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DESC)
+   virtio_store_mb(vq->weak_barriers,
+   &vq->packed.vring.driver->off_wrap,
+   cpu_to_le16(vq->last_used_idx |
+   (vq->packed.used_wrap_counter <<
+VRING_PACKED_EVENT_F_WRAP_CTR)));
+
LAST_ADD_TIME_INVALID(vq);
 
END_USE(vq);
@@ -1393,8 +1420,22 @@ static unsigned 
virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
 * more to do.
 */
 
+   if (vq->event) {
+   vq->packed.vring.driver->off_wrap =
+   cpu_to_le16(vq->last_used_idx |
+   (vq->packed.used_wrap_counter <<
+VRING_PACKED_EVENT_F_WRAP_CTR));
+   /*
+* We need to update event offset and event wrap
+* counter first before updating event flags.
+*/
+   virtio_wmb(vq->weak_barriers);
+   }
+
if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {
-   vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_ENABLE;
+   vq->packed.event_flags_shadow = vq->event ?
+   VRING_PACKED_EVENT_FLAG_DESC :
+   VRING_PACKED_EVENT_FLAG_ENABLE;
vq->packed.vring.driver->flags =
cpu_to_le16(vq->packed.event_flags_shadow);
}
@@ -1420,6 +1461,7 @@ static bool virtqueue_enable_cb_delayed_packed(struct 
virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
u16 used_idx, wrap_counter;
+   u16 bufs;
 
START_USE(vq);
 
@@ -1428,11 +1470,34 @@ static bool virtqueue_enable_cb_delayed_packed(struct 
virtqueue *_vq)
 * more to do.
 */
 
-   used_idx = vq->last_used_idx;
-   wrap_counter = vq->packed.used_wrap_counter;
+   if (vq->event) {
+   /* TODO: tune this threshold */
+   bufs = (vq->packed.vring.num - vq->vq.num_free) * 3 / 4;
+   wrap_counter = vq->packed.used_wrap_counter;
+
+   used_idx = vq->last_used_idx + bufs;
+   if (used_idx >= vq->packed.vring.num) {
+   used_idx -= vq->packed.vring.num;
+   wrap_counter ^= 1;
+   }
+
+   vq->packed.vring.driver->off_wrap = cpu_to_le16(used_idx |
+   (wrap_counter << VRIN

[PATCH net-next v3 10/13] virtio_ring: introduce packed ring support

2018-11-21 Thread Tiwei Bie
Introduce the packed ring support. A packed ring can only be
created by vring_create_virtqueue(), and each chunk of the packed
ring is allocated individually. Packed rings cannot currently be
created on preallocated memory by vring_new_virtqueue() or the like.
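
As background for the diff below, the heart of the packed ring is the
per-descriptor AVAIL/USED flag pair checked against the ring's wrap
counter; a condensed sketch of that check (not lifted verbatim from the
patch):

static inline bool desc_is_used(const struct vring_packed_desc *desc,
				bool used_wrap_counter)
{
	u16 flags  = le16_to_cpu(desc->flags);
	bool avail = !!(flags & (1 << VRING_PACKED_DESC_F_AVAIL));
	bool used  = !!(flags & (1 << VRING_PACKED_DESC_F_USED));

	/* A descriptor is used once the device has flipped both bits to
	 * match the driver's used wrap counter. */
	return avail == used && used == used_wrap_counter;
}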

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 900 +--
 1 file changed, 870 insertions(+), 30 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index aafe1969b45e..b63eee2034e7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -83,9 +83,26 @@ struct vring_desc_state_split {
struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
 };
 
+struct vring_desc_state_packed {
+   void *data; /* Data for callback. */
+   struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
+   u16 num;/* Descriptor list length. */
+   u16 next;   /* The next desc state in a list. */
+   u16 last;   /* The last desc state in a list. */
+};
+
+struct vring_desc_extra_packed {
+   dma_addr_t addr;/* Buffer DMA addr. */
+   u32 len;/* Buffer length. */
+   u16 flags;  /* Descriptor flags. */
+};
+
 struct vring_virtqueue {
struct virtqueue vq;
 
+   /* Is this a packed ring? */
+   bool packed_ring;
+
/* Is DMA API used? */
bool use_dma_api;
 
@@ -109,23 +126,64 @@ struct vring_virtqueue {
/* Last used index we've seen. */
u16 last_used_idx;
 
-   struct {
-   /* Actual memory layout for this queue */
-   struct vring vring;
+   union {
+   /* Available for split ring */
+   struct {
+   /* Actual memory layout for this queue. */
+   struct vring vring;
 
-   /* Last written value to avail->flags */
-   u16 avail_flags_shadow;
+   /* Last written value to avail->flags */
+   u16 avail_flags_shadow;
 
-   /* Last written value to avail->idx in guest byte order */
-   u16 avail_idx_shadow;
+   /*
+* Last written value to avail->idx in
+* guest byte order.
+*/
+   u16 avail_idx_shadow;
 
-   /* Per-descriptor state. */
-   struct vring_desc_state_split *desc_state;
+   /* Per-descriptor state. */
+   struct vring_desc_state_split *desc_state;
 
-   /* DMA, allocation, and size information */
-   size_t queue_size_in_bytes;
-   dma_addr_t queue_dma_addr;
-   } split;
+   /* DMA address and size information */
+   dma_addr_t queue_dma_addr;
+   size_t queue_size_in_bytes;
+   } split;
+
+   /* Available for packed ring */
+   struct {
+   /* Actual memory layout for this queue. */
+   struct vring_packed vring;
+
+   /* Driver ring wrap counter. */
+   bool avail_wrap_counter;
+
+   /* Device ring wrap counter. */
+   bool used_wrap_counter;
+
+   /* Avail used flags. */
+   u16 avail_used_flags;
+
+   /* Index of the next avail descriptor. */
+   u16 next_avail_idx;
+
+   /*
+* Last written value to driver->flags in
+* guest byte order.
+*/
+   u16 event_flags_shadow;
+
+   /* Per-descriptor state. */
+   struct vring_desc_state_packed *desc_state;
+   struct vring_desc_extra_packed *desc_extra;
+
+   /* DMA address and size information */
+   dma_addr_t ring_dma_addr;
+   dma_addr_t driver_event_dma_addr;
+   dma_addr_t device_event_dma_addr;
+   size_t ring_size_in_bytes;
+   size_t event_size_in_bytes;
+   } packed;
+   };
 
/* How to notify other side. FIXME: commonalize hcalls! */
bool (*notify)(struct virtqueue *vq);
@@ -840,6 +898,717 @@ static struct virtqueue *vring_create_virtqueue_split(
 }
 
 
+/*
+ * Packed ring specific functions - *_packed().
+ */
+
+static void vring_unmap_state_packed(const struct vring_virtqueue *vq,
+struct vring_desc_extra_packed *state)
+{
+   u16 flags;
+
+   if (!vq->use_dma_api)
+   retur

[PATCH net-next v3 09/13] virtio_ring: cache whether we will use DMA API

2018-11-21 Thread Tiwei Bie
Cache whether we will use the DMA API, instead of doing the
check every time. We are going to need this check more often
in the packed ring code.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d00a87909a7e..aafe1969b45e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -86,6 +86,9 @@ struct vring_desc_state_split {
 struct vring_virtqueue {
struct virtqueue vq;
 
+   /* Is DMA API used? */
+   bool use_dma_api;
+
/* Can we use weak barriers? */
bool weak_barriers;
 
@@ -262,7 +265,7 @@ static dma_addr_t vring_map_one_sg(const struct 
vring_virtqueue *vq,
   struct scatterlist *sg,
   enum dma_data_direction direction)
 {
-   if (!vring_use_dma_api(vq->vq.vdev))
+   if (!vq->use_dma_api)
return (dma_addr_t)sg_phys(sg);
 
/*
@@ -279,7 +282,7 @@ static dma_addr_t vring_map_single(const struct 
vring_virtqueue *vq,
   void *cpu_addr, size_t size,
   enum dma_data_direction direction)
 {
-   if (!vring_use_dma_api(vq->vq.vdev))
+   if (!vq->use_dma_api)
return (dma_addr_t)virt_to_phys(cpu_addr);
 
return dma_map_single(vring_dma_dev(vq),
@@ -289,7 +292,7 @@ static dma_addr_t vring_map_single(const struct 
vring_virtqueue *vq,
 static int vring_mapping_error(const struct vring_virtqueue *vq,
   dma_addr_t addr)
 {
-   if (!vring_use_dma_api(vq->vq.vdev))
+   if (!vq->use_dma_api)
return 0;
 
return dma_mapping_error(vring_dma_dev(vq), addr);
@@ -305,7 +308,7 @@ static void vring_unmap_one_split(const struct 
vring_virtqueue *vq,
 {
u16 flags;
 
-   if (!vring_use_dma_api(vq->vq.vdev))
+   if (!vq->use_dma_api)
return;
 
flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
@@ -1202,6 +1205,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
vq->broken = false;
vq->last_used_idx = 0;
vq->num_added = 0;
+   vq->use_dma_api = vring_use_dma_api(vdev);
list_add_tail(&vq->vq.list, &vdev->vqs);
 #ifdef DEBUG
vq->in_use = false;
-- 
2.14.5



[PATCH net-next v3 08/13] virtio_ring: extract split ring handling from ring creation

2018-11-21 Thread Tiwei Bie
Introduce a dedicated function to create the split ring, and
move the DMA allocation and size information into the .split
sub-structure.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 220 ---
 1 file changed, 121 insertions(+), 99 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index acd851f3105c..d00a87909a7e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -118,6 +118,10 @@ struct vring_virtqueue {
 
/* Per-descriptor state. */
struct vring_desc_state_split *desc_state;
+
+   /* DMA, allocation, and size information */
+   size_t queue_size_in_bytes;
+   dma_addr_t queue_dma_addr;
} split;
 
/* How to notify other side. FIXME: commonalize hcalls! */
@@ -125,8 +129,6 @@ struct vring_virtqueue {
 
/* DMA, allocation, and size information */
bool we_own_ring;
-   size_t queue_size_in_bytes;
-   dma_addr_t queue_dma_addr;
 
 #ifdef DEBUG
/* They're supposed to lock for us. */
@@ -203,6 +205,48 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
return false;
 }
 
+static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flag)
+{
+   if (vring_use_dma_api(vdev)) {
+   return dma_alloc_coherent(vdev->dev.parent, size,
+ dma_handle, flag);
+   } else {
+   void *queue = alloc_pages_exact(PAGE_ALIGN(size), flag);
+
+   if (queue) {
+   phys_addr_t phys_addr = virt_to_phys(queue);
+   *dma_handle = (dma_addr_t)phys_addr;
+
+   /*
+* Sanity check: make sure we dind't truncate
+* the address.  The only arches I can find that
+* have 64-bit phys_addr_t but 32-bit dma_addr_t
+* are certain non-highmem MIPS and x86
+* configurations, but these configurations
+* should never allocate physical pages above 32
+* bits, so this is fine.  Just in case, throw a
+* warning and abort if we end up with an
+* unrepresentable address.
+*/
+   if (WARN_ON_ONCE(*dma_handle != phys_addr)) {
+   free_pages_exact(queue, PAGE_ALIGN(size));
+   return NULL;
+   }
+   }
+   return queue;
+   }
+}
+
+static void vring_free_queue(struct virtio_device *vdev, size_t size,
+void *queue, dma_addr_t dma_handle)
+{
+   if (vring_use_dma_api(vdev))
+   dma_free_coherent(vdev->dev.parent, size, queue, dma_handle);
+   else
+   free_pages_exact(queue, PAGE_ALIGN(size));
+}
+
 /*
  * The DMA ops on various arches are rather gnarly right now, and
  * making all of the arch DMA ops work on the vring device itself
@@ -730,6 +774,68 @@ static void *virtqueue_detach_unused_buf_split(struct 
virtqueue *_vq)
return NULL;
 }
 
+static struct virtqueue *vring_create_virtqueue_split(
+   unsigned int index,
+   unsigned int num,
+   unsigned int vring_align,
+   struct virtio_device *vdev,
+   bool weak_barriers,
+   bool may_reduce_num,
+   bool context,
+   bool (*notify)(struct virtqueue *),
+   void (*callback)(struct virtqueue *),
+   const char *name)
+{
+   struct virtqueue *vq;
+   void *queue = NULL;
+   dma_addr_t dma_addr;
+   size_t queue_size_in_bytes;
+   struct vring vring;
+
+   /* We assume num is a power of 2. */
+   if (num & (num - 1)) {
+   dev_warn(&vdev->dev, "Bad virtqueue length %u\n", num);
+   return NULL;
+   }
+
+   /* TODO: allocate each queue chunk individually */
+   for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
+   queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+ &dma_addr,
+ GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
+   if (queue)
+   break;
+   }
+
+   if (!num)
+   return NULL;
+
+   if (!queue) {
+   /* Try to get a single page. You are my only hope! */
+   queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+ &dma_addr, GFP_KERNEL|__GFP_ZERO);
+   }
+   if (!queue)
+   return NULL;
+
+   queue_size_in_bytes = vring_size(num, vring_align);
+   vring_init(&vring, num, queue, vring_align);
+
+   vq = __vring_new_virtqueue(index, v

[PATCH net-next v3 06/13] virtio_ring: introduce helper for indirect feature

2018-11-21 Thread Tiwei Bie
Introduce a helper to check whether we will use the indirect
descriptor feature. It will be used by the packed ring too.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 10d407910aa2..d1076f28c7e9 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -145,6 +145,18 @@ struct vring_virtqueue {
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
+ unsigned int total_sg)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   /*
+* If the host supports indirect descriptor tables, and we have multiple
+* buffers, then go indirect. FIXME: tune this threshold
+*/
+   return (vq->indirect && total_sg > 1 && vq->vq.num_free);
+}
+
 /*
  * Modern virtio devices have feature bits to specify whether they need a
  * quirk and bypass the IOMMU. If not there, just use the DMA API.
@@ -324,9 +336,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
head = vq->free_head;
 
-   /* If the host supports indirect descriptor tables, and we have multiple
-* buffers, then go indirect. FIXME: tune this threshold */
-   if (vq->indirect && total_sg > 1 && vq->vq.num_free)
+   if (virtqueue_use_indirect(_vq, total_sg))
desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
-- 
2.14.5



[PATCH net-next v3 05/13] virtio_ring: introduce debug helpers

2018-11-21 Thread Tiwei Bie
Introduce debug helpers for last_add_time update, check and
invalidation. They will be used by the packed ring too.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 49 
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0b97e5c79654..10d407910aa2 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -44,6 +44,26 @@
} while (0)
 #define END_USE(_vq) \
do { BUG_ON(!(_vq)->in_use); (_vq)->in_use = 0; } while(0)
+#define LAST_ADD_TIME_UPDATE(_vq)  \
+   do {\
+   ktime_t now = ktime_get();  \
+   \
+   /* No kick or get, with .1 second between?  Warn. */ \
+   if ((_vq)->last_add_time_valid) \
+   WARN_ON(ktime_to_ms(ktime_sub(now,  \
+   (_vq)->last_add_time)) > 100);  \
+   (_vq)->last_add_time = now; \
+   (_vq)->last_add_time_valid = true;  \
+   } while (0)
+#define LAST_ADD_TIME_CHECK(_vq)   \
+   do {\
+   if ((_vq)->last_add_time_valid) {   \
+   WARN_ON(ktime_to_ms(ktime_sub(ktime_get(), \
+ (_vq)->last_add_time)) > 100); \
+   }   \
+   } while (0)
+#define LAST_ADD_TIME_INVALID(_vq) \
+   ((_vq)->last_add_time_valid = false)
 #else
 #define BAD_RING(_vq, fmt, args...)\
do {\
@@ -53,6 +73,9 @@
} while (0)
 #define START_USE(vq)
 #define END_USE(vq)
+#define LAST_ADD_TIME_UPDATE(vq)
+#define LAST_ADD_TIME_CHECK(vq)
+#define LAST_ADD_TIME_INVALID(vq)
 #endif
 
 struct vring_desc_state {
@@ -295,18 +318,7 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
return -EIO;
}
 
-#ifdef DEBUG
-   {
-   ktime_t now = ktime_get();
-
-   /* No kick or get, with .1 second between?  Warn. */
-   if (vq->last_add_time_valid)
-   WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
-   > 100);
-   vq->last_add_time = now;
-   vq->last_add_time_valid = true;
-   }
-#endif
+   LAST_ADD_TIME_UPDATE(vq);
 
BUG_ON(total_sg == 0);
 
@@ -467,13 +479,8 @@ static bool virtqueue_kick_prepare_split(struct virtqueue 
*_vq)
new = vq->split.avail_idx_shadow;
vq->num_added = 0;
 
-#ifdef DEBUG
-   if (vq->last_add_time_valid) {
-   WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
- vq->last_add_time)) > 100);
-   }
-   vq->last_add_time_valid = false;
-#endif
+   LAST_ADD_TIME_CHECK(vq);
+   LAST_ADD_TIME_INVALID(vq);
 
if (vq->event) {
needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev,
@@ -597,9 +604,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue 
*_vq,
&vring_used_event(&vq->split.vring),
cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
 
-#ifdef DEBUG
-   vq->last_add_time_valid = false;
-#endif
+   LAST_ADD_TIME_INVALID(vq);
 
END_USE(vq);
return ret;
-- 
2.14.5



[PATCH net-next v3 04/13] virtio_ring: put split ring fields in a sub struct

2018-11-21 Thread Tiwei Bie
Put the split ring specific fields in a sub-struct named
"split" to avoid misuse after introducing the packed ring.
There is no functional change.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 156 +--
 1 file changed, 91 insertions(+), 65 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 7cd40a2a0d21..0b97e5c79654 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -63,9 +63,6 @@ struct vring_desc_state {
 struct vring_virtqueue {
struct virtqueue vq;
 
-   /* Actual memory layout for this queue */
-   struct vring vring;
-
/* Can we use weak barriers? */
bool weak_barriers;
 
@@ -86,11 +83,16 @@ struct vring_virtqueue {
/* Last used index we've seen. */
u16 last_used_idx;
 
-   /* Last written value to avail->flags */
-   u16 avail_flags_shadow;
+   struct {
+   /* Actual memory layout for this queue */
+   struct vring vring;
 
-   /* Last written value to avail->idx in guest byte order */
-   u16 avail_idx_shadow;
+   /* Last written value to avail->flags */
+   u16 avail_flags_shadow;
+
+   /* Last written value to avail->idx in guest byte order */
+   u16 avail_idx_shadow;
+   } split;
 
/* How to notify other side. FIXME: commonalize hcalls! */
bool (*notify)(struct virtqueue *vq);
@@ -316,7 +318,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
-   WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
+   WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
}
 
if (desc) {
@@ -327,7 +329,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
descs_used = 1;
} else {
indirect = false;
-   desc = vq->vring.desc;
+   desc = vq->split.vring.desc;
i = head;
descs_used = total_sg;
}
@@ -383,10 +385,13 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
if (vring_mapping_error(vq, addr))
goto unmap_release;
 
-   vq->vring.desc[head].flags = cpu_to_virtio16(_vq->vdev, 
VRING_DESC_F_INDIRECT);
-   vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
+   vq->split.vring.desc[head].flags = cpu_to_virtio16(_vq->vdev,
+   VRING_DESC_F_INDIRECT);
+   vq->split.vring.desc[head].addr = cpu_to_virtio64(_vq->vdev,
+   addr);
 
-   vq->vring.desc[head].len = cpu_to_virtio32(_vq->vdev, total_sg 
* sizeof(struct vring_desc));
+   vq->split.vring.desc[head].len = cpu_to_virtio32(_vq->vdev,
+   total_sg * sizeof(struct vring_desc));
}
 
/* We're using some buffers from the free list. */
@@ -394,7 +399,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
/* Update free pointer */
if (indirect)
-   vq->free_head = virtio16_to_cpu(_vq->vdev, 
vq->vring.desc[head].next);
+   vq->free_head = virtio16_to_cpu(_vq->vdev,
+   vq->split.vring.desc[head].next);
else
vq->free_head = i;
 
@@ -407,14 +413,15 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
 
/* Put entry in available array (but don't update avail->idx until they
 * do sync). */
-   avail = vq->avail_idx_shadow & (vq->vring.num - 1);
-   vq->vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
+   avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
+   vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
 
/* Descriptors and available array need to be set before we expose the
 * new available array entries. */
virtio_wmb(vq->weak_barriers);
-   vq->avail_idx_shadow++;
-   vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
+   vq->split.avail_idx_shadow++;
+   vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
+   vq->split.avail_idx_shadow);
vq->num_added++;
 
pr_debug("Added buffer head %i to %p\n", head, vq);
@@ -435,7 +442,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
if (i == err_idx)
break;
vring_unmap_one_split(vq, &desc[i]);
-   i = virtio16_to_cpu(_vq->v

[PATCH net-next v3 03/13] virtio_ring: put split ring functions together

2018-11-21 Thread Tiwei Bie
Put the xxx_split() functions together to make the
code more readable and avoid misuse after introducing
the packed ring. There is no functional change.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 587 ++-
 1 file changed, 302 insertions(+), 285 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 29fab2fb39cb..7cd40a2a0d21 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -113,6 +113,11 @@ struct vring_virtqueue {
struct vring_desc_state desc_state[];
 };
 
+
+/*
+ * Helpers.
+ */
+
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
 /*
@@ -200,6 +205,20 @@ static dma_addr_t vring_map_single(const struct 
vring_virtqueue *vq,
  cpu_addr, size, direction);
 }
 
+static int vring_mapping_error(const struct vring_virtqueue *vq,
+  dma_addr_t addr)
+{
+   if (!vring_use_dma_api(vq->vq.vdev))
+   return 0;
+
+   return dma_mapping_error(vring_dma_dev(vq), addr);
+}
+
+
+/*
+ * Split ring specific functions - *_split().
+ */
+
 static void vring_unmap_one_split(const struct vring_virtqueue *vq,
  struct vring_desc *desc)
 {
@@ -225,15 +244,6 @@ static void vring_unmap_one_split(const struct 
vring_virtqueue *vq,
}
 }
 
-static int vring_mapping_error(const struct vring_virtqueue *vq,
-  dma_addr_t addr)
-{
-   if (!vring_use_dma_api(vq->vq.vdev))
-   return 0;
-
-   return dma_mapping_error(vring_dma_dev(vq), addr);
-}
-
 static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
   unsigned int total_sg,
   gfp_t gfp)
@@ -435,121 +445,6 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
return -EIO;
 }
 
-static inline int virtqueue_add(struct virtqueue *_vq,
-   struct scatterlist *sgs[],
-   unsigned int total_sg,
-   unsigned int out_sgs,
-   unsigned int in_sgs,
-   void *data,
-   void *ctx,
-   gfp_t gfp)
-{
-   return virtqueue_add_split(_vq, sgs, total_sg,
-  out_sgs, in_sgs, data, ctx, gfp);
-}
-
-/**
- * virtqueue_add_sgs - expose buffers to other end
- * @vq: the struct virtqueue we're talking about.
- * @sgs: array of terminated scatterlists.
- * @out_num: the number of scatterlists readable by other side
- * @in_num: the number of scatterlists which are writable (after readable ones)
- * @data: the token identifying the buffer.
- * @gfp: how to do memory allocations (if necessary).
- *
- * Caller must ensure we don't call this with other virtqueue operations
- * at the same time (except where noted).
- *
- * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
- */
-int virtqueue_add_sgs(struct virtqueue *_vq,
- struct scatterlist *sgs[],
- unsigned int out_sgs,
- unsigned int in_sgs,
- void *data,
- gfp_t gfp)
-{
-   unsigned int i, total_sg = 0;
-
-   /* Count them first. */
-   for (i = 0; i < out_sgs + in_sgs; i++) {
-   struct scatterlist *sg;
-   for (sg = sgs[i]; sg; sg = sg_next(sg))
-   total_sg++;
-   }
-   return virtqueue_add(_vq, sgs, total_sg, out_sgs, in_sgs,
-data, NULL, gfp);
-}
-EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
-
-/**
- * virtqueue_add_outbuf - expose output buffers to other end
- * @vq: the struct virtqueue we're talking about.
- * @sg: scatterlist (must be well-formed and terminated!)
- * @num: the number of entries in @sg readable by other side
- * @data: the token identifying the buffer.
- * @gfp: how to do memory allocations (if necessary).
- *
- * Caller must ensure we don't call this with other virtqueue operations
- * at the same time (except where noted).
- *
- * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
- */
-int virtqueue_add_outbuf(struct virtqueue *vq,
-struct scatterlist *sg, unsigned int num,
-void *data,
-gfp_t gfp)
-{
-   return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, gfp);
-}
-EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
-
-/**
- * virtqueue_add_inbuf - expose input buffers to other end
- * @vq: the struct virtqueue we're talking about.
- * @sg: scatterlist (must be well-formed and terminated!)
- * @num: the number of entries in @sg writable by other side
- * @data: the token identifying the buffer.
- * @gfp: how to do memory allocations (if necessary).
- *
- * Caller must ensure we don't call this w

[PATCH net-next v3 02/13] virtio_ring: add _split suffix for split ring functions

2018-11-21 Thread Tiwei Bie
Add _split suffix for split ring specific functions. This
is a preparation for introducing the packed ring support.
There is no functional change.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 269 ++-
 1 file changed, 164 insertions(+), 105 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 814b395007b2..29fab2fb39cb 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -200,8 +200,8 @@ static dma_addr_t vring_map_single(const struct 
vring_virtqueue *vq,
  cpu_addr, size, direction);
 }
 
-static void vring_unmap_one(const struct vring_virtqueue *vq,
-   struct vring_desc *desc)
+static void vring_unmap_one_split(const struct vring_virtqueue *vq,
+ struct vring_desc *desc)
 {
u16 flags;
 
@@ -234,8 +234,9 @@ static int vring_mapping_error(const struct vring_virtqueue 
*vq,
return dma_mapping_error(vring_dma_dev(vq), addr);
 }
 
-static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
-unsigned int total_sg, gfp_t gfp)
+static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+  unsigned int total_sg,
+  gfp_t gfp)
 {
struct vring_desc *desc;
unsigned int i;
@@ -256,14 +257,14 @@ static struct vring_desc *alloc_indirect(struct virtqueue 
*_vq,
return desc;
 }
 
-static inline int virtqueue_add(struct virtqueue *_vq,
-   struct scatterlist *sgs[],
-   unsigned int total_sg,
-   unsigned int out_sgs,
-   unsigned int in_sgs,
-   void *data,
-   void *ctx,
-   gfp_t gfp)
+static inline int virtqueue_add_split(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int total_sg,
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,
+ void *ctx,
+ gfp_t gfp)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
@@ -302,7 +303,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
/* If the host supports indirect descriptor tables, and we have multiple
 * buffers, then go indirect. FIXME: tune this threshold */
if (vq->indirect && total_sg > 1 && vq->vq.num_free)
-   desc = alloc_indirect(_vq, total_sg, gfp);
+   desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
@@ -423,7 +424,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
for (n = 0; n < total_sg; n++) {
if (i == err_idx)
break;
-   vring_unmap_one(vq, &desc[i]);
+   vring_unmap_one_split(vq, &desc[i]);
i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
}
 
@@ -434,6 +435,19 @@ static inline int virtqueue_add(struct virtqueue *_vq,
return -EIO;
 }
 
+static inline int virtqueue_add(struct virtqueue *_vq,
+   struct scatterlist *sgs[],
+   unsigned int total_sg,
+   unsigned int out_sgs,
+   unsigned int in_sgs,
+   void *data,
+   void *ctx,
+   gfp_t gfp)
+{
+   return virtqueue_add_split(_vq, sgs, total_sg,
+  out_sgs, in_sgs, data, ctx, gfp);
+}
+
 /**
  * virtqueue_add_sgs - expose buffers to other end
  * @vq: the struct virtqueue we're talking about.
@@ -536,18 +550,7 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
-/**
- * virtqueue_kick_prepare - first half of split virtqueue_kick call.
- * @vq: the struct virtqueue
- *
- * Instead of virtqueue_kick(), you can do:
- * if (virtqueue_kick_prepare(vq))
- * virtqueue_notify(vq);
- *
- * This is sometimes useful because the virtqueue_kick_prepare() needs
- * to be serialized, but the actual virtqueue_notify() call does not.
- */
-bool virtqueue_kick_prepare(struct virtqueue *_vq)
+static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
u16 new, old;
@@ -579,6 +582,22 @@ bool virtqueue_kick_prepare(struct virtqueue *

[PATCH net-next v3 01/13] virtio: add packed ring types and macros

2018-11-21 Thread Tiwei Bie
Add types and macros for packed ring.

Signed-off-by: Tiwei Bie 
---
 include/uapi/linux/virtio_config.h |  3 +++
 include/uapi/linux/virtio_ring.h   | 52 ++
 2 files changed, 55 insertions(+)

diff --git a/include/uapi/linux/virtio_config.h 
b/include/uapi/linux/virtio_config.h
index 449132c76b1c..1196e1c1d4f6 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -75,6 +75,9 @@
  */
 #define VIRTIO_F_IOMMU_PLATFORM33
 
+/* This feature indicates support for the packed virtqueue layout. */
+#define VIRTIO_F_RING_PACKED   34
+
 /*
  * Does the device support Single Root I/O Virtualization?
  */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5faa989b..2414f8af26b3 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -44,6 +44,13 @@
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT  4
 
+/*
+ * Mark a descriptor as available or used in packed ring.
+ * Notice: they are defined as shifts instead of shifted values.
+ */
+#define VRING_PACKED_DESC_F_AVAIL  7
+#define VRING_PACKED_DESC_F_USED   15
+
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
  * will still kick if it's out of buffers. */
@@ -53,6 +60,23 @@
  * optimization.  */
 #define VRING_AVAIL_F_NO_INTERRUPT 1
 
+/* Enable events in packed ring. */
+#define VRING_PACKED_EVENT_FLAG_ENABLE 0x0
+/* Disable events in packed ring. */
+#define VRING_PACKED_EVENT_FLAG_DISABLE0x1
+/*
+ * Enable events for a specific descriptor in packed ring.
+ * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
+ * Only valid if VIRTIO_RING_F_EVENT_IDX has been negotiated.
+ */
+#define VRING_PACKED_EVENT_FLAG_DESC   0x2
+
+/*
+ * Wrap counter bit shift in event suppression structure
+ * of packed ring.
+ */
+#define VRING_PACKED_EVENT_F_WRAP_CTR  15
+
 /* We support indirect buffer descriptors */
 #define VIRTIO_RING_F_INDIRECT_DESC28
 
@@ -171,4 +195,32 @@ static inline int vring_need_event(__u16 event_idx, __u16 
new_idx, __u16 old)
return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
 }
 
+struct vring_packed_desc_event {
+   /* Descriptor Ring Change Event Offset/Wrap Counter. */
+   __le16 off_wrap;
+   /* Descriptor Ring Change Event Flags. */
+   __le16 flags;
+};
+
+struct vring_packed_desc {
+   /* Buffer Address. */
+   __le64 addr;
+   /* Buffer Length. */
+   __le32 len;
+   /* Buffer ID. */
+   __le16 id;
+   /* The flags depending on descriptor type. */
+   __le16 flags;
+};
+
+struct vring_packed {
+   unsigned int num;
+
+   struct vring_packed_desc *desc;
+
+   struct vring_packed_desc_event *driver;
+
+   struct vring_packed_desc_event *device;
+};
+
 #endif /* _UAPI_LINUX_VIRTIO_RING_H */
-- 
2.14.5



Re: [PATCH net-next v3 00/13] virtio: support packed ring

2018-11-21 Thread Tiwei Bie
On Wed, Nov 21, 2018 at 07:20:27AM -0500, Michael S. Tsirkin wrote:
> On Wed, Nov 21, 2018 at 06:03:17PM +0800, Tiwei Bie wrote:
> > Hi,
> > 
> > This patch set implements packed ring support in virtio driver.
> > 
> > A performance test between pktgen (pktgen_sample03_burst_single_flow.sh)
> > and DPDK vhost (testpmd/rxonly/vhost-PMD) has been done, I saw
> > ~30% performance gain in packed ring in this case.
> 
> Thanks a lot, this is very exciting!
> Dave, given the holiday, attempts to wrap up the 1.1 spec and the
> patchset size I would very much appreciate a bit more time for
> review. Say until Nov 28?
> 
> > To make this patch set work with below patch set for vhost,
> > some hacks are needed to set the _F_NEXT flag in indirect
> > descriptors (this should be fixed in vhost):
> > 
> > https://lkml.org/lkml/2018/7/3/33
> 
> Could you pls clarify - do you mean it doesn't yet work with vhost
> because of a vhost bug, and to test it with the linked patches
> you had to hack in _F_NEXT? Because I do not see _F_NEXT
> in indirect descriptors in this patch (which is fine).
> Or did I miss it?

You didn't miss anything. :)

I think it's a small bug in vhost, which Jason may fix very
quickly, so I didn't post it. Below is the hack I used:

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cd7e755484e3..42faea7d8cf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -980,6 +980,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
unsigned int i, n, err_idx;
u16 head, id;
dma_addr_t addr;
+   int c = 0;
 
head = vq->packed.next_avail_idx;
desc = alloc_indirect_packed(total_sg, gfp);
@@ -1001,8 +1002,9 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
if (vring_mapping_error(vq, addr))
goto unmap_release;
 
-   desc[i].flags = cpu_to_le16(n < out_sgs ?
-   0 : VRING_DESC_F_WRITE);
+   desc[i].flags = cpu_to_le16((n < out_sgs ?
+   0 : VRING_DESC_F_WRITE) |
+   (++c == total_sg ? 0 : VRING_DESC_F_NEXT));
desc[i].addr = cpu_to_le64(addr);
desc[i].len = cpu_to_le32(sg->length);
i++;
-- 
2.14.1

> 
> > v2 -> v3:
> > - Use leXX instead of virtioXX (MST);
> > - Refactor split ring first (MST);
> > - Add debug helpers (MST);
> > - Put split/packed ring specific fields in sub structures (MST);
> > - Handle normal descriptors and indirect descriptors differently (MST);
> > - Track the DMA addr/len related info in a separate structure (MST);
> > - Calculate AVAIL/USED flags only when wrap counter wraps (MST);
> > - Define a struct/union to read event structure (MST);
> > - Define a macro for wrap counter bit in uapi (MST);
> > - Define the AVAIL/USED bits as shifts instead of values (MST);
> > - s/_F_/_FLAG_/ in VRING_PACKED_EVENT_* as they are values (MST);
> > - Drop the notify workaround for QEMU's tx-timer in packed ring (MST);
> > 
> > v1 -> v2:
> > - Use READ_ONCE() to read event off_wrap and flags together (Jason);
> > - Add comments related to ccw (Jason);
> > 
> > RFC v6 -> v1:
> > - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
> >   when event idx is off (Jason);
> > - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
> > - Test the state of the desc at used_idx instead of last_used_idx
> >   in virtqueue_enable_cb_delayed_packed() (Jason);
> > - Save wrap counter (as part of queue state) in the return value
> >   of virtqueue_enable_cb_prepare_packed();
> > - Refine the packed ring definitions in uapi;
> > - Rebase on the net-next tree;
> > 
> > RFC v5 -> RFC v6:
> > - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
> > - Define wrap counter as bool (Jason);
> > - Use ALIGN() in vring_init_packed() (Jason);
> > - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
> > - Add comments for barriers (Jason);
> > - Don't enable RING_PACKED on ccw for now (noticed by Jason);
> > - Refine the memory barrier in virtqueue_poll();
> > - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
> > - Remove the hacks in virtqueue_enable_cb_prepare_packed();
> > 
> > RFC v4 -> RFC v5:
> > - Save DMA addr, etc in desc state (Jason);
> > - Track used wrap counter;
> > 
> > RFC v3 -> RFC v4:
>

[PATCH net-next v3 00/13] virtio: support packed ring

2018-11-21 Thread Tiwei Bie
Hi,

This patch set implements packed ring support in virtio driver.

A performance test between pktgen (pktgen_sample03_burst_single_flow.sh)
and DPDK vhost (testpmd/rxonly/vhost-PMD) has been done, I saw
~30% performance gain in packed ring in this case.

To make this patch set work with below patch set for vhost,
some hacks are needed to set the _F_NEXT flag in indirect
descriptors (this should be fixed in vhost):

https://lkml.org/lkml/2018/7/3/33

v2 -> v3:
- Use leXX instead of virtioXX (MST);
- Refactor split ring first (MST);
- Add debug helpers (MST);
- Put split/packed ring specific fields in sub structures (MST);
- Handle normal descriptors and indirect descriptors differently (MST);
- Track the DMA addr/len related info in a separate structure (MST);
- Calculate AVAIL/USED flags only when wrap counter wraps (MST);
- Define a struct/union to read event structure (MST);
- Define a macro for wrap counter bit in uapi (MST);
- Define the AVAIL/USED bits as shifts instead of values (MST);
- s/_F_/_FLAG_/ in VRING_PACKED_EVENT_* as they are values (MST);
- Drop the notify workaround for QEMU's tx-timer in packed ring (MST);

v1 -> v2:
- Use READ_ONCE() to read event off_wrap and flags together (Jason);
- Add comments related to ccw (Jason);

RFC v6 -> v1:
- Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
  when event idx is off (Jason);
- Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
- Test the state of the desc at used_idx instead of last_used_idx
  in virtqueue_enable_cb_delayed_packed() (Jason);
- Save wrap counter (as part of queue state) in the return value
  of virtqueue_enable_cb_prepare_packed();
- Refine the packed ring definitions in uapi;
- Rebase on the net-next tree;

RFC v5 -> RFC v6:
- Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
- Define wrap counter as bool (Jason);
- Use ALIGN() in vring_init_packed() (Jason);
- Avoid using pointer to track `next` in detach_buf_packed() (Jason);
- Add comments for barriers (Jason);
- Don't enable RING_PACKED on ccw for now (noticed by Jason);
- Refine the memory barrier in virtqueue_poll();
- Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
- Remove the hacks in virtqueue_enable_cb_prepare_packed();

RFC v4 -> RFC v5:
- Save DMA addr, etc in desc state (Jason);
- Track used wrap counter;

RFC v3 -> RFC v4:
- Make ID allocation support out-of-order (Jason);
- Various fixes for EVENT_IDX support;

RFC v2 -> RFC v3:
- Split into small patches (Jason);
- Add helper virtqueue_use_indirect() (Jason);
- Just set id for the last descriptor of a list (Jason);
- Calculate the prev in virtqueue_add_packed() (Jason);
- Fix/improve desc suppression code (Jason/MST);
- Refine the code layout for XXX_split/packed and wrappers (MST);
- Fix the comments and API in uapi (MST);
- Remove the BUG_ON() for indirect (Jason);
- Some other refinements and bug fixes;

RFC v1 -> RFC v2:
- Add indirect descriptor support - compile test only;
- Add event suppression supprt - compile test only;
- Move vring_packed_init() out of uapi (Jason, MST);
- Merge two loops into one in virtqueue_add_packed() (Jason);
- Split vring_unmap_one() for packed ring and split ring (Jason);
- Avoid using '%' operator (Jason);
- Rename free_head -> next_avail_idx (Jason);
- Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
- Some other refinements and bug fixes;


Tiwei Bie (13):
  virtio: add packed ring types and macros
  virtio_ring: add _split suffix for split ring functions
  virtio_ring: put split ring functions together
  virtio_ring: put split ring fields in a sub struct
  virtio_ring: introduce debug helpers
  virtio_ring: introduce helper for indirect feature
  virtio_ring: allocate desc state for split ring separately
  virtio_ring: extract split ring handling from ring creation
  virtio_ring: cache whether we will use DMA API
  virtio_ring: introduce packed ring support
  virtio_ring: leverage event idx in packed ring
  virtio_ring: disable packed ring on unsupported transports
  virtio_ring: advertize packed ring layout

 drivers/misc/mic/vop/vop_main.c|   13 +
 drivers/remoteproc/remoteproc_virtio.c |   13 +
 drivers/s390/virtio/virtio_ccw.c   |   14 +
 drivers/virtio/virtio_ring.c   | 1811 +---
 include/uapi/linux/virtio_config.h |3 +
 include/uapi/linux/virtio_ring.h   |   52 +
 6 files changed, 1530 insertions(+), 376 deletions(-)

-- 
2.14.5



[PATCH net-next v3 07/13] virtio_ring: allocate desc state for split ring separately

2018-11-21 Thread Tiwei Bie
Put the split ring's desc state into the .split sub-structure,
and allocate desc state for split ring separately, this makes
the code more readable and more consistent with what we will
do for packed ring.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 45 ++--
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d1076f28c7e9..acd851f3105c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -78,7 +78,7 @@
 #define LAST_ADD_TIME_INVALID(vq)
 #endif
 
-struct vring_desc_state {
+struct vring_desc_state_split {
void *data; /* Data for callback. */
struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
 };
@@ -115,6 +115,9 @@ struct vring_virtqueue {
 
/* Last written value to avail->idx in guest byte order */
u16 avail_idx_shadow;
+
+   /* Per-descriptor state. */
+   struct vring_desc_state_split *desc_state;
} split;
 
/* How to notify other side. FIXME: commonalize hcalls! */
@@ -133,9 +136,6 @@ struct vring_virtqueue {
bool last_add_time_valid;
ktime_t last_add_time;
 #endif
-
-   /* Per-descriptor state. */
-   struct vring_desc_state desc_state[];
 };
 
 
@@ -427,11 +427,11 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
vq->free_head = i;
 
/* Store token and indirect buffer state. */
-   vq->desc_state[head].data = data;
+   vq->split.desc_state[head].data = data;
if (indirect)
-   vq->desc_state[head].indir_desc = desc;
+   vq->split.desc_state[head].indir_desc = desc;
else
-   vq->desc_state[head].indir_desc = ctx;
+   vq->split.desc_state[head].indir_desc = ctx;
 
/* Put entry in available array (but don't update avail->idx until they
 * do sync). */
@@ -512,7 +512,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
 
/* Clear data ptr. */
-   vq->desc_state[head].data = NULL;
+   vq->split.desc_state[head].data = NULL;
 
/* Put back on free list: unmap first-level descriptors and find end */
i = head;
@@ -532,7 +532,8 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
vq->vq.num_free++;
 
if (vq->indirect) {
-   struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
+   struct vring_desc *indir_desc =
+   vq->split.desc_state[head].indir_desc;
u32 len;
 
/* Free the indirect table, if any, now that it's unmapped. */
@@ -550,9 +551,9 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
vring_unmap_one_split(vq, &indir_desc[j]);
 
kfree(indir_desc);
-   vq->desc_state[head].indir_desc = NULL;
+   vq->split.desc_state[head].indir_desc = NULL;
} else if (ctx) {
-   *ctx = vq->desc_state[head].indir_desc;
+   *ctx = vq->split.desc_state[head].indir_desc;
}
 }
 
@@ -597,13 +598,13 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue 
*_vq,
BAD_RING(vq, "id %u out of range\n", i);
return NULL;
}
-   if (unlikely(!vq->desc_state[i].data)) {
+   if (unlikely(!vq->split.desc_state[i].data)) {
BAD_RING(vq, "id %u is not a head!\n", i);
return NULL;
}
 
/* detach_buf_split clears data, so grab it now. */
-   ret = vq->desc_state[i].data;
+   ret = vq->split.desc_state[i].data;
detach_buf_split(vq, i, ctx);
vq->last_used_idx++;
/* If we expect an interrupt for the next entry, tell host
@@ -711,10 +712,10 @@ static void *virtqueue_detach_unused_buf_split(struct 
virtqueue *_vq)
START_USE(vq);
 
for (i = 0; i < vq->split.vring.num; i++) {
-   if (!vq->desc_state[i].data)
+   if (!vq->split.desc_state[i].data)
continue;
/* detach_buf_split clears data, so grab it now. */
-   buf = vq->desc_state[i].data;
+   buf = vq->split.desc_state[i].data;
detach_buf_split(vq, i, NULL);
vq->split.avail_idx_shadow--;
vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
@@ -1080,8 +1081,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
unsigned int i;
struct vring_virtqueue *vq;
 
-   vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
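
The hunk is cut off at this point in the archive; per the commit message,
the rest of the change allocates the split ring's per-descriptor state on
its own instead of tacking it onto the end of the vring_virtqueue
allocation, roughly along these lines (a sketch, not the verbatim hunk):

	vq = kmalloc(sizeof(*vq), GFP_KERNEL);
	if (!vq)
		return NULL;

	/* Per-descriptor state now lives in its own zeroed allocation. */
	vq->split.desc_state = kcalloc(vring.num,
			sizeof(struct vring_desc_state_split), GFP_KERNEL);
	if (!vq->split.desc_state) {
		kfree(vq);
		return NULL;
	}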

Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support

2018-11-08 Thread Tiwei Bie
On Thu, Nov 08, 2018 at 10:56:02AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 08, 2018 at 07:51:48PM +0800, Tiwei Bie wrote:
> > On Thu, Nov 08, 2018 at 04:18:25PM +0800, Jason Wang wrote:
> > > 
> > > On 2018/11/8 9:38 AM, Tiwei Bie wrote:
> > > > > > +
> > > > > > +   if (vq->vq.num_free < descs_used) {
> > > > > > +   pr_debug("Can't add buf len %i - avail = %i\n",
> > > > > > +descs_used, vq->vq.num_free);
> > > > > > +   /* FIXME: for historical reasons, we force a notify 
> > > > > > here if
> > > > > > +* there are outgoing parts to the buffer.  Presumably 
> > > > > > the
> > > > > > +* host should service the ring ASAP. */
> > > > > I don't think we have a reason to do this for packed ring.
> > > > > No historical baggage there, right?
> > > > Based on the original commit log, it seems that the notify here
> > > > is just an "optimization". But I don't quite understand what
> > > > "the heuristics which KVM uses" refers to. If it's safe to drop
> > > > this in packed ring, I'd like to do it.
> > > 
> > > 
> > > According to the commit log, it seems like a workaround of lguest 
> > > networking
> > > backend.
> > 
> > Do you know why removing this notify in Tx will break "the
> > heuristics which KVM uses"? Or what does "the heuristics
> > which KVM uses" refer to?
> 
> Yes. QEMU has a mode where it disables notifications and processes TX
> ring periodically from a timer.  It's off by default but used to be on
> by default a long time ago. If ring becomes full this causes traffic
> stalls.  As a work-around Rusty put in this hack to kick on ring full
> even with notifications disabled.  It's easy enough to make sure QEMU
> does not combine devices with packed ring support with the timer hack.
> And I am guessing it's safe enough to also block that option completely
> e.g. when virtio 1.0 is enabled.

I see. Thanks!

> 
> > 
> > > I agree to drop it, we should not have such burden.
> > > 
> > > But we should notice that, with this removed, the compare between packed 
> > > vs
> > > split is kind of unfair. Consider the removal of lguest support recently,
> > > maybe we can drop this for split ring as well?
> > > 
> > > Thanks
> > > 
> > > 
> > > > 
> > > > commit 44653eae1407f79dff6f52fcf594ae84cb165ec4
> > > > Author: Rusty Russell
> > > > Date:   Fri Jul 25 12:06:04 2008 -0500
> > > > 
> > > >  virtio: don't always force a notification when ring is full
> > > >  We force notification when the ring is full, even if the host has
> > > >  indicated it doesn't want to know.  This seemed like a good idea at
> > > >  the time: if we fill the transmit ring, we should tell the host
> > > >  immediately.
> > > >  Unfortunately this logic also applies to the receiving ring, which 
> > > > is
> > > >  refilled constantly.  We should introduce real notification 
> > > > thresholds
> > > >  to replace this logic.  Meanwhile, removing the logic altogether 
> > > > breaks
> > > >  the heuristics which KVM uses, so we use a hack: only notify if 
> > > > there are
> > > >  outgoing parts of the new buffer.
> > > >  Here are the number of exits with lguest's crappy network 
> > > > implementation:
> > > >  Before:
> > > >  network xmit 7859051 recv 236420
> > > >  After:
> > > >  network xmit 7858610 recv 118136
> > > >  Signed-off-by: Rusty Russell
> > > > 
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 72bf8bc09014..21d9a62767af 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -87,8 +87,11 @@ static int vring_add_buf(struct virtqueue *_vq,
> > > > if (vq->num_free < out + in) {
> > > > pr_debug("Can't add buf len %i - avail = %i\n",
> > > >  out + in, vq->num_free);
> > > > -   /* We notify *even if* VRING_USED_F_NO_NOTIFY is set
> > > > here. */
> > > > -   vq->notify(&vq->vq);
> > > > +   /* FIXME: for historical reasons, we force a notify 
> > > > here if
> > > > +* there are outgoing parts to the buffer.  Presumably 
> > > > the
> > > > +* host should service the ring ASAP. */
> > > > +   if (out)
> > > > +   vq->notify(&vq->vq);
> > > > END_USE(vq);
> > > > return -ENOSPC;
> > > > }
> > > > 
> > > > 

Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support

2018-11-08 Thread Tiwei Bie
On Thu, Nov 08, 2018 at 04:18:25PM +0800, Jason Wang wrote:
> 
> On 2018/11/8 9:38 AM, Tiwei Bie wrote:
> > > > +
> > > > +   if (vq->vq.num_free < descs_used) {
> > > > +   pr_debug("Can't add buf len %i - avail = %i\n",
> > > > +descs_used, vq->vq.num_free);
> > > > +   /* FIXME: for historical reasons, we force a notify 
> > > > here if
> > > > +* there are outgoing parts to the buffer.  Presumably 
> > > > the
> > > > +* host should service the ring ASAP. */
> > > I don't think we have a reason to do this for packed ring.
> > > No historical baggage there, right?
> > Based on the original commit log, it seems that the notify here
> > is just an "optimization". But I don't quite understand what
> > "the heuristics which KVM uses" refers to. If it's safe to drop
> > this in packed ring, I'd like to do it.
> 
> 
> According to the commit log, it seems like a workaround of lguest networking
> backend.

Do you know why removing this notify in Tx will break "the
heuristics which KVM uses"? Or what does "the heuristics
which KVM uses" refer to?


> I agree to drop it, we should not have such burden.
> 
> But we should notice that, with this removed, the compare between packed vs
> split is kind of unfair. Consider the removal of lguest support recently,
> maybe we can drop this for split ring as well?
> 
> Thanks
> 
> 
> > 
> > commit 44653eae1407f79dff6f52fcf594ae84cb165ec4
> > Author: Rusty Russell
> > Date:   Fri Jul 25 12:06:04 2008 -0500
> > 
> >  virtio: don't always force a notification when ring is full
> >  We force notification when the ring is full, even if the host has
> >  indicated it doesn't want to know.  This seemed like a good idea at
> >  the time: if we fill the transmit ring, we should tell the host
> >  immediately.
> >  Unfortunately this logic also applies to the receiving ring, which is
> >  refilled constantly.  We should introduce real notification thresholds
> >  to replace this logic.  Meanwhile, removing the logic altogether breaks
> >  the heuristics which KVM uses, so we use a hack: only notify if there 
> > are
> >  outgoing parts of the new buffer.
> >  Here are the number of exits with lguest's crappy network 
> > implementation:
> >  Before:
> >  network xmit 7859051 recv 236420
> >  After:
> >  network xmit 7858610 recv 118136
> >  Signed-off-by: Rusty Russell
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 72bf8bc09014..21d9a62767af 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -87,8 +87,11 @@ static int vring_add_buf(struct virtqueue *_vq,
> > if (vq->num_free < out + in) {
> > pr_debug("Can't add buf len %i - avail = %i\n",
> >  out + in, vq->num_free);
> > -   /* We notify *even if* VRING_USED_F_NO_NOTIFY is set here. */
> > -   vq->notify(&vq->vq);
> > +   /* FIXME: for historical reasons, we force a notify here if
> > +* there are outgoing parts to the buffer.  Presumably the
> > +* host should service the ring ASAP. */
> > +   if (out)
> > +   vq->notify(&vq->vq);
> > END_USE(vq);
> > return -ENOSPC;
> > }
> > 
> > 

Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support

2018-11-07 Thread Tiwei Bie
On Wed, Nov 07, 2018 at 12:48:46PM -0500, Michael S. Tsirkin wrote:
> On Wed, Jul 11, 2018 at 10:27:09AM +0800, Tiwei Bie wrote:
> > This commit introduces the support (without EVENT_IDX) for
> > packed ring.
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> >  drivers/virtio/virtio_ring.c | 495 ++-
> >  1 file changed, 487 insertions(+), 8 deletions(-)
[...]
> >  
> > +static void vring_unmap_state_packed(const struct vring_virtqueue *vq,
> > +struct vring_desc_state_packed *state)
> > +{
> > +   u16 flags;
> > +
> > +   if (!vring_use_dma_api(vq->vq.vdev))
> > +   return;
> > +
> > +   flags = state->flags;
> > +
> > +   if (flags & VRING_DESC_F_INDIRECT) {
> > +   dma_unmap_single(vring_dma_dev(vq),
> > +state->addr, state->len,
> > +(flags & VRING_DESC_F_WRITE) ?
> > +DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > +   } else {
> > +   dma_unmap_page(vring_dma_dev(vq),
> > +  state->addr, state->len,
> > +  (flags & VRING_DESC_F_WRITE) ?
> > +  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > +   }
> > +}
> > +
> > +static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> > +  struct vring_packed_desc *desc)
> > +{
> > +   u16 flags;
> > +
> > +   if (!vring_use_dma_api(vq->vq.vdev))
> > +   return;
> > +
> > +   flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> 
> BTW this stuff is only used on error etc. Is there a way to
> reuse vring_unmap_state_packed?

It's also used by the INDIRECT path. We don't allocate desc
state for INDIRECT descriptors to save DMA addr/len etc.
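
For reference, the per-descriptor state kept for direct descriptors looks
roughly like this (the addr/len/flags fields follow from the
vring_unmap_state_packed() code quoted above; the exact layout differs
between revisions of the series):

struct vring_desc_state_packed {
	void *data;			/* Data for callback. */
	struct vring_packed_desc *indir_desc; /* Indirect table, if any. */
	dma_addr_t addr;		/* Buffer DMA addr. */
	u32 len;			/* Buffer length. */
	u16 flags;			/* Descriptor flags. */
};

Indirect table entries are not tracked here; their addr/len/flags are read
back from the table itself when unmapping, which is what
vring_unmap_desc_packed() above is for.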

> 
> > +
> > +   if (flags & VRING_DESC_F_INDIRECT) {
> > +   dma_unmap_single(vring_dma_dev(vq),
> > +virtio64_to_cpu(vq->vq.vdev, desc->addr),
> > +virtio32_to_cpu(vq->vq.vdev, desc->len),
> > +(flags & VRING_DESC_F_WRITE) ?
> > +DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > +   } else {
> > +   dma_unmap_page(vring_dma_dev(vq),
> > +  virtio64_to_cpu(vq->vq.vdev, desc->addr),
> > +  virtio32_to_cpu(vq->vq.vdev, desc->len),
> > +  (flags & VRING_DESC_F_WRITE) ?
> > +  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > +   }
> > +}
[...]
> > @@ -766,47 +840,449 @@ static inline int virtqueue_add_packed(struct 
> > virtqueue *_vq,
> >void *ctx,
> >gfp_t gfp)
> >  {
> > +   struct vring_virtqueue *vq = to_vvq(_vq);
> > +   struct vring_packed_desc *desc;
> > +   struct scatterlist *sg;
> > +   unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> > +   __virtio16 uninitialized_var(head_flags), flags;
> > +   u16 head, avail_wrap_counter, id, curr;
> > +   bool indirect;
> > +
> > +   START_USE(vq);
> > +
> > +   BUG_ON(data == NULL);
> > +   BUG_ON(ctx && vq->indirect);
> > +
> > +   if (unlikely(vq->broken)) {
> > +   END_USE(vq);
> > +   return -EIO;
> > +   }
> > +
> > +#ifdef DEBUG
> > +   {
> > +   ktime_t now = ktime_get();
> > +
> > +   /* No kick or get, with .1 second between?  Warn. */
> > +   if (vq->last_add_time_valid)
> > +   WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> > +   > 100);
> > +   vq->last_add_time = now;
> > +   vq->last_add_time_valid = true;
> > +   }
> > +#endif
> > +
> > +   BUG_ON(total_sg == 0);
> > +
> > +   head = vq->next_avail_idx;
> > +   avail_wrap_counter = vq->avail_wrap_counter;
> > +
> > +   if (virtqueue_use_indirect(_vq, total_sg))
> > +   desc = alloc_indirect_packed(_vq, total_sg, gfp);
> > +   else {
> > +   desc = NULL;
> > +   WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
> > +   }
> > +
> > +   if (desc) {
> > +   /* Use a single buffer which doesn't continue */
> > +   indirect = tr

Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring

2018-10-11 Thread Tiwei Bie
On Thu, Oct 11, 2018 at 10:17:15AM -0400, Michael S. Tsirkin wrote:
> On Thu, Oct 11, 2018 at 10:13:31PM +0800, Tiwei Bie wrote:
> > On Thu, Oct 11, 2018 at 09:48:48AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Oct 11, 2018 at 08:12:21PM +0800, Tiwei Bie wrote:
> > > > > > But if it's not too late, I second for a OUT_OF_ORDER feature.
> > > > > > Starting from in order can have much simpler code in driver.
> > > > > > 
> > > > > > Thanks
> > > > > 
> > > > > It's tricky to change the flag polarity because of compatibility
> > > > > with legacy interfaces. Why is this such a big deal?
> > > > > 
> > > > > Let's teach drivers about IN_ORDER, then if devices
> > > > > are in order it will get enabled by default.
> > > > 
> > > > Yeah, make sense.
> > > > 
> > > > Besides, I have done some further profiling and debugging
> > both in kernel driver and DPDK vhost. Previously I was misled
> > > > by a bug in vhost code. I will send a patch to fix that bug.
> > > > With that bug fixed, the performance of packed ring in the
> > > > test between kernel driver and DPDK vhost is better now.
> > > 
> > > OK, if we get a performance gain on the virtio side, we can finally
> > > upstream it. If you see that please re-post ASAP so we can
> > > put it in the next kernel release.
> > 
> > Got it, I will re-post ASAP.
> > 
> > Thanks!
> 
> 
> Pls remember to include data on performance gain in the cover letter.

Sure. I'll try to include some performance analyses.



Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring

2018-10-11 Thread Tiwei Bie
On Thu, Oct 11, 2018 at 09:48:48AM -0400, Michael S. Tsirkin wrote:
> On Thu, Oct 11, 2018 at 08:12:21PM +0800, Tiwei Bie wrote:
> > > > But if it's not too late, I second for a OUT_OF_ORDER feature.
> > > > Starting from in order can have much simpler code in driver.
> > > > 
> > > > Thanks
> > > 
> > > It's tricky to change the flag polarity because of compatibility
> > > with legacy interfaces. Why is this such a big deal?
> > > 
> > > Let's teach drivers about IN_ORDER, then if devices
> > > are in order it will get enabled by default.
> > 
> > Yeah, make sense.
> > 
> > Besides, I have done some further profiling and debugging
> > both in kernel driver and DPDK vhost. Previously I was misled
> > by a bug in vhost code. I will send a patch to fix that bug.
> > With that bug fixed, the performance of packed ring in the
> > test between kernel driver and DPDK vhost is better now.
> 
> OK, if we get a performance gain on the virtio side, we can finally
> upstream it. If you see that please re-post ASAP so we can
> put it in the next kernel release.

Got it, I will re-post ASAP.

Thanks!


> 
> -- 
> MST


Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring

2018-10-11 Thread Tiwei Bie
On Wed, Oct 10, 2018 at 10:36:26AM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 13, 2018 at 05:47:29PM +0800, Jason Wang wrote:
> > On 2018-09-13 16:59, Tiwei Bie wrote:
> > > > If what you say is true then we should take a careful look
> > > > and not supporting these generic things with packed layout.
> > > > Once we do support them it will be too late and we won't
> > > > be able to get performance back.
> > > I think it's a good point that we don't need to support
> > > everything in packed ring (especially these which would
> > > hurt the performance), as the packed ring aims at high
> > > performance. I'm also wondering about the features. Is
> > > there any possibility that we won't support the out of
> > > order processing (at least not by default) in packed ring?
> > > If I didn't miss anything, the need to support out of order
> > > processing in packed ring will make the data structure
> > > inside the driver not cache friendly which is similar to
> > > the case of the descriptor table in the split ring (the
> > > difference is that, it only happens in driver now).
> > 
> > Out of order is not the only user, DMA is another one. We don't have used
> > ring(len), so we need to maintain buffer length somewhere even for in order
> > device.
> 
> For a bunch of systems dma unmap is a nop so we do not really
> need to maintain it. It's a question of an API to detect that
> and optimize for it. I posted a proposed patch for that -
> want to try using that?

Yeah, definitely!

> 
> > But if it's not too late, I second for a OUT_OF_ORDER feature.
> > Starting from in order can have much simpler code in driver.
> > 
> > Thanks
> 
> It's tricky to change the flag polarity because of compatibility
> with legacy interfaces. Why is this such a big deal?
> 
> Let's teach drivers about IN_ORDER, then if devices
> are in order it will get enabled by default.

Yeah, make sense.

Besides, I have done some further profiling and debugging
both in kernel driver and DPDK vhost. Previously I was misled
by a bug in vhost code. I will send a patch to fix that bug.
With that bug fixed, the performance of packed ring in the
test between kernel driver and DPDK vhost is better now.
I will send a new series soon. Thanks!

> 
> -- 
> MST
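
On the IN_ORDER point above, a minimal sketch of what the driver-side gate
could look like (VIRTIO_F_IN_ORDER is the feature bit number from the
virtio 1.1 spec; the in_order field is illustrative, not something in this
series):

#define VIRTIO_F_IN_ORDER	35	/* virtio 1.1 feature bit */

	/* If the device uses buffers in the order they were made
	 * available, the driver can keep the simple ID scheme: IDs are
	 * written once at ring init and never touched in the data path. */
	vq->in_order = virtio_has_feature(vdev, VIRTIO_F_IN_ORDER);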

Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring

2018-09-13 Thread Tiwei Bie
On Wed, Sep 12, 2018 at 12:16:32PM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 11, 2018 at 01:37:26PM +0800, Tiwei Bie wrote:
> > On Mon, Sep 10, 2018 at 11:33:17AM +0800, Jason Wang wrote:
> > > > On 2018-09-10 11:00, Tiwei Bie wrote:
> > > > On Fri, Sep 07, 2018 at 09:00:49AM -0400, Michael S. Tsirkin wrote:
> > > > > On Fri, Sep 07, 2018 at 09:22:25AM +0800, Tiwei Bie wrote:
> > > > > > On Mon, Aug 27, 2018 at 05:00:40PM +0300, Michael S. Tsirkin wrote:
> > > > > > > Are there still plans to test the performance with vhost pmd?
> > > > > > > vhost doesn't seem to show a performance gain ...
> > > > > > > 
> > > > > > I tried some performance tests with vhost PMD. In guest, the
> > > > > > XDP program will return XDP_DROP directly. And in host, testpmd
> > > > > > will do txonly fwd.
> > > > > > 
> > > > > > When burst size is 1 and packet size is 64 in testpmd and
> > > > > > testpmd needs to iterate 5 Tx queues (but only the first two
> > > > > > queues are enabled) to prepare and inject packets, I got ~12%
> > > > > > performance boost (5.7Mpps -> 6.4Mpps). And if the vhost PMD
> > > > > > is faster (e.g. just need to iterate the first two queues to
> > > > > > prepare and inject packets), then I got similar performance
> > > > > > for both rings (~9.9Mpps) (packed ring's performance can be
> > > > > > lower, because it's more complicated in driver.)
> > > > > > 
> > > > > > I think packed ring makes vhost PMD faster, but it doesn't make
> > > > > > the driver faster. In packed ring, the ring is simplified, and
> > > > > > the handling of the ring in vhost (device) is also simplified,
> > > > > > but things are not simplified in driver, e.g. although there is
> > > > > > no desc table in the virtqueue anymore, driver still needs to
> > > > > > maintain a private desc state table (which is still managed as
> > > > > > a list in this patch set) to support the out-of-order desc
> > > > > > processing in vhost (device).
> > > > > > 
> > > > > > I think this patch set is mainly to make the driver have a full
> > > > > > functional support for the packed ring, which makes it possible
> > > > > > to leverage the packed ring feature in vhost (device). But I'm
> > > > > > not sure whether there is any other better idea, I'd like to
> > > > > > hear your thoughts. Thanks!
> > > > > Just this: Jens seems to report a nice gain with virtio and
> > > > > vhost pmd across the board. Try to compare virtio and
> > > > > virtio pmd to see what does pmd do better?
> > > > The virtio PMD (drivers/net/virtio) in DPDK doesn't need to share
> > > > the virtio ring operation code with other drivers and is highly
> > > > optimized for network. E.g. in Rx, the Rx burst function won't
> > > > chain descs. So the ID management for the Rx ring can be quite
> > > > simple and straightforward, we just need to initialize these IDs
> > > > when initializing the ring and don't need to change these IDs
> > > > in data path anymore (the mergable Rx code in that patch set
> > > > assumes the descs will be written back in order, which should be
> > > > fixed. I.e., the ID in the desc should be used to index vq->descx[]).
> > > > The Tx code in that patch set also assumes the descs will be
> > > > written back by device in order, which should be fixed.
> > > 
> > > Yes it is. I think I've pointed it out in some early version of pmd patch.
> > > So I suspect part (or all) of the boost may come from in order feature.
> > > 
> > > > 
> > > > But in kernel virtio driver, the virtio_ring.c is very generic.
> > > > The enqueue (virtqueue_add()) and dequeue (virtqueue_get_buf_ctx())
> > > > functions need to support all the virtio devices and should be
> > > > able to handle all the possible cases that may happen. So although
> > > > the packed ring can be very efficient in some cases, currently
> > > > the room to optimize the performance in kernel's virtio_ring.c
> > > > isn't that much. If we want to take the fully advantage of the
> > > > packed ring's efficiency, we need some further e.g. API changes
> >
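
On the point above about using the ID written back by the device to index
the driver's state array, a minimal sketch (illustrative names, not the
code in this series) of how completion looks when the device may use
buffers out of order:

	/* The used descriptor carries a buffer ID, which indexes the
	 * driver's private per-descriptor state; ring position and ID
	 * need not match when completions arrive out of order. */
	u16 id = le16_to_cpu(vq->vring_packed.desc[vq->last_used_idx].id);
	void *buf = vq->desc_state[id].data;

	detach_buf_packed(vq, id, NULL);	/* returns the ID to the free list */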

Re: [PATCH net-next v2 2/5] virtio_ring: support creating packed ring

2018-09-12 Thread Tiwei Bie
On Mon, Sep 10, 2018 at 10:28:37AM +0800, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 10:03:24AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 11, 2018 at 10:27:08AM +0800, Tiwei Bie wrote:
> > > This commit introduces the support for creating packed ring.
> > > All split ring specific functions are added _split suffix.
> > > Some necessary stubs for packed ring are also added.
> > > 
> > > Signed-off-by: Tiwei Bie 
> > 
[...]
> > > +
> > > +/*
> > > + * The layout for the packed ring is a continuous chunk of memory
> > > + * which looks like this.
> > > + *
> > > + * struct vring_packed {
> > > + *   // The actual descriptors (16 bytes each)
> > > + *   struct vring_packed_desc desc[num];
> > > + *
> > > + *   // Padding to the next align boundary.
> > > + *   char pad[];
> > > + *
> > > + *   // Driver Event Suppression
> > > + *   struct vring_packed_desc_event driver;
> > > + *
> > > + *   // Device Event Suppression
> > > + *   struct vring_packed_desc_event device;
> > > + * };
> > > + */
> > 
> > Why not just allocate event structures separately?
> > Is it a win to have them share a cache line for some reason?

Users may call vring_new_virtqueue() with preallocated
memory to set up the ring, e.g.:

https://github.com/torvalds/linux/blob/11da3a7f84f1/drivers/s390/virtio/virtio_ccw.c#L513-L522
https://github.com/torvalds/linux/blob/11da3a7f84f1/drivers/misc/mic/vop/vop_main.c#L306-L307

Below is the corresponding definition in split ring:

https://github.com/oasis-tcs/virtio-spec/blob/89dd55f5e606/split-ring.tex#L64-L78

If my understanding is correct, this is just for legacy
interfaces, so we won't define a fixed memory layout for the
packed ring and don't need to support vring_new_virtqueue()
for it. Is that correct? Thanks!



> 
> Will do that.
> 
> > 
> > > +static inline void vring_init_packed(struct vring_packed *vr, unsigned 
> > > int num,
> > > +  void *p, unsigned long align)
> > > +{
> > > + vr->num = num;
> > > + vr->desc = p;
> > > + vr->driver = (void *)ALIGN(((uintptr_t)p +
> > > + sizeof(struct vring_packed_desc) * num), align);
> > > + vr->device = vr->driver + 1;
> > > +}
> > 
> > What's all this about alignment? Where does it come from?
> 
> It comes from the `vring_align` parameter of vring_create_virtqueue()
> and vring_new_virtqueue():
> 
> https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1061
> https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1123
> 
> Should I just ignore it in packed ring?
> 
> CCW defined this:
> 
> #define KVM_VIRTIO_CCW_RING_ALIGN 4096
> 
> I'm not familiar with CCW. Currently, in this patch set, packed ring
> isn't enabled on CCW because some legacy accessors are not implemented
> in packed ring yet.
> 
> > 
> > > +
> > > +static inline unsigned vring_size_packed(unsigned int num, unsigned long 
> > > align)
> > > +{
> > > + return ((sizeof(struct vring_packed_desc) * num + align - 1)
> > > + & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> > > +}
> [...]
> > > @@ -1129,10 +1388,17 @@ struct virtqueue *vring_new_virtqueue(unsigned 
> > > int index,
> > > void (*callback)(struct virtqueue *vq),
> > > const char *name)
> > >  {
> > > - struct vring vring;
> > > - vring_init(&vring, num, pages, vring_align);
> > > - return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > -  notify, callback, name);
> > > + union vring_union vring;
> > > + bool packed;
> > > +
> > > + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > > + if (packed)
> > > + vring_init_packed(&vring.vring_packed, num, pages, vring_align);
> > > + else
> > > + vring_init(&vring.vring_split, num, pages, vring_align);
> > 
> > 
> > vring_init in the UAPI header is more or less a bug.
> > I'd just stop using it, keep it around for legacy userspace.
> 
> Got it. I'd like to do that. Thanks.
> 
> > 
> > > +
> > > + return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > +  context, notify, callback, name);
> > >  }
> > >  EXPORT_SYMBOL_GPL(vring_new_virtqueue);
>
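
As a worked example of the vring_size_packed() helper quoted earlier in
this mail (assuming the sizes from this series: 16-byte descriptors and
4-byte event suppression structures): with num = 256 and vring_align =
4096, the descriptor area is 256 * 16 = 4096 bytes, which is already
aligned, so the helper returns 4096 + 2 * 4 = 4104 bytes for the whole
ring, driver and device event structures included.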
