[PATCH] gpio: virtio: remove timeout

2021-12-20 Thread Vincent Whitchurch
The driver imposes an arbitrary one second timeout on virtio requests,
but the specification doesn't prevent the virtio device from taking
longer to process requests, so remove this timeout to support all
systems and device implementations.

Fixes: 3a29355a22c0275fe86 ("gpio: Add virtio-gpio driver")
Signed-off-by: Vincent Whitchurch 
---
 drivers/gpio/gpio-virtio.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpio/gpio-virtio.c b/drivers/gpio/gpio-virtio.c
index 84f96b78f32a..9f4941bc5760 100644
--- a/drivers/gpio/gpio-virtio.c
+++ b/drivers/gpio/gpio-virtio.c
@@ -100,11 +100,7 @@ static int _virtio_gpio_req(struct virtio_gpio *vgpio, u16 type, u16 gpio,
virtqueue_kick(vgpio->request_vq);
mutex_unlock(&vgpio->lock);
 
-   if (!wait_for_completion_timeout(&line->completion, HZ)) {
-   dev_err(dev, "GPIO operation timed out\n");
-   ret = -ETIMEDOUT;
-   goto out;
-   }
+   wait_for_completion(&line->completion);
 
if (unlikely(res->status != VIRTIO_GPIO_STATUS_OK)) {
dev_err(dev, "GPIO request failed: %d\n", gpio);
-- 
2.33.1



Re: [PATCH v2 2/2] i2c: virtio: fix completion handling

2021-12-02 Thread Vincent Whitchurch
On Thu, Nov 11, 2021 at 05:57:30PM +0100, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2021 at 05:04:12PM +0100, Vincent Whitchurch wrote:
> > -   wait_for_completion(&vi->completion);
> > +   /*
> > +* We only need to wait for the last one since the device is required
> > +* to complete requests in order.
> > +*/
> 
> Hmm the spec only says:
> 
> A device MUST guarantee the requests in the virtqueue being processed in order
> if multiple requests are received at a time.
> 
> it does not seem to require using the buffers in order.
> In any case, just waiting for all of them in a loop
> seems cleaner and likely won't take longer ...

Thank you, I've fixed this in v3.


[PATCH v3] i2c: virtio: fix completion handling

2021-12-02 Thread Vincent Whitchurch
The driver currently assumes that the notify callback is only received
when the device is done with all the queued buffers.

However, this is not true, since the notify callback could be called
without any of the queued buffers being completed (for example, with
virtio-pci and shared interrupts) or with only some of the buffers being
completed (since the driver makes them available to the device in
multiple separate virtqueue_add_sgs() calls).

This can lead to incorrect data on the I2C bus or memory corruption in
the guest if the device operates on buffers which have been freed by
the driver.  (The WARN_ON in the driver is also triggered.)

 BUG kmalloc-128 (Tainted: GW): Poison overwritten
 First byte 0x0 instead of 0x6b
 Allocated in i2cdev_ioctl_rdwr+0x9d/0x1de age=243 cpu=0 pid=28
memdup_user+0x2e/0xbd
i2cdev_ioctl_rdwr+0x9d/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41
 Freed in i2cdev_ioctl_rdwr+0x1bb/0x1de age=68 cpu=0 pid=28
kfree+0x1bd/0x1cc
i2cdev_ioctl_rdwr+0x1bb/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41

Fix this by calling virtqueue_get_buf() from the notify handler like other
virtio drivers and by actually waiting for all the buffers to be
completed.

Fixes: 3cfc88380413d20f ("i2c: virtio: add a virtio i2c frontend driver")
Acked-by: Viresh Kumar 
Signed-off-by: Vincent Whitchurch 
---

Notes:
v3: Wait for all completions instead of only the last one.
v2: Add Acked-by and Fixes tags.

 drivers/i2c/busses/i2c-virtio.c | 32 
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
index 95378780da6d..41eb0dcc3204 100644
--- a/drivers/i2c/busses/i2c-virtio.c
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -22,24 +22,24 @@
 /**
  * struct virtio_i2c - virtio I2C data
  * @vdev: virtio device for this controller
- * @completion: completion of virtio I2C message
  * @adap: I2C adapter for this controller
  * @vq: the virtio virtqueue for communication
  */
 struct virtio_i2c {
struct virtio_device *vdev;
-   struct completion completion;
struct i2c_adapter adap;
struct virtqueue *vq;
 };
 
 /**
  * struct virtio_i2c_req - the virtio I2C request structure
+ * @completion: completion of virtio I2C message
  * @out_hdr: the OUT header of the virtio I2C message
  * @buf: the buffer into which data is read, or from which it's written
  * @in_hdr: the IN header of the virtio I2C message
  */
 struct virtio_i2c_req {
+   struct completion completion;
	struct virtio_i2c_out_hdr out_hdr	____cacheline_aligned;
	uint8_t *buf				____cacheline_aligned;
	struct virtio_i2c_in_hdr in_hdr		____cacheline_aligned;
@@ -47,9 +47,11 @@ struct virtio_i2c_req {
 
 static void virtio_i2c_msg_done(struct virtqueue *vq)
 {
-   struct virtio_i2c *vi = vq->vdev->priv;
+   struct virtio_i2c_req *req;
+   unsigned int len;
 
-   complete(&vi->completion);
+   while ((req = virtqueue_get_buf(vq, &len)))
+   complete(&req->completion);
 }
 
 static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
@@ -62,6 +64,8 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
for (i = 0; i < num; i++) {
int outcnt = 0, incnt = 0;
 
+   init_completion(&reqs[i].completion);
+
/*
 * Only 7-bit mode supported for this moment. For the address
 * format, Please check the Virtio I2C Specification.
@@ -106,21 +110,15 @@ static int virtio_i2c_complete_reqs(struct virtqueue *vq,
struct virtio_i2c_req *reqs,
struct i2c_msg *msgs, int num)
 {
-   struct virtio_i2c_req *req;
bool failed = false;
-   unsigned int len;
int i, j = 0;
 
for (i = 0; i < num; i++) {
-   /* Detach the ith request from the vq */
-   req = virtqueue_get_buf(vq, &len);
+   struct virtio_i2c_req *req = &reqs[i];
 
-   /*
-* Condition req == &reqs[i] should always meet since we have
-* total num requests in the vq. reqs[i] can never be NULL here.
-*/
-   if (!failed && (WARN_ON(req != &reqs[i]) ||
-   req->in_hdr.status != VIRTIO_I2C_MSG_OK))
+   wait_for_completion(&req->completion);
+
+   if (!failed && req->in_hdr.status != VIRTIO_I2C_MSG_OK)
failed = true;
 
i2c_put_dma_safe_msg_buf(reqs[i].buf, &msgs[i], !failed);
@@ -156,12 +154,8 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
  

Re: [PATCH v2 1/2] i2c: virtio: disable timeout handling

2021-11-19 Thread Vincent Whitchurch
On Fri, Nov 12, 2021 at 03:35:29AM +0100, Viresh Kumar wrote:
> On 11-11-21, 17:04, Vincent Whitchurch wrote:
> >  static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
> > @@ -141,7 +140,6 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
> > struct virtio_i2c *vi = i2c_get_adapdata(adap);
> > struct virtqueue *vq = vi->vq;
> > struct virtio_i2c_req *reqs;
> > -   unsigned long time_left;
> > int count;
> >  
> > reqs = kcalloc(num, sizeof(*reqs), GFP_KERNEL);
> > @@ -164,11 +162,9 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
> > reinit_completion(&vi->completion);
> > virtqueue_kick(vq);
> >  
> > -   time_left = wait_for_completion_timeout(&vi->completion, adap->timeout);
> > -   if (!time_left)
> > -   dev_err(&adap->dev, "virtio i2c backend timeout.\n");
> > +   wait_for_completion(&vi->completion);
> 
> I thought we decided on making this in insanely high value instead ?

That wasn't my impression from the previous email thread.  Jie was OK
with doing it either way, and only disabling the timeout entirely makes
sense to me given the risk of memory corruption otherwise.

What "insanely high" timeout value do you have in mind and why would it
be acceptable to corrupt kernel memory after that time?


[PATCH v2 1/2] i2c: virtio: disable timeout handling

2021-11-11 Thread Vincent Whitchurch
If a timeout is hit, it can result in incorrect data on the I2C bus
and/or memory corruption in the guest since the device can still be
operating on the buffers it was given while the guest has freed them.

Here is, for example, the start of a slub_debug splat which was
triggered on the next transfer after one transfer was forced to timeout
by setting a breakpoint in the backend (rust-vmm/vhost-device):

 BUG kmalloc-1k (Not tainted): Poison overwritten
 First byte 0x1 instead of 0x6b
 Allocated in virtio_i2c_xfer+0x65/0x35c age=350 cpu=0 pid=29
__kmalloc+0xc2/0x1c9
virtio_i2c_xfer+0x65/0x35c
__i2c_transfer+0x429/0x57d
i2c_transfer+0x115/0x134
i2cdev_ioctl_rdwr+0x16a/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41
 Freed in virtio_i2c_xfer+0x32e/0x35c age=244 cpu=0 pid=29
kfree+0x1bd/0x1cc
virtio_i2c_xfer+0x32e/0x35c
__i2c_transfer+0x429/0x57d
i2c_transfer+0x115/0x134
i2cdev_ioctl_rdwr+0x16a/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41

There is no simple fix for this (the driver would have to always create
bounce buffers and hold on to them until the device eventually returns
the buffers), so just disable the timeout support for now.

Fixes: 3cfc88380413d20f ("i2c: virtio: add a virtio i2c frontend driver")
Acked-by: Jie Deng 
Signed-off-by: Vincent Whitchurch 
---
 drivers/i2c/busses/i2c-virtio.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
index f10a603b13fb..7b2474e6876f 100644
--- a/drivers/i2c/busses/i2c-virtio.c
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -106,11 +106,10 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
 
 static int virtio_i2c_complete_reqs(struct virtqueue *vq,
struct virtio_i2c_req *reqs,
-   struct i2c_msg *msgs, int num,
-   bool timedout)
+   struct i2c_msg *msgs, int num)
 {
struct virtio_i2c_req *req;
-   bool failed = timedout;
+   bool failed = false;
unsigned int len;
int i, j = 0;
 
@@ -132,7 +131,7 @@ static int virtio_i2c_complete_reqs(struct virtqueue *vq,
j++;
}
 
-   return timedout ? -ETIMEDOUT : j;
+   return j;
 }
 
 static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
@@ -141,7 +140,6 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
struct virtio_i2c *vi = i2c_get_adapdata(adap);
struct virtqueue *vq = vi->vq;
struct virtio_i2c_req *reqs;
-   unsigned long time_left;
int count;
 
reqs = kcalloc(num, sizeof(*reqs), GFP_KERNEL);
@@ -164,11 +162,9 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
reinit_completion(&vi->completion);
virtqueue_kick(vq);
 
-   time_left = wait_for_completion_timeout(&vi->completion, adap->timeout);
-   if (!time_left)
-   dev_err(&adap->dev, "virtio i2c backend timeout.\n");
+   wait_for_completion(&vi->completion);
 
-   count = virtio_i2c_complete_reqs(vq, reqs, msgs, count, !time_left);
+   count = virtio_i2c_complete_reqs(vq, reqs, msgs, count);
 
 err_free:
kfree(reqs);
-- 
2.28.0



[PATCH v2 2/2] i2c: virtio: fix completion handling

2021-11-11 Thread Vincent Whitchurch
The driver currently assumes that the notify callback is only received
when the device is done with all the queued buffers.

However, this is not true, since the notify callback could be called
without any of the queued buffers being completed (for example, with
virtio-pci and shared interrupts) or with only some of the buffers being
completed (since the driver makes them available to the device in
multiple separate virtqueue_add_sgs() calls).

This can lead to incorrect data on the I2C bus or memory corruption in
the guest if the device operates on buffers which have been freed by
the driver.  (The WARN_ON in the driver is also triggered.)

 BUG kmalloc-128 (Tainted: GW): Poison overwritten
 First byte 0x0 instead of 0x6b
 Allocated in i2cdev_ioctl_rdwr+0x9d/0x1de age=243 cpu=0 pid=28
memdup_user+0x2e/0xbd
i2cdev_ioctl_rdwr+0x9d/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41
 Freed in i2cdev_ioctl_rdwr+0x1bb/0x1de age=68 cpu=0 pid=28
kfree+0x1bd/0x1cc
i2cdev_ioctl_rdwr+0x1bb/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41

Fix this by calling virtqueue_get_buf() from the notify handler like other
virtio drivers and by actually waiting for all the buffers to be
completed.

Fixes: 3cfc88380413d20f ("i2c: virtio: add a virtio i2c frontend driver")
Acked-by: Viresh Kumar 
Signed-off-by: Vincent Whitchurch 
---
 drivers/i2c/busses/i2c-virtio.c | 34 +++--
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
index 7b2474e6876f..2d3ae8e238ec 100644
--- a/drivers/i2c/busses/i2c-virtio.c
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -22,24 +22,24 @@
 /**
  * struct virtio_i2c - virtio I2C data
  * @vdev: virtio device for this controller
- * @completion: completion of virtio I2C message
  * @adap: I2C adapter for this controller
  * @vq: the virtio virtqueue for communication
  */
 struct virtio_i2c {
struct virtio_device *vdev;
-   struct completion completion;
struct i2c_adapter adap;
struct virtqueue *vq;
 };
 
 /**
  * struct virtio_i2c_req - the virtio I2C request structure
+ * @completion: completion of virtio I2C message
  * @out_hdr: the OUT header of the virtio I2C message
  * @buf: the buffer into which data is read, or from which it's written
  * @in_hdr: the IN header of the virtio I2C message
  */
 struct virtio_i2c_req {
+   struct completion completion;
	struct virtio_i2c_out_hdr out_hdr	____cacheline_aligned;
	uint8_t *buf				____cacheline_aligned;
	struct virtio_i2c_in_hdr in_hdr		____cacheline_aligned;
@@ -47,9 +47,11 @@ struct virtio_i2c_req {
 
 static void virtio_i2c_msg_done(struct virtqueue *vq)
 {
-   struct virtio_i2c *vi = vq->vdev->priv;
+   struct virtio_i2c_req *req;
+   unsigned int len;
 
-   complete(&vi->completion);
+   while ((req = virtqueue_get_buf(vq, &len)))
+   complete(&req->completion);
 }
 
 static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
@@ -69,6 +71,8 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
if (!msgs[i].len)
break;
 
+   init_completion(&reqs[i].completion);
+
/*
 * Only 7-bit mode supported for this moment. For the address
 * format, Please check the Virtio I2C Specification.
@@ -108,21 +112,13 @@ static int virtio_i2c_complete_reqs(struct virtqueue *vq,
struct virtio_i2c_req *reqs,
struct i2c_msg *msgs, int num)
 {
-   struct virtio_i2c_req *req;
bool failed = false;
-   unsigned int len;
int i, j = 0;
 
for (i = 0; i < num; i++) {
-   /* Detach the ith request from the vq */
-   req = virtqueue_get_buf(vq, &len);
+   struct virtio_i2c_req *req = &reqs[i];
 
-   /*
-* Condition req == &reqs[i] should always meet since we have
-* total num requests in the vq. reqs[i] can never be NULL here.
-*/
-   if (!failed && (WARN_ON(req != &reqs[i]) ||
-   req->in_hdr.status != VIRTIO_I2C_MSG_OK))
+   if (!failed && req->in_hdr.status != VIRTIO_I2C_MSG_OK)
failed = true;
 
i2c_put_dma_safe_msg_buf(reqs[i].buf, &msgs[i], !failed);
@@ -158,11 +154,13 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
 * remote here to clear the virtqueue, so we can try another set of
 * messages later on.
 */
-
-   reinit_completion(&vi->completion);
virtqueue_kick(

[PATCH v2 0/2] virtio-i2c: Fix buffer handling

2021-11-11 Thread Vincent Whitchurch
This fixes a couple of bugs in the buffer handling in virtio-i2c which can
result in incorrect data on the I2C bus or memory corruption in the guest.

I tested this on UML (virtio-uml needs a bug fix too, I will send that out
later) with the device implementation in rust-vmm/vhost-device.

Changes in v2:
- Added Acked-by and Fixes tags

Vincent Whitchurch (2):
  i2c: virtio: disable timeout handling
  i2c: virtio: fix completion handling

 drivers/i2c/busses/i2c-virtio.c | 46 ++---
 1 file changed, 19 insertions(+), 27 deletions(-)

-- 
2.28.0



Re: [PATCH 1/2] i2c: virtio: disable timeout handling

2021-11-03 Thread Vincent Whitchurch
On Wed, Nov 03, 2021 at 07:37:45AM +0100, Viresh Kumar wrote:
> On 03-11-21, 06:18, Chen, Conghui wrote:
> > >>> Over the long term, I think the backend should provide that timeout
> > >>> value and guarantee that its processing time should not exceed that
> > >>> value.
> > >> If you mean that the spec should be changed to allow the virtio driver
> > >> to be able to program a certain timeout for I2C transactions in the
> > >> virtio device, yes, that does sound reasonable.
> > >
> > >
> > >Due to changes in my work, I will pass my virtio-i2c maintenance to 
> > >Conghui.
> > >
> > >She may work on this in the future.
> > >
> > 
> > I'll try to update the spec first.
> 
> I don't think the spec should be changed for timeout. Timeout-interval
> here isn't the property of just the host firmware/kernel, but the
> entire setup plays a role here.
> 
> Host have its own timeframe to take care of things (I think HZ should
> really be enough for that, since kernel can manage it for busses
> normally with just that). Then comes the virtualization, context
> switches, guest OS, backend, etc, which add to this delay. All this is
> not part of the virtio protocol and so shouldn't be made part of it.

The suggested timeout is not meant to take into account the overhead of
virtualization, but to be used by the virtio device as a timeout for the
transaction on the I2C bus (presumably by programming this value to the
physical I2C controller, if one exists).

Assume that userspace (or an I2C client driver) asks for a timeout of 20
ms for a particular transfer because it, say, knows that the particular
connected I2C peripheral either responds within 10 ms to a particular
register read or never responds, so it doesn't want to waste time
waiting unnecessarily long for the transfer to complete.
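
For concreteness, userspace can already ask for such a short timeout today
through i2c-dev.  A minimal sketch (the device node path is just an example);
the point is that the virtio device currently has no way of learning about
this value:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>

int main(void)
{
        int fd = open("/dev/i2c-0", O_RDWR);    /* example device node */

        if (fd < 0)
                return 1;
        /* I2C_TIMEOUT takes units of 10 ms, so this asks for a 20 ms timeout. */
        if (ioctl(fd, I2C_TIMEOUT, 2))
                perror("I2C_TIMEOUT");
        return 0;
}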

If the virtio device end does not have any information on what timeout
is required (as in the current spec), it must assume some high value
which will never cause I2C transactions to time out spuriously, say 10
seconds.

Even if the virtio driver is fixed to copy and hold all buffers to avoid
memory corruption and to time out and return to the caller after the
requested 20 ms, the next I2C transfer can not be issued until 10
seconds have passed, since the virtio device end will still be waiting
for the hardcoded 10 second timeout and may not respond to new requests
until that transfer has timed out.


Re: [PATCH 1/2] i2c: virtio: disable timeout handling

2021-10-29 Thread Vincent Whitchurch
On Thu, Oct 21, 2021 at 05:30:28AM +0200, Jie Deng wrote:
> On 2021/10/20 19:03, Viresh Kumar wrote:
> > On 20-10-21, 12:55, Vincent Whitchurch wrote:
> >> If the timeout cannot be disabled, then the driver should be fixed to
> >> always copy buffers and hold on to them to avoid memory corruption in
> >> the case of timeout, as I mentioned in my commit message.  That would be
> >> quite a substantial change to the driver so it's not something I'm
> >> personally comfortable with doing, especially not this late in the -rc
> >> cycle, so I'd leave that to others.
> > Or we can avoid clearing up and freeing the buffers here until the
> > point where the buffers are returned by the host. Until that happens,
> > we can avoid taking new requests but return to the earlier caller with
> > timeout failure. That would avoid corruption, by freeing buffers
> > sooner, and not hanging of the kernel.
> 
> It seems similar to use "wait_for_completion". If the other side is
> hacked, the guest may never get the buffers returned by the host,
> right ?

Note that it is trivial for the host to DoS the guest.  All the host has
to do is stop responding to I/O requests (I2C or others), then the guest
will not be able to perform its intended functions, regardless of
whether this particular driver waits forever or not.  Even TDX (which
Greg mentioned) does not prevent that, see:

 https://lore.kernel.org/virtualization/?q=tdx+dos

> For this moment, we can solve the problem by using a hardcoded big
> value or disabling the timeout.

Is that an Acked-by on this patch which does the latter?

> Over the long term, I think the backend should provide that timeout
> value and guarantee that its processing time should not exceed that
> value.

If you mean that the spec should be changed to allow the virtio driver
to be able to program a certain timeout for I2C transactions in the
virtio device, yes, that does sound reasonable.
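
Purely as an illustration of that idea (not part of any published virtio
spec): the existing request header only carries addr/padding/flags, and an
extended header with an invented per-request timeout field might look
something like this.

#include <linux/types.h>

/* Hypothetical extension for illustration only -- not in any spec. */
struct virtio_i2c_out_hdr_with_timeout {
        __le16 addr;
        __le16 padding;
        __le32 flags;
        __le32 timeout_ms;      /* invented field: 0 could mean "no timeout" */
};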


Re: [PATCH 2/2] i2c: virtio: fix completion handling

2021-10-29 Thread Vincent Whitchurch
On Wed, Oct 20, 2021 at 12:47:09PM +0200, Viresh Kumar wrote:
> On 20-10-21, 12:38, Vincent Whitchurch wrote:
> > I don't quite understand how that would be safe since
> > virtqueue_add_sgs() can fail after a few iterations and all queued
> > request buffers can have FAIL_NEXT set.  In such a case, we would end up
> > waiting forever with your proposed change, wouldn't we?
> 
> Good point. I didn't think of that earlier.
> 
> I think a good simple way of handling this is counting the number of
> buffers sent and received. Once they match, we are done. That
> shouldn't break anything else I believe.

That could work, but it's not so straightforward, since you would have to
introduce locking to prevent races: the final count is only known
after virtio_i2c_prepare_reqs() completes, while the callback could be
called before that.  Please do not hesitate to send out a patch to fix
it that way if that is what you prefer.
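
For reference, a rough sketch of what that counting approach could look like
(hypothetical names, not the code that was eventually merged); it also shows
where the locking would sit, since the done callback can run before the final
count is known:

#include <linux/completion.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/* Hypothetical per-transfer state for the "count sent vs. received" idea. */
struct virtio_i2c_xfer_state {
        spinlock_t lock;
        unsigned int sent;              /* only valid once sent_valid is true */
        unsigned int received;
        bool sent_valid;
        struct completion done;
};

/* Called from the virtqueue done callback for each returned buffer. */
static void xfer_state_buf_done(struct virtio_i2c_xfer_state *st)
{
        unsigned long flags;
        bool all_done;

        spin_lock_irqsave(&st->lock, flags);
        st->received++;
        all_done = st->sent_valid && st->received == st->sent;
        spin_unlock_irqrestore(&st->lock, flags);

        if (all_done)
                complete(&st->done);
}

/* Called once virtio_i2c_prepare_reqs() knows how many requests were queued. */
static void xfer_state_set_sent(struct virtio_i2c_xfer_state *st, unsigned int sent)
{
        unsigned long flags;
        bool all_done;

        spin_lock_irqsave(&st->lock, flags);
        st->sent = sent;
        st->sent_valid = true;
        all_done = st->received == st->sent;
        spin_unlock_irqrestore(&st->lock, flags);

        if (all_done)
                complete(&st->done);
}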


[PATCH] virtio-ring: fix DMA metadata flags

2021-10-26 Thread Vincent Whitchurch
The flags are currently overwritten, leading to the wrong direction
being passed to the DMA unmap functions.

Fixes: 72b5e8958738aaa4 ("virtio-ring: store DMA metadata in desc_extra for split virtqueue")
Signed-off-by: Vincent Whitchurch 
---
 drivers/virtio/virtio_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dd95dfd85e98..3035bb6f5458 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -576,7 +576,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
/* Last one doesn't continue. */
desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
if (!indirect && vq->use_dma_api)
-   vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags =
+   vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
~VRING_DESC_F_NEXT;
 
if (indirect) {
-- 
2.28.0
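
As a standalone illustration (userspace C, not part of the patch) of why the
plain assignment was wrong: desc_extra flags also carry the buffer direction
via VRING_DESC_F_WRITE, so assigning ~VRING_DESC_F_NEXT makes every buffer
look device-writable, while the fixed code only clears the NEXT bit:

#include <stdint.h>
#include <stdio.h>

#define VRING_DESC_F_NEXT  1    /* bit values as defined by the virtio spec */
#define VRING_DESC_F_WRITE 2

int main(void)
{
        /* A driver-written (device-readable) buffer: WRITE must stay clear. */
        uint16_t flags = VRING_DESC_F_NEXT;

        /* Old code: every bit except NEXT ends up set, including WRITE. */
        uint16_t assigned = (uint16_t)~VRING_DESC_F_NEXT;
        /* Fixed code: only NEXT is cleared, WRITE stays 0. */
        uint16_t masked = flags & (uint16_t)~VRING_DESC_F_NEXT;

        printf("assigned=0x%04x masked=0x%04x\n", assigned, masked);
        return 0;
}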



Re: [PATCH 1/2] i2c: virtio: disable timeout handling

2021-10-20 Thread Vincent Whitchurch
On Wed, Oct 20, 2021 at 09:04:45AM +0200, Jie Deng wrote:
> On 2021/10/20 14:41, Viresh Kumar wrote:
> > On 20-10-21, 14:35, Jie Deng wrote:
> >> Yes, but we need to know what's the best value to be configured for a
> >> specific "other side".
> >>
> >> I think the "other side" should be more aware of what value is reasonable 
> >> to
> >> be used.
> > If we _really_ need that, then it would require an update to the
> > specification first.
> >
> > I am not sure if the other side is the only party in play here. It
> > depends on the entire setup and not just the hardware device.
> > Specially with virtualisation things become really slow because of
> > context switches, etc. It may be better for the guest userspace to
> > decide on the value.
> >
> > Since this is specially for virtualisation, the kernel may set the
> > value to few HZ by default, lets say 10 (Yeah its really high) :).
> 
> I'm OK with this way for the simplicity.

That would not be safe.  Userspace can change this timeout and the end
result with the current implementation of the driver is potentially
kernel memory corruption, which is something userspace should of course
never be able to trigger.

Even if the timeout were hardcoded in the driver and the driver made to
ignore what userspace requests, it would need to be set to a
ridiculously high value so that we can guarantee that the timeout never
ever occurs (since the result is potentially random kernel memory
corruption).  Which is basically equivalent to disabling the timeout
entirely as my patch does.

If the timeout cannot be disabled, then the driver should be fixed to
always copy buffers and hold on to them to avoid memory corruption in
the case of timeout, as I mentioned in my commit message.  That would be
quite a substantial change to the driver so it's not something I'm
personally comfortable with doing, especially not this late in the -rc
cycle, so I'd leave that to others.
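
For context, a minimal sketch of what that always-copy approach might involve
(hypothetical names, not merged code): each request owns a refcounted,
driver-allocated copy of the message buffer, so a timed-out transfer can
return to the caller while the device may still be writing to the copy, and
the copy is only freed once the device has returned the buffer.

#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/string.h>

struct virtio_i2c_bounce {
        struct kref ref;
        void *data;
        size_t len;
};

static struct virtio_i2c_bounce *bounce_alloc(const void *src, size_t len)
{
        struct virtio_i2c_bounce *b = kzalloc(sizeof(*b), GFP_KERNEL);

        if (!b)
                return NULL;
        b->data = src ? kmemdup(src, len, GFP_KERNEL) : kzalloc(len, GFP_KERNEL);
        if (!b->data) {
                kfree(b);
                return NULL;
        }
        b->len = len;
        kref_init(&b->ref);     /* reference held by the caller */
        kref_get(&b->ref);      /* reference held by the in-flight request */
        return b;
}

static void bounce_release(struct kref *ref)
{
        struct virtio_i2c_bounce *b = container_of(ref, struct virtio_i2c_bounce, ref);

        kfree(b->data);
        kfree(b);
}

/* Dropped once by the (possibly timed-out) caller and once by the done callback. */
static void bounce_put(struct virtio_i2c_bounce *b)
{
        kref_put(&b->ref, bounce_release);
}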


Re: [PATCH 2/2] i2c: virtio: fix completion handling

2021-10-20 Thread Vincent Whitchurch
On Wed, Oct 20, 2021 at 11:17:21AM +0200, Viresh Kumar wrote:
> On 20-10-21, 16:54, Jie Deng wrote:
> > 
> > On 2021/10/19 16:22, Viresh Kumar wrote:
> > > On 19-10-21, 09:46, Vincent Whitchurch wrote:
> > > >   static void virtio_i2c_msg_done(struct virtqueue *vq)
> > > >   {
> > > > -   struct virtio_i2c *vi = vq->vdev->priv;
> > > > +   struct virtio_i2c_req *req;
> > > > +   unsigned int len;
> > > > -   complete(&vi->completion);
> > > > +   while ((req = virtqueue_get_buf(vq, &len)))
> > > > +   complete(&req->completion);
> > > Instead of adding a completion for each request and using only the
> > > last one, maybe we can do this instead here:
> > > 
> > >   while ((req = virtqueue_get_buf(vq, &len))) {
> > >  if (req->out_hdr.flags == cpu_to_le32(VIRTIO_I2C_FLAGS_FAIL_NEXT))
> > 
> > 
> > Is this for the last one check ? For the last one, this bit should be
> > cleared, right ?
> 
> Oops, you are right. This should be `!=` instead. Thanks.

I don't quite understand how that would be safe since
virtqueue_add_sgs() can fail after a few iterations and all queued
request buffers can have FAIL_NEXT set.  In such a case, we would end up
waiting forever with your proposed change, wouldn't we?


[PATCH 2/2] i2c: virtio: fix completion handling

2021-10-19 Thread Vincent Whitchurch
The driver currently assumes that the notify callback is only received
when the device is done with all the queued buffers.

However, this is not true, since the notify callback could be called
without any of the queued buffers being completed (for example, with
virtio-pci and shared interrupts) or with only some of the buffers being
completed (since the driver makes them available to the device in
multiple separate virtqueue_add_sgs() calls).

This can lead to incorrect data on the I2C bus or memory corruption in
the guest if the device operates on buffers which have been freed by
the driver.  (The WARN_ON in the driver is also triggered.)

 BUG kmalloc-128 (Tainted: GW): Poison overwritten
 First byte 0x0 instead of 0x6b
 Allocated in i2cdev_ioctl_rdwr+0x9d/0x1de age=243 cpu=0 pid=28
memdup_user+0x2e/0xbd
i2cdev_ioctl_rdwr+0x9d/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41
 Freed in i2cdev_ioctl_rdwr+0x1bb/0x1de age=68 cpu=0 pid=28
kfree+0x1bd/0x1cc
i2cdev_ioctl_rdwr+0x1bb/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41

Fix this by calling virtqueue_get_buf() from the notify handler like other
virtio drivers and by actually waiting for all the buffers to be
completed.

Signed-off-by: Vincent Whitchurch 
---
 drivers/i2c/busses/i2c-virtio.c | 34 +++--
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
index 7b2474e6876f..2d3ae8e238ec 100644
--- a/drivers/i2c/busses/i2c-virtio.c
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -22,24 +22,24 @@
 /**
  * struct virtio_i2c - virtio I2C data
  * @vdev: virtio device for this controller
- * @completion: completion of virtio I2C message
  * @adap: I2C adapter for this controller
  * @vq: the virtio virtqueue for communication
  */
 struct virtio_i2c {
struct virtio_device *vdev;
-   struct completion completion;
struct i2c_adapter adap;
struct virtqueue *vq;
 };
 
 /**
  * struct virtio_i2c_req - the virtio I2C request structure
+ * @completion: completion of virtio I2C message
  * @out_hdr: the OUT header of the virtio I2C message
  * @buf: the buffer into which data is read, or from which it's written
  * @in_hdr: the IN header of the virtio I2C message
  */
 struct virtio_i2c_req {
+   struct completion completion;
	struct virtio_i2c_out_hdr out_hdr	____cacheline_aligned;
	uint8_t *buf				____cacheline_aligned;
	struct virtio_i2c_in_hdr in_hdr		____cacheline_aligned;
@@ -47,9 +47,11 @@ struct virtio_i2c_req {
 
 static void virtio_i2c_msg_done(struct virtqueue *vq)
 {
-   struct virtio_i2c *vi = vq->vdev->priv;
+   struct virtio_i2c_req *req;
+   unsigned int len;
 
-   complete(&vi->completion);
+   while ((req = virtqueue_get_buf(vq, &len)))
+   complete(&req->completion);
 }
 
 static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
@@ -69,6 +71,8 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
if (!msgs[i].len)
break;
 
+   init_completion(&reqs[i].completion);
+
/*
 * Only 7-bit mode supported for this moment. For the address
 * format, Please check the Virtio I2C Specification.
@@ -108,21 +112,13 @@ static int virtio_i2c_complete_reqs(struct virtqueue *vq,
struct virtio_i2c_req *reqs,
struct i2c_msg *msgs, int num)
 {
-   struct virtio_i2c_req *req;
bool failed = false;
-   unsigned int len;
int i, j = 0;
 
for (i = 0; i < num; i++) {
-   /* Detach the ith request from the vq */
-   req = virtqueue_get_buf(vq, &len);
+   struct virtio_i2c_req *req = &reqs[i];
 
-   /*
-* Condition req == &reqs[i] should always meet since we have
-* total num requests in the vq. reqs[i] can never be NULL here.
-*/
-   if (!failed && (WARN_ON(req != &reqs[i]) ||
-   req->in_hdr.status != VIRTIO_I2C_MSG_OK))
+   if (!failed && req->in_hdr.status != VIRTIO_I2C_MSG_OK)
failed = true;
 
i2c_put_dma_safe_msg_buf(reqs[i].buf, &msgs[i], !failed);
@@ -158,11 +154,13 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
 * remote here to clear the virtqueue, so we can try another set of
 * messages later on.
 */
-
-   reinit_completion(&vi->completion);
virtqueue_kick(vq);
 
-   wait_for_completion(&vi->completion);
+   /*
+* We only need to

[PATCH 1/2] i2c: virtio: disable timeout handling

2021-10-19 Thread Vincent Whitchurch
If a timeout is hit, it can result in incorrect data on the I2C bus
and/or memory corruption in the guest since the device can still be
operating on the buffers it was given while the guest has freed them.

Here is, for example, the start of a slub_debug splat which was
triggered on the next transfer after one transfer was forced to timeout
by setting a breakpoint in the backend (rust-vmm/vhost-device):

 BUG kmalloc-1k (Not tainted): Poison overwritten
 First byte 0x1 instead of 0x6b
 Allocated in virtio_i2c_xfer+0x65/0x35c age=350 cpu=0 pid=29
__kmalloc+0xc2/0x1c9
virtio_i2c_xfer+0x65/0x35c
__i2c_transfer+0x429/0x57d
i2c_transfer+0x115/0x134
i2cdev_ioctl_rdwr+0x16a/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41
 Freed in virtio_i2c_xfer+0x32e/0x35c age=244 cpu=0 pid=29
kfree+0x1bd/0x1cc
virtio_i2c_xfer+0x32e/0x35c
__i2c_transfer+0x429/0x57d
i2c_transfer+0x115/0x134
i2cdev_ioctl_rdwr+0x16a/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41

There is no simple fix for this (the driver would have to always create
bounce buffers and hold on to them until the device eventually returns
the buffers), so just disable the timeout support for now.

Signed-off-by: Vincent Whitchurch 
---
 drivers/i2c/busses/i2c-virtio.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
index f10a603b13fb..7b2474e6876f 100644
--- a/drivers/i2c/busses/i2c-virtio.c
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -106,11 +106,10 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
 
 static int virtio_i2c_complete_reqs(struct virtqueue *vq,
struct virtio_i2c_req *reqs,
-   struct i2c_msg *msgs, int num,
-   bool timedout)
+   struct i2c_msg *msgs, int num)
 {
struct virtio_i2c_req *req;
-   bool failed = timedout;
+   bool failed = false;
unsigned int len;
int i, j = 0;
 
@@ -132,7 +131,7 @@ static int virtio_i2c_complete_reqs(struct virtqueue *vq,
j++;
}
 
-   return timedout ? -ETIMEDOUT : j;
+   return j;
 }
 
 static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
@@ -141,7 +140,6 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
struct virtio_i2c *vi = i2c_get_adapdata(adap);
struct virtqueue *vq = vi->vq;
struct virtio_i2c_req *reqs;
-   unsigned long time_left;
int count;
 
reqs = kcalloc(num, sizeof(*reqs), GFP_KERNEL);
@@ -164,11 +162,9 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
reinit_completion(&vi->completion);
virtqueue_kick(vq);
 
-   time_left = wait_for_completion_timeout(&vi->completion, adap->timeout);
-   if (!time_left)
-   dev_err(&adap->dev, "virtio i2c backend timeout.\n");
+   wait_for_completion(&vi->completion);
 
-   count = virtio_i2c_complete_reqs(vq, reqs, msgs, count, !time_left);
+   count = virtio_i2c_complete_reqs(vq, reqs, msgs, count);
 
 err_free:
kfree(reqs);
-- 
2.28.0



[PATCH 0/2] virtio-i2c: Fix buffer handling

2021-10-19 Thread Vincent Whitchurch
This fixes a couple of bugs in the buffer handling in virtio-i2c which can
result in incorrect data on the I2C bus or memory corruption in the guest.

I tested this on UML (virtio-uml needs a bug fix too, I will send that out
later) with the device implementation in rust-vmm/vhost-device.

Vincent Whitchurch (2):
  i2c: virtio: disable timeout handling
  i2c: virtio: fix completion handling

 drivers/i2c/busses/i2c-virtio.c | 46 ++---
 1 file changed, 19 insertions(+), 27 deletions(-)

-- 
2.28.0



[RFC PATCH 02/10] vhost: push virtqueue area pointers into a user struct

2021-09-29 Thread Vincent Whitchurch
In order to prepare for allowing vhost to operate on kernel buffers,
push the virtqueue desc/avail/used area pointers down to a new "user"
struct.

No functional change intended.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/vdpa.c  |  6 +--
 drivers/vhost/vhost.c | 90 +--
 drivers/vhost/vhost.h |  8 ++--
 3 files changed, 53 insertions(+), 51 deletions(-)
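
The vhost.h hunk does not appear below (the diff is truncated before it), but
based on the vq->user.desc/avail/used accesses in the vhost.c changes, the new
struct presumably looks roughly like this; an assumption, with the field types
taken from the existing vhost_virtqueue members they replace:

#include <linux/compiler.h>
#include <linux/virtio_ring.h>

struct vhost_vring_user {
        struct vring_desc __user *desc;
        struct vring_avail __user *avail;
        struct vring_used __user *used;
};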

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index f41d081777f5..6f05388f5a21 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -400,9 +400,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
switch (cmd) {
case VHOST_SET_VRING_ADDR:
if (ops->set_vq_address(vdpa, idx,
-   (u64)(uintptr_t)vq->desc,
-   (u64)(uintptr_t)vq->avail,
-   (u64)(uintptr_t)vq->used))
+   (u64)(uintptr_t)vq->user.desc,
+   (u64)(uintptr_t)vq->user.avail,
+   (u64)(uintptr_t)vq->user.used))
r = -EINVAL;
break;
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe2..108994f386f7 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -46,8 +46,8 @@ enum {
VHOST_MEMORY_F_LOG = 0x1,
 };
 
-#define vhost_used_event(vq) ((__virtio16 __user *)&vq->avail->ring[vq->num])
-#define vhost_avail_event(vq) ((__virtio16 __user *)&vq->used->ring[vq->num])
+#define vhost_used_event(vq) ((__virtio16 __user *)&vq->user.avail->ring[vq->num])
+#define vhost_avail_event(vq) ((__virtio16 __user *)&vq->user.used->ring[vq->num])
 
 #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
 static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
@@ -306,7 +306,7 @@ static void vhost_vring_call_reset(struct vhost_vring_call *call_ctx)
 
 bool vhost_vq_is_setup(struct vhost_virtqueue *vq)
 {
-   return vq->avail && vq->desc && vq->used && vhost_vq_access_ok(vq);
+   return vq->user.avail && vq->user.desc && vq->user.used && vhost_vq_access_ok(vq);
 }
 EXPORT_SYMBOL_GPL(vhost_vq_is_setup);
 
@@ -314,9 +314,9 @@ static void vhost_vq_reset(struct vhost_dev *dev,
   struct vhost_virtqueue *vq)
 {
vq->num = 1;
-   vq->desc = NULL;
-   vq->avail = NULL;
-   vq->used = NULL;
+   vq->user.desc = NULL;
+   vq->user.avail = NULL;
+   vq->user.used = NULL;
vq->last_avail_idx = 0;
vq->avail_idx = 0;
vq->last_used_idx = 0;
@@ -444,8 +444,8 @@ static size_t vhost_get_avail_size(struct vhost_virtqueue *vq,
size_t event __maybe_unused =
   vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
 
-   return sizeof(*vq->avail) +
-  sizeof(*vq->avail->ring) * num + event;
+   return sizeof(*vq->user.avail) +
+  sizeof(*vq->user.avail->ring) * num + event;
 }
 
 static size_t vhost_get_used_size(struct vhost_virtqueue *vq,
@@ -454,14 +454,14 @@ static size_t vhost_get_used_size(struct vhost_virtqueue *vq,
size_t event __maybe_unused =
   vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
 
-   return sizeof(*vq->used) +
-  sizeof(*vq->used->ring) * num + event;
+   return sizeof(*vq->user.used) +
+  sizeof(*vq->user.used->ring) * num + event;
 }
 
 static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
  unsigned int num)
 {
-   return sizeof(*vq->desc) * num;
+   return sizeof(*vq->user.desc) * num;
 }
 
 void vhost_dev_init(struct vhost_dev *dev,
@@ -959,7 +959,7 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq,
 struct vring_used_elem *head, int idx,
 int count)
 {
-   return vhost_copy_to_user(vq, vq->used->ring + idx, head,
+   return vhost_copy_to_user(vq, vq->user.used->ring + idx, head,
  count * sizeof(*head));
 }
 
@@ -967,14 +967,14 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq)
 
 {
return vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags),
- &vq->used->flags);
+ &vq->user.used->flags);
 }
 
 static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
 
 {
return vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
- &vq->used->idx);
+ &vq->user.used->idx);
 }
 
 #define vhost_get_user(vq, x, pt

[RFC PATCH 06/10] vhost: extract ioctl locking to common code

2021-09-29 Thread Vincent Whitchurch
Extract the mutex locking for the vhost ioctl into common code.  This
will allow the common code to easily add some extra handling required
for adding a kernel API to control vhost.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/common.c |  7 ++-
 drivers/vhost/net.c| 14 +-
 drivers/vhost/vhost.c  | 10 --
 drivers/vhost/vhost.h  |  1 +
 drivers/vhost/vsock.c  | 12 
 5 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/drivers/vhost/common.c b/drivers/vhost/common.c
index 27d4672b15d3..a5722ad65e24 100644
--- a/drivers/vhost/common.c
+++ b/drivers/vhost/common.c
@@ -60,8 +60,13 @@ static long vhost_ioctl(struct file *file, unsigned int ioctl, unsigned long arg
 {
struct vhost_dev *dev = file->private_data;
struct vhost *vhost = dev->vhost;
+   long ret;
 
-   return vhost->ops->ioctl(dev, ioctl, arg);
+   mutex_lock(&dev->mutex);
+   ret = vhost->ops->ioctl(dev, ioctl, arg);
+   mutex_unlock(&dev->mutex);
+
+   return ret;
 }
 
 static ssize_t vhost_read_iter(struct kiocb *iocb, struct iov_iter *to)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8910d9e2a74e..b5590b7862a9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1505,7 +1505,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
struct vhost_net_ubuf_ref *ubufs, *oldubufs = NULL;
int r;
 
-   mutex_lock(&n->dev.mutex);
r = vhost_dev_check_owner(&n->dev);
if (r)
goto err;
@@ -1573,7 +1572,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
sockfd_put(oldsock);
}
 
-   mutex_unlock(&n->dev.mutex);
return 0;
 
 err_used:
@@ -1587,7 +1585,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 err_vq:
mutex_unlock(&vq->mutex);
 err:
-   mutex_unlock(&n->dev.mutex);
return r;
 }
 
@@ -1598,7 +1595,6 @@ static long vhost_net_reset_owner(struct vhost_net *n)
long err;
struct vhost_iotlb *umem;
 
-   mutex_lock(&n->dev.mutex);
err = vhost_dev_check_owner(&n->dev);
if (err)
goto done;
@@ -1613,7 +1609,6 @@ static long vhost_net_reset_owner(struct vhost_net *n)
vhost_dev_reset_owner(&n->dev, umem);
vhost_net_vq_reset(n);
 done:
-   mutex_unlock(&n->dev.mutex);
if (tx_sock)
sockfd_put(tx_sock);
if (rx_sock)
@@ -1639,7 +1634,6 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
vhost_hlen = 0;
sock_hlen = hdr_len;
}
-   mutex_lock(&n->dev.mutex);
if ((features & (1 << VHOST_F_LOG_ALL)) &&
!vhost_log_access_ok(&n->dev))
goto out_unlock;
@@ -1656,11 +1650,9 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
n->vqs[i].sock_hlen = sock_hlen;
mutex_unlock(&n->vqs[i].vq.mutex);
}
-   mutex_unlock(&n->dev.mutex);
return 0;
 
 out_unlock:
-   mutex_unlock(&n->dev.mutex);
return -EFAULT;
 }
 
@@ -1668,7 +1660,6 @@ static long vhost_net_set_owner(struct vhost_net *n)
 {
int r;
 
-   mutex_lock(&n->dev.mutex);
if (vhost_dev_has_owner(&n->dev)) {
r = -EBUSY;
goto out;
@@ -1681,7 +1672,6 @@ static long vhost_net_set_owner(struct vhost_net *n)
vhost_net_clear_ubuf_info(n);
vhost_net_flush(n);
 out:
-   mutex_unlock(&n->dev.mutex);
return r;
 }
 
@@ -1721,20 +1711,18 @@ static long vhost_net_ioctl(struct vhost_dev *dev, unsigned int ioctl,
return -EFAULT;
if (features & ~VHOST_NET_BACKEND_FEATURES)
return -EOPNOTSUPP;
-   vhost_set_backend_features(&n->dev, features);
+   __vhost_set_backend_features(&n->dev, features);
return 0;
case VHOST_RESET_OWNER:
return vhost_net_reset_owner(n);
case VHOST_SET_OWNER:
return vhost_net_set_owner(n);
default:
-   mutex_lock(&n->dev.mutex);
r = vhost_dev_ioctl(&n->dev, ioctl, argp);
if (r == -ENOIOCTLCMD)
r = vhost_vring_ioctl(&n->dev, ioctl, argp);
else
vhost_net_flush(n);
-   mutex_unlock(&n->dev.mutex);
return r;
}
 }
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9354061ce75e..9d6496b7ad85 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2821,18 +2821,24 @@ struct vhost_msg_node *vhost_de

[RFC PATCH 08/10] vhost: net: add support for kernel control

2021-09-29 Thread Vincent Whitchurch
Add support for kernel control to vhost-net.  For the vhost-*-kernel
devices, the ioctl to set the backend only provides the socket to
vhost-net but does not start the handling of the virtqueues.  The
handling of the virtqueues is started and stopped by the kernel.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/net.c | 106 
 1 file changed, 98 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index b5590b7862a9..977cfa89b216 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -144,6 +144,9 @@ struct vhost_net {
struct page_frag page_frag;
/* Refcount bias of page frag */
int refcnt_bias;
+   /* Used for storing backend sockets when stopped under kernel control */
+   struct socket *tx_sock;
+   struct socket *rx_sock;
 };
 
 static unsigned vhost_net_zcopy_mask __read_mostly;
@@ -1293,6 +1296,8 @@ static struct vhost_dev *vhost_net_open(struct vhost *vhost)
n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
if (!n)
return ERR_PTR(-ENOMEM);
+   n->tx_sock = NULL;
+   n->rx_sock = NULL;
vqs = kmalloc_array(VHOST_NET_VQ_MAX, sizeof(*vqs), GFP_KERNEL);
if (!vqs) {
kvfree(n);
@@ -1364,6 +1369,20 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
return sock;
 }
 
+/* Stops the virtqueue but doesn't unconsume the tap ring */
+static struct socket *__vhost_net_stop_vq(struct vhost_net *n,
+ struct vhost_virtqueue *vq)
+{
+   struct socket *sock;
+
+   mutex_lock(&vq->mutex);
+   sock = vhost_vq_get_backend(vq);
+   vhost_net_disable_vq(n, vq);
+   vhost_vq_set_backend(vq, NULL);
+   mutex_unlock(&vq->mutex);
+   return sock;
+}
+
 static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock,
   struct socket **rx_sock)
 {
@@ -1394,6 +1413,57 @@ static void vhost_net_flush(struct vhost_net *n)
}
 }
 
+static void vhost_net_start_vq(struct vhost_net *n, struct vhost_virtqueue *vq,
+  struct socket *sock)
+{
+   mutex_lock(&vq->mutex);
+   vhost_vq_set_backend(vq, sock);
+   vhost_vq_init_access(vq);
+   vhost_net_enable_vq(n, vq);
+   mutex_unlock(&vq->mutex);
+}
+
+static void vhost_net_dev_start_vq(struct vhost_dev *dev, u16 idx)
+{
+   struct vhost_net *n = container_of(dev, struct vhost_net, dev);
+
+   if (WARN_ON(idx >= ARRAY_SIZE(n->vqs)))
+   return;
+
+   if (idx == VHOST_NET_VQ_RX) {
+   vhost_net_start_vq(n, &n->vqs[idx].vq, n->rx_sock);
+   n->rx_sock = NULL;
+   } else if (idx == VHOST_NET_VQ_TX) {
+   vhost_net_start_vq(n, &n->vqs[idx].vq, n->tx_sock);
+   n->tx_sock = NULL;
+   }
+
+   vhost_net_flush_vq(n, idx);
+}
+
+static void vhost_net_dev_stop_vq(struct vhost_dev *dev, u16 idx)
+{
+   struct vhost_net *n = container_of(dev, struct vhost_net, dev);
+   struct socket *sock;
+
+   if (WARN_ON(idx >= ARRAY_SIZE(n->vqs)))
+   return;
+
+   if (!vhost_vq_get_backend(&n->vqs[idx].vq))
+   return;
+
+   sock = __vhost_net_stop_vq(n, &n->vqs[idx].vq);
+
+   vhost_net_flush(n);
+   synchronize_rcu();
+   vhost_net_flush(n);
+
+   if (idx == VHOST_NET_VQ_RX)
+   n->rx_sock = sock;
+   else if (idx == VHOST_NET_VQ_TX)
+   n->tx_sock = sock;
+}
+
 static void vhost_net_release(struct vhost_dev *dev)
 {
struct vhost_net *n = container_of(dev, struct vhost_net, dev);
@@ -1405,6 +1475,14 @@ static void vhost_net_release(struct vhost_dev *dev)
vhost_dev_stop(&n->dev);
vhost_dev_cleanup(&n->dev);
vhost_net_vq_reset(n);
+   if (n->tx_sock) {
+   WARN_ON(tx_sock);
+   tx_sock = n->tx_sock;
+   }
+   if (n->rx_sock) {
+   WARN_ON(rx_sock);
+   rx_sock = n->rx_sock;
+   }
if (tx_sock)
sockfd_put(tx_sock);
if (rx_sock)
@@ -1518,7 +1596,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
mutex_lock(&vq->mutex);
 
/* Verify that ring has been setup correctly. */
-   if (!vhost_vq_access_ok(vq)) {
+   if (!vhost_kernel(vq) && !vhost_vq_access_ok(vq)) {
r = -EFAULT;
goto err_vq;
}
@@ -1539,14 +1617,17 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
}
 
vhost_net_disable_vq(n, vq);
-   vhost_vq_set_backend(vq, sock);
+   if (!vhost_kernel(vq))
+   vhost_vq_set_backend(vq, sock);
  

[RFC PATCH 05/10] vhost: extract common code for file_operations handling

2021-09-29 Thread Vincent Whitchurch
There is some duplicated code for handling of file_operations among
vhost drivers.  Move this to a common file.

Having file_operations in a common place also makes adding functions for
obtaining a handle to a vhost device from a file descriptor simpler.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/Makefile |   3 +
 drivers/vhost/common.c | 134 +
 drivers/vhost/net.c|  79 +++-
 drivers/vhost/vhost.h  |  15 +
 drivers/vhost/vsock.c  |  75 +++
 5 files changed, 197 insertions(+), 109 deletions(-)
 create mode 100644 drivers/vhost/common.c

diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index f3e1897cce85..b1ddc976aede 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -15,5 +15,8 @@ vhost_vdpa-y := vdpa.o
 
 obj-$(CONFIG_VHOST)+= vhost.o
 
+obj-$(CONFIG_VHOST)+= vhost_common.o
+vhost_common-y := common.o
+
 obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
 vhost_iotlb-y := iotlb.o
diff --git a/drivers/vhost/common.c b/drivers/vhost/common.c
new file mode 100644
index ..27d4672b15d3
--- /dev/null
+++ b/drivers/vhost/common.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vhost.h"
+
+struct vhost_ops;
+
+struct vhost {
+   struct miscdevice misc;
+   const struct vhost_ops *ops;
+};
+
+static int vhost_open(struct inode *inode, struct file *file)
+{
+   struct miscdevice *misc = file->private_data;
+   struct vhost *vhost = container_of(misc, struct vhost, misc);
+   struct vhost_dev *dev;
+
+   dev = vhost->ops->open(vhost);
+   if (IS_ERR(dev))
+   return PTR_ERR(dev);
+
+   dev->vhost = vhost;
+   dev->file = file;
+   file->private_data = dev;
+
+   return 0;
+}
+
+static int vhost_release(struct inode *inode, struct file *file)
+{
+   struct vhost_dev *dev = file->private_data;
+   struct vhost *vhost = dev->vhost;
+
+   vhost->ops->release(dev);
+
+   return 0;
+}
+
+static long vhost_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
+{
+   struct vhost_dev *dev = file->private_data;
+   struct vhost *vhost = dev->vhost;
+
+   return vhost->ops->ioctl(dev, ioctl, arg);
+}
+
+static ssize_t vhost_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+   struct file *file = iocb->ki_filp;
+   struct vhost_dev *dev = file->private_data;
+   int noblock = file->f_flags & O_NONBLOCK;
+
+   return vhost_chr_read_iter(dev, to, noblock);
+}
+
+static ssize_t vhost_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+   struct file *file = iocb->ki_filp;
+   struct vhost_dev *dev = file->private_data;
+
+   return vhost_chr_write_iter(dev, from);
+}
+
+static __poll_t vhost_poll(struct file *file, poll_table *wait)
+{
+   struct vhost_dev *dev = file->private_data;
+
+   return vhost_chr_poll(file, dev, wait);
+}
+
+static const struct file_operations vhost_fops = {
+   .owner  = THIS_MODULE,
+   .open   = vhost_open,
+   .release= vhost_release,
+   .llseek = noop_llseek,
+   .unlocked_ioctl = vhost_ioctl,
+   .compat_ioctl   = compat_ptr_ioctl,
+   .read_iter  = vhost_read_iter,
+   .write_iter = vhost_write_iter,
+   .poll   = vhost_poll,
+};
+
+struct vhost *vhost_register(const struct vhost_ops *ops)
+{
+   struct vhost *vhost;
+   int ret;
+
+   vhost = kzalloc(sizeof(*vhost), GFP_KERNEL);
+   if (!vhost)
+   return ERR_PTR(-ENOMEM);
+
+   vhost->misc.minor = ops->minor;
+   vhost->misc.name = ops->name;
+   vhost->misc.fops = &vhost_fops;
+   vhost->ops = ops;
+
+   ret = misc_register(&vhost->misc);
+   if (ret) {
+   kfree(vhost);
+   return ERR_PTR(ret);
+   }
+
+   return vhost;
+}
+EXPORT_SYMBOL_GPL(vhost_register);
+
+void vhost_unregister(struct vhost *vhost)
+{
+   misc_deregister(&vhost->misc);
+   kfree(vhost);
+}
+EXPORT_SYMBOL_GPL(vhost_unregister);
+
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8f82b646d4af..8910d9e2a74e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1281,7 +1281,7 @@ static void handle_rx_net(struct vhost_work *work)
handle_rx(net);
 }
 
-static int vhost_net_open(struct inode *inode, struct file *f)
+static struct vhost_dev *vhost_net_open(struct vhost *vhost)
 {
struct vhost_net *n;
struct vhost_dev *dev;
@@ -1292,11 +1292,11 @@ static int vhost_net_open(struct inode *inode, stru

[RFC PATCH 09/10] vdpa: add test driver for kernel buffers in vhost

2021-09-29 Thread Vincent Whitchurch
Add a driver which uses the kernel buffer support in vhost to allow
virtio-net and vhost-net to be run in a loopback setup on the same
system.

While this feature could be useful on its own (for example for
development of the vhost/virtio drivers), this driver is primarily
intended to be used for testing the support for kernel buffers in vhost.

A selftest which uses this driver will be added.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vdpa/Kconfig  |   8 +
 drivers/vdpa/Makefile |   1 +
 drivers/vdpa/vhost_kernel_test/Makefile   |   2 +
 .../vhost_kernel_test/vhost_kernel_test.c | 575 ++
 4 files changed, 586 insertions(+)
 create mode 100644 drivers/vdpa/vhost_kernel_test/Makefile
 create mode 100644 drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c

diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 3d91982d8371..308e5f11d2a9 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -43,6 +43,14 @@ config VDPA_USER
  With VDUSE it is possible to emulate a vDPA Device
  in a userspace program.
 
+config VHOST_KERNEL_TEST
+   tristate "vhost kernel test driver"
+   depends on EVENTFD
+   select VHOST
+   select VHOST_KERNEL
+   help
+ Test driver for the vhost kernel-space buffer support.
+
 config IFCVF
tristate "Intel IFC VF vDPA driver"
depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index f02ebed33f19..4ba8a4b350c4 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_VDPA) += vdpa.o
 obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
 obj-$(CONFIG_VDPA_USER) += vdpa_user/
+obj-$(CONFIG_VHOST_KERNEL_TEST) += vhost_kernel_test/
 obj-$(CONFIG_IFCVF)+= ifcvf/
 obj-$(CONFIG_MLX5_VDPA) += mlx5/
 obj-$(CONFIG_VP_VDPA)+= virtio_pci/
diff --git a/drivers/vdpa/vhost_kernel_test/Makefile b/drivers/vdpa/vhost_kernel_test/Makefile
new file mode 100644
index ..7e0c7bdb3c0e
--- /dev/null
+++ b/drivers/vdpa/vhost_kernel_test/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_VHOST_KERNEL_TEST) += vhost_kernel_test.o
diff --git a/drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c b/drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c
new file mode 100644
index ..82364cd02667
--- /dev/null
+++ b/drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c
@@ -0,0 +1,575 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct vktest_vq {
+   struct vktest *vktest;
+   struct eventfd_ctx *kick;
+   struct eventfd_ctx *call;
+   u64 desc_addr;
+   u64 device_addr;
+   u64 driver_addr;
+   u32 num;
+   bool ready;
+   wait_queue_entry_t call_wait;
+   wait_queue_head_t *wqh;
+   poll_table call_pt;
+   struct vdpa_callback cb;
+   struct irq_work irq_work;
+};
+
+struct vktest {
+   struct vdpa_device vdpa;
+   struct mutex mutex;
+   struct vhost_dev *vhost;
+   struct virtio_net_config config;
+   struct vktest_vq vqs[2];
+   u8 status;
+};
+
+static struct vktest *vdpa_to_vktest(struct vdpa_device *vdpa)
+{
+   return container_of(vdpa, struct vktest, vdpa);
+}
+
+static int vktest_set_vq_address(struct vdpa_device *vdpa, u16 idx,
+u64 desc_area, u64 driver_area,
+u64 device_area)
+{
+   struct vktest *vktest = vdpa_to_vktest(vdpa);
+   struct vktest_vq *vq = &vktest->vqs[idx];
+
+   vq->desc_addr = desc_area;
+   vq->driver_addr = driver_area;
+   vq->device_addr = device_area;
+
+   return 0;
+}
+
+static void vktest_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num)
+{
+   struct vktest *vktest = vdpa_to_vktest(vdpa);
+   struct vktest_vq *vq = &vktest->vqs[idx];
+
+   vq->num = num;
+}
+
+static void vktest_kick_vq(struct vdpa_device *vdpa, u16 idx)
+{
+   struct vktest *vktest = vdpa_to_vktest(vdpa);
+   struct vktest_vq *vq = &vktest->vqs[idx];
+
+   if (vq->kick)
+   eventfd_signal(vq->kick, 1);
+}
+
+static void vktest_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
+struct vdpa_callback *cb)
+{
+   struct vktest *vktest = vdpa_to_vktest(vdpa);
+   struct vktest_vq *vq = &vktest->vqs[idx];
+
+   vq->cb = *cb;
+}
+
+static void vktest_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
+{
+   struct vktest *vktest = vdpa_to_vktest(vdpa);
+   struct vktest_vq *vq = &vktest->vqs[idx];
+   struct vhost_dev *vhost = vktest->vhost;
+
+   if (

[RFC PATCH 07/10] vhost: add support for kernel control

2021-09-29 Thread Vincent Whitchurch
Add functions to allow vhost buffers to be placed in kernel space and
for the vhost driver to be controlled from a kernel driver after initial
setup by userspace.

The kernel control is only possible on new /dev/vhost-*-kernel devices,
and on these devices userspace cannot write to the iotlb, nor can it
control the placement and attributes of the virtqueues, nor start/stop
the virtqueue handling after the kernel starts using it.
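
For illustration, a kernel consumer of this interface might look roughly like
the following (a minimal sketch: only vhost_dev_get() and <linux/vhost.h> come
from this series, everything else is placeholder code):

#include <linux/err.h>
#include <linux/vhost.h>

/* 'fd' refers to a /dev/vhost-*-kernel device prepared by userspace and
 * handed over to this driver, e.g. via a driver-specific ioctl. */
static struct vhost_dev *example_take_over_vhost(int fd)
{
	struct vhost_dev *vdev = vhost_dev_get(fd);

	if (IS_ERR(vdev))
		return vdev;

	/* From this point the kernel owns the device: userspace can no
	 * longer change virtqueue placement or the iotlb, and the caller
	 * programs the ring addresses and IOTLB entries itself. */
	return vdev;
}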

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/common.c | 201 +
 drivers/vhost/vhost.c  |  92 +--
 drivers/vhost/vhost.h  |   3 +
 include/linux/vhost.h  |  23 +
 4 files changed, 310 insertions(+), 9 deletions(-)
 create mode 100644 include/linux/vhost.h

diff --git a/drivers/vhost/common.c b/drivers/vhost/common.c
index a5722ad65e24..f9758920a33a 100644
--- a/drivers/vhost/common.c
+++ b/drivers/vhost/common.c
@@ -25,7 +25,9 @@
 struct vhost_ops;
 
 struct vhost {
+   char kernelname[128];
struct miscdevice misc;
+   struct miscdevice kernelmisc;
const struct vhost_ops *ops;
 };
 
@@ -46,6 +48,24 @@ static int vhost_open(struct inode *inode, struct file *file)
return 0;
 }
 
+static int vhost_kernel_open(struct inode *inode, struct file *file)
+{
+   struct miscdevice *misc = file->private_data;
+   struct vhost *vhost = container_of(misc, struct vhost, kernelmisc);
+   struct vhost_dev *dev;
+
+   dev = vhost->ops->open(vhost);
+   if (IS_ERR(dev))
+   return PTR_ERR(dev);
+
+   dev->vhost = vhost;
+   dev->file = file;
+   dev->kernel = true;
+   file->private_data = dev;
+
+   return 0;
+}
+
 static int vhost_release(struct inode *inode, struct file *file)
 {
struct vhost_dev *dev = file->private_data;
@@ -69,6 +89,46 @@ static long vhost_ioctl(struct file *file, unsigned int ioctl, unsigned long arg
return ret;
 }
 
+static long vhost_kernel_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
+{
+   struct vhost_dev *dev = file->private_data;
+   struct vhost *vhost = dev->vhost;
+   long ret;
+
+   /* Only the kernel is allowed to control virtqueue attributes */
+   switch (ioctl) {
+   case VHOST_SET_VRING_NUM:
+   case VHOST_SET_VRING_ADDR:
+   case VHOST_SET_VRING_BASE:
+   case VHOST_SET_VRING_ENDIAN:
+   case VHOST_SET_MEM_TABLE:
+   case VHOST_SET_LOG_BASE:
+   case VHOST_SET_LOG_FD:
+   return -EPERM;
+   }
+
+   mutex_lock(&dev->mutex);
+
+   /*
+* Userspace should perform all required setup on the vhost device
+* _before_ asking the kernel to start using it.
+*
+* Note that ->kernel_attached is never reset, if userspace wants to
+* attach again it should open the device again.
+*/
+   if (dev->kernel_attached) {
+   ret = -EPERM;
+   goto out_unlock;
+   }
+
+   ret = vhost->ops->ioctl(dev, ioctl, arg);
+
+out_unlock:
+   mutex_unlock(&dev->mutex);
+
+   return ret;
+}
+
 static ssize_t vhost_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
struct file *file = iocb->ki_filp;
@@ -105,6 +165,129 @@ static const struct file_operations vhost_fops = {
.poll   = vhost_poll,
 };
 
+static const struct file_operations vhost_kernel_fops = {
+   .owner  = THIS_MODULE,
+   .open   = vhost_kernel_open,
+   .release= vhost_release,
+   .llseek = noop_llseek,
+   .unlocked_ioctl = vhost_kernel_ioctl,
+   .compat_ioctl   = compat_ptr_ioctl,
+};
+
+static void vhost_dev_lock_vqs(struct vhost_dev *d)
+{
+   int i;
+
+   for (i = 0; i < d->nvqs; ++i)
+   mutex_lock_nested(&d->vqs[i]->mutex, i);
+}
+
+static void vhost_dev_unlock_vqs(struct vhost_dev *d)
+{
+   int i;
+
+   for (i = 0; i < d->nvqs; ++i)
+   mutex_unlock(&d->vqs[i]->mutex);
+}
+
+struct vhost_dev *vhost_dev_get(int fd)
+{
+   struct file *file;
+   struct vhost_dev *dev;
+   struct vhost_dev *ret;
+   int err;
+   int i;
+
+   file = fget(fd);
+   if (!file)
+   return ERR_PTR(-EBADF);
+
+   if (file->f_op != &vhost_kernel_fops) {
+   ret = ERR_PTR(-EINVAL);
+   goto err_fput;
+   }
+
+   dev = file->private_data;
+
+   mutex_lock(&dev->mutex);
+   vhost_dev_lock_vqs(dev);
+
+   err = vhost_dev_check_owner(dev);
+   if (err) {
+   ret = ERR_PTR(err);
+   goto err_unlock;
+   }
+
+   if (dev->kernel_attached) {
+   ret = ERR_PTR(-EBUSY);
+   goto err_unlock;
+   }
+
+   if (!dev->iotlb) {
+   ret = ERR_PTR(-EINVAL);
+   goto err_unlock;
+   }
+
+   fo

[RFC PATCH 03/10] vhost: add iov wrapper

2021-09-29 Thread Vincent Whitchurch
In order to prepare for supporting buffers in kernel space, add a
vhost_iov struct to wrap the userspace iovec, add helper functions for
accessing this struct, and use these helpers from all vhost drivers.
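
For reference, the wrapper used by the hunks below can be pictured roughly like
this; the layout is reconstructed from the call sites (and from the
kernel-buffer patch later in the series), not copied from the actual header, so
treat it as a sketch:

struct vhost_iov {
	union {
		struct iovec iovec;	/* userspace buffers (current behaviour) */
		struct kvec kvec;	/* kernel buffers (enabled by a later patch) */
	};
};

static inline size_t vhost_iov_len(struct vhost_virtqueue *vq,
				   struct vhost_iov *iov)
{
	/* struct iovec and struct kvec share the iov_base/iov_len layout,
	 * so the length can be read through either member. */
	return iov->iovec.iov_len;
}

The extra vq argument taken by vhost_iov_len()/vhost_iov_length()/
vhost_iov_iter_init() is presumably what lets the helpers pick the right
representation per device once kernel buffers are supported.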

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/net.c   | 13 ++--
 drivers/vhost/scsi.c  | 30 +--
 drivers/vhost/test.c  |  2 +-
 drivers/vhost/vhost.c | 25 +++---
 drivers/vhost/vhost.h | 48 +--
 drivers/vhost/vsock.c |  8 
 6 files changed, 81 insertions(+), 45 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 28ef323882fb..8f82b646d4af 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -607,9 +607,9 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter,
size_t hdr_size, int out)
 {
/* Skip header. TODO: support TSO. */
-   size_t len = iov_length(vq->iov, out);
+   size_t len = vhost_iov_length(vq, vq->iov, out);
 
-   iov_iter_init(iter, WRITE, vq->iov, out, len);
+   vhost_iov_iter_init(vq, iter, WRITE, vq->iov, out, len);
iov_iter_advance(iter, hdr_size);
 
return iov_iter_count(iter);
@@ -1080,7 +1080,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
log += *log_num;
}
heads[headcount].id = cpu_to_vhost32(vq, d);
-   len = iov_length(vq->iov + seg, in);
+   len = vhost_iov_length(vq, vq->iov + seg, in);
heads[headcount].len = cpu_to_vhost32(vq, len);
datalen -= len;
++headcount;
@@ -1182,14 +1182,14 @@ static void handle_rx(struct vhost_net *net)
msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
/* On overrun, truncate and discard */
if (unlikely(headcount > UIO_MAXIOV)) {
-   iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
+   vhost_iov_iter_init(vq, &msg.msg_iter, READ, vq->iov, 1, 1);
err = sock->ops->recvmsg(sock, &msg,
 1, MSG_DONTWAIT | MSG_TRUNC);
pr_debug("Discarded rx packet: len %zd\n", sock_len);
continue;
}
/* We don't need to be notified again. */
-   iov_iter_init(&msg.msg_iter, READ, vq->iov, in, vhost_len);
+   vhost_iov_iter_init(vq, &msg.msg_iter, READ, vq->iov, in, vhost_len);
fixup = msg.msg_iter;
if (unlikely((vhost_hlen))) {
/* We will supply the header ourselves
@@ -1212,8 +1212,7 @@ static void handle_rx(struct vhost_net *net)
if (unlikely(vhost_hlen)) {
if (copy_to_iter(&hdr, sizeof(hdr),
 &fixup) != sizeof(hdr)) {
-   vq_err(vq, "Unable to write vnet_hdr "
-  "at addr %p\n", vq->iov->iov_base);
+   vq_err(vq, "Unable to write vnet_hdr");
goto out;
}
} else {
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index bcf53685439d..22a372b52165 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -80,7 +80,7 @@ struct vhost_scsi_cmd {
struct scatterlist *tvc_prot_sgl;
struct page **tvc_upages;
/* Pointer to response header iovec */
-   struct iovec tvc_resp_iov;
+   struct vhost_iov tvc_resp_iov;
/* Pointer to vhost_scsi for our device */
struct vhost_scsi *tvc_vhost;
/* Pointer to vhost_virtqueue for the cmd */
@@ -208,7 +208,7 @@ struct vhost_scsi_tmf {
struct se_cmd se_cmd;
u8 scsi_resp;
struct vhost_scsi_inflight *inflight;
-   struct iovec resp_iov;
+   struct vhost_iov resp_iov;
int in_iovs;
int vq_desc;
 };
@@ -487,9 +487,9 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
return;
}
 
-   if ((vq->iov[out].iov_len != sizeof(struct virtio_scsi_event))) {
+   if (vhost_iov_len(vq, &vq->iov[out]) != sizeof(struct virtio_scsi_event)) {
vq_err(vq, "Expecting virtio_scsi_event, got %zu bytes\n",
-   vq->iov[out].iov_len);
+   vhost_iov_len(vq, &vq->iov[out]));
vs->vs_events_missed = true;
return;
}
@@ -499,7 +499,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
vs->vs_events_missed = false;
}
 
-   iov_iter_init(&iov_iter, READ, &v

[RFC PATCH 01/10] vhost: scsi: use copy_to_iter()

2021-09-29 Thread Vincent Whitchurch
Use copy_to_iter() instead of __copy_to_user() when accessing the virtio
buffers as a preparation for supporting kernel-space buffers in vhost.

It also makes the code consistent since the driver is already using
copy_to_iter() in the other places it accesses the queued buffers.
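
The difference matters when the guest places a response across more than one
device-writable descriptor: __copy_to_user() writes through a single pointer,
while an iov_iter built over all 'in' iovecs handles the split transparently.
Inside vhost_scsi_send_bad_target()-style code, the pattern used in the hunks
below boils down to:

	struct virtio_scsi_cmd_resp rsp = { .response = VIRTIO_SCSI_S_BAD_TARGET };
	struct iov_iter iov_iter;

	/* Cover every device-writable iovec, not just iov[out]. */
	iov_iter_init(&iov_iter, READ, &vq->iov[out], in, sizeof(rsp));
	if (copy_to_iter(&rsp, sizeof(rsp), &iov_iter) != sizeof(rsp))
		pr_err("Faulted on virtio_scsi_cmd_resp\n");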

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/scsi.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 532e204f2b1b..bcf53685439d 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -462,7 +462,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 {
struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
struct virtio_scsi_event *event = &evt->event;
-   struct virtio_scsi_event __user *eventp;
+   struct iov_iter iov_iter;
unsigned out, in;
int head, ret;
 
@@ -499,9 +499,10 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
vs->vs_events_missed = false;
}
 
-   eventp = vq->iov[out].iov_base;
-   ret = __copy_to_user(eventp, event, sizeof(*event));
-   if (!ret)
+   iov_iter_init(&iov_iter, READ, &vq->iov[out], in, sizeof(*event));
+
+   ret = copy_to_iter(event, sizeof(*event), &iov_iter);
+   if (ret == sizeof(*event))
vhost_add_used_and_signal(&vs->dev, vq, head, 0);
else
vq_err(vq, "Faulted on vhost_scsi_send_event\n");
@@ -802,17 +803,18 @@ static void vhost_scsi_target_queue_cmd(struct vhost_scsi_cmd *cmd)
 static void
 vhost_scsi_send_bad_target(struct vhost_scsi *vs,
   struct vhost_virtqueue *vq,
-  int head, unsigned out)
+  int head, unsigned out, unsigned in)
 {
-   struct virtio_scsi_cmd_resp __user *resp;
struct virtio_scsi_cmd_resp rsp;
+   struct iov_iter iov_iter;
int ret;
 
+   iov_iter_init(&iov_iter, READ, &vq->iov[out], in, sizeof(rsp));
+
memset(&rsp, 0, sizeof(rsp));
rsp.response = VIRTIO_SCSI_S_BAD_TARGET;
-   resp = vq->iov[out].iov_base;
-   ret = __copy_to_user(resp, &rsp, sizeof(rsp));
-   if (!ret)
+   ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
+   if (ret == sizeof(rsp))
vhost_add_used_and_signal(&vs->dev, vq, head, 0);
else
pr_err("Faulted on virtio_scsi_cmd_resp\n");
@@ -1124,7 +1126,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
if (ret == -ENXIO)
break;
else if (ret == -EIO)
-   vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out);
+   vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out, vc.in);
} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));
 out:
mutex_unlock(&vq->mutex);
@@ -1347,7 +1349,7 @@ vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
if (ret == -ENXIO)
break;
else if (ret == -EIO)
-   vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out);
+   vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out, vc.in);
} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));
 out:
mutex_unlock(&vq->mutex);
-- 
2.28.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 10/10] selftests: add vhost_kernel tests

2021-09-29 Thread Vincent Whitchurch
Add a test which uses the vhost_kernel_test driver to test the vhost
kernel buffer support.

The test uses virtio-net and vhost-net and sets up a loopback network
and then tests that ping works between the interfaces.  It also checks
that unbinding/rebinding of devices and closing the involved file
descriptors in different sequences during active use work correctly.
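
The userspace half of the test boils down to the usual vhost setup plus one
extra ioctl that hands the prepared vhost fd to the test vDPA device.  A
trimmed sketch (device node names and error handling are simplified
assumptions; see vhost_kernel_test.c below for the real sequence):

#include <err.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

#define VKTEST_ATTACH_VHOST _IOW(0xbf, 0x31, int)

int main(void)
{
	/* Node names are assumptions for this sketch. */
	int vhost = open("/dev/vhost-net-kernel", O_RDWR);
	int vktest = open("/dev/vktest", O_RDWR);

	if (vhost < 0 || vktest < 0)
		err(1, "open");

	if (ioctl(vhost, VHOST_SET_OWNER) < 0)
		err(1, "VHOST_SET_OWNER");

	/* ... VHOST_SET_FEATURES, kick/call eventfds and the tap backend
	 * are set up as in vhost_kernel_test.c below ... */

	/* Hand the prepared vhost device over to the kernel side. */
	if (ioctl(vktest, VKTEST_ATTACH_VHOST, &vhost) < 0)
		err(1, "VKTEST_ATTACH_VHOST");

	return 0;
}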

Signed-off-by: Vincent Whitchurch 
---
 tools/testing/selftests/Makefile  |   1 +
 .../vhost_kernel/vhost_kernel_test.c  | 287 ++
 .../vhost_kernel/vhost_kernel_test.sh | 125 
 3 files changed, 413 insertions(+)
 create mode 100644 tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
 create mode 100755 tools/testing/selftests/vhost_kernel/vhost_kernel_test.sh

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index c852eb40c4f7..14a8349e3dc1 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -73,6 +73,7 @@ TARGETS += tmpfs
 TARGETS += tpm2
 TARGETS += user
 TARGETS += vDSO
+TARGETS += vhost_kernel
 TARGETS += vm
 TARGETS += x86
 TARGETS += zram
diff --git a/tools/testing/selftests/vhost_kernel/vhost_kernel_test.c b/tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
new file mode 100644
index ..b0f889bd2f72
--- /dev/null
+++ b/tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
@@ -0,0 +1,287 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifndef VIRTIO_F_ACCESS_PLATFORM
+#define VIRTIO_F_ACCESS_PLATFORM 33
+#endif
+
+#ifndef VKTEST_ATTACH_VHOST
+#define VKTEST_ATTACH_VHOST _IOW(0xbf, 0x31, int)
+#endif
+
+static int vktest;
+static const int num_vqs = 2;
+
+static int tun_alloc(char *dev)
+{
+   int hdrsize = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   struct ifreq ifr = {
+   .ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR,
+   };
+   int fd, ret;
+
+   fd = open("/dev/net/tun", O_RDWR);
+   if (fd < 0)
+   err(1, "open /dev/net/tun");
+
+   strncpy(ifr.ifr_name, dev, IFNAMSIZ);
+
+   ret = ioctl(fd, TUNSETIFF, &ifr);
+   if (ret < 0)
+   err(1, "TUNSETIFF");
+
+   ret = ioctl(fd, TUNSETOFFLOAD,
+   TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | TUN_F_TSO_ECN);
+   if (ret < 0)
+   err(1, "TUNSETOFFLOAD");
+
+   ret = ioctl(fd, TUNSETVNETHDRSZ, &hdrsize);
+   if (ret < 0)
+   err(1, "TUNSETVNETHDRSZ");
+
+   strncpy(dev, ifr.ifr_name, IFNAMSIZ);
+
+   return fd;
+}
+
+static void handle_signal(int signum)
+{
+   if (signum == SIGUSR1)
+   close(vktest);
+}
+
+static void vhost_net_set_backend(int vhost)
+{
+   char if_name[IFNAMSIZ];
+   int tap_fd;
+
+   snprintf(if_name, IFNAMSIZ, "vhostkernel%d", 0);
+
+   tap_fd = tun_alloc(if_name);
+
+   for (int i = 0; i < num_vqs; i++) {
+   struct vhost_vring_file txbackend = {
+   .index = i,
+   .fd = tap_fd,
+   };
+   int ret;
+
+   ret = ioctl(vhost, VHOST_NET_SET_BACKEND, &txbackend);
+   if (ret < 0)
+   err(1, "VHOST_NET_SET_BACKEND");
+   }
+}
+
+static void prepare_vhost_vktest(int vhost, int vktest)
+{
+   uint64_t features = 1llu << VIRTIO_F_ACCESS_PLATFORM | 1llu << VIRTIO_F_VERSION_1;
+   int ret;
+
+   for (int i = 0; i < num_vqs; i++) {
+   int kickfd = eventfd(0, EFD_CLOEXEC);
+
+   if (kickfd < 0)
+   err(1, "eventfd");
+
+   struct vhost_vring_file kick = {
+   .index = i,
+   .fd = kickfd,
+   };
+
+   ret = ioctl(vktest, VHOST_SET_VRING_KICK, &kick);
+   if (ret < 0)
+   err(1, "VHOST_SET_VRING_KICK");
+
+   ret = ioctl(vhost, VHOST_SET_VRING_KICK, &kick);
+   if (ret < 0)
+   err(1, "VHOST_SET_VRING_KICK");
+   }
+
+   for (int i = 0; i < num_vqs; i++) {
+   int callfd = eventfd(0, EFD_CLOEXEC);
+
+   if (callfd < 0)
+   err(1, "eventfd");
+
+   struct vhost_vring_file call = {
+   .index = i,
+   .fd = callfd,
+   };
+
+   ret = ioctl(vktest, VHOST_SET_VRING_CALL, &call);
+   if (ret < 0)
+   err(1, "VHOST_SET_VRING_CALL");
+
+   ret = i

[RFC PATCH 04/10] vhost: add support for kernel buffers

2021-09-29 Thread Vincent Whitchurch
Handle the virtio rings and buffers being placed in kernel memory
instead of userspace memory.  The software IOTLB support is used to
ensure that only permitted regions are accessed.

Note that this patch does not provide a way to actually request that
kernel memory be used; an API for that is added in a later patch.
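
Conceptually, the per-virtqueue access helpers end up dispatching on the new
dev->kernel flag.  How the real patch wires this in is not fully visible in the
hunks quoted below, so the following is only a sketch of the shape:

static int example_copy_to_ring(struct vhost_virtqueue *vq, void *to,
				const void *from, unsigned int size)
{
	/* Kernel-buffer path: 'to' is a kernel address that has already
	 * been validated against the software IOTLB. */
	if (vq->dev->kernel)
		return vhost_copy_to_kern(vq, to, from, size);

	/* Existing path: 'to' is a guest/userspace pointer. */
	return vhost_copy_to_user(vq, (void __user *)to, from, size);
}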

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/Kconfig |   6 ++
 drivers/vhost/vhost.c | 222 +-
 drivers/vhost/vhost.h |  34 +++
 3 files changed, 257 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 587fbae06182..9e76ed485fe1 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -20,6 +20,12 @@ config VHOST
  This option is selected by any driver which needs to access
  the core of vhost.
 
+config VHOST_KERNEL
+   tristate
+   help
+ This option is selected by any driver which needs to access the
+ support for kernel buffers in vhost.
+
 menuconfig VHOST_MENU
bool "VHOST drivers"
default y
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index ce81eee2a3fa..9354061ce75e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -49,6 +49,9 @@ enum {
 #define vhost_used_event(vq) ((__virtio16 __user *)&vq->user.avail->ring[vq->num])
 #define vhost_avail_event(vq) ((__virtio16 __user *)&vq->user.used->ring[vq->num])
 
+#define vhost_used_event_kern(vq) ((__virtio16 *)&vq->kern.avail->ring[vq->num])
+#define vhost_avail_event_kern(vq) ((__virtio16 *)&vq->kern.used->ring[vq->num])
+
 #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
 static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
 {
@@ -482,6 +485,7 @@ void vhost_dev_init(struct vhost_dev *dev,
dev->iotlb = NULL;
dev->mm = NULL;
dev->worker = NULL;
+   dev->kernel = false;
dev->iov_limit = iov_limit;
dev->weight = weight;
dev->byte_weight = byte_weight;
@@ -785,6 +789,18 @@ static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq,
return (void __user *)(uintptr_t)(map->addr + addr - map->start);
 }
 
+static inline void *vhost_vq_meta_fetch_kern(struct vhost_virtqueue *vq,
+  u64 addr, unsigned int size,
+  int type)
+{
+   const struct vhost_iotlb_map *map = vq->meta_iotlb[type];
+
+   if (!map)
+   return NULL;
+
+   return (void *)(uintptr_t)(map->addr + addr - map->start);
+}
+
 /* Can we switch to this memory table? */
 /* Caller should have device mutex but not vq mutex */
 static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem,
@@ -849,6 +865,40 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
return ret;
 }
 
+static int vhost_copy_to_kern(struct vhost_virtqueue *vq, void *to,
+ const void *from, unsigned int size)
+{
+   int ret;
+
+   /* This function should be called after iotlb
+* prefetch, which means we're sure that all vq
+* could be accessed through iotlb. So -EAGAIN should
+* not happen in this case.
+*/
+   struct iov_iter t;
+   void *kaddr = vhost_vq_meta_fetch_kern(vq,
+(u64)(uintptr_t)to, size,
+VHOST_ADDR_USED);
+
+   if (kaddr) {
+   memcpy(kaddr, from, size);
+   return 0;
+   }
+
+   ret = translate_desc(vq, (u64)(uintptr_t)to, size, vq->iotlb_iov,
+ARRAY_SIZE(vq->iotlb_iov),
+VHOST_ACCESS_WO);
+   if (ret < 0)
+   goto out;
+   iov_iter_kvec(&t, WRITE, &vq->iotlb_iov->kvec, ret, size);
+   ret = copy_to_iter(from, size, &t);
+   if (ret == size)
+   ret = 0;
+
+out:
+   return ret;
+}
+
 static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
void __user *from, unsigned size)
 {
@@ -889,6 +939,43 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
return ret;
 }
 
+static int vhost_copy_from_kern(struct vhost_virtqueue *vq, void *to,
+   void *from, unsigned int size)
+{
+   int ret;
+
+   /* This function should be called after iotlb
+* prefetch, which means we're sure that vq
+* could be accessed through iotlb. So -EAGAIN should
+* not happen in this case.
+*/
+   void *kaddr = vhost_vq_meta_fetch_kern(vq,
+(u64)(uintptr_t)from, size,
+VHOST_ADDR_DESC);
+   struct iov_iter f;
+
+   if (kaddr) {
+   memcpy(to, kaddr, size);
+   return 0;
+ 

[RFC PATCH 00/10] Support kernel buffers in vhost

2021-09-29 Thread Vincent Whitchurch
vhost currently expects that the virtqueues and the queued buffers are
accessible from a userspace process' address space.  However, when using vhost
to communicate between two Linux systems running on two physical CPUs in an AMP
configuration (on a single SoC or via something like PCIe), it is undesirable
from a security perspective to make the entire kernel memory of the other Linux
system accessible from userspace.

To remedy this, this series adds support to vhost for placing the virtqueues
and queued buffers in kernel memory.  Since userspace should not be allowed to
control the placement and attributes of these virtqueues, a mechanism to do
this from kernel space is added.  A vDPA-based test driver is added which uses
this support to allow virtio-net and vhost-net to communicate with each other
on the same system without exposing kernel memory to userspace via /dev/mem or
similar.

This vDPA-based test driver is intended to be used as the basis for the
implementation of a driver which will allow Linux-Linux communication between
physical CPUs on SoCs using virtio and vhost, for instance by using information
from the device tree to indicate the location of shared memory, and the mailbox
API to trigger interrupts between the CPUs.

This patchset is also available at:

 https://github.com/vwax/linux/tree/vhost/rfc

Vincent Whitchurch (10):
  vhost: scsi: use copy_to_iter()
  vhost: push virtqueue area pointers into a user struct
  vhost: add iov wrapper
  vhost: add support for kernel buffers
  vhost: extract common code for file_operations handling
  vhost: extract ioctl locking to common code
  vhost: add support for kernel control
  vhost: net: add support for kernel control
  vdpa: add test driver for kernel buffers in vhost
  selftests: add vhost_kernel tests

 drivers/vdpa/Kconfig  |   8 +
 drivers/vdpa/Makefile |   1 +
 drivers/vdpa/vhost_kernel_test/Makefile   |   2 +
 .../vhost_kernel_test/vhost_kernel_test.c | 575 ++
 drivers/vhost/Kconfig |   6 +
 drivers/vhost/Makefile|   3 +
 drivers/vhost/common.c| 340 +++
 drivers/vhost/net.c   | 212 ---
 drivers/vhost/scsi.c  |  50 +-
 drivers/vhost/test.c  |   2 +-
 drivers/vhost/vdpa.c  |   6 +-
 drivers/vhost/vhost.c | 437 ++---
 drivers/vhost/vhost.h | 109 +++-
 drivers/vhost/vsock.c |  95 +--
 include/linux/vhost.h |  23 +
 tools/testing/selftests/Makefile  |   1 +
 .../vhost_kernel/vhost_kernel_test.c  | 287 +
 .../vhost_kernel/vhost_kernel_test.sh | 125 
 18 files changed, 2020 insertions(+), 262 deletions(-)
 create mode 100644 drivers/vdpa/vhost_kernel_test/Makefile
 create mode 100644 drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c
 create mode 100644 drivers/vhost/common.c
 create mode 100644 include/linux/vhost.h
 create mode 100644 tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
 create mode 100755 tools/testing/selftests/vhost_kernel/vhost_kernel_test.sh

-- 
2.28.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] vhost: add support for mandatory barriers

2021-08-24 Thread Vincent Whitchurch
On Mon, Aug 23, 2021 at 11:19:56PM +0200, Michael S. Tsirkin wrote:
> On Mon, Aug 23, 2021 at 10:14:37AM +0200, Vincent Whitchurch wrote:
> > vhost always uses SMP-conditional barriers, but these may not be
> > sufficient when vhost is used to communicate between heterogeneous
> > processors in an AMP configuration, especially since they're NOPs on
> > !SMP builds.
> > 
> > To solve this, use the virtio_*() barrier functions and ask them for
> > non-weak barriers if requested by userspace.
> > 
> > Signed-off-by: Vincent Whitchurch 
> 
> I am inclined to say let's (ab)use VIRTIO_F_ORDER_PLATFORM for this.
> Jason what do you think?

OK, thanks, I'll look into that.

> Also is the use of DMA variants really the intended thing here? Could
> you point me at some examples please?

I'm using this on an ARM-based SoC.  The main processor is a Cortex-A53
(arm64) and this processor runs the virtio drivers.  The SoC also has
another processor which is a Cortex-A5 (arm32) and this processor runs
the virtio device end using vhost.  There is no coherency between these
two processors and to each other they look like any other DMA-capable
hardware.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] vhost: add support for mandatory barriers

2021-08-23 Thread Vincent Whitchurch
vhost always uses SMP-conditional barriers, but these may not be
sufficient when vhost is used to communicate between heterogeneous
processors in an AMP configuration, especially since they're NOPs on
!SMP builds.

To solve this, use the virtio_*() barrier functions and ask them for
non-weak barriers if requested by userspace.
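
For context, the virtio_*() helpers already encapsulate exactly this choice;
they look roughly like the following (paraphrased from
include/linux/virtio_ring.h), so passing weak_barriers == false yields
mandatory barriers even on !SMP builds:

static inline void virtio_wmb(bool weak_barriers)
{
	if (weak_barriers)
		virt_wmb();	/* weak barrier, enough for VM guests */
	else
		dma_wmb();	/* mandatory barrier, not SMP-conditional */
}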

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/vhost.c  | 23 ++-
 drivers/vhost/vhost.h  |  2 ++
 include/uapi/linux/vhost.h |  2 ++
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b9e853e6094d..f7172e1bc395 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -500,6 +500,7 @@ void vhost_dev_init(struct vhost_dev *dev,
vq->indirect = NULL;
vq->heads = NULL;
vq->dev = dev;
+   vq->weak_barriers = true;
mutex_init(&vq->mutex);
vhost_vq_reset(dev, vq);
if (vq->handle_kick)
@@ -1801,6 +1802,10 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
if (ctx)
eventfd_ctx_put(ctx);
break;
+   case VHOST_SET_STRONG_BARRIERS:
+   for (i = 0; i < d->nvqs; ++i)
+   d->vqs[i]->weak_barriers = false;
+   break;
default:
r = -ENOIOCTLCMD;
break;
@@ -1927,7 +1932,7 @@ int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
int i, r;
 
/* Make sure data written is seen before log. */
-   smp_wmb();
+   virtio_wmb(vq->weak_barriers);
 
if (vq->iotlb) {
for (i = 0; i < count; i++) {
@@ -1964,7 +1969,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
return -EFAULT;
if (unlikely(vq->log_used)) {
/* Make sure the flag is seen before log. */
-   smp_wmb();
+   virtio_wmb(vq->weak_barriers);
/* Log used flag write. */
used = &vq->used->flags;
log_used(vq, (used - (void __user *)vq->used),
@@ -1982,7 +1987,7 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
if (unlikely(vq->log_used)) {
void __user *used;
/* Make sure the event is seen before log. */
-   smp_wmb();
+   virtio_wmb(vq->weak_barriers);
/* Log avail event write */
used = vhost_avail_event(vq);
log_used(vq, (used - (void __user *)vq->used),
@@ -2228,7 +2233,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
/* Only get avail ring entries after they have been
 * exposed by guest.
 */
-   smp_rmb();
+   virtio_rmb(vq->weak_barriers);
}
 
/* Grab the next descriptor number they're advertising, and increment
@@ -2367,7 +2372,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
}
if (unlikely(vq->log_used)) {
/* Make sure data is seen before log. */
-   smp_wmb();
+   virtio_wmb(vq->weak_barriers);
/* Log used ring entry write. */
log_used(vq, ((void __user *)used - (void __user *)vq->used),
 count * sizeof *used);
@@ -2402,14 +2407,14 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
r = __vhost_add_used_n(vq, heads, count);
 
/* Make sure buffer is written before we update index. */
-   smp_wmb();
+   virtio_wmb(vq->weak_barriers);
if (vhost_put_used_idx(vq)) {
vq_err(vq, "Failed to increment used idx");
return -EFAULT;
}
if (unlikely(vq->log_used)) {
/* Make sure used idx is seen before log. */
-   smp_wmb();
+   virtio_wmb(vq->weak_barriers);
/* Log used index update. */
log_used(vq, offsetof(struct vring_used, idx),
 sizeof vq->used->idx);
@@ -2428,7 +2433,7 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
/* Flush out used index updates. This is paired
 * with the barrier that the Guest executes when enabling
 * interrupts. */
-   smp_mb();
+   virtio_mb(vq->weak_barriers);
 
if (vhost_has_feature(vq, VIRTIO_F_NOTIFY_ON_EMPTY) &&
unlikely(vq->avail_idx == vq->last_avail_idx))
@@ -2530,7 +2535,7 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
}
/* They could have slipped one in as we were doing that: make
 * sure it's written,

[PATCH] virtio_vdpa: reject invalid vq indices

2021-07-01 Thread Vincent Whitchurch
Do not call vDPA drivers' callbacks with vq indices larger than what
the drivers indicate that they support.  vDPA drivers do not bounds
check the indices.

Signed-off-by: Vincent Whitchurch 
---
 drivers/virtio/virtio_vdpa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index e28acf482e0c..e9b9dd03f44a 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -149,6 +149,9 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
if (!name)
return NULL;
 
+   if (index >= vdpa->nvqs)
+   return ERR_PTR(-ENOENT);
+
/* Queue shouldn't already be set up. */
if (ops->get_vq_ready(vdpa, index))
return ERR_PTR(-ENOENT);
-- 
2.28.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v6 0/4] Add a vhost RPMsg API

2020-09-18 Thread Vincent Whitchurch
On Fri, Sep 18, 2020 at 11:47:20AM +0200, Guennadi Liakhovetski wrote:
> On Fri, Sep 18, 2020 at 09:47:45AM +0200, Arnaud POULIQUEN wrote:
> > IMO, as this API is defined in the Linux documentation [5] we should 
> > respect it, to ensure
> > one generic implementation. The RPMsg sample client[4] uses this user API, 
> > so seems to me
> > a good candidate to verify this. 
> > 
> > That's said, shall we multiple the RPMsg implementations in Linux with 
> > several APIs,
> > With the risk to make the RPMsg clients devices dependent on these 
> > implementations?
> > That could lead to complex code or duplications...
> 
> So, no, in my understanding there aren't two competing alternative APIs, 
> you'd never have 
> to choose between them. If you're writing a driver for Linux to communicate 
> with remote 
> processors or to run on VMs, you use the existing API. If you're writing a 
> driver for 
> Linux to communicate with those VMs, you use the vhost API and whatever help 
> is available 
> for RPMsg processing.
> 
> However, I can in principle imagine a single driver, written to work on both 
> sides. 
> Something like the rpmsg_char.c or maybe some networking driver. Is that what 
> you're 
> referring to? I can see that as a fun exercise, but are there any real uses 
> for that? 

I hinted at a real use case for this in the previous mail thread[0].
I'm exploring using rpmsg-char to allow communication between two chips,
both running Linux.  rpmsg-char can be used pretty much as-is for both
sides of the userspace-to-userspace communication and (the userspace
side of the) userspace-to-kernel communication between the two chips.

> You could do the same with VirtIO, however, it has been decided to go with 
> two 
> distinct APIs: virtio for guests and vhost for the host, noone bothered to 
> create a 
> single API for both and nobody seems to miss one. Why would we want one with 
> RPMsg?

I think I answered this question in the previous mail thread as well[1]:
| virtio has distinct driver and device roles so the completely different
| APIs on each side are understandable.  But I don't see that distinction
| in the rpmsg API which is why it seems like a good idea to me to make it
| work from both sides of the link and allow the reuse of drivers like
| rpmsg-char, instead of imposing virtio's distinction on rpmsg.

[0] https://www.spinics.net/lists/linux-virtualization/msg43799.html
[1] https://www.spinics.net/lists/linux-virtualization/msg43802.html
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v7 3/3] vhost: add an RPMsg API

2020-09-17 Thread Vincent Whitchurch
On Thu, Sep 10, 2020 at 01:13:51PM +0200, Guennadi Liakhovetski wrote:
> +int vhost_rpmsg_start_lock(struct vhost_rpmsg *vr, struct vhost_rpmsg_iter *iter,
> +unsigned int qid, ssize_t len)
> + __acquires(vq->mutex)
> +{
> + struct vhost_virtqueue *vq = vr->vq + qid;
> + unsigned int cnt;
> + ssize_t ret;
> + size_t tmp;
> +
> + if (qid >= VIRTIO_RPMSG_NUM_OF_VQS)
> + return -EINVAL;
> +
> + iter->vq = vq;
> +
> + mutex_lock(&vq->mutex);
> + vhost_disable_notify(&vr->dev, vq);
> +
> + iter->head = vhost_rpmsg_get_msg(vq, &cnt);
> + if (iter->head == vq->num)
> + iter->head = -EAGAIN;
> +
> + if (iter->head < 0) {
> + ret = iter->head;
> + goto unlock;
> + }
> +
[...]
> +
> +return_buf:
> + vhost_add_used(vq, iter->head, 0);
> +unlock:
> + vhost_enable_notify(&vr->dev, vq);
> + mutex_unlock(&vq->mutex);
> +
> + return ret;
> +}

There is a race condition here.  New buffers could have been added while
notifications were disabled (between vhost_disable_notify() and
vhost_enable_notify()), so the other vhost drivers check the return
value of vhost_enable_notify() and rerun their work loops if it returns
true.  This driver doesn't do that so it stops processing requests if
that condition hits.
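
For reference, the shape used by the other vhost drivers in their kick handlers
is roughly the following (a sketch, not copied from any one driver):

	struct vhost_dev *dev = vq->dev;
	unsigned int out, in;
	int head;

	vhost_disable_notify(dev, vq);
	for (;;) {
		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
					 &out, &in, NULL, NULL);
		if (head < 0)
			break;
		if (head == vq->num) {
			/* Ring looked empty: re-enable notifications and
			 * re-check, since buffers may have been added while
			 * they were disabled. */
			if (unlikely(vhost_enable_notify(dev, vq))) {
				vhost_disable_notify(dev, vq);
				continue;
			}
			break;
		}
		/* ... process the buffer, vhost_add_used_and_signal() ... */
	}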

Something like the below seems to fix it but the correct fix could maybe
involve changing this API to account for this case so that it looks more
like the code in other vhost drivers.

diff --git a/drivers/vhost/rpmsg.c b/drivers/vhost/rpmsg.c
index 7c753258d42..673dd4ec865 100644
--- a/drivers/vhost/rpmsg.c
+++ b/drivers/vhost/rpmsg.c
@@ -302,8 +302,14 @@ static void handle_rpmsg_req_kick(struct vhost_work *work)
struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
  poll.work);
struct vhost_rpmsg *vr = container_of(vq->dev, struct vhost_rpmsg, dev);
+   struct vhost_virtqueue *reqvq = vr->vq + VIRTIO_RPMSG_REQUEST;
 
-   while (handle_rpmsg_req_single(vr, vq))
+   /*
+* The !vhost_vq_avail_empty() check is needed since the vhost_rpmsg*
+* APIs don't check the return value of vhost_enable_notify() and retry
+* if there were buffers added while notifications were disabled.
+*/
+   while (handle_rpmsg_req_single(vr, vq) || !vhost_vq_avail_empty(reqvq->dev, reqvq))
;
 }
 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v6 0/4] Add a vhost RPMsg API

2020-09-17 Thread Vincent Whitchurch
On Thu, Sep 17, 2020 at 07:47:06AM +0200, Guennadi Liakhovetski wrote:
> On Tue, Sep 15, 2020 at 02:13:23PM +0200, Arnaud POULIQUEN wrote:
> > So i would be agree with Vincent[2] which proposed to switch on a RPMsg API
> > and creating a vhost rpmsg device. This is also proposed in the 
> > "Enhance VHOST to enable SoC-to-SoC communication" RFC[3].
> > Do you think that this alternative could match with your need?
> 
> As I replied to Vincent, I understand his proposal and the approach taken 
> in the series [3], but I'm not sure I agree, that adding yet another 
> virtual device / driver layer on the vhost side is a good idea. As far as 
> I understand adding new completely virtual devices isn't considered to be 
> a good practice in the kernel. Currently vhost is just a passive "library" 
> and my vhost-rpmsg support keeps it that way. Not sure I'm in favour of 
> converting vhost to a virtual device infrastructure.

I know it wasn't what you meant, but I noticed that the above paragraph
could be read as if my suggestion was to convert vhost to a virtual
device infrastructure, so I just want to clarify that that those are not
related.  The only similarity between what I suggested in the thread in
[2] and Kishon's RFC in [3] is that both involve creating a generic
vhost-rpmsg driver which would allow the RPMsg API to be used for both
sides of the link, instead of introducing a new API just for the server
side.  That can be done without rewriting drivers/vhost/.

> > [1]. 
> > https://patchwork.kernel.org/project/linux-remoteproc/list/?series=338335 
> > [2]. https://www.spinics.net/lists/linux-virtualization/msg44195.html
> > [3]. https://www.spinics.net/lists/linux-remoteproc/msg06634.html  
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 5/5] vhost: add an RPMsg API

2020-07-14 Thread Vincent Whitchurch
On Thu, Jun 18, 2020 at 04:14:12PM +0200, Guennadi Liakhovetski wrote:
> On Thu, Jun 18, 2020 at 03:52:42PM +0200, Vincent Whitchurch wrote:
> > Note that "the Linux side" is ambiguous for AMP since both sides can be
> > Linux, as they happen to be in my case.  I'm running virtio/rpmsg
> > between two physical processors (of different architectures), both
> > running Linux.
> 
> Ok, interesting, I didn't know such configurations were used too. I 
> understood 
> the Linux rpmsg implementation in the way, that it's assumed, that the "host" 
> has to boot the "device" by sending an ELF formatted executable image to it, 
> is 
> that optional? You aren't sending a complete Linux image to the device side, 
> are you?

I do pack the zImage, the dtb, and the initramfs into an ELF (along with
a tiny "bootloader" with just a handful of instructions), but the
remoteproc framework is not tied to the ELF format since ->parse_fw()
and friends are overridable by the remoteproc driver.

> > virtio has distinct driver and device roles so the completely different
> > APIs on each side are understandable.  But I don't see that distinction
> > in the rpmsg API which is why it seems like a good idea to me to make it
> > work from both sides of the link and allow the reuse of drivers like
> > rpmsg-char, instead of imposing virtio's distinction on rpmsg.
> 
> Understand. In principle I'm open to this idea, but before I implement it it 
> would be good to know what maintainers think?

Certainly.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 5/5] vhost: add an RPMsg API

2020-06-18 Thread Vincent Whitchurch
On Thu, Jun 18, 2020 at 12:39:40PM +0200, Guennadi Liakhovetski wrote:
> On Thu, Jun 18, 2020 at 11:33:24AM +0200, Vincent Whitchurch wrote:
> > By the "normal rpmsg API" I mean register_rpmsg_driver(), rpmsg_send(),
> > etc.  That API is not tied to virtio in any way and there are other
> > non-virtio backends for this API in the tree.  So it seems quite natural
> > to implement a vhost backend for this API so that both sides of the link
> > can use the same API but different backends, instead of forcing them to
> > use of different APIs.
> 
> Ok, I see what you mean now. But I'm not sure this is useful or desired. I'm 
> not an expert in KVM / VirtIO, I've only been working in the area for less 
> than a year, so, I might well be wrong.
> 
> You're proposing to use the rpmsg API in vhost drivers. As far as I 
> understand so far that API was only designated for the Linux side (in case of 
> AMPs) which corresponds to VM guests in virtualisation case. So, I'm not sure 
> we want to use the same API for the hosts? This can be done as you have 
> illustrated, but is it desirable? The vhost API is far enough from the VirtIO 
> driver API, so I'm not sure why we want the same API for rpmsg?

Note that "the Linux side" is ambiguous for AMP since both sides can be
Linux, as they happen to be in my case.  I'm running virtio/rpmsg
between two physical processors (of different architectures), both
running Linux.

virtio has distinct driver and device roles so the completely different
APIs on each side are understandable.  But I don't see that distinction
in the rpmsg API which is why it seems like a good idea to me to make it
work from both sides of the link and allow the reuse of drivers like
rpmsg-char, instead of imposing virtio's distinction on rpmsg.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 5/5] vhost: add an RPMsg API

2020-06-18 Thread Vincent Whitchurch
On Thu, Jun 18, 2020 at 11:03:42AM +0200, Guennadi Liakhovetski wrote:
> On Wed, Jun 17, 2020 at 09:17:42PM +0200, Vincent Whitchurch wrote:
> > On Wed, May 27, 2020 at 08:05:41PM +0200, Guennadi Liakhovetski wrote:
> > > Linux supports running the RPMsg protocol over the VirtIO transport
> > > protocol, but currently there is only support for VirtIO clients and
> > > no support for a VirtIO server. This patch adds a vhost-based RPMsg
> > > server implementation.
> > 
> > This looks really useful, but why is it implemented as an API and not as
> > a real vhost driver which implements an rpmsg bus?  If you implement it
> > as a vhost driver which implements rpmsg_device_ops and
> > rpmsg_endpoint_ops, then wouldn't you be able to implement your
> > vhost-sof driver using the normal rpmsg APIs?
> 
> Sorry, not sure what you mean by the "normal rpmsg API?" Do you mean the 
> VirtIO RPMsg API? But that's the opposite side of the link - that's the 
> guest side in the VM case and the Linux side in the remoteproc case. What 
> this API is adding is a vhost RPMsg API. The kernel vhost framework 
> itself is essentially a library of functions. Kernel vhost drivers simply 
> create a misc device and use the vhost functions for some common 
> functionality. This RPMsg vhost API stays in the same concept and provides 
> further functions for RPMsg specific vhost operation.

By the "normal rpmsg API" I mean register_rpmsg_driver(), rpmsg_send(),
etc.  That API is not tied to virtio in any way and there are other
non-virtio backends for this API in the tree.  So it seems quite natural
to implement a vhost backend for this API so that both sides of the link
can use the same API but different backends, instead of forcing them to
use of different APIs.

> > I tried quickly hooking up this code to such a vhost driver and I was
> > able to communicate between host and guest systems with both
> > rpmsg-client-sample and rpmsg-char which almost no modifications to
> > those drivers.
> 
> You mean you used this patch to create RPMsg vhost drivers? Without 
> creating a vhost RPMsg bus? Nice, glad to hear that!

Not quite, I hacked together a single generic vhost-rpmsg-bus driver
which just wraps the API in this patch and implements a basic
rpmsg_device_ops and rpmsg_endpoint_ops.  Then with the following
patches and no other vhost-specific API use, I was able to load and use
the same rpmsg-char and rpmsg-client-sample drivers on both host and
guest kernels.

Userspace sets up the vhost using vhost-rpmsg-bus' misc device and
triggers creation of an rpdev which leads to a probe of the (for
example) rpmsg-client-sample driver on the host (server), which, in
turn, via NS announcement, triggers a creation of an rpdev and a probe
of the rpmsg-client-sample driver on the guest (client).
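
In outline, that hacked-together bus driver just fills in the rpmsg core's two
ops tables and forwards the data path to the vhost side.  A skeleton of what
that looks like (member names as in drivers/rpmsg/rpmsg_internal.h of that era;
the bodies here are placeholders, not the actual hack):

static struct rpmsg_endpoint *
vhost_rpmsg_bus_create_ept(struct rpmsg_device *rpdev, rpmsg_rx_cb_t cb,
			   void *priv, struct rpmsg_channel_info chinfo)
{
	/* Allocate an endpoint and remember cb/priv for incoming messages. */
	return NULL; /* placeholder */
}

static int vhost_rpmsg_bus_send(struct rpmsg_endpoint *ept, void *data, int len)
{
	/* Queue the message on the vhost virtqueue and signal the guest. */
	return 0; /* placeholder */
}

static const struct rpmsg_device_ops vhost_rpmsg_bus_device_ops = {
	.create_ept	= vhost_rpmsg_bus_create_ept,
};

static const struct rpmsg_endpoint_ops vhost_rpmsg_bus_endpoint_ops = {
	.send		= vhost_rpmsg_bus_send,
};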

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index a76b963a7e5..7a03978d002 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -104,6 +104,11 @@ static int rpmsg_ept_cb(struct rpmsg_device *rpdev, void *buf, int len,
struct rpmsg_eptdev *eptdev = priv;
struct sk_buff *skb;
 
+   if (rpdev->dst == RPMSG_ADDR_ANY) {
+   printk("%s: got client address %#x from first rx!\n", __func__, addr);
+   rpdev->dst = addr;
+   }
+
skb = alloc_skb(len, GFP_ATOMIC);
if (!skb)
return -ENOMEM;
@@ -235,6 +240,12 @@ static ssize_t rpmsg_eptdev_write(struct file *filp, const char __user *buf,
goto unlock_eptdev;
}
 
+   if (eptdev->rpdev->dst == RPMSG_ADDR_ANY) {
+   ret = -EPIPE;
+   WARN(1, "Cannot write first on server, must wait for client!\n");
+   goto unlock_eptdev;
+   }
+
if (filp->f_flags & O_NONBLOCK)
ret = rpmsg_trysend(eptdev->ept, kbuf, len);
else
diff --git a/samples/rpmsg/rpmsg_client_sample.c b/samples/rpmsg/rpmsg_client_sample.c
index f161dfd3e70..5d8ca84dce0 100644
--- a/samples/rpmsg/rpmsg_client_sample.c
+++ b/samples/rpmsg/rpmsg_client_sample.c
@@ -46,6 +46,9 @@ static int rpmsg_sample_cb(struct rpmsg_device *rpdev, void *data, int len,
return 0;
}
 
+   if (rpdev->dst == RPMSG_ADDR_ANY)
+   rpdev->dst = src;
+
/* send a new message now */
ret = rpmsg_send(rpdev->ept, MSG, strlen(MSG));
if (ret)
@@ -68,11 +71,13 @@ static int rpmsg_sample_probe(struct rpmsg_device *rpdev)
 
dev_set_drvdata(&rpdev->dev, idata);
 
-   /* send a message to our remote processor */
-   ret = rpmsg_send(rpdev->ept, MSG, strlen(MSG));
-   if (ret) {
-   dev_err(&rpdev->dev, &quo

Re: [PATCH v3 5/5] vhost: add an RPMsg API

2020-06-17 Thread Vincent Whitchurch
On Wed, May 27, 2020 at 08:05:41PM +0200, Guennadi Liakhovetski wrote:
> Linux supports running the RPMsg protocol over the VirtIO transport
> protocol, but currently there is only support for VirtIO clients and
> no support for a VirtIO server. This patch adds a vhost-based RPMsg
> server implementation.

This looks really useful, but why is it implemented as an API and not as
a real vhost driver which implements an rpmsg bus?  If you implement it
as a vhost driver which implements rpmsg_device_ops and
rpmsg_endpoint_ops, then wouldn't you be able to implement your
vhost-sof driver using the normal rpmsg APIs?

I tried quickly hooking up this code to such a vhost driver and I was
able to communicate between host and guest systems with both
rpmsg-client-sample and rpmsg-char which almost no modifications to
those drivers.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization