Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

2022-12-25 Thread Jason Wang
On Fri, Dec 23, 2022 at 3:39 PM Alvaro Karsz  wrote:
>
> > This needs to be proposed to the virtio spec first. And actually we
> > need more than this:
> >
> > 1) we still need a way to deal with the device without this feature
> > 2) driver can't depend solely on what is advertised by the device (e.g
> > device can choose to advertise a very long timeout)
>
> I think that I wasn't clear, sorry.
> I'm not talking about a new virtio feature, I'm talking about a situation 
> when:
> * virtio_net issues a control command.
> * the device gets the command, but somehow, completes the command
> after timeout.
> * virtio_net assumes that the command failed (timeout), and issues a
> different control command.
> * virtio_net will then call virtqueue_wait_for_used, and will
> immediately get the previous response (If I'm not wrong).
>
> So, this is not a new feature that I'm proposing, just a situation
> that may occur due to cvq timeouts.
>
> Anyhow, your solution calling BAD_RING if we reach a timeout should
> prevent this situation.

Right, that is the goal.

Thanks

>
> Thanks
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

2022-12-25 Thread Jason Wang
On Fri, Dec 23, 2022 at 4:05 PM Eugenio Perez Martin
 wrote:
>
> On Fri, Dec 23, 2022 at 4:04 AM Jason Wang  wrote:
> >
> > On Thu, Dec 22, 2022 at 5:19 PM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Thu, Dec 22, 2022 at 7:05 AM Jason Wang  wrote:
> > > >
> > > > We used to busy waiting on the cvq command this tends to be
> > > > problematic since:
> > > >
> > > > 1) CPU could wait for ever on a buggy/malicous device
> > > > 2) There's no wait to terminate the process that triggers the cvq
> > > >command
> > > >
> > > > So this patch switch to use sleep with a timeout (1s) instead of busy
> > > > polling for the cvq command forever. This gives the scheduler a breath
> > > > and can let the process can respond to a signal.
> > > >
> > > > Signed-off-by: Jason Wang 
> > > > ---
> > > >  drivers/net/virtio_net.c | 15 ---
> > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 8225496ccb1e..69173049371f 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct 
> > > > virtnet_info *vi)
> > > > vi->rx_mode_work_enabled = false;
> > > > spin_unlock_bh(&vi->rx_mode_lock);
> > > >
> > > > +   virtqueue_wake_up(vi->cvq);
> > > > flush_work(&vi->rx_mode_work);
> > > >  }
> > > >
> > > > @@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info 
> > > > *vi, struct receive_queue *rq,
> > > > return !oom;
> > > >  }
> > > >
> > > > +static void virtnet_cvq_done(struct virtqueue *cvq)
> > > > +{
> > > > +   virtqueue_wake_up(cvq);
> > > > +}
> > > > +
> > > >  static void skb_recv_done(struct virtqueue *rvq)
> > > >  {
> > > > struct virtnet_info *vi = rvq->vdev->priv;
> > > > @@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct 
> > > > virtnet_info *vi, u8 class, u8 cmd,
> > > > if (unlikely(!virtqueue_kick(vi->cvq)))
> > > > return vi->ctrl->status == VIRTIO_NET_OK;
> > > >
> > > > -   /* Spin for a response, the kick causes an ioport write, 
> > > > trapping
> > > > -* into the hypervisor, so the request should be handled 
> > > > immediately.
> > > > -*/
> > > > -   while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > -  !virtqueue_is_broken(vi->cvq))
> > > > -   cpu_relax();
> > > > +   virtqueue_wait_for_used(vi->cvq, &tmp);
> > > >
> > > > return vi->ctrl->status == VIRTIO_NET_OK;
> > > >  }
> > > > @@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info 
> > > > *vi)
> > > >
> > > > /* Parameters for control virtqueue, if any */
> > > > if (vi->has_cvq) {
> > > > -   callbacks[total_vqs - 1] = NULL;
> > > > +   callbacks[total_vqs - 1] = virtnet_cvq_done;
> > >
> > > If we're using CVQ callback, what is the actual use of the timeout?
> >
> > Because we can't sleep forever since locks could be held like RTNL_LOCK.
> >
>
> Right, rtnl_lock kind of invalidates it for a general case.
>
> But do all of the commands need to take rtnl_lock? For example I see
> how we could remove it from ctrl_announce,

I think not, it's intended to serialize all cvq commands.

> so lack of ack may not be
> fatal for it.

Then there could be more than one cvq commands sent to the device, the
busy poll logic may not work.

And it's a hint that the device malfunctioned which is something that
the driver should be aware of.

Thanks

> Assuming a buggy device, we can take some cvq commands
> out of this fatal situation.
>
> This series already improves the current situation and my suggestion
> (if it's worth it) can be applied on top of it, so it is not a blocker
> at all.
>
> > >
> > > I'd say there is no right choice neither in the right timeout value
> > > nor in the action to take.
> >
> > In the next version, I tend to put BAD_RING() to prevent future requests.
> >
> > > Why not simply trigger the cmd and do all
> > > the changes at command return?
> >
> > I don't get this, sorry.
> >
>
> It's actually expanding the first point so you already answered it :).
>
> Thanks!
>
> > >
> > > I suspect the reason is that it complicates the code. For example,
> > > having the possibility of many in flight commands, races between their
> > > completion, etc.
> >
> > Actually the cvq command was serialized through RTNL_LOCK, so we don't
> > need to worry about this.
> >
> > In the next version I can add ASSERT_RTNL().
> >
> > Thanks
> >
> > > The virtio standard does not even cover unordered
> > > used commands if I'm not wrong.
> > >
> > > Is there any other fundamental reason?
> > >
> > > Thanks!
> > >
> > > > names[total_vqs - 1] = "control";
> > > > }
> > > >
> > > > --
> > > > 2.25.1
> > > >
> > >
> >
>

___
Virtualization mailing list
Virtualization@lists.linux-

Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

2022-12-22 Thread Alvaro Karsz
> This needs to be proposed to the virtio spec first. And actually we
> need more than this:
>
> 1) we still need a way to deal with the device without this feature
> 2) driver can't depend solely on what is advertised by the device (e.g
> device can choose to advertise a very long timeout)

I think that I wasn't clear, sorry.
I'm not talking about a new virtio feature, I'm talking about a situation when:
* virtio_net issues a control command.
* the device gets the command, but somehow, completes the command
after timeout.
* virtio_net assumes that the command failed (timeout), and issues a
different control command.
* virtio_net will then call virtqueue_wait_for_used, and will
immediately get the previous response (If I'm not wrong).

So, this is not a new feature that I'm proposing, just a situation
that may occur due to cvq timeouts.

Anyhow, your solution calling BAD_RING if we reach a timeout should
prevent this situation.

Thanks
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

2022-12-22 Thread Jason Wang
On Thu, Dec 22, 2022 at 5:19 PM Eugenio Perez Martin
 wrote:
>
> On Thu, Dec 22, 2022 at 7:05 AM Jason Wang  wrote:
> >
> > We used to busy waiting on the cvq command this tends to be
> > problematic since:
> >
> > 1) CPU could wait for ever on a buggy/malicous device
> > 2) There's no wait to terminate the process that triggers the cvq
> >command
> >
> > So this patch switch to use sleep with a timeout (1s) instead of busy
> > polling for the cvq command forever. This gives the scheduler a breath
> > and can let the process can respond to a signal.
> >
> > Signed-off-by: Jason Wang 
> > ---
> >  drivers/net/virtio_net.c | 15 ---
> >  1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 8225496ccb1e..69173049371f 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct virtnet_info 
> > *vi)
> > vi->rx_mode_work_enabled = false;
> > spin_unlock_bh(&vi->rx_mode_lock);
> >
> > +   virtqueue_wake_up(vi->cvq);
> > flush_work(&vi->rx_mode_work);
> >  }
> >
> > @@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info *vi, 
> > struct receive_queue *rq,
> > return !oom;
> >  }
> >
> > +static void virtnet_cvq_done(struct virtqueue *cvq)
> > +{
> > +   virtqueue_wake_up(cvq);
> > +}
> > +
> >  static void skb_recv_done(struct virtqueue *rvq)
> >  {
> > struct virtnet_info *vi = rvq->vdev->priv;
> > @@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct virtnet_info 
> > *vi, u8 class, u8 cmd,
> > if (unlikely(!virtqueue_kick(vi->cvq)))
> > return vi->ctrl->status == VIRTIO_NET_OK;
> >
> > -   /* Spin for a response, the kick causes an ioport write, trapping
> > -* into the hypervisor, so the request should be handled 
> > immediately.
> > -*/
> > -   while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > -  !virtqueue_is_broken(vi->cvq))
> > -   cpu_relax();
> > +   virtqueue_wait_for_used(vi->cvq, &tmp);
> >
> > return vi->ctrl->status == VIRTIO_NET_OK;
> >  }
> > @@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >
> > /* Parameters for control virtqueue, if any */
> > if (vi->has_cvq) {
> > -   callbacks[total_vqs - 1] = NULL;
> > +   callbacks[total_vqs - 1] = virtnet_cvq_done;
>
> If we're using CVQ callback, what is the actual use of the timeout?

Because we can't sleep forever since locks could be held like RTNL_LOCK.

>
> I'd say there is no right choice neither in the right timeout value
> nor in the action to take.

In the next version, I tend to put BAD_RING() to prevent future requests.

> Why not simply trigger the cmd and do all
> the changes at command return?

I don't get this, sorry.

>
> I suspect the reason is that it complicates the code. For example,
> having the possibility of many in flight commands, races between their
> completion, etc.

Actually the cvq command was serialized through RTNL_LOCK, so we don't
need to worry about this.

In the next version I can add ASSERT_RTNL().

Thanks

> The virtio standard does not even cover unordered
> used commands if I'm not wrong.
>
> Is there any other fundamental reason?
>
> Thanks!
>
> > names[total_vqs - 1] = "control";
> > }
> >
> > --
> > 2.25.1
> >
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

2022-12-22 Thread Jason Wang
On Thu, Dec 22, 2022 at 11:55 PM Alvaro Karsz
 wrote:
>
> My point is that the device may complete the control command after the 
> timeout,

This needs to be proposed to the virtio spec first. And actually we
need more than this:

1) we still need a way to deal with the device without this feature
2) driver can't depend solely on what is advertised by the device (e.g
device can choose to advertise a very long timeout)

> so, if I'm not mistaken, next time we send a control command and call
> virtqueue_wait_for_used we'll get the previous response.
>

In the next version, I will first put BAD_RING() to prevent future
requests for cvq.

Note that the patch can't fix all the issues, we need more things on
top. But it's a good step and it will behave much better than the
current code.

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

2022-12-22 Thread Alvaro Karsz
My point is that the device may complete the control command after the timeout,
so, if I'm not mistaken, next time we send a control command and call
virtqueue_wait_for_used we'll get the previous response.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

2022-12-22 Thread Jason Wang
Hi Alvaro:

On Thu, Dec 22, 2022 at 2:44 PM Alvaro Karsz  wrote:
>
> Hi Jason,
>
> Adding timeout to the cvq is a great idea IMO.
>
> > -   /* Spin for a response, the kick causes an ioport write, trapping
> > -* into the hypervisor, so the request should be handled 
> > immediately.
> > -*/
> > -   while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > -  !virtqueue_is_broken(vi->cvq))
> > -   cpu_relax();
> > +   virtqueue_wait_for_used(vi->cvq, &tmp);
>
> Do you think that we should continue like nothing happened in case of a 
> timeout?

We could, but we should not depend on a device to do this since it's
not reliable. More below.

> Shouldn't we reset the device?

We can't depend on device, there's probably another loop in reset():

E.g in vp_reset() we had:

while (vp_modern_get_status(mdev))
msleep(1);

> What happens if a device completes the control command after timeout?

Maybe we could have a BAD_RING() here in this case (and more check in
vq->broken in this case).

Thanks

>
> Thanks
>
> Alvaro
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

2022-12-21 Thread Alvaro Karsz
Hi Jason,

Adding timeout to the cvq is a great idea IMO.

> -   /* Spin for a response, the kick causes an ioport write, trapping
> -* into the hypervisor, so the request should be handled immediately.
> -*/
> -   while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -  !virtqueue_is_broken(vi->cvq))
> -   cpu_relax();
> +   virtqueue_wait_for_used(vi->cvq, &tmp);

Do you think that we should continue like nothing happened in case of a timeout?
Shouldn't we reset the device?
What happens if a device completes the control command after timeout?

Thanks

Alvaro
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

2022-12-21 Thread Jason Wang
We used to busy waiting on the cvq command this tends to be
problematic since:

1) CPU could wait for ever on a buggy/malicous device
2) There's no wait to terminate the process that triggers the cvq
   command

So this patch switch to use sleep with a timeout (1s) instead of busy
polling for the cvq command forever. This gives the scheduler a breath
and can let the process can respond to a signal.

Signed-off-by: Jason Wang 
---
 drivers/net/virtio_net.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8225496ccb1e..69173049371f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct virtnet_info *vi)
vi->rx_mode_work_enabled = false;
spin_unlock_bh(&vi->rx_mode_lock);
 
+   virtqueue_wake_up(vi->cvq);
flush_work(&vi->rx_mode_work);
 }
 
@@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info *vi, 
struct receive_queue *rq,
return !oom;
 }
 
+static void virtnet_cvq_done(struct virtqueue *cvq)
+{
+   virtqueue_wake_up(cvq);
+}
+
 static void skb_recv_done(struct virtqueue *rvq)
 {
struct virtnet_info *vi = rvq->vdev->priv;
@@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct virtnet_info 
*vi, u8 class, u8 cmd,
if (unlikely(!virtqueue_kick(vi->cvq)))
return vi->ctrl->status == VIRTIO_NET_OK;
 
-   /* Spin for a response, the kick causes an ioport write, trapping
-* into the hypervisor, so the request should be handled immediately.
-*/
-   while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-  !virtqueue_is_broken(vi->cvq))
-   cpu_relax();
+   virtqueue_wait_for_used(vi->cvq, &tmp);
 
return vi->ctrl->status == VIRTIO_NET_OK;
 }
@@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 
/* Parameters for control virtqueue, if any */
if (vi->has_cvq) {
-   callbacks[total_vqs - 1] = NULL;
+   callbacks[total_vqs - 1] = virtnet_cvq_done;
names[total_vqs - 1] = "control";
}
 
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization