Re: [PATCH v1] virtio-pci: Fix legacy device flag setting error in probe

2023-07-24 Thread Feng Liu via Virtualization



On 2023-07-24 p.m.11:41, Jason Wang wrote:

External email: Use caution opening links or attachments


On Mon, Jul 24, 2023 at 9:14 PM Feng Liu  wrote:




On 2023-07-20 p.m.1:14, Michael S. Tsirkin wrote:

External email: Use caution opening links or attachments


On Thu, Jul 20, 2023 at 10:27:04AM +0800, Jason Wang wrote:

On Wed, Jul 19, 2023 at 11:46 PM Feng Liu  wrote:


The 'is_legacy' flag is used to differentiate between legacy and modern
devices. Currently, it is based on the value of vp_dev->ldev.ioaddr.
However, because struct virtio_pci_legacy_device and struct
virtio_pci_modern_device share memory in a union, when
virtio_pci_modern_probe modifies the contents of struct
virtio_pci_modern_device it also affects struct virtio_pci_legacy_device,
and ldev.ioaddr is no longer zero, causing the 'is_legacy' flag to be set
to true. To resolve this issue, mark 'is_legacy' as true when a legacy
device is probed, and keep 'is_legacy' as false when a modern device is
probed.

Fixes: 4f0fc22534e3 ("virtio_pci: Optimize virtio_pci_device structure size")
Signed-off-by: Feng Liu 
Reviewed-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
---
   drivers/virtio/virtio_pci_common.c | 2 --
   drivers/virtio/virtio_pci_legacy.c | 1 +
   2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index a6c86f916dbd..c2524a7207cf 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -557,8 +557,6 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,

  pci_set_master(pci_dev);

-   vp_dev->is_legacy = vp_dev->ldev.ioaddr ? true : false;
-
   rc = register_virtio_device(&vp_dev->vdev);
  reg_dev = vp_dev;
  if (rc)
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 2257f1b3d8ae..d9cbb02b35a1 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -223,6 +223,7 @@ int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev)
  vp_dev->config_vector = vp_config_vector;
  vp_dev->setup_vq = setup_vq;
  vp_dev->del_vq = del_vq;
+   vp_dev->is_legacy = true;


This seems break force_legacy for modern device:

  if (force_legacy) {
  rc = virtio_pci_legacy_probe(vp_dev);
  /* Also try modern mode if we can't map BAR0 (no IO space). */
  if (rc == -ENODEV || rc == -ENOMEM)
  rc = virtio_pci_modern_probe(vp_dev);

Thanks


don't see the breakage here - can you explain a bit more?


Hi, Jason

I also think there is no breakage here, and I gave an explanation in
another email; please have a look.


I think I've made a mistake, the patch should be fine.
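
For reference, the probe flow in question looks roughly like the sketch
below (abridged from virtio_pci_probe() in mainline); since the patch sets
'is_legacy' only when virtio_pci_legacy_probe() succeeds, a force_legacy
fallback to the modern probe leaves the flag false:

	if (force_legacy) {
		rc = virtio_pci_legacy_probe(vp_dev);
		/* Also try modern mode if we can't map BAR0 (no IO space). */
		if (rc == -ENODEV || rc == -ENOMEM)
			rc = virtio_pci_modern_probe(vp_dev); /* is_legacy stays false */
		if (rc)
			goto err_probe;
	} else {
		rc = virtio_pci_modern_probe(vp_dev);
		if (rc == -ENODEV)
			rc = virtio_pci_legacy_probe(vp_dev); /* sets is_legacy = true */
		if (rc)
			goto err_probe;
	}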



So are there any comments about this bug fix patch? Can this patch pass
the review?


Yes.

Acked-by: Jason Wang 

Thanks



Thanks Jason



Thanks
Feng



  return 0;
   }
--
2.37.1 (Apple Git-137.1)









Re: [PATCH v1] virtio-pci: Fix legacy device flag setting error in probe

2023-07-24 Thread Jason Wang
On Mon, Jul 24, 2023 at 9:14 PM Feng Liu  wrote:
>
>
>
> On 2023-07-20 p.m.1:14, Michael S. Tsirkin wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Thu, Jul 20, 2023 at 10:27:04AM +0800, Jason Wang wrote:
> >> On Wed, Jul 19, 2023 at 11:46 PM Feng Liu  wrote:
> >>>
> >>> The 'is_legacy' flag is used to differentiate between legacy and modern
> >>> devices. Currently, it is based on the value of vp_dev->ldev.ioaddr.
> >>> However, because struct virtio_pci_legacy_device and struct
> >>> virtio_pci_modern_device share memory in a union, when
> >>> virtio_pci_modern_probe modifies the contents of struct
> >>> virtio_pci_modern_device it also affects struct virtio_pci_legacy_device,
> >>> and ldev.ioaddr is no longer zero, causing the 'is_legacy' flag to be set
> >>> to true. To resolve this issue, mark 'is_legacy' as true when a legacy
> >>> device is probed, and keep 'is_legacy' as false when a modern device is
> >>> probed.
> >>>
> >>> Fixes: 4f0fc22534e3 ("virtio_pci: Optimize virtio_pci_device structure size")
> >>> Signed-off-by: Feng Liu 
> >>> Reviewed-by: Parav Pandit 
> >>> Reviewed-by: Jiri Pirko 
> >>> ---
> >>>   drivers/virtio/virtio_pci_common.c | 2 --
> >>>   drivers/virtio/virtio_pci_legacy.c | 1 +
> >>>   2 files changed, 1 insertion(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/virtio/virtio_pci_common.c 
> >>> b/drivers/virtio/virtio_pci_common.c
> >>> index a6c86f916dbd..c2524a7207cf 100644
> >>> --- a/drivers/virtio/virtio_pci_common.c
> >>> +++ b/drivers/virtio/virtio_pci_common.c
> >>> @@ -557,8 +557,6 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
> >>>
> >>>  pci_set_master(pci_dev);
> >>>
> >>> -   vp_dev->is_legacy = vp_dev->ldev.ioaddr ? true : false;
> >>> -
> >>>  rc = register_virtio_device(&vp_dev->vdev);
> >>>  reg_dev = vp_dev;
> >>>  if (rc)
> >>> diff --git a/drivers/virtio/virtio_pci_legacy.c 
> >>> b/drivers/virtio/virtio_pci_legacy.c
> >>> index 2257f1b3d8ae..d9cbb02b35a1 100644
> >>> --- a/drivers/virtio/virtio_pci_legacy.c
> >>> +++ b/drivers/virtio/virtio_pci_legacy.c
> >>> @@ -223,6 +223,7 @@ int virtio_pci_legacy_probe(struct virtio_pci_device 
> >>> *vp_dev)
> >>>  vp_dev->config_vector = vp_config_vector;
> >>>  vp_dev->setup_vq = setup_vq;
> >>>  vp_dev->del_vq = del_vq;
> >>> +   vp_dev->is_legacy = true;
> >>
> >> This seems break force_legacy for modern device:
> >>
> >>  if (force_legacy) {
> >>  rc = virtio_pci_legacy_probe(vp_dev);
> >>  /* Also try modern mode if we can't map BAR0 (no IO 
> >> space). */
> >>  if (rc == -ENODEV || rc == -ENOMEM)
> >>  rc = virtio_pci_modern_probe(vp_dev);
> >>
> >> Thanks
> >
> > don't see the breakage here - can you explain a bit more?
> >
> Hi, Jason
>
> I also think there is no breakage here, and I gave an explanation in
> another email; please have a look.

I think I've made a mistake, the patch should be fine.

>
> So are there any comments about this bug fix patch? Can this patch pass
> the review?

Yes.

Acked-by: Jason Wang 

Thanks

>
> Thanks
> Feng
>
> >>>
> >>>  return 0;
> >>>   }
> >>> --
> >>> 2.37.1 (Apple Git-137.1)
> >>>
> >
>


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-24 Thread Jason Wang
On Mon, Jul 24, 2023 at 3:17 PM Michael S. Tsirkin  wrote:
>
> On Mon, Jul 24, 2023 at 02:52:05PM +0800, Jason Wang wrote:
> > On Sat, Jul 22, 2023 at 4:18 AM Maxime Coquelin
> >  wrote:
> > >
> > >
> > >
> > > On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > > > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
> > > >>
> > > >>
> > > >> On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > > >>> On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote:
> > > 
> > > 
> > >  On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote:
> > > >> On 7/20/23 1:38 AM, Jason Wang wrote:
> > > >>>
> > > >>> Adding cond_resched() to the command waiting loop for better
> > > >>> co-operation with the scheduler. This allows the CPU to run other
> > > >>> tasks (e.g. a workqueue) instead of busy looping when preemption is
> > > >>> not allowed on a device whose CVQ might be slow.
> > > >>>
> > > >>> Signed-off-by: Jason Wang 
> > > >>
> > > >> This still leaves hung processes, but at least it doesn't pin the 
> > > >> CPU any
> > > >> more.  Thanks.
> > > >> Reviewed-by: Shannon Nelson 
> > > >>
> > > >
> > > > I'd like to see a full solution
> > > > 1- block until interrupt
> >
> > I remember in previous versions, you worried about the extra MSI
> > vector. (Maybe I was wrong).
> >
> > > 
> > >  Would it make sense to also have a timeout?
> > >  And when timeout expires, set FAILED bit in device status?
> > > >>>
> > > >>> virtio spec does not set any limits on the timing of vq
> > > >>> processing.
> > > >>
> > > >> Indeed, but I thought the driver could decide it is too long for it.
> > > >>
> > > >> The issue is we keep waiting with rtnl locked, it can quickly make the
> > > >> system unusable.
> > > >
> > > > if this is a problem we should find a way not to keep rtnl
> > > > locked indefinitely.
> >
> > Any ideas on this direction? Simply dropping rtnl during the busy loop
> > will result in a lot of races. This seems to require non-trivial
> > changes in the networking core.
> >
> > >
> > >  From the tests I have done, I think it is. With OVS, a reconfiguration
> > > is performed when the VDUSE device is added, and when a MLX5 device is
> > > in the same bridge, it ends up doing an ioctl() that tries to take the
> > > rtnl lock. In this configuration, it is not possible to kill OVS because
> > > it is stuck trying to acquire rtnl lock for mlx5 that is held by virtio-
> > > net.
> >
> > Yeah, basically, any RTNL users would be blocked forever.
> >
> > And the infinite loop has other side effects, like blocking the freezer
> > from working.
> >
> > To summarize, there are three issues
> >
> > 1) busy polling
> > 2) breaks freezer
> > 3) hold RTNL during the loop
> >
> > Solving 3 may help somewhat with 2, e.g. some pm routine (e.g. wireguard)
> > or even virtnet_restore() itself may try to hold the lock.
>
> Yep. So my feeling currently is, the only real fix is to actually
> queue up the work in software.

Do you mean something like:

rtnl_lock();
queue up the work
rtnl_unlock();
return success;

?

> It's mostly trivial to limit memory consumption; VIDs are the only case
> where it would make sense to have more than one command of a given type
> outstanding.

And rx mode. So this implies we will fail any command if the previous
work is not finished.

> Have a tree or a bitmap with VIDs to add/remove?

Probably.

Thanks
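
A minimal sketch of that kind of deferral, assuming a hypothetical
virtnet_cvq_cmd/virtnet_cvq_work pair (these names do not exist in the
driver today), could look like:

	struct virtnet_cvq_cmd {
		struct work_struct work;
		struct virtnet_info *vi;
		u8 class, cmd;
		struct scatterlist sg;
	};

	static void virtnet_cvq_work(struct work_struct *work)
	{
		struct virtnet_cvq_cmd *c =
			container_of(work, struct virtnet_cvq_cmd, work);

		/* The potentially slow wait happens here, not under RTNL. */
		virtnet_send_command(c->vi, c->class, c->cmd, &c->sg);
		kfree(c);
	}

The caller would allocate and queue such a command under RTNL and return
immediately, which is where limiting the number of outstanding commands
per type (VIDs, rx mode) comes in.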

>
>
>
> > >
> > > >
> > > > 2- still handle surprise removal correctly by waking in that case
> >
> > This is basically what version 1 did?
> >
> > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368...@redhat.com/t/
> >
> > Thanks
>
> Yes - except the timeout part.
>
>
> > > >
> > > >
> > > >
> > > >>> ---
> > > >>>  drivers/net/virtio_net.c | 4 +++-
> > > >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >>>
> > > >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > >>> index 9f3b1d6ac33d..e7533f29b219 100644
> > > >>> --- a/drivers/net/virtio_net.c
> > > >>> +++ b/drivers/net/virtio_net.c
> > > >>> @@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct 
> > > >>> virtnet_info *vi, u8 class, u8 cmd,
> > > >>>  * into the hypervisor, so the request should be 
> > > >>> handled immediately.
> > > >>>  */
> > > >>> while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > >>> -  !virtqueue_is_broken(vi->cvq))
> > > >>> +  !virtqueue_is_broken(vi->cvq)) {
> > > >>> +   cond_resched();
> > > >>> cpu_relax();
> > > >>> +   }
> > > >>>
> > > >>> return vi->ctrl->status == VIRTIO_NET_OK;
> > > >>>  }
> > > >>> --
> > > >>> 2.39.3
> > > >>>
> > > 

Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-24 Thread Jason Wang
On Mon, Jul 24, 2023 at 3:18 PM Michael S. Tsirkin  wrote:
>
> On Mon, Jul 24, 2023 at 02:52:49PM +0800, Jason Wang wrote:
> > On Mon, Jul 24, 2023 at 2:46 PM Michael S. Tsirkin  wrote:
> > >
> > > On Fri, Jul 21, 2023 at 10:18:03PM +0200, Maxime Coquelin wrote:
> > > >
> > > >
> > > > On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > > > > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
> > > > > >
> > > > > >
> > > > > > On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > > > > > > On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > > > > > > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson 
> > > > > > > > > wrote:
> > > > > > > > > > On 7/20/23 1:38 AM, Jason Wang wrote:
> > > > > > > > > > >
> > > > > > > > > > > Adding cond_resched() to the command waiting loop for a 
> > > > > > > > > > > better
> > > > > > > > > > > co-operation with the scheduler. This allows to give CPU 
> > > > > > > > > > > a breath to
> > > > > > > > > > > run other task(workqueue) instead of busy looping when 
> > > > > > > > > > > preemption is
> > > > > > > > > > > not allowed on a device whose CVQ might be slow.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > > > >
> > > > > > > > > > This still leaves hung processes, but at least it doesn't 
> > > > > > > > > > pin the CPU any
> > > > > > > > > > more.  Thanks.
> > > > > > > > > > Reviewed-by: Shannon Nelson 
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'd like to see a full solution
> > > > > > > > > 1- block until interrupt
> > > > > > > >
> > > > > > > > Would it make sense to also have a timeout?
> > > > > > > > And when timeout expires, set FAILED bit in device status?
> > > > > > >
> > > > > > > virtio spec does not set any limits on the timing of vq
> > > > > > > processing.
> > > > > >
> > > > > > Indeed, but I thought the driver could decide it is too long for it.
> > > > > >
> > > > > > The issue is we keep waiting with rtnl locked, it can quickly make 
> > > > > > the
> > > > > > system unusable.
> > > > >
> > > > > if this is a problem we should find a way not to keep rtnl
> > > > > locked indefinitely.
> > > >
> > > > From the tests I have done, I think it is. With OVS, a reconfiguration 
> > > > is
> > > > performed when the VDUSE device is added, and when a MLX5 device is
> > > > in the same bridge, it ends up doing an ioctl() that tries to take the
> > > > rtnl lock. In this configuration, it is not possible to kill OVS because
> > > > it is stuck trying to acquire rtnl lock for mlx5 that is held by virtio-
> > > > net.
> > >
> > > So for sure, we can queue up the work and process it later.
> > > The somewhat tricky part is limiting the memory consumption.
> >
> > And it needs to sync with rtnl somehow, e.g device unregistering which
> > seems not easy.
> >
> > Thanks
>
> since when does device unregister need to send cvq commands?

It doesn't do this now. But if we don't process the work under rtnl,
we need to synchronize with device unregistering.

Thanks

>
> > >
> > >
> > > > >
> > > > > > > > > 2- still handle surprise removal correctly by waking in that 
> > > > > > > > > case
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > ---
> > > > > > > > > > >  drivers/net/virtio_net.c | 4 +++-
> > > > > > > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/net/virtio_net.c 
> > > > > > > > > > > b/drivers/net/virtio_net.c
> > > > > > > > > > > index 9f3b1d6ac33d..e7533f29b219 100644
> > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > @@ -2314,8 +2314,10 @@ static bool 
> > > > > > > > > > > virtnet_send_command(struct virtnet_info *vi, u8 class, 
> > > > > > > > > > > u8 cmd,
> > > > > > > > > > >  * into the hypervisor, so the request should 
> > > > > > > > > > > be handled immediately.
> > > > > > > > > > >  */
> > > > > > > > > > > while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > > > > > > > > -  !virtqueue_is_broken(vi->cvq))
> > > > > > > > > > > +  !virtqueue_is_broken(vi->cvq)) {
> > > > > > > > > > > +   cond_resched();
> > > > > > > > > > > cpu_relax();
> > > > > > > > > > > +   }
> > > > > > > > > > >
> > > > > > > > > > > return vi->ctrl->status == VIRTIO_NET_OK;
> > > > > > > > > > >  }
> > > > > > > > > > > --
> > > > > > > > > > > 2.39.3
> > > > > > > > > > >
> > > > > > > > > > > ___
> > > > > > > > > > > Virtualization mailing list
> > > > > > > > > > > Virtualization@lists.linux-foundation.org
> > > > > > > > > > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> > > > > > > > >
> > > > > > >
> > > > >
> > >
>


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-24 Thread Xuan Zhuo
On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig  
wrote:
> On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > Well I think we can add wrappers like virtio_dma_sync and so on.
> > There are NOP for non-dma so passing the dma device is harmless.
>
> Yes, please.


I am not sure I got this fully.

Do you mean this:
https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/

Then the driver must do DMA operations (map and sync) via these virtio_dma_*
APIs, regardless of whether the device is a DMA device or not.

Then AF_XDP must use these virtio_dma_* APIs for virtio devices.

Thanks
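
Roughly, the split being discussed looks like this when a driver (e.g. an
AF_XDP path) uses virtqueue_dma_dev() directly; the two branches are
exactly what the virtio_dma_* wrappers would hide (illustrative sketch
only):

	struct device *dma_dev = virtqueue_dma_dev(vq);
	dma_addr_t addr;

	if (dma_dev)
		addr = dma_map_single(dma_dev, buf, len, DMA_TO_DEVICE);
	else
		addr = virt_to_phys(buf); /* legacy device, no DMA API involved */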



Re: [PATCH v1] vdpa: Complement vdpa_nl_policy for nlattr length check

2023-07-24 Thread Michael S. Tsirkin
On Mon, Jul 24, 2023 at 11:42:42AM +, Dragos Tatulea wrote:
> On Mon, 2023-07-24 at 05:16 -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 24, 2023 at 08:38:04AM +, Dragos Tatulea wrote:
> > > 
> > > On Mon, 2023-07-24 at 15:11 +0800, Jason Wang wrote:
> > > > On Sun, Jul 23, 2023 at 6:02 PM Michael S. Tsirkin  
> > > > wrote:
> > > > > 
> > > > > On Sun, Jul 23, 2023 at 05:48:46PM +0800, Lin Ma wrote:
> > > > > > 
> > > > > > > Sure, that is another undergoing task I'm working on. If the 
> > > > > > > nlattr
> > > > > > > is
> > > > > > > parsed with
> > > > > > > NL_VALIDATE_UNSPEC, any forgotten nlattr will be rejected, 
> > > > > > > therefore
> > > > > > > (which is the default
> > > > > > > for modern nla_parse).
> > > > > > 
> > > > > > For the general netlink interface, the deciding flag should be
> > > > > > genl_ops.validate defined in
> > > > > > each ops. The default validate flag is strict, while the developer 
> > > > > > can
> > > > > > overwrite the flag
> > > > > > with GENL_DONT_VALIDATE_STRICT to ease the validation. That is to 
> > > > > > say,
> > > > > > safer code should
> > > > > > enforce NL_VALIDATE_STRICT by not overwriting the validate flag.
> > > > > > 
> > > > > > Regrads
> > > > > > Lin
> > > > > 
> > > > > 
> > > > > Oh I see.
> > > > > 
> > > > > It started here:
> > > > > 
> > > > > commit 33b347503f014ebf76257327cbc7001c6b721956
> > > > > Author: Parav Pandit 
> > > > > Date:   Tue Jan 5 12:32:00 2021 +0200
> > > > > 
> > > > >     vdpa: Define vdpa mgmt device, ops and a netlink interface
> > > > > 
> > > > > which did:
> > > > > 
> > > > > +   .validate = GENL_DONT_VALIDATE_STRICT |
> > > > > GENL_DONT_VALIDATE_DUMP,
> > > > > 
> > > > > 
> > > > > which was most likely just a copy paste from somewhere, right Parav?
> > > > > 
> > > > > and then everyone kept copying this around.
> > > > > 
> > > > > Parav, Eli can we drop these? There's a tiny chance of breaking
> > > > > something
> > > > > but I feel there aren't that many users outside mlx5 yet, so if you
> > > > > guys can test on mlx5 and confirm no breakage, I think we are good.
> > > > 
> > > > Adding Dragos.
> > > > 
> > > I will check. Just to make sure I understand correctly: you want me to 
> > > drop
> > > the
> > > .validate flags all together in all vdpa ops and check, right?
> > > 
> > > Thanks,
> > > Dragos
> > 
> > yes - I suspect you will then need this patch to make things work.
> > 
> Yep. Adding the patch and removing the .validate config on the vdpa_nl_ops
> seems to work just fine.
> 
> Thanks,
> Dragos

OK, post a patch?

-- 
MST
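
For illustration only (not the posted patch), dropping the relaxed
validation from the vdpa genetlink ops would look roughly like this, so
that the default strict validation plus a complete vdpa_nl_policy perform
the per-attribute length checks:

	{
		.cmd = VDPA_CMD_DEV_NEW,
-		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
		/* .doit etc. unchanged */
	},
	/* ...and likewise for the remaining vdpa netlink ops... */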


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-24 Thread Michael S. Tsirkin
On Thu, Jul 20, 2023 at 03:34:01PM +0800, Xuan Zhuo wrote:
> On Wed, 19 Jul 2023 23:57:51 -0700, Christoph Hellwig  
> wrote:
> > On Thu, Jul 20, 2023 at 02:45:14PM +0800, Xuan Zhuo wrote:
> > >  virtqueue_dma_dev() returns the device that works with the DMA APIs.
> > >  Then that can be used like other devices. So what is the problem?
> > >
> > >  I always think the code path without the DMA APIs is the trouble for you.
> >
> > Because we now have an API where the upper level drivers sometimes
> > see the dma device and sometimes not.
> 
> Not having a DMA device only happens for the old devices.
>
> The API without a DMA dev is only compatible with older devices. We can't
> give up these old devices, but we also have to embrace new features.
> 
> > This will be abused and cause
> > trouble sooner than you can say "layering".
> 
> I don't understand what the possible trouble here is.
> 
> When there is no DMA device, the driver just does the same thing as before.
> 
> Thanks.

Instead of skipping operations, Christoph wants wrappers that do nothing
in the non-DMA case.

-- 
MST
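
An illustrative wrapper of the kind being suggested (the name is made up,
not an existing API); it simply becomes a no-op when the virtqueue has no
DMA device:

	static inline void virtio_dma_sync_single_for_cpu(struct virtqueue *vq,
							  dma_addr_t addr,
							  size_t size,
							  enum dma_data_direction dir)
	{
		struct device *dev = virtqueue_dma_dev(vq);

		if (!dev)
			return;	/* legacy/non-DMA device: nothing to do */
		dma_sync_single_for_cpu(dev, addr, size, dir);
	}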



Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-24 Thread Christoph Hellwig
On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> Well I think we can add wrappers like virtio_dma_sync and so on.
> There are NOP for non-dma so passing the dma device is harmless.

Yes, please.


Re: [PATCH v1] virtio-pci: Fix legacy device flag setting error in probe

2023-07-24 Thread Feng Liu via Virtualization



On 2023-07-20 p.m.1:14, Michael S. Tsirkin wrote:

External email: Use caution opening links or attachments


On Thu, Jul 20, 2023 at 10:27:04AM +0800, Jason Wang wrote:

On Wed, Jul 19, 2023 at 11:46 PM Feng Liu  wrote:


The 'is_legacy' flag is used to differentiate between legacy and modern
devices. Currently, it is based on the value of vp_dev->ldev.ioaddr.
However, because struct virtio_pci_legacy_device and struct
virtio_pci_modern_device share memory in a union, when
virtio_pci_modern_probe modifies the contents of struct
virtio_pci_modern_device it also affects struct virtio_pci_legacy_device,
and ldev.ioaddr is no longer zero, causing the 'is_legacy' flag to be set
to true. To resolve this issue, mark 'is_legacy' as true when a legacy
device is probed, and keep 'is_legacy' as false when a modern device is
probed.

Fixes: 4f0fc22534e3 ("virtio_pci: Optimize virtio_pci_device structure size")
Signed-off-by: Feng Liu 
Reviewed-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
---
  drivers/virtio/virtio_pci_common.c | 2 --
  drivers/virtio/virtio_pci_legacy.c | 1 +
  2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index a6c86f916dbd..c2524a7207cf 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -557,8 +557,6 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,

 pci_set_master(pci_dev);

-   vp_dev->is_legacy = vp_dev->ldev.ioaddr ? true : false;
-
  rc = register_virtio_device(&vp_dev->vdev);
 reg_dev = vp_dev;
 if (rc)
diff --git a/drivers/virtio/virtio_pci_legacy.c 
b/drivers/virtio/virtio_pci_legacy.c
index 2257f1b3d8ae..d9cbb02b35a1 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -223,6 +223,7 @@ int virtio_pci_legacy_probe(struct virtio_pci_device 
*vp_dev)
 vp_dev->config_vector = vp_config_vector;
 vp_dev->setup_vq = setup_vq;
 vp_dev->del_vq = del_vq;
+   vp_dev->is_legacy = true;


This seems break force_legacy for modern device:

 if (force_legacy) {
 rc = virtio_pci_legacy_probe(vp_dev);
 /* Also try modern mode if we can't map BAR0 (no IO space). */
 if (rc == -ENODEV || rc == -ENOMEM)
 rc = virtio_pci_modern_probe(vp_dev);

Thanks


don't see the breakage here - can you explain a bit more?


Hi, Jason

I also think there is no breakage here, and I gave an explanation in
another email; please have a look.


So are there any comments about this bug fix patch? Can this patch pass 
the review?


Thanks
Feng



 return 0;
  }
--
2.37.1 (Apple Git-137.1)





Re: [PATCH v2 03/47] mm: shrinker: add infrastructure for dynamically allocating shrinker

2023-07-24 Thread Peter Zijlstra
On Mon, Jul 24, 2023 at 05:43:10PM +0800, Qi Zheng wrote:

> +void shrinker_unregister(struct shrinker *shrinker)
> +{
> + struct dentry *debugfs_entry;
> + int debugfs_id;
> +
> + if (!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))
> + return;
> +
> + down_write(&shrinker_rwsem);
> + list_del(&shrinker->list);
> + shrinker->flags &= ~SHRINKER_REGISTERED;
> + if (shrinker->flags & SHRINKER_MEMCG_AWARE)
> + unregister_memcg_shrinker(shrinker);
> + debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
> + up_write(&shrinker_rwsem);
> +
> + shrinker_debugfs_remove(debugfs_entry, debugfs_id);

Should there not be an rcu_barrier() right about here?

> +
> + kfree(shrinker->nr_deferred);
> + shrinker->nr_deferred = NULL;
> +
> + kfree(shrinker);
> +}
> +EXPORT_SYMBOL(shrinker_unregister);
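
For clarity, the placement being asked about would be something like the
following (hypothetical; whether it is needed depends on pending call_rcu()
users of the shrinker):

	shrinker_debugfs_remove(debugfs_entry, debugfs_id);

	/* Wait for any pending call_rcu() callbacks that may still
	 * reference the shrinker before freeing it below. */
	rcu_barrier();

	kfree(shrinker->nr_deferred);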



Re: [PATCH v1] vdpa: Complement vdpa_nl_policy for nlattr length check

2023-07-24 Thread Dragos Tatulea via Virtualization
On Mon, 2023-07-24 at 05:16 -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 24, 2023 at 08:38:04AM +, Dragos Tatulea wrote:
> > 
> > On Mon, 2023-07-24 at 15:11 +0800, Jason Wang wrote:
> > > On Sun, Jul 23, 2023 at 6:02 PM Michael S. Tsirkin  
> > > wrote:
> > > > 
> > > > On Sun, Jul 23, 2023 at 05:48:46PM +0800, Lin Ma wrote:
> > > > > 
> > > > > > Sure, that is another undergoing task I'm working on. If the nlattr
> > > > > > is
> > > > > > parsed with
> > > > > > NL_VALIDATE_UNSPEC, any forgotten nlattr will be rejected, therefore
> > > > > > (which is the default
> > > > > > for modern nla_parse).
> > > > > 
> > > > > For the general netlink interface, the deciding flag should be
> > > > > genl_ops.validate defined in
> > > > > each ops. The default validate flag is strict, while the developer can
> > > > > overwrite the flag
> > > > > with GENL_DONT_VALIDATE_STRICT to ease the validation. That is to say,
> > > > > safer code should
> > > > > enforce NL_VALIDATE_STRICT by not overwriting the validate flag.
> > > > > 
> > > > > Regrads
> > > > > Lin
> > > > 
> > > > 
> > > > Oh I see.
> > > > 
> > > > It started here:
> > > > 
> > > > commit 33b347503f014ebf76257327cbc7001c6b721956
> > > > Author: Parav Pandit 
> > > > Date:   Tue Jan 5 12:32:00 2021 +0200
> > > > 
> > > >     vdpa: Define vdpa mgmt device, ops and a netlink interface
> > > > 
> > > > which did:
> > > > 
> > > > +   .validate = GENL_DONT_VALIDATE_STRICT |
> > > > GENL_DONT_VALIDATE_DUMP,
> > > > 
> > > > 
> > > > which was most likely just a copy paste from somewhere, right Parav?
> > > > 
> > > > and then everyone kept copying this around.
> > > > 
> > > > Parav, Eli can we drop these? There's a tiny chance of breaking
> > > > something
> > > > but I feel there aren't that many users outside mlx5 yet, so if you
> > > > guys can test on mlx5 and confirm no breakage, I think we are good.
> > > 
> > > Adding Dragos.
> > > 
> > I will check. Just to make sure I understand correctly: you want me to drop
> > the
> > .validate flags all together in all vdpa ops and check, right?
> > 
> > Thanks,
> > Dragos
> 
> yes - I suspect you will then need this patch to make things work.
> 
Yep. Adding the patch and removing the .validate config on the vdpa_nl_ops
seems to work just fine.

Thanks,
Dragos

Re: [PATCH v1] vdpa: Complement vdpa_nl_policy for nlattr length check

2023-07-24 Thread Michael S. Tsirkin
On Mon, Jul 24, 2023 at 08:38:04AM +, Dragos Tatulea wrote:
> 
> On Mon, 2023-07-24 at 15:11 +0800, Jason Wang wrote:
> > On Sun, Jul 23, 2023 at 6:02 PM Michael S. Tsirkin  wrote:
> > > 
> > > On Sun, Jul 23, 2023 at 05:48:46PM +0800, Lin Ma wrote:
> > > > 
> > > > > Sure, that is another undergoing task I'm working on. If the nlattr is
> > > > > parsed with
> > > > > NL_VALIDATE_UNSPEC, any forgotten nlattr will be rejected, therefore
> > > > > (which is the default
> > > > > for modern nla_parse).
> > > > 
> > > > For the general netlink interface, the deciding flag should be
> > > > genl_ops.validate defined in
> > > > each ops. The default validate flag is strict, while the developer can
> > > > overwrite the flag
> > > > with GENL_DONT_VALIDATE_STRICT to ease the validation. That is to say,
> > > > safer code should
> > > > enforce NL_VALIDATE_STRICT by not overwriting the validate flag.
> > > > 
> > > > Regrads
> > > > Lin
> > > 
> > > 
> > > Oh I see.
> > > 
> > > It started here:
> > > 
> > > commit 33b347503f014ebf76257327cbc7001c6b721956
> > > Author: Parav Pandit 
> > > Date:   Tue Jan 5 12:32:00 2021 +0200
> > > 
> > >     vdpa: Define vdpa mgmt device, ops and a netlink interface
> > > 
> > > which did:
> > > 
> > > +   .validate = GENL_DONT_VALIDATE_STRICT |
> > > GENL_DONT_VALIDATE_DUMP,
> > > 
> > > 
> > > which was most likely just a copy paste from somewhere, right Parav?
> > > 
> > > and then everyone kept copying this around.
> > > 
> > > Parav, Eli can we drop these? There's a tiny chance of breaking something
> > > but I feel there aren't that many users outside mlx5 yet, so if you
> > > guys can test on mlx5 and confirm no breakage, I think we are good.
> > 
> > Adding Dragos.
> > 
> I will check. Just to make sure I understand correctly: you want me to drop 
> the
> .validate flags all together in all vdpa ops and check, right?
> 
> Thanks,
> Dragos

yes - I suspect you will then need this patch to make things work.

-- 
MST


Re: [PATCH v1] vdpa: Complement vdpa_nl_policy for nlattr length check

2023-07-24 Thread Dragos Tatulea via Virtualization

On Mon, 2023-07-24 at 15:11 +0800, Jason Wang wrote:
> On Sun, Jul 23, 2023 at 6:02 PM Michael S. Tsirkin  wrote:
> > 
> > On Sun, Jul 23, 2023 at 05:48:46PM +0800, Lin Ma wrote:
> > > 
> > > > Sure, that is another undergoing task I'm working on. If the nlattr is
> > > > parsed with
> > > > NL_VALIDATE_UNSPEC, any forgotten nlattr will be rejected, therefore
> > > > (which is the default
> > > > for modern nla_parse).
> > > 
> > > For the general netlink interface, the deciding flag should be
> > > genl_ops.validate defined in
> > > each ops. The default validate flag is strict, while the developer can
> > > overwrite the flag
> > > with GENL_DONT_VALIDATE_STRICT to ease the validation. That is to say,
> > > safer code should
> > > enforce NL_VALIDATE_STRICT by not overwriting the validate flag.
> > > 
> > > Regrads
> > > Lin
> > 
> > 
> > Oh I see.
> > 
> > It started here:
> > 
> > commit 33b347503f014ebf76257327cbc7001c6b721956
> > Author: Parav Pandit 
> > Date:   Tue Jan 5 12:32:00 2021 +0200
> > 
> >     vdpa: Define vdpa mgmt device, ops and a netlink interface
> > 
> > which did:
> > 
> > +   .validate = GENL_DONT_VALIDATE_STRICT |
> > GENL_DONT_VALIDATE_DUMP,
> > 
> > 
> > which was most likely just a copy paste from somewhere, right Parav?
> > 
> > and then everyone kept copying this around.
> > 
> > Parav, Eli can we drop these? There's a tiny chance of breaking something
> > but I feel there aren't that many users outside mlx5 yet, so if you
> > guys can test on mlx5 and confirm no breakage, I think we are good.
> 
> Adding Dragos.
> 
I will check. Just to make sure I understand correctly: you want me to drop the
.validate flags all together in all vdpa ops and check, right?

Thanks,
Dragos

Re: [PATCH net-next V3 3/4] virtio_net: support per queue interrupt coalesce command

2023-07-24 Thread Michael S. Tsirkin
On Mon, Jul 24, 2023 at 06:40:47AM +0300, Gavin Li wrote:
> Add interrupt_coalesce config in send_queue and receive_queue to cache user
> config.
> 
> Send per-virtqueue interrupt moderation config to the underlying device in
> order to have more efficient interrupt moderation and CPU utilization of the
> guest VM.
> 
> Signed-off-by: Gavin Li 
> Reviewed-by: Dragos Tatulea 
> Reviewed-by: Jiri Pirko 
> Acked-by: Michael S. Tsirkin 
> ---
>  drivers/net/virtio_net.c| 120 
>  include/uapi/linux/virtio_net.h |  14 
>  2 files changed, 122 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 802ed21453f5..0c3ee1e26ece 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -144,6 +144,8 @@ struct send_queue {
>  
>   struct virtnet_sq_stats stats;
>  
> + struct virtnet_interrupt_coalesce intr_coal;
> +
>   struct napi_struct napi;
>  
>   /* Record whether sq is in reset state. */
> @@ -161,6 +163,8 @@ struct receive_queue {
>  
>   struct virtnet_rq_stats stats;
>  
> + struct virtnet_interrupt_coalesce intr_coal;
> +
>   /* Chain pages by the private ptr. */
>   struct page *pages;
>  
> @@ -212,6 +216,7 @@ struct control_buf {
>   struct virtio_net_ctrl_rss rss;
>   struct virtio_net_ctrl_coal_tx coal_tx;
>   struct virtio_net_ctrl_coal_rx coal_rx;
> + struct virtio_net_ctrl_coal_vq coal_vq;
>  };
>  
>  struct virtnet_info {
> @@ -3078,6 +3083,55 @@ static int virtnet_send_notf_coal_cmds(struct 
> virtnet_info *vi,
>   return 0;
>  }
>  
> +static int virtnet_send_ctrl_coal_vq_cmd(struct virtnet_info *vi,
> +  u16 vqn, u32 max_usecs, u32 
> max_packets)
> +{
> + struct scatterlist sgs;
> +
> + vi->ctrl->coal_vq.vqn = cpu_to_le16(vqn);
> + vi->ctrl->coal_vq.coal.max_usecs = cpu_to_le32(max_usecs);
> + vi->ctrl->coal_vq.coal.max_packets = cpu_to_le32(max_packets);
> + sg_init_one(&sgs, &vi->ctrl->coal_vq, sizeof(vi->ctrl->coal_vq));
> +
> + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
> +   VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET,
> +   &sgs))
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
> +   struct ethtool_coalesce *ec,
> +   u16 queue)
> +{
> + int err;
> +
> + if (ec->rx_coalesce_usecs || ec->rx_max_coalesced_frames) {
> + err = virtnet_send_ctrl_coal_vq_cmd(vi, rxq2vq(queue),
> + ec->rx_coalesce_usecs,
> + 
> ec->rx_max_coalesced_frames);
> + if (err)
> + return err;
> + /* Save parameters */
> + vi->rq[queue].intr_coal.max_usecs = ec->rx_coalesce_usecs;
> + vi->rq[queue].intr_coal.max_packets = 
> ec->rx_max_coalesced_frames;
> + }
> +
> + if (ec->tx_coalesce_usecs || ec->tx_max_coalesced_frames) {
> + err = virtnet_send_ctrl_coal_vq_cmd(vi, txq2vq(queue),
> + ec->tx_coalesce_usecs,
> + 
> ec->tx_max_coalesced_frames);
> + if (err)
> + return err;
> + /* Save parameters */
> + vi->sq[queue].intr_coal.max_usecs = ec->tx_coalesce_usecs;
> + vi->sq[queue].intr_coal.max_packets = 
> ec->tx_max_coalesced_frames;
> + }
> +
> + return 0;
> +}
> +
>  static int virtnet_coal_params_supported(struct ethtool_coalesce *ec)
>  {
>   /* usecs coalescing is supported only if VIRTIO_NET_F_NOTF_COAL
> @@ -3094,23 +3148,39 @@ static int virtnet_coal_params_supported(struct 
> ethtool_coalesce *ec)
>  }
>  
>  static int virtnet_set_coalesce_one(struct net_device *dev,
> - struct ethtool_coalesce *ec)
> + struct ethtool_coalesce *ec,
> + bool per_queue,
> + u32 queue)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> - int ret, i, napi_weight;
> + int queue_count = per_queue ? 1 : vi->max_queue_pairs;
> + int queue_number = per_queue ? queue : 0;

Actually can't we refactor this? This whole function is littered
with if/else branches. just code it separately - the only
common part is:

napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
if (napi_weight ^ vi->sq[0].napi.weight) {
if (dev->flags & IFF_UP)
return -EBUSY;
else
update_napi = true;
}

so just move this to a helper and have two functions - global and
per queue.



>   bool update_napi = false;
> 
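
A possible shape for that helper, sketched from the common code quoted
above (the name is made up):

	static int virtnet_coal_napi_weight(struct net_device *dev,
					    struct virtnet_info *vi,
					    struct ethtool_coalesce *ec,
					    bool *update_napi)
	{
		int napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;

		if (napi_weight ^ vi->sq[0].napi.weight) {
			if (dev->flags & IFF_UP)
				return -EBUSY;
			*update_napi = true;
		}

		return 0;
	}

The global and per-queue set_coalesce paths would then each call it once
and keep their own command and save logic separate.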

Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-24 Thread Michael S. Tsirkin
On Mon, Jul 24, 2023 at 02:52:49PM +0800, Jason Wang wrote:
> On Mon, Jul 24, 2023 at 2:46 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Jul 21, 2023 at 10:18:03PM +0200, Maxime Coquelin wrote:
> > >
> > >
> > > On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > > > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
> > > > >
> > > > >
> > > > > On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > > > > > On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote:
> > > > > > > > > On 7/20/23 1:38 AM, Jason Wang wrote:
> > > > > > > > > >
> > > > > > > > > > Adding cond_resched() to the command waiting loop for a 
> > > > > > > > > > better
> > > > > > > > > > co-operation with the scheduler. This allows to give CPU a 
> > > > > > > > > > breath to
> > > > > > > > > > run other task(workqueue) instead of busy looping when 
> > > > > > > > > > preemption is
> > > > > > > > > > not allowed on a device whose CVQ might be slow.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > > >
> > > > > > > > > This still leaves hung processes, but at least it doesn't pin 
> > > > > > > > > the CPU any
> > > > > > > > > more.  Thanks.
> > > > > > > > > Reviewed-by: Shannon Nelson 
> > > > > > > > >
> > > > > > > >
> > > > > > > > I'd like to see a full solution
> > > > > > > > 1- block until interrupt
> > > > > > >
> > > > > > > Would it make sense to also have a timeout?
> > > > > > > And when timeout expires, set FAILED bit in device status?
> > > > > >
> > > > > > virtio spec does not set any limits on the timing of vq
> > > > > > processing.
> > > > >
> > > > > Indeed, but I thought the driver could decide it is too long for it.
> > > > >
> > > > > The issue is we keep waiting with rtnl locked, it can quickly make the
> > > > > system unusable.
> > > >
> > > > if this is a problem we should find a way not to keep rtnl
> > > > locked indefinitely.
> > >
> > > From the tests I have done, I think it is. With OVS, a reconfiguration is
> > > performed when the VDUSE device is added, and when a MLX5 device is
> > > in the same bridge, it ends up doing an ioctl() that tries to take the
> > > rtnl lock. In this configuration, it is not possible to kill OVS because
> > > it is stuck trying to acquire rtnl lock for mlx5 that is held by virtio-
> > > net.
> >
> > So for sure, we can queue up the work and process it later.
> > The somewhat tricky part is limiting the memory consumption.
> 
> And it needs to sync with rtnl somehow, e.g device unregistering which
> seems not easy.
> 
> Thanks

since when does device unregister need to send cvq commands?

> >
> >
> > > >
> > > > > > > > 2- still handle surprise removal correctly by waking in that 
> > > > > > > > case
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > >  drivers/net/virtio_net.c | 4 +++-
> > > > > > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/virtio_net.c 
> > > > > > > > > > b/drivers/net/virtio_net.c
> > > > > > > > > > index 9f3b1d6ac33d..e7533f29b219 100644
> > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > @@ -2314,8 +2314,10 @@ static bool 
> > > > > > > > > > virtnet_send_command(struct virtnet_info *vi, u8 class, u8 
> > > > > > > > > > cmd,
> > > > > > > > > >  * into the hypervisor, so the request should 
> > > > > > > > > > be handled immediately.
> > > > > > > > > >  */
> > > > > > > > > > while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > > > > > > > -  !virtqueue_is_broken(vi->cvq))
> > > > > > > > > > +  !virtqueue_is_broken(vi->cvq)) {
> > > > > > > > > > +   cond_resched();
> > > > > > > > > > cpu_relax();
> > > > > > > > > > +   }
> > > > > > > > > >
> > > > > > > > > > return vi->ctrl->status == VIRTIO_NET_OK;
> > > > > > > > > >  }
> > > > > > > > > > --
> > > > > > > > > > 2.39.3
> > > > > > > > > >
> > > > > > > > > > ___
> > > > > > > > > > Virtualization mailing list
> > > > > > > > > > Virtualization@lists.linux-foundation.org
> > > > > > > > > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> > > > > > > >
> > > > > >
> > > >
> >


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-24 Thread Michael S. Tsirkin
On Mon, Jul 24, 2023 at 02:52:05PM +0800, Jason Wang wrote:
> On Sat, Jul 22, 2023 at 4:18 AM Maxime Coquelin
>  wrote:
> >
> >
> >
> > On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
> > >>
> > >>
> > >> On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > >>> On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote:
> > 
> > 
> >  On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote:
> > >> On 7/20/23 1:38 AM, Jason Wang wrote:
> > >>>
> > >>> Adding cond_resched() to the command waiting loop for a better
> > >>> co-operation with the scheduler. This allows to give CPU a breath to
> > >>> run other task(workqueue) instead of busy looping when preemption is
> > >>> not allowed on a device whose CVQ might be slow.
> > >>>
> > >>> Signed-off-by: Jason Wang 
> > >>
> > >> This still leaves hung processes, but at least it doesn't pin the 
> > >> CPU any
> > >> more.  Thanks.
> > >> Reviewed-by: Shannon Nelson 
> > >>
> > >
> > > I'd like to see a full solution
> > > 1- block until interrupt
> 
> I remember in previous versions, you worried about the extra MSI
> vector. (Maybe I was wrong).
> 
> > 
> >  Would it make sense to also have a timeout?
> >  And when timeout expires, set FAILED bit in device status?
> > >>>
> > >>> virtio spec does not set any limits on the timing of vq
> > >>> processing.
> > >>
> > >> Indeed, but I thought the driver could decide it is too long for it.
> > >>
> > >> The issue is we keep waiting with rtnl locked, it can quickly make the
> > >> system unusable.
> > >
> > > if this is a problem we should find a way not to keep rtnl
> > > locked indefinitely.
> 
> Any ideas on this direction? Simply dropping rtnl during the busy loop
> will result in a lot of races. This seems to require non-trivial
> changes in the networking core.
> 
> >
> >  From the tests I have done, I think it is. With OVS, a reconfiguration
> > is performed when the VDUSE device is added, and when a MLX5 device is
> > in the same bridge, it ends up doing an ioctl() that tries to take the
> > rtnl lock. In this configuration, it is not possible to kill OVS because
> > it is stuck trying to acquire rtnl lock for mlx5 that is held by virtio-
> > net.
> 
> Yeah, basically, any RTNL users would be blocked forever.
> 
> And the infinite loop has other side effects like it blocks the freezer to 
> work.
> 
> To summarize, there are three issues
> 
> 1) busy polling
> 2) breaks freezer
> 3) hold RTNL during the loop
> 
> Solving 3 may help somehow for 2 e.g some pm routine e.g wireguard or
> even virtnet_restore() itself may try to hold the lock.

Yep. So my feeling currently is, the only real fix is to actually
queue up the work in software. It's mostly trivial to limit memory
consumption; VIDs are the only case where it would make sense to have
more than one command of a given type outstanding. Have a tree or a
bitmap with VIDs to add/remove?



> >
> > >
> > > 2- still handle surprise removal correctly by waking in that case
> 
> This is basically what version 1 did?
> 
> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368...@redhat.com/t/
> 
> Thanks

Yes - except the timeout part.
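
A rough sketch of the "block until interrupt, with a timeout" idea; the
ctrl_done completion, the callback and the timeout constant are all
hypothetical, not existing virtio-net code:

	/* Control-vq callback signals completion instead of being polled. */
	static void virtnet_cvq_done(struct virtqueue *cvq)
	{
		struct virtnet_info *vi = cvq->vdev->priv;

		complete(&vi->ctrl_done);
	}

	/* In virtnet_send_command(), replacing the busy loop: */
	if (!wait_for_completion_timeout(&vi->ctrl_done, VIRTNET_CVQ_TIMEOUT) ||
	    virtqueue_is_broken(vi->cvq))
		return false;	/* timed out or surprise removal */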


> > >
> > >
> > >
> > >>> ---
> > >>>  drivers/net/virtio_net.c | 4 +++-
> > >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > >>> index 9f3b1d6ac33d..e7533f29b219 100644
> > >>> --- a/drivers/net/virtio_net.c
> > >>> +++ b/drivers/net/virtio_net.c
> > >>> @@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct 
> > >>> virtnet_info *vi, u8 class, u8 cmd,
> > >>>  * into the hypervisor, so the request should be 
> > >>> handled immediately.
> > >>>  */
> > >>> while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > >>> -  !virtqueue_is_broken(vi->cvq))
> > >>> +  !virtqueue_is_broken(vi->cvq)) {
> > >>> +   cond_resched();
> > >>> cpu_relax();
> > >>> +   }
> > >>>
> > >>> return vi->ctrl->status == VIRTIO_NET_OK;
> > >>>  }
> > >>> --
> > >>> 2.39.3
> > >>>
> > >>> ___
> > >>> Virtualization mailing list
> > >>> Virtualization@lists.linux-foundation.org
> > >>> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> > >
> > >>>
> > >
> >


Re: [PATCH v1] vdpa: Complement vdpa_nl_policy for nlattr length check

2023-07-24 Thread Jason Wang
On Sun, Jul 23, 2023 at 6:02 PM Michael S. Tsirkin  wrote:
>
> On Sun, Jul 23, 2023 at 05:48:46PM +0800, Lin Ma wrote:
> >
> > > Sure, that is another undergoing task I'm working on. If the nlattr is 
> > > parsed with
> > > NL_VALIDATE_UNSPEC, any forgotten nlattr will be rejected, therefore 
> > > (which is the default
> > > for modern nla_parse).
> >
> > For the general netlink interface, the deciding flag should be 
> > genl_ops.validate defined in
> > each ops. The default validate flag is strict, while the developer can 
> > overwrite the flag
> > with GENL_DONT_VALIDATE_STRICT to ease the validation. That is to say, 
> > safer code should
> > enforce NL_VALIDATE_STRICT by not overwriting the validate flag.
> >
> > Regrads
> > Lin
>
>
> Oh I see.
>
> It started here:
>
> commit 33b347503f014ebf76257327cbc7001c6b721956
> Author: Parav Pandit 
> Date:   Tue Jan 5 12:32:00 2021 +0200
>
> vdpa: Define vdpa mgmt device, ops and a netlink interface
>
> which did:
>
> +   .validate = GENL_DONT_VALIDATE_STRICT | 
> GENL_DONT_VALIDATE_DUMP,
>
>
> which was most likely just a copy paste from somewhere, right Parav?
>
> and then everyone kept copying this around.
>
> Parav, Eli can we drop these? There's a tiny chance of breaking something
> but I feel there aren't that many users outside mlx5 yet, so if you
> guys can test on mlx5 and confirm no breakage, I think we are good.

Adding Dragos.

Thanks

>
> --
> MST
>


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-24 Thread Jason Wang
On Mon, Jul 24, 2023 at 2:46 PM Michael S. Tsirkin  wrote:
>
> On Fri, Jul 21, 2023 at 10:18:03PM +0200, Maxime Coquelin wrote:
> >
> >
> > On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
> > > >
> > > >
> > > > On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > > > > On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote:
> > > > > >
> > > > > >
> > > > > > On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote:
> > > > > > > > On 7/20/23 1:38 AM, Jason Wang wrote:
> > > > > > > > >
> > > > > > > > > Adding cond_resched() to the command waiting loop for a better
> > > > > > > > > co-operation with the scheduler. This allows to give CPU a 
> > > > > > > > > breath to
> > > > > > > > > run other task(workqueue) instead of busy looping when 
> > > > > > > > > preemption is
> > > > > > > > > not allowed on a device whose CVQ might be slow.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > >
> > > > > > > > This still leaves hung processes, but at least it doesn't pin 
> > > > > > > > the CPU any
> > > > > > > > more.  Thanks.
> > > > > > > > Reviewed-by: Shannon Nelson 
> > > > > > > >
> > > > > > >
> > > > > > > I'd like to see a full solution
> > > > > > > 1- block until interrupt
> > > > > >
> > > > > > Would it make sense to also have a timeout?
> > > > > > And when timeout expires, set FAILED bit in device status?
> > > > >
> > > > > virtio spec does not set any limits on the timing of vq
> > > > > processing.
> > > >
> > > > Indeed, but I thought the driver could decide it is too long for it.
> > > >
> > > > The issue is we keep waiting with rtnl locked, it can quickly make the
> > > > system unusable.
> > >
> > > if this is a problem we should find a way not to keep rtnl
> > > locked indefinitely.
> >
> > From the tests I have done, I think it is. With OVS, a reconfiguration is
> > performed when the VDUSE device is added, and when a MLX5 device is
> > in the same bridge, it ends up doing an ioctl() that tries to take the
> > rtnl lock. In this configuration, it is not possible to kill OVS because
> > it is stuck trying to acquire rtnl lock for mlx5 that is held by virtio-
> > net.
>
> So for sure, we can queue up the work and process it later.
> The somewhat tricky part is limiting the memory consumption.

And it needs to sync with rtnl somehow, e.g. device unregistering, which
seems not easy.

Thanks

>
>
> > >
> > > > > > > 2- still handle surprise removal correctly by waking in that case
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > > ---
> > > > > > > > >  drivers/net/virtio_net.c | 4 +++-
> > > > > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/virtio_net.c 
> > > > > > > > > b/drivers/net/virtio_net.c
> > > > > > > > > index 9f3b1d6ac33d..e7533f29b219 100644
> > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > @@ -2314,8 +2314,10 @@ static bool 
> > > > > > > > > virtnet_send_command(struct virtnet_info *vi, u8 class, u8 
> > > > > > > > > cmd,
> > > > > > > > >  * into the hypervisor, so the request should be 
> > > > > > > > > handled immediately.
> > > > > > > > >  */
> > > > > > > > > while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > > > > > > -  !virtqueue_is_broken(vi->cvq))
> > > > > > > > > +  !virtqueue_is_broken(vi->cvq)) {
> > > > > > > > > +   cond_resched();
> > > > > > > > > cpu_relax();
> > > > > > > > > +   }
> > > > > > > > >
> > > > > > > > > return vi->ctrl->status == VIRTIO_NET_OK;
> > > > > > > > >  }
> > > > > > > > > --
> > > > > > > > > 2.39.3
> > > > > > > > >
> > > > > > > > > ___
> > > > > > > > > Virtualization mailing list
> > > > > > > > > Virtualization@lists.linux-foundation.org
> > > > > > > > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> > > > > > >
> > > > >
> > >
>


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-24 Thread Jason Wang
On Sat, Jul 22, 2023 at 4:18 AM Maxime Coquelin
 wrote:
>
>
>
> On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
> >>
> >>
> >> On 7/21/23 16:45, Michael S. Tsirkin wrote:
> >>> On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote:
> 
> 
>  On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote:
> >> On 7/20/23 1:38 AM, Jason Wang wrote:
> >>>
> >>> Adding cond_resched() to the command waiting loop for a better
> >>> co-operation with the scheduler. This allows to give CPU a breath to
> >>> run other task(workqueue) instead of busy looping when preemption is
> >>> not allowed on a device whose CVQ might be slow.
> >>>
> >>> Signed-off-by: Jason Wang 
> >>
> >> This still leaves hung processes, but at least it doesn't pin the CPU 
> >> any
> >> more.  Thanks.
> >> Reviewed-by: Shannon Nelson 
> >>
> >
> > I'd like to see a full solution
> > 1- block until interrupt

I remember in previous versions, you worried about the extra MSI
vector. (Maybe I was wrong).

> 
>  Would it make sense to also have a timeout?
>  And when timeout expires, set FAILED bit in device status?
> >>>
> >>> virtio spec does not set any limits on the timing of vq
> >>> processing.
> >>
> >> Indeed, but I thought the driver could decide it is too long for it.
> >>
> >> The issue is we keep waiting with rtnl locked, it can quickly make the
> >> system unusable.
> >
> > if this is a problem we should find a way not to keep rtnl
> > locked indefinitely.

Any ideas on this direction? Simply dropping rtnl during the busy loop
will result in a lot of races. This seems to require non-trivial
changes in the networking core.

>
>  From the tests I have done, I think it is. With OVS, a reconfiguration
> is performed when the VDUSE device is added, and when a MLX5 device is
> in the same bridge, it ends up doing an ioctl() that tries to take the
> rtnl lock. In this configuration, it is not possible to kill OVS because
> it is stuck trying to acquire rtnl lock for mlx5 that is held by virtio-
> net.

Yeah, basically, any RTNL users would be blocked forever.

And the infinite loop has other side effects, like blocking the freezer from working.

To summarize, there are three issues

1) busy polling
2) breaks freezer
3) hold RTNL during the loop

Solving 3 may help somewhat with 2, e.g. some pm routine (e.g. wireguard) or
even virtnet_restore() itself may try to hold the lock.

>
> >
> > 2- still handle surprise removal correctly by waking in that case

This is basically what version 1 did?

https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368...@redhat.com/t/

Thanks

> >
> >
> >
> >>> ---
> >>>  drivers/net/virtio_net.c | 4 +++-
> >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>> index 9f3b1d6ac33d..e7533f29b219 100644
> >>> --- a/drivers/net/virtio_net.c
> >>> +++ b/drivers/net/virtio_net.c
> >>> @@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct 
> >>> virtnet_info *vi, u8 class, u8 cmd,
> >>>  * into the hypervisor, so the request should be handled 
> >>> immediately.
> >>>  */
> >>> while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> >>> -  !virtqueue_is_broken(vi->cvq))
> >>> +  !virtqueue_is_broken(vi->cvq)) {
> >>> +   cond_resched();
> >>> cpu_relax();
> >>> +   }
> >>>
> >>> return vi->ctrl->status == VIRTIO_NET_OK;
> >>>  }
> >>> --
> >>> 2.39.3
> >>>
> >>> ___
> >>> Virtualization mailing list
> >>> Virtualization@lists.linux-foundation.org
> >>> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> >
> >>>
> >
>


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-24 Thread Michael S. Tsirkin
On Fri, Jul 21, 2023 at 10:18:03PM +0200, Maxime Coquelin wrote:
> 
> 
> On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
> > > 
> > > 
> > > On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > > > On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote:
> > > > > 
> > > > > 
> > > > > On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > > > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote:
> > > > > > > On 7/20/23 1:38 AM, Jason Wang wrote:
> > > > > > > > 
> > > > > > > > Adding cond_resched() to the command waiting loop for a better
> > > > > > > > co-operation with the scheduler. This allows to give CPU a 
> > > > > > > > breath to
> > > > > > > > run other task(workqueue) instead of busy looping when 
> > > > > > > > preemption is
> > > > > > > > not allowed on a device whose CVQ might be slow.
> > > > > > > > 
> > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > 
> > > > > > > This still leaves hung processes, but at least it doesn't pin the 
> > > > > > > CPU any
> > > > > > > more.  Thanks.
> > > > > > > Reviewed-by: Shannon Nelson 
> > > > > > > 
> > > > > > 
> > > > > > I'd like to see a full solution
> > > > > > 1- block until interrupt
> > > > > 
> > > > > Would it make sense to also have a timeout?
> > > > > And when timeout expires, set FAILED bit in device status?
> > > > 
> > > > virtio spec does not set any limits on the timing of vq
> > > > processing.
> > > 
> > > Indeed, but I thought the driver could decide it is too long for it.
> > > 
> > > The issue is we keep waiting with rtnl locked, it can quickly make the
> > > system unusable.
> > 
> > if this is a problem we should find a way not to keep rtnl
> > locked indefinitely.
> 
> From the tests I have done, I think it is. With OVS, a reconfiguration is
> performed when the VDUSE device is added, and when a MLX5 device is
> in the same bridge, it ends up doing an ioctl() that tries to take the
> rtnl lock. In this configuration, it is not possible to kill OVS because
> it is stuck trying to acquire rtnl lock for mlx5 that is held by virtio-
> net.

So for sure, we can queue up the work and process it later.
The somewhat tricky part is limiting the memory consumption.


> > 
> > > > > > 2- still handle surprise removal correctly by waking in that case
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 4 +++-
> > > > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index 9f3b1d6ac33d..e7533f29b219 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct 
> > > > > > > > virtnet_info *vi, u8 class, u8 cmd,
> > > > > > > >  * into the hypervisor, so the request should be 
> > > > > > > > handled immediately.
> > > > > > > >  */
> > > > > > > > while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > > > > > -  !virtqueue_is_broken(vi->cvq))
> > > > > > > > +  !virtqueue_is_broken(vi->cvq)) {
> > > > > > > > +   cond_resched();
> > > > > > > > cpu_relax();
> > > > > > > > +   }
> > > > > > > > 
> > > > > > > > return vi->ctrl->status == VIRTIO_NET_OK;
> > > > > > > >  }
> > > > > > > > --
> > > > > > > > 2.39.3
> > > > > > > > 
> > > > > > > > ___
> > > > > > > > Virtualization mailing list
> > > > > > > > Virtualization@lists.linux-foundation.org
> > > > > > > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> > > > > > 
> > > > 
> > 
