Re: [PATCH net-next v4 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations

2023-07-27 Thread Michael S. Tsirkin
On Fri, Jul 28, 2023 at 01:26:23AM +0300, Arseniy Krasnov wrote:
> Hello,
> 
> this patchset is the first of three parts of another big patchset for
> MSG_ZEROCOPY flag support:
> https://lore.kernel.org/netdev/20230701063947.3422088-1-avkras...@sberdevices.ru/

overall looks good. Two points I'd like to see addressed:
- what's the performance with all these changes - still same?
- most systems have a copybreak scheme where buffers
  smaller than a given size are copied directly.
  This will address the regression you see with small buffers -
  but we need to find that value. We know it's between 4k and 32k :)
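
For illustration, a copybreak gate in the zerocopy send path could be as
simple as the sketch below; the threshold constant and function name are
placeholders, not part of the series, and the right value has to be measured.

	/* Hypothetical copybreak: below this size, copying the payload is
	 * cheaper than pinning user pages, so fall back to the copying path.
	 * Per the discussion, the break-even point lies somewhere between
	 * 4k and 32k and must be found experimentally.
	 */
	#define VSOCK_ZEROCOPY_COPYBREAK	(16 * 1024)	/* placeholder */

	static bool vsock_use_zerocopy(size_t len, bool zerocopy_enabled)
	{
		return zerocopy_enabled && len >= VSOCK_ZEROCOPY_COPYBREAK;
	}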


> During review of this series, Stefano Garzarella 
> suggested splitting it into three parts to simplify review and merging:
> 
> 1) virtio and vhost updates (for fragged skbs) <--- this patchset
> 2) AF_VSOCK updates (allows enabling MSG_ZEROCOPY mode and reading
>    tx completions) and updates for Documentation/.
> 3) Updates for tests and utils.
> 
> This series enables handling of fragged skbs in the virtio and vhost parts.
> The new logic won't be triggered, because the SO_ZEROCOPY option is still
> impossible to enable at this moment (the next bunch of patches from the big
> set above will enable it).
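
For illustration, reading out of a fragged (non-linear) skb typically goes
through an iterator-aware helper; a minimal sketch follows (whether the
series uses exactly this helper is an assumption):

	/* Copy 'bytes' from a possibly non-linear skb into the caller's
	 * iov_iter; skb_copy_datagram_iter() walks the linear area and the
	 * page frags alike, so the caller need not care about skb layout.
	 */
	int err = skb_copy_datagram_iter(skb, 0, &msg->msg_iter, bytes);
	if (err)
		return err;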
> 
> I've included a changelog in some patches anyway, because there were some
> comments during the review of the last big patchset from the link above.
> 
> Head for this patchset is 9d0cd5d25f7d45bce01bbb3193b54ac24b3a60f3
> 
> Link to v1:
> https://lore.kernel.org/netdev/20230717210051.856388-1-avkras...@sberdevices.ru/
> Link to v2:
> https://lore.kernel.org/netdev/20230718180237.3248179-1-avkras...@sberdevices.ru/
> Link to v3:
> https://lore.kernel.org/netdev/20230720214245.457298-1-avkras...@sberdevices.ru/
> 
> Changelog:
>  * Patchset rebased and tested on new HEAD of net-next (see hash above).
>  * See per-patch changelog after ---.
> 
> Arseniy Krasnov (4):
>   vsock/virtio/vhost: read data from non-linear skb
>   vsock/virtio: support to send non-linear skb
>   vsock/virtio: non-linear skb handling for tap
>   vsock/virtio: MSG_ZEROCOPY flag support
> 
>  drivers/vhost/vsock.c   |  14 +-
>  include/linux/virtio_vsock.h|   6 +
>  net/vmw_vsock/virtio_transport.c|  79 +-
>  net/vmw_vsock/virtio_transport_common.c | 312 ++--
>  4 files changed, 330 insertions(+), 81 deletions(-)
> 
> -- 
> 2.25.1



Re: [PATCH net-next V4 2/3] virtio_net: support per queue interrupt coalesce command

2023-07-27 Thread Michael S. Tsirkin
On Thu, Jul 27, 2023 at 03:28:32PM +0200, Paolo Abeni wrote:
> On Tue, 2023-07-25 at 16:07 +0300, Gavin Li wrote:
> > Add interrupt_coalesce config in send_queue and receive_queue to cache user
> > config.
> > 
> > Send per virtqueue interrupt moderation config to underlying device in
> > order to have more efficient interrupt moderation and cpu utilization of
> > guest VM.
> > 
> > Additionally, address all the VQs when updating the global configuration,
> > as now the individual VQs configuration can diverge from the global
> > configuration.
> > 
> > Signed-off-by: Gavin Li 
> > Reviewed-by: Dragos Tatulea 
> > Reviewed-by: Jiri Pirko 
> > Acked-by: Michael S. Tsirkin 
> 
> FTR, this patch is significantly different from the version previously
> acked/reviewed, I'm unsure if all the reviewers are ok with the new
> one.
> 
> [...]

still ok by me

Acked-by: Michael S. Tsirkin 

let's wait for Jason too.

> >  static int virtnet_set_coalesce(struct net_device *dev,
> > struct ethtool_coalesce *ec,
> > struct kernel_ethtool_coalesce *kernel_coal,
> > struct netlink_ext_ack *extack)
> >  {
> > struct virtnet_info *vi = netdev_priv(dev);
> > -   int ret, i, napi_weight;
> > +   int ret, queue_number, napi_weight;
> > bool update_napi = false;
> >  
> > /* Can't change NAPI weight if the link is up */
> > napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
> > -   if (napi_weight ^ vi->sq[0].napi.weight) {
> > -   if (dev->flags & IFF_UP)
> > -   return -EBUSY;
> > -   else
> > -   update_napi = true;
> > +   for (queue_number = 0; queue_number < vi->max_queue_pairs; queue_number++) {
> > +   ret = virtnet_should_update_vq_weight(dev->flags, napi_weight,
> > + vi->sq[queue_number].napi.weight,
> > + &update_napi);
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (update_napi) {
> > +   /* All queues that belong to [queue_number, queue_count] will be
> > +* updated for the sake of simplicity, which might not be necessary
> 
> It looks like the comment above still refers to the old code. Should
> be:
>   [queue_number, vi->max_queue_pairs]
>   
> Otherwise LGTM, thanks!
> 
> Paolo
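
For readers following the diff, the helper's behavior can be reconstructed
from the inline code it replaces; a sketch (not the exact patch text):

	/* Refuse a NAPI weight change on a queue while the link is up;
	 * otherwise note that the NAPI weight needs updating.
	 */
	static int virtnet_should_update_vq_weight(int dev_flags, int weight,
						   int vq_weight, bool *may_update)
	{
		if (weight ^ vq_weight) {
			if (dev_flags & IFF_UP)
				return -EBUSY;
			*may_update = true;
		}
		return 0;
	}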



Re: [PATCH net-next V4 2/3] virtio_net: support per queue interrupt coalesce command

2023-07-27 Thread Jason Wang
On Thu, Jul 27, 2023 at 9:28 PM Paolo Abeni  wrote:
>
> On Tue, 2023-07-25 at 16:07 +0300, Gavin Li wrote:
> > Add interrupt_coalesce config in send_queue and receive_queue to cache user
> > config.
> >
> > Send per virtqueue interrupt moderation config to underlying device in
> > order to have more efficient interrupt moderation and cpu utilization of
> > guest VM.
> >
> > Additionally, address all the VQs when updating the global configuration,
> > as now the individual VQs configuration can diverge from the global
> > configuration.
> >
> > Signed-off-by: Gavin Li 
> > Reviewed-by: Dragos Tatulea 
> > Reviewed-by: Jiri Pirko 
> > Acked-by: Michael S. Tsirkin 
>
> FTR, this patch is significantly different from the version previously
> acked/reviewed, I'm unsure if all the reviewers are ok with the new
> one.

Good point, and I plan to review this no later than next Monday and
offer my ack if necessary. Please hold this series for now.

Thanks

>
> [...]
>
> >  static int virtnet_set_coalesce(struct net_device *dev,
> >   struct ethtool_coalesce *ec,
> >   struct kernel_ethtool_coalesce *kernel_coal,
> >   struct netlink_ext_ack *extack)
> >  {
> >   struct virtnet_info *vi = netdev_priv(dev);
> > - int ret, i, napi_weight;
> > + int ret, queue_number, napi_weight;
> >   bool update_napi = false;
> >
> >   /* Can't change NAPI weight if the link is up */
> >   napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
> > - if (napi_weight ^ vi->sq[0].napi.weight) {
> > - if (dev->flags & IFF_UP)
> > - return -EBUSY;
> > - else
> > - update_napi = true;
> > + for (queue_number = 0; queue_number < vi->max_queue_pairs; queue_number++) {
> > + ret = virtnet_should_update_vq_weight(dev->flags, napi_weight,
> > +   vi->sq[queue_number].napi.weight,
> > +   &update_napi);
> > + if (ret)
> > + return ret;
> > +
> > + if (update_napi) {
> > + /* All queues that belong to [queue_number, queue_count] will be
> > +  * updated for the sake of simplicity, which might not be necessary
>
> It looks like the comment above still refers to the old code. Should
> be:
> [queue_number, vi->max_queue_pairs]
>
> Otherwise LGTM, thanks!
>
> Paolo
>


Re: [PATCH] vdpa/mlx5: Correct default number of queues when MQ is on

2023-07-27 Thread Jason Wang
On Fri, Jul 28, 2023 at 1:25 AM Dragos Tatulea  wrote:
>
> The standard specifies that the initial number of queues is the
> default, which is 1 (1 tx, 1 rx).
>
> Signed-off-by: Dragos Tatulea 
> Reviewed-by: Eugenio Pérez 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 9138ef2fb2c8..6b6eb69a8a90 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2517,7 +2517,15 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
> else
> ndev->rqt_size = 1;
>
> -   ndev->cur_num_vqs = 2 * ndev->rqt_size;
> +   /* Device must start with 1 queue pair, as per VIRTIO v1.2 spec, section
> +* 5.1.6.5.5 "Device operation in multiqueue mode":
> +*
> +* Multiqueue is disabled by default.
> +* The driver enables multiqueue by sending a command using class
> +* VIRTIO_NET_CTRL_MQ. The command selects the mode of multiqueue
> +* operation, as follows: ...
> +*/
> +   ndev->cur_num_vqs = 2;
>
> update_cvq_info(mvdev);
> return err;
> --
> 2.41.0
>


Re: [PATCH v3 28/49] dm zoned: dynamically allocate the dm-zoned-meta shrinker

2023-07-27 Thread Dave Chinner via Virtualization
On Thu, Jul 27, 2023 at 07:20:46PM +0900, Damien Le Moal wrote:
> On 7/27/23 17:55, Qi Zheng wrote:
> >>>   goto err;
> >>>   }
> >>>   +    zmd->mblk_shrinker->count_objects = dmz_mblock_shrinker_count;
> >>> +    zmd->mblk_shrinker->scan_objects = dmz_mblock_shrinker_scan;
> >>> +    zmd->mblk_shrinker->seeks = DEFAULT_SEEKS;
> >>> +    zmd->mblk_shrinker->private_data = zmd;
> >>> +
> >>> +    shrinker_register(zmd->mblk_shrinker);
> >>
> >> I fail to see how this new shrinker API is better... Why isn't there a
> >> shrinker_alloc_and_register() function? That would avoid adding all this code
> >> all over the place as the new API call would be very similar to the current
> >> shrinker_register() call with static allocation.
> > 
> > In some registration scenarios, memory needs to be allocated in advance.
> > So we continue to use the previous prealloc/register_prepared()
> > algorithm. The shrinker_alloc_and_register() is just a helper function
> > that combines the two, and this increases the number of APIs that
> > shrinker exposes to the outside, so I choose not to add this helper.
> 
> And that results in more code in many places instead of less code + a simple
> inline helper in the shrinker header file...

It's not just a "simple helper" - it's a function that has to take 6
or 7 parameters with a return value that must be checked and
handled.

This was done in the first versions of the patch set - the amount of
code in each caller does not go down and, IMO, was much harder to
read and determine "this is obviously correct" than what we have
now.

> So not adding that super simple
> helper is not exactly the best choice in my opinion.

Each to their own - I much prefer the existing style/API over having
to go look up a helper function every time I want to check some
random shrinker has been set up correctly.

-Dave.
-- 
Dave Chinner
da...@fromorbit.com
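
For context, the combined helper being debated would have a shape roughly
like the sketch below under the new API (hypothetical, not part of the
series; note the long parameter list Dave refers to):

	/* Hypothetical shrinker_alloc_and_register(): it needs many
	 * parameters and returns an allocation every caller must still
	 * check, which is the objection raised above.
	 */
	static inline struct shrinker *
	shrinker_alloc_and_register(unsigned int flags, void *private_data,
				    unsigned long (*count)(struct shrinker *,
							   struct shrink_control *),
				    unsigned long (*scan)(struct shrinker *,
							  struct shrink_control *),
				    int seeks, const char *name)
	{
		struct shrinker *s = shrinker_alloc(flags, "%s", name);

		if (!s)
			return NULL;
		s->count_objects = count;
		s->scan_objects = scan;
		s->seeks = seeks;
		s->private_data = private_data;
		shrinker_register(s);
		return s;
	}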


[PATCH 4/4] vdpa: Enable strict validation for netlink ops

2023-07-27 Thread Dragos Tatulea via Virtualization
The previous patches added the missing nla policies that were required for
validation to work.

Now strict validation on netlink ops can be enabled. This patch does it.

Signed-off-by: Dragos Tatulea 
Cc: sta...@vger.kernel.org
---
 drivers/vdpa/vdpa.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index f2f654fd84e5..a7612e0783b3 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -1257,37 +1257,31 @@ static const struct nla_policy vdpa_nl_policy[VDPA_ATTR_MAX + 1] = {
 static const struct genl_ops vdpa_nl_ops[] = {
{
.cmd = VDPA_CMD_MGMTDEV_GET,
-   .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = vdpa_nl_cmd_mgmtdev_get_doit,
.dumpit = vdpa_nl_cmd_mgmtdev_get_dumpit,
},
{
.cmd = VDPA_CMD_DEV_NEW,
-   .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = vdpa_nl_cmd_dev_add_set_doit,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = VDPA_CMD_DEV_DEL,
-   .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = vdpa_nl_cmd_dev_del_set_doit,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = VDPA_CMD_DEV_GET,
-   .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = vdpa_nl_cmd_dev_get_doit,
.dumpit = vdpa_nl_cmd_dev_get_dumpit,
},
{
.cmd = VDPA_CMD_DEV_CONFIG_GET,
-   .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = vdpa_nl_cmd_dev_config_get_doit,
.dumpit = vdpa_nl_cmd_dev_config_get_dumpit,
},
{
.cmd = VDPA_CMD_DEV_VSTATS_GET,
-   .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = vdpa_nl_cmd_dev_stats_get_doit,
.flags = GENL_ADMIN_PERM,
},
-- 
2.41.0



[PATCH 3/4] vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length check

2023-07-27 Thread Dragos Tatulea via Virtualization
From: Lin Ma 

The vdpa_nl_policy structure is used to validate the nlattrs when parsing
the incoming nlmsg. It ensures that each described attribute produces a
valid nlattr pointer in info->attrs before entering each handler in
vdpa_nl_ops.

That is to say, a missing entry in vdpa_nl_policy may leave an illegal
nlattr after parsing, which could lead to an OOB read just like CVE-2023-3773.

This patch adds the missing nla_policy for vdpa max vqp attr to avoid
such bugs.
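
To illustrate the failure mode (a minimal sketch, not the vdpa code): once
the policy declares the attribute's type, its length is validated during
parsing, and only then is a read like the one below safe:

	u16 max_vqp;

	if (!info->attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP])
		return -EINVAL;
	/* With no nla_policy entry, parsing accepts any payload length, so
	 * reading a fixed-size value from a short attribute runs past the
	 * end of the message (the OOB read referenced above). With the
	 * { .type = NLA_U16 } entry added, nla_get_u16() is safe here.
	 */
	max_vqp = nla_get_u16(info->attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]);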

Fixes: ad69dd0bf26b ("vdpa: Introduce query of device config layout")
Signed-off-by: Lin Ma 
Cc: sta...@vger.kernel.org
---
 drivers/vdpa/vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 75f1df2b9d2a..f2f654fd84e5 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -1247,6 +1247,7 @@ static const struct nla_policy vdpa_nl_policy[VDPA_ATTR_MAX + 1] = {
[VDPA_ATTR_MGMTDEV_DEV_NAME] = { .type = NLA_STRING },
[VDPA_ATTR_DEV_NAME] = { .type = NLA_STRING },
[VDPA_ATTR_DEV_NET_CFG_MACADDR] = NLA_POLICY_ETH_ADDR,
+   [VDPA_ATTR_DEV_NET_CFG_MAX_VQP] = { .type = NLA_U16 },
/* virtio spec 1.1 section 5.1.4.1 for valid MTU range */
[VDPA_ATTR_DEV_NET_CFG_MTU] = NLA_POLICY_MIN(NLA_U16, 68),
[VDPA_ATTR_DEV_QUEUE_INDEX] = { .type = NLA_U32 },
-- 
2.41.0



[PATCH 2/4] vdpa: Add queue index attr to vdpa_nl_policy for nlattr length check

2023-07-27 Thread Dragos Tatulea via Virtualization
From: Lin Ma 

The vdpa_nl_policy structure is used to validate the nlattrs when parsing
the incoming nlmsg. It ensures that each described attribute produces a
valid nlattr pointer in info->attrs before entering each handler in
vdpa_nl_ops.

That is to say, a missing entry in vdpa_nl_policy may leave an illegal
nlattr after parsing, which could lead to an OOB read just like CVE-2023-3773.

This patch adds the missing nla_policy for vdpa queue index attr to avoid
such bugs.

Fixes: 13b00b135665 ("vdpa: Add support for querying vendor statistics")
Signed-off-by: Lin Ma 
Cc: stable@vger.kernel.org
---
 drivers/vdpa/vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 3ad355a2208a..75f1df2b9d2a 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -1249,6 +1249,7 @@ static const struct nla_policy vdpa_nl_policy[VDPA_ATTR_MAX + 1] = {
[VDPA_ATTR_DEV_NET_CFG_MACADDR] = NLA_POLICY_ETH_ADDR,
/* virtio spec 1.1 section 5.1.4.1 for valid MTU range */
[VDPA_ATTR_DEV_NET_CFG_MTU] = NLA_POLICY_MIN(NLA_U16, 68),
+   [VDPA_ATTR_DEV_QUEUE_INDEX] = { .type = NLA_U32 },
[VDPA_ATTR_DEV_FEATURES] = { .type = NLA_U64 },
 };
 
-- 
2.41.0



[PATCH 1/4] vdpa: Add features attr to vdpa_nl_policy for nlattr length check

2023-07-27 Thread Dragos Tatulea via Virtualization
From: Lin Ma 

The vdpa_nl_policy structure is used to validate the nlattrs when parsing
the incoming nlmsg. It ensures that each described attribute produces a
valid nlattr pointer in info->attrs before entering each handler in
vdpa_nl_ops.

That is to say, a missing entry in vdpa_nl_policy may leave an illegal
nlattr after parsing, which could lead to an OOB read just like CVE-2023-3773.

This patch adds the missing nla_policy for vdpa features attr to avoid
such bugs.

Fixes: 90fea5a800c3 ("vdpa: device feature provisioning")
Signed-off-by: Lin Ma 
Cc: sta...@vger.kernel.org
---
 drivers/vdpa/vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 965e32529eb8..3ad355a2208a 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -1249,6 +1249,7 @@ static const struct nla_policy vdpa_nl_policy[VDPA_ATTR_MAX + 1] = {
[VDPA_ATTR_DEV_NET_CFG_MACADDR] = NLA_POLICY_ETH_ADDR,
/* virtio spec 1.1 section 5.1.4.1 for valid MTU range */
[VDPA_ATTR_DEV_NET_CFG_MTU] = NLA_POLICY_MIN(NLA_U16, 68),
+   [VDPA_ATTR_DEV_FEATURES] = { .type = NLA_U64 },
 };
 
 static const struct genl_ops vdpa_nl_ops[] = {
-- 
2.41.0



[PATCH v3 0/4] vdpa: Enable strict validation for netlink ops

2023-07-27 Thread Dragos Tatulea via Virtualization
The original patch from Lin Ma enables strict validation for the vdpa
driver's netlink ops. The patch got split into 3 parts for easier backporting.

The last patch simply disables the validation skip, which is no longer
necessary. The patchset started off from this discussion [0].

[0] 
https://lore.kernel.org/virtualization/20230726074710-mutt-send-email-...@kernel.org/T/#t

v3:
  - Split initial patch for easier backporting.
  - Correctly marked patches for stable inclusion.

v2:
  - cc'ed stable

Dragos Tatulea (1):
  vdpa: Enable strict validation for netlink ops

Lin Ma (3):
  vdpa: Add features attr to vdpa_nl_policy for nlattr length check
  vdpa: Add queue index attr to vdpa_nl_policy for nlattr length check
  vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length check

 drivers/vdpa/vdpa.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

-- 
2.41.0



[PATCH] vdpa/mlx5: Correct default number of queues when MQ is on

2023-07-27 Thread Dragos Tatulea via Virtualization
The standard specifies that the initial number of queues is the
default, which is 1 (1 tx, 1 rx).

Signed-off-by: Dragos Tatulea 
Reviewed-by: Eugenio Pérez 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 9138ef2fb2c8..6b6eb69a8a90 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2517,7 +2517,15 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
else
ndev->rqt_size = 1;
 
-   ndev->cur_num_vqs = 2 * ndev->rqt_size;
+   /* Device must start with 1 queue pair, as per VIRTIO v1.2 spec, section
+* 5.1.6.5.5 "Device operation in multiqueue mode":
+*
+* Multiqueue is disabled by default.
+* The driver enables multiqueue by sending a command using class
+* VIRTIO_NET_CTRL_MQ. The command selects the mode of multiqueue
+* operation, as follows: ...
+*/
+   ndev->cur_num_vqs = 2;
 
update_cvq_info(mvdev);
return err;
-- 
2.41.0
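
For context, the driver-side command referenced by the spec quote looks
roughly like this (a sketch modeled on virtio-net's virtnet_set_queues();
details abbreviated):

	/* The guest enables multiqueue by sending a control command of class
	 * VIRTIO_NET_CTRL_MQ; until it does, the device stays at 1 pair.
	 */
	struct virtio_net_ctrl_mq mq;
	struct scatterlist sg;

	mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
	sg_init_one(&sg, &mq, sizeof(mq));
	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
				  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg))
		return -EINVAL;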


Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

2023-07-27 Thread Michael S. Tsirkin
On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote:
> On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > The ndev was accessed on shutdown without a check if it actually exists.
> > > This triggered the crash pasted below. This patch simply adds a check
> > > before using ndev.
> > > 
> > >  BUG: kernel NULL pointer dereference, address: 0300
> > >  #PF: supervisor read access in kernel mode
> > >  #PF: error_code(0x) - not-present page
> > >  PGD 0 P4D 0
> > >  Oops:  [#1] SMP
> > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-
> > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > >  RSP: 0018:8881003bfdc0 EFLAGS: 00010286
> > >  RAX: 888103befba0 RBX: 888109d28008 RCX: 0017
> > >  RDX: 0001 RSI: 0212 RDI: 888109d28000
> > >  RBP:  R08: 000d3a3a3882 R09: 0001
> > >  R10:  R11:  R12: 888109d28000
> > >  R13: 888109d28080 R14: fee1dead R15: 
> > >  FS:  7f4969e0be40() GS:88852c80()
> > > knlGS:
> > >  CS:  0010 DS:  ES:  CR0: 80050033
> > >  CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0
> > >  DR0:  DR1:  DR2: 
> > >  DR3:  DR6: fffe0ff0 DR7: 0400
> > >  Call Trace:
> > >   
> > >   ? __die+0x20/0x60
> > >   ? page_fault_oops+0x14c/0x3c0
> > >   ? exc_page_fault+0x75/0x140
> > >   ? asm_exc_page_fault+0x22/0x30
> > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > >   device_shutdown+0x13e/0x1e0
> > >   kernel_restart+0x36/0x90
> > >   __do_sys_reboot+0x141/0x210
> > >   ? vfs_writev+0xcd/0x140
> > >   ? handle_mm_fault+0x161/0x260
> > >   ? do_writev+0x6b/0x110
> > >   do_syscall_64+0x3d/0x90
> > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > >  RIP: 0033:0x7f496990fb56
> > >  RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: 00a9
> > >  RAX: ffda RBX:  RCX: 7f496990fb56
> > >  RDX: 01234567 RSI: 28121969 RDI: fee1dead
> > >  RBP: 7fffc7bde1d0 R08:  R09: 
> > >  R10:  R11: 0206 R12: 
> > >  R13: 7fffc7bddf10 R14:  R15: 7fffc7bde2b8
> > >   
> > >  CR2: 0300
> > >  ---[ end trace  ]---
> > > 
> > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > Signed-off-by: Dragos Tatulea 
> > > ---
> > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device *auxdev)
> > > mgtdev = auxiliary_get_drvdata(auxdev);
> > > ndev = mgtdev->ndev;
> > >  
> > > -   free_irqs(ndev);
> > > +   if (ndev)
> > > +   free_irqs(ndev);
> > >  }
> > >  
> > 
> > something I don't get:
> > irqs are allocated in mlx5_vdpa_dev_add
> > why are they not freed in mlx5_vdpa_dev_del?
> > 
> That is a good point. I will try to find out. I also don't get why free_irq is
> called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can 
> change
> that in a different refactoring.

as it is I have no idea whether e.g. ndev can change
between these two call sites. that would make the check
pointless.

> > this is what's creating all this mess.
> > 
> > 
> Not quite: mlx5_vdpa_dev_del (which is the .dev_del op of struct
> vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
> see. Or am I missing something?

and why do we care whether irqs are freed on shutdown?

> > >  static const struct auxiliary_device_id mlx5v_id_table[] = {
> > > -- 
> > > 2.41.0
> > 
> 



Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

2023-07-27 Thread Dragos Tatulea via Virtualization
On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > The ndev was accessed on shutdown without a check if it actually exists.
> > This triggered the crash pasted below. This patch simply adds a check
> > before using ndev.
> > 
> >  BUG: kernel NULL pointer dereference, address: 0300
> >  #PF: supervisor read access in kernel mode
> >  #PF: error_code(0x) - not-present page
> >  PGD 0 P4D 0
> >  Oops:  [#1] SMP
> >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-
> > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> >  RSP: 0018:8881003bfdc0 EFLAGS: 00010286
> >  RAX: 888103befba0 RBX: 888109d28008 RCX: 0017
> >  RDX: 0001 RSI: 0212 RDI: 888109d28000
> >  RBP:  R08: 000d3a3a3882 R09: 0001
> >  R10:  R11:  R12: 888109d28000
> >  R13: 888109d28080 R14: fee1dead R15: 
> >  FS:  7f4969e0be40() GS:88852c80()
> > knlGS:
> >  CS:  0010 DS:  ES:  CR0: 80050033
> >  CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0
> >  DR0:  DR1:  DR2: 
> >  DR3:  DR6: fffe0ff0 DR7: 0400
> >  Call Trace:
> >   
> >   ? __die+0x20/0x60
> >   ? page_fault_oops+0x14c/0x3c0
> >   ? exc_page_fault+0x75/0x140
> >   ? asm_exc_page_fault+0x22/0x30
> >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> >   device_shutdown+0x13e/0x1e0
> >   kernel_restart+0x36/0x90
> >   __do_sys_reboot+0x141/0x210
> >   ? vfs_writev+0xcd/0x140
> >   ? handle_mm_fault+0x161/0x260
> >   ? do_writev+0x6b/0x110
> >   do_syscall_64+0x3d/0x90
> >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> >  RIP: 0033:0x7f496990fb56
> >  RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: 00a9
> >  RAX: ffda RBX:  RCX: 7f496990fb56
> >  RDX: 01234567 RSI: 28121969 RDI: fee1dead
> >  RBP: 7fffc7bde1d0 R08:  R09: 
> >  R10:  R11: 0206 R12: 
> >  R13: 7fffc7bddf10 R14:  R15: 7fffc7bde2b8
> >   
> >  CR2: 0300
> >  ---[ end trace  ]---
> > 
> > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > Signed-off-by: Dragos Tatulea 
> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 9138ef2fb2c8..e2e7ebd71798 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device *auxdev)
> > mgtdev = auxiliary_get_drvdata(auxdev);
> > ndev = mgtdev->ndev;
> >  
> > -   free_irqs(ndev);
> > +   if (ndev)
> > +   free_irqs(ndev);
> >  }
> >  
> 
> something I don't get:
> irqs are allocated in mlx5_vdpa_dev_add
> why are they not freed in mlx5_vdpa_dev_del?
> 
That is a good point. I will try to find out. I also don't get why free_irq is
called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can change
that in a different refactoring.

> this is what's creating all this mess.
> 
> 
Not quite: mlx5_vdpa_dev_del (which is the .dev_del op of struct
vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
see. Or am I missing something?

> >  static const struct auxiliary_device_id mlx5v_id_table[] = {
> > -- 
> > 2.41.0
> 


Re: [PATCH net-next V4 2/3] virtio_net: support per queue interrupt coalesce command

2023-07-27 Thread Paolo Abeni
On Tue, 2023-07-25 at 16:07 +0300, Gavin Li wrote:
> Add interrupt_coalesce config in send_queue and receive_queue to cache user
> config.
> 
> Send per virtqueue interrupt moderation config to underlying device in
> order to have more efficient interrupt moderation and cpu utilization of
> guest VM.
> 
> Additionally, address all the VQs when updating the global configuration,
> as now the individual VQs configuration can diverge from the global
> configuration.
> 
> Signed-off-by: Gavin Li 
> Reviewed-by: Dragos Tatulea 
> Reviewed-by: Jiri Pirko 
> Acked-by: Michael S. Tsirkin 

FTR, this patch is significantly different from the version previously
acked/reviewed, I'm unsure if all the reviewers are ok with the new
one.

[...]

>  static int virtnet_set_coalesce(struct net_device *dev,
>   struct ethtool_coalesce *ec,
>   struct kernel_ethtool_coalesce *kernel_coal,
>   struct netlink_ext_ack *extack)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> - int ret, i, napi_weight;
> + int ret, queue_number, napi_weight;
>   bool update_napi = false;
>  
>   /* Can't change NAPI weight if the link is up */
>   napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
> - if (napi_weight ^ vi->sq[0].napi.weight) {
> - if (dev->flags & IFF_UP)
> - return -EBUSY;
> - else
> - update_napi = true;
> + for (queue_number = 0; queue_number < vi->max_queue_pairs; queue_number++) {
> + ret = virtnet_should_update_vq_weight(dev->flags, napi_weight,
> +   vi->sq[queue_number].napi.weight,
> +   &update_napi);
> + if (ret)
> + return ret;
> +
> + if (update_napi) {
> + /* All queues that belong to [queue_number, queue_count] will be
> +  * updated for the sake of simplicity, which might not be necessary

It looks like the comment above still refers to the old code. Should
be:
[queue_number, vi->max_queue_pairs]

Otherwise LGTM, thanks!

Paolo



Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-27 Thread Michael S. Tsirkin
On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > They really shouldn't - any NIC that takes forever to
> > program will create issues in the networking stack.
> 
> Unfortunately, it's not rare as the device/cvq could be implemented
> via firmware or software.

Currently that means one either has sane firmware with a scheduler that
can meet deadlines, or loses the ability to report errors back.

> > But if they do they can always set this flag too.
> 
> This may have false negatives and may confuse the management.
> 
> Maybe we can extend the networking core to allow some device specific
> configurations to be done with device specific lock without rtnl. For
> example, split the set_channels to
> 
> pre_set_channels
> set_channels
> post_set_channels
> 
> The device specific part could be done in pre and post without a rtnl lock?
> 
> Thanks


Would the benefit be that errors can be reported to userspace then?
Then maybe.  I think you will have to show how this works for at least
one card besides virtio.


-- 
MST
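
Schematically, the queue-up-the-work idea from this thread reads like the
sketch below (all names hypothetical; the lost error reporting is exactly
the trade-off under discussion):

	/* Queue the control-vq command instead of busy-waiting with RTNL
	 * held. The caller returns before the device has acked, so failures
	 * can no longer be reported through the return value.
	 */
	static int virtnet_cvq_cmd_async(struct virtnet_info *vi,
					 struct virtnet_cvq_cmd *cmd)
	{
		spin_lock(&vi->cvq_lock);		/* assumed lock */
		list_add_tail(&cmd->list, &vi->cvq_queue);
		spin_unlock(&vi->cvq_lock);
		schedule_work(&vi->cvq_work);	/* worker waits on the device */
		return 0;			/* optimistic: errors are lost */
	}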



Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-27 Thread Jason Wang
On Thu, Jul 27, 2023 at 2:10 PM Michael S. Tsirkin  wrote:
>
> On Thu, Jul 27, 2023 at 02:03:59PM +0800, Jason Wang wrote:
> > On Wed, Jul 26, 2023 at 7:38 PM Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Jul 26, 2023 at 09:55:37AM +0800, Jason Wang wrote:
> > > > On Tue, Jul 25, 2023 at 3:36 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Tue, Jul 25, 2023 at 11:07:40AM +0800, Jason Wang wrote:
> > > > > > On Mon, Jul 24, 2023 at 3:17 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, Jul 24, 2023 at 02:52:05PM +0800, Jason Wang wrote:
> > > > > > > > On Sat, Jul 22, 2023 at 4:18 AM Maxime Coquelin
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > > > > > > > > > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin 
> > > > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > > > > > > > > >>> On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin 
> > > > > > > > > >>> wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >  On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > > > > > > > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon 
> > > > > > > > > > Nelson wrote:
> > > > > > > > > >> On 7/20/23 1:38 AM, Jason Wang wrote:
> > > > > > > > > >>>
> > > > > > > > > >>> Adding cond_resched() to the command waiting loop for 
> > > > > > > > > >>> a better
> > > > > > > > > >>> co-operation with the scheduler. This allows to give 
> > > > > > > > > >>> CPU a breath to
> > > > > > > > > >>> run other task(workqueue) instead of busy looping 
> > > > > > > > > >>> when preemption is
> > > > > > > > > >>> not allowed on a device whose CVQ might be slow.
> > > > > > > > > >>>
> > > > > > > > > >>> Signed-off-by: Jason Wang 
> > > > > > > > > >>
> > > > > > > > > >> This still leaves hung processes, but at least it 
> > > > > > > > > >> doesn't pin the CPU any
> > > > > > > > > >> more.  Thanks.
> > > > > > > > > >> Reviewed-by: Shannon Nelson 
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > > I'd like to see a full solution
> > > > > > > > > > 1- block until interrupt
> > > > > > > >
> > > > > > > > I remember in previous versions, you worried about the extra MSI
> > > > > > > > vector. (Maybe I was wrong).
> > > > > > > >
> > > > > > > > > 
> > > > > > > > >  Would it make sense to also have a timeout?
> > > > > > > > >  And when timeout expires, set FAILED bit in device 
> > > > > > > > >  status?
> > > > > > > > > >>>
> > > > > > > > > >>> virtio spec does not set any limits on the timing of vq
> > > > > > > > > >>> processing.
> > > > > > > > > >>
> > > > > > > > > >> Indeed, but I thought the driver could decide it is too 
> > > > > > > > > >> long for it.
> > > > > > > > > >>
> > > > > > > > > >> The issue is we keep waiting with rtnl locked, it can 
> > > > > > > > > >> quickly make the
> > > > > > > > > >> system unusable.
> > > > > > > > > >
> > > > > > > > > > if this is a problem we should find a way not to keep rtnl
> > > > > > > > > > locked indefinitely.
> > > > > > > >
> > > > > > > > Any ideas on this direction? Simply dropping rtnl during the 
> > > > > > > > busy loop
> > > > > > > > will result in a lot of races. This seems to require non-trivial
> > > > > > > > changes in the networking core.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >  From the tests I have done, I think it is. With OVS, a 
> > > > > > > > > reconfiguration
> > > > > > > > > is performed when the VDUSE device is added, and when a MLX5 
> > > > > > > > > device is
> > > > > > > > > in the same bridge, it ends up doing an ioctl() that tries to 
> > > > > > > > > take the
> > > > > > > > > rtnl lock. In this configuration, it is not possible to kill 
> > > > > > > > > OVS because
> > > > > > > > > it is stuck trying to acquire rtnl lock for mlx5 that is held 
> > > > > > > > > by virtio-
> > > > > > > > > net.
> > > > > > > >
> > > > > > > > Yeah, basically, any RTNL users would be blocked forever.
> > > > > > > >
> > > > > > > > And the infinite loop has other side effects like it blocks the 
> > > > > > > > freezer to work.
> > > > > > > >
> > > > > > > > To summarize, there are three issues
> > > > > > > >
> > > > > > > > 1) busy polling
> > > > > > > > 2) breaks freezer
> > > > > > > > 3) hold RTNL during the loop
> > > > > > > >
> > > > > > > > Solving 3 may help somehow for 2 e.g some pm routine e.g 
> > > > > > > > wireguard or
> > > > > > > > even virtnet_restore() itself may try to hold the lock.
> > > > > > >
> > > > > > > Yep. So my feeling currently is, the only real fix is to actually
> > > > > > > queue up the work in software.
> > > > > >
> > > > > > Do you mean something like:
> > > > > >
> > > > > > rtnl_lock();
> > > > > > queue up the work
> > > > 

Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-27 Thread Stefano Garzarella

On Thu, Jul 27, 2023 at 11:32:00AM +0300, Arseniy Krasnov wrote:

On 25.07.2023 15:28, Stefano Garzarella wrote:

On Tue, Jul 25, 2023 at 12:16:11PM +0300, Arseniy Krasnov wrote:

On 25.07.2023 11:46, Arseniy Krasnov wrote:

On 25.07.2023 11:43, Stefano Garzarella wrote:

On Fri, Jul 21, 2023 at 08:09:03AM +0300, Arseniy Krasnov wrote:


[...]


+    t = vsock_core_get_transport(info->vsk);

-    if (msg_data_left(info->msg) == 0 &&
-    info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
-    hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
+    if (t->msgzerocopy_check_iov &&
+    !t->msgzerocopy_check_iov(iov_iter))
+    return false;


I'd avoid adding a new transport callback used only internally in virtio
transports.


Ok, I see.



Usually the transport callbacks are used in af_vsock.c, if we need a
callback just for virtio transports, maybe better to add it in struct
virtio_vsock_pkt_info or struct virtio_vsock_sock.


Hm, maybe I just need to move this callback from 'struct vsock_transport' to
the parent 'struct virtio_transport', after the 'send_pkt' callback. In this
case:
1) AF_VSOCK part is not touched.
2) This callback stays in 'virtio_transport.c' and is also set in this file.
  vhost and loopback are unchanged - only 'send_pkt' stays enabled in both
  files for these two transports.
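
In code, option (2) might look like the sketch below (field name and
signature assumed, not taken from the series):

	/* Keep the check out of 'struct vsock_transport' (af_vsock.c never
	 * calls it) and hang it off the virtio-specific ops instead.
	 */
	struct virtio_transport {
		struct vsock_transport transport;
		int (*send_pkt)(struct sk_buff *skb);
		/* hypothetical member; left NULL by vhost and loopback */
		bool (*msgzerocopy_check_iov)(const struct iov_iter *iov);
	};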


Yep, this could also work!

Stefano


Great! I'll send this implementation once the patchset for MSG_PEEK is merged
to net-next, as the two conflict with each other.

https://lore.kernel.org/netdev/20230726060150-mutt-send-email-...@kernel.org/T/#m56f3b850361a412735616145162d2d9df25f6350


Ack!

Thanks,
Stefano



Re: [PATCH RFC net-next v5 00/14] virtio/vsock: support datagrams

2023-07-27 Thread Michael S. Tsirkin
On Wed, Jul 19, 2023 at 12:50:04AM +, Bobby Eshleman wrote:
> Hey all!
> 
> This series introduces support for datagrams to virtio/vsock.
> 
> It is a spin-off (and smaller version) of this series from the summer:
>   
> https://lore.kernel.org/all/cover.1660362668.git.bobby.eshle...@bytedance.com/
> 
> Please note that this is an RFC and should not be merged until
> associated changes are made to the virtio specification, which will
> follow after discussion from this series.
> 
> Another aside, the v4 of the series has only been mildly tested with a
> run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
> up, but I'm hoping to get some of the design choices agreed upon before
> spending too much time making it pretty.
> 
> This series first supports datagrams in a basic form for virtio, and
> then optimizes the sendpath for all datagram transports.
> 
> The result is a very fast datagram communication protocol that
> outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
> of multi-threaded workload samples.
> 
> For those that are curious, some summary data comparing UDP and VSOCK
> DGRAM (N=5):
> 
>   vCPUS: 16
>   virtio-net queues: 16
>   payload size: 4KB
>   Setup: bare metal + vm (non-nested)
> 
>   UDP: 287.59 MB/s
>   VSOCK DGRAM: 509.2 MB/s
> 
> Some notes about the implementation...
> 
> This datagram implementation forces datagrams to self-throttle according
> to the threshold set by sk_sndbuf. It behaves similar to the credits
> used by streams in its effect on throughput and memory consumption, but
> it is not influenced by the receiving socket as credits are.
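
For illustration, the self-throttling described above amounts to a check of
roughly this shape (a sketch, not the series' code):

	/* Fail or wait when the bytes already in flight plus this datagram
	 * would exceed sk_sndbuf; the receiver has no influence, unlike the
	 * credit scheme used by stream sockets.
	 */
	if (sk_wmem_alloc_get(sk) + len > sk->sk_sndbuf)
		return -EAGAIN;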
> 
> The device drops packets silently.
> 
> As discussed previously, this series introduces datagrams and defers
> fairness to future work. See discussion in v2 for more context around
> datagrams, fairness, and this implementation.

it's a big thread - can't you summarize here?


> Signed-off-by: Bobby Eshleman 


could you give a bit more motivation? which applications do
you have in mind? for example, on localhost loopback datagrams
are actually reliable and a bunch of apps came to depend
on that even if they shouldn't.



> ---
> Changes in v5:
> - teach vhost to drop dgram when a datagram exceeds the receive buffer
>   - now uses MSG_ERRQUEUE and depends on Arseniy's zerocopy patch:
>   "vsock: read from socket's error queue"
> - replace multiple ->dgram_* callbacks with single ->dgram_addr_init()
>   callback
> - refactor virtio dgram skb allocator to reduce conflicts w/ zerocopy series
> - add _fallback/_FALLBACK suffix to dgram transport variables/macros
> - add WARN_ONCE() for table_size / VSOCK_HASH issue
> - add static to vsock_find_bound_socket_common
> - dedupe code in vsock_dgram_sendmsg() using module_got var
> - drop concurrent sendmsg() for dgram and defer to future series
> - Add more tests
>   - test EHOSTUNREACH in errqueue
>   - test stream + dgram address collision
> - improve clarity of dgram msg bounds test code
> - Link to v4: 
> https://lore.kernel.org/r/20230413-b4-vsock-dgram-v4-0-0cebbb2ae...@bytedance.com
> 
> Changes in v4:
> - style changes
>   - vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
>   &vsk->sk
>   - vsock: fix xmas tree declaration
>   - vsock: fix spacing issues
>   - virtio/vsock: virtio_transport_recv_dgram returns void because err
> unused
> - sparse analysis warnings/errors
>   - virtio/vsock: fix unitialized skerr on destroy
>   - virtio/vsock: fix uninitialized err var on goto out
>   - vsock: fix declarations that need static
>   - vsock: fix __rcu annotation order
> - bugs
>   - vsock: fix null ptr in remote_info code
>   - vsock/dgram: make transport_dgram a fallback instead of first
> priority
>   - vsock: remove redundant rcu read lock acquire in getname()
> - tests
>   - add more tests (message bounds and more)
>   - add vsock_dgram_bind() helper
>   - add vsock_dgram_connect() helper
> 
> Changes in v3:
> - Support multi-transport dgram, changing logic in connect/bind
>   to support VMCI case
> - Support per-pkt transport lookup for sendto() case
> - Fix dgram_allow() implementation
> - Fix dgram feature bit number (now it is 3)
> - Fix binding so dgram and connectible (cid,port) spaces are
>   non-overlapping
> - RCU protect transport ptr so connect() calls never leave
>   a lockless read of the transport and remote_addr are always
>   in sync
> - Link to v2: 
> https://lore.kernel.org/r/20230413-b4-vsock-dgram-v2-0-079cc7cee...@bytedance.com
> 
> ---
> Bobby Eshleman (13):
>   af_vsock: generalize vsock_dgram_recvmsg() to all transports
>   af_vsock: refactor transport lookup code
>   af_vsock: support multi-transport datagrams
>   af_vsock: generalize bind table functions
>   af_vsock: use a separate dgram bind table
>   virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
>   virtio/vsock: add common datagram send path
>   af_vsock: add vsock_find_bound_dgram_socket()
>  

Re: [PATCH RFC net-next v5 10/14] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit

2023-07-27 Thread Stefano Garzarella

On Wed, Jul 26, 2023 at 02:38:08PM -0400, Michael S. Tsirkin wrote:

On Wed, Jul 19, 2023 at 12:50:14AM +, Bobby Eshleman wrote:

This commit adds a feature bit for virtio vsock to support datagrams.

Signed-off-by: Jiang Wang 
Signed-off-by: Bobby Eshleman 
---
 include/uapi/linux/virtio_vsock.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 331be28b1d30..27b4b2b8bf13 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -40,6 +40,7 @@

 /* The feature bitmap for virtio vsock */
 #define VIRTIO_VSOCK_F_SEQPACKET   1   /* SOCK_SEQPACKET supported */
+#define VIRTIO_VSOCK_F_DGRAM   3   /* SOCK_DGRAM supported */

 struct virtio_vsock_config {
__le64 guest_cid;


pls do not add interface without first getting it accepted in the
virtio spec.


Yep, fortunately this series is still RFC.
I think by now we've seen that the implementation is doable, so we
should discuss the changes to the specification ASAP. Then we can
merge the series.

@Bobby can you start the discussion about spec changes?

Thanks,
Stefano



Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-27 Thread Michael S. Tsirkin
On Thu, Jul 27, 2023 at 02:03:59PM +0800, Jason Wang wrote:
> On Wed, Jul 26, 2023 at 7:38 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jul 26, 2023 at 09:55:37AM +0800, Jason Wang wrote:
> > > On Tue, Jul 25, 2023 at 3:36 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Tue, Jul 25, 2023 at 11:07:40AM +0800, Jason Wang wrote:
> > > > > On Mon, Jul 24, 2023 at 3:17 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Jul 24, 2023 at 02:52:05PM +0800, Jason Wang wrote:
> > > > > > > On Sat, Jul 22, 2023 at 4:18 AM Maxime Coquelin
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > > > > > > > > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin 
> > > > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > > > > > > > >>> On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin 
> > > > > > > > >>> wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > >  On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > > > > > > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson 
> > > > > > > > > wrote:
> > > > > > > > >> On 7/20/23 1:38 AM, Jason Wang wrote:
> > > > > > > > >>>
> > > > > > > > >>> Adding cond_resched() to the command waiting loop for a 
> > > > > > > > >>> better
> > > > > > > > >>> co-operation with the scheduler. This allows to give 
> > > > > > > > >>> CPU a breath to
> > > > > > > > >>> run other task(workqueue) instead of busy looping when 
> > > > > > > > >>> preemption is
> > > > > > > > >>> not allowed on a device whose CVQ might be slow.
> > > > > > > > >>>
> > > > > > > > >>> Signed-off-by: Jason Wang 
> > > > > > > > >>
> > > > > > > > >> This still leaves hung processes, but at least it 
> > > > > > > > >> doesn't pin the CPU any
> > > > > > > > >> more.  Thanks.
> > > > > > > > >> Reviewed-by: Shannon Nelson 
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > > I'd like to see a full solution
> > > > > > > > > 1- block until interrupt
> > > > > > >
> > > > > > > I remember in previous versions, you worried about the extra MSI
> > > > > > > vector. (Maybe I was wrong).
> > > > > > >
> > > > > > > > 
> > > > > > > >  Would it make sense to also have a timeout?
> > > > > > > >  And when timeout expires, set FAILED bit in device status?
> > > > > > > > >>>
> > > > > > > > >>> virtio spec does not set any limits on the timing of vq
> > > > > > > > >>> processing.
> > > > > > > > >>
> > > > > > > > >> Indeed, but I thought the driver could decide it is too long 
> > > > > > > > >> for it.
> > > > > > > > >>
> > > > > > > > >> The issue is we keep waiting with rtnl locked, it can 
> > > > > > > > >> quickly make the
> > > > > > > > >> system unusable.
> > > > > > > > >
> > > > > > > > > if this is a problem we should find a way not to keep rtnl
> > > > > > > > > locked indefinitely.
> > > > > > >
> > > > > > > Any ideas on this direction? Simply dropping rtnl during the busy 
> > > > > > > loop
> > > > > > > will result in a lot of races. This seems to require non-trivial
> > > > > > > changes in the networking core.
> > > > > > >
> > > > > > > >
> > > > > > > >  From the tests I have done, I think it is. With OVS, a 
> > > > > > > > reconfiguration
> > > > > > > > is performed when the VDUSE device is added, and when a MLX5 
> > > > > > > > device is
> > > > > > > > in the same bridge, it ends up doing an ioctl() that tries to 
> > > > > > > > take the
> > > > > > > > rtnl lock. In this configuration, it is not possible to kill 
> > > > > > > > OVS because
> > > > > > > > it is stuck trying to acquire rtnl lock for mlx5 that is held 
> > > > > > > > by virtio-
> > > > > > > > net.
> > > > > > >
> > > > > > > Yeah, basically, any RTNL users would be blocked forever.
> > > > > > >
> > > > > > > And the infinite loop has other side effects like it blocks the 
> > > > > > > freezer to work.
> > > > > > >
> > > > > > > To summarize, there are three issues
> > > > > > >
> > > > > > > 1) busy polling
> > > > > > > 2) breaks freezer
> > > > > > > 3) hold RTNL during the loop
> > > > > > >
> > > > > > > Solving 3 may help somehow for 2 e.g some pm routine e.g 
> > > > > > > wireguard or
> > > > > > > even virtnet_restore() itself may try to hold the lock.
> > > > > >
> > > > > > Yep. So my feeling currently is, the only real fix is to actually
> > > > > > queue up the work in software.
> > > > >
> > > > > Do you mean something like:
> > > > >
> > > > > rtnl_lock();
> > > > > queue up the work
> > > > > rtnl_unlock();
> > > > > return success;
> > > > >
> > > > > ?
> > > >
> > > > yes
> > >
> > > We will lose the error reporting, is it a real problem or not?
> >
> > Fundamental isn't it? Maybe we want a per-device flag for asynch commands,
> > and vduse will set it while hardware virtio won't.
> > 

Re: [PATCH] virtio-vdpa: Fix cpumask memory leak in virtio_vdpa_find_vqs()

2023-07-27 Thread Jason Wang
On Thu, Jul 27, 2023 at 3:11 AM Dragos Tatulea  wrote:
>
> From: Gal Pressman 
>
> Free the cpumask allocated by create_affinity_masks() before returning
> from the function.
>
> Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading mechanism")
> Signed-off-by: Gal Pressman 
> Reviewed-by: Dragos Tatulea 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/virtio/virtio_vdpa.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> index 989e2d7184ce..961161da5900 100644
> --- a/drivers/virtio/virtio_vdpa.c
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -393,11 +393,13 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
> cb.callback = virtio_vdpa_config_cb;
> cb.private = vd_dev;
> ops->set_config_cb(vdpa, &cb);
> +   kfree(masks);
>
> return 0;
>
>  err_setup_vq:
> virtio_vdpa_del_vqs(vdev);
> +   kfree(masks);
> return err;
>  }
>
> --
> 2.41.0
>


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-27 Thread Jason Wang
On Wed, Jul 26, 2023 at 7:38 PM Michael S. Tsirkin  wrote:
>
> On Wed, Jul 26, 2023 at 09:55:37AM +0800, Jason Wang wrote:
> > On Tue, Jul 25, 2023 at 3:36 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jul 25, 2023 at 11:07:40AM +0800, Jason Wang wrote:
> > > > On Mon, Jul 24, 2023 at 3:17 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, Jul 24, 2023 at 02:52:05PM +0800, Jason Wang wrote:
> > > > > > On Sat, Jul 22, 2023 at 4:18 AM Maxime Coquelin
> > > > > >  wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > > > > > > > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > > > > > > >>> On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin 
> > > > > > > >>> wrote:
> > > > > > > 
> > > > > > > 
> > > > > > >  On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson 
> > > > > > > > wrote:
> > > > > > > >> On 7/20/23 1:38 AM, Jason Wang wrote:
> > > > > > > >>>
> > > > > > > >>> Adding cond_resched() to the command waiting loop for a 
> > > > > > > >>> better
> > > > > > > >>> co-operation with the scheduler. This allows to give CPU 
> > > > > > > >>> a breath to
> > > > > > > >>> run other task(workqueue) instead of busy looping when 
> > > > > > > >>> preemption is
> > > > > > > >>> not allowed on a device whose CVQ might be slow.
> > > > > > > >>>
> > > > > > > >>> Signed-off-by: Jason Wang 
> > > > > > > >>
> > > > > > > >> This still leaves hung processes, but at least it doesn't 
> > > > > > > >> pin the CPU any
> > > > > > > >> more.  Thanks.
> > > > > > > >> Reviewed-by: Shannon Nelson 
> > > > > > > >>
> > > > > > > >
> > > > > > > > I'd like to see a full solution
> > > > > > > > 1- block until interrupt
> > > > > >
> > > > > > I remember in previous versions, you worried about the extra MSI
> > > > > > vector. (Maybe I was wrong).
> > > > > >
> > > > > > > 
> > > > > > >  Would it make sense to also have a timeout?
> > > > > > >  And when timeout expires, set FAILED bit in device status?
> > > > > > > >>>
> > > > > > > >>> virtio spec does not set any limits on the timing of vq
> > > > > > > >>> processing.
> > > > > > > >>
> > > > > > > >> Indeed, but I thought the driver could decide it is too long 
> > > > > > > >> for it.
> > > > > > > >>
> > > > > > > >> The issue is we keep waiting with rtnl locked, it can quickly 
> > > > > > > >> make the
> > > > > > > >> system unusable.
> > > > > > > >
> > > > > > > > if this is a problem we should find a way not to keep rtnl
> > > > > > > > locked indefinitely.
> > > > > >
> > > > > > Any ideas on this direction? Simply dropping rtnl during the busy 
> > > > > > loop
> > > > > > will result in a lot of races. This seems to require non-trivial
> > > > > > changes in the networking core.
> > > > > >
> > > > > > >
> > > > > > >  From the tests I have done, I think it is. With OVS, a 
> > > > > > > reconfiguration
> > > > > > > is performed when the VDUSE device is added, and when a MLX5 
> > > > > > > device is
> > > > > > > in the same bridge, it ends up doing an ioctl() that tries to 
> > > > > > > take the
> > > > > > > rtnl lock. In this configuration, it is not possible to kill OVS 
> > > > > > > because
> > > > > > > it is stuck trying to acquire rtnl lock for mlx5 that is held by 
> > > > > > > virtio-
> > > > > > > net.
> > > > > >
> > > > > > Yeah, basically, any RTNL users would be blocked forever.
> > > > > >
> > > > > > And the infinite loop has other side effects like it blocks the 
> > > > > > freezer to work.
> > > > > >
> > > > > > To summarize, there are three issues
> > > > > >
> > > > > > 1) busy polling
> > > > > > 2) breaks freezer
> > > > > > 3) hold RTNL during the loop
> > > > > >
> > > > > > Solving 3 may help somehow for 2 e.g some pm routine e.g wireguard 
> > > > > > or
> > > > > > even virtnet_restore() itself may try to hold the lock.
> > > > >
> > > > > Yep. So my feeling currently is, the only real fix is to actually
> > > > > queue up the work in software.
> > > >
> > > > Do you mean something like:
> > > >
> > > > rtnl_lock();
> > > > queue up the work
> > > > rtnl_unlock();
> > > > return success;
> > > >
> > > > ?
> > >
> > > yes
> >
> > We will lose the error reporting, is it a real problem or not?
>
> Fundamental isn't it? Maybe we want a per-device flag for asynch commands,
> and vduse will set it while hardware virtio won't.
> this way we only lose error reporting for vduse.

This problem is not VDUSE specific, DPUs/vDPA may suffer from this as
well. This might require more thought.

Thanks

>
> > >
> > >
> > > > > It's mostly trivial to limit
> > > > > memory consumption, vid's is the
> > > > > only one where it would make sense to have more than
>