Re: [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume

2024-08-02 Thread Michael S. Tsirkin
On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
> This series parallelizes the mlx5_vdpa device suspend and resume
> operations through the firmware async API. The purpose is to reduce live
> migration downtime.
> 
> The series starts with changing the VQ suspend and resume commands
> to the async API. After that, the switch is made to issue multiple
> commands of the same type in parallel.
> 
> Finally, a bonus improvement is thrown in: keep the notifiers enabled
> during suspend but make them a NOP. Upon resume make sure that the link
> state is forwarded. This shaves off around 30 ms of constant time per device.
> 
> For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
> x 2 threads per core), the improvements are:
> 
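> +-------------------+--------+--------+-----------+
> | operation         | Before | After  | Reduction |
> |-------------------+--------+--------+-----------|
> | mlx5_vdpa_suspend | 37 ms  | 2.5 ms | 14x       |
> | mlx5_vdpa_resume  | 16 ms  | 5 ms   |  3x       |
> +-------------------+--------+--------+-----------+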
> 
> Note for the maintainers:
> The first patch contains changes for mlx5_core. This must be applied
> into the mlx5-vhost tree [0] first. Once this patch is applied on
> mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
> tree and only then the remaining patches can be applied.

Or maintainer just acks it and I apply directly.

Let me know when all this can happen.

> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
> 
> Dragos Tatulea (7):
>   net/mlx5: Support throttled commands from async API
>   vdpa/mlx5: Introduce error logging function
>   vdpa/mlx5: Use async API for vq query command
>   vdpa/mlx5: Use async API for vq modify commands
>   vdpa/mlx5: Parallelize device suspend
>   vdpa/mlx5: Parallelize device resume
>   vdpa/mlx5: Keep notifiers during suspend but ignore
> 
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
>  drivers/vdpa/mlx5/core/mlx5_vdpa.h            |   7 +
>  drivers/vdpa/mlx5/net/mlx5_vnet.c             | 435 +-
>  3 files changed, 333 insertions(+), 130 deletions(-)
> 
> -- 
> 2.45.2
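
A rough cost model of why issuing the VQ commands in parallel helps, as a sketch only. It assumes every command has the same fixed latency and that only a limited number of commands may be in flight at once. The names demo_serial_cost_us() and demo_parallel_cost_us() and all numbers are hypothetical; nothing here is taken from the mlx5 driver or from the measurements in the table above.

```c
#include <stddef.h>

/* Hypothetical model: each command takes the same time, lat_us. */

/* Serial issue: wait for each command before sending the next one. */
static unsigned long demo_serial_cost_us(size_t n_cmds, unsigned long lat_us)
{
	return n_cmds * lat_us;
}

/*
 * Parallel issue: up to max_inflight commands overlap, so the cost is one
 * latency per batch. max_inflight == 0 is treated as 1 (no overlap).
 */
static unsigned long demo_parallel_cost_us(size_t n_cmds, unsigned long lat_us,
					   size_t max_inflight)
{
	size_t batches;

	if (max_inflight == 0)
		max_inflight = 1;
	batches = (n_cmds + max_inflight - 1) / max_inflight;
	return batches * lat_us;
}
```

In this model, 32 commands of 1000 us each cost 32000 us when issued serially and 4000 us with 8 commands in flight. Real firmware latencies are not uniform, so this is intuition rather than a prediction.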




Re: [PATCH V4 net-next 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-08-01 Thread Michael S. Tsirkin
On Thu, Aug 01, 2024 at 02:55:10PM +0800, Jason Wang wrote:
> On Thu, Aug 1, 2024 at 2:42 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Aug 01, 2024 at 02:13:18PM +0800, Jason Wang wrote:
> > > On Thu, Aug 1, 2024 at 1:58 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Thu, Aug 01, 2024 at 10:16:00AM +0800, Jason Wang wrote:
> > > > > > > @@ -2885,6 +2886,25 @@ static void virtnet_cancel_dim(struct 
> > > > > > > virtnet_info *vi, struct dim *dim)
> > > > > > >   net_dim_work_cancel(dim);
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_update_settings(struct virtnet_info *vi)
> > > > > > > +{
> > > > > > > + u32 speed;
> > > > > > > + u8 duplex;
> > > > > > > +
> > > > > > > + if (!virtio_has_feature(vi->vdev, 
> > > > > > > VIRTIO_NET_F_SPEED_DUPLEX))
> > > > > > > + return;
> > > > > > > +
> > > > > > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, speed,
> > > > > > > > &speed);
> > > > > > > > +
> > > > > > > > + if (ethtool_validate_speed(speed))
> > > > > > > > + vi->speed = speed;
> > > > > > > > +
> > > > > > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex,
> > > > > > > > &duplex);
> > > > > > > +
> > > > > > > + if (ethtool_validate_duplex(duplex))
> > > > > > > + vi->duplex = duplex;
> > > > > > > +}
> > > > > > > +
> > > > > >
> > > > > > I already commented on this approach.  This is now invoked on each 
> > > > > > open,
> > > > > > lots of extra VM exits. No bueno, people are working hard to keep 
> > > > > > setup
> > > > > > overhead under control. Handle this in the config change interrupt -
> > > > > > your new infrastructure is perfect for this.
> > > > >
> > > > > No, in this version it doesn't. Config space read only happens if
> > > > > there's a pending config interrupt during ndo_open:
> > > > >
> > > > > +   if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> > > > > +   if (vi->status & VIRTIO_NET_S_LINK_UP)
> > > > > +   netif_carrier_on(vi->dev);
> > > > > +   virtio_config_driver_enable(vi->vdev);
> > > > > +   } else {
> > > > > +   vi->status = VIRTIO_NET_S_LINK_UP;
> > > > > +   netif_carrier_on(dev);
> > > > > +   virtnet_update_settings(vi);
> > > > > +   }
> > > >
> > > > Sorry for being unclear, I was referring to !VIRTIO_NET_F_STATUS.
> > > > I do not see why do we need to bother re-reading settings in this case 
> > > > at all,
> > > > status is not there, nothing much changes.
> > >
> > > Ok, let me remove it from the next version.
> > >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > >  static int virtnet_open(struct net_device *dev)
> > > > > > >  {
> > > > > > >   struct virtnet_info *vi = netdev_priv(dev);
> > > > > > > @@ -2903,6 +2923,16 @@ static int virtnet_open(struct net_device 
> > > > > > > *dev)
> > > > > > >   goto err_enable_qp;
> > > > > > >   }
> > > > > > >
> > > > > > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> > > > > > > + if (vi->status & VIRTIO_NET_S_LINK_UP)
> > > > > > > + netif_carrier_on(vi->dev);
> > > > > > > + virtio_config_driver_enable(vi->vdev);
> > > > > > > + } else {
> > > > > > > + vi->status = VIRTIO_NET_S_LINK_UP;
> > > > > > > + netif_carrier_on(dev);
> > > > > > > + virtnet_update_settings(vi);
> > > > > > > + }
> > > >

Re: [PATCH V4 net-next 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-08-01 Thread Michael S. Tsirkin
On Thu, Aug 01, 2024 at 02:13:49PM +0800, Jason Wang wrote:
> On Thu, Aug 1, 2024 at 2:06 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jul 31, 2024 at 10:59:47AM +0800, Jason Wang wrote:
> > > > This patch synchronizes operstate with admin state per RFC2863.
> > > >
> > > > This is done by trying to toggle the carrier upon open/close and
> > > > synchronize with the config change work. This allows propagating status
> > > > correctly to stacked devices like:
> > >
> > > ip link add link enp0s3 macvlan0 type macvlan
> > > ip link set link enp0s3 down
> > > ip link show
> > >
> > > Before this patch:
> > >
> > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN 
> > > mode DEFAULT group default qlen 1000
> > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > ..
> > > 5: macvlan0@enp0s3:  mtu 1500 
> > > qdisc noqueue state UP mode DEFAULT group default qlen 1000
> > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > >
> > > After this patch:
> > >
> > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN 
> > > mode DEFAULT group default qlen 1000
> > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > ...
> > > 5: macvlan0@enp0s3:  mtu 1500 
> > > qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > >
> > > Cc: Venkat Venkatsubra 
> > > Cc: Gia-Khanh Nguyen 
> > > Signed-off-by: Jason Wang 
> > > ---
> > >  drivers/net/virtio_net.c | 84 ++--
> > >  1 file changed, 54 insertions(+), 30 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 0383a3e136d6..0cb93261eba1 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -2878,6 +2878,7 @@ static int virtnet_enable_queue_pair(struct 
> > > virtnet_info *vi, int qp_index)
> > >   return err;
> > >  }
> > >
> > > +
> > >  static void virtnet_cancel_dim(struct virtnet_info *vi, struct dim *dim)
> > >  {
> > >   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL))
> > > @@ -2885,6 +2886,25 @@ static void virtnet_cancel_dim(struct virtnet_info 
> > > *vi, struct dim *dim)
> > >   net_dim_work_cancel(dim);
> > >  }
> > >
> > > +static void virtnet_update_settings(struct virtnet_info *vi)
> > > +{
> > > + u32 speed;
> > > + u8 duplex;
> > > +
> > > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> > > + return;
> > > +
> > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> > > > +
> > > > + if (ethtool_validate_speed(speed))
> > > > + vi->speed = speed;
> > > > +
> > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex,
> > > > &duplex);
> > > +
> > > + if (ethtool_validate_duplex(duplex))
> > > + vi->duplex = duplex;
> > > +}
> > > +
> > >  static int virtnet_open(struct net_device *dev)
> > >  {
> > >   struct virtnet_info *vi = netdev_priv(dev);
> > > @@ -2903,6 +2923,16 @@ static int virtnet_open(struct net_device *dev)
> > >   goto err_enable_qp;
> > >   }
> > >
> > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> > > + if (vi->status & VIRTIO_NET_S_LINK_UP)
> > > + netif_carrier_on(vi->dev);
> > > + virtio_config_driver_enable(vi->vdev);
> > > + } else {
> > > + vi->status = VIRTIO_NET_S_LINK_UP;
> > > + netif_carrier_on(dev);
> > > + virtnet_update_settings(vi);
> > > + }
> > > +
> > >   return 0;
> > >
> > >  err_enable_qp:
> > > @@ -3381,12 +3411,18 @@ static int virtnet_close(struct net_device *dev)
> > >   disable_delayed_refill(vi);
> > >   /* Make sure refill_work doesn't re-enable napi! */
> > > >   cancel_delayed_work_sync(&vi->refill);
> > > + /* Make sure config notification doesn't schedule config work */
> > > + virtio_config_driver_disable(vi->vdev);
> > > + /* Make sure status upda

Re: [PATCH V4 net-next 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-08-01 Thread Michael S. Tsirkin
On Thu, Aug 01, 2024 at 02:13:18PM +0800, Jason Wang wrote:
> On Thu, Aug 1, 2024 at 1:58 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Aug 01, 2024 at 10:16:00AM +0800, Jason Wang wrote:
> > > > > @@ -2885,6 +2886,25 @@ static void virtnet_cancel_dim(struct 
> > > > > virtnet_info *vi, struct dim *dim)
> > > > >   net_dim_work_cancel(dim);
> > > > >  }
> > > > >
> > > > > +static void virtnet_update_settings(struct virtnet_info *vi)
> > > > > +{
> > > > > + u32 speed;
> > > > > + u8 duplex;
> > > > > +
> > > > > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> > > > > + return;
> > > > > +
> > > > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, speed,
> > > > > > &speed);
> > > > > > +
> > > > > > + if (ethtool_validate_speed(speed))
> > > > > > + vi->speed = speed;
> > > > > > +
> > > > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex,
> > > > > > &duplex);
> > > > > +
> > > > > + if (ethtool_validate_duplex(duplex))
> > > > > + vi->duplex = duplex;
> > > > > +}
> > > > > +
> > > >
> > > > I already commented on this approach.  This is now invoked on each open,
> > > > lots of extra VM exits. No bueno, people are working hard to keep setup
> > > > overhead under control. Handle this in the config change interrupt -
> > > > your new infrastructure is perfect for this.
> > >
> > > No, in this version it doesn't. Config space read only happens if
> > > there's a pending config interrupt during ndo_open:
> > >
> > > +   if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> > > +   if (vi->status & VIRTIO_NET_S_LINK_UP)
> > > +   netif_carrier_on(vi->dev);
> > > +   virtio_config_driver_enable(vi->vdev);
> > > +   } else {
> > > +   vi->status = VIRTIO_NET_S_LINK_UP;
> > > +   netif_carrier_on(dev);
> > > +   virtnet_update_settings(vi);
> > > +   }
> >
> > Sorry for being unclear, I was referring to !VIRTIO_NET_F_STATUS.
> > I do not see why do we need to bother re-reading settings in this case at 
> > all,
> > status is not there, nothing much changes.
> 
> Ok, let me remove it from the next version.
> 
> >
> >
> > > >
> > > >
> > > > >  static int virtnet_open(struct net_device *dev)
> > > > >  {
> > > > >   struct virtnet_info *vi = netdev_priv(dev);
> > > > > @@ -2903,6 +2923,16 @@ static int virtnet_open(struct net_device *dev)
> > > > >   goto err_enable_qp;
> > > > >   }
> > > > >
> > > > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> > > > > + if (vi->status & VIRTIO_NET_S_LINK_UP)
> > > > > + netif_carrier_on(vi->dev);
> > > > > + virtio_config_driver_enable(vi->vdev);
> > > > > + } else {
> > > > > + vi->status = VIRTIO_NET_S_LINK_UP;
> > > > > + netif_carrier_on(dev);
> > > > > + virtnet_update_settings(vi);
> > > > > + }
> > > > > +
> > > > >   return 0;
> > > > >
> > > > >  err_enable_qp:
> > > > > @@ -3381,12 +3411,18 @@ static int virtnet_close(struct net_device 
> > > > > *dev)
> > > > >   disable_delayed_refill(vi);
> > > > >   /* Make sure refill_work doesn't re-enable napi! */
> > > > > >   cancel_delayed_work_sync(&vi->refill);
> > > > > + /* Make sure config notification doesn't schedule config work */
> > > >
> > > > it's clear what this does even without a comment.
> > > > what you should comment on, and do not, is *why*.
> > >
> > > Well, it just follows the existing style, for example the above said
> > >
> > > "/* Make sure refill_work doesn't re-enable napi! */"
> >
> > only at the grammar level.
> > you don't see the difference?
> >
> > /* Make sure refill_work doesn't re-enable napi! */
> > > cancel_delayed_work_sync(&vi->refill);
> >
> > it explains why we cancel: to avoid re-enabling napi.
> >
> > why do you cancel config callback and work?
> > comment should say that.
> 
> Something like "Prevent the config change callback from changing
> carrier after close"?


sounds good.

> >
> >
> >
> > > >
> > > > > + virtio_config_driver_disable(vi->vdev);
> > > > > + /* Make sure status updating is cancelled */
> > > >
> > > > same
> > > >
> > > > also what "status updating"? confuses more than this clarifies.
> > >
> > > Does "Make sure the config changed work is cancelled" sounds better?
> >
> > no, this just repeats what code does.
> > explain why you cancel it.
> 
> Does something like "Make sure carrier changes have been done by the
> config change callback" works?
> 
> Thanks

I don't understand what this means.

> >
> >
> >
> > --
> > MST
> >




Re: [PATCH V4 net-next 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-08-01 Thread Michael S. Tsirkin
On Wed, Jul 31, 2024 at 10:59:47AM +0800, Jason Wang wrote:
> This patch synchronizes operstate with admin state per RFC2863.
> 
> This is done by trying to toggle the carrier upon open/close and
> synchronize with the config change work. This allows propagating status
> correctly to stacked devices like:
> 
> ip link add link enp0s3 macvlan0 type macvlan
> ip link set link enp0s3 down
> ip link show
> 
> Before this patch:
> 
> 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> ..
> 5: macvlan0@enp0s3:  mtu 1500 qdisc 
> noqueue state UP mode DEFAULT group default qlen 1000
> link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> 
> After this patch:
> 
> 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> ...
> 5: macvlan0@enp0s3:  mtu 1500 qdisc 
> noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> 
> Cc: Venkat Venkatsubra 
> Cc: Gia-Khanh Nguyen 
> Signed-off-by: Jason Wang 
> ---
>  drivers/net/virtio_net.c | 84 ++--
>  1 file changed, 54 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0383a3e136d6..0cb93261eba1 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2878,6 +2878,7 @@ static int virtnet_enable_queue_pair(struct 
> virtnet_info *vi, int qp_index)
>   return err;
>  }
>  
> +
>  static void virtnet_cancel_dim(struct virtnet_info *vi, struct dim *dim)
>  {
>   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL))
> @@ -2885,6 +2886,25 @@ static void virtnet_cancel_dim(struct virtnet_info 
> *vi, struct dim *dim)
>   net_dim_work_cancel(dim);
>  }
>  
> +static void virtnet_update_settings(struct virtnet_info *vi)
> +{
> + u32 speed;
> + u8 duplex;
> +
> + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> + return;
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> +
> + if (ethtool_validate_speed(speed))
> + vi->speed = speed;
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> +
> + if (ethtool_validate_duplex(duplex))
> + vi->duplex = duplex;
> +}
> +
>  static int virtnet_open(struct net_device *dev)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> @@ -2903,6 +2923,16 @@ static int virtnet_open(struct net_device *dev)
>   goto err_enable_qp;
>   }
>  
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> + if (vi->status & VIRTIO_NET_S_LINK_UP)
> + netif_carrier_on(vi->dev);
> + virtio_config_driver_enable(vi->vdev);
> + } else {
> + vi->status = VIRTIO_NET_S_LINK_UP;
> + netif_carrier_on(dev);
> + virtnet_update_settings(vi);
> + }
> +
>   return 0;
>  
>  err_enable_qp:
> @@ -3381,12 +3411,18 @@ static int virtnet_close(struct net_device *dev)
>   disable_delayed_refill(vi);
>   /* Make sure refill_work doesn't re-enable napi! */
>   cancel_delayed_work_sync(&vi->refill);
> + /* Make sure config notification doesn't schedule config work */
> + virtio_config_driver_disable(vi->vdev);
> + /* Make sure status updating is cancelled */
> + cancel_work_sync(&vi->config_work);
>  
>   for (i = 0; i < vi->max_queue_pairs; i++) {
>   virtnet_disable_queue_pair(vi, i);
>   virtnet_cancel_dim(vi, &vi->rq[i].dim);
>   }
>  
> + netif_carrier_off(dev);
> +
>   return 0;
>  }
>  
> @@ -5085,25 +5121,6 @@ static void virtnet_init_settings(struct net_device 
> *dev)
>   vi->duplex = DUPLEX_UNKNOWN;
>  }
>  
> -static void virtnet_update_settings(struct virtnet_info *vi)
> -{
> - u32 speed;
> - u8 duplex;
> -
> - if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> - return;
> -
> - virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> -
> - if (ethtool_validate_speed(speed))
> - vi->speed = speed;
> -
> - virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> -
> - if (ethtool_validate_duplex(duplex))
> - vi->duplex = duplex;
> -}
> -
>  static u32 virtnet_get_rxfh_key_size(struct net_device *dev)
>  {
>   return ((struct virtnet_info *)netdev_priv(dev))->rss_key_size;
> @@ -6514,6 +6531,11 @@ static int virtnet_probe(struct virtio_device *vdev)
>   goto free_failover;
>   }
>  
> + /* Forbid config change notification until ndo_open. */
> + virtio_config_driver_disable(vi->vdev);
> + /* Make sure status updating work is done */

Wait a second, how can anything run here, this is probe,
config change callbacks are never invoked at all.

> + 

Re: [PATCH V4 net-next 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-07-31 Thread Michael S. Tsirkin
On Thu, Aug 01, 2024 at 10:16:00AM +0800, Jason Wang wrote:
> > > @@ -2885,6 +2886,25 @@ static void virtnet_cancel_dim(struct virtnet_info 
> > > *vi, struct dim *dim)
> > >   net_dim_work_cancel(dim);
> > >  }
> > >
> > > +static void virtnet_update_settings(struct virtnet_info *vi)
> > > +{
> > > + u32 speed;
> > > + u8 duplex;
> > > +
> > > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> > > + return;
> > > +
> > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> > > > +
> > > > + if (ethtool_validate_speed(speed))
> > > > + vi->speed = speed;
> > > > +
> > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex,
> > > > &duplex);
> > > +
> > > + if (ethtool_validate_duplex(duplex))
> > > + vi->duplex = duplex;
> > > +}
> > > +
> >
> > I already commented on this approach.  This is now invoked on each open,
> > lots of extra VM exits. No bueno, people are working hard to keep setup
> > overhead under control. Handle this in the config change interrupt -
> > your new infrastructure is perfect for this.
> 
> No, in this version it doesn't. Config space read only happens if
> there's a pending config interrupt during ndo_open:
> 
> +   if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> +   if (vi->status & VIRTIO_NET_S_LINK_UP)
> +   netif_carrier_on(vi->dev);
> +   virtio_config_driver_enable(vi->vdev);
> +   } else {
> +   vi->status = VIRTIO_NET_S_LINK_UP;
> +   netif_carrier_on(dev);
> +   virtnet_update_settings(vi);
> +   }

Sorry for being unclear, I was referring to !VIRTIO_NET_F_STATUS.
I do not see why do we need to bother re-reading settings in this case at all,
status is not there, nothing much changes.
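
The open/close behaviour under discussion can be modelled in a few lines of user-space C. This is only a sketch: struct demo_nic, demo_open() and demo_close() are hypothetical stand-ins rather than driver code, and the branch without the status feature already leaves out the settings re-read, as suggested above.

```c
#include <stdbool.h>

#define DEMO_S_LINK_UP 1u	/* stand-in for VIRTIO_NET_S_LINK_UP */

/* Hypothetical model of the state touched by open/close. */
struct demo_nic {
	bool has_status_feature;	/* stand-in for VIRTIO_NET_F_STATUS */
	unsigned int status;		/* cached link status bits */
	bool carrier;			/* what the stack sees as carrier */
	bool config_notify_enabled;	/* config change notifications allowed */
};

static void demo_open(struct demo_nic *n)
{
	if (n->has_status_feature) {
		/* Trust the cached status; later changes arrive by notification. */
		if (n->status & DEMO_S_LINK_UP)
			n->carrier = true;
		n->config_notify_enabled = true;
	} else {
		/* No status field: the link is assumed up, nothing to re-read. */
		n->status = DEMO_S_LINK_UP;
		n->carrier = true;
	}
}

static void demo_close(struct demo_nic *n)
{
	/* Stop notifications first so nothing turns the carrier back on. */
	n->config_notify_enabled = false;
	n->carrier = false;
}
```

The ordering in demo_close() mirrors the intent of the patch: once notifications are off, the carrier can be dropped without a late config change turning it back on.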


> >
> >
> > >  static int virtnet_open(struct net_device *dev)
> > >  {
> > >   struct virtnet_info *vi = netdev_priv(dev);
> > > @@ -2903,6 +2923,16 @@ static int virtnet_open(struct net_device *dev)
> > >   goto err_enable_qp;
> > >   }
> > >
> > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> > > + if (vi->status & VIRTIO_NET_S_LINK_UP)
> > > + netif_carrier_on(vi->dev);
> > > + virtio_config_driver_enable(vi->vdev);
> > > + } else {
> > > + vi->status = VIRTIO_NET_S_LINK_UP;
> > > + netif_carrier_on(dev);
> > > + virtnet_update_settings(vi);
> > > + }
> > > +
> > >   return 0;
> > >
> > >  err_enable_qp:
> > > @@ -3381,12 +3411,18 @@ static int virtnet_close(struct net_device *dev)
> > >   disable_delayed_refill(vi);
> > >   /* Make sure refill_work doesn't re-enable napi! */
> > > >   cancel_delayed_work_sync(&vi->refill);
> > > + /* Make sure config notification doesn't schedule config work */
> >
> > it's clear what this does even without a comment.
> > what you should comment on, and do not, is *why*.
> 
> Well, it just follows the existing style, for example the above said
> 
> "/* Make sure refill_work doesn't re-enable napi! */"

only at the grammar level.
you don't see the difference?

/* Make sure refill_work doesn't re-enable napi! */
cancel_delayed_work_sync(&vi->refill);

it explains why we cancel: to avoid re-enabling napi.

why do you cancel config callback and work?
comment should say that.



> >
> > > + virtio_config_driver_disable(vi->vdev);
> > > + /* Make sure status updating is cancelled */
> >
> > same
> >
> > also what "status updating"? confuses more than this clarifies.
> 
> Does "Make sure the config changed work is cancelled" sounds better?

no, this just repeats what code does.
explain why you cancel it.



-- 
MST




Re: [PATCH V4 net-next 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-07-31 Thread Michael S. Tsirkin
On Wed, Jul 31, 2024 at 10:59:47AM +0800, Jason Wang wrote:
> This patch synchronizes operstate with admin state per RFC2863.
> 
> This is done by trying to toggle the carrier upon open/close and
> synchronize with the config change work. This allows propagating status
> correctly to stacked devices like:
> 
> ip link add link enp0s3 macvlan0 type macvlan
> ip link set link enp0s3 down
> ip link show
> 
> Before this patch:
> 
> 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> ..
> 5: macvlan0@enp0s3:  mtu 1500 qdisc 
> noqueue state UP mode DEFAULT group default qlen 1000
> link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> 
> After this patch:
> 
> 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> ...
> 5: macvlan0@enp0s3:  mtu 1500 qdisc 
> noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> 
> Cc: Venkat Venkatsubra 
> Cc: Gia-Khanh Nguyen 
> Signed-off-by: Jason Wang 

Changelog?

> ---
>  drivers/net/virtio_net.c | 84 ++--
>  1 file changed, 54 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0383a3e136d6..0cb93261eba1 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2878,6 +2878,7 @@ static int virtnet_enable_queue_pair(struct 
> virtnet_info *vi, int qp_index)
>   return err;
>  }
>  
> +
>  static void virtnet_cancel_dim(struct virtnet_info *vi, struct dim *dim)
>  {
>   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL))

hmm

> @@ -2885,6 +2886,25 @@ static void virtnet_cancel_dim(struct virtnet_info 
> *vi, struct dim *dim)
>   net_dim_work_cancel(dim);
>  }
>  
> +static void virtnet_update_settings(struct virtnet_info *vi)
> +{
> + u32 speed;
> + u8 duplex;
> +
> + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> + return;
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> +
> + if (ethtool_validate_speed(speed))
> + vi->speed = speed;
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> +
> + if (ethtool_validate_duplex(duplex))
> + vi->duplex = duplex;
> +}
> +

I already commented on this approach.  This is now invoked on each open,
lots of extra VM exits. No bueno, people are working hard to keep setup
overhead under control. Handle this in the config change interrupt -
your new infrastructure is perfect for this.


>  static int virtnet_open(struct net_device *dev)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> @@ -2903,6 +2923,16 @@ static int virtnet_open(struct net_device *dev)
>   goto err_enable_qp;
>   }
>  
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> + if (vi->status & VIRTIO_NET_S_LINK_UP)
> + netif_carrier_on(vi->dev);
> + virtio_config_driver_enable(vi->vdev);
> + } else {
> + vi->status = VIRTIO_NET_S_LINK_UP;
> + netif_carrier_on(dev);
> + virtnet_update_settings(vi);
> + }
> +
>   return 0;
>  
>  err_enable_qp:
> @@ -3381,12 +3411,18 @@ static int virtnet_close(struct net_device *dev)
>   disable_delayed_refill(vi);
>   /* Make sure refill_work doesn't re-enable napi! */
>   cancel_delayed_work_sync(&vi->refill);
> + /* Make sure config notification doesn't schedule config work */

it's clear what this does even without a comment.
what you should comment on, and do not, is *why*.

> + virtio_config_driver_disable(vi->vdev);
> + /* Make sure status updating is cancelled */

same

also what "status updating"? confuses more than this clarifies.

> + cancel_work_sync(&vi->config_work);
>  
>   for (i = 0; i < vi->max_queue_pairs; i++) {
>   virtnet_disable_queue_pair(vi, i);
>   virtnet_cancel_dim(vi, &vi->rq[i].dim);
>   }
>  
> + netif_carrier_off(dev);
> +
>   return 0;
>  }
>  
> @@ -5085,25 +5121,6 @@ static void virtnet_init_settings(struct net_device 
> *dev)
>   vi->duplex = DUPLEX_UNKNOWN;
>  }
>  
> -static void virtnet_update_settings(struct virtnet_info *vi)
> -{
> - u32 speed;
> - u8 duplex;
> -
> - if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> - return;
> -
> - virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> -
> - if (ethtool_validate_speed(speed))
> - vi->speed = speed;
> -
> - virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> -
> - if (ethtool_validate_duplex(duplex))
> - vi->duplex = duplex;
> -}
> -
>  static u32 virtnet_get_rxfh_key_size(struct net_device *dev)
>  {
>   return ((struct virtnet_info 

Re: [PATCH v1] MAINTAINERS: add me as reviewer of AF_VSOCK and virtio-vsock

2024-07-30 Thread Michael S. Tsirkin
On Tue, Jul 30, 2024 at 08:47:07AM -0700, Jakub Kicinski wrote:
> On Sun, 28 Jul 2024 21:33:25 +0300 Arseniy Krasnov wrote:
> > I'm working on AF_VSOCK and virtio-vsock.
> 
> If you want to review the code perhaps you can use lore+lei
> and filter on the paths?
> 
> Adding people to MAINTAINERS is somewhat fraught.

Arseniy's not a newbie in vsock, but yes, I'd like to first
see some reviews before we make this formal ;)




Re: [PATCH v3] ptp: Add vDSO-style vmclock support

2024-07-29 Thread Michael S. Tsirkin
On Mon, Jul 29, 2024 at 11:42:22AM +0100, David Woodhouse wrote:
> +struct vmclock_abi {
> + /* CONSTANT FIELDS */
> + uint32_t magic;
> +#define VMCLOCK_MAGIC 0x4b4c4356 /* "VCLK" */
> + uint32_t size;  /* Size of region containing this structure */
> + uint16_t version;   /* 1 */
> + uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except INVALID */
> +#define VMCLOCK_COUNTER_ARM_VCNT 0
> +#define VMCLOCK_COUNTER_X86_TSC  1
> +#define VMCLOCK_COUNTER_INVALID  0xff
> + uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */
> +#define VMCLOCK_TIME_UTC 0   /* Since 1970-01-01 00:00:00z */
> +#define VMCLOCK_TIME_TAI 1   /* Since 1970-01-01 00:00:00z */
> +#define VMCLOCK_TIME_MONOTONIC   2   /* Since undefined epoch */
> +#define VMCLOCK_TIME_INVALID_SMEARED 3   /* Not supported */
> +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED   4   /* Not supported */
> +
> + /* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */
> + uint32_t seq_count; /* Low bit means an update is in progress */
> + /*
> +  * This field changes to another non-repeating value when the CPU
> +  * counter is disrupted, for example on live migration. This lets
> +  * the guest know that it should discard any calibration it has
> +  * performed of the counter against external sources (NTP/PTP/etc.).
> +  */
> + uint64_t disruption_marker;
> + uint64_t flags;
> + /* Indicates that the tai_offset_sec field is valid */
> +#define VMCLOCK_FLAG_TAI_OFFSET_VALID (1 << 0)
> + /*
> +  * Optionally used to notify guests of pending maintenance events.
> +  * A guest which provides latency-sensitive services may wish to
> +  * remove itself from service if an event is coming up. Two flags
> +  * indicate the approximate imminence of the event.
> +  */
> +#define VMCLOCK_FLAG_DISRUPTION_SOON (1 << 1) /* About a day */
> +#define VMCLOCK_FLAG_DISRUPTION_IMMINENT (1 << 2) /* About an hour */
> +#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID   (1 << 3)
> +#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID   (1 << 4)
> +#define VMCLOCK_FLAG_TIME_ESTERROR_VALID (1 << 5)
> +#define VMCLOCK_FLAG_TIME_MAXERROR_VALID (1 << 6)
> + /*
> +  * If the MONOTONIC flag is set then (other than leap seconds) it is
> +  * guaranteed that the time calculated according to this structure at
> +  * any given moment shall never appear to be later than the time
> +  * calculated via the structure at any *later* moment.
> +  *
> +  * In particular, a timestamp based on a counter reading taken
> +  * immediately after setting the low bit of seq_count (and the
> +  * associated memory barrier), using the previously-valid time and
> +  * period fields, shall never be later than a timestamp based on
> +  * a counter reading taken immediately before *clearing* the low
> +  * bit again after the update, using the about-to-be-valid fields.
> +  */
> +#define VMCLOCK_FLAG_TIME_MONOTONIC  (1 << 7)
> +
> + uint8_t pad[2];
> + uint8_t clock_status;
> +#define VMCLOCK_STATUS_UNKNOWN   0
> +#define VMCLOCK_STATUS_INITIALIZING  1
> +#define VMCLOCK_STATUS_SYNCHRONIZED  2
> +#define VMCLOCK_STATUS_FREERUNNING   3
> +#define VMCLOCK_STATUS_UNRELIABLE 4
> +
> + /*
> +  * The time exposed through this device is never smeared. This field
> +  * corresponds to the 'subtype' field in virtio-rtc, which indicates
> +  * the smearing method. However in this case it provides a *hint* to
> +  * the guest operating system, such that *if* the guest OS wants to
> +  * provide its users with an alternative clock which does not follow
> +  * UTC, it may do so in a fashion consistent with the other systems
> +  * in the nearby environment.
> +  */
> + uint8_t leap_second_smearing_hint; /* Matches VIRTIO_RTC_SUBTYPE_xxx */
> +#define VMCLOCK_SMEARING_STRICT  0
> +#define VMCLOCK_SMEARING_NOON_LINEAR 1
> +#define VMCLOCK_SMEARING_UTC_SLS 2
> + int16_t tai_offset_sec;
> + uint8_t leap_indicator;
> + /*
> +  * This field is based on the VIRTIO_RTC_LEAP_xxx values as
> +  * defined in the current draft of virtio-rtc, but since smearing
> +  * cannot be used with the shared memory device, some values are
> +  * not used.
> +  *
> +  * The _POST_POS and _POST_NEG values allow the guest to perform
> +  * its own smearing during the day or so after a leap second when
> +  * such smearing may need to continue being applied for a leap
> +  * second which is now theoretically "historical".
> +  */
> +#define VMCLOCK_LEAP_NONE 0x00 /* No known nearby leap second */
> +#define VMCLOCK_LEAP_PRE_POS 0x01 /* Positive leap second at EOM */
> +#define VMCLOCK_LEAP_PRE_NEG 
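
The seq_count comment in the structure above (low bit means an update is in progress) describes a seqcount-style protocol. Below is a minimal user-space sketch of a reader that follows it, using C11 atomics. struct demo_clock and its two payload fields are hypothetical and far simpler than struct vmclock_abi, and the writer is included only so that the reader can be exercised.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the shared region. */
struct demo_clock {
	_Atomic uint32_t seq_count;	/* low bit set: update in progress */
	_Atomic uint64_t counter_value;
	_Atomic uint64_t time_sec;
};

struct demo_snapshot {
	uint64_t counter_value;
	uint64_t time_sec;
};

/* Writer: make seq_count odd, update the fields, make it even again. */
static void demo_clock_update(struct demo_clock *c, uint64_t counter,
			      uint64_t sec)
{
	uint32_t seq = atomic_load_explicit(&c->seq_count, memory_order_relaxed);

	atomic_store_explicit(&c->seq_count, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->counter_value, counter, memory_order_relaxed);
	atomic_store_explicit(&c->time_sec, sec, memory_order_relaxed);
	atomic_store_explicit(&c->seq_count, seq + 2, memory_order_release);
}

/* Reader: retry while an update is in progress or happened during the read. */
static struct demo_snapshot demo_clock_read(struct demo_clock *c)
{
	struct demo_snapshot s;
	uint32_t before, after;

	do {
		before = atomic_load_explicit(&c->seq_count, memory_order_acquire);
		s.counter_value = atomic_load_explicit(&c->counter_value,
						       memory_order_relaxed);
		s.time_sec = atomic_load_explicit(&c->time_sec,
						  memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&c->seq_count, memory_order_relaxed);
	} while ((before & 1) || before != after);

	return s;
}
```

The reader never blocks the writer; it simply discards any snapshot taken while seq_count was odd or changed underneath it.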

[GIT PULL] virtio: fixes for rc1

2024-07-28 Thread Michael S. Tsirkin


The biggest thing here is the adminq change - but it looks like
the only way to avoid headq blocking causing indefinite stalls.


The following changes since commit 6c85d6b653caeba2ef982925703cbb4f2b3b3163:

  virtio: rename virtio_find_vqs_info() to virtio_find_vqs() (2024-07-17 05:20:58 -0400)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 6d834691da474ed1c648753d3d3a3ef8379fa1c1:

  virtio_pci_modern: remove admin queue serialization lock (2024-07-17 05:43:21 -0400)


virtio: fixes

This fixes 3 issues:
- prevent admin commands on one VF blocking another:
  fixes a huge scalability issue with large # of VFs
- correctly return error on command failure on octeon
  fixes a corruption if any commands fail
- fix modpost warning when building virtio_dma_buf
  harmless, but the fix is trivial

Signed-off-by: Michael S. Tsirkin 


Dan Carpenter (1):
  vdpa/octeon_ep: Fix error code in octep_process_mbox()

Jeff Johnson (1):
  virtio: add missing MODULE_DESCRIPTION() macro

Jiri Pirko (13):
  virtio_pci: push out single vq find code to vp_find_one_vq_msix()
  virtio_pci: simplify vp_request_msix_vectors() call a bit
  virtio_pci: pass vector policy enum to vp_find_vqs_msix()
  virtio_pci: pass vector policy enum to vp_find_one_vq_msix()
  virtio_pci: introduce vector allocation fallback for slow path virtqueues
  virtio_pci_modern: treat vp_dev->admin_vq.info.vq pointer as static
  virtio: push out code to vp_avq_index()
  virtio_pci: pass vq info as an argument to vp_setup_vq()
  virtio: create admin queues alongside other virtqueues
  virtio_pci_modern: create admin queue of queried size
  virtio_pci_modern: pass cmd as an identification token
  virtio_pci_modern: use completion instead of busy loop to wait on admin cmd result
  virtio_pci_modern: remove admin queue serialization lock

 drivers/vdpa/octeon_ep/octep_vdpa_hw.c |   2 +-
 drivers/virtio/virtio.c                |  28 +
 drivers/virtio/virtio_dma_buf.c        |   1 +
 drivers/virtio/virtio_pci_common.c     | 192 ++---
 drivers/virtio/virtio_pci_common.h     |  16 +--
 drivers/virtio/virtio_pci_modern.c     | 161 +--
 include/linux/virtio.h                 |   3 +
 include/linux/virtio_config.h          |   4 -
 8 files changed, 243 insertions(+), 164 deletions(-)




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-28 Thread Michael S. Tsirkin
On Sun, Jul 28, 2024 at 02:07:01PM +0100, David Woodhouse wrote:
> On 28 July 2024 11:37:04 BST, "Michael S. Tsirkin"  wrote:
> >Glad you asked :)
> 
> Heh, I'm not sure I'm so glad. Did I mention I hate ACPI? Perhaps it's still 
> not too late for me just to define a DT binding and use PRP0001 for it :)
> 
> >Long story short, QEMUVGID is indeed out of spec, but it works
> >both because of guest compatibility with ACPI 1.0, and because no one
> >much uses it.
> 
> 
> I think it's reasonable enough to follow that example and use AMZNVCLK (or 
> QEMUVCLK, but there seems little point in both) then?

I'd stick to spec. If you like puns, QEMUC10C maybe?
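
For anyone following along, here is a minimal sketch of the _HID shape check under discussion. It assumes my reading of ACPI 6.5 section 6.1.5 is right: a PNP-style ID is three uppercase letters plus four hex digits, and an ACPI-style ID is four uppercase letters or digits plus four hex digits. The function name hid_form_ok() is made up for illustration, and accepting only uppercase hex is a conservative assumption on my part.

```c
#include <stdbool.h>
#include <string.h>

/* Uppercase hex only; treating lowercase as invalid is an assumption. */
static bool is_upper_hex(char c)
{
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
}

static bool is_upper_letter(char c)
{
        return c >= 'A' && c <= 'Z';
}

/*
 * Hypothetical helper: true if hid has the PNP form "AAA####" or the
 * ACPI form "NNNN####". It checks the shape only, not whether the
 * vendor prefix is actually registered.
 */
bool hid_form_ok(const char *hid)
{
        size_t len = strlen(hid);
        size_t prefix;

        if (len == 7)
                prefix = 3;     /* PNP-style ID */
        else if (len == 8)
                prefix = 4;     /* ACPI-style ID */
        else
                return false;

        for (size_t i = 0; i < prefix; i++) {
                bool digit_ok = (prefix == 4) &&
                                hid[i] >= '0' && hid[i] <= '9';

                if (!is_upper_letter(hid[i]) && !digit_ok)
                        return false;
        }
        for (size_t i = prefix; i < len; i++)
                if (!is_upper_hex(hid[i]))
                        return false;
        return true;
}
```

With this check, QEMUC10C and PRP0001 pass, while AMZNVCLK, QEMUVCLK and QEMUVGID fail because 'V', 'L', 'K', 'G' and 'I' are not hex digits.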




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-28 Thread Michael S. Tsirkin
On Fri, Jul 26, 2024 at 07:28:28PM +0100, David Woodhouse wrote:
> On 26 July 2024 17:49:58 BST, Jonathan Cameron  
> wrote:
> >On Thu, 25 Jul 2024 14:50:50 +0100
> >David Woodhouse  wrote:
> >
> >> On Thu, 2024-07-25 at 08:33 -0400, Michael S. Tsirkin wrote:
> >> > On Thu, Jul 25, 2024 at 01:31:19PM +0100, David Woodhouse wrote:  
> >> > > On Thu, 2024-07-25 at 08:29 -0400, Michael S. Tsirkin wrote:  
> >> > > > On Thu, Jul 25, 2024 at 01:27:49PM +0100, David Woodhouse wrote:  
> >> > > > > On Thu, 2024-07-25 at 08:17 -0400, Michael S. Tsirkin wrote:  
> >> > > > > > On Thu, Jul 25, 2024 at 10:56:05AM +0100, David Woodhouse wrote: 
> >> > > > > >  
> >> > > > > > > > Do you want to just help complete virtio-rtc then? Would be 
> >> > > > > > > > easier than
> >> > > > > > > > trying to keep two specs in sync.  
> >> > > > > > > 
> >> > > > > > > The ACPI version is much more lightweight and doesn't take up a
> >> > > > > > > valuable PCI slot#. (I know, you can do virtio without PCI but 
> >> > > > > > > that's
> >> > > > > > > complex in other ways).
> >> > > > > > >   
> >> > > > > > 
> >> > > > > > Hmm, should we support virtio over ACPI? Just asking.  
> >> > > > > 
> >> > > > > Given that we support virtio DT bindings, and the ACPI "PRP0001" 
> >> > > > > device
> >> > > > > exists with a DSM method which literally returns DT properties,
> >> > > > > including such properties as "compatible=virtio,mmio" ... do we
> >> > > > > already?
> >> > > > > 
> >> > > > >   
> >> > > > 
> >> > > > In a sense, but you are saying that is too complex?
> >> > > > Can you elaborate?  
> >> > > 
> >> > > No, I think it's fine. I encourage the use of the PRP0001 device to
> >> > > expose DT devices through ACPI. I was just reminding you of its
> >> > > existence.  
> >> > 
> >> > Confused. You said "I know, you can do virtio without PCI but that's
> >> > complex in other ways" as the explanation why you are doing a custom
> >> > protocol.  
> >> 
> >> Ah, apologies, I wasn't thinking that far back in the conversation.
> >> 
> >> If we wanted to support virtio over ACPI, I think PRP0001 can be made
> >> to work and isn't too complex (even though it probably doesn't yet work
> >> out of the box).
> >> 
> >> But for the VMCLOCK thing, yes, the simple ACPI device is a lot simpler
> >> than virtio-rtc and much more attractive.
> >> 
> >> Even if the virtio-rtc specification were official today, and I was
> >> able to expose it via PCI, I probably wouldn't do it that way. There's
> >> just far more in virtio-rtc than we need; the simple shared memory
> >> region is perfectly sufficient for most needs, and especially ours.
> >> 
> >> I have reworked
> >> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/vmclock
> >> to take your other feedback into account.
> >> 
> >> It's now more flexible about the size handling, and explicitly checking
> >> that specific fields are present before using them. 
> >> 
> >> I think I'm going to add a method on the ACPI device to enable the
> >> precise clock information. I haven't done that in the driver yet; it
> >> still just consumes the precise clock information if it happens to be
> >> present already. The enable method can be added in a compatible fashion
> >> (the failure mode is that guests which don't invoke this method when
> >> the hypervisor needs them to will see only the disruption signal and
> >> not precise time).
> >> 
> >> For the HID I'm going to use AMZNVCLK. I had used QEMUVCLK in the QEMU
> >> patches, but I'll change that to use AMZNVCLK too when I repost the
> >> QEMU patch.
> >
> >That doesn't fit with ACPI _HID definitions.
> >The second set of 4 characters needs to be hex digits, as this is an
> >ACPI-style ID (which I assume it is, given AMZN is a valid
> >vendor ID). See 6.1.5 in ACPI v6.5.
> >
> >Maybe I'm missing something..

Re: [PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-26 Thread Michael S. Tsirkin
On Fri, Jul 26, 2024 at 01:28:17PM +0100, David Woodhouse wrote:
> diff --git a/include/uapi/linux/vmclock-abi.h b/include/uapi/linux/vmclock-abi.h
> new file mode 100644
> index ..7b1b4759363c
> --- /dev/null
> +++ b/include/uapi/linux/vmclock-abi.h
> @@ -0,0 +1,187 @@
> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
> +
> +/*
> + * This structure provides a vDSO-style clock to VM guests, exposing the
> + * relationship (or lack thereof) between the CPU clock (TSC, timebase, arch
> + * counter, etc.) and real time. It is designed to address the problem of
> + * live migration, which other clock enlightenments do not.
> + *
> + * When a guest is live migrated, this affects the clock in two ways.
> + *
> + * First, even between identical hosts the actual frequency of the underlying
> + * counter will change within the tolerances of its specification (typically
> + * ±50PPM, or 4 seconds a day). This frequency also varies over time on the
> + * same host, but can be tracked by NTP as it generally varies slowly. With
> + * live migration there is a step change in the frequency, with no warning.
> + *
> + * Second, there may be a step change in the value of the counter itself, as
> + * its accuracy is limited by the precision of the NTP synchronization on the
> + * source and destination hosts.
> + *
> + * So any calibration (NTP, PTP, etc.) which the guest has done on the source
> + * host before migration is invalid, and needs to be redone on the new host.
> + *
> + * In its most basic mode, this structure provides only an indication to the
> + * guest that live migration has occurred. This allows the guest to know that
> + * its clock is invalid and take remedial action. For applications that need
> + * reliable accurate timestamps (e.g. distributed databases), the structure
> + * can be mapped all the way to userspace. This allows the application to see
> + * directly for itself that the clock is disrupted and take appropriate
> + * action, even when using a vDSO-style method to get the time instead of a
> + * system call.
> + *
> + * In its more advanced mode, this structure can also be used to expose the
> + * precise relationship of the CPU counter to real time, as calibrated by the
> + * host. This means that userspace applications can have accurate time
> + * immediately after live migration, rather than having to pause operations
> + * and wait for NTP to recover. This mode does, of course, rely on the
> + * counter being reliable and consistent across CPUs.
> + *
> + * Note that this must be true UTC, never with smeared leap seconds. If a
> + * guest wishes to construct a smeared clock, it can do so. Presenting a
> + * smeared clock through this interface would be problematic because it
> + * actually messes with the apparent counter *period*. A linear smearing
> + * of 1 ms per second would effectively tweak the counter period by 1000PPM
> + * at the start/end of the smearing period, while a sinusoidal smear would
> + * basically be impossible to represent.
> + *
> + * This structure is offered with the intent that it be adopted into the
> + * nascent virtio-rtc standard, as a virtio-rtc that does not address the live
> + * migration problem seems a little less than fit for purpose. For that
> + * reason, certain fields use precisely the same numeric definitions as in
> + * the virtio-rtc proposal. The structure can also be exposed through an ACPI
> + * device with the CID "VMCLOCK", modelled on the "VMGENID" device except for
> + * the fact that it uses a real _CRS to convey the address of the structure
> + * (which should be a full page, to allow for mapping directly to userspace).
> + */
> +
> +#ifndef __VMCLOCK_ABI_H__
> +#define __VMCLOCK_ABI_H__
> +
> +#ifdef __KERNEL__
> +#include 
> +#else
> +#include 
> +#endif
> +
> +struct vmclock_abi {
> + /* CONSTANT FIELDS */
> + uint32_t magic;
> +#define VMCLOCK_MAGIC    0x4b4c4356 /* "VCLK" */
> + uint32_t size;  /* Size of region containing this structure */
> + uint16_t version;   /* 1 */
> + uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except INVALID */
> +#define VMCLOCK_COUNTER_ARM_VCNT 0
> +#define VMCLOCK_COUNTER_X86_TSC  1
> +#define VMCLOCK_COUNTER_INVALID  0xff
> + uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */
> +#define VMCLOCK_TIME_UTC 0   /* Since 1970-01-01 00:00:00z */
> +#define VMCLOCK_TIME_TAI 1   /* Since 1970-01-01 00:00:00z */
> +#define VMCLOCK_TIME_MONOTONIC   2   /* Since undefined epoch */
> +#define VMCLOCK_TIME_INVALID_SMEARED 3   /* Not supported */
> +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED   4   /* Not supported */
> +
> + /* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */
> + uint32_t seq_count; /* Low bit means an update is in progress */
> + /*
> +  * This 

Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-26 Thread Michael S. Tsirkin
On Fri, Jul 26, 2024 at 02:00:25PM +0100, David Woodhouse wrote:
> On Fri, 2024-07-26 at 08:52 -0400, Michael S. Tsirkin wrote:
> > On Fri, Jul 26, 2024 at 09:35:51AM +0100, David Woodhouse wrote:
> > > But for this use case, we only need a memory region that the hypervisor
> > > can update. We don't need any of that complexity of gratuitously
> > > interrupting all the vCPUs just to ensure that none of them can be
> > > running userspace while one of them does an update for itself,
> > > potentially translating from one ABI to another. The hypervisor can
> > > just update the user-visible memory in place.
> > 
> > Looks like then your userspace is hypervisor specific, and that's a
> > problem because it's a one way street - there is no way for hypervisor
> > > to know what userspace needs, so no way for hypervisor to know which
> > information to provide. No real way to fix bugs.
> 
> It's not hypervisor specific, but you're right that as it stands there
> is no negotiation of what userspace wants. So the hypervisor provides
> what it feels it can provide without significant overhead (which may or
> may not include the precise timekeeping, as discussed, but should
> always include the disruption signal which is the most important
> thing).
> 
> The guest *does* know what the hypervisor provides. And when we get to
> do this in virtio, we get all the goodness of negotiation as well. The
> existence of the simple ACPI model doesn't hurt that at all.

Maybe it doesn't, at that. E.g. virtio does a copy, acpi doesn't?
I'll ponder compatibility over the weekend.

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-26 Thread Michael S. Tsirkin
On Fri, Jul 26, 2024 at 09:35:51AM +0100, David Woodhouse wrote:
> But for this use case, we only need a memory region that the hypervisor
> can update. We don't need any of that complexity of gratuitously
> interrupting all the vCPUs just to ensure that none of them can be
> running userspace while one of them does an update for itself,
> potentially translating from one ABI to another. The hypervisor can
> just update the user-visible memory in place.

Looks like then your userspace is hypervisor specific, and that's a
problem because it's a one way street - there is no way for hypervisor
to know what userspace needs, so no way for hypervisor to know which
information to provide. No real way to fix bugs.

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-26 Thread Michael S. Tsirkin
On Fri, Jul 26, 2024 at 09:06:29AM +0100, David Woodhouse wrote:
> That's great. You don't even need it to be per-vCPU if you let the
> hypervisor write directly to the single physical location that's mapped
> to userspace. It can do that before it even starts *running* the vCPUs
> after migration. It's a whole lot simpler. 

It *seems* simpler, until you realize that there is no way
to change anything in the interface, there is no negotiation
between hypervisor and userspace. If I learned anything at all
in tens of years working on software, it's that it is
never done. So let's have userspace talk to the kernel
and have kernel talk to the devices, please. There's
no compelling reason to have this bypass here.

-- 
MST




Re: [PATCH net-next v3 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-07-26 Thread Michael S. Tsirkin
On Wed, Jul 10, 2024 at 11:03:42AM +0800, Jason Wang wrote:
> On Tue, Jul 9, 2024 at 9:28 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jul 09, 2024 at 04:02:14PM +0800, Jason Wang wrote:
> > > > This patch synchronizes operstate with admin state per RFC2863.
> > > >
> > > > This is done by trying to toggle the carrier upon open/close and
> > > > synchronizing with the config change work. This allows propagating
> > > > status correctly to stacked devices like:
> > >
> > > ip link add link enp0s3 macvlan0 type macvlan
> > > ip link set link enp0s3 down
> > > ip link show
> > >
> > > Before this patch:
> > >
> > > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
> > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > ..
> > > > 5: macvlan0@enp0s3:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > >
> > > After this patch:
> > >
> > > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
> > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > ...
> > > > 5: macvlan0@enp0s3:  mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> >
> > I think that the commit log is confusing. It seems to say that
> > the issue fixed is synchronizing state with hardware
> > config change.
> > But your example does not show any
> > hardware change. Isn't this example really just
> > a side effect of setting carrier off on close?
> 
> The main goal for this patch is to make virtio-net follow RFC2863. The
> main thing that is missed is to synchronize the operstate with admin
> state, if we do this, we get several good results, one of the obvious
> one is to allow virtio-net to propagate status to the upper layer, for
> example if the admin state of the lower virtio-net is down it should
> be propagated to the macvlan on top, so I give the example of using a
> stacked device. I'm not we had others but the commit log is probably
> too small to say all of it.
> 
> >
> >
> > > Cc: Venkat Venkatsubra 
> > > Cc: Gia-Khanh Nguyen 
> > > Signed-off-by: Jason Wang 
> >
> > Yes but this just forces lots of re-reads of config on each
> > open/close for no good reason.
> 
> Does it really harm? Technically the link status could be changed
> several times when the admin state is down as well.

It's a bunch of extra vmexits on each VM boot, yes.

> > Config interrupt is handled in core, you can read once
> > on probe and then handle config changes.
> 
> Per RFC2863, the code tries to avoid dealing with any operstate change
> via config space read when the admin state is down.

what exactly in RFC2863 are you referring to? This?
   (2)   if ifAdminStatus is down, then ifOperStatus will normally also
 be down (or notPresent) i.e., there is not (necessarily) a
 fault condition on the interface.
So basically, just call virtio_config_driver_disable on close,
and then config interrupt will not trigger.
Why is that not enough?
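
To make the expected behaviour concrete, here is a toy model of the RFC 2863 rule quoted above together with the macvlan example from the commit log. It is a sketch under stated assumptions, not the kernel's operstate logic: struct demo_if, enum demo_oper and demo_operstate() are made-up names, and only three of the RFC's states are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Subset of the RFC 2863 ifOperStatus values, for illustration only. */
enum demo_oper {
        DEMO_OPER_DOWN,
        DEMO_OPER_LOWERLAYERDOWN,
        DEMO_OPER_UP,
};

/* Hypothetical interface model: not a kernel structure. */
struct demo_if {
        bool admin_up;                  /* ifAdminStatus */
        bool carrier;                   /* link state reported by the device */
        const struct demo_if *lower;    /* NULL unless stacked (e.g. macvlan) */
};

enum demo_oper demo_operstate(const struct demo_if *dev)
{
        /* Rule (2): admin down means oper down, with no fault implied. */
        if (!dev->admin_up)
                return DEMO_OPER_DOWN;

        /* A stacked device follows the state of the device below it. */
        if (dev->lower)
                return demo_operstate(dev->lower) == DEMO_OPER_UP ?
                        DEMO_OPER_UP : DEMO_OPER_LOWERLAYERDOWN;

        return dev->carrier ? DEMO_OPER_UP : DEMO_OPER_DOWN;
}
```

In this model, setting the lower device admin down gives DOWN for it and LOWERLAYERDOWN for the macvlan on top, which is the "after this patch" output. Note that the model never consults carrier while admin is down, so it does not need a config re-read on close.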


> >
> >
> >
> >
> >
> > > ---
> > >  drivers/net/virtio_net.c | 64 
> > >  1 file changed, 38 insertions(+), 26 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 0b4747e81464..e6626ba25b29 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -2476,6 +2476,25 @@ static void virtnet_cancel_dim(struct virtnet_info 
> > > *vi, struct dim *dim)
> > >   net_dim_work_cancel(dim);
> > >  }
> > >
> > > +static void virtnet_update_settings(struct virtnet_info *vi)
> > > +{
> > > + u32 speed;
> > > + u8 duplex;
> > > +
> > > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> > > + return;
> > > +
> > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> > > +
> > > + if (ethtool_validate_speed(speed))
> > > + vi->speed = speed;
> > > +
> > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> > > +
> > > + if (ethtool_validate_duplex(duplex))
> > > + vi-

Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-26 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 11:20:56PM +0100, David Woodhouse wrote:
> We're rolling out the AMZNVCLK device for internal use cases, and plan
> to add it in public instances some time later.

Let's be real. If amazon does something in its own hypervisor, and the
only way to use that is to expose the interface to userspace, there is
very little the linux community can do.  Moreover, userspace will be
written to this ABI, and be locked in to the specific hypervisor. It
might be a win for amazon short term but long term you will want to
extend things and it will be a mess.

So I feel you have chosen ACPI badly.  It just does not have the APIs
that you need. Virtio does, and would not create a userspace lock-in
to a specific hypervisor. It's not really virtio specific either,
you can write a bare pci device with a BAR and a bunch of msix
vectors and it will get you the same effect.

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-25 Thread Michael S. Tsirkin
On Fri, Jul 26, 2024 at 01:09:24AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2024 at 10:29:18PM +0100, David Woodhouse wrote:
> > > > > Then can't we fix it by interrupting all CPUs right after LM?
> > > > > 
> > > > > To me that seems like a cleaner approach - we then compartmentalize
> > > > > the ABI issue - kernel has its own ABI against userspace,
> > > > > devices have their own ABI against kernel.
> > > > > It'd mean we need a way to detect that interrupt was sent,
> > > > > maybe yet another counter inside that structure.
> > > > > 
> > > > > WDYT?
> > > > > 
> > > > > By the way the same idea would work for snapshots -
> > > > > some people wanted to expose that info to userspace, too.
> > 
> > Those people included me. I wanted to interrupt all the vCPUs, even the
> > ones which were in userspace at the moment of migration, and have the
> > kernel deal with passing it on to userspace via a different ABI.
> > 
> > It ends up being complex and intricate, and requiring a lot of new
> > kernel and userspace support. I gave up on it in the end for snapshots,
> > and didn't go there again for this.
> 
> Maybe because you insist on using ACPI?
> I see a fairly simple way to do it. For example, with virtio:
> 
> one vq per CPU, with a single outstanding buffer,
> callback copies from the buffer into the userspace
> visible memory.
> 
> Want me to show you the code?

Couldn't resist, so I wrote a bit of this code.
Fundamentally, we keep a copy of the hypervisor abi
in the device:

struct virtclk_info {
        struct vmclock_abi abi;
        spinlock_t lock;
};

each vq will have its own copy:

struct virtqueue_info {
        struct scatterlist sg[1];
        struct vmclock_abi abi;
};

we add it during probe:
        sg_init_one(vqi->sg, &vqi->abi, sizeof(vqi->abi));
        virtqueue_add_inbuf(vq,
                            vqi->sg, 1,
                            &vqi->abi,
                            GFP_ATOMIC);



We set the affinity for each vq:

   for (i = 0; i < num_online_cpus(); i++)
   virtqueue_set_affinity(vi->vq[i], i);

(virtio net does it, and it handles cpu hotplug as well)

each vq callback would do:

static void vmclock_cb(struct virtqueue *vq)
{
        struct virtclk_info *vci = vq->vdev->priv;
        struct virtqueue_info *vqi = vq->priv;
        void *buf;
        unsigned int len;

        buf = virtqueue_get_buf(vq, &len);
        if (!buf)
                return;

        BUG_ON(buf != &vqi->abi);

        spin_lock(&vci->lock);
        /* Refresh the device-wide copy only if this vq saw new data */
        if (memcmp(&vci->abi, &vqi->abi, sizeof(vqi->abi))) {
                memcpy(&vci->abi, &vqi->abi, sizeof(vqi->abi));
        }

        /* Update the userspace visible structure now */
        ...

        /* Re-add the buffer */
        virtqueue_add_inbuf(vq,
                            vqi->sg, 1,
                            &vqi->abi,
                            GFP_ATOMIC);

        spin_unlock(&vci->lock);
}

That's it!
Where's the problem here?
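
For completeness, the "Update the userspace visible structure now" step could publish through a sequence count, in the same spirit as the seq_count field of the proposed vmclock_abi (low bit set means an update is in progress). The following is a rough userspace-testable sketch using C11 atomics, not kernel code. Everything named demo_* is hypothetical, and the two payload fields are placeholders rather than fields from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the page mapped to userspace. */
struct demo_clock {
        _Atomic uint32_t seq_count;     /* low bit set: update in progress */
        _Atomic uint64_t counter_value; /* placeholder payload */
        _Atomic uint64_t time_sec;      /* placeholder payload */
};

struct demo_snapshot {
        uint64_t counter_value;
        uint64_t time_sec;
};

/* Writer: assumes a single writer at a time (e.g. under the lock above). */
void demo_publish(struct demo_clock *c, uint64_t counter, uint64_t sec)
{
        uint32_t seq = atomic_load_explicit(&c->seq_count,
                                            memory_order_relaxed);

        /* Make the count odd before touching the payload. */
        atomic_store_explicit(&c->seq_count, seq + 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_release);

        atomic_store_explicit(&c->counter_value, counter,
                              memory_order_relaxed);
        atomic_store_explicit(&c->time_sec, sec, memory_order_relaxed);

        /* Make the count even again; this publishes the payload. */
        atomic_store_explicit(&c->seq_count, seq + 2, memory_order_release);
}

/* Reader: one attempt. Returns false if the caller has to retry. */
bool demo_try_read(struct demo_clock *c, struct demo_snapshot *out)
{
        uint32_t before = atomic_load_explicit(&c->seq_count,
                                               memory_order_acquire);

        if (before & 1)
                return false;   /* writer is mid-update */

        out->counter_value = atomic_load_explicit(&c->counter_value,
                                                  memory_order_relaxed);
        out->time_sec = atomic_load_explicit(&c->time_sec,
                                             memory_order_relaxed);

        /* Order the payload loads before the second count check. */
        atomic_thread_fence(memory_order_acquire);
        return atomic_load_explicit(&c->seq_count,
                                    memory_order_relaxed) == before;
}
```

A reader would normally call demo_try_read() in a loop until it returns true. The point is only that readers never need to be interrupted; they notice a concurrent update and retry.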

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 10:29:18PM +0100, David Woodhouse wrote:
> > > > Then can't we fix it by interrupting all CPUs right after LM?
> > > > 
> > > > To me that seems like a cleaner approach - we then compartmentalize
> > > > the ABI issue - kernel has its own ABI against userspace,
> > > > devices have their own ABI against kernel.
> > > > It'd mean we need a way to detect that interrupt was sent,
> > > > maybe yet another counter inside that structure.
> > > > 
> > > > WDYT?
> > > > 
> > > > By the way the same idea would work for snapshots -
> > > > some people wanted to expose that info to userspace, too.
> 
> Those people included me. I wanted to interrupt all the vCPUs, even the
> ones which were in userspace at the moment of migration, and have the
> kernel deal with passing it on to userspace via a different ABI.
> 
> It ends up being complex and intricate, and requiring a lot of new
> kernel and userspace support. I gave up on it in the end for snapshots,
> and didn't go there again for this.

Maybe because you insist on using ACPI?
I see a fairly simple way to do it. For example, with virtio:

one vq per CPU, with a single outstanding buffer,
callback copies from the buffer into the userspace
visible memory.

Want me to show you the code?

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 10:29:18PM +0100, David Woodhouse wrote:
> > > > Then can't we fix it by interrupting all CPUs right after LM?
> > > > 
> > > > To me that seems like a cleaner approach - we then compartmentalize
> > > > the ABI issue - kernel has its own ABI against userspace,
> > > > devices have their own ABI against kernel.
> > > > It'd mean we need a way to detect that interrupt was sent,
> > > > maybe yet another counter inside that structure.
> > > > 
> > > > WDYT?
> > > > 
> > > > By the way the same idea would work for snapshots -
> > > > some people wanted to expose that info to userspace, too.
> 
> Those people included me. I wanted to interrupt all the vCPUs, even the
> ones which were in userspace at the moment of migration, and have the
> kernel deal with passing it on to userspace via a different ABI.
> 
> It ends up being complex and intricate, and requiring a lot of new
> kernel and userspace support. I gave up on it in the end for snapshots,
> and didn't go there again for this.

ok I believe you, I am just curious how come you need userspace
support - what I imagine would live completely in kernel ...


> By contrast, a driver which merely exposes a page of MMIO space
> identified by an ACPI device (without even the in-kernel PTP support)
> could probably be fewer than a hundred lines of code. In an externally-
> buildable module that goes back as far as RHEL8 or even further,
> allowing users to just build and use it from their application.
> 
> > was there supposed to be text here, or did you just like this
> > so much you decided to repost my mail ;) 
> 
> Hm, weirdness. I've known Evolution get into a state where it sends
> completely *empty* messages, but I've never seen it eat only my own
> part before. I had definitely typed responses (along the lines of the
> above) last time.

mutt sucks less ;)




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 10:00:24PM +0100, David Woodhouse wrote:
> On Thu, 2024-07-25 at 16:50 -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 25, 2024 at 08:35:40PM +0100, David Woodhouse wrote:
> > > On Thu, 2024-07-25 at 12:38 -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Jul 25, 2024 at 04:18:43PM +0100, David Woodhouse wrote:
> > > > > The use case isn't necessarily for all users of gettimeofday(), of
> > > > > course; this is for those applications which *need* precision time.
> > > > > Like distributed databases which rely on timestamps for coherency, and
> > > > > users who get fined millions of dollars when LM messes up their clocks
> > > > > and they put wrong timestamps on financial transactions.
> > > > 
> > > > I would however worry that with all this pass through,
> > > > applications have to be coded to each hypervisor or even
> > > > version of the hypervisor.
> > > 
> > > Yes, that would be a problem. Which is why I feel it's so important to
> > > harmonise the contents of the shared memory, and I'm implementing it
> > > both QEMU and $DAYJOB, as well as aligning with virtio-rtc.
> > 
> > 
> > Writing an actual spec for this would be another thing that might help.
> > 
> 
> > > I don't think the structure should be changing between hypervisors (and
> > > especially versions). We *will* see a progression from simply providing
> > > the disruption signal, to providing the full clock information so that
> > > guests don't have to abort transactions while they resync their clock.
> > > But that's perfectly fine.
> > > 
> > > And it's also entirely agnostic to the mechanism by which the memory
> > > region is *discovered*. It doesn't matter if it's ACPI, DT, a
> > > hypervisor enlightenment, a BAR of a simple PCI device, virtio, or
> > > anything else.
> > > 
> > > ACPI is one of the *simplest* options for a hypervisor and guest to
> > > implement, and doesn't prevent us from using the same structure in
> > > virtio-rtc. I'm happy enough using ACPI and letting virtio-rtc come
> > > along later.
> > > 
> > > > virtio has been developed with the painful experience that we keep
> > > > making mistakes, or coming up with new needed features,
> > > > and that maintaining forward and backward compatibility
> > > > becomes a whole lot harder than it seems in the beginning.
> > > 
> > > Yes. But as you note, this shared memory structure is a userspace ABI
> > > all of its own, so we get to make a completely *different* kind of
> > > mistake :)
> > > 
> > 
> > 
> > So, something I still don't completely understand.
> > Can't the VDSO thing be written to by kernel?
> > Let's say on LM, an interrupt triggers and kernel copies
> > data from a specific device to the VDSO.
> > 
> > Is that problematic somehow? I imagine there is a race where
> > userspace reads vdso after lm but before kernel updated
> > vdso - is that the concern?
> > 
> > Then can't we fix it by interrupting all CPUs right after LM?
> > 
> > To me that seems like a cleaner approach - we then compartmentalize
> > the ABI issue - kernel has its own ABI against userspace,
> > devices have their own ABI against kernel.
> > It'd mean we need a way to detect that interrupt was sent,
> > maybe yet another counter inside that structure.
> > 
> > WDYT?
> > 
> > By the way the same idea would work for snapshots -
> > some people wanted to expose that info to userspace, too.
> > 
> 



was there supposed to be text here, or did you just like this
so much you decided to repost my mail ;) 

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 08:35:40PM +0100, David Woodhouse wrote:
> On Thu, 2024-07-25 at 12:38 -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 25, 2024 at 04:18:43PM +0100, David Woodhouse wrote:
> > > The use case isn't necessarily for all users of gettimeofday(), of
> > > course; this is for those applications which *need* precision time.
> > > Like distributed databases which rely on timestamps for coherency, and
> > > users who get fined millions of dollars when LM messes up their clocks
> > > and they put wrong timestamps on financial transactions.
> > 
> > I would however worry that with all this pass through,
> > applications have to be coded to each hypervisor or even
> > version of the hypervisor.
> 
> Yes, that would be a problem. Which is why I feel it's so important to
> harmonise the contents of the shared memory, and I'm implementing it in
> both QEMU and $DAYJOB, as well as aligning with virtio-rtc.


Writing an actual spec for this would be another thing that might help.

> I don't think the structure should be changing between hypervisors (and
> especially versions). We *will* see a progression from simply providing
> the disruption signal, to providing the full clock information so that
> guests don't have to abort transactions while they resync their clock.
> But that's perfectly fine.
> 
> And it's also entirely agnostic to the mechanism by which the memory
> region is *discovered*. It doesn't matter if it's ACPI, DT, a
> hypervisor enlightenment, a BAR of a simple PCI device, virtio, or
> anything else.
> 
> ACPI is one of the *simplest* options for a hypervisor and guest to
> implement, and doesn't prevent us from using the same structure in
> virtio-rtc. I'm happy enough using ACPI and letting virtio-rtc come
> along later.
> 
> > virtio has been developed with the painful experience that we keep
> > making mistakes, or coming up with new needed features,
> > and that maintaining forward and backward compatibility
> > becomes a whole lot harder than it seems in the beginning.
> 
> Yes. But as you note, this shared memory structure is a userspace ABI
> all of its own, so we get to make a completely *different* kind of
> mistake :)
> 


So, something I still don't completely understand.
Can't the vDSO data be written by the kernel?
Let's say on LM, an interrupt triggers and the kernel copies
data from a specific device to the vDSO.

Is that problematic somehow? I imagine there is a race where
userspace reads the vDSO after LM but before the kernel has
updated it - is that the concern?

Then can't we fix it by interrupting all CPUs right after LM?

To me that seems like a cleaner approach - we then compartmentalize
the ABI issue - kernel has its own ABI against userspace,
devices have their own ABI against kernel.
It'd mean we need a way to detect that interrupt was sent,
maybe yet another counter inside that structure.

WDYT?

By the way the same idea would work for snapshots -
some people wanted to expose that info to userspace, too.
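
To make the "yet another counter" idea concrete, here is a minimal
sketch of a kernel-owned page that userspace reads seqlock-style. All
names (struct clock_page, clock_page_update(), clock_page_read()) are
hypothetical and not taken from the patch; a single writer is assumed.
Note that the counter only lets userspace detect that an update
happened. It does not by itself close the window between LM and the
interrupt being handled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page owned by the kernel and mapped read-only by userspace. */
struct clock_page {
	_Atomic uint32_t seq;          /* odd while an update is in progress */
	_Atomic uint64_t disruptions;  /* bumped once per LM or snapshot */
	_Atomic uint64_t time_ns;      /* device-independent clock payload */
};

/* Kernel side: called from the (assumed) post-LM interrupt handler. */
static void clock_page_update(struct clock_page *p, uint64_t now_ns,
			      bool disrupted)
{
	uint32_t s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	/* Mark the page as unstable before touching the payload. */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	if (disrupted)
		atomic_fetch_add_explicit(&p->disruptions, 1,
					  memory_order_relaxed);
	atomic_store_explicit(&p->time_ns, now_ns, memory_order_relaxed);

	/* Publish: seq becomes even again. */
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Userspace side: retry until a consistent snapshot has been read. */
static uint64_t clock_page_read(struct clock_page *p, uint64_t *disruptions)
{
	uint32_t s1, s2;
	uint64_t t;

	do {
		s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		*disruptions = atomic_load_explicit(&p->disruptions,
						    memory_order_relaxed);
		t = atomic_load_explicit(&p->time_ns, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);

	return t;
}
```

An application would remember the last disruptions value it saw and
treat any change as "the clock was disrupted since my last read".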

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 04:18:43PM +0100, David Woodhouse wrote:
> On Thu, 2024-07-25 at 10:11 -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 25, 2024 at 02:50:50PM +0100, David Woodhouse wrote:
> > > Even if the virtio-rtc specification were official today, and I was
> > > able to expose it via PCI, I probably wouldn't do it that way. There's
> > > just far more in virtio-rtc than we need; the simple shared memory
> > > region is perfectly sufficient for most needs, and especially ours.
> > 
> > I can't stop amazon from shipping whatever in its hypervisor,
> > I'd just like to understand this better, if there is a use-case
> > not addressed here then we can change virtio to address it.
> > 
> > The rtc driver patch posted is 900 lines, yours is 700 lines, does not
> > look like a big difference.  As for using a memory region, this is
> > valid, but maybe rtc should be changed to do exactly that?
> 
> I'm certainly aiming for virtio-rtc to include that as an *option*,
> because I don't think it makes sense for an RTC specification
> aimed at virtual machines *not* to deal with the live migration
> problem.
> 
> AFAICT the only ways to deal with the LM problem are either to make a
> hypercall/virtio transaction for *every* clock read which needs to be
> accurate, or expose a memory region for the guest to do it "vDSO-
> style".

virtio can support the second option, we already have
VIRTIO_PCI_CAP_SHARED_MEMORY_CFG, I'd just use it.
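
For reference, a minimal sketch of what decoding such a capability
could look like. The struct layout follows the virtio PCI capability
format, where shared memory regions use the 64-bit variant. The helper
names shm_region_offset() and shm_region_length() are made up for
illustration, and the fields are treated as host-endian here for
simplicity (in config space they are little-endian).

```c
#include <stdint.h>

/* cfg_type value identifying a shared memory region capability. */
#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8

/* Generic virtio PCI capability as laid out in PCI config space. */
struct virtio_pci_cap {
	uint8_t cap_vndr;    /* vendor-specific capability ID */
	uint8_t cap_next;    /* link to the next capability */
	uint8_t cap_len;     /* length of this capability */
	uint8_t cfg_type;    /* which structure this capability describes */
	uint8_t bar;         /* BAR that holds the region */
	uint8_t id;          /* distinguishes several regions of one type */
	uint8_t padding[2];
	uint32_t offset;     /* low 32 bits of the offset within the BAR */
	uint32_t length;     /* low 32 bits of the region length */
};

/* 64-bit variant, used for shared memory regions. */
struct virtio_pci_cap64 {
	struct virtio_pci_cap cap;
	uint32_t offset_hi;  /* high 32 bits of the offset */
	uint32_t length_hi;  /* high 32 bits of the length */
};

/* Hypothetical helper: full 64-bit offset of the region within its BAR. */
static uint64_t shm_region_offset(const struct virtio_pci_cap64 *c)
{
	return ((uint64_t)c->offset_hi << 32) | c->cap.offset;
}

/* Hypothetical helper: full 64-bit length of the region. */
static uint64_t shm_region_length(const struct virtio_pci_cap64 *c)
{
	return ((uint64_t)c->length_hi << 32) | c->cap.length;
}
```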


> And similarly, unless we want guest userspace to have to make a
> *system* call every time, that memory region needs to be mappable all
> the way to userspace.

This part is classic for PCI: mapping a PCI BAR has been well
studied.


> The use case isn't necessarily for all users of gettimeofday(), of
> course; this is for those applications which *need* precision time.
> Like distributed databases which rely on timestamps for coherency, and
> users who get fined millions of dollars when LM messes up their clocks
> and they put wrong timestamps on financial transactions.

I would however worry that with all this pass through,
applications have to be coded to each hypervisor or even
version of the hypervisor.

I don't really know the use-case well enough - is sending
an interrupt to Linux and having Linux create a
device-independent structure not workable?


> > E.g. we can easily add a capability describing such a region.
> > or put it in device config space.
> 
> I think it has to be memory, not config space. But yes.

virtio config space, which is just a region in a BAR.
But yes, maybe VIRTIO_PCI_CAP_SHARED_MEMORY_CFG is cleaner.

> The intent is that my driver would be usable with the shared memory
> region from a virtio-rtc device too. It'd need a tiny amount of
> refactoring of the discovery code in vmclock_probe(), which I haven't
> done yet as it would be premature optimisation. 
> 
> > I mean yes, we can build a new transport for each specific need but in
> > the end we'll get a ton of interfaces with unclear compatibility
> > requirements.  If effort is instead spent improving common interfaces,
> > we get consistency and everyone benefits. That's why I'm trying to
> > understand the need here.
> 
> It's simplicity. Because this isn't even a "transport". It's just a
> simple breadcrumb given to the guest to tell it where the information
> is.
> In the fullness of time assuming this becomes part of virtio-rtc too,
> the fact that it can *also* be discovered by ACPI is just a tiny
> detail. And it allows hypervisors to implement it a *whole* lot more
> simply.
> 
> The addition of an ACPI method to enable the timekeeping does make it a
> tiny bit more than a 'breadcrumb', I concede — but that's still
> basically trivial to implement. A whole lot simpler than a full virtio
> device.

virtio has been developed with the painful experience that we keep
making mistakes, or coming up with new needed features,
and that maintaining forward and backward compatibility
becomes a whole lot harder than it seems in the beginning.

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 02:50:50PM +0100, David Woodhouse wrote:
> Even if the virtio-rtc specification were official today, and I was
> able to expose it via PCI, I probably wouldn't do it that way. There's
> just far more in virtio-rtc than we need; the simple shared memory
> region is perfectly sufficient for most needs, and especially ours.

I can't stop amazon from shipping whatever in its hypervisor,
I'd just like to understand this better, if there is a use-case
not addressed here then we can change virtio to address it.

The rtc driver patch posted is 900 lines, yours is 700 lines, does not
look like a big difference.  As for using a memory region, this is
valid, but maybe rtc should be changed to do exactly that?
E.g. we can easily add a capability describing such a region.
or put it in device config space.

I mean yes, we can build a new transport for each specific need but in
the end we'll get a ton of interfaces with unclear compatibility
requirements.  If effort is instead spent improving common interfaces,
we get consistency and everyone benefits. That's why I'm trying to
understand the need here.

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 01:31:19PM +0100, David Woodhouse wrote:
> On Thu, 2024-07-25 at 08:29 -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 25, 2024 at 01:27:49PM +0100, David Woodhouse wrote:
> > > On Thu, 2024-07-25 at 08:17 -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Jul 25, 2024 at 10:56:05AM +0100, David Woodhouse wrote:
> > > > > > Do you want to just help complete virtio-rtc then? Would be easier than
> > > > > > trying to keep two specs in sync.
> > > > > 
> > > > > The ACPI version is much more lightweight and doesn't take up a
> > > > > valuable PCI slot#. (I know, you can do virtio without PCI but that's
> > > > > complex in other ways).
> > > > > 
> > > > 
> > > > Hmm, should we support virtio over ACPI? Just asking.
> > > 
> > > Given that we support virtio DT bindings, and the ACPI "PRP0001" device
> > > exists with a DSM method which literally returns DT properties,
> > > including such properties as "compatible=virtio,mmio" ... do we
> > > already?
> > > 
> > > 
> > 
> > In a sense, but you are saying that is too complex?
> > Can you elaborate?
> 
> No, I think it's fine. I encourage the use of the PRP0001 device to
> expose DT devices through ACPI. I was just reminding you of its
> existence.
> 
> 

Confused. You said "I know, you can do virtio without PCI but that's
complex in other ways" as the explanation why you are doing a custom
protocol.

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 01:27:49PM +0100, David Woodhouse wrote:
> On Thu, 2024-07-25 at 08:17 -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 25, 2024 at 10:56:05AM +0100, David Woodhouse wrote:
> > > > Do you want to just help complete virtio-rtc then? Would be easier than
> > > > trying to keep two specs in sync.
> > > 
> > > The ACPI version is much more lightweight and doesn't take up a
> > > valuable PCI slot#. (I know, you can do virtio without PCI but that's
> > > complex in other ways).
> > > 
> > 
> > Hmm, should we support virtio over ACPI? Just asking.
> 
> Given that we support virtio DT bindings, and the ACPI "PRP0001" device
> exists with a DSM method which literally returns DT properties,
> including such properties as "compatible=virtio,mmio" ... do we
> already?
> 
> 

In a sense, but you are saying that is too complex?
Can you elaborate?

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 10:56:05AM +0100, David Woodhouse wrote:
> > Do you want to just help complete virtio-rtc then? Would be easier than
> > trying to keep two specs in sync.
> 
> The ACPI version is much more lightweight and doesn't take up a
> valuable PCI slot#. (I know, you can do virtio without PCI but that's
> complex in other ways).
> 

Hmm, should we support virtio over ACPI? Just asking.

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-24 Thread Michael S. Tsirkin
On Wed, Jul 24, 2024 at 06:16:37PM +0100, David Woodhouse wrote:
> From: David Woodhouse 
> 
> The vmclock "device" provides a shared memory region with precision clock
> information. By using shared memory, it is safe across Live Migration.
> 
> Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> actually helpful.
> 
> The memory region of the device is also exposed to userspace so it can be
> read or memory mapped by applications which need reliable notification of
> clock disruptions.
> 
> Signed-off-by: David Woodhouse 

one other thing worth mentioning is that this design can't work
with confidential computing setups. By comparison, mapping e.g. a
range in a PCI BAR would work for these setups.
Is there a reason this functionality is not interesting for
confidential VMs?

-- 
MST




Re: [PATCH] ptp: Add vDSO-style vmclock support

2024-07-24 Thread Michael S. Tsirkin
On Wed, Jul 24, 2024 at 06:16:37PM +0100, David Woodhouse wrote:
> From: David Woodhouse 
> 
> The vmclock "device" provides a shared memory region with precision clock
> information. By using shared memory, it is safe across Live Migration.
> 
> Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> actually helpful.
> 
> The memory region of the device is also exposed to userspace so it can be
> read or memory mapped by applications which need reliable notification of
> clock disruptions.
> 
> Signed-off-by: David Woodhouse 
> ---
> QEMU implementation at
> https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/vmclock
> 
> Although the ACPI device implemented in QEMU (and some other
> hypervisor) stands alone, most of the fields and values herein are
> aligned as much as possible with the nascent virtio-rtc specification,
> with the intent that a version of the same structure can be
> incorporated into that standard.

Do you want to just help complete virtio-rtc then? Would be easier than
trying to keep two specs in sync.


> v1:
>  • Change absolute error fields to nanoseconds
>  • Update leap second definition to match virtio-rtc intentions in 
> 
> https://lore.kernel.org/all/85c93b42-41a2-42c4-a168-55079bbff...@opensynergy.com
> 
> RFC v4:
>  • Add esterror fields, MONOTONIC flag.
>  • Reduce seq_count to 32 bits
>  • Expand size to permit 64KiB pages
>  • Align with virtio-rtc fields, values and leap handling
>  • Drop gettime() method (since we have gettimex())
>  • Add leap second smearing hint
>  • Use a real _CRS on the ACPI device
> 
> RFC v3: (wrong patch sent)
> 
> RFC v2:
>  • Add gettimex64() support
>  • Convert TSC values to KVM clock when appropriate
>  • Require int128 support
>  • Add counter_period_shift
>  • Add timeout when seq_count is invalid
>  • Add flags field
>  • Better comments in vmclock ABI structure
>  • Explicitly forbid smearing (as clock rates would need to change)
> 
> 
>  drivers/ptp/Kconfig  |  13 +
>  drivers/ptp/Makefile |   1 +
>  drivers/ptp/ptp_vmclock.c| 567 +++
>  include/uapi/linux/vmclock-abi.h | 187 ++
>  4 files changed, 768 insertions(+)
>  create mode 100644 drivers/ptp/ptp_vmclock.c
>  create mode 100644 include/uapi/linux/vmclock-abi.h
> 
> diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
> index 604541dcb320..e98c9767e0ef 100644
> --- a/drivers/ptp/Kconfig
> +++ b/drivers/ptp/Kconfig
> @@ -131,6 +131,19 @@ config PTP_1588_CLOCK_KVM
> To compile this driver as a module, choose M here: the module
> will be called ptp_kvm.
>  
> +config PTP_1588_CLOCK_VMCLOCK
> + tristate "Virtual machine PTP clock"
> + depends on X86_TSC || ARM_ARCH_TIMER
> + depends on PTP_1588_CLOCK && ACPI && ARCH_SUPPORTS_INT128
> + default y
> + help
> +   This driver adds support for using a virtual precision clock
> +   advertised by the hypervisor. This clock is only useful in virtual
> +   machines where such a device is present.
> +
> +   To compile this driver as a module, choose M here: the module
> +   will be called ptp_vmclock.
> +
>  config PTP_1588_CLOCK_IDT82P33
>   tristate "IDT 82P33xxx PTP clock"
>   depends on PTP_1588_CLOCK && I2C
> diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
> index 68bf02078053..01b5cd91eb61 100644
> --- a/drivers/ptp/Makefile
> +++ b/drivers/ptp/Makefile
> @@ -11,6 +11,7 @@ obj-$(CONFIG_PTP_1588_CLOCK_DTE)+= ptp_dte.o
>  obj-$(CONFIG_PTP_1588_CLOCK_INES)+= ptp_ines.o
>  obj-$(CONFIG_PTP_1588_CLOCK_PCH) += ptp_pch.o
>  obj-$(CONFIG_PTP_1588_CLOCK_KVM) += ptp_kvm.o
> +obj-$(CONFIG_PTP_1588_CLOCK_VMCLOCK) += ptp_vmclock.o
>  obj-$(CONFIG_PTP_1588_CLOCK_QORIQ)   += ptp-qoriq.o
>  ptp-qoriq-y  += ptp_qoriq.o
>  ptp-qoriq-$(CONFIG_DEBUG_FS) += ptp_qoriq_debugfs.o
> diff --git a/drivers/ptp/ptp_vmclock.c b/drivers/ptp/ptp_vmclock.c
> new file mode 100644
> index ..9c508c21c062
> --- /dev/null
> +++ b/drivers/ptp/ptp_vmclock.c
> @@ -0,0 +1,567 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Virtual PTP 1588 clock for use with LM-safe VMclock device.
> + *
> + * Copyright © 2024 Amazon.com, Inc. or its affiliates.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include 
> +
> +#ifdef CONFIG_X86
> +#include 
> +#include 
> +#endif
> +
> +#ifdef CONFIG_KVM_GUEST
> +#define SUPPORT_KVMCLOCK
> +#endif
> +
> +static DEFINE_IDA(vmclock_ida);
> +
> +ACPI_MODULE_NAME("vmclock");
> +
> +struct vmclock_state {
> + struct resource res;
> + struct vmclock_abi *clk;
> + struct miscdevice miscdev;
> + struct ptp_clock_info ptp_clock_info;
> + struct ptp_clock *ptp_clock;
> 

Re: [PATCH] tools/virtio:Fix the wrong format specifier

2024-07-24 Thread Michael S. Tsirkin
On Wed, Jul 24, 2024 at 12:41:08AM -0700, Zhu Jun wrote:
> The unsigned int should use "%u" instead of "%d".
> 
> Signed-off-by: Zhu Jun 

which matters why?
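
For context on the question: the two specifiers print the same text
for small values, and only diverge once the value exceeds INT_MAX
(strictly speaking, passing an unsigned int for %d is a type mismatch
in C, even if it works on common ABIs). A small sketch of the
difference, where the helper name format_ring_size() is made up for
illustration and is not part of ringtest:

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the usage fragment with the specifier
 * that matches an unsigned int argument.
 */
static int format_ring_size(char *buf, size_t len, unsigned int ring_size)
{
	return snprintf(buf, len, "[--ring-size R (default: %u)]", ring_size);
}
```

With %u a value such as 3000000000 is printed as-is; with %d the same
bits would typically show up as a negative number.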

> ---
>  tools/virtio/ringtest/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/virtio/ringtest/main.c b/tools/virtio/ringtest/main.c
> index 5a18b2301a63..e471d8e7cfaa 100644
> --- a/tools/virtio/ringtest/main.c
> +++ b/tools/virtio/ringtest/main.c
> @@ -276,7 +276,7 @@ static void help(void)
>   fprintf(stderr, "Usage:  [--help]"
>   " [--host-affinity H]"
>   " [--guest-affinity G]"
> - " [--ring-size R (default: %d)]"
> + " [--ring-size R (default: %u)]"
>   " [--run-cycles C (default: %d)]"
>   " [--batch b]"
>   " [--outstanding o]"
> -- 
> 2.17.1
> 
> 




Re: [PATH v5 3/3] vdpa/mlx5: Add the support of set mac address

2024-07-23 Thread Michael S. Tsirkin
On Tue, Jul 23, 2024 at 07:49:44AM +, Dragos Tatulea wrote:
> On Tue, 2024-07-23 at 13:39 +0800, Cindy Lu wrote:
> > Add the function to support setting the MAC address.
> > For vdpa/mlx5, the function will use mlx5_mpfs_add_mac
> > to set the mac address
> > 
> > Tested in ConnectX-6 Dx device
> > 
> > Signed-off-by: Cindy Lu 
> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 28 
> >  1 file changed, 28 insertions(+)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index ecfc16151d61..7fce952d650f 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -3785,10 +3785,38 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> > destroy_workqueue(wq);
> > mgtdev->ndev = NULL;
> >  }
> > +static int mlx5_vdpa_set_attr(struct vdpa_mgmt_dev *v_mdev,
> > + struct vdpa_device *dev,
> > + const struct vdpa_dev_set_config *add_config)
> > +{
> > +   struct virtio_net_config *config;
> > +   struct mlx5_core_dev *pfmdev;
> > +   struct mlx5_vdpa_dev *mvdev;
> > +   struct mlx5_vdpa_net *ndev;
> > +   struct mlx5_core_dev *mdev;
> > +   int err = -EINVAL;
> > +
> > +   mvdev = to_mvdev(dev);
> > +   ndev = to_mlx5_vdpa_ndev(mvdev);
> > +   mdev = mvdev->mdev;
> > +   config = &ndev->config;
> > +
> > +   down_write(&ndev->reslock);
> > +   if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR)) {
> > +   pfmdev = pci_get_drvdata(pci_physfn(mdev->pdev));
> > +   err = mlx5_mpfs_add_mac(pfmdev, config->mac);
> > +   if (0 == err)
> if (!err) would be nicer. Not a deal breaker though.

yes, no Yoda style please. It, I can not stand.


> Reviewed-by: Dragos Tatulea 
> 
> > +   memcpy(config->mac, add_config->net.mac, ETH_ALEN);
> > +   }
> > +
> > +   up_write(&ndev->reslock);
> > +   return err;
> > +}
> >  
> >  static const struct vdpa_mgmtdev_ops mdev_ops = {
> > .dev_add = mlx5_vdpa_dev_add,
> > .dev_del = mlx5_vdpa_dev_del,
> > +   .dev_set_attr = mlx5_vdpa_set_attr,
> >  };
> >  
> >  static struct virtio_device_id id_table[] = {
> 




Re: [PATCH net-next v3 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-07-18 Thread Michael S. Tsirkin
On Fri, Jul 19, 2024 at 09:02:29AM +0800, Jason Wang wrote:
> On Wed, Jul 17, 2024 at 2:53 PM Jason Wang  wrote:
> >
> > On Wed, Jul 17, 2024 at 2:00 PM Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Jul 17, 2024 at 09:19:02AM +0800, Jason Wang wrote:
> > > > On Wed, Jul 10, 2024 at 11:03 AM Jason Wang  wrote:
> > > > >
> > > > > On Tue, Jul 9, 2024 at 9:28 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Jul 09, 2024 at 04:02:14PM +0800, Jason Wang wrote:
> > > > > > > This patch synchronizes operstate with admin state per RFC2863.
> > > > > > >
> > > > > > > This is done by trying to toggle the carrier upon open/close and
> > > > > > > synchronize with the config change work. This allows propagating
> > > > > > > status correctly to stacked devices like:
> > > > > > >
> > > > > > > ip link add link enp0s3 macvlan0 type macvlan
> > > > > > > ip link set link enp0s3 down
> > > > > > > ip link show
> > > > > > >
> > > > > > > Before this patch:
> > > > > > >
> > > > > > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state 
> > > > > > > DOWN mode DEFAULT group default qlen 1000
> > > > > > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > > > > > ..
> > > > > > > 5: macvlan0@enp0s3:  mtu 
> > > > > > > 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> > > > > > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > > > > > >
> > > > > > > After this patch:
> > > > > > >
> > > > > > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state 
> > > > > > > DOWN mode DEFAULT group default qlen 1000
> > > > > > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > > > > > ...
> > > > > > > 5: macvlan0@enp0s3:  
> > > > > > > mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group 
> > > > > > > default qlen 1000
> > > > > > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > > > > >
> > > > > > I think that the commit log is confusing. It seems to say that
> > > > > > the issue fixed is synchronizing state with hardware
> > > > > > config change.
> > > > > > But your example does not show any
> > > > > > hardware change. Isn't this example really just
> > > > > > a side effect of setting carrier off on close?
> > > > >
> > > > > The main goal for this patch is to make virtio-net follow RFC2863. The
> > > > > main thing that is missed is to synchronize the operstate with admin
> > > > > state, if we do this, we get several good results, one of the obvious
> > > > > one is to allow virtio-net to propagate status to the upper layer, for
> > > > > example if the admin state of the lower virtio-net is down it should
> > > > > be propagated to the macvlan on top, so I give the example of using a
> > > > > stacked device. I'm not we had others but the commit log is probably
> > > > > too small to say all of it.
> > > >
> > > > Michael, any more comments on this?
> > > >
> > > > Thanks
> > >
> > >
> > > Still don't get it, sorry.
> > > > > > > This is done by trying to toggle the carrier upon open/close and
> > > > > > > synchronize with the config change work.
> > > What does this sentence mean? What is not synchronized with config
> > > change that needs to be?
> >
> > I meant,
> >
> > 1) macvlan depends on the linkwatch to transfer operstate from the
> > lower device to itself.
> > 2) ndo_open()/close() will not trigger the linkwatch so we need to do
> > it by ourselves in virtio-net to make sure macvlan gets the correct
> > operstate
> > 3) consider config change work can change the state so ndo_close()
> > needs to synchronize with it
> >
> > Thanks
> 
> Michael, are you fine with the above, or did I miss something there?
> 
> Thanks


I don't understand 3. A config change can always trigger.
What I do not like is all these reads from config space
that now trigger on open/close. Previously we only read
config space:
- on probe
- after probe, if config changed


and that made sense.
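
One way to read this preference, as a rough sketch only: keep a cached
copy of the status that is refreshed on probe and on config change,
and have open/close consult the cache instead of config space. The
types and function names below are hypothetical and do not correspond
to virtio-net code; the sketch also ignores the synchronization with
the config change work raised in point 3.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINK_UP 0x1u  /* same bit value as VIRTIO_NET_S_LINK_UP */

/* Stand-in for the device; config_reads counts config space accesses. */
struct fake_device {
	uint16_t hw_status;
	unsigned int config_reads;
};

struct net_state {
	struct fake_device *dev;
	uint16_t cached_status;  /* last status read from config space */
	bool admin_up;           /* set by open, cleared by close */
};

/* The only place that touches (simulated) config space. */
static uint16_t read_config_status(struct fake_device *dev)
{
	dev->config_reads++;
	return dev->hw_status;
}

static void on_probe(struct net_state *n, struct fake_device *dev)
{
	n->dev = dev;
	n->admin_up = false;
	n->cached_status = read_config_status(dev);
}

static void on_config_change(struct net_state *n)
{
	n->cached_status = read_config_status(n->dev);
}

/* Open and close only flip the admin state; no config space read. */
static void on_open(struct net_state *n)  { n->admin_up = true; }
static void on_close(struct net_state *n) { n->admin_up = false; }

/* Carrier is reported only if admin state and cached link state agree. */
static bool carrier_ok(const struct net_state *n)
{
	return n->admin_up && (n->cached_status & LINK_UP);
}
```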

-- 
MST




Re: [PATCH V2 5/7] vhost-vdpa: VHOST_IOTLB_REMAP

2024-07-18 Thread Michael S. Tsirkin
On Thu, Jul 18, 2024 at 08:45:31AM +0800, Jason Wang wrote:
> > > For example:
> > >
> > > 1) old owner pass fd to new owner which is another process
> > > 2) the new owner do VHOST_NEW_OWNER
> > > 3) new owner doesn't do remap correctly
> > >
> > > There's no way for the old owner to remove/unpin the mappings as we
> > > have the owner check in IOTLB_UPDATE. Looks like a potential way for
> > > DOS.
> >
> > This is a bug in the second cooperating process, not a DOS.  The application
> > must fix it.  Sometimes you cannot recover from an application bug at
> > run time.
> >
> > BTW, at one time vfio enforced the concept of an owner, but Alex deleted it.
> > It adds no value, because possession of the fd is the key.
> >ffed0518d871 ("vfio: remove useless judgement")
> 
> This seems to be a great relaxation of the ownership check. I would
> like to hear from Michael first.
> 
> Thanks

It could be that the ownership model is too restrictive.
But again, this is changing a security assumption.
Looks like yet another reason to tie this to the switch to iommufd.

-- 
MST




Re: [GIT PULL] virtio: features, fixes, cleanups

2024-07-18 Thread Michael S. Tsirkin
On Wed, Jul 17, 2024 at 05:30:34AM -0400, Michael S. Tsirkin wrote:
> This is relatively small.
> I had to drop a buggy commit in the middle so some hashes
> changed from what was in linux-next.
> Deferred admin vq scalability fix to after rc2 as a minor issue was
> found with it recently, but the infrastructure for it
> is there now.

BTW I forgot to mention a merge conflict with char-misc
that is also adding an entry in MAINTAINERS.
It's trivial to resolve.


> The following changes since commit e9d22f7a6655941fc8b2b942ed354ec780936b3e:
> 
>   Merge tag 'linux_kselftest-fixes-6.10-rc7' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest 
> (2024-07-02 13:53:24 -0700)
> 
> are available in the Git repository at:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
> 
> for you to fetch changes up to 6c85d6b653caeba2ef982925703cbb4f2b3b3163:
> 
>   virtio: rename virtio_find_vqs_info() to virtio_find_vqs() (2024-07-17 
> 05:20:58 -0400)
> 
> 
> virtio: features, fixes, cleanups
> 
> Several new features here:
> 
> - Virtio find vqs API has been reworked
>   (required to fix the scalability issue we have with
>adminq, which I hope to merge later in the cycle)
> 
> - vDPA driver for Marvell OCTEON
> 
> - virtio fs performance improvement
> 
> - mlx5 migration speedups
> 
> Fixes, cleanups all over the place.
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> 
> Denis Arefev (1):
>   net: missing check virtio
> 
> Dragos Tatulea (24):
>   vdpa/mlx5: Clarify meaning thorough function rename
>   vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical
>   vdpa/mlx5: Drop redundant code
>   vdpa/mlx5: Drop redundant check in teardown_virtqueues()
>   vdpa/mlx5: Iterate over active VQs during suspend/resume
>   vdpa/mlx5: Remove duplicate suspend code
>   vdpa/mlx5: Initialize and reset device with one queue pair
>   vdpa/mlx5: Clear and reinitialize software VQ data on reset
>   vdpa/mlx5: Rename init_mvqs
>   vdpa/mlx5: Add support for modifying the virtio_version VQ field
>   vdpa/mlx5: Add support for modifying the VQ features field
>   vdpa/mlx5: Set an initial size on the VQ
>   vdpa/mlx5: Start off rqt_size with max VQPs
>   vdpa/mlx5: Set mkey modified flags on all VQs
>   vdpa/mlx5: Allow creation of blank VQs
>   vdpa/mlx5: Accept Init -> Ready VQ transition in resume_vq()
>   vdpa/mlx5: Add error code for suspend/resume VQ
>   vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq()
>   vdpa/mlx5: Forward error in suspend/resume device
>   vdpa/mlx5: Use suspend/resume during VQP change
>   vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
>   vdpa/mlx5: Re-create HW VQs under certain conditions
>   vdpa/mlx5: Don't reset VQs more than necessary
>   vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()
> 
> Jeff Johnson (3):
>   vringh: add MODULE_DESCRIPTION()
>   virtio: add missing MODULE_DESCRIPTION() macros
>   vDPA: add missing MODULE_DESCRIPTION() macros
> 
> Jiri Pirko (19):
>   caif_virtio: use virtio_find_single_vq() for single virtqueue finding
>   virtio: make virtio_find_vqs() call virtio_find_vqs_ctx()
>   virtio: make virtio_find_single_vq() call virtio_find_vqs()
>   virtio: introduce virtio_queue_info struct and find_vqs_info() config op
>   virtio_pci: convert vp_*find_vqs() ops to find_vqs_info()
>   virtio: convert find_vqs() op implementations to find_vqs_info()
>   virtio: call virtio_find_vqs_info() from virtio_find_single_vq() 
> directly
>   virtio: remove the original find_vqs() op
>   virtio: rename find_vqs_info() op to find_vqs()
>   virtio_blk: convert to use virtio_find_vqs_info()
>   virtio_console: convert to use virtio_find_vqs_info()
>   virtio_crypto: convert to use virtio_find_vqs_info()
>   virtio_net: convert to use virtio_find_vqs_info()
>   scsi: virtio_scsi: convert to use virtio_find_vqs_info()
>   virtiofs: convert to use virtio_find_vqs_info()
>   virtio_balloon: convert to use virtio_find_vqs_info()
>   virtio: convert the rest virtio_find_vqs() users to 
> virtio_find_vqs_info()
>   virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() 
> helpers
>   virtio: rename virtio_find_vqs_info() to virtio_find_vqs()
> 
> Michael S. Tsirkin (2):
>   vhost/vsock: always initialize seqpacket_allow
>   vhost: move smp_rmb() into vhost_get_avail_idx()
> 
> Pe

Re: [GIT PULL] virtio: features, fixes, cleanups

2024-07-18 Thread Michael S. Tsirkin
On Thu, Jul 18, 2024 at 08:52:28AM +0800, Jason Wang wrote:
> On Wed, Jul 17, 2024 at 5:30 PM Michael S. Tsirkin  wrote:
> >
> > This is relatively small.
> > I had to drop a buggy commit in the middle so some hashes
> > changed from what was in linux-next.
> > Deferred admin vq scalability fix to after rc2 as a minor issue was
> > found with it recently, but the infrastructure for it
> > is there now.
> >
> > The following changes since commit e9d22f7a6655941fc8b2b942ed354ec780936b3e:
> >
> >   Merge tag 'linux_kselftest-fixes-6.10-rc7' of 
> > git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest 
> > (2024-07-02 13:53:24 -0700)
> >
> > are available in the Git repository at:
> >
> >   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git 
> > tags/for_linus
> >
> > for you to fetch changes up to 6c85d6b653caeba2ef982925703cbb4f2b3b3163:
> >
> >   virtio: rename virtio_find_vqs_info() to virtio_find_vqs() (2024-07-17 
> > 05:20:58 -0400)
> >
> > 
> > virtio: features, fixes, cleanups
> >
> > Several new features here:
> >
> > - Virtio find vqs API has been reworked
> >   (required to fix the scalability issue we have with
> >adminq, which I hope to merge later in the cycle)
> >
> > - vDPA driver for Marvell OCTEON
> >
> > - virtio fs performance improvement
> >
> > - mlx5 migration speedups
> >
> > Fixes, cleanups all over the place.
> >
> > Signed-off-by: Michael S. Tsirkin 
> >
> 
> It looks like this one is missing?
> 
> https://lore.kernel.org/kvm/20240701033159.18133-1-jasow...@redhat.com/T/
> 
> Thanks

It's not included in the pull but it's a bugfix and it's subtle enough
that I decided it's best to merge later, in particular when I'm not on
vacation ;)

-- 
MST




[GIT PULL] virtio: features, fixes, cleanups

2024-07-17 Thread Michael S. Tsirkin
This is relatively small.
I had to drop a buggy commit in the middle so some hashes
changed from what was in linux-next.
Deferred admin vq scalability fix to after rc2 as a minor issue was
found with it recently, but the infrastructure for it
is there now.

The following changes since commit e9d22f7a6655941fc8b2b942ed354ec780936b3e:

  Merge tag 'linux_kselftest-fixes-6.10-rc7' of 
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest (2024-07-02 
13:53:24 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 6c85d6b653caeba2ef982925703cbb4f2b3b3163:

  virtio: rename virtio_find_vqs_info() to virtio_find_vqs() (2024-07-17 
05:20:58 -0400)


virtio: features, fixes, cleanups

Several new features here:

- Virtio find vqs API has been reworked
  (required to fix the scalability issue we have with
   adminq, which I hope to merge later in the cycle)

- vDPA driver for Marvell OCTEON

- virtio fs performance improvement

- mlx5 migration speedups

Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin 


Denis Arefev (1):
  net: missing check virtio

Dragos Tatulea (24):
  vdpa/mlx5: Clarify meaning thorough function rename
  vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical
  vdpa/mlx5: Drop redundant code
  vdpa/mlx5: Drop redundant check in teardown_virtqueues()
  vdpa/mlx5: Iterate over active VQs during suspend/resume
  vdpa/mlx5: Remove duplicate suspend code
  vdpa/mlx5: Initialize and reset device with one queue pair
  vdpa/mlx5: Clear and reinitialize software VQ data on reset
  vdpa/mlx5: Rename init_mvqs
  vdpa/mlx5: Add support for modifying the virtio_version VQ field
  vdpa/mlx5: Add support for modifying the VQ features field
  vdpa/mlx5: Set an initial size on the VQ
  vdpa/mlx5: Start off rqt_size with max VQPs
  vdpa/mlx5: Set mkey modified flags on all VQs
  vdpa/mlx5: Allow creation of blank VQs
  vdpa/mlx5: Accept Init -> Ready VQ transition in resume_vq()
  vdpa/mlx5: Add error code for suspend/resume VQ
  vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq()
  vdpa/mlx5: Forward error in suspend/resume device
  vdpa/mlx5: Use suspend/resume during VQP change
  vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
  vdpa/mlx5: Re-create HW VQs under certain conditions
  vdpa/mlx5: Don't reset VQs more than necessary
  vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()

Jeff Johnson (3):
  vringh: add MODULE_DESCRIPTION()
  virtio: add missing MODULE_DESCRIPTION() macros
  vDPA: add missing MODULE_DESCRIPTION() macros

Jiri Pirko (19):
  caif_virtio: use virtio_find_single_vq() for single virtqueue finding
  virtio: make virtio_find_vqs() call virtio_find_vqs_ctx()
  virtio: make virtio_find_single_vq() call virtio_find_vqs()
  virtio: introduce virtio_queue_info struct and find_vqs_info() config op
  virtio_pci: convert vp_*find_vqs() ops to find_vqs_info()
  virtio: convert find_vqs() op implementations to find_vqs_info()
  virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly
  virtio: remove the original find_vqs() op
  virtio: rename find_vqs_info() op to find_vqs()
  virtio_blk: convert to use virtio_find_vqs_info()
  virtio_console: convert to use virtio_find_vqs_info()
  virtio_crypto: convert to use virtio_find_vqs_info()
  virtio_net: convert to use virtio_find_vqs_info()
  scsi: virtio_scsi: convert to use virtio_find_vqs_info()
  virtiofs: convert to use virtio_find_vqs_info()
  virtio_balloon: convert to use virtio_find_vqs_info()
  virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info()
  virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers
  virtio: rename virtio_find_vqs_info() to virtio_find_vqs()

Michael S. Tsirkin (2):
  vhost/vsock: always initialize seqpacket_allow
  vhost: move smp_rmb() into vhost_get_avail_idx()

Peter-Jan Gootzen (2):
  virtio-fs: let -ENOMEM bubble up or burst gently
  virtio-fs: improved request latencies when Virtio queue is full

Srujana Challa (1):
  virtio: vdpa: vDPA driver for Marvell OCTEON DPU devices

Xuan Zhuo (1):
  virtio_ring: fix KMSAN error for premapped mode

Yunseong Kim (1):
  tools/virtio: creating pipe assertion in vringh_test

Zhu Lingshan (1):
  MAINTAINERS: Change lingshan's email to kernel.org

zhenwei pi (1):
  virtio_balloon: separate vm events into a function

 MAINTAINERS   |   7 +-
 arch/um/drivers/virt-pci.c|   8 +-
 arch/um/drivers/virtio_uml.c  |  12 +-
 drivers/bl

Re: [PATCH net-next v3 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-07-17 Thread Michael S. Tsirkin
On Wed, Jul 17, 2024 at 09:19:02AM +0800, Jason Wang wrote:
> On Wed, Jul 10, 2024 at 11:03 AM Jason Wang  wrote:
> >
> > On Tue, Jul 9, 2024 at 9:28 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jul 09, 2024 at 04:02:14PM +0800, Jason Wang wrote:
> > > > > This patch synchronizes operstate with admin state per RFC2863.
> > > >
> > > > This is done by trying to toggle the carrier upon open/close and
> > > > > synchronize with the config change work. This allows propagating status
> > > > correctly to stacked devices like:
> > > >
> > > > ip link add link enp0s3 macvlan0 type macvlan
> > > > ip link set link enp0s3 down
> > > > ip link show
> > > >
> > > > Before this patch:
> > > >
> > > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN 
> > > > mode DEFAULT group default qlen 1000
> > > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > > ..
> > > > 5: macvlan0@enp0s3:  mtu 1500 
> > > > qdisc noqueue state UP mode DEFAULT group default qlen 1000
> > > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > > >
> > > > After this patch:
> > > >
> > > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN 
> > > > mode DEFAULT group default qlen 1000
> > > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > > ...
> > > > 5: macvlan0@enp0s3:  mtu 1500 
> > > > qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> > > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > >
> > > I think that the commit log is confusing. It seems to say that
> > > the issue fixed is synchronizing state with hardware
> > > config change.
> > > But your example does not show any
> > > hardware change. Isn't this example really just
> > > a side effect of setting carrier off on close?
> >
> > The main goal of this patch is to make virtio-net follow RFC2863. The
> > main thing that is missing is synchronizing the operstate with the admin
> > state. If we do this, we get several good results. One of the obvious
> > ones is allowing virtio-net to propagate status to the upper layer: for
> > example, if the admin state of the lower virtio-net is down, it should
> > be propagated to the macvlan on top, so I give the example of using a
> > stacked device. I'm not sure we had others, but the commit log is probably
> > too small to say all of it.
> 
> Michael, any more comments on this?
> 
> Thanks


Still don't get it, sorry.
> > > > This is done by trying to toggle the carrier upon open/close and
> > > > synchronize with the config change work.
What does this sentence mean? What is not synchronized with config
change that needs to be?

> >
> > >
> > >
> > > > Cc: Venkat Venkatsubra 
> > > > Cc: Gia-Khanh Nguyen 
> > > > Signed-off-by: Jason Wang 
> > >
> > > Yes but this just forces lots of re-reads of config on each
> > > open/close for no good reason.
> >
> > Does it really harm? Technically the link status could be changed
> > several times when the admin state is down as well.
> >
> > > Config interrupt is handled in core, you can read once
> > > on probe and then handle config changes.
> >
> > Per RFC2863, the code tries to avoid dealing with any operstate change
> > via config space read when the admin state is down.
> >
> > >
> > >
> > >
> > >
> > >
> > > > ---
> > > >  drivers/net/virtio_net.c | 64 
> > > >  1 file changed, 38 insertions(+), 26 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 0b4747e81464..e6626ba25b29 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -2476,6 +2476,25 @@ static void virtnet_cancel_dim(struct 
> > > > virtnet_info *vi, struct dim *dim)
> > > >   net_dim_work_cancel(dim);
> > > >  }
> > > >
> > > > +static void virtnet_update_settings(struct virtnet_info *vi)
> > > > +{
> > > > + u32 speed;
> > > > + u8 duplex;
> > > > +
> > > > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> > > > + return;
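
The behaviour the commit log describes (operstate following admin state, and a stacked device reporting LOWERLAYERDOWN) can be modelled in userspace. This is only a simplified sketch under stated assumptions: struct dev_model and the model_* helpers are hypothetical names invented for illustration, not the kernel's struct net_device or the code in this patch.

```c
#include <stdbool.h>

/* Operational states, named after the RFC 2863 values shown by "ip link". */
enum oper_state { OPER_DOWN, OPER_LOWERLAYERDOWN, OPER_UP };

/* Hypothetical, simplified device model; not the kernel's struct net_device. */
struct dev_model {
	bool admin_up;   /* set by "ip link set ... up/down" */
	bool carrier_ok; /* link carrier as seen by the stack */
	bool stacked;    /* true for a device such as macvlan on top of another */
};

/* Model of open/close: the carrier is toggled together with the admin state. */
static void model_set_admin(struct dev_model *lower, bool up, bool link_up)
{
	lower->admin_up = up;
	lower->carrier_ok = up && link_up;
}

/* A stacked device mirrors the carrier of the device below it. */
static void model_propagate(const struct dev_model *lower,
			    struct dev_model *upper)
{
	upper->carrier_ok = lower->carrier_ok;
}

/* Derive the operational state from admin state and carrier. */
static enum oper_state model_operstate(const struct dev_model *dev)
{
	if (!dev->admin_up)
		return OPER_DOWN;
	if (!dev->carrier_ok)
		return dev->stacked ? OPER_LOWERLAYERDOWN : OPER_DOWN;
	return OPER_UP;
}
```

In this model, taking the lower device administratively down drops its carrier, so a stacked device that is still administratively up reports LOWERLAYERDOWN rather than UP, which matches the "After this patch" output quoted above.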

Re: [PATCH V2 3/7] vhost-vdpa: VHOST_NEW_OWNER

2024-07-15 Thread Michael S. Tsirkin
On Mon, Jul 15, 2024 at 10:29:26AM -0400, Steven Sistare wrote:
> On 7/15/2024 5:07 AM, Michael S. Tsirkin wrote:
> > On Fri, Jul 12, 2024 at 06:18:49AM -0700, Steve Sistare wrote:
> > > Add an ioctl to transfer file descriptor ownership and pinned memory
> > > accounting from one process to another.
> > > 
> > > This is more efficient than VHOST_RESET_OWNER followed by VHOST_SET_OWNER,
> > > as that would unpin all physical pages, requiring them to be repinned in
> > > the new process.  That would cost multiple seconds for large memories, and
> > > be incurred during a virtual machine's pause time during live update.
> > > 
> > > Signed-off-by: Steve Sistare 
> > 
> > Please, we just need to switch to use iommufd for pinning.
> > Piling up all these hacks gets us nowhere.
> 
> I am working on iommufd kernel interfaces and QEMU changes.  But who is 
> working
> on iommufd support for vdpa? If no one, or not for years, then adding these
> small interfaces to vdpa plugs a significant gap in live update coverage.
> 
> FWIW, the iommufd interfaces for live update will look much the same: change 
> owner
> and pinned memory accounting, and update virtual addresses.  So adding that 
> to vdpa
> will not make it look like an odd duck.
> 
> - Steve

I think that no one is working on it - Cindy posted some rfcs in January
("vhost-vdpa: add support for iommufd").  Feel free to pick that up.
What you described is just more of a reason not to duplicate this code.
And it's always the same: a small extension here, a small extension there.
If you can make do with existing kernel interfaces, fine,
one can argue that userspace code is useful to support existing kernels.

-- 
MST




Re: [PATCH V2 3/7] vhost-vdpa: VHOST_NEW_OWNER

2024-07-15 Thread Michael S. Tsirkin
On Fri, Jul 12, 2024 at 06:18:49AM -0700, Steve Sistare wrote:
> Add an ioctl to transfer file descriptor ownership and pinned memory
> accounting from one process to another.
> 
> This is more efficient than VHOST_RESET_OWNER followed by VHOST_SET_OWNER,
> as that would unpin all physical pages, requiring them to be repinned in
> the new process.  That would cost multiple seconds for large memories, and
> be incurred during a virtual machine's pause time during live update.
> 
> Signed-off-by: Steve Sistare 

Please, we just need to switch to use iommufd for pinning.
Piling up all these hacks gets us nowhere.


> ---
>  drivers/vhost/vdpa.c   | 41 ++
>  drivers/vhost/vhost.c  | 15 ++
>  drivers/vhost/vhost.h  |  1 +
>  include/uapi/linux/vhost.h | 10 ++
>  4 files changed, 67 insertions(+)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index b49e5831b3f0..5cf55ca4ec02 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -632,6 +632,44 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
>   return ret;
>  }
>  
> +static long vhost_vdpa_new_owner(struct vhost_vdpa *v)
> +{
> + int r;
> + struct vhost_dev *vdev = &v->vdev;
> + struct mm_struct *mm_old = vdev->mm;
> + struct mm_struct *mm_new = current->mm;
> + long pinned_vm = v->pinned_vm;
> + unsigned long lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
> +
> + if (!mm_old)
> + return -EINVAL;
> + mmgrab(mm_old);
> +
> + if (!v->vdpa->use_va &&
> + pinned_vm + atomic64_read(&mm_new->pinned_vm) > lock_limit) {
> + r = -ENOMEM;
> + goto out;
> + }
> + r = vhost_vdpa_bind_mm(v, mm_new);
> + if (r)
> + goto out;
> +
> + r = vhost_dev_new_owner(vdev);
> + if (r) {
> + vhost_vdpa_bind_mm(v, mm_old);
> + goto out;
> + }
> +
> + if (!v->vdpa->use_va) {
> + atomic64_sub(pinned_vm, &mm_old->pinned_vm);
> + atomic64_add(pinned_vm, &mm_new->pinned_vm);
> + }
> +
> +out:
> + mmdrop(mm_old);
> + return r;
> +}
> +
>  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>  void __user *argp)
>  {
> @@ -876,6 +914,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>   case VHOST_VDPA_RESUME:
>   r = vhost_vdpa_resume(v);
>   break;
> + case VHOST_NEW_OWNER:
> + r = vhost_vdpa_new_owner(v);
> + break;
>   default:
>   r = vhost_dev_ioctl(&v->vdev, cmd, argp);
>   if (r == -ENOIOCTLCMD)
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index b60955682474..ab40ae50552f 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -963,6 +963,21 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
>  
> +/* Caller should have device mutex */
> +long vhost_dev_new_owner(struct vhost_dev *dev)
> +{
> + if (dev->mm == current->mm)
> + return -EBUSY;
> +
> + if (!vhost_dev_has_owner(dev))
> + return -EINVAL;
> +
> + vhost_detach_mm(dev);
> + vhost_attach_mm(dev);
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(vhost_dev_new_owner);
> +
>  static struct vhost_iotlb *iotlb_alloc(void)
>  {
>   return vhost_iotlb_alloc(max_iotlb_entries,
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index bb75a292d50c..8b2018bb02b1 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -187,6 +187,7 @@ void vhost_dev_init(struct vhost_dev *, struct 
> vhost_virtqueue **vqs,
>   int (*msg_handler)(struct vhost_dev *dev, u32 asid,
>  struct vhost_iotlb_msg *msg));
>  long vhost_dev_set_owner(struct vhost_dev *dev);
> +long vhost_dev_new_owner(struct vhost_dev *dev);
>  bool vhost_dev_has_owner(struct vhost_dev *dev);
>  long vhost_dev_check_owner(struct vhost_dev *);
>  struct vhost_iotlb *vhost_dev_reset_owner_prepare(void);
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index b95dd84eef2d..543d0e3434c3 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -123,6 +123,16 @@
>  #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
>  #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
>  
> +/* Set current process as the new owner of this file descriptor.  The fd must
> + * already be owned, via a prior call to VHOST_SET_OWNER.  The pinned memory
> + * count is transferred from the previous to the new owner.
> + * Errors:
> + *   EINVAL: not owned
> + *   EBUSY:  caller is already the owner
> + *   ENOMEM: RLIMIT_MEMLOCK exceeded
> + */
> +#define VHOST_NEW_OWNER _IO(VHOST_VIRTIO, 0x27)
> +
>  /* VHOST_NET specific defines */
>  
>  /* Attach virtio net ring to a raw socket, or tap device.
> -- 
> 2.39.3
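
The accounting step of the quoted patch (check the new owner's limit, then move the pinned page count from the old mm to the new one) can be illustrated in isolation. This is a minimal userspace sketch, not the kernel code: struct mm_model and transfer_pinned() are hypothetical names, plain longs stand in for the atomic64 counters, and the mm binding and reference counting from the patch are omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pinned_vm counter in struct mm_struct. */
struct mm_model {
	long pinned_vm; /* pages currently pinned on behalf of this process */
};

/*
 * Move `pinned` pages of accounting from mm_old to mm_new.
 * Returns 0 on success, or -ENOMEM if the new owner would exceed
 * lock_limit (expressed in pages, like PFN_DOWN(rlimit(RLIMIT_MEMLOCK))).
 * use_va mirrors v->vdpa->use_va: nothing is transferred in that mode.
 */
static int transfer_pinned(struct mm_model *mm_old, struct mm_model *mm_new,
			   long pinned, long lock_limit, bool use_va)
{
	if (use_va)
		return 0;
	if (pinned + mm_new->pinned_vm > lock_limit)
		return -ENOMEM;
	mm_old->pinned_vm -= pinned;
	mm_new->pinned_vm += pinned;
	return 0;
}
```

The point of the sketch is that no page is unpinned or repinned: only the counters move, which is why the patch description expects this to be cheaper than VHOST_RESET_OWNER followed by VHOST_SET_OWNER.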




Re: [PATCH V2 3/7] vhost-vdpa: VHOST_NEW_OWNER

2024-07-15 Thread Michael S. Tsirkin
On Mon, Jul 15, 2024 at 10:26:13AM +0800, Jason Wang wrote:
> On Fri, Jul 12, 2024 at 9:19 PM Steve Sistare  
> wrote:
> >
> > Add an ioctl to transfer file descriptor ownership and pinned memory
> > accounting from one process to another.
> >
> > This is more efficient than VHOST_RESET_OWNER followed by VHOST_SET_OWNER,
> > as that would unpin all physical pages, requiring them to be repinned in
> > the new process.  That would cost multiple seconds for large memories, and
> > be incurred during a virtual machine's pause time during live update.
> >
> > Signed-off-by: Steve Sistare 
> > ---
> >  drivers/vhost/vdpa.c   | 41 ++
> >  drivers/vhost/vhost.c  | 15 ++
> >  drivers/vhost/vhost.h  |  1 +
> >  include/uapi/linux/vhost.h | 10 ++
> >  4 files changed, 67 insertions(+)
> >
> > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > index b49e5831b3f0..5cf55ca4ec02 100644
> > --- a/drivers/vhost/vdpa.c
> > +++ b/drivers/vhost/vdpa.c
> > @@ -632,6 +632,44 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
> > return ret;
> >  }
> >
> > +static long vhost_vdpa_new_owner(struct vhost_vdpa *v)
> > +{
> > +   int r;
> > > +   struct vhost_dev *vdev = &v->vdev;
> > +   struct mm_struct *mm_old = vdev->mm;
> > +   struct mm_struct *mm_new = current->mm;
> > +   long pinned_vm = v->pinned_vm;
> > +   unsigned long lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
> > +
> > +   if (!mm_old)
> > +   return -EINVAL;
> > +   mmgrab(mm_old);
> > +
> > +   if (!v->vdpa->use_va &&
> > > +   pinned_vm + atomic64_read(&mm_new->pinned_vm) > lock_limit) {
> > +   r = -ENOMEM;
> > +   goto out;
> > +   }
> 
> So this seems to allow an arbitrary process to execute this. Seems to be 
> unsafe.
> 
> I wonder if we need to add some checks here, maybe PID or other stuff
> to only allow the owner process to do this.

Not pid pls.


> > +   r = vhost_vdpa_bind_mm(v, mm_new);
> > +   if (r)
> > +   goto out;
> > +
> > +   r = vhost_dev_new_owner(vdev);
> > +   if (r) {
> > +   vhost_vdpa_bind_mm(v, mm_old);
> > +   goto out;
> > +   }
> > +
> > +   if (!v->vdpa->use_va) {
> > > +   atomic64_sub(pinned_vm, &mm_old->pinned_vm);
> > > +   atomic64_add(pinned_vm, &mm_new->pinned_vm);
> > +   }
> > +
> > +out:
> > +   mmdrop(mm_old);
> > +   return r;
> > +}
> > +
> >  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> >void __user *argp)
> >  {
> > @@ -876,6 +914,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file 
> > *filep,
> > case VHOST_VDPA_RESUME:
> > r = vhost_vdpa_resume(v);
> > break;
> > +   case VHOST_NEW_OWNER:
> > +   r = vhost_vdpa_new_owner(v);
> > +   break;
> > default:
> > > r = vhost_dev_ioctl(&v->vdev, cmd, argp);
> > if (r == -ENOIOCTLCMD)
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index b60955682474..ab40ae50552f 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -963,6 +963,21 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
> >
> > +/* Caller should have device mutex */
> > +long vhost_dev_new_owner(struct vhost_dev *dev)
> > +{
> > +   if (dev->mm == current->mm)
> > +   return -EBUSY;
> > +
> > +   if (!vhost_dev_has_owner(dev))
> > +   return -EINVAL;
> > +
> > +   vhost_detach_mm(dev);
> > +   vhost_attach_mm(dev);
> 
> This seems to do nothing unless I miss something.
> 
> Thanks




Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning

2024-07-14 Thread Michael S. Tsirkin
On Fri, Jul 12, 2024 at 04:53:25AM -0700, Breno Leitao wrote:
> After the commit bdacf3e34945 ("net: Use nested-BH locking for
> napi_alloc_cache.") was merged, the following warning began to appear:
> 
>WARNING: CPU: 5 PID: 1 at net/core/skbuff.c:1451 
> napi_skb_cache_put+0x82/0x4b0
> 
> __warn+0x12f/0x340
> napi_skb_cache_put+0x82/0x4b0
> napi_skb_cache_put+0x82/0x4b0
> report_bug+0x165/0x370
> handle_bug+0x3d/0x80
> exc_invalid_op+0x1a/0x50
> asm_exc_invalid_op+0x1a/0x20
> __free_old_xmit+0x1c8/0x510
> napi_skb_cache_put+0x82/0x4b0
> __free_old_xmit+0x1c8/0x510
> __free_old_xmit+0x1c8/0x510
> __pfx___free_old_xmit+0x10/0x10
> 
> The issue arises because virtio is assuming it's running in NAPI context
> even when it's not, such as in the netpoll case.
> 
> To resolve this, modify virtnet_poll_tx() to only set NAPI when budget
> is available. Same for virtnet_poll_cleantx(), which always assumed that
> it was in a NAPI context.
> 
> Fixes: df133f3f9625 ("virtio_net: bulk free tx skbs")
> Suggested-by: Jakub Kicinski 
> Signed-off-by: Breno Leitao 

Acked-by: Michael S. Tsirkin 

though I'm not sure I understand the connection with bdacf3e34945.

> ---
>  drivers/net/virtio_net.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0b4747e81464..fb1331827308 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2341,7 +2341,7 @@ static int virtnet_receive(struct receive_queue *rq, 
> int budget,
>   return packets;
>  }
>  
> -static void virtnet_poll_cleantx(struct receive_queue *rq)
> +static void virtnet_poll_cleantx(struct receive_queue *rq, int budget)
>  {
>   struct virtnet_info *vi = rq->vq->vdev->priv;
>   unsigned int index = vq2rxq(rq->vq);
> @@ -2359,7 +2359,7 @@ static void virtnet_poll_cleantx(struct receive_queue 
> *rq)
>  
>   do {
>   virtqueue_disable_cb(sq->vq);
> - free_old_xmit(sq, txq, true);
> + free_old_xmit(sq, txq, !!budget);
>   } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>  
>   if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
> @@ -2404,7 +2404,7 @@ static int virtnet_poll(struct napi_struct *napi, int 
> budget)
>   unsigned int xdp_xmit = 0;
>   bool napi_complete;
>  
> - virtnet_poll_cleantx(rq);
> + virtnet_poll_cleantx(rq, budget);
>  
>   received = virtnet_receive(rq, budget, &xdp_xmit);
>   rq->packets_in_napi += received;
> @@ -2526,7 +2526,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, 
> int budget)
>   txq = netdev_get_tx_queue(vi->dev, index);
>   __netif_tx_lock(txq, raw_smp_processor_id());
>   virtqueue_disable_cb(sq->vq);
> - free_old_xmit(sq, txq, true);
> + free_old_xmit(sq, txq, !!budget);
>  
>   if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
>   if (netif_tx_queue_stopped(txq)) {
> -- 
> 2.43.0
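
The idea behind passing !!budget instead of a hard-coded true can be shown with a small userspace model. It is a sketch under stated assumptions: struct free_stats and the model_* functions are hypothetical and only count which free path would be taken; they are not the virtio_net functions. The assumption that a zero budget means "not in NAPI context" follows the patch description's netpoll case.

```c
#include <stdbool.h>

/* Hypothetical counters standing in for the two ways an skb can be freed. */
struct free_stats {
	int napi_cache_frees; /* fast path, only valid in NAPI context */
	int plain_frees;      /* path that is safe from any context */
};

/* Pick the free path based on whether the caller is in NAPI context. */
static void model_free_skb(struct free_stats *st, bool in_napi)
{
	if (in_napi)
		st->napi_cache_frees++;
	else
		st->plain_frees++;
}

/*
 * Model of a TX poll routine that frees `completed` buffers.
 * A budget of 0 is treated as "not NAPI context" (the netpoll case),
 * so in_napi is derived from the budget instead of being always true.
 */
static void model_poll_tx(struct free_stats *st, int budget, int completed)
{
	for (int i = 0; i < completed; i++)
		model_free_skb(st, !!budget);
}
```

With a non-zero budget every buffer goes through the NAPI-only path; with a zero budget none does, which is the condition the warning in the report is checking for.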




Re: [PATCH] virtio: add missing MODULE_DESCRIPTION() macro

2024-07-11 Thread Michael S. Tsirkin
On Thu, Jul 11, 2024 at 11:43:18AM -0700, Jeff Johnson wrote:
> On 6/23/24 10:36, Jeff Johnson wrote:
> > On 6/2/2024 1:25 PM, Jeff Johnson wrote:
> > > make allmodconfig && make W=1 C=1 reports:
> > > WARNING: modpost: missing MODULE_DESCRIPTION() in 
> > > drivers/virtio/virtio_dma_buf.o
> > > 
> > > Add the missing invocation of the MODULE_DESCRIPTION() macro.
> > > 
> > > Signed-off-by: Jeff Johnson 
> > > ---
> > >   drivers/virtio/virtio_dma_buf.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/drivers/virtio/virtio_dma_buf.c 
> > > b/drivers/virtio/virtio_dma_buf.c
> > > index 2521a75009c3..3034a2f605c8 100644
> > > --- a/drivers/virtio/virtio_dma_buf.c
> > > +++ b/drivers/virtio/virtio_dma_buf.c
> > > @@ -85,5 +85,6 @@ int virtio_dma_buf_get_uuid(struct dma_buf *dma_buf,
> > >   }
> > >   EXPORT_SYMBOL(virtio_dma_buf_get_uuid);
> > > +MODULE_DESCRIPTION("dma-bufs for virtio exported objects");
> > >   MODULE_LICENSE("GPL");
> > >   MODULE_IMPORT_NS(DMA_BUF);
> > > 
> > > ---
> > > base-commit: 83814698cf48ce3aadc5d88a3f577f04482ff92a
> > > change-id: 20240602-md-virtio_dma_buf-b3552ca6c5d5
> > > 
> > 
> > Following up to see if anything else is needed from me.
> > Hoping to see this in linux-next :)
> 
> I still don't see this in linux-next so following up to see if anything else
> is needed to get this merged. Adding Greg KH since he's signed off on this
> file before and he's taken quite a few of my cleanups through his trees.
> 
> I'm hoping to have all of these warnings fixed tree-wide in 6.11.
> 
> /jeff

not sure why I tag it and it gets cleared again.
tagged again hope it holds now.

-- 
MST




Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Michael S. Tsirkin
On Wed, Jul 10, 2024 at 03:54:22PM -0700, Daniel Verkamp wrote:
> On Wed, Jul 10, 2024 at 1:39 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jul 10, 2024 at 12:58:11PM -0700, Daniel Verkamp wrote:
> > > On Wed, Jul 10, 2024 at 11:39 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote:
> > > > > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > virtio balloon communicates to the core that in some
> > > > > > configurations vq #s are non-contiguous by setting name
> > > > > > pointer to NULL.
> > > > > >
> > > > > > Unfortunately, core then turned around and just made them
> > > > > > contiguous again. Result is that driver is out of spec.
> > > > >
> > > > > Thanks for fixing this - I think the overall approach of the patch 
> > > > > looks good.
> > > > >
> > > > > > Implement what the API was supposed to do
> > > > > > in the 1st place. Compatibility with buggy hypervisors
> > > > > > is handled inside virtio-balloon, which is the only driver
> > > > > > making use of this facility, so far.
> > > > >
> > > > > In addition to virtio-balloon, I believe the same problem also affects
> > > > > the virtio-fs device, since queue 1 is only supposed to be present if
> > > > > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
> > > > > meant to be queue indexes 2 and up. From a look at the Linux driver
> > > > > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
> > > > > and assumes that request queues start at index 1 rather than 2, which
> > > > > looks out of spec to me, but the current device implementations (that
> > > > > I am aware of, anyway) are also broken in the same way, so it ends up
> > > > > working today. Queue numbering in a spec-compliant device and the
> > > > > current Linux driver would mismatch; what the driver considers to be
> > > > > the first request queue (index 1) would be ignored by the device since
> > > > > queue index 1 has no function if F_NOTIFICATION isn't negotiated.
> > > >
> > > >
> > > > Oh, thanks a lot for pointing this out!
> > > >
> > > > I see so this patch is no good as is, we need to add a workaround for
> > > > virtio-fs first.
> > > >
> > > > QEMU workaround is simple - just add an extra queue. But I did not
> > > > > research how this would interact with vhost-user.
> > > >
> > > > From driver POV, I guess we could just ignore queue # 1 - would that be
> > > > ok or does it have performance implications?
> > >
> > > As a driver workaround for non-compliant devices, I think ignoring the
> > > first request queue would be a reasonable approach if the device's
> > > config advertises num_request_queues > 1. Unfortunately, both
> > > virtiofsd and crosvm's virtio-fs device have hard-coded
> > > > num_request_queues = 1, so this won't help with those existing devices.
> >
> > Do they care what the vq # is though?
> > We could do some magic to translate VQ #s in qemu.
> >
> >
> > > Maybe there are other devices that we would need to consider as well;
> > > commit 529395d2ae64 ("virtio-fs: add multi-queue support") quotes
> > > benchmarks that seem to be from a different virtio-fs implementation
> > > that does support multiple request queues, so the workaround could
> > > possibly be used there.
> > >
> > > > Or do what I did for balloon here: try with spec compliant #s first,
> > > > if that fails then assume it's the spec issue and shift by 1.
> > >
> > > If there is a way to "guess and check" without breaking spec-compliant
> > > devices, that sounds reasonable too; however, I'm not sure how this
> > > would work out in practice: an existing non-compliant device may fail
> > > to start if the driver tries to enable queue index 2 when it only
> > > supports one request queue,
> >
> > You don't try to enable queue - driver starts by checking queue size.
> > The way my patch works is that it assumes a non existing queue has
> > size 0 if not available.
> >
> > This was actually a docum

Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Michael S. Tsirkin
On Wed, Jul 10, 2024 at 12:58:11PM -0700, Daniel Verkamp wrote:
> On Wed, Jul 10, 2024 at 11:39 AM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote:
> > > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > virtio balloon communicates to the core that in some
> > > > configurations vq #s are non-contiguous by setting name
> > > > pointer to NULL.
> > > >
> > > > Unfortunately, core then turned around and just made them
> > > > contiguous again. Result is that driver is out of spec.
> > >
> > > Thanks for fixing this - I think the overall approach of the patch looks 
> > > good.
> > >
> > > > Implement what the API was supposed to do
> > > > in the 1st place. Compatibility with buggy hypervisors
> > > > is handled inside virtio-balloon, which is the only driver
> > > > making use of this facility, so far.
> > >
> > > In addition to virtio-balloon, I believe the same problem also affects
> > > the virtio-fs device, since queue 1 is only supposed to be present if
> > > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
> > > meant to be queue indexes 2 and up. From a look at the Linux driver
> > > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
> > > and assumes that request queues start at index 1 rather than 2, which
> > > looks out of spec to me, but the current device implementations (that
> > > I am aware of, anyway) are also broken in the same way, so it ends up
> > > working today. Queue numbering in a spec-compliant device and the
> > > current Linux driver would mismatch; what the driver considers to be
> > > the first request queue (index 1) would be ignored by the device since
> > > queue index 1 has no function if F_NOTIFICATION isn't negotiated.
> >
> >
> > Oh, thanks a lot for pointing this out!
> >
> > I see so this patch is no good as is, we need to add a workaround for
> > virtio-fs first.
> >
> > QEMU workaround is simple - just add an extra queue. But I did not
> > > research how this would interact with vhost-user.
> >
> > From driver POV, I guess we could just ignore queue # 1 - would that be
> > ok or does it have performance implications?
> 
> As a driver workaround for non-compliant devices, I think ignoring the
> first request queue would be a reasonable approach if the device's
> config advertises num_request_queues > 1. Unfortunately, both
> virtiofsd and crosvm's virtio-fs device have hard-coded
> num_request_queues = 1, so this won't help with those existing devices.

Do they care what the vq # is though?
We could do some magic to translate VQ #s in qemu.


> Maybe there are other devices that we would need to consider as well;
> commit 529395d2ae64 ("virtio-fs: add multi-queue support") quotes
> benchmarks that seem to be from a different virtio-fs implementation
> that does support multiple request queues, so the workaround could
> possibly be used there.
> 
> > Or do what I did for balloon here: try with spec compliant #s first,
> > if that fails then assume it's the spec issue and shift by 1.
> 
> If there is a way to "guess and check" without breaking spec-compliant
> devices, that sounds reasonable too; however, I'm not sure how this
> would work out in practice: an existing non-compliant device may fail
> to start if the driver tries to enable queue index 2 when it only
> supports one request queue,

You don't try to enable queue - driver starts by checking queue size.
The way my patch works is that it assumes a non existing queue has
size 0 if not available.

This was actually a documented way to check for PCI and MMIO:
Read the virtqueue size from queue_size. This controls how big the 
virtqueue is (see 2.6 Virtqueues).
If this field is 0, the virtqueue does not exist.
MMIO:
If the returned value is zero (0x0) the queue is not available.

unfortunately not for CCW, but I guess CCW implementations outside
of QEMU are uncommon enough that we can assume it's the same?


To me the above is also a big hint that drivers are allowed to
query size for queues that do not exist.



> and a spec-compliant device would probably
> balk if the driver tries to enable queue 1 but does not negotiate
> VIRTIO_FS_F_NOTIFICATION. If there's a way to reset and retry the
> whole virtio device initialization process if a device fails like
> this, then maybe it's feasible. (Or can the driver tweak the virtqueue
> configuration and try to set DRIVER_OK repeatedly until it works? It's
> not clear to me if this is allowed by the spec, or what device
> implementations actually do in practice in this scenario.)
> 
> Thanks,
> -- Daniel

My patch starts with a spec compliant behaviour. If that fails,
try non-compliant one as a fallback.

-- 
MST
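
The "spec-compliant index first, shifted index as fallback" probing described above can be sketched as follows. This is an illustration only, under the assumption stated in the message that a queue reporting size 0 does not exist; struct dev_queues, queue_size() and pick_queue_index() are hypothetical names, not the virtio core or virtio-balloon API.

```c
#include <stddef.h>

/* Hypothetical device model: size[i] == 0 means queue i does not exist. */
struct dev_queues {
	const unsigned int *size;
	size_t count;
};

/* Read the size of queue idx; indexes past the end read as 0 (absent). */
static unsigned int queue_size(const struct dev_queues *d, size_t idx)
{
	return idx < d->count ? d->size[idx] : 0;
}

/*
 * Return the queue index to use for a queue whose spec-defined index is
 * spec_idx, or -1 if neither candidate exists. The fallback index
 * (spec_idx - 1) models a device that numbers its queues contiguously
 * when the preceding optional queue is absent.
 */
static int pick_queue_index(const struct dev_queues *d, size_t spec_idx)
{
	if (queue_size(d, spec_idx) != 0)
		return (int)spec_idx;
	if (spec_idx > 0 && queue_size(d, spec_idx - 1) != 0)
		return (int)(spec_idx - 1);
	return -1;
}
```

Because the probe only reads queue sizes, nothing is enabled on the device before the driver has decided which numbering is in use.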




Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Michael S. Tsirkin
On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote:
> On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  wrote:
> >
> > virtio balloon communicates to the core that in some
> > configurations vq #s are non-contiguous by setting name
> > pointer to NULL.
> >
> > Unfortunately, core then turned around and just made them
> > contiguous again. Result is that driver is out of spec.
> 
> Thanks for fixing this - I think the overall approach of the patch looks good.
> 
> > Implement what the API was supposed to do
> > in the 1st place. Compatibility with buggy hypervisors
> > is handled inside virtio-balloon, which is the only driver
> > making use of this facility, so far.
> 
> In addition to virtio-balloon, I believe the same problem also affects
> the virtio-fs device, since queue 1 is only supposed to be present if
> VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
> meant to be queue indexes 2 and up. From a look at the Linux driver
> (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
> and assumes that request queues start at index 1 rather than 2, which
> looks out of spec to me, but the current device implementations (that
> I am aware of, anyway) are also broken in the same way, so it ends up
> working today. Queue numbering in a spec-compliant device and the
> current Linux driver would mismatch; what the driver considers to be
> the first request queue (index 1) would be ignored by the device since
> queue index 1 has no function if F_NOTIFICATION isn't negotiated.
> 
> [...]
> > diff --git a/drivers/virtio/virtio_pci_common.c 
> > b/drivers/virtio/virtio_pci_common.c
> > index 7d82facafd75..fa606e7321ad 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -293,7 +293,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > struct virtqueue_info *vqi;
> > u16 msix_vec;
> > -   int i, err, nvectors, allocated_vectors, queue_idx = 0;
> > +   int i, err, nvectors, allocated_vectors;
> >
> > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> > if (!vp_dev->vqs)
> > @@ -332,7 +332,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > msix_vec = allocated_vectors++;
> > else
> > msix_vec = VP_MSIX_VQ_VECTOR;
> > -   vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> > +   vqs[i] = vp_setup_vq(vdev, i, vqi->callback,
> >  vqi->name, vqi->ctx, msix_vec);
> > if (IS_ERR(vqs[i])) {
> > err = PTR_ERR(vqs[i]);
> > @@ -368,7 +368,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > struct virtqueue_info vqs_info[])
> >  {
> > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > -   int i, err, queue_idx = 0;
> > +   int i, err;
> >
> > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> > if (!vp_dev->vqs)
> > @@ -388,8 +388,13 @@ static int vp_find_vqs_intx(struct virtio_device 
> > *vdev, unsigned int nvqs,
> > vqs[i] = NULL;
> > continue;
> > }
> > +<<<<<<< HEAD
> > vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> >  vqi->name, vqi->ctx,
> > +===
> > +   vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i],
> > +ctx ? ctx[i] : false,
> > +>>>>>>> f814759f80b7... virtio: fix vq # for balloon
> 
> This still has merge markers in it.
> 
> Thanks,
> -- Daniel

ouch forgot to commit ;)




Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Michael S. Tsirkin
On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote:
> On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  wrote:
> >
> > virtio balloon communicates to the core that in some
> > configurations vq #s are non-contiguous by setting name
> > pointer to NULL.
> >
> > Unfortunately, core then turned around and just made them
> > contiguous again. Result is that driver is out of spec.
> 
> Thanks for fixing this - I think the overall approach of the patch looks good.
> 
> > Implement what the API was supposed to do
> > in the 1st place. Compatibility with buggy hypervisors
> > is handled inside virtio-balloon, which is the only driver
> > making use of this facility, so far.
> 
> In addition to virtio-balloon, I believe the same problem also affects
> the virtio-fs device, since queue 1 is only supposed to be present if
> VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
> meant to be queue indexes 2 and up. From a look at the Linux driver
> (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
> and assumes that request queues start at index 1 rather than 2, which
> looks out of spec to me, but the current device implementations (that
> I am aware of, anyway) are also broken in the same way, so it ends up
> working today. Queue numbering in a spec-compliant device and the
> current Linux driver would mismatch; what the driver considers to be
> the first request queue (index 1) would be ignored by the device since
> queue index 1 has no function if F_NOTIFICATION isn't negotiated.


Oh, thanks a lot for pointing this out!

I see, so this patch is no good as is: we need to add a workaround for
virtio-fs first.

QEMU workaround is simple - just add an extra queue. But I did not
research how this would interact with vhost-user.

From driver POV, I guess we could just ignore queue # 1 - would that be
ok or does it have performance implications?
Or do what I did for balloon here: try with spec compliant #s first,
if that fails then assume it's the spec issue and shift by 1.
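
A rough sketch of the "try spec compliant #s first, then shift by 1" idea,
outside of any kernel API. Everything here is hypothetical: the callback
type, the function names and the two toy device models are made up for
illustration, and a real driver would go through virtio_find_vqs instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: try to set up 'count' queues starting at 'first'. */
typedef bool (*setup_fn)(int first, int count);

/*
 * Try the spec-compliant starting index first. If that fails, assume the
 * device uses the old contiguous numbering and retry one index lower.
 * Returns the starting index that worked, or -1 if neither did.
 */
static int setup_request_queues(setup_fn setup, int spec_first, int count)
{
	assert(setup != NULL && spec_first > 0 && count > 0);

	if (setup(spec_first, count))
		return spec_first;
	if (setup(spec_first - 1, count))
		return spec_first - 1;
	return -1;
}

/* Toy model (assumption): a device exposing queues 0..num_queues-1. */
static bool setup_on_device(int num_queues, int first, int count)
{
	return first >= 0 && count > 0 && first + count <= num_queues;
}

/* Out-of-spec device: queue 0 plus one request queue at index 1. */
static bool legacy_setup(int first, int count)
{
	return setup_on_device(2, first, count);
}

/* Spec-compliant device: index 1 is reserved, request queue at index 2. */
static bool compliant_setup(int first, int count)
{
	return setup_on_device(3, first, count);
}
```

With one request queue and a spec index of 2, the legacy model ends up at
index 1 and the compliant model at index 2. Whether a failed setup is a
reliable enough signal on a real device is exactly the open question above.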


> [...]
> > diff --git a/drivers/virtio/virtio_pci_common.c 
> > b/drivers/virtio/virtio_pci_common.c
> > index 7d82facafd75..fa606e7321ad 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -293,7 +293,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > struct virtqueue_info *vqi;
> > u16 msix_vec;
> > -   int i, err, nvectors, allocated_vectors, queue_idx = 0;
> > +   int i, err, nvectors, allocated_vectors;
> >
> > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> > if (!vp_dev->vqs)
> > @@ -332,7 +332,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > msix_vec = allocated_vectors++;
> > else
> > msix_vec = VP_MSIX_VQ_VECTOR;
> > -   vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> > +   vqs[i] = vp_setup_vq(vdev, i, vqi->callback,
> >  vqi->name, vqi->ctx, msix_vec);
> > if (IS_ERR(vqs[i])) {
> > err = PTR_ERR(vqs[i]);
> > @@ -368,7 +368,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > struct virtqueue_info vqs_info[])
> >  {
> > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > -   int i, err, queue_idx = 0;
> > +   int i, err;
> >
> > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> > if (!vp_dev->vqs)
> > @@ -388,8 +388,13 @@ static int vp_find_vqs_intx(struct virtio_device 
> > *vdev, unsigned int nvqs,
> > vqs[i] = NULL;
> > continue;
> > }
> > +<<<<<<< HEAD
> > vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> >  vqi->name, vqi->ctx,
> > +===
> > +   vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i],
> > +ctx ? ctx[i] : false,
> > +>>>>>>> f814759f80b7... virtio: fix vq # for balloon
> 
> This still has merge markers in it.
> 
> Thanks,
> -- Daniel




[PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Michael S. Tsirkin
virtio balloon communicates to the core that in some
configurations vq #s are non-contiguous by setting name
pointer to NULL.

Unfortunately, core then turned around and just made them
contiguous again. Result is that driver is out of spec.

Implement what the API was supposed to do
in the 1st place. Compatibility with buggy hypervisors
is handled inside virtio-balloon, which is the only driver
making use of this facility, so far.
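
For illustration only, a standalone sketch of the two numbering schemes.
struct vq_slot is a hypothetical stand-in for struct virtqueue_info, and the
helper names are not from the patch; only the queue_idx++ versus i
distinction mirrors the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct virtqueue_info: only the name matters. */
struct vq_slot {
	const char *name;	/* NULL means the driver does not use this slot */
};

#define NO_QUEUE (-1)

/* Old core behaviour: a skipped slot does not consume a queue index. */
static void assign_contiguous(const struct vq_slot *slots, size_t n, int *out)
{
	int queue_idx = 0;
	size_t i;

	assert(slots != NULL && out != NULL);
	for (i = 0; i < n; i++)
		out[i] = slots[i].name ? queue_idx++ : NO_QUEUE;
}

/* Behaviour after the patch: the slot position is the queue index. */
static void assign_by_position(const struct vq_slot *slots, size_t n, int *out)
{
	size_t i;

	assert(slots != NULL && out != NULL);
	for (i = 0; i < n; i++)
		out[i] = slots[i].name ? (int)i : NO_QUEUE;
}
```

With five slots where slot 3 has a NULL name, the first helper gives slot 4
queue index 3 while the second gives it index 4, which is the vq3 versus vq4
mismatch that patch 1/2 works around.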

Message-ID: 
Fixes: b0c504f15471 ("virtio-balloon: add support for providing free page reports to host")
Cc: "Alexander Duyck" 
Signed-off-by: Michael S. Tsirkin 
---
 arch/um/drivers/virtio_uml.c   |  4 ++--
 drivers/remoteproc/remoteproc_virtio.c |  4 ++--
 drivers/s390/virtio/virtio_ccw.c   |  4 ++--
 drivers/virtio/virtio_mmio.c   |  4 ++--
 drivers/virtio/virtio_pci_common.c | 11 ---
 drivers/virtio/virtio_vdpa.c   |  4 ++--
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c
index 2b6e701776b6..c903e4959f51 100644
--- a/arch/um/drivers/virtio_uml.c
+++ b/arch/um/drivers/virtio_uml.c
@@ -1019,7 +1019,7 @@ static int vu_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
   struct irq_affinity *desc)
 {
struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev);
-   int i, queue_idx = 0, rc;
+   int i, rc;
struct virtqueue *vq;
 
/* not supported for now */
@@ -1038,7 +1038,7 @@ static int vu_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
continue;
}
 
-   vqs[i] = vu_setup_vq(vdev, queue_idx++, vqi->callback,
+   vqs[i] = vu_setup_vq(vdev, i, vqi->callback,
 vqi->name, vqi->ctx);
if (IS_ERR(vqs[i])) {
rc = PTR_ERR(vqs[i]);
diff --git a/drivers/remoteproc/remoteproc_virtio.c 
b/drivers/remoteproc/remoteproc_virtio.c
index d3f39009b28e..1019b2825c26 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -185,7 +185,7 @@ static int rproc_virtio_find_vqs(struct virtio_device 
*vdev, unsigned int nvqs,
 struct virtqueue_info vqs_info[],
 struct irq_affinity *desc)
 {
-   int i, ret, queue_idx = 0;
+   int i, ret;
 
for (i = 0; i < nvqs; ++i) {
struct virtqueue_info *vqi = &vqs_info[i];
@@ -195,7 +195,7 @@ static int rproc_virtio_find_vqs(struct virtio_device 
*vdev, unsigned int nvqs,
continue;
}
 
-   vqs[i] = rp_find_vq(vdev, queue_idx++, vqi->callback,
+   vqs[i] = rp_find_vq(vdev, i, vqi->callback,
vqi->name, vqi->ctx);
if (IS_ERR(vqs[i])) {
ret = PTR_ERR(vqs[i]);
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 62eca9419ad7..82a3440bbabb 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -694,7 +694,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
 {
struct virtio_ccw_device *vcdev = to_vc_device(vdev);
dma64_t *indicatorp = NULL;
-   int ret, i, queue_idx = 0;
+   int ret, i;
struct ccw1 *ccw;
dma32_t indicatorp_dma = 0;
 
@@ -710,7 +710,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
continue;
}
 
-   vqs[i] = virtio_ccw_setup_vq(vdev, queue_idx++, vqi->callback,
+   vqs[i] = virtio_ccw_setup_vq(vdev, i, vqi->callback,
 vqi->name, vqi->ctx, ccw);
if (IS_ERR(vqs[i])) {
ret = PTR_ERR(vqs[i]);
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 90e784e7b721..db6a0366f082 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -494,7 +494,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned 
int nvqs,
 {
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
int irq = platform_get_irq(vm_dev->pdev, 0);
-   int i, err, queue_idx = 0;
+   int i, err;
 
if (irq < 0)
return irq;
@@ -515,7 +515,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned 
int nvqs,
continue;
}
 
-   vqs[i] = vm_setup_vq(vdev, queue_idx++, vqi->callback,
+   vqs[i] = vm_setup_vq(vdev, i, vqi->callback,
 vqi->name, vqi->ctx);
if (IS_ERR(vqs[i])) {
vm_del_vqs(vdev);
diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 7d8

[PATCH v2 1/2] virtio_balloon: add work around for out of spec QEMU

2024-07-10 Thread Michael S. Tsirkin
QEMU implemented the configuration
VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT
incorrectly: it then uses vq3 for reporting, spec says it is always 4.

This is masked by a corresponding bug in driver:
add a work around as I'm going to try and fix the driver bug.

Message-ID: 
Fixes: b0c504f15471 ("virtio-balloon: add support for providing free page reports to host")
Cc: "Alexander Duyck" 
Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_balloon.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 54469277ca30..eebeab863697 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -589,8 +589,23 @@ static int init_vqs(struct virtio_balloon *vb)
 
err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
  vqs_info, NULL);
-   if (err)
-   return err;
+   if (err) {
+   /*
+* Try to work around QEMU bug which since 2020 confused vq numbers
+* when VIRTIO_BALLOON_F_REPORTING but not
+* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered.
+*/
+   if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING) &&
+   !virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   vqs_info[VIRTIO_BALLOON_VQ_FREE_PAGE].name = "reporting_vq";
+   vqs_info[VIRTIO_BALLOON_VQ_FREE_PAGE].callback = balloon_ack;
+   err = virtio_find_vqs(vb->vdev,
+ VIRTIO_BALLOON_VQ_REPORTING, vqs_info, NULL);
+   }
+
+   if (err)
+   return err;
+   }
 
vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
-- 
MST




Re: [PATCH 1/2] virtio_balloon: add work around for out of spec QEMU

2024-07-10 Thread Michael S. Tsirkin
On Wed, Jul 10, 2024 at 03:37:49PM +0800, Jason Wang wrote:
> On Wed, Jul 10, 2024 at 2:16 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jul 10, 2024 at 11:23:20AM +0800, Jason Wang wrote:
> > > On Fri, Jul 5, 2024 at 6:09 PM Michael S. Tsirkin  wrote:
> > > >
> > > > QEMU implemented the configuration
> > > > VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT
> > > > incorrectly: it then uses vq3 for reporting, spec says it is always 4.
> > > >
> > > > This is masked by a corresponding bug in driver:
> > > > add a work around as I'm going to try and fix the driver bug.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > ---
> > > >  drivers/virtio/virtio_balloon.c | 19 +--
> > > >  1 file changed, 17 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_balloon.c 
> > > > b/drivers/virtio/virtio_balloon.c
> > > > index 9a61febbd2f7..7dc3fcd56238 100644
> > > > --- a/drivers/virtio/virtio_balloon.c
> > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > @@ -597,8 +597,23 @@ static int init_vqs(struct virtio_balloon *vb)
> > > >
> > > > err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
> > > >   callbacks, names, NULL);
> > > > -   if (err)
> > > > -   return err;
> > > > +   if (err) {
> > > > +   /*
> > > > +* Try to work around QEMU bug which since 2020 
> > > > confused vq numbers
> > > > +* when VIRTIO_BALLOON_F_REPORTING but not
> > > > +* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered.
> > > > +*/
> > > > +   if (virtio_has_feature(vb->vdev, 
> > > > VIRTIO_BALLOON_F_REPORTING) &&
> > > > +   !virtio_has_feature(vb->vdev, 
> > > > VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > > > +   names[VIRTIO_BALLOON_VQ_FREE_PAGE] = 
> > > > "reporting_vq";
> > > > +   callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = 
> > > > balloon_ack;
> > > > +   err = virtio_find_vqs(vb->vdev,
> > > > + 
> > > > VIRTIO_BALLOON_VQ_REPORTING, vqs, callbacks, names, NULL);
> > > > +   }
> > > > +
> > > > +   if (err)
> > > > +   return err;
> > > > +   }
> > > >
> > > > vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
> > > > vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
> > > > --
> > > > MST
> > > >
> > >
> > > Acked-by: Jason Wang 
> > >
> > > Do we need a spec to say this is something that needs to be considered
> > > by the driver?
> > >
> > > Thanks
> >
> > I'd say it's a temporary situation that we won't need to bother
> > about in several years.
> 
> I mean, should a newly-written virtio-balloon driver care about this?
> If not, it means it can't work for several Qemu versions.
> 
> Thanks

True - I could not find a way to make it work in a way that
would be compatible with old qemu.


> >
> > --
> > MST
> >




Re: [PATCH 1/2] virtio_balloon: add work around for out of spec QEMU

2024-07-10 Thread Michael S. Tsirkin
On Wed, Jul 10, 2024 at 11:23:20AM +0800, Jason Wang wrote:
> On Fri, Jul 5, 2024 at 6:09 PM Michael S. Tsirkin  wrote:
> >
> > QEMU implemented the configuration
> > VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT
> > incorrectly: it then uses vq3 for reporting, spec says it is always 4.
> >
> > This is masked by a corresponding bug in driver:
> > add a work around as I'm going to try and fix the driver bug.
> >
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  drivers/virtio/virtio_balloon.c | 19 +--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_balloon.c 
> > b/drivers/virtio/virtio_balloon.c
> > index 9a61febbd2f7..7dc3fcd56238 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -597,8 +597,23 @@ static int init_vqs(struct virtio_balloon *vb)
> >
> > err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
> >   callbacks, names, NULL);
> > -   if (err)
> > -   return err;
> > +   if (err) {
> > +   /*
> > +* Try to work around QEMU bug which since 2020 confused vq 
> > numbers
> > +* when VIRTIO_BALLOON_F_REPORTING but not
> > +* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered.
> > +*/
> > +   if (virtio_has_feature(vb->vdev, 
> > VIRTIO_BALLOON_F_REPORTING) &&
> > +   !virtio_has_feature(vb->vdev, 
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > +   names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "reporting_vq";
> > +   callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = 
> > balloon_ack;
> > +   err = virtio_find_vqs(vb->vdev,
> > + VIRTIO_BALLOON_VQ_REPORTING, 
> > vqs, callbacks, names, NULL);
> > +   }
> > +
> > +   if (err)
> > +   return err;
> > +   }
> >
> > vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
> > vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
> > --
> > MST
> >
> 
> Acked-by: Jason Wang 
> 
> Do we need a spec to say this is something that needs to be considered
> by the driver?
> 
> Thanks

I'd say it's a temporary situation that we won't need to bother
about in several years.

-- 
MST




Re: [PATCH v3 0/2] vdpa: support set mac address from vdpa tool

2024-07-10 Thread Michael S. Tsirkin
On Wed, Jul 10, 2024 at 11:05:48AM +0800, Jason Wang wrote:
> On Tue, Jul 9, 2024 at 8:42 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jul 09, 2024 at 02:19:19PM +0800, Cindy Lu wrote:
> > > On Tue, 9 Jul 2024 at 11:59, Parav Pandit  wrote:
> > > >
> > > > Hi Cindy,
> > > >
> > > > > From: Cindy Lu 
> > > > > Sent: Monday, July 8, 2024 12:17 PM
> > > > >
> > > > > Add support for setting the MAC address using the VDPA tool.
> > > > > This feature will allow setting the MAC address using the VDPA tool.
> > > > > For example, in vdpa_sim_net, the implementation sets the MAC address 
> > > > > to
> > > > > the config space. However, for other drivers, they can implement 
> > > > > their own
> > > > > function, not limited to the config space.
> > > > >
> > > > > Changelog v2
> > > > >  - Changed the function name to prevent misunderstanding
> > > > >  - Added check for blk device
> > > > >  - Addressed the comments
> > > > > Changelog v3
> > > > >  - Split the function of the net device from 
> > > > > vdpa_nl_cmd_dev_attr_set_doit
> > > > >  - Add a lock for the network device's dev_set_attr operation
> > > > >  - Address the comments
> > > > >
> > > > > Cindy Lu (2):
> > > > >   vdpa: support set mac address from vdpa tool
> > > > >   vdpa_sim_net: Add the support of set mac address
> > > > >
> > > > >  drivers/vdpa/vdpa.c  | 81 
> > > > > 
> > > > >  drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 19 ++-
> > > > >  include/linux/vdpa.h |  9 
> > > > >  include/uapi/linux/vdpa.h|  1 +
> > > > >  4 files changed, 109 insertions(+), 1 deletion(-)
> > > > >
> > > > > --
> > > > > 2.45.0
> > > >
> > > > Mlx5 device already allows setting the mac and mtu during the vdpa 
> > > > device creation time.
> > > > Once the vdpa device is created, it binds to vdpa bus and other driver 
> > > > vhost_vdpa etc bind to it.
> > > > So there was no good reason in the past to support explicit config 
> > > > after device add complicate the flow for synchronizing this.
> > > >
> > > > The user who wants a device with new attributes, as well destroy and 
> > > > recreate the vdpa device with new desired attributes.
> > > >
> > > > vdpa_sim_net can also be extended for similar way when adding the vdpa 
> > > > device.
> > > >
> > > > Have you considered using the existing tool and kernel in place since 
> > > > 2021?
> > > > Such as commit d8ca2fa5be1.
> > > >
> > > > An example of it is,
> > > > $ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu 
> > > > 9000
> > > >
> > > Hi Parav
> > > Really thanks for your comments. The reason for adding this function
> > > is to support Kubevirt.
> > > the problem we meet is that kubevirt chooses one random vdpa device
> > > from the pool and we don't know which one it going to pick. That means
> > > we can't get to know the Mac address before it is created. So we plan
> > > to have this function to change the mac address after it is created
> > > Thanks
> > > cindy
> >
> > Well you will need to change kubevirt to teach it to set
> > mac address, right?
> 
> That's the plan. Adding Leonardo.
> 
> Thanks

So given you are going to change kubevirt, can we
change it to create devices as needed with the
existing API?

> >
> > --
> > MST
> >




Re: [PATCH net-next v3 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-07-09 Thread Michael S. Tsirkin
On Tue, Jul 09, 2024 at 04:02:14PM +0800, Jason Wang wrote:
> This patch synchronize operstate with admin state per RFC2863.
> 
> This is done by trying to toggle the carrier upon open/close and
> synchronize with the config change work. This allows propagate status
> correctly to stacked devices like:
> 
> ip link add link enp0s3 macvlan0 type macvlan
> ip link set link enp0s3 down
> ip link show
> 
> Before this patch:
> 
> 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> ..
> 5: macvlan0@enp0s3:  mtu 1500 qdisc 
> noqueue state UP mode DEFAULT group default qlen 1000
> link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> 
> After this patch:
> 
> 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> ...
> 5: macvlan0@enp0s3:  mtu 1500 qdisc 
> noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff

I think that the commit log is confusing. It seems to say that
the issue fixed is synchronizing state with hardware
config change. But your example does not show any
hardware change. Isn't this example really just
a side effect of setting carrier off on close?


> Cc: Venkat Venkatsubra 
> Cc: Gia-Khanh Nguyen 
> Signed-off-by: Jason Wang 

Yes but this just forces lots of re-reads of config on each
open/close for no good reason.
Config interrupt is handled in core, you can read once
on probe and then handle config changes.
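
Roughly what I mean, as a toy model rather than virtio_net code. All the
names below are hypothetical; the only point is that the status is read on
probe and on a config change, while open/close just use the cached value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device: 'reads' counts how often the config space is read. */
struct toy_device {
	bool link_up;
	int reads;
};

/* Toy driver state (hypothetical, not virtio_net's struct virtnet_info). */
struct toy_nic {
	struct toy_device *dev;
	bool cached_link_up;	/* last status read from the device */
	bool admin_up;		/* set by open, cleared by close */
	bool carrier;		/* what stacked devices would observe */
};

static bool toy_read_status(struct toy_device *dev)
{
	dev->reads++;
	return dev->link_up;
}

/* Carrier is on only if the interface is open and the link is up. */
static void toy_update_carrier(struct toy_nic *nic)
{
	nic->carrier = nic->admin_up && nic->cached_link_up;
}

/* Read the config once when the device is probed. */
static void toy_probe(struct toy_nic *nic, struct toy_device *dev)
{
	assert(nic != NULL && dev != NULL);
	nic->dev = dev;
	nic->admin_up = false;
	nic->cached_link_up = toy_read_status(dev);
	toy_update_carrier(nic);
}

/* Config change notification: the only other place that re-reads. */
static void toy_config_changed(struct toy_nic *nic)
{
	nic->cached_link_up = toy_read_status(nic->dev);
	toy_update_carrier(nic);
}

/* Open and close never touch the config space. */
static void toy_open(struct toy_nic *nic)
{
	nic->admin_up = true;
	toy_update_carrier(nic);
}

static void toy_close(struct toy_nic *nic)
{
	nic->admin_up = false;
	toy_update_carrier(nic);
}
```

In this model any number of open/close cycles costs zero config reads, and
the carrier still drops on close, which is all the macvlan example needs.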





> ---
>  drivers/net/virtio_net.c | 64 
>  1 file changed, 38 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0b4747e81464..e6626ba25b29 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2476,6 +2476,25 @@ static void virtnet_cancel_dim(struct virtnet_info 
> *vi, struct dim *dim)
>   net_dim_work_cancel(dim);
>  }
>  
> +static void virtnet_update_settings(struct virtnet_info *vi)
> +{
> + u32 speed;
> + u8 duplex;
> +
> + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> + return;
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> +
> + if (ethtool_validate_speed(speed))
> + vi->speed = speed;
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> +
> + if (ethtool_validate_duplex(duplex))
> + vi->duplex = duplex;
> +}
> +
>  static int virtnet_open(struct net_device *dev)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> @@ -2494,6 +2513,18 @@ static int virtnet_open(struct net_device *dev)
>   goto err_enable_qp;
>   }
>  
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> + virtio_config_driver_enable(vi->vdev);
> + /* Do not schedule the config change work as the
> +  * config change notification might have been disabled
> +  * by the virtio core. */

I don't get why you need this.
If the notification was disabled it will just trigger later.
This is exactly why using core is a good idea.


> + virtio_config_changed(vi->vdev);
> + } else {
> + vi->status = VIRTIO_NET_S_LINK_UP;
> + virtnet_update_settings(vi);


And why do we need this here I don't get at all.

> + netif_carrier_on(dev);
> + }



> +
>   return 0;
>  
>  err_enable_qp:
> @@ -2936,12 +2967,19 @@ static int virtnet_close(struct net_device *dev)
>   disable_delayed_refill(vi);
>   /* Make sure refill_work doesn't re-enable napi! */
>   cancel_delayed_work_sync(&vi->refill);
> + /* Make sure config notification doesn't schedule config work */
> + virtio_config_driver_disable(vi->vdev);
> + /* Make sure status updating is cancelled */
> + cancel_work_sync(&vi->config_work);
>  
>   for (i = 0; i < vi->max_queue_pairs; i++) {
>   virtnet_disable_queue_pair(vi, i);
>   virtnet_cancel_dim(vi, &vi->rq[i].dim);
>   }
>  
> + vi->status &= ~VIRTIO_NET_S_LINK_UP;
> + netif_carrier_off(dev);
> +
>   return 0;
>  }
>  
> @@ -4640,25 +4678,6 @@ static void virtnet_init_settings(struct net_device 
> *dev)
>   vi->duplex = DUPLEX_UNKNOWN;
>  }
>  
> -static void virtnet_update_settings(struct virtnet_info *vi)
> -{
> - u32 speed;
> - u8 duplex;
> -
> - if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> - return;
> -
> - virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> -
> - if (ethtool_validate_speed(speed))
> - vi->speed = speed;
> -
> - virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> -
> - if (ethtool_validate_duplex(duplex))
> - vi->duplex = duplex;
> -}
> -
>  static u32 

Re: [PATCH v3 0/2] vdpa: support set mac address from vdpa tool

2024-07-09 Thread Michael S. Tsirkin
On Tue, Jul 09, 2024 at 02:19:19PM +0800, Cindy Lu wrote:
> On Tue, 9 Jul 2024 at 11:59, Parav Pandit  wrote:
> >
> > Hi Cindy,
> >
> > > From: Cindy Lu 
> > > Sent: Monday, July 8, 2024 12:17 PM
> > >
> > > Add support for setting the MAC address using the VDPA tool.
> > > This feature will allow setting the MAC address using the VDPA tool.
> > > For example, in vdpa_sim_net, the implementation sets the MAC address to
> > > the config space. However, for other drivers, they can implement their own
> > > function, not limited to the config space.
> > >
> > > Changelog v2
> > >  - Changed the function name to prevent misunderstanding
> > >  - Added check for blk device
> > >  - Addressed the comments
> > > Changelog v3
> > >  - Split the function of the net device from vdpa_nl_cmd_dev_attr_set_doit
> > >  - Add a lock for the network device's dev_set_attr operation
> > >  - Address the comments
> > >
> > > Cindy Lu (2):
> > >   vdpa: support set mac address from vdpa tool
> > >   vdpa_sim_net: Add the support of set mac address
> > >
> > >  drivers/vdpa/vdpa.c  | 81 
> > >  drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 19 ++-
> > >  include/linux/vdpa.h |  9 
> > >  include/uapi/linux/vdpa.h|  1 +
> > >  4 files changed, 109 insertions(+), 1 deletion(-)
> > >
> > > --
> > > 2.45.0
> >
> > Mlx5 device already allows setting the mac and mtu during the vdpa device 
> > creation time.
> > Once the vdpa device is created, it binds to vdpa bus and other driver 
> > vhost_vdpa etc bind to it.
> > So there was no good reason in the past to support explicit config after 
> > device add complicate the flow for synchronizing this.
> >
> > The user who wants a device with new attributes, as well destroy and 
> > recreate the vdpa device with new desired attributes.
> >
> > vdpa_sim_net can also be extended for similar way when adding the vdpa 
> > device.
> >
> > Have you considered using the existing tool and kernel in place since 2021?
> > Such as commit d8ca2fa5be1.
> >
> > An example of it is,
> > $ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu 9000
> >
> Hi Parav
> Really thanks for your comments. The reason for adding this function
> is to support Kubevirt.
> the problem we meet is that kubevirt chooses one random vdpa device
> from the pool and we don't know which one it going to pick. That means
> we can't get to know the Mac address before it is created. So we plan
> to have this function to change the mac address after it is created
> Thanks
> cindy

Well you will need to change kubevirt to teach it to set
mac address, right?

-- 
MST




Re: [PATCH] vdpa/mlx5: Add the support of set mac address

2024-07-08 Thread Michael S. Tsirkin
On Mon, Jul 08, 2024 at 02:55:49PM +0800, Cindy Lu wrote:
> Add the function to support setting the MAC address.
> For vdpa/mlx5, the function will use mlx5_mpfs_add_mac
> to set the mac address
> 
> Tested in ConnectX-6 Dx device
> 
> Signed-off-by: Cindy Lu 

Is this on top of your other patchset?

> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 26ba7da6b410..f78701386690 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3616,10 +3616,33 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev 
> *v_mdev, struct vdpa_device *
>   destroy_workqueue(wq);
>   mgtdev->ndev = NULL;
>  }
> +static int mlx5_vdpa_set_attr_mac(struct vdpa_mgmt_dev *v_mdev,
> +   struct vdpa_device *dev,
> +   const struct vdpa_dev_set_config *add_config)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(dev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_core_dev *mdev = mvdev->mdev;
> + struct virtio_net_config *config = &ndev->config;
> + int err;
> + struct mlx5_core_dev *pfmdev;
> +
> + if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR)) {
> + if (!is_zero_ether_addr(add_config->net.mac)) {
> + memcpy(config->mac, add_config->net.mac, ETH_ALEN);
> + pfmdev = pci_get_drvdata(pci_physfn(mdev->pdev));
> + err = mlx5_mpfs_add_mac(pfmdev, config->mac);
> + if (err)
> + return -1;
> + }
> + }
> + return 0;
> +}
>  
>  static const struct vdpa_mgmtdev_ops mdev_ops = {
>   .dev_add = mlx5_vdpa_dev_add,
>   .dev_del = mlx5_vdpa_dev_del,
> + .dev_set_attr = mlx5_vdpa_set_attr_mac,
>  };
>  
>  static struct virtio_device_id id_table[] = {
> -- 
> 2.45.0




Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time

2024-07-08 Thread Michael S. Tsirkin
On Mon, Jul 08, 2024 at 11:17:06AM +, Dragos Tatulea wrote:
> On Mon, 2024-07-08 at 07:11 -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 08, 2024 at 11:01:39AM +, Dragos Tatulea wrote:
> > > On Wed, 2024-07-03 at 18:01 +0200, Eugenio Perez Martin wrote:
> > > > On Wed, Jun 26, 2024 at 11:27 AM Dragos Tatulea  
> > > > wrote:
> > > > > 
> > > > > On Wed, 2024-06-19 at 17:54 +0200, Eugenio Perez Martin wrote:
> > > > > > On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea 
> > > > > >  wrote:
> > > > > > > 
> > > > > > > Currently, hardware VQs are created right when the vdpa device 
> > > > > > > gets into
> > > > > > > DRIVER_OK state. That is easier because most of the VQ state is 
> > > > > > > known by
> > > > > > > then.
> > > > > > > 
> > > > > > > This patch switches to creating all VQs and their associated 
> > > > > > > resources
> > > > > > > at device creation time. The motivation is to reduce the vdpa 
> > > > > > > device
> > > > > > > live migration downtime by moving the expensive operation of 
> > > > > > > creating
> > > > > > > all the hardware VQs and their associated resources out of 
> > > > > > > downtime on
> > > > > > > the destination VM.
> > > > > > > 
> > > > > > > The VQs are now created in a blank state. The VQ configuration 
> > > > > > > will
> > > > > > > happen later, on DRIVER_OK. Then the configuration will be 
> > > > > > > applied when
> > > > > > > the VQs are moved to the Ready state.
> > > > > > > 
> > > > > > > When .set_vq_ready() is called on a VQ before DRIVER_OK, special 
> > > > > > > care is
> > > > > > > needed: now that the VQ is already created a resume_vq() will be
> > > > > > > triggered too early when no mr has been configured yet. Skip 
> > > > > > > calling
> > > > > > > resume_vq() in this case, let it be handled during DRIVER_OK.
> > > > > > > 
> > > > > > > For virtio-vdpa, the device configuration is done earlier during
> > > > > > > .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> > > > > > > setup_vq_resources() a second time in that case.
> > > > > > > 
> > > > > > 
> > > > > > I guess this happens if virtio_vdpa is already loaded, but I cannot
> > > > > > see how this is different here. Apart from the IOTLB, what else does
> > > > > > it change from the mlx5_vdpa POV?
> > > > > > 
> > > > > I don't understand your question, could you rephrase or provide more 
> > > > > context
> > > > > please?
> > > > > 
> > > > 
> > > > My main point is that the vdpa parent driver should not be able to
> > > > tell the difference between vhost_vdpa and virtio_vdpa. The only
> > > > difference I can think of is because of the vhost IOTLB handling.
> > > > 
> > > > Do you also observe this behavior if you add the device with "vdpa
> > > > add" without the virtio_vdpa module loaded, and then modprobe
> > > > virtio_vdpa?
> > > > 
> > > Aah, now I understand what you mean. Indeed in my tests I was loading the
> > > virtio_vdpa module before adding the device. When doing it the other way 
> > > around
> > > the device doesn't get configured during probe.
> > >  
> > > 
> > > > At least the comment should be something in the line of "If we have
> > > > all the information to initialize the device, pre-warm it here" or
> > > > similar.
> > > Makes sense. I will send a v3 with the commit + comment message update.
> > 
> > 
> > Is commit update the only change then?
> I was planning to drop the paragraph in the commit message (it is confusing) 
> and
> edit the comment below (scroll down to see which).
> 
> Let me know if I should send the v3 or not. I have it prepared.

You can do this but pls document that the only change is in commit log.
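
The ordering described in the quoted commit message (a VQ marked ready
before DRIVER_OK is not resumed until DRIVER_OK arrives) can be sketched
as a small standalone model. This is only an illustration under stated
assumptions: struct vq_model, model_set_vq_ready() and model_driver_ok()
are hypothetical names, not mlx5_vdpa code, and the model assumes that
DRIVER_OK resumes exactly those VQs that were previously marked ready.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4 /* status bit value from the virtio spec */

/* Hypothetical stand-in for a hardware VQ; not the mlx5_vdpa structure. */
struct vq_model {
	bool ready;   /* what .set_vq_ready() last recorded */
	bool running; /* whether the modelled hardware VQ has been resumed */
};

/* Models the patched .set_vq_ready(): resume only once DRIVER_OK is set. */
static void model_set_vq_ready(uint8_t status, struct vq_model *vq, bool ready)
{
	if (!ready)
		vq->running = false; /* corresponds to suspend_vq() */
	else if (status & VIRTIO_CONFIG_S_DRIVER_OK)
		vq->running = true;  /* corresponds to resume_vq() */
	/* otherwise too early (no mr configured yet): defer to DRIVER_OK */
	vq->ready = ready;
}

/* Models the DRIVER_OK transition: resume every VQ already marked ready. */
static void model_driver_ok(uint8_t *status, struct vq_model *vqs, size_t n)
{
	size_t i;

	*status |= VIRTIO_CONFIG_S_DRIVER_OK;
	for (i = 0; i < n; i++)
		if (vqs[i].ready)
			vqs[i].running = true;
}
```

The model leaves out error handling; in the quoted diff a failing
resume_vq() additionally clears the ready flag.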


> > 
> > > > 
> > > > > Thanks,

Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time

2024-07-08 Thread Michael S. Tsirkin
On Mon, Jul 08, 2024 at 11:01:39AM +, Dragos Tatulea wrote:
> On Wed, 2024-07-03 at 18:01 +0200, Eugenio Perez Martin wrote:
> > On Wed, Jun 26, 2024 at 11:27 AM Dragos Tatulea  wrote:
> > > 
> > > On Wed, 2024-06-19 at 17:54 +0200, Eugenio Perez Martin wrote:
> > > > On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea  
> > > > wrote:
> > > > > 
> > > > > Currently, hardware VQs are created right when the vdpa device gets 
> > > > > into
> > > > > DRIVER_OK state. That is easier because most of the VQ state is known 
> > > > > by
> > > > > then.
> > > > > 
> > > > > This patch switches to creating all VQs and their associated resources
> > > > > at device creation time. The motivation is to reduce the vdpa device
> > > > > live migration downtime by moving the expensive operation of creating
> > > > > all the hardware VQs and their associated resources out of downtime on
> > > > > the destination VM.
> > > > > 
> > > > > The VQs are now created in a blank state. The VQ configuration will
> > > > > happen later, on DRIVER_OK. Then the configuration will be applied 
> > > > > when
> > > > > the VQs are moved to the Ready state.
> > > > > 
> > > > > When .set_vq_ready() is called on a VQ before DRIVER_OK, special care 
> > > > > is
> > > > > needed: now that the VQ is already created a resume_vq() will be
> > > > > triggered too early when no mr has been configured yet. Skip calling
> > > > > resume_vq() in this case, let it be handled during DRIVER_OK.
> > > > > 
> > > > > For virtio-vdpa, the device configuration is done earlier during
> > > > > .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> > > > > setup_vq_resources() a second time in that case.
> > > > > 
> > > > 
> > > > I guess this happens if virtio_vdpa is already loaded, but I cannot
> > > > see how this is different here. Apart from the IOTLB, what else does
> > > > it change from the mlx5_vdpa POV?
> > > > 
> > > I don't understand your question, could you rephrase or provide more 
> > > context
> > > please?
> > > 
> > 
> > My main point is that the vdpa parent driver should not be able to
> > tell the difference between vhost_vdpa and virtio_vdpa. The only
> > difference I can think of is because of the vhost IOTLB handling.
> > 
> > Do you also observe this behavior if you add the device with "vdpa
> > add" without the virtio_vdpa module loaded, and then modprobe
> > virtio_vdpa?
> > 
> Aah, now I understand what you mean. Indeed in my tests I was loading the
> virtio_vdpa module before adding the device. When doing it the other way 
> around
> the device doesn't get configured during probe.
>  
> 
> > At least the comment should be something in the line of "If we have
> > all the information to initialize the device, pre-warm it here" or
> > similar.
> Makes sense. I will send a v3 with the commit + comment message update.


Is commit update the only change then?

> > 
> > > Thanks,
> > > Dragos
> > > 
> > > > > Signed-off-by: Dragos Tatulea 
> > > > > Reviewed-by: Cosmin Ratiu 
> > > > > ---
> > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 
> > > > > -
> > > > >  1 file changed, 32 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 249b5afbe34a..b2836fd3d1dd 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct 
> > > > > vdpa_device *vdev, u16 idx, bool ready
> > > > > mvq = >vqs[idx];
> > > > > if (!ready) {
> > > > > suspend_vq(ndev, mvq);
> > > > > -   } else {
> > > > > +   } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > > > if (resume_vq(ndev, mvq))
> > > > > ready = false;
> > > > > }
> > > > > @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct 
> > > > > vdpa_device *vdev, u8 status)
> > > > > goto err_setup;
> > > > > }
> > > > > register_link_notifier(ndev);
> > > > > -   err = setup_vq_resources(ndev, true);
> > > > > -   if (err) {
> > > > > -   mlx5_vdpa_warn(mvdev, "failed to 
> > > > > setup driver\n");
> > > > > -   goto err_driver;
> > > > > +   if (ndev->setup) {
> > > > > +   err = resume_vqs(ndev);
> > > > > +   if (err) {
> > > > > +   mlx5_vdpa_warn(mvdev, "failed 
> > > > > to resume VQs\n");
> > > > > +   goto err_driver;
> > > > > +   }
> > > > > +   } else {
> > > > > +   err = setup_vq_resources(ndev, true);
> > > > > +   

Re: [PATCH] vdpa_sim_blk: add `capacity` module parameter

2024-07-05 Thread Michael S. Tsirkin
On Fri, Jul 05, 2024 at 01:28:21PM +0200, Stefano Garzarella wrote:
> The vDPA block simulator always allocated a 128 MiB ram-disk, but some
> filesystems (e.g. XFS) may require larger minimum sizes (see
> https://issues.redhat.com/browse/RHEL-45951).
> 
> So to allow us to test these filesystems, let's add a module parameter
> to control the size of the simulated virtio-blk devices.
> The value is mapped directly to the `capacity` field of the virtio-blk
> configuration space, so it must be expressed in sector numbers of 512
> bytes.
> 
> The default value (0x40000) is the same as the previous value, so the
> behavior without setting `capacity` remains unchanged.
> 
> Before this patch or with this patch without setting `capacity`:
>   $ modprobe vdpa-sim-blk
>   $ vdpa dev add mgmtdev vdpasim_blk name blk0
>   virtio_blk virtio6: 1/0/0 default/read/poll queues
>   virtio_blk virtio6: [vdb] 262144 512-byte logical blocks (134 MB/128 MiB)
> 
> After this patch:
>   $ modprobe vdpa-sim-blk capacity=614400
>   $ vdpa dev add mgmtdev vdpasim_blk name blk0
>   virtio_blk virtio6: 1/0/0 default/read/poll queues
>   virtio_blk virtio6: [vdb] 614400 512-byte logical blocks (315 MB/300 MiB)
> 
> Signed-off-by: Stefano Garzarella 

What a hack. Cindy was working on adding control over config
space, why can't that be used?
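
For reference, the sector arithmetic in the quoted commit message
(128 MiB is 262144 sectors, 300 MiB is 614400 sectors) and an
overflow-safe form of the range check can be sketched as below. The
helper names mib_to_sectors() and request_in_range() are made up for
this illustration and are not part of vdpa_sim_blk; unlike the quoted
hunk, this sketch rejects a start sector beyond the capacity before
doing the subtraction.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors, as used by the virtio-blk capacity field */

/* Convert a size in MiB to a count of 512-byte sectors. */
static uint64_t mib_to_sectors(uint64_t mib)
{
	return (mib << 20) >> SECTOR_SHIFT;
}

/*
 * Overflow-safe range check: the subtraction is evaluated only after
 * start_sector is known to be within the capacity, so it cannot wrap.
 */
static bool request_in_range(uint64_t capacity, uint64_t start_sector,
			     uint64_t num_sectors)
{
	if (start_sector > capacity)
		return false;
	return num_sectors <= capacity - start_sector;
}
```

With these helpers, a value for the module parameter would be computed
as mib_to_sectors(300) and passed as capacity=614400.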

> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 25 +
>  1 file changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c 
> b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> index b137f3679343..18f390149836 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -33,7 +33,6 @@
>(1ULL << VIRTIO_BLK_F_DISCARD)  | \
>(1ULL << VIRTIO_BLK_F_WRITE_ZEROES))
>  
> -#define VDPASIM_BLK_CAPACITY 0x40000
>  #define VDPASIM_BLK_SIZE_MAX 0x1000
>  #define VDPASIM_BLK_SEG_MAX  32
>  #define VDPASIM_BLK_DWZ_MAX_SECTORS UINT_MAX
> @@ -43,6 +42,10 @@
>  #define VDPASIM_BLK_AS_NUM   1
>  #define VDPASIM_BLK_GROUP_NUM1
>  
> +static unsigned long capacity = 0x40000;
> +module_param(capacity, ulong, 0444);
> +MODULE_PARM_DESC(capacity, "virtio-blk device capacity (in 512-byte 
> sectors)");
> +
>  struct vdpasim_blk {
>   struct vdpasim vdpasim;
>   void *buffer;
> @@ -79,10 +82,10 @@ static void vdpasim_blk_buffer_unlock(struct vdpasim_blk 
> *blk)
>  static bool vdpasim_blk_check_range(struct vdpasim *vdpasim, u64 
> start_sector,
>   u64 num_sectors, u64 max_sectors)
>  {
> - if (start_sector > VDPASIM_BLK_CAPACITY) {
> + if (start_sector > capacity) {
>   dev_dbg(&vdpasim->vdpa.dev,
> - "starting sector exceeds the capacity - start: 0x%llx 
> capacity: 0x%x\n",
> - start_sector, VDPASIM_BLK_CAPACITY);
> + "starting sector exceeds the capacity - start: 0x%llx 
> capacity: 0x%lx\n",
> + start_sector, capacity);
>   }
>  
>   if (num_sectors > max_sectors) {
> @@ -92,10 +95,10 @@ static bool vdpasim_blk_check_range(struct vdpasim 
> *vdpasim, u64 start_sector,
>   return false;
>   }
>  
> - if (num_sectors > VDPASIM_BLK_CAPACITY - start_sector) {
> + if (num_sectors > capacity - start_sector) {
>   dev_dbg(&vdpasim->vdpa.dev,
> - "request exceeds the capacity - start: 0x%llx num: 
> 0x%llx capacity: 0x%x\n",
> - start_sector, num_sectors, VDPASIM_BLK_CAPACITY);
> + "request exceeds the capacity - start: 0x%llx num: 
> 0x%llx capacity: 0x%lx\n",
> + start_sector, num_sectors, capacity);
>   return false;
>   }
>  
> @@ -369,7 +372,7 @@ static void vdpasim_blk_get_config(struct vdpasim 
> *vdpasim, void *config)
>  
>   memset(config, 0, sizeof(struct virtio_blk_config));
>  
> - blk_config->capacity = cpu_to_vdpasim64(vdpasim, VDPASIM_BLK_CAPACITY);
> + blk_config->capacity = cpu_to_vdpasim64(vdpasim, capacity);
>   blk_config->size_max = cpu_to_vdpasim32(vdpasim, VDPASIM_BLK_SIZE_MAX);
>   blk_config->seg_max = cpu_to_vdpasim32(vdpasim, VDPASIM_BLK_SEG_MAX);
>   blk_config->num_queues = cpu_to_vdpasim16(vdpasim, VDPASIM_BLK_VQ_NUM);
> @@ -437,8 +440,7 @@ static int vdpasim_blk_dev_add(struct vdpa_mgmt_dev 
> *mdev, const char *name,
>   if (blk->shared_backend) {
>   blk->buffer = shared_buffer;
>   } else {
> - blk->buffer = kvzalloc(VDPASIM_BLK_CAPACITY << SECTOR_SHIFT,
> -GFP_KERNEL);
> + blk->buffer = kvzalloc(capacity << SECTOR_SHIFT, GFP_KERNEL);
>   if (!blk->buffer) {
>   ret = -ENOMEM;
>   goto put_dev;
> @@ -495,8 +497,7 @@ static int __init vdpasim_blk_init(void)
>   goto parent_err;
>  
>   if 

[PATCH 2/2] virtio: fix vq # when vq skipped

2024-07-05 Thread Michael S. Tsirkin
virtio balloon communicates to the core that in some
configurations vq #s are non-contiguous by setting name
pointer to NULL.

Unfortunately, core then turned around and just made them
contiguous again. Result is that driver is out of spec.

Implement what the API was supposed to do
in the 1st place. Compatibility with buggy hypervisors
is handled inside virtio-balloon, which is the only driver
making use of this facility, so far.

Signed-off-by: Michael S. Tsirkin 
---
 arch/um/drivers/virtio_uml.c   | 4 ++--
 drivers/remoteproc/remoteproc_virtio.c | 4 ++--
 drivers/s390/virtio/virtio_ccw.c   | 4 ++--
 drivers/virtio/virtio_mmio.c   | 4 ++--
 drivers/virtio/virtio_pci_common.c | 8 
 drivers/virtio/virtio_vdpa.c   | 4 ++--
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c
index 77faa2cf3a13..d65346cd340e 100644
--- a/arch/um/drivers/virtio_uml.c
+++ b/arch/um/drivers/virtio_uml.c
@@ -1019,7 +1019,7 @@ static int vu_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
   struct irq_affinity *desc)
 {
struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev);
-   int i, queue_idx = 0, rc;
+   int i, rc;
struct virtqueue *vq;
 
/* not supported for now */
@@ -1036,7 +1036,7 @@ static int vu_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
continue;
}
 
-   vqs[i] = vu_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
+   vqs[i] = vu_setup_vq(vdev, i, callbacks[i], names[i],
 ctx ? ctx[i] : false);
if (IS_ERR(vqs[i])) {
rc = PTR_ERR(vqs[i]);
diff --git a/drivers/remoteproc/remoteproc_virtio.c 
b/drivers/remoteproc/remoteproc_virtio.c
index 25b66b113b69..2d17135abb66 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -187,7 +187,7 @@ static int rproc_virtio_find_vqs(struct virtio_device 
*vdev, unsigned int nvqs,
 const bool * ctx,
 struct irq_affinity *desc)
 {
-   int i, ret, queue_idx = 0;
+   int i, ret;
 
for (i = 0; i < nvqs; ++i) {
if (!names[i]) {
@@ -195,7 +195,7 @@ static int rproc_virtio_find_vqs(struct virtio_device 
*vdev, unsigned int nvqs,
continue;
}
 
-   vqs[i] = rp_find_vq(vdev, queue_idx++, callbacks[i], names[i],
+   vqs[i] = rp_find_vq(vdev, i, callbacks[i], names[i],
ctx ? ctx[i] : false);
if (IS_ERR(vqs[i])) {
ret = PTR_ERR(vqs[i]);
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index d6491fc84e8c..64541b3bb8a2 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -696,7 +696,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
 {
struct virtio_ccw_device *vcdev = to_vc_device(vdev);
dma64_t *indicatorp = NULL;
-   int ret, i, queue_idx = 0;
+   int ret, i;
struct ccw1 *ccw;
dma32_t indicatorp_dma = 0;
 
@@ -710,7 +710,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
continue;
}
 
-   vqs[i] = virtio_ccw_setup_vq(vdev, queue_idx++, callbacks[i],
+   vqs[i] = virtio_ccw_setup_vq(vdev, i, callbacks[i],
 names[i], ctx ? ctx[i] : false,
 ccw);
if (IS_ERR(vqs[i])) {
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 173596589c71..a3a66a0b7cb1 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -496,7 +496,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned 
int nvqs,
 {
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
int irq = platform_get_irq(vm_dev->pdev, 0);
-   int i, err, queue_idx = 0;
+   int i, err;
 
if (irq < 0)
return irq;
@@ -515,7 +515,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned 
int nvqs,
continue;
}
 
-   vqs[i] = vm_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
+   vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i],
 ctx ? ctx[i] : false);
if (IS_ERR(vqs[i])) {
vm_del_vqs(vdev);
diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index f6b0b00e4599..eeff060cacec 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -292,7 +292,7 @@ static int vp_find_vqs_msix(struct virtio_dev
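
The effect of replacing queue_idx++ with i in the hunks above can be
shown with a small standalone sketch. It is only an illustration:
assign_vq_indices() is a hypothetical helper, not a kernel function, and
the names array in the usage note is an example layout. It computes
which device queue index each slot would get under the old (packed) and
new (slot equals index) schemes when some names[] entries are NULL.

```c
#include <stddef.h>

/*
 * Fill idx_out[i] with the device queue index used for slot i, or -1 when
 * names[i] is NULL (queue skipped). With compact != 0 the used slots are
 * numbered contiguously (old behaviour); otherwise slot i maps to queue i.
 */
static void assign_vq_indices(const char *const names[], size_t nvqs,
			      int compact, int idx_out[])
{
	int queue_idx = 0;
	size_t i;

	for (i = 0; i < nvqs; i++) {
		if (!names[i]) {
			idx_out[i] = -1;
			continue;
		}
		idx_out[i] = compact ? queue_idx++ : (int)i;
	}
}
```

For a five-slot layout with only the fourth slot skipped, the packed
scheme gives the last slot index 3, while the new scheme gives it
index 4, which is the numbering the companion balloon patch says the
spec expects for the reporting queue.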

[PATCH 1/2] virtio_balloon: add work around for out of spec QEMU

2024-07-05 Thread Michael S. Tsirkin
QEMU implemented the configuration
VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT
incorrectly: it then uses vq3 for reporting, spec says it is always 4.

This is masked by a corresponding bug in driver:
add a work around as I'm going to try and fix the driver bug.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_balloon.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 9a61febbd2f7..7dc3fcd56238 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -597,8 +597,23 @@ static int init_vqs(struct virtio_balloon *vb)
 
err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
  callbacks, names, NULL);
-   if (err)
-   return err;
+   if (err) {
+   /*
+* Try to work around QEMU bug which since 2020 confused vq numbers
+* when VIRTIO_BALLOON_F_REPORTING but not
+* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered.
+*/
+   if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING) &&
+   !virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "reporting_vq";
+   callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = balloon_ack;
+   err = virtio_find_vqs(vb->vdev,
+ VIRTIO_BALLOON_VQ_REPORTING, vqs, callbacks, names, NULL);
+   }
+
+   if (err)
+   return err;
+   }
 
vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
-- 
MST




Re: [PATCH v2] net: missing check virtio

2024-07-03 Thread Michael S. Tsirkin
On Thu, Jun 13, 2024 at 12:54:48PM +0300, Denis Arefev wrote:
> Two missing checks in virtio_net_hdr_to_skb() allowed syzbot
> to crash kernels again
> 
> 1. After the skb_segment function the buffer may become non-linear
> (nr_frags != 0), but since the SKBTX_SHARED_FRAG flag is not set anywhere
> the __skb_linearize function will not be executed, then the buffer will
> remain non-linear. Then the condition (offset >= skb_headlen(skb))
> becomes true, which causes WARN_ON_ONCE in skb_checksum_help.
> 
> 2. The struct sk_buff and struct virtio_net_hdr members must be
> mathematically related.
> (gso_size) must be greater than (needed) otherwise WARN_ON_ONCE.
> (remainder) must be greater than (needed) otherwise WARN_ON_ONCE.
> (remainder) may be 0 if division is without remainder.
> 
> offset+2 (4191) > skb_headlen() (1116)
> WARNING: CPU: 1 PID: 5084 at net/core/dev.c:3303 
> skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303
> Modules linked in:
> CPU: 1 PID: 5084 Comm: syz-executor336 Not tainted 
> 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0
> Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 
> 11/10/2023
> RIP: 0010:skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303
> Code: 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 52 01 00 00 44 89 e2 2b 
> 53 74 4c 89 ee 48 c7 c7 40 57 e9 8b e8 af 8f dd f8 90 <0f> 0b 90 90 e9 87 fe 
> ff ff e8 40 0f 6e f9 e9 4b fa ff ff 48 89 ef
> RSP: 0018:c90003a9f338 EFLAGS: 00010286
> RAX:  RBX: 888025125780 RCX: 814db209
> RDX: 888015393b80 RSI: 814db216 RDI: 0001
> RBP: 8880251257f4 R08: 0001 R09: 
> R10:  R11: 0001 R12: 045c
> R13: 105f R14: 8880251257f0 R15: 105d
> FS:  55c24380() GS:8880b990() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 2000f000 CR3: 23151000 CR4: 003506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  
>  ip_do_fragment+0xa1b/0x18b0 net/ipv4/ip_output.c:777
>  ip_fragment.constprop.0+0x161/0x230 net/ipv4/ip_output.c:584
>  ip_finish_output_gso net/ipv4/ip_output.c:286 [inline]
>  __ip_finish_output net/ipv4/ip_output.c:308 [inline]
>  __ip_finish_output+0x49c/0x650 net/ipv4/ip_output.c:295
>  ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323
>  NF_HOOK_COND include/linux/netfilter.h:303 [inline]
>  ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:433
>  dst_output include/net/dst.h:451 [inline]
>  ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:129
>  iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82
>  ipip6_tunnel_xmit net/ipv6/sit.c:1034 [inline]
>  sit_tunnel_xmit+0xed2/0x28f0 net/ipv6/sit.c:1076
>  __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4954 [inline]
>  xmit_one net/core/dev.c:3545 [inline]
>  dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3561
>  __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4346
>  dev_queue_xmit include/linux/netdevice.h:3134 [inline]
>  packet_xmit+0x257/0x380 net/packet/af_packet.c:276
>  packet_snd net/packet/af_packet.c:3087 [inline]
>  packet_sendmsg+0x24ca/0x5240 net/packet/af_packet.c:3119
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg+0xd5/0x180 net/socket.c:745
>  __sys_sendto+0x255/0x340 net/socket.c:2190
>  __do_sys_sendto net/socket.c:2202 [inline]
>  __se_sys_sendto net/socket.c:2198 [inline]
>  __x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
>  do_syscall_x64 arch/x86/entry/common.c:51 [inline]
>  do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
>  entry_SYSCALL_64_after_hwframe+0x63/0x6b
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller
> 
> Signed-off-by: Denis Arefev 

I suspect it's this one:

Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")


Though if you can test kernel before that to make sure, that
would be nice.

I'm inclined to merge, crashing syzkaller is not nice.


Acked-by: Michael S. Tsirkin 


> ---
>  V1 -> V2: incorrect type in argument 2
>  include/linux/virtio_net.h | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index 4dfa9b69ca8d..d1d7825318c3 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -56,6 +56,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>   unsigned int thlen = 0;
>   unsigned int p_off = 0;
>   unsigned int ip_proto;
> + u64 ret, remainder, gs

Re: [PATCH] virtio: add missing MODULE_DESCRIPTION() macros

2024-07-02 Thread Michael S. Tsirkin
On Tue, Jul 02, 2024 at 01:10:18PM -0700, Jeff Johnson wrote:
> With ARCH=sh, make allmodconfig && make W=1 C=1 reports:
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/virtio/virtio.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/virtio/virtio_ring.o
> 
> Add the missing invocations of the MODULE_DESCRIPTION() macro.
> 
> Signed-off-by: Jeff Johnson 

tagged, thanks!

> ---
>  drivers/virtio/virtio.c  | 1 +
>  drivers/virtio/virtio_ring.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index b968b2aa5f4d..396d3cd49a1b 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -609,4 +609,5 @@ static void __exit virtio_exit(void)
>  core_initcall(virtio_init);
>  module_exit(virtio_exit);
>  
> +MODULE_DESCRIPTION("Virtio core interface");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 2a972752ff1b..1cac7d5b3062 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -3244,4 +3244,5 @@ void virtqueue_dma_sync_single_range_for_device(struct 
> virtqueue *_vq,
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_dma_sync_single_range_for_device);
>  
> +MODULE_DESCRIPTION("Virtio ring implementation");
>  MODULE_LICENSE("GPL");
> 
> ---
> base-commit: 1dfe225e9af5bd3399a1dbc6a4df6a6041ff9c23
> change-id: 20240702-md-sh-drivers-virtio-704eb84769cb




Re: [PATCH V2 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-06-26 Thread Michael S. Tsirkin
On Wed, Jun 26, 2024 at 09:58:32AM +0800, Jason Wang wrote:
> On Tue, Jun 25, 2024 at 4:32 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jun 25, 2024 at 04:11:05PM +0800, Jason Wang wrote:
> > > On Tue, Jun 25, 2024 at 3:57 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Tue, Jun 25, 2024 at 03:46:44PM +0800, Jason Wang wrote:
> > > > > Workqueue is used to serialize those so we won't lose any change.
> > > >
> > > > So we don't need to re-read then?
> > > >
> > >
> > > We might have to re-read but I don't get why it is a problem for us.
> > >
> > > Thanks
> >
> > I don't think each ethtool command should force a full config read,
> > is what I mean. Only do it if really needed.
> 
> We don't, as we will check config_pending there.
> 
> Thanks

And config_pending set from an interrupt? That's fine.
But it's not what this patch does, right?

> >
> > --
> > MST
> >




Re: [PATCH] virtio-pci: PCI extended capabilities for virtio

2024-06-25 Thread Michael S. Tsirkin
On Tue, Jun 25, 2024 at 02:39:37PM -0400, sahanlb wrote:
> PCI legacy configuration space does not have sufficient space for a device
> that supports all kinds of virtio structures via PCI capabilities. This is
> especially true if one were to use virtio drivers with physical devices.
> Link: https://par.nsf.gov/servlets/purl/10463939
> A physical device may already have many capabilities in the legacy space.
> 
> This patch adds support to place virtio capabilities in the PCI extended
> configuration space and makes the driver search both legacy and extended
> PCI configuration spaces.
> 
> Add new argument to vp_modern_map_capability to indicate whether mapping
> a legacy or extended capability.
> Add new function virtio_pci_find_ext_capability to walk extended
> capabilities and find virtio capabilities.
> 
> Modify vp_modern_probe to search both legacy and extended configuration
> spaces.
> If virtio_pci_find_capability fails to find common, isr, notify, or device
> virtio structures, call virtio_pci_find_ext_capability.
> 
> Notify virtio structure can get mapped either in vp_modern_probe or in
> vp_modern_map_vq_notify. Add new attribute 'notify_ecap' to
> struct virtio_pci_modern_device to indicate whether the notify capability
> is in the extended configuration structure.
> 
> Add virtio extended capability structures to
> "include/uapi/linux/virtio_pci.h".
> Format for the extended structures derived from
> Link: https://lore.kernel.org/all/20220112055755.41011-2-jasow...@redhat.com/
> 
> This patch has been validated using an FPGA development board to implement 
> a virtio interface.
> 
> Signed-off-by: sahanlb 


Thanks for the patch! As any UAPI change, this one has to also
be accompanied by a spec patch documenting the capabilities.


...

> +struct virtio_pci_cfg_ecap {
> + struct virtio_pci_ecap cap;
> + __u8 pci_cfg_data[4]; /* Data for BAR access. */
> +};

Hmm, a weird thing to do. The reason we have it is because
a legacy bios has trouble accessing BAR directly (e.g. if BAR
is 64 bit).  Is there still an issue even for bios with
support for pcie extended config space?


> +
>  /* Macro versions of offsets for the Old Timers! */
>  #define VIRTIO_PCI_CAP_VNDR  0
>  #define VIRTIO_PCI_CAP_NEXT  1


This comes at an interesting time: in order to get virtio working
over the EP protocol, there is interest in exposing all this
information as part of a BAR as opposed to a capability.
I wonder whether that's also palatable.


> -- 
> 2.42.0




Re: [PATCH V2 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-06-25 Thread Michael S. Tsirkin
On Tue, Jun 25, 2024 at 04:11:05PM +0800, Jason Wang wrote:
> On Tue, Jun 25, 2024 at 3:57 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jun 25, 2024 at 03:46:44PM +0800, Jason Wang wrote:
> > > Workqueue is used to serialize those so we won't lose any change.
> >
> > So we don't need to re-read then?
> >
> 
> We might have to re-read but I don't get why it is a problem for us.
> 
> Thanks

I don't think each ethtool command should force a full config read,
is what I mean. Only do it if really needed.

-- 
MST




Re: [PATCH V2 1/3] virtio: allow nested disabling of the configure interrupt

2024-06-25 Thread Michael S. Tsirkin
On Tue, Jun 25, 2024 at 04:18:00PM +0800, Jason Wang wrote:
> > > >
> > > >
> > > >
> > > > But in conclusion ;) if you don't like my suggestion do something else
> > > > but make the APIs make sense,
> > >
> > > I don't say I don't like it:)
> > >
> > > Limiting it to virtio-net seems to be the most easy way. And if we
> > > want to do it in the core, I just want to make nesting to be supported
> > > which might not be necessary now.
> >
> > I feel limiting it to a single driver strikes the right balance ATM.
> 
> Just to make sure I understand here, should we go back to v1 or go
> with the config_driver_disabled?
> 
> Thanks


I still like config_driver_disabled.
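
A rough standalone sketch of what a separate driver-side flag could look
like, next to the core's own enable state, is given below. It is an
illustration only: struct cfg_state and the cfg_*() helpers are
hypothetical, locking is omitted (the real code would hold config_lock),
and misuse is reported through a return value where the example
elsewhere in this thread uses BUG_ON().

```c
#include <stdbool.h>

/* Hypothetical model of the config-change state; not struct virtio_device. */
struct cfg_state {
	bool core_enabled;    /* toggled by the core */
	bool driver_disabled; /* toggled by the driver */
	bool change_pending;  /* a change arrived while delivery was off */
	int delivered;        /* number of callbacks actually delivered */
};

/* A config change interrupt: deliver now, or remember it for later. */
static void cfg_changed(struct cfg_state *s)
{
	if (!s->core_enabled || s->driver_disabled)
		s->change_pending = true;
	else
		s->delivered++;
}

/* Replay one remembered change once both sides allow delivery again. */
static void cfg_flush_pending(struct cfg_state *s)
{
	if (s->change_pending && s->core_enabled && !s->driver_disabled) {
		s->change_pending = false;
		s->delivered++;
	}
}

/* Returns false on misuse (disabling twice); the previous state is known. */
static bool cfg_driver_disable(struct cfg_state *s)
{
	if (s->driver_disabled)
		return false;
	s->driver_disabled = true;
	return true;
}

/* Returns false on misuse (enabling when not disabled). */
static bool cfg_driver_enable(struct cfg_state *s)
{
	if (!s->driver_disabled)
		return false;
	s->driver_disabled = false;
	cfg_flush_pending(s);
	return true;
}
```

Because each flag has exactly two states, an unbalanced call is detected
at the call site instead of drifting a counter below zero.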


> >
> > >
> > > > at least do better than +5
> > > > on Rusty's interface design scale.
> > > >
> > > > >
> > >
> > > Thanks
> > >
> > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > @@ -455,7 +461,7 @@ int register_virtio_device(struct 
> > > > > > > virtio_device *dev)
> > > > > > >   goto out_ida_remove;
> > > > > > >
> > > > > > >   spin_lock_init(>config_lock);
> > > > > > > - dev->config_enabled = false;
> > > > > > > + dev->config_enabled = 0;
> > > > > > >   dev->config_change_pending = false;
> > > > > > >
> > > > > > >   INIT_LIST_HEAD(>vqs);
> > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > > > index 96fea920873b..4496f9ba5d82 100644
> > > > > > > --- a/include/linux/virtio.h
> > > > > > > +++ b/include/linux/virtio.h
> > > > > > > @@ -132,7 +132,7 @@ struct virtio_admin_cmd {
> > > > > > >  struct virtio_device {
> > > > > > >   int index;
> > > > > > >   bool failed;
> > > > > > > - bool config_enabled;
> > > > > > > + int config_enabled;
> > > > > > >   bool config_change_pending;
> > > > > > >   spinlock_t config_lock;
> > > > > > >   spinlock_t vqs_list_lock;
> > > > > > > --
> > > > > > > 2.31.1
> > > > > >
> > > >
> >




Re: [PATCH V2 1/3] virtio: allow nested disabling of the configure interrupt

2024-06-25 Thread Michael S. Tsirkin
On Tue, Jun 25, 2024 at 03:50:30PM +0800, Jason Wang wrote:
> On Tue, Jun 25, 2024 at 3:11 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jun 25, 2024 at 09:27:04AM +0800, Jason Wang wrote:
> > > On Mon, Jun 24, 2024 at 5:59 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Jun 24, 2024 at 10:45:21AM +0800, Jason Wang wrote:
> > > > > > Sometimes a driver may want to enable or disable the config callback.
> > > > > > This requires synchronization with the core, so this patch changes
> > > > > > config_enabled to an integer counter. This allows the toggling of
> > > > > > config_enabled to be synchronized between the virtio core and the
> > > > > > virtio driver.
> > > > > >
> > > > > > The counter is not allowed to increase beyond one. This
> > > > > > simplifies the logic: the interrupt can be disabled immediately
> > > > > > without extra synchronization between driver and core.
> > > > >
> > > > > Signed-off-by: Jason Wang 
> > > > > ---
> > > > >  drivers/virtio/virtio.c | 20 +---
> > > > >  include/linux/virtio.h  |  2 +-
> > > > >  2 files changed, 14 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > index b968b2aa5f4d..d3aa74b8ae5d 100644
> > > > > --- a/drivers/virtio/virtio.c
> > > > > +++ b/drivers/virtio/virtio.c
> > > > > @@ -127,7 +127,7 @@ static void __virtio_config_changed(struct 
> > > > > virtio_device *dev)
> > > > >  {
> > > > >   struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> > > > >
> > > > > - if (!dev->config_enabled)
> > > > > + if (dev->config_enabled < 1)
> > > > >   dev->config_change_pending = true;
> > > > >   else if (drv && drv->config_changed)
> > > > >   drv->config_changed(dev);
> > > > > @@ -146,17 +146,23 @@ EXPORT_SYMBOL_GPL(virtio_config_changed);
> > > > >  static void virtio_config_disable(struct virtio_device *dev)
> > > > >  {
> > > > >   spin_lock_irq(>config_lock);
> > > > > - dev->config_enabled = false;
> > > > > + --dev->config_enabled;
> > > > >   spin_unlock_irq(>config_lock);
> > > > >  }
> > > > >
> > > > >  static void virtio_config_enable(struct virtio_device *dev)
> > > > >  {
> > > > >   spin_lock_irq(>config_lock);
> > > > > - dev->config_enabled = true;
> > > > > - if (dev->config_change_pending)
> > > > > - __virtio_config_changed(dev);
> > > > > - dev->config_change_pending = false;
> > > > > +
> > > > > + if (dev->config_enabled < 1) {
> > > > > + ++dev->config_enabled;
> > > > > + if (dev->config_enabled == 1 &&
> > > > > + dev->config_change_pending) {
> > > > > + __virtio_config_changed(dev);
> > > > > + dev->config_change_pending = false;
> > > > > + }
> > > > > + }
> > > > > +
> > > > >   spin_unlock_irq(>config_lock);
> > > > >  }
> > > > >
> > > >
> > > > So every disable decrements the counter. Enable only increments it up 
> > > > to 1.
> > > > You seem to be making some very specific assumptions
> > > > about how this API will be used. Any misuse will lead to under/overflow
> > > > eventually ...
> > > >
> > >
> > > Well, a counter gives us more information than a boolean. With
> > > boolean, misuse is even harder to be noticed.
> >
> > With boolean we can prevent misuse easily because previous state
> > is known exactly. E.g.:
> >
> > static void virtio_config_driver_disable(struct virtio_device *dev)
> > {
> > BUG_ON(dev->config_driver_disabled);
> > dev->config_driver_disabled = true;
> > }
> >
> >
> >
> > static void virtio_config_driver_enable(struct virtio_device *dev)
> > {
> > BUG_ON(!dev->

Re: [PATCH V2 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-06-25 Thread Michael S. Tsirkin
On Tue, Jun 25, 2024 at 03:46:44PM +0800, Jason Wang wrote:
> Workqueue is used to serialize those so we won't lose any change.

So we don't need to re-read then?




Re: [PATCH V2 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-06-25 Thread Michael S. Tsirkin
On Tue, Jun 25, 2024 at 09:27:38AM +0800, Jason Wang wrote:
> On Mon, Jun 24, 2024 at 6:07 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Jun 24, 2024 at 10:45:23AM +0800, Jason Wang wrote:
> > > > This patch synchronizes operstate with admin state per RFC2863.
> > > >
> > > > This is done by trying to toggle the carrier upon open/close and
> > > > synchronizing with the config change work. This allows status to
> > > > propagate correctly to stacked devices like:
> > >
> > > ip link add link enp0s3 macvlan0 type macvlan
> > > ip link set link enp0s3 down
> > > ip link show
> > >
> > > Before this patch:
> > >
> > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN 
> > > mode DEFAULT group default qlen 1000
> > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > ..
> > > 5: macvlan0@enp0s3:  mtu 1500 
> > > qdisc noqueue state UP mode DEFAULT group default qlen 1000
> > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > >
> > > After this patch:
> > >
> > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN 
> > > mode DEFAULT group default qlen 1000
> > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > ...
> > > 5: macvlan0@enp0s3:  mtu 1500 
> > > qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > >
> > > Cc: Venkat Venkatsubra 
> > > Cc: Gia-Khanh Nguyen 
> > > Signed-off-by: Jason Wang 
> > > ---
> > >  drivers/net/virtio_net.c | 72 +++-
> > >  1 file changed, 42 insertions(+), 30 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index b1f8b720733e..eff3ad3d6bcc 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -2468,6 +2468,25 @@ static int virtnet_enable_queue_pair(struct 
> > > virtnet_info *vi, int qp_index)
> > >   return err;
> > >  }
> > >
> > > +static void virtnet_update_settings(struct virtnet_info *vi)
> > > +{
> > > + u32 speed;
> > > + u8 duplex;
> > > +
> > > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> > > + return;
> > > +
> > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> > > > +
> > > > + if (ethtool_validate_speed(speed))
> > > > + vi->speed = speed;
> > > > +
> > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> > > +
> > > + if (ethtool_validate_duplex(duplex))
> > > + vi->duplex = duplex;
> > > +}
> > > +
> > >  static int virtnet_open(struct net_device *dev)
> > >  {
> > >   struct virtnet_info *vi = netdev_priv(dev);
> > > @@ -2486,6 +2505,22 @@ static int virtnet_open(struct net_device *dev)
> > >   goto err_enable_qp;
> > >   }
> > >
> > > + /* Assume link up if device can't report link status,
> > > +otherwise get link status from config. */
> > > + netif_carrier_off(dev);
> > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> > > + virtio_config_enable(vi->vdev);
> > > + /* We are not sure if config interrupt is disabled by
> > > +  * core or not, so we can't schedule config_work by
> > > +  * ourselves.
> > > +  */
> >
> > This comment confuses more than it explains.
> > You seem to be arguing about some alternative design
> > you had in mind, but readers don't have it in mind.
> >
> >
> > Please just explain what this does and why.
> > For what: something like "Trigger re-read of config - same
> > as we'd do if config changed".
> >
> > Now, please do what you don't do here: explain the why.
> >
> > Why do we want all these VM exits on each open/close, as opposed to
> > once on probe and later on each config-changed interrupt?
> 
> Fine, the main reason is that a config interrupt might be pending
> during ifdown and core may disable configure interrupt due to several
> reasons.
> 
> Thanks

If the config changes exactly as command is executing?
Then we'll get an interrupt later and u

Re: [PATCH V2 1/3] virtio: allow nested disabling of the configure interrupt

2024-06-25 Thread Michael S. Tsirkin
On Tue, Jun 25, 2024 at 09:27:04AM +0800, Jason Wang wrote:
> On Mon, Jun 24, 2024 at 5:59 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Jun 24, 2024 at 10:45:21AM +0800, Jason Wang wrote:
> > > > Sometimes a driver may want to enable or disable the config callback.
> > > > This requires synchronization with the core. So this patch changes
> > > > config_enabled to be an integer counter. This allows the toggling of
> > > > config_enabled to be synchronized between the virtio core and the
> > > > virtio driver.
> > > >
> > > > The counter is not allowed to be increased beyond one; this
> > > > simplifies the logic, as the interrupt can be disabled immediately
> > > > without extra synchronization between driver and core.
> > >
> > > Signed-off-by: Jason Wang 
> > > ---
> > >  drivers/virtio/virtio.c | 20 +---
> > >  include/linux/virtio.h  |  2 +-
> > >  2 files changed, 14 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index b968b2aa5f4d..d3aa74b8ae5d 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -127,7 +127,7 @@ static void __virtio_config_changed(struct 
> > > virtio_device *dev)
> > >  {
> > >   struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> > >
> > > - if (!dev->config_enabled)
> > > + if (dev->config_enabled < 1)
> > >   dev->config_change_pending = true;
> > >   else if (drv && drv->config_changed)
> > >   drv->config_changed(dev);
> > > @@ -146,17 +146,23 @@ EXPORT_SYMBOL_GPL(virtio_config_changed);
> > >  static void virtio_config_disable(struct virtio_device *dev)
> > >  {
> > > >   spin_lock_irq(&dev->config_lock);
> > > > - dev->config_enabled = false;
> > > > + --dev->config_enabled;
> > > >   spin_unlock_irq(&dev->config_lock);
> > > >  }
> > > >
> > > >  static void virtio_config_enable(struct virtio_device *dev)
> > > >  {
> > > >   spin_lock_irq(&dev->config_lock);
> > > - dev->config_enabled = true;
> > > - if (dev->config_change_pending)
> > > - __virtio_config_changed(dev);
> > > - dev->config_change_pending = false;
> > > +
> > > + if (dev->config_enabled < 1) {
> > > + ++dev->config_enabled;
> > > + if (dev->config_enabled == 1 &&
> > > + dev->config_change_pending) {
> > > + __virtio_config_changed(dev);
> > > + dev->config_change_pending = false;
> > > + }
> > > + }
> > > +
> > > >   spin_unlock_irq(&dev->config_lock);
> > >  }
> > >
> >
> > So every disable decrements the counter. Enable only increments it up to 1.
> > You seem to be making some very specific assumptions
> > about how this API will be used. Any misuse will lead to under/overflow
> > eventually ...
> >
> 
> Well, a counter gives us more information than a boolean. With
> boolean, misuse is even harder to be noticed.

With boolean we can prevent misuse easily because previous state
is known exactly. E.g.:

static void virtio_config_driver_disable(struct virtio_device *dev)
{
	BUG_ON(dev->config_driver_disabled);
	dev->config_driver_disabled = true;
}

static void virtio_config_driver_enable(struct virtio_device *dev)
{
	BUG_ON(!dev->config_driver_disabled);
	dev->config_driver_disabled = false;
}


This does not work with an integer: you simply have no idea what the
value should be at the point of call.
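
To make the contrast concrete, here is a minimal user-space sketch, not
kernel code. The struct and function names are hypothetical, locking is
left out, and the boolean helpers return false where the kernel example
above would hit BUG_ON():

```c
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's struct virtio_device. */
struct cfg_model {
	int counter;           /* models the proposed int config_enabled */
	bool driver_disabled;  /* models the suggested bool flag */
};

/* Counter variant from the patch: every disable decrements. */
static void counter_disable(struct cfg_model *m)
{
	--m->counter;
}

/* Counter variant from the patch: enable increments, but never above 1. */
static void counter_enable(struct cfg_model *m)
{
	if (m->counter < 1)
		++m->counter;
}

/*
 * Boolean variant: the previous state is known exactly, so an
 * unbalanced call is detected right where it happens.
 * Returns false on misuse instead of crashing like BUG_ON().
 */
static bool bool_disable(struct cfg_model *m)
{
	if (m->driver_disabled)
		return false;
	m->driver_disabled = true;
	return true;
}

static bool bool_enable(struct cfg_model *m)
{
	if (!m->driver_disabled)
		return false;
	m->driver_disabled = false;
	return true;
}
```

With the counter, a redundant enable at 1 is silently dropped, so a
following single disable leaves the callback off, and nothing at either
call site can tell that the calls were unbalanced. The boolean helpers
flag the redundant call immediately.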


> >
> >
> > My suggestion would be to
> > 1. rename config_enabled to config_core_enabled
> > 2. rename virtio_config_enable/disable to virtio_config_core_enable/disable
> > 3. add bool config_driver_disabled and make virtio_config_enable/disable
> >switch that.
> > 4. Change logic from dev->config_enabled to
> >dev->config_core_enabled && !dev->config_driver_disabled
> 
> If we make config_driver_disabled by default true,

No, we make it false by default.

> we need someone to
> enable it explicitly. If it's core, it breaks the semantic that it is
> under the control of the driver (or needs to synchronize with the
> driver). If it's a driver, each driver needs to enable it at some time
> which can be easily forgotten. And if we end up with workarounds l

Re: [PATCH V2 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-06-24 Thread Michael S. Tsirkin
On Mon, Jun 24, 2024 at 10:45:23AM +0800, Jason Wang wrote:
> This patch synchronizes operstate with admin state per RFC2863.
> 
> This is done by trying to toggle the carrier upon open/close and
> synchronizing with the config change work. This allows status to
> propagate correctly to stacked devices like:
> 
> ip link add link enp0s3 macvlan0 type macvlan
> ip link set link enp0s3 down
> ip link show
> 
> Before this patch:
> 
> 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> ..
> 5: macvlan0@enp0s3:  mtu 1500 qdisc 
> noqueue state UP mode DEFAULT group default qlen 1000
> link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> 
> After this patch:
> 
> 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> ...
> 5: macvlan0@enp0s3:  mtu 1500 qdisc 
> noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> 
> Cc: Venkat Venkatsubra 
> Cc: Gia-Khanh Nguyen 
> Signed-off-by: Jason Wang 
> ---
>  drivers/net/virtio_net.c | 72 +++-
>  1 file changed, 42 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b1f8b720733e..eff3ad3d6bcc 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2468,6 +2468,25 @@ static int virtnet_enable_queue_pair(struct 
> virtnet_info *vi, int qp_index)
>   return err;
>  }
>  
> +static void virtnet_update_settings(struct virtnet_info *vi)
> +{
> + u32 speed;
> + u8 duplex;
> +
> + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> + return;
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> +
> + if (ethtool_validate_speed(speed))
> + vi->speed = speed;
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> +
> + if (ethtool_validate_duplex(duplex))
> + vi->duplex = duplex;
> +}
> +
>  static int virtnet_open(struct net_device *dev)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> @@ -2486,6 +2505,22 @@ static int virtnet_open(struct net_device *dev)
>   goto err_enable_qp;
>   }
>  
> + /* Assume link up if device can't report link status,
> +otherwise get link status from config. */
> + netif_carrier_off(dev);
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> + virtio_config_enable(vi->vdev);
> + /* We are not sure if config interrupt is disabled by
> +  * core or not, so we can't schedule config_work by
> +  * ourselves.
> +  */

This comment confuses more than it explains.
You seem to be arguing about some alternative design
you had in mind, but readers don't have it in mind.


Please just explain what this does and why.
For what: something like "Trigger re-read of config - same
as we'd do if config changed".

Now, please do what you don't do here: explain the why.

Why do we want all these VM exits on each open/close, as opposed to
once on probe and later on each config-changed interrupt?


> + virtio_config_changed(vi->vdev);
> + } else {
> + vi->status = VIRTIO_NET_S_LINK_UP;
> + virtnet_update_settings(vi);
> + netif_carrier_on(dev);
> + }
> +
>   return 0;
>  
>  err_enable_qp:
> @@ -2928,12 +2963,19 @@ static int virtnet_close(struct net_device *dev)
>   disable_delayed_refill(vi);
>   /* Make sure refill_work doesn't re-enable napi! */
>   cancel_delayed_work_sync(&vi->refill);
> + /* Make sure config notification doesn't schedule config work */
> + virtio_config_disable(vi->vdev);
> + /* Make sure status updating is cancelled */
> + cancel_work_sync(&vi->config_work);
>  
>   for (i = 0; i < vi->max_queue_pairs; i++) {
>   virtnet_disable_queue_pair(vi, i);
>   cancel_work_sync(&vi->rq[i].dim.work);
>   }
>  
> + vi->status &= ~VIRTIO_NET_S_LINK_UP;
> + netif_carrier_off(dev);
> +
>   return 0;
>  }
>  
> @@ -4632,25 +4674,6 @@ static void virtnet_init_settings(struct net_device 
> *dev)
>   vi->duplex = DUPLEX_UNKNOWN;
>  }
>  
> -static void virtnet_update_settings(struct virtnet_info *vi)
> -{
> - u32 speed;
> - u8 duplex;
> -
> - if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> - return;
> -
> - virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> -
> - if (ethtool_validate_speed(speed))
> - vi->speed = speed;
> -
> - virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> -
> - if (ethtool_validate_duplex(duplex))
> - vi->duplex = duplex;
> -}
> -
>  static u32 virtnet_get_rxfh_key_size(struct net_device *dev)
>  {
>   return 

Re: [PATCH V2 1/3] virtio: allow nested disabling of the configure interrupt

2024-06-24 Thread Michael S. Tsirkin
On Mon, Jun 24, 2024 at 10:45:21AM +0800, Jason Wang wrote:
> Sometimes a driver may want to enable or disable the config callback.
> This requires synchronization with the core. So this patch changes
> config_enabled to be an integer counter. This allows the toggling of
> config_enabled to be synchronized between the virtio core and the
> virtio driver.
> 
> The counter is not allowed to be increased beyond one; this
> simplifies the logic, as the interrupt can be disabled immediately
> without extra synchronization between driver and core.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio.c | 20 +---
>  include/linux/virtio.h  |  2 +-
>  2 files changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index b968b2aa5f4d..d3aa74b8ae5d 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -127,7 +127,7 @@ static void __virtio_config_changed(struct virtio_device 
> *dev)
>  {
>   struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
>  
> - if (!dev->config_enabled)
> + if (dev->config_enabled < 1)
>   dev->config_change_pending = true;
>   else if (drv && drv->config_changed)
>   drv->config_changed(dev);
> @@ -146,17 +146,23 @@ EXPORT_SYMBOL_GPL(virtio_config_changed);
>  static void virtio_config_disable(struct virtio_device *dev)
>  {
>   spin_lock_irq(&dev->config_lock);
> - dev->config_enabled = false;
> + --dev->config_enabled;
>   spin_unlock_irq(&dev->config_lock);
>  }
>  
>  static void virtio_config_enable(struct virtio_device *dev)
>  {
>   spin_lock_irq(&dev->config_lock);
> - dev->config_enabled = true;
> - if (dev->config_change_pending)
> - __virtio_config_changed(dev);
> - dev->config_change_pending = false;
> +
> + if (dev->config_enabled < 1) {
> + ++dev->config_enabled;
> + if (dev->config_enabled == 1 &&
> + dev->config_change_pending) {
> + __virtio_config_changed(dev);
> + dev->config_change_pending = false;
> + }
> + }
> +
>   spin_unlock_irq(&dev->config_lock);
>  }
>

So every disable decrements the counter. Enable only increments it up to 1.
You seem to be making some very specific assumptions
about how this API will be used. Any misuse will lead to under/overflow
eventually ...



My suggestion would be to
1. rename config_enabled to config_core_enabled
2. rename virtio_config_enable/disable to virtio_config_core_enable/disable
3. add bool config_driver_disabled and make virtio_config_enable/disable
   switch that.
4. Change logic from dev->config_enabled to
   dev->config_core_enabled && !dev->config_driver_disabled
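
The four steps above could be modelled roughly as follows. This is a
user-space sketch only: the two flag names come from the list, while the
struct, the helper names, and the delivery counter are assumptions made
for illustration, and the config_lock handling is omitted.

```c
#include <stdbool.h>

/* Hypothetical model; only the two flag names come from the suggestion. */
struct vdev_model {
	bool config_core_enabled;     /* toggled by the virtio core only */
	bool config_driver_disabled;  /* toggled by the driver only */
	bool config_change_pending;
	int callbacks_delivered;      /* stands in for drv->config_changed() */
};

/* Step 4: the callback runs only if neither side has it switched off. */
static bool config_active(const struct vdev_model *d)
{
	return d->config_core_enabled && !d->config_driver_disabled;
}

/* Deliver a config change, or remember it while delivery is off. */
static void config_changed(struct vdev_model *d)
{
	if (!config_active(d))
		d->config_change_pending = true;
	else
		d->callbacks_delivered++;
}

/* Replay a deferred change once both sides allow delivery again. */
static void flush_pending(struct vdev_model *d)
{
	if (config_active(d) && d->config_change_pending) {
		d->config_change_pending = false;
		d->callbacks_delivered++;
	}
}

/* Steps 1 and 2: the core-side toggles only touch the core flag. */
static void config_core_enable(struct vdev_model *d)
{
	d->config_core_enabled = true;
	flush_pending(d);
}

static void config_core_disable(struct vdev_model *d)
{
	d->config_core_enabled = false;
}

/* Step 3: the driver-side toggles only touch the driver flag. */
static void config_driver_enable(struct vdev_model *d)
{
	d->config_driver_disabled = false;
	flush_pending(d);
}

static void config_driver_disable(struct vdev_model *d)
{
	d->config_driver_disabled = true;
}
```

Each side owns exactly one flag, so neither needs to know what the other
did: a change that arrives while either flag blocks delivery stays
pending until the last blocker is lifted.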



  
> @@ -455,7 +461,7 @@ int register_virtio_device(struct virtio_device *dev)
>   goto out_ida_remove;
>  
>   spin_lock_init(&dev->config_lock);
> - dev->config_enabled = false;
> + dev->config_enabled = 0;
>   dev->config_change_pending = false;
>  
>   INIT_LIST_HEAD(&dev->vqs);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 96fea920873b..4496f9ba5d82 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -132,7 +132,7 @@ struct virtio_admin_cmd {
>  struct virtio_device {
>   int index;
>   bool failed;
> - bool config_enabled;
> + int config_enabled;
>   bool config_change_pending;
>   spinlock_t config_lock;
>   spinlock_t vqs_list_lock;
> -- 
> 2.31.1




[PATCH net-next] net: virtio: unify code to init stats

2024-06-20 Thread Michael S. Tsirkin
Moving initialization of stats structure into
__free_old_xmit reduces the code size slightly.
It also makes it clearer that this function shouldn't
be called multiple times on the same stats struct.

Signed-off-by: Michael S. Tsirkin 
---

Especially important now that Jiri's patch for BQL has been merged.
Lightly tested.

 drivers/net/virtio_net.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 283b34d50296..c2ce8de340f7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -383,6 +383,8 @@ static void __free_old_xmit(struct send_queue *sq, bool 
in_napi,
unsigned int len;
void *ptr;
 
+   stats->bytes = stats->packets = 0;
+
    while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
++stats->packets;
 
@@ -828,7 +830,7 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, 
void *buf)
 
 static void free_old_xmit(struct send_queue *sq, bool in_napi)
 {
-   struct virtnet_sq_free_stats stats = {0};
+   struct virtnet_sq_free_stats stats;
 
    __free_old_xmit(sq, in_napi, &stats);
 
@@ -979,7 +981,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
int n, struct xdp_frame **frames, u32 flags)
 {
struct virtnet_info *vi = netdev_priv(dev);
-   struct virtnet_sq_free_stats stats = {0};
+   struct virtnet_sq_free_stats stats;
struct receive_queue *rq = vi->rq;
struct bpf_prog *xdp_prog;
struct send_queue *sq;
-- 
MST
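
The pattern in this patch, where the callee resets the out-parameter so
callers no longer need "= {0}", can be shown with a small user-space
sketch. The struct and function below are hypothetical stand-ins, not
the driver's virtnet_sq_free_stats or __free_old_xmit():

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-call free statistics. */
struct free_stats {
	unsigned long long packets;
	unsigned long long bytes;
};

/*
 * The callee zeroes the out-parameter before accumulating, so a caller
 * may pass an uninitialized struct. lens[] stands in for the buffer
 * lengths a virtqueue would hand back.
 */
static void count_freed(const unsigned int *lens, size_t n,
			struct free_stats *stats)
{
	size_t i;

	stats->bytes = stats->packets = 0;

	for (i = 0; i < n; i++) {
		++stats->packets;
		stats->bytes += lens[i];
	}
}
```

A second call on the same struct overwrites the earlier totals rather
than adding to them, which is why the function should not be called
multiple times on one stats struct when a running total is expected.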




Re: [PATCH net-next V2] virtio-net: synchronize operstate with admin state on up/down

2024-06-19 Thread Michael S. Tsirkin
On Thu, Jun 06, 2024 at 08:22:13AM +0800, Jason Wang wrote:
> On Fri, May 31, 2024 at 8:18 AM Jason Wang  wrote:
> >
> > On Thu, May 30, 2024 at 9:09 PM Michael S. Tsirkin  wrote:
> > >
> > > On Thu, May 30, 2024 at 06:29:51PM +0800, Jason Wang wrote:
> > > > On Thu, May 30, 2024 at 2:10 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Thu, May 30, 2024 at 11:20:55AM +0800, Jason Wang wrote:
> > > > > > > This patch synchronizes operstate with admin state per RFC2863.
> > > > > > >
> > > > > > > This is done by trying to toggle the carrier upon open/close and
> > > > > > > synchronizing with the config change work. This allows status to
> > > > > > > propagate correctly to stacked devices like:
> > > > > >
> > > > > > ip link add link enp0s3 macvlan0 type macvlan
> > > > > > ip link set link enp0s3 down
> > > > > > ip link show
> > > > > >
> > > > > > Before this patch:
> > > > > >
> > > > > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state 
> > > > > > DOWN mode DEFAULT group default qlen 1000
> > > > > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > > > > ..
> > > > > > 5: macvlan0@enp0s3:  mtu 
> > > > > > 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> > > > > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > > > > >
> > > > > > After this patch:
> > > > > >
> > > > > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state 
> > > > > > DOWN mode DEFAULT group default qlen 1000
> > > > > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > > > > ...
> > > > > > 5: macvlan0@enp0s3:  mtu 
> > > > > > 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default 
> > > > > > qlen 1000
> > > > > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > > > > >
> > > > > > Cc: Venkat Venkatsubra 
> > > > > > Cc: Gia-Khanh Nguyen 
> > > > > > Reviewed-by: Xuan Zhuo 
> > > > > > Acked-by: Michael S. Tsirkin 
> > > > > > Signed-off-by: Jason Wang 
> > > > > > ---
> > > > > > Changes since V1:
> > > > > > - rebase
> > > > > > - add ack/review tags
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 94 
> > > > > > +++-
> > > > > >  1 file changed, 63 insertions(+), 31 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 4a802c0ea2cb..69e4ae353c51 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -433,6 +433,12 @@ struct virtnet_info {
> > > > > >   /* The lock to synchronize the access to refill_enabled */
> > > > > >   spinlock_t refill_lock;
> > > > > >
> > > > > > + /* Is config change enabled? */
> > > > > > + bool config_change_enabled;
> > > > > > +
> > > > > > + /* The lock to synchronize the access to 
> > > > > > config_change_enabled */
> > > > > > + spinlock_t config_change_lock;
> > > > > > +
> > > > > >   /* Work struct for config space updates */
> > > > > >   struct work_struct config_work;
> > > > > >
> > > > >
> > > > >
> > > > > But we already have dev->config_lock and dev->config_enabled.
> > > > >
> > > > > And it actually works better - instead of discarding config
> > > > > change events it defers them until enabled.
> > > > >
> > > >
> > > > Yes but then both virtio-net driver and virtio core can ask to enable
> > > > and disable and then we need some kind of synchronization which is
> > > > non-trivial.
> > >
> > > We

Re: [PATCH v2] net: missing check virtio

2024-06-19 Thread Michael S. Tsirkin
On Thu, Jun 13, 2024 at 12:54:48PM +0300, Denis Arefev wrote:
> Two missing checks in virtio_net_hdr_to_skb() allowed syzbot
> to crash kernels again.
> 
> 1. After the skb_segment function the buffer may become non-linear
> (nr_frags != 0), but since the SKBTX_SHARED_FRAG flag is not set anywhere
> the __skb_linearize function will not be executed, then the buffer will
> remain non-linear. Then the condition (offset >= skb_headlen(skb))
> becomes true, which causes WARN_ON_ONCE in skb_checksum_help.
> 
> 2. The struct sk_buff and struct virtio_net_hdr members must be
> mathematically related.
> (gso_size) must be greater than (needed) otherwise WARN_ON_ONCE.
> (remainder) must be greater than (needed) otherwise WARN_ON_ONCE.
> (remainder) may be 0 if division is without remainder.
> 
> offset+2 (4191) > skb_headlen() (1116)
> WARNING: CPU: 1 PID: 5084 at net/core/dev.c:3303 
> skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303
> Modules linked in:
> CPU: 1 PID: 5084 Comm: syz-executor336 Not tainted 
> 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0
> Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 
> 11/10/2023
> RIP: 0010:skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303
> Code: 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 52 01 00 00 44 89 e2 2b 
> 53 74 4c 89 ee 48 c7 c7 40 57 e9 8b e8 af 8f dd f8 90 <0f> 0b 90 90 e9 87 fe 
> ff ff e8 40 0f 6e f9 e9 4b fa ff ff 48 89 ef
> RSP: 0018:c90003a9f338 EFLAGS: 00010286
> RAX:  RBX: 888025125780 RCX: 814db209
> RDX: 888015393b80 RSI: 814db216 RDI: 0001
> RBP: 8880251257f4 R08: 0001 R09: 
> R10:  R11: 0001 R12: 045c
> R13: 105f R14: 8880251257f0 R15: 105d
> FS:  55c24380() GS:8880b990() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 2000f000 CR3: 23151000 CR4: 003506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  
>  ip_do_fragment+0xa1b/0x18b0 net/ipv4/ip_output.c:777
>  ip_fragment.constprop.0+0x161/0x230 net/ipv4/ip_output.c:584
>  ip_finish_output_gso net/ipv4/ip_output.c:286 [inline]
>  __ip_finish_output net/ipv4/ip_output.c:308 [inline]
>  __ip_finish_output+0x49c/0x650 net/ipv4/ip_output.c:295
>  ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323
>  NF_HOOK_COND include/linux/netfilter.h:303 [inline]
>  ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:433
>  dst_output include/net/dst.h:451 [inline]
>  ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:129
>  iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82
>  ipip6_tunnel_xmit net/ipv6/sit.c:1034 [inline]
>  sit_tunnel_xmit+0xed2/0x28f0 net/ipv6/sit.c:1076
>  __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4954 [inline]
>  xmit_one net/core/dev.c:3545 [inline]
>  dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3561
>  __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4346
>  dev_queue_xmit include/linux/netdevice.h:3134 [inline]
>  packet_xmit+0x257/0x380 net/packet/af_packet.c:276
>  packet_snd net/packet/af_packet.c:3087 [inline]
>  packet_sendmsg+0x24ca/0x5240 net/packet/af_packet.c:3119
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg+0xd5/0x180 net/socket.c:745
>  __sys_sendto+0x255/0x340 net/socket.c:2190
>  __do_sys_sendto net/socket.c:2202 [inline]
>  __se_sys_sendto net/socket.c:2198 [inline]
>  __x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
>  do_syscall_x64 arch/x86/entry/common.c:51 [inline]
>  do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
>  entry_SYSCALL_64_after_hwframe+0x63/0x6b
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller
> 
> Signed-off-by: Denis Arefev 
> ---
>  V1 -> V2: incorrect type in argument 2
>  include/linux/virtio_net.h | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index 4dfa9b69ca8d..d1d7825318c3 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -56,6 +56,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>   unsigned int thlen = 0;
>   unsigned int p_off = 0;
>   unsigned int ip_proto;
> + u64 ret, remainder, gso_size;
>  
>   if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
>   switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> @@ -98,6 +99,16 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff 
> *skb,
>   u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
>   u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
>  
> + if (hdr->gso_size) {
> + gso_size = __virtio16_to_cpu(little_endian, 
> hdr->gso_size);
> + ret = div64_u64_rem(skb->len, gso_size, 

Re: [PATCH 1/2] vdpa: support set mac address from vdpa tool

2024-06-18 Thread Michael S. Tsirkin
On Mon, Jun 17, 2024 at 09:44:21AM -0700, Jakub Kicinski wrote:
> On Mon, 17 Jun 2024 12:20:19 -0400 Michael S. Tsirkin wrote:
> > > But the virtio spec doesn't allow setting the MAC...
> > > I'm probably just lost in the conversation but there's hypervisor side
> > > and there is user/VM side, each of them already has an interface to set
> > > the MAC. The MAC doesn't matter, but I want to make sure my mental model
> > > matches reality in case we start duplicating too much..  
> > 
> > An obvious part of provisioning is specifying the config space
> > of the device.
> 
> Agreed, that part is obvious.
> Please go ahead, I don't really care and you clearly don't have time
> to explain.

Thanks!
Just in case Cindy who is working on it is also confused,
here is what I meant:

- an interface to provision a device, including its config
  space, makes sense to me
- default mac address is part of config space, and would thus be covered
- note how this is different from ability to tweak the mac of an existing
  device


-- 
MST




Re: [PATCH 1/2] vdpa: support set mac address from vdpa tool

2024-06-17 Thread Michael S. Tsirkin
On Mon, Jun 17, 2024 at 08:20:02AM -0700, Jakub Kicinski wrote:
> On Mon, 17 Jun 2024 09:47:21 -0400 Michael S. Tsirkin wrote:
> > I don't know what this discussion is about, at this point.
> > For better or worse, vdpa gained interfaces for provisioning
> > new devices. Yes the solution space was wide but it's been there
> > for years so kind of too late to try and make people
> > move to another interface for that.
> > 
> > Having said that, vdpa interfaces are all built around
> > virtio spec. Let's try to stick to that.
> 
> But the virtio spec doesn't allow setting the MAC...
> I'm probably just lost in the conversation but there's hypervisor side
> and there is user/VM side, each of them already has an interface to set
> the MAC. The MAC doesn't matter, but I want to make sure my mental model
> matches reality in case we start duplicating too much..

An obvious part of provisioning is specifying the config space
of the device.

-- 
MST




Re: [PATCH 1/2] vdpa: support set mac address from vdpa tool

2024-06-17 Thread Michael S. Tsirkin
On Mon, Jun 17, 2024 at 03:02:43PM +0200, Jiri Pirko wrote:
> Mon, Jun 17, 2024 at 01:48:02PM CEST, pa...@nvidia.com wrote:
> >
> >> From: Jiri Pirko 
> >> Sent: Monday, June 17, 2024 5:10 PM
> >> 
> >> Mon, Jun 17, 2024 at 11:44:53AM CEST, pa...@nvidia.com wrote:
> >> >
> >> >> From: Jiri Pirko 
> >> >> Sent: Monday, June 17, 2024 3:09 PM
> >> >>
> >> >> Mon, Jun 17, 2024 at 04:57:23AM CEST, pa...@nvidia.com wrote:
> >> >> >
> >> >> >
> >> >> >> From: Jason Wang 
> >> >> >> Sent: Monday, June 17, 2024 7:18 AM
> >> >> >>
> >> >> >> On Wed, Jun 12, 2024 at 2:30 PM Jiri Pirko  wrote:
> >> >> >> >
> >> >> >> > Wed, Jun 12, 2024 at 03:58:10AM CEST, k...@kernel.org wrote:
> >> >> >> > >On Tue, 11 Jun 2024 13:32:32 +0800 Cindy Lu wrote:
> >> >> >> > >> Add new UAPI to support the mac address from vdpa tool
> >> >> >> > >> Function
> >> >> >> > >> vdpa_nl_cmd_dev_config_set_doit() will get the MAC address
> >> >> >> > >> from the vdpa tool and then set it to the device.
> >> >> >> > >>
> >> >> >> > >> The usage is: vdpa dev set name vdpa_name mac
> >> >> >> > >> **:**:**:**:**:**
> >> >> >> > >
> >> >> >> > >Why don't you use devlink?
> >> >> >> >
> >> >> >> > Fair question. Why does vdpa-specific uapi even exist? To have
> >> >> >> > driver-specific uapi Does not make any sense to me :/
> >> >> >>
> >> >> >> It came with devlink first actually, but switched to a dedicated 
> >> >> >> uAPI.
> >> >> >>
> >> >> >> Parav(cced) may explain more here.
> >> >> >>
> >> >> >Devlink configures function level mac that applies to all protocol
> >> >> >devices
> >> >> (vdpa, rdma, netdev) etc.
> >> >> >Additionally, vdpa device level mac can be different (an additional
> >> >> >one) to
> >> >> apply to only vdpa traffic.
> >> >> >Hence dedicated uAPI was added.
> >> >>
> >> >> There is 1:1 relation between vdpa instance and devlink port, isn't it?
> >> >> Then we have:
> >> >>devlink port function set DEV/PORT_INDEX hw_addr ADDR
> >> >>
> >> >Above command is privilege command done by the hypervisor on the port
> >> function.
> >> >Vpda level setting the mac is similar to a function owner driver setting 
> >> >the
> >> mac on the self netdev (even though devlink side has configured some mac 
> >> for
> >> it).
> >> >For example,
> >> >$ ip link set dev wlan1 address 00:11:22:33:44:55
> >> 
> >> Hmm, under what scenario exactly is this needed?
> >The administrator on the host creating a vdpa device for the VM wants to 
> >configure the mac address for the VM.
> >This administrator may not have the access to the devlink port function.
> >Or he may just prefer a different MAC (theoretical case).
> 
> Right, but that is not reason for new uapi but rather reason to alter
> existing devlink model to have the "host side". We discussed this many
> times.
> 
> 
> >
> >> I mean, the VM that has VDPA device can actually do that too. 
> >VM cannot do. Virtio spec do not allow modifying the mac address.
> 
> I see. Any good reason to not allow that?
> 
> 
> >
> >> That is the actual function owner.
> >vdpa is not mapping a whole VF to the VM.
> >It is getting some synthetic PCI device composed using several software 
> >(kernel) and user space layers.
> >so VM is not the function owner.
> 
> Sure, but owner of the netdev side, to what the mac is related. That is
> my point.


I don't know what this discussion is about, at this point.
For better or worse, vdpa gained interfaces for provisioning
new devices. Yes the solution space was wide but it's been there
for years so kind of too late to try and make people
move to another interface for that.

Having said that, vdpa interfaces are all built around
virtio spec. Let's try to stick to that.


-- 
MST




Re: [PATCH] vringh: add MODULE_DESCRIPTION()

2024-06-17 Thread Michael S. Tsirkin
On Sat, Jun 15, 2024 at 02:50:11PM -0700, Jeff Johnson wrote:
> On 5/16/2024 6:57 PM, Jeff Johnson wrote:
> > Fix the allmodconfig 'make W=1' issue:
> > 
> > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vhost/vringh.o
> > 
> > Signed-off-by: Jeff Johnson 
> > ---
> >  drivers/vhost/vringh.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> > index 7b8fd977f71c..73e153f9b449 100644
> > --- a/drivers/vhost/vringh.c
> > +++ b/drivers/vhost/vringh.c
> > @@ -1614,4 +1614,5 @@ EXPORT_SYMBOL(vringh_need_notify_iotlb);
> >  
> >  #endif
> >  
> > +MODULE_DESCRIPTION("host side of a virtio ring");
> >  MODULE_LICENSE("GPL");
> > 
> > ---
> > base-commit: 7f094f0e3866f83ca705519b1e8f5a7d6ecce232
> > change-id: 20240516-md-vringh-c43803ae0ba4
> > 
> 
> Just following up to see if anything else is needed to pick this up.

I tagged this, will be in the next pull.

Thanks!




Re: [PATCH] virtio_net: Eliminate OOO packets during switching

2024-06-17 Thread Michael S. Tsirkin
On Fri, Jun 14, 2024 at 10:04:22PM +, Abhinav Jain wrote:
> Disable the network device & turn off carrier before modifying the
> number of queue pairs.
> Process all the in-flight packets and then turn on carrier, followed
> by waking up all the queues on the network device.

Did you test that there's a workload with OOO and
this patch actually prevents that?

> 
> Signed-off-by: Abhinav Jain 


> ---
>  drivers/net/virtio_net.c | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 61a57d134544..d0a655a3b4c6 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3447,7 +3447,6 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>  
>  }
>  
> -/* TODO: Eliminate OOO packets during switching */
>  static int virtnet_set_channels(struct net_device *dev,
>   struct ethtool_channels *channels)
>  {
> @@ -3471,6 +3470,15 @@ static int virtnet_set_channels(struct net_device *dev,
>   if (vi->rq[0].xdp_prog)
>   return -EINVAL;
>  
> + /* Disable network device to prevent packet processing during
> +  * the switch.
> +  */
> + netif_tx_disable(dev);
> + netif_carrier_off(dev);

Won't turning off carrier cause a lot of damage such as
changing IP and so on?

> +
> + /* Make certain that all in-flight packets are processed. */
> + synchronize_net();
> +

The comment seems to say what the code does not do.


Also, doing this under rtnl is a heavyweight operation.



>   cpus_read_lock();
>   err = virtnet_set_queues(vi, queue_pairs);
>   if (err) {
> @@ -3482,7 +3490,12 @@ static int virtnet_set_channels(struct net_device *dev,
>  
>   netif_set_real_num_tx_queues(dev, queue_pairs);
>   netif_set_real_num_rx_queues(dev, queue_pairs);
> - err:
> +
> + /* Restart the network device */
> + netif_carrier_on(dev);
> + netif_tx_wake_all_queues(dev);
> +
> +err:
>   return err;
>  }
>  



Given the result is, presumably, improved performance with less
packet loss due to OOO, I'd like to see some actual testing results,
hopefully also measuring the effect on CPU load.
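
As a rough illustration of the kind of measurement requested above, a
receiver-side test could stamp each packet with a sequence number and count
how many arrive behind one that was already seen. This is only a sketch:
count_ooo() is a hypothetical userspace helper, not part of virtio_net or of
any existing test suite, and it ignores sequence number wraparound.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count packets whose sequence number is lower than
 * the highest one already received, i.e. packets that arrived out of order.
 * Duplicates of the current maximum are not counted.
 */
static size_t count_ooo(const uint32_t *seq, size_t n)
{
	uint32_t max_seen = 0;
	size_t ooo = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i > 0 && seq[i] < max_seen)
			ooo++;			/* arrived after a newer packet */
		else
			max_seen = seq[i];	/* new high-water mark */
	}
	return ooo;
}
```

Running such a counter before and after the patch, while the queue count is
being changed under load, would show whether the reordering actually goes
away.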




> -- 
> 2.34.1




Re: [PATCH 1/2] vdpa: support set mac address from vdpa tool

2024-06-13 Thread Michael S. Tsirkin
On Thu, Jun 13, 2024 at 09:21:07AM +0200, Jiri Pirko wrote:
> Thu, Jun 13, 2024 at 08:49:25AM CEST, m...@redhat.com wrote:
> >On Wed, Jun 12, 2024 at 09:22:32AM +0200, Jiri Pirko wrote:
> >> Wed, Jun 12, 2024 at 09:15:44AM CEST, m...@redhat.com wrote:
> >> >On Wed, Jun 12, 2024 at 08:29:53AM +0200, Jiri Pirko wrote:
> >> >> Wed, Jun 12, 2024 at 03:58:10AM CEST, k...@kernel.org wrote:
> >> >> >On Tue, 11 Jun 2024 13:32:32 +0800 Cindy Lu wrote:
> >> >> >> Add new UAPI to support the mac address from vdpa tool
> >> >> >> Function vdpa_nl_cmd_dev_config_set_doit() will get the
> >> >> >> MAC address from the vdpa tool and then set it to the device.
> >> >> >> 
> >> >> >> The usage is: vdpa dev set name vdpa_name mac **:**:**:**:**:**
> >> >> >
> >> >> >Why don't you use devlink?
> >> >> 
> >> >> Fair question. Why does vdpa-specific uapi even exist? Having
> >> >> driver-specific uapi does not make any sense to me :/
> >> >
> >> >I am not sure which uapi you are referring to: the one this patch
> >> >proposes or the existing one?
> >> 
> >> Sure, I'm just pointing out that devlink should have been the answer
> >> instead of introducing vdpa netlink. That ship has sailed,
> >
> >> now we have
> >> unfortunate api duplication which leads to questions like Jakub's one.
> >> That's all :/
> >
> >
> >
> >Yea there's no point to argue now, there were arguments this and that
> >way.  I don't think we currently have a lot
> >of duplication, do we?
> 
> True. I think it would be good to establish guidelines for api
> extensions in this area.
> 
> >
> >-- 
> >MST
> >


Guidelines are good, but are there existing examples of such guidelines in
Linux to follow? Specifically, after reviewing this some more, I
think what Cindy is trying to do is actually provisioning as opposed to
programming.

-- 
MST




Re: [PATCH 1/2] vdpa: support set mac address from vdpa tool

2024-06-13 Thread Michael S. Tsirkin
On Tue, Jun 11, 2024 at 01:32:32PM +0800, Cindy Lu wrote:
> Add new UAPI to support the mac address from vdpa tool
> Function vdpa_nl_cmd_dev_config_set_doit() will get the
> MAC address from the vdpa tool and then set it to the device.
> 
> The usage is: vdpa dev set name vdpa_name mac **:**:**:**:**:**
> 
> Here is sample:
> root@L1# vdpa -jp dev config show vdpa0
> {
> "config": {
> "vdpa0": {
> "mac": "82:4d:e9:5d:d7:e6",
> "link ": "up",
> "link_announce ": false,
> "mtu": 1500
> }
> }
> }
> 
> root@L1# vdpa dev set name vdpa0 mac 00:11:22:33:44:55
> 
> root@L1# vdpa -jp dev config show vdpa0
> {
> "config": {
> "vdpa0": {
> "mac": "00:11:22:33:44:55",
> "link ": "up",
> "link_announce ": false,
> "mtu": 1500
> }
> }
> }
> 
> Signed-off-by: Cindy Lu 



I think the idea of allowing provisioning
by specifying config of the device is actually valid.
However
- the name SET_CONFIG makes people think this allows
  writing even when e.g. device is assigned to guest
- having the internal api be mac specific is weird

Shouldn't config be an attribute maybe, not a new command?


> ---
>  drivers/vdpa/vdpa.c   | 71 +++
>  include/linux/vdpa.h  |  2 ++
>  include/uapi/linux/vdpa.h |  1 +
>  3 files changed, 74 insertions(+)
> 
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index a7612e0783b3..347ae6e7749d 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -1149,6 +1149,72 @@ static int vdpa_nl_cmd_dev_config_get_doit(struct sk_buff *skb, struct genl_info
>   return err;
>  }
>  
> +static int vdpa_nl_cmd_dev_config_set_doit(struct sk_buff *skb,
> +struct genl_info *info)
> +{
> + struct vdpa_dev_set_config set_config = {};
> + struct nlattr **nl_attrs = info->attrs;
> + struct vdpa_mgmt_dev *mdev;
> + const u8 *macaddr;
> + const char *name;
> + int err = 0;
> + struct device *dev;
> + struct vdpa_device *vdev;
> +
> + if (!info->attrs[VDPA_ATTR_DEV_NAME])
> + return -EINVAL;
> +
> + name = nla_data(info->attrs[VDPA_ATTR_DEV_NAME]);
> +
> + down_write(&vdpa_dev_lock);
> + dev = bus_find_device(&vdpa_bus, NULL, name, vdpa_name_match);
> + if (!dev) {
> + NL_SET_ERR_MSG_MOD(info->extack, "device not found");
> + err = -ENODEV;
> + goto dev_err;
> + }
> + vdev = container_of(dev, struct vdpa_device, dev);
> + if (!vdev->mdev) {
> + NL_SET_ERR_MSG_MOD(
> + info->extack,
> + "Fail to find the specified management device");
> + err = -EINVAL;
> + goto mdev_err;
> + }
> + mdev = vdev->mdev;
> + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MACADDR]) {
> + if (!(mdev->supported_features & BIT_ULL(VIRTIO_NET_F_MAC))) {
> + NL_SET_ERR_MSG_FMT_MOD(
> + info->extack,
> + "Missing features 0x%llx for provided attributes",
> + BIT_ULL(VIRTIO_NET_F_MAC));
> + err = -EINVAL;
> + goto mdev_err;
> + }
> + macaddr = nla_data(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MACADDR]);
> + memcpy(set_config.net.mac, macaddr, ETH_ALEN);
> + set_config.mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR);
> + if (mdev->ops->set_mac) {
> + err = mdev->ops->set_mac(mdev, vdev, &set_config);
> + } else {
> + NL_SET_ERR_MSG_FMT_MOD(
> + info->extack,
> + "%s device not support set mac address ", name);
> + }
> +
> + } else {
> + NL_SET_ERR_MSG_FMT_MOD(info->extack,
> +"%s device not support this config ",
> +name);
> + }
> +
> +mdev_err:
> + put_device(dev);
> +dev_err:
> + up_write(&vdpa_dev_lock);
> + return err;
> +}
> +
>  static int vdpa_dev_config_dump(struct device *dev, void *data)
>  {
>   struct vdpa_device *vdev = container_of(dev, struct vdpa_device, dev);
> @@ -1285,6 +1351,11 @@ static const struct genl_ops vdpa_nl_ops[] = {
>   .doit = vdpa_nl_cmd_dev_stats_get_doit,
>   .flags = GENL_ADMIN_PERM,
>   },
> + {
> + .cmd = VDPA_CMD_DEV_CONFIG_SET,
> + .doit = vdpa_nl_cmd_dev_config_set_doit,
> + .flags = GENL_ADMIN_PERM,
> + },
>  };
>  
>  static struct genl_family vdpa_nl_family __ro_after_init = {
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index db15ac07f8a6..c97f4f1da753 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -581,6 +581,8 @@ struct vdpa_mgmtdev_ops {
>  

Re: [PATCH 1/2] vdpa: support set mac address from vdpa tool

2024-06-13 Thread Michael S. Tsirkin
On Wed, Jun 12, 2024 at 09:22:32AM +0200, Jiri Pirko wrote:
> Wed, Jun 12, 2024 at 09:15:44AM CEST, m...@redhat.com wrote:
> >On Wed, Jun 12, 2024 at 08:29:53AM +0200, Jiri Pirko wrote:
> >> Wed, Jun 12, 2024 at 03:58:10AM CEST, k...@kernel.org wrote:
> >> >On Tue, 11 Jun 2024 13:32:32 +0800 Cindy Lu wrote:
> >> >> Add new UAPI to support the mac address from vdpa tool
> >> >> Function vdpa_nl_cmd_dev_config_set_doit() will get the
> >> >> MAC address from the vdpa tool and then set it to the device.
> >> >> 
> >> >> The usage is: vdpa dev set name vdpa_name mac **:**:**:**:**:**
> >> >
> >> >Why don't you use devlink?
> >> 
> >> Fair question. Why does vdpa-specific uapi even exist? Having
> >> driver-specific uapi does not make any sense to me :/
> >
> >I am not sure which uapi you are referring to: the one this patch
> >proposes or the existing one?
> 
> Sure, I'm just pointing out that devlink should have been the answer
> instead of introducing vdpa netlink. That ship has sailed,

> now we have
> unfortunate api duplication which leads to questions like Jakub's one.
> That's all :/



Yea there's no point to argue now, there were arguments this and that
way.  I don't think we currently have a lot
of duplication, do we?

-- 
MST




Re: [PATCH 1/2] vdpa: support set mac address from vdpa tool

2024-06-12 Thread Michael S. Tsirkin
On Wed, Jun 12, 2024 at 08:29:53AM +0200, Jiri Pirko wrote:
> Wed, Jun 12, 2024 at 03:58:10AM CEST, k...@kernel.org wrote:
> >On Tue, 11 Jun 2024 13:32:32 +0800 Cindy Lu wrote:
> >> Add new UAPI to support the mac address from vdpa tool
> >> Function vdpa_nl_cmd_dev_config_set_doit() will get the
> >> MAC address from the vdpa tool and then set it to the device.
> >> 
> >> The usage is: vdpa dev set name vdpa_name mac **:**:**:**:**:**
> >
> >Why don't you use devlink?
> 
> Fair question. Why does vdpa-specific uapi even exist? Having
> driver-specific uapi does not make any sense to me :/

I am not sure which uapi you are referring to: the one this patch
proposes or the existing one?



-- 
MST




Re: [PATCH 1/2] vdpa: support set mac address from vdpa tool

2024-06-12 Thread Michael S. Tsirkin
On Tue, Jun 11, 2024 at 01:32:32PM +0800, Cindy Lu wrote:
> Add new UAPI to support the mac address from vdpa tool

The patch does not do what commit log says.
Instead there's an internal API to set mac and
a UAPI to write into config space.

> Function vdpa_nl_cmd_dev_config_set_doit() will get the
> MAC address from the vdpa tool and then set it to the device.
> 
> The usage is: vdpa dev set name vdpa_name mac **:**:**:**:**:**
> 
> Here is sample:
> root@L1# vdpa -jp dev config show vdpa0
> {
> "config": {
> "vdpa0": {
> "mac": "82:4d:e9:5d:d7:e6",
> "link ": "up",
> "link_announce ": false,
> "mtu": 1500
> }
> }
> }
> 
> root@L1# vdpa dev set name vdpa0 mac 00:11:22:33:44:55
> 
> root@L1# vdpa -jp dev config show vdpa0
> {
> "config": {
> "vdpa0": {
> "mac": "00:11:22:33:44:55",
> "link ": "up",
> "link_announce ": false,
> "mtu": 1500
> }
> }
> }
> 
> Signed-off-by: Cindy Lu 
> ---
>  drivers/vdpa/vdpa.c   | 71 +++
>  include/linux/vdpa.h  |  2 ++
>  include/uapi/linux/vdpa.h |  1 +
>  3 files changed, 74 insertions(+)
> 
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index a7612e0783b3..347ae6e7749d 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -1149,6 +1149,72 @@ static int vdpa_nl_cmd_dev_config_get_doit(struct sk_buff *skb, struct genl_info
>   return err;
>  }
>  
> +static int vdpa_nl_cmd_dev_config_set_doit(struct sk_buff *skb,
> +struct genl_info *info)
> +{
> + struct vdpa_dev_set_config set_config = {};
> + struct nlattr **nl_attrs = info->attrs;
> + struct vdpa_mgmt_dev *mdev;
> + const u8 *macaddr;
> + const char *name;
> + int err = 0;
> + struct device *dev;
> + struct vdpa_device *vdev;
> +
> + if (!info->attrs[VDPA_ATTR_DEV_NAME])
> + return -EINVAL;
> +
> + name = nla_data(info->attrs[VDPA_ATTR_DEV_NAME]);
> +
> + down_write(&vdpa_dev_lock);
> + dev = bus_find_device(&vdpa_bus, NULL, name, vdpa_name_match);
> + if (!dev) {
> + NL_SET_ERR_MSG_MOD(info->extack, "device not found");
> + err = -ENODEV;
> + goto dev_err;
> + }
> + vdev = container_of(dev, struct vdpa_device, dev);
> + if (!vdev->mdev) {
> + NL_SET_ERR_MSG_MOD(
> + info->extack,
> + "Fail to find the specified management device");
> + err = -EINVAL;
> + goto mdev_err;
> + }
> + mdev = vdev->mdev;
> + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MACADDR]) {
> + if (!(mdev->supported_features & BIT_ULL(VIRTIO_NET_F_MAC))) {


Seems to poke at a device without even making sure it's a network
device.
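
A check along the following lines would address that. The sketch is a
userspace model, not the actual vdpa code: struct mgmt_dev_model and
can_set_mac() are hypothetical names, while the VIRTIO_ID_NET and
VIRTIO_NET_F_MAC values are the ones defined by the virtio spec. Feature bit
numbers are device-type specific (on a block device, bit 5 is
VIRTIO_BLK_F_RO), so testing the bit alone is not enough.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_ID_NET		1	/* virtio device ID of a network device */
#define VIRTIO_NET_F_MAC	5	/* net feature bit: device has a MAC address */

/* Hypothetical, simplified stand-in for the management device. */
struct mgmt_dev_model {
	uint32_t device_id;
	uint64_t supported_features;
};

/* Accept a MAC update only for network devices that offer VIRTIO_NET_F_MAC. */
static bool can_set_mac(const struct mgmt_dev_model *mdev)
{
	/* Check the device type first: the bit means something else elsewhere. */
	if (mdev->device_id != VIRTIO_ID_NET)
		return false;

	return (mdev->supported_features & (1ULL << VIRTIO_NET_F_MAC)) != 0;
}
```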

> + NL_SET_ERR_MSG_FMT_MOD(
> + info->extack,
> + "Missing features 0x%llx for provided attributes",
> + BIT_ULL(VIRTIO_NET_F_MAC));
> + err = -EINVAL;
> + goto mdev_err;
> + }
> + macaddr = nla_data(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MACADDR]);
> + memcpy(set_config.net.mac, macaddr, ETH_ALEN);
> + set_config.mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR);
> + if (mdev->ops->set_mac) {
> + err = mdev->ops->set_mac(mdev, vdev, &set_config);
> + } else {
> + NL_SET_ERR_MSG_FMT_MOD(
> + info->extack,
> + "%s device not support set mac address ", name);
> + }
> +
> + } else {
> + NL_SET_ERR_MSG_FMT_MOD(info->extack,
> +"%s device not support this config ",
> +name);
> + }
> +
> +mdev_err:
> + put_device(dev);
> +dev_err:
> + up_write(&vdpa_dev_lock);
> + return err;
> +}
> +
>  static int vdpa_dev_config_dump(struct device *dev, void *data)
>  {
>   struct vdpa_device *vdev = container_of(dev, struct vdpa_device, dev);
> @@ -1285,6 +1351,11 @@ static const struct genl_ops vdpa_nl_ops[] = {
>   .doit = vdpa_nl_cmd_dev_stats_get_doit,
>   .flags = GENL_ADMIN_PERM,
>   },
> + {
> + .cmd = VDPA_CMD_DEV_CONFIG_SET,
> + .doit = vdpa_nl_cmd_dev_config_set_doit,
> + .flags = GENL_ADMIN_PERM,
> + },
>  };
>  
>  static struct genl_family vdpa_nl_family __ro_after_init = {
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index db15ac07f8a6..c97f4f1da753 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -581,6 +581,8 @@ struct vdpa_mgmtdev_ops {
>   int (*dev_add)(struct vdpa_mgmt_dev *mdev, const char *name,
>  const struct vdpa_dev_set_config 

Re: [PATCH] tools/virtio: Use the __GFP_ZERO flag of kmalloc to complete the memory initialization.

2024-06-06 Thread Michael S. Tsirkin
On Wed, Jun 05, 2024 at 09:52:45PM +0800, cuitao wrote:
> Use the __GFP_ZERO flag of kmalloc to initialize memory while allocating it,
> without the need for an additional memset call.
> 
> Signed-off-by: cuitao 
> ---
>  tools/virtio/linux/kernel.h | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
> index 6702008f7f5c..9e401fb7c215 100644
> --- a/tools/virtio/linux/kernel.h
> +++ b/tools/virtio/linux/kernel.h
> @@ -66,10 +66,7 @@ static inline void *kmalloc_array(unsigned n, size_t s, gfp_t gfp)
>  
>  static inline void *kzalloc(size_t s, gfp_t gfp)
>  {
> - void *p = kmalloc(s, gfp);
> -
> - memset(p, 0, s);
> - return p;
> + return kmalloc(s, gfp | __GFP_ZERO);
>  }


Why do we care? It's just here to make things compile. The simpler the
better.

>  static inline void *alloc_pages_exact(size_t s, gfp_t gfp)
> -- 
> 2.25.1




Re: [PATCH net-next V2] virtio-net: synchronize operstate with admin state on up/down

2024-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2024 at 06:29:51PM +0800, Jason Wang wrote:
> On Thu, May 30, 2024 at 2:10 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, May 30, 2024 at 11:20:55AM +0800, Jason Wang wrote:
> > > This patch synchronizes operstate with admin state per RFC2863.
> > >
> > > This is done by trying to toggle the carrier upon open/close and
> > > synchronizing with the config change work. This allows status to
> > > propagate correctly to stacked devices like:
> > >
> > > ip link add link enp0s3 macvlan0 type macvlan
> > > ip link set link enp0s3 down
> > > ip link show
> > >
> > > Before this patch:
> > >
> > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN 
> > > mode DEFAULT group default qlen 1000
> > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > ..
> > > 5: macvlan0@enp0s3:  mtu 1500 
> > > qdisc noqueue state UP mode DEFAULT group default qlen 1000
> > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > >
> > > After this patch:
> > >
> > > 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN 
> > > mode DEFAULT group default qlen 1000
> > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > ...
> > > 5: macvlan0@enp0s3:  mtu 1500 
> > > qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > >
> > > Cc: Venkat Venkatsubra 
> > > Cc: Gia-Khanh Nguyen 
> > > Reviewed-by: Xuan Zhuo 
> > > Acked-by: Michael S. Tsirkin 
> > > Signed-off-by: Jason Wang 
> > > ---
> > > Changes since V1:
> > > - rebase
> > > - add ack/review tags
> >
> >
> >
> >
> >
> > > ---
> > >  drivers/net/virtio_net.c | 94 +++-
> > >  1 file changed, 63 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 4a802c0ea2cb..69e4ae353c51 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -433,6 +433,12 @@ struct virtnet_info {
> > >   /* The lock to synchronize the access to refill_enabled */
> > >   spinlock_t refill_lock;
> > >
> > > + /* Is config change enabled? */
> > > + bool config_change_enabled;
> > > +
> > > + /* The lock to synchronize the access to config_change_enabled */
> > > + spinlock_t config_change_lock;
> > > +
> > >   /* Work struct for config space updates */
> > >   struct work_struct config_work;
> > >
> >
> >
> > But we already have dev->config_lock and dev->config_enabled.
> >
> > And it actually works better - instead of discarding config
> > change events it defers them until enabled.
> >
> 
> Yes but then both virtio-net driver and virtio core can ask to enable
> and disable and then we need some kind of synchronization which is
> non-trivial.

Well, for the core it happens on the bring-up path before the driver runs,
and later on teardown after it is gone.
So I do not think they ever do it at the same time.


> And device enabling on the core is different from bringing the device
> up in the networking subsystem. Here we just delay to deal with the
> config change interrupt on ndo_open(). (E.g try to ack announce is
> meaningless when the device is down).
> 
> Thanks

Another thing is that it is better not to re-read all config
on link up if there was no config interrupt: fewer VM exits.

-- 
MST




Re: [PATCH net-next V2] virtio-net: synchronize operstate with admin state on up/down

2024-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2024 at 11:20:55AM +0800, Jason Wang wrote:
> This patch synchronizes operstate with admin state per RFC2863.
> 
> This is done by trying to toggle the carrier upon open/close and
> synchronizing with the config change work. This allows status to
> propagate correctly to stacked devices like:
> 
> ip link add link enp0s3 macvlan0 type macvlan
> ip link set link enp0s3 down
> ip link show
> 
> Before this patch:
> 
> 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> ..
> 5: macvlan0@enp0s3:  mtu 1500 qdisc 
> noqueue state UP mode DEFAULT group default qlen 1000
> link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> 
> After this patch:
> 
> 3: enp0s3:  mtu 1500 qdisc pfifo_fast state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> ...
> 5: macvlan0@enp0s3:  mtu 1500 qdisc 
> noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> 
> Cc: Venkat Venkatsubra 
> Cc: Gia-Khanh Nguyen 
> Reviewed-by: Xuan Zhuo 
> Acked-by: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 
> ---
> Changes since V1:
> - rebase
> - add ack/review tags





> ---
>  drivers/net/virtio_net.c | 94 +++-
>  1 file changed, 63 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4a802c0ea2cb..69e4ae353c51 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -433,6 +433,12 @@ struct virtnet_info {
>   /* The lock to synchronize the access to refill_enabled */
>   spinlock_t refill_lock;
>  
> + /* Is config change enabled? */
> + bool config_change_enabled;
> +
> + /* The lock to synchronize the access to config_change_enabled */
> + spinlock_t config_change_lock;
> +
>   /* Work struct for config space updates */
>   struct work_struct config_work;
>  


But we already have dev->config_lock and dev->config_enabled.

And it actually works better - instead of discarding config
change events it defers them until enabled.
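
The difference between discarding and deferring can be modelled in a few
lines. The following is a simplified userspace sketch of the deferral scheme
described above, not the virtio core code itself: struct dev_model and the
function names are made up for illustration, and the locking that real code
would need is omitted.

```c
#include <stdbool.h>

/* Hypothetical model of a device with deferred config change handling. */
struct dev_model {
	bool config_enabled;		/* may the driver callback run now? */
	bool config_change_pending;	/* did an event arrive while disabled? */
	int handled;			/* how many times the callback ran */
};

/* Called when the device signals a config change. */
static void config_changed(struct dev_model *d)
{
	if (!d->config_enabled)
		d->config_change_pending = true;	/* remember, do not drop */
	else
		d->handled++;				/* stands in for the callback */
}

static void config_disable(struct dev_model *d)
{
	d->config_enabled = false;
}

/* Re-enable delivery and replay a single deferred event, if any. */
static void config_enable(struct dev_model *d)
{
	d->config_enabled = true;
	if (d->config_change_pending) {
		d->config_change_pending = false;
		d->handled++;
	}
}
```

Any number of events that arrive while disabled collapse into one callback
on enable, which is sufficient because the handler re-reads the current
config anyway.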



> @@ -623,6 +629,20 @@ static void disable_delayed_refill(struct virtnet_info *vi)
>   spin_unlock_bh(&vi->refill_lock);
>  }
>  
> +static void enable_config_change(struct virtnet_info *vi)
> +{
> + spin_lock_irq(&vi->config_change_lock);
> + vi->config_change_enabled = true;
> + spin_unlock_irq(&vi->config_change_lock);
> +}
> +
> +static void disable_config_change(struct virtnet_info *vi)
> +{
> + spin_lock_irq(&vi->config_change_lock);
> + vi->config_change_enabled = false;
> + spin_unlock_irq(&vi->config_change_lock);
> +}
> +
>  static void enable_rx_mode_work(struct virtnet_info *vi)
>  {
>   rtnl_lock();
> @@ -2421,6 +2441,25 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
>   return err;
>  }
>  
> +static void virtnet_update_settings(struct virtnet_info *vi)
> +{
> + u32 speed;
> + u8 duplex;
> +
> + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> + return;
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> +
> + if (ethtool_validate_speed(speed))
> + vi->speed = speed;
> +
> + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> +
> + if (ethtool_validate_duplex(duplex))
> + vi->duplex = duplex;
> +}
> +
>  static int virtnet_open(struct net_device *dev)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> @@ -2439,6 +2478,18 @@ static int virtnet_open(struct net_device *dev)
>   goto err_enable_qp;
>   }
>  
> + /* Assume link up if device can't report link status,
> +otherwise get link status from config. */
> + netif_carrier_off(dev);
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> + enable_config_change(vi);
> + schedule_work(&vi->config_work);
> + } else {
> + vi->status = VIRTIO_NET_S_LINK_UP;
> + virtnet_update_settings(vi);
> + netif_carrier_on(dev);
> + }
> +
>   return 0;
>  
>  err_enable_qp:
> @@ -2875,12 +2926,19 @@ static int virtnet_close(struct net_device *dev)
>   disable_delayed_refill(vi);
>   /* Make sure refill_work doesn't re-enable napi! */
>   cancel_delayed_work_sync(>refill);
> + /* Make sure config notification doesn't schedule config

Re: [PATCH] tools/virtio: pipe assertion in vring_test.c

2024-05-27 Thread Michael S. Tsirkin
On Mon, May 27, 2024 at 04:13:31PM +0900, ysk...@gmail.com wrote:
> From: Yunseong Kim 
> 
> The virtio_device need to fail checking when create the geust/host pipe.

typo

> 
> Signed-off-by: Yunseong Kim 


I guess ... 

> ---
>  tools/virtio/vringh_test.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
> index 98ff808d6f0c..b1af8807c02a 100644
> --- a/tools/virtio/vringh_test.c
> +++ b/tools/virtio/vringh_test.c
> @@ -161,8 +161,8 @@ static int parallel_test(u64 features,
>   host_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>   guest_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>  
> - pipe(to_guest);
> - pipe(to_host);
> + assert(pipe(to_guest) == 0);
> + assert(pipe(to_host) == 0);


I don't like == 0; I prefer the ! form.
Also, calling pipe() outside assert() is preferable since, in theory,
assert() can be compiled out.
Not an issue here, but people tend to copy/paste text.

>   CPU_ZERO(&cpu_set);
>   find_cpus(&first_cpu, &last_cpu);
> -- 
> 2.34.1
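
To illustrate the second review point: when NDEBUG is defined, assert()
expands to nothing and its argument is not evaluated, so
assert(pipe(to_guest) == 0) would never create the pipe. A minimal sketch
that keeps the call outside the assertion follows; make_pipes() is a
hypothetical helper, not code from vringh_test.c.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: create both pipes; returns 0 on success, -1 on error. */
static int make_pipes(int to_guest[2], int to_host[2])
{
	int ret;

	/* The call itself is outside assert(), so it still runs with NDEBUG. */
	ret = pipe(to_guest);
	assert(!ret);
	if (ret)
		return ret;

	ret = pipe(to_host);
	assert(!ret);
	if (ret) {
		/* Do not leak the first pipe if the second one failed. */
		close(to_guest[0]);
		close(to_guest[1]);
	}
	return ret;
}
```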




Re: [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices

2024-05-23 Thread Michael S. Tsirkin
On Fri, May 17, 2024 at 10:46:02PM +0800, Xuewei Niu wrote:
>  include/linux/virtio_vsock.h|   2 +-
>  include/net/af_vsock.h  |  25 ++-
>  include/uapi/linux/virtio_vsock.h   |   1 +
>  include/uapi/linux/vm_sockets.h |  14 ++
>  net/vmw_vsock/af_vsock.c| 116 +--
>  net/vmw_vsock/virtio_transport.c| 255 ++--
>  net/vmw_vsock/virtio_transport_common.c |  16 +-
>  net/vmw_vsock/vsock_loopback.c  |   4 +-
>  8 files changed, 352 insertions(+), 81 deletions(-)

As with any change to the virtio device/driver interface, this has to
go through the virtio TC. Please subscribe at
virtio-comment+subscr...@lists.linux.dev and then
contact the TC at virtio-comm...@lists.linux.dev.

You will likely eventually need to write a spec draft document, too.

-- 
MST




[GIT PULL v2] virtio: features, fixes, cleanups

2024-05-23 Thread Michael S. Tsirkin


Things to note here:
- dropped a couple of patches at the last moment. Did a bunch
  of testing in the last day to make sure that's not causing
  any fallout, it's a revert and no other changes in the same area
  so I feel rather safe doing that.
- the new Marvell OCTEON DPU driver is not here: latest v4 keeps causing
  build failures on mips. I kept deferring the pull hoping to get it in
  and I might try to merge a new version post rc1 (supposed to be ok for
  new drivers as they can't cause regressions), but we'll see.
- there are also a couple bugfixes under review, to be merged after rc1
- there is a trivial conflict in the header file. Shouldn't be any
  trouble to resolve, but fyi the resolution by Stephen is here
diff --cc drivers/virtio/virtio_mem.c
index e8355f55a8f7,6d4dfbc53a66..
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@@ -21,7 -21,7 +21,8 @@@
  #include 
  #include 
  #include 
 +#include 
+ #include 
  Also see it here:
  https://lore.kernel.org/all/20240423145947.14217...@canb.auug.org.au/


The following changes since commit 18daea77cca626f590fb140fc11e3a43c5d41354:

  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm 
(2024-04-30 12:40:41 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to c8fae27d141a32a1624d0d0d5419d94252824498:

  virtio-pci: Check if is_avq is NULL (2024-05-22 08:39:41 -0400)


virtio: features, fixes, cleanups

Several new features here:

- virtio-net is finally supported in vduse.

- Virtio (balloon and mem) interaction with suspend is improved

- vhost-scsi now handles signals better/faster.

Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin 


Christophe JAILLET (1):
  vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API

David Hildenbrand (1):
  virtio-mem: support suspend+resume

David Stevens (2):
  virtio_balloon: Give the balloon its own wakeup source
  virtio_balloon: Treat stats requests as wakeup events

Eugenio Pérez (1):
  MAINTAINERS: add Eugenio Pérez as reviewer

Jiri Pirko (1):
  virtio: delete vq in vp_find_vqs_msix() when request_irq() fails

Krzysztof Kozlowski (24):
  virtio: balloon: drop owner assignment
  virtio: input: drop owner assignment
  virtio: mem: drop owner assignment
  um: virt-pci: drop owner assignment
  virtio_blk: drop owner assignment
  bluetooth: virtio: drop owner assignment
  hwrng: virtio: drop owner assignment
  virtio_console: drop owner assignment
  crypto: virtio - drop owner assignment
  firmware: arm_scmi: virtio: drop owner assignment
  gpio: virtio: drop owner assignment
  drm/virtio: drop owner assignment
  iommu: virtio: drop owner assignment
  misc: nsm: drop owner assignment
  net: caif: virtio: drop owner assignment
  net: virtio: drop owner assignment
  net: 9p: virtio: drop owner assignment
  vsock/virtio: drop owner assignment
  wifi: mac80211_hwsim: drop owner assignment
  nvdimm: virtio_pmem: drop owner assignment
  rpmsg: virtio: drop owner assignment
  scsi: virtio: drop owner assignment
  fuse: virtio: drop owner assignment
  sound: virtio: drop owner assignment

Li Zhang (1):
  virtio-pci: Check if is_avq is NULL

Li Zhijian (1):
  vdpa: Convert sprintf/snprintf to sysfs_emit

Maxime Coquelin (3):
  vduse: validate block features only with block devices
  vduse: Temporarily fail if control queue feature requested
  vduse: enable Virtio-net device type

Michael S. Tsirkin (1):
  Merge tag 'stable/vduse-virtio-net' into vhost

Mike Christie (9):
  vhost-scsi: Handle vhost_vq_work_queue failures for events
  vhost-scsi: Handle vhost_vq_work_queue failures for cmds
  vhost-scsi: Use system wq to flush dev for TMFs
  vhost: Remove vhost_vq_flush
  vhost_scsi: Handle vhost_vq_work_queue failures for TMFs
  vhost: Use virtqueue mutex for swapping worker
  vhost: Release worker mutex during flushes
  vhost_task: Handle SIGKILL by flushing work and exiting
  kernel: Remove signal hacks for vhost_tasks

Uwe Kleine-König (1):
  virtio-mmio: Convert to platform remove callback returning void

Yuxue Liu (2):
  vp_vdpa: Fix return value check vp_vdpa_request_irq
  vp_vdpa: don't allocate unused msix vectors

Zhu Lingshan (1):
  MAINTAINERS: apply maintainer role of Intel vDPA driver

 MAINTAINERS   |  10 +-
 arch/um/drivers/virt-pci.c|   1 -
 drivers/block/virtio_blk.c|   1 -
 drivers/bluetooth/virtio_bt.c |   1 -
 drivers/char/hw_random

Re: [GIT PULL] virtio: features, fixes, cleanups

2024-05-22 Thread Michael S. Tsirkin
On Wed, May 22, 2024 at 06:03:08AM -0400, Michael S. Tsirkin wrote:
> Things to note here:

Sorry Linus, the author of one of the patchsets I merged wants to drop it now.
I could revert, but it seems cleaner to drop it, re-test and re-post.
I will drop a duplicate while I'm at it.



> - the new Marvell OCTEON DPU driver is not here: latest v4 keeps causing
>   build failures on mips. I deferred the pull hoping to get it in
>   and I might merge a new version post rc1
>   (supposed to be ok for new drivers as they can't cause regressions),
>   but we'll see.
> - there are also a couple bugfixes under review, to be merged after rc1
> - I merged a trivial patch (removing a comment) that also got
>   merged through net.
>   git handles this just fine and it did not seem worth it
>   rebasing to drop it.
> - there is a trivial conflict in the header file. Shouldn't be any
>   trouble to resolve, but fyi the resolution by Stephen is here
>   diff --cc drivers/virtio/virtio_mem.c
>   index e8355f55a8f7,6d4dfbc53a66..
>   --- a/drivers/virtio/virtio_mem.c
>   +++ b/drivers/virtio/virtio_mem.c
>   @@@ -21,7 -21,7 +21,8 @@@
> #include 
> #include 
> #include 
>+#include 
>   + #include 
>   Also see it here:
>   https://lore.kernel.org/all/20240423145947.14217...@canb.auug.org.au/
> 
> 
> 
> The following changes since commit 18daea77cca626f590fb140fc11e3a43c5d41354:
> 
>   Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm 
> (2024-04-30 12:40:41 -0700)
> 
> are available in the Git repository at:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
> 
> for you to fetch changes up to 0b8dbbdcf2e42273fbac9b752919e2e5b2abac21:
> 
>   Merge tag 'for_linus' into vhost (2024-05-12 08:15:28 -0400)
> 
> 
> virtio: features, fixes, cleanups
> 
> Several new features here:
> 
> - virtio-net is finally supported in vduse.
> 
> - Virtio (balloon and mem) interaction with suspend is improved
> 
> - vhost-scsi now handles signals better/faster.
> 
> - virtio-net now supports premapped mode by default,
>   opening the door for all kinds of zero copy tricks.
> 
> Fixes, cleanups all over the place.
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> 
> Christophe JAILLET (1):
>   vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
> 
> David Hildenbrand (1):
>   virtio-mem: support suspend+resume
> 
> David Stevens (2):
>   virtio_balloon: Give the balloon its own wakeup source
>   virtio_balloon: Treat stats requests as wakeup events
> 
> Eugenio Pérez (2):
>   MAINTAINERS: add Eugenio Pérez as reviewer
>   MAINTAINERS: add Eugenio Pérez as reviewer
> 
> Jiri Pirko (1):
>   virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
> 
> Krzysztof Kozlowski (24):
>   virtio: balloon: drop owner assignment
>   virtio: input: drop owner assignment
>   virtio: mem: drop owner assignment
>   um: virt-pci: drop owner assignment
>   virtio_blk: drop owner assignment
>   bluetooth: virtio: drop owner assignment
>   hwrng: virtio: drop owner assignment
>   virtio_console: drop owner assignment
>   crypto: virtio - drop owner assignment
>   firmware: arm_scmi: virtio: drop owner assignment
>   gpio: virtio: drop owner assignment
>   drm/virtio: drop owner assignment
>   iommu: virtio: drop owner assignment
>   misc: nsm: drop owner assignment
>   net: caif: virtio: drop owner assignment
>   net: virtio: drop owner assignment
>   net: 9p: virtio: drop owner assignment
>   vsock/virtio: drop owner assignment
>   wifi: mac80211_hwsim: drop owner assignment
>   nvdimm: virtio_pmem: drop owner assignment
>   rpmsg: virtio: drop owner assignment
>   scsi: virtio: drop owner assignment
>   fuse: virtio: drop owner assignment
>   sound: virtio: drop owner assignment
> 
> Li Zhijian (1):
>   vdpa: Convert sprintf/snprintf to sysfs_emit
> 
> Maxime Coquelin (6):
>   vduse: validate block features only with block devices
>   vduse: Temporarily fail if control queue feature requested
>   vduse: enable Virtio-net device type
>   vduse: validate block features only with block devices
>   vduse: Temporarily fail if control queue feature requested
>   vduse: enable Virtio-net device type
> 
> Michael S. Tsirkin (2):
>   Merge tag 'stable/vduse-virtio-net' into vhost
>   Merge tag 'for_linus' into vho

Re: [GIT PULL] virtio: features, fixes, cleanups

2024-05-22 Thread Michael S. Tsirkin
On Wed, May 22, 2024 at 06:22:45PM +0800, Xuan Zhuo wrote:
> On Wed, 22 May 2024 06:03:01 -0400, "Michael S. Tsirkin"  
> wrote:
> > Things to note here:
> >
> > - the new Marvell OCTEON DPU driver is not here: latest v4 keeps causing
> >   build failures on mips. I deferred the pull hoping to get it in
> >   and I might merge a new version post rc1
> >   (supposed to be ok for new drivers as they can't cause regressions),
> >   but we'll see.
> > - there are also a couple bugfixes under review, to be merged after rc1
> > - I merged a trivial patch (removing a comment) that also got
> >   merged through net.
> >   git handles this just fine and it did not seem worth it
> >   rebasing to drop it.
> > - there is a trivial conflict in the header file. Shouldn't be any
> >   trouble to resolve, but fyi the resolution by Stephen is here
> > diff --cc drivers/virtio/virtio_mem.c
> > index e8355f55a8f7,6d4dfbc53a66..
> > --- a/drivers/virtio/virtio_mem.c
> > +++ b/drivers/virtio/virtio_mem.c
> > @@@ -21,7 -21,7 +21,8 @@@
> >   #include 
> >   #include 
> >   #include 
> >  +#include 
> > + #include 
> >   Also see it here:
> >   https://lore.kernel.org/all/20240423145947.14217...@canb.auug.org.au/
> >
> >
> >
> > The following changes since commit 18daea77cca626f590fb140fc11e3a43c5d41354:
> >
> >   Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm 
> > (2024-04-30 12:40:41 -0700)
> >
> > are available in the Git repository at:
> >
> >   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git 
> > tags/for_linus
> >
> > for you to fetch changes up to 0b8dbbdcf2e42273fbac9b752919e2e5b2abac21:
> >
> >   Merge tag 'for_linus' into vhost (2024-05-12 08:15:28 -0400)
> >
> > 
> > virtio: features, fixes, cleanups
> >
> > Several new features here:
> >
> > - virtio-net is finally supported in vduse.
> >
> > - Virtio (balloon and mem) interaction with suspend is improved
> >
> > - vhost-scsi now handles signals better/faster.
> >
> > - virtio-net now supports premapped mode by default,
> >   opening the door for all kinds of zero copy tricks.
> >
> > Fixes, cleanups all over the place.
> >
> > Signed-off-by: Michael S. Tsirkin 
> >
> > 
> > Christophe JAILLET (1):
> >   vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
> >
> > David Hildenbrand (1):
> >   virtio-mem: support suspend+resume
> >
> > David Stevens (2):
> >   virtio_balloon: Give the balloon its own wakeup source
> >   virtio_balloon: Treat stats requests as wakeup events
> >
> > Eugenio Pérez (2):
> >   MAINTAINERS: add Eugenio Pérez as reviewer
> >   MAINTAINERS: add Eugenio Pérez as reviewer
> >
> > Jiri Pirko (1):
> >   virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
> >
> > Krzysztof Kozlowski (24):
> >   virtio: balloon: drop owner assignment
> >   virtio: input: drop owner assignment
> >   virtio: mem: drop owner assignment
> >   um: virt-pci: drop owner assignment
> >   virtio_blk: drop owner assignment
> >   bluetooth: virtio: drop owner assignment
> >   hwrng: virtio: drop owner assignment
> >   virtio_console: drop owner assignment
> >   crypto: virtio - drop owner assignment
> >   firmware: arm_scmi: virtio: drop owner assignment
> >   gpio: virtio: drop owner assignment
> >   drm/virtio: drop owner assignment
> >   iommu: virtio: drop owner assignment
> >   misc: nsm: drop owner assignment
> >   net: caif: virtio: drop owner assignment
> >   net: virtio: drop owner assignment
> >   net: 9p: virtio: drop owner assignment
> >   vsock/virtio: drop owner assignment
> >   wifi: mac80211_hwsim: drop owner assignment
> >   nvdimm: virtio_pmem: drop owner assignment
> >   rpmsg: virtio: drop owner assignment
> >   scsi: virtio: drop owner assignment
> >   fuse: virtio: drop owner assignment
> >   sound: virtio: drop owner assignment
> >
> > Li Zhijian (1):
> >   vdpa: Convert sprintf/snprintf to sysfs_emit
> >
> > Maxime Coquelin (6):
> >   vduse: validate block features only with block devices
> >   vduse: Temporaril

[GIT PULL] virtio: features, fixes, cleanups

2024-05-22 Thread Michael S. Tsirkin
Things to note here:

- the new Marvell OCTEON DPU driver is not here: latest v4 keeps causing
  build failures on mips. I deferred the pull hoping to get it in
  and I might merge a new version post rc1
  (supposed to be ok for new drivers as they can't cause regressions),
  but we'll see.
- there are also a couple bugfixes under review, to be merged after rc1
- I merged a trivial patch (removing a comment) that also got
  merged through net.
  git handles this just fine and it did not seem worth it
  rebasing to drop it.
- there is a trivial conflict in the header file. Shouldn't be any
  trouble to resolve, but fyi the resolution by Stephen is here
diff --cc drivers/virtio/virtio_mem.c
index e8355f55a8f7,6d4dfbc53a66..
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@@ -21,7 -21,7 +21,8 @@@
  #include 
  #include 
  #include 
 +#include 
+ #include 
  Also see it here:
  https://lore.kernel.org/all/20240423145947.14217...@canb.auug.org.au/



The following changes since commit 18daea77cca626f590fb140fc11e3a43c5d41354:

  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm 
(2024-04-30 12:40:41 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 0b8dbbdcf2e42273fbac9b752919e2e5b2abac21:

  Merge tag 'for_linus' into vhost (2024-05-12 08:15:28 -0400)


virtio: features, fixes, cleanups

Several new features here:

- virtio-net is finally supported in vduse.

- Virtio (balloon and mem) interaction with suspend is improved

- vhost-scsi now handles signals better/faster.

- virtio-net now supports premapped mode by default,
  opening the door for all kinds of zero copy tricks.

Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin 


Christophe JAILLET (1):
  vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API

David Hildenbrand (1):
  virtio-mem: support suspend+resume

David Stevens (2):
  virtio_balloon: Give the balloon its own wakeup source
  virtio_balloon: Treat stats requests as wakeup events

Eugenio Pérez (2):
  MAINTAINERS: add Eugenio Pérez as reviewer
  MAINTAINERS: add Eugenio Pérez as reviewer

Jiri Pirko (1):
  virtio: delete vq in vp_find_vqs_msix() when request_irq() fails

Krzysztof Kozlowski (24):
  virtio: balloon: drop owner assignment
  virtio: input: drop owner assignment
  virtio: mem: drop owner assignment
  um: virt-pci: drop owner assignment
  virtio_blk: drop owner assignment
  bluetooth: virtio: drop owner assignment
  hwrng: virtio: drop owner assignment
  virtio_console: drop owner assignment
  crypto: virtio - drop owner assignment
  firmware: arm_scmi: virtio: drop owner assignment
  gpio: virtio: drop owner assignment
  drm/virtio: drop owner assignment
  iommu: virtio: drop owner assignment
  misc: nsm: drop owner assignment
  net: caif: virtio: drop owner assignment
  net: virtio: drop owner assignment
  net: 9p: virtio: drop owner assignment
  vsock/virtio: drop owner assignment
  wifi: mac80211_hwsim: drop owner assignment
  nvdimm: virtio_pmem: drop owner assignment
  rpmsg: virtio: drop owner assignment
  scsi: virtio: drop owner assignment
  fuse: virtio: drop owner assignment
  sound: virtio: drop owner assignment

Li Zhijian (1):
  vdpa: Convert sprintf/snprintf to sysfs_emit

Maxime Coquelin (6):
  vduse: validate block features only with block devices
  vduse: Temporarily fail if control queue feature requested
  vduse: enable Virtio-net device type
  vduse: validate block features only with block devices
  vduse: Temporarily fail if control queue feature requested
  vduse: enable Virtio-net device type

Michael S. Tsirkin (2):
  Merge tag 'stable/vduse-virtio-net' into vhost
  Merge tag 'for_linus' into vhost

Mike Christie (9):
  vhost-scsi: Handle vhost_vq_work_queue failures for events
  vhost-scsi: Handle vhost_vq_work_queue failures for cmds
  vhost-scsi: Use system wq to flush dev for TMFs
  vhost: Remove vhost_vq_flush
  vhost_scsi: Handle vhost_vq_work_queue failures for TMFs
  vhost: Use virtqueue mutex for swapping worker
  vhost: Release worker mutex during flushes
  vhost_task: Handle SIGKILL by flushing work and exiting
  kernel: Remove signal hacks for vhost_tasks

Uwe Kleine-König (1):
  virtio-mmio: Convert to platform remove callback returning void

Xuan Zhuo (7):
  virtio_ring: introduce dma map api for page
  virtio_ring: enable premapped mode whatever use_dma_api
  virtio_net: replace private by pp struct inside page
  virtio_net: big mode

Re: [PATCH] vhost: use pr_err for vq_err

2024-05-16 Thread Michael S. Tsirkin
On Thu, May 16, 2024 at 03:46:29PM +0800, Peng Fan (OSS) wrote:
> From: Peng Fan 
> 
> Use pr_err to print out error messages without enabling DEBUG. This could
> make it easier for people to catch errors.
> 
> Signed-off-by: Peng Fan 

This isn't appropriate: pr_err must not be triggerable
by userspace. If you are debugging userspace, use a debugging
kernel, it's that simple.
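
To illustrate the concern, here is a minimal userspace sketch, not kernel
code. The names mock_pr_err, mock_pr_debug, vq_err_quiet, vq_err_loud and
submit_bad_requests are made up for this example, and the sketch assumes
DEBUG is not defined at build time (it also ignores dynamic debug). It only
shows that with a pr_err-style macro every bad request from userspace turns
into a log line, while the pr_debug-style macro stays silent and still
signals the error context.

```c
#include <assert.h>
#include <stdio.h>

static int log_lines;     /* lines that reached the mock kernel log */
static int error_signals; /* notifications on the mock error eventfd */

/* Mock pr_err(): always reaches the log. */
#define mock_pr_err(...) \
	do { log_lines++; fprintf(stderr, __VA_ARGS__); } while (0)

/* Mock pr_debug(): compiled out unless DEBUG is defined at build time. */
#ifdef DEBUG
#define mock_pr_debug(...) mock_pr_err(__VA_ARGS__)
#else
#define mock_pr_debug(...) do { } while (0)
#endif

/* Current shape of vq_err(): quiet by default, still notifies userspace. */
#define vq_err_quiet(...) \
	do { mock_pr_debug(__VA_ARGS__); error_signals++; } while (0)

/* Proposed shape: every userspace-triggered error becomes a log line. */
#define vq_err_loud(...) \
	do { mock_pr_err(__VA_ARGS__); error_signals++; } while (0)

/* A misbehaving userspace process submits n bad requests. */
static void submit_bad_requests(int n, int loud)
{
	for (int i = 0; i < n; i++) {
		if (loud)
			vq_err_loud("bad descriptor %d\n", i);
		else
			vq_err_quiet("bad descriptor %d\n", i);
	}
}

/* Small self-check; assumes DEBUG is not defined. */
void demo(void)
{
	submit_bad_requests(3, 0);
	assert(log_lines == 0 && error_signals == 3);

	submit_bad_requests(3, 1);
	assert(log_lines == 3 && error_signals == 6);
}
```

In both variants userspace is still told about the error through the
signal counter; the only difference is whether an unprivileged caller can
fill the log.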


> ---
>  drivers/vhost/vhost.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index bb75a292d50c..0bff436d1ce9 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -248,7 +248,7 @@ void vhost_iotlb_map_free(struct vhost_iotlb *iotlb,
> struct vhost_iotlb_map *map);
>  
>  #define vq_err(vq, fmt, ...) do {  \
> - pr_debug(pr_fmt(fmt), ##__VA_ARGS__);   \
> + pr_err(pr_fmt(fmt), ##__VA_ARGS__);   \
>   if ((vq)->error_ctx)   \
>   eventfd_signal((vq)->error_ctx);\
>   } while (0)
> -- 
> 2.37.1




Re: [PATCH net-next] virtio_net: Fix error code in __virtnet_get_hw_stats()

2024-05-15 Thread Michael S. Tsirkin
On Wed, May 15, 2024 at 04:50:48PM +0200, Dan Carpenter wrote:
> On Sun, May 12, 2024 at 12:01:55PM -0400, Michael S. Tsirkin wrote:
> > On Fri, May 10, 2024 at 03:50:45PM +0300, Dan Carpenter wrote:
> > > The virtnet_send_command_reply() function returns true on success or
> > > false on failure.  The "ok" variable is true/false depending on whether
> > > it succeeds or not.  It's up to the caller to translate the true/false
> > > into -EINVAL on failure or zero for success.
> > > 
> > > The bug is that __virtnet_get_hw_stats() returns false for both
> > > errors and success.  It's not a bug, but it is confusing that the caller
> > > virtnet_get_hw_stats() uses an "ok" variable to store negative error
> > > codes.
> > 
> > The bug is ... It's not a bug 
> > 
> > I think what you are trying to say is that the error isn't
> > really handled anyway, except for printing a warning,
> > so it's not a big deal.
> > 
> > Right?
> > 
> 
> No, I'm sorry, that was confusing.  The change to __virtnet_get_hw_stats()
> is a bugfix but the change to virtnet_get_hw_stats() was not a bugfix.
> I viewed this all as really one thing, because it's cleaning up the
> error codes which happens to fix a bug.  It seems very related.  At the
> same time, I can also see how people would disagree.
> 
> I'm traveling until May 23.  I can resend this.  Probably as two patches
> for simpler review.
> 
> regards,
> dan carpenter
>  

Yeah, no rush - bugfixes are fine after the 23rd. And it's ok to combine them
into one patch - we don't want inconsistent code - just please write a clear
commit log message.


-- 
MST




[PATCH] vhost/vsock: always initialize seqpacket_allow

2024-05-15 Thread Michael S. Tsirkin
There are two issues around seqpacket_allow:
1. seqpacket_allow is not initialized when socket is
   created. Thus if features are never set, it will be
   read uninitialized.
2. if VIRTIO_VSOCK_F_SEQPACKET is set and then cleared,
   then seqpacket_allow will not be cleared appropriately
   (existing apps I know about don't usually do this but
   it's legal and there's no way to be sure no one relies
   on this).

To fix:
- initialize seqpacket_allow after allocation
- set it unconditionally in set_features
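
The second issue can be sketched in userspace as follows. This is only an
illustration under stated assumptions: struct mock_vsock, set_features_old
and set_features_new are hypothetical stand-ins, and F_SEQPACKET is an
assumed bit number standing in for VIRTIO_VSOCK_F_SEQPACKET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F_SEQPACKET 1	/* assumed bit number for the SEQPACKET feature */

struct mock_vsock {
	bool seqpacket_allow;
};

/* Old behaviour: the flag is only ever set, never cleared. */
static void set_features_old(struct mock_vsock *v, uint64_t features)
{
	if (features & (1ULL << F_SEQPACKET))
		v->seqpacket_allow = true;
}

/* New behaviour: the flag always tracks the latest feature set. */
static void set_features_new(struct mock_vsock *v, uint64_t features)
{
	v->seqpacket_allow = features & (1ULL << F_SEQPACKET);
}

/* Set the feature, then clear it, and compare the two variants. */
void demo(void)
{
	struct mock_vsock a = { .seqpacket_allow = false };
	struct mock_vsock b = { .seqpacket_allow = false };

	set_features_old(&a, 1ULL << F_SEQPACKET);
	set_features_old(&a, 0);
	assert(a.seqpacket_allow);	/* stale: still allowed */

	set_features_new(&b, 1ULL << F_SEQPACKET);
	set_features_new(&b, 0);
	assert(!b.seqpacket_allow);	/* cleared as expected */
}
```

The explicit initializer in demo() plays the role of the first fix: without
it the flag would be read before anything had written to it.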

Reported-by: syzbot+6c21aeb59d0e82eb2...@syzkaller.appspotmail.com
Reported-by: Jeongjun Park 
Fixes: ced7b713711f ("vhost/vsock: support SEQPACKET for transport").
Cc: Arseny Krasnov 
Cc: David S. Miller 
Cc: Stefan Hajnoczi 
Signed-off-by: Michael S. Tsirkin 
Acked-by: Arseniy Krasnov 
Tested-by: Arseniy Krasnov 

---


Reposting now that it's been tested.

 drivers/vhost/vsock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index ec20ecff85c7..bf664ec9341b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -667,6 +667,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	}
 
 	vsock->guest_cid = 0; /* no CID assigned yet */
+	vsock->seqpacket_allow = false;
 
 	atomic_set(&vsock->queued_replies, 0);
 
@@ -810,8 +811,7 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
 			goto err;
 	}
 
-	if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
-		vsock->seqpacket_allow = true;
+	vsock->seqpacket_allow = features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET);
 
 	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
 		vq = &vsock->vqs[i];
-- 
MST




Re: [PATCH net-next] virtio_net: Fix error code in __virtnet_get_hw_stats()

2024-05-12 Thread Michael S. Tsirkin
On Fri, May 10, 2024 at 03:50:45PM +0300, Dan Carpenter wrote:
> The virtnet_send_command_reply() function returns true on success or
> false on failure.  The "ok" variable is true/false depending on whether
> it succeeds or not.  It's up to the caller to translate the true/false
> into -EINVAL on failure or zero for success.
> 
> The bug is that __virtnet_get_hw_stats() returns false for both
> errors and success.  It's not a bug, but it is confusing that the caller
> virtnet_get_hw_stats() uses an "ok" variable to store negative error
> codes.

The bug is ... It's not a bug 

I think what you are trying to say is that the error isn't
really handled anyway, except for printing a warning,
so it's not a big deal.

Right?

I don't know why get_ethtool_stats can't fail - we should
probably fix that.
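
For readers following along, the pattern under discussion can be sketched
in userspace like this. It is not the driver code: send_command,
get_stats_buggy and get_stats_fixed are made-up names, and send_command
merely stands in for a helper that returns true on success.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a helper that returns true on success. */
static bool send_command(bool device_ok)
{
	return device_ok;
}

/* Buggy shape: a bool "false" leaks out as the int 0, i.e. "success". */
static int get_stats_buggy(bool device_ok)
{
	bool ok = send_command(device_ok);

	if (!ok)
		return ok;	/* returns 0 even though the command failed */
	return 0;
}

/* Fixed shape: translate the bool into a negative errno at the boundary. */
static int get_stats_fixed(bool device_ok)
{
	bool ok = send_command(device_ok);

	if (!ok)
		return -EINVAL;
	return 0;
}

/* The caller cannot tell failure from success with the buggy variant. */
void demo(void)
{
	assert(get_stats_buggy(false) == get_stats_buggy(true));
	assert(get_stats_fixed(false) == -EINVAL);
	assert(get_stats_fixed(true) == 0);
}
```

Keeping the bool-to-errno translation in one place, right where the helper
returns, is what lets callers use a plain "err" variable afterwards.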


> Fix the bug and clean things up so that it's clear that
> __virtnet_get_hw_stats() returns zero on success or negative error codes
> on failure.
> 
> Fixes: 941168f8b40e ("virtio_net: support device stats")
> Signed-off-by: Dan Carpenter 
> ---
>  drivers/net/virtio_net.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 218a446c4c27..4fc0fcdad259 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -4016,7 +4016,7 @@ static int __virtnet_get_hw_stats(struct virtnet_info *vi,
>   &sgs_out, &sgs_in);
>  
>   if (!ok)
> - return ok;
> + return -EINVAL;
>  
>   for (p = reply; p - reply < res_size; p += le16_to_cpu(hdr->size)) {
>   hdr = p;
> @@ -4053,7 +4053,7 @@ static int virtnet_get_hw_stats(struct virtnet_info *vi,
>   struct virtio_net_ctrl_queue_stats *req;
>   bool enable_cvq;
>   void *reply;
> - int ok;
> + int err;
>  
>   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_DEVICE_STATS))
>   return 0;
> @@ -4100,12 +4100,12 @@ static int virtnet_get_hw_stats(struct virtnet_info *vi,
>   if (enable_cvq)
>   virtnet_make_stat_req(vi, ctx, req, vi->max_queue_pairs * 2, &j);
>  
> - ok = __virtnet_get_hw_stats(vi, ctx, req, sizeof(*req) * j, reply, res_size);
> + err = __virtnet_get_hw_stats(vi, ctx, req, sizeof(*req) * j, reply, res_size);
>  
>   kfree(req);
>   kfree(reply);
>  
> - return ok;
> + return err;
>  }
>  
>  static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *data)




Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn

2024-05-01 Thread Michael S. Tsirkin
On Wed, May 01, 2024 at 10:57:38AM -0500, Mike Christie wrote:
> On 5/1/24 2:50 AM, Hillf Danton wrote:
> > On Wed, 1 May 2024 02:01:20 -0400 Michael S. Tsirkin 
> >>
> >> and then it failed testing.
> >>
> > So did my patch [1] but then the reason was spotted [2,3]
> > 
> > [1] https://lore.kernel.org/lkml/20240430110209.4310-1-hdan...@sina.com/
> > [2] https://lore.kernel.org/lkml/20240430225005.4368-1-hdan...@sina.com/
> > [3] https://lore.kernel.org/lkml/a7f8470617589...@google.com/
> 
> Just to make sure I understand the conclusion.
> 
> Edward's patch that just swaps the order of the calls:
> 
> https://lore.kernel.org/lkml/tencent_546da49414e876eebecf2c78d26d242ee...@qq.com/
> 
> fixes the UAF. I tested the same in my setup. However, when you guys tested it
> with syzbot, it also triggered a softirq/RCU warning.
> 
> The softirq/RCU part of the issue is fixed with this commit:
> 
> https://lore.kernel.org/all/20240427102808.29356-1-qiang.zhang1...@gmail.com/
> 
> commit 1dd1eff161bd55968d3d46bc36def62d71fb4785
> Author: Zqiang 
> Date:   Sat Apr 27 18:28:08 2024 +0800
> 
> softirq: Fix suspicious RCU usage in __do_softirq()
> 
> The problem was that I was testing with -next master which has that patch.
> It looks like you guys were testing against bb7a2467e6be which didn't have
> the patch, and so that's why you guys still hit the softirq/RCU issue. Later
> when you added that patch to your patch, it worked with syzbot.
> 
> So is it safe to assume that the softirq/RCU patch above will be upstream
> when the vhost changes go in or is there a tag I need to add to my patches?

That patch is upstream now. I rebased and asked syzbot to test
https://lore.kernel.org/lkml/tencent_546da49414e876eebecf2c78d26d242ee...@qq.com/
on top.

If that passes I will squash.




Re: [syzbot] [net?] [virt?] [kvm?] KASAN: slab-use-after-free Read in vhost_task_fn

2024-05-01 Thread Michael S. Tsirkin
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git f138e94c1f0dbeae721917694fb2203446a68ea9



