Re: [PATCH] mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK

2023-07-04 Thread Michael S. Tsirkin
On Wed, Jul 05, 2023 at 01:47:44PM +0800, Jason Wang wrote:
> On Wed, Jul 5, 2023 at 1:31 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jul 05, 2023 at 01:11:37PM +0800, Jason Wang wrote:
> > > On Tue, Jul 4, 2023 at 6:16 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Mon, Jul 03, 2023 at 05:26:02PM -0700, Si-Wei Liu wrote:
> > > > >
> > > > >
> > > > > On 7/3/2023 8:46 AM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Jul 03, 2023 at 04:25:14PM +0200, Eugenio Pérez wrote:
> > > > > > > Offer this backend feature as mlx5 is compatible with it. It 
> > > > > > > allows it
> > > > > > > to do live migration with CVQ, dynamically switching between 
> > > > > > > passthrough
> > > > > > > and shadow virtqueue.
> > > > > > >
> > > > > > > Signed-off-by: Eugenio Pérez 
> > > > > > Same comment.
> > > > > to which?
> > > > >
> > > > > -Siwei
> > > >
> > > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is too narrow a use-case to 
> > > > commit to it
> > > > as a kernel/userspace ABI: what if one wants to start rings in some
> > > > other specific order?
> > >
> > > Just enable a queue by writing e.g. 1 to queue_enable in a specific order?
> >
> >
> > But then at driver ok time we don't know how many queues there are.
> 
> There should be a device specific interface for this, for example,
> num_queue_pairs. So the device knows at most how many queues there
> are. Or anything I miss?
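For reference, the device-specific interface mentioned above is, for virtio-net, the max_virtqueue_pairs config field. A minimal kernel-side sketch of reading it (illustrative only; the helper name is invented here):

#include <linux/virtio_config.h>
#include <uapi/linux/virtio_net.h>

/* Sketch: max_virtqueue_pairs bounds how many queue pairs can ever be
 * enabled; it does not say how many the driver will actually enable,
 * which is the gap discussed in this thread. */
static u16 net_max_queue_pairs(struct virtio_device *vdev)
{
	u16 max_qps = 1;

	if (virtio_has_feature(vdev, VIRTIO_NET_F_MQ))
		virtio_cread_le(vdev, struct virtio_net_config,
				max_virtqueue_pairs, &max_qps);
	return max_qps;
}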

That's a device limitation. It does not tell the device how many are used.

> >
> > > > As was discussed on list, a better promise is not to access ring
> > > > until the 1st kick. vdpa can then do a kick when it wants
> > > > the device to start accessing rings.
> > >
> > > Rethinking about ACCESS_AFTER_KICK, it sounds functionally equivalent
> > > to allowing queue_enable after DRIVER_OK, but it seems to have
> > > disadvantages:
> > >
> > > A busy polling software device may disable notifications and never
> > > register any kick notifiers. ACCESS_AFTER_KICK will
> > > introduce overhead to those implementations.
> > >
> > > Thanks
> >
> > It's just the 1st kick, then you can disable. No?
> 
> Yes, but:
> 
> 1) adding hooks for queue_enable
> 2) adding new code to register an event notifier and toggle the notifier
> 
> 1) seems much easier? Or for most devices, it already behaves like this.
> 
> Thanks

Well libvhost-user checks enabled queues at DRIVER_OK, does it not?
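For context, a minimal userspace sketch of the ordering under discussion, assuming the backend offered VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK (ioctls from <linux/vhost.h>; error handling elided and the helper name is invented):

#include <linux/vhost.h>
#include <linux/virtio_config.h>
#include <sys/ioctl.h>

/* Sketch: set DRIVER_OK first, then enable rings one by one, in whatever
 * order userspace wants.  Only legal once
 * VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK has been negotiated. */
static int enable_vq_after_driver_ok(int vdpa_fd, unsigned int vq_index)
{
	__u8 status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
		      VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;
	struct vhost_vring_state state = { .index = vq_index, .num = 1 };

	if (ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status))
		return -1;
	/* ...later, per queue and in any order: */
	return ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
}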

> >
> > > >
> > > > > >
> > > > > > > ---
> > > > > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > > > > >   1 file changed, 7 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > index 9138ef2fb2c8..5f309a16b9dc 100644
> > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > @@ -7,6 +7,7 @@
> > > > > > >   #include 
> > > > > > >   #include 
> > > > > > >   #include 
> > > > > > > +#include 
> > > > > > >   #include 
> > > > > > >   #include 
> > > > > > >   #include 
> > > > > > > @@ -2499,6 +2500,11 @@ static void 
> > > > > > > unregister_link_notifier(struct mlx5_vdpa_net *ndev)
> > > > > > >   flush_workqueue(ndev->mvdev.wq);
> > > > > > >   }
> > > > > > > +static u64 mlx5_vdpa_get_backend_features(const struct 
> > > > > > > vdpa_device *vdpa)
> > > > > > > +{
> > > > > > > + return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
> > > > > > > +}
> > > > > > > +
> > > > > > >   static int mlx5_vdpa_set_driver_features(struct vdpa_device 
> > > > > > > *vdev, u64 features)
> > > > > > >   {
> > > > > > >   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > > @@ -3140,6 +3146,7 @@ static const struct vdpa_config_ops 
> > > > > > > mlx5_vdpa_ops = {
> > > > > > >   .get_vq_align = mlx5_vdpa_get_vq_align,
> > > > > > >   .get_vq_group = mlx5_vdpa_get_vq_group,
> > > > > > >   .get_device_features = mlx5_vdpa_get_device_features,
> > > > > > > + .get_backend_features = mlx5_vdpa_get_backend_features,
> > > > > > >   .set_driver_features = mlx5_vdpa_set_driver_features,
> > > > > > >   .get_driver_features = mlx5_vdpa_get_driver_features,
> > > > > > >   .set_config_cb = mlx5_vdpa_set_config_cb,
> > > > > > > --
> > > > > > > 2.39.3
> > > >
> >


Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-07-04 Thread Jason Wang
On Wed, Jul 5, 2023 at 1:41 PM Liang Chen  wrote:
>
> On Fri, Jun 9, 2023 at 10:57 AM Liang Chen  wrote:
> >
> > On Thu, Jun 8, 2023 at 8:38 AM Jason Wang  wrote:
> > >
> > > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
> > > >
> > > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen 
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > > > The implementation at the moment uses one page per packet in both the
> > > > > > > > > > normal and XDP path. In addition, a module parameter is introduced to
> > > > > > > > > > enable or disable the use of page pool (disabled by default).
> > > > > > > > > >
> > > > > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > > > > performance gain
> > > > > > > > > > in the normal path.
> > > > > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > > > >
> > > > > > > > > > In multi-core vm testing environments, the most significant 
> > > > > > > > > > performance
> > > > > > > > > > gain is observed in XDP cpumap:
> > > > > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > > > >
> > > > > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > > > > fragmentation and
> > > > > > > > > > DMA map/unmap support.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Liang Chen 
> > > > > > > > >
> > > > > > > > > Why off by default?
> > > > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > What happens if we use page pool for big mode too?
> > > > > > > > > The less modes we have the better...
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > Sure, now I believe it makes sense to enable it by default. 
> > > > > > > > When the
> > > > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > > > coalescing. But such cases are rare.
> > > > > > >
> > > > > > > small packets are rare? These workloads are easy to create 
> > > > > > > actually.
> > > > > > > Pls try and include benchmark with small packet size.
> > > > > > >
> > > > > >
> > > > > > Sure, Thanks!
> > > > >
> > > > > Before going ahead and posting v2 patch, I would like to hear more
> > > > > advice for the cases of small packets. I have done more performance
> > > > > benchmark with small packets since then. Here is a list of iperf
> > > > > output,
> > > > >
> > > > > With PP and PP fragmenting:
> > > > > 256:  [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec  0    144 KBytes
> > > > > 1K:   [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec  0    223 KBytes
> > > > > 2K:   [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec  0    324 KBytes
> > > > > 4K:   [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec  0   1.08 MBytes
> > > > > 8K:   [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec  0    744 KBytes
> > > > > 16K:  [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec  0    963 KBytes
> > > > > 32K:  [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec  0   1.25 MBytes
> > > > > 64K:  [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec  0   1.70 MBytes
> > > > > 128K: [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec  0   4.26 MBytes
> > > > > 256K: [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec  0   3.20 MBytes
> > >
> > > Note that virtio-net driver is lacking things like BQL and others, so
> > > it might suffer from buffer bloat for TCP performance. Would you mind
> > > measuring with e.g. testpmd on the vhost to see the rx PPS?
> > >
> >
> > No problem. Before we proceed to measure with testpmd, could you
> > please take a look at the PPS measurements we obtained previously and
> > see if they are sufficient? Though we will only utilize page pool for
> > xdp on v2.
> >
> > netperf -H 192.168.124.197 -p  -t UDP_STREAM -l 0 -- -m $((1))
> >
> > with page pool:
> > 1.
> > Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> > Average:       enp8s0  655092.27      0.35  27508.77      0.03      0.00      0.00      0.00      0.00
> > 2.
> > Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> > Average:       enp8s0  654749.87      0.63  27494.42      0.05      0.00      0.00      0.00      0.00

Re: [PATCH] mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK

2023-07-04 Thread Jason Wang
On Wed, Jul 5, 2023 at 1:31 PM Michael S. Tsirkin  wrote:
>
> On Wed, Jul 05, 2023 at 01:11:37PM +0800, Jason Wang wrote:
> > On Tue, Jul 4, 2023 at 6:16 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Jul 03, 2023 at 05:26:02PM -0700, Si-Wei Liu wrote:
> > > >
> > > >
> > > > On 7/3/2023 8:46 AM, Michael S. Tsirkin wrote:
> > > > > On Mon, Jul 03, 2023 at 04:25:14PM +0200, Eugenio Pérez wrote:
> > > > > > Offer this backend feature as mlx5 is compatible with it. It allows 
> > > > > > it
> > > > > > to do live migration with CVQ, dynamically switching between 
> > > > > > passthrough
> > > > > > and shadow virtqueue.
> > > > > >
> > > > > > Signed-off-by: Eugenio Pérez 
> > > > > Same comment.
> > > > to which?
> > > >
> > > > -Siwei
> > >
> > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is too narrow a use-case to commit 
> > > to it
> > > as a kernel/userspace ABI: what if one wants to start rings in some
> > > other specific order?
> >
> > Just enable a queue by writing e.g. 1 to queue_enable in a specific order?
>
>
> But then at driver ok time we don't know how many queues there are.

There should be a device specific interface for this, for example,
num_queue_pairs. So the device knows at most how many queues there
are. Or anything I miss?

>
> > > As was discussed on list, a better promise is not to access ring
> > > until the 1st kick. vdpa can then do a kick when it wants
> > > the device to start accessing rings.
> >
> > Rethinking about ACCESS_AFTER_KICK, it sounds functionally equivalent
> > to allowing queue_enable after DRIVER_OK, but it seems to have
> > disadvantages:
> >
> > A busy polling software device may disable notifications and never
> > register any kick notifiers. ACCESS_AFTER_KICK will
> > introduce overhead to those implementations.
> >
> > Thanks
>
> It's just the 1st kick, then you can disable. No?

Yes, but:

1) adding hooks for queue_enable
> 2) adding new code to register an event notifier and toggle the notifier

1) seems much easier? Or for most devices, it already behaves like this.

Thanks

>
> > >
> > > > >
> > > > > > ---
> > > > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > > > >   1 file changed, 7 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 9138ef2fb2c8..5f309a16b9dc 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -7,6 +7,7 @@
> > > > > >   #include 
> > > > > >   #include 
> > > > > >   #include 
> > > > > > +#include 
> > > > > >   #include 
> > > > > >   #include 
> > > > > >   #include 
> > > > > > @@ -2499,6 +2500,11 @@ static void unregister_link_notifier(struct 
> > > > > > mlx5_vdpa_net *ndev)
> > > > > >   flush_workqueue(ndev->mvdev.wq);
> > > > > >   }
> > > > > > +static u64 mlx5_vdpa_get_backend_features(const struct vdpa_device 
> > > > > > *vdpa)
> > > > > > +{
> > > > > > + return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
> > > > > > +}
> > > > > > +
> > > > > >   static int mlx5_vdpa_set_driver_features(struct vdpa_device 
> > > > > > *vdev, u64 features)
> > > > > >   {
> > > > > >   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > @@ -3140,6 +3146,7 @@ static const struct vdpa_config_ops 
> > > > > > mlx5_vdpa_ops = {
> > > > > >   .get_vq_align = mlx5_vdpa_get_vq_align,
> > > > > >   .get_vq_group = mlx5_vdpa_get_vq_group,
> > > > > >   .get_device_features = mlx5_vdpa_get_device_features,
> > > > > > + .get_backend_features = mlx5_vdpa_get_backend_features,
> > > > > >   .set_driver_features = mlx5_vdpa_set_driver_features,
> > > > > >   .get_driver_features = mlx5_vdpa_get_driver_features,
> > > > > >   .set_config_cb = mlx5_vdpa_set_config_cb,
> > > > > > --
> > > > > > 2.39.3
> > >
>


Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-07-04 Thread Liang Chen
On Fri, Jun 9, 2023 at 10:57 AM Liang Chen  wrote:
>
> On Thu, Jun 8, 2023 at 8:38 AM Jason Wang  wrote:
> >
> > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen  
> > > > wrote:
> > > > >
> > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > > The implementation at the moment uses one page per packet in both the
> > > > > > > > > normal and XDP path. In addition, a module parameter is introduced to
> > > > > > > > > enable or disable the use of page pool (disabled by default).
> > > > > > > > >
> > > > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > > > performance gain
> > > > > > > > > in the normal path.
> > > > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > > >
> > > > > > > > > In multi-core vm testing environments, the most significant 
> > > > > > > > > performance
> > > > > > > > > gain is observed in XDP cpumap:
> > > > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > > >
> > > > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > > > fragmentation and
> > > > > > > > > DMA map/unmap support.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Liang Chen 
> > > > > > > >
> > > > > > > > Why off by default?
> > > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > > >
> > > > > > > >
> > > > > > > > What happens if we use page pool for big mode too?
> > > > > > > > The less modes we have the better...
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > Sure, now I believe it makes sense to enable it by default. When 
> > > > > > > the
> > > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > > coalescing. But such cases are rare.
> > > > > >
> > > > > > small packets are rare? These workloads are easy to create actually.
> > > > > > Pls try and include benchmark with small packet size.
> > > > > >
> > > > >
> > > > > Sure, Thanks!
> > > >
> > > > Before going ahead and posting v2 patch, I would like to hear more
> > > > advice for the cases of small packets. I have done more performance
> > > > benchmark with small packets since then. Here is a list of iperf
> > > > output,
> > > >
> > > > With PP and PP fragmenting:
> > > > 256:  [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec  0    144 KBytes
> > > > 1K:   [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec  0    223 KBytes
> > > > 2K:   [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec  0    324 KBytes
> > > > 4K:   [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec  0   1.08 MBytes
> > > > 8K:   [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec  0    744 KBytes
> > > > 16K:  [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec  0    963 KBytes
> > > > 32K:  [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec  0   1.25 MBytes
> > > > 64K:  [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec  0   1.70 MBytes
> > > > 128K: [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec  0   4.26 MBytes
> > > > 256K: [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec  0   3.20 MBytes
> >
> > Note that virtio-net driver is lacking things like BQL and others, so
> > it might suffer from buffer bloat for TCP performance. Would you mind
> > measuring with e.g. testpmd on the vhost to see the rx PPS?
> >
>
> No problem. Before we proceed to measure with testpmd, could you
> please take a look at the PPS measurements we obtained previously and
> see if they are sufficient? Though we will only utilize page pool for
> xdp on v2.
>
> netperf -H 192.168.124.197 -p  -t UDP_STREAM -l 0 -- -m $((1))
>
> with page pool:
> 1.
> Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> Average:       enp8s0  655092.27      0.35  27508.77      0.03      0.00      0.00      0.00      0.00
> 2.
> Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> Average:       enp8s0  654749.87      0.63  27494.42      0.05      0.00      0.00      0.00      0.00
> 3.
> Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> Average:       enp8s0  654230.40      0.10  27472.57      0.01      0.00      0.00      0.00      0.00
> 4.
> Average:        IFA
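For readers following the page_pool part of this thread, a minimal sketch of the allocate/recycle pattern the patch builds on (<net/page_pool.h> API of this time frame; pool size and helper names are illustrative, not the actual virtio-net integration):

#include <net/page_pool.h>

/* Sketch: one pool per RX queue; pages come from and return to the pool,
 * avoiding a full alloc_page()/put_page() round trip per packet. */
static struct page_pool *rx_pool_create(struct device *dev)
{
	struct page_pool_params pp_params = {
		.order		= 0,		/* one page per packet */
		.pool_size	= 256,		/* illustrative ring size */
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
	};

	return page_pool_create(&pp_params);	/* ERR_PTR() on failure */
}

static void rx_alloc_and_recycle(struct page_pool *pool)
{
	/* refill path: */
	struct page *page = page_pool_dev_alloc_pages(pool);

	if (!page)
		return;

	/* completion path: recycle to the pool instead of put_page() */
	page_pool_put_full_page(pool, page, false);
}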

Re: [PATCH] mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK

2023-07-04 Thread Michael S. Tsirkin
On Wed, Jul 05, 2023 at 01:11:37PM +0800, Jason Wang wrote:
> On Tue, Jul 4, 2023 at 6:16 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Jul 03, 2023 at 05:26:02PM -0700, Si-Wei Liu wrote:
> > >
> > >
> > > On 7/3/2023 8:46 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Jul 03, 2023 at 04:25:14PM +0200, Eugenio Pérez wrote:
> > > > > Offer this backend feature as mlx5 is compatible with it. It allows it
> > > > > to do live migration with CVQ, dynamically switching between 
> > > > > passthrough
> > > > > and shadow virtqueue.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez 
> > > > Same comment.
> > > to which?
> > >
> > > -Siwei
> >
> > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is too narrow a use-case to commit 
> > to it
> > as a kernel/userspace ABI: what if one wants to start rings in some
> > other specific order?
> 
> Just enable a queue by writing e.g. 1 to queue_enable in a specific order?


But then at driver ok time we don't know how many queues there are.

> > As was discussed on list, a better promise is not to access ring
> > until the 1st kick. vdpa can then do a kick when it wants
> > the device to start accessing rings.
> 
> Rethinking about ACCESS_AFTER_KICK, it sounds functionally equivalent
> to allowing queue_enable after DRIVER_OK, but it seems to have
> disadvantages:
> 
> A busy polling software device may disable notifications and never
> register any kick notifiers. ACCESS_AFTER_KICK will
> introduce overhead to those implementations.
> 
> Thanks

It's just the 1st kick, then you can disable. No?

> >
> > > >
> > > > > ---
> > > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > > >   1 file changed, 7 insertions(+)
> > > > >
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 9138ef2fb2c8..5f309a16b9dc 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -7,6 +7,7 @@
> > > > >   #include 
> > > > >   #include 
> > > > >   #include 
> > > > > +#include 
> > > > >   #include 
> > > > >   #include 
> > > > >   #include 
> > > > > @@ -2499,6 +2500,11 @@ static void unregister_link_notifier(struct 
> > > > > mlx5_vdpa_net *ndev)
> > > > >   flush_workqueue(ndev->mvdev.wq);
> > > > >   }
> > > > > +static u64 mlx5_vdpa_get_backend_features(const struct vdpa_device 
> > > > > *vdpa)
> > > > > +{
> > > > > + return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
> > > > > +}
> > > > > +
> > > > >   static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, 
> > > > > u64 features)
> > > > >   {
> > > > >   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > @@ -3140,6 +3146,7 @@ static const struct vdpa_config_ops 
> > > > > mlx5_vdpa_ops = {
> > > > >   .get_vq_align = mlx5_vdpa_get_vq_align,
> > > > >   .get_vq_group = mlx5_vdpa_get_vq_group,
> > > > >   .get_device_features = mlx5_vdpa_get_device_features,
> > > > > + .get_backend_features = mlx5_vdpa_get_backend_features,
> > > > >   .set_driver_features = mlx5_vdpa_set_driver_features,
> > > > >   .get_driver_features = mlx5_vdpa_get_driver_features,
> > > > >   .set_config_cb = mlx5_vdpa_set_config_cb,
> > > > > --
> > > > > 2.39.3
> >


Re: [PATCH] mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK

2023-07-04 Thread Jason Wang
On Tue, Jul 4, 2023 at 6:16 PM Michael S. Tsirkin  wrote:
>
> On Mon, Jul 03, 2023 at 05:26:02PM -0700, Si-Wei Liu wrote:
> >
> >
> > On 7/3/2023 8:46 AM, Michael S. Tsirkin wrote:
> > > On Mon, Jul 03, 2023 at 04:25:14PM +0200, Eugenio Pérez wrote:
> > > > Offer this backend feature as mlx5 is compatible with it. It allows it
> > > > to do live migration with CVQ, dynamically switching between passthrough
> > > > and shadow virtqueue.
> > > >
> > > > Signed-off-by: Eugenio Pérez 
> > > Same comment.
> > to which?
> >
> > -Siwei
>
> VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is too narrow a use-case to commit to 
> it
> as a kernel/userspace ABI: what if one wants to start rings in some
> other specific order?

Just enable a queue by writing e.g. 1 to queue_enable in a specific order?

> As was discussed on list, a better promise is not to access ring
> until the 1st kick. vdpa can then do a kick when it wants
> the device to start accessing rings.

Rethinking about ACCESS_AFTER_KICK, it sounds functionally equivalent
to allowing queue_enable after DRIVER_OK, but it seems to have
disadvantages:

A busy polling software device may disable notifications and never
register any kick notifiers. ACCESS_AFTER_KICK will
introduce overhead to those implementations.

Thanks

>
> > >
> > > > ---
> > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > >   1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 9138ef2fb2c8..5f309a16b9dc 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -7,6 +7,7 @@
> > > >   #include 
> > > >   #include 
> > > >   #include 
> > > > +#include 
> > > >   #include 
> > > >   #include 
> > > >   #include 
> > > > @@ -2499,6 +2500,11 @@ static void unregister_link_notifier(struct 
> > > > mlx5_vdpa_net *ndev)
> > > >   flush_workqueue(ndev->mvdev.wq);
> > > >   }
> > > > +static u64 mlx5_vdpa_get_backend_features(const struct vdpa_device 
> > > > *vdpa)
> > > > +{
> > > > + return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
> > > > +}
> > > > +
> > > >   static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, 
> > > > u64 features)
> > > >   {
> > > >   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > @@ -3140,6 +3146,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops 
> > > > = {
> > > >   .get_vq_align = mlx5_vdpa_get_vq_align,
> > > >   .get_vq_group = mlx5_vdpa_get_vq_group,
> > > >   .get_device_features = mlx5_vdpa_get_device_features,
> > > > + .get_backend_features = mlx5_vdpa_get_backend_features,
> > > >   .set_driver_features = mlx5_vdpa_set_driver_features,
> > > >   .get_driver_features = mlx5_vdpa_get_driver_features,
> > > >   .set_config_cb = mlx5_vdpa_set_config_cb,
> > > > --
> > > > 2.39.3
>


Re: [PATCH v2 3/3] vduse: Temporarily disable control queue features

2023-07-04 Thread Maxime Coquelin




On 7/4/23 18:43, Michael S. Tsirkin wrote:
> On Tue, Jul 04, 2023 at 06:40:45PM +0200, Maxime Coquelin wrote:
>> Virtio-net driver control queue implementation is not safe
>> when used with VDUSE. If the VDUSE application does not
>> reply to control queue messages, it currently ends up
>> hanging the kernel thread sending this command.
>>
>> Some work is on-going to make the control queue
>> implementation robust with VDUSE. Until it is completed,
>> let's disable control virtqueue and features that depend on
>> it.
>>
>> Signed-off-by: Maxime Coquelin 
>> ---
>>  drivers/vdpa/vdpa_user/vduse_dev.c | 21 +
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
>> index 1271c9796517..04367a53802b 100644
>> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
>> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
>> @@ -1778,6 +1778,25 @@ static struct attribute *vduse_dev_attrs[] = {
>>
>>  ATTRIBUTE_GROUPS(vduse_dev);
>>
>> +static void vduse_dev_features_fixup(struct vduse_dev_config *config)
>> +{
>> +	if (config->device_id == VIRTIO_ID_NET) {
>> +		/*
>> +		 * Temporarily disable control virtqueue and features that
>> +		 * depend on it while CVQ is being made more robust for VDUSE.
>> +		 */
>> +		config->features &= ~((1ULL << VIRTIO_NET_F_CTRL_VQ) |
>> +				(1ULL << VIRTIO_NET_F_CTRL_RX) |
>> +				(1ULL << VIRTIO_NET_F_CTRL_VLAN) |
>> +				(1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) |
>> +				(1ULL << VIRTIO_NET_F_MQ) |
>> +				(1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR) |
>> +				(1ULL << VIRTIO_NET_F_RSS) |
>> +				(1ULL << VIRTIO_NET_F_HASH_REPORT) |
>> +				(1ULL << VIRTIO_NET_F_NOTF_COAL));
>> +	}
>> +}
>> +
>
> This will never be exhaustive, we are adding new features.
> Please add an allowlist with just legal ones instead.

Ok, got it!
I'll post a new revision.

Thanks,
Maxime

>>  static int vduse_create_dev(struct vduse_dev_config *config,
>>  			    void *config_buf, u64 api_version)
>>  {
>> @@ -1793,6 +1812,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
>>  	if (!dev)
>>  		goto err;
>>
>> +	vduse_dev_features_fixup(config);
>> +
>>  	dev->api_version = api_version;
>>  	dev->device_features = config->features;
>>  	dev->device_id = config->device_id;
>> --
>> 2.41.0


[linux-next:master] BUILD REGRESSION 1c6f93977947dbba1fc4d250c4eb8a7d4cfdecf1

2023-07-04 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 1c6f93977947dbba1fc4d250c4eb8a7d4cfdecf1  Add linux-next specific files for 20230704

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202306260401.qzlyqpv2-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202306301709.lvrxzycj-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202306301756.x8dgyynl-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

drivers/bluetooth/btmtk.c:386:32: error: no member named 'dump' in 'struct hci_dev'
drivers/bluetooth/btmtk.c:386:44: error: 'struct hci_dev' has no member named 'dump'
drivers/char/mem.c:164:25: error: implicit declaration of function 'unxlate_dev_mem_ptr'; did you mean 'xlate_dev_mem_ptr'? [-Werror=implicit-function-declaration]
drivers/mfd/max77541.c:176:18: warning: cast to smaller integer type 'enum max7754x_ids' from 'const void *' [-Wvoid-pointer-to-enum-cast]
lib/kunit/executor_test.c:138:4: warning: cast from 'void (*)(const void *)' to 'kunit_action_t *' (aka 'void (*)(void *)') converts to incompatible function type [-Wcast-function-type-strict]
lib/kunit/test.c:775:38: warning: cast from 'void (*)(const void *)' to 'kunit_action_t *' (aka 'void (*)(void *)') converts to incompatible function type [-Wcast-function-type-strict]

Unverified Error/Warning (likely false positive, please contact us if interested):

drivers/vhost/vhost.c:1654 vhost_vring_ioctl() error: uninitialized symbol 'vq'.
drivers/vhost/vhost.c:1685 vhost_vring_ioctl() error: uninitialized symbol 'idx'.
drivers/vhost/vhost.c:763 vhost_worker_ioctl() error: uninitialized symbol 'vq'.
drivers/vhost/vhost.c:774 vhost_worker_ioctl() error: uninitialized symbol 'idx'.
{standard input}: Error: local label `"2" (instance number 9 of a fb label)' is not defined

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-randconfig-r002-20230704
|   `-- drivers-bluetooth-btmtk.c:error:struct-hci_dev-has-no-member-named-dump
|-- loongarch-randconfig-r022-20230704
|   `-- drivers-bluetooth-btmtk.c:error:struct-hci_dev-has-no-member-named-dump
|-- m68k-randconfig-r016-20230704
|   `-- drivers-bluetooth-btmtk.c:error:struct-hci_dev-has-no-member-named-dump
|-- microblaze-randconfig-r073-20230703
|   `-- drivers-bluetooth-btmtk.c:error:struct-hci_dev-has-no-member-named-dump
|-- nios2-randconfig-r034-20230704
|   `-- drivers-bluetooth-btmtk.c:error:struct-hci_dev-has-no-member-named-dump
|-- riscv-randconfig-m031-20230703
|   |-- drivers-vhost-vhost.c-vhost_vring_ioctl()-error:uninitialized-symbol-idx-.
|   |-- drivers-vhost-vhost.c-vhost_vring_ioctl()-error:uninitialized-symbol-vq-.
|   |-- drivers-vhost-vhost.c-vhost_worker_ioctl()-error:uninitialized-symbol-idx-.
|   `-- drivers-vhost-vhost.c-vhost_worker_ioctl()-error:uninitialized-symbol-vq-.
|-- sh-allmodconfig
|   |-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr
|   `-- standard-input:Error:local-label-(instance-number-of-a-fb-label)-is-not-defined
|-- sh-randconfig-r015-20230704
|   |-- drivers-bluetooth-btmtk.c:error:struct-hci_dev-has-no-member-named-dump
|   `-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr
|-- sh-sh7710voipgw_defconfig
|   `-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr
|-- sparc-randconfig-r005-20230704
|   `-- drivers-bluetooth-btmtk.c:error:struct-hci_dev-has-no-member-named-dump
`-- x86_64-buildonly-randconfig-r003-20230703
    `-- drivers-bluetooth-btmtk.c:error:struct-hci_dev-has-no-member-named-dump
clang_recent_errors
|-- arm64-randconfig-r033-20230704
|   `-- drivers-mfd-max77541.c:warning:cast-to-smaller-integer-type-enum-max7754x_ids-from-const-void
|-- hexagon-randconfig-r041-20230703
|   |-- drivers-bluetooth-btmtk.c:error:no-member-named-dump-in-struct-hci_dev
|   |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type
|   `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type
|-- hexagon-randconfig-r045-20230703
|   |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type
|   `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type
|-- i386-randconfig-i011-20230703
|   `-- drivers-bluetooth-btmtk.c:error:no-member-named-dump-in-struct-hci_dev
`-- s390-randconfig-r006-20230704
    |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type

Re: [PATCH v2 3/3] vduse: Temporarily disable control queue features

2023-07-04 Thread Michael S. Tsirkin
On Tue, Jul 04, 2023 at 06:40:45PM +0200, Maxime Coquelin wrote:
> Virtio-net driver control queue implementation is not safe
> when used with VDUSE. If the VDUSE application does not
> reply to control queue messages, it currently ends up
> hanging the kernel thread sending this command.
> 
> Some work is on-going to make the control queue
> implementation robust with VDUSE. Until it is completed,
> let's disable control virtqueue and features that depend on
> it.
> 
> Signed-off-by: Maxime Coquelin 
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 1271c9796517..04367a53802b 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -1778,6 +1778,25 @@ static struct attribute *vduse_dev_attrs[] = {
>  
>  ATTRIBUTE_GROUPS(vduse_dev);
>  
> +static void vduse_dev_features_fixup(struct vduse_dev_config *config)
> +{
> + if (config->device_id == VIRTIO_ID_NET) {
> + /*
> +  * Temporarily disable control virtqueue and features that
> +  * depend on it while CVQ is being made more robust for VDUSE.
> +  */
> + config->features &= ~((1ULL << VIRTIO_NET_F_CTRL_VQ) |
> + (1ULL << VIRTIO_NET_F_CTRL_RX) |
> + (1ULL << VIRTIO_NET_F_CTRL_VLAN) |
> + (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) |
> + (1ULL << VIRTIO_NET_F_MQ) |
> + (1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR) |
> + (1ULL << VIRTIO_NET_F_RSS) |
> + (1ULL << VIRTIO_NET_F_HASH_REPORT) |
> + (1ULL << VIRTIO_NET_F_NOTF_COAL));
> + }
> +}
> +


This will never be exhaustive, we are adding new features.
Please add an allowlist with just legal ones instead.
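A rough sketch of the allowlist shape being requested (the exact feature set below is illustrative; the real list is up to the follow-up revision):

/* Sketch: keep only bits known to be safe for VDUSE net devices, so any
 * newly added CVQ-dependent feature is masked by default. */
static const u64 vduse_net_allowed_features =
	BIT_ULL(VIRTIO_F_VERSION_1) |
	BIT_ULL(VIRTIO_F_ACCESS_PLATFORM) |
	BIT_ULL(VIRTIO_NET_F_MAC) |
	BIT_ULL(VIRTIO_NET_F_MTU) |
	BIT_ULL(VIRTIO_NET_F_CSUM) |
	BIT_ULL(VIRTIO_NET_F_GUEST_CSUM);

static void vduse_dev_features_fixup(struct vduse_dev_config *config)
{
	if (config->device_id == VIRTIO_ID_NET)
		config->features &= vduse_net_allowed_features;
}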


>  static int vduse_create_dev(struct vduse_dev_config *config,
>   void *config_buf, u64 api_version)
>  {
> @@ -1793,6 +1812,8 @@ static int vduse_create_dev(struct vduse_dev_config 
> *config,
>   if (!dev)
>   goto err;
>  
> + vduse_dev_features_fixup(config);
> +
>   dev->api_version = api_version;
>   dev->device_features = config->features;
>   dev->device_id = config->device_id;
> -- 
> 2.41.0



[PATCH v2 3/3] vduse: Temporarily disable control queue features

2023-07-04 Thread Maxime Coquelin
Virtio-net driver control queue implementation is not safe
when used with VDUSE. If the VDUSE application does not
reply to control queue messages, it currently ends up
hanging the kernel thread sending this command.

Some work is on-going to make the control queue
implementation robust with VDUSE. Until it is completed,
let's disable control virtqueue and features that depend on
it.

Signed-off-by: Maxime Coquelin 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 1271c9796517..04367a53802b 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1778,6 +1778,25 @@ static struct attribute *vduse_dev_attrs[] = {
 
 ATTRIBUTE_GROUPS(vduse_dev);
 
+static void vduse_dev_features_fixup(struct vduse_dev_config *config)
+{
+	if (config->device_id == VIRTIO_ID_NET) {
+		/*
+		 * Temporarily disable control virtqueue and features that
+		 * depend on it while CVQ is being made more robust for VDUSE.
+		 */
+		config->features &= ~((1ULL << VIRTIO_NET_F_CTRL_VQ) |
+				(1ULL << VIRTIO_NET_F_CTRL_RX) |
+				(1ULL << VIRTIO_NET_F_CTRL_VLAN) |
+				(1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) |
+				(1ULL << VIRTIO_NET_F_MQ) |
+				(1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR) |
+				(1ULL << VIRTIO_NET_F_RSS) |
+				(1ULL << VIRTIO_NET_F_HASH_REPORT) |
+				(1ULL << VIRTIO_NET_F_NOTF_COAL));
+	}
+}
+
 static int vduse_create_dev(struct vduse_dev_config *config,
 			    void *config_buf, u64 api_version)
 {
@@ -1793,6 +1812,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
 	if (!dev)
 		goto err;
 
+	vduse_dev_features_fixup(config);
+
 	dev->api_version = api_version;
 	dev->device_features = config->features;
 	dev->device_id = config->device_id;
-- 
2.41.0



[PATCH v2 1/3] vduse: validate block features only with block devices

2023-07-04 Thread Maxime Coquelin
This patch is preliminary work to enable network device
type support to VDUSE.

As VIRTIO_BLK_F_CONFIG_WCE shares the same value as
VIRTIO_NET_F_HOST_TSO4, we need to restrict its check
to Virtio-blk device type.

Acked-by: Jason Wang 
Reviewed-by: Xie Yongji 
Signed-off-by: Maxime Coquelin 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index dc38ed21319d..ff9fdd6783fe 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1662,13 +1662,14 @@ static bool device_is_allowed(u32 device_id)
 	return false;
 }
 
-static bool features_is_valid(u64 features)
+static bool features_is_valid(struct vduse_dev_config *config)
 {
-	if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
+	if (!(config->features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
 		return false;
 
 	/* Now we only support read-only configuration space */
-	if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))
+	if ((config->device_id == VIRTIO_ID_BLOCK) &&
+	    (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)))
 		return false;
 
 	return true;
@@ -1695,7 +1696,7 @@ static bool vduse_validate_config(struct vduse_dev_config *config)
 	if (!device_is_allowed(config->device_id))
 		return false;
 
-	if (!features_is_valid(config->features))
+	if (!features_is_valid(config))
 		return false;
 
 	return true;
-- 
2.41.0
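For reference, the bit collision the commit message describes: both features map to bit 11 in the uapi virtio headers, so the check is only meaningful once gated on device type.

/* From the uapi virtio headers -- the two features share bit 11: */
#define VIRTIO_BLK_F_CONFIG_WCE	11	/* Writeback mode available in config */
#define VIRTIO_NET_F_HOST_TSO4	11	/* Host can handle TSOv4 in. */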



[PATCH v2 2/3] vduse: enable Virtio-net device type

2023-07-04 Thread Maxime Coquelin
This patch adds Virtio-net device type to the supported
devices types. Initialization fails if the device does
not support VIRTIO_F_VERSION_1 feature, in order to
guarantee the configuration space is read-only.

Acked-by: Jason Wang 
Reviewed-by: Xie Yongji 
Signed-off-by: Maxime Coquelin 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index ff9fdd6783fe..1271c9796517 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -142,6 +142,7 @@ static struct workqueue_struct *vduse_irq_bound_wq;
 
 static u32 allowed_device_id[] = {
 	VIRTIO_ID_BLOCK,
+	VIRTIO_ID_NET,
 };
 
 static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa)
@@ -1672,6 +1673,10 @@ static bool features_is_valid(struct vduse_dev_config *config)
 	    (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)))
 		return false;
 
+	if ((config->device_id == VIRTIO_ID_NET) &&
+	    !(config->features & (1ULL << VIRTIO_F_VERSION_1)))
+		return false;
+
 	return true;
 }
 
@@ -2027,6 +2032,7 @@ static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = {
 
 static struct virtio_device_id id_table[] = {
 	{ VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID },
+	{ VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
 	{ 0 },
 };
 
-- 
2.41.0



[PATCH v2 0/3] vduse: add support for networking devices

2023-07-04 Thread Maxime Coquelin
This small series enables virtio-net device type in VDUSE.
With it, basic operations have been tested, both with
virtio-vdpa and vhost-vdpa using DPDK Vhost library series
adding VDUSE support using split rings layout (merged in
DPDK v23.07-rc1).

Control queue support (and so multiqueue) has also been
tested, but requires a Kernel series from Jason Wang
relaxing control queue polling [1] to function reliably,
so while Jason rework is done, a patch is added to disable
CVQ and features that depend on it (tested also with DPDK
v23.07-rc1).

[1]: 
https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/

v1 -> v2 changes:
=
- Add a patch to disable CVQ (Michael)

RFC -> v1 changes:
==
- Fail device init if it does not support VERSION_1 (Jason)

Maxime Coquelin (3):
  vduse: validate block features only with block devices
  vduse: enable Virtio-net device type
  vduse: Temporarily disable control queue features

 drivers/vdpa/vdpa_user/vduse_dev.c | 36 ++
 1 file changed, 32 insertions(+), 4 deletions(-)

-- 
2.41.0



Re: [PATCH] vdpa: reject F_ENABLE_AFTER_DRIVER_OK if backend does not support it

2023-07-04 Thread Michael S. Tsirkin
On Tue, Jul 04, 2023 at 01:36:11PM +0200, Eugenio Perez Martin wrote:
> On Tue, Jul 4, 2023 at 12:38 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jul 04, 2023 at 12:25:32PM +0200, Eugenio Perez Martin wrote:
> > > On Mon, Jul 3, 2023 at 4:52 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Mon, Jul 03, 2023 at 04:22:18PM +0200, Eugenio Pérez wrote:
> > > > > With the current code it is accepted as long as userland sends it.
> > > > >
> > > > > Although userland should not set a feature flag that has not been
> > > > > offered to it with VHOST_GET_BACKEND_FEATURES, the current code will 
> > > > > not
> > > > > complain for it.
> > > > >
> > > > > Since there is no specific reason for any parent to reject that 
> > > > > backend
> > > > > feature bit when it has been proposed, let's control it at vdpa 
> > > > > frontend
> > > > > level. Future patches may move this control to the parent driver.
> > > > >
> > > > > Fixes: 967800d2d52e ("vdpa: accept 
> > > > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature")
> > > > > Signed-off-by: Eugenio Pérez 
> > > >
> > > > Please do send v3. And again, I don't want to send "after driver ok" 
> > > > hack
> > > > upstream at all, I merged it in next just to give it some testing.
> > > > We want RING_ACCESS_AFTER_KICK or some such.
> > > >
> > >
> > > Current devices do not support that semantic.
> >
> > Which devices specifically access the ring after DRIVER_OK but before
> > a kick?
> >
> 
> Previous versions of the QEMU LM series did a spurious kick to start
> traffic at the LM destination [1]. When it was proposed, that spurious
> kick was removed from the series because checking for descriptors
> after driver_ok, even without a kick, was considered work of the
> parent driver.
> 
> I'm ok to go back to this spurious kick, but I'm not sure if the hw
> will read the ring before the kick actually. I can ask.
> 
> Thanks!
> 
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-01/msg02775.html

Let's find out. We need to check for ENABLE_AFTER_DRIVER_OK too, no?



> > > My plan was to convert
> > > it in vp_vdpa if needed, and reuse the current vdpa ops. Sorry if I
> > > was not explicit enough.
> > >
> > > The only solution I can see to that is to trap & emulate in the vdpa
> > > (parent?) driver, as talked in virtio-comment. But that complicates
> > > the architecture:
> > > * Offer VHOST_BACKEND_F_RING_ACCESS_AFTER_KICK
> > > * Store vq enable state separately, at
> > > vdpa->config->set_vq_ready(true), but not transmit that enable to hw
> > > * Store the doorbell state separately, but do not configure it to the
> > > device directly.
> > >
> > > But how to recover if the device cannot configure them at kick time,
> > > for example?
> > >
> > > Maybe we can just fail if the parent driver does not support enabling
> > > the vq after DRIVER_OK? That way no new feature flag is needed.
> > >
> > > Thanks!
> > >
> > > >
> > > > > ---
> > > > > Sent with Fixes: tag pointing to 
> > > > > git.kernel.org/pub/scm/linux/kernel/git/mst
> > > > > commit. Please let me know if I should send a v3 of [1] instead.
> > > > >
> > > > > [1] 
> > > > > https://lore.kernel.org/lkml/20230609121244-mutt-send-email-...@kernel.org/T/
> > > > > ---
> > > > >  drivers/vhost/vdpa.c | 7 +--
> > > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > > index e1abf29fed5b..a7e554352351 100644
> > > > > --- a/drivers/vhost/vdpa.c
> > > > > +++ b/drivers/vhost/vdpa.c
> > > > > @@ -681,18 +681,21 @@ static long vhost_vdpa_unlocked_ioctl(struct 
> > > > > file *filep,
> > > > >  {
> > > > >   struct vhost_vdpa *v = filep->private_data;
> > > > >   struct vhost_dev *d = &v->vdev;
> > > > > + const struct vdpa_config_ops *ops = v->vdpa->config;
> > > > >   void __user *argp = (void __user *)arg;
> > > > >   u64 __user *featurep = argp;
> > > > > - u64 features;
> > > > > + u64 features, parent_features = 0;
> > > > >   long r = 0;
> > > > >
> > > > >   if (cmd == VHOST_SET_BACKEND_FEATURES) {
> > > > >   if (copy_from_user(&features, featurep, 
> > > > > sizeof(features)))
> > > > >   return -EFAULT;
> > > > > + if (ops->get_backend_features)
> > > > > + parent_features = ops->get_backend_features(v->vdpa);
> > > > >   if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
> > > > >BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
> > > > >BIT_ULL(VHOST_BACKEND_F_RESUME) |
> > > > > -  BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
> > > > > +  parent_features))
> > > > >   return -EOPNOTSUPP;
> > > > >   if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
> > > > >!vhost_vdpa_can_suspend(v))
> > > > > --
> > > > > 2.39.3
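What the stricter check means on the userspace side, as a minimal sketch (ioctls from <linux/vhost.h>; error handling elided and the helper name is invented):

#include <linux/vhost.h>
#include <sys/ioctl.h>

/* Sketch: query what the parent offers and only ack a subset; with this
 * patch, acking an unoffered bit fails with -EOPNOTSUPP. */
static int negotiate_backend_features(int vdpa_fd, __u64 wanted)
{
	__u64 offered;

	if (ioctl(vdpa_fd, VHOST_GET_BACKEND_FEATURES, &offered))
		return -1;
	wanted &= offered;
	return ioctl(vdpa_fd, VHOST_SET_BACKEND_FEATURES, &wanted);
}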

Re: [PATCH v1 0/2] vduse: add support for networking devices

2023-07-04 Thread Maxime Coquelin



On 7/4/23 11:59, Michael S. Tsirkin wrote:
> On Tue, Jul 04, 2023 at 10:43:07AM +0200, Maxime Coquelin wrote:
>> On 7/3/23 23:45, Michael S. Tsirkin wrote:
>>> On Mon, Jul 03, 2023 at 09:43:49AM +0200, Maxime Coquelin wrote:
>>>> On 7/3/23 08:44, Jason Wang wrote:
>>>>> On Sun, Jul 2, 2023 at 9:37 PM Michael S. Tsirkin  wrote:
>>>>>> On Tue, Jun 27, 2023 at 01:36:50PM +0200, Maxime Coquelin wrote:
>>>>>>> This small series enables virtio-net device type in VDUSE.
>>>>>>> With it, basic operations have been tested, both with
>>>>>>> virtio-vdpa and vhost-vdpa using DPDK Vhost library series
>>>>>>> adding VDUSE support using split rings layout (merged in
>>>>>>> DPDK v23.07-rc1).
>>>>>>>
>>>>>>> Control queue support (and so multiqueue) has also been
>>>>>>> tested, but requires a Kernel series from Jason Wang
>>>>>>> relaxing control queue polling [1] to function reliably.
>>>>>>>
>>>>>>> [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/
>>>>>>
>>>>>> Jason promised to post a new version of that patch.
>>>>>> Right Jason?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> For now let's make sure CVQ feature flag is off?
>>>>>
>>>>> We can do that and relax on top of my patch.
>>>>
>>>> I agree? Do you prefer a features negotiation, or failing init (like
>>>> done for VERSION_1) if the VDUSE application advertises CVQ?
>>>>
>>>> Thanks,
>>>> Maxime
>>>
>>> Unfortunately guests fail probe if feature set is inconsistent.
>>> So I don't think passing through features is a good idea,
>>> you need a list of legal bits. And when doing this,
>>> clear CVQ and everything that depends on it.
>>
>> Since this is temporary, while cvq is made more robust, I think it is
>> better to fail VDUSE device creation if CVQ feature is advertised by the
>> VDUSE application, instead of ensuring features depending on CVQ are
>> also cleared.
>>
>> Jason seems to think likewise, would that work for you?
>>
>> Thanks,
>> Maxime
>
> Nothing is more permanent than temporary solutions.
> My concern would be that hardware devices then start masking CVQ
> intentionally just to avoid the pain of broken software.

Got it, I'll add a patch on top that filters out CVQ feature and all the
features that depend on it.

Thanks,
Maxime

>>>>> Thanks
>>>>>
>>>>>>> RFC -> v1 changes:
>>>>>>> ==
>>>>>>> - Fail device init if it does not support VERSION_1 (Jason)
>>>>>>>
>>>>>>> Maxime Coquelin (2):
>>>>>>>   vduse: validate block features only with block devices
>>>>>>>   vduse: enable Virtio-net device type
>>>>>>>
>>>>>>>  drivers/vdpa/vdpa_user/vduse_dev.c | 15 +++
>>>>>>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> --
>>>>>>> 2.41.0

Re: [PATCH] vdpa: reject F_ENABLE_AFTER_DRIVER_OK if backend does not support it

2023-07-04 Thread Michael S. Tsirkin
On Tue, Jul 04, 2023 at 12:25:32PM +0200, Eugenio Perez Martin wrote:
> On Mon, Jul 3, 2023 at 4:52 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Jul 03, 2023 at 04:22:18PM +0200, Eugenio Pérez wrote:
> > > With the current code it is accepted as long as userland sends it.
> > >
> > > Although userland should not set a feature flag that has not been
> > > offered to it with VHOST_GET_BACKEND_FEATURES, the current code will not
> > > complain for it.
> > >
> > > Since there is no specific reason for any parent to reject that backend
> > > feature bit when it has been proposed, let's control it at vdpa frontend
> > > level. Future patches may move this control to the parent driver.
> > >
> > > Fixes: 967800d2d52e ("vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK 
> > > backend feature")
> > > Signed-off-by: Eugenio Pérez 
> >
> > Please do send v3. And again, I don't want to send "after driver ok" hack
> > upstream at all, I merged it in next just to give it some testing.
> > We want RING_ACCESS_AFTER_KICK or some such.
> >
> 
> Current devices do not support that semantic.

Which devices specifically access the ring after DRIVER_OK but before
a kick?

> My plan was to convert
> it in vp_vdpa if needed, and reuse the current vdpa ops. Sorry if I
> was not explicit enough.
> 
> The only solution I can see to that is to trap & emulate in the vdpa
> (parent?) driver, as talked in virtio-comment. But that complicates
> the architecture:
> * Offer VHOST_BACKEND_F_RING_ACCESS_AFTER_KICK
> * Store vq enable state separately, at
> vdpa->config->set_vq_ready(true), but not transmit that enable to hw
> * Store the doorbell state separately, but do not configure it to the
> device directly.
> 
> But how to recover if the device cannot configure them at kick time,
> for example?
> 
> Maybe we can just fail if the parent driver does not support enabling
> the vq after DRIVER_OK? That way no new feature flag is needed.
> 
> Thanks!
> 
> >
> > > ---
> > > Sent with Fixes: tag pointing to 
> > > git.kernel.org/pub/scm/linux/kernel/git/mst
> > > commit. Please let me know if I should send a v3 of [1] instead.
> > >
> > > [1] 
> > > https://lore.kernel.org/lkml/20230609121244-mutt-send-email-...@kernel.org/T/
> > > ---
> > >  drivers/vhost/vdpa.c | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > index e1abf29fed5b..a7e554352351 100644
> > > --- a/drivers/vhost/vdpa.c
> > > +++ b/drivers/vhost/vdpa.c
> > > @@ -681,18 +681,21 @@ static long vhost_vdpa_unlocked_ioctl(struct file 
> > > *filep,
> > >  {
> > >   struct vhost_vdpa *v = filep->private_data;
> > >   struct vhost_dev *d = &v->vdev;
> > > + const struct vdpa_config_ops *ops = v->vdpa->config;
> > >   void __user *argp = (void __user *)arg;
> > >   u64 __user *featurep = argp;
> > > - u64 features;
> > > + u64 features, parent_features = 0;
> > >   long r = 0;
> > >
> > >   if (cmd == VHOST_SET_BACKEND_FEATURES) {
> > >   if (copy_from_user(&features, featurep, sizeof(features)))
> > >   return -EFAULT;
> > > + if (ops->get_backend_features)
> > > + parent_features = ops->get_backend_features(v->vdpa);
> > >   if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
> > >BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
> > >BIT_ULL(VHOST_BACKEND_F_RESUME) |
> > > -  BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
> > > +  parent_features))
> > >   return -EOPNOTSUPP;
> > >   if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
> > >!vhost_vdpa_can_suspend(v))
> > > --
> > > 2.39.3
> >


Re: [PATCH] mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK

2023-07-04 Thread Michael S. Tsirkin
On Mon, Jul 03, 2023 at 05:26:02PM -0700, Si-Wei Liu wrote:
> 
> 
> On 7/3/2023 8:46 AM, Michael S. Tsirkin wrote:
> > On Mon, Jul 03, 2023 at 04:25:14PM +0200, Eugenio Pérez wrote:
> > > Offer this backend feature as mlx5 is compatible with it. It allows it
> > > to do live migration with CVQ, dynamically switching between passthrough
> > > and shadow virtqueue.
> > > 
> > > Signed-off-by: Eugenio Pérez 
> > Same comment.
> to which?
> 
> -Siwei

VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is too narrow a use-case to commit to it
as a kernel/userspace ABI: what if one wants to start rings in some
other specific order?
As was discussed on list, a better promise is not to access ring
until the 1st kick. vdpa can then do a kick when it wants
the device to start accessing rings.
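Sketched from the userspace side, that promise would look like this (the kick fd being the eventfd registered with VHOST_SET_VRING_KICK; illustrative only):

#include <stdint.h>
#include <unistd.h>

/* Sketch: under an access-after-kick contract the device must not touch
 * the ring until the first kick, so userspace starts each vq, in any
 * order, simply by kicking it when ready. */
static void start_ring(int kick_fd)
{
	uint64_t one = 1;

	write(kick_fd, &one, sizeof(one));	/* first kick starts the vq */
}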

> > 
> > > ---
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > >   1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 9138ef2fb2c8..5f309a16b9dc 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -7,6 +7,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > >   #include 
> > >   #include 
> > >   #include 
> > > @@ -2499,6 +2500,11 @@ static void unregister_link_notifier(struct 
> > > mlx5_vdpa_net *ndev)
> > >   flush_workqueue(ndev->mvdev.wq);
> > >   }
> > > +static u64 mlx5_vdpa_get_backend_features(const struct vdpa_device *vdpa)
> > > +{
> > > + return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
> > > +}
> > > +
> > >   static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 
> > > features)
> > >   {
> > >   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > @@ -3140,6 +3146,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = 
> > > {
> > >   .get_vq_align = mlx5_vdpa_get_vq_align,
> > >   .get_vq_group = mlx5_vdpa_get_vq_group,
> > >   .get_device_features = mlx5_vdpa_get_device_features,
> > > + .get_backend_features = mlx5_vdpa_get_backend_features,
> > >   .set_driver_features = mlx5_vdpa_set_driver_features,
> > >   .get_driver_features = mlx5_vdpa_get_driver_features,
> > >   .set_config_cb = mlx5_vdpa_set_config_cb,
> > > -- 
> > > 2.39.3



Re: [PATCH v1 0/2] vduse: add support for networking devices

2023-07-04 Thread Michael S. Tsirkin
On Tue, Jul 04, 2023 at 10:43:07AM +0200, Maxime Coquelin wrote:
> 
> 
> On 7/3/23 23:45, Michael S. Tsirkin wrote:
> > On Mon, Jul 03, 2023 at 09:43:49AM +0200, Maxime Coquelin wrote:
> > > 
> > > On 7/3/23 08:44, Jason Wang wrote:
> > > > On Sun, Jul 2, 2023 at 9:37 PM Michael S. Tsirkin  
> > > > wrote:
> > > > > 
> > > > > On Tue, Jun 27, 2023 at 01:36:50PM +0200, Maxime Coquelin wrote:
> > > > > > This small series enables virtio-net device type in VDUSE.
> > > > > > With it, basic operations have been tested, both with
> > > > > > virtio-vdpa and vhost-vdpa using DPDK Vhost library series
> > > > > > adding VDUSE support using split rings layout (merged in
> > > > > > DPDK v23.07-rc1).
> > > > > > 
> > > > > > Control queue support (and so multiqueue) has also been
> > > > > > tested, but requires a Kernel series from Jason Wang
> > > > > > relaxing control queue polling [1] to function reliably.
> > > > > > 
> > > > > > [1]: 
> > > > > > https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/
> > > > > 
> > > > > Jason promised to post a new version of that patch.
> > > > > Right Jason?
> > > > 
> > > > Yes.
> > > > 
> > > > > For now let's make sure CVQ feature flag is off?
> > > > 
> > > > We can do that and relax on top of my patch.
> > > 
> > > I agree? Do you prefer a features negotiation, or failing init (like
> > > done for VERSION_1) if the VDUSE application advertises CVQ?
> > > 
> > > Thanks,
> > > Maxime
> > 
> > Unfortunately guests fail probe if feature set is inconsistent.
> > So I don't think passing through features is a good idea,
> > you need a list of legal bits. And when doing this,
> > clear CVQ and everything that depends on it.
> 
> Since this is temporary, while cvq is made more robust, I think it is
> better to fail VDUSE device creation if CVQ feature is advertised by the
> VDUSE application, instead of ensuring features depending on CVQ are
> also cleared.
> 
> Jason seems to think likewise, would that work for you?
> 
> Thanks,
> Maxime

Nothing is more permanent than temporary solutions.
My concern would be that hardware devices then start masking CVQ
intentionally just to avoid the pain of broken software.

> > 
> > 
> > > > Thanks
> > > > 
> > > > > 
> > > > > > RFC -> v1 changes:
> > > > > > ==
> > > > > > - Fail device init if it does not support VERSION_1 (Jason)
> > > > > > 
> > > > > > Maxime Coquelin (2):
> > > > > > vduse: validate block features only with block devices
> > > > > > vduse: enable Virtio-net device type
> > > > > > 
> > > > > >drivers/vdpa/vdpa_user/vduse_dev.c | 15 +++
> > > > > >1 file changed, 11 insertions(+), 4 deletions(-)
> > > > > > 
> > > > > > --
> > > > > > 2.41.0
> > > > > 
> > > > 
> > 


Re: [PATCH v1 0/2] vduse: add support for networking devices

2023-07-04 Thread Maxime Coquelin



On 7/3/23 23:45, Michael S. Tsirkin wrote:
> On Mon, Jul 03, 2023 at 09:43:49AM +0200, Maxime Coquelin wrote:
>> On 7/3/23 08:44, Jason Wang wrote:
>>> On Sun, Jul 2, 2023 at 9:37 PM Michael S. Tsirkin  wrote:
>>>> On Tue, Jun 27, 2023 at 01:36:50PM +0200, Maxime Coquelin wrote:
>>>>> This small series enables virtio-net device type in VDUSE.
>>>>> With it, basic operations have been tested, both with
>>>>> virtio-vdpa and vhost-vdpa using DPDK Vhost library series
>>>>> adding VDUSE support using split rings layout (merged in
>>>>> DPDK v23.07-rc1).
>>>>>
>>>>> Control queue support (and so multiqueue) has also been
>>>>> tested, but requires a Kernel series from Jason Wang
>>>>> relaxing control queue polling [1] to function reliably.
>>>>>
>>>>> [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/
>>>>
>>>> Jason promised to post a new version of that patch.
>>>> Right Jason?
>>>
>>> Yes.
>>>
>>>> For now let's make sure CVQ feature flag is off?
>>>
>>> We can do that and relax on top of my patch.
>>
>> I agree? Do you prefer a features negotiation, or failing init (like
>> done for VERSION_1) if the VDUSE application advertises CVQ?
>>
>> Thanks,
>> Maxime
>
> Unfortunately guests fail probe if feature set is inconsistent.
> So I don't think passing through features is a good idea,
> you need a list of legal bits. And when doing this,
> clear CVQ and everything that depends on it.

Since this is temporary, while cvq is made more robust, I think it is
better to fail VDUSE device creation if CVQ feature is advertised by the
VDUSE application, instead of ensuring features depending on CVQ are
also cleared.

Jason seems to think likewise, would that work for you?

Thanks,
Maxime

>>> Thanks
>>>
>>>>> RFC -> v1 changes:
>>>>> ==
>>>>> - Fail device init if it does not support VERSION_1 (Jason)
>>>>>
>>>>> Maxime Coquelin (2):
>>>>>   vduse: validate block features only with block devices
>>>>>   vduse: enable Virtio-net device type
>>>>>
>>>>>  drivers/vdpa/vdpa_user/vduse_dev.c | 15 +++
>>>>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>>>>
>>>>> --
>>>>> 2.41.0

Re: Re: [PATCH] virtio_ring: use u32 for virtio_max_dma_size

2023-07-04 Thread zhenwei pi via Virtualization




On 7/4/23 14:21, Michael S. Tsirkin wrote:
> On Wed, May 10, 2023 at 10:54:37AM +0800, zhenwei pi wrote:
>> Both split ring and packed ring use 32 bits to describe the length of
>> a descriptor: see struct vring_desc and struct vring_packed_desc.
>> This means the max segment size supported by virtio is U32_MAX.
>>
>> An example of virtio_max_dma_size in virtio_blk.c:
>>     u32 v, max_size;
>>
>>     max_size = virtio_max_dma_size(vdev);  -> implicit convert
>>     err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
>>                                struct virtio_blk_config, size_max, &v);
>>     max_size = min(max_size, v);
>>
>> There is a risk during implicit convert here, once virtio_max_dma_size
>> returns 4G, max_size becomes 0.
>>
>> Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
>> Cc: Joerg Roedel 
>> Signed-off-by: zhenwei pi 
>> ---
>>  drivers/virtio/virtio_ring.c | 12 
>>  include/linux/virtio.h       |  2 +-
>>  2 files changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> index c5310eaf8b46..55cfecf030a1 100644
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -289,12 +289,16 @@ static bool vring_use_dma_api(const struct virtio_device *vdev)
>>  	return false;
>>  }
>>
>> -size_t virtio_max_dma_size(const struct virtio_device *vdev)
>> +u32 virtio_max_dma_size(const struct virtio_device *vdev)
>>  {
>> -	size_t max_segment_size = SIZE_MAX;
>> +	u32 max_segment_size = U32_MAX;
>>
>> -	if (vring_use_dma_api(vdev))
>> -		max_segment_size = dma_max_mapping_size(vdev->dev.parent);
>> +	if (vring_use_dma_api(vdev)) {
>> +		size_t max_dma_size = dma_max_mapping_size(vdev->dev.parent);
>> +
>> +		if (max_dma_size < max_segment_size)
>> +			max_segment_size = max_dma_size;
>> +	}
>>
>>  	return max_segment_size;
>>  }
>
> Took a while for me to get this, it's confusing.  I think the issue is
> really in virtio blk, so I would just change max_size there to size_t
> and be done with it.
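The suggested virtio_blk.c-side shape, as a hedged sketch (the eventual patch may differ):

/* Sketch: widen max_size in virtio_blk.c instead of narrowing the API,
 * so nothing truncates to u32. */
u32 v;
size_t max_size;

max_size = virtio_max_dma_size(vdev);	/* stays size_t, no implicit convert */
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
			   struct virtio_blk_config, size_max, &v);
if (!err)
	max_size = min_t(size_t, max_size, v);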




Fine.

>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
>> index b93238db94e3..1a605f408329 100644
>> --- a/include/linux/virtio.h
>> +++ b/include/linux/virtio.h
>> @@ -147,7 +147,7 @@ int virtio_device_restore(struct virtio_device *dev);
>>  #endif
>>  void virtio_reset_device(struct virtio_device *dev);
>>
>> -size_t virtio_max_dma_size(const struct virtio_device *vdev);
>> +u32 virtio_max_dma_size(const struct virtio_device *vdev);
>>
>>  #define virtio_device_for_each_vq(vdev, vq) \
>>  	list_for_each_entry(vq, &vdev->vqs, list)
>> --
>> 2.20.1

--
zhenwei pi