Re: [PATCH] vhost-vdpa: filter VIRTIO_F_RING_PACKED feature

2023-06-07 Thread Michael S. Tsirkin
On Thu, Jun 08, 2023 at 08:42:15AM +0800, Jason Wang wrote:
> On Wed, Jun 7, 2023 at 5:43 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jun 07, 2023 at 10:39:15AM +0200, Stefano Garzarella wrote:
> > > On Tue, Jun 6, 2023 at 2:58 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Tue, Jun 06, 2023 at 09:29:22AM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 5, 2023 at 10:58 PM Stefano Garzarella 
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, Jun 05, 2023 at 09:54:57AM -0400, Michael S. Tsirkin wrote:
> > > > > > >On Mon, Jun 05, 2023 at 03:30:35PM +0200, Stefano Garzarella wrote:
> > > > > > >> On Mon, Jun 05, 2023 at 09:00:25AM -0400, Michael S. Tsirkin 
> > > > > > >> wrote:
> > > > > > >> > On Mon, Jun 05, 2023 at 02:54:20PM +0200, Stefano Garzarella 
> > > > > > >> > wrote:
> > > > > > >> > > On Mon, Jun 05, 2023 at 08:41:54AM -0400, Michael S. Tsirkin 
> > > > > > >> > > wrote:
> > > > > > >> > > > On Mon, Jun 05, 2023 at 01:06:44PM +0200, Stefano 
> > > > > > >> > > > Garzarella wrote:
> > > > > > >> > > > > vhost-vdpa IOCTLs (eg. VHOST_GET_VRING_BASE, 
> > > > > > >> > > > > VHOST_SET_VRING_BASE)
> > > > > > >> > > > > don't support packed virtqueue well yet, so let's filter 
> > > > > > >> > > > > the
> > > > > > >> > > > > VIRTIO_F_RING_PACKED feature for now in 
> > > > > > >> > > > > vhost_vdpa_get_features().
> > > > > > >> > > > >
> > > > > > >> > > > > This way, even if the device supports it, we don't risk 
> > > > > > >> > > > > it being
> > > > > > >> > > > > negotiated, then the VMM is unable to set the vring 
> > > > > > >> > > > > state properly.
> > > > > > >> > > > >
> > > > > > >> > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based 
> > > > > > >> > > > > backend")
> > > > > > >> > > > > Cc: sta...@vger.kernel.org
> > > > > > >> > > > > Signed-off-by: Stefano Garzarella 
> > > > > > >> > > > > ---
> > > > > > >> > > > >
> > > > > > >> > > > > Notes:
> > > > > > >> > > > > This patch should be applied before the "[PATCH v2 
> > > > > > >> > > > > 0/3] vhost_vdpa:
> > > > > > >> > > > > better PACKED support" series [1] and backported in 
> > > > > > >> > > > > stable branches.
> > > > > > >> > > > >
> > > > > > >> > > > > We can revert it when we are sure that everything is 
> > > > > > >> > > > > working with
> > > > > > >> > > > > packed virtqueues.
> > > > > > >> > > > >
> > > > > > >> > > > > Thanks,
> > > > > > >> > > > > Stefano
> > > > > > >> > > > >
> > > > > > >> > > > > [1] 
> > > > > > >> > > > > https://lore.kernel.org/virtualization/20230424225031.18947-1-shannon.nel...@amd.com/
> > > > > > >> > > >
> > > > > > >> > > > I'm a bit lost here. So why am I merging "better PACKED 
> > > > > > >> > > > support" then?
> > > > > > >> > >
> > > > > > >> > > To really support packed virtqueue with vhost-vdpa, at that 
> > > > > > >> > > point we would
> > > > > > >> > > also have to revert this patch.
> > > > > > >> > >
> > > > > > >> > > I wasn't sure if you wanted to queue the series for this 
> > > > > > >> > > merge window.
> > > > > > >> > > In that case do you think it is better to send this patch 
> > > > > > >> > > only for stable
> > > > > > >> > > branches?
> > > > > > >> > > > Does this patch make them a NOP?
> > > > > > >> > >
> > > > > > >> > > Yep, after applying the "better PACKED support" series and 
> > > > > > >> > > being
> > > > > > >> > > sure that
> > > > > > >> > > the IOCTLs of vhost-vdpa support packed virtqueue, we should 
> > > > > > >> > > revert this
> > > > > > >> > > patch.
> > > > > > >> > >
> > > > > > >> > > Let me know if you prefer a different approach.
> > > > > > >> > >
> > > > > > >> > > I'm concerned that QEMU uses vhost-vdpa IOCTLs thinking that 
> > > > > > >> > > the kernel
> > > > > > >> > > interprets them the right way, when it does not.
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > > Stefano
> > > > > > >> > >
> > > > > > >> >
> > > > > > >> > If this fixes a bug can you add Fixes tags to each of them? 
> > > > > > >> > Then it's ok
> > > > > > >> > to merge in this window. Probably easier than the elaborate
> > > > > > >> > mask/unmask dance.
> > > > > > >>
> > > > > > >> CCing Shannon (the original author of the "better PACKED support"
> > > > > > >> series).
> > > > > > >>
> > > > > > >> IIUC Shannon is going to send a v3 of that series to fix the
> > > > > > >> documentation, so Shannon can you also add the Fixes tags?
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Stefano
> > > > > > >
> > > > > > >Well this is in my tree already. Just reply with
> > > > > > >Fixes: <>
> > > > > > >to each and I will add these tags.
> > > > > >
> > > > > > I tried, but it is not easy since we added the support for packed
> > > > > > virtqueue in vdpa and vhost incrementally.
> > > > > >
> > > > > > Initially I was thinking of adding the same tag used here:
> > > > > >
> > > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > > > > >
> > > > > > Then I discovered that vq_state wasn't there, so I was thinking of
> > > > > >
> > > > > > Fixes: 530a5678bc00 ("vdpa: support packed virtqueue for set/get_vq_state()")
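
For context, the change under discussion boils down to masking one feature bit
before vhost-vdpa reports the device features to userspace. A minimal sketch of
that kind of filtering (not necessarily the applied patch verbatim; the body is
trimmed to the relevant lines):

static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
{
        struct vdpa_device *vdpa = v->vdpa;
        const struct vdpa_config_ops *ops = vdpa->config;
        u64 features;

        features = ops->get_device_features(vdpa);
        /* Hide packed ring support until the vring_base IOCTLs handle it. */
        features &= ~BIT_ULL(VIRTIO_F_RING_PACKED);

        if (copy_to_user(featurep, &features, sizeof(features)))
                return -EFAULT;

        return 0;
}

With the bit cleared here a VMM can never negotiate packed virtqueues through
vhost-vdpa, so VHOST_GET_VRING_BASE/VHOST_SET_VRING_BASE only ever deal with
split rings.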

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Xuan Zhuo
On Thu, 8 Jun 2023 08:38:14 +0800, Jason Wang  wrote:
> On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > On Tue, May 30, 2023 at 9:19 AM Liang Chen  
> > > wrote:
> > > >
> > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > The implementation at the moment uses one page per packet in 
> > > > > > > > both the
> > > > > > > > normal and XDP path. In addition, introducing a module 
> > > > > > > > parameter to enable
> > > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > > >
> > > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > > performance gain
> > > > > > > > in the normal path.
> > > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > >
> > > > > > > > In multi-core vm testing environments, The most significant 
> > > > > > > > performance
> > > > > > > > gain is observed in XDP cpumap:
> > > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > >
> > > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > > fragmentation and
> > > > > > > > DMA map/unmap support.
> > > > > > > >
> > > > > > > > Signed-off-by: Liang Chen 
> > > > > > >
> > > > > > > Why off by default?
> > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > >
> > > > > > >
> > > > > > > What happens if we use page pool for big mode too?
> > > > > > > The less modes we have the better...
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > coalescing. But such cases are rare.
> > > > >
> > > > > small packets are rare? These workloads are easy to create actually.
> > > > > Pls try and include benchmark with small packet size.
> > > > >
> > > >
> > > > Sure, Thanks!
> > >
> > > Before going ahead and posting v2 patch, I would like to hear more
> > > advice for the cases of small packets. I have done more performance
> > > benchmark with small packets since then. Here is a list of iperf
> > > output,
> > >
> > > With PP and PP fragmenting:
> > > 256K: [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > > 1K:   [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > > 2K:   [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > > 4K:   [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > > 8K:   [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > > 16K:  [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > > 32K:  [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > > 64K:  [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > > 128K: [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > > 256K: [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
>
> Note that virtio-net driver is lacking things like BQL and others, so
> it might suffer from buffer bloat for TCP performance. Would you mind
> to measure with e.g using testpmd on the vhost to see the rx PPS?
>
> > >
> > > Without PP:
> > > 256:  [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> > > 1K:   [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> > > 2K:   [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> > > 4K:   [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> > > 8K:   [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> > > 16K:  [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> > > 32K:  [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> > > 64K:  [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> > > 128K: [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> > > 256K: [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
> > >
> > >
> > > The major factor contributing to the performance drop is the reduction
> > > of skb coalescing. Additionally, without the page pool, small packets
> > > can still benefit from the allocation of 8 continuous pages by
> > > breaking them down into smaller pieces.

Re: [PATCH] scsi: virtio_scsi: Remove a useless function call

2023-06-07 Thread Martin K. Petersen
On Mon, 29 May 2023 09:35:08 +0200, Christophe JAILLET wrote:

> 'inq_result' is known to be NULL. There is no point calling kfree().
> 
> 

Applied to 6.5/scsi-queue, thanks!

[1/1] scsi: virtio_scsi: Remove a useless function call
  https://git.kernel.org/mkp/scsi/c/0e5e41ee3d73

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] vhost-vdpa: filter VIRTIO_F_RING_PACKED feature

2023-06-07 Thread Jason Wang
On Wed, Jun 7, 2023 at 5:43 PM Michael S. Tsirkin  wrote:
>
> On Wed, Jun 07, 2023 at 10:39:15AM +0200, Stefano Garzarella wrote:
> > On Tue, Jun 6, 2023 at 2:58 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jun 06, 2023 at 09:29:22AM +0800, Jason Wang wrote:
> > > > On Mon, Jun 5, 2023 at 10:58 PM Stefano Garzarella 
> > > >  wrote:
> > > > >
> > > > > On Mon, Jun 05, 2023 at 09:54:57AM -0400, Michael S. Tsirkin wrote:
> > > > > >On Mon, Jun 05, 2023 at 03:30:35PM +0200, Stefano Garzarella wrote:
> > > > > >> On Mon, Jun 05, 2023 at 09:00:25AM -0400, Michael S. Tsirkin wrote:
> > > > > >> > On Mon, Jun 05, 2023 at 02:54:20PM +0200, Stefano Garzarella 
> > > > > >> > wrote:
> > > > > >> > > On Mon, Jun 05, 2023 at 08:41:54AM -0400, Michael S. Tsirkin 
> > > > > >> > > wrote:
> > > > > >> > > > On Mon, Jun 05, 2023 at 01:06:44PM +0200, Stefano Garzarella 
> > > > > >> > > > wrote:
> > > > > >> > > > > vhost-vdpa IOCTLs (eg. VHOST_GET_VRING_BASE, 
> > > > > >> > > > > VHOST_SET_VRING_BASE)
> > > > > >> > > > > don't support packed virtqueue well yet, so let's filter 
> > > > > >> > > > > the
> > > > > >> > > > > VIRTIO_F_RING_PACKED feature for now in 
> > > > > >> > > > > vhost_vdpa_get_features().
> > > > > >> > > > >
> > > > > >> > > > > This way, even if the device supports it, we don't risk it 
> > > > > >> > > > > being
> > > > > >> > > > > negotiated, then the VMM is unable to set the vring state 
> > > > > >> > > > > properly.
> > > > > >> > > > >
> > > > > >> > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > > > > >> > > > > Cc: sta...@vger.kernel.org
> > > > > >> > > > > Signed-off-by: Stefano Garzarella 
> > > > > >> > > > > ---
> > > > > >> > > > >
> > > > > >> > > > > Notes:
> > > > > >> > > > > This patch should be applied before the "[PATCH v2 
> > > > > >> > > > > 0/3] vhost_vdpa:
> > > > > >> > > > > better PACKED support" series [1] and backported in 
> > > > > >> > > > > stable branches.
> > > > > >> > > > >
> > > > > >> > > > > We can revert it when we are sure that everything is 
> > > > > >> > > > > working with
> > > > > >> > > > > packed virtqueues.
> > > > > >> > > > >
> > > > > >> > > > > Thanks,
> > > > > >> > > > > Stefano
> > > > > >> > > > >
> > > > > >> > > > > [1] 
> > > > > >> > > > > https://lore.kernel.org/virtualization/20230424225031.18947-1-shannon.nel...@amd.com/
> > > > > >> > > >
> > > > > >> > > > I'm a bit lost here. So why am I merging "better PACKED 
> > > > > >> > > > support" then?
> > > > > >> > >
> > > > > >> > > To really support packed virtqueue with vhost-vdpa, at that 
> > > > > >> > > point we would
> > > > > >> > > also have to revert this patch.
> > > > > >> > >
> > > > > >> > > I wasn't sure if you wanted to queue the series for this merge 
> > > > > >> > > window.
> > > > > >> > > In that case do you think it is better to send this patch only 
> > > > > >> > > for stable
> > > > > >> > > branches?
> > > > > >> > > > Does this patch make them a NOP?
> > > > > >> > >
> > > > > >> > > Yep, after applying the "better PACKED support" series and 
> > > > > >> > > being
> > > > > >> > > sure that
> > > > > >> > > the IOCTLs of vhost-vdpa support packed virtqueue, we should 
> > > > > >> > > revert this
> > > > > >> > > patch.
> > > > > >> > >
> > > > > >> > > Let me know if you prefer a different approach.
> > > > > >> > >
> > > > > >> > > I'm concerned that QEMU uses vhost-vdpa IOCTLs thinking that 
> > > > > >> > > the kernel
> > > > > >> > > interprets them the right way, when it does not.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Stefano
> > > > > >> > >
> > > > > >> >
> > > > > >> > If this fixes a bug can you add Fixes tags to each of them? Then 
> > > > > >> > it's ok
> > > > > >> > to merge in this window. Probably easier than the elaborate
> > > > > >> > mask/unmask dance.
> > > > > >>
> > > > > >> CCing Shannon (the original author of the "better PACKED support"
> > > > > >> series).
> > > > > >>
> > > > > >> IIUC Shannon is going to send a v3 of that series to fix the
> > > > > >> documentation, so Shannon can you also add the Fixes tags?
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Stefano
> > > > > >
> > > > > >Well this is in my tree already. Just reply with
> > > > > >Fixes: <>
> > > > > >to each and I will add these tags.
> > > > >
> > > > > I tried, but it is not easy since we added the support for packed
> > > > > virtqueue in vdpa and vhost incrementally.
> > > > >
> > > > > Initially I was thinking of adding the same tag used here:
> > > > >
> > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > > > >
> > > > > Then I discovered that vq_state wasn't there, so I was thinking of
> > > > >
> > > > > Fixes: 530a5678bc00 ("vdpa: support packed virtqueue for 
> > > > > set/get_vq_state()")
> > > > >
> > > > > So we would have to backport quite a few patches into the stable 
> > > > > branches.
> > > > > I don't know if it's worth it...
> > > > >
> > > > 

Re: vhost: Fix vhost_task regressions in 6.4-rc

2023-06-07 Thread Mike Christie
On 6/7/23 3:22 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 07, 2023 at 02:23:36PM -0500, Mike Christie wrote:
>> The following patches were made over Linus's tree which contains a
>> vhost change missing in mst's vhost branch. These patches fix two
>> issues caused by the vhost_task patches:
>> 1. I was setting dev->worker too early and this caused crashes when
>> vsock would queue work before VHOST_SET_OWNER.
>>
>> 2. The patch that Linus's tree contains which vhost does not yet
>> have converted vhost_tasks to use CLONE_THREAD. That required a
>> change to how we set the task state, but I completely messed up
>> and moved when we set ourself to interruptible too late.
>>
> 
> Right. and that's in rc5 yes?
> 

Yes.


Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Jason Wang
On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
>
> On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > On Tue, May 30, 2023 at 9:19 AM Liang Chen  
> > wrote:
> > >
> > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > > the
> > > > > > > normal and XDP path. In addition, introducing a module parameter 
> > > > > > > to enable
> > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > >
> > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > performance gain
> > > > > > > in the normal path.
> > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > >
> > > > > > > In multi-core vm testing environments, The most significant 
> > > > > > > performance
> > > > > > > gain is observed in XDP cpumap:
> > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > >
> > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > fragmentation and
> > > > > > > DMA map/unmap support.
> > > > > > >
> > > > > > > Signed-off-by: Liang Chen 
> > > > > >
> > > > > > Why off by default?
> > > > > > I am guessing it sometimes has performance costs too?
> > > > > >
> > > > > >
> > > > > > What happens if we use page pool for big mode too?
> > > > > > The less modes we have the better...
> > > > > >
> > > > > >
> > > > >
> > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > packet size is very small, it reduces the likelihood of skb
> > > > > coalescing. But such cases are rare.
> > > >
> > > > small packets are rare? These workloads are easy to create actually.
> > > > Pls try and include benchmark with small packet size.
> > > >
> > >
> > > Sure, Thanks!
> >
> > Before going ahead and posting v2 patch, I would like to hear more
> > advice for the cases of small packets. I have done more performance
> > benchmark with small packets since then. Here is a list of iperf
> > output,
> >
> > With PP and PP fragmenting:
> > 256K: [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > 1K:   [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > 2K:   [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > 4K:   [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > 8K:   [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > 16K:  [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > 32K:  [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > 64K:  [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > 128K: [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > 256K: [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes

Note that virtio-net driver is lacking things like BQL and others, so
it might suffer from buffer bloat for TCP performance. Would you mind
to measure with e.g using testpmd on the vhost to see the rx PPS?

> >
> > Without PP:
> > 256:  [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> > 1K:   [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> > 2K:   [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> > 4K:   [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> > 8K:   [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> > 16K:  [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> > 32K:  [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> > 64K:  [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> > 128K: [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> > 256K: [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
> >
> >
> > The major factor contributing to the performance drop is the reduction
> > of skb coalescing. Additionally, without the page pool, small packets
> > can still benefit from the allocation of 8 continuous pages by
> > breaking them down into smaller pieces. This effectively reduces the
> > frequency of page allocation from the buddy system. For instance, the
> > arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> > the benefits of using a page pool are limited in such cases.

I wonder if we can imp
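
The "8 continuous pages" behaviour described above comes from the existing
page_frag-based refill: the driver grabs a higher-order page once and carves
per-packet buffers out of it by offset. A simplified sketch of that pattern
(shaped like the mergeable-buffer refill path, details dropped; rq and len are
stand-ins for the driver's own receive-queue state and buffer size):

        struct page_frag *alloc_frag = &rq->alloc_frag;
        char *buf;

        /*
         * skb_page_frag_refill() keeps an order-3 (32KB) page around and
         * hands out len-sized chunks from it, so e.g. 32 x 1KB buffers cost
         * a single page allocation from the buddy system.
         */
        if (unlikely(!skb_page_frag_refill(len, alloc_frag, GFP_ATOMIC)))
                return -ENOMEM;

        buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
        get_page(alloc_frag->page);
        alloc_frag->offset += len;

This amortization is the baseline that the page_pool numbers above are
competing with for small packets.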

Re: [PATCH] vhost-vdpa: filter VIRTIO_F_RING_PACKED feature

2023-06-07 Thread Michael S. Tsirkin
On Tue, Jun 06, 2023 at 12:09:11PM +0200, Stefano Garzarella wrote:
> On Mon, Jun 05, 2023 at 05:44:50PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jun 05, 2023 at 04:56:37PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jun 05, 2023 at 09:54:57AM -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Jun 05, 2023 at 03:30:35PM +0200, Stefano Garzarella wrote:
> > > > > On Mon, Jun 05, 2023 at 09:00:25AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Mon, Jun 05, 2023 at 02:54:20PM +0200, Stefano Garzarella wrote:
> > > > > > > On Mon, Jun 05, 2023 at 08:41:54AM -0400, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > > > On Mon, Jun 05, 2023 at 01:06:44PM +0200, Stefano Garzarella 
> > > > > > > > wrote:
> > > > > > > > > vhost-vdpa IOCTLs (eg. VHOST_GET_VRING_BASE, 
> > > > > > > > > VHOST_SET_VRING_BASE)
> > > > > > > > > don't support packed virtqueue well yet, so let's filter the
> > > > > > > > > VIRTIO_F_RING_PACKED feature for now in 
> > > > > > > > > vhost_vdpa_get_features().
> > > > > > > > >
> > > > > > > > > This way, even if the device supports it, we don't risk it 
> > > > > > > > > being
> > > > > > > > > negotiated, then the VMM is unable to set the vring state 
> > > > > > > > > properly.
> > > > > > > > >
> > > > > > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > > > > > > > > Cc: sta...@vger.kernel.org
> > > > > > > > > Signed-off-by: Stefano Garzarella 
> > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > Notes:
> > > > > > > > > This patch should be applied before the "[PATCH v2 0/3] 
> > > > > > > > > vhost_vdpa:
> > > > > > > > > better PACKED support" series [1] and backported in 
> > > > > > > > > stable branches.
> > > > > > > > >
> > > > > > > > > We can revert it when we are sure that everything is 
> > > > > > > > > working with
> > > > > > > > > packed virtqueues.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Stefano
> > > > > > > > >
> > > > > > > > > [1] 
> > > > > > > > > https://lore.kernel.org/virtualization/20230424225031.18947-1-shannon.nel...@amd.com/
> > > > > > > >
> > > > > > > > I'm a bit lost here. So why am I merging "better PACKED 
> > > > > > > > support" then?
> > > > > > >
> > > > > > > To really support packed virtqueue with vhost-vdpa, at that point 
> > > > > > > we would
> > > > > > > also have to revert this patch.
> > > > > > >
> > > > > > > I wasn't sure if you wanted to queue the series for this merge 
> > > > > > > window.
> > > > > > > In that case do you think it is better to send this patch only 
> > > > > > > for stable
> > > > > > > branches?
> > > > > > > > Does this patch make them a NOP?
> > > > > > >
> > > > > > > Yep, after applying the "better PACKED support" series and being
> > > > > > > sure that
> > > > > > > the IOCTLs of vhost-vdpa support packed virtqueue, we should 
> > > > > > > revert this
> > > > > > > patch.
> > > > > > >
> > > > > > > Let me know if you prefer a different approach.
> > > > > > >
> > > > > > > I'm concerned that QEMU uses vhost-vdpa IOCTLs thinking that the 
> > > > > > > kernel
> > > > > > > interprets them the right way, when it does not.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Stefano
> > > > > > >
> > > > > >
> > > > > > If this fixes a bug can you add Fixes tags to each of them? Then 
> > > > > > it's ok
> > > > > > to merge in this window. Probably easier than the elaborate
> > > > > > mask/unmask dance.
> > > > >
> > > > > CCing Shannon (the original author of the "better PACKED support"
> > > > > series).
> > > > >
> > > > > IIUC Shannon is going to send a v3 of that series to fix the
> > > > > documentation, so Shannon can you also add the Fixes tags?
> > > > >
> > > > > Thanks,
> > > > > Stefano
> > > >
> > > > Well this is in my tree already. Just reply with
> > > > Fixes: <>
> > > > to each and I will add these tags.
> > > 
> > > I tried, but it is not easy since we added the support for packed 
> > > virtqueue
> > > in vdpa and vhost incrementally.
> > > 
> > > Initially I was thinking of adding the same tag used here:
> > > 
> > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > > 
> > > Then I discovered that vq_state wasn't there, so I was thinking of
> > > 
> > > Fixes: 530a5678bc00 ("vdpa: support packed virtqueue for 
> > > set/get_vq_state()")
> > > 
> > > So we would have to backport quite a few patches into the stable branches.
> > > I don't know if it's worth it...
> > > 
> > > I still think it is better to disable packed in the stable branches,
> > > otherwise I have to make a list of all the patches we need.
> > > 
> > > Any other ideas?
> > > 
> > > Thanks,
> > > Stefano
> > 
> > OK so. You want me to apply this one now, and fixes in the next
> > kernel?
> 
> Yep, it seems to me the least risky approach.
> 
> Thanks,
> Stefano

I'd like something more general though. Bring back the allowlist?
Jason, your thoughts?

-- 
MST
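
As a side note, the allowlist approach raised here would invert the logic:
instead of knocking out individual problematic bits, vhost-vdpa would expose
only a fixed set of bits its uAPI is known to handle, applied where
vhost_vdpa_get_features() builds the feature word. A rough sketch of the idea
(the macro name and the exact bit set are illustrative, not an existing kernel
definition):

#define VHOST_VDPA_ALLOWED_FEATURES                     \
        (BIT_ULL(VIRTIO_F_VERSION_1)            |       \
         BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)      |       \
         BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC)   |       \
         BIT_ULL(VIRTIO_RING_F_EVENT_IDX))

        /* allowlist instead of blocklist */
        features = ops->get_device_features(vdpa);
        features &= VHOST_VDPA_ALLOWED_FEATURES;

The trade-off is the usual one: a blocklist silently passes through new
features that may not work yet, while an allowlist silently hides new features
until someone extends the mask.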


Re: vhost: Fix vhost_task regressions in 6.4-rc

2023-06-07 Thread Michael S. Tsirkin
On Wed, Jun 07, 2023 at 02:23:36PM -0500, Mike Christie wrote:
> The following patches were made over Linus's tree which contains a
> vhost change missing in mst's vhost branch. These patches fix two
> issues caused by the vhost_task patches:
> 1. I was setting dev->worker too early and this caused crashes when
> vsock would queue work before VHOST_SET_OWNER.
> 
> 2. The patch that Linus's tree contains which vhost does not yet
> have converted vhost_tasks to use CLONE_THREAD. That required a
> change to how we set the task state, but I completely messed up
> and moved when we set ourself to interruptible too late.
> 

Right. and that's in rc5 yes?

-- 
MST



Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Michael S. Tsirkin
On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> On Tue, May 30, 2023 at 9:19 AM Liang Chen  wrote:
> >
> > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > the
> > > > > > normal and XDP path. In addition, introducing a module parameter to 
> > > > > > enable
> > > > > > or disable the usage of page pool (disabled by default).
> > > > > >
> > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > performance gain
> > > > > > in the normal path.
> > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > >
> > > > > > In multi-core vm testing environments, The most significant 
> > > > > > performance
> > > > > > gain is observed in XDP cpumap:
> > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > >
> > > > > > With this foundation, we can further integrate page pool 
> > > > > > fragmentation and
> > > > > > DMA map/unmap support.
> > > > > >
> > > > > > Signed-off-by: Liang Chen 
> > > > >
> > > > > Why off by default?
> > > > > I am guessing it sometimes has performance costs too?
> > > > >
> > > > >
> > > > > What happens if we use page pool for big mode too?
> > > > > The less modes we have the better...
> > > > >
> > > > >
> > > >
> > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > packet size is very small, it reduces the likelihood of skb
> > > > coalescing. But such cases are rare.
> > >
> > > small packets are rare? These workloads are easy to create actually.
> > > Pls try and include benchmark with small packet size.
> > >
> >
> > Sure, Thanks!
> 
> Before going ahead and posting v2 patch, I would like to hear more
> advice for the cases of small packets. I have done more performance
> benchmark with small packets since then. Here is a list of iperf
> output,
> 
> With PP and PP fragmenting:
> 256K: [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> 1K:   [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> 2K:   [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> 4K:   [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> 8K:   [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> 16K:  [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> 32K:  [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> 64K:  [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> 128K: [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> 256K: [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
> 
> Without PP:
> 256:  [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> 1K:   [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> 2K:   [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> 4K:   [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> 8K:   [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> 16K:  [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> 32K:  [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> 64K:  [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> 128K: [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> 256K: [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
> 
> 
> The major factor contributing to the performance drop is the reduction
> of skb coalescing. Additionally, without the page pool, small packets
> can still benefit from the allocation of 8 continuous pages by
> breaking them down into smaller pieces. This effectively reduces the
> frequency of page allocation from the buddy system. For instance, the
> arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> the benefits of using a page pool are limited in such cases. In fact,
> without page pool fragmenting enabled, it can even hinder performance
> from this perspective.
> 
> Upon further consideration, I tend to believe making page pool the
> default option may not be appropriate. As you pointed out, we cannot
> simply ignore the performance impact on small packets. Any comments on
> this will be much appreciated.
> 
> 
> Thanks,
> Liang


So, let's only use page pool for XDP then?

> 
> > > > The usage of page pool for big mode is being evaluated now. Thanks!
> > > >
> > > > > > 

Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP

2023-06-07 Thread Michael S. Tsirkin
On Wed, Jun 07, 2023 at 07:05:11AM -0700, Christoph Hellwig wrote:
> On Mon, Jun 05, 2023 at 09:58:21AM +0800, Xuan Zhuo wrote:
> > On Fri, 2 Jun 2023 23:29:02 -0700, Jakub Kicinski  wrote:
> > > On Fri,  2 Jun 2023 17:21:56 +0800 Xuan Zhuo wrote:
> > > > Thanks for the help from Christoph.
> > >
> > > That said you haven't CCed him on the series, isn't the general rule to
> > > CC anyone who was involved in previous discussions?
> > 
> > 
> > Sorry, I forgot to add cc after git format-patch.
> 
> So I've been looking for this series elsewhere, but it seems to include
> neither lkml nor the iommu list, so I can't find it.  Can you please
> repost it?

I bounced to lkml now - does this help?



vhost: Fix vhost_task regressions in 6.4-rc

2023-06-07 Thread Mike Christie
The following patches were made over Linus's tree which contains a
vhost change missing in mst's vhost branch. These patches fix two
issues caused by the vhost_task patches:
1. I was setting dev->worker too early and this caused crashes when
vsock would queue work before VHOST_SET_OWNER.

2. The patch that Linus's tree contains, which vhost does not yet
have, converted vhost_tasks to use CLONE_THREAD. That required a
change to how we set the task state, but I completely messed up
and moved the point where we set ourselves to interruptible too late.





[PATCH 2/2] vhost: Fix worker hangs due to missed wake up calls

2023-06-07 Thread Mike Christie
We can race where we have added work to the work_list, but
vhost_task_fn has already passed that check and has not yet set us to
TASK_INTERRUPTIBLE. wake_up_process will then see us in TASK_RUNNING
and just return.

This bug was introduced in commit f9010dbdce91 ("fork, vhost: Use
CLONE_THREAD to fix freezer/ps regression") when I moved the setting
of TASK_INTERRUPTIBLE to simplify the code and avoid get_signal from
logging warnings about being in the wrong state. This moves the setting
of TASK_INTERRUPTIBLE back to before we test if we need to stop the
task to avoid a possible race there as well. We then have vhost_worker
set TASK_RUNNING if it finds work similar to before.

Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps 
regression")
Signed-off-by: Mike Christie 
---
 drivers/vhost/vhost.c |  2 ++
 kernel/vhost_task.c   | 16 +---
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 7a9f93eae225..b2722d29e069 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -341,6 +341,8 @@ static bool vhost_worker(void *data)
 
node = llist_del_all(&worker->work_list);
if (node) {
+   __set_current_state(TASK_RUNNING);
+
node = llist_reverse_order(node);
/* make sure flag is seen after deletion */
smp_wmb();
diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index f80d5c51ae67..da35e5b7f047 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -28,10 +28,6 @@ static int vhost_task_fn(void *data)
for (;;) {
bool did_work;
 
-   /* mb paired w/ vhost_task_stop */
-   if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags))
-   break;
-
if (!dead && signal_pending(current)) {
struct ksignal ksig;
/*
@@ -48,11 +44,17 @@ static int vhost_task_fn(void *data)
clear_thread_flag(TIF_SIGPENDING);
}
 
+   /* mb paired w/ vhost_task_stop */
+   set_current_state(TASK_INTERRUPTIBLE);
+
+   if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
+   __set_current_state(TASK_RUNNING);
+   break;
+   }
+
did_work = vtsk->fn(vtsk->data);
-   if (!did_work) {
-   set_current_state(TASK_INTERRUPTIBLE);
+   if (!did_work)
schedule();
-   }
}
 
complete(&vtsk->exited);
-- 
2.25.1
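
Condensed, the rule this patch restores is the standard kernel sleep pattern:
publish the sleeping state first, then re-check every wake-up condition, and
only then schedule(). The resulting loop, with the reasoning spelled out in
comments (the annotation is added here and is not part of the patch):

        for (;;) {
                /*
                 * 1. Publish our state before testing any condition.  A
                 *    vhost_work_queue() -> wake_up_process() issued after this
                 *    point puts us back to TASK_RUNNING, so the wake-up can no
                 *    longer be lost.
                 */
                set_current_state(TASK_INTERRUPTIBLE);

                /* 2. Re-check the stop flag only after the state is visible. */
                if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
                        __set_current_state(TASK_RUNNING);
                        break;
                }

                /*
                 * 3. vhost_worker() sets TASK_RUNNING itself when it finds
                 *    work, so we only sleep when the work list was really
                 *    empty at the time we looked.
                 */
                if (!vtsk->fn(vtsk->data))
                        schedule();
        }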



[PATCH 1/2] vhost: Fix crash during early vhost_transport_send_pkt calls

2023-06-07 Thread Mike Christie
If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we
can race where:
1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue
2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create.
3. vhost_worker_create will set the dev->worker pointer before setting
the worker->vtsk pointer.
4. thread0's vhost_work_queue will see the dev->worker pointer is
set and try to call vhost_task_wake using not yet set worker->vtsk
pointer.
5. We then crash since vtsk is NULL.

Before commit 6e890c5d5021 ("vhost: use vhost_tasks for worker
threads"), we only had the worker pointer so we could just check it to
see if VHOST_SET_OWNER has been done. After that commit we have the
vhost_worker and vhost_task pointer, so we can now hit the bug above.

This patch embeds the vhost_worker in the vhost_dev and moves the work
list initialization back to vhost_dev_init, so we can just check the
worker.vtsk pointer to check if VHOST_SET_OWNER has been done like
before.

Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Mike Christie 
---
 drivers/vhost/vhost.c | 50 +++
 drivers/vhost/vhost.h |  2 +-
 2 files changed, 18 insertions(+), 34 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 074273020849..7a9f93eae225 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -235,7 +235,7 @@ void vhost_dev_flush(struct vhost_dev *dev)
 {
struct vhost_flush_struct flush;
 
-   if (dev->worker) {
+   if (dev->worker.vtsk) {
init_completion(&flush.wait_event);
vhost_work_init(&flush.work, vhost_flush_work);
 
@@ -247,7 +247,7 @@ EXPORT_SYMBOL_GPL(vhost_dev_flush);
 
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
-   if (!dev->worker)
+   if (!dev->worker.vtsk)
return;
 
if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
@@ -255,8 +255,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct 
vhost_work *work)
 * sure it was not in the list.
 * test_and_set_bit() implies a memory barrier.
 */
-   llist_add(&work->node, &dev->worker->work_list);
-   vhost_task_wake(dev->worker->vtsk);
+   llist_add(&work->node, &dev->worker.work_list);
+   vhost_task_wake(dev->worker.vtsk);
}
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
@@ -264,7 +264,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue);
 /* A lockless hint for busy polling code to exit the loop */
 bool vhost_has_work(struct vhost_dev *dev)
 {
-   return dev->worker && !llist_empty(&dev->worker->work_list);
+   return !llist_empty(&dev->worker.work_list);
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
 
@@ -456,7 +456,8 @@ void vhost_dev_init(struct vhost_dev *dev,
dev->umem = NULL;
dev->iotlb = NULL;
dev->mm = NULL;
-   dev->worker = NULL;
+   memset(&dev->worker, 0, sizeof(dev->worker));
+   init_llist_head(&dev->worker.work_list);
dev->iov_limit = iov_limit;
dev->weight = weight;
dev->byte_weight = byte_weight;
@@ -530,47 +531,30 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 
 static void vhost_worker_free(struct vhost_dev *dev)
 {
-   struct vhost_worker *worker = dev->worker;
-
-   if (!worker)
+   if (!dev->worker.vtsk)
return;
 
-   dev->worker = NULL;
-   WARN_ON(!llist_empty(&worker->work_list));
-   vhost_task_stop(worker->vtsk);
-   kfree(worker);
+   WARN_ON(!llist_empty(&dev->worker.work_list));
+   vhost_task_stop(dev->worker.vtsk);
+   dev->worker.kcov_handle = 0;
+   dev->worker.vtsk = NULL;
 }
 
 static int vhost_worker_create(struct vhost_dev *dev)
 {
-   struct vhost_worker *worker;
struct vhost_task *vtsk;
char name[TASK_COMM_LEN];
-   int ret;
-
-   worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
-   if (!worker)
-   return -ENOMEM;
 
-   dev->worker = worker;
-   worker->kcov_handle = kcov_common_handle();
-   init_llist_head(&worker->work_list);
snprintf(name, sizeof(name), "vhost-%d", current->pid);
 
-   vtsk = vhost_task_create(vhost_worker, worker, name);
-   if (!vtsk) {
-   ret = -ENOMEM;
-   goto free_worker;
-   }
+   vtsk = vhost_task_create(vhost_worker, &dev->worker, name);
+   if (!vtsk)
+   return -ENOMEM;
 
-   worker->vtsk = vtsk;
+   dev->worker.kcov_handle = kcov_common_handle();
+   dev->worker.vtsk = vtsk;
vhost_task_start(vtsk);
return 0;
-
-free_worker:
-   kfree(worker);
-   dev->worker = NULL;
-   return ret;
 }
 
 /* Caller should have device mutex */
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 0308638cdeee..305ec8593d46 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -154,7 +154,7 @@ struct 

[PATCH] vdpa/mlx5: Support interrupt bypassing

2023-06-07 Thread Dragos Tatulea via Virtualization
From: Eli Cohen 

Add support for generating interrupts from the device directly to the
VCPU in the VM, thus avoiding the overhead on the host CPU.

When supported, the driver will attempt to allocate vectors for each
data virtqueue. If a vector for a virtqueue cannot be provided it will
use the QP mode where notifications go through the driver.

In addition, we add a shutdown callback to make sure allocated
interrupts are released on shutdown, allowing a clean shutdown.

Signed-off-by: Eli Cohen 
Signed-off-by: Saeed Mahameed 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 165 --
 drivers/vdpa/mlx5/net/mlx5_vnet.h |  15 +++
 2 files changed, 171 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 279ac6a558d2..9138ef2fb2c8 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -83,6 +83,7 @@ struct mlx5_vq_restore_info {
u64 driver_addr;
u16 avail_index;
u16 used_index;
+   struct msi_map map;
bool ready;
bool restore;
 };
@@ -118,6 +119,7 @@ struct mlx5_vdpa_virtqueue {
u16 avail_idx;
u16 used_idx;
int fw_state;
+   struct msi_map map;
 
/* keep last in the struct */
struct mlx5_vq_restore_info ri;
@@ -808,6 +810,13 @@ static bool counters_supported(const struct mlx5_vdpa_dev 
*mvdev)
   BIT_ULL(MLX5_OBJ_TYPE_VIRTIO_Q_COUNTERS);
 }
 
+static bool msix_mode_supported(struct mlx5_vdpa_dev *mvdev)
+{
+   return MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, event_mode) &
+   (1 << MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE) &&
+   pci_msix_can_alloc_dyn(mvdev->mdev->pdev);
+}
+
 static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
 {
int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
@@ -849,9 +858,15 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
if (vq_is_tx(mvq->index))
MLX5_SET(virtio_net_q_object, obj_context, tisn_or_qpn, 
ndev->res.tisn);
 
-   MLX5_SET(virtio_q, vq_ctx, event_mode, 
MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
+   if (mvq->map.virq) {
+   MLX5_SET(virtio_q, vq_ctx, event_mode, 
MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE);
+   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->map.index);
+   } else {
+   MLX5_SET(virtio_q, vq_ctx, event_mode, 
MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
+   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, 
mvq->fwqp.mqp.qpn);
+   }
+
MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
-   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn);
MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
 !!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
@@ -1194,6 +1209,56 @@ static void counter_set_dealloc(struct mlx5_vdpa_net 
*ndev, struct mlx5_vdpa_vir
mlx5_vdpa_warn(&ndev->mvdev, "dealloc counter set 0x%x\n", 
mvq->counter_set_id);
 }
 
+static irqreturn_t mlx5_vdpa_int_handler(int irq, void *priv)
+{
+   struct vdpa_callback *cb = priv;
+
+   if (cb->callback)
+   return cb->callback(cb->private);
+
+   return IRQ_HANDLED;
+}
+
+static void alloc_vector(struct mlx5_vdpa_net *ndev,
+struct mlx5_vdpa_virtqueue *mvq)
+{
+   struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
+   struct mlx5_vdpa_irq_pool_entry *ent;
+   int err;
+   int i;
+
+   for (i = 0; i < irqp->num_ent; i++) {
+   ent = &irqp->entries[i];
+   if (!ent->used) {
+   snprintf(ent->name, MLX5_VDPA_IRQ_NAME_LEN, "%s-vq-%d",
+dev_name(&ndev->mvdev.vdev.dev), mvq->index);
+   ent->dev_id = &ndev->event_cbs[mvq->index];
+   err = request_irq(ent->map.virq, mlx5_vdpa_int_handler, 
0,
+ ent->name, ent->dev_id);
+   if (err)
+   return;
+
+   ent->used = true;
+   mvq->map = ent->map;
+   return;
+   }
+   }
+}
+
+static void dealloc_vector(struct mlx5_vdpa_net *ndev,
+  struct mlx5_vdpa_virtqueue *mvq)
+{
+   struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
+   int i;
+
+   for (i = 0; i < irqp->num_ent; i++)
+   if (mvq->map.virq == irqp->entries[i].map.virq) {
+   free_irq(mvq->map.virq, irqp->entries[i].dev_id);
+   irqp->entries[i].used = false;
+   return;
+   }
+}
+
 static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
 {
u16 idx = mvq->index;
@@ -1223,27 +1288,31 @@ static int setup
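
The building block underneath the (truncated) patch above is the dynamic MSI-X
API: when the parent device supports MSI-X event mode, the driver allocates an
extra vector per data virtqueue at runtime and points the virtqueue at it,
falling back to QP mode when the allocation fails. A generic sketch of that
allocation step, separate from the mlx5 specifics (names and error handling are
illustrative):

#include <linux/interrupt.h>
#include <linux/msi_api.h>
#include <linux/pci.h>
#include <linux/vdpa.h>

static int vq_irq_setup(struct pci_dev *pdev, struct vdpa_callback *cb,
                        const char *name, struct msi_map *map)
{
        int err;

        if (!pci_msix_can_alloc_dyn(pdev))
                return -EOPNOTSUPP;             /* caller falls back to QP mode */

        *map = pci_msix_alloc_irq_at(pdev, MSI_ANY_INDEX, NULL);
        if (map->index < 0)
                return map->index;              /* caller falls back to QP mode */

        /* Deliver the vdpa callback straight from the dedicated vector. */
        err = request_irq(map->virq, mlx5_vdpa_int_handler, 0, name, cb);
        if (err)
                pci_msix_free_irq(pdev, *map);
        return err;
}

static void vq_irq_teardown(struct pci_dev *pdev, void *dev_id, struct msi_map map)
{
        free_irq(map.virq, dev_id);
        pci_msix_free_irq(pdev, map);
}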

Re: [PATCH RFC 0/4] x86/fixmap: Unify FIXADDR_TOP

2023-06-07 Thread Thomas Gleixner
On Mon, May 15 2023 at 16:19, Hou Wenlong wrote:

> This patchset unifies FIXADDR_TOP as a variable for x86, allowing the
> fixmap area to be movable and relocated with the kernel image in the
> x86/PIE patchset [0]. This enables the kernel image to be relocated in
> the top 512G of the address space.

What for? What's the use case.

Please provide a proper argument why this is generally useful and
important.

Thanks,

tglx


Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP

2023-06-07 Thread Christoph Hellwig
On Mon, Jun 05, 2023 at 09:58:21AM +0800, Xuan Zhuo wrote:
> On Fri, 2 Jun 2023 23:29:02 -0700, Jakub Kicinski  wrote:
> > On Fri,  2 Jun 2023 17:21:56 +0800 Xuan Zhuo wrote:
> > > Thanks for the help from Christoph.
> >
> > That said you haven't CCed him on the series, isn't the general rule to
> > CC anyone who was involved in previous discussions?
> 
> 
> Sorry, I forgot to add cc after git format-patch.

So I've been looking for this series elsewhere, but it seems to include
neither lkml nor the iommu list, so I can't find it.  Can you please
repost it?


Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Liang Chen
On Wed, Jun 7, 2023 at 5:36 PM Xuan Zhuo  wrote:
>
> On Wed, 7 Jun 2023 17:08:59 +0800, Liang Chen  
> wrote:
> > On Tue, May 30, 2023 at 9:19 AM Liang Chen  
> > wrote:
> > >
> > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > > the
> > > > > > > normal and XDP path. In addition, introducing a module parameter 
> > > > > > > to enable
> > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > >
> > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > performance gain
> > > > > > > in the normal path.
> > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > >
> > > > > > > In multi-core vm testing environments, The most significant 
> > > > > > > performance
> > > > > > > gain is observed in XDP cpumap:
> > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > >
> > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > fragmentation and
> > > > > > > DMA map/unmap support.
> > > > > > >
> > > > > > > Signed-off-by: Liang Chen 
> > > > > >
> > > > > > Why off by default?
> > > > > > I am guessing it sometimes has performance costs too?
> > > > > >
> > > > > >
> > > > > > What happens if we use page pool for big mode too?
> > > > > > The less modes we have the better...
> > > > > >
> > > > > >
> > > > >
> > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > packet size is very small, it reduces the likelihood of skb
> > > > > coalescing. But such cases are rare.
> > > >
> > > > small packets are rare? These workloads are easy to create actually.
> > > > Pls try and include benchmark with small packet size.
> > > >
> > >
> > > Sure, Thanks!
> >
> > Before going ahead and posting v2 patch, I would like to hear more
> > advice for the cases of small packets. I have done more performance
> > benchmark with small packets since then. Here is a list of iperf
> > output,
>
> Could you show the command line?
>
> Thanks
>
>

Sure.   iperf3 -c  -i 5 -f g -t 0 -l 

> >
> > With PP and PP fragmenting:
> > 256K: [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > 1K:   [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > 2K:   [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > 4K:   [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > 8K:   [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > 16K:  [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > 32K:  [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > 64K:  [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > 128K: [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > 256K: [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
> >
> > Without PP:
> > 256:  [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> > 1K:   [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> > 2K:   [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> > 4K:   [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> > 8K:   [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> > 16K:  [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> > 32K:  [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> > 64K:  [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> > 128K: [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> > 256K: [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
> >
> >
> > The major factor contributing to the performance drop is the reduction
> > of skb coalescing. Additionally, without the page pool, small packets
> > can still benefit from the allocation of 8 continuous pages by
> > breaking them down into smaller pieces. This effectively reduces the
> > frequency of page allocation from the buddy system. For instance, the
> > arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> > the benefits of using a page pool are limited in such cases. In fact,
> > without page pool fragmenting enabled, it can even hinder performance
> > from this perspective.
> >
> > Upon further consideration, I tend to believe making page pool the
> > default option may not be appropriate.
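
For readers who haven't used it, the page_pool API being weighed here gives
each rx queue a recycling allocator so pages do not go back to the buddy
allocator after every packet. A minimal sketch of the driver-side life cycle on
a 6.4-era kernel (function names and sizes are placeholders, not the virtio_net
patches themselves):

#include <linux/dma-direction.h>
#include <linux/numa.h>
#include <net/page_pool.h>

/* One pool per rx queue. */
static struct page_pool *rx_pool_create(struct device *dev)
{
        struct page_pool_params pp = {
                .order          = 0,            /* one page per rx buffer here */
                .pool_size      = 256,          /* arbitrary, roughly ring-sized */
                .nid            = NUMA_NO_NODE,
                .dev            = dev,
                .dma_dir        = DMA_FROM_DEVICE,
        };

        return page_pool_create(&pp);           /* ERR_PTR() on failure */
}

/* Refill path: replaces alloc_page()/page_frag allocation. */
static struct page *rx_buf_alloc(struct page_pool *pool)
{
        return page_pool_dev_alloc_pages(pool);
}

/* Completion path: return the page for recycling instead of put_page(). */
static void rx_buf_free(struct page_pool *pool, struct page *page)
{
        page_pool_put_full_page(pool, page, true);
}

Whether this wins depends on how often the non-pool path would have amortized
its allocations anyway, which is exactly the small-packet concern raised above.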

Re: [PATCH RFC 0/4] x86/fixmap: Unify FIXADDR_TOP

2023-06-07 Thread Dave Hansen
On 5/15/23 01:19, Hou Wenlong wrote:
> This patchset unifies FIXADDR_TOP as a variable for x86, allowing the
> fixmap area to be movable and relocated with the kernel image in the
> x86/PIE patchset [0]. This enables the kernel image to be relocated in
> the top 512G of the address space.

What problems does this patch set solve?  How might that solution be
visible to end users?  Why is this problem important to you?

Also, while you're waiting for someone to review _your_ code, have you
considered reviewing anyone else's code?  I don't think I've seen any
review activity from you lately.


Re: [PATCH] vhost-vdpa: filter VIRTIO_F_RING_PACKED feature

2023-06-07 Thread Michael S. Tsirkin
On Wed, Jun 07, 2023 at 10:39:15AM +0200, Stefano Garzarella wrote:
> On Tue, Jun 6, 2023 at 2:58 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jun 06, 2023 at 09:29:22AM +0800, Jason Wang wrote:
> > > On Mon, Jun 5, 2023 at 10:58 PM Stefano Garzarella  
> > > wrote:
> > > >
> > > > On Mon, Jun 05, 2023 at 09:54:57AM -0400, Michael S. Tsirkin wrote:
> > > > >On Mon, Jun 05, 2023 at 03:30:35PM +0200, Stefano Garzarella wrote:
> > > > >> On Mon, Jun 05, 2023 at 09:00:25AM -0400, Michael S. Tsirkin wrote:
> > > > >> > On Mon, Jun 05, 2023 at 02:54:20PM +0200, Stefano Garzarella wrote:
> > > > >> > > On Mon, Jun 05, 2023 at 08:41:54AM -0400, Michael S. Tsirkin 
> > > > >> > > wrote:
> > > > >> > > > On Mon, Jun 05, 2023 at 01:06:44PM +0200, Stefano Garzarella 
> > > > >> > > > wrote:
> > > > >> > > > > vhost-vdpa IOCTLs (eg. VHOST_GET_VRING_BASE, 
> > > > >> > > > > VHOST_SET_VRING_BASE)
> > > > >> > > > > don't support packed virtqueue well yet, so let's filter the
> > > > >> > > > > VIRTIO_F_RING_PACKED feature for now in 
> > > > >> > > > > vhost_vdpa_get_features().
> > > > >> > > > >
> > > > >> > > > > This way, even if the device supports it, we don't risk it 
> > > > >> > > > > being
> > > > >> > > > > negotiated, then the VMM is unable to set the vring state 
> > > > >> > > > > properly.
> > > > >> > > > >
> > > > >> > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > > > >> > > > > Cc: sta...@vger.kernel.org
> > > > >> > > > > Signed-off-by: Stefano Garzarella 
> > > > >> > > > > ---
> > > > >> > > > >
> > > > >> > > > > Notes:
> > > > >> > > > > This patch should be applied before the "[PATCH v2 0/3] 
> > > > >> > > > > vhost_vdpa:
> > > > >> > > > > better PACKED support" series [1] and backported in 
> > > > >> > > > > stable branches.
> > > > >> > > > >
> > > > >> > > > > We can revert it when we are sure that everything is 
> > > > >> > > > > working with
> > > > >> > > > > packed virtqueues.
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > > Stefano
> > > > >> > > > >
> > > > >> > > > > [1] 
> > > > >> > > > > https://lore.kernel.org/virtualization/20230424225031.18947-1-shannon.nel...@amd.com/
> > > > >> > > >
> > > > >> > > > I'm a bit lost here. So why am I merging "better PACKED 
> > > > >> > > > support" then?
> > > > >> > >
> > > > >> > > To really support packed virtqueue with vhost-vdpa, at that 
> > > > >> > > point we would
> > > > >> > > also have to revert this patch.
> > > > >> > >
> > > > >> > > I wasn't sure if you wanted to queue the series for this merge 
> > > > >> > > window.
> > > > >> > > In that case do you think it is better to send this patch only 
> > > > >> > > for stable
> > > > >> > > branches?
> > > > >> > > > Does this patch make them a NOP?
> > > > >> > >
> > > > >> > > Yep, after applying the "better PACKED support" series and being
> > > > >> > > sure that
> > > > >> > > the IOCTLs of vhost-vdpa support packed virtqueue, we should 
> > > > >> > > revert this
> > > > >> > > patch.
> > > > >> > >
> > > > >> > > Let me know if you prefer a different approach.
> > > > >> > >
> > > > >> > > I'm concerned that QEMU uses vhost-vdpa IOCTLs thinking that the 
> > > > >> > > kernel
> > > > >> > > interprets them the right way, when it does not.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Stefano
> > > > >> > >
> > > > >> >
> > > > >> > If this fixes a bug can you add Fixes tags to each of them? Then 
> > > > >> > it's ok
> > > > >> > to merge in this window. Probably easier than the elaborate
> > > > >> > mask/unmask dance.
> > > > >>
> > > > >> CCing Shannon (the original author of the "better PACKED support"
> > > > >> series).
> > > > >>
> > > > >> IIUC Shannon is going to send a v3 of that series to fix the
> > > > >> documentation, so Shannon can you also add the Fixes tags?
> > > > >>
> > > > >> Thanks,
> > > > >> Stefano
> > > > >
> > > > >Well this is in my tree already. Just reply with
> > > > >Fixes: <>
> > > > >to each and I will add these tags.
> > > >
> > > > I tried, but it is not easy since we added the support for packed
> > > > virtqueue in vdpa and vhost incrementally.
> > > >
> > > > Initially I was thinking of adding the same tag used here:
> > > >
> > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > > >
> > > > Then I discovered that vq_state wasn't there, so I was thinking of
> > > >
> > > > Fixes: 530a5678bc00 ("vdpa: support packed virtqueue for 
> > > > set/get_vq_state()")
> > > >
> > > > So we would have to backport quite a few patches into the stable 
> > > > branches.
> > > > I don't know if it's worth it...
> > > >
> > > > I still think it is better to disable packed in the stable branches,
> > > > otherwise I have to make a list of all the patches we need.
> > > >
> > > > Any other ideas?
> > >
> > > AFAIK, except for vp_vdpa, pds seems to be the first parent that
> > > supports packed virtqueue. Users should not notice anything wrong if
> > > they don't use packed virtqueue.
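
For reference, the filtering discussed in this thread amounts to masking the
bit out of what the device reports before it reaches the VMM. A minimal
sketch, assuming the usual shape of vhost_vdpa_get_features(); this is not
the literal patch:

static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
{
        struct vdpa_device *vdpa = v->vdpa;
        const struct vdpa_config_ops *ops = vdpa->config;
        u64 features;

        features = ops->get_device_features(vdpa);
        /* Hide packed ring support until the vring state IOCTLs handle it. */
        features &= ~BIT_ULL(VIRTIO_F_RING_PACKED);

        if (copy_to_user(featurep, &features, sizeof(features)))
                return -EFAULT;

        return 0;
}

Reverting the filter once the vring-state IOCTLs handle packed rings is then
a one-line change, which is what the "revert it later" plan above relies on.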

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Xuan Zhuo
On Wed, 7 Jun 2023 17:08:59 +0800, Liang Chen  wrote:
> On Tue, May 30, 2023 at 9:19 AM Liang Chen  wrote:
> >
> > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > the
> > > > > > normal and XDP path. In addition, introducing a module parameter to 
> > > > > > enable
> > > > > > or disable the usage of page pool (disabled by default).
> > > > > >
> > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > performance gain
> > > > > > in the normal path.
> > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > >
> > > > > > In multi-core vm testing environments, the most significant 
> > > > > > performance
> > > > > > gain is observed in XDP cpumap:
> > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > >
> > > > > > With this foundation, we can further integrate page pool 
> > > > > > fragmentation and
> > > > > > DMA map/unmap support.
> > > > > >
> > > > > > Signed-off-by: Liang Chen 
> > > > >
> > > > > Why off by default?
> > > > > I am guessing it sometimes has performance costs too?
> > > > >
> > > > >
> > > > > What happens if we use page pool for big mode too?
> > > > > The less modes we have the better...
> > > > >
> > > > >
> > > >
> > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > packet size is very small, it reduces the likelihood of skb
> > > > coalescing. But such cases are rare.
> > >
> > > small packets are rare? These workloads are easy to create actually.
> > > Pls try and include benchmark with small packet size.
> > >
> >
> > Sure, Thanks!
>
> Before going ahead and posting v2 patch, I would like to hear more
> advice for the cases of small packets. I have done more performance
> benchmark with small packets since then. Here is a list of iperf
> output,

Could you show the command line?

Thanks


>
> With PP and PP fragmenting:
> 256K:  [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> 1K:    [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> 2K:    [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> 4K:    [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> 8K:    [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> 16K:   [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> 32K:   [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> 64K:   [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> 128K:  [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> 256K:  [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
>
> Without PP:
> 256:   [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> 1K:    [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> 2K:    [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> 4K:    [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> 8K:    [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> 16K:   [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> 32K:   [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> 64K:   [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> 128K:  [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> 256K:  [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
>
>
> The major factor contributing to the performance drop is the reduction
> of skb coalescing. Additionally, without the page pool, small packets
> can still benefit from the allocation of 8 contiguous pages by
> breaking them down into smaller pieces. This effectively reduces the
> frequency of page allocation from the buddy system. For instance, the
> arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> the benefits of using a page pool are limited in such cases. In fact,
> without page pool fragmenting enabled, it can even hinder performance
> from this perspective.
>
> Upon further consideration, I tend to believe making page pool the
> default option may not be appropriate. As you pointed out, we cannot
> simply ignore the performance impact on small packets. Any comments on
> this will be much appreciated.
>
>
> Thanks,
> Liang
>
>
> > > > The usage of page pool for big mode is being evaluated now. Thanks!
> > > >
> > > > > > ---
> > > > > >

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Xuan Zhuo
On Wed, 7 Jun 2023 17:11:44 +0800, Liang Chen  wrote:
> On Wed, May 31, 2023 at 11:12 AM Xuan Zhuo  wrote:
> >
> > On Mon, 29 May 2023 15:28:17 +0800, Liang Chen  
> > wrote:
> > > On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote:
> > > > > On Fri, May 26, 2023 at 2:51 PM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, May 26, 2023 at 1:46 PM Liang Chen 
> > > > > >  wrote:
> > > > > > >
> > > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > > the
> > > > > > > normal and XDP path.
> > > > > >
> > > > > > It's better to explain why we need a page pool and how it can help 
> > > > > > the
> > > > > > performance.
> > > > > >
> > > > >
> > > > > Sure, I will include that on v2.
> > > > > > > In addition, introducing a module parameter to enable
> > > > > > > or disable the usage of page pool (disabled by default).
> > > > > >
> > > > > > If page pool wins for most of the cases, any reason to disable it 
> > > > > > by default?
> > > > > >
> > > > >
> > > > > Thank you for raising the point. It does make sense to enable it by 
> > > > > default.
> > > >
> > > > I'd like to see more benchmarks pls then, with a variety of packet
> > > > sizes, udp and tcp.
> > > >
> > >
> > > Sure, more benchmarks will be provided. Thanks.
> >
> >
> > I think so.
> >
> > I did this, but I did not find any improvement. So I gave up on it.
> >
> > Thanks.
> >
> >
>
> Our UDP benchmark shows a steady 0.8 percent change in PPS
> measurement. However, when conducting iperf TCP stream performance
> testing, the results vary depending on the packet size and testing
> setup. With small packet sizes, the performance actually drops
> slightly due to the reasons I explained in the previous email. On the
> other hand, with large packets, we need to ensure that the sender side
> doesn't become the bottleneck. To achieve this, our setup uses a
> single-core vm to keep the receiver busy, which allows us to identify
> performance differences in the receiving path.

Could you show some numbers?

Thanks.


>
>
> Thanks,
> Liang
>
>
>
>
> > >
> > >
> > > > > > >
> > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > performance gain
> > > > > > > in the normal path.
> > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > >
> > > > > > > In multi-core vm testing environments, the most significant 
> > > > > > > performance
> > > > > > > gain is observed in XDP cpumap:
> > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > >
> > > > > > Please show more details on the test. E.g which kinds of tests have
> > > > > > you measured?
> > > > > >
> > > > > > Btw, it would be better to measure PPS as well.
> > > > > >
> > > > >
> > > > > Sure. It will be added on v2.
> > > > > > >
> > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > fragmentation and
> > > > > > > DMA map/unmap support.
> > > > > > >
> > > > > > > Signed-off-by: Liang Chen 
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 188 
> > > > > > > ++-
> > > > > >
> > > > > > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do
> > > > > > the ifdef tricks at least.
> > > > > >
> > > > >
> > > > > Sure. It will be done in v2.
> > > > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > > > > >  module_param(gso, bool, 0444);
> > > > > > >  module_param(napi_tx, bool, 0644);
> > > > > > >
> > > > > > > +static bool page_pool_enabled;
> > > > > > > +module_param(page_pool_enabled, bool, 0400);
> > > > > > > +
> > > > > > >  /* FIXME: MTU in config. */
> > > > > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > > > > >  #define GOOD_COPY_LEN  128
> > > > > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > > > > /* Chain pages by the private ptr. */
> > > > > > > struct page *pages;
> > > > > > >
> > > > > > > +   /* Page pool */
> > > > > > > +   struct page_pool *page_pool;
> > > > > > > +
> > > > > > > /* Average packet length for mergeable receive buffers. */
> > > > > > > struct ewma_pkt_len mrg_avg_pkt_len;
> > > > > > >
> > > > > > > @@ -459,6 +465,14 @@ static struct sk_buff 
> > > > > > > *virtnet_build_skb(void *buf, unsigned int buflen,
> > > > > > > return skb;
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct 
> > > > > > > page *page)
> > >

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Liang Chen
On Wed, May 31, 2023 at 11:12 AM Xuan Zhuo  wrote:
>
> On Mon, 29 May 2023 15:28:17 +0800, Liang Chen  
> wrote:
> > On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin  wrote:
> > >
> > > On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote:
> > > > On Fri, May 26, 2023 at 2:51 PM Jason Wang  wrote:
> > > > >
> > > > > On Fri, May 26, 2023 at 1:46 PM Liang Chen 
> > > > >  wrote:
> > > > > >
> > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > the
> > > > > > normal and XDP path.
> > > > >
> > > > > It's better to explain why we need a page pool and how it can help the
> > > > > performance.
> > > > >
> > > >
> > > > Sure, I will include that on v2.
> > > > > > In addition, introducing a module parameter to enable
> > > > > > or disable the usage of page pool (disabled by default).
> > > > >
> > > > > If page pool wins for most of the cases, any reason to disable it by 
> > > > > default?
> > > > >
> > > >
> > > > Thank you for raising the point. It does make sense to enable it by 
> > > > default.
> > >
> > > I'd like to see more benchmarks pls then, with a variety of packet
> > > sizes, udp and tcp.
> > >
> >
> > Sure, more benchmarks will be provided. Thanks.
>
>
> I think so.
>
> I did this, but I did not find any improvement. So I gave up on it.
>
> Thanks.
>
>

Our UDP benchmark shows a steady 0.8 percent change in PPS
measurement. However, when conducting iperf TCP stream performance
testing, the results vary depending on the packet size and testing
setup. With small packet sizes, the performance actually drops
slightly due to the reasons I explained in the previous email. On the
other hand, with large packets, we need to ensure that the sender side
doesn't become the bottleneck. To achieve this, our setup uses a
single-core vm to keep the receiver busy, which allows us to identify
performance differences in the receiving path.


Thanks,
Liang




> >
> >
> > > > > >
> > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > performance gain
> > > > > > in the normal path.
> > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > >
> > > > > > In multi-core vm testing environments, the most significant 
> > > > > > performance
> > > > > > gain is observed in XDP cpumap:
> > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > >
> > > > > Please show more details on the test. E.g which kinds of tests have
> > > > > you measured?
> > > > >
> > > > > Btw, it would be better to measure PPS as well.
> > > > >
> > > >
> > > > Sure. It will be added on v2.
> > > > > >
> > > > > > With this foundation, we can further integrate page pool 
> > > > > > fragmentation and
> > > > > > DMA map/unmap support.
> > > > > >
> > > > > > Signed-off-by: Liang Chen 
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 188 
> > > > > > ++-
> > > > >
> > > > > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do
> > > > > the ifdef tricks at least.
> > > > >
> > > >
> > > > Sure. It will be done in v2.
> > > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > > > >  module_param(gso, bool, 0444);
> > > > > >  module_param(napi_tx, bool, 0644);
> > > > > >
> > > > > > +static bool page_pool_enabled;
> > > > > > +module_param(page_pool_enabled, bool, 0400);
> > > > > > +
> > > > > >  /* FIXME: MTU in config. */
> > > > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > > > >  #define GOOD_COPY_LEN  128
> > > > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > > > /* Chain pages by the private ptr. */
> > > > > > struct page *pages;
> > > > > >
> > > > > > +   /* Page pool */
> > > > > > +   struct page_pool *page_pool;
> > > > > > +
> > > > > > /* Average packet length for mergeable receive buffers. */
> > > > > > struct ewma_pkt_len mrg_avg_pkt_len;
> > > > > >
> > > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void 
> > > > > > *buf, unsigned int buflen,
> > > > > > return skb;
> > > > > >  }
> > > > > >
> > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page 
> > > > > > *page)
> > > > > > +{
> > > > > > +   if (rq->page_pool)
> > > > > > +   page_pool_put_full_page(rq->page_pool, page, true);
> > > > > > +   else
> > > > > > +   put_page(page);
> > > > > > +}
> > > > > > +
> > > > > >  /* Called from bottom half context */
> > > > > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > >   

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Liang Chen
On Tue, May 30, 2023 at 9:19 AM Liang Chen  wrote:
>
> On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > The implementation at the moment uses one page per packet in both the
> > > > > normal and XDP path. In addition, introducing a module parameter to 
> > > > > enable
> > > > > or disable the usage of page pool (disabled by default).
> > > > >
> > > > > In single-core vm testing environments, it gives a modest performance 
> > > > > gain
> > > > > in the normal path.
> > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > >
> > > > > In multi-core vm testing environments, the most significant 
> > > > > performance
> > > > > gain is observed in XDP cpumap:
> > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > >
> > > > > With this foundation, we can further integrate page pool 
> > > > > fragmentation and
> > > > > DMA map/unmap support.
> > > > >
> > > > > Signed-off-by: Liang Chen 
> > > >
> > > > Why off by default?
> > > > I am guessing it sometimes has performance costs too?
> > > >
> > > >
> > > > What happens if we use page pool for big mode too?
> > > > The less modes we have the better...
> > > >
> > > >
> > >
> > > Sure, now I believe it makes sense to enable it by default. When the
> > > packet size is very small, it reduces the likelihood of skb
> > > coalescing. But such cases are rare.
> >
> > small packets are rare? These workloads are easy to create actually.
> > Pls try and include benchmark with small packet size.
> >
>
> Sure, Thanks!

Before going ahead and posting v2 patch, I would like to hear more
advice for the cases of small packets. I have done more performance
benchmark with small packets since then. Here is a list of iperf
output,

With PP and PP fragmenting:
256K:  [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
1K:    [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
2K:    [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
4K:    [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
8K:    [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
16K:   [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
32K:   [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
64K:   [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
128K:  [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
256K:  [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes

Without PP:
256:   [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
1K:    [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
2K:    [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
4K:    [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
8K:    [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
16K:   [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
32K:   [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
64K:   [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
128K:  [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
256K:  [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes


The major factor contributing to the performance drop is the reduction
of skb coalescing. Additionally, without the page pool, small packets
can still benefit from the allocation of 8 contiguous pages by
breaking them down into smaller pieces. This effectively reduces the
frequency of page allocation from the buddy system. For instance, the
arrival of 32 1K packets only triggers one alloc_page call. Therefore,
the benefits of using a page pool are limited in such cases. In fact,
without page pool fragmenting enabled, it can even hinder performance
from this perspective.
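
To make the allocation-frequency point concrete, here is a rough
illustration of carving one higher-order allocation into per-packet buffers.
The names and sizes are assumptions for the example only; the driver's real
mergeable-buffer code also handles reference counting and recycling:

#define RX_BUF_SIZE     1024            /* per-packet buffer (the 1K case) */
#define RX_BLOCK_ORDER  3               /* 8 contiguous pages = 32 KiB */

struct rx_block {
        struct page *pages;             /* current order-3 allocation */
        unsigned int offset;            /* next free byte within it */
};

/* One alloc_pages() call can hand out 32 x 1 KiB buffers before the buddy
 * allocator has to be asked again.
 */
static void *rx_alloc_buf(struct rx_block *blk, gfp_t gfp)
{
        void *buf;

        if (!blk->pages ||
            blk->offset + RX_BUF_SIZE > (PAGE_SIZE << RX_BLOCK_ORDER)) {
                blk->pages = alloc_pages(gfp | __GFP_COMP, RX_BLOCK_ORDER);
                if (!blk->pages)
                        return NULL;
                blk->offset = 0;
        }

        buf = page_address(blk->pages) + blk->offset;
        blk->offset += RX_BUF_SIZE;
        return buf;
}

A per-page pool, by contrast, hands out order-0 pages one at a time, so
without fragmenting support each small packet costs a whole page.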

Upon further consideration, I tend to believe making page pool the
default option may not be appropriate. As you pointed out, we cannot
simply ignore the performance impact on small packets. Any comments on
this will be much appreciated.


Thanks,
Liang


> > > The usage of page pool for big mode is being evaluated now. Thanks!
> > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 188 
> > > > > ++-
> > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.

Re: [PATCH 1/1] vhost: Fix crash during early vhost_transport_send_pkt calls

2023-06-07 Thread Stefano Garzarella

On Tue, Jun 06, 2023 at 12:19:10PM -0500, Mike Christie wrote:

On 6/6/23 4:49 AM, Stefano Garzarella wrote:

On Mon, Jun 05, 2023 at 01:57:30PM -0500, Mike Christie wrote:

If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we
can race where:
1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue
2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create.
3. vhost_worker_create will set the dev->worker pointer before setting
the worker->vtsk pointer.
4. thread0's vhost_work_queue will see the dev->worker pointer is
set and try to call vhost_task_wake using the not yet set worker->vtsk
pointer.
5. We then crash since vtsk is NULL.

Before commit 6e890c5d5021 ("vhost: use vhost_tasks for worker
threads"), we only had the worker pointer so we could just check it to
see if VHOST_SET_OWNER has been done. After that commit we have the
vhost_worker and vhost_task pointers, so we can now hit the bug above.

This patch embeds the vhost_worker in the vhost_dev, so we can just
check the worker.vtsk pointer to check if VHOST_SET_OWNER has been done
like before.

Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
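
The shape of that fix, roughly (field and function names follow the
description above; this is a sketch, not the patch itself):

/* dev->worker is now embedded, so it always exists; worker.vtsk alone
 * tells us whether VHOST_SET_OWNER has run.
 */
void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
{
        if (!READ_ONCE(dev->worker.vtsk))
                return;         /* no owner yet, as before 6e890c5d5021 */

        if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
                llist_add(&work->node, &dev->worker.work_list);
                vhost_task_wake(dev->worker.vtsk);
        }
}

Because the worker is embedded rather than allocated on VHOST_SET_OWNER,
there is no separately allocated object whose pointer can be observed
half-initialized; only the vtsk publication needs care, which is what the
rest of the thread discusses.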


We should add:

Reported-by: syzbot+d0d442c22fa8db45f...@syzkaller.appspotmail.com



Ok. Will do.



-    }
+    vtsk = vhost_task_create(vhost_worker, &dev->worker, name);
+    if (!vtsk)
+    return -ENOMEM;

-    worker->vtsk = vtsk;
+    dev->worker.kcov_handle = kcov_common_handle();
+    dev->worker.vtsk = vtsk;


vhost_work_queue() is called by vhost_transport_send_pkt() without
holding vhost_dev.mutex (like vhost_poll_queue() in several places).

If vhost_work_queue() finds dev->worker.vtsk not NULL, how can we
be sure that for example `work_list` has been initialized?

Maybe I'm overthinking since we didn't have this problem before, or the
race window is so short that it never happened.


Yeah, I dropped the READ/WRITE_ONCE use because I didn't think we needed
it for the vhost_vsock_start case, and for the case you mentioned above, I
wondered the same thing as you but was not sure so I added the same
behavior as before. When I read memory-barriers.txt, it sounds like we've
been getting lucky.


Yep, it happened to me too before I highlighted it :-)



I'll add back the READ/WRITE_ONCE for vtsk access since that's what we are
keying off as signaling that the worker is ready to be used. I didn't see
any type of perf hit when using it, and from the memory-barriers.txt doc
it sounds like that's what we should be doing.
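
For the ordering question, one way to make the initialization/publication
explicit is a release/acquire pair. Whether the final patch needs this or
can rely on READ/WRITE_ONCE plus existing barriers is exactly the open
point, so treat this as an illustration of the concern; the helper names
are made up for the example:

/* Owner side: finish initializing the embedded worker, then publish. */
static void vhost_worker_publish(struct vhost_dev *dev, struct vhost_task *vtsk)
{
        init_llist_head(&dev->worker.work_list);
        dev->worker.kcov_handle = kcov_common_handle();
        /* Release: everything above is visible before vtsk reads non-NULL. */
        smp_store_release(&dev->worker.vtsk, vtsk);
}

/* Queueing side: the acquire pairs with the release above, so a non-NULL
 * vtsk guarantees work_list has already been initialized.
 */
static bool vhost_worker_ready(struct vhost_dev *dev)
{
        return smp_load_acquire(&dev->worker.vtsk) != NULL;
}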



LGTM, I just wonder if RCU on dev->worker can save us from future
problems, but maybe better to do it in the next release.

Thanks,
Stefano



Re: [PATCH] vhost-vdpa: filter VIRTIO_F_RING_PACKED feature

2023-06-07 Thread Stefano Garzarella
On Tue, Jun 6, 2023 at 2:58 PM Michael S. Tsirkin  wrote:
>
> On Tue, Jun 06, 2023 at 09:29:22AM +0800, Jason Wang wrote:
> > On Mon, Jun 5, 2023 at 10:58 PM Stefano Garzarella  
> > wrote:
> > >
> > > On Mon, Jun 05, 2023 at 09:54:57AM -0400, Michael S. Tsirkin wrote:
> > > >On Mon, Jun 05, 2023 at 03:30:35PM +0200, Stefano Garzarella wrote:
> > > >> On Mon, Jun 05, 2023 at 09:00:25AM -0400, Michael S. Tsirkin wrote:
> > > >> > On Mon, Jun 05, 2023 at 02:54:20PM +0200, Stefano Garzarella wrote:
> > > >> > > On Mon, Jun 05, 2023 at 08:41:54AM -0400, Michael S. Tsirkin wrote:
> > > >> > > > On Mon, Jun 05, 2023 at 01:06:44PM +0200, Stefano Garzarella 
> > > >> > > > wrote:
> > > >> > > > > vhost-vdpa IOCTLs (eg. VHOST_GET_VRING_BASE, 
> > > >> > > > > VHOST_SET_VRING_BASE)
> > > >> > > > > don't support packed virtqueue well yet, so let's filter the
> > > >> > > > > VIRTIO_F_RING_PACKED feature for now in 
> > > >> > > > > vhost_vdpa_get_features().
> > > >> > > > >
> > > >> > > > > This way, even if the device supports it, we don't risk it 
> > > >> > > > > being
> > > >> > > > > negotiated, then the VMM is unable to set the vring state 
> > > >> > > > > properly.
> > > >> > > > >
> > > >> > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > > >> > > > > Cc: sta...@vger.kernel.org
> > > >> > > > > Signed-off-by: Stefano Garzarella 
> > > >> > > > > ---
> > > >> > > > >
> > > >> > > > > Notes:
> > > >> > > > > This patch should be applied before the "[PATCH v2 0/3] 
> > > >> > > > > vhost_vdpa:
> > > >> > > > > better PACKED support" series [1] and backported in stable 
> > > >> > > > > branches.
> > > >> > > > >
> > > >> > > > > We can revert it when we are sure that everything is 
> > > >> > > > > working with
> > > >> > > > > packed virtqueues.
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Stefano
> > > >> > > > >
> > > >> > > > > [1] 
> > > >> > > > > https://lore.kernel.org/virtualization/20230424225031.18947-1-shannon.nel...@amd.com/
> > > >> > > >
> > > >> > > > I'm a bit lost here. So why am I merging "better PACKED support" 
> > > >> > > > then?
> > > >> > >
> > > >> > > To really support packed virtqueue with vhost-vdpa, at that point 
> > > >> > > we would
> > > >> > > also have to revert this patch.
> > > >> > >
> > > >> > > I wasn't sure if you wanted to queue the series for this merge 
> > > >> > > window.
> > > >> > > In that case do you think it is better to send this patch only for 
> > > >> > > stable
> > > >> > > branches?
> > > >> > > > Does this patch make them a NOP?
> > > >> > >
> > > >> > > Yep, after applying the "better PACKED support" series and being
> > > >> > > sure that
> > > >> > > the IOCTLs of vhost-vdpa support packed virtqueue, we should 
> > > >> > > revert this
> > > >> > > patch.
> > > >> > >
> > > >> > > Let me know if you prefer a different approach.
> > > >> > >
> > > >> > > I'm concerned that QEMU uses vhost-vdpa IOCTLs thinking that the 
> > > >> > > kernel
> > > >> > > interprets them the right way, when it does not.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Stefano
> > > >> > >
> > > >> >
> > > >> > If this fixes a bug can you add Fixes tags to each of them? Then 
> > > >> > it's ok
> > > >> > to merge in this window. Probably easier than the elaborate
> > > >> > mask/unmask dance.
> > > >>
> > > >> CCing Shannon (the original author of the "better PACKED support"
> > > >> series).
> > > >>
> > > >> IIUC Shannon is going to send a v3 of that series to fix the
> > > >> documentation, so Shannon can you also add the Fixes tags?
> > > >>
> > > >> Thanks,
> > > >> Stefano
> > > >
> > > >Well this is in my tree already. Just reply with
> > > >Fixes: <>
> > > >to each and I will add these tags.
> > >
> > > I tried, but it is not easy since we added the support for packed
> > > virtqueue in vdpa and vhost incrementally.
> > >
> > > Initially I was thinking of adding the same tag used here:
> > >
> > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > >
> > > Then I discovered that vq_state wasn't there, so I was thinking of
> > >
> > > Fixes: 530a5678bc00 ("vdpa: support packed virtqueue for 
> > > set/get_vq_state()")
> > >
> > > So we would have to backport quite a few patches into the stable branches.
> > > I don't know if it's worth it...
> > >
> > > I still think it is better to disable packed in the stable branches,
> > > otherwise I have to make a list of all the patches we need.
> > >
> > > Any other ideas?
> >
> > AFAIK, except for vp_vdpa, pds seems to be the first parent that
> > supports packed virtqueue. Users should not notice anything wrong if
> > they don't use packed virtqueue. And the problem of vp_vdpa + packed
> > virtqueue came since the day0 of vp_vdpa. It seems fine to do nothing
> > I guess.
> >
> > Thanks
>
>
> I have a question though, what if down the road there
> is a new feature that needs more changes? It will be
> broken too just like PACKED no?
> Shouldn't vdp