Re: [PATCH] vhost-vdpa: filter VIRTIO_F_RING_PACKED feature
On Thu, Jun 08, 2023 at 08:42:15AM +0800, Jason Wang wrote: > On Wed, Jun 7, 2023 at 5:43 PM Michael S. Tsirkin wrote: > > > > On Wed, Jun 07, 2023 at 10:39:15AM +0200, Stefano Garzarella wrote: > > > On Tue, Jun 6, 2023 at 2:58 PM Michael S. Tsirkin wrote: > > > > > > > > On Tue, Jun 06, 2023 at 09:29:22AM +0800, Jason Wang wrote: > > > > > On Mon, Jun 5, 2023 at 10:58 PM Stefano Garzarella > > > > > wrote: > > > > > > > > > > > > On Mon, Jun 05, 2023 at 09:54:57AM -0400, Michael S. Tsirkin wrote: > > > > > > >On Mon, Jun 05, 2023 at 03:30:35PM +0200, Stefano Garzarella wrote: > > > > > > >> On Mon, Jun 05, 2023 at 09:00:25AM -0400, Michael S. Tsirkin > > > > > > >> wrote: > > > > > > >> > On Mon, Jun 05, 2023 at 02:54:20PM +0200, Stefano Garzarella > > > > > > >> > wrote: > > > > > > >> > > On Mon, Jun 05, 2023 at 08:41:54AM -0400, Michael S. Tsirkin > > > > > > >> > > wrote: > > > > > > >> > > > On Mon, Jun 05, 2023 at 01:06:44PM +0200, Stefano > > > > > > >> > > > Garzarella wrote: > > > > > > >> > > > > vhost-vdpa IOCTLs (eg. VHOST_GET_VRING_BASE, > > > > > > >> > > > > VHOST_SET_VRING_BASE) > > > > > > >> > > > > don't support packed virtqueue well yet, so let's filter > > > > > > >> > > > > the > > > > > > >> > > > > VIRTIO_F_RING_PACKED feature for now in > > > > > > >> > > > > vhost_vdpa_get_features(). > > > > > > >> > > > > > > > > > > >> > > > > This way, even if the device supports it, we don't risk > > > > > > >> > > > > it being > > > > > > >> > > > > negotiated, then the VMM is unable to set the vring > > > > > > >> > > > > state properly. 
> > > > > > >> > > > > > > > > > > >> > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based > > > > > > >> > > > > backend") > > > > > > >> > > > > Cc: sta...@vger.kernel.org > > > > > > >> > > > > Signed-off-by: Stefano Garzarella > > > > > > >> > > > > --- > > > > > > >> > > > > > > > > > > >> > > > > Notes: > > > > > > >> > > > > This patch should be applied before the "[PATCH v2 > > > > > > >> > > > > 0/3] vhost_vdpa: > > > > > > >> > > > > better PACKED support" series [1] and backported in > > > > > > >> > > > > stable branches. > > > > > > >> > > > > > > > > > > >> > > > > We can revert it when we are sure that everything is > > > > > > >> > > > > working with > > > > > > >> > > > > packed virtqueues. > > > > > > >> > > > > > > > > > > >> > > > > Thanks, > > > > > > >> > > > > Stefano > > > > > > >> > > > > > > > > > > >> > > > > [1] > > > > > > >> > > > > https://lore.kernel.org/virtualization/20230424225031.18947-1-shannon.nel...@amd.com/ > > > > > > >> > > > > > > > > > >> > > > I'm a bit lost here. So why am I merging "better PACKED > > > > > > >> > > > support" then? > > > > > > >> > > > > > > > > >> > > To really support packed virtqueue with vhost-vdpa, at that > > > > > > >> > > point we would > > > > > > >> > > also have to revert this patch. > > > > > > >> > > > > > > > > >> > > I wasn't sure if you wanted to queue the series for this > > > > > > >> > > merge window. > > > > > > >> > > In that case do you think it is better to send this patch > > > > > > >> > > only for stable > > > > > > >> > > branches? > > > > > > >> > > > Does this patch make them a NOP? > > > > > > >> > > > > > > > > >> > > Yep, after applying the "better PACKED support" series and > > > > > > >> > > being > > > > > > >> > > sure that > > > > > > >> > > the IOCTLs of vhost-vdpa support packed virtqueue, we should > > > > > > >> > > revert this > > > > > > >> > > patch. > > > > > > >> > > > > > > > > >> > > Let me know if you prefer a different approach. 
> > > > > > >> > > > > > > > > >> > > I'm concerned that QEMU uses vhost-vdpa IOCTLs thinking that > > > > > > >> > > the kernel > > > > > > >> > > interprets them the right way, when it does not. > > > > > > >> > > > > > > > > >> > > Thanks, > > > > > > >> > > Stefano > > > > > > >> > > > > > > > > >> > > > > > > > >> > If this fixes a bug can you add Fixes tags to each of them? > > > > > > >> > Then it's ok > > > > > > >> > to merge in this window. Probably easier than the elaborate > > > > > > >> > mask/unmask dance. > > > > > > >> > > > > > > >> CCing Shannon (the original author of the "better PACKED support" > > > > > > >> series). > > > > > > >> > > > > > > >> IIUC Shannon is going to send a v3 of that series to fix the > > > > > > >> documentation, so Shannon can you also add the Fixes tags? > > > > > > >> > > > > > > >> Thanks, > > > > > > >> Stefano > > > > > > > > > > > > > >Well this is in my tree already. Just reply with > > > > > > >Fixes: <> > > > > > > >to each and I will add these tags. > > > > > > > > > > > > I tried, but it is not easy since we added the support for packed > > > > > > virtqueue in vdpa and vhost incrementally. > > > > > > > > > > > > Initially I was thinking of adding the same tag used here: > > > > > > > > > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") > > > > > > > > > > > > Then I discovered that vq_s
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Thu, 8 Jun 2023 08:38:14 +0800, Jason Wang wrote: > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin wrote: > > > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote: > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen > > > wrote: > > > > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin > > > > wrote: > > > > > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > > > > wrote: > > > > > > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > > > > The implementation at the moment uses one page per packet in > > > > > > > > both the > > > > > > > > normal and XDP path. In addition, introducing a module > > > > > > > > parameter to enable > > > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > > > performance gain > > > > > > > > in the normal path. > > > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > > > performance > > > > > > > > gain is observed in XDP cpumap: > > > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > > > fragmentation and > > > > > > > > DMA map/unmap support. > > > > > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > > > > > > > Why off by default? > > > > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > > > > The less modes we have the better... 
> > > > > > > > > > > > > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > > > > > packet size is very small, it reduces the likelihood of skb > > > > > > coalescing. But such cases are rare. > > > > > > > > > > small packets are rare? These workloads are easy to create actually. > > > > > Pls try and include benchmark with small packet size. > > > > > > > > > > > > > Sure, Thanks! > > > > > > Before going ahead and posting v2 patch, I would like to hear more > > > advice for the cases of small packets. I have done more performance > > > benchmark with small packets since then. Here is a list of iperf > > > output, > > > > > > With PP and PP fragmenting: > > > 256K: [ 5] 505.00-510.00 sec 1.34 GBytes 2.31 Gbits/sec0144 > > > KBytes > > > 1K: [ 5] 30.00-35.00 sec 4.63 GBytes 7.95 Gbits/sec0 > > > 223 KBytes > > > 2K: [ 5] 65.00-70.00 sec 8.33 GBytes 14.3 Gbits/sec0 > > > 324 KBytes > > > 4K: [ 5] 30.00-35.00 sec 13.3 GBytes 22.8 Gbits/sec0 > > > 1.08 MBytes > > > 8K: [ 5] 50.00-55.00 sec 18.9 GBytes 32.4 Gbits/sec0 > > > 744 KBytes > > > 16K: [ 5] 25.00-30.00 sec 24.6 GBytes 42.3 Gbits/sec0963 > > > KBytes > > > 32K: [ 5] 45.00-50.00 sec 29.8 GBytes 51.2 Gbits/sec0 1.25 > > > MBytes > > > 64K: [ 5] 35.00-40.00 sec 34.0 GBytes 58.4 Gbits/sec0 1.70 > > > MBytes > > > 128K: [ 5] 45.00-50.00 sec 36.7 GBytes 63.1 Gbits/sec0 4.26 > > > MBytes > > > 256K: [ 5] 30.00-35.00 sec 40.0 GBytes 68.8 Gbits/sec0 3.20 > > > MBytes > > Note that virtio-net driver is lacking things like BQL and others, so > it might suffer from buffer bloat for TCP performance. Would you mind > to measure with e.g using testpmd on the vhost to see the rx PPS? 
> > > > > > > Without PP: > > > 256: [ 5] 680.00-685.00 sec 1.57 GBytes 2.69 Gbits/sec0359 > > > KBytes > > > 1K: [ 5] 75.00-80.00 sec 5.47 GBytes 9.40 Gbits/sec0730 > > > KBytes > > > 2K: [ 5] 65.00-70.00 sec 9.46 GBytes 16.2 Gbits/sec0 1.99 > > > MBytes > > > 4K: [ 5] 30.00-35.00 sec 14.5 GBytes 25.0 Gbits/sec0 1.20 > > > MBytes > > > 8K: [ 5] 45.00-50.00 sec 19.9 GBytes 34.1 Gbits/sec0 1.72 > > > MBytes > > > 16K:[ 5] 5.00-10.00 sec 23.8 GBytes 40.9 Gbits/sec0 2.90 > > > MBytes > > > 32K:[ 5] 15.00-20.00 sec 28.0 GBytes 48.1 Gbits/sec0 3.03 > > > MBytes > > > 64K:[ 5] 60.00-65.00 sec 31.8 GBytes 54.6 Gbits/sec0 3.05 > > > MBytes > > > 128K: [ 5] 45.00-50.00 sec 33.0 GBytes 56.6 Gbits/sec1 3.03 > > > MBytes > > > 256K: [ 5] 25.00-30.00 sec 34.7 GBytes 59.6 Gbits/sec0 3.11 > > > MBytes > > > > > > > > > The major factor contributing to the performance drop is the reduction > > > of skb coalescing. Additionally, without the page pool, small packets > > > can still benefit from the allocation of 8 continuous pages by > > > breaking them down int
Re: [PATCH] scsi: virtio_scsi: Remove a useless function call
On Mon, 29 May 2023 09:35:08 +0200, Christophe JAILLET wrote:
> 'inq_result' is known to be NULL. There is no point calling kfree().

Applied to 6.5/scsi-queue, thanks!

[1/1] scsi: virtio_scsi: Remove a useless function call
      https://git.kernel.org/mkp/scsi/c/0e5e41ee3d73

-- 
Martin K. Petersen
Oracle Linux Engineering

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vhost-vdpa: filter VIRTIO_F_RING_PACKED feature
On Wed, Jun 7, 2023 at 5:43 PM Michael S. Tsirkin wrote: > > On Wed, Jun 07, 2023 at 10:39:15AM +0200, Stefano Garzarella wrote: > > On Tue, Jun 6, 2023 at 2:58 PM Michael S. Tsirkin wrote: > > > > > > On Tue, Jun 06, 2023 at 09:29:22AM +0800, Jason Wang wrote: > > > > On Mon, Jun 5, 2023 at 10:58 PM Stefano Garzarella > > > > wrote: > > > > > > > > > > On Mon, Jun 05, 2023 at 09:54:57AM -0400, Michael S. Tsirkin wrote: > > > > > >On Mon, Jun 05, 2023 at 03:30:35PM +0200, Stefano Garzarella wrote: > > > > > >> On Mon, Jun 05, 2023 at 09:00:25AM -0400, Michael S. Tsirkin wrote: > > > > > >> > On Mon, Jun 05, 2023 at 02:54:20PM +0200, Stefano Garzarella > > > > > >> > wrote: > > > > > >> > > On Mon, Jun 05, 2023 at 08:41:54AM -0400, Michael S. Tsirkin > > > > > >> > > wrote: > > > > > >> > > > On Mon, Jun 05, 2023 at 01:06:44PM +0200, Stefano Garzarella > > > > > >> > > > wrote: > > > > > >> > > > > vhost-vdpa IOCTLs (eg. VHOST_GET_VRING_BASE, > > > > > >> > > > > VHOST_SET_VRING_BASE) > > > > > >> > > > > don't support packed virtqueue well yet, so let's filter > > > > > >> > > > > the > > > > > >> > > > > VIRTIO_F_RING_PACKED feature for now in > > > > > >> > > > > vhost_vdpa_get_features(). > > > > > >> > > > > > > > > > >> > > > > This way, even if the device supports it, we don't risk it > > > > > >> > > > > being > > > > > >> > > > > negotiated, then the VMM is unable to set the vring state > > > > > >> > > > > properly. > > > > > >> > > > > > > > > > >> > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") > > > > > >> > > > > Cc: sta...@vger.kernel.org > > > > > >> > > > > Signed-off-by: Stefano Garzarella > > > > > >> > > > > --- > > > > > >> > > > > > > > > > >> > > > > Notes: > > > > > >> > > > > This patch should be applied before the "[PATCH v2 > > > > > >> > > > > 0/3] vhost_vdpa: > > > > > >> > > > > better PACKED support" series [1] and backported in > > > > > >> > > > > stable branches. 
> > > > > >> > > > > > > > > > >> > > > > We can revert it when we are sure that everything is > > > > > >> > > > > working with > > > > > >> > > > > packed virtqueues. > > > > > >> > > > > > > > > > >> > > > > Thanks, > > > > > >> > > > > Stefano > > > > > >> > > > > > > > > > >> > > > > [1] > > > > > >> > > > > https://lore.kernel.org/virtualization/20230424225031.18947-1-shannon.nel...@amd.com/ > > > > > >> > > > > > > > > >> > > > I'm a bit lost here. So why am I merging "better PACKED > > > > > >> > > > support" then? > > > > > >> > > > > > > > >> > > To really support packed virtqueue with vhost-vdpa, at that > > > > > >> > > point we would > > > > > >> > > also have to revert this patch. > > > > > >> > > > > > > > >> > > I wasn't sure if you wanted to queue the series for this merge > > > > > >> > > window. > > > > > >> > > In that case do you think it is better to send this patch only > > > > > >> > > for stable > > > > > >> > > branches? > > > > > >> > > > Does this patch make them a NOP? > > > > > >> > > > > > > > >> > > Yep, after applying the "better PACKED support" series and > > > > > >> > > being > > > > > >> > > sure that > > > > > >> > > the IOCTLs of vhost-vdpa support packed virtqueue, we should > > > > > >> > > revert this > > > > > >> > > patch. > > > > > >> > > > > > > > >> > > Let me know if you prefer a different approach. > > > > > >> > > > > > > > >> > > I'm concerned that QEMU uses vhost-vdpa IOCTLs thinking that > > > > > >> > > the kernel > > > > > >> > > interprets them the right way, when it does not. > > > > > >> > > > > > > > >> > > Thanks, > > > > > >> > > Stefano > > > > > >> > > > > > > > >> > > > > > > >> > If this fixes a bug can you add Fixes tags to each of them? Then > > > > > >> > it's ok > > > > > >> > to merge in this window. Probably easier than the elaborate > > > > > >> > mask/unmask dance. > > > > > >> > > > > > >> CCing Shannon (the original author of the "better PACKED support" > > > > > >> series). 
> > > > > >> > > > > > >> IIUC Shannon is going to send a v3 of that series to fix the > > > > > >> documentation, so Shannon can you also add the Fixes tags? > > > > > >> > > > > > >> Thanks, > > > > > >> Stefano > > > > > > > > > > > >Well this is in my tree already. Just reply with > > > > > >Fixes: <> > > > > > >to each and I will add these tags. > > > > > > > > > > I tried, but it is not easy since we added the support for packed > > > > > virtqueue in vdpa and vhost incrementally. > > > > > > > > > > Initially I was thinking of adding the same tag used here: > > > > > > > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") > > > > > > > > > > Then I discovered that vq_state wasn't there, so I was thinking of > > > > > > > > > > Fixes: 530a5678bc00 ("vdpa: support packed virtqueue for > > > > > set/get_vq_state()") > > > > > > > > > > So we would have to backport quite a few patches into the stable > > > > > branches. > > > > > I don't know if it's worth it... > > > > > > > > >
Re: vhost: Fix vhost_task regressions in 6.4-rc
On 6/7/23 3:22 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 07, 2023 at 02:23:36PM -0500, Mike Christie wrote:
>> The following patches were made over Linus's tree which contains a
>> vhost change missing in mst's vhost branch. These patches fix two
>> issues caused by the vhost_task patches:
>> 1. I was setting dev->worker too early and this caused crashes when
>>    vsock would queue work before VHOST_SET_OWNER.
>>
>> 2. The patch that Linus's tree contains which vhost does not yet
>>    have converted vhost_tasks to use CLONE_THREAD. That required a
>>    change to how we set the task state, but I completely messed up
>>    and moved when we set ourself to interruptible too late.
>
> Right. and that's in rc5 yes?

Yes.
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin wrote: > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote: > > On Tue, May 30, 2023 at 9:19 AM Liang Chen > > wrote: > > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > > > wrote: > > > > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > > > The implementation at the moment uses one page per packet in both > > > > > > > the > > > > > > > normal and XDP path. In addition, introducing a module parameter > > > > > > > to enable > > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > > performance gain > > > > > > > in the normal path. > > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > > performance > > > > > > > gain is observed in XDP cpumap: > > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > > fragmentation and > > > > > > > DMA map/unmap support. > > > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > > > > > Why off by default? > > > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > > > The less modes we have the better... > > > > > > > > > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > > > > packet size is very small, it reduces the likelihood of skb > > > > > coalescing. But such cases are rare. 
> > > > > > > > small packets are rare? These workloads are easy to create actually. > > > > Pls try and include benchmark with small packet size. > > > > > > > > > > Sure, Thanks! > > > > Before going ahead and posting v2 patch, I would like to hear more > > advice for the cases of small packets. I have done more performance > > benchmark with small packets since then. Here is a list of iperf > > output, > > > > With PP and PP fragmenting: > > 256K: [ 5] 505.00-510.00 sec 1.34 GBytes 2.31 Gbits/sec0144 > > KBytes > > 1K: [ 5] 30.00-35.00 sec 4.63 GBytes 7.95 Gbits/sec0 > > 223 KBytes > > 2K: [ 5] 65.00-70.00 sec 8.33 GBytes 14.3 Gbits/sec0 > > 324 KBytes > > 4K: [ 5] 30.00-35.00 sec 13.3 GBytes 22.8 Gbits/sec0 > > 1.08 MBytes > > 8K: [ 5] 50.00-55.00 sec 18.9 GBytes 32.4 Gbits/sec0 > > 744 KBytes > > 16K: [ 5] 25.00-30.00 sec 24.6 GBytes 42.3 Gbits/sec0963 > > KBytes > > 32K: [ 5] 45.00-50.00 sec 29.8 GBytes 51.2 Gbits/sec0 1.25 > > MBytes > > 64K: [ 5] 35.00-40.00 sec 34.0 GBytes 58.4 Gbits/sec0 1.70 > > MBytes > > 128K: [ 5] 45.00-50.00 sec 36.7 GBytes 63.1 Gbits/sec0 4.26 > > MBytes > > 256K: [ 5] 30.00-35.00 sec 40.0 GBytes 68.8 Gbits/sec0 3.20 > > MBytes Note that virtio-net driver is lacking things like BQL and others, so it might suffer from buffer bloat for TCP performance. Would you mind to measure with e.g using testpmd on the vhost to see the rx PPS? 
> > > > Without PP: > > 256: [ 5] 680.00-685.00 sec 1.57 GBytes 2.69 Gbits/sec0359 > > KBytes > > 1K: [ 5] 75.00-80.00 sec 5.47 GBytes 9.40 Gbits/sec0730 > > KBytes > > 2K: [ 5] 65.00-70.00 sec 9.46 GBytes 16.2 Gbits/sec0 1.99 > > MBytes > > 4K: [ 5] 30.00-35.00 sec 14.5 GBytes 25.0 Gbits/sec0 1.20 > > MBytes > > 8K: [ 5] 45.00-50.00 sec 19.9 GBytes 34.1 Gbits/sec0 1.72 > > MBytes > > 16K:[ 5] 5.00-10.00 sec 23.8 GBytes 40.9 Gbits/sec0 2.90 > > MBytes > > 32K:[ 5] 15.00-20.00 sec 28.0 GBytes 48.1 Gbits/sec0 3.03 > > MBytes > > 64K:[ 5] 60.00-65.00 sec 31.8 GBytes 54.6 Gbits/sec0 3.05 > > MBytes > > 128K: [ 5] 45.00-50.00 sec 33.0 GBytes 56.6 Gbits/sec1 3.03 > > MBytes > > 256K: [ 5] 25.00-30.00 sec 34.7 GBytes 59.6 Gbits/sec0 3.11 > > MBytes > > > > > > The major factor contributing to the performance drop is the reduction > > of skb coalescing. Additionally, without the page pool, small packets > > can still benefit from the allocation of 8 continuous pages by > > breaking them down into smaller pieces. This effectively reduces the > > frequency of page allocation from the buddy system. For instance, the > > arrival of 32 1K packets only triggers one alloc_page call. Therefore, > > the benefits of using a page pool are limited in such cases. I wonder if we can imp
Re: [PATCH] vhost-vdpa: filter VIRTIO_F_RING_PACKED feature
On Tue, Jun 06, 2023 at 12:09:11PM +0200, Stefano Garzarella wrote: > On Mon, Jun 05, 2023 at 05:44:50PM -0400, Michael S. Tsirkin wrote: > > On Mon, Jun 05, 2023 at 04:56:37PM +0200, Stefano Garzarella wrote: > > > On Mon, Jun 05, 2023 at 09:54:57AM -0400, Michael S. Tsirkin wrote: > > > > On Mon, Jun 05, 2023 at 03:30:35PM +0200, Stefano Garzarella wrote: > > > > > On Mon, Jun 05, 2023 at 09:00:25AM -0400, Michael S. Tsirkin wrote: > > > > > > On Mon, Jun 05, 2023 at 02:54:20PM +0200, Stefano Garzarella wrote: > > > > > > > On Mon, Jun 05, 2023 at 08:41:54AM -0400, Michael S. Tsirkin > > > > > > > wrote: > > > > > > > > On Mon, Jun 05, 2023 at 01:06:44PM +0200, Stefano Garzarella > > > > > > > > wrote: > > > > > > > > > vhost-vdpa IOCTLs (eg. VHOST_GET_VRING_BASE, > > > > > > > > > VHOST_SET_VRING_BASE) > > > > > > > > > don't support packed virtqueue well yet, so let's filter the > > > > > > > > > VIRTIO_F_RING_PACKED feature for now in > > > > > > > > > vhost_vdpa_get_features(). > > > > > > > > > > > > > > > > > > This way, even if the device supports it, we don't risk it > > > > > > > > > being > > > > > > > > > negotiated, then the VMM is unable to set the vring state > > > > > > > > > properly. > > > > > > > > > > > > > > > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") > > > > > > > > > Cc: sta...@vger.kernel.org > > > > > > > > > Signed-off-by: Stefano Garzarella > > > > > > > > > --- > > > > > > > > > > > > > > > > > > Notes: > > > > > > > > > This patch should be applied before the "[PATCH v2 0/3] > > > > > > > > > vhost_vdpa: > > > > > > > > > better PACKED support" series [1] and backported in > > > > > > > > > stable branches. > > > > > > > > > > > > > > > > > > We can revert it when we are sure that everything is > > > > > > > > > working with > > > > > > > > > packed virtqueues. 
> > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Stefano > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > https://lore.kernel.org/virtualization/20230424225031.18947-1-shannon.nel...@amd.com/ > > > > > > > > > > > > > > > > I'm a bit lost here. So why am I merging "better PACKED > > > > > > > > support" then? > > > > > > > > > > > > > > To really support packed virtqueue with vhost-vdpa, at that point > > > > > > > we would > > > > > > > also have to revert this patch. > > > > > > > > > > > > > > I wasn't sure if you wanted to queue the series for this merge > > > > > > > window. > > > > > > > In that case do you think it is better to send this patch only > > > > > > > for stable > > > > > > > branches? > > > > > > > > Does this patch make them a NOP? > > > > > > > > > > > > > > Yep, after applying the "better PACKED support" series and being > > > > > > > sure that > > > > > > > the IOCTLs of vhost-vdpa support packed virtqueue, we should > > > > > > > revert this > > > > > > > patch. > > > > > > > > > > > > > > Let me know if you prefer a different approach. > > > > > > > > > > > > > > I'm concerned that QEMU uses vhost-vdpa IOCTLs thinking that the > > > > > > > kernel > > > > > > > interprets them the right way, when it does not. > > > > > > > > > > > > > > Thanks, > > > > > > > Stefano > > > > > > > > > > > > > > > > > > > If this fixes a bug can you add Fixes tags to each of them? Then > > > > > > it's ok > > > > > > to merge in this window. Probably easier than the elaborate > > > > > > mask/unmask dance. > > > > > > > > > > CCing Shannon (the original author of the "better PACKED support" > > > > > series). > > > > > > > > > > IIUC Shannon is going to send a v3 of that series to fix the > > > > > documentation, so Shannon can you also add the Fixes tags? > > > > > > > > > > Thanks, > > > > > Stefano > > > > > > > > Well this is in my tree already. Just reply with > > > > Fixes: <> > > > > to each and I will add these tags. 
> > > I tried, but it is not easy since we added the support for packed
> > > virtqueue in vdpa and vhost incrementally.
> > >
> > > Initially I was thinking of adding the same tag used here:
> > >
> > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > >
> > > Then I discovered that vq_state wasn't there, so I was thinking of
> > >
> > > Fixes: 530a5678bc00 ("vdpa: support packed virtqueue for set/get_vq_state()")
> > >
> > > So we would have to backport quite a few patches into the stable branches.
> > > I don't know if it's worth it...
> > >
> > > I still think it is better to disable packed in the stable branches,
> > > otherwise I have to make a list of all the patches we need.
> > >
> > > Any other ideas?
> > >
> > > Thanks,
> > > Stefano
> >
> > OK so. You want me to apply this one now, and fixes in the next
> > kernel?
>
> Yep, it seems to me the least risky approach.
>
> Thanks,
> Stefano

I'd like something more general though. Bring back the allowlist?
Jason, your thoughts?

-- 
MST
Re: vhost: Fix vhost_task regressions in 6.4-rc
On Wed, Jun 07, 2023 at 02:23:36PM -0500, Mike Christie wrote:
> The following patches were made over Linus's tree which contains a
> vhost change missing in mst's vhost branch. These patches fix two
> issues caused by the vhost_task patches:
> 1. I was setting dev->worker too early and this caused crashes when
>    vsock would queue work before VHOST_SET_OWNER.
>
> 2. The patch that Linus's tree contains which vhost does not yet
>    have converted vhost_tasks to use CLONE_THREAD. That required a
>    change to how we set the task state, but I completely messed up
>    and moved when we set ourself to interruptible too late.

Right. and that's in rc5 yes?

-- 
MST
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote: > On Tue, May 30, 2023 at 9:19 AM Liang Chen wrote: > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin wrote: > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > > wrote: > > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > > The implementation at the moment uses one page per packet in both > > > > > > the > > > > > > normal and XDP path. In addition, introducing a module parameter to > > > > > > enable > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > performance gain > > > > > > in the normal path. > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > performance > > > > > > gain is observed in XDP cpumap: > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > fragmentation and > > > > > > DMA map/unmap support. > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > > > Why off by default? > > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > > The less modes we have the better... > > > > > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > > > packet size is very small, it reduces the likelihood of skb > > > > coalescing. But such cases are rare. > > > > > > small packets are rare? These workloads are easy to create actually. > > > Pls try and include benchmark with small packet size. > > > > > > > Sure, Thanks! 
> Before going ahead and posting v2 patch, I would like to hear more
> advice for the cases of small packets. I have done more performance
> benchmark with small packets since then. Here is a list of iperf
> output,
>
> With PP and PP fragmenting:
> 256:  [ 5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> 1K:   [ 5]  30.00-35.00 sec   4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> 2K:   [ 5]  65.00-70.00 sec   8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> 4K:   [ 5]  30.00-35.00 sec   13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> 8K:   [ 5]  50.00-55.00 sec   18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> 16K:  [ 5]  25.00-30.00 sec   24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> 32K:  [ 5]  45.00-50.00 sec   29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> 64K:  [ 5]  35.00-40.00 sec   34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> 128K: [ 5]  45.00-50.00 sec   36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> 256K: [ 5]  30.00-35.00 sec   40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
>
> Without PP:
> 256:  [ 5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> 1K:   [ 5]  75.00-80.00 sec   5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> 2K:   [ 5]  65.00-70.00 sec   9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> 4K:   [ 5]  30.00-35.00 sec   14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> 8K:   [ 5]  45.00-50.00 sec   19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> 16K:  [ 5]   5.00-10.00 sec   23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> 32K:  [ 5]  15.00-20.00 sec   28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> 64K:  [ 5]  60.00-65.00 sec   31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> 128K: [ 5]  45.00-50.00 sec   33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> 256K: [ 5]  25.00-30.00 sec   34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
>
> The major factor contributing to the performance drop is the reduction
> of skb coalescing. Additionally, without the page pool, small packets
> can still benefit from the allocation of 8 continuous pages by
> breaking them down into smaller pieces. This effectively reduces the
> frequency of page allocation from the buddy system.
> For instance, the
> arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> the benefits of using a page pool are limited in such cases. In fact,
> without page pool fragmenting enabled, it can even hinder performance
> from this perspective.
>
> Upon further consideration, I tend to believe making page pool the
> default option may not be appropriate. As you pointed out, we cannot
> simply ignore the performance impact on small packets. Any comments on
> this will be much appreciated.
>
> Thanks,
> Liang

So, let's only use page pool for XDP then?

> > The usage of page pool for big mode is being evaluated now. Thanks!
Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP
On Wed, Jun 07, 2023 at 07:05:11AM -0700, Christoph Hellwig wrote:
> On Mon, Jun 05, 2023 at 09:58:21AM +0800, Xuan Zhuo wrote:
> > On Fri, 2 Jun 2023 23:29:02 -0700, Jakub Kicinski wrote:
> > > On Fri, 2 Jun 2023 17:21:56 +0800 Xuan Zhuo wrote:
> > > > Thanks for the help from Christoph.
> > >
> > > That said you haven't CCed him on the series, isn't the general rule to
> > > CC anyone who was involved in previous discussions?
> >
> > Sorry, I forgot to add cc after git format-patch.
>
> So I've been looking for this series elsewhere, but it seems to include
> neither lkml nor the iommu list, so I can't find it. Can you please
> repost it?

I bounced to lkml now - does this help?
vhost: Fix vhost_task regressions in 6.4-rc
The following patches were made over Linus's tree, which contains a vhost change missing from mst's vhost branch. These patches fix two issues caused by the vhost_task patches:

1. I was setting dev->worker too early, which caused crashes when vsock would queue work before VHOST_SET_OWNER.
2. The patch in Linus's tree, which vhost does not yet have, converted vhost_tasks to use CLONE_THREAD. That required a change to how we set the task state, but I completely messed up and moved the point where we set ourselves interruptible too late.
[PATCH 2/2] vhost: Fix worker hangs due to missed wake up calls
We can race where we have added work to the work_list, but vhost_task_fn has passed that check but not yet set us into TASK_INTERRUPTIBLE. wake_up_process will see us in TASK_RUNNING and just return. This bug was introduced in commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") when I moved the setting of TASK_INTERRUPTIBLE to simplify the code and avoid get_signal from logging warnings about being in the wrong state. This moves the setting of TASK_INTERRUPTIBLE back to before we test if we need to stop the task to avoid a possible race there as well. We then have vhost_worker set TASK_RUNNING if it finds work, similar to before. Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") Signed-off-by: Mike Christie --- drivers/vhost/vhost.c | 2 ++ kernel/vhost_task.c | 16 +--- 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 7a9f93eae225..b2722d29e069 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -341,6 +341,8 @@ static bool vhost_worker(void *data) node = llist_del_all(&worker->work_list); if (node) { + __set_current_state(TASK_RUNNING); + node = llist_reverse_order(node); /* make sure flag is seen after deletion */ smp_wmb(); diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c index f80d5c51ae67..da35e5b7f047 100644 --- a/kernel/vhost_task.c +++ b/kernel/vhost_task.c @@ -28,10 +28,6 @@ static int vhost_task_fn(void *data) for (;;) { bool did_work; - /* mb paired w/ vhost_task_stop */ - if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) - break; - if (!dead && signal_pending(current)) { struct ksignal ksig; /* @@ -48,11 +44,17 @@ static int vhost_task_fn(void *data) clear_thread_flag(TIF_SIGPENDING); } + /* mb paired w/ vhost_task_stop */ + set_current_state(TASK_INTERRUPTIBLE); + + if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) { + __set_current_state(TASK_RUNNING); + break; + } + did_work = vtsk->fn(vtsk->data); - if
(!did_work) { - set_current_state(TASK_INTERRUPTIBLE); + if (!did_work) schedule(); - } } complete(&vtsk->exited); -- 2.25.1
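The ordering this patch restores is the classic missed-wakeup discipline: declare the intent to sleep *before* re-checking the condition, and revert to running if work is found. Below is a minimal, single-threaded C model of that corrected ordering — the names (`worker_would_sleep`, `queue_work`, the `state`/`pending` fields) are hypothetical stand-ins for the kernel's `set_current_state()`/`llist_del_all()` machinery, not real APIs:

```c
#include <stdbool.h>

/* Single-threaded model of the corrected loop ordering. */
enum wstate { RUNNING, INTERRUPTIBLE };

struct worker {
    enum wstate state;
    int pending;              /* stand-in for the llist of queued work */
};

/* Queuer side: add work, then wake the task. wake_up_process() is a
 * no-op on a task already in TASK_RUNNING, which is why the ordering
 * on the worker side matters. */
static void queue_work(struct worker *w)
{
    w->pending++;             /* llist_add() */
                              /* wake_up_process() would run here */
}

/* Worker side, post-fix: declare intent to sleep *before* re-checking
 * for work, and go back to RUNNING if any is found. */
static bool worker_would_sleep(struct worker *w)
{
    w->state = INTERRUPTIBLE; /* set_current_state(TASK_INTERRUPTIBLE) */
    if (w->pending) {
        w->state = RUNNING;   /* __set_current_state(TASK_RUNNING) */
        w->pending = 0;       /* llist_del_all() + process the work */
        return false;         /* no schedule(): there was work to do */
    }
    return true;              /* schedule() */
}
```

In the pre-fix ordering, the worker checked `pending` while still in the running state and only then went interruptible, leaving a window where a concurrent queuer's wake-up lands on a TASK_RUNNING task and is lost.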
[PATCH 1/2] vhost: Fix crash during early vhost_transport_send_pkt calls
If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we can race where: 1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue 2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create. 3. vhost_worker_create will set the dev->worker pointer before setting the worker->vtsk pointer. 4. thread0's vhost_work_queue will see the dev->worker pointer is set and try to call vhost_task_wake using not yet set worker->vtsk pointer. 5. We then crash since vtsk is NULL. Before commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads"), we only had the worker pointer so we could just check it to see if VHOST_SET_OWNER has been done. After that commit we have the vhost_worker and vhost_task pointer, so we can now hit the bug above. This patch embeds the vhost_worker in the vhost_dev and moves the work list initialization back to vhost_dev_init, so we can just check the worker.vtsk pointer to check if VHOST_SET_OWNER has been done like before. Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") Signed-off-by: Mike Christie --- drivers/vhost/vhost.c | 50 +++ drivers/vhost/vhost.h | 2 +- 2 files changed, 18 insertions(+), 34 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 074273020849..7a9f93eae225 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -235,7 +235,7 @@ void vhost_dev_flush(struct vhost_dev *dev) { struct vhost_flush_struct flush; - if (dev->worker) { + if (dev->worker.vtsk) { init_completion(&flush.wait_event); vhost_work_init(&flush.work, vhost_flush_work); @@ -247,7 +247,7 @@ EXPORT_SYMBOL_GPL(vhost_dev_flush); void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work) { - if (!dev->worker) + if (!dev->worker.vtsk) return; if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) { @@ -255,8 +255,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work) * sure it was not in the list. * test_and_set_bit() implies a memory barrier. 
*/ - llist_add(&work->node, &dev->worker->work_list); - vhost_task_wake(dev->worker->vtsk); + llist_add(&work->node, &dev->worker.work_list); + vhost_task_wake(dev->worker.vtsk); } } EXPORT_SYMBOL_GPL(vhost_work_queue); @@ -264,7 +264,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue); /* A lockless hint for busy polling code to exit the loop */ bool vhost_has_work(struct vhost_dev *dev) { - return dev->worker && !llist_empty(&dev->worker->work_list); + return !llist_empty(&dev->worker.work_list); } EXPORT_SYMBOL_GPL(vhost_has_work); @@ -456,7 +456,8 @@ void vhost_dev_init(struct vhost_dev *dev, dev->umem = NULL; dev->iotlb = NULL; dev->mm = NULL; - dev->worker = NULL; + memset(&dev->worker, 0, sizeof(dev->worker)); + init_llist_head(&dev->worker.work_list); dev->iov_limit = iov_limit; dev->weight = weight; dev->byte_weight = byte_weight; @@ -530,47 +531,30 @@ static void vhost_detach_mm(struct vhost_dev *dev) static void vhost_worker_free(struct vhost_dev *dev) { - struct vhost_worker *worker = dev->worker; - - if (!worker) + if (!dev->worker.vtsk) return; - dev->worker = NULL; - WARN_ON(!llist_empty(&worker->work_list)); - vhost_task_stop(worker->vtsk); - kfree(worker); + WARN_ON(!llist_empty(&dev->worker.work_list)); + vhost_task_stop(dev->worker.vtsk); + dev->worker.kcov_handle = 0; + dev->worker.vtsk = NULL; } static int vhost_worker_create(struct vhost_dev *dev) { - struct vhost_worker *worker; struct vhost_task *vtsk; char name[TASK_COMM_LEN]; - int ret; - - worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT); - if (!worker) - return -ENOMEM; - dev->worker = worker; - worker->kcov_handle = kcov_common_handle(); - init_llist_head(&worker->work_list); snprintf(name, sizeof(name), "vhost-%d", current->pid); - vtsk = vhost_task_create(vhost_worker, worker, name); - if (!vtsk) { - ret = -ENOMEM; - goto free_worker; - } + vtsk = vhost_task_create(vhost_worker, &dev->worker, name); + if (!vtsk) + return -ENOMEM; - worker->vtsk = vtsk; + dev->worker.kcov_handle = 
kcov_common_handle(); + dev->worker.vtsk = vtsk; vhost_task_start(vtsk); return 0; - -free_worker: - kfree(worker); - dev->worker = NULL; - return ret; } /* Caller should have device mutex */ diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 0308638cdeee..305ec8593d46 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -154,7 +154,7 @@ struct
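The crux of the fix is a publish-last pattern: with the worker embedded in the device, `worker.vtsk` is the only field assigned after everything else is initialized, so queuers can safely use it as the "VHOST_SET_OWNER has been done" flag. A rough user-space sketch of that idea — all names here are hypothetical models, not the actual vhost code:

```c
#include <stddef.h>

struct task_stub { int id; };

struct worker {
    struct task_stub *vtsk;   /* NULL until the worker is fully set up */
    int queued;
};

struct dev {
    struct worker worker;     /* embedded: valid storage from init on */
};

static void dev_init(struct dev *d)
{
    d->worker.vtsk = NULL;    /* work-list init also moves here */
    d->worker.queued = 0;
}

/* Mirrors vhost_work_queue(): if the task pointer has not been
 * published yet, there is nothing safe to wake, so bail out. */
static int work_queue(struct dev *d)
{
    if (!d->worker.vtsk)
        return 0;
    d->worker.queued++;       /* llist_add() + wake d->worker.vtsk */
    return 1;
}

static void set_owner(struct dev *d, struct task_stub *t)
{
    d->worker.vtsk = t;       /* published last, after task creation */
}
```

The buggy variant published the outer `dev->worker` pointer before `worker->vtsk` was assigned, so a queuer could pass the NULL check and still dereference an unset task pointer.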
[PATCH] vdpa/mlx5: Support interrupt bypassing
From: Eli Cohen Add support for generating interrupts from the device directly to the vCPU in the VM, thus avoiding the overhead on the host CPU. When supported, the driver will attempt to allocate a vector for each data virtqueue. If a vector for a virtqueue cannot be provided, it will fall back to QP mode, where notifications go through the driver. In addition, add a shutdown callback to make sure allocated interrupts are released on shutdown, allowing a clean shutdown. Signed-off-by: Eli Cohen Signed-off-by: Saeed Mahameed --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 165 -- drivers/vdpa/mlx5/net/mlx5_vnet.h | 15 +++ 2 files changed, 171 insertions(+), 9 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 279ac6a558d2..9138ef2fb2c8 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -83,6 +83,7 @@ struct mlx5_vq_restore_info { u64 driver_addr; u16 avail_index; u16 used_index; + struct msi_map map; bool ready; bool restore; }; @@ -118,6 +119,7 @@ struct mlx5_vdpa_virtqueue { u16 avail_idx; u16 used_idx; int fw_state; + struct msi_map map; /* keep last in the struct */ struct mlx5_vq_restore_info ri; @@ -808,6 +810,13 @@ static bool counters_supported(const struct mlx5_vdpa_dev *mvdev) BIT_ULL(MLX5_OBJ_TYPE_VIRTIO_Q_COUNTERS); } +static bool msix_mode_supported(struct mlx5_vdpa_dev *mvdev) +{ + return MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, event_mode) & + (1 << MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE) && + pci_msix_can_alloc_dyn(mvdev->mdev->pdev); +} + static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq) { int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in); @@ -849,9 +858,15 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque if (vq_is_tx(mvq->index)) MLX5_SET(virtio_net_q_object, obj_context, tisn_or_qpn, ndev->res.tisn); - MLX5_SET(virtio_q, vq_ctx, event_mode, MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE); + if
(mvq->map.virq) { + MLX5_SET(virtio_q, vq_ctx, event_mode, MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE); + MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->map.index); + } else { + MLX5_SET(virtio_q, vq_ctx, event_mode, MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE); + MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn); + } + MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index); - MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn); MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent); MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0, !!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1))); @@ -1194,6 +1209,56 @@ static void counter_set_dealloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_vir mlx5_vdpa_warn(&ndev->mvdev, "dealloc counter set 0x%x\n", mvq->counter_set_id); } +static irqreturn_t mlx5_vdpa_int_handler(int irq, void *priv) +{ + struct vdpa_callback *cb = priv; + + if (cb->callback) + return cb->callback(cb->private); + + return IRQ_HANDLED; +} + +static void alloc_vector(struct mlx5_vdpa_net *ndev, +struct mlx5_vdpa_virtqueue *mvq) +{ + struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp; + struct mlx5_vdpa_irq_pool_entry *ent; + int err; + int i; + + for (i = 0; i < irqp->num_ent; i++) { + ent = &irqp->entries[i]; + if (!ent->used) { + snprintf(ent->name, MLX5_VDPA_IRQ_NAME_LEN, "%s-vq-%d", +dev_name(&ndev->mvdev.vdev.dev), mvq->index); + ent->dev_id = &ndev->event_cbs[mvq->index]; + err = request_irq(ent->map.virq, mlx5_vdpa_int_handler, 0, + ent->name, ent->dev_id); + if (err) + return; + + ent->used = true; + mvq->map = ent->map; + return; + } + } +} + +static void dealloc_vector(struct mlx5_vdpa_net *ndev, + struct mlx5_vdpa_virtqueue *mvq) +{ + struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp; + int i; + + for (i = 0; i < irqp->num_ent; i++) + if (mvq->map.virq == irqp->entries[i].map.virq) { + free_irq(mvq->map.virq, irqp->entries[i].dev_id); + irqp->entries[i].used = false; + return; + } +} + static int setup_vq(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqueue *mvq) { u16 idx = mvq->index; @@ -1223,27 +1288,31 @@ static int setup
Re: [PATCH RFC 0/4] x86/fixmap: Unify FIXADDR_TOP
On Mon, May 15 2023 at 16:19, Hou Wenlong wrote: > This patchset unifies FIXADDR_TOP as a variable for x86, allowing the > fixmap area to be movable and relocated with the kernel image in the > x86/PIE patchset [0]. This enables the kernel image to be relocated in > the top 512G of the address space. What for? What's the use case. Please provide a proper argument why this is generally useful and important. Thanks, tglx
Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP
On Mon, Jun 05, 2023 at 09:58:21AM +0800, Xuan Zhuo wrote: > On Fri, 2 Jun 2023 23:29:02 -0700, Jakub Kicinski wrote: > > On Fri, 2 Jun 2023 17:21:56 +0800 Xuan Zhuo wrote: > > > Thanks for the help from Christoph. > > > > That said you haven't CCed him on the series, isn't the general rule to > > CC anyone who was involved in previous discussions? > > > Sorry, I forgot to add cc after git format-patch. So I've been looking for this series elsewhere, but it seems to include neither lkml nor the iommu list, so I can't find it. Can you please repost it?
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Wed, Jun 7, 2023 at 5:36 PM Xuan Zhuo wrote: > > On Wed, 7 Jun 2023 17:08:59 +0800, Liang Chen > wrote: > > On Tue, May 30, 2023 at 9:19 AM Liang Chen > > wrote: > > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > > > wrote: > > > > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > > > The implementation at the moment uses one page per packet in both > > > > > > > the > > > > > > > normal and XDP path. In addition, introducing a module parameter > > > > > > > to enable > > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > > performance gain > > > > > > > in the normal path. > > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > > performance > > > > > > > gain is observed in XDP cpumap: > > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > > fragmentation and > > > > > > > DMA map/unmap support. > > > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > > > > > Why off by default? > > > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > > > The less modes we have the better... > > > > > > > > > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > > > > packet size is very small, it reduces the likelihood of skb > > > > > coalescing. But such cases are rare. 
> > > > > > > > small packets are rare? These workloads are easy to create actually. > > > > Pls try and include benchmark with small packet size. > > > > > > > > > > Sure, Thanks! > > > > Before going ahead and posting v2 patch, I would like to hear more > > advice for the cases of small packets. I have done more performance > > benchmark with small packets since then. Here is a list of iperf > > output, > > Could you show the commnad line? > > Thanks > > Sure. iperf3 -c -i 5 -f g -t 0 -l > > > > With PP and PP fragmenting: > > 256K: [ 5] 505.00-510.00 sec 1.34 GBytes 2.31 Gbits/sec0144 > > KBytes > > 1K: [ 5] 30.00-35.00 sec 4.63 GBytes 7.95 Gbits/sec0 > > 223 KBytes > > 2K: [ 5] 65.00-70.00 sec 8.33 GBytes 14.3 Gbits/sec0 > > 324 KBytes > > 4K: [ 5] 30.00-35.00 sec 13.3 GBytes 22.8 Gbits/sec0 > > 1.08 MBytes > > 8K: [ 5] 50.00-55.00 sec 18.9 GBytes 32.4 Gbits/sec0 > > 744 KBytes > > 16K: [ 5] 25.00-30.00 sec 24.6 GBytes 42.3 Gbits/sec0963 > > KBytes > > 32K: [ 5] 45.00-50.00 sec 29.8 GBytes 51.2 Gbits/sec0 1.25 > > MBytes > > 64K: [ 5] 35.00-40.00 sec 34.0 GBytes 58.4 Gbits/sec0 1.70 > > MBytes > > 128K: [ 5] 45.00-50.00 sec 36.7 GBytes 63.1 Gbits/sec0 4.26 > > MBytes > > 256K: [ 5] 30.00-35.00 sec 40.0 GBytes 68.8 Gbits/sec0 3.20 > > MBytes > > > > Without PP: > > 256: [ 5] 680.00-685.00 sec 1.57 GBytes 2.69 Gbits/sec0359 > > KBytes > > 1K: [ 5] 75.00-80.00 sec 5.47 GBytes 9.40 Gbits/sec0730 > > KBytes > > 2K: [ 5] 65.00-70.00 sec 9.46 GBytes 16.2 Gbits/sec0 1.99 > > MBytes > > 4K: [ 5] 30.00-35.00 sec 14.5 GBytes 25.0 Gbits/sec0 1.20 > > MBytes > > 8K: [ 5] 45.00-50.00 sec 19.9 GBytes 34.1 Gbits/sec0 1.72 > > MBytes > > 16K:[ 5] 5.00-10.00 sec 23.8 GBytes 40.9 Gbits/sec0 2.90 > > MBytes > > 32K:[ 5] 15.00-20.00 sec 28.0 GBytes 48.1 Gbits/sec0 3.03 > > MBytes > > 64K:[ 5] 60.00-65.00 sec 31.8 GBytes 54.6 Gbits/sec0 3.05 > > MBytes > > 128K: [ 5] 45.00-50.00 sec 33.0 GBytes 56.6 Gbits/sec1 3.03 > > MBytes > > 256K: [ 5] 25.00-30.00 sec 34.7 GBytes 59.6 
Gbits/sec0 3.11 > > MBytes > > > > > > The major factor contributing to the performance drop is the reduction > > of skb coalescing. Additionally, without the page pool, small packets > > can still benefit from the allocation of 8 continuous pages by > > breaking them down into smaller pieces. This effectively reduces the > > frequency of page allocation from the buddy system. For instance, the > > arrival of 32 1K packets only triggers one alloc_page call. Therefore, > > the benefits of using a page pool are limited in such cases. In fact, > > without page pool fragmenting enabled, it can even hinder performance > > from this perspective. > > > > Upon further consideration, I ten
Re: [PATCH RFC 0/4] x86/fixmap: Unify FIXADDR_TOP
On 5/15/23 01:19, Hou Wenlong wrote: > This patchset unifies FIXADDR_TOP as a variable for x86, allowing the > fixmap area to be movable and relocated with the kernel image in the > x86/PIE patchset [0]. This enables the kernel image to be relocated in > the top 512G of the address space. What problems does this patch set solve? How might that solution be visible to end users? Why is this problem important to you? Also, while you're waiting for someone to review _your_ code, have you considered reviewing anyone else's code? I don't think I've seen any review activity from you lately.
Re: [PATCH] vhost-vdpa: filter VIRTIO_F_RING_PACKED feature
On Wed, Jun 07, 2023 at 10:39:15AM +0200, Stefano Garzarella wrote: > On Tue, Jun 6, 2023 at 2:58 PM Michael S. Tsirkin wrote: > > > > On Tue, Jun 06, 2023 at 09:29:22AM +0800, Jason Wang wrote: > > > On Mon, Jun 5, 2023 at 10:58 PM Stefano Garzarella > > > wrote: > > > > > > > > On Mon, Jun 05, 2023 at 09:54:57AM -0400, Michael S. Tsirkin wrote: > > > > >On Mon, Jun 05, 2023 at 03:30:35PM +0200, Stefano Garzarella wrote: > > > > >> On Mon, Jun 05, 2023 at 09:00:25AM -0400, Michael S. Tsirkin wrote: > > > > >> > On Mon, Jun 05, 2023 at 02:54:20PM +0200, Stefano Garzarella wrote: > > > > >> > > On Mon, Jun 05, 2023 at 08:41:54AM -0400, Michael S. Tsirkin > > > > >> > > wrote: > > > > >> > > > On Mon, Jun 05, 2023 at 01:06:44PM +0200, Stefano Garzarella > > > > >> > > > wrote: > > > > >> > > > > vhost-vdpa IOCTLs (eg. VHOST_GET_VRING_BASE, > > > > >> > > > > VHOST_SET_VRING_BASE) > > > > >> > > > > don't support packed virtqueue well yet, so let's filter the > > > > >> > > > > VIRTIO_F_RING_PACKED feature for now in > > > > >> > > > > vhost_vdpa_get_features(). > > > > >> > > > > > > > > >> > > > > This way, even if the device supports it, we don't risk it > > > > >> > > > > being > > > > >> > > > > negotiated, then the VMM is unable to set the vring state > > > > >> > > > > properly. > > > > >> > > > > > > > > >> > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") > > > > >> > > > > Cc: sta...@vger.kernel.org > > > > >> > > > > Signed-off-by: Stefano Garzarella > > > > >> > > > > --- > > > > >> > > > > > > > > >> > > > > Notes: > > > > >> > > > > This patch should be applied before the "[PATCH v2 0/3] > > > > >> > > > > vhost_vdpa: > > > > >> > > > > better PACKED support" series [1] and backported in > > > > >> > > > > stable branches. > > > > >> > > > > > > > > >> > > > > We can revert it when we are sure that everything is > > > > >> > > > > working with > > > > >> > > > > packed virtqueues. 
> > > > >> > > > > > > > > >> > > > > Thanks, > > > > >> > > > > Stefano > > > > >> > > > > > > > > >> > > > > [1] > > > > >> > > > > https://lore.kernel.org/virtualization/20230424225031.18947-1-shannon.nel...@amd.com/ > > > > >> > > > > > > > >> > > > I'm a bit lost here. So why am I merging "better PACKED > > > > >> > > > support" then? > > > > >> > > > > > > >> > > To really support packed virtqueue with vhost-vdpa, at that > > > > >> > > point we would > > > > >> > > also have to revert this patch. > > > > >> > > > > > > >> > > I wasn't sure if you wanted to queue the series for this merge > > > > >> > > window. > > > > >> > > In that case do you think it is better to send this patch only > > > > >> > > for stable > > > > >> > > branches? > > > > >> > > > Does this patch make them a NOP? > > > > >> > > > > > > >> > > Yep, after applying the "better PACKED support" series and being > > > > >> > > sure that > > > > >> > > the IOCTLs of vhost-vdpa support packed virtqueue, we should > > > > >> > > revert this > > > > >> > > patch. > > > > >> > > > > > > >> > > Let me know if you prefer a different approach. > > > > >> > > > > > > >> > > I'm concerned that QEMU uses vhost-vdpa IOCTLs thinking that the > > > > >> > > kernel > > > > >> > > interprets them the right way, when it does not. > > > > >> > > > > > > >> > > Thanks, > > > > >> > > Stefano > > > > >> > > > > > > >> > > > > > >> > If this fixes a bug can you add Fixes tags to each of them? Then > > > > >> > it's ok > > > > >> > to merge in this window. Probably easier than the elaborate > > > > >> > mask/unmask dance. > > > > >> > > > > >> CCing Shannon (the original author of the "better PACKED support" > > > > >> series). > > > > >> > > > > >> IIUC Shannon is going to send a v3 of that series to fix the > > > > >> documentation, so Shannon can you also add the Fixes tags? > > > > >> > > > > >> Thanks, > > > > >> Stefano > > > > > > > > > >Well this is in my tree already. 
Just reply with > > > > >Fixes: <> > > > > >to each and I will add these tags. > > > > > > > > I tried, but it is not easy since we added the support for packed > > > > virtqueue in vdpa and vhost incrementally. > > > > > > > > Initially I was thinking of adding the same tag used here: > > > > > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") > > > > > > > > Then I discovered that vq_state wasn't there, so I was thinking of > > > > > > > > Fixes: 530a5678bc00 ("vdpa: support packed virtqueue for > > > > set/get_vq_state()") > > > > > > > > So we would have to backport quite a few patches into the stable > > > > branches. > > > > I don't know if it's worth it... > > > > > > > > I still think it is better to disable packed in the stable branches, > > > > otherwise I have to make a list of all the patches we need. > > > > > > > > Any other ideas? > > > > > > AFAIK, except for vp_vdpa, pds seems to be the first parent that > > > supports packed virtqueue. Users should not notice anything wrong if > > > they don't
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Wed, 7 Jun 2023 17:08:59 +0800, Liang Chen wrote: > On Tue, May 30, 2023 at 9:19 AM Liang Chen wrote: > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin wrote: > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > > wrote: > > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > > The implementation at the moment uses one page per packet in both > > > > > > the > > > > > > normal and XDP path. In addition, introducing a module parameter to > > > > > > enable > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > performance gain > > > > > > in the normal path. > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > performance > > > > > > gain is observed in XDP cpumap: > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > fragmentation and > > > > > > DMA map/unmap support. > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > > > Why off by default? > > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > > The less modes we have the better... > > > > > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > > > packet size is very small, it reduces the likelihood of skb > > > > coalescing. But such cases are rare. > > > > > > small packets are rare? These workloads are easy to create actually. > > > Pls try and include benchmark with small packet size. > > > > > > > Sure, Thanks! 
> Before going ahead and posting v2 patch, I would like to hear more
> advice for the cases of small packets. I have done more performance
> benchmark with small packets since then. Here is a list of iperf
> output,

Could you show the command line?

Thanks

> With PP and PP fragmenting:
> 256:  [ 5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec  0   144 KBytes
> 1K:   [ 5]  30.00-35.00 sec   4.63 GBytes  7.95 Gbits/sec  0   223 KBytes
> 2K:   [ 5]  65.00-70.00 sec   8.33 GBytes  14.3 Gbits/sec  0   324 KBytes
> 4K:   [ 5]  30.00-35.00 sec  13.3 GBytes   22.8 Gbits/sec  0  1.08 MBytes
> 8K:   [ 5]  50.00-55.00 sec  18.9 GBytes   32.4 Gbits/sec  0   744 KBytes
> 16K:  [ 5]  25.00-30.00 sec  24.6 GBytes   42.3 Gbits/sec  0   963 KBytes
> 32K:  [ 5]  45.00-50.00 sec  29.8 GBytes   51.2 Gbits/sec  0  1.25 MBytes
> 64K:  [ 5]  35.00-40.00 sec  34.0 GBytes   58.4 Gbits/sec  0  1.70 MBytes
> 128K: [ 5]  45.00-50.00 sec  36.7 GBytes   63.1 Gbits/sec  0  4.26 MBytes
> 256K: [ 5]  30.00-35.00 sec  40.0 GBytes   68.8 Gbits/sec  0  3.20 MBytes
>
> Without PP:
> 256:  [ 5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec  0   359 KBytes
> 1K:   [ 5]  75.00-80.00 sec   5.47 GBytes  9.40 Gbits/sec  0   730 KBytes
> 2K:   [ 5]  65.00-70.00 sec   9.46 GBytes  16.2 Gbits/sec  0  1.99 MBytes
> 4K:   [ 5]  30.00-35.00 sec  14.5 GBytes   25.0 Gbits/sec  0  1.20 MBytes
> 8K:   [ 5]  45.00-50.00 sec  19.9 GBytes   34.1 Gbits/sec  0  1.72 MBytes
> 16K:  [ 5]   5.00-10.00 sec  23.8 GBytes   40.9 Gbits/sec  0  2.90 MBytes
> 32K:  [ 5]  15.00-20.00 sec  28.0 GBytes   48.1 Gbits/sec  0  3.03 MBytes
> 64K:  [ 5]  60.00-65.00 sec  31.8 GBytes   54.6 Gbits/sec  0  3.05 MBytes
> 128K: [ 5]  45.00-50.00 sec  33.0 GBytes   56.6 Gbits/sec  1  3.03 MBytes
> 256K: [ 5]  25.00-30.00 sec  34.7 GBytes   59.6 Gbits/sec  0  3.11 MBytes
>
> The major factor contributing to the performance drop is the reduction
> of skb coalescing. Additionally, without the page pool, small packets
> can still benefit from the allocation of 8 continuous pages by
> breaking them down into smaller pieces.
This effectively reduces the > frequency of page allocation from the buddy system. For instance, the > arrival of 32 1K packets only triggers one alloc_page call. Therefore, > the benefits of using a page pool are limited in such cases. In fact, > without page pool fragmenting enabled, it can even hinder performance > from this perspective. > > Upon further consideration, I tend to believe making page pool the > default option may not be appropriate. As you pointed out, we cannot > simply ignore the performance impact on small packets. Any comments on > this will be much appreciated. > > > Thanks, > Liang > > > > > > The usage of page pool for big mode is being evaluated now. Thanks! > > > > > > > > > > --- > > > > > >
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Wed, 7 Jun 2023 17:11:44 +0800, Liang Chen wrote: > On Wed, May 31, 2023 at 11:12 AM Xuan Zhuo wrote: > > > > On Mon, 29 May 2023 15:28:17 +0800, Liang Chen > > wrote: > > > On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote: > > > > > On Fri, May 26, 2023 at 2:51 PM Jason Wang > > > > > wrote: > > > > > > > > > > > > On Fri, May 26, 2023 at 1:46 PM Liang Chen > > > > > > wrote: > > > > > > > > > > > > > > The implementation at the moment uses one page per packet in both > > > > > > > the > > > > > > > normal and XDP path. > > > > > > > > > > > > It's better to explain why we need a page pool and how it can help > > > > > > the > > > > > > performance. > > > > > > > > > > > > > > > > Sure, I will include that on v2. > > > > > > > In addition, introducing a module parameter to enable > > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > If page pool wins for most of the cases, any reason to disable it > > > > > > by default? > > > > > > > > > > > > > > > > Thank you for raising the point. It does make sense to enable it by > > > > > default. > > > > > > > > I'd like to see more benchmarks pls then, with a variety of packet > > > > sizes, udp and tcp. > > > > > > > > > > Sure, more benchmarks will be provided. Thanks. > > > > > > I think so. > > > > I did this, but I did not found any improve. So I gave up it. > > > > Thanks. > > > > > > Our UDP benchmark shows a steady 0.8 percent change in PPS > measurement. However, when conducting iperf TCP stream performance > testing, the results vary depending on the packet size and testing > setup. With small packet sizes, the performance actually drops > slightly due to the reasons I explained in the previous email. On the > other hand, with large packets, we need to ensure that the sender side > doesn't become the bottleneck. 
To achieve this, our setup uses a > single-core vm to keep the receiver busy, which allows us to identify > performance differences in the receiving path. Could you show some numbers? Thanks. > > > Thanks, > Liang > > > > > > > > > > > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > > performance gain > > > > > > > in the normal path. > > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > > performance > > > > > > > gain is observed in XDP cpumap: > > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > Please show more details on the test. E.g which kinds of tests have > > > > > > you measured? > > > > > > > > > > > > Btw, it would be better to measure PPS as well. > > > > > > > > > > > > > > > > Sure. It will be added on v2. > > > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > > fragmentation and > > > > > > > DMA map/unmap support. > > > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > --- > > > > > > > drivers/net/virtio_net.c | 188 > > > > > > > ++- > > > > > > > > > > > > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do > > > > > > the ifdef tricks at least. > > > > > > > > > > > > > > > > Sure. it will be done on v2. 
> > > > > > > 1 file changed, 146 insertions(+), 42 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > > > > index c5dca0d92e64..99c0ca0c1781 100644 > > > > > > > --- a/drivers/net/virtio_net.c > > > > > > > +++ b/drivers/net/virtio_net.c > > > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444); > > > > > > > module_param(gso, bool, 0444); > > > > > > > module_param(napi_tx, bool, 0644); > > > > > > > > > > > > > > +static bool page_pool_enabled; > > > > > > > +module_param(page_pool_enabled, bool, 0400); > > > > > > > + > > > > > > > /* FIXME: MTU in config. */ > > > > > > > #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) > > > > > > > #define GOOD_COPY_LEN 128 > > > > > > > @@ -159,6 +162,9 @@ struct receive_queue { > > > > > > > /* Chain pages by the private ptr. */ > > > > > > > struct page *pages; > > > > > > > > > > > > > > + /* Page pool */ > > > > > > > + struct page_pool *page_pool; > > > > > > > + > > > > > > > /* Average packet length for mergeable receive buffers. */ > > > > > > > struct ewma_pkt_len mrg_avg_pkt_len; > > > > > > > > > > > > > > @@ -459,6 +465,14 @@ static struct sk_buff > > > > > > > *virtnet_build_skb(void *buf, unsigned int buflen, > > > > > > > return skb; > > > > > > > } > > > > > > > > > > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct > > > > > > > page *page) > > >
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Wed, May 31, 2023 at 11:12 AM Xuan Zhuo wrote: > > On Mon, 29 May 2023 15:28:17 +0800, Liang Chen > wrote: > > On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin wrote: > > > > > > On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote: > > > > On Fri, May 26, 2023 at 2:51 PM Jason Wang wrote: > > > > > > > > > > On Fri, May 26, 2023 at 1:46 PM Liang Chen > > > > > wrote: > > > > > > > > > > > > The implementation at the moment uses one page per packet in both > > > > > > the > > > > > > normal and XDP path. > > > > > > > > > > It's better to explain why we need a page pool and how it can help the > > > > > performance. > > > > > > > > > > > > > Sure, I will include that on v2. > > > > > > In addition, introducing a module parameter to enable > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > If page pool wins for most of the cases, any reason to disable it by > > > > > default? > > > > > > > > > > > > > Thank you for raising the point. It does make sense to enable it by > > > > default. > > > > > > I'd like to see more benchmarks pls then, with a variety of packet > > > sizes, udp and tcp. > > > > > > > Sure, more benchmarks will be provided. Thanks. > > > I think so. > > I did this, but I did not found any improve. So I gave up it. > > Thanks. > > Our UDP benchmark shows a steady 0.8 percent change in PPS measurement. However, when conducting iperf TCP stream performance testing, the results vary depending on the packet size and testing setup. With small packet sizes, the performance actually drops slightly due to the reasons I explained in the previous email. On the other hand, with large packets, we need to ensure that the sender side doesn't become the bottleneck. To achieve this, our setup uses a single-core vm to keep the receiver busy, which allows us to identify performance differences in the receiving path. 
Thanks, Liang > > > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > performance gain > > > > > > in the normal path. > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > performance > > > > > > gain is observed in XDP cpumap: > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > Please show more details on the test. E.g which kinds of tests have > > > > > you measured? > > > > > > > > > > Btw, it would be better to measure PPS as well. > > > > > > > > > > > > > Sure. It will be added on v2. > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > fragmentation and > > > > > > DMA map/unmap support. > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > --- > > > > > > drivers/net/virtio_net.c | 188 > > > > > > ++- > > > > > > > > > > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do > > > > > the ifdef tricks at least. > > > > > > > > > > > > > Sure. it will be done on v2. > > > > > > 1 file changed, 146 insertions(+), 42 deletions(-) > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > > > index c5dca0d92e64..99c0ca0c1781 100644 > > > > > > --- a/drivers/net/virtio_net.c > > > > > > +++ b/drivers/net/virtio_net.c > > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444); > > > > > > module_param(gso, bool, 0444); > > > > > > module_param(napi_tx, bool, 0644); > > > > > > > > > > > > +static bool page_pool_enabled; > > > > > > +module_param(page_pool_enabled, bool, 0400); > > > > > > + > > > > > > /* FIXME: MTU in config. 
*/ > > > > > > #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) > > > > > > #define GOOD_COPY_LEN 128 > > > > > > @@ -159,6 +162,9 @@ struct receive_queue { > > > > > > /* Chain pages by the private ptr. */ > > > > > > struct page *pages; > > > > > > > > > > > > + /* Page pool */ > > > > > > + struct page_pool *page_pool; > > > > > > + > > > > > > /* Average packet length for mergeable receive buffers. */ > > > > > > struct ewma_pkt_len mrg_avg_pkt_len; > > > > > > > > > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void > > > > > > *buf, unsigned int buflen, > > > > > > return skb; > > > > > > } > > > > > > > > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page > > > > > > *page) > > > > > > +{ > > > > > > + if (rq->page_pool) > > > > > > + page_pool_put_full_page(rq->page_pool, page, true); > > > > > > + else > > > > > > + put_page(page); > > > > > > +} > > > > > > + > > > > > > /* Called from bottom half context */ > > > > > > static struct sk_buff *page_to_skb(struct virtnet_info *vi, > > > > > >
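[Editor's note] The recycling idea behind the `page_pool` change above can be sketched in a few lines of user-space C. This is a toy model only, assuming nothing about the kernel API: `toy_pool`, `toy_pool_get`, and `toy_pool_put` are invented names standing in for `page_pool_alloc_pages()`/`page_pool_put_full_page()`, and `malloc` stands in for `alloc_page()`. The point is simply that a small per-queue cache lets the hot path skip the general-purpose allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy user-space model of page-pool recycling: a small cache of
 * previously used buffers that the hot path can reuse without going
 * back to the underlying allocator.  Names are illustrative only. */
#define POOL_CAP 64
#define PAGE_SZ  4096

struct toy_pool {
    void *cache[POOL_CAP];
    int   top;           /* number of cached buffers */
    int   slow_allocs;   /* trips to the underlying allocator */
};

static void *toy_pool_get(struct toy_pool *p)
{
    if (p->top > 0)
        return p->cache[--p->top];   /* recycled, no allocator call */
    p->slow_allocs++;
    return malloc(PAGE_SZ);          /* slow path, like alloc_page() */
}

static void toy_pool_put(struct toy_pool *p, void *page)
{
    if (p->top < POOL_CAP)
        p->cache[p->top++] = page;   /* keep it for the next packet */
    else
        free(page);                  /* pool full, release for real */
}
```

Under steady RX load, get/put pairs cycle the same buffers, so `slow_allocs` stays flat; that avoided allocator round trip is where the XDP cpumap gain quoted above comes from.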
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Tue, May 30, 2023 at 9:19 AM Liang Chen wrote: > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin wrote: > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > The implementation at the moment uses one page per packet in both the > > > > > normal and XDP path. In addition, introducing a module parameter to > > > > > enable > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > In single-core vm testing environments, it gives a modest performance > > > > > gain > > > > > in the normal path. > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > performance > > > > > gain is observed in XDP cpumap: > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > fragmentation and > > > > > DMA map/unmap support. > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > Why off by default? > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > The less modes we have the better... > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > > packet size is very small, it reduces the likelihood of skb > > > coalescing. But such cases are rare. > > > > small packets are rare? These workloads are easy to create actually. > > Pls try and include benchmark with small packet size. > > > > Sure, Thanks! Before going ahead and posting v2 patch, I would like to hear more advice for the cases of small packets. 
I have done more performance benchmarks with small packets since then. Here is a list of iperf output.

With PP and PP fragmenting:
256:  [ 5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec  0   144 KBytes
1K:   [ 5]  30.00-35.00 sec  4.63 GBytes  7.95 Gbits/sec  0   223 KBytes
2K:   [ 5]  65.00-70.00 sec  8.33 GBytes  14.3 Gbits/sec  0   324 KBytes
4K:   [ 5]  30.00-35.00 sec  13.3 GBytes  22.8 Gbits/sec  0  1.08 MBytes
8K:   [ 5]  50.00-55.00 sec  18.9 GBytes  32.4 Gbits/sec  0   744 KBytes
16K:  [ 5]  25.00-30.00 sec  24.6 GBytes  42.3 Gbits/sec  0   963 KBytes
32K:  [ 5]  45.00-50.00 sec  29.8 GBytes  51.2 Gbits/sec  0  1.25 MBytes
64K:  [ 5]  35.00-40.00 sec  34.0 GBytes  58.4 Gbits/sec  0  1.70 MBytes
128K: [ 5]  45.00-50.00 sec  36.7 GBytes  63.1 Gbits/sec  0  4.26 MBytes
256K: [ 5]  30.00-35.00 sec  40.0 GBytes  68.8 Gbits/sec  0  3.20 MBytes

Without PP:
256:  [ 5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec  0   359 KBytes
1K:   [ 5]  75.00-80.00 sec  5.47 GBytes  9.40 Gbits/sec  0   730 KBytes
2K:   [ 5]  65.00-70.00 sec  9.46 GBytes  16.2 Gbits/sec  0  1.99 MBytes
4K:   [ 5]  30.00-35.00 sec  14.5 GBytes  25.0 Gbits/sec  0  1.20 MBytes
8K:   [ 5]  45.00-50.00 sec  19.9 GBytes  34.1 Gbits/sec  0  1.72 MBytes
16K:  [ 5]   5.00-10.00 sec  23.8 GBytes  40.9 Gbits/sec  0  2.90 MBytes
32K:  [ 5]  15.00-20.00 sec  28.0 GBytes  48.1 Gbits/sec  0  3.03 MBytes
64K:  [ 5]  60.00-65.00 sec  31.8 GBytes  54.6 Gbits/sec  0  3.05 MBytes
128K: [ 5]  45.00-50.00 sec  33.0 GBytes  56.6 Gbits/sec  1  3.03 MBytes
256K: [ 5]  25.00-30.00 sec  34.7 GBytes  59.6 Gbits/sec  0  3.11 MBytes

The major factor contributing to the performance drop is the reduction of skb coalescing. Additionally, without the page pool, small packets can still benefit from the allocation of 8 contiguous pages by breaking them down into smaller pieces. This effectively reduces the frequency of page allocations from the buddy system. For instance, the arrival of 32 1K packets triggers only one alloc_page call. Therefore, the benefits of using a page pool are limited in such cases.
In fact, without page pool fragmenting enabled, it can even hinder performance from this perspective. Upon further consideration, I tend to believe making page pool the default option may not be appropriate. As you pointed out, we cannot simply ignore the performance impact on small packets. Any comments on this will be much appreciated. Thanks, Liang > > > The usage of page pool for big mode is being evaluated now. Thanks! > > > > > > > > --- > > > > > drivers/net/virtio_net.c | 188 > > > > > ++- > > > > > 1 file changed, 146 insertions(+), 42 deletions(-) > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > > index c5dca0d92e64..99c0ca0c1781 100644 > > > > > --- a/drivers/net/virtio_net.c > > > > > +++ b/drivers/net/virtio_net.
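[Editor's note] The "32 1K packets, one alloc_page call" claim above is easy to check arithmetically, assuming an order-3 (8-page) chunk of 4 KiB pages carved into fixed-size per-packet buffers; the helper name below is invented for illustration, not taken from the driver.

```c
#include <assert.h>

/* How many per-packet buffers fit in one higher-order allocation?
 * order 3 means 1 << 3 = 8 contiguous pages, i.e. a 32 KiB chunk. */
static int packets_per_alloc(int order, int page_size, int pkt_buf)
{
    int chunk = (1 << order) * page_size;
    return chunk / pkt_buf;
}
```

With 1 KiB buffers this gives 32 packets per allocator call, which is why the buddy-allocator path is already cheap for small packets and the pool's recycling advantage shrinks.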
Re: [PATCH 1/1] vhost: Fix crash during early vhost_transport_send_pkt calls
On Tue, Jun 06, 2023 at 12:19:10PM -0500, Mike Christie wrote:
On 6/6/23 4:49 AM, Stefano Garzarella wrote:
On Mon, Jun 05, 2023 at 01:57:30PM -0500, Mike Christie wrote:

If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we can race where:

1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue
2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create.
3. vhost_worker_create will set the dev->worker pointer before setting the worker->vtsk pointer.
4. thread0's vhost_work_queue will see the dev->worker pointer is set and try to call vhost_task_wake using the not-yet-set worker->vtsk pointer.
5. We then crash since vtsk is NULL.

Before commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads"), we only had the worker pointer, so we could just check it to see if VHOST_SET_OWNER had been done. After that commit we have the vhost_worker and vhost_task pointers, so we can now hit the bug above.

This patch embeds the vhost_worker in the vhost_dev, so we can just check the worker.vtsk pointer to see if VHOST_SET_OWNER has been done, like before.

Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")

We should add:
Reported-by: syzbot+d0d442c22fa8db45f...@syzkaller.appspotmail.com

Ok. Will do.

-	}
+	vtsk = vhost_task_create(vhost_worker, &dev->worker, name);
+	if (!vtsk)
+		return -ENOMEM;

-	worker->vtsk = vtsk;
+	dev->worker.kcov_handle = kcov_common_handle();
+	dev->worker.vtsk = vtsk;

vhost_work_queue() is called by vhost_transport_send_pkt() without holding vhost_dev.mutex (like vhost_poll_queue() in several places). If vhost_work_queue() finds dev->worker.vtsk not NULL, how can we be sure that, for example, `work_list` has been initialized?

Maybe I'm overthinking, since we didn't have this problem before, or the race window is so short that it never happened.
Yeah, I dropped the READ/WRITE_ONCE use because I didn't think we needed it for the vhost_vsock_start case, and for the case you mentioned above, I wondered the same thing as you but was not sure so I added the same behavior as before. When I read memory-barriers.txt, it sounds like we've been getting lucky. Yep, it happened to me too before I highlighted it :-) I'll add back the READ/WRITE_ONCE for vtsk access since that's what we are keying off as signaling that the worker is ready to be used. I didn't see any type of perf hit when using it, and from the memory-barriers.txt doc it sounds like that's what we should be doing. LGTM, I just wonder if RCU on dev->worker can save us from future problems, but maybe better to do it in the next release. Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
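[Editor's note] The publish-before-init hazard and the READ/WRITE_ONCE discussion above can be sketched in user-space C11 terms. This is a minimal model, not the vhost code: `struct worker`, `ready_field`, and `published` are invented names, and C11 release/acquire atomics stand in for the kernel's WRITE_ONCE/READ_ONCE plus the implied ordering. The rule it illustrates: finish initializing the structure, then publish the pointer with release semantics, so a reader that observes a non-NULL pointer is guaranteed to see the initialized fields.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Model of the race: a worker must be fully initialized before the
 * pointer that publishes it becomes visible to other threads. */
struct worker {
    int ready_field;                  /* stands in for worker->vtsk etc. */
};

static struct worker w;
static _Atomic(struct worker *) published = NULL;

static void *owner_thread(void *arg)   /* plays VHOST_SET_OWNER */
{
    (void)arg;
    w.ready_field = 42;                /* init BEFORE publishing */
    /* release store: the init above cannot be reordered past this */
    atomic_store_explicit(&published, &w, memory_order_release);
    return NULL;
}

static struct worker *wait_for_worker(void)  /* plays vhost_work_queue */
{
    struct worker *p;
    /* acquire load pairs with the release store: once non-NULL is
     * seen, all writes made before publication are visible */
    while (!(p = atomic_load_explicit(&published, memory_order_acquire)))
        ;
    return p;
}
```

Storing the pointer first and the fields afterwards (the pre-fix ordering) would let the reader dereference a half-built worker, which is exactly the NULL `vtsk` crash the patch describes.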
Re: [PATCH] vhost-vdpa: filter VIRTIO_F_RING_PACKED feature
On Tue, Jun 6, 2023 at 2:58 PM Michael S. Tsirkin wrote: > > On Tue, Jun 06, 2023 at 09:29:22AM +0800, Jason Wang wrote: > > On Mon, Jun 5, 2023 at 10:58 PM Stefano Garzarella > > wrote: > > > > > > On Mon, Jun 05, 2023 at 09:54:57AM -0400, Michael S. Tsirkin wrote: > > > >On Mon, Jun 05, 2023 at 03:30:35PM +0200, Stefano Garzarella wrote: > > > >> On Mon, Jun 05, 2023 at 09:00:25AM -0400, Michael S. Tsirkin wrote: > > > >> > On Mon, Jun 05, 2023 at 02:54:20PM +0200, Stefano Garzarella wrote: > > > >> > > On Mon, Jun 05, 2023 at 08:41:54AM -0400, Michael S. Tsirkin wrote: > > > >> > > > On Mon, Jun 05, 2023 at 01:06:44PM +0200, Stefano Garzarella > > > >> > > > wrote: > > > >> > > > > vhost-vdpa IOCTLs (eg. VHOST_GET_VRING_BASE, > > > >> > > > > VHOST_SET_VRING_BASE) > > > >> > > > > don't support packed virtqueue well yet, so let's filter the > > > >> > > > > VIRTIO_F_RING_PACKED feature for now in > > > >> > > > > vhost_vdpa_get_features(). > > > >> > > > > > > > >> > > > > This way, even if the device supports it, we don't risk it > > > >> > > > > being > > > >> > > > > negotiated, then the VMM is unable to set the vring state > > > >> > > > > properly. > > > >> > > > > > > > >> > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") > > > >> > > > > Cc: sta...@vger.kernel.org > > > >> > > > > Signed-off-by: Stefano Garzarella > > > >> > > > > --- > > > >> > > > > > > > >> > > > > Notes: > > > >> > > > > This patch should be applied before the "[PATCH v2 0/3] > > > >> > > > > vhost_vdpa: > > > >> > > > > better PACKED support" series [1] and backported in stable > > > >> > > > > branches. > > > >> > > > > > > > >> > > > > We can revert it when we are sure that everything is > > > >> > > > > working with > > > >> > > > > packed virtqueues. 
> > > >> > > > > > > > >> > > > > Thanks, > > > >> > > > > Stefano > > > >> > > > > > > > >> > > > > [1] > > > >> > > > > https://lore.kernel.org/virtualization/20230424225031.18947-1-shannon.nel...@amd.com/ > > > >> > > > > > > >> > > > I'm a bit lost here. So why am I merging "better PACKED support" > > > >> > > > then? > > > >> > > > > > >> > > To really support packed virtqueue with vhost-vdpa, at that point > > > >> > > we would > > > >> > > also have to revert this patch. > > > >> > > > > > >> > > I wasn't sure if you wanted to queue the series for this merge > > > >> > > window. > > > >> > > In that case do you think it is better to send this patch only for > > > >> > > stable > > > >> > > branches? > > > >> > > > Does this patch make them a NOP? > > > >> > > > > > >> > > Yep, after applying the "better PACKED support" series and being > > > >> > > sure that > > > >> > > the IOCTLs of vhost-vdpa support packed virtqueue, we should > > > >> > > revert this > > > >> > > patch. > > > >> > > > > > >> > > Let me know if you prefer a different approach. > > > >> > > > > > >> > > I'm concerned that QEMU uses vhost-vdpa IOCTLs thinking that the > > > >> > > kernel > > > >> > > interprets them the right way, when it does not. > > > >> > > > > > >> > > Thanks, > > > >> > > Stefano > > > >> > > > > > >> > > > > >> > If this fixes a bug can you add Fixes tags to each of them? Then > > > >> > it's ok > > > >> > to merge in this window. Probably easier than the elaborate > > > >> > mask/unmask dance. > > > >> > > > >> CCing Shannon (the original author of the "better PACKED support" > > > >> series). > > > >> > > > >> IIUC Shannon is going to send a v3 of that series to fix the > > > >> documentation, so Shannon can you also add the Fixes tags? > > > >> > > > >> Thanks, > > > >> Stefano > > > > > > > >Well this is in my tree already. Just reply with > > > >Fixes: <> > > > >to each and I will add these tags. 
> > > > > > I tried, but it is not easy since we added the support for packed > > > virtqueue in vdpa and vhost incrementally. > > > > > > Initially I was thinking of adding the same tag used here: > > > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") > > > > > > Then I discovered that vq_state wasn't there, so I was thinking of > > > > > > Fixes: 530a5678bc00 ("vdpa: support packed virtqueue for > > > set/get_vq_state()") > > > > > > So we would have to backport quite a few patches into the stable branches. > > > I don't know if it's worth it... > > > > > > I still think it is better to disable packed in the stable branches, > > > otherwise I have to make a list of all the patches we need. > > > > > > Any other ideas? > > > > AFAIK, except for vp_vdpa, pds seems to be the first parent that > > supports packed virtqueue. Users should not notice anything wrong if > > they don't use packed virtqueue. And the problem of vp_vdpa + packed > > virtqueue came since the day0 of vp_vdpa. It seems fine to do nothing > > I guess. > > > > Thanks > > > I have a question though, what if down the road there > is a new feature that needs more changes? It will be > broken too just like PACKED no? > Shouldn't vdp