Re: [RFC] vhost: vmap virtio descriptor table kernel/kvm
On Tue, Oct 24, 2023 at 12:45 PM Jason Wang wrote:
> On Tue, Oct 24, 2023 at 11:17 AM Liang Chen wrote:
> >
> > The current vhost code uses 'copy_from_user' to copy descriptors from
> > userspace to vhost. We attempted to 'vmap' the descriptor table to
> > reduce the overhead associated with 'copy_from_user' during descriptor
> > access, because it can be accessed quite frequently. This change
> > resulted in a moderate performance improvement (approximately 3%) in
> > our pktgen test, as shown below. Additionally, the latency of the
> > 'vhost_get_vq_desc' function shows a noticeable decrease.
> >
> > current code:
> > IFACE     rxpck/s     txpck/s   rxkB/s    txkB/s   rxcmp/s  txcmp/s  rxmcst/s  %ifutil
> > Average: vnet0  0.31  1330807.03  0.02  77976.98  0.00  0.00  0.00  6.39
> > # /usr/share/bcc/tools/funclatency -d 10 vhost_get_vq_desc
> > avg = 145 nsecs, total: 1455980341 nsecs, count: 9985224
> >
> > vmap:
> > IFACE     rxpck/s     txpck/s   rxkB/s    txkB/s   rxcmp/s  txcmp/s  rxmcst/s  %ifutil
> > Average: vnet0  0.07  1371870.49  0.00  80383.04  0.00  0.00  0.00  6.58
> > # /usr/share/bcc/tools/funclatency -d 10 vhost_get_vq_desc
> > avg = 122 nsecs, total: 1286983929 nsecs, count: 10478134
> >
> > We are uncertain if there are any aspects we may have overlooked and
> > would appreciate any advice before we submit an actual patch.
>
> So the idea is to use a shadow page table instead of the userspace one
> to avoid things like speculation barriers or SMAP.
>
> I've tried this in the past:
>
> commit 7f466032dc9e5a61217f22ea34b2df932786bbfc (HEAD)
> Author: Jason Wang
> Date: Fri May 24 04:12:18 2019 -0400
>
>     vhost: access vq metadata through kernel virtual address
>
>     It was noticed that the copy_to/from_user() friends that were used to
>     access virtqueue metadata tend to be very expensive for dataplane
>     implementations like vhost, since they involve lots of software checks,
>     speculation barriers, and hardware feature toggling (e.g. SMAP). The
>     extra cost is more obvious when transferring small packets, since the
>     time spent on metadata access becomes more significant.
>     ...
>
> Note that it tried to use a direct map instead of a vmap, as Andrea
> suggested. But it led to several fallouts which were tricky to fix [1]
> (like the use of MMU notifiers for synchronization), so it was
> eventually reverted.
>
> I'm not saying it's a dead end. But we need to find a way to solve the
> issues or use something different. I'm happy to offer help.
>
> 1) Avoid using SMAP for the vhost kthread, for example via a sched
>    notifier; I'm not sure whether this is possible.
> 2) A new iov iterator that doesn't do SMAP at all; this looks
>    dangerous and Al might not like it.
> 3) (Re)using HMM
> ...
>
> You may want to see the archives for more information. We've had a lot
> of discussions.
>
> Thanks
>
> [1] https://lore.kernel.org/lkml/20190731084655.7024-1-jasow...@redhat.com/

Thanks a lot for the help. We will take a deeper look and reach out if we
have any questions or once we've finalized the patch.

Thanks,
Liang

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC] vhost: vmap virtio descriptor table kernel/kvm
The current vhost code uses 'copy_from_user' to copy descriptors from
userspace to vhost. We attempted to 'vmap' the descriptor table to
reduce the overhead associated with 'copy_from_user' during descriptor
access, because it can be accessed quite frequently. This change
resulted in a moderate performance improvement (approximately 3%) in
our pktgen test, as shown below. Additionally, the latency of the
'vhost_get_vq_desc' function shows a noticeable decrease.

current code:
IFACE     rxpck/s     txpck/s   rxkB/s    txkB/s   rxcmp/s  txcmp/s  rxmcst/s  %ifutil
Average: vnet0  0.31  1330807.03  0.02  77976.98  0.00  0.00  0.00  6.39
# /usr/share/bcc/tools/funclatency -d 10 vhost_get_vq_desc
avg = 145 nsecs, total: 1455980341 nsecs, count: 9985224

vmap:
IFACE     rxpck/s     txpck/s   rxkB/s    txkB/s   rxcmp/s  txcmp/s  rxmcst/s  %ifutil
Average: vnet0  0.07  1371870.49  0.00  80383.04  0.00  0.00  0.00  6.58
# /usr/share/bcc/tools/funclatency -d 10 vhost_get_vq_desc
avg = 122 nsecs, total: 1286983929 nsecs, count: 10478134

We are uncertain if there are any aspects we may have overlooked and
would appreciate any advice before we submit an actual patch.

Thanks,
Liang
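[Editor's note] The vmap idea described above can be sketched roughly as follows. This is an illustrative sketch only, not the submitted patch: the function name and error handling are hypothetical, and a real implementation must also keep the mapping coherent with userspace (the MMU-notifier problem Jason mentions below) and tear it down with vunmap()/unpin_user_pages().

```c
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Sketch: pin the userspace pages backing the descriptor table and map
 * them contiguously into kernel VA space, so vhost can read descriptors
 * through a plain pointer instead of copy_from_user().
 * Error unwinding (partial pins) is elided for brevity.
 */
static void *vhost_vmap_desc_table(unsigned long uaddr, size_t size,
				   struct page ***pagesp, int *npagesp)
{
	int npages = DIV_ROUND_UP(offset_in_page(uaddr) + size, PAGE_SIZE);
	struct page **pages;
	void *vaddr;

	pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return NULL;

	/* Pin the pages so they cannot be migrated or reclaimed under us. */
	if (pin_user_pages_fast(uaddr & PAGE_MASK, npages,
				FOLL_WRITE | FOLL_LONGTERM, pages) != npages)
		goto err;

	/* Map the pinned pages contiguously in kernel VA space. */
	vaddr = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
	if (!vaddr)
		goto err;

	*pagesp = pages;
	*npagesp = npages;
	/* Descriptors are now readable at vaddr + offset_in_page(uaddr). */
	return vaddr + offset_in_page(uaddr);

err:
	kfree(pages);
	return NULL;
}
```

Note that pinning alone does not make this safe: if the guest (or QEMU) remaps the virtqueue, the kernel mapping goes stale, which is exactly the synchronization problem discussed in the reply below.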
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Wed, Jul 5, 2023 at 2:05 PM Jason Wang wrote: > > On Wed, Jul 5, 2023 at 1:41 PM Liang Chen wrote: > > > > On Fri, Jun 9, 2023 at 10:57 AM Liang Chen > > wrote: > > > > > > On Thu, Jun 8, 2023 at 8:38 AM Jason Wang wrote: > > > > > > > > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin > > > > wrote: > > > > > > > > > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote: > > > > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen > > > > > > wrote: > > > > > > > > > > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin > > > > > > > wrote: > > > > > > > > > > > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > > > > > > > The implementation at the moment uses one page per packet > > > > > > > > > > > in both the > > > > > > > > > > > normal and XDP path. In addition, introducing a module > > > > > > > > > > > parameter to enable > > > > > > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > > > > > > performance gain > > > > > > > > > > > in the normal path. > > > > > > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > > > > > > > > > > > In multi-core vm testing environments, The most > > > > > > > > > > > significant performance > > > > > > > > > > > gain is observed in XDP cpumap: > > > > > > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > > > > > > fragmentation and > > > > > > > > > > > DMA map/unmap support. 
> > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > > > > > > > > > > > > > Why off by default? > > > > > > > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > > > > > > > The less modes we have the better... > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. > > > > > > > > > When the > > > > > > > > > packet size is very small, it reduces the likelihood of skb > > > > > > > > > coalescing. But such cases are rare. > > > > > > > > > > > > > > > > small packets are rare? These workloads are easy to create > > > > > > > > actually. > > > > > > > > Pls try and include benchmark with small packet size. > > > > > > > > > > > > > > > > > > > > > > Sure, Thanks! > > > > > > > > > > > > Before going ahead and posting v2 patch, I would like to hear more > > > > > > advice for the cases of small packets. I have done more performance > > > > > > benchmark with small packets since then. Here is a list of iperf > > > > > > output, > > > >
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Fri, Jun 9, 2023 at 10:57 AM Liang Chen wrote: > > On Thu, Jun 8, 2023 at 8:38 AM Jason Wang wrote: > > > > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin wrote: > > > > > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote: > > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen > > > > wrote: > > > > > > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin > > > > > wrote: > > > > > > > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > > > > > wrote: > > > > > > > > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > > > > > The implementation at the moment uses one page per packet in > > > > > > > > > both the > > > > > > > > > normal and XDP path. In addition, introducing a module > > > > > > > > > parameter to enable > > > > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > > > > performance gain > > > > > > > > > in the normal path. > > > > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > > > > performance > > > > > > > > > gain is observed in XDP cpumap: > > > > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > > > > fragmentation and > > > > > > > > > DMA map/unmap support. > > > > > > > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > > > > > > > > > Why off by default? > > > > > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > > > > > > > > > > > > > What happens if we use page pool for big mode too? 
> > > > > > > > The less modes we have the better... > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When > > > > > > > the > > > > > > > packet size is very small, it reduces the likelihood of skb > > > > > > > coalescing. But such cases are rare. > > > > > > > > > > > > small packets are rare? These workloads are easy to create actually. > > > > > > Pls try and include benchmark with small packet size. > > > > > > > > > > > > > > > > Sure, Thanks! > > > > > > > > Before going ahead and posting v2 patch, I would like to hear more > > > > advice for the cases of small packets. I have done more performance > > > > benchmark with small packets since then. Here is a list of iperf > > > > output, > > > > > > > > With PP and PP fragmenting: > > > > 256K: [ 5] 505.00-510.00 sec 1.34 GBytes 2.31 Gbits/sec0 > > > > 144 KBytes > > > > 1K: [ 5] 30.00-35.00 sec 4.63 GBytes 7.95 Gbits/sec0 > > > > 223 KBytes > > > > 2K: [ 5] 65.00-70.00 sec 8.33 GBytes 14.3 Gbits/sec0 > > > > 324 KBytes > > > > 4K: [ 5] 30.00-35.00 sec 13.3 GBytes 22.8 Gbits/sec0 > > > > 1.08 MBytes > > > > 8K: [ 5] 50.00-55.00 sec 18.9 GBytes 32.4 Gbits/sec0 > > > > 744 KBytes > > > > 16K: [ 5] 25.00-30.00 sec 24.6 GBytes 42.3 Gbits/sec0 > > > > 963 KBytes > > > > 32K: [ 5] 45.00-50.00 sec 29.8 GBytes 51.2 Gb
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Thu, Jun 8, 2023 at 8:38 AM Jason Wang wrote: > > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin wrote: > > > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote: > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen > > > wrote: > > > > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin > > > > wrote: > > > > > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > > > > wrote: > > > > > > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > > > > The implementation at the moment uses one page per packet in > > > > > > > > both the > > > > > > > > normal and XDP path. In addition, introducing a module > > > > > > > > parameter to enable > > > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > > > performance gain > > > > > > > > in the normal path. > > > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > > > performance > > > > > > > > gain is observed in XDP cpumap: > > > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > > > fragmentation and > > > > > > > > DMA map/unmap support. > > > > > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > > > > > > > Why off by default? > > > > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > > > > The less modes we have the better... 
> > > > > > > > > > > > > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > > > > > packet size is very small, it reduces the likelihood of skb > > > > > > coalescing. But such cases are rare. > > > > > > > > > > small packets are rare? These workloads are easy to create actually. > > > > > Pls try and include benchmark with small packet size. > > > > > > > > > > > > > Sure, Thanks! > > > > > > Before going ahead and posting v2 patch, I would like to hear more > > > advice for the cases of small packets. I have done more performance > > > benchmark with small packets since then. Here is a list of iperf > > > output, > > > > > > With PP and PP fragmenting: > > > 256K: [ 5] 505.00-510.00 sec 1.34 GBytes 2.31 Gbits/sec0144 > > > KBytes > > > 1K: [ 5] 30.00-35.00 sec 4.63 GBytes 7.95 Gbits/sec0 > > > 223 KBytes > > > 2K: [ 5] 65.00-70.00 sec 8.33 GBytes 14.3 Gbits/sec0 > > > 324 KBytes > > > 4K: [ 5] 30.00-35.00 sec 13.3 GBytes 22.8 Gbits/sec0 > > > 1.08 MBytes > > > 8K: [ 5] 50.00-55.00 sec 18.9 GBytes 32.4 Gbits/sec0 > > > 744 KBytes > > > 16K: [ 5] 25.00-30.00 sec 24.6 GBytes 42.3 Gbits/sec0963 > > > KBytes > > > 32K: [ 5] 45.00-50.00 sec 29.8 GBytes 51.2 Gbits/sec0 1.25 > > > MBytes > > > 64K: [ 5] 35.00-40.00 sec 34.0 GBytes 58.4 Gbits/sec0 1.70 > > > MBytes > > > 128K: [ 5] 45.00-50.00 sec 36.7 GBytes 63.1 Gbits/sec0 4.26 > > > MBytes > > > 256K: [ 5] 30.00-35.00 sec 40.0 GBytes 68.8 Gbits/sec0 3.20 > > > MBytes > > Note that virtio-net driver is lacking things like BQL and others, so > it might suffer from buffe
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin wrote: > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote: > > On Tue, May 30, 2023 at 9:19 AM Liang Chen > > wrote: > > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > > > wrote: > > > > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > > > The implementation at the moment uses one page per packet in both > > > > > > > the > > > > > > > normal and XDP path. In addition, introducing a module parameter > > > > > > > to enable > > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > > performance gain > > > > > > > in the normal path. > > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > > performance > > > > > > > gain is observed in XDP cpumap: > > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > > fragmentation and > > > > > > > DMA map/unmap support. > > > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > > > > > Why off by default? > > > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > > > The less modes we have the better... > > > > > > > > > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > > > > packet size is very small, it reduces the likelihood of skb > > > > > coalescing. But such cases are rare. 
> > > > > > > > small packets are rare? These workloads are easy to create actually. > > > > Pls try and include benchmark with small packet size. > > > > > > > > > > Sure, Thanks! > > > > Before going ahead and posting v2 patch, I would like to hear more > > advice for the cases of small packets. I have done more performance > > benchmark with small packets since then. Here is a list of iperf > > output, > > > > With PP and PP fragmenting: > > 256K: [ 5] 505.00-510.00 sec 1.34 GBytes 2.31 Gbits/sec0144 > > KBytes > > 1K: [ 5] 30.00-35.00 sec 4.63 GBytes 7.95 Gbits/sec0 > > 223 KBytes > > 2K: [ 5] 65.00-70.00 sec 8.33 GBytes 14.3 Gbits/sec0 > > 324 KBytes > > 4K: [ 5] 30.00-35.00 sec 13.3 GBytes 22.8 Gbits/sec0 > > 1.08 MBytes > > 8K: [ 5] 50.00-55.00 sec 18.9 GBytes 32.4 Gbits/sec0 > > 744 KBytes > > 16K: [ 5] 25.00-30.00 sec 24.6 GBytes 42.3 Gbits/sec0963 > > KBytes > > 32K: [ 5] 45.00-50.00 sec 29.8 GBytes 51.2 Gbits/sec0 1.25 > > MBytes > > 64K: [ 5] 35.00-40.00 sec 34.0 GBytes 58.4 Gbits/sec0 1.70 > > MBytes > > 128K: [ 5] 45.00-50.00 sec 36.7 GBytes 63.1 Gbits/sec0 4.26 > > MBytes > > 256K: [ 5] 30.00-35.00 sec 40.0 GBytes 68.8 Gbits/sec0 3.20 > > MBytes > > > > Without PP: > > 256: [ 5] 680.00-685.00 sec 1.57 GBytes 2.69 Gbits/sec0359 > > KBytes > > 1K: [ 5] 75.00-80.00 sec 5.47 GBytes 9.40 Gbits/sec0730 > > KBytes > > 2K: [ 5] 65.00-70.00 sec 9.46 GBytes 16.2 Gbits/sec0 1.99 > > MBytes > > 4K: [ 5] 30.00-35.00 sec 14.5 GBytes 25.0 Gbits/sec0 1.20 > > MBytes > > 8K: [ 5] 45.00-50.00 sec 19.9 GBytes 34.1 Gbits/sec0 1.72 > > MBytes > > 16K:[ 5] 5.00-10.00 sec 2
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Wed, Jun 7, 2023 at 5:36 PM Xuan Zhuo wrote: > > On Wed, 7 Jun 2023 17:08:59 +0800, Liang Chen > wrote: > > On Tue, May 30, 2023 at 9:19 AM Liang Chen > > wrote: > > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > > > wrote: > > > > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > > > The implementation at the moment uses one page per packet in both > > > > > > > the > > > > > > > normal and XDP path. In addition, introducing a module parameter > > > > > > > to enable > > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > > performance gain > > > > > > > in the normal path. > > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > > performance > > > > > > > gain is observed in XDP cpumap: > > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > > fragmentation and > > > > > > > DMA map/unmap support. > > > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > > > > > Why off by default? > > > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > > > The less modes we have the better... > > > > > > > > > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > > > > packet size is very small, it reduces the likelihood of skb > > > > > coalescing. But such cases are rare. 
> > > > > > > > small packets are rare? These workloads are easy to create actually. > > > > Pls try and include benchmark with small packet size. > > > > > > > > > > Sure, Thanks! > > > > Before going ahead and posting v2 patch, I would like to hear more > > advice for the cases of small packets. I have done more performance > > benchmark with small packets since then. Here is a list of iperf > > output, > > Could you show the commnad line? > > Thanks > > Sure. iperf3 -c -i 5 -f g -t 0 -l > > > > With PP and PP fragmenting: > > 256K: [ 5] 505.00-510.00 sec 1.34 GBytes 2.31 Gbits/sec0144 > > KBytes > > 1K: [ 5] 30.00-35.00 sec 4.63 GBytes 7.95 Gbits/sec0 > > 223 KBytes > > 2K: [ 5] 65.00-70.00 sec 8.33 GBytes 14.3 Gbits/sec0 > > 324 KBytes > > 4K: [ 5] 30.00-35.00 sec 13.3 GBytes 22.8 Gbits/sec0 > > 1.08 MBytes > > 8K: [ 5] 50.00-55.00 sec 18.9 GBytes 32.4 Gbits/sec0 > > 744 KBytes > > 16K: [ 5] 25.00-30.00 sec 24.6 GBytes 42.3 Gbits/sec0963 > > KBytes > > 32K: [ 5] 45.00-50.00 sec 29.8 GBytes 51.2 Gbits/sec0 1.25 > > MBytes > > 64K: [ 5] 35.00-40.00 sec 34.0 GBytes 58.4 Gbits/sec0 1.70 > > MBytes > > 128K: [ 5] 45.00-50.00 sec 36.7 GBytes 63.1 Gbits/sec0 4.26 > > MBytes > > 256K: [ 5] 30.00-35.00 sec 40.0 GBytes 68.8 Gbits/sec0 3.20 > > MBytes > > > > Without PP: > > 256: [ 5] 680.00-685.00 sec 1.57 GBytes 2.69 Gbits/sec0359 > > KBytes > > 1K: [ 5] 75.00-80.00 sec 5.47 GBytes 9.40 Gbits/sec0730 > > KBytes > > 2K: [ 5] 65.00-70.00 sec 9.46 GBytes 16.2 Gbits/sec0 1.99 > > MBytes > > 4K: [ 5] 30.00-35.00 sec 14.5 GBytes 25.0 Gbits/sec0 1.20 > > MBytes > > 8K: [ 5] 45.00-50.00 sec
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Wed, May 31, 2023 at 11:12 AM Xuan Zhuo wrote: > > On Mon, 29 May 2023 15:28:17 +0800, Liang Chen > wrote: > > On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin wrote: > > > > > > On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote: > > > > On Fri, May 26, 2023 at 2:51 PM Jason Wang wrote: > > > > > > > > > > On Fri, May 26, 2023 at 1:46 PM Liang Chen > > > > > wrote: > > > > > > > > > > > > The implementation at the moment uses one page per packet in both > > > > > > the > > > > > > normal and XDP path. > > > > > > > > > > It's better to explain why we need a page pool and how it can help the > > > > > performance. > > > > > > > > > > > > > Sure, I will include that on v2. > > > > > > In addition, introducing a module parameter to enable > > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > If page pool wins for most of the cases, any reason to disable it by > > > > > default? > > > > > > > > > > > > > Thank you for raising the point. It does make sense to enable it by > > > > default. > > > > > > I'd like to see more benchmarks pls then, with a variety of packet > > > sizes, udp and tcp. > > > > > > > Sure, more benchmarks will be provided. Thanks. > > > I think so. > > I did this, but I did not found any improve. So I gave up it. > > Thanks. > > Our UDP benchmark shows a steady 0.8 percent change in PPS measurement. However, when conducting iperf TCP stream performance testing, the results vary depending on the packet size and testing setup. With small packet sizes, the performance actually drops slightly due to the reasons I explained in the previous email. On the other hand, with large packets, we need to ensure that the sender side doesn't become the bottleneck. To achieve this, our setup uses a single-core vm to keep the receiver busy, which allows us to identify performance differences in the receiving path. 
Thanks, Liang > > > > > > > > > > > > > > > > In single-core vm testing environments, it gives a modest > > > > > > performance gain > > > > > > in the normal path. > > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > > performance > > > > > > gain is observed in XDP cpumap: > > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > Please show more details on the test. E.g which kinds of tests have > > > > > you measured? > > > > > > > > > > Btw, it would be better to measure PPS as well. > > > > > > > > > > > > > Sure. It will be added on v2. > > > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > > fragmentation and > > > > > > DMA map/unmap support. > > > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > --- > > > > > > drivers/net/virtio_net.c | 188 > > > > > > ++- > > > > > > > > > > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do > > > > > the ifdef tricks at least. > > > > > > > > > > > > > Sure. it will be done on v2. > > > > > > 1 file changed, 146 insertions(+), 42 deletions(-) > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > > > index c5dca0d92e64..99c0ca0c1781 100644 > > > > > > --- a/drivers/net/virtio_net.c > > > > > > +++ b/drivers/net/virtio_net.c > > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444); > > > > > > module_param(gso, bool, 0444); > > > > > > module_param(napi_tx, bool, 0644); > > > > > > > > > > > > +static bool page_pool_enabled; > > > >
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Tue, May 30, 2023 at 9:19 AM Liang Chen wrote: > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin wrote: > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > > The implementation at the moment uses one page per packet in both the > > > > > normal and XDP path. In addition, introducing a module parameter to > > > > > enable > > > > > or disable the usage of page pool (disabled by default). > > > > > > > > > > In single-core vm testing environments, it gives a modest performance > > > > > gain > > > > > in the normal path. > > > > > Upstream codebase: 47.5 Gbits/sec > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > > > In multi-core vm testing environments, The most significant > > > > > performance > > > > > gain is observed in XDP cpumap: > > > > > Upstream codebase: 1.38 Gbits/sec > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > > > With this foundation, we can further integrate page pool > > > > > fragmentation and > > > > > DMA map/unmap support. > > > > > > > > > > Signed-off-by: Liang Chen > > > > > > > > Why off by default? > > > > I am guessing it sometimes has performance costs too? > > > > > > > > > > > > What happens if we use page pool for big mode too? > > > > The less modes we have the better... > > > > > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > > packet size is very small, it reduces the likelihood of skb > > > coalescing. But such cases are rare. > > > > small packets are rare? These workloads are easy to create actually. > > Pls try and include benchmark with small packet size. > > > > Sure, Thanks! Before going ahead and posting v2 patch, I would like to hear more advice for the cases of small packets. 
I have done more performance benchmarks with small packets since then. Here is a list of iperf output:

With PP and PP fragmenting:
256:  [ 5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec  0   144 KBytes
1K:   [ 5]  30.00-35.00 sec   4.63 GBytes  7.95 Gbits/sec  0   223 KBytes
2K:   [ 5]  65.00-70.00 sec   8.33 GBytes  14.3 Gbits/sec  0   324 KBytes
4K:   [ 5]  30.00-35.00 sec   13.3 GBytes  22.8 Gbits/sec  0  1.08 MBytes
8K:   [ 5]  50.00-55.00 sec   18.9 GBytes  32.4 Gbits/sec  0   744 KBytes
16K:  [ 5]  25.00-30.00 sec   24.6 GBytes  42.3 Gbits/sec  0   963 KBytes
32K:  [ 5]  45.00-50.00 sec   29.8 GBytes  51.2 Gbits/sec  0  1.25 MBytes
64K:  [ 5]  35.00-40.00 sec   34.0 GBytes  58.4 Gbits/sec  0  1.70 MBytes
128K: [ 5]  45.00-50.00 sec   36.7 GBytes  63.1 Gbits/sec  0  4.26 MBytes
256K: [ 5]  30.00-35.00 sec   40.0 GBytes  68.8 Gbits/sec  0  3.20 MBytes

Without PP:
256:  [ 5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec  0   359 KBytes
1K:   [ 5]  75.00-80.00 sec   5.47 GBytes  9.40 Gbits/sec  0   730 KBytes
2K:   [ 5]  65.00-70.00 sec   9.46 GBytes  16.2 Gbits/sec  0  1.99 MBytes
4K:   [ 5]  30.00-35.00 sec   14.5 GBytes  25.0 Gbits/sec  0  1.20 MBytes
8K:   [ 5]  45.00-50.00 sec   19.9 GBytes  34.1 Gbits/sec  0  1.72 MBytes
16K:  [ 5]   5.00-10.00 sec   23.8 GBytes  40.9 Gbits/sec  0  2.90 MBytes
32K:  [ 5]  15.00-20.00 sec   28.0 GBytes  48.1 Gbits/sec  0  3.03 MBytes
64K:  [ 5]  60.00-65.00 sec   31.8 GBytes  54.6 Gbits/sec  0  3.05 MBytes
128K: [ 5]  45.00-50.00 sec   33.0 GBytes  56.6 Gbits/sec  1  3.03 MBytes
256K: [ 5]  25.00-30.00 sec   34.7 GBytes  59.6 Gbits/sec  0  3.11 MBytes

The major factor contributing to the performance drop is the reduction of skb coalescing. Additionally, without the page pool, small packets can still benefit from the allocation of 8 contiguous pages by breaking them down into smaller pieces. This effectively reduces the frequency of page allocation from the buddy system. For instance, the arrival of 32 1K packets triggers only one alloc_page call. Therefore, the benefits of using a page pool are limited in such cases.
In fact, without page pool fragmenting enabled, it can even hinder performance from this perspective. Upon further consideration, I tend to believe making page pool the default option may not be appropriate. As you pointed out, we cannot simply i
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin wrote: > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote: > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin wrote: > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > > > The implementation at the moment uses one page per packet in both the > > > > normal and XDP path. In addition, introducing a module parameter to > > > > enable > > > > or disable the usage of page pool (disabled by default). > > > > > > > > In single-core vm testing environments, it gives a modest performance > > > > gain > > > > in the normal path. > > > > Upstream codebase: 47.5 Gbits/sec > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > In multi-core vm testing environments, The most significant performance > > > > gain is observed in XDP cpumap: > > > > Upstream codebase: 1.38 Gbits/sec > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > > > With this foundation, we can further integrate page pool fragmentation > > > > and > > > > DMA map/unmap support. > > > > > > > > Signed-off-by: Liang Chen > > > > > > Why off by default? > > > I am guessing it sometimes has performance costs too? > > > > > > > > > What happens if we use page pool for big mode too? > > > The less modes we have the better... > > > > > > > > > > Sure, now I believe it makes sense to enable it by default. When the > > packet size is very small, it reduces the likelihood of skb > > coalescing. But such cases are rare. > > small packets are rare? These workloads are easy to create actually. > Pls try and include benchmark with small packet size. > Sure, Thanks! > > The usage of page pool for big mode is being evaluated now. Thanks! 
> > > > > > --- > > > > drivers/net/virtio_net.c | 188 ++- > > > > 1 file changed, 146 insertions(+), 42 deletions(-) > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > index c5dca0d92e64..99c0ca0c1781 100644 > > > > --- a/drivers/net/virtio_net.c > > > > +++ b/drivers/net/virtio_net.c > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444); > > > > module_param(gso, bool, 0444); > > > > module_param(napi_tx, bool, 0644); > > > > > > > > +static bool page_pool_enabled; > > > > +module_param(page_pool_enabled, bool, 0400); > > > > + > > > > /* FIXME: MTU in config. */ > > > > #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) > > > > #define GOOD_COPY_LEN128 > > > > @@ -159,6 +162,9 @@ struct receive_queue { > > > > /* Chain pages by the private ptr. */ > > > > struct page *pages; > > > > > > > > + /* Page pool */ > > > > + struct page_pool *page_pool; > > > > + > > > > /* Average packet length for mergeable receive buffers. */ > > > > struct ewma_pkt_len mrg_avg_pkt_len; > > > > > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void > > > > *buf, unsigned int buflen, > > > > return skb; > > > > } > > > > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page > > > > *page) > > > > +{ > > > > + if (rq->page_pool) > > > > + page_pool_put_full_page(rq->page_pool, page, true); > > > > + else > > > > + put_page(page); > > > > +} > > > > + > > > > /* Called from bottom half context */ > > > > static struct sk_buff *page_to_skb(struct virtnet_info *vi, > > > > struct receive_queue *rq, > > > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct > > > > virtnet_info *vi, > > > > hdr = skb_vnet_hdr(skb); > > > > memcpy(hdr, hdr_p, hdr_len); > > > > if (page_to_free) > > > > - put_page(page_to_free); > > > > + virtnet_put_page(rq, page_to_free); > > > > > > > > return skb; > > > > } > > > > @@ -802,7 +816,7 @@ sta
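[Editorial note] The central pattern of this patch — one release helper that sends a page back to the pool when one is attached and falls through to the plain refcount path otherwise — can be sketched as a userspace model. This is an illustrative sketch only: `struct page_pool_model`, `struct receive_queue_model`, and the counters are stand-ins invented here; the real code calls `page_pool_put_full_page()` and `put_page()`.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the virtnet_put_page() dispatch introduced by the
 * patch: a receive queue frees a page through the page pool when one is
 * attached, and through the generic refcount path otherwise.  Kernel
 * calls are replaced with counters so the dispatch can be observed. */

struct page { int id; };

struct page_pool_model { int recycled; };

struct receive_queue_model {
    struct page_pool_model *page_pool; /* NULL => page pool disabled */
    int refcount_frees;
};

static void model_put_page(struct receive_queue_model *rq, struct page *page)
{
    (void)page;
    if (rq->page_pool)
        rq->page_pool->recycled++;   /* models page_pool_put_full_page() */
    else
        rq->refcount_frees++;        /* models put_page() */
}
```

Because every free site in the driver goes through this one helper, the pool can be enabled or disabled per-queue without touching the callers.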
Re: [PATCH net-next 3/5] virtio_net: Add page pool fragmentation support
On Mon, May 29, 2023 at 9:33 AM Yunsheng Lin wrote: > > On 2023/5/26 13:46, Liang Chen wrote: > > ... > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 99c0ca0c1781..ac40b8c66c59 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -32,7 +32,9 @@ module_param(gso, bool, 0444); > > module_param(napi_tx, bool, 0644); > > > > static bool page_pool_enabled; > > +static bool page_pool_frag; > > module_param(page_pool_enabled, bool, 0400); > > +module_param(page_pool_frag, bool, 0400); > > The below patchset unifies the frag and non-frag page for > page_pool_alloc_frag() API, perhaps it would simplify the > driver's support of page pool. > > https://patchwork.kernel.org/project/netdevbpf/cover/20230526092616.40355-1-linyunsh...@huawei.com/ > Thanks for the information and the work to make driver support easy. I will rebase accordingly after it lands. > > > > ... > > > @@ -1769,13 +1788,29 @@ static int add_recvbuf_mergeable(struct > > virtnet_info *vi, > >*/ > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > if (rq->page_pool) { > > - struct page *page; > > + if (rq->page_pool->p.flags & PP_FLAG_PAGE_FRAG) { > > + if (unlikely(!page_pool_dev_alloc_frag(rq->page_pool, > > + > > &pp_frag_offset, len + room))) > > + return -ENOMEM; > > + buf = (char *)page_address(rq->page_pool->frag_page) + > > + pp_frag_offset; > > + buf += headroom; /* advance address leaving hole at > > front of pkt */ > > + hole = (PAGE_SIZE << rq->page_pool->p.order) > > + - rq->page_pool->frag_offset; > > + if (hole < len + room) { > > + if (!headroom) > > + len += hole; > > + rq->page_pool->frag_offset += hole; > > Is there any reason why the driver need to be aware of page_pool->frag_offset? > Isn't the page_pool_dev_alloc_frag() will drain the last page for you when > page_pool_dev_alloc_frag() is called with size being 'len + room' later? 
> One case I can think of needing this is to have an accurate truesize report > for skb, but I am not sure it matters that much as 'struct page_frag_cache' > and 'page_frag' implementation both have a similar problem. > Yeah, as you pointed out page_pool_dev_alloc_frag will drain the page itself, so does skb_page_frag_refill. This is trying to keep the logic consistent with non page pool case where the hole was skipped and included in buffer len. > > + } > > + } else { > > + struct page *page; > > > > - page = page_pool_dev_alloc_pages(rq->page_pool); > > - if (unlikely(!page)) > > - return -ENOMEM; > > - buf = (char *)page_address(page); > > - buf += headroom; /* advance address leaving hole at front of > > pkt */ > > + page = page_pool_dev_alloc_pages(rq->page_pool); > > + if (unlikely(!page)) > > + return -ENOMEM; > > + buf = (char *)page_address(page); > > + buf += headroom; /* advance address leaving hole at > > front of pkt */ > > + } > > } else { > > if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, > > gfp))) > > return -ENOMEM; > > @@ -3800,13 +3835,16 @@ static void virtnet_alloc_page_pool(struct > > receive_queue *rq) > > struct virtio_device *vdev = rq->vq->vdev; > > > > struct page_pool_params pp_params = { > > - .order = 0, > > + .order = page_pool_frag ? SKB_FRAG_PAGE_ORDER : 0, > > .pool_size = rq->vq->num_max, > > If it using order SKB_FRAG_PAGE_ORDER page, perhaps pool_size does > not have to be rq->vq->num_max? Even for order 0 page, perhaps the > pool_size does not need to be as big as rq->vq->num_max? > Thanks for pointing this out! pool_size will be lowered to a more appropriate value on v2. 
> > .nid = dev_to_node(vdev->dev.parent), > > .dev = vdev->dev.parent, > > .offset = 0, > > }; > > > > + if (page_pool_frag) > > + pp_params.flags |= PP_FLAG_PAGE_FRAG; > > + > > rq->page_pool = page_pool_create(&pp_params); > > if (IS_ERR(rq->page_pool)) { > > dev_warn(&vdev->dev, "page pool creation failed: %ld\n", > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
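[Editorial note] The tail-hole accounting debated above can be sketched as a simplified userspace model: when the space remaining after the current fragment cannot hold another `len + room` buffer, the remainder ("hole") is folded into the current buffer length (when there is no headroom) and the offset is advanced past it, matching the non-page-pool `skb_page_frag_refill()` behaviour the reply describes. The struct and field names here are simplified stand-ins for the `page_pool` fields referenced in the patch, and the model computes the hole at the current offset rather than after an allocation.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u

/* Simplified model of the hole accounting discussed for
 * add_recvbuf_mergeable(): an unusable page tail is handed to the
 * current buffer and skipped, so the next allocation starts on a
 * fresh page. */

struct frag_pool_model {
    unsigned int frag_offset; /* models rq->page_pool->frag_offset */
};

static unsigned int model_account_hole(struct frag_pool_model *pool,
                                       unsigned int len, unsigned int room,
                                       unsigned int headroom)
{
    unsigned int hole = MODEL_PAGE_SIZE - pool->frag_offset;

    if (hole < len + room) {
        if (!headroom)
            len += hole;           /* fold the tail into this buffer */
        pool->frag_offset += hole; /* drain the page */
    }
    return len;
}
```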
Re: [PATCH net-next 3/5] virtio_net: Add page pool fragmentation support
On Sun, May 28, 2023 at 2:25 PM Michael S. Tsirkin wrote: > > On Fri, May 26, 2023 at 01:46:19PM +0800, Liang Chen wrote: > > To further enhance performance, implement page pool fragmentation > > support and introduce a module parameter to enable or disable it. > > > > In single-core vm testing environments, there is an additional performance > > gain observed in the normal path compared to the one packet per page > > approach. > > Upstream codebase: 47.5 Gbits/sec > > Upstream codebase with page pool: 50.2 Gbits/sec > > Upstream codebase with page pool fragmentation support: 52.3 Gbits/sec > > > > There is also some performance gain for XDP cpumap. > > Upstream codebase: 1.38 Gbits/sec > > Upstream codebase with page pool: 9.74 Gbits/sec > > Upstream codebase with page pool fragmentation: 10.3 Gbits/sec > > > > Signed-off-by: Liang Chen > > I think it's called fragmenting not fragmentation. > > Sure, thanks! > > --- > > drivers/net/virtio_net.c | 72 ++-- > > 1 file changed, 55 insertions(+), 17 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 99c0ca0c1781..ac40b8c66c59 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -32,7 +32,9 @@ module_param(gso, bool, 0444); > > module_param(napi_tx, bool, 0644); > > > > static bool page_pool_enabled; > > +static bool page_pool_frag; > > module_param(page_pool_enabled, bool, 0400); > > +module_param(page_pool_frag, bool, 0400); > > > > /* FIXME: MTU in config. */ > > #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) > > So here again same questions. > > -when is this a net perf gain when does it have no effect? > -can be on by default > - can we get rid of the extra modes? > > Yeah, now I believe it makes sense to enable it by default to avoid the extra modes. Thanks. 
> > @@ -909,23 +911,32 @@ static struct page *xdp_linearize_page(struct > > receive_queue *rq, > > struct page *p, > > int offset, > > int page_off, > > -unsigned int *len) > > +unsigned int *len, > > +unsigned int *pp_frag_offset) > > { > > int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > struct page *page; > > + unsigned int pp_frag_offset_val; > > > > if (page_off + *len + tailroom > PAGE_SIZE) > > return NULL; > > > > if (rq->page_pool) > > - page = page_pool_dev_alloc_pages(rq->page_pool); > > + if (rq->page_pool->p.flags & PP_FLAG_PAGE_FRAG) > > + page = page_pool_dev_alloc_frag(rq->page_pool, > > pp_frag_offset, > > + PAGE_SIZE); > > + else > > + page = page_pool_dev_alloc_pages(rq->page_pool); > > else > > page = alloc_page(GFP_ATOMIC); > > > > if (!page) > > return NULL; > > > > - memcpy(page_address(page) + page_off, page_address(p) + offset, *len); > > + pp_frag_offset_val = pp_frag_offset ? *pp_frag_offset : 0; > > + > > + memcpy(page_address(page) + page_off + pp_frag_offset_val, > > +page_address(p) + offset, *len); > > page_off += *len; > > > > while (--*num_buf) { > > @@ -948,7 +959,7 @@ static struct page *xdp_linearize_page(struct > > receive_queue *rq, > > goto err_buf; > > } > > > > - memcpy(page_address(page) + page_off, > > + memcpy(page_address(page) + page_off + pp_frag_offset_val, > > page_address(p) + off, buflen); > > page_off += buflen; > > virtnet_put_page(rq, p); > > @@ -1029,7 +1040,7 @@ static struct sk_buff *receive_small_xdp(struct > > net_device *dev, > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > xdp_page = xdp_linearize_page(rq, &num_buf, page, > > offset, header_offset, > > - &tlen); > > + &tlen, NULL); > >
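[Editorial note] The `pp_frag_offset` change to `xdp_linearize_page()` quoted above boils down to one piece of address arithmetic: when the destination is a page-pool fragment, every copy must land at `page_off` plus the fragment's offset within the page, and the NULL-pointer callers (the non-fragment path) degrade to an offset of zero. A minimal sketch, modeling only the arithmetic and not the copies:

```c
#include <assert.h>
#include <stddef.h>

/* Model of the destination-offset handling added to
 * xdp_linearize_page(): pp_frag_offset is the fragment's offset within
 * the pooled page, or NULL for callers that allocate whole pages. */

static size_t model_copy_dest(size_t page_off,
                              const unsigned int *pp_frag_offset)
{
    unsigned int pp_frag_offset_val = pp_frag_offset ? *pp_frag_offset : 0;

    return page_off + pp_frag_offset_val;
}
```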
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin wrote: > > On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote: > > On Fri, May 26, 2023 at 2:51 PM Jason Wang wrote: > > > > > > On Fri, May 26, 2023 at 1:46 PM Liang Chen > > > wrote: > > > > > > > > The implementation at the moment uses one page per packet in both the > > > > normal and XDP path. > > > > > > It's better to explain why we need a page pool and how it can help the > > > performance. > > > > > > > Sure, I will include that on v2. > > > > In addition, introducing a module parameter to enable > > > > or disable the usage of page pool (disabled by default). > > > > > > If page pool wins for most of the cases, any reason to disable it by > > > default? > > > > > > > Thank you for raising the point. It does make sense to enable it by default. > > I'd like to see more benchmarks pls then, with a variety of packet > sizes, udp and tcp. > Sure, more benchmarks will be provided. Thanks. > > > > > > > > In single-core vm testing environments, it gives a modest performance > > > > gain > > > > in the normal path. > > > > Upstream codebase: 47.5 Gbits/sec > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > > > > > In multi-core vm testing environments, The most significant performance > > > > gain is observed in XDP cpumap: > > > > Upstream codebase: 1.38 Gbits/sec > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > > > Please show more details on the test. E.g which kinds of tests have > > > you measured? > > > > > > Btw, it would be better to measure PPS as well. > > > > > > > Sure. It will be added on v2. > > > > > > > > With this foundation, we can further integrate page pool fragmentation > > > > and > > > > DMA map/unmap support. > > > > > > > > Signed-off-by: Liang Chen > > > > --- > > > > drivers/net/virtio_net.c | 188 ++- > > > > > > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do > > > the ifdef tricks at least. > > > > > > > Sure. 
it will be done on v2. > > > > 1 file changed, 146 insertions(+), 42 deletions(-) > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > index c5dca0d92e64..99c0ca0c1781 100644 > > > > --- a/drivers/net/virtio_net.c > > > > +++ b/drivers/net/virtio_net.c > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444); > > > > module_param(gso, bool, 0444); > > > > module_param(napi_tx, bool, 0644); > > > > > > > > +static bool page_pool_enabled; > > > > +module_param(page_pool_enabled, bool, 0400); > > > > + > > > > /* FIXME: MTU in config. */ > > > > #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) > > > > #define GOOD_COPY_LEN 128 > > > > @@ -159,6 +162,9 @@ struct receive_queue { > > > > /* Chain pages by the private ptr. */ > > > > struct page *pages; > > > > > > > > + /* Page pool */ > > > > + struct page_pool *page_pool; > > > > + > > > > /* Average packet length for mergeable receive buffers. */ > > > > struct ewma_pkt_len mrg_avg_pkt_len; > > > > > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void > > > > *buf, unsigned int buflen, > > > > return skb; > > > > } > > > > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page > > > > *page) > > > > +{ > > > > + if (rq->page_pool) > > > > + page_pool_put_full_page(rq->page_pool, page, true); > > > > + else > > > > + put_page(page); > > > > +} > > > > + > > > > /* Called from bottom half context */ > > > > static struct sk_buff *page_to_skb(struct virtnet_info *vi, > > > >struct receive_queue *rq, > > > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct > > > > virtnet_info *vi, > > > > hdr = skb_vnet_hd
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Sun, May 28, 2023 at 2:27 PM Michael S. Tsirkin wrote: > > On Sat, May 27, 2023 at 12:11:25AM +0800, kernel test robot wrote: > > Hi Liang, > > > > kernel test robot noticed the following build errors: > > > > [auto build test ERROR on net-next/main] > > > > url: > > https://github.com/intel-lab-lkp/linux/commits/Liang-Chen/virtio_net-Add-page_pool-support-to-improve-performance/20230526-135805 > > base: net-next/main > > patch link: > > https://lore.kernel.org/r/20230526054621.18371-2-liangchen.linux%40gmail.com > > patch subject: [PATCH net-next 2/5] virtio_net: Add page_pool support to > > improve performance > > config: x86_64-defconfig > > (https://download.01.org/0day-ci/archive/20230526/202305262334.gifq3wpg-...@intel.com/config) > > compiler: gcc-11 (Debian 11.3.0-12) 11.3.0 > > reproduce (this is a W=1 build): > > # > > https://github.com/intel-lab-lkp/linux/commit/bfba563f43bba37181d8502cb2e566c32f96ec9e > > git remote add linux-review https://github.com/intel-lab-lkp/linux > > git fetch --no-tags linux-review > > Liang-Chen/virtio_net-Add-page_pool-support-to-improve-performance/20230526-135805 > > git checkout bfba563f43bba37181d8502cb2e566c32f96ec9e > > # save the config file > > mkdir build_dir && cp config build_dir/.config > > make W=1 O=build_dir ARCH=x86_64 olddefconfig > > make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash > > > > If you fix the issue, kindly add following tag where applicable > > | Reported-by: kernel test robot > > | Closes: > > https://lore.kernel.org/oe-kbuild-all/202305262334.gifq3wpg-...@intel.com/ > > > > All errors (new ones prefixed by >>): > > > >ld: vmlinux.o: in function `virtnet_find_vqs': > > >> virtio_net.c:(.text+0x901fb5): undefined reference to `page_pool_create' > >ld: vmlinux.o: in function `add_recvbuf_mergeable.isra.0': > > >> virtio_net.c:(.text+0x905618): undefined reference to > > >> `page_pool_alloc_pages' > >ld: vmlinux.o: in function `xdp_linearize_page': > >virtio_net.c:(.text+0x906b6b): undefined 
reference to > > `page_pool_alloc_pages' > >ld: vmlinux.o: in function `mergeable_xdp_get_buf.isra.0': > >virtio_net.c:(.text+0x90728f): undefined reference to > > `page_pool_alloc_pages' > > > you need to tweak Kconfig to select PAGE_POOL I think. > Sure, thanks! > > -- > > 0-DAY CI Kernel Test Service > > https://github.com/intel/lkp-tests/wiki >
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin wrote: > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote: > > The implementation at the moment uses one page per packet in both the > > normal and XDP path. In addition, introducing a module parameter to enable > > or disable the usage of page pool (disabled by default). > > > > In single-core vm testing environments, it gives a modest performance gain > > in the normal path. > > Upstream codebase: 47.5 Gbits/sec > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > In multi-core vm testing environments, The most significant performance > > gain is observed in XDP cpumap: > > Upstream codebase: 1.38 Gbits/sec > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > > > With this foundation, we can further integrate page pool fragmentation and > > DMA map/unmap support. > > > > Signed-off-by: Liang Chen > > Why off by default? > I am guessing it sometimes has performance costs too? > > > What happens if we use page pool for big mode too? > The less modes we have the better... > > Sure, now I believe it makes sense to enable it by default. When the packet size is very small, it reduces the likelihood of skb coalescing. But such cases are rare. The usage of page pool for big mode is being evaluated now. Thanks! > > --- > > drivers/net/virtio_net.c | 188 ++- > > 1 file changed, 146 insertions(+), 42 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c5dca0d92e64..99c0ca0c1781 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444); > > module_param(gso, bool, 0444); > > module_param(napi_tx, bool, 0644); > > > > +static bool page_pool_enabled; > > +module_param(page_pool_enabled, bool, 0400); > > + > > /* FIXME: MTU in config. 
*/ > > #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) > > #define GOOD_COPY_LEN128 > > @@ -159,6 +162,9 @@ struct receive_queue { > > /* Chain pages by the private ptr. */ > > struct page *pages; > > > > + /* Page pool */ > > + struct page_pool *page_pool; > > + > > /* Average packet length for mergeable receive buffers. */ > > struct ewma_pkt_len mrg_avg_pkt_len; > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, > > unsigned int buflen, > > return skb; > > } > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page *page) > > +{ > > + if (rq->page_pool) > > + page_pool_put_full_page(rq->page_pool, page, true); > > + else > > + put_page(page); > > +} > > + > > /* Called from bottom half context */ > > static struct sk_buff *page_to_skb(struct virtnet_info *vi, > > struct receive_queue *rq, > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info > > *vi, > > hdr = skb_vnet_hdr(skb); > > memcpy(hdr, hdr_p, hdr_len); > > if (page_to_free) > > - put_page(page_to_free); > > + virtnet_put_page(rq, page_to_free); > > > > return skb; > > } > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, > > return ret; > > } > > > > -static void put_xdp_frags(struct xdp_buff *xdp) > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq) > > { > > struct skb_shared_info *shinfo; > > struct page *xdp_page; > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp) > > shinfo = xdp_get_shared_info_from_buff(xdp); > > for (i = 0; i < shinfo->nr_frags; i++) { > > xdp_page = skb_frag_page(&shinfo->frags[i]); > > - put_page(xdp_page); > > + virtnet_put_page(rq, xdp_page); > > } > > } > > } > > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct > > receive_queue *rq, > > if (page_off + *len + tailroom > PAGE_SIZE) > > return NULL; > > > > - page = alloc_page(GFP_ATOMIC); > > + if (rq->page_pool) > > + page = 
page_pool_dev_alloc_pages(rq->page_pool); > > + else > > + page = allo
Re: [PATCH net-next 1/5] virtio_net: Fix an unsafe reference to the page chain
On Sun, May 28, 2023 at 2:29 PM Michael S. Tsirkin wrote: > > On Fri, May 26, 2023 at 02:38:54PM +0800, Jason Wang wrote: > > On Fri, May 26, 2023 at 1:46 PM Liang Chen > > wrote: > > > > > > "private" of buffer page is currently used for big mode to chain pages. > > > But in mergeable mode, that offset of page could mean something else, > > > e.g. when page_pool page is used instead. So excluding mergeable mode to > > > avoid such a problem. > > > > If this issue happens only in the case of page_pool, it would be > > better to squash it there. > > > > Thanks > > > This is a tiny patch so I don't care. Generally it's ok > to first rework code then change functionality. > in this case what Jason says os right especially because > you then do not need to explain that current code is ok. > Sure. it will be squashed into the page pool enablement patch. Thanks! > > > > > > Signed-off-by: Liang Chen > > > --- > > > drivers/net/virtio_net.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > index 5a7f7a76b920..c5dca0d92e64 100644 > > > --- a/drivers/net/virtio_net.c > > > +++ b/drivers/net/virtio_net.c > > > @@ -497,7 +497,7 @@ static struct sk_buff *page_to_skb(struct > > > virtnet_info *vi, > > > return NULL; > > > > > > page = (struct page *)page->private; > > > - if (page) > > > + if (!vi->mergeable_rx_bufs && page) > > > give_pages(rq, page); > > > goto ok; > > > } > > > -- > > > 2.31.1 > > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next 1/5] virtio_net: Fix an unsafe reference to the page chain
On Sun, May 28, 2023 at 2:16 PM Michael S. Tsirkin wrote: > > On Fri, May 26, 2023 at 01:46:17PM +0800, Liang Chen wrote: > > "private" of buffer page is currently used for big mode to chain pages. > > But in mergeable mode, that offset of page could mean something else, > > e.g. when page_pool page is used instead. So excluding mergeable mode to > > avoid such a problem. > > > > Signed-off-by: Liang Chen > > Ugh the subject makes it looks like current code has a problem > but I don't think so because I don't think anything besides > big packets uses page->private. > > The reason patch is needed is because follow up patches > use page_pool. > pls adjust commit log and subject to make all this clear. > > > > --- > > drivers/net/virtio_net.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 5a7f7a76b920..c5dca0d92e64 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -497,7 +497,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info > > *vi, > > return NULL; > > > > page = (struct page *)page->private; > > - if (page) > > + if (!vi->mergeable_rx_bufs && page) > > To be safe let's limit to big packets too: > > if (!vi->mergeable_rx_bufs && vi->big_packets && page) > > > Sure, thanks! > > give_pages(rq, page); > > goto ok; > > } > > -- > > 2.31.1 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
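[Editorial note] The guard discussed in this patch exists because `page->private` is overloaded: it is a "next page" pointer only on the big-packets chaining path, while mergeable-mode pages (e.g. page_pool pages) use the same word for unrelated state. A userspace sketch of the reviewed condition (`!mergeable_rx_bufs && big_packets`); the struct and names here are illustrative stand-ins, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the page_to_skb() guard: walk page->private as a chain only
 * in big-packets mode; otherwise the word means something else and must
 * never be dereferenced as a page pointer. */

struct model_page { uintptr_t private_word; };

static struct model_page *model_chain_next(const struct model_page *page,
                                           bool mergeable_rx_bufs,
                                           bool big_packets)
{
    if (!mergeable_rx_bufs && big_packets)
        return (struct model_page *)page->private_word;
    return NULL;
}
```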
Re: [PATCH net-next 3/5] virtio_net: Add page pool fragmentation support
On Fri, May 26, 2023 at 4:29 PM Horatiu Vultur wrote: > > The 05/26/2023 13:46, Liang Chen wrote: > > Hi Liang, > > > > > To further enhance performance, implement page pool fragmentation > > support and introduce a module parameter to enable or disable it. > > > > In single-core vm testing environments, there is an additional performance > > gain observed in the normal path compared to the one packet per page > > approach. > > Upstream codebase: 47.5 Gbits/sec > > Upstream codebase with page pool: 50.2 Gbits/sec > > Upstream codebase with page pool fragmentation support: 52.3 Gbits/sec > > > > There is also some performance gain for XDP cpumap. > > Upstream codebase: 1.38 Gbits/sec > > Upstream codebase with page pool: 9.74 Gbits/sec > > Upstream codebase with page pool fragmentation: 10.3 Gbits/sec > > > > Signed-off-by: Liang Chen > > --- > > drivers/net/virtio_net.c | 72 ++-- > > 1 file changed, 55 insertions(+), 17 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 99c0ca0c1781..ac40b8c66c59 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -32,7 +32,9 @@ module_param(gso, bool, 0444); > > module_param(napi_tx, bool, 0644); > > > > static bool page_pool_enabled; > > +static bool page_pool_frag; > > module_param(page_pool_enabled, bool, 0400); > > +module_param(page_pool_frag, bool, 0400); > > > > /* FIXME: MTU in config. */ > > #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) > > @@ -909,23 +911,32 @@ static struct page *xdp_linearize_page(struct > > receive_queue *rq, > >struct page *p, > >int offset, > >int page_off, > > - unsigned int *len) > > + unsigned int *len, > > + unsigned int *pp_frag_offset) > > The 'unsigned int *pp_frag_offset' seems to be unaligned. > Sure, Thanks! > > { > > int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > struct page *page; > > + unsigned int pp_frag_offset_val; > > Please use reverse christmas tree notation here. 
The pp_frag_offset_val > needs to be declared before page; > Sure. Will do on v2. > > > > if (page_off + *len + tailroom > PAGE_SIZE) > > return NULL; > > > > if (rq->page_pool) > > - page = page_pool_dev_alloc_pages(rq->page_pool); > > + if (rq->page_pool->p.flags & PP_FLAG_PAGE_FRAG) > > + page = page_pool_dev_alloc_frag(rq->page_pool, > > pp_frag_offset, > > + PAGE_SIZE); > > Don't you need to check if pp_frag_offset is null? As you call once with > NULL. > At the moment, page_pool is enabled only for mergeable mode, and the path leading to a call with NULL pp_frag_offset is from small mode. But I will evaluate again whether it is beneficial to support page_pool for small mode on v2. Thanks. > > + else > > + page = page_pool_dev_alloc_pages(rq->page_pool); > > else > > page = alloc_page(GFP_ATOMIC); > > > > if (!page) > > return NULL; > > > > - memcpy(page_address(page) + page_off, page_address(p) + offset, > > *len); > > + pp_frag_offset_val = pp_frag_offset ? *pp_frag_offset : 0; > > + > > + memcpy(page_address(page) + page_off + pp_frag_offset_val, > > + page_address(p) + offset, *len); > > page_off += *len; > > > > while (--*num_buf) { > > @@ -948,7 +959,7 @@ static struct page *xdp_linearize_page(struct > > receive_queue *rq, > > goto err_buf; > > } > > > > - memcpy(page_address(page) + page_off, > > + memcpy(page_address(page) + page_off + pp_frag_offset_val, > >page_address(p) + off, buflen); > > page_off += buflen; > > virtnet_put_page(rq, p); > > @@ -1029,7 +1040,7 @@ static struct sk_buff *receive_small_xdp(struct > > net_device *dev, > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > >
Re: [PATCH net-next 5/5] virtio_net: Implement DMA pre-handler
On Fri, May 26, 2023 at 3:06 PM Jason Wang wrote: > > On Fri, May 26, 2023 at 1:47 PM Liang Chen wrote: > > > > Adding a DMA pre-handler that utilizes page pool for managing DMA mappings. > > When IOMMU is enabled, turning on the page_pool_dma_map module parameter to > > select page pool for DMA mapping management gives a significant reduction > > in the overhead caused by DMA mappings. > > > > In testing environments with a single core vm and qemu emulated IOMMU, > > significant performance improvements can be observed: > > Upstream codebase: 1.76 Gbits/sec > > Upstream codebase with page pool fragmentation support: 1.81 Gbits/sec > > Upstream codebase with page pool fragmentation and DMA support: 19.3 > > Gbits/sec > > > > Signed-off-by: Liang Chen > > --- > > drivers/net/virtio_net.c | 55 > > 1 file changed, 55 insertions(+) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index ac40b8c66c59..73cc4f9fe4fa 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -22,6 +22,7 @@ > > #include > > #include > > #include > > +#include > > > > static int napi_weight = NAPI_POLL_WEIGHT; > > module_param(napi_weight, int, 0444); > > @@ -33,8 +34,10 @@ module_param(napi_tx, bool, 0644); > > > > static bool page_pool_enabled; > > static bool page_pool_frag; > > +static bool page_pool_dma_map; > > module_param(page_pool_enabled, bool, 0400); > > module_param(page_pool_frag, bool, 0400); > > +module_param(page_pool_dma_map, bool, 0400); > > > > /* FIXME: MTU in config. 
*/ > > #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) > > @@ -3830,6 +3833,49 @@ static void virtnet_del_vqs(struct virtnet_info *vi) > > virtnet_free_queues(vi); > > } > > > > +static dma_addr_t virtnet_pp_dma_map_page(struct device *dev, struct page > > *page, > > + unsigned long offset, size_t size, > > + enum dma_data_direction dir, > > unsigned long attrs) > > +{ > > + struct page *head_page; > > + > > + if (dir != DMA_FROM_DEVICE) > > + return 0; > > + > > + head_page = compound_head(page); > > + return page_pool_get_dma_addr(head_page) > > + + (page - head_page) * PAGE_SIZE > > + + offset; > > So it's not a map, it is just a query from the dma address from the pool. > > > +} > > + > > +static bool virtnet_pp_dma_unmap_page(struct device *dev, dma_addr_t > > dma_handle, > > + size_t size, enum dma_data_direction > > dir, > > + unsigned long attrs) > > +{ > > + phys_addr_t phys; > > + > > + /* Handle only the RX direction, and sync the DMA memory only if > > it's not > > +* a DMA coherent architecture. > > +*/ > > + if (dir != DMA_FROM_DEVICE) > > + return false; > > + > > + if (dev_is_dma_coherent(dev)) > > + return true; > > + > > + phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle); > > This would be somehow slow. If we track the mapping by driver, it > would be much faster. > > More could be seen here: > > https://lists.linuxfoundation.org/pipermail/virtualization/2023-May/066778.html > > Thanks > Thanks for the information. I agree with your suggestion, and I will drop the last two patches on v2 and wait for Xuan's patch to land for dma mapping management. 
> > + if (WARN_ON(!phys)) > > + return false; > > + > > + arch_sync_dma_for_cpu(phys, size, dir); > > + return true; > > +} > > + > > +static struct virtqueue_pre_dma_ops virtnet_pp_pre_dma_ops = { > > + .map_page = virtnet_pp_dma_map_page, > > + .unmap_page = virtnet_pp_dma_unmap_page, > > +}; > > + > > static void virtnet_alloc_page_pool(struct receive_queue *rq) > > { > > struct virtio_device *vdev = rq->vq->vdev; > > @@ -3845,6 +3891,15 @@ static void virtnet_alloc_page_pool(struct > > receive_queue *rq) > > if (page_pool_frag) > > pp_params.flags |= PP_FLAG_PAGE_FRAG; > > > > + /* Consider using page pool DMA support only when DMA API is used. > > */ > > +
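[Editorial note] As Jason notes, `virtnet_pp_dma_map_page()` is really a lookup, not a map: the pool records one DMA address per compound page, and the address of any tail page is recovered as head address + (tail index × PAGE_SIZE) + offset. A sketch of that arithmetic with pages modeled as indices into the compound allocation (the real code derives the index via `compound_head()` and reads the base with `page_pool_get_dma_addr()`):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u

/* Model of the address lookup in virtnet_pp_dma_map_page(): recover a
 * tail page's DMA address from the compound head's stored address. */

static uint64_t model_pp_dma_addr(uint64_t head_dma_addr,
                                  unsigned int page_index_in_compound,
                                  unsigned long offset)
{
    return head_dma_addr
         + (uint64_t)page_index_in_compound * MODEL_PAGE_SIZE
         + offset;
}
```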
Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Fri, May 26, 2023 at 2:51 PM Jason Wang wrote: > > On Fri, May 26, 2023 at 1:46 PM Liang Chen wrote: > > > > The implementation at the moment uses one page per packet in both the > > normal and XDP path. > > It's better to explain why we need a page pool and how it can help the > performance. > Sure, I will include that on v2. > > In addition, introducing a module parameter to enable > > or disable the usage of page pool (disabled by default). > > If page pool wins for most of the cases, any reason to disable it by default? > Thank you for raising the point. It does make sense to enable it by default. > > > > In single-core vm testing environments, it gives a modest performance gain > > in the normal path. > > Upstream codebase: 47.5 Gbits/sec > > Upstream codebase + page_pool support: 50.2 Gbits/sec > > > > In multi-core vm testing environments, The most significant performance > > gain is observed in XDP cpumap: > > Upstream codebase: 1.38 Gbits/sec > > Upstream codebase + page_pool support: 9.74 Gbits/sec > > Please show more details on the test. E.g which kinds of tests have > you measured? > > Btw, it would be better to measure PPS as well. > Sure. It will be added on v2. > > > > With this foundation, we can further integrate page pool fragmentation and > > DMA map/unmap support. > > > > Signed-off-by: Liang Chen > > --- > > drivers/net/virtio_net.c | 188 ++- > > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do > the ifdef tricks at least. > Sure. it will be done on v2. 
> > 1 file changed, 146 insertions(+), 42 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c5dca0d92e64..99c0ca0c1781 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444); > > module_param(gso, bool, 0444); > > module_param(napi_tx, bool, 0644); > > > > +static bool page_pool_enabled; > > +module_param(page_pool_enabled, bool, 0400); > > + > > /* FIXME: MTU in config. */ > > #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) > > #define GOOD_COPY_LEN 128 > > @@ -159,6 +162,9 @@ struct receive_queue { > > /* Chain pages by the private ptr. */ > > struct page *pages; > > > > + /* Page pool */ > > + struct page_pool *page_pool; > > + > > /* Average packet length for mergeable receive buffers. */ > > struct ewma_pkt_len mrg_avg_pkt_len; > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, > > unsigned int buflen, > > return skb; > > } > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page *page) > > +{ > > + if (rq->page_pool) > > + page_pool_put_full_page(rq->page_pool, page, true); > > + else > > + put_page(page); > > +} > > + > > /* Called from bottom half context */ > > static struct sk_buff *page_to_skb(struct virtnet_info *vi, > >struct receive_queue *rq, > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info > > *vi, > > hdr = skb_vnet_hdr(skb); > > memcpy(hdr, hdr_p, hdr_len); > > if (page_to_free) > > - put_page(page_to_free); > > + virtnet_put_page(rq, page_to_free); > > > > return skb; > > } > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, > > return ret; > > } > > > > -static void put_xdp_frags(struct xdp_buff *xdp) > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq) > > { > > rq could be fetched from xdp_rxq_info? Yeah, it has the queue_index there. 
> > > struct skb_shared_info *shinfo; > > struct page *xdp_page; > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp) > > shinfo = xdp_get_shared_info_from_buff(xdp); > > for (i = 0; i < shinfo->nr_frags; i++) { > > xdp_page = skb_frag_page(&shinfo->frags[i]); > > - put_page(xdp_page); > > + virtnet_put_page(rq, xdp_page); > > } > > } > > } > > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct > > receive_queue *rq, > &
Re: [PATCH net-next 1/5] virtio_net: Fix an unsafe reference to the page chain
On Fri, May 26, 2023 at 2:39 PM Jason Wang wrote: > > On Fri, May 26, 2023 at 1:46 PM Liang Chen wrote: > > > > "private" of buffer page is currently used for big mode to chain pages. > > But in mergeable mode, that offset of page could mean something else, > > e.g. when page_pool page is used instead. So excluding mergeable mode to > > avoid such a problem. > > If this issue happens only in the case of page_pool, it would be > better to squash it there. > > Thanks Sure, thanks! > > > > > Signed-off-by: Liang Chen > > --- > > drivers/net/virtio_net.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 5a7f7a76b920..c5dca0d92e64 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -497,7 +497,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info > > *vi, > > return NULL; > > > > page = (struct page *)page->private; > > - if (page) > > + if (!vi->mergeable_rx_bufs && page) > > give_pages(rq, page); > > goto ok; > > } > > -- > > 2.31.1 > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH net-next 5/5] virtio_net: Implement DMA pre-handler
Add a DMA pre-handler that uses the page pool for managing DMA mappings. When an IOMMU is enabled, turning on the page_pool_dma_map module parameter to select the page pool for DMA mapping management gives a significant reduction in the overhead caused by DMA mappings.

In testing environments with a single-core VM and a QEMU-emulated IOMMU, significant performance improvements can be observed:

Upstream codebase: 1.76 Gbits/sec
Upstream codebase with page pool fragmentation support: 1.81 Gbits/sec
Upstream codebase with page pool fragmentation and DMA support: 19.3 Gbits/sec

Signed-off-by: Liang Chen --- drivers/net/virtio_net.c | 55 1 file changed, 55 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ac40b8c66c59..73cc4f9fe4fa 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -22,6 +22,7 @@ #include #include #include +#include static int napi_weight = NAPI_POLL_WEIGHT; module_param(napi_weight, int, 0444); @@ -33,8 +34,10 @@ module_param(napi_tx, bool, 0644); static bool page_pool_enabled; static bool page_pool_frag; +static bool page_pool_dma_map; module_param(page_pool_enabled, bool, 0400); module_param(page_pool_frag, bool, 0400); +module_param(page_pool_dma_map, bool, 0400); /* FIXME: MTU in config. 
*/ #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) @@ -3830,6 +3833,49 @@ static void virtnet_del_vqs(struct virtnet_info *vi) virtnet_free_queues(vi); } +static dma_addr_t virtnet_pp_dma_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, unsigned long attrs) +{ + struct page *head_page; + + if (dir != DMA_FROM_DEVICE) + return 0; + + head_page = compound_head(page); + return page_pool_get_dma_addr(head_page) + + (page - head_page) * PAGE_SIZE + + offset; +} + +static bool virtnet_pp_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, + size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ + phys_addr_t phys; + + /* Handle only the RX direction, and sync the DMA memory only if it's not +* a DMA coherent architecture. +*/ + if (dir != DMA_FROM_DEVICE) + return false; + + if (dev_is_dma_coherent(dev)) + return true; + + phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle); + if (WARN_ON(!phys)) + return false; + + arch_sync_dma_for_cpu(phys, size, dir); + return true; +} + +static struct virtqueue_pre_dma_ops virtnet_pp_pre_dma_ops = { + .map_page = virtnet_pp_dma_map_page, + .unmap_page = virtnet_pp_dma_unmap_page, +}; + static void virtnet_alloc_page_pool(struct receive_queue *rq) { struct virtio_device *vdev = rq->vq->vdev; @@ -3845,6 +3891,15 @@ static void virtnet_alloc_page_pool(struct receive_queue *rq) if (page_pool_frag) pp_params.flags |= PP_FLAG_PAGE_FRAG; + /* Consider using page pool DMA support only when DMA API is used. 
*/ + if (virtio_has_feature(vdev, VIRTIO_F_ACCESS_PLATFORM) && + page_pool_dma_map) { + pp_params.flags |= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV; + pp_params.dma_dir = DMA_FROM_DEVICE; + pp_params.max_len = PAGE_SIZE << pp_params.order; + virtqueue_register_pre_dma_ops(rq->vq, &virtnet_pp_pre_dma_ops); + } + rq->page_pool = page_pool_create(&pp_params); if (IS_ERR(rq->page_pool)) { dev_warn(&vdev->dev, "page pool creation failed: %ld\n", -- 2.31.1
[PATCH net-next 4/5] virtio_ring: Introduce DMA pre-handler
Currently, DMA operations on a virtio device's data buffers are encapsulated within the underlying virtqueue implementation. DMA map/unmap operations are performed for each data buffer attached to/detached from the virtqueue, transparently and invisibly to the higher-level virtio device drivers. This encapsulation makes it infeasible for device drivers to introduce mechanisms, such as page pool, that require explicit management of DMA map/unmap. Therefore, by inserting a pre-handler before the generic DMA map/unmap operations, virtio device drivers are given the opportunity to participate in DMA operations.

Signed-off-by: Liang Chen --- drivers/virtio/virtio_ring.c | 73 +--- include/linux/virtio.h | 18 + 2 files changed, 85 insertions(+), 6 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index c5310eaf8b46..a99641260555 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -213,6 +213,9 @@ struct vring_virtqueue { bool last_add_time_valid; ktime_t last_add_time; #endif + + /* DMA mapping Pre-handler for virtio device driver */ + struct virtqueue_pre_dma_ops *pre_dma_ops; }; static struct virtqueue *__vring_new_virtqueue(unsigned int index, @@ -369,6 +372,19 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq, return (dma_addr_t)sg_phys(sg); } + /* Allow virtio drivers to perform customized mapping operation, and +* fallback to the generic path if it fails to handle the mapping. 
+*/ + if (vq->pre_dma_ops && vq->pre_dma_ops->map_page) { + dma_addr_t addr; + + addr = vq->pre_dma_ops->map_page(vring_dma_dev(vq), + sg_page(sg), sg->offset, sg->length, + direction, 0); + if (addr) + return addr; + } + /* * We can't use dma_map_sg, because we don't use scatterlists in * the way it expects (we don't guarantee that the scatterlist @@ -432,6 +448,15 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq, flags = virtio16_to_cpu(vq->vq.vdev, desc->flags); + if (vq->pre_dma_ops && vq->pre_dma_ops->unmap_page) { + if (vq->pre_dma_ops->unmap_page(vring_dma_dev(vq), + virtio64_to_cpu(vq->vq.vdev, desc->addr), + virtio32_to_cpu(vq->vq.vdev, desc->len), + (flags & VRING_DESC_F_WRITE) ? + DMA_FROM_DEVICE : DMA_TO_DEVICE, 0)) + return; + } + dma_unmap_page(vring_dma_dev(vq), virtio64_to_cpu(vq->vq.vdev, desc->addr), virtio32_to_cpu(vq->vq.vdev, desc->len), @@ -456,14 +481,22 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq, extra[i].len, (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE); - } else { - dma_unmap_page(vring_dma_dev(vq), - extra[i].addr, - extra[i].len, - (flags & VRING_DESC_F_WRITE) ? - DMA_FROM_DEVICE : DMA_TO_DEVICE); + goto out; + } else if (vq->pre_dma_ops && vq->pre_dma_ops->unmap_page) { + if (vq->pre_dma_ops->unmap_page(vring_dma_dev(vq), + extra[i].addr, + extra[i].len, + (flags & VRING_DESC_F_WRITE) ? + DMA_FROM_DEVICE : DMA_TO_DEVICE, 0)) + goto out; } + dma_unmap_page(vring_dma_dev(vq), + extra[i].addr, + extra[i].len, + (flags & VRING_DESC_F_WRITE) ? + DMA_FROM_DEVICE : DMA_TO_DEVICE); + out: return extra[i].next; } @@ -1206,10 +1239,19 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq, (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE); } else { + if (vq->pre_dma_ops && vq->pre_dma_ops->unmap_page) { + if (vq->pre_dma_ops->unmap_page(vring_dma_dev(vq), + extra->addr, + extra->len, +
[PATCH net-next 3/5] virtio_net: Add page pool fragmentation support
To further enhance performance, implement page pool fragmentation support and introduce a module parameter to enable or disable it. In single-core VM testing environments, there is an additional performance gain observed in the normal path compared to the one-packet-per-page approach.

Upstream codebase: 47.5 Gbits/sec
Upstream codebase with page pool: 50.2 Gbits/sec
Upstream codebase with page pool fragmentation support: 52.3 Gbits/sec

There is also some performance gain for XDP cpumap.

Upstream codebase: 1.38 Gbits/sec
Upstream codebase with page pool: 9.74 Gbits/sec
Upstream codebase with page pool fragmentation: 10.3 Gbits/sec

Signed-off-by: Liang Chen --- drivers/net/virtio_net.c | 72 ++-- 1 file changed, 55 insertions(+), 17 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 99c0ca0c1781..ac40b8c66c59 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -32,7 +32,9 @@ module_param(gso, bool, 0444); module_param(napi_tx, bool, 0644); static bool page_pool_enabled; +static bool page_pool_frag; module_param(page_pool_enabled, bool, 0400); +module_param(page_pool_frag, bool, 0400); /* FIXME: MTU in config. 
*/ #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) @@ -909,23 +911,32 @@ static struct page *xdp_linearize_page(struct receive_queue *rq, struct page *p, int offset, int page_off, - unsigned int *len) + unsigned int *len, + unsigned int *pp_frag_offset) { int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); struct page *page; + unsigned int pp_frag_offset_val; if (page_off + *len + tailroom > PAGE_SIZE) return NULL; if (rq->page_pool) - page = page_pool_dev_alloc_pages(rq->page_pool); + if (rq->page_pool->p.flags & PP_FLAG_PAGE_FRAG) + page = page_pool_dev_alloc_frag(rq->page_pool, pp_frag_offset, + PAGE_SIZE); + else + page = page_pool_dev_alloc_pages(rq->page_pool); else page = alloc_page(GFP_ATOMIC); if (!page) return NULL; - memcpy(page_address(page) + page_off, page_address(p) + offset, *len); + pp_frag_offset_val = pp_frag_offset ? *pp_frag_offset : 0; + + memcpy(page_address(page) + page_off + pp_frag_offset_val, + page_address(p) + offset, *len); page_off += *len; while (--*num_buf) { @@ -948,7 +959,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq, goto err_buf; } - memcpy(page_address(page) + page_off, + memcpy(page_address(page) + page_off + pp_frag_offset_val, page_address(p) + off, buflen); page_off += buflen; virtnet_put_page(rq, p); @@ -1029,7 +1040,7 @@ static struct sk_buff *receive_small_xdp(struct net_device *dev, SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); xdp_page = xdp_linearize_page(rq, &num_buf, page, offset, header_offset, - &tlen); + &tlen, NULL); if (!xdp_page) goto err_xdp; @@ -1323,6 +1334,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi, unsigned int headroom = mergeable_ctx_to_headroom(ctx); struct page *xdp_page; unsigned int xdp_room; + unsigned int page_frag_offset = 0; /* Transient failure which in theory could occur if * in-flight packets from before XDP was enabled reach @@ -1356,7 +1368,8 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi, xdp_page = 
xdp_linearize_page(rq, num_buf, *page, offset, VIRTIO_XDP_HEADROOM, - len); + len, + &page_frag_offset); if (!xdp_page) return NULL; } else { @@ -1366,14 +1379,19 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi, return NULL; if (rq->page_pool) - xdp_page = page_pool_dev_alloc_pages(rq->page_pool); + if (rq->page_pool->p.flags & PP_FLAG_PAGE_FRAG) +
[PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
The implementation at the moment uses one page per packet in both the normal and XDP paths. In addition, introduce a module parameter to enable or disable the use of the page pool (disabled by default).

In single-core VM testing environments, it gives a modest performance gain in the normal path.

Upstream codebase: 47.5 Gbits/sec
Upstream codebase + page_pool support: 50.2 Gbits/sec

In multi-core VM testing environments, the most significant performance gain is observed in XDP cpumap:

Upstream codebase: 1.38 Gbits/sec
Upstream codebase + page_pool support: 9.74 Gbits/sec

With this foundation, we can further integrate page pool fragmentation and DMA map/unmap support.

Signed-off-by: Liang Chen --- drivers/net/virtio_net.c | 188 ++- 1 file changed, 146 insertions(+), 42 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index c5dca0d92e64..99c0ca0c1781 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -31,6 +31,9 @@ module_param(csum, bool, 0444); module_param(gso, bool, 0444); module_param(napi_tx, bool, 0644); +static bool page_pool_enabled; +module_param(page_pool_enabled, bool, 0400); + /* FIXME: MTU in config. */ #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) #define GOOD_COPY_LEN 128 @@ -159,6 +162,9 @@ struct receive_queue { /* Chain pages by the private ptr. */ struct page *pages; + /* Page pool */ + struct page_pool *page_pool; + /* Average packet length for mergeable receive buffers. 
*/ struct ewma_pkt_len mrg_avg_pkt_len; @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, unsigned int buflen, return skb; } +static void virtnet_put_page(struct receive_queue *rq, struct page *page) +{ + if (rq->page_pool) + page_pool_put_full_page(rq->page_pool, page, true); + else + put_page(page); +} + /* Called from bottom half context */ static struct sk_buff *page_to_skb(struct virtnet_info *vi, struct receive_queue *rq, @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, hdr = skb_vnet_hdr(skb); memcpy(hdr, hdr_p, hdr_len); if (page_to_free) - put_page(page_to_free); + virtnet_put_page(rq, page_to_free); return skb; } @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, return ret; } -static void put_xdp_frags(struct xdp_buff *xdp) +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq) { struct skb_shared_info *shinfo; struct page *xdp_page; @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp) shinfo = xdp_get_shared_info_from_buff(xdp); for (i = 0; i < shinfo->nr_frags; i++) { xdp_page = skb_frag_page(&shinfo->frags[i]); - put_page(xdp_page); + virtnet_put_page(rq, xdp_page); } } } @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct receive_queue *rq, if (page_off + *len + tailroom > PAGE_SIZE) return NULL; - page = alloc_page(GFP_ATOMIC); + if (rq->page_pool) + page = page_pool_dev_alloc_pages(rq->page_pool); + else + page = alloc_page(GFP_ATOMIC); + if (!page) return NULL; @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct receive_queue *rq, * is sending packet larger than the MTU. 
*/ if ((page_off + buflen + tailroom) > PAGE_SIZE) { - put_page(p); + virtnet_put_page(rq, p); goto err_buf; } memcpy(page_address(page) + page_off, page_address(p) + off, buflen); page_off += buflen; - put_page(p); + virtnet_put_page(rq, p); } /* Headroom does not contribute to packet length */ *len = page_off - VIRTIO_XDP_HEADROOM; return page; err_buf: - __free_pages(page, 0); + if (rq->page_pool) + page_pool_put_full_page(rq->page_pool, page, true); + else + __free_pages(page, 0); return NULL; } @@ -1144,7 +1165,7 @@ static void mergeable_buf_free(struct receive_queue *rq, int num_buf, } stats->bytes += len; page = virt_to_head_page(buf); - put_page(page); + virtnet_put_page(rq, page); } } @@ -1264,7 +1285,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev, cur_frag_size = truesize; xdp_frags_truesz += cur_frag_size; if (unlikel
[PATCH net-next 1/5] virtio_net: Fix an unsafe reference to the page chain
"private" of buffer page is currently used for big mode to chain pages. But in mergeable mode, that offset of page could mean something else, e.g. when page_pool page is used instead. So excluding mergeable mode to avoid such a problem. Signed-off-by: Liang Chen --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 5a7f7a76b920..c5dca0d92e64 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -497,7 +497,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, return NULL; page = (struct page *)page->private; - if (page) + if (!vi->mergeable_rx_bufs && page) give_pages(rq, page); goto ok; } -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization