Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists
On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin wrote: > > On Mon, Jul 31, 2023 at 07:15:31AM +, Dragos Tatulea wrote: > > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote: > > > On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote: > > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote: > > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote: > > > > > > The ndev was accessed on shutdown without a check if it actually > > > > > > exists. > > > > > > This triggered the crash pasted below. This patch simply adds a > > > > > > check > > > > > > before using ndev. > > > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0300 > > > > > > #PF: supervisor read access in kernel mode > > > > > > #PF: error_code(0x) - not-present page > > > > > > PGD 0 P4D 0 > > > > > > Oops: [#1] SMP > > > > > > CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0- > > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1 > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > > > > > rel-1.13.0-0- > > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > > > > > > RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > > > RSP: 0018:8881003bfdc0 EFLAGS: 00010286 > > > > > > RAX: 888103befba0 RBX: 888109d28008 RCX: 0017 > > > > > > RDX: 0001 RSI: 0212 RDI: 888109d28000 > > > > > > RBP: R08: 000d3a3a3882 R09: 0001 > > > > > > R10: R11: R12: 888109d28000 > > > > > > R13: 888109d28080 R14: fee1dead R15: > > > > > > FS: 7f4969e0be40() GS:88852c80() > > > > > > knlGS: > > > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > > > CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0 > > > > > > DR0: DR1: DR2: > > > > > > DR3: DR6: fffe0ff0 DR7: 0400 > > > > > > Call Trace: > > > > > > > > > > > > ? __die+0x20/0x60 > > > > > > ? page_fault_oops+0x14c/0x3c0 > > > > > > ? exc_page_fault+0x75/0x140 > > > > > > ? asm_exc_page_fault+0x22/0x30 > > > > > > ? 
mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > > > device_shutdown+0x13e/0x1e0 > > > > > > kernel_restart+0x36/0x90 > > > > > > __do_sys_reboot+0x141/0x210 > > > > > > ? vfs_writev+0xcd/0x140 > > > > > > ? handle_mm_fault+0x161/0x260 > > > > > > ? do_writev+0x6b/0x110 > > > > > > do_syscall_64+0x3d/0x90 > > > > > > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > > > > > RIP: 0033:0x7f496990fb56 > > > > > > RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: > > > > > > 00a9 > > > > > > RAX: ffda RBX: RCX: 7f496990fb56 > > > > > > RDX: 01234567 RSI: 28121969 RDI: fee1dead > > > > > > RBP: 7fffc7bde1d0 R08: R09: > > > > > > R10: R11: 0206 R12: > > > > > > R13: 7fffc7bddf10 R14: R15: 7fffc7bde2b8 > > > > > > > > > > > > CR2: 0300 > > > > > > ---[ end trace ]--- > > > > > > > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing") > > > > > > Signed-off-by: Dragos Tatulea > > > > > > --- > > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++- > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644 > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct > > > > > > auxiliary_device > > > > > > *auxdev) > > > > > > mgtdev = auxiliary_get_drvdata(auxdev); > > > > > > ndev = mgtdev->ndev; > > > > > > > > > > > > - free_irqs(ndev); > > > > > > + if (ndev) > > > > > > + free_irqs(ndev); > > > > > > } > > > > > > > > > > > > > > > > something I don't get: > > > > > irqs are allocated in mlx5_vdpa_dev_add > > > > > why are they not freed in mlx5_vdpa_dev_del? > > > > > > > > > That is a good point. I will try to find out. I also don't get why > > > > free_irq > > > > is > > > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. 
Maybe I > > > > can > > > > change > > > > that in a different refactoring. > > > > > > as it is I have no idea whether e.g. ndev can change > > > between these two call sites. that would make the check > > > pointless. > > > > > > > > this is what's creating all this mess. > > > > > > > > > > > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct > > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I > > > >
Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
On Mon, 31 Jul 2023 19:36:06 -0700, Jakub Kicinski wrote: > On Tue, 1 Aug 2023 10:03:44 +0800 Xuan Zhuo wrote: > > > Virtio is either a SW > > > construct or offloaded to very capable HW, so either way the cost of > > > creating an extra instance for DPDK or whatever else is very low. > > > > The extra instance is virtio-net? > > > > I think there is a gap, so let me give you a brief introduction to our case. > > > > Firstly, we do not use DPDK. We use AF_XDP, because AF_XDP is > > simpler and easier to deploy for nginx. > > > > We use AF_XDP to speed up the UDP used by QUIC. With the library, the APP > > just > > needs some simple changes. > > > > On AliYun, the net driver is virtio-net. So we want virtio-net to > > support > > AF_XDP. > > > > I guess what you mean is that we can speed up through the cooperation of > > devices > > and drivers, but our machines are public clouds, and we cannot change the > > back-end devices of virtio under normal circumstances. > > > > Here I do not know the difference between real HW and virtio-net. > > You have this working and benchmarked, or is this just an idea? This is not just an idea. As I said, it has been used at large scale. This is the library that lets an APP use AF_XDP; we have open-sourced it: https://gitee.com/anolis/libxudp This is the Alibaba version of nginx. It is open source as well and supports working with the library to use AF_XDP: http://tengine.taobao.org/ I supported this on our kernel release Anolis/Alinux. The work was done about 2 years ago. You know, I pushed the first version to enable AF_XDP on virtio-net about two years ago; I never thought the job would be so difficult. The NIC (virtio-net) on AliYun can reach 24,000,000 PPS, so I think there is no performance difference from real HW. With AF_XDP, the UDP PPS is seven times that of the kernel UDP stack. > > What about io_uring zero copy w/ pre-registered buffers. 
> You'll get csum offload, GSO, all the normal perf features. We tried io_uring, but it was not suitable for our scenario. Yes, AF_XDP currently does not support csum offload and GSO. This is indeed a small problem. Thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
On Mon, 31 Jul 2023 08:46:51 -0700, Jakub Kicinski wrote: > On Mon, 31 Jul 2023 09:23:29 +0800 Jason Wang wrote: > > > I'd step back and ask you why do you want to use AF_XDP with virtio. > > > Instead of bifurcating one virtio instance into different queues why > > > not create a separate virtio instance? > > > > I'm not sure I get this, but do you mean a separate virtio device that > > owns AF_XDP queues only? If I understand it correctly, bifurcating is > > one of the key advantages of AF_XDP. What's more, current virtio > > doesn't support being split at queue (pair) level. And it may still > > suffer from the yes/no DMA API issue. > > I guess we should step even further back and ask Xuan what the use case > is, because I'm not very sure. All we hear is "enable AF_XDP on virtio" > but AF_XDP is barely used on real HW, so why? Why just for real HW? I want to enable AF_XDP on virtio-net, so that the user can send/recv packets via AF_XDP, bypassing the kernel. That has been used at large scale. I do not know what the problem with virtio-net is. Why do you think virtio-net cannot work with AF_XDP? > > Bifurcating makes (used to make?) some sense in case of real HW when you > had only one PCI function and had to subdivide it. Sorry, I do not get this. > Virtio is either a SW > construct or offloaded to very capable HW, so either way the cost of > creating an extra instance for DPDK or whatever else is very low. The extra instance is virtio-net? I think there is a gap, so let me give you a brief introduction to our case. Firstly, we do not use DPDK. We use AF_XDP, because AF_XDP is simpler and easier to deploy for nginx. We use AF_XDP to speed up the UDP used by QUIC. With the library, the APP just needs some simple changes. On AliYun, the net driver is virtio-net. So we want virtio-net to support AF_XDP. 
I guess what you mean is that we can speed up through the cooperation of devices and drivers, but our machines are public clouds, and we cannot change the back-end devices of virtio under normal circumstances. Here I do not know the difference between real HW and virtio-net. Thanks.
Re: [PATCH] virtio: a new vcpu watchdog driver
On 7/31/23 04:30, Xuan Zhuo wrote: On Mon, 31 Jul 2023 09:25:12 +0800, "zhanghao1" wrote: A new virtio PCI driver is added for listening to vCPUs inside the guest. Each vCPU creates a corresponding thread to periodically send data to qemu's back-end watchdog device. If a vCPU is in the stall state, data cannot be sent to the back-end virtio device. As a result, the back-end device can detect that the guest is in the stall state. The driver is mainly used with the back-end watchdog device of qemu. The qemu backend watchdog device is implemented as follows: https://lore.kernel.org/qemu-devel/20230705081813.411526-1-zhangh...@kylinos.cn/ I think we need to introduce this new device to the virtio spec first. And when having a watchdog driver: shouldn't you watch the virtio queue _itself_? There is no guarantee that the virtio queue makes forward progress ... Cheers, Hannes -- Dr. Hannes Reinecke, Kernel Storage Architect, h...@suse.de, +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
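The heartbeat scheme described above can be modelled in a few lines. This is a minimal, hypothetical sketch (none of these names are the proposed driver's API): each vCPU records a heartbeat "timestamp", and the backend flags any vCPU whose last heartbeat is older than a timeout. Note it only illustrates Hannes' concern indirectly: the model assumes heartbeats actually reach the backend, which is exactly what a stalled virtqueue would prevent.

```c
#include <assert.h>

#define MAX_VCPUS 8

/* Hypothetical backend state: last heartbeat per vCPU plus a timeout. */
struct watchdog {
	long last_beat[MAX_VCPUS];
	long timeout;
};

/* A vCPU thread would call this periodically with the current time. */
static void heartbeat(struct watchdog *w, int vcpu, long now)
{
	w->last_beat[vcpu] = now;
}

/* Backend check: return the index of the first stalled vCPU, or -1. */
static int find_stalled(const struct watchdog *w, int nvcpus, long now)
{
	for (int i = 0; i < nvcpus; i++)
		if (now - w->last_beat[i] > w->timeout)
			return i;
	return -1;
}
```

In the real design the "call" would be a buffer placed on a virtqueue and consumed by the qemu device; the model above only captures the staleness test.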
Re: [PATCH net-next] vsock: Remove unused function declarations
On Sat, Jul 29, 2023 at 08:20:36PM +0800, Yue Haibing wrote: These have never been implemented since their introduction in commit d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Yue Haibing --- net/vmw_vsock/vmci_transport.h | 3 --- 1 file changed, 3 deletions(-) Good catch ;-) I'd use "vsock/vmci:" as the prefix in the title. With or without: Reviewed-by: Stefano Garzarella diff --git a/net/vmw_vsock/vmci_transport.h b/net/vmw_vsock/vmci_transport.h index b7b072194282..dbda3ababa14 100644 --- a/net/vmw_vsock/vmci_transport.h +++ b/net/vmw_vsock/vmci_transport.h @@ -116,9 +116,6 @@ struct vmci_transport { spinlock_t lock; /* protects sk. */ }; -int vmci_transport_register(void); -void vmci_transport_unregister(void); - int vmci_transport_send_wrote_bh(struct sockaddr_vm *dst, struct sockaddr_vm *src); int vmci_transport_send_read_bh(struct sockaddr_vm *dst, -- 2.34.1
Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists
On Mon, Jul 31, 2023 at 07:15:31AM +, Dragos Tatulea wrote: > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote: > > On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote: > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote: > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote: > > > > > The ndev was accessed on shutdown without a check if it actually > > > > > exists. > > > > > This triggered the crash pasted below. This patch simply adds a check > > > > > before using ndev. > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0300 > > > > > #PF: supervisor read access in kernel mode > > > > > #PF: error_code(0x) - not-present page > > > > > PGD 0 P4D 0 > > > > > Oops: [#1] SMP > > > > > CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0- > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1 > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > > > > rel-1.13.0-0- > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > > > > > RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > > RSP: 0018:8881003bfdc0 EFLAGS: 00010286 > > > > > RAX: 888103befba0 RBX: 888109d28008 RCX: 0017 > > > > > RDX: 0001 RSI: 0212 RDI: 888109d28000 > > > > > RBP: R08: 000d3a3a3882 R09: 0001 > > > > > R10: R11: R12: 888109d28000 > > > > > R13: 888109d28080 R14: fee1dead R15: > > > > > FS: 7f4969e0be40() GS:88852c80() > > > > > knlGS: > > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > > CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0 > > > > > DR0: DR1: DR2: > > > > > DR3: DR6: fffe0ff0 DR7: 0400 > > > > > Call Trace: > > > > > > > > > > ? __die+0x20/0x60 > > > > > ? page_fault_oops+0x14c/0x3c0 > > > > > ? exc_page_fault+0x75/0x140 > > > > > ? asm_exc_page_fault+0x22/0x30 > > > > > ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > > device_shutdown+0x13e/0x1e0 > > > > > kernel_restart+0x36/0x90 > > > > > __do_sys_reboot+0x141/0x210 > > > > > ? vfs_writev+0xcd/0x140 > > > > > ? handle_mm_fault+0x161/0x260 > > > > > ? 
do_writev+0x6b/0x110 > > > > > do_syscall_64+0x3d/0x90 > > > > > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > > > > RIP: 0033:0x7f496990fb56 > > > > > RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: > > > > > 00a9 > > > > > RAX: ffda RBX: RCX: 7f496990fb56 > > > > > RDX: 01234567 RSI: 28121969 RDI: fee1dead > > > > > RBP: 7fffc7bde1d0 R08: R09: > > > > > R10: R11: 0206 R12: > > > > > R13: 7fffc7bddf10 R14: R15: 7fffc7bde2b8 > > > > > > > > > > CR2: 0300 > > > > > ---[ end trace ]--- > > > > > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing") > > > > > Signed-off-by: Dragos Tatulea > > > > > --- > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++- > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644 > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct > > > > > auxiliary_device > > > > > *auxdev) > > > > > mgtdev = auxiliary_get_drvdata(auxdev); > > > > > ndev = mgtdev->ndev; > > > > > > > > > > - free_irqs(ndev); > > > > > + if (ndev) > > > > > + free_irqs(ndev); > > > > > } > > > > > > > > > > > > > something I don't get: > > > > irqs are allocated in mlx5_vdpa_dev_add > > > > why are they not freed in mlx5_vdpa_dev_del? > > > > > > > That is a good point. I will try to find out. I also don't get why > > > free_irq > > > is > > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can > > > change > > > that in a different refactoring. > > > > as it is I have no idea whether e.g. ndev can change > > between these two call sites. that would make the check > > pointless. > > > > > > this is what's creating all this mess. 
> > > > > > > > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I > > > see. Or > > > am I missing something? > > > > and why do we care whether irqs are freed on shutdown? > > > Had to ask around a bit to find out the answer: there can be issues with kexec > IRQ allocation on some platforms. It is documented here [0] for mlx5_core. > >
Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists
On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote: > On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote: > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote: > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote: > > > > The ndev was accessed on shutdown without a check if it actually exists. > > > > This triggered the crash pasted below. This patch simply adds a check > > > > before using ndev. > > > > > > > > BUG: kernel NULL pointer dereference, address: 0300 > > > > #PF: supervisor read access in kernel mode > > > > #PF: error_code(0x) - not-present page > > > > PGD 0 P4D 0 > > > > Oops: [#1] SMP > > > > CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0- > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1 > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0- > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > > > > RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > RSP: 0018:8881003bfdc0 EFLAGS: 00010286 > > > > RAX: 888103befba0 RBX: 888109d28008 RCX: 0017 > > > > RDX: 0001 RSI: 0212 RDI: 888109d28000 > > > > RBP: R08: 000d3a3a3882 R09: 0001 > > > > R10: R11: R12: 888109d28000 > > > > R13: 888109d28080 R14: fee1dead R15: > > > > FS: 7f4969e0be40() GS:88852c80() > > > > knlGS: > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0 > > > > DR0: DR1: DR2: > > > > DR3: DR6: fffe0ff0 DR7: 0400 > > > > Call Trace: > > > > > > > > ? __die+0x20/0x60 > > > > ? page_fault_oops+0x14c/0x3c0 > > > > ? exc_page_fault+0x75/0x140 > > > > ? asm_exc_page_fault+0x22/0x30 > > > > ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > device_shutdown+0x13e/0x1e0 > > > > kernel_restart+0x36/0x90 > > > > __do_sys_reboot+0x141/0x210 > > > > ? vfs_writev+0xcd/0x140 > > > > ? handle_mm_fault+0x161/0x260 > > > > ? 
do_writev+0x6b/0x110 > > > > do_syscall_64+0x3d/0x90 > > > > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > > > RIP: 0033:0x7f496990fb56 > > > > RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: 00a9 > > > > RAX: ffda RBX: RCX: 7f496990fb56 > > > > RDX: 01234567 RSI: 28121969 RDI: fee1dead > > > > RBP: 7fffc7bde1d0 R08: R09: > > > > R10: R11: 0206 R12: > > > > R13: 7fffc7bddf10 R14: R15: 7fffc7bde2b8 > > > > > > > > CR2: 0300 > > > > ---[ end trace ]--- > > > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing") > > > > Signed-off-by: Dragos Tatulea > > > > --- > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > index 9138ef2fb2c8..e2e7ebd71798 100644 > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device > > > > *auxdev) > > > > mgtdev = auxiliary_get_drvdata(auxdev); > > > > ndev = mgtdev->ndev; > > > > > > > > - free_irqs(ndev); > > > > + if (ndev) > > > > + free_irqs(ndev); > > > > } > > > > > > > > > > something I don't get: > > > irqs are allocated in mlx5_vdpa_dev_add > > > why are they not freed in mlx5_vdpa_dev_del? > > > > > That is a good point. I will try to find out. I also don't get why free_irq > > is > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can > > change > > that in a different refactoring. > > as it is I have no idea whether e.g. ndev can change > between these two call sites. that would make the check > pointless. > > > > this is what's creating all this mess. > > > > > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I > > see. Or > > am I missing something? > > and why do we care whether irqs are freed on shutdown? 
> Had to ask around a bit to find out the answer: there can be issues with kexec IRQ allocation on some platforms. It is documented here [0] for mlx5_core. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129 Thanks, Dragos
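The shape of the one-line fix discussed above can be modelled in plain C. This is a hypothetical, simplified model (not the kernel code; struct and function names are illustrative): the shutdown path must tolerate a management device whose ndev was never created, i.e. when mlx5_vdpa_dev_add() has not run before shutdown is invoked.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the driver's real structures. */
struct ndev { int irqs_allocated; };
struct mgtdev { struct ndev *ndev; };

static void free_irqs(struct ndev *ndev)
{
	ndev->irqs_allocated = 0;	/* would deref NULL if unguarded */
}

/* Returns 1 if irqs were freed, 0 if there was no ndev to act on. */
static int mlx5v_shutdown_model(struct mgtdev *mgtdev)
{
	struct ndev *ndev = mgtdev->ndev;

	if (!ndev)		/* the guard added by the patch */
		return 0;
	free_irqs(ndev);
	return 1;
}
```

The open question in the thread remains whether ndev can change between the check and the use; a guard like this only helps if the pointer is stable across the shutdown path.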
Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop
On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin wrote: > > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote: > > > They really shouldn't - any NIC that takes forever to > > > program will create issues in the networking stack. > > > > Unfortunately, it's not rare as the device/cvq could be implemented > > via firmware or software. > > Currently that means one either has sane firmware with a scheduler that > can meet deadlines, or loses the ability to report errors back. > > > > But if they do they can always set this flag too. > > > > This may have false negatives and may confuse the management. > > > > Maybe we can extend the networking core to allow some device specific > > configurations to be done with a device specific lock without rtnl. For > > example, split the set_channels to > > > > pre_set_channels > > set_channels > > post_set_channels > > > > The device specific part could be done in pre and post without the rtnl lock? > > > > Thanks > > > Would the benefit be that errors can be reported to userspace then? > Then maybe. I think you will have to show how this works for at least > one card besides virtio. Even for virtio this seems not easy, as e.g. virtnet_send_command() and netif_set_real_num_tx_queues() need to appear atomic to the networking core. I wonder if we can reconsider the timeout approach here and choose a sane value as a start. Thanks > > > -- > MST >
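The "sane timeout" idea can be sketched as a bounded polling loop. This is an illustrative model only, with made-up names (wait_for_command, demo_done — nothing here is the real virtio API): instead of spinning forever waiting for the device to consume a control-queue command, the driver polls with a budget, yields between polls (cond_resched() in the kernel, modelled by a callback here), and reports an error on expiry instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*poll_fn)(void *ctx);	/* true once the device is done */
typedef void (*yield_fn)(void);		/* stand-in for cond_resched()  */

/* Poll up to max_polls times; 0 on completion, -1 on timeout. */
static int wait_for_command(poll_fn done, void *ctx,
			    yield_fn yield, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (done(ctx))
			return 0;	/* device consumed the command */
		yield();		/* let other tasks run meanwhile */
	}
	return -1;	/* timed out: report an error instead of hanging */
}

/* Demo poll: "completes" only after *ctx further polls. */
static bool demo_done(void *ctx)
{
	int *n = ctx;
	return (*n)-- <= 0;
}

static void demo_yield(void) { }
```

The hard part the thread identifies is not the loop itself but choosing max_polls: slow-but-correct firmware must not be misreported as broken.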
Re: [PATCH net-next V4 2/3] virtio_net: support per queue interrupt coalesce command
On 2023/7/28 13:46, Michael S. Tsirkin wrote: On Thu, Jul 27, 2023 at 03:28:32PM +0200, Paolo Abeni wrote: On Tue, 2023-07-25 at 16:07 +0300, Gavin Li wrote: Add interrupt_coalesce config in send_queue and receive_queue to cache user config. Send per virtqueue interrupt moderation config to underlying device in order to have more efficient interrupt moderation and CPU utilization of guest VM. Additionally, address all the VQs when updating the global configuration, as now the individual VQs configuration can diverge from the global configuration. Signed-off-by: Gavin Li Reviewed-by: Dragos Tatulea Reviewed-by: Jiri Pirko Acked-by: Michael S. Tsirkin FTR, this patch is significantly different from the version previously acked/reviewed, I'm unsure if all the reviewers are ok with the new one. [...] still ok by me Acked-by: Michael S. Tsirkin let's wait for Jason too. I'm fine with this series (I've acked each patch). Thanks static int virtnet_set_coalesce(struct net_device *dev, struct ethtool_coalesce *ec, struct kernel_ethtool_coalesce *kernel_coal, struct netlink_ext_ack *extack) { struct virtnet_info *vi = netdev_priv(dev); - int ret, i, napi_weight; + int ret, queue_number, napi_weight; bool update_napi = false; /* Can't change NAPI weight if the link is up */ napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0; - if (napi_weight ^ vi->sq[0].napi.weight) { - if (dev->flags & IFF_UP) - return -EBUSY; - else - update_napi = true; + for (queue_number = 0; queue_number < vi->max_queue_pairs; queue_number++) { + ret = virtnet_should_update_vq_weight(dev->flags, napi_weight, + vi->sq[queue_number].napi.weight, + &update_napi); + if (ret) + return ret; + + if (update_napi) { + /* All queues that belong to [queue_number, queue_count] will be +* updated for the sake of simplicity, which might not be necessary It looks like the comment above still refers to the old code. Should be: [queue_number, vi->max_queue_pairs] Otherwise LGTM, thanks! 
Paolo
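The interaction that Paolo's comment touches on — a global update now has to walk every queue because per-queue settings may have diverged — can be modelled with a toy structure (names are illustrative, not the virtio-net code):

```c
#include <assert.h>

#define VNET_NQ 4	/* toy queue count */

struct vnet_coal { int max_usecs; };
struct vnet_dev { struct vnet_coal q[VNET_NQ]; };

/* Global update: overwrites every queue's setting, since individual
 * queues may have diverged from the previous global value. */
static void set_global_coalesce(struct vnet_dev *d, int usecs)
{
	for (int i = 0; i < VNET_NQ; i++)
		d->q[i].max_usecs = usecs;
}

/* Per-queue update: lets one queue diverge from the global setting. */
static void set_queue_coalesce(struct vnet_dev *d, int queue, int usecs)
{
	d->q[queue].max_usecs = usecs;
}
```

This is the semantics the commit message describes ("address all the VQs when updating the global configuration"); the real driver additionally sends each setting to the device via the control virtqueue.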
Re: [PATCH net-next V4 3/3] virtio_net: enable per queue interrupt coalesce feature
On 2023/7/25 21:07, Gavin Li wrote: Enable per queue interrupt coalesce feature bit in driver and validate its dependency with control queue. Signed-off-by: Gavin Li Reviewed-by: Dragos Tatulea Reviewed-by: Jiri Pirko Acked-by: Michael S. Tsirkin Reviewed-by: Heng Qi Acked-by: Jason Wang Thanks
Re: [PATCH net-next V4 2/3] virtio_net: support per queue interrupt coalesce command
On 2023/7/25 21:07, Gavin Li wrote: Add interrupt_coalesce config in send_queue and receive_queue to cache user config. Send per virtqueue interrupt moderation config to underlying device in order to have more efficient interrupt moderation and CPU utilization of guest VM. Additionally, address all the VQs when updating the global configuration, as now the individual VQs configuration can diverge from the global configuration. Signed-off-by: Gavin Li Reviewed-by: Dragos Tatulea Reviewed-by: Jiri Pirko Acked-by: Michael S. Tsirkin Reviewed-by: Heng Qi Acked-by: Jason Wang Thanks