Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists
On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin wrote: > > On Mon, Jul 31, 2023 at 07:15:31AM +, Dragos Tatulea wrote: > > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote: > > > On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote: > > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote: > > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote: > > > > > > The ndev was accessed on shutdown without a check if it actually > > > > > > exists. > > > > > > This triggered the crash pasted below. This patch simply adds a > > > > > > check > > > > > > before using ndev. > > > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0300 > > > > > > #PF: supervisor read access in kernel mode > > > > > > #PF: error_code(0x) - not-present page > > > > > > PGD 0 P4D 0 > > > > > > Oops: [#1] SMP > > > > > > CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0- > > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1 > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > > > > > rel-1.13.0-0- > > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > > > > > > RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > > > RSP: 0018:8881003bfdc0 EFLAGS: 00010286 > > > > > > RAX: 888103befba0 RBX: 888109d28008 RCX: 0017 > > > > > > RDX: 0001 RSI: 0212 RDI: 888109d28000 > > > > > > RBP: R08: 000d3a3a3882 R09: 0001 > > > > > > R10: R11: R12: 888109d28000 > > > > > > R13: 888109d28080 R14: fee1dead R15: > > > > > > FS: 7f4969e0be40() GS:88852c80() > > > > > > knlGS: > > > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > > > CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0 > > > > > > DR0: DR1: DR2: > > > > > > DR3: DR6: fffe0ff0 DR7: 0400 > > > > > > Call Trace: > > > > > > > > > > > > ? __die+0x20/0x60 > > > > > > ? page_fault_oops+0x14c/0x3c0 > > > > > > ? exc_page_fault+0x75/0x140 > > > > > > ? asm_exc_page_fault+0x22/0x30 > > > > > > ? 
mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > > > device_shutdown+0x13e/0x1e0 > > > > > > kernel_restart+0x36/0x90 > > > > > > __do_sys_reboot+0x141/0x210 > > > > > > ? vfs_writev+0xcd/0x140 > > > > > > ? handle_mm_fault+0x161/0x260 > > > > > > ? do_writev+0x6b/0x110 > > > > > > do_syscall_64+0x3d/0x90 > > > > > > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > > > > > RIP: 0033:0x7f496990fb56 > > > > > > RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: > > > > > > 00a9 > > > > > > RAX: ffda RBX: RCX: 7f496990fb56 > > > > > > RDX: 01234567 RSI: 28121969 RDI: fee1dead > > > > > > RBP: 7fffc7bde1d0 R08: R09: > > > > > > R10: R11: 0206 R12: > > > > > > R13: 7fffc7bddf10 R14: R15: 7fffc7bde2b8 > > > > > > > > > > > > CR2: 0300 > > > > > > ---[ end trace ]--- > > > > > > > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing") > > > > > > Signed-off-by: Dragos Tatulea > > > > > > --- > > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++- > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644 > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct > > > > > > auxiliary_device > > > > > > *auxdev) > > > > > > mgtdev = auxiliary_get_drvdata(auxdev); > > > > > > ndev = mgtdev->ndev; > > > > > > > > > > > > - free_irqs(ndev); > > > > > > + if (ndev) > > > > > > + free_irqs(ndev); > > > > > > } > > > > > > > > > > > > > > > > something I don't get: > > > > > irqs are allocated in mlx5_vdpa_dev_add > > > > > why are they not freed in mlx5_vdpa_dev_del? > > > > > > > > > That is a good point. I will try to find out. I also don't get why > > > > free_irq > > > > is > > > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. 
Maybe I > > > > can > > > > change > > > > that in a different refactoring. > > > > > > as it is I have no idea whether e.g. ndev can change > > > between these two call sites. that would make the check > > > pointless. > > > > > > > > this is what's creating all this mess. > > > > > > > > > > > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct > > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I > > > > see
Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
On Mon, 31 Jul 2023 19:36:06 -0700, Jakub Kicinski wrote: > On Tue, 1 Aug 2023 10:03:44 +0800 Xuan Zhuo wrote: > > > Virtio is either a SW > > > construct or offloaded to very capable HW, so either way cost of > > > creating an extra instance for DPDK or whatever else is very low. > > > > The extra instance is virtio-net? > > > > I think there is a gap. So let me give you a brief introduction of our case. > > > > Firstly, we donot use dpdk. We use the AF_XDP, because of that the AF_XDP is > > more simpler and easy to deploy for the nginx. > > > > We use the AF_XDP to speedup the UDP of the quic. By the library, the APP > > just > > needs some simple change. > > > > On the AliYun, the net driver is virtio-net. So we want the virtio-net > > support > > the AF_XDP. > > > > I guess what you mean is that we can speed up through the cooperation of > > devices > > and drivers, but our machines are public clouds, and we cannot change the > > back-end devices of virtio under normal circumstances. > > > > Here I do not know the different of the real hw and the virtio-net. > > You have this working and benchmarked or this is just and idea? This is not just an idea; as I said, it has been used at large scale. This is the library applications use for AF_XDP; we have open-sourced it: https://gitee.com/anolis/libxudp This is the Alibaba version of nginx; it has also been open-sourced and supports working with that library to use AF_XDP: http://tengine.taobao.org/ I shipped this in our kernel release Anolis/Alinux; the work was done about two years ago. You know, I pushed the first version to enable AF_XDP on virtio-net about two years ago. I never thought the job would be so difficult. The NIC (virtio-net) on AliYun can reach 24,000,000 PPS, so I think there is no performance difference from real HW. With AF_XDP, the UDP PPS is seven times that of the kernel UDP stack. > > What about io_uring zero copy w/ pre-registered buffers.
> You'll get csum offload, GSO, all the normal perf features. We tried io_uring, but it was not suitable for our scenario. Yes, AF_XDP currently does not support csum offload or GSO; that is indeed a small problem. Thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
On Mon, 31 Jul 2023 08:46:51 -0700, Jakub Kicinski wrote: > On Mon, 31 Jul 2023 09:23:29 +0800 Jason Wang wrote: > > > I'd step back and ask you why do you want to use AF_XDP with virtio. > > > Instead of bifurcating one virtio instance into different queues why > > > not create a separate virtio instance? > > > > I'm not sure I get this, but do you mean a separate virtio device that > > owns AF_XDP queues only? If I understand it correctly, bifurcating is > > one of the key advantages of AF_XDP. What's more, current virtio > > doesn't support being split at queue (pair) level. And it may still > > suffer from the yes/no DMA API issue. > > I guess we should step even further back and ask Xuan what the use case > is, because I'm not very sure. All we hear is "enable AF_XDP on virtio" > but AF_XDP is barely used on real HW, so why? Why only on real HW? I want to enable AF_XDP on virtio-net, so the user can send/receive packets via AF_XDP, bypassing the kernel stack. It has been used at large scale. I do not know what the problem with virtio-net is. Why do you think virtio-net cannot work with AF_XDP? > > Bifurcating makes (used to make?) some sense in case of real HW when you > had only one PCI function and had to subdivide it. Sorry, I do not get this. > Virtio is either a SW > construct or offloaded to very capable HW, so either way cost of > creating an extra instance for DPDK or whatever else is very low. The extra instance is virtio-net? I think there is a gap, so let me give you a brief introduction to our case. Firstly, we do not use DPDK. We use AF_XDP because it is simpler and easier to deploy for nginx. We use AF_XDP to speed up the UDP of QUIC; with the library, the app just needs some simple changes. On AliYun, the net driver is virtio-net, so we want virtio-net to support AF_XDP.
I guess what you mean is that we can speed up through the cooperation of devices and drivers, but our machines are public clouds, and we cannot change the back-end devices of virtio under normal circumstances. Here I do not know the difference between real HW and virtio-net. Thanks.
Re: [PATCH] virtio: a new vcpu watchdog driver
On 7/31/23 04:30, Xuan Zhuo wrote: On Mon, 31 Jul 2023 09:25:12 +0800, "zhanghao1" wrote: A new virtio pci driver is added for listening to vcpus inside the guest. Each vcpu creates a corresponding thread to periodically send data to qemu's back-end watchdog device. If a vCPU is in the stall state, data cannot be sent to the back-end virtio device. As a result, the back-end device can detect that the guest is in the stall state. The driver is mainly used with the back-end watchdog device of qemu. The qemu backend watchdog device is implemented as follows: https://lore.kernel.org/qemu-devel/20230705081813.411526-1-zhangh...@kylinos.cn/ I think we need to introduce this new device to the virtio spec first. And when having a watchdog driver: shouldn't you watch the virtio queue _itself_? There is no guarantee that the virtio queue makes forward progress ... Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
Re: [PATCH net-next] vsock: Remove unused function declarations
On Sat, Jul 29, 2023 at 08:20:36PM +0800, Yue Haibing wrote: These are never implemented since their introduction in commit d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Yue Haibing --- net/vmw_vsock/vmci_transport.h | 3 --- 1 file changed, 3 deletions(-) Good catch ;-) I'd have used "vsock/vmci:" as a prefix in the title. With or without: Reviewed-by: Stefano Garzarella diff --git a/net/vmw_vsock/vmci_transport.h b/net/vmw_vsock/vmci_transport.h index b7b072194282..dbda3ababa14 100644 --- a/net/vmw_vsock/vmci_transport.h +++ b/net/vmw_vsock/vmci_transport.h @@ -116,9 +116,6 @@ struct vmci_transport { spinlock_t lock; /* protects sk. */ }; -int vmci_transport_register(void); -void vmci_transport_unregister(void); - int vmci_transport_send_wrote_bh(struct sockaddr_vm *dst, struct sockaddr_vm *src); int vmci_transport_send_read_bh(struct sockaddr_vm *dst, -- 2.34.1
Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists
On Mon, Jul 31, 2023 at 07:15:31AM +, Dragos Tatulea wrote: > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote: > > On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote: > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote: > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote: > > > > > The ndev was accessed on shutdown without a check if it actually > > > > > exists. > > > > > This triggered the crash pasted below. This patch simply adds a check > > > > > before using ndev. > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0300 > > > > > #PF: supervisor read access in kernel mode > > > > > #PF: error_code(0x) - not-present page > > > > > PGD 0 P4D 0 > > > > > Oops: [#1] SMP > > > > > CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0- > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1 > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > > > > rel-1.13.0-0- > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > > > > > RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > > RSP: 0018:8881003bfdc0 EFLAGS: 00010286 > > > > > RAX: 888103befba0 RBX: 888109d28008 RCX: 0017 > > > > > RDX: 0001 RSI: 0212 RDI: 888109d28000 > > > > > RBP: R08: 000d3a3a3882 R09: 0001 > > > > > R10: R11: R12: 888109d28000 > > > > > R13: 888109d28080 R14: fee1dead R15: > > > > > FS: 7f4969e0be40() GS:88852c80() > > > > > knlGS: > > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > > CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0 > > > > > DR0: DR1: DR2: > > > > > DR3: DR6: fffe0ff0 DR7: 0400 > > > > > Call Trace: > > > > > > > > > > ? __die+0x20/0x60 > > > > > ? page_fault_oops+0x14c/0x3c0 > > > > > ? exc_page_fault+0x75/0x140 > > > > > ? asm_exc_page_fault+0x22/0x30 > > > > > ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > > device_shutdown+0x13e/0x1e0 > > > > > kernel_restart+0x36/0x90 > > > > > __do_sys_reboot+0x141/0x210 > > > > > ? vfs_writev+0xcd/0x140 > > > > > ? handle_mm_fault+0x161/0x260 > > > > > ? 
do_writev+0x6b/0x110 > > > > > do_syscall_64+0x3d/0x90 > > > > > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > > > > RIP: 0033:0x7f496990fb56 > > > > > RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: > > > > > 00a9 > > > > > RAX: ffda RBX: RCX: 7f496990fb56 > > > > > RDX: 01234567 RSI: 28121969 RDI: fee1dead > > > > > RBP: 7fffc7bde1d0 R08: R09: > > > > > R10: R11: 0206 R12: > > > > > R13: 7fffc7bddf10 R14: R15: 7fffc7bde2b8 > > > > > > > > > > CR2: 0300 > > > > > ---[ end trace ]--- > > > > > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing") > > > > > Signed-off-by: Dragos Tatulea > > > > > --- > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++- > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644 > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct > > > > > auxiliary_device > > > > > *auxdev) > > > > > mgtdev = auxiliary_get_drvdata(auxdev); > > > > > ndev = mgtdev->ndev; > > > > > > > > > > - free_irqs(ndev); > > > > > + if (ndev) > > > > > + free_irqs(ndev); > > > > > } > > > > > > > > > > > > > something I don't get: > > > > irqs are allocated in mlx5_vdpa_dev_add > > > > why are they not freed in mlx5_vdpa_dev_del? > > > > > > > That is a good point. I will try to find out. I also don't get why > > > free_irq > > > is > > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can > > > change > > > that in a different refactoring. > > > > as it is I have no idea whether e.g. ndev can change > > between these two call sites. that would make the check > > pointless. > > > > > > this is what's creating all this mess. 
> > > > > > > > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I > > > see. Or > > > am I missing something? > > > > and why do we care whether irqs are freed on shutdown? > > > Had to ask around a bit to find out the answer: there can be issues with kexec > IRQ allocation on some platforms. It is documented here [0] for mlx5_core. > > https://git
Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists
On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote: > On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote: > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote: > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote: > > > > The ndev was accessed on shutdown without a check if it actually exists. > > > > This triggered the crash pasted below. This patch simply adds a check > > > > before using ndev. > > > > > > > > BUG: kernel NULL pointer dereference, address: 0300 > > > > #PF: supervisor read access in kernel mode > > > > #PF: error_code(0x) - not-present page > > > > PGD 0 P4D 0 > > > > Oops: [#1] SMP > > > > CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0- > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1 > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0- > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > > > > RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > RSP: 0018:8881003bfdc0 EFLAGS: 00010286 > > > > RAX: 888103befba0 RBX: 888109d28008 RCX: 0017 > > > > RDX: 0001 RSI: 0212 RDI: 888109d28000 > > > > RBP: R08: 000d3a3a3882 R09: 0001 > > > > R10: R11: R12: 888109d28000 > > > > R13: 888109d28080 R14: fee1dead R15: > > > > FS: 7f4969e0be40() GS:88852c80() > > > > knlGS: > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0 > > > > DR0: DR1: DR2: > > > > DR3: DR6: fffe0ff0 DR7: 0400 > > > > Call Trace: > > > > > > > > ? __die+0x20/0x60 > > > > ? page_fault_oops+0x14c/0x3c0 > > > > ? exc_page_fault+0x75/0x140 > > > > ? asm_exc_page_fault+0x22/0x30 > > > > ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] > > > > device_shutdown+0x13e/0x1e0 > > > > kernel_restart+0x36/0x90 > > > > __do_sys_reboot+0x141/0x210 > > > > ? vfs_writev+0xcd/0x140 > > > > ? handle_mm_fault+0x161/0x260 > > > > ? 
do_writev+0x6b/0x110 > > > > do_syscall_64+0x3d/0x90 > > > > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > > > RIP: 0033:0x7f496990fb56 > > > > RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: 00a9 > > > > RAX: ffda RBX: RCX: 7f496990fb56 > > > > RDX: 01234567 RSI: 28121969 RDI: fee1dead > > > > RBP: 7fffc7bde1d0 R08: R09: > > > > R10: R11: 0206 R12: > > > > R13: 7fffc7bddf10 R14: R15: 7fffc7bde2b8 > > > > > > > > CR2: 0300 > > > > ---[ end trace ]--- > > > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing") > > > > Signed-off-by: Dragos Tatulea > > > > --- > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > index 9138ef2fb2c8..e2e7ebd71798 100644 > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device > > > > *auxdev) > > > > mgtdev = auxiliary_get_drvdata(auxdev); > > > > ndev = mgtdev->ndev; > > > > > > > > - free_irqs(ndev); > > > > + if (ndev) > > > > + free_irqs(ndev); > > > > } > > > > > > > > > > something I don't get: > > > irqs are allocated in mlx5_vdpa_dev_add > > > why are they not freed in mlx5_vdpa_dev_del? > > > > > That is a good point. I will try to find out. I also don't get why free_irq > > is > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can > > change > > that in a different refactoring. > > as it is I have no idea whether e.g. ndev can change > between these two call sites. that would make the check > pointless. > > > > this is what's creating all this mess. > > > > > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I > > see. Or > > am I missing something? > > and why do we care whether irqs are freed on shutdown? 
> Had to ask around a bit to find out the answer: there can be issues with kexec IRQ allocation on some platforms. It is documented here [0] for mlx5_core. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129 Thanks, Dragos