Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

2023-07-31 Thread Jason Wang
On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin  wrote:
>
> On Mon, Jul 31, 2023 at 07:15:31AM +, Dragos Tatulea wrote:
> > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote:
> > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > > > The ndev was accessed on shutdown without a check if it actually 
> > > > > > exists.
> > > > > > This triggered the crash pasted below. This patch simply adds a 
> > > > > > check
> > > > > > before using ndev.
> > > > > >
> > > > > >  BUG: kernel NULL pointer dereference, address: 0300
> > > > > >  #PF: supervisor read access in kernel mode
> > > > > >  #PF: error_code(0x) - not-present page
> > > > > >  PGD 0 P4D 0
> > > > > >  Oops:  [#1] SMP
> > > > > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > > > > > rel-1.13.0-0-
> > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > >  RSP: 0018:8881003bfdc0 EFLAGS: 00010286
> > > > > >  RAX: 888103befba0 RBX: 888109d28008 RCX: 0017
> > > > > >  RDX: 0001 RSI: 0212 RDI: 888109d28000
> > > > > >  RBP:  R08: 000d3a3a3882 R09: 0001
> > > > > >  R10:  R11:  R12: 888109d28000
> > > > > >  R13: 888109d28080 R14: fee1dead R15: 
> > > > > >  FS:  7f4969e0be40() GS:88852c80()
> > > > > > knlGS:
> > > > > >  CS:  0010 DS:  ES:  CR0: 80050033
> > > > > >  CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0
> > > > > >  DR0:  DR1:  DR2: 
> > > > > >  DR3:  DR6: fffe0ff0 DR7: 0400
> > > > > >  Call Trace:
> > > > > >   
> > > > > >   ? __die+0x20/0x60
> > > > > >   ? page_fault_oops+0x14c/0x3c0
> > > > > >   ? exc_page_fault+0x75/0x140
> > > > > >   ? asm_exc_page_fault+0x22/0x30
> > > > > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > >   device_shutdown+0x13e/0x1e0
> > > > > >   kernel_restart+0x36/0x90
> > > > > >   __do_sys_reboot+0x141/0x210
> > > > > >   ? vfs_writev+0xcd/0x140
> > > > > >   ? handle_mm_fault+0x161/0x260
> > > > > >   ? do_writev+0x6b/0x110
> > > > > >   do_syscall_64+0x3d/0x90
> > > > > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > > >  RIP: 0033:0x7f496990fb56
> > > > > >  RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: 
> > > > > > 00a9
> > > > > >  RAX: ffda RBX:  RCX: 7f496990fb56
> > > > > >  RDX: 01234567 RSI: 28121969 RDI: fee1dead
> > > > > >  RBP: 7fffc7bde1d0 R08:  R09: 
> > > > > >  R10:  R11: 0206 R12: 
> > > > > >  R13: 7fffc7bddf10 R14:  R15: 7fffc7bde2b8
> > > > > >   
> > > > > >  CR2: 0300
> > > > > >  ---[ end trace  ]---
> > > > > >
> > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > > > Signed-off-by: Dragos Tatulea 
> > > > > > ---
> > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct 
> > > > > > auxiliary_device
> > > > > > *auxdev)
> > > > > > mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > > ndev = mgtdev->ndev;
> > > > > >
> > > > > > -   free_irqs(ndev);
> > > > > > +   if (ndev)
> > > > > > +   free_irqs(ndev);
> > > > > >  }
> > > > > >
> > > > >
> > > > > something I don't get:
> > > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > > why are they not freed in mlx5_vdpa_dev_del?
> > > > >
> > > > That is a good point. I will try to find out. I also don't get why
> > > > free_irq is called in the vdpa dev .free op instead of
> > > > mlx5_vdpa_dev_del. Maybe I can change that in a different refactoring.
> > >
> > > as it is I have no idea whether e.g. ndev can change
> > > between these two call sites. that would make the check
> > > pointless.
> > >
> > > > > this is what's creating all this mess.
> > > > >
> > > > >
> > > > Not quite: mlx5_vdpa_dev_del (which is the .dev_del op of struct
> > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
> > > > see

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-31 Thread Xuan Zhuo
On Mon, 31 Jul 2023 19:36:06 -0700, Jakub Kicinski  wrote:
> On Tue, 1 Aug 2023 10:03:44 +0800 Xuan Zhuo wrote:
> > > Virtio is either a SW
> > > construct or offloaded to very capable HW, so either way cost of
> > > creating an extra instance for DPDK or whatever else is very low.
> >
> > The extra instance is virtio-net?
> >
> > I think there is a gap, so let me give you a brief introduction to our case.
> >
> > First, we do not use DPDK. We use AF_XDP because it is simpler and easier
> > to deploy for nginx.
> >
> > We use AF_XDP to speed up UDP for QUIC. With the library, the app only
> > needs some simple changes.
> >
> > On AliYun, the net driver is virtio-net, so we want virtio-net to support
> > AF_XDP.
> >
> > I guess what you mean is that we can speed things up through cooperation
> > between devices and drivers, but our machines are in public clouds, and we
> > cannot change the back-end devices of virtio under normal circumstances.
> >
> > Here I do not know the difference between real HW and virtio-net.
>
> Do you have this working and benchmarked, or is this just an idea?

This is not just an idea. As I said, it has been used at large scale.

This is the library that apps use for AF_XDP. We have open-sourced it:
https://gitee.com/anolis/libxudp

This is the Alibaba version of nginx. It has also been open-sourced and
supports working with the library to use AF_XDP:
http://tengine.taobao.org/

I added support for this in our kernel release, Anolis/Alinux.

The work was done about two years ago. You know, I pushed the first version to
enable AF_XDP on virtio-net about two years ago; I never thought the job would
be so difficult.

The NIC (virtio-net) on AliYun can reach 24,000,000 PPS, so I think there is
no performance difference compared to real HW.

With AF_XDP, UDP PPS is seven times that of the kernel UDP stack.
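For scale, the quoted rates imply a tight per-packet time budget; a back-of-envelope helper (illustrative only, the numbers below are the thread's figures, not new measurements):

```c
#include <assert.h>

/* Nanoseconds of CPU budget per packet at a given packets-per-second rate.
 * At 24,000,000 PPS this is roughly 42 ns per packet, which is why a
 * kernel-bypass path such as AF_XDP matters at these rates. */
static double ns_per_packet(double pps)
{
	return 1e9 / pps;
}

/* If AF_XDP reaches rate r and is claimed to be 7x the kernel UDP stack,
 * the implied kernel-stack rate is r / speedup. */
static double implied_stack_pps(double af_xdp_pps, double speedup)
{
	return af_xdp_pps / speedup;
}
```

At 24 Mpps the budget works out to about 41.7 ns/packet, and a 7x speedup implies a kernel-stack baseline of roughly 3.4 Mpps.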

>
> What about io_uring zero copy w/ pre-registered buffers.
> You'll get csum offload, GSO, all the normal perf features.

We tried io-uring, but it was not suitable for our scenario.

Yes, for now AF_XDP does not support csum offload or GSO.
This is indeed a small problem.

Thanks.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-31 Thread Xuan Zhuo
On Mon, 31 Jul 2023 08:46:51 -0700, Jakub Kicinski  wrote:
> On Mon, 31 Jul 2023 09:23:29 +0800 Jason Wang wrote:
> > > I'd step back and ask you why do you want to use AF_XDP with virtio.
> > > Instead of bifurcating one virtio instance into different queues why
> > > not create a separate virtio instance?
> >
> > I'm not sure I get this, but do you mean a separate virtio device that
> > owns AF_XDP queues only? If I understand it correctly, bifurcating is
> > one of the key advantages of AF_XDP. What's more, current virtio
> > doesn't support being split at queue (pair) level. And it may still
> > suffer from the yes/no DMA API issue.
>
> I guess we should step even further back and ask Xuan what the use case
> is, because I'm not very sure. All we hear is "enable AF_XDP on virtio"
> but AF_XDP is barely used on real HW, so why?


Why just for real HW?

I want to enable AF_XDP on virtio-net so that users can send/recv packets via
AF_XDP, bypassing the kernel stack. That has been used at large scale.

I do not know what the problem with virtio-net is.
Why do you think virtio-net cannot work with AF_XDP?


>
> Bifurcating makes (used to make?) some sense in case of real HW when you
> had only one PCI function and had to subdivide it.

Sorry I do not get this.


> Virtio is either a SW
> construct or offloaded to very capable HW, so either way cost of
> creating an extra instance for DPDK or whatever else is very low.


The extra instance is virtio-net?

I think there is a gap, so let me give you a brief introduction to our case.

First, we do not use DPDK. We use AF_XDP because it is simpler and easier to
deploy for nginx.

We use AF_XDP to speed up UDP for QUIC. With the library, the app only needs
some simple changes.

On AliYun, the net driver is virtio-net, so we want virtio-net to support
AF_XDP.

I guess what you mean is that we can speed things up through cooperation
between devices and drivers, but our machines are in public clouds, and we
cannot change the back-end devices of virtio under normal circumstances.

Here I do not know the difference between real HW and virtio-net.


Thanks.


Re: [PATCH] virtio: a new vcpu watchdog driver

2023-07-31 Thread Hannes Reinecke

On 7/31/23 04:30, Xuan Zhuo wrote:

On Mon, 31 Jul 2023 09:25:12 +0800, "zhanghao1"  wrote:

A new virtio PCI driver is added to monitor vCPUs inside the guest. Each
vCPU creates a corresponding thread that periodically sends data to QEMU's
back-end watchdog device. If a vCPU stalls, data cannot be sent to the
back-end virtio device; as a result, the back-end device can detect that
the guest has stalled.

The driver is mainly used with the back-end watchdog device of QEMU.

The QEMU back-end watchdog device is implemented as follows:
https://lore.kernel.org/qemu-devel/20230705081813.411526-1-zhangh...@kylinos.cn/
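The heartbeat idea described above can be illustrated with a tiny userspace sketch (hypothetical names and types; this is not the proposed driver code, just the stall-detection logic it relies on): each vCPU thread records a timestamp when it sends data, and the back end declares a stall when the last heartbeat is older than the timeout.

```c
#include <assert.h>
#include <stdint.h>

/* Heartbeat bookkeeping for one vCPU (illustrative stand-in only). */
struct vcpu_hb {
	int64_t last_beat_ms;	/* last time this vCPU's thread sent data */
};

/* Called from the per-vCPU thread each time it sends to the device. */
static void heartbeat(struct vcpu_hb *hb, int64_t now_ms)
{
	hb->last_beat_ms = now_ms;
}

/* The back-end side would flag a stall when no data has arrived within
 * the timeout window. */
static int vcpu_stalled(const struct vcpu_hb *hb, int64_t now_ms,
			int64_t timeout_ms)
{
	return now_ms - hb->last_beat_ms > timeout_ms;
}
```

Note this detects only that the per-vCPU thread stopped sending; as Hannes points out below, it says nothing about whether the virtqueue itself makes forward progress.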



I think we need to introduce this new device to the virtio spec first.

And when adding a watchdog driver: shouldn't you watch the virtio queue
_itself_? There is no guarantee that the virtio queue makes forward
progress ...


Cheers,

Hannes
--
Dr. Hannes Reinecke    Kernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


Re: [PATCH net-next] vsock: Remove unused function declarations

2023-07-31 Thread Stefano Garzarella

On Sat, Jul 29, 2023 at 08:20:36PM +0800, Yue Haibing wrote:

These have never been implemented since their introduction in
commit d021c344051a ("VSOCK: Introduce VM Sockets")

Signed-off-by: Yue Haibing 
---
net/vmw_vsock/vmci_transport.h | 3 ---
1 file changed, 3 deletions(-)


Good catch ;-)

I'd use "vsock/vmci:" as a prefix in the title.

With or without:

Reviewed-by: Stefano Garzarella 



diff --git a/net/vmw_vsock/vmci_transport.h b/net/vmw_vsock/vmci_transport.h
index b7b072194282..dbda3ababa14 100644
--- a/net/vmw_vsock/vmci_transport.h
+++ b/net/vmw_vsock/vmci_transport.h
@@ -116,9 +116,6 @@ struct vmci_transport {
spinlock_t lock; /* protects sk. */
};

-int vmci_transport_register(void);
-void vmci_transport_unregister(void);
-
int vmci_transport_send_wrote_bh(struct sockaddr_vm *dst,
 struct sockaddr_vm *src);
int vmci_transport_send_read_bh(struct sockaddr_vm *dst,
--
2.34.1






Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

2023-07-31 Thread Michael S. Tsirkin
On Mon, Jul 31, 2023 at 07:15:31AM +, Dragos Tatulea wrote:
> On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote:
> > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > > The ndev was accessed on shutdown without a check if it actually 
> > > > > exists.
> > > > > This triggered the crash pasted below. This patch simply adds a check
> > > > > before using ndev.
> > > > > 
> > > > >  BUG: kernel NULL pointer dereference, address: 0300
> > > > >  #PF: supervisor read access in kernel mode
> > > > >  #PF: error_code(0x) - not-present page
> > > > >  PGD 0 P4D 0
> > > > >  Oops:  [#1] SMP
> > > > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > > > > rel-1.13.0-0-
> > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > >  RSP: 0018:8881003bfdc0 EFLAGS: 00010286
> > > > >  RAX: 888103befba0 RBX: 888109d28008 RCX: 0017
> > > > >  RDX: 0001 RSI: 0212 RDI: 888109d28000
> > > > >  RBP:  R08: 000d3a3a3882 R09: 0001
> > > > >  R10:  R11:  R12: 888109d28000
> > > > >  R13: 888109d28080 R14: fee1dead R15: 
> > > > >  FS:  7f4969e0be40() GS:88852c80()
> > > > > knlGS:
> > > > >  CS:  0010 DS:  ES:  CR0: 80050033
> > > > >  CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0
> > > > >  DR0:  DR1:  DR2: 
> > > > >  DR3:  DR6: fffe0ff0 DR7: 0400
> > > > >  Call Trace:
> > > > >   
> > > > >   ? __die+0x20/0x60
> > > > >   ? page_fault_oops+0x14c/0x3c0
> > > > >   ? exc_page_fault+0x75/0x140
> > > > >   ? asm_exc_page_fault+0x22/0x30
> > > > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > >   device_shutdown+0x13e/0x1e0
> > > > >   kernel_restart+0x36/0x90
> > > > >   __do_sys_reboot+0x141/0x210
> > > > >   ? vfs_writev+0xcd/0x140
> > > > >   ? handle_mm_fault+0x161/0x260
> > > > >   ? do_writev+0x6b/0x110
> > > > >   do_syscall_64+0x3d/0x90
> > > > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > >  RIP: 0033:0x7f496990fb56
> > > > >  RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: 
> > > > > 00a9
> > > > >  RAX: ffda RBX:  RCX: 7f496990fb56
> > > > >  RDX: 01234567 RSI: 28121969 RDI: fee1dead
> > > > >  RBP: 7fffc7bde1d0 R08:  R09: 
> > > > >  R10:  R11: 0206 R12: 
> > > > >  R13: 7fffc7bddf10 R14:  R15: 7fffc7bde2b8
> > > > >   
> > > > >  CR2: 0300
> > > > >  ---[ end trace  ]---
> > > > > 
> > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > > Signed-off-by: Dragos Tatulea 
> > > > > ---
> > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct 
> > > > > auxiliary_device
> > > > > *auxdev)
> > > > > mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > ndev = mgtdev->ndev;
> > > > >  
> > > > > -   free_irqs(ndev);
> > > > > +   if (ndev)
> > > > > +   free_irqs(ndev);
> > > > >  }
> > > > >  
> > > > 
> > > > something I don't get:
> > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > why are they not freed in mlx5_vdpa_dev_del?
> > > > 
> > > That is a good point. I will try to find out. I also don't get why
> > > free_irq is called in the vdpa dev .free op instead of mlx5_vdpa_dev_del.
> > > Maybe I can change that in a different refactoring.
> > 
> > as it is I have no idea whether e.g. ndev can change
> > between these two call sites. that would make the check
> > pointless.
> > 
> > > > this is what's creating all this mess.
> > > > 
> > > > 
> > > Not quite: mlx5_vdpa_dev_del (which is the .dev_del op of struct
> > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
> > > see. Or am I missing something?
> > 
> > and why do we care whether irqs are freed on shutdown?
> > 
> Had to ask around a bit to find out the answer: there can be issues with kexec
> IRQ allocation on some platforms. It is documented here [0] for mlx5_core.
> 
> https://git

Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

2023-07-31 Thread Dragos Tatulea via Virtualization
On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote:
> > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > The ndev was accessed on shutdown without a check if it actually exists.
> > > > This triggered the crash pasted below. This patch simply adds a check
> > > > before using ndev.
> > > > 
> > > >  BUG: kernel NULL pointer dereference, address: 0300
> > > >  #PF: supervisor read access in kernel mode
> > > >  #PF: error_code(0x) - not-present page
> > > >  PGD 0 P4D 0
> > > >  Oops:  [#1] SMP
> > > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-
> > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > >  RSP: 0018:8881003bfdc0 EFLAGS: 00010286
> > > >  RAX: 888103befba0 RBX: 888109d28008 RCX: 0017
> > > >  RDX: 0001 RSI: 0212 RDI: 888109d28000
> > > >  RBP:  R08: 000d3a3a3882 R09: 0001
> > > >  R10:  R11:  R12: 888109d28000
> > > >  R13: 888109d28080 R14: fee1dead R15: 
> > > >  FS:  7f4969e0be40() GS:88852c80()
> > > > knlGS:
> > > >  CS:  0010 DS:  ES:  CR0: 80050033
> > > >  CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0
> > > >  DR0:  DR1:  DR2: 
> > > >  DR3:  DR6: fffe0ff0 DR7: 0400
> > > >  Call Trace:
> > > >   
> > > >   ? __die+0x20/0x60
> > > >   ? page_fault_oops+0x14c/0x3c0
> > > >   ? exc_page_fault+0x75/0x140
> > > >   ? asm_exc_page_fault+0x22/0x30
> > > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > >   device_shutdown+0x13e/0x1e0
> > > >   kernel_restart+0x36/0x90
> > > >   __do_sys_reboot+0x141/0x210
> > > >   ? vfs_writev+0xcd/0x140
> > > >   ? handle_mm_fault+0x161/0x260
> > > >   ? do_writev+0x6b/0x110
> > > >   do_syscall_64+0x3d/0x90
> > > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > >  RIP: 0033:0x7f496990fb56
> > > >  RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: 00a9
> > > >  RAX: ffda RBX:  RCX: 7f496990fb56
> > > >  RDX: 01234567 RSI: 28121969 RDI: fee1dead
> > > >  RBP: 7fffc7bde1d0 R08:  R09: 
> > > >  R10:  R11: 0206 R12: 
> > > >  R13: 7fffc7bddf10 R14:  R15: 7fffc7bde2b8
> > > >   
> > > >  CR2: 0300
> > > >  ---[ end trace  ]---
> > > > 
> > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > Signed-off-by: Dragos Tatulea 
> > > > ---
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device
> > > > *auxdev)
> > > > mgtdev = auxiliary_get_drvdata(auxdev);
> > > > ndev = mgtdev->ndev;
> > > >  
> > > > -   free_irqs(ndev);
> > > > +   if (ndev)
> > > > +   free_irqs(ndev);
> > > >  }
> > > >  
> > > 
> > > something I don't get:
> > > irqs are allocated in mlx5_vdpa_dev_add
> > > why are they not freed in mlx5_vdpa_dev_del?
> > > 
> > That is a good point. I will try to find out. I also don't get why
> > free_irq is called in the vdpa dev .free op instead of mlx5_vdpa_dev_del.
> > Maybe I can change that in a different refactoring.
> 
> as it is I have no idea whether e.g. ndev can change
> between these two call sites. that would make the check
> pointless.
> 
> > > this is what's creating all this mess.
> > > 
> > > 
> > Not quite: mlx5_vdpa_dev_del (which is the .dev_del op of struct
> > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
> > see. Or am I missing something?
> 
> and why do we care whether irqs are freed on shutdown?
> 
Had to ask around a bit to find out the answer: there can be issues with kexec
IRQ allocation on some platforms. It is documented here [0] for mlx5_core.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129

Thanks,
Dragos
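The defensive pattern the patch adds — take one local snapshot of the pointer, then guard the use — can be sketched in plain userspace C (hypothetical types standing in for the mlx5 structures; this is an illustration of the pattern, not the driver code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mlx5_vdpa_mgmtdev / the net device. */
struct ndev { int irqs_freed; };
struct mgtdev { struct ndev *ndev; };

static void free_irqs(struct ndev *n)
{
	n->irqs_freed = 1;
}

/* Shutdown path: ndev may legitimately be NULL when no vdpa device was
 * ever added, so snapshot the pointer once and guard the use. */
static void shutdown_guarded(struct mgtdev *m)
{
	struct ndev *n = m->ndev;

	if (n)
		free_irqs(n);
}
```

As Michael's comment in the thread notes, the guard alone is not enough if ndev can change between the two call sites; some guarantee that dev_add/dev_del cannot race with shutdown (or explicit locking) is still required for the check to be meaningful.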