Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

2023-07-31 Thread Jason Wang
On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin  wrote:
>
> On Mon, Jul 31, 2023 at 07:15:31AM +, Dragos Tatulea wrote:
> > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote:
> > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > > > The ndev was accessed on shutdown without a check if it actually 
> > > > > > exists.
> > > > > > This triggered the crash pasted below. This patch simply adds a 
> > > > > > check
> > > > > > before using ndev.
> > > > > >
> > > > > >  BUG: kernel NULL pointer dereference, address: 0300
> > > > > >  #PF: supervisor read access in kernel mode
> > > > > >  #PF: error_code(0x) - not-present page
> > > > > >  PGD 0 P4D 0
> > > > > >  Oops:  [#1] SMP
> > > > > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > > > > > rel-1.13.0-0-
> > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > >  RSP: 0018:8881003bfdc0 EFLAGS: 00010286
> > > > > >  RAX: 888103befba0 RBX: 888109d28008 RCX: 0017
> > > > > >  RDX: 0001 RSI: 0212 RDI: 888109d28000
> > > > > >  RBP:  R08: 000d3a3a3882 R09: 0001
> > > > > >  R10:  R11:  R12: 888109d28000
> > > > > >  R13: 888109d28080 R14: fee1dead R15: 
> > > > > >  FS:  7f4969e0be40() GS:88852c80()
> > > > > > knlGS:
> > > > > >  CS:  0010 DS:  ES:  CR0: 80050033
> > > > > >  CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0
> > > > > >  DR0:  DR1:  DR2: 
> > > > > >  DR3:  DR6: fffe0ff0 DR7: 0400
> > > > > >  Call Trace:
> > > > > >   
> > > > > >   ? __die+0x20/0x60
> > > > > >   ? page_fault_oops+0x14c/0x3c0
> > > > > >   ? exc_page_fault+0x75/0x140
> > > > > >   ? asm_exc_page_fault+0x22/0x30
> > > > > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > >   device_shutdown+0x13e/0x1e0
> > > > > >   kernel_restart+0x36/0x90
> > > > > >   __do_sys_reboot+0x141/0x210
> > > > > >   ? vfs_writev+0xcd/0x140
> > > > > >   ? handle_mm_fault+0x161/0x260
> > > > > >   ? do_writev+0x6b/0x110
> > > > > >   do_syscall_64+0x3d/0x90
> > > > > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > > >  RIP: 0033:0x7f496990fb56
> > > > > >  RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: 
> > > > > > 00a9
> > > > > >  RAX: ffda RBX:  RCX: 7f496990fb56
> > > > > >  RDX: 01234567 RSI: 28121969 RDI: fee1dead
> > > > > >  RBP: 7fffc7bde1d0 R08:  R09: 
> > > > > >  R10:  R11: 0206 R12: 
> > > > > >  R13: 7fffc7bddf10 R14:  R15: 7fffc7bde2b8
> > > > > >   
> > > > > >  CR2: 0300
> > > > > >  ---[ end trace  ]---
> > > > > >
> > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > > > Signed-off-by: Dragos Tatulea 
> > > > > > ---
> > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct 
> > > > > > auxiliary_device
> > > > > > *auxdev)
> > > > > > mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > > ndev = mgtdev->ndev;
> > > > > >
> > > > > > -   free_irqs(ndev);
> > > > > > +   if (ndev)
> > > > > > +   free_irqs(ndev);
> > > > > >  }
> > > > > >
> > > > >
> > > > > something I don't get:
> > > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > > why are they not freed in mlx5_vdpa_dev_del?
> > > > >
> > > > That is a good point. I will try to find out. I also don't get why
> > > > free_irq is called in the vdpa dev .free op instead of
> > > > mlx5_vdpa_dev_del. Maybe I can change that in a different refactoring.
> > >
> > > as it is I have no idea whether e.g. ndev can change
> > > between these two call sites. that would make the check
> > > pointless.
> > >
> > > > > this is what's creating all this mess.
> > > > >
> > > > >
> > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del op of struct
> > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
> > > > see. Or am I missing something?

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-31 Thread Xuan Zhuo
On Mon, 31 Jul 2023 19:36:06 -0700, Jakub Kicinski  wrote:
> On Tue, 1 Aug 2023 10:03:44 +0800 Xuan Zhuo wrote:
> > > Virtio is either a SW
> > > construct or offloaded to very capable HW, so either way cost of
> > > creating an extra instance for DPDK or whatever else is very low.
> >
> > The extra instance is virtio-net?
> >
> > I think there is a gap. So let me give you a brief introduction of our case.
> >
> > Firstly, we do not use DPDK. We use AF_XDP, because AF_XDP is simpler and
> > easier to deploy for nginx.
> >
> > We use AF_XDP to speed up the UDP traffic of QUIC. With the library, the
> > app just needs some simple changes.
> >
> > On AliYun, the net driver is virtio-net, so we want virtio-net to support
> > AF_XDP.
> >
> > I guess what you mean is that we can speed up through the cooperation of
> > devices and drivers, but our machines are in public clouds, and we cannot
> > change the back-end devices of virtio under normal circumstances.
> >
> > Here I do not know the difference between real HW and virtio-net.
>
> You have this working and benchmarked, or is this just an idea?

This is not just an idea. As I said, it has been used at large scale.

This is the library for the app to use AF_XDP. We have open-sourced it:
https://gitee.com/anolis/libxudp

This is the Alibaba version of nginx. It has also been open-sourced, and it
supports working with the library to use AF_XDP:
http://tengine.taobao.org/

I have supported this in our kernel release, Anolis/Alinux.

The work was done about 2 years ago. You know, I pushed the first version to
enable AF_XDP on virtio-net about two years ago. I never thought the job would
be so difficult.

The NIC (virtio-net) on AliYun can reach 24,000,000 PPS, so I think there is
no performance difference from real HW.

With AF_XDP, the UDP PPS is seven times that of the kernel UDP stack.

>
> What about io_uring zero copy w/ pre-registered buffers.
> You'll get csum offload, GSO, all the normal perf features.

We tried io_uring, but it was not suitable for our scenario.

Yes, now the AF_XDP does not support the csum offload and GSO.
This is indeed a small problem.

Thanks.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-31 Thread Xuan Zhuo
On Mon, 31 Jul 2023 08:46:51 -0700, Jakub Kicinski  wrote:
> On Mon, 31 Jul 2023 09:23:29 +0800 Jason Wang wrote:
> > > I'd step back and ask you why do you want to use AF_XDP with virtio.
> > > Instead of bifurcating one virtio instance into different queues why
> > > not create a separate virtio instance?
> >
> > I'm not sure I get this, but do you mean a separate virtio device that
> > owns AF_XDP queues only? If I understand it correctly, bifurcating is
> > one of the key advantages of AF_XDP. What's more, current virtio
> > doesn't support being split at queue (pair) level. And it may still
> > suffer from the yes/no DMA API issue.
>
> I guess we should step even further back and ask Xuan what the use case
> is, because I'm not very sure. All we hear is "enable AF_XDP on virtio"
> but AF_XDP is barely used on real HW, so why?


Why just for real HW?

I want to enable AF_XDP on virtio-net, so users can send/recv packets via
AF_XDP, bypassing the kernel. That has been used at large scale.

I do not know what the problem with virtio-net is.
Why do you think virtio-net cannot work with AF_XDP?


>
> Bifurcating makes (used to make?) some sense in case of real HW when you
> had only one PCI function and had to subdivide it.

Sorry I do not get this.


> Virtio is either a SW
> construct or offloaded to very capable HW, so either way cost of
> creating an extra instance for DPDK or whatever else is very low.


The extra instance is virtio-net?

I think there is a gap. So let me give you a brief introduction of our case.

Firstly, we do not use DPDK. We use AF_XDP, because AF_XDP is simpler and
easier to deploy for nginx.

We use AF_XDP to speed up the UDP traffic of QUIC. With the library, the app
just needs some simple changes.

On AliYun, the net driver is virtio-net, so we want virtio-net to support
AF_XDP.

I guess what you mean is that we can speed up through the cooperation of
devices and drivers, but our machines are in public clouds, and we cannot
change the back-end devices of virtio under normal circumstances.

Here I do not know the difference between real HW and virtio-net.


Thanks.


Re: [PATCH] virtio: a new vcpu watchdog driver

2023-07-31 Thread Hannes Reinecke

On 7/31/23 04:30, Xuan Zhuo wrote:

On Mon, 31 Jul 2023 09:25:12 +0800, "zhanghao1"  wrote:

A new virtio PCI driver is added for listening to vCPUs
inside the guest. Each vCPU creates a corresponding thread to
periodically send data to qemu's back-end watchdog device.
If a vCPU is in the stall state, data cannot be sent to the
back-end virtio device. As a result, the back-end device
can detect that the guest is in the stall state.

The driver is mainly used with the back-end watchdog device of qemu.

The qemu back-end watchdog device is implemented as follows:
https://lore.kernel.org/qemu-devel/20230705081813.411526-1-zhangh...@kylinos.cn/



I think we need to introduce this new device to the virtio spec first.

And when having a watchdog driver: shouldn't you watch the virtio queue
_itself_? There is no guarantee that the virtio queue makes forward
progress ...


Cheers,

Hannes
--
Dr. Hannes Reinecke    Kernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


Re: [PATCH net-next] vsock: Remove unused function declarations

2023-07-31 Thread Stefano Garzarella

On Sat, Jul 29, 2023 at 08:20:36PM +0800, Yue Haibing wrote:

These are never implemented since introduction in
commit d021c344051a ("VSOCK: Introduce VM Sockets")

Signed-off-by: Yue Haibing 
---
net/vmw_vsock/vmci_transport.h | 3 ---
1 file changed, 3 deletions(-)


Good catch ;-)

I'd have used "vsock/vmci:" as a prefix in the title.

With or without:

Reviewed-by: Stefano Garzarella 



diff --git a/net/vmw_vsock/vmci_transport.h b/net/vmw_vsock/vmci_transport.h
index b7b072194282..dbda3ababa14 100644
--- a/net/vmw_vsock/vmci_transport.h
+++ b/net/vmw_vsock/vmci_transport.h
@@ -116,9 +116,6 @@ struct vmci_transport {
spinlock_t lock; /* protects sk. */
};

-int vmci_transport_register(void);
-void vmci_transport_unregister(void);
-
int vmci_transport_send_wrote_bh(struct sockaddr_vm *dst,
 struct sockaddr_vm *src);
int vmci_transport_send_read_bh(struct sockaddr_vm *dst,
--
2.34.1






Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

2023-07-31 Thread Michael S. Tsirkin
On Mon, Jul 31, 2023 at 07:15:31AM +, Dragos Tatulea wrote:
> On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote:
> > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > > The ndev was accessed on shutdown without a check if it actually 
> > > > > exists.
> > > > > This triggered the crash pasted below. This patch simply adds a check
> > > > > before using ndev.
> > > > > 
> > > > >  BUG: kernel NULL pointer dereference, address: 0300
> > > > >  #PF: supervisor read access in kernel mode
> > > > >  #PF: error_code(0x) - not-present page
> > > > >  PGD 0 P4D 0
> > > > >  Oops:  [#1] SMP
> > > > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > > > > rel-1.13.0-0-
> > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > >  RSP: 0018:8881003bfdc0 EFLAGS: 00010286
> > > > >  RAX: 888103befba0 RBX: 888109d28008 RCX: 0017
> > > > >  RDX: 0001 RSI: 0212 RDI: 888109d28000
> > > > >  RBP:  R08: 000d3a3a3882 R09: 0001
> > > > >  R10:  R11:  R12: 888109d28000
> > > > >  R13: 888109d28080 R14: fee1dead R15: 
> > > > >  FS:  7f4969e0be40() GS:88852c80()
> > > > > knlGS:
> > > > >  CS:  0010 DS:  ES:  CR0: 80050033
> > > > >  CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0
> > > > >  DR0:  DR1:  DR2: 
> > > > >  DR3:  DR6: fffe0ff0 DR7: 0400
> > > > >  Call Trace:
> > > > >   
> > > > >   ? __die+0x20/0x60
> > > > >   ? page_fault_oops+0x14c/0x3c0
> > > > >   ? exc_page_fault+0x75/0x140
> > > > >   ? asm_exc_page_fault+0x22/0x30
> > > > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > >   device_shutdown+0x13e/0x1e0
> > > > >   kernel_restart+0x36/0x90
> > > > >   __do_sys_reboot+0x141/0x210
> > > > >   ? vfs_writev+0xcd/0x140
> > > > >   ? handle_mm_fault+0x161/0x260
> > > > >   ? do_writev+0x6b/0x110
> > > > >   do_syscall_64+0x3d/0x90
> > > > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > >  RIP: 0033:0x7f496990fb56
> > > > >  RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: 
> > > > > 00a9
> > > > >  RAX: ffda RBX:  RCX: 7f496990fb56
> > > > >  RDX: 01234567 RSI: 28121969 RDI: fee1dead
> > > > >  RBP: 7fffc7bde1d0 R08:  R09: 
> > > > >  R10:  R11: 0206 R12: 
> > > > >  R13: 7fffc7bddf10 R14:  R15: 7fffc7bde2b8
> > > > >   
> > > > >  CR2: 0300
> > > > >  ---[ end trace  ]---
> > > > > 
> > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > > Signed-off-by: Dragos Tatulea 
> > > > > ---
> > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct 
> > > > > auxiliary_device
> > > > > *auxdev)
> > > > > mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > ndev = mgtdev->ndev;
> > > > >  
> > > > > -   free_irqs(ndev);
> > > > > +   if (ndev)
> > > > > +   free_irqs(ndev);
> > > > >  }
> > > > >  
> > > > 
> > > > something I don't get:
> > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > why are they not freed in mlx5_vdpa_dev_del?
> > > > 
> > > That is a good point. I will try to find out. I also don't get why
> > > free_irq is called in the vdpa dev .free op instead of
> > > mlx5_vdpa_dev_del. Maybe I can change that in a different refactoring.
> > 
> > as it is I have no idea whether e.g. ndev can change
> > between these two call sites. that would make the check
> > pointless.
> > 
> > > > this is what's creating all this mess.
> > > > 
> > > > 
> > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del op of struct
> > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
> > > see. Or
> > > am I missing something?
> > 
> > and why do we care whether irqs are freed on shutdown?
> > 
> Had to ask around a bit to find out the answer: there can be issues with kexec
> IRQ allocation on some platforms. It is documented here [0] for mlx5_core.
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129

Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

2023-07-31 Thread Dragos Tatulea via Virtualization
On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 27, 2023 at 04:02:16PM +, Dragos Tatulea wrote:
> > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > The ndev was accessed on shutdown without a check if it actually exists.
> > > > This triggered the crash pasted below. This patch simply adds a check
> > > > before using ndev.
> > > > 
> > > >  BUG: kernel NULL pointer dereference, address: 0300
> > > >  #PF: supervisor read access in kernel mode
> > > >  #PF: error_code(0x) - not-present page
> > > >  PGD 0 P4D 0
> > > >  Oops:  [#1] SMP
> > > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-
> > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > >  RSP: 0018:8881003bfdc0 EFLAGS: 00010286
> > > >  RAX: 888103befba0 RBX: 888109d28008 RCX: 0017
> > > >  RDX: 0001 RSI: 0212 RDI: 888109d28000
> > > >  RBP:  R08: 000d3a3a3882 R09: 0001
> > > >  R10:  R11:  R12: 888109d28000
> > > >  R13: 888109d28080 R14: fee1dead R15: 
> > > >  FS:  7f4969e0be40() GS:88852c80()
> > > > knlGS:
> > > >  CS:  0010 DS:  ES:  CR0: 80050033
> > > >  CR2: 0300 CR3: 0001051cd006 CR4: 00370eb0
> > > >  DR0:  DR1:  DR2: 
> > > >  DR3:  DR6: fffe0ff0 DR7: 0400
> > > >  Call Trace:
> > > >   
> > > >   ? __die+0x20/0x60
> > > >   ? page_fault_oops+0x14c/0x3c0
> > > >   ? exc_page_fault+0x75/0x140
> > > >   ? asm_exc_page_fault+0x22/0x30
> > > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > >   device_shutdown+0x13e/0x1e0
> > > >   kernel_restart+0x36/0x90
> > > >   __do_sys_reboot+0x141/0x210
> > > >   ? vfs_writev+0xcd/0x140
> > > >   ? handle_mm_fault+0x161/0x260
> > > >   ? do_writev+0x6b/0x110
> > > >   do_syscall_64+0x3d/0x90
> > > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > >  RIP: 0033:0x7f496990fb56
> > > >  RSP: 002b:7fffc7bdde88 EFLAGS: 0206 ORIG_RAX: 00a9
> > > >  RAX: ffda RBX:  RCX: 7f496990fb56
> > > >  RDX: 01234567 RSI: 28121969 RDI: fee1dead
> > > >  RBP: 7fffc7bde1d0 R08:  R09: 
> > > >  R10:  R11: 0206 R12: 
> > > >  R13: 7fffc7bddf10 R14:  R15: 7fffc7bde2b8
> > > >   
> > > >  CR2: 0300
> > > >  ---[ end trace  ]---
> > > > 
> > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > Signed-off-by: Dragos Tatulea 
> > > > ---
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device
> > > > *auxdev)
> > > > mgtdev = auxiliary_get_drvdata(auxdev);
> > > > ndev = mgtdev->ndev;
> > > >  
> > > > -   free_irqs(ndev);
> > > > +   if (ndev)
> > > > +   free_irqs(ndev);
> > > >  }
> > > >  
> > > 
> > > something I don't get:
> > > irqs are allocated in mlx5_vdpa_dev_add
> > > why are they not freed in mlx5_vdpa_dev_del?
> > > 
> > That is a good point. I will try to find out. I also don't get why
> > free_irq is called in the vdpa dev .free op instead of
> > mlx5_vdpa_dev_del. Maybe I can change that in a different refactoring.
> 
> as it is I have no idea whether e.g. ndev can change
> between these two call sites. that would make the check
> pointless.
> 
> > > this is what's creating all this mess.
> > > 
> > > 
> > Not quite: mlx5_vdpa_dev_del (which is a .dev_del op of struct
> > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
> > see. Or
> > am I missing something?
> 
> and why do we care whether irqs are freed on shutdown?
> 
Had to ask around a bit to find out the answer: there can be issues with kexec
IRQ allocation on some platforms. It is documented here [0] for mlx5_core.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129

Thanks,
Dragos

Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-31 Thread Jason Wang
On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin  wrote:
>
> On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > > They really shouldn't - any NIC that takes forever to
> > > program will create issues in the networking stack.
> >
> > Unfortunately, it's not rare as the device/cvq could be implemented
> > via firmware or software.
>
> Currently that means one either has sane firmware with a scheduler that
> can meet deadlines, or loses the ability to report errors back.
>
> > > But if they do they can always set this flag too.
> >
> > This may have false negatives and may confuse the management.
> >
> > Maybe we can extend the networking core to allow some device specific
> > configurations to be done with device specific lock without rtnl. For
> > example, split the set_channels to
> >
> > pre_set_channels
> > set_channels
> > post_set_channels
> >
> > The device specific part could be done in pre and post without a rtnl lock?
> >
> > Thanks
>
>
> Would the benefit be that errors can be reported to userspace then?
> Then maybe.  I think you will have to show how this works for at least
> one card besides virtio.

Even for virtio this seems not easy, as e.g. virtnet_send_command() and
netif_set_real_num_tx_queues() need to appear atomic to the networking
core.

I wonder if we can reconsider the idea of a timeout here, and choose a
sane value as a start.

Thanks

>
>
> --
> MST
>


Re: [PATCH net-next V4 2/3] virtio_net: support per queue interrupt coalesce command

2023-07-31 Thread Jason Wang


On 2023/7/28 13:46, Michael S. Tsirkin wrote:

On Thu, Jul 27, 2023 at 03:28:32PM +0200, Paolo Abeni wrote:

On Tue, 2023-07-25 at 16:07 +0300, Gavin Li wrote:

Add interrupt_coalesce config in send_queue and receive_queue to cache user
config.

Send per virtqueue interrupt moderation config to underlying device in
order to have more efficient interrupt moderation and cpu utilization of
guest VM.

Additionally, address all the VQs when updating the global configuration,
as now the individual VQs configuration can diverge from the global
configuration.

Signed-off-by: Gavin Li 
Reviewed-by: Dragos Tatulea 
Reviewed-by: Jiri Pirko 
Acked-by: Michael S. Tsirkin 

FTR, this patch is significantly different from the version previously
acked/reviewed, I'm unsure if all the reviewers are ok with the new
one.

[...]

still ok by me

Acked-by: Michael S. Tsirkin 

let's wait for Jason too.



I'm fine with this series (I've acked each patch).

Thanks





  static int virtnet_set_coalesce(struct net_device *dev,
				  struct ethtool_coalesce *ec,
				  struct kernel_ethtool_coalesce *kernel_coal,
				  struct netlink_ext_ack *extack)
  {
	struct virtnet_info *vi = netdev_priv(dev);
-	int ret, i, napi_weight;
+	int ret, queue_number, napi_weight;
	bool update_napi = false;

	/* Can't change NAPI weight if the link is up */
	napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
-	if (napi_weight ^ vi->sq[0].napi.weight) {
-		if (dev->flags & IFF_UP)
-			return -EBUSY;
-		else
-			update_napi = true;
+	for (queue_number = 0; queue_number < vi->max_queue_pairs; queue_number++) {
+		ret = virtnet_should_update_vq_weight(dev->flags, napi_weight,
+						      vi->sq[queue_number].napi.weight,
+						      &update_napi);
+		if (ret)
+			return ret;
+
+		if (update_napi) {
+			/* All queues that belong to [queue_number, queue_count] will be
+			 * updated for the sake of simplicity, which might not be necessary
It looks like the comment above still refers to the old code. Should
be:
[queue_number, vi->max_queue_pairs]

Otherwise LGTM, thanks!

Paolo



Re: [PATCH net-next V4 3/3] virtio_net: enable per queue interrupt coalesce feature

2023-07-31 Thread Jason Wang


On 2023/7/25 21:07, Gavin Li wrote:

Enable per queue interrupt coalesce feature bit in driver and validate its
dependency with control queue.

Signed-off-by: Gavin Li 
Reviewed-by: Dragos Tatulea 
Reviewed-by: Jiri Pirko 
Acked-by: Michael S. Tsirkin 
Reviewed-by: Heng Qi 



Acked-by: Jason Wang 

Thanks


Re: [PATCH net-next V4 2/3] virtio_net: support per queue interrupt coalesce command

2023-07-31 Thread Jason Wang


On 2023/7/25 21:07, Gavin Li wrote:

Add interrupt_coalesce config in send_queue and receive_queue to cache user
config.

Send per virtqueue interrupt moderation config to underlying device in
order to have more efficient interrupt moderation and cpu utilization of
guest VM.

Additionally, address all the VQs when updating the global configuration,
as now the individual VQs configuration can diverge from the global
configuration.

Signed-off-by: Gavin Li 
Reviewed-by: Dragos Tatulea 
Reviewed-by: Jiri Pirko 
Acked-by: Michael S. Tsirkin 
Reviewed-by: Heng Qi 



Acked-by: Jason Wang 

Thanks
