Hi Lei,
On 03/20, Lei Yang wrote:
> Hi Dragos, Si-Wei
>
> 1. I applied [0] [1] [2] to the downstream kernel and then tested
> hotplug/unplug; this bug still exists.
>
> [0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
> [1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb
> map")
> [2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")
>
> 2. Si-Wei mentioned that two patches [1] [2] have been merged into the qemu
> master branch, so based on the test result they do not help fix this
> bug.
> [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
>
> 3. I found that the step that triggers the unhealthy report from the
> firmware is simply booting up the guest when using the qemu with the
> current patches. The host dmesg prints the unhealthy info immediately
> after the guest boots.
>
Did you set the locked memory to unlimited beforehand (ulimit -l unlimited)?
This could also be the cause of the FW issue.
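
In case it helps, below is a minimal sketch (not part of this series, just an
illustration) of what that limit means for the process: vhost-vdpa pins guest
pages, which are accounted against RLIMIT_MEMLOCK, so the limit has to be
raised for the QEMU process. The program itself is illustrative only.

/* Hypothetical sketch: the programmatic equivalent of "ulimit -l unlimited".
 * vhost-vdpa pins guest pages, accounted against RLIMIT_MEMLOCK; a too-low
 * limit can make the pinning fail. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl = { RLIM_INFINITY, RLIM_INFINITY };

    /* Raise the locked-memory limit for this process and its children
     * (e.g. a QEMU launched from this wrapper); needs sufficient privileges. */
    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("setrlimit(RLIMIT_MEMLOCK)");
        return 1;
    }
    printf("RLIMIT_MEMLOCK raised to unlimited\n");
    return 0;
}
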
Thanks,
Dragos
> Thanks
> Lei
>
>
> On Wed, Mar 19, 2025 at 8:14 AM Si-Wei Liu <[email protected]> wrote:
> >
> > Hi Lei,
> >
> > On 3/18/2025 7:06 AM, Lei Yang wrote:
> > > On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <[email protected]> wrote:
> > >> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <[email protected]> wrote:
> > >>> Hi Jonah
> > >>>
> > >>> I tested this series with the vhost_vdpa device based on a Mellanox
> > >>> ConnectX-6 DX NIC and hit a host kernel crash. This problem is
> > >>> easier to reproduce in the device hotplug/unplug scenario.
> > >>> For the core dump messages please see the attachment.
> > >>> FW version:
> > >>> # flint -d 0000:0d:00.0 q |grep Version
> > >>> FW Version: 22.44.1036
> > >>> Product Version: 22.44.1036
> > >> The trace looks more like an mlx5e driver bug rather than vDPA?
> > >>
> > >> [ 3256.256707] Call Trace:
> > >> [ 3256.256708] <IRQ>
> > >> [ 3256.256709] ? show_trace_log_lvl+0x1c4/0x2df
> > >> [ 3256.256714] ? show_trace_log_lvl+0x1c4/0x2df
> > >> [ 3256.256715] ? __build_skb+0x4a/0x60
> > >> [ 3256.256719] ? __die_body.cold+0x8/0xd
> > >> [ 3256.256720] ? die_addr+0x39/0x60
> > >> [ 3256.256725] ? exc_general_protection+0x1ec/0x420
> > >> [ 3256.256729] ? asm_exc_general_protection+0x22/0x30
> > >> [ 3256.256736] ? __build_skb_around+0x8c/0xf0
> > >> [ 3256.256738] __build_skb+0x4a/0x60
> > >> [ 3256.256740] build_skb+0x11/0xa0
> > >> [ 3256.256743] mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> > >> [ 3256.256872] mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> > >> [ 3256.256964] mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0
> > >> [mlx5_core]
> > >> [ 3256.257053] mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> > >> [ 3256.257139] mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> > >> [ 3256.257226] __napi_poll+0x29/0x170
> > >> [ 3256.257229] net_rx_action+0x29c/0x370
> > >> [ 3256.257231] handle_softirqs+0xce/0x270
> > >> [ 3256.257236] __irq_exit_rcu+0xa3/0xc0
> > >> [ 3256.257238] common_interrupt+0x80/0xa0
> > >>
> > > Hi Jason
> > >
> > >> Which kernel tree did you use? Can you please try net.git?
> > > I used the latest 9.6 downstream kernel and upstream qemu (with this
> > > series of patches applied) to test this scenario.
> > > First, based on my test results this bug is related to this series of
> > > patches. The conclusion is based on the following test results (all
> > > of them use the above-mentioned NIC driver):
> > > Case 1: downstream kernel + downstream qemu-kvm - pass
> > > Case 2: downstream kernel + upstream qemu (without this
> > > series of patches) - pass
> > > Case 3: downstream kernel + upstream qemu (with this
> > > series of patches) - failed, reproduction ratio 100%
> > Just as Dragos replied earlier, the firmware was already in a bogus
> > state before the panic, and I suspect that has something to do with
> > various bugs in the downstream kernel. You have to apply the 3 patches
> > to the downstream kernel before you kick off the relevant tests
> > again. Please pay special attention to which specific command or step
> > triggers the unhealthy report from the firmware, and let us know if you
> > still run into any of them.
> >
> > In addition, you seem to be testing the device hotplug and unplug
> > use cases, for which the latest qemu should have the related fixes
> > below [1][2]; in case they are somehow missing, that might also end up
> > leaving the firmware in a bad state to some extent. Just FYI.
> >
> > [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> > [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
> >
> > Thanks,
> > -Siwei
> > >
> > > Then I also tried to test it with the net.git tree, but the host hits
> > > a kernel panic when rebooting after compiling it. For the
> > > call trace info please review the following messages:
> > > [ 9.902851] No filesystem could mount root, tried:
> > > [ 9.902851]
> > > [ 9.909248] Kernel panic - not syncing: VFS: Unable to mount root
> > > fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0)
> > > [ 9.921335] CPU: 16 UID: 0 PID: 1 Comm: swapper/0 Not tainted
> > > 6.14.0-rc6+ #3
> > > [ 9.928398] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS
> > > 1.3.2 03/28/2023
> > > [ 9.935876] Call Trace:
> > > [ 9.938332] <TASK>
> > > [ 9.940436] panic+0x356/0x380
> > > [ 9.943513] mount_root_generic+0x2e7/0x300
> > > [ 9.947717] prepare_namespace+0x65/0x270
> > > [ 9.951731] kernel_init_freeable+0x2e2/0x310
> > > [ 9.956105] ? __pfx_kernel_init+0x10/0x10
> > > [ 9.960221] kernel_init+0x16/0x1d0
> > > [ 9.963715] ret_from_fork+0x2d/0x50
> > > [ 9.967303] ? __pfx_kernel_init+0x10/0x10
> > > [ 9.971404] ret_from_fork_asm+0x1a/0x30
> > > [ 9.975348] </TASK>
> > > [ 9.977555] Kernel Offset: 0xc00000 from 0xffffffff81000000
> > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > [ 10.101881] ---[ end Kernel panic - not syncing: VFS: Unable to
> > > mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or
> > > unknown-block(0,0) ]---
> > >
> > > # git log -1
> > > commit 4003c9e78778e93188a09d6043a74f7154449d43 (HEAD -> main,
> > > origin/main, origin/HEAD)
> > > Merge: 8f7617f45009 2409fa66e29a
> > > Author: Linus Torvalds <[email protected]>
> > > Date: Thu Mar 13 07:58:48 2025 -1000
> > >
> > > Merge tag 'net-6.14-rc7' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> > >
> > >
> > > Thanks
> > >
> > > Lei
> > >> Thanks
> > >>
> > >>> Best Regards
> > >>> Lei
> > >>>
> > >>> On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <[email protected]>
> > >>> wrote:
> > >>>> Current memory operations like pinning may take a lot of time at the
> > >>>> destination. Currently they are done after the source of the
> > >>>> migration is
> > >>>> stopped, and before the workload is resumed at the destination. This
> > >>>> is a
> > >>>> period where neither traffic can flow, nor the VM workload can continue
> > >>>> (downtime).
> > >>>>
> > >>>> We can do better, as we know the memory layout of the guest RAM at the
> > >>>> destination from the moment that all devices are initialized. So
> > >>>> moving that operation earlier allows QEMU to communicate the maps to
> > >>>> the kernel while the workload is still running on the source, so Linux
> > >>>> can start mapping them.
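
(For reference, a rough sketch of the kind of map message QEMU sends to the
vhost-vdpa device for each guest RAM section, so the kernel can start the
mapping/pinning early. This is not the code from this series, just an
illustration built on the vhost uAPI; vdpa_fd, iova, size and vaddr are
placeholders supplied by the caller.)

/* Rough sketch (not this series' code): describe one guest RAM mapping to
 * the vhost-vdpa kernel side so it can translate and pin it. */
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <linux/vhost_types.h>

static int vdpa_map_region(int vdpa_fd, uint64_t iova, uint64_t size, void *vaddr)
{
    struct vhost_msg_v2 msg;

    memset(&msg, 0, sizeof(msg));
    msg.type = VHOST_IOTLB_MSG_V2;
    msg.iotlb.iova  = iova;                       /* device-visible address */
    msg.iotlb.size  = size;
    msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr; /* QEMU virtual address */
    msg.iotlb.perm  = VHOST_ACCESS_RW;
    msg.iotlb.type  = VHOST_IOTLB_UPDATE;

    /* The kernel pins the backing pages while handling this write(). */
    return write(vdpa_fd, &msg, sizeof(msg)) == (ssize_t)sizeof(msg) ? 0 : -1;
}
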
> > >>>>
> > >>>> As a small drawback, there is a period during initialization where QEMU
> > >>>> cannot respond to QMP etc. In some testing, this time is about
> > >>>> 0.2 seconds. This may be further reduced (or increased) depending on the
> > >>>> vdpa driver and the platform hardware, and it is dominated by the cost
> > >>>> of memory pinning.
> > >>>>
> > >>>> This matches the time that we move out of the so-called downtime window.
> > >>>> The downtime is measured by checking the trace timestamps from the moment
> > >>>> the source suspends the device to the moment the destination starts the
> > >>>> eighth and last virtqueue pair. For a 39G guest, it goes from ~2.2526
> > >>>> secs to 2.0949.
> > >>>>
> > >>>> Future directions on top of this series may include moving more things
> > >>>> ahead of the migration time, like setting DRIVER_OK or performing actual
> > >>>> iterative migration of virtio-net devices.
> > >>>>
> > >>>> Comments are welcome.
> > >>>>
> > >>>> This series is a different approach from series [1]. As the title does
> > >>>> not reflect the changes anymore, please refer to the previous one for
> > >>>> the series history.
> > >>>>
> > >>>> This series is based on [2] and must be applied on top of it.
> > >>>>
> > >>>> [Jonah Palmer]
> > >>>> This series was rebased after [3] was pulled in, as [3] was a
> > >>>> prerequisite
> > >>>> fix for this series.
> > >>>>
> > >>>> v3:
> > >>>> ---
> > >>>> * Rebase
> > >>>>
> > >>>> v2:
> > >>>> ---
> > >>>> * Move the memory listener registration to vhost_vdpa_set_owner
> > >>>> function.
> > >>>> * Move the iova_tree allocation to net_vhost_vdpa_init.
> > >>>>
> > >>>> v1 at
> > >>>> https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> > >>>>
> > >>>> [1]
> > >>>> https://patchwork.kernel.org/project/qemu-devel/cover/[email protected]/
> > >>>> [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> > >>>> [3]
> > >>>> https://lore.kernel.org/qemu-devel/[email protected]/
> > >>>>
> > >>>> Eugenio Pérez (7):
> > >>>> vdpa: check for iova tree initialized at net_client_start
> > >>>> vdpa: reorder vhost_vdpa_set_backend_cap
> > >>>> vdpa: set backend capabilities at vhost_vdpa_init
> > >>>> vdpa: add listener_registered
> > >>>> vdpa: reorder listener assignment
> > >>>> vdpa: move iova_tree allocation to net_vhost_vdpa_init
> > >>>> vdpa: move memory listener register to vhost_vdpa_init
> > >>>>
> > >>>> hw/virtio/vhost-vdpa.c         | 98 ++++++++++++++++++++++------------
> > >>>> include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> > >>>> net/vhost-vdpa.c               | 34 ++----------
> > >>>> 3 files changed, 88 insertions(+), 66 deletions(-)
> > >>>>
> > >>>> --
> > >>>> 2.43.5
> > >>>>
> > >>>>
> >
>