On 2025/08/29 17:34, Michael Tokarev wrote:
On 28.08.2025 03:57, Akihiko Odaki wrote:

The posted call trace indicates a lockup happens in the control path, but commit cefd67f25430 ("virtio-net: Fix num_buffers for version 1") changes the data path.

On the other hand, I can come up with a possible failure scenario with commit ce1431615292 ("virtio: Call set_features during reset"). Perhaps it changed the machine state before loading the migrated state, and caused a mismatch between them.

Yes, the problem commit is 0caed25cd171c6 "virtio: Call set_features
during reset"; the OP corrected himself in the next message (subject
line updated).

I need more information to understand the issue. A command line that reproduces the issue would be especially helpful, because options like mrg_rxbuf=, which you mentioned, show which features are enabled, and that is valuable information.

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1112044#69 for
the command line.  The guest is started by libvirtd.

Thank you, now I think I understand the problem.


Please note: this is stable-7.2 series, it is *not* master (I picked
up this commit to 7.2.x).  It'd be interesting to check if master is
affected too.  Unfortunately I never tried migration, and now I only
have my notebook, so can only migrate between two qemu instances on
the same box - which is probably okay too.

I think you need to backport commit 9379ea9db3c0 ("virtio-net: Add queues before loading them") and adda0ad56bd2 ("virtio-net: Add queues for RSS during migration"). Here is an explanation:

First, let me define two variables for conciseness:
N: the number of queue pairs
M: the maximum number of queue pairs, which is determined by
   n->max_queue_pairs

The problem is that QEMU inconsistently chose N for virtio-net in the past. Before commit 8c49756825da ("virtio-net: Add only one queue pair when realizing"):
1) realize() chose M.
2) set_features() chose: 1 (when RSS and MQ are disabled)
                         M (otherwise)

This was itself a problem: both RSS and MQ were disabled at realize() time, yet N was M, which is inconsistent with 2), and this inconsistency was guest-visible.
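
To make the mismatch concrete, here is a minimal toy model in C of the two rules above. It is only a sketch: the helper names n_at_realize_old() and n_after_set_features() are hypothetical and do not exist in QEMU, and the real code works on the device state rather than on plain integers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these helpers are hypothetical, not QEMU functions. */

/* Rule 1), the old realize() behavior: always M queue pairs. */
static int n_at_realize_old(int max_queue_pairs)
{
    assert(max_queue_pairs >= 1);
    return max_queue_pairs;
}

/* Rule 2), set_features(): 1 when RSS and MQ are both disabled, M otherwise. */
static int n_after_set_features(bool rss, bool mq, int max_queue_pairs)
{
    assert(max_queue_pairs >= 1);
    return (rss || mq) ? max_queue_pairs : 1;
}
```

With M = 4 and both features disabled, the first helper yields 4 while the second yields 1, which is the inconsistency described above.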

I wrote commit 8c49756825da ("virtio-net: Add only one queue pair when realizing") to make QEMU implement the behavior in 2) also during realization and fix the inconsistency, but it broke migration when the migrated VM had enabled VIRTIO_NET_F_RSS and VIRTIO_NET_F_MQ because it expected that N == M.

This is also why the backported commit broke migration: it accidentally fixed the inconsistency between the first reset state and the state after set_features(), and so caused the same problem.

I wrote commit 9379ea9db3c0 ("virtio-net: Add queues before loading them") to fix the issue and later complemented it with commit adda0ad56bd2 ("virtio-net: Add queues for RSS during migration").

There are several relevant commits because I could not fix the underlying problem at once, but hopefully this email clarifies how the two commits fixed it in the end.

Regards,
Akihiko Odaki
