On 02.02.26 18:42, Chaney, Ben wrote:
On 2/2/26, 9:07 AM, "Peter Xu" <[email protected]> wrote:
The latest version was
[PATCH v9 0/8] virtio-net: live-TAP local migration
https://lore.kernel.org/qemu-devel/[email protected]/
and I plan to post v10 soon.
Yes, thanks for re-raising this. If we have similar features being
proposed, we should always discuss whether we should stick with one of them
if that'll work for all.
IIUC Vladimir's solution indeed looks superior in that it has fewer
constraints, and also works for CPR mode.
This was previously discussed here:
https://lore.kernel.org/all/[email protected]/
My impression from that discussion is that
1. Vladimir's solution has some extra complexity
2. We are trying to standardize CPR as the primary method for local migration,
I believe we can do local migration of devices with FDs natively,
through a single migration channel, without CPR.
In my opinion, CPR breaks the migration architecture by creating
additional state that owns mixed pieces of different devices (and
sometimes not only FDs, I've heard).
Instead, we can keep all device state in the device state description,
including FDs if needed.
Also, the second migration channel, and the fact that on the target we
can't access QMP until we say "migrate" on the source, seem to me an
unnecessary burden on the user and management software; we can avoid this.
Next, as I understand it, the only reason we use CPR for devices is to
avoid reworking the initialization code of some devices, which want to
have FDs at an early stage. But that approach can't be applied
everywhere. An example is vhost-user-blk: you have to rework the
initialization code anyway, because if you simply pass FDs to the target
in CPR state while the source is still running, the target will simply
break the source by touching the FDs. And if we can't touch FDs until
the source stops, it's actually a usual migration, and we can pass FDs
through the main migration channel, doing the necessary things in
pre-save and post-load, as usual.
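To illustrate the underlying mechanism only (this is not QEMU code): a
minimal Python sketch of sending an FD together with a piece of state
over a single Unix socket using SCM_RIGHTS. The function names, the
state payload, and the pipe standing in for a backend FD are all
hypothetical; socket.send_fds()/recv_fds() need Python 3.9+ on a Unix
host.

```python
import os
import socket


def send_device_state(chan, state, fd):
    """Source side: send serialized state with the FD attached (SCM_RIGHTS)."""
    socket.send_fds(chan, [state], [fd])


def recv_device_state(chan):
    """Target side: receive the state and a new FD for the same open file."""
    data, fds, _flags, _addr = socket.recv_fds(chan, 4096, 1)
    return data, fds[0]


def demo():
    # Stand-in for the migration channel: a connected pair of Unix sockets.
    src, dst = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Stand-in for a device backend FD (hypothetical): write end of a pipe.
    r, w = os.pipe()

    send_device_state(src, b"hypothetical-device-state", w)
    os.close(w)  # the "source" drops its copy after handing the FD over

    state, new_fd = recv_device_state(dst)
    os.write(new_fd, b"ping")  # the "target" touches the FD only now
    payload = os.read(r, 4)

    for fd in (r, new_fd):
        os.close(fd)
    src.close()
    dst.close()
    return state, payload
```

Here the FD travels in the same message as the state it belongs to, and
the receiving side does not touch it until the sender has let go of it,
which is the ordering described above.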
Hmm, looking at patch 01 here, I understand that virtio-net/TAP
suffers from the same problem? That we actually must not use the passed
FDs on the target while the source is still running? But stopping the
source earlier means increasing the freeze time. I think if we can avoid
it (and we can), we should avoid it.
so the benefit of supporting non-CPR local transfers is slightly double-edged
So I think that if we expect more and more devices to support local
migration of FDs, and we have any chance of fitting them into the "old"
migration architecture (without CPR), we should try it.
--
Hm, I don't have a full picture of CPR. It's not only device migration,
but also some other things? It would be interesting to know how feasible
it is to move all these things into the main migration channel. That's a
question I can't answer now. But even if we keep CPR for some non-device
things, it still seems good to keep the whole state description for a
device in one place: in the device code, as it was historically.
--
Best regards,
Vladimir