On Tue, Feb 03, 2026 at 12:57:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 02.02.26 18:42, Chaney, Ben wrote:
> > 
> > 
> > On 2/2/26, 9:07 AM, "Peter Xu" <[email protected]> wrote:
> > 
> > > > The latest version was
> > > > 
> > > > [PATCH v9 0/8] virtio-net: live-TAP local migration
> > > > https://lore.kernel.org/qemu-devel/[email protected]/
> > > > 
> > > > and I plan to post v10 soon.
> > 
> > 
> > > Yes, thanks for re-raising this. If we have similar features being
> > > proposed, we should always discuss whether we should stick with one of
> > > them if that'll work for all.
> > 
> > 
> > > IIUC Vladimir's solution indeed looks superior in that it has fewer
> > > constraints, and also works for CPR mode.
> > 
> > 
> > This was previously discussed here: 
> > https://lore.kernel.org/all/[email protected]/
> > 
> > My impression from that discussion is that
> > 
> > 1. Vladimir's solution has some extra complexity
> > 2. We are trying to standardize cpr as the primary method for local
> > migration,
> 
> I believe that we can do local migration of devices with FDs natively,
> through one migration channel, without CPR.
> 
> In my opinion, CPR breaks the migration architecture by creating
> additional state that owns mixed pieces of different devices (and
> sometimes not only FDs, I have heard).
> 
> Instead, we can keep all device state in the device state description,
> including FDs if needed.
> 
> Also, the second migration channel, and the fact that on the target we
> can't access QMP until we say "migrate" on the source, seem to me an
> unnecessary burden on the user and management software; we can avoid this.
> 
> Next, as I understand it, the only reason we use CPR for devices is to
> avoid reworking the initialization code of some devices that want to
> have FDs at an early stage. But that approach can't be applied
> everywhere. An example is vhost-user-blk: you have to rework the
> initialization code anyway, because if you simply pass FDs to the target
> in CPR state while the source is still running, the target will simply
> break the source by touching the FDs. And if we can't touch FDs until
> the source stops, it's actually a usual migration, and we can pass FDs
> through the main migration channel, doing the necessary things in
> pre-save and post-load, as usual.
> 
> Hmm, looking at patch 01 here, I understand that virtio-net/TAP does
> suffer from the same problem? That we actually must not use the passed
> FDs on the target while the source is still running? But stopping the
> source earlier means increased freeze time. I think if we can avoid it
> (and we can), we should avoid it.
> 
> > so the benefit of supporting non-cpr local transfers is slightly
> > double-edged
> > 
> 
> So I think that if we expect more and more devices to support local
> migration of FDs, and we have any chance of fitting them into the "old"
> migration architecture (without CPR), we should try it.

Well explained, thank you Vladimir.  I hope that some day we can move at
least all cpr-transfer users to local migration and deprecate CPR, if that
is ever possible.  The uncertainty to me is cpr-exec, but I really don't
know how much mgmt is adopting cpr-exec.  cpr-reboot also looks pretty
special and may not be relevant.

The core idea (which originated from Steve) is really about fd sharing, and
it would be great if we could do it in a cleaner way.

> 
> --
> 
> Hm, I don't have a full picture of CPR. Is it not only device migration,
> but also some other things? It would be interesting to know how feasible
> it is to move all these things into the main migration channel. That's a
> question I can't answer now. But even if we keep CPR for some non-device
> things, it still seems good to keep the whole state description for a
> device in one place, in the device code, as it was historically.

My understanding is that there is some special mgmt (Oracle's?) that may
depend on cpr-exec; I'm not sure how far that went in any downstream
deployment.

That should be able to reuse mgmt channels too (relevant to chardev fd
sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
to a new QEMU after migration.  That said, I always assumed reconnecting is
fine, and most mgmt supports live migration, so the mgmt should already
have that infrastructure.  Maybe Ben would know better.

Thanks,

-- 
Peter Xu

