* Juraj Marcin (jmar...@redhat.com) wrote:
> Hi Dave,
> 
> On 2025-09-01 17:57, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > On Thu, Aug 14, 2025 at 05:42:23PM +0200, Juraj Marcin wrote:
> > > > Fair point, I'll then continue with the PING/PONG solution, the first
> > > > implementation I have seems to be working to resolve Issue 1.
> > > > 
> > > > For rarer split brain, we'll rely on block device locks/mgmt to resolve
> > > > and change the failure handling, so it registers errors from disk
> > > > activation.
> > > > 
> > > > As tested, there should be no problems with the destination
> > > > transitioning to POSTCOPY_PAUSED, since the VM was not started yet.
> > > > 
> > > > However, to prevent the source side from transitioning to
> > > > POSTCOPY_PAUSED, I think adding a new state is still the best option.
> > > > 
> > > > I tried keeping the migration states as they are now and just rely on an
> > > > attribute of MigrationState if 3rd PONG was received, however, this
> > > > collides with (at least) migrate_pause tests, that are waiting for
> > > > POSTCOPY_ACTIVE, and then pause the migration triggering the source to
> > > > resume. We could maybe work around it by waiting for the 3rd pong
> > > > instead, but I am not sure if it is possible from tests, or by not
> > > > resuming if migrate_pause command is executed?
> > > > 
> > > > I also tried extending the span of the DEVICE state, but some functions
> > > > behave differently depending on if they are in postcopy or not, using
> > > > the migration_in_postcopy() function, but adding the DEVICE there isn't
> > > > working either. And treating the DEVICE state sometimes as postcopy and
> > > > sometimes as not seems just too messy, if it would even be possible.
> > > 
> > > Yeah, it might indeed be a bit messy.
> > > 
> > > Is it possible to find a middle ground?  E.g. add postcopy-setup status,
> > > but without any new knob to enable it?  Just to describe the period of
> > > time where dest QEMU hasn't started running but has started loading
> > > device states.
> > > 
> > > The hope is libvirt (which, AFAIU, always enables the "events" capability)
> > > can ignore the new postcopy-setup status transition, then maybe we can
> > > also introduce the postcopy-setup and make it always appear.
> > 
> > When the destination is started with '-S' (autostart=false), which is what
> > I think libvirt does, doesn't management only start the destination
> > after a certain useful event?
> > In other words, is there an event we already emit to say that the
> > destination has finished loading the postcopy devices, or could we just
> > add that event, so that management could just wait for that before
> > issuing the continue?
> 
> I am not aware of any such event on the destination side. When postcopy
> (and its switchover) starts, the destination transitions from ACTIVE
> directly to POSTCOPY_ACTIVE in the listen thread while devices are
> loaded concurrently by the main thread.
> 
> There is DEVICE state on the source side, but that is used only on the
> source side when device state is being collected. When device state is
> being loaded on the destination, the source side is also already in
> POSTCOPY_ACTIVE state.

So I wonder what libvirt uses to trigger it starting the destination in
the postcopy case?  It's got to be after the device state has loaded.
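
To make the idea concrete, here is a rough sketch of the management-side
logic I have in mind. It is only an illustration under assumptions: the
event name POSTCOPY_DEVICES_LOADED is made up (QEMU does not emit it
today), and the helper commands_for_events() is hypothetical. The
MIGRATION event and the 'cont' command are existing QMP. The destination
is assumed to have been started with '-S'.

```python
# Sketch only: this event name is hypothetical, QEMU does not emit it today.
DEVICES_LOADED_EVENT = "POSTCOPY_DEVICES_LOADED"


def commands_for_events(events):
    """Return the QMP commands a manager would send for a list of events.

    The destination is assumed to be paused (started with -S) until the
    manager sends "cont".
    """
    for event in events:
        if event.get("event") == DEVICES_LOADED_EVENT:
            # Device state has finished loading, so the guest can start.
            return [{"execute": "cont"}]
    # Nothing seen yet that says device load is done: keep waiting.
    return []


events = [
    {"event": "MIGRATION", "data": {"status": "active"}},
    {"event": "MIGRATION", "data": {"status": "postcopy-active"}},
    {"event": DEVICES_LOADED_EVENT},
]
print(commands_for_events(events))
```

Note that seeing postcopy-active alone does not trigger 'cont' here, since
the destination enters that state while devices are still being loaded.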

Dave

> Best regards,
> 
> Juraj Marcin
> 
> > 
> > Dave
> > 
> > > Thanks,
> > > 
> > > -- 
> > > Peter Xu
> > > 
> > > 
> > -- 
> >  -----Open up your eyes, open up your mind, open up your code -------   
> > / Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
> > \        dave @ treblig.org |                               | In Hex /
> >  \ _________________________|_____ http://www.treblig.org   |_______/
> > 
> 
> 
