* Peter Xu ([email protected]) wrote:
> On Wed, Jan 21, 2026 at 01:25:32AM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu ([email protected]) wrote:
> > > On Tue, Jan 20, 2026 at 07:04:09PM +0000, Dr. David Alan Gilbert wrote:
> > 
> > <snip>
> > 
> > > > >   (2) Failure happens _after_ starting to apply the new checkpoint,
> > > > >       but _before_ the whole checkpoint is applied.
> > > > > 
> > > > >       To be explicit, consider qemu_load_device_state() when the
> > > > >       process of colo_incoming_process_checkpoint() failed.  It means
> > > > >       SVM applied part of PVM's checkpoint; I think it should mean
> > > > >       PVM is completely corrupted.
> > > > 
> > > > As long as the SVM has got the entire checkpoint, then it *can* apply
> > > > it all and carry on from that point.
> > > 
> > > Does it mean we assert() that qemu_load_device_state() will always
> > > succeed for COLO syncs?
> > 
> > Not sure; I'd expect if that load fails then the SVM fails; if that happens
> > on a periodic checkpoint then the PVM should carry on.
> 
> Hmm right, if qemu_load_device_state() failed, likely PVM is still alive.
> 
> > 
> > > Logically post_load() can invoke anything and I'm not sure if something
> > > can start to fail, but I confess I don't know an existing device that
> > > can trigger it.
> > 
> > Like a postcopy, it shouldn't fail unless there's an underlying failure
> > (e.g. storage died)
> 
> Postcopy can definitely fail at post_load()..  Actually Juraj just fixed
> it for 10.2 here, so postcopy can now fail properly while saving/loading
> device state (we used to hang):
> 
> https://lore.kernel.org/r/[email protected]

Ah good.

> The two major causes of postcopy vmstate load failure that I hit (while
> looking at bugs after you left; I wish you were still here!):
> 
> (1) KVM put() failures due to kernel version mismatch, or,
> 
> (2) virtio post_load() failures due to e.g. virtio feature unsupported.
> 
> Both of them fall into the "unsupported dest kernel version" realm,
> though, so indeed it may not affect COLO, as I expect COLO should have
> the two hosts running the same kernel.

Right.
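
(Just to make (2) concrete - a sketch with made-up names, not the actual
virtio code; the shape is a post_load() that refuses state negotiated with
features this destination lacks:

  #include <errno.h>
  #include <stdint.h>

  typedef struct DemoState {
      uint64_t guest_features;   /* recorded in the migration stream */
      uint64_t host_features;    /* what this destination can offer */
  } DemoState;

  static int demo_post_load(void *opaque, int version_id)
  {
      DemoState *s = opaque;

      /* a feature negotiated on the source but unsupported here */
      if (s->guest_features & ~s->host_features) {
          return -EINVAL;        /* propagates up; the vmstate load fails */
      }
      return 0;
  }

so, as you say, it only bites when the two hosts differ.)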

> > > Lukas told me something was broken though with the pc machine type,
> > > with post_load() not being re-entrant.  I think it might be possible
> > > though when post_load() is relevant to some device state (that the
> > > guest driver can change between two checkpoint loads), but that's
> > > still only theoretical.  So maybe we can indeed assert it here.
> > 
> > I don't understand that non re-entrant bit?
> 
> It may not be the exact wording, the message is here:
> 
> https://lore.kernel.org/r/20260115233500.26fd1628@penguin
> 
>         There is a bug in the emulated ahci disk controller which crashes
>         when its vmstate is loaded more than once.
> 
> I was expecting it to be a post_load() issue, because normal scalar
> vmstates should be fine to load more than once.  I didn't look deeper.

Oh I see, multiple calls to post_load() rather than calls nested inside
each other; yeh that makes sense - some things aren't expecting that.
But again, you're likely to find that out pretty quickly either way; it's
not something that is made worse by regular checkpointing.
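
(The shape of that sort of bug, with a hypothetical device rather than the
real ahci code: a post_load() that rebuilds runtime state assuming it runs
exactly once per incoming migration:

  typedef struct Dev {
      int irq_users;      /* runtime refcount, not in the stream */
      int irq_asserted;   /* loaded from the migration stream */
  } Dev;

  static int dev_post_load(void *opaque, int version_id)
  {
      Dev *d = opaque;

      if (d->irq_asserted) {
          d->irq_users++; /* fine on the first load; a second load
                             double-counts and the eventual release
                             underflows */
      }
      return 0;
  }

Nothing resets irq_users between two checkpoint loads, so the second call
leaves it wrong.)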

<snip>

> > Oh, I think I've remembered why it's necessary to split it into RAM and
> > non-RAM; you can't parse a non-RAM stream and know when you've got an
> > EOF flag in the stream; especially for stuff that's open coded (like
> > some of virtio); so there's
> 
> Shouldn't customized get()/put() at least still be wrapped with a
> QEMU_VM_SECTION_FULL section?

Yes - but the VM_SECTION wrapper doesn't tell you how long the data in the
section is; you have to walk your vmstate structures, decoding the data
(and possibly doing magic get()/put()'s) and at the end hoping
you hit a VMS_END (which I added just to spot screwups in this process).
So there's no way to 'read the whole of a VM_SECTION' - because you don't
know you've hit the end until you've decoded it.
(And some of those get() calls are open-coded list storage, which is
something like

  do {
      x = get();
      if (x & flag) {
          break;        /* the flag is the only end-of-list marker */
      }
      /* read more data for this element */
  } while (...);

so on those you're really hoping you hit the flag.)
I did turn some get()/put()'s into vmstate a while back, but those
open-coded loops are really hard; there's a lot of variation.
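
(To spell out the framing problem: the start of a FULL section is roughly
this - from memory, so the exact field order may be off:

  int type            = qemu_get_byte(f);    /* QEMU_VM_SECTION_FULL */
  uint32_t section_id = qemu_get_be32(f);
  /* ...idstr, instance_id, version_id... */

  /* no payload length anywhere; the only way forward is the device's
     own loader, which alone knows how many bytes it will consume */
  ret = vmstate_load_state(f, vmsd, opaque, version_id);

so a generic 'skip to the end of this section' can't be written.)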

> > no way to write a 'load until EOF' into a simple RAM buffer; you need to be
> > given an explicit size to know how much to expect.
> > 
> > You could do it for the RAM, but you'd need to write a protocol parser
> > to follow the stream to watch for the EOF.  It's actually harder with
> > multifd; how would you make a temporary buffer with multiple streams
> > like that?
> 
> My understanding is postcopy must have a buffer, because postcopy needs
> page requests to work even while loading vmstates.  I don't see it
> required for COLO, though..

Right, that's true for postcopy; but then the only way to load the stream
into that buffer is to load it all at once, because of the vmstate problem
above.  (And because in the original postcopy we needed the original fd
free for page requests; you might be able to avoid that with multifd now.)
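
For reference, the explicit-size approach is roughly the shape of what
postcopy's MIG_CMD_PACKAGED does (QEMU-internal helpers, and the function
and replay_device_state() names below are made up):

  static int load_packaged_state(QEMUFile *f)
  {
      uint32_t len = qemu_get_be32(f);       /* sender prefixed the size */
      g_autofree uint8_t *buf = g_malloc(len);

      if (qemu_get_buffer(f, buf, len) != len) {
          return -EIO;                       /* short read: broken stream */
      }
      /* replay from memory; the main fd stays free for page requests */
      return replay_device_state(buf, len);  /* hypothetical helper */
  }

COLO could presumably do the same for the device state if it sent the size
up front.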

> I'll try to see if I can change COLO to use the generic precopy way of
> dumping vmstate, then I'll know if I missed something, and what I've
> missed..

Dave

> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
