On Fri, Apr 05, 2024 at 01:38:31AM +0000, Wang, Wei W wrote:
> On Friday, April 5, 2024 4:57 AM, Peter Xu wrote:
> > On Fri, Apr 05, 2024 at 12:48:15AM +0800, Wang, Lei wrote:
> > > On 4/5/2024 0:25, Wang, Wei W wrote:
> > > > On Thursday, April 4, 2024 10:12 PM, Peter Xu wrote:
> > > >> On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
> > > >>> Before loading the guest states, ensure that the preempt channel
> > > >>> is ready to use, as some of the states (e.g. via virtio_load)
> > > >>> might trigger page faults that will be handled through the preempt
> > > >>> channel. So yield to the main thread in the case that the channel
> > > >>> create event has been dispatched.
> > > >>>
> > > >>> Originally-by: Lei Wang <lei4.w...@intel.com>
> > > >>> Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel.com/T/
> > > >>> Suggested-by: Peter Xu <pet...@redhat.com>
> > > >>> Signed-off-by: Lei Wang <lei4.w...@intel.com>
> > > >>> Signed-off-by: Wei Wang <wei.w.w...@intel.com>
> > > >>> ---
> > > >>>  migration/savevm.c | 17 +++++++++++++++++
> > > >>>  1 file changed, 17 insertions(+)
> > > >>>
> > > >>> diff --git a/migration/savevm.c b/migration/savevm.c
> > > >>> index 388d7af7cd..fbc9f2bdd4 100644
> > > >>> --- a/migration/savevm.c
> > > >>> +++ b/migration/savevm.c
> > > >>> @@ -2342,6 +2342,23 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
> > > >>>
> > > >>>      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
> > > >>>
> > > >>> +    /*
> > > >>> +     * Before loading the guest states, ensure that the preempt
> > > >>> +     * channel is ready to use, as some of the states (e.g. via
> > > >>> +     * virtio_load) might trigger page faults that will be handled
> > > >>> +     * through the preempt channel. So yield to the main thread in
> > > >>> +     * the case that the channel create event has been dispatched.
> > > >>> +     */
> > > >>> +    do {
> > > >>> +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> > > >>> +            mis->postcopy_qemufile_dst) {
> > > >>> +            break;
> > > >>> +        }
> > > >>> +
> > > >>> +        aio_co_schedule(qemu_get_current_aio_context(),
> > > >>> +                        qemu_coroutine_self());
> > > >>> +        qemu_coroutine_yield();
> > > >>> +    } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
> > > >>
> > > >> I think we need s/!// here, so the same mistake I made?  I think we
> > > >> need to rework the retval of qemu_sem_timedwait() at some point later..
> > > >
> > > > No. qemu_sem_timedwait returns false on timeout, which means the
> > > > sem isn't posted yet. So it needs to go back to the loop. (the patch
> > > > was tested)
> > >
> > > When it times out, qemu_sem_timedwait() will return -1. I think the
> > > test may have passed because you will always have at least one yield
> > > (the first yield in the do ... while ...) in loadvm_handle_cmd_packaged()?
> > 
> > My guess is that here the kick will work and qemu_sem_timedwait() later
> > will hit ETIMEDOUT -> qemu_sem_timedwait() returns -1, then the loop
> > just breaks.  That aio schedule should make sure anyway that the file is
> > ready; the preempt thread must run before this so as not to hang that
> > thread.
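
For reference, a minimal sketch of the loop with the negation dropped,
assuming qemu_sem_timedwait() returns 0 once the semaphore has been posted
and -1 on timeout:

    do {
        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
            mis->postcopy_qemufile_dst) {
            break;
        }

        /* Let the main thread run so it can dispatch the channel
         * creation event, then come back and re-check. */
        aio_co_schedule(qemu_get_current_aio_context(),
                        qemu_coroutine_self());
        qemu_coroutine_yield();
        /* Keep looping only while the sem has NOT been posted yet. */
    } while (qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
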
> 
> Yes, I misread the return value. It still worked because the loop broke
> at the "if (mis->postcopy_qemufile_dst)" check.
> 
> Even below will work:
> do {
>     if (mis->postcopy_qemufile_dst) {
>         break;
>     }
> ...
> } while (1);
> 
> I still don't see the value of using the postcopy_qemufile_dst_done sem
> in this code, though. It simply blocks the main thread from creating the
> preempt channel for 1ms, regardless of whether the sem has been posted
> or not. (We added it for the case where it is not posted and we need to
> go back to the loop.)

I think it used to be that only the preempt thread wait()s on it, so it is
needed there.  It's also needed when postcopy is interrupted and needs a
recovery, see loadvm_postcopy_handle_resume(): in that case it's the
postcopy ram load thread that waits for it, rather than the main thread or
the preempt thread.
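
To make the two consumers concrete, a rough sketch (simplified, not the
actual QEMU sources) of where the sem is waited on:

    /* (1) In the preempt thread, before it starts servicing page
     *     faults over the preempt channel: */
    qemu_sem_wait(&mis->postcopy_qemufile_dst_done);

    /* (2) In the postcopy ram load thread on the recovery path, via
     *     loadvm_postcopy_handle_resume(): */
    qemu_sem_wait(&mis->postcopy_qemufile_dst_done);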

Indeed, if we move channel creation out of the preempt thread then it seems
we don't need the sem in this path.  However the other path will still need
it; then when the new channel is created (postcopy_preempt_new_channel)
we'll need to identify a "switch to postcopy" case versus a "postcopy
recovery" case, and only post the sem in the former.  I think it might
complicate the code.  I'll think again tomorrow after a sleep so my brain
works better, but I doubt this is what we want to do at rc3.
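
Roughly, that case split might look like the sketch below; note that
is_postcopy_recovery() is a made-up predicate for illustration, not an
existing helper:

    void postcopy_preempt_new_channel(MigrationIncomingState *mis,
                                      QEMUFile *file)
    {
        mis->postcopy_qemufile_dst = file;

        /* Hypothetical: only the initial "switch to postcopy" case
         * posts the sem, per the idea above; how the recovery case
         * gets woken up would still need to be worked out. */
        if (!is_postcopy_recovery(mis)) {
            qemu_sem_post(&mis->postcopy_qemufile_dst_done);
        }
    }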

If you feel comfortable, please feel free to send a version that you think
is the most correct so far (if you prefer no timedwait, that's fine), and
make sure the test works well on your side.  Then I'll smoke test it a bit
over the weekend.  Please keep in mind that if this is for rc3 it should be
a non-intrusive change; otherwise we take it slow for 9.1 and there's no
need to rush.

Thanks,

-- 
Peter Xu

