On Thursday, April 4, 2024 10:12 PM, Peter Xu wrote:
> On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
> > Before loading the guest states, ensure that the preempt channel has
> > been ready to use, as some of the states (e.g. via virtio_load) might
> > trigger page faults that will be handled through the preempt channel.
> > So yield to the main thread in the case that the channel create event
> > has been dispatched.
> >
> > Originally-by: Lei Wang <lei4.w...@intel.com>
> > Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel.com/T/
> > Suggested-by: Peter Xu <pet...@redhat.com>
> > Signed-off-by: Lei Wang <lei4.w...@intel.com>
> > Signed-off-by: Wei Wang <wei.w.w...@intel.com>
> > ---
> >  migration/savevm.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 388d7af7cd..fbc9f2bdd4 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2342,6 +2342,23 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
> >
> >      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
> >
> > +    /*
> > +     * Before loading the guest states, ensure that the preempt channel has
> > +     * been ready to use, as some of the states (e.g. via virtio_load) might
> > +     * trigger page faults that will be handled through the preempt channel.
> > +     * So yield to the main thread in the case that the channel create event
> > +     * has been dispatched.
> > +     */
> > +    do {
> > +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> > +            mis->postcopy_qemufile_dst) {
> > +            break;
> > +        }
> > +
> > +        aio_co_schedule(qemu_get_current_aio_context(),
> > +                        qemu_coroutine_self());
> > +        qemu_coroutine_yield();
> > +    } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
> 
> I think we need s/!// here, so the same mistake I made?  I think we need to
> rework the retval of qemu_sem_timedwait() at some point later..

No. qemu_sem_timedwait() returns false on timeout, which means the sem hasn't
been posted yet, so it needs to go back to the loop. (The patch was tested.)

> 
> Besides, this patch kept the sem_wait() in postcopy_preempt_thread() so it
> will wait() on this sem again.  If this qemu_sem_timedwait() accidentally
> consumed the sem count then I think the other thread can hang forever?

I see the issue you mentioned, and it seems better to place the wait before
the creation of the preempt thread. Then we probably don't need to wait on the
sem in the preempt thread, as the channel is guaranteed to be ready when it
runs?

Update will be:

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index eccff499cb..5a70ce4f23 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1254,6 +1254,15 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
     }

     if (migrate_postcopy_preempt()) {
+        do {
+            if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
+                mis->postcopy_qemufile_dst) {
+                break;
+            }
+            aio_co_schedule(qemu_get_current_aio_context(),
+                            qemu_coroutine_self());
+            qemu_coroutine_yield();
+        } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
+
         /*
          * This thread needs to be created after the temp pages because
          * it'll fetch RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
@@ -1743,12 +1752,6 @@ void *postcopy_preempt_thread(void *opaque)

     qemu_sem_post(&mis->thread_sync_sem);

-    /*
-     * The preempt channel is established in asynchronous way.  Wait
-     * for its completion.
-     */
-    qemu_sem_wait(&mis->postcopy_qemufile_dst_done);