On Thursday, April 4, 2024 10:12 PM, Peter Xu wrote:
> On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
> > Before loading the guest states, ensure that the preempt channel has
> > been ready to use, as some of the states (e.g. via virtio_load) might
> > trigger page faults that will be handled through the preempt channel.
> > So yield to the main thread in the case that the channel create event
> > has been dispatched.
> >
> > Originally-by: Lei Wang <lei4.w...@intel.com>
> > Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel.com/T/
> > Suggested-by: Peter Xu <pet...@redhat.com>
> > Signed-off-by: Lei Wang <lei4.w...@intel.com>
> > Signed-off-by: Wei Wang <wei.w.w...@intel.com>
> > ---
> >  migration/savevm.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 388d7af7cd..fbc9f2bdd4 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2342,6 +2342,23 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
> >
> >      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
> >
> > +    /*
> > +     * Before loading the guest states, ensure that the preempt channel has
> > +     * been ready to use, as some of the states (e.g. via virtio_load) might
> > +     * trigger page faults that will be handled through the preempt channel.
> > +     * So yield to the main thread in the case that the channel create event
> > +     * has been dispatched.
> > +     */
> > +    do {
> > +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> > +            mis->postcopy_qemufile_dst) {
> > +            break;
> > +        }
> > +
> > +        aio_co_schedule(qemu_get_current_aio_context(),
> > +                        qemu_coroutine_self());
> > +        qemu_coroutine_yield();
> > +    } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
>
> I think we need s/!// here, so the same mistake I made?  I think we need
> to rework the retval of qemu_sem_timedwait() at some point later..
No. qemu_sem_timedwait() returns false when it times out, which means the
sem isn't posted yet, so it needs to go back to the loop. (The patch was
tested.)

> Besides, this patch kept the sem_wait() in postcopy_preempt_thread() so
> it will wait() on this sem again.  If this qemu_sem_timedwait()
> accidentally consumed the sem count then I think the other thread can
> hang forever?

I see the issue you mentioned, and it seems better to place the wait before
the creation of the preempt thread. Then we probably don't need to wait on
the sem in the preempt thread, as the channel is guaranteed to be ready when
it runs? The update would be:

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index eccff499cb..5a70ce4f23 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1254,6 +1254,15 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
     }
 
     if (migrate_postcopy_preempt()) {
+        do {
+            if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
+                mis->postcopy_qemufile_dst) {
+                break;
+            }
+            aio_co_schedule(qemu_get_current_aio_context(),
+                            qemu_coroutine_self());
+            qemu_coroutine_yield();
+        } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
+
         /*
          * This thread needs to be created after the temp pages because
          * it'll fetch RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
@@ -1743,12 +1752,6 @@ void *postcopy_preempt_thread(void *opaque)
 
     qemu_sem_post(&mis->thread_sync_sem);
 
-    /*
-     * The preempt channel is established in asynchronous way.  Wait
-     * for its completion.
-     */
-    qemu_sem_wait(&mis->postcopy_qemufile_dst_done);