On Fri, Sep 22, 2023 at 11:53:17AM -0300, Fabiano Rosas wrote:
> Commit d2026ee117 ("multifd: Fix the number of channels ready") moved
> the "post" of channels_ready to the start of the multifd_send_thread()
> loop and added a missing "wait" at multifd_send_sync_main(). While it
> does work, the placement of the wait goes against what the rest of the
> code does.
>
> The sequence at multifd_send_thread() is:
>
>     qemu_sem_post(&multifd_send_state->channels_ready);
>     qemu_sem_wait(&p->sem);
>     <work>
>     if (flags & MULTIFD_FLAG_SYNC) {
>         qemu_sem_post(&p->sem_sync);
>     }
>
> Which means that the sending thread makes itself available
> (channels_ready) and waits for more work (sem). So the sequence in the
> migration thread should be to check if any channel is available
> (channels_ready), give it some work and set it off (sem):
>
>     qemu_sem_wait(&multifd_send_state->channels_ready);

Here it means we have at least one free send thread, then...

>     <enqueue work>
>     qemu_sem_post(&p->sem);

... here we enqueue some work to the current thread (pointed to by "i"),
whether it is free or not, as "i" may not always point to the free thread.

>     if (flags & MULTIFD_FLAG_SYNC) {
>         qemu_sem_wait(&p->sem_sync);
>     }

So I must confess I have never fully digested how these sem/mutex/... work
in multifd, since the first day it was introduced, so please take the
comments below with a grain of salt.

It seems to me that the current design allows more than one pending_job for
a thread. Here the current code didn't do "wait(channels_ready)" because it
doesn't need to: it simply always queues a MULTIFD_FLAG_SYNC pending job on
the thread and waits for it to run. From that POV I think I can understand
why "wait(channels_ready)" is not needed here.

But then I'm confused, because we don't have a real QUEUE to put those
requests in; we simply apply this:

  multifd_send_sync_main():
      p->flags |= MULTIFD_FLAG_SYNC;

even if this send thread may be busy handling a batch of pages and
accessing p->flags. I think it can actually race with the send thread
reading the flag at the exact same time:

  multifd_send_thread():
      multifd_send_fill_packet(p);
      flags = p->flags;  <-------------- here

And whether it sees MULTIFD_FLAG_SYNC is unpredictable. If it sees it,
it'll post(sem_sync) in this round. If it doesn't, it'll post(sem_sync) in
the next round. Either way, I think we'll generate an empty multifd packet
on the wire, even though I don't know whether that's needed at all...

I'm not sure whether we should fix it in a more complete form, by not
sending that empty multifd packet at all? That packet only contains the
header without any real page inside, IIUC, so it seems to be a waste of
resources. Is all we want here just to kick sem_sync?

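To make the ordering I'm talking about concrete, here is a toy model of the
handshake written with plain POSIX threads and unnamed semaphores (so it
assumes Linux). It is NOT the QEMU code: struct chan, sender(),
toy_sync_main() and FLAG_SYNC are names I made up for illustration, and the
model assumes the flags are always read and written under the per-channel
mutex, which may not match every path in multifd.c.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define NCHAN 4
#define FLAG_SYNC 1u            /* stands in for MULTIFD_FLAG_SYNC */

struct chan {                   /* hypothetical, loosely modelled on MultiFDSendParams */
    pthread_mutex_t mutex;
    sem_t sem;                  /* "there is work for you" */
    sem_t sem_sync;             /* "I have seen the SYNC request" */
    unsigned flags;
    int quit;
};

static sem_t channels_ready;    /* counts "I am idle" announcements */
static struct chan chans[NCHAN];

static void *sender(void *opaque)
{
    struct chan *p = opaque;

    for (;;) {
        sem_post(&channels_ready);      /* announce: I am free */
        sem_wait(&p->sem);              /* wait for work */

        pthread_mutex_lock(&p->mutex);
        unsigned flags = p->flags;      /* snapshot and clear under the lock */
        p->flags = 0;
        int quit = p->quit;
        pthread_mutex_unlock(&p->mutex);

        if (quit) {
            return NULL;
        }
        if (flags & FLAG_SYNC) {
            sem_post(&p->sem_sync);     /* acknowledge the sync request */
        }
    }
}

/* Returns the number of SYNC acknowledgements collected. */
int toy_sync_main(void)
{
    pthread_t tid[NCHAN];
    int acked = 0;

    sem_init(&channels_ready, 0, 0);
    for (int i = 0; i < NCHAN; i++) {
        pthread_mutex_init(&chans[i].mutex, NULL);
        sem_init(&chans[i].sem, 0, 0);
        sem_init(&chans[i].sem_sync, 0, 0);
        chans[i].flags = 0;
        chans[i].quit = 0;
        int rc = pthread_create(&tid[i], NULL, sender, &chans[i]);
        assert(rc == 0);
    }

    for (int i = 0; i < NCHAN; i++) {
        sem_wait(&channels_ready);      /* SOME sender is idle... */
        pthread_mutex_lock(&chans[i].mutex);
        chans[i].flags |= FLAG_SYNC;    /* ...but the request goes to channel i */
        pthread_mutex_unlock(&chans[i].mutex);
        sem_post(&chans[i].sem);
    }
    for (int i = 0; i < NCHAN; i++) {
        sem_wait(&chans[i].sem_sync);
        acked++;
    }

    /* Shut the senders down and release resources. */
    for (int i = 0; i < NCHAN; i++) {
        pthread_mutex_lock(&chans[i].mutex);
        chans[i].quit = 1;
        pthread_mutex_unlock(&chans[i].mutex);
        sem_post(&chans[i].sem);
        pthread_join(tid[i], NULL);
        sem_destroy(&chans[i].sem);
        sem_destroy(&chans[i].sem_sync);
        pthread_mutex_destroy(&chans[i].mutex);
    }
    sem_destroy(&channels_ready);
    return acked;
}
```

Note that the wait on channels_ready only tells us that some sender is idle,
not that channel "i" is, which is the mismatch I mentioned above. The toy
still completes because each channel gets exactly one request here; it says
nothing about what happens once page batches are mixed in.
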
>
> The reason there's no deadlock today is that the migration thread
> enqueues the SYNC packet right before the wait on channels_ready and
> we end up taking advantage of the out-of-order post to sem:
>
>         ...
>         qemu_sem_post(&p->sem);
>     }
>     for (i = 0; i < migrate_multifd_channels(); i++) {
>         MultiFDSendParams *p = &multifd_send_state->params[i];
>
>         qemu_sem_wait(&multifd_send_state->channels_ready);
>         trace_multifd_send_sync_main_wait(p->id);
>         qemu_sem_wait(&p->sem_sync);
>         ...
>
> Move the channels_ready wait before the sem post to keep the sequence
> consistent. Also fix the error path to post to channels_ready and
> sem_sync in the correct order.
>
> Signed-off-by: Fabiano Rosas <faro...@suse.de>
> ---
>  migration/multifd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index a7c7a947e3..d626740f2f 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -618,6 +618,7 @@ int multifd_send_sync_main(QEMUFile *f)
>
>          trace_multifd_send_sync_main_signal(p->id);
>
> +        qemu_sem_wait(&multifd_send_state->channels_ready);
>          qemu_mutex_lock(&p->mutex);
>
>          if (p->quit) {
> @@ -635,7 +636,6 @@ int multifd_send_sync_main(QEMUFile *f)
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
>
> -        qemu_sem_wait(&multifd_send_state->channels_ready);
>          trace_multifd_send_sync_main_wait(p->id);
>          qemu_sem_wait(&p->sem_sync);
>
> @@ -763,8 +763,8 @@ out:
>       * who pay attention to me.
>       */
>      if (ret != 0) {
> -        qemu_sem_post(&p->sem_sync);
>          qemu_sem_post(&multifd_send_state->channels_ready);
> +        qemu_sem_post(&p->sem_sync);

I'm not sure why this reordering would make a difference; AFAIU from the
semaphore semantics, the order of post() calls to two different sems
doesn't matter?

>      }
>
>      qemu_mutex_lock(&p->mutex);
> --
> 2.35.3
>

--
Peter Xu