On Fri, Sep 22, 2023 at 11:53:17AM -0300, Fabiano Rosas wrote:
> Commit d2026ee117 ("multifd: Fix the number of channels ready") moved
> the "post" of channels_ready to the start of the multifd_send_thread()
> loop and added a missing "wait" at multifd_send_sync_main(). While it
> does work, the placement of the wait goes against what the rest of the
> code does.
> 
> The sequence at multifd_send_thread() is:
> 
>     qemu_sem_post(&multifd_send_state->channels_ready);
>     qemu_sem_wait(&p->sem);
>     <work>
>     if (flags & MULTIFD_FLAG_SYNC) {
>         qemu_sem_post(&p->sem_sync);
>     }
> 
> Which means that the sending thread makes itself available
> (channels_ready) and waits for more work (sem). So the sequence in the
> migration thread should be to check if any channel is available
> (channels_ready), give it some work and set it off (sem):
> 
>     qemu_sem_wait(&multifd_send_state->channels_ready);
>     <enqueue work>
>     qemu_sem_post(&p->sem);
>     if (flags & MULTIFD_FLAG_SYNC) {
>         qemu_sem_wait(&p->sem_sync);
>     }
> 
> The reason there's no deadlock today is that the migration thread
> enqueues the SYNC packet right before the wait on channels_ready and
> we end up taking advantage of the out-of-order post to sem:
> 
>         ...
>         qemu_sem_post(&p->sem);
>     }
>     for (i = 0; i < migrate_multifd_channels(); i++) {
>         MultiFDSendParams *p = &multifd_send_state->params[i];
> 
>         qemu_sem_wait(&multifd_send_state->channels_ready);
>         trace_multifd_send_sync_main_wait(p->id);
>         qemu_sem_wait(&p->sem_sync);
>       ...
> 
> Move the channels_ready wait before the sem post to keep the sequence
> consistent. Also fix the error path to post to channels_ready and
> sem_sync in the correct order.
>

Thank you Fabiano,

Your solution is more complete. I also had in mind getting rid of
sem_sync.

With your second patch, this one could be merged with it?

> Signed-off-by: Fabiano Rosas <faro...@suse.de>
> ---
>  migration/multifd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index a7c7a947e3..d626740f2f 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -618,6 +618,7 @@ int multifd_send_sync_main(QEMUFile *f)
>  
>          trace_multifd_send_sync_main_signal(p->id);
>  
> +        qemu_sem_wait(&multifd_send_state->channels_ready);
>          qemu_mutex_lock(&p->mutex);
>  
>          if (p->quit) {
> @@ -635,7 +636,6 @@ int multifd_send_sync_main(QEMUFile *f)
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
>  
> -        qemu_sem_wait(&multifd_send_state->channels_ready);
>          trace_multifd_send_sync_main_wait(p->id);
>          qemu_sem_wait(&p->sem_sync);
>  
> @@ -763,8 +763,8 @@ out:
>       * who pay attention to me.
>       */
>      if (ret != 0) {
> -        qemu_sem_post(&p->sem_sync);
>          qemu_sem_post(&multifd_send_state->channels_ready);
> +        qemu_sem_post(&p->sem_sync);

Can this thread in this error case be woken up again between
these two qemu_sem_posts?
I see in other places p->quit is set to true before it.
Or maybe it should one more patch to make these consistent 
as well.

Elena U.
>      }
>  
>      qemu_mutex_lock(&p->mutex);
> -- 
> 2.35.3
> 

Reply via email to