On Thu, Oct 19, 2023 at 08:28:05PM +0200, Juan Quintela wrote:
> Peter Xu <pet...@redhat.com> wrote:
> > On Thu, Oct 19, 2023 at 05:00:02PM +0200, Juan Quintela wrote:
> >> Peter Xu <pet...@redhat.com> wrote:
> >> > Fabiano,
> >> >
> >> > Sorry to look at this series late; I messed up my inbox after I reworked 
> >> > my
> >> > arrangement methodology of emails. ;)
> >> >
> >> > On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote:
> >> >> Fabiano Rosas <faro...@suse.de> wrote:
> >> >> > The channels_ready semaphore is a global variable not linked to any
> >> >> > single multifd channel. Waiting on it only means that "some" channel
> >> >> > has become ready to send data. Since we need to address the channels
> >> >> > by index (multifd_send_state->params[i]), that information adds
> >> >> > nothing of value.
> 
> >> And that is what we do here.
> >> We didn't had this last line (not needed for making sure the channels
> >> are ready here).
> >> 
> >> But needed to make sure that we are maintaining channels_ready exact.
> >
> > I didn't expect it to be exact, I think that's the major part of confusion.
> > For example, I see this comment:
> >
> > static void *multifd_send_thread(void *opaque)
> >        ...
> >         } else {
> >             qemu_mutex_unlock(&p->mutex);
> >             /* sometimes there are spurious wakeups */
> >         }
> 
> I put that there during development, and let it there just to be safe.
> Years later I put an assert() there and did lots of migrations, never
> hit it.
> 
> > So do we have spurious wakeup anywhere for either p->sem or channels_ready?
> > They are related, because if we got spurious p->sem wakeups, then we'll
> > boost channels_ready one more time too there.
> 
> I think that we can change that for g_assert_not_reached()

Sounds good.  We can also use an error_erport_once(), depending on your
confidence of that. :)  Dropping that comment definitely helps.

I had a quick look, indeed I think it's safe even with assert.  We may want
to put some more comment on when one should kick p->sem; IIUC it can only
be kicked in either (1) pending_job increased, or (2) set exiting=1.  Then
it seems all guaranteed.

Thanks,

-- 
Peter Xu


Reply via email to