On Thu, Oct 19, 2023 at 08:28:05PM +0200, Juan Quintela wrote: > Peter Xu <pet...@redhat.com> wrote: > > On Thu, Oct 19, 2023 at 05:00:02PM +0200, Juan Quintela wrote: > >> Peter Xu <pet...@redhat.com> wrote: > >> > Fabiano, > >> > > >> > Sorry to look at this series late; I messed up my inbox after I reworked > >> > my > >> > arrangement methodology of emails. ;) > >> > > >> > On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote: > >> >> Fabiano Rosas <faro...@suse.de> wrote: > >> >> > The channels_ready semaphore is a global variable not linked to any > >> >> > single multifd channel. Waiting on it only means that "some" channel > >> >> > has become ready to send data. Since we need to address the channels > >> >> > by index (multifd_send_state->params[i]), that information adds > >> >> > nothing of value. > > >> And that is what we do here. > >> We didn't had this last line (not needed for making sure the channels > >> are ready here). > >> > >> But needed to make sure that we are maintaining channels_ready exact. > > > > I didn't expect it to be exact, I think that's the major part of confusion. > > For example, I see this comment: > > > > static void *multifd_send_thread(void *opaque) > > ... > > } else { > > qemu_mutex_unlock(&p->mutex); > > /* sometimes there are spurious wakeups */ > > } > > I put that there during development, and let it there just to be safe. > Years later I put an assert() there and did lots of migrations, never > hit it. > > > So do we have spurious wakeup anywhere for either p->sem or channels_ready? > > They are related, because if we got spurious p->sem wakeups, then we'll > > boost channels_ready one more time too there. > > I think that we can change that for g_assert_not_reached()
Sounds good. We can also use an error_erport_once(), depending on your confidence of that. :) Dropping that comment definitely helps. I had a quick look, indeed I think it's safe even with assert. We may want to put some more comment on when one should kick p->sem; IIUC it can only be kicked in either (1) pending_job increased, or (2) set exiting=1. Then it seems all guaranteed. Thanks, -- Peter Xu