pet...@redhat.com writes: > From: Peter Xu <pet...@redhat.com> > > Multifd provide a threaded model for processing jobs. On sender side, > there can be two kinds of job: (1) a list of pages to send, or (2) a sync > request. > > The sync request is a very special kind of job. It never contains a page > array, but only a multifd packet telling the dest side to synchronize with > sent pages. > > Before this patch, both requests use the pending_job field, no matter what > the request is, it will boost pending_job, while multifd sender thread will > decrement it after it finishes one job. > > However this should be racy, because SYNC is special in that it needs to > set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request. > Consider a sequence of operations where: > > - migration thread enqueue a job to send some pages, pending_job++ (0->1) > > - [...before the selected multifd sender thread wakes up...] > > - migration thread enqueue another job to sync, pending_job++ (1->2), > setup p->flags=MULTIFD_FLAG_SYNC > > - multifd sender thread wakes up, found pending_job==2 > - send the 1st packet with MULTIFD_FLAG_SYNC and list of pages > - send the 2nd packet with flags==0 and no pages > > This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done > after all the pages are received. Meanwhile, the 2nd packet will be > completely useless, which contains zero information. > > I didn't verify above, but I think this issue is still benign in that at > least on the recv side we always receive pages before handling > MULTIFD_FLAG_SYNC. However that's not always guaranteed and just tricky. > > One other reason I want to separate it is using p->flags to communicate > between the two threads is also not clearly defined, it's very hard to read > and understand why accessing p->flags is always safe; see the current impl > of multifd_send_thread() where we tried to cache only p->flags. It doesn't > need to be that complicated. > > This patch introduces pending_sync, a separate flag just to show that the > requester needs a sync. Alongside, we remove the tricky caching of > p->flags now because after this patch p->flags should only be used by > multifd sender thread now, which will be crystal clear. So it is always > thread safe to access p->flags. > > With that, we can also safely convert the pending_job into a boolean, > because we don't support >1 pending jobs anyway. > > Signed-off-by: Peter Xu <pet...@redhat.com>
Reviewed-by: Fabiano Rosas <faro...@suse.de>