pet...@redhat.com writes:

> From: Peter Xu <pet...@redhat.com>
>
> Multifd provide a threaded model for processing jobs.  On sender side,
> there can be two kinds of job: (1) a list of pages to send, or (2) a sync
> request.
>
> The sync request is a very special kind of job.  It never contains a page
> array, but only a multifd packet telling the dest side to synchronize with
> sent pages.
>
> Before this patch, both requests use the pending_job field, no matter what
> the request is, it will boost pending_job, while multifd sender thread will
> decrement it after it finishes one job.
>
> However this should be racy, because SYNC is special in that it needs to
> set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request.
> Consider a sequence of operations where:
>
>   - migration thread enqueue a job to send some pages, pending_job++ (0->1)
>
>   - [...before the selected multifd sender thread wakes up...]
>
>   - migration thread enqueue another job to sync, pending_job++ (1->2),
>     setup p->flags=MULTIFD_FLAG_SYNC
>
>   - multifd sender thread wakes up, found pending_job==2
>     - send the 1st packet with MULTIFD_FLAG_SYNC and list of pages
>     - send the 2nd packet with flags==0 and no pages
>
> This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done
> after all the pages are received.  Meanwhile, the 2nd packet will be
> completely useless, which contains zero information.
>
> I didn't verify above, but I think this issue is still benign in that at
> least on the recv side we always receive pages before handling
> MULTIFD_FLAG_SYNC.  However that's not always guaranteed and just tricky.
>
> One other reason I want to separate it is using p->flags to communicate
> between the two threads is also not clearly defined, it's very hard to read
> and understand why accessing p->flags is always safe; see the current impl
> of multifd_send_thread() where we tried to cache only p->flags.  It doesn't
> need to be that complicated.
>
> This patch introduces pending_sync, a separate flag just to show that the
> requester needs a sync.  Alongside, we remove the tricky caching of
> p->flags now because after this patch p->flags should only be used by
> multifd sender thread now, which will be crystal clear.  So it is always
> thread safe to access p->flags.
>
> With that, we can also safely convert the pending_job into a boolean,
> because we don't support >1 pending jobs anyway.
>
> Signed-off-by: Peter Xu <pet...@redhat.com>

Reviewed-by: Fabiano Rosas <faro...@suse.de>


Reply via email to