On Wed, 6 Sep 2017, Peter Zijlstra wrote: > Attempt to untangle the ordering in blk-mq. The patch introducing the > single smp_mb__before_atomic() is obviously broken in that it doesn't > clearly specify a pairing barrier and an obtained guarantee. > > The comment is further misleading in that it hints that the > deadline store and the COMPLETE store also need to be ordered, but > AFAICT there is no such dependency. However what does appear to be > important is the clear happening _after_ the store, and that worked by > pure accident. > > This clarifies blk_mq_start_request() -- we should not get there with > STARTING set -- this simplifies the code and makes the barrier usage > sane (the old code could be read to allow not having _any_ atomic after > the barrier, in which case the barrier hasn't got anything to order). We > then also introduce the missing pairing barrier for it. > > Also down-grade the barrier to smp_wmb(), this is cheaper for > PowerPC/ARM and doesn't cost anything extra on x86. > > And it documents the STARTING vs COMPLETE ordering. Although I've not > been entirely successful in reverse engineering the blk-mq state > machine so there might still be more funnies around timeout vs > requeue. > > If I got anything wrong, feel free to educate me by adding comments to > clarify things ;-) > > Cc: Alan Stern <st...@rowland.harvard.edu> > Cc: Will Deacon <will.dea...@arm.com> > Cc: Ming Lei <tom.leim...@gmail.com> > Cc: Christoph Hellwig <h...@lst.de> > Cc: Jens Axboe <ax...@fb.com> > Cc: Andrea Parri <parri.and...@gmail.com> > Cc: Boqun Feng <boqun.f...@gmail.com> > Cc: Bart Van Assche <bart.vanass...@wdc.com> > Cc: "Paul E. McKenney" <paul...@linux.vnet.ibm.com> > Fixes: 538b75341835 ("blk-mq: request deadline must be visible before marking > rq as started") > Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org> > --- > - spelling; Andrea and Bart > - compiles (urgh!) > - smp_wmb(); Adrea > > > block/blk-mq.c | 52 ++++++++++++++++++++++++++++++++++++++++------------ > block/blk-timeout.c | 2 +- > 2 files changed, 41 insertions(+), 13 deletions(-) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 4603b115e234..506a0f355117 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -558,22 +558,32 @@ void blk_mq_start_request(struct request *rq) > > blk_add_timer(rq); > > - /* > - * Ensure that ->deadline is visible before set the started > - * flag and clear the completed flag. > - */ > - smp_mb__before_atomic(); > + WARN_ON_ONCE(test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)); > > /* > * Mark us as started and clear complete. Complete might have been > * set if requeue raced with timeout, which then marked it as > * complete. So be sure to clear complete again when we start > * the request, otherwise we'll ignore the completion event. > + * > + * Ensure that ->deadline is visible before we set STARTED, such that > + * blk_mq_check_expired() is guaranteed to observe our ->deadline when > + * it observes STARTED. > */ > - if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) > - set_bit(REQ_ATOM_STARTED, &rq->atomic_flags); > - if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) > + smp_wmb(); > + set_bit(REQ_ATOM_STARTED, &rq->atomic_flags); > + if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) { > + /* > + * Coherence order guarantees these consecutive stores to a > + * single variable propagate in the specified order. Thus the > + * clear_bit() is ordered _after_ the set bit. See > + * blk_mq_check_expired(). > + * > + * (the bits must be part of the same byte for this to be > + * true).
Adding this comment is well and good, but for more security you should also add a comment (maybe even a compile-time check) to the place where REQ_ATOM_STARTED and REQ_ATOM_COMPLETE are defined. Otherwise they might eventually get moved into separate bytes. Alan Stern