On 12/6/18 2:04 PM, Jens Axboe wrote:
> On 12/6/18 1:56 PM, Bart Van Assche wrote:
>> On Thu, 2018-12-06 at 08:47 -0800, Bart Van Assche wrote:
>>> If I merge Jens' for-next branch with Linus' master branch, boot the
>>> resulting kernel in a VM and run blktests/tests/srp/002 then that test
>>> never finishes. The same test passes against Linus' master branch. I
>>> think this is a regression. The following appears in the system log if
>>> I run that test:
>>>
>>> Call Trace:
>>> INFO: task kworker/0:1:12 blocked for more than 120 seconds.
>>> Call Trace:
>>> INFO: task ext4lazyinit:2079 blocked for more than 120 seconds.
>>> Call Trace:
>>> INFO: task fio:2151 blocked for more than 120 seconds.
>>> Call Trace:
>>> INFO: task fio:2154 blocked for more than 120 seconds.
>>
>> Hi Jens,
>>
>> My test results so far are as follows:
>> * With kernel v4.20-rc5 test srp/002 passes.
>> * With your for-next branch test srp/002 reports the symptoms reported
>>   in my e-mail.
>> * With Linus' master branch from this morning test srp/002 fails in the
>>   same way as your for-next branch.
>> * Also with Linus' master branch, test srp/002 passes if I revert the
>>   following commit: ffe81d45322c ("blk-mq: fix corruption with direct
>>   issue"). So it seems like that commit fixed one regression but
>>   introduced another regression.
>
> Yes, I'm on the same page, I've been able to reproduce. It seems to be
> related to dm and bypass insert, which is somewhat odd. If I just do:
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index deb56932f8c4..4c44e6fa0d08 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2637,7 +2637,8 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>  		 * bypass a potential scheduler on the bottom device for
>  		 * insert.
>  		 */
> -		return blk_mq_request_issue_directly(rq);
> +		blk_mq_request_bypass_insert(rq, true);
> +		return BLK_STS_OK;
>  	}
>
>  	spin_lock_irqsave(q->queue_lock, flags);
>
> it works fine. Well, at least this regression is less serious, I'll bang
> out a fix for it and ensure we make -rc6. I'm guessing it's the bypassing
> of non-read/write, which your top of dispatch also shows to be a
> non-read/write. But there should be no new failure case here that wasn't
> possible before, only it's easier to hit now.
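For context, the restriction that corruption fix added on the direct-issue
side is roughly the following shape. This is a from-memory sketch; the
helper name and the exact call site may not match the tree:

static bool rq_can_direct_dispatch(struct request *rq)	/* name illustrative */
{
	/* only plain reads and writes may be issued straight to ->queue_rq() */
	return req_op(rq) == REQ_OP_READ || req_op(rq) == REQ_OP_WRITE;
}

	/* ...and in the direct-issue path, anything else falls back to insert: */
	if (!rq_can_direct_dispatch(rq))
		goto insert;

So a WRITE_ZEROES request, for instance, can never take the direct-issue
path any more.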
OK, so here's the thing. As part of the corruption fix, we disallowed direct
dispatch for anything that wasn't a read or write. This means that your
WRITE_ZEROES will always fail direct dispatch. When it does, we return busy.
But next time dm will try the exact same thing again,
blk_insert_cloned_request() -> direct dispatch -> fail. Before we'd succeed
eventually, now we will always fail for that type.

The insert clone path is unique in that regard. So we have two options - the
patch I did above which always just does bypass insert for DM, or we need to
mark the request as having failed and just not retry direct dispatch for it.
I'm still not crazy about exploring the dispatch insert off this path. And
we'd need to do this on the original request in dm, not the clone we are
passed or it won't be persistent. Hence I lean towards the already posted
patch.

-- 
Jens Axboe
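P.S. To make the second option concrete, it would be something along these
lines. Purely illustrative: the flag name is made up, and I'm glossing over
whether the check would live in dm-rq or in blk_insert_cloned_request()
itself.

/* hypothetical flag: "direct dispatch has already failed for this rq" */
#define RQF_NO_DIRECT_DISPATCH	((__force req_flags_t)(1 << 23))

static blk_status_t issue_clone_sketch(struct request *orig,
				       struct request *clone)
{
	blk_status_t ret;

	if (orig->rq_flags & RQF_NO_DIRECT_DISPATCH) {
		/* don't retry direct issue, queue to the dispatch list */
		blk_mq_request_bypass_insert(clone, true);
		return BLK_STS_OK;
	}

	ret = blk_mq_request_issue_directly(clone);
	if (ret != BLK_STS_OK)
		/* remember the failure on the original, not the clone */
		orig->rq_flags |= RQF_NO_DIRECT_DISPATCH;

	return ret;
}

The important part is that the flag is set on the original dm request; the
clone we are handed is a different object on every retry, so marking the
clone wouldn't stick.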