On Tue, Apr 04 2017, Michael Wang wrote:

> Hi, Neil
>
> On 04/03/2017 11:25 PM, NeilBrown wrote:
>> On Mon, Apr 03 2017, Michael Wang wrote:
>>
>>> blk_attempt_plug_merge() try to merge bio into request and chain them
>>> by 'bi_next', while after the bio is done inside request, we forgot to
>>> reset the 'bi_next'.
>>>
>>> This lead into BUG while removing all the underlying devices from md-raid1,
>>> the bio once go through:
>>>
>>> md_do_sync()
>>> sync_request()
>>> generic_make_request()
>>
>> This is a read request from the "first" device.
>>
>>> blk_queue_bio()
>>> blk_attempt_plug_merge()
>>> CHAINED HERE
>>>
>>> will keep chained and reused by:
>>>
>>> raid1d()
>>> sync_request_write()
>>> generic_make_request()
>>
>> This is a write request to some other device, isn't it?
>>
>> If sync_request_write() is using a bio that has already been used, it
>> should call bio_reset() and fill in the details again.
>> However I don't see how that would happen.
>> Can you give specific details on the situation that triggers the bug?
>
> We have storage side mapping lv through scst to server, on server side
> we assemble them into multipath device, and then assemble these dm into
> two raid1.
>
> The test is firstly do mkfs.ext4 on raid1 then start fio on it, on storage
> side we unmap all the lv (could during mkfs or fio), then on server side
> we hit the BUG (reproducible).

So I assume the initial resync is still happening at this point?
And you unmap *all* the lv's so you expect IO to fail?
I can see that the code would behave strangely if you have a
bad-block-list configured (which is the default).
Do you have a bbl? If you create the array without the bbl, does it
still crash?
>
> The path of bio was confirmed by add tracing, it is reused in
> sync_request_write()
> with 'bi_next' once chained inside blk_attempt_plug_merge().
I still don't see why it is re-used.
I assume you didn't explicitly ask for a check/repair (i.e. didn't write
to .../md/sync_action at all?). In that case MD_RECOVERY_REQUESTED is
not set.
So sync_request() sends only one bio to generic_make_request():
    r1_bio->bios[r1_bio->read_disk];
then sync_request_write() *doesn't* send that bio again, but does send
all the others.
So where does it reuse a bio?
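
To make that reasoning concrete, here is a minimal userspace sketch, not kernel code. The names (toy_bio, toy_r1bio, toy_submit and friends) are hypothetical stand-ins, and it models only the case described above where no check/repair was requested: the read bio is submitted once, and the write phase skips it.

```c
#include <assert.h>
#include <stddef.h>

#define NDISKS 3

/* Hypothetical stand-in for struct bio: it only counts submissions. */
struct toy_bio {
    int submitted;  /* times this bio was handed to toy_submit() */
};

/* Hypothetical stand-in for struct r1bio. */
struct toy_r1bio {
    struct toy_bio bios[NDISKS];
    int read_disk;
};

/* Stand-in for generic_make_request(). */
static void toy_submit(struct toy_bio *bio)
{
    assert(bio != NULL);
    bio->submitted++;
}

/* Read phase: only the bio for read_disk is submitted. */
static void toy_sync_request(struct toy_r1bio *r1)
{
    toy_submit(&r1->bios[r1->read_disk]);
}

/* Write phase: every bio except the one that was used for the read. */
static void toy_sync_request_write(struct toy_r1bio *r1)
{
    for (int i = 0; i < NDISKS; i++) {
        if (i == r1->read_disk)
            continue;
        toy_submit(&r1->bios[i]);
    }
}

/* Highest submission count across all bios in the r1bio. */
static int toy_max_submissions(const struct toy_r1bio *r1)
{
    int max = 0;

    for (int i = 0; i < NDISKS; i++)
        if (r1->bios[i].submitted > max)
            max = r1->bios[i].submitted;
    return max;
}
```

In this model, running the read phase and then the write phase leaves every bio submitted exactly once, which is why the reuse is hard to explain from these two functions alone.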
>
> We also tried to reset the bi_next inside sync_request_write() before
> generic_make_request() which also works.
>
> The testing was done with 4.4, but we found upstream also left bi_next
> chained after done in request, thus we post this RFC.
>
> Regarding raid1, we haven't found the place on path where the bio was
> reset... where does it supposed to be?
I'm not sure what you mean.
We only reset bios when they are being reused.
One place is in process_checks() where bio_reset() is called before
filling in all the details.
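
As a rough illustration of that rule, here is a userspace sketch, again not kernel code. toy_chained_bio, toy_bio_reset() and toy_refill() are made-up names, and the real bio_reset() clears more state than is modelled here. It shows a stale bi_next surviving reuse unless the bio is reset before its details are filled in again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of a bio. */
struct toy_chained_bio {
    struct toy_chained_bio *bi_next;  /* chain link, e.g. left by a merge */
    int rw;                           /* 0 = read, 1 = write */
    long sector;
};

/* Model of a reset: clear all per-use state, including the chain link. */
static void toy_bio_reset(struct toy_chained_bio *bio)
{
    assert(bio != NULL);
    bio->bi_next = NULL;
    bio->rw = 0;
    bio->sector = 0;
}

/* Fill in the details for a new use; note it never touches bi_next. */
static void toy_refill(struct toy_chained_bio *bio, int rw, long sector)
{
    assert(bio != NULL);
    bio->rw = rw;
    bio->sector = sector;
}
```

If toy_refill() is called on a bio whose bi_next was left set by an earlier use, the link is still there afterwards; calling toy_bio_reset() first is what clears it.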
Maybe, in sync_request_write(), before

    wbio->bi_rw = WRITE;

add something like

    if (wbio->bi_next)
        printk("bi_next != NULL i=%d read_disk=%d bi_end_io=%pf\n",
               i, r1_bio->read_disk, wbio->bi_end_io);

that might help narrow down what is happening.

NeilBrown

