Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device
On 02/26/13 00:09, Bart Van Assche wrote: > Without your patch my test failed after two or three iterations. With your > patch my test is still running after 53 iterations. So if you want you can > add Tested-by: Bart Van Assche . Great. Thanks for testing. I'll submit a patch with your Reported-by and Tested-by. > Your e-mail and the above patch are also interesting because these explain > why reverting to the v3.7 of drivers/md made my test succeed. > > Note: even if this patch gets accepted I think it's still useful to modify > blk_run_queue() such that it converts recursion into iteration. Yes. That's a separate discussion. Though I'm not sure if it's ok in general to implicitly convert sync run-queue to async one. -- Jun'ichi Nomura, NEC Corporation -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device
On 02/25/13 10:49, Jun'ichi Nomura wrote: diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 314a0e2..51fefb5 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1973,15 +1973,27 @@ static void __bind_mempools(struct mapped_device *md, struct dm_table *t) { struct dm_md_mempools *p = dm_table_get_md_mempools(t); - if (md->io_pool && (md->tio_pool || dm_table_get_type(t) == DM_TYPE_BIO_BASED) && md->bs) { - /* -* The md already has necessary mempools. Reload just the -* bioset because front_pad may have changed because -* a different table was loaded. -*/ - bioset_free(md->bs); - md->bs = p->bs; - p->bs = NULL; + if (md->io_pool && md->bs) { + /* The md already has necessary mempools. */ + if (dm_table_get_type(t) == DM_TYPE_BIO_BASED) { + /* +* Reload bioset because front_pad may have changed +* because a different table was loaded. +*/ + bioset_free(md->bs); + md->bs = p->bs; + p->bs = NULL; + } else if (dm_table_get_type(t) == DM_TYPE_REQUEST_BASED) { + BUG_ON(!md->tio_pool); + /* +* No need to reload in case of request-based dm +* because of fixed size front_pad. +* Note for future: if you are to reload bioset, +* prep-ed requests in queue may have reference +* to bio from the old bioset. +* So you must walk through the queue to unprep. +*/ + } goto out; } Without your patch my test failed after two or three iterations. With your patch my test is still running after 53 iterations. So if you want you can add Tested-by: Bart Van Assche . Your e-mail and the above patch are also interesting because these explain why reverting to the v3.7 of drivers/md made my test succeed. Note: even if this patch gets accepted I think it's still useful to modify blk_run_queue() such that it converts recursion into iteration. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device
Hello Bart, On 02/22/13 19:47, Bart Van Assche wrote: > As the comment above rq_completed() explains, md members must > not be touched after the dm_put() at the end of that function > has been invoked. Avoid that the md->queue can be run > asynchronously after the last md reference has been dropped by > running that queue synchronously. This patch fixes the > following kernel oops: Calling blk_run_queue_async() there should be ok. After dm_put(), the dm device may be removed. But free_dev() in dm.c calls blk_queue_cleanup() and it should solve the race vs. delayed work. And I could reproduce very similar oops without removing dm device by following procedure: (please replace "mpathX" with your dm-multipath map name) # t=`dmsetup table mpathX` # while sleep 1; do \ echo "$t" | dmsetup load mpathX; dmsetup resume mpathX; done Looking at the following back trace: > general protection fault: [#1] SMP > RIP: 0010:[] [] mempool_free+0x24/0xb0 > Call Trace: > > [] bio_put+0x97/0xc0 > [] end_clone_bio+0x35/0x90 [dm_mod] > [] bio_endio+0x1d/0x30 > [] req_bio_endio.isra.51+0xa3/0xe0 > [] blk_update_request+0x118/0x520 > [] blk_update_bidi_request+0x27/0xa0 > [] blk_end_bidi_request+0x2c/0x80 > [] blk_end_request+0x10/0x20 > [] scsi_io_completion+0xfb/0x6c0 [scsi_mod] > [] scsi_finish_command+0xbd/0x120 [scsi_mod] > [] scsi_softirq_done+0x13f/0x160 [scsi_mod] > [] blk_done_softirq+0x80/0xa0 > [] __do_softirq+0xf1/0x250 > [] call_softirq+0x1c/0x30 > [] do_softirq+0x8d/0xc0 > [] irq_exit+0xd5/0xe0 > [] do_IRQ+0x63/0xe0 > [] common_interrupt+0x6f/0x6f > > [] srp_queuecommand+0x8c/0xcb0 [ib_srp] > [] scsi_dispatch_cmd+0x148/0x310 [scsi_mod] > [] scsi_request_fn+0x31e/0x520 [scsi_mod] > [] __blk_run_queue+0x37/0x50 > [] blk_delay_work+0x29/0x40 > [] process_one_work+0x1c3/0x5c0 > [] worker_thread+0x15e/0x440 > [] kthread+0xdb/0xe0 > [] ret_from_fork+0x7c/0xb0 it seems that the bioset was removed while being referenced. c0820cf5 "dm: introduce per_bio_data" started to replace dm bioset during table replacement because the size of bioset front_pad might change for bio-based dm. However, for request-based dm, it is not necessary because the size of front_pad is static. Also we can't simply replace bioset because prep-ed requests in queue have reference to the old bioset. The patch below changes it not to replace bioset for request-based dm. (Brings back to the same behavior with v3.7) With this patch, I could not reproduce the problem. Could you try this? -- Jun'ichi Nomura, NEC Corporation diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 314a0e2..51fefb5 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1973,15 +1973,27 @@ static void __bind_mempools(struct mapped_device *md, struct dm_table *t) { struct dm_md_mempools *p = dm_table_get_md_mempools(t); - if (md->io_pool && (md->tio_pool || dm_table_get_type(t) == DM_TYPE_BIO_BASED) && md->bs) { - /* -* The md already has necessary mempools. Reload just the -* bioset because front_pad may have changed because -* a different table was loaded. -*/ - bioset_free(md->bs); - md->bs = p->bs; - p->bs = NULL; + if (md->io_pool && md->bs) { + /* The md already has necessary mempools. */ + if (dm_table_get_type(t) == DM_TYPE_BIO_BASED) { + /* +* Reload bioset because front_pad may have changed +* because a different table was loaded. +*/ + bioset_free(md->bs); + md->bs = p->bs; + p->bs = NULL; + } else if (dm_table_get_type(t) == DM_TYPE_REQUEST_BASED) { + BUG_ON(!md->tio_pool); + /* +* No need to reload in case of request-based dm +* because of fixed size front_pad. +* Note for future: if you are to reload bioset, +* prep-ed requests in queue may have reference +* to bio from the old bioset. +* So you must walk through the queue to unprep. +*/ + } goto out; } -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device
On Fri, Feb 22 2013 at 6:22am -0500, Bart Van Assche wrote: > On 02/22/13 12:08, Mike Snitzer wrote: > >On Fri, Feb 22 2013 at 5:47am -0500, > >Bart Van Assche wrote: > > > >>As the comment above rq_completed() explains, md members must > >>not be touched after the dm_put() at the end of that function > >>has been invoked. Avoid that the md->queue can be run > >>asynchronously after the last md reference has been dropped by > >>running that queue synchronously. > > > >Your commit header should probably reference commit > >a8c32a5c98943d370ea606a2e7dc04717eb92206 ("dm: fix deadlock with request > >based dm and queue request_fn recursion") and cc: stable with "v3.7+" > >guidance. > > > >Acked-by: Mike Snitzer > > Hello Mike, > > Thanks for reviewing this patch and for the ack. Regarding the > stable tag: had you noticed that commit a8c32a5 had a "Cc: stable" > tag itself and hence probably has already been backported to kernels > older than v3.7 ? Ah yes. Good point. Jens, since this DM change is dependent on Bart's 1/2 block patch it'd be ideal if you could pick up both of these patches for v3.9. Thanks, Mike -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device
On 02/22/13 12:08, Mike Snitzer wrote: On Fri, Feb 22 2013 at 5:47am -0500, Bart Van Assche wrote: As the comment above rq_completed() explains, md members must not be touched after the dm_put() at the end of that function has been invoked. Avoid that the md->queue can be run asynchronously after the last md reference has been dropped by running that queue synchronously. Your commit header should probably reference commit a8c32a5c98943d370ea606a2e7dc04717eb92206 ("dm: fix deadlock with request based dm and queue request_fn recursion") and cc: stable with "v3.7+" guidance. Acked-by: Mike Snitzer Hello Mike, Thanks for reviewing this patch and for the ack. Regarding the stable tag: had you noticed that commit a8c32a5 had a "Cc: stable" tag itself and hence probably has already been backported to kernels older than v3.7 ? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device
On Fri, Feb 22 2013 at 5:47am -0500, Bart Van Assche wrote: > As the comment above rq_completed() explains, md members must > not be touched after the dm_put() at the end of that function > has been invoked. Avoid that the md->queue can be run > asynchronously after the last md reference has been dropped by > running that queue synchronously. This patch fixes the > following kernel oops: > > general protection fault: [#1] SMP > RIP: 0010:[] [] mempool_free+0x24/0xb0 > Call Trace: > > [] bio_put+0x97/0xc0 > [] end_clone_bio+0x35/0x90 [dm_mod] > [] bio_endio+0x1d/0x30 > [] req_bio_endio.isra.51+0xa3/0xe0 > [] blk_update_request+0x118/0x520 > [] blk_update_bidi_request+0x27/0xa0 > [] blk_end_bidi_request+0x2c/0x80 > [] blk_end_request+0x10/0x20 > [] scsi_io_completion+0xfb/0x6c0 [scsi_mod] > [] scsi_finish_command+0xbd/0x120 [scsi_mod] > [] scsi_softirq_done+0x13f/0x160 [scsi_mod] > [] blk_done_softirq+0x80/0xa0 > [] __do_softirq+0xf1/0x250 > [] call_softirq+0x1c/0x30 > [] do_softirq+0x8d/0xc0 > [] irq_exit+0xd5/0xe0 > [] do_IRQ+0x63/0xe0 > [] common_interrupt+0x6f/0x6f > > [] srp_queuecommand+0x8c/0xcb0 [ib_srp] > [] scsi_dispatch_cmd+0x148/0x310 [scsi_mod] > [] scsi_request_fn+0x31e/0x520 [scsi_mod] > [] __blk_run_queue+0x37/0x50 > [] blk_delay_work+0x29/0x40 > [] process_one_work+0x1c3/0x5c0 > [] worker_thread+0x15e/0x440 > [] kthread+0xdb/0xe0 > [] ret_from_fork+0x7c/0xb0 Your commit header should probably reference commit a8c32a5c98943d370ea606a2e7dc04717eb92206 ("dm: fix deadlock with request based dm and queue request_fn recursion") and cc: stable with "v3.7+" guidance. Acked-by: Mike Snitzer -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] dm: Avoid use-after-free of a mapped device
As the comment above rq_completed() explains, md members must not be touched after the dm_put() at the end of that function has been invoked. Avoid that the md->queue can be run asynchronously after the last md reference has been dropped by running that queue synchronously. This patch fixes the following kernel oops: general protection fault: [#1] SMP RIP: 0010:[] [] mempool_free+0x24/0xb0 Call Trace: [] bio_put+0x97/0xc0 [] end_clone_bio+0x35/0x90 [dm_mod] [] bio_endio+0x1d/0x30 [] req_bio_endio.isra.51+0xa3/0xe0 [] blk_update_request+0x118/0x520 [] blk_update_bidi_request+0x27/0xa0 [] blk_end_bidi_request+0x2c/0x80 [] blk_end_request+0x10/0x20 [] scsi_io_completion+0xfb/0x6c0 [scsi_mod] [] scsi_finish_command+0xbd/0x120 [scsi_mod] [] scsi_softirq_done+0x13f/0x160 [scsi_mod] [] blk_done_softirq+0x80/0xa0 [] __do_softirq+0xf1/0x250 [] call_softirq+0x1c/0x30 [] do_softirq+0x8d/0xc0 [] irq_exit+0xd5/0xe0 [] do_IRQ+0x63/0xe0 [] common_interrupt+0x6f/0x6f [] srp_queuecommand+0x8c/0xcb0 [ib_srp] [] scsi_dispatch_cmd+0x148/0x310 [scsi_mod] [] scsi_request_fn+0x31e/0x520 [scsi_mod] [] __blk_run_queue+0x37/0x50 [] blk_delay_work+0x29/0x40 [] process_one_work+0x1c3/0x5c0 [] worker_thread+0x15e/0x440 [] kthread+0xdb/0xe0 [] ret_from_fork+0x7c/0xb0 Signed-off-by: Bart Van Assche Cc: Alasdair G Kergon Cc: Jens Axboe Cc: Mike Snitzer Cc: Tejun Heo Cc: James Bottomley Cc: --- drivers/md/dm.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 314a0e2..0218fc3 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -729,13 +729,13 @@ static void rq_completed(struct mapped_device *md, int rw, int run_queue) wake_up(&md->wait); /* -* Run this off this callpath, as drivers could invoke end_io while -* inside their request_fn (and holding the queue lock). Calling -* back into ->request_fn() could deadlock attempting to grab the -* queue lock again. +* Although this function may be invoked indirectly from inside +* blk_run_queue(), invoking blk_run_queue() here is safe because that +* function returns immediately when it detects that it has been +* called recursively. */ if (run_queue) - blk_run_queue_async(md->queue); + blk_run_queue(md->queue); /* * dm_put() must be at the end of this function. See the comment above -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html