Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device

2013-02-25 Thread Jun'ichi Nomura
On 02/26/13 00:09, Bart Van Assche wrote:
> Without your patch my test failed after two or three iterations. With your 
> patch my test is still running after 53 iterations. So if you want you can 
> add Tested-by: Bart Van Assche .

Great. Thanks for testing.
I'll submit a patch with your Reported-by and Tested-by.

> Your e-mail and the above patch are also interesting because these explain 
> why reverting to the v3.7 of drivers/md made my test succeed.
> 
> Note: even if this patch gets accepted I think it's still useful to modify 
> blk_run_queue() such that it converts recursion into iteration.

Yes. That's a separate discussion.
Though I'm not sure if it's ok in general to implicitly convert
sync run-queue to async one.

-- 
Jun'ichi Nomura, NEC Corporation

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device

2013-02-25 Thread Bart Van Assche

On 02/25/13 10:49, Jun'ichi Nomura wrote:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 314a0e2..51fefb5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1973,15 +1973,27 @@ static void __bind_mempools(struct mapped_device *md, 
struct dm_table *t)
 {
struct dm_md_mempools *p = dm_table_get_md_mempools(t);

-   if (md->io_pool && (md->tio_pool || dm_table_get_type(t) == DM_TYPE_BIO_BASED) 
&& md->bs) {
-   /*
-* The md already has necessary mempools. Reload just the
-* bioset because front_pad may have changed because
-* a different table was loaded.
-*/
-   bioset_free(md->bs);
-   md->bs = p->bs;
-   p->bs = NULL;
+   if (md->io_pool && md->bs) {
+   /* The md already has necessary mempools. */
+   if (dm_table_get_type(t) == DM_TYPE_BIO_BASED) {
+   /*
+* Reload bioset because front_pad may have changed
+* because a different table was loaded.
+*/
+   bioset_free(md->bs);
+   md->bs = p->bs;
+   p->bs = NULL;
+   } else if (dm_table_get_type(t) == DM_TYPE_REQUEST_BASED) {
+   BUG_ON(!md->tio_pool);
+   /*
+* No need to reload in case of request-based dm
+* because of fixed size front_pad.
+* Note for future: if you are to reload bioset,
+* prep-ed requests in queue may have reference
+* to bio from the old bioset.
+* So you must walk through the queue to unprep.
+*/
+   }
goto out;
}


Without your patch my test failed after two or three iterations. With 
your patch my test is still running after 53 iterations. So if you want 
you can add Tested-by: Bart Van Assche .


Your e-mail and the above patch are also interesting because these 
explain why reverting to the v3.7 of drivers/md made my test succeed.


Note: even if this patch gets accepted I think it's still useful to 
modify blk_run_queue() such that it converts recursion into iteration.


Bart.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device

2013-02-25 Thread Jun'ichi Nomura
Hello Bart,

On 02/22/13 19:47, Bart Van Assche wrote:
> As the comment above rq_completed() explains, md members must
> not be touched after the dm_put() at the end of that function
> has been invoked. Avoid that the md->queue can be run
> asynchronously after the last md reference has been dropped by
> running that queue synchronously. This patch fixes the
> following kernel oops:

Calling blk_run_queue_async() there should be ok.
After dm_put(), the dm device may be removed. But free_dev() in dm.c
calls blk_queue_cleanup() and it should solve the race vs. delayed work.

And I could reproduce very similar oops without removing dm device
by following procedure:
(please replace "mpathX" with your dm-multipath map name)

  # t=`dmsetup table mpathX`
  # while sleep 1; do \
  echo "$t" | dmsetup load mpathX; dmsetup resume mpathX; done

Looking at the following back trace:

> general protection fault:  [#1] SMP
> RIP: 0010:[]  [] mempool_free+0x24/0xb0
> Call Trace:
>   
>   [] bio_put+0x97/0xc0
>   [] end_clone_bio+0x35/0x90 [dm_mod]
>   [] bio_endio+0x1d/0x30
>   [] req_bio_endio.isra.51+0xa3/0xe0
>   [] blk_update_request+0x118/0x520
>   [] blk_update_bidi_request+0x27/0xa0
>   [] blk_end_bidi_request+0x2c/0x80
>   [] blk_end_request+0x10/0x20
>   [] scsi_io_completion+0xfb/0x6c0 [scsi_mod]
>   [] scsi_finish_command+0xbd/0x120 [scsi_mod]
>   [] scsi_softirq_done+0x13f/0x160 [scsi_mod]
>   [] blk_done_softirq+0x80/0xa0
>   [] __do_softirq+0xf1/0x250
>   [] call_softirq+0x1c/0x30
>   [] do_softirq+0x8d/0xc0
>   [] irq_exit+0xd5/0xe0
>   [] do_IRQ+0x63/0xe0
>   [] common_interrupt+0x6f/0x6f
>   
>   [] srp_queuecommand+0x8c/0xcb0 [ib_srp]
>   [] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
>   [] scsi_request_fn+0x31e/0x520 [scsi_mod]
>   [] __blk_run_queue+0x37/0x50
>   [] blk_delay_work+0x29/0x40
>   [] process_one_work+0x1c3/0x5c0
>   [] worker_thread+0x15e/0x440
>   [] kthread+0xdb/0xe0
>   [] ret_from_fork+0x7c/0xb0

it seems that the bioset was removed while being referenced.

c0820cf5 "dm: introduce per_bio_data" started to replace dm bioset
during table replacement because the size of bioset front_pad might
change for bio-based dm.
However, for request-based dm, it is not necessary because the size
of front_pad is static. Also we can't simply replace bioset because
prep-ed requests in queue have reference to the old bioset.

The patch below changes it not to replace bioset for request-based dm.
(Brings back to the same behavior with v3.7)
With this patch, I could not reproduce the problem.
Could you try this?

-- 
Jun'ichi Nomura, NEC Corporation

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 314a0e2..51fefb5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1973,15 +1973,27 @@ static void __bind_mempools(struct mapped_device *md, 
struct dm_table *t)
 {
struct dm_md_mempools *p = dm_table_get_md_mempools(t);
 
-   if (md->io_pool && (md->tio_pool || dm_table_get_type(t) == 
DM_TYPE_BIO_BASED) && md->bs) {
-   /*
-* The md already has necessary mempools. Reload just the
-* bioset because front_pad may have changed because
-* a different table was loaded.
-*/
-   bioset_free(md->bs);
-   md->bs = p->bs;
-   p->bs = NULL;
+   if (md->io_pool && md->bs) {
+   /* The md already has necessary mempools. */
+   if (dm_table_get_type(t) == DM_TYPE_BIO_BASED) {
+   /*
+* Reload bioset because front_pad may have changed
+* because a different table was loaded.
+*/
+   bioset_free(md->bs);
+   md->bs = p->bs;
+   p->bs = NULL;
+   } else if (dm_table_get_type(t) == DM_TYPE_REQUEST_BASED) {
+   BUG_ON(!md->tio_pool);
+   /*
+* No need to reload in case of request-based dm
+* because of fixed size front_pad.
+* Note for future: if you are to reload bioset,
+* prep-ed requests in queue may have reference
+* to bio from the old bioset.
+* So you must walk through the queue to unprep.
+*/
+   }
goto out;
}
 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device

2013-02-22 Thread Mike Snitzer
On Fri, Feb 22 2013 at  6:22am -0500,
Bart Van Assche  wrote:

> On 02/22/13 12:08, Mike Snitzer wrote:
> >On Fri, Feb 22 2013 at  5:47am -0500,
> >Bart Van Assche  wrote:
> >
> >>As the comment above rq_completed() explains, md members must
> >>not be touched after the dm_put() at the end of that function
> >>has been invoked. Avoid that the md->queue can be run
> >>asynchronously after the last md reference has been dropped by
> >>running that queue synchronously.
> >
> >Your commit header should probably reference commit
> >a8c32a5c98943d370ea606a2e7dc04717eb92206 ("dm: fix deadlock with request
> >based dm and queue request_fn recursion") and cc: stable with "v3.7+"
> >guidance.
> >
> >Acked-by: Mike Snitzer 
> 
> Hello Mike,
> 
> Thanks for reviewing this patch and for the ack. Regarding the
> stable tag: had you noticed that commit a8c32a5 had a "Cc: stable"
> tag itself and hence probably has already been backported to kernels
> older than v3.7 ?

Ah yes.  Good point.

Jens, since this DM change is dependent on Bart's 1/2 block patch it'd
be ideal if you could pick up both of these patches for v3.9.

Thanks,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device

2013-02-22 Thread Bart Van Assche

On 02/22/13 12:08, Mike Snitzer wrote:

On Fri, Feb 22 2013 at  5:47am -0500,
Bart Van Assche  wrote:


As the comment above rq_completed() explains, md members must
not be touched after the dm_put() at the end of that function
has been invoked. Avoid that the md->queue can be run
asynchronously after the last md reference has been dropped by
running that queue synchronously.


Your commit header should probably reference commit
a8c32a5c98943d370ea606a2e7dc04717eb92206 ("dm: fix deadlock with request
based dm and queue request_fn recursion") and cc: stable with "v3.7+"
guidance.

Acked-by: Mike Snitzer 


Hello Mike,

Thanks for reviewing this patch and for the ack. Regarding the stable 
tag: had you noticed that commit a8c32a5 had a "Cc: stable" tag itself 
and hence probably has already been backported to kernels older than v3.7 ?


Bart.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device

2013-02-22 Thread Mike Snitzer
On Fri, Feb 22 2013 at  5:47am -0500,
Bart Van Assche  wrote:

> As the comment above rq_completed() explains, md members must
> not be touched after the dm_put() at the end of that function
> has been invoked. Avoid that the md->queue can be run
> asynchronously after the last md reference has been dropped by
> running that queue synchronously. This patch fixes the
> following kernel oops:
> 
> general protection fault:  [#1] SMP
> RIP: 0010:[]  [] mempool_free+0x24/0xb0
> Call Trace:
>   
>   [] bio_put+0x97/0xc0
>   [] end_clone_bio+0x35/0x90 [dm_mod]
>   [] bio_endio+0x1d/0x30
>   [] req_bio_endio.isra.51+0xa3/0xe0
>   [] blk_update_request+0x118/0x520
>   [] blk_update_bidi_request+0x27/0xa0
>   [] blk_end_bidi_request+0x2c/0x80
>   [] blk_end_request+0x10/0x20
>   [] scsi_io_completion+0xfb/0x6c0 [scsi_mod]
>   [] scsi_finish_command+0xbd/0x120 [scsi_mod]
>   [] scsi_softirq_done+0x13f/0x160 [scsi_mod]
>   [] blk_done_softirq+0x80/0xa0
>   [] __do_softirq+0xf1/0x250
>   [] call_softirq+0x1c/0x30
>   [] do_softirq+0x8d/0xc0
>   [] irq_exit+0xd5/0xe0
>   [] do_IRQ+0x63/0xe0
>   [] common_interrupt+0x6f/0x6f
>   
>   [] srp_queuecommand+0x8c/0xcb0 [ib_srp]
>   [] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
>   [] scsi_request_fn+0x31e/0x520 [scsi_mod]
>   [] __blk_run_queue+0x37/0x50
>   [] blk_delay_work+0x29/0x40
>   [] process_one_work+0x1c3/0x5c0
>   [] worker_thread+0x15e/0x440
>   [] kthread+0xdb/0xe0
>   [] ret_from_fork+0x7c/0xb0

Your commit header should probably reference commit
a8c32a5c98943d370ea606a2e7dc04717eb92206 ("dm: fix deadlock with request
based dm and queue request_fn recursion") and cc: stable with "v3.7+"
guidance.

Acked-by: Mike Snitzer 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] dm: Avoid use-after-free of a mapped device

2013-02-22 Thread Bart Van Assche
As the comment above rq_completed() explains, md members must
not be touched after the dm_put() at the end of that function
has been invoked. Avoid that the md->queue can be run
asynchronously after the last md reference has been dropped by
running that queue synchronously. This patch fixes the
following kernel oops:

general protection fault:  [#1] SMP
RIP: 0010:[]  [] mempool_free+0x24/0xb0
Call Trace:
  
  [] bio_put+0x97/0xc0
  [] end_clone_bio+0x35/0x90 [dm_mod]
  [] bio_endio+0x1d/0x30
  [] req_bio_endio.isra.51+0xa3/0xe0
  [] blk_update_request+0x118/0x520
  [] blk_update_bidi_request+0x27/0xa0
  [] blk_end_bidi_request+0x2c/0x80
  [] blk_end_request+0x10/0x20
  [] scsi_io_completion+0xfb/0x6c0 [scsi_mod]
  [] scsi_finish_command+0xbd/0x120 [scsi_mod]
  [] scsi_softirq_done+0x13f/0x160 [scsi_mod]
  [] blk_done_softirq+0x80/0xa0
  [] __do_softirq+0xf1/0x250
  [] call_softirq+0x1c/0x30
  [] do_softirq+0x8d/0xc0
  [] irq_exit+0xd5/0xe0
  [] do_IRQ+0x63/0xe0
  [] common_interrupt+0x6f/0x6f
  
  [] srp_queuecommand+0x8c/0xcb0 [ib_srp]
  [] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
  [] scsi_request_fn+0x31e/0x520 [scsi_mod]
  [] __blk_run_queue+0x37/0x50
  [] blk_delay_work+0x29/0x40
  [] process_one_work+0x1c3/0x5c0
  [] worker_thread+0x15e/0x440
  [] kthread+0xdb/0xe0
  [] ret_from_fork+0x7c/0xb0

Signed-off-by: Bart Van Assche 
Cc: Alasdair G Kergon 
Cc: Jens Axboe 
Cc: Mike Snitzer 
Cc: Tejun Heo 
Cc: James Bottomley 
Cc: 
---
 drivers/md/dm.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 314a0e2..0218fc3 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -729,13 +729,13 @@ static void rq_completed(struct mapped_device *md, int 
rw, int run_queue)
wake_up(&md->wait);
 
/*
-* Run this off this callpath, as drivers could invoke end_io while
-* inside their request_fn (and holding the queue lock). Calling
-* back into ->request_fn() could deadlock attempting to grab the
-* queue lock again.
+* Although this function may be invoked indirectly from inside
+* blk_run_queue(), invoking blk_run_queue() here is safe because that
+* function returns immediately when it detects that it has been
+* called recursively.
 */
if (run_queue)
-   blk_run_queue_async(md->queue);
+   blk_run_queue(md->queue);
 
/*
 * dm_put() must be at the end of this function. See the comment above
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html