[PATCH] dm mpath: do not fail path on -ENOSPC

2013-09-12 Thread Jun';ichi Nomura
Since ENOSPC is a target-side error, dm-mpath should just pass the error
information to upper layer instead of retrying itself with path failover.
Otherwise it will end up failing all paths down while path checkers find
all paths ok.

ENOSPC can now be returned from SCSI device after commit a9d6ceb8
("[SCSI] return ENOSPC on thin provisioning failure").

Signed-off-by: Jun'ichi Nomura 
Cc: Hannes Reinecke 
---
 drivers/md/dm-mpath.c | 1 +
 1 file changed, 1 insertion(+)

The patch is not actually tested as I don't have such a storage at my hand.
Hannes, please let me know if you didn't change noretry_error() intentionally.

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index b759a12..075747e 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1268,6 +1268,7 @@ static int noretry_error(int error)
case -EREMOTEIO:
case -EILSEQ:
case -ENODATA:
+   case -ENOSPC:
return 1;
}
 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] scsi: Return ENODATA on medium error

2013-07-01 Thread Jun';ichi Nomura
Hi Hannes,

On 07/01/13 22:16, Hannes Reinecke wrote:
> When a medium error is detected the SCSI stack should return
> ENODATA to the upper layers.
> 
> Cc: Jun'ichi Nomura 
> Signed-off-by: Hannes Reinecke 
> ---
>  block/blk-core.c  |  3 +++
>  drivers/md/dm-mpath.c | 16 +++-
>  drivers/scsi/scsi_error.c |  2 +-
>  drivers/scsi/scsi_lib.c   |  5 +
>  include/scsi/scsi.h   |  1 +
>  5 files changed, 25 insertions(+), 2 deletions(-)
..
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index bdf26f5..57896cea 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c

Thank you for including the change.
But I think this change to dm should be a separate patch in this series
(prerequisite to the SCSI error code change) and Cc to dm-devel.

> @@ -1261,6 +1261,20 @@ static void activate_path(struct work_struct *work)
>   pg_init_done, pgpath);
>  }
>  
> +static int noretry_error(int error)
> +{
> + switch(error) {
> + case -EOPNOTSUPP:
> + case -EREMOTEIO:
> + case -EILSEQ:
> + case -ENODATA:

case -ENOSPC:

We don't want to fail the path for -ENOSPC, do we?

> + return 1;
> + }
> +
> + /* Anything else could be a path failure, so should be retried */
> + return 0;
> +}
> +
>  /*
>   * end_io handling
>   */
> @@ -1284,7 +1298,7 @@ static int do_end_io(struct multipath *m, struct 
> request *clone,
>   if (!error && !clone->errors)
>   return 0;   /* I/O complete */
>  
> - if (error == -EOPNOTSUPP || error == -EREMOTEIO || error == -EILSEQ)
> + if (noretry_error(error))
>   return error;
>  
>   if (mpio->pgpath)

-- 
Jun'ichi Nomura, NEC Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] scsi: Return ENODATA on medium error

2013-07-01 Thread Jun';ichi Nomura
On 07/01/13 17:12, Hannes Reinecke wrote:
> When a medium error is detected the SCSI stack should return
> ENODATA to the upper layers.

Hi Hannes,

since you change the error code from -EREMOTEIO to -ENODATA/-ENOSPC,
upper layers that checks -EREMOTEIO have to be updated as well.

Something like below for dm-multipath.
It seems btrfs checking -EREMOTEIO, too.

-- 
Jun'ichi Nomura, NEC Corporation

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index bdf26f5..15bf881 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1261,6 +1261,21 @@ static void activate_path(struct work_struct *work)
pg_init_done, pgpath);
 }
 
+static int maybe_path_failure(int error)
+{
+   switch(error) {
+   case -EOPNOTSUPP:
+   case -EREMOTEIO:
+   case -EILSEQ:
+   case -ENOSPC:
+   case -ENODATA:
+   return 0;
+   }
+
+   /* Anything else could be a path failure */
+   return 1;
+}
+
 /*
  * end_io handling
  */
@@ -1284,7 +1299,7 @@ static int do_end_io(struct multipath *m, struct request 
*clone,
if (!error && !clone->errors)
return 0;   /* I/O complete */
 
-   if (error == -EOPNOTSUPP || error == -EREMOTEIO || error == -EILSEQ)
+   if (!maybe_path_failure(error))
return error;
 
if (mpio->pgpath)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device

2013-03-01 Thread Jun';ichi Nomura
On 02/28/13 22:00, Bart Van Assche wrote:
> How about reposting these patches as a performance optimization ?

How and when do they improve performance?

> With these patches I see a slightly lower latency and slightly higher
> throughput. With a dm-linear mapping on top of a RAM disk (brd), a
> request size of 512 bytes and 100% reads fio reports 2063K IOPS without
> these patches and 2083K IOPS with these two patches applied. That's an
> improvement of about 1%. It's not much but that comes on top of the

dm-linear + brd is not appropriate for testing your patches.
They are both bio-based drivers while your patches change request
processing.  To test the patches, you have to use request-based drivers,
e.g. dm-multipath on top of scsi.

Also, IMO, the measurement should be done not only for IO performance
but also how other processes are interrupted by IO processing.
The use of blk_run_queue_async should have reduced the foot print of
softirq handling.

-- 
Jun'ichi Nomura, NEC Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/2] dm: Avoid use-after-free of a mapped device

2013-02-27 Thread Jun';ichi Nomura
Hi Bart,

On 02/27/13 23:45, Bart Van Assche wrote:
> This mini-series of two patches avoids that the device mapper
> implementation can trigger a use-after-free during removal of a
> mapped device. The two patches in this series are:
> - block: Convert blk_run_queue() recursion into iteration.
> - dm: Avoid running the md queue after the last dm_put().
> 
> Note: these patches are the result of source reading. As far as I know this 
> issue has not (yet) caused any harm.

Ref-counting of mapped device is like this:
  - dm depends on the fact that the block device is opened while there
is bio/request submitted.  So dm_get/put in dm_blk_open/close is
enough to keep mapped device while there are bios.
  - Request-based target has a tiny window between dm_blk_close()
and the end of rq_completed() because the opener may close the device
once the last bio completes even if request is still finishing.
dm_get/dm_put in dm_start_request/rq_completed closes this window.
(See comments in dm_start_request())
  - So, when dm_put() puts the last reference, there should be no
requests in the queue.
  - If there is no reference to the mapped device, dm_destroy() may
start tearing it down.
It is ok if there is pending delayed work for the request queue
because blk_cleanup_queue() is called before freeing the mapped device
and cancels the delayed work.

So as far as blk_run_queue_async() in rq_completed() is concerned,
it is not a problem from "use-after-free" point of view.

-- 
Jun'ichi Nomura, NEC Corporation

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] dm: do not replace bioset for request-based dm

2013-02-25 Thread Jun';ichi Nomura
This patch fixes a regression introduced in v3.8, which causes oops
like this when dm-multipath is used:

general protection fault:  [#1] SMP
RIP: 0010:[]  [] mempool_free+0x24/0xb0
Call Trace:
  
  [] bio_put+0x97/0xc0
  [] end_clone_bio+0x35/0x90 [dm_mod]
  [] bio_endio+0x1d/0x30
  [] req_bio_endio.isra.51+0xa3/0xe0
  [] blk_update_request+0x118/0x520
  [] blk_update_bidi_request+0x27/0xa0
  [] blk_end_bidi_request+0x2c/0x80
  [] blk_end_request+0x10/0x20
  [] scsi_io_completion+0xfb/0x6c0 [scsi_mod]
  [] scsi_finish_command+0xbd/0x120 [scsi_mod]
  [] scsi_softirq_done+0x13f/0x160 [scsi_mod]
  [] blk_done_softirq+0x80/0xa0
  [] __do_softirq+0xf1/0x250
  [] call_softirq+0x1c/0x30
  [] do_softirq+0x8d/0xc0
  [] irq_exit+0xd5/0xe0
  [] do_IRQ+0x63/0xe0
  [] common_interrupt+0x6f/0x6f
  
  [] srp_queuecommand+0x8c/0xcb0 [ib_srp]
  [] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
  [] scsi_request_fn+0x31e/0x520 [scsi_mod]
  [] __blk_run_queue+0x37/0x50
  [] blk_delay_work+0x29/0x40
  [] process_one_work+0x1c3/0x5c0
  [] worker_thread+0x15e/0x440
  [] kthread+0xdb/0xe0
  [] ret_from_fork+0x7c/0xb0

The regression was introduced by the change
c0820cf5 "dm: introduce per_bio_data", where dm started to replace
bioset during table replacement.
For bio-based dm, it is good because clone bios do not exist during the
table replacement.
For request-based dm, however, (not-yet-mapped) clone bios may stay in
request queue and survive during the table replacement.
So freeing the old bioset could cause the oops in bio_put().

Since the size of front_pad may change only with bio-based dm,
it is not necessary to replace bioset for request-based dm.

Reported-by: Bart Van Assche 
Tested-by: Bart Van Assche 
Signed-off-by: Jun'ichi Nomura 
Cc: Alasdair G Kergon 
Cc: Mikulas Patocka 
Cc: Mike Snitzer 
Cc: 
---
 drivers/md/dm.c |   30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 314a0e2..51fefb5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1973,15 +1973,27 @@ static void __bind_mempools(struct mapped_device *md, 
struct dm_table *t)
 {
struct dm_md_mempools *p = dm_table_get_md_mempools(t);
 
-   if (md->io_pool && (md->tio_pool || dm_table_get_type(t) == 
DM_TYPE_BIO_BASED) && md->bs) {
-   /*
-* The md already has necessary mempools. Reload just the
-* bioset because front_pad may have changed because
-* a different table was loaded.
-*/
-   bioset_free(md->bs);
-   md->bs = p->bs;
-   p->bs = NULL;
+   if (md->io_pool && md->bs) {
+   /* The md already has necessary mempools. */
+   if (dm_table_get_type(t) == DM_TYPE_BIO_BASED) {
+   /*
+* Reload bioset because front_pad may have changed
+* because a different table was loaded.
+*/
+   bioset_free(md->bs);
+   md->bs = p->bs;
+   p->bs = NULL;
+   } else if (dm_table_get_type(t) == DM_TYPE_REQUEST_BASED) {
+   BUG_ON(!md->tio_pool);
+   /*
+* No need to reload in case of request-based dm
+* because of fixed size front_pad.
+* Note for future: if you are to reload bioset,
+* prep-ed requests in queue may have reference
+* to bio from the old bioset.
+* So you must walk through the queue to unprep.
+*/
+   }
goto out;
}
 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device

2013-02-25 Thread Jun';ichi Nomura
On 02/26/13 00:09, Bart Van Assche wrote:
> Without your patch my test failed after two or three iterations. With your 
> patch my test is still running after 53 iterations. So if you want you can 
> add Tested-by: Bart Van Assche .

Great. Thanks for testing.
I'll submit a patch with your Reported-by and Tested-by.

> Your e-mail and the above patch are also interesting because these explain 
> why reverting to the v3.7 of drivers/md made my test succeed.
> 
> Note: even if this patch gets accepted I think it's still useful to modify 
> blk_run_queue() such that it converts recursion into iteration.

Yes. That's a separate discussion.
Though I'm not sure if it's ok in general to implicitly convert
sync run-queue to async one.

-- 
Jun'ichi Nomura, NEC Corporation

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] dm: Avoid use-after-free of a mapped device

2013-02-25 Thread Jun';ichi Nomura
Hello Bart,

On 02/22/13 19:47, Bart Van Assche wrote:
> As the comment above rq_completed() explains, md members must
> not be touched after the dm_put() at the end of that function
> has been invoked. Avoid that the md->queue can be run
> asynchronously after the last md reference has been dropped by
> running that queue synchronously. This patch fixes the
> following kernel oops:

Calling blk_run_queue_async() there should be ok.
After dm_put(), the dm device may be removed. But free_dev() in dm.c
calls blk_queue_cleanup() and it should solve the race vs. delayed work.

And I could reproduce very similar oops without removing dm device
by following procedure:
(please replace "mpathX" with your dm-multipath map name)

  # t=`dmsetup table mpathX`
  # while sleep 1; do \
  echo "$t" | dmsetup load mpathX; dmsetup resume mpathX; done

Looking at the following back trace:

> general protection fault:  [#1] SMP
> RIP: 0010:[]  [] mempool_free+0x24/0xb0
> Call Trace:
>   
>   [] bio_put+0x97/0xc0
>   [] end_clone_bio+0x35/0x90 [dm_mod]
>   [] bio_endio+0x1d/0x30
>   [] req_bio_endio.isra.51+0xa3/0xe0
>   [] blk_update_request+0x118/0x520
>   [] blk_update_bidi_request+0x27/0xa0
>   [] blk_end_bidi_request+0x2c/0x80
>   [] blk_end_request+0x10/0x20
>   [] scsi_io_completion+0xfb/0x6c0 [scsi_mod]
>   [] scsi_finish_command+0xbd/0x120 [scsi_mod]
>   [] scsi_softirq_done+0x13f/0x160 [scsi_mod]
>   [] blk_done_softirq+0x80/0xa0
>   [] __do_softirq+0xf1/0x250
>   [] call_softirq+0x1c/0x30
>   [] do_softirq+0x8d/0xc0
>   [] irq_exit+0xd5/0xe0
>   [] do_IRQ+0x63/0xe0
>   [] common_interrupt+0x6f/0x6f
>   
>   [] srp_queuecommand+0x8c/0xcb0 [ib_srp]
>   [] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
>   [] scsi_request_fn+0x31e/0x520 [scsi_mod]
>   [] __blk_run_queue+0x37/0x50
>   [] blk_delay_work+0x29/0x40
>   [] process_one_work+0x1c3/0x5c0
>   [] worker_thread+0x15e/0x440
>   [] kthread+0xdb/0xe0
>   [] ret_from_fork+0x7c/0xb0

it seems that the bioset was removed while being referenced.

c0820cf5 "dm: introduce per_bio_data" started to replace dm bioset
during table replacement because the size of bioset front_pad might
change for bio-based dm.
However, for request-based dm, it is not necessary because the size
of front_pad is static. Also we can't simply replace bioset because
prep-ed requests in queue have reference to the old bioset.

The patch below changes it not to replace bioset for request-based dm.
(Brings back to the same behavior with v3.7)
With this patch, I could not reproduce the problem.
Could you try this?

-- 
Jun'ichi Nomura, NEC Corporation

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 314a0e2..51fefb5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1973,15 +1973,27 @@ static void __bind_mempools(struct mapped_device *md, 
struct dm_table *t)
 {
struct dm_md_mempools *p = dm_table_get_md_mempools(t);
 
-   if (md->io_pool && (md->tio_pool || dm_table_get_type(t) == 
DM_TYPE_BIO_BASED) && md->bs) {
-   /*
-* The md already has necessary mempools. Reload just the
-* bioset because front_pad may have changed because
-* a different table was loaded.
-*/
-   bioset_free(md->bs);
-   md->bs = p->bs;
-   p->bs = NULL;
+   if (md->io_pool && md->bs) {
+   /* The md already has necessary mempools. */
+   if (dm_table_get_type(t) == DM_TYPE_BIO_BASED) {
+   /*
+* Reload bioset because front_pad may have changed
+* because a different table was loaded.
+*/
+   bioset_free(md->bs);
+   md->bs = p->bs;
+   p->bs = NULL;
+   } else if (dm_table_get_type(t) == DM_TYPE_REQUEST_BASED) {
+   BUG_ON(!md->tio_pool);
+   /*
+* No need to reload in case of request-based dm
+* because of fixed size front_pad.
+* Note for future: if you are to reload bioset,
+* prep-ed requests in queue may have reference
+* to bio from the old bioset.
+* So you must walk through the queue to unprep.
+*/
+   }
goto out;
}
 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: SAS1068 PCI-X Fusion-MPT SAS 1000:0055

2007-01-26 Thread Jun';ichi Nomura
Hi,

> I have new NEC server with SAS1068 PCI-X Fusion-MPT SAS
> pciid: 1000:0055
> mptsas form 2.6.20-rc5 don't recognize it ;(
> 
> I see that driver support only 1000:0054 and 1000:0058 devices.

It might be that the device has software RAID feature and changes
device ID based on setup. (1000:0055 when software RAID is enabled
and 1000:0054 or something for normal SAS)

If so, there is a chance you can disable the software RAID
via BIOS setup utility.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation of America
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: megaraid driver always fails to reset adapter

2005-03-01 Thread Jun';ichi Nomura
Hi,
thanks for the info.

Adding one more 'F' to the loop counter works.
i.e. 0xFF.

Just adding rmb() didn't solve the problem though it
may decrease the necessary counter value.

I don't know this value is ok for environments other than mine.

Bagalkote, Sreenivas wrote:
> Please try:
> 
> In mbox_post_sync_cmd_fast(...) replace
> 
> for (i = 0; i < 0xF; i++) {
>   if (mbox->numstatus != 0xFF) break;
> }
> 
> with
> 
> for (i = 0; i < 0xF; i++) {
>   if (mbox->numstatus != 0xFF) break;
>   rmb();
> } 
> 
> Additionally, increase the loop counter to a bigger value.
> 
> Thanks,
> Sreenivas
> LSI LOGIC Corporation 
> 
> 
>>-Original Message-
>>From: Jun'ichi Nomura [mailto:[EMAIL PROTECTED] 
>>Sent: Tuesday, March 01, 2005 1:36 PM
>>To: linux-scsi@vger.kernel.org
>>Subject: megaraid driver always fails to reset adapter
>>
>>Hello,
>>
>>I found that the megaraid driver always fails to reset the
>>adapter with the following message:
>>   megaraid: resetting the host...
>>   megaraid mbox: reset sequence completed successfully
>>   megaraid: fast sync command timed out
>>   megaraid: reservation reset failed
>>when the "Cluster mode" of the adapter BIOS is enabled.
>>So, whenever the reset occurs, the adapter goes to
>>offline and just become unavailable.
>>
>>Is this a known problem?
>>
>>I tried 2.6.9 and 2.6.11-rc5 and the results were the same.
>>I used sg_reset to invoke reset artificially to test this.
>>
>>The problem doesn't occur if I disabled the "Cluster mode"
>>parameter in the adapter BIOS.
>>
>>I'm not sure how well the currenet megaraid driver supports
>>the "Cluster mode".
>>I appreciate if you have any idea.
>>
>>Thanks,
>>Jun'ichi Nomura
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

megaraid driver always fails to reset adapter

2005-03-01 Thread Jun';ichi Nomura
Hello,

I found that the megaraid driver always fails to reset the
adapter with the following message:
megaraid: resetting the host...
megaraid mbox: reset sequence completed successfully
megaraid: fast sync command timed out
megaraid: reservation reset failed
when the "Cluster mode" of the adapter BIOS is enabled.
So, whenever the reset occurs, the adapter goes to
offline and just become unavailable.

Is this a known problem?

I tried 2.6.9 and 2.6.11-rc5 and the results were the same.
I used sg_reset to invoke reset artificially to test this.

The problem doesn't occur if I disabled the "Cluster mode"
parameter in the adapter BIOS.

I'm not sure how well the currenet megaraid driver supports
the "Cluster mode".
I appreciate if you have any idea.

Thanks,
Jun'ichi Nomura

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html