[PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

2017-12-04 Thread Ming Lei
Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget
for blk-mq"), we run queue after 3ms if queue is idle and SCSI device
queue isn't ready, which is done in handling BLK_STS_RESOURCE. After
commit 0df21c86bdbf is introduced, queue won't be run any more under
this situation.

An IO hang is observed when a timeout happens. This patch fixes the hang
by running the queue after a delay in scsi_dev_queue_ready(), just like
non-mq. The issue can be triggered by the following script[1].

There is another issue which can be covered by running idle queue:
when .get_budget() is called on request coming from hctx->dispatch_list,
if one request just completes during .get_budget(), we can't depend on
SCSI's restart to make progress any more. This patch fixes the race too.

With this patch, we basically restore the previous behaviour (before commit
0df21c86bdbf) of handling an idle queue when running out of resources.

[1] script for test/verify SCSI timeout
rmmod scsi_debug
modprobe scsi_debug max_queue=1

DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`

echo "using scsi device $DEVICE"
echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
echo "temporary write through" >$DISK_DIR/cache_type
echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
echo none > /sys/block/$DEVICE/queue/scheduler
dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
sleep 5
echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
wait
echo "SUCCESS"

Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
Signed-off-by: Ming Lei 
---
 drivers/scsi/scsi_lib.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index db9556662e27..1816dd8259b3 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 out_put_device:
 	put_device(&sdev->sdev_gendev);
 out:
+	if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
+		blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
 	return false;
 }
 
-- 
2.9.5
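
For readers who want to see the added condition in isolation, here is a
minimal userspace sketch. It assumes a hypothetical struct sdev_model that
stands in for the two struct scsi_device fields the check reads; it is not
kernel code and only models the decision, not the delayed queue run itself.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the two fields of struct scsi_device
 * that the check reads. This is an illustration, not the kernel structure.
 */
struct sdev_model {
	int device_busy;	/* commands currently in flight on the device */
	int device_blocked;	/* non-zero while the device is blocked */
};

/*
 * Models the condition added at the "out:" label. When no budget could be
 * obtained, a delayed queue run is wanted only if nothing is in flight
 * (so no completion will restart the queue) and the device is not blocked.
 */
static bool needs_delayed_run(const struct sdev_model *sdev)
{
	return sdev->device_busy == 0 && sdev->device_blocked == 0;
}
```

With one command still in flight the function returns false, because the
completion of that command is expected to restart the queue.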



Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()

2017-12-04 Thread Ming Lei
On Tue, Dec 05, 2017 at 01:16:24PM +0800, Ming Lei wrote:
> On Mon, Dec 04, 2017 at 11:48:07PM +, Holger Hoffstätte wrote:
> > On Tue, 05 Dec 2017 06:45:08 +0800, Ming Lei wrote:
> > 
> > > On Mon, Dec 04, 2017 at 03:09:20PM +, Bart Van Assche wrote:
> > >> On Sun, 2017-12-03 at 00:31 +0800, Ming Lei wrote:
> > >> > Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for 
> > >> > blk-mq")
> > >> 
> > >> It might be safer to revert commit 0df21c86bdbf instead of trying to
> > >> fix all issues introduced by that commit for kernel version v4.15 ...
> > > 
> > > What are all issues in v4.15-rc? Up to now, it is the only issue reported,
> > > and can be fixed by this simple patch, which one can be thought as cleanup
> > > too.
> > 
> > Even with this patch I've encountered at least one hang that
> > seemed related. I'm using most of block/scsi-4.15 on top of 4.14 and
> > the hang in question was on a rotating disk. It could be solved by
> > activating a different scheduler on the hanging device; all hanging
> > sync/df processes got unstuck and all was fine again, which leads me
> > to believe that there is at least one more rare condition where
> > delaying requests (as done in the budget patch) leads to a hang.
> > 
> > This happened with mq-deadline which I was testing specifically to avoid
> > any BFQ-related side effects.
> 
> OK, this looks like a new report.
> 
> Without any log, we can't make any progress, and we can't even guess
> what the issue is related to.
> 
> Could you post your dmesg log (including the stack traces of the hung
> processes)? And could you dump the debugfs log with the following script
> when this hang happens?
> 
>   http://people.redhat.com/minlei/tests/tools/dump-blk-info
> 
> BTW, you just need to pass the disk name to the script, such as: /dev/sda.

Thinking about the issue further, this patch only covers the case of
scsi_set_blocked(), but doesn't consider the case in which .get_budget()
is called inside blk_mq_dispatch_rq_list() for a request coming from
hctx->dispatch_list.

If .get_budget() is called in blk_mq_do_dispatch_sched() or
blk_mq_do_dispatch_ctx(), we don't need to run the queue if the queue
is idle. But if it is called from blk_mq_dispatch_rq_list() for a request
coming from hctx->dispatch_list, we have to run the queue if the queue is
idle, as before.

So please ignore this patch; I will submit V2 to cover both cases.

Thanks,
Ming
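
The distinction drawn in this message can be restated as a small userspace
sketch. The enum and the function name are hypothetical labels introduced
only for illustration. The sketch encodes the reasoning in this email and
nothing more; it is not taken from any kernel source.

```c
#include <stdbool.h>

/* Hypothetical labels for the path that called .get_budget(). */
enum budget_caller {
	FROM_SCHED_QUEUE,	/* blk_mq_do_dispatch_sched() */
	FROM_SW_QUEUE,		/* blk_mq_do_dispatch_ctx() */
	FROM_DISPATCH_LIST,	/* blk_mq_dispatch_rq_list(), hctx->dispatch_list */
};

/*
 * Per the reasoning above: an idle queue has to be run again only when the
 * budget was requested for a request taken from hctx->dispatch_list.
 */
static bool must_run_idle_queue(enum budget_caller caller, bool queue_idle)
{
	return queue_idle && caller == FROM_DISPATCH_LIST;
}
```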


Re: [PATCH v3 00/22] qla2xxx: Bug fixes for 4.15-rc2

2017-12-04 Thread Madhani, Himanshu
Hi Martin, 

> On Dec 4, 2017, at 6:24 PM, Martin K. Petersen wrote:
> 
> 
> Himanshu,
> 
>> drivers/scsi/qla2xxx/qla_def.h |  49 
>> drivers/scsi/qla2xxx/qla_gs.c  | 230 ++---
>> drivers/scsi/qla2xxx/qla_init.c|  69 +--
>> drivers/scsi/qla2xxx/qla_iocb.c|  13 ---
>> drivers/scsi/qla2xxx/qla_isr.c |   7 +-
>> drivers/scsi/qla2xxx/qla_mbx.c |   3 +-
>> drivers/scsi/qla2xxx/qla_mid.c |  42 ---
>> drivers/scsi/qla2xxx/qla_os.c  |  78 ++---
>> drivers/scsi/qla2xxx/qla_target.c  |  60 +++---
>> drivers/scsi/qla2xxx/qla_version.h |   2 +-
>> 10 files changed, 405 insertions(+), 148 deletions(-)
> 
> This looks pretty big for a series of bug fixes. Are all these patches
> really candidates for 4.15 and stable backports all the way back to
> 4.10?
> 
> -- 
> Martin K. PetersenOracle Linux Engineering


Yes, please. I would like them backported to 4.10, since these issues were
discovered with the 4.10/4.11 kernels.

Thanks,
- Himanshu



Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()

2017-12-04 Thread Ming Lei
On Mon, Dec 04, 2017 at 11:48:07PM +, Holger Hoffstätte wrote:
> On Tue, 05 Dec 2017 06:45:08 +0800, Ming Lei wrote:
> 
> > On Mon, Dec 04, 2017 at 03:09:20PM +, Bart Van Assche wrote:
> >> On Sun, 2017-12-03 at 00:31 +0800, Ming Lei wrote:
> >> > Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for 
> >> > blk-mq")
> >> 
> >> It might be safer to revert commit 0df21c86bdbf instead of trying to
> >> fix all issues introduced by that commit for kernel version v4.15 ...
> > 
> > What are all issues in v4.15-rc? Up to now, it is the only issue reported,
> > and can be fixed by this simple patch, which one can be thought as cleanup
> > too.
> 
> Even with this patch I've encountered at least one hang that
> seemed related. I'm using most of block/scsi-4.15 on top of 4.14 and
> the hang in question was on a rotating disk. It could be solved by activating
> a different scheduler on the hanging device; all hanging sync/df processes got
> unstuck and all was fine again, which leads me to believe that there is at
> least one more rare condition where delaying requests (as done in the budget
> patch) leads to a hang.
> 
> This happened with mq-deadline which I was testing specifically to avoid
> any BFQ-related side effects.

OK, this looks like a new report.

Without any log, we can't make any progress, and we can't even guess
what the issue is related to.

Could you post your dmesg log (including the stack traces of the hung
processes)? And could you dump the debugfs log with the following script
when this hang happens?

http://people.redhat.com/minlei/tests/tools/dump-blk-info

BTW, you just need to pass the disk name to the script, such as: /dev/sda.

-- 
Ming


[PATCH v2] scsi_debug: add cdb_len parameter

2017-12-04 Thread Douglas Gilbert
While testing "sd: Micro-optimize READ / WRITE CDB encoding" patches it was
helpful to check various code paths associated with READ/WRITE 6, 10
and 16 byte cdb variants. There seems to be no user space "knobs" to
twiddle use_10_for_rw and friends in the scsi_device structure.
So add a parameter to scsi_debug called "cdb_len" for this purpose.

Changes since v1:
  - address most of the concerns from Bart Van Assche
  - keep driver version and date and tie them into some responses
(e.g. version becomes INQUIRY Revision field)

Patch built on lk 4.15.0-rc2

Signed-off-by: Douglas Gilbert 
---
 drivers/scsi/scsi_debug.c | 92 ---
 1 file changed, 87 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index e4f037f0f38b..691ce8f37d34 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -6,7 +6,7 @@
  *  anything out of the ordinary is seen.
  * ^^^ Original ^^^
  *
- * Copyright (C) 2001 - 2016 Douglas Gilbert
+ * Copyright (C) 2001 - 2017 Douglas Gilbert
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -61,8 +61,8 @@
 #include "scsi_logging.h"
 
 /* make sure inq_product_rev string corresponds to this version */
-#define SDEBUG_VERSION "1.86"
-static const char *sdebug_version_date = "20160430";
+#define SDEBUG_VERSION "0187"  /* format to fit INQUIRY revision field */
+static const char *sdebug_version_date = "20171202";
 
 #define MY_NAME "scsi_debug"
 
@@ -105,6 +105,7 @@ static const char *sdebug_version_date = "20160430";
  * (id 0) containing 1 logical unit (lun 0). That is 1 device.
  */
 #define DEF_ATO 1
+#define DEF_CDB_LEN 10
 #define DEF_JDELAY   1 /* if > 0 unit is a jiffy */
 #define DEF_DEV_SIZE_MB   8
 #define DEF_DIF 0
@@ -571,6 +572,7 @@ static const struct opcode_info_t opcode_info_arr[SDEB_I_LAST_ELEMENT + 1] = {
 
 static int sdebug_add_host = DEF_NUM_HOST;
 static int sdebug_ato = DEF_ATO;
+static int sdebug_cdb_len = DEF_CDB_LEN;
 static int sdebug_jdelay = DEF_JDELAY; /* if > 0 then unit is jiffies */
 static int sdebug_dev_size_mb = DEF_DEV_SIZE_MB;
 static int sdebug_dif = DEF_DIF;
@@ -797,6 +799,61 @@ static int scsi_debug_ioctl(struct scsi_device *dev, int cmd, void __user *arg)
 	/* return -ENOTTY; // correct return but upsets fdisk */
 }
 
+static void config_cdb_len(struct scsi_device *sdev)
+{
+	switch (sdebug_cdb_len) {
+	case 6: /* suggest 6 byte READ, WRITE and MODE SENSE/SELECT */
+		sdev->use_10_for_rw = false;
+		sdev->use_16_for_rw = false;
+		sdev->use_10_for_ms = false;
+		break;
+	case 10: /* suggest 10 byte RWs and 6 byte MODE SENSE/SELECT */
+		sdev->use_10_for_rw = true;
+		sdev->use_16_for_rw = false;
+		sdev->use_10_for_ms = false;
+		break;
+	case 12: /* suggest 10 byte RWs and 10 byte MODE SENSE/SELECT */
+		sdev->use_10_for_rw = true;
+		sdev->use_16_for_rw = false;
+		sdev->use_10_for_ms = true;
+		break;
+	case 16:
+		sdev->use_10_for_rw = false;
+		sdev->use_16_for_rw = true;
+		sdev->use_10_for_ms = true;
+		break;
+	case 32: /* No knobs to suggest this so same as 16 for now */
+		sdev->use_10_for_rw = false;
+		sdev->use_16_for_rw = true;
+		sdev->use_10_for_ms = true;
+		break;
+	default:
+		pr_warn("unexpected cdb_len=%d, force to 10\n",
+			sdebug_cdb_len);
+		sdev->use_10_for_rw = true;
+		sdev->use_16_for_rw = false;
+		sdev->use_10_for_ms = false;
+		sdebug_cdb_len = 10;
+		break;
+	}
+}
+
+static void all_config_cdb_len(void)
+{
+	struct sdebug_host_info *sdbg_host;
+	struct Scsi_Host *shost;
+	struct scsi_device *sdev;
+
+	spin_lock(&sdebug_host_list_lock);
+	list_for_each_entry(sdbg_host, &sdebug_host_list, host_list) {
+		shost = sdbg_host->shost;
+		shost_for_each_device(sdev, shost) {
+			config_cdb_len(sdev);
+		}
+	}
+	spin_unlock(&sdebug_host_list_lock);
+}
+
 static void clear_luns_changed_on_target(struct sdebug_dev_info *devip)
 {
 	struct sdebug_host_info *sdhp;
@@ -955,7 +1012,7 @@ static int fetch_to_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,
 
 static char sdebug_inq_vendor_id[9] = "Linux   ";
 static char sdebug_inq_product_id[17] = "scsi_debug  ";
-static char sdebug_inq_product_rev[5] = "0186";/* version less '.' */
+static char sdebug_inq_product_rev[5] = SDEBUG_VERSION;
 /* Use some locally assigned NAAs for SAS addresses. 
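
The mapping that config_cdb_len() applies can be exercised outside the
kernel with a small sketch like the one below. It assumes a hypothetical
struct cdb_hints holding the three hint bits and a hypothetical helper
cdb_len_to_hints(); neither name comes from the patch, and the helper only
mirrors the switch statement shown above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three hint bits in struct scsi_device. */
struct cdb_hints {
	bool use_10_for_rw;
	bool use_16_for_rw;
	bool use_10_for_ms;
};

/*
 * Fill *h for the requested cdb_len and return the value actually applied.
 * Unsupported values fall back to 10, as in the default case of the patch.
 */
static int cdb_len_to_hints(int cdb_len, struct cdb_hints *h)
{
	switch (cdb_len) {
	case 6:		/* 6 byte READ/WRITE and MODE SENSE/SELECT */
		*h = (struct cdb_hints){ false, false, false };
		return 6;
	case 10:	/* 10 byte READ/WRITE, 6 byte MODE SENSE/SELECT */
		*h = (struct cdb_hints){ true, false, false };
		return 10;
	case 12:	/* 10 byte READ/WRITE, 10 byte MODE SENSE/SELECT */
		*h = (struct cdb_hints){ true, false, true };
		return 12;
	case 16:
	case 32:	/* no separate hint for 32 byte CDBs; same as 16 */
		*h = (struct cdb_hints){ false, true, true };
		return cdb_len;
	default:	/* unexpected value: force to 10 */
		*h = (struct cdb_hints){ true, false, false };
		return 10;
	}
}
```

Returning the applied value makes the fallback visible to the caller, which
corresponds to the patch resetting sdebug_cdb_len to 10 in its default case.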

Re: [PATCH 1/2] scsi-mq: Only show the CDB if available

2017-12-04 Thread Ming Lei
On Tue, Dec 05, 2017 at 01:59:51AM +, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 09:15 +0800, Ming Lei wrote:
> > On Mon, Dec 04, 2017 at 04:38:08PM -0800, Bart Van Assche wrote:
> > > Since the next patch will make it possible that scsi_show_rq() gets
> > > called before the CDB pointer is changed into a non-NULL value,
> > > only show the CDB if the CDB pointer is not NULL. Additionally,
> > > show the request timeout and SCSI command flags. This patch also
> > > fixes a bug that was reported by Ming Lei. See also Ming Lei,
> > > scsi_debugfs: fix crash in scsi_show_rq(), linux-scsi, 7 November
> > > 2017 (https://marc.info/?l=linux-block=151006655317188).
> > 
> > Please cook a patch for fixing the crash issue only, since we need
> > to backport the fix to stable kernel.
> 
> The code that is touched by this patch is only used for kernel debugging.
> I will do this if others agree with your opinion.

No, do not mix two different things in one patch, especially since the fix
part needs to be backported to stable.

The fix part should aim at V4.15, and the other part can be V4.16
stuff.

-- 
Ming


Re: [PATCH v3 00/22] qla2xxx: Bug fixes for 4.15-rc2

2017-12-04 Thread Martin K. Petersen

Himanshu,

>  drivers/scsi/qla2xxx/qla_def.h |  49 
>  drivers/scsi/qla2xxx/qla_gs.c  | 230 ++---
>  drivers/scsi/qla2xxx/qla_init.c|  69 +--
>  drivers/scsi/qla2xxx/qla_iocb.c|  13 ---
>  drivers/scsi/qla2xxx/qla_isr.c |   7 +-
>  drivers/scsi/qla2xxx/qla_mbx.c |   3 +-
>  drivers/scsi/qla2xxx/qla_mid.c |  42 ---
>  drivers/scsi/qla2xxx/qla_os.c  |  78 ++---
>  drivers/scsi/qla2xxx/qla_target.c  |  60 +++---
>  drivers/scsi/qla2xxx/qla_version.h |   2 +-
>  10 files changed, 405 insertions(+), 148 deletions(-)

This looks pretty big for a series of bug fixes. Are all these patches
really candidates for 4.15 and stable backports all the way back to
4.10?

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 1/2] scsi-mq: Only show the CDB if available

2017-12-04 Thread Ming Lei
On Mon, Dec 04, 2017 at 10:42:28PM -0500, Martin K. Petersen wrote:
> 
> Hi Ming,
> 
> > Please cook a patch for fixing the crash issue only, since we need
> > to backport the fix to stable kernel.
> 
> I thought you were going to submit a V5 that addressed James' concerns?
> 
> -- 
> Martin K. PetersenOracle Linux Engineering

Hi Martin,

I replied in the following link for James's concerns:

https://marc.info/?l=linux-block=151074751321108=2

The fact is that use-after-free can't be avoided at all, no matter whether
we set the cmnd to NULL before calling free. That means we have to
handle use-after-free well in scsi_show_rq(), so we don't need to
touch the free code.

So V4 is good enough to merge, IMO.


Thanks,
Ming


Re: [PATCH 1/2] scsi-mq: Only show the CDB if available

2017-12-04 Thread Martin K. Petersen

Hi Ming,

> Please cook a patch for fixing the crash issue only, since we need
> to backport the fix to stable kernel.

I thought you were going to submit a V5 that addressed James' concerns?

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] ibmvscsis: add DRC indices to debug statements

2017-12-04 Thread Martin K. Petersen

Bryant,

> Where applicable, changes pr_debug, pr_info, pr_err, etc. calls
> to the dev_* versions.  This adds the DRC index of the device to the
> corresponding trace statement.

Applied to 4.16/scsi-queue, thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: csiostor: fix spelling mistake: "Couldnt" -> "Couldn't"

2017-12-04 Thread Martin K. Petersen

Colin,

> Trivial fix to spelling mistake in error message text.

Applied to 4.16/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: ipr: fix incorrect indentation of assignment statement

2017-12-04 Thread Martin K. Petersen

Colin,

> Remove one extraneous level of indentation on an assignment statement.

Applied to 4.16/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: bnx2fc: fix spelling mistake: "Couldnt" -> "Couldn't"

2017-12-04 Thread Martin K. Petersen

Colin,

> Trivial fix to spelling mistake in error message text.

Applied to 4.16/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: sd: add missing KERN_CONT for disk spin-up

2017-12-04 Thread Martin K. Petersen

Michał,

> KERN_CONT is now required for continued printks(). Add it.

Applied to 4.16/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 1/2] scsi: ufs: add some definition included in UFS HCI specifications

2017-12-04 Thread Martin K. Petersen

> These would be used in the future in some specific drivers.

Applied to 4.16/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] mpt3sas: Remove unused variable requeue_event

2017-12-04 Thread Martin K. Petersen

Suganath,

> No Functional change just cleanup,
> Removed variable requeue_event and made function as void.

Applied to 4.16/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v15 4/5] scsi: mpt3sas: Replace PCI pool old API

2017-12-04 Thread Martin K. Petersen

Romain,

> The PCI pool API is deprecated. This commit replaces the PCI pool old
> API by the appropriate function with the DMA pool API.

Applied to 4.16/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] [SCSI] fnic: Fix coccinelle warnings

2017-12-04 Thread Martin K. Petersen

Vasyl,

> Remove the duplicate copies of this simple function and use an
> open-coded version.

Applied to 4.16/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: scsi_devinfo: handle non-terminated strings

2017-12-04 Thread Martin K. Petersen

Martin,

> devinfo->vendor and devinfo->model aren't necessarily
> zero-terminated.

Applied to 4.15/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi_devinfo: cleanly zero-pad devinfo strings

2017-12-04 Thread Martin K. Petersen

Martin,

> Cleanly fill memory for "vendor" and "model" with 0-bytes for the
> "compatible" case rather than adding only a single 0 byte.  This
> simplifies the devinfo code a bit, and avoids mistakes in other
> places of the code (not in current upstream, but we had one such
> mistake in the SUSE kernel).

Applied to 4.15/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 1/2] scsi-mq: Only show the CDB if available

2017-12-04 Thread Bart Van Assche
On Tue, 2017-12-05 at 09:15 +0800, Ming Lei wrote:
> On Mon, Dec 04, 2017 at 04:38:08PM -0800, Bart Van Assche wrote:
> > Since the next patch will make it possible that scsi_show_rq() gets
> > called before the CDB pointer is changed into a non-NULL value,
> > only show the CDB if the CDB pointer is not NULL. Additionally,
> > show the request timeout and SCSI command flags. This patch also
> > fixes a bug that was reported by Ming Lei. See also Ming Lei,
> > scsi_debugfs: fix crash in scsi_show_rq(), linux-scsi, 7 November
> > 2017 (https://marc.info/?l=linux-block=151006655317188).
> 
> Please cook a patch for fixing the crash issue only, since we need
> to backport the fix to stable kernel.

The code that is touched by this patch is only used for kernel debugging.
I will do this if others agree with your opinion.

Bart.

Re: [PATCH] blk-mq: Fix several SCSI request queue lockups

2017-12-04 Thread Ming Lei
On Tue, Dec 05, 2017 at 01:13:43AM +, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 09:04 +0800, Ming Lei wrote:
> > Then there is no reason to revert commit 0df21c86bdbf ("scsi: implement
> > .get_budget and .put_budget for blk-mq") for one issue which may never
> > happen in reality, since this reproducer needs an out-of-tree patch.
> 
> Sorry but I disagree completely. You seem to overlook that there may be other
> circumstances that trigger the same lockup, e.g. a SCSI queue full condition.

If scsi_dev_queue_ready() returns false, .get_budget() catches that
and the request is never added to hctx->dispatch. And
scsi_host_queue_ready() always returns true, since we respect the per-host
queue depth via blk_mq_get_driver_tag() before calling .queue_rq().

Or if I missed other cases, please point them out.

-- 
Ming


Re: [PATCH 1/2] scsi-mq: Only show the CDB if available

2017-12-04 Thread Ming Lei
On Mon, Dec 04, 2017 at 04:38:08PM -0800, Bart Van Assche wrote:
> Since the next patch will make it possible that scsi_show_rq() gets
> called before the CDB pointer is changed into a non-NULL value,
> only show the CDB if the CDB pointer is not NULL. Additionally,
> show the request timeout and SCSI command flags. This patch also
> fixes a bug that was reported by Ming Lei. See also Ming Lei,
> scsi_debugfs: fix crash in scsi_show_rq(), linux-scsi, 7 November
> 2017 (https://marc.info/?l=linux-block=151006655317188).

Please cook a patch for fixing the crash issue only, since we need
to backport the fix to stable kernel.

> 
> Signed-off-by: Bart Van Assche 
> Cc: James E.J. Bottomley 
> Cc: Martin K. Petersen 
> Cc: Ming Lei 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 

Please Cc: 

> ---
>  drivers/scsi/scsi_debugfs.c | 47 -
>  1 file changed, 42 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
> index 01f08c03f2c1..37ed6bb8e6ec 100644
> --- a/drivers/scsi/scsi_debugfs.c
> +++ b/drivers/scsi/scsi_debugfs.c
> @@ -4,13 +4,50 @@
>  #include 
>  #include "scsi_debugfs.h"
>  
> +#define SCSI_CMD_FLAG_NAME(name) [ilog2(SCMD_##name)] = #name
> +static const char *const scsi_cmd_flags[] = {
> + SCSI_CMD_FLAG_NAME(TAGGED),
> + SCSI_CMD_FLAG_NAME(UNCHECKED_ISA_DMA),
> + SCSI_CMD_FLAG_NAME(ZONE_WRITE_LOCK),
> + SCSI_CMD_FLAG_NAME(INITIALIZED),
> +};
> +#undef SCSI_CMD_FLAG_NAME
> +
> +static int scsi_flags_show(struct seq_file *m, const unsigned long flags,
> +const char *const *flag_name, int flag_name_count)
> +{
> + bool sep = false;
> + int i;
> +
> + for (i = 0; i < sizeof(flags) * BITS_PER_BYTE; i++) {
> + if (!(flags & BIT(i)))
> + continue;
> + if (sep)
> + seq_puts(m, "|");
> + sep = true;
> + if (i < flag_name_count && flag_name[i])
> + seq_puts(m, flag_name[i]);
> + else
> + seq_printf(m, "%d", i);
> + }
> + return 0;
> +}
> +
>  void scsi_show_rq(struct seq_file *m, struct request *rq)
>  {
>   struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
> - int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> - char buf[80];
> + int alloc_ms = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> + int timeout_ms = jiffies_to_msecs(rq->timeout);
> + const u8 *const cdb = READ_ONCE(cmd->cmnd);
> + char buf[80] = "(?)";
>  
> - __scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
> - seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
> -cmd->retries, msecs / 1000, msecs % 1000);
> + if (cdb)
> + __scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
> + seq_printf(m, ", .cmd=%s, .retries=%d, .result = %#x, .flags=", buf,
> +cmd->retries, cmd->result);
> + scsi_flags_show(m, cmd->flags, scsi_cmd_flags,
> + ARRAY_SIZE(scsi_cmd_flags));
> + seq_printf(m, ", .timeout=%d.%03d, allocated %d.%03d s ago",
> +timeout_ms / 1000, timeout_ms % 1000,
> +alloc_ms / 1000, alloc_ms % 1000);
>  }
> -- 
> 2.15.0
> 

-- 
Ming


Re: [PATCH] blk-mq: Fix several SCSI request queue lockups

2017-12-04 Thread Bart Van Assche
On Tue, 2017-12-05 at 09:04 +0800, Ming Lei wrote:
> Then there is no reason to revert commit 0df21c86bdbf ("scsi: implement
> .get_budget and .put_budget for blk-mq") for one issue which may never
> happen in reality, since this reproducer needs an out-of-tree patch.

Sorry but I disagree completely. You seem to overlook that there may be other
circumstances that trigger the same lockup, e.g. a SCSI queue full condition.

Bart.

Re: [PATCH] blk-mq: Fix several SCSI request queue lockups

2017-12-04 Thread Ming Lei
On Tue, Dec 05, 2017 at 12:29:59AM +, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 08:20 +0800, Ming Lei wrote:
> > Also it is a bit odd to see request in hctx->dispatch now, and it can only
> > happen now when scsi_target_queue_ready() returns false, so I guess you 
> > apply
> > some change on target->can_queue(such as setting it as 1 in srp/ib code
> > manually)?
> 
> Yes, but that had already been mentioned. From the e-mail at the start of
> this e-mail thread: "Change the SRP initiator such that SCSI target queue
> depth is limited to 1." The changes I made in the SRP initiator are the same
> as those described in the following message from about one month ago:
> https://www.spinics.net/lists/linux-scsi/msg114720.html.

OK, got it.

Then there is no reason to revert commit 0df21c86bdbf ("scsi: implement
.get_budget and .put_budget for blk-mq") for one issue which may never
happen in reality, since this reproducer needs an out-of-tree patch.

I don't mean it isn't an issue, but I don't think it has top priority
for reverting commit 0df21c86bdbf. In particular, no proof has been shown
that 0df21c86bdbf causes this issue, since this commit doesn't change how
the queue is run for requests in hctx->dispatch_list.

I'd like to take a look if someone is willing to cooperate, such as by
providing the kernel log, testing a debug patch, and that kind of thing.
Or when I get this hardware to reproduce.

--
Ming


[PATCH 1/2] scsi-mq: Only show the CDB if available

2017-12-04 Thread Bart Van Assche
Since the next patch will make it possible that scsi_show_rq() gets
called before the CDB pointer is changed into a non-NULL value,
only show the CDB if the CDB pointer is not NULL. Additionally,
show the request timeout and SCSI command flags. This patch also
fixes a bug that was reported by Ming Lei. See also Ming Lei,
scsi_debugfs: fix crash in scsi_show_rq(), linux-scsi, 7 November
2017 (https://marc.info/?l=linux-block=151006655317188).

Signed-off-by: Bart Van Assche 
Cc: James E.J. Bottomley 
Cc: Martin K. Petersen 
Cc: Ming Lei 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 drivers/scsi/scsi_debugfs.c | 47 -
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index 01f08c03f2c1..37ed6bb8e6ec 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -4,13 +4,50 @@
 #include 
 #include "scsi_debugfs.h"
 
+#define SCSI_CMD_FLAG_NAME(name) [ilog2(SCMD_##name)] = #name
+static const char *const scsi_cmd_flags[] = {
+   SCSI_CMD_FLAG_NAME(TAGGED),
+   SCSI_CMD_FLAG_NAME(UNCHECKED_ISA_DMA),
+   SCSI_CMD_FLAG_NAME(ZONE_WRITE_LOCK),
+   SCSI_CMD_FLAG_NAME(INITIALIZED),
+};
+#undef SCSI_CMD_FLAG_NAME
+
+static int scsi_flags_show(struct seq_file *m, const unsigned long flags,
+			   const char *const *flag_name, int flag_name_count)
+{
+	bool sep = false;
+	int i;
+
+	for (i = 0; i < sizeof(flags) * BITS_PER_BYTE; i++) {
+		if (!(flags & BIT(i)))
+			continue;
+		if (sep)
+			seq_puts(m, "|");
+		sep = true;
+		if (i < flag_name_count && flag_name[i])
+			seq_puts(m, flag_name[i]);
+		else
+			seq_printf(m, "%d", i);
+	}
+	return 0;
+}
+
 void scsi_show_rq(struct seq_file *m, struct request *rq)
 {
 	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
-	int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
-	char buf[80];
+	int alloc_ms = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
+	int timeout_ms = jiffies_to_msecs(rq->timeout);
+	const u8 *const cdb = READ_ONCE(cmd->cmnd);
+	char buf[80] = "(?)";
 
-	__scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
-	seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
-		   cmd->retries, msecs / 1000, msecs % 1000);
+	if (cdb)
+		__scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
+	seq_printf(m, ", .cmd=%s, .retries=%d, .result = %#x, .flags=", buf,
+		   cmd->retries, cmd->result);
+	scsi_flags_show(m, cmd->flags, scsi_cmd_flags,
+			ARRAY_SIZE(scsi_cmd_flags));
+	seq_printf(m, ", .timeout=%d.%03d, allocated %d.%03d s ago",
+		   timeout_ms / 1000, timeout_ms % 1000,
+		   alloc_ms / 1000, alloc_ms % 1000);
 }
-- 
2.15.0
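
The flag formatting done by scsi_flags_show() can be tried in userspace with
a sketch along the following lines. It assumes a hypothetical helper
format_flags() that writes into a caller-supplied buffer with snprintf()
instead of a seq_file; the helper name and its return convention are
illustrative choices, not part of the patch.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write the names of the bits set in "flags" into "out", separated by '|'.
 * Bits without a name are written as their bit number, as in the patch.
 * Returns the number of characters stored (excluding the terminating NUL);
 * output that does not fit in "len" bytes is truncated.
 */
static size_t format_flags(char *out, size_t len, unsigned long flags,
			   const char *const *names, size_t name_count)
{
	const char *sep = "";
	size_t used = 0;
	size_t i;

	if (len == 0)
		return 0;
	out[0] = '\0';
	for (i = 0; i < sizeof(flags) * CHAR_BIT && used < len - 1; i++) {
		int n;

		if (!(flags & (1UL << i)))
			continue;
		if (i < name_count && names[i])
			n = snprintf(out + used, len - used, "%s%s", sep, names[i]);
		else
			n = snprintf(out + used, len - used, "%s%zu", sep, i);
		if (n < 0)
			break;
		/* snprintf() reports the untruncated length; clamp it. */
		used += (size_t)n < len - used ? (size_t)n : len - used - 1;
		sep = "|";
	}
	return used;
}
```

Given a name table with entries only for bits 0 and 2, a flags value with
bits 1 and 3 set is rendered as the bit numbers joined by '|', which matches
the fallback branch in the patch.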



[PATCH 2/2] blk-mq-debugfs: Also show requests that have not yet been started

2017-12-04 Thread Bart Van Assche
When debugging e.g. the SCSI timeout handler it is important that
requests that have not yet been started or that already have
completed are also reported through debugfs.

Signed-off-by: Bart Van Assche 
Cc: Ming Lei 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
Cc: Martin K. Petersen 
---
 block/blk-mq-debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f7db73f1698e..886b37163f17 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -409,7 +409,7 @@ static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
 	const struct show_busy_params *params = data;
 
 	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx &&
-	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	    list_empty(&rq->queuelist))
 		__blk_mq_debugfs_rq_show(params->m,
 					 list_entry_rq(&rq->queuelist));
 }
-- 
2.15.0



[PATCH 0/2] Show commands stuck in a timeout handler in debugfs

2017-12-04 Thread Bart Van Assche
Hello Jens,

While debugging an issue with the SCSI error handler I noticed that commands
that got stuck in that error handler are not shown in debugfs. That is very
annoying for anyone who relies on the information in debugfs for root-causing
such an issue. Hence this patch series that makes sure that commands that got
stuck in a block driver timeout handler are also shown in debugfs. Please
consider these patches for kernel v4.16.

Thanks,

Bart.

Bart Van Assche (2):
  scsi-mq: Only show the CDB if available
  blk-mq-debugfs: Also show requests that have not yet been started

 block/blk-mq-debugfs.c  |  2 +-
 drivers/scsi/scsi_debugfs.c | 47 -
 2 files changed, 43 insertions(+), 6 deletions(-)

-- 
2.15.0



Re: [PATCH] blk-mq: Fix several SCSI request queue lockups

2017-12-04 Thread Bart Van Assche
On Tue, 2017-12-05 at 08:20 +0800, Ming Lei wrote:
> Also it is a bit odd to see request in hctx->dispatch now, and it can only
> happen now when scsi_target_queue_ready() returns false, so I guess you apply
> some change on target->can_queue(such as setting it as 1 in srp/ib code
> manually)?

Yes, but that had already been mentioned. From the e-mail at the start of
this e-mail thread: "Change the SRP initiator such that SCSI target queue
depth is limited to 1." The changes I made in the SRP initiator are the same
as those described in the following message from about one month ago:
https://www.spinics.net/lists/linux-scsi/msg114720.html.

Bart.

Re: [PATCH] blk-mq: Fix several SCSI request queue lockups

2017-12-04 Thread Ming Lei
On Mon, Dec 04, 2017 at 11:32:27PM +, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 07:01 +0800, Ming Lei wrote:
> > On Mon, Dec 04, 2017 at 10:48:18PM +, Bart Van Assche wrote:
> > > On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote:
> > > > On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> > > > > * A systematic lockup for SCSI queues with queue depth 1. The
> > > > >   following test reproduces that bug systematically:
> > > > >   - Change the SRP initiator such that SCSI target queue depth is
> > > > > limited to 1.
> > > > >   - Run the following command:
> > > > >   srp-test/run_tests -f xfs -d -e none -r 60 -t 01
> > > > >   See also "[PATCH 4/7] blk-mq: Avoid that request processing
> > > > >   stalls when sharing tags"
> > > > >   (https://marc.info/?l=linux-block=151208695316857). Note:
> > > > >   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
> > > > >   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
> > > > >   before all blk_mq_dispatch_rq_list() calls only fixes the
> > > > >   systematic lockup for queue depth 1.
> > > > 
> > > > You are the only reproducer [ ... ]
> > > 
> > > That's not correct. I'm pretty sure if you try to reproduce this that
> > > you will see the same hang I ran into. Does this mean that you have not
> > > yet tried to reproduce the hang I reported?
> > 
> > Do you mean every kernel developer has to own one SRP/IB hardware?
> 
> When I have the time I will make it possible to run this test on any system
> equipped with at least one Ethernet port. But the fact that the test I
> mentioned requires IB hardware should not prevent you from running this test
> since you have run this test software before.
> 
> > I don't have your hardware to reproduce that,
> 
> That's not true. Your employer definitely owns IB hardware. E.g. the
> following message shows that you have run the srp-test yourself on IB hardware
> only four weeks ago:
> 
> https://www.spinics.net/lists/linux-block/msg19511.html

The hardware belongs to Laurence; at that time I could borrow it from him,
and now I am not sure if it is still available.

> 
> > Otherwise, there should have be such similar reports from others, not from
> > only you.
> 
> That's not correct either. How long was it ago that kernel v4.15-rc1 was
> released? One week? How many SRP users do you think have tried to trigger a
> queue full condition with that kernel version?

OK, we can wait for further reporters to provide a kernel log if you
don't want to.

> 
> > More importantly I don't understand why you can't share the kernel
> > log/debugfs log when IO hang happens?
> > 
> > Without any kernel log, how can we confirm that it is a valid report?
> 
> It's really unfortunate that you are focussing on denying that the bug I
> reported exists instead of trying to fix the bugs introduced by commit

If you look at bug reports on the kernel mailing list, you will see that
most reports include some kind of log; attaching a log is common practice
when reporting an issue. It can save us much time spent talking in mails.

> b347689ffbca. BTW, I have more than enough experience to decide myself what
> a valid report is and what not. I can easily send you several MB of kernel

As I mentioned, dmesg with the hang trace plus the debugfs log should be
enough; neither can be that big, right?

> logs. The reason I have not yet done this is because I'm 99.9% sure that
> these won't help to root cause the reported issue. But I can tell you what

That is your opinion. Most of the time I can find some clue about a hang
in debugfs, and then I can add tracing in just a few likely places to
narrow down the issue.

> I learned from analyzing the information under /sys/kernel/debug/block:
> every time a hang occurred I noticed that no requests were "busy", that two
> requests occurred in rq_lists and one request occurred in a hctx dispatch

Then what do other attributes show? Like queue/hctx state?

The following script can get all this info easily:

http://people.redhat.com/minlei/tests/tools/dump-blk-info

Also it is a bit odd to see request in hctx->dispatch now, and it can only
happen now when scsi_target_queue_ready() returns false, so I guess you apply
some change on target->can_queue(such as setting it as 1 in srp/ib code
manually)?

Please reply; if yes, I will try to reproduce it with this kind of change
on scsi_debug.

> list. This is enough to conclude that a queue run was missing. And I think

In this case, it doesn't seem to be related to either commit b347689ff or
0df21c86bdbf, since neither changes RESTART for hctx->dispatch, and neither
should affect running the queue.

> that the patch at the start of this e-mail thread not only shows that the
> root cause is in the block layer but also that this bug was introduced by
> commit b347689ffbca.
> 
> > > > You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
> > > > improve dispatching from sw queue")', but you don't mention any 

Re: [PATCH] blk-mq: Fix several SCSI request queue lockups

2017-12-04 Thread Bart Van Assche
On Tue, 2017-12-05 at 07:01 +0800, Ming Lei wrote:
> On Mon, Dec 04, 2017 at 10:48:18PM +, Bart Van Assche wrote:
> > On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote:
> > > On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> > > > * A systematic lockup for SCSI queues with queue depth 1. The
> > > >   following test reproduces that bug systematically:
> > > >   - Change the SRP initiator such that SCSI target queue depth is
> > > > limited to 1.
> > > >   - Run the following command:
> > > >   srp-test/run_tests -f xfs -d -e none -r 60 -t 01
> > > >   See also "[PATCH 4/7] blk-mq: Avoid that request processing
> > > >   stalls when sharing tags"
> > > >   (https://marc.info/?l=linux-block=151208695316857). Note:
> > > >   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
> > > >   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
> > > >   before all blk_mq_dispatch_rq_list() calls only fixes the
> > > >   systematic lockup for queue depth 1.
> > > 
> > > You are the only reproducer [ ... ]
> > 
> > That's not correct. I'm pretty sure if you try to reproduce this that
> > you will see the same hang I ran into. Does this mean that you have not
> > yet tried to reproduce the hang I reported?
> 
> Do you mean every kernel developer has to own one SRP/IB hardware?

When I have the time I will make it possible to run this test on any system
equipped with at least one Ethernet port. But the fact that the test I
mentioned requires IB hardware should not prevent you from running this test
since you have run this test software before.

> I don't have your hardware to reproduce that,

That's not true. Your employer definitely owns IB hardware. E.g. the
following message shows that you have run the srp-test yourself on IB hardware
only four weeks ago:

https://www.spinics.net/lists/linux-block/msg19511.html

> Otherwise, there should have be such similar reports from others, not from
> only you.

That's not correct either. How long was it ago that kernel v4.15-rc1 was
released? One week? How many SRP users do you think have tried to trigger a
queue full condition with that kernel version?

> More importantly I don't understand why you can't share the kernel
> log/debugfs log when IO hang happens?
> 
> Without any kernel log, how can we confirm that it is a valid report?

It's really unfortunate that you are focussing on denying that the bug I
reported exists instead of trying to fix the bugs introduced by commit
b347689ffbca. BTW, I have more than enough experience to decide myself what
a valid report is and what not. I can easily send you several MB of kernel
logs. The reason I have not yet done this is because I'm 99.9% sure that
these won't help to root cause the reported issue. But I can tell you what
I learned from analyzing the information under /sys/kernel/debug/block:
every time a hang occurred I noticed that no requests were "busy", that two
requests occurred in rq_lists and one request occurred in a hctx dispatch
list. This is enough to conclude that a queue run was missing. And I think
that the patch at the start of this e-mail thread not only shows that the
root cause is in the block layer but also that this bug was introduced by
commit b347689ffbca.

> > > You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
> > > improve dispatching from sw queue")', but you don't mention any issue
> > > about that commit.
> > 
> > That's not correct either. From the commit message "A systematic lockup
> > for SCSI queues with queue depth 1."
> 
> I mean you mentioned your patch can fix 'commit b347689ffbca
> ("blk-mq-sched: improve dispatching from sw queue")', but you never
> point where the commit b347689ffbca is wrong, how your patch fixes
> the mistake of that commit.

You should know that it is not required to perform a root cause analysis
before posting a revert. Having performed a bisect is sufficient.

BTW, it seems like you forgot that last Friday I explained to you that there
is an obvious bug in commit b347689ffbca, namely that a
blk_mq_sched_mark_restart_hctx() call is missing in
blk_mq_sched_dispatch_requests() before the blk_mq_do_dispatch_ctx()
call. See also https://marc.info/?l=linux-block=151215794224401.

Bart.
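
The point about the missing restart mark can be illustrated with a small
user-space toy model. This is only a sketch under stated assumptions: the
names toy_hctx, toy_dispatch and toy_complete are hypothetical and are not
kernel code, and the real race in blk-mq is considerably more subtle. The
model only captures the idea that a completion reruns the queue solely when
a restart mark was set before dispatching started.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a hardware queue context; not struct blk_mq_hw_ctx. */
struct toy_hctx {
    bool restart_marked; /* models the restart mark on the hctx */
    int pending;         /* requests still waiting to be dispatched */
    int budget;          /* free device slots; queue depth 1 means 0 or 1 */
    int reruns;          /* times a completion reran the queue */
};

/* Dispatch while budget lasts; optionally set the restart mark first. */
static int toy_dispatch(struct toy_hctx *h, bool mark_restart_first)
{
    int dispatched = 0;

    assert(h->pending >= 0 && h->budget >= 0);
    if (mark_restart_first)
        h->restart_marked = true;
    while (h->pending > 0 && h->budget > 0) {
        h->pending--;
        h->budget--;
        dispatched++;
    }
    return dispatched;
}

/* A completion frees one slot and reruns the queue only if marked. */
static void toy_complete(struct toy_hctx *h)
{
    h->budget++;
    if (h->restart_marked) {
        h->restart_marked = false;
        h->reruns++;
        toy_dispatch(h, true);
    }
}
```

In this model, a dispatch attempt that finds no budget and did not set the
mark leaves the pending request stranded after the in-flight request
completes, whereas setting the mark first lets the completion rerun the
queue and drain it.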

Re: [PATCH] blk-mq: Fix several SCSI request queue lockups

2017-12-04 Thread Ming Lei
On Mon, Dec 04, 2017 at 10:48:18PM +, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote:
> > On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> > > * A systematic lockup for SCSI queues with queue depth 1. The
> > >   following test reproduces that bug systematically:
> > >   - Change the SRP initiator such that SCSI target queue depth is
> > > limited to 1.
> > >   - Run the following command:
> > >   srp-test/run_tests -f xfs -d -e none -r 60 -t 01
> > >   See also "[PATCH 4/7] blk-mq: Avoid that request processing
> > >   stalls when sharing tags"
> > >   (https://marc.info/?l=linux-block=151208695316857). Note:
> > >   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
> > >   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
> > >   before all blk_mq_dispatch_rq_list() calls only fixes the
> > >   systematic lockup for queue depth 1.
> > 
> > You are the only reproducer [ ... ]
> 
> That's not correct. I'm pretty sure if you try to reproduce this that
> you will see the same hang I ran into. Does this mean that you have not
> yet tried to reproduce the hang I reported?

Do you mean every kernel developer has to own one SRP/IB hardware?
I don't have your hardware to reproduce that, and I don't think most
of guys have that. Otherwise, there should have be such similar reports
from others, not from only you.

More importantly I don't understand why you can't share the kernel
log/debugfs log when IO hang happens?

Without any kernel log, how can we confirm that it is a valid report?

> 
> > You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
> > improve dispatching from sw queue")', but you don't mention any issue
> > about that commit.
> 
> That's not correct either. From the commit message "A systematic lockup
> for SCSI queues with queue depth 1."

I mean you mentioned your patch can fix 'commit b347689ffbca
("blk-mq-sched: improve dispatching from sw queue")', but you never
point where the commit b347689ffbca is wrong, how your patch fixes
the mistake of that commit.

> 
> > > I think the above means that it is too risky to try to fix all bugs
> > > introduced by commit 0df21c86bdbf before kernel v4.15 is released.
> > > Hence revert that commit.
> > 
> > What is the risk?
> 
> That more bugs were introduced by commit 0df21c86bdbf than the ones that
> have been discovered so far.

If you don't provide any log, I simply have to ignore your report.
So there is only one real issue, which can be addressed easily by
the following patch:

https://marc.info/?l=linux-scsi=151223234607157=2

-- 
Ming


Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()

2017-12-04 Thread Bart Van Assche
On Tue, 2017-12-05 at 06:45 +0800, Ming Lei wrote:
> On Mon, Dec 04, 2017 at 03:09:20PM +, Bart Van Assche wrote:
> > On Sun, 2017-12-03 at 00:31 +0800, Ming Lei wrote:
> > > Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for 
> > > blk-mq")
> > 
> > It might be safer to revert commit 0df21c86bdbf instead of trying to fix all
> > issues introduced by that commit for kernel version v4.15 ...
> 
> What are all issues in v4.15-rc? Up to now, it is the only issue reported,
> and can be fixed by this simple patch, which one can be thought as cleanup
> too.

The three issues I described in the commit message of the patch that is
available at: https://marc.info/?l=linux-block=151240866010572.

Bart.

Re: [PATCH] blk-mq: Fix several SCSI request queue lockups

2017-12-04 Thread Bart Van Assche
On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote:
> On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> > * A systematic lockup for SCSI queues with queue depth 1. The
> >   following test reproduces that bug systematically:
> >   - Change the SRP initiator such that SCSI target queue depth is
> > limited to 1.
> >   - Run the following command:
> >   srp-test/run_tests -f xfs -d -e none -r 60 -t 01
> >   See also "[PATCH 4/7] blk-mq: Avoid that request processing
> >   stalls when sharing tags"
> >   (https://marc.info/?l=linux-block=151208695316857). Note:
> >   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
> >   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
> >   before all blk_mq_dispatch_rq_list() calls only fixes the
> >   systematic lockup for queue depth 1.
> 
> You are the only reproducer [ ... ]

That's not correct. I'm pretty sure if you try to reproduce this that
you will see the same hang I ran into. Does this mean that you have not
yet tried to reproduce the hang I reported?

> You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
> improve dispatching from sw queue")', but you don't mention any issue
> about that commit.

That's not correct either. From the commit message "A systematic lockup
for SCSI queues with queue depth 1."

> > I think the above means that it is too risky to try to fix all bugs
> > introduced by commit 0df21c86bdbf before kernel v4.15 is released.
> > Hence revert that commit.
> 
> What is the risk?

That more bugs were introduced by commit 0df21c86bdbf than the ones that
have been discovered so far.

Bart.

[PATCH v3 18/22] qla2xxx: Defer processing of GS IOCB calls

2017-12-04 Thread Himanshu Madhani
From: Giridhar Malavali 

This patch defers processing of GS IOCB calls from interrupt
context to avoid hardware spinlock recursion.

The following stack trace is seen:

? mod_timer+0x193/0x330
? ql_dbg+0xa7/0xf0 [qla2xxx]
_raw_spin_lock_irqsave+0x31/0x40
qla2x00_start_sp+0x3b/0x250 [qla2xxx]
qla24xx_async_gnl+0x1d3/0x240 [qla2xxx]
qla24xx_fcport_handle_login+0x285/0x290 [qla2xxx]
? vprintk_func+0x20/0x50

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Giridhar Malavali 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 7dd19785f820..57b8f43c5980 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -975,7 +975,7 @@ int qla24xx_fcport_handle_login(struct scsi_qla_host *vha, fc_port_t *fcport)
ql_dbg(ql_dbg_disc, vha, 0x20bd,
"%s %d %8phC post gnl\n",
__func__, __LINE__, fcport->port_name);
-   qla24xx_async_gnl(vha, fcport);
+   qla24xx_post_gnl_work(vha, fcport);
} else {
ql_dbg(ql_dbg_disc, vha, 0x20bf,
"%s %d %8phC post login\n",
@@ -1143,7 +1143,7 @@ void qla24xx_handle_relogin_event(scsi_qla_host_t *vha,
ql_dbg(ql_dbg_disc, vha, 0x20e9, "%s %d %8phC post gidpn\n",
__func__, __LINE__, fcport->port_name);
 
-   qla24xx_async_gidpn(vha, fcport);
+   qla24xx_post_gidpn_work(vha, fcport);
return;
}
 
-- 
2.12.0
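
The deferral idea in this patch can be sketched in user space as follows.
This is a minimal model under stated assumptions, not the driver's code:
toy_host, toy_start_sp, toy_post_work and toy_do_work are hypothetical
names, and a boolean flag stands in for a non-recursive spinlock. The
context that already holds the lock queues the request, and a worker
submits it later once the lock is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORK 8

struct toy_host {
    bool hw_lock_held;       /* models a non-recursive hardware lock */
    int work[TOY_MAX_WORK];  /* deferred request ids */
    size_t n_work;
    int submitted;
};

/* Submission needs the lock; it cannot proceed if the caller holds it. */
static bool toy_start_sp(struct toy_host *h, int id)
{
    (void)id;
    if (h->hw_lock_held)
        return false; /* a real spinlock would spin forever here */
    h->hw_lock_held = true;
    h->submitted++;
    h->hw_lock_held = false;
    return true;
}

/* Queue the request instead of submitting it in the current context. */
static bool toy_post_work(struct toy_host *h, int id)
{
    if (h->n_work == TOY_MAX_WORK)
        return false;
    h->work[h->n_work++] = id;
    return true;
}

/* Worker context: the lock is not held on entry, so submission is safe. */
static int toy_do_work(struct toy_host *h)
{
    int done = 0;

    assert(!h->hw_lock_held);
    for (size_t i = 0; i < h->n_work; i++)
        if (toy_start_sp(h, h->work[i]))
            done++;
    h->n_work = 0;
    return done;
}
```

Calling toy_start_sp() while hw_lock_held is set models the recursion seen
in the stack trace; calling toy_post_work() there and toy_do_work() after
the lock is released models the deferred path.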



[PATCH v3 17/22] qla2xxx: Clear loop id after delete

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

Clear the loop ID after delete to prevent session invalidation
of a stale session.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_target.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 1c219998ab60..0c0453f2ca9e 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -986,7 +986,7 @@ static void qlt_free_session_done(struct work_struct *work)
sess->send_els_logo = 0;
}
 
-   if (sess->logout_on_delete) {
+   if (sess->logout_on_delete && sess->loop_id != FC_NO_LOOP_ID) {
int rc;
 
rc = qla2x00_post_async_logout_work(vha, sess, NULL);
@@ -1045,8 +1045,7 @@ static void qlt_free_session_done(struct work_struct *work)
sess->login_succ = 0;
}
 
-   if (sess->chip_reset != ha->base_qpair->chip_reset)
-   qla2x00_clear_loop_id(sess);
+   qla2x00_clear_loop_id(sess);
 
if (sess->conflict) {
sess->conflict->login_pause = 0;
@@ -4600,9 +4599,9 @@ qlt_find_sess_invalidate_other(scsi_qla_host_t *vha, uint64_t wwn,
"Invalidating sess %p loop_id %d wwn %llx.\n",
other_sess, other_sess->loop_id, other_wwn);
 
-
other_sess->keep_nport_handle = 1;
-   *conflict_sess = other_sess;
+   if (other_sess->disc_state != DSC_DELETED)
+   *conflict_sess = other_sess;
qlt_schedule_sess_for_deletion(other_sess,
true);
}
-- 
2.12.0



[PATCH v3 21/22] qla2xxx: Fix memory leak in dual/target mode

2017-12-04 Thread Himanshu Madhani
When the driver is loaded in Target/Dual mode, it creates QPairs
to support MQ and allocates resources for each QPair. QPair
initialization is delayed until the FW personality is changed to
Dual/Target mode by issuing a chip reset. At the time of chip reset,
the firmware is re-initialized in the correct personality and all the
QPairs are initialized by sending MBC_INITIALIZE_MULTIQ (001Fh).

This patch fixes a memory leak by adding a check so that the
MBC_INITIALIZE_MULTIQ command is issued while deleting the rsp/req queue
only when the flag is set for initiator mode, and by cleaning up QPair
resources correctly during driver unload. This MBX does not need to be
issued for Target/Dual mode because the chip reset will reset the ISP.

Fixes: d65237c7f0860 ("scsi: qla2xxx: Fix mailbox failure while deleting Queue pairs")
Cc:  # 4.10+
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_init.c |  4 +---
 drivers/scsi/qla2xxx/qla_mid.c  | 18 ++
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 57b8f43c5980..58663df38627 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -8220,9 +8220,6 @@ int qla2xxx_delete_qpair(struct scsi_qla_host *vha, struct qla_qpair *qpair)
int ret = QLA_FUNCTION_FAILED;
struct qla_hw_data *ha = qpair->hw;
 
-   if (!vha->flags.qpairs_req_created && !vha->flags.qpairs_rsp_created)
-   goto fail;
-
qpair->delete_in_progress = 1;
while (atomic_read(&qpair->ref_count))
msleep(500);
@@ -8230,6 +8227,7 @@ int qla2xxx_delete_qpair(struct scsi_qla_host *vha, struct qla_qpair *qpair)
ret = qla25xx_delete_req_que(vha, qpair->req);
if (ret != QLA_SUCCESS)
goto fail;
+
ret = qla25xx_delete_rsp_que(vha, qpair->rsp);
if (ret != QLA_SUCCESS)
goto fail;
diff --git a/drivers/scsi/qla2xxx/qla_mid.c b/drivers/scsi/qla2xxx/qla_mid.c
index 618ca272d01a..e538e6308885 100644
--- a/drivers/scsi/qla2xxx/qla_mid.c
+++ b/drivers/scsi/qla2xxx/qla_mid.c
@@ -575,14 +575,15 @@ qla25xx_free_rsp_que(struct scsi_qla_host *vha, struct rsp_que *rsp)
 int
 qla25xx_delete_req_que(struct scsi_qla_host *vha, struct req_que *req)
 {
-   int ret = -1;
+   int ret = QLA_SUCCESS;
 
-   if (req) {
+   if (req && vha->flags.qpairs_req_created) {
req->options |= BIT_0;
ret = qla25xx_init_req_que(vha, req);
+   if (ret != QLA_SUCCESS)
+   return QLA_FUNCTION_FAILED;
}
-   if (ret == QLA_SUCCESS)
-   qla25xx_free_req_que(vha, req);
+   qla25xx_free_req_que(vha, req);
 
return ret;
 }
@@ -590,14 +591,15 @@ qla25xx_delete_req_que(struct scsi_qla_host *vha, struct req_que *req)
 int
 qla25xx_delete_rsp_que(struct scsi_qla_host *vha, struct rsp_que *rsp)
 {
-   int ret = -1;
+   int ret = QLA_SUCCESS;
 
-   if (rsp) {
+   if (rsp && vha->flags.qpairs_rsp_created) {
rsp->options |= BIT_0;
ret = qla25xx_init_rsp_que(vha, rsp);
+   if (ret != QLA_SUCCESS)
+   return QLA_FUNCTION_FAILED;
}
-   if (ret == QLA_SUCCESS)
-   qla25xx_free_rsp_que(vha, rsp);
+   qla25xx_free_rsp_que(vha, rsp);
 
return ret;
 }
-- 
2.12.0
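
The control flow introduced in qla25xx_delete_req_que() and
qla25xx_delete_rsp_que() can be sketched in user space as below. This is an
illustration under stated assumptions rather than the driver's code:
toy_qhost, toy_queue, toy_fw_delete and the TOY_* return codes are
hypothetical. The firmware command is sent only when the queue was actually
created in firmware, and the host-side memory is freed unless that command
fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { TOY_SUCCESS = 0, TOY_FAILED = 1 };

struct toy_queue {
    void *ring; /* host-side memory backing the queue */
};

struct toy_qhost {
    bool queues_created; /* firmware was told about the queue */
    bool fw_cmd_fails;   /* knob to simulate a failing firmware command */
    int fw_cmds_sent;
};

/* Stand-in for the mailbox command that tears the queue down in firmware. */
static int toy_fw_delete(struct toy_qhost *h)
{
    h->fw_cmds_sent++;
    return h->fw_cmd_fails ? TOY_FAILED : TOY_SUCCESS;
}

static void toy_free_queue(struct toy_queue *q)
{
    if (!q)
        return;
    free(q->ring);
    q->ring = NULL;
}

/* Skip the firmware command when it is not needed, but still free memory. */
static int toy_delete_queue(struct toy_qhost *h, struct toy_queue *q)
{
    int ret = TOY_SUCCESS;

    assert(h != NULL);
    if (q && h->queues_created) {
        ret = toy_fw_delete(h);
        if (ret != TOY_SUCCESS)
            return TOY_FAILED;
    }
    toy_free_queue(q);
    return ret;
}
```

With queues_created unset, toy_delete_queue() sends no firmware command and
still releases the ring, which is the case that previously leaked.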



[PATCH v3 20/22] qla2xxx: Fix system crash in qlt_plogi_ack_unref

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

Fix system crash due to NULL pointer access.

qlt_plogi_ack_t and fc_port structures were not properly
bound before calling qlt_plogi_ack_unref().

RIP: 0010:qlt_plogi_ack_unref+0xa1/0x150 [qla2xxx]
Call Trace:
qla24xx_create_new_sess+0xb1/0x320 [qla2xxx]
qla2x00_do_work+0x123/0x260 [qla2xxx]
qla2x00_iocb_work_fn+0x30/0x40 [qla2xxx]
process_one_work+0x1f3/0x530
worker_thread+0x4e/0x480
kthread+0x10c/0x140

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Giridhar Malavali 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_os.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 2ec77b9f78b8..789030c9dd26 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -4750,11 +4750,11 @@ void qla24xx_create_new_sess(struct scsi_qla_host *vha, struct qla_work_evt *e)
} else {
list_add_tail(&fcport->list, &vha->vp_fcports);
 
-   if (pla) {
-   qlt_plogi_ack_link(vha, pla, fcport,
-   QLT_PLOGI_LINK_SAME_WWN);
-   pla->ref_count--;
-   }
+   }
+   if (pla) {
+   qlt_plogi_ack_link(vha, pla, fcport,
+   QLT_PLOGI_LINK_SAME_WWN);
+   pla->ref_count--;
}
}
spin_unlock_irqrestore(&vha->hw->tgt.sess_lock, flags);
-- 
2.12.0



[PATCH v3 22/22] qla2xxx: Update driver version to 10.00.00.03-k

2017-12-04 Thread Himanshu Madhani
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_version.h b/drivers/scsi/qla2xxx/qla_version.h
index b6ec02b96d3d..911b82226d13 100644
--- a/drivers/scsi/qla2xxx/qla_version.h
+++ b/drivers/scsi/qla2xxx/qla_version.h
@@ -7,7 +7,7 @@
 /*
  * Driver version
  */
-#define QLA2XXX_VERSION  "10.00.00.02-k"
+#define QLA2XXX_VERSION  "10.00.00.03-k"
 
 #define QLA_DRIVER_MAJOR_VER   10
 #define QLA_DRIVER_MINOR_VER   0
-- 
2.12.0



[PATCH v3 19/22] qla2xxx: Remove aborting ELS IOCB call issued as part of timeout.

2017-12-04 Thread Himanshu Madhani
From: Giridhar Malavali 

This fixes the spinlock recursion issue seen while unloading the driver.

14 [9f2e21e03db8] native_queued_spin_lock_slowpath at ad0d8802
15 [9f2e21e03dc0] do_raw_spin_lock at ad0d99e4
16 [9f2e21e03dd8] _raw_spin_lock_irqsave at ad652471
17 [9f2e21e03e00] qla2x00_els_dcmd_iocb_timeout at c070cd63
18 [9f2e21e03e40] qla2x00_sp_timeout at c06f06d3 [qla2xxx]
19 [9f2e21e03e68] call_timer_fn at ad0f97d8
20 [9f2e21e03ed8] run_timer_softirq at ad0faf47
21 [9f2e21e03f68] __softirqentry_text_start at ad655f32

Fixes: 6eb54715b54bb ("qla2xxx: Added interface to send explicit LOGO.")
Cc:  # 4.10+
Signed-off-by: Giridhar Malavali 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_iocb.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index 106f4ac4f733..8ea59586f4f1 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -2392,7 +2392,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data)
srb_t *sp = data;
fc_port_t *fcport = sp->fcport;
struct scsi_qla_host *vha = sp->vha;
-   struct qla_hw_data *ha = vha->hw;
struct srb_iocb *lio = &sp->u.iocb_cmd;
 
ql_dbg(ql_dbg_io, vha, 0x3069,
@@ -2400,15 +2399,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data)
sp->name, sp->handle, fcport->d_id.b.domain, fcport->d_id.b.area,
fcport->d_id.b.al_pa);
 
-   /* Abort the exchange */
-   if (ha->isp_ops->abort_command(sp)) {
-   ql_dbg(ql_dbg_io, vha, 0x3070,
-   "mbx abort_command failed.\n");
-   } else {
-   ql_dbg(ql_dbg_io, vha, 0x3071,
-   "mbx abort_command success.\n");
-   }
-
complete(&lio->u.els_logo.comp);
 }
 
-- 
2.12.0



[PATCH v3 13/22] qla2xxx: Fix PRLI state check

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

The Get Port Database MBX cmd is used to validate the current login state
upon PRLI completion. The current code also looks at the last login state
for re-validation, which is incorrect. This patch removes the incorrect
state check.

Fixes: 15f30a5752287 ("qla2xxx: Use IOCB interface to submit non-critical MBX.")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_mbx.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
index cb717d47339f..e2b5fa47bb57 100644
--- a/drivers/scsi/qla2xxx/qla_mbx.c
+++ b/drivers/scsi/qla2xxx/qla_mbx.c
@@ -6160,8 +6160,7 @@ int __qla24xx_parse_gpdb(struct scsi_qla_host *vha, fc_port_t *fcport,
}
 
/* Check for logged in state. */
-   if (current_login_state != PDS_PRLI_COMPLETE &&
-   last_login_state != PDS_PRLI_COMPLETE) {
+   if (current_login_state != PDS_PRLI_COMPLETE) {
ql_dbg(ql_dbg_mbx, vha, 0x119a,
"Unable to verify login-state (%x/%x) for loop_id %x.\n",
current_login_state, last_login_state, fcport->loop_id);
-- 
2.12.0
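
The difference between the old and new conditions can be shown with a small
stand-alone sketch. The struct toy_gpdb and the TOY_* state codes are
hypothetical and do not reflect the driver's actual values; only the shape
of the two boolean checks is taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state codes for illustration only. */
enum toy_login_state { TOY_PLOGI_PENDING = 3, TOY_PRLI_COMPLETE = 6 };

struct toy_gpdb {
    int current_login_state;
    int last_login_state;
};

/* Old check: accepted if either current or last state is PRLI complete. */
static bool toy_login_ok_old(const struct toy_gpdb *pd)
{
    assert(pd != NULL);
    return !(pd->current_login_state != TOY_PRLI_COMPLETE &&
             pd->last_login_state != TOY_PRLI_COMPLETE);
}

/* New check: only the current state decides. */
static bool toy_login_ok_new(const struct toy_gpdb *pd)
{
    assert(pd != NULL);
    return pd->current_login_state == TOY_PRLI_COMPLETE;
}
```

A port whose last state was PRLI complete but whose current state is not
passes the old check and fails the new one, which is the stale case the
commit message describes.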



[PATCH v3 10/22] qla2xxx: Relogin to target port on a cable swap

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

If the user swaps one target port for another target port on the same
switch port, the new target port is not recognized by the
driver. The current code assumes that the old target port has recovered
from link down. The fix asks the switch for the WWPN of a
specific NportID (GPNID) rather than assuming it is the same target
port which has come back.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_gs.c | 166 +-
 drivers/scsi/qla2xxx/qla_init.c   |   6 +-
 drivers/scsi/qla2xxx/qla_os.c |  35 +++-
 drivers/scsi/qla2xxx/qla_target.c |  35 ++--
 4 files changed, 195 insertions(+), 47 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index 59ecc4eda6cd..7d715e58901f 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3171,43 +3171,136 @@ void qla24xx_async_gpnid_done(scsi_qla_host_t *vha, srb_t *sp)
 
 void qla24xx_handle_gpnid_event(scsi_qla_host_t *vha, struct event_arg *ea)
 {
-   fc_port_t *fcport;
-   unsigned long flags;
+   fc_port_t *fcport, *conflict, *t;
 
-   spin_lock_irqsave(&vha->hw->tgt.sess_lock, flags);
-   fcport = qla2x00_find_fcport_by_wwpn(vha, ea->port_name, 1);
-   spin_unlock_irqrestore(&vha->hw->tgt.sess_lock, flags);
+   ql_dbg(ql_dbg_disc, vha, 0x,
+   "%s %d port_id: %06x\n",
+   __func__, __LINE__, ea->id.b24);
 
-   if (fcport) {
-   /* cable moved. just plugged in */
-   fcport->rscn_gen++;
-   fcport->d_id = ea->id;
-   fcport->scan_state = QLA_FCPORT_FOUND;
-   fcport->flags |= FCF_FABRIC_DEVICE;
-
-   switch (fcport->disc_state) {
-   case DSC_DELETED:
-   ql_dbg(ql_dbg_disc, vha, 0x210d,
-   "%s %d %8phC login\n", __func__, __LINE__,
-   fcport->port_name);
-   qla24xx_fcport_handle_login(vha, fcport);
-   break;
-   case DSC_DELETE_PEND:
-   break;
-   default:
-   ql_dbg(ql_dbg_disc, vha, 0x2064,
-   "%s %d %8phC post del sess\n",
-   __func__, __LINE__, fcport->port_name);
-   qlt_schedule_sess_for_deletion_lock(fcport);
-   break;
+   if (ea->rc) {
+   /* cable is disconnected */
+   list_for_each_entry_safe(fcport, t, &vha->vp_fcports, list) {
+   if (fcport->d_id.b24 == ea->id.b24) {
+   ql_dbg(ql_dbg_disc, vha, 0x,
+   "%s %d %8phC DS %d\n",
+   __func__, __LINE__,
+   fcport->port_name,
+   fcport->disc_state);
+   fcport->scan_state = QLA_FCPORT_SCAN;
+   switch (fcport->disc_state) {
+   case DSC_DELETED:
+   case DSC_DELETE_PEND:
+   break;
+   default:
+   ql_dbg(ql_dbg_disc, vha, 0x,
+   "%s %d %8phC post del sess\n",
+   __func__, __LINE__,
+   fcport->port_name);
+   qlt_schedule_sess_for_deletion_lock
+   (fcport);
+   break;
+   }
+   }
}
} else {
-   /* create new fcport */
-   ql_dbg(ql_dbg_disc, vha, 0x2065,
-   "%s %d %8phC post new sess\n",
-   __func__, __LINE__, ea->port_name);
+   /* cable is connected */
+   fcport = qla2x00_find_fcport_by_wwpn(vha, ea->port_name, 1);
+   if (fcport) {
+   list_for_each_entry_safe(conflict, t, &vha->vp_fcports,
+   list) {
+   if ((conflict->d_id.b24 == ea->id.b24) &&
+   (fcport != conflict)) {
+   /* 2 fcports with conflict Nport ID or
+* an existing fcport is having nport ID
+* conflict with new fcport.
+*/
+
+   ql_dbg(ql_dbg_disc, vha, 0x,

[PATCH v3 12/22] qla2xxx: Clear send ELS LOGO flag after target re-login

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

This patch fixes clearing out els_send_logo flag at the
time of session deletion.

Fixes: 3515832cc614 ("scsi: qla2xxx: Reset the logo flag, after target re-login.")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_target.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 283ff316e4b2..e824cdc77139 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -983,6 +983,7 @@ static void qlt_free_session_done(struct work_struct *work)
logo.id = sess->d_id;
logo.cmd_count = 0;
qlt_send_first_logo(vha, &logo);
+   sess->send_els_logo = 0;
}
 
if (sess->logout_on_delete) {
-- 
2.12.0



[PATCH v3 07/22] qla2xxx: Serialize GPNID for multiple RSCN

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

GPNID is triggered by RSCN. For multiple RSCNs of the same
affected NPORT ID, serialize the GPNID to prevent confusion.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
---
 drivers/scsi/qla2xxx/qla_def.h | 48 +++---
 drivers/scsi/qla2xxx/qla_gs.c  | 35 +-
 drivers/scsi/qla2xxx/qla_isr.c |  2 +-
 drivers/scsi/qla2xxx/qla_os.c  |  1 +
 4 files changed, 58 insertions(+), 28 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index 01a9b8971e88..d9b4a0651a0f 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -315,6 +315,29 @@ struct srb_cmd {
 /* To identify if a srb is of T10-CRC type. @sp => srb_t pointer */
 #define IS_PROT_IO(sp) (sp->flags & SRB_CRC_CTX_DSD_VALID)
 
+/*
+ * 24 bit port ID type definition.
+ */
+typedef union {
+   uint32_t b24 : 24;
+
+   struct {
+#ifdef __BIG_ENDIAN
+   uint8_t domain;
+   uint8_t area;
+   uint8_t al_pa;
+#elif defined(__LITTLE_ENDIAN)
+   uint8_t al_pa;
+   uint8_t area;
+   uint8_t domain;
+#else
+#error "__BIG_ENDIAN or __LITTLE_ENDIAN must be defined!"
+#endif
+   uint8_t rsvd_1;
+   } b;
+} port_id_t;
+#define INVALID_PORT_ID0xFF
+
 struct els_logo_payload {
uint8_t opcode;
uint8_t rsvd[3];
@@ -338,6 +361,7 @@ struct ct_arg {
u32 rsp_size;
void*req;
void*rsp;
+   port_id_t   id;
 };
 
 /*
@@ -499,6 +523,7 @@ typedef struct srb {
const char *name;
int iocbs;
struct qla_qpair *qpair;
+   struct list_head elem;
u32 gen1;   /* scratch */
u32 gen2;   /* scratch */
union {
@@ -2164,28 +2189,6 @@ struct imm_ntfy_from_isp {
 #define REQUEST_ENTRY_SIZE (sizeof(request_t))
 
 
-/*
- * 24 bit port ID type definition.
- */
-typedef union {
-   uint32_t b24 : 24;
-
-   struct {
-#ifdef __BIG_ENDIAN
-   uint8_t domain;
-   uint8_t area;
-   uint8_t al_pa;
-#elif defined(__LITTLE_ENDIAN)
-   uint8_t al_pa;
-   uint8_t area;
-   uint8_t domain;
-#else
-#error "__BIG_ENDIAN or __LITTLE_ENDIAN must be defined!"
-#endif
-   uint8_t rsvd_1;
-   } b;
-} port_id_t;
-#define INVALID_PORT_ID0xFF
 
 /*
  * Switch info gathering structure.
@@ -4252,6 +4255,7 @@ typedef struct scsi_qla_host {
uint8_t n2n_node_name[WWN_SIZE];
uint8_t n2n_port_name[WWN_SIZE];
uint16_tn2n_id;
+   struct list_head gpnid_list;
 } scsi_qla_host_t;
 
 struct qla27xx_image_status {
diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index ea1b562ebc8a..59ecc4eda6cd 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3221,16 +3221,17 @@ static void qla2x00_async_gpnid_sp_done(void *s, int 
res)
(struct ct_sns_rsp *)sp->u.iocb_cmd.u.ctarg.rsp;
struct event_arg ea;
struct qla_work_evt *e;
+   unsigned long flags;
 
if (res)
ql_dbg(ql_dbg_disc, vha, 0x2066,
-   "Async done-%s fail res %x ID %3phC. %8phC\n",
-   sp->name, res, ct_req->req.port_id.port_id,
+   "Async done-%s fail res %x rscn gen %d ID %3phC. %8phC\n",
+   sp->name, res, sp->gen1, ct_req->req.port_id.port_id,
ct_rsp->rsp.gpn_id.port_name);
else
ql_dbg(ql_dbg_disc, vha, 0x2066,
-   "Async done-%s good ID %3phC. %8phC\n",
-   sp->name, ct_req->req.port_id.port_id,
+   "Async done-%s good rscn gen %d ID %3phC. %8phC\n",
+   sp->name, sp->gen1, ct_req->req.port_id.port_id,
ct_rsp->rsp.gpn_id.port_name);
 
memset(&ea, 0, sizeof(ea));
@@ -3242,11 +3243,20 @@ static void qla2x00_async_gpnid_sp_done(void *s, int 
res)
ea.rc = res;
ea.event = FCME_GPNID_DONE;
 
+   spin_lock_irqsave(&vha->hw->tgt.sess_lock, flags);
+   list_del(&sp->elem);
+   spin_unlock_irqrestore(&vha->hw->tgt.sess_lock, flags);
+
if (res) {
if (res == QLA_FUNCTION_TIMEOUT)
qla24xx_post_gpnid_work(sp->vha, );
sp->free(sp);
return;
+   } else if (sp->gen1) {
+   /* There was another RSCN for this Nport ID */
+   qla24xx_post_gpnid_work(sp->vha, );
+   sp->free(sp);
+   return;
}
 
qla2x00_fcport_event_handler(vha, &ea);
@@ -3282,8 +3292,9 @@ int qla24xx_async_gpnid(scsi_qla_host_t *vha, port_id_t 

[PATCH v3 14/22] qla2xxx: Fix abort command deadlock due to spinlock

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

The original code acquires hardware_lock before adding the Abort IOCB
to the driver request queue for processing. However,
abort_command() also acquires hardware_lock to look up the
sp pointer before issuing the Abort IOCB, which results
in a deadlock. This patch removes the possible deadlock
by dropping the extra spinlock acquisition.
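
The self-deadlock described above can be modelled in userspace. This is
only a sketch, not driver code: toy_lock and every toy_* function are
hypothetical stand-ins, and a failed toy_trylock() marks the point where
a real non-recursive spinlock would spin forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock (hypothetical). A second acquisition fails
 * here; a real spinlock would spin forever instead. */
struct toy_lock {
	bool held;
};

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Stand-in for abort_command(): takes the lock itself to look up sp. */
static bool toy_abort_command(struct toy_lock *hw_lock)
{
	if (!toy_trylock(hw_lock))
		return false;	/* already held by our caller: deadlock */
	/* ... look up sp and queue the abort IOCB ... */
	toy_unlock(hw_lock);
	return true;
}

/* Old timeout path: holds the lock across the abort_command() call. */
static bool toy_timeout_old(struct toy_lock *hw_lock)
{
	bool ok;

	if (!toy_trylock(hw_lock))
		return false;
	ok = toy_abort_command(hw_lock);
	toy_unlock(hw_lock);
	return ok;
}

/* New timeout path: lets abort_command() do its own locking. */
static bool toy_timeout_new(struct toy_lock *hw_lock)
{
	return toy_abort_command(hw_lock);
}
```

With a fresh lock, toy_timeout_old() reports failure because the callee
cannot take a lock its caller already holds, while toy_timeout_new()
succeeds.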

Fixes: 6eb54715b54bb ("qla2xxx: Added interface to send explicit LOGO.")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_iocb.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index d810a447cb4a..106f4ac4f733 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -2394,7 +2394,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data)
struct scsi_qla_host *vha = sp->vha;
struct qla_hw_data *ha = vha->hw;
struct srb_iocb *lio = &sp->u.iocb_cmd;
-   unsigned long flags = 0;
 
ql_dbg(ql_dbg_io, vha, 0x3069,
"%s Timeout, hdl=%x, portid=%02x%02x%02x\n",
@@ -2402,7 +2401,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data)
fcport->d_id.b.al_pa);
 
/* Abort the exchange */
-   spin_lock_irqsave(&ha->hardware_lock, flags);
if (ha->isp_ops->abort_command(sp)) {
ql_dbg(ql_dbg_io, vha, 0x3070,
"mbx abort_command failed.\n");
@@ -2410,7 +2408,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data)
ql_dbg(ql_dbg_io, vha, 0x3071,
"mbx abort_command success.\n");
}
-   spin_unlock_irqrestore(&ha->hardware_lock, flags);
 
complete(&lio->u.els_logo.comp);
 }
-- 
2.12.0



[PATCH v3 09/22] qla2xxx: Fix NPIV host cleanup in target mode

2017-12-04 Thread Himanshu Madhani
From: Sawan Chandak 

Add a check to make sure the global target host list is cleaned up
only for NPIV hosts.

Fixes: bdbe24de281e2 ("scsi: qla2xxx: Cleanup NPIV host in target mode during config teardown")
Cc:  # 4.10+
Signed-off-by: Sawan Chandak 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_target.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 924d58f5408f..1bec8aebb7b6 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1561,8 +1561,11 @@ static void qlt_release(struct qla_tgt *tgt)
 
btree_destroy64(&tgt->lun_qpair_map);
 
-   if (ha->tgt.tgt_ops && ha->tgt.tgt_ops->remove_target)
-   ha->tgt.tgt_ops->remove_target(vha);
+   if (vha->vp_idx)
+   if (ha->tgt.tgt_ops &&
+   ha->tgt.tgt_ops->remove_target &&
+   vha->vha_tgt.target_lport_ptr)
+   ha->tgt.tgt_ops->remove_target(vha);
 
vha->vha_tgt.qla_tgt = NULL;
 
-- 
2.12.0



[PATCH v3 16/22] qla2xxx: Fix scan state field for fcport

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

Set the scan_state field to the correct value so that it reflects
the state of the FC port.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_target.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 2a6242d97a7e..1c219998ab60 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -5812,6 +5812,7 @@ static fc_port_t *qlt_get_port_database(struct 
scsi_qla_host *vha,
tfcp->port_type = fcport->port_type;
tfcp->supported_classes = fcport->supported_classes;
tfcp->flags |= fcport->flags;
+   tfcp->scan_state = QLA_FCPORT_FOUND;
 
del = fcport;
fcport = tfcp;
-- 
2.12.0



[PATCH v3 15/22] qla2xxx: Replace fcport alloc with qla2x00_alloc_fcport

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

The current code manually allocates an fcport structure that
is not properly initialized. Replace kzalloc with
qla2x00_alloc_fcport so that all fields are initialized.
Also set the scan flag to port found.

Cc: 
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_target.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index e824cdc77139..2a6242d97a7e 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -5783,7 +5783,7 @@ static fc_port_t *qlt_get_port_database(struct 
scsi_qla_host *vha,
unsigned long flags;
u8 newfcport = 0;
 
-   fcport = kzalloc(sizeof(*fcport), GFP_KERNEL);
+   fcport = qla2x00_alloc_fcport(vha, GFP_KERNEL);
if (!fcport) {
ql_dbg(ql_dbg_tgt_mgt, vha, 0xf06f,
"qla_target(%d): Allocation of tmp FC port failed",
-- 
2.12.0



[PATCH v3 11/22] qla2xxx: Fix Relogin being triggered too fast

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

The current driver design schedules the relogin process via the DPC
thread every 1 second. In a large fabric, the DPC thread tries to
schedule too many jobs and might get overloaded. As a side effect of
how the DPC thread processes this work, it can schedule a relogin
less than 1 second after the previous one.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_def.h |  1 +
 drivers/scsi/qla2xxx/qla_mid.c | 24 +++-
 drivers/scsi/qla2xxx/qla_os.c  | 22 ++
 3 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index d9b4a0651a0f..93ff92e2363f 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -4110,6 +4110,7 @@ typedef struct scsi_qla_host {
 #define LOOP_READY 5
 #define LOOP_DEAD  6
 
+   unsigned long   relogin_jif;
unsigned long   dpc_flags;
 #define RESET_MARKER_NEEDED0   /* Send marker to ISP. */
 #define RESET_ACTIVE   1
diff --git a/drivers/scsi/qla2xxx/qla_mid.c b/drivers/scsi/qla2xxx/qla_mid.c
index bd9f14bf7ac2..618ca272d01a 100644
--- a/drivers/scsi/qla2xxx/qla_mid.c
+++ b/drivers/scsi/qla2xxx/qla_mid.c
@@ -343,15 +343,21 @@ qla2x00_do_dpc_vp(scsi_qla_host_t *vha)
"FCPort update end.\n");
}
 
-   if ((test_and_clear_bit(RELOGIN_NEEDED, &vha->dpc_flags)) &&
-   !test_bit(LOOP_RESYNC_NEEDED, &vha->dpc_flags) &&
-   atomic_read(&vha->loop_state) != LOOP_DOWN) {
-
-   ql_dbg(ql_dbg_dpc, vha, 0x4018,
-   "Relogin needed scheduled.\n");
-   qla2x00_relogin(vha);
-   ql_dbg(ql_dbg_dpc, vha, 0x4019,
-   "Relogin needed end.\n");
+   if (test_bit(RELOGIN_NEEDED, &vha->dpc_flags) &&
+   !test_bit(LOOP_RESYNC_NEEDED, &vha->dpc_flags) &&
+   atomic_read(&vha->loop_state) != LOOP_DOWN) {
+
+   if (!vha->relogin_jif ||
+   time_after_eq(jiffies, vha->relogin_jif)) {
+   vha->relogin_jif = jiffies + HZ;
+   clear_bit(RELOGIN_NEEDED, &vha->dpc_flags);
+
+   ql_dbg(ql_dbg_dpc, vha, 0x4018,
+   "Relogin needed scheduled.\n");
+   qla2x00_relogin(vha);
+   ql_dbg(ql_dbg_dpc, vha, 0x4019,
+   "Relogin needed end.\n");
+   }
}
 
if (test_and_clear_bit(RESET_MARKER_NEEDED, &vha->dpc_flags) &&
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 820d1c185beb..2ec77b9f78b8 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -4905,7 +4905,7 @@ void qla2x00_relogin(struct scsi_qla_host *vha)
 */
if (atomic_read(&fcport->state) != FCS_ONLINE &&
fcport->login_retry && !(fcport->flags & FCF_ASYNC_SENT)) {
-   fcport->login_retry--;
+
if (fcport->flags & FCF_FABRIC_DEVICE) {
ql_dbg(ql_dbg_disc, fcport->vha, 0x2108,
"%s %8phC DS %d LS %d\n", __func__,
@@ -4916,6 +4916,7 @@ void qla2x00_relogin(struct scsi_qla_host *vha)
ea.fcport = fcport;
qla2x00_fcport_event_handler(vha, &ea);
} else {
+   fcport->login_retry--;
status = qla2x00_local_device_login(vha,
fcport);
if (status == QLA_SUCCESS) {
@@ -5898,16 +5899,21 @@ qla2x00_do_dpc(void *data)
}
 
/* Retry each device up to login retry count */
-   if ((test_and_clear_bit(RELOGIN_NEEDED,
-   &base_vha->dpc_flags)) &&
+   if (test_bit(RELOGIN_NEEDED, &base_vha->dpc_flags) &&
!test_bit(LOOP_RESYNC_NEEDED, &base_vha->dpc_flags) &&
atomic_read(&base_vha->loop_state) != LOOP_DOWN) {
 
-   ql_dbg(ql_dbg_dpc, base_vha, 0x400d,
-   "Relogin scheduled.\n");
-   qla2x00_relogin(base_vha);
-   ql_dbg(ql_dbg_dpc, base_vha, 0x400e,
-   "Relogin end.\n");
+   if (!base_vha->relogin_jif ||
+   time_after_eq(jiffies, base_vha->relogin_jif)) {
+   base_vha->relogin_jif = jiffies + HZ;
+   clear_bit(RELOGIN_NEEDED, &base_vha->dpc_flags);
+
+   ql_dbg(ql_dbg_dpc, base_vha, 0x400d,
+

[PATCH v3 08/22] qla2xxx: Fix login state machine stuck at GPDB

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

This patch returns the discovery state machine to the
Login Complete state.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_init.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index be4c67b465b8..2f246996d3e2 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -863,6 +863,7 @@ void qla24xx_handle_gpdb_event(scsi_qla_host_t *vha, struct 
event_arg *ea)
int rval = ea->rc;
fc_port_t *fcport = ea->fcport;
unsigned long flags;
+   u16 opt = ea->sp->u.iocb_cmd.u.mbx.out_mb[10];
 
fcport->flags &= ~FCF_ASYNC_SENT;
 
@@ -893,7 +894,8 @@ void qla24xx_handle_gpdb_event(scsi_qla_host_t *vha, struct 
event_arg *ea)
}
 
spin_lock_irqsave(&vha->hw->tgt.sess_lock, flags);
-   ea->fcport->login_gen++;
+   if (opt != PDO_FORCE_ADISC)
+   ea->fcport->login_gen++;
ea->fcport->deleted = 0;
ea->fcport->logout_on_delete = 1;
 
@@ -917,6 +919,13 @@ void qla24xx_handle_gpdb_event(scsi_qla_host_t *vha, 
struct event_arg *ea)
 
qla24xx_post_gpsc_work(vha, fcport);
}
+   } else if (ea->fcport->login_succ) {
+   /*
+* We have an existing session. A late RSCN delivery
+* must have triggered the session to be re-validated.
+* The session is still valid.
+*/
+   fcport->disc_state = DSC_LOGIN_COMPLETE;
}
spin_unlock_irqrestore(&vha->hw->tgt.sess_lock, flags);
 } /* gpdb event */
-- 
2.12.0



[PATCH v3 05/22] qla2xxx: Fix re-login for Nport Handle in use

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

When an NPort Handle is in use, the driver needs to mark the handle
as used and pick another. Instead, the code clears the handle
and re-picks the same handle.
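
The difference can be sketched with a plain bitmap allocator. This is
an illustration only: the toy_* names, the 64-entry map and the
lowest-free-bit policy are assumptions, not the driver's loop_id_map
code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NO_HANDLE	(-1)
#define TOY_MAX_HANDLES	64

/* Claim the lowest free handle in the bitmap, or TOY_NO_HANDLE. */
static int toy_pick_handle(uint64_t *map)
{
	for (int i = 0; i < TOY_MAX_HANDLES; i++) {
		if (!(*map & (UINT64_C(1) << i))) {
			*map |= UINT64_C(1) << i;
			return i;
		}
	}
	return TOY_NO_HANDLE;
}

/* Old behaviour: clear the busy handle, then pick again. The handle
 * that was just rejected is free again, so it is picked once more. */
static int toy_repick_old(uint64_t *map, int busy)
{
	assert(busy >= 0 && busy < TOY_MAX_HANDLES);
	*map &= ~(UINT64_C(1) << busy);
	return toy_pick_handle(map);
}

/* New behaviour: keep the busy handle marked as used, then pick. */
static int toy_repick_new(uint64_t *map, int busy)
{
	assert(busy >= 0 && busy < TOY_MAX_HANDLES);
	*map |= UINT64_C(1) << busy;
	return toy_pick_handle(map);
}
```

Starting from an empty map, toy_repick_old() keeps handing back handle
0, whereas toy_repick_new() moves on to handle 1.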

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
---
 drivers/scsi/qla2xxx/qla_gs.c   | 16 ++-
 drivers/scsi/qla2xxx/qla_init.c | 44 +
 drivers/scsi/qla2xxx/qla_isr.c  |  5 -
 3 files changed, 51 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index ddc69d36877e..8984f857bb34 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -2833,7 +2833,7 @@ void qla24xx_handle_gidpn_event(scsi_qla_host_t *vha, 
struct event_arg *ea)
}
} else { /* fcport->d_id.b24 != ea->id.b24 */
fcport->d_id.b24 = ea->id.b24;
-   if (fcport->deleted == QLA_SESS_DELETED) {
+   if (fcport->deleted != QLA_SESS_DELETED) {
ql_dbg(ql_dbg_disc, vha, 0x2021,
"%s %d %8phC post del sess\n",
__func__, __LINE__, 
fcport->port_name);
@@ -3206,10 +3206,16 @@ static void qla2x00_async_gpnid_sp_done(void *s, int 
res)
struct event_arg ea;
struct qla_work_evt *e;
 
-   ql_dbg(ql_dbg_disc, vha, 0x2066,
-   "Async done-%s res %x ID %3phC. %8phC\n",
-   sp->name, res, ct_req->req.port_id.port_id,
-   ct_rsp->rsp.gpn_id.port_name);
+   if (res)
+   ql_dbg(ql_dbg_disc, vha, 0x2066,
+   "Async done-%s fail res %x ID %3phC. %8phC\n",
+   sp->name, res, ct_req->req.port_id.port_id,
+   ct_rsp->rsp.gpn_id.port_name);
+   else
+   ql_dbg(ql_dbg_disc, vha, 0x2066,
+   "Async done-%s good ID %3phC. %8phC\n",
+   sp->name, ct_req->req.port_id.port_id,
+   ct_rsp->rsp.gpn_id.port_name);
 
if (res) {
sp->free(sp);
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 1bafa043f9f1..be4c67b465b8 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -1452,6 +1452,8 @@ static void
 qla24xx_handle_plogi_done_event(struct scsi_qla_host *vha, struct event_arg 
*ea)
 {
port_id_t cid;  /* conflict Nport id */
+   u16 lid;
+   struct fc_port *conflict_fcport;
 
switch (ea->data[0]) {
case MBS_COMMAND_COMPLETE:
@@ -1467,8 +1469,12 @@ qla24xx_handle_plogi_done_event(struct scsi_qla_host 
*vha, struct event_arg *ea)
qla24xx_post_prli_work(vha, ea->fcport);
} else {
ql_dbg(ql_dbg_disc, vha, 0x20ea,
-   "%s %d %8phC post gpdb\n",
-   __func__, __LINE__, ea->fcport->port_name);
+   "%s %d %8phC LoopID 0x%x in use with %06x. post 
gnl\n",
+   __func__, __LINE__, ea->fcport->port_name,
+   ea->fcport->loop_id, ea->fcport->d_id.b24);
+
+   set_bit(ea->fcport->loop_id, vha->hw->loop_id_map);
+   ea->fcport->loop_id = FC_NO_LOOP_ID;
ea->fcport->chip_reset = 
vha->hw->base_qpair->chip_reset;
ea->fcport->logout_on_delete = 1;
ea->fcport->send_els_logo = 0;
@@ -1513,8 +1519,38 @@ qla24xx_handle_plogi_done_event(struct scsi_qla_host 
*vha, struct event_arg *ea)
ea->fcport->d_id.b.domain, ea->fcport->d_id.b.area,
ea->fcport->d_id.b.al_pa);
 
-   qla2x00_clear_loop_id(ea->fcport);
-   qla24xx_post_gidpn_work(vha, ea->fcport);
+   lid = ea->iop[1] & 0xffff;
+   qlt_find_sess_invalidate_other(vha,
+   wwn_to_u64(ea->fcport->port_name),
+   ea->fcport->d_id, lid, &conflict_fcport);
+
+   if (conflict_fcport) {
+   /*
+* Another fcport share the same loop_id/nport id.
+* Conflict fcport needs to finish cleanup before this
+* fcport can proceed to login.
+*/
+   conflict_fcport->conflict = ea->fcport;
+   ea->fcport->login_pause = 1;
+
+   ql_dbg(ql_dbg_disc, vha, 0x20ed,
+   "%s %d %8phC NPortId %06x inuse with loopid 0x%x. 
post gidpn\n",
+   __func__, __LINE__, ea->fcport->port_name,
+ 

[PATCH v3 06/22] qla2xxx: Retry switch command on time out

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

Retry the GID_PN and GPN_ID switch commands when they time out.
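
A rough sketch of the decision the GPN_ID completion path makes after
this patch. The enum values and the function name are hypothetical and
do not correspond to the driver's QLA_* status codes.

```c
#include <assert.h>

/* Hypothetical completion results and follow-up actions. */
enum toy_result { TOY_OK = 0, TOY_ERROR = 1, TOY_TIMEOUT = 2 };
enum toy_action { TOY_CONTINUE, TOY_RETRY, TOY_DROP };

/* Decide what to do with a completed switch command. */
static enum toy_action toy_gpnid_done_action(enum toy_result res)
{
	switch (res) {
	case TOY_OK:
		return TOY_CONTINUE;	/* hand the reply to the state machine */
	case TOY_TIMEOUT:
		return TOY_RETRY;	/* post the same query again */
	default:
		return TOY_DROP;	/* any other failure: free and stop */
	}
}
```

Only the timeout case is reposted. Other failures are still dropped, as
they were before this patch.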

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_gs.c | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index 8984f857bb34..ea1b562ebc8a 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -175,6 +175,9 @@ qla2x00_chk_ms_status(scsi_qla_host_t *vha, ms_iocb_entry_t 
*ms_pkt,
set_bit(LOCAL_LOOP_UPDATE, &vha->dpc_flags);
}
break;
+   case CS_TIMEOUT:
+   rval = QLA_FUNCTION_TIMEOUT;
+   /* drop through */
default:
ql_dbg(ql_dbg_disc, vha, 0x2033,
"%s failed, completion status (%x) on port_id: "
@@ -2889,9 +2892,22 @@ static void qla2x00_async_gidpn_sp_done(void *s, int res)
ea.rc = res;
ea.event = FCME_GIDPN_DONE;
 
-   ql_dbg(ql_dbg_disc, vha, 0x204f,
-   "Async done-%s res %x, WWPN %8phC ID %3phC \n",
-   sp->name, res, fcport->port_name, id);
+   if (res == QLA_FUNCTION_TIMEOUT) {
+   ql_dbg(ql_dbg_disc, sp->vha, 0xffff,
+   "Async done-%s WWPN %8phC timed out.\n",
+   sp->name, fcport->port_name);
+   qla24xx_post_gidpn_work(sp->vha, fcport);
+   sp->free(sp);
+   return;
+   } else if (res) {
+   ql_dbg(ql_dbg_disc, sp->vha, 0xffff,
+   "Async done-%s fail res %x, WWPN %8phC\n",
+   sp->name, res, fcport->port_name);
+   } else {
+   ql_dbg(ql_dbg_disc, vha, 0x204f,
+   "Async done-%s good WWPN %8phC ID %3phC\n",
+   sp->name, fcport->port_name, id);
+   }
 
qla2x00_fcport_event_handler(vha, &ea);
 
@@ -3217,11 +3233,6 @@ static void qla2x00_async_gpnid_sp_done(void *s, int res)
sp->name, ct_req->req.port_id.port_id,
ct_rsp->rsp.gpn_id.port_name);
 
-   if (res) {
-   sp->free(sp);
-   return;
-   }
-
memset(&ea, 0, sizeof(ea));
memcpy(ea.port_name, ct_rsp->rsp.gpn_id.port_name, WWN_SIZE);
ea.sp = sp;
@@ -3231,6 +3242,13 @@ static void qla2x00_async_gpnid_sp_done(void *s, int res)
ea.rc = res;
ea.event = FCME_GPNID_DONE;
 
+   if (res) {
+   if (res == QLA_FUNCTION_TIMEOUT)
+   qla24xx_post_gpnid_work(sp->vha, );
+   sp->free(sp);
+   return;
+   }
+
qla2x00_fcport_event_handler(vha, &ea);
 
e = qla2x00_alloc_work(vha, QLA_EVT_GPNID_DONE);
-- 
2.12.0



[PATCH v3 04/22] qla2xxx: Skip IRQ affinity for Target QPairs

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

Fix co-existence between Block MQ and Target Mode. Block MQ
and initiator mode require the midlayer queue mapping to take
IRQ affinity into account. For target mode, that is not the case.

Fixes: 09620eeb62c41 ("scsi: qla2xxx: Add debug knob for user control workload")
Cc:  # 4.12+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_os.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index dfbf82e716b0..428e1bfaa83b 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -6609,9 +6609,14 @@ qla83xx_disable_laser(scsi_qla_host_t *vha)
 
 static int qla2xxx_map_queues(struct Scsi_Host *shost)
 {
+   int rc;
scsi_qla_host_t *vha = (scsi_qla_host_t *)shost->hostdata;
 
-   return blk_mq_pci_map_queues(&shost->tag_set, vha->hw->pdev);
+   if (USER_CTRL_IRQ(vha->hw))
+   rc = blk_mq_map_queues(&shost->tag_set);
+   else
+   rc = blk_mq_pci_map_queues(&shost->tag_set, vha->hw->pdev);
+   return rc;
 }
 
 static const struct pci_error_handlers qla2xxx_err_handler = {
-- 
2.12.0



[PATCH v3 03/22] qla2xxx: Move session delete to driver work queue

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

Move session deletion from the system work queue to the driver's
work queue so that it is processed in a timely manner.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_os.c | 3 ++-
 drivers/scsi/qla2xxx/qla_target.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 46f2d0cf7c0d..dfbf82e716b0 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -3193,10 +3193,11 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct 
pci_device_id *id)
host->can_queue, base_vha->req,
base_vha->mgmt_svr_loop_id, host->sg_tablesize);
 
+   ha->wq = alloc_workqueue("qla2xxx_wq", WQ_MEM_RECLAIM, 0);
+
if (ha->mqenable) {
bool mq = false;
bool startit = false;
-   ha->wq = alloc_workqueue("qla2xxx_wq", WQ_MEM_RECLAIM, 0);
 
if (QLA_TGT_MODE_ENABLED()) {
mq = true;
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 1259ec85ec0a..924d58f5408f 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1205,7 +1205,8 @@ void qlt_schedule_sess_for_deletion(struct fc_port *sess,
ql_dbg(ql_dbg_tgt, sess->vha, 0xe001,
"Scheduling sess %p for deletion\n", sess);
 
-   schedule_work(&sess->del_work);
+   INIT_WORK(&sess->del_work, qla24xx_delete_sess_fn);
+   queue_work(sess->vha->hw->wq, &sess->del_work);
 }
 
 void qlt_schedule_sess_for_deletion_lock(struct fc_port *sess)
-- 
2.12.0



[PATCH v3 02/22] qla2xxx: Fix gpnid error processing

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

Stop the GPNID command from advancing if the command has failed.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_gs.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index bc3db6abc9a0..ddc69d36877e 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3211,6 +3211,11 @@ static void qla2x00_async_gpnid_sp_done(void *s, int res)
sp->name, res, ct_req->req.port_id.port_id,
ct_rsp->rsp.gpn_id.port_name);
 
+   if (res) {
+   sp->free(sp);
+   return;
+   }
+
memset(&ea, 0, sizeof(ea));
memcpy(ea.port_name, ct_rsp->rsp.gpn_id.port_name, WWN_SIZE);
ea.sp = sp;
-- 
2.12.0



[PATCH v3 01/22] qla2xxx: Fix system crash for Notify ack timeout handling

2017-12-04 Thread Himanshu Madhani
From: Quinn Tran 

Fix a NULL pointer crash caused by a missing timeout handling
callback for the Notify Ack IOCB.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc:  # 4.10+
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/qla2xxx/qla_target.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 18069edd4773..1259ec85ec0a 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -665,7 +665,7 @@ int qla24xx_async_notify_ack(scsi_qla_host_t *vha, 
fc_port_t *fcport,
qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha)+2);
 
sp->u.iocb_cmd.u.nack.ntfy = ntfy;
-
+   sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
sp->done = qla2x00_async_nack_sp_done;
 
rval = qla2x00_start_sp(sp);
-- 
2.12.0



[PATCH v3 00/22] qla2xxx: Bug fixes for 4.15-rc2

2017-12-04 Thread Himanshu Madhani
Hi Martin,

This series contains fixes for bugs discovered while running error
handling test cases on a large fabric.

Please apply this series to 4.15-rc2 at your earliest convenience.

Changes from v2 -> v3

o Added Reviewed-by tag from Hannes.
o Fixed spelling mistake in patch 7.

Changes from v1 -> v2

o Updated patch description for patch 14 as per Bart's suggestion.

Thanks,
Himanshu
 
Giridhar Malavali (2):
  qla2xxx: Defer processing of GS IOCB calls
  qla2xxx: Remove aborting ELS IOCB call issued as part of timeout.

Himanshu Madhani (2):
  qla2xxx: Fix memory leak in dual/target mode
  qla2xxx: Update driver version to 10.00.00.03-k

Quinn Tran (17):
  qla2xxx: Fix system crash for Notify ack timeout handling
  qla2xxx: Fix gpnid error processing
  qla2xxx: Move session delete to driver work queue
  qla2xxx: Skip IRQ affinity for Target QPairs
  qla2xxx: Fix re-login for Nport Handle in use
  qla2xxx: Retry switch command on time out
  qla2xxx: Serialize GPNID for multiple RSCN
  qla2xxx: Fix login state machine stuck at GPDB
  qla2xxx: Relogin to target port on a cable swap
  qla2xxx: Fix Relogin being triggered too fast
  qla2xxx: Clear send ELS LOGO flag after target re-login
  qla2xxx: Fix PRLI state check
  qla2xxx: Fix abort command deadlock due to spinlock
  qla2xxx: Replace fcport alloc with qla2x00_alloc_fcport
  qla2xxx: Fix scan state field for fcport
  qla2xxx: Clear loop id after delete
  qla2xxx: Fix system crash in qlt_plogi_ack_unref

Sawan Chandak (1):
  qla2xxx: Fix NPIV host cleanup in target mode

 drivers/scsi/qla2xxx/qla_def.h |  49 
 drivers/scsi/qla2xxx/qla_gs.c  | 230 ++---
 drivers/scsi/qla2xxx/qla_init.c|  69 +--
 drivers/scsi/qla2xxx/qla_iocb.c|  13 ---
 drivers/scsi/qla2xxx/qla_isr.c |   7 +-
 drivers/scsi/qla2xxx/qla_mbx.c |   3 +-
 drivers/scsi/qla2xxx/qla_mid.c |  42 ---
 drivers/scsi/qla2xxx/qla_os.c  |  78 ++---
 drivers/scsi/qla2xxx/qla_target.c  |  60 +++---
 drivers/scsi/qla2xxx/qla_version.h |   2 +-
 10 files changed, 405 insertions(+), 148 deletions(-)

-- 
2.12.0



Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()

2017-12-04 Thread Ming Lei
On Mon, Dec 04, 2017 at 03:09:20PM +, Bart Van Assche wrote:
> On Sun, 2017-12-03 at 00:31 +0800, Ming Lei wrote:
> > Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for 
> > blk-mq")
> 
> It might be safer to revert commit 0df21c86bdbf instead of trying to fix all
> issues introduced by that commit for kernel version v4.15 ...

What are all the issues in v4.15-rc? So far this is the only issue reported,
and it can be fixed by this simple patch, which can also be regarded as a
cleanup.

-- 
Ming


Re: [PATCH] blk-mq: Fix several SCSI request queue lockups

2017-12-04 Thread Ming Lei
On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> Commit 0df21c86bdbf introduced several bugs:
> * A SCSI queue stall for queue depths > 1, addressed by commit
>   88022d7201e9 ("blk-mq: don't handle failure in .get_budget")

This one is committed already.

> * A systematic lockup for SCSI queues with queue depth 1. The
>   following test reproduces that bug systematically:
>   - Change the SRP initiator such that SCSI target queue depth is
> limited to 1.
>   - Run the following command:
>   srp-test/run_tests -f xfs -d -e none -r 60 -t 01
>   See also "[PATCH 4/7] blk-mq: Avoid that request processing
>   stalls when sharing tags"
>   (https://marc.info/?l=linux-block&m=151208695316857). Note:
>   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
>   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
>   before all blk_mq_dispatch_rq_list() calls only fixes the
>   systematic lockup for queue depth 1.

You are the only one who can reproduce this, and you don't want to provide
any kernel log for the issue, so how can we help you fix it?

You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
improve dispatching from sw queue")', but you don't mention any issue
with that commit, and your patch actually has nothing to do with
commit b347689ffbca. It seems your approach is just to try and guess.

Also, both Jens and I have run tests on null_blk and scsi_debug with
queue_depth set to one, and neither of us sees an IO hang with current blk-mq.

> * A scsi_debug lockup - see also "[PATCH] SCSI: delay run queue if
>   device is blocked in scsi_dev_queue_ready()"
>   (https://marc.info/?l=linux-block&m=151223233407154).

This issue is clearly explained in theory and can be reproduced and verified
with scsi_debug, so why can't we apply the patch to fix it? The fix is
simple and can also be regarded as a cleanup, since this case is now handled
the same way as in the non-mq path.

> 
> I think the above means that it is too risky to try to fix all bugs
> introduced by commit 0df21c86bdbf before kernel v4.15 is released.
> Hence revert that commit.

What is the risk?

> 
> Fixes: commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for 
> blk-mq")
> Signed-off-by: Bart Van Assche 
> Cc: Ming Lei 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 
> Cc: James E.J. Bottomley 
> Cc: Martin K. Petersen 
> Cc: linux-scsi@vger.kernel.org

This commit fixes an important SCSI_MQ performance issue. We can't
simply revert it just because of one unconfirmed report, from you only
and without any kernel log provided.

So Nak.

-- 
Ming


[PATCH] ibmvscsis: add DRC indices to debug statements

2017-12-04 Thread Bryant G. Ly
Where applicable, change pr_debug, pr_info, pr_err, etc. calls
to their dev_* equivalents. This adds the DRC index of the device to the
corresponding trace statement.

Signed-off-by: Bryant G. Ly 
Signed-off-by: Brad Warrum 
---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 320 ---
 1 file changed, 170 insertions(+), 150 deletions(-)

diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
index 2799a6b08f736..c3a76af9f5fa9 100644
--- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
+++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
@@ -122,7 +122,7 @@ static bool connection_broken(struct scsi_info *vscsi)
   cpu_to_be64(buffer[MSG_HI]),
   cpu_to_be64(buffer[MSG_LOW]));
 
-   pr_debug("connection_broken: rc %ld\n", h_return_code);
+   dev_dbg(&vscsi->dev, "Connection_broken: rc %ld\n", h_return_code);
 
if (h_return_code == H_CLOSED)
rc = true;
@@ -210,7 +210,7 @@ static long ibmvscsis_unregister_command_q(struct scsi_info 
*vscsi)
}
} while (qrc != H_SUCCESS && rc == ADAPT_SUCCESS);
 
-   pr_debug("Freeing CRQ: phyp rc %ld, rc %ld\n", qrc, rc);
+   dev_dbg(&vscsi->dev, "Freeing CRQ: phyp rc %ld, rc %ld\n", qrc, rc);
 
return rc;
 }
@@ -291,9 +291,9 @@ static long ibmvscsis_free_command_q(struct scsi_info 
*vscsi)
ibmvscsis_delete_client_info(vscsi, false);
}
 
-   pr_debug("free_command_q: flags 0x%x, state 0x%hx, acr_flags 0x%x, acr_state 0x%hx\n",
-vscsi->flags, vscsi->state, vscsi->phyp_acr_flags,
-vscsi->phyp_acr_state);
+   dev_dbg(&vscsi->dev, "free_command_q: flags 0x%x, state 0x%hx, acr_flags 0x%x, acr_state 0x%hx\n",
+   vscsi->flags, vscsi->state, vscsi->phyp_acr_flags,
+   vscsi->phyp_acr_state);
}
return rc;
 }
@@ -428,8 +428,8 @@ static void ibmvscsis_disconnect(struct work_struct *work)
vscsi->flags |= DISCONNECT_SCHEDULED;
vscsi->flags &= ~SCHEDULE_DISCONNECT;
 
-   pr_debug("disconnect: flags 0x%x, state 0x%hx\n", vscsi->flags,
-vscsi->state);
+   dev_dbg(&vscsi->dev, "disconnect: flags 0x%x, state 0x%hx\n",
+   vscsi->flags, vscsi->state);
 
/*
 * check which state we are in and see if we
@@ -540,13 +540,14 @@ static void ibmvscsis_disconnect(struct work_struct *work)
}
 
if (wait_idle) {
-   pr_debug("disconnect start wait, active %d, sched %d\n",
-(int)list_empty(&vscsi->active_q),
-(int)list_empty(&vscsi->schedule_q));
+   dev_dbg(&vscsi->dev, "disconnect start wait, active %d, sched %d\n",
+   (int)list_empty(&vscsi->active_q),
+   (int)list_empty(&vscsi->schedule_q));
if (!list_empty(&vscsi->active_q) ||
!list_empty(&vscsi->schedule_q)) {
vscsi->flags |= WAIT_FOR_IDLE;
-   pr_debug("disconnect flags 0x%x\n", vscsi->flags);
+   dev_dbg(&vscsi->dev, "disconnect flags 0x%x\n",
+   vscsi->flags);
/*
 * This routine is can not be called with the interrupt
 * lock held.
@@ -555,7 +556,7 @@ static void ibmvscsis_disconnect(struct work_struct *work)
wait_for_completion(&vscsi->wait_idle);
spin_lock_bh(&vscsi->intr_lock);
}
-   pr_debug("disconnect stop wait\n");
+   dev_dbg(&vscsi->dev, "disconnect stop wait\n");
 
ibmvscsis_adapter_idle(vscsi);
}
@@ -597,8 +598,8 @@ static void ibmvscsis_post_disconnect(struct scsi_info 
*vscsi, uint new_state,
 
vscsi->flags |= flag_bits;
 
-   pr_debug("post_disconnect: new_state 0x%x, flag_bits 0x%x, vscsi->flags 0x%x, state %hx\n",
-new_state, flag_bits, vscsi->flags, vscsi->state);
+   dev_dbg(&vscsi->dev, "post_disconnect: new_state 0x%x, flag_bits 0x%x, vscsi->flags 0x%x, state %hx\n",
+   new_state, flag_bits, vscsi->flags, vscsi->state);
 
if (!(vscsi->flags & (DISCONNECT_SCHEDULED | SCHEDULE_DISCONNECT))) {
vscsi->flags |= SCHEDULE_DISCONNECT;
@@ -648,8 +649,8 @@ static void ibmvscsis_post_disconnect(struct scsi_info 
*vscsi, uint new_state,
}
}
 
-   pr_debug("Leaving post_disconnect: flags 0x%x, new_state 0x%x\n",
-vscsi->flags, vscsi->new_state);
+   dev_dbg(&vscsi->dev, "Leaving post_disconnect: flags 0x%x, new_state 0x%x\n",
+   vscsi->flags, vscsi->new_state);
 }
 
 /**
@@ -724,7 +725,8 @@ static long ibmvscsis_handle_init_msg(struct scsi_info 
*vscsi)
break;
 
case H_CLOSED:
-   

Re: [EXT] Re: UFS utilities

2017-12-04 Thread gre...@linuxfoundation.org
On Mon, Dec 04, 2017 at 03:20:34PM +, Bean Huo (beanhuo) wrote:
> Hi Bart,
> Sorry for the late reply!
> >
> >Hello Bean,
> >
> >Please be more specific. What is inconvenient about sg3_utils on embedded
> >ARM systems?
> >
> Specifically, I don't know how to build sg3_utils against static libraries
> instead of shared libraries. I used the following configure parameters:
> ./configure --enable-static=yes --build=x86_64-unknown-linux-gnu \
>   --host=aarch64-linux-gnu --prefix=$PWD/out/ CC=aarch64-linux-gnu-gcc \
>   --target=aarch64-linux-gnu LD=aarch64-linux-gnu-ld \
>   AS=aarch64-linux-gnu-as CFLAGS=-static LDFLAGS=-static
> 
> But the resulting binaries are still dynamically linked.
> 
> ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, 
> interpreter /lib/ld-linux-aarch64.so.1, 
> for GNU/Linux 3.7.0, BuildID[sha1]=4f01b4c9f1ff47bc00aef93950c02734b4cc8e57, 
> not stripped.
> 
> I want it to be statically linked. Otherwise, I have to copy its library to my
> lib folder, and on embedded systems I sometimes need to rebuild the rootfs.
> Meanwhile, UFS 2.1 uses 27 SCSI commands in total.
> In most cases we only use a few of the sg3_utils binaries, so we don't need to
> copy all of them to the end product.

So what UFS commands are you missing that you need to see implemented?

And again, have you checked the different forks of the driver?

> >> And also it doesn't support several UFS special command.
> >
> >Are you referring to SCSI commands or rather to UFS commands that fall
> >outside the SCSI spec? Anyway, an approach that is used by many SCSI drivers
> >to export information to user space that falls outside the SCSI spec is to
> >create additional sysfs attributes. See also the sdev_attrs and shost_attrs
> >members of struct scsi_host_template.
> >
> Yes, for the UFS information, I can easily use these interfaces.
> I am thinking about test cases and configuration operations.

Which ones exactly?

> Also, is it possible to bypass the SCSI stack and go directly into the UFS stack?

Look at the different sysfs files for the UFS device, it does that for
some commands.

thanks,

greg k-h


[PATCH 2/3] Use blist_flags_t consistently

2017-12-04 Thread Bart Van Assche
Use the type blist_flags_t for all variables that represent blacklist
flags. Additionally, suppress recently introduced sparse warnings related
to blacklist flags.
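
For readers unfamiliar with sparse's restricted integer types, the following is a minimal userspace sketch of the pattern this patch relies on. The names demo_blist_flags_t, DEMO_BLIST_* and the two helper functions are hypothetical stand-ins, not kernel identifiers; outside of a sparse run the annotations expand to nothing, so the code builds as ordinary C.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Outside of a sparse run (__CHECKER__ undefined) the annotations expand to
 * nothing, so this compiles as ordinary C.
 */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

/* Hypothetical stand-ins for a restricted flags type and two flag values. */
typedef uint32_t __bitwise demo_blist_flags_t;
#define DEMO_BLIST_NOLUN    ((__force demo_blist_flags_t)(1u << 0))
#define DEMO_BLIST_FORCELUN ((__force demo_blist_flags_t)(1u << 1))

/* Parse a flags string; the conversion from a plain integer is explicit. */
static demo_blist_flags_t demo_parse_flags(const char *strflags)
{
    return (__force demo_blist_flags_t)strtoul(strflags, NULL, 0);
}

/* Flag tests stay within the restricted type, so no cast is needed here. */
static int demo_has_flag(demo_blist_flags_t flags, demo_blist_flags_t bit)
{
    return (flags & bit) != 0;
}
```

The design point is that every place where a plain integer becomes a flags value (or the reverse) carries a visible __force cast, while ordinary flag arithmetic needs none.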

Fixes: commit c6b54164508a ("scsi: Use 'blist_flags_t' for scsi_devinfo flags")
Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 drivers/scsi/scsi_devinfo.c   |  8 +++-
 drivers/scsi/scsi_scan.c  | 13 +++--
 drivers/scsi/scsi_sysfs.c |  4 ++--
 drivers/scsi/scsi_transport_spi.c | 12 +++-
 4 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index 2fed250e87bf..4b33c001ae23 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -382,10 +382,8 @@ int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor, char *model,
model, compatible);
 
if (strflags)
-   devinfo->flags = simple_strtoul(strflags, NULL, 0);
-   else
-   devinfo->flags = flags;
-
+   flags = (__force blist_flags_t)simple_strtoul(strflags, NULL, 
0);
+   devinfo->flags = flags;
devinfo->compatible = compatible;
 
if (compatible)
@@ -612,7 +610,7 @@ blist_flags_t scsi_get_device_flags_keyed(struct 
scsi_device *sdev,
if (sdev->sdev_bflags)
return sdev->sdev_bflags;
 
-   return scsi_default_dev_flags;
+   return (__force blist_flags_t)scsi_default_dev_flags;
 }
 EXPORT_SYMBOL(scsi_get_device_flags_keyed);
 
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index be5e919db0e8..0880d975eed3 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -770,7 +770,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, 
unsigned char *inq_result,
  * SCSI_SCAN_LUN_PRESENT: a new scsi_device was allocated and initialized
  **/
 static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
-   int *bflags, int async)
+   blist_flags_t *bflags, int async)
 {
int ret;
 
@@ -1049,14 +1049,15 @@ static unsigned char *scsi_inq_str(unsigned char *buf, 
unsigned char *inq,
  *   - SCSI_SCAN_LUN_PRESENT: a new scsi_device was allocated and initialized
  **/
 static int scsi_probe_and_add_lun(struct scsi_target *starget,
- u64 lun, int *bflagsp,
+ u64 lun, blist_flags_t *bflagsp,
  struct scsi_device **sdevp,
  enum scsi_scan_mode rescan,
  void *hostdata)
 {
struct scsi_device *sdev;
unsigned char *result;
-   int bflags, res = SCSI_SCAN_NO_RESPONSE, result_len = 256;
+   blist_flags_t bflags;
+   int res = SCSI_SCAN_NO_RESPONSE, result_len = 256;
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
 
/*
@@ -1201,7 +1202,7 @@ static int scsi_probe_and_add_lun(struct scsi_target 
*starget,
  * Modifies sdevscan->lun.
  **/
 static void scsi_sequential_lun_scan(struct scsi_target *starget,
-int bflags, int scsi_level,
+blist_flags_t bflags, int scsi_level,
 enum scsi_scan_mode rescan)
 {
uint max_dev_lun;
@@ -1292,7 +1293,7 @@ static void scsi_sequential_lun_scan(struct scsi_target 
*starget,
  * 0: scan completed (or no memory, so further scanning is futile)
  * 1: could not scan with REPORT LUN
  **/
-static int scsi_report_lun_scan(struct scsi_target *starget, int bflags,
+static int scsi_report_lun_scan(struct scsi_target *starget, blist_flags_t 
bflags,
enum scsi_scan_mode rescan)
 {
unsigned char scsi_cmd[MAX_COMMAND_SIZE];
@@ -1538,7 +1539,7 @@ static void __scsi_scan_target(struct device *parent, 
unsigned int channel,
unsigned int id, u64 lun, enum scsi_scan_mode rescan)
 {
struct Scsi_Host *shost = dev_to_shost(parent);
-   int bflags = 0;
+   blist_flags_t bflags = 0;
int res;
struct scsi_target *starget;
 
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 50e7d7e4a861..6ee964643531 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -967,7 +967,7 @@ sdev_show_wwid(struct device *dev, struct device_attribute 
*attr,
 }
 static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL);
 
-#define BLIST_FLAG_NAME(name) [ilog2(BLIST_##name)] = #name
+#define BLIST_FLAG_NAME(name) [ilog2((__force u32)BLIST_##name)] = #name
 static const char *const sdev_bflags_name[] = {
 #include "scsi_devinfo_tbl.c"
 };
@@ -984,7 +984,7 @@ sdev_show_blacklist(struct device *dev, struct 
device_attribute *attr,
for (i = 0; i < sizeof(sdev->sdev_bflags) * BITS_PER_BYTE; i++) {
 

[PATCH 0/3] SCSI device blacklist handling improvements

2017-12-04 Thread Bart Van Assche
Hello Martin,

These three patches are what I came up with after reviewing recent
changes in the code that handles blacklist flags. Please consider
these patches for kernel v4.16.

Note: since patch "Use 'blist_flags_t' for scsi_devinfo flags" is not yet
in the 4.16/scsi-queue branch I have developed these patches on top of a
merge of the v4.15-rc1 release and your 4.16/scsi-queue branch.

Thanks,

Bart.

Bart Van Assche (3):
  scsi_get_device_flags_keyed(): Always return device flags
  Use blist_flags_t consistently
  Introduce scsi_devinfo_key enumeration type

 drivers/scsi/scsi_devinfo.c   | 29 -
 drivers/scsi/scsi_priv.h  | 14 --
 drivers/scsi/scsi_scan.c  | 13 +++--
 drivers/scsi/scsi_sysfs.c |  4 ++--
 drivers/scsi/scsi_transport_spi.c | 12 +++-
 5 files changed, 36 insertions(+), 36 deletions(-)

-- 
2.15.0



[PATCH 3/3] Introduce scsi_devinfo_key enumeration type

2017-12-04 Thread Bart Van Assche
Since symbolic names for the device information keys already exist,
associate an enumeration type with these symbolic values. This change
makes it clear what the valid values for the 'key' arguments are.
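
As a small illustration of why a named enumeration helps, here is a userspace sketch. The enum demo_devinfo_key and demo_key_name() are hypothetical and only mirror the two keys visible in the diff; they are not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical mirror of the two keys shown in the patch. */
enum demo_devinfo_key {
    DEMO_DEVINFO_GLOBAL = 0,
    DEMO_DEVINFO_SPI,
};

/*
 * Return a list name for the key, or NULL for a value outside the
 * enumeration. Because the switch has no default label, a compiler with
 * -Wswitch enabled can warn when a newly added key is not handled.
 */
static const char *demo_key_name(enum demo_devinfo_key key)
{
    switch (key) {
    case DEMO_DEVINFO_GLOBAL:
        return "global";
    case DEMO_DEVINFO_SPI:
        return "spi";
    }
    return NULL;
}
```

With an int parameter the prototype says nothing about which values are meaningful; with the enum type the valid set is part of the signature.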

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 drivers/scsi/scsi_devinfo.c | 14 --
 drivers/scsi/scsi_priv.h| 14 --
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index 4b33c001ae23..82bc807e1d50 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -361,7 +361,8 @@ static int scsi_dev_info_list_add(int compatible, char 
*vendor, char *model,
  * Returns: 0 OK, -error on failure.
  **/
 int scsi_dev_info_list_add_keyed(int compatible, char *vendor, char *model,
-char *strflags, blist_flags_t flags, int key)
+char *strflags, blist_flags_t flags,
+enum scsi_devinfo_key key)
 {
struct scsi_dev_info_list *devinfo;
struct scsi_dev_info_list_table *devinfo_table =
@@ -410,7 +411,7 @@ EXPORT_SYMBOL(scsi_dev_info_list_add_keyed);
  * Returns: pointer to matching entry, or ERR_PTR on failure.
  **/
 static struct scsi_dev_info_list *scsi_dev_info_list_find(const char *vendor,
-   const char *model, int key)
+   const char *model, enum scsi_devinfo_key key)
 {
struct scsi_dev_info_list *devinfo;
struct scsi_dev_info_list_table *devinfo_table =
@@ -492,7 +493,8 @@ static struct scsi_dev_info_list 
*scsi_dev_info_list_find(const char *vendor,
  *
  * Returns: 0 OK, -error on failure.
  **/
-int scsi_dev_info_list_del_keyed(char *vendor, char *model, int key)
+int scsi_dev_info_list_del_keyed(char *vendor, char *model,
+enum scsi_devinfo_key key)
 {
struct scsi_dev_info_list *found;
 
@@ -594,7 +596,7 @@ blist_flags_t scsi_get_device_flags(struct scsi_device 
*sdev,
 blist_flags_t scsi_get_device_flags_keyed(struct scsi_device *sdev,
const unsigned char *vendor,
const unsigned char *model,
-   int key)
+   enum scsi_devinfo_key key)
 {
struct scsi_dev_info_list *devinfo;
 
@@ -776,7 +778,7 @@ void scsi_exit_devinfo(void)
  * Adds the requested list, returns zero on success, -EEXIST if the
  * key is already registered to a list, or other error on failure.
  */
-int scsi_dev_info_add_list(int key, const char *name)
+int scsi_dev_info_add_list(enum scsi_devinfo_key key, const char *name)
 {
struct scsi_dev_info_list_table *devinfo_table =
scsi_devinfo_lookup_by_key(key);
@@ -808,7 +810,7 @@ EXPORT_SYMBOL(scsi_dev_info_add_list);
  * frees the list itself.  Returns 0 on success or -EINVAL if the key
  * can't be found.
  */
-int scsi_dev_info_remove_list(int key)
+int scsi_dev_info_remove_list(enum scsi_devinfo_key key)
 {
struct list_head *lh, *lh_next;
struct scsi_dev_info_list_table *devinfo_table =
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index a5946cd64caa..61024db5953d 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -45,7 +45,7 @@ static inline void scsi_log_completion(struct scsi_cmnd *cmd, 
int disposition)
 /* scsi_devinfo.c */
 
 /* list of keys for the lists */
-enum {
+enum scsi_devinfo_key {
SCSI_DEVINFO_GLOBAL = 0,
SCSI_DEVINFO_SPI,
 };
@@ -56,13 +56,15 @@ extern blist_flags_t scsi_get_device_flags(struct 
scsi_device *sdev,
 extern blist_flags_t scsi_get_device_flags_keyed(struct scsi_device *sdev,
 const unsigned char *vendor,
 const unsigned char *model,
-int key);
+enum scsi_devinfo_key key);
 extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
char *model, char *strflags,
-   blist_flags_t flags, int key);
-extern int scsi_dev_info_list_del_keyed(char *vendor, char *model, int key);
-extern int scsi_dev_info_add_list(int key, const char *name);
-extern int scsi_dev_info_remove_list(int key);
+   blist_flags_t flags,
+   enum scsi_devinfo_key key);
+extern int scsi_dev_info_list_del_keyed(char *vendor, char *model,
+   enum scsi_devinfo_key key);
+extern int scsi_dev_info_add_list(enum scsi_devinfo_key key, const char *name);
+extern int scsi_dev_info_remove_list(enum scsi_devinfo_key key);
 
 extern int __init scsi_init_devinfo(void);
 extern 

[PATCH 1/3] scsi_get_device_flags_keyed(): Always return device flags

2017-12-04 Thread Bart Van Assche
Since scsi_get_device_flags_keyed() callers do not check whether or
not the returned value is an error code, change that function such
that it returns a flags value even if the 'key' argument is invalid.
Note: since commit 28a0bc4120d3 ("scsi: sd: Implement blacklist
option for WRITE SAME w/ UNMAP") bit 31 is a valid device information
flag so checking whether bit 31 is set in the return value is not
sufficient to tell the difference between an error code and a flags
value.
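
The ambiguity described above can be shown with a short userspace sketch. DEMO_FLAG_BIT31 and both helpers are hypothetical names chosen for illustration; the only assumption is a 32-bit unsigned flags type.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag occupying bit 31, as the commit message describes. */
#define DEMO_FLAG_BIT31 (UINT32_C(1) << 31)

/* What a caller sees when a negative errno travels through a u32 flags type. */
static uint32_t demo_errno_as_flags(int err)
{
    return (uint32_t)err;
}

/* The naive "is this an error code?" test: bit 31 set. */
static int demo_looks_like_error(uint32_t value)
{
    return (value & DEMO_FLAG_BIT31) != 0;
}
```

Both demo_errno_as_flags(-ENOENT) and a flags value consisting only of DEMO_FLAG_BIT31 pass the naive test, which is why returning error codes through the flags type cannot work once bit 31 is a real flag.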

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 drivers/scsi/scsi_devinfo.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index 78d4aa8df675..2fed250e87bf 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -599,17 +599,12 @@ blist_flags_t scsi_get_device_flags_keyed(struct 
scsi_device *sdev,
int key)
 {
struct scsi_dev_info_list *devinfo;
-   int err;
 
devinfo = scsi_dev_info_list_find(vendor, model, key);
if (!IS_ERR(devinfo))
return devinfo->flags;
 
-   err = PTR_ERR(devinfo);
-   if (err != -ENOENT)
-   return err;
-
-   /* nothing found, return nothing */
+   /* key or device not found: return nothing */
if (key != SCSI_DEVINFO_GLOBAL)
return 0;
 
-- 
2.15.0



[PATCH v3 1/2] Ensure that the SCSI error handler gets woken up

2017-12-04 Thread Bart Van Assche
If scsi_eh_scmd_add() is called concurrently with
scsi_host_queue_ready() while shost->host_blocked > 0 then it can
happen that neither function wakes up the SCSI error handler. Fix
this by making every function that decreases the host_busy counter
wake up the error handler if necessary and by protecting the
host_failed checks with the SCSI host lock.
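
One interleaving that leads to the missed wakeup can be replayed in a single-threaded userspace sketch. This is only a model under stated assumptions: struct demo_host, demo_eh_wakeup() and demo_replay() are hypothetical, the wake condition host_busy == host_failed is taken from scsi_eh_wakeup() in this series, and locking and RCU are deliberately left out.

```c
/* Hypothetical, simplified model of the counters involved in the race. */
struct demo_host {
    int host_busy;    /* commands counted as in flight */
    int host_failed;  /* commands handed to the error handler */
    int eh_woken;     /* set once the error handler would be woken */
};

/* Wake condition as in scsi_eh_wakeup(): all busy commands have failed. */
static void demo_eh_wakeup(struct demo_host *h)
{
    if (h->host_busy == h->host_failed)
        h->eh_woken = 1;
}

/*
 * Replay one problematic interleaving. If dec_wakes is nonzero, the path
 * that decrements host_busy rechecks the wake condition, which is the
 * behaviour the patch introduces.
 */
static int demo_replay(int dec_wakes)
{
    struct demo_host h = { .host_busy = 1, .host_failed = 0, .eh_woken = 0 };

    h.host_busy++;        /* queue-ready path reserves a slot */
    h.host_failed++;      /* error path records the failed command */
    demo_eh_wakeup(&h);   /* 2 != 1, so no wakeup here */
    h.host_busy--;        /* queue-ready path backs off (host blocked) */
    if (dec_wakes)
        demo_eh_wakeup(&h);   /* 1 == 1, wakeup happens */

    return h.eh_woken;
}
```

demo_replay(0) leaves the handler asleep although host_busy equals host_failed at the end, while demo_replay(1) wakes it.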

Reported-by: Pavel Tikhomirov 
References: https://marc.info/?l=linux-kernel&m=150461610630736
Fixes: commit 746650160866 ("scsi: convert host_busy to atomic_t")
Signed-off-by: Bart Van Assche 
Cc: Konstantin Khorenko 
Cc: Stuart Hayes 
Cc: Pavel Tikhomirov 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
Cc: 
---
 drivers/scsi/hosts.c  |  6 ++
 drivers/scsi/scsi_error.c | 18 --
 drivers/scsi/scsi_lib.c   | 39 ---
 include/scsi/scsi_host.h  |  2 ++
 4 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index a306af6a5ea7..a0a7e4ff255c 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -324,6 +324,9 @@ static void scsi_host_dev_release(struct device *dev)
 
scsi_proc_hostdir_rm(shost->hostt);
 
+   /* Wait for functions invoked through call_rcu(&shost->rcu, ...) */
+   rcu_barrier();
+
if (shost->tmf_work_q)
destroy_workqueue(shost->tmf_work_q);
if (shost->ehandler)
@@ -331,6 +334,8 @@ static void scsi_host_dev_release(struct device *dev)
if (shost->work_q)
destroy_workqueue(shost->work_q);
 
+   destroy_rcu_head(&shost->rcu);
+
if (shost->shost_state == SHOST_CREATED) {
/*
 * Free the shost_dev device name here if scsi_host_alloc()
@@ -399,6 +404,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template 
*sht, int privsize)
INIT_LIST_HEAD(&shost->starved_list);
init_waitqueue_head(&shost->host_wait);
mutex_init(&shost->scan_mutex);
+   init_rcu_head(&shost->rcu);
 
index = ida_simple_get(&host_index_ida, 0, 0, GFP_KERNEL);
if (index < 0)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5e89049e9b4e..258b8a741992 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -226,6 +226,17 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
}
 }
 
+static void scsi_eh_inc_host_failed(struct rcu_head *head)
+{
+   struct Scsi_Host *shost = container_of(head, typeof(*shost), rcu);
+   unsigned long flags;
+
+   spin_lock_irqsave(shost->host_lock, flags);
+   shost->host_failed++;
+   scsi_eh_wakeup(shost);
+   spin_unlock_irqrestore(shost->host_lock, flags);
+}
+
 /**
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:  scmd to run eh on.
@@ -248,9 +259,12 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 
scsi_eh_reset(scmd);
list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
-   shost->host_failed++;
-   scsi_eh_wakeup(shost);
spin_unlock_irqrestore(shost->host_lock, flags);
+   /*
+* Ensure that all tasks observe the host state change before the
+* host_failed change.
+*/
+   call_rcu(&shost->rcu, scsi_eh_inc_host_failed);
 }
 
 /**
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b6d3842b6809..5cbc69b2b1ae 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -318,22 +318,39 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd)
cmd->cmd_len = scsi_command_size(cmd->cmnd);
 }
 
-void scsi_device_unbusy(struct scsi_device *sdev)
+/*
+ * Decrement the host_busy counter and wake up the error handler if necessary.
+ * Avoid as follows that the error handler is not woken up if shost->host_busy
+ * == shost->host_failed: use call_rcu() in scsi_eh_scmd_add() in combination
+ * with an RCU read lock in this function to ensure that this function in its
+ * entirety either finishes before scsi_eh_scmd_add() increases the
+ * host_failed counter or that it notices the shost state change made by
+ * scsi_eh_scmd_add().
+ */
+static void scsi_dec_host_busy(struct Scsi_Host *shost)
 {
-   struct Scsi_Host *shost = sdev->host;
-   struct scsi_target *starget = scsi_target(sdev);
unsigned long flags;
 
+   rcu_read_lock();
atomic_dec(&shost->host_busy);
-   if (starget->can_queue > 0)
-   atomic_dec(&starget->target_busy);
-
-   if (unlikely(scsi_host_in_recovery(shost) &&
-(shost->host_failed || shost->host_eh_scheduled))) {
+   if (unlikely(scsi_host_in_recovery(shost))) {
spin_lock_irqsave(shost->host_lock, flags);
-   scsi_eh_wakeup(shost);
+   if (shost->host_failed || shost->host_eh_scheduled)
+   

[PATCH v3 0/2] Ensure that the SCSI error handler gets woken up

2017-12-04 Thread Bart Van Assche
Hello Martin,

As reported by Pavel Tikhomirov it can happen that the SCSI error handler does
not get woken up. This is very annoying because it results in a queue
stall. The two patches in this series address this issue without acquiring the
SCSI host lock in the hot path. Please consider these patches for kernel
v4.16.

Thanks,

Bart.

Changes between v2 and v3:
- Made it again safe to call scsi_eh_scmd_add() from interrupt context.

Changes between v1 and v2:
- Ensure that host_lock is held while checking host_failed.
- Moved the lockdep_assert_held() change into a separate patch.

Bart Van Assche (2):
  Ensure that the SCSI error handler gets woken up
  Convert a source code comment into a runtime check

 drivers/scsi/hosts.c  |  6 ++
 drivers/scsi/scsi_error.c | 21 ++---
 drivers/scsi/scsi_lib.c   | 39 ---
 include/scsi/scsi_host.h  |  2 ++
 4 files changed, 54 insertions(+), 14 deletions(-)

-- 
2.15.0



[PATCH v3 2/2] Convert a source code comment into a runtime check

2017-12-04 Thread Bart Van Assche
Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 drivers/scsi/scsi_error.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 258b8a741992..9cae0194e21a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -61,9 +61,10 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
 static int scsi_try_to_abort_cmd(struct scsi_host_template *,
 struct scsi_cmnd *);
 
-/* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
+   lockdep_assert_held(shost->host_lock);
+
if (atomic_read(&shost->host_busy) == shost->host_failed) {
trace_scsi_eh_wakeup(shost);
wake_up_process(shost->ehandler);
-- 
2.15.0



[PATCH] blk-mq: Fix several SCSI request queue lockups

2017-12-04 Thread Bart Van Assche
Commit 0df21c86bdbf introduced several bugs:
* A SCSI queue stall for queue depths > 1, addressed by commit
  88022d7201e9 ("blk-mq: don't handle failure in .get_budget")
* A systematic lockup for SCSI queues with queue depth 1. The
  following test reproduces that bug systematically:
  - Change the SRP initiator such that SCSI target queue depth is
limited to 1.
  - Run the following command:
  srp-test/run_tests -f xfs -d -e none -r 60 -t 01
  See also "[PATCH 4/7] blk-mq: Avoid that request processing
  stalls when sharing tags"
  (https://marc.info/?l=linux-block&m=151208695316857). Note:
  reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
  queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
  before all blk_mq_dispatch_rq_list() calls only fixes the
  systematic lockup for queue depth 1.
* A scsi_debug lockup - see also "[PATCH] SCSI: delay run queue if
  device is blocked in scsi_dev_queue_ready()"
  (https://marc.info/?l=linux-block&m=151223233407154).

I think the above means that it is too risky to try to fix all bugs
introduced by commit 0df21c86bdbf before kernel v4.15 is released.
Hence revert that commit.
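
The control flow that the diff below restores (take the budget at the top of the queueing function, release it on the error path) can be sketched in userspace as follows. This is an illustration only: struct demo_dev, the demo_* functions and the status enum are hypothetical, and a plain counter stands in for the per-device busy accounting.

```c
#include <stdbool.h>

/* Hypothetical device model: busy counts commands holding a budget. */
struct demo_dev {
    int busy;
    int depth;   /* queue depth, e.g. 1 in the lockup reports above */
};

enum demo_status { DEMO_OK, DEMO_RESOURCE };

/* Take one unit of budget if the device is below its queue depth. */
static bool demo_get_budget(struct demo_dev *d)
{
    if (d->busy >= d->depth)
        return false;
    d->busy++;
    return true;
}

/* Give one unit of budget back. */
static void demo_put_budget(struct demo_dev *d)
{
    d->busy--;
}

/* Queueing function that obtains the budget itself, as after the revert. */
static enum demo_status demo_queue_rq(struct demo_dev *d, bool prep_ok)
{
    enum demo_status ret = DEMO_RESOURCE;

    if (!demo_get_budget(d))
        goto out;               /* nothing to release */
    if (!prep_ok)
        goto out_put_budget;    /* preparation failed: release the budget */
    return DEMO_OK;             /* budget is held until completion */

out_put_budget:
    demo_put_budget(d);
out:
    return ret;
}
```

With depth 1, a second demo_queue_rq() call returns DEMO_RESOURCE until the first command completes and demo_put_budget() is called, which is the situation in which the queue has to be rerun by someone.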

Fixes: commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for 
blk-mq")
Signed-off-by: Bart Van Assche 
Cc: Ming Lei 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
Cc: James E.J. Bottomley 
Cc: Martin K. Petersen 
Cc: linux-scsi@vger.kernel.org
---
 drivers/scsi/scsi_lib.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 84bd2b16d216..a7e7966f1477 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1976,9 +1976,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx 
*hctx,
struct scsi_device *sdev = q->queuedata;
struct Scsi_Host *shost = sdev->host;
struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
-   blk_status_t ret;
+   blk_status_t ret = BLK_STS_RESOURCE;
int reason;
 
+   if (!scsi_mq_get_budget(hctx))
+   goto out;
ret = prep_to_mq(scsi_prep_state_check(sdev, req));
if (ret != BLK_STS_OK)
goto out_put_budget;
@@ -2022,6 +2024,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx 
*hctx,
atomic_dec(&scsi_target(sdev)->target_busy);
 out_put_budget:
scsi_mq_put_budget(hctx);
+out:
switch (ret) {
case BLK_STS_OK:
break;
@@ -2225,8 +2228,6 @@ struct request_queue *scsi_old_alloc_queue(struct 
scsi_device *sdev)
 }
 
 static const struct blk_mq_ops scsi_mq_ops = {
-   .get_budget = scsi_mq_get_budget,
-   .put_budget = scsi_mq_put_budget,
.queue_rq   = scsi_queue_rq,
.complete   = scsi_softirq_done,
.timeout= scsi_timeout,
-- 
2.15.0



RE: [EXT] Re: UFS utilities

2017-12-04 Thread Bean Huo (beanhuo)
Hi Bart,
Sorry for the late reply!
>
>Hello Bean,
>
>Please be more specific. What is inconvenient about sg3_utils on embedded
>ARM systems?
>
Specifically, I don't know how to build sg3_utils against static libraries
instead of shared libraries. I used the following configure parameters:
./configure --enable-static=yes --build=x86_64-unknown-linux-gnu \
  --host=aarch64-linux-gnu --prefix=$PWD/out/ CC=aarch64-linux-gnu-gcc \
  --target=aarch64-linux-gnu LD=aarch64-linux-gnu-ld \
  AS=aarch64-linux-gnu-as CFLAGS=-static LDFLAGS=-static

But the resulting binaries are still dynamically linked.

ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, 
interpreter /lib/ld-linux-aarch64.so.1, 
for GNU/Linux 3.7.0, BuildID[sha1]=4f01b4c9f1ff47bc00aef93950c02734b4cc8e57, 
not stripped.

I want it to be statically linked. Otherwise, I have to copy its library to my
lib folder, and on embedded systems I sometimes need to rebuild the rootfs.
Meanwhile, UFS 2.1 uses 27 SCSI commands in total.
In most cases we only use a few of the sg3_utils binaries, so we don't need to
copy all of them to the end product.

>> And also it doesn't support several UFS special command.
>
>Are you referring to SCSI commands or rather to UFS commands that fall
>outside the SCSI spec? Anyway, an approach that is used by many SCSI drivers
>to export information to user space that falls outside the SCSI spec is to
>create additional sysfs attributes. See also the sdev_attrs and shost_attrs
>members of struct scsi_host_template.
>
Yes, for the UFS information, I can easily use these interfaces.
I am thinking about test cases and configuration operations.
Also, is it possible to bypass the SCSI stack and go directly into the UFS stack?
Thanks for your inputs.
>Bart.

//Bean Huo


Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()

2017-12-04 Thread Bart Van Assche
On Sun, 2017-12-03 at 00:31 +0800, Ming Lei wrote:
> Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")

It might be safer to revert commit 0df21c86bdbf instead of trying to fix all
issues introduced by that commit for kernel version v4.15 ...

Bart.

[PATCH] scsi: bfa: convert to strlcpy/strlcat

2017-12-04 Thread Arnd Bergmann
The bfa driver has a number of real issues with string termination
that gcc-8 now points out:

drivers/scsi/bfa/bfad_bsg.c: In function 'bfad_iocmd_port_get_attr':
drivers/scsi/bfa/bfad_bsg.c:320:9: error: argument to 'sizeof' in 'strncpy' 
call is the same expression as the source; did you mean to use the size of the 
destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_psymb_init':
drivers/scsi/bfa/bfa_fcs.c:775:9: error: argument to 'sizeof' in 'strncat' call 
is the same expression as the source; did you mean to use the size of the 
destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:781:9: error: argument to 'sizeof' in 'strncat' call 
is the same expression as the source; did you mean to use the size of the 
destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:788:9: error: argument to 'sizeof' in 'strncat' call 
is the same expression as the source; did you mean to use the size of the 
destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:801:10: error: argument to 'sizeof' in 'strncat' 
call is the same expression as the source; did you mean to use the size of the 
destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:808:10: error: argument to 'sizeof' in 'strncat' 
call is the same expression as the source; did you mean to use the size of the 
destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_nsymb_init':
drivers/scsi/bfa/bfa_fcs.c:837:10: error: argument to 'sizeof' in 'strncat' 
call is the same expression as the source; did you mean to use the size of the 
destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:844:10: error: argument to 'sizeof' in 'strncat' 
call is the same expression as the source; did you mean to use the size of the 
destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:852:10: error: argument to 'sizeof' in 'strncat' 
call is the same expression as the source; did you mean to use the size of the 
destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_psymb_init':
drivers/scsi/bfa/bfa_fcs.c:778:2: error: 'strncat' output may be truncated 
copying 10 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:784:2: error: 'strncat' output may be truncated 
copying 30 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:803:3: error: 'strncat' output may be truncated 
copying 44 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:811:3: error: 'strncat' output may be truncated 
copying 16 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_nsymb_init':
drivers/scsi/bfa/bfa_fcs.c:840:2: error: 'strncat' output may be truncated 
copying 10 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:847:2: error: 'strncat' output may be truncated 
copying 30 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_fdmi_get_hbaattr':
drivers/scsi/bfa/bfa_fcs_lport.c:2657:10: error: argument to 'sizeof' in 
'strncat' call is the same expression as the source; did you mean to use the 
size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs_lport.c:2659:11: error: argument to 'sizeof' in 
'strncat' call is the same expression as the source; did you mean to use the 
size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ms_gmal_response':
drivers/scsi/bfa/bfa_fcs_lport.c:3232:5: error: 'strncpy' output may be 
truncated copying 16 bytes from a string of length 247 
[-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ns_send_rspn_id':
drivers/scsi/bfa/bfa_fcs_lport.c:4670:3: error: 'strncpy' output truncated 
before terminating nul copying as many bytes from a string as its length 
[-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c:4682:3: error: 'strncat' output truncated 
before terminating nul copying as many bytes from a string as its length 
[-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 
'bfa_fcs_lport_ns_util_send_rspn_id':
drivers/scsi/bfa/bfa_fcs_lport.c:5206:3: error: 'strncpy' output truncated 
before terminating nul copying as many bytes from a string as its length 
[-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c:5215:3: error: 'strncat' output truncated 
before terminating nul copying as many bytes from a string as its length 
[-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_fdmi_get_portattr':
drivers/scsi/bfa/bfa_fcs_lport.c:2751:2: error: 'strncpy' specified bound 128 
equals destination size 
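
To illustrate the class of bug these warnings point at, here is a userspace sketch of a destination-bounded copy. demo_strlcpy() is a hypothetical stand-in written for this example with strlcpy()-like semantics (always NUL-terminates when size is nonzero, returns the source length); it is not the kernel's implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Problematic pattern flagged by the compiler (shown only as a comment):
 *
 *     strncpy(dst, src, sizeof(src));   // bound taken from the source
 *
 * The bound must describe the destination, and strncpy() does not
 * NUL-terminate dst when src is at least as long as the bound.
 */

/* Copy src into dst, writing at most size bytes including the NUL. */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size != 0) {
        size_t n = len < size - 1 ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;   /* a value >= size tells the caller the copy was truncated */
}
```

A caller passes sizeof(dst) as the bound and can compare the return value against it to detect truncation.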

Driver version for PMC Adaptec HBA in Linux and from vendor

2017-12-04 Thread Paul Menzel

Dear Raghava, dear Linux folks,


While evaluating HBA extension cards, one of our key requirements is easy 
maintenance, especially when upgrading the firmware.


You provide the utility `arcconf` [1], which can be used for such tasks 
directly on the command line.


Unfortunately, we can’t find the source code for this application, which 
is something we’d like to have when executing programs with root privileges.


It’d be great to have something similar to flashrom [2], or the source 
of your program.


Do you know why the source of this utility is not published 
under a free license?


Who can be contacted to discuss this issue further?


Kind regards,

Paul


[1] http://download.adaptec.com/raid/storage_manager/arcconf_v2_05_22932.zip
[2] https://www.flashrom.org/





[Bug 198081] scsi sg

2017-12-04 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=198081

--- Comment #1 from Cristian Crinteanu (crinteanu.crist...@gmail.com) ---
*** Bug 198079 has been marked as a duplicate of this bug. ***

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 198081] New: scsi sg

2017-12-04 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=198081

            Bug ID: 198081
           Summary: scsi sg
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 4.4.89 and higher
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: SCSI
          Assignee: linux-scsi@vger.kernel.org
          Reporter: crinteanu.crist...@gmail.com
        Regression: No

Created attachment 261015
  --> https://bugzilla.kernel.org/attachment.cgi?id=261015&action=edit
scsi sg problems

Hi, I just noticed that one of the following commits, introduced in 4.4.89,
breaks some applications (such as nerolinux and Norton Ghost 12).

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3e0097499839e0fe3af380410eababe5a47c4cf9

https://lkml.org/lkml/2017/9/24/457

https://patchwork.kernel.org/patch/9968779/

https://lkml.org/lkml/2017/9/24/356

nerolinux fails to detect any drives on my PC, and the kernel logs this trace:

 Call Trace:
  [] ? dump_stack+0x44/0x57
  [] ? warn_slowpath_common+0x85/0x9a
  [] ? sg_rq_end_io+0x42/0x277
  [] ? warn_slowpath_null+0xd/0x10
  [] ? sg_rq_end_io+0x42/0x277
  [] ? blk_account_io_done+0xc/0xea
  [] ? blk_finish_request+0x63/0x84
  [] ? scsi_end_request+0x11a/0x155
  [] ? scsi_io_completion+0x1c7/0x4aa
  [] ? blk_done_softirq+0x56/0x70
  [] ? __do_softirq+0xa6/0x18b
  [] ? tasklet_action+0x90/0x90
  [] ? do_softirq_own_stack+0x1a/0x1f
  [] ? irq_exit+0x3c/0x7b
  [] ? do_IRQ+0x7a/0x8b
  [] ? common_interrupt+0x33/0x38
  [] ? unregister_console+0x87/0xad
  [] ? cpuidle_enter_state+0xdb/0x17d
  [] ? cpu_startup_entry+0x17f/0x1e2
  [] ? start_secondary+0x13b/0x153
 ---[ end trace 36fc4958c27e64df ]---

and Norton Ghost 12 fails with:

*
Date   : Wed Nov 29 22:34:34 2017
Error Number: 36000
Message: A signal or windows exception occurred
Version: 12.0.0.10561 (Nov  2 2017, Build=10561)
OS Version: Linux 4.4.100-pclos1.pae #1 SMP Tue Nov 21 16:10:25 EET 2017 i686
Command line arguments:
Active Switches :
   AutoName
PathName:
FlagImplode : 0
FlagExplode : 0

Operation Details :
  Total size.........0
  MB copied..........0
  MB remaining.......0
  Percent complete...0%
  Speed..............0 MB/min
  Time elapsed.......0:00
  Time remaining.....0:00

Processor exception
Generated at HardExceptionHandlerLinux.cpp:230

Program Call Stack
VolumeContainerManagerLvm::addIfPv
VolumeContainerManagerLvm::loadPvs
VolumeContainerManagerLvm::loadVolumeGroups
VolumeContainerManagerLvm::loadVolumeGroups
VolumeContainerManagerLvm::constructor
VolumeContainerManager::loadVolumeManagers
VolumeContainerManager::constructor
sub_main
main

Call Stack
  0xeba0f62a
  0xb77e8a0c
  0x084644f6
  0x084645ce
  0x086ad8e0
  0x086b3aac
  0x0870f00e
  0x0870f3a3
  0x0870f51b
  0x0870f7ab
  0x0870f8de
  0x086ff373
  0x086ffdab
  0x08700039
  0x08284c59
  0x08286679
  0x082893df
  0x08052d25
  0x080541b4
End Call Stack

Anyway, I reverted the scsi: sg: changes introduced in 4.4.89 and everything
works fine (see the attachment). I am now running 4.4.100 with the attached
patch.

Thanks!



Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()

2017-12-04 Thread Ming Lei
On Mon, Dec 04, 2017 at 09:44:55AM +0100, Johannes Thumshirn wrote:
> Ming Lei  writes:
> 
> 
> > I am happy to do that, but I have been very busy recently, so it may
> > take me a while.
> >
> > Anyone should be able to reproduce the issue 100% of the time on a
> > v4.15-rc kernel just by running the above script. No specific hardware
> > is required, so anyone who is interested can write a blktest patch to
> > test block/SCSI timeouts.
> 
> OK, let me see if I can spend some time on this in the next few days.

That is great, thanks!

-- 
Ming


Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()

2017-12-04 Thread Johannes Thumshirn
Ming Lei  writes:


> I am happy to do that, but I have been very busy recently, so it may
> take me a while.
>
> Anyone should be able to reproduce the issue 100% of the time on a
> v4.15-rc kernel just by running the above script. No specific hardware
> is required, so anyone who is interested can write a blktest patch to
> test block/SCSI timeouts.

OK, let me see if I can spend some time on this in the next few days.

Byte,
Johannes
-- 
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()

2017-12-04 Thread Ming Lei
On Mon, Dec 04, 2017 at 09:19:33AM +0100, Johannes Thumshirn wrote:
> 
> Hi Ming,
> 
> Ming Lei  writes:
> > This issue can be triggered by the following script:
> >
> > #!/bin/sh
> > rmmod scsi_debug
> > modprobe scsi_debug max_queue=1
> >
> > DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
> >
> > DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`
> >
> > echo "using scsi device $DEVICE"
> > echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
> > echo starting loop $i
> > echo "temporary write through" >$DISK_DIR/cache_type
> > echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
> > echo none > /sys/block/$DEVICE/queue/scheduler
> > dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
> > sleep 5
> > echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
> > wait
> > echo "SUCCESS"
> 
> Can you please submit a test-case for blktest as well, given you have a
> nice reproducer?

Hi Johannes,

I am happy to do that, but I have been very busy recently, so it may
take me a while.

Anyone should be able to reproduce the issue 100% of the time on a
v4.15-rc kernel just by running the above script. No specific hardware
is required, so anyone who is interested can write a blktest patch to
test block/SCSI timeouts.

Thanks,
Ming
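
On an affected kernel the reproducer's `dd` never returns, so an automated
test built from it would need a deadline of its own. The following is only a
sketch of one way to do that, not part of the patch or of blktests: the helper
name `run_with_deadline`, the one-second polling interval, and the exit status
124 for a hang (borrowed from the convention of coreutils `timeout`) are all
assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background and poll for its exit.
# Returns the command's own status, or 124 if it is still running after
# the given number of seconds.
run_with_deadline() {
    limit=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    # kill -0 only checks that the process still exists; it sends no signal.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            echo "HANG: '$*' still running after ${limit}s"
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # The process has exited; collect its exit status.
    wait "$pid"
}
```

With the reproducer's variables in place, the `dd` step could then be run as
`run_with_deadline 30 dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1`.
Polling is used here instead of killing the command because a task blocked in
uninterruptible I/O may not react to signals until the I/O completes.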


Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()

2017-12-04 Thread Johannes Thumshirn

Hi Ming,

Ming Lei  writes:
> This issue can be triggered by the following script:
>
>   #!/bin/sh
>   rmmod scsi_debug
>   modprobe scsi_debug max_queue=1
>
>   DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
>
>   DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`
>
>   echo "using scsi device $DEVICE"
>   echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
>   echo starting loop $i
>   echo "temporary write through" >$DISK_DIR/cache_type
>   echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
>   echo none > /sys/block/$DEVICE/queue/scheduler
>   dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
>   sleep 5
>   echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
>   wait
>   echo "SUCCESS"

Can you please submit a test-case for blktest as well, given you have a
nice reproducer?

Thanks,
Johannes
-- 
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850