[PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle
Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget
for blk-mq"), we run queue after 3ms if queue is idle and SCSI device
queue isn't ready, which is done in handling BLK_STS_RESOURCE. After
commit 0df21c86bdbf is introduced, queue won't be run any more under
this situation.

IO hang is observed when timeout happened, and this patch fixes the IO
hang issue by running queue after delay in scsi_dev_queue_ready, just
like non-mq. This issue can be triggered by the following script[1].

There is another issue which can be covered by running idle queue:
when .get_budget() is called on request coming from hctx->dispatch_list,
if one request just completes during .get_budget(), we can't depend on
SCSI's restart to make progress any more. This patch fixes the race too.

With this patch, we basically recover to previous behaviour(before commit
0df21c86bdbf) of handling idle queue when running out of resource.

[1] script for test/verify SCSI timeout
rmmod scsi_debug
modprobe scsi_debug max_queue=1

DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`

echo "using scsi device $DEVICE"
echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
echo "temporary write through" >$DISK_DIR/cache_type
echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
echo none > /sys/block/$DEVICE/queue/scheduler
dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
sleep 5
echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
wait
echo "SUCCESS"

Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
Signed-off-by: Ming Lei
---
 drivers/scsi/scsi_lib.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index db9556662e27..1816dd8259b3 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 out_put_device:
 	put_device(&sdev->sdev_gendev);
 out:
+	if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
+		blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
 	return false;
 }
--
2.9.5
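[Editor's note] The condition added by the patch above can be read as a small predicate: when no budget is available and nothing is in flight, no completion will ever restart the queue, so a delayed run has to be scheduled. The following is only a userspace sketch of that decision, not the kernel code. The struct sdev_model type and the need_delayed_run() name are hypothetical; the real patch reads sdev->device_busy atomically and calls scsi_device_blocked().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the scsi_device state the check reads. */
struct sdev_model {
	int device_busy;	/* commands currently in flight on the device */
	int device_blocked;	/* non-zero while the device is blocked */
};

/*
 * Returns true when a failed budget request should schedule a delayed
 * queue run: nothing is in flight (so no completion will restart the
 * queue) and the device is not blocked.
 */
bool need_delayed_run(const struct sdev_model *sdev)
{
	return sdev->device_busy == 0 && sdev->device_blocked == 0;
}
```

In the patch itself the "true" outcome corresponds to calling blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); a busy device is left alone because its next completion is expected to rerun the queue.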
Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()
On Tue, Dec 05, 2017 at 01:16:24PM +0800, Ming Lei wrote: > On Mon, Dec 04, 2017 at 11:48:07PM +, Holger Hoffstätte wrote: > > On Tue, 05 Dec 2017 06:45:08 +0800, Ming Lei wrote: > > > > > On Mon, Dec 04, 2017 at 03:09:20PM +, Bart Van Assche wrote: > > >> On Sun, 2017-12-03 at 00:31 +0800, Ming Lei wrote: > > >> > Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for > > >> > blk-mq") > > >> > > >> It might be safer to revert commit 0df21c86bdbf instead of trying to fix > > >> all > > >> issues introduced by that commit for kernel version v4.15 ... > > > > > > What are all issues in v4.15-rc? Up to now, it is the only issue reported, > > > and can be fixed by this simple patch, which one can be thought as cleanup > > > too. > > > > Even with this patch I've encountered at least one hang that > > seemed related. I'm using most of block/scsi-4.15 on top of 4.14 and > > the hang in question was on a rotating disk. It could be solved by > > activating > > a different scheduler on the hanging device; all hanging sync/df processes > > got > > unstuck and all was fine again, which leads me to believe that there is at > > least > > one more rare condition where delaying requests (as done in the budget > > patch) > > leads to a hang. > > > > This happened with mq-deadline which I was testing specifically to avoid > > any BFQ-related side effects. > > OK, this looks a new report. > > Without any log, we can't make any progress, and even we can't guess > what the issue is related with. > > Could you post your dmesg log(include the hang process stack trace)? And > dump the debugfs log by the following script when this hang happens? > > http://people.redhat.com/minlei/tests/tools/dump-blk-info > > BTW, you just need to pass the disk name to the script, such as: /dev/sda. 
Thinking about the issue further, this patch only covers the case of
scsi_set_blocked(), but doesn't consider the case in which .get_budget()
is called inside blk_mq_dispatch_rq_list() for a request coming from
hctx->dispatch_list.

If .get_budget() is called from blk_mq_do_dispatch_sched() or
blk_mq_do_dispatch_ctx(), we don't need to run the queue if the queue is
idle. But if it is called from blk_mq_dispatch_rq_list() for a request
coming from hctx->dispatch_list, we have to run the queue if the queue is
idle, as before.

So please ignore this patch; I will submit a V2 to cover both cases.

Thanks,
Ming
Re: [PATCH v3 00/22] qla2xxx: Bug fixes for 4.15-rc2
Hi Martin,

> On Dec 4, 2017, at 6:24 PM, Martin K. Petersen wrote:
>
> Himanshu,
>
>> drivers/scsi/qla2xxx/qla_def.h     |  49
>> drivers/scsi/qla2xxx/qla_gs.c      | 230 ++---
>> drivers/scsi/qla2xxx/qla_init.c    |  69 +--
>> drivers/scsi/qla2xxx/qla_iocb.c    |  13 ---
>> drivers/scsi/qla2xxx/qla_isr.c     |   7 +-
>> drivers/scsi/qla2xxx/qla_mbx.c     |   3 +-
>> drivers/scsi/qla2xxx/qla_mid.c     |  42 ---
>> drivers/scsi/qla2xxx/qla_os.c      |  78 ++---
>> drivers/scsi/qla2xxx/qla_target.c  |  60 +++---
>> drivers/scsi/qla2xxx/qla_version.h |   2 +-
>> 10 files changed, 405 insertions(+), 148 deletions(-)
>
> This looks pretty big for a series of bug fixes. Are all these patches
> really candidates for 4.15 and stable backports all the way back to
> 4.10?
>
> --
> Martin K. Petersen	Oracle Linux Engineering

Yes, please. I would like them backported to 4.10, since these issues were
discovered on a combination of 4.10/4.11 kernels.

Thanks,
- Himanshu
Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()
On Mon, Dec 04, 2017 at 11:48:07PM +0000, Holger Hoffstätte wrote:
> On Tue, 05 Dec 2017 06:45:08 +0800, Ming Lei wrote:
>
> > On Mon, Dec 04, 2017 at 03:09:20PM +0000, Bart Van Assche wrote:
> >> On Sun, 2017-12-03 at 00:31 +0800, Ming Lei wrote:
> >> > Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for
> >> > blk-mq")
> >>
> >> It might be safer to revert commit 0df21c86bdbf instead of trying to fix
> >> all issues introduced by that commit for kernel version v4.15 ...
> >
> > What are all issues in v4.15-rc? Up to now, it is the only issue reported,
> > and can be fixed by this simple patch, which one can be thought as cleanup
> > too.
>
> Even with this patch I've encountered at least one hang that
> seemed related. I'm using most of block/scsi-4.15 on top of 4.14 and
> the hang in question was on a rotating disk. It could be solved by activating
> a different scheduler on the hanging device; all hanging sync/df processes got
> unstuck and all was fine again, which leads me to believe that there is at
> least one more rare condition where delaying requests (as done in the budget
> patch) leads to a hang.
>
> This happened with mq-deadline which I was testing specifically to avoid
> any BFQ-related side effects.

OK, this looks like a new report.

Without any log, we can't make any progress, and we can't even guess
what the issue is related to.

Could you post your dmesg log (including the stack trace of the hung
process)? And could you dump the debugfs log with the following script
when this hang happens?

	http://people.redhat.com/minlei/tests/tools/dump-blk-info

BTW, you just need to pass the disk name to the script, such as: /dev/sda.

--
Ming
[PATCH v2] scsi_debug: add cdb_len parameter
While testing "sd: Micro-optimize READ / WRITE CDB encoding" patches
it was helpful to check various code paths associated with READ/WRITE
6, 10 and 16 byte cdb variants. There seems to be no user space "knobs"
to twiddle use_10_for_rw and friends in the scsi_device structure.
So add a parameter to scsi_debug called "cdb_len" for this purpose.

Changes since v1:
  - address most of the concerns from Bart Van Assche
  - keep driver version and date and tie them into some responses
    (e.g. version becomes INQUIRY Revision field)

Patch built on lk 4.15.0-rc2

Signed-off-by: Douglas Gilbert
---
 drivers/scsi/scsi_debug.c | 92 ---
 1 file changed, 87 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index e4f037f0f38b..691ce8f37d34 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -6,7 +6,7 @@
  * anything out of the ordinary is seen.
  * ^^^ Original ^^^
  *
- * Copyright (C) 2001 - 2016 Douglas Gilbert
+ * Copyright (C) 2001 - 2017 Douglas Gilbert
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -61,8 +61,8 @@
 #include "scsi_logging.h"

 /* make sure inq_product_rev string corresponds to this version */
-#define SDEBUG_VERSION "1.86"
-static const char *sdebug_version_date = "20160430";
+#define SDEBUG_VERSION "0187"	/* format to fit INQUIRY revision field */
+static const char *sdebug_version_date = "20171202";

 #define MY_NAME "scsi_debug"

@@ -105,6 +105,7 @@ static const char *sdebug_version_date = "20160430";
  * (id 0) containing 1 logical unit (lun 0). That is 1 device.
  */
 #define DEF_ATO 1
+#define DEF_CDB_LEN 10
 #define DEF_JDELAY 1		/* if > 0 unit is a jiffy */
 #define DEF_DEV_SIZE_MB 8
 #define DEF_DIF 0
@@ -571,6 +572,7 @@ static const struct opcode_info_t opcode_info_arr[SDEB_I_LAST_ELEMENT + 1] = {

 static int sdebug_add_host = DEF_NUM_HOST;
 static int sdebug_ato = DEF_ATO;
+static int sdebug_cdb_len = DEF_CDB_LEN;
 static int sdebug_jdelay = DEF_JDELAY;	/* if > 0 then unit is jiffies */
 static int sdebug_dev_size_mb = DEF_DEV_SIZE_MB;
 static int sdebug_dif = DEF_DIF;
@@ -797,6 +799,61 @@ static int scsi_debug_ioctl(struct scsi_device *dev, int cmd, void __user *arg)
 	/* return -ENOTTY; // correct return but upsets fdisk */
 }

+static void config_cdb_len(struct scsi_device *sdev)
+{
+	switch (sdebug_cdb_len) {
+	case 6:	/* suggest 6 byte READ, WRITE and MODE SENSE/SELECT */
+		sdev->use_10_for_rw = false;
+		sdev->use_16_for_rw = false;
+		sdev->use_10_for_ms = false;
+		break;
+	case 10: /* suggest 10 byte RWs and 6 byte MODE SENSE/SELECT */
+		sdev->use_10_for_rw = true;
+		sdev->use_16_for_rw = false;
+		sdev->use_10_for_ms = false;
+		break;
+	case 12: /* suggest 10 byte RWs and 10 byte MODE SENSE/SELECT */
+		sdev->use_10_for_rw = true;
+		sdev->use_16_for_rw = false;
+		sdev->use_10_for_ms = true;
+		break;
+	case 16:
+		sdev->use_10_for_rw = false;
+		sdev->use_16_for_rw = true;
+		sdev->use_10_for_ms = true;
+		break;
+	case 32: /* No knobs to suggest this so same as 16 for now */
+		sdev->use_10_for_rw = false;
+		sdev->use_16_for_rw = true;
+		sdev->use_10_for_ms = true;
+		break;
+	default:
+		pr_warn("unexpected cdb_len=%d, force to 10\n",
+			sdebug_cdb_len);
+		sdev->use_10_for_rw = true;
+		sdev->use_16_for_rw = false;
+		sdev->use_10_for_ms = false;
+		sdebug_cdb_len = 10;
+		break;
+	}
+}
+
+static void all_config_cdb_len(void)
+{
+	struct sdebug_host_info *sdbg_host;
+	struct Scsi_Host *shost;
+	struct scsi_device *sdev;
+
+	spin_lock(&sdebug_host_list_lock);
+	list_for_each_entry(sdbg_host, &sdebug_host_list, host_list) {
+		shost = sdbg_host->shost;
+		shost_for_each_device(sdev, shost) {
+			config_cdb_len(sdev);
+		}
+	}
+	spin_unlock(&sdebug_host_list_lock);
+}
+
 static void clear_luns_changed_on_target(struct sdebug_dev_info *devip)
 {
 	struct sdebug_host_info *sdhp;
@@ -955,7 +1012,7 @@ static int fetch_to_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,

 static char sdebug_inq_vendor_id[9] = "Linux ";
 static char sdebug_inq_product_id[17] = "scsi_debug ";
-static char sdebug_inq_product_rev[5] = "0186";/* version less '.' */
+static char sdebug_inq_product_rev[5] = SDEBUG_VERSION;

 /* Use some locally assigned NAAs for SAS addresses. */
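[Editor's note] The switch in config_cdb_len() above is effectively a lookup from the requested cdb_len to three per-device hints. A rough userspace sketch of that mapping follows, assuming the cases shown in the patch are complete. The struct cdb_hints type and the cdb_len_to_hints() name are hypothetical and exist only for illustration; the patch writes the fields of struct scsi_device directly.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three scsi_device hints the patch sets. */
struct cdb_hints {
	bool use_10_for_rw;
	bool use_16_for_rw;
	bool use_10_for_ms;
};

/*
 * Map a requested cdb_len to the hints, following the switch in
 * config_cdb_len(). Unexpected values fall back to 10, as in the patch.
 * Returns the cdb_len that was actually applied.
 */
int cdb_len_to_hints(int cdb_len, struct cdb_hints *h)
{
	switch (cdb_len) {
	case 6:		/* 6 byte READ, WRITE and MODE SENSE/SELECT */
		*h = (struct cdb_hints){ false, false, false };
		return 6;
	case 10:	/* 10 byte RWs, 6 byte MODE SENSE/SELECT */
		*h = (struct cdb_hints){ true, false, false };
		return 10;
	case 12:	/* 10 byte RWs, 10 byte MODE SENSE/SELECT */
		*h = (struct cdb_hints){ true, false, true };
		return 12;
	case 16:
	case 32:	/* no separate hint for 32 byte cdbs, same as 16 */
		*h = (struct cdb_hints){ false, true, true };
		return cdb_len;
	default:	/* unexpected value: force to 10 */
		*h = (struct cdb_hints){ true, false, false };
		return 10;
	}
}
```

Returning the applied value mirrors the way the patch resets sdebug_cdb_len to 10 in the default case, so a caller can tell that its request was overridden.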
Re: [PATCH 1/2] scsi-mq: Only show the CDB if available
On Tue, Dec 05, 2017 at 01:59:51AM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 09:15 +0800, Ming Lei wrote:
> > On Mon, Dec 04, 2017 at 04:38:08PM -0800, Bart Van Assche wrote:
> > > Since the next patch will make it possible that scsi_show_rq() gets
> > > called before the CDB pointer is changed into a non-NULL value,
> > > only show the CDB if the CDB pointer is not NULL. Additionally,
> > > show the request timeout and SCSI command flags. This patch also
> > > fixes a bug that was reported by Ming Lei. See also Ming Lei,
> > > scsi_debugfs: fix crash in scsi_show_rq(), linux-scsi, 7 November
> > > 2017 (https://marc.info/?l=linux-block&m=151006655317188).
> >
> > Please cook a patch for fixing the crash issue only, since we need
> > to backport the fix to stable kernel.
>
> The code that is touched by this patch is only used for kernel debugging.
> I will do this if others agree with your opinion.

No, do not mix two different things in one patch, especially since the fix
part needs to be backported to stable.

The fix part should aim at V4.15, and the other part can be V4.16 material.

--
Ming
Re: [PATCH v3 00/22] qla2xxx: Bug fixes for 4.15-rc2
Himanshu,

> drivers/scsi/qla2xxx/qla_def.h     |  49
> drivers/scsi/qla2xxx/qla_gs.c      | 230 ++---
> drivers/scsi/qla2xxx/qla_init.c    |  69 +--
> drivers/scsi/qla2xxx/qla_iocb.c    |  13 ---
> drivers/scsi/qla2xxx/qla_isr.c     |   7 +-
> drivers/scsi/qla2xxx/qla_mbx.c     |   3 +-
> drivers/scsi/qla2xxx/qla_mid.c     |  42 ---
> drivers/scsi/qla2xxx/qla_os.c      |  78 ++---
> drivers/scsi/qla2xxx/qla_target.c  |  60 +++---
> drivers/scsi/qla2xxx/qla_version.h |   2 +-
> 10 files changed, 405 insertions(+), 148 deletions(-)

This looks pretty big for a series of bug fixes. Are all these patches
really candidates for 4.15 and stable backports all the way back to
4.10?

--
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH 1/2] scsi-mq: Only show the CDB if available
On Mon, Dec 04, 2017 at 10:42:28PM -0500, Martin K. Petersen wrote:
>
> Hi Ming,
>
> > Please cook a patch for fixing the crash issue only, since we need
> > to backport the fix to stable kernel.
>
> I thought you were going to submit a V5 that addressed James' concerns?
>
> --
> Martin K. Petersen	Oracle Linux Engineering

Hi Martin,

I replied to James's concerns at the following link:

	https://marc.info/?l=linux-block&m=151074751321108&w=2

The fact is that use-after-free can't be avoided at all, no matter whether
we set the cmnd to NULL before calling free. That means we have to handle
use-after-free well in scsi_show_rq(), so we don't need to touch the free
code.

So V4 is good enough for merge, IMO.

Thanks,
Ming
Re: [PATCH 1/2] scsi-mq: Only show the CDB if available
Hi Ming, > Please cook a patch for fixing the crash issue only, since we need > to backport the fix to stable kernel. I thought you were going to submit a V5 that addressed James' concerns? -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] ibmvscsis: add DRC indices to debug statements
Bryant, > Where applicable, changes pr_debug, pr_info, pr_err, etc. calls > to the dev_* versions. This adds the DRC index of the device to the > corresponding trace statement. Applied to 4.16/scsi-queue, thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: csiostor: fix spelling mistake: "Couldnt" -> "Couldn't"
Colin, > Trivial fix to spelling mistake in error message text. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: ipr: fix incorrect indentation of assignment statement
Colin, > Remove one extraneous level of indentation on an assignment statement. Applied to 4.16/scsi-queue, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: bnx2fc: fix spelling mistake: "Couldnt" -> "Couldn't"
Colin, > Trivial fix to spelling mistake in error message text. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: sd: add missing KERN_CONT for disk spin-up
Michał, > KERN_CONT is now required for continued printks(). Add it. Applied to 4.16/scsi-queue. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 1/2] scsi: ufs: add some definition included in UFS HCI specifications
> These would be used in the future in some specific drivers. Applied to 4.16/scsi-queue. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] mpt3sas: Remove unused variable requeue_event
Suganath, > No Functional change just cleanup, > Removed variable requeue_event and made function as void. Applied to 4.16/scsi-queue. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v15 4/5] scsi: mpt3sas: Replace PCI pool old API
Romain, > The PCI pool API is deprecated. This commit replaces the PCI pool old > API by the appropriate function with the DMA pool API. Applied to 4.16/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] [SCSI] fnic: Fix coccinelle warnings
Vasyl, > Remove the duplicate copies of this simple function and use an > open-coded version. Applied to 4.16/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: scsi_devinfo: handle non-terminated strings
Martin, > devinfo->vendor and devinfo->model aren't necessarily > zero-terminated. Applied to 4.15/scsi-fixes. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi_devinfo: cleanly zero-pad devinfo strings
Martin, > Cleanly fill memory for "vendor" and "model" with 0-bytes for the > "compatible" case rather than adding only a single 0 byte. This > simplifies the devinfo code a bit, and avoids mistakes in other > places of the code (not in current upstream, but we had one such > mistake in the SUSE kernel). Applied to 4.15/scsi-fixes. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 1/2] scsi-mq: Only show the CDB if available
On Tue, 2017-12-05 at 09:15 +0800, Ming Lei wrote: > On Mon, Dec 04, 2017 at 04:38:08PM -0800, Bart Van Assche wrote: > > Since the next patch will make it possible that scsi_show_rq() gets > > called before the CDB pointer is changed into a non-NULL value, > > only show the CDB if the CDB pointer is not NULL. Additionally, > > show the request timeout and SCSI command flags. This patch also > > fixes a bug that was reported by Ming Lei. See also Ming Lei, > > scsi_debugfs: fix crash in scsi_show_rq(), linux-scsi, 7 November > > 2017 (https://marc.info/?l=linux-block&m=151006655317188). > > Please cook a patch for fixing the crash issue only, since we need > to backport the fix to stable kernel. The code that is touched by this patch is only used for kernel debugging. I will do this if others agree with your opinion. Bart.
Re: [PATCH] blk-mq: Fix several SCSI request queue lockups
On Tue, Dec 05, 2017 at 01:13:43AM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 09:04 +0800, Ming Lei wrote:
> > Then no reason to revert commit(0df21c86bdbf scsi: implement .get_budget and
> > .put_budget for blk-mq) for one issue which may never happen in reality
> > since this reproducer need out-of-tree patch.
>
> Sorry but I disagree completely. You seem to overlook that there may be other
> circumstances that trigger the same lockup, e.g. a SCSI queue full condition.

If scsi_dev_queue_ready() returns false, .get_budget() catches that and
never adds the request to hctx->dispatch.

And scsi_host_queue_ready() always returns true, since we respect the
per-host queue depth via blk_mq_get_driver_tag() before calling .queue_rq().

If I am missing other cases, please point them out.

--
Ming
Re: [PATCH 1/2] scsi-mq: Only show the CDB if available
On Mon, Dec 04, 2017 at 04:38:08PM -0800, Bart Van Assche wrote:
> Since the next patch will make it possible that scsi_show_rq() gets
> called before the CDB pointer is changed into a non-NULL value,
> only show the CDB if the CDB pointer is not NULL. Additionally,
> show the request timeout and SCSI command flags. This patch also
> fixes a bug that was reported by Ming Lei. See also Ming Lei,
> scsi_debugfs: fix crash in scsi_show_rq(), linux-scsi, 7 November
> 2017 (https://marc.info/?l=linux-block&m=151006655317188).

Please cook a patch for fixing the crash issue only, since we need
to backport the fix to stable kernel.

>
> Signed-off-by: Bart Van Assche
> Cc: James E.J. Bottomley
> Cc: Martin K. Petersen
> Cc: Ming Lei
> Cc: Christoph Hellwig
> Cc: Hannes Reinecke
> Cc: Johannes Thumshirn

Please Cc:

> ---
>  drivers/scsi/scsi_debugfs.c | 47 -
>  1 file changed, 42 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
> index 01f08c03f2c1..37ed6bb8e6ec 100644
> --- a/drivers/scsi/scsi_debugfs.c
> +++ b/drivers/scsi/scsi_debugfs.c
> @@ -4,13 +4,50 @@
>  #include
>  #include "scsi_debugfs.h"
>
> +#define SCSI_CMD_FLAG_NAME(name) [ilog2(SCMD_##name)] = #name
> +static const char *const scsi_cmd_flags[] = {
> +	SCSI_CMD_FLAG_NAME(TAGGED),
> +	SCSI_CMD_FLAG_NAME(UNCHECKED_ISA_DMA),
> +	SCSI_CMD_FLAG_NAME(ZONE_WRITE_LOCK),
> +	SCSI_CMD_FLAG_NAME(INITIALIZED),
> +};
> +#undef SCSI_CMD_FLAG_NAME
> +
> +static int scsi_flags_show(struct seq_file *m, const unsigned long flags,
> +			   const char *const *flag_name, int flag_name_count)
> +{
> +	bool sep = false;
> +	int i;
> +
> +	for (i = 0; i < sizeof(flags) * BITS_PER_BYTE; i++) {
> +		if (!(flags & BIT(i)))
> +			continue;
> +		if (sep)
> +			seq_puts(m, "|");
> +		sep = true;
> +		if (i < flag_name_count && flag_name[i])
> +			seq_puts(m, flag_name[i]);
> +		else
> +			seq_printf(m, "%d", i);
> +	}
> +	return 0;
> +}
> +
>  void scsi_show_rq(struct seq_file *m, struct request *rq)
>  {
>  	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
> -	int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> -	char buf[80];
> +	int alloc_ms = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> +	int timeout_ms = jiffies_to_msecs(rq->timeout);
> +	const u8 *const cdb = READ_ONCE(cmd->cmnd);
> +	char buf[80] = "(?)";
>
> -	__scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
> -	seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
> -		   cmd->retries, msecs / 1000, msecs % 1000);
> +	if (cdb)
> +		__scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
> +	seq_printf(m, ", .cmd=%s, .retries=%d, .result = %#x, .flags=", buf,
> +		   cmd->retries, cmd->result);
> +	scsi_flags_show(m, cmd->flags, scsi_cmd_flags,
> +			ARRAY_SIZE(scsi_cmd_flags));
> +	seq_printf(m, ", .timeout=%d.%03d, allocated %d.%03d s ago",
> +		   timeout_ms / 1000, timeout_ms % 1000,
> +		   alloc_ms / 1000, alloc_ms % 1000);
>  }
> --
> 2.15.0
>

--
Ming
Re: [PATCH] blk-mq: Fix several SCSI request queue lockups
On Tue, 2017-12-05 at 09:04 +0800, Ming Lei wrote:
> Then no reason to revert commit(0df21c86bdbf scsi: implement .get_budget and
> .put_budget for blk-mq) for one issue which may never happen in reality since
> this reproducer need out-of-tree patch.

Sorry but I disagree completely. You seem to overlook that there may be other
circumstances that trigger the same lockup, e.g. a SCSI queue full condition.

Bart.
Re: [PATCH] blk-mq: Fix several SCSI request queue lockups
On Tue, Dec 05, 2017 at 12:29:59AM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 08:20 +0800, Ming Lei wrote:
> > Also it is a bit odd to see request in hctx->dispatch now, and it can only
> > happen now when scsi_target_queue_ready() returns false, so I guess you
> > apply some change on target->can_queue(such as setting it as 1 in srp/ib
> > code manually)?
>
> Yes, but that had already been mentioned. From the e-mail at the start of
> this e-mail thread: "Change the SRP initiator such that SCSI target queue
> depth is limited to 1." The changes I made in the SRP initiator are the same
> as those described in the following message from about one month ago:
> https://www.spinics.net/lists/linux-scsi/msg114720.html.

OK, got it.

Then there is no reason to revert commit 0df21c86bdbf ("scsi: implement
.get_budget and .put_budget for blk-mq") for one issue which may never
happen in reality, since this reproducer needs an out-of-tree patch.

I don't mean it isn't an issue, but I don't think it has top priority for
reverting commit 0df21c86bdbf. In particular, no proof has been shown that
0df21c86bdbf causes this issue, since this commit doesn't change how the
queue is run for requests in hctx->dispatch_list.

I'd like to take a look if someone is willing to cooperate, e.g. by
providing the kernel log or testing a debug patch, or when I get this
hardware to reproduce the issue.

--
Ming
[PATCH 1/2] scsi-mq: Only show the CDB if available
Since the next patch will make it possible that scsi_show_rq() gets
called before the CDB pointer is changed into a non-NULL value,
only show the CDB if the CDB pointer is not NULL. Additionally,
show the request timeout and SCSI command flags. This patch also
fixes a bug that was reported by Ming Lei. See also Ming Lei,
scsi_debugfs: fix crash in scsi_show_rq(), linux-scsi, 7 November
2017 (https://marc.info/?l=linux-block&m=151006655317188).

Signed-off-by: Bart Van Assche
Cc: James E.J. Bottomley
Cc: Martin K. Petersen
Cc: Ming Lei
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
---
 drivers/scsi/scsi_debugfs.c | 47 -
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index 01f08c03f2c1..37ed6bb8e6ec 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -4,13 +4,50 @@
 #include
 #include "scsi_debugfs.h"

+#define SCSI_CMD_FLAG_NAME(name) [ilog2(SCMD_##name)] = #name
+static const char *const scsi_cmd_flags[] = {
+	SCSI_CMD_FLAG_NAME(TAGGED),
+	SCSI_CMD_FLAG_NAME(UNCHECKED_ISA_DMA),
+	SCSI_CMD_FLAG_NAME(ZONE_WRITE_LOCK),
+	SCSI_CMD_FLAG_NAME(INITIALIZED),
+};
+#undef SCSI_CMD_FLAG_NAME
+
+static int scsi_flags_show(struct seq_file *m, const unsigned long flags,
+			   const char *const *flag_name, int flag_name_count)
+{
+	bool sep = false;
+	int i;
+
+	for (i = 0; i < sizeof(flags) * BITS_PER_BYTE; i++) {
+		if (!(flags & BIT(i)))
+			continue;
+		if (sep)
+			seq_puts(m, "|");
+		sep = true;
+		if (i < flag_name_count && flag_name[i])
+			seq_puts(m, flag_name[i]);
+		else
+			seq_printf(m, "%d", i);
+	}
+	return 0;
+}
+
 void scsi_show_rq(struct seq_file *m, struct request *rq)
 {
 	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
-	int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
-	char buf[80];
+	int alloc_ms = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
+	int timeout_ms = jiffies_to_msecs(rq->timeout);
+	const u8 *const cdb = READ_ONCE(cmd->cmnd);
+	char buf[80] = "(?)";

-	__scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
-	seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
-		   cmd->retries, msecs / 1000, msecs % 1000);
+	if (cdb)
+		__scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
+	seq_printf(m, ", .cmd=%s, .retries=%d, .result = %#x, .flags=", buf,
+		   cmd->retries, cmd->result);
+	scsi_flags_show(m, cmd->flags, scsi_cmd_flags,
+			ARRAY_SIZE(scsi_cmd_flags));
+	seq_printf(m, ", .timeout=%d.%03d, allocated %d.%03d s ago",
+		   timeout_ms / 1000, timeout_ms % 1000,
+		   alloc_ms / 1000, alloc_ms % 1000);
 }
--
2.15.0
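[Editor's note] The scsi_flags_show() helper above walks the set bits of a flags word and prints either the bit's name or, when no name is known, its bit number, separated by "|". The following is a userspace sketch of the same formatting that writes into a buffer with snprintf() instead of a seq_file. The format_flags() name and its buffer-based interface are assumptions made for illustration and are not part of the patch.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace sketch of the bit-to-name formatting: writes the names of
 * the bits set in 'flags' into 'out', separated by '|'. Bits without a
 * name are printed as their bit number. Returns the length of the
 * resulting string; output that does not fit is truncated.
 */
size_t format_flags(char *out, size_t out_len, unsigned long flags,
		    const char *const *names, size_t name_count)
{
	size_t used = 0;
	int first = 1;

	if (out_len == 0)
		return 0;
	out[0] = '\0';

	for (size_t i = 0; i < sizeof(flags) * CHAR_BIT; i++) {
		int n;

		if (!(flags & (1UL << i)))
			continue;
		if (i < name_count && names[i])
			n = snprintf(out + used, out_len - used, "%s%s",
				     first ? "" : "|", names[i]);
		else
			n = snprintf(out + used, out_len - used, "%s%zu",
				     first ? "" : "|", i);
		if (n < 0) {
			/* Formatting error: keep what was written so far. */
			out[used] = '\0';
			break;
		}
		if ((size_t)n >= out_len - used) {
			/* Truncated: snprintf filled the buffer and terminated it. */
			used = out_len - 1;
			break;
		}
		used += (size_t)n;
		first = 0;
	}
	return used;
}
```

With a name table such as { "TAGGED", NULL, "ZONE_WRITE_LOCK" } (illustrative names indexed by bit position), a flags value with bits 0 and 2 set is rendered as the names joined by "|", while bits outside the table fall back to their numeric index, matching the seq_printf(m, "%d", i) branch in the patch.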
[PATCH 2/2] blk-mq-debugfs: Also show requests that have not yet been started
When debugging e.g. the SCSI timeout handler it is important that
requests that have not yet been started or that already have
completed are also reported through debugfs.

Signed-off-by: Bart Van Assche
Cc: Ming Lei
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Martin K. Petersen
---
 block/blk-mq-debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f7db73f1698e..886b37163f17 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -409,7 +409,7 @@ static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
 	const struct show_busy_params *params = data;

 	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx &&
-	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	    list_empty(&rq->queuelist))
 		__blk_mq_debugfs_rq_show(params->m,
 					 list_entry_rq(&rq->queuelist));
 }
--
2.15.0
[PATCH 0/2] Show commands stuck in a timeout handler in debugfs
Hello Jens,

While debugging an issue with the SCSI error handler I noticed that commands
that got stuck in that error handler are not shown in debugfs. That is very
annoying for anyone who relies on the information in debugfs for root-causing
such an issue. Hence this patch series that makes sure that commands that got
stuck in a block driver timeout handler are also shown in debugfs. Please
consider these patches for kernel v4.16.

Thanks,

Bart.

Bart Van Assche (2):
  scsi-mq: Only show the CDB if available
  blk-mq-debugfs: Also show requests that have not yet been started

 block/blk-mq-debugfs.c      |  2 +-
 drivers/scsi/scsi_debugfs.c | 47 -
 2 files changed, 43 insertions(+), 6 deletions(-)

--
2.15.0
Re: [PATCH] blk-mq: Fix several SCSI request queue lockups
On Tue, 2017-12-05 at 08:20 +0800, Ming Lei wrote: > Also it is a bit odd to see request in hctx->dispatch now, and it can only > happen now when scsi_target_queue_ready() returns false, so I guess you apply > some change on target->can_queue(such as setting it as 1 in srp/ib code > manually)? Yes, but that had already been mentioned. From the e-mail at the start of this e-mail thread: "Change the SRP initiator such that SCSI target queue depth is limited to 1." The changes I made in the SRP initiator are the same as those described in the following message from about one month ago: https://www.spinics.net/lists/linux-scsi/msg114720.html. Bart.
Re: [PATCH] blk-mq: Fix several SCSI request queue lockups
On Mon, Dec 04, 2017 at 11:32:27PM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 07:01 +0800, Ming Lei wrote:
> > On Mon, Dec 04, 2017 at 10:48:18PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote:
> > > > On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> > > > > * A systematic lockup for SCSI queues with queue depth 1. The
> > > > >   following test reproduces that bug systematically:
> > > > >   - Change the SRP initiator such that SCSI target queue depth is
> > > > >     limited to 1.
> > > > >   - Run the following command:
> > > > >       srp-test/run_tests -f xfs -d -e none -r 60 -t 01
> > > > >   See also "[PATCH 4/7] blk-mq: Avoid that request processing
> > > > >   stalls when sharing tags"
> > > > >   (https://marc.info/?l=linux-block&m=151208695316857). Note:
> > > > >   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
> > > > >   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
> > > > >   before all blk_mq_dispatch_rq_list() calls only fixes the
> > > > >   systematic lockup for queue depth 1.
> > > >
> > > > You are the only reproducer [ ... ]
> > >
> > > That's not correct. I'm pretty sure if you try to reproduce this that
> > > you will see the same hang I ran into. Does this mean that you have not
> > > yet tried to reproduce the hang I reported?
> >
> > Do you mean every kernel developer has to own one SRP/IB hardware?
>
> When I have the time I will make it possible to run this test on any system
> equipped with at least one Ethernet port. But the fact that the test I
> mentioned requires IB hardware should not prevent you from running this test
> since you have run this test software before.
>
> > I don't have your hardware to reproduce that,
>
> That's not true. Your employer definitely owns IB hardware. E.g. the
> following message shows that you have run the srp-test yourself on IB hardware
> only four weeks ago:
>
> https://www.spinics.net/lists/linux-block/msg19511.html

The hardware belongs to Laurence. At that time I could borrow it from him,
and now I am not sure if it is available.

>
> > Otherwise, there should have be such similar reports from others, not from
> > only you.
>
> That's not correct either. How long was it ago that kernel v4.15-rc1 was
> released? One week? How many SRP users do you think have tried to trigger a
> queue full condition with that kernel version?

OK, we can wait for further reporters to provide a kernel log if you don't
want to.

>
> > More importantly I don't understand why you can't share the kernel
> > log/debugfs log when IO hang happens?
> >
> > Without any kernel log, how can we confirm that it is a valid report?
>
> It's really unfortunate that you are focussing on denying that the bug I
> reported exists instead of trying to fix the bugs introduced by commit

If you look at bug reports on the kernel mailing lists, you will see that
most reports include some kind of log; it is common practice to report an
issue with a log attached. It can save us much time spent talking in mails.

> b347689ffbca. BTW, I have more than enough experience to decide myself what
> a valid report is and what not. I can easily send you several MB of kernel

As I mentioned, only dmesg with the hang trace and the debugfs log should be
enough; neither can be that big, right?

> logs. The reason I have not yet done this is because I'm 99.9% sure that
> these won't help to root cause the reported issue. But I can tell you what

That is your opinion. Most of the time I can find some clue about a hang
issue in debugfs, and then I can add traces in just a few likely places to
narrow down the issue.

> I learned from analyzing the information under /sys/kernel/debug/block:
> every time a hang occurred I noticed that no requests were "busy", that two
> requests occurred in rq_lists and one request occurred in a hctx dispatch

Then what do the other attributes show, such as the queue/hctx state? The
following script can get all this info easily:

	http://people.redhat.com/minlei/tests/tools/dump-blk-info

Also it is a bit odd to see a request in hctx->dispatch now; it can only
happen now when scsi_target_queue_ready() returns false, so I guess you
applied some change to target->can_queue (such as setting it to 1 in the
srp/ib code manually)?

Please reply; if yes, I will try to see if I can reproduce it with this
kind of change on scsi_debug.

> list. This is enough to conclude that a queue run was missing. And I think

In this case, it seems unrelated to both commit b347689ff and commit
0df21c86bdbf, since neither changes RESTART for hctx->dispatch, and neither
should affect running the queue.

> that the patch at the start of this e-mail thread not only shows that the
> root cause is in the block layer but also that this bug was introduced by
> commit b347689ffbca.
>
> > > > You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
> > > > improve dispatching from sw queue")', but you don't mention any i
Re: [PATCH] blk-mq: Fix several SCSI request queue lockups
On Tue, 2017-12-05 at 07:01 +0800, Ming Lei wrote: > On Mon, Dec 04, 2017 at 10:48:18PM +, Bart Van Assche wrote: > > On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote: > > > On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote: > > > > * A systematic lockup for SCSI queues with queue depth 1. The > > > > following test reproduces that bug systematically: > > > > - Change the SRP initiator such that SCSI target queue depth is > > > > limited to 1. > > > > - Run the following command: > > > > srp-test/run_tests -f xfs -d -e none -r 60 -t 01 > > > > See also "[PATCH 4/7] blk-mq: Avoid that request processing > > > > stalls when sharing tags" > > > > (https://marc.info/?l=linux-block&m=151208695316857). Note: > > > > reverting commit 0df21c86bdbf also fixes a sporadic SCSI request > > > > queue lockup while inserting a blk_mq_sched_mark_restart_hctx() > > > > before all blk_mq_dispatch_rq_list() calls only fixes the > > > > systematic lockup for queue depth 1. > > > > > > You are the only reproducer [ ... ] > > > > That's not correct. I'm pretty sure if you try to reproduce this that > > you will see the same hang I ran into. Does this mean that you have not > > yet tried to reproduce the hang I reported? > > Do you mean every kernel developer has to own one SRP/IB hardware? When I have the time I will make it possible to run this test on any system equipped with at least one Ethernet port. But the fact that the test I mentioned requires IB hardware should not prevent you from running this test since you have run this test software before. > I don't have your hardware to reproduce that, That's not true. Your employer definitely owns IB hardware. E.g. the following message shows that you have run the srp-test yourself on IB hardware only four weeks ago: https://www.spinics.net/lists/linux-block/msg19511.html > Otherwise, there should have be such similar reports from others, not from > only you. That's not correct either. 
How long was it ago that kernel v4.15-rc1 was released? One week? How many SRP users do you think have tried to trigger a queue full condition with that kernel version? > More importantly I don't understand why you can't share the kernel > log/debugfs log when IO hang happens? > > Without any kernel log, how can we confirm that it is a valid report? It's really unfortunate that you are focussing on denying that the bug I reported exists instead of trying to fix the bugs introduced by commit b347689ffbca. BTW, I have more than enough experience to decide myself what a valid report is and what not. I can easily send you several MB of kernel logs. The reason I have not yet done this is because I'm 99.9% sure that these won't help to root cause the reported issue. But I can tell you what I learned from analyzing the information under /sys/kernel/debug/block: every time a hang occurred I noticed that no requests were "busy", that two requests occurred in rq_lists and one request occurred in a hctx dispatch list. This is enough to conclude that a queue run was missing. And I think that the patch at the start of this e-mail thread not only shows that the root cause is in the block layer but also that this bug was introduced by commit b347689ffbca. > > > You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched: > > > improve dispatching from sw queue")', but you don't mention any issue > > > about that commit. > > > > That's not correct either. From the commit message "A systematic lockup > > for SCSI queues with queue depth 1." > > I mean you mentioned your patch can fix 'commit b347689ffbca > ("blk-mq-sched: improve dispatching from sw queue")', but you never > point where the commit b347689ffbca is wrong, how your patch fixes > the mistake of that commit. You should know that it is not required to perform a root cause analysis before posting a revert. Having performed a bisect is sufficient. 
BTW, it seems like you forgot that last Friday I explained to you that there is an obvious bug in commit b347689ffbca, namely that a blk_mq_sched_mark_restart_hctx() call is missing in blk_mq_sched_dispatch_requests() before the blk_mq_do_dispatch_ctx() call. See also https://marc.info/?l=linux-block&m=151215794224401. Bart.
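The hang signature reported in this message (no request busy, requests still queued) is what one would expect whenever a queue is not rerun after a dispatch attempt that found no free device slot. The following is a minimal sketch of that idea as a userspace toy model, not blk-mq code: every toy_* name is hypothetical, the device is assumed to have queue depth 1, and the "restart" flag only loosely mirrors what blk_mq_sched_mark_restart_hctx() is described as doing in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the blk-mq data structures. */
struct toy_hctx {
	int budget;     /* free device slots (queue depth 1 => at most 1) */
	int pending;    /* requests waiting to be dispatched */
	int in_flight;  /* requests currently owned by the device */
	bool restart;   /* "rerun this queue when a request completes" */
};

/* Dispatch as many requests as the budget allows. */
static void toy_dispatch(struct toy_hctx *h, bool mark_restart)
{
	if (mark_restart)
		h->restart = true;  /* set before dispatching, so a completion sees it */

	while (h->pending > 0 && h->budget > 0) {
		h->budget--;
		h->pending--;
		h->in_flight++;
	}
}

/* Completion path: give the slot back and rerun the queue only if asked to. */
static void toy_complete(struct toy_hctx *h, bool mark_restart)
{
	assert(h->in_flight > 0);
	h->in_flight--;
	h->budget++;

	if (h->restart) {
		h->restart = false;
		toy_dispatch(h, mark_restart);
	}
}

/* Returns how many requests are left stuck once nothing is in flight. */
static int toy_run(bool mark_restart)
{
	struct toy_hctx h = { .budget = 1, .pending = 3,
			      .in_flight = 0, .restart = false };

	toy_dispatch(&h, mark_restart);
	while (h.in_flight > 0)
		toy_complete(&h, mark_restart);

	return h.pending;
}
```

In this model toy_run(true) drains all three requests, while toy_run(false) ends with nothing in flight and two requests still pending, because no completion ever reruns the queue. Whether this is the mechanism behind the reported hang is exactly what the thread is arguing about; the sketch only shows why a missing restart mark can stall a depth-1 queue.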
Re: [PATCH] blk-mq: Fix several SCSI request queue lockups
On Mon, Dec 04, 2017 at 10:48:18PM +, Bart Van Assche wrote: > On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote: > > On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote: > > > * A systematic lockup for SCSI queues with queue depth 1. The > > > following test reproduces that bug systematically: > > > - Change the SRP initiator such that SCSI target queue depth is > > > limited to 1. > > > - Run the following command: > > > srp-test/run_tests -f xfs -d -e none -r 60 -t 01 > > > See also "[PATCH 4/7] blk-mq: Avoid that request processing > > > stalls when sharing tags" > > > (https://marc.info/?l=linux-block&m=151208695316857). Note: > > > reverting commit 0df21c86bdbf also fixes a sporadic SCSI request > > > queue lockup while inserting a blk_mq_sched_mark_restart_hctx() > > > before all blk_mq_dispatch_rq_list() calls only fixes the > > > systematic lockup for queue depth 1. > > > > You are the only reproducer [ ... ] > > That's not correct. I'm pretty sure if you try to reproduce this that > you will see the same hang I ran into. Does this mean that you have not > yet tried to reproduce the hang I reported? Do you mean every kernel developer has to own one SRP/IB hardware? I don't have your hardware to reproduce that, and I don't think most of guys have that. Otherwise, there should have be such similar reports from others, not from only you. More importantly I don't understand why you can't share the kernel log/debugfs log when IO hang happens? Without any kernel log, how can we confirm that it is a valid report? > > > You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched: > > improve dispatching from sw queue")', but you don't mention any issue > > about that commit. > > That's not correct either. From the commit message "A systematic lockup > for SCSI queues with queue depth 1." 
I mean you mentioned your patch can fix 'commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue")', but you never point where the commit b347689ffbca is wrong, how your patch fixes the mistake of that commit. > > > > I think the above means that it is too risky to try to fix all bugs > > > introduced by commit 0df21c86bdbf before kernel v4.15 is released. > > > Hence revert that commit. > > > > What is the risk? > > That more bugs were introduced by commit 0df21c86bdbf than the ones that > have been discovered so far. If you don't provide any log, I have to ignore your report simply. So there is only one real issue which can be addressed easily by the following patch: https://marc.info/?l=linux-scsi&m=151223234607157&w=2 -- Ming
Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()
On Tue, 2017-12-05 at 06:45 +0800, Ming Lei wrote: > On Mon, Dec 04, 2017 at 03:09:20PM +, Bart Van Assche wrote: > > On Sun, 2017-12-03 at 00:31 +0800, Ming Lei wrote: > > > Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for > > > blk-mq") > > > > It might be safer to revert commit 0df21c86bdbf instead of trying to fix all > > issues introduced by that commit for kernel version v4.15 ... > > What are all issues in v4.15-rc? Up to now, it is the only issue reported, > and can be fixed by this simple patch, which one can be thought as cleanup > too. The three issues I described in the commit message of the patch that is available at: https://marc.info/?l=linux-block&m=151240866010572. Bart.
Re: [PATCH] blk-mq: Fix several SCSI request queue lockups
On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote: > On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote: > > * A systematic lockup for SCSI queues with queue depth 1. The > > following test reproduces that bug systematically: > > - Change the SRP initiator such that SCSI target queue depth is > > limited to 1. > > - Run the following command: > > srp-test/run_tests -f xfs -d -e none -r 60 -t 01 > > See also "[PATCH 4/7] blk-mq: Avoid that request processing > > stalls when sharing tags" > > (https://marc.info/?l=linux-block&m=151208695316857). Note: > > reverting commit 0df21c86bdbf also fixes a sporadic SCSI request > > queue lockup while inserting a blk_mq_sched_mark_restart_hctx() > > before all blk_mq_dispatch_rq_list() calls only fixes the > > systematic lockup for queue depth 1. > > You are the only reproducer [ ... ] That's not correct. I'm pretty sure if you try to reproduce this that you will see the same hang I ran into. Does this mean that you have not yet tried to reproduce the hang I reported? > You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched: > improve dispatching from sw queue")', but you don't mention any issue > about that commit. That's not correct either. From the commit message "A systematic lockup for SCSI queues with queue depth 1." > > I think the above means that it is too risky to try to fix all bugs > > introduced by commit 0df21c86bdbf before kernel v4.15 is released. > > Hence revert that commit. > > What is the risk? That more bugs were introduced by commit 0df21c86bdbf than the ones that have been discovered so far. Bart.
[PATCH v3 21/22] qla2xxx: Fix memory leak in dual/target mode
When the driver is loaded in Target/Dual mode, it creates QPairs to support MQ and allocates resources for each QPair. QPair initialization is delayed until the FW personality is changed to Dual/Target mode by issuing a chip reset. At the time of chip reset, the firmware is re-initialized in the correct personality and all the QPairs are initialized by sending MBC_INITIALIZE_MULTIQ (001Fh). This patch fixes a memory leak by issuing the MBC_INITIALIZE_MULTIQ command while deleting an rsp/req queue only when the flag is set for initiator mode, and by cleaning up QPair resources correctly during driver unload. This MBX does not need to be issued for Target/Dual mode because the chip reset will reset the ISP. Fixes: d65237c7f0860 ("scsi: qla2xxx: Fix mailbox failure while deleting Queue pairs") Cc: # 4.10+ Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_init.c | 4 +--- drivers/scsi/qla2xxx/qla_mid.c | 18 ++ 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index 57b8f43c5980..58663df38627 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -8220,9 +8220,6 @@ int qla2xxx_delete_qpair(struct scsi_qla_host *vha, struct qla_qpair *qpair) int ret = QLA_FUNCTION_FAILED; struct qla_hw_data *ha = qpair->hw; - if (!vha->flags.qpairs_req_created && !vha->flags.qpairs_rsp_created) - goto fail; - qpair->delete_in_progress = 1; while (atomic_read(&qpair->ref_count)) msleep(500); @@ -8230,6 +8227,7 @@ int qla2xxx_delete_qpair(struct scsi_qla_host *vha, struct qla_qpair *qpair) ret = qla25xx_delete_req_que(vha, qpair->req); if (ret != QLA_SUCCESS) goto fail; + ret = qla25xx_delete_rsp_que(vha, qpair->rsp); if (ret != QLA_SUCCESS) goto fail; diff --git a/drivers/scsi/qla2xxx/qla_mid.c b/drivers/scsi/qla2xxx/qla_mid.c index 618ca272d01a..e538e6308885 100644 --- a/drivers/scsi/qla2xxx/qla_mid.c +++ b/drivers/scsi/qla2xxx/qla_mid.c @@ -575,14 +575,15 @@ 
qla25xx_free_rsp_que(struct scsi_qla_host *vha, struct rsp_que *rsp) int qla25xx_delete_req_que(struct scsi_qla_host *vha, struct req_que *req) { - int ret = -1; + int ret = QLA_SUCCESS; - if (req) { + if (req && vha->flags.qpairs_req_created) { req->options |= BIT_0; ret = qla25xx_init_req_que(vha, req); + if (ret != QLA_SUCCESS) + return QLA_FUNCTION_FAILED; } - if (ret == QLA_SUCCESS) - qla25xx_free_req_que(vha, req); + qla25xx_free_req_que(vha, req); return ret; } @@ -590,14 +591,15 @@ qla25xx_delete_req_que(struct scsi_qla_host *vha, struct req_que *req) int qla25xx_delete_rsp_que(struct scsi_qla_host *vha, struct rsp_que *rsp) { - int ret = -1; + int ret = QLA_SUCCESS; - if (rsp) { + if (rsp && vha->flags.qpairs_rsp_created) { rsp->options |= BIT_0; ret = qla25xx_init_rsp_que(vha, rsp); + if (ret != QLA_SUCCESS) + return QLA_FUNCTION_FAILED; } - if (ret == QLA_SUCCESS) - qla25xx_free_rsp_que(vha, rsp); + qla25xx_free_rsp_que(vha, rsp); return ret; } -- 2.12.0
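The control-flow change in this patch can be condensed into a small userspace sketch. It is a simplification under stated assumptions, not the driver code: the toy_* types and functions are hypothetical, the early exit that the patch removes from qla2xxx_delete_qpair() is folded into the "old" delete helper, and the mailbox command is replaced by a stub that always succeeds.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_SUCCESS = 0, TOY_FAILED = 1 };

/* Hypothetical stand-ins for a queue and its host. */
struct toy_que  { bool allocated; };
struct toy_host { bool que_created_in_fw; int mbx_calls; };

/* Stub for the "delete queue" mailbox command. */
static int toy_mbx_delete(struct toy_host *h)
{
	h->mbx_calls++;
	return TOY_SUCCESS;
}

static void toy_free_que(struct toy_que *q)
{
	if (q)
		q->allocated = false;
}

/* Pre-patch shape: bail out when firmware never created the queue. */
static int toy_delete_old(struct toy_host *h, struct toy_que *q)
{
	if (!h->que_created_in_fw)
		return TOY_FAILED;      /* host memory is never released */
	if (toy_mbx_delete(h) != TOY_SUCCESS)
		return TOY_FAILED;
	toy_free_que(q);
	return TOY_SUCCESS;
}

/* Post-patch shape: mailbox only if created, host memory always freed. */
static int toy_delete_new(struct toy_host *h, struct toy_que *q)
{
	if (q && h->que_created_in_fw &&
	    toy_mbx_delete(h) != TOY_SUCCESS)
		return TOY_FAILED;
	toy_free_que(q);
	return TOY_SUCCESS;
}

/* Returns true when the queue memory is still allocated after deletion. */
static bool toy_leaks(bool patched)
{
	struct toy_host h = { .que_created_in_fw = false, .mbx_calls = 0 };
	struct toy_que q = { .allocated = true };

	if (patched)
		toy_delete_new(&h, &q);
	else
		toy_delete_old(&h, &q);

	assert(h.mbx_calls == 0);   /* no mailbox traffic when nothing was created */
	return q.allocated;
}
```

With the queue never created in firmware (the Target/Dual case described in the commit message), toy_leaks(false) reports that the allocation survives, while toy_leaks(true) reports that it is released without any mailbox command being sent.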
[PATCH v3 18/22] qla2xxx: Defer processing of GS IOCB calls
From: Giridhar Malavali This patch defers processing of GS IOCB calls from interrupt context to avoid hardware spinlock recursion. Following stack trace is seen ? mod_timer+0x193/0x330 ? ql_dbg+0xa7/0xf0 [qla2xxx] _raw_spin_lock_irqsave+0x31/0x40 qla2x00_start_sp+0x3b/0x250 [qla2xxx] qla24xx_async_gnl+0x1d3/0x240 [qla2xxx] qla24xx_fcport_handle_login+0x285/0x290 [qla2xxx] ? vprintk_func+0x20/0x50 Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery") Cc: # 4.10+ Signed-off-by: Giridhar Malavali Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index 7dd19785f820..57b8f43c5980 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -975,7 +975,7 @@ int qla24xx_fcport_handle_login(struct scsi_qla_host *vha, fc_port_t *fcport) ql_dbg(ql_dbg_disc, vha, 0x20bd, "%s %d %8phC post gnl\n", __func__, __LINE__, fcport->port_name); - qla24xx_async_gnl(vha, fcport); + qla24xx_post_gnl_work(vha, fcport); } else { ql_dbg(ql_dbg_disc, vha, 0x20bf, "%s %d %8phC post login\n", @@ -1143,7 +1143,7 @@ void qla24xx_handle_relogin_event(scsi_qla_host_t *vha, ql_dbg(ql_dbg_disc, vha, 0x20e9, "%s %d %8phC post gidpn\n", __func__, __LINE__, fcport->port_name); - qla24xx_async_gidpn(vha, fcport); + qla24xx_post_gidpn_work(vha, fcport); return; } -- 2.12.0
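Why posting work avoids the recursion can be shown with a small userspace model. This is a sketch under assumptions rather than the driver's mechanism: the toy lock only counts a second acquisition instead of spinning forever, the work queue is a fixed array, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: records a recursive acquisition instead of deadlocking. */
struct toy_lock { bool held; int recursion_errors; };

static void toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		l->recursion_errors++;  /* a real spinlock would hang here */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

#define TOY_MAX_WORK 8
typedef void (*toy_work_fn)(struct toy_lock *);

struct toy_workqueue {
	toy_work_fn items[TOY_MAX_WORK];
	size_t count;
};

/* Queue a function to be run later, outside the caller's locked region. */
static bool toy_post_work(struct toy_workqueue *wq, toy_work_fn fn)
{
	assert(fn != NULL);
	if (wq->count == TOY_MAX_WORK)
		return false;
	wq->items[wq->count++] = fn;
	return true;
}

static void toy_run_work(struct toy_workqueue *wq, struct toy_lock *l)
{
	for (size_t i = 0; i < wq->count; i++)
		wq->items[i](l);
	wq->count = 0;
}

/* Stand-in for a request that must take the lock to be queued. */
static void toy_send_request(struct toy_lock *l)
{
	toy_lock_acquire(l);
	/* ... build and queue the request ... */
	toy_lock_release(l);
}

/* Handler that already holds the lock, as in interrupt context. */
static int toy_handler(bool defer)
{
	struct toy_lock lock = { false, 0 };
	struct toy_workqueue wq = { { 0 }, 0 };

	toy_lock_acquire(&lock);
	if (defer)
		toy_post_work(&wq, toy_send_request);  /* run after unlock */
	else
		toy_send_request(&lock);               /* takes the lock again */
	toy_lock_release(&lock);

	toy_run_work(&wq, &lock);
	return lock.recursion_errors;
}
```

Calling toy_handler(false) records one recursive acquisition, which corresponds to the direct call that the patch removes; toy_handler(true) records none, because the request is only sent once the handler has dropped the lock.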
[PATCH v3 17/22] qla2xxx: Clear loop id after delete
From: Quinn Tran Clear loop id after delete to prevent session invalidation of a stale session. Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery") Cc: # 4.10+ Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_target.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 1c219998ab60..0c0453f2ca9e 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -986,7 +986,7 @@ static void qlt_free_session_done(struct work_struct *work) sess->send_els_logo = 0; } - if (sess->logout_on_delete) { + if (sess->logout_on_delete && sess->loop_id != FC_NO_LOOP_ID) { int rc; rc = qla2x00_post_async_logout_work(vha, sess, NULL); @@ -1045,8 +1045,7 @@ static void qlt_free_session_done(struct work_struct *work) sess->login_succ = 0; } - if (sess->chip_reset != ha->base_qpair->chip_reset) - qla2x00_clear_loop_id(sess); + qla2x00_clear_loop_id(sess); if (sess->conflict) { sess->conflict->login_pause = 0; @@ -4600,9 +4599,9 @@ qlt_find_sess_invalidate_other(scsi_qla_host_t *vha, uint64_t wwn, "Invalidating sess %p loop_id %d wwn %llx.\n", other_sess, other_sess->loop_id, other_wwn); - other_sess->keep_nport_handle = 1; - *conflict_sess = other_sess; + if (other_sess->disc_state != DSC_DELETED) + *conflict_sess = other_sess; qlt_schedule_sess_for_deletion(other_sess, true); } -- 2.12.0
[PATCH v3 20/22] qla2xxx: Fix system crash in qlt_plogi_ack_unref
From: Quinn Tran Fix system crash due to NULL pointer access. qlt_plogi_ack_t and fc_port structures were not properly bound before calling qlt_plogi_ack_unref(). RIP: 0010:qlt_plogi_ack_unref+0xa1/0x150 [qla2xxx] Call Trace: qla24xx_create_new_sess+0xb1/0x320 [qla2xxx] qla2x00_do_work+0x123/0x260 [qla2xxx] qla2x00_iocb_work_fn+0x30/0x40 [qla2xxx] process_one_work+0x1f3/0x530 worker_thread+0x4e/0x480 kthread+0x10c/0x140 Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery") Cc: # 4.10+ Signed-off-by: Quinn Tran Signed-off-by: Giridhar Malavali Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_os.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 2ec77b9f78b8..789030c9dd26 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -4750,11 +4750,11 @@ void qla24xx_create_new_sess(struct scsi_qla_host *vha, struct qla_work_evt *e) } else { list_add_tail(&fcport->list, &vha->vp_fcports); - if (pla) { - qlt_plogi_ack_link(vha, pla, fcport, - QLT_PLOGI_LINK_SAME_WWN); - pla->ref_count--; - } + } + if (pla) { + qlt_plogi_ack_link(vha, pla, fcport, + QLT_PLOGI_LINK_SAME_WWN); + pla->ref_count--; } } spin_unlock_irqrestore(&vha->hw->tgt.sess_lock, flags); -- 2.12.0
[PATCH v3 22/22] qla2xxx: Update driver version to 10.00.00.03-k
Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_version.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/qla2xxx/qla_version.h b/drivers/scsi/qla2xxx/qla_version.h index b6ec02b96d3d..911b82226d13 100644 --- a/drivers/scsi/qla2xxx/qla_version.h +++ b/drivers/scsi/qla2xxx/qla_version.h @@ -7,7 +7,7 @@ /* * Driver version */ -#define QLA2XXX_VERSION "10.00.00.02-k" +#define QLA2XXX_VERSION "10.00.00.03-k" #define QLA_DRIVER_MAJOR_VER 10 #define QLA_DRIVER_MINOR_VER 0 -- 2.12.0
[PATCH v3 19/22] qla2xxx: Remove aborting ELS IOCB call issued as part of timeout.
From: Giridhar Malavali This fixes the spinlock recursion issue seen while unloading the driver. 14 [9f2e21e03db8] native_queued_spin_lock_slowpath at ad0d8802 15 [9f2e21e03dc0] do_raw_spin_lock at ad0d99e4 16 [9f2e21e03dd8] _raw_spin_lock_irqsave at ad652471 17 [9f2e21e03e00] qla2x00_els_dcmd_iocb_timeout at c070cd63 18 [9f2e21e03e40] qla2x00_sp_timeout at c06f06d3 [qla2xxx] 19 [9f2e21e03e68] call_timer_fn at ad0f97d8 20 [9f2e21e03ed8] run_timer_softirq at ad0faf47 21 [9f2e21e03f68] __softirqentry_text_start at ad655f32 Fixes: 6eb54715b54bb ("qla2xxx: Added interface to send explicit LOGO.") Cc: # 4.10+ Signed-off-by: Giridhar Malavali Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_iocb.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c index 106f4ac4f733..8ea59586f4f1 100644 --- a/drivers/scsi/qla2xxx/qla_iocb.c +++ b/drivers/scsi/qla2xxx/qla_iocb.c @@ -2392,7 +2392,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data) srb_t *sp = data; fc_port_t *fcport = sp->fcport; struct scsi_qla_host *vha = sp->vha; - struct qla_hw_data *ha = vha->hw; struct srb_iocb *lio = &sp->u.iocb_cmd; ql_dbg(ql_dbg_io, vha, 0x3069, @@ -2400,15 +2399,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data) sp->name, sp->handle, fcport->d_id.b.domain, fcport->d_id.b.area, fcport->d_id.b.al_pa); - /* Abort the exchange */ - if (ha->isp_ops->abort_command(sp)) { - ql_dbg(ql_dbg_io, vha, 0x3070, - "mbx abort_command failed.\n"); - } else { - ql_dbg(ql_dbg_io, vha, 0x3071, - "mbx abort_command success.\n"); - } - complete(&lio->u.els_logo.comp); } -- 2.12.0
[PATCH v3 13/22] qla2xxx: Fix PRLI state check
From: Quinn Tran The Get Port Database MBX cmd is used to validate the current login state upon PRLI completion. Current code also looks at the last login state for re-validation, which is incorrect. This patch removes the incorrect state check. Fixes: 15f30a5752287 ("qla2xxx: Use IOCB interface to submit non-critical MBX.") Cc: # 4.10+ Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_mbx.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c index cb717d47339f..e2b5fa47bb57 100644 --- a/drivers/scsi/qla2xxx/qla_mbx.c +++ b/drivers/scsi/qla2xxx/qla_mbx.c @@ -6160,8 +6160,7 @@ int __qla24xx_parse_gpdb(struct scsi_qla_host *vha, fc_port_t *fcport, } /* Check for logged in state. */ - if (current_login_state != PDS_PRLI_COMPLETE && - last_login_state != PDS_PRLI_COMPLETE) { + if (current_login_state != PDS_PRLI_COMPLETE) { ql_dbg(ql_dbg_mbx, vha, 0x119a, "Unable to verify login-state (%x/%x) for loop_id %x.\n", current_login_state, last_login_state, fcport->loop_id); -- 2.12.0
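The effect of dropping the last_login_state term can be checked with a short standalone sketch. The constant values and toy_* names below are placeholders chosen for illustration and are not taken from the driver headers; only the shape of the two conditions follows the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder state codes; the driver defines its own PDS_* constants. */
enum { TOY_PDS_PLOGI_COMPLETE = 0x4, TOY_PDS_PRLI_COMPLETE = 0x6 };

/* Pre-patch condition: rejected only if BOTH states are not PRLI-complete. */
static bool toy_login_rejected_old(uint8_t cur, uint8_t last)
{
	return cur != TOY_PDS_PRLI_COMPLETE && last != TOY_PDS_PRLI_COMPLETE;
}

/* Post-patch condition: only the current login state is considered. */
static bool toy_login_rejected_new(uint8_t cur, uint8_t last)
{
	(void)last;
	return cur != TOY_PDS_PRLI_COMPLETE;
}

/* Counts the (cur, last) pairs in 0..7 on which the two checks disagree. */
static int toy_count_disagreements(void)
{
	int n = 0;

	for (uint8_t cur = 0; cur < 8; cur++) {
		for (uint8_t last = 0; last < 8; last++) {
			bool o = toy_login_rejected_old(cur, last);
			bool p = toy_login_rejected_new(cur, last);

			assert(!o || p);  /* the new check is never more lenient */
			if (o != p)
				n++;
		}
	}
	return n;
}
```

The two checks differ only when the current state is not PRLI-complete but the previous one was: the old condition accepts such a port as logged in, the new one rejects it.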
[PATCH v3 10/22] qla2xxx: Relogin to target port on a cable swap
From: Quinn Tran If the user swaps one target port for another on the same switch port, the new target port is not recognized by the driver. Current code assumes that the old target port has recovered from link down. The fix asks the switch for the WWPN of a specific NportID (GPNID) rather than assuming that the same target port has come back. Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery") Cc: # 4.10+ Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_gs.c | 166 +- drivers/scsi/qla2xxx/qla_init.c | 6 +- drivers/scsi/qla2xxx/qla_os.c | 35 +++- drivers/scsi/qla2xxx/qla_target.c | 35 ++-- 4 files changed, 195 insertions(+), 47 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c index 59ecc4eda6cd..7d715e58901f 100644 --- a/drivers/scsi/qla2xxx/qla_gs.c +++ b/drivers/scsi/qla2xxx/qla_gs.c @@ -3171,43 +3171,136 @@ void qla24xx_async_gpnid_done(scsi_qla_host_t *vha, srb_t *sp) void qla24xx_handle_gpnid_event(scsi_qla_host_t *vha, struct event_arg *ea) { - fc_port_t *fcport; - unsigned long flags; + fc_port_t *fcport, *conflict, *t; - spin_lock_irqsave(&vha->hw->tgt.sess_lock, flags); - fcport = qla2x00_find_fcport_by_wwpn(vha, ea->port_name, 1); - spin_unlock_irqrestore(&vha->hw->tgt.sess_lock, flags); + ql_dbg(ql_dbg_disc, vha, 0x, + "%s %d port_id: %06x\n", + __func__, __LINE__, ea->id.b24); - if (fcport) { - /* cable moved. 
just plugged in */ - fcport->rscn_gen++; - fcport->d_id = ea->id; - fcport->scan_state = QLA_FCPORT_FOUND; - fcport->flags |= FCF_FABRIC_DEVICE; - - switch (fcport->disc_state) { - case DSC_DELETED: - ql_dbg(ql_dbg_disc, vha, 0x210d, - "%s %d %8phC login\n", __func__, __LINE__, - fcport->port_name); - qla24xx_fcport_handle_login(vha, fcport); - break; - case DSC_DELETE_PEND: - break; - default: - ql_dbg(ql_dbg_disc, vha, 0x2064, - "%s %d %8phC post del sess\n", - __func__, __LINE__, fcport->port_name); - qlt_schedule_sess_for_deletion_lock(fcport); - break; + if (ea->rc) { + /* cable is disconnected */ + list_for_each_entry_safe(fcport, t, &vha->vp_fcports, list) { + if (fcport->d_id.b24 == ea->id.b24) { + ql_dbg(ql_dbg_disc, vha, 0x, + "%s %d %8phC DS %d\n", + __func__, __LINE__, + fcport->port_name, + fcport->disc_state); + fcport->scan_state = QLA_FCPORT_SCAN; + switch (fcport->disc_state) { + case DSC_DELETED: + case DSC_DELETE_PEND: + break; + default: + ql_dbg(ql_dbg_disc, vha, 0x, + "%s %d %8phC post del sess\n", + __func__, __LINE__, + fcport->port_name); + qlt_schedule_sess_for_deletion_lock + (fcport); + break; + } + } } } else { - /* create new fcport */ - ql_dbg(ql_dbg_disc, vha, 0x2065, - "%s %d %8phC post new sess\n", - __func__, __LINE__, ea->port_name); + /* cable is connected */ + fcport = qla2x00_find_fcport_by_wwpn(vha, ea->port_name, 1); + if (fcport) { + list_for_each_entry_safe(conflict, t, &vha->vp_fcports, + list) { + if ((conflict->d_id.b24 == ea->id.b24) && + (fcport != conflict)) { + /* 2 fcports with conflict Nport ID or +* an existing fcport is having nport ID +* conflict with new fcport. +*/ + + ql_dbg(ql_dbg_disc, vha, 0x, + "%s %d %8phC DS %d\n", +
[PATCH v3 12/22] qla2xxx: Clear send ELS LOGO flag after target re-login
From: Quinn Tran This patch fixes clearing out els_send_logo flag at the time of session deletion. Fixes: 3515832cc614 ("scsi: qla2xxx: Reset the logo flag, after target re-login.") Cc: # 4.10+ Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_target.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 283ff316e4b2..e824cdc77139 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -983,6 +983,7 @@ static void qlt_free_session_done(struct work_struct *work) logo.id = sess->d_id; logo.cmd_count = 0; qlt_send_first_logo(vha, &logo); + sess->send_els_logo = 0; } if (sess->logout_on_delete) { -- 2.12.0
[PATCH v3 07/22] qla2xxx: Serialize GPNID for multiple RSCN
From: Quinn Tran GPNID is triggered by RSCN. For multiple RSCNs of the same affected NPORT ID, serialize the GPNID to prevent confusion. Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery") Cc: # 4.10+ Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani --- drivers/scsi/qla2xxx/qla_def.h | 48 +++--- drivers/scsi/qla2xxx/qla_gs.c | 35 +- drivers/scsi/qla2xxx/qla_isr.c | 2 +- drivers/scsi/qla2xxx/qla_os.c | 1 + 4 files changed, 58 insertions(+), 28 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h index 01a9b8971e88..d9b4a0651a0f 100644 --- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -315,6 +315,29 @@ struct srb_cmd { /* To identify if a srb is of T10-CRC type. @sp => srb_t pointer */ #define IS_PROT_IO(sp) (sp->flags & SRB_CRC_CTX_DSD_VALID) +/* + * 24 bit port ID type definition. + */ +typedef union { + uint32_t b24 : 24; + + struct { +#ifdef __BIG_ENDIAN + uint8_t domain; + uint8_t area; + uint8_t al_pa; +#elif defined(__LITTLE_ENDIAN) + uint8_t al_pa; + uint8_t area; + uint8_t domain; +#else +#error "__BIG_ENDIAN or __LITTLE_ENDIAN must be defined!" +#endif + uint8_t rsvd_1; + } b; +} port_id_t; +#define INVALID_PORT_ID0xFF + struct els_logo_payload { uint8_t opcode; uint8_t rsvd[3]; @@ -338,6 +361,7 @@ struct ct_arg { u32 rsp_size; void*req; void*rsp; + port_id_t id; }; /* @@ -499,6 +523,7 @@ typedef struct srb { const char *name; int iocbs; struct qla_qpair *qpair; + struct list_head elem; u32 gen1; /* scratch */ u32 gen2; /* scratch */ union { @@ -2164,28 +2189,6 @@ struct imm_ntfy_from_isp { #define REQUEST_ENTRY_SIZE (sizeof(request_t)) -/* - * 24 bit port ID type definition. - */ -typedef union { - uint32_t b24 : 24; - - struct { -#ifdef __BIG_ENDIAN - uint8_t domain; - uint8_t area; - uint8_t al_pa; -#elif defined(__LITTLE_ENDIAN) - uint8_t al_pa; - uint8_t area; - uint8_t domain; -#else -#error "__BIG_ENDIAN or __LITTLE_ENDIAN must be defined!" 
-#endif - uint8_t rsvd_1; - } b; -} port_id_t; -#define INVALID_PORT_ID0xFF /* * Switch info gathering structure. @@ -4252,6 +4255,7 @@ typedef struct scsi_qla_host { uint8_t n2n_node_name[WWN_SIZE]; uint8_t n2n_port_name[WWN_SIZE]; uint16_tn2n_id; + struct list_head gpnid_list; } scsi_qla_host_t; struct qla27xx_image_status { diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c index ea1b562ebc8a..59ecc4eda6cd 100644 --- a/drivers/scsi/qla2xxx/qla_gs.c +++ b/drivers/scsi/qla2xxx/qla_gs.c @@ -3221,16 +3221,17 @@ static void qla2x00_async_gpnid_sp_done(void *s, int res) (struct ct_sns_rsp *)sp->u.iocb_cmd.u.ctarg.rsp; struct event_arg ea; struct qla_work_evt *e; + unsigned long flags; if (res) ql_dbg(ql_dbg_disc, vha, 0x2066, - "Async done-%s fail res %x ID %3phC. %8phC\n", - sp->name, res, ct_req->req.port_id.port_id, + "Async done-%s fail res %x rscn gen %d ID %3phC. %8phC\n", + sp->name, res, sp->gen1, ct_req->req.port_id.port_id, ct_rsp->rsp.gpn_id.port_name); else ql_dbg(ql_dbg_disc, vha, 0x2066, - "Async done-%s good ID %3phC. %8phC\n", - sp->name, ct_req->req.port_id.port_id, + "Async done-%s good rscn gen %d ID %3phC. %8phC\n", + sp->name, sp->gen1, ct_req->req.port_id.port_id, ct_rsp->rsp.gpn_id.port_name); memset(&ea, 0, sizeof(ea)); @@ -3242,11 +3243,20 @@ static void qla2x00_async_gpnid_sp_done(void *s, int res) ea.rc = res; ea.event = FCME_GPNID_DONE; + spin_lock_irqsave(&vha->hw->tgt.sess_lock, flags); + list_del(&sp->elem); + spin_unlock_irqrestore(&vha->hw->tgt.sess_lock, flags); + if (res) { if (res == QLA_FUNCTION_TIMEOUT) qla24xx_post_gpnid_work(sp->vha, &ea.id); sp->free(sp); return; + } else if (sp->gen1) { + /* There was anoter RSNC for this Nport ID */ + qla24xx_post_gpnid_work(sp->vha, &ea.id); + sp->free(sp); + return; } qla2x00_fcport_event_handler(vha, &ea); @@ -3282,8 +3292,9 @@ int qla24xx_async_gpnid(scsi_qla_host_t *vha, port_id_t *id) { int rval = QLA_FUNCTION_FAILED; struct ct_sn
[PATCH v3 14/22] qla2xxx: Fix abort command deadlock due to spinlock
From: Quinn Tran Original code acquires hardware_lock to add the Abort IOCB onto the driver request queue for processing. However, abort_command() also acquires the hardware lock to look up the sp pointer before issuing the abort IOCB command, resulting in a deadlock. This patch safely removes the possible deadlock scenario by removing the extra spinlock. Fixes: 6eb54715b54bb ("qla2xxx: Added interface to send explicit LOGO.") Cc: # 4.10+ Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_iocb.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c index d810a447cb4a..106f4ac4f733 100644 --- a/drivers/scsi/qla2xxx/qla_iocb.c +++ b/drivers/scsi/qla2xxx/qla_iocb.c @@ -2394,7 +2394,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data) struct scsi_qla_host *vha = sp->vha; struct qla_hw_data *ha = vha->hw; struct srb_iocb *lio = &sp->u.iocb_cmd; - unsigned long flags = 0; ql_dbg(ql_dbg_io, vha, 0x3069, "%s Timeout, hdl=%x, portid=%02x%02x%02x\n", @@ -2402,7 +2401,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data) fcport->d_id.b.al_pa); /* Abort the exchange */ - spin_lock_irqsave(&ha->hardware_lock, flags); if (ha->isp_ops->abort_command(sp)) { ql_dbg(ql_dbg_io, vha, 0x3070, "mbx abort_command failed.\n"); @@ -2410,7 +2408,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data) ql_dbg(ql_dbg_io, vha, 0x3071, "mbx abort_command success.\n"); } - spin_unlock_irqrestore(&ha->hardware_lock, flags); complete(&lio->u.els_logo.comp); } -- 2.12.0
[PATCH v3 09/22] qla2xxx: Fix NPIV host cleanup in target mode
From: Sawan Chandak Add check to make sure we are cleaning up global target host list only for NPIV hosts Fixes: bdbe24de281e2 ("scsi: qla2xxx: Cleanup NPIV host in target mode during config teardown") Cc: # 4.10+ Signed-off-by: Sawan Chandak Signed-off-by: Himanshu Madhani Reviewed-by: Hannes Reinecke --- drivers/scsi/qla2xxx/qla_target.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 924d58f5408f..1bec8aebb7b6 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -1561,8 +1561,11 @@ static void qlt_release(struct qla_tgt *tgt) btree_destroy64(&tgt->lun_qpair_map); - if (ha->tgt.tgt_ops && ha->tgt.tgt_ops->remove_target) - ha->tgt.tgt_ops->remove_target(vha); + if (vha->vp_idx) + if (ha->tgt.tgt_ops && + ha->tgt.tgt_ops->remove_target && + vha->vha_tgt.target_lport_ptr) + ha->tgt.tgt_ops->remove_target(vha); vha->vha_tgt.qla_tgt = NULL; -- 2.12.0
[PATCH v3 16/22] qla2xxx: Fix scan state field for fcport
From: Quinn Tran

Set the scan_state field to the correct value so that it reflects the
state of the FC port.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: # 4.10+
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/qla2xxx/qla_target.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 2a6242d97a7e..1c219998ab60 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -5812,6 +5812,7 @@ static fc_port_t *qlt_get_port_database(struct scsi_qla_host *vha,
 		tfcp->port_type = fcport->port_type;
 		tfcp->supported_classes = fcport->supported_classes;
 		tfcp->flags |= fcport->flags;
+		tfcp->scan_state = QLA_FCPORT_FOUND;

 		del = fcport;
 		fcport = tfcp;
--
2.12.0
[PATCH v3 15/22] qla2xxx: Replace fcport alloc with qla2x00_alloc_fcport
From: Quinn Tran

The current code manually allocates an fcport structure that is not
properly initialized. Replace kzalloc() with qla2x00_alloc_fcport() so
that all fields are initialized. Also set the scan flag to port found.

Cc:
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/qla2xxx/qla_target.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index e824cdc77139..2a6242d97a7e 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -5783,7 +5783,7 @@ static fc_port_t *qlt_get_port_database(struct scsi_qla_host *vha,
 	unsigned long flags;
 	u8 newfcport = 0;

-	fcport = kzalloc(sizeof(*fcport), GFP_KERNEL);
+	fcport = qla2x00_alloc_fcport(vha, GFP_KERNEL);
 	if (!fcport) {
 		ql_dbg(ql_dbg_tgt_mgt, vha, 0xf06f,
 		    "qla_target(%d): Allocation of tmp FC port failed",
--
2.12.0
[PATCH v3 08/22] qla2xxx: Fix login state machine stuck at GPDB
From: Quinn Tran

This patch returns the discovery state machine to the Login Complete
state.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: # 4.10+
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/qla2xxx/qla_init.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index be4c67b465b8..2f246996d3e2 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -863,6 +863,7 @@ void qla24xx_handle_gpdb_event(scsi_qla_host_t *vha, struct event_arg *ea)
 	int rval = ea->rc;
 	fc_port_t *fcport = ea->fcport;
 	unsigned long flags;
+	u16 opt = ea->sp->u.iocb_cmd.u.mbx.out_mb[10];

 	fcport->flags &= ~FCF_ASYNC_SENT;

@@ -893,7 +894,8 @@ void qla24xx_handle_gpdb_event(scsi_qla_host_t *vha, struct event_arg *ea)
 	}

 	spin_lock_irqsave(&vha->hw->tgt.sess_lock, flags);
-	ea->fcport->login_gen++;
+	if (opt != PDO_FORCE_ADISC)
+		ea->fcport->login_gen++;
 	ea->fcport->deleted = 0;
 	ea->fcport->logout_on_delete = 1;

@@ -917,6 +919,13 @@ void qla24xx_handle_gpdb_event(scsi_qla_host_t *vha, struct event_arg *ea)
 			qla24xx_post_gpsc_work(vha, fcport);
 		}
+	} else if (ea->fcport->login_succ) {
+		/*
+		 * We have an existing session. A late RSCN delivery
+		 * must have triggered the session to be re-validate.
+		 * session is still valid.
+		 */
+		fcport->disc_state = DSC_LOGIN_COMPLETE;
 	}
 	spin_unlock_irqrestore(&vha->hw->tgt.sess_lock, flags);
 } /* gpdb event */

--
2.12.0
[PATCH v3 11/22] qla2xxx: Fix Relogin being triggered too fast
From: Quinn Tran

The current driver design schedules the relogin process via the DPC
thread once every second. In a large fabric, the DPC thread tries to
schedule too many jobs and may become overloaded. As a side effect of
that processing, it can schedule a relogin less than 1 second after the
previous one.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: # 4.10+
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/qla2xxx/qla_def.h | 1 +
 drivers/scsi/qla2xxx/qla_mid.c | 24 +++-
 drivers/scsi/qla2xxx/qla_os.c | 22 ++
 3 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index d9b4a0651a0f..93ff92e2363f 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -4110,6 +4110,7 @@ typedef struct scsi_qla_host {
 #define LOOP_READY 5
 #define LOOP_DEAD 6

+	unsigned long relogin_jif;
 	unsigned long dpc_flags;
 #define RESET_MARKER_NEEDED	0 /* Send marker to ISP. */
 #define RESET_ACTIVE 1
diff --git a/drivers/scsi/qla2xxx/qla_mid.c b/drivers/scsi/qla2xxx/qla_mid.c
index bd9f14bf7ac2..618ca272d01a 100644
--- a/drivers/scsi/qla2xxx/qla_mid.c
+++ b/drivers/scsi/qla2xxx/qla_mid.c
@@ -343,15 +343,21 @@ qla2x00_do_dpc_vp(scsi_qla_host_t *vha)
 		    "FCPort update end.\n");
 	}

-	if ((test_and_clear_bit(RELOGIN_NEEDED, &vha->dpc_flags)) &&
-		!test_bit(LOOP_RESYNC_NEEDED, &vha->dpc_flags) &&
-		atomic_read(&vha->loop_state) != LOOP_DOWN) {
-
-		ql_dbg(ql_dbg_dpc, vha, 0x4018,
-		    "Relogin needed scheduled.\n");
-		qla2x00_relogin(vha);
-		ql_dbg(ql_dbg_dpc, vha, 0x4019,
-		    "Relogin needed end.\n");
+	if (test_bit(RELOGIN_NEEDED, &vha->dpc_flags) &&
+	    !test_bit(LOOP_RESYNC_NEEDED, &vha->dpc_flags) &&
+	    atomic_read(&vha->loop_state) != LOOP_DOWN) {
+
+		if (!vha->relogin_jif ||
+		    time_after_eq(jiffies, vha->relogin_jif)) {
+			vha->relogin_jif = jiffies + HZ;
+			clear_bit(RELOGIN_NEEDED, &vha->dpc_flags);
+
+			ql_dbg(ql_dbg_dpc, vha, 0x4018,
+			    "Relogin needed scheduled.\n");
+			qla2x00_relogin(vha);
+			ql_dbg(ql_dbg_dpc, vha, 0x4019,
+			    "Relogin needed end.\n");
+		}
 	}

 	if (test_and_clear_bit(RESET_MARKER_NEEDED, &vha->dpc_flags) &&
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 820d1c185beb..2ec77b9f78b8 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -4905,7 +4905,7 @@ void qla2x00_relogin(struct scsi_qla_host *vha)
 		 */
 		if (atomic_read(&fcport->state) != FCS_ONLINE &&
 		    fcport->login_retry && !(fcport->flags & FCF_ASYNC_SENT)) {
-			fcport->login_retry--;
+
 			if (fcport->flags & FCF_FABRIC_DEVICE) {
 				ql_dbg(ql_dbg_disc, fcport->vha, 0x2108,
 				    "%s %8phC DS %d LS %d\n", __func__,
@@ -4916,6 +4916,7 @@ void qla2x00_relogin(struct scsi_qla_host *vha)
 				ea.fcport = fcport;
 				qla2x00_fcport_event_handler(vha, &ea);
 			} else {
+				fcport->login_retry--;
 				status = qla2x00_local_device_login(vha,
 				    fcport);
 				if (status == QLA_SUCCESS) {
@@ -5898,16 +5899,21 @@ qla2x00_do_dpc(void *data)
 		}

 		/* Retry each device up to login retry count */
-		if ((test_and_clear_bit(RELOGIN_NEEDED,
-		    &base_vha->dpc_flags)) &&
+		if (test_bit(RELOGIN_NEEDED, &base_vha->dpc_flags) &&
 		    !test_bit(LOOP_RESYNC_NEEDED, &base_vha->dpc_flags) &&
 		    atomic_read(&base_vha->loop_state) != LOOP_DOWN) {

-			ql_dbg(ql_dbg_dpc, base_vha, 0x400d,
-			    "Relogin scheduled.\n");
-			qla2x00_relogin(base_vha);
-			ql_dbg(ql_dbg_dpc, base_vha, 0x400e,
-			    "Relogin end.\n");
+			if (!base_vha->relogin_jif ||
+			    time_after_eq(jiffies, base_vha->relogin_jif)) {
+				base_vha->relogin_jif = jiffies + HZ;
+				clear_bit(RELOGIN_NEEDED, &base_vha->dpc_flags);
+
+				ql_dbg(ql_dbg_dpc, base_vha, 0x400d,
+				    "Relogin schedu
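The throttling idea in the patch above (remember the earliest tick at which the next relogin may run, and leave RELOGIN_NEEDED set until then) can be tried out in userspace. The following is only a sketch under stated assumptions: struct host_sketch, dpc_pass(), tick_after_eq() and HZ_SKETCH are hypothetical names, the tick counter is passed in by the caller instead of being read from jiffies, and 250 ticks per second is an arbitrary choice. tick_after_eq() uses the same signed-difference comparison idea as the kernel's time_after_eq(), so it keeps working when the counter wraps.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ_SKETCH 250UL	/* assumed ticks per second, stands in for HZ */

/* Wraparound-safe "a is at or after b" on a free-running tick counter. */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Hypothetical per-host state; only the fields the sketch needs. */
struct host_sketch {
	unsigned long relogin_jif;	/* earliest tick for the next relogin, 0 = not set */
	bool relogin_needed;		/* stands in for the RELOGIN_NEEDED bit */
	unsigned int relogin_runs;	/* how many times relogin actually ran */
};

/*
 * One pass of the DPC loop at tick "now".
 * Returns true if the relogin work ran during this pass.
 */
static bool dpc_pass(struct host_sketch *h, unsigned long now)
{
	if (!h->relogin_needed)
		return false;

	/* Too early: keep the flag set so that a later pass picks it up. */
	if (h->relogin_jif && !tick_after_eq(now, h->relogin_jif))
		return false;

	h->relogin_jif = now + HZ_SKETCH;	/* next run no sooner than 1 s from now */
	h->relogin_needed = false;
	h->relogin_runs++;			/* the relogin work would go here */
	return true;
}
```

The design point the sketch tries to show is that the flag is tested first and cleared only when the work really runs, so a request that arrives too early is deferred rather than lost.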
[PATCH v3 05/22] qla2xxx: Fix re-login for Nport Handle in use
From: Quinn Tran When NPort Handle is in use, driver needs to mark the handle as used and pick another. Instead, the code clears the handle and re-pick the same handle. Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery") Cc: # 4.10+ Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani --- drivers/scsi/qla2xxx/qla_gs.c | 16 ++- drivers/scsi/qla2xxx/qla_init.c | 44 + drivers/scsi/qla2xxx/qla_isr.c | 5 - 3 files changed, 51 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c index ddc69d36877e..8984f857bb34 100644 --- a/drivers/scsi/qla2xxx/qla_gs.c +++ b/drivers/scsi/qla2xxx/qla_gs.c @@ -2833,7 +2833,7 @@ void qla24xx_handle_gidpn_event(scsi_qla_host_t *vha, struct event_arg *ea) } } else { /* fcport->d_id.b24 != ea->id.b24 */ fcport->d_id.b24 = ea->id.b24; - if (fcport->deleted == QLA_SESS_DELETED) { + if (fcport->deleted != QLA_SESS_DELETED) { ql_dbg(ql_dbg_disc, vha, 0x2021, "%s %d %8phC post del sess\n", __func__, __LINE__, fcport->port_name); @@ -3206,10 +3206,16 @@ static void qla2x00_async_gpnid_sp_done(void *s, int res) struct event_arg ea; struct qla_work_evt *e; - ql_dbg(ql_dbg_disc, vha, 0x2066, - "Async done-%s res %x ID %3phC. %8phC\n", - sp->name, res, ct_req->req.port_id.port_id, - ct_rsp->rsp.gpn_id.port_name); + if (res) + ql_dbg(ql_dbg_disc, vha, 0x2066, + "Async done-%s fail res %x ID %3phC. %8phC\n", + sp->name, res, ct_req->req.port_id.port_id, + ct_rsp->rsp.gpn_id.port_name); + else + ql_dbg(ql_dbg_disc, vha, 0x2066, + "Async done-%s good ID %3phC. 
%8phC\n", + sp->name, ct_req->req.port_id.port_id, + ct_rsp->rsp.gpn_id.port_name); if (res) { sp->free(sp); diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index 1bafa043f9f1..be4c67b465b8 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -1452,6 +1452,8 @@ static void qla24xx_handle_plogi_done_event(struct scsi_qla_host *vha, struct event_arg *ea) { port_id_t cid; /* conflict Nport id */ + u16 lid; + struct fc_port *conflict_fcport; switch (ea->data[0]) { case MBS_COMMAND_COMPLETE: @@ -1467,8 +1469,12 @@ qla24xx_handle_plogi_done_event(struct scsi_qla_host *vha, struct event_arg *ea) qla24xx_post_prli_work(vha, ea->fcport); } else { ql_dbg(ql_dbg_disc, vha, 0x20ea, - "%s %d %8phC post gpdb\n", - __func__, __LINE__, ea->fcport->port_name); + "%s %d %8phC LoopID 0x%x in use with %06x. post gnl\n", + __func__, __LINE__, ea->fcport->port_name, + ea->fcport->loop_id, ea->fcport->d_id.b24); + + set_bit(ea->fcport->loop_id, vha->hw->loop_id_map); + ea->fcport->loop_id = FC_NO_LOOP_ID; ea->fcport->chip_reset = vha->hw->base_qpair->chip_reset; ea->fcport->logout_on_delete = 1; ea->fcport->send_els_logo = 0; @@ -1513,8 +1519,38 @@ qla24xx_handle_plogi_done_event(struct scsi_qla_host *vha, struct event_arg *ea) ea->fcport->d_id.b.domain, ea->fcport->d_id.b.area, ea->fcport->d_id.b.al_pa); - qla2x00_clear_loop_id(ea->fcport); - qla24xx_post_gidpn_work(vha, ea->fcport); + lid = ea->iop[1] & 0x; + qlt_find_sess_invalidate_other(vha, + wwn_to_u64(ea->fcport->port_name), + ea->fcport->d_id, lid, &conflict_fcport); + + if (conflict_fcport) { + /* +* Another fcport share the same loop_id/nport id. +* Conflict fcport needs to finish cleanup before this +* fcport can proceed to login. +*/ + conflict_fcport->conflict = ea->fcport; + ea->fcport->login_pause = 1; + + ql_dbg(ql_dbg_disc, vha, 0x20ed, + "%s %d %8phC NPortId %06x inuse with loopid 0x%x. 
post gidpn\n", + __func__, __LINE__, ea->fcport->port_name, + ea->fcport->d_id.b24, lid); + qla2x00_clear_loop_id(ea
[PATCH v3 06/22] qla2xxx: Retry switch command on time out
From: Quinn Tran

Retry the GID_PN and GPN_ID switch commands when they time out.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: # 4.10+
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/qla2xxx/qla_gs.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index 8984f857bb34..ea1b562ebc8a 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -175,6 +175,9 @@ qla2x00_chk_ms_status(scsi_qla_host_t *vha, ms_iocb_entry_t *ms_pkt,
 				set_bit(LOCAL_LOOP_UPDATE, &vha->dpc_flags);
 			}
 			break;
+		case CS_TIMEOUT:
+			rval = QLA_FUNCTION_TIMEOUT;
+			/* drop through */
 		default:
 			ql_dbg(ql_dbg_disc, vha, 0x2033,
 			    "%s failed, completion status (%x) on port_id: "
@@ -2889,9 +2892,22 @@ static void qla2x00_async_gidpn_sp_done(void *s, int res)
 	ea.rc = res;
 	ea.event = FCME_GIDPN_DONE;

-	ql_dbg(ql_dbg_disc, vha, 0x204f,
-	    "Async done-%s res %x, WWPN %8phC ID %3phC \n",
-	    sp->name, res, fcport->port_name, id);
+	if (res == QLA_FUNCTION_TIMEOUT) {
+		ql_dbg(ql_dbg_disc, sp->vha, 0x,
+		    "Async done-%s WWPN %8phC timed out.\n",
+		    sp->name, fcport->port_name);
+		qla24xx_post_gidpn_work(sp->vha, fcport);
+		sp->free(sp);
+		return;
+	} else if (res) {
+		ql_dbg(ql_dbg_disc, sp->vha, 0x,
+		    "Async done-%s fail res %x, WWPN %8phC\n",
+		    sp->name, res, fcport->port_name);
+	} else {
+		ql_dbg(ql_dbg_disc, vha, 0x204f,
+		    "Async done-%s good WWPN %8phC ID %3phC\n",
+		    sp->name, fcport->port_name, id);
+	}

 	qla2x00_fcport_event_handler(vha, &ea);

@@ -3217,11 +3233,6 @@ static void qla2x00_async_gpnid_sp_done(void *s, int res)
 		    sp->name, ct_req->req.port_id.port_id,
 		    ct_rsp->rsp.gpn_id.port_name);

-	if (res) {
-		sp->free(sp);
-		return;
-	}
-
 	memset(&ea, 0, sizeof(ea));
 	memcpy(ea.port_name, ct_rsp->rsp.gpn_id.port_name, WWN_SIZE);
 	ea.sp = sp;
@@ -3231,6 +3242,13 @@ static void qla2x00_async_gpnid_sp_done(void *s, int res)
 	ea.rc = res;
 	ea.event = FCME_GPNID_DONE;

+	if (res) {
+		if (res == QLA_FUNCTION_TIMEOUT)
+			qla24xx_post_gpnid_work(sp->vha, &ea.id);
+		sp->free(sp);
+		return;
+	}
+
 	qla2x00_fcport_event_handler(vha, &ea);

 	e = qla2x00_alloc_work(vha, QLA_EVT_GPNID_DONE);
--
2.12.0
[PATCH v3 04/22] qla2xxx: Skip IRQ affinity for Target QPairs
From: Quinn Tran

Fix the co-existence of Block MQ and target mode. Block MQ and initiator
mode require the midlayer queue mapping to check that the IRQs are
affinitized. That is not the case for target mode.

Fixes: 09620eeb62c41 ("scsi: qla2xxx: Add debug knob for user control workload")
Cc: # 4.12+
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/qla2xxx/qla_os.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index dfbf82e716b0..428e1bfaa83b 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -6609,9 +6609,14 @@ qla83xx_disable_laser(scsi_qla_host_t *vha)

 static int qla2xxx_map_queues(struct Scsi_Host *shost)
 {
+	int rc;
 	scsi_qla_host_t *vha = (scsi_qla_host_t *)shost->hostdata;

-	return blk_mq_pci_map_queues(&shost->tag_set, vha->hw->pdev);
+	if (USER_CTRL_IRQ(vha->hw))
+		rc = blk_mq_map_queues(&shost->tag_set);
+	else
+		rc = blk_mq_pci_map_queues(&shost->tag_set, vha->hw->pdev);
+	return rc;
 }

 static const struct pci_error_handlers qla2xxx_err_handler = {
--
2.12.0
[PATCH v3 03/22] qla2xxx: Move session delete to driver work queue
From: Quinn Tran

Move session deletion from the system work queue to the driver's own
work queue so that it is processed in time.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: # 4.10+
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/qla2xxx/qla_os.c | 3 ++-
 drivers/scsi/qla2xxx/qla_target.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 46f2d0cf7c0d..dfbf82e716b0 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -3193,10 +3193,11 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	    host->can_queue, base_vha->req,
 	    base_vha->mgmt_svr_loop_id, host->sg_tablesize);

+	ha->wq = alloc_workqueue("qla2xxx_wq", WQ_MEM_RECLAIM, 0);
+
 	if (ha->mqenable) {
 		bool mq = false;
 		bool startit = false;
-		ha->wq = alloc_workqueue("qla2xxx_wq", WQ_MEM_RECLAIM, 0);

 		if (QLA_TGT_MODE_ENABLED()) {
 			mq = true;
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 1259ec85ec0a..924d58f5408f 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1205,7 +1205,8 @@ void qlt_schedule_sess_for_deletion(struct fc_port *sess,
 	ql_dbg(ql_dbg_tgt, sess->vha, 0xe001,
 	    "Scheduling sess %p for deletion\n", sess);

-	schedule_work(&sess->del_work);
+	INIT_WORK(&sess->del_work, qla24xx_delete_sess_fn);
+	queue_work(sess->vha->hw->wq, &sess->del_work);
 }

 void qlt_schedule_sess_for_deletion_lock(struct fc_port *sess)
--
2.12.0
[PATCH v3 02/22] qla2xxx: Fix gpnid error processing
From: Quinn Tran

Stop the GPNID command from advancing if it has failed.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: # 4.10+
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/qla2xxx/qla_gs.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index bc3db6abc9a0..ddc69d36877e 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3211,6 +3211,11 @@ static void qla2x00_async_gpnid_sp_done(void *s, int res)
 	    sp->name, res, ct_req->req.port_id.port_id,
 	    ct_rsp->rsp.gpn_id.port_name);

+	if (res) {
+		sp->free(sp);
+		return;
+	}
+
 	memset(&ea, 0, sizeof(ea));
 	memcpy(ea.port_name, ct_rsp->rsp.gpn_id.port_name, WWN_SIZE);
 	ea.sp = sp;
--
2.12.0
[PATCH v3 00/22] qla2xxx: Bug fixes for 4.15-rc2
Hi Martin,

This series contains bug fixes discovered during error handling test
cases for large fabric.

Please apply this series to 4.15-rc2 at your earliest convenience.

Changes from v2 -> v3
o Added Reviewed-by tag from Hannes.
o Fixed spelling mistake in patch 7.

Changes from v1 -> v2
o Updated patch description for patch 14 as per Bart's suggestion.

Thanks,
Himanshu

Giridhar Malavali (2):
  qla2xxx: Defer processing of GS IOCB calls
  qla2xxx: Remove aborting ELS IOCB call issued as part of timeout.

Himanshu Madhani (2):
  qla2xxx: Fix memory leak in dual/target mode
  qla2xxx: Update driver version to 10.00.00.03-k

Quinn Tran (17):
  qla2xxx: Fix system crash for Notify ack timeout handling
  qla2xxx: Fix gpnid error processing
  qla2xxx: Move session delete to driver work queue
  qla2xxx: Skip IRQ affinity for Target QPairs
  qla2xxx: Fix re-login for Nport Handle in use
  qla2xxx: Retry switch command on time out
  qla2xxx: Serialize GPNID for multiple RSCN
  qla2xxx: Fix login state machine stuck at GPDB
  qla2xxx: Relogin to target port on a cable swap
  qla2xxx: Fix Relogin being triggered too fast
  qla2xxx: Clear send ELS LOGO flag after target re-login
  qla2xxx: Fix PRLI state check
  qla2xxx: Fix abort command deadlock due to spinlock
  qla2xxx: Replace fcport alloc with qla2x00_alloc_fcport
  qla2xxx: Fix scan state field for fcport
  qla2xxx: Clear loop id after delete
  qla2xxx: Fix system crash in qlt_plogi_ack_unref

Sawan Chandak (1):
  qla2xxx: Fix NPIV host cleanup in target mode

 drivers/scsi/qla2xxx/qla_def.h | 49
 drivers/scsi/qla2xxx/qla_gs.c | 230 ++---
 drivers/scsi/qla2xxx/qla_init.c | 69 +--
 drivers/scsi/qla2xxx/qla_iocb.c | 13 ---
 drivers/scsi/qla2xxx/qla_isr.c | 7 +-
 drivers/scsi/qla2xxx/qla_mbx.c | 3 +-
 drivers/scsi/qla2xxx/qla_mid.c | 42 ---
 drivers/scsi/qla2xxx/qla_os.c | 78 ++---
 drivers/scsi/qla2xxx/qla_target.c | 60 +++---
 drivers/scsi/qla2xxx/qla_version.h | 2 +-
 10 files changed, 405 insertions(+), 148 deletions(-)

--
2.12.0
[PATCH v3 01/22] qla2xxx: Fix system crash for Notify ack timeout handling
From: Quinn Tran

Fix a NULL pointer crash caused by a missing timeout handling callback
for the Notify Ack IOCB.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: # 4.10+
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/qla2xxx/qla_target.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 18069edd4773..1259ec85ec0a 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -665,7 +665,7 @@ int qla24xx_async_notify_ack(scsi_qla_host_t *vha, fc_port_t *fcport,
 	qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha)+2);

 	sp->u.iocb_cmd.u.nack.ntfy = ntfy;
-
+	sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
 	sp->done = qla2x00_async_nack_sp_done;

 	rval = qla2x00_start_sp(sp);
--
2.12.0
Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()
On Mon, Dec 04, 2017 at 03:09:20PM +, Bart Van Assche wrote:
> On Sun, 2017-12-03 at 00:31 +0800, Ming Lei wrote:
> > Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for
> > blk-mq")
> It might be safer to revert commit 0df21c86bdbf instead of trying to fix all
> issues introduced by that commit for kernel version v4.15 ...

What are all the issues in v4.15-rc? So far this is the only issue
reported, and it can be fixed by this simple patch, which can be
regarded as a cleanup too.

--
Ming
Re: [PATCH] blk-mq: Fix several SCSI request queue lockups
On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> Commit 0df21c86bdbf introduced several bugs:
> * A SCSI queue stall for queue depths > 1, addressed by commit
>   88022d7201e9 ("blk-mq: don't handle failure in .get_budget")

This one is committed already.

> * A systematic lockup for SCSI queues with queue depth 1. The
>   following test reproduces that bug systematically:
>   - Change the SRP initiator such that SCSI target queue depth is
>     limited to 1.
>   - Run the following command:
>       srp-test/run_tests -f xfs -d -e none -r 60 -t 01
>   See also "[PATCH 4/7] blk-mq: Avoid that request processing
>   stalls when sharing tags"
>   (https://marc.info/?l=linux-block&m=151208695316857). Note:
>   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
>   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
>   before all blk_mq_dispatch_rq_list() calls only fixes the
>   systematic lockup for queue depth 1.

You are the only one who can reproduce this, and you don't want to
provide any kernel log for the issue, so how can we help you fix it?

You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
improve dispatching from sw queue")', but you don't mention any issue
with that commit, and your patch actually has nothing to do with commit
b347689ffbca. It seems your working style is just try and guess.

Also, both Jens and I have run tests on null_blk and scsi_debug with
queue_depth set to one, and neither of us can see an IO hang with
current blk-mq.

> * A scsi_debug lockup - see also "[PATCH] SCSI: delay run queue if
>   device is blocked in scsi_dev_queue_ready()"
>   (https://marc.info/?l=linux-block&m=151223233407154).

This issue is clearly explained in theory, and it can be reproduced and
verified with scsi_debug, so why can't we apply the patch to fix it? The
fix is simple and can be regarded as a cleanup too, since this case is
now handled the same way as in the non-mq path.

>
> I think the above means that it is too risky to try to fix all bugs
> introduced by commit 0df21c86bdbf before kernel v4.15 is released.
> Hence revert that commit.

What is the risk?

>
> Fixes: commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for
> blk-mq")
> Signed-off-by: Bart Van Assche
> Cc: Ming Lei
> Cc: Christoph Hellwig
> Cc: Hannes Reinecke
> Cc: Johannes Thumshirn
> Cc: James E.J. Bottomley
> Cc: Martin K. Petersen
> Cc: linux-scsi@vger.kernel.org

This commit fixes one important SCSI_MQ performance issue. We can't
simply revert it because of a single unconfirmed report from you alone
(without any kernel log provided).

So Nak.

--
Ming
[PATCH] ibmvscsis: add DRC indices to debug statements
Where applicable, changes pr_debug, pr_info, pr_err, etc. calls to the dev_* versions. This adds the DRC index of the device to the corresponding trace statement. Signed-off-by: Bryant G. Ly Signed-off-by: Brad Warrum --- drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 320 --- 1 file changed, 170 insertions(+), 150 deletions(-) diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c index 2799a6b08f736..c3a76af9f5fa9 100644 --- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c +++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c @@ -122,7 +122,7 @@ static bool connection_broken(struct scsi_info *vscsi) cpu_to_be64(buffer[MSG_HI]), cpu_to_be64(buffer[MSG_LOW])); - pr_debug("connection_broken: rc %ld\n", h_return_code); + dev_dbg(&vscsi->dev, "Connection_broken: rc %ld\n", h_return_code); if (h_return_code == H_CLOSED) rc = true; @@ -210,7 +210,7 @@ static long ibmvscsis_unregister_command_q(struct scsi_info *vscsi) } } while (qrc != H_SUCCESS && rc == ADAPT_SUCCESS); - pr_debug("Freeing CRQ: phyp rc %ld, rc %ld\n", qrc, rc); + dev_dbg(&vscsi->dev, "Freeing CRQ: phyp rc %ld, rc %ld\n", qrc, rc); return rc; } @@ -291,9 +291,9 @@ static long ibmvscsis_free_command_q(struct scsi_info *vscsi) ibmvscsis_delete_client_info(vscsi, false); } - pr_debug("free_command_q: flags 0x%x, state 0x%hx, acr_flags 0x%x, acr_state 0x%hx\n", -vscsi->flags, vscsi->state, vscsi->phyp_acr_flags, -vscsi->phyp_acr_state); + dev_dbg(&vscsi->dev, "free_command_q: flags 0x%x, state 0x%hx, acr_flags 0x%x, acr_state 0x%hx\n", + vscsi->flags, vscsi->state, vscsi->phyp_acr_flags, + vscsi->phyp_acr_state); } return rc; } @@ -428,8 +428,8 @@ static void ibmvscsis_disconnect(struct work_struct *work) vscsi->flags |= DISCONNECT_SCHEDULED; vscsi->flags &= ~SCHEDULE_DISCONNECT; - pr_debug("disconnect: flags 0x%x, state 0x%hx\n", vscsi->flags, -vscsi->state); + dev_dbg(&vscsi->dev, "disconnect: flags 0x%x, state 0x%hx\n", + vscsi->flags, vscsi->state); /* * check which state we are 
in and see if we @@ -540,13 +540,14 @@ static void ibmvscsis_disconnect(struct work_struct *work) } if (wait_idle) { - pr_debug("disconnect start wait, active %d, sched %d\n", -(int)list_empty(&vscsi->active_q), -(int)list_empty(&vscsi->schedule_q)); + dev_dbg(&vscsi->dev, "disconnect start wait, active %d, sched %d\n", + (int)list_empty(&vscsi->active_q), + (int)list_empty(&vscsi->schedule_q)); if (!list_empty(&vscsi->active_q) || !list_empty(&vscsi->schedule_q)) { vscsi->flags |= WAIT_FOR_IDLE; - pr_debug("disconnect flags 0x%x\n", vscsi->flags); + dev_dbg(&vscsi->dev, "disconnect flags 0x%x\n", + vscsi->flags); /* * This routine is can not be called with the interrupt * lock held. @@ -555,7 +556,7 @@ static void ibmvscsis_disconnect(struct work_struct *work) wait_for_completion(&vscsi->wait_idle); spin_lock_bh(&vscsi->intr_lock); } - pr_debug("disconnect stop wait\n"); + dev_dbg(&vscsi->dev, "disconnect stop wait\n"); ibmvscsis_adapter_idle(vscsi); } @@ -597,8 +598,8 @@ static void ibmvscsis_post_disconnect(struct scsi_info *vscsi, uint new_state, vscsi->flags |= flag_bits; - pr_debug("post_disconnect: new_state 0x%x, flag_bits 0x%x, vscsi->flags 0x%x, state %hx\n", -new_state, flag_bits, vscsi->flags, vscsi->state); + dev_dbg(&vscsi->dev, "post_disconnect: new_state 0x%x, flag_bits 0x%x, vscsi->flags 0x%x, state %hx\n", + new_state, flag_bits, vscsi->flags, vscsi->state); if (!(vscsi->flags & (DISCONNECT_SCHEDULED | SCHEDULE_DISCONNECT))) { vscsi->flags |= SCHEDULE_DISCONNECT; @@ -648,8 +649,8 @@ static void ibmvscsis_post_disconnect(struct scsi_info *vscsi, uint new_state, } } - pr_debug("Leaving post_disconnect: flags 0x%x, new_state 0x%x\n", -vscsi->flags, vscsi->new_state); + dev_dbg(&vscsi->dev, "Leaving post_disconnect: flags 0x%x, new_state 0x%x\n", + vscsi->flags, vscsi->new_state); } /** @@ -724,7 +725,8 @@ static long ibmvscsis_handle_init_msg(struct scsi_info *vscsi)
Re: [EXT] Re: UFS utilities
On Mon, Dec 04, 2017 at 03:20:34PM +, Bean Huo (beanhuo) wrote:
> Hi, Bart
> Sorry for later!
> >
> >Hello Bean,
> >
> >Please be more specific. What is inconvenient about sg3_utils on embedded
> >ARM systems?
> >
> Exactly, I don't know how to compile sg3_utils with static library, instead
> of sharing library. I used following configuration
> Parameter:
> ./configure --enable-static=yes --build=x86_64-unknown-linux-gnu
> --host=aarch64-linux-gnu --prefix=$PWD/out/ CC=aarch64-linux-gnu-g
> cc --target=aarch64-linux-gnu LD=aarch64-linux-gnu-ld
> AS=aarch64-linux-gnu-as CFLAGS=-static LDFLAGS=-static
>
> But the object files are still dynamically linked.
>
> ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked,
> interpreter /lib/ld-linux-aarch64.so.1,
> for GNU/Linux 3.7.0, BuildID[sha1]=4f01b4c9f1ff47bc00aef93950c02734b4cc8e57,
> not stripped.
>
> I want it to be statically linked. Otherwise, I should copy its library to my
> lib folder, and sometimes for the embedded,
> Need to re-build rootfs. Meanwhile, for the UFS, there are totally 27 scsi
> commands being used based on UFS2.1.
> For the most case, we just use several sg3_utils object files, don't need to
> copy all object files to the ending product.

So what UFS commands are you missing that you need to see implemented?

And again, have you checked the different forks of the driver?

> >> And also it doesn't support several UFS special command.
> >
> >Are you referring to SCSI commands or rather to UFS commands that fall
> >outside the SCSI spec? Anyway, an approach that is used by many SCSI drivers
> >to export information to user space that falls outside the SCSI spec is to
> >create additional sysfs attributes. See also the sdev_attrs and shost_attrs
> >members of struct scsi_host_template.
> >
> Yes, for the UFS information, I can use these interface/approach to easily
> get.
> I am thinking how about some testing case and configuration operation.

Which ones exactly?

> Also, is it possible bypass SCSI stacks and go into directly UFS stack?

Look at the different sysfs files for the UFS device, it does that for
some commands.

thanks,

greg k-h
[PATCH 1/3] scsi_get_device_flags_keyed(): Always return device flags
Since scsi_get_device_flags_keyed() callers do not check whether or not
the returned value is an error code, change that function such that it
returns a flags value even if the 'key' argument is invalid.
Note: since commit 28a0bc4120d3 ("scsi: sd: Implement blacklist option
for WRITE SAME w/ UNMAP") bit 31 is a valid device information flag so
checking whether bit 31 is set in the return value is not sufficient to
tell the difference between an error code and a flags value.

Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
---
 drivers/scsi/scsi_devinfo.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index 78d4aa8df675..2fed250e87bf 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -599,17 +599,12 @@ blist_flags_t scsi_get_device_flags_keyed(struct scsi_device *sdev,
 				int key)
 {
 	struct scsi_dev_info_list *devinfo;
-	int err;

 	devinfo = scsi_dev_info_list_find(vendor, model, key);
 	if (!IS_ERR(devinfo))
 		return devinfo->flags;

-	err = PTR_ERR(devinfo);
-	if (err != -ENOENT)
-		return err;
-
-	/* nothing found, return nothing */
+	/* key or device not found: return nothing */
 	if (key != SCSI_DEVINFO_GLOBAL)
 		return 0;

--
2.15.0
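The note about bit 31 in the patch above can be illustrated with a small userspace sketch. Everything in it is an assumption made for illustration: flags_sketch_t is a 32-bit stand-in for blist_flags_t, FLAG_BIT31_SKETCH is a hypothetical flag occupying bit 31, and -EINVAL is just one example of a negative errno that the old code path could have returned through the flags type.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit stand-in for blist_flags_t. */
typedef uint32_t flags_sketch_t;

/* Hypothetical device flag that occupies bit 31. */
#define FLAG_BIT31_SKETCH ((flags_sketch_t)1u << 31)

/* The kind of check a caller might try: "bit 31 set means error code". */
static bool looks_like_error(flags_sketch_t v)
{
	return (v & FLAG_BIT31_SKETCH) != 0;
}

/* Old shape: a negative errno squeezed through the flags return type. */
static flags_sketch_t lookup_old_invalid_key(void)
{
	return (flags_sketch_t)(-EINVAL);
}

/* New shape: an invalid key simply yields "no flags". */
static flags_sketch_t lookup_new_invalid_key(void)
{
	return 0;
}
```

In the sketch, both the converted errno and a perfectly valid flags value with bit 31 set satisfy looks_like_error(), so a caller cannot tell them apart; returning 0 for an invalid key avoids the ambiguity.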
[PATCH 2/3] Use blist_flags_t consistently
Use the type blist_flags_t for all variables that represent blacklist flags. Additionally, suppress recently introduced sparse warnings related to blacklist flags. Fixes: commit c6b54164508a ("scsi: Use 'blist_flags_t' for scsi_devinfo flags") Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn --- drivers/scsi/scsi_devinfo.c | 8 +++- drivers/scsi/scsi_scan.c | 13 +++-- drivers/scsi/scsi_sysfs.c | 4 ++-- drivers/scsi/scsi_transport_spi.c | 12 +++- 4 files changed, 19 insertions(+), 18 deletions(-) diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c index 2fed250e87bf..4b33c001ae23 100644 --- a/drivers/scsi/scsi_devinfo.c +++ b/drivers/scsi/scsi_devinfo.c @@ -382,10 +382,8 @@ int scsi_dev_info_list_add_keyed(int compatible, char *vendor, char *model, model, compatible); if (strflags) - devinfo->flags = simple_strtoul(strflags, NULL, 0); - else - devinfo->flags = flags; - + flags = (__force blist_flags_t)simple_strtoul(strflags, NULL, 0); + devinfo->flags = flags; devinfo->compatible = compatible; if (compatible) @@ -612,7 +610,7 @@ blist_flags_t scsi_get_device_flags_keyed(struct scsi_device *sdev, if (sdev->sdev_bflags) return sdev->sdev_bflags; - return scsi_default_dev_flags; + return (__force blist_flags_t)scsi_default_dev_flags; } EXPORT_SYMBOL(scsi_get_device_flags_keyed); diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index be5e919db0e8..0880d975eed3 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -770,7 +770,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result, * SCSI_SCAN_LUN_PRESENT: a new scsi_device was allocated and initialized **/ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result, - int *bflags, int async) + blist_flags_t *bflags, int async) { int ret; @@ -1049,14 +1049,15 @@ static unsigned char *scsi_inq_str(unsigned char *buf, unsigned char *inq, * - SCSI_SCAN_LUN_PRESENT: a new scsi_device 
was allocated and initialized **/ static int scsi_probe_and_add_lun(struct scsi_target *starget, - u64 lun, int *bflagsp, + u64 lun, blist_flags_t *bflagsp, struct scsi_device **sdevp, enum scsi_scan_mode rescan, void *hostdata) { struct scsi_device *sdev; unsigned char *result; - int bflags, res = SCSI_SCAN_NO_RESPONSE, result_len = 256; + blist_flags_t bflags; + int res = SCSI_SCAN_NO_RESPONSE, result_len = 256; struct Scsi_Host *shost = dev_to_shost(starget->dev.parent); /* @@ -1201,7 +1202,7 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget, * Modifies sdevscan->lun. **/ static void scsi_sequential_lun_scan(struct scsi_target *starget, -int bflags, int scsi_level, +blist_flags_t bflags, int scsi_level, enum scsi_scan_mode rescan) { uint max_dev_lun; @@ -1292,7 +1293,7 @@ static void scsi_sequential_lun_scan(struct scsi_target *starget, * 0: scan completed (or no memory, so further scanning is futile) * 1: could not scan with REPORT LUN **/ -static int scsi_report_lun_scan(struct scsi_target *starget, int bflags, +static int scsi_report_lun_scan(struct scsi_target *starget, blist_flags_t bflags, enum scsi_scan_mode rescan) { unsigned char scsi_cmd[MAX_COMMAND_SIZE]; @@ -1538,7 +1539,7 @@ static void __scsi_scan_target(struct device *parent, unsigned int channel, unsigned int id, u64 lun, enum scsi_scan_mode rescan) { struct Scsi_Host *shost = dev_to_shost(parent); - int bflags = 0; + blist_flags_t bflags = 0; int res; struct scsi_target *starget; diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 50e7d7e4a861..6ee964643531 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -967,7 +967,7 @@ sdev_show_wwid(struct device *dev, struct device_attribute *attr, } static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL); -#define BLIST_FLAG_NAME(name) [ilog2(BLIST_##name)] = #name +#define BLIST_FLAG_NAME(name) [ilog2((__force u32)BLIST_##name)] = #name static const char *const sdev_bflags_name[] = { #include 
"scsi_devinfo_tbl.c" }; @@ -984,7 +984,7 @@ sdev_show_blacklist(struct device *dev, struct device_attribute *attr, for (i = 0; i < sizeof(sdev->sdev_bflags) * BITS_PER_BYTE; i++) { const char *name = NULL; - if (!(sdev->sdev_b
[PATCH 0/3] SCSI device blacklist handling improvements
Hello Martin,

These three patches are what I came up with after reviewing recent changes in the code that handles blacklist flags. Please consider these patches for kernel v4.16.

Note: since patch "Use 'blist_flags_t' for scsi_devinfo flags" is not yet in the 4.16/scsi-queue branch, I have developed these patches on top of a merge of the v4.15-rc1 release and your 4.16/scsi-queue branch.

Thanks,

Bart.

Bart Van Assche (3):
  scsi_get_device_flags_keyed(): Always return device flags
  Use blist_flags_t consistently
  Introduce scsi_devinfo_key enumeration type

 drivers/scsi/scsi_devinfo.c       | 29 -
 drivers/scsi/scsi_priv.h          | 14 --
 drivers/scsi/scsi_scan.c          | 13 +++--
 drivers/scsi/scsi_sysfs.c         |  4 ++--
 drivers/scsi/scsi_transport_spi.c | 12 +++-
 5 files changed, 36 insertions(+), 36 deletions(-)

--
2.15.0
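Patch 2/3 of this series adds __force casts wherever a plain integer is converted to or from blist_flags_t. As a rough illustration of why sparse wants those casts, here is a minimal userspace sketch of a sparse-checked bitwise type. The names demo_bitwise, demo_force, demo_flags_t and the DEMO_* flags are made up for this example and are not kernel identifiers; the annotations expand to nothing unless sparse (which defines __CHECKER__) processes the file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's sparse annotations. */
#ifdef __CHECKER__
#define demo_bitwise __attribute__((bitwise))
#define demo_force   __attribute__((force))
#else
#define demo_bitwise
#define demo_force
#endif

/* A flag type that sparse refuses to mix silently with plain integers. */
typedef uint32_t demo_bitwise demo_flags_t;

/* Flag constants are created with an explicit, forced cast. */
#define DEMO_NOLUN    ((demo_force demo_flags_t)(1u << 0))
#define DEMO_FORCELUN ((demo_force demo_flags_t)(1u << 1))

/* Integer -> flags: the cast documents that the conversion is intended. */
static demo_flags_t demo_parse_flags(const char *s)
{
	assert(s != NULL);
	return (demo_force demo_flags_t)strtoul(s, NULL, 0);
}

/* Flags -> integer, e.g. before doing bit-position arithmetic. */
static uint32_t demo_flags_to_u32(demo_flags_t f)
{
	return (demo_force uint32_t)f;
}
```

With a regular compiler this behaves like an ordinary uint32_t; the extra checking only happens when the file is run through sparse.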
[PATCH 3/3] Introduce scsi_devinfo_key enumeration type
Since symbolic names for the device information keys already exist, associate an enumeration type with these symbolic values. This change makes it clear what the valid values for the 'key' arguments are. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn --- drivers/scsi/scsi_devinfo.c | 14 -- drivers/scsi/scsi_priv.h| 14 -- 2 files changed, 16 insertions(+), 12 deletions(-) diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c index 4b33c001ae23..82bc807e1d50 100644 --- a/drivers/scsi/scsi_devinfo.c +++ b/drivers/scsi/scsi_devinfo.c @@ -361,7 +361,8 @@ static int scsi_dev_info_list_add(int compatible, char *vendor, char *model, * Returns: 0 OK, -error on failure. **/ int scsi_dev_info_list_add_keyed(int compatible, char *vendor, char *model, -char *strflags, blist_flags_t flags, int key) +char *strflags, blist_flags_t flags, +enum scsi_devinfo_key key) { struct scsi_dev_info_list *devinfo; struct scsi_dev_info_list_table *devinfo_table = @@ -410,7 +411,7 @@ EXPORT_SYMBOL(scsi_dev_info_list_add_keyed); * Returns: pointer to matching entry, or ERR_PTR on failure. **/ static struct scsi_dev_info_list *scsi_dev_info_list_find(const char *vendor, - const char *model, int key) + const char *model, enum scsi_devinfo_key key) { struct scsi_dev_info_list *devinfo; struct scsi_dev_info_list_table *devinfo_table = @@ -492,7 +493,8 @@ static struct scsi_dev_info_list *scsi_dev_info_list_find(const char *vendor, * * Returns: 0 OK, -error on failure. 
**/ -int scsi_dev_info_list_del_keyed(char *vendor, char *model, int key) +int scsi_dev_info_list_del_keyed(char *vendor, char *model, +enum scsi_devinfo_key key) { struct scsi_dev_info_list *found; @@ -594,7 +596,7 @@ blist_flags_t scsi_get_device_flags(struct scsi_device *sdev, blist_flags_t scsi_get_device_flags_keyed(struct scsi_device *sdev, const unsigned char *vendor, const unsigned char *model, - int key) + enum scsi_devinfo_key key) { struct scsi_dev_info_list *devinfo; @@ -776,7 +778,7 @@ void scsi_exit_devinfo(void) * Adds the requested list, returns zero on success, -EEXIST if the * key is already registered to a list, or other error on failure. */ -int scsi_dev_info_add_list(int key, const char *name) +int scsi_dev_info_add_list(enum scsi_devinfo_key key, const char *name) { struct scsi_dev_info_list_table *devinfo_table = scsi_devinfo_lookup_by_key(key); @@ -808,7 +810,7 @@ EXPORT_SYMBOL(scsi_dev_info_add_list); * frees the list itself. Returns 0 on success or -EINVAL if the key * can't be found. 
*/ -int scsi_dev_info_remove_list(int key) +int scsi_dev_info_remove_list(enum scsi_devinfo_key key) { struct list_head *lh, *lh_next; struct scsi_dev_info_list_table *devinfo_table = diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h index a5946cd64caa..61024db5953d 100644 --- a/drivers/scsi/scsi_priv.h +++ b/drivers/scsi/scsi_priv.h @@ -45,7 +45,7 @@ static inline void scsi_log_completion(struct scsi_cmnd *cmd, int disposition) /* scsi_devinfo.c */ /* list of keys for the lists */ -enum { +enum scsi_devinfo_key { SCSI_DEVINFO_GLOBAL = 0, SCSI_DEVINFO_SPI, }; @@ -56,13 +56,15 @@ extern blist_flags_t scsi_get_device_flags(struct scsi_device *sdev, extern blist_flags_t scsi_get_device_flags_keyed(struct scsi_device *sdev, const unsigned char *vendor, const unsigned char *model, -int key); +enum scsi_devinfo_key key); extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor, char *model, char *strflags, - blist_flags_t flags, int key); -extern int scsi_dev_info_list_del_keyed(char *vendor, char *model, int key); -extern int scsi_dev_info_add_list(int key, const char *name); -extern int scsi_dev_info_remove_list(int key); + blist_flags_t flags, + enum scsi_devinfo_key key); +extern int scsi_dev_info_list_del_keyed(char *vendor, char *model, + enum scsi_devinfo_key key); +extern int scsi_dev_info_add_list(enum scsi_devinfo_key key, const char *name); +extern int scsi_dev_info_remove_list(enum scsi_devinfo_key key); extern int __init scsi_init_devinfo(void); extern void scsi_exit_devinfo(void); -- 2.15.0
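To illustrate the point of the patch above, here is a small userspace sketch of what a named enumeration type buys over a bare int parameter. The names demo_devinfo_key, DEMO_DEVINFO_* and demo_key_name() are invented for this example; only the general idea (a named enum for the 'key' argument) comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counterpart of the enumeration introduced by the patch. */
enum demo_devinfo_key {
	DEMO_DEVINFO_GLOBAL = 0,
	DEMO_DEVINFO_SPI,
};

/*
 * Because the parameter has an enum type, the prototype alone tells the
 * caller which values are valid, and compilers can warn (for example with
 * -Wswitch) if a newly added enumerator is not handled below.
 */
static const char *demo_key_name(enum demo_devinfo_key key)
{
	switch (key) {
	case DEMO_DEVINFO_GLOBAL:
		return "global";
	case DEMO_DEVINFO_SPI:
		return "spi";
	}
	assert(!"unknown key");
	return NULL;
}
```

Note that C still converts integers to enum types implicitly, so the benefit is mostly documentation and better diagnostics rather than strict type safety.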
[PATCH v3 1/2] Ensure that the SCSI error handler gets woken up
If scsi_eh_scmd_add() is called concurrently with scsi_host_queue_ready() while shost->host_blocked > 0 then it can happen that neither function wakes up the SCSI error handler. Fix this by making every function that decreases the host_busy counter wake up the error handler if necessary and by protecting the host_failed checks with the SCSI host lock. Reported-by: Pavel Tikhomirov References: https://marc.info/?l=linux-kernel&m=150461610630736 Fixes: commit 746650160866 ("scsi: convert host_busy to atomic_t") Signed-off-by: Bart Van Assche Cc: Konstantin Khorenko Cc: Stuart Hayes Cc: Pavel Tikhomirov Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn Cc: --- drivers/scsi/hosts.c | 6 ++ drivers/scsi/scsi_error.c | 18 -- drivers/scsi/scsi_lib.c | 39 --- include/scsi/scsi_host.h | 2 ++ 4 files changed, 52 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index a306af6a5ea7..a0a7e4ff255c 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -324,6 +324,9 @@ static void scsi_host_dev_release(struct device *dev) scsi_proc_hostdir_rm(shost->hostt); + /* Wait for functions invoked through call_rcu(&shost->rcu, ...) 
*/ + rcu_barrier(); + if (shost->tmf_work_q) destroy_workqueue(shost->tmf_work_q); if (shost->ehandler) @@ -331,6 +334,8 @@ static void scsi_host_dev_release(struct device *dev) if (shost->work_q) destroy_workqueue(shost->work_q); + destroy_rcu_head(&shost->rcu); + if (shost->shost_state == SHOST_CREATED) { /* * Free the shost_dev device name here if scsi_host_alloc() @@ -399,6 +404,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) INIT_LIST_HEAD(&shost->starved_list); init_waitqueue_head(&shost->host_wait); mutex_init(&shost->scan_mutex); + init_rcu_head(&shost->rcu); index = ida_simple_get(&host_index_ida, 0, 0, GFP_KERNEL); if (index < 0) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 5e89049e9b4e..258b8a741992 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -226,6 +226,17 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd) } } +static void scsi_eh_inc_host_failed(struct rcu_head *head) +{ + struct Scsi_Host *shost = container_of(head, typeof(*shost), rcu); + unsigned long flags; + + spin_lock_irqsave(shost->host_lock, flags); + shost->host_failed++; + scsi_eh_wakeup(shost); + spin_unlock_irqrestore(shost->host_lock, flags); +} + /** * scsi_eh_scmd_add - add scsi cmd to error handling. * @scmd: scmd to run eh on. @@ -248,9 +259,12 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd) scsi_eh_reset(scmd); list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); - shost->host_failed++; - scsi_eh_wakeup(shost); spin_unlock_irqrestore(shost->host_lock, flags); + /* +* Ensure that all tasks observe the host state change before the +* host_failed change. 
+*/ + call_rcu(&shost->rcu, scsi_eh_inc_host_failed); } /** diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index b6d3842b6809..5cbc69b2b1ae 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -318,22 +318,39 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd) cmd->cmd_len = scsi_command_size(cmd->cmnd); } -void scsi_device_unbusy(struct scsi_device *sdev) +/* + * Decrement the host_busy counter and wake up the error handler if necessary. + * Avoid as follows that the error handler is not woken up if shost->host_busy + * == shost->host_failed: use call_rcu() in scsi_eh_scmd_add() in combination + * with an RCU read lock in this function to ensure that this function in its + * entirety either finishes before scsi_eh_scmd_add() increases the + * host_failed counter or that it notices the shost state change made by + * scsi_eh_scmd_add(). + */ +static void scsi_dec_host_busy(struct Scsi_Host *shost) { - struct Scsi_Host *shost = sdev->host; - struct scsi_target *starget = scsi_target(sdev); unsigned long flags; + rcu_read_lock(); atomic_dec(&shost->host_busy); - if (starget->can_queue > 0) - atomic_dec(&starget->target_busy); - - if (unlikely(scsi_host_in_recovery(shost) && -(shost->host_failed || shost->host_eh_scheduled))) { + if (unlikely(scsi_host_in_recovery(shost))) { spin_lock_irqsave(shost->host_lock, flags); - scsi_eh_wakeup(shost); + if (shost->host_failed || shost->host_eh_scheduled) + scsi_eh_wakeup(shost); spin_unlock_irqrestore(shost->host_lock, flags); } + rcu_read_unlock(); +} + +voi
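The wakeup rule that this patch relies on (wake the error handler once host_busy equals host_failed, and re-check that condition every time host_busy is decremented) can be modelled in a few lines of userspace C. This is only a single-threaded sketch of the counter logic: the host lock, the RCU ordering and host_eh_scheduled are deliberately left out, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_host {
	int host_busy;    /* commands currently owned by the driver */
	int host_failed;  /* commands handed over to the error handler */
	bool in_recovery; /* host state indicates pending error handling */
	int eh_wakeups;   /* how often the handler would have been woken */
};

/* Wake the handler only once every busy command has failed. */
static void demo_eh_wakeup(struct demo_host *h)
{
	if (h->host_busy == h->host_failed)
		h->eh_wakeups++;
}

/* Completion path: drop the busy count, then re-check the condition. */
static void demo_dec_host_busy(struct demo_host *h)
{
	assert(h->host_busy > 0);
	h->host_busy--;
	if (h->in_recovery && h->host_failed)
		demo_eh_wakeup(h);
}

/* Failure path: enter recovery, count the failure, then check. */
static void demo_cmd_failed(struct demo_host *h)
{
	h->in_recovery = true;
	h->host_failed++;
	demo_eh_wakeup(h);
}

/* Two commands in flight: one fails, the other completes afterwards. */
static int demo_scenario(void)
{
	struct demo_host h = { .host_busy = 2 };

	demo_cmd_failed(&h);    /* busy=2, failed=1: no wakeup yet */
	demo_dec_host_busy(&h); /* busy=1, failed=1: handler is woken */
	return h.eh_wakeups;
}
```

In this model the wakeup comes from the completion path, which is why every function that decreases host_busy has to perform the check. The real code additionally has to make sure that the two paths cannot both miss the condition when they run concurrently.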
[PATCH v3 0/2] Ensure that the SCSI error handler gets woken up
Hello Martin,

As reported by Pavel Tikhomirov it can happen that the SCSI error handler does not get woken up. This is very annoying because it results in a queue stall. The two patches in this series address this issue without acquiring the SCSI host lock in the hot path. Please consider these patches for kernel v4.16.

Thanks,

Bart.

Changes between v2 and v3:
- Made it again safe to call scsi_eh_scmd_add() from interrupt context.

Changes between v1 and v2:
- Ensure that host_lock is held while checking host_failed.
- Moved the lockdep_assert_held() change into a separate patch.

Bart Van Assche (2):
  Ensure that the SCSI error handler gets woken up
  Convert a source code comment into a runtime check

 drivers/scsi/hosts.c      |  6 ++
 drivers/scsi/scsi_error.c | 21 ++---
 drivers/scsi/scsi_lib.c   | 39 ---
 include/scsi/scsi_host.h  |  2 ++
 4 files changed, 54 insertions(+), 14 deletions(-)

--
2.15.0
[PATCH v3 2/2] Convert a source code comment into a runtime check
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
---
 drivers/scsi/scsi_error.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 258b8a741992..9cae0194e21a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -61,9 +61,10 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
 static int scsi_try_to_abort_cmd(struct scsi_host_template *,
 				 struct scsi_cmnd *);
 
-/* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
+	lockdep_assert_held(shost->host_lock);
+
 	if (atomic_read(&shost->host_busy) == shost->host_failed) {
 		trace_scsi_eh_wakeup(shost);
 		wake_up_process(shost->ehandler);
--
2.15.0
[PATCH] blk-mq: Fix several SCSI request queue lockups
Commit 0df21c86bdbf introduced several bugs:
* A SCSI queue stall for queue depths > 1, addressed by commit
  88022d7201e9 ("blk-mq: don't handle failure in .get_budget")
* A systematic lockup for SCSI queues with queue depth 1. The
  following test reproduces that bug systematically:
  - Change the SRP initiator such that SCSI target queue depth is
    limited to 1.
  - Run the following command:
      srp-test/run_tests -f xfs -d -e none -r 60 -t 01
  See also "[PATCH 4/7] blk-mq: Avoid that request processing stalls
  when sharing tags"
  (https://marc.info/?l=linux-block&m=151208695316857).
  Note: reverting commit 0df21c86bdbf also fixes a sporadic SCSI
  request queue lockup while inserting a
  blk_mq_sched_mark_restart_hctx() before all
  blk_mq_dispatch_rq_list() calls only fixes the systematic lockup
  for queue depth 1.
* A scsi_debug lockup - see also "[PATCH] SCSI: delay run queue if
  device is blocked in scsi_dev_queue_ready()"
  (https://marc.info/?l=linux-block&m=151223233407154).

I think the above means that it is too risky to try to fix all bugs
introduced by commit 0df21c86bdbf before kernel v4.15 is released.
Hence revert that commit.

Fixes: commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
Signed-off-by: Bart Van Assche
Cc: Ming Lei
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: James E.J. Bottomley
Cc: Martin K. Petersen
Cc: linux-scsi@vger.kernel.org
---
 drivers/scsi/scsi_lib.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 84bd2b16d216..a7e7966f1477 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1976,9 +1976,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost = sdev->host;
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
-	blk_status_t ret;
+	blk_status_t ret = BLK_STS_RESOURCE;
 	int reason;
 
+	if (!scsi_mq_get_budget(hctx))
+		goto out;
 	ret = prep_to_mq(scsi_prep_state_check(sdev, req));
 	if (ret != BLK_STS_OK)
 		goto out_put_budget;
@@ -2022,6 +2024,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 		atomic_dec(&scsi_target(sdev)->target_busy);
 out_put_budget:
 	scsi_mq_put_budget(hctx);
+out:
 	switch (ret) {
 	case BLK_STS_OK:
 		break;
@@ -2225,8 +2228,6 @@ struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
 }
 
 static const struct blk_mq_ops scsi_mq_ops = {
-	.get_budget	= scsi_mq_get_budget,
-	.put_budget	= scsi_mq_put_budget,
 	.queue_rq	= scsi_queue_rq,
 	.complete	= scsi_softirq_done,
 	.timeout	= scsi_timeout,
--
2.15.0
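The control flow that this revert restores (take the budget at the start of the queue_rq callback, and release it only on the error paths that run after it was taken) can be sketched in userspace as follows. This is a simplified model, not the kernel code: demo_dev, demo_get_budget(), demo_put_budget() and demo_queue_rq() are hypothetical names, and the "budget" is reduced to a per-device in-flight counter.

```c
#include <assert.h>

enum demo_status { DEMO_OK, DEMO_RESOURCE };

struct demo_dev {
	int busy;        /* commands currently in flight */
	int queue_depth; /* maximum number of in-flight commands */
};

/* Returns 1 and reserves a slot if the device can take another command. */
static int demo_get_budget(struct demo_dev *d)
{
	if (d->busy >= d->queue_depth)
		return 0;
	d->busy++;
	return 1;
}

/* Releases a slot; also what a command completion would do in this model. */
static void demo_put_budget(struct demo_dev *d)
{
	assert(d->busy > 0);
	d->busy--;
}

static enum demo_status demo_queue_rq(struct demo_dev *d, int prep_ok)
{
	enum demo_status ret = DEMO_RESOURCE;

	if (!demo_get_budget(d))
		goto out;            /* nothing acquired, nothing to release */
	if (!prep_ok)
		goto out_put_budget; /* later failure: give the slot back */
	ret = DEMO_OK;               /* slot stays reserved until completion */
	goto out;

out_put_budget:
	demo_put_budget(d);
out:
	return ret;
}
```

The separate "out" label mirrors the one added by the patch: a request that never obtained the budget must skip the put, otherwise the counter would be decremented once too often.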
RE: [EXT] Re: UFS utilities
Hi Bart,

Sorry for the late reply!

>
>Hello Bean,
>
>Please be more specific. What is inconvenient about sg3_utils on embedded
>ARM systems?
>

To be exact, I don't know how to build sg3_utils as statically linked binaries instead of against shared libraries. I used the following configure parameters:

./configure --enable-static=yes --build=x86_64-unknown-linux-gnu --host=aarch64-linux-gnu --prefix=$PWD/out/ CC=aarch64-linux-gnu-gcc --target=aarch64-linux-gnu LD=aarch64-linux-gnu-ld AS=aarch64-linux-gnu-as CFLAGS=-static LDFLAGS=-static

But the resulting binaries are still dynamically linked:

ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, BuildID[sha1]=4f01b4c9f1ff47bc00aef93950c02734b4cc8e57, not stripped.

I want them to be statically linked. Otherwise I have to copy the library to my lib folder, and on embedded systems that sometimes means rebuilding the rootfs. Meanwhile, UFS 2.1 uses 27 SCSI commands in total. In most cases we only use a few of the sg3_utils binaries, so we don't need to copy all of them to the end product.

>> And also it doesn't support several UFS special command.
>
>Are you referring to SCSI commands or rather to UFS commands that fall
>outside the SCSI spec? Anyway, an approach that is used by many SCSI drivers
>to export information to user space that falls outside the SCSI spec is to
>create additional sysfs attributes. See also the sdev_attrs and shost_attrs
>members of struct scsi_host_template.
>

Yes, I can easily get the UFS information through these interfaces. What I am wondering about is test cases and configuration operations. Also, is it possible to bypass the SCSI stack and go directly into the UFS stack?

Thanks for your input.

>Bart.

//Bean Huo
Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()
On Sun, 2017-12-03 at 00:31 +0800, Ming Lei wrote:
> Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")

It might be safer to revert commit 0df21c86bdbf instead of trying to fix all issues introduced by that commit for kernel version v4.15 ...

Bart.
[PATCH] scsi: bfa: convert to strlcpy/strlcat
The bfa driver has a number of real issues with string termination that gcc-8 now points out:

drivers/scsi/bfa/bfad_bsg.c: In function 'bfad_iocmd_port_get_attr':
drivers/scsi/bfa/bfad_bsg.c:320:9: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_psymb_init':
drivers/scsi/bfa/bfa_fcs.c:775:9: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:781:9: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:788:9: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:801:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:808:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_nsymb_init':
drivers/scsi/bfa/bfa_fcs.c:837:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:844:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:852:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_psymb_init':
drivers/scsi/bfa/bfa_fcs.c:778:2: error: 'strncat' output may be truncated copying 10 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:784:2: error: 'strncat' output may be truncated copying 30 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:803:3: error: 'strncat' output may be truncated copying 44 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:811:3: error: 'strncat' output may be truncated copying 16 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_nsymb_init':
drivers/scsi/bfa/bfa_fcs.c:840:2: error: 'strncat' output may be truncated copying 10 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:847:2: error: 'strncat' output may be truncated copying 30 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_fdmi_get_hbaattr':
drivers/scsi/bfa/bfa_fcs_lport.c:2657:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs_lport.c:2659:11: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ms_gmal_response':
drivers/scsi/bfa/bfa_fcs_lport.c:3232:5: error: 'strncpy' output may be truncated copying 16 bytes from a string of length 247 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ns_send_rspn_id':
drivers/scsi/bfa/bfa_fcs_lport.c:4670:3: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c:4682:3: error: 'strncat' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ns_util_send_rspn_id':
drivers/scsi/bfa/bfa_fcs_lport.c:5206:3: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c:5215:3: error: 'strncat' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_fdmi_get_portattr':
drivers/scsi/bfa/bfa_fcs_lport.c:2751:2: error: 'strncpy' specified bound 128 equals destination size [-Werror=stringop-
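For readers less familiar with the difference between the two families of functions, the following userspace sketch shows the semantics that make strlcpy()-style copies attractive here: the bound is the size of the destination, and the result is always NUL-terminated. demo_strlcpy() is a hypothetical stand-in written for this example, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, which has room for size bytes. Unlike strncpy(),
 * the result is always NUL-terminated when size > 0. The return value
 * is strlen(src), so a result >= size tells the caller about truncation.
 */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	len = strlen(src);
	if (size) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

A caller passes sizeof(destination) as the bound, which is exactly what the 'sizeof' warnings quoted above ask for, and can compare the return value against that size if truncation matters.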
Driver version for PMC Adaptec HBA in Linux and from vendor
Dear Raghava, dear Linux folks,

We are evaluating HBA extension cards, and one of our key requirements is easy maintenance, especially when upgrading the firmware. You provide the utility `arcconf` [1], which can be used for such tasks directly on the command line.

Unfortunately, we can’t find the source code for this application, which is something we’d like to have when executing programs with root privileges. It’d be great to have something similar to flashrom [2], or the source of your program.

Do you know the reasons why the source of this utility is not published under a free license? Who can be contacted to discuss this issue further?

Kind regards,

Paul

[1] http://download.adaptec.com/raid/storage_manager/arcconf_v2_05_22932.zip
[2] https://www.flashrom.org/
[Bug 198081] scsi sg
https://bugzilla.kernel.org/show_bug.cgi?id=198081 --- Comment #1 from Cristian Crinteanu (crinteanu.crist...@gmail.com) --- *** Bug 198079 has been marked as a duplicate of this bug. *** -- You are receiving this mail because: You are the assignee for the bug.
[Bug 198081] New: scsi sg
https://bugzilla.kernel.org/show_bug.cgi?id=198081

Bug ID: 198081
Summary: scsi sg
Product: IO/Storage
Version: 2.5
Kernel Version: 4.4.89 and higher
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: SCSI
Assignee: linux-scsi@vger.kernel.org
Reporter: crinteanu.crist...@gmail.com
Regression: No

Created attachment 261015
  --> https://bugzilla.kernel.org/attachment.cgi?id=261015&action=edit
scsi sg problems

hi,

just noticed that one of the following commits introduced in 4.4.89 breaks some apps (like nerolinux and norton ghost 12).

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3e0097499839e0fe3af380410eababe5a47c4cf9
https://lkml.org/lkml/2017/9/24/457
https://patchwork.kernel.org/patch/9968779/
https://lkml.org/lkml/2017/9/24/356

nerolinux fails to detect any drives on my pc with

Call Trace:
[] ? dump_stack+0x44/0x57
[] ? warn_slowpath_common+0x85/0x9a
[] ? sg_rq_end_io+0x42/0x277
[] ? warn_slowpath_null+0xd/0x10
[] ? sg_rq_end_io+0x42/0x277
[] ? blk_account_io_done+0xc/0xea
[] ? blk_finish_request+0x63/0x84
[] ? scsi_end_request+0x11a/0x155
[] ? scsi_io_completion+0x1c7/0x4aa
[] ? blk_done_softirq+0x56/0x70
[] ? __do_softirq+0xa6/0x18b
[] ? tasklet_action+0x90/0x90
[] ? do_softirq_own_stack+0x1a/0x1f
[] ? irq_exit+0x3c/0x7b
[] ? do_IRQ+0x7a/0x8b
[] ? common_interrupt+0x33/0x38
[] ? unregister_console+0x87/0xad
[] ? cpuidle_enter_state+0xdb/0x17d
[] ? cpu_startup_entry+0x17f/0x1e2
[] ? start_secondary+0x13b/0x153
---[ end trace 36fc4958c27e64df ]---

and norton ghost 12

*
Date : Wed Nov 29 22:34:34 2017
Error Number: 36000
Message: A signal or windows exception occurred
Version: 12.0.0.10561(Nov 2 2017, Build=10561
OS Version: Linux 4.4.100-pclos1.pae #1 SMP Tue Nov 21 16:10:25 EET 2017 i686
Command line arguments:
Active Switches :
  AutoName
PathName:
FlagImplode : 0
FlagExplode : 0

Operation Details :
  Total size.0
  MB copied..0
  MB remaining...0
  Percent complete...0%
  Speed..0 MB/min
  Time elapsed...0:00
  Time remaining.0:00

Processor exception
Generated at HardExceptionHandlerLinux.cpp:230

Program Call Stack
  VolumeContainerManagerLvm::addIfPv
  VolumeContainerManagerLvm::loadPvs
  VolumeContainerManagerLvm::loadVolumeGroups
  VolumeContainerManagerLvm::loadVolumeGroups
  VolumeContainerManagerLvm::constructor
  VolumeContainerManager::loadVolumeManagers
  VolumeContainerManager::constructor
  sub_main
  main

Call Stack
  0xeba0f62a 0xb77e8a0c 0x084644f6 0x084645ce 0x086ad8e0 0x086b3aac 0x0870f00e 0x0870f3a3 0x0870f51b 0x0870f7ab 0x0870f8de 0x086ff373 0x086ffdab 0x08700039 0x08284c59 0x08286679 0x082893df 0x08052d25 0x080541b4
End Call Stack

Anyway, I reverted the "scsi: sg:" changes introduced in 4.4.89 and everything works fine (see the attachment). I'm now running 4.4.100 with the attached patch.

thx!

--
You are receiving this mail because:
You are the assignee for the bug.
Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()
On Mon, Dec 04, 2017 at 09:44:55AM +0100, Johannes Thumshirn wrote:
> Ming Lei writes:
> >
> > I am happy to do that, but recently I am very busy, so it may be done
> > a bit late by me.
> >
> > But anyone should reproduce the issue 100% with V4.15-rc kernel by just
> > running the above script, not any specific hardware is required at all,
> > so that means anyone can make a patch for blktest to test block/SCSI
> > timeout if he/she is interested in doing that.
>
> OK, let me see if I can spend some time on this the next days.

That is great, thanks!

--
Ming
Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()
Ming Lei writes:
> I am happy to do that, but recently I am very busy, so it may be done
> a bit late by me.
>
> But anyone should reproduce the issue 100% with V4.15-rc kernel by just
> running the above script, not any specific hardware is required at all,
> so that means anyone can make a patch for blktest to test block/SCSI
> timeout if he/she is interested in doing that.

OK, let me see if I can spend some time on this the next days.

Byte,
Johannes

--
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()
On Mon, Dec 04, 2017 at 09:19:33AM +0100, Johannes Thumshirn wrote:
>
> Hi Ming,
>
> Ming Lei writes:
> > This issue can be triggered by the following script:
> >
> > #!/bin/sh
> > rmmod scsi_debug
> > modprobe scsi_debug max_queue=1
> >
> > DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
> >
> > DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`
> >
> > echo "using scsi device $DEVICE"
> > echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
> > echo starting loop $i
> > echo "temporary write through" >$DISK_DIR/cache_type
> > echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
> > echo none > /sys/block/$DEVICE/queue/scheduler
> > dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
> > sleep 5
> > echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
> > wait
> > echo "SUCCESS"
>
> Can you please submit a test-case for blktest as well, given you have a
> nice reproducer?

Hi Johannes,

I am happy to do that, but recently I am very busy, so it may be done a bit late by me.

But anyone should reproduce the issue 100% with V4.15-rc kernel by just running the above script, not any specific hardware is required at all, so that means anyone can make a patch for blktest to test block/SCSI timeout if he/she is interested in doing that.

Thanks,
Ming
Re: [PATCH] SCSI: delay run queue if device is blocked in scsi_dev_queue_ready()
Hi Ming,

Ming Lei writes:
> This issue can be triggered by the following script:
>
> #!/bin/sh
> rmmod scsi_debug
> modprobe scsi_debug max_queue=1
>
> DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
>
> DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`
>
> echo "using scsi device $DEVICE"
> echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
> echo starting loop $i
> echo "temporary write through" >$DISK_DIR/cache_type
> echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
> echo none > /sys/block/$DEVICE/queue/scheduler
> dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
> sleep 5
> echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
> wait
> echo "SUCCESS"

Can you please submit a test-case for blktest as well, given you have a nice reproducer?

Thanks,
Johannes

--
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850