Re: [PATCH v3] ibmvscsis: Initial commit of IBM VSCSI Tgt Driver
Hi Bryant, On Fri, 2016-05-27 at 09:32 -0500, Bryant G. Ly wrote > diff --git a/drivers/scsi/ibmvscsi/ibmvscsi_tgt.c > b/drivers/scsi/ibmvscsi/ibmvscsi_tgt.c > new file mode 100644 > index 000..292d129 > --- /dev/null > +++ b/drivers/scsi/ibmvscsi/ibmvscsi_tgt.c > + > +static struct se_portal_group *ibmvscsis_make_nexus(struct ibmvscsis_tport > + *tport, > + const char *name) > +{ > + struct se_node_acl *acl; > + > + if (tport->se_sess) { > + pr_debug("tport->se_sess already exists\n"); > + return &tport->se_tpg; > + } > + > + /* > + * Initialize the struct se_session pointer and setup tagpool > + * for struct ibmvscsis_cmd descriptors > + */ > + tport->se_sess = transport_init_session(TARGET_PROT_NORMAL); > + if (IS_ERR(tport->se_sess)) > + goto transport_init_fail; > + > + /* > + * Since we are running in 'demo mode' this call will generate a > + * struct se_node_acl for the ibmvscsis struct se_portal_group with > + * the SCSI Initiator port name of the passed configfs group 'name'. > + */ > + > + acl = core_tpg_check_initiator_node_acl(&tport->se_tpg, > + (unsigned char *)name); > + if (!acl) { > + pr_debug("core_tpg_check_initiator_node_acl() failed for %s\n", > + name); > + goto acl_failed; > + } > + tport->se_sess->se_node_acl = acl; > + > + /* > + * Now register the TCM ibmvscsis virtual I_T Nexus as active. > + */ > + transport_register_session(&tport->se_tpg, > +tport->se_sess->se_node_acl, > +tport->se_sess, tport); > + > + tport->se_sess->se_tpg = &tport->se_tpg; > + FYI, starting in v4.6 these three calls are handled directly by a new target_alloc_session() helper. No objection to leaving this as-is for now to make it easier to run atop unmodified v4.4 target code, but note you'll want to be converting this post merge. > +static int ibmvscsis_shutdown_session(struct se_session *se_sess) > +{ > + return 0; > +} > + > +static void ibmvscsis_close_session(struct se_session *se_sess) > +{ > +} > + > These two target_core_fabric_ops callers have been made optional for v4.7+ > diff --git a/drivers/scsi/ibmvscsi/ibmvscsi_tgt.h > b/drivers/scsi/ibmvscsi/ibmvscsi_tgt.h > new file mode 100644 > index 000..23e9449 > --- /dev/null > +++ b/drivers/scsi/ibmvscsi/ibmvscsi_tgt.h > + > +struct client_info { > +#define SRP_VERSION "16.a" > + char srp_version[8]; > + /* root node property ibm,partition-name */ > + char partition_name[PARTITION_NAMELEN]; > + /* root node property ibm,partition-no */ > + u32 partition_number; > + /* initially 1 */ > + u32 mad_version; > + u32 os_type; > +}; > + > +struct ibmvscsis_cmd { > + /* Used for libsrp processing callbacks */ > + struct scsi_cmnd sc; AFAICT, struct scsi_cmnd is only being used for passing io memory descriptors and struct iu_entry via sc->SCp.ptr between ibmvscsi_tgt + libsrp. Now with the other legacy libsrp.c Scsi_Host related bits removed, it should be easy to convert the rest in order to drop the last vestiges of SCSI host LLD structures from ibmvscsi_tgt code. 
> + /* Used for TCM Core operations */ > + struct se_cmd se_cmd; > + /* Sense buffer that will be mapped into outgoing status */ > + unsigned char sense_buf[TRANSPORT_SENSE_BUFFER]; > + u32 lun; > +}; > + > diff --git a/drivers/scsi/ibmvscsi/libsrp.h b/drivers/scsi/ibmvscsi/libsrp.h > new file mode 100644 > index 000..bf9e30b > --- /dev/null > +++ b/drivers/scsi/ibmvscsi/libsrp.h > @@ -0,0 +1,91 @@ > +#ifndef __LIBSRP_H__ > +#define __LIBSRP_H__ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +enum srp_task_attributes { > + SRP_SIMPLE_TASK = 0, > + SRP_HEAD_TASK = 1, > + SRP_ORDERED_TASK = 2, > + SRP_ACA_TASK = 4 > +}; > + > +enum iue_flags { > + V_DIOVER, > + V_WRITE, > + V_LINKED, > + V_FLYING, > +}; > + > +enum { > + SRP_TASK_MANAGEMENT_FUNCTION_COMPLETE = 0, > + SRP_REQUEST_FIELDS_INVALID = 2, > + SRP_TASK_MANAGEMENT_FUNCTION_NOT_SUPPORTED = 4, > + SRP_TASK_MANAGEMENT_FUNCTION_FAILED = 5 > +}; > + > +struct srp_buf { > + dma_addr_t dma; > + void *buf; > +}; > + > +struct srp_queue { > + void *pool; > + void *items; > + struct kfifo queue; > + spinlock_t lock; > +}; > + > +struct srp_target { > + struct Scsi_Host *shost; Unused. > + struct se_device *tgt; > + struct device *dev; > + > + spinlock_t lock; > + struct list_head cmd_queue; > + > + size_t srp_iu_size; > + struct srp_queue iu_queue; > + size_t rx_ring_size; > + struc
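As a concrete reference for the target_alloc_session() note above: the transport_init_session() + core_tpg_check_initiator_node_acl() + transport_register_session() sequence in ibmvscsis_make_nexus() would collapse into something like the untested sketch below (v4.6+ API, callback argument left NULL; the failure path is abbreviated and would need to map onto the existing cleanup labels):

	tport->se_sess = target_alloc_session(&tport->se_tpg, 0, 0,
					      TARGET_PROT_NORMAL, name,
					      tport, NULL);
	if (IS_ERR(tport->se_sess))
		return ERR_CAST(tport->se_sess);	/* or jump to the existing error path */

	return &tport->se_tpg;

The private pointer (tport here) takes the place of the fabric_sess_ptr that the quoted code passes to transport_register_session().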
[PATCH] scsi: fix race between simultaneous decrements of ->host_failed
async_sas_ata_eh(), which will call scsi_eh_finish_cmd() in some case, would be performed simultaneously in sas_ata_strategy_handler(). In this case, ->host_failed may be decreased simultaneously in scsi_eh_finish_cmd() on different CPUs, and become abnormal. It will lead to permanently inequal between ->host_failed and ->host_busy. Then SCSI error handler thread won't become running, SCSI errors after that won't be handled forever. Use atomic type for ->host_failed to fix this race. Signed-off-by: Wei Fang --- drivers/ata/libata-eh.c |2 +- drivers/scsi/libsas/sas_scsi_host.c |5 +++-- drivers/scsi/scsi.c |2 +- drivers/scsi/scsi_error.c | 15 +-- drivers/scsi/scsi_lib.c |3 ++- include/scsi/scsi_host.h|2 +- 6 files changed, 17 insertions(+), 12 deletions(-) diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 961acc7..a0e7612 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -606,7 +606,7 @@ void ata_scsi_error(struct Scsi_Host *host) ata_scsi_port_error_handler(host, ap); /* finish or retry handled scmd's and clean up */ - WARN_ON(host->host_failed || !list_empty(&eh_work_q)); + WARN_ON(atomic_read(&host->host_failed) || !list_empty(&eh_work_q)); DPRINTK("EXIT\n"); } diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c index 519dac4..8d74003 100644 --- a/drivers/scsi/libsas/sas_scsi_host.c +++ b/drivers/scsi/libsas/sas_scsi_host.c @@ -757,7 +757,8 @@ retry: spin_unlock_irq(shost->host_lock); SAS_DPRINTK("Enter %s busy: %d failed: %d\n", - __func__, atomic_read(&shost->host_busy), shost->host_failed); + __func__, atomic_read(&shost->host_busy), + atomic_read(&shost->host_failed)); /* * Deal with commands that still have SAS tasks (i.e. they didn't * complete via the normal sas_task completion mechanism), @@ -800,7 +801,7 @@ out: SAS_DPRINTK("--- Exit %s: busy: %d failed: %d tries: %d\n", __func__, atomic_read(&shost->host_busy), - shost->host_failed, tries); + atomic_read(&shost->host_failed), tries); } enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd) diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 1deb6ad..7840915 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -527,7 +527,7 @@ void scsi_log_completion(struct scsi_cmnd *cmd, int disposition) scmd_printk(KERN_INFO, cmd, "scsi host busy %d failed %d\n", atomic_read(&cmd->device->host->host_busy), - cmd->device->host->host_failed); + atomic_read(&cmd->device->host->host_failed)); } } } diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 984ddcb..f37666f 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -62,7 +62,8 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *, /* called with shost->host_lock held */ void scsi_eh_wakeup(struct Scsi_Host *shost) { - if (atomic_read(&shost->host_busy) == shost->host_failed) { + if (atomic_read(&shost->host_busy) == + atomic_read(&shost->host_failed)) { trace_scsi_eh_wakeup(shost); wake_up_process(shost->ehandler); SCSI_LOG_ERROR_RECOVERY(5, shost_printk(KERN_INFO, shost, @@ -250,7 +251,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) eh_flag &= ~SCSI_EH_CANCEL_CMD; scmd->eh_eflags |= eh_flag; list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); - shost->host_failed++; + atomic_inc(&shost->host_failed); scsi_eh_wakeup(shost); out_unlock: spin_unlock_irqrestore(shost->host_lock, flags); @@ -1127,7 +1128,7 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn) */ void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct 
list_head *done_q) { - scmd->device->host->host_failed--; + atomic_dec(&scmd->device->host->host_failed); scmd->eh_eflags = 0; list_move_tail(&scmd->eh_entry, done_q); } @@ -2190,8 +2191,10 @@ int scsi_error_handler(void *data) if (kthread_should_stop()) break; - if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) || - shost->host_failed != atomic_read(&shost->host_busy)) { + if ((atomic_read(&shost->host_failed) == 0 && +shost->host_eh_scheduled == 0) || + (atomic_read(&shost->host_failed) != +atomic_read(&shost->host_busy))) { SCSI_LOG_ERROR_
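The lost update described in the changelog is the classic unlocked read-modify-write race. A stand-alone user-space illustration (not kernel code, just a demonstration of why the plain host_failed-- can under-count while an atomic decrement cannot; build with "cc -pthread"):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define THREADS 4
    #define LOOPS   1000000

    static volatile int plain = THREADS * LOOPS;   /* like the old int host_failed */
    static atomic_int   safe  = THREADS * LOOPS;   /* like the atomic_t conversion */

    static void *worker(void *arg)
    {
            (void)arg;
            for (int i = 0; i < LOOPS; i++) {
                    plain--;                        /* unlocked RMW: load, dec, store */
                    atomic_fetch_sub(&safe, 1);     /* what atomic_dec() provides */
            }
            return NULL;
    }

    int main(void)
    {
            pthread_t t[THREADS];

            for (int i = 0; i < THREADS; i++)
                    pthread_create(&t[i], NULL, worker, NULL);
            for (int i = 0; i < THREADS; i++)
                    pthread_join(t[i], NULL);

            /* plain is frequently left above zero (lost decrements); safe is always 0 */
            printf("plain=%d safe=%d\n", plain, atomic_load(&safe));
            return 0;
    }

Once host_failed is stuck above zero, the host_busy == host_failed test in scsi_eh_wakeup() can never succeed, which is the hang the patch describes.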
[GIT PULL] target updates for v4.7-rc1
Hi Linus, Here are the outstanding target pending updates for v4.7-rc1. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next The highlights this round include: - Allow external PR/ALUA metadata path be defined at runtime via top level configfs attribute. (Lee) - Fix target session shutdown bug for ib_srpt multi-channel. (hch) - Make TFO close_session() and shutdown_session() optional. (hch) - Drop se_sess->sess_kref + convert tcm_qla2xxx to internal kref. (hch) - Add tcm_qla2xxx endpoint attribute for basic FC jammer. (Laurence) - Refactor iscsi-target RX/TX PDU encode/decode into common code. (Varun) - Extend iscsit_transport with xmit_pdu, release_cmd, get_rx_pdu, validate_parameters, and get_r2t_ttt for generic ISO offload. (Varun) - Initial merge of cxgb iscsi-segment offload target driver. (Varun) The bulk of the changes are Chelsio's new driver, along with a number of iscsi-target common code improvements made by Varun + Co along the way. Thank you, --nab Christoph Hellwig (7): target: consolidate and fix session shutdown target: remove acl_stop target: make ->shutdown_session optional target: make close_session optional tcm_qla2xxx: introduce a private sess_kref iscsi-target: remove usage of ->shutdown_session target: remove sess_kref and ->shutdown_session Colin Ian King (1): target: need_to_release is always false, remove redundant check and kfree Imran Haider (1): iscsi-target: graceful disconnect on invalid mapping to iovec Laurence Oberman (1): tcm_qla2xxx Add SCSI command jammer/discard capability Lee Duncan (2): target: make target db location configurable target: use new "dbroot" target attribute Nicholas Bellinger (4): iscsi-target: Make iscsi_tpg_np driver show/store use generic code iscsi-target: Convert transport drivers to signal rdma_shutdown cxgbit: Use type ISCSI_CXGBIT + cxgbit tpg_np attribute iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race Varun Prakash (13): iscsi-target: add int (*iscsit_xmit_pdu)() iscsi-target: add void (*iscsit_release_cmd)() iscsi-target: add void (*iscsit_get_rx_pdu)() iscsi-target: split iscsi_target_rx_thread() iscsi-target: add int (*iscsit_validate_params)() iscsi-target: add void (*iscsit_get_r2t_ttt)() iscsi-target: move iscsit_thread_check_cpumask() iscsi-target: use conn_transport->transport_type in text rsp iscsi-target: add new offload transport type iscsi-target: clear tx_thread_active iscsi-target: call complete on conn_logout_comp iscsi-target: export symbols cxgbit: add files for cxgbit.ko Documentation/scsi/tcm_qla2xxx.txt| 22 + Documentation/target/tcm_mod_builder.py | 16 - drivers/infiniband/ulp/isert/ib_isert.c | 11 + drivers/infiniband/ulp/srpt/ib_srpt.c |9 - drivers/scsi/qla2xxx/Kconfig |9 + drivers/scsi/qla2xxx/qla_target.c | 56 +- drivers/scsi/qla2xxx/qla_target.h |4 +- drivers/scsi/qla2xxx/tcm_qla2xxx.c| 59 +- drivers/scsi/qla2xxx/tcm_qla2xxx.h|1 + drivers/target/iscsi/Kconfig |2 + drivers/target/iscsi/Makefile |1 + drivers/target/iscsi/cxgbit/Kconfig |7 + drivers/target/iscsi/cxgbit/Makefile |6 + drivers/target/iscsi/cxgbit/cxgbit.h | 353 drivers/target/iscsi/cxgbit/cxgbit_cm.c | 2086 + drivers/target/iscsi/cxgbit/cxgbit_ddp.c | 325 drivers/target/iscsi/cxgbit/cxgbit_lro.h | 72 + drivers/target/iscsi/cxgbit/cxgbit_main.c | 702 +++ drivers/target/iscsi/cxgbit/cxgbit_target.c | 1561 +++ drivers/target/iscsi/iscsi_target.c | 701 +++ drivers/target/iscsi/iscsi_target_configfs.c | 158 +- drivers/target/iscsi/iscsi_target_datain_values.c |1 + 
drivers/target/iscsi/iscsi_target_erl0.c |2 +- drivers/target/iscsi/iscsi_target_login.c | 17 +- drivers/target/iscsi/iscsi_target_nego.c | 19 +- drivers/target/iscsi/iscsi_target_parameters.c|1 + drivers/target/iscsi/iscsi_target_util.c |5 + drivers/target/loopback/tcm_loop.c| 12 - drivers/target/sbp/sbp_target.c | 12 - drivers/target/target_core_alua.c |6 +- drivers/target/target_core_configfs.c | 70 +- drivers/target/target_core_internal.h |6 + drivers/target/target_core_pr.c |2 +- drivers/target/target_core_rd.c |4 - drivers/target/target_core_tpg.c | 83 +- drivers/target/target_core_transport.c| 26 +- drivers/target/tcm_fc/tcm_fc.h|1 - drivers/target/tcm_fc/tfc_conf.c |1 - driv
Re: bio-based DM multipath is back from the dead [was: Re: Notes from the four separate IO track sessions at LSF/MM]
On Fri, May 27 2016 at 11:42am -0400, Hannes Reinecke wrote: > On 05/27/2016 04:44 PM, Mike Snitzer wrote: > >On Fri, May 27 2016 at 4:39am -0400, > >Hannes Reinecke wrote: > > > [ .. ] > >>No, the real issue is load-balancing. > >>If you have several paths you have to schedule I/O across all paths, > >>_and_ you should be feeding these paths efficiently. > > > > >detailed in the multipath paper I refernced> > > > >Right, as my patch header details, this is the only limitation that > >remains with the reinstated bio-based DM multipath. > > > > :-) > And the very reason why we went into request-based multipathing in > the first place... > > >>I was sort-of hoping that with the large bio work from Shaohua we > > > >I think you mean Ming Lei and his multipage biovec work? > > > Errm. Yeah, of course. Apologies. > > >>could build bio which would not require any merging, ie building > >>bios which would be assembled into a single request per bio. > >>Then the above problem wouldn't exist anymore and we _could_ do > >>scheduling on bio level. > >>But from what I've gathered this is not always possible (eg for > >>btrfs with delayed allocation). > > > >I doubt many people are running btrfs over multipath in production > >but... > > > Hey. There is a company who does ... > > >Taking a step back: reinstating bio-based DM multipath is _not_ at the > >expense of request-based DM multipath. As you can see I've made it so > >that all modes (bio-based, request_fn rq-based, and blk-mq rq-based) are > >supported by a single DM multipath target. When the trnasition to > >request-based happened it would've been wise to preserve bio-based but I > >digress... > > > >So, the point is: there isn't any one-size-fits-all DM multipath queue > >mode here. If a storage config benefits from the request_fn IO > >schedulers (but isn't hurt by .request_fn's queue lock, so slower > >rotational storage?) then use queue_mode=2. If the storage is connected > >to a large NUMA system and there is some reason to want to use blk-mq > >request_queue at the DM level: use queue_mode=3. If the storage is > >_really_ fast and doesn't care about extra IO grooming (e.g. sorting and > >merging) then select bio-based using queue_mode=1. > > > >I collected some quick performance numbers against a null_blk device, on > >a single NUMA node system, with various DM layers ontop -- the multipath > >runs are only with a single path... fio workload is just 10 sec randread: > > > Which is precisely the point. > Everything's nice and shiny with a single path, as then the above > issue simply doesn't apply. Heh, as you can see from the request_fn results, that wasn't the case until very recently with all the DM multipath blk-mq advances.. But my broader thesis is that for really fast storage it is looking increasingly likely that we don't _need_ or want to have the multipathing layer dealing with requests. Not unless there is some inherent big win. request cloning is definitely heavier than bio cloning. And as you can probably infer, my work to reinstate bio-based multipath is focused precisely at the fast storage case in the hopes of avoiding hch's threat to pull multipathing down into the NVMe over fabrics driver. > Things only start getting interesting if you have _several_ paths. > So the benchmarks only prove that device-mapper doesn't add too much > of an overhead; they don't prove that the above point has been > addressed... 
Right, but I don't really care if it is addressed by bio-based because we have the request_fn mode that offers the legacy IO schedulers. The fact that request_fn multipath has been adequate for the enterprise rotational storage arrays is somewhat surprising... the queue_lock is a massive bottleneck. But if bio merging (via multipage biovecs) does prove itself to be a win for bio-based multipath for all storage (slow and fast) then yes that'll be really good news. Nice to have options... we can dial in the option that is best for a specific use case/deployment and fix what isn't doing well. > [ .. ] > >>Have you found another way of addressing this problem? > > > >No, bio sorting/merging really isn't a problem for DM multipath to > >solve. > > > >Though Jens did say (in the context of one of these dm-crypt bulk mode > >threads) that the block core _could_ grow some additional _minimalist_ > >capability for bio merging: > >https://www.redhat.com/archives/dm-devel/2015-November/msg00130.html > > > >I'd like to understand a bit more about what Jens is thinking in that > >area because it could benefit DM thinp as well (though that is using bio > >sorting rather than merging, introduced via commit 67324ea188). > > > >I'm not opposed to any line of future development -- but development > >needs to be driven by observed limitations while testing on _real_ > >hardware. > > > In the end, with Ming Lei's multipage bvec work we essentially > already moved some merging ability into the bios; during > bio_add_page() the block la
Re: bio-based DM multipath is back from the dead [was: Re: Notes from the four separate IO track sessions at LSF/MM]
On 05/27/2016 04:44 PM, Mike Snitzer wrote: On Fri, May 27 2016 at 4:39am -0400, Hannes Reinecke wrote: [ .. ] No, the real issue is load-balancing. If you have several paths you have to schedule I/O across all paths, _and_ you should be feeding these paths efficiently. Right, as my patch header details, this is the only limitation that remains with the reinstated bio-based DM multipath. :-) And the very reason why we went into request-based multipathing in the first place... I was sort-of hoping that with the large bio work from Shaohua we I think you mean Ming Lei and his multipage biovec work? Errm. Yeah, of course. Apologies. could build bio which would not require any merging, ie building bios which would be assembled into a single request per bio. Then the above problem wouldn't exist anymore and we _could_ do scheduling on bio level. But from what I've gathered this is not always possible (eg for btrfs with delayed allocation). I doubt many people are running btrfs over multipath in production but... Hey. There is a company who does ... Taking a step back: reinstating bio-based DM multipath is _not_ at the expense of request-based DM multipath. As you can see I've made it so that all modes (bio-based, request_fn rq-based, and blk-mq rq-based) are supported by a single DM multipath target. When the trnasition to request-based happened it would've been wise to preserve bio-based but I digress... So, the point is: there isn't any one-size-fits-all DM multipath queue mode here. If a storage config benefits from the request_fn IO schedulers (but isn't hurt by .request_fn's queue lock, so slower rotational storage?) then use queue_mode=2. If the storage is connected to a large NUMA system and there is some reason to want to use blk-mq request_queue at the DM level: use queue_mode=3. If the storage is _really_ fast and doesn't care about extra IO grooming (e.g. sorting and merging) then select bio-based using queue_mode=1. I collected some quick performance numbers against a null_blk device, on a single NUMA node system, with various DM layers ontop -- the multipath runs are only with a single path... fio workload is just 10 sec randread: Which is precisely the point. Everything's nice and shiny with a single path, as then the above issue simply doesn't apply. Things only start getting interesting if you have _several_ paths. So the benchmarks only prove that device-mapper doesn't add too much of an overhead; they don't prove that the above point has been addressed... [ .. ] Have you found another way of addressing this problem? No, bio sorting/merging really isn't a problem for DM multipath to solve. Though Jens did say (in the context of one of these dm-crypt bulk mode threads) that the block core _could_ grow some additional _minimalist_ capability for bio merging: https://www.redhat.com/archives/dm-devel/2015-November/msg00130.html I'd like to understand a bit more about what Jens is thinking in that area because it could benefit DM thinp as well (though that is using bio sorting rather than merging, introduced via commit 67324ea188). I'm not opposed to any line of future development -- but development needs to be driven by observed limitations while testing on _real_ hardware. In the end, with Ming Leis multipage bvec work we essentially already moved some merging ability into the bios; during bio_add_page() the block layer will already merge bios together. 
(I'll probably be yelled at by hch for ignorance for the following, but nevertheless) From my POV there are several areas of 'merging' which currently happen:

a) bio merging: combine several consecutive bios into a larger one; should be largely addressed by Ming Lei's multipage bvec work

b) bio sorting: reshuffle bios so that any requests on the request queue are ordered 'best' for the underlying hardware (ie the actual I/O scheduler). Not implemented for mq, and actually of questionable value for fast storage. One of the points I'll be testing in the very near future; ideally we find that it's not _that_ important (compared to the previous point), then we could drop it altogether for mq.

c) clustering: coalescing several consecutive pages/bvecs into a single SG element. Obviously this can only happen if you have large enough requests. But the only gain is shortening the number of SG elements for a request. Again of questionable value as the request itself and the amount of data to transfer isn't changed. And another point of performance testing on my side.

So ideally we will find that b) and c) contribute only a small amount to the overall performance, then we could easily drop them for MQ and concentrate on making bio merging work well. Then it wouldn't really matter if we were doing bio-based or request-based multipathing as we'd have a 1:1 relationship, and this entire discussion could go away. Well. Or that's the hope, at least. Cheers, Hannes -- Dr. Hannes Reinecke
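For concreteness, the bio_add_page() path referred to above looks roughly like the sketch below (written against the ~v4.7 block API; bdev/sector/pages are assumed inputs and error handling is omitted, so treat it as illustration rather than working driver code). The point is that the size of the I/O is decided while the bio is being assembled, before any request-level merging ever happens:

    static void submit_pages(struct block_device *bdev, sector_t sector,
                             struct page **pages, unsigned int nr_pages)
    {
            struct bio *bio = bio_alloc(GFP_NOIO, nr_pages);
            unsigned int i;

            bio->bi_bdev = bdev;
            bio->bi_iter.bi_sector = sector;

            for (i = 0; i < nr_pages; i++) {
                    /* returns the number of bytes actually added; a short
                     * return means the bio is full and a new one would be
                     * started here */
                    if (bio_add_page(bio, pages[i], PAGE_SIZE, 0) < PAGE_SIZE)
                            break;
            }

            submit_bio(WRITE, bio);         /* pre-4.8 two-argument form */
    }

With multipage bvecs, bio_add_page() can pack physically contiguous pages into fewer bvecs, so a single bio can describe a much larger I/O, which is the merging ability being discussed.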
Re: bio-based DM multipath is back from the dead [was: Re: Notes from the four separate IO track sessions at LSF/MM]
On Fri, May 27 2016 at 4:39am -0400, Hannes Reinecke wrote: > On 05/26/2016 04:38 AM, Mike Snitzer wrote: > >On Thu, Apr 28 2016 at 11:40am -0400, > >James Bottomley wrote: > > > >>On Thu, 2016-04-28 at 08:11 -0400, Mike Snitzer wrote: > >>>Full disclosure: I'll be looking at reinstating bio-based DM multipath to > >>>regain efficiencies that now really matter when issuing IO to extremely > >>>fast devices (e.g. NVMe). bio cloning is now very cheap (due to > >>>immutable biovecs), coupled with the emerging multipage biovec work that > >>>will help construct larger bios, so I think it is worth pursuing to at > >>>least keep our options open. > > > >Please see the 4 topmost commits I've published here: > >https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8 > > > >All request-based DM multipath support/advances have been completly > >preserved. I've just made it so that we can now have bio-based DM > >multipath too. > > > >All of the various modes have been tested using mptest: > >https://github.com/snitm/mptest > > > >>OK, but remember the reason we moved from bio to request was partly to > >>be nearer to the device but also because at that time requests were > >>accumulations of bios which had to be broken out, go back up the stack > >>individually and be re-elevated, which adds to the inefficiency. In > >>theory the bio splitting work will mean that we only have one or two > >>split bios per request (because they were constructed from a split up > >>huge bio), but when we send them back to the top to be reconstructed as > >>requests there's no guarantee that the split will be correct a second > >>time around and we might end up resplitting the already split bios. If > >>you do reassembly into the huge bio again before resend down the next > >>queue, that's starting to look like quite a lot of work as well. > > > >I've not even delved into the level you're laser-focused on here. > >But I'm struggling to grasp why multipath is any different than any > >other bio-based device... > > > Actually, _failover_ is not the primary concern. This is on a > (relative) slow path so any performance degradation during failover > is acceptable. > > No, the real issue is load-balancing. > If you have several paths you have to schedule I/O across all paths, > _and_ you should be feeding these paths efficiently. Right, as my patch header details, this is the only limitation that remains with the reinstated bio-based DM multipath. > I was sort-of hoping that with the large bio work from Shaohua we I think you mean Ming Lei and his multipage biovec work? > could build bio which would not require any merging, ie building > bios which would be assembled into a single request per bio. > Then the above problem wouldn't exist anymore and we _could_ do > scheduling on bio level. > But from what I've gathered this is not always possible (eg for > btrfs with delayed allocation). I doubt many people are running btrfs over multipath in production but... Taking a step back: reinstating bio-based DM multipath is _not_ at the expense of request-based DM multipath. As you can see I've made it so that all modes (bio-based, request_fn rq-based, and blk-mq rq-based) are supported by a single DM multipath target. When the trnasition to request-based happened it would've been wise to preserve bio-based but I digress... So, the point is: there isn't any one-size-fits-all DM multipath queue mode here. 
If a storage config benefits from the request_fn IO schedulers (but isn't hurt by .request_fn's queue lock, so slower rotational storage?) then use queue_mode=2. If the storage is connected to a large NUMA system and there is some reason to want to use blk-mq request_queue at the DM level: use queue_mode=3. If the storage is _really_ fast and doesn't care about extra IO grooming (e.g. sorting and merging) then select bio-based using queue_mode=1. I collected some quick performance numbers against a null_blk device, on a single NUMA node system, with various DM layers on top -- the multipath runs are only with a single path... fio workload is just 10 sec randread:

FIO_QUEUE_DEPTH=32
FIO_RUNTIME=10
FIO_NUMJOBS=12

${FIO} --numa_cpu_nodes=${NID} --numa_mem_policy=bind:${NID} --cpus_allowed_policy=split --group_reporting --rw=randread --bs=4k --numjobs=${FIO_NUMJOBS} \
 --iodepth=${FIO_QUEUE_DEPTH} --runtime=${FIO_RUNTIME} --time_based --loops=1 --ioengine=libaio \
 --direct=1 --invalidate=1 --randrepeat=1 --norandommap --exitall --name task_${TASK_NAME} --filename=${DEVICE}

I need real hardware (NVMe over Fabrics please!) to really test this stuff properly; but I think the following results at least approximate the relative performance of each multipath mode.

On null_blk blk-mq -- baseline:
null_blk blk-mq   iops=1936.3K
dm-linear         iops=1616.1K

multipath using round-robin path-selector:
bio-based         iops=1579.8K
blk-mq rq-bas
[PATCH v3] ibmvscsis: Initial commit of IBM VSCSI Tgt Driver
Version 1: This initial commit contains WIP of the IBM VSCSI Target Fabric Module. It currently supports read/writes, and I have tested the ability to create a file backstore with the driver, install RHEL, and then boot up the partition via filio backstore through the driver. This driver is a pick up of the old IBM VIO scsi Target Driver that was started by Nick and Fujita 2-4 years ago. http://comments.gmane.org/gmane.linux.scsi/90119 and http://marc.info/?t=12973408564&r=1&w=2 The driver provides a virtual SCSI device on IBM Power Servers. When reviving the old libsrp, I stripped out all that utilized scsi to submit commands to the target. Hence there is no more scsi_tgt_if_*, and scsi_transport_* files and fully utilizes LIO instead. This driver does however use the SRP protocol for communication between guests/and or hosts, but its all synchronous data transfers due to the utilization of H_COPY_RDMA, a VIO mechanism which means that others like ib_srp, ib_srpt which are asynchronous can't use this driver. This was also the reason for moving libsrp out of the drivers/scsi/. and into the ibmvscsi folder. Signed-off-by: Steven Royer Signed-off-by: Tyrel Datwyler Signed-off-by: Bryant G. Ly --- MAINTAINERS | 10 + drivers/scsi/Kconfig | 27 +- drivers/scsi/Makefile|2 +- drivers/scsi/ibmvscsi/Makefile |4 + drivers/scsi/ibmvscsi/ibmvscsi_tgt.c | 1932 ++ drivers/scsi/ibmvscsi/ibmvscsi_tgt.h | 169 +++ drivers/scsi/ibmvscsi/libsrp.c | 386 +++ drivers/scsi/ibmvscsi/libsrp.h | 91 ++ drivers/scsi/libsrp.c| 447 include/scsi/libsrp.h| 78 -- 10 files changed, 2610 insertions(+), 536 deletions(-) create mode 100644 drivers/scsi/ibmvscsi/ibmvscsi_tgt.c create mode 100644 drivers/scsi/ibmvscsi/ibmvscsi_tgt.h create mode 100644 drivers/scsi/ibmvscsi/libsrp.c create mode 100644 drivers/scsi/ibmvscsi/libsrp.h delete mode 100644 drivers/scsi/libsrp.c delete mode 100644 include/scsi/libsrp.h diff --git a/MAINTAINERS b/MAINTAINERS index 9c567a4..0f412d4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5451,6 +5451,16 @@ S: Supported F: drivers/scsi/ibmvscsi/ibmvscsi* F: drivers/scsi/ibmvscsi/viosrp.h +IBM Power Virtual SCSI Device Target Driver +M: Bryant G. Ly +L: linux-scsi@vger.kernel.org +L: target-de...@vger.kernel.org +S: Supported +F: drivers/scsi/ibmvscsi/ibmvscsis.c +F: drivers/scsi/ibmvscsi/ibmvscsis.h +F: drivers/scsi/ibmvscsi/libsrp.c +F: drivers/scsi/ibmvscsi/libsrp.h + IBM Power Virtual FC Device Drivers M: Tyrel Datwyler L: linux-scsi@vger.kernel.org diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig index c5883a5..0f8a1de 100644 --- a/drivers/scsi/Kconfig +++ b/drivers/scsi/Kconfig @@ -848,6 +848,23 @@ config SCSI_IBMVSCSI To compile this driver as a module, choose M here: the module will be called ibmvscsi. +config SCSI_IBMVSCSIS + tristate "IBM Virtual SCSI Server support" + depends on PPC_PSERIES && TARGET_CORE && SCSI && PCI + help + This is the IBM POWER Virtual SCSI Target Server + This driver uses the SRP protocol for communication betwen servers + guest and/or the host that run on the same server. + More information on VSCSI protocol can be found at www.power.org + + The userspace configuration needed to initialize the driver can be + be found here: + + https://github.com/powervm/ibmvscsis/wiki/Configuration + + To compile this driver as a module, choose M here: the + module will be called ibmvstgt. 
+ config SCSI_IBMVFC tristate "IBM Virtual FC support" depends on PPC_PSERIES && SCSI @@ -1729,16 +1746,6 @@ config SCSI_PM8001 This driver supports PMC-Sierra PCIE SAS/SATA 8x6G SPC 8001 chip based host adapters. -config SCSI_SRP - tristate "SCSI RDMA Protocol helper library" - depends on SCSI && PCI - select SCSI_TGT - help - If you wish to use SRP target drivers, say Y. - - To compile this driver as a module, choose M here: the - module will be called libsrp. - config SCSI_BFA_FC tristate "Brocade BFA Fibre Channel Support" depends on PCI && SCSI diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile index 0335d28..ea2030c 100644 --- a/drivers/scsi/Makefile +++ b/drivers/scsi/Makefile @@ -127,8 +127,8 @@ obj-$(CONFIG_SCSI_LASI700) += 53c700.o lasi700.o obj-$(CONFIG_SCSI_SNI_53C710) += 53c700.o sni_53c710.o obj-$(CONFIG_SCSI_NSP32) += nsp32.o obj-$(CONFIG_SCSI_IPR) += ipr.o -obj-$(CONFIG_SCSI_SRP) += libsrp.o obj-$(CONFIG_SCSI_IBMVSCSI)+= ibmvscsi/ +obj-$(CONFIG_SCSI_IBMVSCSIS) += ibmvscsi/ obj-$(CONFIG_SCSI_IBMVFC) += ibmvscsi/ obj-$(CONFIG_SCSI_HPTIOP) += hptiop.o obj-$(CONFIG_SCSI
Re: iscsi fails to attach to targets with kernel 4.4.0
I'm facing the same problem, attaching additional log (maybe it helps) May 27 07:47:05 oldcompute02 kernel: [ 1168.239065] [ cut here ] May 27 07:47:05 oldcompute02 kernel: [ 1168.239077] WARNING: CPU: 24 PID: 12116 at /build/linux-lts-xenial-7RlTta/linux-lts- xenial-4.4.0/fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80() May 27 07:47:05 oldcompute02 kernel: [ 1168.239080] sysfs: cannot create duplicate filename '/bus/scsi/devices/11:0:0:0' May 27 07:47:05 oldcompute02 kernel: [ 1168.239082] Modules linked in: vhost_net vhost macvtap macvlan veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables scsi_dh_alua dm_round_robin ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath vport_vxlan openvswitch nf_defrag_ipv6 nf_conntrack bonding binfmt_misc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif ipmi_devintf irqbypass crct10dif_pclmul crc32_pclmul aesni_intel dcdbas input_leds joydev cdc_ether usbnet mii aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd acpi_power_meter acpi_pad ipmi_si ipmi_msghandler 8250_fintek mei_me mei wmi sb_edac shpchp edac_core lpc_ich mac_hid lp parport hid_generic bnx2x vxlan ip6_udp_tunnel ahci usbhid udp_tunnel ptp hid libahci pps_core megaraid_sas mdio libcrc32c fjes May 27 07:47:05 oldcompute02 kernel: [ 1168.239182] CPU: 24 PID: 12116 Comm: iscsiadm Tainted: GW 4.4.0-22-generic #40~14.04.1-Ubuntu May 27 07:47:05 oldcompute02 kernel: [ 1168.239185] Hardware name: Dell Inc. PowerEdge M620/06GTV1, BIOS 2.5.4 01/27/2016 May 27 07:47:05 oldcompute02 kernel: [ 1168.239188] 881fe34eb9e8 813cde6c 881fe34eba30 May 27 07:47:05 oldcompute02 kernel: [ 1168.239192] 81ada130 881fe34eba20 8107d856 881fe5c2d000 May 27 07:47:05 oldcompute02 kernel: [ 1168.239196] 881fef80b350 883ff1802870 0001 ffef May 27 07:47:05 oldcompute02 kernel: [ 1168.239200] Call Trace: May 27 07:47:05 oldcompute02 kernel: [ 1168.239209] [] dump_stack+0x63/0x87 May 27 07:47:05 oldcompute02 kernel: [ 1168.239216] [] warn_slowpath_common+0x86/0xc0 May 27 07:47:05 oldcompute02 kernel: [ 1168.239221] [] warn_slowpath_fmt+0x4c/0x50 May 27 07:47:05 oldcompute02 kernel: [ 1168.239225] [] sysfs_warn_dup+0x64/0x80 May 27 07:47:05 oldcompute02 kernel: [ 1168.239230] [] sysfs_do_create_link_sd.isra.2+0xaa/0xb0 May 27 07:47:05 oldcompute02 kernel: [ 1168.239234] [] sysfs_create_link+0x25/0x40 May 27 07:47:05 oldcompute02 kernel: [ 1168.239242] [] bus_add_device+0x10b/0x1f0 May 27 07:47:05 oldcompute02 kernel: [ 1168.239247] [] device_add+0x3b5/0x610 May 27 07:47:05 oldcompute02 kernel: [ 1168.239252] [] scsi_sysfs_add_sdev+0xa5/0x290 May 27 07:47:05 oldcompute02 kernel: [ 1168.239258] [] scsi_probe_and_add_lun+0xb65/0xd80 May 27 07:47:05 oldcompute02 kernel: [ 1168.239266] [] ? __pm_runtime_resume+0x5c/0x70 May 27 07:47:05 oldcompute02 kernel: [ 1168.239271] [] __scsi_scan_target+0xd2/0x230 May 27 07:47:05 oldcompute02 kernel: [ 1168.239276] [] scsi_scan_target+0xd7/0xf0 May 27 07:47:05 oldcompute02 kernel: [ 1168.239293] [] iscsi_user_scan_session.part.14+0x105/0x140 [scsi_transport_iscsi] May 27 07:47:05 oldcompute02 kernel: [ 1168.239303] [] ? 
iscsi_user_scan_session.part.14+0x140/0x140 [scsi_transport_iscsi] May 27 07:47:05 oldcompute02 kernel: [ 1168.239313] [] iscsi_user_scan_session+0x1e/0x30 [scsi_transport_iscsi] May 27 07:47:05 oldcompute02 kernel: [ 1168.239318] [] device_for_each_child+0x43/0x70 May 27 07:47:05 oldcompute02 kernel: [ 1168.239328] [] iscsi_user_scan+0x2e/0x30 [scsi_transport_iscsi] May 27 07:47:05 oldcompute02 kernel: [ 1168.239331] [] store_scan+0xa6/0x100 May 27 07:47:05 oldcompute02 kernel: [ 1168.239338] [] ? calibrate_delay+0x5b9/0x607 May 27 07:47:05 oldcompute02 kernel: [ 1168.239346] [] ? __kmalloc+0x1d6/0x270 May 27 07:47:05 oldcompute02 kernel: [ 1168.239350] [] dev_attr_store+0x18/0x30 May 27 07:47:05 oldcompute02 kernel: [ 1168.239354] [] sysfs_kf_write+0x3a/0x50 May 27 07:47:05 oldcompute02 kernel: [ 1168.239358] [] kernfs_fop_write+0x120/0x170 May 27 07:47:05 oldcompute02 kernel: [ 1168.239364] [] __vfs_write+0x18/0x40 May 27 07:47:05 oldcompute02 kernel: [ 1168.239367] [] vfs_write+0xa2/0x1a0 May 27 07:47:05 oldcompute02 kernel: [ 1168.239371] [] ? do_sys_open+0x1b8/0x270 May 27 07:47:05 oldcompute02 kernel: [ 1168.239375] [] SyS_write+0x46/0xa0 May 27 07:47:05 oldcompute02 kernel: [ 1168.239382] [] entry_SYSCALL_64_fastpath+0x16/0x75 May 27 07:47:05 oldcompute02 kernel: [ 1168.239386] ---[ end trace a628d295136d7a5b ]- -- May 27 07:47:05 oldcompute02 kernel: [ 1168.239432] scsi 11:0:0:0:
Re: BLKZEROOUT not zeroing md dev on VMDK
There seems to be some sort of race condition between blkdev_issue_zeroout() and the scsi disk driver (disabling write same after an illegal request). On my UAS drive, sometimes `blkdiscard -z /dev/sdX` will return right away, even though if I then check `write_same_max_bytes` it has turned 0. Sometimes it will just write zero with SCSI WRITE even if `write_same_max_bytes` is 33553920 before I issue `blkdiscard -z` (`write_same_max_bytes` also turned 0, as expected). Not sure if it is directly related to the case here though. On 27 May 2016 at 12:45, Sitsofe Wheeler wrote: > On 27 May 2016 at 05:18, Darrick J. Wong wrote: >> >> It's possible that the pvscsi device advertised WRITE SAME, but if the device >> sends back ILLEGAL REQUEST then the SCSI disk driver will set >> write_same_max_bytes=0. Subsequent BLKZEROOUT attempts will then issue >> writes >> of zeroes to the drive. > > Thanks for following up on this but that's not what happens on the md > device - you can go on to issue as many BLKZEROOUT requests as you > like but the md disk is never zeroed nor is an error returned. > > I filed a bug at https://bugzilla.kernel.org/show_bug.cgi?id=118581 > (see https://bugzilla.kernel.org/show_bug.cgi?id=118581#c6 for > alternative reproduction steps that use scsi_debug and can be reworked > to impact device mapper) and Shaohua Li noted that > blkdev_issue_write_same could return 0 even when the disk didn't > support write same (see > https://bugzilla.kernel.org/show_bug.cgi?id=118581#c8 ). > > Shaohua went on to create a patch for this ("block: correctly fallback > for zeroout" - https://patchwork.kernel.org/patch/9137311/ ) which has > yet to be reviewed. > > -- > Sitsofe | http://sucs.org/~sits/ > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
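For reference, the path blkdiscard -z exercises is just the BLKZEROOUT ioctl; a minimal user-space reproducer (device path from argv, offset/length here are arbitrary 512-byte-aligned placeholders, scratch devices only) looks like:

    /* build: cc -o zeroout zeroout.c */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>           /* BLKZEROOUT */

    int main(int argc, char **argv)
    {
            uint64_t range[2] = { 0, 1024 * 1024 }; /* byte offset, byte length */
            int fd;

            if (argc < 2 || (fd = open(argv[1], O_WRONLY)) < 0) {
                    perror("open");
                    return 1;
            }
            /* Ends up in blkdev_issue_zeroout(): WRITE SAME if advertised,
             * otherwise it is supposed to fall back to writing zeroes. */
            if (ioctl(fd, BLKZEROOUT, &range)) {
                    perror("BLKZEROOUT");
                    return 1;
            }
            return 0;
    }

Reading /sys/block/<dev>/queue/write_same_max_bytes before and after the call, as done in the reports above, shows whether the SCSI disk driver gave up on WRITE SAME along the way.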
Re: [PATCH 4/5] scsi: add new async device reset support
On 05/27/2016 10:23 AM, Christoph Hellwig wrote: Adding Hannes to the Cc list as he's been looking into EH improvements in this area. On Wed, May 25, 2016 at 02:55:02AM -0500, mchri...@redhat.com wrote: From: Mike Christie Currently, if the SCSI eh runs then before we do a LUN_RESET we stop the host. This patch and the block layer one before it begin to add infrastructure to be able to do a LUN_RESET and eventually do a transport level recovery without having to stop the host. For LUn-reset, this patch adds a new callout, eh_async_device_reset_handler, which works similar to how LLDs handle SG_SCSI_RESET_DEVICE where the LLD manages the commands that are affected. eh_async_device_reset_handler: The LLD should perform a LUN RESET that affects all commands that have been accepted by its queuecommand callout for the device passed in to the callout. While the reset handler is running, queuecommand will not be running or called for the device. Unlike eh_device_reset_handler, queuecommand may still be called for other devices, and the LLD must call scsi_done for the commands that have been affected by the reset. If SUCCESS or FAST_IO_FAIL is returned, the scsi_cmnds cleaned up must be failed with DID_ABORT. Signed-off-by: Mike Christie In general I like the approach. I'll be looking into it more closely next week. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
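To make the contract above concrete, an LLD-side implementation of the proposed callout might look roughly like the sketch below. All the mydrv_* names, the per-LUN command list, and mydrv_send_lun_reset() are invented for illustration; only the required behaviour (LUN reset for the device, complete affected commands via scsi_done() with DID_ABORT, return SUCCESS/FAST_IO_FAIL/FAILED) comes from the patch description:

    struct mydrv_cmd {                      /* hypothetical per-command tracking */
            struct scsi_cmnd *scmd;
            struct list_head entry;
    };

    struct mydrv_lun {                      /* hypothetical per-LUN state in sdev->hostdata */
            struct list_head active_cmds;   /* commands accepted by queuecommand */
            spinlock_t lock;
    };

    static int mydrv_eh_async_device_reset(struct scsi_device *sdev)
    {
            struct mydrv_lun *lun = sdev->hostdata;
            struct mydrv_cmd *cmd, *tmp;

            if (mydrv_send_lun_reset(lun))  /* hypothetical firmware/HW request */
                    return FAILED;

            /* per the proposal, queuecommand is not called for this device
             * while the reset handler runs */
            spin_lock_irq(&lun->lock);
            list_for_each_entry_safe(cmd, tmp, &lun->active_cmds, entry) {
                    list_del(&cmd->entry);
                    cmd->scmd->result = DID_ABORT << 16;
                    cmd->scmd->scsi_done(cmd->scmd);
            }
            spin_unlock_irq(&lun->lock);

            return SUCCESS;
    }

It would then be wired up via .eh_async_device_reset_handler = mydrv_eh_async_device_reset in the driver's scsi_host_template.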
Re: bio-based DM multipath is back from the dead [was: Re: Notes from the four separate IO track sessions at LSF/MM]
On 05/26/2016 04:38 AM, Mike Snitzer wrote: On Thu, Apr 28 2016 at 11:40am -0400, James Bottomley wrote: On Thu, 2016-04-28 at 08:11 -0400, Mike Snitzer wrote: Full disclosure: I'll be looking at reinstating bio-based DM multipath to regain efficiencies that now really matter when issuing IO to extremely fast devices (e.g. NVMe). bio cloning is now very cheap (due to immutable biovecs), coupled with the emerging multipage biovec work that will help construct larger bios, so I think it is worth pursuing to at least keep our options open. Please see the 4 topmost commits I've published here: https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8 All request-based DM multipath support/advances have been completly preserved. I've just made it so that we can now have bio-based DM multipath too. All of the various modes have been tested using mptest: https://github.com/snitm/mptest OK, but remember the reason we moved from bio to request was partly to be nearer to the device but also because at that time requests were accumulations of bios which had to be broken out, go back up the stack individually and be re-elevated, which adds to the inefficiency. In theory the bio splitting work will mean that we only have one or two split bios per request (because they were constructed from a split up huge bio), but when we send them back to the top to be reconstructed as requests there's no guarantee that the split will be correct a second time around and we might end up resplitting the already split bios. If you do reassembly into the huge bio again before resend down the next queue, that's starting to look like quite a lot of work as well. I've not even delved into the level you're laser-focused on here. But I'm struggling to grasp why multipath is any different than any other bio-based device... Actually, _failover_ is not the primary concern. This is on a (relative) slow path so any performance degradation during failover is acceptable. No, the real issue is load-balancing. If you have several paths you have to schedule I/O across all paths, _and_ you should be feeding these paths efficiently. With the original (bio-based) layout you had to schedule on the bio level, causing the requests to be inefficiently assembled. Hence the 'rr_min_io' parameter, which were changing paths after rr_min_io _bios_. I did some experimenting a while back (I even had a presentation on LSF at one point ...), and figuring that you would get a performance degradation once the rr_min_io parameter went below 100. But this means that paths will be switched after every 100 bios, irrespective of into how many requests they'll be assembled. It also means that we have a rather 'choppy' load-balancing behaviour, and cannot achieve 'true' load balancing as the I/O scheduler on the bio level doesn't have any idea when a new request will be assembled. I was sort-of hoping that with the large bio work from Shaohua we could build bio which would not require any merging, ie building bios which would be assembled into a single request per bio. Then the above problem wouldn't exist anymore and we _could_ do scheduling on bio level. But from what I've gathered this is not always possible (eg for btrfs with delayed allocation). Have you found another way of addressing this problem? Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. 
Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
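The granularity problem described above (path switching counted in bios rather than in the requests they eventually become) can be seen in a toy bio-level round-robin selector. This is not dm-mpath code, just an illustration of why rr_min_io is such a blunt instrument:

    struct toy_rr {
            unsigned int nr_paths;
            unsigned int cur;               /* currently selected path */
            unsigned int count;             /* bios issued on the current path */
            unsigned int rr_min_io;         /* e.g. 100 */
    };

    /* called once per bio; has no idea how many requests these bios become */
    static unsigned int toy_rr_select_path(struct toy_rr *rr)
    {
            if (++rr->count >= rr->rr_min_io) {
                    rr->count = 0;
                    rr->cur = (rr->cur + 1) % rr->nr_paths;
            }
            return rr->cur;
    }

Whether those rr_min_io bios merge into one request or a hundred, the selector flips anyway, which is exactly the 'choppy' load-balancing behaviour described above.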
Re: [PATCH 4/5] scsi: add new async device reset support
Adding Hannes to the Cc list as he's been looking into EH improvements in this area. On Wed, May 25, 2016 at 02:55:02AM -0500, mchri...@redhat.com wrote: > From: Mike Christie > > Currently, if the SCSI eh runs then before we do a LUN_RESET > we stop the host. This patch and the block layer one before it > begin to add infrastructure to be able to do a LUN_RESET and > eventually do a transport level recovery without having to stop the > host. > > For LUn-reset, this patch adds a new callout, eh_async_device_reset_handler, > which works similar to how LLDs handle SG_SCSI_RESET_DEVICE where the > LLD manages the commands that are affected. > > eh_async_device_reset_handler: > > The LLD should perform a LUN RESET that affects all commands > that have been accepted by its queuecommand callout for the > device passed in to the callout. While the reset handler is running, > queuecommand will not be running or called for the device. > > Unlike eh_device_reset_handler, queuecommand may still be > called for other devices, and the LLD must call scsi_done for the > commands that have been affected by the reset. > > If SUCCESS or FAST_IO_FAIL is returned, the scsi_cmnds cleaned up > must be failed with DID_ABORT. > > Signed-off-by: Mike Christie > --- > drivers/scsi/scsi_error.c | 31 --- > drivers/scsi/scsi_lib.c | 6 ++ > drivers/scsi/scsi_priv.h | 1 + > include/scsi/scsi_host.h | 17 + > 4 files changed, 52 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index 984ddcb..cec2dfb 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -853,16 +853,41 @@ static int scsi_try_bus_device_reset(struct scsi_cmnd > *scmd) > { > int rtn; > struct scsi_host_template *hostt = scmd->device->host->hostt; > + struct scsi_device *sdev = scmd->device; > > - if (!hostt->eh_device_reset_handler) > + if (!hostt->eh_device_reset_handler && > + !hostt->eh_async_device_reset_handler) > return FAILED; > > - rtn = hostt->eh_device_reset_handler(scmd); > + if (hostt->eh_device_reset_handler) { > + rtn = hostt->eh_device_reset_handler(scmd); > + } else { > + if (!blk_reset_queue(sdev->request_queue)) > + rtn = SUCCESS; > + else > + rtn = FAILED; > + } > if (rtn == SUCCESS) > - __scsi_report_device_reset(scmd->device, NULL); > + __scsi_report_device_reset(sdev, NULL); > return rtn; > } > > +enum blk_eh_timer_return scsi_reset_queue(struct request_queue *q) > +{ > + struct scsi_device *sdev = q->queuedata; > + struct scsi_host_template *hostt = sdev->host->hostt; > + int rtn; > + > + if (!hostt->eh_async_device_reset_handler) > + return -EOPNOTSUPP; > + > + rtn = hostt->eh_async_device_reset_handler(sdev); > + if (rtn == SUCCESS || rtn == FAST_IO_FAIL) > + return BLK_EH_HANDLED; > + > + return BLK_EH_NOT_HANDLED; > +} > + > /** > * scsi_try_to_abort_cmd - Ask host to abort a SCSI command > * @hostt: SCSI driver host template > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 8106515..11374dd 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -779,6 +779,10 @@ static int __scsi_error_from_host_byte(struct scsi_cmnd > *cmd, int result) > set_host_byte(cmd, DID_OK); > error = -ENODATA; > break; > + case DID_ABORT: > + set_host_byte(cmd, DID_OK); > + error = -EINTR; > + break; > default: > error = -EIO; > break; > @@ -2159,6 +2163,7 @@ struct request_queue *scsi_alloc_queue(struct > scsi_device *sdev) > blk_queue_softirq_done(q, scsi_softirq_done); > blk_queue_rq_timed_out(q, scsi_times_out); > 
blk_queue_lld_busy(q, scsi_lld_busy); > + blk_queue_reset(q, scsi_reset_queue); > return q; > } > > @@ -2167,6 +2172,7 @@ static struct blk_mq_ops scsi_mq_ops = { > .queue_rq = scsi_queue_rq, > .complete = scsi_softirq_done, > .timeout= scsi_timeout, > + .reset = scsi_reset_queue, > .init_request = scsi_init_request, > .exit_request = scsi_exit_request, > }; > diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h > index 27b4d0a..2e03168 100644 > --- a/drivers/scsi/scsi_priv.h > +++ b/drivers/scsi/scsi_priv.h > @@ -67,6 +67,7 @@ extern void scsi_exit_devinfo(void); > > /* scsi_error.c */ > extern void scmd_eh_abort_handler(struct work_struct *work); > +extern enum blk_eh_timer_return scsi_reset_queue(struct request_queue *q); > extern enum blk_eh_timer_return scsi_times_out(struct request *req); > extern int scsi_error_handler(void *host); > extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
Re: [PATCH 3/5] target: call queue reset if supported
On Wed, May 25, 2016 at 02:55:01AM -0500, mchri...@redhat.com wrote: > From: Mike Christie > > Instead of waiting for commands, map the LUN reset operation > to a queue reset. > > We do not check the result of blk_reset_queue because if > it works then we need to wait for the bio/request completions > and if it failed we might as well wait and hope like we did before. Please don't call into block device code from the file handler. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] mpt3sas - set num_phys after allocating phy[] space
Hi, This patch looks good. Please consider this patch as Ack-by: Sreekanth Reddy Thanks, Sreekanth On Thu, May 26, 2016 at 12:44 AM, Joe Lawrence wrote: > In _scsih_sas_host_add, the number of HBA phys are determined and then > later used to allocate an array of struct _sas_phy's. If the routine > sets ioc->sas_hba.num_phys, but then fails to allocate the > ioc->sas_hba.phy array (by kcalloc error or other intermediate > error/exit path), ioc->sas_hba is left in a dangerous state: all readers > of ioc->sas_hba.phy[] do so by indexing it from 0..ioc->sas_hba.num_phys > without checking that the space was ever allocated. > > Modify _scsih_sas_host_add to set ioc->sas_hba.num_phys only after > successfully allocating ioc->sas_hba.phy[]. > > Signed-off-by: Joe Lawrence > --- > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 20 +++- > 1 file changed, 11 insertions(+), 9 deletions(-) > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > index f2139e5604a3..6e36d20c9e0b 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > @@ -4893,13 +4893,22 @@ _scsih_sas_host_add(struct MPT3SAS_ADAPTER *ioc) > u16 ioc_status; > u16 sz; > u8 device_missing_delay; > + u8 num_phys; > > - mpt3sas_config_get_number_hba_phys(ioc, &ioc->sas_hba.num_phys); > - if (!ioc->sas_hba.num_phys) { > + mpt3sas_config_get_number_hba_phys(ioc, &num_phys); > + if (!num_phys) { > pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", > ioc->name, __FILE__, __LINE__, __func__); > return; > } > + ioc->sas_hba.phy = kcalloc(num_phys, > + sizeof(struct _sas_phy), GFP_KERNEL); > + if (!ioc->sas_hba.phy) { > + pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", > + ioc->name, __FILE__, __LINE__, __func__); > + goto out; > + } > + ioc->sas_hba.num_phys = num_phys; > > /* sas_iounit page 0 */ > sz = offsetof(Mpi2SasIOUnitPage0_t, PhyData) + (ioc->sas_hba.num_phys > * > @@ -4959,13 +4968,6 @@ _scsih_sas_host_add(struct MPT3SAS_ADAPTER *ioc) > MPI2_SASIOUNIT1_REPORT_MISSING_TIMEOUT_MASK; > > ioc->sas_hba.parent_dev = &ioc->shost->shost_gendev; > - ioc->sas_hba.phy = kcalloc(ioc->sas_hba.num_phys, > - sizeof(struct _sas_phy), GFP_KERNEL); > - if (!ioc->sas_hba.phy) { > - pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", > - ioc->name, __FILE__, __LINE__, __func__); > - goto out; > - } > for (i = 0; i < ioc->sas_hba.num_phys ; i++) { > if ((mpt3sas_config_get_phy_pg0(ioc, &mpi_reply, &phy_pg0, > i))) { > -- > 1.8.3.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] mpt3sas - avoid mpt3sas_transport_port_add NULL parent_dev
Hi, This patch looks good. Please consider this patch as Ack-by: Sreekanth Reddy Thanks, Sreekanth On Thu, May 26, 2016 at 12:44 AM, Joe Lawrence wrote: > If _scsih_sas_host_add's call to mpt3sas_config_get_sas_iounit_pg0 > fails, ioc->sas_hba.parent_dev may be left uninitialized. A later > device probe could invoke mpt3sas_transport_port_add which will call > sas_port_alloc_num [scsi_transport_sas] with a NULL parent_dev pointer. > > Signed-off-by: Joe Lawrence > --- > drivers/scsi/mpt3sas/mpt3sas_transport.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c > b/drivers/scsi/mpt3sas/mpt3sas_transport.c > index 6a84b82d71bb..ff93286bc32f 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_transport.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c > @@ -705,6 +705,11 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, > u16 handle, > goto out_fail; > } > > + if (!sas_node->parent_dev) { > + pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", > + ioc->name, __FILE__, __LINE__, __func__); > + goto out_fail; > + } > port = sas_port_alloc_num(sas_node->parent_dev); > if ((sas_port_add(port))) { > pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", > -- > 1.8.3.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html