Re: Behavior for hot-plugging SD card in USB SD card reader
Hello, I would appreciate any feedback or suggestions regarding the scenario below. Thanks again.

On 05/12/2016 11:02 PM, Frkuska, Joshua wrote:

Hello,

I have two questions regarding the behavior of hot-plugging an SD card in a USB SD card reader, as it relates to the SCSI subsystem. Before the questions, the background and my understanding:

On the 3.14.y branch and later (not sure about 4.2+), when using a USB SD card reader with an SD card inserted and a filesystem mounted, there are four possible cases for physical removal of the SD card. The first is when the card (not the USB reader) is removed and the SCSI device is not busy (being accessed). The second is when the SD card is busy (being accessed) and then removed. The third and fourth are the same, but with the USB reader itself being removed while the SD card is inserted. This mail focuses on the behavior of the second case, where the SD card is busy and then removed.

If my understanding is correct, in the scenario above, access to the SCSI disk block device keeps the reference count positive. This is followed by removal of the SD card, which in the non-busy case would make the reference count zero and allow the disk partitions to be unmounted; any new SD card inserted into the card reader should then trigger a geometry recalculation and allow the disk partitions to be discovered and auto-mounted. However, in the busy case it seems that the SCSI disk reference count does not go to zero and we are unable to unmount the partition. A lazy unmount appears to occur, leaving a stale device node.

Similar behavior can be seen by inserting a USB flash-based storage device and executing the same problem case. In the case of a USB flash drive, after hotplug and re-insertion, a new block device node gets assigned because the previous one is still busy. In the case of the SD card, a new block device node does not get created.
A simple recreation case is to use 'tail -f' on any file on the auto-mounted SD card filesystem, then remove the SD card and insert a second, different SD card. The second SD card does not auto-mount as one would first think. The device is not listed in MP shadow fblock nor mounted at all.

In the case of the USB flash drive, the host adapter is created and the SCSI bus scanned, finding the SCSI device controller on the bus. The device controller responds to the scan and the SCSI disk is populated with the appropriate geometry sizes. This is provided to the block layer, then to the VFS, which the auto-mounter picks up on and mounts the partitions.

In the case of the USB SD card reader, when the card reader is plugged in without an SD card populated, the SCSI device controller is still found and the root SCSI disk node still created, but its geometry is 0 and the device is put in a NO_MEDIUM_INSERTED state. Any open attempt on the device node will fail with a NOMEDIUM error. Insertion of an SD card changes the reported geometry of the SCSI disk during a periodic scan, the NOMEDIUM state is removed, and the disk information is updated. As far as kernel events go, 'change' and 'add' events are delivered up to userland.

When a USB flash drive is busy (being accessed) when hotplugged (removed), the partition remains mounted, and subsequent re-insertion creates a new root SCSI device node in sequential alphabetical order (e.g. /dev/sdb). The reason re-population of the SCSI disk works in the USB flash drive case is that the entire virtual host is destroyed and repopulated. This repopulates the parent device node after the device controller is scanned and found on the virtual SCSI bus. In the case of the SD card, removal of the SD card only changes the medium-inserted state and does not result in a SCSI disk root node re-population.

I have two questions.

1. In the case of a USB card reader SD card being hotplugged while busy, is the current behavior desired? If so, why?
An embedded system where the SD card reader is physically locked/mounted to the USB hub could see such a scenario if the SD card is being accessed during removal by the user. Such a use case is possible on a PC as well.

2. If this behavior should be more similar to a USB flash drive during hotplug, what layer modifications would be necessary? (mass storage driver, SCSI disk, SCSI layer, block, etc.)

I have a few ideas how this could be done, but I would appreciate any additional thoughts or feedback regarding creating behavior similar to a USB hotplug. The current thoughts I have are:

a.) Change the behavior of the SCSI disk layer in the case when the media is ejected with its user count not equal to zero. The idea is to force-drop (up and down the layer stack) the existing references to the disk so that it can be properly cleaned up and removed. This is a little different than the USB case above, as the SCSI device node will be re-used.

b.) Use a notifier chain to notify the USB mass storage driver to reset and re-instantiate the virtual host.
Re: NVMe over Fabrics target implementation
On Tue, 2016-06-07 at 12:55 +0200, Christoph Hellwig wrote:
> There is absolutely no point in dragging in an overcomplicated configfs
> structure for a very simple protocol which also is very different from
> SCSI in its nitty-gritty details.

Please be more specific wrt the two individual points that have been raised.

> Keeping the nvme target self contained allows it to be both much
> simpler and much easier to understand, as well as much better testable
> - see the amount of test coverage we could easily add for example.

I disagree.

> Or to put it the other way around - if there was any major synergy in
> reusing the SCSI target code that just shows we're missing functionality
> in the block layer or configfs.

To reiterate the points again.

*) Extensible to multiple types of backend drivers.

nvme-target needs a way to absorb new backend drivers that does not affect the existing configfs group layout or attributes. Looking at the nvmet/configfs layout as-is, there are no multiple backend types defined, nor a way to control backend feature bits exposed to nvme namespaces at runtime.

What is being proposed is a way to share target-core backends via existing configfs symlinks across SCSI and NVMe targets. Which means:

- All I/O state + memory submission is done at RCU-protected se_device level via sbc_ops
- percpu reference counting is done outside of target-core
- Absorb all nvmet/io-cmd optimizations into target_core_iblock.c
- Base starting point for features in SCSI + NVMe that span across multiple endpoints and instances (reservations + APTPL, multipath, copy-offload across fabric types)

Using target-core backends means we get features like T10-PI and sbc_ops->write_same for free that don't exist in nvmet, and we can utilize a common set of backend drivers for SCSI and NVMe via an existing configfs ABI and python userspace community.
And now to the second, and more important, point: defining a configfs ABI that works both for today's requirements and well into the 2020s, without breaking user-space compatibility.

As-is, the initial design using top-level nvmet configfs symlinks of subsystem groups into individual port + host groups does not scale. That is, it currently does:

- Sequential list lookup under a global rw_mutex in the top-level nvmet_port and nvmet_host symlink ->allow_link() and ->drop_link() configfs callbacks.
- nvmet_fabrics_ops->add_port() callback invoked under the same global rw_mutex.

This is very bad for several reasons. As-is, this blocks all other configfs port + host operations from occurring even during normal operation, which makes it quite useless for any type of multi-tenant target environment where the individual target endpoints *must* be able to operate independently.

Seriously, there is never a good reason why configfs group or item callbacks should be performing list lookup under a global lock at this level. Why does it ever make sense for $SUBSYSTEM_NQN_0 with $PORT_DRIVER_FOO to block operation of $SUBSYSTEM_NQN_1 with $PORT_DRIVER_BAR..?

A simple example where this design breaks down quickly is a NVMf ops->add_port() call that requires a HW reset, or say reloading of firmware, that can take multiple seconds (qla2xxx comes to mind).

There is a simple test to highlight this limitation. Take any nvme-target driver that is capable of multiple ports, and introduce a sleep(5) into each ops->add_port() call. Now create 256 different subsystem NQNs with 256 different ports across four different user-space processes. What happens to the other subsystems, ports and host groups configfs symlinks when this occurs..? What happens to the other user-space processes..?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] sd: remove redundant check for BLK_DEF_MAX_SECTORS
Hi Martin,

Thanks for looking into this. The problem I'm trying to solve is that I want to allow a lower-layer driver to set up max_sectors bigger than BLK_DEF_MAX_SECTORS. In Hyper-V, we use a 2MB maximum transfer I/O size; in a future version the maximum transfer I/O size will increase to 8MB. The implementation of sd.c limits the maximum value of max_sectors to BLK_DEF_MAX_SECTORS. Because sd_revalidate_disk is called late in the SCSI disk initialization process, there is no way for a lower-layer driver to set this value to its "bigger" optimal size.

The reason why I think it may not be necessary for sd.c to set up max_sectors is that this value may have already been set up twice before reaching the code in sd.c:

1. When this disk device is first scanned, or re-scanned (in scsi_scan.c), where it eventually calls __scsi_init_queue() and uses the max_sectors in the scsi_host_template.

2. In slave_configure of scsi_host_template, when the lower-layer driver implements this function in its template and can change this value there.

Long

> -----Original Message-----
> From: Martin K. Petersen [mailto:martin.peter...@oracle.com]
> Sent: Monday, June 6, 2016 8:42 PM
> To: Long Li
> Cc: Tom Yan; James E.J. Bottomley; Martin K. Petersen;
> linux-scsi@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH] sd: remove redundant check for
> BLK_DEF_MAX_SECTORS
>
> > "Long" == Long Li writes:
>
> Long,
>
> Long> The reason is that, max_sectors already has value at this point,
> Long> the default value is SCSI_DEFAULT_MAX_SECTORS
> Long> (include/scsi/scsi_host.h). The lower layer host driver can change
> Long> this value in its template.
>
> The LLD sets max_hw_sectors which indicates the capabilities of the
> controller DMA hardware. Whereas the max_sectors limit is set by sd to
> either follow advice by the device or--if not provided--use the block
> layer default.
> max_sectors governs the size of READ/WRITE requests and does not
> reflect the capabilities of the DMA hardware.
>
> Long> I think the drivers that care about this value have already set
> Long> it. So it's better not to change it again. If they want
> Long> max_sectors to be set by sd, they can use BLOCK LIMITS VPD to
> Long> tell it to do so.
>
> Most drivers don't have the luxury of being able to generate VPDs for
> their attached target devices :)
>
> --
> Martin K. Petersen	Oracle Linux Engineering
[RFC 2/2] nvme/loop: Add support for controller-per-port model
From: Nicholas Bellinger

This patch introduces loopback support for a nvme host controller per nvmet_port instance model, following what we've done in drivers/target/loopback/ for allowing multiple host LLDs to co-exist.

It changes nvme_loop_add_port() to use struct nvme_loop_port and take the nvmf_get_default_host() reference, and invokes device_register() so that nvme_loop_driver_probe() kicks off controller creation within nvme_loop_create_ctrl(). This allows nvme_loop_queue_rq() to set iod->req.port to the per-nvmet_port pointer, instead of a single hardcoded global nvmet_loop_port.

Subsequently, it also adds nvme_loop_remove_port() to call device_unregister(), which calls nvme_loop_del_ctrl() and nvmf_free_options() to drop nvmet_port's nvmf_default_host reference, when the nvmet_port is being removed from the associated nvmet_subsys.

Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Keith Busch
Cc: Jay Freyensee
Cc: Martin Petersen
Cc: Sagi Grimberg
Cc: Hannes Reinecke
Cc: Mike Christie
Signed-off-by: Nicholas Bellinger
---
 drivers/nvme/target/loop.c | 183 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 165 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index b4b4da9..01b73dc 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -46,6 +46,13 @@ struct nvme_loop_iod {
 	struct scatterlist	first_sgl[];
 };
 
+struct nvme_loop_port {
+	struct device		dev;
+	struct nvmf_ctrl_options *opts;
+	struct nvmet_port	*port;
+	struct nvme_ctrl	*ctrl;
+};
+
 struct nvme_loop_ctrl {
 	spinlock_t		lock;
 	struct nvme_loop_queue	*queues;
@@ -62,6 +69,8 @@ struct nvme_loop_ctrl {
 	struct nvmet_ctrl	*target_ctrl;
 	struct work_struct	delete_work;
 	struct work_struct	reset_work;
+
+	struct nvme_loop_port	*port;
 };
 
 static inline struct nvme_loop_ctrl *to_loop_ctrl(struct nvme_ctrl *ctrl)
@@ -75,8 +84,6 @@ struct nvme_loop_queue {
 	struct nvme_loop_ctrl	*ctrl;
 };
 
-static struct nvmet_port *nvmet_loop_port;
-
 static LIST_HEAD(nvme_loop_ctrl_list);
 static DEFINE_MUTEX(nvme_loop_ctrl_mutex);
@@ -173,7 +180,8 @@ static int nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 		return ret;
 
 	iod->cmd.common.flags |= NVME_CMD_SGL_METABUF;
-	iod->req.port = nvmet_loop_port;
+	iod->req.port = queue->ctrl->port->port;
+
 	if (!nvmet_req_init(&iod->req, &queue->nvme_cq, &queue->nvme_sq,
 			&nvme_loop_ops)) {
 		nvme_cleanup_cmd(req);
@@ -618,6 +626,8 @@ out_destroy_queues:
 static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
 		struct nvmf_ctrl_options *opts)
 {
+	struct nvme_loop_port *loop_port = container_of(dev,
+				struct nvme_loop_port, dev);
 	struct nvme_loop_ctrl *ctrl;
 	bool changed;
 	int ret;
@@ -626,6 +636,7 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
 	if (!ctrl)
 		return ERR_PTR(-ENOMEM);
 	ctrl->ctrl.opts = opts;
+	ctrl->port = loop_port;
 	INIT_LIST_HEAD(&ctrl->list);
 
 	INIT_WORK(&ctrl->delete_work, nvme_loop_del_ctrl_work);
@@ -700,29 +711,117 @@ out_put_ctrl:
 	return ERR_PTR(ret);
 }
 
+static int nvme_loop_driver_probe(struct device *dev)
+{
+	struct nvme_loop_port *loop_port = container_of(dev,
+				struct nvme_loop_port, dev);
+	struct nvme_ctrl *ctrl;
+
+	ctrl = nvme_loop_create_ctrl(dev, loop_port->opts);
+	if (IS_ERR(ctrl))
+		return PTR_ERR(ctrl);
+
+	loop_port->ctrl = ctrl;
+	return 0;
+}
+
+static int nvme_loop_driver_remove(struct device *dev)
+{
+	struct nvme_loop_port *loop_port = container_of(dev,
+				struct nvme_loop_port, dev);
+	struct nvme_ctrl *ctrl = loop_port->ctrl;
+	struct nvmf_ctrl_options *opts = loop_port->opts;
+
+	nvme_loop_del_ctrl(ctrl);
+	nvmf_free_options(opts);
+	return 0;
+}
+
+static int pseudo_bus_match(struct device *dev,
+			    struct device_driver *dev_driver)
+{
+	return 1;
+}
+
+static struct bus_type nvme_loop_bus = {
+	.name		= "nvme_loop_bus",
+	.match		= pseudo_bus_match,
+	.probe		= nvme_loop_driver_probe,
+	.remove		= nvme_loop_driver_remove,
+};
+
+static struct device_driver nvme_loop_driverfs = {
+	.name		= "nvme_loop",
+	.bus		= &nvme_loop_bus,
+};
+
+static void nvme_loop_release_adapter(struct device *dev)
+{
+	struct nvme_loop_port
[RFC 1/2] nvme-fabrics: Add nvmf_get_default_host helper
From: Nicholas Bellinger

This patch adds a nvmf_get_default_host() helper used to set nvmf_ctrl_options->host to the internal nvmf_default_host reference, following the existing nvmf_parse_options() code.

Note it's required for nvme-loop multi-controller support, in order to drive nvmet_port creation directly via a configfs attribute write from within the ../nvmet/subsystems/$NQN/ports/$PORT/ group context.

Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Keith Busch
Cc: Jay Freyensee
Cc: Martin Petersen
Cc: Sagi Grimberg
Cc: Hannes Reinecke
Cc: Mike Christie
Signed-off-by: Nicholas Bellinger
---
 drivers/nvme/host/fabrics.c | 13 +++++++++++--
 drivers/nvme/host/fabrics.h |  1 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index ee4b7f1..04317be 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -497,6 +497,16 @@ static struct nvmf_transport_ops *nvmf_lookup_transport(
 	return NULL;
 }
 
+void nvmf_get_default_host(struct nvmf_ctrl_options *opts)
+{
+	if (opts->host)
+		return;
+
+	kref_get(&nvmf_default_host->ref);
+	opts->host = nvmf_default_host;
+}
+EXPORT_SYMBOL_GPL(nvmf_get_default_host);
+
 static const match_table_t opt_tokens = {
 	{ NVMF_OPT_TRANSPORT,	"transport=%s"	},
 	{ NVMF_OPT_TRADDR,	"traddr=%s"	},
@@ -686,8 +696,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	}
 
 	if (!opts->host) {
-		kref_get(&nvmf_default_host->ref);
-		opts->host = nvmf_default_host;
+		nvmf_get_default_host(opts);
 	}
 
 out:
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index b540674..73ef7a7 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -128,6 +128,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl);
 int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid);
 void nvmf_register_transport(struct nvmf_transport_ops *ops);
 void nvmf_unregister_transport(struct nvmf_transport_ops *ops);
+void nvmf_get_default_host(struct nvmf_ctrl_options *opts);
 void nvmf_free_options(struct nvmf_ctrl_options *opts);
 const char *nvmf_get_subsysnqn(struct nvme_ctrl *ctrl);
 int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size);
-- 
1.9.1
[RFC 0/2] nvme/loop: Add support for controllers-per-port model
From: Nicholas Bellinger

Hi again folks,

So building on the nvmet/configfs-ng WIP posted yesterday:

http://marc.info/?l=linux-scsi=146528147018404=2

This series adds support for nvme/loop to utilize a nvme host controller per nvmet_port, instead of a single hardcoded entry for nvmet_port pointer access in nvme_loop_queue_rq(). It uses a struct device model following what we've done in drivers/target/loopback to allow multiple host controllers to co-exist under configfs, and is driven by the nvmet_port configfs enable attribute.

Which means that any arbitrary number of nvme-loop controllers can now exist, and each nvmet_subsystem->ports_group can enable/disable its own loopback controller!

Here is how it looks in action for controller id 1 + 2:

root@scsi-mq:/sys/kernel/config/nvmet/subsystems# tree -if
.
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/namespaces
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/namespaces/1
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/namespaces/1/ramdisk1 -> ../../../../../target/core/rd_mcp_2/ramdisk1
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_adrfam
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_portid
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_traddr
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_treq
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_trsvcid
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_trtype
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/enable
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/namespaces
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/namespaces/1
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/namespaces/1/ramdisk0 -> ../../../../../target/core/rd_mcp_1/ramdisk0
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_adrfam
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_portid
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_traddr
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_treq
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_trsvcid
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_trtype
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/enable

# cat /proc/partitions
major minor  #blocks  name
 259        0    4194304 nvme0n1
 259        1      65535 nvme1n1
 259        2      65535 nvme2n1

Comments..?

--nab

Nicholas Bellinger (2):
  nvme-fabrics: Add nvmf_get_default_host helper
  nvme/loop: Add support for controller-per-port model

 drivers/nvme/host/fabrics.c |  13 +++-
 drivers/nvme/host/fabrics.h |   1 +
 drivers/nvme/target/loop.c  | 183 +++++++++++++++++++++++++++++++++++++------
 3 files changed, 177 insertions(+), 20 deletions(-)

-- 
1.9.1
[PATCH v2] scsi: ufs: fix potential memory leak
If the function ufshcd_parse_clock_info() returns an error, the memory clkfreq allocated by devm_kzalloc() will be freed at that time. But when ufshcd_parse_clock_info() returns 0 on success, there is a potential memory leak; this patch fixes it.

Signed-off-by: Tiezhu Yang
---
 drivers/scsi/ufs/ufshcd-pltfrm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c b/drivers/scsi/ufs/ufshcd-pltfrm.c
index db53f38..83f757f 100644
--- a/drivers/scsi/ufs/ufshcd-pltfrm.c
+++ b/drivers/scsi/ufs/ufshcd-pltfrm.c
@@ -122,6 +122,10 @@ static int ufshcd_parse_clock_info(struct ufs_hba *hba)
 				clki->min_freq, clki->max_freq, clki->name);
 		list_add_tail(&clki->list, &hba->clk_list_head);
 	}
+
+	devm_kfree(dev, clkfreq);
+	clkfreq = NULL;
+
 out:
 	return ret;
 }
-- 
1.8.3.1
Re: [PATCH] scsi: ufs: fix potential memory leak
At 2016-06-08 03:12:08, "James Bottomley" wrote:
>On Wed, 2016-06-08 at 02:00 +0800, Tiezhu Yang wrote:
>> There exists potential memory leak in ufshcd_parse_clock_info(),
>> this patch fixes it.
>
>What makes you think there's a leak here? These are all devm_
>allocations, so they're all freed when the device is. If an error is
>returned, the device is released and the memory freed.
>
>You can argue that on successful initialization, there's no need to
>keep the clkfreq array, but this patch isn't the way you'd change that.
>
>James
>

OK, thanks. I will send a v2 patch.
[PATCH] lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from lpfc_send_taskmgmt()
The lpfc_sli4_scmd_to_wqidx_distr() function expects the scsi_cmnd 'lpfc_cmd->pCmd' not to be null, and to point to the mid-layer command. That's not true in the .eh_(device|target|bus)_reset_handler path, because lpfc_send_taskmgmt() sends commands that do not come from the mid-layer, so it does not set 'lpfc_cmd->pCmd'.

It is true in the .queuecommand path, because lpfc_queuecommand() stores the scsi_cmnd from the mid-layer in lpfc_cmd->pCmd, and lpfc_cmd is stored by lpfc_scsi_prep_cmnd() in piocbq->context1 -- which is passed to lpfc_sli4_scmd_to_wqidx_distr() as the lpfc_cmd parameter.

This problem can be hit via SCSI EH, and immediately with sg_reset. These two test cases demonstrate the problem/fix with next-20160601.

Test-case 1) sg_reset

# strace sg_reset --device /dev/sdm
<...>
open("/dev/sdm", O_RDWR|O_NONBLOCK)     = 3
ioctl(3, SG_SCSI_RESET, 0x3fffde6d0994
+++ killed by SIGSEGV +++
Segmentation fault

# dmesg
Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xd0001c88442c
Oops: Kernel access of bad area, sig: 11 [#1]
<...>
CPU: 104 PID: 16333 Comm: sg_reset Tainted: G        W 4.7.0-rc1-next-20160601-4-g95b89dc #6
<...>
NIP [d0001c88442c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
LR [d0001c826fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
Call Trace:
[c03c9ec876f0] [c03c9ec87770] 0xc03c9ec87770 (unreliable)
[c03c9ec87720] [d0001c82e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
[c03c9ec87780] [d0001c831a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc]
[c03c9ec87880] [d0001c87f27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
[c03c9ec87950] [d0001c87fd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc]
[c03c9ec87a10] [c0610694] scsi_try_bus_device_reset+0x44/0xc0
[c03c9ec87a40] [c06113e8] scsi_ioctl_reset+0x198/0x2c0
[c03c9ec87bf0] [c060fe5c] scsi_ioctl+0x13c/0x4b0
[c03c9ec87c80] [c06629b0] sd_ioctl+0xf0/0x120
[c03c9ec87cd0] [c046e4f8] blkdev_ioctl+0x248/0xb70
[c03c9ec87d30] [c02a1f60] block_ioctl+0x70/0x90
[c03c9ec87d50] [c026d334] do_vfs_ioctl+0xc4/0x890
[c03c9ec87de0] [c026db60] SyS_ioctl+0x60/0xc0
[c03c9ec87e30] [c0009120] system_call+0x38/0x108
Instruction dump:
<...>

With fix:

# strace sg_reset --device /dev/sdm
<...>
open("/dev/sdm", O_RDWR|O_NONBLOCK)     = 3
ioctl(3, SG_SCSI_RESET, 0x3fffe103c554) = 0
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

# dmesg
[  424.658649] lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (1, 0) return x2002

Test-case 2) SCSI EH

Using this debug patch to wire a SCSI EH trigger, for lpfc_scsi_cmd_iocb_cmpl():

-	cmd->scsi_done(cmd);
+	if ((phba->pport ? phba->pport->cfg_log_verbose : phba->cfg_log_verbose) == 0x3210)
+		printk(KERN_ALERT "lpfc: skip scsi_done()\n");
+	else
+		cmd->scsi_done(cmd);

# echo 0x3210 > /sys/class/scsi_host/host11/lpfc_log_verbose
# dd if=/dev/sdm of=/dev/null iflag=direct &
<...>

After a while:

# dmesg
lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x3210)
lpfc: skip scsi_done()
<...>
Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xd000199e448c
Oops: Kernel access of bad area, sig: 11 [#1]
<...>
CPU: 96 PID: 28556 Comm: scsi_eh_11 Tainted: G        W 4.7.0-rc1-next-20160601-4-g95b89dc #6
<...>
NIP [d000199e448c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
LR [d00019986fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
Call Trace:
[c00ff0d0b890] [c00ff0d0b900] 0xc00ff0d0b900 (unreliable)
[c00ff0d0b8c0] [d0001998e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
[c00ff0d0b920] [d00019991a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc]
[c00ff0d0ba20] [d000199df27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
[c00ff0d0baf0] [d000199dfd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc]
[c00ff0d0bbb0] [c0610694] scsi_try_bus_device_reset+0x44/0xc0
[c00ff0d0bbe0] [c06126cc] scsi_eh_ready_devs+0x49c/0x9c0
[c00ff0d0bcb0] [c0614160] scsi_error_handler+0x580/0x680
[c00ff0d0bd80] [c00ae848] kthread+0x108/0x130
[c00ff0d0be30] [c00094a8] ret_from_kernel_thread+0x5c/0xb4
Instruction dump:
<...>

With fix:

# dmesg
lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x3210)
lpfc: skip scsi_done()
<...>
lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device
Re: NVMe over Fabrics target implementation
On Tue, Jun 7, 2016 at 2:02 PM, Andy Grover wrote:
> On 06/06/2016 11:23 PM, Nicholas A. Bellinger wrote:
>>
>> Hi HCH & Co,
>>
>> On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote:
>>>
>>> This patch set adds a generic NVMe over Fabrics target. The
>>> implementation conforms to the NVMe 1.2b specification (which
>>> includes Fabrics) and provides the NVMe over Fabrics access
>>> to Linux block devices.
>>>
>>
>> Thanks for all of the development work by the fabric_linux_driver team
>> (HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year.
>>
>> Very excited to see this code get a public release now that the NVMf
>> specification is out. Now that it's in the wild, it's a good
>> opportunity to discuss some of the more interesting implementation
>> details, beyond the new NVMf wire-protocol itself.
>
> I'm sorry but I missed it, can you repeat the link to the NVMe spec(s)?

http://www.nvmexpress.org/wp-content/uploads/NVM_Express_1_2_1_Gold_20160603.pdf
http://www.nvmexpress.org/wp-content/uploads/NVMe_over_Fabrics_1_0_Gold_20160605.pdf
Re: NVMe over Fabrics target implementation
On 06/06/2016 11:23 PM, Nicholas A. Bellinger wrote:
> Hi HCH & Co,
>
> On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote:
>> This patch set adds a generic NVMe over Fabrics target. The
>> implementation conforms to the NVMe 1.2b specification (which
>> includes Fabrics) and provides the NVMe over Fabrics access
>> to Linux block devices.
>
> Thanks for all of the development work by the fabric_linux_driver team
> (HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year.
>
> Very excited to see this code get a public release now that the NVMf
> specification is out. Now that it's in the wild, it's a good
> opportunity to discuss some of the more interesting implementation
> details, beyond the new NVMf wire-protocol itself.

I'm sorry but I missed it, can you repeat the link to the NVMe spec(s)?

Thanks -- Andy
[PATCH v2 3/4] Add ioctl to issue ZBC/ZAC commands via block layer
Add new ioctl types:

  BLKREPORT    - Issue Report Zones to device.
  BLKOPENZONE  - Issue a Zone Action: Open Zone command.
  BLKCLOSEZONE - Issue a Zone Action: Close Zone command.
  BLKRESETZONE - Issue a Zone Action: Reset Zone command.

Signed-off-by: Shaun Tancheff
---
V2:
 - Added include/uapi/linux/fs.h
 - Removed REQ_META flag from this patch.

 block/ioctl.c                     | 110 ++++++++++++++++++++++++++++++
 include/uapi/linux/blkzoned_api.h |   6 +++
 include/uapi/linux/fs.h           |   1 +
 3 files changed, 117 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..97f45f5 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -194,6 +195,109 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+		void __user *parg)
+{
+	int error = -EFAULT;
+	gfp_t gfp = GFP_KERNEL;
+	struct bdev_zone_report_io *zone_iodata = NULL;
+	int order = 0;
+	struct page *pgs = NULL;
+	u32 alloc_size = PAGE_SIZE;
+	unsigned long op_flags = 0;
+	u8 opt = 0;
+
+	if (!(mode & FMODE_READ))
+		return -EBADF;
+
+	zone_iodata = (void *)get_zeroed_page(gfp);
+	if (!zone_iodata) {
+		error = -ENOMEM;
+		goto report_zones_out;
+	}
+	if (copy_from_user(zone_iodata, parg, sizeof(*zone_iodata))) {
+		error = -EFAULT;
+		goto report_zones_out;
+	}
+	if (zone_iodata->data.in.return_page_count > alloc_size) {
+		int npages;
+
+		alloc_size = zone_iodata->data.in.return_page_count;
+		npages = (alloc_size + PAGE_SIZE - 1) / PAGE_SIZE;
+		order = ilog2(roundup_pow_of_two(npages));
+		pgs = alloc_pages(gfp, order);
+		if (pgs) {
+			void *mem = page_address(pgs);
+
+			if (!mem) {
+				error = -ENOMEM;
+				goto report_zones_out;
+			}
+			memset(mem, 0, alloc_size);
+			memcpy(mem, zone_iodata, sizeof(*zone_iodata));
+			free_page((unsigned long)zone_iodata);
+			zone_iodata = mem;
+		} else {
+			/* Result requires DMA capable memory */
+			pr_err("Not enough memory available for request.\n");
+			error = -ENOMEM;
+			goto report_zones_out;
+		}
+	}
+	opt = zone_iodata->data.in.report_option & 0x7F;
+	error = blkdev_issue_zone_report(bdev, op_flags,
+			zone_iodata->data.in.zone_locator_lba, opt,
+			pgs ? pgs : virt_to_page(zone_iodata),
+			alloc_size, GFP_KERNEL);
+
+	if (error)
+		goto report_zones_out;
+
+	if (copy_to_user(parg, zone_iodata, alloc_size))
+		error = -EFAULT;
+
+report_zones_out:
+	if (pgs)
+		__free_pages(pgs, order);
+	else if (zone_iodata)
+		free_page((unsigned long)zone_iodata);
+	return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+		unsigned int cmd, unsigned long arg)
+{
+	unsigned long op_flags = 0;
+
+	if (!(mode & FMODE_WRITE))
+		return -EBADF;
+
+	/*
+	 * When acting on zones we explicitly disallow using a partition.
+	 */
+	if (bdev != bdev->bd_contains) {
+		pr_err("%s: All zone operations disallowed on this device\n",
+			__func__);
+		return -EFAULT;
+	}
+
+	switch (cmd) {
+	case BLKOPENZONE:
+		op_flags |= REQ_OPEN_ZONE;
+		break;
+	case BLKCLOSEZONE:
+		op_flags |= REQ_CLOSE_ZONE;
+		break;
+	case BLKRESETZONE:
+		op_flags |= REQ_RESET_ZONE;
+		break;
+	default:
+		pr_err("%s: Unknown action: %u\n", __func__, cmd);
+		WARN_ON(1);
+	}
+	return blkdev_issue_zone_action(bdev, op_flags, arg, GFP_KERNEL);
+}
+
 static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
 		unsigned long arg, unsigned long flags)
 {
@@ -568,6 +672,12 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	case BLKTRACESETUP:
 	case BLKTRACETEARDOWN:
 		return blk_trace_ioctl(bdev, cmd, argp);
+	case BLKREPORT:
+		return blk_zoned_report_ioctl(bdev, mode, argp);
+	case BLKOPENZONE:
+	case BLKCLOSEZONE:
+	case BLKRESETZONE:
+
[PATCH v2 4/4] Add ata pass-through path for ZAC commands.
The current generation of HBA SAS adapters support connecting SATA drives and perform SCSI<->ATA translations in hardware. Unfortunately the ZBC commands are not being translated (yet). Currently users of SAS controllers can only send ZAC commands via ata pass-through. This method overloads the meaning of REQ_META to direct ZBC commands to construct ZAC-equivalent ATA pass-through commands. Note also that this approach expects the initiator to deal with the little endian result due to bypassing the normal translation layers. Signed-off-by: Shaun Tancheff--- v2: - Added REQ_META to op_flags if high bit is set in opt. block/ioctl.c | 32 drivers/scsi/sd.c | 70 + include/linux/ata.h | 15 3 files changed, 102 insertions(+), 15 deletions(-) diff --git a/block/ioctl.c b/block/ioctl.c index 97f45f5..c853c6f 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -245,6 +245,9 @@ static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode, } } opt = zone_iodata->data.in.report_option & 0x7F; + if (zone_iodata->data.in.report_option & ZOPT_USE_ATA_PASS) + op_flags |= REQ_META; + error = blkdev_issue_zone_report(bdev, op_flags, zone_iodata->data.in.zone_locator_lba, opt, pgs ? pgs : virt_to_page(zone_iodata), @@ -281,6 +284,35 @@ static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode, return -EFAULT; } + /* +* When the low bit is set, force ATA pass-through to try to work around +* older SAS HBA controllers that don't support ZBC to ZAC translation. +* +* When the low bit is clear, follow the normal path but also correct +* for ~0ul LBA meaning 'for all lbas'. +* +* NB: We should do extra checking here to see if the user specified +* the entire block device as opposed to a partition of the +* device +*/ + if (arg & 1) { + op_flags |= REQ_META; + if (arg != ~0ul) + arg &= ~1ul; /* ~1 :: 0xFF...FE */ + } else { + if (arg == ~1ul) + arg = ~0ul; + } + + /* +* When acting on zones we explicitly disallow using a partition. 
+*/ + if (bdev != bdev->bd_contains) { + pr_err("%s: All zone operations disallowed on this device\n", + __func__); + return -EFAULT; + } + switch (cmd) { case BLKOPENZONE: op_flags |= REQ_OPEN_ZONE; diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 06b54d5..cf96f01 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -1182,12 +1183,29 @@ static int sd_setup_zoned_cmnd(struct scsi_cmnd *cmd) cmd->cmd_len = 16; memset(cmd->cmnd, 0, cmd->cmd_len); - cmd->cmnd[0] = ZBC_IN; - cmd->cmnd[1] = ZI_REPORT_ZONES; - put_unaligned_be64(sector, &cmd->cmnd[2]); - put_unaligned_be32(nr_bytes, &cmd->cmnd[10]); - /* FUTURE ... when streamid is available */ - /* cmd->cmnd[14] = bio_get_streamid(bio); */ + if (rq->cmd_flags & REQ_META) { + cmd->cmnd[0] = ATA_16; + cmd->cmnd[1] = (0x6 << 1) | 1; + cmd->cmnd[2] = 0x0e; + /* FUTURE ... when streamid is available */ + /* cmd->cmnd[3] = bio_get_streamid(bio); */ + cmd->cmnd[4] = ATA_SUBCMD_ZAC_MGMT_IN_REPORT_ZONES; + cmd->cmnd[5] = ((nr_bytes / 512) >> 8) & 0xff; + cmd->cmnd[6] = (nr_bytes / 512) & 0xff; + + _lba_to_cmd_ata(&cmd->cmnd[7], sector); + + cmd->cmnd[13] = 1 << 6; + cmd->cmnd[14] = ATA_CMD_ZAC_MGMT_IN; + } else { + cmd->cmnd[0] = ZBC_IN; + cmd->cmnd[1] = ZI_REPORT_ZONES; + put_unaligned_be64(sector, &cmd->cmnd[2]); + put_unaligned_be32(nr_bytes, &cmd->cmnd[10]); + /* FUTURE ... when streamid is available */ + /* cmd->cmnd[14] = bio_get_streamid(bio); */ + } + cmd->sc_data_direction = DMA_FROM_DEVICE; cmd->sdb.length = nr_bytes; cmd->transfersize = sdp->sector_size; @@ -1208,14 +1226,29 @@ static int sd_setup_zoned_cmnd(struct scsi_cmnd *cmd) cmd->cmd_len = 16; memset(cmd->cmnd, 0, cmd->cmd_len); memset(&cmd->sdb, 0, sizeof(cmd->sdb)); - cmd->cmnd[0] = ZBC_OUT; - cmd->cmnd[1] = ZO_OPEN_ZONE; -
[PATCH v2 2/4] Add bio/request flags for using ZBC/ZAC commands
T10 ZBC and T13 ZAC specify operations for Zoned devices. To be able to access the zone information and to open and close zones, flags for the report zones command (REQ_REPORT_ZONES) and for Open and Close zone (REQ_OPEN_ZONE and REQ_CLOSE_ZONE) are added for use by struct bio's bi_rw and by struct request's cmd_flags. To reduce the number of additional flags needed, REQ_RESET_ZONE shares the same flag as REQ_REPORT_ZONES and is differentiated by direction. Report zones is a device read that requires a buffer. Reset is a device command (WRITE) that has no associated data transfer. The Finish zone command is intentionally not implemented as there is no current use case for that operation. Report zones currently defaults to reporting on all zones. It is expected that support for the zone option flag will piggy back on streamid support. The report option is useful as it can reduce the number of zones in each report, but it is not critical. Signed-off-by: Shaun Tancheff--- V2: - Changed bi_rw to op_flags to clarify separation of bio op from flags. - Fixed memory leak in blkdev_issue_zone_report failing to bio_put(). - Documented opt in blkdev_issue_zone_report. - Removed include/uapi/linux/fs.h from this patch. 
MAINTAINERS | 9 ++ block/blk-lib.c | 97 + drivers/scsi/sd.c | 99 +- drivers/scsi/sd.h | 1 + include/linux/blk_types.h | 19 +++- include/linux/blkzoned_api.h | 25 + include/uapi/linux/Kbuild | 1 + include/uapi/linux/blkzoned_api.h | 215 ++ 8 files changed, 461 insertions(+), 5 deletions(-) create mode 100644 include/linux/blkzoned_api.h create mode 100644 include/uapi/linux/blkzoned_api.h diff --git a/MAINTAINERS b/MAINTAINERS index ed42cb6..d9fafa2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12662,6 +12662,15 @@ F: Documentation/networking/z8530drv.txt F: drivers/net/hamradio/*scc.c F: drivers/net/hamradio/z8530.h +ZBC AND ZBC BLOCK DEVICES +M: Shaun Tancheff +W: http://seagate.com +W: https://github.com/Seagate/ZDM-Device-Mapper +L: linux-bl...@vger.kernel.org +S: Maintained +F: include/linux/blkzoned_api.h +F: include/uapi/linux/blkzoned_api.h + ZBUD COMPRESSED PAGE ALLOCATOR M: Seth Jennings L: linux...@kvack.org diff --git a/block/blk-lib.c b/block/blk-lib.c index 23d7f30..ce4168a 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -6,6 +6,7 @@ #include #include #include +#include #include "blk.h" @@ -249,3 +250,99 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); } EXPORT_SYMBOL(blkdev_issue_zeroout); + +/** + * blkdev_issue_zone_report - queue a report zones operation + * @bdev: target blockdev + * @op_flags: extra bio rw flags. If unsure, use 0. + * @sector:starting sector (report will include this sector). + * @opt: See: zone_report_option, default is 0 (all zones). + * @page: one or more contiguous pages. + * @pgsz: up to size of page in bytes, size of report. + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + *Issue a zone report request for the sectors in question. 
+ */ +int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags, +sector_t sector, u8 opt, struct page *page, +size_t pgsz, gfp_t gfp_mask) +{ + struct bdev_zone_report *conv = page_address(page); + struct bio *bio; + unsigned int nr_iovecs = 1; + int ret = 0; + + if (pgsz < (sizeof(struct bdev_zone_report) + + sizeof(struct bdev_zone_descriptor))) + return -EINVAL; + + bio = bio_alloc(gfp_mask, nr_iovecs); + if (!bio) + return -ENOMEM; + + conv->descriptor_count = 0; + bio->bi_iter.bi_sector = sector; + bio->bi_bdev = bdev; + bio->bi_vcnt = 0; + bio->bi_iter.bi_size = 0; + + op_flags |= REQ_REPORT_ZONES; + + /* FUTURE ... when streamid is available: */ + /* bio_set_streamid(bio, opt); */ + + bio_add_page(bio, page, pgsz, 0); + ret = submit_bio_wait(READ | op_flags, bio); + + /* +* When our request is nak'd, the underlying device may be conventional +* so ... report a single conventional zone the size of the device. +*/ + if (ret == -EIO && conv->descriptor_count) { + /* Adjust the conventional to the size of the partition ... */ + __be64 blksz = cpu_to_be64(bdev->bd_part->nr_sects); + + conv->maximum_lba = blksz; + conv->descriptors[0].type = ZTYP_CONVENTIONAL; + conv->descriptors[0].flags = ZCOND_CONVENTIONAL << 4; +
[PATCH v2 0/4] Block layer support ZAC/ZBC commands
As Host Aware drives are becoming available we would like to be able to make use of such drives. This series is also intended to be suitable for use by Host Managed drives. ZAC/ZBC drives add new commands for discovering and working with Zones. This extends the ZAC/ZBC support up to the block layer. The first patch in the series is a place-holder for Mike Christie's separate operations from flags ... https://lkml.kernel.org/r/1465155145-10812-1-git-send-email-mchri...@redhat.com Once that work is completed the first patch can be dropped. Patches for util-linux can be found here: https://github.com/Seagate/ZDM-Device-Mapper/tree/master/patches/util-linux Using BIOs to issue ZBC commands allows DM targets (such as ZDM) or file-systems such as btrfs or nilfs2 to extend their block allocation schemes and issue discards that are zone aware. A perhaps non-obvious approach is that a conventional drive will return a descriptor with a single large conventional zone. This patch is also at https://github.com/stancheff/linux.git g...@github.com:stancheff/linux.git branch: v4.7-rc2+bio.zbc.v2 V2: - Changed bi_rw to op_flags to clarify separation of bio op from flags. - Fixed memory leak in blkdev_issue_zone_report failing to bio_put(). - Documented opt in blkdev_issue_zone_report. - Moved include/uapi/linux/fs.h changes to patch 3 - Fixed commit message for first patch in series. Shaun Tancheff (4): Losing bits on request.cmd_flags Add bio/request flags for using ZBC/ZAC commands Add ioctl to issue ZBC/ZAC commands via block layer Add ata pass-through path for ZAC commands. 
MAINTAINERS | 9 ++ block/blk-core.c | 17 +-- block/blk-lib.c | 97 + block/blk-merge.c | 2 +- block/blk-mq.c| 2 +- block/cfq-iosched.c | 2 +- block/elevator.c | 4 +- block/ioctl.c | 142 drivers/scsi/sd.c | 141 +++- drivers/scsi/sd.h | 1 + include/linux/ata.h | 15 +++ include/linux/blk_types.h | 19 +++- include/linux/blkzoned_api.h | 25 + include/linux/elevator.h | 4 +- include/uapi/linux/Kbuild | 1 + include/uapi/linux/blkzoned_api.h | 221 ++ include/uapi/linux/fs.h | 1 + 17 files changed, 683 insertions(+), 20 deletions(-) create mode 100644 include/linux/blkzoned_api.h create mode 100644 include/uapi/linux/blkzoned_api.h -- 2.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/4] Losing bits on request.cmd_flags
In a few places a temporary value smaller than cmd_flags is used to test for bits and/or build up a new cmd_flags. Change these to use explicit u64 values where appropriate. This patch is a place holder for: Mike Christie's separate operations ... series https://lkml.kernel.org/r/1465155145-10812-1-git-send-email-mchri...@redhat.com Once Mike's patches are stabilized this patch can be dropped. Signed-off-by: Shaun Tancheff--- v2: - Fixed commit message block/blk-core.c | 17 ++--- block/blk-merge.c| 2 +- block/blk-mq.c | 2 +- block/cfq-iosched.c | 2 +- block/elevator.c | 4 ++-- include/linux/elevator.h | 4 ++-- 6 files changed, 17 insertions(+), 14 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 2475b1c7..945e564 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -959,7 +959,7 @@ static void __freed_request(struct request_list *rl, int sync) * A request has just been released. Account for it, update the full and * congestion status, wake up any waiters. Called under q->queue_lock. */ -static void freed_request(struct request_list *rl, unsigned int flags) +static void freed_request(struct request_list *rl, u64 flags) { struct request_queue *q = rl->q; int sync = rw_is_sync(flags); @@ -1054,7 +1054,7 @@ static struct io_context *rq_ioc(struct bio *bio) /** * __get_request - get a free request * @rl: request list to allocate from - * @rw_flags: RW and SYNC flags + * @rw: RW and SYNC flags * @bio: bio to allocate request for (can be %NULL) * @gfp_mask: allocation mask * @@ -1065,7 +1065,7 @@ static struct io_context *rq_ioc(struct bio *bio) * Returns ERR_PTR on failure, with @q->queue_lock held. * Returns request pointer on success, with @q->queue_lock *not held*. 
*/ -static struct request *__get_request(struct request_list *rl, int rw_flags, +static struct request *__get_request(struct request_list *rl, unsigned long rw, struct bio *bio, gfp_t gfp_mask) { struct request_queue *q = rl->q; @@ -1073,6 +1073,7 @@ static struct request *__get_request(struct request_list *rl, int rw_flags, struct elevator_type *et = q->elevator->type; struct io_context *ioc = rq_ioc(bio); struct io_cq *icq = NULL; + u64 rw_flags = rw; const bool is_sync = rw_is_sync(rw_flags) != 0; int may_queue; @@ -1237,7 +1238,8 @@ rq_starved: * Returns ERR_PTR on failure, with @q->queue_lock held. * Returns request pointer on success, with @q->queue_lock *not held*. */ -static struct request *get_request(struct request_queue *q, int rw_flags, +static struct request *get_request(struct request_queue *q, + unsigned long rw_flags, struct bio *bio, gfp_t gfp_mask) { const bool is_sync = rw_is_sync(rw_flags) != 0; @@ -1490,7 +1492,7 @@ void __blk_put_request(struct request_queue *q, struct request *req) * it didn't come out of our reserved rq pools */ if (req->cmd_flags & REQ_ALLOCED) { - unsigned int flags = req->cmd_flags; + u64 flags = req->cmd_flags; struct request_list *rl = blk_rq_rl(req); BUG_ON(!list_empty(>queuelist)); @@ -1712,7 +1714,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio) { const bool sync = !!(bio->bi_rw & REQ_SYNC); struct blk_plug *plug; - int el_ret, rw_flags, where = ELEVATOR_INSERT_SORT; + u64 rw_flags; + int el_ret, where = ELEVATOR_INSERT_SORT; struct request *req; unsigned int request_count = 0; @@ -2246,7 +2249,7 @@ EXPORT_SYMBOL_GPL(blk_insert_cloned_request); */ unsigned int blk_rq_err_bytes(const struct request *rq) { - unsigned int ff = rq->cmd_flags & REQ_FAILFAST_MASK; + u64 ff = rq->cmd_flags & REQ_FAILFAST_MASK; unsigned int bytes = 0; struct bio *bio; diff --git a/block/blk-merge.c b/block/blk-merge.c index 2613531..fec37e1 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -604,7 +604,7 @@ 
static int ll_merge_requests_fn(struct request_queue *q, struct request *req, */ void blk_rq_set_mixed_merge(struct request *rq) { - unsigned int ff = rq->cmd_flags & REQ_FAILFAST_MASK; + u64 ff = rq->cmd_flags & REQ_FAILFAST_MASK; struct bio *bio; if (rq->cmd_flags & REQ_MIXED_MERGE) diff --git a/block/blk-mq.c b/block/blk-mq.c index 29cbc1b..db962bc 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -159,7 +159,7 @@ bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx) EXPORT_SYMBOL(blk_mq_can_queue); static void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx, - struct request *rq, unsigned int rw_flags) + struct request *rq, u64 rw_flags) { if
Re: [PATCH] scsi: ufs: fix potential memory leak
On Wed, 2016-06-08 at 02:00 +0800, Tiezhu Yang wrote: > There exists potential memory leak in ufshcd_parse_clock_info(), > this patch fixes it. What makes you think there's a leak here? These are all devm_ allocations, so they're all freed when the device is. If an error is returned, the device is released and the memory freed. You can argue that on successful initialization, there's no need to keep the clkfreq array but this patch isn't the way you'd change that. James > Signed-off-by: Tiezhu Yang> --- > drivers/scsi/ufs/ufshcd-pltfrm.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c > b/drivers/scsi/ufs/ufshcd-pltfrm.c > index db53f38..8b057f4 100644 > --- a/drivers/scsi/ufs/ufshcd-pltfrm.c > +++ b/drivers/scsi/ufs/ufshcd-pltfrm.c > @@ -100,19 +100,19 @@ static int ufshcd_parse_clock_info(struct > ufs_hba *hba) > if (ret && (ret != -EINVAL)) { > dev_err(dev, "%s: error reading array %d\n", > "freq-table-hz", ret); > - return ret; > + goto out_free; > } > > for (i = 0; i < sz; i += 2) { > ret = of_property_read_string_index(np, > "clock-names", i/2, (const char > **)); > if (ret) > - goto out; > + goto out_free; > > clki = devm_kzalloc(dev, sizeof(*clki), GFP_KERNEL); > if (!clki) { > ret = -ENOMEM; > - goto out; > + goto out_free; > } > > clki->min_freq = clkfreq[i]; > @@ -122,6 +122,10 @@ static int ufshcd_parse_clock_info(struct > ufs_hba *hba) > clki->min_freq, clki->max_freq, clki > ->name); > list_add_tail(>list, >clk_list_head); > } > + > +out_free: > + devm_kfree(dev, clkfreq); > + clkfreq = NULL; > out: > return ret; > } -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] scsi: ufs: fix potential memory leak
There exists a potential memory leak in ufshcd_parse_clock_info(), this patch fixes it. Signed-off-by: Tiezhu Yang--- drivers/scsi/ufs/ufshcd-pltfrm.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c b/drivers/scsi/ufs/ufshcd-pltfrm.c index db53f38..8b057f4 100644 --- a/drivers/scsi/ufs/ufshcd-pltfrm.c +++ b/drivers/scsi/ufs/ufshcd-pltfrm.c @@ -100,19 +100,19 @@ static int ufshcd_parse_clock_info(struct ufs_hba *hba) if (ret && (ret != -EINVAL)) { dev_err(dev, "%s: error reading array %d\n", "freq-table-hz", ret); - return ret; + goto out_free; } for (i = 0; i < sz; i += 2) { ret = of_property_read_string_index(np, "clock-names", i/2, (const char **)); if (ret) - goto out; + goto out_free; clki = devm_kzalloc(dev, sizeof(*clki), GFP_KERNEL); if (!clki) { ret = -ENOMEM; - goto out; + goto out_free; } clki->min_freq = clkfreq[i]; @@ -122,6 +122,10 @@ static int ufshcd_parse_clock_info(struct ufs_hba *hba) clki->min_freq, clki->max_freq, clki->name); list_add_tail(&clki->list, &hba->clk_list_head); } + +out_free: + devm_kfree(dev, clkfreq); + clkfreq = NULL; out: return ret; } -- 1.8.3.1
Re: Partitions not detected with firewire - 36 bytes offset - PL-3507
Adding Cc: lsml, quoting your message in full. On Jun 02 Fab Stz wrote at linux1394-user: > Hello > > I have an external hard drive box (brand SQP : www.sqp.fr, more details > below) with USB & Firewire support. The chipset is a PL-3507. > > On Mac OS X or Windows (e.g. win10), if I connect through USB or FW, my > partitions are displayed and the disk is well detected. In Linux (kernel > 4.5.3) with USB, disk and partitions are well detected. However when > connected through firewire, the disk is well detected but not the > partitions. > > By investigating the issue I came to that thread > [ https://sourceforge.net/p/linux1394/mailman/message/13218958/ ] which > I read carefully. I also have an offset when running dd on the disk. > There are 36 additional bytes (value : 0x00) at the beginning of the disk > when connected through FW compared to USB. That may explain the fact > that the partitions are not found. BTW, it is a GUID Partition Table in > my case. > > I checked the firmware, it is dated 2004/Oct/06. According to the > release notes of the firmware, there was a fix in firmware 2004/Mar/24 > labeled " Fixed cannot mount HDD in Linux RedHat. Must use kernel > v2.4.20 and above. [1394] ". So I supposed upgrading wouldn't bring > anything. Nevertheless, I tried upgrading the firmware to a more recent > one [2004/Nov/09 or even 2006/Apr/20], but the upgrade was rejected. The > above thread mentioned that some boards have a zero resistor preventing > upgrade. I searched for the zero resistance R72 > (http://club.cdfreaks.com/showthread.php?p=957178), but on my board > there isn't any. So I don't see how to upgrade the firmware. > > My questions are : > - could it be a bug in the linux driver ? > - is it that my device is not well detected ? Or is there any workaround > in the kernel for buggy devices which doesn't match my particular > device ? 
Its vendor/device ids when connected as USB are idVendor=0025, > idProduct=3507 (usually for Prolific it is idVendor=067b, idProduct=3507) > - do you have any other clue to make it work, or upgrade the firmware, > or determine which zero resistor has to be removed ? > - is there some way to tell linux to start reading the disk with an > offset of 36 bytes or ignore these 36 bytes? > > You can find below all technical details. > > Thank you First a few theoretical thoughts on why the device works under Windows and OS X but not under Linux: The first SCSI request which the Linux kernel issues contains the INQUIRY command. You are seeing results from INQUIRY in the dmesg portion below. There is for example a quirk parameter which controls how the kernel reads the INQUIRY response, but it seems to me that this is not relevant here. (BLIST_INQUIRY_36 from https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/scsi/scsi_devinfo.h) Next you are also seeing results from a command which the kernel sent subsequently, MODE SENSE. MODE SENSE can request various so-called mode pages. Most of them are optional, and some buggy firmwares react badly when asked for a mode page that they don't know about. There are some quirk flags which relate to that sort of device bugs (BLIST_MS_SKIP_PAGE_08, BLIST_MS_SKIP_PAGE_3F, BLIST_USE_10_BYTE_MS, BLIST_MS_192_BYTES_FOR_3F) but I doubt that you need any of those. Next, the sd driver should issue READ_CAPACITY to determine medium size (or maybe this happened already before MODE SENSE, I would have to look that up in drivers/scsi/sd.c). Then the block layer(?) should issue a READ command of some sort in order to fetch the partition table. If the Linux kernel performs anything of the above in even the tiniest way differently than Windows does, a buggy firmware like the typical firmwares of the FireWire portion of the PL-3507 will start sending malformed results or flat out freeze. 
AFAIR, after this series of commands which the kernel issues, the subsequent commands come from userland. And again, if Linux userland sends commands which the Windows kernel or Windows userland do not send (or do not send with such and such parameters or in this or that particular order of sequence), PL-3507 firmwares will take offense and go belly-up. One example would be that userland sends another INQUIRY request, which would be perfectly standards-conforming but is highly dangerous for the PL-3507. The 36 bytes offset which you observed happens to be the same number of bytes as the length of a typical INQUIRY response buffer. Coincidence? So, - the SCSI layer and block layer of the Linux kernel could issue commands which the PL-3507's firmware cannot handle, but this can be influenced to some degree by specifying quirk flags to SCSI core. (I don't remember right now how to do that precisely.) - Or Linux userland could issue commands which crash the firmware. To investigate the latter possibility, you could boot into a minimal single-user command line and try to use the disk from there. Consult your distribution's
Re: [PATCH v3 0/6] Introduce pci_(request|release)_(mem|io)_regions
The whole series looks fine to me: Reviewed-by: Christoph Hellwig
Re: [PATCH v2] scsi_debug: fix sleep in invalid context
> +static int p_fill_from_dev_buffer(struct scsi_cmnd *scp, const void *arr, > + int arr_len, unsigned int off_dst) > +{ + int act_len, n; > + struct scsi_data_buffer *sdb = scsi_in(scp); > + off_t skip = off_dst; Why off_t, which is a signed value, instead of the unsigned int passed in? > +#define RL_BUCKET_ELEMS 8 > + > /* Even though each pseudo target has a REPORT LUNS "well known logical unit" > * (W-LUN), the normal Linux scanning logic does not associate it with a > * device (e.g. /dev/sg7). The following magic will make that association: > @@ -3285,12 +3315,14 @@ static int resp_report_luns(struct scsi_cmnd *scp, > unsigned char select_report; > u64 lun; > struct scsi_lun *lun_p; > - u8 *arr; > + u8 arr[RL_BUCKET_ELEMS * sizeof(struct scsi_lun)]; just use an on-stack array of type struct scsi_lun here, e.g.: struct scsi_lun arr[RL_BUCKET_ELEMS]; Which you can then use directly instead of lun_p later, but which can also be passed to p_fill_from_dev_buffer as that takes a void pointer.
Re: NVMe over Fabrics target implementation
There is absolutely no point in dragging in an overcomplicated configfs structure for a very simple protocol which also is very different from SCSI in its nitty-gritty details. Keeping the nvme target self-contained allows it to be both much simpler and much easier to understand, as well as much better testable - see the amount of test coverage we could easily add for example. Or to put it the other way around - if there was any major synergy in reusing the SCSI target code, that just shows we're missing functionality in the block layer or configfs. The only thing where this is the case mid-term is persistent reservations, but Mike Christie already has a plan for a pluggable PR API, which I'm very interested in.
[PATCH v3 0/6] Introduce pci_(request|release)_(mem|io)_regions
The first patch in this series introduces the following 4 helper functions to the PCI core: * pci_request_mem_regions() * pci_request_io_regions() * pci_release_mem_regions() * pci_release_io_regions() which encapsulate the request and release of a PCI device's memory or I/O bars. The subsequent patches convert the drivers, which use the pci_request_selected_regions(pdev, pci_select_bars(pdev, IORESOURCE_MEM), name); and similar pattern to use the new interface. This was suggested by Christoph Hellwig in http://lists.infradead.org/pipermail/linux-nvme/2016-May/004570.html and tested on kernel v4.6 with NVMe. The conversion of the drivers has been performed by the following coccinelle spatch: // IORESOURCE_MEM @@ expression err, pdev, name; @@ - err = pci_request_selected_regions(pdev, pci_select_bars(pdev, IORESOURCE_MEM), name); + err = pci_request_mem_regions(pdev, name); @@ expression pdev; @@ - pci_release_selected_regions(pdev, pci_select_bars(pdev, IORESOURCE_MEM)); + pci_release_mem_regions(pdev); @@ expression err, pdev, name; identifier bars; @@ - bars = pci_select_bars(pdev, IORESOURCE_MEM); ... - err = pci_request_selected_regions(pdev, bars, name); + err = pci_request_mem_regions(pdev, name); @@ expression pdev; identifier bars; @@ - bars = pci_select_bars(pdev, IORESOURCE_MEM); ... - pci_release_selected_regions(pdev, bars); + pci_release_mem_regions(pdev); // IORESOURCE_IO @@ expression err, pdev, name; @@ - err = pci_request_selected_regions(pdev, pci_select_bars(pdev, IORESOURCE_IO), name); + err = pci_request_io_regions(pdev, name); @@ expression pdev; @@ - pci_release_selected_regions(pdev, pci_select_bars(pdev, IORESOURCE_IO)); + pci_release_io_regions(pdev); @@ expression err, pdev, name; identifier bars; @@ - bars = pci_select_bars(pdev, IORESOURCE_IO); ... - err = pci_request_selected_regions(pdev, bars, name); + err = pci_request_io_regions(pdev, name); @@ expression pdev; identifier bars; @@ - bars = pci_select_bars(pdev, IORESOURCE_IO); ... 
- pci_release_selected_regions(pdev, bars); + pci_release_io_regions(pdev); Changes since v2: * Fixed compilation error on platforms with CONFIG_PCI=n * Added Jeff's Acked-by on the Intel ethernet patch * Added Dick's Acked-by on the lpfc patch Changes since v1: * Fixed indentation in pci.h patch to not cross the 80 chars boundary. * Split Ethernet patches into two, one for Atheros and one for Intel drivers. * Correctly named lpfc patch. * Converted init-path of lpfc driver as well. * Added Reviewed-by tags where appropriate. Johannes Thumshirn (6): PCI: Add helpers to request/release memory and I/O regions NVMe: Use pci_(request|release)_mem_regions lpfc: Use pci_(request|release)_mem_regions GenWQE: Use pci_(request|release)_mem_regions ethernet/intel: Use pci_(request|release)_mem_regions alx: Use pci_(request|release)_mem_regions drivers/misc/genwqe/card_base.c | 13 + drivers/net/ethernet/atheros/alx/main.c | 12 +--- drivers/net/ethernet/intel/e1000e/netdev.c| 6 ++ drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 11 +++ drivers/net/ethernet/intel/i40e/i40e_main.c | 9 +++-- drivers/net/ethernet/intel/igb/igb_main.c | 10 +++--- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 +++-- drivers/nvme/host/pci.c | 10 +++--- drivers/scsi/lpfc/lpfc_init.c | 15 -- include/linux/pci.h | 28 +++ 10 files changed, 59 insertions(+), 64 deletions(-) Cc: Christoph HellwigCc: Keith Busch Cc: Jens Axboe Cc: linux-n...@lists.infradead.org Cc: James Smart Cc: Dick Kennedy Cc: "James E.J. Bottomley" Cc: "Martin K. Petersen" Cc: linux-scsi@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Frank Haverkamp Cc: Greg Kroah-Hartman Cc: Jay Cliburn Cc: Chris Snook Cc: Jeff Kirsher Cc: David S. 
Miller Cc: net...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: intel-wired-...@lists.osuosl.org -- 1.8.5.6
[PATCH v3 3/6] lpfc: Use pci_(request|release)_mem_regions
Now that we do have pci_request_mem_regions() and pci_release_mem_regions() at hand, use it in the lpfc driver. Suggested-by: Christoph HellwigSigned-off-by: Johannes Thumshirn Acked-by: Dick Kennedy Cc: James Smart Cc: Dick Kennedy Cc: "James E.J. Bottomley" Cc: "Martin K. Petersen" Cc: linux-scsi@vger.kernel.org Cc: linux-ker...@vger.kernel.org --- drivers/scsi/lpfc/lpfc_init.c | 15 --- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index f57d02c..a8735f7 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -4775,20 +4775,17 @@ static int lpfc_enable_pci_dev(struct lpfc_hba *phba) { struct pci_dev *pdev; - int bars = 0; /* Obtain PCI device reference */ if (!phba->pcidev) goto out_error; else pdev = phba->pcidev; - /* Select PCI BARs */ - bars = pci_select_bars(pdev, IORESOURCE_MEM); /* Enable PCI device */ if (pci_enable_device_mem(pdev)) goto out_error; /* Request PCI resource for the device */ - if (pci_request_selected_regions(pdev, bars, LPFC_DRIVER_NAME)) + if (pci_request_mem_regions(pdev, LPFC_DRIVER_NAME)) goto out_disable_device; /* Set up device as PCI master and save state for EEH */ pci_set_master(pdev); @@ -4805,7 +4802,7 @@ out_disable_device: pci_disable_device(pdev); out_error: lpfc_printf_log(phba, KERN_ERR, LOG_INIT, - "1401 Failed to enable pci device, bars:x%x\n", bars); + "1401 Failed to enable pci device\n"); return -ENODEV; } @@ -4820,17 +4817,14 @@ static void lpfc_disable_pci_dev(struct lpfc_hba *phba) { struct pci_dev *pdev; - int bars; /* Obtain PCI device reference */ if (!phba->pcidev) return; else pdev = phba->pcidev; - /* Select PCI BARs */ - bars = pci_select_bars(pdev, IORESOURCE_MEM); /* Release PCI resource and disable PCI device */ - pci_release_selected_regions(pdev, bars); + pci_release_mem_regions(pdev); pci_disable_device(pdev); return; @@ -9705,7 +9699,6 @@ lpfc_pci_remove_one_s3(struct pci_dev *pdev) struct 
lpfc_vport **vports; struct lpfc_hba *phba = vport->phba; int i; - int bars = pci_select_bars(pdev, IORESOURCE_MEM); spin_lock_irq(&phba->hbalock); vport->load_flag |= FC_UNLOADING; @@ -9780,7 +9773,7 @@ lpfc_pci_remove_one_s3(struct pci_dev *pdev) lpfc_hba_free(phba); - pci_release_selected_regions(pdev, bars); + pci_release_mem_regions(pdev); pci_disable_device(pdev); } -- 1.8.5.6
[PATCH v4 2/2] Documentation/scsi: update scsi_eh.txt about ->host_failed
Update the new concurrency rules of ->host_failed. Signed-off-by: Wei Fang --- Documentation/scsi/scsi_eh.txt | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt index 8638f61..37eca00 100644 --- a/Documentation/scsi/scsi_eh.txt +++ b/Documentation/scsi/scsi_eh.txt @@ -263,19 +263,23 @@ scmd->allowed. 3. scmd recovered ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd - - shost->host_failed-- - clear scmd->eh_eflags - scsi_setup_cmd_retry() - move from local eh_work_q to local eh_done_q LOCKING: none +CONCURRENCY: at most one thread per separate eh_work_q to +keep queue manipulation lockless 4. EH completes ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper - layer of failure. + layer of failure. May be called concurrently but must have + no more than one thread per separate eh_work_q to + manipulate the queue locklessly - scmd is removed from eh_done_q and scmd->eh_entry is cleared - if retry is necessary, scmd is requeued using scsi_queue_insert() - otherwise, scsi_finish_command() is invoked for scmd + - zero shost->host_failed LOCKING: queue or finish function performs appropriate locking -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 1/2] scsi: fix race between simultaneous decrements of ->host_failed
sas_ata_strategy_handler() adds the works of the ata error handler to system_unbound_wq. This workqueue runs work items asynchronously, so the ata error handler will be performed concurrently on different CPUs. In this case, ->host_failed will be decremented simultaneously in scsi_eh_finish_cmd() on different CPUs and become inconsistent. This leaves ->host_failed and ->host_busy permanently unequal, so the scsi error handler thread won't start running, and IO errors after that won't be handled. Since all scmds must have been handled in the strategy handler, just remove the decrement in scsi_eh_finish_cmd() and zero ->host_failed after the strategy handler to fix this race. This fixes the problem introduced in commit 50824d6c5657 ("[SCSI] libsas: async ata-eh"). Reviewed-by: James Bottomley Signed-off-by: Wei Fang --- Changes v1->v2: - update Documentation/scsi/scsi_eh.txt about ->host_failed Changes v2->v3: - don't use atomic type, just zero ->host_failed after the strategy handler Changes v3->v4: - add new concurrency rules of ->host_failed in scsi_eh.txt drivers/ata/libata-eh.c | 2 +- drivers/scsi/scsi_error.c | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 961acc7..91a9e6a 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -606,7 +606,7 @@ void ata_scsi_error(struct Scsi_Host *host) ata_scsi_port_error_handler(host, ap); /* finish or retry handled scmd's and clean up */ - WARN_ON(host->host_failed || !list_empty(&eh_work_q)); + WARN_ON(!list_empty(&eh_work_q)); DPRINTK("EXIT\n"); } diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 984ddcb..1b9c049 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1127,7 +1127,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn) */ void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q) { - scmd->device->host->host_failed--; scmd->eh_eflags = 0;
list_move_tail(&scmd->eh_entry, done_q); } @@ -2226,6 +2225,9 @@ int scsi_error_handler(void *data) else scsi_unjam_host(shost); + /* All scmds have been handled */ + shost->host_failed = 0; + /* * Note - if the above fails completely, the action is to take * individual devices offline and flush the queue of any -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 0/8] nvmet: Add support for multi-tenant configfs
From: Nicholas Bellinger

Hi folks, Here's the first pass of an nvmet multi-tenant configfs layout, following what we've learned in target_core_fabric_configfs.c wrt independent operation of storage endpoints. Here is how the running RFC-v1 code currently looks:

/sys/kernel/config/nvmet/subsystems/
└── nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep
    ├── namespaces
    │   └── 1
    │       └── ramdisk0 -> ../../../../../target/core/rd_mcp_1/ramdisk0
    └── ports
        └── loop
            ├── addr_adrfam
            ├── addr_portid
            ├── addr_traddr
            ├── addr_treq
            ├── addr_trsvcid
            ├── addr_trtype
            └── enable

Namely, it allows existing /sys/kernel/config/target/core/ backends to be configfs symlinked into ../nvmet/subsystems/$SUBSYS_NQN/ as nvme namespaces. The series exposes T10-PI from /sys/kernel/config/target/core/ as ID_NS.ms + ID_NS.dps feature bits, and enables block integrity support with the nvme/loop driver. Note this series depends upon the following prerequisites of target-core: http://marc.info/?l=linux-scsi&m=146527281416606&w=2 and of course, today's earlier release of nvmet + friends: http://lists.infradead.org/pipermail/linux-nvme/2016-June/004754.html Note the full set of patches is available from: https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/log/?h=nvmet-configfs-ng Comments..?
--nab Nicholas Bellinger (8): nvmet: Add nvmet_fabric_ops get/put transport helpers nvmet: Add support for configfs-ng multi-tenant logic nvmet: Hookup nvmet_ns->dev to nvmet_ns_enable nvmet/io-cmd: Hookup sbc_ops->execute_rw backend ops nvmet/io-cmd: Hookup sbc_ops->execute_sync_cache backend ops nvmet/io-cmd: Hookup sbc_ops->execute_unmap backend ops nvmet/admin-cmd: Hookup T10-PI to ID_NS.ms + ID_NS.dps feature bits nvme/loop: Add support for bio integrity handling drivers/nvme/target/Makefile | 2 +- drivers/nvme/target/admin-cmd.c | 17 ++ drivers/nvme/target/configfs-ng.c | 585 ++ drivers/nvme/target/configfs.c| 5 +- drivers/nvme/target/core.c| 83 +++--- drivers/nvme/target/io-cmd.c | 169 ++- drivers/nvme/target/loop.c| 19 ++ drivers/nvme/target/nvmet.h | 27 +- 8 files changed, 799 insertions(+), 108 deletions(-) create mode 100644 drivers/nvme/target/configfs-ng.c -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 2/8] nvmet: Add support for configfs-ng multi-tenant logic
From: Nicholas BellingerThis patch introduces support for configfs-ng, that allows for multi-tenant /sys/kernel/config/nvmet/subsystems/$SUBSYS_NQN/ operation, using existing /sys/kernel/config/target/core/ backends from target-core to be configfs symlinked as per nvme-target subsystem NQN namespaces. Here's how the layout looks: /sys/kernel/config/nvmet/ └── subsystems └── nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep ├── namespaces │ └── 1 │ └── ramdisk0 -> ../../../../../target/core/rd_mcp_1/ramdisk0 └── ports └── loop ├── addr_adrfam ├── addr_portid ├── addr_traddr ├── addr_treq ├── addr_trsvcid ├── addr_trtype └── enable Also convert nvmet_find_get_subsys to port->nf_subsys, and do the same for nvmet_host_discovery_allowed. Cc: Jens Axboe Cc: Christoph Hellwig Cc: Martin Petersen Cc: Sagi Grimberg Cc: Hannes Reinecke Cc: Mike Christie Signed-off-by: Nicholas Bellinger --- drivers/nvme/target/Makefile | 2 +- drivers/nvme/target/configfs-ng.c | 586 ++ drivers/nvme/target/configfs.c| 5 +- drivers/nvme/target/core.c| 22 +- drivers/nvme/target/nvmet.h | 11 + 5 files changed, 608 insertions(+), 18 deletions(-) create mode 100644 drivers/nvme/target/configfs-ng.c diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile index b7a0623..2799e07 100644 --- a/drivers/nvme/target/Makefile +++ b/drivers/nvme/target/Makefile @@ -3,7 +3,7 @@ obj-$(CONFIG_NVME_TARGET) += nvmet.o obj-$(CONFIG_NVME_TARGET_LOOP) += nvme-loop.o obj-$(CONFIG_NVME_TARGET_RDMA) += nvmet-rdma.o -nvmet-y+= core.o configfs.o admin-cmd.o io-cmd.o fabrics-cmd.o \ +nvmet-y+= core.o configfs-ng.o admin-cmd.o io-cmd.o fabrics-cmd.o \ discovery.o nvme-loop-y+= loop.o nvmet-rdma-y += rdma.o diff --git a/drivers/nvme/target/configfs-ng.c b/drivers/nvme/target/configfs-ng.c new file mode 100644 index 000..d495017 --- /dev/null +++ b/drivers/nvme/target/configfs-ng.c @@ -0,0 +1,586 @@ +/* + * Based on target_core_fabric_configfs.c code + */ +#include +#include +#include +#include +#include +#include 
+#include + +#include "nvmet.h" + +/* + * nvmet_port Generic ConfigFS definitions. + */ +static ssize_t nvmet_port_addr_adrfam_show(struct config_item *item, + char *page) +{ + switch (to_nvmet_port(item)->disc_addr.adrfam) { + case NVMF_ADDR_FAMILY_IP4: + return sprintf(page, "ipv4\n"); + case NVMF_ADDR_FAMILY_IP6: + return sprintf(page, "ipv6\n"); + case NVMF_ADDR_FAMILY_IB: + return sprintf(page, "ib\n"); + default: + return sprintf(page, "\n"); + } +} + +static ssize_t nvmet_port_addr_adrfam_store(struct config_item *item, + const char *page, size_t count) +{ + struct nvmet_port *port = to_nvmet_port(item); + + if (port->enabled) { + pr_err("Cannot modify address while enabled\n"); + pr_err("Disable the address before modifying\n"); + return -EACCES; + } + + if (sysfs_streq(page, "ipv4")) { + port->disc_addr.adrfam = NVMF_ADDR_FAMILY_IP4; + } else if (sysfs_streq(page, "ipv6")) { + port->disc_addr.adrfam = NVMF_ADDR_FAMILY_IP6; + } else if (sysfs_streq(page, "ib")) { + port->disc_addr.adrfam = NVMF_ADDR_FAMILY_IB; + } else { + pr_err("Invalid value '%s' for adrfam\n", page); + return -EINVAL; + } + + return count; +} + +CONFIGFS_ATTR(nvmet_port_, addr_adrfam); + +static ssize_t nvmet_port_addr_portid_show(struct config_item *item, + char *page) +{ + struct nvmet_port *port = to_nvmet_port(item); + + return snprintf(page, PAGE_SIZE, "%d\n", + le16_to_cpu(port->disc_addr.portid)); +} + +static ssize_t nvmet_port_addr_portid_store(struct config_item *item, + const char *page, size_t count) +{ + struct nvmet_port *port = to_nvmet_port(item); + u16 portid = 0; + + if (kstrtou16(page, 0, )) { + pr_err("Invalid value '%s' for portid\n", page); + return -EINVAL; + } + + if (port->enabled) { + pr_err("Cannot modify address while enabled\n"); + pr_err("Disable the address before modifying\n"); + return -EACCES; + } + port->disc_addr.portid = cpu_to_le16(portid); + return count; +} + +CONFIGFS_ATTR(nvmet_port_, addr_portid); + +static ssize_t 
nvmet_port_addr_traddr_show(struct
[RFC 5/8] nvmet/io-cmd: Hookup sbc_ops->execute_sync_cache backend ops
From: Nicholas Bellinger

This patch converts nvmet_execute_flush() to utilize sbc_ops->execute_sync_cache() for target_iostate submission into existing backend drivers via configfs in /sys/kernel/config/target/core/. Cc: Jens Axboe Cc: Christoph Hellwig Cc: Martin Petersen Cc: Sagi Grimberg Cc: Hannes Reinecke Cc: Mike Christie Signed-off-by: Nicholas Bellinger --- drivers/nvme/target/io-cmd.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c index 133a14a..23905a8 100644 --- a/drivers/nvme/target/io-cmd.c +++ b/drivers/nvme/target/io-cmd.c @@ -109,18 +109,21 @@ static void nvmet_execute_rw(struct nvmet_req *req) static void nvmet_execute_flush(struct nvmet_req *req) { -#if 0 - struct bio *bio; + struct target_iostate *ios = &req->t_iostate; + struct se_device *dev = rcu_dereference_raw(req->ns->dev); + struct sbc_ops *sbc_ops = dev->transport->sbc_ops; + sense_reason_t rc; - nvmet_inline_bio_init(req); - bio = &req->inline_bio; + if (!sbc_ops || !sbc_ops->execute_sync_cache) { + nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR); + return; + } - bio->bi_bdev = req->ns->bdev; - bio->bi_private = req; - bio->bi_end_io = nvmet_bio_done; + ios->se_dev = dev; + ios->iomem = NULL; + ios->t_comp_func = &nvmet_complete_ios; - submit_bio(WRITE_FLUSH, bio); -#endif + rc = sbc_ops->execute_sync_cache(ios, false); } #if 0 -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 7/8] nvmet/admin-cmd: Hookup T10-PI to ID_NS.ms + ID_NS.dps feature bits
From: Nicholas BellingerThis patch updates nvmet_execute_identify_ns() to report target-core backend T10-PI related feature bits to the NVMe host controller. Note this assumes support for NVME_NS_DPC_PI_TYPE1 and NVME_NS_DPC_PI_TYPE3 as reported by backend drivers via /sys/kernel/config/target/core/*/*/attrib/pi_prot_type. Cc: Jens Axboe Cc: Christoph Hellwig Cc: Martin Petersen Cc: Sagi Grimberg Cc: Hannes Reinecke Cc: Mike Christie Signed-off-by: Nicholas Bellinger --- drivers/nvme/target/admin-cmd.c | 17 + 1 file changed, 17 insertions(+) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 240e323..3a808dc 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -200,6 +200,7 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req) { struct nvmet_ns *ns; struct nvme_id_ns *id; + struct se_device *dev; u16 status = 0; ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid); @@ -228,6 +229,22 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req) id->nlbaf = 0; id->flbas = 0; + /* Populate bits for T10-PI from se_device backend */ + rcu_read_lock(); + dev = rcu_dereference(ns->dev); + if (dev && dev->dev_attrib.pi_prot_type) { + int pi_prot_type = dev->dev_attrib.pi_prot_type; + + id->lbaf[0].ms = cpu_to_le16(sizeof(struct t10_pi_tuple)); + printk("nvmet_set_id_ns: ms: %u\n", id->lbaf[0].ms); + + if (pi_prot_type == 1) + id->dps = NVME_NS_DPC_PI_TYPE1; + else if (pi_prot_type == 3) + id->dps = NVME_NS_DPC_PI_TYPE3; + } + rcu_read_unlock(); + /* * Our namespace might always be shared. Not just with other * controllers, but also with any other user of the block device. -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 8/8] nvme/loop: Add support for bio integrity handling
From: Nicholas BellingerThis patch adds support for nvme/loop block integrity, based upon the reported ID_NS.ms + ID_NS.dps feature bits in nvmet_execute_identify_ns(). Cc: Jens Axboe Cc: Christoph Hellwig Cc: Martin Petersen Cc: Sagi Grimberg Cc: Hannes Reinecke Cc: Mike Christie Signed-off-by: Nicholas Bellinger --- drivers/nvme/target/loop.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c index 08b4fbb..b4b4da9 100644 --- a/drivers/nvme/target/loop.c +++ b/drivers/nvme/target/loop.c @@ -42,6 +42,7 @@ struct nvme_loop_iod { struct nvme_loop_queue *queue; struct work_struct work; struct sg_table sg_table; + struct scatterlist meta_sg; struct scatterlist first_sgl[]; }; @@ -193,6 +194,24 @@ static int nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx, BUG_ON(iod->req.sg_cnt > req->nr_phys_segments); } + if (blk_integrity_rq(req)) { + int count; + + if (blk_rq_count_integrity_sg(hctx->queue, req->bio) != 1) + BUG_ON(1); + + sg_init_table(>meta_sg, 1); + count = blk_rq_map_integrity_sg(hctx->queue, req->bio, + >meta_sg); + + iod->req.prot_sg = >meta_sg; + iod->req.prot_sg_cnt = 1; +#if 0 + printk("nvme/loop: Set prot_sg %p and prot_sg_cnt: %d\n", + iod->req.prot_sg, iod->req.prot_sg_cnt); +#endif + } + iod->cmd.common.command_id = req->tag; blk_mq_start_request(req); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 6/8] nvmet/io-cmd: Hookup sbc_ops->execute_unmap backend ops
From: Nicholas BellingerThis patch converts nvmet_execute_discard() to utilize sbc_ops->execute_unmap() for target_iostate submission into existing backends drivers via configfs in /sys/kernel/config/target/core/. Cc: Jens Axboe Cc: Christoph Hellwig Cc: Martin Petersen Cc: Sagi Grimberg Cc: Hannes Reinecke Cc: Mike Christie Signed-off-by: Nicholas Bellinger --- drivers/nvme/target/io-cmd.c | 47 1 file changed, 34 insertions(+), 13 deletions(-) diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c index 23905a8..605f560 100644 --- a/drivers/nvme/target/io-cmd.c +++ b/drivers/nvme/target/io-cmd.c @@ -126,52 +126,73 @@ static void nvmet_execute_flush(struct nvmet_req *req) rc = sbc_ops->execute_sync_cache(ios, false); } -#if 0 -static u16 nvmet_discard_range(struct nvmet_ns *ns, - struct nvme_dsm_range *range, int type, struct bio **bio) +static u16 nvmet_discard_range(struct nvmet_req *req, struct sbc_ops *sbc_ops, + struct nvme_dsm_range *range, struct bio **bio) { - if (__blkdev_issue_discard(ns->bdev, + struct nvmet_ns *ns = req->ns; + sense_reason_t rc; + + rc = sbc_ops->execute_unmap(>t_iostate, le64_to_cpu(range->slba) << (ns->blksize_shift - 9), le32_to_cpu(range->nlb) << (ns->blksize_shift - 9), - GFP_KERNEL, type, bio)) + bio); + if (rc) return NVME_SC_INTERNAL | NVME_SC_DNR; return 0; } -#endif + +static void nvmet_discard_bio_done(struct bio *bio) +{ + struct nvmet_req *req = bio->bi_private; + int err = bio->bi_error; + + bio_put(bio); + nvmet_req_complete(req, err ? 
NVME_SC_INTERNAL | NVME_SC_DNR : 0); +} static void nvmet_execute_discard(struct nvmet_req *req) { -#if 0 - struct nvme_dsm_range range; + struct target_iostate *ios = >t_iostate; + struct se_device *dev = rcu_dereference_raw(req->ns->dev); + struct sbc_ops *sbc_ops = dev->transport->sbc_ops; struct bio *bio = NULL; - int type = REQ_WRITE | REQ_DISCARD, i; + struct nvme_dsm_range range; + int i; u16 status; + if (!sbc_ops || !sbc_ops->execute_unmap) { + nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR); + return; + } + + ios->se_dev = dev; + ios->iomem = NULL; + ios->t_comp_func = NULL; + for (i = 0; i <= le32_to_cpu(req->cmd->dsm.nr); i++) { status = nvmet_copy_from_sgl(req, i * sizeof(range), , sizeof(range)); if (status) break; - status = nvmet_discard_range(req->ns, , type, ); + status = nvmet_discard_range(req, sbc_ops, , ); if (status) break; } if (bio) { bio->bi_private = req; - bio->bi_end_io = nvmet_bio_done; + bio->bi_end_io = nvmet_discard_bio_done; if (status) { bio->bi_error = -EIO; bio_endio(bio); } else { - submit_bio(type, bio); + submit_bio(REQ_WRITE | REQ_DISCARD, bio); } } else { nvmet_req_complete(req, status); } -#endif } static void nvmet_execute_dsm(struct nvmet_req *req) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 3/8] nvmet: Hookup nvmet_ns->dev to nvmet_ns_enable
From: Nicholas BellingerThis patch hooks up nvmet_ns_enable() to accept the RCU protected struct se_device provided as a configfs symlink from existing /sys/kernel/config/target/core/ driver backends. Also, drop the now unused internal ns->bdev + ns->device_path usage, and add WIP stubs for nvmet/io-cmd sbc_ops backend conversion to be added in subsequent patches. Cc: Jens Axboe Cc: Christoph Hellwig Cc: Martin Petersen Cc: Sagi Grimberg Cc: Hannes Reinecke Cc: Mike Christie Signed-off-by: Nicholas Bellinger --- drivers/nvme/target/configfs-ng.c | 3 +-- drivers/nvme/target/core.c| 30 -- drivers/nvme/target/io-cmd.c | 17 +++-- drivers/nvme/target/nvmet.h | 6 ++ 4 files changed, 26 insertions(+), 30 deletions(-) diff --git a/drivers/nvme/target/configfs-ng.c b/drivers/nvme/target/configfs-ng.c index d495017..e160186 100644 --- a/drivers/nvme/target/configfs-ng.c +++ b/drivers/nvme/target/configfs-ng.c @@ -392,8 +392,7 @@ static int nvmet_ns_link(struct config_item *ns_ci, struct config_item *dev_ci) return -ENOSYS; } - // XXX: Pass in struct se_device into nvmet_ns_enable - return nvmet_ns_enable(ns); + return nvmet_ns_enable(ns, dev); } static int nvmet_ns_unlink(struct config_item *ns_ci, struct config_item *dev_ci) diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 7b42d2b..171e440 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -13,6 +13,8 @@ */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include +#include +#include #include "nvmet.h" static struct nvmet_fabrics_ops *nvmet_transports[NVMF_TRTYPE_MAX]; @@ -287,7 +289,7 @@ void nvmet_put_namespace(struct nvmet_ns *ns) percpu_ref_put(>ref); } -int nvmet_ns_enable(struct nvmet_ns *ns) +int nvmet_ns_enable(struct nvmet_ns *ns, struct se_device *dev) { struct nvmet_subsys *subsys = ns->subsys; struct nvmet_ctrl *ctrl; @@ -297,23 +299,14 @@ int nvmet_ns_enable(struct nvmet_ns *ns) if (!list_empty(>dev_link)) goto out_unlock; - ns->bdev = 
blkdev_get_by_path(ns->device_path, FMODE_READ | FMODE_WRITE, - NULL); - if (IS_ERR(ns->bdev)) { - pr_err("nvmet: failed to open block device %s: (%ld)\n", - ns->device_path, PTR_ERR(ns->bdev)); - ret = PTR_ERR(ns->bdev); - ns->bdev = NULL; - goto out_unlock; - } - - ns->size = i_size_read(ns->bdev->bd_inode); - ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev)); + rcu_assign_pointer(ns->dev, dev); + ns->size = dev->transport->get_blocks(dev) * dev->dev_attrib.hw_block_size; + ns->blksize_shift = blksize_bits(dev->dev_attrib.hw_block_size); ret = percpu_ref_init(>ref, nvmet_destroy_namespace, 0, GFP_KERNEL); if (ret) - goto out_blkdev_put; + goto out_unlock; if (ns->nsid > subsys->max_nsid) subsys->max_nsid = ns->nsid; @@ -343,10 +336,6 @@ int nvmet_ns_enable(struct nvmet_ns *ns) out_unlock: mutex_unlock(>lock); return ret; -out_blkdev_put: - blkdev_put(ns->bdev, FMODE_WRITE|FMODE_READ); - ns->bdev = NULL; - goto out_unlock; } void nvmet_ns_disable(struct nvmet_ns *ns) @@ -379,16 +368,13 @@ void nvmet_ns_disable(struct nvmet_ns *ns) list_for_each_entry(ctrl, >ctrls, subsys_entry) nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE, 0, 0); - if (ns->bdev) - blkdev_put(ns->bdev, FMODE_WRITE|FMODE_READ); + rcu_assign_pointer(ns->dev, NULL); mutex_unlock(>lock); } void nvmet_ns_free(struct nvmet_ns *ns) { nvmet_ns_disable(ns); - - kfree(ns->device_path); kfree(ns); } diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c index 76dbf73..38c2e97 100644 --- a/drivers/nvme/target/io-cmd.c +++ b/drivers/nvme/target/io-cmd.c @@ -16,6 +16,7 @@ #include #include "nvmet.h" +#if 0 static void nvmet_bio_done(struct bio *bio) { struct nvmet_req *req = bio->bi_private; @@ -26,6 +27,7 @@ static void nvmet_bio_done(struct bio *bio) if (bio != >inline_bio) bio_put(bio); } +#endif static inline u32 nvmet_rw_len(struct nvmet_req *req) { @@ -33,6 +35,7 @@ static inline u32 nvmet_rw_len(struct nvmet_req *req) req->ns->blksize_shift; } +#if 0 static void 
nvmet_inline_bio_init(struct nvmet_req *req) { struct bio *bio = >inline_bio; @@ -41,21 +44,23 @@ static void nvmet_inline_bio_init(struct nvmet_req *req) bio->bi_max_vecs = NVMET_MAX_INLINE_BIOVEC;
[RFC 4/8] nvmet/io-cmd: Hookup sbc_ops->execute_rw backend ops
From: Nicholas BellingerThis patch converts nvmet_execute_rw() to utilize sbc_ops->execute_rw() for target_iostate + target_iomem based I/O submission into existing backends drivers via configfs in /sys/kernel/config/target/core/. This includes support for passing T10-PI scatterlists via target_iomem into existing sbc_ops->execute_rw() logic, and is functioning with IBLOCK, FILEIO, and RAMDISK. Note the preceeding target/iblock patch absorbs inline bio + bvecs and blk_poll() optimizations from Ming + Sagi in nvmet/io-cmd into target_core_iblock.c code. Cc: Jens Axboe Cc: Christoph Hellwig Cc: Martin Petersen Cc: Sagi Grimberg Cc: Hannes Reinecke Cc: Mike Christie Signed-off-by: Nicholas Bellinger --- drivers/nvme/target/io-cmd.c | 116 ++- drivers/nvme/target/nvmet.h | 7 +++ 2 files changed, 67 insertions(+), 56 deletions(-) diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c index 38c2e97..133a14a 100644 --- a/drivers/nvme/target/io-cmd.c +++ b/drivers/nvme/target/io-cmd.c @@ -14,20 +14,16 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include #include +#include +#include #include "nvmet.h" -#if 0 -static void nvmet_bio_done(struct bio *bio) +static void nvmet_complete_ios(struct target_iostate *ios, u16 status) { - struct nvmet_req *req = bio->bi_private; - - nvmet_req_complete(req, - bio->bi_error ? NVME_SC_INTERNAL | NVME_SC_DNR : 0); + struct nvmet_req *req = container_of(ios, struct nvmet_req, t_iostate); - if (bio != >inline_bio) - bio_put(bio); + nvmet_req_complete(req, status ? 
NVME_SC_INTERNAL | NVME_SC_DNR : 0); } -#endif static inline u32 nvmet_rw_len(struct nvmet_req *req) { @@ -35,72 +31,80 @@ static inline u32 nvmet_rw_len(struct nvmet_req *req) req->ns->blksize_shift; } -#if 0 -static void nvmet_inline_bio_init(struct nvmet_req *req) -{ - struct bio *bio = >inline_bio; - - bio_init(bio); - bio->bi_max_vecs = NVMET_MAX_INLINE_BIOVEC; - bio->bi_io_vec = req->inline_bvec; -} -#endif - static void nvmet_execute_rw(struct nvmet_req *req) { -#if 0 - int sg_cnt = req->sg_cnt; - struct scatterlist *sg; - struct bio *bio; + struct target_iostate *ios = >t_iostate; + struct target_iomem *iomem = >t_iomem; + struct se_device *dev = rcu_dereference_raw(req->ns->dev); + struct sbc_ops *sbc_ops = dev->transport->sbc_ops; sector_t sector; - blk_qc_t cookie; - int rw, i; -#endif + enum dma_data_direction data_direction; + sense_reason_t rc; + bool fua_write = false, prot_enabled = false; + + if (!sbc_ops || !sbc_ops->execute_rw) { + nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR); + return; + } + if (!req->sg_cnt) { nvmet_req_complete(req, 0); return; } -#if 0 + if (req->cmd->rw.opcode == nvme_cmd_write) { if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA)) - rw = WRITE_FUA; - else - rw = WRITE; + fua_write = true; + + data_direction = DMA_TO_DEVICE; } else { - rw = READ; + data_direction = DMA_FROM_DEVICE; } sector = le64_to_cpu(req->cmd->rw.slba); sector <<= (req->ns->blksize_shift - 9); - nvmet_inline_bio_init(req); - bio = >inline_bio; - bio->bi_bdev = req->ns->bdev; - bio->bi_iter.bi_sector = sector; - bio->bi_private = req; - bio->bi_end_io = nvmet_bio_done; - - for_each_sg(req->sg, sg, req->sg_cnt, i) { - while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset) - != sg->length) { - struct bio *prev = bio; - - bio = bio_alloc(GFP_KERNEL, min(sg_cnt, BIO_MAX_PAGES)); - bio->bi_bdev = req->ns->bdev; - bio->bi_iter.bi_sector = sector; - - bio_chain(bio, prev); - cookie = submit_bio(rw, prev); - } + ios->t_task_lba = sector; + 
ios->data_length = nvmet_rw_len(req); + ios->data_direction = data_direction; + iomem->t_data_sg = req->sg; + iomem->t_data_nents = req->sg_cnt; + iomem->t_prot_sg = req->prot_sg; + iomem->t_prot_nents = req->prot_sg_cnt; + + // XXX: Make common between sbc_check_prot and nvme-target + switch (dev->dev_attrib.pi_prot_type) { + case TARGET_DIF_TYPE3_PROT: + ios->reftag_seed = 0x; + prot_enabled = true; + break; + case
[RFC 1/8] nvmet: Add nvmet_fabric_ops get/put transport helpers
From: Nicholas Bellinger

This patch introduces two helpers for obtaining + releasing nvmet_fabric_ops for nvmet_port usage, and the associated struct module ops->owner reference. This is required in order to support nvmet/configfs-ng and multiple nvmet_port configfs groups living under /sys/kernel/config/nvmet/subsystems/$SUBSYS_NQN/ports/ Cc: Jens Axboe Cc: Christoph Hellwig Cc: Martin Petersen Cc: Sagi Grimberg Cc: Hannes Reinecke Cc: Mike Christie Signed-off-by: Nicholas Bellinger --- drivers/nvme/target/core.c | 31 +++++++++++++++++++++++++++++++ drivers/nvme/target/nvmet.h | 3 +++ 2 files changed, 34 insertions(+) diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index e0b3f01..9af813c 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -191,6 +191,37 @@ void nvmet_disable_port(struct nvmet_port *port) module_put(ops->owner); } +struct nvmet_fabrics_ops *nvmet_get_transport(struct nvmet_port *port) +{ + struct nvmet_fabrics_ops *ops; + + down_write(&nvmet_config_sem); + ops = nvmet_transports[port->disc_addr.trtype]; + if (!ops) { + pr_err("transport type %d not supported\n", + port->disc_addr.trtype); + up_write(&nvmet_config_sem); + return ERR_PTR(-EINVAL); + } + + if (!try_module_get(ops->owner)) { + up_write(&nvmet_config_sem); + return ERR_PTR(-EINVAL); + } + up_write(&nvmet_config_sem); + + return ops; +} + +void nvmet_put_transport(struct nvmet_port *port) +{ + struct nvmet_fabrics_ops *ops; + + down_write(&nvmet_config_sem); + ops = nvmet_transports[port->disc_addr.trtype]; + module_put(ops->owner); + up_write(&nvmet_config_sem); +} + static void nvmet_keep_alive_timer(struct work_struct *work) { struct nvmet_ctrl *ctrl = container_of(to_delayed_work(work), diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 57dd6d8..2bf15088b 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -299,6 +299,9 @@ void nvmet_unregister_transport(struct nvmet_fabrics_ops *ops); int nvmet_enable_port(struct nvmet_port *port); void nvmet_disable_port(struct nvmet_port
*port); +struct nvmet_fabrics_ops *nvmet_get_transport(struct nvmet_port *port); +void nvmet_put_transport(struct nvmet_port *port); + void nvmet_referral_enable(struct nvmet_port *parent, struct nvmet_port *port); void nvmet_referral_disable(struct nvmet_port *port); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NVMe over Fabrics target implementation
Hi HCH & Co, On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote: > This patch set adds a generic NVMe over Fabrics target. The > implementation conforms to the NVMe 1.2b specification (which > includes Fabrics) and provides the NVMe over Fabrics access > to Linux block devices. > Thanks for all of the development work by the fabric_linux_driver team (HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year. Very excited to see this code get a public release now that NVMf specification is out. Now that it's in the wild, it's a good opportunity to discuss some of the more interesting implementation details, beyond the new NVMf wire-protocol itself. (Adding target-devel + linux-scsi CC') > The target implementation consists of several elements: > > - NVMe target core: defines and manages the NVMe entities (subsystems, > controllers, namespaces, ...) and their allocation, responsible > for initial commands processing and correct orchestration of > the stack setup and tear down. > > - NVMe admin command implementation: responsible for parsing and > servicing admin commands such as controller identify, set features, > keep-alive, log page, ...). > > - NVMe I/O command implementation: responsible for performing the actual > I/O (Read, Write, Flush, Deallocate (aka Discard). It is a very thin > layer on top of the block layer and implements no logic of it's own. > To support exporting file systems please use the loopback block driver > in direct I/O mode, which gives very good performance. > > - NVMe over Fabrics support: responsible for servicing Fabrics commands > (connect, property get/set). > > - NVMe over Fabrics discovery service: responsible to serve the Discovery > log page through a special cut down Discovery controller. 
> > The target is configured using configfs, and configurable entities are: > > - NVMe subsystems and namespaces > - NVMe over Fabrics ports and referrals > - Host ACLs for primitive access control - NVMe over Fabrics access >control is still work in progress at the specification level and >will be implemented once that work has finished. > > To configure the target use the nvmetcli tool from > http://git.infradead.org/users/hch/nvmetcli.git, which includes detailed > setup documentation. > > In addition to the Fabrics target implementation we provide a loopback > driver which also conforms the NVMe over Fabrics specification and allows > evaluation of the target stack with local access without requiring a real > fabric. > So as-is, I have two main objections that been discussed off-list for some time, that won't be a big surprise to anyone following fabrics_linux_driver list. ;P First topic, I think nvme-target name-spaces should be utilizing existing configfs logic, and sharing /sys/kernel/config/target/core/ backend driver symlinks as individual nvme-target subsystem namespaces. That is, we've already got a configfs ABI in place for target mode back-ends that today is able to operate independently from SCSI architecture model dependencies. To that end, the prerequisite series to allow target-core backends to operate independent of se_cmd, and allow se_device backends to be configfs symlinked directly into /sys/kernel/config/nvmet/, outside of /sys/kernel/config/target/$FABRIC/ has been posted earlier here: http://marc.info/?l=linux-scsi=146527281416606=2 Note the -v2 series has absorbed the nvmet/io-cmd execute_rw() improvements from Sagi + Ming (inline bio/bvec and blk_poll) into target_core_iblock.c driver code. Second topic, and more important from a kernel ABI perspective are the current scale limitations around the first pass of nvmet configfs.c layout code in /sys/kernel/config/nvmet/. 
Namely, the design of having three top level configfs groups in /sys/kernel/config/nvmet/[subsystems,ports,hosts] that are configfs symlinked between each other, with a single rw_mutex (nvmet_config_sem) used for global list lookup and enforcing a globally synchronized nvmet_fabrics_ops->add_port() creation across all subsystem NQN ports. From the shared experience in target_core_fabric_configfs.c over the last 8 years, perhaps the greatest strength of configfs has been its ability to allow config_item_type parent/child relationships to exist and operate independently of one another. Specifically in the context of storage tenants, this means creation + deletion of one backend + target fabric endpoint tenant should not block creation + deletion of another backend + target fabric endpoint tenant. As-is, an nvmet configfs layout holding a global mutex across subsystem/port/host creation + deletion, and doing internal list lookup within configfs ->allow_link + ->drop_link callbacks, ends up being severely limiting when scaling up the total number of nvmet subsystem NQNs and ports. Specifically, modern deployments of /sys/kernel/config/target/iscsi/ expect backends + fabric endpoints to be configured in parallel at <