Re: Behavior for hot-plugging SD card in USB SD card reader

2016-06-07 Thread Frkuska, Joshua

Hello,

I would appreciate any feedback or suggestions regarding the scenario below.

Thanks again


On 05/12/2016 11:02 PM, Frkuska, Joshua wrote:

Hello,

I have two questions regarding the behavior of hot-plugging an SD card 
in a USB SD card reader as it relates to the SCSI subsystem.


Before the questions, the background and my understanding:

On the 3.14.y branch and later (not sure about 4.2+), when using a USB 
SD card reader with an SD card inserted and a filesystem mounted, there 
are four possible cases for physical removal of the SD card. The first 
is when the card (not the USB reader) is removed while the SCSI device 
is not busy (being accessed). The second is when the SD card is busy 
(being accessed) and is then removed. The third and fourth are similar, 
but with the USB reader itself being removed while the SD card is 
inserted. This mail focuses on the behavior of the second case, where 
the SD card is busy and then removed.


If my understanding is correct, in the scenario above, access to the 
SCSI disk block device keeps the reference count positive. Removal of 
the SD card would, in the non-busy case, drop the reference count to 
zero and allow the disk partitions to be unmounted, and any new SD card 
inserted into the card reader should then have its geometry 
recalculated, allowing its partitions to be discovered and 
auto-mounted. In the busy case, however, it seems that the SCSI disk 
reference count does not go to zero and we are unable to unmount the 
partition. A lazy unmount appears to occur, leaving a stale device 
node. Similar behavior can be seen by inserting a USB flash based 
storage device and exercising the same problem case. In the case of a 
USB flash drive, after hotplug and re-insertion, a new block device 
node gets assigned because the previous one is still busy. In the case 
of the SD card, a new block device node does not get created.


A simple recreation case is to run 'tail -f' on any file on the 
auto-mounted SD card file system, then remove the SD card and insert a 
second, different SD card. The second SD card does not auto-mount as 
one might expect. The device is not listed in MP shadow fblock, nor 
mounted at all.


In the case of the USB flash drive, the host adapter is created and the 
SCSI bus scanned, finding the SCSI device controller on the bus. The 
device controller responds to the scan and the SCSI disk is populated 
with the appropriate geometry. This is provided to the block layer, 
then to the VFS, which the auto-mounter picks up on to mount the 
partitions. In the case of the USB SD card reader, when the card reader 
is plugged in without an SD card inserted, the SCSI device controller 
is still found and the root SCSI disk node still created, but its 
geometry is 0 and the device is put in a NO_MEDIUM_INSERTED state. Any 
open attempt on the device node will fail with a NOMEDIUM error. 
Insertion of an SD card changes the reported geometry of the SCSI disk 
during a periodic scan; the NOMEDIUM state is removed and the disk 
information updated. As far as kernel events go, 'change' and 'add' 
events are delivered to userland.
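
For illustration, the NOMEDIUM behavior is observable from a minimal
userland sketch (the device path below is a placeholder):

    /* sketch: probing an empty card reader; /dev/sdb is a placeholder */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
            int fd = open("/dev/sdb", O_RDONLY);

            if (fd < 0 && errno == ENOMEDIUM)
                    printf("card reader present, no medium inserted\n");
            return 0;
    }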


When a USB flash drive is busy (being accessed) when hotplugged 
(removed), the partition remains mounted, and subsequent re-insertion 
creates a new root SCSI device node in sequential alphabetical order 
(e.g. /dev/sdb). The reason re-population of the SCSI disk works in the 
USB case is that the entire virtual host is destroyed and repopulated. 
This repopulates the parent device node after the device controller is 
scanned and found on the virtual SCSI bus. In the case of the SD card, 
removal of the SD card only changes the medium-inserted state and does 
not result in a SCSI disk root node re-population.


I have two questions.

1. In the case of a USB card reader whose SD card is hotplugged while 
busy, is the current behavior desired? If so, why? An embedded system 
where the SD card reader is physically locked/mounted to the USB hub 
could see such a scenario if the SD card is being accessed during 
removal by the user. Such a use case is possible on a PC as well.
2. If this behavior should be made more similar to that of a USB flash 
drive during hotplug, what layer modifications would be necessary? 
(mass storage driver, SCSI disk, SCSI layer, block, etc.)


I have a few ideas about how this could be done, but I would appreciate 
any additional thoughts or feedback on creating behavior similar to a 
USB hotplug.


The current thoughts I have are:

a.) Change the behavior of the SCSI disk layer in the case where the 
media is ejected while its user count is not equal to zero. The idea is 
to force-drop (up and down the layer stack) the existing references to 
the disk so that it can be properly cleaned up and removed. This is a 
little different from the USB case above, as the SCSI device node will 
be re-used.
b.) Use a notifier chain to notify the USB mass storage driver to 
reset and re-instantiate the virtual host (see the sketch below).
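
For (b), a rough sketch of what such a notifier chain could look like,
using the kernel's blocking notifier API -- every identifier below is
invented purely for illustration:

    /* sketch only: all names here are hypothetical */
    static BLOCKING_NOTIFIER_HEAD(sd_media_notifier);

    /* sd would fire this when media is ejected with a non-zero user count */
    void sd_notify_media_gone(struct scsi_device *sdev)
    {
            blocking_notifier_call_chain(&sd_media_notifier, 0, sdev);
    }

    /* usb-storage would register a callback that tears down and
     * re-instantiates the virtual host for the affected device */
    static int usb_stor_media_event(struct notifier_block *nb,
                                    unsigned long action, void *data)
    {
            /* schedule virtual host reset + re-scan here */
            return NOTIFY_OK;
    }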

Re: NVMe over Fabrics target implementation

2016-06-07 Thread Nicholas A. Bellinger
On Tue, 2016-06-07 at 12:55 +0200, Christoph Hellwig wrote:
> There is absolutely no point in dragging in an overcomplicated configfs
> structure for a very simple protocol which also is very different from
> SCSI in its nitty-gritty details.

Please be more specific wrt the two individual points that have been
raised.

>  Keeping the nvme target self-contained
> allows it to be both much simpler and much easier to understand, as well
> as much better testable - see the amount of test coverage we could easily
> add for example.

I disagree.

> 
> Or to put it the other way around - if there was any major synergy in
> reusing the SCSI target code that just shows we're missing functionality
> in the block layer or configfs.
> 

To reiterate the points again.

*) Extensible to multiple types of backend drivers.

nvme-target needs a way to absorb new backend drivers that does not
affect the existing configfs group layout or attributes.

Looking at the nvmet/configfs layout as-is, there are no multiple
backend types defined, nor a way to control backend feature bits
exposed to nvme namespaces at runtime.

What is being proposed is a way to share target-core backends via
existing configfs symlinks across SCSI and NVMe targets.

Which means:

   - All I/O state + memory submission is done at RCU-protected
     se_device level via sbc_ops
   - percpu reference counting is done outside of target-core
   - Absorb all nvmet/io-cmd optimizations into target_core_iblock.c
   - Base starting point for features in SCSI + NVMe that span
     across multiple endpoints and instances (reservations + APTPL,
     multipath, copy-offload across fabric types)

Using target-core backends means we get features like T10-PI and
sbc_ops->write_same for free that don't exist in nvmet, and can
utilize a common set of backend drivers for SCSI and NVMe via an
existing configfs ABI and python userspace community.

And now to the second, and more important, point: defining a configfs
ABI that works both for today's requirements and well into the 2020s
without breaking user-space compatibility.

As-is, the initial design using top level nvmet configfs symlinks of
subsystem groups into individual port + host groups does not scale.

That is, it currently does:

  - Sequential list lookup under global rw_mutex of top-level nvmet_port
and nvmet_host symlink ->allow_link() and ->drop_link() configfs
callbacks.
  - nvmet_fabrics_ops->add_port() callback invoked under the same
    global rw_mutex.

This is very bad for several reasons.

As-is, this blocks all other configfs port + host operations from
occurring even during normal operation, which makes it quite useless for
any type of multi-tenant target environment where the individual target
endpoints *must* be able to operate independently.

Seriously, there is never a good reason why configfs group or item
callbacks should be performing list lookup under a global lock at
this level.

Why does it ever make sense for $SUBSYSTEM_NQN_0 with $PORT_DRIVER_FOO
to block operation of $SUBSYSTEM_NQN_1 with $PORT_DRIVER_BAR..?

A simple example where this design breaks down quickly is an NVMf
ops->add_port() call that requires a HW reset, or say a reload of
firmware, which can take multiple seconds (qla2xxx comes to mind).

There is a simple test to highlight this limitation.  Take any
nvme-target driver that is capable of multiple ports, and introduce
a sleep(5) into each ops->add_port() call.
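
Something along these lines (a sketch only; the real port setup is
elided):

    /* sketch: artificially slow add_port to expose the global-lock stall */
    static int demo_add_port(struct nvmet_port *port)
    {
            ssleep(5);      /* stand-in for a HW reset or firmware reload */
            return 0;
    }

Because ->add_port() runs under the global rw_mutex, every other
configfs symlink operation now stalls behind each 5 second window.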

Now create 256 different subsystem NQNs with 256 different ports
across four different user-space processes.

What happens to other subsystems, ports and host groups configfs
symlinks when this occurs..?

What happens to the other user-space processes..?



RE: [PATCH] sd: remove redundant check for BLK_DEF_MAX_SECTORS

2016-06-07 Thread Long Li
Hi Martin,

Thanks for looking into this. The problem I'm trying to solve is that I 
want the lower layer driver to be able to set up max_sectors bigger than 
BLK_DEF_MAX_SECTORS. In Hyper-V, we use a 2MB max transfer I/O size; in a 
future version the max transfer I/O size will increase to 8MB.
 
The implementation of sd.c limits the maximum value of max_sectors to 
BLK_DEF_MAX_SECTORS. Because sd_revalidate_disk is called late in the SCSI 
disk initialization process, there is no way for a lower layer driver to 
set this value to its "bigger" optimal size.

The reason I think it may not be necessary for sd.c to set up max_sectors 
is that this value may already have been set twice before reaching the 
code in sd.c (see the sketch below):
1. When the disk device is first scanned, or re-scanned (in scsi_scan.c), 
which eventually calls __scsi_init_queue() and uses the max_sectors in the 
scsi_host_template.
2. In slave_configure of the scsi_host_template, when the lower layer 
driver implements this function in its template; it can change this value 
there.
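
As a sketch of the interaction being described (the driver hook below is
hypothetical, and the clamp is only a paraphrase of the sd.c behavior
under discussion):

    /* hypothetical LLD hook: request a 2MB max transfer in slave_configure */
    static int example_slave_configure(struct scsi_device *sdev)
    {
            /* 2MB / 512-byte sectors = 4096 sectors */
            blk_queue_max_hw_sectors(sdev->request_queue, 4096);
            return 0;
    }

    /* paraphrase of the sd.c clamp at issue: when sd_revalidate_disk()
     * runs later, max_sectors is limited to BLK_DEF_MAX_SECTORS */
    q->limits.max_sectors = min_t(unsigned int, q->limits.max_hw_sectors,
                                  BLK_DEF_MAX_SECTORS);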

Long

> -----Original Message-----
> From: Martin K. Petersen [mailto:martin.peter...@oracle.com]
> Sent: Monday, June 6, 2016 8:42 PM
> To: Long Li 
> Cc: Tom Yan ; James E.J. Bottomley
> ; Martin K. Petersen
> ; linux-scsi@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH] sd: remove redundant check for
> BLK_DEF_MAX_SECTORS
> 
> > "Long" == Long Li  writes:
> 
> Long,
> 
> Long> The reason is that, max_sectors already has value at this point,
> Long> the default value is SCSI_DEFAULT_MAX_SECTORS
> Long> (include/scsi/scsi_host.h). The lower layer host driver can change
> Long> this value in its template.
> 
> The LLD sets max_hw_sectors which indicates the capabilities of the
> controller DMA hardware, whereas the max_sectors limit is set by sd to
> either follow advice by the device or--if not provided--use the block layer
> default. max_sectors governs the size of READ/WRITE requests and does not
> reflect the capabilities of the DMA hardware.
> 
> Long> I think the drivers that care about this value have already set
> Long> it. So it's better not to change it again. If they want max_sectors
> Long> to be set by sd, they can use the BLOCK LIMITS VPD to tell it to
> Long> do so.
> 
> Most drivers don't have the luxury of being able to generate VPDs for their
> attached target devices :)
> 
> --
> Martin K. Petersen      Oracle Linux Engineering


[RFC 2/2] nvme/loop: Add support for controller-per-port model

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch introduces loopback support for an nvme host
controller per nvmet_port instance model, following what
we've done in drivers/target/loopback/ for allowing
multiple host LLDs to co-exist.

It changes nvme_loop_add_port() to use struct nvme_loop_port
and take the nvmf_get_default_host() reference, and invokes
device_register() to trigger nvme_loop_driver_probe(), which
kicks off controller creation within nvme_loop_create_ctrl().

This allows nvme_loop_queue_rq() to set iod->req.port to the
per-nvmet_port pointer, instead of a single hardcoded global
nvmet_loop_port.

Subsequently, it also adds nvme_loop_remove_port() to call
device_unregister(), which calls nvme_loop_del_ctrl() and
nvmf_free_options() to drop the nvmet_port's nvmf_default_host
reference when the nvmet_port is being removed from the
associated nvmet_subsys.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Keith Busch 
Cc: Jay Freyensee 
Cc: Martin Petersen 
Cc: Sagi Grimberg 
Cc: Hannes Reinecke 
Cc: Mike Christie 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/target/loop.c | 183 -
 1 file changed, 165 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index b4b4da9..01b73dc 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -46,6 +46,13 @@ struct nvme_loop_iod {
struct scatterlist  first_sgl[];
 };
 
+struct nvme_loop_port {
+   struct device   dev;
+   struct nvmf_ctrl_options *opts;
+   struct nvmet_port   *port;
+   struct nvme_ctrl*ctrl;
+};
+
 struct nvme_loop_ctrl {
spinlock_t  lock;
struct nvme_loop_queue  *queues;
@@ -62,6 +69,8 @@ struct nvme_loop_ctrl {
struct nvmet_ctrl   *target_ctrl;
struct work_struct  delete_work;
struct work_struct  reset_work;
+
+   struct nvme_loop_port   *port;
 };
 
 static inline struct nvme_loop_ctrl *to_loop_ctrl(struct nvme_ctrl *ctrl)
@@ -75,8 +84,6 @@ struct nvme_loop_queue {
struct nvme_loop_ctrl   *ctrl;
 };
 
-static struct nvmet_port *nvmet_loop_port;
-
 static LIST_HEAD(nvme_loop_ctrl_list);
 static DEFINE_MUTEX(nvme_loop_ctrl_mutex);
 
@@ -173,7 +180,8 @@ static int nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
return ret;
 
iod->cmd.common.flags |= NVME_CMD_SGL_METABUF;
-   iod->req.port = nvmet_loop_port;
+   iod->req.port = queue->ctrl->port->port;
+
	if (!nvmet_req_init(&iod->req, &queue->nvme_cq,
			&queue->nvme_sq, &nvme_loop_ops)) {
nvme_cleanup_cmd(req);
@@ -618,6 +626,8 @@ out_destroy_queues:
 static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
struct nvmf_ctrl_options *opts)
 {
+   struct nvme_loop_port *loop_port = container_of(dev,
+   struct nvme_loop_port, dev);
struct nvme_loop_ctrl *ctrl;
bool changed;
int ret;
@@ -626,6 +636,7 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct 
device *dev,
if (!ctrl)
return ERR_PTR(-ENOMEM);
ctrl->ctrl.opts = opts;
+   ctrl->port = loop_port;
	INIT_LIST_HEAD(&ctrl->list);
 
	INIT_WORK(&ctrl->delete_work, nvme_loop_del_ctrl_work);
@@ -700,29 +711,117 @@ out_put_ctrl:
return ERR_PTR(ret);
 }
 
+static int nvme_loop_driver_probe(struct device *dev)
+{
+   struct nvme_loop_port *loop_port = container_of(dev,
+   struct nvme_loop_port, dev);
+   struct nvme_ctrl *ctrl;
+
+   ctrl = nvme_loop_create_ctrl(dev, loop_port->opts);
+   if (IS_ERR(ctrl))
+   return PTR_ERR(ctrl);
+
+   loop_port->ctrl = ctrl;
+   return 0;
+}
+
+static int nvme_loop_driver_remove(struct device *dev)
+{
+   struct nvme_loop_port *loop_port = container_of(dev,
+   struct nvme_loop_port, dev);
+   struct nvme_ctrl *ctrl = loop_port->ctrl;
+   struct nvmf_ctrl_options *opts = loop_port->opts;
+
+   nvme_loop_del_ctrl(ctrl);
+   nvmf_free_options(opts);
+   return 0;
+}
+
+static int pseudo_bus_match(struct device *dev,
+   struct device_driver *dev_driver)
+{
+   return 1;
+}
+
+static struct bus_type nvme_loop_bus = {
+   .name   = "nvme_loop_bus",
+   .match  = pseudo_bus_match,
+   .probe  = nvme_loop_driver_probe,
+   .remove = nvme_loop_driver_remove,
+};
+
+static struct device_driver nvme_loop_driverfs = {
+   .name   = "nvme_loop",
+   .bus    = &nvme_loop_bus,
+};
+
+static void nvme_loop_release_adapter(struct device *dev)
+{
+   struct nvme_loop_port 

[RFC 1/2] nvme-fabrics: Add nvmf_get_default_host helper

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch adds a nvmf_get_default_host() helper used to set
nvmf_ctrl_options->host to the internal nvmf_default_host
reference, following the existing nvmf_parse_options() code.

Note it is required for nvme-loop multi-controller support,
in order to drive nvmet_port creation directly via a configfs
attribute write from the ../nvmet/subsystems/$NQN/ports/$PORT/
group context.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Keith Busch 
Cc: Jay Freyensee 
Cc: Martin Petersen 
Cc: Sagi Grimberg 
Cc: Hannes Reinecke 
Cc: Mike Christie 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/host/fabrics.c | 13 +++--
 drivers/nvme/host/fabrics.h |  1 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index ee4b7f1..04317be 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -497,6 +497,16 @@ static struct nvmf_transport_ops *nvmf_lookup_transport(
return NULL;
 }
 
+void nvmf_get_default_host(struct nvmf_ctrl_options *opts)
+{
+   if (opts->host)
+   return;
+
+   kref_get(&nvmf_default_host->ref);
+   opts->host = nvmf_default_host;
+}
+EXPORT_SYMBOL_GPL(nvmf_get_default_host);
+
 static const match_table_t opt_tokens = {
{ NVMF_OPT_TRANSPORT,   "transport=%s"  },
{ NVMF_OPT_TRADDR,  "traddr=%s" },
@@ -686,8 +696,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options 
*opts,
}
 
if (!opts->host) {
-   kref_get(&nvmf_default_host->ref);
-   opts->host = nvmf_default_host;
+   nvmf_get_default_host(opts);
}
 
 out:
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index b540674..73ef7a7 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -128,6 +128,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl);
 int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid);
 void nvmf_register_transport(struct nvmf_transport_ops *ops);
 void nvmf_unregister_transport(struct nvmf_transport_ops *ops);
+void nvmf_get_default_host(struct nvmf_ctrl_options *opts);
 void nvmf_free_options(struct nvmf_ctrl_options *opts);
 const char *nvmf_get_subsysnqn(struct nvme_ctrl *ctrl);
 int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size);
-- 
1.9.1



[RFC 0/2] nvme/loop: Add support for controllers-per-port model

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Hi again folks,

So building on the nvmet/configfs-ng WIP posted yesterday:

http://marc.info/?l=linux-scsi&m=146528147018404&w=2

This series adds support for nvme/loop to utilize an nvme host
controller per nvmet_port, instead of a single hardcoded entry
for nvmet_port pointer access in nvme_loop_queue_rq().

It uses a struct device model following what we've done in
drivers/target/loopback to allow multiple host controllers
to co-exist under configfs, and is driven by the nvmet_port
configfs enable attribute.

Which means that any arbitrary number of nvme-loop controllers
can now exist, and each nvmet_subsystem->ports_group can
enable/disable its own loopback controller!

Here is how it looks in action for controller id 1 + 2:

root@scsi-mq:/sys/kernel/config/nvmet/subsystems# tree -if .
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/namespaces
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/namespaces/1
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/namespaces/1/ramdisk1 -> 
../../../../../target/core/rd_mcp_2/ramdisk1
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_adrfam
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_portid
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_traddr
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_treq
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_trsvcid
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/addr_trtype
./nqn.2003-01.org.linux-iscsi.NVMf.newsilicon/ports/loop/enable
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/namespaces
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/namespaces/1
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/namespaces/1/ramdisk0 -> 
../../../../../target/core/rd_mcp_1/ramdisk0
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_adrfam
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_portid
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_traddr
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_treq
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_trsvcid
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/addr_trtype
./nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep/ports/loop/enable

# cat /proc/partitions
major minor  #blocks  name

 259        0    4194304 nvme0n1
 259        1      65535 nvme1n1
 259        2      65535 nvme2n1

Comments..?

--nab

Nicholas Bellinger (2):
  nvme-fabrics: Add nvmf_get_default_host helper
  nvme/loop: Add support for controller-per-port model

 drivers/nvme/host/fabrics.c |  13 +++-
 drivers/nvme/host/fabrics.h |   1 +
 drivers/nvme/target/loop.c  | 183 +++-
 3 files changed, 177 insertions(+), 20 deletions(-)

-- 
1.9.1



[PATCH v2] scsi: ufs: fix potential memory leak

2016-06-07 Thread Tiezhu Yang
If the function ufshcd_parse_clock_info() returns an error, the memory
clkfreq allocated by devm_kzalloc() will be freed at that time. But when
ufshcd_parse_clock_info() returns 0 on success, there is a potential
memory leak; this patch fixes it.

Signed-off-by: Tiezhu Yang 
---
 drivers/scsi/ufs/ufshcd-pltfrm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c b/drivers/scsi/ufs/ufshcd-pltfrm.c
index db53f38..83f757f 100644
--- a/drivers/scsi/ufs/ufshcd-pltfrm.c
+++ b/drivers/scsi/ufs/ufshcd-pltfrm.c
@@ -122,6 +122,10 @@ static int ufshcd_parse_clock_info(struct ufs_hba *hba)
clki->min_freq, clki->max_freq, clki->name);
	list_add_tail(&clki->list, &hba->clk_list_head);
}
+
+   devm_kfree(dev, clkfreq);
+   clkfreq = NULL;
+
 out:
return ret;
 }
-- 
1.8.3.1

Re: [PATCH] scsi: ufs: fix potential memory leak

2016-06-07 Thread Tiezhu Yang
At 2016-06-08 03:12:08, "James Bottomley" 
 wrote:
>On Wed, 2016-06-08 at 02:00 +0800, Tiezhu Yang wrote:
>> There exists potential memory leak in ufshcd_parse_clock_info(),
>> this patch fixes it.
>
>What makes you think there's a leak here?  These are all devm_
>allocations, so they're all freed when the device is.  If an error is
>returned, the device is released and the memory freed.
>
>You can argue that on successful initialization, there's no need to
>keep the clkfreq array but this patch isn't the way you'd change that.
>
>James
>
OK, thanks. I will send a v2 patch.

[PATCH] lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from lpfc_send_taskmgmt()

2016-06-07 Thread Mauricio Faria de Oliveira
The lpfc_sli4_scmd_to_wqidx_distr() function expects the scsi_cmnd
'lpfc_cmd->pCmd' not to be NULL, and to point to the midlayer command.

That is not true in the .eh_(device|target|bus)_reset_handler path,
because lpfc_send_taskmgmt() sends commands that do not come from the
midlayer, so it does not set 'lpfc_cmd->pCmd'.

That is true in the .queuecommand path because lpfc_queuecommand()
stores the scsi_cmnd from the midlayer in lpfc_cmd->pCmd; and lpfc_cmd
is stored by lpfc_scsi_prep_cmnd() in piocbq->context1 -- which is
passed to lpfc_sli4_scmd_to_wqidx_distr() as the lpfc_cmd parameter.
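
In other words, the guard that is missing is roughly the following
(a sketch, not the literal patch):

    /* sketch of the missing guard in lpfc_sli4_scmd_to_wqidx_distr():
     * pCmd may be NULL when called from the taskmgmt path */
    struct scsi_cmnd *cmnd = lpfc_cmd->pCmd;

    if (!cmnd)
            return 0;       /* fall back to a default WQ index */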

This problem can be hit via SCSI EH, and immediately with sg_reset.
These two test cases demonstrate the problem/fix with next-20160601.

Test-case 1) sg_reset

# strace sg_reset --device /dev/sdm
<...>
open("/dev/sdm", O_RDWR|O_NONBLOCK) = 3
ioctl(3, SG_SCSI_RESET, 0x3fffde6d0994 
+++ killed by SIGSEGV +++
Segmentation fault

# dmesg
Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xd0001c88442c
Oops: Kernel access of bad area, sig: 11 [#1]
<...>
CPU: 104 PID: 16333 Comm: sg_reset Tainted: GW   
4.7.0-rc1-next-20160601-4-g95b89dc #6
<...>
NIP [d0001c88442c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
LR [d0001c826fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
Call Trace:
[c03c9ec876f0] [c03c9ec87770] 0xc03c9ec87770 (unreliable)
[c03c9ec87720] [d0001c82e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
[c03c9ec87780] [d0001c831a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 
[lpfc]
[c03c9ec87880] [d0001c87f27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
[c03c9ec87950] [d0001c87fd7c] lpfc_device_reset_handler+0x10c/0x200 
[lpfc]
[c03c9ec87a10] [c0610694] scsi_try_bus_device_reset+0x44/0xc0
[c03c9ec87a40] [c06113e8] scsi_ioctl_reset+0x198/0x2c0
[c03c9ec87bf0] [c060fe5c] scsi_ioctl+0x13c/0x4b0
[c03c9ec87c80] [c06629b0] sd_ioctl+0xf0/0x120
[c03c9ec87cd0] [c046e4f8] blkdev_ioctl+0x248/0xb70
[c03c9ec87d30] [c02a1f60] block_ioctl+0x70/0x90
[c03c9ec87d50] [c026d334] do_vfs_ioctl+0xc4/0x890
[c03c9ec87de0] [c026db60] SyS_ioctl+0x60/0xc0
[c03c9ec87e30] [c0009120] system_call+0x38/0x108
Instruction dump:
<...>

With fix:

# strace sg_reset --device /dev/sdm
<...>
open("/dev/sdm", O_RDWR|O_NONBLOCK) = 3
ioctl(3, SG_SCSI_RESET, 0x3fffe103c554) = 0
close(3)= 0
exit_group(0)   = ?
+++ exited with 0 +++

# dmesg
[  424.658649] lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset 
(1, 0) return x2002

Test-case 2) SCSI EH

Using this debug patch to wire up a SCSI EH trigger in
lpfc_scsi_cmd_iocb_cmpl():
-	cmd->scsi_done(cmd);
+	if ((phba->pport ? phba->pport->cfg_log_verbose
+			 : phba->cfg_log_verbose) == 0x3210)
+		printk(KERN_ALERT "lpfc: skip scsi_done()\n");
+	else
+		cmd->scsi_done(cmd);

# echo 0x3210 > /sys/class/scsi_host/host11/lpfc_log_verbose

# dd if=/dev/sdm of=/dev/null iflag=direct &
<...>

After a while:

# dmesg
lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 
839909376 (x3210)
lpfc: skip scsi_done()
<...>
Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xd000199e448c
Oops: Kernel access of bad area, sig: 11 [#1]
<...>
CPU: 96 PID: 28556 Comm: scsi_eh_11 Tainted: GW   
4.7.0-rc1-next-20160601-4-g95b89dc #6
<...>
NIP [d000199e448c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
LR [d00019986fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
Call Trace:
[c00ff0d0b890] [c00ff0d0b900] 0xc00ff0d0b900 (unreliable)
[c00ff0d0b8c0] [d0001998e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
[c00ff0d0b920] [d00019991a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 
[lpfc]
[c00ff0d0ba20] [d000199df27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
[c00ff0d0baf0] [d000199dfd7c] lpfc_device_reset_handler+0x10c/0x200 
[lpfc]
[c00ff0d0bbb0] [c0610694] scsi_try_bus_device_reset+0x44/0xc0
[c00ff0d0bbe0] [c06126cc] scsi_eh_ready_devs+0x49c/0x9c0
[c00ff0d0bcb0] [c0614160] scsi_error_handler+0x580/0x680
[c00ff0d0bd80] [c00ae848] kthread+0x108/0x130
[c00ff0d0be30] [c00094a8] ret_from_kernel_thread+0x5c/0xb4
Instruction dump:
<...>

With fix:

# dmesg
lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 
839909376 (x3210)
lpfc: skip scsi_done()
<...>
lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device 

Re: NVMe over Fabrics target implementation

2016-06-07 Thread Ming Lin
On Tue, Jun 7, 2016 at 2:02 PM, Andy Grover  wrote:
> On 06/06/2016 11:23 PM, Nicholas A. Bellinger wrote:
>>
>> Hi HCH & Co,
>>
>> On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote:
>>>
>>> This patch set adds a generic NVMe over Fabrics target. The
>>> implementation conforms to the NVMe 1.2b specification (which
>>> includes Fabrics) and provides the NVMe over Fabrics access
>>> to Linux block devices.
>>>
>>
>> Thanks for all of the development work by the fabric_linux_driver team
>> (HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year.
>>
>> Very excited to see this code get a public release now that NVMf
>> specification is out.  Now that it's in the wild, it's a good
>> opportunity to discuss some of the more interesting implementation
>> details, beyond the new NVMf wire-protocol itself.
>
>
> I'm sorry but I missed it, can you repeat the link to the NVMe spec(s)?

http://www.nvmexpress.org/wp-content/uploads/NVM_Express_1_2_1_Gold_20160603.pdf
http://www.nvmexpress.org/wp-content/uploads/NVMe_over_Fabrics_1_0_Gold_20160605.pdf


Re: NVMe over Fabrics target implementation

2016-06-07 Thread Andy Grover

On 06/06/2016 11:23 PM, Nicholas A. Bellinger wrote:

Hi HCH & Co,

On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote:

This patch set adds a generic NVMe over Fabrics target. The
implementation conforms to the NVMe 1.2b specification (which
includes Fabrics) and provides the NVMe over Fabrics access
to Linux block devices.



Thanks for all of the development work by the fabric_linux_driver team
(HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year.

Very excited to see this code get a public release now that NVMf
specification is out.  Now that it's in the wild, it's a good
opportunity to discuss some of the more interesting implementation
details, beyond the new NVMf wire-protocol itself.


I'm sorry but I missed it, can you repeat the link to the NVMe spec(s)?

Thanks -- Andy



[PATCH v2 3/4] Add ioctl to issue ZBC/ZAC commands via block layer

2016-06-07 Thread Shaun Tancheff
Add new ioctl types:
BLKREPORT    - Issue Report Zones to device.
BLKOPENZONE  - Issue a Zone Action: Open Zone command.
BLKCLOSEZONE - Issue a Zone Action: Close Zone command.
BLKRESETZONE - Issue a Zone Action: Reset Zone command.
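
A hypothetical userland invocation of the proposed BLKREPORT ioctl,
using the bdev_zone_report_io fields this patch reads, might look like:

    /* hypothetical usage sketch; fd is an O_RDONLY open of the block
     * device, and the uapi header providing BLKREPORT is included */
    struct bdev_zone_report_io *rpt = calloc(1, 64 * 4096);

    rpt->data.in.zone_locator_lba  = 0;         /* report from start of disk */
    rpt->data.in.return_page_count = 64 * 4096; /* size of the result buffer */
    rpt->data.in.report_option     = 0;         /* default: all zones */

    if (ioctl(fd, BLKREPORT, rpt) < 0)
            perror("BLKREPORT");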

Signed-off-by: Shaun Tancheff 
---
V2:
 - Added include/uapi/linux/fs.h
 - Removed REQ_META flag from this patch.

 block/ioctl.c | 110 ++
 include/uapi/linux/blkzoned_api.h |   6 +++
 include/uapi/linux/fs.h   |   1 +
 3 files changed, 117 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..97f45f5 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -194,6 +195,109 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+   void __user *parg)
+{
+   int error = -EFAULT;
+   gfp_t gfp = GFP_KERNEL;
+   struct bdev_zone_report_io *zone_iodata = NULL;
+   int order = 0;
+   struct page *pgs = NULL;
+   u32 alloc_size = PAGE_SIZE;
+   unsigned long op_flags = 0;
+   u8 opt = 0;
+
+   if (!(mode & FMODE_READ))
+   return -EBADF;
+
+   zone_iodata = (void *)get_zeroed_page(gfp);
+   if (!zone_iodata) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   if (copy_from_user(zone_iodata, parg, sizeof(*zone_iodata))) {
+   error = -EFAULT;
+   goto report_zones_out;
+   }
+   if (zone_iodata->data.in.return_page_count > alloc_size) {
+   int npages;
+
+   alloc_size = zone_iodata->data.in.return_page_count;
+   npages = (alloc_size + PAGE_SIZE - 1) / PAGE_SIZE;
+   order =  ilog2(roundup_pow_of_two(npages));
+   pgs = alloc_pages(gfp, order);
+   if (pgs) {
+   void *mem = page_address(pgs);
+
+   if (!mem) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   memset(mem, 0, alloc_size);
+   memcpy(mem, zone_iodata, sizeof(*zone_iodata));
+   free_page((unsigned long)zone_iodata);
+   zone_iodata = mem;
+   } else {
+   /* Result requires DMA capable memory */
+   pr_err("Not enough memory available for request.\n");
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   }
+   opt = zone_iodata->data.in.report_option & 0x7F;
+   error = blkdev_issue_zone_report(bdev, op_flags,
+   zone_iodata->data.in.zone_locator_lba, opt,
+   pgs ? pgs : virt_to_page(zone_iodata),
+   alloc_size, GFP_KERNEL);
+
+   if (error)
+   goto report_zones_out;
+
+   if (copy_to_user(parg, zone_iodata, alloc_size))
+   error = -EFAULT;
+
+report_zones_out:
+   if (pgs)
+   __free_pages(pgs, order);
+   else if (zone_iodata)
+   free_page((unsigned long)zone_iodata);
+   return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+   unsigned long op_flags = 0;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   /*
+* When acting on zones we explicitly disallow using a partition.
+*/
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
+   switch (cmd) {
+   case BLKOPENZONE:
+   op_flags |= REQ_OPEN_ZONE;
+   break;
+   case BLKCLOSEZONE:
+   op_flags |= REQ_CLOSE_ZONE;
+   break;
+   case BLKRESETZONE:
+   op_flags |= REQ_RESET_ZONE;
+   break;
+   default:
+   pr_err("%s: Unknown action: %u\n", __func__, cmd);
+   WARN_ON(1);
+   }
+   return blkdev_issue_zone_action(bdev, op_flags, arg, GFP_KERNEL);
+}
+
 static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
unsigned long arg, unsigned long flags)
 {
@@ -568,6 +672,12 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, 
unsigned cmd,
case BLKTRACESETUP:
case BLKTRACETEARDOWN:
return blk_trace_ioctl(bdev, cmd, argp);
+   case BLKREPORT:
+   return blk_zoned_report_ioctl(bdev, mode, argp);
+   case BLKOPENZONE:
+   case BLKCLOSEZONE:
+   case BLKRESETZONE:
+   

[PATCH v2 4/4] Add ata pass-through path for ZAC commands.

2016-06-07 Thread Shaun Tancheff
The current generation of HBA SAS adapters support connecting SATA
drives and perform SCSI<->ATA translation in hardware.
Unfortunately, the ZBC commands are not being translated (yet).

Currently, users of SAS controllers can only send ZAC commands via
ATA pass-through.

This method overloads the meaning of REQ_META to direct ZBC commands
to be constructed as ZAC-equivalent ATA pass-through commands.
Note also that this approach expects the initiator to deal with the
little-endian result, due to bypassing the normal translation layers.
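
Per the ioctl hunk below, the low bit of the zone-action argument is
what selects this pass-through path; hypothetical userland usage:

    /* hypothetical sketch: a set low bit forces the ATA_16 (REQ_META) path */
    if (ioctl(fd, BLKOPENZONE, zone_start_lba | 1) < 0)
            perror("BLKOPENZONE via ATA pass-through");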

Signed-off-by: Shaun Tancheff 
---
v2:
 - Added REQ_META to op_flags if high bit is set in opt.

 block/ioctl.c   | 32 
 drivers/scsi/sd.c   | 70 +
 include/linux/ata.h | 15 
 3 files changed, 102 insertions(+), 15 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 97f45f5..c853c6f 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -245,6 +245,9 @@ static int blk_zoned_report_ioctl(struct block_device 
*bdev, fmode_t mode,
}
}
opt = zone_iodata->data.in.report_option & 0x7F;
+   if (zone_iodata->data.in.report_option & ZOPT_USE_ATA_PASS)
+   op_flags |= REQ_META;
+
error = blkdev_issue_zone_report(bdev, op_flags,
zone_iodata->data.in.zone_locator_lba, opt,
pgs ? pgs : virt_to_page(zone_iodata),
@@ -281,6 +284,35 @@ static int blk_zoned_action_ioctl(struct block_device 
*bdev, fmode_t mode,
return -EFAULT;
}
 
+   /*
+    * When the low bit is set, force ATA pass-through to try to work
+    * around older SAS HBA controllers that don't support ZBC to ZAC
+    * translation.
+    *
+    * When the low bit is clear, follow the normal path, but also
+    * correct for ~0ul LBA meaning 'for all lbas'.
+    *
+    * NB: We should do extra checking here to see if the user specified
+    * the entire block device as opposed to a partition of the
+    * device.
+    */
+   if (arg & 1) {
+   op_flags |= REQ_META;
+   if (arg != ~0ul)
+   arg &= ~1ul; /* ~1 :: 0xFF...FE */
+   } else {
+   if (arg == ~1ul)
+   arg = ~0ul;
+   }
+
+   /*
+* When acting on zones we explicitly disallow using a partition.
+*/
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
switch (cmd) {
case BLKOPENZONE:
op_flags |= REQ_OPEN_ZONE;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 06b54d5..cf96f01 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1182,12 +1183,29 @@ static int sd_setup_zoned_cmnd(struct scsi_cmnd *cmd)
 
cmd->cmd_len = 16;
memset(cmd->cmnd, 0, cmd->cmd_len);
-   cmd->cmnd[0] = ZBC_IN;
-   cmd->cmnd[1] = ZI_REPORT_ZONES;
-   put_unaligned_be64(sector, &cmd->cmnd[2]);
-   put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
-   /* FUTURE ... when streamid is available */
-   /* cmd->cmnd[14] = bio_get_streamid(bio); */
+   if (rq->cmd_flags & REQ_META) {
+   cmd->cmnd[0] = ATA_16;
+   cmd->cmnd[1] = (0x6 << 1) | 1;
+   cmd->cmnd[2] = 0x0e;
+   /* FUTURE ... when streamid is available */
+   /* cmd->cmnd[3] = bio_get_streamid(bio); */
+   cmd->cmnd[4] = ATA_SUBCMD_ZAC_MGMT_IN_REPORT_ZONES;
+   cmd->cmnd[5] = ((nr_bytes / 512) >> 8) & 0xff;
+   cmd->cmnd[6] = (nr_bytes / 512) & 0xff;
+
+   _lba_to_cmd_ata(&cmd->cmnd[7], sector);
+
+   cmd->cmnd[13] = 1 << 6;
+   cmd->cmnd[14] = ATA_CMD_ZAC_MGMT_IN;
+   } else {
+   cmd->cmnd[0] = ZBC_IN;
+   cmd->cmnd[1] = ZI_REPORT_ZONES;
+   put_unaligned_be64(sector, &cmd->cmnd[2]);
+   put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
+   /* FUTURE ... when streamid is available */
+   /* cmd->cmnd[14] = bio_get_streamid(bio); */
+   }
+
cmd->sc_data_direction = DMA_FROM_DEVICE;
cmd->sdb.length = nr_bytes;
cmd->transfersize = sdp->sector_size;
@@ -1208,14 +1226,29 @@ static int sd_setup_zoned_cmnd(struct scsi_cmnd *cmd)
cmd->cmd_len = 16;
memset(cmd->cmnd, 0, cmd->cmd_len);
	memset(&cmd->sdb, 0, sizeof(cmd->sdb));
-   cmd->cmnd[0] = ZBC_OUT;
-   cmd->cmnd[1] = ZO_OPEN_ZONE;
-

[PATCH v2 2/4] Add bio/request flags for using ZBC/ZAC commands

2016-06-07 Thread Shaun Tancheff
T10 ZBC and T13 ZAC specify operations for Zoned devices.

To be able to access the zone information and to open and close zones,
flags for the Report Zones command (REQ_REPORT_ZONES) and for Open and
Close Zone (REQ_OPEN_ZONE and REQ_CLOSE_ZONE) are added for use by
struct bio's bi_rw and by struct request's cmd_flags.

To reduce the number of additional flags needed, REQ_RESET_ZONE shares
the same flag as REQ_REPORT_ZONES and is differentiated by direction:
Report Zones is a device read that requires a buffer, while Reset is a
device command (WRITE) that has no associated data transfer.

The Finish Zone command is intentionally not implemented, as there is
no current use case for that operation.

Report Zones currently defaults to reporting on all zones. It is
expected that support for the zone option flag will piggy-back on
streamid support. The report option is useful as it can reduce the
number of zones in each report, but it is not critical.

Signed-off-by: Shaun Tancheff 
---
V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Removed include/uapi/linux/fs.h from this patch.

 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  97 +
 drivers/scsi/sd.c |  99 +-
 drivers/scsi/sd.h |   1 +
 include/linux/blk_types.h |  19 +++-
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 215 ++
 8 files changed, 461 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

diff --git a/MAINTAINERS b/MAINTAINERS
index ed42cb6..d9fafa2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12662,6 +12662,15 @@ F: Documentation/networking/z8530drv.txt
 F: drivers/net/hamradio/*scc.c
 F: drivers/net/hamradio/z8530.h
 
+ZBC AND ZBC BLOCK DEVICES
+M: Shaun Tancheff 
+W: http://seagate.com
+W: https://github.com/Seagate/ZDM-Device-Mapper
+L: linux-bl...@vger.kernel.org
+S: Maintained
+F: include/linux/blkzoned_api.h
+F: include/uapi/linux/blkzoned_api.h
+
 ZBUD COMPRESSED PAGE ALLOCATOR
 M: Seth Jennings 
 L: linux...@kvack.org
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 23d7f30..ce4168a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "blk.h"
 
@@ -249,3 +250,99 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zone_report - queue a report zones operation
+ * @bdev:  target blockdev
+ * @op_flags:  extra bio rw flags. If unsure, use 0.
+ * @sector:starting sector (report will include this sector).
+ * @opt:   See: zone_report_option, default is 0 (all zones).
+ * @page:  one or more contiguous pages.
+ * @pgsz:  up to size of page in bytes, size of report.
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Issue a zone report request for the sectors in question.
+ */
+int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags,
+sector_t sector, u8 opt, struct page *page,
+size_t pgsz, gfp_t gfp_mask)
+{
+   struct bdev_zone_report *conv = page_address(page);
+   struct bio *bio;
+   unsigned int nr_iovecs = 1;
+   int ret = 0;
+
+   if (pgsz < (sizeof(struct bdev_zone_report) +
+   sizeof(struct bdev_zone_descriptor)))
+   return -EINVAL;
+
+   bio = bio_alloc(gfp_mask, nr_iovecs);
+   if (!bio)
+   return -ENOMEM;
+
+   conv->descriptor_count = 0;
+   bio->bi_iter.bi_sector = sector;
+   bio->bi_bdev = bdev;
+   bio->bi_vcnt = 0;
+   bio->bi_iter.bi_size = 0;
+
+   op_flags |= REQ_REPORT_ZONES;
+
+   /* FUTURE ... when streamid is available: */
+   /* bio_set_streamid(bio, opt); */
+
+   bio_add_page(bio, page, pgsz, 0);
+   ret = submit_bio_wait(READ | op_flags, bio);
+
+   /*
+    * When our request is nak'd the underlying device may be conventional
+    * so ... report a single conventional zone the size of the device.
+    */
+   if (ret == -EIO && conv->descriptor_count) {
+   /* Adjust the conventional to the size of the partition ... */
+   __be64 blksz = cpu_to_be64(bdev->bd_part->nr_sects);
+
+   conv->maximum_lba = blksz;
+   conv->descriptors[0].type = ZTYP_CONVENTIONAL;
+   conv->descriptors[0].flags = ZCOND_CONVENTIONAL << 4;
+  

[PATCH v2 0/4] Block layer support ZAC/ZBC commands

2016-06-07 Thread Shaun Tancheff
As Host Aware drives are becoming available, we would like to be able
to make use of such drives. This series is also intended to be suitable
for use by Host Managed drives.

ZAC/ZBC drives add new commands for discovering and working with Zones.

This extends the ZAC/ZBC support up to the block layer.

The first patch in the series is a place-holder for Mike Christie's
separate operations from flags series ...
https://lkml.kernel.org/r/1465155145-10812-1-git-send-email-mchri...@redhat.com
Once that work is completed, the first patch can be dropped.

Patches for util-linux can be found here:
https://github.com/Seagate/ZDM-Device-Mapper/tree/master/patches/util-linux

Using BIOs to issue ZBC commands allows DM targets (such as ZDM) or
file-systems such as btrfs or nilfs2 to extend their block allocation
schemes and issue discards that are zone aware.

A perhaps non-obvious approach is that a conventional drive will
return a descriptor with a single large conventional zone.
This patch is also at
https://github.com/stancheff/linux.git
g...@github.com:stancheff/linux.git
branch: v4.7-rc2+bio.zbc.v2

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Moved include/uapi/linux/fs.h changes to patch 3
 - Fixed commit message for first patch in series.

Shaun Tancheff (4):
  Losing bits on request.cmd_flags
  Add bio/request flags for using ZBC/ZAC commands
  Add ioctl to issue ZBC/ZAC commands via block layer
  Add ata pass-through path for ZAC commands.

 MAINTAINERS   |   9 ++
 block/blk-core.c  |  17 +--
 block/blk-lib.c   |  97 +
 block/blk-merge.c |   2 +-
 block/blk-mq.c|   2 +-
 block/cfq-iosched.c   |   2 +-
 block/elevator.c  |   4 +-
 block/ioctl.c | 142 
 drivers/scsi/sd.c | 141 +++-
 drivers/scsi/sd.h |   1 +
 include/linux/ata.h   |  15 +++
 include/linux/blk_types.h |  19 +++-
 include/linux/blkzoned_api.h  |  25 +
 include/linux/elevator.h  |   4 +-
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 221 ++
 include/uapi/linux/fs.h   |   1 +
 17 files changed, 683 insertions(+), 20 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

-- 
2.8.1



[PATCH v2 1/4] Losing bits on request.cmd_flags

2016-06-07 Thread Shaun Tancheff
In a few places a temporary value smaller than cmd_flags is used to
test for bits or to build up a new cmd_flags.

Change these to use explicit u64 values where appropriate.
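
The failure mode is a plain implicit narrowing, illustrated below (the
flag value is made up):

    /* illustrative only: narrowing to unsigned int drops high flag bits */
    u64 cmd_flags = 1ULL << 32;         /* e.g. a future high-bit REQ_ flag */
    unsigned int flags = cmd_flags;     /* upper 32 bits silently lost */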

This patch is a placeholder for Mike Christie's separate operations ... series
https://lkml.kernel.org/r/1465155145-10812-1-git-send-email-mchri...@redhat.com

Once Mike's patches are stabilized, this patch can be dropped.

Signed-off-by: Shaun Tancheff 
---

v2:
 - Fixed commit message

 block/blk-core.c | 17 ++---
 block/blk-merge.c|  2 +-
 block/blk-mq.c   |  2 +-
 block/cfq-iosched.c  |  2 +-
 block/elevator.c |  4 ++--
 include/linux/elevator.h |  4 ++--
 6 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2475b1c7..945e564 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -959,7 +959,7 @@ static void __freed_request(struct request_list *rl, int 
sync)
  * A request has just been released.  Account for it, update the full and
  * congestion status, wake up any waiters.   Called under q->queue_lock.
  */
-static void freed_request(struct request_list *rl, unsigned int flags)
+static void freed_request(struct request_list *rl, u64 flags)
 {
struct request_queue *q = rl->q;
int sync = rw_is_sync(flags);
@@ -1054,7 +1054,7 @@ static struct io_context *rq_ioc(struct bio *bio)
 /**
  * __get_request - get a free request
  * @rl: request list to allocate from
- * @rw_flags: RW and SYNC flags
+ * @rw: RW and SYNC flags
  * @bio: bio to allocate request for (can be %NULL)
  * @gfp_mask: allocation mask
  *
@@ -1065,7 +1065,7 @@ static struct io_context *rq_ioc(struct bio *bio)
  * Returns ERR_PTR on failure, with @q->queue_lock held.
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
-static struct request *__get_request(struct request_list *rl, int rw_flags,
+static struct request *__get_request(struct request_list *rl, unsigned long rw,
 struct bio *bio, gfp_t gfp_mask)
 {
struct request_queue *q = rl->q;
@@ -1073,6 +1073,7 @@ static struct request *__get_request(struct request_list 
*rl, int rw_flags,
struct elevator_type *et = q->elevator->type;
struct io_context *ioc = rq_ioc(bio);
struct io_cq *icq = NULL;
+   u64 rw_flags = rw;
const bool is_sync = rw_is_sync(rw_flags) != 0;
int may_queue;
 
@@ -1237,7 +1238,8 @@ rq_starved:
  * Returns ERR_PTR on failure, with @q->queue_lock held.
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
-static struct request *get_request(struct request_queue *q, int rw_flags,
+static struct request *get_request(struct request_queue *q,
+  unsigned long rw_flags,
   struct bio *bio, gfp_t gfp_mask)
 {
const bool is_sync = rw_is_sync(rw_flags) != 0;
@@ -1490,7 +1492,7 @@ void __blk_put_request(struct request_queue *q, struct 
request *req)
 * it didn't come out of our reserved rq pools
 */
if (req->cmd_flags & REQ_ALLOCED) {
-   unsigned int flags = req->cmd_flags;
+   u64 flags = req->cmd_flags;
struct request_list *rl = blk_rq_rl(req);
 
	BUG_ON(!list_empty(&req->queuelist));
@@ -1712,7 +1714,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, 
struct bio *bio)
 {
const bool sync = !!(bio->bi_rw & REQ_SYNC);
struct blk_plug *plug;
-   int el_ret, rw_flags, where = ELEVATOR_INSERT_SORT;
+   u64 rw_flags;
+   int el_ret, where = ELEVATOR_INSERT_SORT;
struct request *req;
unsigned int request_count = 0;
 
@@ -2246,7 +2249,7 @@ EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
  */
 unsigned int blk_rq_err_bytes(const struct request *rq)
 {
-   unsigned int ff = rq->cmd_flags & REQ_FAILFAST_MASK;
+   u64 ff = rq->cmd_flags & REQ_FAILFAST_MASK;
unsigned int bytes = 0;
struct bio *bio;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2613531..fec37e1 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -604,7 +604,7 @@ static int ll_merge_requests_fn(struct request_queue *q, 
struct request *req,
  */
 void blk_rq_set_mixed_merge(struct request *rq)
 {
-   unsigned int ff = rq->cmd_flags & REQ_FAILFAST_MASK;
+   u64 ff = rq->cmd_flags & REQ_FAILFAST_MASK;
struct bio *bio;
 
if (rq->cmd_flags & REQ_MIXED_MERGE)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 29cbc1b..db962bc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -159,7 +159,7 @@ bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
 EXPORT_SYMBOL(blk_mq_can_queue);
 
 static void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
-  struct request *rq, unsigned int rw_flags)
+  struct request *rq, u64 rw_flags)
 {
if 

Re: [PATCH] scsi: ufs: fix potential memory leak

2016-06-07 Thread James Bottomley
On Wed, 2016-06-08 at 02:00 +0800, Tiezhu Yang wrote:
> There exists potential memory leak in ufshcd_parse_clock_info(),
> this patch fixes it.

What makes you think there's a leak here?  These are all devm_
allocations, so they're all freed when the device is.  If an error is
returned, the device is released and the memory freed.

You can argue that on successful initialization, there's no need to
keep the clkfreq array but this patch isn't the way you'd change that.
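
In other words, the devm_ lifetime works roughly like this (a sketch):

    /* sketch of the devm_ lifetime: no explicit free is required */
    buf = devm_kzalloc(dev, len, GFP_KERNEL);   /* tied to dev's lifetime */
    if (!buf)
            return -ENOMEM;
    /* ... use buf ... */
    /* buf is freed automatically on probe failure or device removal;
     * devm_kfree(dev, buf) is only an optional early release */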

James

> Signed-off-by: Tiezhu Yang 
> ---
>  drivers/scsi/ufs/ufshcd-pltfrm.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c
> b/drivers/scsi/ufs/ufshcd-pltfrm.c
> index db53f38..8b057f4 100644
> --- a/drivers/scsi/ufs/ufshcd-pltfrm.c
> +++ b/drivers/scsi/ufs/ufshcd-pltfrm.c
> @@ -100,19 +100,19 @@ static int ufshcd_parse_clock_info(struct
> ufs_hba *hba)
>   if (ret && (ret != -EINVAL)) {
>   dev_err(dev, "%s: error reading array %d\n",
>   "freq-table-hz", ret);
> - return ret;
> + goto out_free;
>   }
>  
>   for (i = 0; i < sz; i += 2) {
> 	ret = of_property_read_string_index(np,
> 			"clock-names", i/2, (const char **)&name);
>   if (ret)
> - goto out;
> + goto out_free;
>  
>   clki = devm_kzalloc(dev, sizeof(*clki), GFP_KERNEL);
>   if (!clki) {
>   ret = -ENOMEM;
> - goto out;
> + goto out_free;
>   }
>  
>   clki->min_freq = clkfreq[i];
> @@ -122,6 +122,10 @@ static int ufshcd_parse_clock_info(struct
> ufs_hba *hba)
>   clki->min_freq, clki->max_freq, clki
> ->name);
> 	list_add_tail(&clki->list, &hba->clk_list_head);
>   }
> +
> +out_free:
> + devm_kfree(dev, clkfreq);
> + clkfreq = NULL;
>  out:
>   return ret;
>  }



[PATCH] scsi: ufs: fix potential memory leak

2016-06-07 Thread Tiezhu Yang
There exists a potential memory leak in ufshcd_parse_clock_info();
this patch fixes it.

Signed-off-by: Tiezhu Yang 
---
 drivers/scsi/ufs/ufshcd-pltfrm.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c b/drivers/scsi/ufs/ufshcd-pltfrm.c
index db53f38..8b057f4 100644
--- a/drivers/scsi/ufs/ufshcd-pltfrm.c
+++ b/drivers/scsi/ufs/ufshcd-pltfrm.c
@@ -100,19 +100,19 @@ static int ufshcd_parse_clock_info(struct ufs_hba *hba)
if (ret && (ret != -EINVAL)) {
dev_err(dev, "%s: error reading array %d\n",
"freq-table-hz", ret);
-   return ret;
+   goto out_free;
}
 
for (i = 0; i < sz; i += 2) {
	ret = of_property_read_string_index(np,
			"clock-names", i/2, (const char **)&name);
if (ret)
-   goto out;
+   goto out_free;
 
clki = devm_kzalloc(dev, sizeof(*clki), GFP_KERNEL);
if (!clki) {
ret = -ENOMEM;
-   goto out;
+   goto out_free;
}
 
clki->min_freq = clkfreq[i];
@@ -122,6 +122,10 @@ static int ufshcd_parse_clock_info(struct ufs_hba *hba)
clki->min_freq, clki->max_freq, clki->name);
	list_add_tail(&clki->list, &hba->clk_list_head);
}
+
+out_free:
+   devm_kfree(dev, clkfreq);
+   clkfreq = NULL;
 out:
return ret;
 }
-- 
1.8.3.1

Re: Partitions not detected with firewire - 36 bytes offset - PL-3507

2016-06-07 Thread Stefan Richter
Adding Cc: lsml, quoting your message in full.

On Jun 02 Fab Stz wrote at linux1394-user:
> Hello
> 
> I have an external hard drive box (brand SQP: www.sqp.fr, more details
> below) with USB & FireWire support. The chipset is a PL-3507.
> 
> On Mac OS X or Windows (e.g. win10), if I connect through USB or FW, my
> partitions are displayed and the disk is well detected. In Linux (kernel
> 4.5.3) with USB, the disk and partitions are well detected. However, when
> connected through FireWire, the disk is well detected but the partitions
> are not.
> 
> By investigating the issue I came to that thread
> [ https://sourceforge.net/p/linux1394/mailman/message/13218958/ ], which
> I read carefully. I also observe an offset when running dd on the disk:
> there are 36 additional bytes (value: 0x00) at the beginning of the disk
> when connected through FW compared to USB. That may explain the fact
> that the partitions are not found. BTW, it is a GUID Partition Table in
> my case.
> 
> I checked the firmware; it is dated 2004/Oct/06. According to the
> release notes of the firmware, there was a fix in firmware 2004/Mar/24
> labeled "Fixed cannot mount HDD in Linux RedHat. Must use kernel
> v2.4.20 and above. [1394]". So I supposed upgrading wouldn't bring
> anything. Nevertheless, I tried upgrading the firmware to a more recent
> one [2004/Nov/09 or even 2006/Apr/20], but the upgrade was rejected. The
> above thread mentioned that some boards have a zero-ohm resistor
> preventing upgrade. I searched for the zero-ohm resistor R72
> (http://club.cdfreaks.com/showthread.php?p=957178), but on my board
> there isn't any. So I don't see how to upgrade the firmware.
> 
> My questions are:
> - could it be a bug in the Linux driver?
> - is it that my device is not well detected? Or is there a workaround
> in the kernel for buggy devices which doesn't match my particular
> device? Its vendor/device IDs when connected via USB are idVendor=0025,
> idProduct=3507 (usually for Prolific it is idVendor=067b, idProduct=3507)
> - do you have any other clue to make it work, or upgrade the firmware,
> or determine which zero-ohm resistor has to be removed?
> - is there some way to tell Linux to start reading the disk with an
> offset of 36 bytes, or to ignore these 36 bytes?
> 
> You can find below all technical details.
> 
> Thank you

First a few theoretical thoughts on why the device works under Windows and
OS X but not under Linux:

The first SCSI request which the Linux kernel issues contains the INQUIRY
command.  You are seeing results from INQUIRY in the dmesg portion below.
There is for example a quirk parameter which controls how the kernel reads
the INQUIRY response, but it seems to me that this is not relevant here.
(BLIST_INQUIRY_36 from
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/scsi/scsi_devinfo.h)

Next you are also seeing results from a command which the kernel sent
subsequently, MODE SENSE.  MODE SENSE can request various so-called mode
pages.  Most of them are optional, and some buggy firmwares react badly
when asked for a mode page that they don't know about.  There are some
quirk flags which relate to that sort of device bugs
(BLIST_MS_SKIP_PAGE_08, BLIST_MS_SKIP_PAGE_3F, BLIST_USE_10_BYTE_MS,
BLIST_MS_192_BYTES_FOR_3F) but I doubt that you need any of those.

Next, the sd driver should issue READ_CAPACITY to determine medium size
(or maybe this happened already before MODE SENSE, I would have to look
that up in drivers/scsi/sd.c).  Then the block layer(?) should issue a
READ command of some sort in order to fetch the partition table.

If the Linux kernel performs any of the above in even the tiniest
different way than Windows does it, a buggy firmware like the typical
firmwares of the FireWire portion of the PL-3507 will start sending
malformed results or flat out freeze.

AFAIR, after this series of commands which the kernel issues, the
subsequent commands come from userland.

And again, if Linux userland sends commands which the Windows kernel or
Windows userland do not send (or do not send with such and such parameters,
or in this or that particular order), PL-3507 firmwares will take offense
and go belly-up.

One example would be that userland sends another INQUIRY request, which
would be perfectly standards-conforming but is highly dangerous for the
PL-3507.
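
For reference, here is a minimal sketch (error handling trimmed) of how a
userland helper such as udev's scsi_id can end up issuing its own INQUIRY
through the SG_IO ioctl:

	#include <fcntl.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <scsi/sg.h>

	static int send_inquiry(const char *dev)
	{
		unsigned char cdb[6] = { 0x12, 0, 0, 0, 36, 0 }; /* INQUIRY, 36 bytes */
		unsigned char buf[36], sense[32];
		struct sg_io_hdr hdr;
		int fd, ret;

		fd = open(dev, O_RDONLY | O_NONBLOCK);
		if (fd < 0)
			return -1;

		memset(&hdr, 0, sizeof(hdr));
		hdr.interface_id = 'S';
		hdr.dxfer_direction = SG_DXFER_FROM_DEV;
		hdr.cmd_len = sizeof(cdb);
		hdr.cmdp = cdb;
		hdr.dxfer_len = sizeof(buf);
		hdr.dxferp = buf;
		hdr.mx_sb_len = sizeof(sense);
		hdr.sbp = sense;
		hdr.timeout = 5000; /* milliseconds */

		ret = ioctl(fd, SG_IO, &hdr);
		close(fd);
		return ret;
	}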

The 36 byte offset which you observed happens to be the same number of
bytes as the length of a typical INQUIRY response buffer.  Coincidence?
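
As a quick, unverified userspace experiment for your last question, you
could map the disk through a loop device with a 36 byte offset and let
the partition scan run on that:

	losetup -o 36 -P -f --show /dev/sdX

(-o sets the byte offset, -P triggers a partition rescan of the loop
device.)  If the GPT then validates, that would support the theory that
the bridge shifts everything by a constant 36 bytes.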

So,
  - the SCSI layer and block layer of the Linux kernel could issue commands
which the PL-3507's firmware cannot handle, but this can be influenced
to some degree by specifying quirk flags to SCSI core; see the sketch
below.  (I don't remember right now how to do that precisely.)
  - Or Linux userland could issue commands which crash the firmware.
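
Regarding the first point, a sketch of how quirk flags can usually be fed
to SCSI core without recompiling: the scsi_mod.dev_flags= boot/module
parameter takes vendor:model:flags triples, where flags is the numeric OR
of BLIST_* bits from include/scsi/scsi_devinfo.h, e.g. (the vendor/model
strings and the flags value below are made-up placeholders):

	scsi_mod.dev_flags=PROLIFIC:PL-3507:0x2000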

To investigate the latter possibility, you could boot into a minimal
single-user command line and try to use the disk from there.  Consult your
distribution's documentation on how to do that.

Re: [PATCH v3 0/6] Introduce pci_(request|release)_(mem|io)_regions

2016-06-07 Thread Christoph Hellwig
The whole series looks fine to me:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2] scsi_debug: fix sleep in invalid context

2016-06-07 Thread Christoph Hellwig
> +static int p_fill_from_dev_buffer(struct scsi_cmnd *scp, const void *arr,
> +   int arr_len, unsigned int off_dst)
> +{
> + int act_len, n;
> + struct scsi_data_buffer *sdb = scsi_in(scp);
> + off_t skip = off_dst;

Why off_t, which is a signed type, instead of the unsigned int passed in?
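
i.e. presumably something like this (a sketch of the suggested change):

	unsigned int skip = off_dst;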

> +#define RL_BUCKET_ELEMS 8
> +
>  /* Even though each pseudo target has a REPORT LUNS "well known logical unit"
>   * (W-LUN), the normal Linux scanning logic does not associate it with a
>   * device (e.g. /dev/sg7). The following magic will make that association:
> @@ -3285,12 +3315,14 @@ static int resp_report_luns(struct scsi_cmnd *scp,
>   unsigned char select_report;
>   u64 lun;
>   struct scsi_lun *lun_p;
> - u8 *arr;
> + u8 arr[RL_BUCKET_ELEMS * sizeof(struct scsi_lun)];

just use an on-stack array of type struct scsi_lun here, e.g.:

struct scsi_lun arr[RL_BUCKET_ELEMS];

which you can then use directly instead of lun_p later, and which can
also be passed to p_fill_from_dev_buffer() as that takes a void pointer.


Re: NVMe over Fabrics target implementation

2016-06-07 Thread Christoph Hellwig
There is absolutely no point in dragging in an overcomplicated configfs
structure for a very simple protocol which also is very different from
SCSI in its nitty-gritty details.  Keeping the nvme target self-contained
allows it to be both much simpler and much easier to understand, as well
as much more testable - see the amount of test coverage we could easily
add, for example.

Or to put it the other way around - if there was any major synergy in
reusing the SCSI target code that just shows we're missing functionality
in the block layer or configfs.

The only area where this might be the case mid-term is persistent
reservations, but Mike Christie already has a plan for a pluggable PR API,
which I'm very interested in.


[PATCH v3 0/6] Introduce pci_(request|release)_(mem|io)_regions

2016-06-07 Thread Johannes Thumshirn
The first patch in this series introduces the following 4 helper functions to
the PCI core:

* pci_request_mem_regions()
* pci_request_io_regions()
* pci_release_mem_regions()
* pci_release_io_regions()

which encapsulate the request and release of a PCI device's memory or I/O
BARs.
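
Presumably (a sketch, not the final patch text) the helpers boil down to
static inline wrappers around the old pattern, e.g. for the MEM pair:

	static inline int pci_request_mem_regions(struct pci_dev *pdev,
						  const char *name)
	{
		return pci_request_selected_regions(pdev,
				pci_select_bars(pdev, IORESOURCE_MEM), name);
	}

	static inline void pci_release_mem_regions(struct pci_dev *pdev)
	{
		return pci_release_selected_regions(pdev,
				pci_select_bars(pdev, IORESOURCE_MEM));
	}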

The subsequent patches convert the drivers which use the

	pci_request_selected_regions(pdev,
			pci_select_bars(pdev, IORESOURCE_MEM), name);

pattern (and similar) to use the new interface.

This was suggested by Christoph Hellwig in
http://lists.infradead.org/pipermail/linux-nvme/2016-May/004570.html and
tested on kernel v4.6 with NVMe.

The conversion of the drivers has been performed by the following coccinelle
spatch:
// IORESOURCE_MEM
@@
expression err, pdev, name;
@@

- err = pci_request_selected_regions(pdev, pci_select_bars(pdev, IORESOURCE_MEM), name);
+ err = pci_request_mem_regions(pdev, name);

@@
expression pdev;
@@
- pci_release_selected_regions(pdev, pci_select_bars(pdev, IORESOURCE_MEM));
+ pci_release_mem_regions(pdev);

@@
expression err, pdev, name;
identifier bars;
@@
- bars = pci_select_bars(pdev, IORESOURCE_MEM);
...
- err = pci_request_selected_regions(pdev, bars, name);
+ err = pci_request_mem_regions(pdev, name);

@@
expression pdev;
identifier bars;
@@
- bars = pci_select_bars(pdev, IORESOURCE_MEM);
...
- pci_release_selected_regions(pdev, bars);
+ pci_release_mem_regions(pdev);

// IORESOURCE_IO
@@
expression err, pdev, name;
@@

- err = pci_request_selected_regions(pdev, pci_select_bars(pdev, IORESOURCE_IO), name);
+ err = pci_request_io_regions(pdev, name);

@@
expression pdev;
@@
- pci_release_selected_regions(pdev, pci_select_bars(pdev, IORESOURCE_IO));
+ pci_release_io_regions(pdev);

@@
expression err, pdev, name;
identifier bars;
@@
- bars = pci_select_bars(pdev, IORESOURCE_IO);
...
- err = pci_request_selected_regions(pdev, bars, name);
+ err = pci_request_io_regions(pdev, name);

@@
expression pdev;
identifier bars;
@@
- bars = pci_select_bars(pdev, IORESOURCE_IO);
...
- pci_release_selected_regions(pdev, bars);
+ pci_release_io_regions(pdev);

Changes since v2:
* Fixed compilation error on platforms with CONFIG_PCI=n
* Added Jeff's Acked-by on the Intel ethernet patch
* Added Dick's Acked-by on the lpfc patch

Changes since v1:
* Fixed indentation in the pci.h patch to not cross the 80 character boundary.
* Split Ethernet patches into two, one for Atheros and one for Intel drivers.
* Correctly named lpfc patch.
* Converted init-path of lpfc driver as well.
* Added Reviewed-by tags where appropriate.

Johannes Thumshirn (6):
  PCI: Add helpers to request/release memory and I/O regions
  NVMe: Use pci_(request|release)_mem_regions
  lpfc: Use pci_(request|release)_mem_regions
  GenWQE: Use pci_(request|release)_mem_regions
  ethernet/intel: Use pci_(request|release)_mem_regions
  alx: Use pci_(request|release)_mem_regions

 drivers/misc/genwqe/card_base.c   | 13 +
 drivers/net/ethernet/atheros/alx/main.c   | 12 +---
 drivers/net/ethernet/intel/e1000e/netdev.c|  6 ++
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c  | 11 +++
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  9 +++--
 drivers/net/ethernet/intel/igb/igb_main.c | 10 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  9 +++--
 drivers/nvme/host/pci.c   | 10 +++---
 drivers/scsi/lpfc/lpfc_init.c | 15 --
 include/linux/pci.h   | 28 +++
 10 files changed, 59 insertions(+), 64 deletions(-)

Cc: Christoph Hellwig 
Cc: Keith Busch 
Cc: Jens Axboe 
Cc: linux-n...@lists.infradead.org
Cc: James Smart 
Cc: Dick Kennedy 
Cc: "James E.J. Bottomley" 
Cc: "Martin K. Petersen" 
Cc: linux-scsi@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: Frank Haverkamp 
Cc: Greg Kroah-Hartman 
Cc: Jay Cliburn 
Cc: Chris Snook 
Cc: Jeff Kirsher 
Cc: David S. Miller 
Cc: net...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: intel-wired-...@lists.osuosl.org
-- 
1.8.5.6



[PATCH v3 3/6] lpfc: Use pci_(request|release)_mem_regions

2016-06-07 Thread Johannes Thumshirn
Now that we have pci_request_mem_regions() and pci_release_mem_regions() at
hand, use them in the lpfc driver.

Suggested-by: Christoph Hellwig 
Signed-off-by: Johannes Thumshirn 
Acked-by: Dick Kennedy 
Cc: James Smart 
Cc: Dick Kennedy 
Cc: "James E.J. Bottomley" 
Cc: "Martin K. Petersen" 
Cc: linux-scsi@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/scsi/lpfc/lpfc_init.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index f57d02c..a8735f7 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -4775,20 +4775,17 @@ static int
 lpfc_enable_pci_dev(struct lpfc_hba *phba)
 {
struct pci_dev *pdev;
-   int bars = 0;
 
/* Obtain PCI device reference */
if (!phba->pcidev)
goto out_error;
else
pdev = phba->pcidev;
-   /* Select PCI BARs */
-   bars = pci_select_bars(pdev, IORESOURCE_MEM);
/* Enable PCI device */
if (pci_enable_device_mem(pdev))
goto out_error;
/* Request PCI resource for the device */
-   if (pci_request_selected_regions(pdev, bars, LPFC_DRIVER_NAME))
+   if (pci_request_mem_regions(pdev, LPFC_DRIVER_NAME))
goto out_disable_device;
/* Set up device as PCI master and save state for EEH */
pci_set_master(pdev);
@@ -4805,7 +4802,7 @@ out_disable_device:
pci_disable_device(pdev);
 out_error:
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
-   "1401 Failed to enable pci device, bars:x%x\n", bars);
+   "1401 Failed to enable pci device\n");
return -ENODEV;
 }
 
@@ -4820,17 +4817,14 @@ static void
 lpfc_disable_pci_dev(struct lpfc_hba *phba)
 {
struct pci_dev *pdev;
-   int bars;
 
/* Obtain PCI device reference */
if (!phba->pcidev)
return;
else
pdev = phba->pcidev;
-   /* Select PCI BARs */
-   bars = pci_select_bars(pdev, IORESOURCE_MEM);
/* Release PCI resource and disable PCI device */
-   pci_release_selected_regions(pdev, bars);
+   pci_release_mem_regions(pdev);
pci_disable_device(pdev);
 
return;
@@ -9705,7 +9699,6 @@ lpfc_pci_remove_one_s3(struct pci_dev *pdev)
struct lpfc_vport **vports;
struct lpfc_hba   *phba = vport->phba;
int i;
-   int bars = pci_select_bars(pdev, IORESOURCE_MEM);
 
	spin_lock_irq(&phba->hbalock);
vport->load_flag |= FC_UNLOADING;
@@ -9780,7 +9773,7 @@ lpfc_pci_remove_one_s3(struct pci_dev *pdev)
 
lpfc_hba_free(phba);
 
-   pci_release_selected_regions(pdev, bars);
+   pci_release_mem_regions(pdev);
pci_disable_device(pdev);
 }
 
-- 
1.8.5.6



[PATCH v4 2/2] Documentation/scsi: update scsi_eh.txt about ->host_failed

2016-06-07 Thread Wei Fang
Document the new concurrency rules for ->host_failed.

Signed-off-by: Wei Fang 
---
 Documentation/scsi/scsi_eh.txt |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 8638f61..37eca00 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -263,19 +263,23 @@ scmd->allowed.
 
  3. scmd recovered
 ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
-   - shost->host_failed--
- clear scmd->eh_eflags
- scsi_setup_cmd_retry()
- move from local eh_work_q to local eh_done_q
 LOCKING: none
+CONCURRENCY: at most one thread per separate eh_work_q to
+		 keep queue manipulation lockless
 
  4. EH completes
 ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
-   layer of failure.
+   layer of failure. May be called concurrently but must have
+   no more than one thread per separate eh_work_q to
+   manipulate the queue locklessly
- scmd is removed from eh_done_q and scmd->eh_entry is cleared
- if retry is necessary, scmd is requeued using
   scsi_queue_insert()
- otherwise, scsi_finish_command() is invoked for scmd
+   - zero shost->host_failed
 LOCKING: queue or finish function performs appropriate locking
 
 
-- 
1.7.1



[PATCH v4 1/2] scsi: fix race between simultaneous decrements of ->host_failed

2016-06-07 Thread Wei Fang
sas_ata_strategy_handler() adds the ata error handler work items to
system_unbound_wq. This workqueue runs its work items asynchronously,
so the ata error handler can run concurrently on different CPUs. In
that case ->host_failed is decremented simultaneously in
scsi_eh_finish_cmd() on different CPUs and ends up with a wrong value.

This leads to ->host_failed permanently differing from ->host_busy,
so the scsi error handler thread never starts running and I/O errors
from then on are never handled.
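
A sketch of the lost update (illustration only, not kernel code), with
->host_failed == 2 and one scmd finishing on each of two CPUs:

	/* CPU0                               CPU1                        */
	tmp0 = shost->host_failed;         /* tmp1 = shost->host_failed;  */
	                                   /* ... both read 2 ...         */
	shost->host_failed = tmp0 - 1;     /* shost->host_failed = tmp1 - 1; */
	                                   /* ... both store 1 ...        */

->host_failed is left at 1 instead of 0, so it can never match
->host_busy again and the error handler never wakes up.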

Since all scmds must have been handled in the strategy handler, just
remove the decrement in scsi_eh_finish_cmd() and zero ->host_failed
after the strategy handler returns to fix this race.

This fixes the problem introduced in
commit 50824d6c5657 ("[SCSI] libsas: async ata-eh").

Reviewed-by: James Bottomley 
Signed-off-by: Wei Fang 
---
Changes v1->v2:
- update Documentation/scsi/scsi_eh.txt about ->host_failed
Changes v2->v3:
- don't use atomic type, just zero ->host_failed after the strategy handle
Changes v3->v4:
- add new concurrency rules of ->host_failed in scsi_eh.txt

 drivers/ata/libata-eh.c   |2 +-
 drivers/scsi/scsi_error.c |4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 961acc7..91a9e6a 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -606,7 +606,7 @@ void ata_scsi_error(struct Scsi_Host *host)
ata_scsi_port_error_handler(host, ap);
 
/* finish or retry handled scmd's and clean up */
-   WARN_ON(host->host_failed || !list_empty(&eh_work_q));
+   WARN_ON(!list_empty(&eh_work_q));
 
DPRINTK("EXIT\n");
 }
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 984ddcb..1b9c049 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1127,7 +1127,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
  */
 void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
 {
-   scmd->device->host->host_failed--;
scmd->eh_eflags = 0;
	list_move_tail(&scmd->eh_entry, done_q);
 }
@@ -2226,6 +2225,9 @@ int scsi_error_handler(void *data)
else
scsi_unjam_host(shost);
 
+   /* All scmds have been handled */
+   shost->host_failed = 0;
+
/*
 * Note - if the above fails completely, the action is to take
 * individual devices offline and flush the queue of any
-- 
1.7.1



[RFC 0/8] nvmet: Add support for multi-tenant configfs

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Hi folks,

Here's the first pass of a nvmet multi-tenant configfs layout,
following what we've learned in target_core_fabric_configfs.c with
regard to independent operation of storage endpoints.

Here is how the running RFC-v1 code currently looks:

/sys/kernel/config/nvmet/subsystems/
└── nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep
├── namespaces
│   └── 1
│   └── ramdisk0 -> ../../../../../target/core/rd_mcp_1/ramdisk0
└── ports
└── loop
├── addr_adrfam
├── addr_portid
├── addr_traddr
├── addr_treq
├── addr_trsvcid
├── addr_trtype
└── enable

Namely, it allows existing /sys/kernel/config/target/core/ backends
to be configfs symlinked into ../nvmet/subsystems/$SUBSYS_NQN/
as nvme namespaces.

The series exposes T10-PI from /sys/kernel/config/target/core/ as
ID_NS.ms + ID_NS.dps feature bits, and enables block integrity
support with nvme/loop driver.

Note this series depends upon the following prerequisites of
target-core:

  http://marc.info/?l=linux-scsi=146527281416606=2

and of course, today's earlier release of nvmet + friends:

  http://lists.infradead.org/pipermail/linux-nvme/2016-June/004754.html

Note the full set of patches is available from:

  
https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/log/?h=nvmet-configfs-ng

Comments..?

--nab

Nicholas Bellinger (8):
  nvmet: Add nvmet_fabric_ops get/put transport helpers
  nvmet: Add support for configfs-ng multi-tenant logic
  nvmet: Hookup nvmet_ns->dev to nvmet_ns_enable
  nvmet/io-cmd: Hookup sbc_ops->execute_rw backend ops
  nvmet/io-cmd: Hookup sbc_ops->execute_sync_cache backend ops
  nvmet/io-cmd: Hookup sbc_ops->execute_unmap backend ops
  nvmet/admin-cmd: Hookup T10-PI to ID_NS.ms + ID_NS.dps feature bits
  nvme/loop: Add support for bio integrity handling

 drivers/nvme/target/Makefile  |   2 +-
 drivers/nvme/target/admin-cmd.c   |  17 ++
 drivers/nvme/target/configfs-ng.c | 585 ++
 drivers/nvme/target/configfs.c|   5 +-
 drivers/nvme/target/core.c|  83 +++---
 drivers/nvme/target/io-cmd.c  | 169 ++-
 drivers/nvme/target/loop.c|  19 ++
 drivers/nvme/target/nvmet.h   |  27 +-
 8 files changed, 799 insertions(+), 108 deletions(-)
 create mode 100644 drivers/nvme/target/configfs-ng.c

-- 
1.9.1



[RFC 2/8] nvmet: Add support for configfs-ng multi-tenant logic

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch introduces support for configfs-ng, that allows
for multi-tenant /sys/kernel/config/nvmet/subsystems/$SUBSYS_NQN/
operation, using existing /sys/kernel/config/target/core/
backends from target-core to be configfs symlinked as
per nvme-target subsystem NQN namespaces.

Here's how the layout looks:

/sys/kernel/config/nvmet/
└── subsystems
└── nqn.2003-01.org.linux-iscsi.NVMf.skylake-ep
├── namespaces
│   └── 1
│   └── ramdisk0 -> ../../../../../target/core/rd_mcp_1/ramdisk0
└── ports
└── loop
├── addr_adrfam
├── addr_portid
├── addr_traddr
├── addr_treq
├── addr_trsvcid
├── addr_trtype
└── enable

Also convert nvmet_find_get_subsys to port->nf_subsys, and
do the same for nvmet_host_discovery_allowed.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Martin Petersen 
Cc: Sagi Grimberg 
Cc: Hannes Reinecke 
Cc: Mike Christie 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/target/Makefile  |   2 +-
 drivers/nvme/target/configfs-ng.c | 586 ++
 drivers/nvme/target/configfs.c|   5 +-
 drivers/nvme/target/core.c|  22 +-
 drivers/nvme/target/nvmet.h   |  11 +
 5 files changed, 608 insertions(+), 18 deletions(-)
 create mode 100644 drivers/nvme/target/configfs-ng.c

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index b7a0623..2799e07 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -3,7 +3,7 @@ obj-$(CONFIG_NVME_TARGET)   += nvmet.o
 obj-$(CONFIG_NVME_TARGET_LOOP) += nvme-loop.o
 obj-$(CONFIG_NVME_TARGET_RDMA) += nvmet-rdma.o
 
-nvmet-y		+= core.o configfs.o admin-cmd.o io-cmd.o fabrics-cmd.o \
+nvmet-y		+= core.o configfs-ng.o admin-cmd.o io-cmd.o fabrics-cmd.o \
 			discovery.o
 nvme-loop-y+= loop.o
 nvmet-rdma-y   += rdma.o
diff --git a/drivers/nvme/target/configfs-ng.c b/drivers/nvme/target/configfs-ng.c
new file mode 100644
index 000..d495017
--- /dev/null
+++ b/drivers/nvme/target/configfs-ng.c
@@ -0,0 +1,586 @@
+/*
+ * Based on target_core_fabric_configfs.c code
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "nvmet.h"
+
+/*
+ * nvmet_port Generic ConfigFS definitions.
+ */
+static ssize_t nvmet_port_addr_adrfam_show(struct config_item *item,
+   char *page)
+{
+   switch (to_nvmet_port(item)->disc_addr.adrfam) {
+   case NVMF_ADDR_FAMILY_IP4:
+   return sprintf(page, "ipv4\n");
+   case NVMF_ADDR_FAMILY_IP6:
+   return sprintf(page, "ipv6\n");
+   case NVMF_ADDR_FAMILY_IB:
+   return sprintf(page, "ib\n");
+   default:
+   return sprintf(page, "\n");
+   }
+}
+
+static ssize_t nvmet_port_addr_adrfam_store(struct config_item *item,
+   const char *page, size_t count)
+{
+   struct nvmet_port *port = to_nvmet_port(item);
+
+   if (port->enabled) {
+   pr_err("Cannot modify address while enabled\n");
+   pr_err("Disable the address before modifying\n");
+   return -EACCES;
+   }
+
+   if (sysfs_streq(page, "ipv4")) {
+   port->disc_addr.adrfam = NVMF_ADDR_FAMILY_IP4;
+   } else if (sysfs_streq(page, "ipv6")) {
+   port->disc_addr.adrfam = NVMF_ADDR_FAMILY_IP6;
+   } else if (sysfs_streq(page, "ib")) {
+   port->disc_addr.adrfam = NVMF_ADDR_FAMILY_IB;
+   } else {
+   pr_err("Invalid value '%s' for adrfam\n", page);
+   return -EINVAL;
+   }
+
+   return count;
+}
+
+CONFIGFS_ATTR(nvmet_port_, addr_adrfam);
+
+static ssize_t nvmet_port_addr_portid_show(struct config_item *item,
+   char *page)
+{
+   struct nvmet_port *port = to_nvmet_port(item);
+
+   return snprintf(page, PAGE_SIZE, "%d\n",
+   le16_to_cpu(port->disc_addr.portid));
+}
+
+static ssize_t nvmet_port_addr_portid_store(struct config_item *item,
+   const char *page, size_t count)
+{
+   struct nvmet_port *port = to_nvmet_port(item);
+   u16 portid = 0;
+
+   if (kstrtou16(page, 0, &portid)) {
+   pr_err("Invalid value '%s' for portid\n", page);
+   return -EINVAL;
+   }
+
+   if (port->enabled) {
+   pr_err("Cannot modify address while enabled\n");
+   pr_err("Disable the address before modifying\n");
+   return -EACCES;
+   }
+   port->disc_addr.portid = cpu_to_le16(portid);
+   return count;
+}
+
+CONFIGFS_ATTR(nvmet_port_, addr_portid);
+
+static ssize_t nvmet_port_addr_traddr_show(struct 

[RFC 5/8] nvmet/io-cmd: Hookup sbc_ops->execute_sync_cache backend ops

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch converts nvmet_execute_flush() to utilize
sbc_ops->execute_sync_cache() for target_iostate
submission into existing backends drivers via
configfs in /sys/kernel/config/target/core/.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Martin Petersen 
Cc: Sagi Grimberg 
Cc: Hannes Reinecke 
Cc: Mike Christie 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/target/io-cmd.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index 133a14a..23905a8 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -109,18 +109,21 @@ static void nvmet_execute_rw(struct nvmet_req *req)
 
 static void nvmet_execute_flush(struct nvmet_req *req)
 {
-#if 0
-   struct bio *bio;
+   struct target_iostate *ios = &req->t_iostate;
+   struct se_device *dev = rcu_dereference_raw(req->ns->dev);
+   struct sbc_ops *sbc_ops = dev->transport->sbc_ops;
+   sense_reason_t rc;
 
-   nvmet_inline_bio_init(req);
-   bio = &req->inline_bio;
+   if (!sbc_ops || !sbc_ops->execute_sync_cache) {
+   nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
+   return;
+   }
 
-   bio->bi_bdev = req->ns->bdev;
-   bio->bi_private = req;
-   bio->bi_end_io = nvmet_bio_done;
+   ios->se_dev = dev;
+   ios->iomem = NULL;
+   ios->t_comp_func = &nvmet_complete_ios;
 
-   submit_bio(WRITE_FLUSH, bio);
-#endif
+   rc = sbc_ops->execute_sync_cache(ios, false);
 }
 
 #if 0
-- 
1.9.1



[RFC 7/8] nvmet/admin-cmd: Hookup T10-PI to ID_NS.ms + ID_NS.dps feature bits

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch updates nvmet_execute_identify_ns() to report
target-core backend T10-PI related feature bits to the
NVMe host controller.

Note this assumes support for NVME_NS_DPC_PI_TYPE1 and
NVME_NS_DPC_PI_TYPE3 as reported by backend drivers via
/sys/kernel/config/target/core/*/*/attrib/pi_prot_type.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Martin Petersen 
Cc: Sagi Grimberg 
Cc: Hannes Reinecke 
Cc: Mike Christie 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/target/admin-cmd.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 240e323..3a808dc 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -200,6 +200,7 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 {
struct nvmet_ns *ns;
struct nvme_id_ns *id;
+   struct se_device *dev;
u16 status = 0;
 
ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
@@ -228,6 +229,22 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
id->nlbaf = 0;
id->flbas = 0;
 
+   /* Populate bits for T10-PI from se_device backend */
+   rcu_read_lock();
+   dev = rcu_dereference(ns->dev);
+   if (dev && dev->dev_attrib.pi_prot_type) {
+   int pi_prot_type = dev->dev_attrib.pi_prot_type;
+
+   id->lbaf[0].ms = cpu_to_le16(sizeof(struct t10_pi_tuple));
+   printk("nvmet_set_id_ns: ms: %u\n", id->lbaf[0].ms);
+
+   if (pi_prot_type == 1)
+   id->dps = NVME_NS_DPC_PI_TYPE1;
+   else if (pi_prot_type == 3)
+   id->dps = NVME_NS_DPC_PI_TYPE3;
+   }
+   rcu_read_unlock();
+
/*
 * Our namespace might always be shared.  Not just with other
 * controllers, but also with any other user of the block device.
-- 
1.9.1



[RFC 8/8] nvme/loop: Add support for bio integrity handling

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch adds support for nvme/loop block integrity,
based upon the reported ID_NS.ms + ID_NS.dps feature
bits in nvmet_execute_identify_ns().

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Martin Petersen 
Cc: Sagi Grimberg 
Cc: Hannes Reinecke 
Cc: Mike Christie 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/target/loop.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 08b4fbb..b4b4da9 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -42,6 +42,7 @@ struct nvme_loop_iod {
struct nvme_loop_queue  *queue;
struct work_struct  work;
struct sg_table sg_table;
+   struct scatterlist  meta_sg;
struct scatterlist  first_sgl[];
 };
 
@@ -193,6 +194,24 @@ static int nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
BUG_ON(iod->req.sg_cnt > req->nr_phys_segments);
}
 
+   if (blk_integrity_rq(req)) {
+   int count;
+
+   if (blk_rq_count_integrity_sg(hctx->queue, req->bio) != 1)
+   BUG_ON(1);
+
+   sg_init_table(>meta_sg, 1);
+   count = blk_rq_map_integrity_sg(hctx->queue, req->bio,
+   &iod->meta_sg);
+
+   iod->req.prot_sg = &iod->meta_sg;
+   iod->req.prot_sg_cnt = 1;
+#if 0
+   printk("nvme/loop: Set prot_sg %p and prot_sg_cnt: %d\n",
+   iod->req.prot_sg, iod->req.prot_sg_cnt);
+#endif
+   }
+
iod->cmd.common.command_id = req->tag;
blk_mq_start_request(req);
 
-- 
1.9.1



[RFC 6/8] nvmet/io-cmd: Hookup sbc_ops->execute_unmap backend ops

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch converts nvmet_execute_discard() to utilize
sbc_ops->execute_unmap() for target_iostate submission
into existing backends drivers via configfs in
/sys/kernel/config/target/core/.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Martin Petersen 
Cc: Sagi Grimberg 
Cc: Hannes Reinecke 
Cc: Mike Christie 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/target/io-cmd.c | 47 
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index 23905a8..605f560 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -126,52 +126,73 @@ static void nvmet_execute_flush(struct nvmet_req *req)
rc = sbc_ops->execute_sync_cache(ios, false);
 }
 
-#if 0
-static u16 nvmet_discard_range(struct nvmet_ns *ns,
-   struct nvme_dsm_range *range, int type, struct bio **bio)
+static u16 nvmet_discard_range(struct nvmet_req *req, struct sbc_ops *sbc_ops,
+   struct nvme_dsm_range *range, struct bio **bio)
 {
-   if (__blkdev_issue_discard(ns->bdev,
+   struct nvmet_ns *ns = req->ns;
+   sense_reason_t rc;
+
+   rc = sbc_ops->execute_unmap(&req->t_iostate,
le64_to_cpu(range->slba) << (ns->blksize_shift - 9),
le32_to_cpu(range->nlb) << (ns->blksize_shift - 9),
-   GFP_KERNEL, type, bio))
+   bio);
+   if (rc)
return NVME_SC_INTERNAL | NVME_SC_DNR;
 
return 0;
 }
-#endif
+
+static void nvmet_discard_bio_done(struct bio *bio)
+{
+   struct nvmet_req *req = bio->bi_private;
+   int err = bio->bi_error;
+
+   bio_put(bio);
+   nvmet_req_complete(req, err ? NVME_SC_INTERNAL | NVME_SC_DNR : 0);
+}
 
 static void nvmet_execute_discard(struct nvmet_req *req)
 {
-#if 0
-   struct nvme_dsm_range range;
+   struct target_iostate *ios = &req->t_iostate;
+   struct se_device *dev = rcu_dereference_raw(req->ns->dev);
+   struct sbc_ops *sbc_ops = dev->transport->sbc_ops;
struct bio *bio = NULL;
-   int type = REQ_WRITE | REQ_DISCARD, i;
+   struct nvme_dsm_range range;
+   int i;
u16 status;
 
+   if (!sbc_ops || !sbc_ops->execute_unmap) {
+   nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
+   return;
+   }
+
+   ios->se_dev = dev;
+   ios->iomem = NULL;
+   ios->t_comp_func = NULL;
+
for (i = 0; i <= le32_to_cpu(req->cmd->dsm.nr); i++) {
		status = nvmet_copy_from_sgl(req, i * sizeof(range), &range,
sizeof(range));
if (status)
break;
 
-   status = nvmet_discard_range(req->ns, &range, type, &bio);
+   status = nvmet_discard_range(req, sbc_ops, &range, &bio);
if (status)
break;
}
 
if (bio) {
bio->bi_private = req;
-   bio->bi_end_io = nvmet_bio_done;
+   bio->bi_end_io = nvmet_discard_bio_done;
if (status) {
bio->bi_error = -EIO;
bio_endio(bio);
} else {
-   submit_bio(type, bio);
+   submit_bio(REQ_WRITE | REQ_DISCARD, bio);
}
} else {
nvmet_req_complete(req, status);
}
-#endif
 }
 
 static void nvmet_execute_dsm(struct nvmet_req *req)
-- 
1.9.1



[RFC 3/8] nvmet: Hookup nvmet_ns->dev to nvmet_ns_enable

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch hooks up nvmet_ns_enable() to accept the RCU protected
struct se_device provided as a configfs symlink from existing
/sys/kernel/config/target/core/ driver backends.

Also, drop the now unused internal ns->bdev + ns->device_path
usage, and add WIP stubs for nvmet/io-cmd sbc_ops backend
conversion to be added in subsequent patches.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Martin Petersen 
Cc: Sagi Grimberg 
Cc: Hannes Reinecke 
Cc: Mike Christie 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/target/configfs-ng.c |  3 +--
 drivers/nvme/target/core.c| 30 --
 drivers/nvme/target/io-cmd.c  | 17 +++--
 drivers/nvme/target/nvmet.h   |  6 ++
 4 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/drivers/nvme/target/configfs-ng.c b/drivers/nvme/target/configfs-ng.c
index d495017..e160186 100644
--- a/drivers/nvme/target/configfs-ng.c
+++ b/drivers/nvme/target/configfs-ng.c
@@ -392,8 +392,7 @@ static int nvmet_ns_link(struct config_item *ns_ci, struct config_item *dev_ci)
return -ENOSYS;
}
 
-   // XXX: Pass in struct se_device into nvmet_ns_enable
-   return nvmet_ns_enable(ns);
+   return nvmet_ns_enable(ns, dev);
 }
 
static int nvmet_ns_unlink(struct config_item *ns_ci, struct config_item *dev_ci)
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 7b42d2b..171e440 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -13,6 +13,8 @@
  */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 #include 
+#include 
+#include 
 #include "nvmet.h"
 
 static struct nvmet_fabrics_ops *nvmet_transports[NVMF_TRTYPE_MAX];
@@ -287,7 +289,7 @@ void nvmet_put_namespace(struct nvmet_ns *ns)
	percpu_ref_put(&ns->ref);
 }
 
-int nvmet_ns_enable(struct nvmet_ns *ns)
+int nvmet_ns_enable(struct nvmet_ns *ns, struct se_device *dev)
 {
struct nvmet_subsys *subsys = ns->subsys;
struct nvmet_ctrl *ctrl;
@@ -297,23 +299,14 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
	if (!list_empty(&ns->dev_link))
goto out_unlock;
 
-   ns->bdev = blkdev_get_by_path(ns->device_path, FMODE_READ | FMODE_WRITE,
-   NULL);
-   if (IS_ERR(ns->bdev)) {
-   pr_err("nvmet: failed to open block device %s: (%ld)\n",
-   ns->device_path, PTR_ERR(ns->bdev));
-   ret = PTR_ERR(ns->bdev);
-   ns->bdev = NULL;
-   goto out_unlock;
-   }
-
-   ns->size = i_size_read(ns->bdev->bd_inode);
-   ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
+   rcu_assign_pointer(ns->dev, dev);
+   ns->size = dev->transport->get_blocks(dev) * dev->dev_attrib.hw_block_size;
+   ns->blksize_shift = blksize_bits(dev->dev_attrib.hw_block_size);
 
	ret = percpu_ref_init(&ns->ref, nvmet_destroy_namespace,
0, GFP_KERNEL);
if (ret)
-   goto out_blkdev_put;
+   goto out_unlock;
 
if (ns->nsid > subsys->max_nsid)
subsys->max_nsid = ns->nsid;
@@ -343,10 +336,6 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 out_unlock:
	mutex_unlock(&subsys->lock);
return ret;
-out_blkdev_put:
-   blkdev_put(ns->bdev, FMODE_WRITE|FMODE_READ);
-   ns->bdev = NULL;
-   goto out_unlock;
 }
 
 void nvmet_ns_disable(struct nvmet_ns *ns)
@@ -379,16 +368,13 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE, 0, 0);
 
-   if (ns->bdev)
-   blkdev_put(ns->bdev, FMODE_WRITE|FMODE_READ);
+   rcu_assign_pointer(ns->dev, NULL);
	mutex_unlock(&subsys->lock);
 }
 
 void nvmet_ns_free(struct nvmet_ns *ns)
 {
nvmet_ns_disable(ns);
-
-   kfree(ns->device_path);
kfree(ns);
 }
 
diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index 76dbf73..38c2e97 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -16,6 +16,7 @@
 #include 
 #include "nvmet.h"
 
+#if 0
 static void nvmet_bio_done(struct bio *bio)
 {
struct nvmet_req *req = bio->bi_private;
@@ -26,6 +27,7 @@ static void nvmet_bio_done(struct bio *bio)
 	if (bio != &req->inline_bio)
bio_put(bio);
 }
+#endif
 
 static inline u32 nvmet_rw_len(struct nvmet_req *req)
 {
@@ -33,6 +35,7 @@ static inline u32 nvmet_rw_len(struct nvmet_req *req)
req->ns->blksize_shift;
 }
 
+#if 0
 static void nvmet_inline_bio_init(struct nvmet_req *req)
 {
struct bio *bio = >inline_bio;
@@ -41,21 +44,23 @@ static void nvmet_inline_bio_init(struct nvmet_req *req)
bio->bi_max_vecs = NVMET_MAX_INLINE_BIOVEC;

[RFC 4/8] nvmet/io-cmd: Hookup sbc_ops->execute_rw backend ops

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch converts nvmet_execute_rw() to utilize sbc_ops->execute_rw()
for target_iostate + target_iomem based I/O submission into existing
backends drivers via configfs in /sys/kernel/config/target/core/.

This includes support for passing T10-PI scatterlists via target_iomem
into existing sbc_ops->execute_rw() logic, and is functioning with
IBLOCK, FILEIO, and RAMDISK.

Note the preceding target/iblock patch absorbs the inline bio + bvec and
blk_poll() optimizations from Ming + Sagi in nvmet/io-cmd into the
target_core_iblock.c code.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Martin Petersen 
Cc: Sagi Grimberg 
Cc: Hannes Reinecke 
Cc: Mike Christie 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/target/io-cmd.c | 116 ++-
 drivers/nvme/target/nvmet.h  |   7 +++
 2 files changed, 67 insertions(+), 56 deletions(-)

diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index 38c2e97..133a14a 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -14,20 +14,16 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 #include 
 #include 
+#include 
+#include 
 #include "nvmet.h"
 
-#if 0
-static void nvmet_bio_done(struct bio *bio)
+static void nvmet_complete_ios(struct target_iostate *ios, u16 status)
 {
-   struct nvmet_req *req = bio->bi_private;
-
-   nvmet_req_complete(req,
-   bio->bi_error ? NVME_SC_INTERNAL | NVME_SC_DNR : 0);
+   struct nvmet_req *req = container_of(ios, struct nvmet_req, t_iostate);
 
-   if (bio != &req->inline_bio)
-   bio_put(bio);
+   nvmet_req_complete(req, status ? NVME_SC_INTERNAL | NVME_SC_DNR : 0);
 }
-#endif
 
 static inline u32 nvmet_rw_len(struct nvmet_req *req)
 {
@@ -35,72 +31,80 @@ static inline u32 nvmet_rw_len(struct nvmet_req *req)
req->ns->blksize_shift;
 }
 
-#if 0
-static void nvmet_inline_bio_init(struct nvmet_req *req)
-{
-   struct bio *bio = &req->inline_bio;
-
-   bio_init(bio);
-   bio->bi_max_vecs = NVMET_MAX_INLINE_BIOVEC;
-   bio->bi_io_vec = req->inline_bvec;
-}
-#endif
-
 static void nvmet_execute_rw(struct nvmet_req *req)
 {
-#if 0
-   int sg_cnt = req->sg_cnt;
-   struct scatterlist *sg;
-   struct bio *bio;
+   struct target_iostate *ios = &req->t_iostate;
+   struct target_iomem *iomem = &req->t_iomem;
+   struct se_device *dev = rcu_dereference_raw(req->ns->dev);
+   struct sbc_ops *sbc_ops = dev->transport->sbc_ops;
sector_t sector;
-   blk_qc_t cookie;
-   int rw, i;
-#endif
+   enum dma_data_direction data_direction;
+   sense_reason_t rc;
+   bool fua_write = false, prot_enabled = false;
+
+   if (!sbc_ops || !sbc_ops->execute_rw) {
+   nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
+   return;
+   }
+
if (!req->sg_cnt) {
nvmet_req_complete(req, 0);
return;
}
-#if 0
+
if (req->cmd->rw.opcode == nvme_cmd_write) {
if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
-   rw = WRITE_FUA;
-   else
-   rw = WRITE;
+   fua_write = true;
+
+   data_direction = DMA_TO_DEVICE;
} else {
-   rw = READ;
+   data_direction = DMA_FROM_DEVICE;
}
 
sector = le64_to_cpu(req->cmd->rw.slba);
sector <<= (req->ns->blksize_shift - 9);
 
-   nvmet_inline_bio_init(req);
-   bio = &req->inline_bio;
-   bio->bi_bdev = req->ns->bdev;
-   bio->bi_iter.bi_sector = sector;
-   bio->bi_private = req;
-   bio->bi_end_io = nvmet_bio_done;
-
-   for_each_sg(req->sg, sg, req->sg_cnt, i) {
-   while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
-   != sg->length) {
-   struct bio *prev = bio;
-
-   bio = bio_alloc(GFP_KERNEL, min(sg_cnt, BIO_MAX_PAGES));
-   bio->bi_bdev = req->ns->bdev;
-   bio->bi_iter.bi_sector = sector;
-
-   bio_chain(bio, prev);
-   cookie = submit_bio(rw, prev);
-   }
+   ios->t_task_lba = sector;
+   ios->data_length = nvmet_rw_len(req);
+   ios->data_direction = data_direction;
+   iomem->t_data_sg = req->sg;
+   iomem->t_data_nents = req->sg_cnt;
+   iomem->t_prot_sg = req->prot_sg;
+   iomem->t_prot_nents = req->prot_sg_cnt;
+
+   // XXX: Make common between sbc_check_prot and nvme-target
+   switch (dev->dev_attrib.pi_prot_type) {
+   case TARGET_DIF_TYPE3_PROT:
+   ios->reftag_seed = 0x;
+   prot_enabled = true;
+   break;
+   case 

[RFC 1/8] nvmet: Add nvmet_fabric_ops get/put transport helpers

2016-06-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch introduces two helpers for obtaining + releasing
nvmet_fabric_ops for nvmet_port usage, and the associated
struct module ops->owner reference.

This is required in order to support nvmet/configfs-ng
and multiple nvmet_port configfs groups living under
/sys/kernel/config/nvmet/subsystems/$SUBSYS_NQN/ports/

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Martin Petersen 
Cc: Sagi Grimberg 
Cc: Hannes Reinecke 
Cc: Mike Christie 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/target/core.c  | 31 +++
 drivers/nvme/target/nvmet.h |  3 +++
 2 files changed, 34 insertions(+)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index e0b3f01..9af813c 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -191,6 +191,38 @@ void nvmet_disable_port(struct nvmet_port *port)
module_put(ops->owner);
 }
 
+struct nvmet_fabrics_ops *nvmet_get_transport(struct nvmet_port *port)
+{
+   struct nvmet_fabrics_ops *ops;
+
+   down_write(&nvmet_config_sem);
+   ops = nvmet_transports[port->disc_addr.trtype];
+   if (!ops) {
+   pr_err("transport type %d not supported\n",
+   port->disc_addr.trtype);
+   up_write(&nvmet_config_sem);
+   return ERR_PTR(-EINVAL);
+   }
+
+   if (!try_module_get(ops->owner)) {
+   up_write(&nvmet_config_sem);
+   return ERR_PTR(-EINVAL);
+   }
+   up_write(&nvmet_config_sem);
+
+   return ops;
+}
+
+void nvmet_put_transport(struct nvmet_port *port)
+{
+   struct nvmet_fabrics_ops *ops;
+
+   down_write(&nvmet_config_sem);
+   ops = nvmet_transports[port->disc_addr.trtype];
+   module_put(ops->owner);
+   up_write(&nvmet_config_sem);
+}
+
 static void nvmet_keep_alive_timer(struct work_struct *work)
 {
struct nvmet_ctrl *ctrl = container_of(to_delayed_work(work),
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 57dd6d8..2bf15088b 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -299,6 +299,9 @@ void nvmet_unregister_transport(struct nvmet_fabrics_ops *ops);
 int nvmet_enable_port(struct nvmet_port *port);
 void nvmet_disable_port(struct nvmet_port *port);
 
+struct nvmet_fabrics_ops *nvmet_get_transport(struct nvmet_port *port);
+void nvmet_put_transport(struct nvmet_port *port);
+
 void nvmet_referral_enable(struct nvmet_port *parent, struct nvmet_port *port);
 void nvmet_referral_disable(struct nvmet_port *port);
 
-- 
1.9.1



Re: NVMe over Fabrics target implementation

2016-06-07 Thread Nicholas A. Bellinger
Hi HCH & Co,

On Mon, 2016-06-06 at 23:22 +0200, Christoph Hellwig wrote:
> This patch set adds a generic NVMe over Fabrics target. The
> implementation conforms to the NVMe 1.2b specification (which
> includes Fabrics) and provides the NVMe over Fabrics access
> to Linux block devices.
> 

Thanks for all of the development work by the fabric_linux_driver team
(HCH, Sagi, Ming, James F., James S., and Dave M.) over the last year. 

Very excited to see this code get a public release now that the NVMf
specification is out.  Now that it's in the wild, it's a good
opportunity to discuss some of the more interesting implementation
details, beyond the new NVMf wire protocol itself.

(Adding target-devel + linux-scsi CC')

> The target implementation consists of several elements:
> 
> - NVMe target core: defines and manages the NVMe entities (subsystems,
>   controllers, namespaces, ...) and their allocation, responsible
>   for initial commands processing and correct orchestration of
>   the stack setup and tear down.
> 
> - NVMe admin command implementation: responsible for parsing and
>   servicing admin commands such as controller identify, set features,
>   keep-alive, log page, ...).
> 
> - NVMe I/O command implementation: responsible for performing the actual
>   I/O (Read, Write, Flush, Deallocate (aka Discard).  It is a very thin
>   layer on top of the block layer and implements no logic of its own.
>   To support exporting file systems please use the loopback block driver
>   in direct I/O mode, which gives very good performance.
> 
> - NVMe over Fabrics support: responsible for servicing Fabrics commands
>   (connect, property get/set).
> 
> - NVMe over Fabrics discovery service: responsible to serve the Discovery
>   log page through a special cut down Discovery controller.
> 
> The target is configured using configfs, and configurable entities are:
> 
>  - NVMe subsystems and namespaces
>  - NVMe over Fabrics ports and referrals
>  - Host ACLs for primitive access control - NVMe over Fabrics access
>control is still work in progress at the specification level and
>will be implemented once that work has finished.
> 
> To configure the target use the nvmetcli tool from
> http://git.infradead.org/users/hch/nvmetcli.git, which includes detailed
> setup documentation.
> 
> In addition to the Fabrics target implementation we provide a loopback
> driver which also conforms the NVMe over Fabrics specification and allows
> evaluation of the target stack with local access without requiring a real
> fabric.
> 

So as-is, I have two main objections that have been discussed off-list for
some time, and that won't be a big surprise to anyone following the
fabrics_linux_driver list.  ;P

First topic, I think nvme-target name-spaces should be utilizing
existing configfs logic, and sharing /sys/kernel/config/target/core/
backend driver symlinks as individual nvme-target subsystem namespaces.

That is, we've already got a configfs ABI in place for target mode
back-ends that today is able to operate independently from SCSI
architecture model dependencies.

To that end, the prerequisite series to allow target-core backends to
operate independent of se_cmd, and allow se_device backends to be
configfs symlinked directly into /sys/kernel/config/nvmet/, outside
of /sys/kernel/config/target/$FABRIC/ has been posted earlier here:

http://marc.info/?l=linux-scsi=146527281416606=2

Note the -v2 series has absorbed the nvmet/io-cmd execute_rw()
improvements from Sagi + Ming (inline bio/bvec and blk_poll) into
target_core_iblock.c driver code.

Second topic, and more important from a kernel ABI perspective are the
current scale limitations around the first pass of nvmet configfs.c
layout code in /sys/kernel/config/nvmet/.

Namely, the design of having three top level configfs groups in
/sys/kernel/config/nvmet/[subsystems,ports,hosts] that are configfs
symlinked between each other, with a single rw_mutex (nvmet_config_sem)
used for global list lookup and enforcing a globally synchronized
nvmet_fabrics_ops->add_port() creation across all subsystem NQN ports.

From the shared experience in target_core_fabric_configfs.c over the
last 8 years, perhaps the greatest strength of configfs has been its
ability to allow config_item_type parent/child relationships to exist
and operate independently of one another.

Specifically in the context of storage tenants, this means creation +
deletion of one backend + target fabric endpoint tenant, should not
block creation + deletion of another backend + target fabric endpoint
tenant.

As-is, a nvmet configfs layout holding a global mutex across
subsystem/port/host creation + deletion, and doing internal list lookup
within configfs ->allow_link + ->drop_link callbacks ends up being
severely limiting when scaling up the total number of nvmet subsystem
NQNs and ports.

Specifically, modern deployments of /sys/kernel/config/target/iscsi/
expect backends + fabric endpoints to be configured in parallel at
<