Re: [RFD] I/O scheduling in blk-mq

2016-09-29 Thread Paolo Valente
Hi Omar,
have you had a chance to look at these last questions of mine?

Thanks,
Paolo

> On 31 Aug 2016, at 17:20, Paolo Valente wrote:
> 
> 
> On 8 Aug 2016, at 22:09, Omar Sandoval wrote:
> 
>> On Mon, Aug 08, 2016 at 04:09:56PM +0200, Paolo wrote:
>>> Hi Jens, Tejun, Christoph, all,
>>> AFAIK blk-mq does not yet feature I/O schedulers. In particular, there
>>> is no scheduler providing strong guarantees in terms of
>>> responsiveness, latency for time-sensitive applications and bandwidth
>>> distribution.
>>> 
>>> For this reason, I'm trying to port BFQ to blk-mq, or to develop
>>> something simpler if even a reduced version of BFQ proves to be too
>>> heavy (this project is supported by Linaro). If you are willing to
>>> provide some feedback in this respect, I would like to ask for
>>> opinions/suggestions on the following two matters, and possibly to
>>> open a more general discussion on I/O scheduling in blk-mq.
>>> 
>>> 1) My idea is to have an independent instance of BFQ, or in general of
>>> the I/O scheduler, executed for each software queue. Then there would
>>> be no global scheduling. The drawback of no global scheduling is that
>>> each process cannot get more than 1/M of the total throughput of the
>>> device, if M is the number of software queues. But, if I'm not
>>> mistaken, it is in any case infeasible to give a process more than
>>> 1/M of the total throughput without lowering the throughput itself.
>>> In fact, giving a process more than 1/M of the total throughput
>>> implies serving its software queue, say Q, more often than the
>>> others. The only way to do so is to periodically stop serving the
>>> other software queues and dispatch only the requests in Q. But this
>>> would reduce parallelism, which is the main means by which blk-mq
>>> achieves very high throughput. Are these considerations, and, in
>>> particular, one independent I/O scheduler per software queue,
>>> sensible?
>>> 
>>> 2) To provide per-process service guarantees, an I/O scheduler must
>>> create per-process internal queues. BFQ and CFQ use I/O contexts to
>>> achieve this goal. Is something like that (or exactly the same)
>>> available also in blk-mq? If so, do you have any suggestion, or link to
>>> documentation/code on how to use what is available in blk-mq?
>>> 
>>> Thanks,
>>> Paolo
>> 
>> Hi, Paolo,
>> 
>> I've been working on I/O scheduling for blk-mq with Jens for the past
>> few months (splitting time with other small projects), and we're making
>> good progress. Like you noticed, the hard part isn't really grafting a
>> scheduler interface onto blk-mq, it's maintaining good scalability while
>> providing adequate fairness.
>> 
>> We're working towards a scheduler more like deadline and getting the
>> architectural issues worked out. The goal is some sort of fairness
>> across all queues.
> 
> If I'm not mistaken, the requests of a process (the bios after your
> patch) end up in a given software queue basically by chance, i.e.,
> because the process happens to be executed on the core which that
> queue is associated with. If this is true, then the scheduler cannot
> control to which queue a request goes. So, how exactly do you imagine
> the scheduler controlling the global request service order? By
> stopping the service of some queues and letting only the head-of-line
> request(s) of some other queue(s) be dispatched?
> 
> In this respect, I guess that, as of now, it is again chance that
> determines from which software queue the next request to dispatch is
> picked, i.e., it depends on the core on which the dispatch functions
> happen to be executed. Is that correct?
> 
>> The scheduler-per-software-queue model won't hold up
>> so well if we have a slower device with an I/O-hungry process on one CPU
>> and an interactive process on another CPU.
>> 
> 
> So, the problem would be that the hungry process eats all the
> bandwidth, and the interactive one never gets served.
> 
> What about the case where both processes are on the same CPU, i.e.,
> where the requests of both processes are on the same software queue?
> How does the scheduler you envisage guarantee a good latency to the
> interactive process in this case? By properly reordering requests
> inside the software queue?
> 
> I'm sorry if my questions are quite silly, or do not make much sense.
> 
> Thanks,
> Paolo
> 
> 
>> The issue I'm working through now is that on blk-mq, we only have as
>> many `struct request`s as the hardware has tags, so on a device with a
>> limited queue depth, it's really hard to do any sort of intelligent
>> scheduling. The solution for that is switching over to working with
>> `struct bio`s in the software queues instead, which abstracts away the
>> hardware capabilities. I have some work in progress at
>> https://github.com/osandov/linux/tree/blk-mq-iosched, but it's not yet
>> at feature-parity.
>> 
>> After that, I'll be back to working on the scheduling itself. The vague
>> idea

[PATCH v5 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes

2016-09-29 Thread Damien Le Moal
From: Hannes Reinecke 

The queue limits already have a 'chunk_sectors' setting, so
we should be presenting it via sysfs.

Signed-off-by: Hannes Reinecke 

[Damien: Updated Documentation/ABI/testing/sysfs-block]

Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Shaun Tancheff 
Tested-by: Shaun Tancheff 

---
 Documentation/ABI/testing/sysfs-block | 13 +
 block/blk-sysfs.c | 11 +++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block 
b/Documentation/ABI/testing/sysfs-block
index 75a5055..ee2d5cd 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -251,3 +251,16 @@ Description:
since drive-managed zoned block devices do not support
zone commands, they will be treated as regular block
devices and zoned will report "none".
+
+What:  /sys/block//queue/chunk_sectors
+Date:  September 2016
+Contact:   Hannes Reinecke 
+Description:
+   chunk_sectors has a different meaning depending on the
+   type of disk. For a RAID device (dm-raid), chunk_sectors
+   indicates the size in 512B sectors of the RAID volume
+   stripe segment. For a zoned block device, either
+   host-aware or host-managed, chunk_sectors indicates the
+   size in 512B sectors of the zones of the device, with
+   the possible exception of the last zone of the device,
+   which may be smaller.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ff9cd9c..488c2e2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -130,6 +130,11 @@ static ssize_t queue_physical_block_size_show(struct 
request_queue *q, char *pag
return queue_var_show(queue_physical_block_size(q), page);
 }
 
+static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page)
+{
+   return queue_var_show(q->limits.chunk_sectors, page);
+}
+
 static ssize_t queue_io_min_show(struct request_queue *q, char *page)
 {
return queue_var_show(queue_io_min(q), page);
@@ -455,6 +460,11 @@ static struct queue_sysfs_entry 
queue_physical_block_size_entry = {
.show = queue_physical_block_size_show,
 };
 
+static struct queue_sysfs_entry queue_chunk_sectors_entry = {
+   .attr = {.name = "chunk_sectors", .mode = S_IRUGO },
+   .show = queue_chunk_sectors_show,
+};
+
 static struct queue_sysfs_entry queue_io_min_entry = {
.attr = {.name = "minimum_io_size", .mode = S_IRUGO },
.show = queue_io_min_show,
@@ -555,6 +565,7 @@ static struct attribute *default_attrs[] = {
&queue_hw_sector_size_entry.attr,
&queue_logical_block_size_entry.attr,
&queue_physical_block_size_entry.attr,
+   &queue_chunk_sectors_entry.attr,
&queue_io_min_entry.attr,
&queue_io_opt_entry.attr,
&queue_discard_granularity_entry.attr,
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/7] block: Add 'zoned' queue limit

2016-09-29 Thread Damien Le Moal
Add the zoned queue limit to indicate the zoning model of a block device.
Defined values are 0 (BLK_ZONED_NONE) for regular block devices,
1 (BLK_ZONED_HA) for host-aware zoned block devices and 2 (BLK_ZONED_HM)
for host-managed zoned block devices. The drive-managed model defined by
the standards is not represented here, since drive-managed block devices
do not provide any command for accessing zone information. Such devices
will be reported as BLK_ZONED_NONE.

The helper functions blk_queue_zoned_model and bdev_zoned_model return
the zoned limit and the functions blk_queue_is_zoned and bdev_is_zoned
return a boolean for callers to test if a block device is zoned.

The zoned attribute is also exported as a string to applications via
sysfs. BLK_ZONED_NONE shows as "none", BLK_ZONED_HA as "host-aware" and
BLK_ZONED_HM as "host-managed".

Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Shaun Tancheff 
Tested-by: Shaun Tancheff 

---
 Documentation/ABI/testing/sysfs-block | 16 
 block/blk-settings.c  |  1 +
 block/blk-sysfs.c | 18 ++
 include/linux/blkdev.h| 47 +++
 4 files changed, 82 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block 
b/Documentation/ABI/testing/sysfs-block
index 71d184d..75a5055 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -235,3 +235,19 @@ Description:
write_same_max_bytes is 0, write same is not supported
by the device.
 
+What:  /sys/block//queue/zoned
+Date:  September 2016
+Contact:   Damien Le Moal 
+Description:
+   zoned indicates if the device is a zoned block device
+   and the zone model of the device if it is indeed zoned.
+   The possible values indicated by zoned are "none" for
+   regular block devices and "host-aware" or "host-managed"
+   for zoned block devices. The characteristics of
+   host-aware and host-managed zoned block devices are
+   described in the ZBC (Zoned Block Commands) and ZAC
+   (Zoned Device ATA Command Set) standards. These standards
+   also define the "drive-managed" zone model. However,
+   since drive-managed zoned block devices do not support
+   zone commands, they will be treated as regular block
+   devices and zoned will report "none".
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f679ae1..b1d5b7f 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -107,6 +107,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->io_opt = 0;
lim->misaligned = 0;
lim->cluster = 1;
+   lim->zoned = BLK_ZONED_NONE;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..ff9cd9c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -257,6 +257,18 @@ QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
+static ssize_t queue_zoned_show(struct request_queue *q, char *page)
+{
+   switch (blk_queue_zoned_model(q)) {
+   case BLK_ZONED_HA:
+   return sprintf(page, "host-aware\n");
+   case BLK_ZONED_HM:
+   return sprintf(page, "host-managed\n");
+   default:
+   return sprintf(page, "none\n");
+   }
+}
+
 static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
 {
return queue_var_show((blk_queue_nomerges(q) << 1) |
@@ -485,6 +497,11 @@ static struct queue_sysfs_entry queue_nonrot_entry = {
.store = queue_store_nonrot,
 };
 
+static struct queue_sysfs_entry queue_zoned_entry = {
+   .attr = {.name = "zoned", .mode = S_IRUGO },
+   .show = queue_zoned_show,
+};
+
 static struct queue_sysfs_entry queue_nomerges_entry = {
.attr = {.name = "nomerges", .mode = S_IRUGO | S_IWUSR },
.show = queue_nomerges_show,
@@ -546,6 +563,7 @@ static struct attribute *default_attrs[] = {
&queue_discard_zeroes_data_entry.attr,
&queue_write_same_max_entry.attr,
&queue_nonrot_entry.attr,
+   &queue_zoned_entry.attr,
&queue_nomerges_entry.attr,
&queue_rq_affinity_entry.attr,
&queue_iostats_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..f19e16b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -261,6 +261,15 @@ struct blk_queue_tag {
 #define BLK_SCSI_MAX_CMDS  (256)
 #define BLK_SCSI_CMD_PER_LONG  (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
 
+/*
+ * Zoned block device models (zoned limit).
+ */
+enum blk_zoned_model {
+   BLK_ZONED_NONE, /* Regular block device */
+   BLK_ZONED_HA,   /* Host-aware zoned block device */
+   BLK_ZONED_HM,   /* Host-managed zoned block device */

[PATCH v5 3/7] block: update chunk_sectors in blk_stack_limits()

2016-09-29 Thread Damien Le Moal
From: Hannes Reinecke 

Signed-off-by: Hannes Reinecke 
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Shaun Tancheff 
Tested-by: Shaun Tancheff 

---
 block/blk-settings.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b1d5b7f..55369a6 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -631,6 +631,10 @@ int blk_stack_limits(struct queue_limits *t, struct 
queue_limits *b,
t->discard_granularity;
}
 
+   if (b->chunk_sectors)
+   t->chunk_sectors = min_not_zero(t->chunk_sectors,
+   b->chunk_sectors);
+
return ret;
 }
 EXPORT_SYMBOL(blk_stack_limits);
-- 
2.7.4



[PATCH v5 6/7] sd: Implement support for ZBC devices

2016-09-29 Thread Damien Le Moal
From: Hannes Reinecke 

Implement ZBC support functions to setup zoned disks, both
host-managed and host-aware models. Only zoned disks that satisfy
the following conditions are supported:
1) All zones are the same size, with the possible exception of a
   smaller last runt zone.
2) For host-managed disks, reads are unrestricted (reads are not
   failed due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these 2 conditions are set up with
a capacity of 0 to prevent their use.

The function sd_zbc_read_zones, called from sd_revalidate_disk,
checks that the device satisfies the above two constraints. This
function may also change the disk capacity previously set by
sd_read_capacity for devices reporting only the capacity of
conventional zones at the beginning of the LBA range (i.e. devices
reporting rc_basis set to 0).

The capacity message output was moved out of sd_read_capacity into
a new function, sd_print_capacity, to account for any capacity
change made by sd_zbc_read_zones. This new function also calls
sd_zbc_print_zones to display the number of zones and the zone size
of the device.

Signed-off-by: Hannes Reinecke 

[Damien: * Removed zone cache support
 * Removed mapping of discard to reset write pointer command
 * Modified sd_zbc_read_zones to include checks that the
   device satisfies the kernel constraints
 * Implemented REPORT ZONES setup and post-processing based
   on code from Shaun Tancheff 
 * Removed confusing use of 512B sector units in function
   interfaces]
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Shaun Tancheff 
Tested-by: Shaun Tancheff 

---
 drivers/scsi/Makefile |   1 +
 drivers/scsi/sd.c | 141 ---
 drivers/scsi/sd.h |  67 +
 drivers/scsi/sd_zbc.c | 627 ++
 include/scsi/scsi_proto.h |  17 ++
 5 files changed, 820 insertions(+), 33 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.c

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index fc0d9b8..350513c 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -180,6 +180,7 @@ hv_storvsc-y:= storvsc_drv.o
 
 sd_mod-objs:= sd.o
 sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
+sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
 
 sr_mod-objs:= sr.o sr_ioctl.o sr_vendor.o
 ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 51e5629..20889b5 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -93,6 +93,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
+MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
 #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS  16
@@ -163,7 +164,7 @@ cache_type_store(struct device *dev, struct 
device_attribute *attr,
static const char temp[] = "temporary ";
int len;
 
-   if (sdp->type != TYPE_DISK)
+   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
/* no cache control on RBC devices; theoretically they
 * can do it, but there's probably so many exceptions
 * it's not worth the risk */
@@ -262,7 +263,7 @@ allow_restart_store(struct device *dev, struct 
device_attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
 
-   if (sdp->type != TYPE_DISK)
+   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
return -EINVAL;
 
sdp->allow_restart = simple_strtoul(buf, NULL, 10);
@@ -392,6 +393,11 @@ provisioning_mode_store(struct device *dev, struct 
device_attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
 
+   if (sd_is_zoned(sdkp)) {
+   sd_config_discard(sdkp, SD_LBP_DISABLE);
+   return count;
+   }
+
if (sdp->type != TYPE_DISK)
return -EINVAL;
 
@@ -459,7 +465,7 @@ max_write_same_blocks_store(struct device *dev, struct 
device_attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
 
-   if (sdp->type != TYPE_DISK)
+   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
return -EINVAL;
 
err = kstrtoul(buf, 10, &max);
@@ -844,6 +850,12 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
+   if (sd_is_zoned(sdkp)) {
+   ret = sd_zbc_setup_read_write(cmd);
+   if (ret != BLKPREP_OK)
+   return ret;
+   }
+
sector >>= ilog2(sdp->sector_size) - 9;
nr_sectors >>= ilog2(sdp->sector_size) - 9;
 
@@ -963,6 +975,12 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd 
*SCpnt)
SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
   

[PATCH v5 7/7] blk-zoned: implement ioctls

2016-09-29 Thread Damien Le Moal
From: Shaun Tancheff 

Adds the new BLKREPORTZONE and BLKRESETZONE ioctls for respectively
obtaining the zone configuration of a zoned block device and resetting
the write pointer of sequential zones of a zoned block device.

The BLKREPORTZONE ioctl maps directly to a single call of the function
blkdev_report_zones. The zone information result is passed as an array
of struct blk_zone identical to the structure used internally for
processing the REQ_OP_ZONE_REPORT operation.  The BLKRESETZONE ioctl
maps to a single call of the blkdev_reset_zones function.

Signed-off-by: Shaun Tancheff 
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 

---
 block/blk-zoned.c | 93 +++
 block/ioctl.c |  4 ++
 include/linux/blkdev.h| 21 ++
 include/uapi/linux/blkzoned.h | 40 +++
 include/uapi/linux/fs.h   |  4 ++
 5 files changed, 162 insertions(+)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 1603573..667f95d 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -255,3 +255,96 @@ int blkdev_reset_zones(struct block_device *bdev,
return 0;
 }
 EXPORT_SYMBOL_GPL(blkdev_reset_zones);
+
+/**
+ * BLKREPORTZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+   void __user *argp = (void __user *)arg;
+   struct request_queue *q;
+   struct blk_zone_report rep;
+   struct blk_zone *zones;
+   int ret;
+
+   if (!argp)
+   return -EINVAL;
+
+   q = bdev_get_queue(bdev);
+   if (!q)
+   return -ENXIO;
+
+   if (!blk_queue_is_zoned(q))
+   return -ENOTTY;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EACCES;
+
+   if (copy_from_user(&rep, argp, sizeof(struct blk_zone_report)))
+   return -EFAULT;
+
+   if (!rep.nr_zones)
+   return -EINVAL;
+
+   zones = kcalloc(rep.nr_zones, sizeof(struct blk_zone), GFP_KERNEL);
+   if (!zones)
+   return -ENOMEM;
+
+   ret = blkdev_report_zones(bdev, rep.sector,
+ zones, &rep.nr_zones,
+ GFP_KERNEL);
+   if (ret)
+   goto out;
+
+   if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   if (rep.nr_zones) {
+   if (copy_to_user(argp + sizeof(struct blk_zone_report), zones,
+sizeof(struct blk_zone) * rep.nr_zones))
+   ret = -EFAULT;
+   }
+
+ out:
+   kfree(zones);
+
+   return ret;
+}
+
+/**
+ * BLKRESETZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_reset_zones_ioctl(struct block_device *bdev, fmode_t mode,
+unsigned int cmd, unsigned long arg)
+{
+   void __user *argp = (void __user *)arg;
+   struct request_queue *q;
+   struct blk_zone_range zrange;
+
+   if (!argp)
+   return -EINVAL;
+
+   q = bdev_get_queue(bdev);
+   if (!q)
+   return -ENXIO;
+
+   if (!blk_queue_is_zoned(q))
+   return -ENOTTY;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EACCES;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range)))
+   return -EFAULT;
+
+   return blkdev_reset_zones(bdev, zrange.sector, zrange.nr_sectors,
+ GFP_KERNEL);
+}
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..448f78a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -513,6 +513,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, 
unsigned cmd,
BLKDEV_DISCARD_SECURE);
case BLKZEROOUT:
return blk_ioctl_zeroout(bdev, mode, arg);
+   case BLKREPORTZONE:
+   return blkdev_report_zones_ioctl(bdev, mode, cmd, arg);
+   case BLKRESETZONE:
+   return blkdev_reset_zones_ioctl(bdev, mode, cmd, arg);
case HDIO_GETGEO:
return blkdev_getgeo(bdev, argp);
case BLKRAGET:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 252043f..90097dd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -316,6 +316,27 @@ extern int blkdev_report_zones(struct block_device *bdev,
 extern int blkdev_reset_zones(struct block_device *bdev, sector_t sectors,
  sector_t nr_sectors, gfp_t gfp_mask);
 
+extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
+unsigned int cmd, unsigned long arg);
+extern int blkdev_reset_zones_ioctl(struct block_device *bdev, fmode_

[PATCH v5 5/7] block: Implement support for zoned block devices

2016-09-29 Thread Damien Le Moal
From: Hannes Reinecke 

Implement zoned block device zone information reporting and reset.
Zone information is reported as struct blk_zone. This implementation
does not differentiate between host-aware and host-managed device
models and is valid for both. Two functions are provided:
blkdev_report_zones for discovering the zone configuration of a
zoned block device, and blkdev_reset_zones for resetting the write
pointer of sequential zones. The helper functions blk_queue_zone_size
and bdev_zone_size are also provided for, as their names suggest,
obtaining the zone size (in 512B sectors) of the zones of the device.

Signed-off-by: Hannes Reinecke 

[Damien: * Removed the zone cache
 * Implement report zones operation based on earlier proposal
   by Shaun Tancheff ]
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Shaun Tancheff 
Tested-by: Shaun Tancheff 

---
 block/Kconfig |   8 ++
 block/Makefile|   1 +
 block/blk-zoned.c | 257 ++
 include/linux/blkdev.h|  31 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned.h | 103 +
 6 files changed, 401 insertions(+)
 create mode 100644 block/blk-zoned.c
 create mode 100644 include/uapi/linux/blkzoned.h

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..6b0ad08 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -89,6 +89,14 @@ config BLK_DEV_INTEGRITY
T10/SCSI Data Integrity Field or the T13/ATA External Path
Protection.  If in doubt, say N.
 
+config BLK_DEV_ZONED
+   bool "Zoned block device support"
+   ---help---
+   Block layer zoned block device support. This option enables
+   support for ZAC/ZBC host-managed and host-aware zoned block devices.
+
+   Say yes here if you have a ZAC or ZBC storage device.
+
 config BLK_DEV_THROTTLING
bool "Block layer bio throttling support"
depends on BLK_CGROUP=y
diff --git a/block/Makefile b/block/Makefile
index 36acdd7..9371bc7 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -22,4 +22,5 @@ obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
 obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)   += cmdline-parser.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
+obj-$(CONFIG_BLK_DEV_ZONED)+= blk-zoned.o
 obj-$(CONFIG_BLK_MQ_PCI)   += blk-mq-pci.o
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
new file mode 100644
index 000..1603573
--- /dev/null
+++ b/block/blk-zoned.c
@@ -0,0 +1,257 @@
+/*
+ * Zoned block device handling
+ *
+ * Copyright (c) 2015, Hannes Reinecke
+ * Copyright (c) 2015, SUSE Linux GmbH
+ *
+ * Copyright (c) 2016, Damien Le Moal
+ * Copyright (c) 2016, Western Digital
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static inline sector_t blk_zone_start(struct request_queue *q,
+ sector_t sector)
+{
+   sector_t zone_mask = blk_queue_zone_size(q) - 1;
+
+   return sector & ~zone_mask;
+}
+
+/*
+ * Check that a zone report belongs to the partition.
+ * If yes, fix its start sector and write pointer, copy it in the
+ * zone information array and return true. Return false otherwise.
+ */
+static bool blkdev_report_zone(struct block_device *bdev,
+  struct blk_zone *rep,
+  struct blk_zone *zone)
+{
+   sector_t offset = get_start_sect(bdev);
+
+   if (rep->start < offset)
+   return false;
+
+   rep->start -= offset;
+   if (rep->start + rep->len > bdev->bd_part->nr_sects)
+   return false;
+
+   if (rep->type == BLK_ZONE_TYPE_CONVENTIONAL)
+   rep->wp = rep->start + rep->len;
+   else
+   rep->wp -= offset;
+   memcpy(zone, rep, sizeof(struct blk_zone));
+
+   return true;
+}
+
+/**
+ * blkdev_report_zones - Get zones information
+ * @bdev:  Target block device
+ * @sector:Sector from which to report zones
+ * @zones: Array of zone structures where to return the zones information
+ * @nr_zones:  Number of zone structures in the zone array
+ * @gfp_mask:  Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Get zone information starting from the zone containing @sector.
+ *The number of zone information reported may be less than the number
+ *requested by @nr_zones. The number of zones actually reported is
+ *returned in @nr_zones.
+ */
+int blkdev_report_zones(struct block_device *bdev,
+   sector_t sector,
+   struct blk_zone *zones,
+   unsigned int *nr_zones,
+   gfp_t gfp_mask)
+{
+   struct request_queue *q = bdev_get_queue(bdev);
+   struct blk_zone_report_hdr *hdr;
+   unsigned int nrz = *nr_zones;
+   struct page *page;
+   unsigned int nr_rep;
+   

[PATCH v5 4/7] block: Define zoned block device operations

2016-09-29 Thread Damien Le Moal
From: Shaun Tancheff 

Define REQ_OP_ZONE_REPORT and REQ_OP_ZONE_RESET for handling zones of
host-managed and host-aware zoned block devices. With these two
new operations, the total number of operations defined reaches 8 and
still fits within the 3-bit definition of REQ_OP_BITS.

Signed-off-by: Shaun Tancheff 
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 

---
 block/blk-core.c  | 4 
 include/linux/blk_types.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..e4eda5d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1941,6 +1941,10 @@ generic_make_request_checks(struct bio *bio)
case REQ_OP_WRITE_SAME:
if (!bdev_write_same(bio->bi_bdev))
goto not_supported;
break;
+   case REQ_OP_ZONE_REPORT:
+   case REQ_OP_ZONE_RESET:
+   if (!bdev_is_zoned(bio->bi_bdev))
+   goto not_supported;
+   break;
default:
break;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cd395ec..dd50dce 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -243,6 +243,8 @@ enum req_op {
REQ_OP_SECURE_ERASE,/* request to securely erase sectors */
REQ_OP_WRITE_SAME,  /* write same block many times */
REQ_OP_FLUSH,   /* request for cache flush */
+   REQ_OP_ZONE_REPORT, /* Get zone information */
+   REQ_OP_ZONE_RESET,  /* Reset a zone write pointer */
 };
 
 #define REQ_OP_BITS 3
-- 
2.7.4



[PATCH v5 0/7] ZBC / Zoned block device support

2016-09-29 Thread Damien Le Moal
This series introduces support for zoned block devices. It integrates
earlier submissions by Hannes Reinecke and Shaun Tancheff. Compared to the
previous series version, the code was significantly simplified by limiting
support to zoned devices satisfying the following conditions:
1) All zones of the device are the same size, with the possible
   exception of a smaller last runt zone.
2) For host-managed disks, reads must be unrestricted (read commands do
   not fail due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these 2 conditions are ignored.

These 2 conditions allowed dropping the zone information cache implemented
in the previous version. This simplifies the code and also reduces the memory
consumption at run time. Support for zoned devices now requires only one
bit per zone (less than 8KB in total). This bit field is used to write-lock
zones and prevent the concurrent execution of multiple write commands in
the same zone. This avoids write ordering problems at dispatch time, for
both the simple queue and scsi-mq settings.

The new operations introduced to support zone manipulation were reduced to
only the two main ZBC/ZAC defined commands: REPORT ZONES (REQ_OP_ZONE_REPORT)
and RESET WRITE POINTER (REQ_OP_ZONE_RESET). This brings the total number of
operations defined to 8, which fits in the 3 bits (REQ_OP_BITS) reserved for
operation code in bio->bi_opf and req->cmd_flags.

Most of the ZBC specific code is kept out of sd.c and implemented in the
new file sd_zbc.c. Similarly, at the block layer, most of the zoned block
device code is implemented in the new blk-zoned.c.

For host-managed zoned block devices, the sequential write constraint of
write pointer zones is exposed to the user. Users of the disk (applications,
file systems or device mappers) must sequentially write to zones. This means
that for raw block device accesses from applications, buffered writes are
unreliable and direct I/Os must be used (or buffered writes with O_SYNC).

Access to zone manipulation operations is also provided to applications
through a set of new ioctls. This allows applications operating on raw
block devices (e.g. mkfs.xxx) to discover a device's zone layout and
manipulate zone state.

Changes from v4:
* Changed interface of sd_zbc_setup_read_write

Changes from v3:
* Fixed several typos and tabs/spaces
* Added description of zoned and chunk_sectors queue attributes in
  Documentation/ABI/testing/sysfs-block
* Fixed the sd_read_capacity call in sd.c to avoid missing information on
  the first pass of a disk scan
* Fixed scsi_disk zone-related fields to use logical block size units
  instead of 512B sector units.

Changes from v2:
* Use kcalloc to allocate zone information array for ioctl
* Export GPL the functions blkdev_report_zones and blkdev_reset_zones
* Shuffled uapi definitions from patch 7 into patch 5

Damien Le Moal (1):
  block: Add 'zoned' queue limit

Hannes Reinecke (4):
  blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  block: update chunk_sectors in blk_stack_limits()
  block: Implement support for zoned block devices
  sd: Implement support for ZBC devices

Shaun Tancheff (2):
  block: Define zoned block device operations
  blk-zoned: implement ioctls

 Documentation/ABI/testing/sysfs-block |  29 ++
 block/Kconfig |   8 +
 block/Makefile|   1 +
 block/blk-core.c  |   4 +
 block/blk-settings.c  |   5 +
 block/blk-sysfs.c |  29 ++
 block/blk-zoned.c | 350 +++
 block/ioctl.c |   4 +
 drivers/scsi/Makefile |   1 +
 drivers/scsi/sd.c | 141 ++--
 drivers/scsi/sd.h |  67 
 drivers/scsi/sd_zbc.c | 627 ++
 include/linux/blk_types.h |   2 +
 include/linux/blkdev.h|  99 ++
 include/scsi/scsi_proto.h |  17 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned.h | 143 
 include/uapi/linux/fs.h   |   4 +
 18 files changed, 1499 insertions(+), 33 deletions(-)
 create mode 100644 block/blk-zoned.c
 create mode 100644 drivers/scsi/sd_zbc.c
 create mode 100644 include/uapi/linux/blkzoned.h

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 6/7] sd: Implement support for ZBC devices

2016-09-29 Thread Martin K. Petersen
> "Damien" == Damien Le Moal  writes:

Damien,

Almost there! And A-OK on the read capacity changes.

However:

@@ -844,6 +850,13 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
+   if (sd_is_zoned(sdkp)) {
+   /* sd_zbc_setup_read_write uses block layer sector units */

That comment really says: "I am doing confusing stuff that doesn't
follow the normal calling convention in the driver". Plus it's another
case of using block layer sectors where they shouldn't be.

Please just pass the scsi_cmnd to sd_zbc_setup_read_write() like it's done
for sd_zbc_setup_reset_cmnd() and the regular sd_setup_* calls. And then
no commentary is necessary...

+   ret = sd_zbc_setup_read_write(sdkp, rq, sector, nr_sectors);
+   if (ret != BLKPREP_OK)
+   return ret;
+   }
+

@@ -963,6 +976,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd 
*SCpnt)
SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
(unsigned long long)block));
 
+   if (sd_is_zoned(sdkp)) {
+   /* sd_zbc_setup_read_write uses block layer sector units */
+   ret = sd_zbc_setup_read_write(sdkp, rq, block, this_count);
+   if (ret != BLKPREP_OK)
+   goto out;
+   }
+

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 5/7] block: Implement support for zoned block devices

2016-09-29 Thread Martin K. Petersen
> "Damien" == Damien Le Moal  writes:

Damien> Implement zoned block device zone information reporting and
Damien> reset.  Zone information is reported as struct blk_zone. This
Damien> implementation does not differentiate between host-aware and
Damien> host-managed device models and is valid for both. Two functions
Damien> are provided: blkdev_report_zones for discovering the zone
Damien> configuration of a zoned block device, and blkdev_reset_zones
Damien> for resetting the write pointer of sequential zones. The helper
Damien> functions blk_queue_zone_size and bdev_zone_size are also
Damien> provided for, as their names suggest, obtaining the zone size (in
Damien> 512B sectors) of the zones of the device.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 4/7] block: Define zoned block device operations

2016-09-29 Thread Martin K. Petersen
> "Damien" == Damien Le Moal  writes:

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 3/7] block: update chunk_sectors in blk_stack_limits()

2016-09-29 Thread Martin K. Petersen
> "Damien" == Damien Le Moal  writes:

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes

2016-09-29 Thread Martin K. Petersen
> "Damien" == Damien Le Moal  writes:

Damien> From: Hannes Reinecke  The queue limits already
Damien> have a 'chunk_sectors' setting, so we should be presenting it
Damien> via sysfs.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 1/7] block: Add 'zoned' queue limit

2016-09-29 Thread Martin K. Petersen
> "Damien" == Damien Le Moal  writes:

Damien> Add the zoned queue limit to indicate the zoning model of a
Damien> block device.  Defined values are 0 (BLK_ZONED_NONE) for regular
Damien> block devices, 1 (BLK_ZONED_HA) for host-aware zone block
Damien> devices and 2 (BLK_ZONED_HM) for host-managed zone block
Damien> devices. The standard-defined drive-managed model is not
Damien> represented here since these block devices do not provide any
Damien> command for accessing zone information. Drive managed model
Damien> devices will be reported as BLK_ZONED_NONE.

Damien> The helper functions blk_queue_zoned_model and bdev_zoned_model
Damien> return the zoned limit and the functions blk_queue_is_zoned and
Damien> bdev_is_zoned return a boolean for callers to test if a block
Damien> device is zoned.

Damien> The zoned attribute is also exported as a string to applications
Damien> via sysfs. BLK_ZONED_NONE shows as "none", BLK_ZONED_HA as
Damien> "host-aware" and BLK_ZONED_HM as "host-managed".

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

2016-09-29 Thread Ming Lei
On Thu, Sep 29, 2016 at 7:59 AM, Bart Van Assche
 wrote:
> blk_quiesce_queue() prevents that new queue_rq() invocations

blk_mq_quiesce_queue()

> occur and waits until ongoing invocations have finished. This
> function does *not* wait until all outstanding requests have

I guess it may still end up waiting in the end, since this approach can
easily exhaust the tag space, :-)

> finished (this means invocation of request.end_io()).
> blk_resume_queue() resumes normal I/O processing.
>
> Signed-off-by: Bart Van Assche 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 
> ---
>  block/blk-mq.c | 137 
> -
>  include/linux/blk-mq.h |   2 +
>  include/linux/blkdev.h |   5 ++
>  3 files changed, 131 insertions(+), 13 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d8c45de..f5c71ad 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -58,15 +58,23 @@ static void blk_mq_hctx_clear_pending(struct 
> blk_mq_hw_ctx *hctx,
> sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
>  }
>
> -void blk_mq_freeze_queue_start(struct request_queue *q)
> +static bool __blk_mq_freeze_queue_start(struct request_queue *q,
> +   bool kill_percpu_ref)
>  {
> int freeze_depth;
>
> freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
> if (freeze_depth == 1) {
> -   percpu_ref_kill(&q->q_usage_counter);
> +   if (kill_percpu_ref)
> +   percpu_ref_kill(&q->q_usage_counter);
> blk_mq_run_hw_queues(q, false);
> }
> +   return freeze_depth == 1;
> +}
> +
> +void blk_mq_freeze_queue_start(struct request_queue *q)
> +{
> +   __blk_mq_freeze_queue_start(q, true);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
>
> @@ -102,19 +110,90 @@ void blk_mq_freeze_queue(struct request_queue *q)
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
>
> -void blk_mq_unfreeze_queue(struct request_queue *q)
> +static bool __blk_mq_unfreeze_queue(struct request_queue *q,
> +   bool reinit_percpu_ref)
>  {
> int freeze_depth;
>
> freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
> WARN_ON_ONCE(freeze_depth < 0);
> if (!freeze_depth) {
> -   percpu_ref_reinit(&q->q_usage_counter);
> +   if (reinit_percpu_ref)
> +   percpu_ref_reinit(&q->q_usage_counter);
> wake_up_all(&q->mq_freeze_wq);
> }
> +   return freeze_depth == 0;
> +}
> +
> +void blk_mq_unfreeze_queue(struct request_queue *q)
> +{
> +   __blk_mq_unfreeze_queue(q, true);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>
> +/**
> + * blk_mq_quiesce_queue() - wait until all pending queue_rq calls have 
> finished
> + *
> + * Prevent that new I/O requests are queued and wait until all pending
> + * queue_rq() calls have finished. Must not be called if the queue has 
> already
> + * been frozen. Additionally, freezing the queue after having quiesced the
> + * queue and before resuming the queue is not allowed.
> + *
> + * Note: this function does not prevent that the struct request end_io()
> + * callback function is invoked.
> + */
> +void blk_mq_quiesce_queue(struct request_queue *q)
> +{
> +   struct blk_mq_hw_ctx *hctx;
> +   unsigned int i;
> +   bool res, rcu = false;
> +
> +   spin_lock_irq(q->queue_lock);
> +   WARN_ON_ONCE(blk_queue_quiescing(q));
> +   queue_flag_set(QUEUE_FLAG_QUIESCING, q);
> +   spin_unlock_irq(q->queue_lock);
> +
> +   res = __blk_mq_freeze_queue_start(q, false);

The implementation looks a bit tricky and complicated. If the percpu
usage counter isn't killed, it isn't necessary to touch .mq_freeze_depth,
since you use QUEUE_FLAG_QUIESCING to set/get this state of
the queue.

Then using synchronize_rcu() and rcu_read_lock()/rcu_read_unlock()
together with the QUIESCING flag may be enough to wait for ongoing
invocations of .queue_rq() to complete and to avoid running new ones,
right?

> +   WARN_ON_ONCE(!res);
> +   queue_for_each_hw_ctx(q, hctx, i) {
> +   if (hctx->flags & BLK_MQ_F_BLOCKING) {
> +   mutex_lock(&hctx->queue_rq_mutex);
> +   mutex_unlock(&hctx->queue_rq_mutex);

Could you explain a bit why BLK_MQ_F_BLOCKING is treated so specially?
That flag just means the hw queue always needs to be scheduled
asynchronously, and of course even when the flag isn't set, it may still
be run asynchronously too. So it looks like it should be enough to just use RCU.

> +   } else {
> +   rcu = true;
> +   }
> +   }
> +   if (rcu)
> +   synchronize_rcu();
> +
> +   spin_lock_irq(q->queue_lock);
> +   WARN_ON_ONCE(!blk_queue_quiescing(q));
> +   queue_flag_clear(QUEUE_FLAG_QUIESCING, q);
> +   spin_unlock_irq(q->queue_lock);
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
> +
> +/**
> + 

[PATCH 3/3] block: implement (some of) fallocate for block devices

2016-09-29 Thread Darrick J. Wong
After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

Signed-off-by: Darrick J. Wong 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Bart Van Assche 
---
 fs/block_dev.c |   77 
 fs/open.c  |3 +-
 2 files changed, 79 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 08ae993..777fd9b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "internal.h"
 
@@ -1787,6 +1788,81 @@ static const struct address_space_operations 
def_blk_aops = {
.is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#define BLKDEV_FALLOC_FL_SUPPORTED \
+   (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |   \
+FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+loff_t len)
+{
+   struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+   struct request_queue *q = bdev_get_queue(bdev);
+   struct address_space *mapping;
+   loff_t end = start + len - 1;
+   loff_t isize;
+   int error;
+
+   /* Fail if we don't recognize the flags. */
+   if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+   return -EOPNOTSUPP;
+
+   /* Don't go off the end of the device. */
+   isize = i_size_read(bdev->bd_inode);
+   if (start >= isize)
+   return -EINVAL;
+   if (end >= isize) {
+   if (mode & FALLOC_FL_KEEP_SIZE) {
+   len = isize - start;
+   end = start + len - 1;
+   } else
+   return -EINVAL;
+   }
+
+   /*
+* Don't allow IO that isn't aligned to logical block size.
+*/
+   if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+   return -EINVAL;
+
+   /* Invalidate the page cache, including dirty pages. */
+   mapping = bdev->bd_inode->i_mapping;
+   truncate_inode_pages_range(mapping, start, end);
+
+   switch (mode) {
+   case FALLOC_FL_ZERO_RANGE:
+   case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+   error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+   GFP_KERNEL, false);
+   break;
+   case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+   /* Only punch if the device can do zeroing discard. */
+   if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+   return -EOPNOTSUPP;
+   error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+GFP_KERNEL, 0);
+   break;
+   case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+   if (!blk_queue_discard(q))
+   return -EOPNOTSUPP;
+   error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+GFP_KERNEL, 0);
+   break;
+   default:
+   return -EOPNOTSUPP;
+   }
+   if (error)
+   return error;
+
+   /*
+* Invalidate again; if someone wandered in and dirtied a page,
+* the caller will be given -EBUSY.  The third argument is
+* inclusive, so the rounding here is safe.
+*/
+   return invalidate_inode_pages2_range(mapping,
+start >> PAGE_SHIFT,
+end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
.open   = blkdev_open,
.release= blkdev_close,
@@ -1801,6 +1877,7 @@ const struct file_operations def_blk_fops = {
 #endif
.splice_read= generic_file_splice_read,
.splice_write   = iter_file_splice_write,
+   .fallocate  = blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 4fd6e25..01b6092 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t 
offset, loff_t len)
 * Let individual file system decide if it s

[PATCH 2/3] block: require write_same and discard requests align to logical block size

2016-09-29 Thread Darrick J. Wong
Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.

Signed-off-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Hannes Reinecke 
---
 block/blk-lib.c |   15 +++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 083e56f..46fe924 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -31,6 +31,7 @@ int __blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
unsigned int granularity;
enum req_op op;
int alignment;
+   sector_t bs_mask;
 
if (!q)
return -ENXIO;
@@ -50,6 +51,10 @@ int __blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
op = REQ_OP_DISCARD;
}
 
+   bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+   if ((sector | nr_sects) & bs_mask)
+   return -EINVAL;
+
/* Zero-sector (unknown) and one-sector granularities are the same.  */
granularity = max(q->limits.discard_granularity >> 9, 1U);
alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -150,10 +155,15 @@ int blkdev_issue_write_same(struct block_device *bdev, 
sector_t sector,
unsigned int max_write_same_sectors;
struct bio *bio = NULL;
int ret = 0;
+   sector_t bs_mask;
 
if (!q)
return -ENXIO;
 
+   bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+   if ((sector | nr_sects) & bs_mask)
+   return -EINVAL;
+
/* Ensure that max_write_same_sectors doesn't overflow bi_size */
max_write_same_sectors = UINT_MAX >> 9;
 
@@ -202,6 +212,11 @@ static int __blkdev_issue_zeroout(struct block_device 
*bdev, sector_t sector,
int ret;
struct bio *bio = NULL;
unsigned int sz;
+   sector_t bs_mask;
+
+   bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+   if ((sector | nr_sects) & bs_mask)
+   return -EINVAL;
 
while (nr_sects != 0) {
bio = next_bio(bio, min(nr_sects, (sector_t)BIO_MAX_PAGES),



[PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.

2016-09-29 Thread Darrick J. Wong
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.

Signed-off-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Bart Van Assche 
Reviewed-by: Hannes Reinecke 
---
 block/ioctl.c |   18 --
 1 file changed, 12 insertions(+), 6 deletions(-)


diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..755119c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,8 @@ static int blk_ioctl_zeroout(struct block_device *bdev, 
fmode_t mode,
unsigned long arg)
 {
uint64_t range[2];
-   uint64_t start, len;
+   struct address_space *mapping;
+   uint64_t start, end, len;
 
if (!(mode & FMODE_WRITE))
return -EBADF;
@@ -235,18 +236,23 @@ static int blk_ioctl_zeroout(struct block_device *bdev, 
fmode_t mode,
 
start = range[0];
len = range[1];
+   end = start + len - 1;
 
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
-   start >>= 9;
-   len >>= 9;
-
-   if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+   if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+   return -EINVAL;
+   if (end < start)
return -EINVAL;
 
-   return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+   /* Invalidate the page cache, including dirty pages */
+   mapping = bdev->bd_inode->i_mapping;
+   truncate_inode_pages_range(mapping, start, end);
+
+   return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+   false);
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)



[PATCH v11 0/3] fallocate for block devices

2016-09-29 Thread Darrick J. Wong
Hi Andrew & others,

This is a resend of the patchset to fix page cache coherency with
BLKZEROOUT and implement fallocate for block devices.  This time I'm
sending it direct to Andrew for inclusion because the block layer
maintainer has not been responsive over the past year of submissions.
Can this please go upstream for 4.9?

The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds.  Without this patch we still have the pagecache
coherence bug that's been in the kernel forever.

The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size.  Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.

The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices.  It also allows
the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing
discard.

Test cases for the new block device fallocate are now in xfstests as
generic/349-351.

Comments and questions are, as always, welcome.  Patches are against
4.8-rc8.

v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.
v8: More code rearranging, rebase to 4.6-rc3, and dig into alignment
issues.
v9: Forward port to 4.7.
v10: Forward port to 4.8.  Remove the extra call to
invalidate_inode_pages2_range per Bart Van Assche's request.
v11: Minor fallocate code cleanups suggested by Bart Van Assche.

--D


Re: [PATCH v2 3/3] block: implement (some of) fallocate for block devices

2016-09-29 Thread Darrick J. Wong
On Thu, Sep 29, 2016 at 01:08:57PM -0700, Bart Van Assche wrote:
> On 09/28/2016 07:19 PM, Darrick J. Wong wrote:
> >After much discussion, it seems that the fallocate feature flag
> >FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> >FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> >whitelisted for zeroing SCSI UNMAP.  Punch still requires that
> >FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
> >device will be clamped to the device size if KEEP_SIZE is set; or will
> >return -EINVAL if not.  Both start and length must be aligned to the
> >device's logical block size.
> >
> >Since the semantics of fallocate are fairly well established already,
> >wire up the two pieces.  The other fallocate variants (collapse range,
> >insert range, and allocate blocks) are not supported.
> 
> For the FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE
> case, it's probably safer not to try to send a discard to block devices that
> do not support discard in order not to hit block driver bugs. But that's
> something we can still discuss later. Hence:

I'll just change it to check the queue flags and post a new revision.
At this point I might as well repost the whole thing to reflect the
reviewed-bys.

--D

> Reviewed-by: Bart Van Assche 


Re: [PATCH v2 3/3] block: implement (some of) fallocate for block devices

2016-09-29 Thread Bart Van Assche

On 09/28/2016 07:19 PM, Darrick J. Wong wrote:

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.


For the FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | 
FALLOC_FL_NO_HIDE_STALE case, it's probably safer not to try to send a 
discard to block devices that do not support discard in order not to hit 
block driver bugs. But that's something we can still discuss later. Hence:


Reviewed-by: Bart Van Assche 


Re: [PATCH 4/6] vfio_pci: use pci_irq_allocate_vectors

2016-09-29 Thread Alex Williamson
On Thu, 29 Sep 2016 21:24:04 +0200
Christoph Hellwig  wrote:

> On Thu, Sep 29, 2016 at 01:21:01PM -0600, Alex Williamson wrote:
> > Sorry for the delay, slipped by me.  Overall a really nice cleanup.
> > One tiny nit, the commit log mis-names the function as
> > pci_irq_allocate_vectors instead of pci_alloc_irq_vectors.  With that,
> > 
> > Acked-by: Alex Williamson 
> > 
> > Let me know if you're wanting me to pull this through my tree, I'm
> > assuming not.  Thanks,  
> 
> Please pull it in through your tree.  If you can also just fix up that
> typo that'd be even better.

Will do.  Thanks,

Alex


Re: [PATCH 4/6] vfio_pci: use pci_irq_allocate_vectors

2016-09-29 Thread Christoph Hellwig
On Thu, Sep 29, 2016 at 01:21:01PM -0600, Alex Williamson wrote:
> Sorry for the delay, slipped by me.  Overall a really nice cleanup.
> One tiny nit, the commit log mis-names the function as
> pci_irq_allocate_vectors instead of pci_alloc_irq_vectors.  With that,
> 
> Acked-by: Alex Williamson 
> 
> Let me know if you're wanting me to pull this through my tree, I'm
> assuming not.  Thanks,

Please pull it in through your tree.  If you can also just fix up that
typo that'd be even better.

Thanks a lot!


Re: [PATCH 4/6] vfio_pci: use pci_irq_allocate_vectors

2016-09-29 Thread Alex Williamson
On Sun, 11 Sep 2016 15:31:26 +0200
Christoph Hellwig  wrote:

> Simplify the interrupt setup by using the new PCI layer helpers.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c   | 45 
> +
>  drivers/vfio/pci/vfio_pci_private.h |  1 -
>  2 files changed, 10 insertions(+), 36 deletions(-)

Sorry for the delay, slipped by me.  Overall a really nice cleanup.
One tiny nit, the commit log mis-names the function as
pci_irq_allocate_vectors instead of pci_alloc_irq_vectors.  With that,

Acked-by: Alex Williamson 

Let me know if you're wanting me to pull this through my tree, I'm
assuming not.  Thanks,

Alex
 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
> b/drivers/vfio/pci/vfio_pci_intrs.c
> index 152b438..a1d283e 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -250,6 +250,7 @@ static irqreturn_t vfio_msihandler(int irq, void *arg)
>  static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
>  {
>   struct pci_dev *pdev = vdev->pdev;
> + unsigned int flag = msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
>   int ret;
>  
>   if (!is_irq_none(vdev))
> @@ -259,35 +260,13 @@ static int vfio_msi_enable(struct vfio_pci_device 
> *vdev, int nvec, bool msix)
>   if (!vdev->ctx)
>   return -ENOMEM;
>  
> - if (msix) {
> - int i;
> -
> - vdev->msix = kzalloc(nvec * sizeof(struct msix_entry),
> -  GFP_KERNEL);
> - if (!vdev->msix) {
> - kfree(vdev->ctx);
> - return -ENOMEM;
> - }
> -
> - for (i = 0; i < nvec; i++)
> - vdev->msix[i].entry = i;
> -
> - ret = pci_enable_msix_range(pdev, vdev->msix, 1, nvec);
> - if (ret < nvec) {
> - if (ret > 0)
> - pci_disable_msix(pdev);
> - kfree(vdev->msix);
> - kfree(vdev->ctx);
> - return ret;
> - }
> - } else {
> - ret = pci_enable_msi_range(pdev, 1, nvec);
> - if (ret < nvec) {
> - if (ret > 0)
> - pci_disable_msi(pdev);
> - kfree(vdev->ctx);
> - return ret;
> - }
> + /* return the number of supported vectors if we can't get all: */
> + ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag);
> + if (ret < nvec) {
> + if (ret > 0)
> + pci_free_irq_vectors(pdev);
> + kfree(vdev->ctx);
> + return ret;
>   }
>  
>   vdev->num_ctx = nvec;
> @@ -315,7 +294,7 @@ static int vfio_msi_set_vector_signal(struct 
> vfio_pci_device *vdev,
>   if (vector < 0 || vector >= vdev->num_ctx)
>   return -EINVAL;
>  
> - irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> + irq = pci_irq_vector(pdev, vector);
>  
>   if (vdev->ctx[vector].trigger) {
>   free_irq(irq, vdev->ctx[vector].trigger);
> @@ -408,11 +387,7 @@ static void vfio_msi_disable(struct vfio_pci_device 
> *vdev, bool msix)
>  
>   vfio_msi_set_block(vdev, 0, vdev->num_ctx, NULL, msix);
>  
> - if (msix) {
> - pci_disable_msix(vdev->pdev);
> - kfree(vdev->msix);
> - } else
> - pci_disable_msi(pdev);
> + pci_free_irq_vectors(pdev);
>  
>   vdev->irq_type = VFIO_PCI_NUM_IRQS;
>   vdev->num_ctx = 0;
> diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> b/drivers/vfio/pci/vfio_pci_private.h
> index 2128de8..f561ac1 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -72,7 +72,6 @@ struct vfio_pci_device {
>   struct perm_bits*msi_perm;
>   spinlock_t  irqlock;
>   struct mutexigate;
> - struct msix_entry   *msix;
>   struct vfio_pci_irq_ctx *ctx;
>   int num_ctx;
>   int irq_type;



Re: [PATCH 5/6] genwqe: use pci_irq_allocate_vectors

2016-09-29 Thread Gabriel Krisman Bertazi
Christoph Hellwig  writes:

> On Thu, Sep 29, 2016 at 03:45:29PM -0300, Gabriel Krisman Bertazi wrote:
>> I'm stepping up to assist with the genwqe_card driver just now, since we
>> (ibm) missed some of the last patches that went in.  I'll add myself to
>> maintainers file.
>
> Can you forward it to Greg together with whatever other changes are
> pending for the driver?

sure, will do.

-- 
Gabriel Krisman Bertazi



Re: [PATCH 5/6] genwqe: use pci_irq_allocate_vectors

2016-09-29 Thread Christoph Hellwig
On Thu, Sep 29, 2016 at 03:45:29PM -0300, Gabriel Krisman Bertazi wrote:
> I'm stepping up to assist with the genwqe_card driver just now, since we
> (ibm) missed some of the last patches that went in.  I'll add myself to
> maintainers file.

Can you forward it to Greg together with whatever other changes are
pending for the driver?


Re: [PATCH 5/6] genwqe: use pci_irq_allocate_vectors

2016-09-29 Thread Gabriel Krisman Bertazi
Christoph Hellwig  writes:

> On Thu, Sep 29, 2016 at 03:28:02PM -0300, Gabriel Krisman Bertazi wrote:
>> Christoph Hellwig  writes:
>> 
>> > Simplify the interrupt setup by using the new PCI layer helpers.
>> 
>> Good clean up.  Tested and:
>> 
>> Acked-by: Gabriel Krisman Bertazi 
>
> Which tree should this go in through?

I'd say Greg's char-misc tree.

I'm stepping up to assist with the genwqe_card driver just now, since we
(ibm) missed some of the last patches that went in.  I'll add myself to
maintainers file.


-- 
Gabriel Krisman Bertazi



Re: [PATCH 3/6] skd: use pci_alloc_irq_vectors

2016-09-29 Thread Christoph Hellwig
On Sun, Sep 11, 2016 at 03:31:25PM +0200, Christoph Hellwig wrote:
> Switch the skd driver to use pci_alloc_irq_vectors.  We need two calls to
> pci_alloc_irq_vectors as skd only supports multiple MSI-X vectors, but not
> multiple MSI vectors.
> 
> Otherwise this cleans up a lot of cruft and allows a lot more common code.
> 
> Signed-off-by: Christoph Hellwig 

Jens, is this something you could pick up for 4.9?  I'd like to get
rid of all users of pci_enable_msi_range in this merge window.


Re: [PATCH 5/6] genwqe: use pci_irq_allocate_vectors

2016-09-29 Thread Christoph Hellwig
On Thu, Sep 29, 2016 at 03:28:02PM -0300, Gabriel Krisman Bertazi wrote:
> Christoph Hellwig  writes:
> 
> > Simplify the interrupt setup by using the new PCI layer helpers.
> 
> Good clean up.  Tested and:
> 
> Acked-by: Gabriel Krisman Bertazi 

Which tree should this go in through?


Re: [PATCH 5/6] genwqe: use pci_irq_allocate_vectors

2016-09-29 Thread Gabriel Krisman Bertazi
Christoph Hellwig  writes:

> Simplify the interrupt setup by using the new PCI layer helpers.

Good clean up.  Tested and:

Acked-by: Gabriel Krisman Bertazi 

> One odd thing about this driver is that it looks like it could request
> multiple MSI vectors, but it will then only ever use a single one.

I'll take a look at this.

-- 
Gabriel Krisman Bertazi



Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-09-29 Thread Josef Bacik

On 09/29/2016 12:41 PM, Wouter Verhelst wrote:

On Thu, Sep 29, 2016 at 10:03:50AM -0400, Josef Bacik wrote:

So think of it like normal disks with multiple channels.  We don't send flushes
down all the hwq's to make sure they are clear, we leave that decision up to the
application (usually a FS of course).


Well, when I asked earlier, Christoph said[1] that blk-mq assumes that
when a FLUSH is sent over one channel, and the reply comes in, that all
commands which have been received, regardless of which channel they were
received over, have reached disk.

[1] Message-ID: <20160915122304.ga15...@infradead.org>

It is impossible for nbd to make such a guarantee, due to head-of-line
blocking on TCP.



Huh, I missed that.  Yeah, that's not possible for us for sure; I think my
option idea is the least awful way forward if we want to address that
limitation.  Thanks,


Josef


Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-09-29 Thread Wouter Verhelst
On Thu, Sep 29, 2016 at 10:03:50AM -0400, Josef Bacik wrote:
> So think of it like normal disks with multiple channels.  We don't send
> flushes down all the hwq's to make sure they are clear, we leave that
> decision up to the application (usually a FS of course).

Well, when I asked earlier, Christoph said[1] that blk-mq assumes that
when a FLUSH is sent over one channel, and the reply comes in, that all
commands which have been received, regardless of which channel they were
received over, have reached disk.

[1] Message-ID: <20160915122304.ga15...@infradead.org>

It is impossible for nbd to make such a guarantee, due to head-of-line
blocking on TCP.

[...]
> perhaps we could add a flag later that says send all the flushes down
> all the connections for the paranoid, it should be relatively
> straightforward to do.  Thanks,

That's not an unreasonable approach, I guess.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [PATCH] blkcg: Unlock blkcg_pol_mutex once if cpd == NULL

2016-09-29 Thread Bart Van Assche

On 09/29/2016 03:03 AM, Tejun Heo wrote:

Hello,

On Mon, Sep 26, 2016 at 03:36:25PM -0700, Bart Van Assche wrote:

Unlocking a mutex twice is wrong. Hence modify blkcg_policy_register()
such that blkcg_pol_mutex is unlocked once if cpd == NULL. This patch
avoids that smatch reports the following error:

block/blk-cgroup.c:1378: blkcg_policy_register() error: double unlock 
'mutex:&blkcg_pol_mutex'

Signed-off-by: Bart Van Assche 
Cc: Tejun Heo 
Cc: 
---
 block/blk-cgroup.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index dd38e5c..cdbca1c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1327,8 +1327,10 @@ int blkcg_policy_register(struct blkcg_policy *pol)
for (i = 0; i < BLKCG_MAX_POLS; i++)
if (!blkcg_policy[i])
break;
-   if (i >= BLKCG_MAX_POLS)
+   if (i >= BLKCG_MAX_POLS) {
+   mutex_unlock(&blkcg_pol_mutex);
goto err_unlock;
+   }


Wouldn't it be better to drop the explicit mutex_unlock(&blkcg_pol_mutex)
on "if (!cpd)"?


Agreed. I will rework this patch accordingly. Thanks for the feedback.

Bart.


[PATCH v2] blkcg: Unlock blkcg_pol_mutex once if cpd == NULL

2016-09-29 Thread Bart Van Assche
Unlocking a mutex twice is wrong. Hence modify blkcg_policy_register()
such that blkcg_pol_mutex is unlocked once if cpd == NULL. This patch
avoids that smatch reports the following error:

block/blk-cgroup.c:1378: blkcg_policy_register() error: double unlock 
'mutex:&blkcg_pol_mutex'

Fixes: 06b285bd1125 ("blkcg: fix blkcg_policy_data allocation bug")
Signed-off-by: Bart Van Assche 
Cc: Tejun Heo 
Cc: 
---
 block/blk-cgroup.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index dd38e5c..b08ccbb 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1340,10 +1340,8 @@ int blkcg_policy_register(struct blkcg_policy *pol)
struct blkcg_policy_data *cpd;
 
cpd = pol->cpd_alloc_fn(GFP_KERNEL);
-   if (!cpd) {
-   mutex_unlock(&blkcg_pol_mutex);
+   if (!cpd)
goto err_free_cpds;
-   }
 
blkcg->cpd[pol->plid] = cpd;
cpd->blkcg = blkcg;
-- 
2.10.0




Re: [PATCH 2/6] ipr: use pci_irq_allocate_vectors

2016-09-29 Thread Christoph Hellwig
On Thu, Sep 29, 2016 at 09:01:44AM -0500, Brian King wrote:
> Thanks Christoph. Very nice. As I was reviewing the patch, I noticed
> the additional PCI_IRQ_AFFINITY flag, which is currently not being set
> in this patch. Is the intention to set that globally by default, or
> should I follow up with a one liner to add that to the ipr driver
> in the next patch set I send out?

Hi Brian,

PCI_IRQ_AFFINITY seems useful for ipr, especially if you also increase
the number of vectors above the current default 2.  And yes, please
make it a separate patch.


Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-09-29 Thread Josef Bacik

On 09/29/2016 05:52 AM, Wouter Verhelst wrote:

Hi Josef,

On Wed, Sep 28, 2016 at 04:01:32PM -0400, Josef Bacik wrote:

NBD can become contended on its single connection.  We have to serialize all
writes and we can only process one read response at a time.  Fix this by
allowing userspace to provide multiple connections to a single nbd device.  This
coupled with block-mq drastically increases performance in multi-process cases.
Thanks,


This reminds me: I've been pondering this for a while, and I think there
is no way we can guarantee the correct ordering of FLUSH replies in the
face of multiple connections, since a WRITE reply on one connection may
arrive before a FLUSH reply on another which it does not cover, even if
the server has no cache coherency issues otherwise.

Having said that, there can certainly be cases where that is not a
problem, and where performance considerations are more important than
reliability guarantees; so once this patch lands in the kernel (and the
necessary support patch lands in the userland utilities), I think I'll
just update the documentation to mention the problems that might ensue,
and be done with it.

I can see only a few ways in which to potentially solve this problem:
- Kernel-side nbd-client could send a FLUSH command over every channel,
  and only report successful completion once all replies have been
  received. This might negate some of the performance benefits, however.
- Multiplexing commands over a single connection (perhaps an SCTP one,
  rather than TCP); this would require some effort though, as you said,
  and would probably complicate the protocol significantly.



So think of it like normal disks with multiple channels.  We don't send flushes
down all the hwq's to make sure they are clear, we leave that decision up to the
application (usually a FS of course).  So what we're doing here is no worse than
what every real disk on the planet does, our hw queues just have a lot
longer transfer times and are more error prone ;).  I definitely think
documenting the behavior is important so that people don't expect magic to
happen, and perhaps we could add a flag later that says send all the flushes
down all the connections for the paranoid, it should be relatively
straightforward to do.  Thanks,


Josef



Re: [PATCH 2/6] ipr: use pci_irq_allocate_vectors

2016-09-29 Thread Brian King
Thanks Christoph. Very nice. As I was reviewing the patch, I noticed
the additional PCI_IRQ_AFFINITY flag, which is currently not being set
in this patch. Is the intention to set that globally by default, or
should I follow up with a one liner to add that to the ipr driver
in the next patch set I send out?

Acked-by: Brian King 

Thanks,

Brian


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



[BUG] block: bdi_register_owner() failure cause NULL pointer dereference

2016-09-29 Thread Michael Wang
Hi, Folks

We observed a hard lockup while trying a raid assemble with sas3ircu.
It started with a failure inside bdi_register_owner() due to a duplicated
kobj path, and later a NULL pointer dereference came up; after that the
system hung and we saw the hard lockup on screen.

The duplication issue could be with the scsi controller driver, and we are
going to upgrade it anyway, but my question is why don't we do some error
handling like:

diff --git a/block/genhd.c b/block/genhd.c
index a178c8e..318bc63 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -614,7 +614,15 @@ void device_add_disk(struct device *parent, struct gendisk 
*disk)
 
/* Register BDI before referencing it from bdev */
bdi = &disk->queue->backing_dev_info;
-   bdi_register_owner(bdi, disk_to_dev(disk));
+   if (bdi_register_owner(bdi, disk_to_dev(disk))) {
+   disk_release_events(disk);
+   blk_free_devt(devt);
+   disk->ev = NULL;
+   disk->first_minor = 0;
+   disk->major = 0;
+   WARN_ON(1);
+   return;
+   }
 
blk_register_region(disk_devt(disk), disk->minors, NULL,
exact_match, exact_lock, disk);

to prevent the following NULL pointer dereference and hard lockup?

Regards,
Michael Wang
Sep 29 09:53:28 st401b-3 systemd[1]: Starting Update UTMP about System Runlevel 
Changes...
Sep 29 09:53:28 st401b-3 ntpd[4970]: Listen and drop on 1 v6wildcard :: UDP 123
Sep 29 09:53:28 st401b-3 ntpd[4970]: Listen normally on 2 lo 127.0.0.1 UDP 123
Sep 29 09:53:28 st401b-3 ntpd[4970]: Listen normally on 3 eth0 10.41.12.3 UDP 
123
Sep 29 09:53:28 st401b-3 ntpd[4970]: Listen normally on 4 lo ::1 UDP 123
Sep 29 09:53:28 st401b-3 ntpd[4970]: Listen normally on 5 eth0 
fe80::ec4:7aff:feab:6b0 UDP 123
Sep 29 09:53:28 st401b-3 ntpd[4970]: peers refreshed
Sep 29 09:53:28 st401b-3 ntpd[4970]: Listening on routing socket on fd #22 for 
interface updates
Sep 29 09:53:28 st401b-3 systemd[1]: Started Update UTMP about System Runlevel 
Changes.
Sep 29 09:53:28 st401b-3 systemd[1]: Startup finished in 18.720s (kernel) + 
39.513s (userspace) = 58.233s.
Sep 29 09:55:01 st401b-3 CRON[5433]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)
Sep 29 09:55:01 st401b-3 CRON[5434]: (root) CMD (test -x 
/opt/profitbricks/bin/check_memory && /opt/profitbricks/bin/check_memory)
Sep 29 09:55:17 st401b-3 kernel: [  167.693658] scsi 0:1:0:0: Direct-Access 
LSI  Logical Volume   3000 PQ: 0 ANSI: 6
Sep 29 09:55:17 st401b-3 kernel: [  167.693795] scsi 0:1:0:0: RAID1: 
handle(0x0143), wwid(0x02f7ec7949091b05), pd_count(2), type(SSP)
Sep 29 09:55:17 st401b-3 kernel: [  167.694042] sd 0:1:0:0: [sdam] 5859373056 
512-byte logical blocks: (3.00 TB/2.73 TiB)
Sep 29 09:55:17 st401b-3 kernel: [  167.694044] sd 0:1:0:0: [sdam] 4096-byte 
physical blocks
Sep 29 09:55:17 st401b-3 kernel: [  167.694057] sd 0:1:0:0: Attached scsi 
generic sg40 type 0
Sep 29 09:55:17 st401b-3 kernel: [  167.694129] sd 0:1:0:0: [sdam] Write 
Protect is off
Sep 29 09:55:17 st401b-3 kernel: [  167.694131] sd 0:1:0:0: [sdam] Mode Sense: 
03 00 00 08
Sep 29 09:55:17 st401b-3 kernel: [  167.694166] sd 0:1:0:0: [sdam] No Caching 
mode page found
Sep 29 09:55:17 st401b-3 kernel: [  167.694282] sd 0:0:4:0: hidding raid 
component
Sep 29 09:55:17 st401b-3 kernel: [  167.694589] sd 0:1:0:0: [sdam] Assuming 
drive cache: write through
Sep 29 09:55:17 st401b-3 kernel: [  167.703346] sd 0:1:0:0: [sdam] Attached 
SCSI disk
Sep 29 09:55:17 st401b-3 kernel: [  167.703653] sd 0:0:5:0: hidding raid 
component
Sep 29 09:55:39 st401b-3 check_backup_lvm_push: critical: local git command 
rev-parse HEAD failed, retval: 0, fatal: Not a git repository: 
'/var/lib/backup-lvm/.git/'
Sep 29 09:56:03 st401b-3 kernel: [  213.684812] scsi 0:1:1:0: Direct-Access 
LSI  Logical Volume   3000 PQ: 0 ANSI: 6
Sep 29 09:56:03 st401b-3 kernel: [  213.684946] scsi 0:1:1:0: RAID1: 
handle(0x0142), wwid(0x0ab3eca651cd1b58), pd_count(2), type(SSP)
Sep 29 09:56:03 st401b-3 kernel: [  213.685189] sd 0:1:1:0: [sde] 5859373056 
512-byte logical blocks: (3.00 TB/2.73 TiB)
Sep 29 09:56:03 st401b-3 kernel: [  213.685192] sd 0:1:1:0: [sde] 4096-byte 
physical blocks
Sep 29 09:56:03 st401b-3 kernel: [  213.685204] sd 0:1:1:0: Attached scsi 
generic sg41 type 0
Sep 29 09:56:03 st401b-3 kernel: [  213.685275] sd 0:1:1:0: [sde] Write Protect 
is off
Sep 29 09:56:03 st401b-3 kernel: [  213.685277] sd 0:1:1:0: [sde] Mode Sense: 
03 00 00 08
Sep 29 09:56:03 st401b-3 kernel: [  213.685307] sd 0:1:1:0: [sde] No Caching 
mode page found
Sep 29 09:56:03 st401b-3 kernel: [  213.685423] sd 0:0:6:0: hidding raid 
component
Sep 29 09:56:03 st401b-3 kernel: [  213.685698] sd 0:1:1:0: [sde] Assuming 
drive cache: write through
Sep 29 09:56:03 st401b-3 kernel: [  213.686226] [ cut here ]
Sep 29 09:56:03 st401b-3 kernel: [  213.686232] WARNING: CPU: 2 PID: 2061 at 
fs/sysfs/dir.c:31 sysfs_

Re: [PATCH] blkcg: Unlock blkcg_pol_mutex once if cpd == NULL

2016-09-29 Thread Tejun Heo
Hello,

On Mon, Sep 26, 2016 at 03:36:25PM -0700, Bart Van Assche wrote:
> Unlocking a mutex twice is wrong. Hence modify blkcg_policy_register()
> such that blkcg_pol_mutex is unlocked once if cpd == NULL. This patch
> avoids that smatch reports the following error:
> 
> block/blk-cgroup.c:1378: blkcg_policy_register() error: double unlock 
> 'mutex:&blkcg_pol_mutex'
> 
> Signed-off-by: Bart Van Assche 
> Cc: Tejun Heo 
> Cc: 
> ---
>  block/blk-cgroup.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index dd38e5c..cdbca1c 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -1327,8 +1327,10 @@ int blkcg_policy_register(struct blkcg_policy *pol)
>   for (i = 0; i < BLKCG_MAX_POLS; i++)
>   if (!blkcg_policy[i])
>   break;
> - if (i >= BLKCG_MAX_POLS)
> + if (i >= BLKCG_MAX_POLS) {
> + mutex_unlock(&blkcg_pol_mutex);
>   goto err_unlock;
> + }

Wouldn't it be better to drop the explicit mutex_unlock(&blkcg_pol_mutex)
on "if (!cpd)"?

Thanks.

-- 
tejun


Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-09-29 Thread Wouter Verhelst
Hi Josef,

On Wed, Sep 28, 2016 at 04:01:32PM -0400, Josef Bacik wrote:
> NBD can become contended on its single connection.  We have to serialize all
> writes and we can only process one read response at a time.  Fix this by
> allowing userspace to provide multiple connections to a single nbd device.
> This coupled with block-mq drastically increases performance in multi-process
> cases.
> Thanks,

This reminds me: I've been pondering this for a while, and I think there
is no way we can guarantee the correct ordering of FLUSH replies in the
face of multiple connections, since a WRITE reply on one connection may
arrive before a FLUSH reply on another which it does not cover, even if
the server has no cache coherency issues otherwise.

Having said that, there can certainly be cases where that is not a
problem, and where performance considerations are more important than
reliability guarantees; so once this patch lands in the kernel (and the
necessary support patch lands in the userland utilities), I think I'll
just update the documentation to mention the problems that might ensue,
and be done with it.

I can see only a few ways in which to potentially solve this problem:
- Kernel-side nbd-client could send a FLUSH command over every channel,
  and only report successful completion once all replies have been
  received. This might negate some of the performance benefits, however.
- Multiplexing commands over a single connection (perhaps an SCTP one,
  rather than TCP); this would require some effort though, as you said,
  and would probably complicate the protocol significantly.

Regards,

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Enabling ATA Command Priorities

2016-09-29 Thread t...@kernel.org
Hello,

On Wed, Sep 28, 2016 at 03:43:32AM +, Adam Manzanares wrote:
> I prefer having the feature conditional so you can use the CFQ
> scheduler with I/O priorities as is. If you decide to enable the
> feature then the priorities will be passed down to the drive in
> addition to the work that the CFQ scheduler does. Since this feature
> may change the user perceived performance of the device I want to
> make sure they know what they are getting into.

Yeah, I prefer to have it behind an explicit flag given the history of
optional ATA features.  The feature is unlikely to matter to a lot of
people and is almost bound to break existing RT prio usages.

Thanks.

-- 
tejun


Re: [PATCH 2/3] ata: Enabling ATA Command Priorities

2016-09-29 Thread Tejun Heo
Hello,

On Tue, Sep 27, 2016 at 11:14:55AM -0700, Adam Manzanares wrote:
> +/**
> + *   ata_ncq_prio_enabled - Test whether NCQ prio is enabled
> + *   @dev: ATA device to test for
> + *
> + *   LOCKING:
> + *   spin_lock_irqsave(host lock)
> + *
> + *   RETURNS:
> + *   1 if NCQ prio is enabled for @dev, 0 otherwise.
> + */
> +static inline int ata_ncq_prio_enabled(struct ata_device *dev)
> +{
> + return (dev->flags & (ATA_DFLAG_PIO | ATA_DFLAG_NCQ_OFF |
> +   ATA_DFLAG_NCQ_PRIO)) == ATA_DFLAG_NCQ_PRIO;

I'm not sure this needs to test PIO and NCQ_OFF.  This function can
pretty much assume that it'd only be called in NCQ context, no?

Thanks.

-- 
tejun


Re: [PATCH 1/3] block: Add iocontext priority to request

2016-09-29 Thread Tejun Heo
Hello,

On Tue, Sep 27, 2016 at 11:14:54AM -0700, Adam Manzanares wrote:
> Patch adds association between iocontext and a request.
> 
> Signed-off-by: Adam Manzanares 

Can you please describe how this may impact existing usages?

Thanks.

-- 
tejun