Re: [PATCH 14/15] block/bsg: move queue creation into bsg_setup_queue

2017-01-11 Thread Christoph Hellwig
On Wed, Jan 11, 2017 at 05:01:22PM -0500, Mike Snitzer wrote:
> But I've seen you reference the need to stop multipath from allocating
> its own requests.  Are you referring to old request_fn request-based
> multipath's clone_old_rq:alloc_old_clone_request?

Yes, that one is the issue.  It allocates a struct request "blind",
that is, without knowing what queue it goes to.  With this change (or blk-mq
for that matter) we need to know the queue, because the request structures
might have additional data behind them and require additional initialization
for drivers that need per-request data.  We make use of the per-request
data for SCSI passthrough in this patch.
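
In code terms, the constraint is roughly the following -- a minimal sketch,
where mydrv_pdu and mydrv_issue are hypothetical names and the
blk_mq_alloc_request() arguments follow the 4.10-era API.  The per-request
payload ("pdu") lives directly behind struct request, sized by the cmd_size
of the tag set of the queue the request came from, so only a queue-aware
allocation can reserve room for it:

#include <linux/blk-mq.h>
#include <linux/err.h>
#include <linux/string.h>

/* Hypothetical per-request payload; its size is declared in the
 * queue's blk_mq_tag_set (cmd_size), so only the owning queue can
 * allocate a request with space for it behind struct request. */
struct mydrv_pdu {
	unsigned char	cmd[16];
	int		result;
};

static int mydrv_issue(struct request_queue *q)
{
	struct request *rq;
	struct mydrv_pdu *pdu;

	rq = blk_mq_alloc_request(q, WRITE, 0);	/* queue-aware, never "blind" */
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	pdu = blk_mq_rq_to_pdu(rq);	/* only valid for q's pdu layout */
	memset(pdu, 0, sizeof(*pdu));

	blk_mq_free_request(rq);	/* sketch stops short of issuing I/O */
	return 0;
}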

> Or how blk-mq request-based multipath gets a request from the blk-mq tag
> space (via blk_mq_alloc_request)?

That's fine because it works on the queue of the device that I/O is
submitted to.


[LSF/MM TOPIC][LSF/MM ATTEND] IO completion polling for block drivers

2017-01-11 Thread Stephen Bates
Hi

I'd like to discuss the ongoing work in the kernel to enable high priority
IO via polling for completion in the blk-mq subsystem.

Given that iopoll only really makes sense for low-latency, low queue depth
environments (i.e. down below 10-20us) I'd like to discuss which drivers
we think will need/want to be upgraded (aside from NVMe ;-)).

I'd also be interested in discussing how best to enable and disable
polling. In the past some of us have pushed for a "big hammer" to turn
polling on for a given device or HW queue [1]. I'd like to discuss this
again, as well as look at other methods above and beyond the preadv2
system call and its HIPRI flag.

Finally I'd like to discuss some of the recent work to improve the
heuristics around when to poll and when not to poll. I'd like to see if we
can come up with a more optimal balance between CPU load and average
completion times [2].

Stephen Bates

[1] http://marc.info/?l=linux-block&m=146307410101827&w=2
[2] http://marc.info/?l=linux-block&m=147803441801858&w=2
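
For concreteness, the user-facing knob mentioned above looks like this -- a
minimal sketch of a polled read via preadv2() with RWF_HIPRI. It assumes a
4.6+ kernel, a libc that exposes the preadv2() wrapper (RWF_HIPRI may need
to come from <linux/fs.h> on older toolchains), an io_poll-enabled queue,
and an O_DIRECT-capable device; /dev/nvme0n1 is just an example path:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>

int main(void)
{
	struct iovec iov;
	void *buf;
	ssize_t n;
	int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;

	iov.iov_base = buf;
	iov.iov_len = 4096;

	/* RWF_HIPRI asks the kernel to busy-poll for this completion */
	n = preadv2(fd, &iov, 1, 0, RWF_HIPRI);
	printf("read %zd bytes\n", n);
	return n < 0;
}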



Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Jens Axboe
On 01/11/2017 09:36 PM, Stephen Bates wrote:
>>>
>>> I'd like to attend LSF/MM and would like to discuss polling for block
>>> drivers.
>>>
>>> Currently there is blk-iopoll, but it is not as widely used as NAPI
>>> in the networking field, and according to Sagi's findings in [1]
>>> performance with polling is not on par with IRQ usage.
>>>
>>> On LSF/MM I'd like to discuss whether it is desirable to have NAPI-like
>>> polling in more block drivers and how to overcome the currently seen
>>> performance issues.
>>
>> It would be an interesting topic to discuss, as it is a shame that
>> blk-iopoll isn't used more widely.
>>
>> --
>> Jens Axboe
>>
> 
> I'd also be interested in this topic. Given that iopoll only really makes
> sense for low-latency, low queue depth environments (i.e. down below
> 10-20us) I'd like to discuss which drivers we think will need/want to be
> upgraded (aside from NVMe ;-)).
> 
> I'd also be interested in discussing how best to enable and disable
> polling. In the past some of us have pushed for a "big hammer" to turn
> polling on for a given device or HW queue [1]. I'd like to discuss this
> again, as well as look at other methods above and beyond the preadv2
> system call and its HIPRI flag.

This is a separate topic. The initial proposal is for polling for
interrupt mitigation; you are talking about polling in the context of
waiting for completion of an IO.

We can definitely talk about this form of polling as well, but it should
be a separate topic and probably proposed independently.

-- 
Jens Axboe



Re: [PATCH 0/2] Rename blk_queue_zone_size and bdev_zone_size

2017-01-11 Thread Damien Le Moal

On 1/12/17 13:38, Jens Axboe wrote:
> On 01/11/2017 09:36 PM, Damien Le Moal wrote:
>> Jens,
>>
>> On 1/12/17 12:52, Jens Axboe wrote:
>>> On Thu, Jan 12 2017, Damien Le Moal wrote:
>>>> All block device data fields and functions returning a number of 512B
>>>> sectors are by convention named xxx_sectors while names in the form
>>>> of xxx_size are generally used for a number of bytes. The
>>>> blk_queue_zone_size and bdev_zone_size functions were not following
>>>> this convention so rename them.
>>>>
>>>> This is a style fix and no functional change is introduced by this patch.
>>>
>>> I agree, this cleans it up. Applied.
>>
>> Thank you. I saw that you applied these to for-4.11/block. Could we get
>> them applied to 4.10-rc so that the zoned block device API is cleaner
>> from the first stable release of that API?
> 
> Sure, I did consider that as well. Since I just pushed out the 4.11
> branch, I'll rebase and yank these into the 4.10 branch instead.

Thanks !

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
damien.lem...@wdc.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com


Re: [PATCH 0/2] Rename blk_queue_zone_size and bdev_zone_size

2017-01-11 Thread Jens Axboe
On 01/11/2017 09:36 PM, Damien Le Moal wrote:
> Jens,
> 
> On 1/12/17 12:52, Jens Axboe wrote:
>> On Thu, Jan 12 2017, Damien Le Moal wrote:
>>> All block device data fields and functions returning a number of 512B
>>> sectors are by convention named xxx_sectors while names in the form
>>> of xxx_size are generally used for a number of bytes. The
>>> blk_queue_zone_size and bdev_zone_size functions were not following
>>> this convention so rename them.
>>>
>>> This is a style fix and no functional change is introduced by this patch.
>>
>> I agree, this cleans it up. Applied.
> 
> Thank you. I saw that you applied these to for-4.11/block. Could we get
> them applied to 4.10-rc so that the zoned block device API is cleaner
> from the first stable release of that API?

Sure, I did consider that as well. Since I just pushed out the 4.11
branch, I'll rebase and yank these into the 4.10 branch instead.

-- 
Jens Axboe



Re: [PATCH 0/2] Rename blk_queue_zone_size and bdev_zone_size

2017-01-11 Thread Damien Le Moal
Jens,

On 1/12/17 12:52, Jens Axboe wrote:
> On Thu, Jan 12 2017, Damien Le Moal wrote:
>> All block device data fields and functions returning a number of 512B
>> sectors are by convention named xxx_sectors while names in the form
>> of xxx_size are generally used for a number of bytes. The blk_queue_zone_size
>> and bdev_zone_size functions were not following this convention so rename
>> them.
>>
>> This is a style fix and no functional change is introduced by this patch.
> 
> I agree, this cleans it up. Applied.

Thank you. I saw that you applied these to for-4.11/block. Could we get
them applied to 4.10-rc so that the zoned block device API is cleaner
from the first stable release of that API?

Best regards.

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
damien.lem...@wdc.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com


Re: [PATCH] MAINTAINERS: Update maintainer entry for NBD

2017-01-11 Thread Jens Axboe
On 01/11/2017 01:41 PM, Josef Bacik wrote:
> The old maintainer's email is bouncing and I've rewritten most of this
> driver in recent months.  Also add linux-block to the mailing list
> and remove the old tree; I will send patches through the linux-block
> tree.  Thanks,

Added, thanks Josef.

-- 
Jens Axboe



[PATCH 1/2] block: Rename blk_queue_zone_size and bdev_zone_size

2017-01-11 Thread Damien Le Moal
All block device data fields and functions returning a number of 512B
sectors are by convention named xxx_sectors while names in the form
xxx_size are generally used for a number of bytes. The blk_queue_zone_size
and bdev_zone_size functions were not following this convention so rename
them.

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal 
---
 block/blk-zoned.c |  4 ++--
 block/partition-generic.c | 14 +++---
 include/linux/blkdev.h|  6 +++---
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 472211f..3bd15d8 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -16,7 +16,7 @@
 static inline sector_t blk_zone_start(struct request_queue *q,
  sector_t sector)
 {
-   sector_t zone_mask = blk_queue_zone_size(q) - 1;
+   sector_t zone_mask = blk_queue_zone_sectors(q) - 1;
 
return sector & ~zone_mask;
 }
@@ -222,7 +222,7 @@ int blkdev_reset_zones(struct block_device *bdev,
return -EINVAL;
 
/* Check alignment (handle eventual smaller last zone) */
-   zone_sectors = blk_queue_zone_size(q);
+   zone_sectors = blk_queue_zone_sectors(q);
if (sector & (zone_sectors - 1))
return -EINVAL;
 
diff --git a/block/partition-generic.c b/block/partition-generic.c
index d7beb6b..7afb990 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -434,7 +434,7 @@ static bool part_zone_aligned(struct gendisk *disk,
  struct block_device *bdev,
  sector_t from, sector_t size)
 {
-   unsigned int zone_size = bdev_zone_size(bdev);
+   unsigned int zone_sectors = bdev_zone_sectors(bdev);
 
/*
 * If this function is called, then the disk is a zoned block device
@@ -446,7 +446,7 @@ static bool part_zone_aligned(struct gendisk *disk,
 * regular block devices (no zone operation) and their zone size will
 * be reported as 0. Allow this case.
 */
-   if (!zone_size)
+   if (!zone_sectors)
return true;
 
/*
@@ -455,24 +455,24 @@ static bool part_zone_aligned(struct gendisk *disk,
 * use it. Check the zone size too: it should be a power of 2 number
 * of sectors.
 */
-   if (WARN_ON_ONCE(!is_power_of_2(zone_size))) {
+   if (WARN_ON_ONCE(!is_power_of_2(zone_sectors))) {
u32 rem;
 
-   div_u64_rem(from, zone_size, &rem);
+   div_u64_rem(from, zone_sectors, &rem);
if (rem)
return false;
if ((from + size) < get_capacity(disk)) {
-   div_u64_rem(size, zone_size, &rem);
+   div_u64_rem(size, zone_sectors, &rem);
if (rem)
return false;
}
 
} else {
 
-   if (from & (zone_size - 1))
+   if (from & (zone_sectors - 1))
return false;
if ((from + size) < get_capacity(disk) &&
-   (size & (zone_size - 1)))
+   (size & (zone_sectors - 1)))
return false;
 
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8369564..ff3d774 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -739,7 +739,7 @@ static inline bool blk_queue_is_zoned(struct request_queue *q)
}
 }
 
-static inline unsigned int blk_queue_zone_size(struct request_queue *q)
+static inline unsigned int blk_queue_zone_sectors(struct request_queue *q)
 {
return blk_queue_is_zoned(q) ? q->limits.chunk_sectors : 0;
 }
@@ -1536,12 +1536,12 @@ static inline bool bdev_is_zoned(struct block_device *bdev)
return false;
 }
 
-static inline unsigned int bdev_zone_size(struct block_device *bdev)
+static inline unsigned int bdev_zone_sectors(struct block_device *bdev)
 {
struct request_queue *q = bdev_get_queue(bdev);
 
if (q)
-   return blk_queue_zone_size(q);
+   return blk_queue_zone_sectors(q);
 
return 0;
 }
-- 
2.9.3



Re: [Lsf-pc] [LSF/MM ATTEND] OCSSDs - SMR, Hierarchical Interface, and Vector I/Os

2017-01-11 Thread James Bottomley
On Thu, 2017-01-12 at 11:35 +0900, Damien Le Moal wrote:
> > Just a note for the poor admin looking after the lists: to find all 
> > the ATTEND and TOPIC requests for the lists I fold up the threads 
> > to the top.  If you frame your attend request as a reply, it's 
> > possible it won't get counted because I didn't find it
> > 
> > so please *start a new thread* for ATTEND and TOPIC requests.
> 
> My apologies for the overhead. I will resend.
> Thank you.

You don't need to resend ... I've got you on the list.  I replied
publicly just in case there were any other people who did this that I
didn't notice.

James




Re: [Lsf-pc] [LSF/MM ATTEND] OCSSDs - SMR, Hierarchical Interface, and Vector I/Os

2017-01-11 Thread James Bottomley
On Thu, 2017-01-12 at 10:33 +0900, Damien Le Moal wrote:
> Hello,
> 
> A long discussion on the list followed this initial topic proposal 
> from Matias. I think this is a worthy topic to discuss at LSF in 
> order to steer development of the zoned block device interface in the 
> right direction. Considering the relation and implication to ZBC/ZAC
> support, I would like to attend LSF/MM to participate in this
> discussion.

Just a note for the poor admin looking after the lists: to find all the
ATTEND and TOPIC requests for the lists I fold up the threads to the
top.  If you frame your attend request as a reply, it's possible it
won't get counted because I didn't find it

so please *start a new thread* for ATTEND and TOPIC requests.

Thanks,

James

PS If you think you sent a TOPIC/ATTEND request in reply to something,
then I really haven't seen it because this is the first one I noticed,
and you should resend.




[LSF/MM ATTEND] OCSSDs - SMR, Hierarchical Interface, and Vector I/Os

2017-01-11 Thread Damien Le Moal
Hello,

A long discussion on the list followed this initial topic proposal from
Matias. I think this is a worthy topic to discuss at LSF in order to
steer development of the zoned block device interface in the right
direction. Considering the relation and implication to ZBC/ZAC support,
I would like to attend LSF/MM to participate in this discussion.

Thank you.

Best regards.

On 1/3/17 06:06, Matias Bjørling wrote:
> Hi,
> 
> The open-channel SSD subsystem is maturing, and drives are beginning to 
> become available on the market. The open-channel SSD interface is very 
> similar to the one exposed by SMR hard-drives. They both have a set of 
> chunks (zones) exposed, and zones are managed using open/close logic. 
> The main difference on open-channel SSDs is that they additionally expose
> multiple sets of zones through a hierarchical interface, which covers a
> number of levels (X channels, Y LUNs per channel, Z zones per LUN).
> 
> Given that the SMR interface is similar to the OCSSD interface, I'd like
> to propose discussing this at LSF/MM to align the efforts and make a
> clear path forward:
> 
> 1. SMR Compatibility
> 
> Can the SMR host interface be adapted to Open-Channel SSDs? For example, 
> the interface may be exposed as a single-level set of zones, which 
> ignores the channel and LUN concept for simplicity. Another approach
> might be to extend the SMR implementation sysfs entries to expose the
> hierarchy of the device (channels with X LUNs, and each LUN having a set
> of zones).
> 
> 2. How to expose the tens of LUNs that OCSSDs have?
> 
> An open-channel SSD typically has 64-256 LUNs that each act as a
> parallel unit. How can these be efficiently exposed?
> 
> One may expose these as separate namespaces/partitions. For a DAS with
> 24 drives, that would be 1536-6144 separate LUNs to manage, which would
> blow up the host with gendisk instances. On the other hand, it would
> give an excellent 1:1 mapping between the SMR interface and the OCSSD
> interface.
> 
> Alternatively, one could expose the device LUNs within a single LBA
> address space and lay the LUNs out linearly. In that case, the block
> layer could expose a variable that enables applications to understand
> this hierarchy, mainly the channels with their LUNs. Any warm feelings
> towards this?
> 
> Currently, a shortcut is taken with the geometry and hierarchy, which
> are exposed through the /lightnvm sysfs entries. These (or a type thereof)
> can be moved to the block layer /queue directory.
> 
> If the LUNs are kept exposed on the same gendisk, vector I/Os become a
> viable path:
> 
> 3. Vector I/Os
> 
> To derive parallelism from an open-channel SSD (and SSDs in parallel),
> one needs to access them in parallel. Parallelism is achieved either by
> issuing I/Os to each LUN (similar to driving multiple SSDs today) or by
> passing a vector interface (encapsulating a list of LBAs, a length, and
> a data buffer) into the kernel. The latter approach allows I/Os to be
> vectorized and sent as a single unit to hardware.
> 
> Implementing this in generic block layer code might be overkill if only
> open-channel SSDs use it. I'd like to hear of other use-cases (e.g.,
> preadv/pwritev, file-systems, virtio?) that can take advantage of
> vectored I/Os. If it makes sense, then at which level should it be
> implemented: bio/request level, SGLs, or a new structure?
> 
> Device drivers that support vectored I/Os should be able to opt into the
> interface, while the block layer could automatically unroll vector I/Os
> for device drivers that don't have the support.
> 
> What has the history been in the Linux kernel regarding vector I/Os? What
> were the reasons in the past that such an interface was not adopted?
> 
> I will post RFC SMR patches before LSF/MM, such that we have a firm 
> ground to discuss how it may be integrated.
> 
> -- Besides OCSSDs, I'd also like to participate in the discussions of
> XCOPY, NVMe, multipath, and multi-queue interrupt management.
> 
> -Matias
> 
> 
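
As a strawman for the vector I/O part of the proposal quoted above, the
encoding Matias describes might look roughly like the following. Every name
here is invented for illustration; no such structure exists in the kernel:

#include <linux/types.h>

struct bio;

/* Hypothetical vector I/O: a list of (possibly discontiguous) LBAs
 * backed by one data buffer and submitted as a single unit. */
struct blk_vec_io {
	sector_t	*lba_list;	/* one entry per chunk */
	unsigned int	nr_lbas;	/* entries in lba_list */
	unsigned int	chunk_sectors;	/* sectors transferred per entry */
	struct bio	*bio;		/* nr_lbas * chunk_sectors of data */
};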

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
damien.lem...@wdc.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com


Re: [PATCH 05/15] dm: remove incomplete BLOCK_PC support

2017-01-11 Thread Mike Snitzer
On Tue, Jan 10 2017 at 10:06am -0500,
Christoph Hellwig  wrote:

> DM tries to copy a few fields around for BLOCK_PC requests, but given
> that no dm-target ever wires up scsi_cmd_ioctl, BLOCK_PC requests can't
> actually be sent to dm.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/md/dm-rq.c | 16 
>  1 file changed, 16 deletions(-)
> 
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 93f6e9f..3f12916 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -270,19 +270,6 @@ static void dm_end_request(struct request *clone, int error)
>   struct mapped_device *md = tio->md;
>   struct request *rq = tio->orig;
>  
> - if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
> - rq->errors = clone->errors;
> - rq->resid_len = clone->resid_len;
> -
> - if (rq->sense)
> - /*
> -  * We are using the sense buffer of the original
> -  * request.
> -  * So setting the length of the sense data is enough.
> -  */
> - rq->sense_len = clone->sense_len;
> - }
> -
>   free_rq_clone(clone);
>   rq_end_stats(md, rq);
>   if (!rq->q->mq_ops)
> @@ -511,9 +498,6 @@ static int setup_clone(struct request *clone, struct request *rq,
>   if (r)
>   return r;
>  
> - clone->cmd = rq->cmd;
> - clone->cmd_len = rq->cmd_len;
> - clone->sense = rq->sense;
>   clone->end_io = end_clone_request;
>   clone->end_io_data = tio;
>  

I'm not following your reasoning.

dm_blk_ioctl calls __blkdev_driver_ioctl, which will end up calling
scsi_cmd_ioctl (sd_ioctl -> scsi_cmd_blk_ioctl -> scsi_cmd_ioctl) if DM's
underlying block device is a SCSI device.


Re: [PATCH 14/15] block/bsg: move queue creation into bsg_setup_queue

2017-01-11 Thread Mike Snitzer
On Wed, Jan 11 2017 at  3:45am -0500,
Christoph Hellwig  wrote:

> On Wed, Jan 11, 2017 at 09:42:44AM +0100, Johannes Thumshirn wrote:
> > On Tue, Jan 10, 2017 at 04:06:19PM +0100, Christoph Hellwig wrote:
> > > Simplify the boilerplate code needed for bsg nodes a bit.
> > > 
> > > Signed-off-by: Christoph Hellwig 
> > > ---
> > 
> > that reminds me of posting my SAS bsg-lib patch...
> 
> Yes.  Having SAS use bsg-lib, and bsg-lib switched away from abusing
> struct request_queue would make this series a lot cleaner.
> 
> So maybe we should get that into the scsi tree for 4.10 together
> with the prep patches in this series as a priority and defer the actual
> struct request changes once again.  That should also give us some more
> time to sort out the dm-mpath story...

I'm not aware of the story you're referring to.  I'm missing the actual
challenge you're facing.

But I've seen you reference the need to stop multipath from allocating
its own requests.  Are you referring to old request_fn request-based
multipath's clone_old_rq:alloc_old_clone_request?

Or how blk-mq request-based multipath gets a request from the blk-mq tag
space (via blk_mq_alloc_request)?

Or both?  How is that holding you back?


[PATCH 09/10] mq-deadline: add blk-mq adaptation of the deadline IO scheduler

2017-01-11 Thread Jens Axboe
This is basically identical to deadline-iosched, except it registers
as an MQ-capable scheduler. This is still a single-queue design.

Signed-off-by: Jens Axboe 
---
 block/Kconfig.iosched |   6 +
 block/Makefile|   1 +
 block/mq-deadline.c   | 569 ++
 3 files changed, 576 insertions(+)
 create mode 100644 block/mq-deadline.c

diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 421bef9c4c48..490ef2850fae 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -32,6 +32,12 @@ config IOSCHED_CFQ
 
  This is the default I/O scheduler.
 
+config MQ_IOSCHED_DEADLINE
+   tristate "MQ deadline I/O scheduler"
+   default y
+   ---help---
+ MQ version of the deadline IO scheduler.
+
 config CFQ_GROUP_IOSCHED
bool "CFQ Group Scheduling support"
depends on IOSCHED_CFQ && BLK_CGROUP
diff --git a/block/Makefile b/block/Makefile
index 2eee9e1bb6db..3ee0abd7205a 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_BLK_DEV_THROTTLING)  += blk-throttle.o
 obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
 obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
 obj-$(CONFIG_IOSCHED_CFQ)  += cfq-iosched.o
+obj-$(CONFIG_MQ_IOSCHED_DEADLINE)  += mq-deadline.o
 
 obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)   += cmdline-parser.o
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
new file mode 100644
index ..693f281607df
--- /dev/null
+++ b/block/mq-deadline.c
@@ -0,0 +1,569 @@
+/*
+ *  MQ Deadline i/o scheduler - adaptation of the legacy deadline scheduler,
+ *  for the blk-mq scheduling framework
+ *
+ *  Copyright (C) 2016 Jens Axboe 
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "blk.h"
+#include "blk-mq.h"
+#include "blk-mq-tag.h"
+#include "blk-mq-sched.h"
+
+/*
+ * See Documentation/block/deadline-iosched.txt
+ */
+static const int read_expire = HZ / 2;  /* max time before a read is submitted. */
+static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
+static const int writes_starved = 2;    /* max times reads can starve a write */
+static const int fifo_batch = 16;       /* # of sequential requests treated as one
+                                           by the above parameters. For throughput. */
+
+struct deadline_data {
+   /*
+* run time data
+*/
+
+   /*
+* requests (deadline_rq s) are present on both sort_list and fifo_list
+*/
+   struct rb_root sort_list[2];
+   struct list_head fifo_list[2];
+
+   /*
+* next in sort order. read, write or both are NULL
+*/
+   struct request *next_rq[2];
+   unsigned int batching;  /* number of sequential requests made */
+   unsigned int starved;   /* times reads have starved writes */
+
+   /*
+* settings that change how the i/o scheduler behaves
+*/
+   int fifo_expire[2];
+   int fifo_batch;
+   int writes_starved;
+   int front_merges;
+
+   spinlock_t lock;
+   struct list_head dispatch;
+};
+
+static inline struct rb_root *
+deadline_rb_root(struct deadline_data *dd, struct request *rq)
+{
+   return &dd->sort_list[rq_data_dir(rq)];
+}
+
+/*
+ * get the request after `rq' in sector-sorted order
+ */
+static inline struct request *
+deadline_latter_request(struct request *rq)
+{
+   struct rb_node *node = rb_next(&rq->rb_node);
+
+   if (node)
+   return rb_entry_rq(node);
+
+   return NULL;
+}
+
+static void
+deadline_add_rq_rb(struct deadline_data *dd, struct request *rq)
+{
+   struct rb_root *root = deadline_rb_root(dd, rq);
+
+   elv_rb_add(root, rq);
+}
+
+static inline void
+deadline_del_rq_rb(struct deadline_data *dd, struct request *rq)
+{
+   const int data_dir = rq_data_dir(rq);
+
+   if (dd->next_rq[data_dir] == rq)
+   dd->next_rq[data_dir] = deadline_latter_request(rq);
+
+   elv_rb_del(deadline_rb_root(dd, rq), rq);
+}
+
+/*
+ * remove rq from rbtree and fifo.
+ */
+static void deadline_remove_request(struct request_queue *q, struct request *rq)
+{
+   struct deadline_data *dd = q->elevator->elevator_data;
+
+   list_del_init(&rq->queuelist);
+
+   /*
+* We might not be on the rbtree, if we are doing an insert merge
+*/
+   if (!RB_EMPTY_NODE(&rq->rb_node))
+   deadline_del_rq_rb(dd, rq);
+
+   elv_rqhash_del(q, rq);
+   if (q->last_merge == rq)
+   q->last_merge = NULL;
+}
+
+static void dd_request_merged(struct request_queue *q, struct request *req,
+ int type)
+{
+   struct deadline_data *dd = q->elevator->elevator_data;
+
+   /*
+* if the merge was a front merge, we need to reposition request

[PATCH 10/10] blk-mq-sched: allow setting of default IO scheduler

2017-01-11 Thread Jens Axboe
Add Kconfig entries to manage what devices get assigned an MQ
scheduler, and add a blk-mq flag for drivers to opt out of scheduling.
The latter is useful for admin-type queues that still allocate a blk-mq
queue and tag set, but aren't used for normal IO.

Signed-off-by: Jens Axboe 
---
 block/Kconfig.iosched   | 56 +++--
 block/blk-mq-sched.c| 20 ++
 block/blk-mq-sched.h|  2 ++
 block/blk-mq.c  |  8 +++
 block/elevator.c|  8 ++-
 drivers/nvme/host/pci.c |  1 +
 include/linux/blk-mq.h  |  1 +
 7 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 490ef2850fae..0715ce93daef 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -32,12 +32,6 @@ config IOSCHED_CFQ
 
  This is the default I/O scheduler.
 
-config MQ_IOSCHED_DEADLINE
-   tristate "MQ deadline I/O scheduler"
-   default y
-   ---help---
- MQ version of the deadline IO scheduler.
-
 config CFQ_GROUP_IOSCHED
bool "CFQ Group Scheduling support"
depends on IOSCHED_CFQ && BLK_CGROUP
@@ -69,6 +63,56 @@ config DEFAULT_IOSCHED
default "cfq" if DEFAULT_CFQ
default "noop" if DEFAULT_NOOP
 
+config MQ_IOSCHED_DEADLINE
+   tristate "MQ deadline I/O scheduler"
+   default y
+   ---help---
+ MQ version of the deadline IO scheduler.
+
+config MQ_IOSCHED_NONE
+   bool
+   default y
+
+choice
+   prompt "Default single-queue blk-mq I/O scheduler"
+   default DEFAULT_SQ_NONE
+   help
+ Select the I/O scheduler which will be used by default for blk-mq
+ managed block devices with a single queue.
+
+   config DEFAULT_SQ_DEADLINE
+   bool "MQ Deadline" if MQ_IOSCHED_DEADLINE=y
+
+   config DEFAULT_SQ_NONE
+   bool "None"
+
+endchoice
+
+config DEFAULT_SQ_IOSCHED
+   string
+   default "mq-deadline" if DEFAULT_SQ_DEADLINE
+   default "none" if DEFAULT_SQ_NONE
+
+choice
+   prompt "Default multi-queue blk-mq I/O scheduler"
+   default DEFAULT_MQ_NONE
+   help
+ Select the I/O scheduler which will be used by default for blk-mq
+ managed block devices with multiple queues.
+
+   config DEFAULT_MQ_DEADLINE
+   bool "MQ Deadline" if MQ_IOSCHED_DEADLINE=y
+
+   config DEFAULT_MQ_NONE
+   bool "None"
+
+endchoice
+
+config DEFAULT_MQ_IOSCHED
+   string
+   default "mq-deadline" if DEFAULT_MQ_DEADLINE
+   default "none" if DEFAULT_MQ_NONE
+
 endmenu
 
 endif
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 44cf30eb1589..26e9e20f67ce 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -332,3 +332,23 @@ int blk_mq_sched_teardown(struct request_queue *q)
 
return 0;
 }
+
+int blk_mq_sched_init(struct request_queue *q)
+{
+   int ret;
+
+#if defined(CONFIG_DEFAULT_SQ_NONE)
+   if (q->nr_hw_queues == 1)
+   return 0;
+#endif
+#if defined(CONFIG_DEFAULT_MQ_NONE)
+   if (q->nr_hw_queues > 1)
+   return 0;
+#endif
+
+   mutex_lock(&q->sysfs_lock);
+   ret = elevator_init(q, NULL);
+   mutex_unlock(&q->sysfs_lock);
+
+   return ret;
+}
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 68d6a202b827..77859eae19c9 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -25,6 +25,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
 int blk_mq_sched_setup(struct request_queue *q);
 int blk_mq_sched_teardown(struct request_queue *q);
 
+int blk_mq_sched_init(struct request_queue *q);
+
 static inline bool
 blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
 {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3180b5fac88c..0dcd593e4ddd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2227,6 +2227,14 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	mutex_unlock(&all_q_mutex);
put_online_cpus();
 
+   if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
+   int ret;
+
+   ret = blk_mq_sched_init(q);
+   if (ret)
+   return ERR_PTR(ret);
+   }
+
return q;
 
 err_hctxs:
diff --git a/block/elevator.c b/block/elevator.c
index 79e74da26343..b3ea721e51b4 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -219,7 +219,13 @@ int elevator_init(struct request_queue *q, char *name)
}
 
if (!e) {
-   e = elevator_get(CONFIG_DEFAULT_IOSCHED, false);
+   if (q->mq_ops && q->nr_hw_queues == 1)
+   e = elevator_get(CONFIG_DEFAULT_SQ_IOSCHED, false);
+   else if (q->mq_ops)
+   e = elevator_get(CONFIG_DEFAULT_MQ_IOSCHED, false);
+   else
+   e = elevator_get(CONFIG_DEFAULT_IOSCHED, false);
+
if (!e) {

[PATCH 03/10] block: move rq_ioc() to blk.h

2017-01-11 Thread Jens Axboe
We want to use it outside of blk-core.c.

Signed-off-by: Jens Axboe 
---
 block/blk-core.c | 16 
 block/blk.h  | 16 
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 61ba08c58b64..92baea07acbc 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1040,22 +1040,6 @@ static bool blk_rq_should_init_elevator(struct bio *bio)
 }
 
 /**
- * rq_ioc - determine io_context for request allocation
- * @bio: request being allocated is for this bio (can be %NULL)
- *
- * Determine io_context to use for request allocation for @bio.  May return
- * %NULL if %current->io_context doesn't exist.
- */
-static struct io_context *rq_ioc(struct bio *bio)
-{
-#ifdef CONFIG_BLK_CGROUP
-   if (bio && bio->bi_ioc)
-   return bio->bi_ioc;
-#endif
-   return current->io_context;
-}
-
-/**
  * __get_request - get a free request
  * @rl: request list to allocate from
  * @op: operation and flags
diff --git a/block/blk.h b/block/blk.h
index f46c0ac8ae3d..9a716b5925a4 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -264,6 +264,22 @@ void ioc_clear_queue(struct request_queue *q);
 int create_task_io_context(struct task_struct *task, gfp_t gfp_mask, int node);
 
 /**
+ * rq_ioc - determine io_context for request allocation
+ * @bio: request being allocated is for this bio (can be %NULL)
+ *
+ * Determine io_context to use for request allocation for @bio.  May return
+ * %NULL if %current->io_context doesn't exist.
+ */
+static inline struct io_context *rq_ioc(struct bio *bio)
+{
+#ifdef CONFIG_BLK_CGROUP
+   if (bio && bio->bi_ioc)
+   return bio->bi_ioc;
+#endif
+   return current->io_context;
+}
+
+/**
  * create_io_context - try to create task->io_context
  * @gfp_mask: allocation mask
  * @node: allocation node
-- 
2.7.4



[PATCH 01/10] block: move existing elevator ops to union

2017-01-11 Thread Jens Axboe
Prep patch for adding MQ ops as well, since doing anon unions with
named initializers doesn't work on older compilers.

Signed-off-by: Jens Axboe 
---
 block/blk-ioc.c  |  8 +++
 block/blk-merge.c|  4 ++--
 block/blk.h  | 10 
 block/cfq-iosched.c  |  2 +-
 block/deadline-iosched.c |  2 +-
 block/elevator.c | 60 
 block/noop-iosched.c |  2 +-
 include/linux/elevator.h |  4 +++-
 8 files changed, 47 insertions(+), 45 deletions(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 381cb50a673c..ab372092a57d 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -43,8 +43,8 @@ static void ioc_exit_icq(struct io_cq *icq)
if (icq->flags & ICQ_EXITED)
return;
 
-   if (et->ops.elevator_exit_icq_fn)
-   et->ops.elevator_exit_icq_fn(icq);
+   if (et->ops.sq.elevator_exit_icq_fn)
+   et->ops.sq.elevator_exit_icq_fn(icq);
 
icq->flags |= ICQ_EXITED;
 }
@@ -383,8 +383,8 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct request_queue *q,
 	if (likely(!radix_tree_insert(&ioc->icq_tree, q->id, icq))) {
 		hlist_add_head(&icq->ioc_node, &ioc->icq_list);
 		list_add(&icq->q_node, &q->icq_list);
-   if (et->ops.elevator_init_icq_fn)
-   et->ops.elevator_init_icq_fn(icq);
+   if (et->ops.sq.elevator_init_icq_fn)
+   et->ops.sq.elevator_init_icq_fn(icq);
} else {
kmem_cache_free(et->icq_cache, icq);
icq = ioc_lookup_icq(ioc, q);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 182398cb1524..480570b691dc 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -763,8 +763,8 @@ int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
 {
struct elevator_queue *e = q->elevator;
 
-   if (e->type->ops.elevator_allow_rq_merge_fn)
-   if (!e->type->ops.elevator_allow_rq_merge_fn(q, rq, next))
+   if (e->type->ops.sq.elevator_allow_rq_merge_fn)
+   if (!e->type->ops.sq.elevator_allow_rq_merge_fn(q, rq, next))
return 0;
 
return attempt_merge(q, rq, next);
diff --git a/block/blk.h b/block/blk.h
index 041185e5f129..f46c0ac8ae3d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -167,7 +167,7 @@ static inline struct request *__elv_next_request(struct request_queue *q)
return NULL;
}
if (unlikely(blk_queue_bypass(q)) ||
-   !q->elevator->type->ops.elevator_dispatch_fn(q, 0))
+   !q->elevator->type->ops.sq.elevator_dispatch_fn(q, 0))
return NULL;
}
 }
@@ -176,16 +176,16 @@ static inline void elv_activate_rq(struct request_queue *q, struct request *rq)
 {
struct elevator_queue *e = q->elevator;
 
-   if (e->type->ops.elevator_activate_req_fn)
-   e->type->ops.elevator_activate_req_fn(q, rq);
+   if (e->type->ops.sq.elevator_activate_req_fn)
+   e->type->ops.sq.elevator_activate_req_fn(q, rq);
 }
 
 static inline void elv_deactivate_rq(struct request_queue *q, struct request *rq)
 {
struct elevator_queue *e = q->elevator;
 
-   if (e->type->ops.elevator_deactivate_req_fn)
-   e->type->ops.elevator_deactivate_req_fn(q, rq);
+   if (e->type->ops.sq.elevator_deactivate_req_fn)
+   e->type->ops.sq.elevator_deactivate_req_fn(q, rq);
 }
 
 #ifdef CONFIG_FAIL_IO_TIMEOUT
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index c73a6fcaeb9d..37aeb20fa454 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -4837,7 +4837,7 @@ static struct elv_fs_entry cfq_attrs[] = {
 };
 
 static struct elevator_type iosched_cfq = {
-   .ops = {
+   .ops.sq = {
.elevator_merge_fn =cfq_merge,
.elevator_merged_fn =   cfq_merged_request,
.elevator_merge_req_fn =cfq_merged_requests,
diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c
index 55e0bb6d7da7..05fc0ea25a98 100644
--- a/block/deadline-iosched.c
+++ b/block/deadline-iosched.c
@@ -439,7 +439,7 @@ static struct elv_fs_entry deadline_attrs[] = {
 };
 
 static struct elevator_type iosched_deadline = {
-   .ops = {
+   .ops.sq = {
.elevator_merge_fn =deadline_merge,
.elevator_merged_fn =   deadline_merged_request,
.elevator_merge_req_fn =deadline_merged_requests,
diff --git a/block/elevator.c b/block/elevator.c
index 40f0c04e5ad3..022a26830297 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -58,8 +58,8 @@ static int elv_iosched_allow_bio_merge(struct request *rq, struct bio *bio)
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
 
-   if (e->type->ops.elevator_allow_bio_merge_fn)

[PATCH] MAINTAINERS: Update maintainer entry for NBD

2017-01-11 Thread Josef Bacik
The old maintainer's email is bouncing and I've rewritten most of this
driver in recent months.  Also add linux-block to the mailing list
and remove the old tree; I will send patches through the linux-block
tree.  Thanks,

Signed-off-by: Josef Bacik 
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1174508..4c07e01 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8539,10 +8539,10 @@ S:  Maintained
 F: drivers/net/ethernet/netronome/
 
 NETWORK BLOCK DEVICE (NBD)
-M: Markus Pargmann 
+M: Josef Bacik 
 S: Maintained
+L: linux-block@vger.kernel.org
 L: nbd-gene...@lists.sourceforge.net
-T: git git://git.pengutronix.de/git/mpa/linux-nbd.git
 F: Documentation/blockdev/nbd.txt
 F: drivers/block/nbd.c
 F: include/uapi/linux/nbd.h
-- 
2.5.5



Re: [PATCH] preview - block layer help to detect sequential IO

2017-01-11 Thread kbuild test robot
Hi Kashyap,

[auto build test ERROR on v4.9-rc8]
[cannot apply to block/for-next linus/master linux/master next-20170111]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Kashyap-Desai/preview-block-layer-help-to-detect-sequential-IO/20170112-024228
config: i386-randconfig-a0-201702 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   block/blk-core.c: In function 'add_sequential':
>> block/blk-core.c:1899:16: error: 'struct task_struct' has no member named 'sequential_io_avg'
  blk_ewma_add(t->sequential_io_avg,
    ^
   block/blk-core.c:1893:10: note: in definition of macro 'blk_ewma_add'
 (ewma) *= (weight) - 1; \
  ^~~~
>> block/blk-core.c:1899:16: error: 'struct task_struct' has no member named 'sequential_io_avg'
  blk_ewma_add(t->sequential_io_avg,
    ^
   block/blk-core.c:1894:10: note: in definition of macro 'blk_ewma_add'
 (ewma) += (val) << factor;  \
  ^~~~
>> block/blk-core.c:1900:5: error: 'struct task_struct' has no member named 'sequential_io'
    t->sequential_io, 8, 0);
 ^
   block/blk-core.c:1894:20: note: in definition of macro 'blk_ewma_add'
 (ewma) += (val) << factor;  \
    ^~~
>> block/blk-core.c:1899:16: error: 'struct task_struct' has no member named 'sequential_io_avg'
  blk_ewma_add(t->sequential_io_avg,
    ^
   block/blk-core.c:1895:10: note: in definition of macro 'blk_ewma_add'
 (ewma) /= (weight); \
  ^~~~
>> block/blk-core.c:1899:16: error: 'struct task_struct' has no member named 'sequential_io_avg'
  blk_ewma_add(t->sequential_io_avg,
    ^
   block/blk-core.c:1896:10: note: in definition of macro 'blk_ewma_add'
 (ewma) >> factor;   \
  ^~~~
   block/blk-core.c:1902:3: error: 'struct task_struct' has no member named 'sequential_io'
  t->sequential_io = 0;
   ^~
   block/blk-core.c: In function 'generic_make_request_checks':
   block/blk-core.c:2012:7: error: 'struct task_struct' has no member named 'sequential_io'
   task->sequential_io  = i->sequential;
   ^~
   In file included from block/blk-core.c:14:0:
   block/blk-core.c:2020:21: error: 'struct task_struct' has no member named 'sequential_io'
   sectors = max(task->sequential_io,
 ^
   include/linux/kernel.h:747:2: note: in definition of macro '__max'
  t1 max1 = (x); \
  ^~
   block/blk-core.c:2020:13: note: in expansion of macro 'max'
   sectors = max(task->sequential_io,
 ^~~
   block/blk-core.c:2020:21: error: 'struct task_struct' has no member named 'sequential_io'
   sectors = max(task->sequential_io,
 ^
   include/linux/kernel.h:747:13: note: in definition of macro '__max'
  t1 max1 = (x); \
 ^
   block/blk-core.c:2020:13: note: in expansion of macro 'max'
   sectors = max(task->sequential_io,
 ^~~
   block/blk-core.c:2021:14: error: 'struct task_struct' has no member named 'sequential_io_avg'
  task->sequential_io_avg) >> 9;
  ^
   include/linux/kernel.h:748:2: note: in definition of macro '__max'
  t2 max2 = (y); \
  ^~
   block/blk-core.c:2020:13: note: in expansion of macro 'max'
   sectors = max(task->sequential_io,
 ^~~
   block/blk-core.c:2021:14: error: 'struct task_struct' has no member named 'sequential_io_avg'
  task->sequential_io_avg) >> 9;
  ^
   include/linux/kernel.h:748:13: note: in definition of macro '__max'
  t2 max2 = (y); \
 ^
   block/blk-core.c:2020:13: note: in expansion of macro 'max'
   sectors = max(task->sequential_io,
 ^~~

vim +1899 block/blk-core.c

  1887	}
  1888	
  1889	static void add_sequential(struct task_struct *t)
  1890	{
  1891	#define blk_ewma_add(ewma, val, weight, factor)		\
  1892	({							\
> 1893		(ewma) *= (weight) - 1;				\
  1894		(ewma) += (val) << factor;			\
  1895		(ewma) /= (weight);				\
  1896		(ewma) >> factor;				\
Re: [GIT PULL] NVMe fixes for 4.10-rc4

2017-01-11 Thread Jens Axboe
On 01/11/2017 11:42 AM, Christoph Hellwig wrote:
> Hi Jens,
> 
> below are two small NVMe fixes for this merge window.
> 
> The following changes since commit 6bf6b0aa3da84a3d9126919a94c49c0fb7ee2fb3:
> 
>   virtio_blk: fix panic in initialization error path (2017-01-10 13:30:50 -0700)
> 
> are available in the git repository at:
> 
>   git://git.infradead.org/nvme.git nvme-4.10-fixes

Pulled, thanks.

-- 
Jens Axboe



[GIT PULL] NVMe fixes for 4.10-rc4

2017-01-11 Thread Christoph Hellwig
Hi Jens,

below are two small NVMe fixes for this merge window.

The following changes since commit 6bf6b0aa3da84a3d9126919a94c49c0fb7ee2fb3:

  virtio_blk: fix panic in initialization error path (2017-01-10 13:30:50 -0700)

are available in the git repository at:

  git://git.infradead.org/nvme.git nvme-4.10-fixes

for you to fetch changes up to b5a10c5f7532b7473776da87e67f8301bbc32693:

  nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too (2017-01-11 17:21:35 +0100)

----------------------------------------------------------------
Christoph Hellwig (1):
  nvme-rdma: fix nvme_rdma_queue_is_ready

Guilherme G. Piccoli (1):
  nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too

 drivers/nvme/host/core.c | 7 +--
 drivers/nvme/host/rdma.c | 2 +-
 2 files changed, 2 insertions(+), 7 deletions(-)


Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Hannes Reinecke
On 01/11/2017 05:12 PM, h...@infradead.org wrote:
> On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote:
>> A typical Ethernet network adapter delays the generation of an interrupt
>> after it has received a packet. A typical block device or HBA does not delay
>> the generation of an interrupt that reports an I/O completion.
> 
> NVMe allows for configurable interrupt coalescing, as do a few modern
> SCSI HBAs.

Essentially every modern SCSI HBA does interrupt coalescing; otherwise
the queuing interface won't work efficiently.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke    Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Bart Van Assche
On Wed, 2017-01-11 at 17:22 +0100, Hannes Reinecke wrote:
> On 01/11/2017 05:12 PM, h...@infradead.org wrote:
> > On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote:
> > > A typical Ethernet network adapter delays the generation of an
> > > interrupt
> > > after it has received a packet. A typical block device or HBA does not
> > > delay
> > > the generation of an interrupt that reports an I/O completion.
> > 
> > NVMe allows for configurable interrupt coalescing, as do a few modern
> > SCSI HBAs.
> 
> Essentially every modern SCSI HBA does interrupt coalescing; otherwise
> the queuing interface won't work efficiently.

Hello Hannes,

The first e-mail in this e-mail thread referred to measurements against a
block device for which interrupt coalescing was not enabled. I think that
the measurements have to be repeated against a block device for which
interrupt coalescing is enabled.

Bart.


Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Johannes Thumshirn
On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote:

[...]

> A typical Ethernet network adapter delays the generation of an interrupt
> after it has received a packet. A typical block device or HBA does not delay
> the generation of an interrupt that reports an I/O completion. I think that
> is why polling is more effective for network adapters than for block
> devices. I'm not sure whether it is possible to achieve benefits similar to
> NAPI for block devices without implementing interrupt coalescing in the
> block device firmware. Note: for block device implementations that use the
> RDMA API, the RDMA API supports interrupt coalescing (see also
> ib_modify_cq()).

Well, you can always turn off IRQ generation in the HBA just before
scheduling the poll handler and re-enable it after you've exhausted your
budget or used too much time, can't you?
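
A minimal sketch of that pattern on top of the irq_poll API (the renamed
blk-iopoll): irq_poll_sched() and irq_poll_complete() are the real
lib/irq_poll.c entry points, while struct hba_queue and the hba_* helpers
are hypothetical driver hooks assumed for illustration:

#include <linux/interrupt.h>
#include <linux/irq_poll.h>

struct hba_queue {
	struct irq_poll iop;
	/* hardware state elided */
};

/* hypothetical driver-provided helpers */
void hba_disable_intr(struct hba_queue *hq);
void hba_enable_intr(struct hba_queue *hq);
bool hba_process_one(struct hba_queue *hq);

static irqreturn_t hba_isr(int irq, void *data)
{
	struct hba_queue *hq = data;

	hba_disable_intr(hq);		/* quiesce before polling */
	irq_poll_sched(&hq->iop);	/* defer completions to hba_poll() */
	return IRQ_HANDLED;
}

static int hba_poll(struct irq_poll *iop, int budget)
{
	struct hba_queue *hq = container_of(iop, struct hba_queue, iop);
	int done = 0;

	while (done < budget && hba_process_one(hq))
		done++;

	if (done < budget) {		/* ran dry: back to interrupt mode */
		irq_poll_complete(iop);
		hba_enable_intr(hq);
	}
	return done;
}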

I'll do some prototyping and tests tomorrow so we have some more ground for
discussion.

Byte,
Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de    +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Jens Axboe
On 01/11/2017 09:12 AM, h...@infradead.org wrote:
> On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote:
>> A typical Ethernet network adapter delays the generation of an interrupt
>> after it has received a packet. A typical block device or HBA does not delay
>> the generation of an interrupt that reports an I/O completion.
> 
> NVMe allows for configurable interrupt coalescing, as do a few modern
> SCSI HBAs.

Unfortunately it's way too coarse on NVMe, with the timer being in 100
usec increments... I've had mixed success with the depth trigger.

-- 
Jens Axboe



Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Bart Van Assche
On Wed, 2017-01-11 at 14:43 +0100, Johannes Thumshirn wrote:
> I'd like to attend LSF/MM and would like to discuss polling for block
> drivers.
> 
> Currently there is blk-iopoll, but it is not as widely used as NAPI in
> the networking field, and according to Sagi's findings in [1] performance
> with polling is not on par with IRQ usage.
> 
> On LSF/MM I'd like to discuss whether it is desirable to have NAPI-like
> polling in more block drivers and how to overcome the currently seen
> performance issues.
> 
> [1] http://lists.infradead.org/pipermail/linux-nvme/2016-October/006975.html

A typical Ethernet network adapter delays the generation of an interrupt
after it has received a packet. A typical block device or HBA does not delay
the generation of an interrupt that reports an I/O completion. I think that
is why polling is more effective for network adapters than for block
devices. I'm not sure whether it is possible to achieve benefits similar to
NAPI for block devices without implementing interrupt coalescing in the
block device firmware. Note: for block device implementations that use the
RDMA API, the RDMA API supports interrupt coalescing (see also
ib_modify_cq()).

An example of the interrupt coalescing parameters for a network adapter:

# ethtool -c em1 | grep -E 'rx-usecs:|tx-usecs:'
rx-usecs: 3
tx-usecs: 0

Bart.


Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread h...@infradead.org
On Wed, Jan 11, 2017 at 04:08:31PM +, Bart Van Assche wrote:
> A typical Ethernet network adapter delays the generation of an interrupt
> after it has received a packet. A typical block device or HBA does not delay
> the generation of an interrupt that reports an I/O completion.

NVMe allows for configurable interrupt coalescing, as do a few modern
SCSI HBAs.


Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Hannes Reinecke
On 01/11/2017 04:07 PM, Jens Axboe wrote:
> On 01/11/2017 06:43 AM, Johannes Thumshirn wrote:
>> Hi all,
>>
>> I'd like to attend LSF/MM and would like to discuss polling for block 
>> drivers.
>>
>> Currently there is blk-iopoll, but it is not as widely used as NAPI in the
>> networking field, and according to Sagi's findings in [1] performance with
>> polling is not on par with IRQ usage.
>>
>> On LSF/MM I'd like to discuss whether it is desirable to have NAPI-like
>> polling in more block drivers and how to overcome the currently seen
>> performance issues.
> 
> It would be an interesting topic to discuss, as it is a shame that blk-iopoll
> isn't used more widely.
> 
Indeed; some drivers like lpfc already _have_ a polling mode, but it's not
hooked up to blk-iopoll. Would be really cool to get that going.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke    Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Jens Axboe
On 01/11/2017 08:07 AM, Jens Axboe wrote:
> On 01/11/2017 06:43 AM, Johannes Thumshirn wrote:
>> Hi all,
>>
>> I'd like to attend LSF/MM and would like to discuss polling for block 
>> drivers.
>>
>> Currently there is blk-iopoll, but it is not as widely used as NAPI in the
>> networking field, and according to Sagi's findings in [1] performance with
>> polling is not on par with IRQ usage.
>>
>> On LSF/MM I'd like to discuss whether it is desirable to have NAPI-like
>> polling in more block drivers and how to overcome the currently seen
>> performance issues.
> 
> It would be an interesting topic to discuss, as it is a shame that blk-iopoll
> isn't used more widely.

Forgot to mention - it should only be a topic if experimentation has
been done and results gathered to pinpoint what the issues are, so we
have something concrete to discuss. I'm not at all interested in a
hand-wavy discussion on the topic.

-- 
Jens Axboe



Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-11 Thread Hannes Reinecke
On 01/11/2017 02:43 PM, Johannes Thumshirn wrote:
> Hi all,
> 
> I'd like to attend LSF/MM and would like to discuss polling for block drivers.
> 
> Currently there is blk-iopoll, but it is not as widely used as NAPI in the
> networking field, and according to Sagi's findings in [1] performance with
> polling is not on par with IRQ usage.
> 
> On LSF/MM I'd like to discuss whether it is desirable to have NAPI-like
> polling in more block drivers and how to overcome the currently seen
> performance issues.
> 
> [1] http://lists.infradead.org/pipermail/linux-nvme/2016-October/006975.html
> 
Yup.

I'm all for it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke    Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [Lsf-pc] [LSF/MM TOPIC] [LSF/MM ATTEND] FS Management Interfaces

2017-01-11 Thread Steven Whitehouse

Hi,


On 10/01/17 10:14, Jan Kara wrote:

Hi,

On Tue 10-01-17 09:44:59, Steven Whitehouse wrote:

I had originally thought about calling the proposal "kernel/userland
interface", however that seemed a bit vague and management interfaces seems
like a better title since it is I hope a bit clearer of the kind of thing
that I'm thinking about in this case.

There are a number of possible sub-topics, and I hope to find a few more
before LSF too. One is space management (we have statfs, but currently no
notifications for thresholds crossed etc., so everything is polled; that
is ok sometimes, but statfs can be expensive on distributed filesystems if
it has to be 100% accurate, so we could have just ENOSPC notifications for
100% full, or something more generic), another is state transitions (is
the fs running normally, or has it gone read-only/withdrawn/etc. due to
I/O errors?) and a further topic would be working towards a common
interface for fs statistics (at the moment each fs defines its own
interface). One potential implementation, at least for the first two
sub-topics, would be to use something along the lines of the quota netlink
interface, but since few ideas survive first contact with the community at
large, I'm throwing this out for further discussion and feedback on
whether this approach is considered the right way to go.

Assuming the topic is accepted, my intention would be to gather together
some additional sub-topics relating to fs management to go along with those
I mentioned above, and I'd be very interested to hear of any other issues
that could be usefully added to the list for discussion.

So this topic came up last year and probably the year before as well (heh,
I can even find some patches from 2011 [1]). I think the latest attempt at
what you suggest was here [2]. So clearly there's some interest in these
interfaces but not enough to actually drive anything to completion. So for
this topic to be useful, I think you need to go at least through the
patches in [2] and comments to them and have a concrete proposal that can
be discussed and some commitment (not necessarily from yourself) that
someone is going to devote time to implement it. Because generally nobody
seems to be opposed to the abstract idea but once it gets to the
implementation details, it is non-trivial to get some wider agreement
(statx anybody? ;)).

Honza

[1] https://lkml.org/lkml/2011/8/18/170
[2] https://lkml.org/lkml/2015/6/16/456


Yes, statx is something else I'd like to see progress too :-) Going back 
to this topic though, I agree wrt having a concrete proposal, and I'll 
try to have something ready for LSF; we have a few weeks in hand. I'll 
collect the details of the previous efforts (including Lukas' 
suggestion) and see how far we can get in the meantime,


Steve.
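
(To make the quota-netlink analogy above concrete: the quota code in
fs/quota/netlink.c broadcasts over generic netlink, and an fs-event
channel could take the same shape. A rough sketch follows; the family,
group, attribute, and command names are all invented for illustration,
and the family would be registered once at init time with
genl_register_family().)

        #include <net/genetlink.h>

        enum {
                FS_EVENT_A_UNSPEC,
                FS_EVENT_A_DEV,         /* u32: device number */
                FS_EVENT_A_STATE,       /* u32: normal / read-only / withdrawn */
                __FS_EVENT_A_MAX,
        };
        #define FS_EVENT_A_MAX  (__FS_EVENT_A_MAX - 1)

        enum { FS_EVENT_C_NOTIFY = 1 };

        static const struct genl_multicast_group fs_event_mcgrps[] = {
                { .name = "events" },
        };

        static struct genl_family fs_event_family = {
                .name           = "FS_EVENTS",
                .version        = 1,
                .maxattr        = FS_EVENT_A_MAX,
                .mcgrps         = fs_event_mcgrps,
                .n_mcgrps       = ARRAY_SIZE(fs_event_mcgrps),
        };

        /* Broadcast a state change; interested listeners subscribe to the
         * multicast group instead of polling statfs. */
        static int fs_event_notify(u32 dev, u32 state)
        {
                struct sk_buff *skb;
                void *hdr;

                skb = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
                if (!skb)
                        return -ENOMEM;
                hdr = genlmsg_put(skb, 0, 0, &fs_event_family, 0,
                                  FS_EVENT_C_NOTIFY);
                if (!hdr)
                        goto fail;
                if (nla_put_u32(skb, FS_EVENT_A_DEV, dev) ||
                    nla_put_u32(skb, FS_EVENT_A_STATE, state))
                        goto fail;
                genlmsg_end(skb, hdr);
                return genlmsg_multicast(&fs_event_family, skb, 0, 0,
                                         GFP_KERNEL);
        fail:
                nlmsg_free(skb);
                return -EMSGSIZE;
        }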




[LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign

2017-01-11 Thread Hannes Reinecke
Hi all,

I'd like to attend LSF/MM this year, and would like to discuss a
redesign of the multipath handling.

With recent kernels quite a lot of the functionality required for
multipathing is already implemented in the kernel itself, making some
design decisions of the original multipath-tools implementation obsolete.

I'm working on a proof-of-concept implementation which just uses a
simple configfs interface and doesn't require a daemon at all.

At LSF/MM I'd like to discuss how to move forward here, and whether we'd
like to stay with the current device-mapper integration or move away
from that towards a stand-alone implementation.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
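
(A configfs interface along the lines of the proposal above might look
roughly like the sketch below; every name in it is invented, the point
being only that a path map becomes a config_group created via mkdir under
/sys/kernel/config, with no userspace daemon in the loop. The subsystem
would be registered once with configfs_register_subsystem(&mpath_subsys).)

        #include <linux/configfs.h>
        #include <linux/module.h>
        #include <linux/slab.h>

        struct mpath_map {
                struct config_group group;
                char wwid[128];
        };

        static inline struct mpath_map *to_mpath_map(struct config_item *item)
        {
                return container_of(to_config_group(item),
                                    struct mpath_map, group);
        }

        static ssize_t mpath_map_wwid_show(struct config_item *item, char *page)
        {
                return sprintf(page, "%s\n", to_mpath_map(item)->wwid);
        }

        static ssize_t mpath_map_wwid_store(struct config_item *item,
                                            const char *page, size_t count)
        {
                struct mpath_map *map = to_mpath_map(item);

                /* real code would parse and validate; this just stores it */
                strscpy(map->wwid, page, sizeof(map->wwid));
                return count;
        }
        CONFIGFS_ATTR(mpath_map_, wwid);

        static struct configfs_attribute *mpath_map_attrs[] = {
                &mpath_map_attr_wwid,
                NULL,
        };

        static struct config_item_type mpath_map_type = {
                .ct_attrs       = mpath_map_attrs,
                .ct_owner       = THIS_MODULE,
        };

        /* mkdir /sys/kernel/config/multipath/<name> creates a new map */
        static struct config_group *mpath_make_group(struct config_group *parent,
                                                     const char *name)
        {
                struct mpath_map *map = kzalloc(sizeof(*map), GFP_KERNEL);

                if (!map)
                        return ERR_PTR(-ENOMEM);
                config_group_init_type_name(&map->group, name, &mpath_map_type);
                return &map->group;
        }

        static struct configfs_group_operations mpath_group_ops = {
                .make_group     = mpath_make_group,
        };

        static struct config_item_type mpath_subsys_type = {
                .ct_group_ops   = &mpath_group_ops,
                .ct_owner       = THIS_MODULE,
        };

        static struct configfs_subsystem mpath_subsys = {
                .su_group = {
                        .cg_item = {
                                .ci_namebuf     = "multipath",
                                .ci_type        = &mpath_subsys_type,
                        },
                },
        };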


Re: [LFS/MM TOPIC][LFS/MM ATTEND]: - Storage Stack and Driver Testing methodology.

2017-01-11 Thread Hannes Reinecke
On 01/11/2017 10:24 AM, Christoph Hellwig wrote:
> On Wed, Jan 11, 2017 at 10:19:45AM +0100, Johannes Thumshirn wrote:
>> Well, something I was thinking about but didn't find enough time to actually
>> implement is making an xfstests-like test suite written using sg3_utils for
>> SCSI.
> 
> Ronnie's libiscsi test suite has been able to use SG_IO for a few years now:
> 
> https://github.com/sahlberg/libiscsi/tree/master/test-tool
> 
> and has been very useful for finding bugs in various protocol
> implementations.
> 
>> This idea could very well be extended to NVMe
> 
> Chaitanya's suite is doing something similar for NVMe, although the
> coverage is still much more limited so far.
> 
One of the discussion points here indeed would be whether we want to go
in the direction of protocol-specific test suites (of which we already
have several) or whether it makes sense to move to functional testing.

And whether we can have a common interface / documentation for how these
tests should be run.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 14/15] block/bsg: move queue creation into bsg_setup_queue

2017-01-11 Thread Hannes Reinecke
On 01/11/2017 10:01 AM, Christoph Hellwig wrote:
> On Wed, Jan 11, 2017 at 09:59:17AM +0100, Hannes Reinecke wrote:
>> I'd advocate discussing this at LSF.
>> Now that Mike moved the bio-based mpath stuff back in, things got even
>> more complex.
> 
> Yeah.  If we _only_ had bio-based support it would simplify things
> a lot, but as a third parallel path it's not exactly making things easier.
> 
>> I'll be posting a patchset for reimplementing multipath as a stand-alone
>> driver shortly; that'll give us a good starting point on how we want
>> multipath to evolve.
>>
>> Who knows; we might even manage to move multipath out of device-mapper
>> altogether.
>> That would make Mike very happy, and I wouldn't mind, either :-)
> 
> Heh.  I'm curious how you want to do that while keeping existing setups
> working, though.

Which will become challenging, indeed.

ATM it's just a testbed for how things could work; we've got most of the
required infrastructure in the kernel nowadays, so we can just drop
most of the complexity from the present multipath-tools mess.

In the end it might even boil down to updating the existing device-mapper
multipath implementation. We'll see.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [LFS/MM TOPIC][LFS/MM ATTEND]: - Storage Stack and Driver Testing methodology.

2017-01-11 Thread Johannes Thumshirn
On Tue, Jan 10, 2017 at 10:40:53PM +, Chaitanya Kulkarni wrote:
> Resending it as plain text.
> 
> From: Chaitanya Kulkarni
> Sent: Tuesday, January 10, 2017 2:37 PM
> To: lsf...@lists.linux-foundation.org
> Cc: linux-fsde...@vger.kernel.org; linux-block@vger.kernel.org; 
> linux-n...@lists.infradead.org; linux-s...@vger.kernel.org; 
> linux-...@vger.kernel.org
> Subject: [LFS/MM TOPIC][LFS/MM ATTEND]: - Storage Stack and Driver Testing 
> methodology.
>   
> 
> Hi Folks,
> 
> I would like to propose a general discussion on Storage stack and device 
> driver testing.
> 
> Purpose:-
> -
> The main objective of this discussion is to address the need for 
> a Unified Test Automation Framework which can be used by different subsystems
> in the kernel in order to improve the overall development and stability
> of the storage stack.
> 
> For Example:-
> From my previous experience, I worked on NVMe driver testing last year
> and we developed a simple unit test framework
> (https://github.com/linux-nvme/nvme-cli/tree/master/tests).
> In the current implementation the upstream NVMe driver supports the
> following subsystems:-
> 1. PCI Host.
> 2. RDMA Target.
> 3. Fibre Channel Target (in progress).
> Today, due to the lack of a centralized automated test framework, NVMe
> driver testing is scattered and performed using a combination of various
> utilities like nvme-cli/tests, nvmet-cli, shell scripts
> (git://git.infradead.org/nvme-fabrics.git nvmf-selftests) etc.
> 
> In order to improve overall driver stability across the various
> subsystems, it would be beneficial to have a Unified Test Automation
> Framework (UTAF) which centralizes overall testing.
> 
> This topic will allow developers from various subsystems to engage in a
> discussion about how to collaborate efficiently, instead of having the
> discussion spread across lengthy email threads.
> 
> Participants:-
> --
> I'd like to invite developers from different subsystems to discuss an
> approach towards a unified testing methodology for the storage stack and
> the device drivers belonging to the different subsystems.
> 
> Topics for Discussion:-
> --
> As a part of the discussion, the following are some of the key points we
> can focus on:-
> 1. What are the common components of the kernel used by the various
>subsystems?
> 2. What are the potential target drivers which can benefit from this
>approach? (e.g. NVMe, NVMe over Fabrics, Open-Channel solid state
>drives etc.)
> 3. What are the desired features that can be implemented in this
>framework? (code coverage, unit tests, stress testing, regression,
>generating Coccinelle reports etc.)
> 4. What is the desired report generation mechanism?
> 5. Basic performance validation?
> 6. Can QEMU be used to emulate some of the H/W functionality to create a
>test platform? (optional, subsystem specific)

Well, something I was thinking about but didn't find enough time to actually
implement is making an xfstests-like test suite written using sg3_utils for
SCSI. This idea could very well be extended to NVMe, AHCI, blk, etc...

Byte,
Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
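
(The core of such a test would be small. Purely as an illustration, a
TEST UNIT READY pushed through the SG_IO ioctl, which is the same path
sg3_utils uses under the hood, looks roughly like this:)

        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/ioctl.h>
        #include <scsi/sg.h>
        #include <unistd.h>

        int main(int argc, char **argv)
        {
                unsigned char cdb[6] = { 0x00, 0, 0, 0, 0, 0 }; /* TEST UNIT READY */
                unsigned char sense[32];
                struct sg_io_hdr hdr;
                int fd;

                if (argc < 2) {
                        fprintf(stderr, "usage: %s /dev/sgN\n", argv[0]);
                        return 1;
                }
                fd = open(argv[1], O_RDWR);
                if (fd < 0) {
                        perror("open");
                        return 1;
                }
                memset(&hdr, 0, sizeof(hdr));
                hdr.interface_id = 'S';
                hdr.cmd_len = sizeof(cdb);
                hdr.cmdp = cdb;
                hdr.dxfer_direction = SG_DXFER_NONE;    /* no data transfer */
                hdr.sbp = sense;
                hdr.mx_sb_len = sizeof(sense);
                hdr.timeout = 5000;                     /* milliseconds */
                if (ioctl(fd, SG_IO, &hdr) < 0) {
                        perror("SG_IO");
                        close(fd);
                        return 1;
                }
                close(fd);
                /* SCSI status 0 == GOOD; otherwise inspect the sense data */
                printf("scsi status 0x%x\n", hdr.status);
                return hdr.status ? 1 : 0;
        }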


Re: [PATCH 14/15] block/bsg: move queue creation into bsg_setup_queue

2017-01-11 Thread Christoph Hellwig
On Wed, Jan 11, 2017 at 09:59:17AM +0100, Hannes Reinecke wrote:
> I'd advocate discussing this at LSF.
> Now that Mike moved the bio-based mpath stuff back in, things got even
> more complex.

Yeah.  If we _only_ had bio-based support it would simplify things
a lot, but as a third parallel path it's not exactly making things easier.

> I'll be posting a patchset for reimplementing multipath as a stand-alone
> driver shortly; that'll give us a good starting point on how we want
> multipath to evolve.
> 
> Who knows; we might even manage to move multipath out of device-mapper
> altogether.
> That would make Mike very happy, and I wouldn't mind, either :-)

Heh.  I'm curious how you want to do that while keeping existing setups
working, though.


Re: [PATCH 14/15] block/bsg: move queue creation into bsg_setup_queue

2017-01-11 Thread Hannes Reinecke
On 01/11/2017 09:45 AM, Christoph Hellwig wrote:
> On Wed, Jan 11, 2017 at 09:42:44AM +0100, Johannes Thumshirn wrote:
>> On Tue, Jan 10, 2017 at 04:06:19PM +0100, Christoph Hellwig wrote:
>>> Simplify the boilerplate code needed for bsg nodes a bit.
>>>
>>> Signed-off-by: Christoph Hellwig 
>>> ---
>>
>> that reminds me of posting my SAS bsg-lib patch...
> 
> Yes.  Having SAS use bsg-lib, and bsg-lib switched away from abusing
> struct request_queue would make this series a lot cleaner.
> 
> So maybe we should get that into the scsi tree for 4.10 together
> with the prep patches in this series as a priority and defer the actual
> struct request changes once again.  That should also give us some more
> time to sort out the dm-mpath story..
I'd advocate discussing this at LSF.
Now that Mike moved the bio-based mpath stuff back in, things got even
more complex.

I'll be posting a patchset for reimplementing multipath as a stand-alone
driver shortly; that'll give us a good starting point on how we want
multipath to evolve.

Who knows; we might even manage to move multipath out of device-mapper
altogether.
That would make Mike very happy, and I wouldn't mind, either :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 14/15] block/bsg: move queue creation into bsg_setup_queue

2017-01-11 Thread Christoph Hellwig
On Wed, Jan 11, 2017 at 09:42:44AM +0100, Johannes Thumshirn wrote:
> On Tue, Jan 10, 2017 at 04:06:19PM +0100, Christoph Hellwig wrote:
> > Simplify the boilerplate code needed for bsg nodes a bit.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> 
> that reminds me of posting my SAS bsg-lib patch...

Yes.  Having SAS use bsg-lib, and bsg-lib switched away from abusing
struct request_queue would make this series a lot cleaner.

So maybe we should get that into the scsi tree for 4.10 together
with the prep patches in this series as a priority and defer the actual
struct request changes once again.  That should also give us some more
time to sort out the dm-mpath story..
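
(For reference, with queue creation moved into bsg_setup_queue() as this
series does, a bsg-lib consumer boils down to roughly the following; the
lldd_* names are invented, and a real driver would complete the job
asynchronously from its completion path rather than inline as shown:)

        #include <linux/bsg-lib.h>
        #include <linux/device.h>

        /* called once per bsg job queued from user space */
        static int lldd_bsg_dispatch(struct bsg_job *job)
        {
                /* hand job->request_payload / job->reply_payload to the
                 * hardware, then complete the job: */
                bsg_job_done(job, 0 /* result */, 0 /* reply_payload_rcv_len */);
                return 0;
        }

        static int lldd_bsg_register(struct device *dev)
        {
                struct request_queue *q;

                /* bsg_setup_queue() now allocates the queue itself */
                q = bsg_setup_queue(dev, "lldd_bsg", lldd_bsg_dispatch, 0);
                return IS_ERR(q) ? PTR_ERR(q) : 0;
        }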


Re: [PATCH 01/15] virtio_blk: avoid DMA to stack for the sense buffer

2017-01-11 Thread Christoph Hellwig
On Wed, Jan 11, 2017 at 09:26:46AM +0100, Johannes Thumshirn wrote:
> Isn't that one already queued in Jens' tree?

Yes, it's now queued up.  Patch 2 was submitted as well and should
hopefully go into the next 4.10 RC.


Re: [PATCH 14/15] block/bsg: move queue creation into bsg_setup_queue

2017-01-11 Thread Johannes Thumshirn
On Tue, Jan 10, 2017 at 04:06:19PM +0100, Christoph Hellwig wrote:
> Simplify the boilerplate code needed for bsg nodes a bit.
> 
> Signed-off-by: Christoph Hellwig 
> ---

that reminds me of posting my SAS bsg-lib patch...

Anyways looks good,
Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850