date:20171026

Re: [PATCH v4] virtio_blk: Fix an SG_IO regression

2017-10-26 Thread Christoph Hellwig

On Thu, Oct 26, 2017 at 12:10:15PM +0200, Bart Van Assche wrote:
> Avoid that submitting an SG_IO ioctl triggers a kernel oops that
> is preceded by:
> 
> usercopy: kernel memory overwrite attempt detected to (null) () (6 
> bytes)
> kernel BUG at mm/usercopy.c:72!
> 
> Reported-by: Dann Frazier 
> Fixes: commit ca18d6f769d2 ("block: Make most scsi_req_init() calls implicit")
> Signed-off-by: Bart Van Assche 
> Cc: Michael S. Tsirkin 
> Cc: Dann Frazier 
> Cc:  # v4.13
> ---
>  drivers/block/virtio_blk.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 34e17ee799be..e477d4a5181e 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -593,10 +593,20 @@ static int virtblk_map_queues(struct blk_mq_tag_set 
> *set)
>   return blk_mq_virtio_map_queues(set, vblk->vdev, 0);
>  }
>  
> +static void virtblk_initialize_rq(struct request *req)
> +{
> + struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
> +
> +#ifdef CONFIG_VIRTIO_BLK_SCSI
> + scsi_req_init(&vbr->sreq);
> +#endif

How about only defininig the initialize_rq method and implementation
if CONFIG_VIRTIO_BLK_SCSI is set?

Else looks good:

Reviewed-by: Christoph Hellwig

[GIT PULL] one nvme fix for 4.14

2017-10-26 Thread Christoph Hellwig

The following changes since commit f04b9cc87b5fc466b1b7231ba7b078e885956c5b:

  nvme-rdma: Fix error status return in tagset allocation failure (2017-10-19 
17:13:51 +0200)

are available in the git repository at:

  git://git.infradead.org/nvme.git nvme-4.14

for you to fetch changes up to 7db814465395f3196ee98c8bd40d214d63e4f708:

  nvme-rdma: fix possible hang when issuing commands during ctrl removal 
(2017-10-23 16:27:44 +0200)


Sagi Grimberg (1):
  nvme-rdma: fix possible hang when issuing commands during ctrl removal

 drivers/nvme/host/rdma.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-10-26 Thread Ming Lei

Hello Bart,

On Fri, Oct 27, 2017 at 04:53:18AM +, Bart Van Assche wrote:
> On Fri, 2017-10-27 at 12:43 +0800, Ming Lei wrote:
> > The 1st patch removes the RESTART for TAG-SHARED because SCSI handles it
> > by itself, and not necessary to waste CPU to do the expensive RESTART.
> > And Roman Pen reported that this RESTART cuts half of IOPS in his case.
> > 
> > The 2nd patch removes the RESTART when .get_budget returns BLK_STS_RESOURCE,
> > and this RESTART is handled by SCSI's RESTART(scsi_end_request()) too.
> 
> Hello Ming,
> 
> There are more block drivers than the SCSI core that share tags. If the

Could you share us what the other in-tree driver which share tags is?

If there are really more, and when all can share similar RESTART mechanism, I
think that is the time for considering the common RESTART.

> restart mechanism is removed from the blk-mq core, does that mean that all
> block drivers that share tags will have to follow the example of the SCSI
> core and implement a restart mechanism themselves? As far as I know there

If there are such drivers, there should have been their own restart mechanism
which works for long time before blk-mq comes, and more importantly each driver
has much more knowledge than generic block layer to handle the restart,
such as SCSI's restart, that means driver's implementation may be more 
efficient.

Also the RESTART for TAG-SHARED may never function as expected wrt. SCSI.

And more importantly the block's RESTART for TAG-SHARED has caused big 
performance
issue for people.

IMO, the above two reasons are enough to remove the current RESTART for 
shared-tag.

> is a strong preference in the Linux community to implement common mechanisms
> in the (block layer) core instead of in drivers. It seems to me like you are
> proposing the opposite, namely removing a general mechanism from the (block
> layer) core and moving it into a driver, namely the SCSI core?

Actually SCSI's RESTART is very thin, and it is just the per-host starved_list.

Thanks,
Ming

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-10-26 Thread Bart Van Assche

On Fri, 2017-10-27 at 12:43 +0800, Ming Lei wrote:
> The 1st patch removes the RESTART for TAG-SHARED because SCSI handles it
> by itself, and not necessary to waste CPU to do the expensive RESTART.
> And Roman Pen reported that this RESTART cuts half of IOPS in his case.
> 
> The 2nd patch removes the RESTART when .get_budget returns BLK_STS_RESOURCE,
> and this RESTART is handled by SCSI's RESTART(scsi_end_request()) too.

Hello Ming,

There are more block drivers than the SCSI core that share tags. If the
restart mechanism is removed from the blk-mq core, does that mean that all
block drivers that share tags will have to follow the example of the SCSI
core and implement a restart mechanism themselves? As far as I know there
is a strong preference in the Linux community to implement common mechanisms
in the (block layer) core instead of in drivers. It seems to me like you are
proposing the opposite, namely removing a general mechanism from the (block
layer) core and moving it into a driver, namely the SCSI core?

Bart.

[PATCH V2 0/2] block: remove unnecessary RESTART

2017-10-26 Thread Ming Lei

Hi Jens,

The 1st patch removes the RESTART for TAG-SHARED because SCSI handles it
by itself, and not necessary to waste CPU to do the expensive RESTART.
And Roman Pen reported that this RESTART cuts half of IOPS in his case.

The 2nd patch removes the RESTART when .get_budget returns BLK_STS_RESOURCE,
and this RESTART is handled by SCSI's RESTART(scsi_end_request()) too.


Ming Lei (2):
  blk-mq: don't handle TAG_SHARED in restart
  blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE

 block/blk-mq-sched.c | 123 -
 block/blk-mq-sched.h |   2 +-
 block/blk-mq.c   |   8 +---
 3 files changed, 27 insertions(+), 106 deletions(-)

-- 
2.9.5

[PATCH V2 1/2] blk-mq: don't handle TAG_SHARED in restart

2017-10-26 Thread Ming Lei

Now restart is used in the following cases, and TAG_SHARED is for
SCSI only.

1) .get_budget() returns BLK_STS_RESOURCE
- if resource in target/host level isn't satisfied, this SCSI device
will be added in shost->starved_list, and the whole queue will be rerun
(via SCSI's built-in RESTART) in scsi_end_request() after any request
initiated from this host/targe is completed. Forget to mention, host level
resource can't be an issue for blk-mq at all.

- the same is true if resource in the queue level isn't satisfied.

- if there isn't outstanding request on this queue, then SCSI's RESTART
can't work(blk-mq's can't work too), and the queue will be run after
SCSI_QUEUE_DELAY, and finally all starved sdevs will be handled by SCSI's
RESTART when this request is finished

2) scsi_dispatch_cmd() returns BLK_STS_RESOURCE
- if there isn't onprogressing request on this queue, the queue
will be run after SCSI_QUEUE_DELAY

- otherwise, SCSI's RESTART covers the rerun.

3) blk_mq_get_driver_tag() failed
- BLK_MQ_S_TAG_WAITING covers the cross-queue RESTART for driver
allocation.

In one word, SCSI's built-in RESTART is enough to cover the queue
rerun, and we don't need to pay special attention to TAG_SHARED wrt. restart.

In my test on scsi_debug(8 luns), this patch improves IOPS by 20% ~ 30% when
running I/O on these 8 luns concurrently.

Aslo Roman Pen reported the current RESTART is very expensive especialy
when there are lots of LUNs attached in one host, such as in his
test, RESTART causes half of IOPS be cut.

Fixes: https://marc.info/?l=linux-kernel&m=150832216727524&w=2
Fixes: 6d8c6c0f97ad ("blk-mq: Restart a single queue if tag sets are shared")
Signed-off-by: Ming Lei 
---
 block/blk-mq-sched.c | 78 +++-
 1 file changed, 4 insertions(+), 74 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index df8581bb0a37..daab27feb653 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -68,25 +68,17 @@ static void blk_mq_sched_mark_restart_hctx(struct 
blk_mq_hw_ctx *hctx)
set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 }
 
-static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
+void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 {
if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-   return false;
-
-   if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
-   struct request_queue *q = hctx->queue;
+   return;
 
-   if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-   atomic_dec(&q->shared_hctx_restart);
-   } else
-   clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
+   clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
if (blk_mq_hctx_has_pending(hctx)) {
blk_mq_run_hw_queue(hctx, true);
-   return true;
+   return;
}
-
-   return false;
 }
 
 /* return true if hctx need to run again */
@@ -385,68 +377,6 @@ static bool blk_mq_sched_bypass_insert(struct 
blk_mq_hw_ctx *hctx,
return true;
 }
 
-/**
- * list_for_each_entry_rcu_rr - iterate in a round-robin fashion over rcu list
- * @pos:loop cursor.
- * @skip:   the list element that will not be examined. Iteration starts at
- *  @skip->next.
- * @head:   head of the list to examine. This list must have at least one
- *  element, namely @skip.
- * @member: name of the list_head structure within typeof(*pos).
- */
-#define list_for_each_entry_rcu_rr(pos, skip, head, member)\
-   for ((pos) = (skip);\
-(pos = (pos)->member.next != (head) ? list_entry_rcu(  \
-   (pos)->member.next, typeof(*pos), member) : \
- list_entry_rcu((pos)->member.next->next, typeof(*pos), member)), \
-(pos) != (skip); )
-
-/*
- * Called after a driver tag has been freed to check whether a hctx needs to
- * be restarted. Restarts @hctx if its tag set is not shared. Restarts hardware
- * queues in a round-robin fashion if the tag set of @hctx is shared with other
- * hardware queues.
- */
-void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx)
-{
-   struct blk_mq_tags *const tags = hctx->tags;
-   struct blk_mq_tag_set *const set = hctx->queue->tag_set;
-   struct request_queue *const queue = hctx->queue, *q;
-   struct blk_mq_hw_ctx *hctx2;
-   unsigned int i, j;
-
-   if (set->flags & BLK_MQ_F_TAG_SHARED) {
-   /*
-* If this is 0, then we know that no hardware queues
-* have RESTART marked. We're done.
-*/
-   if (!atomic_read(&queue->shared_hctx_restart))
-   return;
-
-   rcu_read_lock();
-   list_for_each_entry_rcu_rr(q, queue, &set->tag_list,
-  tag_set_list) {
-   queue_for_e

[PATCH V2 2/2] blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE

2017-10-26 Thread Ming Lei

SCSI restarts its queue in scsi_end_request() automatically, so we don't
need to handle this case in blk-mq.

Especailly any request won't be dequeued in this case, we needn't to
worry about IO hang caused by restart vs. dispatch.

Signed-off-by: Ming Lei 
---
 block/blk-mq-sched.c | 45 -
 block/blk-mq-sched.h |  2 +-
 block/blk-mq.c   |  8 ++--
 3 files changed, 23 insertions(+), 32 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index daab27feb653..7775f6b12fa9 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -81,8 +81,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
}
 }
 
-/* return true if hctx need to run again */
-static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+/*
+ * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
+ * its queue by itself in its completion handler, so we don't need to
+ * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ */
+static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
@@ -98,7 +102,7 @@ static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx 
*hctx)
 
ret = blk_mq_get_dispatch_budget(hctx);
if (ret == BLK_STS_RESOURCE)
-   return true;
+   break;
 
rq = e->type->ops.mq.dispatch_request(hctx);
if (!rq) {
@@ -116,8 +120,6 @@ static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx 
*hctx)
 */
list_add(&rq->queuelist, &rq_list);
} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
-
-   return false;
 }
 
 static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
@@ -131,8 +133,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct 
blk_mq_hw_ctx *hctx,
return hctx->ctxs[idx];
 }
 
-/* return true if hctx need to run again */
-static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
+/*
+ * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
+ * its queue by itself in its completion handler, so we don't need to
+ * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ */
+static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 {
struct request_queue *q = hctx->queue;
LIST_HEAD(rq_list);
@@ -147,7 +153,7 @@ static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx 
*hctx)
 
ret = blk_mq_get_dispatch_budget(hctx);
if (ret == BLK_STS_RESOURCE)
-   return true;
+   break;
 
rq = blk_mq_dequeue_from_ctx(hctx, ctx);
if (!rq) {
@@ -171,22 +177,19 @@ static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx 
*hctx)
} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
 
WRITE_ONCE(hctx->dispatch_from, ctx);
-
-   return false;
 }
 
 /* return true if hw queue need to be run again */
-bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
+void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
LIST_HEAD(rq_list);
-   bool run_queue = false;
 
/* RCU or SRCU read lock is needed before checking quiesced flag */
if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
-   return false;
+   return;
 
hctx->run++;
 
@@ -218,12 +221,12 @@ bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
*hctx)
blk_mq_sched_mark_restart_hctx(hctx);
if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
if (has_sched_dispatch)
-   run_queue = blk_mq_do_dispatch_sched(hctx);
+   blk_mq_do_dispatch_sched(hctx);
else
-   run_queue = blk_mq_do_dispatch_ctx(hctx);
+   blk_mq_do_dispatch_ctx(hctx);
}
} else if (has_sched_dispatch) {
-   run_queue = blk_mq_do_dispatch_sched(hctx);
+   blk_mq_do_dispatch_sched(hctx);
} else if (q->mq_ops->get_budget) {
/*
 * If we need to get budget before queuing request, we
@@ -233,19 +236,11 @@ bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
*hctx)
 * TODO: get more budgets, and dequeue more requests in
 * one time.
 */
-   run_queue = blk_mq_do_dispatch_ctx(hctx);
+   blk_mq_do_dispatch_ctx(hctx);
} else {
blk_mq_flush_busy_ctxs(hctx, &rq_list);
blk_mq_dispatch_rq_list(q, &rq_list, false);
}
-
-   if (run_qu

[PATCH] block: avoid to fail elevator switch

2017-10-26 Thread Ming Lei

elevator switch can be done just between register_disk() and
blk_register_queue(), then we can't change elevator during this period
because FLAG_REGISTERED isn't set at that time.

One typical use case is that elevator is changed via udev by the following
rule, and the KOBJ_ADD uevent is just emited at the end of register_disk()
and before running blk_register_queue().

ACTION=="add|change", SUBSYSTEM=="block" , KERNEL=="sda",  RUN+="/bin/sh -c 
'echo none > /sys/block/sda/queue/scheduler'"

This patch fixes the elevator switch failure issue.

Fixes: e9a823fb34a8b0(block: fix warning when I/O elevator is changed as 
request_queue is being removed)
Cc: David Jeffery 
Signed-off-by: Ming Lei 
---
 block/blk-sysfs.c |  7 ++-
 block/elevator.c  | 13 +
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b8362c0df51d..480959c5b036 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -927,12 +927,17 @@ int blk_register_queue(struct gendisk *disk)
 void blk_unregister_queue(struct gendisk *disk)
 {
struct request_queue *q = disk->queue;
+   bool elv_registered;
 
if (WARN_ON(!q))
return;
 
mutex_lock(&q->sysfs_lock);
queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q);
+   if (q->request_fn || (q->mq_ops && q->elevator))
+   elv_registered = q->elevator->registered;
+   else
+   elv_registered = false;
mutex_unlock(&q->sysfs_lock);
 
wbt_exit(q);
@@ -941,7 +946,7 @@ void blk_unregister_queue(struct gendisk *disk)
if (q->mq_ops)
blk_mq_unregister_dev(disk_to_dev(disk), q);
 
-   if (q->request_fn || (q->mq_ops && q->elevator))
+   if (elv_registered)
elv_unregister_queue(q);
 
kobject_uevent(&q->kobj, KOBJ_REMOVE);
diff --git a/block/elevator.c b/block/elevator.c
index 7ae50eb2732b..0e2a5140111f 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -852,6 +852,15 @@ int elv_register_queue(struct request_queue *q)
struct elevator_queue *e = q->elevator;
int error;
 
+   /*
+* When queue isn't registerd to gendisk, the elevator queue
+* will be added after it is registered to gendisk; When queue
+* is being unregistered, not necessary to add elevator queue
+* any more.
+*/
+   if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
+   return 0;
+
error = kobject_add(&e->kobj, &q->kobj, "%s", "iosched");
if (!error) {
struct elv_fs_entry *attr = e->type->elevator_attrs;
@@ -1055,10 +1064,6 @@ static int __elevator_change(struct request_queue *q, 
const char *name)
char elevator_name[ELV_NAME_MAX];
struct elevator_type *e;
 
-   /* Make sure queue is not in the middle of being removed */
-   if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
-   return -ENOENT;
-
/*
 * Special case for mq, turn off scheduling
 */
-- 
2.9.5

[RFC PATCH] blk-throttle: add burst allowance

2017-10-26 Thread Khazhismel Kumykov

Allows configuration additional bytes or ios before a throttle is
triggered. Slice end is extended to cover expended allowance recovery
time.

Usage would be e.g. per device to allow users to take up to X bytes/ios
at full speed, but be limited to Y bps/iops with sustained usage.

Signed-off-by: Khazhismel Kumykov 
---
 block/Kconfig|  11 +++
 block/blk-throttle.c | 185 ---
 2 files changed, 186 insertions(+), 10 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 3ab42bbb06d5..16545caa7fc9 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -127,6 +127,17 @@ config BLK_DEV_THROTTLING_LOW
 
Note, this is an experimental interface and could be changed someday.
 
+config BLK_DEV_THROTTLING_BURST
+bool "Block throttling .burst allowance interface"
+depends on BLK_DEV_THROTTLING
+default n
+---help---
+Add .burst allowance for block throttling. Burst allowance allows for
+additional unthrottled usage, while still limiting speed for sustained
+usage.
+
+If in doubt, say N.
+
 config BLK_CMDLINE_PARSER
bool "Block device command line partition parser"
default n
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index d80c3f0144c5..e09ec11e9c5f 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -156,6 +156,11 @@ struct throtl_grp {
/* Number of bio's dispatched in current slice */
unsigned int io_disp[2];
 
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   uint64_t bytes_burst_conf[2];
+   unsigned int io_burst_conf[2];
+#endif
+
unsigned long last_low_overflow_time[2];
 
uint64_t last_bytes_disp[2];
@@ -506,6 +511,12 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, 
int node)
tg->bps_conf[WRITE][LIMIT_MAX] = U64_MAX;
tg->iops_conf[READ][LIMIT_MAX] = UINT_MAX;
tg->iops_conf[WRITE][LIMIT_MAX] = UINT_MAX;
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   tg->bytes_burst_conf[READ] = 0;
+   tg->bytes_burst_conf[WRITE] = 0;
+   tg->io_burst_conf[READ] = 0;
+   tg->io_burst_conf[WRITE] = 0;
+#endif
/* LIMIT_LOW will have default value 0 */
 
tg->latency_target = DFL_LATENCY_TARGET;
@@ -799,6 +810,26 @@ static inline void throtl_start_new_slice(struct 
throtl_grp *tg, bool rw)
   tg->slice_end[rw], jiffies);
 }
 
+/*
+ * When current slice should end.
+ *
+ * With CONFIG_BLK_DEV_THROTTLING_BURST, we will wait longer than min_wait
+ * for slice to recover used burst allowance. (*_disp -> 0). Setting slice_end
+ * before this would result in tg receiving additional burst allowance.
+ */
+static inline unsigned long throtl_slice_wait(struct throtl_grp *tg, bool rw,
+   unsigned long min_wait)
+{
+   unsigned long bytes_wait = 0, io_wait = 0;
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   if (tg->bytes_burst_conf[rw])
+   bytes_wait = (tg->bytes_disp[rw] * HZ) / tg_bps_limit(tg, rw);
+   if (tg->io_burst_conf[rw])
+   io_wait = (tg->io_disp[rw] * HZ) / tg_iops_limit(tg, rw);
+#endif
+   return jiffies + max(min_wait, max(bytes_wait, io_wait));
+}
+
 static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
unsigned long jiffy_end)
 {
@@ -848,7 +879,8 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, 
bool rw)
 * is bad because it does not allow new slice to start.
 */
 
-   throtl_set_slice_end(tg, rw, jiffies + tg->td->throtl_slice);
+   throtl_set_slice_end(tg, rw,
+throtl_slice_wait(tg, rw, tg->td->throtl_slice));
 
time_elapsed = jiffies - tg->slice_start[rw];
 
@@ -888,7 +920,7 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, 
struct bio *bio,
  unsigned long *wait)
 {
bool rw = bio_data_dir(bio);
-   unsigned int io_allowed;
+   unsigned int io_allowed, io_disp;
unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
u64 tmp;
 
@@ -907,6 +939,17 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, 
struct bio *bio,
 * have been trimmed.
 */
 
+   io_disp = tg->io_disp[rw];
+
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   if (tg->io_disp[rw] < tg->io_burst_conf[rw]) {
+   if (wait)
+   *wait = 0;
+   return true;
+   }
+   io_disp -= tg->io_burst_conf[rw];
+#endif
+
tmp = (u64)tg_iops_limit(tg, rw) * jiffy_elapsed_rnd;
do_div(tmp, HZ);
 
@@ -915,14 +958,14 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, 
struct bio *bio,
else
io_allowed = tmp;
 
-   if (tg->io_disp[rw] + 1 <= io_allowed) {
+   if (io_disp + 1 <= io_allowed) {
if (wait)
*wait = 0;
return true;
}
 
/* Calc approx ti

RE: [PATCH 00/12 v4] multiqueue for MMC/SD

2017-10-26 Thread Hunter, Adrian

> -Original Message-
> From: Linus Walleij [mailto:linus.wall...@linaro.org]
> Sent: Thursday, October 26, 2017 5:20 PM
> To: Hunter, Adrian 
> Cc: linux-...@vger.kernel.org; Ulf Hansson ;
> linux-block ; Jens Axboe ;
> Christoph Hellwig ; Arnd Bergmann ;
> Bartlomiej Zolnierkiewicz ; Paolo Valente
> ; Avri Altman 
> Subject: Re: [PATCH 00/12 v4] multiqueue for MMC/SD
> 
> On Thu, Oct 26, 2017 at 3:34 PM, Adrian Hunter 
> wrote:
> > On 26/10/17 15:57, Linus Walleij wrote:
> >> In my opinion this is also a better fit for command queueuing.
> >
> > Not true.  CQE support worked perfectly before blk-mq and did not
> > depend on blk-mq in any way.  Obviously the current CQE patch set
> > actually implements the CQE requirements for blk-mq - which this patch set
> does not.
> 
> What I mean is that the CQE code will likely look better on top of these
> refactorings.
> 
> But as I say it is a matter of taste. I just love the looks of my own code as
> much as the next programmer so I can't judge that. Let's see what the
> reviewers say.

It doesn't look ready.  There seems still to be 2 task switches between
each transfer.  mmc_blk_rw_done_error() is still using the messy error
handling and doesn’t handle retries e.g. 'retry' is a local variable so it can't
count the number of retries now that there is no loop.

> >> It sounds simple but I bet this drives a truck through Adrians patch
> >> series. Sorry. :(
> >
> > I waited a long time for your patches but I had to give up waiting
> > when Ulf belated insisted on blk-mq before CQE.  I am not sure what
> > you are expecting now it seems too late.
> 
> Too late for what? It's just a patch set, I don't really have a deadline for 
> this or
> anything. As I explained above I have been working on this all the time, the
> problem was that I was/am not smart enough to find that solution for host
> claiming by context.

Too late to go before CQE.  All the blk-mq work is now in the CQE patchset.

> 
> The host claiming by context was merged a month ago and now I have
> understood that I can use that and rebased my patches on it. Slow learner, I
> guess.
> 
> If you feel it is ungrateful that you have put in so much work and things are
> not getting merged, and you feel your patches deserve to be merged first
> (because of human nature reasons) I can empathize with that. It's sad that
> your patches are at v12. Also I see that patch 4 bears the signoffs of a
> significant team at CodeAurora, so they are likely as impatient.

It is important that you understand that this has nothing to do with
"human nature reasons".Linux distributions use upstream kernels. 
ChromeOS  has an "upstream first" policy.  Delaying features for long
periods has real-world consequences.  When people ask, what kernel
should they use, we expect to reply, just use mainline.

[PATCH 4/4] block: add WARN_ON if bdi register fail

2017-10-26 Thread weiping zhang

device_add_disk need do more safety error handle, so this patch just
add WARN_ON.

Signed-off-by: weiping zhang 
---
 block/genhd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/genhd.c b/block/genhd.c
index dd305c65ffb0..cb55eea821eb 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -660,7 +660,9 @@ void device_add_disk(struct device *parent, struct gendisk 
*disk)
 
/* Register BDI before referencing it from bdev */
bdi = disk->queue->backing_dev_info;
-   bdi_register_owner(bdi, disk_to_dev(disk));
+   retval = bdi_register_owner(bdi, disk_to_dev(disk));
+   if (retval)
+   WARN_ON(1);
 
blk_register_region(disk_devt(disk), disk->minors, NULL,
exact_match, exact_lock, disk);
-- 
2.14.2

[PATCH 3/4] bdi: add error handle for bdi_debug_register

2017-10-26 Thread weiping zhang

In order to make error handle more cleaner we call bdi_debug_register
before set state to WB_registered, that we can avoid call bdi_unregister
in release_bdi().

Signed-off-by: weiping zhang 
---
 mm/backing-dev.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index e9d6a1ede12b..54396d53f471 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -893,10 +893,13 @@ int bdi_register_va(struct backing_dev_info *bdi, const 
char *fmt, va_list args)
if (IS_ERR(dev))
return PTR_ERR(dev);
 
+   if (bdi_debug_register(bdi, dev_name(dev))) {
+   device_destroy(bdi_class, dev->devt);
+   return -ENOMEM;
+   }
cgwb_bdi_register(bdi);
bdi->dev = dev;
 
-   bdi_debug_register(bdi, dev_name(dev));
set_bit(WB_registered, &bdi->wb.state);
 
spin_lock_bh(&bdi_lock);
@@ -916,6 +919,8 @@ int bdi_register(struct backing_dev_info *bdi, const char 
*fmt, ...)
va_start(args, fmt);
ret = bdi_register_va(bdi, fmt, args);
va_end(args);
+   if (ret)
+   bdi_put(bdi);
return ret;
 }
 EXPORT_SYMBOL(bdi_register);
-- 
2.14.2

[PATCH 1/4] bdi: add check for bdi_debug_root

2017-10-26 Thread weiping zhang

this patch add a check for bdi_debug_root and do error handle for it.
we should make sure it was created success, otherwise when add new
block device's bdi folder(eg, 8:0) will be create a debugfs root directory.

Signed-off-by: weiping zhang 
---
 mm/backing-dev.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 74b52dfd5852..5072be19d9b2 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -36,9 +36,12 @@ struct workqueue_struct *bdi_wq;
 
 static struct dentry *bdi_debug_root;
 
-static void bdi_debug_init(void)
+static int bdi_debug_init(void)
 {
bdi_debug_root = debugfs_create_dir("bdi", NULL);
+   if (!bdi_debug_root)
+   return -ENOMEM;
+   return 0;
 }
 
 static int bdi_debug_stats_show(struct seq_file *m, void *v)
@@ -126,8 +129,9 @@ static void bdi_debug_unregister(struct backing_dev_info 
*bdi)
debugfs_remove(bdi->debug_dir);
 }
 #else
-static inline void bdi_debug_init(void)
+static inline int bdi_debug_init(void)
 {
+   return 0;
 }
 static inline void bdi_debug_register(struct backing_dev_info *bdi,
  const char *name)
@@ -229,12 +233,19 @@ ATTRIBUTE_GROUPS(bdi_dev);
 
 static __init int bdi_class_init(void)
 {
+   int ret;
+
bdi_class = class_create(THIS_MODULE, "bdi");
if (IS_ERR(bdi_class))
return PTR_ERR(bdi_class);
 
bdi_class->dev_groups = bdi_dev_groups;
-   bdi_debug_init();
+   ret = bdi_debug_init();
+   if (ret) {
+   class_destroy(bdi_class);
+   bdi_class = NULL;
+   return ret;
+   }
 
return 0;
 }
-- 
2.14.2

[PATCH 2/4] bdi: convert bdi_debug_register to int

2017-10-26 Thread weiping zhang

Convert bdi_debug_register to int and then do error handle for it.

Signed-off-by: weiping zhang 
---
 mm/backing-dev.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 5072be19d9b2..e9d6a1ede12b 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -116,11 +116,23 @@ static const struct file_operations bdi_debug_stats_fops 
= {
.release= single_release,
 };
 
-static void bdi_debug_register(struct backing_dev_info *bdi, const char *name)
+static int bdi_debug_register(struct backing_dev_info *bdi, const char *name)
 {
+   if (!bdi_debug_root)
+   return -ENOMEM;
+
bdi->debug_dir = debugfs_create_dir(name, bdi_debug_root);
+   if (!bdi->debug_dir)
+   return -ENOMEM;
+
bdi->debug_stats = debugfs_create_file("stats", 0444, bdi->debug_dir,
   bdi, &bdi_debug_stats_fops);
+   if (!bdi->debug_stats) {
+   debugfs_remove(bdi->debug_dir);
+   return -ENOMEM;
+   }
+
+   return 0;
 }
 
 static void bdi_debug_unregister(struct backing_dev_info *bdi)
@@ -133,9 +145,10 @@ static inline int bdi_debug_init(void)
 {
return 0;
 }
-static inline void bdi_debug_register(struct backing_dev_info *bdi,
+static inline int bdi_debug_register(struct backing_dev_info *bdi,
  const char *name)
 {
+   return 0;
 }
 static inline void bdi_debug_unregister(struct backing_dev_info *bdi)
 {
-- 
2.14.2

[PATCH 0/4] add error handle for bdi debugfs register

2017-10-26 Thread weiping zhang


this series add error handle for bdi debugfs register flow, the first
three patches try to convert void function to int and do some cleanup
if create dir or file fail.

the fourth patch only add a WARN_ON in device_add_disk, no function change.

weiping zhang (4):
  bdi: add check for bdi_debug_root
  bdi: convert bdi_debug_register to int
  bdi: add error handle for bdi_debug_register
  block: add WARN_ON if bdi register fail

 block/genhd.c|  4 +++-
 mm/backing-dev.c | 41 +++--
 2 files changed, 38 insertions(+), 7 deletions(-)

-- 
2.14.2

[GIT PULL] Block fixes for 4.14-rc

2017-10-26 Thread Jens Axboe

Hi Linus,

A few select fixes that should go into this series. Mainly for NVMe, but
also a single stable fix for nbd from Josef.

Please pull!


  git://git.kernel.dk/linux-block.git for-linus



James Smart (3):
  nvme-fc: fix iowait hang
  nvme-fc: retry initial controller connections 3 times
  nvmet: synchronize sqhd update

Jens Axboe (1):
  Merge branch 'nvme-4.14' of git://git.infradead.org/nvme into for-linus

Josef Bacik (1):
  nbd: handle interrupted sendmsg with a sndtimeo set

Sagi Grimberg (2):
  nvme-rdma: Fix possible double free in reconnect flow
  nvme-rdma: Fix error status return in tagset allocation failure

 drivers/block/nbd.c | 13 +++--
 drivers/nvme/host/fc.c  | 37 +
 drivers/nvme/host/rdma.c| 16 
 drivers/nvme/target/core.c  | 15 ---
 drivers/nvme/target/nvmet.h |  2 +-
 5 files changed, 69 insertions(+), 14 deletions(-)

-- 
Jens Axboe

Re: [PATCH V12 0/5] mmc: Add Command Queue support

2017-10-26 Thread Linus Walleij

On Thu, Oct 26, 2017 at 3:49 PM, Adrian Hunter  wrote:
> On 26/10/17 16:32, Linus Walleij wrote:

>> My patch series switches the stack around to make it possible
>> to do this. But it doesn't go the whole way to complete the requests
>> from interrupt context.
>>
>> Since we have to send commands for retune etc request finalization
>> cannot easily be done from interrupt context.
>
> Re-tuning and background operations are rare and slow, so there is no reason
> to try to start them from interrupt context.

OK I will try to get them out of the way and see what happens,
hehe :)

What I mean is that we were checking - on every command -
if BKOPS or retune needs to happen. And then doing it. Thus
all was done in process context.

>> But I am thinking about testing to hack it
>> using some ugly approaches ... like assuming we don't need any
>> retune etc and just say all is fine and optimistically complete the
>> request directly in the interrupt handler if all was OK and wait
>> for errors to happen before retuning.
>
> It already works that way.  Re-tuning happens before you start a request.
> We prevent re-tuning in between dependent requests, like between starting a
> transfer and CMD13 polling for completion.

Ah that is what these if()s do ... right. I'll see if I can get around
this then.

Yours,
Linus Walleij

Re: [PATCH 00/12 v4] multiqueue for MMC/SD

2017-10-26 Thread Linus Walleij

On Thu, Oct 26, 2017 at 3:34 PM, Adrian Hunter  wrote:
> On 26/10/17 15:57, Linus Walleij wrote:

>> I have now worked on it for more than a year. I was side
>> tracked to clean up some code, move request allocation to
>> be handled by the block layer, delete bounce buffer handling
>> and refactoring the RPMB support. With the changes to request
>> allocation, the patch set is a better fit and has shrunk
>> from 16 to 12 patches as a result.
>
> None of which was necessary for blk-mq support.

I was not smart enough to realize that it was possible to do what
you did in
commit d685f5d5fcf75c30ef009771d3067f7438cd8baf
"mmc: core: Introduce host claiming by context"
this simple and clever solution simply didn't occur to me
at all.

And now it uses that solution, as you can see :)

But since I didn't have that simple solution, the other
solution was to get rid of the lock altogether (which we should
anyways...) getting rid of the RPMB "partition" for example
removes some locks. (I guess I still will have to go on and
find a solution for the boot and generic partitions but it's no
blocker for MQ anymore.)

My patch set was dependent on solving that. As I already wrote
to you on sep 13:
https://marc.info/?l=linux-mmc&m=150607944524752&w=2

My patches for allocating the struct mmc_queue_req from the
block layer was actually originally a part of this series
so the old patches
  mmc: queue: issue struct mmc_queue_req items
  mmc: queue: get/put struct mmc_queue_req
was doing a stupid homebrewn solution to what the block
layer already can do, mea culpa. (Yeah I was educating
myself in the block layer too...)

Anyways, all of this happened in the context of moving
forward with my MQ patch set, not as random activity.

Now it looks like I'm defending myself from a project leader,
haha :D

Well for better or worse, this was how I was working.

>> We use the trick to set the queue depth to 2 to get two
>> parallel requests pushed down to the host. I tried to set this
>> to 4, the code survives it, the queue just have three items
>> waiting to be submitted all the time.
>
> The queue depth also sets the number of requests, so you are strangling the
> I/O scheduler.

Yup. Just did it to see if it survives.

>> In my opinion this is also a better fit for command queueuing.
>
> Not true.  CQE support worked perfectly before blk-mq and did not depend on
> blk-mq in any way.  Obviously the current CQE patch set actually implements
> the CQE requirements for blk-mq - which this patch set does not.

What I mean is that the CQE code will likely look better on top
of these refactorings.

But as I say it is a matter of taste. I just love the looks of my own code
as much as the next programmer so I can't judge that. Let's see what
the reviewers say.

>> Handling command queueing needs to happen in the asynchronous
>> submission codepath, so instead of waiting on a pending
>> areq, we just stack up requests in the command queue.
>
> That is how CQE has always worked.  It worked that way just fine without 
> blk-mq.

Okay nice.

>> It sounds simple but I bet this drives a truck through Adrians
>> patch series. Sorry. :(
>
> I waited a long time for your patches but I had to give up waiting when Ulf
> belated insisted on blk-mq before CQE.  I am not sure what you are expecting
> now it seems too late.

Too late for what? It's just a patch set, I don't really have a deadline
for this or anything. As I explained above I have
been working on this all the time, the problem was that I was/am not
smart enough to find that solution for host claiming by context.

The host claiming by context was merged a month ago and now I have
understood that I can use that and rebased my patches on it. Slow
learner, I guess.

If you feel it is ungrateful that you have put in so much work and things
are not getting merged, and you feel your patches deserve to be
merged first (because of human nature reasons) I can empathize with
that. It's sad that your patches are at v12. Also I see that patch 4 bears
the signoffs of a significant team at CodeAurora, so they are likely
as impatient.

I would just rebase my remaining work on top of the CQE patches if
they end up being merged first, no big deal, just work.

Yours,
Linus Walleij

Re: [PATCH V12 0/5] mmc: Add Command Queue support

2017-10-26 Thread Adrian Hunter

On 26/10/17 16:32, Linus Walleij wrote:
> On Tue, Oct 24, 2017 at 10:40 AM, Adrian Hunter  
> wrote:
> 
>> Here is V12 of the hardware command queue patches without the software
>> command queue patches, now using blk-mq and now with blk-mq support for
>> non-CQE I/O.
> 
> Since I had my test setup going I gave this a spin with the same set
> of tests that I used before/after my MQ patches.
> 
> It is using the same setup and same eMMC, but I hade to rebase onto
> Ulf's very latest next branch to apply your patches.
> 
> I default-enabled multiqueue.
> 
> Results:
> 
> sync
> echo 3 > /proc/sys/vm/drop_caches
> sync
> time dd if=/dev/mmcblk3 of=/dev/null bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.0GB) copied, 24.251922 seconds, 42.2MB/s
> real0m 24.25s
> user0m 0.03s
> sys 0m 3.80s
> 
> mount /dev/mmcblk3p1 /mnt/
> cd /mnt/
> sync
> echo 3 > /proc/sys/vm/drop_caches
> sync
> time find . > /dev/null
> real0m 3.24s
> user0m 0.22s
> sys 0m 1.23s
> 
> sync
> echo 3 > /proc/sys/vm/drop_caches
> sync
> iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
> 
>randomrandom
>kB  reclenwrite  rewritereadrereadread write
> 20480   4 1615 1571 6612 6714 6494  531
> 20480   8 2143 2295115591156311499 1164
> 20480  16 3894 4202178261782317755 1369
> 20480  32 5816 7489237412375923709 3016
> 20480  64 7393 9167275322752627502 3591
> 20480 128 7328 8097291842916129159 5592
> 20480 256 7194 8752294242943429424 6700
> 20480 512 8984 9930299032991129909 7420
> 204801024 7072 7446276842768527681 7444
> 204802048 6840 8199273982742027418 6766
> 204804096 8137 6805280912808928093 8209
> 204808192 7255 7485283862838428383 7479
> 20480   16384 7078 7448285842858528585 7447
> 
> In short: no performance regressions.

You really need to test cards that are fast.  A decent UHS-I SD card can do
over 80 MB/s for reads and of course HS400 eMMC can do over 300 MB/s.

> 
> Performance-wise this is on par with my own patch set for MQ.
> 
> As you know my pet peeve is "enable MQ by default" and I see no
> reason from a performance perspective not to enable MQ by default
> on this patch set or mine for that matter.

That is a side-issue.  A single small patch can change that.

> 
>> While we should look at changing blk-mq to give better workqueue performance,
>> a bigger gain is likely to be made by adding a new host API to enable the
>> next already-prepared request to be issued directly from within ->done()
>> callback of the current request.
> 
> My patch series switches the stack around to make it possible
> to do this. But it doesn't go the whole way to complete the requests
> from interrupt context.
> 
> Since we have to send commands for retune etc request finalization
> cannot easily be done from interrupt context.

Re-tuning and background operations are rare and slow, so there is no reason
to try to start them from interrupt context.

> 
> But I am thinking about testing to hack it
> using some ugly approaches ... like assuming we don't need any
> retune etc and just say all is fine and optimistically complete the
> request directly in the interrupt handler if all was OK and wait
> for errors to happen before retuning.

It already works that way.  Re-tuning happens before you start a request.
We prevent re-tuning in between dependent requests, like between starting a
transfer and CMD13 polling for completion.

Re: [PATCH 00/12 v4] multiqueue for MMC/SD

2017-10-26 Thread Adrian Hunter

On 26/10/17 15:57, Linus Walleij wrote:
> This switches the MMC/SD stack over to unconditionally
> using the multiqueue block interface for block access.
> This modernizes the MMC/SD stack and makes it possible
> to enable BFQ scheduling on these single-queue devices.
> 
> This is the v4 version of this v3 patch set from february:
> https://marc.info/?l=linux-mmc&m=148665788227015&w=2
> 
> The patches are available in a git branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson.git/log/?h=mmc-mq-v4.14-rc4
> 
> You can pull it to a clean kernel tree like this:
> git checkout -b mmc-test v4.14-rc4
> git pull 
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson.git 
> mmc-mq-v4.14-rc4
> 
> I have now worked on it for more than a year. I was side
> tracked to clean up some code, move request allocation to
> be handled by the block layer, delete bounce buffer handling
> and refactoring the RPMB support. With the changes to request
> allocation, the patch set is a better fit and has shrunk
> from 16 to 12 patches as a result.

None of which was necessary for blk-mq support.

> 
> It is still quite invasive. Yet it is something I think would
> be nice to merge for v4.16...
> 
> The rationale for this approach was Arnd's suggestion to try to
> switch the MMC/SD stack around so as to complete requests as
> quickly as possible when they return from the device driver
> so that new requests can be issued. We are doing this now:
> the polling loop that was pulling NULL out of the request
> queue and driving the pipeline with a loop is gone with
> the next-to last patch ("block: issue requests in massive
> parallel"). This sets the stage for MQ to go in and hammer
> requests on the asynchronous issuing layer.
> 
> We use the trick to set the queue depth to 2 to get two
> parallel requests pushed down to the host. I tried to set this
> to 4, the code survives it, the queue just have three items
> waiting to be submitted all the time.

The queue depth also sets the number of requests, so you are strangling the
I/O scheduler.

> 
> In my opinion this is also a better fit for command queueuing.

Not true.  CQE support worked perfectly before blk-mq and did not depend on
blk-mq in any way.  Obviously the current CQE patch set actually implements
the CQE requirements for blk-mq - which this patch set does not.

> Handling command queueing needs to happen in the asynchronous
> submission codepath, so instead of waiting on a pending
> areq, we just stack up requests in the command queue.

That is how CQE has always worked.  It worked that way just fine without blk-mq.

> 
> It sounds simple but I bet this drives a truck through Adrians
> patch series. Sorry. :(

I waited a long time for your patches but I had to give up waiting when Ulf
belated insisted on blk-mq before CQE.  I am not sure what you are expecting
now it seems too late.

Re: [PATCH V12 0/5] mmc: Add Command Queue support

2017-10-26 Thread Linus Walleij

On Tue, Oct 24, 2017 at 10:40 AM, Adrian Hunter  wrote:

> Here is V12 of the hardware command queue patches without the software
> command queue patches, now using blk-mq and now with blk-mq support for
> non-CQE I/O.

Since I had my test setup going I gave this a spin with the same set
of tests that I used before/after my MQ patches.

It is using the same setup and same eMMC, but I hade to rebase onto
Ulf's very latest next branch to apply your patches.

I default-enabled multiqueue.

Results:

sync
echo 3 > /proc/sys/vm/drop_caches
sync
time dd if=/dev/mmcblk3 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 24.251922 seconds, 42.2MB/s
real0m 24.25s
user0m 0.03s
sys 0m 3.80s

mount /dev/mmcblk3p1 /mnt/
cd /mnt/
sync
echo 3 > /proc/sys/vm/drop_caches
sync
time find . > /dev/null
real0m 3.24s
user0m 0.22s
sys 0m 1.23s

sync
echo 3 > /proc/sys/vm/drop_caches
sync
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test

   randomrandom
   kB  reclenwrite  rewritereadrereadread write
20480   4 1615 1571 6612 6714 6494  531
20480   8 2143 2295115591156311499 1164
20480  16 3894 4202178261782317755 1369
20480  32 5816 7489237412375923709 3016
20480  64 7393 9167275322752627502 3591
20480 128 7328 8097291842916129159 5592
20480 256 7194 8752294242943429424 6700
20480 512 8984 9930299032991129909 7420
204801024 7072 7446276842768527681 7444
204802048 6840 8199273982742027418 6766
204804096 8137 6805280912808928093 8209
204808192 7255 7485283862838428383 7479
20480   16384 7078 7448285842858528585 7447

In short: no performance regressions.

Performance-wise this is on par with my own patch set for MQ.

As you know my pet peeve is "enable MQ by default" and I see no
reason from a performance perspective not to enable MQ by default
on this patch set or mine for that matter.

> While we should look at changing blk-mq to give better workqueue performance,
> a bigger gain is likely to be made by adding a new host API to enable the
> next already-prepared request to be issued directly from within ->done()
> callback of the current request.

My patch series switches the stack around to make it possible
to do this. But it doesn't go the whole way to complete the requests
from interrupt context.

Since we have to send commands for retune etc request finalization
cannot easily be done from interrupt context.

But I am thinking about testing to hack it
using some ugly approaches ... like assuming we don't need any
retune etc and just say all is fine and optimistically complete the
request directly in the interrupt handler if all was OK and wait
for errors to happen before retuning.

Yours,
Linus Walleij

[PATCH 12/12 v4] mmc: switch MMC/SD to use blk-mq multiqueueing

2017-10-26 Thread Linus Walleij

This switches the MMC/SD stack to use the multiqueue block
layer interface.

We kill off the kthread that was just calling blk_fetch_request()
and let blk-mq drive all traffic, nice, that is how it should work.

Due to having switched the submission mechanics around so that
the completion of requests is now triggered from the host
callbacks, we manage to keep the same performance for linear
reads/writes as we have for the old block layer.

The open questions from earlier patch series v1 thru v3 have
been addressed:

- mmc_[get|put]_card() is now issued across requests from
  .queue_rq() to .complete() using Adrians nifty context lock.
  This means that the block layer does not compete with itself
  on getting access to the host, and we can let other users of
  the host come in. (For SDIO and mixed-mode cards.)

- Partial reads are handled by open coding calls to
  blk_update_request() as advised by Christoph.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c |  87 ++
 drivers/mmc/core/queue.c | 223 ++-
 drivers/mmc/core/queue.h |   8 +-
 3 files changed, 139 insertions(+), 179 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index f06f381146a5..9e0fe07e098a 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -93,7 +94,6 @@ static DEFINE_IDA(mmc_rpmb_ida);
  * There is one mmc_blk_data per slot.
  */
 struct mmc_blk_data {
-   spinlock_t  lock;
struct device   *parent;
struct gendisk  *disk;
struct mmc_queue queue;
@@ -1204,6 +1204,18 @@ static inline void mmc_blk_reset_success(struct 
mmc_blk_data *md, int type)
 }
 
 /*
+ * This reports status back to the block layer for a finished request.
+ */
+static void mmc_blk_complete(struct mmc_queue_req *mq_rq,
+blk_status_t status)
+{
+   struct request *req = mmc_queue_req_to_req(mq_rq);
+
+   blk_mq_end_request(req, status);
+   blk_mq_complete_request(req);
+}
+
+/*
  * The non-block commands come back from the block layer after it queued it and
  * processed it with all other requests and then they get issued in this
  * function.
@@ -1262,9 +1274,9 @@ static void mmc_blk_issue_drv_op(struct mmc_queue_req 
*mq_rq)
ret = -EINVAL;
break;
}
+
mq_rq->drv_op_result = ret;
-   blk_end_request_all(mmc_queue_req_to_req(mq_rq),
-   ret ? BLK_STS_IOERR : BLK_STS_OK);
+   mmc_blk_complete(mq_rq, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 static void mmc_blk_issue_discard_rq(struct mmc_queue_req *mq_rq)
@@ -1308,7 +1320,7 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue_req 
*mq_rq)
else
mmc_blk_reset_success(md, type);
 fail:
-   blk_end_request(req, status, blk_rq_bytes(req));
+   mmc_blk_complete(mq_rq, status);
 }
 
 static void mmc_blk_issue_secdiscard_rq(struct mmc_queue_req *mq_rq)
@@ -1378,7 +1390,7 @@ static void mmc_blk_issue_secdiscard_rq(struct 
mmc_queue_req *mq_rq)
if (!err)
mmc_blk_reset_success(md, type);
 out:
-   blk_end_request(req, status, blk_rq_bytes(req));
+   mmc_blk_complete(mq_rq, status);
 }
 
 static void mmc_blk_issue_flush(struct mmc_queue_req *mq_rq)
@@ -1388,8 +1400,13 @@ static void mmc_blk_issue_flush(struct mmc_queue_req 
*mq_rq)
int ret = 0;
 
ret = mmc_flush_cache(card);
-   blk_end_request_all(mmc_queue_req_to_req(mq_rq),
-   ret ? BLK_STS_IOERR : BLK_STS_OK);
+   /*
+* NOTE: this used to call blk_end_request_all() for both
+* cases in the old block layer to flush all queued
+* transactions. I am not sure it was even correct to
+* do that for the success case.
+*/
+   mmc_blk_complete(mq_rq, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 /*
@@ -1768,7 +1785,6 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req 
*mq_rq,
 
mq_rq->areq.err_check = mmc_blk_err_check;
mq_rq->areq.host = card->host;
-   INIT_WORK(&mq_rq->areq.finalization_work, mmc_finalize_areq);
 }
 
 static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, struct mmc_card *card,
@@ -1792,10 +1808,13 @@ static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, 
struct mmc_card *card,
err = mmc_sd_num_wr_blocks(card, &blocks);
if (err)
req_pending = old_req_pending;
-   else
-   req_pending = blk_end_request(req, BLK_STS_OK, blocks 
<< 9);
+   else {
+   req_pending = blk_update_request(req, BLK_STS_OK,
+blocks << 9);
+   }
} else {
-   req_pending = blk_end_request(req, BLK_STS_OK, 
brq->data.bytes_xfered);
+   req_pendi

[PATCH 08/12 v4] mmc: block: shuffle retry and error handling

2017-10-26 Thread Linus Walleij

Instead of doing retries at the same time as trying to submit new
requests, do the retries when the request is reported as completed
by the driver, in the finalization worker.

This is achieved by letting the core worker call back into the block
layer using a callback in the asynchronous request, ->report_done_status()
that will pass the status back to the block core so it can repeatedly
try to hammer the request using single request, retry etc by calling back
to the core layer using mmc_restart_areq(), which will just kick the
same asynchronous request without waiting for a previous ongoing request.

The beauty of it is that the completion will not complete until the
block layer has had the opportunity to hammer a bit at the card using
a bunch of different approaches that used to be in the while() loop in
mmc_blk_rw_done()

The algorithm for recapture, retry and handle errors is identical to the
one we used to have in mmc_blk_issue_rw_rq(), only augmented to get
called in another path from the core.

We have to add and initialize a pointer back to the struct mmc_queue
from the struct mmc_queue_req to find the queue from the asynchronous
request when reporting the status back to the core.

Other users of the asynchrous request that do not need to retry and
use misc error handling fallbacks will work fine since a NULL
->report_done_status() is just fine. This is currently only done by
the test module.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 337 ---
 drivers/mmc/core/core.c  |  47 ---
 drivers/mmc/core/core.h  |   1 +
 drivers/mmc/core/queue.c |   2 +
 drivers/mmc/core/queue.h |   1 +
 include/linux/mmc/host.h |   5 +-
 6 files changed, 210 insertions(+), 183 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 86ec87c17e71..c1178fa83f75 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1811,198 +1811,207 @@ static void mmc_blk_rw_cmd_abort(struct mmc_queue 
*mq, struct mmc_card *card,
 /**
  * mmc_blk_rw_try_restart() - tries to restart the current async request
  * @mq: the queue with the card and host to restart
- * @req: a new request that want to be started after the current one
+ * @mqrq: the mmc_queue_request containing the areq to be restarted
  */
-static void mmc_blk_rw_try_restart(struct mmc_queue *mq, struct request *req,
+static void mmc_blk_rw_try_restart(struct mmc_queue *mq,
   struct mmc_queue_req *mqrq)
 {
-   if (!req)
-   return;
+   /* Proceed and try to restart the current async request */
+   mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq);
+   mmc_restart_areq(mq->card->host, &mqrq->areq);
+}
+
+static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status 
status)
+{
+   struct mmc_queue *mq;
+   struct mmc_blk_data *md;
+   struct mmc_card *card;
+   struct mmc_host *host;
+   struct mmc_queue_req *mq_rq;
+   struct mmc_blk_request *brq;
+   struct request *old_req;
+   bool req_pending = true;
+   int disable_multi = 0, retry = 0, type, retune_retry_done = 0;
 
/*
-* If the card was removed, just cancel everything and return.
+* An asynchronous request has been completed and we proceed
+* to handle the result of it.
 */
-   if (mmc_card_removed(mq->card)) {
-   req->rq_flags |= RQF_QUIET;
-   blk_end_request_all(req, BLK_STS_IOERR);
-   mq->qcnt--; /* FIXME: just set to 0? */
+   mq_rq = container_of(areq, struct mmc_queue_req, areq);
+   mq = mq_rq->mq;
+   md = mq->blkdata;
+   card = mq->card;
+   host = card->host;
+   brq = &mq_rq->brq;
+   old_req = mmc_queue_req_to_req(mq_rq);
+   type = rq_data_dir(old_req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+
+   switch (status) {
+   case MMC_BLK_SUCCESS:
+   case MMC_BLK_PARTIAL:
+   /*
+* A block was successfully transferred.
+*/
+   mmc_blk_reset_success(md, type);
+   req_pending = blk_end_request(old_req, BLK_STS_OK,
+ brq->data.bytes_xfered);
+   /*
+* If the blk_end_request function returns non-zero even
+* though all data has been transferred and no errors
+* were returned by the host controller, it's a bug.
+*/
+   if (status == MMC_BLK_SUCCESS && req_pending) {
+   pr_err("%s BUG rq_tot %d d_xfer %d\n",
+  __func__, blk_rq_bytes(old_req),
+  brq->data.bytes_xfered);
+   mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq);
+   return;
+   }
+   break;
+   case MMC_BLK_CMD_ERR:
+   req_pending = mmc_blk_rw_cmd_err(md, card, brq, old_r

[PATCH 11/12 v4] mmc: block: issue requests in massive parallel

2017-10-26 Thread Linus Walleij

This makes a crucial change to the issueing mechanism for the
MMC requests:

Before commit "mmc: core: move the asynchronous post-processing"
some parallelism on the read/write requests was achieved by
speculatively postprocessing a request and re-preprocess and
re-issue the request if something went wrong, which we discover
later when checking for an error.

This is kind of ugly. Instead we need a mechanism like here:

We issue requests, and when they come back from the hardware,
we know if they finished successfully or not. If the request
was successful, we complete the asynchronous request and let a
new request immediately start on the hardware. If, and only if,
it returned an error from the hardware we go down the error
path.

This is achieved by splitting the work path from the hardware
in two: a successful path ending up calling down to
mmc_blk_rw_done() and completing quickly, and an errorpath
calling down to mmc_blk_rw_done_error().

This has a profound effect: we reintroduce the parallelism on
the successful path as mmc_post_req() can now be called in
while the next request is in transit (just like prior to
commit "mmc: core: move the asynchronous post-processing")
and blk_end_request() is called while the next request is
already on the hardware.

The latter has the profound effect of issuing a new request
again so that we actually may have three requests
in transit at the same time: one on the hardware, one being
prepared (such as DMA flushing) and one being prepared for
issuing next by the block layer. This shows up when we
transit to multiqueue, where this can be exploited.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 79 +---
 drivers/mmc/core/core.c  | 38 +--
 2 files changed, 83 insertions(+), 34 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 184907f5fb97..f06f381146a5 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1824,7 +1824,8 @@ static void mmc_blk_rw_try_restart(struct mmc_queue_req 
*mq_rq)
mmc_restart_areq(mq->card->host, &mq_rq->areq);
 }
 
-static void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status 
status)
+static void mmc_blk_rw_done_error(struct mmc_async_req *areq,
+ enum mmc_blk_status status)
 {
struct mmc_queue *mq;
struct mmc_blk_data *md;
@@ -1832,7 +1833,7 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, 
enum mmc_blk_status stat
struct mmc_host *host;
struct mmc_queue_req *mq_rq;
struct mmc_blk_request *brq;
-   struct request *old_req;
+   struct request *req;
bool req_pending = true;
int disable_multi = 0, retry = 0, type, retune_retry_done = 0;
 
@@ -1846,33 +1847,18 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, 
enum mmc_blk_status stat
card = mq->card;
host = card->host;
brq = &mq_rq->brq;
-   old_req = mmc_queue_req_to_req(mq_rq);
-   type = rq_data_dir(old_req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+   req = mmc_queue_req_to_req(mq_rq);
+   type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
 
switch (status) {
-   case MMC_BLK_SUCCESS:
case MMC_BLK_PARTIAL:
-   /*
-* A block was successfully transferred.
-*/
+   /* This should trigger a retransmit */
mmc_blk_reset_success(md, type);
-   req_pending = blk_end_request(old_req, BLK_STS_OK,
+   req_pending = blk_end_request(req, BLK_STS_OK,
  brq->data.bytes_xfered);
-   /*
-* If the blk_end_request function returns non-zero even
-* though all data has been transferred and no errors
-* were returned by the host controller, it's a bug.
-*/
-   if (status == MMC_BLK_SUCCESS && req_pending) {
-   pr_err("%s BUG rq_tot %d d_xfer %d\n",
-  __func__, blk_rq_bytes(old_req),
-  brq->data.bytes_xfered);
-   mmc_blk_rw_cmd_abort(mq_rq);
-   return;
-   }
break;
case MMC_BLK_CMD_ERR:
-   req_pending = mmc_blk_rw_cmd_err(md, card, brq, old_req, 
req_pending);
+   req_pending = mmc_blk_rw_cmd_err(md, card, brq, req, 
req_pending);
if (mmc_blk_reset(md, card->host, type)) {
if (req_pending)
mmc_blk_rw_cmd_abort(mq_rq);
@@ -1911,7 +1897,7 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, 
enum mmc_blk_status stat
if (brq->data.blocks > 1) {
/* Redo read one sector at a time */
pr_warn("%s: retrying using single block read\n",
-

[PATCH 09/12 v4] mmc: queue: stop flushing the pipeline with NULL

2017-10-26 Thread Linus Walleij

Remove all the pipeline flush: i.e. repeatedly sending NULL
down to the core layer to flush out asynchronous requests,
and also sending NULL after "special" commands to achieve the
same flush.

Instead: let the "special" commands wait for any ongoing
asynchronous transfers using the completion, and apart from
that expect the core.c and block.c layers to deal with the
ongoing requests autonomously without any "push" from the
queue.

Add a function in the core to wait for an asynchronous request
to complete.

Update the tests to use the new function prototypes.

This kills off some FIXME's such as gettin rid of the mq->qcnt
queue depth variable that was introduced a while back.

It is a vital step toward multiqueue enablement that we stop
pulling NULL off the end of the request queue to flush the
asynchronous issueing mechanism.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c| 168 
 drivers/mmc/core/core.c |  50 +++--
 drivers/mmc/core/core.h |   6 +-
 drivers/mmc/core/mmc_test.c |  31 ++--
 drivers/mmc/core/queue.c|  11 ++-
 drivers/mmc/core/queue.h|   7 --
 6 files changed, 106 insertions(+), 167 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index c1178fa83f75..ab01cab4a026 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1805,7 +1805,6 @@ static void mmc_blk_rw_cmd_abort(struct mmc_queue *mq, 
struct mmc_card *card,
if (mmc_card_removed(card))
req->rq_flags |= RQF_QUIET;
while (blk_end_request(req, BLK_STS_IOERR, blk_rq_cur_bytes(req)));
-   mq->qcnt--;
 }
 
 /**
@@ -1873,13 +1872,10 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, 
enum mmc_blk_status stat
if (mmc_blk_reset(md, card->host, type)) {
if (req_pending)
mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq);
-   else
-   mq->qcnt--;
mmc_blk_rw_try_restart(mq, mq_rq);
return;
}
if (!req_pending) {
-   mq->qcnt--;
mmc_blk_rw_try_restart(mq, mq_rq);
return;
}
@@ -1923,7 +1919,6 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, 
enum mmc_blk_status stat
req_pending = blk_end_request(old_req, BLK_STS_IOERR,
  brq->data.blksz);
if (!req_pending) {
-   mq->qcnt--;
mmc_blk_rw_try_restart(mq, mq_rq);
return;
}
@@ -1947,26 +1942,16 @@ static void mmc_blk_rw_done(struct mmc_async_req *areq, 
enum mmc_blk_status stat
 */
mmc_blk_rw_rq_prep(mq_rq, card,
disable_multi, mq);
-   mmc_start_areq(card->host, areq, NULL);
+   mmc_start_areq(card->host, areq);
mq_rq->brq.retune_retry_done = retune_retry_done;
-   } else {
-   /* Else, this request is done */
-   mq->qcnt--;
}
 }
 
 static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 {
-   enum mmc_blk_status status;
-   struct mmc_async_req *new_areq;
-   struct mmc_async_req *old_areq;
struct mmc_card *card = mq->card;
-
-   if (new_req)
-   mq->qcnt++;
-
-   if (!mq->qcnt)
-   return;
+   struct mmc_queue_req *mqrq_cur = req_to_mmc_queue_req(new_req);
+   struct mmc_async_req *areq = &mqrq_cur->areq;
 
/*
 * If the card was removed, just cancel everything and return.
@@ -1974,44 +1959,25 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
if (mmc_card_removed(card)) {
new_req->rq_flags |= RQF_QUIET;
blk_end_request_all(new_req, BLK_STS_IOERR);
-   mq->qcnt--; /* FIXME: just set to 0? */
return;
}
 
-   if (new_req) {
-   struct mmc_queue_req *mqrq_cur = req_to_mmc_queue_req(new_req);
-   /*
-* When 4KB native sector is enabled, only 8 blocks
-* multiple read or write is allowed
-*/
-   if (mmc_large_sector(card) &&
-   !IS_ALIGNED(blk_rq_sectors(new_req), 8)) {
-   pr_err("%s: Transfer size is not 4KB sector size 
aligned\n",
-  new_req->rq_disk->disk_name);
-   mmc_blk_rw_cmd_abort(mq, card, new_req, mqrq_cur);
-   return;
-   }
-
-   mmc_blk_rw_rq_prep(mqrq_cur, card, 0, mq);
-   new_areq = &mqrq_cur->areq;
-   new_areq->report_done_status = mmc_blk_rw_done;
-   } else
-   new_areq = NULL

[PATCH 07/12 v4] mmc: queue: simplify queue logic

2017-10-26 Thread Linus Walleij

The if() statment checking if there is no current or previous
request is now just looking ahead at something that will be
concluded a few lines below. Simplify the logic by moving the
assignment of .asleep.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/queue.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 2c232ba4e594..023bbddc1a0b 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -53,14 +53,6 @@ static int mmc_queue_thread(void *d)
set_current_state(TASK_INTERRUPTIBLE);
req = blk_fetch_request(q);
mq->asleep = false;
-   if (!req) {
-   /*
-* Dispatch queue is empty so set flags for
-* mmc_request_fn() to wake us up.
-*/
-   if (!mq->qcnt)
-   mq->asleep = true;
-   }
spin_unlock_irq(q->queue_lock);
 
if (req || mq->qcnt) {
@@ -68,6 +60,7 @@ static int mmc_queue_thread(void *d)
mmc_blk_issue_rq(mq, req);
cond_resched();
} else {
+   mq->asleep = true;
if (kthread_should_stop()) {
set_current_state(TASK_RUNNING);
break;
-- 
2.13.6

[PATCH 10/12 v4] mmc: queue/block: pass around struct mmc_queue_req*s

2017-10-26 Thread Linus Walleij

Instead of passing two pointers around several pointers to
mmc_queue_req, request, mmc_queue, and reassigning to the left and
right, issue mmc_queue_req and dereference the queue and request
from the mmq_queue_req where needed.

The struct mmc_queue_req is the thing that has a lifecycle after
all: this is what we are keeping in our queue, and what the block
layer helps us manager. Augment a bunch of functions to take a
single argument so we can see the trees and not just a big
jungle of arguments.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 129 ---
 drivers/mmc/core/block.h |   5 +-
 drivers/mmc/core/queue.c |   2 +-
 3 files changed, 70 insertions(+), 66 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index ab01cab4a026..184907f5fb97 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1208,9 +1208,9 @@ static inline void mmc_blk_reset_success(struct 
mmc_blk_data *md, int type)
  * processed it with all other requests and then they get issued in this
  * function.
  */
-static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_drv_op(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_queue_req *mq_rq;
+   struct mmc_queue *mq = mq_rq->mq;
struct mmc_card *card = mq->card;
struct mmc_blk_data *md = mq->blkdata;
struct mmc_blk_ioc_data **idata;
@@ -1220,7 +1220,6 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, 
struct request *req)
int ret;
int i;
 
-   mq_rq = req_to_mmc_queue_req(req);
rpmb_ioctl = (mq_rq->drv_op == MMC_DRV_OP_IOCTL_RPMB);
 
switch (mq_rq->drv_op) {
@@ -1264,12 +1263,14 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, 
struct request *req)
break;
}
mq_rq->drv_op_result = ret;
-   blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+   blk_end_request_all(mmc_queue_req_to_req(mq_rq),
+   ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
-static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_discard_rq(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_blk_data *md = mq->blkdata;
+   struct request *req = mmc_queue_req_to_req(mq_rq);
+   struct mmc_blk_data *md = mq_rq->mq->blkdata;
struct mmc_card *card = md->queue.card;
unsigned int from, nr, arg;
int err = 0, type = MMC_BLK_DISCARD;
@@ -1310,10 +1311,10 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue 
*mq, struct request *req)
blk_end_request(req, status, blk_rq_bytes(req));
 }
 
-static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
-  struct request *req)
+static void mmc_blk_issue_secdiscard_rq(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_blk_data *md = mq->blkdata;
+   struct request *req = mmc_queue_req_to_req(mq_rq);
+   struct mmc_blk_data *md = mq_rq->mq->blkdata;
struct mmc_card *card = md->queue.card;
unsigned int from, nr, arg;
int err = 0, type = MMC_BLK_SECDISCARD;
@@ -1380,14 +1381,15 @@ static void mmc_blk_issue_secdiscard_rq(struct 
mmc_queue *mq,
blk_end_request(req, status, blk_rq_bytes(req));
 }
 
-static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_flush(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_blk_data *md = mq->blkdata;
+   struct mmc_blk_data *md = mq_rq->mq->blkdata;
struct mmc_card *card = md->queue.card;
int ret = 0;
 
ret = mmc_flush_cache(card);
-   blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+   blk_end_request_all(mmc_queue_req_to_req(mq_rq),
+   ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 /*
@@ -1698,18 +1700,18 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, 
struct mmc_queue_req *mqrq,
*do_data_tag_p = do_data_tag;
 }
 
-static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
-  struct mmc_card *card,
-  int disable_multi,
-  struct mmc_queue *mq)
+static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mq_rq,
+  int disable_multi)
 {
u32 readcmd, writecmd;
-   struct mmc_blk_request *brq = &mqrq->brq;
-   struct request *req = mmc_queue_req_to_req(mqrq);
+   struct mmc_queue *mq = mq_rq->mq;
+   struct mmc_card *card = mq->card;
+   struct mmc_blk_request *brq = &mq_rq->brq;
+   struct request *req = mmc_queue_req_to_req(mq_rq);
struct mmc_blk_data *md = mq->blkdata;
bool do_rel_wr, do_data_tag;
 
-   mmc_blk_data_prep(mq, mqrq, disable_multi, &do_rel_wr, &do_data_tag);
+   mmc_blk_data_prep(mq, mq_rq, disable_multi, &do_rel_wr, &do_data_tag);
 
brq->mrq.cmd = &brq->cmd;
brq->mrq.are

[PATCH 06/12 v4] mmc: core: kill off the context info

2017-10-26 Thread Linus Walleij

The last member of the context info: is_waiting_last_req is
just assigned values, never checked. Delete that and the whole
context info as a result.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c |  2 --
 drivers/mmc/core/bus.c   |  1 -
 drivers/mmc/core/core.c  | 13 -
 drivers/mmc/core/core.h  |  2 --
 drivers/mmc/core/queue.c |  9 +
 include/linux/mmc/host.h |  9 -
 6 files changed, 1 insertion(+), 35 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 5c84175e49be..86ec87c17e71 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2065,13 +2065,11 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct 
request *req)
default:
/* Normal request, just issue it */
mmc_blk_issue_rw_rq(mq, req);
-   card->host->context_info.is_waiting_last_req = false;
break;
}
} else {
/* No request, flushing the pipeline with NULL */
mmc_blk_issue_rw_rq(mq, NULL);
-   card->host->context_info.is_waiting_last_req = false;
}
 
 out:
diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
index a4b49e25fe96..45904a7e87be 100644
--- a/drivers/mmc/core/bus.c
+++ b/drivers/mmc/core/bus.c
@@ -348,7 +348,6 @@ int mmc_add_card(struct mmc_card *card)
 #ifdef CONFIG_DEBUG_FS
mmc_add_card_debugfs(card);
 #endif
-   mmc_init_context_info(card->host);
 
card->dev.of_node = mmc_of_find_child_device(card->host, 0);
 
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index ad832317f25b..865db736c717 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2987,19 +2987,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host)
 }
 #endif
 
-/**
- * mmc_init_context_info() - init synchronization context
- * @host: mmc host
- *
- * Init struct context_info needed to implement asynchronous
- * request mechanism, used by mmc core, host driver and mmc requests
- * supplier.
- */
-void mmc_init_context_info(struct mmc_host *host)
-{
-   host->context_info.is_waiting_last_req = false;
-}
-
 static int __init mmc_init(void)
 {
int ret;
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index e493d9d73fe2..88b852ac8f74 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -92,8 +92,6 @@ void mmc_remove_host_debugfs(struct mmc_host *host);
 void mmc_add_card_debugfs(struct mmc_card *card);
 void mmc_remove_card_debugfs(struct mmc_card *card);
 
-void mmc_init_context_info(struct mmc_host *host);
-
 int mmc_execute_tuning(struct mmc_card *card);
 int mmc_hs200_to_hs400(struct mmc_card *card);
 int mmc_hs400_to_hs200(struct mmc_card *card);
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 4a0752ef6154..2c232ba4e594 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -42,7 +42,6 @@ static int mmc_queue_thread(void *d)
 {
struct mmc_queue *mq = d;
struct request_queue *q = mq->queue;
-   struct mmc_context_info *cntx = &mq->card->host->context_info;
 
current->flags |= PF_MEMALLOC;
 
@@ -54,15 +53,12 @@ static int mmc_queue_thread(void *d)
set_current_state(TASK_INTERRUPTIBLE);
req = blk_fetch_request(q);
mq->asleep = false;
-   cntx->is_waiting_last_req = false;
if (!req) {
/*
 * Dispatch queue is empty so set flags for
 * mmc_request_fn() to wake us up.
 */
-   if (mq->qcnt)
-   cntx->is_waiting_last_req = true;
-   else
+   if (!mq->qcnt)
mq->asleep = true;
}
spin_unlock_irq(q->queue_lock);
@@ -96,7 +92,6 @@ static void mmc_request_fn(struct request_queue *q)
 {
struct mmc_queue *mq = q->queuedata;
struct request *req;
-   struct mmc_context_info *cntx;
 
if (!mq) {
while ((req = blk_fetch_request(q)) != NULL) {
@@ -106,8 +101,6 @@ static void mmc_request_fn(struct request_queue *q)
return;
}
 
-   cntx = &mq->card->host->context_info;
-
if (mq->asleep)
wake_up_process(mq->thread);
 }
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index ceb58b27f402..638f11d185bd 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -239,14 +239,6 @@ struct mmc_slot {
void *handler_priv;
 };
 
-/**
- * mmc_context_info - synchronization details for mmc context
- * @is_waiting_last_reqmmc context waiting for single running request
- */
-struct mmc_context_info {
-   boolis_waiting_last_req;
-};
-
 struct regulator;
 struct mmc_pwrseq;
 
@@ -421,7 +413

[PATCH 04/12 v4] mmc: core: do away with is_done_rcv

2017-10-26 Thread Linus Walleij

The "is_done_rcv" in the context info for the host is no longer
needed: it is clear from context (ha!) that as long as we are
waiting for the asynchronous request to come to completion,
we are not done receiving data, and when the finalization work
has run and completed the completion, we are indeed done.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c  | 40 
 include/linux/mmc/host.h |  2 --
 2 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index f6a51608ab0b..68125360a078 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -370,10 +370,8 @@ EXPORT_SYMBOL(mmc_start_request);
 static void mmc_wait_data_done(struct mmc_request *mrq)
 {
struct mmc_host *host = mrq->host;
-   struct mmc_context_info *context_info = &host->context_info;
struct mmc_async_req *areq = mrq->areq;
 
-   context_info->is_done_rcv = true;
/* Schedule a work to deal with finalizing this request */
if (!areq)
pr_err("areq of the data mmc_request was NULL!\n");
@@ -656,7 +654,7 @@ EXPORT_SYMBOL(mmc_cqe_recovery);
 bool mmc_is_req_done(struct mmc_host *host, struct mmc_request *mrq)
 {
if (host->areq)
-   return host->context_info.is_done_rcv;
+   return completion_done(&host->areq->complete);
else
return completion_done(&mrq->completion);
 }
@@ -705,29 +703,24 @@ void mmc_finalize_areq(struct work_struct *work)
struct mmc_async_req *areq =
container_of(work, struct mmc_async_req, finalization_work);
struct mmc_host *host = areq->host;
-   struct mmc_context_info *context_info = &host->context_info;
enum mmc_blk_status status = MMC_BLK_SUCCESS;
+   struct mmc_command *cmd;
 
-   if (context_info->is_done_rcv) {
-   struct mmc_command *cmd;
-
-   context_info->is_done_rcv = false;
-   cmd = areq->mrq->cmd;
+   cmd = areq->mrq->cmd;
 
-   if (!cmd->error || !cmd->retries ||
-   mmc_card_removed(host->card)) {
-   status = areq->err_check(host->card,
-areq);
-   } else {
-   mmc_retune_recheck(host);
-   pr_info("%s: req failed (CMD%u): %d, retrying...\n",
-   mmc_hostname(host),
-   cmd->opcode, cmd->error);
-   cmd->retries--;
-   cmd->error = 0;
-   __mmc_start_request(host, areq->mrq);
-   return; /* wait for done/new event again */
-   }
+   if (!cmd->error || !cmd->retries ||
+   mmc_card_removed(host->card)) {
+   status = areq->err_check(host->card,
+areq);
+   } else {
+   mmc_retune_recheck(host);
+   pr_info("%s: req failed (CMD%u): %d, retrying...\n",
+   mmc_hostname(host),
+   cmd->opcode, cmd->error);
+   cmd->retries--;
+   cmd->error = 0;
+   __mmc_start_request(host, areq->mrq);
+   return; /* wait for done/new event again */
}
 
mmc_retune_release(host);
@@ -3005,7 +2998,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host)
 void mmc_init_context_info(struct mmc_host *host)
 {
host->context_info.is_new_req = false;
-   host->context_info.is_done_rcv = false;
host->context_info.is_waiting_last_req = false;
 }
 
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 65f23a9ea724..d536325a9640 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -241,12 +241,10 @@ struct mmc_slot {
 
 /**
  * mmc_context_info - synchronization details for mmc context
- * @is_done_rcvwake up reason was done request
  * @is_new_req wake up reason was new request
  * @is_waiting_last_reqmmc context waiting for single running request
  */
 struct mmc_context_info {
-   boolis_done_rcv;
boolis_new_req;
boolis_waiting_last_req;
 };
-- 
2.13.6

[PATCH 03/12 v4] mmc: core: replace waitqueue with worker

2017-10-26 Thread Linus Walleij

The waitqueue in the host context is there to signal back from
mmc_request_done() through mmc_wait_data_done() that the hardware
is done with a command, and when the wait is over, the core
will typically submit the next asynchronous request that is pending
just waiting for the hardware to be available.

This is in the way for letting the mmc_request_done() trigger the
report up to the block layer that a block request is finished.

Re-jig this as a first step, remvoving the waitqueue and introducing
a work that will run after a completed asynchronous request,
finalizing that request, including retransmissions, and eventually
reporting back with a completion and a status code to the
asynchronous issue method.

This has the upside that we can remove the MMC_BLK_NEW_REQUEST
status code and the "new_request" state in the request queue
that is only there to make the state machine spin out
the first time we send a request.

Use the workqueue we introduced in the host for handling just
this, and then add a work and completion in the asynchronous
request to deal with this mechanism.

We introduce a pointer from mmc_request back to the asynchronous
request so these can be referenced from each other, and
augment mmc_wait_data_done() to use this pointer to get at the
areq and kick the worker since that function is only used by
asynchronous requests anyway.

This is a central change that let us do many other changes since
we have broken the submit and complete code paths in two, and we
can potentially remove the NULL flushing of the asynchronous
pipeline and report block requests as finished directly from
the worker.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c |  3 ++
 drivers/mmc/core/core.c  | 93 
 drivers/mmc/core/core.h  |  2 ++
 drivers/mmc/core/queue.c |  1 -
 include/linux/mmc/core.h |  3 +-
 include/linux/mmc/host.h |  7 ++--
 6 files changed, 59 insertions(+), 50 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index ea80ff4cd7f9..5c84175e49be 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1712,6 +1712,7 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
mmc_blk_data_prep(mq, mqrq, disable_multi, &do_rel_wr, &do_data_tag);
 
brq->mrq.cmd = &brq->cmd;
+   brq->mrq.areq = NULL;
 
brq->cmd.arg = blk_rq_pos(req);
if (!mmc_card_blockaddr(card))
@@ -1764,6 +1765,8 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
}
 
mqrq->areq.err_check = mmc_blk_err_check;
+   mqrq->areq.host = card->host;
+   INIT_WORK(&mqrq->areq.finalization_work, mmc_finalize_areq);
 }
 
 static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, struct mmc_card *card,
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 9c3baaddb1bd..f6a51608ab0b 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -369,10 +369,15 @@ EXPORT_SYMBOL(mmc_start_request);
  */
 static void mmc_wait_data_done(struct mmc_request *mrq)
 {
-   struct mmc_context_info *context_info = &mrq->host->context_info;
+   struct mmc_host *host = mrq->host;
+   struct mmc_context_info *context_info = &host->context_info;
+   struct mmc_async_req *areq = mrq->areq;
 
context_info->is_done_rcv = true;
-   wake_up_interruptible(&context_info->wait);
+   /* Schedule a work to deal with finalizing this request */
+   if (!areq)
+   pr_err("areq of the data mmc_request was NULL!\n");
+   queue_work(host->req_done_wq, &areq->finalization_work);
 }
 
 static void mmc_wait_done(struct mmc_request *mrq)
@@ -695,43 +700,34 @@ static void mmc_post_req(struct mmc_host *host, struct 
mmc_request *mrq,
  * Returns the status of the ongoing asynchronous request, but
  * MMC_BLK_SUCCESS if no request was going on.
  */
-static enum mmc_blk_status mmc_finalize_areq(struct mmc_host *host)
+void mmc_finalize_areq(struct work_struct *work)
 {
+   struct mmc_async_req *areq =
+   container_of(work, struct mmc_async_req, finalization_work);
+   struct mmc_host *host = areq->host;
struct mmc_context_info *context_info = &host->context_info;
-   enum mmc_blk_status status;
-
-   if (!host->areq)
-   return MMC_BLK_SUCCESS;
-
-   while (1) {
-   wait_event_interruptible(context_info->wait,
-   (context_info->is_done_rcv ||
-context_info->is_new_req));
+   enum mmc_blk_status status = MMC_BLK_SUCCESS;
 
-   if (context_info->is_done_rcv) {
-   struct mmc_command *cmd;
+   if (context_info->is_done_rcv) {
+   struct mmc_command *cmd;
 
-   context_info->is_done_rcv = false;
-   cmd = host->areq->mrq->cmd;
+   context_info->is_done_rcv = false;
+   cmd = areq->mrq->cmd;
 
-

[PATCH 02/12 v4] mmc: core: add a workqueue for completing requests

2017-10-26 Thread Linus Walleij

As we want to complete requests autonomously from feeding the
host with new requests, we create a workqueue to deal with
this specifically in response to the callback from a host driver.
This is necessary to exploit parallelism properly.

This patch just adds the workqueu, later patches will make use of
it.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c  | 9 +
 drivers/mmc/core/host.c  | 1 -
 include/linux/mmc/host.h | 4 
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 3d1270b9aec4..9c3baaddb1bd 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2828,6 +2828,14 @@ void mmc_start_host(struct mmc_host *host)
host->f_init = max(freqs[0], host->f_min);
host->rescan_disable = 0;
host->ios.power_mode = MMC_POWER_UNDEFINED;
+   /* Workqueue for completing requests */
+   host->req_done_wq = alloc_workqueue("mmc%d-reqdone",
+   WQ_FREEZABLE | WQ_HIGHPRI | WQ_MEM_RECLAIM,
+   0, host->index);
+   if (!host->req_done_wq) {
+   dev_err(mmc_dev(host), "could not allocate workqueue\n");
+   return;
+   }
 
if (!(host->caps2 & MMC_CAP2_NO_PRESCAN_POWERUP)) {
mmc_claim_host(host);
@@ -2849,6 +2857,7 @@ void mmc_stop_host(struct mmc_host *host)
 
host->rescan_disable = 1;
cancel_delayed_work_sync(&host->detect);
+   destroy_workqueue(host->req_done_wq);
 
/* clear pm flags now and let card drivers set them as needed */
host->pm_flags = 0;
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index e58be39b1568..8193363a5a46 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -381,7 +381,6 @@ struct mmc_host *mmc_alloc_host(int extra, struct device 
*dev)
INIT_DELAYED_WORK(&host->detect, mmc_rescan);
INIT_DELAYED_WORK(&host->sdio_irq_work, sdio_irq_work);
setup_timer(&host->retune_timer, mmc_retune_timer, (unsigned long)host);
-
/*
 * By default, hosts do not support SGIO or large requests.
 * They have to set these according to their abilities.
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index c296f4351c1d..94a646eebf05 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -423,6 +424,9 @@ struct mmc_host {
struct mmc_async_req*areq;  /* active async req */
struct mmc_context_info context_info;   /* async synchronization info */
 
+   /* finalization workqueue, handles finalizing requests */
+   struct workqueue_struct *req_done_wq;
+
/* Ongoing data transfer that allows commands during transfer */
struct mmc_request  *ongoing_mrq;
 
-- 
2.13.6

[PATCH 05/12 v4] mmc: core: do away with is_new_req

2017-10-26 Thread Linus Walleij

The host context member "is_new_req" is only assigned values,
never checked. Delete it.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c  | 1 -
 drivers/mmc/core/queue.c | 5 -
 include/linux/mmc/host.h | 2 --
 3 files changed, 8 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 68125360a078..ad832317f25b 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2997,7 +2997,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host)
  */
 void mmc_init_context_info(struct mmc_host *host)
 {
-   host->context_info.is_new_req = false;
host->context_info.is_waiting_last_req = false;
 }
 
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index c46be4402803..4a0752ef6154 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -55,7 +55,6 @@ static int mmc_queue_thread(void *d)
req = blk_fetch_request(q);
mq->asleep = false;
cntx->is_waiting_last_req = false;
-   cntx->is_new_req = false;
if (!req) {
/*
 * Dispatch queue is empty so set flags for
@@ -109,10 +108,6 @@ static void mmc_request_fn(struct request_queue *q)
 
cntx = &mq->card->host->context_info;
 
-   if (cntx->is_waiting_last_req) {
-   cntx->is_new_req = true;
-   }
-
if (mq->asleep)
wake_up_process(mq->thread);
 }
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index d536325a9640..ceb58b27f402 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -241,11 +241,9 @@ struct mmc_slot {
 
 /**
  * mmc_context_info - synchronization details for mmc context
- * @is_new_req wake up reason was new request
  * @is_waiting_last_reqmmc context waiting for single running request
  */
 struct mmc_context_info {
-   boolis_new_req;
boolis_waiting_last_req;
 };
 
-- 
2.13.6

[PATCH 01/12 v4] mmc: core: move the asynchronous post-processing

2017-10-26 Thread Linus Walleij

This moves the asynchronous post-processing of a request over
to the finalization function.

The patch has a slight semantic change:

Both places will be in the code path for if (host->areq) and
in the same sequence, but before this patch, the next request
was started before performing post-processing.

The effect is that whereas before, the post- and preprocessing
happened after starting the next request, now the preprocessing
will happen after the request is done and before the next has
started which would cut half of the pre/post optimizations out.

In the later patch named "mmc: core: replace waitqueue with
worker" we move the finalization to a worker started by
mmc_request_done() and in the patch named
"mmc: block: issue requests in massive parallel" we introduce
a forked success/failure path that can quickly complete
requests when they come back from the hardware.

These two later patches together restore the same optimization
but in a more elegant manner that avoids the need to flush the
two-stage pipleline with NULL, something we remove between these
two patches in the commit named
"mmc: queue: stop flushing the pipeline with NULL".

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 12b271c2a912..3d1270b9aec4 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -746,6 +746,9 @@ static enum mmc_blk_status mmc_finalize_areq(struct 
mmc_host *host)
mmc_start_bkops(host->card, true);
}
 
+   /* Successfully postprocess the old request at this point */
+   mmc_post_req(host, host->areq->mrq, 0);
+
return status;
 }
 
@@ -790,10 +793,6 @@ struct mmc_async_req *mmc_start_areq(struct mmc_host *host,
if (status == MMC_BLK_SUCCESS && areq)
start_err = __mmc_start_data_req(host, areq->mrq);
 
-   /* Postprocess the old request at this point */
-   if (host->areq)
-   mmc_post_req(host, host->areq->mrq, 0);
-
/* Cancel a prepared request if it was not started. */
if ((status != MMC_BLK_SUCCESS || start_err) && areq)
mmc_post_req(host, areq->mrq, -EINVAL);
-- 
2.13.6

[PATCH 00/12 v4] multiqueue for MMC/SD

2017-10-26 Thread Linus Walleij

This switches the MMC/SD stack over to unconditionally
using the multiqueue block interface for block access.
This modernizes the MMC/SD stack and makes it possible
to enable BFQ scheduling on these single-queue devices.

This is the v4 version of this v3 patch set from february:
https://marc.info/?l=linux-mmc&m=148665788227015&w=2

The patches are available in a git branch:
https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson.git/log/?h=mmc-mq-v4.14-rc4

You can pull it to a clean kernel tree like this:
git checkout -b mmc-test v4.14-rc4
git pull
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson.git
mmc-mq-v4.14-rc4

I have now worked on it for more than a year. I was side
tracked to clean up some code, move request allocation to
be handled by the block layer, delete bounce buffer handling
and refactoring the RPMB support. With the changes to request
allocation, the patch set is a better fit and has shrunk
from 16 to 12 patches as a result.

It is still quite invasive. Yet it is something I think would
be nice to merge for v4.16...

The rationale for this approach was Arnd's suggestion to try to
switch the MMC/SD stack around so as to complete requests as
quickly as possible when they return from the device driver
so that new requests can be issued. We are doing this now:
the polling loop that was pulling NULL out of the request
queue and driving the pipeline with a loop is gone with
the next-to last patch ("block: issue requests in massive
parallel"). This sets the stage for MQ to go in and hammer
requests on the asynchronous issuing layer.

We use the trick to set the queue depth to 2 to get two
parallel requests pushed down to the host. I tried to set this
to 4, the code survives it, the queue just have three items
waiting to be submitted all the time.

In my opinion this is also a better fit for command queueuing.
Handling command queueing needs to happen in the asynchronous
submission codepath, so instead of waiting on a pending
areq, we just stack up requests in the command queue.

It sounds simple but I bet this drives a truck through Adrians
patch series. Sorry. :(

We are not issueing new requests from interrupt context: I still
have to post a work on a workqueue for it. Since there is the
retune and background operations that need to be checked after
every command and yeah, it needs to happen in blocking context
as far as I know.

I might make a hack trying to strip out the retune (etc) and
instead run request until something fail and report requests
back to the block layer in interrupt context. It would be an
interesting experiment, but for later.

We have parallelism in pre/post hooks also with multiqueue.
All asynchronous optimization that was there for the old block
layer is now also there for multiqueue.

Last time I followed up with some open questions
https://marc.info/?l=linux-mmc&m=149075698610224&w=2
I think these are now resolved.

As a result, the last patch is no longer in RFC state. I
think this works. (Famous last words, OK there WILL be
regressions but hey, we need to do this.)
You can see there are three steps:

- I do some necessary refactoring and need to move postprocessing
to after the requests have been completed. This clearly, as you
can see, introduce a performance regression in the dd test with
the patch:
"mmc: core: move the asynchronous post-processing"
It seems the random seek with find isn't much affected.

- I continue the refactoring and get to the point of issueing
requests immediately after every successful transfer, and the
dd performance is restored with patch
"mmc: queue: issue requests in massive parallel"

- Then I add multiqueue on top of the cake. So before the change
we have the nice performance we want so we can study the effect
of just introducing multiqueueing in the last patch
"mmc: switch MMC/SD to use blk-mq multiqueueing v4"

PERFORMANCE BEFORE AND AFTER:

BEFORE this patch series, on Ulf's next branch ending with
commit cf653c788a29fa70e07b86492a7599c471c705de (mmc-next)
Merge: 4dda8e1f70f8 eb701ce16a45 ("Merge branch 'fixes' into next")

sync
echo 3 > /proc/sys/vm/drop_caches
sync
time dd if=/dev/mmcblk3 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 23.966583 seconds, 42.7MB/s
real0m 23.97s
user0m 0.01s
sys 0m 3.74s

mount /dev/mmcblk3p1 /mnt/
cd /mnt/
sync
echo 3 > /proc/sys/vm/drop_caches
sync
time find . > /dev/null
real0m 3.24s
user0m 0.22s
sys 0m 1.23s

sync
echo 3 > /proc/sys/vm/drop_caches
sync
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test

randomrandom
kB reclenwrite rewritereadrereadread write
20480 4 1598 1559 6782 6740 6751 536
20480 8 2134 2281114491144911407 1145
20480 16 3695 4171176761767717638 1234
20480 32 5751

[PATCH v4] virtio_blk: Fix an SG_IO regression

2017-10-26 Thread Bart Van Assche

Avoid that submitting an SG_IO ioctl triggers a kernel oops that
is preceded by:

usercopy: kernel memory overwrite attempt detected to (null) () (6 bytes)
kernel BUG at mm/usercopy.c:72!

Reported-by: Dann Frazier 
Fixes: commit ca18d6f769d2 ("block: Make most scsi_req_init() calls implicit")
Signed-off-by: Bart Van Assche 
Cc: Michael S. Tsirkin 
Cc: Dann Frazier 
Cc:  # v4.13
---
 drivers/block/virtio_blk.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 34e17ee799be..e477d4a5181e 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -593,10 +593,20 @@ static int virtblk_map_queues(struct blk_mq_tag_set *set)
return blk_mq_virtio_map_queues(set, vblk->vdev, 0);
 }
 
+static void virtblk_initialize_rq(struct request *req)
+{
+   struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
+
+#ifdef CONFIG_VIRTIO_BLK_SCSI
+   scsi_req_init(&vbr->sreq);
+#endif
+}
+
 static const struct blk_mq_ops virtio_mq_ops = {
.queue_rq   = virtio_queue_rq,
.complete   = virtblk_request_done,
.init_request   = virtblk_init_request,
+   .initialize_rq_fn = virtblk_initialize_rq,
.map_queues = virtblk_map_queues,
 };
 
-- 
2.14.2

Re: [PATCH v4] virtio_blk: Fix an SG_IO regression

[GIT PULL] one nvme fix for 4.14

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

[PATCH V2 0/2] block: remove unnecessary RESTART

[PATCH V2 1/2] blk-mq: don't handle TAG_SHARED in restart

[PATCH V2 2/2] blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE

[PATCH] block: avoid to fail elevator switch

[RFC PATCH] blk-throttle: add burst allowance

RE: [PATCH 00/12 v4] multiqueue for MMC/SD

[PATCH 4/4] block: add WARN_ON if bdi register fail

[PATCH 3/4] bdi: add error handle for bdi_debug_register

[PATCH 1/4] bdi: add check for bdi_debug_root

[PATCH 2/4] bdi: convert bdi_debug_register to int

[PATCH 0/4] add error handle for bdi debugfs register

[GIT PULL] Block fixes for 4.14-rc

Re: [PATCH V12 0/5] mmc: Add Command Queue support

Re: [PATCH 00/12 v4] multiqueue for MMC/SD

Re: [PATCH V12 0/5] mmc: Add Command Queue support

Re: [PATCH 00/12 v4] multiqueue for MMC/SD

Re: [PATCH V12 0/5] mmc: Add Command Queue support

[PATCH 12/12 v4] mmc: switch MMC/SD to use blk-mq multiqueueing

[PATCH 08/12 v4] mmc: block: shuffle retry and error handling

[PATCH 11/12 v4] mmc: block: issue requests in massive parallel

[PATCH 09/12 v4] mmc: queue: stop flushing the pipeline with NULL

[PATCH 07/12 v4] mmc: queue: simplify queue logic

[PATCH 10/12 v4] mmc: queue/block: pass around struct mmc_queue_req*s

[PATCH 06/12 v4] mmc: core: kill off the context info

[PATCH 04/12 v4] mmc: core: do away with is_done_rcv

[PATCH 03/12 v4] mmc: core: replace waitqueue with worker

[PATCH 02/12 v4] mmc: core: add a workqueue for completing requests

[PATCH 05/12 v4] mmc: core: do away with is_new_req

[PATCH 01/12 v4] mmc: core: move the asynchronous post-processing

[PATCH 00/12 v4] multiqueue for MMC/SD

[PATCH v4] virtio_blk: Fix an SG_IO regression

35 matches

Site Navigation

Mail list logo

Footer information