Re: [PATCH V2 00/22] Replace the CFQ I/O Scheduler with BFQ

2016-09-01 Thread Linus Walleij
On Thu, Sep 1, 2016 at 12:09 AM, Mark Brown  wrote:

>  - Do some benchmarks on the current status of the various branches on
>relevant hardware (including trying to convert some of these slower
>devices to blk-mq and seeing what happens).  Linus has been working
>on this already in the context of MMC.

I'm trying to do a patch switching MMC to use blk-mq, so I can
benchmark performance before/after this.

While we expect mq to perform worse on single-hardware-queue
devices like these, we don't know until we try, so I'm trying.

Yours,
Linus Walleij


Re: [PATCH V2 00/22] Replace the CFQ I/O Scheduler with BFQ

2016-09-08 Thread Linus Walleij
On Mon, Sep 5, 2016 at 5:56 PM, Bartlomiej Zolnierkiewicz
 wrote:

> I did this (switched MMC to blk-mq) some time ago.  Patches are
> extremely ugly and hacky (basically the whole MMC block layer
> glue code needs to be re-done) so I'm rather reluctant to
> share them yet (to be honest I would like to rewrite them
> completely before posting).

You're right, I can also see the quick and dirty replacement path,
but that is not an honest patch; we need to make a patch that takes
advantage of the new features of the MQ tag set.

There are some mechanisms in mq for handling parallel work
better, so e.g. the request stacking that calls out to
.pre_req() and .post_req() needs to be done
differently, and the sglist handling can be simplified AFAICT (still
reading up on it).
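
For context, roughly what those hooks buy us today - a simplified
sketch, not the exact in-tree signatures, with illustrative foo_*
names: the sglist gets DMA-mapped ahead of time and unmapped
afterwards, overlapping with the transfer in flight:

/* Simplified sketch, error handling omitted */
static void foo_pre_req(struct mmc_host *host, struct mmc_request *mrq)
{
        struct mmc_data *data = mrq->data;

        /* Map the sglist for DMA while another request is in transit */
        dma_map_sg(mmc_dev(host), data->sg, data->sg_len,
                   (data->flags & MMC_DATA_WRITE) ?
                   DMA_TO_DEVICE : DMA_FROM_DEVICE);
}

static void foo_post_req(struct mmc_host *host, struct mmc_request *mrq,
                         int err)
{
        struct mmc_data *data = mrq->data;

        /* Unmap after completion, again overlapping the next transfer */
        dma_unmap_sg(mmc_dev(host), data->sg, data->sg_len,
                     (data->flags & MMC_DATA_WRITE) ?
                     DMA_TO_DEVICE : DMA_FROM_DEVICE);
}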

> I only did linear read tests (using dd) so far and results that
> I got were mixed (BTW the hardware I'm doing this work on is
> Odroid-XU3).  Pure block performance under maximum CPU frequency
> was slightly worse (5-12%) but the CPU consumption was reduced so
> when CPU was scaled down manually (or ondemand CPUfreq governor
> was used) blk-mq mode results were better than vanilla ones (up
> to 10% when CPU was scaled down to minimum frequency and even
> up to 50% when using ondemand governor - this finding is very
> interesting and needs to be investigated further).

Hm right, it is important to keep in mind that we may be trading
performance for scalability here.

Naive storage development only cares about performance hitting
the media, and it may be a bit of a narrow use case to just get a
figure on paper. In reality the system load when doing this
matters too.

Yours,
Linus Walleij


[PATCH] block: remove remnant refs to hardsect

2016-09-14 Thread Linus Walleij
commit e1defc4ff0cf57aca6c5e3ff99fa503f5943c1f1
"block: Do away with the notion of hardsect_size"
removed the notion of "hardware sector size" from
the kernel in favor of logical block size, but
references remain in comments and documentation.

Update the remaining sites mentioning hardsect.

Signed-off-by: Linus Walleij 
---
 Documentation/block/biodoc.txt | 4 ++--
 block/bio.c                    | 2 +-
 fs/befs/linuxvfs.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index bcdb2b4c1f12..918e1e0d0e78 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -115,7 +115,7 @@ i. Per-queue limits/values exported to the generic layer by the driver
 
 Various parameters that the generic i/o scheduler logic uses are set at
 a per-queue level (e.g maximum request size, maximum number of segments in
-a scatter-gather list, hardsect size)
+a scatter-gather list, logical block size)
 
 Some parameters that were earlier available as global arrays indexed by
 major/minor are now directly associated with the queue. Some of these may
@@ -156,7 +156,7 @@ Some new queue property settings:
blk_queue_max_segment_size(q, max_seg_size)
Maximum size of a clustered segment, 64kB default.
 
-   blk_queue_hardsect_size(q, hardsect_size)
+   blk_queue_logical_block_size(q, logical_block_size)
Lowest possible sector size that the hardware can operate
on, 512 bytes default.
 
diff --git a/block/bio.c b/block/bio.c
index aa7354088008..a6d279e1ea9e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1274,7 +1274,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 
nr_pages += end - start;
/*
-* buffer must be aligned to at least hardsector size for now
+* buffer must be aligned to at least logical block size for now
 */
if (uaddr & queue_dma_alignment(q))
return ERR_PTR(-EINVAL);
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 7da05b159ade..bfe9f9994935 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -789,7 +789,7 @@ befs_fill_super(struct super_block *sb, void *data, int silent)
 * Will be set to real fs blocksize later.
 *
 * Linux 2.4.10 and later refuse to read blocks smaller than
-* the hardsect size for the device. But we also need to read at 
+* the logical block size for the device. But we also need to read at
 * least 1k to get the second 512 bytes of the volume.
 * -WD 10-26-01
 */ 
-- 
2.7.4



[PATCH] block: remove blk_mq_alloc_single_hw_queue() prototype

2016-09-14 Thread Linus Walleij
The blk_mq_alloc_single_hw_queue() is a prototype artifact that
should have been removed with
commit cdef54dd85ad66e77262ea57796a3e81683dd5d6
"blk-mq: remove alloc_hctx and free_hctx methods" where the last
users of it were deleted.

Fixes: cdef54dd85ad ("blk-mq: remove alloc_hctx and free_hctx methods")
Cc: Christoph Hellwig 
Signed-off-by: Linus Walleij 
---
 include/linux/blk-mq.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index e43bbffb5b7a..d3fb9c3c6969 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -221,7 +221,6 @@ static inline u16 blk_mq_unique_tag_to_tag(u32 unique_tag)
 }
 
 struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *, const int ctx_index);
-struct blk_mq_hw_ctx *blk_mq_alloc_single_hw_queue(struct blk_mq_tag_set *, unsigned int, int);
 
 int blk_mq_request_started(struct request *rq);
 void blk_mq_start_request(struct request *rq);
-- 
2.7.4



Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-06 Thread Linus Walleij
On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo  wrote:

> I get that bfq can be a good compromise on most desktop workloads and
> behave reasonably well for some server workloads with the slice
> expiration mechanism but it really isn't an IO resource partitioning
> mechanism.

Not just desktops, also Android phones.

So why not have BFQ as a separate scheduling policy upstream,
alongside CFQ, deadline and noop?

I understand the CPU scheduler people's position that they want
one scheduler for everyone's everyday loads (except RT and
SCHED_DEADLINE) and I guess that is the source of the Highlander
"there can be only one" argument, but note this:

kernel/Kconfig.preempt:

config PREEMPT_NONE
bool "No Forced Preemption (Server)"
config PREEMPT_VOLUNTARY
bool "Voluntary Kernel Preemption (Desktop)"
config PREEMPT
bool "Preemptible Kernel (Low-Latency Desktop)"

We're already doing the per-usecase Kconfig thing for preemption.
But maybe somebody already hates that and wants to get rid of it,
I don't know.

Yours,
Linus Walleij


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Linus Walleij
On Thu, Oct 27, 2016 at 11:08 PM, Jens Axboe  wrote:

> blk-mq has evolved to support a variety of devices, there's nothing
> special about mmc that can't work well within that framework.

There is. Read mmc_queue_thread() in drivers/mmc/card/queue.c

This repeatedly calls req = blk_fetch_request(q), starting one request
and then getting the next one off the queue, including reading
a few NULL requests off the end of the queue (to satisfy the
semantics of its state machine).

It then preprocesses each request by essentially calling .pre() and .post()
hooks all the way down to the driver, flushing its mapped
sglist from CPU to DMA device memory (not a problem on x86 and
other DMA-coherent archs, but a big win on the incoherent ones).

In the attempt that was posted recently this is achieved by lying,
saying the HW queue is two items deep, and eating requests
off that queue, calling pre/post on them.

But as there actually exist MMC cards with command queueing, this
would become hopeless to handle: the hw queue depth has to reflect
the real depth. What we need is for the block core to call pre/post
hooks on each request.

The "only" thing that doesn't work well after that is that CFQ is no
longer in action, which will have interesting effects on MMC throughput
in any fio-like stress test as it is mostly single-hw-queue.

Yours,
Linus Walleij


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Linus Walleij
On Fri, Oct 28, 2016 at 12:27 AM, Linus Walleij
 wrote:
> On Thu, Oct 27, 2016 at 11:08 PM, Jens Axboe  wrote:
>
>> blk-mq has evolved to support a variety of devices, there's nothing
>> special about mmc that can't work well within that framework.
>
> There is. Read mmc_queue_thread() in drivers/mmc/card/queue.c

So I'm not just complaining by the way, I'm trying to fix this. Also
Bartlomiej from Samsung has done some stabs at switching MMC/SD
to blk-mq. I just rebased my latest stab at a naïve switch to blk-mq
to v4.9-rc2 with these results.

The patch to enable MQ looks like this:
https://git.kernel.org/cgit/linux/kernel/git/linusw/linux-stericsson.git/commit/?h=mmc-mq&id=8f79b527e2e854071d8da019451da68d4753f71d

I run these tests directly after boot with cold caches. The results
are consistent: I ran the same commands 10 times in a row.


BEFORE switching to BLK-MQ (clean v4.9-rc2):

time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 47.781464 seconds, 21.4MB/s
real    0m 47.79s
user    0m 0.02s
sys     0m 9.35s

mount /dev/mmcblk0p1 /mnt/
cd /mnt/
time find . > /dev/null
real    0m 3.60s
user    0m 0.25s
sys     0m 1.58s

mount /dev/mmcblk0p1 /mnt/
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
(kBytes/second)
                                                     random    random
    kB  reclen    write  rewrite     read    reread     read     write
 20480       4     2112     2157     6052      6060     6025        40
 20480       8     4820     5074     9163      9121     9125        81
 20480      16     5755     5242    12317     12320    12280       165
 20480      32     6176     6261    14981     14987    14962       336
 20480      64     6547     5875    16826     16828    16810       692
 20480     128     6762     6828    17899     17896    17896      1408
 20480     256     6802     6871    16960     17513    18373      3048
 20480     512     7220     7252    18675     18746    18741      7228
 20480    1024     7222     7304    18436     17858    18246      7322
 20480    2048     7316     7398    18744     18751    18526      7419
 20480    4096     7520     7636    20774     20995    20703      7609
 20480    8192     7519     7704    21850     21489    21467      7663
 20480   16384     7395     7782    22399     22210    22215      7781


AFTER switching to BLK-MQ:

time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 60.551117 seconds, 16.9MB/s
real    1m 0.56s
user    0m 0.02s
sys     0m 9.81s

mount /dev/mmcblk0p1 /mnt/
cd /mnt/
time find . > /dev/null
real    0m 4.42s
user    0m 0.24s
sys     0m 1.81s

mount /dev/mmcblk0p1 /mnt/
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
(kBytes/second)
                                                     random    random
    kB  reclen    write  rewrite     read    reread     read     write
 20480       4     2086     2201     6024      6036     6006        40
 20480       8     4812     5036     8014      9121     9090        82
 20480      16     5432     5633    12267      9776    12212       168
 20480      32     6180     6233    14870     14891    14852       340
 20480      64     6382     5454    16744     16771    16746       702
 20480     128     6761     6776    17816     17846    17836      1394
 20480     256     6828     6842    17789     17895    17094      3084
 20480     512     7158     7222    17957     17681    17698      7232
 20480    1024     7215     7274    18642     17679    18031      7300
 20480    2048     7229     7269    17943     18642    17732      7358
 20480    4096     7212     7360    18272     18157    18889      7371
 20480    8192     7008     7271    18632     18707    18225      7282
 20480   16384     6889     7211    18243     18429    18018      7246


A simple dd read test of 1 GB is always consistently 10+
seconds slower with MQ. A find in the rootfs is a second slower.
iozone results are consistently lower throughput or the same.

This is without using Bartlomiej's clever hack to pretend we have
2 elements in the HW queue though. His early tests indicate that
it doesn't help much: the performance regression we see is due to
lack of block scheduling.

I try to find a way forward with this, and also massage the MMC/SD
code to be more MQ friendly to begin with (like only pick requests
when we get a request notification and stop pulling NULL requests
off the queue) but it's really a messy piece of code.

Yours,
Linus Walleij


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Linus Walleij
On Fri, Oct 28, 2016 at 4:22 PM, Jens Axboe  wrote:
> On 10/28/2016 03:32 AM, Linus Walleij wrote:
>>
>> This is without using Bartlomiej's clever hack to pretend we have
>> 2 elements in the HW queue though. His early tests indicate that
>> it doesn't help much: the performance regression we see is due to
>> lack of block scheduling.
>
> A simple dd test, I don't see how that can be slower due to lack of
> scheduling. There's nothing to schedule there, just issue them in order?

Yeah I guess you're right; it could be due in part to not having
activated front- and back-end merges properly as Christoph pointed
out. I'll look closer at this.

> So that would probably be where I would start looking. A blktrace of the
> in-kernel code and the blk-mq enabled code would perhaps be
> enlightening. I don't think it's worth looking at the more complex test
> cases until the dd test case is at least as fast as the non-mq version.

Yeah.

> Was that with CFQ, btw, or what scheduler did it run?

CFQ, just plain defconfig.

> It'd be nice to NOT have to rely on that fake QD=2 setup, since it will
> mess with the IO scheduling as well.

I agree.

>> I try to find a way forward with this, and also massage the MMC/SD
>> code to be more MQ friendly to begin with (like only pick requests
>> when we get a request notification and stop pulling NULL requests
>> off the queue) but it's really a messy piece of code.
>
> Yeah, it does look pretty messy... I'd be happy to help out with that,
> and particularly in figuring out why the direct conversion is slower for
> a basic 'dd' test case.

I'm looking into it.

Yours,
Linus Walleij


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Linus Walleij
On Fri, Oct 28, 2016 at 5:29 PM, Christoph Hellwig  wrote:
> On Fri, Oct 28, 2016 at 11:32:21AM +0200, Linus Walleij wrote:
>> So I'm not just complaining by the way, I'm trying to fix this. Also
>> Bartlomiej from Samsung has done some stabs at switching MMC/SD
>> to blk-mq. I just rebased my latest stab at a naïve switch to blk-mq
>> to v4.9-rc2 with these results.
>>
>> The patch to enable MQ looks like this:
>> https://git.kernel.org/cgit/linux/kernel/git/linusw/linux-stericsson.git/commit/?h=mmc-mq&id=8f79b527e2e854071d8da019451da68d4753f71d
>>
>> I run these tests directly after boot with cold caches. The results
>> are consistent: I ran the same commands 10 times in a row.
>
> A couple comments from a quick look over the patch:
>
> In the changelog you complain:
>
> ". Lack of front- and back-end merging in the MQ block layer creating
> several small requests instead of a few large ones."
>
> In blk-mq merging is controlled by the BLK_MQ_F_SHOULD_MERGE and
> BLK_MQ_F_SG_MERGE flags.  You set the former, but not the latter.
> BLK_MQ_F_SG_MERGE controls whether multiple physical contiguous pages get
> merged into a single segment.  For a dd after a fresh boot that is
> probably very common.  Except for the polarity of the merge flags the
> basic merge functionality between the legacy and blk-mq path should be
> the same, and if they aren't you've found a bug we need to address.

Aha OK I will make sure to set both flags next time. (I will also stop
guessing about that as a cause since that part probably works.)

> You also say that you disable the pipelining.  How much of a performance
> gain did this feature give when added? How much does just removing that
> on its own cost you?

Interestingly, the original commit doesn't say.
http://marc.info/?l=linaro-dev&m=137645684811479&w=2

How much is won, however, depends on the cache architecture of
the machine. The heavier the cache flushes, the more it gains.

I guess I need to make a patch removing that mechanism to bench
it. It's pretty hard to get rid of because it goes really deep into the
MMC subsystem. It's massaged in like a shampoo.

> While I think that feature is rather messy and
> should be avoided if possible I don't see how it's impossible to
> implement in blk-mq.

It's probably possible. What I discussed with Arnd was to let
the blk-mq core call out to these pre-request and post-request
hooks on new requests in parallel with processing a request or
a queue of requests. I.e. add .prep_request() and .unprep_request()
callbacks to struct blk_mq_ops.
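
Purely hypothetical sketch - these callbacks do not exist in
struct blk_mq_ops today, the names are just what we discussed:

/* Hypothetical additions, not in mainline */
struct blk_mq_extra_ops {
        /*
         * Called once the contents of a request are settled, right
         * before it is queued: map sglists etc in parallel with
         * requests already being processed.
         */
        void (*prep_request)(struct blk_mq_hw_ctx *hctx, struct request *rq);
        /* Called right after the request has been served: unmap sglists etc */
        void (*unprep_request)(struct blk_mq_hw_ctx *hctx, struct request *rq);
};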

I tried to understand if the existing .init_request and .exit_request
callbacks could be used. But as I understand it they are only used
to allocate and prepare the extra per-request-associated memory
and state, and do not have access to the request per se,
so they don't know anything about the actual request when
.init_request() is called.

So we're looking for something called whenever the contents of
a request are done, right before queueing it, and right after
dequeueing it after being served.

>  If you just increase your queue depth and use
> the old scheme you should get it - if you currently can't handle the
> second command for some reason (i.e. the special request magic) you
> can just return BLK_MQ_RQ_QUEUE_BUSY from the queue_rq function.

Bartlomiej's patch set did that, but I haven't been able to reproduce it.

I will try to make a clean patch in the spirit of his.

Yours,
Linus Walleij


Re: [PATCH] mmc: block: delete packed command support

2016-11-21 Thread Linus Walleij
[CC the block layer maintainers so they can smack my fingers
if I misunderstood any of how the block layer semantics work...]

On Mon, Nov 21, 2016 at 3:17 PM, Adrian Hunter  wrote:
> On 21/11/16 16:02, Ulf Hansson wrote:

>> I also believe, the implementation would become really complex, as you
>> would need to "hold" the first write request, while waiting for a
>> second to arrive. Then for how long shall you hold it? And then what
>> if you are unlucky so the next is read request, thus you can't pack
>> them. The solution will be suboptimal, won't it?
>
> It doesn't hold and wait now.  So why would it in the blk-mq case?

The current kthread in drivers/mmc/card/queue.c looks like this
in essence:

struct request *r;

while (1) {
        r = blk_fetch_request(q);
        issue(r);
}

It is pulling out as much as it can to asynchronously issue
two requests in parallel, and the packed command (as
mentioned in the commitlog to $SUBJECT) pulls even more
stuff off the queue to speculatively issue things in a packed
command.

The block layer isn't supposed to be used like this. It is
supposed to be used like so:

1. You get notified by the request_fn that is passed with
   blk_init_queue()
2. The request function fires a work.
3. The work picks ONE request with blk_fetch_request()
   and handles it.
4. Repeat from (1)
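
In code the intended pattern looks roughly like this - a minimal
sketch assuming a driver-private struct foo_dev that holds the
queue and a work, not actual MMC code:

static void foo_work(struct work_struct *work)
{
        struct foo_dev *dev = container_of(work, struct foo_dev, work);
        struct request *req;

        spin_lock_irq(dev->queue->queue_lock);
        req = blk_fetch_request(dev->queue);    /* 3. pick ONE request */
        spin_unlock_irq(dev->queue->queue_lock);
        if (req)
                foo_handle_request(dev, req);   /* ... and handle it */
}

/* 1. The block layer notifies us; called with the queue lock held */
static void foo_request_fn(struct request_queue *q)
{
        struct foo_dev *dev = q->queuedata;

        schedule_work(&dev->work);              /* 2. fire a work */
}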

Instead of doing this the MMC layer kthread is speculatively
pulling stuff off the queue whenever it can, including
pulling out a few NULLs at the end before it stops. The
mechanism is similar to a person running along a queue and
picking a segment of passengers into a bus to send off,
batching them. Which is clever, but the block layer is not
supposed to be used like that. It just happens to work.

In blk-mq this speculative fetching is no longer possible.
Instead you register a notification function in the struct blk_mq_ops
vtable: the .queue_rq() callback will be called by the block
layer core whenever there is a request available on "our"
queue. It further provides a .init_request() callback that is
called ahead of time to allocate a per-request context, such as
the current struct mmc_queue_req - just not quite, because
the current mmc_queue_req is not mapped 1-to-1 onto a
request from the block layer because of packed command;
but it is after this patch, hehe ;)

Any speculative batching needs to happen *after* this, i.e.
the MMC layer would have to report a certain larger queue
depth (if you set it to 1 you only ever get one request at a time
and have to finish it before you get a new one), group the
requests itself with packed command or command queueing,
then signal them back as they are confirmed completed by
the device, or, if they cannot be grouped, handle as far as you
can and put the remaining requests back on the queue
(creating a "bubble" in the pipeline).

Relying on iterating and inspecting the block layer queue is
*not* possible with blk-mq; sure, blk_fetch_request() is still
there, but if you call it on an mq-registered queue, it will
crash the kernel. (At least it did for me.) Clearly it is not
intended to be used with MQ: none of the MQ-converted
subsystems use this. (drivers/mtd/ubi/block.c is a good
simple example)
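
The notification style looks roughly like this - a sketch loosely
modeled on the ubi example, with illustrative foo_* names and the
return codes of the current kernel:

static int foo_queue_rq(struct blk_mq_hw_ctx *hctx,
                        const struct blk_mq_queue_data *bd)
{
        struct foo_req_ctx *ctx = blk_mq_rq_to_pdu(bd->rq);

        blk_mq_start_request(bd->rq);
        queue_work(foo_wq, &ctx->work);         /* serve this ONE request */
        return BLK_MQ_RQ_QUEUE_OK;
}

static struct blk_mq_ops foo_mq_ops = {
        .queue_rq       = foo_queue_rq,
        /* .init_request/.exit_request allocate per-request context */
};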

I liken these mechanisms to a pipeline:

- The two-levels deep speculation buffer in struct mmc_queue
  field .mqrq[2] is a "software pipeline" in the MMC layer (so we
  can prepare and handle requests in parallel)

- The packed command and command queue is a
  hardware-supported pipeline on the device side.

Both try to overcome the hardware limitations of the MMC/SD
logical interface. This batching-style pipelining isn't really
solving the problem the way real multiqueue hardware does,
so it is a poor man's patchwork to the problem.

In either case, as Ulf notes, you need to get a few requests
off the queue and group them using packed command or
command queueing if possible, but that grouping needs to
happen in the MMC/SD layer, after picking the requests from
the queue. I think it is OK to do so and just put any requests
you cannot pack into the pipeline back on the queue. But I
am not sure (still learning).

Yours,
Linus Walleij


[PATCH v2] RFD: switch MMC/SD to use blk-mq multiqueueing

2016-12-20 Thread Linus Walleij
HACK ALERT: DO NOT MERGE THIS! IT IS A FYI PATCH FOR DISCUSSION
ONLY.

This hack switches the MMC/SD subsystem from using the legacy blk
layer to using blk-mq. It does this by registering one single
hardware queue, since MMC/SD has only one command pipe. I kill
off the worker thread altogether and let the MQ core logic fire
sleepable requests directly into the MMC core.

We emulate the 2 elements deep pipeline by specifying queue depth
2, which is an elaborate lie that makes the block layer issue
another request while a previous request is in transit. It's not
neat but it works.

As the pipeline needs to be flushed by pushing in a NULL request
after the last block layer request, I added a delayed work with a
timeout of zero. This will fire as soon as the block layer stops
pushing in requests: as long as there are new requests the MQ
block layer will just repeatedly cancel this pipeline flush work
and push new requests into the pipeline, but once the requests
stop coming the NULL request will be flushed into the pipeline.
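
In outline the hack looks like this - a sketch where the field and
function names are simplified/illustrative, see the actual patch
below:

static void mmc_mq_flush_work(struct work_struct *work)
{
        struct mmc_queue *mq =
                container_of(work, struct mmc_queue, flush_work.work);

        /* The NULL request terminates the two-deep pipeline */
        mmc_blk_issue_rq(mq, NULL);
}

static int mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
                           const struct blk_mq_queue_data *bd)
{
        struct mmc_queue *mq = hctx->queue->queuedata;

        /* Requests keep coming: keep cancelling the pipeline flush */
        cancel_delayed_work(&mq->flush_work);
        mmc_blk_issue_rq(mq, bd->rq);
        /* Zero timeout: fires as soon as the block layer goes quiet */
        schedule_delayed_work(&mq->flush_work, 0);
        return BLK_MQ_RQ_QUEUE_OK;
}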

It's not pretty but it works... Look at the following performance
statistics:

BEFORE this patch:

time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 45.145874 seconds, 22.7MB/s
real    0m 45.15s
user    0m 0.02s
sys     0m 7.51s

mount /dev/mmcblk0p1 /mnt/
cd /mnt/
time find . > /dev/null
real    0m 3.70s
user    0m 0.29s
sys     0m 1.63s

AFTER this patch:

time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 45.285431 seconds, 22.6MB/s
real    0m 45.29s
user    0m 0.02s
sys     0m 6.58s

mount /dev/mmcblk0p1 /mnt/
cd /mnt/
time find . > /dev/null
real    0m 4.37s
user    0m 0.27s
sys     0m 1.65s

The results are consistent.

As you can see, for a straight dd-like task, we get more or less the
same nice parallelism as for the old framework. I have confirmed
through debugprints that indeed this is because the two-stage pipeline
is full at all times.

However, for spurious reads in the find command, we already see a big
performance regression.

This is because there are many small operations requiring a flush of
the pipeline. These flushes used to happen immediately with the old
block layer interface code, which would pull a few NULL requests off
the queue and feed them into the pipeline right after the last
request, but they now happen only once the delayed work executes in
this new framework. The delayed work is never quick enough to
terminate all these small operations, even if we schedule it
immediately after the last request.

AFAICT the only way forward to provide proper performance with MQ
for MMC/SD is to get the requests to complete out-of-sync, i.e. when
the driver calls back to MMC/SD core to notify that a request is
complete, it should not notify any main thread with a completion
as is done right now, but instead directly call blk_end_request_all()
and only schedule some extra communication with the card if necessary,
for example to handle an error condition.
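
Sketched with hypothetical helper names, the idea is for the host
driver's done callback to complete straight back into the block
layer:

static void foo_request_done(struct mmc_host *host, struct mmc_request *mrq)
{
        struct request *req = foo_mrq_to_request(mrq);  /* hypothetical */
        int err = mrq->cmd->error;

        /* No completion, no thread wakeup: end the request right here */
        blk_end_request_all(req, err);

        /* Only talk to the card again if something went wrong */
        if (err)
                schedule_work(&host->foo_error_work);   /* hypothetical */
}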

This rework needs a bigger rewrite so we can get rid of the paradigm
of the block layer "driving" the requests through the pipeline.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c |  43 +++
 drivers/mmc/core/queue.c | 308 ++-
 drivers/mmc/core/queue.h |  13 +-
 3 files changed, 203 insertions(+), 161 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index bab3f07b1117..308ab7838f0d 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -90,7 +91,6 @@ static DEFINE_SPINLOCK(mmc_blk_lock);
  * There is one mmc_blk_data per slot.
  */
 struct mmc_blk_data {
-   spinlock_t  lock;
struct device   *parent;
struct gendisk  *disk;
struct mmc_queue queue;
@@ -1181,7 +1181,8 @@ static int mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
goto retry;
if (!err)
mmc_blk_reset_success(md, type);
-   blk_end_request(req, err, blk_rq_bytes(req));
+
+   blk_mq_complete_request(req, err);
 
return err ? 0 : 1;
 }
@@ -1248,7 +1249,7 @@ static int mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
if (!err)
mmc_blk_reset_success(md, type);
 out:
-   blk_end_request(req, err, blk_rq_bytes(req));
+   blk_mq_complete_request(req, err);
 
return err ? 0 : 1;
 }
@@ -1263,7 +1264,8 @@ static int mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
if (ret)
ret = -EIO;
 
-   blk_end_request_all(req, ret);
+   /* FIXME: was using blk_end_request_all() to flush */
+   blk_mq_complete_request(req, ret);
 
return ret ? 0 : 1;
 }
@@ -1585,10 +1587,12 @@ static int mmc_blk_cmd_err(struct mmc_blk_

Re: [PATCH v2] RFD: switch MMC/SD to use blk-mq multiqueueing

2016-12-27 Thread Linus Walleij
On Wed, Dec 21, 2016 at 6:22 PM, Ritesh Harjani  wrote:

> I may have some silly queries here. Please bear with my little understanding
> on blk-mq.

It's OK we need to build consensus.

> On 12/20/2016 7:31 PM, Linus Walleij wrote:

>> This hack switches the MMC/SD subsystem from using the legacy blk
>> layer to using blk-mq. It does this by registering one single
>> hardware queue, since MMC/SD has only one command pipe. I kill
>
> Could you please confirm this - would even the HW/SW CMDQ in emmc use
> only 1 hardware queue with (say ~31) as the queue depth of that HW queue? Is
> this understanding correct?

Yes as far as I can tell.

But you may have to tell me, because I'm not an expert in CMDQ.

Multiple queues are for when you can issue different requests truly in
parallel, without taking any previous or later request into account. CMDQ on
MMC seems to require rollback etc if any of the issued requests after
a certain request fail, and then it is essentially one queue, like a pipeline,
and if one request fails all requests after that request need to be backed
out, correct?

> Or will it be possible to have more than 1 HW Queue with lesser queue depth
> per HW queue?

Depends on the above.

Each queue must have its own error handling, and work isolated from
the other queues to be considered a real hardware queue.

If the requests have dependencies, like referring to each other, or
as pointed out, needing to be cancelled if there is an error on a totally
different request, it is just a deep pipeline, single hardware queue.

> I understand that the block drivers are moving to blk-mq framework.
> But keeping that reason apart, do we also anticipate any theoretical
> performance gains in moving mmc driver to blk-mq framework  - for both in
> case of legacy emmc, and SW/HW CMDQ in emmc ? And by how much?

On the contrary we expect a performance regression, as mq has no
scheduling. MQ is created for the use case where you have multiple
hardware queues and they are so hungry for work that you have a problem
feeding them all. Needless to say, on eMMC/SD we don't have that problem
right now, at least.

> It would be even better to know if adding of scheduler to blk-mq will make
> any difference in perf gains or not in this case?

The tentative plan as I see it is to shunt in BFQ as the default scheduler
for MQ in the single-hw-queue case, with the old block layer schedulers
getting deprecated in the process. But this is really up to the block layer
developers.

> Do we any rough estimate or study on that?
> This is only out of curiosity and for information purpose.

No, it is a venture into the unknown to go where no man has gone before.

I just have a good feeling about this and confidence that it will work out.

So I am doing RFD patches like this one to see if I'm right.

Yours,
Linus Walleij


Re: [PATCH v2] RFD: switch MMC/SD to use blk-mq multiqueueing

2016-12-28 Thread Linus Walleij
On Wed, Dec 28, 2016 at 9:55 AM, Christoph Hellwig  wrote:
> On Tue, Dec 27, 2016 at 01:21:28PM +0100, Linus Walleij wrote:

>> On the contrary we expect a performance regression, as mq has no
>> scheduling. MQ is created for the use case where you have multiple
>> hardware queues and they are so hungry for work that you have a problem
>> feeding them all. Needless to say, on eMMC/SD we don't have that problem
>> right now, at least.
>
> That's not entirely correct.  blk-mq is designed to replace the legacy
> request code eventually.  The focus is on not wasting CPU cycles, and
> to support multiple queues (but not require them).

OK! Performance is paramount, so this indeed confirms that we need
to re-engineer the MMC/SD stack to not rely on this kthread to "drive"
transactions; instead we need to complete them quickly from the driver
callbacks and let MQ drive.

A problem here is that issuing the requests happens in blocking context
while completion is in IRQ context (for most drivers), so we need to
look into this.

>  Sequential workloads
> should always be as fast as the legacy path and use less CPU cycles,

That seems more or less confirmed by my dd-test in the commit
message. sys time is really small with the simple time+dd tests.

> for random workloads we might have to wait for I/O scheduler support,
> which is under way now:
>
> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-sched

Awesome.

> All that assumes a properly converted driver, which as seen by your
> experiments isn't easy for MMC as it's a very convoluted beast thanks
> to the hardware interface, which isn't up to the standards we expect from
> block storage protocols.

I think we can hash it out, we just need to rewrite the MMC/SD
core request handling a bit.

Yours,
Linus Walleij


Re: [PATCH v2] RFD: switch MMC/SD to use blk-mq multiqueueing

2017-01-03 Thread Linus Walleij
On Mon, Jan 2, 2017 at 10:40 AM, Arnd Bergmann  wrote:

> b) without MMC CMDQ support:
>   - report queue depth of '2'
>   - first request gets handled as above
>   - if one request is pending, prepare the second request and
> add a pointer to the mmc host structure (not that different
> from what we do today)
>   - when the host driver completes a request, have it immediately
> issue the next one from the interrupt handler. In case we need
> to sleep here, use a threaded IRQ, or a workqueue. This should
> avoid the need for the NULL requests

This part we can do already today with the old block layer and I think
we (heh, I guess me) should do that as the first step.

After this migrating to blk-mq becomes much easier.

Yours,
Linus Walleij


Re: [PATCH v2] RFD: switch MMC/SD to use blk-mq multiqueueing

2017-01-03 Thread Linus Walleij
On Mon, Jan 2, 2017 at 12:55 PM, Bart Van Assche
 wrote:
> On Thu, 2016-12-29 at 00:59 +0100, Linus Walleij wrote:
>> A problem here is that issuing the requests happens in blocking
>> context while completion is in IRQ context (for most drivers), so we need to
>> look into this.
>
> Hello Linus,
>
> Although I'm not sure whether I understood you correctly: are you familiar
> with the request queue flag BLK_MQ_F_BLOCKING, a flag that was introduced
> for the nbd driver?

BLK_MQ_F_BLOCKING is what the patch set is currently using...

The problem I have is that requests need to be *issued* in blocking
context and *completed* in fastpath/IRQ context.
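
For reference, this is roughly how the flag is wired up in the
patch set (a sketch; depth and cmd_size per the RFD patch, the ops
name is illustrative):

static struct blk_mq_tag_set foo_tag_set = {
        .ops            = &mmc_mq_ops,
        .nr_hw_queues   = 1,    /* MMC/SD has a single command pipe */
        .queue_depth    = 2,    /* emulate the two-deep pipeline */
        .cmd_size       = sizeof(struct mmc_queue_req),
        .numa_node      = NUMA_NO_NODE,
        /* .queue_rq() may sleep, but completion still arrives in IRQ */
        .flags          = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING,
};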

Yours,
Linus Walleij


Re: [PATCH PoCv2 0/2] mmc: add blk-mq support

2017-01-04 Thread Linus Walleij
On Tue, Jan 3, 2017 at 6:30 PM, Bartlomiej Zolnierkiewicz
 wrote:

Interesting patch!

> The differences between these patches and Linus' RFD patch:
> - request completion is handled from the IRQ handler
>   (or tasklet context as happens in dw_mmc host driver)

I think we need to make a patch like this separately against the
current (old blk only) code, to stop the kthread from forcing
in NULL requests to clean the pipeline.

We should also avoid sending any completion at
all: if we know we only get one request at a time, as your
patch does, there is no reason to wait for a completion; we
will always be complete when a new request arrives.

Same with two requests in parallel, we can trust the block
layer to not send more than 2 (the queue depth) so there is
no reason to wait for a completion.

I'm trying to work in this direction at least, your patch is great
inspiration!

Yours,
Linus Walleij


Re: Outstanding MQ questions from MMC

2017-04-15 Thread Linus Walleij
On Fri, Apr 14, 2017 at 8:41 PM, Avri Altman  wrote:
> [Me]
>> 2. Turn RPMB and other ioctl() MMC operations into mmc_queue_req
>>things and funnel them into the block scheduler
>>using REQ_OP_DRV_IN/OUT requests.
>>
>
> Accessing the RPMB is done via a strange protocol, in which each access is 
> comprised of several requests.
> For example, writing to the RPMB will require sending 5 different requests:
> 2 requests to read the write counter, and then 3 more requests for the write 
> operation itself.
>
> Once the sequence has started, it should not get interfered by other 
> requests, or the operation will fail.

So I guess currently something takes a host lock and then performs the
5 requests.

Thus we need to send a single custom request containing a list of 5
things to do, and return after that.

Or do you mean that we return to userspace in between these different
requests and the sequencing is done in userspace?

I hope not because that sounds fragile, like userspace could crash and
leave the host lock dangling :/
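
If the whole sequence stays in the kernel it could be funneled as
one driver-private request - a sketch using the helpers from the
ioctl()-over-block-requests patches further down, assuming the five
steps are prepared in an idata array:

/* Issue the whole 5-step RPMB sequence as ONE block request */
static int foo_rpmb_write(struct mmc_queue *mq,
                          struct mmc_blk_ioc_data **cmds)
{
        struct request *req;
        int err;

        req = blk_get_request(mq->queue, REQ_OP_DRV_OUT, __GFP_RECLAIM);
        req_to_mq_rq(req)->idata = cmds;        /* all five steps */
        req_to_mq_rq(req)->ioc_count = 5;
        /* Nothing else gets in between the five steps */
        blk_execute_rq(mq->queue, NULL, req, 0);
        err = req_to_mq_rq(req)->ioc_result;
        blk_put_request(req);
        return err;
}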

Yours,
Linus Walleij


[PATCH 0/5] mmc: core: modernize ioctl() requests

2017-05-10 Thread Linus Walleij
This is a series that starts to untangle the MMC "big host lock",
i.e. what is taken by calling mmc_claim_host(), which usually happens
through mmc_get_card().

The host lock is standing in the way of a bunch of modernizations,
because the block layer interface takes this lock when a new request
arrives, then will not release it until the queue is empty, which
is detected by getting NULL from blk_fetch_request().

This does not work with the new MQ block layer because it issues
requests asynchronously and there is no way of telling if the submit
queue is empty or not. Block requests must always be served. Trying
to introduce an interface for figuring out if the queue is empty
is probably generally a bad idea as it will require cross-talk
between all CPUs and then we hit Amdahl's Law again when scaling
upwards with CPUs.

So Arnd Bergmann suggested that whatever parallel requests the
host lock is protecting, maybe these requests can be funneled
through the block layer itself?

It turns out that most block drivers already do this, by using the
"special" block requests REQ_OP_DRV_IN and REQ_OP_DRV_OUT. And
it is what we should have done from the beginning.

To do this, the per-request extra data (which I think is also
referred to as "tag") needs to be handled the same way that all
other block drivers do it: allocate it through the block layer.
In the MMC stack, this extra per-request data (tag) is called
struct mmc_queue_req.

So the second patch converts to using the generic block layer
mechanism to allocate tags. This makes the core simpler IMO and
that is why the patch series has negative line count. We move
stuff over to the block layer.

The first patch cleans out bounce buffer configurability and
moves that to be a property of the host rather than a Kconfig option
in order to make it cleaner to do the rest of the refactorings.

The last two patches move the ioctl() calls over to use the
new per-request tags and remove two instances of mmc_get_card(),
so we start to untangle the big host lock.

This approach can be used to get rid of the debugfs mmc_get_card()
as well but I want to start with the ioctl()s.

It already fixes a problem: userspace and the block layer could
essentially starve each other by bombarding the host lock with
requests from either block access or ioctl(). With this,
ioctl() operations get scheduled by the block layer with
everything else. Not that I know how the block layer prioritizes
REQ_OP_DRV_IN and REQ_OP_DRV_OUT requests, but I am confident
it does a better job than the first-come-first-served host lock.

This drives a truck through mine and Adrian's patch sets for
multiqueue and command queueing respectively, but I think that
both of our patch series will get easier to implement if we
exploit these patches as a base for future work, so if they can
get positive reviews and do not make everything explode, I
would suggest we merge them as a starter for the v4.13 kernel
cycle.

Linus Walleij (5):
  mmc: core: Delete bounce buffer Kconfig option
  mmc: core: Allocate per-request data using the block layer core
  mmc: block: Tag is_rpmb as bool
  mmc: block: move single ioctl() commands to block requests
  mmc: block: move multi-ioctl() to use block layer

 drivers/mmc/core/Kconfig  |  18 
 drivers/mmc/core/block.c  | 126 +++--
 drivers/mmc/core/queue.c  | 235 --
 drivers/mmc/core/queue.h  |  26 +++--
 drivers/mmc/host/cavium.c |   3 +
 drivers/mmc/host/pxamci.c |   6 ++
 include/linux/mmc/card.h  |   2 -
 include/linux/mmc/host.h  |   1 +
 8 files changed, 161 insertions(+), 256 deletions(-)

-- 
2.9.3



[PATCH 1/5] mmc: core: Delete bounce buffer Kconfig option

2017-05-10 Thread Linus Walleij
This option is activated by all multiplatform configs and whatnot,
so we almost always have it turned on, and the memory it
saves is negligible, even more so moving forward. The actual
bounce buffer gets allocated only when used; the only
thing the ifdefs are saving is a little bit of code.

It is highly improper to have this as a Kconfig option that
gets turned on by default; make this a pure runtime thing and
let the host decide whether we use bounce buffers. We add a
new property "disable_bounce" to the host struct.

Notice that mmc_queue_calc_bouncesz() already disables the
bounce buffers if host->max_segs != 1, so any arch that has a
maximum number of segments higher than 1 will have bounce
buffers disabled.

The option CONFIG_MMC_BLOCK_BOUNCE is default y so the
majority of platforms in the kernel already have it on, and
it then gets turned off at runtime since most of these have
a host->max_segs > 1. The few exceptions that have
host->max_segs == 1 and still turn off the bounce buffering
are those that disable it in their defconfig.

Those are the following:

arch/arm/configs/colibri_pxa300_defconfig
arch/arm/configs/zeus_defconfig
- Uses MMC_PXA, drivers/mmc/host/pxamci.c
- Sets host->max_segs = NR_SG, which is 1
- This needs its bounce buffer deactivated so we set
  host->disable_bounce to true in the host driver

arch/arm/configs/davinci_all_defconfig
- Uses MMC_DAVINCI, drivers/mmc/host/davinci_mmc.c
- This driver sets host->max_segs to MAX_NR_SG, which is 16
- That means this driver anyways disabled bounce buffers
- No special action needed for this platform

arch/arm/configs/lpc32xx_defconfig
arch/arm/configs/nhk8815_defconfig
arch/arm/configs/u300_defconfig
- Uses MMC_ARMMMCI, drivers/mmc/host/mmci.[c|h]
- This driver by default sets host->max_segs to NR_SG,
  which is 128, unless a DMA engine is used, and in that case
  the number of segments are also > 1
- That means this driver already disables bounce buffers
- No special action needed for these platforms

arch/arm/configs/sama5_defconfig
- Uses MMC_SDHCI, MMC_SDHCI_PLTFM, MMC_SDHCI_OF_AT91, MMC_ATMELMCI
- Uses drivers/mmc/host/sdhci.c
- Normally sets host->max_segs to SDHCI_MAX_SEGS which is 128 and
  thus disables bounce buffers
- Sets host->max_segs to 1 if SDHCI_USE_SDMA is set
- SDHCI_USE_SDMA is only set by SDHCI on PCI adapers
- That means that for this platform bounce buffers are already
  disabled at runtime
- No special action needed for this platform

arch/blackfin/configs/CM-BF533_defconfig
arch/blackfin/configs/CM-BF537E_defconfig
- Uses MMC_SPI (a simple MMC card connected on SPI pins)
- Uses drivers/mmc/host/mmc_spi.c
- Sets host->max_segs to MMC_SPI_BLOCKSATONCE which is 128
- That means this platform already disables bounce buffers at
  runtime
- No special action needed for these platforms

arch/mips/configs/cavium_octeon_defconfig
- Uses MMC_CAVIUM_OCTEON, drivers/mmc/host/cavium.c
- Sets host->max_segs to 16 or 1
- Setting host->disable_bounce to be sure for the 1 case

arch/mips/configs/qi_lb60_defconfig
- Uses MMC_JZ4740, drivers/mmc/host/jz4740_mmc.c
- This sets host->max_segs to 128 so bounce buffers are
  already runtime disabled
- No action needed for this platform

It would be interesting to come up with a list of the platforms
that actually end up using bounce buffers. I have not been
able to infer such a list, but it occurs when
host->max_segs == 1 and the bounce buffering is not explicitly
disabled.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/Kconfig  | 18 --
 drivers/mmc/core/queue.c  | 15 +--
 drivers/mmc/host/cavium.c |  3 +++
 drivers/mmc/host/pxamci.c |  6 ++
 include/linux/mmc/host.h  |  1 +
 5 files changed, 11 insertions(+), 32 deletions(-)

diff --git a/drivers/mmc/core/Kconfig b/drivers/mmc/core/Kconfig
index fc1ecdaaa9ca..42e89060cd41 100644
--- a/drivers/mmc/core/Kconfig
+++ b/drivers/mmc/core/Kconfig
@@ -61,24 +61,6 @@ config MMC_BLOCK_MINORS
 
  If unsure, say 8 here.
 
-config MMC_BLOCK_BOUNCE
-   bool "Use bounce buffer for simple hosts"
-   depends on MMC_BLOCK
-   default y
-   help
- SD/MMC is a high latency protocol where it is crucial to
- send large requests in order to get high performance. Many
- controllers, however, are restricted to continuous memory
- (i.e. they can't do scatter-gather), something the kernel
- rarely can provide.
-
- Say Y here to help these restricted hosts by bouncing
- requests back and forth from a large buffer. You will get
- a big performance gain at the cost of up to 64 KiB of
- physical memory.
-
- If unsure, say Y here.
-
 config SDIO_UART
tristate "SDIO UART/GPS class support"
depends on TTY
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 5c37b6be3e7b..545466342fb1 100644
--- a/driver

[PATCH 5/5] mmc: block: move multi-ioctl() to use block layer

2017-05-10 Thread Linus Walleij
This also switches the multiple-command ioctl() call to issue
all ioctl()s through the block layer instead of going directly
to the device.

We extend the passed argument with an argument count and loop
over all passed commands in the ioctl() issue function called
from the block layer.

By doing this we are again loosening the grip on the big host
lock, since two calls to mmc_get_card()/mmc_put_card() are
removed.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 38 +-
 drivers/mmc/core/queue.h |  3 ++-
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 640db4f57a31..152de904d5e4 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -563,6 +563,7 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
 struct mmc_ioc_cmd __user *ic_ptr)
 {
struct mmc_blk_ioc_data *idata;
+   struct mmc_blk_ioc_data *idatas[1];
struct mmc_blk_data *md;
struct mmc_queue *mq;
struct mmc_card *card;
@@ -600,7 +601,9 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
req = blk_get_request(mq->queue,
idata->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
__GFP_RECLAIM);
-   req_to_mq_rq(req)->idata = idata;
+   idatas[0] = idata;
+   req_to_mq_rq(req)->idata = idatas;
+   req_to_mq_rq(req)->ioc_count = 1;
blk_execute_rq(mq->queue, NULL, req, 0);
ioc_err = req_to_mq_rq(req)->ioc_result;
err = mmc_blk_ioctl_copy_to_user(ic_ptr, idata);
@@ -622,14 +625,17 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
 static void mmc_blk_ioctl_cmd_issue(struct mmc_queue *mq, struct request *req)
 {
struct mmc_queue_req *mq_rq;
-   struct mmc_blk_ioc_data *idata;
struct mmc_card *card = mq->card;
struct mmc_blk_data *md = mq->blkdata;
int ioc_err;
+   int i;
 
mq_rq = req_to_mq_rq(req);
-   idata = mq_rq->idata;
-   ioc_err = __mmc_blk_ioctl_cmd(card, md, idata);
+   for (i = 0; i < mq_rq->ioc_count; i++) {
+   ioc_err = __mmc_blk_ioctl_cmd(card, md, mq_rq->idata[i]);
+   if (ioc_err)
+   break;
+   }
mq_rq->ioc_result = ioc_err;
 
/* Always switch back to main area after RPMB access */
@@ -646,8 +652,10 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
struct mmc_ioc_cmd __user *cmds = user->cmds;
struct mmc_card *card;
struct mmc_blk_data *md;
+   struct mmc_queue *mq;
int i, err = 0, ioc_err = 0;
__u64 num_of_cmds;
+   struct request *req;
 
/*
 * The caller must have CAP_SYS_RAWIO, and must be calling this on the
@@ -689,21 +697,25 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
goto cmd_done;
}
 
-   mmc_get_card(card);
-
-   for (i = 0; i < num_of_cmds && !ioc_err; i++)
-   ioc_err = __mmc_blk_ioctl_cmd(card, md, idata[i]);
-
-   /* Always switch back to main area after RPMB access */
-   if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
-   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
 
-   mmc_put_card(card);
+   /*
+* Dispatch the ioctl()s into the block request queue.
+*/
+   mq = &md->queue;
+   req = blk_get_request(mq->queue,
+   idata[0]->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
+   __GFP_RECLAIM);
+   req_to_mq_rq(req)->idata = idata;
+   req_to_mq_rq(req)->ioc_count = num_of_cmds;
+   blk_execute_rq(mq->queue, NULL, req, 0);
+   ioc_err = req_to_mq_rq(req)->ioc_result;
 
/* copy to user if data and response */
for (i = 0; i < num_of_cmds && !err; i++)
err = mmc_blk_ioctl_copy_to_user(&cmds[i], idata[i]);
 
+   blk_put_request(req);
+
 cmd_done:
mmc_blk_put(md);
 cmd_err:
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index aeb3408dc85e..7015df6681c3 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -42,7 +42,8 @@ struct mmc_queue_req {
unsigned intbounce_sg_len;
struct mmc_async_reqareq;
int ioc_result;
-   struct mmc_blk_ioc_data *idata;
+   struct mmc_blk_ioc_data **idata;
+   unsigned intioc_count;
 };
 
 struct mmc_queue {
-- 
2.9.3



[PATCH 3/5] mmc: block: Tag is_rpmb as bool

2017-05-10 Thread Linus Walleij
The variable is_rpmb is clearly a bool and even assigned true
and false, yet declared as an int.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index be782b8d4a0d..323f3790b629 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -443,7 +443,7 @@ static int __mmc_blk_ioctl_cmd(struct mmc_card *card, struct mmc_blk_data *md,
struct mmc_request mrq = {};
struct scatterlist sg;
int err;
-   int is_rpmb = false;
+   bool is_rpmb = false;
u32 status = 0;
 
if (!card || !md || !idata)
-- 
2.9.3



[PATCH 4/5] mmc: block: move single ioctl() commands to block requests

2017-05-10 Thread Linus Walleij
This wraps single ioctl() commands into block requests using
the custom block layer request types REQ_OP_DRV_IN and
REQ_OP_DRV_OUT.

By doing this we are loosening the grip on the big host lock,
since two calls to mmc_get_card()/mmc_put_card() are removed.

We are storing the ioctl() in/out argument as a pointer in
the per-request struct mmc_blk_request container. Since we
now let the block layer allocate this data, blk_get_request()
will allocate it for us and we can immediately dereference
it and use it to pass the argument into the block layer.

Tested on the ux500 with the userspace:
mmc extcsd read /dev/mmcblk3
resulting in a successful EXTCSD info dump back to the
console.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 56 ++--
 drivers/mmc/core/queue.h |  3 +++
 2 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 323f3790b629..640db4f57a31 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -564,8 +564,10 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
 {
struct mmc_blk_ioc_data *idata;
struct mmc_blk_data *md;
+   struct mmc_queue *mq;
struct mmc_card *card;
int err = 0, ioc_err = 0;
+   struct request *req;
 
/*
 * The caller must have CAP_SYS_RAWIO, and must be calling this on the
@@ -591,17 +593,18 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
goto cmd_done;
}
 
-   mmc_get_card(card);
-
-   ioc_err = __mmc_blk_ioctl_cmd(card, md, idata);
-
-   /* Always switch back to main area after RPMB access */
-   if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
-   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
-
-   mmc_put_card(card);
-
+   /*
+* Dispatch the ioctl() into the block request queue.
+*/
+   mq = &md->queue;
+   req = blk_get_request(mq->queue,
+   idata->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
+   __GFP_RECLAIM);
+   req_to_mq_rq(req)->idata = idata;
+   blk_execute_rq(mq->queue, NULL, req, 0);
+   ioc_err = req_to_mq_rq(req)->ioc_result;
err = mmc_blk_ioctl_copy_to_user(ic_ptr, idata);
+   blk_put_request(req);
 
 cmd_done:
mmc_blk_put(md);
@@ -611,6 +614,31 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
return ioc_err ? ioc_err : err;
 }
 
+/*
+ * The ioctl commands come back from the block layer after it queued it and
+ * processed it with all other requests and then they get issued in this
+ * function.
+ */
+static void mmc_blk_ioctl_cmd_issue(struct mmc_queue *mq, struct request *req)
+{
+   struct mmc_queue_req *mq_rq;
+   struct mmc_blk_ioc_data *idata;
+   struct mmc_card *card = mq->card;
+   struct mmc_blk_data *md = mq->blkdata;
+   int ioc_err;
+
+   mq_rq = req_to_mq_rq(req);
+   idata = mq_rq->idata;
+   ioc_err = __mmc_blk_ioctl_cmd(card, md, idata);
+   mq_rq->ioc_result = ioc_err;
+
+   /* Always switch back to main area after RPMB access */
+   if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
+   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
+
+   blk_end_request_all(req, ioc_err);
+}
+
 static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
   struct mmc_ioc_multi_cmd __user *user)
 {
@@ -1854,7 +1882,13 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
goto out;
}
 
-   if (req && req_op(req) == REQ_OP_DISCARD) {
+   if (req &&
+   (req_op(req) == REQ_OP_DRV_IN || req_op(req) == REQ_OP_DRV_OUT)) {
+   /* complete ongoing async transfer before issuing ioctl()s */
+   if (mq->qcnt)
+   mmc_blk_issue_rw_rq(mq, NULL);
+   mmc_blk_ioctl_cmd_issue(mq, req);
+   } else if (req && req_op(req) == REQ_OP_DISCARD) {
/* complete ongoing async transfer before issuing discard */
if (mq->qcnt)
mmc_blk_issue_rw_rq(mq, NULL);
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 8aa10ffdf622..aeb3408dc85e 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -22,6 +22,7 @@ static inline bool mmc_req_is_special(struct request *req)
 
 struct task_struct;
 struct mmc_blk_data;
+struct mmc_blk_ioc_data;
 
 struct mmc_blk_request {
struct mmc_request  mrq;
@@ -40,6 +41,8 @@ struct mmc_queue_req {
struct scatterlist  *bounce_sg;
unsigned intbounce_sg_len;
struct mmc_async_reqareq;
+   int ioc_result;
+   struct mmc_blk_ioc_data *idata;
 };
 
 struct mmc_queue {
-- 
2.9.3



[PATCH 2/5] mmc: core: Allocate per-request data using the block layer core

2017-05-10 Thread Linus Walleij
The mmc_queue_req is a per-request state container the MMC core uses
to carry bounce buffers, pointers to asynchronous requests and so on.
Currently it is allocated as a static array of objects; then, as a request
comes in, a mmc_queue_req is assigned to it and used during the
lifetime of the request.

This is backwards compared to how other block layer drivers work:
they usually let the block core provide a per-request struct that gets
allocated right behind the struct request, and which can be obtained
using the blk_mq_rq_to_pdu() helper. (The _mq_ infix in this function
name is misleading: it is used by both the old and the MQ block
layer.)

The per-request struct gets allocated to the size stored in the queue
variable .cmd_size, initialized using .init_rq_fn() and
cleaned up using .exit_rq_fn().

This patch makes the MMC core rely on this block layer mechanism to
allocate the per-request mmc_queue_req state container.

Doing this makes a lot of complicated queue handling go away. We only
need to keep the .qcnt that keeps count of how many requests are
currently being processed by the MMC layer. The MQ block layer will
replace this too once we transition to it.

Doing this refactoring is necessary to move the ioctl() operations
into custom block layer requests tagged with REQ_OP_DRV_[IN|OUT]
instead of the custom code using the BigMMCHostLock that we have
today: those require that per-request data be obtainable easily from
a request after creating a custom request with e.g.:

struct request *rq = blk_get_request(q, REQ_OP_DRV_IN, __GFP_RECLAIM);
struct mmc_queue_req *mq_rq = req_to_mq_rq(rq);

And this is not possible with the current construction, as the request
is not immediately assigned the per-request state container, but
instead it gets assigned when the request finally enters the MMC
queue, which is way too late for custom requests.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c |  38 ++--
 drivers/mmc/core/queue.c | 222 +--
 drivers/mmc/core/queue.h |  22 ++---
 include/linux/mmc/card.h |   2 -
 4 files changed, 80 insertions(+), 204 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 8273b078686d..be782b8d4a0d 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -129,13 +129,6 @@ static inline int mmc_blk_part_switch(struct mmc_card 
*card,
  struct mmc_blk_data *md);
 static int get_card_status(struct mmc_card *card, u32 *status, int retries);
 
-static void mmc_blk_requeue(struct request_queue *q, struct request *req)
-{
-   spin_lock_irq(q->queue_lock);
-   blk_requeue_request(q, req);
-   spin_unlock_irq(q->queue_lock);
-}
-
 static struct mmc_blk_data *mmc_blk_get(struct gendisk *disk)
 {
struct mmc_blk_data *md;
@@ -1642,7 +1635,7 @@ static void mmc_blk_rw_cmd_abort(struct mmc_queue *mq, 
struct mmc_card *card,
if (mmc_card_removed(card))
req->rq_flags |= RQF_QUIET;
while (blk_end_request(req, -EIO, blk_rq_cur_bytes(req)));
-   mmc_queue_req_free(mq, mqrq);
+   mq->qcnt--;
 }
 
 /**
@@ -1662,7 +1655,7 @@ static void mmc_blk_rw_try_restart(struct mmc_queue *mq, 
struct request *req,
if (mmc_card_removed(mq->card)) {
req->rq_flags |= RQF_QUIET;
blk_end_request_all(req, -EIO);
-   mmc_queue_req_free(mq, mqrq);
+   mq->qcnt--; /* FIXME: just set to 0? */
return;
}
/* Else proceed and try to restart the current async request */
@@ -1685,12 +1678,8 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
bool req_pending = true;
 
if (new_req) {
-   mqrq_cur = mmc_queue_req_find(mq, new_req);
-   if (!mqrq_cur) {
-   WARN_ON(1);
-   mmc_blk_requeue(mq->queue, new_req);
-   new_req = NULL;
-   }
+   mqrq_cur = req_to_mq_rq(new_req);
+   mq->qcnt++;
}
 
if (!mq->qcnt)
@@ -1764,12 +1753,12 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
if (req_pending)
mmc_blk_rw_cmd_abort(mq, card, old_req, 
mq_rq);
else
-   mmc_queue_req_free(mq, mq_rq);
+   mq->qcnt--;
mmc_blk_rw_try_restart(mq, new_req, mqrq_cur);
return;
}
if (!req_pending) {
-   mmc_queue_req_free(mq, mq_rq);
+   mq->qcnt--;
mmc_blk_rw_try_restart(mq, new_req, mqrq_cur);
return;
   

Re: [PATCH 1/5] mmc: core: Delete bounce buffer Kconfig option

2017-05-18 Thread Linus Walleij
On Mon, May 15, 2017 at 4:04 PM, Bartlomiej Zolnierkiewicz
 wrote:

> [ I added Daniel, Marc & Steven to cc: ]

>> The option CONFIG_MMC_BLOCK_BOUNCE is default y so the
>> majority of platforms in the kernel already have it on, and
>> it then gets turned off at runtime since most of these have
>> a host->max_segs > 1. The few exceptions that have
>> host->max_segs == 1 and still turn off the bounce buffering
>> are those that disable it in their defconfig.
>>
>> Those are the following:
>>
>> arch/arm/configs/colibri_pxa300_defconfig
>> arch/arm/configs/zeus_defconfig
>> - Uses MMC_PXA, drivers/mmc/host/pxamci.c
>> - Sets host->max_segs = NR_SG, which is 1
>> - This needs its bounce buffer deactivated so we set
>>   host->disable_bounce to true in the host driver
>
> [...]
>
>> arch/mips/configs/cavium_octeon_defconfig
>> - Uses MMC_CAVIUM_OCTEON, drivers/mmc/host/cavium.c
>> - Sets host->max_segs to 16 or 1
>> - Setting host->disable_bounce to be sure for the 1 case
>
> From looking at the code it seems that bounce buffering should
> be always beneficial to MMC performance when host->max_segs == 1.
>
> It would be useful to know why these specific defconfigs
> disable bounce buffering (to save memory?). Maybe the defaults
> should be changed nowadays?

I agree. But I can't test on pxamci and cavium, so I would
appreciate it if the driver maintainers could try to remove the
flag (MMC_CAP_NO_BOUNCE_BUFF in the v2 patch set)
and see what happens.

I would be happy to cut this special flag altogether, but I am
also afraid of screwing up some config :/

Yours,
Linus Walleij


Re: [PATCH 2/5] mmc: core: Allocate per-request data using the block layer core

2017-05-18 Thread Linus Walleij
On Tue, May 16, 2017 at 11:02 AM, Ulf Hansson  wrote:
> On 10 May 2017 at 10:24, Linus Walleij  wrote:

>> @@ -1662,7 +1655,7 @@ static void mmc_blk_rw_try_restart(struct mmc_queue 
>> *mq, struct request *req,
>> if (mmc_card_removed(mq->card)) {
>> req->rq_flags |= RQF_QUIET;
>> blk_end_request_all(req, -EIO);
>> -   mmc_queue_req_free(mq, mqrq);
>> +   mq->qcnt--; /* FIXME: just set to 0? */
>
> As mentioned below, perhaps this FIXME is fine to add. As I assume you
> soon intend to take care of it, right?

Yes, that goes away with my MQ patches (not yet rebased):
we stop peeking at whether the queue is empty and just issue
requests asynchronously. I just wanted to point this out;
that counter is kind of fragile and scary to me.
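
In MQ terms the issuing side turns into a plain .queue_rq() callback,
so there is no queue-empty peeking left at all. A rough sketch of where
my (unposted, work-in-progress) MQ conversion is heading -- these names
are from that WIP, not from this series:

    static const struct blk_mq_ops mmc_mq_ops = {
            .queue_rq     = mmc_mq_queue_rq, /* issue one request, async */
            .init_request = mmc_init_request,
            .exit_request = mmc_exit_request,
    };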

>> -   for (i = 0; i < qdepth; i++) {
>> -   mqrq[i].sg = mmc_alloc_sg(max_segs);
>> -   if (!mqrq[i].sg)
>> +   /* FIXME: use req_to_mq_rq() everywhere this is dereferenced */
>
> Why not do that right now, instead of adding a FIXME comment?

This comment is wrong, just a development artifact I will just delete it.

>> mq->card = card;
>> -   mq->queue = blk_init_queue(mmc_request_fn, lock);
>> +   mq->queue = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
>
> Seems like we should use blk_alloc_queue() instead, as it calls
> blk_alloc_queue_node(gfp_mask, NUMA_NO_NODE) for us.

OK

>> +static inline struct mmc_queue_req *req_to_mq_rq(struct request *rq)
>
> To be more consistent with existing function names, perhaps rename this to:
> req_to_mmc_queue_req()
>
>> +{
>> +   return blk_mq_rq_to_pdu(rq);
>> +}
>> +
>
> [...]
>
>>  struct mmc_queue {
>> @@ -45,14 +50,15 @@ struct mmc_queue {
>> boolasleep;
>> struct mmc_blk_data *blkdata;
>> struct request_queue*queue;
>> -   struct mmc_queue_req*mqrq;
>> -   int qdepth;
>> +   /*
>> +* FIXME: this counter is not a very reliable way of keeping
>> +* track of how many requests that are ongoing. Switch to just
>> +* letting the block core keep track of requests and per-request
>> +* associated mmc_queue_req data.
>> +*/
>> int qcnt;
>
> I am not very fond of FIXME comments, however perhaps this one really
> deserves to be a FIXME because you intend to fix this asap, right?

Same as the first comment. It is fragile and I don't like it,
with asynchronous issueing in MQ this goes away.

Yours,
Linus Walleij


Re: [PATCH 2/5] mmc: core: Allocate per-request data using the block layer core

2017-05-18 Thread Linus Walleij
On Tue, May 16, 2017 at 1:54 PM, Adrian Hunter  wrote:
> On 10/05/17 11:24, Linus Walleij wrote:
>> The mmc_queue_req is a per-request state container the MMC core uses
>> to carry bounce buffers, pointers to asynchronous requests and so on.
>> Currently allocated as a static array of objects, then as a request
>> comes in, a mmc_queue_req is assigned to it, and used during the
>> lifetime of the request.
>>
>> This is backwards compared to how other block layer drivers work:
>> they usally let the block core provide a per-request struct that get
>> allocated right beind the struct request, and which can be obtained
>> using the blk_mq_rq_to_pdu() helper. (The _mq_ infix in this function
>> name is misleading: it is used by both the old and the MQ block
>> layer.)
>>
>> The per-request struct gets allocated to the size stored in the queue
>> variable .cmd_size initialized using the .init_rq_fn() and
>> cleaned up using .exit_rq_fn().
>>
>> The block layer code makes the MMC core rely on this mechanism to
>> allocate the per-request mmc_queue_req state container.
>>
>> Doing this make a lot of complicated queue handling go away.
>
> Isn't that at the expense of increased memory allocation.
>
> Have you compared the number of allocations?  It looks to me like the block
> layer allocates a minimum of 4 requests in the memory pool which will
> increase if there are more in the I/O scheduler, plus 1 for flush.  There
> are often 4 queues per eMMC (2x boot,RPMB and main area), so that is 20
> requests minimum, up from 2 allocations previously.  For someone using 64K
> bounce buffers, you have increased memory allocation by at least 18x64 =
> 1152k.  However the I/O scheduler could allocate a lot more.

That is not a realistic example.

As pointed out in patch #1, bounce buffers are used on old systems
which have max_segs == 1. No modern hardware has that,
they all have multiple segments-capable host controllers and
often also DMA engines.

Old systems with max_segs == 1 also have:

- One SD or MMC slot
- No eMMC (because it was not yet invented in those times)
- So no RPMB or Boot partitions, just main area

If you can point me to a system that has max_segs == 1 and an
eMMC mounted, I can look into it and ask the driver maintainers to
check if it disturbs them, but I think those simply do not exist.

>> Doing this refactoring is necessary to move the ioctl() operations
>> into custom block layer requests tagged with REQ_OP_DRV_[IN|OUT]
>
> Obviously you could create a per-request data structure with only the
> reference to the IOCTL data, and without putting all the memory allocations
> there as well.

Not easily, and this is the way all IDE, ATA, SCSI disks etc.
do this, so why would we try to be different and maintain a lot
of deviant code?

The allocation of extra data is done by the block layer when issuing
blk_get_request(), so trying to keep the old mechanism of a list of
struct mmc_queue_req and trying to pair these with incoming requests
inevitably means a lot of extra work, possibly deepening that list or
creating out-of-list extra entries and whatnot.

It's better to do what everyone else does and let the core do this
allocation of extra data (tag) instead.
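
To be concrete, on the old (non-MQ) block layer the wiring boils down
to something like this at queue setup time (a sketch along the lines
of what the patch does in queue.c, see the diff for the exact code):

    mq->queue = blk_alloc_queue(GFP_KERNEL);
    mq->queue->queue_lock = lock;
    mq->queue->request_fn = mmc_request_fn;
    mq->queue->init_rq_fn = mmc_init_request;
    mq->queue->exit_rq_fn = mmc_exit_request;
    mq->queue->cmd_size = sizeof(struct mmc_queue_req);
    mq->queue->queuedata = mq;
    ret = blk_init_allocated_queue(mq->queue);

after which blk_mq_rq_to_pdu() hands back the struct mmc_queue_req
for any request, including the custom REQ_OP_DRV_[IN|OUT] ones.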

Yours,
Linus Walleij


Re: [PATCH 5/5] mmc: block: move multi-ioctl() to use block layer

2017-05-18 Thread Linus Walleij
On Fri, May 12, 2017 at 11:09 PM, Avri Altman  wrote:

>> + req = blk_get_request(mq->queue,
>> + idata[0]->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
>> + __GFP_RECLAIM);
>
> It is possible, e.g. as in RPMB access, that some commands are read and some 
> are write.
> Not sure that it makes any difference, because once it get back to 
> mmc_blk_ioctl_cmd_issue(),
> The correct mmc requests will be issued anyway?

The OP type (REQ_OP_DRV_OUT or REQ_OP_DRV_IN) has no semantic
effect in the MMC/SD stack, I just need to set it to something reasonable.
They will all be handled the same when issuing the request later.

The only semantic effect would be if the block layer prioritized these types
of requests differently compared to each other or to other requests, in
which case I think this rough guess is not a big issue.

Yours,
Linus Walleij


[PATCH 2/6 v2] mmc: core: Allocate per-request data using the block layer core

2017-05-18 Thread Linus Walleij
The mmc_queue_req is a per-request state container the MMC core uses
to carry bounce buffers, pointers to asynchronous requests and so on.
Currently allocated as a static array of objects, then as a request
comes in, a mmc_queue_req is assigned to it, and used during the
lifetime of the request.

This is backwards compared to how other block layer drivers work:
they usually let the block core provide a per-request struct that gets
allocated right behind the struct request, and which can be obtained
using the blk_mq_rq_to_pdu() helper. (The _mq_ infix in this function
name is misleading: it is used by both the old and the MQ block
layer.)

The per-request struct gets allocated to the size stored in the queue
variable .cmd_size, initialized using the .init_rq_fn() callback and
cleaned up using .exit_rq_fn().

The block layer code makes the MMC core rely on this mechanism to
allocate the per-request mmc_queue_req state container.

Doing this makes a lot of complicated queue handling go away. We only
need to keep the .qcnt that keeps count of how many requests are
currently being processed by the MMC layer. The MQ block layer will
replace this as well once we transition to it.

Doing this refactoring is necessary to move the ioctl() operations
into custom block layer requests tagged with REQ_OP_DRV_[IN|OUT]
instead of the custom code using the BigMMCHostLock that we have
today: those require that per-request data be obtainable easily from
a request after creating a custom request with e.g.:

struct request *rq = blk_get_request(q, REQ_OP_DRV_IN, __GFP_RECLAIM);
struct mmc_queue_req *mq_rq = req_to_mq_rq(rq);

And this is not possible with the current construction, as the request
is not immediately assigned the per-request state container, but
instead it gets assigned when the request finally enters the MMC
queue, which is way too late for custom requests.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v2:
- Rename req_to_mq_rq() to req_to_mmc_queue_req()
- Drop irrelevant FIXME comment.
---
 drivers/mmc/core/block.c |  38 ++--
 drivers/mmc/core/queue.c | 221 +--
 drivers/mmc/core/queue.h |  22 ++---
 include/linux/mmc/card.h |   2 -
 4 files changed, 79 insertions(+), 204 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 8273b078686d..5f29b5625216 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -129,13 +129,6 @@ static inline int mmc_blk_part_switch(struct mmc_card 
*card,
  struct mmc_blk_data *md);
 static int get_card_status(struct mmc_card *card, u32 *status, int retries);
 
-static void mmc_blk_requeue(struct request_queue *q, struct request *req)
-{
-   spin_lock_irq(q->queue_lock);
-   blk_requeue_request(q, req);
-   spin_unlock_irq(q->queue_lock);
-}
-
 static struct mmc_blk_data *mmc_blk_get(struct gendisk *disk)
 {
struct mmc_blk_data *md;
@@ -1642,7 +1635,7 @@ static void mmc_blk_rw_cmd_abort(struct mmc_queue *mq, 
struct mmc_card *card,
if (mmc_card_removed(card))
req->rq_flags |= RQF_QUIET;
while (blk_end_request(req, -EIO, blk_rq_cur_bytes(req)));
-   mmc_queue_req_free(mq, mqrq);
+   mq->qcnt--;
 }
 
 /**
@@ -1662,7 +1655,7 @@ static void mmc_blk_rw_try_restart(struct mmc_queue *mq, 
struct request *req,
if (mmc_card_removed(mq->card)) {
req->rq_flags |= RQF_QUIET;
blk_end_request_all(req, -EIO);
-   mmc_queue_req_free(mq, mqrq);
+   mq->qcnt--; /* FIXME: just set to 0? */
return;
}
/* Else proceed and try to restart the current async request */
@@ -1685,12 +1678,8 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
bool req_pending = true;
 
if (new_req) {
-   mqrq_cur = mmc_queue_req_find(mq, new_req);
-   if (!mqrq_cur) {
-   WARN_ON(1);
-   mmc_blk_requeue(mq->queue, new_req);
-   new_req = NULL;
-   }
+   mqrq_cur = req_to_mmc_queue_req(new_req);
+   mq->qcnt++;
}
 
if (!mq->qcnt)
@@ -1764,12 +1753,12 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
if (req_pending)
mmc_blk_rw_cmd_abort(mq, card, old_req, 
mq_rq);
else
-   mmc_queue_req_free(mq, mq_rq);
+   mq->qcnt--;
mmc_blk_rw_try_restart(mq, new_req, mqrq_cur);
return;
}
if (!req_pending) {
-   mmc_queue_req_free(mq, mq_rq);
+   mq->qcnt--;
   

[PATCH 1/6 v2] mmc: core: Delete bounce buffer Kconfig option

2017-05-18 Thread Linus Walleij
This option is activated by all multiplatform configs and what
not so we almost always have it turned on, and the memory it
saves is negligible, even more so moving forward. The actual
bounce buffer only gets allocated when used; the only
thing the ifdefs are saving is a little bit of code.

It is highly improper to have this as a compile-time Kconfig
option; make this a pure runtime thing and let the host decide
whether we use bounce buffers. We add a new host capability flag,
MMC_CAP_NO_BOUNCE_BUFF, that hosts can set to opt out of bouncing.

Notice that mmc_queue_calc_bouncesz() already disables the
bounce buffers if host->max_segs != 1, so any arch that has a
maximum number of segments higher than 1 will have bounce
buffers disabled.
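
For reference, the runtime decision then boils down to roughly the
following in mmc_queue_calc_bouncesz() (a sketch consistent with the
queue.c changes in this patch, not the verbatim diff):

    static unsigned int mmc_queue_calc_bouncesz(struct mmc_host *host)
    {
            unsigned int bouncesz = MMC_QUEUE_BOUNCESZ;

            /* Scatter-gather capable hosts never need to bounce */
            if (host->max_segs != 1 ||
                (host->caps & MMC_CAP_NO_BOUNCE_BUFF))
                    return 0;

            if (bouncesz > host->max_req_size)
                    bouncesz = host->max_req_size;
            if (bouncesz > host->max_seg_size)
                    bouncesz = host->max_seg_size;
            if (bouncesz > host->max_blk_count * 512)
                    bouncesz = host->max_blk_count * 512;

            if (bouncesz <= 512)
                    return 0;

            return bouncesz;
    }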

The option CONFIG_MMC_BLOCK_BOUNCE is default y so the
majority of platforms in the kernel already have it on, and
it then gets turned off at runtime since most of these have
a host->max_segs > 1. The few exceptions that have
host->max_segs == 1 and still turn off the bounce buffering
are those that disable it in their defconfig.

Those are the following:

arch/arm/configs/colibri_pxa300_defconfig
arch/arm/configs/zeus_defconfig
- Uses MMC_PXA, drivers/mmc/host/pxamci.c
- Sets host->max_segs = NR_SG, which is 1
- This needs its bounce buffer deactivated so we set
  MMC_CAP_NO_BOUNCE_BUFF in the host driver

arch/arm/configs/davinci_all_defconfig
- Uses MMC_DAVINCI, drivers/mmc/host/davinci_mmc.c
- This driver sets host->max_segs to MAX_NR_SG, which is 16
- That means this driver anyways disabled bounce buffers
- No special action needed for this platform

arch/arm/configs/lpc32xx_defconfig
arch/arm/configs/nhk8815_defconfig
arch/arm/configs/u300_defconfig
- Uses MMC_ARMMMCI, drivers/mmc/host/mmci.[c|h]
- This driver by default sets host->max_segs to NR_SG,
  which is 128, unless a DMA engine is used, and in that case
  the number of segments are also > 1
- That means this driver already disables bounce buffers
- No special action needed for these platforms

arch/arm/configs/sama5_defconfig
- Uses MMC_SDHCI, MMC_SDHCI_PLTFM, MMC_SDHCI_OF_AT91, MMC_ATMELMCI
- Uses drivers/mmc/host/sdhci.c
- Normally sets host->max_segs to SDHCI_MAX_SEGS which is 128 and
  thus disables bounce buffers
- Sets host->max_segs to 1 if SDHCI_USE_SDMA is set
- SDHCI_USE_SDMA is only set by SDHCI on PCI adapers
- That means that for this platform bounce buffers are already
  disabled at runtime
- No special action needed for this platform

arch/blackfin/configs/CM-BF533_defconfig
arch/blackfin/configs/CM-BF537E_defconfig
- Uses MMC_SPI (a simple MMC card connected on SPI pins)
- Uses drivers/mmc/host/mmc_spi.c
- Sets host->max_segs to MMC_SPI_BLOCKSATONCE which is 128
- That means this platform already disables bounce buffers at
  runtime
- No special action needed for these platforms

arch/mips/configs/cavium_octeon_defconfig
- Uses MMC_CAVIUM_OCTEON, drivers/mmc/host/cavium.c
- Sets host->max_segs to 16 or 1
- Setting MMC_CAP_NO_BOUNCE_BUFF to be sure for the 1 case

arch/mips/configs/qi_lb60_defconfig
- Uses MMC_JZ4740, drivers/mmc/host/jz4740_mmc.c
- This sets host->max_segs to 128 so bounce buffers are
  already runtime disabled
- No action needed for this platform

It would be interesting to come up with a list of the platforms
that actually end up using bounce buffers. I have not been
able to infer such a list, but it occurs when
host->max_segs == 1 and the bounce buffering is not explicitly
disabled.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v2:
- Instead of adding a new bool "disable_bounce" we use the host
  caps variable, reuse the free bit 21 to indicate that bounce
  buffers should be disabled on the host.
---
 drivers/mmc/core/Kconfig  | 18 --
 drivers/mmc/core/queue.c  | 15 +--
 drivers/mmc/host/cavium.c |  4 +++-
 drivers/mmc/host/pxamci.c |  6 +-
 include/linux/mmc/host.h  |  1 +
 5 files changed, 10 insertions(+), 34 deletions(-)

diff --git a/drivers/mmc/core/Kconfig b/drivers/mmc/core/Kconfig
index fc1ecdaaa9ca..42e89060cd41 100644
--- a/drivers/mmc/core/Kconfig
+++ b/drivers/mmc/core/Kconfig
@@ -61,24 +61,6 @@ config MMC_BLOCK_MINORS
 
  If unsure, say 8 here.
 
-config MMC_BLOCK_BOUNCE
-   bool "Use bounce buffer for simple hosts"
-   depends on MMC_BLOCK
-   default y
-   help
- SD/MMC is a high latency protocol where it is crucial to
- send large requests in order to get high performance. Many
- controllers, however, are restricted to continuous memory
- (i.e. they can't do scatter-gather), something the kernel
- rarely can provide.
-
- Say Y here to help these restricted hosts by bouncing
- requests back and forth from a large buffer. You will get
- a big performance gain at the cost of up to 64 KiB of
- physical memory.
-
- If unsure, say Y here.
-
 config SDIO_U

[PATCH 3/6 v2] mmc: block: Tag is_rpmb as bool

2017-05-18 Thread Linus Walleij
The variable is_rpmb is clearly a bool and even assigned true
and false, yet declared as an int.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v2:
- No changes, just resending
---
 drivers/mmc/core/block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 5f29b5625216..f4dab1dfd2ab 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -443,7 +443,7 @@ static int __mmc_blk_ioctl_cmd(struct mmc_card *card, 
struct mmc_blk_data *md,
struct mmc_request mrq = {};
struct scatterlist sg;
int err;
-   int is_rpmb = false;
+   bool is_rpmb = false;
u32 status = 0;
 
if (!card || !md || !idata)
-- 
2.9.3



[PATCH 5/6 v2] mmc: block: move multi-ioctl() to use block layer

2017-05-18 Thread Linus Walleij
This also switches the multiple-command ioctl() call to issue
all ioctl()s through the block layer instead of going directly
to the device.

We extend the passed argument with an argument count and loop
over all passed commands in the ioctl() issue function called
from the block layer.

By doing this we are again loosening the grip on the big host
lock, since two calls to mmc_get_card()/mmc_put_card() are
removed.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v2:
- Update to the API change for req_to_mmc_queue_req()
---
 drivers/mmc/core/block.c | 38 +-
 drivers/mmc/core/queue.h |  3 ++-
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 9fb2bd529156..e9737987956f 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -563,6 +563,7 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
 struct mmc_ioc_cmd __user *ic_ptr)
 {
struct mmc_blk_ioc_data *idata;
+   struct mmc_blk_ioc_data *idatas[1];
struct mmc_blk_data *md;
struct mmc_queue *mq;
struct mmc_card *card;
@@ -600,7 +601,9 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
req = blk_get_request(mq->queue,
idata->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
__GFP_RECLAIM);
-   req_to_mmc_queue_req(req)->idata = idata;
+   idatas[0] = idata;
+   req_to_mmc_queue_req(req)->idata = idatas;
+   req_to_mmc_queue_req(req)->ioc_count = 1;
blk_execute_rq(mq->queue, NULL, req, 0);
ioc_err = req_to_mmc_queue_req(req)->ioc_result;
err = mmc_blk_ioctl_copy_to_user(ic_ptr, idata);
@@ -622,14 +625,17 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
 static void mmc_blk_ioctl_cmd_issue(struct mmc_queue *mq, struct request *req)
 {
struct mmc_queue_req *mq_rq;
-   struct mmc_blk_ioc_data *idata;
struct mmc_card *card = mq->card;
struct mmc_blk_data *md = mq->blkdata;
int ioc_err;
+   int i;
 
mq_rq = req_to_mmc_queue_req(req);
-   idata = mq_rq->idata;
-   ioc_err = __mmc_blk_ioctl_cmd(card, md, idata);
+   for (i = 0; i < mq_rq->ioc_count; i++) {
+   ioc_err = __mmc_blk_ioctl_cmd(card, md, mq_rq->idata[i]);
+   if (ioc_err)
+   break;
+   }
mq_rq->ioc_result = ioc_err;
 
/* Always switch back to main area after RPMB access */
@@ -646,8 +652,10 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device 
*bdev,
struct mmc_ioc_cmd __user *cmds = user->cmds;
struct mmc_card *card;
struct mmc_blk_data *md;
+   struct mmc_queue *mq;
int i, err = 0, ioc_err = 0;
__u64 num_of_cmds;
+   struct request *req;
 
/*
 * The caller must have CAP_SYS_RAWIO, and must be calling this on the
@@ -689,21 +697,25 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device 
*bdev,
goto cmd_done;
}
 
-   mmc_get_card(card);
-
-   for (i = 0; i < num_of_cmds && !ioc_err; i++)
-   ioc_err = __mmc_blk_ioctl_cmd(card, md, idata[i]);
-
-   /* Always switch back to main area after RPMB access */
-   if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
-   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
 
-   mmc_put_card(card);
+   /*
+* Dispatch the ioctl()s into the block request queue.
+*/
+   mq = &md->queue;
+   req = blk_get_request(mq->queue,
+   idata[0]->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
+   __GFP_RECLAIM);
+   req_to_mmc_queue_req(req)->idata = idata;
+   req_to_mmc_queue_req(req)->ioc_count = num_of_cmds;
+   blk_execute_rq(mq->queue, NULL, req, 0);
+   ioc_err = req_to_mmc_queue_req(req)->ioc_result;
 
/* copy to user if data and response */
for (i = 0; i < num_of_cmds && !err; i++)
err = mmc_blk_ioctl_copy_to_user(&cmds[i], idata[i]);
 
+   blk_put_request(req);
+
 cmd_done:
mmc_blk_put(md);
 cmd_err:
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 005ece9ac7cb..8c76e7118c95 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -42,7 +42,8 @@ struct mmc_queue_req {
unsigned intbounce_sg_len;
struct mmc_async_reqareq;
int ioc_result;
-   struct mmc_blk_ioc_data *idata;
+   struct mmc_blk_ioc_data **idata;
+   unsigned intioc_count;
 };
 
 struct mmc_queue {
-- 
2.9.3



[PATCH 6/6 v2] mmc: queue: delete mmc_req_is_special()

2017-05-18 Thread Linus Walleij
commit cdf8a6fb48882651049e468e6b16956fb83db86c
"mmc: block: Introduce queue semantics"
deleted the last user of mmc_req_is_special(), which was
a horrible hack to classify requests as "special" or
"not special" to begin with, so delete the helper.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v2:
- No changes, just included this patch in my series.
---
 drivers/mmc/core/queue.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 8c76e7118c95..dfe481a8b5ed 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -12,14 +12,6 @@ static inline struct mmc_queue_req 
*req_to_mmc_queue_req(struct request *rq)
return blk_mq_rq_to_pdu(rq);
 }
 
-static inline bool mmc_req_is_special(struct request *req)
-{
-   return req &&
-   (req_op(req) == REQ_OP_FLUSH ||
-req_op(req) == REQ_OP_DISCARD ||
-req_op(req) == REQ_OP_SECURE_ERASE);
-}
-
 struct task_struct;
 struct mmc_blk_data;
 struct mmc_blk_ioc_data;
-- 
2.9.3



[PATCH 4/6 v2] mmc: block: move single ioctl() commands to block requests

2017-05-18 Thread Linus Walleij
This wraps single ioctl() commands into block requests using
the custom block layer request types REQ_OP_DRV_IN and
REQ_OP_DRV_OUT.

By doing this we are loosening the grip on the big host lock,
since two calls to mmc_get_card()/mmc_put_card() are removed.

We are storing the ioctl() in/out argument as a pointer in
the per-request struct mmc_queue_req container. Since we
now let the block layer allocate this data, blk_get_request()
will allocate it for us and we can immediately dereference
it and use it to pass the argument into the block layer.

We refactor the if/else/if/else ladder in mmc_blk_issue_rq()
as part of the job, paying some extra attention to the
case where a NULL req is passed into this function, and
making that pipeline flush more explicit.

Tested on the ux500 with the userspace:
mmc extcsd read /dev/mmcblk3
resulting in a successful EXTCSD info dump back to the
console.

This commit fixes a starvation issue in the MMC/SD stack
that can be easily provoked by issuing the following
commands in sequence:

> dd if=/dev/mmcblk3 of=/dev/null bs=1M &
> mmc extcsd read /dev/mmcblk3

Before this patch, the extcsd read command would hang
(starve) while waiting for the dd command to finish since
the block layer was holding the card/host lock.

After this patch, the extcsd ioctl() command is nicely
interspersed with the rest of the block commands and we
can issue a bunch of ioctl()s from userspace while there
is some busy block IO going on without any problems.

Conversely userspace ioctl()s can no longer starve
the block layer by holding the card/host lock.

Signed-off-by: Linus Walleij 
---
ChangeLog v1->v2:
- Replace the if/else/if/else nest in mmc_blk_issue_rq()
  with a switch() clause at Ulf's request.
- Update to the API change for req_to_mmc_queue_req()
---
 drivers/mmc/core/block.c | 111 ---
 drivers/mmc/core/queue.h |   3 ++
 2 files changed, 88 insertions(+), 26 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index f4dab1dfd2ab..9fb2bd529156 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -564,8 +564,10 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
 {
struct mmc_blk_ioc_data *idata;
struct mmc_blk_data *md;
+   struct mmc_queue *mq;
struct mmc_card *card;
int err = 0, ioc_err = 0;
+   struct request *req;
 
/*
 * The caller must have CAP_SYS_RAWIO, and must be calling this on the
@@ -591,17 +593,18 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
goto cmd_done;
}
 
-   mmc_get_card(card);
-
-   ioc_err = __mmc_blk_ioctl_cmd(card, md, idata);
-
-   /* Always switch back to main area after RPMB access */
-   if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
-   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
-
-   mmc_put_card(card);
-
+   /*
+* Dispatch the ioctl() into the block request queue.
+*/
+   mq = &md->queue;
+   req = blk_get_request(mq->queue,
+   idata->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
+   __GFP_RECLAIM);
+   req_to_mmc_queue_req(req)->idata = idata;
+   blk_execute_rq(mq->queue, NULL, req, 0);
+   ioc_err = req_to_mmc_queue_req(req)->ioc_result;
err = mmc_blk_ioctl_copy_to_user(ic_ptr, idata);
+   blk_put_request(req);
 
 cmd_done:
mmc_blk_put(md);
@@ -611,6 +614,31 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
return ioc_err ? ioc_err : err;
 }
 
+/*
+ * The ioctl commands come back from the block layer after it queued it and
+ * processed it with all other requests and then they get issued in this
+ * function.
+ */
+static void mmc_blk_ioctl_cmd_issue(struct mmc_queue *mq, struct request *req)
+{
+   struct mmc_queue_req *mq_rq;
+   struct mmc_blk_ioc_data *idata;
+   struct mmc_card *card = mq->card;
+   struct mmc_blk_data *md = mq->blkdata;
+   int ioc_err;
+
+   mq_rq = req_to_mmc_queue_req(req);
+   idata = mq_rq->idata;
+   ioc_err = __mmc_blk_ioctl_cmd(card, md, idata);
+   mq_rq->ioc_result = ioc_err;
+
+   /* Always switch back to main area after RPMB access */
+   if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
+   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
+
+   blk_end_request_all(req, ioc_err);
+}
+
 static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
   struct mmc_ioc_multi_cmd __user *user)
 {
@@ -1854,23 +1882,54 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct 
request *req)
goto out;
}
 
-   if (req && req_op(req) == REQ_OP_DISCARD) {
-   /* complete ongoing async transfer before issuing discard */
-   if (mq->qcnt)
-   

Re: Outstanding MQ questions from MMC

2017-05-18 Thread Linus Walleij
On Tue, Apr 18, 2017 at 5:31 PM, Alex Lemberg  wrote:

> There is an additional functionality, which is require the host lock
> to be held for several write commands - the FFU.
> In case of FFU, the FW can be download/write in several iterations
> of Write command (CMD25). This sequence should not be interrupted by regular
> Write requests.
> In current driver, both FFU and RPMB can be sent by using 
> mmc_blk_ioctl_multi_cmd().

Both single and multi ioctl()s are funneled into the block layer
using the driver-specific request ops in the latest iteration of
my patch set.
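
The funnel is the standard custom-request pattern; for the
single-command ioctl() it ends up looking like this (from the v2
series):

    req = blk_get_request(mq->queue,
            idata->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
            __GFP_RECLAIM);
    req_to_mmc_queue_req(req)->idata = idata;
    blk_execute_rq(mq->queue, NULL, req, 0);
    ioc_err = req_to_mmc_queue_req(req)->ioc_result;
    err = mmc_blk_ioctl_copy_to_user(ic_ptr, idata);
    blk_put_request(req);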

It turns out this was a simpler change than I thought.

If you check the patch set, you will see that it also fixes a userspace
starvation issue when issuing ioctl()s such as for RPMB
during heavy block I/O.

This usecase:

> dd if=/dev/mmcblk3 of=/dev/null bs=1M &
> mmc extcsd read /dev/mmcblk3

This would previously hang until the dd command was complete before
issuing the ioctl() command, just waiting for the host lock.
I guess RPMB has the same problem...

It is now fixed.

If you can verify the v2 patch set (just posted) and provide Tested-by's
(or bug reports...) it's appreciated.

Yours,
Linus Walleij


Re: [PATCH 2/6 v2] mmc: core: Allocate per-request data using the block layer core

2017-05-18 Thread Linus Walleij
On Thu, May 18, 2017 at 11:32 AM, Christoph Hellwig  wrote:

> Btw, you can also remove the struct request backpointer in
> struct mmc_queue_req now - blk_mq_rq_from_pdu will do it for you
> without the need for a pointer.

Thanks I made a patch for this in the front of my next clean-up
series.

Yours,
Linus Walleij


Re: [PATCH 2/5] mmc: core: Allocate per-request data using the block layer core

2017-05-18 Thread Linus Walleij
On Thu, May 18, 2017 at 2:42 PM, Adrian Hunter  wrote:
> On 18/05/17 11:21, Linus Walleij wrote:

>> It's better to do what everyone else does and let the core do this
>> allocation of extra data (tag) instead.
>
> I agree it is much nicer, but the extra bounce buffer allocations still seem
> gratuitous.  Maybe we should allocate them as needed from a memory pool,
> instead of for every request.

Incidentally, IIRC that is what happens when we migrate to MQ.
In the old block layer, the per-request data is indeed initialized for
every request as you say, but in MQ the same struct request *'s
are reused from a pool; they are only initialized once, i.e. when
you add the block device.

(If I remember my logs right.)
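
Roughly: with MQ the per-request area is carved out of the tag set up
front, along the lines of (a sketch, not code from this series)

    set->ops = &mmc_mq_ops;
    set->cmd_size = sizeof(struct mmc_queue_req);
    ret = blk_mq_alloc_tag_set(set);
    mq->queue = blk_mq_init_queue(set);

so the requests and their pdu:s are preallocated once and then
recycled for the lifetime of the queue.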

Yours,
Linus Walleij


[PATCH 0/6] More MMC block core refactorings

2017-05-19 Thread Linus Walleij
This series builds on top of the previous series that created
custom DRV_OP requests for ioctl() operations in MMC.

The first patch is a suggestion from Christoph, the second
builds infrastructure for issuing more, currently orthogonal
custom operations through the block layer.

The first operation we move over is pretty uncontroversial
and straightforward: it is the operation that
write-protect-locks the boot partitions from sysfs. This
is now done through the block layer so we do not need
to congest and starve in the big MMC lock.

The last two patches are more controversial: they move the
two debugfs accesses for reading card status and EXT CSD
over to using the block layer funnel *if* *present*.

So if the block layer is configured out, these will still
issue operations directly and take the big MMC lock.

The patch series is fully ABI safe: any scripts or code
using the debugfs with or without the block layer will
still work.

However, this leaves the mmc_get_card() locks in the block.h
header for the !CONFIG_MMC_BLOCK case, and I'm not really happy
to keep them around; the idea is to terminate them.

Ways forward after these patches:

- Simply remove the debugfs files for status and ext_csd if
  the block layer is not there. The debugfs is not ABI after
  all, and there is an ioctl() to do the same job, and
  that is what mmc-utils is using.

- Simply remove the debugfs files for status and ext_csd
  completely - and require users to switch to using the
  ioctl() mmc-utils way of doing things if they want to
  inspect their MMC/SD cards.

- Wait and see: when I get to removing the big MMC lock from
  SDIO I will have to deal with this mess anyway, since
  the big lock is not merely a block layer problem, but a
  problem for the entire MMC/SD/SDIO stack.

In any case: these patches fix the starvation of the
boot partition locking and the debugfs access when using
the block layer heavily at the same time.

Linus Walleij (6):
  mmc: block: remove req back pointer
  mmc: block: Tag DRV_OPs with a driver operation type
  mmc: block: Move DRV OP issue function
  mmc: block: Move boot partition locking into a driver op
  mmc: debugfs: Move card status retrieval into the block layer
  mmc: debugfs: Move EXT CSD debugfs access to block layer

 drivers/mmc/core/block.c   | 168 -
 drivers/mmc/core/block.h   |  49 +
 drivers/mmc/core/debugfs.c |  19 +
 drivers/mmc/core/queue.c   |  13 ++--
 drivers/mmc/core/queue.h   |  27 +++-
 5 files changed, 200 insertions(+), 76 deletions(-)

-- 
2.9.3



[PATCH 2/6] mmc: block: Tag DRV_OPs with a driver operation type

2017-05-19 Thread Linus Walleij
We will expand the DRV_OP usage, so we need to know which
operation we're performing. Tag the operations with an
enumerated type, rename the function so it is clear that
it deals with any command, and put a switch statement in
it. Currently only ioctls are supported.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 37 -
 drivers/mmc/core/queue.h |  9 +
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 553ab4d1db94..b24e7f5171c9 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -602,6 +602,7 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
idata->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
__GFP_RECLAIM);
idatas[0] = idata;
+   req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_IOCTL;
req_to_mmc_queue_req(req)->idata = idatas;
req_to_mmc_queue_req(req)->ioc_count = 1;
blk_execute_rq(mq->queue, NULL, req, 0);
@@ -618,11 +619,11 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
 }
 
 /*
- * The ioctl commands come back from the block layer after it queued it and
+ * The non-block commands come back from the block layer after it queued it and
  * processed it with all other requests and then they get issued in this
  * function.
  */
-static void mmc_blk_ioctl_cmd_issue(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
 {
struct mmc_queue_req *mq_rq;
struct mmc_card *card = mq->card;
@@ -631,18 +632,27 @@ static void mmc_blk_ioctl_cmd_issue(struct mmc_queue *mq, 
struct request *req)
int i;
 
mq_rq = req_to_mmc_queue_req(req);
-   for (i = 0; i < mq_rq->ioc_count; i++) {
-   ioc_err = __mmc_blk_ioctl_cmd(card, md, mq_rq->idata[i]);
-   if (ioc_err)
-   break;
-   }
-   mq_rq->ioc_result = ioc_err;
 
-   /* Always switch back to main area after RPMB access */
-   if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
-   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
+   switch (mq_rq->drv_op) {
+   case MMC_DRV_OP_IOCTL:
+   for (i = 0; i < mq_rq->ioc_count; i++) {
+   ioc_err =
+   __mmc_blk_ioctl_cmd(card, md, mq_rq->idata[i]);
+   if (ioc_err)
+   break;
+   }
+   mq_rq->ioc_result = ioc_err;
+
+   /* Always switch back to main area after RPMB access */
+   if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
+   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
 
-   blk_end_request_all(req, ioc_err);
+   blk_end_request_all(req, ioc_err);
+   break;
+   default:
+   /* Unknown operation */
+   break;
+   }
 }
 
 static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
@@ -705,6 +715,7 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device 
*bdev,
req = blk_get_request(mq->queue,
idata[0]->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
__GFP_RECLAIM);
+   req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_IOCTL;
req_to_mmc_queue_req(req)->idata = idata;
req_to_mmc_queue_req(req)->ioc_count = num_of_cmds;
blk_execute_rq(mq->queue, NULL, req, 0);
@@ -1904,7 +1915,7 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct 
request *req)
 */
if (mq->qcnt)
mmc_blk_issue_rw_rq(mq, NULL);
-   mmc_blk_ioctl_cmd_issue(mq, req);
+   mmc_blk_issue_drv_op(mq, req);
break;
case REQ_OP_DISCARD:
/*
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 2793020a3c8c..1e6062eb3e07 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -32,6 +32,14 @@ struct mmc_blk_request {
int retune_retry_done;
 };
 
+/**
+ * enum mmc_drv_op - enumerates the operations in the mmc_queue_req
+ * @MMC_DRV_OP_IOCTL: ioctl operation
+ */
+enum mmc_drv_op {
+   MMC_DRV_OP_IOCTL,
+};
+
 struct mmc_queue_req {
struct mmc_blk_request  brq;
struct scatterlist  *sg;
@@ -39,6 +47,7 @@ struct mmc_queue_req {
struct scatterlist  *bounce_sg;
unsigned intbounce_sg_len;
struct mmc_async_reqareq;
+   enum mmc_drv_op drv_op;
int ioc_result;
struct mmc_blk_ioc_data **idata;
unsigned intioc_count;
-- 
2.9.3



[PATCH 5/6] mmc: debugfs: Move card status retrieval into the block layer

2017-05-19 Thread Linus Walleij
The debugfs file "status" (in e.g. /debug/mmc3/mmc3:0001) is
only available if the card used is an (e)MMC or
SD card, not SDIO, as can be seen from this guard in
mmc_add_card_debugfs():

if (mmc_card_mmc(card) || mmc_card_sd(card))
  (...create debugfs "status" entry...)

Further this debugfs entry suffers from all the same starvation
issues as the other userspace things, under e.g. a heavy
dd operation.

It is therefore logical to move this over to the block layer
when it is enabled, using the new custom requests and issuing
it through the block request queue.

This makes this debugfs card access land under the request
queue host lock instead of orthogonally taking the lock.

Tested during heavy dd load by cat'ing the status file. We
add IS_ENABLED() guards and keep the code snippet just
issuing the card status request as a static inline in the header,
so we can still have card status working when the block
layer is compiled out.

Keeping two copies of mmc_dbg_card_status_get() around
seems to be a necessary evil to be able to have the MMC/SD
stack working with the block layer disabled: under these
circumstances, the code must simply take another path.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c   | 28 
 drivers/mmc/core/block.h   | 37 +
 drivers/mmc/core/debugfs.c | 15 ++-
 drivers/mmc/core/queue.h   |  2 ++
 4 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 52635120a0a5..8858798d1349 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1191,6 +1191,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, 
struct request *req)
struct mmc_queue_req *mq_rq;
struct mmc_card *card = mq->card;
struct mmc_blk_data *md = mq->blkdata;
+   u32 status;
int ret;
int i;
 
@@ -1219,6 +1220,11 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, 
struct request *req)
card->ext_csd.boot_ro_lock |=
EXT_CSD_BOOT_WP_B_PWR_WP_EN;
break;
+   case MMC_DRV_OP_GET_CARD_STATUS:
+   ret = mmc_send_status(card, &status);
+   if (!ret)
+   ret = status;
+   break;
default:
pr_err("%s: unknown driver specific operation\n",
   md->disk->disk_name);
@@ -1968,6 +1974,28 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct 
request *req)
mmc_put_card(card);
 }
 
+/* Called from debugfs for MMC/SD cards */
+int mmc_blk_card_status_get(struct mmc_card *card, u64 *val)
+{
+   struct mmc_blk_data *md = dev_get_drvdata(&card->dev);
+   struct mmc_queue *mq = &md->queue;
+   struct request *req;
+   int ret;
+
+   /* Ask the block layer about the card status */
+   req = blk_get_request(mq->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+   req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_GET_CARD_STATUS;
+   blk_execute_rq(mq->queue, NULL, req, 0);
+   ret = req_to_mmc_queue_req(req)->drv_op_result;
+   if (ret >= 0) {
+   *val = ret;
+   ret = 0;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL(mmc_blk_card_status_get);
+
 static inline int mmc_blk_readonly(struct mmc_card *card)
 {
return mmc_card_readonly(card) ||
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index 860ca7c8df86..1e26755a864b 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -1,9 +1,46 @@
 #ifndef _MMC_CORE_BLOCK_H
 #define _MMC_CORE_BLOCK_H
 
+#include 
+#include "core.h"
+#include "mmc_ops.h"
+
 struct mmc_queue;
 struct request;
 
+#if IS_ENABLED(CONFIG_MMC_BLOCK)
+
 void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req);
+int mmc_blk_card_status_get(struct mmc_card *card, u64 *val);
+
+#else
+
+/*
+ * Small stub functions to be used when the block layer is not
+ * enabled, e.g. for pure SDIO without MMC/SD configurations.
+ */
+
+static inline void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
+{
+   return;
+}
+
+static inline int mmc_blk_card_status_get(struct mmc_card *card, u64 *val)
+{
+   u32 status;
+   int ret;
+
+   mmc_get_card(card);
+
+   ret = mmc_send_status(card, &status);
+   if (!ret)
+   *val = status;
+
+   mmc_put_card(card);
+
+   return ret;
+}
+
+#endif /* IS_ENABLED(CONFIG_MMC_BLOCK) */
 
 #endif
diff --git a/drivers/mmc/core/debugfs.c b/drivers/mmc/core/debugfs.c
index a1fba5732d66..ce5b921c7d96 100644
--- a/drivers/mmc/core/debugfs.c
+++ b/drivers/mmc/core/debugfs.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 
+#include "block.h"
 #include "core.h"
 #include "card.h"
 #include "host.h"
@@ -283,19 +284,7 @@ vo

[PATCH 3/6] mmc: block: Move DRV OP issue function

2017-05-19 Thread Linus Walleij
We will need to access static functions defined above the pure
block layer operations in this file, so move the driver operations
issue function down to where it can see all non-block-layer symbols.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 74 
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index b24e7f5171c9..75b1baacf28b 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -618,43 +618,6 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
return ioc_err ? ioc_err : err;
 }
 
-/*
- * The non-block commands come back from the block layer after it queued it and
- * processed it with all other requests and then they get issued in this
- * function.
- */
-static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
-{
-   struct mmc_queue_req *mq_rq;
-   struct mmc_card *card = mq->card;
-   struct mmc_blk_data *md = mq->blkdata;
-   int ioc_err;
-   int i;
-
-   mq_rq = req_to_mmc_queue_req(req);
-
-   switch (mq_rq->drv_op) {
-   case MMC_DRV_OP_IOCTL:
-   for (i = 0; i < mq_rq->ioc_count; i++) {
-   ioc_err =
-   __mmc_blk_ioctl_cmd(card, md, mq_rq->idata[i]);
-   if (ioc_err)
-   break;
-   }
-   mq_rq->ioc_result = ioc_err;
-
-   /* Always switch back to main area after RPMB access */
-   if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
-   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
-
-   blk_end_request_all(req, ioc_err);
-   break;
-   default:
-   /* Unknown operation */
-   break;
-   }
-}
-
 static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
   struct mmc_ioc_multi_cmd __user *user)
 {
@@ -1222,6 +1185,43 @@ int mmc_access_rpmb(struct mmc_queue *mq)
return false;
 }
 
+/*
+ * The non-block commands come back from the block layer after it queued it and
+ * processed it with all other requests and then they get issued in this
+ * function.
+ */
+static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
+{
+   struct mmc_queue_req *mq_rq;
+   struct mmc_card *card = mq->card;
+   struct mmc_blk_data *md = mq->blkdata;
+   int ioc_err;
+   int i;
+
+   mq_rq = req_to_mmc_queue_req(req);
+
+   switch (mq_rq->drv_op) {
+   case MMC_DRV_OP_IOCTL:
+   for (i = 0; i < mq_rq->ioc_count; i++) {
+   ioc_err =
+   __mmc_blk_ioctl_cmd(card, md, mq_rq->idata[i]);
+   if (ioc_err)
+   break;
+   }
+   mq_rq->ioc_result = ioc_err;
+
+   /* Always switch back to main area after RPMB access */
+   if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
+   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
+
+   blk_end_request_all(req, ioc_err);
+   break;
+   default:
+   /* Unknown operation */
+   break;
+   }
+}
+
 static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
 {
struct mmc_blk_data *md = mq->blkdata;
-- 
2.9.3



[PATCH 4/6] mmc: block: Move boot partition locking into a driver op

2017-05-19 Thread Linus Walleij
This moves the boot partition lock command (issued from sysfs)
into a custom block layer request, just like the ioctl()s,
getting rid of yet another instance of mmc_get_card().

Since we now have two operations issuing special DRV_OP's, we
rename the result variable ->drv_op_result.

Tested by locking the boot partition from userspace:
> cd /sys/devices/platform/soc/80114000.sdi4_per2/mmc_host/mmc3/mmc3:0001/block/mmcblk3/mmcblk3boot0
> echo 1 > ro_lock_until_next_power_on
[  178.645324] mmcblk3boot1: Locking boot partition ro until next power on
[  178.652221] mmcblk3boot0: Locking boot partition ro until next power on

Also tested this with a huge dd job in the background: it
is now possible to lock the boot partitions on the card even
under heavy I/O.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 53 +++-
 drivers/mmc/core/queue.h |  4 +++-
 2 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 75b1baacf28b..52635120a0a5 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -190,6 +190,8 @@ static ssize_t power_ro_lock_store(struct device *dev,
int ret;
struct mmc_blk_data *md, *part_md;
struct mmc_card *card;
+   struct mmc_queue *mq;
+   struct request *req;
unsigned long set;
 
if (kstrtoul(buf, 0, &set))
@@ -199,20 +201,14 @@ static ssize_t power_ro_lock_store(struct device *dev,
return count;
 
md = mmc_blk_get(dev_to_disk(dev));
+   mq = &md->queue;
card = md->queue.card;
 
-   mmc_get_card(card);
-
-   ret = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, EXT_CSD_BOOT_WP,
-   card->ext_csd.boot_ro_lock |
-   EXT_CSD_BOOT_WP_B_PWR_WP_EN,
-   card->ext_csd.part_time);
-   if (ret)
-   pr_err("%s: Locking boot partition ro until next power on 
failed: %d\n", md->disk->disk_name, ret);
-   else
-   card->ext_csd.boot_ro_lock |= EXT_CSD_BOOT_WP_B_PWR_WP_EN;
-
-   mmc_put_card(card);
+   /* Dispatch locking to the block layer */
+   req = blk_get_request(mq->queue, REQ_OP_DRV_OUT, __GFP_RECLAIM);
+   req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_BOOT_WP;
+   blk_execute_rq(mq->queue, NULL, req, 0);
+   ret = req_to_mmc_queue_req(req)->drv_op_result;
 
if (!ret) {
pr_info("%s: Locking boot partition ro until next power on\n",
@@ -606,7 +602,7 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
req_to_mmc_queue_req(req)->idata = idatas;
req_to_mmc_queue_req(req)->ioc_count = 1;
blk_execute_rq(mq->queue, NULL, req, 0);
-   ioc_err = req_to_mmc_queue_req(req)->ioc_result;
+   ioc_err = req_to_mmc_queue_req(req)->drv_op_result;
err = mmc_blk_ioctl_copy_to_user(ic_ptr, idata);
blk_put_request(req);
 
@@ -682,7 +678,7 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device 
*bdev,
req_to_mmc_queue_req(req)->idata = idata;
req_to_mmc_queue_req(req)->ioc_count = num_of_cmds;
blk_execute_rq(mq->queue, NULL, req, 0);
-   ioc_err = req_to_mmc_queue_req(req)->ioc_result;
+   ioc_err = req_to_mmc_queue_req(req)->drv_op_result;
 
/* copy to user if data and response */
for (i = 0; i < num_of_cmds && !err; i++)
@@ -1195,7 +1191,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, 
struct request *req)
struct mmc_queue_req *mq_rq;
struct mmc_card *card = mq->card;
struct mmc_blk_data *md = mq->blkdata;
-   int ioc_err;
+   int ret;
int i;
 
mq_rq = req_to_mmc_queue_req(req);
@@ -1203,23 +1199,34 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, 
struct request *req)
switch (mq_rq->drv_op) {
case MMC_DRV_OP_IOCTL:
for (i = 0; i < mq_rq->ioc_count; i++) {
-   ioc_err =
-   __mmc_blk_ioctl_cmd(card, md, mq_rq->idata[i]);
-   if (ioc_err)
+   ret = __mmc_blk_ioctl_cmd(card, md, mq_rq->idata[i]);
+   if (ret)
break;
}
-   mq_rq->ioc_result = ioc_err;
-
/* Always switch back to main area after RPMB access */
if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
-
-   blk_end_request_all(req, ioc_err);
+   break;
+   case MMC_DRV_OP_BOOT_WP:
+   ret = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, EXT_CSD_BOOT_WP,
+card->ext_csd

[PATCH 1/6] mmc: block: remove req back pointer

2017-05-19 Thread Linus Walleij
Just as we can use blk_mq_rq_to_pdu() to get the per-request
tag, we can use blk_mq_rq_from_pdu() to get a request from a tag.
Introduce a static inline helper so we are clear about what
is happening.

Suggested-by: Christoph Hellwig 
Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c |  8 
 drivers/mmc/core/queue.c | 13 +
 drivers/mmc/core/queue.h |  8 +++-
 3 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index e9737987956f..553ab4d1db94 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1366,7 +1366,7 @@ static enum mmc_blk_status mmc_blk_err_check(struct 
mmc_card *card,
struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
areq);
struct mmc_blk_request *brq = &mq_mrq->brq;
-   struct request *req = mq_mrq->req;
+   struct request *req = mmc_queue_req_to_req(mq_mrq);
int need_retune = card->host->need_retune;
bool ecc_err = false;
bool gen_err = false;
@@ -1473,7 +1473,7 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, 
struct mmc_queue_req *mqrq,
struct mmc_blk_data *md = mq->blkdata;
struct mmc_card *card = md->queue.card;
struct mmc_blk_request *brq = &mqrq->brq;
-   struct request *req = mqrq->req;
+   struct request *req = mmc_queue_req_to_req(mqrq);
 
/*
 * Reliable writes are used to implement Forced Unit Access and
@@ -1578,7 +1578,7 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 {
u32 readcmd, writecmd;
struct mmc_blk_request *brq = &mqrq->brq;
-   struct request *req = mqrq->req;
+   struct request *req = mmc_queue_req_to_req(mqrq);
struct mmc_blk_data *md = mq->blkdata;
bool do_rel_wr, do_data_tag;
 
@@ -1760,7 +1760,7 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
 */
mq_rq = container_of(old_areq, struct mmc_queue_req, areq);
brq = &mq_rq->brq;
-   old_req = mq_rq->req;
+   old_req = mmc_queue_req_to_req(mq_rq);
type = rq_data_dir(old_req) == READ ? MMC_BLK_READ : 
MMC_BLK_WRITE;
mmc_queue_bounce_post(mq_rq);
 
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index c18c41289ecf..4bf9978b707a 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -184,8 +184,6 @@ static int mmc_init_request(struct request_queue *q, struct 
request *req,
struct mmc_card *card = mq->card;
struct mmc_host *host = card->host;
 
-   mq_rq->req = req;
-
if (card->bouncesz) {
mq_rq->bounce_buf = kmalloc(card->bouncesz, gfp);
if (!mq_rq->bounce_buf)
@@ -223,8 +221,6 @@ static void mmc_exit_request(struct request_queue *q, 
struct request *req)
 
kfree(mq_rq->sg);
mq_rq->sg = NULL;
-
-   mq_rq->req = NULL;
 }
 
 /**
@@ -374,12 +370,13 @@ unsigned int mmc_queue_map_sg(struct mmc_queue *mq, 
struct mmc_queue_req *mqrq)
unsigned int sg_len;
size_t buflen;
struct scatterlist *sg;
+   struct request *req = mmc_queue_req_to_req(mqrq);
int i;
 
if (!mqrq->bounce_buf)
-   return blk_rq_map_sg(mq->queue, mqrq->req, mqrq->sg);
+   return blk_rq_map_sg(mq->queue, req, mqrq->sg);
 
-   sg_len = blk_rq_map_sg(mq->queue, mqrq->req, mqrq->bounce_sg);
+   sg_len = blk_rq_map_sg(mq->queue, req, mqrq->bounce_sg);
 
mqrq->bounce_sg_len = sg_len;
 
@@ -401,7 +398,7 @@ void mmc_queue_bounce_pre(struct mmc_queue_req *mqrq)
if (!mqrq->bounce_buf)
return;
 
-   if (rq_data_dir(mqrq->req) != WRITE)
+   if (rq_data_dir(mmc_queue_req_to_req(mqrq)) != WRITE)
return;
 
sg_copy_to_buffer(mqrq->bounce_sg, mqrq->bounce_sg_len,
@@ -417,7 +414,7 @@ void mmc_queue_bounce_post(struct mmc_queue_req *mqrq)
if (!mqrq->bounce_buf)
return;
 
-   if (rq_data_dir(mqrq->req) != READ)
+   if (rq_data_dir(mmc_queue_req_to_req(mqrq)) != READ)
return;
 
sg_copy_from_buffer(mqrq->bounce_sg, mqrq->bounce_sg_len,
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index dfe481a8b5ed..2793020a3c8c 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -12,6 +12,13 @@ static inline struct mmc_queue_req 
*req_to_mmc_queue_req(struct request *rq)
return blk_mq_rq_to_pdu(rq);
 }
 
+struct mmc_queue_req;
+
+static inline struct request *mmc_queue_req_to_req(struct mmc_queue_req *mqr)
+{
+   return blk_mq_rq_from_pdu(mqr);
+}
+
 struct task_struct;
 struct mmc_blk_data;
 struct mmc_blk_ioc_data;
@

[PATCH 6/6] mmc: debugfs: Move EXT CSD debugfs access to block layer

2017-05-19 Thread Linus Walleij
Just like the previous commit moving status retrieval for
MMC and SD cards into the block layer (when active), this
moves the retrieval of the EXT CSD from the card, from
the special ext_csd file, into the block stack as well.

Again special care is taken to make the debugfs work even
with the block layer disabled. Again this solves a
starvation issue during heavy block workloads.

It has been tested with and without the block layer and
during heavy load from dd.

Since we can't keep adding weirdo data pointers into
struct mmc_queue_req this converts the
struct mmc_blk_ioc_data **idata pointer to a simple
void *drv_op_data that gets cast into whatever data
the driver-specific command needs to pass, and then I
cast it to the right target type in the sending and
receiving functions.
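
To make the casting pattern concrete, here is a condensed sketch
(assembled from the hunks below, not a complete listing): the
issuing side stores a typed pointer behind the void * member, and
mmc_blk_issue_drv_op() casts it back keyed on the drv_op tag:

	/* Issuer: stash the typed pointer behind the void * member */
	req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_GET_EXT_CSD;
	req_to_mmc_queue_req(req)->drv_op_data = &ext_csd;
	blk_execute_rq(mq->queue, NULL, req, 0);

	/* Handler: cast back to the real type, keyed on drv_op */
	case MMC_DRV_OP_GET_EXT_CSD:
		ext_csd = mq_rq->drv_op_data; /* void * back to u8 ** */
		ret = mmc_get_ext_csd(card, ext_csd);
		break;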

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c   | 30 +++---
 drivers/mmc/core/block.h   | 12 
 drivers/mmc/core/debugfs.c |  4 +---
 drivers/mmc/core/queue.h   |  4 +++-
 4 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 8858798d1349..5be7f06d4ecd 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -599,7 +599,7 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
__GFP_RECLAIM);
idatas[0] = idata;
req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_IOCTL;
-   req_to_mmc_queue_req(req)->idata = idatas;
+   req_to_mmc_queue_req(req)->drv_op_data = idatas;
req_to_mmc_queue_req(req)->ioc_count = 1;
blk_execute_rq(mq->queue, NULL, req, 0);
ioc_err = req_to_mmc_queue_req(req)->drv_op_result;
@@ -675,7 +675,7 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
idata[0]->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
__GFP_RECLAIM);
req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_IOCTL;
-   req_to_mmc_queue_req(req)->idata = idata;
+   req_to_mmc_queue_req(req)->drv_op_data = idata;
req_to_mmc_queue_req(req)->ioc_count = num_of_cmds;
blk_execute_rq(mq->queue, NULL, req, 0);
ioc_err = req_to_mmc_queue_req(req)->drv_op_result;
@@ -1191,6 +1191,8 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
struct mmc_queue_req *mq_rq;
struct mmc_card *card = mq->card;
struct mmc_blk_data *md = mq->blkdata;
+   struct mmc_blk_ioc_data **idata;
+   u8 **ext_csd;
u32 status;
int ret;
int i;
@@ -1199,8 +1201,9 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
 
switch (mq_rq->drv_op) {
case MMC_DRV_OP_IOCTL:
+   idata = mq_rq->drv_op_data;
for (i = 0; i < mq_rq->ioc_count; i++) {
-   ret = __mmc_blk_ioctl_cmd(card, md, mq_rq->idata[i]);
+   ret = __mmc_blk_ioctl_cmd(card, md, idata[i]);
if (ret)
break;
}
@@ -1225,6 +1228,10 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
if (!ret)
ret = status;
break;
+   case MMC_DRV_OP_GET_EXT_CSD:
+   ext_csd = mq_rq->drv_op_data;
+   ret = mmc_get_ext_csd(card, ext_csd);
+   break;
default:
pr_err("%s: unknown driver specific operation\n",
   md->disk->disk_name);
@@ -1996,6 +2003,23 @@ int mmc_blk_card_status_get(struct mmc_card *card, u64 *val)
 }
 EXPORT_SYMBOL(mmc_blk_card_status_get);
 
+
+/* Called from debugfs for MMC cards */
+int mmc_blk_get_ext_csd(struct mmc_card *card, u8 **ext_csd)
+{
+   struct mmc_blk_data *md = dev_get_drvdata(&card->dev);
+   struct mmc_queue *mq = &md->queue;
+   struct request *req;
+
+   /* Ask the block layer about the EXT CSD */
+   req = blk_get_request(mq->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+   req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_GET_EXT_CSD;
+   req_to_mmc_queue_req(req)->drv_op_data = ext_csd;
+   blk_execute_rq(mq->queue, NULL, req, 0);
+   return req_to_mmc_queue_req(req)->drv_op_result;
+}
+EXPORT_SYMBOL(mmc_blk_get_ext_csd);
+
 static inline int mmc_blk_readonly(struct mmc_card *card)
 {
return mmc_card_readonly(card) ||
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index 1e26755a864b..c85c3b71dcad 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -12,6 +12,7 @@ struct request;
 
 void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req);
 int mmc_blk_card_status_get(struct mmc_card *card, u64 *val);
+int mmc_blk_get_ext_csd(struct mmc_card *card, u8 **ext_csd);
 
 #else
 
@@ -41,6 +42,17 @@ static inline int mmc_blk_card_status_get(struct mmc_card *card, u64 *val)

Re: [PATCH 1/6 v2] mmc: core: Delete bounce buffer Kconfig option

2017-05-19 Thread Linus Walleij
On Fri, May 19, 2017 at 10:30 AM, Ulf Hansson  wrote:

> Thanks, the *series* applied for next! (Responding to patch1 as
> couldn't find the cover-letter for v2).

Awesome, and just to make sure you will not be bored over the weekend
I just sent a sequel series expanding the use of per-request data
to move more host locking and congestion out of the way.

Yours,
Linus Walleij


Re: [RFC PATCH] mmc: core: Remove CONFIG_MMC_BLOCK_BOUNCE option.

2017-05-23 Thread Linus Walleij
On Fri, May 19, 2017 at 9:30 AM, Steven J. Hill  wrote:

> Remove MMC bounce buffer config option and associated code. This
> is proposed in addition to Linus' changes to remove the config
> option. I have tested this on our Octeon hardware platforms.
>
> Signed-off-by: Steven J. Hill 

This would have to be rebased as Ulf merged my patch making this
a per-host runtime config option. (The Kconfig is gone for example.)

Bounce buffers were added by Pierre Ossman for kernel 2.6.23 in
commit 98ccf14909ba02a41c5925b0b2c92aeeef23d3b9
"mmc: bounce requests for simple hosts"

Quote:

Some hosts cannot do scatter/gather in hardware. Since not doing sg
is such a big performance hit, we (optionally) bounce the requests
to a simple linear buffer that we hand over to the driver.

Signed-off-by: Pierre Ossman 

So this runs the risk of reducing performance on simple MMC/SD
controllers. Notice: simple, not old.

We need to know if people are deploying simple controllers still
and if this is something that really affects their performance.

That said: this was put in place because the kernel was sending
SG lists that the host DMA could not manage.

Nowadays we have two mechanisms:

- DMA engine and DMA-API that help out in managing bounce
  buffers when used. This means this is only useful for hardware
  that does autonomous DMA, without any separate DMA engine.

- CMA that can actually allocate a big chunk of memory: I think
  this original code is restricted to a 64KB segment because
  kmalloc() will only guarantee contiguous physical memory up to
  64-128KiB or so. Now we could actually allocate a big badass
  CMA buffer if that improves the performance, and that would be
  a per-host setting.
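
For reference, the bounce path being discussed is in the end just a
copy between the scatterlist and the linear buffer; a condensed
sketch of what mmc_queue_bounce_pre() in queue.c does today:

	/* Flatten the scattered pages into the linear bounce buffer
	 * before a write is handed to the host driver; the read side
	 * does the reverse copy in mmc_queue_bounce_post(). */
	if (mqrq->bounce_buf && rq_data_dir(req) == WRITE)
		sg_copy_to_buffer(mqrq->bounce_sg, mqrq->bounce_sg_len,
				  mqrq->bounce_buf, mqrq->sg[0].length);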

It would be good to hear from people seeing benefits from bounce
buffers about this. What hardware is there that actually sees a
significant improvement with bounce buffers?

Pierre, what host were you developing this for? Maybe I can try
to get the same and test it.

Yours,
Linus Walleij


Re: [PATCH 5/6] mmc: debugfs: Move card status retrieval into the block layer

2017-05-23 Thread Linus Walleij
On Mon, May 22, 2017 at 9:42 AM, Ulf Hansson  wrote:
> On 19 May 2017 at 15:37, Linus Walleij  wrote:

>> The debugfs file "status" (in e.g. /debug/mmc3/mmc3:0001) is
>> only available if and only if the card used is an (e)MMC or
>> SD card, not for SDIO, as can be seen from this guard in
>> mmc_add_card_debugfs();
>>
>> if (mmc_card_mmc(card) || mmc_card_sd(card))
>>   (...create debugfs "status" entry...)
>>
>> Further this debugfs entry suffers from all the same starvation
>> issues as the other userspace things, under e.g. a heavy
>> dd operation.
>>
>> It is therefore logical to move this over to the block layer
>> when it is enabled, using the new custom requests and issue
>> it using the block request queue.
>>
>> This makes this debugfs card access land under the request
>> queue host lock instead of orthogonally taking the lock.
>>
>> Tested during heavy dd load by cat:ing the status file. We
>> add IS_ENABLED() guards and keep the code snippet just
>> issuing the card status as a static inline in the header
>> so we can still have card status working when the block
>> layer is compiled out.
>
> Seems like a bit of re-factoring/cleanup could help here as
> preparation step. I just posted a patch [1] cleaning up how the mmc
> block layer fetches the card's status.
>
> Perhaps that could at least simplify a bit for $subject patch,
> especially since it also makes mmc_send_status() being exported.

OK if you merge your stuff I can iterate my patches on top,
no problem.

>> +#if IS_ENABLED(CONFIG_MMC_BLOCK)
>
> What's wrong with regular ifdef stubs? Why do you need IS_ENABLED()?

This is because the entire block layer can be compiled as a
module.

In that case CONFIG_MMC_BLOCK contains 'm' instead of 'y'
which confusingly does not evaluate to true in the preprocessor
(it assumes it is a misspelled 'n' I guess).

And then the autobuilders wreak havoc.

And that is why the IS_ENABLED() defines exist in the first
place IIUC.
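
To spell that out, IS_ENABLED() checks both the builtin and the
module define, so roughly (a minimal illustration of the kconfig.h
semantics, not code from this series):

	/* With CONFIG_MMC_BLOCK=m, CONFIG_MMC_BLOCK is not defined
	 * for built-in code; CONFIG_MMC_BLOCK_MODULE is defined
	 * instead, so a plain #ifdef CONFIG_MMC_BLOCK silently
	 * drops the code. */
	#if defined(CONFIG_MMC_BLOCK) || defined(CONFIG_MMC_BLOCK_MODULE)
	/* ...built for both =y and =m... */
	#endif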

I'm all for making CONFIG_MMC_BLOCK into a bool... but
don't know how people (Intel laptops) feel about that extra
code in their kernel at all times.

>>  void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req);
>
> I don't get what mmc_blk_issue_rq() has to do with this change? Could
> you please explain?

If we start doing stubs we should stub everything IMO, but if you
prefer I can make that a separate patch. Seems a bit overdone
though.

> Hmm, this thing seems a bit upside-down.
>
> Currently it's possible to build the mmc block device driver as a
> module. In cases like this, accessing the card debugs node to request
> the card's status, would trigger a call to mmc_blk_card_status_get().
> How would that work when the mmc block device driver isn't loaded and
> probed?

Module symbol resolution and driver loading is always necessary,
it is no different for e.g. a wifi driver using the 802.11 library.
It is definitely possible to shoot oneself in the foot, but I think
udev & friends usually load things in the right order?

> It seems like the life cycle of the card debugfs node is now being
> controlled as when the mmc block device driver has been successfully
> probed. We need to deal with that somehow.

Only for these two files but yes.

ext_csd is a bit dubious as it is only available on storage devices
(eMMC) that cannot be SD, SDIO or combo cards, and we could make
it only appear if using the block layer.

The card status however we need to keep if people want it.

We *COULD* consider just trashing these debugfs files. It is not
technically ABI and I wonder who is actually using them.

Yours,
Linus Walleij


[PATCH 1/5] mmc: block: Move duplicate check

2017-06-15 Thread Linus Walleij
mmc_blk_ioctl() calls either mmc_blk_ioctl_cmd() or
mmc_blk_ioctl_multi_cmd() and each of these makes the same
check. Factor it into a new helper function, call it on
both branches of the switch() statement and save a chunk
of duplicate code.

Cc: Shawn Lin 
Signed-off-by: Linus Walleij 
---
ChangeLog v1->v2:
- We need to check the block device only if an actual
  well-known ioctl() is coming in, on the path of the
  switch() statements, only those branches that handle
  actual ioctl()s. Create a new helper function to check
  the block device and call that.
---
 drivers/mmc/core/block.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 1ce6012ce3c1..d1b824e65590 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -566,14 +566,6 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
int err = 0, ioc_err = 0;
struct request *req;
 
-   /*
-* The caller must have CAP_SYS_RAWIO, and must be calling this on the
-* whole block device, not on a partition.  This prevents overspray
-* between sibling partitions.
-*/
-   if ((!capable(CAP_SYS_RAWIO)) || (bdev != bdev->bd_contains))
-   return -EPERM;
-
idata = mmc_blk_ioctl_copy_from_user(ic_ptr);
if (IS_ERR(idata))
return PTR_ERR(idata);
@@ -626,14 +618,6 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
__u64 num_of_cmds;
struct request *req;
 
-   /*
-* The caller must have CAP_SYS_RAWIO, and must be calling this on the
-* whole block device, not on a partition.  This prevents overspray
-* between sibling partitions.
-*/
-   if ((!capable(CAP_SYS_RAWIO)) || (bdev != bdev->bd_contains))
-   return -EPERM;
-
if (copy_from_user(&num_of_cmds, &user->num_of_cmds,
   sizeof(num_of_cmds)))
return -EFAULT;
@@ -697,14 +681,34 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
return ioc_err ? ioc_err : err;
 }
 
+static int mmc_blk_check_blkdev(struct block_device *bdev)
+{
+   /*
+* The caller must have CAP_SYS_RAWIO, and must be calling this on the
+* whole block device, not on a partition.  This prevents overspray
+* between sibling partitions.
+*/
+   if ((!capable(CAP_SYS_RAWIO)) || (bdev != bdev->bd_contains))
+   return -EPERM;
+   return 0;
+}
+
 static int mmc_blk_ioctl(struct block_device *bdev, fmode_t mode,
unsigned int cmd, unsigned long arg)
 {
+   int ret;
+
switch (cmd) {
case MMC_IOC_CMD:
+   ret = mmc_blk_check_blkdev(bdev);
+   if (ret)
+   return ret;
return mmc_blk_ioctl_cmd(bdev,
(struct mmc_ioc_cmd __user *)arg);
case MMC_IOC_MULTI_CMD:
+   ret = mmc_blk_check_blkdev(bdev);
+   if (ret)
+   return ret;
return mmc_blk_ioctl_multi_cmd(bdev,
(struct mmc_ioc_multi_cmd __user *)arg);
default:
-- 
2.9.4



[PATCH 0/5] Convert RPMB block device to a character device

2017-06-15 Thread Linus Walleij
Looking for ways to get rid of the RPMB "block device" and the
extra block queue. This is one approach, I don't know if it will
stick, let's discuss it, especially the RFC patch.

Patches 1,2,3 can be applied as cleanups unless they collide with
something else.

Patch 5 is a consequence of the character device conversion.

For motivation and in-depth description of the problem I am trying
to solve, see patch 4, the RFC.

Linus Walleij (5):
  mmc: block: Move duplicate check
  mmc: block: Refactor mmc_blk_part_switch()
  mmc: block: Reparametrize mmc_blk_ioctl_[multi]_cmd()
  RFC: mmc: block: Convert RPMB to a character device
  mmc: block: Delete mmc_access_rpmb()

 drivers/mmc/core/block.c | 384 ---
 drivers/mmc/core/queue.c |   2 +-
 drivers/mmc/core/queue.h |   4 +-
 3 files changed, 303 insertions(+), 87 deletions(-)

-- 
2.9.4



[PATCH 2/5] mmc: block: Refactor mmc_blk_part_switch()

2017-06-15 Thread Linus Walleij
Instead of passing a struct mmc_blk_data * to mmc_blk_part_switch()
let's pass the actual partition type we want to switch to. This
is necessary in order not to have a block device with a backing
mmc_blk_data and request queue and all for every hardware partition,
such as RPMB.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index d1b824e65590..94b97f97be1a 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -127,7 +127,7 @@ module_param(perdev_minors, int, 0444);
 MODULE_PARM_DESC(perdev_minors, "Minors numbers to allocate per device");
 
 static inline int mmc_blk_part_switch(struct mmc_card *card,
- struct mmc_blk_data *md);
+ unsigned int part_type);
 
 static struct mmc_blk_data *mmc_blk_get(struct gendisk *disk)
 {
@@ -490,7 +490,7 @@ static int __mmc_blk_ioctl_cmd(struct mmc_card *card, struct mmc_blk_data *md,
 
mrq.cmd = &cmd;
 
-   err = mmc_blk_part_switch(card, md);
+   err = mmc_blk_part_switch(card, md->part_type);
if (err)
return err;
 
@@ -767,29 +767,29 @@ static int mmc_blk_part_switch_post(struct mmc_card *card,
 }
 
 static inline int mmc_blk_part_switch(struct mmc_card *card,
- struct mmc_blk_data *md)
+ unsigned int part_type)
 {
int ret = 0;
struct mmc_blk_data *main_md = dev_get_drvdata(&card->dev);
 
-   if (main_md->part_curr == md->part_type)
+   if (main_md->part_curr == part_type)
return 0;
 
if (mmc_card_mmc(card)) {
u8 part_config = card->ext_csd.part_config;
 
-   ret = mmc_blk_part_switch_pre(card, md->part_type);
+   ret = mmc_blk_part_switch_pre(card, part_type);
if (ret)
return ret;
 
part_config &= ~EXT_CSD_PART_CONFIG_ACC_MASK;
-   part_config |= md->part_type;
+   part_config |= part_type;
 
ret = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL,
 EXT_CSD_PART_CONFIG, part_config,
 card->ext_csd.part_time);
if (ret) {
-   mmc_blk_part_switch_post(card, md->part_type);
+   mmc_blk_part_switch_post(card, part_type);
return ret;
}
 
@@ -798,7 +798,7 @@ static inline int mmc_blk_part_switch(struct mmc_card *card,
ret = mmc_blk_part_switch_post(card, main_md->part_curr);
}
 
-   main_md->part_curr = md->part_type;
+   main_md->part_curr = part_type;
return ret;
 }
 
@@ -1141,7 +1141,7 @@ static int mmc_blk_reset(struct mmc_blk_data *md, struct mmc_host *host,
int part_err;
 
main_md->part_curr = main_md->part_type;
-   part_err = mmc_blk_part_switch(host->card, md);
+   part_err = mmc_blk_part_switch(host->card, md->part_type);
if (part_err) {
/*
 * We have failed to get back into the correct
@@ -1180,6 +1180,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
struct mmc_queue_req *mq_rq;
struct mmc_card *card = mq->card;
struct mmc_blk_data *md = mq->blkdata;
+   struct mmc_blk_data *main_md = dev_get_drvdata(&card->dev);
struct mmc_blk_ioc_data **idata;
u8 **ext_csd;
u32 status;
@@ -1198,7 +1199,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
}
/* Always switch back to main area after RPMB access */
if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
-   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
+   mmc_blk_part_switch(card, main_md->part_type);
break;
case MMC_DRV_OP_BOOT_WP:
ret = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, EXT_CSD_BOOT_WP,
@@ -1906,7 +1907,7 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
/* claim host only for the first request */
mmc_get_card(card);
 
-   ret = mmc_blk_part_switch(card, md);
+   ret = mmc_blk_part_switch(card, md->part_type);
if (ret) {
if (req) {
blk_end_request_all(req, -EIO);
@@ -2436,7 +2437,7 @@ static void mmc_blk_remove(struct mmc_card *card)
mmc_blk_remove_parts(card, md);
pm_runtime_get_sync(&card->dev);
mmc_claim_host(card->host);
-   mmc_blk_part_switch(card, md);
+   mmc_blk_part_switch(card, md->part_type);

[PATCH 3/5] mmc: block: Reparametrize mmc_blk_ioctl_[multi]_cmd()

2017-06-15 Thread Linus Walleij
Instead of passing a block device to
mmc_blk_ioctl[_multi]_cmd(), let's pass a struct mmc_blk_data *
so we operate ioctl()s on the MMC block device representation
rather than the vanilla block device.

This saves a little duplicated code and makes it possible to
issue ioctl()s not targeted for a specific block device but
rather for a specific partition/area.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 43 ++-
 1 file changed, 18 insertions(+), 25 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 94b97f97be1a..b8c71fdb6ed4 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -555,12 +555,11 @@ static int __mmc_blk_ioctl_cmd(struct mmc_card *card, struct mmc_blk_data *md,
return err;
 }
 
-static int mmc_blk_ioctl_cmd(struct block_device *bdev,
+static int mmc_blk_ioctl_cmd(struct mmc_blk_data *md,
 struct mmc_ioc_cmd __user *ic_ptr)
 {
struct mmc_blk_ioc_data *idata;
struct mmc_blk_ioc_data *idatas[1];
-   struct mmc_blk_data *md;
struct mmc_queue *mq;
struct mmc_card *card;
int err = 0, ioc_err = 0;
@@ -570,12 +569,6 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
if (IS_ERR(idata))
return PTR_ERR(idata);
 
-   md = mmc_blk_get(bdev->bd_disk);
-   if (!md) {
-   err = -EINVAL;
-   goto cmd_err;
-   }
-
card = md->queue.card;
if (IS_ERR(card)) {
err = PTR_ERR(card);
@@ -599,20 +592,17 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
blk_put_request(req);
 
 cmd_done:
-   mmc_blk_put(md);
-cmd_err:
kfree(idata->buf);
kfree(idata);
return ioc_err ? ioc_err : err;
 }
 
-static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
+static int mmc_blk_ioctl_multi_cmd(struct mmc_blk_data *md,
   struct mmc_ioc_multi_cmd __user *user)
 {
struct mmc_blk_ioc_data **idata = NULL;
struct mmc_ioc_cmd __user *cmds = user->cmds;
struct mmc_card *card;
-   struct mmc_blk_data *md;
struct mmc_queue *mq;
int i, err = 0, ioc_err = 0;
__u64 num_of_cmds;
@@ -638,16 +628,10 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
}
}
 
-   md = mmc_blk_get(bdev->bd_disk);
-   if (!md) {
-   err = -EINVAL;
-   goto cmd_err;
-   }
-
card = md->queue.card;
if (IS_ERR(card)) {
err = PTR_ERR(card);
-   goto cmd_done;
+   goto cmd_err;
}
 
 
@@ -670,8 +654,6 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
 
blk_put_request(req);
 
-cmd_done:
-   mmc_blk_put(md);
 cmd_err:
for (i = 0; i < num_of_cmds; i++) {
kfree(idata[i]->buf);
@@ -696,6 +678,7 @@ static int mmc_blk_check_blkdev(struct block_device *bdev)
 static int mmc_blk_ioctl(struct block_device *bdev, fmode_t mode,
unsigned int cmd, unsigned long arg)
 {
+   struct mmc_blk_data *md;
int ret;
 
switch (cmd) {
@@ -703,14 +686,24 @@ static int mmc_blk_ioctl(struct block_device *bdev, fmode_t mode,
ret = mmc_blk_check_blkdev(bdev);
if (ret)
return ret;
-   return mmc_blk_ioctl_cmd(bdev,
-   (struct mmc_ioc_cmd __user *)arg);
+   md = mmc_blk_get(bdev->bd_disk);
+   if (!md)
+   return -EINVAL;
+   ret = mmc_blk_ioctl_cmd(md,
+   (struct mmc_ioc_cmd __user *)arg);
+   mmc_blk_put(md);
+   return ret;
case MMC_IOC_MULTI_CMD:
ret = mmc_blk_check_blkdev(bdev);
if (ret)
return ret;
-   return mmc_blk_ioctl_multi_cmd(bdev,
-   (struct mmc_ioc_multi_cmd __user *)arg);
+   md = mmc_blk_get(bdev->bd_disk);
+   if (!md)
+   return -EINVAL;
+   ret = mmc_blk_ioctl_multi_cmd(md,
+   (struct mmc_ioc_multi_cmd __user *)arg);
+   mmc_blk_put(md);
+   return ret;
default:
return -EINVAL;
}
-- 
2.9.4



[PATCH 4/5] RFC: mmc: block: Convert RPMB to a character device

2017-06-15 Thread Linus Walleij
The RPMB partition on the eMMC devices is a special area used
for storing cryptographically safe information signed by a
special secret key. To write and read records from this special
area, authentication is needed.

The RPMB area is *only* and *exclusively* accessed using
ioctl():s from userspace. It is not really a block device,
as blocks cannot be read or written from the device; also,
the signed chunks that can be stored on the RPMB are actually
256 bytes, not 512, making a block device a really bad fit.

Currently the RPMB partition spawns a separate block device
named /dev/mmcblkNrpmb for each device with an RPMB partition,
including the creation of a block queue with its own kernel
thread and all overhead associated with this. On the Ux500
HREFv60 platform, for example, the two eMMCs means that two
block queues with separate threads are created for no use
whatsoever.

I have concluded that this block device design for RPMB is
actually pretty wrong. The RPMB area should have been designed
to be accessed from /dev/mmcblkN directly, using ioctl()s on
the main block device. It is however way too late to change
that, since userspace expects to open an RPMB device in
/dev/mmcblkNrpmb and we cannot break userspace.

This patch tries to amend the situation using the following
strategy:

- Stop creating a block device for the RPMB partition/area

- Instead create a custom, dynamic character device with
  the same name.

- Make this new character device support exactly the same
  set of ioctl()s as the old block device.

- Wrap the requests back to the same ioctl() handlers, but
  issue them on the block queue of the main partition/area,
  i.e. /dev/mmcblkN

We need to create a special "rpmb" bus type in order to get
udev and/or busybox hot/coldplug to instantiate the device
node properly.
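
For the curious, the skeleton this needs is small; a sketch using
the mmc_rpmb naming (mmc_rpmb_ida and MAX_DEVICES are visible in
the hunks below, the setup function name is made up for
illustration):

	/* A dynamic char device region plus a minimal bus type so
	 * udev and friends get a proper uevent for the RPMB nodes. */
	static dev_t mmc_rpmb_devt;

	static struct bus_type mmc_rpmb_bus_type = {
		.name = "mmc_rpmb",
	};

	static int mmc_rpmb_chrdev_setup(void)
	{
		int ret;

		ret = bus_register(&mmc_rpmb_bus_type);
		if (ret)
			return ret;
		/* one minor per RPMB partition, handed out via mmc_rpmb_ida */
		return alloc_chrdev_region(&mmc_rpmb_devt, 0, MAX_DEVICES,
					   "mmc_rpmb");
	}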

Before the patch, this appears in 'ps aux':

101 root   0:00 [mmcqd/2rpmb]
123 root   0:00 [mmcqd/3rpmb]

After applying the patch these surplus block queue threads
are gone, but RPMB is as usable as ever using the userspace
MMC tools, such as 'mmc rpmb read-counter'.

We instead get these dynamic devices in /dev:

brw-rw1 root root  179,   0 Jan  1  2000 mmcblk0
brw-rw1 root root  179,   1 Jan  1  2000 mmcblk0p1
brw-rw1 root root  179,   2 Jan  1  2000 mmcblk0p2
brw-rw1 root root  179,   5 Jan  1  2000 mmcblk0p5
brw-rw1 root root  179,   8 Jan  1  2000 mmcblk2
brw-rw1 root root  179,  16 Jan  1  2000 mmcblk2boot0
brw-rw1 root root  179,  24 Jan  1  2000 mmcblk2boot1
crw-rw1 root root  248,   0 Jan  1  2000 mmcblk2rpmb
brw-rw1 root root  179,  32 Jan  1  2000 mmcblk3
brw-rw1 root root  179,  40 Jan  1  2000 mmcblk3boot0
brw-rw1 root root  179,  48 Jan  1  2000 mmcblk3boot1
brw-rw1 root root  179,  33 Jan  1  2000 mmcblk3p1
crw-rw1 root root  248,   1 Jan  1  2000 mmcblk3rpmb

Notice the (248,0) and (248,1) character devices for RPMB.

Tomas Winkler 
Signed-off-by: Linus Walleij 
---
Some discussion points:

I am aware of Tomas Winkler's attempts to make RPMB handling into
its own subsystem. I have no such ambitions whatsoever, I am only
trying to sensibly accommodate what we already have and handle
our RPMB in a way that is not littering the place with weirdo
block devices.

The patch is a lot of "that should have been done differently from
the outset" and "it is not a perfect solution", I'd appreciate if
you take a look at the kernel before and after this patch and
think of it as a path forward, are things better or worse like
this, thinking toward the future.

I guess it would be nicer if I could (from the KERNEL) create a
symlink from mmcblk2rpmb -> mmcblk2 or a second mknod creating
mmcblk2rpmb with the same major/minor numbers as the main device.
I guess that can be done with udev scripts, but that breaks all
setups with old udev scripts, busybox, Android etc. So creating
a proper device seems necessary to satisfy userspace.

I haven't been able to do much testing as my RPMB-capable device
seems to be failing to do anything sensible, but I at least get
the same error codes from "mmc rpmb read-counter /dev/mmcblkNrpmb"
before/after the patch.
---
 drivers/mmc/core/block.c | 278 +++
 drivers/mmc/core/queue.h |   2 +
 2 files changed, 256 insertions(+), 24 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index b8c71fdb6ed4..0a226bc23429 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -86,6 +87,7 @@ static int max_devices;
 #define MAX_DEVICES 256
 
 static DEFINE_IDA(mmc_blk_ida);
+static DEFINE_IDA(mmc_rpmb_ida);
 
 /*
  * There is one mmc_blk_data per slot.
  */

[PATCH 5/5] mmc: block: Delete mmc_access_rpmb()

2017-06-15 Thread Linus Walleij
This function is used by the block layer queue to bail out of
requests if the current request is an RPMB request.

However this makes no sense: RPMB is only used from ioctl():s,
there are no RPMB accesses coming from the block layer.
An RPMB ioctl() always switches to the RPMB partition and
then back to the main partition before completing.

The only (possible) use of this check must have been to
duct-tape over a race between RPMB ioctl()s colliding with
concurrent non-RPMB accesses to the same device.

This could happen in the past because the RPMB device was
created as a separate block device/disk with its own submit
queue competing with the main partition, and submitting
requests in parallel. This is now gone as we removed the
offending RPMB block device in another patch.

Signed-off-by: Linus Walleij 
---
This patch is not an RFC since it is a logical consequence
of the RFC patch, not really much to discuss about it.
---
 drivers/mmc/core/block.c | 12 
 drivers/mmc/core/queue.c |  2 +-
 drivers/mmc/core/queue.h |  2 --
 3 files changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 0a226bc23429..8bb97ac3be08 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1196,18 +1196,6 @@ static inline void mmc_blk_reset_success(struct mmc_blk_data *md, int type)
md->reset_done &= ~type;
 }
 
-int mmc_access_rpmb(struct mmc_queue *mq)
-{
-   struct mmc_blk_data *md = mq->blkdata;
-   /*
-* If this is a RPMB partition access, return ture
-*/
-   if (md && md->part_type == EXT_CSD_PART_CONFIG_ACC_RPMB)
-   return true;
-
-   return false;
-}
-
 /*
  * The non-block commands come back from the block layer after it queued it and
  * processed it with all other requests and then they get issued in this
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index ba689a2ffc51..9d3de2859c33 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -32,7 +32,7 @@ static int mmc_prep_request(struct request_queue *q, struct request *req)
 {
struct mmc_queue *mq = q->queuedata;
 
-   if (mq && (mmc_card_removed(mq->card) || mmc_access_rpmb(mq)))
+   if (mq && mmc_card_removed(mq->card))
return BLKPREP_KILL;
 
req->rq_flags |= RQF_DONTPREP;
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index a2b6a9fcab01..7649ed6cbef7 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -89,6 +89,4 @@ extern unsigned int mmc_queue_map_sg(struct mmc_queue *,
 extern void mmc_queue_bounce_pre(struct mmc_queue_req *);
 extern void mmc_queue_bounce_post(struct mmc_queue_req *);
 
-extern int mmc_access_rpmb(struct mmc_queue *);
-
 #endif
-- 
2.9.4



Re: [PATCH 2/5] mmc: block: Refactor mmc_blk_part_switch()

2017-06-20 Thread Linus Walleij
On Mon, Jun 19, 2017 at 9:53 PM, Tomas Winkler  wrote:
> On Thu, Jun 15, 2017 at 3:12 PM, Linus Walleij  wrote:
>>
>> Instead of passing a struct mmc_blk_data * to mmc_blk_part_switch()
>> let's pass the actual partition type we want to switch to. This
>> is necessary in order not to have a block device with a backing
>> mmc_blk_data and request queue and all for every hardware partition,
>> such as RPMB.
>>
>> Signed-off-by: Linus Walleij 
>> ---
>>  drivers/mmc/core/block.c | 25 +
>>  1 file changed, 13 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>> index d1b824e65590..94b97f97be1a 100644
>> --- a/drivers/mmc/core/block.c
>> +++ b/drivers/mmc/core/block.c
>> @@ -127,7 +127,7 @@ module_param(perdev_minors, int, 0444);
>>  MODULE_PARM_DESC(perdev_minors, "Minors numbers to allocate per device");
>>
>>  static inline int mmc_blk_part_switch(struct mmc_card *card,
>> - struct mmc_blk_data *md);
>> + unsigned int part_type);
>
> Maybe it's time to change this misleading 'part_type' name; this is a bit
> that represents the actual partition to
> access and not a type of a partition. Maybe part_to_access would more
> closely reflect the spec wording.
> Need to change it also in mmc_blk_data;

That would be a separate patch I think (one patch, one technical step).
What about target_partition or something?

>> /* Always switch back to main area after RPMB access */
>> if (md->area_type & MMC_BLK_DATA_AREA_RPMB)
>> -   mmc_blk_part_switch(card, dev_get_drvdata(&card->dev));
>> +   mmc_blk_part_switch(card, main_md->part_type);
>
> Actually this switch back should be probably done for any partition
> which is not user data area.
> so this should be
>  if (md->area_type != MMC_BLK_DATA_AREA_MAIN)

That is another technical step so it should be a separate patch as
well.

Actually I think this code is broken in several ways, especially if
you do something crazy like access the main partition, both boot
partitions and the RPMB partition at the same time. It will invariably
screw something up.

I am trying to rework this to use the block layer properly, RPMB is
just the first step...

Yours,
Linus Walleij


[PATCH 2/3 v4] mmc: ops: export mmc_send_status()

2017-06-30 Thread Linus Walleij
This function retrieves the status of the card with the default
number of retries. Since the block layer wants to use this, and
since the block layer is a loadable kernel module, we need to
export this symbol.

Signed-off-by: Linus Walleij 
---
ChangeLog v3->v4:
- No changes just resending
ChangeLog v2->v3:
- New patch to fix a build error, enumerating v3 to keep it
  together with the other patches.
---
 drivers/mmc/core/mmc_ops.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index 5f7c5920231a..b56fa822ab6b 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -83,6 +83,7 @@ int mmc_send_status(struct mmc_card *card, u32 *status)
 {
return __mmc_send_status(card, status, MMC_CMD_RETRIES);
 }
+EXPORT_SYMBOL_GPL(mmc_send_status);
 
 static int _mmc_select_card(struct mmc_host *host, struct mmc_card *card)
 {
-- 
2.9.4



[PATCH 3/3 v4] mmc: debugfs: Move block debugfs into block module

2017-06-30 Thread Linus Walleij
If we don't have the block layer enabled, we do not present card
status and extcsd in the debugfs.

Debugfs is not ABI, and maintaining files of no relevance for
non-block devices comes at a high maintenance cost if we shall
support it with the block layer compiled out.

The debugfs entries suffer from all the same starvation
issues as the other userspace things, under e.g. a heavy
dd operation.

The expected number of debugfs users utilizing these two
debugfs files is already low as there is an ioctl() to get the
same information using the mmc-tools, and of these few users
the number of people using it on SDIO or combo cards
is expected to be zero.

It is therefore logical to move this over to the block layer
when it is enabled, using the new custom requests and issue
it using the block request queue.

On the other hand it moves some debugfs code from debugfs.c
and into block.c.

Tested during heavy dd load by cat:ing the status file.

Signed-off-by: Linus Walleij 
---
ChangeLog v3->v4:
- Squash all the refactorings of these operations into a big
  commit simply moving all the debugfs over to the block layer
  and only creating the files from there.
- Avoid the whole middle-step of creating #if IS_ENABLED()
  that was required to move it stepwise from the debugfs file
  to the block file.
---
 drivers/mmc/core/block.c   | 143 +
 drivers/mmc/core/debugfs.c |  89 
 drivers/mmc/core/queue.h   |   4 ++
 3 files changed, 147 insertions(+), 89 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index d410f578b8df..dbfbc76576ea 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1176,6 +1177,8 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
struct mmc_card *card = mq->card;
struct mmc_blk_data *md = mq->blkdata;
struct mmc_blk_ioc_data **idata;
+   u8 **ext_csd;
+   u32 status;
int ret;
int i;
 
@@ -1205,6 +1208,15 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
card->ext_csd.boot_ro_lock |=
EXT_CSD_BOOT_WP_B_PWR_WP_EN;
break;
+   case MMC_DRV_OP_GET_CARD_STATUS:
+   ret = mmc_send_status(card, &status);
+   if (!ret)
+   ret = status;
+   break;
+   case MMC_DRV_OP_GET_EXT_CSD:
+   ext_csd = mq_rq->drv_op_data;
+   ret = mmc_get_ext_csd(card, ext_csd);
+   break;
default:
pr_err("%s: unknown driver specific operation\n",
   md->disk->disk_name);
@@ -2236,6 +2248,134 @@ static int mmc_add_disk(struct mmc_blk_data *md)
return ret;
 }
 
+#ifdef CONFIG_DEBUG_FS
+
+static int mmc_dbg_card_status_get(void *data, u64 *val)
+{
+   struct mmc_card *card = data;
+   struct mmc_blk_data *md = dev_get_drvdata(&card->dev);
+   struct mmc_queue *mq = &md->queue;
+   struct request *req;
+   int ret;
+
+   /* Ask the block layer about the card status */
+   req = blk_get_request(mq->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+   req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_GET_CARD_STATUS;
+   blk_execute_rq(mq->queue, NULL, req, 0);
+   ret = req_to_mmc_queue_req(req)->drv_op_result;
+   if (ret >= 0) {
+   *val = ret;
+   ret = 0;
+   }
+
+   return ret;
+}
+DEFINE_SIMPLE_ATTRIBUTE(mmc_dbg_card_status_fops, mmc_dbg_card_status_get,
+   NULL, "%08llx\n");
+
+/* That is two digits * 512 + 1 for newline */
+#define EXT_CSD_STR_LEN 1025
+
+static int mmc_ext_csd_open(struct inode *inode, struct file *filp)
+{
+   struct mmc_card *card = inode->i_private;
+   struct mmc_blk_data *md = dev_get_drvdata(&card->dev);
+   struct mmc_queue *mq = &md->queue;
+   struct request *req;
+   char *buf;
+   ssize_t n = 0;
+   u8 *ext_csd;
+   int err, i;
+
+   buf = kmalloc(EXT_CSD_STR_LEN + 1, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   /* Ask the block layer for the EXT CSD */
+   req = blk_get_request(mq->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+   req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_GET_EXT_CSD;
+   req_to_mmc_queue_req(req)->drv_op_data = &ext_csd;
+   blk_execute_rq(mq->queue, NULL, req, 0);
+   err = req_to_mmc_queue_req(req)->drv_op_result;
+   if (err) {
+   pr_err("FAILED %d\n", err);
+   goto out_free;
+   }
+
+   for (i = 0; i < 512; i++)
+   n += sprintf(buf + n, "%02x", ext_csd[i]);
+   n += sprintf(buf + n, "\n"

[PATCH 1/3 v4] mmc: block: Anonymize the drv op data pointer

2017-06-30 Thread Linus Walleij
We have a data pointer for the ioctl() data, but we need to
pass other data along with the DRV_OP:s, so make this a
void * so it can be reused.

Signed-off-by: Linus Walleij 
---
ChangeLog v3->v4:
- No changes just resending
ChangeLog v2->v3:
- No changes just resending
---
 drivers/mmc/core/block.c | 8 +---
 drivers/mmc/core/queue.h | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 3c7efbdc8591..d410f578b8df 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -598,7 +598,7 @@ static int mmc_blk_ioctl_cmd(struct block_device *bdev,
__GFP_RECLAIM);
idatas[0] = idata;
req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_IOCTL;
-   req_to_mmc_queue_req(req)->idata = idatas;
+   req_to_mmc_queue_req(req)->drv_op_data = idatas;
req_to_mmc_queue_req(req)->ioc_count = 1;
blk_execute_rq(mq->queue, NULL, req, 0);
ioc_err = req_to_mmc_queue_req(req)->drv_op_result;
@@ -674,7 +674,7 @@ static int mmc_blk_ioctl_multi_cmd(struct block_device *bdev,
idata[0]->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
__GFP_RECLAIM);
req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_IOCTL;
-   req_to_mmc_queue_req(req)->idata = idata;
+   req_to_mmc_queue_req(req)->drv_op_data = idata;
req_to_mmc_queue_req(req)->ioc_count = num_of_cmds;
blk_execute_rq(mq->queue, NULL, req, 0);
ioc_err = req_to_mmc_queue_req(req)->drv_op_result;
@@ -1175,6 +1175,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
struct mmc_queue_req *mq_rq;
struct mmc_card *card = mq->card;
struct mmc_blk_data *md = mq->blkdata;
+   struct mmc_blk_ioc_data **idata;
int ret;
int i;
 
@@ -1182,8 +1183,9 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
 
switch (mq_rq->drv_op) {
case MMC_DRV_OP_IOCTL:
+   idata = mq_rq->drv_op_data;
for (i = 0; i < mq_rq->ioc_count; i++) {
-   ret = __mmc_blk_ioctl_cmd(card, md, mq_rq->idata[i]);
+   ret = __mmc_blk_ioctl_cmd(card, md, idata[i]);
if (ret)
break;
}
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 361b46408e0f..cf26a15a64bf 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -51,7 +51,7 @@ struct mmc_queue_req {
struct mmc_async_reqareq;
enum mmc_drv_op drv_op;
int drv_op_result;
-   struct mmc_blk_ioc_data **idata;
+   void*drv_op_data;
unsigned intioc_count;
 };
 
-- 
2.9.4



Re: [PATCH 4/6 v2] mmc: block: move single ioctl() commands to block requests

2017-07-31 Thread Linus Walleij
On Wed, Jul 5, 2017 at 9:00 PM, Christoph Hellwig  wrote:
> On Thu, May 18, 2017 at 11:36:14AM +0200, Christoph Hellwig wrote:
>> On Thu, May 18, 2017 at 11:29:34AM +0200, Linus Walleij wrote:
>> > We are storing the ioctl() in/out argument as a pointer in
>> > the per-request struct mmc_blk_request container.
>>
>> Btw, for the main ioctl data (not the little response field) it might
>> make sense to use blk_rq_map_user, which will do a get_user_pages
>> on the user data if the alignment fits, and otherwise handle the
>> kernel bounce buffering for you.  This should simplify the code
>> quite a bit more, and in the case where you can access the user
>> memory directly provide a nice little performance boost.
>
> Did you get a chance to look into this?

Sorry, just back from vacation.

I am rebasing my MMC patch stack, so I will take this opportunity to
also look at this during the week. I just need to make sure I find the
right userspace calls to exercise it.
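
For reference, the blk_rq_map_user() flow being suggested looks
roughly like this (a sketch of the block layer API, not tested
against the MMC ioctl path):

	/* Map the user buffer straight into the request: the block
	 * layer pins the user pages when alignment permits and falls
	 * back to bounce buffering transparently otherwise. */
	req = blk_get_request(q, write ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
			      __GFP_RECLAIM);
	err = blk_rq_map_user(q, req, NULL, ubuf, len, GFP_KERNEL);
	if (!err) {
		blk_execute_rq(q, NULL, req, 0);
		err = blk_rq_unmap_user(req->bio);
	}
	blk_put_request(req);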

Yours,
Linus Walleij


Re: [PATCH 4/5] RFC: mmc: block: Convert RPMB to a character device

2017-08-11 Thread Linus Walleij
On Mon, Jun 19, 2017 at 11:18 PM, Tomas Winkler  wrote:

> That's correct, I guess someone didn't read the spec till the end when
> adding the rpmb block device.
> Though it also looks like the software guys were drinking up in the
> bar while the JEDEC committee met.

:D

>> +/* Device type for RPMB character devices */
>> +static dev_t rpmb_devt;
>
> This is the mmc_rpmb device, not 'rpmb', as there are other storage devices
> that provide an RPMB partition.

OK fixed it.

>> +
>> +/* Bus type for RPMB character devices */
>> +static struct bus_type rpmb_bus_type = {
>> +   .name = "rpmb",
>> +};
>
> Same here, mmc_rpmb_... , and other place bellow.

OK fixed it.

>> +struct mmc_rpmb_data {
(...)
> I would also keep the partition access bit needed for the partition switching.
(...)
>>  static int __mmc_blk_ioctl_cmd(struct mmc_card *card, struct mmc_blk_data *md,
>> -  struct mmc_blk_ioc_data *idata)
>> +  struct mmc_blk_ioc_data *idata, bool rpmb_ioctl)
> Don't remember now if this is for eMMC but in the future there might be
> more than one RPMB partition on the device
> and a boolean will not work here. Rather use target_part, though the bits
> are exhausted there too.
(...)
>> -   bool is_rpmb = false;
>> +   unsigned int target_part;
> should come as a function input.
(...)
>> +   ret = mmc_blk_alloc_rpmb_part(card, md,
>> +   card->part[idx].size >> 9,
>> +   card->part[idx].name);
> Extract partition access bits from card->part[idx].part_cfg,

OK I am trying my best with this too...

Yours,
Linus Walleij


[PATCH 0/6] mmc: block: command issue cleanups

2017-01-24 Thread Linus Walleij
The function mmc_blk_issue_rw_rq() is hopelessly convoluted and
needs to be refactored so it can be understood by humans.

In the process I found some weird magic return values passed
around for no good reason.

Things are more readable after this.

This work is done towards the goal of breaking the function in
two parts: one just submitting the requests and one checking the
result and possibly resubmitting the command on error, so we
can make the usual path (the non-error path) smooth and quick, and
be called directly when the driver completes a request.

That in turn is a prerequisite for proper blk-mq integration
with the MMC/SD stack.

All that comes later.

Linus Walleij (6):
  mmc: block: break out mmc_blk_rw_cmd_abort()
  mmc: block: break out mmc_blk_rw_start_new()
  mmc: block: do not assign mq_rq when aborting command
  mmc: block: inline command abortions
  mmc: block: introduce new_areq and old_areq
  mmc: block: stop passing around pointless return values

 drivers/mmc/core/block.c | 108 ++-
 drivers/mmc/core/block.h |   2 +-
 2 files changed, 60 insertions(+), 50 deletions(-)

-- 
2.9.3



[PATCH 1/6] mmc: block: break out mmc_blk_rw_cmd_abort()

2017-01-24 Thread Linus Walleij
As a first step toward breaking apart the very complex function
mmc_blk_issue_rw_rq() we break out the command abort code.
This code assumes "ret" is != 0 and then repeatedly hammers
blk_end_request() until the request to the block layer to end
the request succeeds.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 7bd03381810d..14efe92a14ef 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1598,6 +1598,17 @@ static int mmc_blk_cmd_err(struct mmc_blk_data *md, struct mmc_card *card,
return ret;
 }
 
+static void mmc_blk_rw_cmd_abort(struct mmc_card *card, struct request *req)
+{
+   int ret = 1;
+
+   if (mmc_card_removed(card))
+   req->rq_flags |= RQF_QUIET;
+   while (ret)
+   ret = blk_end_request(req, -EIO,
+ blk_rq_cur_bytes(req));
+}
+
 static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 {
struct mmc_blk_data *md = mq->blkdata;
@@ -1737,11 +1748,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
return 1;
 
  cmd_abort:
-   if (mmc_card_removed(card))
-   req->rq_flags |= RQF_QUIET;
-   while (ret)
-   ret = blk_end_request(req, -EIO,
-   blk_rq_cur_bytes(req));
+   mmc_blk_rw_cmd_abort(card, req);
 
  start_new_req:
if (rqc) {
-- 
2.9.3



[PATCH 3/6] mmc: block: do not assign mq_rq when aborting command

2017-01-24 Thread Linus Walleij
The code in mmc_blk_issue_rw_rq() aborts a command if the request
is not properly aligned on large sectors. As part of the path
jumping out, it assigns the local variable mq_rq reflecting
an MMC queue request to the current MMC queue request, which is
confusing since the variable is not used after this jump.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index b60d1fb3a07a..13e6fe060f26 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1649,7 +1649,6 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
!IS_ALIGNED(blk_rq_sectors(rqc), 8)) {
pr_err("%s: Transfer size is not 4KB sector 
size aligned\n",
rqc->rq_disk->disk_name);
-   mq_rq = mq->mqrq_cur;
req = rqc;
rqc = NULL;
goto cmd_abort;
-- 
2.9.3



[PATCH 5/6] mmc: block: introduce new_areq and old_areq

2017-01-24 Thread Linus Walleij
Recycling the same variable in an x=x+1 fashion may seem
clever here but it makes the code terse and hard to follow
for humans. Introduce a new_areq and old_areq variable so
we see what is going on.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 4bbb3d16c09b..f3e0c778cdbd 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1631,7 +1631,8 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
enum mmc_blk_status status;
struct mmc_queue_req *mq_rq;
struct request *req;
-   struct mmc_async_req *areq;
+   struct mmc_async_req *new_areq;
+   struct mmc_async_req *old_areq;
 
if (!rqc && !mq->mqrq_prev->req)
return 0;
@@ -1651,11 +1652,12 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
}
 
mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
-   areq = &mq->mqrq_cur->mmc_active;
+   new_areq = &mq->mqrq_cur->mmc_active;
} else
-   areq = NULL;
-   areq = mmc_start_req(card->host, areq, &status);
-   if (!areq) {
+   new_areq = NULL;
+
+   old_areq = mmc_start_req(card->host, new_areq, &status);
+   if (!old_areq) {
/*
 * We have just put the first request into the pipeline
 * and there is nothing more to do until it is
@@ -1670,7 +1672,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 * An asynchronous request has been completed and we proceed
 * to handle the result of it.
 */
-   mq_rq = container_of(areq, struct mmc_queue_req, mmc_active);
+   mq_rq = container_of(old_areq, struct mmc_queue_req, mmc_active);
brq = &mq_rq->brq;
req = mq_rq->req;
type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
-- 
2.9.3



[PATCH 2/6] mmc: block: break out mmc_blk_rw_start_new()

2017-01-24 Thread Linus Walleij
As a step toward breaking apart the very complex function
mmc_blk_issue_rw_rq() we break out the code to start a new
request.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 14efe92a14ef..b60d1fb3a07a 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1609,6 +1609,22 @@ static void mmc_blk_rw_cmd_abort(struct mmc_card *card, struct request *req)
  blk_rq_cur_bytes(req));
 }
 
+static void mmc_blk_rw_start_new(struct mmc_queue *mq, struct mmc_card *card,
+struct request *req)
+{
+   if (!req)
+   return;
+
+   if (mmc_card_removed(card)) {
+   req->rq_flags |= RQF_QUIET;
+   blk_end_request_all(req, -EIO);
+   } else {
+   mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
+   mmc_start_req(card->host,
+ &mq->mqrq_cur->mmc_active, NULL);
+   }
+}
+
 static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 {
struct mmc_blk_data *md = mq->blkdata;
@@ -1751,16 +1767,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
mmc_blk_rw_cmd_abort(card, req);
 
  start_new_req:
-   if (rqc) {
-   if (mmc_card_removed(card)) {
-   rqc->rq_flags |= RQF_QUIET;
-   blk_end_request_all(rqc, -EIO);
-   } else {
-   mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
-   mmc_start_req(card->host,
- &mq->mqrq_cur->mmc_active, NULL);
-   }
-   }
+   mmc_blk_rw_start_new(mq, card, rqc);
 
return 0;
 }
-- 
2.9.3



[PATCH 6/6] mmc: block: stop passing around pointless return values

2017-01-24 Thread Linus Walleij
The mmc_blk_issue_rq() function is called in exactly one place
in queue.c and there the return value is ignored. So the
functions called from that function that also meticulously
return 0/1 do so for no good reason.

Error reporting on the asynchronous requests is done upward to
the block layer when the requests are eventually completed or
fail, which may happen during the flow of the mmc_blk_issue_*
functions directly (for "special commands") or later, when an
asynchronous read/write request is completed.

The issuing functions do not give rise to errors on their own,
and there is nothing to return back to the caller in queue.c.
Drop all return values and make the function return void.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 38 ++
 drivers/mmc/core/block.h |  2 +-
 2 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index f3e0c778cdbd..ede759dda395 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1149,7 +1149,7 @@ int mmc_access_rpmb(struct mmc_queue *mq)
return false;
 }
 
-static int mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
 {
struct mmc_blk_data *md = mq->blkdata;
struct mmc_card *card = md->queue.card;
@@ -1187,11 +1187,9 @@ static int mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
mmc_blk_reset_success(md, type);
 fail:
blk_end_request(req, err, blk_rq_bytes(req));
-
-   return err ? 0 : 1;
 }
 
-static int mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
+static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
   struct request *req)
 {
struct mmc_blk_data *md = mq->blkdata;
@@ -1254,11 +1252,9 @@ static int mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
mmc_blk_reset_success(md, type);
 out:
blk_end_request(req, err, blk_rq_bytes(req));
-
-   return err ? 0 : 1;
 }
 
-static int mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
 {
struct mmc_blk_data *md = mq->blkdata;
struct mmc_card *card = md->queue.card;
@@ -1269,8 +1265,6 @@ static int mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
ret = -EIO;
 
blk_end_request_all(req, ret);
-
-   return ret ? 0 : 1;
 }
 
 /*
@@ -1622,7 +1616,7 @@ static void mmc_blk_rw_start_new(struct mmc_queue *mq, struct mmc_card *card,
}
 }
 
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
+static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 {
struct mmc_blk_data *md = mq->blkdata;
struct mmc_card *card = md->queue.card;
@@ -1635,7 +1629,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
struct mmc_async_req *old_areq;
 
if (!rqc && !mq->mqrq_prev->req)
-   return 0;
+   return;
 
do {
if (rqc) {
@@ -1648,7 +1642,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
pr_err("%s: Transfer size is not 4KB sector 
size aligned\n",
rqc->rq_disk->disk_name);
mmc_blk_rw_cmd_abort(card, rqc);
-   return 0;
+   return;
}
 
mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
@@ -1665,7 +1659,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 */
if (status == MMC_BLK_NEW_REQUEST)
mq->flags |= MMC_QUEUE_NEW_REQUEST;
-   return 0;
+   return;
}
 
/*
@@ -1699,7 +1693,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
   __func__, blk_rq_bytes(req),
   brq->data.bytes_xfered);
mmc_blk_rw_cmd_abort(card, req);
-   return 0;
+   return;
}
break;
case MMC_BLK_CMD_ERR:
@@ -1767,18 +1761,16 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
}
} while (ret);
 
-   return 1;
+   return;
 
  cmd_abort:
mmc_blk_rw_cmd_abort(card, req);
 
  start_new_req:
mmc_blk_rw_start_new(mq, card, rqc);
-
-   return 0;
 }
 
-int mmc_blk_issue_rq(struct mmc_queue *mq, struct request 

[PATCH 4/6] mmc: block: inline command abortions

2017-01-24 Thread Linus Walleij
Setting rqc to NULL followed by a goto to cmd_abort is just a way
to do unconditional abort without starting any new command.
Inline the calls to mmc_blk_rw_cmd_abort() and return immediately
in those cases.

As a result, mmc_blk_rw_start_new() is not called with NULL
requests, and we can remove the NULL check in the beginning of
this function.

Add some comments to the code flow so it is clear that this is
where the asynchronous requests come back in and the result of
them gets handled.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 13e6fe060f26..4bbb3d16c09b 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1612,9 +1612,6 @@ static void mmc_blk_rw_cmd_abort(struct mmc_card *card, struct request *req)
 static void mmc_blk_rw_start_new(struct mmc_queue *mq, struct mmc_card *card,
 struct request *req)
 {
-   if (!req)
-   return;
-
if (mmc_card_removed(card)) {
req->rq_flags |= RQF_QUIET;
blk_end_request_all(req, -EIO);
@@ -1649,9 +1646,8 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
!IS_ALIGNED(blk_rq_sectors(rqc), 8)) {
pr_err("%s: Transfer size is not 4KB sector 
size aligned\n",
rqc->rq_disk->disk_name);
-   req = rqc;
-   rqc = NULL;
-   goto cmd_abort;
+   mmc_blk_rw_cmd_abort(card, rqc);
+   return 0;
}
 
mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
@@ -1660,11 +1656,20 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
areq = NULL;
areq = mmc_start_req(card->host, areq, &status);
if (!areq) {
+   /*
+* We have just put the first request into the pipeline
+* and there is nothing more to do until it is
+* complete.
+*/
if (status == MMC_BLK_NEW_REQUEST)
mq->flags |= MMC_QUEUE_NEW_REQUEST;
return 0;
}
 
+   /*
+* An asynchronous request has been completed and we proceed
+* to handle the result of it.
+*/
mq_rq = container_of(areq, struct mmc_queue_req, mmc_active);
brq = &mq_rq->brq;
req = mq_rq->req;
@@ -1691,8 +1696,8 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
pr_err("%s BUG rq_tot %d d_xfer %d\n",
   __func__, blk_rq_bytes(req),
   brq->data.bytes_xfered);
-   rqc = NULL;
-   goto cmd_abort;
+   mmc_blk_rw_cmd_abort(card, req);
+   return 0;
}
break;
case MMC_BLK_CMD_ERR:
-- 
2.9.3



Re: [PATCH 1/6] mmc: block: break out mmc_blk_rw_cmd_abort()

2017-01-26 Thread Linus Walleij
On Wed, Jan 25, 2017 at 10:23 AM, Mateusz Nowak
 wrote:
> On 1/24/2017 11:17, Linus Walleij wrote:
>>
>> As a first step toward breaking apart the very complex function
>> mmc_blk_issue_rw_rq() we break out the command abort code.
>> This code assumes "ret" is != 0 and then repeatedly hammers
>> blk_end_request() until the request to the block layer to end
>> the request succeeds.
>>
>> Signed-off-by: Linus Walleij 
>> ---
>>  drivers/mmc/core/block.c | 17 -
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>> index 7bd03381810d..14efe92a14ef 100644
>> --- a/drivers/mmc/core/block.c
>> +++ b/drivers/mmc/core/block.c
>> @@ -1598,6 +1598,17 @@ static int mmc_blk_cmd_err(struct mmc_blk_data *md,
>> struct mmc_card *card,
>> return ret;
>>  }
>>
>> +static void mmc_blk_rw_cmd_abort(struct mmc_card *card, struct request
>> *req)
>> +{
>> +   int ret = 1;
>
> blk_end_request() returns bool, so maybe this variable should have a
> matching type, since it is only used in this scope? And maybe it should
> have a more meaningful name for this case?

I am just moving/refactoring the code syntax without changing any
semantics. It was an int in the old code so it is an int in the new
code.

It is possible to fix this and other problems as separate patches.

As you can see, it also spins here for all eternity if the function
just returns != 0 all the time; that doesn't look good either.
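
Just to illustrate, a bounded loop could look something like this (an
untested sketch; the retry cap is a made-up number, not from any patch):

	static void mmc_blk_rw_cmd_abort(struct mmc_card *card, struct request *req)
	{
		int retries = 10;	/* arbitrary cap instead of spinning forever */

		if (mmc_card_removed(card))
			req->rq_flags |= RQF_QUIET;
		while (blk_end_request(req, -EIO, blk_rq_cur_bytes(req)) && --retries)
			;
	}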

Yours,
Linus Walleij


[PATCH 4/6] mmc: block: refactor mmc_blk_rw_try_restart()

2017-01-26 Thread Linus Walleij
The mmc_blk_rw_start_new() function was named after the label inside
mmc_blk_issue_rw_rq() but that is really a confusing name for this
function: what it does is try to restart the latest issued
command on the host and card of the current MMC queue.

So rename it mmc_blk_rw_try_restart() to reflect what it is
doing, and at this point also refactor the function to treat the
removed card as an exception: just exit if that happens, and
otherwise run on in the function.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index de9200470c13..14c33f57776c 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1603,17 +1603,24 @@ static void mmc_blk_rw_cmd_abort(struct mmc_card *card, 
struct request *req)
  blk_rq_cur_bytes(req));
 }
 
-static void mmc_blk_rw_start_new(struct mmc_queue *mq, struct mmc_card *card,
-struct request *req)
+/**
+ * mmc_blk_rw_try_restart() - tries to restart the current async request
+ * @mq: the queue with the card and host to restart
+ * @req: a new request that wants to be started after the current one
+ */
+static void mmc_blk_rw_try_restart(struct mmc_queue *mq, struct request *req)
 {
-   if (mmc_card_removed(card)) {
+   /*
+* If the card was removed, just cancel everything and return.
+*/
+   if (mmc_card_removed(mq->card)) {
req->rq_flags |= RQF_QUIET;
blk_end_request_all(req, -EIO);
-   } else {
-   mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
-   mmc_start_areq(card->host,
-  &mq->mqrq_cur->mmc_active, NULL);
+   return;
}
+   /* Else proceed and try to restart the current async request */
+   mmc_blk_rw_rq_prep(mq->mqrq_cur, mq->card, 0, mq);
+   mmc_start_areq(mq->card->host, &mq->mqrq_cur->mmc_active, NULL);
 }
 
 static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
@@ -1700,11 +1707,11 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
ret = mmc_blk_cmd_err(md, card, brq, old_req, ret);
if (mmc_blk_reset(md, card->host, type)) {
mmc_blk_rw_cmd_abort(card, old_req);
-   mmc_blk_rw_start_new(mq, card, new_req);
+   mmc_blk_rw_try_restart(mq, new_req);
return;
}
if (!ret) {
-   mmc_blk_rw_start_new(mq, card, new_req);
+   mmc_blk_rw_try_restart(mq, new_req);
return;
}
break;
@@ -1717,7 +1724,7 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
if (!mmc_blk_reset(md, card->host, type))
break;
mmc_blk_rw_cmd_abort(card, old_req);
-   mmc_blk_rw_start_new(mq, card, new_req);
+   mmc_blk_rw_try_restart(mq, new_req);
return;
case MMC_BLK_DATA_ERR: {
int err;
@@ -1727,7 +1734,7 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
break;
if (err == -ENODEV) {
mmc_blk_rw_cmd_abort(card, old_req);
-   mmc_blk_rw_start_new(mq, card, new_req);
+   mmc_blk_rw_try_restart(mq, new_req);
return;
}
/* Fall through */
@@ -1748,19 +1755,19 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
ret = blk_end_request(old_req, -EIO,
brq->data.blksz);
if (!ret) {
-   mmc_blk_rw_start_new(mq, card, new_req);
+   mmc_blk_rw_try_restart(mq, new_req);
return;
}
break;
case MMC_BLK_NOMEDIUM:
mmc_blk_rw_cmd_abort(card, old_req);
-   mmc_blk_rw_start_new(mq, card, new_req);
+   mmc_blk_rw_try_restart(mq, new_req);
return;
default:
pr_err("%s: Unhandled return value (%d)",
old_req->rq_disk->disk_name

[PATCH 5/6] mmc: block: rename mmc_active to areq

2017-01-26 Thread Linus Walleij
The mmc_active member of struct mmc_queue_req has a very
confusing name: this is certainly not always "active", it is
the asynchronous request associated with the mmc_queue_req,
but it is not guaranteed to be "active" in any sense, such
as currently running on the host.

Simply rename this member to "areq".

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 14 +++---
 drivers/mmc/core/queue.h |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 14c33f57776c..04c7162f444e 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1302,7 +1302,7 @@ static enum mmc_blk_status mmc_blk_err_check(struct 
mmc_card *card,
 struct mmc_async_req *areq)
 {
struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
-   mmc_active);
+   areq);
struct mmc_blk_request *brq = &mq_mrq->brq;
struct request *req = mq_mrq->req;
int need_retune = card->host->need_retune;
@@ -1558,8 +1558,8 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
brq->data.sg_len = i;
}
 
-   mqrq->mmc_active.mrq = &brq->mrq;
-   mqrq->mmc_active.err_check = mmc_blk_err_check;
+   mqrq->areq.mrq = &brq->mrq;
+   mqrq->areq.err_check = mmc_blk_err_check;
 
mmc_queue_bounce_pre(mqrq);
 }
@@ -1620,7 +1620,7 @@ static void mmc_blk_rw_try_restart(struct mmc_queue *mq, 
struct request *req)
}
/* Else proceed and try to restart the current async request */
mmc_blk_rw_rq_prep(mq->mqrq_cur, mq->card, 0, mq);
-   mmc_start_areq(mq->card->host, &mq->mqrq_cur->mmc_active, NULL);
+   mmc_start_areq(mq->card->host, &mq->mqrq_cur->areq, NULL);
 }
 
 static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
@@ -1653,7 +1653,7 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
}
 
mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
-   new_areq = &mq->mqrq_cur->mmc_active;
+   new_areq = &mq->mqrq_cur->areq;
} else
new_areq = NULL;
 
@@ -1673,7 +1673,7 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
 * An asynchronous request has been completed and we proceed
 * to handle the result of it.
 */
-   mq_rq = container_of(old_areq, struct mmc_queue_req, 
mmc_active);
+   mq_rq = container_of(old_areq, struct mmc_queue_req, areq);
brq = &mq_rq->brq;
old_req = mq_rq->req;
type = rq_data_dir(old_req) == READ ? MMC_BLK_READ : 
MMC_BLK_WRITE;
@@ -1779,7 +1779,7 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
mmc_blk_rw_rq_prep(mq_rq, card,
disable_multi, mq);
mmc_start_areq(card->host,
-   &mq_rq->mmc_active, NULL);
+   &mq_rq->areq, NULL);
mq_rq->brq.retune_retry_done = retune_retry_done;
}
} while (ret);
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 0cea02af79d1..e0cd5b1f40ee 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -33,7 +33,7 @@ struct mmc_queue_req {
char*bounce_buf;
struct scatterlist  *bounce_sg;
unsigned intbounce_sg_len;
-   struct mmc_async_reqmmc_active;
+   struct mmc_async_reqareq;
 };
 
 struct mmc_queue {
-- 
2.9.3



[PATCH 1/6] mmc: block: inline the command abort and start new goto:s

2017-01-26 Thread Linus Walleij
The goto statements sprinkled over the mmc_blk_issue_rw_rq()
function have grown over the years and make the code pretty hard
to read.

Inline the calls such that:

goto cmd_abort; ->
mmc_blk_rw_cmd_abort(card, req);
mmc_blk_rw_start_new(mq, card, rqc);
return;

goto start_new_req; ->
mmc_blk_rw_start_new(mq, card, rqc);
return;

After this it is clearer how we exit the do {} while
loop in this function, and it becomes possible to split the
code apart.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 46 +++---
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index ede759dda395..8f91d7ddfc56 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1698,10 +1698,15 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *rqc)
break;
case MMC_BLK_CMD_ERR:
ret = mmc_blk_cmd_err(md, card, brq, req, ret);
-   if (mmc_blk_reset(md, card->host, type))
-   goto cmd_abort;
-   if (!ret)
-   goto start_new_req;
+   if (mmc_blk_reset(md, card->host, type)) {
+   mmc_blk_rw_cmd_abort(card, req);
+   mmc_blk_rw_start_new(mq, card, rqc);
+   return;
+   }
+   if (!ret) {
+   mmc_blk_rw_start_new(mq, card, rqc);
+   return;
+   }
break;
case MMC_BLK_RETRY:
retune_retry_done = brq->retune_retry_done;
@@ -1711,15 +1716,20 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *rqc)
case MMC_BLK_ABORT:
if (!mmc_blk_reset(md, card->host, type))
break;
-   goto cmd_abort;
+   mmc_blk_rw_cmd_abort(card, req);
+   mmc_blk_rw_start_new(mq, card, rqc);
+   return;
case MMC_BLK_DATA_ERR: {
int err;
 
err = mmc_blk_reset(md, card->host, type);
if (!err)
break;
-   if (err == -ENODEV)
-   goto cmd_abort;
+   if (err == -ENODEV) {
+   mmc_blk_rw_cmd_abort(card, req);
+   mmc_blk_rw_start_new(mq, card, rqc);
+   return;
+   }
/* Fall through */
}
case MMC_BLK_ECC_ERR:
@@ -1737,15 +1747,21 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *rqc)
 */
ret = blk_end_request(req, -EIO,
brq->data.blksz);
-   if (!ret)
-   goto start_new_req;
+   if (!ret) {
+   mmc_blk_rw_start_new(mq, card, rqc);
+   return;
+   }
break;
case MMC_BLK_NOMEDIUM:
-   goto cmd_abort;
+   mmc_blk_rw_cmd_abort(card, req);
+   mmc_blk_rw_start_new(mq, card, rqc);
+   return;
default:
pr_err("%s: Unhandled return value (%d)",
req->rq_disk->disk_name, status);
-   goto cmd_abort;
+   mmc_blk_rw_cmd_abort(card, req);
+   mmc_blk_rw_start_new(mq, card, rqc);
+   return;
}
 
if (ret) {
@@ -1760,14 +1776,6 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *rqc)
mq_rq->brq.retune_retry_done = retune_retry_done;
}
} while (ret);
-
-   return;
-
- cmd_abort:
-   mmc_blk_rw_cmd_abort(card, req);
-
- start_new_req:
-   mmc_blk_rw_start_new(mq, card, rqc);
 }
 
 void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
-- 
2.9.3



[PATCH 6/6] mmc: queue: turn queue flags into bools

2017-01-26 Thread Linus Walleij
Instead of masking and setting two bits in the "flags" field
for the mmc_queue, just use two bools named "suspended" and
"new_request".

The masking and setting would likely have race conditions
anyway; it is better to use simple members like these.
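
To illustrate the hazard (this example is mine, not from the patch): the
flag updates are plain read-modify-write sequences, so two contexts
touching the same word can lose an update:

	mq->flags |= MMC_QUEUE_NEW_REQUEST;	/* context A: load, OR, store */
	mq->flags &= ~MMC_QUEUE_SUSPENDED;	/* context B: load, AND, store */
	/*
	 * If A and B interleave, one store can overwrite the other's bit.
	 * Two independent bools avoid sharing one word, though they are
	 * still no substitute for proper locking.
	 */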

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c |  6 +++---
 drivers/mmc/core/queue.c | 12 ++--
 drivers/mmc/core/queue.h |  5 ++---
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 04c7162f444e..7be50ebf300f 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1665,7 +1665,7 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
 * complete.
 */
if (status == MMC_BLK_NEW_REQUEST)
-   mq->flags |= MMC_QUEUE_NEW_REQUEST;
+   mq->new_request = true;
return;
}
 
@@ -1804,7 +1804,7 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct 
request *req)
goto out;
}
 
-   mq->flags &= ~MMC_QUEUE_NEW_REQUEST;
+   mq->new_request = false;
if (req && req_op(req) == REQ_OP_DISCARD) {
/* complete ongoing async transfer before issuing discard */
if (card->host->areq)
@@ -1825,7 +1825,7 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct 
request *req)
}
 
 out:
-   if ((!req && !(mq->flags & MMC_QUEUE_NEW_REQUEST)) || req_is_special)
+   if ((!req && !mq->new_request) || req_is_special)
/*
 * Release host when there are no more requests
 * and after special request(discard, flush) is done.
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 611f5c6d1950..5cb369c2664b 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -86,8 +86,8 @@ static int mmc_queue_thread(void *d)
set_current_state(TASK_RUNNING);
mmc_blk_issue_rq(mq, req);
cond_resched();
-   if (mq->flags & MMC_QUEUE_NEW_REQUEST) {
-   mq->flags &= ~MMC_QUEUE_NEW_REQUEST;
+   if (mq->new_request) {
+   mq->new_request = false;
continue; /* fetch again */
}
 
@@ -401,8 +401,8 @@ void mmc_queue_suspend(struct mmc_queue *mq)
struct request_queue *q = mq->queue;
unsigned long flags;
 
-   if (!(mq->flags & MMC_QUEUE_SUSPENDED)) {
-   mq->flags |= MMC_QUEUE_SUSPENDED;
+   if (!mq->suspended) {
+   mq->suspended = true;
 
spin_lock_irqsave(q->queue_lock, flags);
blk_stop_queue(q);
@@ -421,8 +421,8 @@ void mmc_queue_resume(struct mmc_queue *mq)
struct request_queue *q = mq->queue;
unsigned long flags;
 
-   if (mq->flags & MMC_QUEUE_SUSPENDED) {
-   mq->flags &= ~MMC_QUEUE_SUSPENDED;
+   if (mq->suspended) {
+   mq->suspended = false;
 
up(&mq->thread_sem);
 
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index e0cd5b1f40ee..e298f100101b 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -40,9 +40,8 @@ struct mmc_queue {
struct mmc_card *card;
struct task_struct  *thread;
struct semaphorethread_sem;
-   unsigned intflags;
-#define MMC_QUEUE_SUSPENDED(1 << 0)
-#define MMC_QUEUE_NEW_REQUEST  (1 << 1)
+   boolnew_request;
+   boolsuspended;
boolasleep;
struct mmc_blk_data *blkdata;
struct request_queue*queue;
-- 
2.9.3



[PATCH 3/6] mmc: core: rename mmc_start_req() to *areq()

2017-01-26 Thread Linus Walleij
With the coexisting __mmc_start_request(), mmc_start_request()
and __mmc_start_req() it is a bit confusing that mmc_start_req()
actually does not start a normal request, but an asynchronous
request.

Rename it to mmc_start_areq() to make it explicit what the
function is doing, and fix the kerneldoc for this function
while we're at it.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c|  8 
 drivers/mmc/core/core.c | 14 +++---
 drivers/mmc/core/mmc_test.c |  8 
 include/linux/mmc/core.h|  2 +-
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index aaade079603e..de9200470c13 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1611,8 +1611,8 @@ static void mmc_blk_rw_start_new(struct mmc_queue *mq, 
struct mmc_card *card,
blk_end_request_all(req, -EIO);
} else {
mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
-   mmc_start_req(card->host,
- &mq->mqrq_cur->mmc_active, NULL);
+   mmc_start_areq(card->host,
+  &mq->mqrq_cur->mmc_active, NULL);
}
 }
 
@@ -1650,7 +1650,7 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
} else
new_areq = NULL;
 
-   old_areq = mmc_start_req(card->host, new_areq, &status);
+   old_areq = mmc_start_areq(card->host, new_areq, &status);
if (!old_areq) {
/*
 * We have just put the first request into the pipeline
@@ -1771,7 +1771,7 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
 */
mmc_blk_rw_rq_prep(mq_rq, card,
disable_multi, mq);
-   mmc_start_req(card->host,
+   mmc_start_areq(card->host,
&mq_rq->mmc_active, NULL);
mq_rq->brq.retune_retry_done = retune_retry_done;
}
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 8c458255e55a..ed1768cf464a 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -631,10 +631,10 @@ static void mmc_post_req(struct mmc_host *host, struct 
mmc_request *mrq,
 }
 
 /**
- * mmc_start_req - start a non-blocking request
+ * mmc_start_areq - start an asynchronous request
  * @host: MMC host to start command
- * @areq: async request to start
- * @error: out parameter returns 0 for success, otherwise non zero
+ * @areq: asynchronous request to start
+ * @ret_stat: out parameter for status
  *
  * Start a new MMC custom command request for a host.
  * If there is on ongoing async request wait for completion
@@ -646,9 +646,9 @@ static void mmc_post_req(struct mmc_host *host, struct 
mmc_request *mrq,
  * return the completed request. If there is no ongoing request, NULL
  * is returned without waiting. NULL is not an error condition.
  */
-struct mmc_async_req *mmc_start_req(struct mmc_host *host,
-   struct mmc_async_req *areq,
-   enum mmc_blk_status *ret_stat)
+struct mmc_async_req *mmc_start_areq(struct mmc_host *host,
+struct mmc_async_req *areq,
+enum mmc_blk_status *ret_stat)
 {
enum mmc_blk_status status = MMC_BLK_SUCCESS;
int start_err = 0;
@@ -708,7 +708,7 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
*ret_stat = status;
return data;
 }
-EXPORT_SYMBOL(mmc_start_req);
+EXPORT_SYMBOL(mmc_start_areq);
 
 /**
  * mmc_wait_for_req - start a request and wait for completion
diff --git a/drivers/mmc/core/mmc_test.c b/drivers/mmc/core/mmc_test.c
index 83d193c09d98..f99ac3123fd2 100644
--- a/drivers/mmc/core/mmc_test.c
+++ b/drivers/mmc/core/mmc_test.c
@@ -853,7 +853,7 @@ static int mmc_test_nonblock_transfer(struct mmc_test_card 
*test,
for (i = 0; i < count; i++) {
mmc_test_prepare_mrq(test, cur_areq->mrq, sg, sg_len, dev_addr,
 blocks, blksz, write);
-   done_areq = mmc_start_req(test->card->host, cur_areq, &status);
+   done_areq = mmc_start_areq(test->card->host, cur_areq, &status);
 
if (status != MMC_BLK_SUCCESS || (!done_areq && i > 0)) {
ret = RESULT_FAIL;
@@ -872,7 +872,7 @@ static int mmc_test_nonblock_transfer(struct mmc_test_card 
*test,
dev_addr += blocks;
}
 
-   done_areq = mmc_start_req(test->card->host, NULL, &status);
+   done_areq = mmc_start_

[PATCH 2/6] mmc: block: rename rqc and req

2017-01-26 Thread Linus Walleij
In the function mmc_blk_issue_rw_rq() the new request coming in
from the block layer is called "rqc" and the old request that
was potentially just returned from the asynchronous
mechanism is called "req".

This is really confusing when trying to analyze and understand
the code; it becomes a perceptual nightmare to me. Maybe others
have better parserheads, but it is not working for me.

Rename "rqc" to "new_req" and "req" to "old_req" so that the
syntax reflects what is semantically going on.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 56 
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 8f91d7ddfc56..aaade079603e 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1616,7 +1616,7 @@ static void mmc_blk_rw_start_new(struct mmc_queue *mq, 
struct mmc_card *card,
}
 }
 
-static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
+static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 {
struct mmc_blk_data *md = mq->blkdata;
struct mmc_card *card = md->queue.card;
@@ -1624,24 +1624,24 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *rqc)
int ret = 1, disable_multi = 0, retry = 0, type, retune_retry_done = 0;
enum mmc_blk_status status;
struct mmc_queue_req *mq_rq;
-   struct request *req;
+   struct request *old_req;
struct mmc_async_req *new_areq;
struct mmc_async_req *old_areq;
 
-   if (!rqc && !mq->mqrq_prev->req)
+   if (!new_req && !mq->mqrq_prev->req)
return;
 
do {
-   if (rqc) {
+   if (new_req) {
/*
 * When 4KB native sector is enabled, only 8 blocks
 * multiple read or write is allowed
 */
if (mmc_large_sector(card) &&
-   !IS_ALIGNED(blk_rq_sectors(rqc), 8)) {
+   !IS_ALIGNED(blk_rq_sectors(new_req), 8)) {
pr_err("%s: Transfer size is not 4KB sector 
size aligned\n",
-   rqc->rq_disk->disk_name);
-   mmc_blk_rw_cmd_abort(card, rqc);
+   new_req->rq_disk->disk_name);
+   mmc_blk_rw_cmd_abort(card, new_req);
return;
}
 
@@ -1668,8 +1668,8 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *rqc)
 */
mq_rq = container_of(old_areq, struct mmc_queue_req, 
mmc_active);
brq = &mq_rq->brq;
-   req = mq_rq->req;
-   type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+   old_req = mq_rq->req;
+   type = rq_data_dir(old_req) == READ ? MMC_BLK_READ : 
MMC_BLK_WRITE;
mmc_queue_bounce_post(mq_rq);
 
switch (status) {
@@ -1680,7 +1680,7 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *rqc)
 */
mmc_blk_reset_success(md, type);
 
-   ret = blk_end_request(req, 0,
+   ret = blk_end_request(old_req, 0,
brq->data.bytes_xfered);
 
/*
@@ -1690,21 +1690,21 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *rqc)
 */
if (status == MMC_BLK_SUCCESS && ret) {
pr_err("%s BUG rq_tot %d d_xfer %d\n",
-  __func__, blk_rq_bytes(req),
+  __func__, blk_rq_bytes(old_req),
   brq->data.bytes_xfered);
-   mmc_blk_rw_cmd_abort(card, req);
+   mmc_blk_rw_cmd_abort(card, old_req);
return;
}
break;
case MMC_BLK_CMD_ERR:
-   ret = mmc_blk_cmd_err(md, card, brq, req, ret);
+   ret = mmc_blk_cmd_err(md, card, brq, old_req, ret);
if (mmc_blk_reset(md, card->host, type)) {
-   mmc_blk_rw_cmd_abort(card, req);
-   mmc_blk_rw_start_new(mq, card, rqc);
+   mmc_blk_rw_cmd_abort(card, old_req);
+   mmc_blk_rw_start_new(mq, card, new_req);

Re: [PATCH 0/6] mmc: block: command issue cleanups

2017-01-30 Thread Linus Walleij
On Fri, Jan 27, 2017 at 8:58 AM, Ulf Hansson  wrote:

>>> Linus Walleij (6):
>>>   mmc: block: break out mmc_blk_rw_cmd_abort()
>>>   mmc: block: break out mmc_blk_rw_start_new()
>>>   mmc: block: do not assign mq_rq when aborting command
>>>   mmc: block: inline command abortions
>>>   mmc: block: introduce new_areq and old_areq
>>>   mmc: block: stop passing around pointless return values
(...)
> Seems like this series may have issues. I have looked at boot reports
> from kernelci, and particular the reports for
> https://kernelci.org/boot/sun7i-a20-bananapi/job/ulfh/ are
> interesting.
>
> Apparently, this board has an SD card attached. There have been errors
> reported in the log for a while when doing data transfers, although
> none of these errors have triggered the kernelci to report a boot
> error.

Damn, I wish I could be hands-on with this system and bisect
it. That is very helpful with shaky systems, really. Sadly the errors are
hard to reproduce :(

The old errors look like so:

[6.099124] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD SBE !!
[6.105211] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[6.122394] mmcblk0: timed out sending r/w cmd command, card status 0x900
[6.665013] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[6.671011] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[6.677812] mmcblk0: timed out sending r/w cmd command, card status 0x900
[7.123727] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[7.129692] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[7.136489] mmcblk0: timed out sending r/w cmd command, card status 0x900
[7.143349] blk_update_request: I/O error, dev mmcblk0, sector 124800
[7.493691] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[7.499651] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[7.506229] mmcblk0: timed out sending r/w cmd command, card status 0x900
[7.943641] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[7.949595] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[7.956222] mmcblk0: timed out sending r/w cmd command, card status 0x900
[7.963010] blk_update_request: I/O error, dev mmcblk0, sector 124800
[7.969499] Buffer I/O error on dev mmcblk0p1, logical block 15344,
async page read
[8.321411] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[8.327378] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[8.334018] mmcblk0: timed out sending r/w cmd command, card status 0x900
[8.763338] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[8.769276] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[8.775960] mmcblk0: timed out sending r/w cmd command, card status 0x900
[8.782750] blk_update_request: I/O error, dev mmcblk0, sector 124928
[9.125126] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[9.131084] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[9.137624] mmcblk0: timed out sending r/w cmd command, card status 0x900
[9.15] blk_update_request: I/O error, dev mmcblk0, sector 124928
[9.150881] Buffer I/O error on dev mmcblk0p2, logical block 0,
async page read

So something was causing errors on the read command.

> However, I suspect that some of the changes in this series make it
> worse. Perhaps because of changed error handling in the mmc block
> layer!?
>
> Particular, look at the difference between these [1] boot logs, it
> might give you some hints. I have also added Maxime to this thread,
> perhaps he can help out with the sunxi mmc driver.

The new errors look like so:

[6.099171] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD SBE !!
[6.105259] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[6.127415] mmcblk0: timed out sending r/w cmd command, card status 0x900
[6.28] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[6.672626] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[6.679420] mmcblk0: timed out sending r/w cmd command, card status 0x900
[7.503256] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout
[8.623257] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout

This "fatal err update clk timeout" is new and is coming from the driver.

[8.630370] mmc0: tried to reset card, got error -5
[8.635309] blk_update_request: I/O error, dev mmcblk0, sector 124800
[8.642366] mmcblk0: error -5 sending status command, retrying
[8.648279] mmcblk0: error -5 sending status command, retrying
[8.654132] mmcblk0: error -5 sending status command, aborting
[8.659961] blk_update_request: I/O error, dev mmcblk0, sector 7167872
[8.667201] mmcblk0: error -5 sending status command, retrying
[8.673031] mmcblk0: error -5 sending status command, retrying
[8.678916] mmcblk0: error -5 sending status command, aborting
[8.684758] blk_upda

Re: [PATCH 0/6] mmc: block: command issue cleanups

2017-01-30 Thread Linus Walleij
On Mon, Jan 30, 2017 at 2:05 PM, Linus Walleij  wrote:
> On Fri, Jan 27, 2017 at 8:58 AM, Ulf Hansson  wrote:

> And now it is crashing at mmc_blk_rw_rq_prep() + 0x20 so I suspect it is
> one of these:
>
> struct mmc_blk_request *brq = &mqrq->brq;
> struct request *req = mqrq->req;
> struct mmc_blk_data *md = mq->blkdata;
>
> I guess the first: mqrq is NULL.
>
> So I suspect the oneliner in commit 0ebd6e72b5ee2592625d5ae567a729345dfe07b6
> "mmc: block: do not assign mq_rq when aborting command"

Bah, looking closer at that, it just doesn't make any logical sense at all.

I now suspect the NULL check removed in mmc_blk_rw_start_new()
to be behind this, so sent a patch to restore it.

No idea how to test it or if it's the real problem.
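
For reference, the restored guard would look roughly like this (a sketch
of the fix being referred to, not the actual patch):

	static void mmc_blk_rw_start_new(struct mmc_queue *mq, struct mmc_card *card,
					 struct request *req)
	{
		if (!req)
			return;	/* nothing to restart, bail out early */
		/* ... rest of the function unchanged ... */
	}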

I had to rebase my further patches on top too, so if you merge this
I need to resend the next series.

Yours,
Linus Walleij


[PATCH 00/16] multiqueue for MMC/SD third try

2017-02-09 Thread Linus Walleij
 seems to visibly regress with MQ.
My best guess is that it is caused by the absence of the block
scheduler.

I do not know if my conclusions are right or anything, so please
scrutinize.

Linus Walleij (16):
  mmc: core: move some code in mmc_start_areq()
  mmc: core: refactor asynchronous request finalization
  mmc: core: refactor mmc_request_done()
  mmc: core: move the asynchronous post-processing
  mmc: core: add a kthread for completing requests
  mmc: core: replace waitqueue with worker
  mmc: core: do away with is_done_rcv
  mmc: core: do away with is_new_req
  mmc: core: kill off the context info
  mmc: queue: simplify queue logic
  mmc: block: shuffle retry and error handling
  mmc: queue: stop flushing the pipeline with NULL
  mmc: queue: issue struct mmc_queue_req items
  mmc: queue: get/put struct mmc_queue_req
  mmc: queue: issue requests in massive parallel
  RFC: mmc: switch MMC/SD to use blk-mq multiqueueing v3

 drivers/mmc/core/block.c | 426 +++
 drivers/mmc/core/block.h |  10 +-
 drivers/mmc/core/bus.c   |   1 -
 drivers/mmc/core/core.c  | 228 -
 drivers/mmc/core/core.h  |   2 -
 drivers/mmc/core/host.c  |   2 +-
 drivers/mmc/core/queue.c | 337 ++---
 drivers/mmc/core/queue.h |  21 ++-
 include/linux/mmc/core.h |   9 +-
 include/linux/mmc/host.h |  24 +--
 10 files changed, 481 insertions(+), 579 deletions(-)

-- 
2.9.3



[PATCH 16/16] RFC: mmc: switch MMC/SD to use blk-mq multiqueueing v3

2017-02-09 Thread Linus Walleij
HACK ALERT: DO NOT MERGE THIS! IT IS A FYI PATCH FOR DISCUSSION
ONLY.

This is a totally new implementation of how to do multiqueue
in the MMC/SD stack. It is based on top of my refactorings in the
series which ends with this patch, and now makes proper use of
.init_request() and .exit_request() to initialize the per-request
hardware context, reusing good old struct mmc_queue_req which is
what is actually intended by these functions.

We kill off the kthread that was just calling blk_fetch_request()
and let blk-mq drive all traffic; nice, that is how it should work.

Due to having switched the submission mechanics around so that
the completion of requests is now triggered from the host
callbacks, we manage to keep the same performance for linear
reads/writes as we had with the old block layer.
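
As a rough orientation for readers, the blk-mq hooks this builds on look
like the sketch below. The callback signatures differ between kernel
versions (this follows the form current at the time of writing) and the
names are illustrative, so treat it as a sketch under those assumptions
rather than the actual patch:

	static int mmc_mq_init_request(void *data, struct request *rq,
				       unsigned int hctx_idx,
				       unsigned int request_idx,
				       unsigned int numa_node)
	{
		struct mmc_queue_req *mq_rq = blk_mq_rq_to_pdu(rq);

		/* set up the per-request hardware context (sglists etc.) */
		return 0;
	}

	static int mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
				   const struct blk_mq_queue_data *bd)
	{
		blk_mq_start_request(bd->rq);
		/*
		 * Hand the request to the MMC core; completion comes back
		 * later from the host callback via blk_mq_complete_request().
		 */
		return BLK_MQ_RQ_QUEUE_OK;
	}

	static const struct blk_mq_ops mmc_mq_ops = {
		.queue_rq	= mmc_mq_queue_rq,
		.init_request	= mmc_mq_init_request,
	};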

Some open questions:

- We used to issue mmc_get_card() when the first request came
  in and mmc_put_card() when we got NULL from blk_fetch_request().
  Well, as we are no longer handed any NULL requests, we need
  another way for the queue to tell us it is idle, or we should
  just set up a delayed work and release the card if no new
  requests appear for some time (see the sketch after this list).

- The flush was handled by issuing blk_end_request_all() in
  the old scheduler. Is blk_mq_complete_request() really doing
  the same job, or is there some extra magic needed here?

- We can sometimes get a partial read from an MMC command, meaning
  some of the request has been handled. We know how many bytes
  were read/written. We used to report this to the block layer
  using blk_end_request(old_req, 0, bytes_xfered) but the MQ
  scheduler seems to be missing a command that reports
  partial completion. How do we handle this?
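
Here is the delayed-work sketch referred to in the first bullet above:
release the card only after the queue has been idle for a while.
Everything in it is illustrative (the names, the timeout and the hook
points are made up, and races with a concurrently running work item are
ignored):

	#define MMC_IDLE_TIMEOUT (HZ / 10)	/* made-up idle period */

	static void mmc_mq_idle_work(struct work_struct *work)
	{
		struct mmc_queue *mq = container_of(work, struct mmc_queue,
						    idle_work.work);

		mmc_put_card(mq->card);	/* queue went quiet: release the card */
	}

	/* on each incoming request: */
	if (!cancel_delayed_work(&mq->idle_work))
		mmc_get_card(mq->card);	/* we were idle, claim the card again */
	/* ... issue the request ... */
	/* once nothing is in flight anymore: */
	schedule_delayed_work(&mq->idle_work, MMC_IDLE_TIMEOUT);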

Apart from that my only remaining worries are about the
block scheduler, but I hear Jens and Paolo are working to fix
this.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c |  66 -
 drivers/mmc/core/core.c  |   4 -
 drivers/mmc/core/queue.c | 355 ---
 drivers/mmc/core/queue.h |  15 +-
 4 files changed, 159 insertions(+), 281 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index f1008ce5376b..f977117f7435 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -96,7 +97,6 @@ static DEFINE_SPINLOCK(mmc_blk_lock);
  * There is one mmc_blk_data per slot.
  */
 struct mmc_blk_data {
-   spinlock_t  lock;
struct device   *parent;
struct gendisk  *disk;
struct mmc_queue queue;
@@ -1188,7 +1188,7 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue_req 
*mq_rq)
if (!err)
mmc_blk_reset_success(md, type);
 fail:
-   blk_end_request(mq_rq->req, err, blk_rq_bytes(mq_rq->req));
+   blk_mq_complete_request(mq_rq->req, err);
 }
 
 static void mmc_blk_issue_secdiscard_rq(struct mmc_queue_req *mq_rq)
@@ -1265,7 +1265,8 @@ static void mmc_blk_issue_flush(struct mmc_queue_req 
*mq_rq)
if (ret)
ret = -EIO;
 
-   blk_end_request_all(mq_rq->req, ret);
+   /* FIXME: was using blk_end_request_all() to flush */
+   blk_mq_complete_request(mq_rq->req, ret);
 }
 
 /*
@@ -1589,12 +1590,15 @@ static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, 
struct mmc_card *card,
int err;
 
err = mmc_sd_num_wr_blocks(card, &blocks);
-   if (err)
+   if (err) {
req_pending = old_req_pending;
-   else
-   req_pending = blk_end_request(req, 0, blocks << 9);
+   } else {
+   blk_mq_complete_request(req, 0);
+   req_pending = false;
+   }
} else {
-   req_pending = blk_end_request(req, 0, brq->data.bytes_xfered);
+   blk_mq_complete_request(req, 0);
+   req_pending = false;
}
return req_pending;
 }
@@ -1630,33 +1634,18 @@ static void mmc_blk_rw_try_restart(struct mmc_queue_req 
*mq_rq)
 void mmc_blk_rw_done_success(struct mmc_async_req *areq)
 {
struct mmc_queue_req *mq_rq;
-   struct mmc_blk_request *brq;
struct mmc_blk_data *md;
struct request *old_req;
-   bool req_pending;
int type;
 
mq_rq = container_of(areq, struct mmc_queue_req, areq);
md = mq_rq->mq->blkdata;
-   brq = &mq_rq->brq;
old_req = mq_rq->req;
type = rq_data_dir(old_req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
 
mmc_queue_bounce_post(mq_rq);
mmc_blk_reset_success(md, type);
-   req_pending = blk_end_request(old_req, 0,
- brq->data.bytes_xfered);
-   /*
-* If the blk_end_request function returns non-zero even
-* though all data has been transferred and no errors
- 

[PATCH 13/16] mmc: queue: issue struct mmc_queue_req items

2017-02-09 Thread Linus Walleij
Instead of passing two pointers around and reassigning them
left and right, issue the struct mmc_queue_req and dereference
the queue from the request where needed. The struct mmc_queue_req
is the thing that has a lifecycle, after all: this is what we are
keeping in our queue. Augment all users to be passed the struct
mmc_queue_req as well.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 88 
 drivers/mmc/core/block.h |  5 ++-
 drivers/mmc/core/queue.c |  6 ++--
 3 files changed, 50 insertions(+), 49 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 4952a105780e..628a22b9bf41 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1151,9 +1151,9 @@ int mmc_access_rpmb(struct mmc_queue *mq)
return false;
 }
 
-static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_discard_rq(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_blk_data *md = mq->blkdata;
+   struct mmc_blk_data *md = mq_rq->mq->blkdata;
struct mmc_card *card = md->queue.card;
unsigned int from, nr, arg;
int err = 0, type = MMC_BLK_DISCARD;
@@ -1163,8 +1163,8 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue 
*mq, struct request *req)
goto fail;
}
 
-   from = blk_rq_pos(req);
-   nr = blk_rq_sectors(req);
+   from = blk_rq_pos(mq_rq->req);
+   nr = blk_rq_sectors(mq_rq->req);
 
if (mmc_can_discard(card))
arg = MMC_DISCARD_ARG;
@@ -1188,13 +1188,12 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue 
*mq, struct request *req)
if (!err)
mmc_blk_reset_success(md, type);
 fail:
-   blk_end_request(req, err, blk_rq_bytes(req));
+   blk_end_request(mq_rq->req, err, blk_rq_bytes(mq_rq->req));
 }
 
-static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
-  struct request *req)
+static void mmc_blk_issue_secdiscard_rq(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_blk_data *md = mq->blkdata;
+   struct mmc_blk_data *md = mq_rq->mq->blkdata;
struct mmc_card *card = md->queue.card;
unsigned int from, nr, arg;
int err = 0, type = MMC_BLK_SECDISCARD;
@@ -1204,8 +1203,8 @@ static void mmc_blk_issue_secdiscard_rq(struct mmc_queue 
*mq,
goto out;
}
 
-   from = blk_rq_pos(req);
-   nr = blk_rq_sectors(req);
+   from = blk_rq_pos(mq_rq->req);
+   nr = blk_rq_sectors(mq_rq->req);
 
if (mmc_can_trim(card) && !mmc_erase_group_aligned(card, from, nr))
arg = MMC_SECURE_TRIM1_ARG;
@@ -1253,12 +1252,12 @@ static void mmc_blk_issue_secdiscard_rq(struct 
mmc_queue *mq,
if (!err)
mmc_blk_reset_success(md, type);
 out:
-   blk_end_request(req, err, blk_rq_bytes(req));
+   blk_end_request(mq_rq->req, err, blk_rq_bytes(mq_rq->req));
 }
 
-static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_flush(struct mmc_queue_req *mq_rq)
 {
-   struct mmc_blk_data *md = mq->blkdata;
+   struct mmc_blk_data *md = mq_rq->mq->blkdata;
struct mmc_card *card = md->queue.card;
int ret = 0;
 
@@ -1266,7 +1265,7 @@ static void mmc_blk_issue_flush(struct mmc_queue *mq, 
struct request *req)
if (ret)
ret = -EIO;
 
-   blk_end_request_all(req, ret);
+   blk_end_request_all(mq_rq->req, ret);
 }
 
 /*
@@ -1614,11 +1613,13 @@ static void mmc_blk_rw_cmd_abort(struct mmc_card *card, 
struct request *req)
  * @mq: the queue with the card and host to restart
  * @req: a new request that wants to be started after the current one
  */
-static void mmc_blk_rw_try_restart(struct mmc_queue *mq)
+static void mmc_blk_rw_try_restart(struct mmc_queue_req *mq_rq)
 {
+   struct mmc_queue *mq = mq_rq->mq;
+
/* Proceed and try to restart the current async request */
-   mmc_blk_rw_rq_prep(mq->mqrq_cur, mq->card, 0, mq);
-   mmc_restart_areq(mq->card->host, &mq->mqrq_cur->areq);
+   mmc_blk_rw_rq_prep(mq_rq, mq->card, 0, mq);
+   mmc_restart_areq(mq->card->host, &mq_rq->areq);
 }
 
 void mmc_blk_rw_done(struct mmc_async_req *areq,
@@ -1676,11 +1677,11 @@ void mmc_blk_rw_done(struct mmc_async_req *areq,
req_pending = mmc_blk_rw_cmd_err(md, card, brq, old_req, 
req_pending);
if (mmc_blk_reset(md, host, type)) {
mmc_blk_rw_cmd_abort(card, old_req);
-   mmc_blk_rw_try_restart(mq);
+   mmc_blk_rw_try_restart(mq_rq);
return;
}
if (!req_pending) {
-   mmc_blk_rw_try_restart(mq);
+   mmc_blk_rw_try_restart(mq_rq);
   

[PATCH 15/16] mmc: queue: issue requests in massive parallel

2017-02-09 Thread Linus Walleij
This makes a crucial change to the issuing mechanism for the
MMC requests:

Before commit "mmc: core: move the asynchronous post-processing"
some parallelism on the read/write requests was achieved by
speculatively postprocessing a request, then re-preprocessing and
re-issuing it if something went wrong, which we discover
later when checking for an error.

This is kind of ugly. Instead we need a mechanism like this:

We issue requests, and when they come back from the hardware,
we know if they finished successfully or not. If the request
was successful, we complete the asynchronous request and let a
new request immediately start on the hardware. If, and only if,
it returned an error from the hardware we go down the error
path.

This is achieved by splitting the work path from the hardware
in two: a successful path ending up calling down to
mmc_blk_rw_done_success() and an error path calling down to
mmc_blk_rw_done_error().

This has a profound effect: we reintroduce the parallelism on
the successful path, as mmc_post_req() can now be called
while the next request is in transit (just like prior to
commit "mmc: core: move the asynchronous post-processing")
but ALSO we can call mmc_queue_bounce_post() and
blk_end_request() in parallel.

The latter has the profound effect of issuing a new request
again so that we actually need to have at least three requests
in transit at the same time: we haven't yet dropped the
reference to our struct mmc_queue_req so we need at least
three. I set the pool to 4 requests for now.

I expect the improvement to be noticeable on systems that use
bounce buffers since they can now process requests in parallel
with post-processing their bounce buffers, but I don't have a
test target for that.
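
The split can be pictured as a small dispatcher in the finalization step
(illustrative shorthand; the actual call sites are in the core-layer
part of this patch):

	if (status == MMC_BLK_SUCCESS)
		/* fast path: complete the request, slot is free for reuse */
		mmc_blk_rw_done_success(areq);
	else
		/* slow path: retry, reset, single-block fallback etc. */
		mmc_blk_rw_done_error(areq, status);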

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 61 +---
 drivers/mmc/core/block.h |  4 +++-
 drivers/mmc/core/core.c  | 27 ++---
 drivers/mmc/core/queue.c |  2 +-
 4 files changed, 75 insertions(+), 19 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index acca15cc1807..f1008ce5376b 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1622,8 +1622,51 @@ static void mmc_blk_rw_try_restart(struct mmc_queue_req 
*mq_rq)
mmc_restart_areq(mq->card->host, &mq_rq->areq);
 }
 
-void mmc_blk_rw_done(struct mmc_async_req *areq,
-enum mmc_blk_status status)
+/**
+ * Final handling of an asynchronous request if there was no error.
+ * This is the common path that we take when everything is nice
+ * and smooth. The status from the command is always MMC_BLK_SUCCESS.
+ */
+void mmc_blk_rw_done_success(struct mmc_async_req *areq)
+{
+   struct mmc_queue_req *mq_rq;
+   struct mmc_blk_request *brq;
+   struct mmc_blk_data *md;
+   struct request *old_req;
+   bool req_pending;
+   int type;
+
+   mq_rq = container_of(areq, struct mmc_queue_req, areq);
+   md = mq_rq->mq->blkdata;
+   brq = &mq_rq->brq;
+   old_req = mq_rq->req;
+   type = rq_data_dir(old_req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+
+   mmc_queue_bounce_post(mq_rq);
+   mmc_blk_reset_success(md, type);
+   req_pending = blk_end_request(old_req, 0,
+ brq->data.bytes_xfered);
+   /*
+* If the blk_end_request function returns non-zero even
+* though all data has been transferred and no errors
+* were returned by the host controller, it's a bug.
+*/
+   if (req_pending) {
+   pr_err("%s BUG rq_tot %d d_xfer %d\n",
+  __func__, blk_rq_bytes(old_req),
+  brq->data.bytes_xfered);
+   return;
+   }
+}
+
+/**
+ * Error, recapture, retry etc for asynchronous requests.
+ * This is the error path that we take when there is bad status
+ * coming back from the hardware and we need to do a bit of
+ * cleverness.
+ */
+void mmc_blk_rw_done_error(struct mmc_async_req *areq,
+  enum mmc_blk_status status)
 {
struct mmc_queue *mq;
struct mmc_queue_req *mq_rq;
@@ -1652,6 +1695,8 @@ void mmc_blk_rw_done(struct mmc_async_req *areq,
 
switch (status) {
case MMC_BLK_SUCCESS:
+   pr_err("%s: MMC_BLK_SUCCESS on error path\n", __func__);
+   /* This should not happen: anyway fall through */
case MMC_BLK_PARTIAL:
/*
 * A block was successfully transferred.
@@ -1660,18 +1705,6 @@ void mmc_blk_rw_done(struct mmc_async_req *areq,
 
req_pending = blk_end_request(old_req, 0,
  brq->data.bytes_xfered);
-   /*
-* If the blk_end_request function returns non-zero even
-* though all data has been 

[PATCH 14/16] mmc: queue: get/put struct mmc_queue_req

2017-02-09 Thread Linus Walleij
The per-hardware-transaction struct mmc_queue_req is assigned
from a pool of 2 requests using a current/previous scheme and
then swapped around.

This is confusing, especially if we need more than two to make
our work efficient and parallel.

Rewrite the mechanism to have a pool of struct mmc_queue_req
and take one when we need one and put it back when we don't
need it anymore.
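
The resulting usage pattern is plain get/put bracketing, roughly like
this (a sketch; the real call sites are spread over the queue thread and
the finalization worker in the diff below):

	struct mmc_queue_req *mq_rq = mmc_queue_req_get(mq);	/* may spin */

	mq_rq->req = req;
	mmc_blk_issue_rq(mq_rq);
	/* ... and when the transfer is finalized: */
	mmc_queue_req_put(mq_rq);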

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c |  3 +++
 drivers/mmc/core/core.c  |  4 
 drivers/mmc/core/queue.c | 57 ++--
 drivers/mmc/core/queue.h |  8 ---
 4 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 628a22b9bf41..acca15cc1807 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1797,6 +1797,7 @@ void mmc_blk_issue_rq(struct mmc_queue_req *mq_rq)
card->host->areq = NULL;
}
mmc_blk_issue_discard_rq(mq_rq);
+   mmc_queue_req_put(mq_rq);
} else if (req_op(mq_rq->req) == REQ_OP_SECURE_ERASE) {
/* complete ongoing async transfer before issuing secure erase*/
if (card->host->areq) {
@@ -1804,6 +1805,7 @@ void mmc_blk_issue_rq(struct mmc_queue_req *mq_rq)
card->host->areq = NULL;
}
mmc_blk_issue_secdiscard_rq(mq_rq);
+   mmc_queue_req_put(mq_rq);
} else if (req_op(mq_rq->req) == REQ_OP_FLUSH) {
/* complete ongoing async transfer before issuing flush */
if (card->host->areq) {
@@ -1811,6 +1813,7 @@ void mmc_blk_issue_rq(struct mmc_queue_req *mq_rq)
card->host->areq = NULL;
}
mmc_blk_issue_flush(mq_rq);
+   mmc_queue_req_put(mq_rq);
} else {
mmc_blk_issue_rw_rq(mq_rq);
}
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 03c290e5e2c9..50a8942b98c2 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -39,6 +39,7 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+#include "queue.h"
 #include "block.h"
 #include "core.h"
 #include "card.h"
@@ -598,6 +599,8 @@ void mmc_finalize_areq(struct kthread_work *work)
 {
struct mmc_async_req *areq =
container_of(work, struct mmc_async_req, finalization_work);
+   struct mmc_queue_req *mq_rq = container_of(areq, struct mmc_queue_req,
+  areq);
struct mmc_host *host = areq->host;
enum mmc_blk_status status = MMC_BLK_SUCCESS;
struct mmc_command *cmd;
@@ -636,6 +639,7 @@ void mmc_finalize_areq(struct kthread_work *work)
mmc_blk_rw_done(areq, status);
 
complete(&areq->complete);
+   mmc_queue_req_put(mq_rq);
 }
 EXPORT_SYMBOL(mmc_finalize_areq);
 
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index c4e1ced55796..cab0f51dbb4d 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -49,6 +49,42 @@ static int mmc_prep_request(struct request_queue *q, struct 
request *req)
return BLKPREP_OK;
 }
 
+/**
+ * Get an available queue item from the pool.
+ */
+static struct mmc_queue_req *mmc_queue_req_get(struct mmc_queue *mq)
+{
+   int i;
+
+   /*
+* This simply cannot fail so we just spin here
+* until we get a queue request to work on.
+*/
+   while (1) {
+   /* Just take the first unused queue request */
+   spin_lock_irq(&mq->mqrq_lock);
+   for (i = 0; i < mq->qdepth; i++) {
+   if (!mq->mqrq[i].in_use) {
+   mq->mqrq[i].in_use = true;
+   spin_unlock_irq(&mq->mqrq_lock);
+   return &mq->mqrq[i];
+   }
+   }
+   spin_unlock_irq(&mq->mqrq_lock);
+
+   pr_warn_once("%s: out of queue items, spinning\n", __func__);
+   }
+}
+
+void mmc_queue_req_put(struct mmc_queue_req *mq_rq)
+{
+   mq_rq->brq.mrq.data = NULL;
+   mq_rq->req = NULL;
+   spin_lock_irq(&mq_rq->mq->mqrq_lock);
+   mq_rq->in_use = false;
+   spin_unlock_irq(&mq_rq->mq->mqrq_lock);
+}
+
 static int mmc_queue_thread(void *d)
 {
struct mmc_queue *mq = d;
@@ -62,17 +98,17 @@ static int mmc_queue_thread(void *d)
do {
struct request *req = NULL;
 
-   spin_lock_irq(q->queue_lock);
set_current_state(TASK_INTERRUPTIBLE);
+   spin_lock_irq(q->queue_lock);
req = blk_fetch_request(q);
-   mq->asleep = false;
-   mq_rq = mq->mqrq_cur;
-   

[PATCH 11/16] mmc: block: shuffle retry and error handling

2017-02-09 Thread Linus Walleij
Instead of doing retries at the same time as trying to submit new
requests, do the retries when the request is reported as completed
by the driver, in the finalization worker.

This is achieved by letting the core worker call back into the block
layer using mmc_blk_rw_done(), which will read the status and repeatedly
try to hammer the request, using single-block requests etc., by calling
back into the core layer using mmc_restart_areq().

The beauty of it is that the completion will not complete until the
block layer has had the opportunity to hammer a bit at the card using
a bunch of different approaches in the while() loop in
mmc_blk_rw_done().

The algorithm for recapture, retry and error handling is essentially
identical to the one we used to have in mmc_blk_issue_rw_rq(),
only augmented to get called in another path.

We have to add and initialize a pointer back to the struct mmc_queue
from the struct mmc_queue_req to find the queue from the asynchronous
request.
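
The new flow, in rough outline (illustrative, condensed from the diff
below):

	/*
	 * host driver completes the mrq
	 *   -> mmc_request_done()
	 *     -> finalization worker runs mmc_finalize_areq()
	 *       -> mmc_blk_rw_done(areq, status)
	 *            on success: blk_end_request(), we are done
	 *            on error:   mmc_blk_rw_rq_prep() + mmc_restart_areq(),
	 *                        and the completion is held back until the
	 *                        retries succeed or are exhausted
	 */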

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 307 +++
 drivers/mmc/core/block.h |   3 +
 drivers/mmc/core/core.c  |  23 +++-
 drivers/mmc/core/queue.c |   2 +
 drivers/mmc/core/queue.h |   1 +
 include/linux/mmc/core.h |   1 +
 include/linux/mmc/host.h |   1 -
 7 files changed, 177 insertions(+), 161 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index c459d80c66bf..0bd9070f5f2e 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1614,182 +1614,181 @@ static void mmc_blk_rw_cmd_abort(struct mmc_card 
*card, struct request *req)
  * @mq: the queue with the card and host to restart
  * @req: a new request that wants to be started after the current one
  */
-static void mmc_blk_rw_try_restart(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_rw_try_restart(struct mmc_queue *mq)
 {
-   if (!req)
-   return;
-
-   /*
-* If the card was removed, just cancel everything and return.
-*/
-   if (mmc_card_removed(mq->card)) {
-   req->rq_flags |= RQF_QUIET;
-   blk_end_request_all(req, -EIO);
-   return;
-   }
-   /* Else proceed and try to restart the current async request */
+   /* Proceed and try to restart the current async request */
mmc_blk_rw_rq_prep(mq->mqrq_cur, mq->card, 0, mq);
-   mmc_start_areq(mq->card->host, &mq->mqrq_cur->areq, NULL);
+   mmc_restart_areq(mq->card->host, &mq->mqrq_cur->areq);
 }
 
-static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
+void mmc_blk_rw_done(struct mmc_async_req *areq,
+enum mmc_blk_status status)
 {
-   struct mmc_blk_data *md = mq->blkdata;
-   struct mmc_card *card = md->queue.card;
-   struct mmc_blk_request *brq;
-   int disable_multi = 0, retry = 0, type, retune_retry_done = 0;
-   enum mmc_blk_status status;
+   struct mmc_queue *mq;
struct mmc_queue_req *mq_rq;
+   struct mmc_blk_request *brq;
+   struct mmc_blk_data *md;
struct request *old_req;
-   struct mmc_async_req *new_areq;
-   struct mmc_async_req *old_areq;
+   struct mmc_card *card;
+   struct mmc_host *host;
+   int disable_multi = 0, retry = 0, type, retune_retry_done = 0;
bool req_pending = true;
 
-   if (!new_req && !mq->mqrq_prev->req)
-   return;
-
-   do {
-   if (new_req) {
-   /*
-* When 4KB native sector is enabled, only 8 blocks
-* multiple read or write is allowed
-*/
-   if (mmc_large_sector(card) &&
-   !IS_ALIGNED(blk_rq_sectors(new_req), 8)) {
-   pr_err("%s: Transfer size is not 4KB sector 
size aligned\n",
-   new_req->rq_disk->disk_name);
-   mmc_blk_rw_cmd_abort(card, new_req);
-   return;
-   }
-
-   mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
-   new_areq = &mq->mqrq_cur->areq;
-   } else
-   new_areq = NULL;
-
-   old_areq = mmc_start_areq(card->host, new_areq, &status);
-   if (!old_areq) {
-   /*
-* We have just put the first request into the pipeline
-* and there is nothing more to do until it is
-* complete.
-*/
-   return;
-   }
-
+   /*
+* An asynchronous request has been completed and we proceed
+* to handle the result of it.
+*/
+   mq_rq = container_of(areq, struct mmc_queue_req,

[PATCH 01/16] mmc: core: move some code in mmc_start_areq()

2017-02-09 Thread Linus Walleij
"previous" is a better name for the variable storing the previous
asynchronous request, better than the opaque name "data" atleast.
We see that we assign the return status to the returned variable
on all code paths, so we might as well just do that immediately
after calling mmc_finalize_areq().

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 41b4cd01fccc..53065d1cebf7 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -683,7 +683,7 @@ struct mmc_async_req *mmc_start_areq(struct mmc_host *host,
 {
enum mmc_blk_status status;
int start_err = 0;
-   struct mmc_async_req *data = host->areq;
+   struct mmc_async_req *previous = host->areq;
 
/* Prepare a new request */
if (areq)
@@ -691,13 +691,12 @@ struct mmc_async_req *mmc_start_areq(struct mmc_host 
*host,
 
/* Finalize previous request */
status = mmc_finalize_areq(host);
+   if (ret_stat)
+   *ret_stat = status;
 
/* The previous request is still going on... */
-   if (status == MMC_BLK_NEW_REQUEST) {
-   if (ret_stat)
-   *ret_stat = status;
+   if (status == MMC_BLK_NEW_REQUEST)
return NULL;
-   }
 
/* Fine so far, start the new request! */
if (status == MMC_BLK_SUCCESS && areq)
@@ -716,9 +715,7 @@ struct mmc_async_req *mmc_start_areq(struct mmc_host *host,
else
host->areq = areq;
 
-   if (ret_stat)
-   *ret_stat = status;
-   return data;
+   return previous;
 }
 EXPORT_SYMBOL(mmc_start_areq);
 
-- 
2.9.3



[PATCH 05/16] mmc: core: add a kthread for completing requests

2017-02-09 Thread Linus Walleij
As we want to complete requests independently of feeding the
host with new requests, we create a worker thread to deal with
this specifically, in response to the callback from a host driver.

This patch just adds the worker; later patches will make use of
it.
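
For readers unfamiliar with the kthread_worker API, the life cycle used
here is roughly the following (a generic example, not the patch itself;
my_work_fn and the surrounding names are hypothetical):

	struct kthread_worker worker;
	struct kthread_work work;
	struct task_struct *task;

	kthread_init_worker(&worker);
	task = kthread_run(kthread_worker_fn, &worker, "mmc%d-reqdone", index);
	kthread_init_work(&work, my_work_fn);

	kthread_queue_work(&worker, &work);	/* hand off a work item */

	kthread_flush_worker(&worker);		/* wait for outstanding work */
	kthread_stop(task);			/* tear the thread down */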

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c  | 6 ++
 drivers/mmc/core/host.c  | 2 +-
 include/linux/mmc/host.h | 5 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 0972c649ea7a..663799240635 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2794,6 +2794,10 @@ void mmc_start_host(struct mmc_host *host)
host->f_init = max(freqs[0], host->f_min);
host->rescan_disable = 0;
host->ios.power_mode = MMC_POWER_UNDEFINED;
+   /* Worker for completing requests */
+   host->req_done_worker_task = kthread_run(kthread_worker_fn,
+ &host->req_done_worker,
+ "mmc%d-reqdone", host->index);
 
if (!(host->caps2 & MMC_CAP2_NO_PRESCAN_POWERUP)) {
mmc_claim_host(host);
@@ -2818,6 +2822,8 @@ void mmc_stop_host(struct mmc_host *host)
 
host->rescan_disable = 1;
cancel_delayed_work_sync(&host->detect);
+   kthread_flush_worker(&host->req_done_worker);
+   kthread_stop(host->req_done_worker_task);
 
/* clear pm flags now and let card drivers set them as needed */
host->pm_flags = 0;
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 98f25ffb4258..d33e2b260bf3 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -388,7 +388,7 @@ struct mmc_host *mmc_alloc_host(int extra, struct device 
*dev)
init_waitqueue_head(&host->wq);
INIT_DELAYED_WORK(&host->detect, mmc_rescan);
setup_timer(&host->retune_timer, mmc_retune_timer, (unsigned long)host);
-
+   kthread_init_worker(&host->req_done_worker);
/*
 * By default, hosts do not support SGIO or large requests.
 * They have to set these according to their abilities.
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 97699d55b2ae..b04f8cd51c82 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -375,6 +376,10 @@ struct mmc_host {
struct mmc_async_req*areq;  /* active async req */
struct mmc_context_info context_info;   /* async synchronization info */
 
+   /* finalization work thread, handles finalizing requests */
+   struct kthread_worker   req_done_worker;
+   struct task_struct  *req_done_worker_task;
+
/* Ongoing data transfer that allows commands during transfer */
struct mmc_request  *ongoing_mrq;
 
-- 
2.9.3



[PATCH 07/16] mmc: core: do away with is_done_rcv

2017-02-09 Thread Linus Walleij
The "is_done_rcv" in the context info for the host is no longer
needed: it is clear from context (ha!) that as long as we are
waiting for the asynchronous request to come to completion,
we are not done receiving data, and when the finalization work
has run and completed the completion, we are indeed done.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c  | 40 
 include/linux/mmc/host.h |  2 --
 2 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 8ecf61e51662..fcb40ade9b82 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -416,10 +416,8 @@ EXPORT_SYMBOL(mmc_start_bkops);
 static void mmc_wait_data_done(struct mmc_request *mrq)
 {
struct mmc_host *host = mrq->host;
-   struct mmc_context_info *context_info = &host->context_info;
struct mmc_async_req *areq = host->areq;
 
-   context_info->is_done_rcv = true;
/* Schedule a work to deal with finalizing this request */
kthread_queue_work(&host->req_done_worker, &areq->finalization_work);
 }
@@ -551,7 +549,7 @@ EXPORT_SYMBOL(mmc_wait_for_req_done);
 bool mmc_is_req_done(struct mmc_host *host, struct mmc_request *mrq)
 {
if (host->areq)
-   return host->context_info.is_done_rcv;
+   return completion_done(&host->areq->complete);
else
return completion_done(&mrq->completion);
 }
@@ -600,29 +598,24 @@ void mmc_finalize_areq(struct kthread_work *work)
struct mmc_async_req *areq =
container_of(work, struct mmc_async_req, finalization_work);
struct mmc_host *host = areq->host;
-   struct mmc_context_info *context_info = &host->context_info;
enum mmc_blk_status status = MMC_BLK_SUCCESS;
+   struct mmc_command *cmd;
 
-   if (context_info->is_done_rcv) {
-   struct mmc_command *cmd;
-
-   context_info->is_done_rcv = false;
-   cmd = areq->mrq->cmd;
+   cmd = areq->mrq->cmd;
 
-   if (!cmd->error || !cmd->retries ||
-   mmc_card_removed(host->card)) {
-   status = areq->err_check(host->card,
-areq);
-   } else {
-   mmc_retune_recheck(host);
-   pr_info("%s: req failed (CMD%u): %d, retrying...\n",
-   mmc_hostname(host),
-   cmd->opcode, cmd->error);
-   cmd->retries--;
-   cmd->error = 0;
-   __mmc_start_request(host, areq->mrq);
-   return; /* wait for done/new event again */
-   }
+   if (!cmd->error || !cmd->retries ||
+   mmc_card_removed(host->card)) {
+   status = areq->err_check(host->card,
+areq);
+   } else {
+   mmc_retune_recheck(host);
+   pr_info("%s: req failed (CMD%u): %d, retrying...\n",
+   mmc_hostname(host),
+   cmd->opcode, cmd->error);
+   cmd->retries--;
+   cmd->error = 0;
+   __mmc_start_request(host, areq->mrq);
+   return; /* wait for done/new event again */
}
 
mmc_retune_release(host);
@@ -2993,7 +2986,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host)
 void mmc_init_context_info(struct mmc_host *host)
 {
host->context_info.is_new_req = false;
-   host->context_info.is_done_rcv = false;
host->context_info.is_waiting_last_req = false;
 }
 
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index c5f61f2f2310..cbb40682024a 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -194,12 +194,10 @@ struct mmc_slot {
 
 /**
  * mmc_context_info - synchronization details for mmc context
- * @is_done_rcv	wake up reason was done request
  * @is_new_req	wake up reason was new request
  * @is_waiting_last_req	mmc context waiting for single running request
  */
 struct mmc_context_info {
-	bool		is_done_rcv;
 	bool		is_new_req;
 	bool		is_waiting_last_req;
 };
-- 
2.9.3



[PATCH 09/16] mmc: core: kill off the context info

2017-02-09 Thread Linus Walleij
The last member of the context info, is_waiting_last_req, is
just assigned values, never checked. Delete it, and with it the
whole context info.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/bus.c   |  1 -
 drivers/mmc/core/core.c  | 13 -
 drivers/mmc/core/core.h  |  2 --
 drivers/mmc/core/queue.c |  9 +
 include/linux/mmc/host.h |  9 -
 5 files changed, 1 insertion(+), 33 deletions(-)

diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
index 301246513a37..22ed11ac961b 100644
--- a/drivers/mmc/core/bus.c
+++ b/drivers/mmc/core/bus.c
@@ -348,7 +348,6 @@ int mmc_add_card(struct mmc_card *card)
 #ifdef CONFIG_DEBUG_FS
mmc_add_card_debugfs(card);
 #endif
-   mmc_init_context_info(card->host);
 
card->dev.of_node = mmc_of_find_child_device(card->host, 0);
 
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 933a4d1f20d5..4b84f18518ac 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2975,19 +2975,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host)
 }
 #endif
 
-/**
- * mmc_init_context_info() - init synchronization context
- * @host: mmc host
- *
- * Init struct context_info needed to implement asynchronous
- * request mechanism, used by mmc core, host driver and mmc requests
- * supplier.
- */
-void mmc_init_context_info(struct mmc_host *host)
-{
-   host->context_info.is_waiting_last_req = false;
-}
-
 static int __init mmc_init(void)
 {
int ret;
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 8a95c82554be..620bea373c3a 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -90,8 +90,6 @@ void mmc_remove_host_debugfs(struct mmc_host *host);
 void mmc_add_card_debugfs(struct mmc_card *card);
 void mmc_remove_card_debugfs(struct mmc_card *card);
 
-void mmc_init_context_info(struct mmc_host *host);
-
 int mmc_execute_tuning(struct mmc_card *card);
 int mmc_hs200_to_hs400(struct mmc_card *card);
 int mmc_hs400_to_hs200(struct mmc_card *card);
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 63927ffd6825..a845fe8d4fd1 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -53,7 +53,6 @@ static int mmc_queue_thread(void *d)
 {
struct mmc_queue *mq = d;
struct request_queue *q = mq->queue;
-   struct mmc_context_info *cntx = &mq->card->host->context_info;
 
current->flags |= PF_MEMALLOC;
 
@@ -65,15 +64,12 @@ static int mmc_queue_thread(void *d)
set_current_state(TASK_INTERRUPTIBLE);
req = blk_fetch_request(q);
mq->asleep = false;
-   cntx->is_waiting_last_req = false;
if (!req) {
/*
 * Dispatch queue is empty so set flags for
 * mmc_request_fn() to wake us up.
 */
-   if (mq->mqrq_prev->req)
-   cntx->is_waiting_last_req = true;
-   else
+   if (!mq->mqrq_prev->req)
mq->asleep = true;
}
mq->mqrq_cur->req = req;
@@ -123,7 +119,6 @@ static void mmc_request_fn(struct request_queue *q)
 {
struct mmc_queue *mq = q->queuedata;
struct request *req;
-   struct mmc_context_info *cntx;
 
if (!mq) {
while ((req = blk_fetch_request(q)) != NULL) {
@@ -133,8 +128,6 @@ static void mmc_request_fn(struct request_queue *q)
return;
}
 
-   cntx = &mq->card->host->context_info;
-
if (mq->asleep)
wake_up_process(mq->thread);
 }
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 970d7f9b1eba..a7c0ed887391 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -192,14 +192,6 @@ struct mmc_slot {
void *handler_priv;
 };
 
-/**
- * mmc_context_info - synchronization details for mmc context
- * @is_waiting_last_req	mmc context waiting for single running request
- */
-struct mmc_context_info {
-	bool		is_waiting_last_req;
-};
-
 struct regulator;
 struct mmc_pwrseq;
 
@@ -373,7 +365,6 @@ struct mmc_host {
struct dentry   *debugfs_root;
 
struct mmc_async_req*areq;  /* active async req */
-   struct mmc_context_info context_info;   /* async synchronization info */
 
/* finalization work thread, handles finalizing requests */
struct kthread_worker   req_done_worker;
-- 
2.9.3



[PATCH 10/16] mmc: queue: simplify queue logic

2017-02-09 Thread Linus Walleij
The if() statement checking if there is no current or previous
request is now just looking ahead at something that will be
concluded a few lines below. Simplify the logic by moving the
assignment of .asleep.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/queue.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index a845fe8d4fd1..bc116709c806 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -64,14 +64,6 @@ static int mmc_queue_thread(void *d)
set_current_state(TASK_INTERRUPTIBLE);
req = blk_fetch_request(q);
mq->asleep = false;
-   if (!req) {
-   /*
-* Dispatch queue is empty so set flags for
-* mmc_request_fn() to wake us up.
-*/
-   if (!mq->mqrq_prev->req)
-   mq->asleep = true;
-   }
mq->mqrq_cur->req = req;
spin_unlock_irq(q->queue_lock);
 
@@ -95,6 +87,7 @@ static int mmc_queue_thread(void *d)
mq->mqrq_prev->req = NULL;
swap(mq->mqrq_prev, mq->mqrq_cur);
} else {
+   mq->asleep = true;
if (kthread_should_stop()) {
set_current_state(TASK_RUNNING);
break;
-- 
2.9.3



[PATCH 06/16] mmc: core: replace waitqueue with worker

2017-02-09 Thread Linus Walleij
The waitqueue in the host context is there to signal back from
mmc_request_done() through mmc_wait_data_done() that the hardware
is done with a command, and when the wait is over, the core
will typically submit the next asynchronous request that is pending
just waiting for the hardware to be available.

This is in the way of letting mmc_request_done() trigger the
report up to the block layer that a block request is finished.

Re-jig this as a first step, removing the waitqueue and introducing
a work that will run after a completed asynchronous request,
finalizing that request, including retransmissions, and eventually
reporting back with a completion and a status code to the
asynchronous issue method.

This has the upside that we can remove the MMC_BLK_NEW_REQUEST
status code and the "new_request" state in the request queue
that is only there to make the state machine spin out
the first time we send a request.

Introduce a workqueue in the host for handling just this, and
then a work and completion in the asynchronous request to deal
with this mechanism.

This is a central change that let us do many other changes since
we have broken the submit and complete code paths in two, and we
can potentially remove the NULL flushing of the asynchronous
pipeline and report block requests as finished directly from
the worker.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c |  7 ++--
 drivers/mmc/core/core.c  | 84 +++-
 drivers/mmc/core/queue.c |  6 
 drivers/mmc/core/queue.h |  1 -
 include/linux/mmc/core.h |  3 +-
 include/linux/mmc/host.h |  7 ++--
 6 files changed, 51 insertions(+), 57 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index c49c90dba839..c459d80c66bf 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1562,6 +1562,8 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 
mqrq->areq.mrq = &brq->mrq;
mqrq->areq.err_check = mmc_blk_err_check;
+   mqrq->areq.host = card->host;
+   kthread_init_work(&mqrq->areq.finalization_work, mmc_finalize_areq);
 
mmc_queue_bounce_pre(mqrq);
 }
@@ -1672,8 +1674,6 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *new_req)
 * and there is nothing more to do until it is
 * complete.
 */
-   if (status == MMC_BLK_NEW_REQUEST)
-   mq->new_request = true;
return;
}
 
@@ -1811,7 +1811,6 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct 
request *req)
goto out;
}
 
-   mq->new_request = false;
if (req && req_op(req) == REQ_OP_DISCARD) {
/* complete ongoing async transfer before issuing discard */
if (card->host->areq)
@@ -1832,7 +1831,7 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct 
request *req)
}
 
 out:
-   if ((!req && !mq->new_request) || req_is_special)
+   if (!req || req_is_special)
/*
 * Release host when there are no more requests
 * and after special request(discard, flush) is done.
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 663799240635..8ecf61e51662 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -415,10 +415,13 @@ EXPORT_SYMBOL(mmc_start_bkops);
  */
 static void mmc_wait_data_done(struct mmc_request *mrq)
 {
-   struct mmc_context_info *context_info = &mrq->host->context_info;
+   struct mmc_host *host = mrq->host;
+   struct mmc_context_info *context_info = &host->context_info;
+   struct mmc_async_req *areq = host->areq;
 
context_info->is_done_rcv = true;
-   wake_up_interruptible(&context_info->wait);
+   /* Schedule a work to deal with finalizing this request */
+   kthread_queue_work(&host->req_done_worker, &areq->finalization_work);
 }
 
 static void mmc_wait_done(struct mmc_request *mrq)
@@ -592,43 +595,34 @@ static void mmc_post_req(struct mmc_host *host, struct 
mmc_request *mrq,
  * will return MMC_BLK_SUCCESS if no request was
  * going on.
  */
-static enum mmc_blk_status mmc_finalize_areq(struct mmc_host *host)
+void mmc_finalize_areq(struct kthread_work *work)
 {
+   struct mmc_async_req *areq =
+   container_of(work, struct mmc_async_req, finalization_work);
+   struct mmc_host *host = areq->host;
struct mmc_context_info *context_info = &host->context_info;
-   enum mmc_blk_status status;
-
-   if (!host->areq)
-   return MMC_BLK_SUCCESS;
-
-   while (1) {
-   wait_event_interruptible(context_info->wait,
-   (context_info->is_done_rcv ||
-   

[PATCH 12/16] mmc: queue: stop flushing the pipeline with NULL

2017-02-09 Thread Linus Walleij
Remove all the pipeline flushing: i.e. repeatedly sending NULL
down to the core layer to flush out asynchronous requests,
and also sending NULL after "special" commands to achieve the
same flush.

Instead: let the "special" commands wait for any ongoing
asynchronous transfers using the completion, and apart from
that expect the core.c and block.c layers to deal with the
ongoing requests autonomously without any "push" from the
queue.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/block.c | 80 +---
 drivers/mmc/core/core.c  | 37 ++
 drivers/mmc/core/queue.c | 18 ---
 include/linux/mmc/core.h |  5 ++-
 4 files changed, 60 insertions(+), 80 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 0bd9070f5f2e..4952a105780e 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1753,42 +1753,27 @@ void mmc_blk_rw_done(struct mmc_async_req *areq,
 
 static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 {
-   enum mmc_blk_status status;
-   struct mmc_async_req *new_areq;
-   struct mmc_async_req *old_areq;
struct mmc_card *card = mq->card;
 
-   if (!new_req && !mq->mqrq_prev->req)
+   if (!new_req) {
+   pr_err("%s: NULL request!\n", __func__);
return;
+   }
 
-   if (new_req) {
-   /*
-* When 4KB native sector is enabled, only 8 blocks
-* multiple read or write is allowed
-*/
-   if (mmc_large_sector(card) &&
-   !IS_ALIGNED(blk_rq_sectors(new_req), 8)) {
-   pr_err("%s: Transfer size is not 4KB sector size 
aligned\n",
-  new_req->rq_disk->disk_name);
-   mmc_blk_rw_cmd_abort(card, new_req);
-   return;
-   }
-
-   mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
-   new_areq = &mq->mqrq_cur->areq;
-   } else
-   new_areq = NULL;
-
-   old_areq = mmc_start_areq(card->host, new_areq, &status);
-   if (!old_areq) {
-   /*
-* We have just put the first request into the pipeline
-* and there is nothing more to do until it is
-* complete.
-*/
+   /*
+* When 4KB native sector is enabled, only 8 blocks
+* multiple read or write is allowed
+*/
+   if (mmc_large_sector(card) &&
+   !IS_ALIGNED(blk_rq_sectors(new_req), 8)) {
+   pr_err("%s: Transfer size is not 4KB sector size aligned\n",
+  new_req->rq_disk->disk_name);
+   mmc_blk_rw_cmd_abort(card, new_req);
return;
}
-   /* FIXME: yes, we just disregard the old_areq */
+
+   mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
+   mmc_start_areq(card->host, &mq->mqrq_cur->areq);
 }
 
 void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
@@ -1796,48 +1781,39 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct 
request *req)
int ret;
struct mmc_blk_data *md = mq->blkdata;
struct mmc_card *card = md->queue.card;
-   bool req_is_special = mmc_req_is_special(req);
-
-   if (req && !mq->mqrq_prev->req)
-   /* claim host only for the first request */
-   mmc_get_card(card);
 
ret = mmc_blk_part_switch(card, md);
if (ret) {
if (req) {
blk_end_request_all(req, -EIO);
}
-   goto out;
+   return;
}
 
if (req && req_op(req) == REQ_OP_DISCARD) {
/* complete ongoing async transfer before issuing discard */
-   if (card->host->areq)
-   mmc_blk_issue_rw_rq(mq, NULL);
+   if (card->host->areq) {
+   wait_for_completion(&card->host->areq->complete);
+   card->host->areq = NULL;
+   }
mmc_blk_issue_discard_rq(mq, req);
} else if (req && req_op(req) == REQ_OP_SECURE_ERASE) {
/* complete ongoing async transfer before issuing secure erase*/
-   if (card->host->areq)
-   mmc_blk_issue_rw_rq(mq, NULL);
+   if (card->host->areq) {
+   wait_for_completion(&card->host->areq->complete);
+   card->host->areq = NULL;
+   }
mmc_blk_issue_secdiscard_rq(mq, req);
} else if (req && req_op(req) == REQ_OP_FLUSH) {
/* complete ongoing async transfer before issuing flu

[PATCH 08/16] mmc: core: do away with is_new_req

2017-02-09 Thread Linus Walleij
The host context member "is_new_req" is only assigned values,
never checked. Delete it.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c  | 1 -
 drivers/mmc/core/queue.c | 5 -
 include/linux/mmc/host.h | 2 --
 3 files changed, 8 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index fcb40ade9b82..933a4d1f20d5 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2985,7 +2985,6 @@ void mmc_unregister_pm_notifier(struct mmc_host *host)
  */
 void mmc_init_context_info(struct mmc_host *host)
 {
-   host->context_info.is_new_req = false;
host->context_info.is_waiting_last_req = false;
 }
 
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 73250ed8f093..63927ffd6825 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -66,7 +66,6 @@ static int mmc_queue_thread(void *d)
req = blk_fetch_request(q);
mq->asleep = false;
cntx->is_waiting_last_req = false;
-   cntx->is_new_req = false;
if (!req) {
/*
 * Dispatch queue is empty so set flags for
@@ -136,10 +135,6 @@ static void mmc_request_fn(struct request_queue *q)
 
cntx = &mq->card->host->context_info;
 
-   if (cntx->is_waiting_last_req) {
-   cntx->is_new_req = true;
-   }
-
if (mq->asleep)
wake_up_process(mq->thread);
 }
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index cbb40682024a..970d7f9b1eba 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -194,11 +194,9 @@ struct mmc_slot {
 
 /**
  * mmc_context_info - synchronization details for mmc context
- * @is_new_req	wake up reason was new request
  * @is_waiting_last_req	mmc context waiting for single running request
  */
 struct mmc_context_info {
-	bool		is_new_req;
 	bool		is_waiting_last_req;
 };
 
-- 
2.9.3



[PATCH 04/16] mmc: core: move the asynchronous post-processing

2017-02-09 Thread Linus Walleij
This moves the asynchronous post-processing of a request over
to the finalization function.

The patch has a slight semantic change:

Both places will be in the code path for if (host->areq) and
in the same sequence, but before this patch, the next request
was started before performing post-processing.

The effect is that whereas before, the post- and preprocessing
happened after starting the next request, now the preprocessing
will happen after the request is done and before the next has
started, which would cut half of the pre/post optimizations out.

The idea is to later move the finalization to a worker started
by mmc_request_done() and introduce a completion where the code
now has a TODO comment so that we can push in a new request
as soon as the host has completed the previous one.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 8dbed198750f..0972c649ea7a 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -643,6 +643,9 @@ static enum mmc_blk_status mmc_finalize_areq(struct 
mmc_host *host)
mmc_start_bkops(host->card, true);
}
 
+   /* Successfully postprocess the old request at this point */
+   mmc_post_req(host, host->areq->mrq, 0);
+
return status;
 }
 
@@ -687,10 +690,6 @@ struct mmc_async_req *mmc_start_areq(struct mmc_host *host,
if (status == MMC_BLK_SUCCESS && areq)
start_err = __mmc_start_data_req(host, areq->mrq);
 
-   /* Postprocess the old request at this point */
-   if (host->areq)
-   mmc_post_req(host, host->areq->mrq, 0);
-
/* Cancel a prepared request if it was not started. */
if ((status != MMC_BLK_SUCCESS || start_err) && areq)
mmc_post_req(host, areq->mrq, -EINVAL);
-- 
2.9.3



[PATCH 03/16] mmc: core: refactor mmc_request_done()

2017-02-09 Thread Linus Walleij
We have this construction:

if (a && b && !c)
   finalize;
else
   block;
   finalize;

Which is equivalent by boolean logic to:

if (!a || !b || c)
   block;
finalize;

Which is simpler code.
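
For the record, the equivalence is just De Morgan's law. A quick
standalone check, illustrative only and not part of this patch:

  #include <assert.h>

  /* Exhaustively verify that the two forms agree for all inputs */
  int main(void)
  {
          int a, b, c;

          for (a = 0; a <= 1; a++)
                  for (b = 0; b <= 1; b++)
                          for (c = 0; c <= 1; c++)
                                  assert((a && b && !c) == !(!a || !b || c));
          return 0;
  }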

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index b2e7a6dfcbf0..8dbed198750f 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -172,14 +172,16 @@ void mmc_request_done(struct mmc_host *host, struct 
mmc_request *mrq)
 
trace_mmc_request_done(host, mrq);
 
-   if (err && cmd->retries && !mmc_card_removed(host->card)) {
-   /*
-* Request starter must handle retries - see
-* mmc_wait_for_req_done().
-*/
-   if (mrq->done)
-   mrq->done(mrq);
-   } else {
+   /*
+* We list various conditions for the command to be considered
+* properly done:
+*
+* - There was no error, OK fine then
+* - We are not doing some kind of retry
+* - The card was removed (...so just complete everything no matter
+*   if there are errors or retries)
+*/
+   if (!err || !cmd->retries || mmc_card_removed(host->card)) {
mmc_should_fail_request(host, mrq);
 
if (!host->ongoing_mrq)
@@ -211,10 +213,13 @@ void mmc_request_done(struct mmc_host *host, struct 
mmc_request *mrq)
mrq->stop->resp[0], mrq->stop->resp[1],
mrq->stop->resp[2], mrq->stop->resp[3]);
}
-
-   if (mrq->done)
-   mrq->done(mrq);
}
+   /*
+* Request starter must handle retries - see
+* mmc_wait_for_req_done().
+*/
+   if (mrq->done)
+   mrq->done(mrq);
 }
 
 EXPORT_SYMBOL(mmc_request_done);
-- 
2.9.3



[PATCH 02/16] mmc: core: refactor asynchronous request finalization

2017-02-09 Thread Linus Walleij
mmc_wait_for_data_req_done() is called in exactly one place,
and having it spread out makes things hard to oversee.
Factor this function into mmc_finalize_areq().

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/core.c | 86 +++--
 1 file changed, 33 insertions(+), 53 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 53065d1cebf7..b2e7a6dfcbf0 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -485,56 +485,6 @@ static int __mmc_start_req(struct mmc_host *host, struct 
mmc_request *mrq)
return err;
 }
 
-/*
- * mmc_wait_for_data_req_done() - wait for request completed
- * @host: MMC host to prepare the command.
- * @mrq: MMC request to wait for
- *
- * Blocks MMC context till host controller will ack end of data request
- * execution or new request notification arrives from the block layer.
- * Handles command retries.
- *
- * Returns enum mmc_blk_status after checking errors.
- */
-static enum mmc_blk_status mmc_wait_for_data_req_done(struct mmc_host *host,
- struct mmc_request *mrq)
-{
-   struct mmc_command *cmd;
-   struct mmc_context_info *context_info = &host->context_info;
-   enum mmc_blk_status status;
-
-   while (1) {
-   wait_event_interruptible(context_info->wait,
-   (context_info->is_done_rcv ||
-context_info->is_new_req));
-
-   if (context_info->is_done_rcv) {
-   context_info->is_done_rcv = false;
-   cmd = mrq->cmd;
-
-   if (!cmd->error || !cmd->retries ||
-   mmc_card_removed(host->card)) {
-   status = host->areq->err_check(host->card,
-  host->areq);
-   break; /* return status */
-   } else {
-   mmc_retune_recheck(host);
-   pr_info("%s: req failed (CMD%u): %d, 
retrying...\n",
-   mmc_hostname(host),
-   cmd->opcode, cmd->error);
-   cmd->retries--;
-   cmd->error = 0;
-   __mmc_start_request(host, mrq);
-   continue; /* wait for done/new event again */
-   }
-   }
-
-   return MMC_BLK_NEW_REQUEST;
-   }
-   mmc_retune_release(host);
-   return status;
-}
-
 void mmc_wait_for_req_done(struct mmc_host *host, struct mmc_request *mrq)
 {
struct mmc_command *cmd;
@@ -639,14 +589,44 @@ static void mmc_post_req(struct mmc_host *host, struct 
mmc_request *mrq,
  */
 static enum mmc_blk_status mmc_finalize_areq(struct mmc_host *host)
 {
+   struct mmc_context_info *context_info = &host->context_info;
enum mmc_blk_status status;
 
if (!host->areq)
return MMC_BLK_SUCCESS;
 
-   status = mmc_wait_for_data_req_done(host, host->areq->mrq);
-   if (status == MMC_BLK_NEW_REQUEST)
-   return status;
+   while (1) {
+   wait_event_interruptible(context_info->wait,
+   (context_info->is_done_rcv ||
+context_info->is_new_req));
+
+   if (context_info->is_done_rcv) {
+   struct mmc_command *cmd;
+
+   context_info->is_done_rcv = false;
+   cmd = host->areq->mrq->cmd;
+
+   if (!cmd->error || !cmd->retries ||
+   mmc_card_removed(host->card)) {
+   status = host->areq->err_check(host->card,
+  host->areq);
+   break; /* return status */
+   } else {
+   mmc_retune_recheck(host);
+   pr_info("%s: req failed (CMD%u): %d, 
retrying...\n",
+   mmc_hostname(host),
+   cmd->opcode, cmd->error);
+   cmd->retries--;
+   cmd->error = 0;
+   __mmc_start_request(host, host->areq->mrq);
+   continue; /* wait for done/new event again */
+   }
+   }
+
+   return MMC_BLK_NEW_REQUEST;
+   }
+
+   mmc_retune_release(host);
 
/*
 * Check BKOPS urgency for each R1 response
-- 
2.9.3



Re: [PATCH 00/16] multiqueue for MMC/SD third try

2017-02-12 Thread Linus Walleij
On Sat, Feb 11, 2017 at 2:03 PM, Avri Altman  wrote:
>>
>> The iozone results seem a bit inconsistent and all values seem to be noisy and
>> not say much. I don't know why really, maybe the test is simply not relevant,
>> the tests don't seem to be significantly affected by any of the patches, so
>> let's focus on the dd and find tests.
>
> Maybe use a more selective testing mode instead of -az.
> Also maybe you want to clear the cache between the sequential and random 
> tests:
> #sync
> #echo 3 > /proc/sys/vm/drop_caches
> #sync
> It helps to obtain a more robust results.

OK I'll try that. I actually cold booted the system between each test to
avoid cache effects.

>> What immediately jumps out at you is that linear read/writes perform just as
>> nicely or actually better with MQ than with the old block layer.
>
> How come 22.7MB/s before vs. 22.1MB/s after is better?  or did I 
> misunderstand the output?
> Also as dd is probably using the buffer cache, unlike the iozone test  in 
> which you properly used -I
> for direct mode to isolate the blk-mq effect - does it really say much?

Sorry, I guess I was a bit too enthusiastic there. The difference is within
the error margin, and it is just based on a single test. I guess I should
re-run them with a few iterations (drop caches, iterate, drop caches,
iterate) and get some more stable figures.

We need to understand what is meant by "better" too:
quicker in wall clock time (real), in user time, or in sys time.

So for the dd command:

                   real   user   sys
Before patches:    45.13  0.02   7.60
Move asynch pp:    52.17  0.01   6.96
Issue in parallel: 49.31  0.00   7.11
Multiqueue:        46.25  0.03   6.42

For these pure kernel patches only the last figure (sys) is really relevant
IIUC. The other figures are just system noise, but still, the eventual
throughput figure from dd includes the time spent on other processes
in the system etc., so that value is not relevant.

But I guess Paolo may need to beat me up a bit here: what the user
perceives in the end is of course the most relevant for any human ...

Nevertheless if we just look at sys then MQ is already winning this test.
I just think there is too little tested here.

I think 1GiB is maybe too little. Maybe I need to read the entire card
a few times or something?

Since dd is just reading blocks sequentially from mmcblk0 on a cold-
booted system, I think the buffer cache is empty except for maybe
the partition table blocks. But I dunno. I will use your trick to
drop the caches next time.
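
Something like this (untested sketch; the device node is just what is
on my board) should give more stable, cache-free numbers:

  for i in 1 2 3; do
      sync
      echo 3 > /proc/sys/vm/drop_caches
      time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
  done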

Yours,
Linus Walleij


[PATCH] mmc: core: make block layer non-optional

2017-02-15 Thread Linus Walleij
I do not know at what point people have wanted to have a system
with MMC/SD support without the block layer. We are anyway now
so tightly integrated with the block layer that this is only
theoretical, and it makes no sense to keep the block layer
interface optional.

Signed-off-by: Linus Walleij 
---
 drivers/mmc/core/Kconfig  | 12 
 drivers/mmc/core/Makefile |  5 ++---
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/mmc/core/Kconfig b/drivers/mmc/core/Kconfig
index cdfa8520a4b1..2920bc4351ed 100644
--- a/drivers/mmc/core/Kconfig
+++ b/drivers/mmc/core/Kconfig
@@ -23,19 +23,8 @@ config PWRSEQ_SIMPLE
  This driver can also be built as a module. If so, the module
  will be called pwrseq_simple.
 
-config MMC_BLOCK
-   tristate "MMC block device driver"
-   depends on BLOCK
-   default y
-   help
- Say Y here to enable the MMC block device driver support.
- This provides a block device driver, which you can use to
- mount the filesystem. Almost everyone wishing MMC support
- should say Y or M here.
-
 config MMC_BLOCK_MINORS
int "Number of minors per block device"
-   depends on MMC_BLOCK
range 4 256
default 8
help
@@ -53,7 +42,6 @@ config MMC_BLOCK_MINORS
 
 config MMC_BLOCK_BOUNCE
bool "Use bounce buffer for simple hosts"
-   depends on MMC_BLOCK
default y
help
  SD/MMC is a high latency protocol where it is crucial to
diff --git a/drivers/mmc/core/Makefile b/drivers/mmc/core/Makefile
index b2a257dc644f..42f0d781d911 100644
--- a/drivers/mmc/core/Makefile
+++ b/drivers/mmc/core/Makefile
mmc_core-y			:= core.o bus.o host.o \
   mmc.o mmc_ops.o sd.o sd_ops.o \
   sdio.o sdio_ops.o sdio_bus.o \
   sdio_cis.o sdio_io.o sdio_irq.o \
-  quirks.o slot-gpio.o
+  quirks.o slot-gpio.o \
+  block.o queue.o
 mmc_core-$(CONFIG_OF)  += pwrseq.o
obj-$(CONFIG_PWRSEQ_SIMPLE)	+= pwrseq_simple.o
obj-$(CONFIG_PWRSEQ_EMMC)  += pwrseq_emmc.o
mmc_core-$(CONFIG_DEBUG_FS)	+= debugfs.o
-obj-$(CONFIG_MMC_BLOCK)	+= mmc_block.o
-mmc_block-objs := block.o queue.o
obj-$(CONFIG_MMC_TEST) += mmc_test.o
obj-$(CONFIG_SDIO_UART)	+= sdio_uart.o
-- 
2.9.3



Some throughput tests with MQ and BFQ on MMC/SD

2017-02-17 Thread Linus Walleij
Compare this to the performance change we got when first introducing
the asynchronous requests:
https://wiki.linaro.org/WorkingGroups/KernelArchived/Specs/StoragePerfMMC-async-req

The patches need some issues fixed from the build server
complaints and some robustness hammering, but after that I
think they will be ripe for merging for v4.12.

Yours,
Linus Walleij


Re: Some throughput tests with MQ and BFQ on MMC/SD

2017-02-17 Thread Linus Walleij
On Fri, Feb 17, 2017 at 12:53 PM, Ziji Hu  wrote:

> I would like to suggest that you should try the multiple thread
> test mode of iozone, since you are testing *Multi* Queue.

Good point. This target has only 2 CPUs but still, maybe it performs!
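
A two-thread throughput run would then be something like this
(untested; the mount point and file names are just an example):

  # iozone -I -t 2 -s 64m -r 4k -i 0 -i 1 -i 2 \
        -F /mnt/sd/tmp1 /mnt/sd/tmp2

(-I for O_DIRECT as before, -t for the number of worker threads,
-F naming one scratch file per thread.)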

> Besides, it seems that your eMMC transfer speed is quite low.
> It is normal that read speed can reach more than 100MB/s in HS400.
> Could you try a higher speed mode? The test result might be
> limited by the bus clock frequency.

The iozone tests are done on an SDcard. And I only did read tests on
the eMMC I have.

It's because I'm afraid of wearing out my eMMC :(

But OK I'll just take the risk and run iozone on the eMMC.

> Actually I have been following your thread for some time.
> But currently I'm a little confused.
> May I know the purpose of your patch?

Ulf describes it: we want to switch MMC/SD to MQ.

To me, there are two reasons for that (no secret agendas...)

1. To get away from the legacy codebase in the old block layer.
   Christoph and Jens have been very clear stating that the old block
   layer is in maintenance mode and should be phased out, and they
   asked explicitly for our help to do so. Currently
   MMC/SD is a big fat roadblock to these plans so it is win-win for
   MMC/SD and the block layer if we can just switch over to MQ.

2. My colleague Paolo Valente is working on the next generation
  block scheduler BFQ which has very promising potential for
  interactive loads. (Like taking a backup of your hard drive while
  playing 1080p video, let's say.) Since the
  old block layer is no longer maintained, this scheduler will only
  be merged and made available for systems deploying MQ. He's
  already working full steam on that.

I would like to make 1+2 happen in the next merge window
ultimately, but yeah, maybe I'm overly optimistic. But I will surely
try.

Maybe I should add:

3. MQ is a better and future-proof fit for command queueing.

Yours,
Linus Walleij


Re: Some throughput tests with MQ and BFQ on MMC/SD

2017-02-20 Thread Linus Walleij
On Mon, Feb 20, 2017 at 9:03 AM, Adrian Hunter  wrote:

> MQ is not better - it is just different.

Well it is better in the sense that it has active maintainers and is
not scheduled for deprecation.

> Because mmc devices do not have
> multiple hardware queues, blk-mq essentially offers nothing but a different
> way of doing the same thing.

I think what Ziji is pointing out is the hourglass-shaped structure of MQ.
It has multiple *issue* queues as well, not just multiple *hardware*
queues. That means that processes can have issue queues on different
CPUs and not all requests end up in a single nexus like with the old blk
layer.

Whether it benefits MMC/SD in the end is a good question. It might,
testing on multicores with multiple issue threads is needed.
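
To make the hourglass concrete, here is a minimal, hypothetical
sketch (not from any posted patch; mmc_mq_ops with a .queue_rq
implementation and the mmc queue.h types are assumed) of registering
MMC as a single-hardware-queue blk-mq device. The per-CPU software
issue queues come for free; only the dispatch end is a single queue:

  #include <linux/blk-mq.h>
  #include <linux/err.h>

  static struct blk_mq_tag_set mmc_tag_set;

  static int mmc_mq_init_queue(struct mmc_queue *mq)
  {
          int ret;

          memset(&mmc_tag_set, 0, sizeof(mmc_tag_set));
          mmc_tag_set.ops = &mmc_mq_ops;   /* .queue_rq etc, assumed */
          mmc_tag_set.nr_hw_queues = 1;    /* MMC: one hardware queue */
          mmc_tag_set.queue_depth = 2;     /* current + prepared request */
          mmc_tag_set.cmd_size = sizeof(struct mmc_queue_req);
          mmc_tag_set.numa_node = NUMA_NO_NODE;
          mmc_tag_set.flags = BLK_MQ_F_SHOULD_MERGE;

          ret = blk_mq_alloc_tag_set(&mmc_tag_set);
          if (ret)
                  return ret;

          mq->queue = blk_mq_init_queue(&mmc_tag_set);
          if (IS_ERR(mq->queue)) {
                  blk_mq_free_tag_set(&mmc_tag_set);
                  return PTR_ERR(mq->queue);
          }
          return 0;
  }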

> It would be better if blk-mq support was experimental until we can see how
> well it works in practice.

Do you mean experimental in the MMC/SD stack, such that we should
merge it as an additional scheduler instead of as the only scheduler
replacement?

I think SCSI did/still does things like that. On the other hand, UBI
just replaced the old block layer with MQ in commit
ff1f48ee3bb3, and it is also very widely
used, so there are examples of both approaches. (How typical.)
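
Such an opt-in could be as small as a Kconfig knob, something like
this sketch (the symbol name is invented, not from any posted patch):

config MMC_MQ
	bool "Use blk-mq for MMC/SD (EXPERIMENTAL)"
	depends on BLOCK
	help
	  Route MMC/SD block requests through the multiqueue block
	  layer instead of the legacy request queue, so the two
	  code paths can be compared on real hardware.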

Yours,
Linus Walleij


Re: [PATCH RFC 22/39] mmc: block: Prepare CQE data

2017-03-09 Thread Linus Walleij
On Fri, Mar 3, 2017 at 1:22 PM, Adrian Hunter  wrote:
> On 15/02/17 15:49, Linus Walleij wrote:
>> On Fri, Feb 10, 2017 at 1:55 PM, Adrian Hunter  
>> wrote:
>>
>>> Enhance mmc_blk_data_prep() to support CQE requests.
>>>
>>> Signed-off-by: Adrian Hunter 
>>
>> Hey:
>>
>>> +#include 
>> (...)
>>> +   if (IOPRIO_PRIO_CLASS(req_get_ioprio(req)) == IOPRIO_CLASS_RT)
>>> +   brq->data.flags |= MMC_DATA_PRIO;
>>
>> What is this?
>
> It is the command queue priority.
>
> The command queue supports 2 priorities: "high" (1) and "simple" (0).  The
> eMMC will give "high" priority tasks priority over "simple" priority tasks.
>
> So here we give priority to IOPRIO_CLASS_RT which seems appropriate.

So if I understand correctly, you are obtaining the block layer scheduling
priorities, which can (only?) be set with ionice from the command line?

We need to discuss this with the block maintainers.

I'm not so sure about the future of this. The IOPRIO is only used with the CFQ
scheduler, only two other sites in the kernel use this, and MQ and its
schedulers surely do not have ionice handling as far as I know.

The BFQ does not use it, AFAIK it is using different heuristics to prioritize
block traffic, and that does not include using ionice hints.

Is ionice:ing something we're really going to do going forward?
Should this be repurposed so that the block scheduler uses this prio to
communicate to the driver layer to prioritize certain traffic?
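
For reference, the usual userspace knob for this today is something
like (illustrative; -c 1 is the realtime class that the quoted hunk
maps to MMC_DATA_PRIO):

  # ionice -c 1 -n 0 dd if=/dev/mmcblk0 of=/dev/null bs=1M count=64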

Yours,
Linus Walleij


Re: [PATCH 06/16] mmc: core: replace waitqueue with worker

2017-03-09 Thread Linus Walleij
On Wed, Feb 22, 2017 at 2:29 PM, Adrian Hunter  wrote:
> On 09/02/17 17:33, Linus Walleij wrote:
>> The waitqueue in the host context is there to signal back from
>> mmc_request_done() through mmc_wait_data_done() that the hardware
>> is done with a command, and when the wait is over, the core
>> will typically submit the next asynchronous request that is pending
>> just waiting for the hardware to be available.
>>
>> This is in the way of letting mmc_request_done() trigger the
>> report up to the block layer that a block request is finished.
>>
>> Re-jig this as a first step, removing the waitqueue and introducing
>> a work that will run after a completed asynchronous request,
>> finalizing that request, including retransmissions, and eventually
>> reporting back with a completion and a status code to the
>> asynchronous issue method.
>>
>> This has the upside that we can remove the MMC_BLK_NEW_REQUEST
>> status code and the "new_request" state in the request queue
>> that is only there to make the state machine spin out
>> the first time we send a request.
>>
>> Introduce a workqueue in the host for handling just this, and
>> then a work and completion in the asynchronous request to deal
>> with this mechanism.
>>
>> This is a central change that let us do many other changes since
>> we have broken the submit and complete code paths in two, and we
>> can potentially remove the NULL flushing of the asynchronous
>> pipeline and report block requests as finished directly from
>> the worker.
>
> This needs more thought.  The completion should go straight to the mmc block
> driver from the ->done() callback.  And from there straight back to the
> block layer if recovery is not needed.  We want to stop using
> mmc_start_areq() altogether because we never want to wait - we always want
> to issue (if possible) and return.

I don't quite follow this. Isn't what you request exactly what
patch 15/16 "mmc: queue: issue requests in massive parallel"
is doing?

The whole patch series leads up to that.

> The core API to use is __mmc_start_req() but the block driver should
> populate mrq->done with its own handler. i.e. change __mmc_start_req()
>
> -   mrq->done = mmc_wait_done;
> +   if (!mrq->done)
> +   mrq->done = mmc_wait_done;
>
> mrq->done() would complete the request (e.g. via blk_complete_request()) if
> it has no errors (and doesn't need polling), and wake up the queue thread to
> finish up everything else and start the next request.

I think this is what it does at the end of the patch series, patch 15/16.
I have to split it somehow...

> For the blk-mq port, the queue thread should also be retained, partly
> because it solves some synchronization problems, but mostly because, at this
> stage, we anyway don't have solutions for all the different ways the driver
> can block.
> (as listed here https://marc.info/?l=linux-mmc&m=148336571720463&w=2 )

Essentially I take out that thread and replace it with this one worker
introduced in this very patch. I agree the driver can block in many ways
and that is why I need to have it running in process context, and this
is what the worker introduced here provides.

Maybe I'm getting something backwards, sorry then :/

Yours,
Linus Walleij


Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-18 Thread Linus Walleij
On Sat, Mar 18, 2017 at 11:52 AM, Paolo Valente
 wrote:
>> Il giorno 14 mar 2017, alle ore 16:32, Bart Van Assche 
>>  ha scritto:

>> (...) what should
>> a developer do who only has access to a small subset of all the storage
>> devices that are supported by the Linux kernel and hence who can not run the
>> benchmark against every supported storage device?

Don't we use the community for that? We are dependent on people
downloading and testing our code eventually. I mean, sure, it's good if
we make some reasonable effort to test the changes we do, but we are
only humans, and we get corrected by the experience of other humans.

>> Do developers who do not
>> fully understand the BFQ algorithms and who run into a performance problem
>> have any other option than trial and error for fixing such performance 
>> issues?
>
> Hi Bart,
> maybe I got your point even before, but I did not reply consistently.
> You are highlighting an important problem, which, I think, can be
> stated in more general terms: if one makes a change in any complex
> component, which, in its turn, interacts with complex I/O devices,
> then it is hard, if ever possible, to prove, that that change will
> cause no regression with any possible device, just by speculation.
> Actually, facts show that this often holds even for simple components,
> given the complexity of the environment in which they work.  Of
> course, if not only the component is complex, but who modifies it does
> not even fully understand how that component works, then regressions
> on untested devices are certainly more probable.

You are running a host of benchmarks on a host of devices, using
the fio tool that Jens devised for this kind of test. What more can be
asked? More tests, more devices?

If you keep increasing the amount of proof requested that a change
to a computer program causes no unintended side effects or
regressions, you will eventually run into the brick wall of
"solve the halting problem".

Alternatively "test it forever on all systems in the world".

It eventually becomes absurd.

This actually happened to me: for a certain mission-critical algorithm,
my department was requested to "prove that this will run to completion".
I was baffled and said that what they were requesting was that I
solve the halting problem. It turned out they just wanted something like
a comprehensible test suite.

Yours,
Linus Walleij


Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-18 Thread Linus Walleij
On Sat, Mar 18, 2017 at 6:46 PM, Bart Van Assche
 wrote:
> On Sat, 2017-03-18 at 18:09 +0100, Linus Walleij wrote:
>> On Sat, Mar 18, 2017 at 11:52 AM, Paolo Valente
>>  wrote:
>> > > Il giorno 14 mar 2017, alle ore 16:32, Bart Van Assche 
>> > >  ha scritto:
>> > > (...) what should
>> > > a developer do who only has access to a small subset of all the storage
>> > > devices that are supported by the Linux kernel and hence who can not run 
>> > > the
>> > > benchmark against every supported storage device?
>>
>> Don't we use the community for that? We are dependent on people
>> downloading and testing our code eventually, I mean sure it's good if
>> we make some reasonable effort to test changes we do, but we are
>> only humans, and we get corrected by the experience of other humans.
>
> Hello Linus,
>
> Do you mean relying on the community to test other storage devices before
> or after a patch is upstream?

I guess they should test it when it is in linux-next?

> Relying on the community to file bug reports
> after a patch is upstream would be wrong. The Linux kernel should not be
> used for experiments. As you know patches that are sent upstream should
> not introduce regressions.

Yeah, still, people introduce regressions, and we have 7-8 release
candidates for each kernel because of this. Humans have flaws
I guess.

> My primary concern about BFQ is that it is a very complicated I/O scheduler
> and also that the concepts used internally in that I/O scheduler are far
> away from the concepts we are used to when reasoning about I/O devices. I'm
> concerned that this will make the BFQ I/O scheduler hard to maintain.

I understand that. Let's follow all rules of thumb that make code
easy to maintain.

We have pretty broad agreement on what makes code easy to
maintain on the syntactic level; checkpatch and some manual inspection
easily give that.

I think where we need the most brainshare is in how to make semantics
maintainable. It's no fun reading terse code and trying to figure out how
the developer writing it was thinking, so let's focus on anything we do not
understand and make it understandable; it seems Paolo is onto this task
from what I can tell.

Yours,
Linus Walleij


  1   2   >