Re: [PATCH V12 0/5] mmc: Add Command Queue support

2017-10-26 Thread Linus Walleij
On Thu, Oct 26, 2017 at 3:49 PM, Adrian Hunter  wrote:
> On 26/10/17 16:32, Linus Walleij wrote:

>> My patch series switches the stack around to make it possible
>> to do this. But it doesn't go the whole way to complete the requests
>> from interrupt context.
>>
>> Since we have to send commands for retune etc request finalization
>> cannot easily be done from interrupt context.
>
> Re-tuning and background operations are rare and slow, so there is no reason
> to try to start them from interrupt context.

OK I will try to get them out of the way and see what happens,
hehe :)

What I mean is that we were checking - on every command -
if BKOPS or retune needs to happen. And then doing it. Thus
all was done in process context.

>> But I am thinking about testing to hack it
>> using some ugly approaches ... like assuming we don't need any
>> retune etc and just say all is fine and optimistically complete the
>> request directly in the interrupt handler if all was OK and wait
>> for errors to happen before retuning.
>
> It already works that way.  Re-tuning happens before you start a request.
> We prevent re-tuning in between dependent requests, like between starting a
> transfer and CMD13 polling for completion.

Ah that is what these if()s do ... right. I'll see if I can get around
this then.

Yours,
Linus Walleij


Re: [PATCH V12 0/5] mmc: Add Command Queue support

2017-10-26 Thread Adrian Hunter
On 26/10/17 16:32, Linus Walleij wrote:
> On Tue, Oct 24, 2017 at 10:40 AM, Adrian Hunter  
> wrote:
> 
>> Here is V12 of the hardware command queue patches without the software
>> command queue patches, now using blk-mq and now with blk-mq support for
>> non-CQE I/O.
> 
> Since I had my test setup going I gave this a spin with the same set
> of tests that I used before/after my MQ patches.
> 
> It is using the same setup and same eMMC, but I hade to rebase onto
> Ulf's very latest next branch to apply your patches.
> 
> I default-enabled multiqueue.
> 
> Results:
> 
> sync
> echo 3 > /proc/sys/vm/drop_caches
> sync
> time dd if=/dev/mmcblk3 of=/dev/null bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.0GB) copied, 24.251922 seconds, 42.2MB/s
> real0m 24.25s
> user0m 0.03s
> sys 0m 3.80s
> 
> mount /dev/mmcblk3p1 /mnt/
> cd /mnt/
> sync
> echo 3 > /proc/sys/vm/drop_caches
> sync
> time find . > /dev/null
> real0m 3.24s
> user0m 0.22s
> sys 0m 1.23s
> 
> sync
> echo 3 > /proc/sys/vm/drop_caches
> sync
> iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
> 
>randomrandom
>kB  reclenwrite  rewritereadrereadread write
> 20480   4 1615 1571 6612 6714 6494  531
> 20480   8 2143 2295115591156311499 1164
> 20480  16 3894 4202178261782317755 1369
> 20480  32 5816 7489237412375923709 3016
> 20480  64 7393 9167275322752627502 3591
> 20480 128 7328 8097291842916129159 5592
> 20480 256 7194 8752294242943429424 6700
> 20480 512 8984 9930299032991129909 7420
> 204801024 7072 7446276842768527681 7444
> 204802048 6840 8199273982742027418 6766
> 204804096 8137 6805280912808928093 8209
> 204808192 7255 7485283862838428383 7479
> 20480   16384 7078 7448285842858528585 7447
> 
> In short: no performance regressions.

You really need to test cards that are fast.  A decent UHS-I SD card can do
over 80 MB/s for reads and of course HS400 eMMC can do over 300 MB/s.

> 
> Performance-wise this is on par with my own patch set for MQ.
> 
> As you know my pet peeve is "enable MQ by default" and I see no
> reason from a performance perspective not to enable MQ by default
> on this patch set or mine for that matter.

That is a side-issue.  A single small patch can change that.

> 
>> While we should look at changing blk-mq to give better workqueue performance,
>> a bigger gain is likely to be made by adding a new host API to enable the
>> next already-prepared request to be issued directly from within ->done()
>> callback of the current request.
> 
> My patch series switches the stack around to make it possible
> to do this. But it doesn't go the whole way to complete the requests
> from interrupt context.
> 
> Since we have to send commands for retune etc request finalization
> cannot easily be done from interrupt context.

Re-tuning and background operations are rare and slow, so there is no reason
to try to start them from interrupt context.

> 
> But I am thinking about testing to hack it
> using some ugly approaches ... like assuming we don't need any
> retune etc and just say all is fine and optimistically complete the
> request directly in the interrupt handler if all was OK and wait
> for errors to happen before retuning.

It already works that way.  Re-tuning happens before you start a request.
We prevent re-tuning in between dependent requests, like between starting a
transfer and CMD13 polling for completion.


Re: [PATCH V12 0/5] mmc: Add Command Queue support

2017-10-26 Thread Linus Walleij
On Tue, Oct 24, 2017 at 10:40 AM, Adrian Hunter  wrote:

> Here is V12 of the hardware command queue patches without the software
> command queue patches, now using blk-mq and now with blk-mq support for
> non-CQE I/O.

Since I had my test setup going I gave this a spin with the same set
of tests that I used before/after my MQ patches.

It is using the same setup and same eMMC, but I hade to rebase onto
Ulf's very latest next branch to apply your patches.

I default-enabled multiqueue.

Results:

sync
echo 3 > /proc/sys/vm/drop_caches
sync
time dd if=/dev/mmcblk3 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 24.251922 seconds, 42.2MB/s
real0m 24.25s
user0m 0.03s
sys 0m 3.80s

mount /dev/mmcblk3p1 /mnt/
cd /mnt/
sync
echo 3 > /proc/sys/vm/drop_caches
sync
time find . > /dev/null
real0m 3.24s
user0m 0.22s
sys 0m 1.23s

sync
echo 3 > /proc/sys/vm/drop_caches
sync
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test

   randomrandom
   kB  reclenwrite  rewritereadrereadread write
20480   4 1615 1571 6612 6714 6494  531
20480   8 2143 2295115591156311499 1164
20480  16 3894 4202178261782317755 1369
20480  32 5816 7489237412375923709 3016
20480  64 7393 9167275322752627502 3591
20480 128 7328 8097291842916129159 5592
20480 256 7194 8752294242943429424 6700
20480 512 8984 9930299032991129909 7420
204801024 7072 7446276842768527681 7444
204802048 6840 8199273982742027418 6766
204804096 8137 6805280912808928093 8209
204808192 7255 7485283862838428383 7479
20480   16384 7078 7448285842858528585 7447

In short: no performance regressions.

Performance-wise this is on par with my own patch set for MQ.

As you know my pet peeve is "enable MQ by default" and I see no
reason from a performance perspective not to enable MQ by default
on this patch set or mine for that matter.

> While we should look at changing blk-mq to give better workqueue performance,
> a bigger gain is likely to be made by adding a new host API to enable the
> next already-prepared request to be issued directly from within ->done()
> callback of the current request.

My patch series switches the stack around to make it possible
to do this. But it doesn't go the whole way to complete the requests
from interrupt context.

Since we have to send commands for retune etc request finalization
cannot easily be done from interrupt context.

But I am thinking about testing to hack it
using some ugly approaches ... like assuming we don't need any
retune etc and just say all is fine and optimistically complete the
request directly in the interrupt handler if all was OK and wait
for errors to happen before retuning.

Yours,
Linus Walleij