[dm-devel] [RFC PATCH 0/4] dm mpath: vastly improve blk-mq IO performance

2016-03-31 Thread Mike Snitzer
I developed these changes some weeks ago but have since focused on
regression and performance testing on larger NUMA systems.

For regression testing I've been using mptest:
https://github.com/snitm/mptest

For performance testing I've been using a null_blk device (with
various configuration permutations, e.g. pinning memory to a
particular NUMA node, and varied number of submit_queues).

By eliminating multipath's heavy use of the m->lock spinlock in the
fast IO paths serious performance improvements are realized.
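
To make the shape of the change concrete, here is a rough sketch (field and
flag names are approximate, this is not the actual dm-mpath.c diff): state
the IO fast path must consult on every request stops being guarded by
m->lock.  Single-bit state becomes bitops on a flags word, counters become
atomic_t, and the current path is read without the spinlock, which is then
only taken on the slower reconfiguration/error paths.

struct multipath {
	spinlock_t lock;		/* still exists, but off the IO fast path */
	unsigned long flags;		/* was: bitfields guarded by m->lock */
	atomic_t nr_valid_paths;	/* was: unsigned, guarded by m->lock */
	struct pgpath *current_pgpath;
	/* ... */
};

/*
 * Fast path, before: every request took m->lock just to inspect state.
 * Fast path, after (sketch): no spinlock at all.
 */
static struct pgpath *fast_path_choose(struct multipath *m)
{
	if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
	    !atomic_read(&m->nr_valid_paths))
		return NULL;	/* caller falls back to the locked slow path */

	return READ_ONCE(m->current_pgpath);
}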

Overview of performance test setup:
===

NULL_BLK_HW_QUEUES=12
NULL_BLK_QUEUE_DEPTH=4096

DM_MQ_HW_QUEUES=12
DM_MQ_QUEUE_DEPTH=2048

FIO_QUEUE_DEPTH=32
FIO_RUNTIME=10
FIO_NUMJOBS=12

NID=0

run_fio() {
    DEVICE=$1
    TASK_NAME=$(basename ${DEVICE})
    PERF_RECORD=$2
    RUN_CMD="${FIO} --numa_cpu_nodes=${NID} --numa_mem_policy=bind:${NID} \
        --cpus_allowed_policy=split --group_reporting --rw=randread --bs=4k \
        --numjobs=${FIO_NUMJOBS} --iodepth=${FIO_QUEUE_DEPTH} --runtime=${FIO_RUNTIME} \
        --time_based --loops=1 --ioengine=libaio \
        --direct=1 --invalidate=1 --randrepeat=1 --norandommap --exitall \
        --name task_${TASK_NAME} --filename=${DEVICE}"
    ${RUN_CMD}
}

modprobe null_blk gb=4 bs=512 hw_queue_depth=${NULL_BLK_QUEUE_DEPTH} nr_devices=1 \
    queue_mode=2 irqmode=1 completion_nsec=1 submit_queues=${NULL_BLK_HW_QUEUES}
run_fio /dev/nullb0

echo ${NID} > /sys/module/dm_mod/parameters/dm_numa_node
echo Y > /sys/module/dm_mod/parameters/use_blk_mq
echo ${DM_MQ_QUEUE_DEPTH} > /sys/module/dm_mod/parameters/dm_mq_queue_depth
echo ${DM_MQ_HW_QUEUES} > /sys/module/dm_mod/parameters/dm_mq_nr_hw_queues
echo "0 8388608 multipath 0 0 1 1 service-time 0 1 2 /dev/nullb0 1 1" | dmsetup 
create dm_mq
run_fio /dev/mapper/dm_mq
dmsetup remove dm_mq

echo "0 8388608 multipath 0 0 1 1 queue-length 0 1 1 /dev/nullb0 1" | dmsetup 
create dm_mq
run_fio /dev/mapper/dm_mq
dmsetup remove dm_mq

echo "0 8388608 multipath 0 0 1 1 round-robin 0 1 1 /dev/nullb0 1" | dmsetup 
create dm_mq
run_fio /dev/mapper/dm_mq
dmsetup remove dm_mq
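
For reference, the multipath table lines above decode roughly as follows
(just an annotation of the standard dm-mpath constructor syntax, not part
of the test script):

0 8388608 multipath 0 0 1 1 service-time 0 1 2 /dev/nullb0 1 1
  0 8388608         start sector and length in sectors (4GiB, the whole null_blk device)
  multipath         target type
  0                 number of feature args (none)
  0                 number of hardware handler args (none)
  1 1               one priority group, group 1 used first
  service-time 0    path selector, no selector args
  1 2               one path in the group, two args per path
  /dev/nullb0 1 1   the path, repeat_count=1, relative_throughput=1

The queue-length and round-robin variants differ only in the selector name
and in taking a single per-path arg (repeat_count=1).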

Test results on 4 NUMA node 192-way x86_64 system with 524G of memory:
==

Big picture is the move to lockless really helps.

round-robin's repeat_count and percpu current_path code (which went upstream
during the 4.6 merge window) seems to _really_ help (even if repeat_count is
1, as is the case in all these results).
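
For anyone who hasn't looked at that 4.6 round-robin change: it keeps a
per-CPU cache of the last path picked so the common case never touches the
selector's shared list or lock.  A rough sketch (struct and field names
approximated, not copied from dm-round-robin.c):

struct rr_path {
	struct list_head list;
	struct dm_path *path;
};

struct rr_selector {
	struct list_head valid_paths;
	spinlock_t lock;
	struct dm_path * __percpu *current_path;	/* per-CPU cached pick */
};

static struct dm_path *rr_select_path(struct rr_selector *s)
{
	struct dm_path **cached = this_cpu_ptr(s->current_path);
	struct rr_path *rp;
	unsigned long flags;

	/* Common case: reuse this CPU's cached path, no shared lock taken,
	 * no cross-CPU cacheline bouncing. */
	if (*cached)
		return *cached;

	/* Otherwise rotate the shared list under the selector lock and
	 * cache the choice for this CPU. */
	spin_lock_irqsave(&s->lock, flags);
	rp = list_first_entry_or_null(&s->valid_paths, struct rr_path, list);
	if (rp) {
		list_move_tail(&rp->list, &s->valid_paths);
		*cached = rp->path;
	}
	spin_unlock_irqrestore(&s->lock, flags);

	return rp ? rp->path : NULL;
}

The real selector also keeps a per-CPU repeat_count and only falls back to
the locked list rotation once it expires; the sketch collapses that detail.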

Below, the 4 results in each named file (e.g. "result.lockless_pinned")
are, in order:
raw null_blk
service-time
queue-length
round-robin

The files with a trailing "_12" were run with:
NULL_BLK_HW_QUEUES=12
DM_MQ_HW_QUEUES=12
FIO_NUMJOBS=12

And the files without "_12" were run with:
NULL_BLK_HW_QUEUES=32
DM_MQ_HW_QUEUES=32
FIO_NUMJOBS=32

lockless: (this patchset applied)
*
result.lockless_pinned:  read : io=236580MB, bw=23656MB/s, iops=6055.9K, runt= 10001msec
result.lockless_pinned:  read : io=108536MB, bw=10853MB/s, iops=2778.3K, runt= 10001msec
result.lockless_pinned:  read : io=106649MB, bw=10664MB/s, iops=2729.1K, runt= 10001msec
result.lockless_pinned:  read : io=162906MB, bw=16289MB/s, iops=4169.1K, runt= 10001msec

result.lockless_pinned_12:  read : io=165233MB, bw=16522MB/s, iops=4229.6K, runt= 10001msec
result.lockless_pinned_12:  read : io=96686MB, bw=9667.7MB/s, iops=2474.1K, runt= 10001msec
result.lockless_pinned_12:  read : io=97197MB, bw=9718.8MB/s, iops=2488.3K, runt= 10001msec
result.lockless_pinned_12:  read : io=104509MB, bw=10450MB/s, iops=2675.2K, runt= 10001msec

result.lockless_unpinned:  read : io=101525MB, bw=10151MB/s, iops=2598.8K, runt= 10001msec
result.lockless_unpinned:  read : io=61313MB, bw=6130.8MB/s, iops=1569.5K, runt= 10001msec
result.lockless_unpinned:  read : io=64892MB, bw=6488.6MB/s, iops=1661.8K, runt= 10001msec
result.lockless_unpinned:  read : io=78557MB, bw=7854.1MB/s, iops=2010.9K, runt= 10001msec

result.lockless_unpinned_12:  read : io=83455MB, bw=8344.7MB/s, iops=2136.3K, runt= 10001msec
result.lockless_unpinned_12:  read : io=50638MB, bw=5063.4MB/s, iops=1296.3K, runt= 10001msec
result.lockless_unpinned_12:  read : io=56103MB, bw=5609.8MB/s, iops=1436.1K, runt= 10001msec
result.lockless_unpinned_12:  read : io=56421MB, bw=5641.6MB/s, iops=1444.3K, runt= 10001msec

spinlock:
*
result.spinlock_pinned:  read : io=236048MB, bw=23602MB/s, iops=6042.3K, runt= 10001msec
result.spinlock_pinned:  read : io=64657MB, bw=6465.4MB/s, iops=1655.5K, runt= 10001msec
result.spinlock_pinned:  read : io=67519MB, bw=6751.2MB/s, iops=1728.4K, runt= 10001msec
result.spinlock_pinned:  read : io=81409MB, bw=8140.4MB/s, iops=2083.9K, runt= 10001msec

result.spinlock_pinned_12:  read : io=159782MB, bw=15977MB/s, iops=4090.3K, runt= 10001msec
result.spinlock_pinned_12:  read : io=64368MB, bw=6436.2MB/s, iops=1647.7K, runt= 10001msec
result.spinlock_pinned_12:  read : io=67337MB, bw=6733.5MB/s, iops=1723.7K, runt= 10001msec

Re: [dm-devel] [RFC PATCH 0/4] dm mpath: vastly improve blk-mq IO performance

2016-04-01 Thread Johannes Thumshirn

On 2016-03-31 22:04, Mike Snitzer wrote:

I developed these changes some weeks ago but have since focused on
regression and performance testing on larger NUMA systems.

For regression testing I've been using mptest:
https://github.com/snitm/mptest

For performance testing I've been using a null_blk device (with
various configuration permutations, e.g. pinning memory to a
particular NUMA node, and varied number of submit_queues).

By eliminating multipath's heavy use of the m->lock spinlock in the
fast IO paths serious performance improvements are realized.


Hi Mike,

Are these the patches you pointed Hannes to?

If yes, please add my Tested-by: Johannes Thumshirn 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [RFC PATCH 0/4] dm mpath: vastly improve blk-mq IO performance

2016-04-01 Thread Mike Snitzer
On Fri, Apr 01 2016 at  4:12am -0400,
Johannes Thumshirn  wrote:

> On 2016-03-31 22:04, Mike Snitzer wrote:
> >I developed these changes some weeks ago but have since focused on
> >regression and performance testing on larger NUMA systems.
> >
> >For regression testing I've been using mptest:
> >https://github.com/snitm/mptest
> >
> >For performance testing I've been using a null_blk device (with
> >various configuration permutations, e.g. pinning memory to a
> >particular NUMA node, and varied number of submit_queues).
> >
> >By eliminating multipath's heavy use of the m->lock spinlock in the
> >fast IO paths serious performance improvements are realized.
> 
> Hi Mike,
> 
> Are these the patches you pointed Hannes to?
> 
> If yes, please add my Tested-by: Johannes Thumshirn 

No they are not.

Hannes seems to have last pulled in my DM mpath changes that (ab)used RCU.
I ended up dropping those changes and this patchset is the replacement.

So please retest with this patchset (I know you guys have a large setup
that these changes are very relevant for).  If you could actually share
_how_ you've tested, that'd help me understand how these changes are
holding up.  So far all looks good for me...

Mike



Re: [dm-devel] [RFC PATCH 0/4] dm mpath: vastly improve blk-mq IO performance

2016-04-01 Thread Johannes Thumshirn

[ +Cc Hannes ]

On 2016-04-01 15:22, Mike Snitzer wrote:

On Fri, Apr 01 2016 at  4:12am -0400,
Johannes Thumshirn  wrote:


On 2016-03-31 22:04, Mike Snitzer wrote:
>I developed these changes some weeks ago but have since focused on
>regression and performance testing on larger NUMA systems.
>
>For regression testing I've been using mptest:
>https://github.com/snitm/mptest
>
>For performance testing I've been using a null_blk device (with
>various configuration permutations, e.g. pinning memory to a
>particular NUMA node, and varied number of submit_queues).
>
>By eliminating multipath's heavy use of the m->lock spinlock in the
>fast IO paths serious performance improvements are realized.

Hi Mike,

Are these the patches you pointed Hannes to?

If yes, please add my Tested-by: Johannes Thumshirn 



No they are not.

Hannes seems to have last pulled in my DM mpath changes that (ab)used RCU.

I ended up dropping those changes and this patchset is the replacement.


Now that you're saying it, I can remember some inspiring RCU usage in the patches.



So please retest with this patchset (I know you guys have a large setup
that these changes are very relevant for).  If you could actually share
_how_ you've tested, that'd help me understand how these changes are 
holding up.  So far all looks good for me...


The test itself is actually quite simple: we're testing with fio against
a Fibre Channel array (all SSDs, but I was very careful to only write
into the cache).


Here's my fio job file:
[mq-test]
iodepth=128
numjobs=40
group_reporting
direct=1
ioengine=libaio
size=3G
filename=/dev/dm-0
filename=/dev/dm-1
filename=/dev/dm-2
filename=/dev/dm-3
filename=/dev/dm-4
filename=/dev/dm-5
filename=/dev/dm-6
filename=/dev/dm-7
name="MQ Test"

and the test runner:
#!/bin/sh

for rw in 'randread' 'randwrite' 'read' 'write'; do
for bs in '4k' '8k' '16k' '32k' '64k'; do
fio mq-test.fio --bs="${bs}" --rw="${rw}" --output="fio-${bs}-${rw}.txt"
done
done

The initiator has 40 CPUs on 4 NUMA nodes (no HT) and 64GB RAM. I'm not
sure how much in terms of numbers I can share from the old patchset (will
ask Hannes on Monday), but I'm aware I'll have to once I've retested with
your new patches and we want to compare the results.


Byte,
   Johannes



Re: [dm-devel] [RFC PATCH 0/4] dm mpath: vastly improve blk-mq IO performance

2016-04-01 Thread Mike Snitzer
On Fri, Apr 01 2016 at  9:37am -0400,
Johannes Thumshirn  wrote:

> [ +Cc Hannes ]
> 
> On 2016-04-01 15:22, Mike Snitzer wrote:
> >On Fri, Apr 01 2016 at  4:12am -0400,
> >Johannes Thumshirn  wrote:
> >
> >>On 2016-03-31 22:04, Mike Snitzer wrote:
> >>>I developed these changes some weeks ago but have since focused on
> >>>regression and performance testing on larger NUMA systems.
> >>>
> >>>For regression testing I've been using mptest:
> >>>https://github.com/snitm/mptest
> >>>
> >>>For performance testing I've been using a null_blk device (with
> >>>various configuration permutations, e.g. pinning memory to a
> >>>particular NUMA node, and varied number of submit_queues).
> >>>
> >>>By eliminating multipath's heavy use of the m->lock spinlock in the
> >>>fast IO paths serious performance improvements are realized.
> >>
> >>Hi Mike,
> >>
> >>Are these the patches you pointed Hannes to?
> >>
> >>If yes, please add my Tested-by: Johannes Thumshirn
> >>
> >
> >No they are not.
> >
> >Hannes seems to have last pulled in my DM mpath changes that
> >(ab)used RCU.
> >I ended up dropping those changes and this patchset is the replacement.
> 
> Now that you're saying it I can remember some inspiring RCU usage in
> the patches.
> 
> >So please retest with this patchset (I know you guys have a large setup
> >that these changes are very relevant for).  If you could actually share
> >_how_ you've tested that'd help me understand how these changes are
> >holding up.  So far all looks good for me...
> 
> The test itself is actually quite simple, we're testing with fio
> against a fiber channel array (all SSDs but I was very careful to
> only write into the cache)
> 
> Here's my fio job file:
> [mq-test]
> iodepth=128
> numjobs=40
> group_reporting
> direct=1
> ioengine=libaio
> size=3G
> filename=/dev/dm-0
> filename=/dev/dm-1
> filename=/dev/dm-2
> filename=/dev/dm-3
> filename=/dev/dm-4
> filename=/dev/dm-5
> filename=/dev/dm-6
> filename=/dev/dm-7
> name="MQ Test"
> 
> and the test runner:
> #!/bin/sh
> 
> for rw in 'randread' 'randwrite' 'read' 'write'; do
> for bs in '4k' '8k' '16k' '32k' '64k'; do
> fio mq-test.fio --bs="${bs}" --rw="${rw}"
> --output="fio-${bs}-${rw}.txt"
> done
> done
> 
> The initiator has 40 CPUs on 4 NUMA nodes (no HT) and 64GB RAM. I'm
> not sure how much in terms of numbers I can share from the old
> patchset (will ask Hannes on Monday), but I'm aware I'll have to
> when I retested with your new patches and we want to compare the
> results.

OK, yes please share as much as you can.  Having shared code that seems
to perform well, it'd be nice to see some results from your testbed.
It'll help build my confidence in the code actually landing upstream.

Thanks,
Mike



Re: [dm-devel] [RFC PATCH 0/4] dm mpath: vastly improve blk-mq IO performance

2016-04-07 Thread Hannes Reinecke
On 03/31/2016 10:04 PM, Mike Snitzer wrote:
> I developed these changes some weeks ago but have since focused on
> regression and performance testing on larger NUMA systems.
> 
> For regression testing I've been using mptest:
> https://github.com/snitm/mptest
> 
> For performance testing I've been using a null_blk device (with
> various configuration permutations, e.g. pinning memory to a
> particular NUMA node, and varied number of submit_queues).
> 
> By eliminating multipath's heavy use of the m->lock spinlock in the
> fast IO paths serious performance improvements are realized.
> 
[ .. ]
> Jeff Moyer has been helping review these changes (and has graciously
> labored over _really_ understanding all the concurrency at play in DM
> mpath) -- his review isn't yet complete but I wanted to get this
> patchset out now to raise awareness about how I think DM multipath
> will be changing (for inclussion during the Linux 4.7 merge window).
> 
> Mike Snitzer (4):
>   dm mpath: switch to using bitops for state flags
>   dm mpath: use atomic_t for counting members of 'struct multipath'
>   dm mpath: move trigger_event member to the end of 'struct multipath'
>   dm mpath: eliminate use of spinlock in IO fast-paths
> 
>  drivers/md/dm-mpath.c | 351 
> --
>  1 file changed, 195 insertions(+), 156 deletions(-)
> 
Finally got around to testing this.
The performance is comparable to the previous (RCU-ified) patchset;
however, this one is the far superior approach.
In fact, the first two patches are pretty much identical to what I
already had, but I had shied away from modifying the path selectors.
So well done here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)



Re: [dm-devel] [RFC PATCH 0/4] dm mpath: vastly improve blk-mq IO performance

2016-04-07 Thread Mike Snitzer
On Thu, Apr 07 2016 at 10:58am -0400,
Hannes Reinecke  wrote:

> On 03/31/2016 10:04 PM, Mike Snitzer wrote:
> > I developed these changes some weeks ago but have since focused on
> > regression and performance testing on larger NUMA systems.
> > 
> > For regression testing I've been using mptest:
> > https://github.com/snitm/mptest
> > 
> > For performance testing I've been using a null_blk device (with
> > various configuration permutations, e.g. pinning memory to a
> > particular NUMA node, and varied number of submit_queues).
> > 
> > By eliminating multipath's heavy use of the m->lock spinlock in the
> > fast IO paths serious performance improvements are realized.
> > 
> [ .. ]
> > Jeff Moyer has been helping review these changes (and has graciously
> > labored over _really_ understanding all the concurrency at play in DM
> > mpath) -- his review isn't yet complete but I wanted to get this
> > patchset out now to raise awareness about how I think DM multipath
> > will be changing (for inclusion during the Linux 4.7 merge window).
> > 
> > Mike Snitzer (4):
> >   dm mpath: switch to using bitops for state flags
> >   dm mpath: use atomic_t for counting members of 'struct multipath'
> >   dm mpath: move trigger_event member to the end of 'struct multipath'
> >   dm mpath: eliminate use of spinlock in IO fast-paths
> > 
> >  drivers/md/dm-mpath.c | 351 
> > --
> >  1 file changed, 195 insertions(+), 156 deletions(-)
> > 
> Finally got around to test this.
> The performance is comparable to the previous (RCU-ified) patchset,
> however, this one is the far superior approach.
> In fact, the first two are pretty much identical to what I've
> already had, but I had shied away from modifying the path selectors.
> So well done here.

Awesome, thanks for reviewing and testing, very much appreciated.

I'll get this set staged in linux-next for 4.7 shortly.



Re: [dm-devel] [RFC PATCH 0/4] dm mpath: vastly improve blk-mq IO performance

2016-04-08 Thread Johannes Thumshirn
Ladies and Gentlemen,
To show off some numbers from our testing:

All tests were performed against the cache of the array, not the disks, as we
wanted to test the Linux stack, not the disk array.

All single queue tests have been performed with the deadline I/O Scheduler.
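
(Concretely that just means setting the legacy elevator on each SCSI path
device before the single queue runs, e.g. something along the lines of
"echo deadline > /sys/block/<path device>/queue/scheduler"; the blk-mq /
DM MQ runs have no legacy elevator to choose at this kernel version.)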

Comments welcome, have fun reading :-)

QLogic 32GBit FC HBA NetApp EF560 Array w/ DM MQ this patchset:

Random Read:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 785136 | 3066.1MB/s |
8k  | 734983 | 5742.6MB/s |
16k | 398516 | 6226.9MB/s |
32k | 200589 | 6268.5MB/s |
64k | 100417 | 6276.2MB/s |

Sequential Read:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 788620 | 3080.6MB/s |
8k  | 736359 | 5752.9MB/s |
16k | 398597 | 6228.8MB/s |
32k | 200487 | 6265.3MB/s |
64k | 100402 | 6275.2MB/s |

Random Write:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 242105 | 968423KB/s |
8k  | 90     | 1736.7MB/s |
16k | 178191 | 2784.3MB/s |
32k | 133619 | 4175.7MB/s |
64k | 97693  | 6105.9MB/s |

Sequential Write:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 134788 | 539155KB/s |
8k  | 132361 | 1034.8MB/s |
16k | 129941 | 2030.4MB/s |
32k | 128288 | 4009.4MB/s |
64k | 97776  | 6111.0MB/s |

QLogic 32GBit FC HBA NetApp EF560 Array w/ DM SQ this patchset:

Random Read:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 112402 | 449608KB/s |
8k  | 112818 | 902551KB/s |
16k | 111885 | 1748.3MB/s |
32k | 188015 | 5875.6MB/s |
64k | 99021  | 6188.9MB/s |

Sequential Read:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 115046 | 460186KB/s |
8k  | 113974 | 911799KB/s |
16k | 113374 | 1771.5MB/s |
32k | 192932 | 6029.2MB/s |
64k | 100474 | 6279.7MB/s |

Random Write:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 114284 | 457138KB/s |
8k  | 113992 | 911944KB/s |
16k | 113715 | 1776.9MB/s |
32k | 130402 | 4075.9MB/s |
64k | 92243  | 5765.3MB/s |

Sequential Write:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 115540 | 462162KB/s |
8k  | 114243 | 913951KB/s |
16k | 300153 | 4689.1MB/s |
32k | 141069 | 4408.5MB/s |
64k | 97620  | 6101.3MB/s |


QLogic 32GBit FC HBA NetApp EF560 Array w/ DM MQ previous patchset:

Random Read:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 782733 | 3057.6MB/s |
8k  | 732143 | 5719.9MB/s |
16k | 398314 | 6223.7MB/s |
32k | 200538 | 6266.9MB/s |
64k | 100422 | 6276.5MB/s |

Sequential Read:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 786707 | 3073.8MB/s |
8k  | 730579 | 5707.7MB/s |
16k | 398799 | 6231.3MB/s |
32k | 200518 | 6266.2MB/s |
64k | 100397 | 6274.9MB/s |

Random Write:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 242426 | 969707KB/s |
8k  | 223079 | 1742.9MB/s |
16k | 177889 | 2779.6MB/s |
32k | 133637 | 4176.2MB/s |
64k | 97727  | 6107.1MB/s |

Sequential Write:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 134360 | 537442KB/s |
8k  | 129738 | 1013.6MB/s |
16k | 129746 | 2027.3MB/s |
32k | 127875 | 3996.1MB/s |
64k | 97683  | 6105.3MB/s |

Emulex 16GBit FC HBA NetApp EF560 Array w/ DM MQ this patchset:

[Beware, this is with Hannes' lockless lpfc patches, which are not upstream as 
they're quite experimental, but are good at showing the capability of the new 
dm-mpath]

Random Read:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 939752 | 3670.1MB/s |
8k  | 741462 | 5792.7MB/s |
16k | 399285 | 6238.9MB/s |
32k | 196490 | 6140.4MB/s |
64k | 100325 | 6270.4MB/s |

Sequential Read:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 926222 | 3618.6MB/s |
8k  | 750125 | 5860.4MB/s |
16k | 397770 | 6215.2MB/s |
32k | 200130 | 6254.8MB/s |
64k | 100397 | 6274.9MB/s |

Random Write:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 251938 | 984.14MB/s |
8k  | 226712 | 1771.2MB/s |
16k | 180739 | 2824.5MB/s |
32k | 133316 | 4166.2MB/s |
64k | 98738  | 6171.2MB/s |

Sequential Write:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 134660 | 538643KB/s |
8k  | 131585 | 1028.9MB/s |
16k | 131030 | 2047.4MB/s |
32k | 126987 | 3968.4MB/s |
64k | 98882  | 6180.2MB/s |

Emulex 16GBit FC HBA NetApp EF560 Array w/ DM SQ this patchset:

Random Read:
BS  | IOPS   | BW         |
----+--------+------------+
4k  | 101860 | 

Re: [dm-devel] [RFC PATCH 0/4] dm mpath: vastly improve blk-mq IO performance

2016-04-08 Thread Mike Snitzer
On Fri, Apr 08 2016 at  7:42am -0400,
Johannes Thumshirn  wrote:

> Ladies and Gentlemen,
> To show off some numbers from our testing:
> 
> All tests are performed against the cache of the Array, not the disks as we 
> wanted to test the Linux stack not the Disk Array.
> 
> All single queue tests have been performed with the deadline I/O Scheduler.
> 
> Comments welcome, have fun reading :-)

Any chance you collected performance results from DM MQ on this same
testbed without any variant of my lockless patches?  The DM SQ results
aren't too interesting a reference point.  Seeing how much better
lockless DM MQ (multipath) is than the old m->lock heavy code (still in
4.6) would be more interesting.

Not a big deal if you don't have it.. but figured I'd check to see.

And thanks for the numbers you've provided.
Mike



Re: [dm-devel] [RFC PATCH 0/4] dm mpath: vastly improve blk-mq IO performance

2016-04-13 Thread Johannes Thumshirn
On Friday, 8 April 2016 15:29:39 CEST Mike Snitzer wrote:
> On Fri, Apr 08 2016 at  7:42am -0400,
> Johannes Thumshirn  wrote:
> 
> > Ladies and Gentlemen,
> > To show off some numbers from our testing:
> > 
> > All tests are performed against the cache of the Array, not the disks as we 
> > wanted to test the Linux stack not the Disk Array.
> > 
> > All single queue tests have been performed with the deadline I/O Scheduler.
> > 
> > Comments welcome, have fun reading :-)
> 
> Any chance you collected performance results from DM MQ on this same
> testbed without any variant of my lockless patches?  The DM SQ results
> aren't too interesting a reference point.  Seeing how much better
> lockless DM MQ (multipath) is than the old m->lock heavy code (still in
> 4.6) would be more interesting.
> 
> Not a big deal if you don't have it.. but figured I'd check to see.

I'll have to look whether some of the old logfiles are still available, but we
can't re-test, as the array was only temporarily allocated to us and we've
now lost access to it. But IIRC it was somewhere around 300K IOPS, though
don't quote me on that.

Byte,
Johannes

 
> And thanks for the numbers you've provided.
> Mike
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 


-- 
Johannes Thumshirn                                       Storage
jthumsh...@suse.de                             +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

