On Wed, May 03, 2017 at 09:08:34AM -0600, Jens Axboe wrote:
> On 05/03/2017 09:03 AM, Ming Lei wrote:
> > On Wed, May 03, 2017 at 08:10:58AM -0600, Jens Axboe wrote:
> >> On 05/03/2017 08:08 AM, Jens Axboe wrote:
> >>> On 05/02/2017 10:03 PM, Ming Lei wrote:
> >>>> On Fri, Apr 28, 2017 at 02:29:18PM -0600, Jens Axboe wrote:
> >>>>> On 04/28/2017 09:15 AM, Ming Lei wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> This patchset introduces the flag BLK_MQ_F_SCHED_USE_HW_TAG and
> >>>>>> allows the hardware tag to be used directly for I/O scheduling if
> >>>>>> the queue's depth is big enough. In this way, we can avoid
> >>>>>> allocating extra tags and a request pool for I/O scheduling, and
> >>>>>> the scheduler tag allocation/release can be saved in the I/O
> >>>>>> submit path.
> >>>>>
> >>>>> Ming, I like this approach, it's pretty clean. It'd be nice to have
> >>>>> a bit of performance data to back up that it's useful to add this
> >>>>> code, though. Have you run anything on eg kyber on nvme that shows
> >>>>> a reduction in overhead when getting rid of separate scheduler tags?
> >>>>
> >>>> I can observe a small improvement in the following tests:
> >>>>
> >>>> 1) fio script
> >>>>
> >>>> # io scheduler: kyber
> >>>> RWS="randread read randwrite write"
> >>>> for RW in $RWS; do
> >>>>     echo "Running test $RW"
> >>>>     sudo echo 3 > /proc/sys/vm/drop_caches
> >>>>     sudo fio --direct=1 --size=128G --bsrange=4k-4k --runtime=20 \
> >>>>         --numjobs=1 --ioengine=libaio --iodepth=10240 \
> >>>>         --group_reporting=1 --filename=$DISK \
> >>>>         --name=$DISK-test-$RW --rw=$RW --output-format=json
> >>>> done
> >>>>
> >>>> 2) results
> >>>>
> >>>>           ---------------------------------------------------------------
> >>>>           | sched tag (iops/lat) | use hw tag to sched (iops/lat)
> >>>> -------------------------------------------------------------------------
> >>>> randread  | 188940/54107         | 193865/52734
> >>>> -------------------------------------------------------------------------
> >>>> read      | 192646/53069         | 199738/51188
> >>>> -------------------------------------------------------------------------
> >>>> randwrite | 171048/59777         | 179038/57112
> >>>> -------------------------------------------------------------------------
> >>>> write     | 171886/59492         | 181029/56491
> >>>> -------------------------------------------------------------------------
> >>>>
> >>>> I guess it may be a bit more obvious when running the test on one
> >>>> slow NVMe device, and I will try to find one and run the test again.
> >>>
> >>> Thanks for running that. As I said in my original reply, I think this
> >>> is a good optimization, and the implementation is clean. I'm fine with
> >>> the current limitations of when to enable it, and it's not like we
> >>> can't extend this later, if we want.
> >>>
> >>> I do agree with Bart that patch 1+4 should be combined. I'll do that.
> >>
> >> Actually, can you do that when reposting? Looks like you needed to
> >> do that anyway.
> >
> > Yeah, I will do that in V1.
> 
> V2? :-)
> 
> Sounds good. I just wanted to check the numbers here, but with the
> series applied on top of for-linus it crashes when switching to kyber.

Yeah, I saw that too; it has been fixed in my local tree, :-)

> A few hunks threw fuzz, but it looked fine to me. But I bet I fat
> fingered something. So it'd be great if you could respin against my
> for-linus branch.

Actually, that is exactly what I am doing, :-)

Thanks,
Ming
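
For readers following the thread, the rough shape of what the cover letter
describes looks like the sketch below. It is only an illustration:
BLK_MQ_F_SCHED_USE_HW_TAG exists only in the patchset under discussion,
the driver name, ops, and values are made up, and in the series the flag
can also be enabled automatically by blk-mq when the hardware queue depth
is large enough, rather than being set by the driver as shown here.

    /*
     * Illustrative only: a fictional driver with a deep hardware queue
     * advertising the proposed BLK_MQ_F_SCHED_USE_HW_TAG flag in its tag
     * set, so the I/O scheduler can work with driver tags directly
     * instead of a separate scheduler tag set.
     */
    #include <linux/blk-mq.h>
    #include <linux/numa.h>

    /* Assumed to be defined elsewhere in this fictional driver. */
    extern struct blk_mq_ops example_mq_ops;
    struct example_cmd {
            void *priv;
    };

    /* Static storage, so the unset fields stay zero. */
    static struct blk_mq_tag_set example_tag_set;

    static int example_setup_tag_set(void)
    {
            example_tag_set.ops             = &example_mq_ops;
            example_tag_set.nr_hw_queues    = 4;
            /* Deep enough that the scheduler can tag from the hw tags. */
            example_tag_set.queue_depth     = 1024;
            example_tag_set.numa_node       = NUMA_NO_NODE;
            example_tag_set.cmd_size        = sizeof(struct example_cmd);
            example_tag_set.flags           = BLK_MQ_F_SHOULD_MERGE |
                                              BLK_MQ_F_SCHED_USE_HW_TAG;

            return blk_mq_alloc_tag_set(&example_tag_set);
    }

Per the cover letter, the point of the flag is that blk-mq no longer has
to allocate a separate scheduler tag set and request pool for such a
queue, and the extra scheduler tag allocation/release disappears from the
I/O submission path.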