Re: testing io.low limit for blk-throttle

2018-04-22 Thread jianchao.wang
Hi Paolo

On 04/23/2018 01:32 PM, Paolo Valente wrote:
> Thanks for sharing this fix.  I tried it too, but nothing changes in
> my test :(

That's really sad.

> At this point, my doubt is still: am I getting io.low limit right?  I
> understand that an I/O-bound group should be guaranteed a rbps at
> least equal to the rbps set with io.low for that group (of course,
> provided that the sum of io.low limits is lower than the rate at which
> the device serves all the I/O generated by the groups).  Is this
> really what io.low shall guarantee?

I agree with your point about this even if I'm not qualified to judge it.

On the other hand, could you share your test case and blk-throttle config here?

Thanks
Jianchao


Re: testing io.low limit for blk-throttle

2018-04-22 Thread Joseph Qi
Hi Paolo,
What are your idle and latency settings?
IMO, io.low will allow other groups to use more bandwidth if a cgroup's
average idle time is high or its latency is low. In such cases, the low
limit won't be guaranteed.
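A toy, user-space paraphrase of the gate being described (this is not the
kernel code; the real check is throtl_tg_is_idle() in block/blk-throttle.c,
quoted further down in this thread, and the numbers here are only
illustrative):

#include <stdbool.h>
#include <stdio.h>

struct tg_sample {
	long avg_idletime;        /* measured average idle time, us */
	long idletime_threshold;  /* the "idle" knob written to io.low, us */
	long latency_target;      /* the "latency" knob written to io.low, us */
	long bio_cnt;             /* bios seen in the sampling window */
	long bad_bio_cnt;         /* bios that exceeded the latency target */
};

/* If the group looks idle, or fewer than 20% of its bios missed the latency
 * target, blk-throttle treats it as not needing its low limit and lets the
 * other groups run above theirs. */
static bool tg_looks_idle(const struct tg_sample *tg)
{
	return tg->avg_idletime > tg->idletime_threshold ||
	       (tg->latency_target && tg->bio_cnt &&
	        tg->bad_bio_cnt * 5 < tg->bio_cnt);
}

int main(void)
{
	struct tg_sample g = {
		.avg_idletime = 50, .idletime_threshold = 200,
		.latency_target = 10, .bio_cnt = 1000, .bad_bio_cnt = 10,
	};
	printf("treated as idle (low limit may be ignored): %d\n",
	       tg_looks_idle(&g));
	return 0;
}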

Thanks,
Joseph

On 18/4/22 17:23, Paolo Valente wrote:
> Hi Shaohua, all,
> at last, I started testing your io.low limit for blk-throttle.  One of
> the things I'm interested in is how good throttling is in achieving a
> high throughput in the presence of realistic, variable workloads.
> 
> However, I seem to have bumped into a totally different problem.  The
> io.low parameter doesn't seem to guarantee what I understand it is meant
> to guarantee: minimum per-group bandwidths.  For example, with
> - one group, the interfered, containing one process that does sequential
>   reads with fio
> - io.low set to 100MB/s for the interfered
> - six other groups, the interferers, with each interferer containing one
>   process doing sequential read with fio
> - io.low set to 10MB/s for each interferer
> - the workload executed on an SSD with an overall throughput of 500MB/s
> the interfered gets only 75MB/s.
> 
> In particular, the throughput of the interfered becomes lower and
> lower as the number of interferers is increased.  So you can make it
> become even much lower than the 75MB/s in the example above.  There
> seems to be no control on bandwidth.
> 
> Am I doing something wrong?  Or did I simply misunderstand the goal of
> io.low, and the only parameter for guaranteeing the desired bandwidth to
> a group is io.max (to be used indirectly, by limiting the bandwidth of
> the interferers)?
> 
> If useful for you, you can reproduce the above test very quickly, by
> using the S suite [1] and typing:
> 
> cd thr-lat-with-interference
> sudo ./thr-lat-with-interference.sh -b t -w 1 -W "1000 1000 
> 1000 1000 1000 1000" -n 6 -T "read read read read read read" 
> -R "0 0 0 0 0 0"
> 
> Looking forward to your feedback,
> Paolo
> 
> [1] 
> 


Re: testing io.low limit for blk-throttle

2018-04-22 Thread Paolo Valente


> On 23 Apr 2018, at 04:19, jianchao.wang wrote:
> 
> Hi Paolo
> 
> As I said, I have met a similar scenario before.
> After some debugging, I found 3 issues.
> 
> Here is my setup command:
> mkdir test0 test1
> echo "259:0 riops=15" > test0/io.max 
> echo "259:0 riops=15" > test1/io.max 
> echo "259:0 riops=15" > test2/io.max 
> echo "259:0 riops=5 wiops=5 rbps=209715200 wbps=209715200 idle=200 
> latency=10" > test0/io.low 
> echo "259:0 riops=5 wiops=5 rbps=209715200 wbps=209715200 idle=200 
> latency=10" > test1/io.low 
> 
> My NVMe card's max bandwidth is ~600MB/s, and its max iops is ~160k.
> Each cgroup's io.low is set to 200MB/s and 50k iops; io.max is set to 150k iops.
> 
> 1. I set up 2 cgroups, test0 and test1, with one process per cgroup.
> Even if only the process in test0 does IO, its iops is just 50k.
> This is fixed by the following patch.
> https://marc.info/?l=linux-block&m=152325457607425&w=2 
> 
> 2. Let the processes in test0 and test1 both do IO.
> Sometimes the iops of both cgroups is stuck at 50k; looking at the log,
> blk-throttle's upgrade always fails.
> This is fixed by the following patch:
> https://marc.info/?l=linux-block&m=152325456307423&w=2
> 
> 3. After applying patches 1 and 2, I still see one cgroup's iops fall down
> to 30k ~ 40k while blk-throttle doesn't downgrade. This is because, even
> though the iops has been lower than the io.low limit for some time, the
> cgroup is considered idle, so the downgrade fails. More precisely, it is due
> to this code segment in throtl_tg_is_idle():
> 
>     (tg->latency_target && tg->bio_cnt &&
>      tg->bad_bio_cnt * 5 < tg->bio_cnt)
> 
> I fixed it with the following patch.
> But I'm not sure about this patch, so I didn't submit it.
> Please also try it. :)
> 

Thanks for sharing this fix.  I tried it too, but nothing changes in
my test :(

At this point, my doubt is still: am I getting io.low limit right?  I
understand that an I/O-bound group should be guaranteed a rbps at
least equal to the rbps set with io.low for that group (of course,
provided that the sum of io.low limits is lower than the rate at which
the device serves all the I/O generated by the groups).  Is this
really what io.low shall guarantee?

Thanks,
Paolo


> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index b5ba845..c9a43a4 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -1819,7 +1819,7 @@ static unsigned long tg_last_low_overflow_time(struct throtl_grp *tg)
>return ret;
> }
> 
> -static bool throtl_tg_is_idle(struct throtl_grp *tg)
> +static bool throtl_tg_is_idle(struct throtl_grp *tg, bool latency)
> {
>/*
> * cgroup is idle if:
> @@ -1836,7 +1836,7 @@ static bool throtl_tg_is_idle(struct throtl_grp *tg)
>  tg->idletime_threshold == DFL_IDLE_THRESHOLD ||
>  (ktime_get_ns() >> 10) - tg->last_finish_time > time ||
>  tg->avg_idletime > tg->idletime_threshold ||
> - (tg->latency_target && tg->bio_cnt &&
> + (tg->latency_target && tg->bio_cnt && latency &&
>tg->bad_bio_cnt * 5 < tg->bio_cnt);
>throtl_log(&tg->service_queue,
>"avg_idle=%ld, idle_threshold=%ld, bad_bio=%d, total_bio=%d, 
> is_idle=%d, scale=%d",
> @@ -1867,7 +1867,7 @@ static bool throtl_tg_can_upgrade(struct throtl_grp *tg)
> 
>if (time_after_eq(jiffies,
>tg_last_low_overflow_time(tg) + tg->td->throtl_slice) &&
> -   throtl_tg_is_idle(tg))
> +   throtl_tg_is_idle(tg, true))
>return true;
>return false;
> }
> @@ -1983,7 +1983,7 @@ static bool throtl_tg_can_downgrade(struct throtl_grp *tg)
>if (time_after_eq(now, td->low_upgrade_time + td->throtl_slice) &&
>time_after_eq(now, tg_last_low_overflow_time(tg) +
>td->throtl_slice) &&
> -   (!throtl_tg_is_idle(tg) ||
> +   (!throtl_tg_is_idle(tg, false) ||
> !list_empty(&tg_to_blkg(tg)->blkcg->css.children)))
>return true;
>return false;
> 
> 
> 
> On 04/22/2018 11:53 PM, Paolo Valente wrote:
>> 
>> 
>>> On 22 Apr 2018, at 15:29, jianchao.wang wrote:
>>> 
>>> Hi Paolo
>>> 
>>> I used to meet a similar issue with io.low.
>>> Can you try the following patches to see whether the issue is fixed?
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dblock-26m-3D152325456307423-26w-3D2&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=asJMDy9zIe2AqRVpoLbe9RMjsdZOJZ0HrRWTM3CPZeA&s=AZ4kllxCfaXspjeSylBpK8K7ai6IPjSiffrGmzt4VEM&e=
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dblock-26m-3D152325457607425-26w-3D2&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=asJMDy9zIe2AqRVpoLbe9RMjsdZOJZ0HrRWTM3CPZeA&s=1EhsoSMte3kIxuVSYBFSE9W2jRKrIWI5z7-stlZ80H4&e=
>>> 
>

Re: question about request merge

2018-04-22 Thread Zhengyuan Liu
2018-04-22 0:23 GMT+08:00 Jens Axboe :
> On 4/21/18 8:07 AM, Zhengyuan Liu wrote:
>> 2018-04-20 22:34 GMT+08:00 Jens Axboe :
>>> On 4/19/18 9:51 PM, Zhengyuan Liu wrote:
 Hi, Shaohua

 I found it indeed doesn't do front merging when two threads flush their plug
 lists concurrently.  To reproduce this, I prepared two IO threads, named
 a0.io and a1.io.
 Thread a1.io uses libaio to write 5 requests:
 sectors: 16 + 8, 40 + 8, 64 + 8, 88 + 8, 112 + 8
 Thread a0.io uses libaio to write another 5 requests:
 sectors: 8 + 8, 32 + 8, 56 + 8, 80 + 8, 104 + 8
>>>
>>> I'm cutting some of the below.
>>>
>>> Thanks for the detailed email. It's mostly on purpose that we don't
>>> spend cycles and memory on maintaining a separate front merge hash,
>>> since it's generally not something that happens very often. If you have
>>> a thread pool doing IO and split sequential IO such that you would
>>> benefit a lot from front merging, then I would generally claim that
>>> you're not writing your app in the most optimal manner.
>>>
>>
>> Thanks for the explanation. I was only considering the problem from the
>> code's perspective and ignoring how applications behave in practice.
>
> That's quite by design and not accidental.
>
>>> So I'm curious, what's the big interest in front merging?
>>
>> If it's not something that happens very often, I agree it's not worth
>> supporting front merging either.
>>
>> By the way, I have another question: why doesn't blktrace trace the back
>> merging of requests while flushing plugged requests to the queue? If it did,
>> we would get a clearer view of IO merging.
>
> Not sure I follow, exactly where is a back merge trace missing?
>

I mean blktrace only traces bio merging, not request merging. Let me give an
example: I use thread a.out to write three bios, as shown below:

a.out:  0 + 8, 16 + 8, 8 + 8

The blktrace output is shown below:

  8,16   1    7  0.292069180  1222  Q  WS 0 + 8 [a.out]
  8,16   1    8  0.292073960  1222  G  WS 0 + 8 [a.out]
  8,16   1    9  0.292074440  1222  P   N [a.out]
  8,16   1   10  0.292079380  1222  Q  WS 16 + 8 [a.out]
  8,16   1   11  0.292081840  1222  G  WS 16 + 8 [a.out]
  8,16   1   12  0.292085860  1222  Q  WS 8 + 8 [a.out]
  8,16   1   13  0.292087240  1222  F  WS 8 + 8 [a.out]
  8,16   1   14  0.292089100  1222  I  WS 0 + 8 [a.out]
  8,16   1   15  0.292095200  1222  I  WS 8 + 16 [a.out]
  8,16   1   16  0.295931920  1222  U   N [a.out] 2
  8,16   1   17  0.298528980  1222  D  WS 0 + 24 [a.out]
  8,16   0    3  0.302617360     3  C  WS 0 + 24 [0]
  
  Total (8,16):
   Reads Queued:          0,        0KiB   Writes Queued:         3,       12KiB
   Read Dispatches:       0,        0KiB   Write Dispatches:      1,       12KiB
   Reads Requeued:        0                Writes Requeued:       0
   Reads Completed:       0,        0KiB   Writes Completed:      1,       12KiB
   Read Merges:           0,        0KiB   Write Merges:          1,        4KiB
   PC Reads Queued:       0,        0KiB   PC Writes Queued:      0,        0KiB
   PC Read Disp.:         3,        0KiB   PC Write Disp.:        0,        0KiB
   PC Reads Req.:         0                PC Writes Req.:        0
   PC Reads Compl.:       3                PC Writes Compl.:      0
   IO unplugs:            1                Timer unplugs:         0

We merge bio(8 + 8) into request(16 + 8) at the plug stage, and that is traced
as an F event. When it comes to the unplug stage, request(0 + 8) and
request(8 + 16) are merged into a single request(0 + 24), but there is no
tracing information about that operation.

So I'm just a bit curious; please forgive my ignorance.
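For reference, a minimal libaio sketch that submits the same three-bio
pattern (sectors 0 + 8, 16 + 8, 8 + 8, i.e. 4KiB writes at offsets 0, 8KiB
and 4KiB) in a single io_submit() call; the device path is a placeholder, and
it would be run under blktrace to see the Q/G/F/I/D events above. Build with:
gcc repro.c -o a.out -laio

#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cbs[3], *cbp[3];
	struct io_event ev[3];
	/* Same submission order as above: sector 0, sector 16, sector 8. */
	long long off[3] = { 0, 16 * 512, 8 * 512 };
	void *buf;
	int fd, i;

	fd = open("/dev/sdb", O_WRONLY | O_DIRECT);	/* placeholder device */
	if (fd < 0 || io_setup(8, &ctx))
		return 1;
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	memset(buf, 0, 4096);

	for (i = 0; i < 3; i++) {
		io_prep_pwrite(&cbs[i], fd, buf, 4096, off[i]);
		cbp[i] = &cbs[i];
	}
	/* Submit all three in one call so they are queued back to back. */
	if (io_submit(ctx, 3, cbp) != 3)
		return 1;
	io_getevents(ctx, 3, 3, ev, NULL);

	io_destroy(ctx);
	close(fd);
	free(buf);
	return 0;
}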

Thanks.
> --
> Jens Axboe
>

Re: testing io.low limit for blk-throttle

2018-04-22 Thread jianchao.wang
Hi Paolo

As I said, I have met a similar scenario before.
After some debugging, I found 3 issues.

Here is my setup command:
mkdir test0 test1
echo "259:0 riops=15" > test0/io.max 
echo "259:0 riops=15" > test1/io.max 
echo "259:0 riops=15" > test2/io.max 
echo "259:0 riops=5 wiops=5 rbps=209715200 wbps=209715200 idle=200 
latency=10" > test0/io.low 
echo "259:0 riops=5 wiops=5 rbps=209715200 wbps=209715200 idle=200 
latency=10" > test1/io.low 

My NVMe card's max bandwidth is ~600MB/s, and its max iops is ~160k.
Each cgroup's io.low is set to 200MB/s and 50k iops; io.max is set to 150k iops.

1. I set up 2 cgroups, test0 and test1, with one process per cgroup.
Even if only the process in test0 does IO, its iops is just 50k.
This is fixed by the following patch.
https://marc.info/?l=linux-block&m=152325457607425&w=2 

2. Let the processes in test0 and test1 both do IO.
Sometimes the iops of both cgroups is stuck at 50k; looking at the log,
blk-throttle's upgrade always fails.
This is fixed by the following patch:
https://marc.info/?l=linux-block&m=152325456307423&w=2

3. After applying patches 1 and 2, I still see one cgroup's iops fall down
to 30k ~ 40k while blk-throttle doesn't downgrade. This is because, even
though the iops has been lower than the io.low limit for some time, the
cgroup is considered idle, so the downgrade fails. More precisely, it is due
to this code segment in throtl_tg_is_idle():

    (tg->latency_target && tg->bio_cnt &&
     tg->bad_bio_cnt * 5 < tg->bio_cnt)

I fixed it with the following patch.
But I'm not sure about this patch, so I didn't submit it.
Please also try it. :)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index b5ba845..c9a43a4 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1819,7 +1819,7 @@ static unsigned long tg_last_low_overflow_time(struct throtl_grp *tg)
return ret;
 }
 
-static bool throtl_tg_is_idle(struct throtl_grp *tg)
+static bool throtl_tg_is_idle(struct throtl_grp *tg, bool latency)
 {
/*
 * cgroup is idle if:
@@ -1836,7 +1836,7 @@ static bool throtl_tg_is_idle(struct throtl_grp *tg)
  tg->idletime_threshold == DFL_IDLE_THRESHOLD ||
  (ktime_get_ns() >> 10) - tg->last_finish_time > time ||
  tg->avg_idletime > tg->idletime_threshold ||
- (tg->latency_target && tg->bio_cnt &&
+ (tg->latency_target && tg->bio_cnt && latency &&
tg->bad_bio_cnt * 5 < tg->bio_cnt);
throtl_log(&tg->service_queue,
"avg_idle=%ld, idle_threshold=%ld, bad_bio=%d, total_bio=%d, 
is_idle=%d, scale=%d",
@@ -1867,7 +1867,7 @@ static bool throtl_tg_can_upgrade(struct throtl_grp *tg)
 
if (time_after_eq(jiffies,
tg_last_low_overflow_time(tg) + tg->td->throtl_slice) &&
-   throtl_tg_is_idle(tg))
+   throtl_tg_is_idle(tg, true))
return true;
return false;
 }
@@ -1983,7 +1983,7 @@ static bool throtl_tg_can_downgrade(struct throtl_grp *tg)
if (time_after_eq(now, td->low_upgrade_time + td->throtl_slice) &&
time_after_eq(now, tg_last_low_overflow_time(tg) +
td->throtl_slice) &&
-   (!throtl_tg_is_idle(tg) ||
+   (!throtl_tg_is_idle(tg, false) ||
 !list_empty(&tg_to_blkg(tg)->blkcg->css.children)))
return true;
return false;



On 04/22/2018 11:53 PM, Paolo Valente wrote:
> 
> 
>> On 22 Apr 2018, at 15:29, jianchao.wang wrote:
>>
>> Hi Paolo
>>
>> I used to meet a similar issue with io.low.
>> Can you try the following patches to see whether the issue is fixed?
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dblock-26m-3D152325456307423-26w-3D2&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=asJMDy9zIe2AqRVpoLbe9RMjsdZOJZ0HrRWTM3CPZeA&s=AZ4kllxCfaXspjeSylBpK8K7ai6IPjSiffrGmzt4VEM&e=
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dblock-26m-3D152325457607425-26w-3D2&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=asJMDy9zIe2AqRVpoLbe9RMjsdZOJZ0HrRWTM3CPZeA&s=1EhsoSMte3kIxuVSYBFSE9W2jRKrIWI5z7-stlZ80H4&e=
>>
> 
> Just tried. Unfortunately, nothing seems to change :(
> 
> Thanks,
> Paolo
> 
>> Thanks
>> Jianchao
>>
>> On 04/22/2018 05:23 PM, Paolo Valente wrote:
>>> Hi Shaohua, all,
>>> at last, I started testing your io.low limit for blk-throttle.  One of
>>> the things I'm interested in is how good throttling is in achieving a
>>> high throughput in the presence of realistic, variable workloads.
>>>
>>> However, I seem to have bumped into a totally different problem.  The
>>> io.low parameter doesn't seem to guarantee what I understand it is meant
>>> to guarantee: minimum per-group bandwidths.  For example, with
>>> - one group, the interfered, containing one process that does sequential
>>>  reads with fio
>>> -

Re: testing io.low limit for blk-throttle

2018-04-22 Thread Paolo Valente


> On 22 Apr 2018, at 15:29, jianchao.wang wrote:
> 
> Hi Paolo
> 
> I used to meet a similar issue with io.low.
> Can you try the following patches to see whether the issue is fixed?
> https://marc.info/?l=linux-block&m=152325456307423&w=2
> https://marc.info/?l=linux-block&m=152325457607425&w=2
> 

Just tried. Unfortunately, nothing seems to change :(

Thanks,
Paolo

> Thanks
> Jianchao
> 
> On 04/22/2018 05:23 PM, Paolo Valente wrote:
>> Hi Shaohua, all,
>> at last, I started testing your io.low limit for blk-throttle.  One of
>> the things I'm interested in is how good throttling is in achieving a
>> high throughput in the presence of realistic, variable workloads.
>> 
>> However, I seem to have bumped into a totally different problem.  The
>> io.low parameter doesn't seem to guarantee what I understand it is meant
>> to guarantee: minimum per-group bandwidths.  For example, with
>> - one group, the interfered, containing one process that does sequential
>>  reads with fio
>> - io.low set to 100MB/s for the interfered
>> - six other groups, the interferers, with each interferer containing one
>>  process doing sequential read with fio
>> - io.low set to 10MB/s for each interferer
>> - the workload executed on an SSD with an overall throughput of 500MB/s
>> the interfered gets only 75MB/s.
>> 
>> In particular, the throughput of the interfered becomes lower and
>> lower as the number of interferers is increased.  So you can make it
>> become even much lower than the 75MB/s in the example above.  There
>> seems to be no control on bandwidth.
>> 
>> Am I doing something wrong?  Or did I simply misunderstand the goal of
>> io.low, and the only parameter for guaranteeing the desired bandwidth to
>> a group is io.max (to be used indirectly, by limiting the bandwidth of
>> the interferers)?
>> 
>> If useful for you, you can reproduce the above test very quickly, by
>> using the S suite [1] and typing:
>> 
>> cd thr-lat-with-interference
>> sudo ./thr-lat-with-interference.sh -b t -w 1 -W "1000 1000 
>> 1000 1000 1000 1000" -n 6 -T "read read read read read read" 
>> -R "0 0 0 0 0 0"
>> 
>> Looking forward to your feedback,
>> Paolo
>> 
>> [1] 
>> 



Re: [PATCH 3/3] xen blkback: add fault injection facility

2018-04-22 Thread kbuild test robot
Hi Stanislav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.17-rc1 next-20180420]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Stanislav-Kinsburskii/Introduce-Xen-fault-injection-facility/20180422-201946
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/block//xen-blkback/blkback_fi.c: In function 'xen_blkif_fi_init':
>> drivers/block//xen-blkback/blkback_fi.c:87:51: error: dereferencing pointer to incomplete type 'struct backend_info'
 bfi->dir = debugfs_create_dir(dev_name(&blkif->be->dev->dev),
  ^~

vim +87 drivers/block//xen-blkback/blkback_fi.c

77  
78  int xen_blkif_fi_init(struct xen_blkif *blkif)
79  {
80  struct xen_blkif_fi *bfi;
81  int fi, err = -ENOMEM;
82  
83  bfi = kmalloc(sizeof(*bfi), GFP_KERNEL);
84  if (!bfi)
85  return -ENOMEM;
86  
  > 87  bfi->dir = debugfs_create_dir(dev_name(&blkif->be->dev->dev),
88blkif_fi_dir);
89  if (!bfi->dir)
90  goto err_dir;
91  
92  for (fi = 0; fi < XENBLKIF_FI_MAX; fi++) {
93  bfi->faults[fi] = xen_fi_dir_add(bfi->dir,
    94                                           xen_blkif_fi_names[fi]);
95  if (!bfi->faults[fi])
96  goto err_fault;
97  }
98  
99  blkif->fi_info = bfi;
   100  return 0;
   101  
   102  err_fault:
   103  for (; fi > 0; fi--)
   104  xen_fi_del(bfi->faults[fi]);
   105  debugfs_remove_recursive(bfi->dir);
   106  err_dir:
   107  kfree(bfi);
   108  return err;
   109  }
   110  
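A sketch of one way this kind of error is usually resolved, assuming struct
backend_info is defined only in xenbus.c and merely forward-declared in the
header blkback_fi.c includes (which is what the incomplete-type error
suggests); the accessor name is hypothetical:

/* drivers/block/xen-blkback/xenbus.c (sketch) */
struct device *xen_blkbk_be_device(struct backend_info *be)
{
	/* be->dev is the xenbus_device, per the dereference quoted above */
	return &be->dev->dev;
}

/* drivers/block/xen-blkback/blkback_fi.c (sketch) */
bfi->dir = debugfs_create_dir(dev_name(xen_blkbk_be_device(blkif->be)),
			      blkif_fi_dir);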

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: testing io.low limit for blk-throttle

2018-04-22 Thread jianchao.wang
Hi Paolo

I used to meet a similar issue with io.low.
Can you try the following patches to see whether the issue is fixed?
https://marc.info/?l=linux-block&m=152325456307423&w=2
https://marc.info/?l=linux-block&m=152325457607425&w=2

Thanks
Jianchao

On 04/22/2018 05:23 PM, Paolo Valente wrote:
> Hi Shaohua, all,
> at last, I started testing your io.low limit for blk-throttle.  One of
> the things I'm interested in is how good throttling is in achieving a
> high throughput in the presence of realistic, variable workloads.
> 
> However, I seem to have bumped into a totally different problem.  The
> io.low parameter doesn't seem to guarantee what I understand it is meant
> to guarantee: minimum per-group bandwidths.  For example, with
> - one group, the interfered, containing one process that does sequential
>   reads with fio
> - io.low set to 100MB/s for the interfered
> - six other groups, the interferers, with each interferer containing one
>   process doing sequential read with fio
> - io.low set to 10MB/s for each interferer
> - the workload executed on an SSD with an overall throughput of 500MB/s
> the interfered gets only 75MB/s.
> 
> In particular, the throughput of the interfered becomes lower and
> lower as the number of interferers is increased.  So you can make it
> become even much lower than the 75MB/s in the example above.  There
> seems to be no control on bandwidth.
> 
> Am I doing something wrong?  Or did I simply misunderstand the goal of
> io.low, and the only parameter for guaranteeing the desired bandwidth to
> a group is io.max (to be used indirectly, by limiting the bandwidth of
> the interferers)?
> 
> If useful for you, you can reproduce the above test very quickly, by
> using the S suite [1] and typing:
> 
> cd thr-lat-with-interference
> sudo ./thr-lat-with-interference.sh -b t -w 1 -W "1000 1000 
> 1000 1000 1000 1000" -n 6 -T "read read read read read read" 
> -R "0 0 0 0 0 0"
> 
> Looking forward to your feedback,
> Paolo
> 
> [1] 
> 


testing io.low limit for blk-throttle

2018-04-22 Thread Paolo Valente
Hi Shaohua, all,
at last, I started testing your io.low limit for blk-throttle.  One of
the things I'm interested in is how good throttling is in achieving a
high throughput in the presence of realistic, variable workloads.

However, I seem to have bumped into a totally different problem.  The
io.low parameter doesn't seem to guarantee what I understand it is meant
to guarantee: minimum per-group bandwidths.  For example, with
- one group, the interfered, containing one process that does sequential
  reads with fio
- io.low set to 100MB/s for the interfered
- six other groups, the interferers, with each interferer containing one
  process doing sequential read with fio
- io.low set to 10MB/s for each interferer
- the workload executed on an SSD with an overall throughput of 500MB/s
the interfered gets only 75MB/s.

In particular, the throughput of the interfered becomes lower and
lower as the number of interferers is increased.  So you can make it
become even much lower than the 75MB/s in the example above.  There
seems to be no control on bandwidth.

Am I doing something wrong?  Or did I simply misunderstand the goal of
io.low, and the only parameter for guaranteeing the desired bandwidth to
a group is io.max (to be used indirectly, by limiting the bandwidth of
the interferers)?

If useful for you, you can reproduce the above test very quickly, by
using the S suite [1] and typing:

cd thr-lat-with-interference
sudo ./thr-lat-with-interference.sh -b t -w 1 -W "1000 1000 
1000 1000 1000 1000" -n 6 -T "read read read read read read" -R 
"0 0 0 0 0 0"

Looking forward to your feedback,
Paolo

[1] 

Re: [PATCH] bsg referencing bus driver module

2018-04-22 Thread James Bottomley
On Fri, 2018-04-20 at 16:44 -0600, Anatoliy Glagolev wrote:
>  
> > This patch isn't applyable because your mailer has changed all the
> > tabs to spaces.
> > 
> > I also think there's no need to do it this way.  I think what we
> > need is for fc_bsg_remove() to wait until the bsg queue is
> > drained.  It does look like the author thought this happened
> > otherwise the code wouldn't have the note.  If we fix it that way
> > we can do the same thing in all the other transport classes that
> > use bsg (which all have a similar issue).
> > 
> > James
> > 
> 
> Thanks, James. Sorry about the tabs; re-sending.
> 
> On fc_bsg_remove()...: are you suggesting to implement the whole fix
> in scsi_transport_fc.c?

Yes, but it's not just scsi_transport_fc; scsi_transport_sas has the
same issue.  I think it's probably just the one-liner addition of
blk_drain_queue() that fixes this.  There should probably be a block
primitive that does the correct queue reference dance and calls
blk_cleanup_queue() and blk_drain_queue() in order.

>  That would be nice, but I do not see how that
> is possible. Even with the queue drained bsg still holds a reference
> to the Scsi_Host via bsg_class_device; bsg_class_device itself is
> referenced on bsg_open and kept around while a user-mode process
> keeps a handle to bsg.

Once you've called bsg_unregister_queue(), the queue will be destroyed
and the reference released once the last job is drained, meaning the
user can keep the bsg device open, but it will just return errors
because of the lack of queue.  This scenario allows removal to proceed
without being held hostage by open devices.

> Even if we somehow implement the waiting, the call may be stuck
> forever if the user-mode process keeps the handle.

No it won't: after blk_cleanup_queue(), the queue is in bypass mode: no
requests queued after this do anything other than complete with error,
so they never make it into SCSI.

> I think handling it via a reference to the module is more consistent
> with the way things are done in Linux. You suggested the approach
> yourself back in the "Waiting for scsi_host_template release" discussion.

That was before I analyzed the code paths.  Module release is tricky,
because module exit won't be called until the references drop to zero, so
you have to be careful not to create a situation where module exit never
gets called; the module exit code should force everything to detach and
wait for that forcing to complete, to make up for the reference-circularity
problem.  If you do it purely by refcounting, the module may actually never
release (that's why scsi_remove_host works the way it does, for instance).
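A minimal sketch of the circularity described above (hypothetical driver
code, not the bsg implementation): if ->open() pins the module and the pin
is only dropped in ->release(), a user holding the device node open keeps
the refcount non-zero, so module exit never runs and module removal waits
forever instead of forcing a detach.

static int example_open(struct inode *inode, struct file *file)
{
	/* Pin this module until the last close. */
	if (!try_module_get(THIS_MODULE))
		return -ENODEV;
	return 0;
}

static int example_release(struct inode *inode, struct file *file)
{
	/* Only here is the pin dropped; an open fd blocks module removal. */
	module_put(THIS_MODULE);
	return 0;
}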

James