Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Paolo Valente

On 30 May 2014, at 19:09, Vivek Goyal wrote:

> […]
> Instead of just looking at numbers, I am keen on knowing what's the
> fundamental design change which allows this. What is CFQ doing wrong
> which BFQ gets right.
> 

I think that Tejun has already highlighted the key points and provided many
details. To help answer your questions about why bfq outperforms cfq, here
is a summary of the most relevant underlying facts:

1) cfq is based on a round-robin scheme, in which an unlucky queue that
should be served immediately may instead have to wait for all the other busy
queues before being served. In this respect, combining round robin with
virtual-time-based improvements is likely to lead to unclear solutions, and
probably to sub-optimal results, compared with just using an optimal
scheduler with provable deterministic guarantees (such as the internal
scheduler of bfq).

2) To provide a queue with a higher fraction of the throughput, a
round-robin scheduler serves the queue for a longer time slice. Increasing
time slices, however, further increases per-request latencies. The problem
may be mitigated by using preemption, but the result is a combination of a
basic algorithm and a ‘corrective’ heuristic. This is again a more
convoluted, and likely less accurate, solution than directly using an
optimal algorithm with provable guarantees.

3) In bfq, budgets play a similar role to time slices in cfq, i.e., once a
queue has been granted access to the device, the queue is served, in the
simplest case, until it finishes its budget. But, under bfq, the fraction of
the throughput received by a queue is *independent* of the budget assigned
to the queue. I agree that this may seem counterintuitive at first,
especially if one is accustomed to thinking a la round-robin. Unlike a
round-robin algorithm, the internal scheduler of bfq controls throughput
distribution by controlling the frequency at which queues are served (a
minimal sketch of this timestamping idea is given after this list). The
resulting degree of freedom with respect to budget sizes has the following
two main advantages:
3.a) bfq can choose for each queue the budget that best fits the
requirements or characteristics of the queue. For example, queues
corresponding to time-sensitive applications are assigned small budgets,
which guarantees that they are served quickly. Conversely, queues associated
with I/O-bound processes performing mostly sequential I/O are assigned large
budgets, which helps boost the throughput.
3.b) bfq does not need to assign large budgets to queues to provide them
with large fractions of the throughput; hence bfq does not need to degrade
per-request latencies to guarantee a differentiated throughput distribution.

4) The internal scheduler of bfq guarantees that a queue that needs to be
served quickly may wait, unjustly, for the service of at most one other
queue. More formally, bfq guarantees that each budget is completed with the
smallest possible delay, for a budget-by-budget scheduler, with respect to
an ideal, perfectly fair scheduler (i.e., an ideal scheduler that serves all
busy queues at the same time, providing each with a fraction of the
throughput proportional to its weight).

5) Temporarily assigning a queue a large fraction of the throughput is the
main mechanism through which bfq provides interactive and soft real-time
applications with a low latency. Thanks to fact 3.b, bfq achieves this goal
without increasing per-request latencies. As for how applications are
deemed interactive or soft real-time, I have tried to describe both
detection heuristics in patches 06 and 07.
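
To make the mechanism in point 3 more concrete, here is a minimal,
self-contained C sketch of how a budget-based, virtual-time scheduler of
this kind can pick the next queue. It only illustrates the timestamping
idea; it is not bfq's actual code, and all names, types and units below are
hypothetical.

/*
 * Illustrative sketch of budget-based virtual-time scheduling
 * (B-WF2Q+-style).  NOT bfq's actual code; all names are made up.
 */
struct queue {
        unsigned long long start;   /* virtual start time S */
        unsigned long long finish;  /* virtual finish time F */
        unsigned int weight;        /* > 0, controls the throughput share */
        unsigned int budget;        /* service (e.g. sectors) granted per round */
};

/* Timestamp a queue when it is granted a new budget. */
static void charge_budget(struct queue *q, unsigned long long vtime)
{
        q->start = q->finish > vtime ? q->finish : vtime;
        /*
         * A larger budget only pushes the finish time further away, so
         * the queue is simply selected less often: the long-term
         * throughput share still depends only on the weight.
         */
        q->finish = q->start + q->budget / q->weight;
}

/* Pick the eligible queue with the smallest virtual finish time. */
static struct queue *pick_next(struct queue **qs, int n,
                               unsigned long long vtime)
{
        struct queue *best = NULL;
        int i;

        for (i = 0; i < n; i++) {
                if (qs[i]->start > vtime)   /* not yet eligible */
                        continue;
                if (!best || qs[i]->finish < best->finish)
                        best = qs[i];
        }
        return best;
}

Round robin, by contrast, serves queues in a fixed order, so a queue's share
can only be changed by changing how long it is served, which is exactly the
coupling described in point 2.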

Finally, as for adding to cfq the heuristics I have added to bfq, I think
that this would probably improve application latencies with cfq as well.
But, because of the above facts, the result would unavoidably be worse than
with bfq.

Paolo

--
Paolo Valente 
Algogroup
Dipartimento di Fisica, Informatica e Matematica
Via Campi, 213/B
41125 Modena - Italy  
homepage:  http://algogroup.unimore.it/people/paolo/



Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Tejun Heo
Hey, Vivek.

On Fri, May 30, 2014 at 01:55:27PM -0400, Vivek Goyal wrote:
> Now CFQ also dynamically adjusts the slice length based on how many
> queues are ready to do IO. One problem with fixed-slice-length round
> robin was that if there are a lot of queues doing IO, then after serving
> one queue, the same queue will get a time slice again only after a long time.
> 
> Corrado Zoccolo did work in this area in an attempt to improve latency.
> Now slice length is calculated dynamically in an attempt to achieve
> better latency.

Consider it a better version of that.

> Ok, so you prefer to have another IO scheduler instead of improving
> CFQ?

As I wrote in another reply, I think the best approach would be
morphing cfq to accommodate the scheduling algorithm and heuristics of
bfq.

> Tejun, I spent a significant amount of time on BFQ a few years ago, and
> that's the reason I have not read it again yet. My understanding was that
> there was nothing which could not be done in CFQ (at least the things
> which mattered).

THERE IS A NEW PAPER.

> Looks like you prefer introducing a new scheduler instead of improving
> CFQ. My preference is to improve CFQ. Borrow good ideas from BFQ and
> implement them in CFQ.

LOOKS LIKE YOU ARE NOT READING ANYTHING AND JUST WRITING IRRELEVANT
SHIT.

> So why not improve CFQ instead of carrying and maintaining another 
> scheduler. And then have a discussion that what's the default scheduler.

For fuck's sake, I'm out.  This is a total waste of time.  You don't
read what others are writing and refuse to study what's right there.
What are you doing?

-- 
tejun


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Vivek Goyal
On Fri, May 30, 2014 at 01:26:09PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Fri, May 30, 2014 at 01:09:58PM -0400, Vivek Goyal wrote:
> > Are you referring to BFQ paper. I had read one in the past and it was
> > all about how to achieve more accurate fairness. At this point of time
> > I don't think that smarter algorithm is the problem. Until and unless
> > somebody can show me that how algorithm is a problem.
> 
> Because cfq rr's with heuristically guessed slices and bfq calculates
> where each one is supposed to end and then schedules the slices
> accordingly.  With a lot of slices to serve, cfq loses track of which
> one should come first to adhere to desired latency guarantees while
> bfq doesn't, which in turn allows more latitude in using longer slices
> for bfq allowing for better throughput.  It's all in the paper and
> backed by numbers.  What more do you want?

Now CFQ also dynamically adjusts the slice length based on how many queues
are ready to do IO. One problem with fixed-slice-length round robin was
that if there are a lot of queues doing IO, then after serving one queue,
the same queue will get a time slice again only after a long time.

Corrado Zoccolo did work in this area in an attempt to improve latency.
Now slice length is calculated dynamically in an attempt to achieve
better latency.

commit 5db5d64277bf390056b1a87d0bb288c8b8553f96
Author: Corrado Zoccolo 
Date:   Mon Oct 26 22:44:04 2009 +0100

cfq-iosched: adapt slice to number of processes doing I/O
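
For reference, the idea of that commit can be paraphrased roughly as
follows (this is a simplified sketch, not the actual cfq-iosched code; the
names are illustrative): when many queues are competing, the per-queue
slice is shrunk so that one full round of service still fits within a
target latency, bounded below by a minimum slice.

static unsigned int adapt_slice(unsigned int base_slice,
                                unsigned int nr_busy_queues,
                                unsigned int target_latency,
                                unsigned int min_slice)
{
        unsigned int expected_round = base_slice * nr_busy_queues;
        unsigned int scaled;

        if (expected_round <= target_latency)
                return base_slice;  /* few queues: keep the full slice */

        /* many queues: scale the slice so a whole round fits the target */
        scaled = base_slice * target_latency / expected_round;
        return scaled > min_slice ? scaled : min_slice;
}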
 
> 
> > Apart from algorithm, one thing BFQ did different in the past was provide
> > fairness based on amount of IO done *and not in terms of time slice*. I
> > don't know if that's still the case. If it is the case I am wondering
> > why CFQ was converted from amount of IO based fairness to time slice
> > based fairness.
> 
> Rotating rusts being rotating rusts, either measure isn't perfect.
> Bandwidth tracking gets silly with seeky workloads while pure time
> slice tracking unfairly treats users of inner and outer tracks.  BFQ
> uses bw tracking for sequential workload while basically switches to
> time slice tracking for seeky workloads.  These were pretty clearly
> discussed in the paper.

Ok, so you prefer to have another IO scheduler instead of improving
CFQ?

> 
> > I am all for a better algorithm. I just want to somebody to explain it
> > to me that what magic they have done to achieve better throughput as
> > well as better latency.
> 
> It feels more like you're actively refusing to understand it even when
> the algorithm and heuristics involved are as clearly presented as it
> could be.  This is one of the best documented piece of work in this
> area of the kernel.  Just read the papers.

Tejun, I spent a significant amount of time on BFQ a few years ago, and
that's the reason I have not read it again yet. My understanding was that
there was nothing which could not be done in CFQ (at least the things
which mattered).

Looks like you prefer introducing a new scheduler instead of improving
CFQ. My preference is to improve CFQ. Borrow good ideas from BFQ and
implement them in CFQ.

> 
> > Do you think that CFQ's problems come from round robin algorithms. No,
> > they don't. Most of the time we don't even honor the allocated time
> 
> Oh yes, why do you think we bother with preemption at all?  bfq
> reportedly achieves sufficient fairness and responsiveness without the
> need for preemption.  This is a lot clearer model.

We primarily allow preemption of the buffered write queue. If you allocate
a share for the buffered write queue, that would be good for not starving
buffered writes, but read latencies would suffer.

> 
> > slice (except sequential IO) and preempt the allocated slice. That's
> > why I think implementing a more fair algorithm is probably not the
> > solution.
> 
> If you actually study what has been presented, you'd immediately
> recognize that a lot of what bfq does is clarification and
> improvements of the buried heuristics.
> 
> Look at it this way: RR + preemption becomes the basic timestamp based
> scheduling and other heuristics are extracted and clarified.  That's
> what it is.
> 
> > CFQ's throughput problems come from idling and driving lower queue depth.
> > And CFQ's throughput problems arise due to suppressing buffered write
> > very actively in presence of sync IO. 
> > 
> > I want to know what has BFQ done fundamentally different to take care of
> > above issues. 
> 
> Yeah, read the paper.  They also analyzed why some things behave
> certain ways.  A lot of them are pretty spot-on.
> 
> Remember CFQ had started with a simple algorithm. Just start allocating
> > time slice in proportion to ioprio. That simple scheme did not work in
> > real world and then it became more complicated.
> 
> Yes, this is another refinement step.  WTF is it so hard to wrap
> your head around?  It doesn't throw away much.  It builds upon cfq
> and clarifies and improves what it has been doing.

So why not improve CFQ instead 

Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Tejun Heo
On Fri, May 30, 2014 at 01:31:46PM -0400, Vivek Goyal wrote:
> > What allows BFQ to provide the above features is its accurate
> > scheduling engine (patches 1-4), combined with a set of simple
> > heuristics and improvements (patches 5-14). 
> 
> This is very hard to understand. This puzzle needs to be broken down
> into small pieces and explained in simple design terms so that even
> 5 years down the line I can explain why BFQ was better.

Vivek, stop wasting people's time and just go study the paper.  Let's
*please* talk after that.  What do you expect him to do?  Copy & paste
the whole paper here and walk you through each step of it?  He
presented as much information as he could and then provided sufficient
summary in the head message and as comments in the implementation.
It's now *your* turn to study what has been presented.  Sure, if you
have further questions or think that the implementation and patches
need more documentation, that's completely fine but I find it
ridiculous that you're demanding to be spoon-fed information which is
readily available in an easily consumable form.  Seriously, this is
the best documentation we've had in this area *EVER*.

-- 
tejun


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Vivek Goyal
On Tue, May 27, 2014 at 02:42:24PM +0200, paolo wrote:
> From: Paolo Valente 
> 
> [Re-posting, previous attempt seems to have partially failed]
> 
> Hi,
> this patchset introduces the last version of BFQ, a proportional-share
> storage-I/O scheduler. BFQ also supports hierarchical scheduling with
> a cgroups interface. The first version of BFQ was submitted a few
> years ago [1]. It is denoted as v0 in the patches, to distinguish it
> from the version I am submitting now, v7r4. In particular, the first
> four patches introduce BFQ-v0, whereas the remaining patches turn it
> progressively into BFQ-v7r4. Here are some nice features of this last
> version.
> 
> Low latency for interactive applications
> 
> According to our results, regardless of the actual background
> workload, for interactive tasks the storage device is virtually as
> responsive as if it was idle. For example, even if one or more of the
> following background workloads are being executed:
> - one or more large files are being read or written,
> - a tree of source files is being compiled,
> - one or more virtual machines are performing I/O,
> - a software update is in progress,
> - indexing daemons are scanning filesystems and updating their
>   databases,
> starting an application or loading a file from within an application
> takes about the same time as if the storage device was idle. As a
> comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
> applications experience high latencies, or even become unresponsive
> until the background workload terminates (also on SSDs).

So how do you achieve it? IOW, how do you figure out that something is
interactive, and just give it priority and almost stop the others? What
happens to the notion of fairness in that case?

And if there is a way to figure out interactive applications, then even
in CFQ one should be able to easily put these queues at the beginning of
the service tree so that they get served first and achieve a better
experience?

And in general that would be a desirable thing to do, so why not modify
CFQ?

> 
> Low latency for soft real-time applications
> 
> Also soft real-time applications, such as audio and video
> players/streamers, enjoy low latency and drop rate, regardless of the
> storage-device background workload. As a consequence, these
> applications do not suffer from almost any glitch due to the
> background workload.

Again, how do you achieve it?

> 
> High throughput
> 
> On hard disks, BFQ achieves up to 30% higher throughput than CFQ,

For what workload? 

> and
> up to 150% higher throughput than DEADLINE and NOOP,

Is this true with a buffered write workload also? I think these percentages
will be very dependent on the IO pattern.

Again, I would like to know how you achieved such high throughput compared
to CFQ, DEADLINE and NOOP. One of the things that drops throughput on a
rotational hard disk is a non-sequential IO pattern, and CFQ already does
that. So the only way to achieve higher throughput would be to accumulate
more sequential IO in the queue and then let that queue run for longer
while stopping IO from other queues. And that will mean higher latencies
for IO in the other queues.

So, on a rotational hard disk, how do you achieve higher throughput as
well as reduced latency?

> with half of the
> parallel workloads considered in our tests. With the rest of the
> workloads, and with all the workloads on flash-based devices, BFQ
> achieves instead about the same throughput as the other schedulers.
> 
> Strong fairness guarantees (already provided by BFQ-v0)
> 
> As for long-term guarantees, BFQ distributes the device throughput
> (and not just the device time) as desired to I/O-bound applications,

I think this is one key difference as compared to CFQ: fairness based on
bandwidth and not fairness based on time slices.

So if one process is doing large sequential IO and another is doing small
random IO, most of the disk time will be given to the process doing small
random IOs. Is that more fair? (For example, if the sequential reader
streams at 100 MB/s while the random reader manages 1 MB/s, equalizing
their bandwidth means the random reader gets roughly 99% of the disk time.)
I think that's one reason that CFQ was switched to a time-based scheme:
provide time slices, and after that it is up to the application how much it
can get out of the disk in that slice, based on its IO pattern. At least in
terms of fairness, that sounds more fair to me.

I think this is one point which needs to be discussed: is it a better idea
to switch to bandwidth-based fairness? Should we change CFQ to achieve
that, or do we need to introduce a new IO scheduler for that?

> with any workload and regardless of the device parameters.
> 
> What allows BFQ to provide the above features is its accurate
> scheduling engine (patches 1-4), combined with a set of simple
> heuristics and improvements (patches 5-14). 

This is very hard to understand. This puzzle needs to be broken down
into small pieces and explained in simple design terms so that even
5 years down the line I can explain why BFQ was better.

Thanks
Vivek

Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Tejun Heo
Hello,

On Fri, May 30, 2014 at 01:09:58PM -0400, Vivek Goyal wrote:
> Are you referring to BFQ paper. I had read one in the past and it was
> all about how to achieve more accurate fairness. At this point of time
> I don't think that smarter algorithm is the problem. Until and unless
> somebody can show me that how algorithm is a problem.

Because cfq round-robins with heuristically guessed slices, while bfq
calculates where each one is supposed to end and then schedules the slices
accordingly.  With a lot of slices to serve, cfq loses track of which one
should come first to adhere to the desired latency guarantees, while bfq
doesn't, which in turn gives bfq more latitude in using longer slices,
allowing for better throughput.  It's all in the paper and backed by
numbers.  What more do you want?

> Apart from algorithm, one thing BFQ did different in the past was provide
> fairness based on amount of IO done *and not in terms of time slice*. I
> don't know if that's still the case. If it is the case I am wondering
> why CFQ was converted from amount of IO based fairness to time slice
> based fairness.

Rotating rust being rotating rust, neither measure is perfect.
Bandwidth tracking gets silly with seeky workloads, while pure time-slice
tracking unfairly treats users of inner and outer tracks.  BFQ uses bw
tracking for sequential workloads while basically switching to time-slice
tracking for seeky workloads.  These were pretty clearly discussed in the
paper.
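
To make that concrete, here is a minimal sketch of such a charging policy
(illustrative only, not bfq's actual code; the names are made up):
sequential queues are charged for the service they actually received, while
a queue that times out before consuming its budget (which is what a seeky
queue tends to do) is charged its whole budget, which approximates
time-based fairness for that case.

enum expiration_reason {
        BUDGET_EXHAUSTED,   /* queue used up its budget */
        SLICE_TIMED_OUT,    /* time limit hit first (typically seeky) */
        NO_MORE_REQUESTS,   /* queue emptied */
};

static unsigned int service_to_charge(unsigned int served_sectors,
                                      unsigned int budget,
                                      enum expiration_reason reason)
{
        /*
         * A seeky queue that hit the time limit is charged its full
         * budget, so it cannot hog the disk head while transferring
         * little data.
         */
        if (reason == SLICE_TIMED_OUT)
                return budget;

        /* Otherwise charge exactly the amount of service received. */
        return served_sectors;
}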

> I am all for a better algorithm. I just want to somebody to explain it
> to me that what magic they have done to achieve better throughput as
> well as better latency.

It feels more like you're actively refusing to understand it even when
the algorithm and heuristics involved are as clearly presented as they
could be.  This is one of the best documented pieces of work in this
area of the kernel.  Just read the papers.

> Do you think that CFQ's problems come from round robin algorithms. No,
> they don't. Most of the time we don't even honor the allocated time

Oh yes, why do you think we bother with preemption at all?  bfq
reportedly achieves sufficient fairness and responsiveness without the
need for preemption.  This is a lot clearer model.

> slice (except sequential IO) and preempt the allocated slice. That's
> why I think implementing a more fair algorithm is probably not the
> solution.

If you actually study what has been presented, you'd immediately
recognize that a lot of what bfq does is clarification and
improvements of the buried heuristics.

Look at it this way: RR + preemption becomes the basic timestamp based
scheduling and other heuristics are extracted and clarified.  That's
what it is.

> CFQ's throughput problems come from idling and driving lower queue depth.
> And CFQ's throughput problems arise due to suppressing buffered write
> very actively in presence of sync IO. 
> 
> I want to know what has BFQ done fundamentally different to take care of
> above issues. 

Yeah, read the paper.  They also analyzed why some things behave
certain ways.  A lot of them are pretty spot-on.

> Remember CFQ had started with a simple algorith. Just start allocating
> time slice in proportion to ioprio. That simple scheme did not work in
> real world and then it became more complicated.

Yes, this is another refinement step.  WTF is it so hard to wrap
your head around?  It doesn't throw away much.  It builds upon cfq
and clarifies and improves what it has been doing.

> What's the guarantee that same thing will not happen to BFQ. There is
> no point in getting fairness if overall performance sucks. That's what
> happens even with block IO cgroups. Create more than 4 cgroups, put some
> workload in that and with CFQ your performance will be so bad that you
> will drop the idea of getting fairness.

Dude, just go read the paper.

> Can you please provide me the link to the paper. I had read one few
> years back. I am not sure if it is still the same paper of a new one.
> And after experimenting with their implementation and playing with 
> CFQ, my impression was that fairness algorithm is not the core problem.

The references are in the frigging head message and in the head
comment of the implementation.  Are you even reading stuff?  You're
just wasting other people's time.

-- 
tejun


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Vivek Goyal
On Fri, May 30, 2014 at 12:16:50PM -0400, Tejun Heo wrote:
> Hello, Vivek.
> 
> On Fri, May 30, 2014 at 11:32:28AM -0400, Vivek Goyal wrote:
> > I don't think most of the people care about strong fairness guarantee.
> > As an algorithm round robin is not bad for ensuring fairness. CFQ had
> > started with that but then it stopped focussing on fairness and rather
> > focussed on trying to address various real issues.
> 
> Oh, I widely disagree.  Have you looked at the test results in the
> paper?  Unless the results are completely bogus, which it probably
> isn't, this is awesome.  This is a *lot* more clearly characterized
> algorithm which shows significantly better behavior especially in use
> cases where scheduling matters.  I mean, it even reaches higher
> throughput while achieving lower latency.  Why wouldn't we want it?
> 

Of course everybody wants "higher throughput while achieving lower latency".
Most of the time these are contradictory goals.

Instead of just looking at numbers, I am keen on knowing what's the
fundamental design change which allows this. What is CFQ doing wrong
which BFQ gets right.

Are you referring to the BFQ paper? I had read one in the past and it was
all about how to achieve more accurate fairness. At this point in time
I don't think that a smarter algorithm is the problem, until and unless
somebody can show me how the algorithm is a problem.

Apart from the algorithm, one thing BFQ did differently in the past was to
provide fairness based on the amount of IO done *and not in terms of time
slices*. I don't know if that's still the case. If it is, I am wondering
why CFQ was converted from IO-amount-based fairness to time-slice-based
fairness.

I am all for a better algorithm. I just want somebody to explain to me
what magic they have done to achieve better throughput as well as better
latency.

Do you think that CFQ's problems come from the round robin algorithm? No,
they don't. Most of the time we don't even honor the allocated time
slice (except for sequential IO) and preempt the allocated slice. That's
why I think implementing a fairer algorithm is probably not the
solution.

CFQ's throughput problems come from idling and driving a lower queue depth,
and from suppressing buffered writes very actively in the presence of
sync IO.

I want to know what has BFQ done fundamentally different to take care of
above issues. 

> > And CFQ's problems don't arise from not having a good fairness algorithm.
> > So I don't think this should be the reason for taking a new scheduler.
> 
> In comparison, cfq's fairness behavior is a lot worse but even
> ignoring thing, one of the major problems of cfq is that the behavior
> is hardly characterized.  It's really difficult to anticipate what
> it'd do and understand why, which makes it very difficult to maintain
> and improve.  Even just for the latter point, it'd be worthwhile to
> adopt bfq.

Remember, CFQ had started with a simple algorithm: just allocate time
slices in proportion to ioprio. That simple scheme did not work in the
real world, and then it became more complicated.

What's the guarantee that the same thing will not happen to BFQ? There is
no point in getting fairness if overall performance sucks. That's what
happens even with block IO cgroups: create more than 4 cgroups, put some
workload in them, and with CFQ your performance will be so bad that you
will drop the idea of getting fairness.

Why is performance bad? Again, due to idling. In an attempt to provide
isolation between the IO of two cgroups, we idle. If you want to dilute
the isolation, sure, you will get better throughput, and CFQ can do that
too.

> 
> > I think instead of numbers, what would help is a short description
> > that what's the fundamental problem with CFQ which BFQ does not
> > have and how did you solve that issue.
> 
> The papers are pretty clear and not too long.  Have you read them?

Can you please provide me the link to the paper? I had read one a few
years back. I am not sure if it is still the same paper or a new one.
And after experimenting with their implementation and playing with
CFQ, my impression was that the fairness algorithm is not the core problem.

I will be more than happy to be proven wrong. I just need somebody to
not throw just numbers at me but rather explain to me why BFQ performs
better and why CFQ can't do the same thing.

Thanks
Vivek


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Tejun Heo
Hello, Vivek.

On Fri, May 30, 2014 at 11:32:28AM -0400, Vivek Goyal wrote:
> I don't think most of the people care about strong fairness guarantee.
> As an algorithm round robin is not bad for ensuring fairness. CFQ had
> started with that but then it stopped focussing on fairness and rather
> focussed on trying to address various real issues.

Oh, I wildly disagree.  Have you looked at the test results in the
paper?  Unless the results are completely bogus, which they probably
aren't, this is awesome.  This is a *lot* more clearly characterized
algorithm which shows significantly better behavior, especially in use
cases where scheduling matters.  I mean, it even reaches higher
throughput while achieving lower latency.  Why wouldn't we want it?

> And CFQ's problems don't arise from not having a good fairness algorithm.
> So I don't think this should be the reason for taking a new scheduler.

In comparison, cfq's fairness behavior is a lot worse, but even
ignoring that, one of the major problems of cfq is that its behavior
is hardly characterized.  It's really difficult to anticipate what
it'd do and understand why, which makes it very difficult to maintain
and improve.  Even just for the latter point, it'd be worthwhile to
adopt bfq.

> I think instead of numbers, what would help is a short description
> that what's the fundamental problem with CFQ which BFQ does not
> have and how did you solve that issue.

The papers are pretty clear and not too long.  Have you read them?

> One issue you seemed to mention is that write is a problem. CFQ 
> suppresses buffered writes very actively in an attempt to improve
> read latencies. How did you make it even better with BFQ.
> 
> Last time I had looked at BFQ, it looked pretty similar to CFQ except
> that core round robin algorithm had been replaced by a more fair
> algo and more things done like less preemption etc.
> 
> But personally I don't think using a more accurate fairness algorithm
> is the problem to begin with most of the time.
> 
> So I fail to understand that why do we need BFQ.

I violently disagree.  This is awesome.

Thanks.

-- 
tejun


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Vivek Goyal
On Tue, May 27, 2014 at 02:42:24PM +0200, paolo wrote:

[..]
> Strong fairness guarantees (already provided by BFQ-v0)
> 
> As for long-term guarantees, BFQ distributes the device throughput
> (and not just the device time) as desired to I/O-bound applications,
> with any workload and regardless of the device parameters.

I don't think most people care about strong fairness guarantees.
As an algorithm, round robin is not bad for ensuring fairness. CFQ had
started with that, but then it stopped focussing on fairness and instead
focussed on trying to address various real issues.

And CFQ's problems don't arise from not having a good fairness algorithm.
So I don't think this should be the reason for taking a new scheduler.

I think that, instead of numbers, what would help is a short description
of the fundamental problem that CFQ has and BFQ does not, and of how you
solved that issue.

One issue you seemed to mention is that writes are a problem. CFQ
suppresses buffered writes very actively in an attempt to improve
read latencies. How did you make this even better with BFQ?

The last time I looked at BFQ, it looked pretty similar to CFQ, except
that the core round robin algorithm had been replaced by a fairer
algorithm, plus other changes like less preemption, etc.

But personally I don't think that the accuracy of the fairness algorithm
is the problem to begin with, most of the time.

So I fail to understand why we need BFQ.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Vivek Goyal
On Tue, May 27, 2014 at 02:42:24PM +0200, paolo wrote:

[..]
 Strong fairness guarantees (already provided by BFQ-v0)
 
 As for long-term guarantees, BFQ distributes the device throughput
 (and not just the device time) as desired to I/O-bound applications,
 with any workload and regardless of the device parameters.

I don't think most of the people care about strong fairness guarantee.
As an algorithm round robin is not bad for ensuring fairness. CFQ had
started with that but then it stopped focussing on fairness and rather
focussed on trying to address various real issues.

And CFQ's problems don't arise from not having a good fairness algorithm.
So I don't think this should be the reason for taking a new scheduler.

I think instead of numbers, what would help is a short description
that what's the fundamental problem with CFQ which BFQ does not
have and how did you solve that issue.

One issue you seemed to mention is that write is a problem. CFQ 
suppresses buffered writes very actively in an attempt to improve
read latencies. How did you make it even better with BFQ.

Last time I had looked at BFQ, it looked pretty similar to CFQ except
that core round robin algorithm had been replaced by a more fair
algo and more things done like less preemption etc.

But personally I don't think using a more accurate fairness algorithm
is the problem to begin with most of the time.

So I fail to understand that why do we need BFQ.

Thanks
Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Tejun Heo
Hello, Vivek.

On Fri, May 30, 2014 at 11:32:28AM -0400, Vivek Goyal wrote:
 I don't think most of the people care about strong fairness guarantee.
 As an algorithm round robin is not bad for ensuring fairness. CFQ had
 started with that but then it stopped focussing on fairness and rather
 focussed on trying to address various real issues.

Oh, I widely disagree.  Have you looked at the test results in the
paper?  Unless the results are completely bogus, which it probably
isn't, this is awesome.  This is a *lot* more clearly characterized
algorithm which shows significantly better behavior especially in use
cases where scheduling matters.  I mean, it even reaches higher
throughput while achieving lower latency.  Why wouldn't we want it?

 And CFQ's problems don't arise from not having a good fairness algorithm.
 So I don't think this should be the reason for taking a new scheduler.

In comparison, cfq's fairness behavior is a lot worse but even
ignoring thing, one of the major problems of cfq is that the behavior
is hardly characterized.  It's really difficult to anticipate what
it'd do and understand why, which makes it very difficult to maintain
and improve.  Even just for the latter point, it'd be worthwhile to
adopt bfq.

 I think instead of numbers, what would help is a short description
 that what's the fundamental problem with CFQ which BFQ does not
 have and how did you solve that issue.

The papers are pretty clear and not too long.  Have you read them?

 One issue you seemed to mention is that write is a problem. CFQ 
 suppresses buffered writes very actively in an attempt to improve
 read latencies. How did you make it even better with BFQ.
 
 Last time I had looked at BFQ, it looked pretty similar to CFQ except
 that core round robin algorithm had been replaced by a more fair
 algo and more things done like less preemption etc.
 
 But personally I don't think using a more accurate fairness algorithm
 is the problem to begin with most of the time.
 
 So I fail to understand that why do we need BFQ.

I violently disagree.  This is awesome.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Vivek Goyal
On Fri, May 30, 2014 at 12:16:50PM -0400, Tejun Heo wrote:
 Hello, Vivek.
 
 On Fri, May 30, 2014 at 11:32:28AM -0400, Vivek Goyal wrote:
  I don't think most of the people care about strong fairness guarantee.
  As an algorithm round robin is not bad for ensuring fairness. CFQ had
  started with that but then it stopped focussing on fairness and rather
  focussed on trying to address various real issues.
 
 Oh, I widely disagree.  Have you looked at the test results in the
 paper?  Unless the results are completely bogus, which it probably
 isn't, this is awesome.  This is a *lot* more clearly characterized
 algorithm which shows significantly better behavior especially in use
 cases where scheduling matters.  I mean, it even reaches higher
 throughput while achieving lower latency.  Why wouldn't we want it?
 

Of everybody wants higher throughput while achieving lower latency. Most
of the time these are contradictory goals.

Instead of just looking at numbers, I am keen on knowing what's the
fundamental design change which allows this. What is CFQ doing wrong
which BFQ gets right.

Are you referring to BFQ paper. I had read one in the past and it was
all about how to achieve more accurate fairness. At this point of time
I don't think that smarter algorithm is the problem. Until and unless
somebody can show me that how algorithm is a problem.

Apart from algorithm, one thing BFQ did different in the past was provide
fairness based on amount of IO done *and not in terms of time slice*. I
don't know if that's still the case. If it is the case I am wondering
why CFQ was converted from amount of IO based fairness to time slice
based fairness.

I am all for a better algorithm. I just want to somebody to explain it
to me that what magic they have done to achieve better throughput as
well as better latency.

Do you think that CFQ's problems come from round robin algorithms. No,
they don't. Most of the time we don't even honor the allocated time
slice (except sequential IO) and preempt the allocated slice. That's
why I think implementing a more fair algorithm is probably not the
solution.

CFQ's throughput problems come from idling and driving lower queue depth.
And CFQ's throughput problems arise due to suppressing buffered write
very actively in presence of sync IO. 

I want to know what has BFQ done fundamentally different to take care of
above issues. 

  And CFQ's problems don't arise from not having a good fairness algorithm.
  So I don't think this should be the reason for taking a new scheduler.
 
 In comparison, cfq's fairness behavior is a lot worse but even
 ignoring thing, one of the major problems of cfq is that the behavior
 is hardly characterized.  It's really difficult to anticipate what
 it'd do and understand why, which makes it very difficult to maintain
 and improve.  Even just for the latter point, it'd be worthwhile to
 adopt bfq.

Remember CFQ had started with a simple algorith. Just start allocating
time slice in proportion to ioprio. That simple scheme did not work in
real world and then it became more complicated.

What's the guarantee that same thing will not happen to BFQ. There is
no point in getting fairness if overall performance sucks. That's what
happens even with block IO cgroups. Create more than 4 cgroups, put some
workload in that and with CFQ your performance will be so bad that you
will drop the idea of getting fairness.

Why performance is bad, again due to idling. In an attmept to provide
isolation between IO of two cgroups, we idle. You want to dilute the
the isolation, sure you will get better throughput and CFQ can do that
too.

 
  I think instead of numbers, what would help is a short description
  that what's the fundamental problem with CFQ which BFQ does not
  have and how did you solve that issue.
 
 The papers are pretty clear and not too long.  Have you read them?

Can you please provide me the link to the paper. I had read one few
years back. I am not sure if it is still the same paper of a new one.
And after experimenting with their implementation and playing with 
CFQ, my impression was that fairness algorithm is not the core problem.

I will be more than happy to be proven wrong. Just I need somebody to
not throw just numbers at me but rather explain to me that why BFQ
performs better and why CFQ can't do the same thing.

Thanks
Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Tejun Heo
Hello,

On Fri, May 30, 2014 at 01:09:58PM -0400, Vivek Goyal wrote:
 Are you referring to BFQ paper. I had read one in the past and it was
 all about how to achieve more accurate fairness. At this point of time
 I don't think that smarter algorithm is the problem. Until and unless
 somebody can show me that how algorithm is a problem.

Because cfq rr's with heuristically guessed slices and bfq calculates
where each one is supposed to end and then schedules the slices
accordingly.  With a lot of slices to serve, cfq loses track of which
one should come first to adhere to desired latency guarantees while
bfq doesn't, which in turn allows more latitude in using longer slices
for bfq allowing for better throughput.  It's all in the paper and
backed by numbers.  What more do you want?

 Apart from algorithm, one thing BFQ did different in the past was provide
 fairness based on amount of IO done *and not in terms of time slice*. I
 don't know if that's still the case. If it is the case I am wondering
 why CFQ was converted from amount of IO based fairness to time slice
 based fairness.

Rotating rusts being rotating rusts, either measure isn't perfect.
Bandwidth tracking gets silly with seeky workloads while pure time
slice tracking unfairly treats users of inner and outer tracks.  BFQ
uses bw tracking for sequential workload while basically switches to
time slice tracking for seeky workloads.  These were pretty clearly
discussed in the paper.

 I am all for a better algorithm. I just want to somebody to explain it
 to me that what magic they have done to achieve better throughput as
 well as better latency.

It feels more like you're actively refusing to understand it even when
the algorithm and heristics involved are as clearly presented as it
could be.  This is one of the best documented piece of work in this
area of the kernel.  Just read the papers.

 Do you think that CFQ's problems come from round robin algorithms. No,
 they don't. Most of the time we don't even honor the allocated time

Oh yes, why do you think we bother with preemption at all?  bfq
reportedly achieves sufficient fairness and responsiveness without the
need for preemption.  This is a lot clearer model.

 slice (except sequential IO) and preempt the allocated slice. That's
 why I think implementing a more fair algorithm is probably not the
 solution.

If you actually study what has been presented, you'd immediately
recognize that a lot of what bfq does is clarification and
improvements of the buried heuristics.

Look at it this way: RR + preemption becomes the basic timestamp based
scheduling and other heuristics are extracted and clarified.  That's
what it is.

 CFQ's throughput problems come from idling and driving lower queue depth.
 And CFQ's throughput problems arise due to suppressing buffered write
 very actively in presence of sync IO. 
 
 I want to know what has BFQ done fundamentally different to take care of
 above issues. 

Yeah, read the paper.  They also analyzed why some things behave
certain ways.  A lot of them are pretty spot-on.

 Remember CFQ had started with a simple algorith. Just start allocating
 time slice in proportion to ioprio. That simple scheme did not work in
 real world and then it became more complicated.

Yes, this is another refinement step.  WTF is that so hard to warp
your head around it?  It doesn't throw away much.  It builds upon cfq
and clarifies and improves what it has been doing.

 What's the guarantee that same thing will not happen to BFQ. There is
 no point in getting fairness if overall performance sucks. That's what
 happens even with block IO cgroups. Create more than 4 cgroups, put some
 workload in that and with CFQ your performance will be so bad that you
 will drop the idea of getting fairness.

Dude, just go read the paper.

 Can you please provide me the link to the paper. I had read one few
 years back. I am not sure if it is still the same paper of a new one.
 And after experimenting with their implementation and playing with 
 CFQ, my impression was that fairness algorithm is not the core problem.

The references are in the frigging head message and in the head
comment of the implementation.  Are you even reading stuff?  You're
just wasting other people's time.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Vivek Goyal
On Tue, May 27, 2014 at 02:42:24PM +0200, paolo wrote:
 From: Paolo Valente paolo.vale...@unimore.it
 
 [Re-posting, previous attempt seems to have partially failed]
 
 Hi,
 this patchset introduces the last version of BFQ, a proportional-share
 storage-I/O scheduler. BFQ also supports hierarchical scheduling with
 a cgroups interface. The first version of BFQ was submitted a few
 years ago [1]. It is denoted as v0 in the patches, to distinguish it
 from the version I am submitting now, v7r4. In particular, the first
 four patches introduce BFQ-v0, whereas the remaining patches turn it
 progressively into BFQ-v7r4. Here are some nice features of this last
 version.
 
 Low latency for interactive applications
 
 According to our results, regardless of the actual background
 workload, for interactive tasks the storage device is virtually as
 responsive as if it was idle. For example, even if one or more of the
 following background workloads are being executed:
 - one or more large files are being read or written,
 - a tree of source files is being compiled,
 - one or more virtual machines are performing I/O,
 - a software update is in progress,
 - indexing daemons are scanning filesystems and updating their
   databases,
 starting an application or loading a file from within an application
 takes about the same time as if the storage device was idle. As a
 comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
 applications experience high latencies, or even become unresponsive
 until the background workload terminates (also on SSDs).

So how do you achieve it? IOW, how do you figure out something is
interactive and just give it priority and almost stop other. What happens
to notion of fairness in that case.

And if there is a way to figure out interactive applications, then even
in CFQ, one should be able to easily put these queues in the beginning of
service tree so that they get served first and achieve better experience?

And in general that would be a desirable thing to do. So why not modify
CFQ.

 
 Low latency for soft real-time applications
 
 Also soft real-time applications, such as audio and video
 players/streamers, enjoy low latency and drop rate, regardless of the
 storage-device background workload. As a consequence, these
 applications do not suffer from almost any glitch due to the
 background workload.

Again, how do you achieve it?

 
 High throughput
 
 On hard disks, BFQ achieves up to 30% higher throughput than CFQ,

For what workload? 

 and
 up to 150% higher throughput than DEADLINE and NOOP,

Is this true with buffered write workload also? I think these % will
be very dependent on IO pattern.

Again I would like to know how did you achieve such a high throughput
when compared to CFQ, Deadline, NOOP. One of the things which drops
throughput on rotational hard disk is non-sequential IO pattern and
CFQ already does that. So only way to achieve higher throughput will
be to accumulate more sequetial IO in the queue and then let that queue
run for longer and stop other IO from other queues. And that will mean
higher latencies for IO in other queues.

So on this rotation hard disk, how do you achieve higher throughput as
well as reduced latency.

 with half of the
 parallel workloads considered in our tests. With the rest of the
 workloads, and with all the workloads on flash-based devices, BFQ
 achieves instead about the same throughput as the other schedulers.
 
 Strong fairness guarantees (already provided by BFQ-v0)
 
 As for long-term guarantees, BFQ distributes the device throughput
 (and not just the device time) as desired to I/O-bound applications,

I think this is one key differece as comapred to CFQ. Fairness based
on bandwidth and not fairness based on time slice.

So if a process is doing large sequential IO and other is doing small
random IO, most of the disk time will be given to process doing small
random IOs. Is that more fair. I think that's one reason that CFQ was
switched to time based scheme. Provide time slices and after that
it is up to the application how much they can get out of disk in that
slice based on their IO pattern. At least in terms of fairness, that
sounds more fair to me.

I think this is one point which needs to be discussed that is it
a better idea to switch to bandwidth based fairness. Should we change
CFQ to achieve that or we need to introduce new IO scheduler for that.

 with any workload and regardless of the device parameters.
 
 What allows BFQ to provide the above features is its accurate
 scheduling engine (patches 1-4), combined with a set of simple
 heuristics and improvements (patches 5-14). 

This is very hard to understand. This puzzle need to be broken down
into small pieces and explained in simple design terms so that even
5 years down the line I can explain why BFQ was better.

Thanks
Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More 

Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Tejun Heo
On Fri, May 30, 2014 at 01:31:46PM -0400, Vivek Goyal wrote:
  What allows BFQ to provide the above features is its accurate
  scheduling engine (patches 1-4), combined with a set of simple
  heuristics and improvements (patches 5-14). 
 
 This is very hard to understand. This puzzle need to be broken down
 into small pieces and explained in simple design terms so that even
 5 years down the line I can explain why BFQ was better.

Vivek, stop wasting people's time and just go study the paper.  Let's
*please* talk after that.  What do you expect him to do?  Copy  paste
the whole paper here and walk you through each step of it?  He
presented as much information as he could and then provided sufficient
summary in the head message and as comments in the implementation.
It's now *your* turn to study what has been presented.  Sure, if you
have further questions or think that the implementation and patches
need more documentation, that's completely fine but I find it
ridiculous that you're demanding to be spoon-fed information which is
readily available in an easily consumable form.  Seriously, this is
the best documentation we've had in this area *EVER*.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Vivek Goyal
On Fri, May 30, 2014 at 01:26:09PM -0400, Tejun Heo wrote:
 Hello,
 
 On Fri, May 30, 2014 at 01:09:58PM -0400, Vivek Goyal wrote:
  Are you referring to BFQ paper. I had read one in the past and it was
  all about how to achieve more accurate fairness. At this point of time
  I don't think that smarter algorithm is the problem. Until and unless
  somebody can show me that how algorithm is a problem.
 
 Because cfq rr's with heuristically guessed slices and bfq calculates
 where each one is supposed to end and then schedules the slices
 accordingly.  With a lot of slices to serve, cfq loses track of which
 one should come first to adhere to desired latency guarantees while
 bfq doesn't, which in turn allows more latitude in using longer slices
 for bfq allowing for better throughput.  It's all in the paper and
 backed by numbers.  What more do you want?

Now CFQ also dynamically adjusts the slice length based on the how
many queues are ready to do IO. One problem with fixed slice lenth
round robin was that if there are lot of queues doing IO, then after
serving one queue, same queue will get time slice after a long time.

Corrado Zoccolo did work in this area in an attempt to improve latency.
Now slice length is calculated dynamically in an attempt to achieve
better latency.

commit 5db5d64277bf390056b1a87d0bb288c8b8553f96
Author: Corrado Zoccolo czocc...@gmail.com
Date:   Mon Oct 26 22:44:04 2009 +0100

cfq-iosched: adapt slice to number of processes doing I/O
 
 
  Apart from algorithm, one thing BFQ did different in the past was provide
  fairness based on amount of IO done *and not in terms of time slice*. I
  don't know if that's still the case. If it is the case I am wondering
  why CFQ was converted from amount of IO based fairness to time slice
  based fairness.
 
 Rotating rusts being rotating rusts, either measure isn't perfect.
 Bandwidth tracking gets silly with seeky workloads while pure time
 slice tracking unfairly treats users of inner and outer tracks.  BFQ
 uses bw tracking for sequential workload while basically switches to
 time slice tracking for seeky workloads.  These were pretty clearly
 discussed in the paper.

Ok, so you prefer to have another IO scheduler instead of improving
CFQ?

 
  I am all for a better algorithm. I just want to somebody to explain it
  to me that what magic they have done to achieve better throughput as
  well as better latency.
 
 It feels more like you're actively refusing to understand it even when
 the algorithm and heristics involved are as clearly presented as it
 could be.  This is one of the best documented piece of work in this
 area of the kernel.  Just read the papers.

Tejun, I have spent significant amount of time on BFQ few years ago. And
that's the reason I have not read it again yet. My understanding was
that there was nothing which would not be done in CFQ (atleast things
which mattered).

Looks like you prefer introducing a new scheduler instead of improving
CFQ. My preference is to improve CFQ. Borrow good ideas from BFQ and
implement them in CFQ.

 
  Do you think that CFQ's problems come from round robin algorithms. No,
  they don't. Most of the time we don't even honor the allocated time
 
 Oh yes, why do you think we bother with preemption at all?  bfq
 reportedly achieves sufficient fairness and responsiveness without the
 need for preemption.  This is a lot clearer model.

We primarily allow preemption of buffered write queue. If you allocate
a share for buffered write queue, that would be good for not starving
buffered writes but your read latencies should go down. 

 
  slice (except sequential IO) and preempt the allocated slice. That's
  why I think implementing a more fair algorithm is probably not the
  solution.
 
 If you actually study what has been presented, you'd immediately
 recognize that a lot of what bfq does is clarification and
 improvements of the buried heuristics.
 
 Look at it this way: RR + preemption becomes the basic timestamp based
 scheduling and other heuristics are extracted and clarified.  That's
 what it is.
 
  CFQ's throughput problems come from idling and driving lower queue depth.
  And CFQ's throughput problems arise due to suppressing buffered write
  very actively in presence of sync IO. 
  
  I want to know what has BFQ done fundamentally different to take care of
  above issues. 
 
 Yeah, read the paper.  They also analyzed why some things behave
 certain ways.  A lot of them are pretty spot-on.
 
  Remember CFQ had started with a simple algorith. Just start allocating
  time slice in proportion to ioprio. That simple scheme did not work in
  real world and then it became more complicated.
 
 Yes, this is another refinement step.  WTF is that so hard to warp
 your head around it?  It doesn't throw away much.  It builds upon cfq
 and clarifies and improves what it has been doing.

So why not improve CFQ instead of carrying and maintaining another
scheduler, and then have a discussion about what the default scheduler
should be?

Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-30 Thread Tejun Heo
Hey, Vivek.

On Fri, May 30, 2014 at 01:55:27PM -0400, Vivek Goyal wrote:
 Now CFQ also dynamically adjusts the slice length based on how many
 queues are ready to do IO. One problem with fixed-slice-length round
 robin was that, if there are a lot of queues doing IO, then after being
 served once a queue gets its next time slice only after a long time.
 
 Corrado Zoccolo did work in this area in an attempt to improve latency.
 Now the slice length is calculated dynamically in an attempt to achieve
 better latency.

Consider it a better version of that.
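
Roughly, the idea behind that dynamic slice length can be sketched as
follows (a simplified illustration with assumed default values, not the
actual CFQ code): shrink per-queue slices as more queues compete, down to
a floor, so the worst-case wait before a queue is served again stays
bounded.

  /* Simplified sketch, not the actual CFQ code: keep one full round of
   * service within an assumed target latency by shrinking slices as the
   * number of busy queues grows, subject to a minimum slice. */

  #define BASE_SLICE_MS      100U  /* assumed default slice */
  #define TARGET_LATENCY_MS  300U  /* assumed target for one full round */
  #define MIN_SLICE_MS        10U  /* floor to preserve some efficiency */

  static unsigned int scaled_slice(unsigned int nr_busy_queues)
  {
          unsigned int slice = BASE_SLICE_MS;

          if (nr_busy_queues && nr_busy_queues * slice > TARGET_LATENCY_MS)
                  slice = TARGET_LATENCY_MS / nr_busy_queues;

          return slice < MIN_SLICE_MS ? MIN_SLICE_MS : slice;
  }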

 Ok, so you prefer to have another IO scheduler instead of improving
 CFQ?

As I wrote in another reply, I think the best approach would be
morphing cfq to accommodate the scheduling algorithm and heuristics of
bfq.

 Tejun, I spent a significant amount of time on BFQ a few years ago, and
 that's the reason I have not read it again yet. My understanding was
 that there was nothing in it which could not be done in CFQ (at least
 the things which mattered).

THERE IS A NEW PAPER.

 Looks like you prefer introducing a new scheduler instead of improving
 CFQ. My preference is to improve CFQ. Borrow good ideas from BFQ and
 implement them in CFQ.

LOOKS LIKE YOU ARE NOT READING ANYTHING AND JUST WRITING IRRELEVANT
SHIT.

 So why not improve CFQ instead of carrying and maintaining another
 scheduler, and then have a discussion about what the default scheduler
 should be?

For fuck's sake, I'm out.  This is a total waste of time.  You don't
read what others are writing and you refuse to study what's right there.
What are you doing?

-- 
tejun

[PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler

2014-05-27 Thread paolo
From: Paolo Valente 

[Re-posting, previous attempt seems to have partially failed]

Hi,
this patchset introduces the latest version of BFQ, a proportional-share
storage-I/O scheduler. BFQ also supports hierarchical scheduling with
a cgroups interface. The first version of BFQ was submitted a few
years ago [1]. It is denoted as v0 in the patches, to distinguish it
from the version I am submitting now, v7r4. In particular, the first
four patches introduce BFQ-v0, whereas the remaining patches turn it
progressively into BFQ-v7r4. Here are some nice features of this latest
version.

Low latency for interactive applications

According to our results, regardless of the actual background
workload, for interactive tasks the storage device is virtually as
responsive as if it was idle. For example, even if one or more of the
following background workloads are being executed:
- one or more large files are being read or written,
- a tree of source files is being compiled,
- one or more virtual machines are performing I/O,
- a software update is in progress,
- indexing daemons are scanning filesystems and updating their
  databases,
starting an application or loading a file from within an application
takes about the same time as if the storage device was idle. As a
comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
applications experience high latencies, or even become unresponsive
until the background workload terminates (also on SSDs).

Low latency for soft real-time applications

Soft real-time applications too, such as audio and video
players/streamers, enjoy low latency and a low drop rate, regardless of
the storage-device background workload. As a consequence, these
applications suffer almost no glitches due to the background
workload.

High throughput

On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and
up to 150% higher throughput than DEADLINE and NOOP, with half of the
parallel workloads considered in our tests. With the rest of the
workloads, and with all the workloads on flash-based devices, BFQ
instead achieves about the same throughput as the other schedulers.

Strong fairness guarantees (already provided by BFQ-v0)

As for long-term guarantees, BFQ distributes the device throughput
(and not just the device time) as desired to I/O-bound applications,
with any workload and regardless of the device parameters.
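
As a trivial numerical illustration of what distributing throughput,
rather than time, means (hypothetical numbers and weights): with weights
1 and 2, the second process is entitled to twice as many sectors, even if
its requests land on slow inner tracks and therefore take more device
time to serve.

  /* Hypothetical illustration: the throughput share a process is entitled
   * to depends only on its weight, not on how long its sectors take. */

  #include <stdio.h>

  int main(void)
  {
          const unsigned int weight[] = { 1, 2 };              /* per-process weights */
          const unsigned long long total_sectors = 900000ULL;  /* sectors served overall */
          unsigned int wsum = weight[0] + weight[1];
          int i;

          for (i = 0; i < 2; i++)
                  printf("process %d entitled to %llu sectors\n",
                         i, total_sectors * weight[i] / wsum);
          return 0;
  }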

What allows BFQ to provide the above features is its accurate
scheduling engine (patches 1-4), combined with a set of simple
heuristics and improvements (patches 5-14).  A 15-minute demo of the
performance of BFQ is available here [2]. I made this demo with an
older version of BFQ (v3r4) and under Linux 3.4.0. We have further
improved BFQ since then. The performance of the latest version of BFQ is
shown, e.g., through some graphs here [3], under 3.14.0, compared
against CFQ, DEADLINE and NOOP, and on: a fast and a slow hard disk, a
RAID1, an SSD, a microSDHC Card and an eMMC. As an example, our
results on the SSD are reported also in a table at the end of this
email. Finally, details on how BFQ and its components work are
provided in the descriptions of the patches. A comprehensive description
of the main BFQ algorithm and of most of its features can instead be
found in this paper [4].

Finally, as for testing in everyday use, BFQ is the default I/O
scheduler in, e.g., Manjaro, Sabayon, OpenMandriva and Arch Linux ARM
in some NAS boxes, plus several kernel forks for PCs and
smartphones. BFQ is optionally available in, e.g., Arch,
PCLinuxOS and Gentoo. In addition, we record a few tens of downloads
per day from people using other distributions. The feedback received
so far basically confirms the expected latency drop and throughput
boost.

Paolo

Results on a Plextor PX-256M5S SSD

The first two rows of the next table report the aggregate throughput
achieved by BFQ, CFQ, DEADLINE and NOOP, while ten parallel processes
read, either sequentially or randomly, a separate portion of the
memory blocks each. These processes read directly from the device, and
no process performs writes, to avoid writing large files repeatedly
and wearing out the SSD during the many tests done. As can be seen,
all schedulers achieve about the same throughput with sequential
readers, whereas, with random readers, the throughput slightly grows
as the complexity, and hence the execution time, of the schedulers
decreases. In fact, with random readers, the number of IOPS is much
higher, and all CPUs spend all their time either executing
instructions or waiting for I/O (the total idle percentage is
0). Therefore, the processing time of I/O requests influences the
maximum throughput achievable.

The remaining rows report the cold-cache start-up time experienced by
various applications while one of the above two workloads is being
executed in parallel. In particular, "Start-up time 10 seq/rand"
stands for "Start-up time of the application at hand while 10
sequential/random readers are being executed in parallel".
