Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On 30 May 2014, at 19:09, Vivek Goyal wrote:
> [...]
> Instead of just looking at numbers, I am keen on knowing what's the
> fundamental design change which allows this. What is CFQ doing wrong
> which BFQ gets right.

I think that Tejun has already highlighted the key points and provided many details. To help answer your question about why bfq outperforms cfq, here is a summary of the most relevant underlying facts:

1) cfq is based on a round-robin scheme, in which an unlucky queue that should be served immediately may instead have to wait for all the other busy queues to be served first. In this respect, combining round robin with virtual-time-based improvements is likely to lead to convoluted solutions, and probably to sub-optimal results with respect to just using an optimal scheduler with provable deterministic guarantees (such as the internal scheduler of bfq).

2) To provide a queue with a higher fraction of the throughput, a round-robin scheduler serves the queue for a longer time slice. Increasing time slices further increases per-request latencies. The problem may be mitigated by using preemption, but the result is a combination of a basic algorithm and a 'corrective' heuristic. This is again a more convoluted, and likely less accurate, solution than directly using an optimal algorithm with provable guarantees.

3) In bfq, budgets play a role similar to time slices in cfq, i.e., once a queue has been granted access to the device, the queue is served, in the simplest case, until it finishes its budget. But, under bfq, the fraction of the throughput received by a queue is *independent* of the budget assigned to the queue. I agree that this may seem counterintuitive at first, especially if one is accustomed to thinking in round-robin terms. Differently from a round-robin algorithm, the internal scheduler of bfq controls throughput distribution by controlling the frequency at which queues are served.
The resulting degree of freedom with respect to budget sizes has the following two main advantages:

3.a) bfq can choose for each queue the budget that best fits the requirements or characteristics of the queue. For example, queues corresponding to time-sensitive applications are assigned small budgets, which guarantees that they are served quickly. On the opposite side, queues associated with I/O-bound processes performing mostly sequential I/O are assigned large budgets, which helps boost the throughput.

3.b) bfq does not need to assign large budgets to queues to provide them with large fractions of the throughput; hence bfq does not need to worsen per-request latencies to guarantee a differentiated throughput distribution.

4) The internal scheduler of bfq guarantees that a queue that needs to be served quickly may wait, unjustly, for the service of at most one other queue. More formally, bfq guarantees that each budget is completed with the smallest possible delay, for a budget-by-budget scheduler, with respect to an ideal, perfectly fair scheduler (i.e., an ideal scheduler that serves all busy queues at the same time, providing each with a fraction of the throughput proportional to its weight).

5) Temporarily assigning a large fraction of the throughput is the main mechanism through which bfq provides interactive and soft real-time applications with low latency. Thanks to fact 3.b, bfq achieves this goal without increasing per-request latencies. As for how applications are deemed interactive or soft real-time, I have tried to describe both detection heuristics in patches 06 and 07.

Finally, as for adding to cfq the heuristics I have added to bfq: I think this would probably improve application latencies with cfq too, but, because of the above facts, the result would unavoidably be worse than with bfq.
Paolo

--
Paolo Valente
Algogroup
Dipartimento di Fisica, Informatica e Matematica
Via Campi, 213/B
41125 Modena - Italy
homepage: http://algogroup.unimore.it/people/paolo/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
Hey, Vivek.

On Fri, May 30, 2014 at 01:55:27PM -0400, Vivek Goyal wrote:
> Now CFQ also dynamically adjusts the slice length based on how many
> queues are ready to do IO. One problem with fixed-slice-length round
> robin was that if there are a lot of queues doing IO, then after
> serving one queue, the same queue will get a time slice only after a
> long time.
>
> Corrado Zoccolo did work in this area in an attempt to improve latency.
> Now slice length is calculated dynamically in an attempt to achieve
> better latency.

Consider it a better version of that.

> Ok, so you prefer to have another IO scheduler instead of improving
> CFQ?

As I wrote in another reply, I think the best approach would be morphing cfq to accommodate the scheduling algorithm and heuristics of bfq.

> Tejun, I have spent a significant amount of time on BFQ a few years
> ago. And that's the reason I have not read it again yet. My
> understanding was that there was nothing which could not be done in
> CFQ (at least the things which mattered).

THERE IS A NEW PAPER.

> Looks like you prefer introducing a new scheduler instead of improving
> CFQ. My preference is to improve CFQ. Borrow good ideas from BFQ and
> implement them in CFQ.

LOOKS LIKE YOU ARE NOT READING ANYTHING AND JUST WRITING IRRELEVANT SHIT.

> So why not improve CFQ instead of carrying and maintaining another
> scheduler. And then have a discussion that what's the default scheduler.

For fuck's sake, I'm out. This is a total waste of time. You don't read what others are writing and refuse to study what's right there. What are you doing?

--
tejun
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On Fri, May 30, 2014 at 01:26:09PM -0400, Tejun Heo wrote:
> Hello,
>
> On Fri, May 30, 2014 at 01:09:58PM -0400, Vivek Goyal wrote:
> > Are you referring to the BFQ paper? I had read one in the past and
> > it was all about how to achieve more accurate fairness. At this
> > point in time I don't think that a smarter algorithm is the problem.
> > Until and unless somebody can show me how the algorithm is a problem.
>
> Because cfq rr's with heuristically guessed slices and bfq calculates
> where each one is supposed to end and then schedules the slices
> accordingly. With a lot of slices to serve, cfq loses track of which
> one should come first to adhere to desired latency guarantees while
> bfq doesn't, which in turn allows more latitude in using longer slices
> for bfq, allowing for better throughput. It's all in the paper and
> backed by numbers. What more do you want?

Now CFQ also dynamically adjusts the slice length based on how many queues are ready to do IO. One problem with fixed-slice-length round robin was that if there are a lot of queues doing IO, then after serving one queue, the same queue will get a time slice only after a long time.

Corrado Zoccolo did work in this area in an attempt to improve latency. Now slice length is calculated dynamically in an attempt to achieve better latency.

commit 5db5d64277bf390056b1a87d0bb288c8b8553f96
Author: Corrado Zoccolo
Date: Mon Oct 26 22:44:04 2009 +0100

    cfq-iosched: adapt slice to number of processes doing I/O

> > Apart from the algorithm, one thing BFQ did differently in the past
> > was provide fairness based on the amount of IO done *and not in
> > terms of time slice*. I don't know if that's still the case. If it
> > is, I am wondering why CFQ was converted from amount-of-IO-based
> > fairness to time-slice-based fairness.
>
> Rotating rusts being rotating rusts, either measure isn't perfect.
> Bandwidth tracking gets silly with seeky workloads while pure time
> slice tracking unfairly treats users of inner and outer tracks. BFQ
> uses bw tracking for sequential workloads while basically switching
> to time slice tracking for seeky workloads. These were pretty clearly
> discussed in the paper.

Ok, so you prefer to have another IO scheduler instead of improving CFQ?

> > I am all for a better algorithm. I just want somebody to explain to
> > me what magic they have done to achieve better throughput as well as
> > better latency.
>
> It feels more like you're actively refusing to understand it even when
> the algorithm and heuristics involved are as clearly presented as they
> could be. This is one of the best documented pieces of work in this
> area of the kernel. Just read the papers.

Tejun, I spent a significant amount of time on BFQ a few years ago, and that's the reason I have not read it again yet. My understanding was that there was nothing which could not be done in CFQ (at least the things which mattered).

Looks like you prefer introducing a new scheduler instead of improving CFQ. My preference is to improve CFQ: borrow good ideas from BFQ and implement them in CFQ.

> > Do you think that CFQ's problems come from round robin algorithms?
> > No, they don't. Most of the time we don't even honor the allocated time
>
> Oh yes, why do you think we bother with preemption at all? bfq
> reportedly achieves sufficient fairness and responsiveness without the
> need for preemption. This is a lot clearer model.

We primarily allow preemption of the buffered write queue. If you allocate a share to the buffered write queue, that would be good for not starving buffered writes, but your read latencies will suffer.

> > slice (except sequential IO) and preempt the allocated slice. That's
> > why I think implementing a more fair algorithm is probably not the
> > solution.
> If you actually study what has been presented, you'd immediately
> recognize that a lot of what bfq does is clarification and
> improvement of the buried heuristics.
>
> Look at it this way: RR + preemption becomes the basic timestamp-based
> scheduling and other heuristics are extracted and clarified. That's
> what it is.
>
> > CFQ's throughput problems come from idling and driving a lower queue
> > depth. And CFQ's throughput problems also arise from suppressing
> > buffered writes very actively in the presence of sync IO.
> >
> > I want to know what BFQ has done fundamentally different to take
> > care of the above issues.
>
> Yeah, read the paper. They also analyzed why some things behave
> certain ways. A lot of them are pretty spot-on.
>
> > Remember CFQ had started with a simple algorithm: just allocate time
> > slices in proportion to ioprio. That simple scheme did not work in
> > the real world and then it became more complicated.
>
> Yes, this is another refinement step. WTF is so hard to wrap your
> head around? It doesn't throw away much. It builds upon cfq and
> clarifies and improves what it has been doing.

So why not improve CFQ instead of carrying and maintaining another scheduler. And then have a discussion that what's the default scheduler.
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On Fri, May 30, 2014 at 01:31:46PM -0400, Vivek Goyal wrote:
> > What allows BFQ to provide the above features is its accurate
> > scheduling engine (patches 1-4), combined with a set of simple
> > heuristics and improvements (patches 5-14).
>
> This is very hard to understand. This puzzle needs to be broken down
> into small pieces and explained in simple design terms so that even
> 5 years down the line I can explain why BFQ was better.

Vivek, stop wasting people's time and just go study the paper. Let's *please* talk after that.

What do you expect him to do? Copy & paste the whole paper here and walk you through each step of it? He presented as much information as he could and then provided a sufficient summary in the head message and as comments in the implementation. It's now *your* turn to study what has been presented. Sure, if you have further questions or think that the implementation and patches need more documentation, that's completely fine, but I find it ridiculous that you're demanding to be spoon-fed information which is readily available in an easily consumable form.

Seriously, this is the best documentation we've had in this area *EVER*.

--
tejun
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On Tue, May 27, 2014 at 02:42:24PM +0200, paolo wrote:
> From: Paolo Valente
>
> [Re-posting, previous attempt seems to have partially failed]
>
> Hi,
> this patchset introduces the latest version of BFQ, a
> proportional-share storage-I/O scheduler. BFQ also supports
> hierarchical scheduling with a cgroups interface. The first version of
> BFQ was submitted a few years ago [1]. It is denoted as v0 in the
> patches, to distinguish it from the version I am submitting now,
> v7r4. In particular, the first four patches introduce BFQ-v0, whereas
> the remaining patches turn it progressively into BFQ-v7r4. Here are
> some nice features of this latest version.
>
> Low latency for interactive applications
>
> According to our results, regardless of the actual background
> workload, for interactive tasks the storage device is virtually as
> responsive as if it were idle. For example, even if one or more of the
> following background workloads are being executed:
> - one or more large files are being read or written,
> - a tree of source files is being compiled,
> - one or more virtual machines are performing I/O,
> - a software update is in progress,
> - indexing daemons are scanning filesystems and updating their
>   databases,
> starting an application or loading a file from within an application
> takes about the same time as if the storage device were idle. As a
> comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
> applications experience high latencies, or even become unresponsive
> until the background workload terminates (also on SSDs).

So how do you achieve it? IOW, how do you figure out that something is interactive and just give it priority while almost stopping the others? What happens to the notion of fairness in that case?

And if there is a way to figure out interactive applications, then even in CFQ one should be able to easily put these queues at the beginning of the service tree so that they get served first and achieve a better experience.
And in general that would be a desirable thing to do. So why not modify CFQ?

> Low latency for soft real-time applications
>
> Also soft real-time applications, such as audio and video
> players/streamers, enjoy low latency and drop rate, regardless of the
> storage-device background workload. As a consequence, these
> applications do not suffer from almost any glitch due to the
> background workload.

Again, how do you achieve it?

> High throughput
>
> On hard disks, BFQ achieves up to 30% higher throughput than CFQ,

For what workload?

> and
> up to 150% higher throughput than DEADLINE and NOOP,

Is this true with a buffered write workload also? I think these percentages will be very dependent on the IO pattern.

Again, I would like to know how you achieved such high throughput compared to CFQ, DEADLINE and NOOP. One of the things which drops throughput on a rotational hard disk is a non-sequential IO pattern, and CFQ already handles that. So the only way to achieve higher throughput would be to accumulate more sequential IO in a queue and then let that queue run longer while stopping IO from other queues. And that would mean higher latencies for IO in the other queues. So on a rotational hard disk, how do you achieve higher throughput as well as reduced latency?

> with half of the
> parallel workloads considered in our tests. With the rest of the
> workloads, and with all the workloads on flash-based devices, BFQ
> achieves instead about the same throughput as the other schedulers.
>
> Strong fairness guarantees (already provided by BFQ-v0)
>
> As for long-term guarantees, BFQ distributes the device throughput
> (and not just the device time) as desired to I/O-bound applications,

I think this is one key difference compared to CFQ: fairness based on bandwidth and not fairness based on time slices. So if one process is doing large sequential IO and another is doing small random IO, most of the disk time will be given to the process doing small random IOs. Is that more fair?
I think that's one reason CFQ was switched to a time-based scheme: provide time slices, and after that it is up to the application how much it can get out of the disk in that slice based on its IO pattern. At least in terms of fairness, that sounds more fair to me.

I think this is one point which needs to be discussed: is it a better idea to switch to bandwidth-based fairness? Should we change CFQ to achieve that, or do we need to introduce a new IO scheduler for that?

> with any workload and regardless of the device parameters.
>
> What allows BFQ to provide the above features is its accurate
> scheduling engine (patches 1-4), combined with a set of simple
> heuristics and improvements (patches 5-14).

This is very hard to understand. This puzzle needs to be broken down into small pieces and explained in simple design terms so that even 5 years down the line I can explain why BFQ was better.

Thanks
Vivek
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
Hello,

On Fri, May 30, 2014 at 01:09:58PM -0400, Vivek Goyal wrote:
> Are you referring to the BFQ paper? I had read one in the past and it
> was all about how to achieve more accurate fairness. At this point in
> time I don't think that a smarter algorithm is the problem. Until and
> unless somebody can show me how the algorithm is a problem.

Because cfq rr's with heuristically guessed slices and bfq calculates where each one is supposed to end and then schedules the slices accordingly. With a lot of slices to serve, cfq loses track of which one should come first to adhere to desired latency guarantees while bfq doesn't, which in turn allows more latitude in using longer slices for bfq, allowing for better throughput. It's all in the paper and backed by numbers. What more do you want?

> Apart from the algorithm, one thing BFQ did differently in the past
> was provide fairness based on the amount of IO done *and not in terms
> of time slice*. I don't know if that's still the case. If it is, I am
> wondering why CFQ was converted from amount-of-IO-based fairness to
> time-slice-based fairness.

Rotating rusts being rotating rusts, either measure isn't perfect. Bandwidth tracking gets silly with seeky workloads while pure time slice tracking unfairly treats users of inner and outer tracks. BFQ uses bw tracking for sequential workloads while basically switching to time slice tracking for seeky workloads. These were pretty clearly discussed in the paper.

> I am all for a better algorithm. I just want somebody to explain to me
> what magic they have done to achieve better throughput as well as
> better latency.

It feels more like you're actively refusing to understand it even when the algorithm and heuristics involved are as clearly presented as they could be. This is one of the best documented pieces of work in this area of the kernel. Just read the papers.

> Do you think that CFQ's problems come from round robin algorithms?
> No, they don't.
> Most of the time we don't even honor the allocated time

Oh yes, why do you think we bother with preemption at all? bfq reportedly achieves sufficient fairness and responsiveness without the need for preemption. This is a lot clearer model.

> slice (except sequential IO) and preempt the allocated slice. That's
> why I think implementing a more fair algorithm is probably not the
> solution.

If you actually study what has been presented, you'd immediately recognize that a lot of what bfq does is clarification and improvement of the buried heuristics.

Look at it this way: RR + preemption becomes the basic timestamp-based scheduling and other heuristics are extracted and clarified. That's what it is.

> CFQ's throughput problems come from idling and driving a lower queue
> depth. And CFQ's throughput problems also arise from suppressing
> buffered writes very actively in the presence of sync IO.
>
> I want to know what BFQ has done fundamentally different to take care
> of the above issues.

Yeah, read the paper. They also analyzed why some things behave certain ways. A lot of them are pretty spot-on.

> Remember CFQ had started with a simple algorithm: just allocate time
> slices in proportion to ioprio. That simple scheme did not work in
> the real world and then it became more complicated.

Yes, this is another refinement step. WTF is so hard to wrap your head around? It doesn't throw away much. It builds upon cfq and clarifies and improves what it has been doing.

> What's the guarantee that the same thing will not happen to BFQ? There
> is no point in getting fairness if overall performance sucks. That's
> what happens even with block IO cgroups. Create more than 4 cgroups,
> put some workload in them, and with CFQ your performance will be so
> bad that you will drop the idea of getting fairness.

Dude, just go read the paper.

> Can you please provide me the link to the paper? I had read one a few
> years back. I am not sure if it is still the same paper or a new one.
> And after experimenting with their implementation and playing with
> CFQ, my impression was that the fairness algorithm is not the core
> problem.

The references are in the frigging head message and in the head comment of the implementation. Are you even reading stuff? You're just wasting other people's time.

--
tejun
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On Fri, May 30, 2014 at 12:16:50PM -0400, Tejun Heo wrote:
> Hello, Vivek.
>
> On Fri, May 30, 2014 at 11:32:28AM -0400, Vivek Goyal wrote:
> > I don't think most people care about a strong fairness guarantee.
> > As an algorithm, round robin is not bad for ensuring fairness. CFQ
> > had started with that but then it stopped focusing on fairness and
> > rather focused on trying to address various real issues.
>
> Oh, I wildly disagree. Have you looked at the test results in the
> paper? Unless the results are completely bogus, which they probably
> aren't, this is awesome. This is a *lot* more clearly characterized
> algorithm which shows significantly better behavior especially in use
> cases where scheduling matters. I mean, it even reaches higher
> throughput while achieving lower latency. Why wouldn't we want it?

Everybody wants "higher throughput while achieving lower latency". Most of the time these are contradictory goals.

Instead of just looking at numbers, I am keen on knowing what's the fundamental design change which allows this. What is CFQ doing wrong which BFQ gets right?

Are you referring to the BFQ paper? I had read one in the past and it was all about how to achieve more accurate fairness. At this point in time I don't think that a smarter algorithm is the problem. Until and unless somebody can show me how the algorithm is a problem.

Apart from the algorithm, one thing BFQ did differently in the past was provide fairness based on the amount of IO done *and not in terms of time slice*. I don't know if that's still the case. If it is, I am wondering why CFQ was converted from amount-of-IO-based fairness to time-slice-based fairness.

I am all for a better algorithm. I just want somebody to explain to me what magic they have done to achieve better throughput as well as better latency.

Do you think that CFQ's problems come from round robin algorithms? No, they don't.
Most of the time we don't even honor the allocated time slice (except for sequential IO) and preempt the allocated slice. That's why I think implementing a more fair algorithm is probably not the solution.

CFQ's throughput problems come from idling and driving a lower queue depth. And CFQ's throughput problems also arise from suppressing buffered writes very actively in the presence of sync IO.

I want to know what BFQ has done fundamentally different to take care of the above issues.

> > And CFQ's problems don't arise from not having a good fairness
> > algorithm. So I don't think this should be the reason for taking a
> > new scheduler.
>
> In comparison, cfq's fairness behavior is a lot worse but even
> ignoring that, one of the major problems of cfq is that its behavior
> is hardly characterized. It's really difficult to anticipate what
> it'd do and understand why, which makes it very difficult to maintain
> and improve. Even just for the latter point, it'd be worthwhile to
> adopt bfq.

Remember CFQ had started with a simple algorithm: just allocate time slices in proportion to ioprio. That simple scheme did not work in the real world and then it became more complicated.

What's the guarantee that the same thing will not happen to BFQ? There is no point in getting fairness if overall performance sucks. That's what happens even with block IO cgroups: create more than 4 cgroups, put some workload in them, and with CFQ your performance will be so bad that you will drop the idea of getting fairness. Why is performance bad? Again, due to idling. In an attempt to provide isolation between the IO of two cgroups, we idle. If you dilute the isolation, sure, you will get better throughput, and CFQ can do that too.

> > I think instead of numbers, what would help is a short description
> > of what's the fundamental problem with CFQ which BFQ does not have
> > and how you solved that issue.
>
> The papers are pretty clear and not too long. Have you read them?
Can you please provide me the link to the paper? I had read one a few years back. I am not sure if it is still the same paper or a new one.

And after experimenting with their implementation and playing with CFQ, my impression was that the fairness algorithm is not the core problem. I will be more than happy to be proven wrong. I just need somebody to not merely throw numbers at me but rather explain to me why BFQ performs better and why CFQ can't do the same thing.

Thanks
Vivek
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
Hello, Vivek.

On Fri, May 30, 2014 at 11:32:28AM -0400, Vivek Goyal wrote:
> I don't think most people care about a strong fairness guarantee.
> As an algorithm, round robin is not bad for ensuring fairness. CFQ had
> started with that but then it stopped focusing on fairness and rather
> focused on trying to address various real issues.

Oh, I wildly disagree. Have you looked at the test results in the paper? Unless the results are completely bogus, which they probably aren't, this is awesome. This is a *lot* more clearly characterized algorithm which shows significantly better behavior especially in use cases where scheduling matters. I mean, it even reaches higher throughput while achieving lower latency. Why wouldn't we want it?

> And CFQ's problems don't arise from not having a good fairness
> algorithm. So I don't think this should be the reason for taking a new
> scheduler.

In comparison, cfq's fairness behavior is a lot worse but even ignoring that, one of the major problems of cfq is that its behavior is hardly characterized. It's really difficult to anticipate what it'd do and understand why, which makes it very difficult to maintain and improve. Even just for the latter point, it'd be worthwhile to adopt bfq.

> I think instead of numbers, what would help is a short description
> of what's the fundamental problem with CFQ which BFQ does not have
> and how you solved that issue.

The papers are pretty clear and not too long. Have you read them?

> One issue you seemed to mention is that write is a problem. CFQ
> suppresses buffered writes very actively in an attempt to improve
> read latencies. How did you make it even better with BFQ?
>
> Last time I had looked at BFQ, it looked pretty similar to CFQ except
> that the core round robin algorithm had been replaced by a more fair
> algo and more things done like less preemption etc.
>
> But personally I don't think using a more accurate fairness algorithm
> is the problem to begin with most of the time.
> So I fail to understand why we need BFQ.

I violently disagree. This is awesome.

Thanks.

--
tejun
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On Tue, May 27, 2014 at 02:42:24PM +0200, paolo wrote:

[..]

> Strong fairness guarantees (already provided by BFQ-v0)
>
> As for long-term guarantees, BFQ distributes the device throughput
> (and not just the device time) as desired to I/O-bound applications,
> with any workload and regardless of the device parameters.

I don't think most people care about a strong fairness guarantee. As an algorithm, round robin is not bad for ensuring fairness. CFQ had started with that but then it stopped focusing on fairness and rather focused on trying to address various real issues.

And CFQ's problems don't arise from not having a good fairness algorithm. So I don't think this should be the reason for taking a new scheduler.

I think instead of numbers, what would help is a short description of what's the fundamental problem with CFQ which BFQ does not have and how you solved that issue.

One issue you seemed to mention is that write is a problem. CFQ suppresses buffered writes very actively in an attempt to improve read latencies. How did you make it even better with BFQ?

Last time I had looked at BFQ, it looked pretty similar to CFQ except that the core round robin algorithm had been replaced by a more fair algo and more things done like less preemption etc.

But personally I don't think using a more accurate fairness algorithm is the problem to begin with most of the time.

So I fail to understand why we need BFQ.

Thanks
Vivek
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
Hello, Vivek.

On Fri, May 30, 2014 at 11:32:28AM -0400, Vivek Goyal wrote:
> I don't think most of the people care about strong fairness guarantee. As an algorithm round robin is not bad for ensuring fairness. CFQ had started with that but then it stopped focussing on fairness and rather focussed on trying to address various real issues.

Oh, I disagree widely. Have you looked at the test results in the paper? Unless the results are completely bogus, which they probably aren't, this is awesome. This is a *lot* more clearly characterized algorithm which shows significantly better behavior, especially in use cases where scheduling matters. I mean, it even reaches higher throughput while achieving lower latency. Why wouldn't we want it?

> And CFQ's problems don't arise from not having a good fairness algorithm. So I don't think this should be the reason for taking a new scheduler.

In comparison, cfq's fairness behavior is a lot worse, but even ignoring that, one of the major problems of cfq is that its behavior is hardly characterized. It's really difficult to anticipate what it'd do and understand why, which makes it very difficult to maintain and improve. Even just for the latter point, it'd be worthwhile to adopt bfq.

> I think instead of numbers, what would help is a short description that what's the fundamental problem with CFQ which BFQ does not have and how did you solve that issue.

The papers are pretty clear and not too long. Have you read them?

> One issue you seemed to mention is that write is a problem. CFQ suppresses buffered writes very actively in an attempt to improve read latencies. How did you make it even better with BFQ. Last time I had looked at BFQ, it looked pretty similar to CFQ except that core round robin algorithm had been replaced by a more fair algo and more things done like less preemption etc. But personally I don't think using a more accurate fairness algorithm is the problem to begin with most of the time.
>
> So I fail to understand that why do we need BFQ.

I violently disagree. This is awesome.

Thanks.

--
tejun
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On Fri, May 30, 2014 at 12:16:50PM -0400, Tejun Heo wrote:
> Oh, I widely disagree. Have you looked at the test results in the paper? Unless the results are completely bogus, which it probably isn't, this is awesome. This is a *lot* more clearly characterized algorithm which shows significantly better behavior especially in use cases where scheduling matters. I mean, it even reaches higher throughput while achieving lower latency. Why wouldn't we want it?

Of course everybody wants higher throughput while achieving lower latency. Most of the time these are contradictory goals.

Instead of just looking at numbers, I am keen on knowing the fundamental design change which allows this. What is CFQ doing wrong which BFQ gets right?

Are you referring to the BFQ paper? I had read one in the past and it was all about how to achieve more accurate fairness. At this point I don't think that a smarter algorithm is the problem, until and unless somebody can show me how the algorithm is the problem.

Apart from the algorithm, one thing BFQ did differently in the past was to provide fairness based on the amount of IO done, *and not in terms of time slices*. I don't know if that's still the case. If it is, I am wondering why CFQ was converted from amount-of-IO-based fairness to time-slice-based fairness.

I am all for a better algorithm. I just want somebody to explain to me what magic they have done to achieve better throughput as well as better latency.

Do you think that CFQ's problems come from round-robin algorithms? No, they don't. Most of the time we don't even honor the allocated time slice (except for sequential IO) and preempt the allocated slice. That's why I think implementing a more fair algorithm is probably not the solution.

CFQ's throughput problems come from idling and driving lower queue depth. And CFQ's throughput problems also arise from suppressing buffered writes very actively in the presence of sync IO. I want to know what BFQ has done fundamentally differently to take care of the above issues.

> In comparison, cfq's fairness behavior is a lot worse but even ignoring that, one of the major problems of cfq is that the behavior is hardly characterized. It's really difficult to anticipate what it'd do and understand why, which makes it very difficult to maintain and improve. Even just for the latter point, it'd be worthwhile to adopt bfq.

Remember that CFQ started with a simple algorithm: just allocate time slices in proportion to ioprio. That simple scheme did not work in the real world, and then it became more complicated. What's the guarantee that the same thing will not happen to BFQ?

There is no point in getting fairness if overall performance sucks. That's what happens even with block IO cgroups. Create more than 4 cgroups, put some workload in them, and with CFQ your performance will be so bad that you will drop the idea of getting fairness. Why is performance bad? Again, due to idling. In an attempt to provide isolation between the IO of two cgroups, we idle. If you dilute the isolation, sure, you will get better throughput, and CFQ can do that too.

> The papers are pretty clear and not too long. Have you read them?

Can you please provide me the link to the paper? I had read one a few years back. I am not sure if it is still the same paper or a new one. And after experimenting with their implementation and playing with CFQ, my impression was that the fairness algorithm is not the core problem.

I will be more than happy to be proven wrong. I just need somebody to not throw just numbers at me, but rather explain why BFQ performs better and why CFQ can't do the same thing.

Thanks
Vivek
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
Hello,

On Fri, May 30, 2014 at 01:09:58PM -0400, Vivek Goyal wrote:
> Are you referring to BFQ paper. I had read one in the past and it was all about how to achieve more accurate fairness. At this point of time I don't think that smarter algorithm is the problem. Until and unless somebody can show me that how algorithm is a problem.

Because cfq RRs with heuristically guessed slices while bfq calculates where each slice is supposed to end and then schedules the slices accordingly. With a lot of slices to serve, cfq loses track of which one should come first to adhere to the desired latency guarantees while bfq doesn't, which in turn allows more latitude in using longer slices for bfq, allowing for better throughput. It's all in the paper and backed by numbers. What more do you want?

> Apart from algorithm, one thing BFQ did different in the past was provide fairness based on amount of IO done *and not in terms of time slice*. I don't know if that's still the case. If it is the case I am wondering why CFQ was converted from amount of IO based fairness to time slice based fairness.

Rotating rusts being rotating rusts, neither measure is perfect. Bandwidth tracking gets silly with seeky workloads, while pure time-slice tracking unfairly treats users of inner and outer tracks. BFQ uses bandwidth tracking for sequential workloads and basically switches to time-slice tracking for seeky workloads. These were pretty clearly discussed in the paper.

> I am all for a better algorithm. I just want to somebody to explain it to me that what magic they have done to achieve better throughput as well as better latency.

It feels more like you're actively refusing to understand it even when the algorithm and heuristics involved are presented as clearly as they could be. This is one of the best documented pieces of work in this area of the kernel. Just read the papers.

> Do you think that CFQ's problems come from round robin algorithms. No, they don't. Most of the time we don't even honor the allocated time

Oh yes, why do you think we bother with preemption at all? bfq reportedly achieves sufficient fairness and responsiveness without the need for preemption. This is a lot clearer model.

> slice (except sequential IO) and preempt the allocated slice. That's why I think implementing a more fair algorithm is probably not the solution.

If you actually study what has been presented, you'll immediately recognize that a lot of what bfq does is clarification and improvement of the buried heuristics. Look at it this way: RR + preemption becomes the basic timestamp-based scheduling, and the other heuristics are extracted and clarified. That's what it is.

> CFQ's throughput problems come from idling and driving lower queue depth. And CFQ's throughput problems arise due to suppressing buffered write very actively in presence of sync IO. I want to know what has BFQ done fundamentally different to take care of above issues.

Yeah, read the paper. They also analyzed why some things behave in certain ways. A lot of it is pretty spot-on.

> Remember CFQ had started with a simple algorith. Just start allocating time slice in proportion to ioprio. That simple scheme did not work in real world and then it became more complicated.

Yes, this is another refinement step. WTF is so hard to wrap your head around that? It doesn't throw away much. It builds upon cfq and clarifies and improves what it has been doing.

> What's the guarantee that same thing will not happen to BFQ. There is no point in getting fairness if overall performance sucks. That's what happens even with block IO cgroups. Create more than 4 cgroups, put some workload in that and with CFQ your performance will be so bad that you will drop the idea of getting fairness.

Dude, just go read the paper.

> Can you please provide me the link to the paper. I had read one few years back. I am not sure if it is still the same paper of a new one. And after experimenting with their implementation and playing with CFQ, my impression was that fairness algorithm is not the core problem.

The references are in the frigging head message and in the head comment of the implementation. Are you even reading the stuff? You're just wasting other people's time.

--
tejun
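Tejun's core argument above — cfq round-robins with guessed slices while bfq computes where each slice is supposed to end and serves the earliest one — can be illustrated with a toy sketch. This is NOT bfq's actual B-WF2Q+ code; the queue names and slice lengths are made-up, and weights are all taken as 1. Under round robin, a latency-sensitive queue added last waits behind every other queue's full slice; a finish-time scheduler serves it first because its small slice has the earliest virtual finish time.

```python
# Toy contrast of round-robin vs. virtual-finish-time scheduling.
# All numbers are illustrative, not measured or taken from bfq.

def round_robin_order(queues):
    """Serve queues cyclically, in arrival order."""
    return [name for name, _slice_len in queues]

def finish_time_order(queues):
    """Serve by earliest virtual finish time.

    With equal weights and all queues becoming busy at virtual time 0,
    the virtual finish time of a queue's first slice is just its slice
    length, so shorter slices are scheduled sooner."""
    return [name for _vfinish, name in
            sorted((slice_len, name) for name, slice_len in queues)]

# Four sequential readers with 100 ms slices, one interactive queue
# with a 5 ms slice, arriving last.
queues = [("seq_reader_%d" % i, 100) for i in range(4)] + [("interactive", 5)]

print(round_robin_order(queues))  # "interactive" is served last
print(finish_time_order(queues))  # "interactive" is served first
```

The point of the sketch is only the ordering: with many busy queues, the round-robin worst-case wait for the interactive queue grows linearly with the number of queues, while the finish-time order keeps it bounded independently of how many heavy queues are busy.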
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On Tue, May 27, 2014 at 02:42:24PM +0200, paolo wrote:
> From: Paolo Valente paolo.vale...@unimore.it
>
> [Re-posting, previous attempt seems to have partially failed]
>
> Hi, this patchset introduces the last version of BFQ, a proportional-share storage-I/O scheduler. BFQ also supports hierarchical scheduling with a cgroups interface. The first version of BFQ was submitted a few years ago [1]. It is denoted as v0 in the patches, to distinguish it from the version I am submitting now, v7r4. In particular, the first four patches introduce BFQ-v0, whereas the remaining patches turn it progressively into BFQ-v7r4. Here are some nice features of this last version.
>
> Low latency for interactive applications
>
> According to our results, regardless of the actual background workload, for interactive tasks the storage device is virtually as responsive as if it was idle. For example, even if one or more of the following background workloads are being executed:
> - one or more large files are being read or written,
> - a tree of source files is being compiled,
> - one or more virtual machines are performing I/O,
> - a software update is in progress,
> - indexing daemons are scanning filesystems and updating their databases,
> starting an application or loading a file from within an application takes about the same time as if the storage device was idle. As a comparison, with CFQ, NOOP or DEADLINE, and in the same conditions, applications experience high latencies, or even become unresponsive until the background workload terminates (also on SSDs).

So how do you achieve this? IOW, how do you figure out that something is interactive and just give it priority while almost stopping the others? What happens to the notion of fairness in that case?

And if there is a way to figure out interactive applications, then even in CFQ one should be able to easily put these queues at the beginning of the service tree so that they get served first and achieve a better experience. And in general that would be a desirable thing to do. So why not modify CFQ?

> Low latency for soft real-time applications
>
> Also soft real-time applications, such as audio and video players/streamers, enjoy low latency and drop rate, regardless of the storage-device background workload. As a consequence, these applications do not suffer from almost any glitch due to the background workload.

Again, how do you achieve this?

> High throughput
>
> On hard disks, BFQ achieves up to 30% higher throughput than CFQ,

For what workload?

> and up to 150% higher throughput than DEADLINE and NOOP,

Is this true with a buffered write workload also? I think these percentages will be very dependent on the IO pattern.

Again, I would like to know how you achieved such high throughput compared to CFQ, DEADLINE and NOOP. One of the things which drops throughput on a rotational hard disk is a non-sequential IO pattern, and CFQ already handles that. So the only way to achieve higher throughput would be to accumulate more sequential IO in a queue and then let that queue run for longer while stopping the IO from other queues. And that will mean higher latencies for the IO in the other queues. So on a rotational hard disk, how do you achieve higher throughput as well as reduced latency?

> with half of the parallel workloads considered in our tests. With the rest of the workloads, and with all the workloads on flash-based devices, BFQ achieves instead about the same throughput as the other schedulers.
>
> Strong fairness guarantees (already provided by BFQ-v0)
>
> As for long-term guarantees, BFQ distributes the device throughput (and not just the device time) as desired to I/O-bound applications,

I think this is one key difference as compared to CFQ: fairness based on bandwidth, and not fairness based on time slices. So if one process is doing large sequential IO and another is doing small random IO, most of the disk time will be given to the process doing small random IOs. Is that more fair? I think that's one reason CFQ was switched to a time-based scheme: provide time slices, and after that it is up to the application how much it can get out of the disk in that slice based on its IO pattern. At least in terms of fairness, that sounds more fair to me.

I think this is one point which needs to be discussed: is it a better idea to switch to bandwidth-based fairness? Should we change CFQ to achieve that, or do we need to introduce a new IO scheduler for it?

> with any workload and regardless of the device parameters.
>
> What allows BFQ to provide the above features is its accurate scheduling engine (patches 1-4), combined with a set of simple heuristics and improvements (patches 5-14).

This is very hard to understand. This puzzle needs to be broken down into small pieces and explained in simple design terms, so that even 5 years down the line I can explain why BFQ was better.

Thanks
Vivek
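The time-slice vs. amount-of-IO fairness trade-off Vivek raises above can be put in rough numbers. The rates below are made up for illustration (not measurements): a sequential reader streaming ~100 MB/s from a disk versus a seek-bound random reader managing ~1 MB/s. Equal time slices give wildly different bandwidth; equal byte budgets give wildly different device time.

```python
# Hypothetical device rates, chosen only to illustrate the trade-off.
SEQ_MBPS = 100.0   # sequential reader (assumed)
RAND_MBPS = 1.0    # seek-bound random reader (assumed)

def time_based(slice_ms):
    """Equal time slices: each queue gets slice_ms of device time;
    the bytes served then depend entirely on the queue's IO pattern."""
    return {"seq_MB": SEQ_MBPS * slice_ms / 1000,
            "rand_MB": RAND_MBPS * slice_ms / 1000}

def amount_based(budget_mb):
    """Equal byte budgets: each queue gets budget_mb of service;
    the device time consumed then depends on the IO pattern."""
    return {"seq_ms": budget_mb / SEQ_MBPS * 1000,
            "rand_ms": budget_mb / RAND_MBPS * 1000}

print(time_based(100))   # equal time: the sequential reader gets ~100x the bytes
print(amount_based(4))   # equal bytes: the random reader eats ~100x the device time
```

This is exactly why neither pure measure is satisfying on rotational disks, and why the thread keeps returning to how bfq mixes the two (byte budgets for sequential queues, a time-based cap for seeky ones).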
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On Fri, May 30, 2014 at 01:31:46PM -0400, Vivek Goyal wrote:
> > What allows BFQ to provide the above features is its accurate scheduling engine (patches 1-4), combined with a set of simple heuristics and improvements (patches 5-14).
>
> This is very hard to understand. This puzzle need to be broken down into small pieces and explained in simple design terms so that even 5 years down the line I can explain why BFQ was better.

Vivek, stop wasting people's time and just go study the paper. Let's *please* talk after that. What do you expect him to do? Copy-paste the whole paper here and walk you through each step of it? He presented as much information as he could and then provided a sufficient summary in the head message and as comments in the implementation.

It's now *your* turn to study what has been presented. Sure, if you have further questions or think that the implementation and patches need more documentation, that's completely fine, but I find it ridiculous that you're demanding to be spoon-fed information which is readily available in an easily consumable form.

Seriously, this is the best documentation we've had in this area *EVER*.

--
tejun
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On Fri, May 30, 2014 at 01:26:09PM -0400, Tejun Heo wrote:
> Because cfq rr's with heuristically guessed slices and bfq calculates where each one is supposed to end and then schedules the slices accordingly. With a lot of slices to serve, cfq loses track of which one should come first to adhere to desired latency guarantees while bfq doesn't, which in turn allows more latitude in using longer slices for bfq allowing for better throughput. It's all in the paper and backed by numbers. What more do you want?

Now CFQ also dynamically adjusts the slice length based on how many queues are ready to do IO. One problem with fixed-slice-length round robin was that if there are a lot of queues doing IO, then after serving one queue, the same queue will get a time slice only after a long time. Corrado Zoccolo did work in this area in an attempt to improve latency. Now the slice length is calculated dynamically in an attempt to achieve better latency.

commit 5db5d64277bf390056b1a87d0bb288c8b8553f96
Author: Corrado Zoccolo czocc...@gmail.com
Date:   Mon Oct 26 22:44:04 2009 +0100

    cfq-iosched: adapt slice to number of processes doing I/O

> Rotating rusts being rotating rusts, either measure isn't perfect. Bandwidth tracking gets silly with seeky workloads while pure time slice tracking unfairly treats users of inner and outer tracks. BFQ uses bw tracking for sequential workload while basically switches to time slice tracking for seeky workloads. These were pretty clearly discussed in the paper.

Ok, so you prefer to have another IO scheduler instead of improving CFQ?

> It feels more like you're actively refusing to understand it even when the algorithm and heristics involved are as clearly presented as it could be. This is one of the best documented piece of work in this area of the kernel. Just read the papers.

Tejun, I spent a significant amount of time on BFQ a few years ago, and that's the reason I have not read the paper again yet. My understanding was that there was nothing in it which could not be done in CFQ (at least the things which mattered). Looks like you prefer introducing a new scheduler to improving CFQ. My preference is to improve CFQ: borrow the good ideas from BFQ and implement them in CFQ.

> Oh yes, why do you think we bother with preemption at all? bfq reportedly achieves sufficient fairness and responsiveness without the need for preemption. This is a lot clearer model.

We primarily allow preemption of the buffered-write queue. If you allocate a share to the buffered-write queue, that would be good for not starving buffered writes, but your read latencies will suffer.

> If you actually study what has been presented, you'd immediately recognize that a lot of what bfq does is clarification and improvements of the buried heuristics. Look at it this way: RR + preemption becomes the basic timestamp based scheduling and other heuristics are extracted and clarified. That's what it is.
>
> Yeah, read the paper. They also analyzed why some things behave certain ways. A lot of them are pretty spot-on.
>
> Yes, this is another refinement step. WTF is that so hard to warp your head around it? It doesn't throw away much. It builds upon cfq and clarifies and improves what it has been doing.

So why not improve CFQ instead of carrying and maintaining another scheduler? And then have a discussion about what the default scheduler should be.
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
Hey, Vivek.

On Fri, May 30, 2014 at 01:55:27PM -0400, Vivek Goyal wrote:
> Now CFQ also dynamically adjusts the slice length based on the how many queues are ready to do IO. One problem with fixed slice lenth round robin was that if there are lot of queues doing IO, then after serving one queue, same queue will get time slice after a long time. Corrado Zoccolo did work in this area in an attempt to improve latency. Now slice length is calculated dynamically in an attempt to achieve better latency.

Consider it a better version of that.

> Ok, so you prefer to have another IO scheduler instead of improving CFQ?

As I wrote in another reply, I think the best approach would be morphing cfq to accommodate the scheduling algorithm and heuristics of bfq.

> Tejun, I have spent significant amount of time on BFQ few years ago. And that's the reason I have not read it again yet. My understanding was that there was nothing which would not be done in CFQ (atleast things which mattered).

THERE IS A NEW PAPER.

> Looks like you prefer introducing a new scheduler instead of improving CFQ. My preference is to improve CFQ. Borrow good ideas from BFQ and implement them in CFQ.

LOOKS LIKE YOU ARE NOT READING ANYTHING AND JUST WRITING IRRELEVANT SHIT.

> So why not improve CFQ instead of carrying and maintaining another scheduler. And then have a discussion that what's the default scheduler.

For fuck's sake, I'm out. This is a total waste of time. You don't read what others are writing and refuse to study what's right there. What are you doing?

--
tejun
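The slice adaptation from Corrado Zoccolo's commit 5db5d64, referenced in the exchange above, can be sketched roughly as follows. The constants here are illustrative placeholders, not the actual cfq tunables, and the formula is a simplification of what the commit does: shrink each slice so that one full rotation over all busy queues fits within a target latency, but never below a minimum slice.

```python
# Illustrative constants (assumptions, not cfq's real tunables).
BASE_SLICE_MS = 100      # nominal slice when few queues are busy
TARGET_LATENCY_MS = 300  # desired worst-case time for one full rotation
MIN_SLICE_MS = 10        # floor, so a slice is never uselessly short

def adapted_slice(nr_busy_queues):
    """Slice length scaled to the number of queues doing I/O."""
    if nr_busy_queues <= 0:
        return BASE_SLICE_MS
    # Each queue's fair share of the target rotation latency.
    fair_share = TARGET_LATENCY_MS / nr_busy_queues
    return max(MIN_SLICE_MS, min(BASE_SLICE_MS, fair_share))

for n in (1, 3, 10, 50):
    print(n, adapted_slice(n))  # slices shrink as more queues compete
```

This also shows the limitation Tejun alludes to: once the slice hits the floor, worst-case rotation latency again grows linearly with the number of queues, which is the regime where a finish-time scheduler keeps its guarantees and round robin does not.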
Re: [PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
On 30 May 2014, at 19:09, Vivek Goyal vgo...@redhat.com wrote:

> [...]
>
> Instead of just looking at numbers, I am keen on knowing what's the fundamental design change which allows this. What is CFQ doing wrong which BFQ gets right.

I think that Tejun has already highlighted the key points and provided many details. To contribute to answering your questions about why bfq outperforms cfq, here is a summary of the most relevant underlying facts:

1) cfq is based on a round-robin scheme, in which an unlucky queue that should be served immediately may instead happen to wait for all the other busy queues before being served. In this respect, combining round robin with virtual-time-based improvements is likely to lead to not very clear solutions, and probably to sub-optimal results with respect to just using an optimal scheduler with provable deterministic guarantees (such as the internal scheduler of bfq).

2) To provide a queue with a higher fraction of the throughput, a round-robin scheduler serves the queue for a longer time slice. Increasing time slices further increases per-request latencies. The problem may be mitigated by using preemption, but the result is a combination of a basic algorithm and a 'corrective' heuristic. This is again a more convoluted, and likely less accurate, solution than directly using an optimal algorithm with provable guarantees.

3) In bfq, budgets play a similar role to time slices in cfq, i.e., once a queue has been granted access to the device, the queue is served, in the simplest case, until it finishes its budget. But, under bfq, the fraction of the throughput received by a queue is *independent* of the budget assigned to the queue. I agree that this may seem counterintuitive at first, especially if one is accustomed to thinking a la round robin. Differently from a round-robin algorithm, the internal scheduler of bfq controls the throughput distribution by controlling the frequency at which queues are served. The resulting degree of freedom with respect to budget sizes has the following two main advantages:

3.a) bfq can choose, for each queue, the budget that best fits the requirements or characteristics of the queue. For example, queues corresponding to time-sensitive applications are assigned small budgets, which guarantees that they are served quickly. On the opposite side, queues associated with I/O-bound processes performing mostly sequential I/O are assigned large budgets, which helps boost the throughput.

3.b) bfq does not need to assign large budgets to queues to provide them with large fractions of the throughput; hence bfq does not need to deteriorate per-request latencies to guarantee a differentiated throughput distribution.

4) The internal scheduler of bfq guarantees that a queue that needs to be served quickly may wait, unjustly, for the service of at most one queue. More formally, bfq guarantees that each budget is completed with the smallest possible delay, for a budget-by-budget scheduler, with respect to an ideal, perfectly fair scheduler (i.e., an ideal scheduler that serves all busy queues at the same time, providing each with a fraction of the throughput proportional to its weight).

5) Temporarily assigning a large fraction of the throughput is the main mechanism through which bfq provides interactive and soft real-time applications with a low latency. Thanks to fact 3.b, bfq achieves this goal without increasing per-request latencies. As for how applications are deemed interactive or soft real-time, I have tried to describe both detection heuristics in patches 06 and 07.

Finally, as for adding to cfq the heuristics I have added to bfq, I think that this would probably improve application latencies also with cfq. But, because of the above facts, the result would unavoidably be worse than with bfq.
Paolo

--
Paolo Valente
Algogroup
Dipartimento di Fisica, Informatica e Matematica
Via Campi, 213/B
41125 Modena - Italy
homepage: http://algogroup.unimore.it/people/paolo/
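Paolo's point that a queue's throughput fraction is independent of its budget size can be shown with a toy simulation. This is a simplified sketch, not bfq's actual B-WF2Q+ implementation: queues are picked by earliest virtual finish time, where a queue's virtual finish time advances by budget/weight each time it is served. A small-budget queue is simply selected more often, so with equal weights it ends up with the same share of service as a large-budget queue.

```python
import heapq

def simulate(queues, total_service):
    """queues: {name: (weight, budget)}.  Serve by earliest virtual
    finish time until total_service units are consumed; return the
    amount of service each queue received."""
    served = {name: 0 for name in queues}
    heap = []
    for name, (weight, budget) in queues.items():
        # First virtual finish time: the queue becomes busy at vtime 0.
        heapq.heappush(heap, (budget / weight, name))
    while total_service > 0:
        vfinish, name = heapq.heappop(heap)
        weight, budget = queues[name]
        grant = min(budget, total_service)  # serve one budget (or what's left)
        served[name] += grant
        total_service -= grant
        # Requeue with the next virtual finish time.
        heapq.heappush(heap, (vfinish + budget / weight, name))
    return served

# Equal weights, very different budgets: the small-budget queue is
# served four times as often, so both receive roughly half the service.
out = simulate({"small_budget": (1, 2), "large_budget": (1, 8)}, 1000)
print(out)
```

The same mechanism gives point 3.a for free: the scheduler can hand a time-sensitive queue a small budget (so it is dispatched frequently and quickly) without touching its long-term throughput share, which stays proportional to its weight.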
[PATCH RFC RESEND 00/14] New version of the BFQ I/O Scheduler
From: Paolo Valente [Re-posting, previous attempt seems to have partially failed] Hi, this patchset introduces the last version of BFQ, a proportional-share storage-I/O scheduler. BFQ also supports hierarchical scheduling with a cgroups interface. The first version of BFQ was submitted a few years ago [1]. It is denoted as v0 in the patches, to distinguish it from the version I am submitting now, v7r4. In particular, the first four patches introduce BFQ-v0, whereas the remaining patches turn it progressively into BFQ-v7r4. Here are some nice features of this last version. Low latency for interactive applications According to our results, regardless of the actual background workload, for interactive tasks the storage device is virtually as responsive as if it was idle. For example, even if one or more of the following background workloads are being executed: - one or more large files are being read or written, - a tree of source files is being compiled, - one or more virtual machines are performing I/O, - a software update is in progress, - indexing daemons are scanning filesystems and updating their databases, starting an application or loading a file from within an application takes about the same time as if the storage device was idle. As a comparison, with CFQ, NOOP or DEADLINE, and in the same conditions, applications experience high latencies, or even become unresponsive until the background workload terminates (also on SSDs). Low latency for soft real-time applications Also soft real-time applications, such as audio and video players/streamers, enjoy low latency and drop rate, regardless of the storage-device background workload. As a consequence, these applications do not suffer from almost any glitch due to the background workload. High throughput On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and up to 150% higher throughput than DEADLINE and NOOP, with half of the parallel workloads considered in our tests. 
With the rest of the workloads, and with all the workloads on flash-based devices, BFQ achieves instead about the same throughput as the other schedulers. Strong fairness guarantees (already provided by BFQ-v0) As for long-term guarantees, BFQ distributes the device throughput (and not just the device time) as desired to I/O-bound applications, with any workload and regardless of the device parameters. What allows BFQ to provide the above features is its accurate scheduling engine (patches 1-4), combined with a set of simple heuristics and improvements (patches 5-14). A 15-minute demo of the performance of BFQ is available here [2]. I made this demo with an older version of BFQ (v3r4) and under Linux 3.4.0. We have further improved BFQ since then. The performance of the last version of BFQ is shown, e.g., through some graphs here [3], under 3.14.0, compared against CFQ, DEADLINE and NOOP, and on: a fast and a slow hard disk, a RAID1, an SSD, a microSDHC Card and an eMMC. As an example, our results on the SSD are reported also in a table at the end of this email. Finally, details on how BFQ and its components work are provided in the descriptions of the patches. An organic description of the main BFQ algorithm and of most of its features can instead be found in this paper [4]. Finally, as for testing in everyday use, BFQ is the default I/O scheduler in, e.g., Manjaro, Sabayon, OpenMandriva and Arch Linux ARM in some NAS boxes, plus several kernel forks for PCs and smartphones. BFQ is instead optionally available in, e.g., Arch, PCLinuxOS and Gentoo. In addition, we record a few tens of downloads per day from people using other distributions. The feedback received so far basically confirms the expected latency drop and throughput boost. 
Paolo

Results on a Plextor PX-256M5S SSD

The first two rows of the table below report the aggregate throughput achieved by BFQ, CFQ, DEADLINE and NOOP while ten parallel processes read, either sequentially or randomly, each a separate portion of the device blocks. These processes read directly from the device, and no process performs writes, to avoid repeatedly writing large files and wearing out the SSD during the many tests run.

As can be seen, all schedulers achieve about the same throughput with sequential readers, whereas, with random readers, the throughput grows slightly as the complexity, and hence the execution time, of the schedulers decreases. In fact, with random readers the number of IOPS is much higher, and all CPUs spend all their time either executing instructions or waiting for I/O (the total idle percentage is 0). Therefore, the processing time of I/O requests influences the maximum achievable throughput.

The remaining rows report the cold-cache start-up time experienced by various applications while one of the above two workloads is being executed in parallel. In particular, "Start-up time 10 seq/rand" stands for "Start-up time of the application at hand while 10 sequential/random readers are being executed".
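For readers who want to reproduce such comparisons: on a kernel that includes BFQ, the scheduler in effect for a block device is exposed through sysfs, with the active one shown in brackets (e.g. "noop deadline [bfq]"), and can be switched by writing the scheduler name to the same file. Below is a small helper (the function name is mine, not part of any kernel interface) that parses that bracketed format:

```python
def active_scheduler(sysfs_line):
    """Return the active I/O scheduler from the contents of
    /sys/block/<dev>/queue/scheduler, where the active entry is
    bracketed, e.g. 'noop deadline [bfq]' -> 'bfq'."""
    for token in sysfs_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # no bracketed entry found

# Switching scheduler (as root, on a kernel with BFQ available):
#   echo bfq > /sys/block/sda/queue/scheduler
```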