Re: [PATCH V3 00/11] block-throttle: add .high limit
Not to belabor this again: even if BFQ isn't yet suitable to replace CFQ for high-IOPS workloads (I've yet to see 20K IOPS on any reasonably sized SAN (SC4020 / v5000, etc.)), can't we at least make BFQ the default I/O scheduler for people otherwise using CFQ? Paolo has had a team of students working on this for years, and even once this otherwise "secret weapon" is mainlined, I highly doubt his work will stop. We're pretty close to fixing hard I/O stalls in Linux; mainlining is the last major hurdle. While I've contributed nothing to BFQ code-wise, do let any of us know if there's anything outstanding needed to solve hard lockups, and I believe any of us will try our best.

Kyle.

On Sun, Oct 16, 2016 at 12:02 PM, Paolo Valente wrote:
>
>> Il giorno 14 ott 2016, alle ore 20:35, Tejun Heo ha scritto:
>>
>> Hello, Paolo.
>>
>> On Fri, Oct 14, 2016 at 07:13:41PM +0200, Paolo Valente wrote:
>>> That said, your 'thus' seems a little too strong: "bfq does not yet
>>> handle fast SSDs, thus we need something else". What about the
>>> millions of devices (and people) still within 10-20 K IOPS, and
>>> experiencing awful latencies and lack of bandwidth guarantees?
>>
>> I'm not objecting to any of that.
>
> Ok, sorry for misunderstanding. I'm just more and more confused about
> why a readily available solution, not yet proven wrong, has not been
> accepted, if everybody apparently acknowledges the problem.
>
>> My point just is that bfq, at least as currently implemented, is
>> unfit for certain classes of use cases.
>
> Absolutely correct.
>
>>>> FWIW, it looks like the only way we can implement proportional
>>>> control on highspeed ssds with acceptable overhead
>>>
>>> Maybe not: as I wrote to Vivek in a previous reply, containing
>>> pointers to documentation, we have already achieved twenty million
>>> decisions per second with a prototype driving existing
>>> proportional-share packet schedulers (essentially without
>>> modifications).
>>
>> And that doesn't require idling and thus doesn't severely impact
>> utilization?
>
> Nope. Packets are commonly assumed to be sent asynchronously. I
> guess that discussing the validity of this assumption is out of the
> scope of this thread.
>
> Thanks,
> Paolo
>
>>>> is somehow finding a way to calculate the cost of each IO and
>>>> throttle IOs according to that while controlling for latency as
>>>> necessary. Slice scheduling with idling seems too expensive with
>>>> highspeed devices with high io depth.
>>>
>>> Yes, that's absolutely true. I'm already thinking about an idleless
>>> solution. As I already wrote, I'm willing to help with scheduling
>>> in blk-mq. I hope there will be the opportunity to find some way
>>> to go at KS.
>>
>> It'd be great to have a proportional control mechanism whose overhead
>> is acceptable. Unfortunately, we don't have one now and nothing seems
>> right around the corner. (Mostly) work-conserving throttling would be
>> fiddlier to use but is something which is useful regardless of such
>> proportional control mechanism and can be obtained relatively easily.
>>
>> I don't see why the two approaches would be mutually exclusive.
>>
>> Thanks.
>>
>> --
>> tejun
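Paolo's prototype drives existing proportional-share packet schedulers, whose per-decision cost is tiny because each decision is just virtual-time bookkeeping. A minimal sketch of that idea (weighted fair queueing with a simplified virtual clock; the class and all names are mine for illustration, not the prototype's code):

```python
import heapq

class WFQ:
    """Minimal weighted-fair-queueing sketch: each flow's requests are
    stamped with a virtual finish time, and the scheduler always
    dispatches the pending request with the smallest stamp."""

    def __init__(self):
        self.vtime = 0.0   # global virtual time
        self.finish = {}   # per-flow last finish stamp
        self.heap = []     # (finish_stamp, seq, flow, size)
        self.seq = 0       # FIFO tie-breaker for equal stamps

    def enqueue(self, flow, size, weight):
        start = max(self.vtime, self.finish.get(flow, 0.0))
        stamp = start + size / weight
        self.finish[flow] = stamp
        heapq.heappush(self.heap, (stamp, self.seq, flow, size))
        self.seq += 1

    def dispatch(self):
        stamp, _, flow, size = heapq.heappop(self.heap)
        self.vtime = stamp   # simplified virtual-clock advance
        return flow, size

sched = WFQ()
# Flow "a" has twice the weight of flow "b": with equal-size requests,
# "a" should be dispatched roughly twice as often while both are busy.
for _ in range(6):
    sched.enqueue("a", 4096, weight=2)
    sched.enqueue("b", 4096, weight=1)
order = [sched.dispatch()[0] for _ in range(12)]
print(order[:9])  # about two "a" dispatches per "b" while both backlogged
```

Each enqueue/dispatch is a couple of dictionary operations and a heap push/pop, which is why millions of decisions per second are plausible; the hard part Tejun raises is whether the same trick carries over to sync disk I/O.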
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 14 ott 2016, alle ore 20:35, Tejun Heo ha scritto:
>
> Hello, Paolo.
>
> On Fri, Oct 14, 2016 at 07:13:41PM +0200, Paolo Valente wrote:
>> That said, your 'thus' seems a little too strong: "bfq does not yet
>> handle fast SSDs, thus we need something else". What about the
>> millions of devices (and people) still within 10-20 K IOPS, and
>> experiencing awful latencies and lack of bandwidth guarantees?
>
> I'm not objecting to any of that.

Ok, sorry for misunderstanding. I'm just more and more confused about
why a readily available solution, not yet proven wrong, has not been
accepted, if everybody apparently acknowledges the problem.

> My point just is that bfq, at least as currently implemented, is
> unfit for certain classes of use cases.

Absolutely correct.

>>> FWIW, it looks like the only way we can implement proportional
>>> control on highspeed ssds with acceptable overhead
>>
>> Maybe not: as I wrote to Vivek in a previous reply, containing
>> pointers to documentation, we have already achieved twenty million
>> decisions per second with a prototype driving existing
>> proportional-share packet schedulers (essentially without
>> modifications).
>
> And that doesn't require idling and thus doesn't severely impact
> utilization?

Nope. Packets are commonly assumed to be sent asynchronously. I
guess that discussing the validity of this assumption is out of the
scope of this thread.

Thanks,
Paolo

>>> is somehow finding a way to calculate the cost of each IO and
>>> throttle IOs according to that while controlling for latency as
>>> necessary. Slice scheduling with idling seems too expensive with
>>> highspeed devices with high io depth.
>>
>> Yes, that's absolutely true. I'm already thinking about an idleless
>> solution. As I already wrote, I'm willing to help with scheduling
>> in blk-mq. I hope there will be the opportunity to find some way
>> to go at KS.
>
> It'd be great to have a proportional control mechanism whose overhead
> is acceptable. Unfortunately, we don't have one now and nothing seems
> right around the corner. (Mostly) work-conserving throttling would be
> fiddlier to use but is something which is useful regardless of such
> proportional control mechanism and can be obtained relatively easily.
>
> I don't see why the two approaches would be mutually exclusive.
>
> Thanks.
>
> --
> tejun

--
Paolo Valente
Algogroup
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Via Campi 213/B
41125 Modena - Italy
http://algogroup.unimore.it/people/paolo/
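Tejun's "(mostly) work-conserving throttling" boils down to delaying IOs only when a group exceeds its configured rate. A toy token-bucket limiter of the kind a .high-style limit could be built on (the class, names, and the explicit clock parameter are mine for determinism, not blk-throttle's API):

```python
class TokenBucket:
    """Toy byte-rate limiter: tokens refill at `rate` bytes/s up to
    `burst`; an IO that overdraws the bucket is delayed until the
    deficit would have refilled."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = 0.0

    def delay_for(self, nbytes, now):
        # Refill for the elapsed time, then charge this IO.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0                   # under the limit: dispatch now
        return -self.tokens / self.rate  # seconds until the deficit refills

tb = TokenBucket(rate=100 * 1024 * 1024, burst=1024 * 1024)  # 100 MiB/s, 1 MiB burst
print(tb.delay_for(512 * 1024, now=0.0))       # within burst: no delay
print(tb.delay_for(4 * 1024 * 1024, now=0.0))  # overdraws: must wait
```

The work-conserving variant Tejun alludes to would additionally let an over-limit IO through whenever the device would otherwise sit idle; that extra bookkeeping is outside this sketch.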
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 14 ott 2016, alle ore 18:40, Tejun Heo ha scritto:
>
> Hello, Kyle.
>
> On Sat, Oct 08, 2016 at 06:15:14PM -0700, Kyle Sanderson wrote:
>> How is this even a discussion when hard numbers, and trying any
>> reproduction case, easily reproduce the issues that CFQ causes.
>> Reading this thread, and many others, only grows my disappointment;
>> whenever someone launches kterm or scrot and their machine freezes,
>> it leaves a select few individuals completely responsible for this.
>> Help those users, help yourself, help Linux.
>
> So, just to be clear, I wasn't arguing against bfq replacing cfq (or
> anything along that line) but that proportional control, as
> implemented, would be too costly for many use cases, and thus we need
> something along the line of what Shaohua is proposing.

Sorry for dropping in all the time, but the vision that you and some
other guys propose seems to miss some important piece (unless, now or
then, you will patiently prove me wrong, or I will finally understand
on my own why I'm wrong).

You are of course right: bfq, as a component of blk, and above all as
a sort of derivative of CFQ (and of its overhead), currently has too
high an overhead to handle more than 10-20K IOPS.

That said, your 'thus' seems a little too strong: "bfq does not yet
handle fast SSDs, thus we need something else". What about the
millions of devices (and people) still within 10-20 K IOPS, and
experiencing awful latencies and lack of bandwidth guarantees? For
certain systems or applications, it isn't even just a "buy a fast
SSD" matter, but a technological constraint.

> FWIW, it looks like the only way we can implement proportional
> control on highspeed ssds with acceptable overhead

Maybe not: as I wrote to Vivek in a previous reply, containing
pointers to documentation, we have already achieved twenty million
decisions per second with a prototype driving existing
proportional-share packet schedulers (essentially without
modifications).

> is somehow finding a way to calculate the cost of each IO and
> throttle IOs according to that while controlling for latency as
> necessary. Slice scheduling with idling seems too expensive with
> highspeed devices with high io depth.

Yes, that's absolutely true. I'm already thinking about an idleless
solution. As I already wrote, I'm willing to help with scheduling in
blk-mq. I hope there will be the opportunity to find some way to go
at KS.

Thanks,
Paolo

> Thanks.
>
> --
> tejun
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 06 ott 2016, alle ore 21:57, Shaohua Li ha scritto:
>
> On Thu, Oct 06, 2016 at 09:58:44AM +0200, Paolo Valente wrote:
>>
>>> Il giorno 05 ott 2016, alle ore 22:46, Shaohua Li ha scritto:
>>>
>>> On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote:
>>>>> Il giorno 05 ott 2016, alle ore 20:30, Shaohua Li ha scritto:
>>>>>
>>>>> On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote:
>>>>>> Hello, Paolo.
>>>>>>
>>>>>> On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote:
>>>>>>> In this respect, for your generic, unpredictable scenario to
>>>>>>> make sense, there must exist at least one real system that
>>>>>>> meets the requirements of such a scenario. Or, if such a real
>>>>>>> system does not yet exist, it must be possible to emulate it.
>>>>>>> If it is impossible to achieve this last goal either, then I
>>>>>>> miss the usefulness of looking for solutions for such a
>>>>>>> scenario.
>>>>>>>
>>>>>>> That said, let's define the instance(s) of the scenario that
>>>>>>> you find most representative, and let's test BFQ on it/them.
>>>>>>> Numbers will give us the answers. For example, what about all
>>>>>>> or part of the following groups:
>>>>>>> . one cyclically doing random I/O for some seconds and then
>>>>>>>   sequential I/O for the next seconds
>>>>>>> . one doing, say, quasi-sequential I/O in ON/OFF cycles
>>>>>>> . one starting an application cyclically
>>>>>>> . one playing back or streaming a movie
>>>>>>>
>>>>>>> For each group, we could then measure the time needed to
>>>>>>> complete each phase of I/O in each cycle, plus the
>>>>>>> responsiveness in the group starting an application, plus the
>>>>>>> frame drop in the group streaming the movie. In addition, we
>>>>>>> can measure the bandwidth/iops enjoyed by each group, plus, of
>>>>>>> course, the aggregate throughput of the whole system. In
>>>>>>> particular we could compare results with throttling, BFQ, and
>>>>>>> CFQ.
>>>>>>>
>>>>>>> Then we could write the resulting numbers on the stone, and
>>>>>>> stick to them until something proves them wrong.
>>>>>>>
>>>>>>> What do you (or others) think about it?
>>>>>>
>>>>>> That sounds great and yeah it's lame that we didn't start with
>>>>>> that. Shaohua, would it be difficult to compare how bfq
>>>>>> performs against blk-throttle?
>>>>>
>>>>> I had a test of BFQ.
>>>>
>>>> Thank you very much for testing BFQ!
>>>>
>>>>> I'm using BFQ found at
>>>>> http://algogroup.unimore.it/people/paolo/disk_sched/sources.php.
>>>>> version is 4.7.0-v8r3.
>>>>
>>>> That's the latest stable version. The development version [1]
>>>> already contains further improvements for fairness, latency and
>>>> throughput. It is however still a release candidate.
>>>>
>>>> [1] https://github.com/linusw/linux-bfq/tree/bfq-v8
>>>>
>>>>> It's a LSI SSD, queue depth 32. I use the default settings. fio
>>>>> script is:
>>>>>
>>>>> [global]
>>>>> ioengine=libaio
>>>>> direct=1
>>>>> readwrite=randread
>>>>> bs=4k
>>>>> runtime=60
>>>>> time_based=1
>>>>> file_service_type=random:36
>>>>> overwrite=1
>>>>> thread=0
>>>>> group_reporting=1
>>>>> filename=/dev/sdb
>>>>> iodepth=1
>>>>> numjobs=8
>>>>>
>>>>> [groupA]
>>>>> prio=2
>>>>>
>>>>> [groupB]
>>>>> new_group
>>>>> prio=6
>>>>>
>>>>> I'll change iodepth, numjobs and prio in different tests. Result
>>>>> unit is MB/s.
>>>>>
>>>>> iodepth=1 numjobs=1 prio 4:4
>>>>> CFQ: 28:28  BFQ: 21:21  deadline: 29:29
>>>>>
>>>>> iodepth=8 numjobs=1 prio 4:4
>>>>> CFQ: 162:162  BFQ: 102:98  deadline: 205:205
>>>>>
>>>>> iodepth=1 numjobs=8 prio 4:4
>>>>> CFQ: 157:157  BFQ: 81:92  deadline: 196:197
>>>>>
>>>>> iodepth=1 numjobs=1 prio 2:6
>>>>> CFQ: 26.7:27.6  BFQ: 20:6  deadline: 29:29
>>>>>
>>>>> iodepth=8 numjobs=1 prio 2:6
>>>>> CFQ: 166:174  BFQ: 139:72  deadline: 202:202
>>>>>
>>>>> iodepth=1 numjobs=8 prio 2:6
>>>>> CFQ: 148:150  BFQ: 90:77  deadline: 198:197
>>>>>
>>>>> CFQ isn't fair at all. BFQ is very good on this side, but has
>>>>> poor throughput even when prio is the default value.
>>>>
>>>> Throughput is lower with BFQ for two reasons.
>>>>
>>>> First, you certainly left low_latency in its default state, i.e.,
>>>> on. As explained, e.g., here [2], low_latency mode is totally
>>>> geared towards maximum responsiveness and minimum latency for
>>>> soft real-time applications (e.g., video players). To achieve
>>>> this goal, BFQ is willing to perform more idling, when necessary.
>>>> This lowers throughput (I'll get back on this at the end of the
>>>> discussion of the second reason).
>>>
>>> changing low_latency to 0 seems not to change anything, at least
>>> for the test:
>>> iodepth=1 numjobs=1 prio 2:6 A bs 4k:64k
>>
>> The second, most
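For a quick read of Shaohua's prio 2:6 numbers, one can tabulate the groupA/groupB bandwidth ratio and the aggregate throughput per scheduler. The dictionary labels below are mine; the values are copied verbatim from the report above:

```python
# Measured MB/s pairs (groupA:groupB) at ioprio 2:6, from the report above.
results = {
    "CFQ":      {"qd1 jobs1": (26.7, 27.6), "qd8 jobs1": (166, 174), "qd1 jobs8": (148, 150)},
    "BFQ":      {"qd1 jobs1": (20, 6),      "qd8 jobs1": (139, 72),  "qd1 jobs8": (90, 77)},
    "deadline": {"qd1 jobs1": (29, 29),     "qd8 jobs1": (202, 202), "qd1 jobs8": (198, 197)},
}

for sched, runs in results.items():
    for case, (a, b) in runs.items():
        # A/B near 1.0 means no differentiation despite the prio gap.
        print(f"{sched:8s} {case}: A/B = {a / b:.2f}, total = {a + b:.0f} MB/s")
```

The tabulation makes both halves of Shaohua's conclusion visible at a glance: CFQ's and deadline's ratios sit at roughly 1.0 (no differentiation), while BFQ's range from about 1.2 to 3.3 at the cost of lower totals.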
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 06 ott 2016, alle ore 20:32, Vivek Goyal ha scritto:
>
> On Thu, Oct 06, 2016 at 08:01:42PM +0200, Paolo Valente wrote:
>>
>>> Il giorno 06 ott 2016, alle ore 19:49, Vivek Goyal ha scritto:
>>>
>>> On Thu, Oct 06, 2016 at 03:15:50PM +0200, Paolo Valente wrote:
>>>
>>> [..]
>>>> Shaohua, I have just realized that I have unconsciously defended a
>>>> wrong argument. Although all the facts that I have reported are
>>>> evidently true, I have argued as if the question was: "do we need
>>>> to throw away throttling because there is proportional share, or
>>>> do we need to throw away proportional share because there is
>>>> throttling?". This question is simply wrong, as I now think
>>>> consciously (sorry for my dissociated behavior :) ).
>>>
>>> I was wondering about the same. We need both, and both should be
>>> able to work with fast devices of today using blk-mq interfaces
>>> without much overhead.
>>>
>>>> The best goal to achieve is to have both a good throttling
>>>> mechanism and a good proportional-share scheduler. This goal
>>>> would be valid even if there were just one important scenario for
>>>> each of the two approaches. The sore point here is that you guys
>>>> are constantly, and rightly, working on solutions to achieve and
>>>> consolidate reasonable QoS guarantees, while an apparently very
>>>> good proportional-share scheduler has been kept out for years. If
>>>> you (or others) have good arguments to support this state of
>>>> affairs, then this would probably be an important point to
>>>> discuss.
>>>
>>> Paolo, CFQ is legacy now, and if we can come up with a proportional
>>> IO mechanism which works reasonably well with fast devices using
>>> blk-mq interfaces, that will be much more interesting.
>>
>> That's absolutely true. But why do we pretend not to know that, for
>> (at least) hundreds of thousands of users, Linux will go on giving
>> bad responsiveness, starvation, high latency and unfairness until
>> blk is not used any more (assuming that these problems will somehow
>> disappear with blk-mq)? Many of these users are fully aware of
>> these long-standing Linux problems. We could solve them by just
>> adding a scheduler that has already been adopted, and thus
>> extensively tested, by thousands of users. And more and more people
>> are aware of this fact too. Are we doing the right thing?
>
> Hi Paolo,

Hi

> People have been using CFQ for many years.

Yes, but allow me just to add that a lot of people have also been
unhappy with CFQ for many years.

> I am not sure if the benefits offered by BFQ over CFQ are significant
> enough to justify taking completely new code and getting rid of CFQ.
> Or are the benefits significant enough that one feels like putting
> time and effort into this and taking chances with new code?

Although I think that BFQ's benefits are relevant (but I'm a little
bit of an interested party :) ), I do agree that abruptly replacing
the most used I/O scheduler (AFAIK) with such a different one is at
least a little risky.

> At this point in time, replacing CFQ with something better is not a
> priority for me.

ok

> But if something better and stable goes upstream, I will gladly use
> it.

Then, in case of success, I will be glad to receive some feedback from
you, and possibly use it to improve the set of ideas that we have put
into BFQ.

Thank you,
Paolo

> Vivek
Re: [PATCH V3 00/11] block-throttle: add .high limit
On 2016-10-06 08:50, Paolo Valente wrote:
>> Il giorno 06 ott 2016, alle ore 13:57, Austin S. Hemmelgarn ha
>> scritto:
>>
>> On 2016-10-06 07:03, Mark Brown wrote:
>>> On Thu, Oct 06, 2016 at 10:04:41AM +0200, Linus Walleij wrote:
>>>> On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo wrote:
>>>>> I get that bfq can be a good compromise on most desktop
>>>>> workloads and behave reasonably well for some server workloads
>>>>> with the slice expiration mechanism, but it really isn't an IO
>>>>> resource partitioning mechanism.
>>>>
>>>> Not just desktops, also Android phones.
>>>>
>>>> So why not have BFQ as a separate scheduling policy upstream,
>>>> alongside CFQ, deadline and noop?
>>>
>>> Right.
>>>
>>>> We're already doing the per-usecase Kconfig thing for preemption.
>>>> But maybe somebody already hates that and wants to get rid of it,
>>>> I don't know.
>>>
>>> Hannes also suggested going back to making BFQ a separate scheduler
>>> rather than replacing CFQ earlier, pointing out that it mitigates
>>> the risks of changing CFQ substantially at this point (which seems
>>> to be the biggest issue here).
>>
>> ISTR that the original argument for this approach essentially
>> amounted to: 'If it's so much better, why do we need both?'.
>>
>> Such an argument is valid only if the new design is better in all
>> respects (which there isn't sufficient information to decide in this
>> case), or the negative aspects are worth the improvements (which is
>> too workload-specific to decide for something like this).
>
> All correct, apart from the workload-specific issue, which is not
> very clear to me. Over the last five years I have not found a single
> workload for which CFQ is better than BFQ, and none has been
> suggested.

My point is that whether or not BFQ is better depends on the workload.
You can't test every workload, so you can't say definitively that BFQ
is better for every workload. At a minimum, there are workloads where
the deadline and noop schedulers are better, but those are very
domain-specific workloads.

Based on the numbers from Shaohua, it looks like CFQ has better
throughput than BFQ, and that will affect some workloads (for most,
the improved fairness is worth the reduced throughput, but there
probably are some cases where it isn't).

> Anyway, leaving aside this fact, IMO the real problem here is that we
> are in a catch-22: "we want BFQ to replace CFQ, but, since CFQ is
> legacy code, you cannot change, and thus replace, CFQ".

I agree that that's part of the issue, but I also don't entirely agree
with the reasoning on it. Until blk-mq has proper I/O scheduling,
people will continue to use CFQ, and based on the way things are
going, it will be multiple months before that happens, whereas BFQ
exists and is working now.
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 06 ott 2016, alle ore 09:58, Paolo Valente ha scritto:
>
>>> Il giorno 05 ott 2016, alle ore 22:46, Shaohua Li ha scritto:
>>>
>>> On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote:
>>>>> Il giorno 05 ott 2016, alle ore 20:30, Shaohua Li ha scritto:
>>>>>
>>>>> On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote:
>>>>>> Hello, Paolo.
>>>>>>
>>>>>> On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote:
>>>>>>> In this respect, for your generic, unpredictable scenario to
>>>>>>> make sense, there must exist at least one real system that
>>>>>>> meets the requirements of such a scenario. Or, if such a real
>>>>>>> system does not yet exist, it must be possible to emulate it.
>>>>>>> If it is impossible to achieve this last goal either, then I
>>>>>>> miss the usefulness of looking for solutions for such a
>>>>>>> scenario.
>>>>>>>
>>>>>>> That said, let's define the instance(s) of the scenario that
>>>>>>> you find most representative, and let's test BFQ on it/them.
>>>>>>> Numbers will give us the answers. For example, what about all
>>>>>>> or part of the following groups:
>>>>>>> . one cyclically doing random I/O for some seconds and then
>>>>>>>   sequential I/O for the next seconds
>>>>>>> . one doing, say, quasi-sequential I/O in ON/OFF cycles
>>>>>>> . one starting an application cyclically
>>>>>>> . one playing back or streaming a movie
>>>>>>>
>>>>>>> For each group, we could then measure the time needed to
>>>>>>> complete each phase of I/O in each cycle, plus the
>>>>>>> responsiveness in the group starting an application, plus the
>>>>>>> frame drop in the group streaming the movie. In addition, we
>>>>>>> can measure the bandwidth/iops enjoyed by each group, plus, of
>>>>>>> course, the aggregate throughput of the whole system. In
>>>>>>> particular we could compare results with throttling, BFQ, and
>>>>>>> CFQ.
>>>>>>>
>>>>>>> Then we could write the resulting numbers on the stone, and
>>>>>>> stick to them until something proves them wrong.
>>>>>>>
>>>>>>> What do you (or others) think about it?
>>>>>>
>>>>>> That sounds great and yeah it's lame that we didn't start with
>>>>>> that. Shaohua, would it be difficult to compare how bfq
>>>>>> performs against blk-throttle?
>>>>>
>>>>> I had a test of BFQ.
>>>>
>>>> Thank you very much for testing BFQ!
>>>>
>>>>> I'm using BFQ found at
>>>>> http://algogroup.unimore.it/people/paolo/disk_sched/sources.php.
>>>>> version is 4.7.0-v8r3.
>>>>
>>>> That's the latest stable version. The development version [1]
>>>> already contains further improvements for fairness, latency and
>>>> throughput. It is however still a release candidate.
>>>>
>>>> [1] https://github.com/linusw/linux-bfq/tree/bfq-v8
>>>>
>>>>> It's a LSI SSD, queue depth 32. I use the default settings. fio
>>>>> script is:
>>>>>
>>>>> [global]
>>>>> ioengine=libaio
>>>>> direct=1
>>>>> readwrite=randread
>>>>> bs=4k
>>>>> runtime=60
>>>>> time_based=1
>>>>> file_service_type=random:36
>>>>> overwrite=1
>>>>> thread=0
>>>>> group_reporting=1
>>>>> filename=/dev/sdb
>>>>> iodepth=1
>>>>> numjobs=8
>>>>>
>>>>> [groupA]
>>>>> prio=2
>>>>>
>>>>> [groupB]
>>>>> new_group
>>>>> prio=6
>>>>>
>>>>> I'll change iodepth, numjobs and prio in different tests. Result
>>>>> unit is MB/s.
>>>>>
>>>>> iodepth=1 numjobs=1 prio 4:4
>>>>> CFQ: 28:28  BFQ: 21:21  deadline: 29:29
>>>>>
>>>>> iodepth=8 numjobs=1 prio 4:4
>>>>> CFQ: 162:162  BFQ: 102:98  deadline: 205:205
>>>>>
>>>>> iodepth=1 numjobs=8 prio 4:4
>>>>> CFQ: 157:157  BFQ: 81:92  deadline: 196:197
>>>>>
>>>>> iodepth=1 numjobs=1 prio 2:6
>>>>> CFQ: 26.7:27.6  BFQ: 20:6  deadline: 29:29
>>>>>
>>>>> iodepth=8 numjobs=1 prio 2:6
>>>>> CFQ: 166:174  BFQ: 139:72  deadline: 202:202
>>>>>
>>>>> iodepth=1 numjobs=8 prio 2:6
>>>>> CFQ: 148:150  BFQ: 90:77  deadline: 198:197
>>>>>
>>>>> CFQ isn't fair at all. BFQ is very good on this side, but has
>>>>> poor throughput even when prio is the default value.
>>>>
>>>> Throughput is lower with BFQ for two reasons.
>>>>
>>>> First, you certainly left low_latency in its default state, i.e.,
>>>> on. As explained, e.g., here [2], low_latency mode is totally
>>>> geared towards maximum responsiveness and minimum latency for
>>>> soft real-time applications (e.g., video players). To achieve
>>>> this goal, BFQ is willing to perform more idling, when necessary.
>>>> This lowers throughput (I'll get back on this at the end of the
>>>> discussion of the second reason).
>>>
>>> changing low_latency to 0 seems not to change anything, at least
>>> for the test:
>>> iodepth=1 numjobs=1 prio 2:6 A bs 4k:64k
>>
>>>> The second, most
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 06 ott 2016, alle ore 13:57, Austin S. Hemmelgarn ha
> scritto:
>
> On 2016-10-06 07:03, Mark Brown wrote:
>> On Thu, Oct 06, 2016 at 10:04:41AM +0200, Linus Walleij wrote:
>>> On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo wrote:
>>>> I get that bfq can be a good compromise on most desktop workloads
>>>> and behave reasonably well for some server workloads with the
>>>> slice expiration mechanism, but it really isn't an IO resource
>>>> partitioning mechanism.
>>>
>>> Not just desktops, also Android phones.
>>>
>>> So why not have BFQ as a separate scheduling policy upstream,
>>> alongside CFQ, deadline and noop?
>>
>> Right.
>>
>>> We're already doing the per-usecase Kconfig thing for preemption.
>>> But maybe somebody already hates that and wants to get rid of it,
>>> I don't know.
>>
>> Hannes also suggested going back to making BFQ a separate scheduler
>> rather than replacing CFQ earlier, pointing out that it mitigates
>> the risks of changing CFQ substantially at this point (which seems
>> to be the biggest issue here).
>
> ISTR that the original argument for this approach essentially
> amounted to: 'If it's so much better, why do we need both?'.
>
> Such an argument is valid only if the new design is better in all
> respects (which there isn't sufficient information to decide in this
> case), or the negative aspects are worth the improvements (which is
> too workload-specific to decide for something like this).

All correct, apart from the workload-specific issue, which is not very
clear to me. Over the last five years I have not found a single
workload for which CFQ is better than BFQ, and none has been
suggested.

Anyway, leaving aside this fact, IMO the real problem here is that we
are in a catch-22: "we want BFQ to replace CFQ, but, since CFQ is
legacy code, you cannot change, and thus replace, CFQ".

Thanks,
Paolo
Re: [PATCH V3 00/11] block-throttle: add .high limit
On Thu, Oct 06, 2016 at 10:04:41AM +0200, Linus Walleij wrote:
> On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo wrote:
>> I get that bfq can be a good compromise on most desktop workloads
>> and behave reasonably well for some server workloads with the slice
>> expiration mechanism, but it really isn't an IO resource
>> partitioning mechanism.

> Not just desktops, also Android phones.

> So why not have BFQ as a separate scheduling policy upstream,
> alongside CFQ, deadline and noop?

Right.

> We're already doing the per-usecase Kconfig thing for preemption.
> But maybe somebody already hates that and wants to get rid of it, I
> don't know.

Hannes also suggested going back to making BFQ a separate scheduler
rather than replacing CFQ earlier, pointing out that it mitigates the
risks of changing CFQ substantially at this point (which seems to be
the biggest issue here).
Re: [PATCH V3 00/11] block-throttle: add .high limit
On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo wrote:
> I get that bfq can be a good compromise on most desktop workloads and
> behave reasonably well for some server workloads with the slice
> expiration mechanism but it really isn't an IO resource partitioning
> mechanism.

Not just desktops, also Android phones.

So why not have BFQ as a separate scheduling policy upstream,
alongside CFQ, deadline and noop?

I understand the CPU scheduler people's position that they want one
scheduler for everyone's everyday loads (except RT and SCHED_DEADLINE),
and I guess that is the source of the highlander "there can be only
one" argument, but note this:

kernel/Kconfig.preempt:

config PREEMPT_NONE
        bool "No Forced Preemption (Server)"
config PREEMPT_VOLUNTARY
        bool "Voluntary Kernel Preemption (Desktop)"
config PREEMPT
        bool "Preemptible Kernel (Low-Latency Desktop)"

We're already doing the per-usecase Kconfig thing for preemption.
But maybe somebody already hates that and want to get rid of it,
I don't know.

Yours,
Linus Walleij
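Keeping BFQ alongside the existing schedulers, as Linus suggests, would amount to little more than one more entry in block/Kconfig.iosched next to the CFQ/deadline/noop ones. A sketch of what such an entry could look like (the option name, default, and help text are illustrative, modeled on the existing IOSCHED_CFQ entry, not a merged patch):

```
config IOSCHED_BFQ
	tristate "BFQ I/O scheduler"
	default y
	---help---
	  The BFQ I/O scheduler distributes bandwidth among all
	  processes according to their weights, aiming at low latency
	  and strong fairness guarantees, especially on rotational
	  disks and moderate-speed SSDs.

	  If unsure, say Y.
```

With such an option, the scheduler would remain selectable per device at runtime through /sys/block/<dev>/queue/scheduler, exactly as CFQ, deadline and noop are today.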
Re: [PATCH V3 00/11] block-throttle: add .high limit
On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote: > > > Il giorno 05 ott 2016, alle ore 20:30, Shaohua Liha scritto: > > > > On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: > >> Hello, Paolo. > >> > >> On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > >>> In this respect, for your generic, unpredictable scenario to make > >>> sense, there must exist at least one real system that meets the > >>> requirements of such a scenario. Or, if such a real system does not > >>> yet exist, it must be possible to emulate it. If it is impossible to > >>> achieve this last goal either, then I miss the usefulness > >>> of looking for solutions for such a scenario. > >>> > >>> That said, let's define the instance(s) of the scenario that you find > >>> most representative, and let's test BFQ on it/them. Numbers will give > >>> us the answers. For example, what about all or part of the following > >>> groups: > >>> . one cyclically doing random I/O for some second and then sequential I/O > >>> for the next seconds > >>> . one doing, say, quasi-sequential I/O in ON/OFF cycles > >>> . one starting an application cyclically > >>> . one playing back or streaming a movie > >>> > >>> For each group, we could then measure the time needed to complete each > >>> phase of I/O in each cycle, plus the responsiveness in the group > >>> starting an application, plus the frame drop in the group streaming > >>> the movie. In addition, we can measure the bandwidth/iops enjoyed by > >>> each group, plus, of course, the aggregate throughput of the whole > >>> system. In particular we could compare results with throttling, BFQ, > >>> and CFQ. > >>> > >>> Then we could write resulting numbers on the stone, and stick to them > >>> until something proves them wrong. > >>> > >>> What do you (or others) think about it? > >> > >> That sounds great and yeah it's lame that we didn't start with that. 
> >> Shaohua, would it be difficult to compare how bfq performs against > >> blk-throttle? > > > > I had a test of BFQ. > > Thank you very much for testing BFQ! > > > I'm using BFQ found at > > https://urldefense.proofpoint.com/v2/url?u=http-3A__algogroup.unimore.it_people_paolo_disk-5Fsched_sources.php=DQIFAg=5VD0RTtNlTh3ycd41b3MUw=i6WobKxbeG3slzHSIOxTVtYIJw7qjCE6S0spDTKL-J4=2pG8KEx5tRymExa_K0ddKH_YvhH3qvJxELBd1_lw0-w=FZKEAOu2sw95y9jZio2k012cQWoLzlBWDl0NiGPVW78= > > . version is > > 4.7.0-v8r3. > > That's the latest stable version. The development version [1] already > contains further improvements for fairness, latency and throughput. > It is however still a release candidate. > > [1] https://github.com/linusw/linux-bfq/tree/bfq-v8 > > > It's a LSI SSD, queue depth 32. I use default setting. fio script > > is: > > > > [global] > > ioengine=libaio > > direct=1 > > readwrite=randread > > bs=4k > > runtime=60 > > time_based=1 > > file_service_type=random:36 > > overwrite=1 > > thread=0 > > group_reporting=1 > > filename=/dev/sdb > > iodepth=1 > > numjobs=8 > > > > [groupA] > > prio=2 > > > > [groupB] > > new_group > > prio=6 > > > > I'll change iodepth, numjobs and prio in different tests. result unit is > > MB/s. > > > > iodepth=1 numjobs=1 prio 4:4 > > CFQ: 28:28 BFQ: 21:21 deadline: 29:29 > > > > iodepth=8 numjobs=1 prio 4:4 > > CFQ: 162:162 BFQ: 102:98 deadline: 205:205 > > > > iodepth=1 numjobs=8 prio 4:4 > > CFQ: 157:157 BFQ: 81:92 deadline: 196:197 > > > > iodepth=1 numjobs=1 prio 2:6 > > CFQ: 26.7:27.6 BFQ: 20:6 deadline: 29:29 > > > > iodepth=8 numjobs=1 prio 2:6 > > CFQ: 166:174 BFQ: 139:72 deadline: 202:202 > > > > iodepth=1 numjobs=8 prio 2:6 > > CFQ: 148:150 BFQ: 90:77 deadline: 198:197 > > > > CFQ isn't fair at all. BFQ is very good in this side, but has poor > > throughput > > even prio is the default value. > > > > Throughput is lower with BFQ for two reasons. > > First, you certainly left the low_latency in its default state, i.e., > on. 
> As explained, e.g., here [2], low_latency mode is totally geared > towards maximum responsiveness and minimum latency for soft real-time > applications (e.g., video players). To achieve this goal, BFQ is > willing to perform more idling, when necessary. This lowers > throughput (I'll get back on this at the end of the discussion of the > second reason). Changing low_latency to 0 doesn't seem to change anything, at least for the test: iodepth=1 numjobs=1 prio 2:6, group A bs=4k, group B bs=64k. > The second, most important reason, is that a minimum of idling is the > *only* way to achieve differentiated bandwidth distribution, as you > requested by setting different ioprios. I stress that this constraint > is not a technological accident, but an intrinsic, logical necessity. > The proof is simple, and if the following explanation is too boring or > confusing, I can show it to you with any trace of sync I/O. > > First, to provide differentiated service, you need per-process > scheduling, i.e., schedulers in which there is a separate
Re: [PATCH V3 00/11] block-throttle: add .high limit
On Wed, Oct 05, 2016 at 09:57:22PM +0200, Paolo Valente wrote: > > > Il giorno 05 ott 2016, alle ore 21:08, Shaohua Li ha scritto: > > > > On Wed, Oct 05, 2016 at 11:30:53AM -0700, Shaohua Li wrote: > >> On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: > >>> Hello, Paolo. > >>> > >>> On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > In this respect, for your generic, unpredictable scenario to make > sense, there must exist at least one real system that meets the > requirements of such a scenario. Or, if such a real system does not > yet exist, it must be possible to emulate it. If it is impossible to > achieve this last goal either, then I miss the usefulness > of looking for solutions for such a scenario. > > That said, let's define the instance(s) of the scenario that you find > most representative, and let's test BFQ on it/them. Numbers will give > us the answers. For example, what about all or part of the following > groups: > . one cyclically doing random I/O for some second and then sequential I/O > for the next seconds > . one doing, say, quasi-sequential I/O in ON/OFF cycles > . one starting an application cyclically > . one playing back or streaming a movie > > For each group, we could then measure the time needed to complete each > phase of I/O in each cycle, plus the responsiveness in the group > starting an application, plus the frame drop in the group streaming > the movie. In addition, we can measure the bandwidth/iops enjoyed by > each group, plus, of course, the aggregate throughput of the whole > system. In particular we could compare results with throttling, BFQ, > and CFQ. > > Then we could write resulting numbers on the stone, and stick to them > until something proves them wrong. > > What do you (or others) think about it? > >>> > >>> That sounds great and yeah it's lame that we didn't start with that. > >>> Shaohua, would it be difficult to compare how bfq performs against > >>> blk-throttle? 
> >> > >> I had a test of BFQ. I'm using BFQ found at > >> http://algogroup.unimore.it/people/paolo/disk_sched/sources.php > >> . version is > >> 4.7.0-v8r3. It's a LSI SSD, queue depth 32. I use default setting. fio > >> script > >> is: > >> > >> [global] > >> ioengine=libaio > >> direct=1 > >> readwrite=randread > >> bs=4k > >> runtime=60 > >> time_based=1 > >> file_service_type=random:36 > >> overwrite=1 > >> thread=0 > >> group_reporting=1 > >> filename=/dev/sdb > >> iodepth=1 > >> numjobs=8 > >> > >> [groupA] > >> prio=2 > >> > >> [groupB] > >> new_group > >> prio=6 > >> > >> I'll change iodepth, numjobs and prio in different tests. result unit is > >> MB/s. > >> > >> iodepth=1 numjobs=1 prio 4:4 > >> CFQ: 28:28 BFQ: 21:21 deadline: 29:29 > >> > >> iodepth=8 numjobs=1 prio 4:4 > >> CFQ: 162:162 BFQ: 102:98 deadline: 205:205 > >> > >> iodepth=1 numjobs=8 prio 4:4 > >> CFQ: 157:157 BFQ: 81:92 deadline: 196:197 > >> > >> iodepth=1 numjobs=1 prio 2:6 > >> CFQ: 26.7:27.6 BFQ: 20:6 deadline: 29:29 > >> > >> iodepth=8 numjobs=1 prio 2:6 > >> CFQ: 166:174 BFQ: 139:72 deadline: 202:202 > >> > >> iodepth=1 numjobs=8 prio 2:6 > >> CFQ: 148:150 BFQ: 90:77 deadline: 198:197 > > > > More tests: > > > > iodepth=8 numjobs=1 prio 2:6, group A has 50M/s limit > > CFQ:51:207 BFQ: 51:45 deadline: 51:216 > > > > iodepth=1 numjobs=1 prio 2:6, group A bs=4k, group B bs=64k > > CFQ:25:249 BFQ: 23:42 deadline: 26:251 > > > > A true proportional share scheduler like BFQ works under the > assumption to be the only limiter of the bandwidth of its clients. 
> And the availability of such a scheduler should apparently make > bandwidth limiting useless: once you have a mechanism that allows you > to give each group the desired fraction of the bandwidth, and to > redistribute excess bandwidth seamlessly when needed, what do you need > additional limiting for? > > But I'm not expert of any possible system configuration or > requirement. So, if you have practical examples, I would really > appreciate them. And I don't think it will be difficult to see what > goes wrong in BFQ with external bw limitation, and to fix the > problem. I think the test emulates a very common configuration: we assign more I/O resources to a high-priority workload, but that workload doesn't always dispatch enough I/O. That's why I set a rate limit. When this happens, we want the low-priority workload to use the disk bandwidth. That's the whole point of disk sharing. Thanks, Shaohua -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at
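[Editor's note] Shaohua's setup (a bandwidth cap on the high-priority group, with the leftover meant to flow to the low-priority group) corresponds to a cgroup2 io.max line such as "8:16 rbps=52428800" for a 50 MB/s read limit. Below is a small helper sketch, not from the thread; the device number 8:16 and the cgroup path are hypothetical, and actually writing io.max requires a mounted cgroup2 hierarchy and root privileges.

```python
from pathlib import Path


def io_max_line(dev, rbps=None, wbps=None):
    """Build a cgroup2 io.max line, e.g. '8:16 rbps=52428800'."""
    parts = [dev]
    if rbps is not None:
        parts.append(f"rbps={rbps}")
    if wbps is not None:
        parts.append(f"wbps={wbps}")
    return " ".join(parts)


def apply_limit(cgroup_dir, line):
    # Needs root and a mounted cgroup2 hierarchy; the path is hypothetical.
    Path(cgroup_dir, "io.max").write_text(line + "\n")


# 50 MB/s read cap on the (hypothetical) high-priority group:
print(io_max_line("8:16", rbps=50 * 1024 * 1024))  # 8:16 rbps=52428800
```

The open question in the thread is what the proportional-share scheduler should do when such a cap makes the capped group unable to use its full weight share.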
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 05 ott 2016, alle ore 21:47, Paolo Valente >ha scritto: > >> >> Il giorno 05 ott 2016, alle ore 20:30, Shaohua Li ha scritto: >> >> On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: >>> Hello, Paolo. >>> >>> On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: In this respect, for your generic, unpredictable scenario to make sense, there must exist at least one real system that meets the requirements of such a scenario. Or, if such a real system does not yet exist, it must be possible to emulate it. If it is impossible to achieve this last goal either, then I miss the usefulness of looking for solutions for such a scenario. That said, let's define the instance(s) of the scenario that you find most representative, and let's test BFQ on it/them. Numbers will give us the answers. For example, what about all or part of the following groups: . one cyclically doing random I/O for some second and then sequential I/O for the next seconds . one doing, say, quasi-sequential I/O in ON/OFF cycles . one starting an application cyclically . one playing back or streaming a movie For each group, we could then measure the time needed to complete each phase of I/O in each cycle, plus the responsiveness in the group starting an application, plus the frame drop in the group streaming the movie. In addition, we can measure the bandwidth/iops enjoyed by each group, plus, of course, the aggregate throughput of the whole system. In particular we could compare results with throttling, BFQ, and CFQ. Then we could write resulting numbers on the stone, and stick to them until something proves them wrong. What do you (or others) think about it? >>> >>> That sounds great and yeah it's lame that we didn't start with that. >>> Shaohua, would it be difficult to compare how bfq performs against >>> blk-throttle? >> >> I had a test of BFQ. > > Thank you very much for testing BFQ! 
> >> I'm using BFQ found at >> http://algogroup.unimore.it/people/paolo/disk_sched/sources.php. version is >> 4.7.0-v8r3. > > That's the latest stable version. The development version [1] already > contains further improvements for fairness, latency and throughput. > It is however still a release candidate. > > [1] https://github.com/linusw/linux-bfq/tree/bfq-v8 > >> It's a LSI SSD, queue depth 32. I use default setting. fio script >> is: >> >> [global] >> ioengine=libaio >> direct=1 >> readwrite=randread >> bs=4k >> runtime=60 >> time_based=1 >> file_service_type=random:36 >> overwrite=1 >> thread=0 >> group_reporting=1 >> filename=/dev/sdb >> iodepth=1 >> numjobs=8 >> >> [groupA] >> prio=2 >> >> [groupB] >> new_group >> prio=6 >> >> I'll change iodepth, numjobs and prio in different tests. result unit is >> MB/s. >> >> iodepth=1 numjobs=1 prio 4:4 >> CFQ: 28:28 BFQ: 21:21 deadline: 29:29 >> >> iodepth=8 numjobs=1 prio 4:4 >> CFQ: 162:162 BFQ: 102:98 deadline: 205:205 >> >> iodepth=1 numjobs=8 prio 4:4 >> CFQ: 157:157 BFQ: 81:92 deadline: 196:197 >> >> iodepth=1 numjobs=1 prio 2:6 >> CFQ: 26.7:27.6 BFQ: 20:6 deadline: 29:29 >> >> iodepth=8 numjobs=1 prio 2:6 >> CFQ: 166:174 BFQ: 139:72 deadline: 202:202 >> >> iodepth=1 numjobs=8 prio 2:6 >> CFQ: 148:150 BFQ: 90:77 deadline: 198:197 >> >> CFQ isn't fair at all. BFQ is very good in this side, but has poor throughput >> even prio is the default value. >> > > Throughput is lower with BFQ for two reasons. > > First, you certainly left the low_latency in its default state, i.e., > on. As explained, e.g., here [2], low_latency mode is totally geared > towards maximum responsiveness and minimum latency for soft real-time > applications (e.g., video players). To achieve this goal, BFQ is > willing to perform more idling, when necessary. This lowers > throughput (I'll get back on this at the end of the discussion of the > second reason). 
> > The second, most important reason, is that a minimum of idling is the > *only* way to achieve differentiated bandwidth distribution, as you > requested by setting different ioprios. I stress that this constraint > is not a technological accident, but an intrinsic, logical necessity. > The proof is simple, and if the following explanation is too boring or > confusing, I can show it to you with any trace of sync I/O. > > First, to provide differentiated service, you need per-process > scheduling, i.e., schedulers in which there is a separate queue > associated with each process. Now, let A be the process with higher > weight (ioprio), and B the process with lower weight. Both processes > are sync, thus, by definition, they issue requests as follows: a few > requests (probably two, or a little bit more with larger iodepth), > then a little break to wait for request completion, then the next > small batch and so on. For each
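[Editor's note] Paolo's proof sketch can be illustrated with a toy model (my own construction, not BFQ code; the service and think times are arbitrary). Two sync, queue-depth-1 processes A and B have weights 2:1. A work-conserving scheduler that switches queues as soon as the in-service queue empties ignores the weights; a scheduler that idles through a process's think time can enforce them, at a cost in aggregate throughput:

```python
def simulate(idling, weights={"A": 2, "B": 1}, rounds=1000, service=1.0, think=0.5):
    """Two sync processes: each issues its next request `think` time
    units after its previous request completes (queue depth 1)."""
    served = {"A": 0, "B": 0}
    t = 0.0
    for _ in range(rounds):
        for p in ("A", "B"):
            # With idling, p keeps the device for weights[p] requests and
            # the scheduler waits out p's think time between them.  Without
            # idling, the scheduler switches as soon as p's queue is empty,
            # so each process gets one request per turn and the weights
            # have no effect.
            budget = weights[p] if idling else 1
            for _ in range(budget):
                t += (think + service) if idling else service
                served[p] += 1
    return served, t


for idling in (False, True):
    served, t = simulate(idling)
    total = served["A"] + served["B"]
    print(f"idling={idling}: A share {served['A'] / total:.2f}, "
          f"throughput {total / t:.2f} req/unit")
```

Without idling both processes get a 0.50 share at full device throughput; with idling A gets its 2/3 weight share, but aggregate throughput drops because the device sits idle during think times. That is exactly the differentiation-vs-throughput tradeoff visible in Shaohua's numbers.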
Re: [PATCH V3 00/11] block-throttle: add .high limit
On Wed, Oct 05, 2016 at 11:30:53AM -0700, Shaohua Li wrote: > On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: > > Hello, Paolo. > > > > On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > > > In this respect, for your generic, unpredictable scenario to make > > > sense, there must exist at least one real system that meets the > > > requirements of such a scenario. Or, if such a real system does not > > > yet exist, it must be possible to emulate it. If it is impossible to > > > achieve this last goal either, then I miss the usefulness > > > of looking for solutions for such a scenario. > > > > > > That said, let's define the instance(s) of the scenario that you find > > > most representative, and let's test BFQ on it/them. Numbers will give > > > us the answers. For example, what about all or part of the following > > > groups: > > > . one cyclically doing random I/O for some second and then sequential I/O > > > for the next seconds > > > . one doing, say, quasi-sequential I/O in ON/OFF cycles > > > . one starting an application cyclically > > > . one playing back or streaming a movie > > > > > > For each group, we could then measure the time needed to complete each > > > phase of I/O in each cycle, plus the responsiveness in the group > > > starting an application, plus the frame drop in the group streaming > > > the movie. In addition, we can measure the bandwidth/iops enjoyed by > > > each group, plus, of course, the aggregate throughput of the whole > > > system. In particular we could compare results with throttling, BFQ, > > > and CFQ. > > > > > > Then we could write resulting numbers on the stone, and stick to them > > > until something proves them wrong. > > > > > > What do you (or others) think about it? > > > > That sounds great and yeah it's lame that we didn't start with that. > > Shaohua, would it be difficult to compare how bfq performs against > > blk-throttle? > > I had a test of BFQ. 
I'm using BFQ found at > http://algogroup.unimore.it/people/paolo/disk_sched/sources.php. version is > 4.7.0-v8r3. It's a LSI SSD, queue depth 32. I use default setting. fio script > is: > > [global] > ioengine=libaio > direct=1 > readwrite=randread > bs=4k > runtime=60 > time_based=1 > file_service_type=random:36 > overwrite=1 > thread=0 > group_reporting=1 > filename=/dev/sdb > iodepth=1 > numjobs=8 > > [groupA] > prio=2 > > [groupB] > new_group > prio=6 > > I'll change iodepth, numjobs and prio in different tests. result unit is MB/s. > > iodepth=1 numjobs=1 prio 4:4 > CFQ: 28:28 BFQ: 21:21 deadline: 29:29 > > iodepth=8 numjobs=1 prio 4:4 > CFQ: 162:162 BFQ: 102:98 deadline: 205:205 > > iodepth=1 numjobs=8 prio 4:4 > CFQ: 157:157 BFQ: 81:92 deadline: 196:197 > > iodepth=1 numjobs=1 prio 2:6 > CFQ: 26.7:27.6 BFQ: 20:6 deadline: 29:29 > > iodepth=8 numjobs=1 prio 2:6 > CFQ: 166:174 BFQ: 139:72 deadline: 202:202 > > iodepth=1 numjobs=8 prio 2:6 > CFQ: 148:150 BFQ: 90:77 deadline: 198:197 More tests: iodepth=8 numjobs=1 prio 2:6, group A has 50M/s limit CFQ:51:207 BFQ: 51:45 deadline: 51:216 iodepth=1 numjobs=1 prio 2:6, group A bs=4k, group B bs=64k CFQ:25:249 BFQ: 23:42 deadline: 26:251 Thanks, Shaohua
Re: [PATCH V3 00/11] block-throttle: add .high limit
On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: > Hello, Paolo. > > On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > > In this respect, for your generic, unpredictable scenario to make > > sense, there must exist at least one real system that meets the > > requirements of such a scenario. Or, if such a real system does not > > yet exist, it must be possible to emulate it. If it is impossible to > > achieve this last goal either, then I miss the usefulness > > of looking for solutions for such a scenario. > > > > That said, let's define the instance(s) of the scenario that you find > > most representative, and let's test BFQ on it/them. Numbers will give > > us the answers. For example, what about all or part of the following > > groups: > > . one cyclically doing random I/O for some second and then sequential I/O > > for the next seconds > > . one doing, say, quasi-sequential I/O in ON/OFF cycles > > . one starting an application cyclically > > . one playing back or streaming a movie > > > > For each group, we could then measure the time needed to complete each > > phase of I/O in each cycle, plus the responsiveness in the group > > starting an application, plus the frame drop in the group streaming > > the movie. In addition, we can measure the bandwidth/iops enjoyed by > > each group, plus, of course, the aggregate throughput of the whole > > system. In particular we could compare results with throttling, BFQ, > > and CFQ. > > > > Then we could write resulting numbers on the stone, and stick to them > > until something proves them wrong. > > > > What do you (or others) think about it? > > That sounds great and yeah it's lame that we didn't start with that. > Shaohua, would it be difficult to compare how bfq performs against > blk-throttle? I had a test of BFQ. I'm using BFQ found at http://algogroup.unimore.it/people/paolo/disk_sched/sources.php. version is 4.7.0-v8r3. It's a LSI SSD, queue depth 32. I use default setting. 
fio script is:

[global]
ioengine=libaio
direct=1
readwrite=randread
bs=4k
runtime=60
time_based=1
file_service_type=random:36
overwrite=1
thread=0
group_reporting=1
filename=/dev/sdb
iodepth=1
numjobs=8

[groupA]
prio=2

[groupB]
new_group
prio=6

I'll change iodepth, numjobs and prio in different tests. Result unit is MB/s.

iodepth=1 numjobs=1 prio 4:4
CFQ: 28:28 BFQ: 21:21 deadline: 29:29

iodepth=8 numjobs=1 prio 4:4
CFQ: 162:162 BFQ: 102:98 deadline: 205:205

iodepth=1 numjobs=8 prio 4:4
CFQ: 157:157 BFQ: 81:92 deadline: 196:197

iodepth=1 numjobs=1 prio 2:6
CFQ: 26.7:27.6 BFQ: 20:6 deadline: 29:29

iodepth=8 numjobs=1 prio 2:6
CFQ: 166:174 BFQ: 139:72 deadline: 202:202

iodepth=1 numjobs=8 prio 2:6
CFQ: 148:150 BFQ: 90:77 deadline: 198:197

CFQ isn't fair at all. BFQ is very good on this side, but has poor throughput even when prio is at the default value. Thanks, Shaohua
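[Editor's note] To put the fairness observation in numbers: with ioprios 2 and 6, a CFQ/BFQ-style mapping of ioprio to weight (here assumed to be weight = 8 - ioprio, an illustration only, not taken from the thread) would target roughly a 3:1 bandwidth split. A quick sanity check of the iodepth=8 numjobs=1 prio 2:6 row:

```python
# Measured MB/s for (groupA, groupB) at ioprio 2:6, iodepth=8 numjobs=1,
# copied from the table above.
results = {
    "CFQ":      (166, 174),
    "BFQ":      (139, 72),
    "deadline": (202, 202),
}

# Assumed ioprio -> weight mapping (illustrative): weight = 8 - ioprio,
# so prio 2 vs prio 6 targets a (8-2)/(8-6) = 3:1 split.
target = (8 - 2) / (8 - 6)

for sched, (a, b) in results.items():
    print(f"{sched:8s} A:B = {a / b:.2f} (target {target:.1f}:1)")
```

BFQ's measured ratio (about 1.93:1) is the only one that moves toward the 3:1 target; CFQ and deadline split the bandwidth almost evenly despite the priority difference, which is the "CFQ isn't fair at all" point above.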
Re: [PATCH V3 00/11] block-throttle: add .high limit
Hello, Paolo. On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > In this respect, for your generic, unpredictable scenario to make > sense, there must exist at least one real system that meets the > requirements of such a scenario. Or, if such a real system does not > yet exist, it must be possible to emulate it. If it is impossible to > achieve this last goal either, then I miss the usefulness > of looking for solutions for such a scenario. > > That said, let's define the instance(s) of the scenario that you find > most representative, and let's test BFQ on it/them. Numbers will give > us the answers. For example, what about all or part of the following > groups: > . one cyclically doing random I/O for some second and then sequential I/O > for the next seconds > . one doing, say, quasi-sequential I/O in ON/OFF cycles > . one starting an application cyclically > . one playing back or streaming a movie > > For each group, we could then measure the time needed to complete each > phase of I/O in each cycle, plus the responsiveness in the group > starting an application, plus the frame drop in the group streaming > the movie. In addition, we can measure the bandwidth/iops enjoyed by > each group, plus, of course, the aggregate throughput of the whole > system. In particular we could compare results with throttling, BFQ, > and CFQ. > > Then we could write resulting numbers on the stone, and stick to them > until something proves them wrong. > > What do you (or others) think about it? That sounds great and yeah it's lame that we didn't start with that. Shaohua, would it be difficult to compare how bfq performs against blk-throttle? Thanks. -- tejun
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 05 ott 2016, alle ore 15:12, Vivek Goyal ha > scritto: > > On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > > [..] >> Anyway, to avoid going on with trying speculations and arguments, let >> me retry with a practical proposal. BFQ is out there, free. Let's >> just test, measure and check whether we have already a solution to >> the problems you/we are still trying to solve in Linux. > > Hi Paolo, > > Does the BFQ implementation scale for fast storage devices using the blk-mq > interface? We will want to make sure that locking and other overhead of > BFQ is very minimal so that overall throughput does not suffer. > Of course BFQ needs to be modified to work in blk-mq. I'm rather sure its overhead will then be small enough, just because I have already collaborated to a basically equivalent port from single to multi-queue for packet scheduling (with Luigi Rizzo and others), and our prototype can make over 15 million scheduling decisions per second, and keep latency low, even with tens of concurrent clients running on a multi-core, multi-socket system. For details, here is the paper [1], plus some slides [2]. Actually, the solution in [1] is a global scheduler, which is more complex than the first blk-mq version of BFQ that I have in mind, namely, partitioned scheduling, in which there should be one independent scheduler instance per core. But this is still investigation territory. BTW, I would really appreciate help/feedback on this task [3]. 
Thanks, Paolo [1] http://info.iet.unipi.it/~luigi/papers/20160921-pspat.pdf [2] http://info.iet.unipi.it/~luigi/pspat/ [3] https://marc.info/?l=linux-kernel&m=147066540916339&w=2 > Vivek > -- Paolo Valente Algogroup Dipartimento di Scienze Fisiche, Informatiche e Matematiche Via Campi 213/B 41125 Modena - Italy http://algogroup.unimore.it/people/paolo/
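[Editor's note] As a rough illustration of the partitioned design Paolo mentions (one independent scheduler instance per core, so submitting cores never contend on a single shared lock), here is a minimal toy sketch; it is my own illustration, not PSPAT or BFQ code, and the instance count is arbitrary:

```python
import threading


class PartitionedScheduler:
    """One queue + lock per instance (e.g. per core).  Submissions from
    different cores hash to different instances, so they never contend
    on a shared lock -- the point of the partitioned design, as opposed
    to a single global scheduler protected by one lock."""

    def __init__(self, n=4):  # n instances, e.g. one per core (hypothetical)
        self.queues = [[] for _ in range(n)]
        self.locks = [threading.Lock() for _ in range(n)]

    def submit(self, cpu, request):
        i = cpu % len(self.queues)
        with self.locks[i]:
            self.queues[i].append(request)

    def dispatch(self, cpu):
        i = cpu % len(self.queues)
        with self.locks[i]:
            return self.queues[i].pop(0) if self.queues[i] else None
```

The price of partitioning, as the thread notes, is that fairness then holds only per instance, not globally across cores; that is the "investigation territory" part.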
Re: [PATCH V3 00/11] block-throttle: add .high limit
On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: [..] > Anyway, to avoid going on with trying speculations and arguments, let > me retry with a practical proposal. BFQ is out there, free. Let's > just test, measure and check whether we have already a solution to > the problems you/we are still trying to solve in Linux. Hi Paolo, Does the BFQ implementation scale for fast storage devices using the blk-mq interface? We will want to make sure that the locking and other overhead of BFQ is very minimal so that overall throughput does not suffer. Vivek
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 04 ott 2016, alle ore 22:27, Tejun Heo ha scritto: > > Hello, Paolo. > > On Tue, Oct 04, 2016 at 09:29:48PM +0200, Paolo Valente wrote: >>> Hmm... I think we already discussed this but here's a really simple >>> case. There are three unknown workloads A, B and C and we want to >>> give A certain best-effort guarantees (let's say around 80% of the >>> underlying device) whether A is sharing the device with B or C. >> >> That's the same example that you proposed me in our previous >> discussion. For this example I showed you, with many boring numbers, >> that with BFQ you get the most accurate distribution of the resource. > > Yes, it is about the same example and what I understood was that > "accurate distribution of the resources" holds as long as the > randomness is incidental (ie. due to layout on the filesystem and so > on) with the slice expiration mechanism offsetting the actually random > workloads. > For completeness, this property holds whatever the workload is, especially even if it changes. >> If you have enough stamina, I can repeat them again. To save your > > I'll go back to the thread and re-read them. > Maybe we can make this less boring, see the end of this email. >> patience, here is a very brief summary. In a concrete use case, the >> unknown workloads turn into something like this: there will be a first >> time interval during which A happens to be, say, sequential, B happens >> to be, say, random and C happens to be, say, quasi-sequential. Then >> there will be a next time interval during which their characteristics >> change, and so on. It is easy (but boring, I acknowledge it) to show >> that, for each of these time intervals BFQ provides the best possible >> service in terms of fairness, bandwidth distribution, stability and so >> on. Why? Because of the elastic bandwidth-time scheduling of BFQ >> that we already discussed, and because BFQ is naturally accurate in >> redistributing aggregate throughput proportionally, when needed. 
> > Yeah, that's what I remember and for workload above certain level of > randomness its time consumption is mapped to bw, right? > Exactly. >>> I get that bfq can be a good compromise on most desktop workloads and >>> behave reasonably well for some server workloads with the slice >>> expiration mechanism but it really isn't an IO resource partitioning >>> mechanism. >> >> Right. My argument is that BFQ enables you to give to each client the >> bandwidth and low-latency guarantees you want. And this IMO is way >> better than partitioning a resource and then getting unavoidable >> unfairness and high latency. > > But that statement only holds while bw is the main thing to guarantee, > no? The level of isolation that we're looking for here is fairly > strict adherence to sub/few-milliseconds in terms of high percentile > scheduling latency while within the configured bw/iops limits, not > "overall this device is being used pretty well". > Guaranteeing such a short-term latency, while guaranteeing not just bw limits, but also proportional share distribution of the bw, is the reason why we have devised BFQ years ago. Anyway, to avoid going on with trying speculations and arguments, let me retry with a practical proposal. BFQ is out there, free. Let's just test, measure and check whether we have already a solution to the problems you/we are still trying to solve in Linux. In this respect, for your generic, unpredictable scenario to make sense, there must exist at least one real system that meets the requirements of such a scenario. Or, if such a real system does not yet exist, it must be possible to emulate it. If it is impossible to achieve this last goal either, then I miss the usefulness of looking for solutions for such a scenario. That said, let's define the instance(s) of the scenario that you find most representative, and let's test BFQ on it/them. Numbers will give us the answers. For example, what about all or part of the following groups: . 
one cyclically doing random I/O for some seconds and then sequential I/O for the next seconds . one doing, say, quasi-sequential I/O in ON/OFF cycles . one starting an application cyclically . one playing back or streaming a movie For each group, we could then measure the time needed to complete each phase of I/O in each cycle, plus the responsiveness in the group starting an application, plus the frame drop in the group streaming the movie. In addition, we can measure the bandwidth/iops enjoyed by each group, plus, of course, the aggregate throughput of the whole system. In particular we could compare results with throttling, BFQ, and CFQ. Then we could write resulting numbers on the stone, and stick to them until something proves them wrong. What do you (or others) think about it? Thanks, Paolo > Thanks. > > -- > tejun
Re: [PATCH V3 00/11] block-throttle: add .high limit
Hello, Paolo. On Tue, Oct 04, 2016 at 09:29:48PM +0200, Paolo Valente wrote: > > Hmm... I think we already discussed this but here's a really simple > > case. There are three unknown workloads A, B and C and we want to > > give A certain best-effort guarantees (let's say around 80% of the > > underlying device) whether A is sharing the device with B or C. > > That's the same example that you proposed me in our previous > discussion. For this example I showed you, with many boring numbers, > that with BFQ you get the most accurate distribution of the resource. Yes, it is about the same example and what I understood was that "accurate distribution of the resources" holds as long as the randomness is incidental (ie. due to layout on the filesystem and so on) with the slice expiration mechanism offsetting the actually random workloads. > If you have enough stamina, I can repeat them again. To save your I'll go back to the thread and re-read them. > patience, here is a very brief summary. In a concrete use case, the > unknown workloads turn into something like this: there will be a first > time interval during which A happens to be, say, sequential, B happens > to be, say, random and C happens to be, say, quasi-sequential. Then > there will be a next time interval during which their characteristics > change, and so on. It is easy (but boring, I acknowledge it) to show > that, for each of these time intervals BFQ provides the best possible > service in terms of fairness, bandwidth distribution, stability and so > on. Why? Because of the elastic bandwidth-time scheduling of BFQ > that we already discussed, and because BFQ is naturally accurate in > redistributing aggregate throughput proportionally, when needed. Yeah, that's what I remember and for workload above certain level of randomness its time consumption is mapped to bw, right? 
> > I get that bfq can be a good compromise on most desktop workloads and > > behave reasonably well for some server workloads with the slice > > expiration mechanism but it really isn't an IO resource partitioning > > mechanism. > > Right. My argument is that BFQ enables you to give to each client the > bandwidth and low-latency guarantees you want. And this IMO is way > better than partitioning a resource and then getting unavoidable > unfairness and high latency. But that statement only holds while bw is the main thing to guarantee, no? The level of isolation that we're looking for here is fairly strict adherence to sub/few-milliseconds in terms of high percentile scheduling latency while within the configured bw/iops limits, not "overall this device is being used pretty well". Thanks. -- tejun
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 04 ott 2016, alle ore 21:14, Tejun Heo ha scritto: > > Hello, Paolo. > > On Tue, Oct 04, 2016 at 09:02:47PM +0200, Paolo Valente wrote: >> That's exactly what BFQ has succeeded in doing in all the tests >> devised so far. Can you give me a concrete example for which I can >> try with BFQ and with any other mechanism you deem better. If >> you are right, numbers will just make your point. > > Hmm... I think we already discussed this but here's a really simple > case. There are three unknown workloads A, B and C and we want to > give A certain best-effort guarantees (let's say around 80% of the > underlying device) whether A is sharing the device with B or C. > That's the same example that you proposed me in our previous discussion. For this example I showed you, with many boring numbers, that with BFQ you get the most accurate distribution of the resource. If you have enough stamina, I can repeat them again. To save your patience, here is a very brief summary. In a concrete use case, the unknown workloads turn into something like this: there will be a first time interval during which A happens to be, say, sequential, B happens to be, say, random and C happens to be, say, quasi-sequential. Then there will be a next time interval during which their characteristics change, and so on. It is easy (but boring, I acknowledge it) to show that, for each of these time intervals BFQ provides the best possible service in terms of fairness, bandwidth distribution, stability and so on. Why? Because of the elastic bandwidth-time scheduling of BFQ that we already discussed, and because BFQ is naturally accurate in redistributing aggregate throughput proportionally, when needed. > I get that bfq can be a good compromise on most desktop workloads and > behave reasonably well for some server workloads with the slice > expiration mechanism but it really isn't an IO resource partitioning > mechanism. > Right. 
My argument is that BFQ enables you to give to each client the bandwidth and low-latency guarantees you want. And this IMO is way better than partitioning a resource and then getting unavoidable unfairness and high latency. Thanks, Paolo > Thanks. > > -- > tejun -- Paolo Valente Algogroup Dipartimento di Scienze Fisiche, Informatiche e Matematiche Via Campi 213/B 41125 Modena - Italy http://algogroup.unimore.it/people/paolo/
Re: [PATCH V3 00/11] block-throttle: add .high limit
Hello, Paolo.

On Tue, Oct 04, 2016 at 09:02:47PM +0200, Paolo Valente wrote:
> That's exactly what BFQ has succeeded in doing in all the tests
> devised so far. Can you give me a concrete example for which I can
> try with BFQ and with any other mechanism you deem better. If
> you are right, numbers will just make your point.

Hmm... I think we already discussed this but here's a really simple case. There are three unknown workloads A, B and C and we want to give A certain best-effort guarantees (let's say around 80% of the underlying device) whether A is sharing the device with B or C.

I get that bfq can be a good compromise on most desktop workloads and behave reasonably well for some server workloads with the slice expiration mechanism but it really isn't an IO resource partitioning mechanism.

Thanks.

--
tejun
Re: [PATCH V3 00/11] block-throttle: add .high limit
> Il giorno 04 ott 2016, alle ore 17:56, Tejun Heo ha scritto:
>
> Hello, Vivek.
>
> On Tue, Oct 04, 2016 at 09:28:05AM -0400, Vivek Goyal wrote:
>> On Mon, Oct 03, 2016 at 02:20:19PM -0700, Shaohua Li wrote:
>>> Hi,
>>>
>>> The background is we don't have an ioscheduler for blk-mq yet, so we can't
>>> prioritize processes/cgroups.
>>
>> So this is an interim solution till we have ioscheduler for blk-mq?
>
> It's a common permanent solution which applies to both !mq and mq.
>
>>> This patch set tries to add basic arbitration
>>> between cgroups with blk-throttle. It adds a new limit io.high for
>>> blk-throttle. It's only for cgroup2.
>>>
>>> io.max is a hard limit throttling. cgroups with a max limit never dispatch
>>> more IO than their max limit. While io.high is a best effort throttling.
>>> cgroups with high limit can run above their high limit at appropriate time.
>>> Specifically, if all cgroups reach their high limit, all cgroups can run
>>> above their high limit. If any cgroup runs under its high limit, all other
>>> cgroups will run according to their high limit.
>>
>> Hi Shaohua,
>>
>> I still don't understand why we should not implement a weight based
>> proportional IO mechanism and how this mechanism is better than
>> proportional IO.
>
> Oh, if we actually can implement proportional IO control, it'd be
> great. The problem is that we have no way of knowing IO cost for
> highspeed ssd devices. CFQ gets around the problem by using the
> walltime as the measure of resource usage and scheduling time slices,
> which works fine for rotating disks but horribly for highspeed ssds.

Could you please elaborate more on this point?
BFQ uses sectors served to measure service, and, on all the fast devices on which we have tested it, it accurately distributes bandwidth as desired, redistributes excess bandwidth without any issue, and guarantees high responsiveness and low latency at application and system level (e.g., ~0 drop rate in video playback, with any background workload tested).

Could you please suggest some test to show how sector-based guarantees fail?

Thanks,
Paolo

> We can get some semblance of proportional control by just counting bw
> or iops but both break down badly as a means to measure the actual
> resource consumption depending on the workload. While limit based
> control is more tedious to configure, it doesn't misrepresent what's
> going on and is a lot less likely to produce surprising outcomes.
>
> We *can* try to concoct something which tries to do proportional
> control for highspeed ssds but that's gonna be quite a bit of
> complexity and I'm not so sure it'd be justifiable given that we can't
> even figure out measurement of the most basic operating unit.
>
>> Agreed that we have issues with proportional IO and we don't have good
>> solutions for these problems. But I can't see how this mechanism
>> will overcome these problems either.
>
> It mostly defers the burden to the one who's configuring the limits
> and expects it to know the characteristics of the device and workloads
> and configure accordingly. It's quite a bit more tedious to use but
> should be able to cover good portion of use cases without being overly
> complicated. I agree that it'd be nice to have a simple proportional
> control but as you said can't see a good solution for it at the
> moment.
>
>> IIRC, biggest issue with proportional IO was that a low prio group might
>> fill up the device queue with plenty of IO requests and later when high
>> prio cgroup comes, it will still experience latencies anyway.
>> And solution
>> to the problem probably would be to get some awareness in device about
>> priority of request and map weights to those priority. That way higher
>> prio requests get prioritized.
>
> Nah, the real problem is that we can't even decide what the
> proportions should be based on. The most fundamental part is missing.
>
>> Or run device at lower queue depth. That will improve latencies but might
>> reduce overall throughput.
>
> And that we can't do this (and thus basically operate close to
> scheduling time slices) for highspeed ssds.
>
>> Or throttle number of buffered writes (as Jens's writeback throttling
>> patches were doing). Buffered writes seem to be biggest culprit for
>> increased latencies and being able to control these should help.
>
> That's a different topic.
>
>> ioprio/weight based proportional IO mechanism is much more generic and
>> much easier to configure for any kind of storage. io.high is absolute
>> limit and makes it much harder to configure. One needs to know a lot
>> about underlying volume/device's bandwidth (which varies a lot anyway
>> based on workload).
>
> Yeap, no disagreement there, but it still is a workable solution.
>
>> IMHO, we seem to be trying to cater to one specific use case using
>> this mechanism. Something ioprio/weight based will be much more
>> generic and we should explore implementing that along with building
>> notion of ioprio in devices.
Re: [PATCH V3 00/11] block-throttle: add .high limit
Hello, Vivek.

On Tue, Oct 04, 2016 at 09:28:05AM -0400, Vivek Goyal wrote:
> On Mon, Oct 03, 2016 at 02:20:19PM -0700, Shaohua Li wrote:
>> Hi,
>>
>> The background is we don't have an ioscheduler for blk-mq yet, so we can't
>> prioritize processes/cgroups.
>
> So this is an interim solution till we have ioscheduler for blk-mq?

It's a common permanent solution which applies to both !mq and mq.

>> This patch set tries to add basic arbitration
>> between cgroups with blk-throttle. It adds a new limit io.high for
>> blk-throttle. It's only for cgroup2.
>>
>> io.max is a hard limit throttling. cgroups with a max limit never dispatch
>> more IO than their max limit. While io.high is a best effort throttling.
>> cgroups with high limit can run above their high limit at appropriate time.
>> Specifically, if all cgroups reach their high limit, all cgroups can run
>> above their high limit. If any cgroup runs under its high limit, all other
>> cgroups will run according to their high limit.
>
> Hi Shaohua,
>
> I still don't understand why we should not implement a weight based
> proportional IO mechanism and how this mechanism is better than
> proportional IO.

Oh, if we actually can implement proportional IO control, it'd be great. The problem is that we have no way of knowing IO cost for highspeed ssd devices. CFQ gets around the problem by using the walltime as the measure of resource usage and scheduling time slices, which works fine for rotating disks but horribly for highspeed ssds.

We can get some semblance of proportional control by just counting bw or iops but both break down badly as a means to measure the actual resource consumption depending on the workload. While limit based control is more tedious to configure, it doesn't misrepresent what's going on and is a lot less likely to produce surprising outcomes.
We *can* try to concoct something which tries to do proportional control for highspeed ssds but that's gonna be quite a bit of complexity and I'm not so sure it'd be justifiable given that we can't even figure out measurement of the most basic operating unit.

> Agreed that we have issues with proportional IO and we don't have good
> solutions for these problems. But I can't see how this mechanism
> will overcome these problems either.

It mostly defers the burden to the one who's configuring the limits and expects it to know the characteristics of the device and workloads and configure accordingly. It's quite a bit more tedious to use but should be able to cover good portion of use cases without being overly complicated. I agree that it'd be nice to have a simple proportional control but as you said can't see a good solution for it at the moment.

> IIRC, biggest issue with proportional IO was that a low prio group might
> fill up the device queue with plenty of IO requests and later when high
> prio cgroup comes, it will still experience latencies anyway. And solution
> to the problem probably would be to get some awareness in device about
> priority of request and map weights to those priority. That way higher
> prio requests get prioritized.

Nah, the real problem is that we can't even decide what the proportions should be based on. The most fundamental part is missing.

> Or run device at lower queue depth. That will improve latencies but might
> reduce overall throughput.

And that we can't do this (and thus basically operate close to scheduling time slices) for highspeed ssds.

> Or throttle number of buffered writes (as Jens's writeback throttling
> patches were doing). Buffered writes seem to be biggest culprit for
> increased latencies and being able to control these should help.

That's a different topic.

> ioprio/weight based proportional IO mechanism is much more generic and
> much easier to configure for any kind of storage.
> io.high is absolute
> limit and makes it much harder to configure. One needs to know a lot
> about underlying volume/device's bandwidth (which varies a lot anyway
> based on workload).

Yeap, no disagreement there, but it still is a workable solution.

> IMHO, we seem to be trying to cater to one specific use case using
> this mechanism. Something ioprio/weight based will be much more
> generic and we should explore implementing that along with building
> notion of ioprio in devices. When these two work together, we might
> be able to see good results. Just software mechanism alone might not
> be enough.

I don't think it's catering to specific use cases. It is a generic mechanism which demands knowledge and experimentation to configure. It's more a way for the kernel to cop out and defer figuring out device characteristics to userland. If you have a better idea, I'm all ears.

Thanks.

--
tejun
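The configuration burden discussed above (knowing the device's bandwidth and writing absolute limits per device) can be sketched as follows. This is illustrative only, not code from the patchset: the "MAJ:MIN key=value ..." line format is the documented cgroup2 io.max interface, while the io.high file added by this series is assumed here to accept the same format; the cgroup path, device numbers and byte limits are made-up examples.

```python
# Illustrative sketch only: the "MAJ:MIN key=value ..." line format below is
# the documented cgroup2 io.max interface; the io.high file added by this
# series is ASSUMED to accept the same format.  The cgroup path, device
# numbers and byte limits are made-up examples.

def format_limit(dev, **kv):
    """Build one 'MAJ:MIN key=value ...' line for io.max / io.high."""
    return dev + " " + " ".join(f"{k}={v}" for k, v in sorted(kv.items()))

def apply_limit(cgroup_path, fname, line):
    """Write a limit line; needs a mounted cgroup2 hierarchy and privileges."""
    with open(f"{cgroup_path}/{fname}", "w") as f:
        f.write(line + "\n")

# Hard ceiling: reads on device 8:0 capped at 100 MB/s, always.
hard = format_limit("8:0", rbps=100 * 1024 * 1024)
# Best-effort target: 60 MB/s, exceedable only when every cgroup is at its high.
soft = format_limit("8:0", rbps=60 * 1024 * 1024)
print(hard)  # → 8:0 rbps=104857600
# apply_limit("/sys/fs/cgroup/workload", "io.max", hard)   # requires root
# apply_limit("/sys/fs/cgroup/workload", "io.high", soft)  # hypothetical file
```

The byte values are exactly the numbers the administrator must know in advance, which is Vivek's objection: they depend on the device and on the workload mix actually running.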
Re: [PATCH V3 00/11] block-throttle: add .high limit
On Mon, Oct 03, 2016 at 02:20:19PM -0700, Shaohua Li wrote:
> Hi,
>
> The background is we don't have an ioscheduler for blk-mq yet, so we can't
> prioritize processes/cgroups.

So this is an interim solution till we have ioscheduler for blk-mq?

> This patch set tries to add basic arbitration
> between cgroups with blk-throttle. It adds a new limit io.high for
> blk-throttle. It's only for cgroup2.
>
> io.max is a hard limit throttling. cgroups with a max limit never dispatch
> more IO than their max limit. While io.high is a best effort throttling.
> cgroups with high limit can run above their high limit at appropriate time.
> Specifically, if all cgroups reach their high limit, all cgroups can run
> above their high limit. If any cgroup runs under its high limit, all other
> cgroups will run according to their high limit.

Hi Shaohua,

I still don't understand why we should not implement a weight based proportional IO mechanism and how this mechanism is better than proportional IO.

Agreed that we have issues with proportional IO and we don't have good solutions for these problems. But I can't see how this mechanism will overcome these problems either.

IIRC, biggest issue with proportional IO was that a low prio group might fill up the device queue with plenty of IO requests and later when high prio cgroup comes, it will still experience latencies anyway. And solution to the problem probably would be to get some awareness in device about priority of request and map weights to those priority. That way higher prio requests get prioritized.

Or run device at lower queue depth. That will improve latencies but might reduce overall throughput.

Or throttle number of buffered writes (as Jens's writeback throttling patches were doing). Buffered writes seem to be biggest culprit for increased latencies and being able to control these should help.

ioprio/weight based proportional IO mechanism is much more generic and much easier to configure for any kind of storage.
io.high is absolute limit and makes it much harder to configure. One needs to know a lot about underlying volume/device's bandwidth (which varies a lot anyway based on workload).

IMHO, we seem to be trying to cater to one specific use case using this mechanism. Something ioprio/weight based will be much more generic and we should explore implementing that along with building notion of ioprio in devices. When these two work together, we might be able to see good results. Just software mechanism alone might not be enough.

Vivek
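The io.high semantics quoted from the cover letter ("if all cgroups reach their high limit, all cgroups can run above their high limit; if any cgroup runs under its high limit, all other cgroups will run according to their high limit") can be summarized as a toy decision function. This is my reading of that description, not the blk-throttle patch code; names and numbers are illustrative.

```python
# Toy model of the io.high arbitration described in the cover letter
# (my reading of the text, not the actual blk-throttle code).
# rates/highs hold per-cgroup bandwidth in arbitrary units.

def may_exceed_high(rates, highs):
    """A cgroup may run above its high limit only when every cgroup
    has reached its own high limit."""
    return all(rates[c] >= highs[c] for c in highs)

def allowed_rate(cg, demand, rates, highs, max_limit=float("inf")):
    """Best-effort cap: hold cg to its high unless everyone is at high;
    io.max would remain a hard ceiling either way."""
    cap = max_limit if may_exceed_high(rates, highs) else highs[cg]
    return min(demand, cap)

highs = {"A": 60, "B": 60}
# B is under its high limit, so A is held to its high:
print(allowed_rate("A", demand=100, rates={"A": 60, "B": 30}, highs=highs))  # → 60
# Both have reached their high, so A may exceed it:
print(allowed_rate("A", demand=100, rates={"A": 60, "B": 60}, highs=highs))  # → 100
```

Note how the outcome depends entirely on the absolute values chosen for the high limits, which is the configuration burden Vivek's message objects to.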