Re: Renice X for cpu schedulers
On Tue, Apr 24, 2007 at 08:50:20AM -0700, Ray Lee wrote:
> > Firstly, lots of clients in your list are remote. X usually isn't.
>
> They really aren't, unless you happen to work somewhere that can afford
> to dedicate a box to a db, which suddenly makes the scheduler a dull
> topic.
>
> For example, I have a db and web server installed on my laptop, so
> that the few times that I have to do web app programming (while wearing
> a mustache and glasses so that I don't have to admit to it in polite
> company), I can be functional with just one computer.

Indeed. The vast majority of people doing "LAMP" web services are doing
it on a single machine. Or VM, for that matter.

It seems that this is a lot like the priority inheritance problem. If a
nice -19 process blocks on the db running at nice 0, the db ought to get
a boost until it wakes the original process up. The same should apply at
the level of dynamic priorities at the same nice level.

-- 
Mathematics is the supreme nostalgia of our time.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
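[Editor's note: the boost-until-wakeup idea above can be sketched as a toy
model. All names here are illustrative, not any kernel interface.]

```python
# Toy model of priority inheritance across a blocking IPC dependency.
# Lower nice value = higher priority, as in Linux.

class Task:
    def __init__(self, name, nice):
        self.name = name
        self.nice = nice            # static priority set by the user
        self.inherited = None       # boosted priority, if any

    def effective_nice(self):
        # Run with the best (lowest) of our own and any inherited nice.
        if self.inherited is None:
            return self.nice
        return min(self.nice, self.inherited)

def block_on(client, server):
    # Client blocks waiting on the server: the server inherits the
    # client's effective priority until it wakes the client up.
    server.inherited = client.effective_nice()

def wake_up(client, server):
    # Server has produced the reply: the boost is dropped.
    server.inherited = None

client = Task("render", nice=-19)
db = Task("db", nice=0)

block_on(client, db)
assert db.effective_nice() == -19   # db runs boosted while client waits
wake_up(client, db)
assert db.effective_nice() == 0     # back to its own nice level
```

The "dynamic priorities at the same nice level" case would work the same
way, only with the scheduler's internal priority in place of `nice`.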
Re: Renice X for cpu schedulers
Nick Piggin wrote:
> On Thu, Apr 19, 2007 at 12:26:03PM -0700, Ray Lee wrote:
>> On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
>>> The one fly in the ointment for linux remains X. I am still, to this
>>> moment, completely and utterly stunned at why everyone is trying to
>>> find increasingly complex unique ways to manage X when all it needs
>>> is more cpu[1].
>> [...and hence should be reniced]
>>
>> The problem is that X is not unique. There's postgresql, memcached,
>> mysql, db2, a little embedded app I wrote... all of these perform work
>> on behalf of another process. It's just most *noticeable* with X, as
>> pretty much everyone is running that.
>
> But for most of those apps, we don't actually care if they do fairly
> degrade in performance as other loads on the system ramp up.

(Who's this 'we', kemosabe? I do. Desktop systems are increasingly using
databases for their day-to-day tasks. As they should; a db is not
something that should be reinvented poorly.)

> However the user prefers X to be given priority in these situations.
> Whether that is the design of X, X clients, or the human condition
> really doesn't matter two hoots to the scheduler.

Hmm, let's try this again. Anything that communicates out of process as
part of its normal usage for Getting Work Done gets impacted by the
scheduler. That means pipelines in the shell, d-bus on the desktop, and
lots of other things that follow the unix philosophy of lots of little
programs communicating.

>> If we had some way for the scheduler to decide to donate part of a
>> client process's time slice to the server it just spoke to (with an
>> exponential dampening factor -- take 50% from the client, give 25% to
>> the server, toss the rest on the floor), that -- from my naive point
>> of view -- would be a step toward fixing the underlying issue. Or I
>> might be spouting crap, who knows.
>
> Firstly, lots of clients in your list are remote. X usually isn't.
They really aren't, unless you happen to work somewhere that can afford
to dedicate a box to a db, which suddenly makes the scheduler a dull
topic.

For example, I have a db and web server installed on my laptop, so that
the few times that I have to do web app programming (while wearing a
mustache and glasses so that I don't have to admit to it in polite
company), I can be functional with just one computer.

> However for X, a syscall or something to donate time might not be
> such a bad idea...

We have one already; it's called write(). We have another called read(),
too. Okay, so they have some data-related side effects other than the
scheduler hints, but I claim the scheduler hint is already implicitly
there.

> but given a couple of X clients and a server against a parallel make,
> this is probably just going to make the clients slow down as well
> without giving enough priority to the server.

Do you have data, or at least a theory to back up that hypothesis?

> X isn't special so much because it does work on behalf of others
> (as you said, lots of things do that). It is special simply because
> we _want_ rendering to have priority of the CPU

Really not. I'm trying to get across that this is a general problem with
interprocess communication, or any system that relies on multiple
processes to make forward progress on a problem. Sure, let the clients
make forward progress until they can't any more. If they stop making
forward progress by blocking on a read or sleeping after a write to
another process, then there's a big hint there as to who should get
focus next.

> (if you shifted CPU intensive rendering to the clients, you'd most
> likely want to give them priority too); nice, right?

They'd have it automatically, if they were spending their time computing
rather than rendering.
Ray
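[Editor's note: Ray's dampened-donation proposal quoted earlier (take 50%
from the client, give 25% to the server, toss the rest on the floor) can
be sketched numerically. The helper below is hypothetical, not a kernel
interface.]

```python
# Sketch of dampened time-slice donation on a client->server IPC:
# take half the client's remaining slice, credit half of that amount
# to the server, and drop the rest so that a chain of donations decays
# geometrically instead of accumulating unbounded priority.

def donate(client_slice_ms, server_slice_ms):
    taken = client_slice_ms * 0.5        # take 50% from the client
    granted = taken * 0.5                # server gets 25% of the original
    return client_slice_ms - taken, server_slice_ms + granted

client, server = 10.0, 10.0
client, server = donate(client, server)
assert client == 5.0 and server == 12.5  # half taken, a quarter granted

# A pipeline client -> A -> B decays: each hop passes on ever less.
client2, a, b = 10.0, 10.0, 10.0
client2, a = donate(client2, a)          # A ends at 12.5 ms
a, b = donate(a, b)                      # B ends at 13.125 ms, A at 6.25
assert b == 13.125
```

The 50%/25% split here is just the example from the thread; any factors
below 1 give the same geometric decay property.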
Re: Renice X for cpu schedulers
On Sunday 22 April 2007 22:54, Mark Lord wrote:
> Just to throw another possibly-overlooked variable into the mess:
>
> My system here is using the on-demand cpufreq policy governor.
> I wonder how that interacts with the various schedulers here?
>
> I suppose for the "make" kernel case, after a couple of seconds the
> cpufreq would hit max and stay there for the rest of the build, so it
> shouldn't really be a factor for (non-)interactivity during the build.
>
> Or should it?

Short answer: shouldn't matter :)

-- 
-ck
Re: Renice X for cpu schedulers
Just to throw another possibly-overlooked variable into the mess:

My system here is using the on-demand cpufreq policy governor.
I wonder how that interacts with the various schedulers here?

I suppose for the "make" kernel case, after a couple of seconds the
cpufreq would hit max and stay there for the rest of the build, so it
shouldn't really be a factor for (non-)interactivity during the build.

Or should it?

Cheers
Re: Renice X for cpu schedulers
Nick Piggin wrote:
> On Thu, Apr 19, 2007 at 09:17:25AM -0400, Mark Lord wrote:
>> Just plain "make" (no -j2 or -j) is enough to kill interactivity on
>> my 2GHz P-M single-core non-HT machine with SD.
>
> Is this with or without X reniced?

That was with no manual jiggling, everything the same as with stock
kernels, except that stock kernels don't kill interactivity here.

But with the very first posted version of CFS by Ingo, I can do
"make -j2" no problem and still have a nicely interactive desktop.

> How well does cfs run if you have the granularity set to something
> like 30ms (3000)?

Dunno, I've put this stuff aside for now until things settle down.
With four schedulers, and lots of patches / revisions / tuning-knobs,
there's just no way to keep up with it all here.

Cheers
Re: Renice X for cpu schedulers
On Fri, Apr 20, 2007 at 12:12:29AM -0700, Michael K. Edwards wrote:
> Actual fractional CPU reservation is a bit different, and is probably
> best handled with "container"-type infrastructure (not quite
> virtualization, but not quite scheduling classes either). SGI
> pioneered this (in "open systems" space -- IBM probably had it first,
> as usual) with GRIO in XFS. (That was I/O throughput reservation of
> course, not "CPU bandwidth" -- but IIRC IRIX had CPU reservation too).

I'm very aware of this, having grown up on those systems and seen what
30k USD of hardware can do for you with the right kernel facilities. It
would be a mind-blower to get OpenGL and friends back to that level of
performance with regard to React/Pro's rt abilities; frame drop would
just be gone and we'd own gaming. No joke. We have a number of former
SGI XFS engineers here at NetApp and I should ask them about the GRIO
implementation.

> There's a more general class of techniques in which it's worth
> spending idle cycles speculating along paths that might or might not
> be taken depending on unpredictable I/O; I'd be surprised if you
> couldn't approximate most of the sane balancing strategies in this
> area within the "economic dispatch" scheduler model. (Good JIT
> bytecode engines more or less do this already if you let them, with a
> cap on JIT cache size serving as a crude CPU throttle.)

What is that? Never heard of it before.

> I don't know where the -rt patch enters in. But if you need agile
> reprioritization with a deep runnable queue, either under direct
> application control or as a side effect of priority inheritance or a
> related OS-enforced protocol, then you need a kernel-level data
> structure with a fancier interface than the classic
> insert/find/delete-min priority queue. From what I've read (this is
> not my area of expertise and I don't have Knuth handy), the relatively
> simple heap-based implementations of priority queues can't
> reprioritize an entry any more quickly than find+delete+insert, which
> pretty much rules them out as a basis for a scalable scheduler with
> priority inheritance (let alone PCP emulation).

The -rt patch has turnstile-esque infrastructure that's stack
allocated. Linux's lock hierarchy is relatively shallow (compensated
with a heavy use of per-CPU methods and RCU-ified algorithms in place
of rwlocks), so I've encountered nothing close to this that would
demand such an overly sophisticated mechanism. I'm aware of PCP and
preemption thresholds. I created the lockstat infrastructure as a means
of precisely measuring contention in -rt in anticipation of
experimenting with these techniques. I mention -rt because it's the
most likely place to encounter what you're talking about, not an app.

> > I have Solaris style adaptive locks in my tree with my lockstat
> > patch under -rt. I've also modified my lockstat patch to track
> > readers ...
>
> Ooh, that's neat. The next time I can cook up an excuse to run a
> kernel that won't load this damn WiFi driver, I'll try it out. Some
> of the people I work with are real engineers and respect in-system
> instrumentation.

It's not publicly released yet since I'm still stuck in .20-rc6 land
and the soft lockup detector triggers. I need to forward-port it and my
lockstat changes to the most recent -rt patch. I've been stalled on a
revision control problem that I'm trying to solve with monotone for at
least a month (of my own spare time).

> That's a good thing; it implies that in-kernel algorithms don't take
> locks needlessly as a matter of cargo-cult habit.

The jury is still out on this until I can record what state the rtmutex
owner is in. No further conclusion can be made until then. I think this
is a very interesting pursuit/investigation.

> Attempting to take a lock (other than an uncontended futex, which is
> practically free) should almost always transfer control to the thread
> that has the power to deliver the information (or the free slot) that
> you're looking for -- or in the case of an external data source/sink,
> should send you into low-power mode until time and tide give you
> something new to do. Think of it as a just-in-time inventory system;
> if you keep too much product in stock (or free warehouse space),
> you're wasting space and harming responsiveness to a shift in demand.
> Once in a while you have to play Sokoban in order to service a
> request promptly; that's exactly the case that priority inheritance
> is meant to help with.

What did you mean by this? Victor Yodaiken's stuff?

> The fiddly part, on a non-real-time-no-matter-what-the-label-says
> system with an opaque cache architecture and mysterious hidden costs
> of context switching, is to minimize the lossage resulting from
> brutal timer- or priority-inheritance-driven preemption. Given the
> way people code these days -- OK, it was probably just as bad back in
> the day -- the only thing worse than letting the axe fall at random
> is to steal the CPU away the moment a contended lock is released,
> because

My adaptive spin stuff in front of an rtmutex is designed to complement
Steve Rostedt's owner stealing code also in that path and
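[Editor's note: the Solaris-style adaptive locking discussed in this
exchange — spin only while the lock owner is actually running on a CPU,
otherwise block — can be sketched roughly as below. Names and structure
are illustrative, not the -rt patch's actual code.]

```python
# Sketch of an adaptive mutex: spinning is only worthwhile if the
# current owner is on-CPU (so the critical section will end soon);
# if the owner is preempted or sleeping, fall back to blocking.

import threading

class AdaptiveMutex:
    def __init__(self):
        self._lock = threading.Lock()
        self.owner_running = False   # in a kernel: read from owner's task state
        self.spins = 0               # crude stand-in for lockstat accounting
        self.blocks = 0

    def acquire(self, spin_limit=100):
        for _ in range(spin_limit):
            if not self.owner_running:
                break                # owner off-CPU: spinning is wasted work
            if self._lock.acquire(blocking=False):
                self.spins += 1      # won the lock while spinning
                return
        self._lock.acquire()         # block (in a kernel: enqueue and sleep)
        self.blocks += 1

    def release(self):
        self._lock.release()

m = AdaptiveMutex()
m.acquire()                  # owner not running: takes the blocking path
assert m.blocks == 1
m.release()

m.owner_running = True
m.acquire()                  # owner on-CPU and lock free: spin path wins
assert m.spins == 1
m.release()
```

The stats Bill describes (under 10 percent of contention events worth
spinning on) are exactly what the `spins` / `blocks` split above would
measure.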
Re: Renice X for cpu schedulers
On 4/19/07, hui Bill Huey <[EMAIL PROTECTED]> wrote:
> DSP operations like, particularly with digital synthesis, tend to max
> the CPU doing vector operations on as many processors as it can get a
> hold of. In a live performance critical application, it's important to
> be able to deliver a protected amount of CPU to a thread doing that
> work as well as response to external input such as controllers, etc...

Actual fractional CPU reservation is a bit different, and is probably
best handled with "container"-type infrastructure (not quite
virtualization, but not quite scheduling classes either). SGI pioneered
this (in "open systems" space -- IBM probably had it first, as usual)
with GRIO in XFS. (That was I/O throughput reservation of course, not
"CPU bandwidth" -- but IIRC IRIX had CPU reservation too.) There's a
more general class of techniques in which it's worth spending idle
cycles speculating along paths that might or might not be taken
depending on unpredictable I/O; I'd be surprised if you couldn't
approximate most of the sane balancing strategies in this area within
the "economic dispatch" scheduler model. (Good JIT bytecode engines
more or less do this already if you let them, with a cap on JIT cache
size serving as a crude CPU throttle.)

>> In practice, you probably don't want to burden desktop Linux with
>> priority inheritance where you don't have to. Priority queues with
>> algorithmically efficient decrease-key operations (Fibonacci heaps
>> and their ilk) are complicated to implement and have correspondingly
>> high constant factors. (However, a sufficiently clever heuristic for
>> assigning quasi-static task priorities would usually short-circuit
>> the priority cascade; if you can keep N small in the
>> tasks-with-unpredictable-priority queue, you can probably use a
>> simpler flavor with O(log N) decrease-key. Ask someone who knows
>> more about data structures than I do.)
>
> These are app issues and not really something that's mutable in the
> kernel per se with regard to the -rt patch.

I don't know where the -rt patch enters in. But if you need agile
reprioritization with a deep runnable queue, either under direct
application control or as a side effect of priority inheritance or a
related OS-enforced protocol, then you need a kernel-level data
structure with a fancier interface than the classic
insert/find/delete-min priority queue. From what I've read (this is not
my area of expertise and I don't have Knuth handy), the relatively
simple heap-based implementations of priority queues can't reprioritize
an entry any more quickly than find+delete+insert, which pretty much
rules them out as a basis for a scalable scheduler with priority
inheritance (let alone PCP emulation).

> I have Solaris style adaptive locks in my tree with my lockstat patch
> under -rt. I've also modified my lockstat patch to track readers
> correctly now with rwsem and the like to see where the single reader
> limitation in the rtmutex blows it.

Ooh, that's neat. The next time I can cook up an excuse to run a kernel
that won't load this damn WiFi driver, I'll try it out. Some of the
people I work with are real engineers and respect in-system
instrumentation.

> So far I've seen less than 10 percent of in-kernel contention events
> actually worth spinning on and the rest of the stats imply that the
> mutex owner in question is either preempted or blocked on something
> else.

That's a good thing; it implies that in-kernel algorithms don't take
locks needlessly as a matter of cargo-cult habit. Attempting to take a
lock (other than an uncontended futex, which is practically free)
should almost always transfer control to the thread that has the power
to deliver the information (or the free slot) that you're looking for
-- or in the case of an external data source/sink, should send you into
low-power mode until time and tide give you something new to do.

Think of it as a just-in-time inventory system; if you keep too much
product in stock (or free warehouse space), you're wasting space and
harming responsiveness to a shift in demand. Once in a while you have
to play Sokoban in order to service a request promptly; that's exactly
the case that priority inheritance is meant to help with.

The fiddly part, on a non-real-time-no-matter-what-the-label-says
system with an opaque cache architecture and mysterious hidden costs of
context switching, is to minimize the lossage resulting from brutal
timer- or priority-inheritance-driven preemption. Given the way people
code these days -- OK, it was probably just as bad back in the day --
the only thing worse than letting the axe fall at random is to steal
the CPU away the moment a contended lock is released, because the next
20 lines of code probably poke one last time at all the data structures
the task had in cache right before entering the critical section. That
doesn't hurt so bad on RTOS-friendly hardware -- an MMU-less system
with either zero or near-infinite cache -- but it's got to make this
year's
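[Editor's note: the decrease-key limitation discussed above is mostly
the *find* step — a plain array heap needs O(n) to locate an entry, but
keeping a position index restores O(log n) reprioritization without a
Fibonacci heap's constant factors. A minimal sketch, illustrative only,
not kernel code:]

```python
# Binary min-heap with a position index, so decrease-key avoids the
# O(n) find that a plain heapq-style array would need.

class IndexedHeap:
    def __init__(self):
        self.heap = []        # list of (key, item)
        self.pos = {}         # item -> index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[parent][0] <= self.heap[i][0]:
                break
            self._swap(i, parent)
            i = parent

    def push(self, key, item):
        self.heap.append((key, item))
        self.pos[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def decrease_key(self, item, new_key):
        i = self.pos[item]             # O(1) find via the index
        assert new_key <= self.heap[i][0]
        self.heap[i] = (new_key, item)
        self._sift_up(i)               # O(log n) reposition, no delete+insert

    def peek(self):
        return self.heap[0][1]

h = IndexedHeap()
for key, task in [(5, "db"), (3, "shell"), (7, "make")]:
    h.push(key, task)
assert h.peek() == "shell"
h.decrease_key("db", 1)      # e.g. a priority-inheritance boost
assert h.peek() == "db"
```

Increase-key (dropping a boost) would need a matching sift-down; this
sketch only shows the boost direction relevant to priority inheritance.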
Re: Renice X for cpu schedulers
On Fri, Apr 20, 2007 at 12:12:29AM -0700, Michael K. Edwards wrote: Actual fractional CPU reservation is a bit different, and is probably best handled with container-type infrastructure (not quite virtualization, but not quite scheduling classes either). SGI pioneered this (in open systems space -- IBM probably had it first, as usual) with GRIO in XFS. (That was I/O throughput reservation of I'm very aware of this having grow up on those systems and see what 30k USD of hardware can do for you with the right kernel facilties. It would be a mind blower to get OpenGL and friends back to that level of performance with regards to React/Pro's rt abilities, frame drop would just be gone and we'd own gaming. No joke. We have a number of former SGI XFS engineers here at NetApp and I should ask them about the GRIO implementation. course, not CPU bandwidth -- but IIRC IRIX had CPU reservation too). There's a more general class of techniques in which it's worth spending idle cycles speculating along paths that might or might not be taken depending on unpredictable I/O; I'd be surprised if you couldn't approximate most of the sane balancing strategies in this area within the economic dispatch scheduler model. (Good JIT What is that ? never heard of it before. I don't know where the -rt patch enters in. But if you need agile reprioritization with a deep runnable queue, either under direct application control or as a side effect of priority inheritance or a related OS-enforced protocol, then you need a kernel-level data structure with a fancier interface than the classic insert/find/delete-min priority queue. From what I've read (this is not my area of expertise and I don't have Knuth handy), the relatively simple heap-based implementations of priority queues can't reprioritize an entry any more quickly than find+delete+insert, which pretty much rules them out as a basis for a scalable scheduler with priority inheritance (let alone PCP emulation). 
The -rt patch has turnstile-esque infrastructure that's stack
allocated. Linux's lock hierarchy is relatively shallow (compensated
for with heavy use of per-CPU methods and RCU-ified algorithms in place
of rwlocks), so I've encountered nothing close to this that would
demand such an overly sophisticated mechanism. I'm aware of PCP and
preemption thresholds. I created the lockstat infrastructure as a means
of precisely measuring contention in -rt in anticipation of
experimenting with these techniques. I mention -rt because it's the
most likely place to encounter what you're talking about, not an app.

> > I have Solaris style adaptive locks in my tree with my lockstat
> > patch under -rt. I've also modified my lockstat patch to track
> > readers ...
>
> Ooh, that's neat. The next time I can cook up an excuse to run a
> kernel that won't load this damn WiFi driver, I'll try it out. Some
> of the people I work with are real engineers and respect in-system
> instrumentation.

It's not publicly released yet since I'm still stuck in .20-rc6 land
and the soft lockup detector triggers. I need to forward port it and my
lockstat changes to the most recent -rt patch. I've been stalled on a
revision control problem that I'm trying to solve with monotone for at
least a month (of my own spare time).

> That's a good thing; it implies that in-kernel algorithms don't take
> locks needlessly as a matter of cargo-cult habit.

The jury is still out on this until I can record what state the rtmutex
owner is in. No further conclusion can be made until then. I think this
is a very interesting pursuit/investigation.

> Attempting to take a lock (other than an uncontended futex, which is
> practically free) should almost always transfer control to the thread
> that has the power to deliver the information (or the free slot) that
> you're looking for -- or in the case of an external data source/sink,
> should send you into low-power mode until time and tide give you
> something new to do.
> Think of it as a just-in-time inventory system; if you keep too much
> product in stock (or free warehouse space), you're wasting space and
> harming responsiveness to a shift in demand. Once in a while you have
> to play Sokoban in order to service a request promptly; that's
> exactly the case that priority inheritance is meant to help with.

What did you mean by this? Victor Yodaiken's stuff?

> The fiddly part, on a non-real-time-no-matter-what-the-label-says
> system with an opaque cache architecture and mysterious hidden costs
> of context switching, is to minimize the lossage resulting from
> brutal timer- or priority-inheritance-driven preemption. Given the
> way people code these days -- OK, it was probably just as bad back in
> the day -- the only thing worse than letting the axe fall at random
> is to steal the CPU away the moment a contended lock is released,
> because

My adaptive spin stuff in front of an rtmutex is designed to complement
Steve Rostedt's owner stealing code also in that path and
Re: Renice X for cpu schedulers
On Thu, Apr 19, 2007 at 05:20:53PM -0700, Michael K. Edwards wrote:
> Embedded systems are already in 2007, and the mainline Linux scheduler
> frankly sucks on them, because it thinks it's back in the 1960's with
> a fixed supply and captive demand, pissing away "CPU bandwidth" as
> waste heat. Not to say it's an easy problem; even academics with a
> dozen publications in this area don't seem to be able to model energy
> usage to the nearest big O, let alone design a stable economic
> dispatch engine. But it helps to acknowledge what the problem is:
> even in a 1960's raised-floor screaming-air-conditioners
> screw-the-power-bill machine room, you can't actually run a
> half-decent CPU flat out any more without burning it to a crisp.
> stupid. What's your excuse? ;-)

It's now possible to QoS significant parts of the kernel since we now
have a deadline mechanism in place. In the original 2.4 kernel,
TimeSys's irq-thread allowed for the processing of skbuffs in a thread
under a CPU reservation run category, which was used to provide QoS, I
believe.

This basic mechanism can now be generalized to many places in the
kernel and put under scheduler control. It's just a matter of who is
going to take on this task, and when.

bill

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Renice X for cpu schedulers
On Thu, Apr 19, 2007 at 06:32:15PM -0700, Michael K. Edwards wrote:
> But I think SCHED_FIFO on a chain of tasks is fundamentally not the
> right way to handle low audio latency. The object with a low latency
> requirement isn't the task, it's the device. When it's starting to
> get urgent to deliver more data to the device, the task that it's
> waiting on should slide up the urgency scale; and if it's waiting on
> something else, that something else should slide up the scale; and so
> forth. Similarly, responding to user input is urgent; so when user
> input is available (by whatever mechanism), the task that's waiting
> for it should slide up the urgency scale, etc.

DSP operations, particularly with digital synthesis, tend to max out
the CPU doing vector operations on as many processors as they can get
hold of. In a live performance critical application, it's important to
be able to deliver a protected amount of CPU to a thread doing that
work, as well as respond to external input such as controllers, etc...

> In practice, you probably don't want to burden desktop Linux with
> priority inheritance where you don't have to. Priority queues with
> algorithmically efficient decrease-key operations (Fibonacci heaps and
> their ilk) are complicated to implement and have correspondingly high
> constant factors. (However, a sufficiently clever heuristic for
> assigning quasi-static task priorities would usually short-circuit the
> priority cascade; if you can keep N small in the
> tasks-with-unpredictable-priority queue, you can probably use a
> simpler flavor with O(log N) decrease-key. Ask someone who knows more
> about data structures than I do.)

These are app issues, and not really something that's mutable in the
kernel per se with regard to the -rt patch.
> More importantly, non-real-time application coders aren't very smart
> about grouping data structure accesses on one side or the other of a
> system call that is likely to release a lock and let something else
> run, flushing application data out of cache. (Kernel coders aren't
> always smart about this either; see LKML threads a few weeks ago about
> racy, and cache-stall-prone, f_pos handling in VFS.) So switching
> tasks immediately on lock release is usually the wrong thing to do if
> letting the task run a little longer would allow it to reach a point
> where it has to block anyway.

I have Solaris style adaptive locks in my tree with my lockstat patch
under -rt. I've also modified my lockstat patch to track readers
correctly now with rwsem and the like, to see where the single reader
limitation in the rtmutex blows it. So far I've seen less than 10
percent of in-kernel contention events actually worth spinning on, and
the rest of the stats imply that the mutex owner in question is either
preempted or blocked on something else.

I've been trying to get folks to try this on a larger machine than my
2x AMD64 box so that there is more data regarding Linux contention and
overscheduling in -rt.

> Anyway, I already described the urgency-driven strategy to the extent
> that I've thought it out, elsewhere in this thread. I only held this
> draft back because I wanted to double-check my latency measurements.

bill
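The spin-or-sleep decision that those contention stats feed into can be stated compactly. Here is a userspace sketch of the heuristic behind Solaris-style adaptive locks -- the field names are made up for illustration, and the real rtmutex owner-state tracking in -rt is considerably more involved:

```c
#include <assert.h>

/* Adaptive-lock wait policy, sketched: spin only when the lock owner is
 * actively running on another CPU and is not itself blocked. Otherwise
 * spinning burns exactly the cycles the owner needs in order to reach
 * its unlock, so the waiter should sleep instead. Hypothetical fields;
 * not the actual rtmutex representation. */

enum lock_wait { WAIT_SPIN, WAIT_SLEEP };

struct owner_state {
    int on_cpu;    /* owner currently executing on some CPU? */
    int blocked;   /* owner itself waiting on another lock or on I/O? */
};

enum lock_wait adaptive_wait_mode(const struct owner_state *o)
{
    if (o->on_cpu && !o->blocked)
        return WAIT_SPIN;   /* owner is making progress; release is imminent */
    return WAIT_SLEEP;      /* owner preempted or blocked; sleeping is cheaper */
}
```

The "less than 10 percent worth spinning on" figure above is exactly the fraction of contention events for which this predicate returns WAIT_SPIN.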
Re: Renice X for cpu schedulers
On Fri, 2007-04-20 at 08:47 +1000, Con Kolivas wrote: > It's those who want X to have an unfair advantage that want it to do > something "special". I hope you're not lumping me in with "those". If X + client had been able to get their fair share and do so in the low latency manner they need, I would have been one of the carrots instead of being the stick. -Mike
Re: Renice X for cpu schedulers
On Thu, Apr 19, 2007 at 12:26:03PM -0700, Ray Lee wrote:
> On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> >The one fly in the ointment for
> >linux remains X. I am still, to this moment, completely and utterly stunned
> >at why everyone is trying to find increasingly complex unique ways to
> >manage
> >X when all it needs is more cpu[1].
> [...and hence should be reniced]
>
> The problem is that X is not unique. There's postgresql, memcached,
> mysql, db2, a little embedded app I wrote... all of these perform work
> on behalf of another process. It's just most *noticeable* with X, as
> pretty much everyone is running that.

But for most of those apps, we don't actually care if they degrade
fairly in performance as other loads on the system ramp up. However,
the user prefers X to be given priority in these situations. Whether
that is the design of X, X clients, or the human condition really
doesn't matter two hoots to the scheduler.

> If we had some way for the scheduler to decide to donate part of a
> client process's time slice to the server it just spoke to (with an
> exponential dampening factor -- take 50% from the client, give 25% to
> the server, toss the rest on the floor), that -- from my naive point
> of view -- would be a step toward fixing the underlying issue. Or I
> might be spouting crap, who knows.

Firstly, lots of clients in your list are remote. X usually isn't.
However, for X a syscall or something to donate time might not be such
a bad idea... but given a couple of X clients and a server against a
parallel make, this is probably just going to make the clients slow
down as well without giving enough priority to the server.

X isn't special so much because it does work on behalf of others (as
you said, lots of things do that). It is special simply because we
_want_ rendering to have priority of the CPU (and if you shifted CPU
intensive rendering to the clients, you'd most likely want to give them
priority too); nice, right?
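Ray's damped-donation idea can be sketched to show why the deliberate loss per hop matters: the 50%/25% split is his, everything else here is illustrative. Each bounce of a client/server ping-pong passes on at most a quarter of what arrived, so the total boost is a convergent geometric series rather than a capture scenario.

```c
#include <assert.h>

/* Sketch of damped timeslice donation: the client gives up 50% of its
 * remaining slice, the server is credited 25%, and the other 25% is
 * deliberately dropped. The lossy transfer is the exponential damping
 * that keeps a message ping-pong from starving everyone else. */

struct slice_xfer {
    int client_loss;   /* ticks taken from the client */
    int server_gain;   /* ticks credited to the server */
};

struct slice_xfer donate(int client_slice)
{
    struct slice_xfer x;
    x.client_loss = client_slice / 2;   /* take 50% */
    x.server_gain = client_slice / 4;   /* grant 25%, burn the rest */
    return x;
}

/* Worst-case extra time one side can accumulate if the boost keeps
 * bouncing back and forth forever: slice/4 * (1 + 1/4 + 1/16 + ...),
 * which is bounded by slice/3 -- a convergent series, not a runaway. */
int max_bounce_total(int slice)
{
    int total = 0, grant = slice / 4;

    while (grant > 0) {
        total += grant;
        grant /= 4;
    }
    return total;
}
```

Without the damping factor (i.e. passing the full slice along), the same loop never converges, which is the feedback-loop failure mode discussed elsewhere in this thread for back-boost.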
Re: Renice X for cpu schedulers
On Thu, Apr 19, 2007 at 09:17:25AM -0400, Mark Lord wrote:
> Con Kolivas wrote:
> >So yes go ahead and think up great ideas for other ways of metering out cpu
> >bandwidth for different purposes, but for X, given the absurd simplicity
> >of renicing, why keep fighting it? Again I reiterate that most users of SD
> >have not found the need to renice X anyway except if they stick to old
> >habits of make -j4 on uniprocessor and the like, and I expect that those
> >on CFS and Nicksched would also have similar experiences.
>
> Just plain "make" (no -j2 or -j) is enough to kill interactivity
> on my 2GHz P-M single-core non-HT machine with SD.

Is this with or without X reniced?

> But with the very first posted version of CFS by Ingo,
> I can do "make -j2" no problem and still have a nicely interactive desktop.

How well does CFS run if you have the granularity set to something like
30ms (3000)?
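For reference, the "absurd simplicity of renicing" is literally one syscall: renice amounts to a setpriority() call on the server's pid. A sketch -- demonstrated on the calling process with positive nice values, since boosting (negative nice) needs CAP_SYS_NICE; for X you would pass the server's pid instead:

```c
#include <sys/resource.h>
#include <errno.h>
#include <assert.h>

/* Set the nice value of the calling process and read back the result.
 * pid 0 means "the caller"; renice(1) does the same thing for an
 * arbitrary pid. Returns the effective nice value, or -errno. */
int renice_self(int nice_val)
{
    if (setpriority(PRIO_PROCESS, 0, nice_val) != 0)
        return -errno;

    /* getpriority() can legitimately return -1, so clear errno first */
    errno = 0;
    return getpriority(PRIO_PROCESS, 0);
}
```

An unprivileged process may only raise its nice value (lower its priority), which is why the whole renice-X debate is about who ships the default, not about mechanism.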
Re: Renice X for cpu schedulers
On Thursday 19 April 2007, Con Kolivas wrote: >On Friday 20 April 2007 04:16, Gene Heskett wrote: >> On Thursday 19 April 2007, Con Kolivas wrote: >> >> [and I snipped a good overview] >> >> >So yes go ahead and think up great ideas for other ways of metering out >> > cpu bandwidth for different purposes, but for X, given the absurd >> > simplicity of renicing, why keep fighting it? Again I reiterate that >> > most users of SD have not found the need to renice X anyway except if >> > they stick to old habits of make -j4 on uniprocessor and the like, and I >> > expect that those on CFS and Nicksched would also have similar >> > experiences. >> >> FWIW folks, I have never touched X's niceness, it's running at the default >> -1 for all of my so-called 'tests', and I have another set to be rebooted >> to right now. And yes, my kernel makeit script uses -j4 by default, and >> has used -j8 just for effects, which weren't all that different from what >> I expected in 'abusing' a UP system that way. The system DID remain >> usable, not snappy, but usable. > >Gene, you're agreeing with me. You've shown that you're very happy with a > fair distribution of cpu and leaving X at nice 0. I was quite happy till Ingo's first patch came out, and it was even better, but I over-wrote it, and we're still figuring out just exactly what the magic twanger was that made it all click for me. OTOH, I don't think that patch passed muster with Mike G., either. We have obviously different workloads, and critical points in them. >> Having tried re-nicing X a while back, and having the rest of the system >> suffer in quite obvious ways for even 1 + or - from its default felt >> pretty bad from this user's perspective. >> >> It is my considered opinion (yeah I know, I'm just a leaf in the hurricane >> of this list) that if X has to be re-niced from the 1 point advantage it's >> had for ages, then something is basically wrong with the overall scheduling, >> cpu or i/o, or both in combination. 
FWIW I'm using cfq for i/o. > >It's those who want X to have an unfair advantage that want it to do >something "special". Your agreement that it works fine at nice 0 shows you >don't want it to have an unfair advantage. Others who want it to have an >unfair advantage _can_ renice it if they desire. But if the cpu scheduler >gives X an unfair advantage within the kernel by default then you have _no_ >choice. If you leave the choice up to userspace (renice or not) then both >parties get their way. If you put it into the kernel only one party wins and >there is no way for the Genes (and Cons) of this world to get it back. > >Your opinion is as valuable as everyone else's Gene. It is hard to get people >to speak on as frightening a playground as the linux kernel mailing list so >please do. In the FWIW category, htop has always told me that X is running at -1, not zero. Now, I have NDI where this is actually set, so I'd have to ask stupid questions here if I did wanna play with it. Which I really don't; the last time I tried to -5 X, kde got a whole lot LESS responsive. But heck, 2.6.2 was freshly minted then too and I've long since forgotten how I went about that, unless I used htop to change it, the most likely scenario that I can picture at this late date. As for speaking my mind, yes, and I've been slapped down a few times, as much because I do a lot of bitching and microscopic amounts of patch submission. The only patch I ever submitted was for something in the floppy driver, way back in the middle of 2.2 days, rejected because I didn't know how to use the tools correctly. I didn't, so it was a shrug and my feelings weren't hurt. Some see that as an unbalanced set of books and I'm aware of it. OTOH, I think I do a pretty good job of playing the canary here, and that should be worth something if for no other reason than I can turn into a burr under somebody's saddle when things go all aglay. 
But I figure if it's happening to me, then if I don't fuss, and that gotcha gets into a distro kernel, there are gonna be a hell of a lot more folks than me trying to grab the microphone. BTW, I'm glad you are feeling well enough to get into this again. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) There cannot be a crisis next week. My schedule is already full. -- Henry Kissinger
Re: Renice X for cpu schedulers
On 4/19/07, Lee Revell <[EMAIL PROTECTED]> wrote:
> IMHO audio streamers should use a SCHED_FIFO thread for time critical
> work. I think it's insane to expect the scheduler to figure out that
> these processes need low latency when they can just be explicit about
> it. "Professional" audio software does it already, on Linux as well
> as other OS...

It is certainly true that SCHED_FIFO is currently necessary in the layers of an audio application lying closest to the hardware, if you don't want to throw a monstrous hardware ring buffer at the problem. See the alsa-devel archives for a patch to aplay (sched_setscheduler plus some cleanups) that converts it from "unsafe at any speed" (on a non-RT kernel) to a rock-solid 18ms round trip from PCM in to PCM out. (The hardware and driver aren't terribly exotic for an SoC, and the measurement was done with aplay -C | aplay -P -- on a not-particularly-tuned CONFIG_PREEMPT kernel with a 12ms+ peak scheduling latency according to cyclictest. A similar test via /dev/dsp, done through a slightly modified OSS emulation layer to the same driver, measures at 40ms and is probably tuned too conservatively.) Note that SCHED_FIFO may be less necessary on an -rt kernel, but I haven't had that option on the embedded hardware I've been working with lately.

Ingo, please please pretty please pick a -stable branch one of these days and provide a git repo with -rt integrated against that branch. Then I could port our chip support to it -- all of which will be GPLed after the impending code review -- after which I might have a prayer of strong-arming our chip vendor into porting their WiFi driver onto -rt. It's really a much more interesting scheduler use case than make -j200 under X, because it's a best-effort SCHED_BATCH-ish load that wants to be temporally clustered for power management reasons.
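The promotion such an aplay patch performs boils down to a single sched_setscheduler() call. A hedged sketch of the shape of it -- the priority choice, range check, and error handling here are illustrative rather than taken from the actual patch, and without CAP_SYS_NICE (or an RT rlimit) the call fails with EPERM, so the helper reports the error instead of aborting:

```c
#include <sched.h>
#include <errno.h>
#include <assert.h>

/* Promote the calling process/thread to SCHED_FIFO at the given
 * priority. Returns 0 on success, -EINVAL for an out-of-range
 * priority, or -EPERM (etc.) when the caller lacks RT privileges. */
int make_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    if (prio < sched_get_priority_min(SCHED_FIFO) ||
        prio > sched_get_priority_max(SCHED_FIFO))
        return -EINVAL;

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -errno;
    return 0;
}
```

An audio app would call this only from the thread feeding the PCM device, leaving the GUI and disk I/O threads at normal policy -- which is exactly Lee's "be explicit about it" point.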
(Believe it or not, a stable -rt branch with a clock-scaling-aware scheduler is the one thing that might lead to this major WiFi vendor's GPLing their driver core. They're starting to see the light on the biz dev side, and the nature of the devices their chip will go in makes them somewhat less concerned about the regulatory fig leaf aspect of a closed-source driver; but they would have to port off of the third-party real-time executive embedded within the driver, and mainline's task and timer granularity won't cut it. I can't even get more detail about _why_ it won't cut it unless there's some remotely supportable -rt base they could port to.) But I think SCHED_FIFO on a chain of tasks is fundamentally not the right way to handle low audio latency. The object with a low latency requirement isn't the task, it's the device. When it's starting to get urgent to deliver more data to the device, the task that it's waiting on should slide up the urgency scale; and if it's waiting on something else, that something else should slide up the scale; and so forth. Similarly, responding to user input is urgent; so when user input is available (by whatever mechanism), the task that's waiting for it should slide up the urgency scale, etc. In practice, you probably don't want to burden desktop Linux with priority inheritance where you don't have to. Priority queues with algorithmically efficient decrease-key operations (Fibonacci heaps and their ilk) are complicated to implement and have correspondingly high constant factors. (However, a sufficiently clever heuristic for assigning quasi-static task priorities would usually short-circuit the priority cascade; if you can keep N small in the tasks-with-unpredictable-priority queue, you can probably use a simpler flavor with O(log N) decrease-key. Ask someone who knows more about data structures than I do.) 
More importantly, non-real-time application coders aren't very smart about grouping data structure accesses on one side or the other of a system call that is likely to release a lock and let something else run, flushing application data out of cache. (Kernel coders aren't always smart about this either; see LKML threads a few weeks ago about racy, and cache-stall-prone, f_pos handling in VFS.) So switching tasks immediately on lock release is usually the wrong thing to do if letting the task run a little longer would allow it to reach a point where it has to block anyway. Anyway, I already described the urgency-driven strategy to the extent that I've thought it out, elsewhere in this thread. I only held this draft back because I wanted to double-check my latency measurements. Cheers, - Michael
Re: Renice X for cpu schedulers
On Thu, 19 Apr 2007, Ed Tomlinson wrote: > > > > SD just doesn't do nearly as good as the stock scheduler, or CFS, here. > > > > I'm quite likely one of the few single-CPU/non-HT testers of this stuff. > > If it should ever get more widely used I think we'd hear a lot more > > complaints. > > amd64 UP here. SD with several makes running works just fine. The thing is, it probably depends *heavily* on just how much work the X server ends up doing. Fast video hardware? The X server doesn't need to busy-wait much. Not a lot of eye-candy? The X server is likely fast enough even with a slower card that it still gets sufficient CPU time and isn't getting dinged by any balancing. DRI vs non-DRI? Which window manager (maybe some of the user-visible lags come from there...)? Etc etc. Anyway, I'd ask people to look a bit at the current *regressions* instead of spending all their time on something that won't even be merged before 2.6.21 is released, given that we thus have some more pressing issues. Please? Linus
Re: Renice X for cpu schedulers
On Thursday 19 April 2007 12:15, Mark Lord wrote: > Con Kolivas wrote: > > On Thursday 19 April 2007 23:17, Mark Lord wrote: > >> Con Kolivas wrote: > >> So go ahead and think up great ideas for other ways of metering out cpu > >> > >>> bandwidth for different purposes, but for X, given the absurd simplicity > >>> of renicing, why keep fighting it? Again I reiterate that most users of > >>> SD have not found the need to renice X anyway except if they stick to old > >>> habits of make -j4 on uniprocessor and the like, and I expect that those > >>> on CFS and Nicksched would also have similar experiences. > >> Just plain "make" (no -j2 or -j) is enough to kill interactivity > >> on my 2GHz P-M single-core non-HT machine with SD. > >> > >> But with the very first posted version of CFS by Ingo, > >> I can do "make -j2" no problem and still have a nicely interactive desktop. > > > > Cool. Then there's clearly a bug with SD that manifests on your machine as > > it > > should not have that effect at all (and doesn't on other people's > > machines). > > I suggest trying the latest version which fixes some bugs. > > SD just doesn't do nearly as good as the stock scheduler, or CFS, here. > > I'm quite likely one of the few single-CPU/non-HT testers of this stuff. > If it should ever get more widely used I think we'd hear a lot more > complaints. amd64 UP here. SD with several makes running works just fine. Ed Tomlinson
Re: Renice X for cpu schedulers
Con Kolivas wrote: > You're welcome and thanks for taking the floor to speak. I would say you have > actually agreed with me though. X is not unique, it's just an obvious one, so > let's not design the cpu scheduler around the problem with X. Same goes for > every other application. Leaving the choice to hand out differential cpu > usage when they seem to need it should be up to the users. The donation idea > has been done before in some fashion or other in things like "back-boost" > which Linus himself tried in 2.5.X days. It worked lovely till it did the > wrong thing and wreaked havoc. I know. I came to the party late, or I would have played with it back then. Perhaps you could correct me, but it seems his back-boost didn't do any dampening, which means the system could get into nasty capture scenarios, where two processes bouncing messages back and forth could take over the scheduler and starve out the rest. It seems pretty obvious in hindsight that something without exponential dampening would allow feedback loops. Regardless, perhaps we are in agreement. I just don't like the idea of having to guess how much work postgresql is going to be doing on my client processes' behalf. Worse, I don't necessarily want it to have that -10 priority when it's going and updating statistics or whatnot, or any other housekeeping activity that shouldn't make a noticeable impact on the rest of the system. Worst, I'm leery of the idea that if I get its nice level wrong, I'm going to be affecting the overall throughput of the server. All of which are only hypothetical worries, granted. Anyway, I'll shut up now. Thanks again for stickin' with it. Ray
Re: Renice X for cpu schedulers
On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> The cpu scheduler core is a cpu bandwidth and latency proportionator
> and should be nothing more or less.

Not really. The CPU scheduler is (or ought to be) what electric utilities call an economic dispatch mechanism -- a real-time controller whose goal is to service competing demands cost-effectively from a limited supply, without compromising system stability.

If you live in the 1960's, coal and nuclear (and a little bit of fig-leaf hydro) are all you have, it takes you twelve hours to bring plants on and off line, and there's no live operational control or pricing signal between you and your customers. So you're stuck running your system at projected peak + operating margin, dumping excess power as waste heat most of the time, and browning or blacking people out willy-nilly when there's excess demand. Maybe you get to trade off shedding the loads with the worst transmission efficiency against degrading the customers with the most tolerance for brownouts (or the least regulatory clout). That's life without modern economic dispatch.

If you live in 2007, natural gas and (outside the US) better control over nuclear plants give you more ability to ramp supply up and down with demand on something like a 15-minute cycle. Better yet, you can store a little energy "in the grid" to smooth out instantaneous demand fluctuations; if you're lucky, you also have enough fast-twitch hydro (thanks, Canada!) that you can run your coal and lame-ass nuclear very close to base load even when gas is expensive, and even pump water back uphill when demand dips. (Coal is nasty stuff and a worse contributor by far to radiation exposure than nuclear generation; but on current trends it's going to last a lot longer than oil and gas, and it's a lot easier to stockpile next to the generator.)
Best of all, you have industrial customers who will trade you live control (within limits) over when and how much power they take in return for a lower price per unit energy. Some of them will even dump power back into the grid when you ask them to. So now the biggest challenge in making supply and demand meet (in the short term) is to damp all the different ways that a control feedback path might result in an oscillation -- or in runaway pricing. Because there's always some asshole greedhead who will gamble with system stability in order to game the pricing mechanism. Lots of 'em, if you're in California and your legislature is so dumb, or so bought, that they let the asshole greedheads design the whole system so they can game it to the max. (But that's a whole 'nother rant.) Embedded systems are already in 2007, and the mainline Linux scheduler frankly sucks on them, because it thinks it's back in the 1960's with a fixed supply and captive demand, pissing away "CPU bandwidth" as waste heat. Not to say it's an easy problem; even academics with a dozen publications in this area don't seem to be able to model energy usage to the nearest big O, let alone design a stable economic dispatch engine. But it helps to acknowledge what the problem is: even in a 1960's raised-floor screaming-air-conditioners screw-the-power-bill machine room, you can't actually run a half-decent CPU flat out any more without burning it to a crisp. You can act ignorant and let the PMIC brown you out when it has to. Or you can start coping in mainline the way that organizations big enough (and smart enough) to feel the heat in their pocketbooks do in their pet kernels. (Boo on Google for not sharing, and props to IBM for doing their damnedest.) And guess what? The system will actually get simpler, and stabler, and faster, and easier to maintain, because it'll be based on a real theory of operation with equations and things instead of a bunch of opaque, undocumented shotgun heuristics. 
This hypothetical economic-dispatch scheduler will still _have_ heuristics, of course -- you can't begin to model a modern CPU accurately on-line. But they will be contained in _data_ rather than _code_, and issues of numerical stability will be separated cleanly from the rule set. You'll be able to characterize the rule set's domain of stability, given a conservative set of assumptions about the feedback paths in the system under control, with the sort of techniques they teach in the engineering schools that none of us (me included) seem to have attended. (I went to school thinking I was going to be a physicist. Wishful thinking -- but I was young and stupid. What's your excuse? ;-) OK, it feels better to have that off my chest. Apologies to those readers -- doubtless the vast majority of LKML, including everyone else in this thread -- for whom it's irrelevant, pseudo-learned pontification with no patch attached. And my sincere thanks to Ingo, Con, and really everyone else CC'ed, without whom Linux wouldn't be as good as it is (really quite good, all things considered) and wouldn't contribute as much as it
Re: Renice X for cpu schedulers
On Friday 20 April 2007 02:15, Mark Lord wrote: > Con Kolivas wrote: > > On Thursday 19 April 2007 23:17, Mark Lord wrote: > >> Con Kolivas wrote: > >> s go ahead and think up great ideas for other ways of metering out cpu > >> > >>> bandwidth for different purposes, but for X, given the absurd > >>> simplicity of renicing, why keep fighting it? Again I reiterate that > >>> most users of SD have not found the need to renice X anyway except if > >>> they stick to old habits of make -j4 on uniprocessor and the like, and > >>> I expect that those on CFS and Nicksched would also have similar > >>> experiences. > >> > >> Just plain "make" (no -j2 or -j) is enough to kill interactivity > >> on my 2GHz P-M single-core non-HT machine with SD. > >> > >> But with the very first posted version of CFS by Ingo, > >> I can do "make -j2" no problem and still have a nicely interactive > >> destop. > > > > Cool. Then there's clearly a bug with SD that manifests on your machine > > as it should not have that effect at all (and doesn't on other people's > > machines). I suggest trying the latest version which fixes some bugs. > > SD just doesn't do nearly as good as the stock scheduler, or CFS, here. > > I'm quite likely one of the few single-CPU/non-HT testers of this stuff. > If it should ever get more widely used I think we'd hear a lot more > complaints. You are not really one of the few. A lot of my own work is done on a single core pentium M 1.7Ghz laptop. I am not endowed with truckloads of hardware like all the paid developers are. I recall extreme frustration myself when a developer a few years ago (around 2002) said he couldn't reproduce poor behaviour on his 4GB ram 4 x Xeon machine. Even today if I add up every machine I have in my house and work at my disposal it doesn't amount to that many cpus and that much ram. 
-- 
-ck
Re: Renice X for cpu schedulers
On Friday 20 April 2007 05:26, Ray Lee wrote:
> On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> > The one fly in the ointment for
> > linux remains X. I am still, to this moment, completely and utterly
> > stunned at why everyone is trying to find increasingly complex unique
> > ways to manage X when all it needs is more cpu[1].
>
> [...and hence should be reniced]
>
> The problem is that X is not unique. There's postgresql, memcached,
> mysql, db2, a little embedded app I wrote... all of these perform work
> on behalf of another process. It's just most *noticeable* with X, as
> pretty much everyone is running that.
>
> If we had some way for the scheduler to decide to donate part of a
> client process's time slice to the server it just spoke to (with an
> exponential dampening factor -- take 50% from the client, give 25% to
> the server, toss the rest on the floor), that -- from my naive point
> of view -- would be a step toward fixing the underlying issue. Or I
> might be spouting crap, who knows.
>
> The problem is real, though, and not limited to X.
>
> While I have the floor, thank you, Con, for all your work.

You're welcome and thanks for taking the floor to speak. I would say you have actually agreed with me though. X is not unique, it's just an obvious one, so let's not design the cpu scheduler around the problem with X. Same goes for every other application. Leaving the choice of handing out differential cpu usage where it seems to be needed should be up to the users. The donation idea has been done before in some fashion or other in things like "back-boost" which Linus himself tried in the 2.5.X days. It worked lovely till it did the wrong thing and wreaked havoc.

As has been shown repeatedly, the workarounds and the tweaks and the bonuses and the deciding who to give an advantage to, when done by the cpu scheduler, are also its undoing, as it can't always get it right. The consequences of getting it wrong, on the other hand, are disastrous.
The cpu scheduler core is a cpu bandwidth and latency proportionator and should be nothing more or less.

-- 
-ck
Re: Renice X for cpu schedulers
On Friday 20 April 2007 04:16, Gene Heskett wrote: > On Thursday 19 April 2007, Con Kolivas wrote: > > [and I snipped a good overview] > > >So yes go ahead and think up great ideas for other ways of metering out > > cpu bandwidth for different purposes, but for X, given the absurd > > simplicity of renicing, why keep fighting it? Again I reiterate that most > > users of SD have not found the need to renice X anyway except if they > > stick to old habits of make -j4 on uniprocessor and the like, and I > > expect that those on CFS and Nicksched would also have similar > > experiences. > > FWIW folks, I have never touched x's niceness, its running at the default > -1 for all of my so-called 'tests', and I have another set to be rebooted > to right now. And yes, my kernel makeit script uses -j4 by default, and > has used -j8 just for effects, which weren't all that different from what I > expected in 'abusing' a UP system that way. The system DID remain usable, > not snappy, but usable. Gene, you're agreeing with me. You've shown that you're very happy with a fair distribution of cpu and leaving X at nice 0. > > Having tried re-nicing X a while back, and having the rest of the system > suffer in quite obvious ways for even 1 + or - from its default felt pretty > bad from this users perspective. > > It is my considered opinion (yeah I know, I'm just a leaf in the hurricane > of this list) that if X has to be re-niced from the 1 point advantage its > had for ages, then something is basicly wrong with the overall scheduling, > cpu or i/o, or both in combination. FWIW I'm using cfq for i/o. It's those who want X to have an unfair advantage that want it to do something "special". Your agreement that it works fine at nice 0 shows you don't want it to have an unfair advantage. Others who want it to have an unfair advantage _can_ renice it if they desire. But if the cpu scheduler gives X an unfair advantage within the kernel by default then you have _no_ choice. 
If you leave the choice up to userspace (renice or not) then both parties get their way. If you put it into the kernel, only one party wins and there is no way for the Genes (and Cons) of this world to get it back.

Your opinion is as valuable as everyone else's, Gene. It is hard to get people to speak on as frightening a playground as the linux kernel mailing list, so please do.

-- 
-ck
Re: Renice X for cpu schedulers
On 4/19/07, Gene Heskett <[EMAIL PROTECTED]> wrote:
> Having tried re-nicing X a while back, and having the rest of the system
> suffer in quite obvious ways for even 1 + or - from its default felt pretty
> bad from this users perspective.
>
> It is my considered opinion (yeah I know, I'm just a leaf in the hurricane
> of this list) that if X has to be re-niced from the 1 point advantage its
> had for ages, then something is basicly wrong with the overall scheduling,
> cpu or i/o, or both in combination. FWIW I'm using cfq for i/o.

I think I just realized why the X server is such a problem. If it gets preempted when it's not actually selecting/polling over a set of fds that includes the input devices, the scheduler doesn't know that it's a good candidate for scheduling when data arrives on those devices. (That's all that any of these dynamic priority heuristics really seem to do -- weight the scheduler towards switching to conspicuously I/O bound tasks when they become runnable, without the forced preemption on lock release that would result from a true priority inheritance mechanism.)

One way of looking at this is that "fairness-driven" scheduling is a poor man's priority ceiling protocol for I/O bound workloads, with the implicit priority of an fd or lock given by how desperately the reader side needs more data in order to accomplish anything. "Nice" on a task is sort of an indirect way of boosting or dropping the base priority of the fds it commonly waits on. I recognize this is a drastic oversimplification, and possibly even a misrepresentation of the design _intent_; but I think it's fairly accurate in terms of the design _effect_.

The event-driven, non-threaded design of the X server makes it particularly vulnerable to "non-interactive behavior" penalties, which is appropriate to the extent that it's an output device having trouble keeping up with rendering -- in fact, that's exactly the throttling mechanism you need in order to exert back-pressure on the X client.
(Trying to exert back-pressure over Linux's local domain sockets seems to be like pushing on a rope, but that's a different problem.) That same event-driven design would prioritize input events just fine -- except the scheduler won't wake the task in order to deliver them, because as far as it's concerned the X server is getting more than enough I/O to keep it busy. It's not only not blocked on the input device, it isn't even selecting on it at the moment that its timeslice expires -- so no amount of poor-man's PCP emulation is going to help. What "more negative nice on the X server than on any CPU-bound process" seems to do is to put the X server on a hair-trigger, boosting its dynamic priority in a render-limited scenario (on some graphics cards!) just enough to cancel the penalty for non-interactive behavior. It's forced to share _some_ CPU cycles, but nobody else is allowed a long enough timeslice to keep the X server off the CPU (and insensitive to input events) for long. Not terribly efficient in terms of context switch / cache eviction overhead, but certainly friendlier to the PEBCAK (who is clearly putting totally inappropriate load on a single-threaded CPU by running both a local X server and non-SCHED_BATCH compute jobs) than a frozen mouse cursor. So what's the right answer? Not special-casing the X server, that's for sure. If this analysis is correct (and as of now it's pure speculation), any event-driven application that does compute work opportunistically in the absence of user interaction is vulnerable to the same overzealous squelching. I wouldn't design a new application that way, of course -- user interaction belongs in a separate thread on any UNIX-legacy system which assigns priorities to threads of control instead of to patterns of activity. 
But all sorts of Linux applications have been designed to implicitly elevate artificial throughput benchmarks over user responsiveness -- that has been the UNIX way at least since SVR4, and Linux's history of expensive thread switches prior to NPTL didn't help. If you want responsiveness when the CPU is oversubscribed -- and I for one do, which is one reason why I abandoned the Linux desktop once both Microsoft and Apple figured out how to make hyperthreading work in their favor -- you should probably think about how to get it without rewriting half of userspace. IMHO, dinking around with "fairness", as if there were any relationship these days between UIDs or process groups or any other control structure and the work that's trying to flow through the system, is not going to get you there. If this were my problem, I might start by attaching urgency to behavior instead of to thread ID, which demands a scheduler queue built around a data structure with a cheap decrease-key operation. I'd figure out how to propagate this urgency not just along lock chains but also along chains of fds that need flushing (or refilling) -- even if the reader (or writer) got preempted for
Re: Renice X for cpu schedulers
On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> The one fly in the ointment for linux remains X. I am still, to this
> moment, completely and utterly stunned at why everyone is trying to find
> increasingly complex unique ways to manage X when all it needs is more
> cpu[1].

[...and hence should be reniced]

The problem is that X is not unique. There's postgresql, memcached, mysql, db2, a little embedded app I wrote... all of these perform work on behalf of another process. It's just most *noticeable* with X, as pretty much everyone is running that.

If we had some way for the scheduler to decide to donate part of a client process's time slice to the server it just spoke to (with an exponential dampening factor -- take 50% from the client, give 25% to the server, toss the rest on the floor), that -- from my naive point of view -- would be a step toward fixing the underlying issue. Or I might be spouting crap, who knows.

The problem is real, though, and not limited to X.

While I have the floor, thank you, Con, for all your work.

Ray
Re: Renice X for cpu schedulers
On Thursday 19 April 2007, Mark Lord wrote: >Con Kolivas wrote: >> On Thursday 19 April 2007 23:17, Mark Lord wrote: >>> Con Kolivas wrote: >>> s go ahead and think up great ideas for other ways of metering out cpu >>> bandwidth for different purposes, but for X, given the absurd simplicity of renicing, why keep fighting it? Again I reiterate that most users of SD have not found the need to renice X anyway except if they stick to old habits of make -j4 on uniprocessor and the like, and I expect that those on CFS and Nicksched would also have similar experiences. >>> >>> Just plain "make" (no -j2 or -j) is enough to kill interactivity >>> on my 2GHz P-M single-core non-HT machine with SD. >>> >>> But with the very first posted version of CFS by Ingo, >>> I can do "make -j2" no problem and still have a nicely interactive >>> destop. >> >> Cool. Then there's clearly a bug with SD that manifests on your machine as >> it should not have that effect at all (and doesn't on other people's >> machines). I suggest trying the latest version which fixes some bugs. > >SD just doesn't do nearly as good as the stock scheduler, or CFS, here. I found the early SD's much friendlier here, but I also think that at that point I was comparing SD to stock 2.6.21-rc5 and 6, and to say that it sucked would be a slight understatement. >I'm quite likely one of the few single-CPU/non-HT testers of this stuff. >If it should ever get more widely used I think we'd hear a lot more > complaints. I'm in that row of seats too Mark. Someday I have to build a new box, that's all there is to it... -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Lots of folks confuse bad management with destiny. 
-- Frank Hubbard
Re: Renice X for cpu schedulers
On Thursday 19 April 2007, Con Kolivas wrote: [and I snipped a good overview]
>So yes go ahead and think up great ideas for other ways of metering out cpu
>bandwidth for different purposes, but for X, given the absurd simplicity of
>renicing, why keep fighting it? Again I reiterate that most users of SD have
>not found the need to renice X anyway except if they stick to old habits of
>make -j4 on uniprocessor and the like, and I expect that those on CFS and
>Nicksched would also have similar experiences.

FWIW folks, I have never touched X's niceness; it's running at the default -1 for all of my so-called 'tests', and I have another set to be rebooted to right now. And yes, my kernel makeit script uses -j4 by default, and has used -j8 just for effect, which wasn't all that different from what I expected in 'abusing' a UP system that way. The system DID remain usable, not snappy, but usable.

Having tried re-nicing X a while back, and having the rest of the system suffer in quite obvious ways for even 1 + or - from its default, it felt pretty bad from this user's perspective.

It is my considered opinion (yeah I know, I'm just a leaf in the hurricane of this list) that if X has to be re-niced from the 1 point advantage it's had for ages, then something is basically wrong with the overall scheduling, cpu or i/o, or both in combination. FWIW I'm using cfq for i/o.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author)
Moore's Constant: Everybody sets out to do something, and everybody does something, but no one does what he sets out to do.
Re: Renice X for cpu schedulers
Con Kolivas wrote:
> On Thursday 19 April 2007 23:17, Mark Lord wrote:
>> Con Kolivas wrote:
>>> So yes go ahead and think up great ideas for other ways of metering out
>>> cpu bandwidth for different purposes, but for X, given the absurd
>>> simplicity of renicing, why keep fighting it? Again I reiterate that
>>> most users of SD have not found the need to renice X anyway except if
>>> they stick to old habits of make -j4 on uniprocessor and the like, and
>>> I expect that those on CFS and Nicksched would also have similar
>>> experiences.
>>
>> Just plain "make" (no -j2 or -j) is enough to kill interactivity
>> on my 2GHz P-M single-core non-HT machine with SD.
>>
>> But with the very first posted version of CFS by Ingo,
>> I can do "make -j2" no problem and still have a nicely interactive
>> desktop.
>
> Cool. Then there's clearly a bug with SD that manifests on your machine as
> it should not have that effect at all (and doesn't on other people's
> machines). I suggest trying the latest version which fixes some bugs.

SD just doesn't do nearly as well as the stock scheduler, or CFS, here.

I'm quite likely one of the few single-CPU/non-HT testers of this stuff. If it should ever get more widely used I think we'd hear a lot more complaints.

Cheers
Re: Renice X for cpu schedulers
Con Kolivas wrote: Ok, there are 3 known schedulers currently being "promoted" as solid replacements for the mainline scheduler which address most of the issues with mainline (and about 10 other ones not currently being promoted). The main way they do this is through attempting to maintain solid fairness. There is enough evidence mounting now from the numerous test cases fixed by much fairer designs that this is the way forward for a general purpose cpu scheduler which is what linux needs. Interactivity of just about everything that needs low latency (ie audio and video players) are easily managed by maintaining low latency between wakeups and scheduling of all these low cpu users. On a "fair" scheduler these will all get high priority (and good response) because their CPU bandwidth usage will be much smaller than their entitlement and the scheduler will be trying to help them "catch up". So (as you say) they shouldn't be a problem. The one fly in the ointment for linux remains X. I am still, to this moment, completely and utterly stunned at why everyone is trying to find increasingly complex unique ways to manage X when all it needs is more cpu[1]. Now most of these are actually very good ideas about _extra_ features that would be desirable in the long run for linux, but given the ludicrous simplicity of renicing X I cannot fathom why people keep promoting these alternatives. At the time of 2.6.0 coming out we were desparately trying to get half decent interactivity within a reasonable time frame to release 2.6.0 without rewiring the whole scheduler. So I tweaked the crap out of the tunables that were already there[2]. X's needs are more complex than that (from my observations) in that the part of X that processes input doesn't use much CPU but the part that does output can be quite a heavy user of CPU (e.g. do a "ls -lR /" in an xterm and watch X chew up the CPU). 
At the same time, the part of X that processes input needs quick responsiveness as it's part of the interactive chain where this is less so for the output part. Where X comes unstuck in the current scheduler is that when the output part goes on one of its CPU storms it ceases to look like an interactive task and gets given lower priority. Ironically, this doesn't effect the output part of X but it does effect the input part and is manifest as crappy interactive response. One wonders whether modifying X so that it has two threads: one for output and one for input; that could be scheduled separately might help. I guess it would depend on whether there is insufficient independence between the two halves. Part of this issue is that giving X a high static priority runs the risk of the CPU hog output part disrupting scheduling of other important tasks. So don't give it too big a boost. So let's hear from the 3 people who generated the schedulers under the spotlight. These are recent snippets and by no means the only time these comments have been said. Without sounding too bold, we do know a thing or two about scheduling. CFS: On Thursday 19 April 2007 16:38, Ingo Molnar wrote: h. How about the following then: default to nice -10 for all (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ special: root already has disk space reserved to it, root has special memory allocation allowances, etc. I dont see a reason why we couldnt by default make all root tasks have nice -10. This would be instantly loved by sysadmins i suspect ;-) It's worth noting that the -10 mentioned is roughly equivalent (in the old scheduler) to restoring interactive task status to X in those cases where it loses it due to a CPU storm in its output part. (distros that go the extra mile of making Xorg run under non-root could also go another extra one foot to renice that X server to -10.) 
Nicksched: On Wednesday 18 April 2007 15:00, Nick Piggin wrote: What's wrong with allowing X to get more than it's fair share of CPU time by "fiddling with nice levels"? That's what they're there for. and Staircase-Deadline: On Thursday 19 April 2007 09:59, Con Kolivas wrote: Remember to renice X to -10 for nicest desktop behaviour :) I'd like to add the EBS scheduler (posted by Aurema Pty Ltd a couple of years back) to this list as it also recommended running X at nice -5 to -10. Also some of the "interactive bonus" mechanisms in my SPA schedulers could be removed if X was reniced. In fact, with a reniced X the spa_svr (server oriented scheduler which attempts to minimise the time tasks spend on the queue waiting for CPU access and which doesn't have interactive bonuses) might be usable on a work station. [1]The one caveat I can think of is that when you share X sessions across multiple users -with a fair cpu scheduler-, having them all nice 0 also makes the distribution of cpu across the multiple users very even and smooth, without the expense of burning away the other person's cpu
Re: Renice X for cpu schedulers
On Thursday 19 April 2007 23:17, Mark Lord wrote:
> Con Kolivas wrote:
> > So yes go ahead and think up great ideas for other ways of metering out
> > cpu bandwidth for different purposes, but for X, given the absurd
> > simplicity of renicing, why keep fighting it? Again I reiterate that most
> > users of SD have not found the need to renice X anyway except if they
> > stick to old habits of make -j4 on uniprocessor and the like, and I
> > expect that those on CFS and Nicksched would also have similar
> > experiences.
>
> Just plain "make" (no -j2 or -j) is enough to kill interactivity
> on my 2GHz P-M single-core non-HT machine with SD.
>
> But with the very first posted version of CFS by Ingo,
> I can do "make -j2" no problem and still have a nicely interactive desktop.

Cool. Then there's clearly a bug with SD that manifests on your machine as it should not have that effect at all (and doesn't on other people's machines). I suggest trying the latest version which fixes some bugs.

Thanks.

-- 
-ck
Re: Renice X for cpu schedulers
On 4/19/07, Peter Williams <[EMAIL PROTECTED]> wrote:
> PS I think that the tasks most likely to be adversely effected by X's CPU
> storms (enough to annoy the user) are audio streamers so when you're doing
> tests to determine the best nice value for X I suggest that would be a
> good criterion. Video streamers are also susceptible but glitches in video
> don't seem to annoy users as much as audio ones.

IMHO audio streamers should use a SCHED_FIFO thread for time critical work. I think it's insane to expect the scheduler to figure out that these processes need low latency when they can just be explicit about it. "Professional" audio software does it already, on Linux as well as other OSes...

Lee
Re: Renice X for cpu schedulers
Peter Williams wrote: Con Kolivas wrote: Ok, there are 3 known schedulers currently being "promoted" as solid replacements for the mainline scheduler which address most of the issues with mainline (and about 10 other ones not currently being promoted). The main way they do this is through attempting to maintain solid fairness. There is enough evidence mounting now from the numerous test cases fixed by much fairer designs that this is the way forward for a general purpose cpu scheduler which is what linux needs. Interactivity of just about everything that needs low latency (ie audio and video players) are easily managed by maintaining low latency between wakeups and scheduling of all these low cpu users. On a "fair" scheduler these will all get high priority (and good response) because their CPU bandwidth usage will be much smaller than their entitlement and the scheduler will be trying to help them "catch up". So (as you say) they shouldn't be a problem. The one fly in the ointment for linux remains X. I am still, to this moment, completely and utterly stunned at why everyone is trying to find increasingly complex unique ways to manage X when all it needs is more cpu[1]. Now most of these are actually very good ideas about _extra_ features that would be desirable in the long run for linux, but given the ludicrous simplicity of renicing X I cannot fathom why people keep promoting these alternatives. At the time of 2.6.0 coming out we were desparately trying to get half decent interactivity within a reasonable time frame to release 2.6.0 without rewiring the whole scheduler. So I tweaked the crap out of the tunables that were already there[2]. X's needs are more complex than that (from my observations) in that the part of X that processes input doesn't use much CPU but the part that does output can be quite a heavy user of CPU (e.g. do a "ls -lR /" in an xterm and watch X chew up the CPU). 
At the same time, the part of X that processes input needs quick responsiveness as it's part of the interactive chain, where this is less so for the output part. Where X comes unstuck in the current scheduler is that when the output part goes on one of its CPU storms it ceases to look like an interactive task and gets given lower priority. Ironically, this doesn't affect the output part of X but it does affect the input part and is manifest as crappy interactive response. One wonders whether modifying X so that it has two threads -- one for output and one for input -- that could be scheduled separately might help. I guess it would depend on whether there is insufficient independence between the two halves.

I forgot to make my point here, and that was that if X could be split in two, neither half would need to be reniced. As a very low CPU bandwidth user the input half would get along just fine like the other interactive tasks that you mention. And the output part isn't adversely affected by not having a boost, so it would get along just fine as well, and you don't want it having a boost when it's in a CPU storm anyway. Of course, this won't work if the interdependence between the two halves is such that the equivalent of priority inversion occurs between the two threads. However, that might be solved by making the division between the two halves on a dimension other than the input/output one.

Peter

PS I think that the tasks most likely to be adversely affected by X's CPU storms (enough to annoy the user) are audio streamers, so when you're doing tests to determine the best nice value for X I suggest that would be a good criterion. Video streamers are also susceptible but glitches in video don't seem to annoy users as much as audio ones.

-- 
Peter Williams [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Renice X for cpu schedulers
Con Kolivas wrote:
> So yes go ahead and think up great ideas for other ways of metering out
> cpu bandwidth for different purposes, but for X, given the absurd
> simplicity of renicing, why keep fighting it? Again I reiterate that most
> users of SD have not found the need to renice X anyway except if they
> stick to old habits of make -j4 on uniprocessor and the like, and I expect
> that those on CFS and Nicksched would also have similar experiences.

Just plain "make" (no -j2 or -j) is enough to kill interactivity on my 2GHz
P-M single-core non-HT machine with SD. But with the very first posted
version of CFS by Ingo, I can do "make -j2" no problem and still have a
nicely interactive desktop.

-ml
Re: Renice X for cpu schedulers
On 4/19/07, Peter Williams [EMAIL PROTECTED] wrote:
> PS I think that the tasks most likely to be adversely affected by X's CPU
> storms (enough to annoy the user) are audio streamers, so when you're
> doing tests to determine the best nice value for X I suggest that would be
> a good criterion. Video streamers are also susceptible, but glitches in
> video don't seem to annoy users as much as audio ones.

IMHO audio streamers should use a SCHED_FIFO thread for time critical work.
I think it's insane to expect the scheduler to figure out that these
processes need low latency when they can just be explicit about it.
Professional audio software does it already, on Linux as well as other
OSes...

Lee
Re: Renice X for cpu schedulers
On Thursday 19 April 2007 23:17, Mark Lord wrote:
> Con Kolivas wrote:
>> So yes go ahead and think up great ideas for other ways of metering out
>> cpu bandwidth for different purposes, but for X, given the absurd
>> simplicity of renicing, why keep fighting it? Again I reiterate that most
>> users of SD have not found the need to renice X anyway except if they
>> stick to old habits of make -j4 on uniprocessor and the like, and I
>> expect that those on CFS and Nicksched would also have similar
>> experiences.
>
> Just plain make (no -j2 or -j) is enough to kill interactivity on my 2GHz
> P-M single-core non-HT machine with SD. But with the very first posted
> version of CFS by Ingo, I can do make -j2 no problem and still have a
> nicely interactive desktop.

Cool. Then there's clearly a bug with SD that manifests on your machine, as
it should not have that effect at all (and doesn't on other people's
machines). I suggest trying the latest version, which fixes some bugs.
Thanks.

--
-ck
Re: Renice X for cpu schedulers
Con Kolivas wrote:
> Ok, there are 3 known schedulers currently being promoted as solid
> replacements for the mainline scheduler which address most of the issues
> with mainline (and about 10 other ones not currently being promoted). The
> main way they do this is through attempting to maintain solid fairness.
> There is enough evidence mounting now from the numerous test cases fixed
> by much fairer designs that this is the way forward for a general purpose
> cpu scheduler, which is what linux needs. Interactivity of just about
> everything that needs low latency (ie audio and video players) is easily
> managed by maintaining low latency between wakeups and scheduling of all
> these low cpu users.

On a fair scheduler these will all get high priority (and good response)
because their CPU bandwidth usage will be much smaller than their
entitlement and the scheduler will be trying to help them catch up. So (as
you say) they shouldn't be a problem.

> The one fly in the ointment for linux remains X. I am still, to this
> moment, completely and utterly stunned at why everyone is trying to find
> increasingly complex unique ways to manage X when all it needs is more
> cpu[1]. Now most of these are actually very good ideas about _extra_
> features that would be desirable in the long run for linux, but given the
> ludicrous simplicity of renicing X I cannot fathom why people keep
> promoting these alternatives. At the time of 2.6.0 coming out we were
> desperately trying to get half decent interactivity within a reasonable
> time frame to release 2.6.0 without rewiring the whole scheduler. So I
> tweaked the crap out of the tunables that were already there[2].

X's needs are more complex than that (from my observations) in that the
part of X that processes input doesn't use much CPU, but the part that does
output can be quite a heavy user of CPU (e.g. do a ls -lR / in an xterm and
watch X chew up the CPU). At the same time, the part of X that processes
input needs quick responsiveness as it's part of the interactive chain,
whereas this is less so for the output part. Where X comes unstuck in the
current scheduler is that when the output part goes on one of its CPU
storms it ceases to look like an interactive task and gets given lower
priority. Ironically, this doesn't affect the output part of X but it does
affect the input part, and it is manifest as crappy interactive response.

One wonders whether modifying X so that it has two threads, one for output
and one for input, that could be scheduled separately might help. I guess
it would depend on whether there is sufficient independence between the two
halves.

Part of this issue is that giving X a high static priority runs the risk of
the CPU hog output part disrupting scheduling of other important tasks. So
don't give it too big a boost.

> So let's hear from the 3 people who generated the schedulers under the
> spotlight. These are recent snippets and by no means the only time these
> comments have been said. Without sounding too bold, we do know a thing or
> two about scheduling.
>
> CFS:
> On Thursday 19 April 2007 16:38, Ingo Molnar wrote:
>> h. How about the following then: default to nice -10 for all
>> (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_
>> special: root already has disk space reserved to it, root has special
>> memory allocation allowances, etc. I dont see a reason why we couldnt by
>> default make all root tasks have nice -10. This would be instantly loved
>> by sysadmins i suspect ;-)

It's worth noting that the -10 mentioned is roughly equivalent (in the old
scheduler) to restoring interactive task status to X in those cases where
it loses it due to a CPU storm in its output part.

>> (distros that go the extra mile of making Xorg run under non-root could
>> also go another extra one foot to renice that X server to -10.)
>
> Nicksched:
> On Wednesday 18 April 2007 15:00, Nick Piggin wrote:
>> What's wrong with allowing X to get more than it's fair share of CPU time
>> by fiddling with nice levels? That's what they're there for.
>
> and Staircase-Deadline:
> On Thursday 19 April 2007 09:59, Con Kolivas wrote:
>> Remember to renice X to -10 for nicest desktop behaviour :)

I'd like to add the EBS scheduler (posted by Aurema Pty Ltd a couple of
years back) to this list, as it also recommended running X at nice -5 to
-10. Also, some of the interactive bonus mechanisms in my SPA schedulers
could be removed if X was reniced. In fact, with a reniced X the spa_svr
(server oriented scheduler which attempts to minimise the time tasks spend
on the queue waiting for CPU access and which doesn't have interactive
bonuses) might be usable on a work station.

> [1]The one caveat I can think of is that when you share X sessions across
> multiple users -with a fair cpu scheduler-, having them all nice 0 also
> makes the distribution of cpu across the multiple users very even and
> smooth, without the expense of burning away the other person's cpu time
> they'd
Re: Renice X for cpu schedulers
Con Kolivas wrote:
> On Thursday 19 April 2007 23:17, Mark Lord wrote:
>> Just plain make (no -j2 or -j) is enough to kill interactivity on my 2GHz
>> P-M single-core non-HT machine with SD. But with the very first posted
>> version of CFS by Ingo, I can do make -j2 no problem and still have a
>> nicely interactive desktop.
>
> Cool. Then there's clearly a bug with SD that manifests on your machine,
> as it should not have that effect at all (and doesn't on other people's
> machines). I suggest trying the latest version, which fixes some bugs.

SD just doesn't do nearly as well as the stock scheduler, or CFS, here.

I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
If it should ever get more widely used, I think we'd hear a lot more
complaints.

Cheers
Re: Renice X for cpu schedulers
On Thursday 19 April 2007, Con Kolivas wrote:
[and I snipped a good overview]
> So yes go ahead and think up great ideas for other ways of metering out
> cpu bandwidth for different purposes, but for X, given the absurd
> simplicity of renicing, why keep fighting it? Again I reiterate that most
> users of SD have not found the need to renice X anyway except if they
> stick to old habits of make -j4 on uniprocessor and the like, and I expect
> that those on CFS and Nicksched would also have similar experiences.

FWIW folks, I have never touched X's niceness; it's running at the default
-1 for all of my so-called 'tests', and I have another set to be rebooted
to right now. And yes, my kernel makeit script uses -j4 by default, and has
used -j8 just for effects, which weren't all that different from what I
expected in 'abusing' a UP system that way. The system DID remain usable,
not snappy, but usable.

Having tried re-nicing X a while back, and having the rest of the system
suffer in quite obvious ways for even 1 + or - from its default, it felt
pretty bad from this user's perspective. It is my considered opinion (yeah
I know, I'm just a leaf in the hurricane of this list) that if X has to be
re-niced from the 1 point advantage it's had for ages, then something is
basically wrong with the overall scheduling, cpu or i/o, or both in
combination. FWIW I'm using cfq for i/o.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Moore's Constant: Everybody sets out to do something, and everybody does
something, but no one does what he sets out to do.
Re: Renice X for cpu schedulers
On Thursday 19 April 2007, Mark Lord wrote:
> Con Kolivas wrote:
>> Cool. Then there's clearly a bug with SD that manifests on your machine,
>> as it should not have that effect at all (and doesn't on other people's
>> machines). I suggest trying the latest version, which fixes some bugs.
>
> SD just doesn't do nearly as well as the stock scheduler, or CFS, here.

I found the early SD's much friendlier here, but I also think that at that
point I was comparing SD to stock 2.6.21-rc5 and 6, and to say that it
sucked would be a slight understatement.

> I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
> If it should ever get more widely used, I think we'd hear a lot more
> complaints.

I'm in that row of seats too, Mark. Someday I have to build a new box,
that's all there is to it...

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Lots of folks confuse bad management with destiny.
 -- Frank Hubbard
Re: Renice X for cpu schedulers
On 4/19/07, Con Kolivas [EMAIL PROTECTED] wrote:
> The one fly in the ointment for linux remains X. I am still, to this
> moment, completely and utterly stunned at why everyone is trying to find
> increasingly complex unique ways to manage X when all it needs is more
> cpu[1].
[...and hence should be reniced]

The problem is that X is not unique. There's postgresql, memcached, mysql,
db2, a little embedded app I wrote... all of these perform work on behalf
of another process. It's just most *noticeable* with X, as pretty much
everyone is running that.

If we had some way for the scheduler to decide to donate part of a client
process's time slice to the server it just spoke to (with an exponential
dampening factor -- take 50% from the client, give 25% to the server, toss
the rest on the floor), that -- from my naive point of view -- would be a
step toward fixing the underlying issue. Or I might be spouting crap, who
knows.

The problem is real, though, and not limited to X.

While I have the floor, thank you, Con, for all your work.

Ray
Re: Renice X for cpu schedulers
On 4/19/07, Gene Heskett [EMAIL PROTECTED] wrote:
> Having tried re-nicing X a while back, and having the rest of the system
> suffer in quite obvious ways for even 1 + or - from its default, it felt
> pretty bad from this user's perspective. It is my considered opinion (yeah
> I know, I'm just a leaf in the hurricane of this list) that if X has to be
> re-niced from the 1 point advantage it's had for ages, then something is
> basically wrong with the overall scheduling, cpu or i/o, or both in
> combination. FWIW I'm using cfq for i/o.

I think I just realized why the X server is such a problem. If it gets
preempted when it's not actually selecting/polling over a set of fds that
includes the input devices, the scheduler doesn't know that it's a good
candidate for scheduling when data arrives on those devices. (That's all
that any of these dynamic priority heuristics really seem to do -- weight
the scheduler towards switching to conspicuously I/O bound tasks when they
become runnable, without the forced preemption on lock release that would
result from a true priority inheritance mechanism.)

One way of looking at this is that fairness-driven scheduling is a poor
man's priority ceiling protocol for I/O bound workloads, with the implicit
priority of an fd or lock given by how desperately the reader side needs
more data in order to accomplish anything. Nice on a task is sort of an
indirect way of boosting or dropping the base priority of the fds it
commonly waits on. I recognize this is a drastic oversimplification, and
possibly even a misrepresentation of the design _intent_; but I think it's
fairly accurate in terms of the design _effect_.

The event-driven, non-threaded design of the X server makes it
particularly vulnerable to non-interactive behavior penalties, which is
appropriate to the extent that it's an output device having trouble keeping
up with rendering -- in fact, that's exactly the throttling mechanism you
need in order to exert back-pressure on the X client. (Trying to exert
back-pressure over Linux's local domain sockets seems to be like pushing on
a rope, but that's a different problem.)

That same event-driven design would prioritize input events just fine --
except the scheduler won't wake the task in order to deliver them, because
as far as it's concerned the X server is getting more than enough I/O to
keep it busy. It's not only not blocked on the input device, it isn't even
selecting on it at the moment that its timeslice expires -- so no amount of
poor-man's PCP emulation is going to help.

What a more negative nice on the X server than on any CPU-bound process
seems to do is to put the X server on a hair-trigger, boosting its dynamic
priority in a render-limited scenario (on some graphics cards!) just enough
to cancel the penalty for non-interactive behavior. It's forced to share
_some_ CPU cycles, but nobody else is allowed a long enough timeslice to
keep the X server off the CPU (and insensitive to input events) for long.
Not terribly efficient in terms of context switch / cache eviction
overhead, but certainly friendlier to the PEBCAK (who is clearly putting
totally inappropriate load on a single-threaded CPU by running both a local
X server and non-SCHED_BATCH compute jobs) than a frozen mouse cursor.

So what's the right answer? Not special-casing the X server, that's for
sure. If this analysis is correct (and as of now it's pure speculation),
any event-driven application that does compute work opportunistically in
the absence of user interaction is vulnerable to the same overzealous
squelching. I wouldn't design a new application that way, of course -- user
interaction belongs in a separate thread on any UNIX-legacy system which
assigns priorities to threads of control instead of to patterns of
activity.

But all sorts of Linux applications have been designed to implicitly
elevate artificial throughput benchmarks over user responsiveness -- that
has been the UNIX way at least since SVR4, and Linux's history of expensive
thread switches prior to NPTL didn't help. If you want responsiveness when
the CPU is oversubscribed -- and I for one do, which is one reason why I
abandoned the Linux desktop once both Microsoft and Apple figured out how
to make hyperthreading work in their favor -- you should probably think
about how to get it without rewriting half of userspace.

IMHO, dinking around with fairness, as if there were any relationship these
days between UIDs or process groups or any other control structure and the
work that's trying to flow through the system, is not going to get you
there. If this were my problem, I might start by attaching urgency to
behavior instead of to thread ID, which demands a scheduler queue built
around a data structure with a cheap decrease-key operation. I'd figure out
how to propagate this urgency not just along lock chains but also along
chains of fds that need flushing (or refilling) -- even if the reader (or
writer) got preempted for unrelated reasons.
Re: Renice X for cpu schedulers
On Friday 20 April 2007 04:16, Gene Heskett wrote:
> On Thursday 19 April 2007, Con Kolivas wrote:
> [and I snipped a good overview]
>> So yes go ahead and think up great ideas for other ways of metering out
>> cpu bandwidth for different purposes, but for X, given the absurd
>> simplicity of renicing, why keep fighting it? Again I reiterate that most
>> users of SD have not found the need to renice X anyway except if they
>> stick to old habits of make -j4 on uniprocessor and the like, and I
>> expect that those on CFS and Nicksched would also have similar
>> experiences.
>
> FWIW folks, I have never touched X's niceness; it's running at the default
> -1 for all of my so-called 'tests', and I have another set to be rebooted
> to right now. And yes, my kernel makeit script uses -j4 by default, and
> has used -j8 just for effects, which weren't all that different from what
> I expected in 'abusing' a UP system that way. The system DID remain
> usable, not snappy, but usable.

Gene, you're agreeing with me. You've shown that you're very happy with a
fair distribution of cpu and leaving X at nice 0.

> Having tried re-nicing X a while back, and having the rest of the system
> suffer in quite obvious ways for even 1 + or - from its default, it felt
> pretty bad from this user's perspective. It is my considered opinion (yeah
> I know, I'm just a leaf in the hurricane of this list) that if X has to be
> re-niced from the 1 point advantage it's had for ages, then something is
> basically wrong with the overall scheduling, cpu or i/o, or both in
> combination. FWIW I'm using cfq for i/o.

It's those who want X to have an unfair advantage that want it to do
something special. Your agreement that it works fine at nice 0 shows you
don't want it to have an unfair advantage. Others who want it to have an
unfair advantage _can_ renice it if they desire. But if the cpu scheduler
gives X an unfair advantage within the kernel by default, then you have
_no_ choice. If you leave the choice up to userspace (renice or not) then
both parties get their way. If you put it into the kernel, only one party
wins and there is no way for the Genes (and Cons) of this world to get it
back.

Your opinion is as valuable as everyone else's, Gene. It is hard to get
people to speak on as frightening a playground as the linux kernel mailing
list, so please do.

--
-ck
Re: Renice X for cpu schedulers
On Friday 20 April 2007 05:26, Ray Lee wrote:
> On 4/19/07, Con Kolivas [EMAIL PROTECTED] wrote:
>> The one fly in the ointment for linux remains X. I am still, to this
>> moment, completely and utterly stunned at why everyone is trying to find
>> increasingly complex unique ways to manage X when all it needs is more
>> cpu[1].
> [...and hence should be reniced]
>
> The problem is that X is not unique. There's postgresql, memcached,
> mysql, db2, a little embedded app I wrote... all of these perform work on
> behalf of another process. It's just most *noticeable* with X, as pretty
> much everyone is running that.
>
> If we had some way for the scheduler to decide to donate part of a client
> process's time slice to the server it just spoke to (with an exponential
> dampening factor -- take 50% from the client, give 25% to the server,
> toss the rest on the floor), that -- from my naive point of view -- would
> be a step toward fixing the underlying issue. Or I might be spouting
> crap, who knows.
>
> The problem is real, though, and not limited to X.
>
> While I have the floor, thank you, Con, for all your work.

You're welcome, and thanks for taking the floor to speak.

I would say you have actually agreed with me, though. X is not unique, it's
just an obvious one, so let's not design the cpu scheduler around the
problem with X. The same goes for every other application. Leaving the
choice to hand out differential cpu usage where it seems to be needed
should be up to the users.

The donation idea has been done before in some fashion or other in things
like back-boost, which Linus himself tried in 2.5.X days. It worked lovely
till it did the wrong thing and wreaked havoc. As is shown repeatedly, the
workarounds and the tweaks and the bonuses and the decisions on who to give
an advantage to, when done by the cpu scheduler, are also what is its
undoing, as it can't always get it right. The consequences of getting it
wrong on the other hand are disastrous.

The cpu scheduler core is a cpu bandwidth and latency proportionator and
should be nothing more or less.

--
-ck
Re: Renice X for cpu schedulers
On Friday 20 April 2007 02:15, Mark Lord wrote:
> SD just doesn't do nearly as well as the stock scheduler, or CFS, here.
>
> I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
> If it should ever get more widely used, I think we'd hear a lot more
> complaints.

You are not really one of the few. A lot of my own work is done on a single
core pentium M 1.7GHz laptop. I am not endowed with truckloads of hardware
like all the paid developers are. I recall extreme frustration myself when
a developer a few years ago (around 2002) said he couldn't reproduce poor
behaviour on his 4GB ram 4 x Xeon machine. Even today, if I add up every
machine I have in my house and work at my disposal, it doesn't amount to
that many cpus and that much ram.

--
-ck
Re: Renice X for cpu schedulers
On 4/19/07, Con Kolivas [EMAIL PROTECTED] wrote: The cpu scheduler core is a cpu bandwidth and latency proportionator and should be nothing more or less. Not really. The CPU scheduler is (or ought to be) what electric utilities call an economic dispatch mechanism -- a real-time controller whose goal is to service competing demands cost-effectively from a limited supply, without compromising system stability. If you live in the 1960's, coal and nuclear (and a little bit of fig-leaf hydro) are all you have, it takes you twelve hours to bring plants on and off line, and there's no live operational control or pricing signal between you and your customers. So you're stuck running your system at projected peak + operating margin, dumping excess power as waste heat most of the time, and browning or blacking people out willy-nilly when there's excess demand. Maybe you get to trade off shedding the loads with the worst transmission efficiency against degrading the customers with the most tolerance for brownouts (or the least regulatory clout). That's life without modern economic dispatch. If you live in 2007, natural gas and (outside the US) better control over nuclear plants give you more ability to ramp supply up and down with demand on something like a 15-minute cycle. Better yet, you can store a little energy in the grid to smooth out instantaneous demand fluctuations; if you're lucky, you also have enough fast-twitch hydro (thanks, Canada!) that you can run your coal and lame-ass nuclear very close to base load even when gas is expensive, and even pump water back uphill when demand dips. (Coal is nasty stuff and a worse contributor by far to radiation exposure than nuclear generation; but on current trends it's going to last a lot longer than oil and gas, and it's a lot easier to stockpile next to the generator.) 
Best of all, you have industrial customers who will trade you live control (within limits) over when and how much power they take in return for a lower price per unit energy. Some of them will even dump power back into the grid when you ask them to. So now the biggest challenge in making supply and demand meet (in the short term) is to damp all the different ways that a control feedback path might result in an oscillation -- or in runaway pricing. Because there's always some asshole greedhead who will gamble with system stability in order to game the pricing mechanism. Lots of 'em, if you're in California and your legislature is so dumb, or so bought, that they let the asshole greedheads design the whole system so they can game it to the max. (But that's a whole 'nother rant.)

Embedded systems are already in 2007, and the mainline Linux scheduler frankly sucks on them, because it thinks it's back in the 1960's with a fixed supply and captive demand, pissing away CPU bandwidth as waste heat. Not to say it's an easy problem; even academics with a dozen publications in this area don't seem to be able to model energy usage to the nearest big O, let alone design a stable economic dispatch engine. But it helps to acknowledge what the problem is: even in a 1960's raised-floor screaming-air-conditioners screw-the-power-bill machine room, you can't actually run a half-decent CPU flat out any more without burning it to a crisp.

You can act ignorant and let the PMIC brown you out when it has to. Or you can start coping in mainline the way that organizations big enough (and smart enough) to feel the heat in their pocketbooks do in their pet kernels. (Boo on Google for not sharing, and props to IBM for doing their damnedest.) And guess what? The system will actually get simpler, and stabler, and faster, and easier to maintain, because it'll be based on a real theory of operation with equations and things instead of a bunch of opaque, undocumented shotgun heuristics.
This hypothetical economic-dispatch scheduler will still _have_ heuristics, of course -- you can't begin to model a modern CPU accurately on-line. But they will be contained in _data_ rather than _code_, and issues of numerical stability will be separated cleanly from the rule set. You'll be able to characterize the rule set's domain of stability, given a conservative set of assumptions about the feedback paths in the system under control, with the sort of techniques they teach in the engineering schools that none of us (me included) seem to have attended. (I went to school thinking I was going to be a physicist. Wishful thinking -- but I was young and stupid. What's your excuse? ;-)

OK, it feels better to have that off my chest. Apologies to those readers -- doubtless the vast majority of LKML, including everyone else in this thread -- for whom it's irrelevant, pseudo-learned pontification with no patch attached. And my sincere thanks to Ingo, Con, and really everyone else CC'ed, without whom Linux wouldn't be as good as it is (really quite good, all things considered) and wouldn't contribute as much as it does to
Re: Renice X for cpu schedulers
Con Kolivas wrote:
> You're welcome and thanks for taking the floor to speak. I would say you
> have actually agreed with me though. X is not unique, it's just an obvious
> one, so let's not design the cpu scheduler around the problem with X. Same
> goes for every other application. Leaving the choice to hand out
> differential cpu usage when they seem to need it should be up to the
> users. The donation idea has been done before in some fashion or other in
> things like back-boost which Linus himself tried in 2.5.X days. It worked
> lovely till it did the wrong thing and wreaked havoc.

*nod* I know. I came to the party late, or I would have played with it back then. Perhaps you could correct me, but it seems his back-boost didn't do any dampening, which means the system could get into nasty capture scenarios, where two processes bouncing messages back and forth could take over the scheduler and starve out the rest. It seems pretty obvious in hindsight that something without exponential dampening would allow feedback loops.

Regardless, perhaps we are in agreement. I just don't like the idea of having to guess how much work postgresql is going to be doing on my client processes' behalf. Worse, I don't necessarily want it to have that -10 priority when it's going and updating statistics or whatnot, or any other housekeeping activity that shouldn't make a noticeable impact on the rest of the system. Worst, I'm leery of the idea that if I get its nice level wrong, I'm going to be affecting the overall throughput of the server. All of which are only hypothetical worries, granted.

Anyway, I'll shut up now. Thanks again for stickin' with it.

Ray
Re: Renice X for cpu schedulers
On Thursday 19 April 2007 12:15, Mark Lord wrote:
> Con Kolivas wrote:
> > On Thursday 19 April 2007 23:17, Mark Lord wrote:
> > > Con Kolivas wrote:
> > > > So yes go ahead and think up great ideas for other ways of metering
> > > > out cpu bandwidth for different purposes, but for X, given the absurd
> > > > simplicity of renicing, why keep fighting it? Again I reiterate that
> > > > most users of SD have not found the need to renice X anyway except if
> > > > they stick to old habits of make -j4 on uniprocessor and the like, and
> > > > I expect that those on CFS and Nicksched would also have similar
> > > > experiences.
> > > Just plain make (no -j2 or -j) is enough to kill interactivity on my
> > > 2GHz P-M single-core non-HT machine with SD.
> > > But with the very first posted version of CFS by Ingo, I can do make
> > > -j2 no problem and still have a nicely interactive desktop.
> > Cool. Then there's clearly a bug with SD that manifests on your machine
> > as it should not have that effect at all (and doesn't on other people's
> > machines). I suggest trying the latest version which fixes some bugs.
> SD just doesn't do nearly as good as the stock scheduler, or CFS, here.
> I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
> If it should ever get more widely used I think we'd hear a lot more
> complaints.

amd64 UP here. SD with several makes running works just fine.

Ed Tomlinson
Re: Renice X for cpu schedulers
On Thu, 19 Apr 2007, Ed Tomlinson wrote:
> > SD just doesn't do nearly as good as the stock scheduler, or CFS, here.
> > I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
> > If it should ever get more widely used I think we'd hear a lot more
> > complaints.
>
> amd64 UP here. SD with several makes running works just fine.

The thing is, it probably depends *heavily* on just how much work the X server ends up doing. Fast video hardware? The X server doesn't need to busy-wait much. Not a lot of eye-candy? The X server is likely fast enough even with a slower card that it still gets sufficient CPU time and isn't getting dinged by any balancing. DRI vs non-DRI? Which window manager (maybe some of the user-visible lags come from there..) etc etc.

Anyway, I'd ask people to look a bit at the current *regressions* instead of spending all their time on something that won't even be merged before 2.6.21 is released, and we thus have some more pressing issues. Please?

Linus
Re: Renice X for cpu schedulers
On 4/19/07, Lee Revell <[EMAIL PROTECTED]> wrote:
> IMHO audio streamers should use SCHED_FIFO thread for time critical work.
> I think it's insane to expect the scheduler to figure out that these
> processes need low latency when they can just be explicit about it.
> Professional audio software does it already, on Linux as well as other
> OS...

It is certainly true that SCHED_FIFO is currently necessary in the layers of an audio application lying closest to the hardware, if you don't want to throw a monstrous hardware ring buffer at the problem. See the alsa-devel archives for a patch to aplay (sched_setscheduler plus some cleanups) that converts it from unsafe at any speed (on a non-RT kernel) to a rock-solid 18ms round trip from PCM in to PCM out. (The hardware and driver aren't terribly exotic for an SoC, and the measurement was done with aplay -C | aplay -P -- on a not-particularly-tuned CONFIG_PREEMPT kernel with a 12ms+ peak scheduling latency according to cyclictest. A similar test via /dev/dsp, done through a slightly modified OSS emulation layer to the same driver, measures at 40ms and is probably tuned too conservatively.)

Note that SCHED_FIFO may be less necessary on an -rt kernel, but I haven't had that option on the embedded hardware I've been working with lately. Ingo, please please pretty please pick a -stable branch one of these days and provide a git repo with -rt integrated against that branch. Then I could port our chip support to it -- all of which will be GPLed after the impending code review -- after which I might have a prayer of strong-arming our chip vendor into porting their WiFi driver onto -rt. It's really a much more interesting scheduler use case than make -j200 under X, because it's a best-effort SCHED_BATCH-ish load that wants to be temporally clustered for power management reasons. (Believe it or not, a stable -rt branch with a clock-scaling-aware scheduler is the one thing that might lead to this major WiFi vendor's GPLing their driver core.
They're starting to see the light on the biz dev side, and the nature of the devices their chip will go in makes them somewhat less concerned about the regulatory fig leaf aspect of a closed-source driver; but they would have to port off of the third-party real-time executive embedded within the driver, and mainline's task and timer granularity won't cut it. I can't even get more detail about _why_ it won't cut it unless there's some remotely supportable -rt base they could port to.)

But I think SCHED_FIFO on a chain of tasks is fundamentally not the right way to handle low audio latency. The object with a low latency requirement isn't the task, it's the device. When it's starting to get urgent to deliver more data to the device, the task that it's waiting on should slide up the urgency scale; and if it's waiting on something else, that something else should slide up the scale; and so forth. Similarly, responding to user input is urgent; so when user input is available (by whatever mechanism), the task that's waiting for it should slide up the urgency scale, etc.

In practice, you probably don't want to burden desktop Linux with priority inheritance where you don't have to. Priority queues with algorithmically efficient decrease-key operations (Fibonacci heaps and their ilk) are complicated to implement and have correspondingly high constant factors. (However, a sufficiently clever heuristic for assigning quasi-static task priorities would usually short-circuit the priority cascade; if you can keep N small in the tasks-with-unpredictable-priority queue, you can probably use a simpler flavor with O(log N) decrease-key. Ask someone who knows more about data structures than I do.)

More importantly, non-real-time application coders aren't very smart about grouping data structure accesses on one side or the other of a system call that is likely to release a lock and let something else run, flushing application data out of cache.
(Kernel coders aren't always smart about this either; see LKML threads a few weeks ago about racy, and cache-stall-prone, f_pos handling in VFS.) So switching tasks immediately on lock release is usually the wrong thing to do if letting the task run a little longer would allow it to reach a point where it has to block anyway.

Anyway, I already described the urgency-driven strategy to the extent that I've thought it out, elsewhere in this thread. I only held this draft back because I wanted to double-check my latency measurements.

Cheers,
- Michael
Re: Renice X for cpu schedulers
On Thursday 19 April 2007, Con Kolivas wrote:
> On Friday 20 April 2007 04:16, Gene Heskett wrote:
> > On Thursday 19 April 2007, Con Kolivas wrote:
> > [and I snipped a good overview]
> > > So yes go ahead and think up great ideas for other ways of metering
> > > out cpu bandwidth for different purposes, but for X, given the absurd
> > > simplicity of renicing, why keep fighting it? Again I reiterate that
> > > most users of SD have not found the need to renice X anyway except if
> > > they stick to old habits of make -j4 on uniprocessor and the like, and
> > > I expect that those on CFS and Nicksched would also have similar
> > > experiences.
> > FWIW folks, I have never touched X's niceness, its running at the
> > default -1 for all of my so-called 'tests', and I have another set to be
> > rebooted to right now. And yes, my kernel makeit script uses -j4 by
> > default, and has used -j8 just for effects, which weren't all that
> > different from what I expected in 'abusing' a UP system that way. The
> > system DID remain usable, not snappy, but usable.
> Gene, you're agreeing with me. You've shown that you're very happy with a
> fair distribution of cpu and leaving X at nice 0.

I was quite happy till Ingo's first patch came out, and it was even better, but I over-wrote it, and we're still figuring out just exactly what the magic twanger was that made it all click for me. OTOH, I don't think that patch passed muster with Mike G., either. We have obviously different workloads, and critical points in them.

Having tried re-nicing X a while back, and having the rest of the system suffer in quite obvious ways for even 1 + or - from its default, felt pretty bad from this user's perspective. It is my considered opinion (yeah I know, I'm just a leaf in the hurricane of this list) that if X has to be re-niced from the 1 point advantage it's had for ages, then something is basically wrong with the overall scheduling, cpu or i/o, or both in combination. FWIW I'm using cfq for i/o.

> It's those who want X to have an unfair advantage that want it to do
> something special. Your agreement that it works fine at nice 0 shows you
> don't want it to have an unfair advantage. Others who want it to have an
> unfair advantage _can_ renice it if they desire. But if the cpu scheduler
> gives X an unfair advantage within the kernel by default then you have
> _no_ choice. If you leave the choice up to userspace (renice or not) then
> both parties get their way. If you put it into the kernel only one party
> wins and there is no way for the Genes (and Cons) of this world to get it
> back. Your opinion is as valuable as everyone else's Gene. It is hard to
> get people to speak on as frightening a playground as the linux kernel
> mailing list so please do.

In the FWIW category, htop has always told me that X is running at -1, not zero. Now, I have NDI where this is actually set at, so I'd have to ask stupid questions here if I did wanna play with it. Which I really don't; the last time I tried to -5 X, kde got a whole lot LESS responsive. But heck, 2.6.2 was freshly minted then too and I've long since forgot how I went about that unless I used htop to change it, the most likely scenario that I can picture at this late date.

As for speaking my mind, yes, and I've been slapped down a few times, as much because I do a lot of bitching and microscopic amounts of patch submission. The only patch I ever submitted was for something in the floppy driver, way back in the middle of 2.2 days, rejected because I didn't know how to use the tools correctly. I didn't, so it was a shrug and my feelings weren't hurt. Some see that as an unbalanced set of books and I'm aware of it. OTOH, I think I do a pretty good job of playing the canary here, and that should be worth something if for no other reason than I can turn into a burr under somebody's saddle when things go all aglay.

But I figure if it's happening to me, then if I don't fuss, and that gotcha gets into a distro kernel, there are gonna be a hell of a lot more folks than me trying to grab the microphone.

BTW, I'm glad you are feeling well enough to get into this again.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
There cannot be a crisis next week. My schedule is already full.
		-- Henry Kissinger
Re: Renice X for cpu schedulers
On Thu, Apr 19, 2007 at 09:17:25AM -0400, Mark Lord wrote:
> Con Kolivas wrote:
> > So yes go ahead and think up great ideas for other ways of metering out
> > cpu bandwidth for different purposes, but for X, given the absurd
> > simplicity of renicing, why keep fighting it? Again I reiterate that
> > most users of SD have not found the need to renice X anyway except if
> > they stick to old habits of make -j4 on uniprocessor and the like, and I
> > expect that those on CFS and Nicksched would also have similar
> > experiences.
>
> Just plain make (no -j2 or -j) is enough to kill interactivity on my
> 2GHz P-M single-core non-HT machine with SD.

Is this with or without X reniced?

> But with the very first posted version of CFS by Ingo, I can do make -j2
> no problem and still have a nicely interactive desktop.

How well does cfs run if you have the granularity set to something like 30ms (3000)?
Re: Renice X for cpu schedulers
On Thu, Apr 19, 2007 at 12:26:03PM -0700, Ray Lee wrote:
> On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> > The one fly in the ointment for linux remains X. I am still, to this
> > moment, completely and utterly stunned at why everyone is trying to find
> > increasingly complex unique ways to manage X when all it needs is more
> > cpu[1].
>
> [...and hence should be reniced]
>
> The problem is that X is not unique. There's postgresql, memcached,
> mysql, db2, a little embedded app I wrote... all of these perform work
> on behalf of another process. It's just most *noticeable* with X, as
> pretty much everyone is running that.

But for most of those apps, we don't actually care if they fairly degrade in performance as other loads on the system ramp up. However the user prefers X to be given priority in these situations. Whether that is the design of X, X clients, or the human condition really doesn't matter two hoots to the scheduler.

> If we had some way for the scheduler to decide to donate part of a
> client process's time slice to the server it just spoke to (with an
> exponential dampening factor -- take 50% from the client, give 25% to
> the server, toss the rest on the floor), that -- from my naive point
> of view -- would be a step toward fixing the underlying issue. Or I
> might be spouting crap, who knows.

Firstly, lots of clients in your list are remote. X usually isn't. However for X, a syscall or something to donate time might not be such a bad idea... but given a couple of X clients and a server against a parallel make, this is probably just going to make the clients slow down as well without giving enough priority to the server.

X isn't special so much because it does work on behalf of others (as you said, lots of things do that). It is special simply because we _want_ rendering to have priority of the CPU (if you shifted CPU intensive rendering to the clients, you'd most likely want to give them priority too); nice, right?
Re: Renice X for cpu schedulers
On Fri, 2007-04-20 at 08:47 +1000, Con Kolivas wrote:
> It's those who want X to have an unfair advantage that want it to do
> something special.

I hope you're not lumping me in with those. If X + client had been able to get their fair share and do so in the low latency manner they need, I would have been one of the carrots instead of being the stick.

-Mike
Re: Renice X for cpu schedulers
On Thu, Apr 19, 2007 at 06:32:15PM -0700, Michael K. Edwards wrote:
> But I think SCHED_FIFO on a chain of tasks is fundamentally not the
> right way to handle low audio latency. The object with a low latency
> requirement isn't the task, it's the device. When it's starting to get
> urgent to deliver more data to the device, the task that it's waiting on
> should slide up the urgency scale; and if it's waiting on something
> else, that something else should slide up the scale; and so forth.
> Similarly, responding to user input is urgent; so when user input is
> available (by whatever mechanism), the task that's waiting for it should
> slide up the urgency scale, etc.

DSP operations, particularly with digital synthesis, tend to max the CPU doing vector operations on as many processors as they can get hold of. In a live performance critical application, it's important to be able to deliver a protected amount of CPU to a thread doing that work, as well as respond to external input such as controllers, etc...

> In practice, you probably don't want to burden desktop Linux with
> priority inheritance where you don't have to. Priority queues with
> algorithmically efficient decrease-key operations (Fibonacci heaps and
> their ilk) are complicated to implement and have correspondingly high
> constant factors. (However, a sufficiently clever heuristic for
> assigning quasi-static task priorities would usually short-circuit the
> priority cascade; if you can keep N small in the
> tasks-with-unpredictable-priority queue, you can probably use a simpler
> flavor with O(log N) decrease-key. Ask someone who knows more about data
> structures than I do.)

These are app issues and not really something that's mutable in the kernel per se with regard to the -rt patch.

> More importantly, non-real-time application coders aren't very smart
> about grouping data structure accesses on one side or the other of a
> system call that is likely to release a lock and let something else run,
> flushing application data out of cache.
> (Kernel coders aren't always smart about this either; see LKML threads a
> few weeks ago about racy, and cache-stall-prone, f_pos handling in VFS.)
> So switching tasks immediately on lock release is usually the wrong
> thing to do if letting the task run a little longer would allow it to
> reach a point where it has to block anyway.

I have Solaris style adaptive locks in my tree with my lockstat patch under -rt. I've also modified my lockstat patch to track readers correctly now with rwsem and the like to see where the single reader limitation in the rtmutex blows it. So far I've seen less than 10 percent of in-kernel contention events actually worth spinning on, and the rest of the stats imply that the mutex owner in question is either preempted or blocked on something else. I've been trying to get folks to try this on a larger machine than my 2x AMD64 box so that there is more data regarding Linux contention and overscheduling in -rt.

> Anyway, I already described the urgency-driven strategy to the extent
> that I've thought it out, elsewhere in this thread. I only held this
> draft back because I wanted to double-check my latency measurements.

bill
Re: Renice X for cpu schedulers
On Thu, Apr 19, 2007 at 05:20:53PM -0700, Michael K. Edwards wrote:
> Embedded systems are already in 2007, and the mainline Linux scheduler
> frankly sucks on them, because it thinks it's back in the 1960's with a
> fixed supply and captive demand, pissing away CPU bandwidth as waste
> heat. Not to say it's an easy problem; even academics with a dozen
> publications in this area don't seem to be able to model energy usage to
> the nearest big O, let alone design a stable economic dispatch engine.
> But it helps to acknowledge what the problem is: even in a 1960's
> raised-floor screaming-air-conditioners screw-the-power-bill machine
> room, you can't actually run a half-decent CPU flat out any more without
> burning it to a crisp.
> [...] stupid. What's your excuse? ;-)

It's now possible to QoS significant parts of the kernel since we now have a deadline mechanism in place. In the original 2.4 kernel, TimeSys's irq-thread allowed for the processing of skbuffs in a thread under a CPU reservation run category, which was used to provide QoS I believe. This basic mechanism can now be generalized to many places in the kernel and put under scheduler control. It's just a matter of who and when somebody is going to take on this task.

bill