Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Bill Davidsen

Ingo Molnar wrote:

* Davide Libenzi <[EMAIL PROTECTED]> wrote:


The same user nicing two different multi-threaded processes would 
expect a predictable CPU distribution too. [...]


i disagree that the user 'would expect' this. Some users might. Others 
would say: 'my 10-thread rendering engine is more important than a 
1-thread job because it's using 10 threads for a reason'. And the CFS 
feedback so far strengthens this point: the default behavior of treating 
the thread as a single scheduling (and CPU time accounting) unit works 
pretty well on the desktop.


If by desktop you mean "one and only one interactive user," that's true. 
On a shared machine it's very hard to preserve any semblance of fairness 
when one user gets far more than another, based not on the value of what 
they're doing but on the tools they use to do it.


think about it in another, 'kernel policy' way as well: we'd like to 
_encourage_ more parallel user applications. Hurting them by accounting 
all threads together sends the exact opposite message.


Why is that? There are lots of things which are intrinsically single 
threaded; how are we hurting multi-threaded applications by 
refusing to give them more CPU than an application running on behalf of 
another user? By accounting all threads together we encourage writing an 
application in the most logical way. Threads are a solution, not a goal 
in themselves.


[...] Doing that efficiently (the old per-cpu run-queue is pretty nice 
from many POVs) is the real challenge.


yeah.

Ingo



--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Bill Davidsen

Linus Torvalds wrote:


On Wed, 18 Apr 2007, Matt Mackall wrote:

Why is X special? Because it does work on behalf of other processes?
Lots of things do this. Perhaps a scheduler should focus entirely on
the implicit and directed wakeup matrix and optimizing that
instead[1].


I 100% agree - the perfect scheduler would indeed take into account where 
the wakeups come from, and try to "weigh" processes that help other 
processes make progress more. That would naturally give server processes 
more CPU power, because they help others.


I don't believe for a second that "fairness" means "give everybody the 
same amount of CPU". That's a totally illogical measure of fairness. All 
processes are _not_ created equal.


That said, even trying to do "fairness by effective user ID" would 
probably already do a lot. In a desktop environment, X would get as much 
CPU time as the user processes, simply because it's in a different 
protection domain (and that's really what "effective user ID" means: it's 
not about "users", it's really about "protection domains").


And "fairness by euid" is probably a hell of a lot easier to do than 
trying to figure out the wakeup matrix.


You probably want to consider the controlling terminal as well...  do 
you want to have people starting 'at' jobs competing on equal footing 
with people typing at a terminal? I'm not offering an answer, just 
raising the question.


And for some database applications, everyone in a group may connect with 
the same login-id and then do sub-authorization within the database 
application. euid may be an issue there as well.
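
To make the arithmetic concrete, here is a toy user-space sketch of the
two-level split being discussed: divide the CPU equally per euid first,
then among each euid's runnable tasks. The task table and uids below are
invented, and equal weights are assumed throughout.

/* Sketch: split CPU equally per euid, then per task within each euid.
 * The task list is made up purely for illustration. */
#include <stdio.h>

struct task { const char *name; unsigned int euid; };

int main(void)
{
    struct task tasks[] = {
        { "Xorg",    0 },   { "make",   0 },
        { "firefox", 500 }, { "gzip", 500 }, { "mplayer", 500 },
    };
    int n = sizeof(tasks) / sizeof(tasks[0]);
    unsigned int euids[16];
    int neuids = 0, i, j;

    /* collect the distinct euids */
    for (i = 0; i < n; i++) {
        for (j = 0; j < neuids; j++)
            if (euids[j] == tasks[i].euid)
                break;
        if (j == neuids)
            euids[neuids++] = tasks[i].euid;
    }

    for (i = 0; i < n; i++) {
        int peers = 0;
        for (j = 0; j < n; j++)
            if (tasks[j].euid == tasks[i].euid)
                peers++;
        /* each euid gets 1/neuids of the CPU, each task 1/peers of that */
        printf("%-8s euid=%-4u share=%.1f%%\n", tasks[i].name,
               tasks[i].euid, 100.0 / neuids / peers);
    }
    return 0;
}

With the table above, the two root tasks end up with 25% each and the three
euid-500 tasks with about 16.7% each, which is exactly the kind of
protection-domain split being described.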


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Bill Davidsen

Matt Mackall wrote:

On Wed, Apr 18, 2007 at 08:37:11AM +0200, Nick Piggin wrote:



[2] It's trivial to construct two or more perfectly reasonable and
desirable definitions of fairness that are mutually incompatible.

Probably not if you use common sense, and in the context of a replacement
for the 2.6 scheduler.


Ok, trivial example. You cannot allocate equal CPU time to
processes/tasks and simultaneously allocate equal time to thread
groups. Is it common sense that a heavily-threaded app should be able
to get hugely more CPU than a well-written app? No. I don't want Joe's
stupid Java app to make my compile crawl.

On the other hand, if my heavily threaded app is, say, a voicemail
server serving 30 customers, I probably want it to get 30x the CPU of
my gzip job.

Matt, you tickled a thought... on one hand we have a single user running 
a threaded application, and it ideally should get the same total CPU as 
a user running a single-threaded process. On the other hand we have a 
threaded application, call it sendmail, nnrpd, httpd, bind, whatever. In 
that case each thread is really providing service for an independent 
user, and should get an appropriate share of the CPU.


Perhaps the solution is to add a means for identifying server processes, 
by capability, or by membership in a "server" group, or by having the 
initiating process set some flag at exec() time. That doesn't 
necessarily solve problems, but it may provide more information to allow 
them to be solved.
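
As a rough illustration of the "membership in a server group" variant, a
user-space check could look like the sketch below. The group name "server"
is invented, and a real implementation would do the equivalent test on the
task's credentials inside the scheduler rather than in the daemon itself.

/* Sketch only: does the current process belong to a (hypothetical)
 * "server" group?  Build with a normal C compiler, no extra libs. */
#include <stdio.h>
#include <unistd.h>
#include <grp.h>

int main(void)
{
    gid_t groups[64];
    int i, n;
    struct group *gr = getgrnam("server");   /* made-up group name */

    if (!gr) {
        printf("no 'server' group defined on this box\n");
        return 1;
    }
    n = getgroups(64, groups);
    for (i = 0; i < n; i++) {
        if (groups[i] == gr->gr_gid) {
            printf("this task would be treated as a server\n");
            return 0;
        }
    }
    printf("this task would be treated as an ordinary user task\n");
    return 0;
}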


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Willy Tarreau
Hi Björn,

On Sat, Apr 21, 2007 at 01:29:41PM +0200, Björn Steinbrink wrote:
> Hi,
> 
> On 2007.04.21 13:07:48 +0200, Willy Tarreau wrote:
> > > another thing i noticed: when using a -y larger than 1, then the window 
> > > title (at least on Metacity) overlaps and thus the ocbench tasks have 
> > > different X overhead and get scheduled a bit asymmetrically as well. Is 
> > > there any way to start them up title-less perhaps?
> > 
> > It has annoyed me a bit too, but I'm no X developer at all, so I don't
> > know whether it's possible or how to do this. I know that my window
> > manager even adds title bars to xeyes, so I'm not sure we can do this.
> > 
> > Right now, I've added a "-B" argument so that you can skip the
> > height of your title bar. It's dirty but it's not my main job :-)
> 
> Here's a small patch that makes the windows unmanaged, which also causes
> ocbench to start up quite a bit faster on my box with a larger number of
> windows, so it probably avoids some window manager overhead, which is a
> nice side-effect.

Excellent ! I've just merged it, but made it conditional on a "-u" argument
so that we can keep the previous behaviour (moving the windows is useful
especially when there are few of them).

So the new version 0.5 is available there :

  http://linux.1wt.eu/sched/

I believe it's the last one for today as I'm late on some work.

Thanks !
Willy



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Björn Steinbrink
Hi,

On 2007.04.21 13:07:48 +0200, Willy Tarreau wrote:
> > another thing i noticed: when using a -y larger than 1, then the window 
> > title (at least on Metacity) overlaps and thus the ocbench tasks have 
> > different X overhead and get scheduled a bit asymmetrically as well. Is 
> > there any way to start them up title-less perhaps?
> 
> It has annoyed me a bit too, but I'm no X developer at all, so I don't
> know whether it's possible or how to do this. I know that my window
> manager even adds title bars to xeyes, so I'm not sure we can do this.
> 
> Right now, I've added a "-B" argument so that you can skip the
> height of your title bar. It's dirty but it's not my main job :-)

Here's a small patch that makes the windows unmanaged, which also causes
ocbench to start up quite a bit faster on my box with a larger number of
windows, so it probably avoids some window manager overhead, which is a
nice side-effect.

Björn

--

diff -u ocbench-0.4/ocbench.c ocbench-0.4.1/ocbench.c
--- ocbench-0.4/ocbench.c   2007-04-21 13:05:55.0 +0200
+++ ocbench-0.4.1/ocbench.c 2007-04-21 13:24:01.0 +0200
@@ -213,6 +213,7 @@
 int main(int argc, char *argv[]) {
   Window root;
   XGCValues gc_setup;
+  XSetWindowAttributes swa;
   int c, index, proc_x, proc_y, pid;
   int *pcount[] = {&HOUR, &MIN, &SEC};
   char *p, *q;
@@ -342,8 +343,11 @@
   alloc_color(fg, &orange);
   alloc_color(fg2, &blue);
 
-  win = XCreateSimpleWindow(dpy, root, X, Y, width, height, 0, 
-   black.pixel, black.pixel);
+  swa.override_redirect = 1;
+
+  win = XCreateWindow(dpy, root, X, Y, width, height, 0,
+   CopyFromParent, InputOutput, CopyFromParent,
+   CWOverrideRedirect, &swa);
   XStoreName(dpy, win, "ocbench");
 
   XSelectInput(dpy, win, ExposureMask | StructureNotifyMask);
Only in ocbench-0.4.1/: .README.swp
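
For anyone who wants to play with the same trick outside ocbench, a
minimal stand-alone sketch using the override-redirect attribute from the
patch might look roughly like this (plain Xlib, build with -lX11; the
geometry and the sleep are arbitrary):

/* Sketch: create an unmanaged (override-redirect) X window, as in the
 * ocbench patch above.  Build with: cc demo.c -o demo -lX11 */
#include <X11/Xlib.h>
#include <unistd.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    XSetWindowAttributes swa;
    Window win;

    if (!dpy)
        return 1;
    swa.override_redirect = 1;   /* the window manager keeps its hands off */
    swa.background_pixel = BlackPixel(dpy, DefaultScreen(dpy));
    win = XCreateWindow(dpy, DefaultRootWindow(dpy), 10, 10, 100, 100, 0,
                        CopyFromParent, InputOutput, CopyFromParent,
                        CWOverrideRedirect | CWBackPixel, &swa);
    XMapWindow(dpy, win);
    XFlush(dpy);
    sleep(10);                   /* keep it on screen for a moment */
    XCloseDisplay(dpy);
    return 0;
}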


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Willy Tarreau
Hi Ingo,

I'm replying to your 3 mails at once.

On Sat, Apr 21, 2007 at 12:45:22PM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > > It could become a useful scheduler benchmark !
> > 
> > i just tried ocbench-0.3, and it is indeed very nice!

So as you've noticed just one minute after I put it there, I've updated
the tool and renamed it ocbench. For others, it's here :

http://linux.1wt.eu/sched/

The useful new features are proper positioning, automatic forking, and more
visible progress with smaller windows, which eat fewer X resources.

Now about your idea of making it report information on stdout, I don't
know if it would be that useful. There are many other command line tools
for this purpose. This one's goal is to eat CPU with a visual control of
CPU distribution only.

Concerning your idea of using a signal to resync every process, I agree
with you. Running at 8x8 shows a noticeable offset. I've just uploaded
v0.4 which supports your idea of sending USR1.

> another thing i noticed: when using a -y larger than 1, then the window 
> title (at least on Metacity) overlaps and thus the ocbench tasks have 
> different X overhead and get scheduled a bit asymmetrically as well. Is 
> there any way to start them up title-less perhaps?

It has annoyed me a bit too, but I'm no X developer at all, so I don't
know whether it's possible or how to do this. I know that my window
manager even adds title bars to xeyes, so I'm not sure we can do this.

Right now, I've added a "-B" argument so that you can skip the
height of your title bar. It's dirty but it's not my main job :-)

Thanks for your feedback
Willy



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > It could become a useful scheduler benchmark !
> 
> i just tried ocbench-0.3, and it is indeed very nice!

another thing i noticed: when using a -y larger than 1, then the window 
title (at least on Metacity) overlaps and thus the ocbench tasks have 
different X overhead and get scheduled a bit asymmetrically as well. Is 
there any way to start them up title-less perhaps?

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > The modified code is here :
> > 
> >   http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz
> > 
> > What is interesting to note is that it's easy to make X work a lot 
> > (99%) by using 0 as the sleeping time, and it's easy to make the 
> > process work a lot by using large values for the running time 
> > associated with very low values (or 0) for the sleep time.
> > 
> > Ah, and it supports -geometry ;-)
> > 
> > It could become a useful scheduler benchmark !
> 
> i just tried ocbench-0.3, and it is indeed very nice!

another thing i just noticed: when starting up lots of ocbench tasks 
(say -x 6 -y 6) then they (naturally) get started up with an already 
visible offset. It's nice to observe the startup behavior, but after 
that it would be useful if it were possible to 'resync' all those 
ocbench tasks so that they start at the same offset. [ Maybe a "killall 
-SIGUSR1 ocbench" could serve this purpose, without having to 
synchronize the tasks explicitly? ]
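
Something along these lines would probably be enough for the resync,
assuming the drawing loop keeps its phase in a counter. The names are
invented and this is only a sketch of the idea, not the actual ocbench
code:

/* Sketch of a "resync on SIGUSR1" hook for a drawing loop. */
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t resync_wanted;

static void handle_usr1(int sig)
{
    (void)sig;
    resync_wanted = 1;          /* async-signal-safe: just set a flag */
}

int main(void)
{
    long position = 0;          /* stands in for the clock's "seconds" */
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_usr1;
    sigaction(SIGUSR1, &sa, NULL);

    for (;;) {
        if (resync_wanted) {    /* "killall -USR1 ocbench" lands here */
            position = 0;
            resync_wanted = 0;
        }
        /* ... burn CPU, redraw, sleep ... */
        position++;
        usleep(100000);
    }
}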

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Ingo Molnar

* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> I hacked it a bit to make it accept two parameters :
>   -R  : time spent burning CPU cycles at each round
>   -S  : time spent getting a rest
> 
> It now advances what it thinks is a second at each iteration, so that 
> it makes it easy to compare its progress with other instances (there 
> are seconds, minutes and hours, so it's easy to visually count up to 
> around 43200).
> 
> The modified code is here :
> 
>   http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz
> 
> What is interesting to note is that it's easy to make X work a lot 
> (99%) by using 0 as the sleeping time, and it's easy to make the 
> process work a lot by using large values for the running time 
> associated with very low values (or 0) for the sleep time.
> 
> Ah, and it supports -geometry ;-)
> 
> It could become a useful scheduler benchmark !

i just tried ocbench-0.3, and it is indeed very nice!

Would it make sense perhaps to (optionally?) also log some sort of 
periodic text feedback to stdout, about the quality of scheduling? Maybe 
even a 'run this many seconds' option plus a summary text output at the 
end (which would output measured runtime, observed longest/smallest 
latency and standard deviation of latencies maybe)? That would make it 
directly usable both as a 'consistency of X app scheduling' visual test 
and as an easily shareable benchmark with an objective numeric result as 
well.
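
For the summary at the end, collecting the per-iteration latencies and
printing min/max/mean/stddev would already give a number people can
compare. A back-of-the-envelope sketch (the sample values are invented;
link with -lm):

/* Sketch: summarise observed iteration latencies at the end of a run.
 * lat[] would be filled by the drawing loop, one entry per frame. */
#include <stdio.h>
#include <math.h>

static void report(const double *lat, int n)
{
    double min = lat[0], max = lat[0], sum = 0.0, var = 0.0, mean;
    int i;

    for (i = 0; i < n; i++) {
        if (lat[i] < min) min = lat[i];
        if (lat[i] > max) max = lat[i];
        sum += lat[i];
    }
    mean = sum / n;
    for (i = 0; i < n; i++)
        var += (lat[i] - mean) * (lat[i] - mean);

    printf("samples=%d min=%.3fms max=%.3fms avg=%.3fms stddev=%.3fms\n",
           n, min, max, mean, sqrt(var / n));
}

int main(void)
{
    double demo[] = { 9.8, 10.1, 10.0, 12.7, 9.9, 10.2 };   /* invented */
    report(demo, sizeof(demo) / sizeof(demo[0]));
    return 0;
}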

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Ingo Molnar

* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> All of my testing has been on desktop machines, although in most cases 
> they were really loaded desktops which had load avg 10..100 from time 
> to time, and none were low memory machines. Up to CFS v3 I thought 
> nicksched was my winner, now CFSv3 looks better, by not having 
> stumbles under stupid loads.

nice! I hope CFSv4 kept that good tradition too ;)

> I have not tested:
>   1 - server loads, nntp, smtp, etc
>   2 - low memory machines
>   3 - uniprocessor systems
> 
> I think this should be done before drawing conclusions. Or if someone 
> has tried this, perhaps they would report what they saw. People are 
> talking about smoothness, but not how many pages per second come out 
> of their overloaded web server.

i tested heavily swapping systems. (make -j50 workloads easily trigger 
that) I also tested UP systems and a handful of SMP systems. I have also 
tested massive_intr.c which i believe is an indicator of how fairly CPU 
time is distributed between partly sleeping partly running server 
threads. But i very much agree that diverse feedback is sought and 
welcome, both from those who are happy with the current scheduler and 
those who are unhappy about it.

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Nick Piggin
On Fri, Apr 20, 2007 at 04:47:27PM -0400, Bill Davidsen wrote:
> Ingo Molnar wrote:
> 
> >( Let's be cautious though: the jury is still out on whether people actually 
> >  like this more than the current approach. While CFS feedback looks 
> >  promising after a whopping 3 days of it being released [ ;-) ], the 
> >  test coverage of all 'fairness centric' schedulers, even considering 
> >  years of availability, is less than 1% i'm afraid, and that < 1% was 
> >  mostly self-selecting. )
> >
> All of my testing has been on desktop machines, although in most cases 
> they were really loaded desktops which had load avg 10..100 from time to 
> time, and none were low memory machines. Up to CFS v3 I thought 
> nicksched was my winner, now CFSv3 looks better, by not having stumbles 
> under stupid loads.

What base_timeslice were you using for nicksched, and what HZ?



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-20 Thread Bill Davidsen

Ingo Molnar wrote:

( Let's be cautious though: the jury is still out on whether people actually 
  like this more than the current approach. While CFS feedback looks 
  promising after a whopping 3 days of it being released [ ;-) ], the 
  test coverage of all 'fairness centric' schedulers, even considering 
  years of availability, is less than 1% i'm afraid, and that < 1% was 
  mostly self-selecting. )


All of my testing has been on desktop machines, although in most cases 
they were really loaded desktops which had load avg 10..100 from time to 
time, and none were low memory machines. Up to CFS v3 I thought 
nicksched was my winner, now CFSv3 looks better, by not having stumbles 
under stupid loads.


I have not tested:
  1 - server loads, nntp, smtp, etc
  2 - low memory machines
  3 - uniprocessor systems

I think this should be done before drawing conclusions. Or if someone 
has tried this, perhaps they would report what they saw. People are 
talking about smoothness, but not how many pages per second come out of 
their overloaded web server.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-20 Thread Bill Davidsen

Mike Galbraith wrote:

On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote:

On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote:
 

Yup, and progress _is_ happening now, quite rapidly.

Progress as in progress on Ingo's scheduler. I still don't know how we'd
decide when to replace the mainline scheduler or with what.

I don't think we can say Ingo's is better than the alternatives, can we?


No, that would require massive performance testing of all alternatives.


If there is some kind of bakeoff, then I'd like one of Con's designs to
be involved, and mine, and Peter's...


The trouble with a bakeoff is that it's pretty darn hard to get people
to test in the first place, and then comes weighting the subjective and
hard performance numbers.  If they're close in numbers, do you go with
the one which starts the least flamewars or what?

Here we disagree... I picked a scheduler not by running benchmarks, but 
by running loads which piss me off with the mainline scheduler. And then 
I ran the other schedulers for a while to find the things, normal things 
I do, which resulted in bad behavior. And when I found one which had (so 
far) no such cases I called it my winner, but I haven't tested it under 
server load, so I can't begin to say it's "the best."


What we need is for lots of people to run every scheduler in real life, 
and do "worst case analysis" by finding the cases which cause bad 
behavior. And if there were a way to easily choose another scheduler, 
call it pluggable, modular, or Russian Roulette, people who found a worst 
case would report it (aka bitch about it) and try another. But the 
average user is better able to boot with an option like "sched=cfs" (or 
sc, or nick, or ...) than to patch and build a kernel. So if we don't 
get easily switched schedulers people will not test nearly as well.


The best scheduler isn't the one 2% faster than the rest, it's the one 
with the fewest jackpot cases where it sucks. And if the mainline had 
multiple schedulers this testing would get done, authors would get more 
reports and have a better chance of fixing corner cases.


Note that we really need multiple schedulers to make people happy, 
because fairness is not the most desirable behavior on all machines, and 
adding knobs probably isn't the answer. I want a server to degrade 
gently, I want my desktop to show my movie and echo my typing, and if 
that's hard on compiles or the file transfer, so be it. Con doesn't want 
to compromise his goals; I agree, but I want to have an option if I don't 
share them.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Peter Williams

William Lee Irwin III wrote:

William Lee Irwin III wrote:

I'd further recommend making priority levels accessible to kernel threads
that are not otherwise accessible to processes, both above and below
user-available priority levels. Basically, if you can get SCHED_RR and
SCHED_FIFO to coexist as "intimate scheduler classes," then a SCHED_KERN
scheduler class can coexist with SCHED_OTHER in like fashion, but with
availability of higher and lower priorities than any userspace process
is allowed, and potentially some differing scheduling semantics. In such
a manner nonessential background processing intended not to ever disturb
userspace can be given priorities appropriate to it (perhaps even con's
SCHED_IDLEPRIO would make sense), and other, urgent processing can be
given priority over userspace altogether.


On Thu, Apr 19, 2007 at 09:50:19PM +1000, Peter Williams wrote:
This is sounding very much like System V Release 4 (and descendants) 
except that they call it SCHED_SYS and also give SCHED_NORMAL tasks that 
are in system mode dynamic priorities in the SCHED_SYS range (to avoid 
priority inversion, I believe).


Descriptions of that are probably where I got the idea (hurrah for OS
textbooks).


And long term background memory.  :-)


It makes a fair amount of sense.


Yes.  You could also add a SCHED_IA in between SCHED_SYS and SCHED_OTHER 
(a la Solaris) for interactive tasks.  The only problem is how to get a 
task into SCHED_IA without root privileges.



Not sure what the take on
the specific precedent is. The only content here is expanding the
priority range with ranges above and below for the exclusive use of
ultra-privileged tasks, so it's really trivial. Actually it might be so
trivial it should just be some permission checks in the SCHED_OTHER
renicing code.


Perhaps.

Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread William Lee Irwin III
William Lee Irwin III wrote:
>> I'd further recommend making priority levels accessible to kernel threads
>> that are not otherwise accessible to processes, both above and below
>> user-available priority levels. Basically, if you can get SCHED_RR and
>> SCHED_FIFO to coexist as "intimate scheduler classes," then a SCHED_KERN
>> scheduler class can coexist with SCHED_OTHER in like fashion, but with
>> availability of higher and lower priorities than any userspace process
>> is allowed, and potentially some differing scheduling semantics. In such
>> a manner nonessential background processing intended not to ever disturb
>> userspace can be given priorities appropriate to it (perhaps even con's
>> SCHED_IDLEPRIO would make sense), and other, urgent processing can be
>> given priority over userspace altogether.

On Thu, Apr 19, 2007 at 09:50:19PM +1000, Peter Williams wrote:
> This is sounding very much like System V Release 4 (and descendants) 
> except that they call it SCHED_SYS and also give SCHED_NORMAL tasks that 
> are in system mode dynamic priorities in the SCHED_SYS range (to avoid 
> priority inversion, I believe).

Descriptions of that are probably where I got the idea (hurrah for OS
textbooks). It makes a fair amount of sense. Not sure what the take on
the specific precedent is. The only content here is expanding the
priority range with ranges above and below for the exclusive use of
ultra-privileged tasks, so it's really trivial. Actually it might be so
trivial it should just be some permission checks in the SCHED_OTHER
renicing code.
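
The "permission checks in the renicing code" could be as dumb as widening
the allowed range for privileged or kernel-internal callers. A stand-alone
sketch of the clamping idea follows; the extra ranges and the names are
invented, not a proposal for the actual ABI:

/* Sketch: clamp a requested priority depending on who is asking.
 * Ordinary tasks stay within the usual -20..19 (and cannot go below
 * the default on their own); a hypothetical SCHED_KERN class would get
 * some headroom above and below that range. */
#include <stdio.h>

enum caller { USER_TASK, PRIVILEGED_TASK, KERNEL_THREAD };

static int clamp_nice(int requested, enum caller who)
{
    int lo = 0, hi = 19;                 /* unprivileged default */

    if (who == PRIVILEGED_TASK)
        lo = -20;
    if (who == KERNEL_THREAD) {          /* invented extended range */
        lo = -30;
        hi = 29;
    }
    if (requested < lo) return lo;
    if (requested > hi) return hi;
    return requested;
}

int main(void)
{
    printf("user asks for -10    -> %d\n", clamp_nice(-10, USER_TASK));
    printf("root asks for -10    -> %d\n", clamp_nice(-10, PRIVILEGED_TASK));
    printf("kthread asks for -25 -> %d\n", clamp_nice(-25, KERNEL_THREAD));
    return 0;
}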


-- wli


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Thu, 2007-04-19 at 09:55 -0700, Davide Libenzi wrote:
> On Thu, 19 Apr 2007, Mike Galbraith wrote:
> 
> > On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
> > > * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > > 
> > > > With a heavily reniced X (perfectly fine), that should indeed solve my 
> > > > daily usage pattern nicely (always need godmode for shells, but not 
> > > > for mozilla and ilk. 50/50 split automatic without renice of entire 
> > > > gui)
> > > 
> > > how about the first-approximation solution i suggested in the previous 
> > > mail: to add a per UID default nice level? (With this default defaulting 
> > > to '-10' for all root-owned processes, and defaulting to '0' for 
> > > everything else.) That would solve most of the current CFS regressions 
> > > at hand.
> > 
> > That would make my kernel builds etc interfere with my other self's
> > surfing and whatnot.  With it by EUID, when I'm surfing or whatnot, the
> > X portion of my Joe-User activity pushes the compile portion of root
> > down in bandwidth utilization automagically, which is exactly the right
> > thing, because the root me is not as important as the Joe-User me using
> > the GUI at that time.  If the idea of X disturbing root upsets some,
> > they can move X to another UID.  Generally, it seems perfect for here.
> 
> Now guys, I did not follow the whole lengthy and feisty thread, but IIRC 
> Con's scheduler has been attacked because, among other arguments, it 
> required X to be reniced. This happened like a month ago IINM.

I don't object to renicing X if you want it to receive _more_ than its
fair share. I do object to having to renice X in order for it to _get_
its fair share.  That's what I attacked.

> I did not have time to look at Con's scheduler, and I only had a brief 
> look at Ingo's one (looks very promising IMO, but so was the initial O(1) 
> post before all the corner-cases fixes went in).
> But this is not about technical merit, this is about applying the same 
> rules of judgement to others as well as to ourselves.

I'm running the same tests with CFS that I ran for RSDL/SD.  It falls
short in one key area (to me) in that X+client cannot yet split my box
50/50 with two concurrent tasks.  In the CFS case, renicing both X and
client does work, but it should not be necessary IMHO.  With RSDL/SD
renicing didn't help.

> We went from a "renicing X to -10 is bad because the scheduler should 
> be able to correctly handle the problem w/out additional external plugs" 
> to a totally opposite "let's renice -10 X, the whole SCHED_NORMAL kthreads 
> class, on top of all the tasks owned by root" [1].
> From a spectator POV like myself in this case, this looks rather "unfair".

Well, for me, the renicing I mentioned above is only interesting as a
way to improve long term fairness with schedulers with no history.

I found Linus' EUID idea intriguing in that by putting the server
together with a steady load in one 'fair' domain, and clients in
another, X can, if prioritized to empower it to do so, modulate the
steady load in its domain (but can't starve it!), the clients modulate
X, and the steady load gets it all when X and clients are idle.  The
nice level of X determines to what _extent_ X can modulate the constant
load rather like a mixer slider.  The synchronous (I'm told) nature of
X/client then becomes kind of an asset to the desktop instead of a
liability.

The specific case I was thinking about is the X+Gforce test where both
RSDL and CFS fail to provide fairness (as defined by me;).  X and Gforce
are mostly not concurrent.  The make -j2 I put them up against are
mostly concurrent.  I don't call giving 1/3 of my CPU to X+Client fair
at _all_, but that's what you'll get if your fairstick of the instant
generally can't see the fourth competing task.  Seemed pretty cool to me
because it creates the missing connection between client and server,
though also likely complicated (and maybe full of perils, who knows).

-Mike



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Willy Tarreau
On Fri, Apr 20, 2007 at 02:52:38AM +0300, Jan Knutar wrote:
> On Thursday 19 April 2007 18:18, Ingo Molnar wrote:
> > * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> > > You can certainly script it with -geometry. But it is the wrong
> > > application for this matter, because you benchmark X more than
> > > glxgears itself. What would be better is something like a line
> > > rotating 360 degrees and doing some short stuff between each
> > > degree, so that X is not much solicited, but the CPU would be
> > > spent more on the processes themselves.
> >
> > at least on my setup glxgears goes via DRI/DRM so there's no X
> > scheduling inbetween at all, and the visual appearance of glxgears is
> > a direct function of its scheduling.
> 
> How much of the subjective interactiveness-feel of the desktop is at the 
> mercy of the X server's scheduling and not the cpu scheduler?

probably a lot. Hence why I wanted something visually noticeable
but using far fewer X resources than glxgears. The modified orbitclock is
perfect IMHO.

Regards,
Willy



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Jan Knutar
On Thursday 19 April 2007 18:18, Ingo Molnar wrote:
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> > You can certainly script it with -geometry. But it is the wrong
> > application for this matter, because you benchmark X more than
> > glxgears itself. What would be better is something like a line
> > rotating 360 degrees and doing some short stuff between each
> > degree, so that X is not much solicited, but the CPU would be
> > spent more on the processes themselves.
>
> at least on my setup glxgears goes via DRI/DRM so there's no X
> scheduling inbetween at all, and the visual appearance of glxgears is
> a direct function of its scheduling.

How much of the subjective interactiveness-feel of the desktop is at the 
mercy of the X server's scheduling and not the cpu scheduler?

I've noticed that video playback is significantly smoother and resistant 
to other load, when using MPlayer's opengl output, especially if 
"heavy" programs are running at the same time. Especially firefox and 
ksysguard seem to have found a way to cause video through Xv to look 
annoyingly jittery.


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Willy Tarreau
On Thu, Apr 19, 2007 at 05:18:03PM +0200, Ingo Molnar wrote:
> 
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> 
> > You can certainly script it with -geometry. But it is the wrong 
> > application for this matter, because you benchmark X more than 
> > glxgears itself. What would be better is something like a line 
> > rotating 360 degrees and doing some short stuff between each degree, 
> so that X is not much solicited, but the CPU would be spent more on 
> > the processes themselves.
> 
> at least on my setup glxgears goes via DRI/DRM so there's no X 
> scheduling inbetween at all, and the visual appearance of glxgears is a 
> direct function of its scheduling.

OK, I thought that something looking like a clock would be useful, especially
if we could tune the amount of CPU spent per task instead of being limited by
graphics drivers.

I searched Freshmeat for a clock and found "orbitclock" by Jeremy Weatherford,
which was exactly what I was looking for :
  - small
  - C only
  - X11 only
  - needed less than 5 minutes and no knowledge of X11 for the complete hack !
  => Kudos to its author, sincerely !

I hacked it a bit to make it accept two parameters :
  -R  : time spent burning CPU cycles at each round
  -S  : time spent getting a rest

It now advances what it thinks is a second at each iteration, so that it makes
it easy to compare its progress with other instances (there are seconds,
minutes and hours, so it's easy to visually count up to around 43200).
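
For reference, the run/sleep pattern behind -R/-S is roughly the following.
This is a stripped-down sketch of the idea, not the actual orbitclock code,
and the default values are invented:

/* Sketch of the -R/-S behaviour: burn the CPU for run_ms, then sleep for
 * sleep_ms, and count one displayed "second" per iteration so that
 * instances started together stay visually comparable. */
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static long long now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

int main(void)
{
    long run_ms = 150, sleep_ms = 50;   /* would come from -R / -S */
    long ticks = 0;

    for (;;) {
        long long deadline = now_ms() + run_ms;
        while (now_ms() < deadline)
            ;                           /* burn cycles */
        if (sleep_ms)
            usleep(sleep_ms * 1000);
        ticks++;                        /* one clock "second" per round */
        if (ticks % 60 == 0)
            printf("%ld ticks\n", ticks);
    }
}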

The modified code is here :

  http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz

What is interesting to note is that it's easy to make X work a lot (99%) by
using 0 as the sleeping time, and it's easy to make the process work a lot
by using large values for the running time associated with very low values
(or 0) for the sleep time.

Ah, and it supports -geometry ;-)

It could become a useful scheduler benchmark !

Have fun !
Willy



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Bernd Eckenfels
In article <[EMAIL PROTECTED]> you wrote:
> Top (VCPU maybe?)
> User
> Process
> Thread

The problem with that is that not all schedulers might work on the user
level. You can think of batch/job, parent, group, session or namespace
levels. That would IMHO be a generic Top, with no need for a level above.

Greetings
Bernd
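
Whatever the levels end up being called, the arithmetic is the same at
each of them: a task's effective share is the product of its fractions at
every level of the hierarchy. A toy sketch with an invented three-level
split:

/* Sketch: effective CPU share in a scheduling hierarchy is the product
 * of the fractions taken at each level (top -> user -> process -> thread).
 * The numbers are invented. */
#include <stdio.h>

int main(void)
{
    int users = 2;              /* top level splits between 2 users     */
    int procs_of_user1 = 3;     /* user 1 runs 3 processes              */
    int threads_of_proc1 = 4;   /* one of them has 4 runnable threads   */

    double share = 1.0 / users / procs_of_user1 / threads_of_proc1;
    printf("one thread of that process gets %.2f%% of the CPU\n",
           share * 100.0);      /* prints 4.17% */
    return 0;
}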


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Ingo Molnar wrote:
>* Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> You can certainly script it with -geometry. But it is the wrong
>> application for this matter, because you benchmark X more than
>> glxgears itself. What would be better is something like a line
>> rotating 360 degrees and doing some short stuff between each degree,
>> so that X is not much solicited, but the CPU would be spent more on
>> the processes themselves.
>
>at least on my setup glxgears goes via DRI/DRM so there's no X
>scheduling inbetween at all, and the visual appearance of glxgears is a
>direct function of its scheduling.
>
>   Ingo

That doesn't appear to be the case here, Ingo. Even when I know the rest of the 
system is lagged, glxgears continues to show very smooth and steady movement.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yow!  I just went below the poverty line!


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Ingo Molnar wrote:
>* Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> Good idea. The machine I'm typing from now has 1000 scheddos running
>> at +19, and 12 gears at nice 0. [...]
>>
>> From time to time, one of the 12 aligned gears will quickly perform a
>> full quarter of round while others slowly turn by a few degrees. In
>> fact, while I don't know this process's CPU usage pattern, there's
>> something useful in it : it allows me to visually see when process
>> accelerate/decelerate. [...]
>
>cool idea - i have just tried this and it rocks - you can easily see the
>'nature' of CPU time distribution just via visual feedback. (Is there
>any easy way to start up 12 glxgears fully aligned, or does one always
>have to mouse around to get them into proper position?)
>
>btw., i am using another method to quickly judge X's behavior: i started
>the 'snowflakes' plugin in Beryl on Fedora 7, which puts a nice smooth
>opengl-rendered snow fall on the desktop background. That gives me an
>idea about how well X is scheduling under various workloads, without
>having to instrument it explicitly.
>
yes, it's a cute idea, till you switch away from that screen to check progress 
on something else, like to compose this message.

===
5913 frames in 5.0 seconds = 1182.499 FPS
6238 frames in 5.0 seconds = 1247.556 FPS
11380 frames in 5.0 seconds = 2275.905 FPS
10691 frames in 5.0 seconds = 2138.173 FPS
8707 frames in 5.0 seconds = 1741.305 FPS
10669 frames in 5.0 seconds = 2133.708 FPS
11392 frames in 5.0 seconds = 2278.037 FPS
11379 frames in 5.0 seconds = 2275.711 FPS
11310 frames in 5.0 seconds = 2261.861 FPS
11386 frames in 5.0 seconds = 2277.081 FPS
11292 frames in 5.0 seconds = 2258.353 FPS
11352 frames in 5.0 seconds = 2270.297 FPS
11415 frames in 5.0 seconds = 2282.886 FPS
11406 frames in 5.0 seconds = 2281.037 FPS
11483 frames in 5.0 seconds = 2296.533 FPS
11510 frames in 5.0 seconds = 2301.883 FPS
11123 frames in 5.0 seconds = 2224.266 FPS
8980 frames in 5.0 seconds = 1795.861 FPS
===
The over-2000 FPS reports came while I was either looking at htop or starting 
this message, both on different screens.  htop said it was using 95+% of the 
CPU even when its display was going to /dev/null.  So 'Kewl' doesn't seem to 
get us apples-to-apples numbers we can take to the window and bet 
win-place-show on.

FWIW, running the nvidia-9755 drivers here.

So if we are going to use that as a judgement criterion, it obviously needs 
some intelligently applied scaling before the numbers are worth more than a 
subjective feel.

>   Ingo



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
The confusion of a staff member is measured by the length of his memos.
-- New York Times, Jan. 20, 1981


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Davide Libenzi
On Thu, 19 Apr 2007, Mike Galbraith wrote:

> On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
> > * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > 
> > > With a heavily reniced X (perfectly fine), that should indeed solve my 
> > > daily usage pattern nicely (always need godmode for shells, but not 
> > > for mozilla and ilk. 50/50 split automatic without renice of entire 
> > > gui)
> > 
> > how about the first-approximation solution i suggested in the previous 
> > mail: to add a per UID default nice level? (With this default defaulting 
> > to '-10' for all root-owned processes, and defaulting to '0' for 
> > everything else.) That would solve most of the current CFS regressions 
> > at hand.
> 
> That would make my kernel builds etc interfere with my other self's
> surfing and whatnot.  With it by EUID, when I'm surfing or whatnot, the
> X portion of my Joe-User activity pushes the compile portion of root
> down in bandwidth utilization automagically, which is exactly the right
> thing, because the root me is not as important as the Joe-User me using
> the GUI at that time.  If the idea of X disturbing root upsets some,
> they can move X to another UID.  Generally, it seems perfect for here.

Now guys, I did not follow the whole lengthy and feisty thread, but IIRC 
Con's scheduler has been attacked because, among other arguments, it 
required X to be reniced. This happened like a month ago IINM.
I did not have time to look at Con's scheduler, and I only had a brief 
look at Ingo's one (looks very promising IMO, but so was the initial O(1) 
post before all the corner-cases fixes went in).
But this is not about technical merit, this is about applying the same 
rules of judgement to others as well as to ourselves.
We went from a "renicing X to -10 is bad because the scheduler should 
be able to correctly handle the problem w/out additional external plugs" 
to a totally opposite "let's renice -10 X, the whole SCHED_NORMAL kthreads 
class, on top of all the tasks owned by root" [1].
From a spectator POV like myself in this case, this looks rather "unfair".



[1] I think, before and now, that that's more a duct tape patch than a 
real solution. OTOH if the "solution" is gonna be another maze of 
macros and heuristics filled with pretty bad corner cases, I may 
prefer the former.


- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Davide Libenzi
On Thu, 19 Apr 2007, Ingo Molnar wrote:

> i disagree that the user 'would expect' this. Some users might. Others 
> would say: 'my 10-thread rendering engine is more important than a 
> 1-thread job because it's using 10 threads for a reason'. And the CFS 
> feedback so far strengthens this point: the default behavior of treating 
> the thread as a single scheduling (and CPU time accounting) unit works 
> pretty well on the desktop.
> 
> think about it in another, 'kernel policy' way as well: we'd like to 
> _encourage_ more parallel user applications. Hurting them by accounting 
> all threads together sends the exact opposite message.

There are counter-arguments too. Like, not every user knows if a certain 
process is MT or not. I agree though that doing accounting and fairness at 
a depth lower than USER is messy, and not only for performance.


- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> You can certainly script it with -geometry. But it is the wrong 
> application for this matter, because you benchmark X more than 
> glxgears itself. What would be better is something like a line 
> rotating 360 degrees and doing some short stuff between each degree, 
> so that X is not much solicited, but the CPU would be spent more on 
> the processes themselves.

at least on my setup glxgears goes via DRI/DRM so there's no X 
scheduling inbetween at all, and the visual appearance of glxgears is a 
direct function of its scheduling.

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Willy Tarreau
Hi Ingo,

On Thu, Apr 19, 2007 at 11:01:44AM +0200, Ingo Molnar wrote:
> 
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> 
> > Good idea. The machine I'm typing from now has 1000 scheddos running 
> > at +19, and 12 gears at nice 0. [...]
> 
> > From time to time, one of the 12 aligned gears will quickly perform a 
> > full quarter of round while others slowly turn by a few degrees. In 
> > fact, while I don't know this process's CPU usage pattern, there's 
> > something useful in it : it allows me to visually see when process 
> > accelerate/decelerate. [...]
> 
> cool idea - i have just tried this and it rocks - you can easily see the 
> 'nature' of CPU time distribution just via visual feedback. (Is there 
> any easy way to start up 12 glxgears fully aligned, or does one always 
> have to mouse around to get them into proper position?)

-- Replying quickly, I'm short in time --

You can certainly script it with -geometry. But it is the wrong application
for this matter, because you benchmark X more than glxgears itself. What would
be better is something like a line rotating 360 degrees and doing some short
stuff between each degree, so that X is not much solicited, but the CPU
would be spent more on the processes themselves.

Benchmarking interactions between X and multiple clients is a completely
different test IMHO. Glxgears is between those two, making it inappropriate
for scheduler tuning.

Regards,
Willy



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Peter Williams

William Lee Irwin III wrote:

* Andrew Morton <[EMAIL PROTECTED]> wrote:
Yes, there are potential compatibility problems.  Example: a machine 
with 100 busy httpd processes and suddenly a big gzip starts up from 
console or cron.

[...]

On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
h. How about the following then: default to nice -10 for all 
(SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
special: root already has disk space reserved to it, root has special 
memory allocation allowances, etc. I don't see a reason why we couldn't by 
default make all root tasks have nice -10. This would be instantly loved 
by sysadmins i suspect ;-)
(distros that go the extra mile of making Xorg run under non-root could 
also go another extra one foot to renice that X server to -10.)


I'd further recommend making priority levels accessible to kernel threads
that are not otherwise accessible to processes, both above and below
user-available priority levels. Basically, if you can get SCHED_RR and
SCHED_FIFO to coexist as "intimate scheduler classes," then a SCHED_KERN
scheduler class can coexist with SCHED_OTHER in like fashion, but with
availability of higher and lower priorities than any userspace process
is allowed, and potentially some differing scheduling semantics. In such
a manner nonessential background processing intended not to ever disturb
userspace can be given priorities appropriate to it (perhaps even con's
SCHED_IDLEPRIO would make sense), and other, urgent processing can be
given priority over userspace altogether.

I believe root's default priority can be adjusted in userspace as
things now stand, somewhere in /etc/, but I'm not sure of the specifics.
Word is it's somewhere in /etc/security/limits.conf.
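
If pam_limits is what ends up being used, the relevant knob is probably
the "priority" item. Something like the lines below, although the syntax
here is quoted from memory, so check the pam_limits documentation, and
the group is only an example:

# /etc/security/limits.conf (read by pam_limits)
# <domain>   <type>   <item>      <value>
root         -        priority    -10    # root sessions start at nice -10
@server      -        priority    -5     # illustrative group, not standard
*            -        priority    0      # everyone else at the default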


This is sounding very much like System V Release 4 (and descendants) 
except that they call it SCHED_SYS and also give SCHED_NORMAL tasks that 
are in system mode dynamic priorities in the SCHED_SYS range (to avoid 
priority inversion, I believe).


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce


Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-19 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > I think a better approach would be to keep track of the rightmost 
> > entry, set the key to the rightmost's key +1 and then simply insert 
> > it there.
> 
> yeah. I had that implemented at a stage but was trying to be too 
> clever for my own good ;-)

i have fixed it via the patch below. (I'm using rb_last() because that 
way the normal scheduling codepaths are not burdened with the 
maintainance of a rightmost entry.)

Ingo

---
 kernel/sched.c  |3 ++-
 kernel/sched_fair.c |   24 +---
 2 files changed, 15 insertions(+), 12 deletions(-)

Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3806,7 +3806,8 @@ asmlinkage long sys_sched_yield(void)
schedstat_inc(rq, yld_cnt);
if (rq->nr_running == 1)
schedstat_inc(rq, yld_act_empty);
-   current->sched_class->yield_task(rq, current);
+   else
+   current->sched_class->yield_task(rq, current);
 
/*
 * Since we are going to call schedule() anyway, there's
Index: linux/kernel/sched_fair.c
===
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -275,21 +275,23 @@ static void dequeue_task_fair(struct rq 
  */
 static void yield_task_fair(struct rq *rq, struct task_struct *p)
 {
+   struct rb_node *entry;
+   struct task_struct *last;
+
dequeue_task_fair(rq, p);
p->on_rq = 0;
+
/*
-* Temporarily insert at the last position of the tree:
+* Temporarily insert at the last position of the tree.
+* The key will be updated back to (near) its old value
+* when the task gets scheduled.
 */
-   p->fair_key = LLONG_MAX;
+   entry = rb_last(&rq->tasks_timeline);
+   last = rb_entry(entry, struct task_struct, run_node);
+
+   p->fair_key = last->fair_key + 1;
__enqueue_task_fair(rq, p);
p->on_rq = 1;
-
-   /*
-* Update the key to the real value, so that when all other
-* tasks from before the rightmost position have executed,
-* this task is picked up again:
-*/
-   p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;
 }
 
 /*


Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-19 Thread Ingo Molnar

* Esben Nielsen <[EMAIL PROTECTED]> wrote:

> >+/*
> >+ * Temporarily insert at the last position of the tree:
> >+ */
> >+p->fair_key = LLONG_MAX;
> >+__enqueue_task_fair(rq, p);
> > p->on_rq = 1;
> >+
> >+/*
> >+ * Update the key to the real value, so that when all other
> >+ * tasks from before the rightmost position have executed,
> >+ * this task is picked up again:
> >+ */
> >+p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;
> 
> I don't think it's safe to change the key after inserting the element in 
> the tree. You end up with an unsorted tree where new entries 
> end up in the wrong places "randomly".

yeah, indeed. I hoped that once this rightmost entry is removed (as soon 
as it gets scheduled next time) the tree goes back to a correct shape, 
but that's not the case - the left sub-tree and the right sub-tree is 
merged by the rbtree code with the assumption that the entry had a 
correct key.

> I think a better approach would be to keep track of the rightmost 
> entry, set the key to the rightmost's key +1 and then simply insert it 
> there.

yeah. I had that implemented at a stage but was trying to be too clever 
for my own good ;-)
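
The point is purely one of ordering: look up the current rightmost key
first, derive the new key from it, and only then insert. In miniature,
with a plain sorted list standing in for the rbtree (illustrative only,
not the sched_fair code):

/* Never change a key while the element sits inside a sorted structure;
 * find the rightmost key, derive the new key, then (re)insert. */
#include <stdio.h>

struct node { long key; struct node *next; };

static struct node *insert_sorted(struct node *head, struct node *n)
{
    struct node **pp = &head;

    while (*pp && (*pp)->key <= n->key)
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
    return head;
}

static struct node *remove_node(struct node *head, struct node *n)
{
    struct node **pp = &head;

    while (*pp && *pp != n)
        pp = &(*pp)->next;
    if (*pp)
        *pp = n->next;
    return head;
}

static long rightmost_key(struct node *head)
{
    while (head && head->next)
        head = head->next;
    return head ? head->key : 0;
}

int main(void)
{
    struct node a = { 10, NULL }, b = { 20, NULL }, c = { 30, NULL };
    struct node *head = NULL, *p;

    head = insert_sorted(head, &a);
    head = insert_sorted(head, &b);
    head = insert_sorted(head, &c);

    /* "yield" a: dequeue it, pick a key past the rightmost, re-insert */
    head = remove_node(head, &a);
    a.key = rightmost_key(head) + 1;
    head = insert_sorted(head, &a);

    for (p = head; p; p = p->next)
        printf("%ld ", p->key);         /* prints: 20 30 31 */
    printf("\n");
    return 0;
}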

Ingo


Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-19 Thread Esben Nielsen



On Wed, 18 Apr 2007, Ingo Molnar wrote:



* Christian Hesse <[EMAIL PROTECTED]> wrote:


Hi Ingo and all,

On Friday 13 April 2007, Ingo Molnar wrote:

as usual, any sort of feedback, bugreports, fixes and suggestions are
more than welcome,


I just gave CFS a try on my system. From a user's point of view it
looks good so far. Thanks for your work.


you are welcome!


However I found a problem: When trying to suspend a system patched
with suspend2 2.2.9.11 it hangs with "doing atomic copy". Pressing the
ESC key results in a message that it tries to abort suspend, but then
still hangs.


i took a quick look at suspend2 and it makes some use of yield().
There's a bug in CFS's yield code, i've attached a patch that should fix
it, does it make any difference to the hang?

Ingo

Index: linux/kernel/sched_fair.c
===
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -264,15 +264,26 @@ static void dequeue_task_fair(struct rq

/*
 * sched_yield() support is very simple via the rbtree, we just
- * dequeue and enqueue the task, which causes the task to
- * roundrobin to the end of the tree:
+ * dequeue the task and move it to the rightmost position, which
+ * causes the task to roundrobin to the end of the tree.
 */
static void requeue_task_fair(struct rq *rq, struct task_struct *p)
{
dequeue_task_fair(rq, p);
p->on_rq = 0;
-   enqueue_task_fair(rq, p);
+   /*
+* Temporarily insert at the last position of the tree:
+*/
+   p->fair_key = LLONG_MAX;
+   __enqueue_task_fair(rq, p);
p->on_rq = 1;
+
+   /*
+* Update the key to the real value, so that when all other
+* tasks from before the rightmost position have executed,
+* this task is picked up again:
+*/
+   p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;


I don't think it's safe to change the key after inserting the element in the 
tree. You end up with an unsorted tree where new entries end up in 
wrong places "randomly".
I think a better approach would be to keep track of the rightmost entry, 
set the key to the rightmost's key +1 and then simply insert it there.


Esben






Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> Good idea. The machine I'm typing from now has 1000 scheddos running 
> at +19, and 12 gears at nice 0. [...]

> From time to time, one of the 12 aligned gears will quickly perform a 
> full quarter of a round while others slowly turn by a few degrees. In 
> fact, while I don't know this process's CPU usage pattern, there's 
> something useful in it: it allows me to visually see when processes 
> accelerate/decelerate. [...]

cool idea - i have just tried this and it rocks - you can easily see the 
'nature' of CPU time distribution just via visual feedback. (Is there 
any easy way to start up 12 glxgears fully aligned, or does one always 
have to mouse around to get them into proper position?)

btw., i am using another method to quickly judge X's behavior: i started 
the 'snowflakes' plugin in Beryl on Fedora 7, which puts a nice smooth 
opengl-rendered snow fall on the desktop background. That gives me an 
idea about how well X is scheduling under various workloads, without 
having to instrument it explicitly.

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Nick Piggin
On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
> 
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > > And yes, by fairly, I mean fairly among all threads as a base 
> > > resource class, because that's what Linux has always done
> > 
> > Yes, there are potential compatibility problems.  Example: a machine 
> > with 100 busy httpd processes and suddenly a big gzip starts up from 
> > console or cron.
> > 
> > Under current kernels, that gzip will take ages and the httpds will 
> > take a 1% slowdown, which may well be exactly the behaviour which is 
> > desired.
> > 
> > If we were to schedule by UID then the gzip suddenly gets 50% of the 
> > CPU and those httpd's all take a 50% hit, which could be quite 
> > serious.
> > 
> > That's simple to fix via nicing, but people have to know to do that, 
> > and there will be a transition period where some disruption is 
> > possible.
> 
> hm. How about the following then: default to nice -10 for all 
> (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
> special: root already has disk space reserved to it, root has special 
> memory allocation allowances, etc. I dont see a reason why we couldnt by 
> default make all root tasks have nice -10. This would be instantly loved 
> by sysadmins i suspect ;-)

I have no problem with doing fancy new fairness classes and things.

But considering that we _need_ to have per-thread fairness and that
is also what the current scheduler has and what we need to do well for
obvious reasons, the best path to take is to get per-thread scheduling
up to a point where it is able to replace the current scheduler, then
look at more complex things after that.
 


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Davide Libenzi <[EMAIL PROTECTED]> wrote:

> > That's one reason why i dont think it's necessarily a good idea to 
> > group-schedule threads, we dont really want to do a per thread group 
> > percpu_alloc().
> 
> It's still not clear to me how much overhead this will bring to the 
> table, but I think (like Linus was pointing out) the hierarchy should 
> look like:
> 
> Top (VCPU maybe?)
>     User
>         Process
>             Thread
> 
> The "run_queue" concept (and data) that now is bound to a CPU, need to be 
> replicated in:
> 
> ROOT <- VCPUs add themselves here
> VCPU <- USERs add themselves here
> USER <- PROCs add themselves here
> PROC <- THREADs add themselves here
> THREAD (ultimate fine grained scheduling unit)
> 
> So ROOT, VCPU, USER and PROC will have their own "run_queue". Picking 
> up a new task would mean:
> 
> VCPU = ROOT->lookup();
> USER = VCPU->lookup();
> PROC = USER->lookup();
> THREAD = PROC->lookup();
> 
> Run-time statistics should propagate back the other way around.

yeah, but this looks quite bad from an overhead POV ... i think we can 
do something a lot simpler to solve X and kernel thread prioritization.

> > In fact for threads the _reverse_ problem exists, threaded apps tend 
> > to _strive_ for more performance - hence their desperation of using 
> > the threaded programming model to begin with ;) (just think of media 
> > playback apps which are typically multithreaded)
> 
> The same user nicing two different multi-threaded processes would 
> expect a predictable CPU distribution too. [...]

i disagree that the user 'would expect' this. Some users might. Others 
would say: 'my 10-thread rendering engine is more important than a 
1-thread job because it's using 10 threads for a reason'. And the CFS 
feedback so far strengthens this point: the default behavior of treating 
the thread as a single scheduling (and CPU time accounting) unit works 
pretty well on the desktop.

think about it in another, 'kernel policy' way as well: we'd like to 
_encourage_ more parallel user applications. Hurting them by accounting 
all threads together sends the exact opposite message.

> [...] Doing that efficiently (the old per-cpu run-queue is pretty nice 
> from many POVs) is the real challenge.

yeah.

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread William Lee Irwin III
* Andrew Morton <[EMAIL PROTECTED]> wrote:
>> Yes, there are potential compatibility problems.  Example: a machine 
>> with 100 busy httpd processes and suddenly a big gzip starts up from 
>> console or cron.
[...]

On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
> hm. How about the following then: default to nice -10 for all 
> (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
> special: root already has disk space reserved to it, root has special 
> memory allocation allowances, etc. I dont see a reason why we couldnt by 
> default make all root tasks have nice -10. This would be instantly loved 
> by sysadmins i suspect ;-)
> (distros that go the extra mile of making Xorg run under non-root could 
> also go another extra one foot to renice that X server to -10.)

I'd further recommend making available to kernel threads priority levels
that are not otherwise accessible to processes, both above and below the
user-available range. Basically, if you can get SCHED_RR and
SCHED_FIFO to coexist as "intimate scheduler classes," then a SCHED_KERN
scheduler class can coexist with SCHED_OTHER in like fashion, but with
availability of higher and lower priorities than any userspace process
is allowed, and potentially some differing scheduling semantics. In such
a manner nonessential background processing intended not to ever disturb
userspace can be given priorities appropriate to it (perhaps even con's
SCHED_IDLEPRIO would make sense), and other, urgent processing can be
given priority over userspace altogether.

I believe root's default priority can already be adjusted in userspace as
things now stand, somewhere under /etc/, but I'm not sure of the specifics.
Word is it's somewhere in /etc/security/limits.conf


-- wli


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
> * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> 
> > With a heavily reniced X (perfectly fine), that should indeed solve my 
> > daily usage pattern nicely (always need godmode for shells, but not 
> > for mozilla and ilk. 50/50 split automatic without renice of entire 
> > gui)
> 
> how about the first-approximation solution i suggested in the previous 
> mail: to add a per UID default nice level? (With this default defaulting 
> to '-10' for all root-owned processes, and defaulting to '0' for 
> everything else.) That would solve most of the current CFS regressions 
> at hand.

That would make my kernel builds etc interfere with my other self's
surfing and whatnot.  With it by EUID, when I'm surfing or whatnot, the
X portion of my Joe-User activity pushes the compile portion of root
down in bandwidth utilization automagically, which is exactly the right
thing, because the root me is not as important as the Joe-User me using
the GUI at that time.  If the idea of X disturbing root upsets some,
they can move X to another UID.  Generally, it seems perfect for here.

-Mike



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Thu, 2007-04-19 at 08:52 +0200, Mike Galbraith wrote:
> On Wed, 2007-04-18 at 23:48 +0200, Ingo Molnar wrote:
> 
> > so my current impression is that we want per UID accounting to solve the 
> > X problem, the kernel threads problem and the many-users problem, but 
> > i'd not want to do it for threads just yet because for them there's not 
> > really any apparent problem to be solved.
> 
> If you really mean UID vs EUID as Linus mentioned, I suppose I could
> learn to login as !root, and set KDE up to always give me root shells.
> 
> With a heavily reniced X (perfectly fine), that should indeed solve my
> daily usage pattern nicely (always need godmode for shells, but not for
> mozilla and ilk. 50/50 split automatic without renice of entire gui)

Backward, needs to be EUID as Linus suggested.  Kernel builds etc along
with reniced X in root's bucket, surfing and whatnot in Joe-User's
bucket.

-Mike



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Mike Galbraith <[EMAIL PROTECTED]> wrote:

> With a heavily reniced X (perfectly fine), that should indeed solve my 
> daily usage pattern nicely (always need godmode for shells, but not 
> for mozilla and ilk. 50/50 split automatic without renice of entire 
> gui)

how about the first-approximation solution i suggested in the previous 
mail: to add a per UID default nice level? (With this default defaulting 
to '-10' for all root-owned processes, and defaulting to '0' for 
everything else.) That would solve most of the current CFS regressions 
at hand.

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Mike Galbraith
On Wed, 2007-04-18 at 23:48 +0200, Ingo Molnar wrote:

> so my current impression is that we want per UID accounting to solve the 
> X problem, the kernel threads problem and the many-users problem, but 
> i'd not want to do it for threads just yet because for them there's not 
> really any apparent problem to be solved.

If you really mean UID vs EUID as Linus mentioned, I suppose I could
learn to login as !root, and set KDE up to always give me root shells.

With a heavily reniced X (perfectly fine), that should indeed solve my
daily usage pattern nicely (always need godmode for shells, but not for
mozilla and ilk. 50/50 split automatic without renice of entire gui)

-Mike



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > And yes, by fairly, I mean fairly among all threads as a base 
> > resource class, because that's what Linux has always done
> 
> Yes, there are potential compatibility problems.  Example: a machine 
> with 100 busy httpd processes and suddenly a big gzip starts up from 
> console or cron.
> 
> Under current kernels, that gzip will take ages and the httpds will 
> take a 1% slowdown, which may well be exactly the behaviour which is 
> desired.
> 
> If we were to schedule by UID then the gzip suddenly gets 50% of the 
> CPU and those httpd's all take a 50% hit, which could be quite 
> serious.
> 
> That's simple to fix via nicing, but people have to know to do that, 
> and there will be a transition period where some disruption is 
> possible.

hm. How about the following then: default to nice -10 for all 
(SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
special: root already has disk space reserved to it, root has special 
memory allocation allowances, etc. I dont see a reason why we couldnt by 
default make all root tasks have nice -10. This would be instantly loved 
by sysadmins i suspect ;-)

(distros that go the extra mile of making Xorg run under non-root could 
also go another extra one foot to renice that X server to -10.)
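
A rough sketch of how such a default could be wired up at fork time - 
illustrative only, not a posted patch: the helper name is invented, the 
fields follow the 2.6.21-era task_struct, and a real per-UID default would 
live in struct user_struct rather than being hard-coded:

static inline void sched_fork_default_nice(struct task_struct *p)
{
        if (p->policy != SCHED_NORMAL)
                return;                         /* leave RT tasks alone */

        /* root-owned tasks and kernel threads (no mm) default to -10 */
        if (p->uid == 0 || !p->mm)
                set_user_nice(p, -10);
}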

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Andrew Morton
On Thu, 19 Apr 2007 05:18:07 +0200 Nick Piggin <[EMAIL PROTECTED]> wrote:

> And yes, by fairly, I mean fairly among all threads as a base resource
> class, because that's what Linux has always done

Yes, there are potential compatibility problems.  Example: a machine with
100 busy httpd processes and suddenly a big gzip starts up from console or
cron.

Under current kernels, that gzip will take ages and the httpds will take a
1% slowdown, which may well be exactly the behaviour which is desired.

If we were to schedule by UID then the gzip suddenly gets 50% of the CPU
and those httpd's all take a 50% hit, which could be quite serious.

That's simple to fix via nicing, but people have to know to do that, and
there will be a transition period where some disruption is possible.



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Nick Piggin
On Wed, Apr 18, 2007 at 10:49:45PM +1000, Con Kolivas wrote:
> On Wednesday 18 April 2007 22:13, Nick Piggin wrote:
> >
> > The kernel compile (make -j8 on 4 thread system) is doing 1800 total
> > context switches per second (450/s per runqueue) for cfs, and 670
> > for mainline. Going up to 20ms granularity for cfs brings the context
> > switch numbers similar, but user time is still a % or so higher. I'd
> > be more worried about compute heavy threads which naturally don't do
> > much context switching.
> 
> While kernel compiles are nice and easy to do I've seen enough criticism of 
> them in the past to wonder about their usefulness as a standard benchmark on 
> their own.

Actually it is a real workload for most kernel developers including you
no doubt :)

The criticisms of kernbench for the kernel are probably fair in that
kernel compiles don't exercise a lot of kernel functionality (page
allocator and fault paths mostly, IIRC). However as far as I'm concerned,
they're great for testing the CPU scheduler, because it doesn't actually
matter whether you're running in userspace or kernel space for a context
switch to blow your caches. The results are quite stable.

You could actually make up a benchmark that hurts a whole lot more from
context switching, but I figure that kernbench is a real world thing
that shows it up quite well.


> > Some other numbers on the same system
> > Hackbench:        2.6.21-rc7  cfs-v2 1ms[*]  nicksched
> > 10 groups:  Time: 1.332       0.743          0.607
> > 20 groups:  Time: 1.197       1.100          1.241
> > 30 groups:  Time: 1.754       2.376          1.834
> > 40 groups:  Time: 3.451       2.227          2.503
> > 50 groups:  Time: 3.726       3.399          3.220
> > 60 groups:  Time: 3.548       4.567          3.668
> > 70 groups:  Time: 4.206       4.905          4.314
> > 80 groups:  Time: 4.551       6.324          4.879
> > 90 groups:  Time: 7.904       6.962          5.335
> > 100 groups: Time: 7.293       7.799          5.857
> > 110 groups: Time: 10.595      8.728          6.517
> > 120 groups: Time: 7.543       9.304          7.082
> > 130 groups: Time: 8.269       10.639         8.007
> > 140 groups: Time: 11.867      8.250          8.302
> > 150 groups: Time: 14.852      8.656          8.662
> > 160 groups: Time: 9.648       9.313          9.541
> 
> Hackbench even more so. In a prolonged discussion with Rusty Russell on this 
> issue he suggested hackbench was more a pass/fail benchmark to ensure there 
> was no starvation scenario that never ended, and that very little value 
> should be placed on the actual results returned from it.

Yeah, cfs seems to do a little worse than nicksched here, but I
include the numbers not because I think that is significant, but to
show mainline's poor characteristics.


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Nick Piggin
On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 18 Apr 2007, Matt Mackall wrote:
> > 
> > Why is X special? Because it does work on behalf of other processes?
> > Lots of things do this. Perhaps a scheduler should focus entirely on
> > the implicit and directed wakeup matrix and optimizing that
> > instead[1].
> 
> I 100% agree - the perfect scheduler would indeed take into account where 
> the wakeups come from, and try to "weigh" processes that help other 
> processes make progress more. That would naturally give server processes 
> more CPU power, because they help others
> 
> I don't believe for a second that "fairness" means "give everybody the 
> same amount of CPU". That's a totally illogical measure of fairness. All 
> processes are _not_ created equal.

I believe that unless the kernel is told of these inequalities, then it
must schedule fairly.

And yes, by fairly, I mean fairly among all threads as a base resource
class, because that's what Linux has always done (and if you aggregate
into higher classes, you still need that per-thread scheduling).

So I'm not excluding extra scheduling classes like per-process, per-user,
but among any class of equal schedulable entities, fair scheduling is the
only option because the alternative of unfairness is just insane.


> That said, even trying to do "fairness by effective user ID" would 
> probably already do a lot. In a desktop environment, X would get as much 
> CPU time as the user processes, simply because it's in a different 
> protection domain (and that's really what "effective user ID" means: it's 
> not about "users", it's really about "protection domains").
> 
> And "fairness by euid" is probably a hell of a lot easier to do than 
> trying to figure out the wakeup matrix.

Well my X server has an euid of root, which would mean my X clients can
cause X to do work and eat into root's resources. Or as Ingo said, X
may not be running as root. Seems like just another hack to try to
implicitly solve the X problem and probably create a lot of others
along the way.

All fairness issues aside, in the context of keeping a very heavily
loaded desktop interactive, X is special. That you are trying to think
up funny rules that would implicitly give X better priority is kind of
indicative of that.



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Peter Williams

Ingo Molnar wrote:

* Peter Williams <[EMAIL PROTECTED]> wrote:

And my scheduler for example cuts down the amount of policy code and 
code size significantly.
Yours is one of the smaller patches mainly because you perpetuate (or 
you did in the last one I looked at) the (horrible to my eyes) dual 
array (active/expired) mechanism.  That this idea was bad should have 
been apparent to all as soon as the decision was made to excuse some 
tasks from being moved from the active array to the expired array.  
This essentially meant that there would be circumstances where extreme 
unfairness (to the extent of starvation in some cases) could occur -- the 
very thing that the mechanism was originally designed to prevent (as far 
as I can gather).  Right about then in the development of the O(1) 
scheduler, alternative solutions should have been sought.


in hindsight i'd agree.


Hindsight's a wonderful place isn't it :-) and, of course, it's where I 
was making my comments from.


But back then we were clearly not ready for 
fine-grained accurate statistics + trees (cpus are a lot faster at more 
complex arithmetic today, plus people still believed that low-res could 
be done well enough), and taking out either of these two concepts from 
CFS would result in a similarly complex runqueue implementation.


I disagree.  The single priority array with a promotion mechanism that I 
use in the SPA schedulers can do the job of avoiding starvation with no 
measurable increase in the overhead.  Fairness, nice, good interactive 
responsiveness can then be managed by how you determine tasks' dynamic 
priorities.


Also, the 
array switch was just thought to be another piece of 'if the 
heuristics go wrong, we fall back to an array switch' logic, right in 
line with the other heuristics. And you have to accept it, mainline's 
ability to auto-renice make -j jobs (and other CPU hogs) was quite a 
plus for developers, so it had (and probably still has) quite some 
inertia.


I agree, it wasn't totally useless especially for the average user.  My 
main problem with it was that the effect of "nice" wasn't consistent or 
predictable enough for reliable resource allocation.


I also agree with the aims of the various heuristics i.e. you have to be 
unfair and give some tasks preferential treatment in order to give the 
users the type of responsiveness that they want.  It's just a shame that 
it got broken in the process but as you say it's easier to see these 
things in hindsight than in the middle of the melee.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce


Re: [ck] Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Peter Williams

Chris Friesen wrote:

Mark Glines wrote:


One minor question: is it even possible to be completely fair on SMP?
For instance, if you have a 2-way SMP box running 3 applications, one of
which has 2 threads, will the threaded app have an advantage here?  (The
current system seems to try to keep each thread on a specific CPU, to
reduce cache thrashing, which means threads and processes alike each
get 50% of the CPU.)


I think the ideal in this case would be to have both threads on one cpu, 
with the other app on the other cpu.  This gives inter-process fairness 
while minimizing the amount of task migration required.


Solving this sort of issue was one of the reasons for the smpnice patches.



More interesting is the case of three processes on a 2-cpu system.  Do 
we constantly migrate one of them back and forth to ensure that each of 
them gets 66% of a cpu?


Depends how keen you are on fairness.  Unless the processes are long-term 
continuously active tasks that never sleep, it's probably not an issue, as 
they'll probably move around enough for each of them to get 66% over the 
long term.


Exact load balancing for real workloads (where tasks are coming and 
going, sleeping and waking semi-randomly and over relatively brief 
periods) is probably unattainable, because by the time you've worked out 
the ideal placement of the currently runnable tasks on the available 
CPUs it's all changed and the solution is invalid.  The best you can 
hope for is that the change isn't so great as to completely invalidate 
the solution, and that the changes you make as a result are an 
improvement on the current allocation of processes to CPUs.


The above probably doesn't hold for some workloads, such as large 
supercomputer jobs that run for several days, but those are probably best 
served by explicit allocation of processes to CPUs using the process 
affinity mechanism.
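
For completeness, a minimal userspace sketch of that explicit placement via 
the standard affinity interface (the pin_to_cpu() helper name is invented; 
pass a pid of 0 to pin the calling process):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

static int pin_to_cpu(pid_t pid, int cpu)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);

        /* bind the given process to a single CPU */
        if (sched_setaffinity(pid, sizeof(set), &set) < 0) {
                perror("sched_setaffinity");
                return -1;
        }
        return 0;
}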


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Linus Torvalds


On Wed, 18 Apr 2007, Davide Libenzi wrote:
> 
> I know, we agree there. But that did not fit my "Pirates of the Caribbean" 
> quote :)

Ahh, I'm clearly not cultured enough, I didn't catch that reference.

Linus "yes, I've seen the movie, but it
 apparently left more of a mark in other people" Torvalds


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Peter Williams

Linus Torvalds wrote:


On Wed, 18 Apr 2007, Matt Mackall wrote:

On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
And "fairness by euid" is probably a hell of a lot easier to do than 
trying to figure out the wakeup matrix.

For the record, you actually don't need to track a whole NxN matrix
(or do the implied O(n**3) matrix inversion!) to get to the same
result.


I'm sure you can do things differently, but the reason I think "fairness 
by euid" is actually worth looking at is that it's pretty much the 
*identical* issue that we'll have with "fairness by virtual machine" and a 
number of other "container" issues.


The fact is:

 - "fairness" is *not* about giving everybody the same amount of CPU time 
   (scaled by some niceness level or not). Anybody who thinks that is 
   "fair" is just being silly and hasn't thought it through.


 - "fairness" is multi-level. You want to be fair to threads within a 
   thread group (where "process" may be one good approximation of what a 
   "thread group" is, but not necessarily the only one).


   But you *also* want to be fair in between those "thread groups", and 
   then you want to be fair across "containers" (where "user" may be one 
   such container).


So I claim that anything that cannot be fair by user ID is actually really 
REALLY unfair. I think it's absolutely humongously STUPID to call 
something the "Completely Fair Scheduler", and then just be fair on a 
thread level. That's not fair AT ALL! It's the anti-thesis of being fair!


So if you have 2 users on a machine running CPU hogs, you should *first* 
try to be fair among users. If one user then runs 5 programs, and the 
other one runs just 1, then the *one* program should get 50% of the CPU 
time (the users fair share), and the five programs should get 10% of CPU 
time each. And if one of them uses two threads, each thread should get 5%.


So you should see one thread get 50% CPU (single thread of one user), 4 
threads get 10% CPU (their fair share of that users time), and 2 threads 
get 5% CPU (the fair share within that thread group!).


Any scheduling argument that just considers the above to be "7 threads 
total" and gives each thread 14% of CPU time "fairly" is *anything* but 
fair. It's a joke if that kind of scheduler then calls itself CFS!


And yes, that's largely what the current scheduler will do, but at least 
the current scheduler doesn't claim to be fair! So the current scheduler 
is a lot *better* if only in the sense that it doesn't make ridiculous 
claims that aren't true!


Linus


Sounds a lot like the PLFS (process level fair sharing) scheduler in 
Aurema's ARMTech (for whom I used to work).  The "fair" in the title is 
a bit misleading as it's all about unfair scheduling in order to meet 
specific policies.  But it's based on the principle that if you can 
allocate CPU band width "fairly" (which really means in proportion to 
the entitlement each process is allocated) then you can allocate CPU 
band width "fairly" between higher level entities such as process 
groups, users groups and so on by subdividing the entitlements downwards.


The tricky part of implementing this was the fact that not all entities 
at the various levels have sufficient demand for CPU bandwidth to use 
their entitlements and this in turn means that the entities above them 
will have difficulty using their entitlements even if others of their 
subordinates have sufficient demand (because their entitlements will be 
too small).  The trick is to have a measure of each entity's demand for 
CPU bandwidth and use that to modify the way entitlement is divided 
among subordinates.


As a first guess, an entity's CPU bandwidth usage is an indicator of 
demand but doesn't take into account unmet demand due to tasks sitting 
on a run queue waiting for access to the CPU.  On the other hand, usage 
plus time waiting on the queue isn't a good measure of demand either 
(although it's probably a good upper bound) as it's unlikely that the 
task would have used the same amount of CPU as the waiting time if it 
had gone straight to the CPU.


But my main point is that it is possible to build schedulers that can 
achieve higher level scheduling policies.  Versions of PLFS work on 
Windows from user space by twiddling process priorities.  Part of my 
more recent work at Aurema involved patching Linux's scheduler so that 
nice worked more predictably, so that we could release a user space 
version of PLFS for Linux.  The other part was to add hard CPU bandwidth 
caps for processes so that ARMTech could enforce hard CPU bandwidth caps 
on higher level entities (as this can't be done without the kernel being 
able to do it at that level).


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, Linus Torvalds wrote:

> On Wed, 18 Apr 2007, Davide Libenzi wrote:
> > 
> > "Perhaps on the rare occasion pursuing the right course demands an act of 
> >  unfairness, unfairness itself can be the right course?"
> 
> I don't think that's the right issue.
> 
> It's just that "fairness" != "equal".
> 
> Do you think it "fair" to pay everybody the same regardless of how good a 
> job they do? I don't think anybody really believes that. 
> 
> Equating "fair" and "equal" is simply a very fundamental mistake. They're 
> not the same thing. Never have been, and never will.

I know, we agree there. But that did not fit my "Pirates of the Caribbean" 
quote :)



- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, Ingo Molnar wrote:

> That's one reason why i dont think it's necessarily a good idea to 
> group-schedule threads, we dont really want to do a per thread group 
> percpu_alloc().

It's still not clear to me how much overhead this will bring to the 
table, but I think (like Linus was pointing out) the hierarchy should 
look like:

Top (VCPU maybe?)
    User
        Process
            Thread

The "run_queue" concept (and data) that now is bound to a CPU, need to be 
replicated in:

ROOT <- VCPUs add themselves here
VCPU <- USERs add themselves here
USER <- PROCs add themselves here
PROC <- THREADs add themselves here
THREAD (ultimate fine grained scheduling unit)

So ROOT, VCPU, USER and PROC will have their own "run_queue". Picking up a 
new task would mean:

VCPU = ROOT->lookup();
USER = VCPU->lookup();
PROC = USER->lookup();
THREAD = PROC->lookup();

Run-time statistics should propagate back the other way around.
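
A rough sketch of what such a lookup chain could look like with one rbtree 
per level - purely illustrative, the structure and function names below are 
invented and this is not CFS code:

struct sched_node {
        struct rb_root          children;       /* runnable child entities */
        struct rb_node          run_node;       /* our node in the parent's tree */
        struct task_struct      *task;          /* non-NULL only at the THREAD level */
};

static struct task_struct *pick_next_hierarchical(struct sched_node *root)
{
        struct sched_node *se = root;

        /* descend ROOT -> VCPU -> USER -> PROC -> THREAD, taking the
         * leftmost (best-key) entity at every level */
        while (!se->task) {
                struct rb_node *left = rb_first(&se->children);

                if (!left)
                        return NULL;            /* nothing runnable */
                se = rb_entry(left, struct sched_node, run_node);
        }
        return se->task;
}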


> In fact for threads the _reverse_ problem exists, threaded apps tend to 
> _strive_ for more performance - hence their desperation of using the 
> threaded programming model to begin with ;) (just think of media 
> playback apps which are typically multithreaded)

The same user nicing two different multi-threaded processes would expect a 
predictable CPU distribution too. Doing that efficiently (the old per-cpu 
run-queue is pretty nice from many POVs) is the real challenge.



- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Con Kolivas
On Wednesday 18 April 2007 22:33, Con Kolivas wrote:
> On Wednesday 18 April 2007 22:14, Nick Piggin wrote:
> > On Wed, Apr 18, 2007 at 07:33:56PM +1000, Con Kolivas wrote:
> > > On Wednesday 18 April 2007 18:55, Nick Piggin wrote:
> > > > Again, for comparison 2.6.21-rc7 mainline:
> > > >
> > > > 508.87user 32.47system 2:17.82elapsed 392%CPU
> > > > 509.05user 32.25system 2:17.84elapsed 392%CPU
> > > > 508.75user 32.26system 2:17.83elapsed 392%CPU
> > > > 508.63user 32.17system 2:17.88elapsed 392%CPU
> > > > 509.01user 32.26system 2:17.90elapsed 392%CPU
> > > > 509.08user 32.20system 2:17.95elapsed 392%CPU
> > > >
> > > > So looking at elapsed time, a granularity of 100ms is just behind the
> > > > mainline score. However it is using slightly less user time and
> > > > slightly more idle time, which indicates that balancing might have
> > > > got a bit less aggressive.
> > > >
> > > > But anyway, it conclusively shows the efficiency impact of such tiny
> > > > timeslices.
> > >
> > > See test.kernel.org for how (the now defunct) SD was performing on
> > > kernbench. It had low latency _and_ equivalent throughput to mainline.
> > > Set the standard appropriately on both counts please.
> >
> > I can give it a run. Got an updated patch against -rc7?
>
> I said I wasn't pursuing it but since you're offering, the rc6 patch should
> apply ok.
>
> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc6-sd-0.40.patch

Oh and if you go to the effort of trying you may as well try the timeslice 
tweak to see what effect it has on SD as well.

/proc/sys/kernel/rr_interval

100 is the highest.

-- 
-ck


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Davide Libenzi <[EMAIL PROTECTED]> wrote:

> I think Ingo's idea of a new sched_group to contain the generic 
> parameters needed for the "key" calculation works better than adding 
> more fields to existing structures (that would, of course, host 
> pointers to it). Otherwise I can already see the struct_signal being 
> the target for other unrelated fields :)

yeah. Another detail is that for global containers like uids, the 
statistics will have to be percpu_alloc()-ed, both for correctness 
(runqueues are per CPU) and for performance.

That's one reason why i dont think it's necessarily a good idea to 
group-schedule threads, we dont really want to do a per thread group 
percpu_alloc().
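
As a sketch of what per-uid, per-CPU statistics might look like (the names 
below are invented; the in-tree allocator is alloc_percpu()/per_cpu_ptr(), 
which is what percpu_alloc() refers to here):

struct uid_sched_stats {
        u64     fair_clock;             /* this uid's share-clock on one CPU */
        u64     wait_runtime;
};

/* hypothetical extra field in struct user_struct:
 *
 *      struct uid_sched_stats *sched_stats;    (one copy per CPU)
 *
 * allocated once per uid container:
 *      up->sched_stats = alloc_percpu(struct uid_sched_stats);
 *
 * and accessed from a given runqueue with:
 *      struct uid_sched_stats *s = per_cpu_ptr(up->sched_stats, cpu);
 */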

In fact for threads the _reverse_ problem exists, threaded apps tend to 
_strive_ for more performance - hence their desperation of using the 
threaded programming model to begin with ;) (just think of media 
playback apps which are typically multithreaded)

I dont think threads are all that different. Also, the 
resource-conserving act of using CLONE_VM to share the VM (and to use a 
different programming environment like Java) should not be 'punished' by 
forcing the thread group to be accounted as a single, shared entity 
against other 'fat' tasks.

so my current impression is that we want per UID accounting to solve the 
X problem, the kernel threads problem and the many-users problem, but 
i'd not want to do it for threads just yet because for them there's not 
really any apparent problem to be solved.

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > perhaps a more fitting term would be 'precise group-scheduling'. 
> > Within the lowest level task group entity (be that thread group or 
> > uid group, etc.) 'precise scheduling' is equivalent to 'fairness'.
> 
> Yes. Absolutely. Except I think that at least if you're going to name 
> somethign "complete" (or "perfect" or "precise"), you should also 
> admit that groups can be hierarchical.

yes. Am i correct to sum up your impression as:

 " Ingo, for you the hierarchy still appears to be an after-thought,
   while in practice it's easily the most important thing! Why are you
   so hung up about 'fairness', it makes no sense!"

right?

and you would definitely be right if you suggested that i neglected the 
'group scheduling' aspects of CFS (except for a minimalistic nice level 
implementation, which is a poor-man's-non-automatic-group-scheduling), 
but i very much know it's important and i'll definitely fix it for -v4.

But please let me explain my reasons for my different focus:

yes, group scheduling in practice is the most important first-layer 
thing, and without it any of the other 'CFS wins' can easily be useless.

Firstly, i have not neglected the group scheduling related CFS 
regressions at all, mainly because there _is_ already a quick hack to 
check whether group scheduling would solve these regressions: renice. 
And it was tried in both of the two CFS regression cases i'm aware of: 
Mike's X starvation problem and Willy's "kevents starvation with 
thousands of scheddos tasks running" problem. And in both cases, 
applying the renice hack [which should be properly and automatically 
implemented as uid group scheduling] fixed the regression for them! So i 
was not worried at all, group scheduling _provably solves_ these CFS 
regressions. I rather concentrated on the CFS regressions that were much 
less clear.

But PLEASE believe me: even with perfect cross-group CPU allocation but 
with a simple non-heuristic scheduler underlying it, you can _easily_ 
get a sucky desktop experience! I know it because i tried it and others 
tried it too. (in fact the first version of sched_fair.c was tick based 
and low-res, and it sucked)

Two more things were needed:

  - the high precision of nsec/64-bit accounting
('reliability of scheduling')

  - extremely even time-distribution of CPU power 
('determinism/smoothness, human perception')

(i'm expanding on these two concepts further below)

take out any of these and group scheduling or not, you are easily going 
to have a sucky desktop! (We know that from years of experiments: many 
people tried to rip out the unfairness from the scheduler and there were 
always nasty corner cases that 'should' have worked but didnt.)

Without these we'd in essence start again at square one, just at a 
different square, this time with another group of people being 
irritated!

But the biggest and hardest to achieve _wins_ of CFS are _NOT_ achieved 
via a simple 'get rid of the unfairness of the upstream scheduler and 
apply group scheduling'. (I know that because i tried it before and 
because others tried it before, for many many years.) You will _easily_ 
get sucky desktop experience. The other two things are very much needed 
too:

 - the high precision of nsec/64-bit accounting, and the many
   corner-cases this solves. (For example on a typical desktop there are
   _lots_ of timing-driven workloads that are in essence 'invisible' to
   low-resolution, timer-tick based accounting and are heavily skewed.)

 - extremely even time-distribution of CPU power. CFS behaves pretty
   well even under the dreaded 'make -jN in an xterm' kernel build
   workload as reported by Mark Lord, because it also distributes CPU
   power in a _finegrained_ way. A shell prompt under CFS still behaves
   acceptably on a single-CPU testbox of mine with a "make -j50"
   workload. (yes, fifty) Humans react alot more negatively to sudden
   changes in application behavior ('lags', pauses, short hangs) than
   they react to fine, gradual, all-encompassing slowdowns. This is a
   key property of CFS.

  ( Otherwise renicing X to -10 would have solved most of the
interactivity complaints against the vanilla scheduler, otherwise
renicing X to -10 would have fixed Mike's setup under SD (it didnt)
while it worked much better under CFS, otherwise Gene wouldnt have
found CFS markedly better than SD, etc., etc. So getting rid of the
heuristics is less than 50% of the road to the perfect desktop
scheduler. )

and i claim that these were the really hard bits, and i spent most of 
the CFS coding only on getting _these_ details 100% right under various 
workloads, and it makes a night and day difference _even without any 
group scheduling help_.

and note another reason here: group scheduling _masks_ many other 
scheduling deficiencies that are possible in a scheduler. So since CFS 
doesnt do group scheduling, i get a _fuller_ picture [...]

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Linus Torvalds


On Wed, 18 Apr 2007, Davide Libenzi wrote:
> 
> "Perhaps on the rare occasion pursuing the right course demands an act of 
>  unfairness, unfairness itself can be the right course?"

I don't think that's the right issue.

It's just that "fairness" != "equal".

Do you think it "fair" to pay everybody the same regardless of how good a 
job they do? I don't think anybody really believes that. 

Equating "fair" and "equal" is simply a very fundamental mistake. They're 
not the same thing. Never have been, and never will.

Now, there's no question that "equal" is much easier to implement, if only 
because it's a lot easier to agree on what it means. "Equal parts" is 
something everybody can agree on. "Fair parts" automatically involves a 
balancing act, and people will invariably count things differently and 
thus disagree about what is "fair" and what is not.

I don't think we can ever get a "perfect" setup for that reason, but I 
think we can get something that at least gets reasonably close, at least 
for the obvious cases.

So my suggested test-case of running one process as one user and two 
processes as another one has a fairly "obviously correct" solution if you 
have just one CPU, and you can probably be pretty fair in practice on 
two CPU's (there's an obvious theoretical solution, whether you can get 
there with a practical algorithm is another thing). On three or more 
CPU's, you obviously wouldn't even *want* to be fair, since you can very 
naturally just give a CPU to each..

Linus


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, Linus Torvalds wrote:

> For example, maybe we can approximate it by spreading out the statistics: 
> right now you have things like
> 
>  - last_ran, wait_runtime, sum_wait_runtime..
> 
> be per-thread things. Maybe some of those can be spread out, so that you 
> put a part of them in the "struct vm_struct" thing (to approximate 
> processes), part of them in the "struct user" struct (to approximate the 
> user-level thing), and part of it in a per-container thing for when/if we 
> support that kind of thing?

I think Ingo's idea of a new sched_group to contain the generic 
parameters needed for the "key" calculation works better than adding more 
fields to existing structures (that would, of course, host pointers to it). 
Otherwise I can already see the struct_signal being the target for other 
unrelated fields :)



- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, Linus Torvalds wrote:

> I'm not arguing against fairness. I'm arguing against YOUR notion of 
> fairness, which is obviously bogus. It is *not* fair to try to give out 
> CPU time evenly!

"Perhaps on the rare occasion pursuing the right course demands an act of 
 unfairness, unfairness itself can be the right course?"



- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, William Lee Irwin III wrote:

> Thinking of the scheduler as a CPU bandwidth allocator, this means
> handing out shares of CPU bandwidth to all users on the system, which
> in turn hand out shares of bandwidth to all sessions, which in turn
> hand out shares of bandwidth to all process groups, which in turn hand
> out shares of bandwidth to all thread groups, which in turn hand out
> shares of bandwidth to threads. The event handlers for the scheduler
> need not deal with this apart from task creation and exit and various
> sorts of process ID changes (e.g. setsid(), setpgrp(), setuid(), etc.).

Yes, it really becomes a hierarchical problem once you consider users and 
processes. The top level sees that a "user" can be scheduled (put itself on 
the virtual run queue), and passes the ball to the "process" scheduler inside 
the "user" container, down to maybe "threads". With all the "key" 
calculation parameters kept at each level (with up-propagation).



- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> For example, maybe we can approximate it by spreading out the 
> statistics: right now you have things like
> 
>  - last_ran, wait_runtime, sum_wait_runtime..
> 
> be per-thread things. [...]

yes, yes, yes! :) My thinking is "struct sched_group" embedded into 
_arbitrary_ other resource containers and abstractions; these 
sched_groups then form a simple hierarchy and are driven by the core 
scheduling machinery.

> [...] Maybe some of those can be spread out, so that you put a part of 
> them in the "struct vm_struct" thing (to approximate processes), part 
> of them in the "struct user" struct (to approximate the user-level 
> thing), and part of it in a per-container thing for when/if we support 
> that kind of thing?

yes.

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Linus Torvalds


On Wed, 18 Apr 2007, Ingo Molnar wrote:
> 
> perhaps a more fitting term would be 'precise group-scheduling'. Within 
> the lowest level task group entity (be that thread group or uid group, 
> etc.) 'precise scheduling' is equivalent to 'fairness'.

Yes. Absolutely. Except I think that at least if you're going to name 
somethign "complete" (or "perfect" or "precise"), you should also admit 
that groups can be hierarchical.

The "threads in a process" thing is a great example of a hierarchical 
group. Imagine if X was running as a collection of threads - then each 
server thread would no longer be more important than the clients! But if 
you have a mix of "bags of threads" and "single process" kinds of 
applications, then very arguably the single thread in a single traditional 
process should get as much time as the "bag of threads" process gets 
total.

So it really should be a hierarchical notion, where each thread is owned 
by one "process", and each process is owned by one "user", and each user 
is in one "virtual machine" - there's at least three different levels to 
this, and you'd want to schedule this thing top-down: virtual machines 
should be given CPU time "fairly" (which doesn't need to mean "equally", 
of course - nice-values could very well work at that level too), and then 
within each virtual machine users or "scheduling groups" should be 
scheduled fairly, and then within each scheduling group the processes 
should be scheduled, and within each process threads should equally get 
their fair share at _that_ level.

And no, I don't think we necessarily need to do something quite that 
elaborate. But I think that's the kind of "obviously good goal" to keep in 
mind. Can we perhaps _approximate_ something like that by other means? 

For example, maybe we can approximate it by spreading out the statistics: 
right now you have things like

 - last_ran, wait_runtime, sum_wait_runtime..

be per-thread things. Maybe some of those can be spread out, so that you 
put a part of them in the "struct vm_struct" thing (to approximate 
processes), part of them in the "struct user" struct (to approximate the 
user-level thing), and part of it in a per-container thing for when/if we 
support that kind of thing?

IOW, I don't think the scheduling "groups" have to be explicit boxes or 
anything like that. I suspect you can make do with just heuristics that 
penalize the same "struct user" and "struct vm_struct" for getting too much 
scheduling time, and you'll get the same _effect_. 

And I don't think it's wrong to look at the "one hundred processes by the 
same user" case as being an important case. But it should not be the 
*only* case or even necessarily the *main* case that matters. I think a 
benchmark that literally does

#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* must be started as root so that both setuid() calls succeed */
int main(void)
{
        pid_t pid = fork();

        if (pid < 0)
                exit(1);
        if (pid) {
                /* parent: one busy loop running as uid 500 */
                if (setuid(500) < 0)
                        exit(2);
                for (;;)
                        /* Do nothing */;
        }
        /* child: two busy loops running as uid 501 */
        if (setuid(501) < 0)
                exit(3);
        fork();
        for (;;)
                /* Do nothing in two processes */;
}

and I think that it's a really valid benchmark: if the scheduler gives 25% 
of time to each of the two processes of user 501, and 50% to user 500, 
then THAT is a good scheduler.

If somebody wants to actually write and test the above as a test-script, 
and add it to a collection of scheduler tests, I think that could be a 
good thing.
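
For whoever picks that up: the split can be measured from userspace without 
touching the kernel, e.g. by sampling utime (field 14 of /proc/<pid>/stat, in 
clock ticks) for each of the three busy loops at the start and end of a 
ten-second window. A rough sketch of the sampling helper (name invented):

#include <stdio.h>

static long utime_ticks(int pid)
{
        char path[64];
        FILE *f;
        long utime = -1;

        snprintf(path, sizeof(path), "/proc/%d/stat", pid);
        f = fopen(path, "r");
        if (!f)
                return -1;

        /* skip pid, comm, state and the next ten fields, then read utime */
        if (fscanf(f, "%*d %*s %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %ld",
                   &utime) != 1)
                utime = -1;
        fclose(f);
        return utime;
}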

Linus


Re: [ck] Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Chris Friesen

Mark Glines wrote:


One minor question: is it even possible to be completely fair on SMP?
For instance, if you have a 2-way SMP box running 3 applications, one of
which has 2 threads, will the threaded app have an advantage here?  (The
current system seems to try to keep each thread on a specific CPU, to
reduce cache thrashing, which means threads and processes alike each
get 50% of the CPU.)


I think the ideal in this case would be to have both threads on one cpu, 
with the other app on the other cpu.  This gives inter-process fairness 
while minimizing the amount of task migration required.


More interesting is the case of three processes on a 2-cpu system.  Do 
we constantly migrate one of them back and forth to ensure that each of 
them gets 66% of a cpu?


Chris


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Linus Torvalds


On Wed, 18 Apr 2007, Ingo Molnar wrote:
> 
> But note that most of the reported CFS interactivity wins, as surprising 
> as it might be, were due to fairness between _the same user's tasks_.

And *ALL* of the CFS interactivity *losses* and complaints have been 
because it did the wrong thing _between different users' tasks_.

So what's your point? Your point was that when people try it out as a 
single user, it is indeed fair. But that's no point at all, since it 
totally missed _my_ point.

The problems with X scheduling is exactly that "other user" kind of thing.

The problem with kernel thread starvation due to user threads getting all 
the CPU time is exactly the same issue.

As long as you think that all threads are equal, and should be treated 
equally, you CANNOT make it work well. People can say "ok, you can renice 
X", but the whole problem stems from the fact that you're trying to be 
fair based on A TOTALLY INVALID NOTION of what "fair" is.

> In the typical case, 99% of the desktop CPU time is executed either as X 
> (root user) or under the uid of the logged in user, and X is just one 
> task.

So? You are ignoring the argument again. You're totally bringing up a red 
herring:

> Even with a bad hack of making X super-high-prio, interactivity as 
> experienced by users still sucks without having fairness between the 
> other 100-200 user tasks that a desktop system is typically using.

I didn't say that you should be *unfair* within one user group. What kind 
of *idiotic* argument are you trying to put forth?

OF COURSE you should be fair "within the user group". Nobody contests that 
the "other 100-200 user tasks" should be scheduled fairly _amongst 
themselves_. 

The only point I had was that you cannot just lump all threads together 
and say "these threads are equally important". The 100-200 user tasks may 
be equally important, and should get equal amounts of preference, but that 
has absolutely _zero_ bearing on the _single_ task run in another 
"scheduling group", ie by other users or by X.

I'm not arguing against fairness. I'm arguing against YOUR notion of 
fairness, which is obviously bogus. It is *not* fair to try to give out 
CPU time evenly!

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Michael K. Edwards

On 4/18/07, Matt Mackall <[EMAIL PROTECTED]> wrote:

For the record, you actually don't need to track a whole NxN matrix
(or do the implied O(n**3) matrix inversion!) to get to the same
result. You can converge on the same node weightings (ie dynamic
priorities) by applying a damped function at each transition point
(directed wakeup, preemption, fork, exit).

The trouble with any scheme like this is that it needs careful tuning
of the damping factor to converge rapidly and not oscillate and
precise numerical attention to the transition functions so that the sum of
dynamic priorities is conserved.


That would be the control theory approach.  And yes, you have to get
both the theoretical transfer function and the numerics right.  It
sometimes helps to use a control-systems framework like the classic
Takagi-Sugeno-Kang fuzzy logic controller; get the numerics right once
and for all, and treat the heuristics as data, not logic.  (I haven't
worked in this area in almost twenty years, but Google -- yes, I do
use Google+brain for fact-checking; what do you do? -- says that
people are still doing active research on TSK models, and solid
fixed-point reference implementations are readily available.)  That
seems like an attractive strategy here because you could easily embed
the control engine in the kernel and load rule sets dynamically.  Done
right, that could give most of the advantages of pluggable schedulers
(different heuristic strokes for different folks) without diluting the
tester pool for the actual engine code.

(Of course, different scheduling strategies require different input
data, and you might not want the overhead of collecting data that your
chosen heuristics won't use.  But that's not much different from the
netfilter situation, and is obviously a solvable problem, if anyone
cares to put that much work in.  The people who ought to be funding
this kind of work are Sun and IBM, who don't have a chance on the
desktop and are in big trouble in the database tier; their future as
processor vendors depends on being able to service presentation-tier
and business-logic-tier loads efficiently on their massively
multi-core chips.  MIPS should pitch in too, on behalf of licensees
like Cavium who need more predictable behavior on multi-core embedded
Linux.)

Note also that you might not even want to persistently prioritize
particular processes or process groups.  You might want a heuristic
that notices that some task (say, the X server) often responds to
being awakened by doing a little work and then unblocking the task
that awakened it.  When it is pinged from some highly interactive
task, you want it to jump the scheduler queue just long enough to
unblock the interactive task, which may mean letting it flush some
work out of its internal queue.  But otherwise you want to batch
things up until there's too much "scheduler pressure" behind it, then
let it work more or less until it runs out of things to do, because
its working set is so large that repeatedly scheduling it in and out
is hell on caches.

(Priority inheritance is the classic solution to the
blocked-high-priority-task problem _in_isolation_.  It is not without
its pitfalls, especially when the designer of the "server" didn't
expect to lose his timeslice instantly on releasing the lock.  True
priority inheritance is probably not something you want to inflict on
a non-real-time system, but you do need some urgency heuristic.  What
a "fuzzy logic" framework does for you is to let you combine competing
heuristics in a way that remains amenable to analysis using control
theory techniques.)

What does any of this have to do with "fairness"?  Nothing whatsoever!
There's work that has to be done, and choosing when to do it is
almost entirely a matter of staying out of the way of more urgent work
while minimizing the task's negative impact on the rest of the system.
Does that mean that the X server is "special", kind of the way that
latency-sensitive A/V applications are "special", and belongs in a
separate scheduler class?  No.  Nowadays, workloads where the kernel
has any idea what tasks belong to what "users" are the exception, not
the norm.  The X server is the canary in the coal mine, and a
scheduler that won't do the right thing for X without hand tweaking
won't do the right thing for other eyeball-driven,
multiple-tiers-on-one-box scenarios either.

If you want fairness among users to the extent that their demands
_compete_, you might as well partition the whole machine, and have a
separate fairness-oriented scheduler (let's call it a "hypervisor")
that lives outside the kernel.  (Talk about two students running gcc
on the same shell server, with more important people also doing things
on the same system, is so 1990's!)  Not that the design of scheduler
heuristics shouldn't include "fairness"-like considerations; but
they're probably only interesting as a fallback for when the scheduler
has no idea what it ought to schedule next.

So why is Ingo's s

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, Matt Mackall wrote:

> On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
> > And "fairness by euid" is probably a hell of a lot easier to do than 
> > trying to figure out the wakeup matrix.
> 
> For the record, you actually don't need to track a whole NxN matrix
> (or do the implied O(n**3) matrix inversion!) to get to the same
> result. You can converge on the same node weightings (ie dynamic
> priorities) by applying a damped function at each transition point
> (directed wakeup, preemption, fork, exit).
> 
> The trouble with any scheme like this is that it needs careful tuning
> of the damping factor to converge rapidly and not oscillate and
> precise numerical attention to the transition functions so that the sum of
> dynamic priorities is conserved.

Doing that inside the boundaries of the time constraints imposed by a 
scheduler is the interesting part, given also that the size (and members) 
of the matrix is dynamic.
Also, a "wakeup matrix" (if the name correctly pictures what it is for) 
would help with latencies and priority inheritance, but not with 
global fairness.
The maniacal fairness focus we're seeing now is due to the fact that 
mainline can have extremely unfair behaviour under certain conditions. 
IMO fairness, although important, should not be the main objective of the 
scheduler rewrite. Simplification and predictability should have higher 
priority, with interactivity achievements bound to decent fairness 
constraints.




- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Diego Calleja
On Wed, 18 Apr 2007 10:22:59 -0700 (PDT), Linus Torvalds <[EMAIL PROTECTED]> 
wrote:

> So if you have 2 users on a machine running CPU hogs, you should *first* 
> try to be fair among users. If one user then runs 5 programs, and the 
> other one runs just 1, then the *one* program should get 50% of the CPU 
> time (the users fair share), and the five programs should get 10% of CPU 
> time each. And if one of them uses two threads, each thread should get 5%.


"Fairness between users" was implemented long time ago by rik van riel
(http://surriel.com/patches/2.4/2.4.19-fairsched). Some people have been
asking for functionality like that for a long time, e.g. universities that want
to keep the gcc processes of one student who is trying to learn how fork()
works from starving the processes of the rest of the students.

But they not only want "fairness between users", they also want "priorities
between users and/or groups of users", e.g. "the 'students' group shouldn't
starve the 'admins' group".
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ck] Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Mark Glines
On Wed, 18 Apr 2007 10:22:59 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> So if you have 2 users on a machine running CPU hogs, you should
> *first* try to be fair among users. If one user then runs 5 programs,
> and the other one runs just 1, then the *one* program should get 50%
> of the CPU time (the users fair share), and the five programs should
> get 10% of CPU time each. And if one of them uses two threads, each
> thread should get 5%.

This sounds great, to me.

One minor question: is it even possible to be completely fair on SMP?
For instance, if you have a 2-way SMP box running 3 applications, one of
which has 2 threads, will the threaded app have an advantage here?  (The
current system seems to try to keep each thread on a specific CPU, to
reduce cache thrashing, which means threads and processes alike each
get 50% of the CPU.)

Mark
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* William Lee Irwin III <[EMAIL PROTECTED]> wrote:

> It does largely achieve the sort of fairness it set out for itself as 
> its design goal. One should also note that the queueing mechanism is 
> more than flexible enough to handle prioritization by a number of 
> different methods, and the large precision of its priorities is useful 
> there. So a rather broad variety of policies can be implemented by 
> changing the ->fair_key calculations.

yeah. Note that i concentrated on the bit that makes the largest 
interactivity improvement: to implement "precise scheduling" (a.k.a. 
complete fairness) between the 100+ user tasks that do a complex 
scheduling dance on a typical desktop on various workloads.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread William Lee Irwin III
On Wed, Apr 18, 2007 at 10:22:59AM -0700, Linus Torvalds wrote:
> So I claim that anything that cannot be fair by user ID is actually really 
> REALLY unfair. I think it's absolutely humongously STUPID to call 
> something the "Completely Fair Scheduler", and then just be fair on a 
> thread level. That's not fair AT ALL! It's the anti-thesis of being fair!
> So if you have 2 users on a machine running CPU hogs, you should *first* 
> try to be fair among users. If one user then runs 5 programs, and the 
> other one runs just 1, then the *one* program should get 50% of the CPU 
> time (the users fair share), and the five programs should get 10% of CPU 
> time each. And if one of them uses two threads, each thread should get 5%.
> So you should see one thread get 50% CPU (single thread of one user), 4 
> threads get 10% CPU (their fair share of that users time), and 2 threads 
> get 5% CPU (the fair share within that thread group!).
> Any scheduling argument that just considers the above to be "7 threads 
> total" and gives each thread 14% of CPU time "fairly" is *anything* but 
> fair. It's a joke if that kind of scheduler then calls itself CFS!

I don't think it's completely fair [sic] to come down on it that hard.
It does largely achieve the sort of fairness it set out for itself as
its design goal. One should also note that the queueing mechanism is
more than flexible enough to handle prioritization by a number of
different methods, and the large precision of its priorities is useful
there. So a rather broad variety of policies can be implemented by
changing the ->fair_key calculations.

In some respects, the vast priority space and very high clock precision
are two of its most crucial advantages.
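
For instance (purely as an illustration, not code from the patch), the key 
calculation that shows up in Ingo's yield fix elsewhere in the thread, 
p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset, could be 
given a per-uid flavour by scaling the credit by how many runnable tasks 
the uid owns:

/*
 * Hypothetical ->fair_key variant; uid_nr_running() is an invented
 * helper.  Dividing the wait_runtime credit by the number of runnable
 * tasks the owning uid has keeps the uid's aggregate share roughly
 * constant no matter how many tasks it spawns.
 */
static long long fair_key_per_uid(struct rq *rq, struct task_struct *p)
{
	unsigned int nr = uid_nr_running(p);

	if (!nr)
		nr = 1;
	return rq->fair_clock - p->wait_runtime / nr + p->nice_offset;
}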


On Wed, Apr 18, 2007 at 10:22:59AM -0700, Linus Torvalds wrote:
> And yes, that's largely what the current scheduler will do, but at least 
> the current scheduler doesn't claim to be fair! So the current scheduler 
> is a lot *better* if only in the sense that it doesn't make ridiculous 
> claims that aren't true!

The name chosen was somewhat buzzwordy. I'd have named it something more
descriptive of the algorithm, though what's implemented in the current
dynamic priority (i.e. ->fair_key) calculations is somewhat difficult
to precisely categorize.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> In that sense 'fairness' is not global (and in fact it is almost 
> _never_ a global property, as X runs under root uid [*]), it is only 
> the most lowlevel scheduling machinery that can then be built upon. 
> [...]

perhaps a more fitting term would be 'precise group-scheduling'. Within 
the lowest level task group entity (be that thread group or uid group, 
etc.) 'precise scheduling' is equivalent to 'fairness'.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> The fact is:
> 
>  - "fairness" is *not* about giving everybody the same amount of CPU 
>time (scaled by some niceness level or not). Anybody who thinks 
>that is "fair" is just being silly and hasn't thought it through.

yeah, very much so.

But note that most of the reported CFS interactivity wins, as surprising 
as it might be, were due to fairness between _the same user's tasks_. In 
the typical case, 99% of the desktop CPU time is executed either as X 
(root user) or under the uid of the logged in user, and X is just one 
task. Even with a bad hack of making X super-high-prio, interactivity as 
experienced by users still sucks without having fairness between the 
other 100-200 user tasks that a desktop system is typically using.

'renicing X to -10' is a broken way of achieving: 'root uid should get 
its share of CPU time too, no matter how many user tasks are running'. 
We can do this much cleaner by saying: 'each uid, if it has any tasks 
running, should get its fair share of CPU time, independently of the 
number of tasks it is running'.

In that sense 'fairness' is not global (and in fact it is almost _never_ 
a global property, as X runs under root uid [*]), it is only the most 
lowlevel scheduling machinery that can then be built upon. Higher-level 
controls to allocate CPU power between groups of tasks very much make 
sense - but according to the CFS interactivity test results i got from 
people so far, they very much need this basic fairness machinery 
_within_ the uid group too. So 'fairness' is still a powerful lower 
level scheduling concept. And this all makes lots of sense to me.

One purpose of doing the hierarchical scheduling classes stuff was to 
enable such higher scope task group decisions too. Next i'll try to 
figure out whether 'task group bandwidth' logic should live right within 
sched_fair.c itself, or whether it should be layered separately as a 
sched_group.c. Intuitively i'd say it should live within sched_fair.c.

Ingo

[*] There are distributions where X does not run under root uid anymore.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Linus Torvalds


On Wed, 18 Apr 2007, Matt Mackall wrote:
>
> On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
> > And "fairness by euid" is probably a hell of a lot easier to do than 
> > trying to figure out the wakeup matrix.
> 
> For the record, you actually don't need to track a whole NxN matrix
> (or do the implied O(n**3) matrix inversion!) to get to the same
> result.

I'm sure you can do things differently, but the reason I think "fairness 
by euid" is actually worth looking at is that it's pretty much the 
*identical* issue that we'll have with "fairness by virtual machine" and a 
number of other "container" issues.

The fact is:

 - "fairness" is *not* about giving everybody the same amount of CPU time 
   (scaled by some niceness level or not). Anybody who thinks that is 
   "fair" is just being silly and hasn't thought it through.

 - "fairness" is multi-level. You want to be fair to threads within a 
   thread group (where "process" may be one good approximation of what a 
   "thread group" is, but not necessarily the only one).

   But you *also* want to be fair in between those "thread groups", and 
   then you want to be fair across "containers" (where "user" may be one 
   such container).

So I claim that anything that cannot be fair by user ID is actually really 
REALLY unfair. I think it's absolutely humongously STUPID to call 
something the "Completely Fair Scheduler", and then just be fair on a 
thread level. That's not fair AT ALL! It's the anti-thesis of being fair!

So if you have 2 users on a machine running CPU hogs, you should *first* 
try to be fair among users. If one user then runs 5 programs, and the 
other one runs just 1, then the *one* program should get 50% of the CPU 
time (the users fair share), and the five programs should get 10% of CPU 
time each. And if one of them uses two threads, each thread should get 5%.

So you should see one thread get 50% CPU (single thread of one user), 4 
threads get 10% CPU (their fair share of that users time), and 2 threads 
get 5% CPU (the fair share within that thread group!).

Any scheduling argument that just considers the above to be "7 threads 
total" and gives each thread 14% of CPU time "fairly" is *anything* but 
fair. It's a joke if that kind of scheduler then calls itself CFS!

And yes, that's largely what the current scheduler will do, but at least 
the current scheduler doesn't claim to be fair! So the current scheduler 
is a lot *better* if only in the sense that it doesn't make ridiculous 
claims that aren't true!

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-18 Thread Ingo Molnar

* Christian Hesse <[EMAIL PROTECTED]> wrote:

> Hi Ingo and all,
> 
> On Friday 13 April 2007, Ingo Molnar wrote:
> > as usual, any sort of feedback, bugreports, fixes and suggestions are
> > more than welcome,
> 
> I just gave CFS a try on my system. From a user's point of view it 
> looks good so far. Thanks for your work.

you are welcome!

> However I found a problem: When trying to suspend a system patched 
> with suspend2 2.2.9.11 it hangs with "doing atomic copy". Pressing the 
> ESC key results in a message that it tries to abort suspend, but then 
> still hangs.

i took a quick look at suspend2 and it makes some use of yield(). 
There's a bug in CFS's yield code, i've attached a patch that should fix 
it, does it make any difference to the hang?

Ingo

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -264,15 +264,26 @@ static void dequeue_task_fair(struct rq 
 
 /*
  * sched_yield() support is very simple via the rbtree, we just
- * dequeue and enqueue the task, which causes the task to
- * roundrobin to the end of the tree:
+ * dequeue the task and move it to the rightmost position, which
+ * causes the task to roundrobin to the end of the tree.
  */
 static void requeue_task_fair(struct rq *rq, struct task_struct *p)
 {
 	dequeue_task_fair(rq, p);
 	p->on_rq = 0;
-	enqueue_task_fair(rq, p);
+	/*
+	 * Temporarily insert at the last position of the tree:
+	 */
+	p->fair_key = LLONG_MAX;
+	__enqueue_task_fair(rq, p);
 	p->on_rq = 1;
+
+	/*
+	 * Update the key to the real value, so that when all other
+	 * tasks from before the rightmost position have executed,
+	 * this task is picked up again:
+	 */
+	p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;
 }
 
 /*
@@ -380,7 +391,10 @@ static void task_tick_fair(struct rq *rq
 	 * Dequeue and enqueue the task to update its
 	 * position within the tree:
 	 */
-	requeue_task_fair(rq, curr);
+	dequeue_task_fair(rq, curr);
+	curr->on_rq = 0;
+	enqueue_task_fair(rq, curr);
+	curr->on_rq = 1;
 
 	/*
 	 * Reschedule if another task tops the current one.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-18 Thread Christian Hesse
Hi Ingo and all,

On Friday 13 April 2007, Ingo Molnar wrote:
> as usual, any sort of feedback, bugreports, fixes and suggestions are
> more than welcome,

I just gave CFS a try on my system. From a user's point of view it looks good 
so far. Thanks for your work.

However I found a problem: When trying to suspend a system patched with 
suspend2 2.2.9.11 it hangs with "doing atomic copy". Pressing the ESC key 
results in a message that it tries to abort suspend, but then still hangs.

I cc'd the suspend2 devel list; perhaps Nigel is interested as well.
-- 
Regards,
Chris




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Matt Mackall
On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
> And "fairness by euid" is probably a hell of a lot easier to do than 
> trying to figure out the wakeup matrix.

For the record, you actually don't need to track a whole NxN matrix
(or do the implied O(n**3) matrix inversion!) to get to the same
result. You can converge on the same node weightings (ie dynamic
priorities) by applying a damped function at each transition point
(directed wakeup, preemption, fork, exit).

The trouble with any scheme like this is that it needs careful tuning
of the damping factor to converge rapidly and not oscillate and
precise numerical attention to the transition functions so that the sum of
dynamic priorities is conserved.
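
A rough sketch of what such a damped transition update could look like 
(dyn_weight and the 1/16 damping factor are invented for illustration; the 
point is only that each update moves the two weights toward each other 
while conserving their sum):

/*
 * On a directed wakeup, transfer a damped fraction of the weight
 * difference between waker and wakee.  Whatever one side gains the
 * other loses, so the sum of dynamic weights is conserved, and the
 * damping keeps the iteration from oscillating.
 */
static void wakeup_weight_update(struct task_struct *waker,
				 struct task_struct *wakee)
{
	/* damping factor of 1/16 per transition point */
	long delta = (waker->dyn_weight - wakee->dyn_weight) / 16;

	waker->dyn_weight -= delta;
	wakee->dyn_weight += delta;
}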

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Linus Torvalds


On Wed, 18 Apr 2007, Matt Mackall wrote:
> 
> Why is X special? Because it does work on behalf of other processes?
> Lots of things do this. Perhaps a scheduler should focus entirely on
> the implicit and directed wakeup matrix and optimizing that
> instead[1].

I 100% agree - the perfect scheduler would indeed take into account where 
the wakeups come from, and try to "weigh" processes that help other 
processes make progress more. That would naturally give server processes 
more CPU power, because they help others

I don't believe for a second that "fairness" means "give everybody the 
same amount of CPU". That's a totally illogical measure of fairness. All 
processes are _not_ created equal.

That said, even trying to do "fairness by effective user ID" would 
probably already do a lot. In a desktop environment, X would get as much 
CPU time as the user processes, simply because it's in a different 
protection domain (and that's really what "effective user ID" means: it's 
not about "users", it's really about "protection domains").

And "fairness by euid" is probably a hell of a lot easier to do than 
trying to figure out the wakeup matrix.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread William Lee Irwin III
On Wed, Apr 18, 2007 at 12:55:25AM -0500, Matt Mackall wrote:
> Why are processes special? Should user A be able to get more CPU time
> for his job than user B by splitting it into N parallel jobs? Should
> we be fair per process, per user, per thread group, per session, per
> controlling terminal? Some weighted combination of the preceding?[2]

On a side note, I think a combination of all of the above is a very
good idea, plus process groups (pgrp's). All the make -j loads should
come up in one pgrp of one session for one user and hence should be
automatically kept isolated in its own corner by such policies. Thread
bombs, forkbombs, and so on get handled too, which is good when on e.g.
a compileserver and someone rudely spawns too many tasks.

Thinking of the scheduler as a CPU bandwidth allocator, this means
handing out shares of CPU bandwidth to all users on the system, which
in turn hand out shares of bandwidth to all sessions, which in turn
hand out shares of bandwidth to all process groups, which in turn hand
out shares of bandwidth to all thread groups, which in turn hand out
shares of bandwidth to threads. The event handlers for the scheduler
need not deal with this apart from task creation and exit and various
sorts of process ID changes (e.g. setsid(), setpgrp(), setuid(), etc.).
They just determine what the scheduler sees as ->load_weight or some
analogue of ->static_prio, though it is possible to do this by means of
data structure organization instead of numerical prioritization. It'd
probably have to be calculated on the fly by, say, doing fixpoint
arithmetic something like
user_share(p)*session_share(p)*pgrp_share(p)*tgrp_share(p)*task_share(p)
so that readjusting the shares of aggregates doesn't have to traverse
lists and remains O(1). Each of the share computations can instead just
do some analogue of the calculation p->load_weight/rq->raw_weighted_load
in fixpoint, though precision issues with this make me queasy. There is
maybe a slight nasty point in that the ->raw_weighted_load analogue for
users or whatever the highest level chosen is ends up being global. One
might as well get users in there and omit intermediate levels if any are
to be omitted so that the truly global state is as read-only as possible.

I suppose jacking up the fixpoint precision to 128-bit or 256-bit all
below the radix point (our max is 1.0 after all) until precision issues
vanish can be done but the idea of that much number crunching in the
scheduler makes me rather uncomfortable. I hope u64 or u32 [2] can be
gotten away with as far as fixpoint goes.
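
To make the shape of that concrete, a toy 32.32 fixpoint version of the 
share product (nothing here is scheduler code; fix_mul() leans on gcc's 
128-bit integers):

#include <stdint.h>

#define FIX_ONE	(1ULL << 32)	/* 1.0 in 32.32 fixpoint */

static inline uint64_t fix_mul(uint64_t a, uint64_t b)
{
	/* both operands are <= 1.0, so the 128-bit product cannot overflow */
	return (uint64_t)(((unsigned __int128)a * b) >> 32);
}

/* each *_share argument is the entity's fraction of its parent, <= FIX_ONE */
static uint64_t task_bandwidth(uint64_t user_share, uint64_t session_share,
			       uint64_t pgrp_share, uint64_t tgrp_share,
			       uint64_t task_share)
{
	uint64_t w = user_share;

	w = fix_mul(w, session_share);
	w = fix_mul(w, pgrp_share);
	w = fix_mul(w, tgrp_share);
	w = fix_mul(w, task_share);
	return w;	/* the task's fraction of the whole machine */
}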


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Peter Williams

Chris Friesen wrote:

Peter Williams wrote:

Chris Friesen wrote:


Suppose I have a really high priority task running.  Another very 
high priority task wakes up and would normally preempt the first one. 
However, there happens to be another cpu available.  It seems like it 
would be a win if we moved one of those tasks to the available cpu 
immediately so they can both run simultaneously.  This would seem to 
require some communication between the scheduler and the load balancer.



Not really, the load balancer can do this on its own AND the decision 
should be based on the STATIC priority of the task being woken.


I guess I don't follow.  How would the load balancer know that it needs 
to run?  Running on every task wake-up seems expensive.  Also, static 
priority isn't everything.  What about the gang-scheduler concept where 
certain tasks must be scheduled simultaneously on different cpus?  What 
about a resource-group scenario where you have per-cpu resource limits, 
so that for good latency/fairness you need to force a high priority task 
to migrate to another cpu once it has consumed the cpu allocation of 
that group on the current cpu?


I can see having a generic load balancer core code, but it seems to me 
that the scheduler proper needs to have some way of triggering the load 
balancer to run,


It doesn't have to be closely coupled with the load balancer to do 
this.  It just needs to know where the trigger is.


and some kind of goodness functions to indicate a) 
which tasks to move, and b) where to move them.


That's the load balancer's job and even if you use dynamic priority for 
load balancing it still wouldn't need to be closely coupled.  The load 
balancer would just need to know how to find a process's dynamic priority.


In fact, in the current set up, the load balancer decides how much load 
needs to be moved based on the static load on the CPUs but uses dynamic 
priority (to a large degree) to decide which ones to move.  This is due 
more to computational efficiency considerations than any deliberate 
design (I suspect), as the fact that tasks are stored on the runqueue in 
dynamic priority order makes looking at processes in dynamic priority 
order the most efficient strategy.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Con Kolivas
On Wednesday 18 April 2007 22:13, Nick Piggin wrote:
> On Wed, Apr 18, 2007 at 11:53:34AM +0200, Ingo Molnar wrote:
> > * Nick Piggin <[EMAIL PROTECTED]> wrote:
> > > So looking at elapsed time, a granularity of 100ms is just behind the
> > > mainline score. However it is using slightly less user time and
> > > slightly more idle time, which indicates that balancing might have got
> > > a bit less aggressive.
> > >
> > > But anyway, it conclusively shows the efficiency impact of such tiny
> > > timeslices.
> >
> > yeah, the 4% drop in a CPU-cache-sensitive workload like kernbench is
> > not unexpected when going to really frequent preemption. Clearly, the
> > default preemption granularity needs to be tuned up.
> >
> > I think you said you measured ~3msec average preemption rate per CPU?
>
> This was just looking at ctxsw numbers from running 2 cpu hogs on the
> same runqueue.
>
> > That would suggest the average cache-trashing cost was 120 usecs per
> > every 3 msec window. Taking that as a ballpark figure, to get the
> > difference back into the noise range we'd have to either use ~5 msec:
> >
> > echo 500 > /proc/sys/kernel/sched_granularity
> >
> > or 15 msec:
> >
> > echo 1500 > /proc/sys/kernel/sched_granularity
> >
> > (depending on whether it's 5x 3msec or 5x 1msec - i'm still not sure i
> > correctly understood your 3msec value. I'd have to know your kernbench
> > workload's approximate 'steady state' context-switch rate to do a more
> > accurate calculation.)
>
> The kernel compile (make -j8 on 4 thread system) is doing 1800 total
> context switches per second (450/s per runqueue) for cfs, and 670
> for mainline. Going up to 20ms granularity for cfs brings the context
> switch numbers similar, but user time is still a % or so higher. I'd
> be more worried about compute heavy threads which naturally don't do
> much context switching.

While kernel compiles are nice and easy to do, I've seen enough criticism of 
them in the past to wonder about their usefulness as a standard benchmark on 
their own.

>
> Some other numbers on the same system
> Hackbench:         2.6.21-rc7  cfs-v2 1ms[*]  nicksched
> 10 groups:  Time:  1.332       0.743          0.607
> 20 groups:  Time:  1.197       1.100          1.241
> 30 groups:  Time:  1.754       2.376          1.834
> 40 groups:  Time:  3.451       2.227          2.503
> 50 groups:  Time:  3.726       3.399          3.220
> 60 groups:  Time:  3.548       4.567          3.668
> 70 groups:  Time:  4.206       4.905          4.314
> 80 groups:  Time:  4.551       6.324          4.879
> 90 groups:  Time:  7.904       6.962          5.335
> 100 groups: Time:  7.293       7.799          5.857
> 110 groups: Time:  10.595      8.728          6.517
> 120 groups: Time:  7.543       9.304          7.082
> 130 groups: Time:  8.269       10.639         8.007
> 140 groups: Time:  11.867      8.250          8.302
> 150 groups: Time:  14.852      8.656          8.662
> 160 groups: Time:  9.648       9.313          9.541

Hackbench even more so. In a prolonged discussion with Rusty Russell on this 
issue, he suggested hackbench was more a pass/fail benchmark to ensure there 
was no starvation scenario that never ended, and that very little value should 
be placed on the actual results returned from it.

Wli's concerns regarding some sort of standard framework for a battery of 
accepted, meaningful benchmarks come to mind as important, rather than 
benchmarks that highlight one scheduler over the other. So while interesting 
for their own endpoints, I certainly wouldn't put either benchmark up as some 
sort of yardstick for a "winner". Note I'm not saying that we shouldn't be 
looking at them per se, but since the whole drive for a new scheduler is 
trying to be more objective, we need to start expanding the range of 
benchmarks. Even though I don't feel the need to have SD in the "race", I 
guess it stands as more data to compare what is possible and where.

> Mainline seems pretty inconsistent here.
>
> lmbench 0K ctxsw latency bound to CPU0:
> tasks
> 2     2.59    3.42    2.50
> 4     3.26    3.54    3.09
> 8     3.01    3.64    3.22
> 16    3.00    3.66    3.50
> 32    2.99    3.70    3.49
> 64    3.09    4.17    3.50
> 128   4.80    5.58    4.74
> 256   5.79    6.37    5.76
>
> cfs is noticeably disadvantaged.
>
> [*] 500ms didn't make much difference in either test.

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Con Kolivas
On Wednesday 18 April 2007 22:14, Nick Piggin wrote:
> On Wed, Apr 18, 2007 at 07:33:56PM +1000, Con Kolivas wrote:
> > On Wednesday 18 April 2007 18:55, Nick Piggin wrote:
> > > Again, for comparison 2.6.21-rc7 mainline:
> > >
> > > 508.87user 32.47system 2:17.82elapsed 392%CPU
> > > 509.05user 32.25system 2:17.84elapsed 392%CPU
> > > 508.75user 32.26system 2:17.83elapsed 392%CPU
> > > 508.63user 32.17system 2:17.88elapsed 392%CPU
> > > 509.01user 32.26system 2:17.90elapsed 392%CPU
> > > 509.08user 32.20system 2:17.95elapsed 392%CPU
> > >
> > > So looking at elapsed time, a granularity of 100ms is just behind the
> > > mainline score. However it is using slightly less user time and
> > > slightly more idle time, which indicates that balancing might have got
> > > a bit less aggressive.
> > >
> > > But anyway, it conclusively shows the efficiency impact of such tiny
> > > timeslices.
> >
> > See test.kernel.org for how (the now defunct) SD was performing on
> > kernbench. It had low latency _and_ equivalent throughput to mainline.
> > Set the standard appropriately on both counts please.
>
> I can give it a run. Got an updated patch against -rc7?

I said I wasn't pursuing it, but since you're offering, the rc6 patch should 
apply ok.

http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc6-sd-0.40.patch

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Nick Piggin
On Wed, Apr 18, 2007 at 07:33:56PM +1000, Con Kolivas wrote:
> On Wednesday 18 April 2007 18:55, Nick Piggin wrote:
> > Again, for comparison 2.6.21-rc7 mainline:
> >
> > 508.87user 32.47system 2:17.82elapsed 392%CPU
> > 509.05user 32.25system 2:17.84elapsed 392%CPU
> > 508.75user 32.26system 2:17.83elapsed 392%CPU
> > 508.63user 32.17system 2:17.88elapsed 392%CPU
> > 509.01user 32.26system 2:17.90elapsed 392%CPU
> > 509.08user 32.20system 2:17.95elapsed 392%CPU
> >
> > So looking at elapsed time, a granularity of 100ms is just behind the
> > mainline score. However it is using slightly less user time and
> > slightly more idle time, which indicates that balancing might have got
> > a bit less aggressive.
> >
> > But anyway, it conclusively shows the efficiency impact of such tiny
> > timeslices.
> 
> See test.kernel.org for how (the now defunct) SD was performing on kernbench. 
> It had low latency _and_ equivalent throughput to mainline. Set the standard 
> appropriately on both counts please.

I can give it a run. Got an updated patch against -rc7?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Nick Piggin
On Wed, Apr 18, 2007 at 11:53:34AM +0200, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > So looking at elapsed time, a granularity of 100ms is just behind the 
> > mainline score. However it is using slightly less user time and 
> > slightly more idle time, which indicates that balancing might have got 
> > a bit less aggressive.
> > 
> > But anyway, it conclusively shows the efficiency impact of such tiny 
> > timeslices.
> 
> yeah, the 4% drop in a CPU-cache-sensitive workload like kernbench is 
> not unexpected when going to really frequent preemption. Clearly, the 
> default preemption granularity needs to be tuned up.
> 
> I think you said you measured ~3msec average preemption rate per CPU? 

This was just looking at ctxsw numbers from running 2 cpu hogs on the
same runqueue.

> That would suggest the average cache-trashing cost was 120 usecs per 
> every 3 msec window. Taking that as a ballpark figure, to get the 
> difference back into the noise range we'd have to either use ~5 msec:
> 
> echo 500 > /proc/sys/kernel/sched_granularity
> 
> or 15 msec:
> 
> echo 1500 > /proc/sys/kernel/sched_granularity
> 
> (depending on whether it's 5x 3msec or 5x 1msec - i'm still not sure i 
> correctly understood your 3msec value. I'd have to know your kernbench 
> workload's approximate 'steady state' context-switch rate to do a more 
> accurate calculation.)

The kernel compile (make -j8 on 4 thread system) is doing 1800 total
context switches per second (450/s per runqueue) for cfs, and 670
for mainline. Going up to 20ms granularity for cfs brings the context
switch numbers similar, but user time is still a % or so higher. I'd
be more worried about compute heavy threads which naturally don't do
much context switching.

Some other numbers on the same system
Hackbench:         2.6.21-rc7  cfs-v2 1ms[*]  nicksched
10 groups:  Time:  1.332       0.743          0.607
20 groups:  Time:  1.197       1.100          1.241
30 groups:  Time:  1.754       2.376          1.834
40 groups:  Time:  3.451       2.227          2.503
50 groups:  Time:  3.726       3.399          3.220
60 groups:  Time:  3.548       4.567          3.668
70 groups:  Time:  4.206       4.905          4.314
80 groups:  Time:  4.551       6.324          4.879
90 groups:  Time:  7.904       6.962          5.335
100 groups: Time:  7.293       7.799          5.857
110 groups: Time:  10.595      8.728          6.517
120 groups: Time:  7.543       9.304          7.082
130 groups: Time:  8.269       10.639         8.007
140 groups: Time:  11.867      8.250          8.302
150 groups: Time:  14.852      8.656          8.662
160 groups: Time:  9.648       9.313          9.541

Mainline seems pretty inconsistent here.

lmbench 0K ctxsw latency bound to CPU0:
tasks   2.6.21-rc7  cfs-v2 1ms[*]  nicksched
2       2.59        3.42           2.50
4       3.26        3.54           3.09
8       3.01        3.64           3.22
16      3.00        3.66           3.50
32      2.99        3.70           3.49
64      3.09        4.17           3.50
128     4.80        5.58           4.74
256     5.79        6.37           5.76

cfs is noticeably disadvantaged.

[*] 500ms didn't make much difference in either test.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Andy Whitcroft <[EMAIL PROTECTED]> wrote:

> > as usual, any sort of feedback, bugreports, fixes and suggestions 
> > are more than welcome,
> 
> Pushed this through the test.kernel.org and nothing new blew up. 
> Notably the kernbench figures are within expectations even on the 
> bigger numa systems, commonly badly affected by balancing problems in 
> the scheduler.

thanks! Given the really low preemption latency/granularity default 
(roughly equivalent to 'timeslice length'), and that basically all of my 
focus was on interactivity characteristics, this is a pretty good 
result. I suspect it will be necessary to increase the default to 10 
msecs (or more) to be on the safe side. (Nick has reported a 4% 
kernbench drop so for his kernbench workload it's needed.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> > > 535.43user 30.62system 2:23.72elapsed 393%CPU
> > 
> > Thanks for testing this! Could you please try this also with:
> > 
> >echo 1 > /proc/sys/kernel/sched_granularity
> 
> 507.68user 31.87system 2:18.05elapsed 390%CPU
> 507.99user 31.93system 2:18.09elapsed 390%CPU

> > could you maybe even try a more extreme setting of:
> > 
> >echo 5 > /proc/sys/kernel/sched_granularity

> 506.69user 31.96system 2:17.82elapsed 390%CPU
> 505.70user 31.84system 2:17.90elapsed 389%CPU

> Again, for comparison 2.6.21-rc7 mainline:
> 
> 508.87user 32.47system 2:17.82elapsed 392%CPU
> 509.05user 32.25system 2:17.84elapsed 392%CPU

thanks for testing this!

> So looking at elapsed time, a granularity of 100ms is just behind the 
> mainline score. However it is using slightly less user time and 
> slightly more idle time, which indicates that balancing might have got 
> a bit less aggressive.
> 
> But anyway, it conclusively shows the efficiency impact of such tiny 
> timeslices.

yeah, the 4% drop in a CPU-cache-sensitive workload like kernbench is 
not unexpected when going to really frequent preemption. Clearly, the 
default preemption granularity needs to be tuned up.

I think you said you measured ~3msec average preemption rate per CPU? 
That would suggest the average cache-trashing cost was 120 usecs per 
every 3 msec window. Taking that as a ballpark figure, to get the 
difference back into the noise range we'd have to either use ~5 msec:

echo 500 > /proc/sys/kernel/sched_granularity

or 15 msec:

echo 1500 > /proc/sys/kernel/sched_granularity

(depending on whether it's 5x 3msec or 5x 1msec - i'm still not sure i 
correctly understood your 3msec value. I'd have to know your kernbench 
workload's approximate 'steady state' context-switch rate to do a more 
accurate calculation.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Con Kolivas
On Wednesday 18 April 2007 18:55, Nick Piggin wrote:
> On Tue, Apr 17, 2007 at 11:59:00AM +0200, Ingo Molnar wrote:
> > * Nick Piggin <[EMAIL PROTECTED]> wrote:
> > > 2.6.21-rc7-cfs-v2
> > > 534.80user 30.92system 2:23.64elapsed 393%CPU
> > > 534.75user 31.01system 2:23.70elapsed 393%CPU
> > > 534.66user 31.07system 2:23.76elapsed 393%CPU
> > > 534.56user 30.91system 2:23.76elapsed 393%CPU
> > > 534.66user 31.07system 2:23.67elapsed 393%CPU
> > > 535.43user 30.62system 2:23.72elapsed 393%CPU
> >
> > Thanks for testing this! Could you please try this also with:
> >
> >echo 1 > /proc/sys/kernel/sched_granularity
>
> 507.68user 31.87system 2:18.05elapsed 390%CPU
> 507.99user 31.93system 2:18.09elapsed 390%CPU
> 507.46user 31.78system 2:18.03elapsed 390%CPU
> 507.68user 31.93system 2:18.11elapsed 390%CPU
> 507.63user 31.98system 2:18.01elapsed 390%CPU
> 507.83user 31.94system 2:18.28elapsed 390%CPU
>
> > could you maybe even try a more extreme setting of:
> >
> >echo 5 > /proc/sys/kernel/sched_granularity
>
> 504.87user 32.13system 2:18.03elapsed 389%CPU
> 505.94user 32.29system 2:17.87elapsed 390%CPU
> 506.10user 31.90system 2:17.96elapsed 389%CPU
> 505.02user 32.02system 2:17.96elapsed 389%CPU
> 506.69user 31.96system 2:17.82elapsed 390%CPU
> 505.70user 31.84system 2:17.90elapsed 389%CPU
>
>
> Again, for comparison 2.6.21-rc7 mainline:
>
> 508.87user 32.47system 2:17.82elapsed 392%CPU
> 509.05user 32.25system 2:17.84elapsed 392%CPU
> 508.75user 32.26system 2:17.83elapsed 392%CPU
> 508.63user 32.17system 2:17.88elapsed 392%CPU
> 509.01user 32.26system 2:17.90elapsed 392%CPU
> 509.08user 32.20system 2:17.95elapsed 392%CPU
>
> So looking at elapsed time, a granularity of 100ms is just behind the
> mainline score. However it is using slightly less user time and
> slightly more idle time, which indicates that balancing might have got
> a bit less aggressive.
>
> But anyway, it conclusively shows the efficiency impact of such tiny
> timeslices.

See test.kernel.org for how (the now defunct) SD was performing on kernbench. 
It had low latency _and_ equivalent throughput to mainline. Set the standard 
appropriately on both counts please.

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Nick Piggin
On Tue, Apr 17, 2007 at 11:59:00AM +0200, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > 2.6.21-rc7-cfs-v2
> > 534.80user 30.92system 2:23.64elapsed 393%CPU
> > 534.75user 31.01system 2:23.70elapsed 393%CPU
> > 534.66user 31.07system 2:23.76elapsed 393%CPU
> > 534.56user 30.91system 2:23.76elapsed 393%CPU
> > 534.66user 31.07system 2:23.67elapsed 393%CPU
> > 535.43user 30.62system 2:23.72elapsed 393%CPU
> 
> Thanks for testing this! Could you please try this also with:
> 
>echo 1 > /proc/sys/kernel/sched_granularity

507.68user 31.87system 2:18.05elapsed 390%CPU
507.99user 31.93system 2:18.09elapsed 390%CPU
507.46user 31.78system 2:18.03elapsed 390%CPU
507.68user 31.93system 2:18.11elapsed 390%CPU
507.63user 31.98system 2:18.01elapsed 390%CPU
507.83user 31.94system 2:18.28elapsed 390%CPU

> could you maybe even try a more extreme setting of:
> 
>echo 5 > /proc/sys/kernel/sched_granularity

504.87user 32.13system 2:18.03elapsed 389%CPU
505.94user 32.29system 2:17.87elapsed 390%CPU
506.10user 31.90system 2:17.96elapsed 389%CPU
505.02user 32.02system 2:17.96elapsed 389%CPU
506.69user 31.96system 2:17.82elapsed 390%CPU
505.70user 31.84system 2:17.90elapsed 389%CPU


Again, for comparison 2.6.21-rc7 mainline:

508.87user 32.47system 2:17.82elapsed 392%CPU
509.05user 32.25system 2:17.84elapsed 392%CPU
508.75user 32.26system 2:17.83elapsed 392%CPU
508.63user 32.17system 2:17.88elapsed 392%CPU
509.01user 32.26system 2:17.90elapsed 392%CPU
509.08user 32.20system 2:17.95elapsed 392%CPU

So looking at elapsed time, a granularity of 100ms is just behind the
mainline score. However it is using slightly less user time and
slightly more idle time, which indicates that balancing might have got
a bit less aggressive.

But anyway, it conclusively shows the efficiency impact of such tiny
timeslices.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread James Bruce

Matt Mackall wrote:

On Tue, Apr 17, 2007 at 03:59:02PM -0700, William Lee Irwin III wrote:

On Tue, Apr 17, 2007 at 03:32:56PM -0700, William Lee Irwin III wrote:
I'm working with the following suggestion:

On Tue, Apr 17, 2007 at 09:07:49AM -0400, James Bruce wrote:

Nonlinear is a must IMO.  I would suggest X = exp(ln(10)/10) ~= 1.2589
That value has the property that a nice=10 task gets 1/10th the cpu of a
nice=0 task, and a nice=20 task gets 1/100 of nice=0.  I think that
would be fairly easy to explain to admins and users so that they can
know what to expect from nicing tasks.

I'm not likely to write the testcase until this upcoming weekend, though.


So that means there's roughly an 8000:1 ratio between nice 20 and nice -19. In
that sort of dynamic range, you're likely to have non-trivial
numerical accuracy issues in integer/fixed-point math.


Well, you *are* specifying vastly different priorities.  The question is 
how many nice=20 tasks should it take to interfere with a nice=-19 task? 
 If you've only got a 100:1 ratio, 100 nice=20 tasks will take ~50% of 
the CPU away from a nice=-19 task.  I don't think that's ideal, as in my 
mind a -19 task shouldn't have to care how many nice=20 tasks there are 
(within reason).  IMHO, if a user is running a CPU hog at nice=-19, and 
expecting nice=20 tasks to run immediately, I don't think the scheduler 
is the problem.



(Especially if your clock is jiffies-scale, which a significant number
of machines will continue to be.)

I really think if we want to have vastly different ratios, we probably
want to be looking at BATCH and RT scheduling classes instead.


I, like all users, can live with anything, but there should be a clear 
specification of what the user should expect.  Magic changes in the 
function at nice=0, or no real clear meaning at all (mainline), are both 
things that don't help the users to figure that out.  I like the 
exponential base because shifting all tasks up or down one nice level 
does not change the relative cpu distribution (i.e. two tasks 
{nice=-5,nice=0} get the same relative cpu distribution as if they were 
{nice=0,nice=5}).  An exponential base is the only way that property can 
hold.
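
For what it's worth, a tiny user-space sketch of that weighting 
(X = exp(ln(10)/10), so ten nice levels change the weight by exactly 10x, 
and shifting both tasks by the same amount leaves the split unchanged, at 
roughly 76%/24% for a five-level difference):

#include <math.h>
#include <stdio.h>

/* weight(nice) = X^(-nice), with X chosen so 10 nice levels == 10x weight */
static double nice_weight(int nice)
{
	const double X = exp(log(10.0) / 10.0);	/* ~1.2589 */

	return pow(X, -nice);
}

static void print_split(int nice_a, int nice_b)
{
	double a = nice_weight(nice_a), b = nice_weight(nice_b);

	printf("nice %3d vs %3d: %.1f%% / %.1f%%\n",
	       nice_a, nice_b, 100.0 * a / (a + b), 100.0 * b / (a + b));
}

int main(void)
{
	print_split(-5, 0);	/* ~76.0% / ~24.0% */
	print_split(0, 5);	/* same split, shifted by five levels */
	return 0;
}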


Now, perhaps implementation issues may prevent something like the 
"1.2589" ratio rule from being realized, but I'm not sure we should 
throw it out _before_ we know that it's actually a problem.  This is the 
same sort of resistance that the timekeeping code updates faced (using 
nanoseconds everywhere instead of "natural" clock bases), but that got 
addressed eventually.


 - Jim Bruce

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Nick Piggin
On Wed, Apr 18, 2007 at 01:55:34AM -0500, Matt Mackall wrote:
> On Wed, Apr 18, 2007 at 08:37:11AM +0200, Nick Piggin wrote:
> > I don't know how that supports your argument for unfairness,
> 
> I never had such an argument. I like fairness.
> 
> My argument is that -you- don't have an argument for making fairness a
> -requirement-.

It seems easy enough that there is no point accepting unfair
behaviour like the old scheduler if we're going to go to all
this trouble to replace it. The old scheduler seems to have
bounded unfairness and bounded starvation, so let the good times
roll.


> > processes are special only because that's how we've always done
> > scheduling.  I'm not precluding other groupings for fairness, though.
> 
> If you make one form of fairness a -requirement- for all acceptable
> algorithms, you -are- precluding most other forms of fairness.
> 
> If you refuse to define what "fairness" means when specifying your
> requirement, what's the point of requiring it?

I don't refuse. I'm talking about per-process CPU time fairness.
My paragraph above was pointing out that subsequent work to
add other classes of fairness are not excluded as configurable
features, but this basic type of fairness should be included.


> > What do you mean optimal? If your criteria is fairness, then of course
> > it is optimal. If your criteria is throughput, then it probably isn't.
> 
> I don't know what optimal behavior is. And neither do you. It may or
> may not be fair. It very likely includes small deviations from fair.

You misunderstand me. There is no single "optimal" when you're talking
about fairness (or most other scheduler things). So pondering whether
fairness is optimal or not doesn't really make sense.

I'm saying it should be a basic axiom, not that it is quantitatively
better. It isn't a refutable argument. I state it because it is
what users and programs expect.

You can reject that, and fine. I guess if a scheduler comes along that
does exactly the right thing for everyone, then it is better than any
fair scheduler. So OK, while we're talking theoretical, I won't dismiss
an unfair scheduler out of hand.


> > > [2] It's trivial to construct two or more perfectly reasonable and
> > > desirable definitions of fairness that are mutually incompatible.
> > 
> > Probably not if you use common sense, and in the context of a replacement
> > for the 2.6 scheduler.
> 
> Ok, trivial example. You cannot allocate equal CPU time to
> processes/tasks and simultaneously allocate equal time to thread
> groups. Is it common sense that a heavily-threaded app should be able
> to get hugely more CPU than a well-written app? No. I don't want Joe's
> stupid Java app to make my compile crawl.
> 
> On the other hand, if my heavily threaded app is, say, a voicemail
> server serving 30 customers, I probably want it to get 30x the CPU of
> my gzip job.

So that might be a nice addition, but the base functionality is threads
simply because that's what we've always done. Just common sense.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Matt Mackall
On Wed, Apr 18, 2007 at 08:37:11AM +0200, Nick Piggin wrote:
> I don't know how that supports your argument for unfairness,

I never had such an argument. I like fairness.

My argument is that -you- don't have an argument for making fairness a
-requirement-.

> processes are special only because that's how we've always done
> scheduling.  I'm not precluding other groupings for fairness, though.

If you make one form of fairness a -requirement- for all acceptable
algorithms, you -are- precluding most other forms of fairness.

If you refuse to define what "fairness" means when specifying your
requirement, what's the point of requiring it?

> What do you mean optimal? If your criterion is fairness, then of course
> it is optimal. If your criterion is throughput, then it probably isn't.

I don't know what optimal behavior is. And neither do you. It may or
may not be fair. It very likely includes small deviations from fair.

> > [2] It's trivial to construct two or more perfectly reasonable and
> > desirable definitions of fairness that are mutually incompatible.
> 
> Probably not if you use common sense, and in the context of a replacement
> for the 2.6 scheduler.

Ok, trivial example. You cannot allocate equal CPU time to
processes/tasks and simultaneously allocate equal time to thread
groups. Is it common sense that a heavily-threaded app should be able
to get hugely more CPU than a well-written app? No. I don't want Joe's
stupid Java app to make my compile crawl.

On the other hand, if my heavily threaded app is, say, a voicemail
server serving 30 customers, I probably want it to get 30x the CPU of
my gzip job.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Nick Piggin
On Wed, Apr 18, 2007 at 12:55:25AM -0500, Matt Mackall wrote:
> On Wed, Apr 18, 2007 at 07:00:24AM +0200, Nick Piggin wrote:
> > > It's also not yet clear that a scheduler can't be taught to do the
> > > right thing with X without fiddling with nice levels.
> > 
> > Being fair doesn't prevent that. Implicit unfairness is wrong though,
> > because it will bite people.
> > 
> > What's wrong with allowing X to get more than its fair share of CPU
> > time by "fiddling with nice levels"? That's what they're there for.
> 
> Why is X special? Because it does work on behalf of other processes?

The high level reason is that giving it more than its fair share of
CPU allows a desktop to remain interactive under load. And it isn't
just about doing work on behalf of other processes. Mouse interrupts
are a big part of it, for example.

> Lots of things do this. Perhaps a scheduler should focus entirely on
> the implicit and directed wakeup matrix and optimizing that
> instead[1].

You could do that, and I tried a variant of it at one point. The
problem was that it leads to unexpected bad things too.

UNIX programs more or less expect fair SCHED_OTHER scheduling, and
given the principle of least surprise...


> Why are processes special? Should user A be able to get more CPU time
> for his job than user B by splitting it into N parallel jobs? Should
> we be fair per process, per user, per thread group, per session, per
> controlling terminal? Some weighted combination of the preceding?[2]

I don't know how that supports your argument for unfairness, but
processes are special only because that's how we've always done
scheduling.  I'm not precluding other groupings for fairness, though.


> Why is the measure CPU time? I can imagine a scheduler that weighed
> memory bandwidth in the equation. Or power consumption. Or execution
> unit usage.

Feel free. And I'd also argue that once you schedule for those metrics
then fairness is also important there too.


> Fairness is nice. It's simple, it's obvious, it's predictable. But
> it's just not clear that it's optimal. If the question is (and it
> was!) "what should the basic requirements for the scheduler be?" it's
> not clear that fairness is a requirement or even how to pick a metric
> for fairness that's obviously and uniquely superior.

What do you mean optimal? If your criterion is fairness, then of course
it is optimal. If your criterion is throughput, then it probably isn't.

Considering it is simple and what we've always done, measuring fairness
by CPU time per process is obvious for a general purpose scheduler.
If you accept that, then I argue that fairness is an optimal property
given that the alternative is unfairness.


> It's instead much easier to try to recognize and rule out really bad
> behaviour with bounded latencies, minimum service guarantees, etc.

It's the bad behaviour that you didn't recognize that is the problem.
If you start with explicit fairness, then unfairness will never be
one of those problems.


> [1] That's basically how Google decides to prioritize webpages, which
> it seems to do moderately well. And how a number of other optimization
> problems are solved.

This is not an optimization problem, it is a heuristic. There is no
right and wrong answer.


> [2] It's trivial to construct two or more perfectly reasonable and
> desirable definitions of fairness that are mutually incompatible.

Probably not if you use common sense, and in the context of a replacement
for the 2.6 scheduler.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Matt Mackall
On Wed, Apr 18, 2007 at 07:00:24AM +0200, Nick Piggin wrote:
> > It's also not yet clear that a scheduler can't be taught to do the
> > right thing with X without fiddling with nice levels.
> 
> Being fair doesn't prevent that. Implicit unfairness is wrong though,
> because it will bite people.
> 
> What's wrong with allowing X to get more than its fair share of CPU
> time by "fiddling with nice levels"? That's what they're there for.

Why is X special? Because it does work on behalf of other processes?
Lots of things do this. Perhaps a scheduler should focus entirely on
the implicit and directed wakeup matrix and optimizing that
instead[1].

Why are processes special? Should user A be able to get more CPU time
for his job than user B by splitting it into N parallel jobs? Should
we be fair per process, per user, per thread group, per session, per
controlling terminal? Some weighted combination of the preceding?[2]

Why is the measure CPU time? I can imagine a scheduler that weighed
memory bandwidth in the equation. Or power consumption. Or execution
unit usage.

Fairness is nice. It's simple, it's obvious, it's predictable. But
it's just not clear that it's optimal. If the question is (and it
was!) "what should the basic requirements for the scheduler be?" it's
not clear that fairness is a requirement or even how to pick a metric
for fairness that's obviously and uniquely superior.

It's instead much easier to try to recognize and rule out really bad
behaviour with bounded latencies, minimum service guarantees, etc.

[1] That's basically how Google decides to prioritize webpages, which
it seems to do moderately well. And how a number of other optimization
problems are solved.

[2] It's trivial to construct two or more perfectly reasonable and
desirable definitions of fairness that are mutually incompatible.
-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Chris Friesen

Peter Williams wrote:

Chris Friesen wrote:


Suppose I have a really high priority task running.  Another very high 
priority task wakes up and would normally preempt the first one. 
However, there happens to be another cpu available.  It seems like it 
would be a win if we moved one of those tasks to the available cpu 
immediately so they can both run simultaneously.  This would seem to 
require some communication between the scheduler and the load balancer.



Not really: the load balancer can do this on its own AND the decision 
should be based on the STATIC priority of the task being woken.


I guess I don't follow.  How would the load balancer know that it needs 
to run?  Running on every task wake-up seems expensive.  Also, static 
priority isn't everything.  What about the gang-scheduler concept where 
certain tasks must be scheduled simultaneously on different cpus?  What 
about a resource-group scenario where you have per-cpu resource limits, 
so that for good latency/fairness you need to force a high priority task 
to migrate to another cpu once it has consumed the cpu allocation of 
that group on the current cpu?


I can see having a generic load balancer core code, but it seems to me 
that the scheduler proper needs to have some way of triggering the load 
balancer to run, and some kind of goodness functions to indicate a) 
which tasks to move, and b) where to move them.
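
A purely hypothetical sketch of the kind of hooks being described; none
of these names exist in any tree, they are only meant to make the shape
of such an interface concrete:

struct task_struct;     /* forward declaration, for the sketch only */

struct lb_goodness_ops {
        /* Let the scheduler proper ask the balancer to run, e.g. on
         * selected wake-ups rather than on every single one. */
        void (*trigger)(int this_cpu);

        /* a) how desirable is it to move @p off @src_cpu? */
        int (*task_move_goodness)(struct task_struct *p, int src_cpu);

        /* b) how desirable is @dst_cpu as a destination for @p? */
        int (*cpu_target_goodness)(struct task_struct *p, int dst_cpu);
};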


Chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Nick Piggin
On Tue, Apr 17, 2007 at 11:38:31PM -0500, Matt Mackall wrote:
> On Wed, Apr 18, 2007 at 05:15:11AM +0200, Nick Piggin wrote:
> > 
> > I don't know why this would be a useful feature (of course I'm talking
> > about processes at the same nice level). One of the big problems with
> > the current scheduler is that it is unfair in some corner cases. It
> > works OK for most people, but when it breaks down it really hurts. At
> > least if you start with a fair scheduler, you can alter priorities
> > until it satisfies your need... with an unfair one your guess is as
> > good as mine.
> > 
> > So on what basis would you allow unfairness? On the basis that it doesn't
> > seem to harm anyone? It doesn't seem to harm testers?
> 
> On the basis that there's only anecdotal evidence thus far that
> fairness is the right approach.
> 
> It's not yet clear that a fair scheduler can do the right thing with X,
> with various kernel threads, etc. without fiddling with nice levels.
> Which makes it no longer "completely fair".

Of course I mean SCHED_OTHER tasks at the same nice level. Otherwise
I would be arguing to make nice basically a noop.


> It's also not yet clear that a scheduler can't be taught to do the
> right thing with X without fiddling with nice levels.

Being fair doesn't prevent that. Implicit unfairness is wrong though,
because it will bite people.

What's wrong with allowing X to get more than its fair share of CPU
time by "fiddling with nice levels"? That's what they're there for.


> So I'm just not yet willing to completely rule out systems that aren't
> "completely fair".
> 
> But I think we should rule out schedulers that don't have rigid bounds on
> that unfairness. That's where the really ugly behavior lies.

Been a while since I really looked at the mainline scheduler, but I
don't think it can permanently starve something, so I don't know what
your bounded unfairness would help with.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Matt Mackall
On Wed, Apr 18, 2007 at 05:15:11AM +0200, Nick Piggin wrote:
> On Tue, Apr 17, 2007 at 04:39:54PM -0500, Matt Mackall wrote:
> > On Tue, Apr 17, 2007 at 09:01:55AM +0200, Nick Piggin wrote:
> > > On Mon, Apr 16, 2007 at 11:26:21PM -0700, William Lee Irwin III wrote:
> > > > On Mon, Apr 16, 2007 at 11:09:55PM -0700, William Lee Irwin III wrote:
> > > > >> All things are not equal; they all have different properties. I like
> > > > 
> > > > On Tue, Apr 17, 2007 at 08:15:03AM +0200, Nick Piggin wrote:
> > > > > Exactly. So we have to explore those properties and evaluate 
> > > > > performance
> > > > > (in all meanings of the word). That's only logical.
> > > > 
> > > > Any chance you'd be willing to put down a few thoughts on what sorts
> > > > of standards you'd like to set for both correctness (i.e. the bare
> > > > minimum a scheduler implementation must do to be considered valid
> > > > beyond not oopsing) and performance metrics (i.e. things that produce
> > > > numbers for each scheduler you can compare to say "this scheduler is
> > > > better than this other scheduler at this.").
> > > 
> > > Yeah I guess that's the hard part :)
> > > 
> > > For correctness, I guess fairness is an easy one. I think that unfairness
> > > is basically a bug and that it would be very unfortunate to merge 
> > > something
> > > unfair. But this is just within the context of a single runqueue... for
> > > better or worse, we allow some unfairness in multiprocessors for 
> > > performance
> > > reasons of course.
> > 
> > I'm a big fan of fairness, but I think it's a bit early to declare it
> > a mandatory feature. Bounded unfairness is probably something we can
> > agree on, ie "if we decide to be unfair, no process suffers more than
> > a factor of x".
> 
> I don't know why this would be a useful feature (of course I'm talking
> about processes at the same nice level). One of the big problems with
> the current scheduler is that it is unfair in some corner cases. It
> works OK for most people, but when it breaks down it really hurts. At
> least if you start with a fair scheduler, you can alter priorities
> until it satisfies your need... with an unfair one your guess is as
> good as mine.
> 
> So on what basis would you allow unfairness? On the basis that it doesn't
> seem to harm anyone? It doesn't seem to harm testers?

On the basis that there's only anecdotal evidence thus far that
fairness is the right approach.

It's not yet clear that a fair scheduler can do the right thing with X,
with various kernel threads, etc. without fiddling with nice levels.
Which makes it no longer "completely fair".

It's also not yet clear that a scheduler can't be taught to do the
right thing with X without fiddling with nice levels.

So I'm just not yet willing to completely rule out systems that aren't
"completely fair".

But I think we should rule out schedulers that don't have rigid bounds on
that unfairness. That's where the really ugly behavior lies.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Nick Piggin
On Tue, Apr 17, 2007 at 11:16:54PM +1000, Peter Williams wrote:
> Nick Piggin wrote:
> >I don't like the timeslice based nice in mainline. It's too nasty
> >with latencies. nicksched is far better in that regard IMO.
> >
> >But I don't know how you can assert a particular way is the best way
> >to do something.
> 
> I should have added "I may be wrong but I think that ...".
> 
> My opinion is based on a lot of experience with different types of 
> scheduler design and the observation from gathering scheduling 
> statistics while playing with these schedulers that the size of the time 
> slices we're talking about is much larger than the CPU chunks most tasks 
> use in any one go, so time slice size has no real effect on most tasks, 
> and the faster CPUs become, the more this becomes true.

For desktop loads, maybe. But for things that are compute bound, the
cost of context switching I believe still gets worse as CPUs continue
to be able to execute more instructions per cycle, get clocked faster,
and get larger caches.


> >>In that case I'd go O(1) provided that the k factor for the O(1) wasn't 
> >>greater than O(logN)'s k factor multiplied by logMaxN.
> >
> >Yes, or even significantly greater around typical large sizes of N.
> 
> Yes.  In fact it's probably better to use the maximum number of threads 
> allowed on the system for N.  We know that value, don't we?

Well we might be able to work it out by looking at the tunables or
amount of kernel memory available, but I guess it is hard to just
pick a number.
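
(For a rough sense of the break-even point: the obvious tunables are
things like /proc/sys/kernel/threads-max or pid_max, and with the
default pid_max of 32768, log2(32768) = 15, so an O(1) scheme could
afford a per-operation constant of up to roughly 15 times the O(log N)
scheme's per-node cost before it loses.  That assumes the two constants
are measured in comparable units, which is exactly the sort of thing
only benchmarks will settle.)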

I'll try running a few more benchmarks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Davide Libenzi
On Tue, 17 Apr 2007, William Lee Irwin III wrote:

> 100**(1/39.0) ~= 1.12534 may do if so, but it seems a little weak, and
> even 1000**(1/39.0) ~= 1.19378 still seems weak.
> 
> I suspect that in order to get low nice numbers strong enough without
> making high nice numbers too strong something sub-exponential may need
> to be used. Maybe just picking percentages outright as opposed to some
> particular function.
> 
> We may also be better off defining it in terms of a share weighting as
> opposed to two tasks in competition. In such a manner the extension to
> N tasks is more automatic. f(n) would be a univariate function of nice
> numbers and two tasks in competition with nice numbers n_1 and n_2
> would get shares f(n_1)/(f(n_1)+f(n_2)) and f(n_2)/(f(n_1)+f(n_2)). In
> the exponential case f(n) = K*e**(r*n) this ends up as
> 1/(1+e**(r*(n_2-n_1))) which is indeed a function of n_1-n_2 but for
> other choices it's not so. f(n) = n+K for K >= 20 results in a share
> weighting of (n_1+K,n_2+K)/(n_1+n_2+2*K), which is not entirely clear
> in its impact. My guess is that f(n)=1/(n+1) when n >= 0 and f(n)=1-n
> when n <= 0 is highly plausible. An exponent or an additive constant
> may be worthwhile to throw in. In this case, f(-19) = 20, f(20) = 1/21,
> and the ratio of shares is 420, which is still arithmetically feasible.
> -10 vs. 0 and 0 vs. 10 are both 10:1.

This makes more sense, and the ratio at the extremes is something 
reasonable.



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Mike Galbraith
On Wed, 2007-04-18 at 05:56 +0200, Nick Piggin wrote:
> On Wed, Apr 18, 2007 at 05:45:20AM +0200, Mike Galbraith wrote:
> > On Wed, 2007-04-18 at 05:15 +0200, Nick Piggin wrote:
> > >
> > > 
> > > So on what basis would you allow unfairness? On the basis that it doesn't
> > > seem to harm anyone? It doesn't seem to harm testers?
> > 
> > Well, there's short term fair and long term fair.  Seems to me a burst
> > load having to always merge with a steady stream load using a short term
> > fairness yardstick absolutely must 'starve' relative to the steady load,
> > so to be long term fair, you have to add some short term unfairness.
> > The mainline scheduler is more long term fair (discounting the rather
> > obnoxious corner cases).
> 
> Oh yes definitely I mean long term fair. I guess it is impossible to be
> completely fair so long as we have to timeshare the CPU :)
> 
> So a constant delta is fine and unavoidable. But I don't think I agree
> with a constant factor: that means you can pick a time where process 1
> is allowed an arbitrary T more CPU time than process 2.

Definitely.  Using constants with no consideration of what else is
running is what causes the fairness mechanism in mainline to break down
under load.

(aside: What I was experimenting with before this new scheduler came
along was to turn the sleep_avg thing into an off-cpu period thing. Once
a time slice begins execution [runqueue wait doesn't count], that task
has the right to use its slice in one go, and _anything_ that knocked
it off the cpu added to its credit.  Knocking someone else off detracts
from credit, and to get to the point where you can knock others off
costs you stored credit proportional to the dynamic priority you attain
by using it.  All tasks that have credit stay active, no favorites.)
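
(A very rough sketch of that credit idea, with made-up names and
scaling, just to make the aside concrete; this is not the actual
experimental code:)

struct cred_task {
        long long credit_ns;    /* earned while knocked off the CPU */
        int dyn_prio;           /* dynamic priority attained by spending it */
};

/* The task spent off_ns off the CPU through no fault of its own. */
static void credit_off_cpu(struct cred_task *t, long long off_ns)
{
        t->credit_ns += off_ns;
}

/* Preempting someone else costs stored credit proportional to the
 * dynamic priority being attained; refuse if the task cannot pay. */
static int credit_try_preempt(struct cred_task *t, long long ns_per_prio)
{
        long long cost = (long long)t->dyn_prio * ns_per_prio;

        if (t->credit_ns < cost)
                return 0;
        t->credit_ns -= cost;
        return 1;
}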

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread William Lee Irwin III
On Tue, Apr 17, 2007 at 09:07:49AM -0400, James Bruce wrote:
>>> Nonlinear is a must IMO.  I would suggest X = exp(ln(10)/10) ~= 1.2589
>>> That value has the property that a nice=10 task gets 1/10th the cpu of a
>>> nice=0 task, and a nice=20 task gets 1/100 of nice=0.  I think that
>>> would be fairly easy to explain to admins and users so that they can
>>> know what to expect from nicing tasks.

On Tue, Apr 17, 2007 at 03:59:02PM -0700, William Lee Irwin III wrote:
>> I'm not likely to write the testcase until this upcoming weekend, though.

On Tue, Apr 17, 2007 at 05:57:23PM -0500, Matt Mackall wrote:
> So that means there's roughly a 1:8000 ratio between nice 20 and nice -19. In
> that sort of dynamic range, you're likely to have non-trivial
> numerical accuracy issues in integer/fixed-point math.
> (Especially if your clock is jiffies-scale, which a significant number
> of machines will continue to be.)
> I really think if we want to have vastly different ratios, we probably
> want to be looking at BATCH and RT scheduling classes instead.

100**(1/39.0) ~= 1.12534 may do if so, but it seems a little weak, and
even 1000**(1/39.0) ~= 1.19378 still seems weak.

I suspect that in order to get low nice numbers strong enough without
making high nice numbers too strong something sub-exponential may need
to be used. Maybe just picking percentages outright as opposed to some
particular function.

We may also be better off defining it in terms of a share weighting as
opposed to two tasks in competition. In such a manner the extension to
N tasks is more automatic. f(n) would be a univariate function of nice
numbers and two tasks in competition with nice numbers n_1 and n_2
would get shares f(n_1)/(f(n_1)+f(n_2)) and f(n_2)/(f(n_1)+f(n_2)). In
the exponential case f(n) = K*e**(r*n) this ends up as
1/(1+e**(r*(n_2-n_1))) which is indeed a function of n_1-n_2 but for
other choices it's not so. f(n) = n+K for K >= 20 results in a share
weighting of (n_1+K,n_2+K)/(n_1+n_2+2*K), which is not entirely clear
in its impact. My guess is that f(n)=1/(n+1) when n >= 0 and f(n)=1-n
when n <= 0 is highly plausible. An exponent or an additive constant
may be worthwhile to throw in. In this case, f(-19) = 20, f(20) = 1/21,
and the ratio of shares is 420, which is still arithmetically feasible.
-10 vs. 0 and 0 vs. 10 are both 10:1.
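
For anyone who wants to play with those numbers, a tiny user-space
sketch of the f(n) above (illustrative only; the +/-10 cases come out
at exactly 11:1, so "both 10:1" is a round number):

#include <stdio.h>

static double f(int n)
{
        return n >= 0 ? 1.0 / (n + 1) : 1.0 - n;
}

int main(void)
{
        int pairs[3][2] = { { -19, 20 }, { -10, 0 }, { 0, 10 } };
        int i;

        for (i = 0; i < 3; i++) {
                int a = pairs[i][0], b = pairs[i][1];
                double share_a = f(a) / (f(a) + f(b));

                printf("nice %3d vs %3d: %.1f%% / %.1f%% (ratio %.0f:1)\n",
                       a, b, 100 * share_a, 100 * (1 - share_a),
                       f(a) / f(b));
        }
        return 0;
}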


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Nick Piggin
On Wed, Apr 18, 2007 at 05:45:20AM +0200, Mike Galbraith wrote:
> On Wed, 2007-04-18 at 05:15 +0200, Nick Piggin wrote:
> > On Tue, Apr 17, 2007 at 04:39:54PM -0500, Matt Mackall wrote:
> > > 
> > > I'm a big fan of fairness, but I think it's a bit early to declare it
> > > a mandatory feature. Bounded unfairness is probably something we can
> > > agree on, ie "if we decide to be unfair, no process suffers more than
> > > a factor of x".
> > 
> > I don't know why this would be a useful feature (of course I'm talking
> > about processes at the same nice level). One of the big problems with
> > the current scheduler is that it is unfair in some corner cases. It
> > works OK for most people, but when it breaks down it really hurts. At
> > least if you start with a fair scheduler, you can alter priorities
> > until it satisfies your need... with an unfair one your guess is as
> > good as mine.
> > 
> > So on what basis would you allow unfairness? On the basis that it doesn't
> > seem to harm anyone? It doesn't seem to harm testers?
> 
> Well, there's short term fair and long term fair.  Seems to me a burst
> load having to always merge with a steady stream load using a short term
> fairness yardstick absolutely must 'starve' relative to the steady load,
> so to be long term fair, you have to add some short term unfairness.
> The mainline scheduler is more long term fair (discounting the rather
> obnoxious corner cases).

Oh yes definitely I mean long term fair. I guess it is impossible to be
completely fair so long as we have to timeshare the CPU :)

So a constant delta is fine and unavoidable. But I don't think I agree
with a constant factor: that means you can pick a time where process 1
is allowed an arbitrary T more CPU time than process 2.
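
(To spell the distinction out with two always-runnable tasks sharing one
CPU: a constant delta D means |cpu1(t) - cpu2(t)| <= D no matter how
long they run; a constant factor k, i.e. cpu1(t) <= k * cpu2(t), still
allows the absolute gap to reach (k - 1) * t / (k + 1), which grows
without bound as t does.)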

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Mike Galbraith
On Wed, 2007-04-18 at 05:15 +0200, Nick Piggin wrote:
> On Tue, Apr 17, 2007 at 04:39:54PM -0500, Matt Mackall wrote:
> > 
> > I'm a big fan of fairness, but I think it's a bit early to declare it
> > a mandatory feature. Bounded unfairness is probably something we can
> > agree on, ie "if we decide to be unfair, no process suffers more than
> > a factor of x".
> 
> I don't know why this would be a useful feature (of course I'm talking
> about processes at the same nice level). One of the big problems with
> the current scheduler is that it is unfair in some corner cases. It
> works OK for most people, but when it breaks down it really hurts. At
> least if you start with a fair scheduler, you can alter priorities
> until it satisfies your need... with an unfair one your guess is as
> good as mine.
> 
> So on what basis would you allow unfairness? On the basis that it doesn't
> seem to harm anyone? It doesn't seem to harm testers?

Well, there's short term fair and long term fair.  Seems to me a burst
load having to always merge with a steady stream load using a short term
fairness yardstick absolutely must 'starve' relative to the steady load,
so to be long term fair, you have to add some short term unfairness.
The mainline scheduler is more long term fair (discounting the rather
obnoxious corner cases).

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Nick Piggin
On Tue, Apr 17, 2007 at 04:39:54PM -0500, Matt Mackall wrote:
> On Tue, Apr 17, 2007 at 09:01:55AM +0200, Nick Piggin wrote:
> > On Mon, Apr 16, 2007 at 11:26:21PM -0700, William Lee Irwin III wrote:
> > > On Mon, Apr 16, 2007 at 11:09:55PM -0700, William Lee Irwin III wrote:
> > > >> All things are not equal; they all have different properties. I like
> > > 
> > > On Tue, Apr 17, 2007 at 08:15:03AM +0200, Nick Piggin wrote:
> > > > Exactly. So we have to explore those properties and evaluate performance
> > > > (in all meanings of the word). That's only logical.
> > > 
> > > Any chance you'd be willing to put down a few thoughts on what sorts
> > > of standards you'd like to set for both correctness (i.e. the bare
> > > minimum a scheduler implementation must do to be considered valid
> > > beyond not oopsing) and performance metrics (i.e. things that produce
> > > numbers for each scheduler you can compare to say "this scheduler is
> > > better than this other scheduler at this.").
> > 
> > Yeah I guess that's the hard part :)
> > 
> > For correctness, I guess fairness is an easy one. I think that unfairness
> > is basically a bug and that it would be very unfortunate to merge something
> > unfair. But this is just within the context of a single runqueue... for
> > better or worse, we allow some unfairness in multiprocessors for performance
> > reasons of course.
> 
> I'm a big fan of fairness, but I think it's a bit early to declare it
> a mandatory feature. Bounded unfairness is probably something we can
> agree on, ie "if we decide to be unfair, no process suffers more than
> a factor of x".

I don't know why this would be a useful feature (of course I'm talking
about processes at the same nice level). One of the big problems with
the current scheduler is that it is unfair in some corner cases. It
works OK for most people, but when it breaks down it really hurts. At
least if you start with a fair scheduler, you can alter priorities
until it satisfies your need... with an unfair one your guess is as
good as mine.

So on what basis would you allow unfairness? On the basis that it doesn't
seem to harm anyone? It doesn't seem to harm testers?

I think we should aim for something better.


> > Latency. Given N tasks in the system, an arbitrary task should get
> > onto the CPU in a bounded amount of time (excluding events like freak
> > IRQ holdoffs and such, obviously -- ie. just considering the context
> > of the scheduler's state machine).
> 
> This is a slightly stronger statement than starvation-free (which is
> obviously mandatory). I think you're looking for something like
> "worst-case scheduling latency is proportional to the number of
> runnable tasks". Which I think is quite a reasonable requirement.

Yes, bounded and proportional to.
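
(Concretely, for a fair single-runqueue scheduler with N runnable tasks
at the same nice level and a timeslice or fairness granularity of g, the
worst case from becoming runnable to actually getting the CPU should be
on the order of (N - 1) * g: linear in N, leaving aside the SMP and
IRQ-holdoff caveats already mentioned.)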


> I'm pretty sure the stock scheduler falls short of both of these
> guarantees though.

And I think that's where its main problems are. Its interactivity
obviously can't be too bad for most people. Its performance seems to
be pretty good.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Peter Williams

Michael K. Edwards wrote:

On 4/17/07, Peter Williams <[EMAIL PROTECTED]> wrote:

The other way in which the code deviates from the original is that (for
a few years now) I no longer calculate CPU bandwidth usage directly.
I've found that the overhead is less if I keep a running average of the
size of a task's CPU bursts and the length of its scheduling cycle (i.e.
from on CPU one time to on CPU next time) and use the ratio of these
values as a measure of bandwidth usage.

Anyway it works and gives very predictable allocations of CPU bandwidth
based on nice.


Works, that is, right up until you add nonlinear interactions with CPU
speed scaling.  From my perspective as an embedded platform
integrator, clock/voltage scaling is the elephant in the scheduler's
living room.  Patch in DPM (now OpPoint?) to scale the clock based on
what task is being scheduled, and suddenly the dynamic priority
calculations go wild.  Nip this in the bud by putting an RT priority
on the relevant threads (which you have to do anyway if you need
remotely audio-grade latency), and the lock affinity heuristics break,
so you have to hand-tune all the thread priorities.  Blecch.

Not to mention the likelihood that the task whose clock speed you're
trying to crank up (say, a WiFi soft MAC) needs to be _lower_ priority
than the application.  (You want to crank the CPU for this task
because it runs with the RF hot, which may cost you as much power as
the rest of the platform.)  You'd better hope you can remove it from
the dynamic priority heuristics with SCHED_BATCH.  Otherwise
everything _else_ has to be RT priority (or it'll be starved by the
soft MAC) and you've basically tossed SCHED_NORMAL in the bin.  Double
blecch!

Is it too much to ask for someone with actual engineering training
(not me, unfortunately) to sit down and build a negative-feedback
control system that handles soft-real-time _and_ dynamic-priority
_and_ batch loads, CPU _and_ I/O scheduling, preemption _and_ clock
scaling?  And actually separates the accounting and control mechanisms
from the heuristics, so the latter can be tuned (within a well
documented stable range) to reflect the expected system usage
patterns?

It's not like there isn't a vast literature in this area over the past
decade, including some dealing specifically with clock scaling
consistent with low-latency applications.  It's a pity that people
doing academic work in this area rarely wade into LKML, even when
they're hacking on a Linux fork.  But then, there's not much economic
incentive for them to do so, and they can usually get their fill of
citation politics and dominance games without leaving their home
department.  :-P

Seriously, though.  If you're really going to put the mainline
scheduler through this kind of churn, please please pretty please knit
in per-task clock scaling (possibly even rejigged during the slice;
see e.g. Yuan and Nahrstedt's GRACE-OS papers) and some sort of
linger mechanism to keep from taking context switch hits when you're
confident that an I/O will complete quickly.


I think that this doesn't affect the basic design principles of spa_ebs 
but just means that the statistics that it uses need to be rethought. 
E.g. instead of measuring average CPU usage per burst in terms of wall 
clock time spent on the CPU, measure it in terms of CPU capacity (for the 
want of a better word) used per burst.


I don't have suitable hardware for investigating this line of attack 
further, unfortunately, and have no idea what would be the best way to 
calculate this new statistic.
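
A minimal sketch of the kind of bookkeeping being discussed here and in
the quoted text above; the names, the fixed-point shift and the
cpufreq-style kHz scaling are illustrative guesses, not spa_ebs's actual
code:

/* Illustrative only: per-task running averages, with each on-CPU
 * interval scaled by current/maximum frequency so that "capacity
 * used" rather than wall-clock time is what gets averaged. */
#define EBS_SHIFT 3     /* new sample weighted 1/8 */

struct ebs_stats {
        unsigned long long avg_burst;   /* avg capacity used per burst */
        unsigned long long avg_cycle;   /* avg on-CPU to next on-CPU */
};

static unsigned long long ewma(unsigned long long avg,
                               unsigned long long sample)
{
        return avg - (avg >> EBS_SHIFT) + (sample >> EBS_SHIFT);
}

/* Scale an interval by the frequency it actually ran at. */
static unsigned long long capacity_ns(unsigned long long delta_ns,
                                      unsigned int cur_khz,
                                      unsigned int max_khz)
{
        return delta_ns * cur_khz / max_khz;
}

/* Called when the task comes off the CPU; cycle_ns is the time from
 * the previous on-CPU instant to this one. */
static void ebs_end_burst(struct ebs_stats *s, unsigned long long burst_ns,
                          unsigned long long cycle_ns,
                          unsigned int cur_khz, unsigned int max_khz)
{
        s->avg_burst = ewma(s->avg_burst,
                            capacity_ns(burst_ns, cur_khz, max_khz));
        s->avg_cycle = ewma(s->avg_cycle, cycle_ns);
}

/* Bandwidth usage estimate in parts per thousand. */
static unsigned int ebs_usage_permille(const struct ebs_stats *s)
{
        return s->avg_cycle ?
                (unsigned int)(s->avg_burst * 1000 / s->avg_cycle) : 1000;
}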


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread Peter Williams

William Lee Irwin III wrote:

Peter Williams wrote:

William Lee Irwin III wrote:

I was tempted to restart from scratch given Ingo's comments, but I
reconsidered and I'll be working with your code (and the German
students' as well). If everything has to change, so be it, but it'll
still be a derived work. It would be ignoring precedent and failure to
properly attribute if I did otherwise.
I can give you a patch (or set of patches) against the latest git 
vanilla kernel version if that would help.  There have been changes to 
the vanilla scheduler code since 2.6.20 so the latest patch on 
sourceforge won't apply cleanly.  I've found that implementing this as a 
series of patches rather than one big patch makes it easier for me to 
cope with changes to the underlying code.

On Wed, Apr 18, 2007 at 10:27:27AM +1000, Peter Williams wrote:
I've just placed a single patch for plugsched against 2.6.21-rc7 updated 
to Linus's git tree as of an hour or two ago on sourceforge:


This should at least enable you to get it to apply cleanly to the latest 
kernel sources.  Let me know if you'd also like this as a quilt/mq 
friendly patch series?


A quilt-friendly series would be most excellent if you could arrange it.


Done:



Just untar this in the base directory of your Linux kernel source and 
Bob's your uncle.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-17 Thread William Lee Irwin III
> Peter Williams wrote:
> >William Lee Irwin III wrote:
> >>I was tempted to restart from scratch given Ingo's comments, but I
> >>reconsidered and I'll be working with your code (and the German
> >>students' as well). If everything has to change, so be it, but it'll
> >>still be a derived work. It would be ignoring precedent and failure to
> >>properly attribute if I did otherwise.
> >
> >I can give you a patch (or set of patches) against the latest git 
> >vanilla kernel version if that would help.  There have been changes to 
> >the vanilla scheduler code since 2.6.20 so the latest patch on 
> >sourceforge won't apply cleanly.  I've found that implementing this as a 
> >series of patches rather than one big patch makes it easier for me to 
> >cope with changes to the underlying code.
> 
On Wed, Apr 18, 2007 at 10:27:27AM +1000, Peter Williams wrote:
> I've just placed a single patch for plugsched against 2.6.21-rc7 updated 
> to Linus's git tree as of an hour or two ago on sourceforge:
> 
> This should at least enable you to get it to apply cleanly to the latest 
> kernel sources.  Let me know if you'd also like this as a quilt/mq 
> friendly patch series?

A quilt-friendly series would be most excellent if you could arrange it.

Thanks.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

