Re: Renice X for cpu schedulers

2007-04-24 Thread Matt Mackall
On Tue, Apr 24, 2007 at 08:50:20AM -0700, Ray Lee wrote:
> > Firstly, lots of clients in your list are remote. X usually isn't.
> 
> They really aren't, unless you happen to work somewhere that can afford
> to dedicate a box to a db, which suddenly makes the scheduler a dull
> topic.
> 
> For example, I have a db and web server installed on my laptop, so
> that the few times that I have to do web app programming (while wearing
> a mustache and glasses so that I don't have to admit to it in polite
> company), I can be functional with just one computer.

Indeed. The vast majority of people doing "LAMP" web services are
doing it on a single machine. Or VM for that matter.

It seems that this is a lot like the priority inheritance problem. If
a nice -19 process blocks on the db running at nice 0, the db ought to
get a boost until it wakes the original process up. The same should
apply at the level of dynamic priorities at the same nice level.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-24 Thread Ray Lee
Nick Piggin wrote:
> On Thu, Apr 19, 2007 at 12:26:03PM -0700, Ray Lee wrote:
>> On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
>>> The one fly in the ointment for
>>> linux remains X. I am still, to this moment, completely and utterly stunned
>>> at why everyone is trying to find increasingly complex unique ways to 
>>> manage
>>> X when all it needs is more cpu[1].
>> [...and hence should be reniced]
>>
>> The problem is that X is not unique. There's postgresql, memcached,
>> mysql, db2, a little embedded app I wrote... all of these perform work
>> on behalf of another process. It's just most *noticeable* with X, as
>> pretty much everyone is running that.
> 
> But for most of those apps, we don't actually care if they do fairly
> degrade in performance as other loads on the system ramp up.

(Who's this 'we' kemosabe? I do. Desktop systems are increasingly using
databases for their day-to-day tasks. As they should, a db is not
something that should be reinvented poorly.)

> However
> the user prefers X to be given priority in these situations. Whether
> that is the design of X, x clients, or the human condition really
> doesn't matter two hoots to the scheduler.

Hmm, let's try this again. Anything that communicates out of process
as part of its normal usage for Getting Work Done gets impacted by the
scheduler. That means pipelines in the shell, d-bus on the desktop, and
lots of other things that follow the unix philosophy of lots of little
programs communicating.

>> If we had some way for the scheduler to decide to donate part of a
>> client process's time slice to the server it just spoke to (with an
>> exponential dampening factor -- take 50% from the client, give 25% to
>> the server, toss the rest on the floor), that -- from my naive point
>> of view -- would be a step toward fixing the underlying issue. Or I
>> might be spouting crap, who knows.
> 
> Firstly, lots of clients in your list are remote. X usually isn't.

They really aren't, unless you happen to work somewhere that can afford
to dedicate a box to a db, which suddenly makes the scheduler a dull
topic.

For example, I have a db and web server installed on my laptop, so
that the few times that I have to do web app programming (while wearing
a mustache and glasses so that I don't have to admit to it in polite
company), I can be functional with just one computer.

> However for X, a syscall or something to donate time might not be
> such a bad idea...

We have one already, it's called write(). We have another called
read(), too. Okay, so they have some data related side-effects other
than the scheduler hints, but I claim the scheduler hint is already
implicitly there.

> but given a couple of X clients and a server
> against a parallel make, this is probably just going to make the
> clients slow down as well without giving enough priority to the
> server.

Do you have data, or at least a theory to back up that hypothesis?

> X isn't special so much because it does work on behalf of others
> (as you said, lots of things do that). It is special simply because
> we _want_ rendering to have priority of the CPU

Really not. I'm trying to get across that this is a general problem
with interprocess communication, or any systems that rely on multiple
processes to make forward progress on a problem. Sure, let the clients
make forward progress until they can't any more. If they stop making
forward progress by blocking on a read or sleeping after a write to
another process, then there's a big hint there as to who should get
focus next.

> (if you shifed CPU
> intensive rendering to the clients, you'd most likely want to give
> them priority to); nice, right?

They'd have it automatically, if they were spending their time computing
rather than rendering.

Ray
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-24 Thread Ray Lee
Nick Piggin wrote:
 On Thu, Apr 19, 2007 at 12:26:03PM -0700, Ray Lee wrote:
 On 4/19/07, Con Kolivas [EMAIL PROTECTED] wrote:
 The one fly in the ointment for
 linux remains X. I am still, to this moment, completely and utterly stunned
 at why everyone is trying to find increasingly complex unique ways to 
 manage
 X when all it needs is more cpu[1].
 [...and hence should be reniced]

 The problem is that X is not unique. There's postgresql, memcached,
 mysql, db2, a little embedded app I wrote... all of these perform work
 on behalf of another process. It's just most *noticeable* with X, as
 pretty much everyone is running that.
 
 But for most of those apps, we don't actually care if they do fairly
 degrade in performance as other loads on the system ramp up.

(Who's this 'we' kemosabe? I do. Desktop systems are increasingly using
databases for their day-to-day tasks. As they should, a db is not
something that should be reinvented poorly.)

 However
 the user prefers X to be given priority in these situations. Whether
 that is the design of X, x clients, or the human condition really
 doesn't matter two hoots to the scheduler.

Hmm, let's try this again. Anything that communicates out of process
as part of its normal usage for Getting Work Done gets impacted by the
scheduler. That means pipelines in the shell, d-bus on the desktop, and
lots of other things that follow the unix philosophy of lots of little
programs communicating.

 If we had some way for the scheduler to decide to donate part of a
 client process's time slice to the server it just spoke to (with an
 exponential dampening factor -- take 50% from the client, give 25% to
 the server, toss the rest on the floor), that -- from my naive point
 of view -- would be a step toward fixing the underlying issue. Or I
 might be spouting crap, who knows.
 
 Firstly, lots of clients in your list are remote. X usually isn't.

They really aren't, unless you happen to work somewhere that can afford
to dedicate a box to a db, which suddenly makes the scheduler a dull
topic.

For example, I have a db and web server installed on my laptop, so
that the few times that I have to do web app programming (while wearing
a mustache and glasses so that I don't have to admit to it in polite
company), I can be functional with just one computer.

 However for X, a syscall or something to donate time might not be
 such a bad idea...

We have one already, it's called write(). We have another called
read(), too. Okay, so they have some data related side-effects other
than the scheduler hints, but I claim the scheduler hint is already
implicitly there.

 but given a couple of X clients and a server
 against a parallel make, this is probably just going to make the
 clients slow down as well without giving enough priority to the
 server.

Do you have data, or at least a theory to back up that hypothesis?

 X isn't special so much because it does work on behalf of others
 (as you said, lots of things do that). It is special simply because
 we _want_ rendering to have priority of the CPU

Really not. I'm trying to get across that this is a general problem
with interprocess communication, or any systems that rely on multiple
processes to make forward progress on a problem. Sure, let the clients
make forward progress until they can't any more. If they stop making
forward progress by blocking on a read or sleeping after a write to
another process, then there's a big hint there as to who should get
focus next.

 (if you shifed CPU
 intensive rendering to the clients, you'd most likely want to give
 them priority to); nice, right?

They'd have it automatically, if they were spending their time computing
rather than rendering.

Ray
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-24 Thread Matt Mackall
On Tue, Apr 24, 2007 at 08:50:20AM -0700, Ray Lee wrote:
  Firstly, lots of clients in your list are remote. X usually isn't.
 
 They really aren't, unless you happen to work somewhere that can afford
 to dedicate a box to a db, which suddenly makes the scheduler a dull
 topic.
 
 For example, I have a db and web server installed on my laptop, so
 that the few times that I have to do web app programming (while wearing
 a mustache and glasses so that I don't have to admit to it in polite
 company), I can be functional with just one computer.

Indeed. The vast majority of people doing LAMP web services are
doing it on a single machine. Or VM for that matter.

It seems that this is a lot like the priority inheritance problem. If
a nice -19 process blocks on the db running at nice 0, the db ought to
get a boost until it wakes the original process up. The same should
apply at the level of dynamic priorities at the same nice level.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-22 Thread Con Kolivas
On Sunday 22 April 2007 22:54, Mark Lord wrote:
> Just to throw another possibly-overlooked variable into the mess:
>
> My system here is using the on-demand cpufreq policy governor.
> I wonder how that interacts with the various schedulers here?
>
> I suppose for the "make" kernel case, after a couple of seconds
> the cpufreq would hit max and stay there for the rest of the build,
> so it shouldn't really be a factor for (non-)interactivity during the
> build.
>
> Or should it?

Short answer: shouldn't matter :)

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-22 Thread Mark Lord

Just to throw another possibly-overlooked variable into the mess:

My system here is using the on-demand cpufreq policy governor.
I wonder how that interacts with the various schedulers here?

I suppose for the "make" kernel case, after a couple of seconds
the cpufreq would hit max and stay there for the rest of the build,
so it shouldn't really be a factor for (non-)interactivity during the build.

Or should it?

Cheers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-22 Thread Mark Lord

Just to throw another possibly-overlooked variable into the mess:

My system here is using the on-demand cpufreq policy governor.
I wonder how that interacts with the various schedulers here?

I suppose for the make kernel case, after a couple of seconds
the cpufreq would hit max and stay there for the rest of the build,
so it shouldn't really be a factor for (non-)interactivity during the build.

Or should it?

Cheers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-22 Thread Con Kolivas
On Sunday 22 April 2007 22:54, Mark Lord wrote:
 Just to throw another possibly-overlooked variable into the mess:

 My system here is using the on-demand cpufreq policy governor.
 I wonder how that interacts with the various schedulers here?

 I suppose for the make kernel case, after a couple of seconds
 the cpufreq would hit max and stay there for the rest of the build,
 so it shouldn't really be a factor for (non-)interactivity during the
 build.

 Or should it?

Short answer: shouldn't matter :)

-- 
-ck
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-21 Thread Mark Lord

Nick Piggin wrote:

On Thu, Apr 19, 2007 at 09:17:25AM -0400, Mark Lord wrote:

Just plain "make" (no -j2 or -j) is enough to kill interactivity
on my 2GHz P-M single-core non-HT machine with SD.


Is this with or without X reniced?


That was with no manual jiggling, everything the same as with stock kernels,
except that stock kernels don't kill interactivity here.


But with the very first posted version of CFS by Ingo,
I can do "make -j2" no problem and still have a nicely interactive destop.


How well does cfs run if you have the granularity set to something
like 30ms (3000)?


Dunno, I've put this stuff aside for now until things settle down.
With four schedulers, and lots of patches / revisions / tuning-knobs,
there's just no way to keep up with it all here.

Cheers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-21 Thread Mark Lord

Nick Piggin wrote:

On Thu, Apr 19, 2007 at 09:17:25AM -0400, Mark Lord wrote:

Just plain make (no -j2 or -j) is enough to kill interactivity
on my 2GHz P-M single-core non-HT machine with SD.


Is this with or without X reniced?


That was with no manual jiggling, everything the same as with stock kernels,
except that stock kernels don't kill interactivity here.


But with the very first posted version of CFS by Ingo,
I can do make -j2 no problem and still have a nicely interactive destop.


How well does cfs run if you have the granularity set to something
like 30ms (3000)?


Dunno, I've put this stuff aside for now until things settle down.
With four schedulers, and lots of patches / revisions / tuning-knobs,
there's just no way to keep up with it all here.

Cheers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-20 Thread hui
On Fri, Apr 20, 2007 at 12:12:29AM -0700, Michael K. Edwards wrote:
> Actual fractional CPU reservation is a bit different, and is probably
> best handled with "container"-type infrastructure (not quite
> virtualization, but not quite scheduling classes either).  SGI
> pioneered this (in "open systems" space -- IBM probably had it first,
> as usual) with GRIO in XFS.  (That was I/O throughput reservation of

I'm very aware of this having grow up on those systems and see what 30k
USD of hardware can do for you with the right kernel facilties. It
would be a mind blower to get OpenGL and friends back to that level of
performance with regards to React/Pro's rt abilities, frame drop would
just be gone and we'd own gaming. No joke.

We have a number of former SGI XFS engineers here at NetApp and I should
ask them about the GRIO implementation.

> course, not "CPU bandwidth" -- but IIRC IRIX had CPU reservation too).
> There's a more general class of techniques in which it's worth
> spending idle cycles speculating along paths that might or might not
> be taken depending on unpredictable I/O; I'd be surprised if you
> couldn't approximate most of the sane balancing strategies in this
> area within the "economic dispatch" scheduler model.  (Good JIT

What is that ? never heard of it before.
 
> I don't know where the -rt patch enters in.  But if you need agile
> reprioritization with a deep runnable queue, either under direct
> application control or as a side effect of priority inheritance or a
> related OS-enforced protocol, then you need a kernel-level data
> structure with a fancier interface than the classic
> insert/find/delete-min priority queue.  From what I've read (this is
> not my area of expertise and I don't have Knuth handy), the relatively
> simple heap-based implementations of priority queues can't
> reprioritize an entry any more quickly than find+delete+insert, which
> pretty much rules them out as a basis for a scalable scheduler with
> priority inheritance (let alone PCP emulation).

The -rt patch has turnstile-esque infrastructure that's stack allocated.
Linux's lock hierarchy is relatively shallow (compensated with a heavy
use of per CPU method and RCU-ified algorithms in place of rwlocks) so
I've encountered nothing close to this that would demand such an overly
sophisticated mechanism. I'm aware of PCP and preemptions thresholds.
I created the lockstat infrastructure as a means of precisely measuring
contention in -rt in anticipation to experiment with these techniques.

I mention -rt because it's the most likely place to encounter what you're
talking about, not an app.
 
> >I have Solaris style adaptive locks in my tree with my lockstat patch
> >under -rt. I've also modified my lockstat patch to track readers

...

> Ooh, that's neat.  The next time I can cook up an excuse to run a
> kernel that won't load this damn WiFi driver, I'll try it out.  Some
> of the people I work with are real engineers and respect in-system
> instrumentation.

It's not publically released yet since I'm still stuck in .20-rc6 land
and the soft lock up detector triggers. I need to forward port it and
my lockstat changes to the most recent -rt patch.

I've been stalled on revision control problem that I'm trying to solve
with monotone for at least a month (of my own spare time).

> That's a good thing; it implies that in-kernel algorithms don't take
> locks needlessly as a matter of cargo-cult habit.  Attempting to take

The jury is still out on this until I can record what the rtmutex owner's
state is in. No further conclusion can be made until then. I think
this is very interesting pursuit/investigation.

> a lock (other than an uncontended futex, which is practically free)
> should almost always transfer control to the thread that has the power
> to deliver the information (or the free slot) that you're looking for
> -- or in the case of an external data source/sink, should send you
> into low-power mode until time and tide give you something new to do.

> Think of it as a just-in-time inventory system; if you keep too much
> product in stock (or free warehouse space), you're wasting space and
> harming responsiveness to a shift in demand.  Once in a while you have
> to play Sokoban in order to service a request promptly; that's exactly
> the case that priority inheritance is meant to help with.

What did you mean by this ? Victor Yodaiken's stuff ?

> The fiddly part, on a non-real-time-no-matter-what-the-label-says
> system with an opaque cache architecture and mysterious hidden costs
> of context switching, is to minimize the lossage resulting from brutal
> timer- or priority-inheritance-driven preemption.  Given the way
> people code these days -- OK, it was probably just as bad back in the
> day -- the only thing worse than letting the axe fall at random is to
> steal the CPU away the moment a contended lock is released, because

My adaptive spin stuff in front of an rtmutex is design to complement
Steve 

Re: Renice X for cpu schedulers

2007-04-20 Thread Michael K. Edwards

On 4/19/07, hui Bill Huey <[EMAIL PROTECTED]> wrote:

DSP operations like, particularly with digital synthesis, tend to max
the CPU doing vector operations on as many processors as it can get
a hold of. In a live performance critical application, it's important
to be able to deliver a protected amount of CPU to a thread doing that
work as well as response to external input such as controllers, etc...


Actual fractional CPU reservation is a bit different, and is probably
best handled with "container"-type infrastructure (not quite
virtualization, but not quite scheduling classes either).  SGI
pioneered this (in "open systems" space -- IBM probably had it first,
as usual) with GRIO in XFS.  (That was I/O throughput reservation of
course, not "CPU bandwidth" -- but IIRC IRIX had CPU reservation too).
There's a more general class of techniques in which it's worth
spending idle cycles speculating along paths that might or might not
be taken depending on unpredictable I/O; I'd be surprised if you
couldn't approximate most of the sane balancing strategies in this
area within the "economic dispatch" scheduler model.  (Good JIT
bytecode engines more or less do this already if you let them, with a
cap on JIT cache size serving as a crude CPU throttle.)


> In practice, you probably don't want to burden desktop Linux with
> priority inheritance where you don't have to.  Priority queues with
> algorithmically efficient decrease-key operations (Fibonacci heaps and
> their ilk) are complicated to implement and have correspondingly high
> constant factors.  (However, a sufficiently clever heuristic for
> assigning quasi-static task priorities would usually short-circuit the
> priority cascade; if you can keep N small in the
> tasks-with-unpredictable-priority queue, you can probably use a
> simpler flavor with O(log N) decrease-key.  Ask someone who knows more
> about data structures than I do.)

These are app issue and not really somethings that's mutable in kernel
per se with regard to the -rt patch.


I don't know where the -rt patch enters in.  But if you need agile
reprioritization with a deep runnable queue, either under direct
application control or as a side effect of priority inheritance or a
related OS-enforced protocol, then you need a kernel-level data
structure with a fancier interface than the classic
insert/find/delete-min priority queue.  From what I've read (this is
not my area of expertise and I don't have Knuth handy), the relatively
simple heap-based implementations of priority queues can't
reprioritize an entry any more quickly than find+delete+insert, which
pretty much rules them out as a basis for a scalable scheduler with
priority inheritance (let alone PCP emulation).


I have Solaris style adaptive locks in my tree with my lockstat patch
under -rt. I've also modified my lockstat patch to track readers
correctly now with rwsem and the like to see where the single reader
limitation in the rtmutex blows it.


Ooh, that's neat.  The next time I can cook up an excuse to run a
kernel that won't load this damn WiFi driver, I'll try it out.  Some
of the people I work with are real engineers and respect in-system
instrumentation.


So far I've seen less than 10 percent of in-kernel contention events
actually worth spinning on and the rest of the stats imply that the
mutex owner in question is either preempted or blocked on something
else.


That's a good thing; it implies that in-kernel algorithms don't take
locks needlessly as a matter of cargo-cult habit.  Attempting to take
a lock (other than an uncontended futex, which is practically free)
should almost always transfer control to the thread that has the power
to deliver the information (or the free slot) that you're looking for
-- or in the case of an external data source/sink, should send you
into low-power mode until time and tide give you something new to do.
Think of it as a just-in-time inventory system; if you keep too much
product in stock (or free warehouse space), you're wasting space and
harming responsiveness to a shift in demand.  Once in a while you have
to play Sokoban in order to service a request promptly; that's exactly
the case that priority inheritance is meant to help with.

The fiddly part, on a non-real-time-no-matter-what-the-label-says
system with an opaque cache architecture and mysterious hidden costs
of context switching, is to minimize the lossage resulting from brutal
timer- or priority-inheritance-driven preemption.  Given the way
people code these days -- OK, it was probably just as bad back in the
day -- the only thing worse than letting the axe fall at random is to
steal the CPU away the moment a contended lock is released, because
the next 20 lines of code probably poke one last time at all the data
structures the task had in cache right before entering the critical
section.  That doesn't hurt so bad on RTOS-friendly hardware -- an
MMU-less system with either zero or near-infinite cache -- but it's
got to make 

Re: Renice X for cpu schedulers

2007-04-20 Thread Michael K. Edwards

On 4/19/07, hui Bill Huey [EMAIL PROTECTED] wrote:

DSP operations like, particularly with digital synthesis, tend to max
the CPU doing vector operations on as many processors as it can get
a hold of. In a live performance critical application, it's important
to be able to deliver a protected amount of CPU to a thread doing that
work as well as response to external input such as controllers, etc...


Actual fractional CPU reservation is a bit different, and is probably
best handled with container-type infrastructure (not quite
virtualization, but not quite scheduling classes either).  SGI
pioneered this (in open systems space -- IBM probably had it first,
as usual) with GRIO in XFS.  (That was I/O throughput reservation of
course, not CPU bandwidth -- but IIRC IRIX had CPU reservation too).
There's a more general class of techniques in which it's worth
spending idle cycles speculating along paths that might or might not
be taken depending on unpredictable I/O; I'd be surprised if you
couldn't approximate most of the sane balancing strategies in this
area within the economic dispatch scheduler model.  (Good JIT
bytecode engines more or less do this already if you let them, with a
cap on JIT cache size serving as a crude CPU throttle.)


 In practice, you probably don't want to burden desktop Linux with
 priority inheritance where you don't have to.  Priority queues with
 algorithmically efficient decrease-key operations (Fibonacci heaps and
 their ilk) are complicated to implement and have correspondingly high
 constant factors.  (However, a sufficiently clever heuristic for
 assigning quasi-static task priorities would usually short-circuit the
 priority cascade; if you can keep N small in the
 tasks-with-unpredictable-priority queue, you can probably use a
 simpler flavor with O(log N) decrease-key.  Ask someone who knows more
 about data structures than I do.)

These are app issue and not really somethings that's mutable in kernel
per se with regard to the -rt patch.


I don't know where the -rt patch enters in.  But if you need agile
reprioritization with a deep runnable queue, either under direct
application control or as a side effect of priority inheritance or a
related OS-enforced protocol, then you need a kernel-level data
structure with a fancier interface than the classic
insert/find/delete-min priority queue.  From what I've read (this is
not my area of expertise and I don't have Knuth handy), the relatively
simple heap-based implementations of priority queues can't
reprioritize an entry any more quickly than find+delete+insert, which
pretty much rules them out as a basis for a scalable scheduler with
priority inheritance (let alone PCP emulation).


I have Solaris style adaptive locks in my tree with my lockstat patch
under -rt. I've also modified my lockstat patch to track readers
correctly now with rwsem and the like to see where the single reader
limitation in the rtmutex blows it.


Ooh, that's neat.  The next time I can cook up an excuse to run a
kernel that won't load this damn WiFi driver, I'll try it out.  Some
of the people I work with are real engineers and respect in-system
instrumentation.


So far I've seen less than 10 percent of in-kernel contention events
actually worth spinning on and the rest of the stats imply that the
mutex owner in question is either preempted or blocked on something
else.


That's a good thing; it implies that in-kernel algorithms don't take
locks needlessly as a matter of cargo-cult habit.  Attempting to take
a lock (other than an uncontended futex, which is practically free)
should almost always transfer control to the thread that has the power
to deliver the information (or the free slot) that you're looking for
-- or in the case of an external data source/sink, should send you
into low-power mode until time and tide give you something new to do.
Think of it as a just-in-time inventory system; if you keep too much
product in stock (or free warehouse space), you're wasting space and
harming responsiveness to a shift in demand.  Once in a while you have
to play Sokoban in order to service a request promptly; that's exactly
the case that priority inheritance is meant to help with.

The fiddly part, on a non-real-time-no-matter-what-the-label-says
system with an opaque cache architecture and mysterious hidden costs
of context switching, is to minimize the lossage resulting from brutal
timer- or priority-inheritance-driven preemption.  Given the way
people code these days -- OK, it was probably just as bad back in the
day -- the only thing worse than letting the axe fall at random is to
steal the CPU away the moment a contended lock is released, because
the next 20 lines of code probably poke one last time at all the data
structures the task had in cache right before entering the critical
section.  That doesn't hurt so bad on RTOS-friendly hardware -- an
MMU-less system with either zero or near-infinite cache -- but it's
got to make this year's 

Re: Renice X for cpu schedulers

2007-04-20 Thread hui
On Fri, Apr 20, 2007 at 12:12:29AM -0700, Michael K. Edwards wrote:
 Actual fractional CPU reservation is a bit different, and is probably
 best handled with container-type infrastructure (not quite
 virtualization, but not quite scheduling classes either).  SGI
 pioneered this (in open systems space -- IBM probably had it first,
 as usual) with GRIO in XFS.  (That was I/O throughput reservation of

I'm very aware of this having grow up on those systems and see what 30k
USD of hardware can do for you with the right kernel facilties. It
would be a mind blower to get OpenGL and friends back to that level of
performance with regards to React/Pro's rt abilities, frame drop would
just be gone and we'd own gaming. No joke.

We have a number of former SGI XFS engineers here at NetApp and I should
ask them about the GRIO implementation.

 course, not CPU bandwidth -- but IIRC IRIX had CPU reservation too).
 There's a more general class of techniques in which it's worth
 spending idle cycles speculating along paths that might or might not
 be taken depending on unpredictable I/O; I'd be surprised if you
 couldn't approximate most of the sane balancing strategies in this
 area within the economic dispatch scheduler model.  (Good JIT

What is that ? never heard of it before.
 
 I don't know where the -rt patch enters in.  But if you need agile
 reprioritization with a deep runnable queue, either under direct
 application control or as a side effect of priority inheritance or a
 related OS-enforced protocol, then you need a kernel-level data
 structure with a fancier interface than the classic
 insert/find/delete-min priority queue.  From what I've read (this is
 not my area of expertise and I don't have Knuth handy), the relatively
 simple heap-based implementations of priority queues can't
 reprioritize an entry any more quickly than find+delete+insert, which
 pretty much rules them out as a basis for a scalable scheduler with
 priority inheritance (let alone PCP emulation).

The -rt patch has turnstile-esque infrastructure that's stack allocated.
Linux's lock hierarchy is relatively shallow (compensated with a heavy
use of per CPU method and RCU-ified algorithms in place of rwlocks) so
I've encountered nothing close to this that would demand such an overly
sophisticated mechanism. I'm aware of PCP and preemptions thresholds.
I created the lockstat infrastructure as a means of precisely measuring
contention in -rt in anticipation to experiment with these techniques.

I mention -rt because it's the most likely place to encounter what you're
talking about, not an app.
 
 I have Solaris style adaptive locks in my tree with my lockstat patch
 under -rt. I've also modified my lockstat patch to track readers

...

 Ooh, that's neat.  The next time I can cook up an excuse to run a
 kernel that won't load this damn WiFi driver, I'll try it out.  Some
 of the people I work with are real engineers and respect in-system
 instrumentation.

It's not publically released yet since I'm still stuck in .20-rc6 land
and the soft lock up detector triggers. I need to forward port it and
my lockstat changes to the most recent -rt patch.

I've been stalled on revision control problem that I'm trying to solve
with monotone for at least a month (of my own spare time).

 That's a good thing; it implies that in-kernel algorithms don't take
 locks needlessly as a matter of cargo-cult habit.  Attempting to take

The jury is still out on this until I can record what the rtmutex owner's
state is in. No further conclusion can be made until then. I think
this is very interesting pursuit/investigation.

 a lock (other than an uncontended futex, which is practically free)
 should almost always transfer control to the thread that has the power
 to deliver the information (or the free slot) that you're looking for
 -- or in the case of an external data source/sink, should send you
 into low-power mode until time and tide give you something new to do.

 Think of it as a just-in-time inventory system; if you keep too much
 product in stock (or free warehouse space), you're wasting space and
 harming responsiveness to a shift in demand.  Once in a while you have
 to play Sokoban in order to service a request promptly; that's exactly
 the case that priority inheritance is meant to help with.

What did you mean by this ? Victor Yodaiken's stuff ?

 The fiddly part, on a non-real-time-no-matter-what-the-label-says
 system with an opaque cache architecture and mysterious hidden costs
 of context switching, is to minimize the lossage resulting from brutal
 timer- or priority-inheritance-driven preemption.  Given the way
 people code these days -- OK, it was probably just as bad back in the
 day -- the only thing worse than letting the axe fall at random is to
 steal the CPU away the moment a contended lock is released, because

My adaptive spin stuff in front of an rtmutex is design to complement
Steve Rostedt's owner stealing code also in that path and 

Re: Renice X for cpu schedulers

2007-04-19 Thread hui
On Thu, Apr 19, 2007 at 05:20:53PM -0700, Michael K. Edwards wrote:
> Embedded systems are already in 2007, and the mainline Linux scheduler
> frankly sucks on them, because it thinks it's back in the 1960's with
> a fixed supply and captive demand, pissing away "CPU bandwidth" as
> waste heat.  Not to say it's an easy problem; even academics with a
> dozen publications in this area don't seem to be able to model energy
> usage to the nearest big O, let alone design a stable economic
> dispatch engine.  But it helps to acknowledge what the problem is:
> even in a 1960's raised-floor screaming-air-conditioners
> screw-the-power-bill machine room, you can't actually run a
> half-decent CPU flat out any more without burning it to a crisp.
> stupid.  What's your excuse?  ;-)

It's now possible to QoS significant parts of the kernel since we now
have a deadline mechanism in place. In the original 2.4 kernel, TimeSys's
irq-thread allowed for the processing of skbuffs in a thread under a CPU
reservation run category which was use to provide QoS I believe. This
basic mechanish can now be generalized to many place in the kernel and
put it under scheduler control.

It's just a matter of who and when somebody is going take on this task.

bill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread hui
On Thu, Apr 19, 2007 at 06:32:15PM -0700, Michael K. Edwards wrote:
> But I think SCHED_FIFO on a chain of tasks is fundamentally not the
> right way to handle low audio latency.  The object with a low latency
> requirement isn't the task, it's the device.  When it's starting to
> get urgent to deliver more data to the device, the task that it's
> waiting on should slide up the urgency scale; and if it's waiting on
> something else, that something else should slide up the scale; and so
> forth.  Similarly, responding to user input is urgent; so when user
> input is available (by whatever mechanism), the task that's waiting
> for it should slide up the urgency scale, etc.

DSP operations like, particularly with digital synthesis, tend to max
the CPU doing vector operations on as many processors as it can get
a hold of. In a live performance critical application, it's important
to be able to deliver a protected amount of CPU to a thread doing that
work as well as response to external input such as controllers, etc...

> In practice, you probably don't want to burden desktop Linux with
> priority inheritance where you don't have to.  Priority queues with
> algorithmically efficient decrease-key operations (Fibonacci heaps and
> their ilk) are complicated to implement and have correspondingly high
> constant factors.  (However, a sufficiently clever heuristic for
> assigning quasi-static task priorities would usually short-circuit the
> priority cascade; if you can keep N small in the
> tasks-with-unpredictable-priority queue, you can probably use a
> simpler flavor with O(log N) decrease-key.  Ask someone who knows more
> about data structures than I do.)

These are app issue and not really somethings that's mutable in kernel
per se with regard to the -rt patch.

> More importantly, non-real-time application coders aren't very smart
> about grouping data structure accesses on one side or the other of a
> system call that is likely to release a lock and let something else
> run, flushing application data out of cache.  (Kernel coders aren't
> always smart about this either; see LKML threads a few weeks ago about
> racy, and cache-stall-prone, f_pos handling in VFS.)  So switching
> tasks immediately on lock release is usually the wrong thing to do if
> letting the task run a little longer would allow it to reach a point
> where it has to block anyway.

I have Solaris style adaptive locks in my tree with my lockstat patch
under -rt. I've also modified my lockstat patch to track readers
correctly now with rwsem and the like to see where the single reader
limitation in the rtmutex blows it.

So far I've seen less than 10 percent of in-kernel contention events
actually worth spinning on and the rest of the stats imply that the
mutex owner in question is either preempted or blocked on something
else.

I've been trying to get folks to try this on a larger machine than my
2x AMD64 box so that I there is more data regarding Linux contention
and overschedulling in -rt.
 
> Anyway, I already described the urgency-driven strategy to the extent
> that I've thought it out, elsewhere in this thread.  I only held this
> draft back because I wanted to double-check my latency measurements.

bill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Mike Galbraith
On Fri, 2007-04-20 at 08:47 +1000, Con Kolivas wrote:

> It's those who want X to have an unfair advantage that want it to do 
> something "special".

I hope you're not lumping me in with "those".  If X + client had been
able to get their fair share and do so in the low latency manner they
need, I would have been one of the carrots instead of being the stick.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Nick Piggin
On Thu, Apr 19, 2007 at 12:26:03PM -0700, Ray Lee wrote:
> On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> >The one fly in the ointment for
> >linux remains X. I am still, to this moment, completely and utterly stunned
> >at why everyone is trying to find increasingly complex unique ways to 
> >manage
> >X when all it needs is more cpu[1].
> [...and hence should be reniced]
> 
> The problem is that X is not unique. There's postgresql, memcached,
> mysql, db2, a little embedded app I wrote... all of these perform work
> on behalf of another process. It's just most *noticeable* with X, as
> pretty much everyone is running that.

But for most of those apps, we don't actually care if they do fairly
degrade in performance as other loads on the system ramp up. However
the user prefers X to be given priority in these situations. Whether
that is the design of X, x clients, or the human condition really
doesn't matter two hoots to the scheduler.


> If we had some way for the scheduler to decide to donate part of a
> client process's time slice to the server it just spoke to (with an
> exponential dampening factor -- take 50% from the client, give 25% to
> the server, toss the rest on the floor), that -- from my naive point
> of view -- would be a step toward fixing the underlying issue. Or I
> might be spouting crap, who knows.

Firstly, lots of clients in your list are remote. X usually isn't.
However for X, a syscall or something to donate time might not be
such a bad idea... but given a couple of X clients and a server
against a parallel make, this is probably just going to make the
clients slow down as well without giving enough priority to the
server.

X isn't special so much because it does work on behalf of others
(as you said, lots of things do that). It is special simply because
we _want_ rendering to have priority of the CPU (if you shifed CPU
intensive rendering to the clients, you'd most likely want to give
them priority to); nice, right?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Nick Piggin
On Thu, Apr 19, 2007 at 09:17:25AM -0400, Mark Lord wrote:
> Con Kolivas wrote:
> s go ahead and think up great ideas for other ways of metering out cpu 
> >bandwidth for different purposes, but for X, given the absurd simplicity 
> >of renicing, why keep fighting it? Again I reiterate that most users of SD 
> >have not found the need to renice X anyway except if they stick to old 
> >habits of make -j4 on uniprocessor and the like, and I expect that those 
> >on CFS and Nicksched would also have similar experiences.
> 
> Just plain "make" (no -j2 or -j) is enough to kill interactivity
> on my 2GHz P-M single-core non-HT machine with SD.

Is this with or without X reniced?


> But with the very first posted version of CFS by Ingo,
> I can do "make -j2" no problem and still have a nicely interactive destop.

How well does cfs run if you have the granularity set to something
like 30ms (3000)?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Con Kolivas wrote:
>On Friday 20 April 2007 04:16, Gene Heskett wrote:
>> On Thursday 19 April 2007, Con Kolivas wrote:
>>
>> [and I snipped a good overview]
>>
>> >So yes go ahead and think up great ideas for other ways of metering out
>> > cpu bandwidth for different purposes, but for X, given the absurd
>> > simplicity of renicing, why keep fighting it? Again I reiterate that
>> > most users of SD have not found the need to renice X anyway except if
>> > they stick to old habits of make -j4 on uniprocessor and the like, and I
>> > expect that those on CFS and Nicksched would also have similar
>> > experiences.
>>
>> FWIW folks, I have never touched x's niceness, its running at the default
>> -1 for all of my so-called 'tests', and I have another set to be rebooted
>> to right now.  And yes, my kernel makeit script uses -j4 by default, and
>> has used -j8 just for effects, which weren't all that different from what
>> I expected in 'abusing' a UP system that way.  The system DID remain
>> usable, not snappy, but usable.
>
>Gene, you're agreeing with me. You've shown that you're very happy with a
> fair distribution of cpu and leaving X at nice 0.

I was quite happy till Ingo's first patch came out, and it was even better, 
but I over-wrote it, and we're still figuring out just exactly what the magic 
twanger was that made it all click for me.  OTOH, I don't think that patch 
passed muster with Mike G., either.  We have obviously different workloads, 
and critical points in them.

>> Having tried re-nicing X a while back, and having the rest of the system
>> suffer in quite obvious ways for even 1 + or - from its default felt
>> pretty bad from this users perspective.
>>
>> It is my considered opinion (yeah I know, I'm just a leaf in the hurricane
>> of this list) that if X has to be re-niced from the 1 point advantage its
>> had for ages, then something is basicly wrong with the overall scheduling,
>> cpu or i/o, or both in combination.  FWIW I'm using cfq for i/o.
>
>It's those who want X to have an unfair advantage that want it to do
>something "special". Your agreement that it works fine at nice 0 shows you
>don't want it to have an unfair advantage. Others who want it to have an
>unfair advantage _can_ renice it if they desire. But if the cpu scheduler
>gives X an unfair advantage within the kernel by default then you have _no_
>choice. If you leave the choice up to userspace (renice or not) then both
>parties get their way. If you put it into the kernel only one party wins and
>there is no way for the Genes (and Cons) of this world to get it back.
>
>Your opinion is as valuable as eveyone else's Gene. It is hard to get people
>to speak on as frightening a playground as the linux kernel mailing list so
>please do.

In the FWIW category, htop has always told me that x is running at -1, not 
zero.  Now, I have NDI where this is actually set at, so I'd have to ask 
stupid questions here if I did wanna play with it.  Which I really don't, the 
last time I tried to -5 x, kde got a whole lot LESS responsive.  But heck, 
2.6.2 was freshly minted then too and I've long since forgot how I went about 
that unless I used htop to change it, the most likely scenario that I can 
picture at this late date. 

As for speaking my mind, yes, and I've been slapped down a few times, as much 
because I do a lot of bitching and microscopic amounts of patch submission. 
The only patch I ever submitted was for something in the floppy driver, way 
back in the middle of 2.2 days, rejected because I didn't know how to use the 
tools correctly.  I didn't, so it was a shrug and my feelings weren't hurt.

Some see that as an unbalanced set of books and I'm aware of it.  OTOH, I 
think I do a pretty good job of playing the canary here, and that should be 
worth something if for no other reason than I can turn into a burr under 
somebodies saddle when things go all aglay.  But I figure if its happening to 
me, then if I don't fuss, and that gotcha gets into a distro kernel, there 
are gonna be a hell of a lot more folks than me trying to grab the 
microphone.

BTW, I'm glad you are feeling well enough to get into this again.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
There cannot be a crisis next week.  My schedule is already full.
-- Henry Kissinger
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Con Kolivas wrote:
>On Friday 20 April 2007 04:16, Gene Heskett wrote:
>> On Thursday 19 April 2007, Con Kolivas wrote:
>>
>> [and I snipped a good overview]
>>
>> >So yes go ahead and think up great ideas for other ways of metering out
>> > cpu bandwidth for different purposes, but for X, given the absurd
>> > simplicity of renicing, why keep fighting it? Again I reiterate that
>> > most users of SD have not found the need to renice X anyway except if
>> > they stick to old habits of make -j4 on uniprocessor and the like, and I
>> > expect that those on CFS and Nicksched would also have similar
>> > experiences.
>>
>> FWIW folks, I have never touched x's niceness, its running at the default
>> -1 for all of my so-called 'tests', and I have another set to be rebooted
>> to right now.  And yes, my kernel makeit script uses -j4 by default, and
>> has used -j8 just for effects, which weren't all that different from what
>> I expected in 'abusing' a UP system that way.  The system DID remain
>> usable, not snappy, but usable.
>
>Gene, you're agreeing with me. You've shown that you're very happy with a
> fair distribution of cpu and leaving X at nice 0.

I was quite happy till Ingo's first patch came out, and it was even better, 
but I over-wrote it, and we're still figuring out just exactly what the magic 
twanger was that made it all click for me.  OTOH, I don't think that patch 
passed muster with Mike G., either.  We have obviously different workloads, 
and critical points in them.

>> Having tried re-nicing X a while back, and having the rest of the system
>> suffer in quite obvious ways for even 1 + or - from its default felt
>> pretty bad from this users perspective.
>>
>> It is my considered opinion (yeah I know, I'm just a leaf in the hurricane
>> of this list) that if X has to be re-niced from the 1 point advantage its
>> had for ages, then something is basicly wrong with the overall scheduling,
>> cpu or i/o, or both in combination.  FWIW I'm using cfq for i/o.
>
>It's those who want X to have an unfair advantage that want it to do
>something "special". Your agreement that it works fine at nice 0 shows you
>don't want it to have an unfair advantage. Others who want it to have an
>unfair advantage _can_ renice it if they desire. But if the cpu scheduler
>gives X an unfair advantage within the kernel by default then you have _no_
>choice. If you leave the choice up to userspace (renice or not) then both
>parties get their way. If you put it into the kernel only one party wins and
>there is no way for the Genes (and Cons) of this world to get it back.
>
>Your opinion is as valuable as eveyone else's Gene. It is hard to get people
>to speak on as frightening a playground as the linux kernel mailing list so
>please do.

In the FWIW category, htop has always told me that x is running at -1, not 
zero.  Now, I have NDI where this is actually set at, so I'd have to ask 
stupid questions here if I did wanna play with it.  Which I really don't, the 
last time I tried to -5 x, kde got a whole lot LESS responsive.  But heck, 
2.6.2 was freshly minted then too and I've long since forgot how I went about 
that unless I used htop to change it, the most likely scenario that I can 
picture at this late date. 

As for speaking my mind, yes, and I've been slapped down a few times, as much 
because I do a lot of bitching and microscopic amounts of patch submission. 
The only patch I ever submitted was for something in the floppy driver, way 
back in the middle of 2.2 days, rejected because I didn't know how to use the 
tools correctly.  I didn't, so it was a shrug and my feelings weren't hurt.

Some see that as an unbalanced set of books and I'm aware of it.  OTOH, I 
think I do a pretty good job of playing the canary here, and that should be 
worth something if for no other reason than I can turn into a burr under 
somebodies saddle when things go all aglay.  But I figure if its happening to 
me, then if I don't fuss, and that gotcha gets into a distro kernel, there 
are gonna be a hell of a lot more folks than me trying to grab the 
microphone.

BTW, I'm glad you are feeling well enough to get into this again.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
There cannot be a crisis next week.  My schedule is already full.
-- Henry Kissinger
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Michael K. Edwards

On 4/19/07, Lee Revell <[EMAIL PROTECTED]> wrote:

IMHO audio streamers should use SCHED_FIFO thread for time critical
work.  I think it's insane to expect the scheduler to figure out that
these processes need low latency when they can just be explicit about
it.  "Professional" audio software does it already, on Linux as well
as other OS...


It is certainly true that SCHED_FIFO is currently necessary in the
layers of an audio application lying closest to the hardware, if you
don't want to throw a monstrous hardware ring buffer at the problem.
See the alsa-devel archives for a patch to aplay (sched_setscheduler
plus some cleanups) that converts it from "unsafe at any speed" (on a
non-RT kernel) to a rock-solid 18ms round trip from PCM in to PCM out.
(The hardware and driver aren't terribly exotic for an SoC, and the
measurement was done with aplay -C | aplay -P -- on a
not-particularly-tuned CONFIG_PREEMPT kernel with a 12ms+ peak
scheduling latency according to cyclictest.  A similar test via
/dev/dsp, done through a slightly modified OSS emulation layer to the
same driver, measures at 40ms and is probably tuned too
conservatively.)

Note that SCHED_FIFO may be less necessary on an -rt kernel, but I
haven't had that option on the embedded hardware I've been working
with lately.  Ingo, please please pretty please pick a -stable branch
one of these days and provide a git repo with -rt integrated against
that branch.  Then I could port our chip support to it -- all of which
will be GPLed after the impending code review -- after which I might
have a prayer of strong-arming our chip vendor into porting their WiFi
driver onto -rt.  It's really a much more interesting scheduler use
case than make -j200 under X, because it's a best-effort
SCHED_BATCH-ish load that wants to be temporally clustered for power
management reasons.

(Believe it or not, a stable -rt branch with a clock-scaling-aware
scheduler is the one thing that might lead to this major WiFi vendor's
GPLing their driver core.  They're starting to see the light on the
biz dev side, and the nature of the devices their chip will go in
makes them somewhat less concerned about the regulatory fig leaf
aspect of a closed-source driver; but they would have to port off of
the third-party real-time executive embedded within the driver, and
mainline's task and timer granularity won't cut it.  I can't even get
more detail about _why_ it won't cut it unless there's some remotely
supportable -rt base they could port to.)

But I think SCHED_FIFO on a chain of tasks is fundamentally not the
right way to handle low audio latency.  The object with a low latency
requirement isn't the task, it's the device.  When it's starting to
get urgent to deliver more data to the device, the task that it's
waiting on should slide up the urgency scale; and if it's waiting on
something else, that something else should slide up the scale; and so
forth.  Similarly, responding to user input is urgent; so when user
input is available (by whatever mechanism), the task that's waiting
for it should slide up the urgency scale, etc.

In practice, you probably don't want to burden desktop Linux with
priority inheritance where you don't have to.  Priority queues with
algorithmically efficient decrease-key operations (Fibonacci heaps and
their ilk) are complicated to implement and have correspondingly high
constant factors.  (However, a sufficiently clever heuristic for
assigning quasi-static task priorities would usually short-circuit the
priority cascade; if you can keep N small in the
tasks-with-unpredictable-priority queue, you can probably use a
simpler flavor with O(log N) decrease-key.  Ask someone who knows more
about data structures than I do.)

More importantly, non-real-time application coders aren't very smart
about grouping data structure accesses on one side or the other of a
system call that is likely to release a lock and let something else
run, flushing application data out of cache.  (Kernel coders aren't
always smart about this either; see LKML threads a few weeks ago about
racy, and cache-stall-prone, f_pos handling in VFS.)  So switching
tasks immediately on lock release is usually the wrong thing to do if
letting the task run a little longer would allow it to reach a point
where it has to block anyway.

Anyway, I already described the urgency-driven strategy to the extent
that I've thought it out, elsewhere in this thread.  I only held this
draft back because I wanted to double-check my latency measurements.

Cheers,
- Michael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Linus Torvalds


On Thu, 19 Apr 2007, Ed Tomlinson wrote:
> > 
> > SD just doesn't do nearly as good as the stock scheduler, or CFS, here.
> > 
> > I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
> > If it should ever get more widely used I think we'd hear a lot more 
> > complaints.
> 
> amd64 UP here.  SD with several makes running works just fine.

The thing is, it probably depends *heavily* on just how much work the X 
server ends up doing. Fast video hardware? The X server doesn't need to 
busy-wait much. Not a lot of eye-candy? The X server is likely fast enough 
even with a slower card that it still gets sufficient CPU time and isn't 
getting dinged by any balancing. DRI vs non-DRI? Which window manager 
(maybe some of the user-visible lags come from there..) etc etc.

Anyway, I'd ask people to look a bit at the current *regressions* instead 
of spending all their time on something that won't even be merged before 
2.6.21 is released, and we thus have some mroe pressing issues. Please?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Ed Tomlinson
On Thursday 19 April 2007 12:15, Mark Lord wrote:
> Con Kolivas wrote:
> > On Thursday 19 April 2007 23:17, Mark Lord wrote:
> >> Con Kolivas wrote:
> >> s go ahead and think up great ideas for other ways of metering out cpu
> >>
> >>> bandwidth for different purposes, but for X, given the absurd simplicity
> >>> of renicing, why keep fighting it? Again I reiterate that most users of
> >>> SD have not found the need to renice X anyway except if they stick to old
> >>> habits of make -j4 on uniprocessor and the like, and I expect that those
> >>> on CFS and Nicksched would also have similar experiences.
> >> Just plain "make" (no -j2 or -j) is enough to kill interactivity
> >> on my 2GHz P-M single-core non-HT machine with SD.
> >>
> >> But with the very first posted version of CFS by Ingo,
> >> I can do "make -j2" no problem and still have a nicely interactive destop.
> > 
> > Cool. Then there's clearly a bug with SD that manifests on your machine as 
> > it 
> > should not have that effect at all (and doesn't on other people's 
> > machines). 
> > I suggest trying the latest version which fixes some bugs.
> 
> SD just doesn't do nearly as good as the stock scheduler, or CFS, here.
> 
> I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
> If it should ever get more widely used I think we'd hear a lot more 
> complaints.

amd64 UP here.  SD with several makes running works just fine.

Ed Tomlinson
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Ray Lee
Con Kolivas wrote:
> You're welcome and thanks for taking the floor to speak. I would say you have 
> actually agreed with me though. X is not unique, it's just an obvious so 
> let's not design the cpu scheduler around the problem with X. Same goes for 
> every other application. Leaving the choice to hand out differential cpu 
> usage when they seem to need is should be up to the users. The donation idea 
> has been done before in some fashion or other in things like "back-boost" 
> which Linus himself tried in 2.5.X days. It worked lovely till it did the 
> wrong thing and wreaked havoc.

 I know. I came to the party late, or I would have played with it back
then. Perhaps you could correct me, but it seems his back-boost didn't do
any dampening, which means the system could get into nasty capture scenarios,
where two processes bouncing messages back and forth could take over the
scheduler and starve out the rest. It seems pretty obvious in hind-sight
that something without exponential dampening would allow feedback loops.

Regardless, perhaps we are in agreement. I just don't like the idea of having
to guess how much work postgresql is going to be doing on my client processes'
behalf. Worse, I don't necessarily want it to have that -10 priority when
it's going and updating statistics or whatnot, or any other housekeeping
activity that shouldn't make a noticeable impact on the rest of the system.
Worst, I'm leery of the idea that if I get its nice level wrong, that I'm
going to be affecting the overall throughput of the server.

All of which are only hypothetical worries, granted.

Anyway, I'll shut up now. Thanks again for stickin' with it.

Ray
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Michael K. Edwards

On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:

The cpu scheduler core is a cpu bandwidth and latency
proportionator and should be nothing more or less.


Not really.  The CPU scheduler is (or ought to be) what electric
utilities call an economic dispatch mechanism -- a real-time
controller whose goal is to service competing demands cost-effectively
from a limited supply, without compromising system stability.

If you live in the 1960's, coal and nuclear (and a little bit of
fig-leaf hydro) are all you have, it takes you twelve hours to bring
plants on and off line, and there's no live operational control or
pricing signal between you and your customers.  So you're stuck
running your system at projected peak + operating margin, dumping
excess power as waste heat most of the time, and browning or blacking
people out willy-nilly when there's excess demand.  Maybe you get to
trade off shedding the loads with the worst transmission efficiency
against degrading the customers with the most tolerance for brownouts
(or the least regulatory clout).  That's life without modern economic
dispatch.

If you live in 2007, natural gas and (outside the US) better control
over nuclear plants give you more ability to ramp supply up and down
with demand on something like a 15-minute cycle.  Better yet, you can
store a little energy "in the grid" to smooth out instantaneous demand
fluctuations; if you're lucky, you also have enough fast-twitch hydro
(thanks, Canada!) that you can run your coal and lame-ass nuclear very
close to base load even when gas is expensive, and even pump water
back uphill when demand dips.  (Coal is nasty stuff and a worse
contributor by far to radiation exposure than nuclear generation; but
on current trends it's going to last a lot longer than oil and gas,
and it's a lot easier to stockpile next to the generator.)

Best of all, you have industrial customers who will trade you live
control (within limits) over when and how much power they take in
return for a lower price per unit energy.  Some of them will even dump
power back into the grid when you ask them to.  So now the biggest
challenge in making supply and demand meet (in the short term) is to
damp all the different ways that a control feedback path might result
in an oscillation -- or in runaway pricing.  Because there's always
some asshole greedhead who will gamble with system stability in order
to game the pricing mechanism.  Lots of 'em, if you're in California
and your legislature is so dumb, or so bought, that they let the
asshole greedheads design the whole system so they can game it to the
max.  (But that's a whole 'nother rant.)

Embedded systems are already in 2007, and the mainline Linux scheduler
frankly sucks on them, because it thinks it's back in the 1960's with
a fixed supply and captive demand, pissing away "CPU bandwidth" as
waste heat.  Not to say it's an easy problem; even academics with a
dozen publications in this area don't seem to be able to model energy
usage to the nearest big O, let alone design a stable economic
dispatch engine.  But it helps to acknowledge what the problem is:
even in a 1960's raised-floor screaming-air-conditioners
screw-the-power-bill machine room, you can't actually run a
half-decent CPU flat out any more without burning it to a crisp.

You can act ignorant and let the PMIC brown you out when it has to.
Or you can start coping in mainline the way that organizations big
enough (and smart enough) to feel the heat in their pocketbooks do in
their pet kernels.  (Boo on Google for not sharing, and props to IBM
for doing their damnedest.)  And guess what?  The system will actually
get simpler, and stabler, and faster, and easier to maintain, because
it'll be based on a real theory of operation with equations and things
instead of a bunch of opaque, undocumented shotgun heuristics.

This hypothetical economic-dispatch scheduler will still _have_
heuristics, of course -- you can't begin to model a modern CPU
accurately on-line.  But they will be contained in _data_ rather than
_code_, and issues of numerical stability will be separated cleanly
from the rule set.  You'll be able to characterize the rule set's
domain of stability, given a conservative set of assumptions about the
feedback paths in the system under control, with the sort of
techniques they teach in the engineering schools that none of us (me
included) seem to have attended.  (I went to school thinking I was
going to be a physicist.  Wishful thinking -- but I was young and
stupid.  What's your excuse?  ;-)

OK, it feels better to have that off my chest.  Apologies to those
readers -- doubtless the vast majority of LKML, including everyone
else in this thread -- for whom it's irrelevant, pseudo-learned
pontification with no patch attached.  And my sincere thanks to Ingo,
Con, and really everyone else CC'ed, without whom Linux wouldn't be as
good as it is (really quite good, all things considered) and wouldn't
contribute as much as it 

Re: Renice X for cpu schedulers

2007-04-19 Thread Con Kolivas
On Friday 20 April 2007 02:15, Mark Lord wrote:
> Con Kolivas wrote:
> > On Thursday 19 April 2007 23:17, Mark Lord wrote:
> >> Con Kolivas wrote:
> >> s go ahead and think up great ideas for other ways of metering out cpu
> >>
> >>> bandwidth for different purposes, but for X, given the absurd
> >>> simplicity of renicing, why keep fighting it? Again I reiterate that
> >>> most users of SD have not found the need to renice X anyway except if
> >>> they stick to old habits of make -j4 on uniprocessor and the like, and
> >>> I expect that those on CFS and Nicksched would also have similar
> >>> experiences.
> >>
> >> Just plain "make" (no -j2 or -j) is enough to kill interactivity
> >> on my 2GHz P-M single-core non-HT machine with SD.
> >>
> >> But with the very first posted version of CFS by Ingo,
> >> I can do "make -j2" no problem and still have a nicely interactive
> >> destop.
> >
> > Cool. Then there's clearly a bug with SD that manifests on your machine
> > as it should not have that effect at all (and doesn't on other people's
> > machines). I suggest trying the latest version which fixes some bugs.
>
> SD just doesn't do nearly as good as the stock scheduler, or CFS, here.
>
> I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
> If it should ever get more widely used I think we'd hear a lot more
> complaints.

You are not really one of the few. A lot of my own work is done on a single 
core pentium M 1.7Ghz laptop. I am not endowed with truckloads of hardware 
like all the paid developers are. I recall extreme frustration myself when a 
developer a few years ago (around 2002) said he couldn't reproduce poor 
behaviour on his 4GB ram 4 x Xeon machine. Even today if I add up every 
machine I have in my house and work at my disposal it doesn't amount to that 
many cpus and that much ram.

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Con Kolivas
On Friday 20 April 2007 05:26, Ray Lee wrote:
> On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> > The one fly in the ointment for
> > linux remains X. I am still, to this moment, completely and utterly
> > stunned at why everyone is trying to find increasingly complex unique
> > ways to manage X when all it needs is more cpu[1].
>
> [...and hence should be reniced]
>
> The problem is that X is not unique. There's postgresql, memcached,
> mysql, db2, a little embedded app I wrote... all of these perform work
> on behalf of another process. It's just most *noticeable* with X, as
> pretty much everyone is running that.
>
> If we had some way for the scheduler to decide to donate part of a
> client process's time slice to the server it just spoke to (with an
> exponential dampening factor -- take 50% from the client, give 25% to
> the server, toss the rest on the floor), that -- from my naive point
> of view -- would be a step toward fixing the underlying issue. Or I
> might be spouting crap, who knows.
>
> The problem is real, though, and not limited to X.
>
> While I have the floor, thank you, Con, for all your work.

You're welcome and thanks for taking the floor to speak. I would say you have 
actually agreed with me though. X is not unique, it's just an obvious so 
let's not design the cpu scheduler around the problem with X. Same goes for 
every other application. Leaving the choice to hand out differential cpu 
usage when they seem to need is should be up to the users. The donation idea 
has been done before in some fashion or other in things like "back-boost" 
which Linus himself tried in 2.5.X days. It worked lovely till it did the 
wrong thing and wreaked havoc. As is shown repeatedly, the workarounds and 
the tweaks and the bonuses and the decide on who to give advantage to, when 
done by the cpu scheduler, is also what is its undoing as it can't always get 
it right. The consequences of getting it wrong on the other hand are 
disastrous. The cpu scheduler core is a cpu bandwidth and latency 
proportionator and should be nothing more or less.

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Con Kolivas
On Friday 20 April 2007 04:16, Gene Heskett wrote:
> On Thursday 19 April 2007, Con Kolivas wrote:
>
> [and I snipped a good overview]
>
> >So yes go ahead and think up great ideas for other ways of metering out
> > cpu bandwidth for different purposes, but for X, given the absurd
> > simplicity of renicing, why keep fighting it? Again I reiterate that most
> > users of SD have not found the need to renice X anyway except if they
> > stick to old habits of make -j4 on uniprocessor and the like, and I
> > expect that those on CFS and Nicksched would also have similar
> > experiences.
>
> FWIW folks, I have never touched x's niceness, its running at the default
> -1 for all of my so-called 'tests', and I have another set to be rebooted
> to right now.  And yes, my kernel makeit script uses -j4 by default, and
> has used -j8 just for effects, which weren't all that different from what I
> expected in 'abusing' a UP system that way.  The system DID remain usable,
> not snappy, but usable.

Gene, you're agreeing with me. You've shown that you're very happy with a fair 
distribution of cpu and leaving X at nice 0.
>
> Having tried re-nicing X a while back, and having the rest of the system
> suffer in quite obvious ways for even 1 + or - from its default felt pretty
> bad from this users perspective.
>
> It is my considered opinion (yeah I know, I'm just a leaf in the hurricane
> of this list) that if X has to be re-niced from the 1 point advantage its
> had for ages, then something is basicly wrong with the overall scheduling,
> cpu or i/o, or both in combination.  FWIW I'm using cfq for i/o.

It's those who want X to have an unfair advantage that want it to do 
something "special". Your agreement that it works fine at nice 0 shows you 
don't want it to have an unfair advantage. Others who want it to have an 
unfair advantage _can_ renice it if they desire. But if the cpu scheduler 
gives X an unfair advantage within the kernel by default then you have _no_ 
choice. If you leave the choice up to userspace (renice or not) then both 
parties get their way. If you put it into the kernel only one party wins and 
there is no way for the Genes (and Cons) of this world to get it back.

Your opinion is as valuable as eveyone else's Gene. It is hard to get people 
to speak on as frightening a playground as the linux kernel mailing list so 
please do. 

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Michael K. Edwards

On 4/19/07, Gene Heskett <[EMAIL PROTECTED]> wrote:

Having tried re-nicing X a while back, and having the rest of the system
suffer in quite obvious ways for even 1 + or - from its default felt pretty
bad from this users perspective.

It is my considered opinion (yeah I know, I'm just a leaf in the hurricane of
this list) that if X has to be re-niced from the 1 point advantage its had
for ages, then something is basicly wrong with the overall scheduling, cpu or
i/o, or both in combination.  FWIW I'm using cfq for i/o.


I think I just realized why the X server is such a problem.  If it
gets preempted when it's not actually selecting/polling over a set of
fds that includes the input devices, the scheduler doesn't know that
it's a good candidate for scheduling when data arrives on those
devices.  (That's all that any of these dynamic priority heuristics
really seem to do -- weight the scheduler towards switching to
conspicuously I/O bound tasks when they become runnable, without the
forced preemption on lock release that would result from a true
priority inheritance mechanism.)

One way of looking at this is that "fairness-driven" scheduling is a
poor man's priority ceiling protocol for I/O bound workloads, with the
implicit priority of an fd or lock given by how desperately the reader
side needs more data in order to accomplish anything.  "Nice" on a
task is sort of an indirect way of boosting or dropping the base
priority of the fds it commonly waits on.  I recognize this is a
drastic oversimplification, and possibly even a misrepresentation of
the design _intent_; but I think it's fairly accurate in terms of the
design _effect_.

The event-driven, non-threaded design of the X server makes it
particularly vulnerable to "non-interactive behavior" penalties, which
is appropriate to the extent that it's an output device having trouble
keeping up with rendering -- in fact, that's exactly the throttling
mechanism you need in order to exert back-pressure on the X client.
(Trying to exert back-pressure over Linux's local domain sockets seems
to be like pushing on a rope, but that's a different problem.)  That
same event-driven design would prioritize input events just fine --
except the scheduler won't wake the task in order to deliver them,
because as far as it's concerned the X server is getting more than
enough I/O to keep it busy.  It's not only not blocked on the input
device, it isn't even selecting on it at the moment that its timeslice
expires -- so no amount of poor-man's PCP emulation is going to help.

What "more negative nice on the X server than on any CPU-bound
process" seems to do is to put the X server on a hair-trigger,
boosting its dynamic priority in a render-limited scenario (on some
graphics cards!) just enough to cancel the penalty for non-interactive
behavior.  It's forced to share _some_ CPU cycles, but nobody else is
allowed a long enough timeslice to keep the X server off the CPU (and
insensitive to input events) for long.  Not terribly efficient in
terms of context switch / cache eviction overhead, but certainly
friendlier to the PEBCAK (who is clearly putting totally inappropriate
load on a single-threaded CPU by running both a local X server and
non-SCHED_BATCH compute jobs) than a frozen mouse cursor.

So what's the right answer?  Not special-casing the X server, that's
for sure.  If this analysis is correct (and as of now it's pure
speculation), any event-driven application that does compute work
opportunistically in the absence of user interaction is vulnerable to
the same overzealous squelching.  I wouldn't design a new application
that way, of course -- user interaction belongs in a separate thread
on any UNIX-legacy system which assigns priorities to threads of
control instead of to patterns of activity.  But all sorts of Linux
applications have been designed to implicitly elevate artificial
throughput benchmarks over user responsiveness -- that has been the
UNIX way at least since SVR4, and Linux's history of expensive thread
switches prior to NPTL didn't help.

If you want responsiveness when the CPU is oversubscribed -- and I for
one do, which is one reason why I abandoned the Linux desktop once
both Microsoft and Apple figured out how to make hyperthreading work
in their favor -- you should probably think about how to get it
without rewriting half of userspace.  IMHO, dinking around with
"fairness", as if there were any relationship these days between UIDs
or process groups or any other control structure and the work that's
trying to flow through the system, is not going to get you there.

If this were my problem, I might start by attaching urgency to
behavior instead of to thread ID, which demands a scheduler queue
built around a data structure with a cheap decrease-key operation.
I'd figure out how to propagate this urgency not just along lock
chains but also along chains of fds that need flushing (or refilling)
-- even if the reader (or writer) got preempted for 

Re: Renice X for cpu schedulers

2007-04-19 Thread Ray Lee

On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:

The one fly in the ointment for
linux remains X. I am still, to this moment, completely and utterly stunned
at why everyone is trying to find increasingly complex unique ways to manage
X when all it needs is more cpu[1].

[...and hence should be reniced]

The problem is that X is not unique. There's postgresql, memcached,
mysql, db2, a little embedded app I wrote... all of these perform work
on behalf of another process. It's just most *noticeable* with X, as
pretty much everyone is running that.

If we had some way for the scheduler to decide to donate part of a
client process's time slice to the server it just spoke to (with an
exponential dampening factor -- take 50% from the client, give 25% to
the server, toss the rest on the floor), that -- from my naive point
of view -- would be a step toward fixing the underlying issue. Or I
might be spouting crap, who knows.

The problem is real, though, and not limited to X.

While I have the floor, thank you, Con, for all your work.

Ray
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Mark Lord wrote:
>Con Kolivas wrote:
>> On Thursday 19 April 2007 23:17, Mark Lord wrote:
>>> Con Kolivas wrote:
>>> s go ahead and think up great ideas for other ways of metering out cpu
>>>
 bandwidth for different purposes, but for X, given the absurd simplicity
 of renicing, why keep fighting it? Again I reiterate that most users of
 SD have not found the need to renice X anyway except if they stick to
 old habits of make -j4 on uniprocessor and the like, and I expect that
 those on CFS and Nicksched would also have similar experiences.
>>>
>>> Just plain "make" (no -j2 or -j) is enough to kill interactivity
>>> on my 2GHz P-M single-core non-HT machine with SD.
>>>
>>> But with the very first posted version of CFS by Ingo,
>>> I can do "make -j2" no problem and still have a nicely interactive
>>> destop.
>>
>> Cool. Then there's clearly a bug with SD that manifests on your machine as
>> it should not have that effect at all (and doesn't on other people's
>> machines). I suggest trying the latest version which fixes some bugs.
>
>SD just doesn't do nearly as good as the stock scheduler, or CFS, here.

I found the early SD's much friendlier here, but I also think that at that 
point I was comparing SD to stock 2.6.21-rc5 and 6, and to say that it sucked 
would be a slight understatement.

>I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
>If it should ever get more widely used I think we'd hear a lot more
> complaints.

I'm in that row of seats too Mark.  Someday I have to build a new box, that's 
all there is to it...

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Lots of folks confuse bad management with destiny.
-- Frank Hubbard
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Con Kolivas wrote:

[and I snipped a good overview]

>So yes go ahead and think up great ideas for other ways of metering out cpu
>bandwidth for different purposes, but for X, given the absurd simplicity of
>renicing, why keep fighting it? Again I reiterate that most users of SD have
>not found the need to renice X anyway except if they stick to old habits of
>make -j4 on uniprocessor and the like, and I expect that those on CFS and
>Nicksched would also have similar experiences.

FWIW folks, I have never touched x's niceness, its running at the default -1 
for all of my so-called 'tests', and I have another set to be rebooted to 
right now.  And yes, my kernel makeit script uses -j4 by default, and has 
used -j8 just for effects, which weren't all that different from what I 
expected in 'abusing' a UP system that way.  The system DID remain usable, 
not snappy, but usable.

Having tried re-nicing X a while back, and having the rest of the system 
suffer in quite obvious ways for even 1 + or - from its default felt pretty 
bad from this users perspective.

It is my considered opinion (yeah I know, I'm just a leaf in the hurricane of 
this list) that if X has to be re-niced from the 1 point advantage its had 
for ages, then something is basicly wrong with the overall scheduling, cpu or 
i/o, or both in combination.  FWIW I'm using cfq for i/o.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Moore's Constant:
Everybody sets out to do something, and everybody
does something, but no one does what he sets out to do.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Mark Lord

Con Kolivas wrote:

On Thursday 19 April 2007 23:17, Mark Lord wrote:

Con Kolivas wrote:
s go ahead and think up great ideas for other ways of metering out cpu


bandwidth for different purposes, but for X, given the absurd simplicity
of renicing, why keep fighting it? Again I reiterate that most users of
SD have not found the need to renice X anyway except if they stick to old
habits of make -j4 on uniprocessor and the like, and I expect that those
on CFS and Nicksched would also have similar experiences.

Just plain "make" (no -j2 or -j) is enough to kill interactivity
on my 2GHz P-M single-core non-HT machine with SD.

But with the very first posted version of CFS by Ingo,
I can do "make -j2" no problem and still have a nicely interactive destop.


Cool. Then there's clearly a bug with SD that manifests on your machine as it 
should not have that effect at all (and doesn't on other people's machines). 
I suggest trying the latest version which fixes some bugs.


SD just doesn't do nearly as good as the stock scheduler, or CFS, here.

I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
If it should ever get more widely used I think we'd hear a lot more complaints.

Cheers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Peter Williams

Con Kolivas wrote:
Ok, there are 3 known schedulers currently being "promoted" as solid 
replacements for the mainline scheduler which address most of the issues with 
mainline (and about 10 other ones not currently being promoted). The main way 
they do this is through attempting to maintain solid fairness. There is 
enough evidence mounting now from the numerous test cases fixed by much 
fairer designs that this is the way forward for a general purpose cpu 
scheduler which is what linux needs. 

Interactivity of just about everything that needs low latency (ie audio and 
video players) are easily managed by maintaining low latency between wakeups 
and scheduling of all these low cpu users.


On a "fair" scheduler these will all get high priority (and good 
response) because their CPU bandwidth usage will be much smaller than 
their entitlement and the scheduler will be trying to help them "catch 
up".  So (as you say) they shouldn't be a problem.


The one fly in the ointment for 
linux remains X. I am still, to this moment, completely and utterly stunned 
at why everyone is trying to find increasingly complex unique ways to manage 
X when all it needs is more cpu[1]. Now most of these are actually very good 
ideas about _extra_ features that would be desirable in the long run for 
linux, but given the ludicrous simplicity of renicing X I cannot fathom why 
people keep promoting these alternatives. At the time of 2.6.0 coming out we 
were desparately trying to get half decent interactivity within a reasonable 
time frame to release 2.6.0 without rewiring the whole scheduler. So I 
tweaked the crap out of the tunables that were already there[2].


X's needs are more complex than that (from my observations) in that the 
part of X that processes input doesn't use much CPU but the part that 
does output can be quite a heavy user of CPU (e.g. do a "ls -lR /" in an 
xterm and watch X chew up the CPU).  At the same time, the part of X 
that processes input needs quick responsiveness as it's part of the 
interactive chain where this is less so for the output part.


Where X comes unstuck in the current scheduler is that when the output 
part goes on one of its CPU storms it ceases to look like an interactive 
task and gets given lower priority.  Ironically, this doesn't effect the 
output part of X but it does effect the input part and is manifest as 
crappy interactive response.  One wonders whether modifying X so that it 
has two threads: one for output and one for input; that could be 
scheduled separately might help.  I guess it would depend on whether 
there is insufficient independence between the two halves.


Part of this issue is that giving X a high static priority runs the risk 
of the CPU hog output part disrupting scheduling of other important 
tasks.  So don't give it too big a boost.




So let's hear from the 3 people who generated the schedulers under the 
spotlight. These are recent snippets and by no means the only time these 
comments have been said. Without sounding too bold, we do know a thing or two 
about scheduling.


CFS:
On Thursday 19 April 2007 16:38, Ingo Molnar wrote:

h. How about the following then: default to nice -10 for all
(SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_
special: root already has disk space reserved to it, root has special
memory allocation allowances, etc. I dont see a reason why we couldnt by
default make all root tasks have nice -10. This would be instantly loved
by sysadmins i suspect ;-)


It's worth noting that the -10 mentioned is roughly equivalent (in the 
old scheduler) to restoring interactive task status to X in those cases 
where it loses it due to a CPU storm in its output part.




(distros that go the extra mile of making Xorg run under non-root could
also go another extra one foot to renice that X server to -10.)


Nicksched:
On Wednesday 18 April 2007 15:00, Nick Piggin wrote:

What's wrong with allowing X to get more than it's fair share of CPU
time by "fiddling with nice levels"? That's what they're there for.


and

Staircase-Deadline:
On Thursday 19 April 2007 09:59, Con Kolivas wrote:

Remember to renice X to -10 for nicest desktop behaviour :)


I'd like to add the EBS scheduler (posted by Aurema Pty Ltd a couple of 
years back) to this list as it also recommended running X at nice -5 to -10.


Also some of the "interactive bonus" mechanisms in my SPA schedulers 
could be removed if X was reniced.  In fact, with a reniced X the 
spa_svr (server oriented scheduler which attempts to minimise the time 
tasks spend on the queue waiting for CPU access and which doesn't have 
interactive bonuses) might be usable on a work station.





[1]The one caveat I can think of is that when you share X sessions across 
multiple users -with a fair cpu scheduler-, having them all nice 0 also makes 
the distribution of cpu across the multiple users very even and smooth, 
without the expense of burning away the other person's cpu 

Re: Renice X for cpu schedulers

2007-04-19 Thread Con Kolivas
On Thursday 19 April 2007 23:17, Mark Lord wrote:
> Con Kolivas wrote:
> s go ahead and think up great ideas for other ways of metering out cpu
>
> > bandwidth for different purposes, but for X, given the absurd simplicity
> > of renicing, why keep fighting it? Again I reiterate that most users of
> > SD have not found the need to renice X anyway except if they stick to old
> > habits of make -j4 on uniprocessor and the like, and I expect that those
> > on CFS and Nicksched would also have similar experiences.
>
> Just plain "make" (no -j2 or -j) is enough to kill interactivity
> on my 2GHz P-M single-core non-HT machine with SD.
>
> But with the very first posted version of CFS by Ingo,
> I can do "make -j2" no problem and still have a nicely interactive destop.

Cool. Then there's clearly a bug with SD that manifests on your machine as it 
should not have that effect at all (and doesn't on other people's machines). 
I suggest trying the latest version which fixes some bugs.

Thanks.

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Lee Revell

On 4/19/07, Peter Williams <[EMAIL PROTECTED]> wrote:

PS I think that the tasks most likely to be adversely effected by X's
CPU storms (enough to annoy the user) are audio streamers so when you're
doing tests to determine the best nice value for X I suggest that would
be a good criterion.  Video streamers are also susceptible but glitches
in video don't seem to annoy users as much as audio ones.


IMHO audio streamers should use SCHED_FIFO thread for time critical
work.  I think it's insane to expect the scheduler to figure out that
these processes need low latency when they can just be explicit about
it.  "Professional" audio software does it already, on Linux as well
as other OS...

Lee
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Peter Williams

Peter Williams wrote:

Con Kolivas wrote:
Ok, there are 3 known schedulers currently being "promoted" as solid 
replacements for the mainline scheduler which address most of the 
issues with mainline (and about 10 other ones not currently being 
promoted). The main way they do this is through attempting to maintain 
solid fairness. There is enough evidence mounting now from the 
numerous test cases fixed by much fairer designs that this is the way 
forward for a general purpose cpu scheduler which is what linux needs.
Interactivity of just about everything that needs low latency (ie 
audio and video players) are easily managed by maintaining low latency 
between wakeups and scheduling of all these low cpu users.


On a "fair" scheduler these will all get high priority (and good 
response) because their CPU bandwidth usage will be much smaller than 
their entitlement and the scheduler will be trying to help them "catch 
up".  So (as you say) they shouldn't be a problem.


The one fly in the ointment for linux remains X. I am still, to this 
moment, completely and utterly stunned at why everyone is trying to 
find increasingly complex unique ways to manage X when all it needs is 
more cpu[1]. Now most of these are actually very good ideas about 
_extra_ features that would be desirable in the long run for linux, 
but given the ludicrous simplicity of renicing X I cannot fathom why 
people keep promoting these alternatives. At the time of 2.6.0 coming 
out we were desparately trying to get half decent interactivity within 
a reasonable time frame to release 2.6.0 without rewiring the whole 
scheduler. So I tweaked the crap out of the tunables that were already 
there[2].


X's needs are more complex than that (from my observations) in that the 
part of X that processes input doesn't use much CPU but the part that 
does output can be quite a heavy user of CPU (e.g. do a "ls -lR /" in an 
xterm and watch X chew up the CPU).  At the same time, the part of X 
that processes input needs quick responsiveness as it's part of the 
interactive chain where this is less so for the output part.


Where X comes unstuck in the current scheduler is that when the output 
part goes on one of its CPU storms it ceases to look like an interactive 
task and gets given lower priority.  Ironically, this doesn't effect the 
output part of X but it does effect the input part and is manifest as 
crappy interactive response.  One wonders whether modifying X so that it 
has two threads: one for output and one for input; that could be 
scheduled separately might help.  I guess it would depend on whether 
there is insufficient independence between the two halves.


I forgot to make my point here and that was that if X could be split in 
two neither half would need to be reniced.  As a very low CPU bandwidth 
user the input half would get along just fine like the other interactive 
tasks that you mention.  And the output put part isn't adversely 
effected by not having a boost so it would get along just fine as well 
and you don't want it having a boost when it's in a CPU storm anyway.


Of course, if the interdependence between the two halves is such that 
the equivalent of priority inversion occurs between the two threads. 
However, that might be solved by making the division between the two 
halves on a dimension other than the input/output one.


Peter
PS I think that the tasks most likely to be adversely effected by X's 
CPU storms (enough to annoy the user) are audio streamers so when you're 
doing tests to determine the best nice value for X I suggest that would 
be a good criterion.  Video streamers are also susceptible but glitches 
in video don't seem to annoy users as much as audio ones.

--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Mark Lord

Con Kolivas wrote:
s go ahead and think up great ideas for other ways of metering out cpu 
bandwidth for different purposes, but for X, given the absurd simplicity of 
renicing, why keep fighting it? Again I reiterate that most users of SD have 
not found the need to renice X anyway except if they stick to old habits of 
make -j4 on uniprocessor and the like, and I expect that those on CFS and 
Nicksched would also have similar experiences.


Just plain "make" (no -j2 or -j) is enough to kill interactivity
on my 2GHz P-M single-core non-HT machine with SD.

But with the very first posted version of CFS by Ingo,
I can do "make -j2" no problem and still have a nicely interactive destop.

-ml
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Mark Lord

Con Kolivas wrote:
s go ahead and think up great ideas for other ways of metering out cpu 
bandwidth for different purposes, but for X, given the absurd simplicity of 
renicing, why keep fighting it? Again I reiterate that most users of SD have 
not found the need to renice X anyway except if they stick to old habits of 
make -j4 on uniprocessor and the like, and I expect that those on CFS and 
Nicksched would also have similar experiences.


Just plain make (no -j2 or -j) is enough to kill interactivity
on my 2GHz P-M single-core non-HT machine with SD.

But with the very first posted version of CFS by Ingo,
I can do make -j2 no problem and still have a nicely interactive destop.

-ml
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Peter Williams

Peter Williams wrote:

Con Kolivas wrote:
Ok, there are 3 known schedulers currently being promoted as solid 
replacements for the mainline scheduler which address most of the 
issues with mainline (and about 10 other ones not currently being 
promoted). The main way they do this is through attempting to maintain 
solid fairness. There is enough evidence mounting now from the 
numerous test cases fixed by much fairer designs that this is the way 
forward for a general purpose cpu scheduler which is what linux needs.
Interactivity of just about everything that needs low latency (ie 
audio and video players) are easily managed by maintaining low latency 
between wakeups and scheduling of all these low cpu users.


On a fair scheduler these will all get high priority (and good 
response) because their CPU bandwidth usage will be much smaller than 
their entitlement and the scheduler will be trying to help them catch 
up.  So (as you say) they shouldn't be a problem.


The one fly in the ointment for linux remains X. I am still, to this 
moment, completely and utterly stunned at why everyone is trying to 
find increasingly complex unique ways to manage X when all it needs is 
more cpu[1]. Now most of these are actually very good ideas about 
_extra_ features that would be desirable in the long run for linux, 
but given the ludicrous simplicity of renicing X I cannot fathom why 
people keep promoting these alternatives. At the time of 2.6.0 coming 
out we were desparately trying to get half decent interactivity within 
a reasonable time frame to release 2.6.0 without rewiring the whole 
scheduler. So I tweaked the crap out of the tunables that were already 
there[2].


X's needs are more complex than that (from my observations) in that the 
part of X that processes input doesn't use much CPU but the part that 
does output can be quite a heavy user of CPU (e.g. do a ls -lR / in an 
xterm and watch X chew up the CPU).  At the same time, the part of X 
that processes input needs quick responsiveness as it's part of the 
interactive chain where this is less so for the output part.


Where X comes unstuck in the current scheduler is that when the output 
part goes on one of its CPU storms it ceases to look like an interactive 
task and gets given lower priority.  Ironically, this doesn't effect the 
output part of X but it does effect the input part and is manifest as 
crappy interactive response.  One wonders whether modifying X so that it 
has two threads: one for output and one for input; that could be 
scheduled separately might help.  I guess it would depend on whether 
there is insufficient independence between the two halves.


I forgot to make my point here and that was that if X could be split in 
two neither half would need to be reniced.  As a very low CPU bandwidth 
user the input half would get along just fine like the other interactive 
tasks that you mention.  And the output put part isn't adversely 
effected by not having a boost so it would get along just fine as well 
and you don't want it having a boost when it's in a CPU storm anyway.


Of course, if the interdependence between the two halves is such that 
the equivalent of priority inversion occurs between the two threads. 
However, that might be solved by making the division between the two 
halves on a dimension other than the input/output one.


Peter
PS I think that the tasks most likely to be adversely effected by X's 
CPU storms (enough to annoy the user) are audio streamers so when you're 
doing tests to determine the best nice value for X I suggest that would 
be a good criterion.  Video streamers are also susceptible but glitches 
in video don't seem to annoy users as much as audio ones.

--
Peter Williams   [EMAIL PROTECTED]

Learning, n. The kind of ignorance distinguishing the studious.
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Lee Revell

On 4/19/07, Peter Williams [EMAIL PROTECTED] wrote:

PS I think that the tasks most likely to be adversely effected by X's
CPU storms (enough to annoy the user) are audio streamers so when you're
doing tests to determine the best nice value for X I suggest that would
be a good criterion.  Video streamers are also susceptible but glitches
in video don't seem to annoy users as much as audio ones.


IMHO audio streamers should use SCHED_FIFO thread for time critical
work.  I think it's insane to expect the scheduler to figure out that
these processes need low latency when they can just be explicit about
it.  Professional audio software does it already, on Linux as well
as other OS...

Lee
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Con Kolivas
On Thursday 19 April 2007 23:17, Mark Lord wrote:
 Con Kolivas wrote:
 s go ahead and think up great ideas for other ways of metering out cpu

  bandwidth for different purposes, but for X, given the absurd simplicity
  of renicing, why keep fighting it? Again I reiterate that most users of
  SD have not found the need to renice X anyway except if they stick to old
  habits of make -j4 on uniprocessor and the like, and I expect that those
  on CFS and Nicksched would also have similar experiences.

 Just plain make (no -j2 or -j) is enough to kill interactivity
 on my 2GHz P-M single-core non-HT machine with SD.

 But with the very first posted version of CFS by Ingo,
 I can do make -j2 no problem and still have a nicely interactive destop.

Cool. Then there's clearly a bug with SD that manifests on your machine as it 
should not have that effect at all (and doesn't on other people's machines). 
I suggest trying the latest version which fixes some bugs.

Thanks.

-- 
-ck
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Peter Williams

Con Kolivas wrote:
Ok, there are 3 known schedulers currently being promoted as solid 
replacements for the mainline scheduler which address most of the issues with 
mainline (and about 10 other ones not currently being promoted). The main way 
they do this is through attempting to maintain solid fairness. There is 
enough evidence mounting now from the numerous test cases fixed by much 
fairer designs that this is the way forward for a general purpose cpu 
scheduler which is what linux needs. 

Interactivity of just about everything that needs low latency (ie audio and 
video players) are easily managed by maintaining low latency between wakeups 
and scheduling of all these low cpu users.


On a fair scheduler these will all get high priority (and good 
response) because their CPU bandwidth usage will be much smaller than 
their entitlement and the scheduler will be trying to help them catch 
up.  So (as you say) they shouldn't be a problem.


The one fly in the ointment for 
linux remains X. I am still, to this moment, completely and utterly stunned 
at why everyone is trying to find increasingly complex unique ways to manage 
X when all it needs is more cpu[1]. Now most of these are actually very good 
ideas about _extra_ features that would be desirable in the long run for 
linux, but given the ludicrous simplicity of renicing X I cannot fathom why 
people keep promoting these alternatives. At the time of 2.6.0 coming out we 
were desparately trying to get half decent interactivity within a reasonable 
time frame to release 2.6.0 without rewiring the whole scheduler. So I 
tweaked the crap out of the tunables that were already there[2].


X's needs are more complex than that (from my observations) in that the 
part of X that processes input doesn't use much CPU but the part that 
does output can be quite a heavy user of CPU (e.g. do a ls -lR / in an 
xterm and watch X chew up the CPU).  At the same time, the part of X 
that processes input needs quick responsiveness as it's part of the 
interactive chain where this is less so for the output part.


Where X comes unstuck in the current scheduler is that when the output 
part goes on one of its CPU storms it ceases to look like an interactive 
task and gets given lower priority.  Ironically, this doesn't effect the 
output part of X but it does effect the input part and is manifest as 
crappy interactive response.  One wonders whether modifying X so that it 
has two threads: one for output and one for input; that could be 
scheduled separately might help.  I guess it would depend on whether 
there is insufficient independence between the two halves.


Part of this issue is that giving X a high static priority runs the risk 
of the CPU hog output part disrupting scheduling of other important 
tasks.  So don't give it too big a boost.




So let's hear from the 3 people who generated the schedulers under the 
spotlight. These are recent snippets and by no means the only time these 
comments have been said. Without sounding too bold, we do know a thing or two 
about scheduling.


CFS:
On Thursday 19 April 2007 16:38, Ingo Molnar wrote:

h. How about the following then: default to nice -10 for all
(SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_
special: root already has disk space reserved to it, root has special
memory allocation allowances, etc. I dont see a reason why we couldnt by
default make all root tasks have nice -10. This would be instantly loved
by sysadmins i suspect ;-)


It's worth noting that the -10 mentioned is roughly equivalent (in the 
old scheduler) to restoring interactive task status to X in those cases 
where it loses it due to a CPU storm in its output part.




(distros that go the extra mile of making Xorg run under non-root could
also go another extra one foot to renice that X server to -10.)


Nicksched:
On Wednesday 18 April 2007 15:00, Nick Piggin wrote:

What's wrong with allowing X to get more than it's fair share of CPU
time by fiddling with nice levels? That's what they're there for.


and

Staircase-Deadline:
On Thursday 19 April 2007 09:59, Con Kolivas wrote:

Remember to renice X to -10 for nicest desktop behaviour :)


I'd like to add the EBS scheduler (posted by Aurema Pty Ltd a couple of 
years back) to this list as it also recommended running X at nice -5 to -10.


Also some of the interactive bonus mechanisms in my SPA schedulers 
could be removed if X was reniced.  In fact, with a reniced X the 
spa_svr (server oriented scheduler which attempts to minimise the time 
tasks spend on the queue waiting for CPU access and which doesn't have 
interactive bonuses) might be usable on a work station.





[1]The one caveat I can think of is that when you share X sessions across 
multiple users -with a fair cpu scheduler-, having them all nice 0 also makes 
the distribution of cpu across the multiple users very even and smooth, 
without the expense of burning away the other person's cpu time they'd 

Re: Renice X for cpu schedulers

2007-04-19 Thread Mark Lord

Con Kolivas wrote:

On Thursday 19 April 2007 23:17, Mark Lord wrote:

Con Kolivas wrote:
s go ahead and think up great ideas for other ways of metering out cpu


bandwidth for different purposes, but for X, given the absurd simplicity
of renicing, why keep fighting it? Again I reiterate that most users of
SD have not found the need to renice X anyway except if they stick to old
habits of make -j4 on uniprocessor and the like, and I expect that those
on CFS and Nicksched would also have similar experiences.

Just plain make (no -j2 or -j) is enough to kill interactivity
on my 2GHz P-M single-core non-HT machine with SD.

But with the very first posted version of CFS by Ingo,
I can do make -j2 no problem and still have a nicely interactive destop.


Cool. Then there's clearly a bug with SD that manifests on your machine as it 
should not have that effect at all (and doesn't on other people's machines). 
I suggest trying the latest version which fixes some bugs.


SD just doesn't do nearly as good as the stock scheduler, or CFS, here.

I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
If it should ever get more widely used I think we'd hear a lot more complaints.

Cheers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Con Kolivas wrote:

[and I snipped a good overview]

So yes go ahead and think up great ideas for other ways of metering out cpu
bandwidth for different purposes, but for X, given the absurd simplicity of
renicing, why keep fighting it? Again I reiterate that most users of SD have
not found the need to renice X anyway except if they stick to old habits of
make -j4 on uniprocessor and the like, and I expect that those on CFS and
Nicksched would also have similar experiences.

FWIW folks, I have never touched x's niceness, its running at the default -1 
for all of my so-called 'tests', and I have another set to be rebooted to 
right now.  And yes, my kernel makeit script uses -j4 by default, and has 
used -j8 just for effects, which weren't all that different from what I 
expected in 'abusing' a UP system that way.  The system DID remain usable, 
not snappy, but usable.

Having tried re-nicing X a while back, and having the rest of the system 
suffer in quite obvious ways for even 1 + or - from its default felt pretty 
bad from this users perspective.

It is my considered opinion (yeah I know, I'm just a leaf in the hurricane of 
this list) that if X has to be re-niced from the 1 point advantage its had 
for ages, then something is basicly wrong with the overall scheduling, cpu or 
i/o, or both in combination.  FWIW I'm using cfq for i/o.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Moore's Constant:
Everybody sets out to do something, and everybody
does something, but no one does what he sets out to do.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Mark Lord wrote:
Con Kolivas wrote:
 On Thursday 19 April 2007 23:17, Mark Lord wrote:
 Con Kolivas wrote:
 s go ahead and think up great ideas for other ways of metering out cpu

 bandwidth for different purposes, but for X, given the absurd simplicity
 of renicing, why keep fighting it? Again I reiterate that most users of
 SD have not found the need to renice X anyway except if they stick to
 old habits of make -j4 on uniprocessor and the like, and I expect that
 those on CFS and Nicksched would also have similar experiences.

 Just plain make (no -j2 or -j) is enough to kill interactivity
 on my 2GHz P-M single-core non-HT machine with SD.

 But with the very first posted version of CFS by Ingo,
 I can do make -j2 no problem and still have a nicely interactive
 destop.

 Cool. Then there's clearly a bug with SD that manifests on your machine as
 it should not have that effect at all (and doesn't on other people's
 machines). I suggest trying the latest version which fixes some bugs.

SD just doesn't do nearly as good as the stock scheduler, or CFS, here.

I found the early SD's much friendlier here, but I also think that at that 
point I was comparing SD to stock 2.6.21-rc5 and 6, and to say that it sucked 
would be a slight understatement.

I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
If it should ever get more widely used I think we'd hear a lot more
 complaints.

I'm in that row of seats too Mark.  Someday I have to build a new box, that's 
all there is to it...

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Lots of folks confuse bad management with destiny.
-- Frank Hubbard
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Ray Lee

On 4/19/07, Con Kolivas [EMAIL PROTECTED] wrote:

The one fly in the ointment for
linux remains X. I am still, to this moment, completely and utterly stunned
at why everyone is trying to find increasingly complex unique ways to manage
X when all it needs is more cpu[1].

[...and hence should be reniced]

The problem is that X is not unique. There's postgresql, memcached,
mysql, db2, a little embedded app I wrote... all of these perform work
on behalf of another process. It's just most *noticeable* with X, as
pretty much everyone is running that.

If we had some way for the scheduler to decide to donate part of a
client process's time slice to the server it just spoke to (with an
exponential dampening factor -- take 50% from the client, give 25% to
the server, toss the rest on the floor), that -- from my naive point
of view -- would be a step toward fixing the underlying issue. Or I
might be spouting crap, who knows.

The problem is real, though, and not limited to X.

While I have the floor, thank you, Con, for all your work.

Ray
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Michael K. Edwards

On 4/19/07, Gene Heskett [EMAIL PROTECTED] wrote:

Having tried re-nicing X a while back, and having the rest of the system
suffer in quite obvious ways for even 1 + or - from its default felt pretty
bad from this users perspective.

It is my considered opinion (yeah I know, I'm just a leaf in the hurricane of
this list) that if X has to be re-niced from the 1 point advantage its had
for ages, then something is basicly wrong with the overall scheduling, cpu or
i/o, or both in combination.  FWIW I'm using cfq for i/o.


I think I just realized why the X server is such a problem.  If it
gets preempted when it's not actually selecting/polling over a set of
fds that includes the input devices, the scheduler doesn't know that
it's a good candidate for scheduling when data arrives on those
devices.  (That's all that any of these dynamic priority heuristics
really seem to do -- weight the scheduler towards switching to
conspicuously I/O bound tasks when they become runnable, without the
forced preemption on lock release that would result from a true
priority inheritance mechanism.)

One way of looking at this is that fairness-driven scheduling is a
poor man's priority ceiling protocol for I/O bound workloads, with the
implicit priority of an fd or lock given by how desperately the reader
side needs more data in order to accomplish anything.  Nice on a
task is sort of an indirect way of boosting or dropping the base
priority of the fds it commonly waits on.  I recognize this is a
drastic oversimplification, and possibly even a misrepresentation of
the design _intent_; but I think it's fairly accurate in terms of the
design _effect_.

The event-driven, non-threaded design of the X server makes it
particularly vulnerable to non-interactive behavior penalties, which
is appropriate to the extent that it's an output device having trouble
keeping up with rendering -- in fact, that's exactly the throttling
mechanism you need in order to exert back-pressure on the X client.
(Trying to exert back-pressure over Linux's local domain sockets seems
to be like pushing on a rope, but that's a different problem.)  That
same event-driven design would prioritize input events just fine --
except the scheduler won't wake the task in order to deliver them,
because as far as it's concerned the X server is getting more than
enough I/O to keep it busy.  It's not only not blocked on the input
device, it isn't even selecting on it at the moment that its timeslice
expires -- so no amount of poor-man's PCP emulation is going to help.

What more negative nice on the X server than on any CPU-bound
process seems to do is to put the X server on a hair-trigger,
boosting its dynamic priority in a render-limited scenario (on some
graphics cards!) just enough to cancel the penalty for non-interactive
behavior.  It's forced to share _some_ CPU cycles, but nobody else is
allowed a long enough timeslice to keep the X server off the CPU (and
insensitive to input events) for long.  Not terribly efficient in
terms of context switch / cache eviction overhead, but certainly
friendlier to the PEBCAK (who is clearly putting totally inappropriate
load on a single-threaded CPU by running both a local X server and
non-SCHED_BATCH compute jobs) than a frozen mouse cursor.

So what's the right answer?  Not special-casing the X server, that's
for sure.  If this analysis is correct (and as of now it's pure
speculation), any event-driven application that does compute work
opportunistically in the absence of user interaction is vulnerable to
the same overzealous squelching.  I wouldn't design a new application
that way, of course -- user interaction belongs in a separate thread
on any UNIX-legacy system which assigns priorities to threads of
control instead of to patterns of activity.  But all sorts of Linux
applications have been designed to implicitly elevate artificial
throughput benchmarks over user responsiveness -- that has been the
UNIX way at least since SVR4, and Linux's history of expensive thread
switches prior to NPTL didn't help.

If you want responsiveness when the CPU is oversubscribed -- and I for
one do, which is one reason why I abandoned the Linux desktop once
both Microsoft and Apple figured out how to make hyperthreading work
in their favor -- you should probably think about how to get it
without rewriting half of userspace.  IMHO, dinking around with
fairness, as if there were any relationship these days between UIDs
or process groups or any other control structure and the work that's
trying to flow through the system, is not going to get you there.

If this were my problem, I might start by attaching urgency to
behavior instead of to thread ID, which demands a scheduler queue
built around a data structure with a cheap decrease-key operation.
I'd figure out how to propagate this urgency not just along lock
chains but also along chains of fds that need flushing (or refilling)
-- even if the reader (or writer) got preempted for unrelated reasons.

Re: Renice X for cpu schedulers

2007-04-19 Thread Con Kolivas
On Friday 20 April 2007 04:16, Gene Heskett wrote:
 On Thursday 19 April 2007, Con Kolivas wrote:

 [and I snipped a good overview]

 So yes go ahead and think up great ideas for other ways of metering out
  cpu bandwidth for different purposes, but for X, given the absurd
  simplicity of renicing, why keep fighting it? Again I reiterate that most
  users of SD have not found the need to renice X anyway except if they
  stick to old habits of make -j4 on uniprocessor and the like, and I
  expect that those on CFS and Nicksched would also have similar
  experiences.

 FWIW folks, I have never touched x's niceness, its running at the default
 -1 for all of my so-called 'tests', and I have another set to be rebooted
 to right now.  And yes, my kernel makeit script uses -j4 by default, and
 has used -j8 just for effects, which weren't all that different from what I
 expected in 'abusing' a UP system that way.  The system DID remain usable,
 not snappy, but usable.

Gene, you're agreeing with me. You've shown that you're very happy with a fair 
distribution of cpu and leaving X at nice 0.

 Having tried re-nicing X a while back, and having the rest of the system
 suffer in quite obvious ways for even 1 + or - from its default felt pretty
 bad from this users perspective.

 It is my considered opinion (yeah I know, I'm just a leaf in the hurricane
 of this list) that if X has to be re-niced from the 1 point advantage its
 had for ages, then something is basicly wrong with the overall scheduling,
 cpu or i/o, or both in combination.  FWIW I'm using cfq for i/o.

It's those who want X to have an unfair advantage that want it to do 
something special. Your agreement that it works fine at nice 0 shows you 
don't want it to have an unfair advantage. Others who want it to have an 
unfair advantage _can_ renice it if they desire. But if the cpu scheduler 
gives X an unfair advantage within the kernel by default then you have _no_ 
choice. If you leave the choice up to userspace (renice or not) then both 
parties get their way. If you put it into the kernel only one party wins and 
there is no way for the Genes (and Cons) of this world to get it back.

Your opinion is as valuable as eveyone else's Gene. It is hard to get people 
to speak on as frightening a playground as the linux kernel mailing list so 
please do. 

-- 
-ck
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Con Kolivas
On Friday 20 April 2007 05:26, Ray Lee wrote:
 On 4/19/07, Con Kolivas [EMAIL PROTECTED] wrote:
  The one fly in the ointment for
  linux remains X. I am still, to this moment, completely and utterly
  stunned at why everyone is trying to find increasingly complex unique
  ways to manage X when all it needs is more cpu[1].

 [...and hence should be reniced]

 The problem is that X is not unique. There's postgresql, memcached,
 mysql, db2, a little embedded app I wrote... all of these perform work
 on behalf of another process. It's just most *noticeable* with X, as
 pretty much everyone is running that.

 If we had some way for the scheduler to decide to donate part of a
 client process's time slice to the server it just spoke to (with an
 exponential dampening factor -- take 50% from the client, give 25% to
 the server, toss the rest on the floor), that -- from my naive point
 of view -- would be a step toward fixing the underlying issue. Or I
 might be spouting crap, who knows.

 The problem is real, though, and not limited to X.

 While I have the floor, thank you, Con, for all your work.

You're welcome and thanks for taking the floor to speak. I would say you have 
actually agreed with me though. X is not unique, it's just an obvious so 
let's not design the cpu scheduler around the problem with X. Same goes for 
every other application. Leaving the choice to hand out differential cpu 
usage when they seem to need is should be up to the users. The donation idea 
has been done before in some fashion or other in things like back-boost 
which Linus himself tried in 2.5.X days. It worked lovely till it did the 
wrong thing and wreaked havoc. As is shown repeatedly, the workarounds and 
the tweaks and the bonuses and the decide on who to give advantage to, when 
done by the cpu scheduler, is also what is its undoing as it can't always get 
it right. The consequences of getting it wrong on the other hand are 
disastrous. The cpu scheduler core is a cpu bandwidth and latency 
proportionator and should be nothing more or less.

-- 
-ck
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Con Kolivas
On Friday 20 April 2007 02:15, Mark Lord wrote:
 Con Kolivas wrote:
  On Thursday 19 April 2007 23:17, Mark Lord wrote:
  Con Kolivas wrote:
  s go ahead and think up great ideas for other ways of metering out cpu
 
  bandwidth for different purposes, but for X, given the absurd
  simplicity of renicing, why keep fighting it? Again I reiterate that
  most users of SD have not found the need to renice X anyway except if
  they stick to old habits of make -j4 on uniprocessor and the like, and
  I expect that those on CFS and Nicksched would also have similar
  experiences.
 
  Just plain make (no -j2 or -j) is enough to kill interactivity
  on my 2GHz P-M single-core non-HT machine with SD.
 
  But with the very first posted version of CFS by Ingo,
  I can do make -j2 no problem and still have a nicely interactive
  destop.
 
  Cool. Then there's clearly a bug with SD that manifests on your machine
  as it should not have that effect at all (and doesn't on other people's
  machines). I suggest trying the latest version which fixes some bugs.

 SD just doesn't do nearly as good as the stock scheduler, or CFS, here.

 I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
 If it should ever get more widely used I think we'd hear a lot more
 complaints.

You are not really one of the few. A lot of my own work is done on a single 
core pentium M 1.7Ghz laptop. I am not endowed with truckloads of hardware 
like all the paid developers are. I recall extreme frustration myself when a 
developer a few years ago (around 2002) said he couldn't reproduce poor 
behaviour on his 4GB ram 4 x Xeon machine. Even today if I add up every 
machine I have in my house and work at my disposal it doesn't amount to that 
many cpus and that much ram.

-- 
-ck
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Michael K. Edwards

On 4/19/07, Con Kolivas [EMAIL PROTECTED] wrote:

The cpu scheduler core is a cpu bandwidth and latency
proportionator and should be nothing more or less.


Not really.  The CPU scheduler is (or ought to be) what electric
utilities call an economic dispatch mechanism -- a real-time
controller whose goal is to service competing demands cost-effectively
from a limited supply, without compromising system stability.

If you live in the 1960's, coal and nuclear (and a little bit of
fig-leaf hydro) are all you have, it takes you twelve hours to bring
plants on and off line, and there's no live operational control or
pricing signal between you and your customers.  So you're stuck
running your system at projected peak + operating margin, dumping
excess power as waste heat most of the time, and browning or blacking
people out willy-nilly when there's excess demand.  Maybe you get to
trade off shedding the loads with the worst transmission efficiency
against degrading the customers with the most tolerance for brownouts
(or the least regulatory clout).  That's life without modern economic
dispatch.

If you live in 2007, natural gas and (outside the US) better control
over nuclear plants give you more ability to ramp supply up and down
with demand on something like a 15-minute cycle.  Better yet, you can
store a little energy in the grid to smooth out instantaneous demand
fluctuations; if you're lucky, you also have enough fast-twitch hydro
(thanks, Canada!) that you can run your coal and lame-ass nuclear very
close to base load even when gas is expensive, and even pump water
back uphill when demand dips.  (Coal is nasty stuff and a worse
contributor by far to radiation exposure than nuclear generation; but
on current trends it's going to last a lot longer than oil and gas,
and it's a lot easier to stockpile next to the generator.)

Best of all, you have industrial customers who will trade you live
control (within limits) over when and how much power they take in
return for a lower price per unit energy.  Some of them will even dump
power back into the grid when you ask them to.  So now the biggest
challenge in making supply and demand meet (in the short term) is to
damp all the different ways that a control feedback path might result
in an oscillation -- or in runaway pricing.  Because there's always
some asshole greedhead who will gamble with system stability in order
to game the pricing mechanism.  Lots of 'em, if you're in California
and your legislature is so dumb, or so bought, that they let the
asshole greedheads design the whole system so they can game it to the
max.  (But that's a whole 'nother rant.)

Embedded systems are already in 2007, and the mainline Linux scheduler
frankly sucks on them, because it thinks it's back in the 1960's with
a fixed supply and captive demand, pissing away CPU bandwidth as
waste heat.  Not to say it's an easy problem; even academics with a
dozen publications in this area don't seem to be able to model energy
usage to the nearest big O, let alone design a stable economic
dispatch engine.  But it helps to acknowledge what the problem is:
even in a 1960's raised-floor screaming-air-conditioners
screw-the-power-bill machine room, you can't actually run a
half-decent CPU flat out any more without burning it to a crisp.

You can act ignorant and let the PMIC brown you out when it has to.
Or you can start coping in mainline the way that organizations big
enough (and smart enough) to feel the heat in their pocketbooks do in
their pet kernels.  (Boo on Google for not sharing, and props to IBM
for doing their damnedest.)  And guess what?  The system will actually
get simpler, and stabler, and faster, and easier to maintain, because
it'll be based on a real theory of operation with equations and things
instead of a bunch of opaque, undocumented shotgun heuristics.

This hypothetical economic-dispatch scheduler will still _have_
heuristics, of course -- you can't begin to model a modern CPU
accurately on-line.  But they will be contained in _data_ rather than
_code_, and issues of numerical stability will be separated cleanly
from the rule set.  You'll be able to characterize the rule set's
domain of stability, given a conservative set of assumptions about the
feedback paths in the system under control, with the sort of
techniques they teach in the engineering schools that none of us (me
included) seem to have attended.  (I went to school thinking I was
going to be a physicist.  Wishful thinking -- but I was young and
stupid.  What's your excuse?  ;-)

OK, it feels better to have that off my chest.  Apologies to those
readers -- doubtless the vast majority of LKML, including everyone
else in this thread -- for whom it's irrelevant, pseudo-learned
pontification with no patch attached.  And my sincere thanks to Ingo,
Con, and really everyone else CC'ed, without whom Linux wouldn't be as
good as it is (really quite good, all things considered) and wouldn't
contribute as much as it does to 

Re: Renice X for cpu schedulers

2007-04-19 Thread Ray Lee
Con Kolivas wrote:
 You're welcome and thanks for taking the floor to speak. I would say you have 
 actually agreed with me though. X is not unique, it's just an obvious so 
 let's not design the cpu scheduler around the problem with X. Same goes for 
 every other application. Leaving the choice to hand out differential cpu 
 usage when they seem to need is should be up to the users. The donation idea 
 has been done before in some fashion or other in things like back-boost 
 which Linus himself tried in 2.5.X days. It worked lovely till it did the 
 wrong thing and wreaked havoc.

nod I know. I came to the party late, or I would have played with it back
then. Perhaps you could correct me, but it seems his back-boost didn't do
any dampening, which means the system could get into nasty capture scenarios,
where two processes bouncing messages back and forth could take over the
scheduler and starve out the rest. It seems pretty obvious in hind-sight
that something without exponential dampening would allow feedback loops.

Regardless, perhaps we are in agreement. I just don't like the idea of having
to guess how much work postgresql is going to be doing on my client processes'
behalf. Worse, I don't necessarily want it to have that -10 priority when
it's going and updating statistics or whatnot, or any other housekeeping
activity that shouldn't make a noticeable impact on the rest of the system.
Worst, I'm leery of the idea that if I get its nice level wrong, that I'm
going to be affecting the overall throughput of the server.

All of which are only hypothetical worries, granted.

Anyway, I'll shut up now. Thanks again for stickin' with it.

Ray
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Ed Tomlinson
On Thursday 19 April 2007 12:15, Mark Lord wrote:
 Con Kolivas wrote:
  On Thursday 19 April 2007 23:17, Mark Lord wrote:
  Con Kolivas wrote:
  s go ahead and think up great ideas for other ways of metering out cpu
 
  bandwidth for different purposes, but for X, given the absurd simplicity
  of renicing, why keep fighting it? Again I reiterate that most users of
  SD have not found the need to renice X anyway except if they stick to old
  habits of make -j4 on uniprocessor and the like, and I expect that those
  on CFS and Nicksched would also have similar experiences.
  Just plain make (no -j2 or -j) is enough to kill interactivity
  on my 2GHz P-M single-core non-HT machine with SD.
 
  But with the very first posted version of CFS by Ingo,
  I can do make -j2 no problem and still have a nicely interactive destop.
  
  Cool. Then there's clearly a bug with SD that manifests on your machine as 
  it 
  should not have that effect at all (and doesn't on other people's 
  machines). 
  I suggest trying the latest version which fixes some bugs.
 
 SD just doesn't do nearly as good as the stock scheduler, or CFS, here.
 
 I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
 If it should ever get more widely used I think we'd hear a lot more 
 complaints.

amd64 UP here.  SD with several makes running works just fine.

Ed Tomlinson
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Linus Torvalds


On Thu, 19 Apr 2007, Ed Tomlinson wrote:
  
  SD just doesn't do nearly as good as the stock scheduler, or CFS, here.
  
  I'm quite likely one of the few single-CPU/non-HT testers of this stuff.
  If it should ever get more widely used I think we'd hear a lot more 
  complaints.
 
 amd64 UP here.  SD with several makes running works just fine.

The thing is, it probably depends *heavily* on just how much work the X 
server ends up doing. Fast video hardware? The X server doesn't need to 
busy-wait much. Not a lot of eye-candy? The X server is likely fast enough 
even with a slower card that it still gets sufficient CPU time and isn't 
getting dinged by any balancing. DRI vs non-DRI? Which window manager 
(maybe some of the user-visible lags come from there..) etc etc.

Anyway, I'd ask people to look a bit at the current *regressions* instead 
of spending all their time on something that won't even be merged before 
2.6.21 is released, and we thus have some mroe pressing issues. Please?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Michael K. Edwards

On 4/19/07, Lee Revell [EMAIL PROTECTED] wrote:

IMHO audio streamers should use SCHED_FIFO thread for time critical
work.  I think it's insane to expect the scheduler to figure out that
these processes need low latency when they can just be explicit about
it.  Professional audio software does it already, on Linux as well
as other OS...


It is certainly true that SCHED_FIFO is currently necessary in the
layers of an audio application lying closest to the hardware, if you
don't want to throw a monstrous hardware ring buffer at the problem.
See the alsa-devel archives for a patch to aplay (sched_setscheduler
plus some cleanups) that converts it from unsafe at any speed (on a
non-RT kernel) to a rock-solid 18ms round trip from PCM in to PCM out.
(The hardware and driver aren't terribly exotic for an SoC, and the
measurement was done with aplay -C | aplay -P -- on a
not-particularly-tuned CONFIG_PREEMPT kernel with a 12ms+ peak
scheduling latency according to cyclictest.  A similar test via
/dev/dsp, done through a slightly modified OSS emulation layer to the
same driver, measures at 40ms and is probably tuned too
conservatively.)

Note that SCHED_FIFO may be less necessary on an -rt kernel, but I
haven't had that option on the embedded hardware I've been working
with lately.  Ingo, please please pretty please pick a -stable branch
one of these days and provide a git repo with -rt integrated against
that branch.  Then I could port our chip support to it -- all of which
will be GPLed after the impending code review -- after which I might
have a prayer of strong-arming our chip vendor into porting their WiFi
driver onto -rt.  It's really a much more interesting scheduler use
case than make -j200 under X, because it's a best-effort
SCHED_BATCH-ish load that wants to be temporally clustered for power
management reasons.

(Believe it or not, a stable -rt branch with a clock-scaling-aware
scheduler is the one thing that might lead to this major WiFi vendor's
GPLing their driver core.  They're starting to see the light on the
biz dev side, and the nature of the devices their chip will go in
makes them somewhat less concerned about the regulatory fig leaf
aspect of a closed-source driver; but they would have to port off of
the third-party real-time executive embedded within the driver, and
mainline's task and timer granularity won't cut it.  I can't even get
more detail about _why_ it won't cut it unless there's some remotely
supportable -rt base they could port to.)

But I think SCHED_FIFO on a chain of tasks is fundamentally not the
right way to handle low audio latency.  The object with a low latency
requirement isn't the task, it's the device.  When it's starting to
get urgent to deliver more data to the device, the task that it's
waiting on should slide up the urgency scale; and if it's waiting on
something else, that something else should slide up the scale; and so
forth.  Similarly, responding to user input is urgent; so when user
input is available (by whatever mechanism), the task that's waiting
for it should slide up the urgency scale, etc.

In practice, you probably don't want to burden desktop Linux with
priority inheritance where you don't have to.  Priority queues with
algorithmically efficient decrease-key operations (Fibonacci heaps and
their ilk) are complicated to implement and have correspondingly high
constant factors.  (However, a sufficiently clever heuristic for
assigning quasi-static task priorities would usually short-circuit the
priority cascade; if you can keep N small in the
tasks-with-unpredictable-priority queue, you can probably use a
simpler flavor with O(log N) decrease-key.  Ask someone who knows more
about data structures than I do.)

More importantly, non-real-time application coders aren't very smart
about grouping data structure accesses on one side or the other of a
system call that is likely to release a lock and let something else
run, flushing application data out of cache.  (Kernel coders aren't
always smart about this either; see LKML threads a few weeks ago about
racy, and cache-stall-prone, f_pos handling in VFS.)  So switching
tasks immediately on lock release is usually the wrong thing to do if
letting the task run a little longer would allow it to reach a point
where it has to block anyway.

Anyway, I already described the urgency-driven strategy to the extent
that I've thought it out, elsewhere in this thread.  I only held this
draft back because I wanted to double-check my latency measurements.

Cheers,
- Michael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Con Kolivas wrote:
On Friday 20 April 2007 04:16, Gene Heskett wrote:
 On Thursday 19 April 2007, Con Kolivas wrote:

 [and I snipped a good overview]

 So yes go ahead and think up great ideas for other ways of metering out
  cpu bandwidth for different purposes, but for X, given the absurd
  simplicity of renicing, why keep fighting it? Again I reiterate that
  most users of SD have not found the need to renice X anyway except if
  they stick to old habits of make -j4 on uniprocessor and the like, and I
  expect that those on CFS and Nicksched would also have similar
  experiences.

 FWIW folks, I have never touched x's niceness, its running at the default
 -1 for all of my so-called 'tests', and I have another set to be rebooted
 to right now.  And yes, my kernel makeit script uses -j4 by default, and
 has used -j8 just for effects, which weren't all that different from what
 I expected in 'abusing' a UP system that way.  The system DID remain
 usable, not snappy, but usable.

Gene, you're agreeing with me. You've shown that you're very happy with a
 fair distribution of cpu and leaving X at nice 0.

I was quite happy till Ingo's first patch came out, and it was even better, 
but I over-wrote it, and we're still figuring out just exactly what the magic 
twanger was that made it all click for me.  OTOH, I don't think that patch 
passed muster with Mike G., either.  We have obviously different workloads, 
and critical points in them.

 Having tried re-nicing X a while back, and having the rest of the system
 suffer in quite obvious ways for even 1 + or - from its default felt
 pretty bad from this users perspective.

 It is my considered opinion (yeah I know, I'm just a leaf in the hurricane
 of this list) that if X has to be re-niced from the 1 point advantage its
 had for ages, then something is basicly wrong with the overall scheduling,
 cpu or i/o, or both in combination.  FWIW I'm using cfq for i/o.

It's those who want X to have an unfair advantage that want it to do
something special. Your agreement that it works fine at nice 0 shows you
don't want it to have an unfair advantage. Others who want it to have an
unfair advantage _can_ renice it if they desire. But if the cpu scheduler
gives X an unfair advantage within the kernel by default then you have _no_
choice. If you leave the choice up to userspace (renice or not) then both
parties get their way. If you put it into the kernel only one party wins and
there is no way for the Genes (and Cons) of this world to get it back.

Your opinion is as valuable as eveyone else's Gene. It is hard to get people
to speak on as frightening a playground as the linux kernel mailing list so
please do.

In the FWIW category, htop has always told me that x is running at -1, not 
zero.  Now, I have NDI where this is actually set at, so I'd have to ask 
stupid questions here if I did wanna play with it.  Which I really don't, the 
last time I tried to -5 x, kde got a whole lot LESS responsive.  But heck, 
2.6.2 was freshly minted then too and I've long since forgot how I went about 
that unless I used htop to change it, the most likely scenario that I can 
picture at this late date. 

As for speaking my mind, yes, and I've been slapped down a few times, as much 
because I do a lot of bitching and microscopic amounts of patch submission. 
The only patch I ever submitted was for something in the floppy driver, way 
back in the middle of 2.2 days, rejected because I didn't know how to use the 
tools correctly.  I didn't, so it was a shrug and my feelings weren't hurt.

Some see that as an unbalanced set of books and I'm aware of it.  OTOH, I 
think I do a pretty good job of playing the canary here, and that should be 
worth something if for no other reason than I can turn into a burr under 
somebodies saddle when things go all aglay.  But I figure if its happening to 
me, then if I don't fuss, and that gotcha gets into a distro kernel, there 
are gonna be a hell of a lot more folks than me trying to grab the 
microphone.

BTW, I'm glad you are feeling well enough to get into this again.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
There cannot be a crisis next week.  My schedule is already full.
-- Henry Kissinger
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Con Kolivas wrote:
On Friday 20 April 2007 04:16, Gene Heskett wrote:
 On Thursday 19 April 2007, Con Kolivas wrote:

 [and I snipped a good overview]

 So yes go ahead and think up great ideas for other ways of metering out
  cpu bandwidth for different purposes, but for X, given the absurd
  simplicity of renicing, why keep fighting it? Again I reiterate that
  most users of SD have not found the need to renice X anyway except if
  they stick to old habits of make -j4 on uniprocessor and the like, and I
  expect that those on CFS and Nicksched would also have similar
  experiences.

 FWIW folks, I have never touched x's niceness, its running at the default
 -1 for all of my so-called 'tests', and I have another set to be rebooted
 to right now.  And yes, my kernel makeit script uses -j4 by default, and
 has used -j8 just for effects, which weren't all that different from what
 I expected in 'abusing' a UP system that way.  The system DID remain
 usable, not snappy, but usable.

Gene, you're agreeing with me. You've shown that you're very happy with a
 fair distribution of cpu and leaving X at nice 0.

I was quite happy till Ingo's first patch came out, and it was even better, 
but I over-wrote it, and we're still figuring out just exactly what the magic 
twanger was that made it all click for me.  OTOH, I don't think that patch 
passed muster with Mike G., either.  We have obviously different workloads, 
and critical points in them.

 Having tried re-nicing X a while back, and having the rest of the system
 suffer in quite obvious ways for even 1 + or - from its default felt
 pretty bad from this users perspective.

 It is my considered opinion (yeah I know, I'm just a leaf in the hurricane
 of this list) that if X has to be re-niced from the 1 point advantage its
 had for ages, then something is basicly wrong with the overall scheduling,
 cpu or i/o, or both in combination.  FWIW I'm using cfq for i/o.

It's those who want X to have an unfair advantage that want it to do
something special. Your agreement that it works fine at nice 0 shows you
don't want it to have an unfair advantage. Others who want it to have an
unfair advantage _can_ renice it if they desire. But if the cpu scheduler
gives X an unfair advantage within the kernel by default then you have _no_
choice. If you leave the choice up to userspace (renice or not) then both
parties get their way. If you put it into the kernel only one party wins and
there is no way for the Genes (and Cons) of this world to get it back.

Your opinion is as valuable as eveyone else's Gene. It is hard to get people
to speak on as frightening a playground as the linux kernel mailing list so
please do.

In the FWIW category, htop has always told me that x is running at -1, not 
zero.  Now, I have NDI where this is actually set at, so I'd have to ask 
stupid questions here if I did wanna play with it.  Which I really don't, the 
last time I tried to -5 x, kde got a whole lot LESS responsive.  But heck, 
2.6.2 was freshly minted then too and I've long since forgot how I went about 
that unless I used htop to change it, the most likely scenario that I can 
picture at this late date. 

As for speaking my mind, yes, and I've been slapped down a few times, as much 
because I do a lot of bitching and microscopic amounts of patch submission. 
The only patch I ever submitted was for something in the floppy driver, way 
back in the middle of 2.2 days, rejected because I didn't know how to use the 
tools correctly.  I didn't, so it was a shrug and my feelings weren't hurt.

Some see that as an unbalanced set of books and I'm aware of it.  OTOH, I 
think I do a pretty good job of playing the canary here, and that should be 
worth something if for no other reason than I can turn into a burr under 
somebodies saddle when things go all aglay.  But I figure if its happening to 
me, then if I don't fuss, and that gotcha gets into a distro kernel, there 
are gonna be a hell of a lot more folks than me trying to grab the 
microphone.

BTW, I'm glad you are feeling well enough to get into this again.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
There cannot be a crisis next week.  My schedule is already full.
-- Henry Kissinger
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Nick Piggin
On Thu, Apr 19, 2007 at 09:17:25AM -0400, Mark Lord wrote:
 Con Kolivas wrote:
 s go ahead and think up great ideas for other ways of metering out cpu 
 bandwidth for different purposes, but for X, given the absurd simplicity 
 of renicing, why keep fighting it? Again I reiterate that most users of SD 
 have not found the need to renice X anyway except if they stick to old 
 habits of make -j4 on uniprocessor and the like, and I expect that those 
 on CFS and Nicksched would also have similar experiences.
 
 Just plain make (no -j2 or -j) is enough to kill interactivity
 on my 2GHz P-M single-core non-HT machine with SD.

Is this with or without X reniced?


 But with the very first posted version of CFS by Ingo,
 I can do make -j2 no problem and still have a nicely interactive destop.

How well does cfs run if you have the granularity set to something
like 30ms (3000)?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Nick Piggin
On Thu, Apr 19, 2007 at 12:26:03PM -0700, Ray Lee wrote:
 On 4/19/07, Con Kolivas [EMAIL PROTECTED] wrote:
 The one fly in the ointment for
 linux remains X. I am still, to this moment, completely and utterly stunned
 at why everyone is trying to find increasingly complex unique ways to 
 manage
 X when all it needs is more cpu[1].
 [...and hence should be reniced]
 
 The problem is that X is not unique. There's postgresql, memcached,
 mysql, db2, a little embedded app I wrote... all of these perform work
 on behalf of another process. It's just most *noticeable* with X, as
 pretty much everyone is running that.

But for most of those apps, we don't actually care if they do fairly
degrade in performance as other loads on the system ramp up. However
the user prefers X to be given priority in these situations. Whether
that is the design of X, x clients, or the human condition really
doesn't matter two hoots to the scheduler.


 If we had some way for the scheduler to decide to donate part of a
 client process's time slice to the server it just spoke to (with an
 exponential dampening factor -- take 50% from the client, give 25% to
 the server, toss the rest on the floor), that -- from my naive point
 of view -- would be a step toward fixing the underlying issue. Or I
 might be spouting crap, who knows.

Firstly, lots of clients in your list are remote. X usually isn't.
However for X, a syscall or something to donate time might not be
such a bad idea... but given a couple of X clients and a server
against a parallel make, this is probably just going to make the
clients slow down as well without giving enough priority to the
server.

X isn't special so much because it does work on behalf of others
(as you said, lots of things do that). It is special simply because
we _want_ rendering to have priority of the CPU (if you shifed CPU
intensive rendering to the clients, you'd most likely want to give
them priority to); nice, right?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread Mike Galbraith
On Fri, 2007-04-20 at 08:47 +1000, Con Kolivas wrote:

 It's those who want X to have an unfair advantage that want it to do 
 something special.

I hope you're not lumping me in with those.  If X + client had been
able to get their fair share and do so in the low latency manner they
need, I would have been one of the carrots instead of being the stick.

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread hui
On Thu, Apr 19, 2007 at 06:32:15PM -0700, Michael K. Edwards wrote:
 But I think SCHED_FIFO on a chain of tasks is fundamentally not the
 right way to handle low audio latency.  The object with a low latency
 requirement isn't the task, it's the device.  When it's starting to
 get urgent to deliver more data to the device, the task that it's
 waiting on should slide up the urgency scale; and if it's waiting on
 something else, that something else should slide up the scale; and so
 forth.  Similarly, responding to user input is urgent; so when user
 input is available (by whatever mechanism), the task that's waiting
 for it should slide up the urgency scale, etc.

DSP operations like, particularly with digital synthesis, tend to max
the CPU doing vector operations on as many processors as it can get
a hold of. In a live performance critical application, it's important
to be able to deliver a protected amount of CPU to a thread doing that
work as well as response to external input such as controllers, etc...

 In practice, you probably don't want to burden desktop Linux with
 priority inheritance where you don't have to.  Priority queues with
 algorithmically efficient decrease-key operations (Fibonacci heaps and
 their ilk) are complicated to implement and have correspondingly high
 constant factors.  (However, a sufficiently clever heuristic for
 assigning quasi-static task priorities would usually short-circuit the
 priority cascade; if you can keep N small in the
 tasks-with-unpredictable-priority queue, you can probably use a
 simpler flavor with O(log N) decrease-key.  Ask someone who knows more
 about data structures than I do.)

These are app issue and not really somethings that's mutable in kernel
per se with regard to the -rt patch.

 More importantly, non-real-time application coders aren't very smart
 about grouping data structure accesses on one side or the other of a
 system call that is likely to release a lock and let something else
 run, flushing application data out of cache.  (Kernel coders aren't
 always smart about this either; see LKML threads a few weeks ago about
 racy, and cache-stall-prone, f_pos handling in VFS.)  So switching
 tasks immediately on lock release is usually the wrong thing to do if
 letting the task run a little longer would allow it to reach a point
 where it has to block anyway.

I have Solaris style adaptive locks in my tree with my lockstat patch
under -rt. I've also modified my lockstat patch to track readers
correctly now with rwsem and the like to see where the single reader
limitation in the rtmutex blows it.

So far I've seen less than 10 percent of in-kernel contention events
actually worth spinning on and the rest of the stats imply that the
mutex owner in question is either preempted or blocked on something
else.

I've been trying to get folks to try this on a larger machine than my
2x AMD64 box so that I there is more data regarding Linux contention
and overschedulling in -rt.
 
 Anyway, I already described the urgency-driven strategy to the extent
 that I've thought it out, elsewhere in this thread.  I only held this
 draft back because I wanted to double-check my latency measurements.

bill

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Renice X for cpu schedulers

2007-04-19 Thread hui
On Thu, Apr 19, 2007 at 05:20:53PM -0700, Michael K. Edwards wrote:
 Embedded systems are already in 2007, and the mainline Linux scheduler
 frankly sucks on them, because it thinks it's back in the 1960's with
 a fixed supply and captive demand, pissing away CPU bandwidth as
 waste heat.  Not to say it's an easy problem; even academics with a
 dozen publications in this area don't seem to be able to model energy
 usage to the nearest big O, let alone design a stable economic
 dispatch engine.  But it helps to acknowledge what the problem is:
 even in a 1960's raised-floor screaming-air-conditioners
 screw-the-power-bill machine room, you can't actually run a
 half-decent CPU flat out any more without burning it to a crisp.
 stupid.  What's your excuse?  ;-)

It's now possible to QoS significant parts of the kernel since we now
have a deadline mechanism in place. In the original 2.4 kernel, TimeSys's
irq-thread allowed for the processing of skbuffs in a thread under a CPU
reservation run category which was use to provide QoS I believe. This
basic mechanish can now be generalized to many place in the kernel and
put it under scheduler control.

It's just a matter of who and when somebody is going take on this task.

bill

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/