Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tuesday 13 March 2007 16:10, Mike Galbraith wrote:
> On Tue, 2007-03-13 at 09:51 +1100, Con Kolivas wrote:
> > On 13/03/07, Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > > As soon as your cpu is fully utilized, fairness loses or
> > > interactivity loses. Pick one.
> >
> > That's not true unless you refuse to prioritise your tasks accordingly.
> > Let's take this discussion in a different direction. You already nice
> > your lame processes. Why? You already have the concept that you are
> > prioritising things to normal or background tasks. You say so yourself
> > that lame is a background task. Stating the bleedingly obvious, the
> > unix way of prioritising things is via nice. You already do that. So
> > moving on from that...
>
> Sure. If a user wants to do anything interactive, they can indeed nice 19
> the rest of their box before they start.
>
> > Your test case asks how can I maximise cpu usage. Well you know the
> > answer already. You run two threads.
>
> I won't dispute that. The debate seems to be centered on whether two
> tasks that are niced +5 or to a higher value is background.
>
> > In my opinion, nice 5 is not background, but relatively less cpu. You
> > already are savvy enough to be using two threads and nicing them. All I
> > ask you to do when using RSDL is to change your expectations slightly
> > and your settings from nice 5 to nice 10 or 15 or even 19. Why is that
> > so offensive to you?
>
> It's not offensive to me, it is a behavioral regression. The situation
> as we speak is that you can run cpu intensive tasks while watching
> eye-candy. With RSDL, you can't, you feel the non-interactive load
> instantly. Doesn't the fact that you're asking me to lower my
> expectations tell you that I just might have a point?

Yet looking at the mainline scheduler code, nice 5 tasks are also supposed
to get 75% cpu compared to nice 0 tasks, however I cannot seem to get 75%
cpu with a fully cpu bound task in the presence of an interactive task. To
me that means mainline is not living up to my expectations.
What you're saying is your expectations are based on a false cpu
expectation from nice 5. You can spin it both ways. It seems to me the
only one that lives up to a defined expectation is to be fair. Anything
else is at best vague, and at worst starvation prone. Please don't pick
"5. none of the above".

> Please try to work with me on this. I'm not trying to be pig-headed.
> I'm of the opinion that fairness is great... until you strictly enforce
> it wrt interactive tasks.

How about answering my question then since I offered you numerous
combinations of ways to tackle the problem? The simplest one doesn't even
need code, it just needs you to alter the nice value that you're already
setting.

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tuesday 13 March 2007 17:08, Mike Galbraith wrote:
> Virtual or physical cores has nothing to do with the interactivity
> regression I noticed. Two nice 0 tasks which combined used 50% of my box
> can no longer share that box with two nice 5 tasks and receive the 50%
> they need to perform. That's it. From there, we wandered off into a
> discussion on the relative merit and pitfalls of fairness.

And again, with X in its current implementation it is NOT like two nice 0
tasks at all; it is like one nice 0 task. This is being fixed in the X
design as we speak.

> -Mike
Re: RSDL for 2.6.21-rc3- 0.29
Hi Gene.

On Monday 12 March 2007 16:38, Gene Heskett wrote:
> I hate to say it Con, but this one seems to have broken the amanda-tar
> symbiosis.
>
> I haven't tried a plain 21-rc3, so the problem may exist there, and in
> fact it did for 21-rc1, but I don't recall if it was true for -rc2. But
> I will have a plain 21-rc3 running by tomorrow nights amanda run to test.
>
> What happens is that when amanda tells tar to do a level 1 or 2, tar still
> thinks its doing a level 0. The net result is that the tape is filled
> completely and amanda does an EOT exit in about 10 of my 42 dle's. This
> is tar-1.15-1 for fedora core 6.

I'm sorry but I have to say I have no idea what any of this means. I
gather you're making an association between some application combination
failing and the RSDL cpu scheduler. Unfortunately the details of what the
problem is, or how the cpu scheduler is responsible, escape me :(
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
On Monday 12 March 2007 15:42, Al Boldi wrote:
> Con Kolivas wrote:
> > On Monday 12 March 2007 08:52, Con Kolivas wrote:
> > > And thank you! I think I know what's going on now. I think each
> > > rotation is followed by another rotation before the higher priority
> > > task is getting a look in in schedule() to even get quota and add it
> > > to the runqueue quota. I'll try a simple change to see if that helps.
> > > Patch coming up shortly.
> >
> > Can you try the following patch and see if it helps. There's also one
> > minor preemption logic fix in there that I'm planning on including.
> > Thanks!
>
> Applied on top of v0.28 mainline, and there is no difference.
>
> What's it look like on your machine?

The higher priority one always gets 6-7ms whereas the lower priority one
runs 6-7ms and then one larger perfectly bound expiration amount.
Basically exactly as I'd expect. The higher priority task gets precisely
RR_INTERVAL maximum latency whereas the lower priority task gets
RR_INTERVAL min and full expiration (according to the virtual deadline)
as a maximum. That's exactly how I intend it to work. Yes I realise that
the max latency ends up being longer intermittently on the niced task but
that's -in my opinion- perfectly fine as a compromise to ensure the nice
0 one always gets low latency.
Eg: nice 0 vs nice 10

nice 0:
pid 6288, prio  0, out for    7 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms
pid 6288, prio  0, out for    6 ms

nice 10:
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for   66 ms
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for    6 ms
pid 6290, prio 10, out for    6 ms

exactly as I'd expect. If you want fixed latencies _of niced tasks_ in
the presence of less niced tasks you will not get them with this
scheduler. What you will get, though, is a perfectly bound relationship
knowing exactly what the maximum latency will ever be.

Thanks for the test case. It's interesting and nice that it confirms this
scheduler works as I expect it to.
Re: RSDL v0.30 cpu scheduler for ... 2.6.18.8 kernel
On Monday 12 March 2007 19:17, Vincent Fortier wrote:
> > There are updated patches for 2.6.20, 2.6.20.2, 2.6.21-rc3 and
> > 2.6.21-rc3-mm2 to bring RSDL up to version 0.30 for download here:
> >
> > Full patches:
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2-rsdl-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.30.patch
> >
> > incrementals:
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20/2.6.20.2-rsdl-0.29-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2/2.6.20.2-rsdl-0.29-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3/2.6.21-rc3-rsdl-0.29-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/2.6.21-rc3-mm2-rsdl-0.29-0.30.patch
>
> And here are the backported RSDL 0.30 patches in case any of you would
> still be running an older 2.6.18.8 kernel ...

Thanks, your efforts are appreciated as it would take me quite a while to
do a variety of backports that people are already requesting.

> Just for info, version 0.30 seems around 2 seconds faster than 0.26-0.29
> versions at boot time. I used to have around 2-3 seconds of difference
> between a vanilla and a rsdl patched kernel. Now it looks more like 5
> seconds faster! Wow.. nice work CK!
>
> 2.6.18.8 vanilla kernel:
> [   68.514248] ACPI: Power Button (CM) [PWRB]
> 2.6.18.8-rsdl-0.30:
> [   63.739337] ACPI: Power Button (CM) [PWRB]

Indeed there's almost 5 seconds difference there. To be honest, the boot
time speedups are an unexpected bonus, but everyone seems to be reporting
them on all flavours so perhaps all those timeout related driver setups
are inadvertently benefiting.

> - vin

Thanks
RSDL v0.30 cpu scheduler for mainline kernels
There are updated patches for 2.6.20, 2.6.20.2, 2.6.21-rc3 and
2.6.21-rc3-mm2 to bring RSDL up to version 0.30 for download here:

Full patches:
http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2-rsdl-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.30.patch

incrementals:
http://ck.kolivas.org/patches/staircase-deadline/2.6.20/2.6.20.2-rsdl-0.29-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2/2.6.20.2-rsdl-0.29-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3/2.6.21-rc3-rsdl-0.29-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/2.6.21-rc3-mm2-rsdl-0.29-0.30.patch
[PATCH] [RSDL] sched: rsdl accounting fixes
Andrew, the following patch can be rolled into the
sched-implement-rsdl-cpu-scheduler.patch file or added separately if
that's easier. All the oopses and bitmap errors of previous versions of
rsdl were fixed by v0.29 so I think RSDL is ready for another round in
-mm. Thanks.

---
Higher priority tasks should always preempt lower priority tasks if they
are queued higher than their static priority as non-rt tasks. Fix it.

The deadline mechanism can be triggered before tasks' quota ever gets
added to the runqueue priority level's quota. Add 1 to the quota in
anticipation of this.

The deadline mechanism should only be triggered if the quota is overrun
instead of as soon as the quota is expired, allowing some aliasing errors
in scheduler_tick accounting. Fix that.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>

---
 kernel/sched.c |   24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c    2007-03-12 08:47:43.000000000 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c 2007-03-12 09:10:33.000000000 +1100
@@ -96,10 +96,9 @@ unsigned long long __attribute__((weak))
  * provided it is not a realtime comparison.
  */
 #define TASK_PREEMPTS_CURR(p, curr) \
-       (((p)->prio < (curr)->prio) || (((p)->prio == (curr)->prio) && \
+       (((p)->prio < (curr)->prio) || (!rt_task(p) && \
        ((p)->static_prio < (curr)->static_prio && \
-       ((curr)->static_prio > (curr)->prio)) && \
-       !rt_task(p)))
+       ((curr)->static_prio > (curr)->prio))))
 
 /*
  * This is the time all tasks within the same priority round robin.
@@ -3323,7 +3322,7 @@ static inline void major_prio_rotation(s
  */
 static inline void rotate_runqueue_priority(struct rq *rq)
 {
-       int new_prio_level, remaining_quota;
+       int new_prio_level;
        struct prio_array *array;
 
        /*
@@ -3334,7 +3333,6 @@ static inline void rotate_runqueue_prior
        if (unlikely(sched_find_first_bit(rq->dyn_bitmap) < rq->prio_level))
                return;
 
-       remaining_quota = rq_quota(rq, rq->prio_level);
        array = rq->active;
        if (rq->prio_level > MAX_PRIO - 2) {
                /* Major rotation required */
@@ -3368,10 +3366,11 @@ static inline void rotate_runqueue_prior
        }
        rq->prio_level = new_prio_level;
        /*
-        * While we usually rotate with the rq quota being 0, it is possible
-        * to be negative so we subtract any deficit from the new level.
+        * As we are merging to a prio_level that may not have anything in
+        * its quota we add 1 to ensure the tasks get to run in schedule() to
+        * add their quota to it.
         */
-       rq_quota(rq, new_prio_level) += remaining_quota;
+       rq_quota(rq, new_prio_level) += 1;
 }
 
 static void task_running_tick(struct rq *rq, struct task_struct *p)
@@ -3397,12 +3396,11 @@ static void task_running_tick(struct rq
        if (!--p->time_slice)
                task_expired_entitlement(rq, p);
        /*
-        * The rq quota can become negative due to a task being queued in
-        * scheduler without any quota left at that priority level. It is
-        * cheaper to allow it to run till this scheduler tick and then
-        * subtract it from the quota of the merged queues.
+        * We only employ the deadline mechanism if we run over the quota.
+        * It allows aliasing problems around the scheduler_tick to be
+        * less harmful.
         */
-       if (!rt_task(p) && --rq_quota(rq, rq->prio_level) <= 0) {
+       if (!rt_task(p) && --rq_quota(rq, rq->prio_level) < 0) {
                if (unlikely(p->first_time_slice))
                        p->first_time_slice = 0;
                rotate_runqueue_priority(rq);
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
On Monday 12 March 2007 09:29, bert hubert wrote:
> Con,
>
> Recent kernel versions have real problems for me on the interactivity
> front, with even a simple 'make' of my C++ program (PowerDNS) causing
> Firefox to slow down to a crawl.
>
> RSDL fixed all that, the system is noticeably snappier.
>
> As a case in point, I used to notice when a compile was done because the
> system stopped being sluggish.
>
> Today, a few times, I only noticed 'make' was done because the fans of my
> computer slowed down.
>
> Thanks for the good work! I'm on 2.6.21-rc3-rsdl-0.29.

You're most welcome, and thank you for the report :)

> Bert
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
On Monday 12 March 2007 08:52, Con Kolivas wrote:
> And thank you! I think I know what's going on now. I think each rotation
> is followed by another rotation before the higher priority task is
> getting a look in in schedule() to even get quota and add it to the
> runqueue quota. I'll try a simple change to see if that helps. Patch
> coming up shortly.

Can you try the following patch and see if it helps. There's also one
minor preemption logic fix in there that I'm planning on including.
Thanks!

---
 kernel/sched.c |   24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c    2007-03-12 08:47:43.000000000 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c 2007-03-12 09:10:33.000000000 +1100
@@ -96,10 +96,9 @@ unsigned long long __attribute__((weak))
  * provided it is not a realtime comparison.
  */
 #define TASK_PREEMPTS_CURR(p, curr) \
-       (((p)->prio < (curr)->prio) || (((p)->prio == (curr)->prio) && \
+       (((p)->prio < (curr)->prio) || (!rt_task(p) && \
        ((p)->static_prio < (curr)->static_prio && \
-       ((curr)->static_prio > (curr)->prio)) && \
-       !rt_task(p)))
+       ((curr)->static_prio > (curr)->prio))))
 
 /*
  * This is the time all tasks within the same priority round robin.
@@ -3323,7 +3322,7 @@ static inline void major_prio_rotation(s
  */
 static inline void rotate_runqueue_priority(struct rq *rq)
 {
-       int new_prio_level, remaining_quota;
+       int new_prio_level;
        struct prio_array *array;
 
        /*
@@ -3334,7 +3333,6 @@ static inline void rotate_runqueue_prior
        if (unlikely(sched_find_first_bit(rq->dyn_bitmap) < rq->prio_level))
                return;
 
-       remaining_quota = rq_quota(rq, rq->prio_level);
        array = rq->active;
        if (rq->prio_level > MAX_PRIO - 2) {
                /* Major rotation required */
@@ -3368,10 +3366,11 @@ static inline void rotate_runqueue_prior
        }
        rq->prio_level = new_prio_level;
        /*
-        * While we usually rotate with the rq quota being 0, it is possible
-        * to be negative so we subtract any deficit from the new level.
+        * As we are merging to a prio_level that may not have anything in
+        * its quota we add 1 to ensure the tasks get to run in schedule() to
+        * add their quota to it.
         */
-       rq_quota(rq, new_prio_level) += remaining_quota;
+       rq_quota(rq, new_prio_level) += 1;
 }
 
 static void task_running_tick(struct rq *rq, struct task_struct *p)
@@ -3397,12 +3396,11 @@ static void task_running_tick(struct rq
        if (!--p->time_slice)
                task_expired_entitlement(rq, p);
        /*
-        * The rq quota can become negative due to a task being queued in
-        * scheduler without any quota left at that priority level. It is
-        * cheaper to allow it to run till this scheduler tick and then
-        * subtract it from the quota of the merged queues.
+        * We only employ the deadline mechanism if we run over the quota.
+        * It allows aliasing problems around the scheduler_tick to be
+        * less harmful.
         */
-       if (!rt_task(p) && --rq_quota(rq, rq->prio_level) <= 0) {
+       if (!rt_task(p) && --rq_quota(rq, rq->prio_level) < 0) {
                if (unlikely(p->first_time_slice))
                        p->first_time_slice = 0;
                rotate_runqueue_priority(rq);
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
On Monday 12 March 2007 05:11, Al Boldi wrote:
> Al Boldi wrote:
> > BTW, another way to show these hickups would be through some kind of a
> > cpu/proc timing-tracer. Do we have something like that?
>
> Here is something like a tracer.
>
> Original idea by Chris Friesen, thanks, from this post:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=117331003029329&w=4
>
> Try attached chew.c like this:
> Boot into /bin/sh.
> Run chew in one console.
> Run nice chew in another console.
> Watch timings.
>
> Console 1: ./chew
> Console 2: nice -10 ./chew
> pid 669, prio 10, out for    5 ms
> pid 669, prio 10, out for   65 ms    One full expiration
> pid 669, prio 10, out for    6 ms
> pid 669, prio 10, out for   65 ms    again
>
> Console 2: nice -15 ./chew
> pid 673, prio 15, out for    6 ms
> pid 673, prio 15, out for   95 ms    again and so on..
>
> OTOH, mainline is completely smooth, albeit with huge drop-outs.

Heh. That's not much good either is it.

> Thanks!

And thank you! I think I know what's going on now. I think each rotation
is followed by another rotation before the higher priority task is
getting a look in in schedule() to even get quota and add it to the
runqueue quota. I'll try a simple change to see if that helps. Patch
coming up shortly.
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 23:38, James Cloos wrote:
> |> See:
> |> http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r200/r200_ioctl.c?revision=1.37&view=markup
>
> OK.
>
> Mesa is in git, now, but that still applies. The gitweb url is:
>
> http://gitweb.freedesktop.org/?p=mesa/mesa.git
>
> and for the version of the above file in the master branch:
>
> http://gitweb.freedesktop.org/?p=mesa/mesa.git;a=blob;f=src/mesa/drivers/dri/r200/r200_ioctl.c
>
> The recursive grep(1) on mesa shows:
>
> ,----[ grep -r sched_yield mesa ]
> | mesa/mesa/src/mesa/drivers/dri/r300/radeon_ioctl.c: sched_yield();
> | mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchpool.c: sched_yield();
> | mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchbuffer.c: sched_yield();
> | mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include <sched.h> /* for sched_yield() */
> | mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include <sched.h> /* for sched_yield() */
> | mesa/mesa/src/mesa/drivers/dri/common/vblank.h: sched_yield(); \
> | mesa/mesa/src/mesa/drivers/dri/unichrome/via_ioctl.c: sched_yield();
> | mesa/mesa/src/mesa/drivers/dri/i915/intel_ioctl.c: sched_yield();
> | mesa/mesa/src/mesa/drivers/dri/r200/r200_ioctl.c: sched_yield();
> `----
>
> Thanks for the heads up. I must've grep(1)ed the xorg subdir rather
> than the parent dir, and so missed mesa.

I just wonder what the heck all these will do to testing when using any
of these drivers. Whether or not we do no yield, mild yield or full blown
expiration yield, somehow or other I can't get over the feeling that if
the code relies on yield() we can't really trust them to be meaningful
cpu scheduler tests. This means most 3d apps out there that aren't using
binary drivers, whether they be (fscking) glxgears, audio app
visualisations or what...
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Sunday 11 March 2007 22:39, Mike Galbraith wrote:
> Hi Con,
>
> On Sun, 2007-03-11 at 14:57 +1100, Con Kolivas wrote:
> > What follows this email is a patch series for the latest version of the
> > RSDL cpu scheduler (ie v0.29). I have addressed all bugs that I am able
> > to reproduce in this version so if some people would be kind enough to
> > test if there are any hidden bugs or oops lurking, it would be nice to
> > know in anticipation of putting this back in -mm. Thanks.
> >
> > Full patch for 2.6.21-rc3-mm2:
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch
>
> I'm seeing a cpu distribution problem running this on my P4 box.
>
> Scenario: listening to music collection (mp3) via Amarok. Enable Amarok
> visualization gforce, and size such that X and gforce each use ~50% cpu.
> Start rip/encode of new CD with grip/lame encoder. Lame is set to use
> both cpus, at nice 5. Once the encoders start, they receive considerably
> more cpu than nice 0 X/Gforce, taking ~120% and leaving the remaining
> 80% for X/Gforce and Amarok (when it updates its ~12k entry database) to
> squabble over.
>
> With 2.6.21-rc3, X/Gforce maintain their ~50% cpu (remain smooth), and
> the encoders (100% cpu bound) get what's left when Amarok isn't eating it.
>
> I plunked the above patch into plain 2.6.21-rc3 and retested to
> eliminate other mm tree differences, and it's repeatable. The nice 5
> cpu hogs always receive considerably more than the nice 0 sleepers.

Thanks for the report. I'm assuming you're describing a single
hyperthread P4 here in SMP mode so 2 logical cores. Can you elaborate on
whether there is any difference as to which cpu things are bound to as
well? Can you also see what happens with lame not niced to +5 (ie at 0)
and with lame at nice +19. Thanks.
Re: [BIG] Re: sched rsdl fix for 0.28
On Sunday 11 March 2007 20:21, Con Kolivas wrote:
> On Sunday 11 March 2007 20:10, Nicolas Mailhot wrote:
> > On Sunday 11 March 2007 at 11:07 +1100, Con Kolivas wrote:
> > > sched rsdl fix
> >
> > Doesn't change a thing. Always breaks at the same place (though
> > depending on hardware timings? the trace is not always the same).
> > Pretty sure nothing happens before this failure
>
> Bummer. The only other thing to try is v0.29 posted recently. I still
> haven't got a good way to reproduce this locally but I'll keep trying.
> Thanks for testing.

Oh and if that oopses and you still have the time, could you please test
0.29 on 2.6.20.2 (available from the same directory).
Re: [BIG] Re: sched rsdl fix for 0.28
On Sunday 11 March 2007 20:10, Nicolas Mailhot wrote:
> On Sunday 11 March 2007 at 11:07 +1100, Con Kolivas wrote:
> > sched rsdl fix
>
> Doesn't change a thing. Always breaks at the same place (though
> depending on hardware timings? the trace is not always the same).
> Pretty sure nothing happens before this failure

Bummer. The only other thing to try is v0.29 posted recently. I still
haven't got a good way to reproduce this locally but I'll keep trying.
Thanks for testing.
Re: [BIG] Re: sched rsdl fix for 0.28
On Sunday 11 March 2007 20:10, Nicolas Mailhot wrote: Le dimanche 11 mars 2007 à 11:07 +1100, Con Kolivas a écrit : sched rsdl fix Doesn't change a thing. Always breaks at the same place (though depending on hardware timings? the trace is not always the same). Pretty sure nothing happens before this failure Bummer. The only other thing to try is v0.29 posted recently. I still haven't got a good way to reproduce this locally but I'll keep trying. Thanks for testing. -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BIG] Re: sched rsdl fix for 0.28
On Sunday 11 March 2007 20:21, Con Kolivas wrote: On Sunday 11 March 2007 20:10, Nicolas Mailhot wrote: Le dimanche 11 mars 2007 à 11:07 +1100, Con Kolivas a écrit : sched rsdl fix Doesn't change a thing. Always breaks at the same place (though depending on hardware timings? the trace is not always the same). Pretty sure nothing happens before this failure Bummer. The only other thing to try is v0.29 posted recently. I still haven't got a good way to reproduce this locally but I'll keep trying. Thanks for testing. Oh and if that oopses and you still have the time, could you please test 0.29 on 2.6.20.2 (available from same directory). -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Sunday 11 March 2007 22:39, Mike Galbraith wrote: Hi Con, On Sun, 2007-03-11 at 14:57 +1100, Con Kolivas wrote: What follows this email is a patch series for the latest version of the RSDL cpu scheduler (ie v0.29). I have addressed all bugs that I am able to reproduce in this version so if some people would be kind enough to test if there are any hidden bugs or oops lurking, it would be nice to know in anticipation of putting this back in -mm. Thanks. Full patch for 2.6.21-rc3-mm2: http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29 .patch I'm seeing a cpu distribution problem running this on my P4 box. Scenario: listening to music collection (mp3) via Amarok. Enable Amarok visualization gforce, and size such that X and gforce each use ~50% cpu. Start rip/encode of new CD with grip/lame encoder. Lame is set to use both cpus, at nice 5. Once the encoders start, they receive considerable more cpu than nice 0 X/Gforce, taking ~120% and leaving the remaining 80% for X/Gforce and Amarok (when it updates it's ~12k entry database) to squabble over. With 2.6.21-rc3, X/Gforce maintain their ~50% cpu (remain smooth), and the encoders (100%cpu bound) get whats left when Amarok isn't eating it. I plunked the above patch into plain 2.6.21-rc3 and retested to eliminate other mm tree differences, and it's repeatable. The nice 5 cpu hogs always receive considerably more that the nice 0 sleepers. Thanks for the report. I'm assuming you're describing a single hyperthread P4 here in SMP mode so 2 logical cores. Can you elaborate on whether there is any difference as to which cpu things are bound to as well? Can you also see what happens with lame not niced to +5 (ie at 0) and with lame at nice +19. Thanks. -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 23:38, James Cloos wrote: | See: | http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r200/r200_i |octl.c?revision=1.37view=markup OK. Mesa is in git, now, but that still applies. The gitweb url is: http://gitweb.freedesktop.org/?p=mesa/mesa.git and for the version of the above file in the master branch: http://gitweb.freedesktop.org/?p=mesa/mesa.git;a=blob;f=src/mesa/drivers/dr i/r200/r200_ioctl.c The recursive grep(1) on mesa shows: ,[grep -r sched_yield mesa] | mesa/mesa/src/mesa/drivers/dri/r300/radeon_ioctl.c: sched_yield(); | mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchpool.c: | sched_yield(); | mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchbuffer.c: | sched_yield(); mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include | sched.h /* for sched_yield() */ | mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include sched.h /* | for sched_yield() */ mesa/mesa/src/mesa/drivers/dri/common/vblank.h: | sched_yield(); \ | mesa/mesa/src/mesa/drivers/dri/unichrome/via_ioctl.c: sched_yield(); | mesa/mesa/src/mesa/drivers/dri/i915/intel_ioctl.c: sched_yield(); | mesa/mesa/src/mesa/drivers/dri/r200/r200_ioctl.c: sched_yield(); ` Thanks for the heads up. I must've grep(1)ed the xorg subdir rather than the parent dir, and so missed mesa. I just wonder what the heck all these will do to testing when using any of these drivers. Whether or not we do no yield, mild yield or full blown expiration yield, somehow or other I can't get over the feeling that if the code relies on yield() we can't really trust them to be meaningful cpu scheduler tests. This means most 3d apps out there that aren't using binary drivers, whether they be (fscking) glxgears, audio app visualisations or what... -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
On Monday 12 March 2007 05:11, Al Boldi wrote:
> Al Boldi wrote:
> > BTW, another way to show these hickups would be through some kind of a
> > cpu/proc timing-tracer. Do we have something like that?
>
> Here is something like a tracer. Original idea by Chris Friesen, thanks,
> from this post:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=117331003029329&w=4
>
> Try attached chew.c like this: Boot into /bin/sh. Run chew in one
> console. Run nice chew in another console. Watch timings.
>
> Console 1: ./chew
> Console 2: nice -10 ./chew
>   pid 669, prio 10, out for 5 ms
>   pid 669, prio 10, out for 65 ms    one full expiration
>   pid 669, prio 10, out for 6 ms
>   pid 669, prio 10, out for 65 ms    again
> Console 2: nice -15 ./chew
>   pid 673, prio 15, out for 6 ms
>   pid 673, prio 15, out for 95 ms    again
> and so on...
>
> OTOH, mainline is completely smooth, albeit with huge drop-outs.

Heh. That's not much good either, is it.

> Thanks!

And thank you! I think I know what's going on now. I think each rotation is
followed by another rotation before the higher priority task gets a look-in
in schedule() to even get quota and add it to the runqueue quota. I'll try a
simple change to see if that helps. Patch coming up shortly.

--
-ck
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
On Monday 12 March 2007 08:52, Con Kolivas wrote:
> And thank you! I think I know what's going on now. I think each rotation
> is followed by another rotation before the higher priority task is
> getting a look-in in schedule() to even get quota and add it to the
> runqueue quota. I'll try a simple change to see if that helps. Patch
> coming up shortly.

Can you try the following patch and see if it helps? There's also one minor
preemption logic fix in there that I'm planning on including. Thanks!

---
 kernel/sched.c |   24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c	2007-03-12 08:47:43.000000000 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c	2007-03-12 09:10:33.000000000 +1100
@@ -96,10 +96,9 @@ unsigned long long __attribute__((weak))
  * provided it is not a realtime comparison.
  */
 #define TASK_PREEMPTS_CURR(p, curr) \
-	(((p)->prio < (curr)->prio) || (((p)->prio == (curr)->prio) && \
+	(((p)->prio < (curr)->prio) || (!rt_task(p) && \
 	((p)->static_prio < (curr)->static_prio && \
-	((curr)->static_prio < (curr)->prio)) && \
-	!rt_task(p)))
+	((curr)->static_prio < (curr)->prio))))
 
 /*
  * This is the time all tasks within the same priority round robin.
@@ -3323,7 +3322,7 @@ static inline void major_prio_rotation(s
  */
 static inline void rotate_runqueue_priority(struct rq *rq)
 {
-	int new_prio_level, remaining_quota;
+	int new_prio_level;
 	struct prio_array *array;
 
 	/*
@@ -3334,7 +3333,6 @@ static inline void rotate_runqueue_prior
 	if (unlikely(sched_find_first_bit(rq->dyn_bitmap) < rq->prio_level))
 		return;
 
-	remaining_quota = rq_quota(rq, rq->prio_level);
 	array = rq->active;
 	if (rq->prio_level > MAX_PRIO - 2) {
 		/* Major rotation required */
@@ -3368,10 +3366,11 @@ static inline void rotate_runqueue_prior
 	}
 	rq->prio_level = new_prio_level;
 	/*
-	 * While we usually rotate with the rq quota being 0, it is possible
-	 * to be negative so we subtract any deficit from the new level.
+	 * As we are merging to a prio_level that may not have anything in
+	 * its quota we add 1 to ensure the tasks get to run in schedule() to
+	 * add their quota to it.
	 */
-	rq_quota(rq, new_prio_level) += remaining_quota;
+	rq_quota(rq, new_prio_level) += 1;
 }
 
 static void task_running_tick(struct rq *rq, struct task_struct *p)
@@ -3397,12 +3396,11 @@ static void task_running_tick(struct rq
 	if (!--p->time_slice)
 		task_expired_entitlement(rq, p);
 	/*
-	 * The rq quota can become negative due to a task being queued in
-	 * scheduler without any quota left at that priority level. It is
-	 * cheaper to allow it to run till this scheduler tick and then
-	 * subtract it from the quota of the merged queues.
+	 * We only employ the deadline mechanism if we run over the quota.
+	 * It allows aliasing problems around the scheduler_tick to be
+	 * less harmful.
	 */
-	if (!rt_task(p) && --rq_quota(rq, rq->prio_level) <= 0) {
+	if (!rt_task(p) && --rq_quota(rq, rq->prio_level) < 0) {
 		if (unlikely(p->first_time_slice))
 			p->first_time_slice = 0;
 		rotate_runqueue_priority(rq);

--
-ck
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
On Monday 12 March 2007 09:29, bert hubert wrote:
> Con,
>
> Recent kernel versions have real problems for me on the interactivity
> front, with even a simple 'make' of my C++ program (PowerDNS) causing
> Firefox to slow down to a crawl. RSDL fixed all that, the system is
> noticeably snappier. As a case in point, I used to notice when a compile
> was done because the system stopped being sluggish. Today, a few times, I
> only noticed 'make' was done because the fans of my computer slowed down.
>
> Thanks for the good work! I'm on 2.6.21-rc3-rsdl-0.29.
>
>	Bert

You're most welcome, and thank you for the report :)

--
-ck
[PATCH] [RSDL] sched: rsdl accounting fixes
Andrew, the following patch can be rolled into the
sched-implement-rsdl-cpu-scheduler.patch file, or added separately if that's
easier. All the oopses and bitmap errors of previous versions of rsdl were
fixed by v0.29, so I think RSDL is ready for another round in -mm. Thanks.

---
Higher priority tasks should always preempt lower priority tasks if they are
queued higher than their static priority as non-rt tasks. Fix it.

The deadline mechanism can be triggered before tasks' quota ever gets added
to the runqueue priority level's quota. Add 1 to the quota in anticipation
of this.

The deadline mechanism should only be triggered if the quota is overrun,
instead of as soon as the quota has expired, allowing for some aliasing
errors in scheduler_tick accounting. Fix that.

Signed-off-by: Con Kolivas [EMAIL PROTECTED]

---
 kernel/sched.c |   24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c	2007-03-12 08:47:43.000000000 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c	2007-03-12 09:10:33.000000000 +1100
@@ -96,10 +96,9 @@ unsigned long long __attribute__((weak))
  * provided it is not a realtime comparison.
  */
 #define TASK_PREEMPTS_CURR(p, curr) \
-	(((p)->prio < (curr)->prio) || (((p)->prio == (curr)->prio) && \
+	(((p)->prio < (curr)->prio) || (!rt_task(p) && \
 	((p)->static_prio < (curr)->static_prio && \
-	((curr)->static_prio < (curr)->prio)) && \
-	!rt_task(p)))
+	((curr)->static_prio < (curr)->prio))))
 
 /*
  * This is the time all tasks within the same priority round robin.
@@ -3323,7 +3322,7 @@ static inline void major_prio_rotation(s
  */
 static inline void rotate_runqueue_priority(struct rq *rq)
 {
-	int new_prio_level, remaining_quota;
+	int new_prio_level;
 	struct prio_array *array;
 
 	/*
@@ -3334,7 +3333,6 @@ static inline void rotate_runqueue_prior
 	if (unlikely(sched_find_first_bit(rq->dyn_bitmap) < rq->prio_level))
 		return;
 
-	remaining_quota = rq_quota(rq, rq->prio_level);
 	array = rq->active;
 	if (rq->prio_level > MAX_PRIO - 2) {
 		/* Major rotation required */
@@ -3368,10 +3366,11 @@ static inline void rotate_runqueue_prior
 	}
 	rq->prio_level = new_prio_level;
 	/*
-	 * While we usually rotate with the rq quota being 0, it is possible
-	 * to be negative so we subtract any deficit from the new level.
+	 * As we are merging to a prio_level that may not have anything in
+	 * its quota we add 1 to ensure the tasks get to run in schedule() to
+	 * add their quota to it.
	 */
-	rq_quota(rq, new_prio_level) += remaining_quota;
+	rq_quota(rq, new_prio_level) += 1;
 }
 
 static void task_running_tick(struct rq *rq, struct task_struct *p)
@@ -3397,12 +3396,11 @@ static void task_running_tick(struct rq
 	if (!--p->time_slice)
 		task_expired_entitlement(rq, p);
 	/*
-	 * The rq quota can become negative due to a task being queued in
-	 * scheduler without any quota left at that priority level. It is
-	 * cheaper to allow it to run till this scheduler tick and then
-	 * subtract it from the quota of the merged queues.
+	 * We only employ the deadline mechanism if we run over the quota.
+	 * It allows aliasing problems around the scheduler_tick to be
+	 * less harmful.
	 */
-	if (!rt_task(p) && --rq_quota(rq, rq->prio_level) <= 0) {
+	if (!rt_task(p) && --rq_quota(rq, rq->prio_level) < 0) {
 		if (unlikely(p->first_time_slice))
 			p->first_time_slice = 0;
 		rotate_runqueue_priority(rq);

--
-ck
RSDL v0.30 cpu scheduler for mainline kernels
There are updated patches for 2.6.20, 2.6.20.2, 2.6.21-rc3 and
2.6.21-rc3-mm2 to bring RSDL up to version 0.30, for download here:

Full patches:
http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2-rsdl-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.30.patch

Incrementals:
http://ck.kolivas.org/patches/staircase-deadline/2.6.20/2.6.20.2-rsdl-0.29-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2/2.6.20.2-rsdl-0.29-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3/2.6.21-rc3-rsdl-0.29-0.30.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/2.6.21-rc3-mm2-rsdl-0.29-0.30.patch

--
-ck
Re: RSDL v0.30 cpu scheduler for ... 2.6.18.8 kernel
On Monday 12 March 2007 19:17, Vincent Fortier wrote:
> > There are updated patches for 2.6.20, 2.6.20.2, 2.6.21-rc3 and
> > 2.6.21-rc3-mm2 to bring RSDL up to version 0.30 for download here:
> >
> > Full patches:
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2-rsdl-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.30.patch
> >
> > incrementals:
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20/2.6.20.2-rsdl-0.29-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2/2.6.20.2-rsdl-0.29-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3/2.6.21-rc3-rsdl-0.29-0.30.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/2.6.21-rc3-mm2-rsdl-0.29-0.30.patch
>
> And here are the backported RSDL 0.30 patches in case any of you would
> still be running an older 2.6.18.8 kernel ...

Thanks, your efforts are appreciated, as it would take me quite a while to
do the variety of backports that people are already requesting.

> Just for info, version 0.30 seems around 2 seconds faster than the
> 0.26-0.29 versions at boot time. I used to have around 2-3 seconds of
> difference between a vanilla and an rsdl patched kernel. Now it looks
> more like 5 seconds faster! Wow.. nice work CK!
>
> 2.6.18.8 vanilla kernel:
> [   68.514248] ACPI: Power Button (CM) [PWRB]
> 2.6.18.8-rsdl-0.30:
> [   63.739337] ACPI: Power Button (CM) [PWRB]

Indeed there's almost 5 seconds difference there. To be honest, the boot
time speedups are an unexpected bonus, but everyone seems to be reporting
them on all flavours, so perhaps all those timeout related driver setups
are inadvertently benefiting.

> - vin

Thanks

--
-ck
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
On Monday 12 March 2007 15:42, Al Boldi wrote:
> Con Kolivas wrote:
> > On Monday 12 March 2007 08:52, Con Kolivas wrote:
> > > And thank you! I think I know what's going on now. I think each
> > > rotation is followed by another rotation before the higher priority
> > > task gets a look-in in schedule() to even get quota and add it to the
> > > runqueue quota. I'll try a simple change to see if that helps. Patch
> > > coming up shortly.
> >
> > Can you try the following patch and see if it helps? There's also one
> > minor preemption logic fix in there that I'm planning on including.
> > Thanks!
>
> Applied on top of v0.28 mainline, and there is no difference.

What's it look like on your machine? The higher priority one always gets
6-7ms, whereas the lower priority one runs 6-7ms and then one larger,
perfectly bound expiration amount. Basically exactly as I'd expect. The
higher priority task gets precisely RR_INTERVAL maximum latency, whereas the
lower priority task gets RR_INTERVAL minimum and full expiration (according
to the virtual deadline) as a maximum. That's exactly how I intend it to
work. Yes, I realise that the max latency ends up being longer
intermittently on the niced task, but that's -in my opinion- perfectly fine
as a compromise to ensure the nice 0 one always gets low latency.
Eg: nice 0 vs nice 10

nice 0:
pid 6288, prio 0, out for 7 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms

nice 10:
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 66 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms

exactly as I'd expect. If you want fixed latencies _of niced tasks_ in the
presence of less niced tasks, you will not get them with this scheduler.
What you will get, though, is a perfectly bound relationship, knowing
exactly what the maximum latency will ever be.

Thanks for the test case. It's interesting, and nice that it confirms this
scheduler works as I expect it to.

--
-ck
Re: RSDL for 2.6.21-rc3- 0.29
Hi Gene.

On Monday 12 March 2007 16:38, Gene Heskett wrote:
> I hate to say it Con, but this one seems to have broken the amanda-tar
> symbiosis. I haven't tried a plain 21-rc3, so the problem may exist
> there, and in fact it did for 21-rc1, but I don't recall if it was true
> for -rc2. But I will have a plain 21-rc3 running by tomorrow night's
> amanda run to test. What happens is that when amanda tells tar to do a
> level 1 or 2, tar still thinks it's doing a level 0. The net result is
> that the tape is filled completely and amanda does an EOT exit in about
> 10 of my 42 dle's. This is tar-1.15-1 for fedora core 6.

I'm sorry, but I have to say I have no idea what any of this means. I gather
you're making an association between some application combination failing
and the RSDL cpu scheduler. Unfortunately the details of what the problem
is, or how the cpu scheduler is responsible, escape me :(

--
-ck
Re: RSDL-mm 0.28
On Sunday 11 March 2007 15:03, Matt Mackall wrote:
> On Sat, Mar 10, 2007 at 10:01:32PM -0600, Matt Mackall wrote:
> > On Sun, Mar 11, 2007 at 01:28:22PM +1100, Con Kolivas wrote:
> > > Ok I don't think there's any actual accounting problem here per se
> > > (although I did just recently post a bugfix for rsdl, however I think
> > > that's unrelated). What I think is going on in the ccache testcase is
> > > that all the work is being offloaded to kernel threads reading/writing
> > > to/from the filesystem and the make is not getting any actual cpu
> > > time.
> >
> > I don't see significant system time while this is happening.
>
> Also, it's running pretty much entirely out of page cache so there
> wouldn't be a whole lot for kernel threads to do.

Well I can't reproduce that behaviour here at all, whether from disk or the
pagecache with ccache, so I'm not entirely sure what's different at your
end. However both you and the other person reporting bad behaviour were
using ATI drivers. That's about the only commonality? I wonder if they do
need to yield... somewhat, instead of not at all.

--
-ck
[PATCH][RSDL-mm 7/7] sched: document rsdl cpu scheduler
From: Con Kolivas <[EMAIL PROTECTED]> Add comprehensive documentation of the RSDL cpu scheduler design. Signed-off-by: Con Kolivas <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> Cc: Nick Piggin <[EMAIL PROTECTED]> Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- Documentation/sched-design.txt | 273 - 1 file changed, 267 insertions(+), 6 deletions(-) Index: linux-2.6.21-rc3-mm2/Documentation/sched-design.txt === --- linux-2.6.21-rc3-mm2.orig/Documentation/sched-design.txt2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/Documentation/sched-design.txt 2007-03-11 14:48:00.0 +1100 @@ -1,11 +1,14 @@ - Goals, Design and Implementation of the - new ultra-scalable O(1) scheduler + Goals, Design and Implementation of the ultra-scalable O(1) scheduler by + Ingo Molnar and the Rotating Staircase Deadline cpu scheduler policy + designed by Con Kolivas. - This is an edited version of an email Ingo Molnar sent to - lkml on 4 Jan 2002. It describes the goals, design, and - implementation of Ingo's new ultra-scalable O(1) scheduler. - Last Updated: 18 April 2002. + This was originally an edited version of an email Ingo Molnar sent to + lkml on 4 Jan 2002. It describes the goals, design, and implementation + of Ingo's ultra-scalable O(1) scheduler. It now contains a description + of the Rotating Staircase Deadline priority scheduler that was built on + this design. + Last Updated: Sun Feb 25 2007 Goal @@ -163,3 +166,261 @@ certain code paths and data constructs. code is smaller than the old one. Ingo + + +Rotating Staircase Deadline cpu scheduler policy + + +Design summary +== + +A novel design which incorporates a foreground-background descending priority +system (the staircase) with runqueue managed minor and major epochs (rotation +and deadline). + + +Features + + +A starvation free, strict fairness O(1) scalable design with interactivity +as good as the above restrictions can provide. 
There is no interactivity
+estimator, no sleep/run measurements and only simple fixed accounting.
+The design is strict enough in its accounting that task behaviour
+can be modelled and maximum scheduling latencies can be predicted by
+the virtual deadline mechanism that manages runqueues. The prime concern
+in this design is to maintain fairness at all costs, determined by nice
+level, yet to maintain as good interactivity as can be allowed within
+the constraints of strict fairness.
+
+
+Design description
+==================
+
+RSDL works off the principle of providing each task a quota of runtime
+that it is allowed to run at each priority level equal to its static
+priority (ie. its nice level) and every priority below that. When each
+task is queued, the cpu that it is queued onto also keeps a record of
+that quota. If the task uses up its quota it is decremented one priority
+level. Also, if the cpu notices a full quota has been used for that
+priority level, it pushes everything remaining at that priority level to
+the next lowest priority level. Once every runtime quota has been
+consumed at every priority level, a task is queued on the "expired"
+array. When no other tasks exist with quota, the expired array is
+activated and fresh quotas are handed out. This is all done in O(1).
+
+
+Design details
+==============
+
+Each cpu has its own runqueue which micromanages its own epochs, and
+each task keeps a record of its own entitlement of cpu time. Most of the
+rest of these details apply to non-realtime tasks, as rt task management
+is straightforward.
+
+Each runqueue keeps a record of what major epoch it is up to in the
+rq->prio_rotation field, which is incremented on each major epoch. It
+also keeps a record of the quota available to each priority value valid
+for that major epoch in rq->prio_quota[].
+
+Each task keeps a record of what major runqueue epoch it was last
+running on in p->rotation.
It also keeps a record of what priority levels it has +already been allocated quota from during this epoch in a bitmap p->bitmap. + +The only tunable that determines all other details is the RR_INTERVAL. This +is set to 6ms (minimum on 1000HZ, higher at different HZ values). + +All tasks are initially given a quota based on RR_INTERVAL. This is equal to +RR_INTERVAL between nice values of 0 and 19, and progressively larger for +nice values from -1 to -20. This is assigned to p->quota and only changes +with changes in nice level. + +As a task is first queued, it checks in recalc_task_prio to see if it has +run at this runqueue's current priority rotation. If it has not, it will +have its p->prio level set to equal its p->stat
[PATCH][RSDL-mm 3/7] sched: remove noninteractive flag
From: Con Kolivas <[EMAIL PROTECTED]>

Remove the TASK_NONINTERACTIVE flag as it will no longer be used.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>
Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 fs/pipe.c             |    7 +------
 include/linux/sched.h |    3 +--
 2 files changed, 2 insertions(+), 8 deletions(-)

Index: linux-2.6.21-rc3-mm2/fs/pipe.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/fs/pipe.c	2007-03-11 14:47:57.000000000 +1100
+++ linux-2.6.21-rc3-mm2/fs/pipe.c	2007-03-11 14:47:59.000000000 +1100
@@ -41,12 +41,7 @@ void pipe_wait(struct pipe_inode_info *p
 {
 	DEFINE_WAIT(wait);
 
-	/*
-	 * Pipes are system-local resources, so sleeping on them
-	 * is considered a noninteractive wait:
-	 */
-	prepare_to_wait(&pipe->wait, &wait,
-			TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE);
+	prepare_to_wait(&pipe->wait, &wait, TASK_INTERRUPTIBLE);
 	if (pipe->inode)
 		mutex_unlock(&pipe->inode->i_mutex);
 	schedule();
Index: linux-2.6.21-rc3-mm2/include/linux/sched.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/linux/sched.h	2007-03-11 14:47:57.000000000 +1100
+++ linux-2.6.21-rc3-mm2/include/linux/sched.h	2007-03-11 14:47:59.000000000 +1100
@@ -150,8 +150,7 @@ extern unsigned long weighted_cpuload(co
 #define EXIT_ZOMBIE		16
 #define EXIT_DEAD		32	/* in tsk->state again */
-#define TASK_NONINTERACTIVE	64
-#define TASK_DEAD		128
+#define TASK_DEAD		64
 
 #define __set_task_state(tsk, state_value)		\
 	do { (tsk)->state = (state_value); } while (0)

--
-ck
[PATCH][RSDL-mm 4/7] sched: implement 180 bit sched bitmap
From: Con Kolivas <[EMAIL PROTECTED]> Modify the sched_find_first_bit function to work on a 180bit long bitmap. Signed-off-by: Con Kolivas <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> Cc: Nick Piggin <[EMAIL PROTECTED]> Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- include/asm-generic/bitops/sched.h | 10 ++ include/asm-s390/bitops.h | 12 +--- 2 files changed, 7 insertions(+), 15 deletions(-) Index: linux-2.6.21-rc3-mm2/include/asm-generic/bitops/sched.h === --- linux-2.6.21-rc3-mm2.orig/include/asm-generic/bitops/sched.h 2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/include/asm-generic/bitops/sched.h 2007-03-11 14:47:59.0 +1100 @@ -6,8 +6,8 @@ /* * Every architecture must define this function. It's the fastest - * way of searching a 140-bit bitmap where the first 100 bits are - * unlikely to be set. It's guaranteed that at least one of the 140 + * way of searching a 180-bit bitmap where the first 100 bits are + * unlikely to be set. It's guaranteed that at least one of the 180 * bits is cleared. 
 */
static inline int sched_find_first_bit(const unsigned long *b)
@@ -15,7 +15,7 @@ static inline int sched_find_first_bit(c
 #if BITS_PER_LONG == 64
 	if (unlikely(b[0]))
 		return __ffs(b[0]);
-	if (likely(b[1]))
+	if (b[1])
 		return __ffs(b[1]) + 64;
 	return __ffs(b[2]) + 128;
 #elif BITS_PER_LONG == 32
@@ -27,7 +27,9 @@ static inline int sched_find_first_bit(c
 		return __ffs(b[2]) + 64;
 	if (b[3])
 		return __ffs(b[3]) + 96;
-	return __ffs(b[4]) + 128;
+	if (b[4])
+		return __ffs(b[4]) + 128;
+	return __ffs(b[5]) + 160;
 #else
 #error BITS_PER_LONG not defined
 #endif
Index: linux-2.6.21-rc3-mm2/include/asm-s390/bitops.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-s390/bitops.h	2007-03-11 14:47:57.000000000 +1100
+++ linux-2.6.21-rc3-mm2/include/asm-s390/bitops.h	2007-03-11 14:47:59.000000000 +1100
@@ -729,17 +729,7 @@ find_next_bit (const unsigned long * add
 	return offset + find_first_bit(p, size);
 }
 
-/*
- * Every architecture must define this function. It's the fastest
- * way of searching a 140-bit bitmap where the first 100 bits are
- * unlikely to be set. It's guaranteed that at least one of the 140
- * bits is cleared.
- */
-static inline int sched_find_first_bit(unsigned long *b)
-{
-	return find_first_bit(b, 140);
-}
-
+#include
 #include
 #include

--
-ck
[PATCH][RSDL-mm 5/7] sched dont renice kernel threads
The practice of renicing kernel threads to negative nice values is of
questionable benefit at best, and at worst leads to larger latencies when
kernel threads are busy on behalf of other tasks.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
---
 kernel/workqueue.c |    1 -
 1 file changed, 1 deletion(-)

Index: linux-2.6.21-rc3-mm2/kernel/workqueue.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/workqueue.c	2007-03-11 14:47:57.000000000 +1100
+++ linux-2.6.21-rc3-mm2/kernel/workqueue.c	2007-03-11 14:47:59.000000000 +1100
@@ -294,7 +294,6 @@ static int worker_thread(void *__cwq)
 	if (!cwq->wq->freezeable)
 		current->flags |= PF_NOFREEZE;
 
-	set_user_nice(current, -5);
 	/*
	 * We inherited MPOL_INTERLEAVE from the booting kernel.
	 * Set MPOL_DEFAULT to insure node local allocations.

--
-ck
[PATCH][RSDL-mm 1/7] lists: add list splice tail
From: Con Kolivas <[EMAIL PROTECTED]> Add a list_splice_tail variant of list_splice. Patch-by: Peter Zijlstra <[EMAIL PROTECTED]> Signed-off-by: Con Kolivas <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> Cc: Nick Piggin <[EMAIL PROTECTED]> Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- include/linux/list.h | 42 ++ 1 file changed, 42 insertions(+) Index: linux-2.6.21-rc3-mm2/include/linux/list.h === --- linux-2.6.21-rc3-mm2.orig/include/linux/list.h 2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/include/linux/list.h 2007-03-11 14:47:59.0 +1100 @@ -333,6 +333,20 @@ static inline void __list_splice(struct at->prev = last; } +static inline void __list_splice_tail(struct list_head *list, + struct list_head *head) +{ + struct list_head *first = list->next; + struct list_head *last = list->prev; + struct list_head *at = head->prev; + + first->prev = at; + at->next = first; + + last->next = head; + head->prev = last; +} + /** * list_splice - join two lists * @list: the new list to add. @@ -345,6 +359,18 @@ static inline void list_splice(struct li } /** + * list_splice_tail - join two lists at one's tail + * @list: the new list to add. + * @head: the place to add it in the first list. + */ +static inline void list_splice_tail(struct list_head *list, + struct list_head *head) +{ + if (!list_empty(list)) + __list_splice_tail(list, head); +} + +/** * list_splice_init - join two lists and reinitialise the emptied list. * @list: the new list to add. * @head: the place to add it in the first list. @@ -417,6 +443,22 @@ static inline void list_splice_init_rcu( } /** + * list_splice_tail_init - join 2 lists at one's tail & reinitialise emptied + * @list: the new list to add. + * @head: the place to add it in the first list. 
+ *
+ * The list at @list is reinitialised
+ */
+static inline void list_splice_tail_init(struct list_head *list,
+					 struct list_head *head)
+{
+	if (!list_empty(list)) {
+		__list_splice_tail(list, head);
+		INIT_LIST_HEAD(list);
+	}
+}
+
+/**
  * list_entry - get the struct for this entry
  * @ptr:	the list_head pointer.
  * @type:	the type of the struct this is embedded in.

--
-ck
[PATCH][RSDL-mm 2/7] sched: remove sleepavg from proc
From: Con Kolivas <[EMAIL PROTECTED]>

Remove the sleep_avg field from proc output as it will be removed from the
task_struct.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>
Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 fs/proc/array.c |    2 --
 1 file changed, 2 deletions(-)

Index: linux-2.6.21-rc3-mm2/fs/proc/array.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/fs/proc/array.c	2007-03-11 14:47:57.000000000 +1100
+++ linux-2.6.21-rc3-mm2/fs/proc/array.c	2007-03-11 14:47:59.000000000 +1100
@@ -171,7 +171,6 @@ static inline char * task_state(struct t
 	buffer += sprintf(buffer,
 		"State:\t%s\n"
-		"SleepAVG:\t%lu%%\n"
 		"Tgid:\t%d\n"
 		"Pid:\t%d\n"
 		"PPid:\t%d\n"
@@ -179,7 +178,6 @@ static inline char * task_state(struct t
 		"Uid:\t%d\t%d\t%d\t%d\n"
 		"Gid:\t%d\t%d\t%d\t%d\n",
 		get_task_state(p),
-		(p->sleep_avg/1024)*100/(102000/1024),
 		p->tgid, p->pid,
 		pid_alive(p) ? rcu_dereference(p->parent)->tgid : 0,
 		tracer_pid,

--
-ck
[PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
What follows this email is a patch series for the latest version of the RSDL
cpu scheduler (ie v0.29). I have addressed all bugs that I am able to
reproduce in this version, so if some people would be kind enough to test
whether there are any hidden bugs or oopses lurking, it would be nice to
know in anticipation of putting this back in -mm. Thanks.

Full patch for 2.6.21-rc3-mm2:
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch

Patch series (which will follow this email):
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/

Changelog:
- Fixed the longstanding buggy bitmap problem which occurred due to swapping
  arrays when there were still tasks on the active array.
- Fixed preemption of realtime tasks when rt prio inheritance elevated their
  priority.
- Made kernel threads not be reniced to -5 by default.
- Changed sched_yield behaviour of SCHED_NORMAL (SCHED_OTHER) to resemble
  realtime task yielding.

--
-ck
Re: RSDL-mm 0.28
On Sunday 11 March 2007 14:39, Mike Galbraith wrote:

Wait, wrong thread; this is Andrew's. On Sunday 11 March 2007 14:39, Andrew Morton wrote:
> On Sun, 11 Mar 2007 14:59:28 +1100 Con Kolivas <[EMAIL PROTECTED]> wrote:
> > > Bottom line: we've had a _lot_ of problems with the new yield()
> > > semantics.  We effectively broke back-compatibility by changing its
> > > behaviour a lot, and we can't really turn around and blame application
> > > developers for that.
> >
> > So... I would take it that's a yes for a recommendation with respect to
> > implementing a new yield()?  A new scheduler is as good a time as any to
> > do it.
>
> I guess so.  We'd, err, need to gather Ingo's input ;)

cc'ed. Don't you hate timezones?

> Perhaps a suitable way of doing this would be to characterise then emulate
> the 2.4 behaviour.  As long as it turns out to be vaguely sensible.

It's really very simple. We just go to the end of the currently queued
priority on the same array instead of swapping to the expired array; ie we
do what realtime tasks currently do. It works fine here locally afaict.

--
-ck
Re: RSDL-mm 0.28
On Sunday 11 March 2007 14:16, Andrew Morton wrote:
> On Sun, 11 Mar 2007 13:28:22 +1100 "Con Kolivas" <[EMAIL PROTECTED]> wrote:
> > Well... are you advocating we change sched_yield semantics to a
> > gentler form?
>
> From a practical POV: our present yield() behaviour is so truly awful that
> it's basically always a bug to use it.  This probably isn't a good thing.
>
> So yes, I do think that we should have a rethink and try to come up with
> behaviour which is more in accord with what application developers expect
> yield() to do.
>
> otoh,
>
> a) we should have done this five years ago.  Instead, we've spent that
>    time training userspace programmers to not use yield(), so perhaps
>    there's little to be gained in changing it now.
>
> b) if we _were_ to change yield(), people would use it more, and their
>    applications would of course suck bigtime when run on earlier 2.6
>    kernels.
>
> Bottom line: we've had a _lot_ of problems with the new yield() semantics.
> We effectively broke back-compatibility by changing its behaviour a lot,
> and we can't really turn around and blame application developers for that.

So... I would take it that's a yes for a recommendation with respect to
implementing a new yield()? A new scheduler is as good a time as any to do it.

--
-ck
Re: RSDL-mm 0.28
On 11/03/07, Matt Mackall <[EMAIL PROTECTED]> wrote:
> I've tested -mm2 against -mm2+noyield and -mm2+rsdl+noyield. The noyield
> patch simply makes the sched_yield syscall return immediately. Xorg and
> all tests are run at nice 0.
>
> Loads:
> memload: constant memcpy of 16MB buffer
> execload: constant re-exec of a trivial shell script
> forkload: constant fork and exit of a trivial shell script
> make -j 5: hot-cache kernel build without ccache
> make -j 5 ccache: hot-cache kernel build with ccache
>
> Tests:
> beryl - 3D window manager, wiggle windows, spin desktop, etc.
> galeon - web browser, rapidly scrolling long web pages by grabbing the
>          scroll bar
> mp3 - XMMS on a FUSE sshfs over wireless (during all tests)
> terminal - responsiveness of ssh and local terminal sessions
> mouse - responsiveness of mouse pointer
>
> Results:
> great = completely smooth
> good = fully responsive
> ok = visible latency
> bad = becomes difficult to use (or mp3 skips)
> awful = make it stop, please
>
>                  -mm2        -mm2+noyield  rsdl+noyield
> no load
> beryl            great       great         great
> galeon           good        good          good
> mp3              good        good          good
> terminal         good        good          good
> mouse            good        good          good
>
> memload x10
> beryl            awful/bad   great         good
> galeon           good        good          ok/good
> mp3              good        good          good
> terminal         good        good          good
> mouse            good        good          good
>
> execload x10
> beryl            awful/bad   bad/good      good
> galeon           good        bad/good      ok/good
> mp3              good        bad           good
> terminal         good        bad/good      good
> mouse            good        bad/good      good
>
> forkload x10
> beryl            good        good          great
> galeon           good        good          ok/good
> mp3              good        good          good
> terminal         good        good          ok/good
> mouse            good        good          good
>
> make -j 5
> beryl            ok          good          good/great
> galeon           good        good          ok/good
> mp3              good        good          good
> terminal         good        good          good
> mouse            good        good          good
>
> make -j 5 ccache
> beryl            ok          good          awful
> galeon           good        good          bad
> mp3              good        good          bad
> terminal         good        good          bad/ok
> mouse            good        good          bad/ok
>
> make -j 5
> real             8m1.857s    8m50.659s     8m9.282s
> user             7m19.127s   8m3.494s      7m30.740s
> sys              0m30.910s   0m33.722s     0m29.542s
>
> make -j 5 ccache
> real             2m6.182s    2m19.032s     2m1.832s
> user             1m39.466s   1m48.787s     1m37.250s
> sys              0m19.741s   0m22.993s     0m20.109s

Thanks very much for that comprehensive summary and testing!

> There's a substantial performance hit for not yield, so we probably want
> to investigate alternate semantics for it. It seems reasonable for apps
> to say "let me not hog the CPU" without completely expiring them.
>
> Imagine you're in the front of the line (aka queue) and you spend a
> moment fumbling for your wallet. The polite thing to do is to let the
> next guy in front. But with the current sched_yield, you go all the way
> to the back of the line.

Well... are you advocating we change sched_yield semantics to a gentler
form? This is a cinch to implement, but I know how Ingo feels about this.
It will only encourage more lax coding using sched_yield instead of proper
blocking (see huge arguments with the ldap people on this one, who insist
it's impossible not to use yield).

> RSDL makes most of the noyield hit back in normal make and then some
> with ccache. Impressive. But ccache is still destroying interactivity
> somehow. The ccache effect is fairly visible even with non-parallel
> 'make'.

Ok, I don't think there's any actual accounting problem here per se
(although I did just recently post a bugfix for rsdl, however I think
that's unrelated). What I think is going on in the ccache testcase is that
all the work is being offloaded to kernel threads reading/writing to/from
the filesystem, and the make is not getting any actual cpu time. This is
"worked around" in mainline thanks to the testing for sleeping on
uninterruptible sleep in the interactivity estimator. What I suspect is
happening is that kernel threads running at nice -5 are doing all the work
on make's behalf in the setting of ccache, since it is mostly i/o bound.
The reason for negative nice values on kernel threads is questionable
anyway. Can you try renicing your kernel threads all to nice 0 and see
what effect that has? Obviously this doesn't need a recompile, but it is
simple enough to implement in kthread code as a new default.
> Also note I could occasionally trigger nasty multi-second pauses with
> -mm2+noyield under exectest that didn't show up elsewhere. That's
> probably a bug in the mainline scheduler.

Ew. It's probably not a bug, but a good example of some of the starvation
scenarios we're hitting on mainline (hence the need for a rewrite ;))
sched rsdl fix for 0.28
Here's a big bugfix for sched rsdl 0.28.

---
 kernel/sched.c | 7 +++++++
 1 file changed, 7 insertions(+)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c	2007-03-11 11:04:38.000000000 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c	2007-03-11 11:05:46.000000000 +1100
@@ -3328,6 +3328,13 @@ static inline void rotate_runqueue_prior
 	int new_prio_level, remaining_quota = rq_quota(rq, rq->prio_level);
 	struct prio_array *array = rq->active;
 
+	/*
+	 * Make sure we don't have tasks still on the active array that
+	 * haven't run due to not preempting (merging or smp balancing)
+	 */
+	if (find_next_bit(rq->dyn_bitmap, MAX_PRIO, MAX_RT_PRIO) <
+	    rq->prio_level)
+		return;
 	if (rq->prio_level > MAX_PRIO - 2) {
 		/* Major rotation required */
 		struct prio_array *new_queue = rq->expired;
--
-ck
Re: RSDL v0.28 for 2.6.20
On Sunday 11 March 2007 06:11, Willy Tarreau wrote:
> On Sat, Mar 10, 2007 at 01:09:35PM -0500, Stephen Clark wrote:
> > Con Kolivas wrote:
> > > Here is an update for RSDL to version 0.28
> > >
> > > Full patch:
> > > http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.28.patch
> > >
> > > Series:
> > > http://ck.kolivas.org/patches/staircase-deadline/2.6.20/
> > >
> > > The patch to get you from 0.26 to 0.28:
> > > http://ck.kolivas.org/patches/staircase-deadline/2.6.20/sched-rsdl-0.26-0.28.patch
> > >
> > > A similar patch and directories will be made for 2.6.21-rc3 without
> > > further announcement
> >
> > doesn't apply against 2.6.20.2:
> >
> > patch -p1 < ~/2.6.20-sched-rsdl-0.28.patch --dry-run
> > patching file include/linux/list.h
> > patching file fs/proc/array.c
> > patching file fs/pipe.c
> > patching file include/linux/sched.h
> > patching file include/asm-generic/bitops/sched.h
> > patching file include/asm-s390/bitops.h
> > patching file kernel/sched.c
> > Hunk #41 FAILED at 3531.
> > 1 out of 62 hunks FAILED -- saving rejects to file kernel/sched.c.rej
> > patching file include/linux/init_task.h
> > patching file Documentation/sched-design.txt
>
> It is easier to apply 2.6.20.2 on top of 2.6.20+RSDL. The .2 patch
> is a one-liner that you can easily fix by hand, and I'm not even
> certain that it is still required:
>
> --- ./kernel/sched.c.orig	2007-03-10 13:03:51 +0100
> +++ ./kernel/sched.c	2007-03-10 13:08:02 +0100
> @@ -3544,7 +3544,7 @@
>  		next = list_entry(queue->next, struct task_struct, run_list);
>  	}
>
> -	if (dependent_sleeper(cpu, rq, next))
> +	if (rq->nr_running == 1 && dependent_sleeper(cpu, rq, next))
>  		next = rq->idle;
>  switch_tasks:
>  	if (next == rq->idle)
>
> BTW, Con, I think that you should base your work on 2.6.20.[23] and not
> 2.6.20 next time, due to this conflict. It will get wider adoption.

Gotcha. This bugfix for 2.6.20.2 was controversial anyway, so it probably
won't hurt if you don't apply it.

Has anyone had any trouble with RSDL on the stable kernels (ie not -mm)?

--
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 10:34, Con Kolivas wrote:
> On Sunday 11 March 2007 05:21, Mark Lord wrote:
> > Con Kolivas wrote:
> > > On Saturday 10 March 2007 05:07, Mark Lord wrote:
> > > > Mmm.. when it's good, it's *really* good.
> > > > My desktop feels snappier and all of that.
> > >
> > > ..
> > >
> > > > But when it's bad, it stinks.
> > > > Like when a "make -j2" kernel rebuild is happening in a background
> > > > window
> > >
> > > And that's bad. When you say "it stinks" is it more than 3 times
> > > slower? It should be precisely 3 times slower under that load
> > > (although low cpu using things like audio wont be affected by running
> > > 3 times slower). If it feels like much more than that much slower,
> > > there is a bug there somewhere.
> >
> > Scrolling windows is incredibly jerky, and very very sluggish
> > when images are involved (eg. a large web page in firefox).
> >
> > > As another reader suggested, how does it run with the compile
> > > 'niced'? How does it perform with make (without a -j number)?
> >
> > Yes, it behaves itself when the "make -j2" is nice'd.
> >
> > > > This is on a Pentium-M 760 single-core, w/2GB SDRAM (notebook).
> > >
> > > What HZ are you running? Are you running a Beryl desktop?
> >
> > HZ==1000, NO_HZ, Kubuntu Dapper Drake distro, ATI X300 open-source
> > X.org driver.
>
> Can you try the new version of RSDL? Assuming it doesn't oops on you, it
> has some accounting bugfixes which may have been biting you.

Oh, I just checked the mesa repo for that driver as well. It seems the r300
drivers have sched_yield in them as well, but not all components. You may be
getting bitten by this too.

http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r300/radeon_ioctl.c?revision=1.14&view=markup

I don't really know what the radeon and other models are, so I'm not sure
if it applies to your hardware; I just did a random search through the r300
directory.

--
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 05:21, Mark Lord wrote:
> Con Kolivas wrote:
> > On Saturday 10 March 2007 05:07, Mark Lord wrote:
> > > Mmm.. when it's good, it's *really* good.
> > > My desktop feels snappier and all of that.
> >
> > ..
> >
> > > But when it's bad, it stinks.
> > > Like when a "make -j2" kernel rebuild is happening in a background
> > > window
> >
> > And that's bad. When you say "it stinks" is it more than 3 times slower?
> > It should be precisely 3 times slower under that load (although low cpu
> > using things like audio wont be affected by running 3 times slower). If
> > it feels like much more than that much slower, there is a bug there
> > somewhere.
>
> Scrolling windows is incredibly jerky, and very very sluggish
> when images are involved (eg. a large web page in firefox).
>
> > As another reader suggested, how does it run with the compile 'niced'?
> > How does it perform with make (without a -j number)?
>
> Yes, it behaves itself when the "make -j2" is nice'd.
>
> > > This is on a Pentium-M 760 single-core, w/2GB SDRAM (notebook).
> >
> > What HZ are you running? Are you running a Beryl desktop?
>
> HZ==1000, NO_HZ, Kubuntu Dapper Drake distro, ATI X300 open-source X.org
> driver.

Can you try the new version of RSDL? Assuming it doesn't oops on you, it has
some accounting bugfixes which may have been biting you.

Thanks
--
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 04:01, James Cloos wrote:
> >>>>> "Con" == Con Kolivas <[EMAIL PROTECTED]> writes:
>
> Con> It's sad that sched_yield is still in our graphics card drivers ...
>
> I just did a recursive grep(1) on my mirror of the freedesktop git
> repos for sched_yield. This only checked the master branches as I
> did not bother to script up something to clone each, check out all
> branches in turn, and grep(1) each possibility.
>
> The output is just:
>
> :; grep -r sched_yield FDO/xorg
> FDO/xorg/xserver/hw/kdrive/via/viadraw.c:            sched_yield();
> FDO/xorg/driver/xf86-video-glint/src/pm2_video.c:    if (sync) /* sched_yield? */
>
> Is there something else I should grep(1) for? If not, it looks as
> if sched_yield(2) has been evicted from the drivers.

See:
http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r200/r200_ioctl.c?revision=1.37&view=markup

--
-ck
Re: [PATCH][RSDL-mm 0/6] Rotating Staircase DeadLine scheduler for -mm
On Sunday 11 March 2007 03:53, Nicolas Mailhot wrote:
> On Sunday 11 March 2007 at 01:03 +1100, Con Kolivas wrote:
> > On Saturday 10 March 2007 22:49, Nicolas Mailhot wrote:
> > > Oops
> > >
> > > ⇒ http://bugzilla.kernel.org/show_bug.cgi?id=8166
> >
> > Thanks very much. I can't get your config to boot on qemu, but could
> > you please try this debugging patch? It's not a patch you can really
> > run the machine with but might find where the problem occurs.
> > Specifically I'm looking for the warning MISSING STATIC BIT in your
> > case.
> >
> > http://ck.kolivas.org/patches/crap/sched-rsdl-0.28-stuff.patch
>
> I attached a screenshot of the patched kernel boot

Thanks. Darn, the debugging didn't catch anything. Did you see any BUG
during the boot earlier than that screenshot? Probably not.

If you have the time I would appreciate you testing 2.6.20 with the rsdl
0.28 patch for it, with a config as close to this -mm2 one as possible:

http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.28.patch

and see if the bug recurs please? Thanks!

--
-ck
Re: [PATCH][RSDL-mm 0/6] Rotating Staircase DeadLine scheduler for -mm
On Saturday 10 March 2007 22:49, Nicolas Mailhot wrote:
> Oops
>
> ⇒ http://bugzilla.kernel.org/show_bug.cgi?id=8166

Thanks very much. I can't get your config to boot on qemu, but could you
please try this debugging patch? It's not a patch you can really run the
machine with but might find where the problem occurs. Specifically I'm
looking for the warning MISSING STATIC BIT in your case.

http://ck.kolivas.org/patches/crap/sched-rsdl-0.28-stuff.patch

Thanks!

--
-ck
Re: [PATCH][RSDL-mm 0/6] Rotating Staircase DeadLine scheduler for -mm
On Saturday 10 March 2007 18:25, Con Kolivas wrote:
> What follows this email is a series of patches for the RSDL cpu scheduler
> as found in 2.6.21-rc3-mm1. This series is for 2.6.21-rc3-mm2 and has some
> bugfixes for the issues found so far. While it is not clear that I've
> attended to all the bugs, it is worth noting that a complete rewrite is a
> teensy bit more than a trivial change shall we say ;)
>
> akpm it still has trouble on that monster config of yours but there's so
> much else going on in this -mm I don't know what to make of it. getting
> ppc to work on qemu was more work than I thought and I haven't succeeded
> there yet either :|
>
> A rolled up patch can be found here:
> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl.patch
>
> Patch series here:
> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/
>
> and the patch series will follow. If time permits I will make available a
> newer version for the older kernels soon.
>
> Changelog:
> - Made it possible for the idle task to schedule on cpu_hotplug.
> - Fixed the accounting across fork()
> - Recalculation of priority for tasks that have already run this major
>   rotation and are now on the expired array was wrong. It has been
>   corrected.
> - The bitmap error that has been hit on some architectures was made more
>   verbose to make it clear that something has gone wrong, and will keep
>   going off if the problem persists.

Oh, and if it wasn't clear from this message: it still needs work to be bug
free and probably shouldn't be part of -mm till I've ironed out all the
oopsen. Thanks.

--
-ck
Re: [PATCH][RSDL-mm 0/6] Rotating Staircase DeadLine scheduler for -mm
On Saturday 10 March 2007 18:25, Con Kolivas wrote: What follows this email is a series of patches for the RSDL cpu scheduler as found in 2.6.21-rc3-mm1. This series is for 2.6.21-rc3-mm2 and has some bugfixes for the issues found so far. While it is not clear that I've attended to all the bugs, it is worth noting that a complete rewrite is a teensy bit more than a trivial change shall we say ;) akpm it still has trouble on that monster config of yours but there's so much else going on in this -mm I don't know what to make of it. getting ppc to work on qemu was more work than I thought and I haven't succeeded there yet either :| A rolled up patch can be found here: http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl.patch Patch series here: http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/ and the patch series will follow. If time permits I will make available a newer version for the older kernels soon. Changelog: - Made it possible for the idle task to schedule on cpu_hotplug. - Fixed the accounting across fork() - Recalculation of priority for tasks that have already run this major rotation and are now on the expired array was wrong. It has been corrected. - The bitmap error that has been hit on some architectures was made more verbose to make it clear that something has gone wrong, and will keep going off if the problem persists. Oh and if it wasn't clear from this message. It still needs work to be bug free and probably shouldn't be part of -mm till I've ironed out all the oopsen. Thanks. -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RSDL-mm 0/6] Rotating Staircase DeadLine scheduler for -mm
On Saturday 10 March 2007 22:49, Nicolas Mailhot wrote: Oops ⇒ http://bugzilla.kernel.org/show_bug.cgi?id=8166 Thanks very much. I can't get your config to boot on qemu, but could you please try this debugging patch? It's not a patch you can really run the machine with but might find where the problem occurs. Specifically I'm looking for the warning MISSING STATIC BIT in your case. http://ck.kolivas.org/patches/crap/sched-rsdl-0.28-stuff.patch Thanks! -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RSDL-mm 0/6] Rotating Staircase DeadLine scheduler for -mm
On Sunday 11 March 2007 03:53, Nicolas Mailhot wrote: Le dimanche 11 mars 2007 à 01:03 +1100, Con Kolivas a écrit : On Saturday 10 March 2007 22:49, Nicolas Mailhot wrote: Oops ⇒ http://bugzilla.kernel.org/show_bug.cgi?id=8166 Thanks very much. I can't get your config to boot on qemu, but could you please try this debugging patch? It's not a patch you can really run the machine with but might find where the problem occurs. Specifically I'm looking for the warning MISSING STATIC BIT in your case. http://ck.kolivas.org/patches/crap/sched-rsdl-0.28-stuff.patch I attached a screenshot of the patched kernel boot Thanks. Darn the debugging didn't catch anything. Did you see any BUG during the boot earlier than that screenshot? Probably not. If you have the time I would appreciate you testing 2.6.20 with the rsdl 0.28 patch for it with a config as close to this -mm2 one as possible. http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.28.patch and see if the bug recurs please? Thanks! -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 04:01, James Cloos wrote: Con == Con Kolivas [EMAIL PROTECTED] writes: Con It's sad that sched_yield is still in our graphics card drivers ... I just did a recursive grep(1) on my mirror of the freedesktop git repos for sched_yield. This only checked the master branches as I did not bother to script up something to clone each, check out all branches in turn, and grep(1) each possibility. The output is just: :; grep -r sched_yield FDO/xorg FDO/xorg/xserver/hw/kdrive/via/viadraw.c: sched_yield(); FDO/xorg/driver/xf86-video-glint/src/pm2_video.c:if (sync) /* sched_yield? */ Is there something else I should grep(1) for? If not, it looks as if sched_yield(2) has been evicted from the drivers. See: http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r200/r200_ioctl.c?revision=1.37view=markup -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 05:21, Mark Lord wrote: Con Kolivas wrote: On Saturday 10 March 2007 05:07, Mark Lord wrote: Mmm.. when it's good, it's *really* good. My desktop feels snappier and all of that. .. But when it's bad, it stinks. Like when a make -j2 kernel rebuild is happening in a background window And that's bad. When you say it stinks is it more than 3 times slower? It should be precisely 3 times slower under that load (although low cpu using things like audio wont be affected by running 3 times slower). If it feels like much more than that much slower, there is a bug there somewhere. Scrolling windows is incredibly jerkey, and very very sluggish when images are involved (eg. a large web page in firefox). As another reader suggested, how does it run with the compile 'niced'? How does it perform with make (without a -j number). Yes, it behaves itself when the make -j2 is nice'd. This is on a Pentium-M 760 single-core, w/2GB SDRAM (notebook). What HZ are you running? Are you running a Beryl desktop? HZ==1000, NO_HZ, Kubunutu Dapper Drake distro, ATI X300 open-source X.org driver. Can you try the new version of RSDL. Assuming it doesn't oops on you it has some accounting bugfixes which may have been biting you. Thanks -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 10:34, Con Kolivas wrote: On Sunday 11 March 2007 05:21, Mark Lord wrote: Con Kolivas wrote: On Saturday 10 March 2007 05:07, Mark Lord wrote: Mmm.. when it's good, it's *really* good. My desktop feels snappier and all of that. .. But when it's bad, it stinks. Like when a make -j2 kernel rebuild is happening in a background window And that's bad. When you say it stinks is it more than 3 times slower? It should be precisely 3 times slower under that load (although low cpu using things like audio wont be affected by running 3 times slower). If it feels like much more than that much slower, there is a bug there somewhere. Scrolling windows is incredibly jerkey, and very very sluggish when images are involved (eg. a large web page in firefox). As another reader suggested, how does it run with the compile 'niced'? How does it perform with make (without a -j number). Yes, it behaves itself when the make -j2 is nice'd. This is on a Pentium-M 760 single-core, w/2GB SDRAM (notebook). What HZ are you running? Are you running a Beryl desktop? HZ==1000, NO_HZ, Kubunutu Dapper Drake distro, ATI X300 open-source X.org driver. Can you try the new version of RSDL. Assuming it doesn't oops on you it has some accounting bugfixes which may have been biting you. Oh I just checked the mesa repo for that driver as well. It seems the r300 drivers have sched_yield in them as well, but not all components. You may be getting bitten by this too. http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r300/radeon_ioctl.c?revision=1.14view=markup I don't really know what the radeon and other models are so I'm not sure if it applies to your hardware; I just did a random search through the r300 directory. -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.28 for 2.6.20
On Sunday 11 March 2007 06:11, Willy Tarreau wrote: On Sat, Mar 10, 2007 at 01:09:35PM -0500, Stephen Clark wrote: Con Kolivas wrote: Here is an update for RSDL to version 0.28 Full patch: http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.28. patch Series: http://ck.kolivas.org/patches/staircase-deadline/2.6.20/ The patch to get you from 0.26 to 0.28: http://ck.kolivas.org/patches/staircase-deadline/2.6.20/sched-rsdl-0.26- 0.28.patch A similar patch and directories will be made for 2.6.21-rc3 without further announcement doesn't apply against 2.6.20.2: patch -p1 ~/2.6.20-sched-rsdl-0.28.patch --dry-run patching file include/linux/list.h patching file fs/proc/array.c patching file fs/pipe.c patching file include/linux/sched.h patching file include/asm-generic/bitops/sched.h patching file include/asm-s390/bitops.h patching file kernel/sched.c Hunk #41 FAILED at 3531. 1 out of 62 hunks FAILED -- saving rejects to file kernel/sched.c.rej patching file include/linux/init_task.h patching file Documentation/sched-design.txt It is easier to apply 2.6.20.2 on top of 2.6.20+RSDL. The .2 patch is a one-liner that you can easily fix by hand, and I'm not even certain that it is still required : --- ./kernel/sched.c.orig 2007-03-10 13:03:51 +0100 +++ ./kernel/sched.c 2007-03-10 13:08:02 +0100 @@ -3544,7 +3544,7 @@ next = list_entry(queue-next, struct task_struct, run_list); } - if (dependent_sleeper(cpu, rq, next)) + if (rq-nr_running == 1 dependent_sleeper(cpu, rq, next)) next = rq-idle; switch_tasks: if (next == rq-idle) BTW, Con, I think that you should base your work on 2.6.20.[23] and not 2.6.20 next time, due to this conflict. It will get wider adoption. Gotcha. This bugfix for 2.6.20.2 was controversial anyway so it probably wont hurt if you dont apply it. Has anyone had any trouble with RSDL on the stable kernels (ie not -mm)? 
-- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
sched rsdl fix for 0.28
Here's a big bugfix for sched rsdl 0.28 --- kernel/sched.c |7 +++ 1 file changed, 7 insertions(+) Index: linux-2.6.21-rc3-mm2/kernel/sched.c === --- linux-2.6.21-rc3-mm2.orig/kernel/sched.c2007-03-11 11:04:38.0 +1100 +++ linux-2.6.21-rc3-mm2/kernel/sched.c 2007-03-11 11:05:46.0 +1100 @@ -3328,6 +3328,13 @@ static inline void rotate_runqueue_prior int new_prio_level, remaining_quota = rq_quota(rq, rq-prio_level); struct prio_array *array = rq-active; + /* +* Make sure we don't have tasks still on the active array that +* haven't run due to not preempting (merging or smp balancing) +*/ + if (find_next_bit(rq-dyn_bitmap, MAX_PRIO, MAX_RT_PRIO) + rq-prio_level) + return; if (rq-prio_level MAX_PRIO - 2) { /* Major rotation required */ struct prio_array *new_queue = rq-expired; -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL-mm 0.28
On 11/03/07, Matt Mackall [EMAIL PROTECTED] wrote: I've tested -mm2 against -mm2+noyield and -mm2+rsdl+noyield. The noyield patch simply makes the sched_yield syscall return immediately. Xorg and all tests are run at nice 0. Loads: memload: constant memcpy of 16MB buffer execload: constant re-exec of a trivial shell script forkload: constant fork and exit of a trivial shell script make -j 5: hot-cache kernel build without ccache make -j 5 ccache: hot-cache kernel build with ccache Tests: beryl - 3D window manager, wiggle windows, spin desktop, etc. galeon - web browser, rapidly scrolling long web pages by grabbing the scroll bar mp3 - XMMS on a FUSE sshfs over wireless (during all tests) terminal - responsiveness of ssh and local terminal sessions mouse - responsiveness of mouse pointer Results: great = completely smooth good = fully responsive ok = visible latency bad = becomes difficult to use (or mp3 skips) awful = make it stop, please -mm2-mm2+noyield rsdl+noyield no load berylgreat great great galeon goodgood good mp3 goodgood good terminal goodgood good mousegoodgood good memload x10 berylawful/bad great good galeon goodgood ok/good mp3 goodgood good terminal goodgood good mousegoodgood good execload x10 berylawful/bad bad/good good galeon goodbad/good ok/good mp3 goodbadgood terminal goodbad/good good mousegoodbad/good good forkload x10 berylgoodgood great galeon goodgood ok/good mp3 goodgood good terminal goodgood ok/good mousegoodgood good make -j 5 berylok good good/great galeon goodgood ok/good mp3 goodgood good terminal goodgood good mousegoodgood good make -j 5 ccache berylok good awful galeon goodgood bad mp3 goodgood bad terminal goodgood bad/ok mousegoodgood bad/ok make -j 5 real 8m1.857s8m50.659s 8m9.282s user 7m19.127s 8m3.494s 7m30.740s sys 0m30.910s 0m33.722s 0m29.542s make -j 5 ccache real 2m6.182s2m19.032s 2m1.832s user 1m39.466s 1m48.787s 1m37.250s sys 0m19.741s 0m22.993s 0m20.109s Thanks very much for that comprehensive summary and testing! 
There's a substantial performance hit for noyield, so we probably want to investigate alternate semantics for it. It seems reasonable for apps to say "let me not hog the CPU" without completely expiring them. Imagine you're at the front of the line (aka queue) and you spend a moment fumbling for your wallet. The polite thing to do is to let the next guy in front. But with the current sched_yield, you go all the way to the back of the line. Well... are you advocating we change sched_yield semantics to a gentler form? This is a cinch to implement but I know how Ingo feels about this. It will only encourage more lax coding using sched_yield instead of proper blocking (see the huge arguments with the ldap people on this one, who insist it's impossible not to use yield). RSDL makes back most of the noyield hit in the normal make, and then some with ccache. Impressive. But ccache is still destroying interactivity somehow. The ccache effect is fairly visible even with a non-parallel 'make'. Ok, I don't think there's any actual accounting problem here per se (although I did just recently post a bugfix for rsdl, I think that's unrelated). What I think is going on in the ccache testcase is that all the work is being offloaded to kernel threads reading/writing to/from the filesystem, and the make is not getting any actual cpu time. This is worked around in mainline thanks to the crediting of uninterruptible sleep in the interactivity estimator. What I suspect is happening is that kernel threads running at nice -5 are doing all the work on make's behalf in the ccache setting, since it is mostly i/o bound. The rationale for negative nice values on kernel threads is questionable anyway. Can you try renicing your kernel threads all to nice 0 and see what effect that has? Obviously this doesn't need a recompile, and it is simple enough to implement in the kthread code as a new default.
Also note I could occasionally trigger nasty multi-second pauses with -mm2+noyield under exectest that didn't show up elsewhere. That's probably a bug in the mainline scheduler. Ew. It's probably not a bug but a good example of some of the starvation scenarios we're hitting on mainline (hence the need for a rewrite ;)) Thanks! ---
Re: RSDL-mm 0.28
On Sunday 11 March 2007 14:16, Andrew Morton wrote: On Sun, 11 Mar 2007 13:28:22 +1100 Con Kolivas [EMAIL PROTECTED] wrote: Well... are you advocating we change sched_yield semantics to a gentler form? From a practical POV: our present yield() behaviour is so truly awful that it's basically always a bug to use it. This probably isn't a good thing. So yes, I do think that we should have a rethink and try to come up with behaviour which is more in accord with what application developers expect yield() to do. otoh, a) we should have done this five years ago. Instead, we've spent that time training userspace programmers to not use yield(), so perhaps there's little to be gained in changing it now. b) if we _were_ to change yield(), people would use it more, and their applications would of course suck bigtime when run on earlier 2.6 kernels. Bottom line: we've had a _lot_ of problems with the new yield() semantics. We effectively broke back-compatibility by changing its behaviour a lot, and we can't really turn around and blame application developers for that. So... I would take it that's a yes for a recommendation with respect to implementing a new yield() ? A new scheduler is as good a time as any to do it. -- -ck - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL-mm 0.28
On Sunday 11 March 2007 14:39, Andrew Morton wrote: On Sun, 11 Mar 2007 14:59:28 +1100 Con Kolivas [EMAIL PROTECTED] wrote: Bottom line: we've had a _lot_ of problems with the new yield() semantics. We effectively broke back-compatibility by changing its behaviour a lot, and we can't really turn around and blame application developers for that. So... I would take it that's a yes for a recommendation with respect to implementing a new yield()? A new scheduler is as good a time as any to do it. I guess so. We'd, err, need to gather Ingo's input ;) cc'ed. Don't you hate timezones? Perhaps a suitable way of doing this would be to characterise then emulate the 2.4 behaviour. As long as it turns out to be vaguely sensible. It's really very simple. We just go to the end of the queue at the current priority on the same array instead of swapping to the expired array; ie we do what realtime tasks currently do. It works fine here locally afaict. -- -ck
[PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
What follows this email is a patch series for the latest version of the RSDL cpu scheduler (ie v0.29). I have addressed all bugs that I am able to reproduce in this version, so if some people would be kind enough to test whether there are any hidden bugs or oopses lurking, it would be nice to know in anticipation of putting this back in -mm. Thanks. Full patch for 2.6.21-rc3-mm2: http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch Patch series (which will follow this email): http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/ Changelog: - Fixed the longstanding buggy bitmap problem which occurred due to swapping arrays when there were still tasks on the active array. - Fixed preemption of realtime tasks when rt prio inheritance elevated their priority. - Made kernel threads not be reniced to -5 by default. - Changed sched_yield behaviour of SCHED_NORMAL (SCHED_OTHER) to resemble realtime task yielding. -- -ck
[PATCH][RSDL-mm 1/7] lists: add list splice tail
From: Con Kolivas [EMAIL PROTECTED] Add a list_splice_tail variant of list_splice. Patch-by: Peter Zijlstra [EMAIL PROTECTED] Signed-off-by: Con Kolivas [EMAIL PROTECTED] Cc: Ingo Molnar [EMAIL PROTECTED] Cc: Nick Piggin [EMAIL PROTECTED] Cc: Siddha, Suresh B [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- include/linux/list.h | 42 ++ 1 file changed, 42 insertions(+) Index: linux-2.6.21-rc3-mm2/include/linux/list.h === --- linux-2.6.21-rc3-mm2.orig/include/linux/list.h 2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/include/linux/list.h 2007-03-11 14:47:59.0 +1100 @@ -333,6 +333,20 @@ static inline void __list_splice(struct at->prev = last; } +static inline void __list_splice_tail(struct list_head *list, + struct list_head *head) +{ + struct list_head *first = list->next; + struct list_head *last = list->prev; + struct list_head *at = head->prev; + + first->prev = at; + at->next = first; + + last->next = head; + head->prev = last; +} + /** * list_splice - join two lists * @list: the new list to add. @@ -345,6 +359,18 @@ static inline void list_splice(struct li } /** + * list_splice_tail - join two lists at one's tail + * @list: the new list to add. + * @head: the place to add it in the first list. + */ +static inline void list_splice_tail(struct list_head *list, + struct list_head *head) +{ + if (!list_empty(list)) + __list_splice_tail(list, head); +} + +/** * list_splice_init - join two lists and reinitialise the emptied list. * @list: the new list to add. * @head: the place to add it in the first list. @@ -417,6 +443,22 @@ static inline void list_splice_init_rcu( } /** + * list_splice_tail_init - join 2 lists at one's tail & reinitialise emptied + * @list: the new list to add. + * @head: the place to add it in the first list.
+ * + * The list at @list is reinitialised + */ +static inline void list_splice_tail_init(struct list_head *list, +struct list_head *head) +{ + if (!list_empty(list)) { + __list_splice_tail(list, head); + INIT_LIST_HEAD(list); + } +} + +/** * list_entry - get the struct for this entry * @ptr: the struct list_head pointer. * @type: the type of the struct this is embedded in. -- -ck
[PATCH][RSDL-mm 2/7] sched: remove sleepavg from proc
From: Con Kolivas [EMAIL PROTECTED] Remove the sleep_avg field from proc output as it will be removed from the task_struct. Signed-off-by: Con Kolivas [EMAIL PROTECTED] Cc: Ingo Molnar [EMAIL PROTECTED] Cc: Nick Piggin [EMAIL PROTECTED] Cc: Siddha, Suresh B [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- fs/proc/array.c |2 -- 1 file changed, 2 deletions(-) Index: linux-2.6.21-rc3-mm2/fs/proc/array.c === --- linux-2.6.21-rc3-mm2.orig/fs/proc/array.c 2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/fs/proc/array.c 2007-03-11 14:47:59.0 +1100 @@ -171,7 +171,6 @@ static inline char * task_state(struct t buffer += sprintf(buffer, "State:\t%s\n" - "SleepAVG:\t%lu%%\n" "Tgid:\t%d\n" "Pid:\t%d\n" "PPid:\t%d\n" @@ -179,7 +178,6 @@ static inline char * task_state(struct t "Uid:\t%d\t%d\t%d\t%d\n" "Gid:\t%d\t%d\t%d\t%d\n", get_task_state(p), - (p->sleep_avg/1024)*100/(102000/1024), p->tgid, p->pid, pid_alive(p) ? rcu_dereference(p->parent)->tgid : 0, tracer_pid, -- -ck
[PATCH][RSDL-mm 3/7] sched: remove noninteractive flag
From: Con Kolivas [EMAIL PROTECTED] Remove the TASK_NONINTERACTIVE flag as it will no longer be used. Signed-off-by: Con Kolivas [EMAIL PROTECTED] Cc: Ingo Molnar [EMAIL PROTECTED] Cc: Nick Piggin [EMAIL PROTECTED] Cc: Siddha, Suresh B [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- fs/pipe.c |7 +-- include/linux/sched.h |3 +-- 2 files changed, 2 insertions(+), 8 deletions(-) Index: linux-2.6.21-rc3-mm2/fs/pipe.c === --- linux-2.6.21-rc3-mm2.orig/fs/pipe.c 2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/fs/pipe.c 2007-03-11 14:47:59.0 +1100 @@ -41,12 +41,7 @@ void pipe_wait(struct pipe_inode_info *p { DEFINE_WAIT(wait); - /* -* Pipes are system-local resources, so sleeping on them -* is considered a noninteractive wait: -*/ - prepare_to_wait(&pipe->wait, &wait, - TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE); + prepare_to_wait(&pipe->wait, &wait, TASK_INTERRUPTIBLE); if (pipe->inode) mutex_unlock(&pipe->inode->i_mutex); schedule(); Index: linux-2.6.21-rc3-mm2/include/linux/sched.h === --- linux-2.6.21-rc3-mm2.orig/include/linux/sched.h 2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/include/linux/sched.h 2007-03-11 14:47:59.0 +1100 @@ -150,8 +150,7 @@ extern unsigned long weighted_cpuload(co #define EXIT_ZOMBIE16 #define EXIT_DEAD 32 /* in tsk->state again */ -#define TASK_NONINTERACTIVE64 -#define TASK_DEAD 128 +#define TASK_DEAD 64 #define __set_task_state(tsk, state_value) \ do { (tsk)->state = (state_value); } while (0) -- -ck
[PATCH][RSDL-mm 4/7] sched: implement 180 bit sched bitmap
From: Con Kolivas [EMAIL PROTECTED] Modify the sched_find_first_bit function to work on a 180bit long bitmap. Signed-off-by: Con Kolivas [EMAIL PROTECTED] Cc: Ingo Molnar [EMAIL PROTECTED] Cc: Nick Piggin [EMAIL PROTECTED] Cc: Siddha, Suresh B [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- include/asm-generic/bitops/sched.h | 10 ++ include/asm-s390/bitops.h | 12 +--- 2 files changed, 7 insertions(+), 15 deletions(-) Index: linux-2.6.21-rc3-mm2/include/asm-generic/bitops/sched.h === --- linux-2.6.21-rc3-mm2.orig/include/asm-generic/bitops/sched.h 2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/include/asm-generic/bitops/sched.h 2007-03-11 14:47:59.0 +1100 @@ -6,8 +6,8 @@ /* * Every architecture must define this function. It's the fastest - * way of searching a 140-bit bitmap where the first 100 bits are - * unlikely to be set. It's guaranteed that at least one of the 140 + * way of searching a 180-bit bitmap where the first 100 bits are + * unlikely to be set. It's guaranteed that at least one of the 180 * bits is cleared. 
*/ static inline int sched_find_first_bit(const unsigned long *b) @@ -15,7 +15,7 @@ static inline int sched_find_first_bit(c #if BITS_PER_LONG == 64 if (unlikely(b[0])) return __ffs(b[0]); - if (likely(b[1])) + if (b[1]) return __ffs(b[1]) + 64; return __ffs(b[2]) + 128; #elif BITS_PER_LONG == 32 @@ -27,7 +27,9 @@ static inline int sched_find_first_bit(c return __ffs(b[2]) + 64; if (b[3]) return __ffs(b[3]) + 96; - return __ffs(b[4]) + 128; + if (b[4]) + return __ffs(b[4]) + 128; + return __ffs(b[5]) + 160; #else #error BITS_PER_LONG not defined #endif Index: linux-2.6.21-rc3-mm2/include/asm-s390/bitops.h === --- linux-2.6.21-rc3-mm2.orig/include/asm-s390/bitops.h 2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/include/asm-s390/bitops.h 2007-03-11 14:47:59.0 +1100 @@ -729,17 +729,7 @@ find_next_bit (const unsigned long * add return offset + find_first_bit(p, size); } -/* - * Every architecture must define this function. It's the fastest - * way of searching a 140-bit bitmap where the first 100 bits are - * unlikely to be set. It's guaranteed that at least one of the 140 - * bits is cleared. - */ -static inline int sched_find_first_bit(unsigned long *b) -{ - return find_first_bit(b, 140); -} - +#include <asm-generic/bitops/sched.h> #include <asm-generic/bitops/ffs.h> #include <asm-generic/bitops/fls.h> -- -ck
[PATCH][RSDL-mm 5/7] sched dont renice kernel threads
The practice of renicing kernel threads to negative nice values is of questionable benefit at best, and at worst leads to larger latencies when kernel threads are busy on behalf of other tasks. Signed-off-by: Con Kolivas [EMAIL PROTECTED] --- kernel/workqueue.c |1 - 1 file changed, 1 deletion(-) Index: linux-2.6.21-rc3-mm2/kernel/workqueue.c === --- linux-2.6.21-rc3-mm2.orig/kernel/workqueue.c 2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/kernel/workqueue.c 2007-03-11 14:47:59.0 +1100 @@ -294,7 +294,6 @@ static int worker_thread(void *__cwq) if (!cwq->wq->freezeable) current->flags |= PF_NOFREEZE; - set_user_nice(current, -5); /* * We inherited MPOL_INTERLEAVE from the booting kernel. * Set MPOL_DEFAULT to insure node local allocations. -- -ck
[PATCH][RSDL-mm 7/7] sched: document rsdl cpu scheduler
From: Con Kolivas [EMAIL PROTECTED] Add comprehensive documentation of the RSDL cpu scheduler design. Signed-off-by: Con Kolivas [EMAIL PROTECTED] Cc: Ingo Molnar [EMAIL PROTECTED] Cc: Nick Piggin [EMAIL PROTECTED] Cc: Siddha, Suresh B [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- Documentation/sched-design.txt | 273 - 1 file changed, 267 insertions(+), 6 deletions(-) Index: linux-2.6.21-rc3-mm2/Documentation/sched-design.txt === --- linux-2.6.21-rc3-mm2.orig/Documentation/sched-design.txt2007-03-11 14:47:57.0 +1100 +++ linux-2.6.21-rc3-mm2/Documentation/sched-design.txt 2007-03-11 14:48:00.0 +1100 @@ -1,11 +1,14 @@ - Goals, Design and Implementation of the - new ultra-scalable O(1) scheduler + Goals, Design and Implementation of the ultra-scalable O(1) scheduler by + Ingo Molnar and the Rotating Staircase Deadline cpu scheduler policy + designed by Con Kolivas. - This is an edited version of an email Ingo Molnar sent to - lkml on 4 Jan 2002. It describes the goals, design, and - implementation of Ingo's new ultra-scalable O(1) scheduler. - Last Updated: 18 April 2002. + This was originally an edited version of an email Ingo Molnar sent to + lkml on 4 Jan 2002. It describes the goals, design, and implementation + of Ingo's ultra-scalable O(1) scheduler. It now contains a description + of the Rotating Staircase Deadline priority scheduler that was built on + this design. + Last Updated: Sun Feb 25 2007 Goal @@ -163,3 +166,261 @@ certain code paths and data constructs. code is smaller than the old one. Ingo + + +Rotating Staircase Deadline cpu scheduler policy + + +Design summary +== + +A novel design which incorporates a foreground-background descending priority +system (the staircase) with runqueue managed minor and major epochs (rotation +and deadline). + + +Features + + +A starvation free, strict fairness O(1) scalable design with interactivity +as good as the above restrictions can provide. 
There is no interactivity +estimator, no sleep/run measurements and only simple fixed accounting. +The design has strict enough a design and accounting that task behaviour +can be modelled and maximum scheduling latencies can be predicted by +the virtual deadline mechanism that manages runqueues. The prime concern +in this design is to maintain fairness at all costs determined by nice level, +yet to maintain as good interactivity as can be allowed within the +constraints of strict fairness. + + +Design description +== + +RSDL works off the principle of providing each task a quota of runtime that +it is allowed to run at each priority level equal to its static priority +(ie. its nice level) and every priority below that. When each task is queued, +the cpu that it is queued onto also keeps a record of that quota. If the +task uses up its quota it is decremented one priority level. Also, if the cpu +notices a quota full has been used for that priority level, it pushes +everything remaining at that priority level to the next lowest priority +level. Once every runtime quota has been consumed of every priority level, +a task is queued on the "expired" array. When no other tasks exist with +quota, the expired array is activated and fresh quotas are handed out. This +is all done in O(1). + + +Design details +== + +Each cpu has its own runqueue which micromanages its own epochs, and each +task keeps a record of its own entitlement of cpu time. Most of the rest +of these details apply to non-realtime tasks as rt task management is +straight forward. + +Each runqueue keeps a record of what major epoch it is up to in the +rq->prio_rotation field which is incremented on each major epoch. It also +keeps a record of quota available to each priority value valid for that +major epoch in rq->prio_quota[]. + +Each task keeps a record of what major runqueue epoch it was last running +on in p->rotation.
It also keeps a record of what priority levels it has +already been allocated quota from during this epoch in a bitmap p->bitmap. + +The only tunable that determines all other details is the RR_INTERVAL. This +is set to 6ms (minimum on 1000HZ, higher at different HZ values). + +All tasks are initially given a quota based on RR_INTERVAL. This is equal to +RR_INTERVAL between nice values of 0 and 19, and progressively larger for +nice values from -1 to -20. This is assigned to p->quota and only changes +with changes in nice level. + +As a task is first queued, it checks in recalc_task_prio to see if it has +run at this runqueue's current priority rotation. If it has not, it will +have its p->prio level set to equal its p->static_prio (nice level) and will +be given a p->time_slice equal to the p->quota, and has its allocation +bitmap
Re: RSDL-mm 0.28
On Sunday 11 March 2007 15:03, Matt Mackall wrote: On Sat, Mar 10, 2007 at 10:01:32PM -0600, Matt Mackall wrote: On Sun, Mar 11, 2007 at 01:28:22PM +1100, Con Kolivas wrote: Ok I don't think there's any actual accounting problem here per se (although I did just recently post a bugfix for rsdl however I think that's unrelated). What I think is going on in the ccache testcase is that all the work is being offloaded to kernel threads reading/writing to/from the filesystem and the make is not getting any actual cpu time. I don't see significant system time while this is happening. Also, it's running pretty much entirely out of page cache so there wouldn't be a whole lot for kernel threads to do. Well I can't reproduce that behaviour here at all whether from disk or the pagecache with ccache, so I'm not entirely sure what's different at your end. However both you and the other person reporting bad behaviour were using ATI drivers. That's about the only commonality? I wonder if they do need to yield... somewhat instead of not at all. -- -ck
RSDL v0.28 for 2.6.20
Here is an update for RSDL to version 0.28 Full patch: http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.28.patch Series: http://ck.kolivas.org/patches/staircase-deadline/2.6.20/ The patch to get you from 0.26 to 0.28: http://ck.kolivas.org/patches/staircase-deadline/2.6.20/sched-rsdl-0.26-0.28.patch A similar patch and directories will be made for 2.6.21-rc3 without further announcement -- -ck
[PATCH][RSDL-mm 6/6] sched: document rsdl cpu scheduler
From: Con Kolivas <[EMAIL PROTECTED]> Add comprehensive documentation of the RSDL cpu scheduler design. Signed-off-by: Con Kolivas <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> Cc: Nick Piggin <[EMAIL PROTECTED]> Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- Documentation/sched-design.txt | 273 ++- 1 files changed, 267 insertions(+), 6 deletions(-) diff -puN Documentation/sched-design.txt~sched-document-rsdl-cpu-scheduler Documentation/sched-design.txt --- a/Documentation/sched-design.txt~sched-document-rsdl-cpu-scheduler +++ a/Documentation/sched-design.txt @@ -1,11 +1,14 @@ - Goals, Design and Implementation of the - new ultra-scalable O(1) scheduler + Goals, Design and Implementation of the ultra-scalable O(1) scheduler by + Ingo Molnar and the Rotating Staircase Deadline cpu scheduler policy + designed by Con Kolivas. - This is an edited version of an email Ingo Molnar sent to - lkml on 4 Jan 2002. It describes the goals, design, and - implementation of Ingo's new ultra-scalable O(1) scheduler. - Last Updated: 18 April 2002. + This was originally an edited version of an email Ingo Molnar sent to + lkml on 4 Jan 2002. It describes the goals, design, and implementation + of Ingo's ultra-scalable O(1) scheduler. It now contains a description + of the Rotating Staircase Deadline priority scheduler that was built on + this design. + Last Updated: Sun Feb 25 2007 Goal @@ -163,3 +166,261 @@ certain code paths and data constructs. code is smaller than the old one. Ingo + + +Rotating Staircase Deadline cpu scheduler policy + + +Design summary +== + +A novel design which incorporates a foreground-background descending priority +system (the staircase) with runqueue managed minor and major epochs (rotation +and deadline). + + +Features + + +A starvation free, strict fairness O(1) scalable design with interactivity +as good as the above restrictions can provide. 
There is no interactivity +estimator, no sleep/run measurements and only simple fixed accounting. +The design has strict enough a design and accounting that task behaviour +can be modelled and maximum scheduling latencies can be predicted by +the virtual deadline mechanism that manages runqueues. The prime concern +in this design is to maintain fairness at all costs determined by nice level, +yet to maintain as good interactivity as can be allowed within the +constraints of strict fairness. + + +Design description +== + +RSDL works off the principle of providing each task a quota of runtime that +it is allowed to run at each priority level equal to its static priority +(ie. its nice level) and every priority below that. When each task is queued, +the cpu that it is queued onto also keeps a record of that quota. If the +task uses up its quota it is decremented one priority level. Also, if the cpu +notices a quota full has been used for that priority level, it pushes +everything remaining at that priority level to the next lowest priority +level. Once every runtime quota has been consumed of every priority level, +a task is queued on the "expired" array. When no other tasks exist with +quota, the expired array is activated and fresh quotas are handed out. This +is all done in O(1). + + +Design details +== + +Each cpu has its own runqueue which micromanages its own epochs, and each +task keeps a record of its own entitlement of cpu time. Most of the rest +of these details apply to non-realtime tasks as rt task management is +straight forward. + +Each runqueue keeps a record of what major epoch it is up to in the +rq->prio_rotation field which is incremented on each major epoch. It also +keeps a record of quota available to each priority value valid for that +major epoch in rq->prio_quota[]. + +Each task keeps a record of what major runqueue epoch it was last running +on in p->rotation. 
It also keeps a record of what priority levels it has +already been allocated quota from during this epoch in a bitmap p->bitmap. + +The only tunable that determines all other details is the RR_INTERVAL. This +is set to 6ms (minimum on 1000HZ, higher at different HZ values). + +All tasks are initially given a quota based on RR_INTERVAL. This is equal to +RR_INTERVAL between nice values of 0 and 19, and progressively larger for +nice values from -1 to -20. This is assigned to p->quota and only changes +with changes in nice level. + +As a task is first queued, it checks in recalc_task_prio to see if it has +run at this runqueue's current priority rotation. If it has not, it will +have its p->prio level set to equal its p->static_prio (nice level) and will +be given a p->time_slice equal to the p->quota, and has its allocation +bitmap bit se
[PATCH][RSDL-mm 4/6] sched implement 180 bit sched bitmap
From: Con Kolivas <[EMAIL PROTECTED]> Modify the sched_find_first_bit function to work on a 180bit long bitmap. Signed-off-by: Con Kolivas <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> Cc: Nick Piggin <[EMAIL PROTECTED]> Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- include/asm-generic/bitops/sched.h | 10 ++ include/asm-s390/bitops.h | 12 +--- 2 files changed, 7 insertions(+), 15 deletions(-) Index: linux-2.6.21-rc3-mm2/include/asm-generic/bitops/sched.h === --- linux-2.6.21-rc3-mm2.orig/include/asm-generic/bitops/sched.h 2006-11-30 11:30:41.0 +1100 +++ linux-2.6.21-rc3-mm2/include/asm-generic/bitops/sched.h 2007-03-10 13:54:18.0 +1100 @@ -6,8 +6,8 @@ /* * Every architecture must define this function. It's the fastest - * way of searching a 140-bit bitmap where the first 100 bits are - * unlikely to be set. It's guaranteed that at least one of the 140 + * way of searching a 180-bit bitmap where the first 100 bits are + * unlikely to be set. It's guaranteed that at least one of the 180 * bits is cleared. 
*/ static inline int sched_find_first_bit(const unsigned long *b) @@ -15,7 +15,7 @@ static inline int sched_find_first_bit(c #if BITS_PER_LONG == 64 if (unlikely(b[0])) return __ffs(b[0]); - if (likely(b[1])) + if (b[1]) return __ffs(b[1]) + 64; return __ffs(b[2]) + 128; #elif BITS_PER_LONG == 32 @@ -27,7 +27,9 @@ static inline int sched_find_first_bit(c return __ffs(b[2]) + 64; if (b[3]) return __ffs(b[3]) + 96; - return __ffs(b[4]) + 128; + if (b[4]) + return __ffs(b[4]) + 128; + return __ffs(b[5]) + 160; #else #error BITS_PER_LONG not defined #endif Index: linux-2.6.21-rc3-mm2/include/asm-s390/bitops.h === --- linux-2.6.21-rc3-mm2.orig/include/asm-s390/bitops.h 2006-11-30 11:30:41.0 +1100 +++ linux-2.6.21-rc3-mm2/include/asm-s390/bitops.h 2007-03-10 13:54:18.0 +1100 @@ -729,17 +729,7 @@ find_next_bit (const unsigned long * add return offset + find_first_bit(p, size); } -/* - * Every architecture must define this function. It's the fastest - * way of searching a 140-bit bitmap where the first 100 bits are - * unlikely to be set. It's guaranteed that at least one of the 140 - * bits is cleared. - */ -static inline int sched_find_first_bit(unsigned long *b) -{ - return find_first_bit(b, 140); -} - +#include <asm-generic/bitops/sched.h> #include <asm-generic/bitops/ffs.h> #include <asm-generic/bitops/fls.h> -- -ck
[PATCH][RSDL-mm 3/6] sched: remove noninteractive flag
From: Con Kolivas <[EMAIL PROTECTED]> Remove the TASK_NONINTERACTIVE flag as it will no longer be used. Signed-off-by: Con Kolivas <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> Cc: Nick Piggin <[EMAIL PROTECTED]> Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- fs/pipe.c |7 +-- include/linux/sched.h |3 +-- 2 files changed, 2 insertions(+), 8 deletions(-) Index: linux-2.6.21-rc3-mm2/fs/pipe.c === --- linux-2.6.21-rc3-mm2.orig/fs/pipe.c 2007-03-07 21:20:36.0 +1100 +++ linux-2.6.21-rc3-mm2/fs/pipe.c 2007-03-10 13:54:18.0 +1100 @@ -41,12 +41,7 @@ void pipe_wait(struct pipe_inode_info *p { DEFINE_WAIT(wait); - /* -* Pipes are system-local resources, so sleeping on them -* is considered a noninteractive wait: -*/ - prepare_to_wait(&pipe->wait, &wait, - TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE); + prepare_to_wait(&pipe->wait, &wait, TASK_INTERRUPTIBLE); if (pipe->inode) mutex_unlock(&pipe->inode->i_mutex); schedule(); Index: linux-2.6.21-rc3-mm2/include/linux/sched.h === --- linux-2.6.21-rc3-mm2.orig/include/linux/sched.h 2007-03-08 22:03:18.0 +1100 +++ linux-2.6.21-rc3-mm2/include/linux/sched.h 2007-03-10 13:54:18.0 +1100 @@ -150,8 +150,7 @@ extern unsigned long weighted_cpuload(co #define EXIT_ZOMBIE16 #define EXIT_DEAD 32 /* in tsk->state again */ -#define TASK_NONINTERACTIVE64 -#define TASK_DEAD 128 +#define TASK_DEAD 64 #define __set_task_state(tsk, state_value) \ do { (tsk)->state = (state_value); } while (0) -- -ck
[PATCH][RSDL-mm 2/6] sched: remove sleepavg from proc
From: Con Kolivas <[EMAIL PROTECTED]> Remove the sleep_avg field from proc output as it will be removed from the task_struct. Signed-off-by: Con Kolivas <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> Cc: Nick Piggin <[EMAIL PROTECTED]> Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- fs/proc/array.c |2 -- 1 file changed, 2 deletions(-) Index: linux-2.6.21-rc3-mm2/fs/proc/array.c === --- linux-2.6.21-rc3-mm2.orig/fs/proc/array.c 2007-03-08 22:03:17.0 +1100 +++ linux-2.6.21-rc3-mm2/fs/proc/array.c2007-03-10 13:54:13.0 +1100 @@ -171,7 +171,6 @@ static inline char * task_state(struct t buffer += sprintf(buffer, "State:\t%s\n" - "SleepAVG:\t%lu%%\n" "Tgid:\t%d\n" "Pid:\t%d\n" "PPid:\t%d\n" @@ -179,7 +178,6 @@ static inline char * task_state(struct t "Uid:\t%d\t%d\t%d\t%d\n" "Gid:\t%d\t%d\t%d\t%d\n", get_task_state(p), - (p->sleep_avg/1024)*100/(102000/1024), p->tgid, p->pid, pid_alive(p) ? rcu_dereference(p->parent)->tgid : 0, tracer_pid, -- -ck - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH][RSDL-mm 1/6] lists: add list splice tail
From: Con Kolivas <[EMAIL PROTECTED]>

Add a list_splice_tail variant of list_splice.

Patch-by: Peter Zijlstra <[EMAIL PROTECTED]>
Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>
Cc: "Siddha, Suresh B" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 include/linux/list.h |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

Index: linux-2.6.21-rc3-mm2/include/linux/list.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/linux/list.h	2007-03-08 22:03:18.000000000 +1100
+++ linux-2.6.21-rc3-mm2/include/linux/list.h	2007-03-10 13:41:56.000000000 +1100
@@ -333,6 +333,20 @@ static inline void __list_splice(struct
 	at->prev = last;
 }
 
+static inline void __list_splice_tail(struct list_head *list,
+				      struct list_head *head)
+{
+	struct list_head *first = list->next;
+	struct list_head *last = list->prev;
+	struct list_head *at = head->prev;
+
+	first->prev = at;
+	at->next = first;
+
+	last->next = head;
+	head->prev = last;
+}
+
 /**
  * list_splice - join two lists
  * @list: the new list to add.
@@ -345,6 +359,18 @@ static inline void list_splice(struct li
 }
 
 /**
+ * list_splice_tail - join two lists at one's tail
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ */
+static inline void list_splice_tail(struct list_head *list,
+				    struct list_head *head)
+{
+	if (!list_empty(list))
+		__list_splice_tail(list, head);
+}
+
+/**
  * list_splice_init - join two lists and reinitialise the emptied list.
  * @list: the new list to add.
  * @head: the place to add it in the first list.
@@ -417,6 +443,22 @@ static inline void list_splice_init_rcu(
 }
 
 /**
+ * list_splice_tail_init - join 2 lists at one's tail & reinitialise emptied
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ *
+ * The list at @list is reinitialised
+ */
+static inline void list_splice_tail_init(struct list_head *list,
+					 struct list_head *head)
+{
+	if (!list_empty(list)) {
+		__list_splice_tail(list, head);
+		INIT_LIST_HEAD(list);
+	}
+}
+
+/**
 * list_entry - get the struct for this entry
 * @ptr: the list_head pointer.
 * @type: the type of the struct this is embedded in.
-- 
-ck
[PATCH][RSDL-mm 0/6] Rotating Staircase DeadLine scheduler for -mm
What follows this email is a series of patches for the RSDL cpu scheduler as
found in 2.6.21-rc3-mm1. This series is for 2.6.21-rc3-mm2 and has some
bugfixes for the issues found so far. While it is not clear that I've attended
to all the bugs, it is worth noting that a complete rewrite is a teensy bit
more than a trivial change shall we say ;)

akpm it still has trouble on that monster config of yours but there's so much
else going on in this -mm I don't know what to make of it. Getting ppc to work
on qemu was more work than I thought and I haven't succeeded there yet
either :|

A rolled up patch can be found here:
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl.patch

Patch series here:
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/
and the patch series will follow. If time permits I will make available a
newer version for the older kernels soon.

Changelog:
- Made it possible for the idle task to schedule on cpu_hotplug.
- Fixed the accounting across fork().
- Recalculation of priority for tasks that have already run this major
  rotation and are now on the expired array was wrong. It has been corrected.
- The bitmap error that has been hit on some architectures was made more
  verbose to make it clear that something has gone wrong, and will keep going
  off if the problem persists.
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 13:26, Matt Mackall wrote:
> On Sat, Mar 10, 2007 at 01:20:22PM +1100, Con Kolivas wrote:
> > Progress at last! And without any patches! Well those look very
> > reasonable to me. Especially since -j5 is a worst case scenario.
>
> Well that's with a noyield patch and your sched_tick fix.
>
> > But would you say it's still _adequate_ with ccache considering you
> > only have 1/6th cpu left for X? With and without ccache it's quite a
> > different workload so they will behave differently.
>
> No, I don't think 1/6th is being left for X in the ccache case so I
> think there's a bug lurking here. My memload, execload, and forkload
> test cases did better even with X niced.
>
> To confirm, I've just run 15 instances of memload with unniced Xorg
> and it performs better than make -j 5 with ccache.
>
> If I have some time tomorrow, I'll try to do a straight -mm1 to mm2
> comparison with different loads.

Great, thanks very much for all that. I've found a few subtle bugs in the
process and some that haven't made it to the list either. I'll respin a set
of patches against -mm2 with the changes shortly.

Thanks!
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 12:42, Matt Mackall wrote:
> On Sat, Mar 10, 2007 at 12:28:38PM +1100, Con Kolivas wrote:
> > On Saturday 10 March 2007 11:49, Matt Mackall wrote:
> > > On Sat, Mar 10, 2007 at 11:34:26AM +1100, Con Kolivas wrote:
> > > > Ok, so some of the basics then. Can you please give me the output of
> > > > 'top -b' running for a few seconds during the whole affair?
> > >
> > > Here you go:
> > >
> > > http://selenic.com/baseline
> > > http://selenic.com/underload
> > >
> > > This is with 2.6.20+rsdl+tickfix at HZ=250.
> > >
> > > Something I haven't mentioned about my setup is that I'm using ccache.
> > > And it turns out disabling ccache makes a large difference. Going to
> > > switch back to a NO_HZ kernel and see what that looks like.
> >
> > Your X is reniced to -10 so try again with X nice 0 please.
>
> Doh, can't believe I didn't notice that. That's apparently a default
> in Debian/unstable (not sure where to tweak it).

See other email from Kyle on how to dpkg reconfigure. I submitted a bug
report to debian years ago about this and I presume it was fixed, but you've
probably slowly dist-upgraded from an older version and it stayed in your
config?

> Reniced:
>
>            without ccache    with ccache
> make -j 5
> beryl      good              ok
> galeon     ok/good           ok
> mp3        good              good
> terminal   good              ok
> mouse      good              ok

Progress at last! And without any patches! Well those look very reasonable to
me. Especially since -j5 is a worst case scenario.

> We're still left with a big unexplained ccache differential,

But would you say it's still _adequate_ with ccache considering you only have
1/6th cpu left for X? With and without ccache it's quite a different workload
so they will behave differently.

> and a big
> NO_HZ vs HZ=250 differential.

That part I don't know about. You've only tested the difference with X
running nice -10. I need to look further at the mechanism for -nice tasks.
It should be possible to run smoothly even with a -niced X (although that was
never my intent) so perhaps that's not working properly. I'll look into that.

Thanks!
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 11:49, Matt Mackall wrote:
> On Sat, Mar 10, 2007 at 11:34:26AM +1100, Con Kolivas wrote:
> > Ok, so some of the basics then. Can you please give me the output of 'top
> > -b' running for a few seconds during the whole affair?
>
> Here you go:
>
> http://selenic.com/baseline
> http://selenic.com/underload
>
> This is with 2.6.20+rsdl+tickfix at HZ=250.
>
> Something I haven't mentioned about my setup is that I'm using ccache.
> And it turns out disabling ccache makes a large difference. Going to
> switch back to a NO_HZ kernel and see what that looks like.

Your X is reniced to -10 so try again with X nice 0 please.
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 09:12, Con Kolivas wrote:
> On Saturday 10 March 2007 08:57, Willy Tarreau wrote:
> > On Fri, Mar 09, 2007 at 03:39:59PM -0600, Matt Mackall wrote:
> > > On Sat, Mar 10, 2007 at 08:19:18AM +1100, Con Kolivas wrote:
> > > > On Saturday 10 March 2007 08:07, Con Kolivas wrote:
> > > > > On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> > >
> > > 5x memload: good
> > > 5x execload: good
> > > 5x forkload: good
> > > 5 parallel makes: mostly good
> > > make -j 5: bad
> > >
> > > So what's different between makes in parallel and make -j 5? Make's
> > > job server uses pipe I/O to control how many jobs are running.
> >
> > Matt, could you check with plain 2.6.20 + Con's patch ? It is possible
> > that he added bugs when porting to -mm, or that something in -mm causes
> > the trouble. Your experience with -mm seems so much different from mine
> > with mainline, there must be a difference somewhere !
>
> Good idea.

It's all very odd, Matt. It really isn't behaving anything like you describe
for myself or others. It sounds more like a real bug than what the design
would do at all. The only things that are different on yours are Beryl and a
different graphics card. When you're comparing to mainline, are you comparing
-mm1 to -mm2 to ensure something else from -mm isn't responsible? Also, have
you tried rsdl on 2.6.20 as Willy suggested? I would really love to get to
the bottom of this as it really shouldn't behave that way under load no
matter how the load is dished out.

Thanks!
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 09:29, Matt Mackall wrote:
> On Sat, Mar 10, 2007 at 09:18:05AM +1100, Con Kolivas wrote:
> > On Saturday 10 March 2007 08:57, Con Kolivas wrote:
> > > On Saturday 10 March 2007 08:39, Matt Mackall wrote:
> > > > On Sat, Mar 10, 2007 at 08:19:18AM +1100, Con Kolivas wrote:
> > > > > On Saturday 10 March 2007 08:07, Con Kolivas wrote:
> > > > > > On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> > > > > > > My suspicion is the problem lies in giving too much quanta to
> > > > > > > newly-started processes.
> > > > > >
> > > > > > Ah that's some nice detective work there. Mainline does some
> > > > > > rather complex accounting on sched_fork including (possibly) a
> > > > > > whole timer tick which rsdl does not do. make forks off
> > > > > > continuously so what you say may well be correct. I'll see if I
> > > > > > can try to revert to the mainline behaviour in sched_fork (which
> > > > > > was obviously there for a reason).
> > > > >
> > > > > Wow! Thanks Matt. You've found a real bug too. This seems to fix
> > > > > the qemu misbehaviour and bitmap errors so far too! Now can you
> > > > > please try this to see if it fixes your problem?
> > > >
> > > > Sorry, it's about the same. I now suspect an accounting glitch
> > > > involving pipe wake-ups.
> > > >
> > > > 5x memload: good
> > > > 5x execload: good
> > > > 5x forkload: good
> > > > 5 parallel makes: mostly good
> > > > make -j 5: bad
> > > >
> > > > So what's different between makes in parallel and make -j 5? Make's
> > > > job server uses pipe I/O to control how many jobs are running.
> > >
> > > Hmm it must be those deep pipes again then. I removed any quirks
> > > testing for those from mainline as I suspected it would be ok. Guess
> > > I'm wrong.
> >
> > I shouldn't blame this straight up though if NO_HZ makes it better.
> > Something else is going wrong... wtf though?
>
> Just so we're clear, dynticks has only 'fixed' the single non-parallel
> make load so far.

Ok, so some of the basics then. Can you please give me the output of 'top -b'
running for a few seconds during the whole affair?

Thanks very much for your testing so far!
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 10:06, Matt Mackall wrote:
> On Sat, Mar 10, 2007 at 10:02:37AM +1100, Con Kolivas wrote:
> > On Saturday 10 March 2007 09:29, Matt Mackall wrote:
> > > On Sat, Mar 10, 2007 at 09:18:05AM +1100, Con Kolivas wrote:
> > > > On Saturday 10 March 2007 08:57, Con Kolivas wrote:
> > > > > On Saturday 10 March 2007 08:39, Matt Mackall wrote:
> > > > > > So what's different between makes in parallel and make -j 5?
> > > > > > Make's job server uses pipe I/O to control how many jobs are
> > > > > > running.
> > > > >
> > > > > Hmm it must be those deep pipes again then. I removed any quirks
> > > > > testing for those from mainline as I suspected it would be ok.
> > > > > Guess I'm wrong.
> > > >
> > > > I shouldn't blame this straight up though if NO_HZ makes it better.
> > > > Something else is going wrong... wtf though?
> > >
> > > Just so we're clear, dynticks has only 'fixed' the single non-parallel
> > > make load so far.
> >
> > Ok, back to the pipe idea. Without needing a kernel recompile, can you
> > try running the make -j5 as a SCHED_BATCH task?
>
> Seems the same.
>
> Oddly, nice make -j 5 is better than batch (but not quite up to stock).

Shouldn't be odd. SCHED_BATCH (as Ingo implemented it, which is what I'm
trying to reproduce for RSDL) is meant to give the same cpu as the same nice
level, but not give low latency. Nice, on the other hand, will give much less
cpu.
-- 
-ck
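Besides the schedtool wrapper mentioned earlier in the thread, a process can switch itself to SCHED_BATCH directly. A hedged sketch using Python's Linux-only scheduling interface (the `os.sched_*` calls exist from Python 3.3 on); SCHED_BATCH, like SCHED_OTHER, takes a static priority of 0 and needs no special privileges:

```python
import os

# Put the current process into the SCHED_BATCH policy discussed above,
# without any kernel recompile.  Equivalent to `schedtool -B -e <cmd>`
# or util-linux's `chrt -b 0 <cmd>` from the shell.
os.sched_setscheduler(0, os.SCHED_BATCH, os.sched_param(0))

# Read the policy back to confirm the switch took effect.
assert os.sched_getscheduler(0) == os.SCHED_BATCH
print("now running SCHED_BATCH")
```

Children forked after this point inherit the policy, which is why wrapping `make -j5` in such a launcher puts the whole compile under SCHED_BATCH.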
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 09:29, Matt Mackall wrote:
> On Sat, Mar 10, 2007 at 09:18:05AM +1100, Con Kolivas wrote:
> > On Saturday 10 March 2007 08:57, Con Kolivas wrote:
> > > On Saturday 10 March 2007 08:39, Matt Mackall wrote:
> > > > So what's different between makes in parallel and make -j 5? Make's
> > > > job server uses pipe I/O to control how many jobs are running.
> > >
> > > Hmm it must be those deep pipes again then. I removed any quirks
> > > testing for those from mainline as I suspected it would be ok. Guess
> > > I'm wrong.
> >
> > I shouldn't blame this straight up though if NO_HZ makes it better.
> > Something else is going wrong... wtf though?
>
> Just so we're clear, dynticks has only 'fixed' the single non-parallel
> make load so far.

Ok, back to the pipe idea. Without needing a kernel recompile, can you try
running the make -j5 as a SCHED_BATCH task? This wrapper will make it
possible:

http://freequaos.host.sk/schedtool/schedtool-1.2.9.tar.bz2

then

schedtool -B -e make -j5

If that helps it gives me something to work with.
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 08:57, Con Kolivas wrote:
> On Saturday 10 March 2007 08:39, Matt Mackall wrote:
> > On Sat, Mar 10, 2007 at 08:19:18AM +1100, Con Kolivas wrote:
> > > On Saturday 10 March 2007 08:07, Con Kolivas wrote:
> > > > On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> > > > > My suspicion is the problem lies in giving too much quanta to
> > > > > newly-started processes.
> > > >
> > > > Ah that's some nice detective work there. Mainline does some rather
> > > > complex accounting on sched_fork including (possibly) a whole timer
> > > > tick which rsdl does not do. make forks off continuously so what you
> > > > say may well be correct. I'll see if I can try to revert to the
> > > > mainline behaviour in sched_fork (which was obviously there for a
> > > > reason).
> > >
> > > Wow! Thanks Matt. You've found a real bug too. This seems to fix the
> > > qemu misbehaviour and bitmap errors so far too! Now can you please try
> > > this to see if it fixes your problem?
> >
> > Sorry, it's about the same. I now suspect an accounting glitch involving
> > pipe wake-ups.
> >
> > 5x memload: good
> > 5x execload: good
> > 5x forkload: good
> > 5 parallel makes: mostly good
> > make -j 5: bad
> >
> > So what's different between makes in parallel and make -j 5? Make's
> > job server uses pipe I/O to control how many jobs are running.
>
> Hmm it must be those deep pipes again then. I removed any quirks testing
> for those from mainline as I suspected it would be ok. Guess I'm wrong.

I shouldn't blame this straight up though if NO_HZ makes it better. Something
else is going wrong... wtf though?
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 08:57, Willy Tarreau wrote:
> On Fri, Mar 09, 2007 at 03:39:59PM -0600, Matt Mackall wrote:
> > On Sat, Mar 10, 2007 at 08:19:18AM +1100, Con Kolivas wrote:
> > > On Saturday 10 March 2007 08:07, Con Kolivas wrote:
> > > > On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> > > > > My suspicion is the problem lies in giving too much quanta to
> > > > > newly-started processes.
> > > >
> > > > Ah that's some nice detective work there. Mainline does some rather
> > > > complex accounting on sched_fork including (possibly) a whole timer
> > > > tick which rsdl does not do. make forks off continuously so what you
> > > > say may well be correct. I'll see if I can try to revert to the
> > > > mainline behaviour in sched_fork (which was obviously there for a
> > > > reason).
> > >
> > > Wow! Thanks Matt. You've found a real bug too. This seems to fix the
> > > qemu misbehaviour and bitmap errors so far too! Now can you please try
> > > this to see if it fixes your problem?
> >
> > Sorry, it's about the same. I now suspect an accounting glitch involving
> > pipe wake-ups.
> >
> > 5x memload: good
> > 5x execload: good
> > 5x forkload: good
> > 5 parallel makes: mostly good
> > make -j 5: bad
> >
> > So what's different between makes in parallel and make -j 5? Make's
> > job server uses pipe I/O to control how many jobs are running.
>
> Matt, could you check with plain 2.6.20 + Con's patch ? It is possible
> that he added bugs when porting to -mm, or that something in -mm causes
> the trouble. Your experience with -mm seems so much different from mine
> with mainline, there must be a difference somewhere !

Good idea.

> Con, is your patch necessary for mainline patch too ? I see that it
> should apply, but sometimes -mm may justify changes.

Yes it will be necessary for the mainline patch too.

> Best regards,
> Willy
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 08:39, Matt Mackall wrote:
> On Sat, Mar 10, 2007 at 08:19:18AM +1100, Con Kolivas wrote:
> > On Saturday 10 March 2007 08:07, Con Kolivas wrote:
> > > On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> > > > My suspicion is the problem lies in giving too much quanta to
> > > > newly-started processes.
> > >
> > > Ah that's some nice detective work there. Mainline does some rather
> > > complex accounting on sched_fork including (possibly) a whole timer
> > > tick which rsdl does not do. make forks off continuously so what you
> > > say may well be correct. I'll see if I can try to revert to the
> > > mainline behaviour in sched_fork (which was obviously there for a
> > > reason).
> >
> > Wow! Thanks Matt. You've found a real bug too. This seems to fix the qemu
> > misbehaviour and bitmap errors so far too! Now can you please try this
> > to see if it fixes your problem?
>
> Sorry, it's about the same. I now suspect an accounting glitch involving
> pipe wake-ups.
>
> 5x memload: good
> 5x execload: good
> 5x forkload: good
> 5 parallel makes: mostly good
> make -j 5: bad
>
> So what's different between makes in parallel and make -j 5? Make's
> job server uses pipe I/O to control how many jobs are running.

Hmm it must be those deep pipes again then. I removed any quirks testing for
those from mainline as I suspected it would be ok. Guess I'm wrong.
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 08:07, Con Kolivas wrote:
> On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> > My suspicion is the problem lies in giving too much quanta to
> > newly-started processes.
>
> Ah that's some nice detective work there. Mainline does some rather complex
> accounting on sched_fork including (possibly) a whole timer tick which rsdl
> does not do. make forks off continuously so what you say may well be
> correct. I'll see if I can try to revert to the mainline behaviour in
> sched_fork (which was obviously there for a reason).

Wow! Thanks Matt. You've found a real bug too. This seems to fix the qemu
misbehaviour and bitmap errors so far too! Now can you please try this to see
if it fixes your problem?

---
 kernel/sched.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Index: linux-2.6.21-rc3-mm1/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm1.orig/kernel/sched.c	2007-03-10 08:08:11.000000000 +1100
+++ linux-2.6.21-rc3-mm1/kernel/sched.c	2007-03-10 08:13:57.000000000 +1100
@@ -1560,7 +1560,7 @@ int fastcall wake_up_state(struct task_s
 	return try_to_wake_up(p, state, 0);
 }
 
-static void task_expired_entitlement(struct rq *rq, struct task_struct *p);
+static void task_running_tick(struct rq *rq, struct task_struct *p);
 /*
  * Perform scheduler related setup for a newly forked process p.
  * p is forked by current.
@@ -1621,10 +1621,8 @@ void fastcall sched_fork(struct task_str
 	 * left from its timeslice. Taking the runqueue lock is not
 	 * a problem.
 	 */
-	struct rq *rq = __task_rq_lock(current);
-
-	task_expired_entitlement(rq, current);
-	__task_rq_unlock(rq);
+	current->time_slice = 1;
+	task_running_tick(cpu_rq(cpu), current);
 	}
 	local_irq_enable();
 out:
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> Ok, I've now disabled sched_yield (I'm using xorg radeon drivers).

Great.

> So far:
>
>            rc2-mm2   RSDL     RSDL+NO_HZ  RSDL+NO_HZ+no_yield  estimated CPU
> no load
> beryl      good      good     great       great                ~30% at 600MHz
> galeon     good      good     good        good                 100% at 600MHz
> mp3        good      good     good        good                 < 5% at 600MHz
> terminal   good      good     good        good                 ~0
> mouse      good      good     good        good                 ~0
>
> make
> beryl      awful     ok       good
> galeon     bad       ok       good
> mp3        good      good     good
> terminal   bad       good     good
> mouse      bad       good     good

It's sad that sched_yield is still in our graphics card drivers ...

> make -j2
> beryl      awful     bad/ok
> metacity   bad/ok               <- it's not beryl-specific
> galeon     bad       bad/ok
> mp3        good      good
> terminal   bad       bad/ok
> mouse      bad       bad/ok
>
> make -j5
> beryl      ok        awful    awful       awful/bad
> galeon     ok        bad      bad         bad
> mp3        good      good     good        a couple skips
> terminal   ok        bad      bad         bad
> mouse      good      bad      bad         bad
>
> memload x5
> beryl      ok/good
> galeon     ok/good
> mp3        good
> terminal   ok/good
> mouse      ok/good
>
> good = no problems
> ok = noticeable latency
> bad = hard to use
> awful = completely unusable
>
> By the way, make -j5 is my usual kernel compile because it gives me
> the best wall time on this box.
>
> A priori, this load should be manageable by RSDL as the interactive
> loads are all pretty small. So I wrote a little Python script that
> basically continuously memcpys some 16MB chunks of memory:
>
> #!/usr/bin/python
> a = "a" * 16 * 1024 * 1024
> while 1:
>     b = a[1:] + "b"
>     a = b[1:] + "c"
>
> I've got 1.5G of RAM, so I can run quite a few of these without
> killing my pagecache. This should test whether a) Beryl's actually
> running up against memory bandwidth issues and b) whether "simple"
> static loads work. As you can see, running 5 instances of this script
> leaves me in good shape still. 10 is still in "ok" territory, with top
> showing each getting 9.7-10% of the CPU. 15 starts to feel sluggish.
> 20 the mouse jumps a bit and I got an MP3 skip. 30 is getting pretty
> bad, but still not as bad as the make -j 5 load.
>
> My suspicion is the problem lies in giving too much quanta to
> newly-started processes.

Ah that's some nice detective work there. Mainline does some rather complex
accounting on sched_fork including (possibly) a whole timer tick which rsdl
does not do. make forks off continuously so what you say may well be correct.
I'll see if I can try to revert to the mainline behaviour in sched_fork
(which was obviously there for a reason).
-- 
-ck
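Matt's memload script above is Python 2 (implicit byte strings, `while 1:` forever). A sketch of the same load that runs under Python 3 is below; `bytes` replaces `str`, and a bounded `ITERATIONS` count (an addition for the demo, not in the original) lets it terminate:

```python
# Python 3 rendering of the memload script above.  Each pass builds two
# fresh 16MB buffers from slices, i.e. copies the whole chunk twice,
# which is what makes it a pure memory-bandwidth load with almost no
# system-call activity.
SIZE = 16 * 1024 * 1024
ITERATIONS = 5  # the original loops with `while 1:` until killed

a = b"a" * SIZE
for _ in range(ITERATIONS):
    b = a[1:] + b"b"
    a = b[1:] + b"c"

print(len(a) == SIZE)
```

Dropping one leading byte and appending one keeps the buffer size constant, so however long it runs, each instance pins about 32MB of working set and otherwise just burns memory bandwidth.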
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 07:15, Con Kolivas wrote:
> On Saturday 10 March 2007 05:27, Matt Mackall wrote:
> > On Fri, Mar 09, 2007 at 07:39:05PM +1100, Con Kolivas wrote:
> > > On Friday 09 March 2007 19:20, Matt Mackall wrote:
> > > > And I've just rebooted with NO_HZ and things are greatly improved.
> > > > At idle, Beryl effects are silky smooth (possibly better than stock)
> > > > and shows less load. Under 'make', Beryl is still responsive as is
> > > > Galeon. No sign of lagging mouse or typing.
> > > >
> > > > Under make -j 5, things are intermittent. Galeon scrolling is
> > > > sometimes still responsive, but Beryl, terminals and mouse still
> > > > drag quite a bit.
> > >
> > > I just replied before you sent this one out I think our messages
> > > passed each other across the ocean somewhere. I don't quite get what
> > > combination of factors you're saying here caused great improvement.
> > > Was it enabling NO_HZ on mainline cpu scheduler or disabling NO_HZ or
> > > on RSDL?
> >
> > Turning on NO_HZ on RSDL greatly improved it. I have not tried NO_HZ
> > on mainline. The first test was with NO_HZ=n, the second was with
> > NO_HZ=y.
>
> How odd. I would have thought that if an interaction was to occur it would
> have been without the new feature. Clearly what you describe without NO_HZ
> is not the expected behaviour with RSDL. I wonder what went wrong. Are you
> on 100HZ on that laptop? While I expect 100HZ should be ok, it might just
> not be... My laptop is about the same performance and works fine with
> 100HZ under load of all sorts BUT I don't have Beryl (which I would have
> thought swayed things in the opposite direction also).

Oh, and can you grep dmesg for:

Scheduler bitmap error

If that occurs it's not performing properly. A subtle bug that's busting my
chops to try and track down.
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 05:07, Mark Lord wrote:
> Mmm.. when it's good, it's *really* good.
> My desktop feels snappier and all of that.
>
> No noticeable jerkiness of windows/scrolling,
> which I *do* observe with the stock scheduler.

That's good.

> But when it's bad, it stinks.
> Like when a "make -j2" kernel rebuild is happening in a background window

And that's bad. When you say "it stinks", is it more than 3 times slower? It
should be precisely 3 times slower under that load (although low cpu using
things like audio won't be affected by running 3 times slower). If it feels
like much more than that much slower, there is a bug there somewhere.

As another reader suggested, how does it run with the compile 'niced'? How
does it perform with make (without a -j number)?

> This is on a Pentium-M 760 single-core, w/2GB SDRAM (notebook).

What HZ are you running? Are you running a Beryl desktop?

> JADP (Just Another Data Point).

Appreciated, thanks.

> Mark
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 05:27, Matt Mackall wrote:
> On Fri, Mar 09, 2007 at 07:39:05PM +1100, Con Kolivas wrote:
> > On Friday 09 March 2007 19:20, Matt Mackall wrote:
> > > And I've just rebooted with NO_HZ and things are greatly improved. At
> > > idle, Beryl effects are silky smooth (possibly better than stock) and
> > > shows less load. Under 'make', Beryl is still responsive as is Galeon.
> > > No sign of lagging mouse or typing.
> > >
> > > Under make -j 5, things are intermittent. Galeon scrolling is
> > > sometimes still responsive, but Beryl, terminals and mouse still drag
> > > quite a bit.
> >
> > I just replied before you sent this one out I think our messages passed
> > each other across the ocean somewhere. I don't quite get what combination
> > of factors you're saying here caused great improvement. Was it enabling
> > NO_HZ on mainline cpu scheduler or disabling NO_HZ or on RSDL?
>
> Turning on NO_HZ on RSDL greatly improved it. I have not tried NO_HZ
> on mainline. The first test was with NO_HZ=n, the second was with
> NO_HZ=y.

How odd. I would have thought that if an interaction was to occur it would
have been without the new feature. Clearly what you describe without NO_HZ is
not the expected behaviour with RSDL. I wonder what went wrong. Are you on
100HZ on that laptop? While I expect 100HZ should be ok, it might just not
be... My laptop is about the same performance and works fine with 100HZ under
load of all sorts BUT I don't have Beryl (which I would have thought swayed
things in the opposite direction also).

> As an aside, we should not name config options NO_* or DISABLE_*
> because of the potential for double negation.

Case in point, I couldn't figure out what you were saying :)
-- 
-ck
Re: 2.6.21-rc3: /proc broken
On Friday 09 March 2007 19:53, Russell King wrote:
> On Fri, Mar 09, 2007 at 08:56:44AM +1100, Con Kolivas wrote:
> > I did make oldconfig from http://userweb.kernel.org/~akpm/ck/config.txt
> > and chose all the defaults. Then building your fat config with -rc3,
> > 'ps' hangs on qemu for almost 30 seconds and then at last produces a
> > broken output
>
> Let me guess - you have either a serial console or something like that
> and you're running these commands over said serial console?
>
> Or you have console directed to both a serial port and the VT and you're
> capturing this off the VT using gpm.
>
> Either way, "serial8250: too much work for irq4" is a printk which will
> be displayed by the kernel when it's unable to clear down work for the
> serial port within 256 loops or so of the interrupt handler; it's a
> protection against the box locking up.
>
> It's not actually contained in any of the files.

Thank you very much for taking the time to explain it to me, and I apologise
for the false positive. It's very much due to running qemu directing
everything to the serial console, which is just as you say. Allowing qemu to
output to a graphical console fixes the error. Unfortunately that also makes
akpm's oops go away so I can't really reproduce it now. Perhaps the bug
occurs due to interrupts being disabled for an extended time; it gives me
something to look at now.

Thanks!
-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Friday 09 March 2007 19:20, Matt Mackall wrote:
> And I've just rebooted with NO_HZ and things are greatly improved. At
> idle, Beryl effects are silky smooth (possibly better than stock) and
> shows less load. Under 'make', Beryl is still responsive as is Galeon.
> No sign of lagging mouse or typing.
>
> Under make -j 5, things are intermittent. Galeon scrolling is
> sometimes still responsive, but Beryl, terminals and mouse still drag
> quite a bit.

I just replied before you sent this one out; I think our messages passed each other across the ocean somewhere. I don't quite get what combination of factors you're saying here caused the great improvement. Was it enabling NO_HZ on the mainline cpu scheduler, or disabling NO_HZ, or on RSDL?

-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Friday 09 March 2007 18:53, Matt Mackall wrote:
> Well then I suppose something must be broken. When my box is idle, I
> can grab my desktop and spin it around and generate less than 25% CPU
> with the CPU stepped all the way down from 1.7GHz to 600MHz (Beryl is
> actually much snappier than many conventional window managers by doing
> just about everything through GL). By comparison, grabbing the Galeon
> scroll bar and wiggling it will generate 100% CPU (still throttled
> though) but remain relatively smooth.
>
> With a single non-parallel make running (all in cache, mind you), the
> system kicks up into just about 100% CPU usage at full speed. Desktop
> spinning becomes between 10x to 100x slower (from ~30fps to < 1fps).
> Galeon scrolling pauses for as much as a second. Mouse movement pauses
> for as much as a second. Typing in terminals lags noticeably.
>
> This is not the expected behavior of a fair, low-latency scheduler.

No, indeed it does not sound right at all to me either. Last time I encountered something like this we traced it and hit sched_yield calls somewhere in the graphics pipeline. So the first question is, how does mainline perform with the same testcase? And the second question is, umm, whatever it is that is slow, is there a way to trace it to see if it yields?

> For reference, this was with HZ=250, PREEMPT, PREEMPT_BKL, and !NO_HZ.

Ah, I also wonder if it hasn't broken with NO_HZ. I haven't had a chance to even confirm that the code works properly with it; I was only assuming (after our last chat). See if turning that off makes a difference? Thanks for testing!

-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 05:07, Mark Lord wrote:
> Mmm.. when it's good, it's *really* good. My desktop feels snappier
> and all of that. No noticeable jerkiness of windows/scrolling, which
> I *do* observe with the stock scheduler.

That's good.

> But when it's bad, it stinks. Like when a make -j2 kernel rebuild
> is happening in a background window

And that's bad. When you say it stinks, is it more than 3 times slower? It should be precisely 3 times slower under that load (although low cpu using things like audio won't be affected by running 3 times slower). If it feels like much more than that much slower, there is a bug there somewhere. As another reader suggested, how does it run with the compile 'niced'? How does it perform with make (without a -j number)?

> This is on a Pentium-M 760 single-core, w/2GB SDRAM (notebook).

What HZ are you running? Are you running a Beryl desktop?

> JADP (Just Another Data Point).
>
> Mark

Appreciated, thanks.

-- 
-ck
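[Editor's illustrative aside, not part of the thread: Con's "precisely 3 times slower" figure is just fair-share arithmetic; make -j2 contributes two CPU-bound jobs, which together with one foreground task makes three equally-weighted runnable tasks, each entitled to 1/3 of the CPU. A minimal sketch, with invented function names:]

```python
def fair_share(runnable_tasks):
    """Each of N equally-niced, CPU-bound runnable tasks gets 1/N
    of the CPU under a perfectly fair scheduler."""
    return 1.0 / runnable_tasks

def expected_slowdown(background_hogs, foreground=1):
    """Foreground work runs N times slower when it shares the CPU
    with (N - 1) CPU-bound background tasks."""
    return background_hogs + foreground

# make -j2 (two compile jobs) behind one foreground task:
print(expected_slowdown(2))   # 3 -> "precisely 3 times slower"
print(fair_share(3))          # each task's CPU share, 1/3
```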
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 07:15, Con Kolivas wrote:
> On Saturday 10 March 2007 05:27, Matt Mackall wrote:
> > On Fri, Mar 09, 2007 at 07:39:05PM +1100, Con Kolivas wrote:
> > > On Friday 09 March 2007 19:20, Matt Mackall wrote:
> > > > And I've just rebooted with NO_HZ and things are greatly improved.
> > > > At idle, Beryl effects are silky smooth (possibly better than
> > > > stock) and shows less load. Under 'make', Beryl is still
> > > > responsive as is Galeon. No sign of lagging mouse or typing.
> > > >
> > > > Under make -j 5, things are intermittent. Galeon scrolling is
> > > > sometimes still responsive, but Beryl, terminals and mouse still
> > > > drag quite a bit.
> > >
> > > I just replied before you sent this one out I think our messages
> > > passed each other across the ocean somewhere. I don't quite get what
> > > combination of factors you're saying here caused great improvement.
> > > Was it enabling NO_HZ on mainline cpu scheduler or disabling NO_HZ
> > > or on RSDL?
> >
> > Turning on NO_HZ on RSDL greatly improved it. I have not tried NO_HZ
> > on mainline. The first test was with NO_HZ=n, the second was with
> > NO_HZ=y.
>
> How odd. I would have thought that if an interaction was to occur it
> would have been without the new feature. Clearly what you describe
> without NO_HZ is not the expected behaviour with RSDL. I wonder what
> went wrong. Are you on 100HZ on that laptop? While I expect 100HZ
> should be ok, it might just not be... My laptop is about the same
> performance and works fine with 100HZ under load of all sorts BUT I
> don't have Beryl (which I would have thought swayed things in the
> opposite direction also).

Oh, and can you grep dmesg for:

Scheduler bitmap error

If that occurs it's not performing properly. A subtle bug that's busting my chops to try and track down.

-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> Ok, I've now disabled sched_yield (I'm using xorg radeon drivers).

Great.

> So far:
>
>              rc2-mm2   RSDL    RSDL+NO_HZ   RSDL+NO_HZ+no_yield   estimated CPU
> no load
>  beryl       good      good    great        great                 ~30% at 600MHz
>  galeon      good      good    good         good                  100% at 600MHz
>  mp3         good      good    good         good                  5% at 600MHz
>  terminal    good      good    good         good                  ~0
>  mouse       good      good    good         good                  ~0
>
> make
>  beryl                 awful   ok           good
>  galeon                bad     ok           good
>  mp3                   good    good         good
>  terminal               bad     good         good
>  mouse                 bad     good         good
>
> It's sad that sched_yield is still in our graphics card drivers ...
>
> make -j2
>  beryl                         awful        bad/ok
>  metacity                                   bad/ok  - it's not beryl-specific
>  galeon                        bad          bad/ok
>  mp3                           good         good
>  terminal                      bad          bad/ok
>  mouse                         bad          bad/ok
>
> make -j5
>  beryl       ok        awful   awful        awful/bad
>  galeon      ok        bad     bad          bad
>  mp3         good      good    good         a couple skips
>  terminal    ok        bad     bad          bad
>  mouse       good      bad     bad          bad
>
> memload x5
>  beryl                                      ok/good
>  galeon                                     ok/good
>  mp3                                        good
>  terminal                                   ok/good
>  mouse                                      ok/good
>
> good = no problems
> ok = noticeable latency
> bad = hard to use
> awful = completely unusable
>
> By the way, make -j5 is my usual kernel compile because it gives me
> the best wall time on this box.
>
> A priori, this load should be manageable by RSDL as the interactive
> loads are all pretty small. So I wrote a little Python script that
> basically continuously memcpys some 16MB chunks of memory:
>
> #!/usr/bin/python
> a = "a" * 16 * 1024 * 1024
> while 1:
>     b = a[1:] + "b"
>     a = b[1:] + "c"
>
> I've got 1.5G of RAM, so I can run quite a few of these without
> killing my pagecache. This should test whether a) Beryl's actually
> running up against memory bandwidth issues and b) whether simple
> static loads work.
>
> As you can see, running 5 instances of this script leaves me in good
> shape still. 10 is still in ok territory, with top showing each
> getting 9.7-10% of the CPU. 15 starts to feel sluggish. 20 the mouse
> jumps a bit and I got an MP3 skip. 30 is getting pretty bad, but
> still not as bad as the make -j 5 load.
>
> My suspicion is the problem lies in giving too much quanta to
> newly-started processes.
Ah, that's some nice detective work there. Mainline does some rather complex accounting on sched_fork including (possibly) a whole timer tick which rsdl does not do. make forks off continuously so what you say may well be correct. I'll see if I can try to revert to the mainline behaviour in sched_fork (which was obviously there for a reason).

-- 
-ck
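[Editor's illustrative aside, not part of the thread: the effect Matt suspects and Con confirms, short-lived forked children each starting with a fresh full quantum, can be shown with a toy round-robin model. The numbers and the model are entirely hypothetical and are not RSDL's actual accounting:]

```python
from collections import deque

def worst_wait(child_slice, total_ticks=2000):
    """Toy round-robin runqueue: four short-lived 'compiler' children
    each run child_slice ticks then exit, and make immediately forks a
    fresh replacement with a fresh slice.  An 'interactive' task only
    ever needs one tick per turn.  Returns the longest gap (in ticks)
    between the interactive task's turns."""
    runqueue = deque([("interactive", 1)])
    for _ in range(4):
        runqueue.append(("child", child_slice))
    worst, last_ran, now = 0, 0, 0
    while now < total_ticks:
        name, timeslice = runqueue.popleft()
        now += timeslice              # task runs its whole slice
        if name == "interactive":
            worst = max(worst, now - last_ran)
            last_ran = now
            runqueue.append((name, 1))
        else:                         # child exits; replacement forked
            runqueue.append(("child", child_slice))
    return worst

# Children granted a full (hypothetical 6-tick) quantum at fork versus
# charged down to a single tick, as Con's sched_fork change does: the
# interactive task's worst-case wait shrinks from ~4 slices to ~4 ticks.
print(worst_wait(child_slice=6))   # 25
print(worst_wait(child_slice=1))   # 5
```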
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 08:07, Con Kolivas wrote:
> On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> > My suspicion is the problem lies in giving too much quanta to
> > newly-started processes.
>
> Ah that's some nice detective work there. Mainline does some rather
> complex accounting on sched_fork including (possibly) a whole timer
> tick which rsdl does not do. make forks off continuously so what you
> say may well be correct. I'll see if I can try to revert to the
> mainline behaviour in sched_fork (which was obviously there for a
> reason).

Wow! Thanks Matt. You've found a real bug too. This seems to fix the qemu misbehaviour and bitmap errors so far too! Now can you please try this to see if it fixes your problem?

---
 kernel/sched.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Index: linux-2.6.21-rc3-mm1/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm1.orig/kernel/sched.c	2007-03-10 08:08:11.0 +1100
+++ linux-2.6.21-rc3-mm1/kernel/sched.c	2007-03-10 08:13:57.0 +1100
@@ -1560,7 +1560,7 @@ int fastcall wake_up_state(struct task_s
 	return try_to_wake_up(p, state, 0);
 }
 
-static void task_expired_entitlement(struct rq *rq, struct task_struct *p);
+static void task_running_tick(struct rq *rq, struct task_struct *p);
 /*
  * Perform scheduler related setup for a newly forked process p.
  * p is forked by current.
@@ -1621,10 +1621,8 @@ void fastcall sched_fork(struct task_str
 	 * left from its timeslice. Taking the runqueue lock is not
 	 * a problem.
	 */
-	struct rq *rq = __task_rq_lock(current);
-
-	task_expired_entitlement(rq, current);
-	__task_rq_unlock(rq);
+	current->time_slice = 1;
+	task_running_tick(cpu_rq(cpu), current);
 	}
 	local_irq_enable();
 out:

-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 08:39, Matt Mackall wrote:
> On Sat, Mar 10, 2007 at 08:19:18AM +1100, Con Kolivas wrote:
> > On Saturday 10 March 2007 08:07, Con Kolivas wrote:
> > > On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> > > > My suspicion is the problem lies in giving too much quanta to
> > > > newly-started processes.
> > >
> > > Ah that's some nice detective work there. Mainline does some rather
> > > complex accounting on sched_fork including (possibly) a whole timer
> > > tick which rsdl does not do. make forks off continuously so what
> > > you say may well be correct. I'll see if I can try to revert to the
> > > mainline behaviour in sched_fork (which was obviously there for a
> > > reason).
> >
> > Wow! Thanks Matt. You've found a real bug too. This seems to fix the
> > qemu misbehaviour and bitmap errors so far too! Now can you please
> > try this to see if it fixes your problem?
>
> Sorry, it's about the same. I now suspect an accounting glitch
> involving pipe wake-ups.
>
> 5x memload: good
> 5x execload: good
> 5x forkload: good
> 5 parallel makes: mostly good
> make -j 5: bad
>
> So what's different between makes in parallel and make -j 5? Make's
> job server uses pipe I/O to control how many jobs are running.

Hmm, it must be those deep pipes again then. I removed any quirks testing for those from mainline as I suspected it would be ok. Guess I'm wrong.

-- 
-ck
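[Editor's illustrative aside, not part of the thread: the mechanism Matt refers to is GNU make's jobserver. For -jN, make preloads a pipe with N-1 tokens; each sub-make reads a byte from the pipe before starting an extra job and writes it back when the job finishes, so a parallel build continuously sleeps and wakes on pipe I/O. A rough sketch of the token protocol, not make's actual code:]

```python
import os

def make_jobserver(jobs):
    """Create a jobserver-style token pipe for 'make -jN': the pipe
    is preloaded with N-1 tokens because every process always owns
    one implicit job slot."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"+" * (jobs - 1))
    return read_fd, write_fd

def acquire_slot(read_fd):
    """Block until a job slot is free, then claim it (one byte)."""
    return os.read(read_fd, 1)

def release_slot(write_fd, token):
    """Return the token; this write wakes any sleeping sibling."""
    os.write(write_fd, token)

read_fd, write_fd = make_jobserver(5)              # like make -j 5
tokens = [acquire_slot(read_fd) for _ in range(4)]  # 4 extra jobs start
for token in tokens:                               # jobs finish
    release_slot(write_fd, token)
```

Five parallel independent makes each have their own implicit slot and never touch a shared pipe, which is one concrete difference between Matt's "5 parallel makes" and "make -j 5" loads.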
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 08:57, Willy Tarreau wrote:
> On Fri, Mar 09, 2007 at 03:39:59PM -0600, Matt Mackall wrote:
> > On Sat, Mar 10, 2007 at 08:19:18AM +1100, Con Kolivas wrote:
> > > On Saturday 10 March 2007 08:07, Con Kolivas wrote:
> > > > On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> > > > > My suspicion is the problem lies in giving too much quanta to
> > > > > newly-started processes.
> > > >
> > > > Ah that's some nice detective work there. Mainline does some
> > > > rather complex accounting on sched_fork including (possibly) a
> > > > whole timer tick which rsdl does not do. make forks off
> > > > continuously so what you say may well be correct. I'll see if I
> > > > can try to revert to the mainline behaviour in sched_fork (which
> > > > was obviously there for a reason).
> > >
> > > Wow! Thanks Matt. You've found a real bug too. This seems to fix
> > > the qemu misbehaviour and bitmap errors so far too! Now can you
> > > please try this to see if it fixes your problem?
> >
> > Sorry, it's about the same. I now suspect an accounting glitch
> > involving pipe wake-ups.
> >
> > 5x memload: good
> > 5x execload: good
> > 5x forkload: good
> > 5 parallel makes: mostly good
> > make -j 5: bad
> >
> > So what's different between makes in parallel and make -j 5? Make's
> > job server uses pipe I/O to control how many jobs are running.
>
> Matt, could you check with plain 2.6.20 + Con's patch? It is possible
> that he added bugs when porting to -mm, or that something in -mm
> causes the trouble. Your experience with -mm seems so much different
> from mine with mainline, there must be a difference somewhere!

Good idea.

> Con, is your patch necessary for the mainline patch too? I see that
> it should apply, but sometimes -mm may justify changes.

Yes, it will be necessary for the mainline patch too.

> Best regards,
> Willy

-- 
-ck
Re: 2.6.21-rc3-mm1 RSDL results
On Saturday 10 March 2007 08:57, Con Kolivas wrote:
> On Saturday 10 March 2007 08:39, Matt Mackall wrote:
> > On Sat, Mar 10, 2007 at 08:19:18AM +1100, Con Kolivas wrote:
> > > On Saturday 10 March 2007 08:07, Con Kolivas wrote:
> > > > On Saturday 10 March 2007 07:46, Matt Mackall wrote:
> > > > > My suspicion is the problem lies in giving too much quanta to
> > > > > newly-started processes.
> > > >
> > > > Ah that's some nice detective work there. Mainline does some
> > > > rather complex accounting on sched_fork including (possibly) a
> > > > whole timer tick which rsdl does not do. make forks off
> > > > continuously so what you say may well be correct. I'll see if I
> > > > can try to revert to the mainline behaviour in sched_fork (which
> > > > was obviously there for a reason).
> > >
> > > Wow! Thanks Matt. You've found a real bug too. This seems to fix
> > > the qemu misbehaviour and bitmap errors so far too! Now can you
> > > please try this to see if it fixes your problem?
> >
> > Sorry, it's about the same. I now suspect an accounting glitch
> > involving pipe wake-ups.
> >
> > 5x memload: good
> > 5x execload: good
> > 5x forkload: good
> > 5 parallel makes: mostly good
> > make -j 5: bad
> >
> > So what's different between makes in parallel and make -j 5? Make's
> > job server uses pipe I/O to control how many jobs are running.
>
> Hmm, it must be those deep pipes again then. I removed any quirks
> testing for those from mainline as I suspected it would be ok. Guess
> I'm wrong.

I shouldn't blame this straight up though if NO_HZ makes it better. Something else is going wrong... wtf though?

-- 
-ck