* Andrew Morton <[EMAIL PROTECTED]> wrote:

> So I locally generated the diff to take -mm up to the above version of 
> CFS.

thx. I released a diff against mm2:

 
http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22-rc4-mm2-v18.patch

but indeed the -git diff serves you better if you updated -mm to Linus' 
latest.

firstly, thanks a ton for your review feedback!

> - sys_sched_yield_to() went away?  I guess I missed that.

yep. Nobody tried it or sent any feedback on it, it was causing 
patch-logistical complications both in -mm and for packagers that bundle 
CFS (the experimental-schedulers site has a CFS repo and Fedora rawhide 
started carrying CFS recently as well), and I don't really agree with 
adding yet another yield interface anyway. So we can and should do this 
independently of CFS.

> - Curious.  the simplification of task_tick_rt() seems to go only
>   halfway.  Could do
> 
>       if (p->policy != SCHED_RR)
>               return;
> 
>       if (--p->time_slice)
>               return;
> 
>       /* stuff goes here */

yeah. I have fixed it in my v19 tree to look like you suggest.

> - dud macro:
>                                       
> #define is_rt_policy(p)               ((p) == SCHED_FIFO || (p) == SCHED_RR)
> 
>   It evaluates its arg twice and could and should be coded in C.
> 
>   There are a bunch of other don't-need-to-be-implemented-as-a-macro
>   macros around there too.  Generally, I suggest you review all the
>   patchset for macros-which-don't-need-to-be-macros.

yep, fixed. (is a historic macro)
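
(to make the point concrete, a stand-alone user-space sketch - not the 
kernel code itself: the macro evaluates 'p' twice, so any side effect in 
the argument runs twice, whereas the C function evaluates it exactly once)

  #include <stdio.h>

  #define SCHED_FIFO 1
  #define SCHED_RR   2

  /* the problematic pattern: 'p' is evaluated twice */
  #define is_rt_policy(p)       ((p) == SCHED_FIFO || (p) == SCHED_RR)

  /* the same test as a C function: single evaluation, type-checked */
  static inline int rt_policy(int policy)
  {
          return policy == SCHED_FIFO || policy == SCHED_RR;
  }

  int main(void)
  {
          int policy = 0, copy = 0;       /* SCHED_NORMAL */

          (void)is_rt_policy(policy++);   /* 'policy++' runs twice here ... */
          (void)rt_policy(copy++);        /* ... but only once here */

          printf("after macro:    %d\n", policy); /* prints 2 */
          printf("after function: %d\n", copy);   /* prints 1 */
          return 0;
  }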

> - Extraneous newline:
> 
> enum cpu_idle_type
> {

fixed. (is a pre-existing enum)

> - Style thing:
> 
> struct sched_entity {

>       u64 sleep_start, sleep_start_fair;

fixed.

> - None of these fields have comments describing what they do ;)

one of them has ;-) Will fill this in.

> - __exit_signal() does apparently-unlocked 64-bit arith.  Is there 
>   some implicit locking here or do we not care about the occasional 
>   race-induced inaccuracy?

do you mean the tsk->se.sum_exec_runtime addition, etc.? That runs with 
interrupts disabled, so sum_sched_runtime is protected.

>   (ditto, lots of places, I expect)

which places do you mean?

>   (Gee, there's shitloads of 64-bit stuff in there.  Does it all 
>   _really_ need to be 64-bit on 32-bit?)

yes - CFS is fundamentally designed for 64-bit, while still having pretty 
OK arithmetic performance on 32-bit.
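
(to give a feel for that arithmetic, a stand-alone sketch of the 
multiply-and-shift weighting the hot paths rely on - the in-kernel 
helpers such as weight_s64() differ in detail, and the shift of 10 is 
assumed to be the usual NICE_0_SHIFT value; the point is that 32-bit 
targets get away with a 64x32 multiply and a shift instead of a 64-bit 
division)

  #include <stdint.h>
  #include <stdio.h>

  #define NICE_0_SHIFT 10         /* load weights are scaled by 2^10 */

  /* scale a signed 64-bit nanosecond delta by a task's load weight */
  static int64_t weight_delta(int64_t delta, unsigned int weight)
  {
          int negative = delta < 0;

          if (negative)
                  delta = -delta;
          delta = (delta * weight) >> NICE_0_SHIFT;

          return negative ? -delta : delta;
  }

  int main(void)
  {
          /* 2ms of wait time, weighted for a task of half the nice-0 weight: */
          printf("%lld\n", (long long)weight_delta(2000000LL, 512));  /* 1000000 */
          return 0;
  }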

> - weight_s64() (what does this do?) looks too big to inline on 32-bit.

ok, I've uninlined it.

> - update_stats_enqueue() looks too big to inline even on 64-bit.

done.

> - Overall, this change is tremendously huge for something which is
>   supposedly ready-to-merge. [...]

hey, that's not fair, your review comments just made it 10K larger ;-)

> [...] Looks like a lot of that is the sched_entity conversion, but 
> afaict there's quite a lot besides.
> 
> - Should "4" in
> 
>       (sysctl_sched_features & 4)
> 
>   be enumerated?

yep, done.

> - Maybe even __check_preempt_curr_fair() is too porky to inline.

yep - uninlined it.

> - There really is an astonishing amount of 64-bit arith in here...
> 
> - Some opportunities for useful comments have been missed ;)
> 
> #define NICE_0_LOAD   SCHED_LOAD_SCALE
> #define NICE_0_SHIFT  SCHED_LOAD_SHIFT
> 
>   <wonders what these mean>

SCHED_LOAD_SCALE is the smpnice stuff. CFS reuses that and also makes it 
clear via this define that a nice-0 task has a 'load' contribution to 
the CPU of NICE_0_LOAD. Sometimes, when doing smpnice load-balancing 
calculations, we want to use 'SCHED_LOAD_SCALE'; sometimes we want to 
stress that it's NICE_0_LOAD.
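
(in other words - rough stand-alone sketch, with a made-up second weight 
and assuming the usual SCHED_LOAD_SHIFT of 10: a nice-0 task contributes 
exactly NICE_0_LOAD to the runqueue's total load, and a task's long-run 
CPU share is just its weight divided by that total)

  #include <stdio.h>

  #define SCHED_LOAD_SHIFT 10
  #define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)      /* 1024 */
  #define NICE_0_LOAD      SCHED_LOAD_SCALE

  int main(void)
  {
          unsigned long w_nice0 = NICE_0_LOAD;            /* a nice-0 task */
          unsigned long w_other = NICE_0_LOAD / 2;        /* hypothetical lower-weight task */
          unsigned long total   = w_nice0 + w_other;

          /* CPU share == weight / total runqueue weight: */
          printf("nice-0 task: %lu%%\n", 100 * w_nice0 / total);  /* prints 66% */
          printf("other task:  %lu%%\n", 100 * w_other / total);  /* prints 33% */
          return 0;
  }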

> - Should any of those new 64-bit arith functions in sched.c be pulled 
>   out and made generic?

yep, the plan is to put this all into reciprocal_div.h and to convert 
existing users of reciprocal_div to the cleaner stuff from Thomas. The 
patch won't get any smaller due to that though ;-)
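
(the idea, as a stand-alone sketch - not necessarily the exact interface 
that will end up in reciprocal_div.h: precompute R = 2^32/d once, then 
replace every division by d with a multiply and a shift; this is only 
exact for bounded inputs, which is why it suits fixed divisors like load 
weights)

  #include <stdint.h>
  #include <stdio.h>

  /* precompute the scaled reciprocal of a fixed divisor (d > 1) */
  static uint32_t reciprocal_value(uint32_t d)
  {
          return (uint32_t)(((uint64_t)1 << 32) / d);
  }

  /* a / d becomes one 32x32->64 multiply plus a shift */
  static uint32_t reciprocal_divide(uint32_t a, uint32_t r)
  {
          return (uint32_t)(((uint64_t)a * r) >> 32);
  }

  int main(void)
  {
          uint32_t r = reciprocal_value(10);

          printf("1234 / 10 = %u\n", reciprocal_divide(1234, r)); /* 123 */
          return 0;
  }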

> - update_curr_load() is huge, inlined and has several callsites?

this is a reasonable tradeoff, I think - update_curr_load()'s slowpath is 
in __update_curr_load(). Anyway, it probably won't get inlined when the 
kernel is built with -Os and without forced inlining.
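
(i.e. the usual shape - a cheap inline fast path that punts to an 
uninlined slow path; generic stand-alone illustration with made-up names, 
not the scheduler code itself:)

  #include <stdio.h>

  struct stats { unsigned long calls, recomputes; };

  /* out of line: the rarely-taken, heavier recomputation */
  static void __update_load(struct stats *s)
  {
          s->recomputes++;
  }

  /* inlined at every call site: cheap common case, punt the rest */
  static inline void update_load(struct stats *s, int need_recompute)
  {
          s->calls++;
          if (need_recompute)
                  __update_load(s);
  }

  int main(void)
  {
          struct stats s = { 0, 0 };
          int i;

          for (i = 0; i < 1000; i++)
                  update_load(&s, i % 64 == 0);   /* slow path hits rarely */

          printf("calls %lu, recomputes %lu\n", s.calls, s.recomputes); /* 1000, 16 */
          return 0;
  }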

> - lots more macros-which-dont-need-to-be-macros in sched.c:
>   load_weight(), PRIO_TO_load_weight(), RTPRIO_TO_load_weight(), maybe
>   others.  People are more inclined to comment functions than they are
>   macros, for some reason.

these are mostly ancient macros. I fixed up some of them in my current 
tree.

> - inc_load(), dec_load(), inc_nr_running(), dec_nr_running(): these will
>   generate plenty of code on 32-bit and they're all inlined with 
>   multiple callsites.

yep - I'll revisit the inlining picture. This is not really a primary 
worry, I think, because it's easy to tweak and people can already express 
their inlining preference via CONFIG_CC_OPTIMIZE_FOR_SIZE and 
CONFIG_FORCED_INLINING.

> - overall, CFS takes sched.o from 41157 of .text up to 48781 on x86_64,
>   which at 18% is rather a large bloat.  Hopefully a lot of this is 
>   the new debug stuff.

> - On i386 sched.o went from 33755 up to 43660 which is 29% growth. 
>   Possibly acceptable, but why did it increase a lot more than the x86_64
>   version?  All that 64-bit arith, I assume?

the main reason is the sched debugging stuff:

   text    data     bss     dec     hex filename
  37570    2538      20   40128    9cc0 kernel/sched.o
  30692    2426      20   33138    8172 kernel/sched-no_sched_debug.o

I can make it depend on CONFIG_SCHEDSTATS, although I'd prefer it to be 
always on.

> - style (or the lack thereof):
> 
>       p->se.sum_wait_runtime = p->se.sum_sleep_runtime = 0;
>       p->se.sleep_start = p->se.sleep_start_fair = p->se.block_start = 0;
>       p->se.sleep_max = p->se.block_max = p->se.exec_max = p->se.wait_max = 0;
>       p->se.wait_runtime_overruns = p->se.wait_runtime_underruns = 0;
> 
>   bit of an eyesore?

fixed. (this heap grew gradually and had indeed become an eyesore.)

> - in sched_init() this looks funny:
> 
>               rq->ls.load_update_last = sched_clock();
>               rq->ls.load_update_start = sched_clock();
> 
>   was it intended that these both get the same value?

it doesn't really matter, but I fixed them to be initialized to the same 
'now' value.

I've attached my current fixes. (Please don't apply them yet.)

        Ingo

Not-Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

---
 Makefile              |    2 
 include/linux/sched.h |  120 +++++++++++++++++++++++++++++---------------------
 kernel/exit.c         |    2 
 kernel/sched.c        |   58 ++++++++++++++++--------
 kernel/sched_debug.c  |    2 
 kernel/sched_fair.c   |   61 ++++++++++++-------------
 kernel/sched_rt.c     |   15 +++---
 7 files changed, 149 insertions(+), 111 deletions(-)

Index: linux/Makefile
===================================================================
--- linux.orig/Makefile
+++ linux/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 22
-EXTRAVERSION = -rc6-cfs-v18
+EXTRAVERSION = -rc6-cfs-v19
 NAME = Holy Dancing Manatees, Batman!
 
 # *DOCUMENTATION*
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -528,31 +528,6 @@ struct signal_struct {
 #define SIGNAL_STOP_CONTINUED  0x00000004 /* SIGCONT since WCONTINUED reap */
 #define SIGNAL_GROUP_EXIT      0x00000008 /* group exit in progress */
 
-
-/*
- * Priority of a process goes from 0..MAX_PRIO-1, valid RT
- * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
- * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
- * values are inverted: lower p->prio value means higher priority.
- *
- * The MAX_USER_RT_PRIO value allows the actual maximum
- * RT priority to be separate from the value exported to
- * user-space.  This allows kernel threads to set their
- * priority to a value higher than any user task. Note:
- * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
- */
-
-#define MAX_USER_RT_PRIO       100
-#define MAX_RT_PRIO            MAX_USER_RT_PRIO
-
-#define MAX_PRIO               (MAX_RT_PRIO + 40)
-#define DEFAULT_PRIO           (MAX_RT_PRIO + 20)
-
-#define rt_prio(prio)          unlikely((prio) < MAX_RT_PRIO)
-#define rt_task(p)             rt_prio((p)->prio)
-#define is_rt_policy(p)                ((p) == SCHED_FIFO || (p) == SCHED_RR)
-#define has_rt_policy(p)       unlikely(is_rt_policy((p)->policy))
-
 /*
  * Some day this will be a full-fledged user tracking system..
  */
@@ -646,8 +621,7 @@ static inline int sched_info_on(void)
 #endif
 }
 
-enum cpu_idle_type
-{
+enum cpu_idle_type {
        CPU_IDLE,
        CPU_NOT_IDLE,
        CPU_NEWLY_IDLE,
@@ -843,30 +817,45 @@ struct load_weight {
        unsigned long weight, inv_weight;
 };
 
-/* CFS stats for a schedulable entity (task, task-group etc) */
+/*
+ * CFS stats for a schedulable entity (task, task-group etc)
+ *
+ * Current field usage histogram:
+ *
+ *     4 se->block_start
+ *     4 se->run_node
+ *     4 se->sleep_start
+ *     4 se->sleep_start_fair
+ *     6 se->load.weight
+ *     7 se->delta_fair
+ *    15 se->wait_runtime
+ */
 struct sched_entity {
-       struct load_weight load;        /* for nice- load-balancing purposes */
-       int on_rq;
-       struct rb_node run_node;
-       unsigned long delta_exec;
-       s64 delta_fair;
-
-       u64 wait_start_fair;
-       u64 wait_start;
-       u64 exec_start;
-       u64 sleep_start, sleep_start_fair;
-       u64 block_start;
-       u64 sleep_max;
-       u64 block_max;
-       u64 exec_max;
-       u64 wait_max;
-       u64 last_ran;
-
-       s64 wait_runtime;
-       u64 sum_exec_runtime;
-       s64 fair_key;
-       s64 sum_wait_runtime, sum_sleep_runtime;
-       unsigned long wait_runtime_overruns, wait_runtime_underruns;
+       s64                     wait_runtime;
+       s64                     delta_fair;
+       struct load_weight      load;           /* for load-balancing */
+       struct rb_node          run_node;
+       int                     on_rq;
+       unsigned long           delta_exec;
+
+       u64                     wait_start_fair;
+       u64                     wait_start;
+       u64                     exec_start;
+       u64                     sleep_start;
+       u64                     sleep_start_fair;
+       u64                     block_start;
+       u64                     sleep_max;
+       u64                     block_max;
+       u64                     exec_max;
+       u64                     wait_max;
+       u64                     last_ran;
+
+       u64                     sum_exec_runtime;
+       s64                     fair_key;
+       s64                     sum_wait_runtime;
+       s64                     sum_sleep_runtime;
+       unsigned long           wait_runtime_overruns;
+       unsigned long           wait_runtime_underruns;
 };
 
 struct task_struct {
@@ -1126,6 +1115,37 @@ struct task_struct {
 #endif
 };
 
+/*
+ * Priority of a process goes from 0..MAX_PRIO-1, valid RT
+ * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
+ * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
+ * values are inverted: lower p->prio value means higher priority.
+ *
+ * The MAX_USER_RT_PRIO value allows the actual maximum
+ * RT priority to be separate from the value exported to
+ * user-space.  This allows kernel threads to set their
+ * priority to a value higher than any user task. Note:
+ * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
+ */
+
+#define MAX_USER_RT_PRIO       100
+#define MAX_RT_PRIO            MAX_USER_RT_PRIO
+
+#define MAX_PRIO               (MAX_RT_PRIO + 40)
+#define DEFAULT_PRIO           (MAX_RT_PRIO + 20)
+
+static inline int rt_prio(int prio)
+{
+       if (unlikely(prio < MAX_RT_PRIO))
+               return 1;
+       return 0;
+}
+
+static inline int rt_task(struct task_struct *p)
+{
+       return rt_prio(p->prio);
+}
+
 static inline pid_t process_group(struct task_struct *tsk)
 {
        return tsk->signal->pgrp;
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -290,7 +290,7 @@ static void reparent_to_kthreadd(void)
        /* Set the exit signal to SIGCHLD so we signal init on exit */
        current->exit_signal = SIGCHLD;
 
-       if (!has_rt_policy(current) && (task_nice(current) < 0))
+       if (task_nice(current) < 0)
                set_user_nice(current, 0);
        /* cpus_allowed? */
        /* rt_priority? */
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -106,6 +106,18 @@ unsigned long long __attribute__((weak))
 #define MIN_TIMESLICE          max(5 * HZ / 1000, 1)
 #define DEF_TIMESLICE          (100 * HZ / 1000)
 
+static inline int rt_policy(int policy)
+{
+       if (unlikely(policy == SCHED_FIFO) || unlikely(policy == SCHED_RR))
+               return 1;
+       return 0;
+}
+
+static inline int task_has_rt_policy(struct task_struct *p)
+{
+       return rt_policy(p->policy);
+}
+
 /*
  * This is the priority-queue data structure of the RT scheduling class:
  */
@@ -752,7 +764,7 @@ static void set_load_weight(struct task_
        task_rq(p)->cfs.wait_runtime -= p->se.wait_runtime;
        p->se.wait_runtime = 0;
 
-       if (has_rt_policy(p)) {
+       if (task_has_rt_policy(p)) {
                p->se.load.weight = prio_to_weight[0] * 2;
                p->se.load.inv_weight = prio_to_wmult[0] >> 1;
                return;
@@ -805,7 +817,7 @@ static inline int normal_prio(struct tas
 {
        int prio;
 
-       if (has_rt_policy(p))
+       if (task_has_rt_policy(p))
                prio = MAX_RT_PRIO-1 - p->rt_priority;
        else
                prio = __normal_prio(p);
@@ -1476,17 +1488,24 @@ int fastcall wake_up_state(struct task_s
  */
 static void __sched_fork(struct task_struct *p)
 {
-       p->se.wait_start_fair = p->se.wait_start = p->se.exec_start = 0;
-       p->se.sum_exec_runtime = 0;
-       p->se.delta_exec = 0;
-       p->se.delta_fair = 0;
-
-       p->se.wait_runtime = 0;
-
-       p->se.sum_wait_runtime = p->se.sum_sleep_runtime = 0;
-       p->se.sleep_start = p->se.sleep_start_fair = p->se.block_start = 0;
-       p->se.sleep_max = p->se.block_max = p->se.exec_max = p->se.wait_max = 0;
-       p->se.wait_runtime_overruns = p->se.wait_runtime_underruns = 0;
+       p->se.wait_start_fair           = 0;
+       p->se.wait_start                = 0;
+       p->se.exec_start                = 0;
+       p->se.sum_exec_runtime          = 0;
+       p->se.delta_exec                = 0;
+       p->se.delta_fair                = 0;
+       p->se.wait_runtime              = 0;
+       p->se.sum_wait_runtime          = 0;
+       p->se.sum_sleep_runtime         = 0;
+       p->se.sleep_start               = 0;
+       p->se.sleep_start_fair          = 0;
+       p->se.block_start               = 0;
+       p->se.sleep_max                 = 0;
+       p->se.block_max                 = 0;
+       p->se.exec_max                  = 0;
+       p->se.wait_max                  = 0;
+       p->se.wait_runtime_overruns     = 0;
+       p->se.wait_runtime_underruns    = 0;
 
        INIT_LIST_HEAD(&p->run_list);
        p->se.on_rq = 0;
@@ -1799,7 +1818,7 @@ static void update_cpu_load(struct rq *t
        int i, scale;
 
        this_rq->nr_load_updates++;
-       if (sysctl_sched_features & 64)
+       if (unlikely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
                goto do_avg;
 
        /* Update delta_fair/delta_exec fields first */
@@ -3801,7 +3820,7 @@ void set_user_nice(struct task_struct *p
         * it wont have any effect on scheduling until the task is
         * SCHED_FIFO/SCHED_RR:
         */
-       if (has_rt_policy(p)) {
+       if (task_has_rt_policy(p)) {
                p->static_prio = NICE_TO_PRIO(nice);
                goto out_unlock;
        }
@@ -3999,14 +4018,14 @@ recheck:
            (p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
            (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
                return -EINVAL;
-       if (is_rt_policy(policy) != (param->sched_priority != 0))
+       if (rt_policy(policy) != (param->sched_priority != 0))
                return -EINVAL;
 
        /*
         * Allow unprivileged RT tasks to decrease priority:
         */
        if (!capable(CAP_SYS_NICE)) {
-               if (is_rt_policy(policy)) {
+               if (rt_policy(policy)) {
                        unsigned long rlim_rtprio;
                        unsigned long flags;
 
@@ -6186,6 +6205,7 @@ int in_sched_functions(unsigned long add
 
 void __init sched_init(void)
 {
+       u64 now = sched_clock();
        int highest_cpu = 0;
        int i, j;
 
@@ -6206,8 +6226,8 @@ void __init sched_init(void)
                rq->nr_running = 0;
                rq->cfs.tasks_timeline = RB_ROOT;
                rq->clock = rq->cfs.fair_clock = 1;
-               rq->ls.load_update_last = sched_clock();
-               rq->ls.load_update_start = sched_clock();
+               rq->ls.load_update_last = now;
+               rq->ls.load_update_start = now;
 
                for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
                        rq->cpu_load[j] = 0;
Index: linux/kernel/sched_debug.c
===================================================================
--- linux.orig/kernel/sched_debug.c
+++ linux/kernel/sched_debug.c
@@ -157,7 +157,7 @@ static int sched_debug_show(struct seq_f
        u64 now = ktime_to_ns(ktime_get());
        int cpu;
 
-       SEQ_printf(m, "Sched Debug Version: v0.03, cfs-v18, %s %.*s\n",
+       SEQ_printf(m, "Sched Debug Version: v0.03, cfs-v19, %s %.*s\n",
                init_utsname()->release,
                (int)strcspn(init_utsname()->version, " "),
                init_utsname()->version);
Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -61,8 +61,24 @@ unsigned int sysctl_sched_stat_granulari
  */
 unsigned int sysctl_sched_runtime_limit __read_mostly;
 
+/*
+ * Debugging: various feature bits
+ */
+enum {
+       SCHED_FEAT_IGNORE_PREEMPTED     = 1,
+       SCHED_FEAT_DISTRIBUTE           = 2,
+       SCHED_FEAT_FAIR_SLEEPERS        = 4,
+       SCHED_FEAT_SLEEPER_AVG          = 32,
+       SCHED_FEAT_PRECISE_CPU_LOAD     = 64,
+       SCHED_FEAT_START_DEBIT          = 128,
+       SCHED_FEAT_SKIP_INITIAL         = 256,
+};
+
 unsigned int sysctl_sched_features __read_mostly =
-                       0 | 2 | 4 | 8 | 0 | 0 | 0 | 0;
+               SCHED_FEAT_DISTRIBUTE           |
+               SCHED_FEAT_FAIR_SLEEPERS        |
+               SCHED_FEAT_SLEEPER_AVG          |
+               SCHED_FEAT_PRECISE_CPU_LOAD;
 
 extern struct sched_class fair_sched_class;
 
@@ -145,7 +161,7 @@ static inline void
 __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
        if (cfs_rq->rb_leftmost == &se->run_node)
-               cfs_rq->rb_leftmost = NULL;
+               cfs_rq->rb_leftmost = rb_next(&se->run_node);
        rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
        update_load_sub(&cfs_rq->load, se->load.weight);
        cfs_rq->nr_running--;
@@ -258,7 +274,7 @@ __update_curr(struct cfs_rq *cfs_rq, str
         * Task already marked for preemption, do not burden
         * it with the cost of not having left the CPU yet:
         */
-       if (unlikely(sysctl_sched_features & 1))
+       if (unlikely(sysctl_sched_features & SCHED_FEAT_IGNORE_PREEMPTED))
                if (unlikely(test_tsk_thread_flag(curtask, TIF_NEED_RESCHED)))
                        return;
 
@@ -305,7 +321,7 @@ update_stats_wait_start(struct cfs_rq *c
        se->wait_start = now;
 }
 
-static inline s64 weight_s64(s64 calc, unsigned long weight, int shift)
+static s64 weight_s64(s64 calc, unsigned long weight, int shift)
 {
        if (calc < 0) {
                calc = - calc * weight;
@@ -317,7 +333,7 @@ static inline s64 weight_s64(s64 calc, u
 /*
  * Task is being enqueued - update stats:
  */
-static inline void
+static void
 update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
 {
        s64 key;
@@ -438,7 +454,7 @@ static void distribute_fair_add(struct c
        struct sched_entity *curr = cfs_rq_curr(cfs_rq);
        s64 delta_fair = 0;
 
-       if (!(sysctl_sched_features & 2))
+       if (!(sysctl_sched_features & SCHED_FEAT_DISTRIBUTE))
                return;
 
        if (cfs_rq->nr_running) {
@@ -469,7 +485,7 @@ __enqueue_sleeper(struct cfs_rq *cfs_rq,
         * Fix up delta_fair with the effect of us running
         * during the whole sleep period:
         */
-       if (!(sysctl_sched_features & 32))
+       if (sysctl_sched_features & SCHED_FEAT_SLEEPER_AVG)
                delta_fair = div64_s(delta_fair * load, load + se->load.weight);
 
        delta_fair = weight_s64(delta_fair, se->load.weight, NICE_0_SHIFT);
@@ -495,7 +511,7 @@ enqueue_sleeper(struct cfs_rq *cfs_rq, s
        s64 delta_fair;
 
        if ((entity_is_task(se) && tsk->policy == SCHED_BATCH) ||
-                                                !(sysctl_sched_features & 4))
+                        !(sysctl_sched_features & SCHED_FEAT_FAIR_SLEEPERS))
                return;
 
        delta_fair = cfs_rq->fair_clock - se->sleep_start_fair;
@@ -574,7 +590,7 @@ static void dequeue_entity(struct cfs_rq
 /*
  * Preempt the current task with a newly woken task if needed:
  */
-static inline void
+static void
 __check_preempt_curr_fair(struct cfs_rq *cfs_rq, struct sched_entity *se,
                          struct sched_entity *curr, unsigned long granularity)
 {
@@ -612,23 +628,6 @@ put_prev_entity(struct cfs_rq *cfs_rq, s
        int updated = 0;
 
        /*
-        * If the task is still waiting for the CPU (it just got
-        * preempted), update its position within the tree and
-        * start the wait period:
-        */
-       if ((sysctl_sched_features & 16) && entity_is_task(prev))  {
-               struct task_struct *prevtask = task_of(prev);
-
-               if (prev->on_rq &&
-                       test_tsk_thread_flag(prevtask, TIF_NEED_RESCHED)) {
-
-                       dequeue_entity(cfs_rq, prev, 0, now);
-                       enqueue_entity(cfs_rq, prev, 0, now);
-                       updated = 1;
-               }
-       }
-
-       /*
         * If still on the runqueue then deactivate_task()
         * was not called and update_curr() has to be done:
         */
@@ -741,10 +740,8 @@ static void check_preempt_curr_fair(stru
        unsigned long gran;
 
        if (unlikely(rt_prio(p->prio))) {
-               if (sysctl_sched_features & 8) {
-                       if (rt_prio(p->prio))
-                               update_curr(cfs_rq, rq_clock(rq));
-               }
+               if (rt_prio(p->prio))
+                       update_curr(cfs_rq, rq_clock(rq));
                resched_task(curr);
                return;
        }
@@ -850,14 +847,14 @@ static void task_new_fair(struct rq *rq,
         * The first wait is dominated by the child-runs-first logic,
         * so do not credit it with that waiting time yet:
         */
-       if (sysctl_sched_features & 256)
+       if (sysctl_sched_features & SCHED_FEAT_SKIP_INITIAL)
                p->se.wait_start_fair = 0;
 
        /*
         * The statistical average of wait_runtime is about
         * -granularity/2, so initialize the task with that:
         */
-       if (sysctl_sched_features & 128)
+       if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
                p->se.wait_runtime = -(s64)(sysctl_sched_granularity / 2);
 
        __enqueue_entity(cfs_rq, se);
Index: linux/kernel/sched_rt.c
===================================================================
--- linux.orig/kernel/sched_rt.c
+++ linux/kernel/sched_rt.c
@@ -12,7 +12,7 @@ static inline void update_curr_rt(struct
        struct task_struct *curr = rq->curr;
        u64 delta_exec;
 
-       if (!has_rt_policy(curr))
+       if (!task_has_rt_policy(curr))
                return;
 
        delta_exec = now - curr->se.exec_start;
@@ -179,13 +179,14 @@ static void task_tick_rt(struct rq *rq, 
        if (p->policy != SCHED_RR)
                return;
 
-       if (!(--p->time_slice)) {
-               p->time_slice = static_prio_timeslice(p->static_prio);
-               set_tsk_need_resched(p);
+       if (--p->time_slice)
+               return;
 
-               /* put it at the end of the queue: */
-               requeue_task_rt(rq, p);
-       }
+       p->time_slice = static_prio_timeslice(p->static_prio);
+       set_tsk_need_resched(p);
+
+       /* put it at the end of the queue: */
+       requeue_task_rt(rq, p);
 }
 
 /*