On Mon, 2015-04-20 at 08:18 +0200, Ingo Molnar wrote:
> Please write a small description we can cite to driver authors once 
> the (inevitable) breakages appear, outlining this new behavior and its 
> implications, so that we can fix any remaining bugs ASAP.

I wouldn't call this new behavior, simply because changing a critical
region should not be labeled as such, imho. Other than asking driver
authors to put their schedule() call in a loop that re-checks whether the
expected condition has in fact occurred, I'm not sure what else we can ask
of them -- as you know, this is not just about futexes.
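In the kernel this is the familiar set_current_state(TASK_UNINTERRUPTIBLE);
if (condition) break; schedule(); loop. The snippet below is only a toy,
single-threaded userspace model of why the loop matters. It is not kernel
code: fake_schedule() is a hypothetical stand-in for schedule() that returns
spuriously a configurable number of times before the real event arrives.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a wait site; all names here are hypothetical. */
struct wait_model {
	bool condition;      /* the event the caller is waiting for */
	int spurious_left;   /* remaining spurious returns to inject */
	int schedule_calls;  /* how many times fake_schedule() ran */
};

/* Stand-in for schedule(): returns early (spuriously) while spurious_left > 0. */
static void fake_schedule(struct wait_model *w)
{
	w->schedule_calls++;
	if (w->spurious_left > 0) {
		w->spurious_left--;  /* woke up, but nothing happened */
		return;
	}
	w->condition = true;         /* the real event finally arrives */
}

/* Fragile: sleeps once and assumes the wakeup means the event happened. */
static bool wait_once(struct wait_model *w)
{
	if (!w->condition)
		fake_schedule(w);
	return w->condition;
}

/* Robust: re-checks the condition after every return from schedule(). */
static bool wait_loop(struct wait_model *w)
{
	while (!w->condition)
		fake_schedule(w);
	return w->condition;
}
```

With two injected spurious returns, wait_once() gives up after one call with
the condition still false, while wait_loop() keeps going and returns true
after the third call.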

> I'll also let this pending a bit longer than other changes, to make 
> sure we shake out any bugs/regressions triggered by this change.
> 
> Third, it might make sense to add a new 'spurious wakeup injection 
> debug mechanism' that, if enabled in the .config, automatically and 
> continuously inserts spurious wakeups at a given, slightly randomized 
> rate - which would ensure that all kernel facilities can robustly 
> handle spurious wakeups.

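For illustration only, a userspace analogue of such an injection mechanism
could look like the sketch below. It is not a kernel facility and every name
in it is hypothetical: an injector thread broadcasts a pthread condition
variable at a slightly randomized interval (about 100-149 us, assumed values)
without ever setting the predicate, and the waiter survives only because it
re-checks the predicate in a loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical wait queue for the sketch; not a kernel structure. */
struct waitq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;             /* the real predicate */
	bool stop;             /* tells the injector to exit */
	bool waiter_saw_done;  /* predicate value when the waiter left */
	long wakeups;          /* returns from pthread_cond_wait() */
};

/* Assumes us < 1000000 so tv_nsec stays in range. */
static void sleep_us(long us)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = us * 1000L };
	nanosleep(&ts, NULL);
}

/* Tiny LCG giving 0..49 us of jitter; not a quality RNG. */
static long jitter_us(unsigned int *state)
{
	*state = *state * 1103515245u + 12345u;
	return (long)((*state >> 16) % 50u);
}

/* Broadcasts periodically without ever setting q->done. */
static void *injector(void *arg)
{
	struct waitq *q = arg;
	unsigned int seed = 42;

	for (;;) {
		sleep_us(100 + jitter_us(&seed));
		pthread_mutex_lock(&q->lock);
		bool stop = q->stop;
		if (!stop)
			pthread_cond_broadcast(&q->cond);  /* spurious wakeup */
		pthread_mutex_unlock(&q->lock);
		if (stop)
			return NULL;
	}
}

/* Robust waiter: re-checks the predicate after every wakeup. */
static void *waiter(void *arg)
{
	struct waitq *q = arg;

	pthread_mutex_lock(&q->lock);
	while (!q->done) {
		pthread_cond_wait(&q->cond, &q->lock);
		q->wakeups++;
	}
	q->waiter_saw_done = q->done;
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Signals the genuine event after delay_us; returns the waiter's wakeup count. */
static long run_with_injection(long delay_us, bool *saw_done)
{
	struct waitq q = { .done = false, .stop = false, .wakeups = 0 };
	pthread_t w, inj;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.cond, NULL);
	pthread_create(&w, NULL, waiter, &q);
	pthread_create(&inj, NULL, injector, &q);

	sleep_us(delay_us);
	pthread_mutex_lock(&q.lock);
	q.done = true;                     /* the genuine event */
	pthread_cond_broadcast(&q.cond);
	pthread_mutex_unlock(&q.lock);
	pthread_join(w, NULL);

	pthread_mutex_lock(&q.lock);
	q.stop = true;
	*saw_done = q.waiter_saw_done;
	pthread_mutex_unlock(&q.lock);
	pthread_join(inj, NULL);

	long wakeups = q.wakeups;
	pthread_cond_destroy(&q.cond);
	pthread_mutex_destroy(&q.lock);
	return wakeups;
}
```

The number of wakeups the waiter counts varies from run to run; the only
guarantee is that it leaves the loop with the predicate true.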
I have been testing against this patch from Peter:

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d77432..fdf1f68 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -214,9 +214,10 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
 #define TASK_WAKEKILL          128
 #define TASK_WAKING            256
 #define TASK_PARKED            512
-#define TASK_STATE_MAX         1024
+#define TASK_YIELD              1024
+#define TASK_STATE_MAX         2048
 
-#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWP"
+#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPY"
 
 extern char ___assert_task_state[1 - 2*!!(
                sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e..2c938ae 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1005,7 +1005,7 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
         * ttwu() will sort out the placement.
         */
        WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
-                       !p->on_rq);
+                       !p->on_rq && !(p->state & TASK_YIELD));
 
 #ifdef CONFIG_LOCKDEP
        /*
@@ -2743,6 +2743,23 @@ static void __sched __schedule(void)
                if (unlikely(signal_pending_state(prev->state, prev))) {
                        prev->state = TASK_RUNNING;
                } else {
+
+                       /*
+                        * Provide an auto-yield feature on schedule().
+                        *
+                        * The thought is to avoid a sleep+wakeup cycle
+                        * if simply yielding the cpu will suffice to
+                        * satisfy the required condition.
+                        *
+                        * Assumes the calling schedule() site can deal
+                        * with spurious wakeups.
+                        */
+                       if (prev->state & TASK_YIELD) {
+                               prev->state &= ~TASK_YIELD;
+                               if (rq->nr_running > 1)
+                                       goto no_deactivate;
+                       }
+
                        deactivate_task(rq, prev, DEQUEUE_SLEEP);
                        prev->on_rq = 0;
 
@@ -2759,6 +2776,7 @@ static void __sched __schedule(void)
                                        try_to_wake_up_local(to_wakeup);
                        }
                }
+       no_deactivate:
                switch_count = &prev->nvcsw;
        }
 

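To make the patch above easier to follow, here is a toy userspace model of
its two parts. It is not scheduler code: toy_task and toy_schedule_sleep()
are hypothetical names, and TASK_UNINTERRUPTIBLE's value (2) comes from the
kernel headers rather than from the patch. The string check mirrors the
___assert_task_state trick (one state character per bit, so adding the
TASK_YIELD bit requires the extra 'Y'). The function mirrors the __schedule()
hunk for a task going to sleep with no signal pending: with TASK_YIELD set
and other tasks runnable, the task is not dequeued, so schedule() can later
return without any wakeup, which the caller sees as a spurious wakeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TASK_UNINTERRUPTIBLE   2     /* from the kernel headers */
#define TASK_YIELD             1024  /* from the quoted patch */
#define TASK_STATE_MAX         2048  /* from the quoted patch */
#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPY"
#define TASK_STATE_BITS        11    /* ilog2(TASK_STATE_MAX) */

/* Same invariant as ___assert_task_state, written with C11 asserts. */
_Static_assert((1 << TASK_STATE_BITS) == TASK_STATE_MAX, "bits vs max");
_Static_assert(sizeof(TASK_STATE_TO_CHAR_STR) - 1 == TASK_STATE_BITS + 1,
	       "one state character per state bit");

struct toy_task {
	long state;
	bool on_rq;
};

/*
 * Toy version of the patched sleep path: TASK_YIELD is always cleared;
 * if other tasks are runnable the task stays queued (auto-yield),
 * otherwise it is deactivated as before. Returns true if deactivated.
 */
static bool toy_schedule_sleep(struct toy_task *prev, int nr_running)
{
	if (prev->state & TASK_YIELD) {
		prev->state &= ~TASK_YIELD;
		if (nr_running > 1)
			return false;  /* no_deactivate: still on the runqueue */
	}
	prev->on_rq = false;           /* stands in for deactivate_task() */
	return true;
}
```

Note that in the auto-yield case the task keeps its sleeping state but stays
on the runqueue, which is exactly why every schedule() call site has to loop.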
> My guess would be that most remaining fragilities against spurious 
> wakeups ought to be in the boot/init phase, so I'd keep an eye out for 
> suspend/resume regressions.

Correct, which is why I'm no longer that concerned about spurious
wakeups. In fact, the code above now boots and handles them correctly on
rather large systems.

> 
> > [...] However there is core code that cannot handle them afaict, and 
> > furthermore tglx does have the point that other events can already 
> > trigger them anyway.
> 
> s/there is core code/there is no core code

heh yes.

Thanks,
Davidlohr
