TASK_DEAD is special in that the final schedule call from do_exit() must be done with preemption disabled.
This leads to a violation of our new scheduling invariant which states that the preempt count should be 2. Move the check for TASK_DEAD out of the debug check and use it to decrement the preempt count (from 2 to 1). Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org> --- kernel/sched/core.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2955,12 +2955,8 @@ static inline void schedule_debug(struct #ifdef CONFIG_SCHED_STACK_END_CHECK BUG_ON(unlikely(task_stack_end_corrupted(prev))); #endif - /* - * Test if we are atomic. Since do_exit() needs to call into - * schedule() atomically, we ignore that path. Otherwise whine - * if we are scheduling when we should not. - */ - if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD)) { + + if (unlikely(in_atomic_preempt_off())) { __schedule_bug(prev); preempt_count_set(PREEMPT_DISABLED); } @@ -3061,6 +3057,17 @@ static void __sched __schedule(void) rcu_note_context_switch(); prev = rq->curr; + /* + * do_exit() calls schedule() with preemption disabled as an exception; + * however we must fix that up, otherwise the next task will see an + * inconsistent preempt count. + * + * It also avoids the below schedule_debug() test from complaining + * about this. + */ + if (unlikely(prev->state == TASK_DEAD)) + preempt_enable_no_resched_notrace(); + schedule_debug(prev); if (sched_feat(HRTICK)) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/