On 10/04, Oleg Nesterov wrote: > > On 10/03, Linus Torvalds wrote: > > > > On Fri, Oct 3, 2014 at 5:01 PM, Linus Torvalds > > <torva...@linux-foundation.org> wrote: > > > > > > The real fix would appear to be to use > > > "preempt_enable_no_resched_notrace()", which your patch did, but > > > without the loop. > > > > Actually, the real fix would be to not be stupid, and just make the > > code do something like > > > > > if (likely(!preemptible())) > > > return; > > > > > > __preempt_count_add(PREEMPT_ACTIVE); > > > prev_ctx = exception_enter(); > > > > > > __schedule(); > > > > > > exception_exit(prev_ctx); > > > __preempt_count_sub(PREEMPT_ACTIVE); > > > > and *not* enable preemption around the scheduling at all.
Yes, I think you are right. And I hate to admit that I didn't think about this simplification. The only complication is that we should move this function from context_tracking.c to sched/core.c. However, I still think we need the need_resched() loop, we can't do this only once. And in fact the simplest fix could just turn it into local_irq_disable(); preempt_schedule_irq(); local_irq_enable(); > Again, it is too late for me... Most probably I am wrong, but somehow > it seems to me that the real fix should try to kill preempt_schedule_context() > altogether and teach preempt_schedule() to play well with CONTEXT_TRACKING. Yes, the very fact that preempt_enable() != trace_preempt_on() + preempt_enable_notrace() looks simply wrong imo. And preempt_enable_notrace() has users outside of the tracing code which can run in IN_USER state. For example, why should __perf_sw_event() worry about context_tracking.state? OTOH, if the caller of preempt_enable_notrace() actually needs to take care of potential IN_USER state, then it probably has other problems. And indeed, ftrace_ops_control_func() has to check rcu_is_watching() anyway. IOW, imho 29bb9e5a75 "tracing/context-tracking: Add preempt_schedule_context() for tracing" was not a right solution. Perhaps the tracing functions can be changed to switch to IN_KERNEL mode, or at least perhaps we can add the special preempt_disable_ftrace() helper. But this needs another discussion. Either way, I do think that preempt_schedule_context() in its current form must die. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/