Hi Peter, On Wed, Feb 13, 2019 at 02:17:20PM +0100, Peter Zijlstra wrote: > On Wed, Feb 13, 2019 at 10:50:21AM +0000, Julien Thierry wrote: > > On 13/02/2019 10:35, Peter Zijlstra wrote: > > > On Tue, Feb 12, 2019 at 09:15:13AM +0000, Julien Thierry wrote: > > > > > >>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > >>>>> index a674c7db..b1bb7e9 100644 > > >>>>> --- a/kernel/sched/core.c > > >>>>> +++ b/kernel/sched/core.c > > >>>>> @@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct > > >>>>> task_struct *prev) > > >>>>> __schedule_bug(prev); > > >>>>> preempt_count_set(PREEMPT_DISABLED); > > >>>>> } > > >>>>> + > > >>>>> + if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) && > > >>>>> + unlikely(unsafe_user_region_active())) { > > >>>>> + printk(KERN_ERR "BUG: scheduling while user_access > > >>>>> enabled: %s/%d/0x%08x\n", > > >>>>> + prev->comm, prev->pid, preempt_count()); > > >>>>> + dump_stack(); > > >>>>> + } > > >>>>> + > > >>>>> rcu_sleep_check(); > > >>>>> > > >>>>> profile_hit(SCHED_PROFILING, __builtin_return_address(0)); > > > > > >> I guess I'll drop the might_resched() part of this patch if that sounds > > >> alright. > > > > > > I'm still confused by the schedule_debug() part. How is that not broken? > > > > Hmmm, I am not exactly sure which part you expect to be broken, I guess > > it's because of the nature of the uaccess unsafe accessor usage. > > > > Basically, the following is a definite no: > > if (user_access_begin(ptr, size)) { > > > > [...] > > > > //something that calls schedule > > > > [...] > > > > user_access_end(); > > } > > > > > > However the following is fine: > > > > - user_access_begin(ptr, size) > > - taking irq/exception > > - get preempted > > This; how is getting preempted fundamentally different from scheduling > ourselves?
The difference is because getting preempted in the sequence above is triggered off the back of an interrupt. On arm64, and I think also on x86, the user access state (SMAP or PAN) is saved and restored across exceptions but not across context switch. Consequently, taking an irq in a user_access_{begin,end} section and then scheduling is fine, but calling schedule directly within such a section is not. Julien -- please yell if I've missed some crucial detail, but I think that's the gist of what we're trying to describe here. Will