On Sat, Mar 19, 2016 at 12:10:30PM +0800, Boqun Feng wrote:
> Hi Steven,
> 
> On Fri, Mar 18, 2016 at 01:06:02PM -0400, Steven Rostedt wrote:
> > While testing the tracer preemptoff, I hit this strange trace:
> > 
> > #   cmd     pid   ||||| time  |   caller
> > #      \   /      |||||  \    |   /
> >    <...>-259     0...1    0us : schedule <-worker_thread
> >    <...>-259     0d..1    0us : rcu_note_context_switch <-__schedule
> >    <...>-259     0d..1    0us : rcu_sched_qs <-rcu_note_context_switch
> >    <...>-259     0d..1    0us : rcu_preempt_qs <-rcu_note_context_switch
> >    <...>-259     0d..1    0us : _raw_spin_lock <-__schedule
> >    <...>-259     0d..1    0us : preempt_count_add <-_raw_spin_lock
> >    <...>-259     0d..2    0us : do_raw_spin_lock <-_raw_spin_lock
> >    <...>-259     0d..2    1us : deactivate_task <-__schedule
> >    <...>-259     0d..2    1us : update_rq_clock.part.84 <-deactivate_task
> >    <...>-259     0d..2    1us : dequeue_task_fair <-deactivate_task
> >    <...>-259     0d..2    1us : dequeue_entity <-dequeue_task_fair
> >    <...>-259     0d..2    1us : update_curr <-dequeue_entity
> >    <...>-259     0d..2    1us : update_min_vruntime <-update_curr
> >    <...>-259     0d..2    1us : cpuacct_charge <-update_curr
> >    <...>-259     0d..2    1us : __rcu_read_lock <-cpuacct_charge
> >    <...>-259     0d..2    1us : __rcu_read_unlock <-cpuacct_charge
> >    <...>-259     0d..2    1us : clear_buddies <-dequeue_entity
> >    <...>-259     0d..2    1us : account_entity_dequeue <-dequeue_entity
> >    <...>-259     0d..2    2us : update_min_vruntime <-dequeue_entity
> >    <...>-259     0d..2    2us : update_cfs_shares <-dequeue_entity
> >    <...>-259     0d..2    2us : hrtick_update <-dequeue_task_fair
> >    <...>-259     0d..2    2us : wq_worker_sleeping <-__schedule
> >    <...>-259     0d..2    2us : kthread_data <-wq_worker_sleeping
> >    <...>-259     0d..2    2us : pick_next_task_fair <-__schedule
> >    <...>-259     0d..2    2us : check_cfs_rq_runtime <-pick_next_task_fair
> >    <...>-259     0d..2    2us : pick_next_entity <-pick_next_task_fair
> >    <...>-259     0d..2    2us : clear_buddies <-pick_next_entity
> >    <...>-259     0d..2    2us : pick_next_entity <-pick_next_task_fair
> >    <...>-259     0d..2    2us : clear_buddies <-pick_next_entity
> >    <...>-259     0d..2    2us : set_next_entity <-pick_next_task_fair
> >    <...>-259     0d..2    3us : put_prev_entity <-pick_next_task_fair
> >    <...>-259     0d..2    3us : check_cfs_rq_runtime <-put_prev_entity
> >    <...>-259     0d..2    3us : set_next_entity <-pick_next_task_fair
> > gnome-sh-1031    0d..2    3us : finish_task_switch <-__schedule
> > gnome-sh-1031    0d..2    3us : _raw_spin_unlock_irq <-finish_task_switch
> > gnome-sh-1031    0d..2    3us : do_raw_spin_unlock <-_raw_spin_unlock_irq
> > gnome-sh-1031    0...2    3us!: preempt_count_sub <-_raw_spin_unlock_irq
> > gnome-sh-1031    0...1  582us : do_raw_spin_lock <-_raw_spin_lock
> > gnome-sh-1031    0...1  583us : _raw_spin_unlock <-drm_gem_object_lookup
> > gnome-sh-1031    0...1  583us : do_raw_spin_unlock <-_raw_spin_unlock
> > gnome-sh-1031    0...1  583us : preempt_count_sub <-_raw_spin_unlock
> > gnome-sh-1031    0...1  584us : _raw_spin_unlock <-drm_gem_object_lookup
> > gnome-sh-1031    0...1  584us+: trace_preempt_on <-drm_gem_object_lookup
> > gnome-sh-1031    0...1  603us : <stack trace>
> >  => preempt_count_sub
> >  => _raw_spin_unlock
> >  => drm_gem_object_lookup
> >  => i915_gem_madvise_ioctl
> >  => drm_ioctl
> >  => do_vfs_ioctl
> >  => SyS_ioctl
> >  => entry_SYSCALL_64_fastpath
> > 
> > As I'm tracing preemption disabled, it seemed incorrect that the
> > trace would go across a schedule and report not being in the
> > scheduler. Looking into this I discovered the problem.
> > 
> > schedule() calls preempt_disable() but the preempt_schedule() calls
> > preempt_enable_notrace().
> > What happened above was that the gnome-shell task was preempted on
> > another CPU and migrated over to the idle CPU. The tracer started
> > with idle calling schedule(), which called preempt_disable(), but
> > then gnome-shell finished, and it enabled preemption with
> > preempt_enable_notrace(), which does not stop the trace, even
> > though preemption was enabled.
> > 
> > The purpose of the preempt_disable_notrace() in preempt_schedule()
> > is to prevent function tracing from going into an infinite loop,
> > because function tracing can itself trace the preempt_enable() and
> > preempt_disable() calls. The problem with function tracing is:
> > 
> >    NEED_RESCHED set
> >    preempt_schedule()
> >       preempt_disable()
> >          preempt_count_inc()
> 
> Just out of curiosity, could this be solved by adding a barrier() here?
> 
> >          function trace (before incrementing preempt count)
> >             preempt_disable_notrace()
> >             preempt_enable_notrace()
> >                sees NEED_RESCHED set
> >                   preempt_schedule() (repeat)
> 
> I'm asking this because it seems to me that if PREEMPT_COUNT=y and
> PREEMPT_TRACER=y, then
> 
> 	preempt_disable_notrace();
> 	preempt_disable_check(1);
> 
> is actually
> 
> 	__preempt_count_inc();
> 	barrier();
> 	preempt_disable_check(1);
> 
> whereas
> 
> 	preempt_disable()
> 
> is actually
> 
> 	__preempt_count_inc();
> 	preempt_disable_check(1);
> 	barrier();
> 
> so I think adding barrier() (or even moving the barrier() up) in
> preempt_disable() could solve the same problem.
> 
> If so, it seems we don't need to introduce preempt_disable_check()
> and preempt_enable_check(); even better, we could use preempt_disable()
> and preempt_enable_no_resched() in preempt_schedule().
> 
> Or am I missing something subtle here?
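For reference, the expansions being compared above come from
include/linux/preempt.h. The following is lightly abridged from a
v4.5-era tree, so treat it as a sketch rather than the exact source:

	/*
	 * Traced variant: preempt_count_inc() expands to preempt_count_add(1),
	 * which is an out-of-line function (and therefore visible to the
	 * function tracer) when CONFIG_DEBUG_PREEMPT or CONFIG_PREEMPT_TRACER
	 * is enabled.
	 */
	#define preempt_disable() \
	do { \
		preempt_count_inc(); \
		barrier(); \
	} while (0)

	/*
	 * Untraced variant: __preempt_count_inc() is a raw, inlined increment
	 * of the preempt count with no tracing hook at all.
	 */
	#define preempt_disable_notrace() \
	do { \
		__preempt_count_inc(); \
		barrier(); \
	} while (0)

	/*
	 * The notrace enable still rechecks NEED_RESCHED on the way out,
	 * which is the "sees NEED_RESCHED set" step in the diagram above.
	 */
	#define preempt_enable_notrace() \
	do { \
		barrier(); \
		if (unlikely(__preempt_count_dec_and_test())) \
			__preempt_schedule_notrace(); \
	} while (0)

The distinction that matters is not where barrier() sits: barrier()
only constrains compiler reordering, while the function tracer hooks
the entry of preempt_count_add()/preempt_count_sub() themselves, so
the recursion happens inside the call, where no barrier() placement
in the caller can help.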
Oops, it seems I asked that question because I didn't really understand
how function tracing works. I understand the problem now, so please
ignore this; sorry for the noise.

Regards,
Boqun

> Regards,
> Boqun
> 
> > Now, by breaking out the preempt off/on tracing into their own
> > functions, preempt_disable_check() and preempt_enable_check(), we can
> > add these to the preempt_schedule() code. As preemption would then be
> > disabled, even if they were to be traced by the function tracer, the
> > disabled preemption would prevent the recursion.
> > 
> > Signed-off-by: Steven Rostedt <rost...@goodmis.org>
> > ---
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 9503d590e5ef..a925e1d2c4cd 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3023,6 +3023,17 @@ notrace unsigned long get_parent_ip(unsigned long addr)
> >  #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
> >  				defined(CONFIG_PREEMPT_TRACER))
> >  
> > +static inline void preempt_disable_check(int val)
> > +{
> > +	if (preempt_count() == val) {
> > +		unsigned long ip = get_parent_ip(CALLER_ADDR1);
> > +#ifdef CONFIG_DEBUG_PREEMPT
> > +		current->preempt_disable_ip = ip;
> > +#endif
> > +		trace_preempt_off(CALLER_ADDR0, ip);
> > +	}
> > +}
> > +
> >  void preempt_count_add(int val)
> >  {
> >  #ifdef CONFIG_DEBUG_PREEMPT
> > @@ -3040,17 +3051,17 @@ void preempt_count_add(int val)
> >  	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
> >  				PREEMPT_MASK - 10);
> >  #endif
> > -	if (preempt_count() == val) {
> > -		unsigned long ip = get_parent_ip(CALLER_ADDR1);
> > -#ifdef CONFIG_DEBUG_PREEMPT
> > -		current->preempt_disable_ip = ip;
> > -#endif
> > -		trace_preempt_off(CALLER_ADDR0, ip);
> > -	}
> > +	preempt_disable_check(val);
> >  }
> >  EXPORT_SYMBOL(preempt_count_add);
> >  NOKPROBE_SYMBOL(preempt_count_add);
> >  
> > +static inline void preempt_enable_check(int val)
> > +{
> > +	if (preempt_count() == val)
> > +		trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
> > +}
> > +
> >  void preempt_count_sub(int val)
> >  {
> >  #ifdef CONFIG_DEBUG_PREEMPT
> > @@ -3067,13 +3078,15 @@ void preempt_count_sub(int val)
> >  		return;
> >  #endif
> >  
> > -	if (preempt_count() == val)
> > -		trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
> > +	preempt_enable_check(val);
> >  	__preempt_count_sub(val);
> >  }
> >  EXPORT_SYMBOL(preempt_count_sub);
> >  NOKPROBE_SYMBOL(preempt_count_sub);
> >  
> > +#else
> > +static inline void preempt_disable_check(int val) { }
> > +static inline void preempt_enable_check(int val) { }
> >  #endif
> >  
> >  /*
> > @@ -3349,7 +3362,14 @@ static void __sched notrace preempt_schedule_common(void)
> >  {
> >  	do {
> >  		preempt_disable_notrace();
> > +		/*
> > +		 * Function tracer requires disabling preemption before
> > +		 * tracing functions. But we still want to trace
> > +		 * preemption off locations.
> > +		 */
> > +		preempt_disable_check(1);
> >  		__schedule(true);
> > +		preempt_enable_check(1);
> >  		preempt_enable_no_resched_notrace();
> >  
> >  		/*
> > @@ -3403,6 +3423,12 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
> >  	do {
> >  		preempt_disable_notrace();
> >  		/*
> > +		 * Function tracer requires disabling preemption before
> > +		 * tracing functions. But we still want to trace
> > +		 * preemption off locations.
> > +		 */
> > +		preempt_disable_check(1);
> > +		/*
> >  		 * Needs preempt disabled in case user_exit() is traced
> >  		 * and the tracer calls preempt_enable_notrace() causing
> >  		 * an infinite recursion.
> > @@ -3411,6 +3437,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
> >  		__schedule(true);
> >  		exception_exit(prev_ctx);
> >  
> > +		preempt_enable_check(1);
> >  		preempt_enable_no_resched_notrace();
> >  	} while (need_resched());
> >  }
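One note on the ordering in the hunks above: preempt_enable_check(1)
has to run while the preempt count is still 1, because the
preempt_enable_no_resched_notrace() that follows it is only a raw
decrement with no tracing hook. Abridged from a v4.5-era
include/linux/preempt.h (again a sketch; details vary by config):

	/*
	 * Drop the preempt count without rechecking NEED_RESCHED and
	 * without any tracing: just a compiler barrier and a raw decrement.
	 */
	#define preempt_enable_no_resched_notrace() \
	do { \
		barrier(); \
		__preempt_count_dec(); \
	} while (0)

If the check ran after this instead, preempt_count() would already be 0
and the preempt_count() == val test in preempt_enable_check() would
never fire. The same ordering is used in preempt_count_sub(), which
calls preempt_enable_check() before __preempt_count_sub().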