On Tue, Jan 26, 2021 at 01:57:18PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rost...@goodmis.org>
>
> There's some paths that can call into the scheduler from interrupt disabled
> or preempt disabled state. Specifically from the idle thread. The problem is
> that it can call the scheduler, still stay idle, and continue. The preempt
> and irq disabled tracer considers this a very long latency, and hides real
> latencies that we care about.
>
> For example, this is from a preemptirqsoff trace:
>
>   <idle>-0  2dN.1  16us : tick_nohz_account_idle_ticks.isra.0 <-tick_nohz_idle_exit
>   <idle>-0  2.N.1  17us : flush_smp_call_function_from_idle <-do_idle
>   <idle>-0  2dN.1  17us : flush_smp_call_function_queue <-flush_smp_call_function_from_idle
>   <idle>-0  2dN.1  17us : nohz_csd_func <-flush_smp_call_function_queue
>   <idle>-0  2.N.1  18us : schedule_idle <-do_idle
>   <idle>-0  2dN.1  18us : rcu_note_context_switch <-__schedule
>   <idle>-0  2dN.1  18us : rcu_preempt_deferred_qs <-rcu_note_context_switch
>   <idle>-0  2dN.1  19us : rcu_preempt_need_deferred_qs <-rcu_preempt_deferred_qs
>   <idle>-0  2dN.1  19us : rcu_qs <-rcu_note_context_switch
>   <idle>-0  2dN.1  19us : _raw_spin_lock <-__schedule
>   <idle>-0  2dN.1  19us : preempt_count_add <-_raw_spin_lock
>   <idle>-0  2dN.2  20us : do_raw_spin_trylock <-_raw_spin_lock
>
> do_idle() calls schedule_idle() which calls __schedule, but the latency
> continues on for 1.4 milliseconds.
I'm not sure I understand the problem from this... what?

> To handle this case, create a new function called
> "reset_critical_timings()" which just calls stop_critical_timings() followed
> by start_critical_timings() and place this in the scheduler. There's no
> reason to worry about timings when the scheduler is called, as that should
> allow everything to move forward.

And that's just really daft.. why are you adding two unconditional
function calls to __schedule() that are a complete waste of time
99.999999% of the time?

If anything, this should be fixed in schedule_idle().