Thanks for the comments.

On 2019/07/29 19:29, Peter Zijlstra wrote:
> On Sun, Jul 28, 2019 at 09:25:58PM -0700, Andy Lutomirski wrote:
>> On Sun, Jul 28, 2019 at 6:08 PM Eiichi Tsukata <de...@etsukata.com> wrote:
>
>>> diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
>>> index 4d8e99fdbbbe..031b51cb94d0 100644
>>> --- a/kernel/trace/trace_preemptirq.c
>>> +++ b/kernel/trace/trace_preemptirq.c
>>> @@ -10,6 +10,7 @@
>>>  #include <linux/module.h>
>>>  #include <linux/ftrace.h>
>>>  #include <linux/kprobes.h>
>>> +#include <linux/context_tracking.h>
>>>  #include "trace.h"
>>>
>>>  #define CREATE_TRACE_POINTS
>>> @@ -49,9 +50,14 @@ NOKPROBE_SYMBOL(trace_hardirqs_off);
>>>
>>>  __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
>>>  {
>>> +	enum ctx_state prev_state;
>>> +
>>>  	if (this_cpu_read(tracing_irq_cpu)) {
>>> -		if (!in_nmi())
>>> +		if (!in_nmi()) {
>>> +			prev_state = exception_enter();
>>>  			trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
>>> +			exception_exit(prev_state);
>>> +		}
>>>  		tracer_hardirqs_on(CALLER_ADDR0, caller_addr);
>>>  		this_cpu_write(tracing_irq_cpu, 0);
>>>  	}
>>
>> This seems a bit distressing. Now we're going to do a whole bunch of
>> context tracking transitions for each kernel entry. Would a better fix
>> be to change trace_hardirqs_on_caller to skip the trace event if
>> the previous state was already IRQs on and, more importantly, to skip
>> tracing IRQs off if IRQs were already off? The x86 code is very
>> careful to avoid ever having IRQs on and CONTEXT_USER at the same
>> time.
>
> I think they already (try to) do that; see 'tracing_irq_cpu'.
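
(Side note on the overhead concern: the cost Andy points at is the
exception_enter()/exception_exit() pair doing a full context-tracking
transition around every event. Roughly, and simplified from
include/linux/context_tracking.h of that era -- this is a sketch for
discussion, not verbatim source:

	static inline enum ctx_state exception_enter(void)
	{
		enum ctx_state prev_ctx;

		if (!context_tracking_is_enabled())
			return 0;

		/* Remember the state we came from (e.g. CONTEXT_USER). */
		prev_ctx = this_cpu_read(context_tracking.state);
		if (prev_ctx != CONTEXT_KERNEL)
			context_tracking_exit(prev_ctx);	/* leave the RCU EQS */

		return prev_ctx;
	}

	static inline void exception_exit(enum ctx_state prev_ctx)
	{
		if (context_tracking_is_enabled()) {
			if (prev_ctx != CONTEXT_KERNEL)
				context_tracking_enter(prev_ctx);	/* back into the EQS */
		}
	}

so every tracepoint hit on the user-entry path would do an RCU
user-exit/user-enter round trip.)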
Or do you mean something like this?

As for trace_hardirqs_off_caller:

diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index 4d8e99fdbbbe..d39478bcf0f2 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -66,7 +66,7 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
 	if (!this_cpu_read(tracing_irq_cpu)) {
 		this_cpu_write(tracing_irq_cpu, 1);
 		tracer_hardirqs_off(CALLER_ADDR0, caller_addr);
-		if (!in_nmi())
+		if (!in_nmi() && !irqs_disabled())
 			trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
 	}

Or

diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index 4d8e99fdbbbe..e08c5c6ff2b3 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -66,8 +66,6 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
 	if (!this_cpu_read(tracing_irq_cpu)) {
 		this_cpu_write(tracing_irq_cpu, 1);
 		tracer_hardirqs_off(CALLER_ADDR0, caller_addr);
-		if (!in_nmi())
-			trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
 	}

As for trace_hardirqs_on_caller, it is called with IRQs off and in
CONTEXT_USER. So even if we skipped the trace event when the previous
state was already IRQs on, we would still fall into the same situation.
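
To make that last point concrete, here is the hypothetical symmetric
filter for the "on" path (untested, for illustration only, not a real
patch):

	__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
	{
		if (this_cpu_read(tracing_irq_cpu)) {
			/*
			 * On the entry-from-user path we get here with IRQs
			 * still off and context tracking still in CONTEXT_USER,
			 * so irqs_disabled() is true and the event is NOT
			 * filtered out: trace_irq_enable_rcuidle() still runs
			 * inside the RCU extended quiescent state.
			 */
			if (!in_nmi() && irqs_disabled())
				trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
			tracer_hardirqs_on(CALLER_ADDR0, caller_addr);
			this_cpu_write(tracing_irq_cpu, 0);
		}
	}

So filtering on the previous IRQ state only helps the "off" path; the
"on" path would still emit an event while RCU is not watching.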