On Tue, Feb 16, 2016 at 08:52:24AM +0300, Denis Kirjanov wrote: > On 2/15/16, Steven Rostedt <rost...@goodmis.org> wrote: > > On Sun, 14 Feb 2016 23:19:42 +0300 > > Denis Kirjanov <k...@linux-powerpc.org> wrote: > > > > > >> >> TP_PROTO(unsigned long call_site, const void *ptr), > >> >> > >> >> - TP_ARGS(call_site, ptr) > >> >> + TP_ARGS(call_site, ptr), > >> >> + > >> >> + /* > >> >> + * This trace can be potentially called from an offlined cpu. > >> >> + * Since trace points use RCU and RCU should not be used from > >> >> + * offline cpus, filter such calls out. > >> >> + * While this trace can be called from a preemptable section, > >> >> + * it has no impact on the condition since tasks can migrate > >> >> + * only from online cpus to other online cpus. Thus its safe > >> >> + * to use raw_smp_processor_id. > >> >> + */ > >> >> + TP_CONDITION(cpu_online(raw_smp_processor_id())) > >> > > >> > This is starting to become a common occurrence. Perhaps it is best to > >> > just hardcode this into the tracepoint code itself? > >> > >> Can you take it as a fix for now. I'll post the follow-up patch then > >> for rcu and offline cpus > >> issue. > > > > Actually, I prefer not to add more checks for cpu_online(), if its > > going to become the default. > > > > Does this work for you instead? > > Hm, somehow it doesn't work: > > [ 84.692846] =============================== > [ 84.692847] [ INFO: suspicious RCU usage. ] > [ 84.692849] 4.5.0-rc3-00765-g78b70d4-dirty #36 Tainted: G S > [ 84.692850] ------------------------------- > [ 84.692851] include/trace/events/rcu.h:457 suspicious > rcu_dereference_check() usage! > [ 84.692852] > other info that might help us debug this: > > [ 84.692854] > RCU used illegally from offline CPU!
The problem is that the CPU really is offline as far as RCU is concerned. The usual idle-loop workarounds do not apply here. > rcu_scheduler_active = 1, debug_locks = 1 > [ 84.692855] no locks held by swapper/11/0. > [ 84.692856] > stack backtrace: > [ 84.692858] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G S > 4.5.0-rc3-00765-g78b70d4-dirty #36 > [ 84.692860] Call Trace: > [ 84.692868] [c0000005b76d37f0] [c0000000008c57ac] > .dump_stack+0xe0/0x14c (unreliable) > [ 84.692872] [c0000005b76d3880] [c00000000010d0c8] > .lockdep_rcu_suspicious+0x108/0x170 > [ 84.692876] [c0000005b76d3910] [c00000000013cab4] > .__call_rcu.constprop.60+0x244/0x680 > [ 84.692880] [c0000005b76d39e0] [c0000000002c38ec] .put_object+0x5c/0x80 > [ 84.692884] [c0000005b76d3a60] [c00000000029e748] > .kmem_cache_free+0x318/0x4c0 > [ 84.692888] [c0000005b76d3b00] [c0000000000936b4] .__mmdrop+0x54/0x150 Ah, this is the problem. We are calling __mmdrop() on a CPU that RCU believes to be offline. Can we pull the call to __mmdrop() forward? (Trying to recall what x86 did about a very similar problem...) Thanx, Paul > [ 84.692891] [c0000005b76d3b90] [c0000000000e46e0] > .idle_task_exit+0x120/0x130 > [ 84.692895] [c0000005b76d3c20] [c000000000075a14] > .pseries_mach_cpu_die+0x64/0x310 > [ 84.692899] [c0000005b76d3cd0] [c00000000004401c] .cpu_die+0x3c/0x60 > [ 84.692902] [c0000005b76d3d40] [c000000000018998] > .arch_cpu_idle_dead+0x28/0x40 > [ 84.692906] [c0000005b76d3db0] [c000000000102714] > .cpu_startup_entry+0x554/0x580 > [ 84.692909] [c0000005b76d3ed0] [c000000000043d68] > .start_secondary+0x348/0x390 > [ 84.692913] [c0000005b76d3f90] [c000000000008a6c] > start_secondary_prolog+0x10/0x14 > > [ 84.692918] =============================== > [ 84.692919] [ INFO: suspicious RCU usage. ] > [ 84.692920] 4.5.0-rc3-00765-g78b70d4-dirty #36 Tainted: G S > [ 84.692921] ------------------------------- > [ 84.692922] include/trace/events/kmem.h:141 suspicious > rcu_dereference_check() usage! > [ 84.692923] > other info that might help us debug this: > > [ 84.692924] > RCU used illegally from offline CPU! > rcu_scheduler_active = 1, debug_locks = 1 > [ 84.692925] no locks held by swapper/11/0. > [ 84.692926] > stack backtrace: > [ 84.692928] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G S > 4.5.0-rc3-00765-g78b70d4-dirty #36 > [ 84.692929] Call Trace: > [ 84.692933] [c0000005b76d38c0] [c0000000008c57ac] > .dump_stack+0xe0/0x14c (unreliable) > [ 84.692936] [c0000005b76d3950] [c00000000010d0c8] > .lockdep_rcu_suspicious+0x108/0x170 > [ 84.692939] [c0000005b76d39e0] [c00000000029f1c0] .kfree+0x380/0x470 > [ 84.692942] [c0000005b76d3a80] [c000000000056194] > .destroy_context+0x44/0x100 > [ 84.692946] [c0000005b76d3b00] [c0000000000936c0] .__mmdrop+0x60/0x150 > [ 84.692948] [c0000005b76d3b90] [c0000000000e46e0] > .idle_task_exit+0x120/0x130 > [ 84.692951] [c0000005b76d3c20] [c000000000075a14] > .pseries_mach_cpu_die+0x64/0x310 > [ 84.692954] [c0000005b76d3cd0] [c00000000004401c] .cpu_die+0x3c/0x60 > [ 84.692957] [c0000005b76d3d40] [c000000000018998] > .arch_cpu_idle_dead+0x28/0x40 > [ 84.692960] [c0000005b76d3db0] [c000000000102714] > .cpu_startup_entry+0x554/0x580 > [ 84.692963] [c0000005b76d3ed0] [c000000000043d68] > .start_secondary+0x348/0x390 > [ 84.692966] [c0000005b76d3f90] [c000000000008a6c] > start_secondary_prolog+0x10/0x14 > > > > > > > -- Steve > > > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > > index acd522a91539..acfdbf353a0b 100644 > > --- a/include/linux/tracepoint.h > > +++ b/include/linux/tracepoint.h > > @@ -14,8 +14,10 @@ > > * See the file COPYING for more details. > > */ > > > > +#include <linux/smp.h> > > #include <linux/errno.h> > > #include <linux/types.h> > > +#include <linux/cpumask.h> > > #include <linux/rcupdate.h> > > #include <linux/tracepoint-defs.h> > > > > @@ -132,6 +134,9 @@ extern void syscall_unregfunc(void); > > void *it_func; \ > > void *__data; \ > > \ > > + if (!cpu_online(raw_smp_processor_id())) \ > > + return; \ > > + \ > > if (!(cond)) \ > > return; \ > > prercu; \ > > >