On Fri, Jul 31, 2015 at 01:56:45PM +0200, Jiri Olsa wrote: > We are experiencing following crashes under kdump > kernel execution: > > [ 1.930428] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000080 > > SNIP > > [ 2.478604] Call Trace: > [ 2.479364] [<ffffffff81158f06>] perf_event_enable+0x16/0x40 > [ 2.481035] [<ffffffff81119a2c>] watchdog_nmi_enable+0x10c/0x170 > [ 2.482906] [<ffffffff81119c14>] watchdog_enable+0x54/0xc0 > [ 2.484583] [<ffffffff810ad209>] smpboot_thread_fn+0xb9/0x1a0 > [ 2.486302] [<ffffffff81634bb9>] ? schedule+0x29/0x70 > [ 2.487811] [<ffffffff810ad150>] ? lg_double_unlock+0x90/0x90 > [ 2.4810a484f>] kthread+0xcf/0xe0 > [ 2.991478] [<ffffffff810a4780>] ? kthread_create_on_node+0x140/0x140 > [ 2.993519] [<ffffffff8163fad8>] ret_from_fork+0x58/0x90 > [ 2.995192] [<ffffffff810a4780>] ? kthread_create_on_node+0x140/0x140 > [ 3.005787] RIP [<ffffffff8115744a>] > perf_event_ctx_lock_nested.isra.50+0x5a/0xa0 > [ 3.008350] RSP <ffff88003199fde0> > [ 3.009417] CR2: 0000000000000080 > [ 3.010432] ---[ end trace 892dd7be017c5cde ]--- > [ 3.011792] Kernel panic - not syncing: Fatal exception > [ 3.013379] Rebooting in 10 seconds.. > > It's caused by watchdog_nmi_disable_all/watchdog_nmi_enable_all calls > executed by fixup_ht_bug (as fallback for HT bug) and watchdog_enable > function executed by watchdog's smp_hotplug_thread setup callback.
Hi Jiri, Uli privately has been working on a patchset that cleans up a bunch of these race conditions. We believe it should cover this case. It uses the proc_mutex to synchronize everything. I think he is reaching out to you. If you could try his patchset to see if it fixes things, it might be a cleaner approach than what you are doing. <small time goes by> Ok, I was poked on IRC and you have already done this and it passed your testcase. Let me get Uli to post his patches. :-) Cheers, Don > > The watchdog_enable function calls watchdog_nmi_enable which is not serialized > with watchdog_nmi_disable_all/watchdog_nmi_enable_all. We see following race > during the boot. > > (booting with 'maxcpus=1' to emulate kdump mode and execute HT bug fallback). > > --- > kernel_init > kernel_init_freeable > lockup_detector_init > smpboot_register_percpu_thread > > -> schedules out to the new kthread > --- > smpboot_thread_fn > watchdog_enable > watchdog_nmi_enable > perf_event_create_kernel_counter > per_cpu(watchdog_ev, cpu) = event > perf_event_enable > perf_event_ctx_lock_nested > -> schedules out back to the kernel_init > --- > kernel_init > ... > do_basic_setup > do_initcalls > fixup_ht_bug > watchdog_nmi_disable_all > -> release per_cpu(watchdog_ev, cpu) initialized by watchdog_enable > > watchdog_nmi_enable_all > ... > -> schedules out to the smpboot_thread_fn > --- > smpboot_thread_fn > ... > -> crash in perf_event_ctx_lock_nested because the event > was released > > Protecting watchdog_nmi_(enable|disable) calls with new per-cpu mutex. > > I can easily reproduce this on SNB server with 'maxcpus=1' command line > option, and I guess it could be triggered on any server that deals with > the HT bug. I dig out following: > > case 42: /* 32nm SandyBridge */ > case 45: /* 32nm SandyBridge-E/EN/EP */ > case 58: /* 22nm IvyBridge */ > case 62: /* 22nm IvyBridge-EP/EX */ > case 60: /* 22nm Haswell Core */ > case 63: /* 22nm Haswell Server */ > case 69: /* 22nm Haswell ULT */ > case 70: /* 22nm Haswell + GT3e (Intel Iris Pro graphics) */ > > Signed-off-by: Jiri Olsa <jo...@kernel.org> > --- > kernel/watchdog.c | 22 ++++++++++++++++++++-- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index a6ffa43f2993..8fcc75116277 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -82,6 +82,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn); > static DEFINE_PER_CPU(bool, watchdog_nmi_touch); > static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); > static DEFINE_PER_CPU(struct perf_event *, watchdog_ev); > +static DEFINE_PER_CPU(struct mutex, watchdog_ev_lock); > #endif > static unsigned long soft_lockup_nmi_warn; > > @@ -526,7 +527,10 @@ static unsigned long cpu0_err; > static int watchdog_nmi_enable(unsigned int cpu) > { > struct perf_event_attr *wd_attr; > - struct perf_event *event = per_cpu(watchdog_ev, cpu); > + struct perf_event *event; > + > + mutex_lock(&per_cpu(watchdog_ev_lock, cpu)); > + event = per_cpu(watchdog_ev, cpu); > > /* nothing to do if the hard lockup detector is disabled */ > if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED)) > @@ -557,6 +561,8 @@ static int watchdog_nmi_enable(unsigned int cpu) > goto out_save; > } > > + mutex_unlock(&per_cpu(watchdog_ev_lock, cpu)); > + > /* > * Disable the hard lockup detector if _any_ CPU fails to set up > * set up the hardware perf event. The watchdog() function checks > @@ -593,12 +599,16 @@ out_save: > out_enable: > perf_event_enable(per_cpu(watchdog_ev, cpu)); > out: > + mutex_unlock(&per_cpu(watchdog_ev_lock, cpu)); > return 0; > } > > static void watchdog_nmi_disable(unsigned int cpu) > { > - struct perf_event *event = per_cpu(watchdog_ev, cpu); > + struct perf_event *event; > + > + mutex_lock(&per_cpu(watchdog_ev_lock, cpu)); > + event = per_cpu(watchdog_ev, cpu); > > if (event) { > perf_event_disable(event); > @@ -611,6 +621,7 @@ static void watchdog_nmi_disable(unsigned int cpu) > /* watchdog_nmi_enable() expects this to be zero initially. */ > cpu0_err = 0; > } > + mutex_unlock(&per_cpu(watchdog_ev_lock, cpu)); > } > > void watchdog_nmi_enable_all(void) > @@ -928,6 +939,8 @@ int proc_watchdog_cpumask(struct ctl_table *table, int > write, > > void __init lockup_detector_init(void) > { > + int cpu __maybe_unused; > + > set_sample_period(); > > #ifdef CONFIG_NO_HZ_FULL > @@ -942,6 +955,11 @@ void __init lockup_detector_init(void) > cpumask_copy(&watchdog_cpumask, cpu_possible_mask); > #endif > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR > + for_each_possible_cpu(cpu) > + mutex_init(&per_cpu(watchdog_ev_lock, cpu)); > +#endif > + > if (watchdog_enabled) > watchdog_enable_all_cpus(); > } > -- > 2.4.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/