> -----Original Message----- > From: Robin Murphy [mailto:robin.mur...@arm.com] > Sent: Thursday, January 31, 2019 3:10 AM > To: Will Deacon; Li, Meng > Cc: mark.rutl...@arm.com; linux-arm-ker...@lists.infradead.org; linux- > ker...@vger.kernel.org; suzuki.poul...@arm.com > Subject: Re: Could you please help to have a look a bug trace in pmu arm-cci.c > > On 2019-01-30 6:21 pm, Will Deacon wrote: > > [+Suzuki and Robin] > > > > On Mon, Jan 28, 2019 at 07:19:20AM +0000, Li, Meng wrote: > >> When enable kernel configure CONFIG_DEBUG_ATOMIC_SLEEP, there is > below trace > >> during pmu arm cci driver probe phase. > >> > >> [ 1.983337] BUG: sleeping function called from invalid context at > kernel/locking/rtmutex.c:2004 > >> [ 1.983340] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0 > >> [ 1.983342] Preemption disabled at: > >> [ 1.983353] [<ffffff80089801f4>] cci_pmu_probe+0x1dc/0x488 > >> [ 1.983360] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.20-rt8-yocto- > preempt-rt #1 > >> [ 1.983362] Hardware name: ZynqMP ZCU102 Rev1.0 (DT) > >> [ 1.983364] Call trace: > >> [ 1.983369] dump_backtrace+0x0/0x158 > >> [ 1.983372] show_stack+0x24/0x30 > >> [ 1.983378] dump_stack+0x80/0xa4 > >> [ 1.983383] ___might_sleep+0x138/0x160 > >> [ 1.983386] __might_sleep+0x58/0x90 > >> [ 1.983391] __rt_mutex_lock_state+0x30/0xc0 > >> [ 1.983395] _mutex_lock+0x24/0x30 > >> [ 1.983400] perf_pmu_register+0x2c/0x388 > >> [ 1.983404] cci_pmu_probe+0x2bc/0x488 > >> [ 1.983409] platform_drv_probe+0x58/0xa8 > >> > >> Because get_cpu() is invoked, preempt is disable, finally, trace occurs > when > >> call might_sleep() > > > > Hmm, the {get,put}_cpu() usage here looks very broken to me. There's the > > fact that it might sleep, but also the assignment to g_cci_pmu is done after > > we've re-enabled preemption, so there's a race with CPU hotplug there > too. > > Hmm, looks like I failed to appreciate that particular race at the time > - indeed the global should probably be assigned immediately after > cci_pmu_init() has succeeded. > > > I don't think we can simply register the hotplug notifier before registering > > the PMU, because we can't call into perf_pmu_migrate_context() until the > PMU > > has been registered. Perhaps we need to use the _cpuslocked() versions > of > > the hotplug notifier registration functions. > > > > I tried looking at some other drivers, but they all look broken to me, so > > there's a good chance I'm missing something. Anybody know how this is > > supposed to work? > > As I understand the general pattern, we register the notifier last to > avoid taking a hotplug callback with a partly-initialised PMU state, > however since the CPU we've picked is part of that PMU state, we also > want to avoid getting migrated off that CPU before the notifier is in > place lest things get out of sync, hence disabling preemption. As far as > the correctness of implementing that logic, though, it was like that > when I got here so I've always just assumed it was fine :) > > I guess the question is whether we actually need to pick our nominal CPU > before perf_pmu_register(), or if something like the below would suffice > - what do you reckon? > > Robin. > > ----->8----- > diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c > index 1bfeb160c5b1..da9309ff80d7 100644 > --- a/drivers/perf/arm-cci.c > +++ b/drivers/perf/arm-cci.c > @@ -1692,19 +1692,18 @@ static int cci_pmu_probe(struct platform_device > *pdev) > raw_spin_lock_init(&cci_pmu->hw_events.pmu_lock); > mutex_init(&cci_pmu->reserve_mutex); > atomic_set(&cci_pmu->active_events, 0); > - cci_pmu->cpu = get_cpu(); > + cci_pmu->cpu = -1; /* Avoid races until hotplug notifier is alive */
If we set cci_pmu->cpu = -1 here, I think we should modify code of cci_pmu_event_init() as below: //////// original //////////////////////// if (event->cpu < 0) return -EINVAL; event->cpu = cci_pmu->cpu; //////// original //////////////////////// Change into //////// new //////////////////////// event->cpu = cci_pmu->cpu; if (event->cpu < 0) return -EINVAL; //////// new //////////////////////// Move this if condition after the assignment value operation. Otherwise, when call perf_event_open()->find_get_context(), int cpu = event->cpu = -1; -1 as a parameter passed into per_cpu_ptr(), and return a wrong cpuctx, If we continue to execute code with the wrong cpuctx, kernel may crash. I am not sure whether my analysis is reasonable, only just for your reference. Thanks, Limeng > > ret = cci_pmu_init(cci_pmu, pdev); > - if (ret) { > - put_cpu(); > + if (ret) > return ret; > - } > + g_cci_pmu = cci_pmu; > > cpuhp_setup_state_nocalls(CPUHP_AP_PERF_ARM_CCI_ONLINE, > "perf/arm/cci:online", NULL, > cci_pmu_offline_cpu); > - put_cpu(); > - g_cci_pmu = cci_pmu; > + cci_pmu->cpu = smp_processor_id(); > + > pr_info("ARM %s PMU driver probed", cci_pmu->model->name); > return 0; > }