Excerpts from Athira Rajeev's message of January 15, 2022 5:20 pm: > Running selftest with CONFIG_PPC_IRQ_SOFT_MASK_DEBUG enabled in kernel > triggered below warning: > > [ 172.851380] ------------[ cut here ]------------ > [ 172.851391] WARNING: CPU: 8 PID: 2901 at > arch/powerpc/include/asm/hw_irq.h:246 power_pmu_disable+0x270/0x280 > [ 172.851402] Modules linked in: dm_mod bonding nft_ct nf_conntrack > nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables rfkill nfnetlink sunrpc xfs > libcrc32c pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel > ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth > scsi_transport_srp fuse > [ 172.851442] CPU: 8 PID: 2901 Comm: lost_exception_ Not tainted > 5.16.0-rc5-03218-g798527287598 #2 > [ 172.851451] NIP: c00000000013d600 LR: c00000000013d5a4 CTR: > c00000000013b180 > [ 172.851458] REGS: c000000017687860 TRAP: 0700 Not tainted > (5.16.0-rc5-03218-g798527287598) > [ 172.851465] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 48004884 > XER: 20040000 > [ 172.851482] CFAR: c00000000013d5b4 IRQMASK: 1 > [ 172.851482] GPR00: c00000000013d5a4 c000000017687b00 c000000002a10600 > 0000000000000004 > [ 172.851482] GPR04: 0000000082004000 c0000008ba08f0a8 0000000000000000 > 00000008b7ed0000 > [ 172.851482] GPR08: 00000000446194f6 0000000000008000 c00000000013b118 > c000000000d58e68 > [ 172.851482] GPR12: c00000000013d390 c00000001ec54a80 0000000000000000 > 0000000000000000 > [ 172.851482] GPR16: 0000000000000000 0000000000000000 c000000015d5c708 > c0000000025396d0 > [ 172.851482] GPR20: 0000000000000000 0000000000000000 c00000000a3bbf40 > 0000000000000003 > [ 172.851482] GPR24: 0000000000000000 c0000008ba097400 c0000000161e0d00 > c00000000a3bb600 > [ 172.851482] GPR28: c000000015d5c700 0000000000000001 0000000082384090 > c0000008ba0020d8 > [ 172.851549] NIP [c00000000013d600] power_pmu_disable+0x270/0x280 > [ 172.851557] LR [c00000000013d5a4] power_pmu_disable+0x214/0x280 > [ 172.851565] Call Trace: > [ 172.851568] [c000000017687b00] [c00000000013d5a4] > power_pmu_disable+0x214/0x280 (unreliable) > [ 172.851579] [c000000017687b40] [c0000000003403ac] > perf_pmu_disable+0x4c/0x60 > [ 172.851588] [c000000017687b60] [c0000000003445e4] > __perf_event_task_sched_out+0x1d4/0x660 > [ 172.851596] [c000000017687c50] [c000000000d1175c] __schedule+0xbcc/0x12a0 > [ 172.851602] [c000000017687d60] [c000000000d11ea8] schedule+0x78/0x140 > [ 172.851608] [c000000017687d90] [c0000000001a8080] sys_sched_yield+0x20/0x40 > [ 172.851615] [c000000017687db0] [c0000000000334dc] > system_call_exception+0x18c/0x380 > [ 172.851622] [c000000017687e10] [c00000000000c74c] > system_call_common+0xec/0x268 > > The warning indicates that MSR_EE being set(interrupt enabled) when > there was an overflown PMC detected. This could happen in > power_pmu_disable since it runs under interrupt soft disable > condition ( local_irq_save ) and not with interrupts hard disabled. > commit 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear > pending PMI before resetting an overflown PMC") intended to clear > PMI pending bit in Paca when disabling the PMU. It could happen > that PMC gets overflown while code is in power_pmu_disable > callback function. Hence add a check to see if PMI pending bit > is set in Paca before clearing it via clear_pmi_pending. > > Fixes: 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI > before resetting an overflown PMC") > Signed-off-by: Athira Rajeev <atraj...@linux.vnet.ibm.com> > Reported-by: Sachin Sant <sach...@linux.ibm.com> > Tested-by: Sachin Sant <sach...@linux.ibm.com> > --- > Note: Address the warning reported here: > https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-December/238190.html > Patch is on top of powerpc/merge > > arch/powerpc/perf/core-book3s.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c > index a684901b6965..dfb0ea0f0df3 100644 > --- a/arch/powerpc/perf/core-book3s.c > +++ b/arch/powerpc/perf/core-book3s.c > @@ -1327,9 +1327,10 @@ static void power_pmu_disable(struct pmu *pmu) > * Otherwise provide a warning if there is PMI pending, but > * no counter is found overflown. > */ > - if (any_pmc_overflown(cpuhw)) > - clear_pmi_irq_pending(); > - else > + if (any_pmc_overflown(cpuhw)) { > + if (pmi_irq_pending()) > + clear_pmi_irq_pending();
And this works because MSR[EE] gets disabled by the masked interrupt handler if we have a PMI irq pending, is that right? Can I be a pain and just ask for a little comment there otherwise I'll forget about it next time. Thanks, Nick > + } else > WARN_ON(pmi_irq_pending()); > > val = mmcra = cpuhw->mmcr.mmcra; > -- > 2.33.0 > >