On Thu, 2020-03-26 at 11:28 +1100, Michael Ellerman wrote:
> Joakim Tjernlund <joakim.tjernl...@infinera.com> writes:
> > On Mon, 2020-03-23 at 15:45 +0100, Christophe Leroy wrote:
> > > On 23/03/2020 at 15:43, Christophe Leroy wrote:
> > > > On 23/03/2020 at 15:17, Joakim Tjernlund wrote:
> > > > > In __die(), see below, there is this call to notify_send() with
> > > > > SIGSEGV hardcoded. This seems odd to me, as the variable "err"
> > > > > holds the true signal (in my case SIGBUS).
> > > > > Should not SIGSEGV be replaced with the true signal no.?
> > > >
> > > > As far as I can see, it comes from
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66fcb1059
> > >
> > > And
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae87221d3ce49d9de1e43756da834fd0bf05a2ad
> > > shows it is (was?) similar on x86.
> >
> > I tried to follow that chain, thinking it would end up sending a signal
> > to user space, but I cannot see that happening. It seems to be related
> > to debugging.
> >
> > In short, I cannot see any signal being delivered to user space. If so,
> > that would explain why our user space process never dies.
> > Is there a signal hidden in machine_check handler for SIGBUS I cannot see?
>
> It's platform specific. What platform are you on?
>
> See the ppc_md & cur_cpu_spec calls here:
>
> void machine_check_exception(struct pt_regs *regs)
> {
>         int recover = 0;
>         bool nested = in_nmi();
>         if (!nested)
>                 nmi_enter();
>
>         __this_cpu_inc(irq_stat.mce_exceptions);
>
>         add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>
>         /* See if any machine dependent calls. In theory, we would want
>          * to call the CPU first, and call the ppc_md. one if the CPU
>          * one returns a positive number. However there is existing code
>          * that assumes the board gets a first chance, so let's keep it
>          * that way for now and fix things later. --BenH.
>          */
>         if (ppc_md.machine_check_exception)
>                 recover = ppc_md.machine_check_exception(regs);
>         else if (cur_cpu_spec->machine_check)
>                 recover = cur_cpu_spec->machine_check(regs);
>
>         if (recover > 0)
>                 goto bail;
>
>
> Either the ppc_md or cpu_spec handlers can send a signal, but after a
> bit of grepping I think only the pseries and powernv ones do.
>
> If you get into die() then it's an oops, which is not the same as a
> normal signal.
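To make the dispatch order in the quoted code concrete, here is a minimal user-space sketch of the same if / else-if. It is only a model: struct fake_regs, struct fake_machdep, struct fake_cpu_spec and the two sample handlers are hypothetical stand-ins invented for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for pt_regs, ppc_md and cur_cpu_spec. */
struct fake_regs { bool user_mode; };
typedef int (*mc_handler)(struct fake_regs *regs);

struct fake_machdep  { mc_handler machine_check_exception; };
struct fake_cpu_spec { mc_handler machine_check; };

/*
 * Same shape as the quoted code: the board hook gets the first chance,
 * and the CPU hook is consulted only when no board hook is installed.
 * Returns the "recover" value of whichever handler ran, or 0 if none did.
 */
static int dispatch_machine_check(const struct fake_machdep *md,
                                  const struct fake_cpu_spec *spec,
                                  struct fake_regs *regs)
{
    int recover = 0;

    assert(md != NULL && spec != NULL && regs != NULL);

    if (md->machine_check_exception)
        recover = md->machine_check_exception(regs);
    else if (spec->machine_check)
        recover = spec->machine_check(regs);

    return recover;
}

/* True when the caller would take the early "goto bail" exit. */
static bool bails_out(int recover)
{
    return recover > 0;
}

/* Sample handlers used only to exercise the dispatch. */
static int handler_recovers(struct fake_regs *regs)
{
    (void)regs;
    return 1;
}

static int handler_gives_up(struct fake_regs *regs)
{
    (void)regs;
    return 0;
}
```

Note that a board hook returning 0 is final in this model: the CPU hook is not tried as a fallback, which matches the else-if in the quoted function.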

I had a look at opal_machine_check and friends and came up with:

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0381242920d9..12715d24141c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -621,6 +621,11 @@ int machine_check_e500mc(struct pt_regs *regs)
 		       reason & MCSR_MEA ? "Effective" : "Physical", addr);
 	}
 
+	if ((user_mode(regs))) {
+		_exception(SIGBUS, regs, reason, regs->nip);
+		recoverable = 1;
+	}
+
 silent_out:
 	mtspr(SPRN_MCSR, mcsr);
 	return mfspr(SPRN_MCSR) == 0 && recoverable;
@@ -665,6 +670,10 @@ int machine_check_e500(struct pt_regs *regs)
 	if (reason & MCSR_BUS_RPERR)
 		printk("Bus - Read Parity Error\n");
 
+	if ((user_mode(regs))) {
+		_exception(SIGBUS, regs, reason, regs->nip);
+		return 1;
+	}
 	return 0;
 }
 
@@ -695,6 +704,10 @@ int machine_check_e200(struct pt_regs *regs)
 	if (reason & MCSR_BUS_WRERR)
 		printk("Bus - Write Bus Error on buffered store or cache line push\n");
 
+	if ((user_mode(regs))) {
+		_exception(SIGBUS, regs, reason, regs->nip);
+		return 1;
+	}
 	return 0;
 }
 #elif defined(CONFIG_PPC32)
@@ -731,6 +744,10 @@ int machine_check_generic(struct pt_regs *regs)
 	default:
 		printk("Unknown values in msr\n");
 	}
+	if ((user_mode(regs))) {
+		_exception(SIGBUS, regs, reason, regs->nip);
+		return 1;
+	}
 	return 0;
 }
 #endif /* everything else */

I don't really know what I am doing, does the above make sense to you?

 Jocke
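
One way to check from user space that a SIGBUS really arrives is a small probe that installs a handler and records what it receives. The sketch below is only that: it uses raise(SIGBUS) as a stand-in for the machine check, since triggering a real one depends on the hardware, and the helper names (on_sigbus, install_sigbus_probe, probe_sigbus) are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the probe; sig_atomic_t keeps this safe. */
static volatile sig_atomic_t last_signal = 0;

/* SA_SIGINFO-style handler: only records which signal number arrived. */
static void on_sigbus(int signo, siginfo_t *info, void *ctx)
{
    (void)info;
    (void)ctx;
    last_signal = signo;
}

/* Install the handler for SIGBUS; returns 0 on success, -1 on failure. */
static int install_sigbus_probe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    return sigaction(SIGBUS, &sa, NULL);
}

/*
 * Simulate delivery with raise() and report what the handler saw.
 * Returns the recorded signal number, or -1 if setup or raise() failed.
 */
static int probe_sigbus(void)
{
    last_signal = 0;

    if (install_sigbus_probe() != 0)
        return -1;
    if (raise(SIGBUS) != 0)
        return -1;

    return (int)last_signal;
}
```

In a real test the raise() call would be replaced by whatever access triggers the machine check on the board, and the process would either log from the handler or be left to die with the default SIGBUS action.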