Michael Ellerman wrote:
Nicholas Piggin <npig...@gmail.com> writes:
On Thu, 26 Jul 2018 23:01:51 +1000
Michael Ellerman <m...@ellerman.id.au> wrote:

If we take an SLB miss while MSR[RI]=0 we can't recover and have to
oops. Currently this is reported by faking up a 0x4100 exception, eg:

  Unrecoverable exception 4100 at 0
  Oops: Unrecoverable exception, sig: 6 [#1]
  ...
  CPU: 0 PID: 1262 Comm: sh Not tainted 
4.18.0-rc3-gcc-7.3.1-00098-g7fc2229fb2ab-dirty #9
  NIP:  0000000000000000 LR: c00000000000b9e4 CTR: 00007fff8bb971b0
  REGS: c0000000ee02bbb0 TRAP: 4100
  ...
  LR [c00000000000b9e4] system_call+0x5c/0x70

The 0x4100 value was chosen back in 2004 as part of the fix for the
"mega bug" - "ppc64: Fix SLB reload bug". Back then it was obvious
that 0x4100 was not a real trap value, as the highest actual trap was
less than 0x2000.

Since then however the architecture has changed and now we have
"virtual mode" or "relon" exceptions, in which exceptions can be
delivered with the MMU on starting at 0x4000.

At a glance 0x4100 looks like a virtual mode 0x100 exception, aka
system reset exception. A close reading of the architecture will show
that system reset exceptions can't be delivered in virtual mode, and
so 0x4100 is not a valid trap number. But that's not immediately
obvious. There's also nothing about 0x4100 that suggests SLB miss.

So to make things a bit less confusing switch to a fake but unique and
hopefully more helpful numbering. For data SLB misses we report a
0x390 trap and for instruction we report 0x490. Compared to 0x380 and
0x480 for the actual data & instruction SLB exceptions.

Also add a C handler that prints a more explicit message. The end
result is something like:

  Oops: Unrecoverable SLB miss (MSR[RI]=0), sig: 6 [#3]

This is all good, but allow me to nitpick. Our unrecoverable
exception messages (and other messages, but those) are becoming a bit
ad-hoc and messy.

It would be nice to go the other way eventually and consolidate them
into one. Would be nice to have a common function that takes regs and
returns the string of the corresponding exception name that makes
these more readable.

Yeah that's true, though some of them aren't simply a mapping from the
trap number, eg. the kernel bad stack one.

But in general our whole oops output, regs, stack trace etc. could use a
revamp.

I've been thinking of making the trap number more prominent and
providing a text description, because apparently not everyone knows the
trap numbers by heart :)

Yes please, guilty as charged :)
https://patchwork.ozlabs.org/patch/899980/

Thanks,
Naveen


Reply via email to