On 02/19/2020 10:39 PM, Radu Rendec wrote:
On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.le...@c-s.fr> wrote:
Radu Rendec <radu.ren...@gmail.com> wrote:
On 02/19/2020 at 10:11 AM Radu Rendec <radu.ren...@gmail.com> wrote:
On 02/18/2020 at 1:08 PM Christophe Leroy <christophe.le...@c-s.fr> wrote:
On 18/02/2020 at 18:07, Radu Rendec wrote:
The saved NIP seems to be broken inside machine_check_exception() on
MPC8378, running Linux 4.9.191. The value is 0x900 most of the time,
but I have seen other weird values.
I've been able to track down the entry code to head_32.S (vector 0x200),
but I'm not sure where/how the NIP value (where the exception occurred)
is captured.
The NIP value is supposed to come from SRR0, which is loaded into r12 in
PROLOG_2 and saved into _NIP(r11) in transfer_to_handler in entry_32.S.
Can something clobber r12 at some point?
I did something even simpler: I added the following
lis r12,0x1234
... right after
mfspr r12,SPRN_SRR0
... and now the NIP value I see in the crash dump is 0x12340000. This
means r12 is not clobbered and most likely the NIP value I normally see
is the actual SRR0 value.
I apologize for the noise. I just found out accidentally that the saved
NIP value is correct if interrupts are disabled at the time when the
faulty access that triggers the MCE occurs. This seems to happen
consistently.
By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
it's basically enough to wrap ioread32 to get the NIP value right.
Does this make any sense? Maybe it's not a silicon bug after all, or
maybe it is and I just found a workaround. Could this happen on other
PowerPC CPUs as well?
Interesting.
0x900 is the address of the timer interrupt.
Would the MCE occur just after the timer interrupt?
I doubt that. I'm using a small test module to artificially trigger the
MCE. Basically it's just this (the full code is in my original post):
bad_addr_base = ioremap(0xf0000000, 0x100);
x = ioread32(bad_addr_base);
I find it hard to believe that every time I load the module the lwbrx
instruction that triggers the MCE is executed exactly after the timer
interrupt (or that the timer interrupt always occurs close to the lwbrx
instruction).
Can you try to see how much time there is between your read and the MCE?
The patch below should allow it: you'll see the first value in r13 and the
second in r14 (mce.c is your test code).
Please also provide the timebase frequency as reported in /proc/cpuinfo.
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 97c887950c3c..0ae6a0a17e26 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -273,6 +273,7 @@ __secondary_hold_acknowledge:
. = 0x200
DO_KVM 0x200
MachineCheck:
+ mftbl r14
EXCEPTION_PROLOG_0
#ifdef CONFIG_VMAP_STACK
li r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
diff --git a/arch/powerpc/platforms/83xx/mce.c b/arch/powerpc/platforms/83xx/mce.c
index 91c2de6b73ca..0b7e4dcc0cb3 100644
--- a/arch/powerpc/platforms/83xx/mce.c
+++ b/arch/powerpc/platforms/83xx/mce.c
@@ -11,7 +11,7 @@ static int __init test_mce_init(void)
bad_addr_base = ioremap(0xf0000000, 0x100);
if (bad_addr_base) {
- __asm__ __volatile__ ("isync");
+ __asm__ __volatile__ ("isync ; mftbl 13");
x = ioread32(bad_addr_base);
pr_info("Test: %#0x\n", x);
} else
Can you tell how your IO buses are configured, etc.?
Nothing special. The device tree is mostly similar to mpc8379_rdb.dts,
but I can provide the actual dts if you think it's relevant.
And what's the value of SERSR after the machine check?
I'm assuming you're talking about the IPIC SERSR register. I modified
machine_check_exception and added a call to ipic_get_mcp_status, which
seems to read IPIC_SERSR. The value is 0, both with interrupts enabled
and disabled (which makes sense, since disabling/enabling interrupts is
local to the CPU core).
And what's the reason given in the Oops message for the machine check?
Is it "Caused by (from SRR1=49030): Transfer error ack signal" or
something else?
Do you use the local bus monitoring driver?
I don't. In fact, I'm not even aware of it. What driver is that?
CONFIG_FSL_LBC
Christophe