Hugh Dickins <hu...@google.com> writes: > On Sun, 3 May 2020, Michael Ellerman wrote: > >> Hugh reported that his trusty G5 crashed after a few hours under load >> with an "Unrecoverable exception 380". >> >> The crash is in interrupt_return() where we check lazy_irq_pending(), >> which calls get_paca() and with CONFIG_DEBUG_PREEMPT=y that goes to >> check_preemption_disabled() via debug_smp_processor_id(). >> >> As Nick explained on the list: >> >> Problem is MSR[RI] is cleared here, ready to do the last few things >> for interrupt return where we're not allowed to take any other >> interrupts. >> >> SLB interrupts can happen just about anywhere aside from kernel >> text, global variables, and stack. When that hits, it appears to be >> unrecoverable due to RI=0. >> >> The problematic access is in preempt_count() which is: >> >> return READ_ONCE(current_thread_info()->preempt_count); >> >> Because of THREAD_INFO_IN_TASK, current_thread_info() just points to >> current, so the access is to somewhere in kernel memory, but not on >> the stack or in .data, which means it can cause an SLB miss. If we >> take an SLB miss with RI=0 it is fatal. >> >> The easiest solution is to add a version of lazy_irq_pending() that >> doesn't do the preemption check and call it from the interrupt return >> path. >> >> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic >> in C") >> Reported-by: Hugh Dickins <hu...@google.com> >> Signed-off-by: Michael Ellerman <m...@ellerman.id.au> > > Thank you, Michael and Nick: this has been running fine all day for me.
Thanks Hugh. cheers