On Wed, 23 Oct 2019, Thomas Gleixner wrote:
> On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> Ergo ep must be a valid pointer pointing to the statically allocated and
> statically initialized estack_pages array.
>
> 	/* Guard page? */
> 	if (!ep->size)
>
> How on earth can dereferencing ep crash the machine?
>
> 		return false;
>
> That does not make any sense.
>
> Surely, we should not even try to decode the exception stack when
> cea_exception_stacks is not yet initialized, but that does not explain
> what you are observing.
So looking at your actual crash:

[ 0.027246] BUG: unable to handle page fault for address: 0000000000001ff0

So the fault happens when dereferencing the stack pointer address.

[ 0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888

ep correctly points to estack_pages[1]. That lookup is bogus because
0x1010 is not a valid stack value, but dereferencing ep does not make it
crash.

The crash happens further down:

	end = begin + (unsigned long)ep->size;

==> end = 0x2000

	regs = (struct pt_regs *)end - 1;

==> regs = 0x2000 - sizeof(struct pt_regs) = 0x2000 - 0xa8 = 0x1f58

	info->type = ep->type;
	info->begin = (unsigned long *)begin;
	info->end = (unsigned long *)end;
---->	info->next_sp = (unsigned long *)regs->sp;

This is the crashing instruction: it loads regs->sp from
0x1f58 + offsetof(struct pt_regs, sp) = 0x1f58 + 0x98 = 0x1ff0, which is
exactly the faulting address.

And you are right: this happens because cea_exception_stacks is not yet
initialized, which makes begin 0, so the computed end and regs point
into nirvana.

So the fix is trivial.

Thanks,

	tglx

8<------------
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)
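For illustration only (userspace, not part of the patch): a minimal sketch that redoes the address arithmetic above with plain integers, so nothing is actually dereferenced. It assumes the x86-64 struct pt_regs layout of 21 eight-byte slots (r15 ... ss, with sp second to last). mock_pt_regs, sp_load_addr() and model_in_exception_stack() are hypothetical stand-ins, and the model drops the range check and estack_pages lookup that the real in_exception_stack() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace mirror of the x86-64 struct pt_regs layout:
 * 21 eight-byte slots, sp at index 19, ss at index 20.
 */
struct mock_pt_regs {
	unsigned long long r15, r14, r13, r12, bp, bx;
	unsigned long long r11, r10, r9, r8, ax, cx, dx, si, di;
	unsigned long long orig_ax, ip, cs, flags, sp, ss;
};

/* Layout assumptions the arithmetic below depends on. */
static_assert(sizeof(struct mock_pt_regs) == 0xa8, "21 x 8 bytes");
static_assert(offsetof(struct mock_pt_regs, sp) == 0x98, "sp is slot 19");

/*
 * Address that "regs->sp" is loaded from, where
 * regs = (struct pt_regs *)end - 1. Integer arithmetic only.
 */
unsigned long long sp_load_addr(unsigned long long end)
{
	unsigned long long regs = end - sizeof(struct mock_pt_regs);

	return regs + offsetof(struct mock_pt_regs, sp);
}

/*
 * Hypothetical model of the patched lookup: begin stands for the per-CPU
 * cea_exception_stacks base, offs/size for the stack that stk falls into.
 * With begin == 0 it bails out before computing any address, mirroring
 * the added "if (!begin) return false;".
 */
bool model_in_exception_stack(unsigned long long begin,
			      unsigned long long offs,
			      unsigned long long size,
			      unsigned long long *sp_addr)
{
	if (!begin)
		return false;	/* cea_exception_stacks not initialized yet */

	begin += offs;
	*sp_addr = sp_load_addr(begin + size);
	return true;
}
```

With end = 0x2000, sp_load_addr() gives 0x2000 - 0xa8 + 0x98 = 0x1ff0, the address in the BUG line. With begin == 0 the model returns false and leaves *sp_addr untouched, which is what the early bail-out buys.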