This is for standard/base and all sub-level branches. For explanation, see the bottom of the commit log I append.
Zhe On 4/24/19 3:42 PM, zhe...@windriver.com wrote: > From: "Steven Rostedt (VMware)" <rost...@goodmis.org> > > He Zhe reported a crash by enabling trace events and selecting > "userstacktrace" which will read the stack of userspace for every trace > event recorded. Zhe narrowed it down to: > > c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their > usage") > > With the problem config, I was able to also reproduce the error. I > narrowed it down to just having to do the following: > > # cd /sys/kernel/tracing > # echo 1 > options/userstacktrace > # echo 1 > events/preemptirq/irq_disable/enable > > And sure enough, I triggered a crash. Well, it was systemd crashing > with a bad memory access?? > > systemd-journal[537]: segfault at ed8cb8 ip 00007f7fffc9fef5 sp > 00007ffc4062cb10 error 7 > > And it would crash similarly each time I tried it, but always at a > different place. After spending the day on this, I finally figured it > out. The bug is happening in entry_64.S right after error_entry. > There's two TRACE_IRQS_OFF in that code path, which if I comment out, > the bug goes away. Then it dawned on me that the crash always happens > when systemd does a normal page fault. We had this bug before, and it > was with the exception trace points. > > The issue is that a tracepoint can fault (reading vmalloc or whatever). > And doing a userspace stack trace most definitely will fault. But if we > are coming from a legitimate page fault, the address of that fault (in > the CR2 register) will be lost if we fault before we get to the page > fault handler. That's exactly what is happening. > > To solve this, a TRACE_IRQS_OFF_CR2 (and ON for consistency) was added > that saves the CR2 register. A new trace_hardirqs_off_thunk_cr2 is > created that stores the cr2 register, calls the > trace_hardirqs_off_caller, then on return restores the cr2 register if > it changed, before returning. > > On my tests this fixes the issue. I just want to know if this is a > legitimate fix or if someone can come up with a better fix? > > Note: this also saves the exception context just like the > do_page_fault() function does. > > Note2: This only gets enabled when lockdep or irq tracing is enabled, > which is not recommended for production environments. > > Link: > http://lkml.kernel.org/r/897cf5cf-fc24-8a64-cb28-847f2d2e6...@windriver.com > > Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify > their usage") > Signed-off-by: Steven Rostedt (VMware) <rost...@goodmis.org> > > Link: https://lore.kernel.org/lkml/20190320221534.165ab...@oasis.local.home/ > > This might not be the final solution. But the upstream thread has stopped over > a month and there is unlikely a final solution in the near future. > > Since the diff looks quite clear and does not affect other functions. It > should > be worth adding this initial patch from the maintainer. > > Signed-off-by: He Zhe <zhe...@windriver.com> > --- > arch/x86/entry/common.c | 26 ++++++++++++++++++++++++++ > arch/x86/entry/entry_64.S | 4 ++-- > arch/x86/entry/thunk_64.S | 2 ++ > arch/x86/include/asm/irqflags.h | 4 ++++ > 4 files changed, 34 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c > index 7bc105f..7edffec 100644 > --- a/arch/x86/entry/common.c > +++ b/arch/x86/entry/common.c > @@ -292,6 +292,32 @@ __visible void do_syscall_64(unsigned long nr, struct > pt_regs *regs) > > syscall_return_slowpath(regs); > } > + > +extern void trace_hardirqs_on_caller(unsigned long caller_addr); > +__visible void trace_hardirqs_on_caller_cr2(unsigned long caller_addr) > +{ > + unsigned long address = read_cr2(); /* Get the faulting address */ > + enum ctx_state prev_state; > + > + prev_state = exception_enter(); > + trace_hardirqs_on_caller(caller_addr); > + if (address != read_cr2()) > + write_cr2(address); > + exception_exit(prev_state); > +} > + > +extern void trace_hardirqs_off_caller(unsigned long caller_addr); > +__visible void trace_hardirqs_off_caller_cr2(unsigned long caller_addr) > +{ > + unsigned long address = read_cr2(); /* Get the faulting address */ > + enum ctx_state prev_state; > + > + prev_state = exception_enter(); > + trace_hardirqs_off_caller(caller_addr); > + if (address != read_cr2()) > + write_cr2(address); > + exception_exit(prev_state); > +} > #endif > > #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION) > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S > index 1f0efdb..73ddf24 100644 > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -1248,12 +1248,12 @@ ENTRY(error_entry) > * we fix gsbase, and we should do it before enter_from_user_mode > * (which can take locks). > */ > - TRACE_IRQS_OFF > + TRACE_IRQS_OFF_CR2 > CALL_enter_from_user_mode > ret > > .Lerror_entry_done: > - TRACE_IRQS_OFF > + TRACE_IRQS_OFF_CR2 > ret > > /* > diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S > index be36bf4..1300b53 100644 > --- a/arch/x86/entry/thunk_64.S > +++ b/arch/x86/entry/thunk_64.S > @@ -41,6 +41,8 @@ > #ifdef CONFIG_TRACE_IRQFLAGS > THUNK trace_hardirqs_on_thunk,trace_hardirqs_on_caller,1 > THUNK trace_hardirqs_off_thunk,trace_hardirqs_off_caller,1 > + THUNK trace_hardirqs_on_thunk_cr2,trace_hardirqs_on_caller_cr2,1 > + THUNK trace_hardirqs_off_thunk_cr2,trace_hardirqs_off_caller_cr2,1 > #endif > > #ifdef CONFIG_DEBUG_LOCK_ALLOC > diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h > index 058e40f..dd51174 100644 > --- a/arch/x86/include/asm/irqflags.h > +++ b/arch/x86/include/asm/irqflags.h > @@ -172,9 +172,13 @@ static inline int arch_irqs_disabled(void) > #ifdef CONFIG_TRACE_IRQFLAGS > # define TRACE_IRQS_ON call trace_hardirqs_on_thunk; > # define TRACE_IRQS_OFF call trace_hardirqs_off_thunk; > +# define TRACE_IRQS_ON_CR2 call trace_hardirqs_on_thunk_cr2; > +# define TRACE_IRQS_OFF_CR2 call trace_hardirqs_off_thunk_cr2; > #else > # define TRACE_IRQS_ON > # define TRACE_IRQS_OFF > +# define TRACE_IRQS_ON_CR2 > +# define TRACE_IRQS_OFF_CR2 > #endif > #ifdef CONFIG_DEBUG_LOCK_ALLOC > # ifdef CONFIG_X86_64 -- _______________________________________________ linux-yocto mailing list linux-yocto@yoctoproject.org https://lists.yoctoproject.org/listinfo/linux-yocto