On Sat, 20 Jul 2019, Thomas Gleixner wrote:
> On Fri, 19 Jul 2019, Sean Christopherson wrote:
> > On Tue, Jul 02, 2019 at 01:39:05PM -0400, Steven Rostedt wrote:
> >
> > I'm hitting a similar panic that bisects to commit
> >
> >   a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption")
> >
> > except I'm experiencing death immediately after starting init.
> >
> > Through sheer dumb luck, I tracked (pun intended) this down to forcing
> > context tracking:
> >
> >   CONFIG_CONTEXT_TRACKING=y
> >   CONFIG_CONTEXT_TRACKING_FORCE=y
> >   CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
> >
> > I haven't attempted to debug further and I'll be offline for most of the
> > next few days. Hopefully this is enough to root cause the badness.
> >
> > [    0.680477] Run /sbin/init as init process
> > [    0.682116] init[1]: segfault at 2926a7ef ip 00007f98a49d9c30 sp
> > 00007fffd83e6af0 error 14 in ld-2.23.so[7f98a49d9000+26000]
>
> That's because the call into the context tracking muck clobbers RDX which
> contains the CR2 value on pagefault. So the pagefault resolves to crap and
> kills init.
>
> Brute force fix below. That needs to be conditional on read_cr2 but for now
> it does the job.

But it does it just for the context tracking case. TRACE_IRQS_OFF* will do
the same damage. Fix is not pretty, but ...

Thanks,

	tglx

8<-----------
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -876,6 +876,7 @@ apicinterrupt IRQ_WORK_VECTOR irq_work
 
 	.if \read_cr2
 	GET_CR2_INTO(%rdx);			/* can clobber %rax */
+	pushq	%rdx
 	.endif
 
 	.if \shift_ist != -1
@@ -885,12 +886,20 @@ apicinterrupt IRQ_WORK_VECTOR irq_work
 	.endif
 
 	.if \paranoid == 0
+	.if \read_cr2
+	testb	$3, CS + 8(%rsp)
+	.else
 	testb	$3, CS(%rsp)
+	.endif
 	jz	.Lfrom_kernel_no_context_tracking_\@
 	CALL_enter_from_user_mode
 .Lfrom_kernel_no_context_tracking_\@:
 	.endif
 
+	.if \read_cr2
+	popq	%rdx
+	.endif
+
 	movq	%rsp, %rdi			/* pt_regs pointer */
 
 	.if \has_error_code