On Tue, Sep 4, 2018 at 12:50 PM, Sean Christopherson
<sean.j.christopher...@intel.com> wrote:
> Cc-ing Jann and Andy.
>
> On Tue, Aug 07, 2018 at 10:29:20AM -0700, Sean Christopherson wrote:
>> Kernel addresses are always mapped with _PAGE_USER=0, i.e. PKRU
>> isn't enforced, and so we should never see X86_PF_PK set on a
>> kernel address fault.  WARN once to capture the issue in case we
>> somehow don't die, e.g. the access originated in userspace.
>>
>> Remove a similar check and its comment from spurious_fault_check().
>> The intent of the comment (and later code[1]) was simply to document
>> that spurious faults due to protection keys should be impossible, but
>> that's irrelevant and a bit of a red herring since we should never
>> get a protection keys fault on a kernel address regardless of the
>> kernel's TLB flushing behavior.
>>
>> [1] http://lists-archives.com/linux-kernel/28407455-x86-pkeys-new-page-fault-error-code-bit-pf_pk.html
>>
>> Signed-off-by: Sean Christopherson <sean.j.christopher...@intel.com>
>> Cc: Dave Hansen <dave.han...@intel.com>
>> ---
>> There's no indication that this condition has ever been encountered.
>> I came across the code in spurious_fault_check() and was confused as
>> to why we would unconditionally treat a protection keys fault as
>> spurious when the comment explicitly stated that such a case should
>> be impossible.
>>
>> Dave Hansen suggested adding a WARN_ON_ONCE in spurious_fault_check(),
>> but it seemed more appropriate to freak out on any protection keys
>> fault on a kernel address since that would imply a hardware issue or
>> kernel bug.  I omitted a Suggested-by since this isn't necessarily
>> what Dave had in mind.
>>
>>  arch/x86/mm/fault.c | 16 ++++++++++------
>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index 2aafa6ab6103..f19a55972136 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -1040,12 +1040,6 @@ static int spurious_fault_check(unsigned long error_code, pte_t *pte)
>>
>>  	if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
>>  		return 0;
>> -	/*
>> -	 * Note: We do not do lazy flushing on protection key
>> -	 * changes, so no spurious fault will ever set X86_PF_PK.
>> -	 */
>> -	if ((error_code & X86_PF_PK))
>> -		return 1;
>>
>>  	return 1;
>>  }
>> @@ -1241,6 +1235,14 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>>  	 * protection error (error_code & 9) == 0.
>>  	 */
>>  	if (unlikely(fault_in_kernel_space(address))) {
>> +		/*
>> +		 * We should never encounter a protection keys fault on a
>> +		 * kernel address as kernel addresses are always mapped with
>> +		 * _PAGE_USER=0, i.e. PKRU isn't enforced.
>> +		 */
>> +		if (WARN_ON_ONCE(error_code & X86_PF_PK))
>> +			goto bad_kernel_address;
>> +
>>  		if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
>>  			if (vmalloc_fault(address) >= 0)
>>  				return;
>> @@ -1253,6 +1255,8 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>>  		/* kprobes don't want to hook the spurious faults: */
>>  		if (kprobes_fault(regs))
>>  			return;
>> +
>> +bad_kernel_address:
>>  		/*
>>  		 * Don't take the mm semaphore here. If we fixup a prefetch
>>  		 * fault we could otherwise deadlock:
>> --
>> 2.18.0
>>
I have no objection to this patch.

Dave, why did you think that we could get a PK fault on the vsyscall
page, even on kernels that still marked it executable?  Sure, an
instruction in the vsyscall page could take a PK fault, but CR2
wouldn't point to the vsyscall page, right?
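[Editor's note: for readers who want to see what an X86_PF_PK fault
actually looks like, the sketch below is illustrative only and not part
of the patch.  It assumes pkeys-capable hardware/kernel and glibc >= 2.27
for the pkey_alloc()/pkey_mprotect() wrappers.  The kernel reports the
fault as SIGSEGV with si_code == SEGV_PKUERR, and si_addr (derived from
CR2) is the user *data* address that was touched -- which is the point
above about CR2 never pointing at the vsyscall page, let alone a kernel
address.]

/*
 * Illustrative only: trigger a protection-keys fault on a user page.
 * Assumes pkeys support (CPU, kernel, and glibc >= 2.27).
 */
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static void handler(int sig, siginfo_t *si, void *ucontext)
{
	/* SEGV_PKUERR <=> X86_PF_PK was set in the fault error code. */
	printf("SIGSEGV: si_code=%d (SEGV_PKUERR=%d), si_addr=%p\n",
	       si->si_code, SEGV_PKUERR, si->si_addr);
	_exit(0);	/* returning would simply re-fault */
}

int main(void)
{
	struct sigaction sa = {
		.sa_sigaction	= handler,
		.sa_flags	= SA_SIGINFO,
	};
	long page = sysconf(_SC_PAGESIZE);
	char *buf;
	int pkey;

	sigaction(SIGSEGV, &sa, NULL);

	buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	if (pkey < 0) {
		perror("pkey_alloc");	/* no pkeys on this system */
		return 1;
	}
	if (pkey_mprotect(buf, page, PROT_READ | PROT_WRITE, pkey))
		return 1;

	/*
	 * The PTE permits the read, but PKRU denies it: CR2 (and thus
	 * si_addr) will be 'buf', a user address, never a kernel one.
	 */
	return buf[0];
}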