Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Thu, Jan 04, 2018 at 01:28:59PM +0100, Thomas Gleixner wrote:
> On Wed, 3 Jan 2018, Andy Lutomirski wrote:
> > Our memory map code is utter shite. This kind of bug should not be
> > possible without a giant warning at boot that something is screwed up.
>
> You're right it's utter shite and the KASLR folks who added this insanity
> of making vaddr_end depend on a gazillion of config options and not
> documenting it in mm.txt or elsewhere where it's obvious to find should
> really sit back and think hard about their half-baked 'security' features.
>
> Just look at the insanity of comment above the vaddr_end ifdef maze.
>
> Benjamin, can you test the patch below please?

Seems to work!

Thanks,
--Benjamin Gilbert
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Thu, 4 Jan 2018, Andy Lutomirski wrote:
> On Thu, Jan 4, 2018 at 4:28 AM, Thomas Gleixner wrote:
> > --- a/arch/x86/include/asm/pgtable_64_types.h
> > +++ b/arch/x86/include/asm/pgtable_64_types.h
> > @@ -88,7 +88,7 @@ typedef struct { pteval_t pte; } pte_t;
> >  # define VMALLOC_SIZE_TB _AC(32, UL)
> >  # define __VMALLOC_BASE _AC(0xc900, UL)
> >  # define __VMEMMAP_BASE _AC(0xea00, UL)
> > -# define LDT_PGD_ENTRY _AC(-4, UL)
> > +# define LDT_PGD_ENTRY _AC(-3, UL)
> >  # define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
> >  #endif
>
> If you actually change the memory map order, you need to change the
> shadow copy in mm/dump_pagetables.c, too. I have a draft patch to
> just sort the damn list, but that's not ready yet.

Yes, I forgot that in the first attempt. Noticed myself when dumping it, but
that should be irrelevant to figure out whether it fixes the problem at
hand.
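
For readers following the arithmetic in the quoted hunk: LDT_BASE_ADDR is derived by shifting a negative PGD index left by PGDIR_SHIFT. The sketch below is plain Python, not kernel code. It assumes 4-level paging with PGDIR_SHIFT = 39 (consistent with the "=39 bits" entries in the mm.txt hunk elsewhere in this thread) and emulates 64-bit wraparound with a mask. The helper name pgd_entry_base is made up for illustration.

```python
# Illustrative arithmetic only; this is not kernel code.
MASK64 = (1 << 64) - 1   # emulate 64-bit unsigned long wraparound
PGDIR_SHIFT = 39         # assumption: 4-level paging, 512 GiB per PGD entry


def pgd_entry_base(entry: int, shift: int = PGDIR_SHIFT) -> int:
    """Hypothetical helper mirroring (LDT_PGD_ENTRY << PGDIR_SHIFT)."""
    return (entry << shift) & MASK64


old_ldt_base = pgd_entry_base(-4)   # LDT_PGD_ENTRY before the patch
new_ldt_base = pgd_entry_base(-3)   # LDT_PGD_ENTRY after the patch

print(hex(old_ldt_base))                           # 0xfffffe0000000000
print(hex(new_ldt_base))                           # 0xfffffe8000000000
print((new_ldt_base - old_ldt_base) >> 30, "GiB")  # 512 GiB
```

Under these assumptions the patch moves the LDT remap up by exactly one PGD slot, which is why any table that mirrors the memory map order has to be updated along with it.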
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Thu, Jan 4, 2018 at 4:28 AM, Thomas Gleixner wrote:
> On Wed, 3 Jan 2018, Andy Lutomirski wrote:
>> On Wed, Jan 3, 2018 at 8:35 PM, Benjamin Gilbert wrote:
>> > On Wed, Jan 03, 2018 at 04:37:53PM -0800, Andy Lutomirski wrote:
>> >> Maybe try rebuilding a bad kernel with free_ldt_pgtables() modified
>> >> to do nothing, and then read /sys/kernel/debug/page_tables/current (or
>> >> current_kernel, or whatever it's called). The problem may be obvious.
>> >
>> > current_kernel attached. I have not seen any crashes with
>> > free_ldt_pgtables() stubbed out.
>>
>> I haven't reproduced it, but I think I see what's wrong. KASLR sets
>> vaddr_end to a totally bogus value. It should be no larger than
>> LDT_BASE_ADDR. I suspect that your vmemmap is getting randomized into
>> the LDT range. If it weren't for that, it could just as easily land
>> in the cpu_entry_area range. This will need fixing in all versions
>> that aren't still called KAISER.
>>
>> Our memory map code is utter shite. This kind of bug should not be
>> possible without a giant warning at boot that something is screwed up.
>
> You're right it's utter shite and the KASLR folks who added this insanity
> of making vaddr_end depend on a gazillion of config options and not
> documenting it in mm.txt or elsewhere where it's obvious to find should
> really sit back and think hard about their half-baked 'security' features.
>
> Just look at the insanity of comment above the vaddr_end ifdef maze.
>
> Benjamin, can you test the patch below please?
>
> Thanks,
>
>         tglx
>
> 8<--
> --- a/Documentation/x86/x86_64/mm.txt
> +++ b/Documentation/x86/x86_64/mm.txt
> @@ -12,8 +12,9 @@ ea00 - eaff (=40
>  ... unused hole ...
>  ec00 - fbff (=44 bits) kasan shadow memory (16TB)
>  ... unused hole ...
> -fe00 - fe7f (=39 bits) LDT remap for PTI
> -fe80 - feff (=39 bits) cpu_entry_area mapping
> + vaddr_end for KASLR
> +fe00 - fe7f (=39 bits) cpu_entry_area mapping
> +fe80 - feff (=39 bits) LDT remap for PTI
>  ff00 - ff7f (=39 bits) %esp fixup stacks
>  ... unused hole ...
>  ffef - fffe (=64 GB) EFI region mapping space
> @@ -37,7 +38,9 @@ ffd4 - ffd5 (=49
>  ... unused hole ...
>  ffdf - fc00 (=53 bits) kasan shadow memory (8PB)
>  ... unused hole ...
> -fe80 - feff (=39 bits) cpu_entry_area mapping
> + vaddr_end for KASLR
> +fe00 - fe7f (=39 bits) cpu_entry_area mapping
> +... unused hole ...
>  ff00 - ff7f (=39 bits) %esp fixup stacks
>  ... unused hole ...
>  ffef - fffe (=64 GB) EFI region mapping space
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -88,7 +88,7 @@ typedef struct { pteval_t pte; } pte_t;
>  # define VMALLOC_SIZE_TB _AC(32, UL)
>  # define __VMALLOC_BASE _AC(0xc900, UL)
>  # define __VMEMMAP_BASE _AC(0xea00, UL)
> -# define LDT_PGD_ENTRY _AC(-4, UL)
> +# define LDT_PGD_ENTRY _AC(-3, UL)
>  # define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
>  #endif

If you actually change the memory map order, you need to change the
shadow copy in mm/dump_pagetables.c, too. I have a draft patch to
just sort the damn list, but that's not ready yet.
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, 3 Jan 2018, Andy Lutomirski wrote:
> On Wed, Jan 3, 2018 at 8:35 PM, Benjamin Gilbert wrote:
> > On Wed, Jan 03, 2018 at 04:37:53PM -0800, Andy Lutomirski wrote:
> >> Maybe try rebuilding a bad kernel with free_ldt_pgtables() modified
> >> to do nothing, and then read /sys/kernel/debug/page_tables/current (or
> >> current_kernel, or whatever it's called). The problem may be obvious.
> >
> > current_kernel attached. I have not seen any crashes with
> > free_ldt_pgtables() stubbed out.
>
> I haven't reproduced it, but I think I see what's wrong. KASLR sets
> vaddr_end to a totally bogus value. It should be no larger than
> LDT_BASE_ADDR. I suspect that your vmemmap is getting randomized into
> the LDT range. If it weren't for that, it could just as easily land
> in the cpu_entry_area range. This will need fixing in all versions
> that aren't still called KAISER.
>
> Our memory map code is utter shite. This kind of bug should not be
> possible without a giant warning at boot that something is screwed up.

You're right it's utter shite and the KASLR folks who added this insanity
of making vaddr_end depend on a gazillion of config options and not
documenting it in mm.txt or elsewhere where it's obvious to find should
really sit back and think hard about their half-baked 'security' features.

Just look at the insanity of comment above the vaddr_end ifdef maze.

Benjamin, can you test the patch below please?

Thanks,

        tglx

8<--
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -12,8 +12,9 @@ ea00 - eaff (=40
 ... unused hole ...
 ec00 - fbff (=44 bits) kasan shadow memory (16TB)
 ... unused hole ...
-fe00 - fe7f (=39 bits) LDT remap for PTI
-fe80 - feff (=39 bits) cpu_entry_area mapping
+ vaddr_end for KASLR
+fe00 - fe7f (=39 bits) cpu_entry_area mapping
+fe80 - feff (=39 bits) LDT remap for PTI
 ff00 - ff7f (=39 bits) %esp fixup stacks
 ... unused hole ...
 ffef - fffe (=64 GB) EFI region mapping space
@@ -37,7 +38,9 @@ ffd4 - ffd5 (=49
 ... unused hole ...
 ffdf - fc00 (=53 bits) kasan shadow memory (8PB)
 ... unused hole ...
-fe80 - feff (=39 bits) cpu_entry_area mapping
+ vaddr_end for KASLR
+fe00 - fe7f (=39 bits) cpu_entry_area mapping
+... unused hole ...
 ff00 - ff7f (=39 bits) %esp fixup stacks
 ... unused hole ...
 ffef - fffe (=64 GB) EFI region mapping space
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -88,7 +88,7 @@ typedef struct { pteval_t pte; } pte_t;
 # define VMALLOC_SIZE_TB _AC(32, UL)
 # define __VMALLOC_BASE _AC(0xc900, UL)
 # define __VMEMMAP_BASE _AC(0xea00, UL)
-# define LDT_PGD_ENTRY _AC(-4, UL)
+# define LDT_PGD_ENTRY _AC(-3, UL)
 # define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
 #endif

@@ -110,7 +110,7 @@ typedef struct { pteval_t pte; } pte_t;
 #define ESPFIX_PGD_ENTRY _AC(-2, UL)
 #define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT)

-#define CPU_ENTRY_AREA_PGD _AC(-3, UL)
+#define CPU_ENTRY_AREA_PGD _AC(-4, UL)
 #define CPU_ENTRY_AREA_BASE (CPU_ENTRY_AREA_PGD << P4D_SHIFT)

 #define EFI_VA_START ( -4 * (_AC(1, UL) << 30))
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -34,25 +34,14 @@
 #define TB_SHIFT 40

 /*
- * Virtual address start and end range for randomization. The end changes base
- * on configuration to have the highest amount of space for randomization.
- * It increases the possible random position for each randomized region.
+ * Virtual address start and end range for randomization.
  *
- * You need to add an if/def entry if you introduce a new memory region
- * compatible with KASLR. Your entry must be in logical order with memory
- * layout. For example, ESPFIX is before EFI because its virtual address is
- * before. You also need to add a BUILD_BUG_ON() in kernel_randomize_memory() to
- * ensure that this order is correct and won't be changed.
+ * The end address could depend on more configuration options to make the
+ * highest amount of space for randomization available, but that's too hard
+ * to keep straight.
  */
 static const unsigned long vaddr_start = __PAGE_OFFSET_BASE;
-
-#if defined(CONFIG_X86_ESPFIX64)
-static const unsigned long vaddr_end = ESPFIX_BASE_ADDR;
-#elif defined(CONFIG_EFI)
-static const unsigned long vaddr_end = EFI_VA_END;
-#else
-static const unsigned long vaddr_end = __START_KERNEL_map;
-#endif
+static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;

 /* Default values */
 unsigned
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Thu, Jan 04, 2018 at 08:20:31AM +0100, Ingo Molnar wrote:
>
> * Greg Kroah-Hartman wrote:
>
> > On Thu, Jan 04, 2018 at 08:14:21AM +0100, Ingo Molnar wrote:
> > > - (or it's something I missed to consider)
> >
> > It was an operator error, the issue is also on 4.15-rc6, see another
> > email in this thread :)
>
> ah, ok :-)
>
> Nevertheless it made sense to go through all the backport candidate commits
> again, nothing stuck out as a must-have for -stable! ;-)

Yes, thanks for doing that, much appreciated, there's been too many patches
flying around and I am always worried I have missed something.

thanks,

greg k-h
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
* Ingo Molnar wrote:

> These will cherry-pick cleanly, so it would be nice to test them on top of
> the -stable kernel that fails:
>
>   for N in 450cbdd0125c 4d2dc2cc766c 1e0f25dbf246 be62a3204406 0c3292ca8025 9d0b62328d34; do git cherry-pick $N; done
>
> if this brute-force approach resolves the problem then we have a shorter
> list of fixes to look at.

As per Greg's followup this should not matter - but nevertheless for
completeness these commits also need f54bb2ec02c83 as a dependency, so the
full list is:

  for N in 450cbdd0125c 4d2dc2cc766c 1e0f25dbf246 be62a3204406 0c3292ca8025 9d0b62328d34 f54bb2ec02c83; do git cherry-pick $N; done

Thanks,

        Ingo
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
* Greg Kroah-Hartman wrote:

> On Thu, Jan 04, 2018 at 08:14:21AM +0100, Ingo Molnar wrote:
> > - (or it's something I missed to consider)
>
> It was an operator error, the issue is also on 4.15-rc6, see another
> email in this thread :)

ah, ok :-)

Nevertheless it made sense to go through all the backport candidate commits
again, nothing stuck out as a must-have for -stable! ;-)

Thanks,

        Ingo
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Thu, Jan 04, 2018 at 08:14:21AM +0100, Ingo Molnar wrote:
> - (or it's something I missed to consider)

It was an operator error, the issue is also on 4.15-rc6, see another
email in this thread :)
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
* Thomas Gleixner wrote:

> On Wed, 3 Jan 2018, Benjamin Gilbert wrote:
>
> > On Wed, Jan 03, 2018 at 10:20:16AM +0100, Greg Kroah-Hartman wrote:
> > > Ick, not good, any chance you can test 4.15-rc6 to verify that the issue
> > > is also there (or not)?
> >
> > I haven't been able to reproduce this on 4.15-rc6.
>
> Hmm. So we need to scrutinize the subtle differences between 4.15-rc6 and
> 4.14.11

So here's a list of candidate 'missing commits':

triton:~/tip> git log --oneline --no-merges WIP.x86/pti..linus arch/x86 | grep -viE 'apic|irq|vector|probe|kvm|timer|rdt|crypto|platform|tsc|insn|xen|mpx|umip|efi|build|parav|SEV|kmemch|power|stacktrace|unwind|kmmio|dma|boot|PCI|resource|init|virt|kexec|unused|perf|5-level'

10a7e9d84915: Do not hash userspace addresses in fault handlers
f5b5fab1780c: x86/decoder: Fix and update the opcodes map
88edb57d1e0b: x86/vdso: Change time() prototype to match __vdso_time()
d553d03f7057: x86: Fix Sparse warnings about non-static functions
f4e9b7af0cd5: x86/microcode/AMD: Add support for fam17h microcode loading
e3811a3f74bd: x86/cpufeatures: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
328b4ed93b69: x86: don't hash faulting address in oops printout
b562c171cf01: locking/refcounts: Do not force refcount_t usage as GPL-only export
1501899a898d: mm: fix device-dax pud write-faults triggered by get_user_pages()
55d2d0ad2fb4: x86/idt: Load idt early in start_secondary
9d0b62328d34: x86/tlb: Disable interrupts when changing CR4
0c3292ca8025: x86/tlb: Refactor CR4 setting and shadow write
12a78d43de76: x86/decoder: Add new TEST instruction pattern
30bb9811856f: x86/topology: Avoid wasting 128k for package id array
252714155f04: x86/acpi: Handle SCI interrupts above legacy space gracefully
be62a3204406: x86/mm: Limit mmap() of /dev/mem to valid physical addresses
1e0f25dbf246: x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border
fcdaf842bd8f: mm, sparse: do not swamp log with huge vmemmap allocation failures
353b1e7b5859: x86/mm: set fields in deferred pages
7d5905dc14a8: x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
4d2dc2cc766c: fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
b29c6ef7bb12: x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu()
450cbdd0125c: locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
9f08890ab906: x86/pvclock: add setter for pvclock_pvti_cpu0_va
c5e260890d5f: x86/mm: Remove unnecessary TLB flush for SME in-place encryption
4a75aeacda3c: ACPI / APEI: Remove arch_apei_flush_tlb_one()
e4dca7b7aa08: treewide: Fix function prototypes for module_param_call()
7ed4325a44ea: Drivers: hv: vmbus: Make panic reporting to be more useful
6aa7de059173: locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
506458efaf15: locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
0cfe5b5fc027: x86: Use ARRAY_SIZE
c1bd743e54cd: arch/x86: remove redundant null checks before kmem_cache_destroy
a4c1887d4c14: locking/arch: Remove dummy arch_{read,spin,write}_lock_flags() implementations
0160fb177d48: locking/arch: Remove dummy arch_{read,spin,write}_relax() implementations
19c60923010b: locking/arch, x86: Add __down_read_killable()
39208aa7ecb7: locking/refcounts, x86/asm: Enable CONFIG_ARCH_HAS_REFCOUNT
564c9cc84e2a: locking/refcounts, x86/asm: Use unique .text section for refcount exceptions
30c23f29d2d5: locking/x86: Use named operands in rwsem.h

Note the exclusion regex pattern which might be overly aggressive.
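
To make the "overly aggressive" caveat concrete, here is a small Python sketch of what the grep -viE filter does to a commit line. The alternation is copied from the command above; re.IGNORECASE stands in for -i, and a line counts as excluded when grep -v would drop it. The helper name is_excluded and the two "hypothetical" subject lines are invented for illustration and are not real commits.

```python
import re

# Alternation copied from the grep -viE invocation; IGNORECASE emulates -i.
EXCLUDE = re.compile(
    r"apic|irq|vector|probe|kvm|timer|rdt|crypto|platform|tsc|insn|xen|mpx|"
    r"umip|efi|build|parav|SEV|kmemch|power|stacktrace|unwind|kmmio|dma|boot|"
    r"PCI|resource|init|virt|kexec|unused|perf|5-level",
    re.IGNORECASE,
)


def is_excluded(line: str) -> bool:
    """True if `grep -viE <pattern>` would filter this line out."""
    return EXCLUDE.search(line) is not None


# A line that appears in the list above, so it survives the filter.
kept = "9d0b62328d34: x86/tlb: Disable interrupts when changing CR4"

# Hypothetical subjects: plain substring matching drops them by accident,
# because "efi" occurs in "Redefine" and "sev" occurs in "several".
hypothetical_a = "x86/mm: Redefine the LDT mapping range"
hypothetical_b = "x86/entry: Clean up several comments"

for line in (kept, hypothetical_a, hypothetical_b):
    print(is_excluded(line), line)
```

Because the pattern matches substrings anywhere in the line, short tokens such as efi, SEV, or init can hide unrelated commits, so a candidate list produced this way is a starting point rather than a complete set.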
Taking out the commits that should have no real effect leads to this list:

f4e9b7af0cd5: x86/microcode/AMD: Add support for fam17h microcode loading
e3811a3f74bd: x86/cpufeatures: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
1501899a898d: mm: fix device-dax pud write-faults triggered by get_user_pages()
55d2d0ad2fb4: x86/idt: Load idt early in start_secondary
9d0b62328d34: x86/tlb: Disable interrupts when changing CR4
0c3292ca8025: x86/tlb: Refactor CR4 setting and shadow write
252714155f04: x86/acpi: Handle SCI interrupts above legacy space gracefully
be62a3204406: x86/mm: Limit mmap() of /dev/mem to valid physical addresses
1e0f25dbf246: x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border
fcdaf842bd8f: mm, sparse: do not swamp log with huge vmemmap allocation failures
353b1e7b5859: x86/mm: set fields in deferred pages
7d5905dc14a8: x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
4d2dc2cc766c: fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
b29c6ef7bb12: x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu()
450cbdd0125c: locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
6aa7de059173: locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
506458efaf15: locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
a4c1887d4c14: locking/arch: Remove dummy arch_{r
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, Jan 3, 2018 at 8:35 PM, Benjamin Gilbert wrote:
> On Wed, Jan 03, 2018 at 04:37:53PM -0800, Andy Lutomirski wrote:
>> Maybe try rebuilding a bad kernel with free_ldt_pgtables() modified
>> to do nothing, and then read /sys/kernel/debug/page_tables/current (or
>> current_kernel, or whatever it's called). The problem may be obvious.
>
> current_kernel attached. I have not seen any crashes with
> free_ldt_pgtables() stubbed out.

I haven't reproduced it, but I think I see what's wrong. KASLR sets
vaddr_end to a totally bogus value. It should be no larger than
LDT_BASE_ADDR. I suspect that your vmemmap is getting randomized into
the LDT range. If it weren't for that, it could just as easily land
in the cpu_entry_area range. This will need fixing in all versions
that aren't still called KAISER.

Our memory map code is utter shite. This kind of bug should not be
possible without a giant warning at boot that something is screwed up.
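
The overlap described in this message can be illustrated with a short Python sketch. It is not kernel code and rests on stated assumptions: 4-level paging with a PGD shift of 39; the pre-fix layout with the LDT remap at PGD entry -4; a pre-fix vaddr_end equal to ESPFIX_BASE_ADDR at PGD entry -2 (the CONFIG_X86_ESPFIX64 case); and a vaddr_start of 0xffff880000000000, which is assumed here as the value of __PAGE_OFFSET_BASE. The helper names pgd_base and regions_overlap are invented for illustration.

```python
# Illustrative only: checks whether the KASLR window can reach the LDT slot.
MASK64 = (1 << 64) - 1
PGD_SHIFT = 39                      # assumption: 4-level paging
PGD_SIZE = 1 << PGD_SHIFT           # 512 GiB per PGD entry


def pgd_base(entry: int) -> int:
    """Hypothetical helper: base address of a (negative) PGD index."""
    return (entry << PGD_SHIFT) & MASK64


def regions_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open interval overlap test: [a_start, a_end) vs [b_start, b_end)."""
    return a_start < b_end and b_start < a_end


VADDR_START = 0xFFFF880000000000    # assumption: __PAGE_OFFSET_BASE
LDT_BASE_ADDR = pgd_base(-4)        # assumption: pre-fix LDT slot
LDT_END = LDT_BASE_ADDR + PGD_SIZE
ESPFIX_BASE_ADDR = pgd_base(-2)     # assumption: pre-fix vaddr_end

# Pre-fix window: randomized regions may land inside the LDT slot.
print(regions_overlap(VADDR_START, ESPFIX_BASE_ADDR, LDT_BASE_ADDR, LDT_END))  # True

# Capping vaddr_end at LDT_BASE_ADDR keeps the window clear of it.
print(regions_overlap(VADDR_START, LDT_BASE_ADDR, LDT_BASE_ADDR, LDT_END))     # False
```

The same check applied to the cpu_entry_area slot at PGD entry -3 gives the same answer for the pre-fix window, which matches the remark that the randomized region could just as easily have landed there.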
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, Jan 03, 2018 at 05:37:42PM -0800, Benjamin Gilbert wrote:
> I was caught by the fact that 4.14.11 has PAGE_TABLE_ISOLATION default y
> but 4.15-rc6 doesn't. Retesting.

It turns out that 4.15-rc6 has the same problem as 4.14.11.

--Benjamin Gilbert
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, Jan 03, 2018 at 04:37:53PM -0800, Andy Lutomirski wrote:
> Maybe try rebuilding a bad kernel with free_ldt_pgtables() modified
> to do nothing, and then read /sys/kernel/debug/page_tables/current (or
> current_kernel, or whatever it's called). The problem may be obvious.

current_kernel attached. I have not seen any crashes with
free_ldt_pgtables() stubbed out.

--Benjamin Gilbert

Attachment: current_kernel.gz (application/gzip)
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, Jan 03, 2018 at 04:33:03PM -0800, Benjamin Gilbert wrote:
> I haven't been able to reproduce this on 4.15-rc6.

This is bad data. I was caught by the fact that 4.14.11 has
PAGE_TABLE_ISOLATION default y but 4.15-rc6 doesn't. Retesting.

--Benjamin Gilbert
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, Jan 03, 2018 at 04:27:04PM -0800, Andy Lutomirski wrote:
> How much memory does the affected system have? It sounds like something
> is mapped in the LDT region and is getting corrupted because the LDT code
> expects to own that region.

We've seen this on systems from 1 to 7 GB.

--Benjamin Gilbert
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, 3 Jan 2018, Benjamin Gilbert wrote:

> On Wed, Jan 03, 2018 at 10:20:16AM +0100, Greg Kroah-Hartman wrote:
> > Ick, not good, any chance you can test 4.15-rc6 to verify that the issue
> > is also there (or not)?
>
> I haven't been able to reproduce this on 4.15-rc6.

Hmm. So we need to scrutinize the subtle differences between 4.15-rc6 and
4.14.11
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
> On Jan 3, 2018, at 4:33 PM, Benjamin Gilbert wrote:
>
>> On Wed, Jan 03, 2018 at 10:20:16AM +0100, Greg Kroah-Hartman wrote:
>> Ick, not good, any chance you can test 4.15-rc6 to verify that the issue
>> is also there (or not)?
>
> I haven't been able to reproduce this on 4.15-rc6.

Ah. Maybe try rebuilding a bad kernel with free_ldt_pgtables() modified to
do nothing, and then read /sys/kernel/debug/page_tables/current (or
current_kernel, or whatever it's called). The problem may be obvious.

>
> --Benjamin Gilbert
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, Jan 03, 2018 at 10:20:16AM +0100, Greg Kroah-Hartman wrote:
> Ick, not good, any chance you can test 4.15-rc6 to verify that the issue
> is also there (or not)?

I haven't been able to reproduce this on 4.15-rc6.

--Benjamin Gilbert
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
> On Jan 3, 2018, at 2:58 PM, Thomas Gleixner wrote:
>
>> On Wed, 3 Jan 2018, Thomas Gleixner wrote:
>>
>>> On Wed, 3 Jan 2018, Benjamin Gilbert wrote:
>>>> On Wed, Jan 03, 2018 at 11:34:46PM +0100, Thomas Gleixner wrote:
>>>>> Can you please send me your .config and a full dmesg ?
>>>
>>> I've attached a serial log from a local QEMU. I can rerun with a higher
>>> loglevel if need be.
>>
>> Thanks!
>>
>> Cc'ing Andy who might have an idea and he's probably more away than I
>
> s/away/awake/ just to demonstrate the state I'm in ...
>
>> am. Will have a look tomorrow if Andy does not beat me to it.

How much memory does the affected system have? It sounds like something is
mapped in the LDT region and is getting corrupted because the LDT code
expects to own that region.

I got almost exactly this failure in an earlier version of the code when I
typoed the LDT base address macro.

I'll try to reproduce.

>>
>> Thanks,
>>
>>    tglx
>>
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, 3 Jan 2018, Andy Lutomirski wrote:

> > On Jan 3, 2018, at 2:58 PM, Thomas Gleixner wrote:
> >> On Wed, 3 Jan 2018, Thomas Gleixner wrote:
> >>
> >>> On Wed, 3 Jan 2018, Benjamin Gilbert wrote:
> >>>> On Wed, Jan 03, 2018 at 11:34:46PM +0100, Thomas Gleixner wrote:
> >>>>> Can you please send me your .config and a full dmesg ?
> >>>
> >>> I've attached a serial log from a local QEMU. I can rerun with a higher
> >>> loglevel if need be.
> >>
> >> Thanks!
> >>
> >> Cc'ing Andy who might have an idea and he's probably more away than I
> >
> > s/away/awake/ just to demonstrate the state I'm in ...
> >
> >> am. Will have a look tomorrow if Andy does not beat me to it.
>
> Can you forward me more of the thread?

On the way.
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
> On Jan 3, 2018, at 2:58 PM, Thomas Gleixner wrote:
>
>> On Wed, 3 Jan 2018, Thomas Gleixner wrote:
>>
>>> On Wed, 3 Jan 2018, Benjamin Gilbert wrote:
>>>> On Wed, Jan 03, 2018 at 11:34:46PM +0100, Thomas Gleixner wrote:
>>>>> Can you please send me your .config and a full dmesg ?
>>>
>>> I've attached a serial log from a local QEMU. I can rerun with a higher
>>> loglevel if need be.
>>
>> Thanks!
>>
>> Cc'ing Andy who might have an idea and he's probably more away than I
>
> s/away/awake/ just to demonstrate the state I'm in ...
>
>> am. Will have a look tomorrow if Andy does not beat me to it.

Can you forward me more of the thread?

>>
>> Thanks,
>>
>>    tglx
>>
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, 3 Jan 2018, Thomas Gleixner wrote:

> On Wed, 3 Jan 2018, Benjamin Gilbert wrote:
> > On Wed, Jan 03, 2018 at 11:34:46PM +0100, Thomas Gleixner wrote:
> > > Can you please send me your .config and a full dmesg ?
> >
> > I've attached a serial log from a local QEMU. I can rerun with a higher
> > loglevel if need be.
>
> Thanks!
>
> Cc'ing Andy who might have an idea and he's probably more away than I

s/away/awake/ just to demonstrate the state I'm in ...

> am. Will have a look tomorrow if Andy does not beat me to it.
>
> Thanks,
>
>         tglx
>
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, 3 Jan 2018, Benjamin Gilbert wrote:
> On Wed, Jan 03, 2018 at 11:34:46PM +0100, Thomas Gleixner wrote:
> > Can you please send me your .config and a full dmesg ?
>
> I've attached a serial log from a local QEMU. I can rerun with a higher
> loglevel if need be.

Thanks!

Cc'ing Andy who might have an idea and he's probably more away than I
am. Will have a look tomorrow if Andy does not beat me to it.

Thanks,

        tglx
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, Jan 03, 2018 at 11:34:46PM +0100, Thomas Gleixner wrote:
> Can you please send me your .config and a full dmesg ?

I've attached a serial log from a local QEMU. I can rerun with a higher
loglevel if need be.

--Benjamin Gilbert

Attachment: config-4.14.11.gz (application/gzip)
Attachment: console.txt.gz (application/gzip)
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, 3 Jan 2018, Benjamin Gilbert wrote:
> On Wed, Jan 03, 2018 at 04:48:33PM +0100, Ingo Molnar wrote:
> > first please test the latest WIP.x86/pti branch which has a couple of fixes.
>
> I'm still seeing the problem with that branch (3ffdeb1a02be, plus a couple
> of local patches which shouldn't affect the resulting binary).

Can you please send me your .config and a full dmesg ?

Thanks,

        tglx
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, Jan 03, 2018 at 04:48:33PM +0100, Ingo Molnar wrote:
> first please test the latest WIP.x86/pti branch which has a couple of fixes.

I'm still seeing the problem with that branch (3ffdeb1a02be, plus a couple
of local patches which shouldn't affect the resulting binary).

--Benjamin Gilbert
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
* Greg Kroah-Hartman wrote:

> On Wed, Jan 03, 2018 at 12:46:00AM -0800, Benjamin Gilbert wrote:
> > [resending with less web]
>
> (adding lkml and x86 developers)
>
> > Hi all,
> >
> > In our regression tests on kernel 4.14.11, we're occasionally seeing a run
> > of "bad pmd" messages during boot, followed by a "BUG: unable to handle
> > kernel paging request". This happens on no more than a couple percent of
> > boots, but we've seen it on AWS HVM, GCE, Oracle Cloud VMs, and local QEMU
> > instances. It always happens immediately after "Loading compiled-in X.509
> > certificates". I can't reproduce it on 4.14.10, nor, so far, on 4.14.11
> > with pti=off. Here's a sample backtrace:

A few other things to check:

first please test the latest WIP.x86/pti branch which has a couple of fixes.

In a -stable kernel tree you should be able to do:

  git pull --no-tags git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/pti

in particular this recent fix from a couple of hours ago might make a
difference:

  52994c256df3: x86/pti: Make sure the user/kernel PTEs match

Note that this commit:

  694d99d40972: x86/cpu, x86/pti: Do not enable PTI on AMD processors

disables PTI on AMD CPUs - so if you'd like to test it more broadly on all
CPUs then you'll need to add "pti=on" to your boot commandline.

Thanks,

        Ingo
Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
On Wed, Jan 03, 2018 at 12:46:00AM -0800, Benjamin Gilbert wrote:
> [resending with less web]

(adding lkml and x86 developers)

> Hi all,
>
> In our regression tests on kernel 4.14.11, we're occasionally seeing a run
> of "bad pmd" messages during boot, followed by a "BUG: unable to handle
> kernel paging request". This happens on no more than a couple percent of
> boots, but we've seen it on AWS HVM, GCE, Oracle Cloud VMs, and local QEMU
> instances. It always happens immediately after "Loading compiled-in X.509
> certificates". I can't reproduce it on 4.14.10, nor, so far, on 4.14.11
> with pti=off. Here's a sample backtrace:
>
> [    4.762964] Loading compiled-in X.509 certificates
> [    4.765620] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee000(80007d6000e3)
> [    4.769099] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee008(80007d8000e3)
> [    4.772479] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee010(80007da000e3)
> [    4.775919] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee018(80007dc000e3)
> [    4.779251] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee020(80007de000e3)
> [    4.782558] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee028(80007ee3)
> [    4.794160] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee030(80007e2000e3)
> [    4.797525] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee038(80007e4000e3)
> [    4.800776] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee040(80007e6000e3)
> [    4.804100] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee048(80007e8000e3)
> [    4.807437] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee050(80007ea000e3)
> [    4.810729] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee058(80007ec000e3)
> [    4.813989] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee060(80007ee000e3)
> [    4.817294] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee068(80007fe3)
> [    4.820713] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee070(80007f2000e3)
> [    4.823943] ../source/mm/pgtable-generic.c:40: bad pmd 8b39bf7ee078(80007f4000e3)
> [    4.827311] BUG: unable to handle kernel paging request at fe27c1fdfba0
> [    4.830109] IP: free_page_and_swap_cache+0x6/0xa0
> [    4.831999] PGD 7f7ef067 P4D 7f7ef067 PUD 0
> [    4.833779] Oops: [#1] SMP PTI
> [    4.835197] Modules linked in:
> [    4.836450] CPU: 0 PID: 45 Comm: modprobe Not tainted 4.14.11-coreos #1
> [    4.839009] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
> [    4.841551] task: 8b39b5a71e40 task.stack: b92580558000
> [    4.844062] RIP: 0010:free_page_and_swap_cache+0x6/0xa0
> [    4.846238] RSP: 0018:b9258055bc98 EFLAGS: 00010297
> [    4.848300] RAX: RBX: fe27c0001000 RCX: 8b39bf7ef4f8
> [    4.851184] RDX: 0007f7ee RSI: fe27c1fdfb80 RDI: fe27c1fdfb80
> [    4.854090] RBP: 8b39bf7ee000 R08: R09: 0162
> [    4.856946] R10: ff90 R11: 0161 R12: fe27ffe0
> [    4.859777] R13: 8b39bf7ef000 R14: fe28 R15: b9258055bd60
> [    4.862602] FS: () GS:8b39bd20() knlGS:
> [    4.865860] CS: 0010 DS: ES: CR0: 80050033
> [    4.868175] CR2: fe27c1fdfba0 CR3: 2d00a001 CR4: 001606f0
> [    4.871162] Call Trace:
> [    4.872188] free_pgd_range+0x3a5/0x5b0
> [    4.873781] free_ldt_pgtables.part.2+0x60/0xa0
> [    4.875679] ? arch_tlb_finish_mmu+0x42/0x70
> [    4.877476] ? tlb_finish_mmu+0x1f/0x30
> [    4.878999] exit_mmap+0x5b/0x1a0
> [    4.880327] ? dput+0xb8/0x1e0
> [    4.881575] ? hrtimer_try_to_cancel+0x25/0x110
> [    4.883388] mmput+0x52/0x110
> [    4.884620] do_exit+0x330/0xb10
> [    4.886044] ? task_work_run+0x6b/0xa0
> [    4.887544] do_group_exit+0x3c/0xa0
> [    4.889012] SyS_exit_group+0x10/0x10
> [    4.890473] entry_SYSCALL_64_fastpath+0x1a/0x7d
> [    4.892364] RIP: 0033:0x7f4a41d4ded9
> [    4.893812] RSP: 002b:7ffe25d85708 EFLAGS: 0246 ORIG_RAX: 00e7
> [    4.896974] RAX: ffda RBX: 5601b3c9e2e0 RCX: 7f4a41d4ded9
> [    4.899830] RDX: RSI: 0001 RDI: 0001
> [    4.902647] RBP: 5601b3c9d0e8 R08: 003c R09: 00e7
> [    4.905743] R10: ff90 R11: 0246 R12: 5601b3c9d090
> [    4.908659] R13: 0004 R14: 0001 R15: 7ffe25d85828
> [    4.911495] Code: e0 01 48 83 f8 01 19 c0 25 01 fe ff ff 05 00 02 00 00 3e 29 43 1c 5b 5d 41 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 <48> 8b 57 20 48 89 fb 48 8d 42 ff 83 e2 01 48 0f 44 c7 48 8b 48
> [    4.919014] RIP: free_page_and_swap_cache+0x6/0xa0