Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Wed, Mar 22, 2017 at 9:38 AM, Thomas Garnierwrote: > On Wed, Mar 22, 2017 at 9:33 AM, Andy Lutomirski wrote: >> On Wed, Mar 22, 2017 at 12:36 AM, Ingo Molnar wrote: >>> >>> * Thomas Garnier wrote: >>> > static inline void setup_fixmap_gdt(int cpu) > { > __set_fixmap(get_cpu_gdt_ro_index(cpu), > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), > +pg_fixmap_gdt_flags); > } > > /* Load the original GDT from the per-cpu structure */ > > This makes UP boot for me, but SMP (2 cpus) is still busted. This change fixed boot for me: diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index b65155cc3760..4e30707d9f9a 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -104,7 +104,9 @@ enum fixed_addresses { FIX_GDT_REMAP_BEGIN, FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, - __end_of_permanent_fixed_addresses, + __end_of_permanent_fixed_addresses = + (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) & + -PTRS_PER_PTE, Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD. I don't think that the right fix but it might help understand the exact root cause. >>> >>> Could this be related to the permission bits in the PMD itself getting out >>> of sync >>> with the PTEs? WP test marks a page writable/unwritable, and maybe we mess >>> up the >>> restoration. If they are on separate PMDs then this is worked around >>> because the >>> fixmap GDT is on a separate PMD. >>> >> >> I don't think so. I think it's a pair of bugs related to the way that >> percpu areas are virtually mapped. >> >> Bug 1: __pa is totally bogus on percpu pointers. Oddly, we have one >> older instance of exactly the same bug (on the same GDT address) in >> the kernel. I'll send a patch. >> >> Bug 2: Nothing syncs a freshly-set-up CPU's percpu area into >> initial_page_table. This makes access to the gdt fail in >> startup_32_smp. This looks like a longstanding bug, and I don't see >> what it has to do with Thomas' series. I'm still mulling over what to >> do about it. > > Why do you think padding the fixmap also fix the problem? That's the > thing I don't get. > > With the padding the PA is now correct and the memcmp check also > succeed. That's odd. > Not sure. There are some complicated heuristics in the percpu code that determine how it's allocated, and the padding might be affecting those heuristics. --Andy
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Wed, Mar 22, 2017 at 9:38 AM, Thomas Garnier wrote: > On Wed, Mar 22, 2017 at 9:33 AM, Andy Lutomirski wrote: >> On Wed, Mar 22, 2017 at 12:36 AM, Ingo Molnar wrote: >>> >>> * Thomas Garnier wrote: >>> > static inline void setup_fixmap_gdt(int cpu) > { > __set_fixmap(get_cpu_gdt_ro_index(cpu), > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), > +pg_fixmap_gdt_flags); > } > > /* Load the original GDT from the per-cpu structure */ > > This makes UP boot for me, but SMP (2 cpus) is still busted. This change fixed boot for me: diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index b65155cc3760..4e30707d9f9a 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -104,7 +104,9 @@ enum fixed_addresses { FIX_GDT_REMAP_BEGIN, FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, - __end_of_permanent_fixed_addresses, + __end_of_permanent_fixed_addresses = + (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) & + -PTRS_PER_PTE, Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD. I don't think that the right fix but it might help understand the exact root cause. >>> >>> Could this be related to the permission bits in the PMD itself getting out >>> of sync >>> with the PTEs? WP test marks a page writable/unwritable, and maybe we mess >>> up the >>> restoration. If they are on separate PMDs then this is worked around >>> because the >>> fixmap GDT is on a separate PMD. >>> >> >> I don't think so. I think it's a pair of bugs related to the way that >> percpu areas are virtually mapped. >> >> Bug 1: __pa is totally bogus on percpu pointers. Oddly, we have one >> older instance of exactly the same bug (on the same GDT address) in >> the kernel. I'll send a patch. >> >> Bug 2: Nothing syncs a freshly-set-up CPU's percpu area into >> initial_page_table. This makes access to the gdt fail in >> startup_32_smp. This looks like a longstanding bug, and I don't see >> what it has to do with Thomas' series. I'm still mulling over what to >> do about it. > > Why do you think padding the fixmap also fix the problem? That's the > thing I don't get. > > With the padding the PA is now correct and the memcmp check also > succeed. That's odd. > Not sure. There are some complicated heuristics in the percpu code that determine how it's allocated, and the padding might be affecting those heuristics. --Andy
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Wed, Mar 22, 2017 at 9:33 AM, Andy Lutomirskiwrote: > On Wed, Mar 22, 2017 at 12:36 AM, Ingo Molnar wrote: >> >> * Thomas Garnier wrote: >> >>> > static inline void setup_fixmap_gdt(int cpu) >>> > { >>> > __set_fixmap(get_cpu_gdt_ro_index(cpu), >>> > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); >>> > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), >>> > +pg_fixmap_gdt_flags); >>> > } >>> > >>> > /* Load the original GDT from the per-cpu structure */ >>> > >>> > This makes UP boot for me, but SMP (2 cpus) is still busted. >>> >>> This change fixed boot for me: >>> >>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h >>> index b65155cc3760..4e30707d9f9a 100644 >>> --- a/arch/x86/include/asm/fixmap.h >>> +++ b/arch/x86/include/asm/fixmap.h >>> @@ -104,7 +104,9 @@ enum fixed_addresses { >>> FIX_GDT_REMAP_BEGIN, >>> FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, >>> >>> - __end_of_permanent_fixed_addresses, >>> + __end_of_permanent_fixed_addresses = >>> + (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) & >>> + -PTRS_PER_PTE, >>> >>> Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD. >>> >>> I don't think that the right fix but it might help understand the >>> exact root cause. >> >> Could this be related to the permission bits in the PMD itself getting out >> of sync >> with the PTEs? WP test marks a page writable/unwritable, and maybe we mess >> up the >> restoration. If they are on separate PMDs then this is worked around because >> the >> fixmap GDT is on a separate PMD. >> > > I don't think so. I think it's a pair of bugs related to the way that > percpu areas are virtually mapped. > > Bug 1: __pa is totally bogus on percpu pointers. Oddly, we have one > older instance of exactly the same bug (on the same GDT address) in > the kernel. I'll send a patch. > > Bug 2: Nothing syncs a freshly-set-up CPU's percpu area into > initial_page_table. This makes access to the gdt fail in > startup_32_smp. This looks like a longstanding bug, and I don't see > what it has to do with Thomas' series. I'm still mulling over what to > do about it. Why do you think padding the fixmap also fix the problem? That's the thing I don't get. With the padding the PA is now correct and the memcmp check also succeed. That's odd. > > --Andy -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Wed, Mar 22, 2017 at 9:33 AM, Andy Lutomirski wrote: > On Wed, Mar 22, 2017 at 12:36 AM, Ingo Molnar wrote: >> >> * Thomas Garnier wrote: >> >>> > static inline void setup_fixmap_gdt(int cpu) >>> > { >>> > __set_fixmap(get_cpu_gdt_ro_index(cpu), >>> > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); >>> > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), >>> > +pg_fixmap_gdt_flags); >>> > } >>> > >>> > /* Load the original GDT from the per-cpu structure */ >>> > >>> > This makes UP boot for me, but SMP (2 cpus) is still busted. >>> >>> This change fixed boot for me: >>> >>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h >>> index b65155cc3760..4e30707d9f9a 100644 >>> --- a/arch/x86/include/asm/fixmap.h >>> +++ b/arch/x86/include/asm/fixmap.h >>> @@ -104,7 +104,9 @@ enum fixed_addresses { >>> FIX_GDT_REMAP_BEGIN, >>> FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, >>> >>> - __end_of_permanent_fixed_addresses, >>> + __end_of_permanent_fixed_addresses = >>> + (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) & >>> + -PTRS_PER_PTE, >>> >>> Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD. >>> >>> I don't think that the right fix but it might help understand the >>> exact root cause. >> >> Could this be related to the permission bits in the PMD itself getting out >> of sync >> with the PTEs? WP test marks a page writable/unwritable, and maybe we mess >> up the >> restoration. If they are on separate PMDs then this is worked around because >> the >> fixmap GDT is on a separate PMD. >> > > I don't think so. I think it's a pair of bugs related to the way that > percpu areas are virtually mapped. > > Bug 1: __pa is totally bogus on percpu pointers. Oddly, we have one > older instance of exactly the same bug (on the same GDT address) in > the kernel. I'll send a patch. > > Bug 2: Nothing syncs a freshly-set-up CPU's percpu area into > initial_page_table. This makes access to the gdt fail in > startup_32_smp. This looks like a longstanding bug, and I don't see > what it has to do with Thomas' series. I'm still mulling over what to > do about it. Why do you think padding the fixmap also fix the problem? That's the thing I don't get. With the padding the PA is now correct and the memcmp check also succeed. That's odd. > > --Andy -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Wed, Mar 22, 2017 at 12:36 AM, Ingo Molnarwrote: > > * Thomas Garnier wrote: > >> > static inline void setup_fixmap_gdt(int cpu) >> > { >> > __set_fixmap(get_cpu_gdt_ro_index(cpu), >> > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); >> > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), >> > +pg_fixmap_gdt_flags); >> > } >> > >> > /* Load the original GDT from the per-cpu structure */ >> > >> > This makes UP boot for me, but SMP (2 cpus) is still busted. >> >> This change fixed boot for me: >> >> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h >> index b65155cc3760..4e30707d9f9a 100644 >> --- a/arch/x86/include/asm/fixmap.h >> +++ b/arch/x86/include/asm/fixmap.h >> @@ -104,7 +104,9 @@ enum fixed_addresses { >> FIX_GDT_REMAP_BEGIN, >> FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, >> >> - __end_of_permanent_fixed_addresses, >> + __end_of_permanent_fixed_addresses = >> + (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) & >> + -PTRS_PER_PTE, >> >> Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD. >> >> I don't think that the right fix but it might help understand the >> exact root cause. > > Could this be related to the permission bits in the PMD itself getting out of > sync > with the PTEs? WP test marks a page writable/unwritable, and maybe we mess up > the > restoration. If they are on separate PMDs then this is worked around because > the > fixmap GDT is on a separate PMD. > I don't think so. I think it's a pair of bugs related to the way that percpu areas are virtually mapped. Bug 1: __pa is totally bogus on percpu pointers. Oddly, we have one older instance of exactly the same bug (on the same GDT address) in the kernel. I'll send a patch. Bug 2: Nothing syncs a freshly-set-up CPU's percpu area into initial_page_table. This makes access to the gdt fail in startup_32_smp. This looks like a longstanding bug, and I don't see what it has to do with Thomas' series. I'm still mulling over what to do about it. --Andy
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Wed, Mar 22, 2017 at 12:36 AM, Ingo Molnar wrote: > > * Thomas Garnier wrote: > >> > static inline void setup_fixmap_gdt(int cpu) >> > { >> > __set_fixmap(get_cpu_gdt_ro_index(cpu), >> > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); >> > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), >> > +pg_fixmap_gdt_flags); >> > } >> > >> > /* Load the original GDT from the per-cpu structure */ >> > >> > This makes UP boot for me, but SMP (2 cpus) is still busted. >> >> This change fixed boot for me: >> >> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h >> index b65155cc3760..4e30707d9f9a 100644 >> --- a/arch/x86/include/asm/fixmap.h >> +++ b/arch/x86/include/asm/fixmap.h >> @@ -104,7 +104,9 @@ enum fixed_addresses { >> FIX_GDT_REMAP_BEGIN, >> FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, >> >> - __end_of_permanent_fixed_addresses, >> + __end_of_permanent_fixed_addresses = >> + (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) & >> + -PTRS_PER_PTE, >> >> Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD. >> >> I don't think that the right fix but it might help understand the >> exact root cause. > > Could this be related to the permission bits in the PMD itself getting out of > sync > with the PTEs? WP test marks a page writable/unwritable, and maybe we mess up > the > restoration. If they are on separate PMDs then this is worked around because > the > fixmap GDT is on a separate PMD. > I don't think so. I think it's a pair of bugs related to the way that percpu areas are virtually mapped. Bug 1: __pa is totally bogus on percpu pointers. Oddly, we have one older instance of exactly the same bug (on the same GDT address) in the kernel. I'll send a patch. Bug 2: Nothing syncs a freshly-set-up CPU's percpu area into initial_page_table. This makes access to the gdt fail in startup_32_smp. This looks like a longstanding bug, and I don't see what it has to do with Thomas' series. I'm still mulling over what to do about it. --Andy
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
* Thomas Garnierwrote: > > static inline void setup_fixmap_gdt(int cpu) > > { > > __set_fixmap(get_cpu_gdt_ro_index(cpu), > > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); > > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), > > +pg_fixmap_gdt_flags); > > } > > > > /* Load the original GDT from the per-cpu structure */ > > > > This makes UP boot for me, but SMP (2 cpus) is still busted. > > This change fixed boot for me: > > diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h > index b65155cc3760..4e30707d9f9a 100644 > --- a/arch/x86/include/asm/fixmap.h > +++ b/arch/x86/include/asm/fixmap.h > @@ -104,7 +104,9 @@ enum fixed_addresses { > FIX_GDT_REMAP_BEGIN, > FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, > > - __end_of_permanent_fixed_addresses, > + __end_of_permanent_fixed_addresses = > + (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) & > + -PTRS_PER_PTE, > > Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD. > > I don't think that the right fix but it might help understand the > exact root cause. Could this be related to the permission bits in the PMD itself getting out of sync with the PTEs? WP test marks a page writable/unwritable, and maybe we mess up the restoration. If they are on separate PMDs then this is worked around because the fixmap GDT is on a separate PMD. Thanks, Ingo
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
* Thomas Garnier wrote: > > static inline void setup_fixmap_gdt(int cpu) > > { > > __set_fixmap(get_cpu_gdt_ro_index(cpu), > > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); > > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), > > +pg_fixmap_gdt_flags); > > } > > > > /* Load the original GDT from the per-cpu structure */ > > > > This makes UP boot for me, but SMP (2 cpus) is still busted. > > This change fixed boot for me: > > diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h > index b65155cc3760..4e30707d9f9a 100644 > --- a/arch/x86/include/asm/fixmap.h > +++ b/arch/x86/include/asm/fixmap.h > @@ -104,7 +104,9 @@ enum fixed_addresses { > FIX_GDT_REMAP_BEGIN, > FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, > > - __end_of_permanent_fixed_addresses, > + __end_of_permanent_fixed_addresses = > + (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) & > + -PTRS_PER_PTE, > > Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD. > > I don't think that the right fix but it might help understand the > exact root cause. Could this be related to the permission bits in the PMD itself getting out of sync with the PTEs? WP test marks a page writable/unwritable, and maybe we mess up the restoration. If they are on separate PMDs then this is worked around because the fixmap GDT is on a separate PMD. Thanks, Ingo
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 9:27 PM, Andy Lutomirskiwrote: > On Tue, Mar 21, 2017 at 5:41 PM, Thomas Garnier wrote: >> On Tue, Mar 21, 2017 at 4:51 PM, Andy Lutomirski wrote: >>> On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski >>> wrote: On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds wrote: > On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier > wrote: >> The issue seems to be related to exceptions happening in close pages >> to the fixmap GDT remapping. >> >> The original page fault happen in do_test_wp_bit which set a fixmap >> entry to test WP flag. If I grow the number of processors supported >> increasing the distance between the remapped GDT page and the WP test >> page, the error does not reproduce. >> >> I am still looking at the exact distance between repro and no-repro as >> well as the exact root cause. > > Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* > cover 8k entries, which at 8 bytes each would be 64kB. The QEMU barf says the GDT limit is 0xff, for better or for worse. > > So somebody trying to load an invalid segment (say, 0x) might end > up causing an access to the GDT base + 64k - 8. > > It is also possible that the CPU might do a page table writability > check *before* it does the limit check. That would sound odd, though. > Might be a CPU errata. > >>> There's presumably something genuinely wrong with our GDT. >>> >>> This is suspicious. I added this code in test_wp_bit: >>> >>> if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) { >>> pr_err("Oh crap\n"); >>> BUG_ON(1); >>> } >>> >>> It printed "Oh crap" and blew up. Methinks something's wrong with the >>> fixmap. Is it possible that we're crossing a PMD boundary and failing >>> to translate the addresses right? >> >> I might be that. We crash when the PKMAP_BASE is just after the FIX_WP_TEST. >> >> I will continue testing couple scenarios and design a fix. Moving the >> GDT FIXMAP at the beginning or align the base (or pad the end). >> > > Talk about barking up the wrong tree... > > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c > index f8e22dbad86c..c564f62c7a8d 100644 > --- a/arch/x86/kernel/cpu/common.c > +++ b/arch/x86/kernel/cpu/common.c > @@ -462,7 +464,8 @@ pgprot_t pg_fixmap_gdt_flags = PAGE_KERNEL; > static inline void setup_fixmap_gdt(int cpu) > { > __set_fixmap(get_cpu_gdt_ro_index(cpu), > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), > +pg_fixmap_gdt_flags); > } > > /* Load the original GDT from the per-cpu structure */ > > This makes UP boot for me, but SMP (2 cpus) is still busted. This change fixed boot for me: diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index b65155cc3760..4e30707d9f9a 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -104,7 +104,9 @@ enum fixed_addresses { FIX_GDT_REMAP_BEGIN, FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, - __end_of_permanent_fixed_addresses, + __end_of_permanent_fixed_addresses = + (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) & + -PTRS_PER_PTE, Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD. I don't think that the right fix but it might help understand the exact root cause. -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 9:27 PM, Andy Lutomirski wrote: > On Tue, Mar 21, 2017 at 5:41 PM, Thomas Garnier wrote: >> On Tue, Mar 21, 2017 at 4:51 PM, Andy Lutomirski wrote: >>> On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski >>> wrote: On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds wrote: > On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier > wrote: >> The issue seems to be related to exceptions happening in close pages >> to the fixmap GDT remapping. >> >> The original page fault happen in do_test_wp_bit which set a fixmap >> entry to test WP flag. If I grow the number of processors supported >> increasing the distance between the remapped GDT page and the WP test >> page, the error does not reproduce. >> >> I am still looking at the exact distance between repro and no-repro as >> well as the exact root cause. > > Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* > cover 8k entries, which at 8 bytes each would be 64kB. The QEMU barf says the GDT limit is 0xff, for better or for worse. > > So somebody trying to load an invalid segment (say, 0x) might end > up causing an access to the GDT base + 64k - 8. > > It is also possible that the CPU might do a page table writability > check *before* it does the limit check. That would sound odd, though. > Might be a CPU errata. > >>> There's presumably something genuinely wrong with our GDT. >>> >>> This is suspicious. I added this code in test_wp_bit: >>> >>> if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) { >>> pr_err("Oh crap\n"); >>> BUG_ON(1); >>> } >>> >>> It printed "Oh crap" and blew up. Methinks something's wrong with the >>> fixmap. Is it possible that we're crossing a PMD boundary and failing >>> to translate the addresses right? >> >> I might be that. We crash when the PKMAP_BASE is just after the FIX_WP_TEST. >> >> I will continue testing couple scenarios and design a fix. Moving the >> GDT FIXMAP at the beginning or align the base (or pad the end). >> > > Talk about barking up the wrong tree... > > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c > index f8e22dbad86c..c564f62c7a8d 100644 > --- a/arch/x86/kernel/cpu/common.c > +++ b/arch/x86/kernel/cpu/common.c > @@ -462,7 +464,8 @@ pgprot_t pg_fixmap_gdt_flags = PAGE_KERNEL; > static inline void setup_fixmap_gdt(int cpu) > { > __set_fixmap(get_cpu_gdt_ro_index(cpu), > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), > +pg_fixmap_gdt_flags); > } > > /* Load the original GDT from the per-cpu structure */ > > This makes UP boot for me, but SMP (2 cpus) is still busted. This change fixed boot for me: diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index b65155cc3760..4e30707d9f9a 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -104,7 +104,9 @@ enum fixed_addresses { FIX_GDT_REMAP_BEGIN, FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, - __end_of_permanent_fixed_addresses, + __end_of_permanent_fixed_addresses = + (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) & + -PTRS_PER_PTE, Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD. I don't think that the right fix but it might help understand the exact root cause. -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 5:41 PM, Thomas Garnierwrote: > On Tue, Mar 21, 2017 at 4:51 PM, Andy Lutomirski wrote: >> On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski wrote: >>> On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds >>> wrote: On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote: > The issue seems to be related to exceptions happening in close pages > to the fixmap GDT remapping. > > The original page fault happen in do_test_wp_bit which set a fixmap > entry to test WP flag. If I grow the number of processors supported > increasing the distance between the remapped GDT page and the WP test > page, the error does not reproduce. > > I am still looking at the exact distance between repro and no-repro as > well as the exact root cause. Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* cover 8k entries, which at 8 bytes each would be 64kB. >>> >>> The QEMU barf says the GDT limit is 0xff, for better or for worse. >>> So somebody trying to load an invalid segment (say, 0x) might end up causing an access to the GDT base + 64k - 8. It is also possible that the CPU might do a page table writability check *before* it does the limit check. That would sound odd, though. Might be a CPU errata. >>> >> >>> There's presumably something genuinely wrong with our GDT. >> >> This is suspicious. I added this code in test_wp_bit: >> >> if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) { >> pr_err("Oh crap\n"); >> BUG_ON(1); >> } >> >> It printed "Oh crap" and blew up. Methinks something's wrong with the >> fixmap. Is it possible that we're crossing a PMD boundary and failing >> to translate the addresses right? > > I might be that. We crash when the PKMAP_BASE is just after the FIX_WP_TEST. > > I will continue testing couple scenarios and design a fix. Moving the > GDT FIXMAP at the beginning or align the base (or pad the end). > Talk about barking up the wrong tree... diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f8e22dbad86c..c564f62c7a8d 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -462,7 +464,8 @@ pgprot_t pg_fixmap_gdt_flags = PAGE_KERNEL; static inline void setup_fixmap_gdt(int cpu) { __set_fixmap(get_cpu_gdt_ro_index(cpu), -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), +pg_fixmap_gdt_flags); } /* Load the original GDT from the per-cpu structure */ This makes UP boot for me, but SMP (2 cpus) is still busted.
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 5:41 PM, Thomas Garnier wrote: > On Tue, Mar 21, 2017 at 4:51 PM, Andy Lutomirski wrote: >> On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski wrote: >>> On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds >>> wrote: On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote: > The issue seems to be related to exceptions happening in close pages > to the fixmap GDT remapping. > > The original page fault happen in do_test_wp_bit which set a fixmap > entry to test WP flag. If I grow the number of processors supported > increasing the distance between the remapped GDT page and the WP test > page, the error does not reproduce. > > I am still looking at the exact distance between repro and no-repro as > well as the exact root cause. Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* cover 8k entries, which at 8 bytes each would be 64kB. >>> >>> The QEMU barf says the GDT limit is 0xff, for better or for worse. >>> So somebody trying to load an invalid segment (say, 0x) might end up causing an access to the GDT base + 64k - 8. It is also possible that the CPU might do a page table writability check *before* it does the limit check. That would sound odd, though. Might be a CPU errata. >>> >> >>> There's presumably something genuinely wrong with our GDT. >> >> This is suspicious. I added this code in test_wp_bit: >> >> if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) { >> pr_err("Oh crap\n"); >> BUG_ON(1); >> } >> >> It printed "Oh crap" and blew up. Methinks something's wrong with the >> fixmap. Is it possible that we're crossing a PMD boundary and failing >> to translate the addresses right? > > I might be that. We crash when the PKMAP_BASE is just after the FIX_WP_TEST. > > I will continue testing couple scenarios and design a fix. Moving the > GDT FIXMAP at the beginning or align the base (or pad the end). > Talk about barking up the wrong tree... diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f8e22dbad86c..c564f62c7a8d 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -462,7 +464,8 @@ pgprot_t pg_fixmap_gdt_flags = PAGE_KERNEL; static inline void setup_fixmap_gdt(int cpu) { __set_fixmap(get_cpu_gdt_ro_index(cpu), -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags); +slow_virt_to_phys(get_cpu_gdt_rw(cpu)), +pg_fixmap_gdt_flags); } /* Load the original GDT from the per-cpu structure */ This makes UP boot for me, but SMP (2 cpus) is still busted.
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 4:51 PM, Andy Lutomirskiwrote: > On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski wrote: >> On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds >> wrote: >>> On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote: The issue seems to be related to exceptions happening in close pages to the fixmap GDT remapping. The original page fault happen in do_test_wp_bit which set a fixmap entry to test WP flag. If I grow the number of processors supported increasing the distance between the remapped GDT page and the WP test page, the error does not reproduce. I am still looking at the exact distance between repro and no-repro as well as the exact root cause. >>> >>> Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* >>> cover 8k entries, which at 8 bytes each would be 64kB. >> >> The QEMU barf says the GDT limit is 0xff, for better or for worse. >> >>> >>> So somebody trying to load an invalid segment (say, 0x) might end >>> up causing an access to the GDT base + 64k - 8. >>> >>> It is also possible that the CPU might do a page table writability >>> check *before* it does the limit check. That would sound odd, though. >>> Might be a CPU errata. >>> >> > >> There's presumably something genuinely wrong with our GDT. > > This is suspicious. I added this code in test_wp_bit: > > if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) { > pr_err("Oh crap\n"); > BUG_ON(1); > } > > It printed "Oh crap" and blew up. Methinks something's wrong with the > fixmap. Is it possible that we're crossing a PMD boundary and failing > to translate the addresses right? I might be that. We crash when the PKMAP_BASE is just after the FIX_WP_TEST. I will continue testing couple scenarios and design a fix. Moving the GDT FIXMAP at the beginning or align the base (or pad the end). -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 4:51 PM, Andy Lutomirski wrote: > On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski wrote: >> On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds >> wrote: >>> On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote: The issue seems to be related to exceptions happening in close pages to the fixmap GDT remapping. The original page fault happen in do_test_wp_bit which set a fixmap entry to test WP flag. If I grow the number of processors supported increasing the distance between the remapped GDT page and the WP test page, the error does not reproduce. I am still looking at the exact distance between repro and no-repro as well as the exact root cause. >>> >>> Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* >>> cover 8k entries, which at 8 bytes each would be 64kB. >> >> The QEMU barf says the GDT limit is 0xff, for better or for worse. >> >>> >>> So somebody trying to load an invalid segment (say, 0x) might end >>> up causing an access to the GDT base + 64k - 8. >>> >>> It is also possible that the CPU might do a page table writability >>> check *before* it does the limit check. That would sound odd, though. >>> Might be a CPU errata. >>> >> > >> There's presumably something genuinely wrong with our GDT. > > This is suspicious. I added this code in test_wp_bit: > > if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) { > pr_err("Oh crap\n"); > BUG_ON(1); > } > > It printed "Oh crap" and blew up. Methinks something's wrong with the > fixmap. Is it possible that we're crossing a PMD boundary and failing > to translate the addresses right? I might be that. We crash when the PKMAP_BASE is just after the FIX_WP_TEST. I will continue testing couple scenarios and design a fix. Moving the GDT FIXMAP at the beginning or align the base (or pad the end). -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirskiwrote: > On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds > wrote: >> On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote: >>> The issue seems to be related to exceptions happening in close pages >>> to the fixmap GDT remapping. >>> >>> The original page fault happen in do_test_wp_bit which set a fixmap >>> entry to test WP flag. If I grow the number of processors supported >>> increasing the distance between the remapped GDT page and the WP test >>> page, the error does not reproduce. >>> >>> I am still looking at the exact distance between repro and no-repro as >>> well as the exact root cause. >> >> Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* >> cover 8k entries, which at 8 bytes each would be 64kB. > > The QEMU barf says the GDT limit is 0xff, for better or for worse. > >> >> So somebody trying to load an invalid segment (say, 0x) might end >> up causing an access to the GDT base + 64k - 8. >> >> It is also possible that the CPU might do a page table writability >> check *before* it does the limit check. That would sound odd, though. >> Might be a CPU errata. >> > > There's presumably something genuinely wrong with our GDT. This is suspicious. I added this code in test_wp_bit: if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) { pr_err("Oh crap\n"); BUG_ON(1); } It printed "Oh crap" and blew up. Methinks something's wrong with the fixmap. Is it possible that we're crossing a PMD boundary and failing to translate the addresses right?
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski wrote: > On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds > wrote: >> On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote: >>> The issue seems to be related to exceptions happening in close pages >>> to the fixmap GDT remapping. >>> >>> The original page fault happen in do_test_wp_bit which set a fixmap >>> entry to test WP flag. If I grow the number of processors supported >>> increasing the distance between the remapped GDT page and the WP test >>> page, the error does not reproduce. >>> >>> I am still looking at the exact distance between repro and no-repro as >>> well as the exact root cause. >> >> Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* >> cover 8k entries, which at 8 bytes each would be 64kB. > > The QEMU barf says the GDT limit is 0xff, for better or for worse. > >> >> So somebody trying to load an invalid segment (say, 0x) might end >> up causing an access to the GDT base + 64k - 8. >> >> It is also possible that the CPU might do a page table writability >> check *before* it does the limit check. That would sound odd, though. >> Might be a CPU errata. >> > > There's presumably something genuinely wrong with our GDT. This is suspicious. I added this code in test_wp_bit: if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) { pr_err("Oh crap\n"); BUG_ON(1); } It printed "Oh crap" and blew up. Methinks something's wrong with the fixmap. Is it possible that we're crossing a PMD boundary and failing to translate the addresses right?
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvaldswrote: > On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote: >> The issue seems to be related to exceptions happening in close pages >> to the fixmap GDT remapping. >> >> The original page fault happen in do_test_wp_bit which set a fixmap >> entry to test WP flag. If I grow the number of processors supported >> increasing the distance between the remapped GDT page and the WP test >> page, the error does not reproduce. >> >> I am still looking at the exact distance between repro and no-repro as >> well as the exact root cause. > > Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* > cover 8k entries, which at 8 bytes each would be 64kB. The QEMU barf says the GDT limit is 0xff, for better or for worse. > > So somebody trying to load an invalid segment (say, 0x) might end > up causing an access to the GDT base + 64k - 8. > > It is also possible that the CPU might do a page table writability > check *before* it does the limit check. That would sound odd, though. > Might be a CPU errata. > I added a global TLB flush right after __set_fixmap(), with no effect. I instrumented the code a bit and I see: [0.00] Checking if this processor honours the WP bit even in supervisor mode... [0.00] Will do WP test: PA 258b000 VA ff874000 GDTRW 547e GDTRO ffa94000 KVM internal error. Suberror: 3 extra data[0]: 8b0e extra data[1]: 31 EAX=0001 EBX=cbb13bc3 ECX= EDX=f000 ESI=547e EDI=ffa94000 EBP=42201f4c ESP=42201f4c EIP=4105819d EFL=00210006 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =007b 00c0f300 DPL=3 DS [-WA] CS =0060 00c09b00 DPL=0 CS32 [-RA] SS =0068 00c09300 DPL=0 DS [-WA] DS =007b 00c0f300 DPL=3 DS [-WA] FS =00d8 123b2000 00809300 DPL=0 DS16 [-WA] GS =00e0 5492d300 0018 00409100 DPL=0 DS [--A] LDT= TR =0080 5492b180 206b 8b00 DPL=0 TSS32-busy GDT= ffa94000 00ff IDT= fffba000 07ff CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=00040690 DR0= DR1= DR2= DR3= DR6=fffe0ff0 DR7=0400 EFER= Code=58 d1 00 b8 01 00 00 00 8b 15 ac 13 22 42 8a 8a 00 50 87 ff <88> 8a 00 50 87 ff 31 c0 5d c3 90 90 90 90 90 90 90 90 90 55 2d 84 02 00 00 89 e5 e8 c3 05 The faulting instruction is, as expected: e:8a 8a 00 50 87 ffmov-0x78b000(%rdx),%cl 14:*88 8a 00 50 87 ffmov%cl,-0x78b000(%rdx) <-- trapping instruction CR2 is what we expect. It would be nice to see the GPA and GLA for the EPT misconfiguration, but KVM doesn't appear to show it. I doubt we're looking at an erratum here. QEMU TCG triple-faults: [0.00] Will do WP test: PA 258b000 VA ff874000 GDTRW 547e GDTRO ffa94000 check_exception old: 0x new 0xe [#PF] 0: v=0e e=0003 i=0 cpl=0 IP=0060:4105819d pc=4105819d SP=0068:42201f4c CR2=ff874000 EAX=0001 EBX=88eed8df ECX= EDX=f000 ESI=547e EDI=ffa94000 EBP=42201f4c ESP=42201f4c EIP=4105819d EFL=0026 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =007b 00cff300 DPL=3 DS [-WA] CS =0060 00cf9a00 DPL=0 CS32 [-R-] SS =0068 00cf9300 DPL=0 DS [-WA] DS =007b 00cff300 DPL=3 DS [-WA] FS =00d8 123b2000 008f9300 DPL=0 DS16 [-WA] GS =00e0 5492d300 0018 00409100 DPL=0 DS [--A] LDT= 8200 DPL=0 LDT TR =0080 5492b180 206b 8900 DPL=0 TSS32-avl GDT= ffa94000 00ff IDT= fffba000 07ff CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=0690 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 CCS=0004 CCD=42201f3c CCO=ADDL EFER= check_exception old: 0xe new 0xd [#GP] 1: v=08 e= i=0 cpl=0 IP=0060:4105819d pc=4105819d SP=0068:42201f4c env->regs[R_EAX]=0001 EAX=0001 EBX=88eed8df ECX= EDX=f000 ESI=547e EDI=ffa94000 EBP=42201f4c ESP=42201f4c EIP=4105819d EFL=0026 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =007b 00cff300 DPL=3 DS [-WA] CS =0060 00cf9a00 DPL=0 CS32 [-R-] SS =0068 00cf9300 DPL=0 DS [-WA] DS =007b 00cff300 DPL=3 DS [-WA] FS =00d8 123b2000 008f9300 DPL=0 DS16 [-WA] GS =00e0 5492d300 0018 00409100 DPL=0 DS [--A] LDT= 8200 DPL=0 LDT TR =0080 5492b180 206b 8900 DPL=0 TSS32-avl GDT= ffa94000 00ff IDT= fffba000 07ff CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=0690 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 CCS=0004
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds wrote: > On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote: >> The issue seems to be related to exceptions happening in close pages >> to the fixmap GDT remapping. >> >> The original page fault happen in do_test_wp_bit which set a fixmap >> entry to test WP flag. If I grow the number of processors supported >> increasing the distance between the remapped GDT page and the WP test >> page, the error does not reproduce. >> >> I am still looking at the exact distance between repro and no-repro as >> well as the exact root cause. > > Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* > cover 8k entries, which at 8 bytes each would be 64kB. The QEMU barf says the GDT limit is 0xff, for better or for worse. > > So somebody trying to load an invalid segment (say, 0x) might end > up causing an access to the GDT base + 64k - 8. > > It is also possible that the CPU might do a page table writability > check *before* it does the limit check. That would sound odd, though. > Might be a CPU errata. > I added a global TLB flush right after __set_fixmap(), with no effect. I instrumented the code a bit and I see: [0.00] Checking if this processor honours the WP bit even in supervisor mode... [0.00] Will do WP test: PA 258b000 VA ff874000 GDTRW 547e GDTRO ffa94000 KVM internal error. Suberror: 3 extra data[0]: 8b0e extra data[1]: 31 EAX=0001 EBX=cbb13bc3 ECX= EDX=f000 ESI=547e EDI=ffa94000 EBP=42201f4c ESP=42201f4c EIP=4105819d EFL=00210006 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =007b 00c0f300 DPL=3 DS [-WA] CS =0060 00c09b00 DPL=0 CS32 [-RA] SS =0068 00c09300 DPL=0 DS [-WA] DS =007b 00c0f300 DPL=3 DS [-WA] FS =00d8 123b2000 00809300 DPL=0 DS16 [-WA] GS =00e0 5492d300 0018 00409100 DPL=0 DS [--A] LDT= TR =0080 5492b180 206b 8b00 DPL=0 TSS32-busy GDT= ffa94000 00ff IDT= fffba000 07ff CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=00040690 DR0= DR1= DR2= DR3= DR6=fffe0ff0 DR7=0400 EFER= Code=58 d1 00 b8 01 00 00 00 8b 15 ac 13 22 42 8a 8a 00 50 87 ff <88> 8a 00 50 87 ff 31 c0 5d c3 90 90 90 90 90 90 90 90 90 55 2d 84 02 00 00 89 e5 e8 c3 05 The faulting instruction is, as expected: e:8a 8a 00 50 87 ffmov-0x78b000(%rdx),%cl 14:*88 8a 00 50 87 ffmov%cl,-0x78b000(%rdx) <-- trapping instruction CR2 is what we expect. It would be nice to see the GPA and GLA for the EPT misconfiguration, but KVM doesn't appear to show it. I doubt we're looking at an erratum here. QEMU TCG triple-faults: [0.00] Will do WP test: PA 258b000 VA ff874000 GDTRW 547e GDTRO ffa94000 check_exception old: 0x new 0xe [#PF] 0: v=0e e=0003 i=0 cpl=0 IP=0060:4105819d pc=4105819d SP=0068:42201f4c CR2=ff874000 EAX=0001 EBX=88eed8df ECX= EDX=f000 ESI=547e EDI=ffa94000 EBP=42201f4c ESP=42201f4c EIP=4105819d EFL=0026 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =007b 00cff300 DPL=3 DS [-WA] CS =0060 00cf9a00 DPL=0 CS32 [-R-] SS =0068 00cf9300 DPL=0 DS [-WA] DS =007b 00cff300 DPL=3 DS [-WA] FS =00d8 123b2000 008f9300 DPL=0 DS16 [-WA] GS =00e0 5492d300 0018 00409100 DPL=0 DS [--A] LDT= 8200 DPL=0 LDT TR =0080 5492b180 206b 8900 DPL=0 TSS32-avl GDT= ffa94000 00ff IDT= fffba000 07ff CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=0690 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 CCS=0004 CCD=42201f3c CCO=ADDL EFER= check_exception old: 0xe new 0xd [#GP] 1: v=08 e= i=0 cpl=0 IP=0060:4105819d pc=4105819d SP=0068:42201f4c env->regs[R_EAX]=0001 EAX=0001 EBX=88eed8df ECX= EDX=f000 ESI=547e EDI=ffa94000 EBP=42201f4c ESP=42201f4c EIP=4105819d EFL=0026 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =007b 00cff300 DPL=3 DS [-WA] CS =0060 00cf9a00 DPL=0 CS32 [-R-] SS =0068 00cf9300 DPL=0 DS [-WA] DS =007b 00cff300 DPL=3 DS [-WA] FS =00d8 123b2000 008f9300 DPL=0 DS16 [-WA] GS =00e0 5492d300 0018 00409100 DPL=0 DS [--A] LDT= 8200 DPL=0 LDT TR =0080 5492b180 206b 8900 DPL=0 TSS32-avl GDT= ffa94000 00ff IDT= fffba000 07ff CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=0690 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 CCS=0004 CCD=42201f3c CCO=ADDL EFER=
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnierwrote: > The issue seems to be related to exceptions happening in close pages > to the fixmap GDT remapping. > > The original page fault happen in do_test_wp_bit which set a fixmap > entry to test WP flag. If I grow the number of processors supported > increasing the distance between the remapped GDT page and the WP test > page, the error does not reproduce. > > I am still looking at the exact distance between repro and no-repro as > well as the exact root cause. Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* cover 8k entries, which at 8 bytes each would be 64kB. So somebody trying to load an invalid segment (say, 0x) might end up causing an access to the GDT base + 64k - 8. It is also possible that the CPU might do a page table writability check *before* it does the limit check. That would sound odd, though. Might be a CPU errata. Linus
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote: > The issue seems to be related to exceptions happening in close pages > to the fixmap GDT remapping. > > The original page fault happen in do_test_wp_bit which set a fixmap > entry to test WP flag. If I grow the number of processors supported > increasing the distance between the remapped GDT page and the WP test > page, the error does not reproduce. > > I am still looking at the exact distance between repro and no-repro as > well as the exact root cause. Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can* cover 8k entries, which at 8 bytes each would be 64kB. So somebody trying to load an invalid segment (say, 0x) might end up causing an access to the GDT base + 64k - 8. It is also possible that the CPU might do a page table writability check *before* it does the limit check. That would sound odd, though. Might be a CPU errata. Linus
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
The issue seems to be related to exceptions happening in close pages to the fixmap GDT remapping. The original page fault happen in do_test_wp_bit which set a fixmap entry to test WP flag. If I grow the number of processors supported increasing the distance between the remapped GDT page and the WP test page, the error does not reproduce. I am still looking at the exact distance between repro and no-repro as well as the exact root cause. On Tue, Mar 21, 2017 at 12:23 PM, Thomas Garnierwrote: > On Tue, Mar 21, 2017 at 12:20 PM, Linus Torvalds > wrote: >> >> On Tue, Mar 21, 2017 at 11:16 AM, Thomas Garnier wrote: >> > This error happens even with Andy TLS fix on 32-bit (GDT is on fixmap >> > but not readonly). I am looking into it. >> > >> > KVM internal error. Suberror: 3 >> > extra data[0]: 8b0e >> > extra data[1]: 31 >> >> If I read that right, it's extra data[1] 0x31, which >> EXIT_REASON_EPT_MISCONFIG. >> >> I'm not seeing how the A bit in a GDT entry could have anything to do >> with it. I'm assuming it happens even without Andy's patch? > > Correct. > >> >> Linus > > > > > -- > Thomas -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
The issue seems to be related to exceptions happening in close pages to the fixmap GDT remapping. The original page fault happen in do_test_wp_bit which set a fixmap entry to test WP flag. If I grow the number of processors supported increasing the distance between the remapped GDT page and the WP test page, the error does not reproduce. I am still looking at the exact distance between repro and no-repro as well as the exact root cause. On Tue, Mar 21, 2017 at 12:23 PM, Thomas Garnier wrote: > On Tue, Mar 21, 2017 at 12:20 PM, Linus Torvalds > wrote: >> >> On Tue, Mar 21, 2017 at 11:16 AM, Thomas Garnier wrote: >> > This error happens even with Andy TLS fix on 32-bit (GDT is on fixmap >> > but not readonly). I am looking into it. >> > >> > KVM internal error. Suberror: 3 >> > extra data[0]: 8b0e >> > extra data[1]: 31 >> >> If I read that right, it's extra data[1] 0x31, which >> EXIT_REASON_EPT_MISCONFIG. >> >> I'm not seeing how the A bit in a GDT entry could have anything to do >> with it. I'm assuming it happens even without Andy's patch? > > Correct. > >> >> Linus > > > > > -- > Thomas -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 12:20 PM, Linus Torvaldswrote: > > On Tue, Mar 21, 2017 at 11:16 AM, Thomas Garnier wrote: > > This error happens even with Andy TLS fix on 32-bit (GDT is on fixmap > > but not readonly). I am looking into it. > > > > KVM internal error. Suberror: 3 > > extra data[0]: 8b0e > > extra data[1]: 31 > > If I read that right, it's extra data[1] 0x31, which > EXIT_REASON_EPT_MISCONFIG. > > I'm not seeing how the A bit in a GDT entry could have anything to do > with it. I'm assuming it happens even without Andy's patch? Correct. > > Linus -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 12:20 PM, Linus Torvalds wrote: > > On Tue, Mar 21, 2017 at 11:16 AM, Thomas Garnier wrote: > > This error happens even with Andy TLS fix on 32-bit (GDT is on fixmap > > but not readonly). I am looking into it. > > > > KVM internal error. Suberror: 3 > > extra data[0]: 8b0e > > extra data[1]: 31 > > If I read that right, it's extra data[1] 0x31, which > EXIT_REASON_EPT_MISCONFIG. > > I'm not seeing how the A bit in a GDT entry could have anything to do > with it. I'm assuming it happens even without Andy's patch? Correct. > > Linus -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 11:16 AM, Thomas Garnierwrote: > This error happens even with Andy TLS fix on 32-bit (GDT is on fixmap > but not readonly). I am looking into it. > > KVM internal error. Suberror: 3 > extra data[0]: 8b0e > extra data[1]: 31 If I read that right, it's extra data[1] 0x31, which EXIT_REASON_EPT_MISCONFIG. I'm not seeing how the A bit in a GDT entry could have anything to do with it. I'm assuming it happens even without Andy's patch? Linus
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
On Tue, Mar 21, 2017 at 11:16 AM, Thomas Garnier wrote: > This error happens even with Andy TLS fix on 32-bit (GDT is on fixmap > but not readonly). I am looking into it. > > KVM internal error. Suberror: 3 > extra data[0]: 8b0e > extra data[1]: 31 If I read that right, it's extra data[1] 0x31, which EXIT_REASON_EPT_MISCONFIG. I'm not seeing how the A bit in a GDT entry could have anything to do with it. I'm assuming it happens even without Andy's patch? Linus
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
This error happens even with Andy TLS fix on 32-bit (GDT is on fixmap but not readonly). I am looking into it. KVM internal error. Suberror: 3 extra data[0]: 8b0e extra data[1]: 31 EAX=0001 EBX=9f9121f3 ECX=4330b100 EDX=f000 ESI=547e EDI=ffa74000 EBP=42273ef8 ESP=42273ef8 EIP=4105a0a2 EFL=00210002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =007b 00c0f300 DPL=3 DS [-WA] CS =0060 00c09b00 DPL=0 CS32 [-RA] SS =0068 00c09300 DPL=0 DS [-WA] DS =007b 00c0f300 DPL=3 DS [-WA] FS =00d8 1232d000 00809300 DPL=0 DS16 [-WA] GS =00e0 5492d4c0 0018 00409100 DPL=0 DS [--A] LDT= 00c0 TR =0080 5492b340 206b 8b00 DPL=0 TSS32-busy GDT= ffa94000 00ff IDT= fffba000 07ff CR0=80050033 CR2=ff874000 CR3=0261 CR4=0690 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=f0 d7 00 b8 01 00 00 00 8b 15 ec 48 29 42 8a 8a 00 50 87 ff <88> 8a 00 50 87 ff 31 c0 5d c3 55 89 e5 57 56 53 e8 8d f0 d7 00 89 c6 a1 00 ff dd 42 05 00 On Mon, Mar 20, 2017 at 9:57 PM, kernel test robotwrote: > > FYI, we noticed the following commit: > > commit: 69218e47994da614e7af600bf06887750ab6657a ("x86: Remap GDT tables in > the fixmap section") > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master > > in testcase: trinity > with following parameters: > > runtime: 300s > > test-description: Trinity is a linux system call fuzz tester. > test-url: http://codemonkey.org.uk/projects/trinity/ > > > on test machine: qemu-system-i386 -enable-kvm -smp 2 -m 320M > > caused below changes (please refer to attached dmesg/kmsg for entire > log/backtrace): > > > +--+++ > | | f06bdd4001 | > 69218e4799 | > +--+++ > | boot_successes | 3 | > 0 | > | boot_failures| 5 | > 8 | > | BUG:kernel_reboot-without-warning_in_boot_stage | 4 | >| > | WARNING:at_arch/x86/include/asm/fpu/internal.h:#fpu__restore | 1 | >| > | BUG:kernel_hang_in_boot_stage| 0 | > 8 | > +--+++ > > [0.00] sysrq: sysrq always enabled. > [0.00] PID hash table entries: 2048 (order: 1, 8192 bytes) > [0.00] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) > [0.00] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) > [0.00] Initializing CPU#0 > [0.00] allocated 331644 bytes of page_ext > [0.00] Initializing HighMem for node 0 (:) > [0.00] Memory: 267332K/327160K available (14061K kernel code, 1104K > rwdata, 4696K rodata, 2460K init, 13208K bss, 59828K reserved, 0K > cma-reserved, 0K highmem) > [0.00] virtual kernel memory layout: > [0.00] fixmap : 0xffa74000 - 0xf000 (5676 kB) > [0.00] pkmap : 0xff40 - 0xff80 (4096 kB) > [0.00] vmalloc : 0x547e - 0xff3fe000 (2732 MB) > [0.00] lowmem : 0x4000 - 0x53fe ( 319 MB) > [0.00] .init : 0x4236a000 - 0x425d1000 (2460 kB) > [0.00] .data : 0x41dbb72c - 0x42368300 (5810 kB) > [0.00] .text : 0x4100 - 0x41dbb72c (14061 kB) > > > Elapsed time: 480 > BUG: kernel hang in boot stage > > initrds=( > /osimage/yocto/yocto-tiny-i386-2016-04-22.cgz > > /lkp/scheduled/vm-lkp-hsw01-yocto-i386-25/trinity-300s-yocto-tiny-i386-2016-04-22.cgz-69218e47994da614e7af600bf06887750ab6657a-20170319-117821-bls48g-0.cgz > /lkp/lkp/lkp-i386.cgz > > > To reproduce: > > git clone https://github.com/01org/lkp-tests.git > cd lkp-tests > bin/lkp qemu -k job-script # job-script is attached in > this email > > > > Thanks, > Xiaolong -- Thomas
Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
This error happens even with Andy TLS fix on 32-bit (GDT is on fixmap but not readonly). I am looking into it. KVM internal error. Suberror: 3 extra data[0]: 8b0e extra data[1]: 31 EAX=0001 EBX=9f9121f3 ECX=4330b100 EDX=f000 ESI=547e EDI=ffa74000 EBP=42273ef8 ESP=42273ef8 EIP=4105a0a2 EFL=00210002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =007b 00c0f300 DPL=3 DS [-WA] CS =0060 00c09b00 DPL=0 CS32 [-RA] SS =0068 00c09300 DPL=0 DS [-WA] DS =007b 00c0f300 DPL=3 DS [-WA] FS =00d8 1232d000 00809300 DPL=0 DS16 [-WA] GS =00e0 5492d4c0 0018 00409100 DPL=0 DS [--A] LDT= 00c0 TR =0080 5492b340 206b 8b00 DPL=0 TSS32-busy GDT= ffa94000 00ff IDT= fffba000 07ff CR0=80050033 CR2=ff874000 CR3=0261 CR4=0690 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=f0 d7 00 b8 01 00 00 00 8b 15 ec 48 29 42 8a 8a 00 50 87 ff <88> 8a 00 50 87 ff 31 c0 5d c3 55 89 e5 57 56 53 e8 8d f0 d7 00 89 c6 a1 00 ff dd 42 05 00 On Mon, Mar 20, 2017 at 9:57 PM, kernel test robot wrote: > > FYI, we noticed the following commit: > > commit: 69218e47994da614e7af600bf06887750ab6657a ("x86: Remap GDT tables in > the fixmap section") > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master > > in testcase: trinity > with following parameters: > > runtime: 300s > > test-description: Trinity is a linux system call fuzz tester. > test-url: http://codemonkey.org.uk/projects/trinity/ > > > on test machine: qemu-system-i386 -enable-kvm -smp 2 -m 320M > > caused below changes (please refer to attached dmesg/kmsg for entire > log/backtrace): > > > +--+++ > | | f06bdd4001 | > 69218e4799 | > +--+++ > | boot_successes | 3 | > 0 | > | boot_failures| 5 | > 8 | > | BUG:kernel_reboot-without-warning_in_boot_stage | 4 | >| > | WARNING:at_arch/x86/include/asm/fpu/internal.h:#fpu__restore | 1 | >| > | BUG:kernel_hang_in_boot_stage| 0 | > 8 | > +--+++ > > [0.00] sysrq: sysrq always enabled. > [0.00] PID hash table entries: 2048 (order: 1, 8192 bytes) > [0.00] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) > [0.00] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) > [0.00] Initializing CPU#0 > [0.00] allocated 331644 bytes of page_ext > [0.00] Initializing HighMem for node 0 (:) > [0.00] Memory: 267332K/327160K available (14061K kernel code, 1104K > rwdata, 4696K rodata, 2460K init, 13208K bss, 59828K reserved, 0K > cma-reserved, 0K highmem) > [0.00] virtual kernel memory layout: > [0.00] fixmap : 0xffa74000 - 0xf000 (5676 kB) > [0.00] pkmap : 0xff40 - 0xff80 (4096 kB) > [0.00] vmalloc : 0x547e - 0xff3fe000 (2732 MB) > [0.00] lowmem : 0x4000 - 0x53fe ( 319 MB) > [0.00] .init : 0x4236a000 - 0x425d1000 (2460 kB) > [0.00] .data : 0x41dbb72c - 0x42368300 (5810 kB) > [0.00] .text : 0x4100 - 0x41dbb72c (14061 kB) > > > Elapsed time: 480 > BUG: kernel hang in boot stage > > initrds=( > /osimage/yocto/yocto-tiny-i386-2016-04-22.cgz > > /lkp/scheduled/vm-lkp-hsw01-yocto-i386-25/trinity-300s-yocto-tiny-i386-2016-04-22.cgz-69218e47994da614e7af600bf06887750ab6657a-20170319-117821-bls48g-0.cgz > /lkp/lkp/lkp-i386.cgz > > > To reproduce: > > git clone https://github.com/01org/lkp-tests.git > cd lkp-tests > bin/lkp qemu -k job-script # job-script is attached in > this email > > > > Thanks, > Xiaolong -- Thomas