Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-22 Thread Andy Lutomirski
On Wed, Mar 22, 2017 at 9:38 AM, Thomas Garnier wrote:
> On Wed, Mar 22, 2017 at 9:33 AM, Andy Lutomirski wrote:
>> On Wed, Mar 22, 2017 at 12:36 AM, Ingo Molnar wrote:
>>>
>>> * Thomas Garnier wrote:
>>>
>>>> >  static inline void setup_fixmap_gdt(int cpu)
>>>> >  {
>>>> > __set_fixmap(get_cpu_gdt_ro_index(cpu),
>>>> > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags);
>>>> > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)),
>>>> > +pg_fixmap_gdt_flags);
>>>> >  }
>>>> >
>>>> >  /* Load the original GDT from the per-cpu structure */
>>>> >
>>>> > This makes UP boot for me, but SMP (2 cpus) is still busted.
>>>>
>>>> This change fixed boot for me:
>>>>
>>>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>>>> index b65155cc3760..4e30707d9f9a 100644
>>>> --- a/arch/x86/include/asm/fixmap.h
>>>> +++ b/arch/x86/include/asm/fixmap.h
>>>> @@ -104,7 +104,9 @@ enum fixed_addresses {
>>>> FIX_GDT_REMAP_BEGIN,
>>>> FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
>>>>
>>>> -   __end_of_permanent_fixed_addresses,
>>>> +   __end_of_permanent_fixed_addresses =
>>>> +   (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) &
>>>> +   -PTRS_PER_PTE,
>>>>
>>>> Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD.
>>>>
>>>> I don't think that the right fix but it might help understand the
>>>> exact root cause.
>>>
>>> Could this be related to the permission bits in the PMD itself getting
>>> out of sync with the PTEs? WP test marks a page writable/unwritable, and
>>> maybe we mess up the restoration. If they are on separate PMDs then this
>>> is worked around because the fixmap GDT is on a separate PMD.
>>>
>>
>> I don't think so.  I think it's a pair of bugs related to the way that
>> percpu areas are virtually mapped.
>>
>> Bug 1: __pa is totally bogus on percpu pointers.  Oddly, we have one
>> older instance of exactly the same bug (on the same GDT address) in
>> the kernel.  I'll send a patch.
>>
>> Bug 2: Nothing syncs a freshly-set-up CPU's percpu area into
>> initial_page_table.  This makes access to the gdt fail in
>> startup_32_smp.  This looks like a longstanding bug, and I don't see
>> what it has to do with Thomas' series.  I'm still mulling over what to
>> do about it.
>
> Why do you think padding the fixmap also fix the problem? That's the
> thing I don't get.
>
> With the padding the PA is now correct and the memcmp check also
> succeed. That's odd.
>

Not sure.  There are some complicated heuristics in the percpu code
that determine how it's allocated, and the padding might be affecting
those heuristics.

--Andy


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-22 Thread Thomas Garnier
On Wed, Mar 22, 2017 at 9:33 AM, Andy Lutomirski  wrote:
> On Wed, Mar 22, 2017 at 12:36 AM, Ingo Molnar  wrote:
>>
>> * Thomas Garnier  wrote:
>>
>>> >  static inline void setup_fixmap_gdt(int cpu)
>>> >  {
>>> > __set_fixmap(get_cpu_gdt_ro_index(cpu),
>>> > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags);
>>> > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)),
>>> > +pg_fixmap_gdt_flags);
>>> >  }
>>> >
>>> >  /* Load the original GDT from the per-cpu structure */
>>> >
>>> > This makes UP boot for me, but SMP (2 cpus) is still busted.
>>>
>>> This change fixed boot for me:
>>>
>>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>>> index b65155cc3760..4e30707d9f9a 100644
>>> --- a/arch/x86/include/asm/fixmap.h
>>> +++ b/arch/x86/include/asm/fixmap.h
>>> @@ -104,7 +104,9 @@ enum fixed_addresses {
>>> FIX_GDT_REMAP_BEGIN,
>>> FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
>>>
>>> -   __end_of_permanent_fixed_addresses,
>>> +   __end_of_permanent_fixed_addresses =
>>> +   (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) &
>>> +   -PTRS_PER_PTE,
>>>
>>> Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD.
>>>
>>> I don't think that the right fix but it might help understand the
>>> exact root cause.
>>
>> Could this be related to the permission bits in the PMD itself getting
>> out of sync with the PTEs? WP test marks a page writable/unwritable, and
>> maybe we mess up the restoration. If they are on separate PMDs then this
>> is worked around because the fixmap GDT is on a separate PMD.
>>
>
> I don't think so.  I think it's a pair of bugs related to the way that
> percpu areas are virtually mapped.
>
> Bug 1: __pa is totally bogus on percpu pointers.  Oddly, we have one
> older instance of exactly the same bug (on the same GDT address) in
> the kernel.  I'll send a patch.
>
> Bug 2: Nothing syncs a freshly-set-up CPU's percpu area into
> initial_page_table.  This makes access to the gdt fail in
> startup_32_smp.  This looks like a longstanding bug, and I don't see
> what it has to do with Thomas' series.  I'm still mulling over what to
> do about it.

Why do you think padding the fixmap also fixes the problem? That's the
thing I don't get.

With the padding, the PA is now correct and the memcmp check also
succeeds. That's odd.

>
> --Andy



-- 
Thomas


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-22 Thread Andy Lutomirski
On Wed, Mar 22, 2017 at 12:36 AM, Ingo Molnar  wrote:
>
> * Thomas Garnier  wrote:
>
>> >  static inline void setup_fixmap_gdt(int cpu)
>> >  {
>> > __set_fixmap(get_cpu_gdt_ro_index(cpu),
>> > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags);
>> > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)),
>> > +pg_fixmap_gdt_flags);
>> >  }
>> >
>> >  /* Load the original GDT from the per-cpu structure */
>> >
>> > This makes UP boot for me, but SMP (2 cpus) is still busted.
>>
>> This change fixed boot for me:
>>
>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>> index b65155cc3760..4e30707d9f9a 100644
>> --- a/arch/x86/include/asm/fixmap.h
>> +++ b/arch/x86/include/asm/fixmap.h
>> @@ -104,7 +104,9 @@ enum fixed_addresses {
>> FIX_GDT_REMAP_BEGIN,
>> FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
>>
>> -   __end_of_permanent_fixed_addresses,
>> +   __end_of_permanent_fixed_addresses =
>> +   (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) &
>> +   -PTRS_PER_PTE,
>>
>> Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD.
>>
>> I don't think that the right fix but it might help understand the
>> exact root cause.
>
> Could this be related to the permission bits in the PMD itself getting
> out of sync with the PTEs? WP test marks a page writable/unwritable, and
> maybe we mess up the restoration. If they are on separate PMDs then this
> is worked around because the fixmap GDT is on a separate PMD.
>

I don't think so.  I think it's a pair of bugs related to the way that
percpu areas are virtually mapped.

Bug 1: __pa is totally bogus on percpu pointers.  Oddly, we have one
older instance of exactly the same bug (on the same GDT address) in
the kernel.  I'll send a patch.

Bug 2: Nothing syncs a freshly-set-up CPU's percpu area into
initial_page_table.  This makes access to the gdt fail in
startup_32_smp.  This looks like a longstanding bug, and I don't see
what it has to do with Thomas' series.  I'm still mulling over what to
do about it.

--Andy
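
A rough user-space sketch of Bug 1, for illustration only. It is a toy
model, not kernel code: toy_pa() mimics the linear-map subtraction that
__pa performs, and toy_walk() mimics a page-table lookup in the spirit of
slow_virt_to_phys(). Every name, constant, and table entry below is
hypothetical. It assumes the percpu pointer lies outside the linear map,
which is what "virtually mapped" suggests above.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12
#define TOY_PAGE_SIZE   (1u << TOY_PAGE_SHIFT)
#define TOY_PAGE_OFFSET 0xC0000000u  /* hypothetical start of the linear map */
#define TOY_TABLE_BASE  0xF0000000u  /* hypothetical non-linear (vmalloc-like) region */
#define TOY_NPAGES      8            /* pages covered by the toy table */

/* Toy page table: entry i is the physical frame number that backs the
 * virtual page at TOY_TABLE_BASE + i * TOY_PAGE_SIZE. */
static const uint32_t toy_pfn[TOY_NPAGES] = {
    0x321, 0x100, 0x7ff, 0x042, 0x555, 0x00a, 0x123, 0x200
};

/* Linear-map arithmetic: only meaningful inside the linear map. */
static uint32_t toy_pa(uint32_t vaddr)
{
    return vaddr - TOY_PAGE_OFFSET;
}

/* Table lookup: valid for addresses inside the toy non-linear region. */
static uint32_t toy_walk(uint32_t vaddr)
{
    uint32_t idx = (vaddr - TOY_TABLE_BASE) >> TOY_PAGE_SHIFT;

    assert(vaddr >= TOY_TABLE_BASE && idx < TOY_NPAGES);
    return (toy_pfn[idx] << TOY_PAGE_SHIFT) | (vaddr & (TOY_PAGE_SIZE - 1));
}
```

For an address in the toy non-linear region the two functions disagree, so
a fixmap entry built from toy_pa() would alias the wrong page. For a
linear-map address the subtraction alone gives the right answer.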


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-22 Thread Ingo Molnar

* Thomas Garnier  wrote:

> >  static inline void setup_fixmap_gdt(int cpu)
> >  {
> > __set_fixmap(get_cpu_gdt_ro_index(cpu),
> > -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags);
> > +slow_virt_to_phys(get_cpu_gdt_rw(cpu)),
> > +pg_fixmap_gdt_flags);
> >  }
> >
> >  /* Load the original GDT from the per-cpu structure */
> >
> > This makes UP boot for me, but SMP (2 cpus) is still busted.
> 
> This change fixed boot for me:
> 
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index b65155cc3760..4e30707d9f9a 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -104,7 +104,9 @@ enum fixed_addresses {
> FIX_GDT_REMAP_BEGIN,
> FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
> 
> -   __end_of_permanent_fixed_addresses,
> +   __end_of_permanent_fixed_addresses =
> +   (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) &
> +   -PTRS_PER_PTE,
> 
> Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD.
> 
> I don't think that the right fix but it might help understand the
> exact root cause.

Could this be related to the permission bits in the PMD itself getting out
of sync with the PTEs? WP test marks a page writable/unwritable, and maybe
we mess up the restoration. If they are on separate PMDs then this is
worked around because the fixmap GDT is on a separate PMD.

Thanks,

Ingo
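
A small sketch of the rule behind this hypothesis, for illustration only.
On x86 a write through a mapping is permitted only when the R/W bit is set
at every level of the walk, so a stale upper-level entry can veto a PTE
that looks writable. The sketch assumes write protection is enforced for
the access (for supervisor writes, CR0.WP set). The TOY_* names and
toy_writable() are hypothetical; only the bit positions (bit 0 present,
bit 1 R/W) follow the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PRESENT 0x1u  /* bit 0: entry is present */
#define TOY_RW      0x2u  /* bit 1: writes are allowed */

/* A write succeeds only if both levels are present and both allow writes. */
static bool toy_writable(uint32_t pmd_entry, uint32_t pte_entry)
{
    bool present = (pmd_entry & TOY_PRESENT) && (pte_entry & TOY_PRESENT);
    bool rw = (pmd_entry & TOY_RW) && (pte_entry & TOY_RW);

    return present && rw;
}
```

With a read-only upper-level entry, toy_writable() is false whatever the
PTE says, which is the kind of mismatch the question above is asking about.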


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-21 Thread Thomas Garnier
On Tue, Mar 21, 2017 at 9:27 PM, Andy Lutomirski wrote:
> On Tue, Mar 21, 2017 at 5:41 PM, Thomas Garnier wrote:
>> On Tue, Mar 21, 2017 at 4:51 PM, Andy Lutomirski wrote:
>>> On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski wrote:
>>>> On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds wrote:
>>>>> On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote:
>>>>>> The issue seems to be related to exceptions happening in close pages
>>>>>> to the fixmap GDT remapping.
>>>>>>
>>>>>> The original page fault happen in do_test_wp_bit which set a fixmap
>>>>>> entry to test WP flag. If I grow the number of processors supported
>>>>>> increasing the distance between the remapped GDT page and the WP test
>>>>>> page, the error does not reproduce.
>>>>>>
>>>>>> I am still looking at the exact distance between repro and no-repro as
>>>>>> well as the exact root cause.
>>>>>
>>>>> Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can*
>>>>> cover 8k entries, which at 8 bytes each would be 64kB.
>>>>
>>>> The QEMU barf says the GDT limit is 0xff, for better or for worse.
>>>>
>>>>>
>>>>> So somebody trying to load an invalid segment (say, 0x) might end
>>>>> up causing an access to the GDT base + 64k - 8.
>>>>>
>>>>> It is also possible that the CPU might do a page table writability
>>>>> check *before* it does the limit check. That would sound odd, though.
>>>>> Might be a CPU errata.
>>>>>
>>>>
>>>
>>>> There's presumably something genuinely wrong with our GDT.
>>>
>>> This is suspicious.  I added this code in test_wp_bit:
>>>
>>> if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) {
>>> pr_err("Oh crap\n");
>>> BUG_ON(1);
>>> }
>>>
>>> It printed "Oh crap" and blew up.  Methinks something's wrong with the
>>> fixmap.  Is it possible that we're crossing a PMD boundary and failing
>>> to translate the addresses right?
>>
>> I might be that. We crash when the PKMAP_BASE is just after the FIX_WP_TEST.
>>
>> I will continue testing couple scenarios and design a fix. Moving the
>> GDT FIXMAP at the beginning or align the base (or pad the end).
>>
>
> Talk about barking up the wrong tree...
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index f8e22dbad86c..c564f62c7a8d 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -462,7 +464,8 @@ pgprot_t pg_fixmap_gdt_flags = PAGE_KERNEL;
>  static inline void setup_fixmap_gdt(int cpu)
>  {
> __set_fixmap(get_cpu_gdt_ro_index(cpu),
> -__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags);
> +slow_virt_to_phys(get_cpu_gdt_rw(cpu)),
> +pg_fixmap_gdt_flags);
>  }
>
>  /* Load the original GDT from the per-cpu structure */
>
> This makes UP boot for me, but SMP (2 cpus) is still busted.

This change fixed boot for me:

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index b65155cc3760..4e30707d9f9a 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -104,7 +104,9 @@ enum fixed_addresses {
FIX_GDT_REMAP_BEGIN,
FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,

-   __end_of_permanent_fixed_addresses,
+   __end_of_permanent_fixed_addresses =
+   (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) &
+   -PTRS_PER_PTE,

This just ensures that PKMAP_BASE and FIX_WP_TEST are on different PMDs.

I don't think that's the right fix, but it might help us understand the
exact root cause.

-- 
Thomas
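
The padding expression in the fixmap.h hunk above is the usual round-up
idiom for a power-of-two alignment. A minimal sketch, for illustration
only: toy_round_up() is a hypothetical helper, and the values 1024 and 512
are assumed here as the typical PTRS_PER_PTE on 32-bit x86 without and
with PAE.

```c
#include <assert.h>

/* Round idx up to the next multiple of align (align must be a power of
 * two). For unsigned values, -align equals ~(align - 1), so the mask
 * clears the low bits after the addition has carried past them. */
static unsigned long toy_round_up(unsigned long idx, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (idx + align - 1) & -align;
}
```

With an alignment of 1024, an index of 37 rounds up to 1024, while an
index that is already a multiple of 1024 is left unchanged. Applied to
__end_of_permanent_fixed_addresses, this pads the permanent fixmap out to
a whole page table's worth of slots.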


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-21 Thread Andy Lutomirski
On Tue, Mar 21, 2017 at 5:41 PM, Thomas Garnier wrote:
> On Tue, Mar 21, 2017 at 4:51 PM, Andy Lutomirski wrote:
>> On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski wrote:
>>> On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds wrote:
>>>> On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote:
>>>>> The issue seems to be related to exceptions happening in close pages
>>>>> to the fixmap GDT remapping.
>>>>>
>>>>> The original page fault happen in do_test_wp_bit which set a fixmap
>>>>> entry to test WP flag. If I grow the number of processors supported
>>>>> increasing the distance between the remapped GDT page and the WP test
>>>>> page, the error does not reproduce.
>>>>>
>>>>> I am still looking at the exact distance between repro and no-repro as
>>>>> well as the exact root cause.
>>>>
>>>> Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can*
>>>> cover 8k entries, which at 8 bytes each would be 64kB.
>>>
>>> The QEMU barf says the GDT limit is 0xff, for better or for worse.
>>>
>>>>
>>>> So somebody trying to load an invalid segment (say, 0x) might end
>>>> up causing an access to the GDT base + 64k - 8.
>>>>
>>>> It is also possible that the CPU might do a page table writability
>>>> check *before* it does the limit check. That would sound odd, though.
>>>> Might be a CPU errata.
>>>>
>>>
>>
>>> There's presumably something genuinely wrong with our GDT.
>>
>> This is suspicious.  I added this code in test_wp_bit:
>>
>> if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) {
>> pr_err("Oh crap\n");
>> BUG_ON(1);
>> }
>>
>> It printed "Oh crap" and blew up.  Methinks something's wrong with the
>> fixmap.  Is it possible that we're crossing a PMD boundary and failing
>> to translate the addresses right?
>
> I might be that. We crash when the PKMAP_BASE is just after the FIX_WP_TEST.
>
> I will continue testing couple scenarios and design a fix. Moving the
> GDT FIXMAP at the beginning or align the base (or pad the end).
>

Talk about barking up the wrong tree...

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f8e22dbad86c..c564f62c7a8d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -462,7 +464,8 @@ pgprot_t pg_fixmap_gdt_flags = PAGE_KERNEL;
 static inline void setup_fixmap_gdt(int cpu)
 {
__set_fixmap(get_cpu_gdt_ro_index(cpu),
-__pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags);
+slow_virt_to_phys(get_cpu_gdt_rw(cpu)),
+pg_fixmap_gdt_flags);
 }

 /* Load the original GDT from the per-cpu structure */

This makes UP boot for me, but SMP (2 cpus) is still busted.
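
A user-space analogue of the memcmp() check quoted above, for illustration
only. It maps the same backing page twice, once writable and once
read-only, and compares the two views. When both aliases refer to the same
page the comparison passes. A read-only alias built from the wrong
physical address, as with __pa on a non-linear pointer, is the case that
would make such a check fail. toy_aliases_agree() is a hypothetical
helper, and the temporary file merely stands in for the shared physical
page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the read-only and writable aliases of one backing page
 * show identical contents, 0 if they differ, -1 if setup fails. */
static int toy_aliases_agree(void)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    int same = -1;

    if (f == NULL)
        return -1;
    if (page > 0 && ftruncate(fileno(f), page) == 0) {
        size_t len = (size_t)page;
        void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                        fileno(f), 0);
        void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fileno(f), 0);

        if (rw != MAP_FAILED && ro != MAP_FAILED) {
            memset(rw, 0xA5, len);            /* write through the RW alias */
            same = memcmp(ro, rw, len) == 0;  /* read back via the RO alias */
        }
        if (rw != MAP_FAILED)
            munmap(rw, len);
        if (ro != MAP_FAILED)
            munmap(ro, len);
    }
    fclose(f);
    return same;
}
```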


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-21 Thread Thomas Garnier
On Tue, Mar 21, 2017 at 4:51 PM, Andy Lutomirski wrote:
> On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski wrote:
>> On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds wrote:
>>> On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier wrote:
>>>> The issue seems to be related to exceptions happening in close pages
>>>> to the fixmap GDT remapping.
>>>>
>>>> The original page fault happen in do_test_wp_bit which set a fixmap
>>>> entry to test WP flag. If I grow the number of processors supported
>>>> increasing the distance between the remapped GDT page and the WP test
>>>> page, the error does not reproduce.
>>>>
>>>> I am still looking at the exact distance between repro and no-repro as
>>>> well as the exact root cause.
>>>
>>> Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can*
>>> cover 8k entries, which at 8 bytes each would be 64kB.
>>
>> The QEMU barf says the GDT limit is 0xff, for better or for worse.
>>
>>>
>>> So somebody trying to load an invalid segment (say, 0x) might end
>>> up causing an access to the GDT base + 64k - 8.
>>>
>>> It is also possible that the CPU might do a page table writability
>>> check *before* it does the limit check. That would sound odd, though.
>>> Might be a CPU errata.
>>>
>>
>
>> There's presumably something genuinely wrong with our GDT.
>
> This is suspicious.  I added this code in test_wp_bit:
>
> if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) {
> pr_err("Oh crap\n");
> BUG_ON(1);
> }
>
> It printed "Oh crap" and blew up.  Methinks something's wrong with the
> fixmap.  Is it possible that we're crossing a PMD boundary and failing
> to translate the addresses right?

It might be that. We crash when PKMAP_BASE is just after FIX_WP_TEST.

I will continue testing a couple of scenarios and design a fix: moving the
GDT fixmap to the beginning, aligning the base, or padding the end.

-- 
Thomas
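
The GDT-limit arithmetic quoted in this message can be sketched as
follows, for illustration only. A selector's bits 15..3 are the descriptor
index and each descriptor is 8 bytes, so 8192 entries span 64kB. The limit
register holds the offset of the last valid byte, so a limit of 0xff
covers 32 descriptors. The toy_* helpers are hypothetical, they ignore the
TI bit (the GDT is assumed), and the selector values in the note below are
chosen for illustration rather than taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte offset of the descriptor that a selector names. */
static uint32_t toy_gdt_offset(uint16_t selector)
{
    return (uint32_t)(selector >> 3) * 8u;
}

/* A descriptor is within the table when its last byte is <= limit. */
static bool toy_within_limit(uint16_t selector, uint16_t limit)
{
    return toy_gdt_offset(selector) + 7u <= limit;
}
```

A selector with index 8191 (for example 0xfff8) lands at offset 64k - 8,
far outside a limit of 0xff, so a correct limit check should reject it
before any access to that address happens.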


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-21 Thread Andy Lutomirski
On Tue, Mar 21, 2017 at 3:32 PM, Andy Lutomirski  wrote:
> On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds
>  wrote:
>> On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier  wrote:
>>> The issue seems to be related to exceptions happening in pages close
>>> to the fixmap GDT remapping.
>>>
>>> The original page fault happens in do_test_wp_bit, which sets a fixmap
>>> entry to test the WP flag. If I grow the number of processors supported,
>>> increasing the distance between the remapped GDT page and the WP test
>>> page, the error does not reproduce.
>>>
>>> I am still looking at the exact distance between repro and no-repro as
>>> well as the exact root cause.
>>
>> Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can*
>> cover 8k entries, which at 8 bytes each would be 64kB.
>
> The QEMU barf says the GDT limit is 0xff, for better or for worse.
>
>>
>> So somebody trying to load an invalid segment (say, 0x) might end
>> up causing an access to the GDT base + 64k - 8.
>>
>> It is also possible that the CPU might do a page table writability
>> check *before* it does the limit check. That would sound odd, though.
>> Might be a CPU errata.
>>
>

> There's presumably something genuinely wrong with our GDT.

This is suspicious.  I added this code in test_wp_bit:

if (memcmp(get_current_gdt_ro(), get_current_gdt_rw(), 4096) != 0) {
        pr_err("Oh crap\n");
        BUG_ON(1);
}

It printed "Oh crap" and blew up.  Methinks something's wrong with the
fixmap.  Is it possible that we're crossing a PMD boundary and failing
to translate the addresses right?
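
Whether two addresses are translated through the same page table can
be checked by hand. The sketch below is a userspace illustration, not
kernel code: table_index and same_table are hypothetical helpers, and
the shift is passed in because it depends on the paging mode (22 for
32-bit non-PAE, 21 for PAE), which this thread does not state.

```c
#include <stdint.h>

/*
 * Index of the top-level slot (and therefore of the page table) that
 * covers va, for a given shift: 22 for 32-bit non-PAE paging (4 MiB per
 * page table), 21 for a PAE PMD entry (2 MiB per page table).
 */
static unsigned int table_index(uint32_t va, unsigned int shift)
{
	return va >> shift;
}

/* Nonzero if both addresses go through the same page table. */
static int same_table(uint32_t a, uint32_t b, unsigned int shift)
{
	return table_index(a, shift) == table_index(b, shift);
}
```

For the two addresses printed by the instrumented WP test elsewhere in
this thread (VA ff874000 and GDTRO ffa94000), a shift of 22 puts both
in slot 1022, while a shift of 21 puts them in slots 2044 and 2045. So
whether they share a page table depends on the paging mode in use.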


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-21 Thread Andy Lutomirski
On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds
 wrote:
> On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier  wrote:
>> The issue seems to be related to exceptions happening in pages close
>> to the fixmap GDT remapping.
>>
>> The original page fault happens in do_test_wp_bit, which sets a fixmap
>> entry to test the WP flag. If I grow the number of processors supported,
>> increasing the distance between the remapped GDT page and the WP test
>> page, the error does not reproduce.
>>
>> I am still looking at the exact distance between repro and no-repro as
>> well as the exact root cause.
>
> Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can*
> cover 8k entries, which at 8 bytes each would be 64kB.

The QEMU barf says the GDT limit is 0xff, for better or for worse.

>
> So somebody trying to load an invalid segment (say, 0x) might end
> up causing an access to the GDT base + 64k - 8.
>
> It is also possible that the CPU might do a page table writability
> check *before* it does the limit check. That would sound odd, though.
> Might be a CPU errata.
>

I added a global TLB flush right after __set_fixmap(), with no effect.
I instrumented the code a bit and I see:

[0.00] Checking if this processor honours the WP bit even in supervisor mode...
[0.00] Will do WP test: PA 258b000 VA ff874000 GDTRW 547e GDTRO ffa94000
KVM internal error. Suberror: 3
extra data[0]: 8b0e
extra data[1]: 31
EAX=0001 EBX=cbb13bc3 ECX= EDX=f000
ESI=547e EDI=ffa94000 EBP=42201f4c ESP=42201f4c
EIP=4105819d EFL=00210006 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b   00c0f300 DPL=3 DS   [-WA]
CS =0060   00c09b00 DPL=0 CS32 [-RA]
SS =0068   00c09300 DPL=0 DS   [-WA]
DS =007b   00c0f300 DPL=3 DS   [-WA]
FS =00d8 123b2000  00809300 DPL=0 DS16 [-WA]
GS =00e0 5492d300 0018 00409100 DPL=0 DS   [--A]
LDT=   
TR =0080 5492b180 206b 8b00 DPL=0 TSS32-busy
GDT= ffa94000 00ff
IDT= fffba000 07ff
CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=00040690
DR0= DR1= DR2=
DR3=
DR6=fffe0ff0 DR7=0400
EFER=
Code=58 d1 00 b8 01 00 00 00 8b 15 ac 13 22 42 8a 8a 00 50 87 ff <88>
8a 00 50 87 ff 31 c0 5d c3 90 90 90 90 90 90 90 90 90 55 2d 84 02 00
00 89 e5 e8 c3 05

The faulting instruction is, as expected:

   e:   8a 8a 00 50 87 ff       mov    -0x78b000(%rdx),%cl
  14:*  88 8a 00 50 87 ff       mov    %cl,-0x78b000(%rdx)     <-- trapping instruction

CR2 is what we expect.  It would be nice to see the GPA and GLA for
the EPT misconfiguration, but KVM doesn't appear to show it.
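
The CR2 value can be cross-checked against the trapping instruction
with a little modular arithmetic. The sketch below is a userspace
illustration with hypothetical helper names; it models the 32-bit
effective-address calculation for a base-plus-displacement operand.

```c
#include <stdint.h>

/*
 * Effective address of a base+displacement operand on a 32-bit CPU.
 * The sum wraps modulo 2^32, which uint32_t arithmetic models directly.
 */
static uint32_t effective_address(uint32_t base, int32_t disp)
{
	return base + (uint32_t)disp;
}

/* Base register value implied by a known effective address and displacement. */
static uint32_t implied_base(uint32_t ea, int32_t disp)
{
	return ea - (uint32_t)disp;
}
```

With CR2 = ff874000 and the displacement -0x78b000 from the
disassembly, implied_base() yields 0xfffff000, which agrees with the
low digits (f000) that the register dump shows for EDX.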

I doubt we're looking at an erratum here.  QEMU TCG triple-faults:

[0.00] Will do WP test: PA 258b000 VA ff874000 GDTRW 547e GDTRO ffa94000

check_exception old: 0x new 0xe [#PF]
 0: v=0e e=0003 i=0 cpl=0 IP=0060:4105819d
pc=4105819d SP=0068:42201f4c CR2=ff874000
EAX=0001 EBX=88eed8df ECX= EDX=f000
ESI=547e EDI=ffa94000 EBP=42201f4c ESP=42201f4c
EIP=4105819d EFL=0026 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b   00cff300 DPL=3 DS   [-WA]
CS =0060   00cf9a00 DPL=0 CS32 [-R-]
SS =0068   00cf9300 DPL=0 DS   [-WA]
DS =007b   00cff300 DPL=3 DS   [-WA]
FS =00d8 123b2000  008f9300 DPL=0 DS16 [-WA]
GS =00e0 5492d300 0018 00409100 DPL=0 DS   [--A]
LDT=   8200 DPL=0 LDT
TR =0080 5492b180 206b 8900 DPL=0 TSS32-avl
GDT= ffa94000 00ff
IDT= fffba000 07ff
CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=0690
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
CCS=0004 CCD=42201f3c CCO=ADDL
EFER=
check_exception old: 0xe new 0xd [#GP]
 1: v=08 e= i=0 cpl=0 IP=0060:4105819d
pc=4105819d SP=0068:42201f4c
env->regs[R_EAX]=0001
EAX=0001 EBX=88eed8df ECX= EDX=f000
ESI=547e EDI=ffa94000 EBP=42201f4c ESP=42201f4c
EIP=4105819d EFL=0026 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b   00cff300 DPL=3 DS   [-WA]
CS =0060   00cf9a00 DPL=0 CS32 [-R-]
SS =0068   00cf9300 DPL=0 DS   [-WA]
DS =007b   00cff300 DPL=3 DS   [-WA]
FS =00d8 123b2000  008f9300 DPL=0 DS16 [-WA]
GS =00e0 5492d300 0018 00409100 DPL=0 DS   [--A]
LDT=   8200 DPL=0 LDT
TR =0080 5492b180 206b 8900 DPL=0 TSS32-avl
GDT= ffa94000 00ff
IDT= fffba000 07ff
CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=0690
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
CCS=0004 CCD=42201f3c CCO=ADDL
EFER=
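
For reading the first check_exception record in the TCG trace (v=0e is
a page fault, and e= is its error code), here is a small sketch that
decodes the low three architectural bits of an x86 page-fault error
code. The struct and function names are made up for this example.

```c
#include <stdint.h>

/* Low bits of the x86 page-fault error code (architectural). */
#define PF_PROT  (1u << 0)  /* 1: protection violation, 0: page not present */
#define PF_WRITE (1u << 1)  /* 1: write access, 0: read access */
#define PF_USER  (1u << 2)  /* 1: fault taken in user mode, 0: supervisor */

struct pf_info {
	int present;  /* the page was present; the fault is a permission fault */
	int write;    /* the faulting access was a write */
	int user;     /* the access came from user mode */
};

static struct pf_info decode_pf_error(uint32_t err)
{
	struct pf_info info = {
		.present = !!(err & PF_PROT),
		.write   = !!(err & PF_WRITE),
		.user    = !!(err & PF_USER),
	};
	return info;
}
```

An error code of 3 decodes as a supervisor-mode write to a present
page, which is the fault the WP test is meant to provoke. The second
record then reports a #GP raised while that #PF was being delivered,
and v=08 is the resulting double fault.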

Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-21 Thread Linus Torvalds
On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier  wrote:
> The issue seems to be related to exceptions happening in pages close
> to the fixmap GDT remapping.
>
> The original page fault happens in do_test_wp_bit, which sets a fixmap
> entry to test the WP flag. If I grow the number of processors supported,
> increasing the distance between the remapped GDT page and the WP test
> page, the error does not reproduce.
>
> I am still looking at the exact distance between repro and no-repro as
> well as the exact root cause.

Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can*
cover 8k entries, which at 8 bytes each would be 64kB.

So somebody trying to load an invalid segment (say, 0x) might end
up causing an access to the GDT base + 64k - 8.

It is also possible that the CPU might do a page table writability
check *before* it does the limit check. That would sound odd, though.
Might be a CPU erratum.

  Linus
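
The arithmetic behind the limit question can be sketched as follows.
This is a userspace illustration with hypothetical helper names. The
selector 0xffff used in the note below is an illustrative all-ones
value chosen for this example (the value in the text above is
truncated), and the limit 0xff is the one QEMU reports elsewhere in
this thread.

```c
#include <stdint.h>

/*
 * A segment selector holds the descriptor index in bits 15..3, the
 * table indicator in bit 2 and the RPL in bits 1..0. Each descriptor
 * is 8 bytes, so the byte offset into the GDT is index * 8.
 */
static uint32_t selector_to_gdt_offset(uint16_t sel)
{
	return (uint32_t)(sel >> 3) * 8u;
}

/*
 * The limit is the offset of the last valid byte, so an 8-byte
 * descriptor starting at off is in bounds only if off + 7 <= limit.
 */
static int descriptor_within_limit(uint16_t sel, uint32_t limit)
{
	return selector_to_gdt_offset(sel) + 7u <= limit;
}
```

A selector of 0xffff maps to offset 0xfff8, that is 64k - 8. With a
limit of 0xff only offsets up to 0xf8 pass the check, so the kernel
selectors seen in the dumps (0x60, 0xe0) are in bounds and 0xffff is
not.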


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-21 Thread Thomas Garnier
The issue seems to be related to exceptions happening in pages close
to the fixmap GDT remapping.

The original page fault happens in do_test_wp_bit, which sets a fixmap
entry to test the WP flag. If I grow the number of processors supported,
increasing the distance between the remapped GDT page and the WP test
page, the error does not reproduce.

I am still looking at the exact distance between repro and no-repro as
well as the exact root cause.
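
To make "distance" concrete: fixmap slots are laid out downward from
the top of the address space, one page per index, so adding per-CPU GDT
slots pushes every later slot one page lower per extra CPU. The sketch
below is a userspace illustration. fix_to_virt_sketch mirrors the shape
of the kernel's index-to-address macro, but the names are hypothetical
and the top address is a parameter because its value depends on the
configuration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/*
 * Fixmap slots grow downward: slot idx lives at top - (idx << PAGE_SHIFT).
 * 'top' stands in for the top-of-fixmap address.
 */
static uint32_t fix_to_virt_sketch(uint32_t top, uint32_t idx)
{
	return top - (idx << PAGE_SHIFT);
}

/* Distance in pages between two page-aligned virtual addresses (lo <= hi). */
static uint32_t pages_between(uint32_t lo, uint32_t hi)
{
	return (hi - lo) >> PAGE_SHIFT;
}
```

For the addresses printed by the instrumented WP test elsewhere in this
thread (VA ff874000 and GDTRO ffa94000), pages_between() gives 544
pages.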

On Tue, Mar 21, 2017 at 12:23 PM, Thomas Garnier  wrote:
> On Tue, Mar 21, 2017 at 12:20 PM, Linus Torvalds
>  wrote:
>>
>> On Tue, Mar 21, 2017 at 11:16 AM, Thomas Garnier  wrote:
>> > This error happens even with Andy's TLS fix on 32-bit (GDT is on fixmap
>> > but not readonly). I am looking into it.
>> >
>> > KVM internal error. Suberror: 3
>> > extra data[0]: 8b0e
>> > extra data[1]: 31
>>
>> If I read that right, it's extra data[1] 0x31, which is
>> EXIT_REASON_EPT_MISCONFIG.
>>
>> I'm not seeing how the A bit in a GDT entry could have anything to do
>> with it. I'm assuming it happens even without Andy's patch?
>
> Correct.
>
>>
>>   Linus
>
>
>
>
> --
> Thomas



-- 
Thomas


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-21 Thread Thomas Garnier
On Tue, Mar 21, 2017 at 12:20 PM, Linus Torvalds
 wrote:
>
> On Tue, Mar 21, 2017 at 11:16 AM, Thomas Garnier  wrote:
> > This error happens even with Andy's TLS fix on 32-bit (GDT is on fixmap
> > but not readonly). I am looking into it.
> >
> > KVM internal error. Suberror: 3
> > extra data[0]: 8b0e
> > extra data[1]: 31
>
> If I read that right, it's extra data[1] 0x31, which is
> EXIT_REASON_EPT_MISCONFIG.
>
> I'm not seeing how the A bit in a GDT entry could have anything to do
> with it. I'm assuming it happens even without Andy's patch?

Correct.

>
>   Linus




-- 
Thomas


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-21 Thread Linus Torvalds
On Tue, Mar 21, 2017 at 11:16 AM, Thomas Garnier  wrote:
> This error happens even with Andy's TLS fix on 32-bit (GDT is on fixmap
> but not readonly). I am looking into it.
>
> KVM internal error. Suberror: 3
> extra data[0]: 8b0e
> extra data[1]: 31

If I read that right, it's extra data[1] 0x31, which is EXIT_REASON_EPT_MISCONFIG.

I'm not seeing how the A bit in a GDT entry could have anything to do
with it. I'm assuming it happens even without Andy's patch?

  Linus
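
The mapping from the dumped number to a name can be sketched as below.
The two constants are the values Linux defines in its VMX uapi header
(arch/x86/include/uapi/asm/vmx.h). The lookup function is a
hypothetical helper for illustration and covers only these two
reasons.

```c
/* Exit reason numbers as defined in arch/x86/include/uapi/asm/vmx.h. */
#define EXIT_REASON_EPT_VIOLATION 48
#define EXIT_REASON_EPT_MISCONFIG 49

/* Hypothetical helper: name the two EPT-related exit reasons. */
static const char *exit_reason_name(unsigned int reason)
{
	switch (reason) {
	case EXIT_REASON_EPT_VIOLATION:
		return "EXIT_REASON_EPT_VIOLATION";
	case EXIT_REASON_EPT_MISCONFIG:
		return "EXIT_REASON_EPT_MISCONFIG";
	default:
		return "unknown (not in this sketch's table)";
	}
}
```

KVM prints the extra data words in hex, so "31" is 0x31 = 49, and
exit_reason_name(0x31) returns "EXIT_REASON_EPT_MISCONFIG".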


Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage

2017-03-21 Thread Thomas Garnier
This error happens even with Andy's TLS fix on 32-bit (GDT is on fixmap
but not readonly). I am looking into it.

KVM internal error. Suberror: 3
extra data[0]: 8b0e
extra data[1]: 31
EAX=0001 EBX=9f9121f3 ECX=4330b100 EDX=f000
ESI=547e EDI=ffa74000 EBP=42273ef8 ESP=42273ef8
EIP=4105a0a2 EFL=00210002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b   00c0f300 DPL=3 DS   [-WA]
CS =0060   00c09b00 DPL=0 CS32 [-RA]
SS =0068   00c09300 DPL=0 DS   [-WA]
DS =007b   00c0f300 DPL=3 DS   [-WA]
FS =00d8 1232d000  00809300 DPL=0 DS16 [-WA]
GS =00e0 5492d4c0 0018 00409100 DPL=0 DS   [--A]
LDT=   00c0
TR =0080 5492b340 206b 8b00 DPL=0 TSS32-busy
GDT= ffa94000 00ff
IDT= fffba000 07ff
CR0=80050033 CR2=ff874000 CR3=0261 CR4=0690
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=f0 d7 00 b8 01 00 00 00 8b 15 ec 48 29 42 8a 8a 00 50 87 ff <88>
8a 00 50 87 ff 31 c0 5d c3 55 89 e5 57 56 53 e8 8d f0 d7 00 89 c6 a1
00 ff dd 42 05 00

On Mon, Mar 20, 2017 at 9:57 PM, kernel test robot
 wrote:
>
> FYI, we noticed the following commit:
>
> commit: 69218e47994da614e7af600bf06887750ab6657a ("x86: Remap GDT tables in the fixmap section")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> in testcase: trinity
> with following parameters:
>
> runtime: 300s
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-i386 -enable-kvm -smp 2 -m 320M
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
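> +--------------------------------------------------------------+------------+------------+
> |                                                              | f06bdd4001 | 69218e4799 |
> +--------------------------------------------------------------+------------+------------+
> | boot_successes                                               | 3          | 0          |
> | boot_failures                                                | 5          | 8          |
> | BUG:kernel_reboot-without-warning_in_boot_stage              | 4          |            |
> | WARNING:at_arch/x86/include/asm/fpu/internal.h:#fpu__restore | 1          |            |
> | BUG:kernel_hang_in_boot_stage                                | 0          | 8          |
> +--------------------------------------------------------------+------------+------------+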
>
> [0.00] sysrq: sysrq always enabled.
> [0.00] PID hash table entries: 2048 (order: 1, 8192 bytes)
> [0.00] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
> [0.00] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
> [0.00] Initializing CPU#0
> [0.00] allocated 331644 bytes of page_ext
> [0.00] Initializing HighMem for node 0 (:)
> [0.00] Memory: 267332K/327160K available (14061K kernel code, 1104K rwdata, 4696K rodata, 2460K init, 13208K bss, 59828K reserved, 0K cma-reserved, 0K highmem)
> [0.00] virtual kernel memory layout:
> [0.00] fixmap  : 0xffa74000 - 0xf000   (5676 kB)
> [0.00] pkmap   : 0xff40 - 0xff80   (4096 kB)
> [0.00] vmalloc : 0x547e - 0xff3fe000   (2732 MB)
> [0.00] lowmem  : 0x4000 - 0x53fe   ( 319 MB)
> [0.00]   .init : 0x4236a000 - 0x425d1000   (2460 kB)
> [0.00]   .data : 0x41dbb72c - 0x42368300   (5810 kB)
> [0.00]   .text : 0x4100 - 0x41dbb72c   (14061 kB)
>
>
> Elapsed time: 480
> BUG: kernel hang in boot stage
>
> initrds=(
> /osimage/yocto/yocto-tiny-i386-2016-04-22.cgz
> 
> /lkp/scheduled/vm-lkp-hsw01-yocto-i386-25/trinity-300s-yocto-tiny-i386-2016-04-22.cgz-69218e47994da614e7af600bf06887750ab6657a-20170319-117821-bls48g-0.cgz
> /lkp/lkp/lkp-i386.cgz
>
>
> To reproduce:
>
> git clone https://github.com/01org/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k  job-script  # job-script is attached in this email
>
>
>
> Thanks,
> Xiaolong



-- 
Thomas

