Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled

2018-08-21 Thread Guenter Roeck
On Mon, Aug 20, 2018 at 01:18:43PM -0700, Andi Kleen wrote:
> On Mon, Aug 20, 2018 at 11:03:53AM -0700, Andi Kleen wrote:
> > On Mon, Aug 20, 2018 at 06:29:38PM +0200, Michal Hocko wrote:
> > > On Fri 17-08-18 15:27:33, Guenter Roeck wrote:
> > > > Hi,
> > > > 
> > > > the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
> > > > with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
> > > 
> > > Could you try to apply fd7e315988b7 ("x86/mm: Simplify p[g4um]d_page()
> > > macros"). I do not see it in stable 4.4 tree and it has been introduced
> > > much later in 4.14. This one gave us quite some headache because it is
> > > > so easy to overlook.
> > 
> > Good catch!
> > 
> > I tested that with 4.9 and backporting the patch indeed fixes the
> > > syzkaller test case running in a KVM VM. Backported patch appended.
> 
> Tested on 4.4 too and it fixes the syzkaller test case there too.
> 

Confirmed that the problem is fixed in v4.4.151-rc1 and v4.9.123-rc1.

Thanks a lot for tracking this down and backporting the fix!

Guenter


Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled

2018-08-20 Thread Andi Kleen
On Mon, Aug 20, 2018 at 11:03:53AM -0700, Andi Kleen wrote:
> On Mon, Aug 20, 2018 at 06:29:38PM +0200, Michal Hocko wrote:
> > On Fri 17-08-18 15:27:33, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
> > > with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
> > 
> > Could you try to apply fd7e315988b7 ("x86/mm: Simplify p[g4um]d_page()
> > macros"). I do not see it in stable 4.4 tree and it has been introduced
> > much later in 4.14. This one gave us quite some headache because it is
> > > so easy to overlook.
> 
> Good catch!
> 
> I tested that with 4.9 and backporting the patch indeed fixes the
> > syzkaller test case running in a KVM VM. Backported patch appended.

Tested on 4.4 too and it fixes the syzkaller test case there too.

-Andi


Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled

2018-08-20 Thread Michal Hocko
On Mon 20-08-18 11:03:53, Andi Kleen wrote:
> On Mon, Aug 20, 2018 at 06:29:38PM +0200, Michal Hocko wrote:
> > On Fri 17-08-18 15:27:33, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
> > > with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
> > 
> > Could you try to apply fd7e315988b7 ("x86/mm: Simplify p[g4um]d_page()
> > macros"). I do not see it in stable 4.4 tree and it has been introduced
> > much later in 4.14. This one gave us quite some headache because it is
> > > so easy to overlook.
> 
> Good catch!
> 
> I tested that with 4.9 and backporting the patch indeed fixes the
> > syzkaller test case running in a KVM VM. Backported patch appended.
> 
> Should probably go into 4.4 and 4.9.
> 
> Cannot explain the 4.17 report unfortunately.

I haven't seen that one yet and likely won't get to it tomorrow either,
but I would start by looking for a direct pte_val usage. We have had some
out-of-tree Xen code which was doing exactly this. It is not really easy
to find by code inspection.
-- 
Michal Hocko
SUSE Labs


Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled

2018-08-20 Thread Andi Kleen
On Mon, Aug 20, 2018 at 06:29:38PM +0200, Michal Hocko wrote:
> On Fri 17-08-18 15:27:33, Guenter Roeck wrote:
> > Hi,
> > 
> > the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
> > with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
> 
> Could you try to apply fd7e315988b7 ("x86/mm: Simplify p[g4um]d_page()
> macros"). I do not see it in stable 4.4 tree and it has been introduced
> much later in 4.14. This one gave us quite some headache because it is
> > so easy to overlook.

Good catch!

I tested that with 4.9 and backporting the patch indeed fixes the
syzkaller test case running in a KVM VM. Backported patch appended.

Should probably go into 4.4 and 4.9.

Cannot explain the 4.17 report unfortunately.

I'll resend it as an email too

---

x86/mm: Simplify p[g4um]d_page() macros

Create a pgd_pfn() macro similar to the p[4um]d_pfn() macros and then
use the p[g4um]d_pfn() macros in the p[g4um]d_page() macros instead of
duplicating the code.

[Needed to fix crashes caused by earlier backports in 4.9 stable, likely 4.4 too]

Signed-off-by: Tom Lendacky 
Reviewed-by: Thomas Gleixner 
Reviewed-by: Borislav Petkov 
Cc: Alexander Potapenko 
Cc: Andrey Ryabinin 
Cc: Andy Lutomirski 
Cc: Arnd Bergmann 
Cc: Borislav Petkov 
Cc: Brijesh Singh 
Cc: Dave Young 
Cc: Dmitry Vyukov 
Cc: Jonathan Corbet 
Cc: Konrad Rzeszutek Wilk 
Cc: Larry Woodman 
Cc: Linus Torvalds 
Cc: Matt Fleming 
Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Radim Krčmář 
Cc: Rik van Riel 
Cc: Toshimitsu Kani 
Cc: kasan-...@googlegroups.com
Cc: k...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
Link: http://lkml.kernel.org/r/e61eb533a6d0aac941db2723d8aa63ef6b882dee.1500319216.git.thomas.lenda...@amd.com
[Backported to 4.9 stable by AK, suggested by Michael Hocko]
Signed-off-by: Ingo Molnar 
Signed-off-by: Andi Kleen 

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 4de6c282c02a..68a55273ce0f 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -173,6 +173,11 @@ static inline unsigned long pud_pfn(pud_t pud)
    return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT;
 }
 
+static inline unsigned long pgd_pfn(pgd_t pgd)
+{
+   return (pgd_val(pgd) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
 #define pte_page(pte)  pfn_to_page(pte_pfn(pte))
 
 static inline int pmd_large(pmd_t pte)
@@ -578,8 +583,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pmd_page(pmd)  \
-   pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
+#define pmd_page(pmd)  pfn_to_page(pmd_pfn(pmd))
 
 /*
  * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@@ -647,8 +651,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pud_page(pud)  \
-   pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
+#define pud_page(pud)  pfn_to_page(pud_pfn(pud))
 
 /* Find an entry in the second-level page table.. */
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
@@ -688,7 +691,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pgd_page(pgd)  pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
+#define pgd_page(pgd)  pfn_to_page(pgd_pfn(pgd))
 
 /* to find an entry in a page-table-directory. */
 static inline unsigned long pud_index(unsigned long address)


Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled

2018-08-20 Thread Michal Hocko
On Fri 17-08-18 15:27:33, Guenter Roeck wrote:
> Hi,
> 
> the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
> with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.

Could you try to apply fd7e315988b7 ("x86/mm: Simplify p[g4um]d_page()
macros"). I do not see it in stable 4.4 tree and it has been introduced
much later in 4.14. This one gave us quite some headache because it is
so easy to overlook.
-- 
Michal Hocko
SUSE Labs


Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled

2018-08-17 Thread Andi Kleen
> Plus I'd have expected the problem to have been in mainline too, and
> apparently it's just the 4.4 and 4.9 backports.

There's another problem in 4.17, but not 4.18, see 
https://bugzilla.redhat.com/show_bug.cgi?id=1618792

Could be the same or different.

-Andi

> 
> Your test-case does have mprotect with PROT_NONE. Which together with
> that mask that *might* be PHYSICAL_PMD_PAGE_MASK makes me think it
> might be related.
> 
>  Linus


Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled

2018-08-17 Thread Guenter Roeck

On 08/17/2018 05:25 PM, Linus Torvalds wrote:
> On Fri, Aug 17, 2018 at 3:27 PM Guenter Roeck  wrote:
> >
> > [6.649970] random: crng init done
> > [6.689002] BUG: unable to handle kernel paging request at eafffa1a0020
>
> Hmm. Lots of bits set.
>
> > [6.689082] RIP: 0010:[]  [] page_remove_rmap+0x10/0x230
> > [6.689082] RSP: 0018:c97abc18  EFLAGS: 0296
> > [6.689082] RAX: ea0005e58000 RBX: eafffa1a RCX: 2020
> > [6.689082] RDX: 3fe0 RSI: 0001 RDI: eafffa1a
>
> Is that RDX value the same value as PHYSICAL_PMD_PAGE_MASK?
>
> If I did my math right, it would be, if your CPU has 46 bits of
> physical memory. Might that be the case?

Yes.

> The reason I mention that is because we had the bug with spurious
> inversion of the zero pte/pmd, fixed by
>
>    f19f5c49bbc3 ("x86/speculation/l1tf: Exempt zeroed PTEs from inversion")

I applied that patch, but it didn't help. I get exactly the same crash and
traceback.

> and that would make a zeroed pmd entry be inverted by
> PHYSICAL_PMD_PAGE_MASK, and then you get odd garbage page pointers
> etc.
>
> Maybe. I could have gotten the math wrong too, but it sounds like the
> register contents _potentially_ might match up with something like
> this, and then we'd zap a bogus hugepage because of some confusion.
>
> Although then I'd have expected the bisection to hit
> "x86/speculation/l1tf: Invert all not present mappings" instead of the
> one you hit, so I don't know.
>
> Plus I'd have expected the problem to have been in mainline too, and
> apparently it's just the 4.4 and 4.9 backports.

Personally I suspect that something went wrong or is missing in the backport
from 4.14 to 4.9. 5-level paging was introduced in between, and THP support
was extended to support additional architectures. With all those changes,
it is easy to miss something. Only I have no idea what that might be.

Guenter



Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled

2018-08-17 Thread Linus Torvalds
On Fri, Aug 17, 2018 at 3:27 PM Guenter Roeck  wrote:
>
> [6.649970] random: crng init done
> [6.689002] BUG: unable to handle kernel paging request at eafffa1a0020

Hmm. Lots of bits set.

> [6.689082] RIP: 0010:[]  [] page_remove_rmap+0x10/0x230
> [6.689082] RSP: 0018:c97abc18  EFLAGS: 0296
> [6.689082] RAX: ea0005e58000 RBX: eafffa1a RCX: 2020
> [6.689082] RDX: 3fe0 RSI: 0001 RDI: eafffa1a

Is that RDX value the same value as PHYSICAL_PMD_PAGE_MASK?

If I did my math right, it would be, if your CPU has 46 bits of
physical memory. Might that be the case?

The reason I mention that is because we had the bug with spurious
inversion of the zero pte/pmd, fixed by

  f19f5c49bbc3 ("x86/speculation/l1tf: Exempt zeroed PTEs from inversion")

and that would make a zeroed pmd entry be inverted by
PHYSICAL_PMD_PAGE_MASK, and then you get odd garbage page pointers
etc.

Maybe. I could have gotten the math wrong too, but it sounds like the
register contents _potentially_ might match up with something like
this, and then we'd zap a bogus hugepage because of some confusion.

Although then I'd have expected the bisection to hit
"x86/speculation/l1tf: Invert all not present mappings" instead of the
one you hit, so I don't know.

Plus I'd have expected the problem to have been in mainline too, and
apparently it's just the 4.4 and 4.9 backports.

Your test-case does have mprotect with PROT_NONE. Which together with
that mask that *might* be PHYSICAL_PMD_PAGE_MASK makes me think it
might be related.

 Linus


Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled

2018-08-17 Thread Guenter Roeck
Hi,

the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.

[6.649970] random: crng init done
[6.689002] BUG: unable to handle kernel paging request at eafffa1a0020
[6.689082] IP: [] page_remove_rmap+0x10/0x230
[6.689082] PGD 0
[6.689082]
[6.689082] Oops:  [#1] SMP
[6.689082] Modules linked in:
[6.689082] CPU: 3 PID: 1132 Comm: mmtest Not tainted 4.9.121 #16
[6.689082] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/204
[6.689082] task: 88017a558c40 task.stack: c97a8000
[6.689082] RIP: 0010:[]  [] page_remove_rmap+0x10/0x230
[6.689082] RSP: 0018:c97abc18  EFLAGS: 0296
[6.689082] RAX: ea0005e58000 RBX: eafffa1a RCX: 2020
[6.689082] RDX: 3fe0 RSI: 0001 RDI: eafffa1a
[6.689082] RBP: c97abc38 R08:  R09: 2080
[6.689082] R10: ea0005e51ec0 R11:  R12: eafffa1a
[6.689082] R13: c97abdc0 R14: 880179426808 R15: c97abdc0
[6.689082] FS:  () GS:88017fd8() knlGS:
[6.689082] CS:  0010 DS:  ES:  CR0: 80050033
[6.689082] CR2: eafffa1a0020 CR3: 00017a3f8000 CR4: 00340670
[6.689082] Stack:
[6.689082]  81138517 ea0005e50980 eafffa1a c97abdc0
[6.689082]  c97abc68 8118d52c 880179426808 2020
[6.689082]  c97abdc0 c97abdc0 c97abd40 8115e270
[6.689082] Call Trace:
[6.689082]  [] ? __alloc_pages_nodemask+0xd7/0x210
[6.689082]  [] zap_huge_pmd+0xec/0x2a0
[6.689082]  [] unmap_page_range+0x7d0/0x8d0
[6.689082]  [] unmap_single_vma+0x54/0xd0
[6.689082]  [] unmap_vmas+0x4c/0xa0
[6.689082]  [] exit_mmap+0xa7/0x130
[6.689082]  [] ? __khugepaged_exit+0x6f/0x100
[6.689082]  [] mmput+0x38/0x100
[6.689082]  [] do_exit+0x25c/0xb10
[6.689082]  [] do_group_exit+0x3e/0xa0
[6.689082]  [] SyS_exit_group+0xf/0x10
[6.689082]  [] do_syscall_64+0x5c/0xc0
[6.689082]  [] entry_SYSCALL_64_after_swapgs+0x58/0xc6
[6.689082] Code: 77 ff ff ff eb b8 be 13 00 00 00 4c 89 e7 e8 d8 40 fe ff 48 63 d3 eb b3 0f 1f 00 55 48 89 e5 41 55 41 54 53 48 
[6.689082] RIP  [] page_remove_rmap+0x10/0x230
[6.689082]  RSP 
[6.689082] CR2: eafffa1a0020
[6.689082] ---[ end trace 62ac9ace190510cd ]---
[6.689082] Fixing recursive fault but reboot is needed!

A test program to trigger the crash is attached, as are bisect results
and some additional information.

Upstream commit f19f5c49bbc3 ("x86/speculation/l1tf: Exempt zeroed PTEs
from inversion") does not fix the problem.

Many thanks to the syzkaller team for finding the problem and for providing
a reproducer.

Any help to track down the problem would be appreciated. This is out of my
league.

Thanks,
Guenter

---
#define _GNU_SOURCE

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

int main()
{
  syscall(__NR_mmap, 0x2000, 0x100, 3, 0x32, -1, 0);
  syscall(__NR_madvise, 0x20a93000, 0x4000, 0xe);
  syscall(__NR_mremap, 0x20a96000, 0x1000, 0x80, 3, 0x2013);
  syscall(__NR_sigaltstack, 0x20341000, 0x20ef9ff8);
  syscall(__NR_mprotect, 0x2000, 0x80, 0);
  return 0;
}

---
# bad: [93e02ae4200184bab43ce29966e895826a756a37] Linux 4.9.120
# good: [8f21ecb4249a0914aea08bef1befca9019a3b44b] Linux 4.9.119
git bisect start 'v4.9.120' 'v4.9.119'
# bad: [a0695af3406ae2a08184bd47a9e948fe6f9858b9] x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present
git bisect bad a0695af3406ae2a08184bd47a9e948fe6f9858b9
# good: [1a4922e0f01d08a4789b1e17b195bc30bf234a3b] mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
git bisect good 1a4922e0f01d08a4789b1e17b195bc30bf234a3b
# bad: [e0439285c628dea71517a1e77cab805d9134f898] x86/cpu: Remove the pointless CPU printout
git bisect bad e0439285c628dea71517a1e77cab805d9134f898
# bad: [e3923475ebb1b503668dfdb3ba90e2ebd46931e6] x86/speculation/l1tf: Limit swap file size to MAX_PA/2
git bisect bad e3923475ebb1b503668dfdb3ba90e2ebd46931e6
# bad: [33182fe97add6e83c195e9d0f7297a6499563b52] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
git bisect bad 33182fe97add6e83c195e9d0f7297a6499563b52
# good: [60712274887fcd4ad5eb8e01796022b6b202143c] x86/speculation/l1tf: Protect swap entries against L1TF
git bisect good 60712274887fcd4ad5eb8e01796022b6b202143c
# first bad commit: [33182fe97add6e83c195e9d0f7297a6499563b52]
# x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation

---
qemu command line:

qemu-system-x86_64 \
-kernel arch/x86/boot/bzImage -M q35 -cpu Broadwell-noTSX \
-no-reboot -m 4G -smp 4 \
-drive 
